Sunday, July 3, 2011

Book tour begins

We decided to jam in a launch of the book tour {clients and workshops} before the full European holidays take effect. I have been in Stockholm for the last few days, which is nice... but my laptop crashed (fan failure) and I'm on a borrowed laptop with a Swedish keyboard... and I am getting stressed. I have a deadline for a cloud article (mostly done), so I decided to goof off and update the blog. I will be in Denmark tomorrow (Monday) night and London on Wednesday night. Meetings so far are good, but my 'spidey sense' is tingling. Might be jet lag... or an epiphany! I'll gather some more data (visits) and digest.

An Interview on Application Performance Management with CA Technologies

Business Management Systems June 2011

A couple of typos snuck through... no worries.

They do not track comments on the article... so feel free to opine here. 

APM best practices: A conversation with author Michael J. Sydor

Appearing on SearchSoftwareQuality.com, 6 May 2011. Registration required {free}.

Part 1: Introductory discussion
Part 2: Discussing staffing issues
Part 3: Discussing implementation

If you have not got the book yet, this gives you some flavor of the discussion...

They do not track comments on the article, so feel free to opine here!

Saturday, March 12, 2011

Artifacts for the Book

Seems I missed an upload during the final publishing push. While I sort that out with the publisher, here are the various docs that the book refers to, at Google Docs:

https://docs.google.com/leaf?id=0B9WYLZErvx39NzZlZjNjYTMtNjYyOS00NzcwLWIwMDQtZWYxNDE0ODYxNGE4&hl=en&authkey=CPjs-tEM

Friday, February 18, 2011

Incident Tracking and APM Maturity

I've been doing more Architecture Assessments (post-deployment) than Planning Assessments lately and have noticed something troubling. Folks in general have mature incident and trouble management practices for the applications that they operate/manage. Yet they do not track any incidents related to the APM solution itself. What's going on?

I believe this is due to a general misunderstanding that APM is 'just another monitoring tool'. Some folks think APM looks like many other tools, just with better capabilities. And they treat it the same way: a tool for operations to use when there is a problem, put back on the shelf when things are quiet. This ignores the 24x7 reality of the solution. We already know that this presumption leaves significant gaps in managing the capacity of the metrics storage component.

A small APM initiative can go for years before it runs out of capacity, and no one manages that capacity until the solution becomes unstable; only then does the team realize how limited its understanding is. A growing APM initiative will run into this problem more quickly, depending on the pace of its successive deployments. But the pace of deployment matters less than whether incident tracking is in place for the APM solution.

In the absence of incident tracking, the client carries on blindly, experiencing instability of the monitoring solution, and eventually escalates to vendor support, initially labeling everything a 'product defect'. Vendor support then attempts to confirm the 'defect', but after finding nothing wrong (no known incidents, no prior history of instability), you end up at an impasse: no defect, yet no resolution, because the problem is the configuration, not the monitoring software.

Why then is 'incident tracking' so magical? Stability problems don't suddenly happen. They are often the result of a long, slow grind to the point where stability (or performance) is unacceptable. Something happens; no one is quite sure what. Reboot a few things and the problem seems solved! A couple of weeks later, the reboots are more frequent. After a couple of months, the reboots don't seem to have any lasting effect, and a support incident gets opened. Incident tracking captures these seemingly unrelated events and puts them in a larger perspective. You may still not know what to do, but you can see where things are trending. You may start to look for other correlations. You end up with a much better history and timeline of when and how things started going bad, and that will be a big help once it becomes a support incident.

It also makes it easy to confirm whether a fix is helping, as you watch the frequency of incidents change.
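To make the "see where it is trending" idea concrete, here is a minimal sketch. It assumes, hypothetically, that incidents against the APM solution are logged simply as dates; the helper names and sample dates are illustrative only and not from any APM product. If the gaps between incidents shrink over time, things are grinding toward instability; if the gaps grow after a fix, the fix is probably helping.

```python
from datetime import date
from statistics import mean

# Hypothetical log of incidents against the APM solution itself
# (reboots, stalled dashboards, dropped metrics, ...). Illustrative data only.
incidents = [
    date(2011, 1, 3), date(2011, 1, 17), date(2011, 1, 24),
    date(2011, 1, 28), date(2011, 2, 1), date(2011, 2, 3),
]

def gaps_in_days(events):
    """Days between consecutive incidents, after sorting by date."""
    ordered = sorted(events)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]

def getting_worse(events):
    """True if the average gap between incidents in the later half of the
    history is shorter than in the earlier half (incidents more frequent)."""
    gaps = gaps_in_days(events)
    if len(gaps) < 2:
        return False  # not enough history to say anything
    half = len(gaps) // 2
    return mean(gaps[half:]) < mean(gaps[:half])

print(gaps_in_days(incidents))   # → [14, 7, 4, 4, 2]
print(getting_worse(incidents))  # → True
```

Comparing gaps rather than raw counts per week avoids having to pick a bucket size; the same check run only on incidents logged after a fix gives a rough read on whether the fix is holding.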

As to the nature of the instability and the effort to remediate: these are turning out to be real systemic problems, so there are no easy fixes. Re-establishing incident tracking for the APM solution is easy enough, but the damage has already been done.

How do you track the performance and capacity of your APM solution? When do you think it will "matter"?

Friday, December 17, 2010

The book just went to the printer. Pre-order on Amazon!!! Should ship before the end of the year.

http://www.amazon.com/Application-Performance-Management-Michael-Sydor/dp/1430231416/ref=sr_1_1?ie=UTF8&s=books&qid=1286980312&sr=8-1

Thursday, December 9, 2010

Some light at the end of the tunnel

Finished all writing and editing last week. This week brings the galley proofs.