Monday, August 22, 2011

Application Performance: Best Practices Do Work

This is an article discussing survey results collected by NetForecast, appearing in Business Communications Review, May 2007 - found here.  The basic theme is a survey of how moving to APM best practices improves the overall IT management experience.  I was a bit confused by the use of "benchmarking" with APM, which I would call the APM Pilot.  But they do offer a good definition of how process changes enhance the APM experience.

"The goal of APM best practices is to improve the performance outcome, and for the best outcome these best practices cannot stand alone. Each must be embedded into a continuous improvement process that ensures that application performance meets your business needs..."

They describe how "APM Benchmarking" will help an organization understand its capabilities and how it compares with other practitioners.  This is what I call the 'Skills Assessment' (Chapter 3), which, along with other exercises for assessing enterprise visibility, lets you understand the overall maturity of your APM practice.  I created more of an idealized model, based on an amalgam of actual customer achievements.  But I never had the opportunity to count how many customers achieved a particular level of APM maturity.

What is interesting is how the participating companies 'self-assessed':


The bulk of the population rated themselves on the low side, and this is consistent with what I have been seeing over the last few years.  Only 1 participant out of 329 gave themselves a full rating.  Folks really need best practices to help them progress in using APM to address performance problems and show value for the new processes that will be established.

There is also a nice chart that summarizes the types of metrics that the participants are interested in.

Again, the chart title doesn't make sense, but these are the metrics that the participants are interested in.  The difference between what is considered important and what is actually thresholded confirms that visibility gaps exist, along with gaps in implementation maturity - what the authors define as "follow-through".  I prefer to use an APM technology stack, which implies the same metrics but separates network, server and transaction monitoring - indicating which tool will get you those metrics.
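To make the "follow-through" idea concrete, here is a minimal sketch - my own construction, not from the survey, with hypothetical metric names and layers - of a metric inventory mapped onto the technology stack, flagging the metrics rated important but never actually thresholded:

```python
# Hypothetical metric inventory: (metric, stack layer, important?, thresholded?)
inventory = [
    ("round-trip latency",   "network",     True,  True),
    ("packet loss",          "network",     True,  False),
    ("CPU utilization",      "server",      True,  True),
    ("heap usage",           "server",      True,  False),
    ("transaction response", "transaction", True,  False),
    ("invocation rate",      "transaction", False, False),
]

# The follow-through gap: metrics rated important but with no threshold defined.
gaps = [(metric, layer) for metric, layer, important, thresholded in inventory
        if important and not thresholded]

for metric, layer in gaps:
    print(f"visibility gap: '{metric}' ({layer}) is important but not thresholded")
```

Walking an inventory like this, layer by layer, is usually enough to show where the stack stops at "collecting" and never gets to "acting".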

Another interesting graph shows the impediments to realizing an APM practice.  The authors call these "inhibitors" and it is a useful graph.

I explore these issues in Chapter 2 of my book as "entry points" - the situations that give rise to all of these varied ways of impeding an APM initiative.




Monday, August 15, 2011

APM Best Practices Can Help Organizations Plan and Manage Cloud Migration

Here is the first installment of a short series of articles about using APM techniques to plan and realize Cloud Computing initiatives, appearing in "Database Trends and Applications".

Sunday, July 3, 2011

Book tour begins

We decided to jam in a launch of the book tour {clients and workshops} before the full European holidays take effect.  I have been in Stockholm the last few days, which is nice... but my laptop crashed (fan failure) and I'm on a borrowed laptop with a Swedish keyboard... and I am getting stressed.  I have a deadline for a cloud article (mostly done) so I decided to goof off and update the blog.  I will be in Denmark tomorrow (Monday) night and London on Wednesday night.  Meetings so far are good but my 'spidey sense' is tingling.  Might be jet lag... or an epiphany!  I'll get some more data (visits) and digest.

An Interview on Application Performance Management with CA Technologies

Business Management Systems June 2011

A couple of typos snuck through... no worries.

They do not track comments on the article... so feel free to opine here. 

APM best practices: A conversation with author Michael J. Sydor

Appearing in SearchSoftwareQuality.com, 6 May 2011.  Registration required {free}.

Part 1 - Introductory discussion
Part 2 - Discussing staffing issues
Part 3 - Discussing implementation

If you haven't gotten the book yet... this gives you some flavor of the discussion...

They do not track comments on the article, so feel free to opine here!

Saturday, March 12, 2011

Artifacts for the Book

Seems I missed an upload during the final publishing push. While I sort that out with the publisher, here are the various docs that the book refers to, at Google Docs:

https://docs.google.com/leaf?id=0B9WYLZErvx39NzZlZjNjYTMtNjYyOS00NzcwLWIwMDQtZWYxNDE0ODYxNGE4&hl=en&authkey=CPjs-tEM

Friday, February 18, 2011

Incident Tracking and APM Maturity

I've been doing more Architecture Assessments (post-deployment) than Planning Assessments lately and have noticed something troubling. Folks in general have mature incident and trouble management practices for the applications that they operate and manage. Yet they do no tracking at all for incidents related to the APM solution. What's going on?

I believe this is due to the overall misunderstanding of APM as 'just another monitoring tool'. Some folks think APM looks like many other tools - just with better capabilities. And they treat it just the same - a tool for operations to use when there is a problem, and back on the shelf when things are quiet. This ignores the 24x7 reality of the solution. We already know that this presumption leaves significant gaps in managing the capacity of the metrics storage component.

A small APM initiative can run for years before it exhausts capacity, and no one manages this until the solution becomes unstable - and then they realize how limited their understanding is. A growing APM initiative will run into this problem more quickly, depending on the pace of its successive deployments. But the pace of deployment is not as significant as the presence or absence of incident tracking for the APM solution.
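The capacity math itself is not hard - what's missing is that anyone does it. Here is a minimal sketch, with made-up numbers (the capacities, growth rates and deployment pace are all hypothetical), of the projection that would tell you when metrics storage runs out:

```python
# Project when metrics storage is exhausted, given steady-state growth
# plus the extra growth each quarterly wave of new agent deployments adds.
storage_capacity_gb = 500.0   # hypothetical metrics storage budget
used_gb = 320.0               # current usage
daily_growth_gb = 0.8         # steady-state growth per day
agents_per_quarter = 25       # pace of successive deployments
growth_per_agent_gb = 0.02    # extra daily growth each new agent adds

days = 0
while used_gb < storage_capacity_gb and days < 3650:
    used_gb += daily_growth_gb
    days += 1
    if days % 90 == 0:  # each quarter, new deployments raise the daily growth
        daily_growth_gb += agents_per_quarter * growth_per_agent_gb

print(f"At this pace, metrics storage is exhausted in ~{days} days "
      f"({days / 365:.1f} years)")
```

Run quarterly, even a crude projection like this turns "the solution suddenly became unstable" into "we knew this was coming two quarters ago".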

In the absence of incident tracking, the client continues on, blindly experiencing instability of the monitoring solution, and then escalates to vendor support, initially labeling everything as a 'product defect'. Vendor support will then attempt to confirm the 'defect', but after finding nothing wrong (no known incidents, no prior history of instability), you end up with a bit of an impasse: no defect, yet no resolution, because the problem is the configuration, not the monitoring software.

Why then is 'incident tracking' so magical? Stability problems don't suddenly happen. They are often the result of a long, slow grind to the point where stability (or performance) is unacceptable. Something happens. No one is quite sure what. Reboot a few things - and the problem seems solved! A couple of weeks later, the reboots are more frequent. After a couple of months, the reboots don't seem to have any lasting effect - and a support incident gets opened. Incident tracking captures these seemingly unrelated events and generates a larger perspective. You still may not know what to do, but you can see where it is trending. You may start to look for other correlations. You end up with a much better history and timeline of when and how things started going bad - and that will be a big help once it becomes a support incident.

It also makes it easy to confirm whether a fix is helping, or not, as the frequency of incidents changes.
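Even the crudest tracking buys you this. Here is a minimal sketch, with fabricated dates (the incident log and fix date are invented for illustration), of how the week-by-week frequency makes the slow grind visible and shows whether a fix changed the trend:

```python
from collections import Counter
from datetime import date

# Hypothetical log of APM-solution incidents (reboots, restarts, escalations).
incidents = [date(2011, 1, 3), date(2011, 1, 21), date(2011, 2, 4),
             date(2011, 2, 11), date(2011, 2, 16), date(2011, 2, 18),
             date(2011, 2, 21), date(2011, 2, 23)]

fix_applied = date(2011, 2, 19)  # assumed date of the configuration fix

# Incident frequency per ISO week - the trend is the whole point.
per_week = Counter(d.isocalendar()[1] for d in incidents)
for week in sorted(per_week):
    print(f"week {week:2d}: {'#' * per_week[week]}")

before = sum(1 for d in incidents if d < fix_applied)
after = sum(1 for d in incidents if d >= fix_applied)
print(f"incidents before fix: {before}, after fix: {after}")
```

A spreadsheet does the same job; the tool matters far less than the habit of recording every event against the APM solution itself.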

As to the nature of the instability, and the effort to remediate - these are turning out to be real systemic problems, so there are no easy fixes. Getting incident tracking re-established for the APM solution is easy enough - but the damage has already been done.

How do you track the performance and capacity of your APM solution? When do you think it will "matter"?