IT Management: Are We Chasing Yesterday’s Problem? - Part 2
Posted by Shmuel Kliger on Thu, Feb 02, 2012 @ 11:51 AM
This is Part 2 of my blog series IT Management: Are We Chasing Yesterday’s Problem? You can read Part 1 here. I’d love to hear your comments along the way.
PART TWO: To Troubleshoot or Not to Troubleshoot?
In the “old world”, the pre-virtualization world, we had no choice. We had to troubleshoot. We spent many cycles pinpointing the root cause of problems and fixing them to assure the application quality of service was maintained (or restored). Troubleshooting is a highly complex and often lengthy process that requires deep domain expertise, but service could not be restored and performance could not be assured unless the root cause was identified and resolved. Pre-virtualization, we saw that 80% of the Mean Time to Repair (MTTR) was spent identifying the root cause and only 20% fixing it. Root Cause Analysis (RCA) was the problem to solve. By automating RCA and minimizing the time it took, we could help customers reduce MTTR by up to 80%.
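To make the arithmetic behind that 80/20 split concrete, here is a rough sketch in Python. The function name, the 10-hour baseline and the reduction percentages are illustrative assumptions of mine, not measured figures; only the 80% RCA share comes from the observation above.

```python
def mttr_after_rca_speedup(mttr_hours, rca_share=0.8, rca_reduction=1.0):
    """Estimate MTTR after automation removes part of the RCA time.

    rca_share:     fraction of MTTR spent finding the root cause
                   (0.8, per the pre-virtualization observation).
    rca_reduction: fraction of RCA time removed by automation
                   (hypothetical input; 1.0 means RCA takes no time at all).
    """
    rca_time = mttr_hours * rca_share
    fix_time = mttr_hours - rca_time  # the fix itself is not sped up
    return rca_time * (1.0 - rca_reduction) + fix_time


baseline = 10.0  # hypothetical 10-hour MTTR
for reduction in (0.5, 0.9, 1.0):
    new_mttr = mttr_after_rca_speedup(baseline, rca_reduction=reduction)
    saved = 1.0 - new_mttr / baseline
    print(f"RCA time cut by {reduction:.0%}: MTTR drops by {saved:.0%}")
```

The 80% figure is a ceiling: it is reached only if RCA time goes all the way to zero, because the remaining 20% spent on the fix is untouched.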
Virtualization is a “game-changer”…
It used to be relatively easy, right? This “old world” was static with fewer moving parts: a single application on a single OS on a single server with attached storage. With a few (sometimes more than a few) point tools we were able to get our hands around and manage our environments. Well, virtualization changes everything. No more static boundaries and well-defined interactions between the IT silos.
Ask yourself:
- Do I know where my applications are? Do I know where my virtual machines are?
- Do I know what resources they are using? Do I know how they are performing?
- Do they need more or fewer resources to deliver on their goals?
- Are there bottlenecks in my environment? And, if so, where are they?
And more importantly:
- Do I really know what I need to do? In the next minute? Hour? Day? Week? Month?
- Do I need to start a new VM? Stop a VM? Move a VM?
- Do I know where to start or move the VM?
- Do I need to reconfigure any of its resources? Provide more? Provide less?
- What do I need to do to address the bottlenecks?
It all really drives at the biggest question: How do I prevent resource contention and performance problems from happening in the first place?
Are today’s management tools up to the task at hand?
The good news is that despite this increased complexity, virtualization also provides much more flexibility and fluidity across today’s IT environment. However, today’s management tools do not take full advantage of this to actually improve the way we manage and control it. Instead, they focus on collecting more data across these thousands of knobs and levers (metrics), alerting when thresholds are breached, and leaving you with the heavy lifting of troubleshooting, root cause analysis and (most importantly) figuring out how to fix it!
So, “No” is the obvious answer to the question I posed at the beginning. Troubleshooting and root cause analysis were the problem of the past, and they fundamentally don’t work in a virtualized environment. There are just too many permutations to determine causation after the fact, and often the knee-jerk reaction to a single threshold breach can magnify the problem across the rest of the environment. It’s also not necessary if you actually use virtualization to Control IT in the first place. That’s the game changer. More on that in my next blog…