I’m kicking off the New Year with a series of articles about IT Operations Analytics and, specifically, where I see this field heading in the near future. I’ll start by suggesting a model that illustrates the types of analytics we use today, where we can realistically expect to get to using today’s technology, and the utopian model for the future. In subsequent postings I will drill down into each and consider how to optimize them for future growth.
In the model below I map the different types of analytics that we see in the data center today, and expect to see in the future; I have placed them into the following three categories.
- Monitoring: What’s happening to systems based on performance of discrete components in the data center?
- Analytics: Based on more variables why did something happen and what’s likely to happen in the future?
- Modeling: What will happen if we…or what would have happened if we’d....?
The x-axis denotes how valuable each is to the efficient running of a company: specifically, how much each can help to reduce costs without increasing the risk of missing service level objectives. I expect this model to evolve over time as I refine the categorization but it’s a good start.
Monitoring encompasses the most well-established type of analytics – the presentation of real-time system data and basic trend analysis. Typically these systems are going to show you:
- The current performance of specific elements of your infrastructure (CPU utilization, storage availability) and potentially the performance of actual applications (transaction throughput).
- The future performance of those same elements over time should things continue as-is. I know that trending is analytics but it is so basic and ‘old hat’ that I am lumping it into monitoring!
The value of these solutions lies partly in the potential prevention of system failures through trend analysis but more likely in reducing the time taken to resolve an issue should it occur. The latter is achieved by being able to see which components of the infrastructure have failed, usually from a single console.
The issue with trend analysis and real-time monitoring is that they are very limited in their capabilities. Your car has a fuel gauge and may even have a computer that will estimate “miles until empty”. The fuel gauge is real-time monitoring; the “miles until empty” is a trend analysis. These are both reliable and useful but neither will deal with all of the other environmental considerations that tell you when you’ll really run out of fuel.
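To make the “miles until empty” idea concrete in data center terms, here is a minimal sketch of a linear trend projection for disk usage. The function names, the daily sampling interval, and the sample figures are all assumptions made for illustration; they do not come from any particular monitoring product.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(used_gb: Sequence[float], capacity_gb: float) -> Optional[float]:
    """Project when usage reaches capacity, assuming one sample per day.

    Returns None when usage is flat or shrinking, because a straight
    line never reaches capacity in that case.
    """
    days = list(range(len(used_gb)))
    slope, intercept = fit_line(days, used_gb)
    if slope <= 0:
        return None
    day_full = (capacity_gb - intercept) / slope
    # Days remaining, counted from the most recent sample.
    return max(0.0, day_full - days[-1])


# Hypothetical readings: a 1000 GB volume growing 10 GB per day.
samples = [500.0, 510.0, 520.0, 530.0, 540.0]
remaining = days_until_full(samples, 1000.0)
print(f"Estimated days until full: {remaining:.1f}")
```

Like the car's range estimate, this only extrapolates what has already been observed. It knows nothing about a month-end batch job or a new application going live next week.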
Analytics brings an additional level of intelligence to the table. By applying more complex mathematical principles to a broader set of data sources, analytics allows you to better predict system behavior. I have split analytics into three distinct levels of sophistication:
- Reactive: Looking back at historical data to better understand the effect of particular events on the infrastructure. Can be useful to predict future issues – modeling what happened on last Black Friday for a retailer for example.
- Real-time: This is the promise that big data brings to the table. Being able to consider multiple, disparate, and variable data sources in order to report on more than just the behavior of components of the infrastructure brings high value. Knowing that a specific disk controller is nearing breaking point even though the storage array is currently performing well and has capacity available is critical to preventing system failures.
- Predictive: This is effectively the maturing of trending into real analytics. Not just drawing a linear trend line based on the performance of a particular component but taking into consideration the intersection of all, or at least many, of the external factors to really predict the future.
Let’s return to the car fuel analogy – imagine if the car had access to the planned route, including the topography, the expected temperatures during the trip, the historical driving trends of the driver, the traffic patterns, etc. Now you’d get a substantially more accurate prediction of your driving range. In fact, you’d even be able to determine how little gas you should put in the car to get to the destination, thus reducing the weight of the car. For you as a driver this is overkill, but scale this up to someone like USPS, whose carriers drive 4 million miles a day (in total, not each!), and you can imagine the savings.
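The difference between the naive estimate and the route-aware one can be sketched in a few lines. Everything here is hypothetical: the `Segment` type, the multiplicative adjustment factors, and the numbers are invented to show the shape of the calculation, not to model a real vehicle.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    miles: float
    grade_factor: float    # assumed multiplier: above 1.0 means uphill
    traffic_factor: float  # assumed multiplier: above 1.0 means stop-and-go


def naive_fuel(total_miles: float, gallons_per_mile: float) -> float:
    """Trend-style estimate: average consumption times distance."""
    return total_miles * gallons_per_mile


def route_aware_fuel(segments: Iterable[Segment], gallons_per_mile: float,
                     driver_factor: float = 1.0) -> float:
    """Adjust the baseline consumption segment by segment."""
    return sum(
        s.miles * gallons_per_mile * s.grade_factor * s.traffic_factor * driver_factor
        for s in segments
    )


# Illustrative 40-mile route: flat, then hilly, then congested.
route = [
    Segment(miles=10, grade_factor=1.0, traffic_factor=1.0),
    Segment(miles=20, grade_factor=1.5, traffic_factor=1.0),
    Segment(miles=10, grade_factor=1.0, traffic_factor=2.0),
]
baseline = 0.04  # assumed gallons per mile on a flat, clear road

print(f"Naive estimate:       {naive_fuel(40, baseline):.2f} gallons")
print(f"Route-aware estimate: {route_aware_fuel(route, baseline):.2f} gallons")
```

A real predictive model would learn these factors from historical data rather than take them as inputs, but the principle is the same: more variables, a better answer.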
Modeling simply allows you to change inputs into the predictive analytics model to establish what would happen if you changed something. This might sound trivial but I’d suggest that modeling provides a gateway to two extremely attractive features:
- Topology optimization: this is a Dilbert-esque way of saying, “don’t buy more stuff than you’ll need”. If you ask an IT hardware sales person how to ensure that their systems perform to your required SLA then their answer will be to buy 50% more than you really need. That solves two problems – it ensures you’ll have excess capacity and it ensures that they’ll be able to make their next yacht payment. However, with sophisticated modeling you’d be able to see more precisely how your systems would perform by purchasing any specific combination of hardware. This could save everyone a lot of money without jeopardizing their businesses.
- Automation: If the system knows that allocating more space on a solid state device as a buffer over here will prevent that system over there from slowing down at quarter end then why not let it go ahead and make the change? OK, we are a ways from allowing systems to perform autonomously at scale but we could at least automate the generation of a work order with the suggestion and rationale embedded in the details…no?
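Both ideas above can be sketched together: run a what-if over candidate configurations, pick the cheapest one that still meets the service level objective, and draft a work order rather than making the change automatically. The `Option` type, the utilization model, and all figures are assumptions for illustration; a real predictive model would replace the one-line `predicted_utilization` stand-in.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Option:
    name: str
    capacity_iops: float
    cost: float


def predicted_utilization(peak_demand_iops: float, option: Option) -> float:
    """Stand-in for a real predictive model: demand divided by capacity."""
    return peak_demand_iops / option.capacity_iops


def cheapest_option_meeting_slo(options: Sequence[Option], peak_demand_iops: float,
                                max_utilization: float) -> Optional[Option]:
    """What-if across every candidate; keep the cheapest that meets the objective."""
    viable = [
        o for o in options
        if predicted_utilization(peak_demand_iops, o) <= max_utilization
    ]
    return min(viable, key=lambda o: o.cost) if viable else None


def draft_work_order(option: Option, peak_demand_iops: float) -> Dict[str, str]:
    """Generate a suggestion with its rationale instead of acting autonomously."""
    utilization = predicted_utilization(peak_demand_iops, option)
    return {
        "action": f"Provision configuration '{option.name}'",
        "rationale": (
            f"Predicted peak utilization {utilization:.0%} "
            f"at a cost of {option.cost:g} units"
        ),
    }


# Hypothetical catalogue and a quarter-end peak demand of 14,000 IOPS.
catalogue = [
    Option("small", capacity_iops=10_000, cost=10),
    Option("medium", capacity_iops=20_000, cost=18),
    Option("large", capacity_iops=40_000, cost=40),
]
choice = cheapest_option_meeting_slo(catalogue, peak_demand_iops=14_000, max_utilization=0.8)
if choice is not None:
    print(draft_work_order(choice, 14_000))
```

In this made-up case the "large" option is the sales person's answer, while the model shows that "medium" meets the objective for less than half the cost.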
Back to the car fuel gauge analogy. You could generate a virtual mother-in-law who would not only tell you to slow down, watch for the car in front braking and take the by-pass rather than the back roads – that virtual mother-in-law could also estimate the exact savings in fuel and brake pad wear. Equally useful, and a lot less terrifying, would be the ability of haulage companies to really understand the value of route optimization, vehicle selection and even service scheduling.