Lots of people get intimidated by the math and scope of Availability Management. That is a shame, because you can get quite a few quick wins by applying some simple Availability Management techniques.
I am going to illustrate this with the story of a University I worked with, which used Availability Management to take back control of its infrastructure.
The University had availability issues with a service covered by one of their underpinning contracts (UC) with a telecom provider. They kept losing service. They would open an incident, which would invariably be closed within a few days as “no trouble found” or “came clear while testing.” They simply could not get the service provider's attention.
Then they became aware of availability management, took some action, and not only got their issues resolved, but got some cash back in the process. Once again, actually getting benefits from ITIL “ain't sexy” but “it sure is valuable.”
Drawing on my personal experience, here is how a major University improved quality and reduced costs through focused application of a basic ITIL concept: Availability Management.
The University, which shall remain nameless, was having issues with a carrier. The IT department at the University supported several distributed campuses, all connected by wide area network (WAN) circuits from the regional carrier (which shall also remain nameless, to protect the guilty!).
About once a week a WAN circuit would fail and either disconnect a campus or cause significant delays on other circuits as the re-routed traffic saturated them. The result was significant disruption for the campus community: professors, researchers, administrators, and students. Not a pretty picture.
Whenever this happened, University IT would open a ticket with the carrier. After a couple of days the circuit would “miraculously” recover, and the incident would close. A follow-up with the carrier invariably yielded that classic telco response: “we didn't do anything, but does it work now?” and the ticket would be closed as NTF (No Trouble Found) or CWT (Came Clear While Testing).
Let me translate. The telco had an intermittent problem that they could not find, and once they had fiddled with enough things, the circuit would recover on its own. Because the telco never got to the root cause, the root cause remained, and so did the incidents. This is the classic “chronic” incident and problem.
My customer, the director of the University IT organization, decided to take action. He obtained a copy of the UC with the telco and read it very carefully. Like most UCs, it contained small print stating that any degradation in availability beyond a set amount entitled the University to a refund, applied to the next month's bill.
Of course, the issue was how to document these outages in a manner acceptable to the telco. He decided to collect WAN interface utilization metrics from the routers connected to the problem circuits.
Over time he gathered outage information with enough granularity to see exactly (to within a few seconds) when the circuit went down and when it came back up. He still opened a ticket each time, as the UC required, but he monitored the circuit himself.
He then produced a report showing all the outages in the month, each tied to the corresponding telco trouble ticket reference number. He gathered this information regularly and built a database from it. This was what ITIL would describe as an Availability Management Database (AMDB).
Finally, he created an invoice and sent the telco a bill, asking that the amount owed to the University be credited against the next month's bill.
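The post does not say which tools the director used, so here is only a minimal Python sketch of the idea, assuming the router is polled at a fixed interval and each poll is reduced to a hypothetical `(timestamp, circuit_is_up)` pair (derived, for example, from interface status or from traffic counters that stop moving). The `find_outages` function and the sample format are illustrative, not taken from the original setup. The precision of each outage window is bounded by the polling interval, which is why polling every few seconds gives "to within a few seconds" accuracy.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical sample format: one (timestamp, circuit_is_up) pair per poll.
Sample = Tuple[datetime, bool]
Outage = Tuple[datetime, datetime]


def find_outages(samples: List[Sample]) -> List[Outage]:
    """Collapse runs of consecutive 'down' samples into (start, end) windows.

    start = timestamp of the first down sample in a run
    end   = timestamp of the first up sample after that run
    The real transitions happened within one polling interval of these
    boundaries, so accuracy depends on how often the router is polled.
    """
    outages: List[Outage] = []
    down_since: Optional[datetime] = None
    for ts, is_up in sorted(samples):
        if not is_up and down_since is None:
            down_since = ts                   # circuit just went down
        elif is_up and down_since is not None:
            outages.append((down_since, ts))  # circuit came back up
            down_since = None
    # An outage still open at the end of the data is not reported here;
    # a real report would flag it for follow-up.
    return outages


# Usage: polls every 10 seconds, circuit down for three consecutive polls.
t0 = datetime(2024, 3, 1, 8, 0, 0)
poll = timedelta(seconds=10)
states = [True, True, False, False, False, True, True]
samples = [(t0 + i * poll, up) for i, up in enumerate(states)]
for start, end in find_outages(samples):
    print(start, "->", end, "duration:", end - start)
# 2024-03-01 08:00:20 -> 2024-03-01 08:00:50 duration: 0:00:30
```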
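A database like this does not need to be elaborate. As a sketch, assuming each AMDB row records the circuit, the measured outage window, and the carrier's ticket reference (the `OutageRecord` fields, circuit names, and ticket numbers below are hypothetical), monthly availability can be computed with the usual ITIL formula: availability % = (agreed service time − downtime) / agreed service time × 100.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class OutageRecord:
    """One hypothetical AMDB row: a measured outage tied to a carrier ticket."""
    circuit: str
    start: datetime
    end: datetime
    ticket_ref: str  # carrier trouble ticket reference (illustrative)

    @property
    def downtime(self) -> timedelta:
        return self.end - self.start


def availability_pct(records: List[OutageRecord], circuit: str,
                     agreed_service_time: timedelta) -> float:
    """ITIL-style availability: (AST - downtime) / AST * 100."""
    downtime = sum((r.downtime for r in records if r.circuit == circuit),
                   timedelta())
    return (agreed_service_time - downtime) / agreed_service_time * 100


# Usage: assumes 24x7 agreed service time over a 30-day month.
month = timedelta(days=30)
records = [
    OutageRecord("campus-north", datetime(2024, 3, 4, 9, 0),
                 datetime(2024, 3, 4, 15, 0), "TT-1001"),
    OutageRecord("campus-north", datetime(2024, 3, 18, 22, 0),
                 datetime(2024, 3, 19, 4, 0), "TT-1002"),
]
print(f"{availability_pct(records, 'campus-north', month):.2f}%")  # 98.33%
```

Twelve hours of downtime in a 720-hour month already drops availability below 98.5%, which is why documenting every outage, with its ticket number, carries so much weight with the carrier.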
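The post does not give the UC's actual thresholds or refund schedule, so the following is only a sketch of how such a credit might be computed under an assumed rule: if measured availability falls below a contracted target, the carrier credits a fixed fraction of that month's charge. The target, charge, and credit rate in the example are hypothetical placeholders, and `Decimal` is used so money amounts round predictably.

```python
from decimal import Decimal, ROUND_HALF_UP


def service_credit(availability_pct: Decimal, target_pct: Decimal,
                   monthly_charge: Decimal, credit_rate: Decimal) -> Decimal:
    """Hypothetical credit rule, not taken from the University's UC.

    If measured availability is below the contracted target, credit
    credit_rate * monthly_charge; otherwise credit nothing. A real UC
    defines its own thresholds and (often tiered) credit schedule.
    """
    if availability_pct >= target_pct:
        return Decimal("0.00")
    credit = monthly_charge * credit_rate
    # Round to cents for the invoice line.
    return credit.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Usage: 98.33% measured against an assumed 99.5% target,
# an assumed 2,400.00 monthly charge, and an assumed 10% credit.
credit = service_credit(Decimal("98.33"), Decimal("99.5"),
                        Decimal("2400.00"), Decimal("0.10"))
print(credit)  # 240.00
```

Keeping the credit rule in one small function makes it easy to swap in the real clause once you have read your own UC as carefully as the director read his.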
As you might expect, the telco was not happy with his bill. After all, telcos bill users, not the other way round. It didn't take long for the telco to decide they had to resolve this issue once and for all, and that is exactly what they did. Amazingly, they were able to resolve the University's issues. And until the problem was resolved, the University did indeed get credits off its future bills.
When the telco, now driven to identify the root cause, actually resolved the issue, the quality of service to the many campuses greatly improved. All users were much happier, as was my client.
The solution to this real-life client's problem was not sexy, but it was quite valuable. And yes, they could have done all this without the ITIL. I agree that this was “simple common sense,” but isn't that all the ITIL really presents?
Sometimes it takes help to see the way forward clearly. The difference is between knowing and doing. Having an AMDB or even a CMDB (knowledge) is not the goal. The goal is taking action based on that knowledge (doing).
The ITIL presents what you should do, and offers many angles from which to appreciate the job at hand. However, ITIL does nothing for you or to you. It is how you, the practitioner, apply the techniques described in the ITIL that makes the difference. In this example, the benefit came from using the ITIL model as a reference to understand and talk about the issue, and then creating an AMDB with the express purpose of solving a problem.
This is the sort of focused, pragmatic, results-oriented approach any organization can take, even without a formal ITIL program. So don't get caught up in all the hype, and don't get intimidated. You can do it yourself!