Availability Management intimidates many new
practitioners, and they often leave it to last or skip it altogether. This
is too bad, because even without a formal ITIL program, Availability
Management can yield dramatic results, as one of my clients, a major
University, found out...
By Hank Marquis
Lots of people get
intimidated by the math and scope of Availability Management. It is too bad,
because you can get quite a few quick wins by applying some simple
Availability Management techniques.
I am going to
illustrate this by describing the story of a University that I worked with.
They used Availability Management to take back control of their
infrastructure.
The University had
issues of availability with one of their underpinning contracts (UC) with a
telecom provider. They kept losing service. They would open an incident,
which would invariably close as “no trouble found” or “came clear while
testing” in a few days. They were simply not able to get the attention of
the service provider.
Then they became
aware of availability management, took some action, and not only got their
issues resolved, but got some cash back in the process. Once again, actually
getting benefits from ITIL “ain't sexy” but “it sure is valuable.”
Based on my
personal experience, I describe below how a major University was able to
improve quality and reduce costs through focused application of the basic
ITIL concept of Availability Management.
The Issue
The University, which shall
remain nameless, was having issues with a carrier. The IT department at the
University supported several distributed campuses, all connected by wide
area network (WAN) circuits from the regional carrier (who shall also remain
nameless to protect the guilty!)
About once a week a WAN
circuit would fail and either disconnect the campus, or cause significant
delays on other circuits due to saturation as the traffic re-routed. The
result was significant negative impact on the campus, professors,
researchers, administrators, and students. Not a pretty picture.
Whenever this happened,
University IT would open a ticket with the carrier. After a couple of days
the circuit would “miraculously” recover, and the incident would close. A
follow-up with the carrier invariably yielded that classic telco response:
“we didn't do anything, but does it work now?” and the ticket would be
closed as NTF (No Trouble Found) or CWT (Came Clear While Testing).
Let me translate here. The
telco had an intermittent problem that they could not find, and when
they fiddled around with enough things the systems would recover on
their own. Since the telco never got to the root cause, the underlying
problem remained and the incidents kept recurring. This is the classic
“chronic” incident and problem.
The Solution
My customer, the director of the University IT organization,
decided to take action. He obtained a copy of the UC with the telco, which
he read very carefully. As in most UCs, there was small print stating that
any degradation in availability greater than a set amount would entitle
the University to a refund, to be applied to the next month's bill.
Of course, the issue was how to document these outages
in a manner acceptable to the telco. He decided to collect WAN interface
utilization metrics from the routers connected to the problem circuits.
Over time he gathered enough outage information with
enough granularity to be able to see exactly (to within a few seconds) when
the circuit went down, and when it came back up. He would open a ticket
as required in the UC, but he monitored the circuit himself.
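The monitoring step can be illustrated with a minimal sketch. It assumes each router poll has already been reduced to a timestamp and an up/down flag. The story does not say how the director derived circuit state from his utilization metrics, so the `find_outages` function, the 5-second polling interval, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta

def find_outages(samples):
    """Return (down_at, up_at) pairs from time-ordered (timestamp, is_up) polls.

    up_at is None when the circuit is still down at the last poll.
    """
    outages = []
    down_at = None
    for timestamp, is_up in samples:
        if not is_up and down_at is None:
            down_at = timestamp                   # first failed poll: outage begins
        elif is_up and down_at is not None:
            outages.append((down_at, timestamp))  # first good poll: outage ends
            down_at = None
    if down_at is not None:
        outages.append((down_at, None))           # still down at end of data
    return outages

# Hypothetical polls, 5 seconds apart; the fourth through sixth polls fail.
start = datetime(2024, 3, 4, 9, 0, 0)
flags = [True, True, True, False, False, False, True, True]
samples = [(start + timedelta(seconds=5 * i), up) for i, up in enumerate(flags)]

for down_at, up_at in find_outages(samples):
    print(down_at, "->", up_at, "duration:", up_at - down_at)
```

The precision of each outage boundary is limited by the polling interval, which is why polling every few seconds matters if the records are to stand up to the carrier's scrutiny.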
He then produced a report showing all the outages in
the month, which he associated with telco trouble ticket reference numbers.
He gathered this information regularly and created a database of
information. This was what ITIL would describe as an Availability Management
Database (AMDB).
Finally, he created an invoice and sent the telco a
bill, asking that the amount due the University be applied to his next
month's bill.
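The arithmetic behind such a bill is not complicated. The sketch below uses the standard availability calculation (agreed service time minus downtime, divided by agreed service time) on a few outage records keyed by trouble ticket number. The ticket numbers, the 99.5% threshold, the monthly fee, and the pro-rata refund rule are all assumptions made for illustration; the actual terms of the University's UC are not given here.

```python
from datetime import datetime

# Hypothetical outage records: (telco ticket number, time down, time up).
OUTAGES = [
    ("TT-1001", datetime(2024, 3, 4, 9, 0, 0), datetime(2024, 3, 6, 9, 0, 0)),
    ("TT-1002", datetime(2024, 3, 18, 14, 0, 0), datetime(2024, 3, 19, 14, 0, 0)),
]

def downtime_seconds(outages):
    """Total seconds of downtime across all recorded outages."""
    return sum((up - down).total_seconds() for _, down, up in outages)

def availability_pct(agreed_seconds, down_seconds):
    """Availability as a percentage of the agreed service time."""
    return 100.0 * (agreed_seconds - down_seconds) / agreed_seconds

def credit_due(availability, threshold_pct, monthly_fee, agreed_seconds, down_seconds):
    """Assumed rule: refund the fee pro rata for downtime, only below the threshold."""
    if availability >= threshold_pct:
        return 0.0
    return round(monthly_fee * down_seconds / agreed_seconds, 2)

agreed = 31 * 24 * 3600            # March, assuming 24x7 agreed service time
down = downtime_seconds(OUTAGES)   # three days in total
avail = availability_pct(agreed, down)
credit = credit_due(avail, 99.5, 3100.00, agreed, down)

for ticket, went_down, came_up in OUTAGES:
    print(ticket, went_down, came_up, came_up - went_down)
print(f"Availability: {avail:.2f}%  Credit requested: ${credit:.2f}")
```

Listing each outage against its ticket number is what ties the claim back to incidents the carrier has already acknowledged.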
The Results
As you might expect, the telco was not happy with his
bill. After all, telcos bill users, not the other way round. It didn't take
long for the telco to decide they had to resolve this
issue once and for all, and that is exactly what they did. Amazingly, they were
able to resolve the University's issues. And until they were
resolved, the University did indeed get credits off its future bills.
When the telco, now driven to identify the root cause,
actually resolved the issue, the quality of service to the many campuses
greatly improved. All users were much happier, as was my client.
The Moral
The solution to this real-life client's problems was not
sexy, but it was quite valuable. And yes, they could have done all this without
the ITIL. I agree that this was “simple common sense,” but isn't that all
the ITIL really presents?
Sometimes it takes help to see the way forward clearly.
The difference is between knowing and doing. Having an AMDB or
even a CMDB (knowledge) is not the goal. The goal is taking action based on
the knowledge (doing).
The ITIL presents what you should do, and offers many
angles to appreciate the job at hand. However, ITIL does nothing for you or
to you. It is how you, the practitioner, apply the techniques described
in the ITIL that makes a difference. In this example, the benefit came from using
the ITIL model as a reference to understand and talk about the issue, and
then creating an AMDB with the express purpose of solving a problem.
This is the sort of focused, pragmatic, and results-oriented
approach any organization can take, even without a formal ITIL
program. So don't get caught up in all the hype, and don't get intimidated.
You can do it yourself!