By Hank Marquis
Many know that Component Failure Impact Analysis (CFIA) is somehow
related to Problem and Availability Management, yet for most it remains
a fuzzy concept at best. While CFIA sounds impressive, it is really just
a way of evaluating (and predicting) the impact of failures and locating
Single Points of Failure (SPOFs). CFIA can:
1. Identify Configuration Items (CIs) that can cause an outage
2. Locate CIs that have no backup
3. Evaluate the risk of failure for each CI
4. Justify future investments
5. Assist in CMDB creation and maintenance
All it takes to gain these benefits is an Excel spreadsheet or some
graph paper. Following are the three steps to success with Component
Failure Impact Analysis.
- Select an IT Service and get the list of CIs upon which it depends,
ideally from Configuration Management. If there is no formal
Configuration Management Database (CMDB), then ask around IT for
documentation, paper diagrams and general knowledge.
- Using a spreadsheet or graph paper, list the CIs in one column and the
IT Service(s) across the top row. Then, for each CI, under each service:
a.) Mark “X” in the column if a CI failure causes an outage
b.) Mark “A” when the CI has an immediate backup (“hot-start”)
c.) Mark “B” when the CI has an intermediate backup (“warm-start”)
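The marking scheme above can also be kept as a small data structure rather than on graph paper. The following is a minimal sketch in Python, not a prescribed CFIA tool. The service names, CI names, and the helpers `build_cfia_matrix` and `format_matrix` are all hypothetical and chosen only for illustration.

```python
# Marks used in the CFIA matrix, as described in the steps above.
OUTAGE = "X"       # CI failure causes an outage
HOT_BACKUP = "A"   # CI has an immediate ("hot-start") backup
WARM_BACKUP = "B"  # CI has an intermediate ("warm-start") backup


def build_cfia_matrix(cis, services, marks):
    """Return {ci: {service: mark}}; cells with no recorded mark stay blank.

    `marks` maps (ci, service) pairs to "X", "A" or "B".
    """
    valid = {OUTAGE, HOT_BACKUP, WARM_BACKUP}
    matrix = {ci: {svc: "" for svc in services} for ci in cis}
    for (ci, svc), mark in marks.items():
        if mark not in valid:
            raise ValueError(f"unknown mark {mark!r} for {ci}/{svc}")
        matrix[ci][svc] = mark
    return matrix


def format_matrix(matrix, services):
    """Render the matrix as text: CIs down the side, services across the top."""
    width = max(len(ci) for ci in matrix) + 2
    lines = ["CI".ljust(width) + "".join(s.ljust(10) for s in services)]
    for ci, row in matrix.items():
        cells = "".join((row[s] or "-").ljust(10) for s in services)
        lines.append(ci.ljust(width) + cells)
    return "\n".join(lines)


# Illustrative data only; real entries would come from the CMDB or from asking around IT.
services = ["Email", "Payroll"]
cis = ["Core switch", "Mail server", "DB cluster"]
marks = {
    ("Core switch", "Email"): "X",
    ("Core switch", "Payroll"): "X",
    ("Mail server", "Email"): "B",
    ("DB cluster", "Payroll"): "A",
}
matrix = build_cfia_matrix(cis, services, marks)
print(format_matrix(matrix, services))
```

A nested dictionary keyed by CI and then by service mirrors the rows and columns of the paper matrix, so it can be exported back to a spreadsheet with little effort.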
You now have a basic CFIA matrix! Every “X” and “B” is a potential
liability. The final step is to develop a Request for Change (RFC):
- Examine the “X’s” first, then the “B’s”, by asking the following
questions:
- Is this CI a SPOF?
- What is the Business/Customer
impact of this CI failing? How many Users would be impacted?
What would be the cost to the Business?
- What is the probability of
failure? Is there anything we can do differently to avoid this
impact?
- Are there design changes that
could prevent this impact? Should we propose redundancy or some
form of resiliency? What would redundancy cost?
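The review order described above (the “X’s” first, then the “B’s”) can be sketched as a simple sorted work list. This is an illustrative Python sketch under stated assumptions: the matrix contents and the `review_queue` helper are hypothetical, and the questions themselves still have to be answered by people, not code.

```python
# Hypothetical CFIA matrix: {ci: {service: mark}}, using the article's marks.
matrix = {
    "Core switch": {"Email": "X", "Payroll": "X"},
    "Mail server": {"Email": "B", "Payroll": ""},
    "DB cluster": {"Email": "", "Payroll": "A"},
}

# Examine the X's first, then the B's; "A" entries are not queued.
REVIEW_ORDER = {"X": 0, "B": 1}


def review_queue(matrix):
    """List (mark, ci, service) entries to examine, X's before B's."""
    entries = [
        (mark, ci, svc)
        for ci, row in matrix.items()
        for svc, mark in row.items()
        if mark in REVIEW_ORDER
    ]
    # sorted() is stable, so entries keep matrix order within each mark.
    return sorted(entries, key=lambda entry: REVIEW_ORDER[entry[0]])


for mark, ci, svc in review_queue(matrix):
    print(f"[{mark}] {ci} / {svc}: SPOF? impact? probability? design change?")
```

Each entry in the queue is a candidate for the questions listed above, and the answers feed the RFC.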
As you get good at CFIA, consider expanding your matrix to include the
procedure used to recover from a CI failure as a row across the bottom.
(Of course, this requires that you are mature enough to have written
procedures!) Adding documented response procedures to your CFIA matrix
lets you examine the organization as well as the infrastructure. Ask
yourself:
- How do we respond when this CI
fails?
- What procedures do we follow?
Are these procedures documented? Could they be improved? Could they be
automated?
- Can we improve the procedure
through staff training? New tools or techniques?
- Could preventative maintenance
have helped avoid this problem?
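One way to start on these questions is to record the recovery procedure next to each CI and list the gaps. The sketch below is a hedged Python illustration only: the procedure names, the matrix contents, and the `undocumented_cis` helper are hypothetical, and it checks just one of the questions above (whether a procedure is documented).

```python
# Hypothetical CFIA matrix: {ci: {service: mark}}, using the article's marks.
matrix = {
    "Core switch": {"Email": "X", "Payroll": "X"},
    "Mail server": {"Email": "B", "Payroll": ""},
    "DB cluster": {"Email": "", "Payroll": "A"},
}

# Hypothetical procedure record: CI -> documented recovery procedure, or None.
procedures = {
    "Core switch": "Switch failover runbook",
    "Mail server": None,
    "DB cluster": "Cluster restore procedure",
}


def undocumented_cis(matrix, procedures):
    """Return CIs marked X or B for any service that lack a documented procedure."""
    return [
        ci
        for ci, row in matrix.items()
        if any(mark in ("X", "B") for mark in row.values())
        and not procedures.get(ci)
    ]


print(undocumented_cis(matrix, procedures))
```

CIs returned by this check are natural candidates for a written procedure, staff training, or automation.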
Sound CFIA at any level (infrastructure, organization or both) produces
RFCs that can deliver real improvements to the Business without
requiring high process maturity or expensive supporting software. There
are some IT-centric benefits to CFIA as well: a head start on IT Service
Continuity Management; help for Configuration Management, which benefits
from the addition of recovery procedures to the CMDB; and support for
Problem and Incident Management, which may follow these procedures. All
in all, another win-win, without expensive tools, new dedicated
resources or complex systems.