Many people know that Component Failure Impact Analysis (CFIA) is somehow related to Problem and Availability Management, yet for most it remains a fuzzy concept at best.
While CFIA sounds impressive, it is really just a way of evaluating (and predicting) the impact of component failures and locating Single Points of Failure (SPOFs). CFIA can:
- Identify Configuration Items (CIs) that can cause an outage
- Locate CIs that have no backup
- Evaluate the risk of failure for each CI
- Justify future investments
- Assist in CMDB creation and maintenance
All it takes to gain these benefits is an Excel spreadsheet or some graph paper. Following are the three steps to success with Component Failure Impact Analysis.
- Select an IT Service and obtain the list of CIs upon which it depends, ideally from Configuration Management. If there is no formal Configuration Management Database (CMDB), ask around the IT department for documentation, paper diagrams and general knowledge.
- Using a spreadsheet or graph paper, list the CIs down one column and the IT Service(s) across the top row. Then, for each CI, under each service:
- Mark “X” in the column if a CI failure causes an outage
- Mark “A” when the CI has an immediate backup (“hot-start”)
- Mark “B” when the CI has an intermediate backup (“warm-start”)
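If you prefer to keep the matrix in a script rather than on graph paper, the sketch below shows one possible representation: a dictionary of CIs (rows), each mapping every IT Service (column) to its mark. The CI and service names are hypothetical, and treating an empty cell as "this CI does not affect this service" is an assumption, since the legend above only defines "X", "A" and "B". The CSV output can be opened directly in a spreadsheet.

```python
import csv
import io

# Hypothetical IT Services (columns); replace with your own.
SERVICES = ["Email", "Payroll"]

# Marks from the legend. "" is assumed to mean "no impact on this service".
VALID_MARKS = {"X", "A", "B", ""}

# Hypothetical CIs (rows), each mapping every service to its mark.
cfia_matrix = {
    "Core switch": {"Email": "X", "Payroll": "X"},
    "Mail server": {"Email": "B", "Payroll": ""},
    "Payroll DB": {"Email": "", "Payroll": "X"},
    "Web cluster": {"Email": "A", "Payroll": "A"},
}


def validate(matrix, services):
    """Raise ValueError if any cell is missing or holds an unknown mark."""
    for ci, row in matrix.items():
        for service in services:
            mark = row.get(service)  # None if the cell was never filled in
            if mark not in VALID_MARKS:
                raise ValueError(f"{ci!r}/{service!r}: invalid mark {mark!r}")


def to_csv(matrix, services):
    """Render the matrix as CSV text: CIs down the first column, services across the top."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["CI"] + services)
    for ci, row in matrix.items():
        writer.writerow([ci] + [row[s] for s in services])
    return buffer.getvalue()


validate(cfia_matrix, SERVICES)
print(to_csv(cfia_matrix, SERVICES))
```

Validating before exporting catches blank or mistyped cells early, which matters once several people are filling in the matrix.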
You now have a basic CFIA matrix! Every “X” and “B” is a potential liability. The final step is to develop Requests for Change (RFCs) that address them:
- Examine the “X”s first, then the “B”s, asking the following questions:
- Is this CI a SPOF?
- What is the Business/Customer impact of this CI failing? How many Users would be impacted? What would be the cost to the Business?
- What is the probability of failure? Is there anything we can do differently to avoid this impact?
- Are there design changes that could prevent this impact? Should we propose redundancy or some form of resiliency? What would redundancy cost?
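The review order above (all the “X”s, then all the “B”s) can be sketched as a simple triage queue. This is a minimal sketch, not a prescribed method: the matrix and the failure probabilities and outage costs are hypothetical placeholders for the answers you would gather when asking the questions above, and ranking each group by expected annual loss (probability × cost) is an assumption chosen for illustration. CIs marked “X” are treated as candidate SPOFs, to be confirmed during the review.

```python
# Hypothetical CFIA matrix: CI -> {service: mark}; "" assumed to mean no impact.
cfia_matrix = {
    "Core switch": {"Email": "X", "Payroll": "X"},
    "Mail server": {"Email": "B", "Payroll": ""},
    "Payroll DB": {"Email": "", "Payroll": "X"},
    "Web cluster": {"Email": "A", "Payroll": "A"},
}

# Hypothetical estimates per CI: (chance of failure in a year, Business cost of one outage).
estimates = {
    "Core switch": (0.05, 200_000),
    "Mail server": (0.20, 10_000),
    "Payroll DB": (0.10, 50_000),
    "Web cluster": (0.15, 5_000),
}


def candidate_spofs(matrix):
    """CIs whose failure alone causes an outage of at least one service."""
    return sorted(ci for ci, row in matrix.items() if "X" in row.values())


def expected_annual_loss(ci, estimates):
    """Illustrative risk score: probability of failure times cost per outage."""
    probability, cost = estimates[ci]
    return probability * cost


def review_queue(matrix, estimates):
    """All X's first, then all B's; within each group, highest expected loss first."""
    queue = []
    for mark in ("X", "B"):
        group = []
        for ci, row in matrix.items():
            impacted = sorted(s for s, m in row.items() if m == mark)
            if impacted:
                group.append((ci, mark, impacted, expected_annual_loss(ci, estimates)))
        group.sort(key=lambda item: item[3], reverse=True)
        queue.extend(group)
    return queue


print("Candidate SPOFs:", ", ".join(candidate_spofs(cfia_matrix)))
for ci, mark, impacted, loss in review_queue(cfia_matrix, estimates):
    print(f"{mark}  {ci}: impacts {', '.join(impacted)}; est. annual loss {loss:,.0f}")
```

Each line of the queue is a natural starting point for an RFC; the estimated loss also gives you a figure to weigh against what redundancy would cost.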
As you get good at CFIA, consider expanding your CFIA matrix to include, as a row across the bottom, the procedure used to recover from each CI failure. (Of course, this requires that your organization is mature enough to have written procedures!) Adding documented response procedures to your CFIA matrix lets you examine the organization as well as the infrastructure. Ask yourself:
- How do we respond when this CI fails?
- What procedures do we follow? Are these procedures documented? Could they be improved? Could they be automated?
- Can we improve the procedure through staff training? New tools or techniques?
- Could preventative maintenance have helped avoid this problem?
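The procedure questions can also be checked mechanically once recovery procedures are recorded alongside the matrix. In this sketch, procedures are kept in a separate table keyed by CI rather than as a literal bottom row; that layout, the `RecoveryProcedure` record and all names and values are hypothetical. The script flags every CI marked “X” or “B” whose recovery procedure is missing, undocumented, or documented but still manual.

```python
from dataclasses import dataclass


@dataclass
class RecoveryProcedure:
    """Hypothetical record of how the organization recovers from a CI failure."""
    name: str
    documented: bool
    automated: bool


# Hypothetical CFIA matrix: CI -> {service: mark}; "" assumed to mean no impact.
cfia_matrix = {
    "Core switch": {"Email": "X", "Payroll": "X"},
    "Mail server": {"Email": "B", "Payroll": ""},
    "Payroll DB": {"Email": "", "Payroll": "X"},
    "Web cluster": {"Email": "A", "Payroll": "A"},
}

# Hypothetical procedures; note that "Payroll DB" has none recorded.
procedures = {
    "Core switch": RecoveryProcedure("Swap in spare switch", documented=True, automated=False),
    "Mail server": RecoveryProcedure("Restore mail server from backup", documented=False, automated=False),
    "Web cluster": RecoveryProcedure("Cluster failover", documented=True, automated=True),
}


def procedure_gaps(matrix, procedures):
    """Return {ci: finding} for CIs marked X or B whose recovery could be improved."""
    gaps = {}
    for ci, row in matrix.items():
        if not {"X", "B"} & set(row.values()):
            continue  # CIs with only A marks (or no impact) are skipped
        proc = procedures.get(ci)
        if proc is None:
            gaps[ci] = "no recovery procedure recorded"
        elif not proc.documented:
            gaps[ci] = "procedure not documented"
        elif not proc.automated:
            gaps[ci] = "documented but manual; candidate for automation"
    return gaps


for ci, finding in procedure_gaps(cfia_matrix, procedures).items():
    print(f"{ci}: {finding}")
```

Questions about training, tools and preventative maintenance still need human judgment; the script only points you at where to ask them first.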
Sound CFIA at any level (infrastructure, organization or both) produces RFCs that deliver real improvements to the Business without requiring high process maturity or expensive supporting software. CFIA also has IT-centric benefits: it gives IT Service Continuity Management a head start; it aids Configuration Management, which benefits from adding recovery procedures to the CMDB; and it supports Problem and Incident Management, whose staff may follow these procedures. All in all, another win-win, achieved without expensive tools, new dedicated resources or complex systems.