The safety precautions taken by workers on a power line visually demonstrate the clear thinking behind them. Can we improve the way we assess risk in the less visual IT world?
I woke up to unusual noises around my house today. When I went to look, I found that the high-voltage power lines nearby were being inspected. I took a moment to go out, take a picture, and chat with the guys who were inspecting the paint on the towers that support the wires. The worker at the top would shout numbers to the guy below, indicating paint measurements at different locations.
Clearly there had been a lot of safety thinking about this activity. Who wants to climb a 380-kilovolt power line? So many things could go wrong: electrocution, falling, getting stuck… obviously the work had been well planned and the safety thinking was plain to see: hard hats, a green flag on the tower, and lots of safety gear.
As an IT guy, I’m always impressed by these very visual activities. From a Kepner-Tregoe point of view, all the safety precautions demonstrate good risk assessment thinking. When assessing risks, the probability and seriousness of something going wrong are considered, and that thinking is evident in the safety actions taken.
In the less visual IT world, it’s not always so easy. What is the risk that a hardware or software upgrade will go wrong? The consequences if it all goes horribly wrong (which determine the seriousness) are often not easy to estimate in increasingly complex datacenter structures. And estimating the probability of things going wrong is complicated by the fact that many changes are one-of-a-kind. In the IT landscape, estimating the chances that a specific problem will occur is highly speculative. As a result, using the KT approach to analyzing risk, Potential Problem Analysis, can appear more difficult than it is.
What can we learn from the guys climbing multiple power-line towers every day?
The same risk assessment is used over and over again, each time the guy starts climbing the next tower. Of course, there will be some additional consideration for specifics that apply to the environment of each individual tower. For example, the next one has a mobile phone base station at the top, so the person climbing that one may consider not only the risk of electrocution and falling, but also the risk of radiation from the GSM/UMTS transmitters.
In the IT space we can do more of the risk assessments that we see being done in the world around us. An upgrade of an IT system may look very specific and one-of-a-kind, but on closer inspection, it may be the fourth upgrade we’ve been involved in this month. How did we do the previous upgrades, and how would our experience from previous activity help us do it more safely this time?
If you consider how often Potential Problem Analysis work can be reused, and the value it delivers each time, you have good justification for spending some serious time on it.
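To make the reuse idea concrete, here is a minimal sketch of a reusable risk checklist for upgrades. Everything in it is an assumption for illustration: the names (PotentialProblem, UPGRADE_BASELINE, assess), the example problems, and the low/medium/high scale multiplied into a score are not taken from Kepner-Tregoe material. The baseline plays the role of the checks repeated at every tower, and the extra item plays the role of the antenna on the next one.

```python
from dataclasses import dataclass

# Assumed rating scale for illustration; the scoring scheme is hypothetical.
RATING = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class PotentialProblem:
    description: str
    probability: str        # "low", "medium" or "high"
    seriousness: str        # "low", "medium" or "high"
    preventive_action: str

    def score(self) -> int:
        # Simple product of the two ratings, used only to rank problems.
        return RATING[self.probability] * RATING[self.seriousness]


# Reusable baseline: problems worth considering for every upgrade.
UPGRADE_BASELINE = [
    PotentialProblem("Upgrade fails midway", "medium", "high",
                     "Take a verified backup and rehearse the rollback"),
    PotentialProblem("Dependent service breaks", "low", "high",
                     "Check compatibility in a staging environment"),
    PotentialProblem("Upgrade overruns the maintenance window", "medium", "medium",
                     "Time a dry run beforehand"),
]


def assess(baseline, specifics=()):
    """Combine the reusable baseline with change-specific problems,
    ordered from highest to lowest score."""
    return sorted([*baseline, *specifics], key=lambda p: p.score(), reverse=True)


if __name__ == "__main__":
    # A consideration that applies only to this particular upgrade.
    extra = PotentialProblem("Storage firmware is also outdated", "high", "high",
                             "Upgrade the firmware first")
    for problem in assess(UPGRADE_BASELINE, [extra]):
        print(problem.score(), problem.description, "->", problem.preventive_action)
```

The point of the sketch is only that the baseline list is written once and carried from upgrade to upgrade, while each new change adds just its own specifics.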
At this point I can see the guys climbing the next tower with the GSM/UMTS antennas at the top. Surely a lot of good safety thinking is going on there. By the way, do you have any idea why my mobile phone isn’t working this morning?