On a recent project, I worked with a principal engineer. He’s a very prolific contributor and involved in several projects. Most of his contributions are related to risk management.
That made me think about the other senior-most engineers and what they work on, leading me to conclude that the amount of time you spend managing risk is directly proportional to seniority. And at the top, it’s nearly all a CTO does.
And, of course, the higher you go, the more existential the risks become.
A senior engineer might evaluate the risk of using a new library – a principal engineer might think about what happens if new legislation passes.
It made me reflect on how involved I should be in identifying potential risks in my domain and how to do that effectively.
Time spent managing risk
There isn’t a fixed amount of time you should spend on identifying and mitigating risk.
Instead, it’s about continuously analyzing the things you’re working on and trying to understand where there might be bottlenecks, dead-ends, or other issues.
Rather than reviewing a finished design and pointing out where issues might arise down the line, do it at every step of the way.
At what level?
It’s difficult to tell how senior you need to be before you should consciously start evaluating risk.
It seems to happen progressively throughout your career. As you become more senior, you start to think about the bigger picture and the impact of your work, and part of that is what might go wrong down the line.
At some point, you will not be able to progress further without being able to gauge risk accurately.
Besides, you also need a level of credibility before you can mobilize people to help you mitigate risk. It’s much easier for a principal engineer to get a week of dev time than it is for a senior engineer.
Risk is invisible
Accurately gauging risk is invisible work. If you identify, estimate and tackle it in time, nothing happens. The company gets through it, and feels no adverse effects.
If you aren’t very senior, that invisibility might be a net negative for your political capital: you spent time and have nothing visible to show for it. And, if you do it more than once, you might become the boy who cried risk.
But, at the senior-most level, it’s a huge part of your job. Underestimating risk means you’ve made a significant error in judgment. Overestimating risk means you’ve wasted time and money. Your effectiveness is measured by how well you can do this.
In general, risk is classified as likelihood × severity.
Severity is relatively easy to estimate – it’s a bit more tangible. If a library breaks we’ll have to spend x hours replacing it.
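To make the likelihood × severity formula concrete, here is a minimal sketch that treats severity as hours of work and ranks risks by the product. The `Risk` class, the risk names, and every number are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: float      # probability in [0, 1] over an agreed period
    severity_hours: float  # estimated cost in hours if the risk materializes

    def exposure(self) -> float:
        """Expected cost in hours: likelihood × severity."""
        return self.likelihood * self.severity_hours


# Hypothetical numbers, for illustration only.
risks = [
    Risk("library is abandoned", likelihood=0.10, severity_hours=80),
    Risk("new legislation passes", likelihood=0.02, severity_hours=2000),
]

# Highest exposure first: an unlikely but severe risk can outrank a likely, cheap one.
ranked = sorted(risks, key=lambda r: r.exposure(), reverse=True)
for r in ranked:
    print(f"{r.name}: {r.exposure():.1f} expected hours")
```

The ranking is only as good as the likelihood numbers that go into it, which is the hard part.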
But how do you accurately estimate likelihood?
One piece of advice I’ve received is to ask several other people for their estimates and then aggregate them.
That would work, especially when coupled with frequently re-visiting the estimates, which is a good idea anyway.
Then, you also know whether the likelihood might be increasing or decreasing – a good indicator of whether you should be spending more or less time on it.
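One way to sketch this: collect a likelihood estimate from each person, aggregate them, and compare the result with the previous round. Using the median as the aggregate and a fixed tolerance for "no real change" are my assumptions, not an established method, and the numbers are made up.

```python
from statistics import median


def aggregate(estimates):
    """Combine individual likelihood estimates; the median resists outliers."""
    return median(estimates)


def trend(history, tolerance=0.02):
    """Compare the two most recent aggregated estimates."""
    if len(history) < 2:
        return "unknown"
    delta = history[-1] - history[-2]
    if delta > tolerance:
        return "increasing"
    if delta < -tolerance:
        return "decreasing"
    return "stable"


# Hypothetical estimates from four colleagues, collected in two review rounds.
rounds = [
    [0.05, 0.10, 0.10, 0.20],  # first round
    [0.10, 0.15, 0.20, 0.30],  # second round, a few months later
]

history = [aggregate(r) for r in rounds]
print(history, trend(history))
```

An "increasing" result is the cue to spend more time on that risk; "decreasing" suggests the time may be better spent elsewhere.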
Another thing to do is to estimate the likelihood on different time-horizons. Some things might not be a problem now, but they might be in the future.
Similarly, you can ask: “How far in the future before the likelihood of this happening is 100%?”
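A rough way to reason about time horizons is to assume a constant, independent chance of the event each year. That is a strong simplification and my assumption, not something a real risk is guaranteed to follow. Under it, the likelihood never reaches exactly 100%, so the sketch below asks when it crosses a chosen threshold instead. The function names and the 10% annual figure are hypothetical.

```python
import math


def likelihood_within(annual_p, years):
    """Chance the event happens at least once within `years`,
    assuming the same independent probability every year."""
    return 1 - (1 - annual_p) ** years


def years_until(annual_p, threshold=0.95):
    """Years until the cumulative likelihood reaches `threshold`."""
    if not 0 < annual_p < 1 or not 0 < threshold < 1:
        raise ValueError("annual_p and threshold must be strictly between 0 and 1")
    return math.log(1 - threshold) / math.log(1 - annual_p)


# Hypothetical: a 10% chance per year that a dependency is abandoned.
for horizon in (1, 3, 5, 10):
    print(f"within {horizon} years: {likelihood_within(0.10, horizon):.0%}")

print(f"years until 95% likely: {years_until(0.10):.1f}")
```

Even a risk that looks small on a one-year horizon becomes close to a certainty if the horizon is long enough, which is a useful prompt for deciding when to act.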
I don’t have a conclusion. I’m still working on understanding this aspect of the job. I haven’t really found any authoritative sources on this topic that go beyond the likelihood × severity formula.
Feel free to send me any resources or advice you might have.