Low-Stakes and High-Stakes Metrics

Metrics, OKRs, and KPIs are marketed with appeals to the desire to know: “don’t you want to know what’s going on?”, “how will you know where your problem is without being able to measure it?”, “once you measure things you’ll be able to see what you need to change!”

Generally speaking, these are very good arguments. Large companies everywhere have used KPIs, OKRs, and other measurement frameworks to (mostly) successfully get a grasp on systems that are much, much too large for a single person to grapple with, and to steer them in a mostly-sensible direction. They enable mechanical processing of data, all while surfacing unexpected anomalies that we might otherwise gloss over as repeated “oddities”.

On the other side of the bill of measurement, we have Amazon warehouse workers being put under such intense demands1 that the company risks running out of people willing to work for it. The chief tool Amazon uses in pursuit of maximum efficiency is the measurement of everything: optimising out every bit of slack, and with it, demanding that its employees run at full throttle at all times as well.

These two stories are about the same tool set, yet generally speaking we want the former and not the latter2. So, what gives? Why does a tool like this cut both ways, and cut both ways this hard?

I believe it’s best explained by looking at the stakes attached to metrics, and more generally the culture around what these numbers represent.

When metrics are low-stakes, when they stay inside the team, they are beneficial. They are instituted, monitored, and acted upon by the same people who are subject to them. This is the Diagnostic, or Improvement, Paradigm.

On the other side, where the stakes for the metrics are high, there is the Accountability Paradigm. Here, measures and metrics are not necessarily for improvement or for finding issues; they are for making sure that people do what they are supposed to. The fundamental purpose of capturing the numbers3 is accountability, and “transparency”.

To use the concrete example of Amazon, diagnostic metrics would mean warehouse workers monitoring measures like packing mistakes or wrong paths taken, and using them to create better workflows, signage, and mistake-proofing. They would be able to see that a certain box is often misused, and do something to promote better use of said boxes. Accountability metrics are, well, what produces the news stories you see written about Amazon warehouse workers.

Accountability-based metrics can have devastating effects. For one particularly brutal example, look at the American high-school system, specifically No Child Left Behind. Here, funding was tied to grades, teacher pay was tied to grades, and the predictable happened: the schools that were needed most ended up being closed for being “ineffective”, because they did not have the results of those in more affluent neighborhoods.

In general, accountability metrics will suffer from Goodhart’s Law4 and Campbell’s Law5. Accountability metrics will be gamed as hard as possible, and everything inside, and even outside, the bounds of the legal will be employed to game them once people’s careers and livelihoods are tied to them.

It is incredibly tempting to take measures created for improving the situation and tie them to assessment. However, in doing so, whatever made them useful in the first place will be lost, and so will any goodwill attached. Sometimes it’s a price worth paying, but deciding when that is the case is not a call I would like to have to make.

  1. Yes, the “peeing in bottles” thing is very well known at this point, but the warehouse workers are physically pushed very hard, too. 

  2. I mean, Amazon evidently thinks this is the right thing to do. 

  3. Numbers are tricky in this regard: they pretend to be neutral and objective, but really they are a condensation of a lot of assumptions, process, and judgement, and with that they require more expertise to interpret correctly, not, as one would assume, less. 

  4. “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” 

  5. “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”