Goodhart's Law, the Cobra Effect, and Unintended Consequences
The dangers of relying too much on measurements and incentives to drive behavior
In today's world, data and statistics reign supreme. From tracking a company's sales performance to monitoring a student's academic progress, we are constantly bombarded with measurements that are supposed to inform our decisions. However, as the adage goes, “not everything that counts can be counted, and not everything that can be counted counts.” And more importantly, for anything that’s counted, if it really counts, someone is going to game it and completely misuse it. This is where Goodhart's Law—which points us to the dangers of relying too much on measurements and incentives—comes into play.
The content of this article is also available as a YouTube video or a podcast.
Goodhart's Law states that when a measure becomes a target, it ceases to be a good measure. When some activity is being measured numerically, and it is important enough that people want to increase their number by any means, then the number becomes less and less useful because people start focusing on just the number and not the underlying activity that is (or was) being measured.
The Cobra Effect
Let me take an example¹:
During the British Raj in India, Delhi had a problem of too many cobras. So the government announced a cash prize for anyone who killed a cobra and brought it into the office. They decided the way to reduce cobras was to count dead cobras and pay for them. That makes sense, right?
Now, the problem is that the dead cobras being brought into the office is just a measure of a more complex underlying concept: “reduction in venomous cobras in Delhi”. And initially, it worked great. People killed several cobras and got their cash. But soon, this measure became a target for people who wanted the cash prizes. So they started breeding cobras for the cash prizes.
When the government realized what was happening, they cancelled the incentive program. As soon as this happened, the people breeding cobras released them into the wild, since they were of no further use.
So a program intended to reduce cobras actually ended up increasing cobras. Because when a measure becomes a target, it ceases to be a good measure (and because the map is not the territory).
Goodhart’s Law Affects Everything Everywhere All at Once
The cobra example above is really old and probably apocryphal. But this kind of stuff happens all the time in real life with serious consequences. There is a Wikipedia page with a long list of examples. Here are some:
In building the first transcontinental railroad in the 1860s, the United States Congress agreed to pay the builders per mile of track laid. As a result, Thomas C. Durant of Union Pacific Railroad lengthened a section of the route forming a bow shape unnecessarily adding miles of track.
In 2002, British officials in Afghanistan offered Afghan poppy farmers $700 an acre in return for destroying their poppy crops. As you might be able to predict by now, more and more Afghan farmers started planting poppies to collect payouts from the cash-for-poppies program. Worse, some farmers harvested the sap before destroying the plants, and then sold the sap on the market.
The 20th-century paleontologist G. H. R. von Koenigswald used to pay Javanese locals for each fragment of hominin skull that they produced. He later discovered that the people had been breaking up whole skulls into smaller pieces to maximize their payments.
Between 1945 and 1960, the federal Canadian government paid orphanages 70 cents per day, per orphan, and paid psychiatric hospitals $2.25 per day, per patient. Allegedly, up to 20,000 orphaned children were falsely certified as mentally ill so that the province of Quebec could receive the larger payment.
A university with a low acceptance rate is considered a good university, because its admissions are so selective. Here’s what the University of Chicago did:
Goodhart’s Law doesn’t just affect humans! Here’s a dog from 1908:
Goodhart’s Law in Education and Work
One of the most common places to find Goodhart's Law is in the performance metrics used to evaluate employees. Performance metrics can be very helpful in identifying areas of improvement. But they often get twisted and misused.
There are lots and lots of examples, but I’ll just give a couple:
Imagine a company that wants to quickly reduce the number of bugs in their software product and gives an incentive to software programmers per bug fixed. You can see how that can go wrong pretty quickly.
In education, marks/grades/percentages are supposed to measure how well a student has learned the subject material. Then this is used for admissions to colleges and job offers and other such things. Soon, nobody cares about the education and everyone starts focusing on increasing their marks by any means possible.
This leads to “grade inflation” which is a problem worldwide. Here’s the issue in India:
In CBSE exams, a 95 per cent aggregate is 21 times as prevalent today as it was in 2004. And this is true across all boards, not just CBSE. Now if just one board stops grade inflation, it will be a problem for that board’s students, because they’ll get lower marks than students in other boards, and in India, that is usually a very bad thing. In 2017, CBSE decided to fix this problem and called a meeting of all 40 school boards in India to urge them to discontinue “artificial spiking of marks”. And CBSE announced that they would lead by example. But whatever they did only had a small effect. Almost 6.5 per cent of mathematics examinees in 2017 scored 95 or more — 10 times higher than in 2004 — and almost 6 per cent of physics examinees scored 95 or more, 35 times more than in 2004. (Source and references.)
But Metrics Are Necessary
But this doesn’t mean that metrics are useless and must be discarded entirely. In fact, metrics are necessary, because the alternative is to use our gut instinct, and our gut instincts are often wrong. Humans are prone to so many biases that without metrics too many mistakes are made. So what is to be done?
First, it is important to keep in mind that metrics are just a tool, and should never be used as the sole basis for decision-making. People, with their judgment and experience, should always have the final say. (This was famously demonstrated in the Vietnam War, where the Secretary of Defense at the time, Robert McNamara, tried to run the war purely based on data. The result was a catastrophic failure that cost countless lives. This problem has its own name now: the McNamara Fallacy.)
Another technique is to use paired indicators. The idea is to have two metrics that balance each other out. If you try to game one metric you’ll do badly on the other metric. For example, a credit card company that wants to reduce fraud could measure both the number of fraudulent transactions and customer satisfaction. This way, they ensure that they are achieving their goal without compromising the quality of their customer service.
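The paired-indicators idea can be sketched in a few lines of code. This is a minimal, hypothetical illustration (the function name, thresholds, and metric definitions are all made up for the example, not taken from any real fraud system): instead of judging a team on the fraud number alone, we require both indicators to be healthy, so gaming one shows up in the other.

```python
# Minimal sketch of paired indicators: evaluate a fraud-reduction effort on
# two metrics that balance each other out. All names and thresholds here are
# hypothetical illustrations, not a real scoring system.

def evaluate_team(fraud_rate: float, satisfaction: float) -> str:
    """fraud_rate: fraction of transactions that were fraudulent (lower is better).
    satisfaction: average customer survey score out of 5 (higher is better)."""
    if fraud_rate <= 0.01 and satisfaction >= 4.0:
        return "healthy"      # both indicators good: the goal is likely being met
    if fraud_rate <= 0.01:
        return "suspect"      # fraud looks great but customers are unhappy,
                              # e.g. legitimate transactions are being blocked
    return "needs work"       # the fraud target itself is not being met

print(evaluate_team(0.005, 4.5))   # both metrics good
print(evaluate_team(0.005, 3.2))   # fraud metric possibly being gamed
print(evaluate_team(0.030, 4.6))   # fraud metric genuinely bad
```

The key design choice is that no single number can be optimized in isolation: driving fraud to zero by rejecting every transaction would tank the satisfaction score and surface as “suspect” rather than “healthy”.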
Further Reading
On Metrics and KPIs: How to use metrics well.
Beware What You Measure: The Principle of Pairing Indicators
The Four Flavors of Goodhart's Law: Understanding different situations in which you might face Goodhart’s law (i.e. your metrics being gamed for different reasons by different types of people) and what you can do about those
Goodhart's Law Isn't as Useful as You Might Think: A long, detailed article pointing out both the problems with using metrics and better ways to use them.
I’m not entirely sure if this story is 100% true, but it is a great story, and I always believe that a story must be told as it should have occurred, not as it actually did. It is a great and memorable illustration of the key point I’m trying to make. And things like these happen in real life, for real, all the time.