Perhaps I treated you too harshly, GDP
The much-maligned statistic is actually fairly good at measuring well-being
In my previous post (more than half a year ago, wow!) I outlined why GDP isn’t an ideal measure of a country’s well-being. It ignores leisure time, unpaid housework, and non-economic components of well-being, while counting ‘regrettables’ like defense spending and advertising. I also advocated for replacing it with other measures, like the ‘Measure of Economic Welfare’ designed by Nordhaus and Tobin (1973). However, two recent studies have somewhat changed my view on this issue: Martinez (2022) and Delhey & Kroll (2013).
Countries that never sleep
Martinez’ study, titled “How Much Should We Trust the Dictator’s GDP Growth Estimates?”, uses satellite imagery to offer rock-solid confirmation of a suspicion many economists have held for a long time: autocracies lie about economic growth.
He compares growth in GDP with growth in the amount of light visible from space at night in 88 countries between 1992 and 2013, which results in the striking graph seen below: autocracies report more GDP growth than democracies, given the same increase in night lights.[1]
Night lights are fairly well-established as a very strong correlate of true economic growth (Henderson et al. 2012). The mechanism is simple: when the economy is larger, there is more activity at night and more money to pay to light it. Car headlights, streetlights, light inside homes and offices shining out through windows … anyone who’s into astronomy can confirm that rich and dense countries have a lot more light pollution than poor and sparsely-populated ones.
The key to the study is another fact about night lights: they’re very hard to manipulate. It isn’t like GDP, where a dictator can fudge the numbers a bit here and a bit there; to fudge this data (which is collected by independent agencies), you’d have to actually go out and create fake lights. Which, aside from looking very odd to the local populace (“Mommy, why are the government men installing lights on top of our house?”), is also rather costly in terms of both money and energy. Much cheaper to just not let your people see the website where the data is shown.
(Now, if I were a dictator, I probably would mess up the numbers in the other direction, by demanding that all lights be turned off for an hour every night so I can see the goddamn stars for once. Democracies seem not to have the co-ordination capacity to do that properly even on Earth Day.)
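To make the core comparison concrete, here’s a minimal sketch in Python. It is not Martinez’ actual specification (his regressions are considerably more elaborate); it just fits a straight line of reported GDP growth on night-light growth, separately for two groups of country-years. All the data are simulated, and the 0.3 elasticity and 30% exaggeration are numbers I made up for illustration.

```python
import random

def ols_slope(x, y):
    """Slope from a one-variable least-squares fit of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def simulate_group(n, exaggeration, rng):
    """Made-up country-years: night-light growth and GDP growth as reported."""
    lights = [rng.gauss(0.03, 0.05) for _ in range(n)]
    # Assumed "true" relationship: GDP grows 0.3% per 1% growth in lights.
    true_gdp = [0.3 * g + rng.gauss(0, 0.005) for g in lights]
    # The government scales the true figure up before reporting it.
    reported = [(1 + exaggeration) * g for g in true_gdp]
    return lights, reported

rng = random.Random(42)
demo_lights, demo_gdp = simulate_group(500, 0.0, rng)  # honest reporting
auto_lights, auto_gdp = simulate_group(500, 0.3, rng)  # hypothetical 30% inflation

demo_slope = ols_slope(demo_lights, demo_gdp)
auto_slope = ols_slope(auto_lights, auto_gdp)
ratio = auto_slope / demo_slope
print(f"democracies: {demo_slope:.3f}  autocracies: {auto_slope:.3f}  ratio: {ratio:.2f}")
```

The point of the sketch: if both groups turn lights into GDP at the same true rate, a steeper slope in one group is what inflated reporting would look like in the data.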
There are a number of potential objections to this study, but I think Martinez handles them all with commendable rigor, using “an exhaustive set of robustness checks”.[2] These fall into two kinds: sensitivity checks and econometric controls.
Sensitivity checks are a way of testing how much a given assumption or metric matters for the result of your study: just change it to something else, and see if the results still hold!
For example, one potential issue is that the measure of democracy used might be somehow biased towards some countries over others. So Martinez replicates his analysis, which originally used Freedom House’s index, with three other democracy indexes, finding the same results each time. Further, he tries nightlight data from different sources, several ways of calculating nightlight density, GDP from separate World Bank calculations, different groups of countries, … you get the point, this man is thorough.
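A sensitivity check of this kind can be sketched as a simple loop: run the same estimate several times, swapping out only the democracy index. The sketch below is my own simplification on simulated data. The three index names are hypothetical stand-ins, and splitting countries at a cutoff is an assumption of mine, not how the paper does it.

```python
import random

def ols_slope(x, y):
    """Slope from a one-variable least-squares fit of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def elasticity_gap(rows, index_name, cutoff=0.5):
    """Autocracy slope minus democracy slope, using one index to split the sample."""
    groups = {True: [], False: []}
    for row in rows:
        groups[row[index_name] > cutoff].append(row)
    slopes = {
        is_auto: ols_slope([r["lights"] for r in grp], [r["gdp"] for r in grp])
        for is_auto, grp in groups.items()
    }
    return slopes[True] - slopes[False]

rng = random.Random(0)
index_names = ("index_a", "index_b", "index_c")  # hypothetical democracy indexes
rows = []
for _ in range(2000):
    autocratic = rng.random() < 0.5
    lights = rng.gauss(0.03, 0.05)
    # Made-up data: autocracies report 30% more growth per unit of light.
    reported = 0.3 * lights * (1.3 if autocratic else 1.0) + rng.gauss(0, 0.005)
    row = {"lights": lights, "gdp": reported}
    for name in index_names:
        # Each index is a noisy reading of the same underlying regime type.
        row[name] = (1.0 if autocratic else 0.0) + rng.gauss(0, 0.2)
    rows.append(row)

gaps = {name: elasticity_gap(rows, name) for name in index_names}
for name, gap in gaps.items():
    print(f"{name}: extra elasticity in autocracies = {gap:.3f}")
```

If the gap only showed up under one of the indexes, that would be a red flag; a result that survives every swap is much harder to blame on the choice of index.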
Econometric controls are a way to check whether (and how much) a variable is confounding your analysis. Confounding variables in this case could be that autocracies have worse statistical collection and are therefore accidentally miscounting, or that autocracies have smaller informal economies because of higher state capacity, or that they have different economic structures (e.g. more government spending, less private consumption) which could lead to higher GDP-to-nightlights ratios.
Martinez controls for every potential confounding variable I could think of and a bunch more. I usually warn against trusting studies that have “controlled for confounders” - there are always more confounders hiding under the bed, waiting to grab you when you aren’t paying attention - but if anyone has done it, he has.
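For readers who haven’t seen what “controlling for” a variable does mechanically, here’s a minimal sketch on simulated data. It uses the standard partialling-out trick (strip the control’s influence out of both variables, then regress what’s left), which gives the same coefficient as putting the control in the regression directly. The variable names and every number in it are assumptions I made up; none of this is Martinez’ data or code.

```python
import random

def fit(x, y):
    """Intercept and slope from a one-variable least-squares fit of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals(x, y):
    """What is left of y after removing its linear relationship with x."""
    intercept, slope = fit(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

rng = random.Random(7)
n = 2000
autocracy = [rng.random() for _ in range(n)]  # 0 = free, 1 = not free
# Hypothetical confounder: autocracies have a larger state share of the economy...
state_share = [0.5 * a + rng.gauss(0, 0.1) for a in autocracy]
# ...which by itself raises the GDP-to-lights gap, on top of a true effect of 0.02.
gap = [0.02 * a + 0.05 * s + rng.gauss(0, 0.01)
       for a, s in zip(autocracy, state_share)]

_, naive = fit(autocracy, gap)
_, controlled = fit(residuals(state_share, autocracy),
                    residuals(state_share, gap))
print(f"without control: {naive:.3f}  with control: {controlled:.3f}")
```

In this toy setup the uncontrolled estimate roughly doubles the true effect, because it credits autocracy with what the state share is doing. The catch, of course, is that you can only control for confounders you thought of and measured.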
There are also some other signs indicating that this result is correct - for example, autocracies that are poorer than the International Development Association’s GDP/capita threshold for foreign aid pretty much don’t exaggerate their GDP numbers at all, which is exactly what you would expect from governments that want the foreign aid to keep flowing in. (Incidentally, this also screws up calculations of how much foreign aid increases GDP.)
Wait a second …
So a very good study has decisively shown that GDP can be, and is being, systematically manipulated by governments. I thought this study was supposed to increase my trust in GDP?
Fear not, dear reader, this is no elementary mistake of mine. Take a look at this graph from The Economist’s very short article on this study:
You see those beautiful yellow dots on the left? That’s the free world. Notice how almost none of them are overestimating their GDP growth at all. That means that, in free/democratic countries, GDP statistics are accurately reflecting real economic activity levels. You may recall that two of the big problems with GDP are that it doesn’t include household labor and that it’s hard to estimate the size of the informal economy. This data implies that in free countries, there is not much difference in the amount of household labor (or that it has a negligible influence on the amount of night lights), and that the estimates of the size of the informal economy (which is included in GDP) are pretty much correct.
In conclusion, this is decent evidence that when governments aren’t lying (i.e. in free countries), we can trust GDP to accurately reflect economic activity. In authoritarian countries, on the other hand, we need to look at less fudge-able measures like night lights, exports, imports, and business surveys, and triangulate economic activity from those. But what good is economic activity if it’s not actually increasing well-being? This is where our second study comes in.
Three-Letter Acronyms Abound
In 2013, two German researchers from the University of Bremen somehow managed to be the first to research a question so mind-numbingly obvious it should’ve been answered years ago: how much do different national well-being measures correlate with happiness?
Perhaps I am being unfair to the scientific community. Maybe there were methodological issues, or a lack of data, or something else that prevented this study from being written earlier. But to me, this was so clearly the first question to ask that upon learning of the existence of the ‘happiness economics’ research field, I immediately looked for studies like this and was absolutely shocked to find only one (if anyone knows of the existence of others, please inform me).
I don’t wish to belittle what these German researchers (Jan Delhey and Christian Kroll) have done. In fact, I wish to thank them for their invaluable contribution to the field. I wish to belittle all the other social scientists in this field for not having done more big, comprehensive studies into this matter in the last decade. Come on, guys! We could have solved the Easterlin Paradox by now! What gives?
Anyhow, with my standard uncharitable short rant about the state of social science out of the way, let’s get into what this study is actually about.
Delhey and Kroll take note of the recent rise in popularity of national well-being measures which are not GDP, and are curious to see whether they are actually, you know, measuring national well-being better than GDP does. To do so, they take data on ‘life contentment’[3], ‘life satisfaction’[4], and ‘life happiness’[5] from the Gallup World Poll and European Values Survey, which they then merge into a single measure they call “Subjective Well-Being” (SWB). (I generally approve of this approach, with the usual caveats as outlined in my post on technical problems in happiness measurement.) For each measure of well-being, the correlation with SWB is then calculated. Fairly simple stuff.
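In code, the whole “happiness test” fits in a few lines. Here’s a minimal sketch, assuming the three survey items are merged by standardizing each one and averaging them. That merging rule is my assumption (I haven’t checked exactly how the authors combine them), and the five countries and all their numbers are made up.

```python
import statistics

def z_scores(values):
    """Rescale to mean 0 and standard deviation 1 so items on different scales can be averaged."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical national averages for five made-up countries.
ladder = [5.5, 6.0, 6.8, 7.2, 7.6]        # 0-10 ladder question
satisfaction = [6.0, 6.5, 7.0, 7.4, 8.0]  # 1-10 satisfaction question
happiness = [2.8, 3.0, 3.1, 3.3, 3.4]     # 4-point happiness question

# SWB: the average of the three standardized items, country by country.
items = [z_scores(ladder), z_scores(satisfaction), z_scores(happiness)]
swb = [statistics.fmean(country) for country in zip(*items)]

# Candidate well-being measures to put through the test (also made up).
measures = {
    "gdp_per_capita": [18, 25, 38, 45, 60],
    "some_other_index": [0.70, 0.85, 0.80, 0.90, 0.88],
}
correlations = {name: pearson(values, swb) for name, values in measures.items()}
for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}")
```

Standardizing first matters because the three questions use different scales; without it, the 0-10 items would swamp the 4-point one in the average.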
(All of the countries studied here are part of the OECD, which is a club of mostly-rich countries, from Mexico and Turkey to the US and Norway. There tends to be little interest in the economic activity - well-being relationship in poor countries, because basically everyone agrees that it’s obviously a huge positive correlation.)
This approach rests on two (fairly trivial) assumptions:
1. SWB surveys actually measure some large and significant facet of well-being, such that it makes sense to check ‘correlation with happiness surveys’ to find out how valid other well-being measures are.
2. There are some facets of well-being left unobserved with (current) surveys; otherwise why bother with all these complex measures at all? You could just survey your population once a year and not have to do all that other tough research work.[6] You either want a metric you can use alongside surveys (with each capturing different elements), or one which is straight-up better than surveys, observing everything the surveys observe and then some.
So we should expect a useful national well-being measure to correlate fairly strongly with SWB surveys, but not almost perfectly, because then you might as well use just one of the two.
I realize I’ve talked a lot about ‘national well-being measures’ without really giving you any examples; let me remedy that. Our German friends writing this study have chosen seven measures to compare:
GDP: the grandfather of economic metrics. Measures total economic activity in a country. Widely critiqued for all sorts of reasons, but quite objective (as we saw in the previous study).
HDI: the main competitor. The Human Development Index combines GDP with measures of education and health.[7] In poor countries the HDI often diverges from GDP, but in the rich world they’re basically the same except for a few weird educational policy differences.
I-HDI is just the HDI, but adjusted for inequality. That’s inequality in all metrics, mind you, not just income: if some people in a country are very educated and others didn’t finish middle school, or some live 90 years and others 60 years, that pulls the country’s score down a lot too.
The Well-Being Index (WBI) is just the HDI but without counting GDP and calculated a bit differently. Moving on.
The Better Life Index is the OECD’s pet project and my personal favorite of these options. It’s an 11-part index counting things like health, safety, housing, income, environment, and work-life balance. This study chooses to weight all 11 equally, but they acknowledge that there’s no empirical justification for that. Check out their fun website, where you can fill in your own weights for how much you care about each element. [They also include life satisfaction, but Delhey & Kroll obviously excluded that data for the purposes of this study.]
The (Weighted) Index of Social Progress (WISP) is like the Better Life Index but, well, worse. It is a clusterfuck which has 41 indicators over nine domains: education, health status, women’s status, defense effort (-), economy, demography, environment, social chaos (-), and cultural diversity. The (-) indicates that it’s counted negatively. As you can tell from the name and the domains, this was not meant to be a pure well-being index, but rather an indicator of what some academics consider to be ‘social progress’. And that’s fine! But we shouldn’t expect it to perform well on a happiness test.
And finally, the Social Development Index (SDI) has as its components: “life expectancy, adult literacy rate, gross enrollment ratio, infant survival rate, supply of calories, proteins, and fat per day, respectively, telephone lines per 1,000 people, physicians per 100,000 people, as well as electricity consumption”. This is very obviously an index for measuring poor countries. Why are we using this for the OECD? Adult literacy rate is the exact same for all rich countries - 100%! The supply of calories? How about “too much”. Telephone lines? What is this, the 20th century? I expect this to have a high correlation with GDP because the only things on the list that really differ between rich countries (infant survival rate, electricity consumption) are highly correlated with GDP. I do not, however, find this a useful metric for our purposes.
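Since most of these indexes are just weighted averages of rescaled indicators, the weighting question raised for the Better Life Index is easy to play with. Here’s a minimal sketch with three made-up countries and three made-up indicators. Min-max rescaling is one common way to put indicators on a shared 0-1 scale; I’m assuming it here for illustration, not claiming it’s what any of the indexes above actually do.

```python
def min_max(values):
    """Rescale so the worst country scores 0 and the best scores 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def composite(indicators, weights=None):
    """Weighted average of rescaled indicators; equal weights by default."""
    names = list(indicators)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    scaled = {name: min_max(indicators[name]) for name in names}
    n_countries = len(scaled[names[0]])
    return [
        sum(weights[name] * scaled[name][i] for name in names) / total
        for i in range(n_countries)
    ]

# Three hypothetical countries (A, B, C); higher is better on every indicator.
indicators = {
    "income": [20, 30, 40],
    "health": [80, 70, 75],
    "work_life_balance": [5, 9, 7],
}

equal = composite(indicators)
income_heavy = composite(
    indicators, {"income": 2.0, "health": 1.0, "work_life_balance": 1.0}
)
print("equal weights:", [round(s, 2) for s in equal])
print("income counted double:", [round(s, 2) for s in income_heavy])
```

Country A drops from about 0.33 to 0.25 once income counts double, while B stays at 0.5 and C rises from about 0.67 to 0.75. The weights are doing real work, which is why “no empirical justification” for equal weighting is worth flagging.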
I’ve dragged this post on for long enough, so I’ll cut to the chase and give you the results:
The Better Life Index is a clear winner with 0.66 correlation, followed by GDP (GNI really, same difference) at 0.58, and then four other measures between 0.47 and 0.51. The (Weighted) Index of Social Progress lags, as expected.
These results should be surprising to the many, many social scientists who have claimed that GDP is obviously a terrible measure and that it’d be easy to make a better one. I know I was surprised. Clearly, GDP does fairly well at this new job it was never designed for, and many a researcher has tried and failed to create a better measure.
As the OECD’s Better Life Index proves, though, it can be done. I’d love to see a repeat of this study with more sophisticated measures: a properly-weighted Better Life Index, some Measure of Economic Welfare-type thing, an inequality-adjusted GDP, decent positive/negative affect studies, … If nobody does this in the next few years, I’ll claim it as my thesis topic!
To conclude:
I (and many other people) discounted GDP too quickly in the past based on theoretical arguments, without looking at the empirical evidence.
In free countries, government-reported GDP is a very accurate measure of economic activity, which is comparable across different countries.
In authoritarian countries, we can’t trust GDP numbers because dictators and oligarchs don’t mind fudging the data to exaggerate.
GDP is more strongly correlated with subjective well-being surveys than most other national metrics of well-being.
The Better Life Index proves that it’s possible to do better than GDP in terms of measuring welfare.
We need more basic correlational research on this topic.
Sources
Delhey, J., & Kroll, C. (2013). A “Happiness Test” for the New Measures of National Well-Being: How Much Better than GDP are They? Happiness Studies Book Series, 191–210. https://doi.org/10.1007/978-94-007-6609-9_14
Delhey, J., & Kroll, C. (2013). A Happy Nation? Opportunities and Challenges of Using Subjective Indicators in Policymaking. Social Indicators Research, 114(1), 13–28. https://doi.org/10.1007/s11205-013-0380-1
Martínez, L. R. (2022). How Much Should We Trust the Dictator’s GDP Growth Estimates? Journal of Political Economy, 130(10), 2731–2769. https://doi.org/10.1086/720458
Nordhaus, W., & Tobin, J. (1973). Is Growth Obsolete? NBER Chapters, 509–564. https://ideas.repec.org/h/nbr/nberch/3621.html
OECD Better Life Index. (n.d.). Retrieved November 2, 2022, from https://www.oecdbetterlifeindex.org/
The Economist. (2022, October 21). A study of lights at night suggests dictators lie about economic growth. The Economist. https://www.economist.com/graphic-detail/2022/09/29/a-study-of-lights-at-night-suggests-dictators-lie-about-economic-growth
[1] The circles and triangles on the graph each represent the average growth across all democracies, resp. autocracies, in a given year.
[2] Exhaustive? More like exhausting. It took me half an hour to get through like five pages of that stuff.
[3] This is an alternative name for the Cantril Ladder, which asks: “Please imagine a ladder with steps numbered from zero at the bottom to 10 at the top. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time?”
[4] Life satisfaction question: “All things considered, how satisfied are you with your life as a whole these days? Please use this card to help with your answer [1 dissatisfied (…) 10 satisfied].”
[5] Life happiness question: “Taking all together, how happy would you say you are: very happy, quite happy, not very happy, not at all happy?”
[6] I believe that with a lot of methodological adjustments, we probably can reach the point where surveys capture almost every part of well-being, at least on a population level.
[7] For health, the main indicator is life expectancy; for education, it’s years of schooling. Bryan Caplan would not approve.