Excess deaths, deficient maths

A primer on India's death registration system & statistics. This helps make better sense of (mostly flawed) excess-death and undercount-factor articles. First essay of series on this topic.

Executive summary:

My first essay (of a new series on covid death reporting) lays out a primer on India’s death registration statistics, to provide a historical grounding to any state-level analysis. This is sorely lacking in articles published so far. I highlight trend growth of 3% a year, higher in some states, that makes it erroneous to blindly ape Western methods where analysts compare covid-period to prior periods without trend adjustment. I also highlight variability in death data and how uptick-years are surprisingly common, making it important to carefully study a deviant datapoint before jumping to conclusions. I outline how I’ll operationalize this framework, starting with state-wise case studies and gradually moving to comprehensive analysis of first-wave followed by second-wave. I close with a sampler of analyses to come, using a Bihar example.

Long-form essay:

Excess deaths are in the news. Sensationally so. “40x excess deaths in Undercount Pradesh”. Any ginormous number headlined without context is meant to make us click, not think. But, the more fundamental problem isn’t the number. It’s the term itself. Getting to a better number, especially for all of India, means starting with a better definition.

‘Excess deaths’ is misleading. Both words. First, excess compared to what is ill-defined. If it is compared to a prior trend, then that trend is rarely shown. Methodologies are copied from elsewhere without incorporating contextual differences. When large deviations from trend are the norm, distinguishing excess from noise is non-trivial. Second, we can only count registrations, not deaths. The two are different, and trend differently. This has implications for estimation and extrapolation.

Excess deaths are not what we originally set out to find. It was “how many people died of covid in India”. In any country, this is surprisingly difficult to answer. It is extra hard in India, given systemic limitations. Answering this question requires an estimate of above-trend death-registrations attributable to covid. Since this is a headline-writer’s nightmare, ‘excess deaths’ became convenient shorthand that took a life of its own. Important qualifiers such as trend, registrations and attribution got omitted.

While a precise estimate of appropriately-attributed above-trend deaths is not possible, a better approach is. That’s what I’ll attempt, starting with this essay. As I have discussed earlier, sound judgment starts with a frame of reference that is grounded in history. This addresses the ‘what’ in ‘excess compared to what’. So, here’s a primer on India’s death registration system and the data it throws out.

We count registrations, not deaths

In India, Civil Registration System (CRS) captures death registrations. Pan-India CRS statistics were last published for 2019 (report came out a few days back). 7.6 million deaths were registered in 2019, out of an estimated 8.3 million deaths. Note that actual deaths are always an estimate, based on survey-based inputs, while registered deaths are a precise summation of local level records across India. CRS-2019 estimates that 92% of deaths were registered, up from 69% in 2009. Registration percentage varies widely across states, from 52% in Bihar to over 200% in Delhi.

As implausible 200% suggests, CRS isn’t flawless. Delhi likely logs deaths pertaining to neighbouring states. CRS claims that number of deaths (not registrations) in India peaked in 2013 and has been declining since. However, many states with near-100% death registration in 2013 saw continued growth in registered deaths at above average pace over 2013-19 (e.g. Andhra Pradesh, Delhi, Gujarat, Haryana, Karnataka, Tamil Nadu). Go figure!

My limited take-away from CRS is to solely analyse registered-death data, without fussing about true death numbers or registration percentages. It is hard enough separating signal from noise with formally tabulated registrations. It is futile to attempt the same using estimated parameters. In Western countries, this distinction is moot as systems have evolved to a near 1:1 match between deaths and registrations.

India’s trend growth matters.

Take a look at two charts:

India’s registered deaths (in red) show a consistent increasing trend, with 3% cagr over last three decades. UK (representative of Western countries) showed -0.2% cagr over same period. Even during an increasing phase, 10-year cagr didn’t exceed 0.8%. Western methodology for calculating ‘excess deaths’ simply compares covid-period deaths to prior years (typically 2015-19) without any trend adjustment. This makes sense when trend is zero-growth. This does not make sense when trend growth is 3%. As we’ll see in next section, many states show trend growth well above 4%. Incorporating trend-adjustment eliminates 100% of ‘excess deaths’ from many superficial analyses.
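
To see why the missing trend adjustment matters, here is a minimal sketch on a made-up series. It assumes registrations grow at exactly 3% a year with no pandemic at all, and compares a plain 2015-19 average against a trend-projected baseline. The starting level of 100 and the variable names are illustrative assumptions, not CRS data.

```python
# Illustrative only: a synthetic series growing at exactly 3% a year.
# These are NOT actual CRS figures.
GROWTH = 0.03
BASE_2015 = 100.0  # hypothetical registered deaths in 2015 (arbitrary units)

# Registered deaths for 2015..2020, driven by trend alone (no pandemic).
series = {
    year: BASE_2015 * (1 + GROWTH) ** (year - 2015)
    for year in range(2015, 2021)
}

# Naive baseline: plain average of the five prior years (2015-19).
naive_baseline = sum(series[y] for y in range(2015, 2020)) / 5

# Trend-adjusted baseline: project the last pre-covid year forward at trend.
trend_baseline = series[2019] * (1 + GROWTH)

naive_excess_pct = (series[2020] / naive_baseline - 1) * 100
trend_excess_pct = (series[2020] / trend_baseline - 1) * 100

print(f"naive 'excess' vs 2015-19 average: {naive_excess_pct:.1f}%")
print(f"excess vs trend-adjusted baseline: {abs(trend_excess_pct):.1f}%")
```

On this synthetic series, the naive comparison reports roughly 9% 'excess' in a year where nothing happened, simply because the five-year average sits about three years of growth behind 2020. The trend-adjusted comparison reports none.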

Variability also matters, and complicates.

Now for the trickier part. Pan-India registered deaths data is unavailable for covid-impacted 2020 and 2021. So, we have to make do with whatever state-level data enterprising journalists have scraped. This has to be framed against state-level historic data, which is way more messy than national aggregates. What follows is 10-year state-wise registration data from CRS-2018 and CRS-2019 (for top 19 states, accounting for 96% of registered deaths). I show raw data, annual growth over entire period and year-wise growth rate in the next two tables.

Treat the first table as reference data for subsequent analysis. Let’s only discuss the second. For a particularly volatile example, look at Bihar. 9 of 10 years show massive double-digit growth or decline in registered deaths. You’d be forgiven for thinking that covid hit Bihar in 2019. How does one discern a covid-uptick out of this erratic pattern? (Did the recent media report on Bihar excess-deaths even mention this relevant history?) 14 of 19 states had at least one outlier uptick. India, as a whole, showed near-double-digit growth across 2018 and 2019. In aggregate, 33 of 182 datapoints (18% of the time, shaded red) show double-digit growth over previous year. That’s a lot of cleverly hidden pandemic years, long before the damn bug escaped its lab! More seriously, this is an artefact of an evolving death registration system, not a tragedy every 5-6 years. But when a pandemic occurs, how do we separate the two?
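
The uptick count above is simple to reproduce on any state's series. Here is a small sketch, assuming a 10% cut-off for 'double-digit growth'. The state series is invented for illustration and the function names are my own; only the 33-of-182 figure comes from the tables.

```python
def yoy_growth(counts):
    """Year-over-year growth rates (as fractions) for a list of annual counts."""
    return [curr / prev - 1 for prev, curr in zip(counts, counts[1:])]


def count_upticks(counts, threshold=0.10):
    """Number of years whose growth over the previous year is at least `threshold`."""
    return sum(1 for g in yoy_growth(counts) if g >= threshold)


# Hypothetical annual registered deaths (in thousands). NOT actual CRS figures.
made_up_state = [200, 230, 215, 260, 250, 255, 300, 280, 290, 340]

growth = yoy_growth(made_up_state)
upticks = count_upticks(made_up_state)
print(f"{upticks} of {len(growth)} year-on-year changes are double-digit upticks")

# The aggregate figure quoted in the text: 33 of 182 datapoints.
reported_share = 33 / 182
print(f"share of uptick datapoints: {reported_share:.0%}")
print(f"implied frequency: one uptick every {1 / reported_share:.1f} years")
```

The last two lines only restate the essay's aggregate: 33 of 182 is about 18%, or roughly one double-digit uptick every five to six years.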

Making sense of covid-period deaths requires careful framing against historical context

Why assign so much space to a backstory? Because context and framing matter. Incremental information is meaningless unless appropriately viewed against a historical trend. Expected/trend deaths for 2020/2021 is the baseline against which excess is measured. When trend is unclear or volatility is high, this baseline becomes fuzzy. Above-trend deaths is a crude range, not one definite number. If this falls within historic range of fluctuations, we simply don’t know what to make of it.
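
One way to make 'a crude range, not one definite number' concrete is sketched below. This is only one possible construction and an assumption on my part: it fits a single CAGR line through the history, projects it forward, and widens the projection by how far past years strayed from that line. The series is invented, and the function names are hypothetical.

```python
def cagr(counts):
    """Compound annual growth rate between the first and last value."""
    years = len(counts) - 1
    return (counts[-1] / counts[0]) ** (1 / years) - 1


def baseline_range(counts, years_ahead):
    """Return (low, central, high) expected registrations `years_ahead` years out.

    central: last observed year projected forward at the historic CAGR.
    low/high: central scaled by the smallest/largest historic ratio of
    actual registrations to the smooth CAGR path.
    """
    g = cagr(counts)
    ratios = [c / (counts[0] * (1 + g) ** i) for i, c in enumerate(counts)]
    central = counts[-1] * (1 + g) ** years_ahead
    return central * min(ratios), central, central * max(ratios)


def classify(observed, counts, years_ahead):
    """Say whether an observed figure stands out from historic fluctuations."""
    low, _, high = baseline_range(counts, years_ahead)
    if observed > high:
        return "above historic range"
    if observed < low:
        return "below historic range"
    return "within historic range"


# Hypothetical annual registered deaths (in thousands). NOT actual CRS figures.
history = [100, 112, 104, 120, 115, 134]

low, central, high = baseline_range(history, years_ahead=2)
print(f"baseline two years out: {low:.0f} to {high:.0f} (central {central:.0f})")
print(classify(155, history, years_ahead=2))
print(classify(175, history, years_ahead=2))
```

The point of the sketch is the shape of the answer: an observation above the central projection can still sit inside the band of past fluctuations, in which case calling it 'excess' is a judgment call rather than a finding.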

There are other dimensions to be incorporated. Not all excess deaths are due to covid, since other deaths are elevated due to healthcare deprivation. Often, this is downplayed. Disingenuous references are made about road accidents being down (at 2% of deaths, effect is marginal compared to >80% of deaths being impacted by crunched hospitals). In Western countries where better cause-of-death data is available, studies attribute ~70% of excess deaths to covid, not 100%. This is rarely incorporated into sensational ‘undercount factors’. Similarly, internals of state-level data can hold relevant information. This could be district-wise or age-wise colour. Idiosyncratic factors help refine analysis or gauge data reliability. In second-wave, where timeframes are shorter, lead time for registering deaths (CRS prescribes 21 days, reality is longer) is a factor in deciding choice of time-period to zoom in on.

In summary, it’s complicated. It’s not about click-n-drag. It’s not about a model spitting out a pan-India number. This requires careful, often subjective, incorporation of history into analyses, one state at a time. Extrapolation from parts to whole requires caution and humility. Reliably estimating above-trend deaths is way harder than shallow ‘massive undercount’ stories imply.

On a related note, I have a simple rule that works in my day-job. Good companies show history, comprehensively and transparently. Flip through investor-relations website of a respectable business and you’ll find last five years’ balance sheet, cash flow and return on capital in big font in investor presentation. Take a dodgy outfit and you’ll have to kidnap promoter’s first-born child to get that information (not a recommendation). Hiding relevant history is a red flag. It’s the same with good analysis. An analyst (even a seemingly credentialled one) who does not show relevant history, and fails to explain how it impacts conclusions, reveals more about himself than about the topic at hand.

Whither from hither

This essay lays out the problem being attempted, relevant context and challenges. It hints at a more historically-grounded approach that can minimize flaws seen in typical ‘excess death’ stories. Next step is to implement this approach, state by state, wave by wave, across all available death registration data. While I will attempt this, it will not be all at once. I’ll start by analysing a few states in detail, to better illustrate the approach as well as to log specific problems that will become apparent only as we go deep (e.g. data inconsistencies, confusing trends). Only examples can convey how messy this world is. I may poke into a few cities/districts, where data gets even more volatile. Then, I’ll see what we can make of first-wave if we aggregate all available regional analyses. While second-wave data is still coming in, we can gingerly attempt a similar exercise in July. Somewhere along the way, I’ll also update my life insurer claims analysis, for independent triangulation.

I cannot guarantee reliability of results or how soon I’ll do all this. However, I can guarantee that I will show you as much historical data as is available in each case. When I start from a media report on a particular state, I will only use its raw data, not someone else’s assumptions. I’ll be transparent about my own assumptions or subjective judgment calls. In each case, you should be able to separate data from my biases, so that you can make different inferences as required.

Unlike those confident folks, I doubt if I’ll reach a grand India-undercount estimate for many weeks. Maybe longer. But, that’s not the point of this exercise. This isn’t about an answer. It’s about a way of thinking. About numeracy, history, complexity, uncertainty, imprecision. About humility in jumping to conclusions, when buggy humans analyse a messy world. Once you appreciate these, you’ll get to a better answer than what’s out there, with or without me.

PS. As a hint of how this works, let’s close with a quick look at Bihar. Recent article ran with sensational headline of 75,000 excess deaths over Jan-May 2021. While article didn’t explicitly table month-wise numbers, it implied ~35,000 deaths/month over Jan-April and ~70,000 deaths in May. CRS places Bihar’s monthly deaths at 30,000 in 2019 and trend growth at 10% a year (we’ll use average & ignore variation, since this is a shallow-dive). This suggests that Jan-April 2021 deaths were in line with trend and Bihar saw ~35,000 above-trend deaths in May. Incorporating relevant history has already cut excess deaths by 55%. Say, we attribute ~70% of above-trend deaths to covid. Against ~8000 reported covid deaths (including retrospectively added 4000), this equates to 3x undercount in a state with India’s weakest death registration system. (Any second-wave analysis has to be qualified with: disease, death registrations, covid death reporting are all ongoing, making inferences conditional on incoming data).
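
The arithmetic behind this shallow-dive can be laid out step by step. All inputs below are the rounded figures quoted in the paragraph above. The one assumption I am adding is that the 10% trend compounds for two years, from the 2019 level to 2021. Variable names are illustrative.

```python
# Inputs: rounded figures from the Bihar paragraph above.
MONTHLY_DEATHS_2019 = 30_000    # CRS monthly registered deaths, 2019
TREND_GROWTH = 0.10             # trend growth per year
HEADLINE_EXCESS = 75_000        # headline excess deaths, Jan-May 2021
JAN_APR_2021_MONTHLY = 35_000   # implied deaths per month, Jan-April 2021
MAY_2021_DEATHS = 70_000        # implied deaths in May 2021
COVID_SHARE = 0.70              # share of above-trend deaths attributed to covid
REPORTED_COVID_DEATHS = 8_000   # reported covid deaths, incl. retrospective additions

# Assumption: compound the 2019 level forward two years to get a 2021 trend.
trend_monthly_2021 = MONTHLY_DEATHS_2019 * (1 + TREND_GROWTH) ** 2

# Jan-April: slightly below trend, i.e. no above-trend deaths.
jan_apr_gap = JAN_APR_2021_MONTHLY - trend_monthly_2021

# May: the only month with a clear above-trend count.
may_above_trend = MAY_2021_DEATHS - trend_monthly_2021

# How much of the headline number disappears once trend is incorporated.
reduction = 1 - may_above_trend / HEADLINE_EXCESS

# Attribute a share of above-trend deaths to covid, then compare to reported.
covid_attributed = COVID_SHARE * may_above_trend
undercount_factor = covid_attributed / REPORTED_COVID_DEATHS

print(f"2021 trend deaths per month: {trend_monthly_2021:,.0f}")
print(f"Jan-April gap vs trend, per month: {jan_apr_gap:,.0f}")
print(f"May above-trend deaths: {may_above_trend:,.0f}")
print(f"cut versus headline excess: {reduction:.0%}")
print(f"undercount factor: {undercount_factor:.1f}x")
```

With unrounded intermediate steps, May comes out near 34,000 above trend rather than ~35,000, and the undercount factor lands a shade under 3x. Change any input and the conclusion moves with it, which is the reason for showing the workings.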

Whether you agree with my assumptions or trend calculation is the less important part. The more important one is that you have a decade of history, against which you can make your own informed inference about 2021 covid reporting in Bihar. That’s a step up from just seeing a 75,000 or 10x headline, or worse, a sensationalist pan-India extrapolation from a flawed starting point.