Analysis of excess deaths during the first wave

CRS data across 12 states, which account for over 70% of India’s registered deaths, indicates that India did not witness above-trend deaths during the covid first wave

Enterprising journalists have done a commendable job of extracting data out of civil registration systems (CRS) across multiple states. This has shifted discourse from unreliable anecdotes to robust data, furthering our understanding of covid’s impact. While most of the analysis has focused on the second wave, that exercise is constrained by incomplete information. Less attention has been paid to how India fared during the first wave, using the same data. This article reviews all publicly available data pertaining to the first wave to gauge how India’s deaths trended during that phase.

2020 deaths seem in line with trend

For 2020, CRS death-registration data is available for twelve states that together accounted for 72% of India’s registered deaths in 2019. The following table shows growth in registered deaths in 2020 compared with historic growth over 2009-19. This allows the first wave to be viewed against both the long-run trend and the year-to-year variation around it, so that inferences are sound.

Before getting into the nuances, what’s the gist? Aggregate registered-death growth of 4% in 2020 across the twelve states is in line with the 2013-19 growth rate of 4% a year. If AP is excluded (since including it limits historic comparisons to 2013 onward), 2020 growth for the remaining eleven states is also in line with the 2009-19 trend growth of 3% a year. Both 2018 and 2019 witnessed higher growth in deaths than 2020.
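To make the comparison concrete, here is a minimal Python sketch of the two yardsticks used above: trend growth as a compound annual growth rate (CAGR) over a reference period, and the spread of year-to-year growth around it. It assumes that “trend growth” means CAGR, which the text does not state, and the yearly growth rates fed in are invented for illustration, not CRS figures.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two annual totals."""
    return (end / start) ** (1 / years) - 1


def yoy_growth(series: dict[int, float]) -> dict[int, float]:
    """Year-on-year growth for every year that has a preceding year in `series`."""
    years = sorted(series)
    return {cur: series[cur] / series[prev] - 1 for prev, cur in zip(years, years[1:])}


# HYPOTHETICAL yearly growth rates for one state, 2010..2020 (not CRS data).
assumed_growth = [0.02, 0.05, 0.01, 0.04, 0.03, 0.06, 0.02, 0.03, 0.05, 0.04, 0.04]

# Rebuild an index of registered deaths (2009 = 100) by compounding those rates.
deaths = {2009: 100.0}
for year, g in zip(range(2010, 2021), assumed_growth):
    deaths[year] = deaths[year - 1] * (1 + g)

growth = yoy_growth(deaths)
trend = cagr(deaths[2009], deaths[2019], 10)        # 2009-19 trend growth
historic = [growth[y] for y in range(2010, 2020)]   # year-to-year variation

print(f"trend {trend:.1%}, 2020 {growth[2020]:.1%}, "
      f"historic range {min(historic):.1%} to {max(historic):.1%}")
# 2020 is judged against both the trend and the spread of past years.
print("2020 exceeds fastest prior year:", growth[2020] > max(historic))
```

With these made-up inputs, 2020 growth sits a little above the 2009-19 CAGR yet below the fastest prior years, which is why the trend alone is not enough to call a year deviant.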

While we can come up with theories for outlier states, it’s worth noting that every year has 3 to 5 states showing high growth. For eleven of the twelve states, 2020’s increase is not the largest on record; the exception is Assam, where the 15% increase in 2020 is slightly higher than the 14% in 2019. However, before drawing any definitive conclusions, we should go through a few pitfalls pertaining to data and reasoning.

Data inconsistencies

Why did I show only growth rates and not the absolute number of deaths in 2020? For every state, the source for 2020 data also mentions numbers for 2018 or 2019. These simply do not match the numbers in the official CRS-2019 report. This is not about media unreliability, as official state government communication shows similar discrepancies. Kerala’s official note has 2015-19 data that is (slightly) different from the CRS-2019 report. Kerala’s 2020 figures also increased by 6% between the initial state government note and a recent media report (I used the higher number). Rajasthan mentions a figure for 2019 that is 12% lower than in CRS-2019. TN state government CRS data doesn’t tally with central CRS data for either 2018 or 2019, and TN state CRS shows an incredibly high 16% growth for pre-covid 2019. Media data for MP, TN, AP and Bihar do not match CRS data for 2018 or 2019. For UP, media-sourced data places 2019 deaths 18% lower than CRS-2019. Only Gujarat and Karnataka data seem to match for 2019. In an exercise where excess deaths are derived from small deviations from a noisy trend, such inconsistencies add to uncertainty and imprecision.

At a more granular level, the inconsistencies grow. Bihar had a spike in deaths only in December 2020, months after the first wave peaked. This differs from other states, which witnessed above-trend deaths somewhere between July and October. Assam witnessed 20% growth in January-March 2020 (pre-covid) and 13% growth in April-December (first wave), suggesting that part of the increase may simply reflect improvements in the registration system.

My best guess is that the discrepancies are due to delays in logging data, period-end reconciliations, elimination of duplicate entries, definitional issues (applied vs approved), etc. Taking 2019 data from one source (i.e. CRS-2019) and 2020 data from another (i.e. media) leads to comparisons that aren’t like-for-like. It would also make it possible for me to cheat by cherry-picking the data pair that best suits my biases. So, in each case, I have taken 2019 and 2020 data from the same source to calculate a like-for-like growth rate (as shown in the table above). Wherever there was ambiguity, I have erred on the side of a higher 2020 growth figure. To calculate the average across states, I have weighted each state’s growth rate by its CRS-2019 registered deaths.
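The like-for-like rule and the deaths-weighted average can be sketched in a few lines of Python. This is an illustration of the method as described, not the author’s actual spreadsheet: `StateDeaths`, `weighted_growth` and both states are hypothetical, and every number is invented. State A deliberately has a source whose 2019 figure sits below CRS-2019, to show how mixing sources distorts growth.

```python
from dataclasses import dataclass


@dataclass
class StateDeaths:
    name: str
    src_2019: float   # 2019 deaths as reported by the source of the 2020 figure
    src_2020: float   # 2020 deaths from that same source
    crs_2019: float   # 2019 registered deaths in the CRS-2019 report (used as weight)

    @property
    def like_for_like_growth(self) -> float:
        # Both years come from one source, so cross-source gaps cancel out.
        return self.src_2020 / self.src_2019 - 1

    @property
    def mixed_source_growth(self) -> float:
        # The pitfall: 2020 from the state/media source, 2019 from CRS-2019.
        return self.src_2020 / self.crs_2019 - 1


def weighted_growth(states: list[StateDeaths]) -> float:
    """Average like-for-like growth, weighted by CRS-2019 registered deaths."""
    total = sum(s.crs_2019 for s in states)
    return sum(s.like_for_like_growth * s.crs_2019 for s in states) / total


# Two HYPOTHETICAL states; the numbers are invented, not CRS or media data.
states = [
    StateDeaths("State A", src_2019=80_000, src_2020=84_000, crs_2019=100_000),
    StateDeaths("State B", src_2019=45_000, src_2020=45_900, crs_2019=50_000),
]

for s in states:
    print(f"{s.name}: like-for-like {s.like_for_like_growth:+.1%}, "
          f"mixed-source {s.mixed_source_growth:+.1%}")
print(f"Weighted average growth: {weighted_growth(states):.1%}")
```

Here State A grows 5% like-for-like but appears to shrink 16% if its 2020 figure is set against CRS-2019, which is exactly the kind of artefact the same-source rule avoids.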

Subjective inferences

“1,42,143 excess deaths in TN.” Such precise headlines have become commonplace, presented as if they were indisputable fact. Excess deaths are poorly defined and subjectively estimated. Excess deaths really mean above-trend deaths, and the trend is ambiguous: it depends on the reference period, and past trend need not be representative of future years. If most years deviate from trend, determining whether 2020 is deviant is dicey. Determining by how much is dicier.

In Assam, decadal growth is 5% a year. If anything above this is ‘excess’, Assam witnessed excess deaths in four of the six years preceding covid. This is evidently absurd, as excess cannot be the norm. Assam’s trend growth in death registrations is quite likely closer to 10%, given 9% growth over 2015-19 and 20% growth in early 2020. In TN and Bihar, what are we to make of 2019 growth being higher than 2020’s? If we attribute 2020’s excess deaths to covid, what caused the greater ‘excess’ in prior years? In every state with above-trend deaths, excess deaths are a subjective estimate, not ordained fact. Reasonable people can disagree here.
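How much the choice of trend matters can be shown with Assam’s own figures from this paragraph: 15% growth in 2020, judged against either the 5% decadal trend or a trend closer to 10%. The sketch below is a simplification that assumes excess equals observed deaths minus prior-year deaths grown at the assumed trend; `excess_deaths` is a hypothetical helper, and deaths are expressed as an index (2019 = 100) rather than actual counts.

```python
def excess_deaths(prev_year: float, this_year: float, trend: float) -> float:
    """Above-trend deaths: observed deaths minus prior-year deaths grown at `trend`."""
    expected = prev_year * (1 + trend)
    return this_year - expected


base_2019 = 100.0       # index: Assam's 2019 deaths = 100 (not a real count)
observed_2020 = 115.0   # the 15% increase in 2020 cited in the text

for trend in (0.05, 0.10):   # decadal trend vs the ~10% trend argued above
    excess = excess_deaths(base_2019, observed_2020, trend)
    print(f"assumed trend {trend:.0%}: {excess:.1f} excess per 100 deaths in 2019")
```

Same data, two defensible trends: the estimated excess roughly halves, from about 10 to about 5 per 100 deaths in 2019. Nothing in the data itself picks one answer.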

Why ramble about the unreliability of data and the ambiguity of method? It’s a statutory warning not to take headlines or precise large numbers seriously. Note that I am logging confounding issues pertaining to 2020, which is firmly in the rear-view mirror. More confident folks are making sensationally precise estimates for mid-2021, in the middle of an ongoing wave. Admitting that “it’s complicated” or discussing error bands interferes with eye-catching headlines. From what I have observed, headlined numbers can easily change by 50% with small (and not unreasonable) tweaks in approach.

As a general guideline, do not ignore input data presented in media reports. But do ignore the estimates (including mine). It’s best to use the data, in conjunction with historic trends, to make an independent assessment. I prefer my own mistakes and biases to others’.

What about remaining states?

State-wide 2020 data is unavailable outside these twelve states, though some city-level data is available. Hyderabad and Kolkata deaths grew 17% and 7% respectively over 2019. However, cities can be unrepresentative of their states. In 2020, Bangalore deaths grew 16% while Karnataka deaths grew a much lower 8%. Similarly, Chennai deaths grew 11% while TN’s grew 8%, and Mumbai’s grew 23% while Maharashtra’s grew 1%. Hyderabad’s historic growth of 10% a year over 2016-19 is much higher than Telangana’s 4%. Perhaps cities serving as medical hubs distorted the trends. Whatever the reason, it doesn’t seem prudent to extrapolate from city-level data.

There is no way to know precisely how the remaining 28% of India fared in 2020. There are only unreliable straws in the wind. Kolkata data hints that West Bengal’s deaths may not have been very elevated. States like Orissa and Jharkhand seem to have done better than most even in the second wave, making them unlikely candidates for excess deaths in the first wave. However, serious analysis cannot be based on such speculation.

For that, I can think of two approaches. First, 72% of India is closer to the universe than to a sample. While it may not be perfectly representative, it is our best available estimate of how all of India did. Second, use what-if scenarios for the remainder. If the remaining 28% of India witnessed deaths 5% above trend (i.e. 8-10% growth in deaths over 2019), that is over 100,000 excess deaths (5% of ~2 million). Since the twelve states were roughly on trend, that would still leave India-wide excess below the 150,000 covid deaths India reported over 2020, which is not indicative of an elevated undercount. The rest of India would have to do far worse for such claims to hold, something not backed by available first-wave evidence.
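The what-if arithmetic is simple enough to check directly. The sketch below takes the “~2 million” deaths in the remaining 28% and the 150,000 reported covid deaths from the text. It also assumes that “5% above trend” compounds multiplicatively on a 3-4% trend year, which is how the 8-10% growth range can be reproduced; both function names are hypothetical.

```python
def implied_growth(trend: float, above_trend: float) -> float:
    """Growth over the prior year if deaths run `above_trend` above a `trend` year."""
    return (1 + trend) * (1 + above_trend) - 1


def what_if_excess(baseline_deaths: float, above_trend: float) -> float:
    """Excess deaths if a region's deaths ran `above_trend` above its trend level."""
    return baseline_deaths * above_trend


remaining_deaths = 2_000_000   # "~2 million" deaths in the other 28% of India (text)
reported_covid_2020 = 150_000  # India's reported covid deaths over 2020 (text)

for trend in (0.03, 0.04):     # trend rates from the twelve-state comparison
    print(f"trend {trend:.0%} + 5% above trend -> "
          f"{implied_growth(trend, 0.05):.1%} growth over 2019")

excess = what_if_excess(remaining_deaths, 0.05)
print(f"scenario excess: {excess:,.0f} vs reported covid deaths: {reported_covid_2020:,}")
```

Swapping in other `above_trend` values shows how much worse the remaining states would need to be before scenario excess overtook reported covid deaths.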

Early 2021 was similar to 2020

While it is hard to put precise start and end dates on pandemic waves, the first wave did not end when 2020 ended; early 2021 marked its tail end. Month-wise 2021 death data is available for six states (TN, Karnataka, AP, Bihar, Kerala, MP). January-March monthly deaths across these six states are 6% above the 2019 monthly average, roughly consistent with trend growth over a two-year period. This does not suggest elevated excess deaths during the latter part of the first wave either.
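The “roughly consistent” check amounts to compounding the annual trend over the gap. The Python sketch below treats the gap between the 2019 average and January-March 2021 as two years, as the text does, and uses the 3-4% trend rates discussed earlier. It ignores any seasonal difference between January-March and the annual average, and `compounded_growth` is a hypothetical helper.

```python
def compounded_growth(annual_trend: float, years: float) -> float:
    """Cumulative growth implied by a constant annual trend over `years`."""
    return (1 + annual_trend) ** years - 1


observed = 0.06   # Jan-Mar 2021 monthly deaths vs 2019 monthly average, six states (text)

for trend in (0.03, 0.04):                   # trend rates discussed earlier
    expected = compounded_growth(trend, 2)   # two years, as the text approximates
    print(f"{trend:.0%}/yr over two years -> {expected:.1%}; observed {observed:.0%}")
```

A 3% trend alone implies about 6.1% growth over two years, so an observed 6% leaves no visible room for excess under either trend assumption.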

First-wave summary

All things considered, India’s all-cause mortality was roughly in line with the historic trend during the covid first wave. This is not suggestive of elevated excess deaths or a covid death undercount. As I have written earlier, this does not mean there was no undercount at all. Even Western countries, with better systems, estimate an undercount of at least 1.5x. Let’s just say that, outside of an inherently unavoidable undercount, India’s covid reporting seems reasonable for the first wave. At the least, excess death analysis doesn’t offer a path to finding an undercount. With India recording 160,000 covid deaths up to March 2021, the ‘true’ first-wave death toll could be somewhat higher, but not much higher. That is about as precise as we can get in our messy world.

Implications for second-wave analysis

The above analysis offers a template for analysing the second wave. The idea is to frame all available evidence against historic trends and variations, and to cautiously estimate above-trend deaths. The first-wave analysis also guides us on where to focus. Since deaths were consistent with the prior trend until early 2021, the entire ‘massive undercount’ debate is limited to a three-month period (April-May-June 2021). Over any three months of 2021, India is expected to witness over 2 million registered deaths, as per trend. The question becomes: how many additional deaths occurred in April-May-June due to the direct and indirect impact of covid?
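The April-May-June question fits the same template. Below is a hedged Python sketch that assumes month-wise 2019 registrations as the baseline and a constant annual trend. It reports a band across trend assumptions rather than a single headline number. `window_excess` and all inputs are hypothetical, and a real estimate would also have to deal with the reporting lags and source mismatches discussed earlier.

```python
def window_excess(observed: list[float], baseline_2019: list[float],
                  annual_trend: float, years_ahead: float = 2.0) -> float:
    """Above-trend deaths over a window of months.

    Each month's expected deaths = the same month of 2019 grown at
    `annual_trend` for `years_ahead` years; excess = observed - expected.
    """
    if len(observed) != len(baseline_2019):
        raise ValueError("observed and baseline must cover the same months")
    factor = (1 + annual_trend) ** years_ahead
    expected = sum(month * factor for month in baseline_2019)
    return sum(observed) - expected


# HYPOTHETICAL index values for April, May, June (2019 month = 100); not real data.
baseline = [100.0, 100.0, 100.0]
observed = [130.0, 150.0, 110.0]

# Report a band across trend assumptions instead of a single point estimate.
band = [window_excess(observed, baseline, t) for t in (0.03, 0.04)]
print(f"excess across trend assumptions: {min(band):.1f} to {max(band):.1f}")
```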

For a reliable answer, it makes sense to wait for June data; it’s imprudent to leave out one of the three months of interest. Also, given the inconsistencies in data for the prior year, data for the most recent month(s) will be even patchier and buggier. Analysis needs to be more careful and inferences more guarded. Any result will be a tentative estimate, pending validation via formal 2021 statistics. Until then, it is best not to take headlined estimates at face value.