Excess deaths in Tamil Nadu

Deaths in line with trend during first-wave; higher in second-wave but well below what’s claimed


During first-wave, TN’s death registrations were in line with historic trend, pointing to reliable covid reporting. During second-wave, TN witnessed 40,000 above-trend deaths till end-May, which is high but well below headlined estimates. June data is required for more comprehensive analysis.  

Trend analysis for TN

First essay of this series outlined an approach to estimating ‘excess deaths’, by viewing covid-period data against relevant long-run trend. This essay illustrates that process using data from a non-controversial state. Since most can’t tell MDMK from DMDK, it’s easier to make this about analytics, not politics. All we need is one picture:

Pre-covid data, in blue, is from CRS-2019. Covid-period data, in red, is from media reports. Note that two rightmost points pertain to 4-months and 1-month, while rest pertain to a year. Just eyeball the data, maybe hold up a ruler against the series of dots. It’s easy to see that, except for one outlier month (likely to become two), covid-period deaths are in line with prior trend. This tells us that:

1. First-wave deaths (2020, early-2021) are in line with trend. If excess deaths are road to undercount, this is a dead-end.

2. Not so in second-wave. May-21 witnessed ~40,000 deaths above trend (I deliberately use round numbers for estimates as a reminder that all this is inherently imprecise). Since June could also be above-trend, we’ll await more data before any profound inference.

If above-trend equates to undercount, corollary requires us to equate below-trend (or in-line) to no undercount. But I won’t go that far. Some undercount is inevitable everywhere. Even West, with better systems, estimates at least 1.5x undercount. Let’s just say, outside of inherently unavoidable undercount, TN’s covid reporting seems reasonable in first-wave. At the least, excess death analysis doesn’t offer a path to undercount.

One picture, few words, no math. That didn’t seem hard. But, that’s not how it’s typically done.  

Different approach and inconsistent data lead to inflated estimates

How do other estimates compare to 40,000? One report has it at 1,61,581. Another at 1,29,215 (admittedly, with 13 days of June included). A part of this difference is because both reports directly compare 2020/21 data to 2018-19 average, without incorporating trend. To illustrate problems with this approach, had you compared India’s 2019 deaths to 2016-17 average, headline would read “1.2 million excess deaths in 2019”. Infinite undercount too.
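To see why a flat average inflates the estimate, here is a minimal Python sketch. It assumes deaths follow the 3.5% decadal trend exactly, anchors on the CRS-2019 figure of 6.34 lakh quoted in this essay, and back-casts 2018 from that trend. The back-cast is an assumption for illustration, not the registered 2018 figure, and the function name is hypothetical.

```python
def flat_average_baseline(prior_counts):
    """Baseline used by the flat-average approach: plain mean of prior years."""
    return sum(prior_counts) / len(prior_counts)


deaths_2019 = 634_000   # CRS-2019 figure quoted in this essay
trend_growth = 0.035    # decadal CAGR quoted in this essay

# Assumption: 2018 is back-cast from the trend, not taken from registered data.
deaths_2018_on_trend = deaths_2019 / (1 + trend_growth)

# A 2020 that sits exactly on trend, i.e. with zero true excess.
deaths_2020_on_trend = deaths_2019 * (1 + trend_growth)

baseline = flat_average_baseline([deaths_2018_on_trend, deaths_2019])
spurious_excess = deaths_2020_on_trend - baseline

print(f"Flat 2018-19 baseline : {baseline:,.0f}")
print(f"On-trend 2020         : {deaths_2020_on_trend:,.0f}")
print(f"Spurious 'excess'     : {spurious_excess:,.0f}")
```

Under these assumptions, a 2020 that is exactly on trend would still show roughly 33,000 ‘excess’ deaths against the 2018-19 average, purely because the baseline ignores growth.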

Second reason for difference is that the reports use 2018/19 data that match neither each other nor published CRS-2019 data. There are other inconsistencies. In one report, TN deaths rose faster in 2019 than 2020, making it odd to claim excess deaths in 2020. While one report provides month-wise data for Jan-May 2021, other provides aggregate data till June 13th 2021, mentioning that TN doesn’t provide month-wise data. The 2020 figure of 6.44 lakh deaths used here is from detailed data uploaded by media analyst in Excel format. With CRS-2019 showing 6.34 lakh deaths in 2019, 1.6% growth in 2020 is below-trend compared to decadal 3.5% CAGR. Since no trend is perfectly smooth, it is best to view small deviations as inevitable fluctuations rather than quantify every blip into under/over counts. Unless a deviation from trend is material, we end up analysing noise, not signal.
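The growth arithmetic above is easy to check. A minimal sketch, using only the three figures quoted in this essay (6.34 lakh, 6.44 lakh, 3.5%) and assuming the trend is simply compounded from the 2019 figure:

```python
deaths_2019 = 634_000   # CRS-2019
deaths_2020 = 644_000   # media-uploaded data referred to in this essay
trend_growth = 0.035    # decadal CAGR

# Actual year-on-year growth in registered deaths.
growth_2020 = deaths_2020 / deaths_2019 - 1

# Assumption: trend for 2020 is the 2019 figure grown by one year of CAGR.
expected_2020 = deaths_2019 * (1 + trend_growth)
gap_vs_trend = deaths_2020 - expected_2020

print(f"2020 growth        : {growth_2020:.1%}")
print(f"Trend-implied 2020 : {expected_2020:,.0f}")
print(f"Actual minus trend : {gap_vs_trend:,.0f}")
```

On these inputs, 2020 registrations land about 12,000 below the trend-implied level. That is a small deviation of the kind best read as noise rather than as an over-count.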

I mention these quirks so that you appreciate messiness of it all, not to doubt anyone. Across multiple states, analysts have done a remarkable job of shifting discourse from anecdotes to statistics by meticulously extracting systemic data. I respect them and their work. What’s seen here is inherent unreliability of mid-period data from an evolving death registration system. Getting into inner workings has revealed inconsistencies in every report I have seen (particularly severe in recent UP estimation, but that’s for another essay). When data is patchy and method is inappropriate, headline numbers are best ignored unless independently corroborated.

What am I hiding?

Nothing really, but this is a good question to ask since what’s not said is often more important than what’s said. Let’s make a list.

· I use averages, not month-wise data. Ok for first-wave since it was spread over most of 2020. In 2021, did I hide an April-spike? No, since April was 58k vs 4-month average of 56k.

· Seasonality. Gotcha. What if Jan-May baseline is lower than full-year? As it turns out, it’s actually higher in TN, based on available data. More generally, month-wise data is noisier than yearly data, seasonality is often inconsistent and extracting decadal baselines for each month is impossible. I’d rather consistently use full-year averages as trend-data. Soon, we’ll just be looking back at all of 2020 and 2021 without nit-picking over specific months.

· What about first 13 days of June? Yes, June seems above trend so far. However, given above inconsistencies, I ignore part-month data that only one of two reports mentioned. I’ll get to June once full-month data is out.

· No comments on way above-trend deaths in second-wave. I’ll address this and close.
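The first two items on this list can be checked with a few lines of Python. This is a rough sketch, not the method behind the picture: it assumes the 2021 trend is the CRS-2019 figure compounded at 3.5% for two years and spread evenly across twelve months (no seasonality). The helper name is hypothetical.

```python
def monthly_rate(total_deaths, months):
    """Convert a period total into an average per-month figure."""
    return total_deaths / months


deaths_2019 = 634_000   # CRS-2019
trend_growth = 0.035    # decadal CAGR

# Assumption: 2021 trend = 2019 compounded for two years, spread evenly by month.
trend_2021_monthly = monthly_rate(deaths_2019 * (1 + trend_growth) ** 2, 12)

jan_apr_2021_monthly = 56_000   # 4-month average quoted above
april_2021 = 58_000             # April figure quoted above

april_vs_average = april_2021 / jan_apr_2021_monthly - 1
average_vs_trend = jan_apr_2021_monthly / trend_2021_monthly - 1

print(f"Trend-implied monthly deaths, 2021 : {trend_2021_monthly:,.0f}")
print(f"April vs Jan-Apr average           : {april_vs_average:+.1%}")
print(f"Jan-Apr average vs trend           : {average_vs_trend:+.1%}")
```

Under these assumptions, April sits within a few percent of the four-month average, and that average sits within about a percent of the trend-implied monthly level. Neither hides a spike.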

Second-wave analysis and extrapolation

Bulk of covid deaths in second-wave happened in April and May, although TN’s delayed second-wave could impact June too. Given 3-4 week lag in death registrations, these will be mostly recorded over May and June. Registration spike seen in May corresponds to a combination of April and May deaths. Covid death reporting is generally more timely than death registrations. However, part of it is delayed, as states retrospectively recognize prior-period deaths. Mismatched timeframes for mortality and covid reporting mean we shouldn’t over-analyse short durations (e.g. just May). It also doesn’t make sense to start too early, since data is incomplete.

A reasonable approach is to view April-May-June as a single-unit, as soon as data permits. Same analysis should then be repeated for 2021 as a whole, since excess deaths tend to be mean-reverting and full-year numbers might even be lower than half-year numbers.
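Pooling months into one unit is simple to express in code. The sketch below uses made-up index values (baseline = 100 per month) purely to show the mechanics. They are hypothetical placeholders, not TN data, and the function name is illustrative.

```python
def pooled_excess(actual_by_month, baseline_by_month, months):
    """Sum of (actual - baseline) over a chosen window of months."""
    return sum(actual_by_month[m] - baseline_by_month[m] for m in months)


# Hypothetical placeholder values, NOT real data: baseline indexed to 100.
months = ["Apr", "May", "Jun", "Jul", "Aug", "Sep"]
baseline = {m: 100 for m in months}
actual = {"Apr": 105, "May": 160, "Jun": 140, "Jul": 95, "Aug": 90, "Sep": 92}

wave_window = pooled_excess(actual, baseline, ["Apr", "May", "Jun"])
longer_window = pooled_excess(actual, baseline, months)

print(wave_window)    # 105
print(longer_window)  # 82
```

In this toy series, the longer window shows less excess than the wave window because later months dip below baseline. That is the mean-reversion effect described above, and the reason the full-year number can end up below the half-year one.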

In summary, it’s too early to (over) analyse second-wave. When others do so, remember that if headlined excess death estimates are erroneous at state-level (as seen with TN), headlined pan-India numbers are likely to be even more so.


I started with a picture. Data embedded in it pertains to a single state. Method embedded in it is universally applicable. It entails framing recent data against long history rather than arbitrarily subtracting one recent datapoint from another. Luckily, all required history is available in one place. In fact, on one page – ‘Statement 9’ on Page 28 of CRS-2019 report. Simply framing covid-period data from media-reports against prior data from this page leads to a better inference than what others want you to believe.
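For readers who prefer arithmetic to a ruler, the framing described above can be approximated with a log-linear trend fit. This is a minimal sketch of one way to do it, not the exact procedure behind the picture. The function names are hypothetical, and the history series is synthetic (a smooth 4% growth path) standing in for the actual values from Statement 9, which are not reproduced here.

```python
import math


def fit_log_linear_trend(years, deaths):
    """Least-squares fit of log(deaths) = a + b * year. Returns (a, b)."""
    n = len(years)
    logs = [math.log(d) for d in deaths]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def trend_value(a, b, year):
    """Trend-implied deaths for a given year."""
    return math.exp(a + b * year)


def excess_vs_trend(years, deaths, target_year, actual):
    """Actual deaths minus the level implied by extending the fitted trend."""
    a, b = fit_log_linear_trend(years, deaths)
    return actual - trend_value(a, b, target_year)


# Synthetic placeholder history (NOT real data): 4% growth from 400,000.
history_years = list(range(2010, 2020))
history_deaths = [400_000 * 1.04 ** (y - 2010) for y in history_years]

a, b = fit_log_linear_trend(history_years, history_deaths)
implied_growth = math.exp(b) - 1
on_trend_2020 = trend_value(a, b, 2020)

print(f"Implied annual growth : {implied_growth:.1%}")
print(f"Trend-implied 2020    : {on_trend_2020:,.0f}")
```

Swapping the synthetic series for the actual yearly registrations, then passing a covid-period figure to `excess_vs_trend`, gives the gap against long-run trend rather than against one or two recent datapoints.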


Apart from data sources mentioned above, data is also available from Tamil Nadu’s CRS system (http://www.crstn.org/birth_death_tn/). Unfortunately, that data does not tally with either national CRS-2019 report (for 2018, 2019) or with data uploaded as part of one of the media reports referred to above. According to TN-CRS, registered deaths grew at a very high 16% in 2019 (pre-covid) and 8% in 2020. Depending on data source used, TN’s growth in registered deaths in 2020 varies between 2% and 8%, compared to trend growth of 4%. At the upper end of this range, it is possible that TN recorded ~20,000 above-trend deaths in 2020. However, with 12,000 reported covid deaths over the same period and not all excess deaths attributable to covid, this article’s inference of no first-wave undercount (above an inherently unavoidable level) holds. As does its inference of TN’s excess deaths being elevated mainly during May-June 2021, coinciding with second-wave. Per TN-CRS, TN could witness over 100,000 above-trend deaths across May-June 2021, since June appears more elevated than May. However, as this data has been inconsistent with centrally published CRS data in the past, it is best to treat these numbers as provisional.