Research article
First published online February 1, 2024

Multiple conformity tests to assess deviations from the Newcomb-Benford Law (NBL): A replication of Koch and Okamura (2020)

Abstract

In this paper, we critically reevaluate Koch and Okamura’s (2020) conclusions on the conformity of Chinese COVID-19 data with Benford’s Law. Building on Figueiredo Filho et al. (2022), we adopt a framework that combines multiple tests, including Chi-square, Kolmogorov-Smirnov, Euclidean Distance, Mean Absolute Deviation, Distortion Factor, and Mantissa Distribution. The primary rationale for employing multiple tests is to enhance the robustness of our inference. Our main finding is that COVID-19 infections in China do not follow the distribution expected under Benford’s Law, nor do they align with the figures observed in the U.S. and Italy. The usefulness of deviations from Benford’s Law in detecting misreported or fraudulent data remains controversial. However, addressing this question requires a more careful statistical analysis than the one presented in Koch and Okamura (2020). By combining several tests under fully transparent procedures, we establish a more reliable approach to evaluating conformity to the Newcomb-Benford Law in applied research.

1. Introduction

The Newcomb-Benford Law (NBL), also known as Benford’s Law or the First-Digit Law, is a statistical phenomenon that characterizes the expected distribution of leading digits in a wide range of real-world datasets (Benford, 1938; Newcomb, 1881). According to the NBL, the digit 1 appears as the leading digit about 30% of the time, followed by the digit 2 at roughly 18%, with the frequency gradually decreasing for higher digits. A rigorous mathematical proof of the law was developed by Hill (1995).
NBL is widely applied as a forensic tool for detecting suspicious patterns in data (Nigrini, 2012). Scholars have applied this law in various fields, including international trade (Cerioli et al., 2019), money laundering (Deleanu, 2017), elections (Figueiredo Filho et al., 2022; Mebane, 2006; Pericchi & Torres, 2011) and campaign finance (Cho & Gaines, 2007; Gamermann & Antunes, 2018). Researchers have also applied the Newcomb-Benford Law to analyze data related to the COVID-19 pandemic (Campolieti, 2021; Farhadi, 2021; Lee et al., 2020; Silva & Figueiredo Filho, 2020). These studies primarily evaluate how well the digit frequencies in epidemiological figures align with the NBL. Deviations from the theoretical distribution are interpreted as potential signals of inconsistencies, ranging from deliberate fraud to failures of surveillance systems to provide reliable information (Balashov et al., 2021; Figueiredo Filho et al., 2022; Kennedy & Yam, 2020).
In this paper, we follow the framework developed by Figueiredo Filho et al. (2022) to challenge the conclusions reported by Koch and Okamura (2020) regarding the distribution of Chinese COVID-19 figures. We reanalyze the Koch and Okamura (2020) data because their analysis lacks rigor. First, they reject the null hypothesis by cherry-picking evidence. Second, they offer only a weak justification for using the Kuiper test instead of the chi-square test. Additionally, we believe that students and professionals will benefit from our replication materials, since we provide detailed guidance on how to implement the aforementioned tests using the R statistical programming language.
The remainder of the paper is structured as follows: Section 2 offers an account of the data employed in this study and delivers a concise overview of the Newcomb-Benford Law. In Section 3, a series of multiple conformity tests are presented, focusing specifically on the data provided by Koch and Okamura (2020), with the aim of questioning and scrutinizing their primary findings. Finally, Section 4 concludes the paper.

2. Materials and methods

2.1 Data

Following the best scientific practices (Figueiredo Filho et al., 2019), we contacted both authors asking for cleaned data and computational scripts, but as of the submission of this paper they have not replied. Thus, we downloaded the original dataset from the Mendeley website (Koch & Okamura, 2020). However, the display of the information is not standardized across countries and neither computational scripts nor codebooks were provided, which makes it challenging to reproduce their results.
In spite of this difficulty, we identified which columns were used to run the first-digit Benford’s Law analysis. We then saved the spreadsheets as independent files and, after some data cleaning, produced three files in .xlsx format: China, Italy and US. All data are aggregated by country at daily frequency and cover the period from February 28, 2020 to July 1, 2020.
Table 1. Theoretical digit distribution of the NBL (%)
Digit   1st     2nd     3rd     4th
0       -       12      10.2    10
1       30.1    11.4    10.1    10
2       17.6    10.9    10.1    10
3       12.5    10.4    10.1    10
4       9.7     10      10      10
5       7.9     9.7     10      9.9
6       6.7     9.3     9.9     9.9
7       5.8     9       9.9     9.9
8       5.1     8.8     9.9     9.9
9       4.6     8.5     9.8     9.9

2.2 Statistical analysis

First noted by Simon Newcomb in 1881 and later popularized by physicist Frank Benford in 1938, the NBL is an empirical observation asserting that in naturally occurring numerical datasets, the leading digits of numbers are not uniformly distributed, as one might intuitively assume (Benford, 1938; Newcomb, 1881). Instead, smaller digits, particularly ‘1,’ occur more frequently as the first digit than larger ones, such as ‘9.’ The exact NBL distribution for the first digit is given by:
$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d \in \{1, \ldots, 9\} \qquad (1)$$
This intriguing non-uniform distribution has been found to emerge across several datasets, ranging from financial accounting, population demographics, scientific data, to even naturally occurring phenomena. As a result, the Newcomb-Benford Law has gained significant prominence for its potential applications in fraud detection, data integrity assessment, and as a valuable tool for anomaly detection in large-scale datasets. Table 1 shows the NBL theoretical frequency of the first, second, third, and fourth digits.
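As an illustration of Equation (1) and of the higher-order columns in Table 1, the following is a minimal Python sketch (the replication scripts themselves are in R). The function names are ours, not the authors'. The n-th digit probability is obtained with the standard generalization of Equation (1): sum the probability of every possible (n-1)-digit prefix followed by the digit of interest.

```python
import math


def benford_first_digit(d: int) -> float:
    """P(first digit = d) from Equation (1), for d in 1..9."""
    return math.log10(1 + 1 / d)


def benford_nth_digit(d: int, n: int) -> float:
    """P(n-th digit = d) for n >= 2 and d in 0..9.

    Every number whose n-th digit is d starts with some (n-1)-digit
    prefix k, so we sum log10(1 + 1 / (10k + d)) over all such prefixes.
    """
    lo, hi = 10 ** (n - 2), 10 ** (n - 1)
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(lo, hi))


# Print the theoretical shares in percent, one row per digit.
print("digit  1st    2nd    3rd    4th")
for digit in range(10):
    first = f"{100 * benford_first_digit(digit):6.1f}" if digit > 0 else "     -"
    rest = " ".join(f"{100 * benford_nth_digit(digit, n):6.1f}" for n in (2, 3, 4))
    print(f"{digit:>5} {first} {rest}")
```

The printed values are rounded to one decimal place, so they may differ from a published table in the last digit depending on the rounding rule used.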
For Benford’s Law to apply to a specific dataset, the data should exhibit a geometric progression or consist of multiple geometric progressions (Lee et al., 2020; Nigrini, 2012). Moreover, the law requires large data sets whose numbers combine multiple distributions, cover several orders of magnitude, and have a mean greater than the median with a positive skew (Cho & Gaines, 2007; Ciofalo, 2009; Janvresse, 2004). In the context of COVID-19 data, the exponential rise in SARS-CoV-2 infections fulfills these assumptions (Hutzler et al., 2021). Among the conformity measures, the Mean Absolute Deviation (MAD), that is, the average absolute difference between observed and expected digit proportions, should be interpreted following Nigrini’s (2012) guidelines, as reported in Table 2.
Table 2. MAD ranges according to Nigrini (2012)
First digit MAD range    Conclusion
0.000 to 0.006           Close conformity
0.006 to 0.012           Acceptable conformity
0.012 to 0.015           Marginally acceptable conformity
Above 0.015              Nonconformity
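The thresholds in Table 2 can be applied mechanically. The sketch below, in Python rather than the R used for the replication, computes the first-digit MAD from the rounded percentages printed in Table 3 and maps it to the Table 2 label. The function and variable names are ours, and treating each upper bound as inclusive is an assumption, since Table 2 does not say how boundary values are assigned.

```python
import math

# Expected first-digit proportions under Equation (1).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# First-digit shares in percent for digits 1 to 9, copied from Table 3.
TABLE3_PERCENT = {
    "China": [35, 18.6, 12.4, 7.9, 8.8, 6.8, 5.2, 3.2, 2],
    "Italy": [33.3, 14.5, 10, 9.6, 9.3, 6.2, 6.6, 5.4, 5.1],
    "U.S.": [29.7, 17.6, 13, 10.9, 8.1, 6.7, 5.1, 4.9, 3.8],
}


def mean_absolute_deviation(observed):
    """Average absolute gap between observed and expected proportions."""
    return sum(abs(o - e) for o, e in zip(observed, BENFORD)) / len(BENFORD)


def nigrini_conclusion(mad_value):
    """Map a first-digit MAD to the label in Table 2 (upper bounds inclusive)."""
    if mad_value <= 0.006:
        return "Close conformity"
    if mad_value <= 0.012:
        return "Acceptable conformity"
    if mad_value <= 0.015:
        return "Marginally acceptable conformity"
    return "Nonconformity"


RESULTS = {}
for country, shares in TABLE3_PERCENT.items():
    value = mean_absolute_deviation([s / 100 for s in shares])
    RESULTS[country] = (value, nigrini_conclusion(value))
    print(f"{country}: MAD = {value:.4f} ({RESULTS[country][1]})")
```

Because the inputs are rounded to one decimal place, the MAD values can differ slightly in the fourth decimal from those computed on the raw counts.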

2.3 Computational tools

All statistical analyses were performed using R, version 4.1.2. All tests were two-sided at the 5% significance level. Replication materials, including raw data and computational scripts, are available at: https://osf.io/ep3wd/.
Table 3. First digit distribution of the number of COVID-19 confirmed cases by country (%)
Digit    1      2      3      4      5      6      7      8      9
China    35     18.6   12.4   7.9    8.8    6.8    5.2    3.2    2
Italy    33.3   14.5   10     9.6    9.3    6.2    6.6    5.4    5.1
U.S.     29.7   17.6   13     10.9   8.1    6.7    5.1    4.9    3.8
Figure 1. Koch and Okamura (2020) observed values versus NBL theoretical expectation.

3. Results

Figure 1 and Table 3 compare the first digit distribution of COVID-19 confirmed cases with the theoretical expectation under Newcomb-Benford Law by country.
Among the three countries, the figures from China exhibit the largest deviation from what is expected under the hypothesis of conformity to NBL. Specifically, while the expected theoretical frequency of 1 as the first digit is 30.1%, the Chinese data show an observed frequency of 35%. We also detected a strong underrepresentation of the digits 8 and 9. In contrast, data from the U.S. and Italy adhere more closely to Benford’s Law. Following Nigrini’s (2012) recommendation, Table 4 shows the reanalysis of Koch and Okamura (2020) data by including multiple conformity tests.
The results indicate that, regardless of the measure, the Chinese data fail to conform to the Newcomb-Benford Law. Both the Chi-square (China = 26.12, p-value 0.001; Italy = 18.13, p-value 0.02; US = 17.35, p-value 0.027) and Kolmogorov-Smirnov (China = 15.51, p-value < 0.001; Italy = 6.76, p-value < 0.001; US = 10.26, p-value < 0.001) tests are statistically significant at the 5% level for all three countries, leading to the rejection of the null hypothesis. The Mean Absolute Deviation (MAD) values further reinforce the nonconformity of the Chinese data (China = 0.0154 [Nonconformity]; Italy = 0.0137 [Marginally acceptable conformity]; US = 0.0044 [Close conformity]), as described in Table 4.
Table 4. Reanalysis of Koch and Okamura (2020) data
Parameter                  China (N = 717)   Italy (N = 980)   U.S. (N = 4,427)
Chi-square                 26.12**           18.13*            17.35*
Kolmogorov-Smirnov         15.51***          6.76***           10.26***
Euclidean distance         5.73***           2.99***           4.29***
Mean absolute deviation    0.0154            0.0137            0.0044
Distortion factor          -22.42            -2.27             -4.04
Mantissa (0.500)           0.409             0.481             0.487
Variance (0.083)           0.09              0.095             0.083
Kurtosis (-1.2)            -1.217            -1.312            -1.166
Skewness (0)               0.118             -0.015            -0.032

*p-value < 0.05; **p-value < 0.01; ***p-value < 0.001.
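To make the chi-square row concrete, here is a minimal Python sketch (not the authors' R code) of the Pearson statistic against Equation (1) and of its upper-tail p-value with 8 degrees of freedom (nine digits minus one). The helper names are ours. For an even number of degrees of freedom the chi-square survival function has a closed form, which avoids any third-party dependency. The China shares are the rounded percentages from Table 3, so the recomputed statistic is only expected to land near the reported value, not to match it exactly.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def chi_square_stat(observed_props, n):
    """Pearson chi-square of observed first-digit proportions, sample size n."""
    return n * sum((o - e) ** 2 / e for o, e in zip(observed_props, BENFORD))


def chi_square_p_value(x, df=8):
    """Upper-tail probability for a chi-square variable with even df.

    Closed form: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
    """
    if df % 2 != 0:
        raise ValueError("closed form used here requires an even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# p-values implied by the chi-square statistics reported in Table 4.
for country, stat in (("China", 26.12), ("Italy", 18.13), ("U.S.", 17.35)):
    print(f"{country}: chi2 = {stat}, p = {chi_square_p_value(stat):.3f}")

# Statistic recomputed from the rounded Table 3 shares for China (N = 717).
china = [s / 100 for s in (35, 18.6, 12.4, 7.9, 8.8, 6.8, 5.2, 3.2, 2)]
china_stat = chi_square_stat(china, 717)
print(f"China, recomputed from rounded shares: chi2 = {china_stat:.2f}")
```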

Figure 2. Conformity measures of number of COVID-19 confirmed cases by country. Note: chi2 = Pearson chi-square; ks = Kolmogorov-Smirnov D statistic; ed = Euclidean distance; mad = Mean absolute deviation; mantissa = Average mantissa.
Comparatively, the Chinese data exhibit a much larger distortion factor in absolute value (-22.42) than Italy (-2.27) and the U.S. (-4.03), indicating a strong underestimation (Nigrini, 2012). Theoretically, adherence to Benford’s Law also implies a uniform distribution of the mantissas. According to Newcomb (1881), “the law of probability of the occurrence of numbers is such that all mantissa of their logarithms are equally probable” (Newcomb, 1881, p. 3). While the average mantissa for China (0.412) deviates markedly from the value expected under Benford’s Law (0.500), Italy (0.481) and the U.S. (0.487) show values closer to the theoretical expectation. In summary, the distribution of COVID-19 confirmed infections in China neither matches the distribution expected under Benford’s Law nor aligns with the figures observed in the U.S. and Italy, contrary to the conclusion of Koch and Okamura (2020). This finding is supported by Peng and Nagata (2020), Kennedy and Yam (2020) and Lee et al. (2020).
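The mantissa benchmarks in Table 4 (mean 0.500, variance 0.083, kurtosis -1.2, skewness 0) are the moments of a uniform distribution on [0, 1). The Python sketch below computes them, together with a distortion factor, on a hypothetical sample. It assumes the usual description of Nigrini's distortion factor: values are collapsed to the range [10, 100), and the actual mean is compared with the mean of an N-term geometric sequence over that range, 90 / (N (10^(1/N) - 1)). The function names and the sample are ours, and the replication scripts may implement details differently (for instance, how values below 10 are handled).

```python
import math


def mantissas(values):
    """Fractional part of log10(x) for each positive x."""
    return [math.log10(v) % 1.0 for v in values if v > 0]


def moments(xs):
    """Mean, variance, skewness and excess kurtosis (population formulas)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
    }


def distortion_factor(values):
    """Percentage gap between actual and expected mean on [10, 100).

    Assumed definition: DF = 100 * (AM - EM) / EM, with
    EM = 90 / (N * (10 ** (1 / N) - 1)).
    """
    collapsed = [10 * 10 ** m for m in mantissas(values)]
    n = len(collapsed)
    actual_mean = sum(collapsed) / n
    expected_mean = 90 / (n * (10 ** (1 / n) - 1))
    return 100 * (actual_mean - expected_mean) / expected_mean


# Hypothetical sample: a geometric sequence, which conforms to Benford's Law.
N = 1000
sample = [10 * 10 ** (i / N) for i in range(N)]
stats = moments(mantissas(sample))
print({name: round(value, 3) for name, value in stats.items()})
print("distortion factor (%):", round(distortion_factor(sample), 6))
```

On this definition a negative distortion factor means the collapsed values are, on average, smaller than a Benford-conforming sample of the same size would produce.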
In what follows, we show the advantage of using multiple conformity tests to evaluate the results of empirical NBL applications. To do so, we constructed three artificial datasets with exactly the same sample sizes as those analyzed by Koch and Okamura (2020). Figure 2 compares the conformity measures from the simulated data, which fit the Benford’s Law expectations perfectly, to the goodness-of-fit estimates reported in Koch and Okamura (2020).
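The paper does not spell out how the artificial datasets were built. One hypothetical construction, sketched here in Python, is to take values whose base-10 mantissas are evenly spaced on [0, 1); such a sample reproduces the first-digit proportions of Equation (1) to within one observation per digit. The sample sizes are those in Table 4, and all function names are ours.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Sample sizes from Table 4.
SAMPLE_SIZES = {"China": 717, "Italy": 980, "U.S.": 4427}


def ideal_benford_sample(n):
    """n values in [1, 10) whose log10 mantissas are evenly spaced."""
    return [10 ** ((i + 0.5) / n) for i in range(n)]


def first_digit_props(values):
    """Observed proportion of each leading digit 1..9."""
    counts = [0] * 9
    for v in values:
        leading = int(v / 10 ** math.floor(math.log10(v)))
        counts[leading - 1] += 1
    return [c / len(values) for c in counts]


def mad(props):
    """Mean absolute deviation from the Benford proportions."""
    return sum(abs(o - e) for o, e in zip(props, BENFORD)) / 9


def chi_square(props, n):
    """Pearson chi-square against the Benford proportions."""
    return n * sum((o - e) ** 2 / e for o, e in zip(props, BENFORD))


SIMULATED = {}
for country, n in SAMPLE_SIZES.items():
    props = first_digit_props(ideal_benford_sample(n))
    SIMULATED[country] = (mad(props), chi_square(props, n))
    print(f"{country} (N = {n}): MAD = {SIMULATED[country][0]:.5f}, "
          f"chi2 = {SIMULATED[country][1]:.3f}")
```

Comparing these near-zero benchmarks with the measures computed on the observed data gives a sense of how far each country sits from a sample of the same size that conforms to the law.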
Statistical theory teaches us that as sample size increases, statistical tests become more sensitive, increasing the likelihood of detecting smaller effects. Therefore, statistical power plays a key role in scientific inference by determining the probability of correctly rejecting a false null hypothesis. Koch and Okamura (2020) are right when they argue that the Chi-square test is sensitive to sample size. However, they failed to acknowledge that the excess of power “starts being noticeable for data sets with more than 5,000 records” (Nigrini, 2012, p. 154), which is not the case in their study. In essence, Koch and Okamura (2020) specifically selected the Kuiper test, which aligned with their hypothesis. Had they chosen any other test, they could have arrived at the opposite conclusion. It is essential to recognize the potential influence of test selection on the study’s outcomes. In addition, our simulations show that the joint application of multiple conformity tests leads to more reliable conclusions about the role of sample size in driving empirical results when using NBL.
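The sample-size sensitivity of the Chi-square test can be illustrated with a short Python sketch. It holds the first-digit shares fixed at the rounded U.S. values from Table 3 and varies only N; the Pearson statistic then grows in direct proportion to N, while the MAD does not depend on N at all. The helper names and the hypothetical sample size of 20,000 are ours, and the p-value uses the closed form available for 8 degrees of freedom.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Rounded U.S. first-digit shares from Table 3, as proportions.
US_SHARES = [s / 100 for s in (29.7, 17.6, 13, 10.9, 8.1, 6.7, 5.1, 4.9, 3.8)]


def chi_square(props, n):
    """Pearson chi-square against the Benford proportions."""
    return n * sum((o - e) ** 2 / e for o, e in zip(props, BENFORD))


def p_value_df8(x):
    """Upper-tail chi-square probability with 8 degrees of freedom."""
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(4))


def mad(props):
    """Mean absolute deviation; note that it does not involve n."""
    return sum(abs(o - e) for o, e in zip(props, BENFORD)) / 9


print(f"MAD (any N) = {mad(US_SHARES):.4f}")
# 717 and 4427 are sample sizes from Table 4; 20000 is a hypothetical larger N.
for n in (717, 4427, 20000):
    stat = chi_square(US_SHARES, n)
    print(f"N = {n:>5}: chi2 = {stat:6.2f}, p = {p_value_df8(stat):.4f}")
```

With identical shares, the test does not reject at the Chinese sample size but does at the U.S. sample size, which is why reading the Chi-square result alongside a size-independent measure such as the MAD is informative.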

4. Conclusions

This paper expands upon the work conducted by Koch and Okamura (2020) in their application of Benford’s Law to evaluate the integrity of COVID-19 data. We emphasize the importance of employing multiple conformity tests, which yield more robust inferences than relying on a single measure. Our results show that Koch and Okamura’s (2020) findings do not hold under multiple testing. In particular, we demonstrate that the joint application of conformity tests is a more reliable approach to evaluating data integrity in NBL settings. Whether deviations from Benford’s Law are useful for detecting misreported or fraudulent data remains controversial, but approaching this question demands a more careful statistical analysis than the one presented in Koch and Okamura’s (2020) paper.
Despite our contribution, some limitations are worth mentioning. First, we were unable to access appropriate replication materials from Koch and Okamura’s (2020) study. Consequently, we may have missed some of their methodological procedures. Second, there is widespread skepticism regarding COVID-19 epidemiological data in general, mainly due to reporting delays and measurement errors. These shortcomings could act as sources of bias.

Acknowledgments

We are thankful to the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Fundação de Amparo à Pesquisa do Estado de Alagoas (FAPEAL) for their financial support. We also appreciate the referees for their constructive comments which have led to significant improvement of the manuscript.

References

1. Balashov V.S., Yan Y., & Zhu X. (2021). Using the Newcomb-Benford law to study the association between a country’s COVID-19 reporting accuracy and its development. Scientific Reports, 11(1), 22914.
2. Benford F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical Society, 78(4), 551-572.
3. Campolieti M. (2021). COVID-19 deaths in the USA: Benford’s law and under-reporting. Journal of Public Health, fdab161.
4. Cerioli A., Barabesi L., Cerasa A., Menegatti M., & Perrotta D. (2019). Newcomb-Benford law and the detection of frauds in international trade. Proceedings of the National Academy of Sciences, 116(1), 106-115.
5. Cho W.K.T., & Gaines B.J. (2007). Breaking the (Benford) Law. The American Statistician, 61(3), 218-223.
6. Ciofalo M. (2009). Entropy, Benford’s first digit law, and the distribution of everything. Dipartimento di Ingegneria Nucleare, Università degli Studi di Palermo, Italy, 35.
7. Deleanu I.S. (2017). Do Countries Consistently Engage in Misinforming the International Community about Their Efforts to Combat Money Laundering? Evidence Using Benford’s Law. PloS One, 12(1), e0169632.
8. Farhadi N. (2021). Can we rely on COVID-19 data? An assessment of data from over 200 countries worldwide. Science Progress, 104(2), 00368504211021232.
9. Figueiredo Filho D., Lins R., Domingos A., Janz N., & Silva L. (2019). Seven reasons why: A user’s guide to transparency and reproducibility. Brazilian Political Science Review, 13(2), e0001.
10. Figueiredo Filho D., Silva L., & Carvalho E. (2022). The forensics of fraud: Evidence from the 2018 Brazilian presidential election. Forensic Science International: Synergy, 5, 100286.
11. Figueiredo Filho D., Silva L., & Medeiros H. (2022). “Won’t get fooled again”: Statistical fault detection in COVID-19 Latin American data. Globalization and Health, 18(1), 105.
12. Gamermann D., & Antunes F.L. (2018). Statistical analysis of Brazilian electoral campaigns via Benford’s law. Physica A: Statistical Mechanics and Its Applications, 496(C), 171-188.
13. Hill T.P. (1995). Base-Invariance Implies Benford’s Law. Proceedings of the American Mathematical Society, 123(3), 887-895.
14. Hutzler F., Richlan F., Leitner M.C., Schuster S., Braun M., & Hawelka S. (2021). Anticipating trajectories of exponential growth. Royal Society Open Science, 8(4), 201574.
15. Janvresse T. (2004). The Pascal adic transformation is loosely Bernoulli. Annales de l’Institut Henri Poincaré (B) Probability and Statistics, 40(2), 133-139.
16. Kennedy A.P., & Yam S.C.P. (2020). On the authenticity of COVID-19 case figures. PLOS ONE, 15(12), e0243123.
17. Koch C., & Okamura K. (2020). Benford’s Law and COVID-19 reporting. Economics Letters, 196, 109573.
18. Lee K.-B., Han S., & Jeong Y. (2020). COVID-19, flattening the curve, and Benford’s law. Physica A: Statistical Mechanics and Its Applications, 559, 125090.
19. Mebane W. (2006). Election forensics: The second-digit Benford’s law test and recent American presidential elections.
20. Newcomb S. (1881). Note on the frequency of use of the different digits in natural numbers. American Journal of Mathematics, 4(1), 39-40.
21. Nigrini M.J. (2012). Benford’s Law: Applications for Forensic Accounting, Auditing, and Fraud Detection (1st ed.). Wiley.
22. Peng Y., & Nagata M.H. (2020). Statistical analysis of the Chinese COVID-19 data with Benford’s Law and clustering. Laboratório de Aprendizado de Máquina em Finanças e Organizações (LAMFO). https://lamfo-unb.github.io/2020/04/21/COVID-China-EN/.
23. Pericchi L., & Torres D. (2011). Quick Anomaly Detection by the Newcomb-Benford Law, with Applications to Electoral Processes Data from the USA, Puerto Rico and Venezuela. Statistical Science, 26(4), 502-516.
24. Silva L., & Figueiredo Filho D. (2020). Using Benford’s law to assess the quality of COVID-19 register data in Brazil. Journal of Public Health, fdaa193.