Model Assisted Statistics and Applications

Research article

First published online February 1, 2024

Multiple conformity tests to assess deviations from the Newcomb-Benford Law (NBL): A replication of Koch and Okamura (2020)

Dalson Figueiredo (https://orcid.org/0000-0001-6982-2262) and Lucas Silva (https://orcid.org/0000-0002-5013-6278)

Volume 19, Issue 1

https://doi.org/10.3233/MAS-231459

Abstract

In this paper, we critically reevaluate Koch and Okamura’s (2020) conclusions on the conformity of Chinese COVID-19 data with Benford’s Law. Building on Figueiredo et al. (2022), we adopt a framework that combines multiple tests, including Chi-square, Kolmogorov-Smirnov, Euclidean Distance, Mean Absolute Deviation, Distortion Factor, and Mantissa Distribution. The primary rationale for employing multiple tests is to enhance the robustness of our inference. Our main finding is that COVID-19 infections in China do not adhere to the distribution expected under Benford’s Law, nor do they align with the figures observed in the U.S. and Italy. The usefulness of deviations from Benford’s Law in detecting misreported or fraudulent data remains controversial. However, addressing this question requires a more careful statistical analysis than the one presented in Koch and Okamura (2020). By combining several tests and using fully transparent procedures, we establish a more reliable approach to evaluating conformity to the Newcomb-Benford Law in applied research.

1. Introduction

The Newcomb-Benford Law (NBL), also known as Benford’s Law or the First-Digit Law, is a statistical phenomenon that characterizes the expected distribution of leading digits in a wide range of real-world datasets (Benford, 1938; Newcomb, 1881). According to the NBL, the digit 1 appears as the leading digit about 30% of the time, followed by the digit 2 at roughly 18%, with the frequency gradually decreasing for higher digits. A rigorous mathematical proof of the law was developed by Hill (1995).

NBL is widely applied as a forensic tool for detecting suspicious patterns in data (Nigrini, 2012). Scholars have applied this law in various fields, including international trade (Cerioli et al., 2019), money laundering (Deleanu, 2017), elections (Figueiredo Filho et al., 2022; Mebane, 2006; Pericchi & Torres, 2011) and campaign finance (Cho & Gaines, 2007; Gamermann & Antunes, 2018). Researchers have also applied the Newcomb-Benford Law to analyze data related to the COVID-19 pandemic (Campolieti, 2021; Farhadi, 2021; Lee et al., 2020; Silva & Figueiredo Filho, 2020). These studies primarily evaluate how well the digit frequencies in epidemiological figures align with the NBL. Deviations from the theoretical distribution are interpreted as potential signals of inconsistencies, ranging from deliberate fraud to failures of surveillance systems to provide reliable information (Balashov et al., 2021; Figueiredo Filho et al., 2022; Kennedy & Yam, 2020).

In this paper, we follow the framework developed by Figueiredo et al. (2022) to challenge the conclusions reported by Koch and Okamura (2020) regarding the distribution of Chinese COVID-19 figures. We reanalyze Koch and Okamura’s (2020) data because their analysis lacks rigor. First, they reject the null hypothesis by cherry-picking evidence. Second, they offer a weak justification for using the Kuiper test instead of the chi-square test. Additionally, we believe that students and professionals will benefit from our replication materials, since we provide detailed guidance on how to implement the aforementioned tests using the R statistical programming language.

The remainder of the paper is structured as follows. Section 2 describes the data employed in this study and gives a concise overview of the Newcomb-Benford Law. Section 3 applies multiple conformity tests to the data provided by Koch and Okamura (2020), with the aim of scrutinizing their primary findings. Finally, Section 4 concludes the paper.

2. Materials and methods

2.1 Data

Following the best scientific practices (Figueiredo Filho et al., 2019), we contacted both authors asking for cleaned data and computational scripts, but as of the submission of this paper they have not replied. Thus, we downloaded the original dataset from the Mendeley website (Koch & Okamura, 2020). However, the display of the information is not standardized across countries and neither computational scripts nor codebooks were provided, which makes it challenging to reproduce their results.

In spite of this difficulty, we identified which columns were used to run the Benford’s Law first-digit analysis. We then saved the spreadsheets as independent files and, after some data cleaning, produced three files in .xlsx format: China, Italy and US. All data are aggregated by country with daily periodicity. The data cover the period from February 28, 2020 to July 1, 2020.

Table 1 Theoretical digit distribution of NBL (%)

Digit	1st	2nd	3rd	4th
0		12	10.2	10
1	30.1	11.4	10.1	10
2	17.6	10.9	10.1	10
3	12.5	10.4	10.1	10
4	9.7	10	10	10
5	7.9	9.7	10	9.9
6	6.7	9.3	9.9	9.9
7	5.8	9	9.9	9.9
8	5.1	8.8	9.9	9.9
9	4.6	8.5	9.8	9.9

2.2 Statistical analysis

Discovered independently by Simon Newcomb in 1881 and later popularized by physicist Frank Benford in 1938, this empirical observation asserts that in naturally occurring numerical datasets, the leading digits of numbers are not uniformly distributed as one might intuitively assume (Benford, 1938; Newcomb, 1881). Instead, smaller digits, particularly ‘1,’ occur more frequently as the first digit than larger ones, such as ‘9.’ The exact distribution for the NBL for the first digit is given by:

$$P(d)=\log_{10}\left(1+\frac{1}{d}\right)\quad\text{for } d\in\{1,\ldots,9\}\qquad(1)$$

This intriguing non-uniform distribution has been found to emerge across several datasets, ranging from financial accounting, population demographics, scientific data, to even naturally occurring phenomena. As a result, the Newcomb-Benford Law has gained significant prominence for its potential applications in fraud detection, data integrity assessment, and as a valuable tool for anomaly detection in large-scale datasets. Table 1 shows the NBL theoretical frequency of the first, second, third, and fourth digits.
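To make Equation (1) concrete, the following Python sketch computes the NBL probabilities for the first digit and, through the standard generalization that sums over all possible preceding digits, for later digit positions. It is only an illustration of the kind of percentages tabulated in Table 1; the function name `benford_digit_probs` is hypothetical and is not taken from the authors' R replication scripts.

```python
import math


def benford_digit_probs(position):
    """Return {digit: probability} for the digit at a 1-based position."""
    if position < 1:
        raise ValueError("position must be >= 1")
    if position == 1:
        # Equation (1): the first digit can only be 1..9.
        return {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    # For later positions, sum over every block k of preceding digits
    # (k has position-1 digits and no leading zero).
    lo, hi = 10 ** (position - 2), 10 ** (position - 1)
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(lo, hi))
        for d in range(10)
    }


# Print the percentages for the first four digit positions.
for pos in (1, 2, 3, 4):
    probs = benford_digit_probs(pos)
    print(pos, {d: round(100 * p, 1) for d, p in probs.items()})
```

The distribution flattens quickly: by the fourth position every digit has a probability very close to 10%.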

For the application of Benford’s Law to a specific dataset, the data should exhibit a geometric progression or consist of multiple geometric progressions (Lee et al., 2020; Nigrini, 2012). Moreover, it requires large data sets whose numbers combine multiple distributions, cover several orders of magnitude, and where the mean is greater than the median with a positive skew (Cho & Gaines, 2007; Ciofalo, 2009; Janvresse, 2004). In the context of COVID-19 data, the exponential rise in SARS-CoV-2 infections fulfills these assumptions (Hutzler et al., 2021). Mean Absolute Deviation (MAD) estimates should be interpreted following Nigrini’s (2012) guidelines, as reported in Table 2.

Table 2 MAD range according to Nigrini (2012)

First digit MAD range	Conclusion
0.0000 to 0.006	Close conformity
0.006 to 0.012	Acceptable conformity
0.012 to 0.015	Marginally acceptable conformity
Above 0.015	Nonconformity
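As an illustration of how the Table 2 thresholds are applied, the Python sketch below computes the first-digit MAD as the average absolute gap between observed and expected proportions, then maps it to one of Nigrini's categories. It is a sketch under stated assumptions: the inputs are the rounded percentages reported in Table 3 rather than the raw case counts, the treatment of values exactly on a boundary is our choice, and the function names are hypothetical.

```python
import math

# Expected first-digit proportions under the NBL (digits 1..9).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Rounded first-digit percentages as reported in Table 3 (digits 1..9).
TABLE_3_PERCENT = {
    "China": [35, 18.6, 12.4, 7.9, 8.8, 6.8, 5.2, 3.2, 2],
    "Italy": [33.3, 14.5, 10, 9.6, 9.3, 6.2, 6.6, 5.4, 5.1],
    "U.S.": [29.7, 17.6, 13, 10.9, 8.1, 6.7, 5.1, 4.9, 3.8],
}


def mean_absolute_deviation(percentages):
    """Average absolute gap between observed and NBL proportions."""
    gaps = [abs(pct / 100 - exp) for pct, exp in zip(percentages, BENFORD)]
    return sum(gaps) / len(gaps)


def nigrini_label(mad):
    """Map a first-digit MAD to the Table 2 categories.

    Assumption: a value exactly on a boundary goes to the lower category.
    """
    if mad <= 0.006:
        return "Close conformity"
    if mad <= 0.012:
        return "Acceptable conformity"
    if mad <= 0.015:
        return "Marginally acceptable conformity"
    return "Nonconformity"


for country, percentages in TABLE_3_PERCENT.items():
    mad = mean_absolute_deviation(percentages)
    print(f"{country}: MAD = {mad:.4f} ({nigrini_label(mad)})")
```

Because the inputs are rounded, the computed values can differ from the reported ones in the last decimal place.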

2.3 Computational tools

All statistical analyses were performed using R, version 4.1.2, and all tests were two-sided with a 5% significance level. Replication materials, including raw data and computational scripts, are available at: https://osf.io/ep3wd/.

Table 3 First digit distribution of the number of COVID-19 confirmed cases by country (%)

Digit	1	2	3	4	5	6	7	8	9
China	35	18.6	12.4	7.9	8.8	6.8	5.2	3.2	2
Italy	33.3	14.5	10	9.6	9.3	6.2	6.6	5.4	5.1
U.S.	29.7	17.6	13	10.9	8.1	6.7	5.1	4.9	3.8

Figure 1. Koch and Okamura (2020) observed values versus NBL theoretical expectation.

3. Results

Figure 1 and Table 3 compare the first digit distribution of COVID-19 confirmed cases with the theoretical expectation under Newcomb-Benford Law by country.

Comparatively, the figures from China exhibit the highest deviation from what is expected under the hypothesis of conformity to NBL. Specifically, while the expected frequency of 1 as the leading digit is 30.1%, the Chinese data show an observed frequency of 35%. We also detected a strong underrepresentation of the digits 8 and 9. In contrast, data from the U.S. and Italy demonstrate a stronger adherence to Benford’s Law. Following Nigrini’s (2012) recommendation, Table 4 shows the reanalysis of Koch and Okamura’s (2020) data using multiple conformity tests.

The results indicate that, regardless of the measure, the Chinese data fail to conform to the Newcomb-Benford Law. Both the Chi-square (China = 26.12, p-value 0.001; Italy = 18.13, p-value 0.02; US = 17.35, p-value 0.027) and Kolmogorov-Smirnov (China = 15.51, p-value < 0.001; Italy = 6.76, p-value < 0.001; US = 10.26, p-value < 0.001) tests are highly significant, leading to the rejection of the null hypothesis. The Mean Absolute Deviation (MAD) values further reinforce this nonconformity (China = 0.0154 [Nonconformity]; Italy = 0.0137 [Marginally acceptable conformity]; US = 0.0044 [Close conformity]), as described in Table 4.
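The chi-square statistics can be approximated from the published summaries alone. The Python sketch below rebuilds approximate digit counts from the rounded percentages in Table 3 and the sample sizes in Table 4, then computes Pearson's statistic against the NBL expectation. It is not the authors' R code: we assume the U.S. sample size is 4,427, and because the percentages are rounded the results come out close to, but not identical with, the reported values. The critical value 15.507 is the upper 5% point of the chi-square distribution with 8 degrees of freedom.

```python
import math

# Expected first-digit proportions under the NBL (digits 1..9).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Rounded first-digit percentages as reported in Table 3 (digits 1..9).
TABLE_3_PERCENT = {
    "China": [35, 18.6, 12.4, 7.9, 8.8, 6.8, 5.2, 3.2, 2],
    "Italy": [33.3, 14.5, 10, 9.6, 9.3, 6.2, 6.6, 5.4, 5.1],
    "U.S.": [29.7, 17.6, 13, 10.9, 8.1, 6.7, 5.1, 4.9, 3.8],
}

# Sample sizes from Table 4 (assumption: the U.S. entry means 4,427).
SAMPLE_SIZE = {"China": 717, "Italy": 980, "U.S.": 4427}

# Upper 5% critical value of chi-square with 9 - 1 = 8 degrees of freedom.
CRITICAL_5PCT_DF8 = 15.507


def chi_square(percentages, n):
    """Pearson chi-square of approximate digit counts against the NBL."""
    stat = 0.0
    for pct, prob in zip(percentages, BENFORD):
        observed = n * pct / 100  # approximate count, from rounded data
        expected = n * prob
        stat += (observed - expected) ** 2 / expected
    return stat


for country, percentages in TABLE_3_PERCENT.items():
    stat = chi_square(percentages, SAMPLE_SIZE[country])
    verdict = "reject" if stat > CRITICAL_5PCT_DF8 else "do not reject"
    print(f"{country}: chi-square = {stat:.2f} ({verdict} NBL at 5%)")
```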

Table 4 Reanalysis of Koch and Okamura (2020) data

Parameter	China (N = 717)	Italy (N = 980)	U.S. (N = 4,427)
Chi-square	26.12**	18.13*	17.35*
Kolmogorov-Smirnov	15.51***	6.76***	10.26***
Euclidean distance	5.73***	2.99***	4.29***
Mean absolute deviation	0.0154	0.0137	0.0044
Distortion factor	-22.42	-2.27	-4.04
Mantissa (0.500)	0.409	0.481	0.487
Variance (0.083)	0.09	0.095	0.083
Kurtosis (-1.2)	-1.217	-1.312	-1.166
Skewness (0)	0.118	-0.015	-0.032

*p-value < 0.05; **p-value < 0.01; ***p-value < 0.001.

Figure 2. Conformity measures of number of COVID-19 confirmed cases by country. Note: chi2 = Pearson chi-square; ks = Kolmogorov-Smirnov D statistic; ed = Euclidean distance; mad = Mean absolute deviation; mantissa = Average mantissa.

Comparatively, the Chinese data exhibit a larger distortion factor (-22.42) than Italy (-2.27) and the U.S. (-4.03), indicating a strong underestimation (Nigrini, 2012). Theoretically, adherence to Benford’s Law also implies a uniform distribution of the mantissa. According to Newcomb (1881), “the law of probability of the occurrence of numbers is such that all mantissa of their logarithms are equally probable” (Newcomb, 1881, p. 3). While the mean mantissa for China (0.412) deviates substantially from the value expected under Benford’s Law (0.500), Italy (0.481) and the U.S. (0.487) show values closer to the theoretical expectation. In summary, the distribution of COVID-19 confirmed infections in China neither matches the distribution expected under Benford’s Law nor aligns with the figures observed in the U.S. and Italy, contrary to what Koch and Okamura (2020) report. This finding is supported by Peng and Nagata (2020), Kennedy and Yam (2020) and Lee et al. (2020).
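The mantissa reference values listed in Table 4 (mean 0.500, variance 0.083, kurtosis -1.2, skewness 0) are the moments of a uniform distribution on [0, 1). The Python sketch below computes these four summaries for any set of positive numbers and checks them on a synthetic geometric sequence spanning four orders of magnitude. The sequence is our own illustrative construction, not the COVID-19 data, and the function names are hypothetical.

```python
import math
import statistics


def mantissa(x):
    """Fractional part of log10(|x|)."""
    log = math.log10(abs(x))
    return log - math.floor(log)


def mantissa_summary(values):
    """Mean, variance, skewness and excess kurtosis of the mantissas."""
    m = [mantissa(v) for v in values if v != 0]
    mean = statistics.fmean(m)
    var = statistics.pvariance(m, mu=mean)
    sd = math.sqrt(var)
    z = [(x - mean) / sd for x in m]
    return {
        "mean": mean,
        "variance": var,
        "skewness": statistics.fmean([t ** 3 for t in z]),
        "kurtosis": statistics.fmean([t ** 4 for t in z]) - 3,
    }


# Synthetic geometric sequence covering exactly four decades (1 to 10^4),
# so its mantissas are evenly spread over [0, 1).
sample = [10 ** (4 * k / 1000) for k in range(1000)]
summary = mantissa_summary(sample)
print({name: round(value, 3) for name, value in summary.items()})
```

A mean mantissa well below 0.5, as reported for China, indicates that the values cluster toward the lower end of each order of magnitude.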

In what follows, we show the advantage of using multiple conformity tests to evaluate the results of NBL empirical applications. To do so, we constructed three artificial datasets with the same sample sizes as those analyzed by Koch and Okamura (2020). Figure 2 compares the conformity measures from the simulated data, which perfectly fit the expectations under Benford’s Law, to the goodness-of-fit estimates reported in Koch and Okamura (2020).
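The paper does not spell out how the artificial datasets were generated, so the following Python sketch shows only one possible construction: for each sample size, digit counts are allocated as close to the NBL proportions as whole numbers allow, using largest-remainder rounding. The sample sizes come from Table 4 (assuming 4,427 for the U.S.), and all function names are hypothetical. Such a benchmark shows what each conformity measure looks like when the data fit the law almost exactly.

```python
import math

# Expected first-digit proportions under the NBL (digits 1..9).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Sample sizes from Table 4 (assumption: the U.S. entry means 4,427).
SAMPLE_SIZE = {"China": 717, "Italy": 980, "U.S.": 4427}


def benford_counts(n):
    """Split n records over digits 1..9 as close to the NBL as integers allow."""
    exact = [n * p for p in BENFORD]
    counts = [math.floor(x) for x in exact]
    leftovers = n - sum(counts)
    # Hand the remaining records to the digits with the largest remainders.
    order = sorted(range(9), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftovers]:
        counts[i] += 1
    return counts


def chi_square(counts):
    """Pearson chi-square of digit counts against the NBL."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, BENFORD))


def mean_absolute_deviation(counts):
    """Average absolute gap between observed and NBL proportions."""
    n = sum(counts)
    return sum(abs(c / n - p) for c, p in zip(counts, BENFORD)) / 9


for country, n in SAMPLE_SIZE.items():
    counts = benford_counts(n)
    print(country, counts,
          f"chi-square = {chi_square(counts):.3f}",
          f"MAD = {mean_absolute_deviation(counts):.5f}")
```

Randomly drawn Benford-distributed samples would instead show sampling noise, so their statistics would sit above these near-zero baselines.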

Statistical theory teaches us that as sample size increases, statistical tests become more sensitive, increasing the likelihood of detecting smaller effects. Therefore, statistical power plays a key role in scientific inference by determining the probability of correctly rejecting a false null hypothesis. Koch and Okamura (2020) are right when they argue that the Chi-square test is sensitive to sample size. However, they fail to acknowledge that the excess of power “starts being noticeable for data sets with more than 5,000 records” (Nigrini, 2012, p. 154), which is not the case in their study. In essence, Koch and Okamura (2020) selected the Kuiper test, which aligned with their hypothesis. Had they chosen any other test, they could have arrived at the opposite conclusion. It is essential to recognize the potential influence of test selection on the study’s outcomes. In addition, our simulations show that the joint application of multiple conformity tests leads to more reliable conclusions about the role of sample size in driving empirical results when using NBL.

4. Conclusions

This paper expands upon the work conducted by Koch and Okamura (2020) in their application of Benford’s Law to evaluate the integrity of COVID-19 data. To improve the understanding of NBL, we emphasize the importance of employing multiple conformity tests, which yield more robust inferences than relying on a single measure. Our results show that Koch and Okamura’s (2020) findings do not hold under multiple testing. In particular, we demonstrate that the joint application of conformity tests is a more reliable approach to evaluating data integrity in NBL settings. Whether deviations from Benford’s Law are useful for detecting misreported or fraudulent data remains controversial, but approaching this question demands a more thoughtful statistical analysis than the one presented in Koch and Okamura’s (2020) piece.

Despite the contribution we have made, some limitations are worth mentioning. First, we were unable to access appropriate replication materials from Koch and Okamura’s (2020) study. Consequently, we may have missed some of their methodological procedures. Second, there is widespread skepticism regarding COVID-19 epidemiological data in general, mainly due to reporting delays and measurement errors. These shortcomings could act as sources of bias.

Acknowledgments

We are thankful to the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Fundação de Amparo à Pesquisa do Estado de Alagoas (FAPEAL) for their financial support. We also thank the referees for their constructive comments, which have led to significant improvement of the manuscript.

References

1. Balashov, V. S., Yan, Y., & Zhu, X. (2021). Using the Newcomb-Benford law to study the association between a country’s COVID-19 reporting accuracy and its development. Scientific Reports, 11(1), 22914.
