I've Got Your Number
A PHYSICIST AT GE RESEARCH LABORATORIES in the 1920s, Frank Benford found that numbers with low first digits occurred more frequently in the world and calculated the expected frequencies of the digits in tabulated data.
When physicist Frank Benford tested the first digits in lists of numbers during the 1920s and 1930s, he found that about 31% of the numbers had 1 as the first digit, 19% had 2, and only 5% had 9.
Benford then tested this idea by looking at the first digits of 20 lists of numbers with a total of 20,229 observations. His lists came from varied sources, such as geographic, scientific and demographic data. One list contained all the numbers in an issue of Reader's Digest. He found that about 31% of the numbers had 1 as the first digit, 19% had 2, and only 5% had 9 as a first digit. Benford then made some physics-related assumptions about the distribution of naturally occurring data and, using integral calculus, he computed the expected frequencies of the digits and digit combinations.
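The expected frequencies can be illustrated with a short sketch. The article does not spell out the formula; the code below assumes the logarithmic form usually associated with Benford's Law, in which the expected share of first digit d is log10(1 + 1/d). The function names and the sample amounts are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def benford_expected(digit):
    """Expected share of numbers whose first digit is `digit` (1 to 9).

    Assumes the standard logarithmic form log10(1 + 1/d).
    """
    return math.log10(1 + 1 / digit)


def first_digit(value):
    """Return the leading nonzero digit of a nonzero number."""
    # Scientific notation puts the leading digit first, e.g. '4.5247...e+02'.
    return int(f"{abs(value):.15e}"[0])


def first_digit_shares(values):
    """Observed share of each first digit (1 to 9) in a list of numbers."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Hypothetical invoice amounts, made up for this example.
amounts = [25.00, 30.00, 10.00, 452.47, 1280.10, 19.95, 175.00, 960.40]
observed = first_digit_shares(amounts)

for d in range(1, 10):
    # Expected values are roughly 0.301 for 1, 0.176 for 2 and 0.046 for 9.
    print(d, round(benford_expected(d), 3), round(observed[d], 3))
```

A list this small will not track the expected shares closely; the comparison only becomes meaningful on large data sets, such as a full year of invoices.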
An analysis of the actual dollar amounts showed that the numbers $25, $30 and $10 occurred most frequently. The follow-up audit showed that invoices with these amounts were mainly for courier charges. Repeated low dollar amounts highlight inefficiencies if they are being processed for the same type of purchase. At one company, the follow-up audit showed that accounts payable was processing about 12,000 invoices annually for employee business card purchases from the same vendor. Monthly billing could make steep reductions in processing costs. Other problems that have been found include:
It's also possible to test for excessive round numbers when an accountant wants to check for excessive estimating (perhaps royalty receivable schedules) and to test the last two digits to find number invention (perhaps in inventory counts).
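Both tests can be sketched in a few lines. The article does not define what counts as a "round" number, so the code below assumes it means an exact multiple of a chosen base (100 by default). The function names and the sample figures are hypothetical.

```python
from collections import Counter


def round_number_share(values, base=100):
    """Share of values that are exact multiples of `base`.

    Treating 'round' as 'multiple of base' is an assumption made here.
    """
    values = list(values)
    return sum(1 for v in values if v % base == 0) / len(values)


def last_two_digit_counts(counts):
    """Tally the last two digits (00 to 99) of whole-number counts."""
    return Counter(abs(int(c)) % 100 for c in counts)


# Hypothetical royalty receivable estimates, in whole dollars.
royalties = [1200, 3500, 4875, 9000, 2210, 6000]
print(round_number_share(royalties))

# Hypothetical inventory counts; a pile-up on one ending invites a closer look.
inventory = [125, 225, 1325, 40, 725, 318]
print(last_two_digit_counts(inventory).most_common(3))
```

What share of round numbers or repeated endings counts as "excessive" is a judgment the accountant has to make for the data at hand; the sketch only produces the tallies.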
Subset tests identify small lists of serious anomalies in large data sets, making an analysis much more manageable. They focus on errors as opposed to biases, fraud or processing inefficiencies. Data subsets are natural groupings of the data. In accounts payable, the subsets are usually vendor numbers. In banking data, the subsets are usually account numbers. Other subset variables could be data for sales associates in retailing, transaction dates, travel agents in airline data, cost centers and employees in payroll data.
Relative size factor. The RSF test finds subsets where the largest number is out of line with the remaining numbers and is possibly an error. It has detected errors in accounts payable when staff miscoded the decimal point in the invoice amount. The relative size factor (RSF) for a subset is: RSF = Largest number in subset / Second largest number in subset. For example, an amount of $452.47 was coded as $45,247. The erroneous $45,247 greatly exceeded all the other payments to that vendor, and the resulting high RSF exposed the error.
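The RSF calculation is simple enough to sketch directly from the formula above. The vendor numbers and most payment amounts below are hypothetical; only the miscoded $45,247 comes from the example. Skipping subsets with fewer than two amounts, or with a second-largest amount that is not positive, is an assumption made here to avoid dividing by zero.

```python
from collections import defaultdict


def relative_size_factors(records):
    """Compute RSF = largest / second largest for each subset.

    `records` is an iterable of (subset_key, amount) pairs, for example
    (vendor number, invoice amount).
    """
    by_subset = defaultdict(list)
    for key, amount in records:
        by_subset[key].append(amount)

    factors = {}
    for key, amounts in by_subset.items():
        if len(amounts) < 2:
            continue  # no second-largest amount to compare against
        largest, second = sorted(amounts, reverse=True)[:2]
        if second <= 0:
            continue  # ratio is undefined or not meaningful
        factors[key] = largest / second
    return factors


# Hypothetical accounts payable data: (vendor number, invoice amount).
payments = [
    ("V001", 452.10),
    ("V001", 45247.00),  # $452.47 keyed with the decimal point misplaced
    ("V001", 398.75),
    ("V002", 120.00),
    ("V002", 135.50),
    ("V003", 980.00),    # single payment, so no RSF is computed
]

rsf = relative_size_factors(payments)
for vendor, factor in sorted(rsf.items(), key=lambda item: -item[1]):
    print(vendor, round(factor, 2))
```

Sorting by RSF in descending order puts the vendors most likely to contain a miscoded amount at the top of the list, which is what keeps the follow-up manageable on a large data set.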
A company in the Midwest wired $600,000 to what it thought was a vendor but actually was a charity. The $600,000 was significantly in excess of the amount usually donated to the charity. Had the company run the RSF test using the recipient's checking account numbers as the subset variable, the test would have detected that an amount of this magnitude had never before been wired to that account number. The test is designed to detect data errors. For example, a high RSF in payroll data could signal an overtime error and a high RSF for inventories could signal a calculation or count error.
Same, same, different. This test also detects errors by identifying near-identical entries. In accounts payable data the test is often used to identify cases in which the invoice number is the same, the dollar amount is the same an