## Posts filed under ‘Statistics’

### Hard Drive Failure Statistics at Google

Statistics about hard drives from Google’s data centers were published at the USENIX FAST ’07 conference, showing how different variables affect the failure rate of hard drives. AFR stands for “Annual Failure Rate”.

The older drives may be failing more simply because they are a less advanced batch.

Low, medium, and high specify the usage.

Surprisingly, 35-45 degrees C is the sweet spot for hard drives. Drives running at colder temperatures actually failed more often.

SMART will catch about half of hard disk failures before they happen, but many drives will fail without any warning.

Pinheiro, E., Weber, W., & Barroso, L. (2007). Failure Trends in a Large Disk Drive Population. Proceedings from FAST ’07: *USENIX Conference on File and Storage Technologies*, 5. [PDF]

### The First Digit Law

If I were to pick a random city in the world, and tell you its population, what might the first digit of that number be?

You may think there’s equal probability for the first digit to be 1 to 9, but over 30% of the time it’s 1 (one).

Why? Think about it this way: let’s say a stock price *doubles* every year, starting at $100/share. It spends a year with a first digit of 1 until it reaches $200, a year at $2xx or $3xx until it reaches $400, a year at $4xx, $5xx, $6xx, or $7xx until it reaches $800, and then only about four months at $8xx or $9xx. All of a sudden it’s at $1,000 and the first digit is 1 again, and it takes another full year to reach $2,000. The stock price spends a disproportionate amount of time beginning with the digit 1.
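The doubling argument can be checked with a few lines of arithmetic. This is only a sketch: the function name `years_with_leading_digit` and the one-year doubling period are illustrative assumptions, and it considers a single tenfold rise (for example $100 to $1,000).

```python
import math

def years_with_leading_digit(digit, doubling_years=1.0):
    """Years a steadily doubling price spends with `digit` as its first digit
    during one tenfold rise (e.g. $100 to $1,000)."""
    # Growing from d*100 to (d+1)*100 takes log2((d+1)/d) doubling periods.
    return doubling_years * math.log2((digit + 1) / digit)

# Total time for the tenfold rise: log2(10), a bit over three doubling periods.
decade_years = sum(years_with_leading_digit(d) for d in range(1, 10))

# Share of that time spent on each leading digit.
shares = {d: years_with_leading_digit(d) / decade_years for d in range(1, 10)}

for d in range(1, 10):
    print(f"{d}: {years_with_leading_digit(d):.2f} years, {shares[d]:.1%} of the time")
```

Under these assumptions the digit 1 accounts for roughly 30% of the time, and the share shrinks steadily for each higher digit.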

Many things in nature grow exponentially, which spreads their values evenly across a logarithmic scale. Benford observed this first-digit phenomenon in places including populations, addresses, baseball statistics, area of rivers, specific heats of compounds, and death rates. This rule has been used to identify accounting fraud, where made-up numbers don’t match the distribution found in real accounting numbers.

Benford sampled over 20,000 numbers, and noticed that the first digits were distributed as follows:

| Digit | Occurrence |
|---|---|
| 1 | 30.6% |
| 2 | 18.5% |
| 3 | 12.4% |
| 4 | 9.4% |
| 5 | 8.0% |
| 6 | 6.4% |
| 7 | 5.1% |
| 8 | 4.9% |
| 9 | 4.7% |

This can be closely modeled using the logarithmic distribution

**F_a = log₁₀(1 + 1/a)**

where F_a is the frequency with which the digit **a** appears as the first digit of “used” numbers, meaning numbers that occur in real-world data.
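To see how close the fit is, the formula can be evaluated next to the observed percentages. The sketch below assumes the logarithm is base 10; the names `observed`, `first_digit_frequency`, and `predicted` are illustrative, and the observed values are copied from the table above.

```python
import math

# Benford's observed first-digit percentages, copied from the table above.
observed = {1: 30.6, 2: 18.5, 3: 12.4, 4: 9.4, 5: 8.0,
            6: 6.4, 7: 5.1, 8: 4.9, 9: 4.7}

def first_digit_frequency(a):
    """Predicted frequency of leading digit a, taking the log as base 10."""
    return math.log10(1 + 1 / a)

# Predicted percentages for digits 1 through 9.
predicted = {a: 100 * first_digit_frequency(a) for a in observed}

for a in observed:
    gap = observed[a] - predicted[a]
    print(f"{a}: observed {observed[a]:.1f}%, predicted {predicted[a]:.1f}%, gap {gap:+.1f}")
```

The predicted percentages sum to 100, and each one lands within a percentage point of the observed value.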

Additionally, the frequency of the n-th digit of a number can also be calculated using a similar formula, presented in the paper.
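As a rough illustration, the commonly stated generalization sums the first-digit expression over every possible run of preceding digits. The notation here may differ from the paper’s, and `nth_digit_frequency` is a hypothetical helper name, so treat this as a sketch rather than the paper’s exact formula.

```python
import math

def nth_digit_frequency(n, d):
    """Frequency of digit d in position n (n = 1 is the leading digit),
    using the usual generalization of the first-digit formula."""
    if n == 1:
        # A leading digit is never 0.
        return 0.0 if d == 0 else math.log10(1 + 1 / d)
    # Sum over every possible prefix k made of the first n-1 digits.
    lo, hi = 10 ** (n - 2), 10 ** (n - 1)
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(lo, hi))

# Distribution of the second digit, which may be 0 through 9.
second_digit = [nth_digit_frequency(2, d) for d in range(10)]
for d, freq in enumerate(second_digit):
    print(f"second digit {d}: {freq:.1%}")
```

The second digit is much closer to uniform than the first, and later positions flatten out further.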

This is the law of anomalous numbers. We’ve learned to count 1, 2, 3, 4, … but nature counts 1, 2, 4, 8, …

Benford, F. (1938). The Law of Anomalous Numbers. *Proceedings of the American Philosophical Society*, 78(4), 551-572.