Olympus-OM

[OM] Google's hard drive failure study

Subject: [OM] Google's hard drive failure study
From: "Jez Cunningham" <jez.cunningham@xxxxxxxxx>
Date: Wed, 11 Apr 2007 13:48:44 +0200
Cross-posted from a (Nik*n) mailing list
http://www.juergenspecht.com/lists/d1scussion/
I hope I don't upset anyone by forwarding this...


Conventional wisdom states that the more you use your hard drive --
or, for that matter, the hotter your hard drive gets -- the more
likely it is to crash.  That certainly sounds plausible, but is it
true?  According to Google, the answer is a resounding "NO!"

How would Google know?  Well, remember that when you use Google to
search the internet you aren't really searching the internet.  You're
searching Google's copy of the internet, the files that Google's
spiders [a.k.a., "Googlebots"] find, vacuum up, and send back to the
Google mothership.  To store all of this data, Google uses a gazillion
hard drives [100,000 or more] in its data centers scattered around the
world.  And like any well-run data center, Google's data centers
constantly monitor and record data on the health status of every hard
drive.

Google employees Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz Andre
Barroso gathered in-depth data from over 100,000 disk drives deployed
throughout Google and discovered that

   * Contrary to previously reported results, there is very little
     correlation between failure rates and either elevated
     temperature or activity levels.

   * However, some SMART parameters (scan errors, reallocation
     counts, offline reallocation counts, and probational counts)
     have a HUGE impact on hard drive failure probability.

   * Given the lack of occurrence of predictive SMART signals on a
     large fraction of failed drives, it is unlikely that an accurate
     predictive failure model can be built based on these signals
     alone.

Google's complete report, titled "Failure Trends in a Large Disk Drive
Population", is a 241 KB, 13-page Adobe Acrobat (PDF) file that you can
download at

   http://216.239.37.132/papers/disk_failures.pdf

The bad news is that this report reads a bit like stereo instructions.
If you aren't a techie, skip the PDF and check out Gizmodo's or
StorageMojo's summaries instead at

    http://tinyurl.com/yw7db8
    http://storagemojo.com/?p=378

Long story short: Most of what we know about hard drive failure rates
and causes is wrong.


