Ian Collins wrote:
David Magda wrote:
On Jun 30, 2009, at 14:08, Bob Friesenhahn wrote:
I have seen UPSs help quite a lot for short glitches lasting
seconds, or a minute. Otherwise the outage is usually longer than
the UPSs can stay up since the problem required human attention.
A standby generator is needed for any long outages.
Can't remember where I read the claim, but supposedly if power isn't
restored within about ten minutes, then it will probably be out for a
few hours. If this 'statistic' is true, it would mean that your UPS
should last (say) fifteen minutes, and after that you really need a
generator.
Or run your systems off DC and get as much backup as you have room (and
budget!) for batteries. I once visited a central exchange with 48
hours of battery capacity...
The way Google handles UPSes is to have a small 12v battery integrated
with each PC power supply. When the machine is on, the battery is kept
charged. This is not unlike a laptop, in that it has a built-in
battery backup, but it uses an inexpensive sealed lead acid battery
instead of lithium ion. Here is info along with photos of the Google
server internals:
http://news.cnet.com/8301-1001_3-10209580-92.html
http://willysr.blogspot.com/2009/04/googles-server-design.html
(IIRC there have been power supply UPSes since at least the late 1980s
which had an internal battery. Either that or they were UPSes that fit
inside the standard PC (AT) compatible desktop case, making the power
protection system entirely internal to the computer. I think I saw
these models one time while browsing late 1980s or early 1990s issues of
PC Magazine that reviewed UPSes. They still exist...one company selling
them is http://www.globtek.com/html/ups.html . A Google search for
'power supply built in UPS' would likely find more.)
I also did additional searches in the zfs-discuss archives and found a
thread from mid-February, which led me to other threads. It looks like
there are still scattered instances where ZFS has not recovered
gracefully from power failures or other failures, where it became
necessary to perform a manual transaction group (txg) rollback. Here is
a consolidated list of links related to manual uberblock transaction
group (txg) rollback and similar ZFS data recovery guides, including
undeleting:
Section 1: Nathan Hand's guide and related thread
Nathan Hand's guide to invalidating uberblocks (Dec 2008 thread)
http://www.opensolaris.org/jive/thread.jspa?threadID=85794
or http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg22153.html
Section 2: Victor Latushkin's guide and related threads
Thread: zpool unimportable (corrupt zpool metadata??) but no zdb -l
device problems (Oct 2008 to Feb 2009 thread)
http://www.opensolaris.org/jive/thread.jspa?threadID=76960
or http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg19839.html
Repair report: Re: Solved - a big THANKS to Victor Latushkin @ Sun / Moscow
http://www.opensolaris.org/jive/message.jspa?messageID=289537#289537
Some recovery discussion by Victor: "zdb -bv alone took several hours to
walk the block tree"
http://www.opensolaris.org/jive/message.jspa?messageID=292991#292991
or
http://mail.opensolaris.org/pipermail/zfs-discuss/2008-October/022365.html
or http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg20095.html
Victor Latushkin's guide: "Thanks to COW nature of ZFS it was possible
to successfully recover pool state which was only 5 seconds older than
last unopenable one."
http://mail.opensolaris.org/pipermail/zfs-discuss/2008-October/022331.html
or http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg20061.html
Section 3: reliability debates, recovery tool planning, uberblock info
Thread: Availability: ZFS needs to handle disk removal / driver failure
better (August 2008 thread)
http://www.opensolaris.org/jive/thread.jspa?threadID=70811
or http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg19057.html
Thread: ZFS: unreliable for professional usage? (Feb 2009 thread)
http://www.opensolaris.org/jive/thread.jspa?threadID=91426
or http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg23833.html
Richard Elling's post that "uberblocks are kept in an 128-entry circular
queue which is 4x redundant with 2 copies each at the beginning and end
of the vdev. Other metadata, by default, is 2x redundant and spatially
diverse."
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg24145.html
Jeff Bonwick's post about Bug ID 6667683
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg23961.html
Bug ID 6667683: need a way to rollback to an uberblock from a previous txg
Description: If we are unable to open the pool based on the most recent
uberblock then it might be useful to try an older txg uberblock as it
might provide a better view of the world. Having a utility to reset the
uberblock to a previous txg might provide a nice recovery mechanism.
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6667683
Uberblock information
http://blogs.sun.com/blogfinger/entry/zfs_and_the_uberblock
http://blogs.sun.com/blogfinger/entry/zfs_and_the_uberblock_part
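To make Richard Elling's description and the idea behind Bug ID 6667683 a
bit more concrete, here is a rough Python sketch of where the uberblock
copies would sit on a vdev and how candidates for a txg rollback might be
ordered. It is only an illustration, not how zdb or the kernel does it. The
sizes are assumptions taken from my reading of the ZFS on-disk format (256
KiB labels, a 128 KiB uberblock ring at the end of each label, 1 KiB slots,
which is where the 128 entries come from), and all function names are made
up for this example.

```python
# Assumed layout constants (from the ZFS on-disk format as I understand it).
LABEL_SIZE = 256 * 1024       # each vdev label is assumed to be 256 KiB
UBERBLOCK_RING = 128 * 1024   # ring assumed to fill the last 128 KiB of a label
SLOT_SIZE = 1024              # assumed 1 KiB per uberblock slot
SLOT_COUNT = UBERBLOCK_RING // SLOT_SIZE  # 128 entries, matching the quote


def label_offsets(vdev_size):
    """Byte offsets of the four labels: two at the start, two at the end."""
    usable = vdev_size - vdev_size % LABEL_SIZE  # align down to label size
    return [0, LABEL_SIZE, usable - 2 * LABEL_SIZE, usable - LABEL_SIZE]


def uberblock_slot(txg):
    """Slot in the circular queue that a given txg would be written to."""
    return txg % SLOT_COUNT


def uberblock_offsets(vdev_size, txg):
    """Byte offsets of the four copies of the uberblock for this txg."""
    ring_start = LABEL_SIZE - UBERBLOCK_RING
    slot_offset = ring_start + uberblock_slot(txg) * SLOT_SIZE
    return [label + slot_offset for label in label_offsets(vdev_size)]


def rollback_candidates(uberblocks):
    """Order txgs to try, newest first, skipping copies that failed checksum.

    uberblocks is a hypothetical list of (txg, checksum_ok) tuples, for
    example collected by reading every slot in every label.
    """
    return sorted({txg for txg, ok in uberblocks if ok}, reverse=True)


if __name__ == "__main__":
    size = 1 << 30  # a 1 GiB vdev, chosen only for illustration
    print(label_offsets(size))
    print(uberblock_slot(130), uberblock_offsets(size, 130))
    print(rollback_candidates([(100, True), (102, False), (101, True)]))
```

A manual rollback along the lines of the guides above would then amount to
making the pool open with one of the older txgs from that candidate list
instead of the newest one.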
Section 4: undeleting
Recovering removed file on zfs disk using a modified mdb and zdb (i.e.
undelete)
http://mbruning.blogspot.com/2008/08/recovering-removed-file-on-zfs-disk.html
Re: [zfs-discuss] Forensic analysis [was: more ZFS recovery] (listed
because forensic analysis tools often overlap with undeletion tools/data
recovery tools)
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg18557.html
http://opensolaris.org/os/project/forensics/ZFS-Forensics/
Thanks everyone for the input you've given so far.
-hk
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss