Re: [zfs-discuss] ZFS tale of woe and fail

Victor Latushkin Wed, 01 Jul 2009 09:38:50 -0700

On 19.01.09 12:09, Tom Bird wrote:

Toby Thain wrote:

On 18-Jan-09, at 6:12 PM, Nathan Kroenert wrote:

Hey, Tom -

Correct me if I'm wrong here, but it seems you are not allowing ZFS any
sort of redundancy to manage.


Every other file system out there runs fine on a single LUN, when things
go wrong you have a fsck utility that patches it up and the world keeps
on turning.

I can't find anywhere that will sell me a 48 drive SATA JBOD with all
the drives presented on a single SAS channel, so running on a single
giant LUN is a real world scenario that ZFS should be able to cope with,
as this is how the hardware I am stuck with is arranged.

Which is particularly catastrophic when one's 'content' is organized as
a monolithic file, as it is here - unless, of course, you have some way
of scavenging that file based on internal structure.


No, it's not a monolithic file, the point I was making there is that no
files are showing up.

r...@cs4:~# find /content
/content
r...@cs4:~# (yes that really is it)

This issue (and the previous one reported by Tom) has received some
publicity recently - see here:


http://www.uknof.org.uk/uknof13/Bird-Redux.pdf

So I feel I need to provide a little more information about the outcome
(sorry that it is delayed and not as complete as the previous one).


First, it looked like this:

r...@cs4:~# zpool list
NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
content  62.5T  59.9T  2.63T    95%  ONLINE  -

r...@cs4:~# zpool status -v
  pool: content
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        content     ONLINE       0     0    32
          c2t8d0    ONLINE       0     0    32

errors: Permanent errors have been detected in the following files:

        content:<0x0>
        content:<0x2c898>

The first permanent error means that the root block of the filesystem
named 'content' was corrupted (all copies), so it was not possible to
open it and access any of that filesystem's contents.
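For illustration only, here is a minimal Python sketch that pulls apart
entries of the form shown above (dataset name, then an object number in
hex). The function names and the regular expression are my own
assumptions, not part of any ZFS tool, and the special-casing of object
0x0 simply follows the explanation given in this message.

```python
import re

# Matches entries such as "content:<0x2c898>", the form `zpool status -v`
# printed above for damaged objects it did not list by file path.
ENTRY = re.compile(r"^\s*(?P<dataset>[^:<\s]+):<0x(?P<obj>[0-9a-fA-F]+)>\s*$")


def parse_error_entries(text):
    """Return (dataset, object_number) pairs found in zpool status output."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            entries.append((m.group("dataset"), int(m.group("obj"), 16)))
    return entries


def describe(dataset, obj):
    """Hypothetical helper: one-line summary of a damaged object."""
    if obj == 0:
        # Assumption taken from this message: <0x0> is the entry that
        # made the whole filesystem impossible to open.
        return f"{dataset}: object 0x0 damaged, filesystem cannot be opened"
    return f"{dataset}: object {obj:#x} damaged"


sample = """
errors: Permanent errors have been detected in the following files:

        content:<0x0>
        content:<0x2c898>
"""

for dataset, obj in parse_error_entries(sample):
    print(describe(dataset, obj))
```

The header line and the rest of the status output do not match the
pattern, so only the two damaged-object entries are reported.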

Fortunately, there was not much activity on the pool, so we decided to
try previous states of the pool. I do not remember the exact txg number
we tried, but it was something like a hundred txgs back. We checked it
with zdb and discovered that that state was more or less good - at least
the filesystem 'content' could be opened and its contents accessed - so
we decided to reactivate that previous state. The pool imported fine and
the contents of 'content' were there. A subsequent scrub did find some
errors, but I do not remember exactly how many. Tom may have the exact
number.
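The search for an older usable state can be pictured with a small Python
sketch. This is a conceptual model only: it does not talk to ZFS, the
function name is hypothetical, and `is_usable` stands in for whatever
manual check is done on a given txg (in our case, inspecting that state
with zdb). The txg numbers in the example are made up.

```python
def newest_usable_txg(candidate_txgs, is_usable):
    """Return the highest txg for which is_usable(txg) is true, else None.

    candidate_txgs: iterable of txg numbers that could still be tried.
    is_usable: hypothetical callback standing in for a manual check.
    """
    for txg in sorted(candidate_txgs, reverse=True):
        if is_usable(txg):
            return txg
    return None


# Made-up example: pretend every state newer than txg 905 is damaged.
chosen = newest_usable_txg(range(900, 1001), lambda txg: txg <= 905)
print(chosen)
```

Walking back from the newest state means the least amount of recent
writes is given up, which is why low activity on the pool helped here.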


Victor
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
