On 19.01.09 12:09, Tom Bird wrote:
Toby Thain wrote:
On 18-Jan-09, at 6:12 PM, Nathan Kroenert wrote:

Hey, Tom -

Correct me if I'm wrong here, but it seems you are not allowing ZFS any
sort of redundancy to manage.

Every other file system out there runs fine on a single LUN, when things
go wrong you have a fsck utility that patches it up and the world keeps
on turning.

I can't find anywhere that will sell me a 48 drive SATA JBOD with all
the drives presented on a single SAS channel, so running on a single
giant LUN is a real world scenario that ZFS should be able to cope with,
as this is how the hardware I am stuck with is arranged.

Which is particularly catastrophic when one's 'content' is organized as
a monolithic file, as it is here - unless, of course, you have some way
of scavenging that file based on internal structure.

No, it's not a monolithic file, the point I was making there is that no
files are showing up.

r...@cs4:~# find /content
/content
r...@cs4:~# (yes that really is it)

This issue (and the previous one reported by Tom) has received some publicity recently; see:

http://www.uknof.org.uk/uknof13/Bird-Redux.pdf

So I feel I should provide a little more information about the outcome (sorry that it is delayed and not as complete as the previous report).

First, it looked like this:

r...@cs4:~# zpool list
NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
content  62.5T  59.9T  2.63T    95%  ONLINE  -

r...@cs4:~# zpool status -v
  pool: content
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        content     ONLINE       0     0    32
          c2t8d0    ONLINE       0     0    32

errors: Permanent errors have been detected in the following files:

        content:<0x0>
        content:<0x2c898>


The first permanent error means that the root block of the filesystem named 'content' was corrupted (all copies of it), so it was not possible to open the filesystem or access anything in it.
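As a rough illustration of how one might spot this condition in a script, here is a minimal sketch that reads `zpool status -v` output and prints the dataset name for every permanent-error entry of the form `name:<0x0>`. The function name `root_damaged_datasets` is hypothetical, and the matching relies only on the entry format shown in the output above.

```shell
# Hypothetical helper (not part of ZFS): read `zpool status -v` output on
# stdin and print the dataset name of each permanent-error entry whose
# object id is <0x0>, e.g. "content:<0x0>" prints "content".
root_damaged_datasets() {
    sed -n 's/^[[:space:]]*\([^:[:space:]]*\):<0x0>[[:space:]]*$/\1/p'
}

# Possible use on a live system:
#   zpool status -v content | root_damaged_datasets
```

Entries with any other object id, such as `content:<0x2c898>`, are ignored by this filter.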

Fortunately, there had not been much activity on the pool, so we decided to try earlier states of the pool. I do not remember the exact txg number we tried, but it was something like a hundred txgs back. We checked that state with zdb and found it was more or less good: at least the filesystem 'content' could be opened and its contents accessed, so we decided to reactivate that previous state. The pool imported fine and the contents of 'content' were there. A subsequent scrub did find some errors, but I do not remember exactly how many. Tom may have the exact number.
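For readers who want to look for candidate earlier states themselves, the following is only a sketch and not the exact procedure we used. It assumes a current OpenZFS where `zdb -ul <device>` dumps the uberblocks stored in the device labels as lines of the form `txg = N`. The helper name `list_txgs` is hypothetical, and the slice name `s0` in the usage comment is a guess.

```shell
# Hypothetical helper: read an uberblock dump on stdin (lines such as
# "txg = 12345") and print the distinct txg numbers, newest first.
list_txgs() {
    awk '$1 == "txg" && $2 == "=" { print $3 }' | sort -rn | uniq
}

# Possible use on a live system (device name taken from the status output
# above; the slice suffix is an assumption):
#   zdb -ul /dev/dsk/c2t8d0s0 | list_txgs
#
# On current OpenZFS, a dry run of the built-in rewind recovery can be
# requested without changing the pool:
#   zpool import -F -n content
# The man page also documents "-T <txg>" for importing at a specific txg and
# warns that it can be extremely hazardous, so a read-only import
# (-o readonly=on) is the safer first step.
```

The list only shows which txgs still have an uberblock in the labels. Whether the blocks those uberblocks point to are still intact has to be checked separately, which is what we used zdb for.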

Victor