Re: [zfs-discuss] ZFS: unreliable for professional usage?

Ross Smith Sat, 14 Feb 2009 22:12:09 -0800

Hey guys,

I'll let this die in a sec, but I just wanted to say that I've gone
and read the on-disk format document again this morning, and to be
honest, Richard, without the description you just wrote, I really
wouldn't have known that uberblocks are in a 128-entry circular queue
that's 4x redundant.

Please understand that I'm not asking for answers to these notes; this
post is purely to illustrate to you ZFS guys that, much as I appreciate
having the ZFS docs available, they are very tough going for anybody
who isn't a ZFS developer.  I consider myself well above average in IT
ability, and I've really spent quite a lot of time in the past year
reading around ZFS, but even so I would definitely have come to the
wrong conclusion regarding uberblocks.

Richard's post I can understand really easily, but in the on-disk
format docs that information is spread over 7 pages of really quite
technical detail and, to be honest, for a user like myself it raises
as many questions as it answers:

On page 6 I learn that labels are stored on each vdev, as well as each
disk.  So there will be a label on the pool, mirror (or raid group),
and disk.  I know the disk ones are at the start and end of the disk,
and it sounds like the mirror vdev is in the same place, but where is
the root vdev label?  The example given doesn't mention its location
at all.

Then, on page 7 it sounds like the entire label is overwritten whenever
on-disk data is updated - "any time on-disk data is overwritten, there
is potential for error".  To me, it sounds like it's not a 128-entry
queue, but just a group of 4 labels, all of which are overwritten as
data goes to disk.

Then finally, on page 12 the uberblock is mentioned (although as an
aside, the first time I read these docs I had no idea what the
uberblock actually was).  It does say that only one uberblock is
active at a time, but with it being part of the label I'd just assume
these were overwritten as a group.
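
A rough sketch of the layout Richard describes may make the difference
clearer.  It is a simplified model, not ZFS code: the function names are
made up for illustration, and the sizes (256 KiB labels, the uberblock
ring in the second half of each label, 1 KiB slots) and the "slot = txg
mod 128" rule are assumptions based on my reading of the on-disk format
document for 512-byte-sector devices.

```python
LABEL_SIZE = 256 * 1024             # assumed size of one vdev label
UBERBLOCK_RING_OFFSET = 128 * 1024  # assumed: ring is the second half of a label
UBERBLOCK_SLOTS = 128               # the 128-entry circular queue
UBERBLOCK_SIZE = 1024               # assumed slot size


def label_offsets(device_size):
    """Byte offsets of the four labels: two at the start, two at the end."""
    usable = device_size - device_size % LABEL_SIZE
    return [0, LABEL_SIZE, usable - 2 * LABEL_SIZE, usable - LABEL_SIZE]


def slot_for_txg(txg):
    """Ring slot a transaction group's uberblock lands in (assumed txg mod 128)."""
    return txg % UBERBLOCK_SLOTS


def uberblock_offsets(device_size, txg):
    """Where the four copies of one uberblock would sit on the device."""
    slot = slot_for_txg(txg)
    return [base + UBERBLOCK_RING_OFFSET + slot * UBERBLOCK_SIZE
            for base in label_offsets(device_size)]


def active_uberblock(ring):
    """Return the slot index holding the highest txg with a good checksum.

    `ring` is a list of 128 entries, each None or a (txg, checksum_ok) pair.
    """
    best = None
    for index, entry in enumerate(ring):
        if entry is None:
            continue
        txg, checksum_ok = entry
        if checksum_ok and (best is None or txg > ring[best][0]):
            best = index
    return best


# Simulate 300 transaction groups: only one slot is rewritten per txg,
# so the previous 127 uberblocks stay on disk untouched.
ring = [None] * UBERBLOCK_SLOTS
for txg in range(1, 301):
    ring[slot_for_txg(txg)] = (txg, True)
```

In this model a damaged newest uberblock does not mean the label is
lost: marking its slot as failing the checksum makes active_uberblock()
fall back to the previous transaction group, which is the idea behind
reverting to earlier uberblocks.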

And that's why I'll often throw ideas out - I can either rely on my
own limited knowledge of ZFS to say if it will work, or I can take
advantage of the excellent community we have here, and post the idea
for all to see.  It's a quick way for good ideas to be improved upon,
and bad ideas consigned to the bin.  I've done it before in my rather
lengthy 'zfs availability' thread.  My thoughts there were thrashed
out nicely, with some quite superb additions (namely the concept of
lopsided mirrors, which I think are a great idea).

Ross

PS.  I've also found why I thought you had to search for these blocks:
it was after reading this thread, where somebody used mdb to search a
corrupt pool to try to recover data:
http://opensolaris.org/jive/message.jspa?messageID=318009

On Fri, Feb 13, 2009 at 11:09 PM, Richard Elling
<richard.ell...@gmail.com> wrote:
> Tim wrote:
>>
>>
>> On Fri, Feb 13, 2009 at 4:21 PM, Bob Friesenhahn
>> <bfrie...@simple.dallas.tx.us> wrote:
>>
>>    On Fri, 13 Feb 2009, Ross Smith wrote:
>>
>>        However, I've just had another idea.  Since the uberblocks are
>>        pretty vital in recovering a pool, and I believe it's a fair
>>        bit of work to search the disk to find them, might it be a good
>>        idea to allow ZFS to store uberblock locations elsewhere for
>>        recovery purposes?
>>
>>
>>    Perhaps it is best to leave decisions on these issues to the ZFS
>>    designers who know how things work.
>>
>>    Previous descriptions from people who do know how things work
>>    didn't make it sound very difficult to find the last 20
>>    uberblocks.  It sounded like they were at known points for any
>>    given pool.
>>
>>    Those folks have surely tired of this discussion by now and are
>>    working on actual code rather than reading idle discussion between
>>    several people who don't know the details of how things work.
>>
>>
>>
>> People who "don't know how things work" often aren't tied down by the
>> baggage of knowing how things work.  Which leads to creative solutions those
>> who are weighed down didn't think of.  I don't think it hurts in the least
>> to throw out some ideas.  If they aren't valid, it's not hard to ignore them
>> and move on.  It surely isn't a waste of anyone's time to spend 5 minutes
>> reading a response and weighing if the idea is valid or not.
>
> OTOH, anyone who followed this discussion the last few times, has looked
> at the on-disk format documents, or reviewed the source code would know
> that the uberblocks are kept in a 128-entry circular queue which is 4x
> redundant with 2 copies each at the beginning and end of the vdev.
> Other metadata, by default, is 2x redundant and spatially diverse.
>
> Clearly, the failure mode being hashed out here has resulted in the defeat
> of those protections. The only real question is how fast Jeff can roll out
> the feature to allow reverting to previous uberblocks.  The procedure for doing
> this by hand has long been known, and was posted on this forum -- though
> it is tedious.
> -- richard
>
>
