Hey guys, I'll let this die in a sec, but I just wanted to say that I've gone and read the on-disk format document again this morning, and to be honest, Richard, without the description you just wrote I really wouldn't have known that uberblocks are kept in a 128-entry circular queue that's 4x redundant.
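For what it's worth, here is how I'd picture that description as a toy model in Python. It is only a sketch of my understanding, not ZFS code: the names (Vdev, Uberblock, write_uberblock, candidates) are made up for illustration, the "valid" flag stands in for a checksum, "txg" is simply a counter where higher means newer, and choosing the slot as txg modulo 128 is an assumption on my part.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

RING_SLOTS = 128    # entries in the circular queue, per Richard's description
LABEL_COPIES = 4    # 2 copies at the start of the vdev, 2 at the end


@dataclass
class Uberblock:
    txg: int            # hypothetical version counter; higher means newer
    valid: bool = True  # stand-in for "checksum verifies"


class Vdev:
    """Toy in-memory model: four copies of a 128-slot ring, no disk I/O."""

    def __init__(self) -> None:
        self.labels: List[List[Optional[Uberblock]]] = [
            [None] * RING_SLOTS for _ in range(LABEL_COPIES)
        ]

    def write_uberblock(self, txg: int) -> int:
        # Assumption: the slot is the counter modulo the ring size, so the
        # oldest entry is the one that gets overwritten.
        slot = txg % RING_SLOTS
        for ring in self.labels:
            ring[slot] = Uberblock(txg)  # one independent copy per label
        return slot

    def candidates(self) -> List[Uberblock]:
        """Valid uberblocks from all copies, newest first, one per txg."""
        seen: Dict[int, Uberblock] = {}
        for ring in self.labels:
            for ub in ring:
                if ub is not None and ub.valid:
                    seen.setdefault(ub.txg, ub)
        return sorted(seen.values(), key=lambda ub: ub.txg, reverse=True)

    def active(self) -> Optional[Uberblock]:
        """The newest valid uberblock, or None if nothing usable remains."""
        found = self.candidates()
        return found[0] if found else None
```

In this model only one slot in each ring is rewritten per update, rather than the whole group being overwritten, and "reverting to a previous uberblock" amounts to taking the next entry from candidates() when the newest one is unusable in every copy.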
Please understand that I'm not asking for answers to these notes. This post is purely to illustrate to you ZFS guys that, much as I appreciate having the ZFS docs available, they are very tough going for anybody who isn't a ZFS developer. I consider myself well above average in IT ability, and I've spent quite a lot of time in the past year reading around ZFS, but even so I would definitely have come to the wrong conclusion regarding uberblocks.

Richard's post I can understand really easily, but in the on-disk format docs that information is spread over 7 pages of really quite technical detail, and to be honest, for a user like myself it raises as many questions as it answers:

On page 6 I learn that labels are stored on each vdev, as well as each disk. So there will be a label on the pool, mirror (or raid group), and disk. I know the disk ones are at the start and end of the disk, and it sounds like the mirror vdev is in the same place, but where is the root vdev label? The example given doesn't mention its location at all.

Then, on page 7 it sounds like the entire label is overwritten whenever on-disk data is updated: "any time on-disk data is overwritten, there is potential for error". To me, it sounds like it's not a 128-entry queue, but just a group of 4 labels, all of which are overwritten as data goes to disk.

Then finally, on page 12 the uberblock is mentioned (although as an aside, the first time I read these docs I had no idea what the uberblock actually was). It does say that only one uberblock is active at a time, but with it being part of the label I'd just assume these were overwritten as a group.

And that's why I'll often throw ideas out: I can either rely on my own limited knowledge of ZFS to say if it will work, or I can take advantage of the excellent community we have here and post the idea for all to see. It's a quick way for good ideas to be improved upon, and bad ideas consigned to the bin.
I've done it before in my rather lengthy 'zfs availability' thread. My thoughts there were thrashed out nicely, with some quite superb additions (namely the concept of lopsided mirrors, which I think are a great idea).

Ross

PS. I've also found why I thought you had to search for these blocks. It was after reading this thread, where somebody used mdb to search a corrupt pool to try to recover data:
http://opensolaris.org/jive/message.jspa?messageID=318009

On Fri, Feb 13, 2009 at 11:09 PM, Richard Elling <richard.ell...@gmail.com> wrote:
> Tim wrote:
>>
>> On Fri, Feb 13, 2009 at 4:21 PM, Bob Friesenhahn
>> <bfrie...@simple.dallas.tx.us> wrote:
>>
>>     On Fri, 13 Feb 2009, Ross Smith wrote:
>>
>>         However, I've just had another idea. Since the uberblocks are
>>         pretty vital in recovering a pool, and I believe it's a fair bit
>>         of work to search the disk to find them. Might it be a good idea
>>         to allow ZFS to store uberblock locations elsewhere for recovery
>>         purposes?
>>
>>     Perhaps it is best to leave decisions on these issues to the ZFS
>>     designers who know how things work.
>>
>>     Previous descriptions from people who do know how things work
>>     didn't make it sound very difficult to find the last 20
>>     uberblocks. It sounded like they were at known points for any
>>     given pool.
>>
>>     Those folks have surely tired of this discussion by now and are
>>     working on actual code rather than reading idle discussion between
>>     several people who don't know the details of how things work.
>>
>> People who "don't know how things work" often aren't tied down by the
>> baggage of knowing how things work. Which leads to creative solutions those
>> who are weighed down didn't think of. I don't think it hurts in the least
>> to throw out some ideas. If they aren't valid, it's not hard to ignore them
>> and move on. It surely isn't a waste of anyone's time to spend 5 minutes
>> reading a response and weighing if the idea is valid or not.
>
> OTOH, anyone who followed this discussion the last few times, has looked
> at the on-disk format documents, or reviewed the source code would know
> that the uberblocks are kept in an 128-entry circular queue which is 4x
> redundant with 2 copies each at the beginning and end of the vdev.
> Other metadata, by default, is 2x redundant and spatially diverse.
>
> Clearly, the failure mode being hashed out here has resulted in the defeat
> of those protections. The only real question is how fast Jeff can roll out
> the feature to allow reverting to previous uberblocks. The procedure for
> doing this by hand has long been known, and was posted on this forum --
> though it is tedious.
> -- richard
>
> _______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss