Dan Price wrote:
> On Thu 14 Aug 2008 at 03:37PM, Evan Layton wrote:
>> This error is coming from ZFS. Did you change out one of your disks in
>> the mirror recently? If so you may want to run format on that disk and
>> see if it has an EFI label on it. If it does you'll have to break the
>> mirror and remove that disk from the mirror, re-label it and add it
>> back into the mirror.
>
> Evan, I would not recommend this procedure. Doing so will likely
> (although probably not surely) result in an unbootable system.

I guess I just got lucky when this worked for me. This, in a nutshell, is
what Lori had indicated I should try, but I can see from what you've
written here that I either misunderstood her or missed something.

Thanks for the added information and for letting us know that what I had
suggested was a problem!

-evan

>
> Yesterday I saw that I had an EFI labelled disk in my root pool,
> by accident. And so set out to fix the issue. I did what you'd expect:
> detach the device, re-fdisk it, then repartition it with format -e, and
> with an SMI label.
>
> The end result of my fiddling was a machine which would not boot
> build 95. As I tried various remedies (like installgrub, boot
> to the cd and massage the pool, etc), the problem got worse until the
> system could not boot any of my BEs anymore.
>
> Today I was lucky enough to have Lin, George and Erik from the ZFS team
> all in my office helping me to debug this. They were awesome and we
> quickly got to a root cause.
>
> The heart of the problem is that /etc/zfs/zpool.cache in the boot
> archive and the pool configuration stored in the disks themselves can
> get out of sync with each other. That's bad, because when ZFS tries to
> reconcile them at boot time, it will get upset and panic, thinking that
> the pool is damaged. This can happen when you do a mirror attach or
> detach because apparently disk GUIDs in the pool can change as the
> pool topology changes and mirror vdevs come and go. We stepped
> through the problem with KMDB and watched ZFS load up a healthy pool,
> then shoot it down as broken due to this reconciliation problem.
>
> If you want to remove an EFI labelled disk from your root pool, my advice
> to you would be to do the following. Note that I have not tested this
> particular sequence, but I think it will work. Hah.
>
> 0) Backup your data and settings.
>
> 1) 'zpool detach' the EFI labelled disk from your pool. After you do this
> YOU MUST NOT REBOOT. Your system is now in a fragile state.
>
> 2) Run 'zpool status' to ensure that your pool now has one disk.
>
> 3) Edit /etc/boot/solaris/filelist.ramdisk. Remove the only line in the
> file:
>
>     etc/zfs/zpool.cache
>
> 4) Delete /platform/i86pc/boot_archive and /platform/i86pc/amd64/boot_archive
>
> 5) Run 'bootadm update-archive' -- This rebuilds the boot archive,
> omitting the zpool.cache file.
>
> It may also be necessary to do installgrub at this point. Probably, and
> it wouldn't hurt.
>
> 6) Reboot your system, to ensure that you have a working configuration.
>
> In Nevada, this is not an issue (George told me) because the boot archive
> omits the zpool.cache file, so there's never any state to get out of sync.
> I was left wondering why we populate /etc/boot/solaris/filelist.ramdisk
> with "etc/zfs/zpool.cache". At a minimum, if we haven't already, we
> should stop doing that as soon as possible.
>
> I will be filing bugs to cover these issues tomorrow.
>
> -dp
>
_______________________________________________
indiana-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/indiana-discuss
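
For anyone who wants the steps from Dan's message in one place, here is a minimal shell sketch. It is not a tested recovery procedure (Dan notes he had not tested the sequence himself). The function names `strip_zpool_cache` and `print_plan` are made up for illustration, and the pool and device names in the usage note are hypothetical placeholders. `print_plan` only prints the commands, so nothing touches the pool. `strip_zpool_cache` does edit the file it is given, so try it on a copy first.

```shell
#!/bin/sh
# Sketch only, assuming the procedure quoted above; run nothing blindly.

# Step 3 helper (hypothetical name): remove the etc/zfs/zpool.cache entry
# from a filelist.ramdisk file. The path is an argument so the edit can be
# tried on a copy before touching /etc/boot/solaris/filelist.ramdisk.
strip_zpool_cache() {
    list="$1"
    [ -f "$list" ] || return 1
    tmp="$list.tmp.$$"
    # grep -v exits non-zero when no lines are left, which is expected
    # when the entry is the only line in the file.
    grep -v '^etc/zfs/zpool\.cache$' "$list" > "$tmp" || true
    cat "$tmp" > "$list"    # rewrite in place, keeping owner and mode
    rm -f "$tmp"
}

# Print (do not run) the command sequence for a pool and an EFI-labelled
# device. Both arguments are supplied by the caller.
print_plan() {
    pool="$1"
    dev="$2"
    echo "zpool detach $pool $dev    # step 1: do NOT reboot after this"
    echo "zpool status $pool    # step 2: confirm one disk remains"
    echo "strip_zpool_cache /etc/boot/solaris/filelist.ramdisk    # step 3"
    echo "rm /platform/i86pc/boot_archive /platform/i86pc/amd64/boot_archive    # step 4"
    echo "bootadm update-archive    # step 5, then installgrub if needed"
    echo "reboot    # step 6"
}
```

Usage, with placeholder names: `print_plan rpool c1t1d0s0` lists the commands for review. Step 0 (backing up data and settings) is deliberately left out of the script, and installgrub appears only as a comment because its arguments depend on the boot disk.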
