I can check on Monday, but the system will probably panic... which doesn't really help :-)
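When I do re-run it I'll set the property explicitly rather than relying on the default. A rough sketch of what I have in mind (same pool name as in the test below; as far as I know the failmode property accepts wait, continue or panic):

# zpool get failmode usbtest
# zpool set failmode=panic usbtest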
Am I right in thinking failmode=wait is still the default? If so, that's how it should have been set, since this testing was done on a clean install of snv_106.

From what I've seen, I don't think this is a problem with the zfs failmode. It's more an issue of what happens in the period *before* zfs realises there's a problem and applies the failmode. This time there was just a window of a couple of minutes during which commands would continue; in the past I've managed to stretch it out to hours.

To me the biggest problems are:
- ZFS accepting writes that don't happen (from both before and after the drive is removed)
- No logging or warning of this in zpool status

I appreciate that if you're using cache, some data loss is pretty much inevitable when a pool fails, but that should be a few seconds' worth of data at worst, not minutes' or hours' worth.

Also, if a pool fails completely and there's data in the cache that hasn't been committed to disk, it would be great if Solaris could respond by:
- immediately dumping the cache to any (all?) working storage
- prompting the user to fix the pool, or to save the cache before powering down the system

Ross

On Fri, Feb 6, 2009 at 5:49 PM, Richard Elling <richard.ell...@gmail.com> wrote:
> Ross, this is a pretty good description of what I would expect when
> failmode=continue. What happens when failmode=panic?
> -- richard
>
> Ross wrote:
>>
>> Ok, it's still happening in snv_106:
>>
>> I plugged a USB drive into a freshly installed system and created a single-disk zpool on it:
>> # zpool create usbtest c1t0d0
>>
>> I opened the (nautilus?) file manager in gnome and copied the /etc/X11 folder to it. I then copied the /etc/apache folder to it, and at 4:05pm disconnected the drive.
>>
>> At this point there are *no* warnings on screen, or any indication that there is a problem. To check that the pool was still working, I created duplicates of the two folders on that drive. That worked without any errors, although the drive was physically removed.
>>
>> 4:07pm
>> I ran zpool status; the pool is actually showing as unavailable, so at least that has happened faster than in my last test.
>>
>> The folder is still open in gnome; however, any attempt to copy files to or from it just hangs the file transfer window.
>>
>> 4:09pm
>> /usbtest is still visible in gnome, and I can still open a console and use the folder:
>>
>> # cd usbtest
>> # ls
>> X11  X11 (copy)  apache  apache (copy)
>>
>> I also tried:
>> # mv X11 X11-test
>>
>> That hung, but I saw the X11 folder disappear from the graphical file manager, so the system still believes something is working with this pool.
>>
>> The main GUI is actually a little messed up now. The gnome file manager window looking at the /usbtest folder has hung. Also, right-clicking the desktop to open a new terminal hangs, leaving the right-click menu on screen.
>>
>> The main menu still works though, and I can still open a new terminal.
>>
>> 4:19pm
>> Commands such as ls are finally hanging on the pool.
>>
>> At this point I tried to reboot, but it appears that isn't working. I used system monitor to kill everything I had running and tried again, but that didn't help.
>>
>> I had to physically power off the system to reboot.
>>
>> After the reboot, as expected, /usbtest still exists (even though the drive is disconnected). I removed that folder and connected the drive.
>> ZFS detects the insertion and automounts the drive. The pool shows as online and the filesystem shows as mounted at /usbtest, but the /usbtest directory doesn't exist.
>>
>> I had to export and import the pool to get it available, but as expected, I've lost data:
>> # cd usbtest
>> # ls
>> X11
>>
>> Even worse, zfs is completely unaware of this:
>> # zpool status -v usbtest
>>   pool: usbtest
>>  state: ONLINE
>>  scrub: none requested
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         usbtest     ONLINE       0     0     0
>>           c1t0d0    ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> So in summary, there are a good few problems here, many of which I've already reported as bugs:
>>
>> 1. ZFS still accepts read and write operations for a faulted pool, causing data loss that isn't necessarily reported by zpool status.
>> 2. Even after writes start to hang, it's still possible to continue reading data from a faulted pool.
>> 3. A faulted pool causes unwanted side effects in the GUI, making the system hard to use and impossible to reboot.
>> 4. After a hard reset, ZFS does not recover cleanly. Unused mountpoints are left behind.
>> 5. Automatic mounting of pools doesn't seem to work reliably.
>> 6. zpool status doesn't report any problems mounting the pool.
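PS. For anyone who wants to try this without wading through the blow-by-blow above, the test boils down to roughly the following. This is only a sketch: the pool name, device and paths are just the ones I happened to use, I actually did the copies through the gnome file manager rather than cp, and the scrub at the end is simply there to see whether ZFS flags anything on the data that survived.

# zpool create usbtest c1t0d0
# cp -r /etc/X11 /usbtest
# cp -r /etc/apache /usbtest
(physically pull the USB drive, then keep using the pool)
# cp -r /usbtest/X11 /usbtest/X11-copy
# zpool status usbtest
(hard power off, reboot, remove the stale /usbtest directory, reattach the drive)
# zpool export usbtest
# zpool import usbtest
# ls /usbtest
# zpool scrub usbtest
# zpool status -v usbtest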