On 6 Feb 2009, at 20:54, Ross Smith wrote:
Something to do with cache was my first thought. ZFS seems able to read and write from the cache quite happily for some time, regardless of whether the pool is live. If you're reading or writing large amounts of data, ZFS starts experiencing I/O faults and offlines the pool pretty quickly. If you're just working with small datasets, or viewing files that you've recently opened, it seems you can stretch it out for quite a while. But yes, it seems that it doesn't enter failmode until the cache is full. I would expect it to hit this within 5 seconds, since I believe that is how often the cache should be flushing to disk.
Note that on a lightly loaded system, it's more like 30 seconds these days. -r
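The interval being discussed here is the TXG (transaction group) sync timer, which is exposed as a kernel tunable. As a rough sketch of how to inspect it on a live system (assuming an OpenSolaris build of this era, and assuming the tunable is still named zfs_txg_timeout on your build):

# echo zfs_txg_timeout/D | mdb -k

That should print something like "zfs_txg_timeout: 30", in seconds. It can also be changed on the fly (here to 5 seconds; the change does not persist across a reboot):

# echo zfs_txg_timeout/W 0t5 | mdb -kw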
On Fri, Feb 6, 2009 at 7:04 PM, Brent Jones <br...@servuhome.net> wrote:

On Fri, Feb 6, 2009 at 10:50 AM, Ross Smith <myxi...@googlemail.com> wrote:

I can check on Monday, but the system will probably panic... which doesn't really help :-)

Am I right in thinking failmode=wait is still the default? If so, that should be how it's set, as this testing was done on a clean install of snv_106.

From what I've seen, I don't think this is a problem with the ZFS failmode. It's more an issue of what happens in the period *before* ZFS realises there's a problem and applies the failmode. This time there was just a window of a couple of minutes while commands would continue; in the past I've managed to stretch it out to hours.

To me the biggest problems are:
- ZFS accepting writes that don't happen (from both before and after the drive is removed)
- No logging or warning of this in zpool status

I appreciate that if you're using cache, some data loss is pretty much inevitable when a pool fails, but that should be a few seconds' worth of data at worst, not minutes or hours' worth.

Also, if a pool fails completely and there's data in the cache that hasn't been committed to disk, it would be great if Solaris could respond by:
- immediately dumping the cache to any (all?) working storage
- prompting the user to fix the pool, or to save the cache before powering down the system

Ross

On Fri, Feb 6, 2009 at 5:49 PM, Richard Elling <richard.ell...@gmail.com> wrote:

Ross, this is a pretty good description of what I would expect when failmode=continue. What happens when failmode=panic?
-- richard

Ross wrote:

Ok, it's still happening in snv_106:

I plugged a USB drive into a freshly installed system and created a single-disk zpool on it:

# zpool create usbtest c1t0d0

I opened the (nautilus?) file manager in GNOME and copied the /etc/X11 folder to it. I then copied the /etc/apache folder to it, and at 4:05pm disconnected the drive.

At this point there are *no* warnings on screen, or any indication that there is a problem. To check that the pool was still working, I created duplicates of the two folders on that drive. That worked without any errors, although the drive was physically removed.

4:07pm
I ran zpool status; the pool is actually showing as unavailable, so at least that has happened faster than in my last test. The folder is still open in GNOME, however any attempt to copy files to or from it just hangs the file transfer window.

4:09pm
/usbtest is still visible in GNOME. I can also still open a console and use the folder:

# cd usbtest
# ls
X11           X11 (copy)    apache        apache (copy)

I also tried:

# mv X11 X11-test

That hung, but I saw the X11 folder disappear from the graphical file manager, so the system still believes something is working with this pool.

The main GUI is actually a little messed up now. The GNOME file manager window looking at the /usbtest folder has hung. Also, right-clicking the desktop to open a new terminal hangs, leaving the right-click menu on screen. The main menu still works though, and I can still open a new terminal.

4:19pm
Commands such as ls are finally hanging on the pool.

At this point I tried to reboot, but it appears that isn't working. I used System Monitor to kill everything I had running and tried again, but that didn't help. I had to physically power off the system to reboot.

After the reboot, as expected, /usbtest still exists (even though the drive is disconnected).
I removed that folder and connected the drive. ZFS detects the insertion and automounts the drive; the pool shows as online and the filesystem shows as mounted at /usbtest, but the /usbtest directory doesn't exist. I had to export and import the pool to get it available, and as expected, I've lost data:

# cd usbtest
# ls
X11

Even worse, ZFS is completely unaware of this:

# zpool status -v usbtest
  pool: usbtest
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        usbtest     ONLINE       0     0     0
          c1t0d0    ONLINE       0     0     0

errors: No known data errors

So in summary, there are a good few problems here, many of which I've already reported as bugs:

1. ZFS still accepts read and write operations for a faulted pool, causing data loss that isn't necessarily reported by zpool status.
2. Even after writes start to hang, it's still possible to continue reading data from a faulted pool.
3. A faulted pool causes unwanted side effects in the GUI, making the system hard to use and impossible to reboot.
4. After a hard reset, ZFS does not recover cleanly. Unused mountpoints are left behind.
5. Automatic mounting of pools doesn't seem to work reliably.
6. zpool status doesn't report any problems mounting the pool.

Could this be related to the ZFS TXG/transaction group buffers? I.e. it will buffer writes for a bit before committing to disk; then, when it's time to commit to disk, it realises the disk has failed, and from then on enters those failmode conditions (wait, continue, panic, ?). Could this be the case?

http://blogs.sun.com/roch/date/20080514

--
Brent Jones
br...@servuhome.net
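For reference, the failmode behaviour debated above is a per-pool property. A hedged sketch of checking and setting it (property name and values as documented for builds of this era; wait is believed to be the default):

# zpool get failmode usbtest
NAME     PROPERTY  VALUE     SOURCE
usbtest  failmode  wait      default

# zpool set failmode=panic usbtest

With failmode=panic the machine should panic outright once ZFS declares the pool faulted, rather than leaving commands hanging, which is the behaviour Richard was asking Ross to test.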
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss