[removed zones-discuss after sending a heads-up that the conversation will continue at zfs-discuss]
On Mon, Jan 4, 2010 at 5:16 PM, Cindy Swearingen <cindy.swearin...@sun.com> wrote:
> Hi Mike,
>
> It is difficult to comment on the root cause of this failure since
> the several interactions of these features are unknown. You might
> consider seeing how Ed's proposal plays out and let him do some more
> testing...

Unfortunately, last I heard, Ed's proposal is not funded. Ops Center uses
many of the same mechanisms for putting zones on ZFS, and that is where I
first saw the problem.

> If you are interested in testing this with NFSv4 and it still fails
> the same way, then also consider testing this with a local file
> instead of a NFS-mounted file and let us know the results. I'm also
> unsure of using the same path for the pool and the zone root path,
> rather than one path for pool and a pool/dataset path for zone
> root path. I will test this myself if I get some time.

I have been unable to reproduce this with a local file. I have been able
to reproduce it with NFSv4 on build 130. Rather surprisingly, the actual
checksums found in the ereports are sometimes "0x0 0x0 0x0 0x0" or
"0xbaddcafe00 ...".

Here's what I did:

- Install OpenSolaris build 130 (ldom on T5220)
- Mount some NFS space at /nfszone:
      mount -F nfs -o vers=4 $file:/path /nfszone
- Create a 10 GB sparse file:
      cd /nfszone
      mkfile -n 10g root
- Create a zpool:
      zpool create -m /zones/nfszone nfszone /nfszone/root
- Configure and install a zone:
      zonecfg -z nfszone
        set zonepath = /zones/nfszone
        set autoboot = false
        verify
        commit
        exit
      chmod 700 /zones/nfszone
      zoneadm -z nfszone install
- Verify that the nfszone pool is clean. First, pkg history in the zone
  shows the timestamp of the last package operation:

      2010-01-07T20:27:07 install pkg Succeeded

  At 20:31 I ran:

# zpool status nfszone
  pool: nfszone
 state: ONLINE
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        nfszone          ONLINE       0     0     0
          /nfszone/root  ONLINE       0     0     0

errors: No known data errors

I booted the zone.
By 20:32 it had accumulated 132 checksum errors:

# zpool status nfszone
  pool: nfszone
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        nfszone          DEGRADED     0     0     0
          /nfszone/root  DEGRADED     0     0   132  too many errors

errors: No known data errors

fmdump has some very interesting things to say about the actual
checksums. The 0x0 and 0xbaddcafe00 values seem to shout that these
checksum errors are not due to a couple of flipped bits.

# fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
   2    cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce
   3    cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
   3    cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
   4    cksum_actual = 0x0 0x0 0x0 0x0
   4    cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0
   5    cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8
   6    cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
   6    cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
  16    cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
  48    cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

I halted the zone, exported the pool, imported the pool, then did a scrub.
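As a side check on the fmdump output above, the suspicious values can be recomputed. This assumes the affected blocks use ZFS's Fletcher-4 checksum, which keeps four 64-bit running sums a, b, c and d over the block's 32-bit words. For a block made of N copies of one word w, the first sum is simply a = N*w. Under that assumption, 0x5d6ee57f00 is 0xbaddcafe * 128 (512 bytes) and 0xbaddcafe00 is 0xbaddcafe * 256 (1 KB). The sketch below recomputes Fletcher-4 over such fill-pattern buffers. The helper name fletcher4_fill is made up for illustration. It also assumes a shell whose $(( )) arithmetic is 64-bit, with buffers small enough that no sum exceeds 2^63 - 1.

```sh
#!/bin/sh
# Hypothetical helper (not part of ZFS): Fletcher-4 computed the way ZFS's
# fletcher_4_native() does (a += word; b += a; c += b; d += c for each
# 32-bit word), over a buffer made of COUNT copies of a single WORD.
# Assumes 64-bit shell arithmetic and a COUNT small enough that no sum
# exceeds 2^63 - 1 (true for 0xbaddcafe up to 512 words).
fletcher4_fill() {
    word=$(( $1 ))   # accepts hex such as 0xbaddcafe
    count=$2
    a=0; b=0; c=0; d=0
    i=0
    while [ "$i" -lt "$count" ]; do
        a=$(( a + word ))
        b=$(( b + a ))
        c=$(( c + b ))
        d=$(( d + c ))
        i=$(( i + 1 ))
    done
    # Print in the same form as fmdump's cksum_actual field.
    printf '0x%x 0x%x 0x%x 0x%x\n' "$a" "$b" "$c" "$d"
}

fletcher4_fill 0xbaddcafe 128   # 512 bytes of 0xbaddcafe
fletcher4_fill 0xbaddcafe 256   # 1 KB of 0xbaddcafe
fletcher4_fill 0 128            # 512 bytes of zeros
```

With 128 and 256 words, the sketch reproduces all four words of the 48-count and 16-count cksum_actual lines. An all-zero buffer gives 0x0 0x0 0x0 0x0. Taken together, these results are consistent with whole blocks of zeros, or of the 0xbaddcafe pattern, being read back instead of the real data. Solaris kmem debugging uses 0xbaddcafe to mark allocated but uninitialized memory. That kind of whole-block corruption is very different from a few flipped bits.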
Everything seemed to be OK:

# zpool export nfszone
# zpool import -d /nfszone nfszone
# zpool status nfszone
  pool: nfszone
 state: ONLINE
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        nfszone          ONLINE       0     0     0
          /nfszone/root  ONLINE       0     0     0

errors: No known data errors
# zpool scrub nfszone
# zpool status nfszone
  pool: nfszone
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Thu Jan 7 21:56:47 2010
config:

        NAME             STATE     READ WRITE CKSUM
        nfszone          ONLINE       0     0     0
          /nfszone/root  ONLINE       0     0     0

errors: No known data errors

But then I booted the zone...

# zoneadm -z nfszone boot
# zpool status nfszone
  pool: nfszone
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h0m with 0 errors on Thu Jan 7 21:56:47 2010
config:

        NAME             STATE     READ WRITE CKSUM
        nfszone          ONLINE       0     0     0
          /nfszone/root  ONLINE       0     0   109

errors: No known data errors

I'm confused as to why this pool seems to be quite usable even with so
many checksum errors.

--
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss