hey mike/cindy,

i've gone ahead and filed a zfs rfe on this functionality:
        6915127 need full support for zfs pools on files

implementing this rfe is a requirement for supporting encapsulated
zones on shared storage.

ed

On Thu, Jan 07, 2010 at 03:26:17PM -0700, Cindy Swearingen wrote:
> Hi Mike,
>
> I can't really speak for how virtualization products are using
> files for pools, but we don't recommend creating pools on files,
> much less NFS-mounted files and then building zones on top.
>
> File-based pool configurations might be used for limited internal
> testing of some features, but our product testing does not include
> testing storage pools on files or NFS-mounted files.
>
> Unless Ed's project gets refunded, I'm not sure how much farther
> you can go with this approach.
>
> Thanks,
>
> Cindy
>
> On 01/07/10 15:05, Mike Gerdts wrote:
> >[removed zones-discuss after sending heads-up that the conversation
> >will continue at zfs-discuss]
> >
> >On Mon, Jan 4, 2010 at 5:16 PM, Cindy Swearingen
> ><cindy.swearin...@sun.com> wrote:
> >>Hi Mike,
> >>
> >>It is difficult to comment on the root cause of this failure since
> >>the several interactions of these features are unknown. You might
> >>consider seeing how Ed's proposal plays out and let him do some more
> >>testing...
> >
> >Unfortunately, Ed's proposal was not funded, last I heard.  Ops Center
> >uses many of the same mechanisms for putting zones on ZFS.  This is
> >where I saw the problem initially.
> >
> >>If you are interested in testing this with NFSv4 and it still fails
> >>the same way, then also consider testing this with a local file
> >>instead of a NFS-mounted file and let us know the results. I'm also
> >>unsure of using the same path for the pool and the zone root path,
> >>rather than one path for pool and a pool/dataset path for zone
> >>root path. I will test this myself if I get some time.
> >
> >I have been unable to reproduce this with a local file, but I can
> >reproduce it with NFSv4 on build 130.  Rather surprisingly, the actual
> >checksums found in the ereports are sometimes "0x0 0x0 0x0 0x0" or
> >"0xbaddcafe00 ...".
> >
> >Here's what I did:
> >
> >- Install OpenSolaris build 130 (ldom on T5220)
> >- Mount some NFS space at /nfszone:
> >   mount -F nfs -o vers=4 $file:/path /nfszone
> >- Create a 10gig sparse file
> >   cd /nfszone
> >   mkfile -n 10g root
> >- Create a zpool
> >   zpool create -m /zones/nfszone nfszone /nfszone/root
> >- Configure and install a zone
> >   zonecfg -z nfszone
> >    set zonepath = /zones/nfszone
> >    set autoboot = false
> >    verify
> >    commit
> >    exit
> >   chmod 700 /zones/nfszone
> >   zoneadm -z nfszone install
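> >
> >fwiw, the steps above can be rolled into one script.  This is only a
> >sketch: NFSSERVER and the export path are placeholders (not from my
> >setup), and DRYRUN=1 makes it print the commands instead of running
> >them, since everything here needs a live (Open)Solaris host:
> >
```shell
#!/bin/sh
# Dry-run by default; set DRYRUN=0 on a real (Open)Solaris host.
# NFSSERVER and /export/nfszone are placeholders, not the real server/path.
DRYRUN=${DRYRUN:-1}
run() {
    if [ "$DRYRUN" -eq 1 ]; then
        echo "+ $*"    # just show what would be run
    else
        "$@"
    fi
}

run mount -F nfs -o vers=4 NFSSERVER:/export/nfszone /nfszone
run mkfile -n 10g /nfszone/root
run zpool create -m /zones/nfszone nfszone /nfszone/root
# zonecfg also takes its subcommands as a single command-line argument:
run zonecfg -z nfszone 'create; set zonepath=/zones/nfszone; set autoboot=false; verify; commit'
run chmod 700 /zones/nfszone
run zoneadm -z nfszone install
```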
> >
> >- Verify that the nfszone pool is clean.  First, pkg history in the
> >zone shows the timestamp of the last package operation
> >
> >  2010-01-07T20:27:07 install                   pkg             Succeeded
> >
> >At 20:31 I ran:
> >
> ># zpool status nfszone
> >  pool: nfszone
> > state: ONLINE
> > scrub: none requested
> >config:
> >
> >        NAME             STATE     READ WRITE CKSUM
> >        nfszone          ONLINE       0     0     0
> >          /nfszone/root  ONLINE       0     0     0
> >
> >errors: No known data errors
> >
> >I booted the zone.  By 20:32 it had accumulated 132 checksum errors:
> >
> > # zpool status nfszone
> >  pool: nfszone
> > state: DEGRADED
> >status: One or more devices has experienced an unrecoverable error.  An
> >        attempt was made to correct the error.  Applications are unaffected.
> >action: Determine if the device needs to be replaced, and clear the errors
> >        using 'zpool clear' or replace the device with 'zpool replace'.
> >   see: http://www.sun.com/msg/ZFS-8000-9P
> > scrub: none requested
> >config:
> >
> >        NAME             STATE     READ WRITE CKSUM
> >        nfszone          DEGRADED     0     0     0
> >          /nfszone/root  DEGRADED     0     0   132  too many errors
> >
> >errors: No known data errors
> >
> >fmdump has some very interesting things to say about the actual
> >checksums.  The 0x0 and 0xbaddcafe00 seem to shout that these checksum
> >errors are not due to a couple of flipped bits: 0xbaddcafe is the fill
> >pattern Solaris uses for uninitialized kernel memory, so whole blocks
> >look like garbage rather than media corruption.
> >
> ># fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
> >   2    cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62
> >0x290cbce13fc59dce
> >   3    cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400
> >0x7e0aef335f0c7f00
> >   3    cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800
> >0xd4f1025a8e66fe00
> >   4    cksum_actual = 0x0 0x0 0x0 0x0
> >   4    cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900
> >0x330107da7c4bcec0
> >   5    cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73
> >0x4e0b3a8747b8a8
> >   6    cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00
> >0x280934efa6d20f40
> >   6    cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00
> >0x89715e34fbf9cdc0
> >  16    cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00
> >0x7f84b11b3fc7f80
> >  48    cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500
> >0x82804bc6ebcfc0
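> >
> >In case anyone wants to repeat the tally: the pipeline itself is
> >trivial and can be exercised against a saved copy of the ereports.
> >The sample file below is hypothetical (a few lines in the shape of
> >the fmdump output above), just to show the counting:
> >
```shell
# On the affected host the input would come from `fmdump -eV`; here we
# tally a small hypothetical saved sample instead.
cat <<'EOF' > /tmp/ereports.txt
        cksum_actual = 0x0 0x0 0x0 0x0
        cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
        cksum_actual = 0x0 0x0 0x0 0x0
EOF
# Count how often each distinct "actual" checksum appears, rarest first:
grep cksum_actual /tmp/ereports.txt | sort | uniq -c | sort -n
```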
> >
> >I halted the zone, exported the pool, imported the pool, then did a
> >scrub.  Everything seemed to be OK:
> >
> ># zpool export nfszone
> ># zpool import -d /nfszone nfszone
> ># zpool status nfszone
> >  pool: nfszone
> > state: ONLINE
> > scrub: none requested
> >config:
> >
> >        NAME             STATE     READ WRITE CKSUM
> >        nfszone          ONLINE       0     0     0
> >          /nfszone/root  ONLINE       0     0     0
> >
> >errors: No known data errors
> ># zpool scrub nfszone
> ># zpool status nfszone
> >  pool: nfszone
> > state: ONLINE
> > scrub: scrub completed after 0h0m with 0 errors on Thu Jan  7 21:56:47 2010
> >config:
> >
> >        NAME             STATE     READ WRITE CKSUM
> >        nfszone          ONLINE       0     0     0
> >          /nfszone/root  ONLINE       0     0     0
> >
> >errors: No known data errors
> >
> >But then I booted the zone...
> >
> ># zoneadm -z nfszone boot
> ># zpool status nfszone
> >  pool: nfszone
> > state: ONLINE
> >status: One or more devices has experienced an unrecoverable error.  An
> >        attempt was made to correct the error.  Applications are unaffected.
> >action: Determine if the device needs to be replaced, and clear the errors
> >        using 'zpool clear' or replace the device with 'zpool replace'.
> >   see: http://www.sun.com/msg/ZFS-8000-9P
> > scrub: scrub completed after 0h0m with 0 errors on Thu Jan  7 21:56:47 2010
> >config:
> >
> >        NAME             STATE     READ WRITE CKSUM
> >        nfszone          ONLINE       0     0     0
> >          /nfszone/root  ONLINE       0     0   109
> >
> >errors: No known data errors
> >
> >I'm confused as to why this pool seems to be quite usable even with so
> >many checksum errors.
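> >
> >One way to watch the count grow while the zone boots is to poll zpool
> >status and pull out the CKSUM column for the file vdev.  A sketch
> >against a hypothetical saved copy of the status output (on the live
> >system the awk would read `zpool status nfszone` directly):
> >
```shell
# Hypothetical saved copy of the vdev table from `zpool status nfszone`:
cat <<'EOF' > /tmp/status.txt
        NAME             STATE     READ WRITE CKSUM
        nfszone          ONLINE       0     0     0
          /nfszone/root  ONLINE       0     0   109
EOF
# Extract the CKSUM count for the file-backed vdev:
awk '$1 == "/nfszone/root" {print $5}' /tmp/status.txt
```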
> >
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss