On 9/3/07, Dale Ghent <[EMAIL PROTECTED]> wrote:
>
> I saw a putback this past week from M. Maybee regarding this, but I
> thought I'd post here that I saw what is apparently an incarnation of
> 6569719 on a production box running s10u3 x86 w/ latest (on sunsolve)
> patches. I have 3 other servers configured the same way WRT work load,
> zfs pools and hardware resources, so if this occurs again I'll see
> about logging a case and getting a relief patch. Anyhow, perhaps a
> backport to s10 may be in order.
[note: the patches I mention are s10 sparc specific. Translation to x86 required.]

As of a few weeks ago, s10u3 with the latest patches did not have this
problem for me, but s10u4 beta and snv_69 did. My situation was on
sun4v, not i386. More specifically:

S10 118833-36, 118833-07, 118833-10:

# zpool import
  pool: zfs
    id: 679728171331086542
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        zfs         FAULTED   corrupted data
          c0d1s3    FAULTED   corrupted data

snv_69, s10u4beta:

Boot device: /[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]:dhcp  File and args: -s
SunOS Release 5.11 Version snv_69 64-bit
Copyright 1983-2007 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Booting to milestone "milestone/single-user:default".
Configuring /dev
Using DHCP for network configuration information.
Requesting System Maintenance Mode

SINGLE USER MODE

# zpool import
panic[cpu0]/thread=300028943a0: dangling dbufs (dn=3000392dbe0, dbuf=3000392be08)

000002a10076f270 zfs:dnode_evict_dbufs+188 (3000392dbe0, 0, 1, 1, 2a10076f320, 7b729000)
  %l0-3: 000003000392ddf0 0000000000000000 0000000000000000 000003000392ddf8
  %l4-7: 000002a10076f320 0000000000000001 000003000392bf20 0000000000000003
000002a10076f3e0 zfs:dmu_objset_evict_dbufs+100 (2, 0, 0, 7b722800, 0, 30000516900)
  %l0-3: 000000007b72ac00 000000007b724510 000000007b724400 0000030000516a70
  %l4-7: 000003000392dbe0 0000030000516968 000000007b7228c1 0000000000000001
...

Sun offered me an IDR against 125100-07, but since I could not
reproduce the problem on that kernel, I never tested it. That does
imply they believe there is a dangling-dbufs problem in 125100-07 and
that they have a fix for it available to support-paying customers.
Perhaps this is the problem, and the related fix, that you would be
interested in.

The interesting thing in my case is that the backing store for this
device is a file on a ZFS file system, served up as a virtual disk in
an LDOM. From the primary LDOM, there is no corruption. An unexpected
reset (a panic, I believe) of the primary LDOM seems to have caused the
corruption in the guest LDOM. What was that about having the redundancy
as close to the consumer as possible? :)

--
Mike Gerdts
http://mgerdts.blogspot.com/
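For anyone trying to picture the configuration: a file-backed virtual
disk like the one described above is typically wired up from the
primary domain along these lines. This is only a rough sketch; the
pool, file, volume, and domain names below are illustrative, not taken
from the actual config.

    # in the primary domain: a file on a ZFS file system as the backing store
    zfs create tank/ldoms
    mkfile 10g /tank/ldoms/guest-disk0.img

    # export the file through the virtual disk service and attach it to the guest
    ldm add-vds primary-vds0 primary
    ldm add-vdsdev /tank/ldoms/guest-disk0.img vol1@primary-vds0
    ldm add-vdisk vdisk0 vol1@primary-vds0 guest1

Inside the guest, that vdisk shows up as an ordinary device (c0d1 in
the zpool output above), so a reset of the primary domain looks to the
guest like a yanked disk. Redundancy close to the consumer would mean
something like a guest-side mirror across two vdisks backed by
independent devices in the primary, e.g. zpool create zfs mirror c0d1
c0d2 (where c0d2 is a hypothetical second vdisk).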