On Wed, Jul 08, 2009 at 08:43:17AM +1200, Ian Collins wrote:
> Ian Collins wrote:
>> Brent Jones wrote:
>>> On Fri, Jul 3, 2009 at 8:31 PM, Ian Collins <i...@ianshome.com> wrote:
>>>
>>>> Ian Collins wrote:
>>>>
>>>>> I was doing an incremental send between pools; the receive side is
>>>>> locked up and no zfs/zpool commands work on that pool.
>>>>>
>>>>> The stacks look different from those reported in the earlier "ZFS
>>>>> snapshot send/recv "hangs" X4540 servers" thread.
>>>>>
>>>>> Here is the process information from scat (other commands hanging on
>>>>> the pool are also in cv_wait):
>>>>>
>>>> Has anyone else seen anything like this? The box wouldn't even
>>>> reboot; it had to be power cycled. It locks up on receive regularly
>>>> now.
>>>
>>> I hit this too:
>>> 6826836
>>>
>>> Fixed in 117
>>>
>>> http://opensolaris.org/jive/thread.jspa?threadID=104852&tstart=120
>>>
>> I don't think this is the same problem (which is why I started a new
>> thread); a single incremental set will eventually lock the pool up,
>> pretty much guaranteed each time.
>>
> One more data point:
>
> This didn't happen when I had a single pool (stripe of mirrors) on the
> server. It started happening when I split the mirrors and created a
> second pool built from 3 8-drive raidz2 vdevs. Sending to the new pool
> (either locally or from another machine) causes the hangs.
And here are my data points:

We were running two X4500s under Nevada 112 but came across this issue on
both of them. On receiving much data through a zfs receive, they would
lock up: any zpool or zfs commands would hang and were unkillable. The
only way to resolve the situation was to reboot without syncing disks. I
reported this in some posts back in April
(http://opensolaris.org/jive/click.jspa?searchID=2021762&messageID=368524).

One of them had an old enough zpool and zfs version to down/up/sidegrade
to Solaris 10 u6, so I made this change. The thumper running Solaris 10 is
now mostly fine - it normally receives an hourly snapshot with no problem.
The thumper running 112 has continued to experience the issues described
by Ian and others.

I've just upgraded to 117 and am having even more issues - I'm unable to
receive or roll back snapshots. Instead I see:

506 r...@thumper1:~> cat snap | zfs receive -vF thumperpool
receiving incremental stream of vlepool/m...@200906182000 into thumperp...@200906182000
cannot receive incremental stream: most recent snapshot of thumperpool
does not match incremental source

511 r...@thumper1:~> zfs rollback -r thumperpool/m...@200906181800
cannot destroy 'thumperpool/m...@200906181900': dataset already exists

As a result, I'm a bit scuppered (a rough sketch of the snapshot checks
this calls for is in the P.S. below). I'm going to try going back to my
112 installation instead to see if that resolves any of my issues.

All of our thumpers have the following disk configuration: 4 x 11-disk
raidz2 arrays with 2 disks as hot spares in a single pool, plus 2 disks in
a mirror for booting. When zpool locks up on the main pool, I'm still able
to get a zpool status on the boot pool, but I can't access any data on the
pool which is locked up.

Andrew

--
Systems Developer

e: andrew.nic...@luns.net.uk
im: a.nic...@jabber.lancs.ac.uk
t: +44 (0)1524 5 10147

Lancaster University Network Services is a limited company registered in
England and Wales. Registered number: 4311892. Registered office:
University House, Lancaster University, Lancaster, LA1 4YW
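P.S. For anyone else who hits the "does not match incremental source"
error above, here is a rough sketch of the snapshot checks that error
calls for. The filesystem and snapshot names ("fs", "older", "newer") are
placeholders, since the real ones are obfuscated in the output above, so
treat this as a sketch rather than our exact commands:

  # On the receiving side, list snapshots oldest-to-newest to see which
  # snapshot the pool really considers most recent:
  zfs list -t snapshot -o name,creation -s creation -r thumperpool

  # Do the same on the sending side, then regenerate the incremental from
  # the newest snapshot that exists on *both* sides:
  zfs send -i vlepool/fs@older vlepool/fs@newer > snap
  cat snap | zfs receive -vF thumperpool

The "dataset already exists" failure from zfs rollback -r may mean one of
the newer snapshots has a dependent dataset (a clone, or the leftovers of
an interrupted receive) that the rollback cannot destroy, though I haven't
confirmed that is what's happening here.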