We tried both the "zpool set failmode=continue <pool>" option and the
wait option, setting the property before running the cp command and
pulling out the mirrors. In both cases there was a hang, and I have a
core dump of the hang as well.

Any pointers to the bug opening process?

Thanks
Karthik

On 10/15/08 22:27, Neil Perrin wrote:
>
>
> On 10/15/08 23:12, Karthik Krishnamoorthy wrote:
>> Neil,
>>
>> Thanks for the quick suggestion, the hang seems to happen even with
>> the zpool set failmode=continue <pool> option.
>>
>> Any other way to recover from the hang ?
>
> You should set the property before you remove the devices.
> This should prevent the hang. It isn't used to recover from it.
>
> If you did do that then it seems like a bug somewhere in ZFS or the
> IO stack below it. In which case you should file a bug.
>
> Neil.
>>
>> thanks and regards,
>> Karthik
>>
>> On 10/15/08 22:03, Neil Perrin wrote:
>>> Karthik,
>>>
>>> The pool failmode property as implemented governs the behaviour
>>> when all the devices needed are unavailable. The default behaviour
>>> is to wait (block) until the IO can continue - perhaps by
>>> re-enabling the device(s).
>>> The behaviour you expected can be achieved by
>>> "zpool set failmode=continue <pool>",
>>> as shown in the link you indicated below.
>>>
>>> Neil.
>>>
>>> On 10/15/08 22:38, Karthik Krishnamoorthy wrote:
>>>> Hello All,
>>>>
>>>> Summary:
>>>> ~~~~~~~~
>>>> cp command for mirrored zfs hung when all the disks in the mirrored
>>>> pool were unavailable.
>>>>
>>>> Detailed description:
>>>> ~~~~~~~~~~~~~~~~~~~~~
>>>> The cp command (copy a 1GB file from nfs to zfs) hung when all the
>>>> disks in the mirrored pool (both c1t0d9 and c2t0d9) were removed
>>>> physically.
>>>>
>>>>         NAME        STATE     READ WRITE CKSUM
>>>>         test        ONLINE       0     0     0
>>>>           mirror    ONLINE       0     0     0
>>>>             c1t0d9  ONLINE       0     0     0
>>>>             c2t0d9  ONLINE       0     0     0
>>>>
>>>> We think if all the disks in the pool are unavailable, cp command
>>>> should fail with error (not cause hang).
>>>>
>>>> Our request:
>>>> ~~~~~~~~~~~~
>>>> Please investigate the root cause of this issue.
>>>>
>>>> How to reproduce:
>>>> ~~~~~~~~~~~~~~~~~
>>>> 1. create a zfs mirrored pool
>>>> 2. execute cp command from somewhere to the zfs mirrored pool.
>>>> 3. remove the both of disks physically during cp command working
>>>>    = hang happen (cp command never return and we can't kill cp
>>>>    command)
>>>>
>>>> One engineer pointed me to this page
>>>> http://opensolaris.org/os/community/arc/caselog/2007/567/onepager/
>>>> and indicated that if all the mirrors are removed zfs enters a hang
>>>> like state to prevent the kernel from going into a panic mode and
>>>> this type of feature would be an RFE.
>>>>
>>>> My questions are
>>>>
>>>> Are there any documentation of the "mirror" configuration of zfs
>>>> that explains what happens when the underlying drivers detect
>>>> problems in one of the mirror devices?
>>>>
>>>> It seems that the traditional views of "mirror" or "raid-2" would
>>>> expect that the mirror would be able to proceed without
>>>> interruption and that does not seem to be this case in ZFS.
>>>> What is the purpose of the mirror, in zfs? Is it more like an
>>>> instant backup? If so, what can the user do to recover, when there
>>>> is an IO error on one of the devices?
>>>>
>>>>
>>>> Appreciate any pointers and help,
>>>>
>>>> Thanks and regards,
>>>> Karthik
>>>> _______________________________________________
>>>> zfs-discuss mailing list
>>>> zfs-discuss@opensolaris.org
>>>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss