Hi Richard,

Richard Elling wrote:
> Karthik Krishnamoorthy wrote:
>> We did try the
>>
>> zpool set failmode=continue <pool>
>>
>> option, and the wait option, before running the cp command and
>> pulling out the mirrors. In both cases there was a hang, and I have
>> a core dump of the hang as well.
>>
> You have to wait for the I/O drivers to declare that the device is
> dead. This can be up to several minutes, depending on the driver.

Okay. The customer indicated they didn't see a hang when they ran the
same test with UFS.

>> Any pointers to the bug opening process?
>>
> http://bugs.opensolaris.org, or bugster if you have an account.
> Be sure to indicate which drivers you are using, as this is not likely
> a ZFS bug, per se. Output from prtconf -D should be a minimum.

I have the core dump of the hang. Will make that available as well.
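For completeness, here is roughly what we ran before the test and what
we'll gather for the bug report (the pool name "test" matches our setup;
treat it and the output path as placeholders):

    # Set the failmode before inducing the failure; per Neil's note
    # below, the property governs behaviour at failure time and is
    # not a recovery mechanism.
    zpool set failmode=continue test

    # Verify the property actually took effect.
    zpool get failmode test

    # Driver information for the bug report, as Richard suggested.
    prtconf -D > /var/tmp/prtconf-D.out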
Thanks and regards,
Karthik

> -- richard
>
>> Thanks
>> Karthik
>>
>> On 10/15/08 22:27, Neil Perrin wrote:
>>
>>> On 10/15/08 23:12, Karthik Krishnamoorthy wrote:
>>>
>>>> Neil,
>>>>
>>>> Thanks for the quick suggestion, but the hang seems to happen even
>>>> with the zpool set failmode=continue <pool> option.
>>>>
>>>> Any other way to recover from the hang?
>>>>
>>> You should set the property before you remove the devices.
>>> This should prevent the hang. It isn't used to recover from it.
>>>
>>> If you did do that, then it seems like a bug somewhere in ZFS or
>>> the I/O stack below it. In which case you should file a bug.
>>>
>>> Neil.
>>>
>>>> Thanks and regards,
>>>> Karthik
>>>>
>>>> On 10/15/08 22:03, Neil Perrin wrote:
>>>>
>>>>> Karthik,
>>>>>
>>>>> The pool failmode property as implemented governs the behaviour
>>>>> when all the devices needed are unavailable. The default
>>>>> behaviour is to wait (block) until the I/O can continue -
>>>>> perhaps by re-enabling the device(s). The behaviour you expected
>>>>> can be achieved by "zpool set failmode=continue <pool>", as
>>>>> shown in the link you indicated below.
>>>>>
>>>>> Neil.
>>>>>
>>>>> On 10/15/08 22:38, Karthik Krishnamoorthy wrote:
>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>> Summary:
>>>>>> ~~~~~~~~
>>>>>> A cp command to a mirrored zfs pool hung when all the disks in
>>>>>> the mirrored pool were unavailable.
>>>>>>
>>>>>> Detailed description:
>>>>>> ~~~~~~~~~~~~~~~~~~~~~
>>>>>> The cp command (copying a 1GB file from nfs to zfs) hung when
>>>>>> all the disks in the mirrored pool (both c1t0d9 and c2t0d9)
>>>>>> were removed physically.
>>>>>>
>>>>>>   NAME        STATE     READ WRITE CKSUM
>>>>>>   test        ONLINE       0     0     0
>>>>>>     mirror    ONLINE       0     0     0
>>>>>>       c1t0d9  ONLINE       0     0     0
>>>>>>       c2t0d9  ONLINE       0     0     0
>>>>>>
>>>>>> We think that if all the disks in the pool are unavailable, the
>>>>>> cp command should fail with an error (not hang).
>>>>>>
>>>>>> Our request:
>>>>>> ~~~~~~~~~~~~
>>>>>> Please investigate the root cause of this issue.
>>>>>>
>>>>>> How to reproduce:
>>>>>> ~~~~~~~~~~~~~~~~~
>>>>>> 1. Create a zfs mirrored pool.
>>>>>> 2. Execute a cp command from somewhere to the zfs mirrored pool.
>>>>>> 3. Remove both of the disks physically while the cp command is
>>>>>>    working.
>>>>>>    => hang happens (cp never returns and we can't kill it)
>>>>>>
>>>>>> One engineer pointed me to this page
>>>>>> http://opensolaris.org/os/community/arc/caselog/2007/567/onepager/
>>>>>> and indicated that if all the mirrors are removed, zfs enters a
>>>>>> hang-like state to prevent the kernel from panicking, and that
>>>>>> this type of feature would be an RFE.
>>>>>>
>>>>>> My questions are:
>>>>>>
>>>>>> Is there any documentation of the "mirror" configuration of zfs
>>>>>> that explains what happens when the underlying drivers detect
>>>>>> problems in one of the mirror devices?
>>>>>>
>>>>>> The traditional view of a "mirror" or "raid-1" would be that the
>>>>>> mirror can proceed without interruption, and that does not seem
>>>>>> to be the case in ZFS.
>>>>>>
>>>>>> What is the purpose of the mirror in zfs? Is it more like an
>>>>>> instant backup? If so, what can the user do to recover when
>>>>>> there is an IO error on one of the devices?
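[For reference, the exact command sequence behind steps 1-3 above was
roughly the following; the pool and device names match our test setup,
and the nfs path is a placeholder:

    # 1. Create a zfs mirrored pool; a file system is mounted at /test.
    zpool create test mirror c1t0d9 c2t0d9

    # 2. Copy a 1GB file from nfs to the pool while it is healthy.
    cp /net/nfs-server/export/file-1g /test/ &

    # 3. Physically pull both c1t0d9 and c2t0d9 while cp is running;
    #    cp then hangs and cannot be killed.
]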
>>>>>> Appreciate any pointers and help,
>>>>>>
>>>>>> Thanks and regards,
>>>>>> Karthik

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss