We did try setting

zpool set failmode=continue <pool>

and also the wait option, before running the cp command and pulling
out the mirrors. In both cases there was a hang, and I have a core
dump of the hang as well.
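
For anyone trying to reproduce this, the order of operations can be sketched roughly as below. This is only a sketch under assumptions: the pool name test comes from the original report, SRC and DST are hypothetical placeholder paths, and the run helper is an illustrative dry-run wrapper, so by default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the test sequence; prints the commands by default (DRY_RUN=1).
# POOL is taken from the original report; SRC and DST are hypothetical paths.
POOL=${POOL:-test}
SRC=${SRC:-/net/server/export/file-1g}   # hypothetical NFS source file
DST=${DST:-/$POOL/file-1g}               # hypothetical target on the pool
DRY_RUN=${DRY_RUN:-1}

run() {
    # Echo the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

failmode_sequence() {
    run zpool set failmode=continue "$POOL"   # set BEFORE removing any device
    run zpool get failmode "$POOL"            # confirm the property value
    run cp "$SRC" "$DST"                      # both disks are pulled while this runs
    run zpool status -x "$POOL"               # check pool health afterwards
}

failmode_sequence
```

Running it with DRY_RUN=0 on a scratch pool executes the commands for real; the same sequence with failmode=wait covers the other case we tried.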

Any pointers on the bug-filing process?

Thanks
Karthik

On 10/15/08 22:27, Neil Perrin wrote:
>
>
> On 10/15/08 23:12, Karthik Krishnamoorthy wrote:
>> Neil,
>>
>> Thanks for the quick suggestion. The hang seems to happen even with
>> the zpool set failmode=continue <pool> option.
>>
>> Is there any other way to recover from the hang?
>
> You should set the property before you remove the devices.
> This should prevent the hang. It isn't used to recover from it.
>
> If you did do that then it seems like a bug somewhere in ZFS or the IO
> stack below it. In which case you should file a bug.
>
> Neil.
>>
>> thanks and regards,
>> Karthik
>>
>> On 10/15/08 22:03, Neil Perrin wrote:
>>> Karthik,
>>>
>>> The pool failmode property as implemented governs the behaviour when
>>> all the devices needed are unavailable. The default behaviour is to
>>> wait (block) until the IO can continue - perhaps by re-enabling the
>>> device(s). The behaviour you expected can be achieved by
>>> "zpool set failmode=continue <pool>", as shown in the link you
>>> indicated below.
>>>
>>> Neil.
>>>
>>> On 10/15/08 22:38, Karthik Krishnamoorthy wrote:
>>>> Hello All,
>>>>
>>>>   Summary:
>>>>   ~~~~~~~~
>>>>   The cp command on a mirrored zfs pool hung when all the disks in
>>>>   the mirrored pool were unavailable.
>>>>
>>>>   Detailed description:
>>>>   ~~~~~~~~~~~~~~~~~~~~~
>>>>   The cp command (copying a 1GB file from nfs to zfs) hung when all
>>>>   the disks in the mirrored pool (both c1t0d9 and c2t0d9) were
>>>>   physically removed.
>>>>
>>>>          NAME        STATE     READ WRITE CKSUM
>>>>          test        ONLINE      0     0     0
>>>>            mirror    ONLINE      0     0     0
>>>>              c1t0d9  ONLINE      0     0     0
>>>>              c2t0d9  ONLINE      0     0     0
>>>>
>>>>   We think that if all the disks in the pool are unavailable, the cp
>>>>   command should fail with an error (not hang).
>>>>
>>>>   Our request:
>>>>   ~~~~~~~~~~~~
>>>>   Please investigate the root cause of this issue.
>>>>
>>>>   How to reproduce:
>>>>   ~~~~~~~~~~~~~~~~~
>>>>   1. Create a zfs mirrored pool.
>>>>   2. Run a cp command from somewhere to the zfs mirrored pool.
>>>>   3. Physically remove both disks while the cp command is running.
>>>>      => A hang happens (the cp command never returns and we can't
>>>>         kill it).
>>>>
>>>> One engineer pointed me to this page
>>>> http://opensolaris.org/os/community/arc/caselog/2007/567/onepager/
>>>> and indicated that if all the mirrors are removed, zfs enters a
>>>> hang-like state to prevent the kernel from going into a panic, and
>>>> that this type of feature would be an RFE.
>>>>
>>>> My questions are
>>>>
>>>> Is there any documentation of the "mirror" configuration of zfs
>>>> that explains what happens when the underlying drivers detect
>>>> problems in one of the mirror devices?
>>>>
>>>> It seems that the traditional view of "mirror" or "raid-1" would
>>>> expect the mirror to be able to proceed without interruption, and
>>>> that does not seem to be the case in ZFS.
>>>> What is the purpose of the mirror in zfs? Is it more like an instant
>>>> backup? If so, what can the user do to recover when there is an
>>>> IO error on one of the devices?
>>>>
>>>>
>>>> Appreciate any pointers and help,
>>>>
>>>> Thanks and regards,
>>>> Karthik
>>>> _______________________________________________
>>>> zfs-discuss mailing list
>>>> zfs-discuss@opensolaris.org
>>>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
