Karthik Krishnamoorthy wrote:
> We did try with this
>
> zpool set failmode=continue <pool> option
>
> and the wait option before running the cp command and pulling out the
> mirrors, and in both cases there was a hang. I have a core dump of the
> hang as well.
>   

You have to wait for the I/O drivers to declare that the device is
dead.  This can take up to several minutes, depending on the driver.
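Since the thread hinges on whether the pool's failmode property will
block writes, here is a minimal POSIX shell sketch of checking that
before pulling disks. The helper function is hypothetical (not a ZFS
command); only the commented `zpool get` invocation is the real interface.

```shell
# Hypothetical helper: given the value of the failmode property
# (as printed by `zpool get -H -o value failmode <pool>`), report
# whether writes will block when all devices in the pool are lost.
failmode_blocks() {
  case "$1" in
    wait)           echo yes ;;      # default: I/O blocks until devices return
    continue|panic) echo no  ;;      # continue: EIO to callers; panic: system panics
    *)              echo unknown ;;
  esac
}

# On a live Solaris system (assumed pool name "tank"):
#   mode=$(zpool get -H -o value failmode tank)
#   failmode_blocks "$mode"
failmode_blocks wait        # prints: yes
failmode_blocks continue    # prints: no
```

Note that, as described above, even with failmode=continue the pool can
appear hung until the underlying drivers time out and declare the
devices dead.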

> Any pointers to the bug filing process?
>   

http://bugs.opensolaris.org, or bugster if you have an account.
Be sure to indicate which drivers you are using, as this is not likely
a ZFS bug, per se.  Output from prtconf -D should be the minimum.
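As a sketch of gathering that data for the bug report, the following
bundles the output of a few diagnostic commands into one file. The
command list is an assumption based on this thread (prtconf -D is the
only one Richard explicitly asks for); the report path is arbitrary.

```shell
# Collect diagnostic output for the bug report into one file.
report=/tmp/zfs-hang-report.txt

collect() {
  # Append a header, then the command's output (or a note if the
  # command is unavailable on this system).
  printf '=== %s ===\n' "$*" >> "$report"
  if command -v "$1" >/dev/null 2>&1; then
    "$@" >> "$report" 2>&1
  else
    echo "(command not found)" >> "$report"
  fi
}

: > "$report"                 # start with an empty report
collect prtconf -D            # driver bindings -- the minimum requested
collect zpool status -v       # pool layout and per-device error counters
collect zpool get failmode    # confirm which failmode was in effect
```

Attaching the resulting file to the bug should let the driver and ZFS
folks sort out where the hang actually lives.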
 -- richard

> Thanks
> Karthik
>
> On 10/15/08 22:27, Neil Perrin wrote:
>   
>> On 10/15/08 23:12, Karthik Krishnamoorthy wrote:
>>     
>>> Neil,
>>>
>>> Thanks for the quick suggestion, the hang seems to happen even with 
>>> the zpool set failmode=continue <pool> option.
>>>
>>> Any other way to recover from the hang?
>>>       
>> You should set the property before you remove the devices.
>> This should prevent the hang. It isn't used to recover from it.
>>
>> If you did do that, then it seems like a bug somewhere in ZFS or the
>> IO stack below it, in which case you should file a bug.
>>
>> Neil.
>>     
>>> thanks and regards,
>>> Karthik
>>>
>>> On 10/15/08 22:03, Neil Perrin wrote:
>>>       
>>>> Karthik,
>>>>
>>>> The pool failmode property as implemented governs the behaviour when 
>>>> all
>>>> the devices needed are unavailable. The default behaviour is to wait
>>>> (block) until the IO can continue - perhaps by re-enabling the 
>>>> device(s).
>>>> The behaviour you expected can be achieved by "zpool set 
>>>> failmode=continue <pool>",
>>>> as shown in the link you indicated below.
>>>>
>>>> Neil.
>>>>
>>>> On 10/15/08 22:38, Karthik Krishnamoorthy wrote:
>>>>         
>>>>> Hello All,
>>>>>
>>>>>   Summary:
>>>>>   ~~~~~~~~
>>>>>   The cp command for a mirrored zfs pool hung when all the disks in
>>>>>   the mirrored pool were unavailable.
>>>>>
>>>>>   Detailed description:
>>>>>   ~~~~~~~~~~~~~~~~~~~~~
>>>>>   The cp command (copying a 1GB file from nfs to zfs) hung when all
>>>>>   the disks in the mirrored pool (both c1t0d9 and c2t0d9) were
>>>>>   removed physically.
>>>>>
>>>>>          NAME        STATE     READ WRITE CKSUM
>>>>>          test        ONLINE      0     0     0
>>>>>            mirror    ONLINE      0     0     0
>>>>>              c1t0d9  ONLINE      0     0     0
>>>>>              c2t0d9  ONLINE      0     0     0
>>>>>
>>>>>   We think that if all the disks in the pool are unavailable, the
>>>>>   cp command should fail with an error (not hang).
>>>>>
>>>>>   Our request:
>>>>>   ~~~~~~~~~~~~
>>>>>   Please investigate the root cause of this issue.
>>>>>
>>>>>   How to reproduce:
>>>>>   ~~~~~~~~~~~~~~~~~
>>>>>   1. Create a zfs mirrored pool.
>>>>>   2. Execute a cp command from somewhere to the zfs mirrored pool.
>>>>>   3. Remove both disks physically while the cp command is working.
>>>>>      => A hang happens (the cp command never returns and we can't
>>>>>         kill it).
>>>>>
>>>>> One engineer pointed me to this page
>>>>> http://opensolaris.org/os/community/arc/caselog/2007/567/onepager/
>>>>> and indicated that if all the mirrors are removed, zfs enters a
>>>>> hang-like state to prevent the kernel from panicking, and that
>>>>> changing this behaviour would be an RFE.
>>>>>
>>>>> My questions are
>>>>>
>>>>> Is there any documentation of the "mirror" configuration of zfs
>>>>> that explains what happens when the underlying drivers detect
>>>>> problems in one of the mirror devices?
>>>>>
>>>>> The traditional view of "mirror" (raid-1) would expect the mirror
>>>>> to be able to proceed without interruption, and that does not seem
>>>>> to be the case in ZFS. What is the purpose of the mirror in zfs?
>>>>> Is it more like an instant backup? If so, what can the user do to
>>>>> recover when there is an IO error on one of the devices?
>>>>>
>>>>>
>>>>> Appreciate any pointers and help,
>>>>>
>>>>> Thanks and regards,
>>>>> Karthik
>>>>> _______________________________________________
>>>>> zfs-discuss mailing list
>>>>> zfs-discuss@opensolaris.org
>>>>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>>>>           
>
