Hi Richard,

Richard Elling wrote:
> Karthik Krishnamoorthy wrote:
>> We did try the
>>
>> zpool set failmode=continue <pool>
>>
>> option, and the wait option, before running the cp command and
>> pulling out the mirrors. In both cases there was a hang, and I have
>> a core dump of the hang as well.
>>
> You have to wait for the I/O drivers to declare that the device is
> dead. This can be up to several minutes, depending on the driver.

Okay. The customer indicated they didn't see a hang when they ran the
same test with UFS.

>> Any pointers to the bug opening process?
>>
> http://bugs.opensolaris.org, or bugster if you have an account.
> Be sure to indicate which drivers you are using, as this is not likely
> a ZFS bug, per se. Output from prtconf -D should be a minimum.

I have the core dump of the hang. Will make that available as well.
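For completeness, here is roughly what we ran before the test and what
we'll gather for the bug report (the pool name "test" matches our setup;
treat it and the output path as placeholders):

    # Set the failmode before inducing the failure; per Neil's note
    # below, the property governs behaviour at failure time and is
    # not a recovery mechanism.
    zpool set failmode=continue test

    # Verify the property actually took effect.
    zpool get failmode test

    # Driver information for the bug report, as Richard suggested.
    prtconf -D > /var/tmp/prtconf-D.out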
Thanks and regards,
Karthik

> -- richard
>
>> Thanks
>> Karthik
>>
>> On 10/15/08 22:27, Neil Perrin wrote:
>>
>>> On 10/15/08 23:12, Karthik Krishnamoorthy wrote:
>>>
>>>> Neil,
>>>>
>>>> Thanks for the quick suggestion, but the hang seems to happen even
>>>> with the zpool set failmode=continue <pool> option.
>>>>
>>>> Any other way to recover from the hang?
>>>>
>>> You should set the property before you remove the devices.
>>> This should prevent the hang. It isn't used to recover from it.
>>>
>>> If you did do that, then it seems like a bug somewhere in ZFS or
>>> the I/O stack below it. In which case you should file a bug.
>>>
>>> Neil.
>>>
>>>> Thanks and regards,
>>>> Karthik
>>>>
>>>> On 10/15/08 22:03, Neil Perrin wrote:
>>>>
>>>>> Karthik,
>>>>>
>>>>> The pool failmode property as implemented governs the behaviour
>>>>> when all the devices needed are unavailable. The default
>>>>> behaviour is to wait (block) until the I/O can continue -
>>>>> perhaps by re-enabling the device(s). The behaviour you expected
>>>>> can be achieved by "zpool set failmode=continue <pool>", as
>>>>> shown in the link you indicated below.
>>>>>
>>>>> Neil.
>>>>>
>>>>> On 10/15/08 22:38, Karthik Krishnamoorthy wrote:
>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>> Summary:
>>>>>> ~~~~~~~~
>>>>>> A cp command to a mirrored zfs pool hung when all the disks in
>>>>>> the mirrored pool were unavailable.
>>>>>>
>>>>>> Detailed description:
>>>>>> ~~~~~~~~~~~~~~~~~~~~~
>>>>>> The cp command (copying a 1GB file from nfs to zfs) hung when
>>>>>> all the disks in the mirrored pool (both c1t0d9 and c2t0d9)
>>>>>> were removed physically.
>>>>>>
>>>>>>   NAME        STATE     READ WRITE CKSUM
>>>>>>   test        ONLINE       0     0     0
>>>>>>     mirror    ONLINE       0     0     0
>>>>>>       c1t0d9  ONLINE       0     0     0
>>>>>>       c2t0d9  ONLINE       0     0     0
>>>>>>
>>>>>> We think that if all the disks in the pool are unavailable, the
>>>>>> cp command should fail with an error (not hang).
>>>>>>
>>>>>> Our request:
>>>>>> ~~~~~~~~~~~~
>>>>>> Please investigate the root cause of this issue.
>>>>>>
>>>>>> How to reproduce:
>>>>>> ~~~~~~~~~~~~~~~~~
>>>>>> 1. Create a zfs mirrored pool.
>>>>>> 2. Execute a cp command from somewhere to the zfs mirrored pool.
>>>>>> 3. Remove both of the disks physically while the cp command is
>>>>>>    working.
>>>>>>    => hang happens (cp never returns and we can't kill it)
>>>>>>
>>>>>> One engineer pointed me to this page
>>>>>> http://opensolaris.org/os/community/arc/caselog/2007/567/onepager/
>>>>>> and indicated that if all the mirrors are removed, zfs enters a
>>>>>> hang-like state to prevent the kernel from panicking, and that
>>>>>> this type of feature would be an RFE.
>>>>>>
>>>>>> My questions are:
>>>>>>
>>>>>> Is there any documentation of the "mirror" configuration of zfs
>>>>>> that explains what happens when the underlying drivers detect
>>>>>> problems in one of the mirror devices?
>>>>>>
>>>>>> The traditional view of a "mirror" or "raid-1" would be that the
>>>>>> mirror can proceed without interruption, and that does not seem
>>>>>> to be the case in ZFS.
>>>>>>
>>>>>> What is the purpose of the mirror in zfs? Is it more like an
>>>>>> instant backup? If so, what can the user do to recover when
>>>>>> there is an IO error on one of the devices?
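[For reference, the exact command sequence behind steps 1-3 above was
roughly the following; the pool and device names match our test setup,
and the nfs path is a placeholder:

    # 1. Create a zfs mirrored pool; a file system is mounted at /test.
    zpool create test mirror c1t0d9 c2t0d9

    # 2. Copy a 1GB file from nfs to the pool while it is healthy.
    cp /net/nfs-server/export/file-1g /test/ &

    # 3. Physically pull both c1t0d9 and c2t0d9 while cp is running;
    #    cp then hangs and cannot be killed.
]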
>>>>>> Appreciate any pointers and help,
>>>>>>
>>>>>> Thanks and regards,
>>>>>> Karthik

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss