> On Sep 1, 2015, at 16:13, Simon Hallam <s...@pml.ac.uk> wrote:
> 
> Hi Greg, Zheng,
> 
> Is this fixed in a later version of the kernel client? Or would it be wise 
> for us to start using the fuse client?
> 
> Cheers,

I just wrote a fix:
https://github.com/ceph/ceph-client/commit/33b68dde7f27927a7cb1a7691e3c5b6f847ffd14

Yes, you should try ceph-fuse if this bug causes problems for you.

Regards
Yan, Zheng

> 
> Simon
> 
>> -----Original Message-----
>> From: Gregory Farnum [mailto:gfar...@redhat.com]
>> Sent: 31 August 2015 13:02
>> To: Yan, Zheng
>> Cc: Simon Hallam; Zheng Yan; ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] Testing CephFS
>> 
>> On Mon, Aug 31, 2015 at 12:16 PM, Yan, Zheng <uker...@gmail.com> wrote:
>>> On Mon, Aug 24, 2015 at 6:38 PM, Gregory Farnum <gfar...@redhat.com> wrote:
>>>> On Mon, Aug 24, 2015 at 11:35 AM, Simon Hallam <s...@pml.ac.uk> wrote:
>>>>> Hi Greg,
>>>>> 
>>>>> The MDSs detected that the other one went down and started the replay.
>>>>> 
>>>>> I did some further testing with 20 client machines. Of the 20 client
>>>>> machines, 5 hung with the following error:
>>>>> 
>>>>> [Aug24 10:53] ceph: mds0 caps stale
>>>>> [Aug24 10:54] ceph: mds0 caps stale
>>>>> [Aug24 10:58] ceph: mds0 hung
>>>>> [Aug24 11:03] ceph: mds0 came back
>>>>> [  +8.803334] libceph: mon2 10.15.0.3:6789 socket closed (con state OPEN)
>>>>> [  +0.000018] libceph: mon2 10.15.0.3:6789 session lost, hunting for new mon
>>>>> [Aug24 11:04] ceph: mds0 reconnect start
>>>>> [  +0.084938] libceph: mon2 10.15.0.3:6789 session established
>>>>> [  +0.008475] ceph: mds0 reconnect denied
>>>> 
>>>> Oh, this might be a kernel bug, failing to ask for mdsmap updates when
>>>> the connection goes away. Zheng, does that sound familiar?
>>>> -Greg
>>>> 
>>> 
>>> I reproduced this locally (using SIGSTOP to stop the monitor). I think
>>> the root cause is that the kernel client does not implement
>>> CEPH_FEATURE_MSGR_KEEPALIVE2, so it couldn't reliably detect that the
>>> network cable had been unplugged. It kept waiting for new events from
>>> the disconnected connection.
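
To illustrate the idea of an acknowledged keepalive, here is a minimal
Python sketch. It is not the kernel client code or the Ceph messenger
protocol; KeepaliveMonitor, simulate, and the interval and timeout values
are hypothetical names and numbers chosen for the example. The point is
only that a peer which stops acknowledging probes can be declared dead
after a bounded time, instead of the client waiting indefinitely on a
silent connection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeepaliveMonitor:
    """Tracks keepalive acks for one connection (hypothetical helper)."""
    interval: float          # seconds between keepalive probes
    timeout: float           # seconds without an ack before the peer is dead
    last_sent: float = 0.0   # time the last probe was sent
    last_ack: float = 0.0    # time the last ack was received

    def should_send(self, now: float) -> bool:
        return now - self.last_sent >= self.interval

    def on_sent(self, now: float) -> None:
        self.last_sent = now

    def on_ack(self, now: float) -> None:
        self.last_ack = now

    def is_dead(self, now: float) -> bool:
        return now - self.last_ack > self.timeout


def simulate(stop_at: float, interval: float = 10.0,
             timeout: float = 30.0, horizon: int = 120) -> Optional[float]:
    """Step a fake clock one second at a time.

    The peer acks every probe sent before `stop_at` and is silent afterwards
    (as if it had been stopped with SIGSTOP or the cable had been pulled).
    Returns the time at which the connection is declared dead, or None if
    that never happens within `horizon` seconds.
    """
    mon = KeepaliveMonitor(interval=interval, timeout=timeout)
    for tick in range(horizon):
        now = float(tick)
        if mon.is_dead(now):
            return now
        if mon.should_send(now):
            mon.on_sent(now)
            if now < stop_at:      # peer still alive: ack arrives immediately
                mon.on_ack(now)
    return None


if __name__ == "__main__":
    print("peer stopped at t=30, declared dead at:", simulate(stop_at=30.0))
    print("peer never stopped, declared dead at:", simulate(stop_at=1000.0))
```

Without the ack check there is nothing that turns a silent connection into
an error, which matches the hang described above.
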
>> 
>> Yeah, the userspace client maintains an ongoing MDSMap subscription
>> from the monitors in order to hear about this. It puts more load on
>> the monitors but right now that's the solution we're going with: the
>> monitor times out the MDS, publishes a series of new maps (pushed to
>> the clients) in order to activate a standby, and the clients see that
>> they need to connect to the new MDS instance.
>> -Greg
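
The failover flow described above can be sketched from the client's side
as follows. This is an illustrative Python model, not Ceph code: MdsMap,
Client, handle_map, and the addresses are hypothetical, and a real MDSMap
carries far more state. It assumes the client receives each new map with
an increasing epoch and reconnects when the active MDS address changes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MdsMap:
    """Simplified stand-in for an MDS map (hypothetical)."""
    epoch: int
    active_addr: Optional[str]  # None while no MDS is active for the rank


class Client:
    """Client that stays subscribed to map updates (hypothetical)."""

    def __init__(self) -> None:
        self.epoch = 0
        self.connected_to: Optional[str] = None
        self.log: List[str] = []

    def handle_map(self, new_map: MdsMap) -> None:
        # Ignore stale or duplicate maps; only newer epochs are applied.
        if new_map.epoch <= self.epoch:
            return
        self.epoch = new_map.epoch
        if new_map.active_addr == self.connected_to:
            return
        if new_map.active_addr is None:
            # The old MDS was timed out; wait for a standby to be activated.
            self.log.append(f"epoch {new_map.epoch}: mds down, waiting")
        else:
            self.log.append(
                f"epoch {new_map.epoch}: connect to {new_map.active_addr}")
        self.connected_to = new_map.active_addr


if __name__ == "__main__":
    client = Client()
    # Series of maps a monitor might publish during a failover (made up).
    for m in [MdsMap(1, "mds-a:6800"), MdsMap(2, None),
              MdsMap(3, "mds-b:6800"), MdsMap(2, "mds-a:6800")]:
        client.handle_map(m)
    print("\n".join(client.log))
```

A client that only asks for a map when something goes wrong on its MDS
connection can miss this sequence, which is the trade-off against monitor
load mentioned above.
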

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
