I can confirm it seems to be kernels newer than 3.16: we had this problem
where servers would lock up and we had to restart them on a weekly basis.
We downgraded to 3.16, and since then we have not had to do any restarts.

I did find this thread in the XFS mailing list archives, and I am not sure
if it has been fixed or not:
http://oss.sgi.com/archives/xfs/2015-07/msg00034.html
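
In case it helps anyone map out where they stand, below is a minimal
Python sketch for collecting the running kernel from each node over SSH.
The hostnames are placeholders for your own OSD hosts, and it assumes
passwordless SSH; it is just a convenience check, nothing Ceph-specific.

    #!/usr/bin/env python
    # Print the running kernel on each node so it's easy to see which
    # hosts are still on a kernel newer than 3.16.
    # The hostnames below are placeholders; substitute your own.
    import subprocess

    HOSTS = ["ceph-osd01", "ceph-osd02", "ceph-osd03"]

    for host in HOSTS:
        kernel = subprocess.check_output(["ssh", host, "uname", "-r"])
        print("%s: %s" % (host, kernel.decode().strip()))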


On Tue, Dec 8, 2015 at 2:06 AM Tom Christensen <pav...@gmail.com> wrote:

> We run deep scrubs via cron with a script so we know when deep scrubs are
> happening, and we've seen nodes fail both during deep scrubbing and while
> no deep scrubs are occurring, so I'm pretty sure it's not related.
>
>
> On Tue, Dec 8, 2015 at 2:42 AM, Benedikt Fraunhofer <fraunho...@traced.net
> > wrote:
>
>> Hi Tom,
>>
>> 2015-12-08 10:34 GMT+01:00 Tom Christensen <pav...@gmail.com>:
>>
>> > We didn't go forward to 4.2 as it's a large production cluster, and we
>> > just needed the problem fixed.  We'll probably test out 4.2 in the next
>> > couple
>>
>> Unfortunately we don't have the luxury of a test cluster.
>> And to add to that, we couldn't simulate the load, although it does not
>> seem to be load-related.
>> Did you try running with nodeep-scrub as a short-term workaround?
>>
>> I'll give ~30% of the nodes 4.2 and see how it goes.
>>
>> > In our experience it takes about 2 weeks to start happening
>>
>> We're well below that: somewhere between 1 and 4 days.
>> And yes, once one goes south, it affects the rest of the cluster.
>>
>> Thx!
>>
>>  Benedikt
>>
>
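
On the deep-scrub side, for anyone who wants to try the same kind of
setup Tom describes (deep scrubs driven from cron by a script, so you
know exactly when they ran), the following is not his script, just a
minimal Python sketch of the idea. The batch size and log path are
arbitrary placeholders, and the JSON layout of "ceph pg dump" varies a
bit between releases, so check the output on your own version first.

    #!/usr/bin/env python
    # Minimal sketch of a cron-driven deep scrub: find the PGs whose deep
    # scrub timestamp is oldest, request a deep scrub on a batch of them,
    # and log when we did it so node failures can later be lined up
    # against scrub activity. BATCH and LOG are arbitrary placeholders.
    import json
    import subprocess
    import time

    BATCH = 20
    LOG = "/var/log/ceph-deep-scrub.log"

    dump = json.loads(subprocess.check_output(
        ["ceph", "pg", "dump", "--format", "json"]).decode())

    # each pg_stats entry carries a pgid and a last_deep_scrub_stamp
    pgs = sorted(dump["pg_stats"],
                 key=lambda pg: pg["last_deep_scrub_stamp"])

    with open(LOG, "a") as log:
        for pg in pgs[:BATCH]:
            subprocess.check_call(["ceph", "pg", "deep-scrub", pg["pgid"]])
            log.write("%s deep-scrub requested for pg %s\n"
                      % (time.strftime("%Y-%m-%d %H:%M:%S"), pg["pgid"]))

As for the nodeep-scrub flag Benedikt mentions: "ceph osd set
nodeep-scrub" stops new deep scrubs from being scheduled (and "ceph osd
unset nodeep-scrub" turns them back on), so it is a reasonable way to
rule scrubbing in or out as a short-term test.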
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
