The same thing happened to my setup with CentOS 7.x + a non-stock kernel
(kernel-ml from elrepo).
I was not happy with the IOPS I got out of the stock CentOS 7.x kernel, so
I did the kernel upgrade, and crashes started to happen until some of the
OSDs would no longer boot at all. The funny thing is that I was not able
to downgrade back to the stock kernel, since the OSDs then crashed with
'cannot decode' errors. I am taking backups at the moment, and OSDs still
crash from time to time due to the ceph watchdog, despite the timeouts
being raised to 20x (the kind of settings I mean are sketched below).
I believe the kernel-ml version I started with was 3.19.
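(A minimal sketch of what "20x timeouts" could look like in ceph.conf,
assuming the "watchdog" in question is the OSDs' internal heartbeat /
suicide timeouts; the values are purely illustrative, roughly 20x the
hammer-era defaults, not a recommendation:)

    [osd]
    # Internal heartbeat ("watchdog") timeouts, raised ~20x so that slow
    # requests do not make the OSD abort itself. Illustrative values only.
    osd_op_thread_timeout = 300                  # default 15
    osd_op_thread_suicide_timeout = 3000         # default 150
    filestore_op_thread_timeout = 1200           # default 60
    filestore_op_thread_suicide_timeout = 3600   # default 180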
On Tue, Dec 8, 2015 at 10:34 AM, Tom Christensen <pav...@gmail.com>
wrote:
We didn't go forward to 4.2 as it's a large production cluster, and we
just needed the problem fixed. We'll probably test out 4.2 in the next
couple of months, but this one slipped past us because it didn't occur in
our test cluster until after we had upgraded production. In our
experience it takes about two weeks to start happening, but once it does
it's all hands on deck, because nodes are going to go down regularly.
All that being said, if/when we try 4.2, it's going to need to run rock
solid for 1-2 months in our test cluster before it gets to production.
On Tue, Dec 8, 2015 at 2:30 AM, Benedikt Fraunhofer
<fraunho...@traced.net> wrote:
Hi Tom,
> We have been seeing this same behavior on a cluster that has been
> perfectly happy until we upgraded to the ubuntu vivid 3.19 kernel.
> We are in the
I can't recall when we gave 3.19 a shot, but now that you say it... the
cluster was happy for >9 months with 3.16.
Did you try 4.2, or do you think the regression introduced somewhere
between 3.16 and 3.19 is still present in 4.2?
Thx!
Benedikt