Re: [ceph-users] OSDs crashing in EC pool (whack-a-mole)

2019-01-18 Thread Peter Woodman
At the risk of hijacking this thread: as I said, I've run into this problem again, and have captured a log with debug_osd=20, viewable at https://www.dropbox.com/s/8zoos5hhvakcpc4/ceph-osd.3.log?dl=0 - any pointers? On Tue, Jan 8, 2019 at 11:31 AM Peter Woodman wrote: > > For the record, in the
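
A minimal sketch of how a log like that might be captured at runtime, assuming osd.3 is the affected daemon (the ID is a placeholder; substitute the crashing OSD):

    # raise logging on the running OSD (osd.3 is a placeholder ID)
    ceph tell osd.3 injectargs '--debug_osd 20/20'
    # ...reproduce the crash, then drop back to the default level
    ceph tell osd.3 injectargs '--debug_osd 1/5'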

Re: [ceph-users] OSDs crashing in EC pool (whack-a-mole)

2019-01-08 Thread Peter Woodman
For the record, in the linked issue, it was thought that this might be due to write caching. This seems not to be the case, as it happened again to me with write caching disabled. On Tue, Jan 8, 2019 at 11:15 AM Sage Weil wrote: > > I've seen this on luminous, but not on mimic. Can you generate
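
For context, a drive's volatile write cache can usually be inspected and toggled with hdparm; a sketch, assuming /dev/sda is the OSD's backing device (a placeholder):

    # show the current write-cache setting for the (placeholder) device
    hdparm -W /dev/sda
    # disable the on-drive write cache
    hdparm -W 0 /dev/sda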

Re: [ceph-users] OSDs crashing in EC pool (whack-a-mole)

2019-01-08 Thread Sage Weil
I've seen this on luminous, but not on mimic. Can you generate a log with debug osd = 20 leading up to the crash? Thanks! sage On Tue, 8 Jan 2019, Paul Emmerich wrote: > I've seen this before a few times but unfortunately there doesn't seem > to be a good solution at the moment :( > > See al
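
One way to make the debug setting persist across the OSD restarting after a crash is to set it in ceph.conf rather than injecting it at runtime; a sketch (the section applies to every OSD on that host):

    # /etc/ceph/ceph.conf on the affected host
    [osd]
    debug osd = 20
    # restart the OSD and wait for the crash; the log is written to
    # /var/log/ceph/ceph-osd.<id>.log by default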

Re: [ceph-users] OSDs crashing in EC pool (whack-a-mole)

2019-01-08 Thread Paul Emmerich
I've seen this before a few times but unfortunately there doesn't seem to be a good solution at the moment :( See also: http://tracker.ceph.com/issues/23145 Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München

[ceph-users] OSDs crashing in EC pool (whack-a-mole)

2019-01-08 Thread David Young
Hi all, One of my OSD hosts recently ran into RAM contention (was swapping heavily), and after rebooting, I'm seeing this error on random OSDs in the cluster:
---
Jan 08 03:34:36 prod1 ceph-osd[3357939]: ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
Jan 08 03:34
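
To recover the full backtrace behind a truncated journal snippet like the one above, the systemd journal for the failing daemon can be queried directly; a sketch, assuming OSD id 3 (a placeholder):

    # last 200 journal lines for the ceph-osd@3 unit (3 is a placeholder ID)
    journalctl -u ceph-osd@3 -n 200 --no-pager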