Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-23 Thread Alex Litvak
The only possible hint: the crash coincides with the start of a scrub time interval. Why it didn't happen yesterday at the same time, I have no idea. I restored the default debug settings in the hope that I get a little more info when the next crash happens. I really would like to debug only specific componen

Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-23 Thread Alex Litvak
I just had an OSD crash with no logs (debug was not enabled). It happened 24 hours after the actual upgrade from 14.2.1 to 14.2.2. Nothing else changed as far as environment or load. The disk is OK. I restarted the OSD and it came back. I had the cluster up for 2 months until the upgrade without an issue. O

Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-23 Thread Nathan Fish
I have not had any more OSDs crash, but the 3 that crashed still crash on startup. I may purge and recreate them, but there's no hurry. I have 18 OSDs per host and plenty of free space currently. On Tue, Jul 23, 2019 at 2:19 AM Ashley Merrick wrote: > > Have they been stable since, or still had s

Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-22 Thread Ashley Merrick
Have they been stable since, or have you still had some crashes? Thanks, On Sat, 20 Jul 2019 10:09:08 +0800 Nigel Williams wrote On Sat, 20 Jul 2019 at 04:28, Nathan Fish wrote: On further investigation, it seems to be this bug: http://tracker.ceph.com/issu

Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Alex Litvak
I was planning to upgrade 14.2.1 to 14.2.2 next week. Since there are a few reports of crashes, does anyone know if the upgrade somehow triggers the issue? If not, then what does? Since some reported this before upgrading, I'm just wondering if upgrading to 14.2.2 makes the problem worse. O

Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Nathan Fish
Good to know. I tried reset-failed and restart several times; it didn't work on any of them. I also rebooted one of the hosts, which didn't help. Thankfully it seems they failed far enough apart that our nearly-empty cluster rebuilt in time. But it's rather worrying. On Fri, Jul 19, 2019 at 10:09 PM Nig

Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Nigel Williams
On Sat, 20 Jul 2019 at 04:28, Nathan Fish wrote: > On further investigation, it seems to be this bug: > http://tracker.ceph.com/issues/38724 We just upgraded to 14.2.2, and had a dozen OSDs at 14.2.2 go down with this bug; we recovered with: systemctl reset-failed ceph-osd@160 systemctl start ceph-osd
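The recovery Nigel describes (clear systemd's failed state, then start the unit again) can be repeated over several OSDs with a small loop. This is only a sketch under assumptions: the function name restart_osds and the DRY_RUN switch are hypothetical, the second command is assumed to target the same ceph-osd@<id> unit (the preview above is cut off), and, as Nathan reports in this thread, this did not bring back OSDs that kept crashing on startup.

```shell
#!/bin/sh
# Hypothetical helper: for each OSD id, run "systemctl reset-failed" and then
# "systemctl start" on the ceph-osd@<id> unit (unit naming assumed from the thread).
# With DRY_RUN=1 the commands are only printed, which is useful for checking the list first.
restart_osds() {
  for id in "$@"; do
    for action in reset-failed start; do
      if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "systemctl $action ceph-osd@$id"
      else
        # Report a failure but keep going with the remaining OSDs.
        systemctl "$action" "ceph-osd@$id" || echo "failed: systemctl $action ceph-osd@$id" >&2
      fi
    done
  done
}
```

Usage: set DRY_RUN=1 and call restart_osds 160 to preview the two commands for OSD 160, then unset DRY_RUN to actually run them.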

Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Nathan Fish
On further investigation, it seems to be this bug: http://tracker.ceph.com/issues/38724 On Fri, Jul 19, 2019 at 1:38 PM Nathan Fish wrote: > > I came in this morning and started to upgrade to 14.2.2, only to > notice that 3 OSDs had crashed overnight - exactly 1 on each of 3 > hosts. Apparently t

[ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-19 Thread Nathan Fish
I came in this morning and started to upgrade to 14.2.2, only to notice that 3 OSDs had crashed overnight - exactly 1 on each of 3 hosts. Apparently there was no data loss, which implies they crashed at different times, far enough apart to rebuild? Still digging through logs to find exactly when the