I have posted logs/strace from our OSDs with details to a ticket in the
Ceph bug tracker - see http://tracker.ceph.com/issues/21142. There you
can see exactly where the OSDs crash, which should help if someone
decides to debug it.
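
In case anyone wants to capture the same thing: a crashing OSD can be
traced roughly like this (the OSD id 12 and the output path are just
examples, adjust for your daemons; stop systemd first so it doesn't
keep respawning the daemon):

    systemctl stop ceph-osd@12
    strace -f -tt -o /tmp/osd.12.strace \
        /usr/bin/ceph-osd -f --cluster ceph --id 12 \
        --setuser ceph --setgroup ceph --debug-osd 20

Running the OSD in the foreground with -f also makes the assert show up
directly on the terminal rather than only in the OSD log.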
JZ
On 10/01/18 22:05, Josef Zelenka wrote:
Hi, today we had a disastrous crash - we are running a 3-node cluster
with 24 OSDs in total (8 per node), with SSDs for the block DB and HDDs
for the BlueStore data. The cluster is used as a radosgw backend,
storing a large number of thumbnails for a file hosting site - around
110m files in total.

We were adding an interface to the nodes, which required a restart, but
after restarting one of the nodes a lot of the OSDs were kicked out of
the cluster and rgw stopped working. At the moment we have a lot of PGs
down and unfound. Most OSDs can't be started (a few can, which is a
mystery) - they fail with FAILED assert(interval.last > last) and just
periodically restart. So far the cluster is broken and we can't bring
it back up. We tried fscking the OSDs with ceph-objectstore-tool (see
the example below), but it was no good. The root of all this seems to
be the FAILED assert(interval.last > last) error, but I can't find any
info on it or how to fix it. Has anyone here encountered it as well?
We're running Luminous on Ubuntu 16.04.
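
For reference, the fsck was roughly along these lines (the OSD id 12
and PG 1.2a below are placeholders; this must be run with the OSD
stopped):

    systemctl stop ceph-osd@12
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op fsck

    # before any more invasive repair attempts, PGs can be exported
    # from the stopped OSD for safekeeping:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 1.2a --op export --file /tmp/pg.1.2a.export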
Thanks
Josef Zelenka
Cloudevelops
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com