[ceph-users] Cluster crash - FAILED assert(interval.last > last)

Josef Zelenka Wed, 10 Jan 2018 13:06:12 -0800

Hi, today we had a disastrous crash. We are running a 3-node cluster with 24 OSDs in total (8 per node), with SSDs for the block DB and HDDs for BlueStore data. The cluster is used as a radosgw backend, storing a large number of thumbnails for a file hosting site - around 110 million files in total.

We were adding an interface to the nodes, which required a restart, but after restarting one of the nodes, a lot of the OSDs were kicked out of the cluster and rgw stopped working. We currently have a lot of PGs down and unfound. Most OSDs can't be started (a few can, and why is a mystery); they fail with this error and just restart periodically:

FAILED assert(interval.last > last)

So far the cluster is broken and we can't seem to bring it back up. We tried running fsck on the OSDs via ceph-objectstore-tool, but it did no good. The root of all this seems to be the FAILED assert(interval.last > last) error, but I can't find any information about it or how to fix it. Has anyone here encountered it? We're running Luminous on Ubuntu 16.04.


Thanks


Josef Zelenka

Cloudevelops

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
