[ceph-users] Re: RESOLVED: Sudden loss of all SSD OSDs in a cluster, immediate abort on restart [Mimic 13.2.6]

2020-02-21 Thread Troy Ablan
0x7f574c8d9a28]
 4: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f574d1e87d5]
 5: (()+0x5e746) [0x7f574d1e6746]
 6: (()+0x5e773) [0x7f574d1e6773]
 7: (()+0x5e993) [0x7f574d1e6993]
 8: (OSDMap::decode(ceph::buffer::list::iterator&)+0x160e) [0x7f5750b0b68e]
 9: (OSDMap::decode(ceph::buffer:

[ceph-users] Re: RESOLVED: Sudden loss of all SSD OSDs in a cluster, immediate abort on restart [Mimic 13.2.6]

2020-02-20 Thread Troy Ablan
your cluster had crashed, but not the HDDs. Both SSDs and HDDs were bluestore? Did the hdds ever crash subsequently? Which OS/kernel do you run? We're CentOS 7 with quite some uptime. On Thu, Feb 20, 2020 at 10:29 PM Troy Ablan wrote: I hope I don't sound too happy to hear that you

[ceph-users] Re: RESOLVED: Sudden loss of all SSD OSDs in a cluster, immediate abort on restart [Mimic 13.2.6]

2020-02-20 Thread Troy Ablan
an van der Ster wrote: Thanks Troy for the quick response. Are you still running mimic on that cluster? Seeing the crashes in nautilus too? Our cluster is also quite old -- so it could very well be memory or network gremlins. Cheers, Dan On Thu, Feb 20, 2020 at 10:11 PM Troy Ablan wrote: Dan

[ceph-users] Re: RESOLVED: Sudden loss of all SSD OSDs in a cluster, immediate abort on restart [Mimic 13.2.6]

2020-02-20 Thread Troy Ablan
ppen again in your cluster? Cheers, Dan On Tue, Aug 20, 2019 at 2:18 AM Troy Ablan wrote: While I'm still unsure how this happened, this is what was done to solve this. Started OSD in foreground with debug 10, watched for the most recent osdmap epoch mentioned before abort(). For