Re: [ceph-users] Double OSD failure (won't start) any recovery options?

2016-06-30 Thread XPC Design
I was talking on IRC and we're guessing it was a memory issue. I've woken up every morning now to some sort of scrub errors, with most (but not all) spawning from the one system with the now-dead OSDs. This morning I didn't wake up to find any scrub errors (but I can't tell if it has anything to
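For reference, scrub inconsistencies of this kind are normally inspected and repaired per PG. A minimal sketch, assuming the affected PG id is 3.1a (a placeholder):

    ceph pg deep-scrub 3.1a   # re-run a deep scrub on the suspect PG
    ceph pg repair 3.1a       # ask the primary OSD to repair the PG

If the errors really do come from bad RAM, repair only masks the symptom until the hardware is replaced.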

Re: [ceph-users] Double OSD failure (won't start) any recovery options?

2016-06-30 Thread Tomasz Kuzemko
With pool size=3, Ceph should still be able to recover from 2 failed OSDs. It will, however, disallow client access to the PGs that have only 1 copy until they are replicated at least min_size times. Such PGs are not marked as "active". As to the reason for your problems, it seems hardware-related. Wha
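A quick way to confirm the size/min_size settings and to see which PGs are being held inactive, as a sketch (the pool name "rbd" is an assumption):

    ceph osd pool get rbd size        # expected: 3
    ceph osd pool get rbd min_size    # PGs with fewer copies than this block client I/O
    ceph pg dump_stuck inactive       # PGs currently not serving I/O

Lowering min_size (ceph osd pool set rbd min_size 1) can bring such PGs back online at the cost of running on a single copy, so it is only a stop-gap while recovery runs.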

[ceph-users] Double OSD failure (won't start) any recovery options?

2016-06-29 Thread XPC Design
I've had two OSDs fail and I'm pretty sure they won't recover from this. I'm looking for help trying to get them back online if possible... terminate called after throwing an instance of 'ceph::buffer::malformed_input' what(): buffer::malformed_input: bad checksum on pg_log_entry_t - I'm having
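When the OSD processes abort on startup like this, the usual fallback is to export PG contents directly from the stopped OSD's data store with ceph-objectstore-tool and import them into a healthy OSD. A minimal sketch, assuming a filestore OSD with id 12 and PG 3.1a (both placeholders):

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal --op list-pgs
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --pgid 3.1a --op export --file /tmp/pg.3.1a.export

The export file can then be imported on another (stopped) OSD with --op import --file /tmp/pg.3.1a.export. Whether this succeeds depends on how far the corruption extends beyond the pg_log.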

[ceph-users] Double OSD failure

2015-09-22 Thread David Bierce
http://pastebin.com/BUm61Bbf On our cluster that hosts mainly RBDs, we had an OSD fail and the OSD was replaced. During the rebalance with the new OSD, another OSD failed. That OSD was replaced during the continuing rebalance. Now that the dust has settled, most of our RBDs are hanging on PGs and
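A starting point for finding out which PGs the hung RBD I/O is waiting on, as a sketch (the PG id 3.1a is a placeholder):

    ceph health detail          # lists degraded/undersized/stuck PGs by id
    ceph pg dump_stuck unclean  # PGs that have not returned to active+clean
    ceph pg 3.1a query          # per-PG state, acting set, and recovery progress

PGs reported as down or incomplete will usually trace back to the two replaced OSDs; their state is what keeps the RBD clients blocked.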