I was talking on IRC and we're guessing it was a memory issue. I've woken
up every morning now to some sort of scrub errors, with most (but not
all) spawning from the one system with the now-dead OSDs. This morning I
didn't wake up to find any scrub errors (but I can't tell if it has
anything to
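(For reference, the scrub errors show up as inconsistent PGs, which can be listed and, if the other replicas are intact, repaired per PG; a minimal sketch, with the pgid below just a placeholder:

    # list the PGs that scrub flagged as inconsistent
    ceph health detail | grep inconsistent
    # ask the primary OSD to repair one of them
    ceph pg repair 2.1f

As far as I know, on older releases repair simply rewrites the other copies from the primary's copy, so it's worth checking which replica is actually the good one first.)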
With pool size=3, Ceph should still be able to recover from 2 failed
OSDs. It will, however, disallow client access to the PGs that have only 1
copy left until they are replicated to at least min_size copies. Such PGs
are not marked as "active".
As to the reason for your problems, it seems hardware-related. Wha
I've had two OSDs fail and I'm pretty sure they won't recover from this.
I'm looking for help trying to get them back online, if possible...
terminate called after throwing an instance of 'ceph::buffer::malformed_input'
what(): buffer::malformed_input: bad checksum on pg_log_entry_t
- I'm having
http://pastebin.com/BUm61Bbf
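If the OSD daemons really won't start any more, it is sometimes still possible to pull the PG data off their stores offline with ceph-objectstore-tool and import it into a healthy OSD; a rough sketch, assuming default paths and placeholder OSD/PG ids, and with no guarantee it helps when the pg_log itself is what's corrupted:

    # with the dead OSD stopped, export one PG from its store
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --op export --pgid 2.1f --file /tmp/2.1f.export
    # import it into another OSD (also stopped), then start that OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
        --journal-path /var/lib/ceph/osd/ceph-3/journal \
        --op import --file /tmp/2.1f.export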
On our cluster, which hosts mainly RBDs, we had an OSD fail; the OSD was
replaced. During the rebalance onto the new OSD, another OSD failed. That
OSD was replaced while the rebalance continued. Now that the dust has
settled, most of our RBDs are hanging on PGs and
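One way to see which PGs a hanging image is actually waiting on is to map one of its objects back to a PG and compare that against the stuck PGs; a rough sketch, with the pool and image names as placeholders and assuming a format 2 image:

    # find the object name prefix for the image
    rbd info rbd/myimage | grep block_name_prefix
    # see which PG (and OSDs) one of its objects maps to
    ceph osd map rbd <block_name_prefix>.0000000000000000
    # compare against the PGs that are stuck
    ceph pg dump_stuck unclean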