Hello, I'm currently running Ceph 0.61 with about 44 OSDs and 4 monitors
(one as a spare), spread across about 6 hosts.
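
For context on the monitor count, here is a minimal sketch of the quorum
arithmetic, assuming the monitors need a strict majority to form quorum.
The function names are made up for illustration and are not part of any
Ceph tool.

```python
def quorum_size(num_monitors: int) -> int:
    """Smallest strict majority of num_monitors (assumed quorum rule)."""
    return num_monitors // 2 + 1


def tolerated_failures(num_monitors: int) -> int:
    """How many monitors can be down while a majority is still reachable."""
    return num_monitors - quorum_size(num_monitors)


# Compare a few monitor counts, including the 4 used in this cluster.
for n in (3, 4, 5):
    print(f"{n} monitors: quorum needs {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} down")
```

Under that assumption, 4 monitors tolerate the same single failure as 3
would, so the fourth monitor does not add any extra headroom.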

I've been running into an issue where, when one Ceph host goes down, the
entire system becomes unusable. Today we recovered from an SSD crash that
took out an OSD's journal, and it was a lot of work to get things back up:
I couldn't get the monitors to come up and establish quorum. I was going
to rebuild it manually, but the Ceph documentation for manually (dirty)
removing a monitor with the monmap tool is outdated; I couldn't find the
/mon-$id/monmap directory.
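
For reference, this is roughly the sequence I understand the manual removal
to be, written as a dry-run sketch that only builds and prints the commands.
It assumes this version's ceph-mon supports --extract-monmap and
--inject-monmap, and that the surviving monitor is stopped first. The
monitor ids "a" and "d", the /tmp/monmap path, and the helper name are
hypothetical.

```python
from typing import List


def monmap_removal_commands(surviving_id: str, dead_id: str,
                            map_path: str = "/tmp/monmap") -> List[List[str]]:
    """Build (but do not run) the commands to drop dead_id from the monmap.

    Assumption: the surviving monitor daemon is stopped before running these.
    """
    return [
        # Dump the current monmap from the surviving monitor's store.
        ["ceph-mon", "-i", surviving_id, "--extract-monmap", map_path],
        # Show the map contents so the monitor names can be checked.
        ["monmaptool", "--print", map_path],
        # Remove the dead monitor from the extracted map.
        ["monmaptool", map_path, "--rm", dead_id],
        # Write the edited map back into the surviving monitor's store.
        ["ceph-mon", "-i", surviving_id, "--inject-monmap", map_path],
    ]


# Hypothetical ids: keep monitor "a", remove monitor "d".
for cmd in monmap_removal_commands("a", "d"):
    print(" ".join(cmd))
```

Nothing here touches the cluster; it just prints the command lines so they
can be reviewed before running anything by hand.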

Anyway, I eventually recovered and was able to run with 4 monitors. I then
updated the crushmap, and that crashed the monitor I was sending the
updated crushmap to.

It now gives me

[976]: (33) Numerical argument out of domain

when I try to start it manually. I've seen this assert failure before; I'm
just not sure what's causing it.
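
As a small sanity check on the error itself, the (33) looks like a plain
errno value. A minimal sketch for decoding it, assuming a Linux host (the
helper name is made up for illustration):

```python
import errno
import os
from typing import Tuple


def describe_errno(code: int) -> Tuple[str, str]:
    """Return (symbolic name, system message) for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# 33 is the number shown in the monitor's error line.
name, message = describe_errno(33)
print(f"errno 33 = {name}: {message}")
```

On Linux this maps 33 to EDOM, which only says that some call rejected its
input; it does not say which value in the map was out of range.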

Below is the log from the crash.
https://docs.google.com/a/nopatentpending.com/file/d/0BwQnRodV8ActNTVFUVpLVjdMSGc/edit

I'm not even really sure my configs are right; I'm still pretty new at
this.

Below are the configs and the last map:

ceph.conf
https://docs.google.com/file/d/0BwQnRodV8Acta3ZfSnBrOU40MW8/edit?usp=sharing

crush.map.txt
https://docs.google.com/file/d/0BwQnRodV8Actbl9hY054Mm9UTXM/edit?usp=sharing

If you need additional dumps from the monitor, I can get them.

Thanks,
mr.npp