I just had to restore an MS Exchange database after a Ceph hiccup (no actual 
data lost - Exchange is very good like that with its no-loss restore!). The 
order of events went something like this:

. An OSD lost its connection to the cluster network (the public network was okay)
. PGs were reported stuck
. Stopped the OSD on the bad server
. Resolved the network problem
. Restarted the OSD on the bad server
. Noticed that the VM running Exchange had hung
. Rebooted the VM, which ran chkdsk automatically
. Exchange refused to mount the main mailbox store

I'm not using RBD caching or anything, so for NTFS to lose files like that 
means something fairly nasty happened. My best guess is that the loss of 
connectivity, and of function while Ceph was figuring out what was going on, 
meant that Windows I/O froze and started timing out, but I still can't see how 
that could result in corruption.

Any suggestions on how I could avoid this situation in the future would be 
greatly appreciated!

Thanks

James
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
