One thing to note: all of our KVM VMs have to be rebooted. This is something I wasn't expecting. I tried waiting for them to recover on their own, but that isn't happening; rebooting them restores service immediately. :/ Not ideal.
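In case it helps anyone else, something like this can cycle all running libvirt guests on a hypervisor (just a sketch, assuming the VMs are libvirt-managed; a guest that is completely wedged on I/O may ignore the ACPI reboot request and need virsh reset instead):

    # Reboot every running libvirt domain on this host.
    # virsh reboot sends an ACPI reboot request; fall back to
    # "virsh reset" for a hard reset if a guest is too hung to respond.
    for dom in $(virsh list --name); do
        echo "rebooting ${dom}"
        virsh reboot "${dom}"
    done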
On Thu, Apr 10, 2014 at 10:12 PM, Greg Poirier <greg.poir...@opower.com> wrote:

> Going to try increasing the full ratio. Disk utilization wasn't really
> growing at an unreasonable pace. I'm going to keep an eye on it for the
> next couple of hours and down/out the OSDs if necessary.
>
> We have four more machines that we're in the process of adding (which
> doubles the number of OSDs), but got held up by some networking nonsense.
>
> Thanks for the tips.
>
>
> On Thu, Apr 10, 2014 at 9:51 PM, Sage Weil <s...@inktank.com> wrote:
>
>> On Thu, 10 Apr 2014, Greg Poirier wrote:
>> > Hi,
>> > I have about 200 VMs with a common RBD volume as their root filesystem and a
>> > number of additional filesystems on Ceph.
>> >
>> > All of them have stopped responding. One of the OSDs in my cluster is marked
>> > full. I tried stopping that OSD to force things to rebalance or at least go
>> > to degraded mode, but nothing is responding still.
>> >
>> > I'm not exactly sure what to do or how to investigate. Suggestions?
>>
>> Try marking the osd out or partially out (ceph osd reweight N .9) to move
>> some data off, and/or adjust the full ratio up (ceph pg set_full_ratio
>> .95). Note that this becomes increasingly dangerous as OSDs get closer to
>> full; add some disks.
>>
>> sage
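For anyone hitting the same situation later, Sage's suggestions spelled out as commands look roughly like this (a sketch only; the OSD id 12 is a placeholder for whichever OSD "ceph health detail" actually reports as full):

    # See which OSD(s) tripped the full threshold.
    ceph health detail

    # Partially out the full OSD so some PGs move off of it
    # (osd.12 is a placeholder; use the id reported as full).
    ceph osd reweight 12 0.9

    # And/or raise the full ratio a bit to unblock writes while data rebalances.
    # This gets riskier the closer the OSDs are to 100% full, so add disks too.
    ceph pg set_full_ratio 0.95

    # Watch cluster state and utilization while data moves.
    ceph -s
    ceph df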