Going to try increasing the full ratio first, since disk utilization wasn't growing at an unreasonable pace. I'll keep an eye on it over the next couple of hours and mark the OSDs down/out if necessary.
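For anyone who hits this later, here's roughly what I'm planning to run, per Sage's suggestions below (osd.14 is just a placeholder for whichever OSD is marked full, and the ratio/weight values are judgment calls for our cluster, not a general recommendation):

    # raise the full threshold slightly so blocked clients can resume
    # (increasingly risky the closer the OSD gets to 100%)
    ceph pg set_full_ratio .97

    # partially drain the full OSD so some data rebalances off of it
    ceph osd reweight 14 .9

    # watch cluster health and recovery while it rebalances
    ceph -s
    ceph health detail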
We have four more machines that we're in the process of adding (which doubles the number of OSDs), but we got held up by some networking nonsense.

Thanks for the tips.

On Thu, Apr 10, 2014 at 9:51 PM, Sage Weil <s...@inktank.com> wrote:
> On Thu, 10 Apr 2014, Greg Poirier wrote:
> > Hi,
> > I have about 200 VMs with a common RBD volume as their root filesystem and a
> > number of additional filesystems on Ceph.
> >
> > All of them have stopped responding. One of the OSDs in my cluster is marked
> > full. I tried stopping that OSD to force things to rebalance or at least go
> > to degraded mode, but nothing is responding still.
> >
> > I'm not exactly sure what to do or how to investigate. Suggestions?
>
> Try marking the osd out or partially out (ceph osd reweight N .9) to move
> some data off, and/or adjust the full ratio up (ceph pg set_full_ratio
> .95). Note that this becomes increasingly dangerous as OSDs get closer to
> full; add some disks.
>
> sage