One thing to note: all of our KVM VMs have to be rebooted. This is something I wasn't expecting. I tried waiting for them to recover on their own, but that isn't happening; rebooting them restores service immediately. :/ Not ideal.
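In case it helps anyone else, something like this can cycle all running libvirt guests on a hypervisor (just a sketch, assuming the VMs are libvirt-managed; a guest that is completely wedged on I/O may ignore the ACPI reboot request and need virsh reset instead):

    # Reboot every running libvirt domain on this host.
    # virsh reboot sends an ACPI reboot request; fall back to
    # "virsh reset" for a hard reset if a guest is too hung to respond.
    for dom in $(virsh list --name); do
        echo "rebooting ${dom}"
        virsh reboot "${dom}"
    done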
On Thu, Apr 10, 2014 at 10:12 PM, Greg Poirier <greg.poir...@opower.com> wrote:

> Going to try increasing the full ratio. Disk utilization wasn't really
> growing at an unreasonable pace. I'm going to keep an eye on it for the
> next couple of hours and down/out the OSDs if necessary.
>
> We have four more machines that we're in the process of adding (which
> doubles the number of OSDs), but got held up by some networking nonsense.
>
> Thanks for the tips.
>
>
> On Thu, Apr 10, 2014 at 9:51 PM, Sage Weil <s...@inktank.com> wrote:
>
>> On Thu, 10 Apr 2014, Greg Poirier wrote:
>> > Hi,
>> > I have about 200 VMs with a common RBD volume as their root filesystem and a
>> > number of additional filesystems on Ceph.
>> >
>> > All of them have stopped responding. One of the OSDs in my cluster is marked
>> > full. I tried stopping that OSD to force things to rebalance or at least go
>> > to degraded mode, but nothing is responding still.
>> >
>> > I'm not exactly sure what to do or how to investigate. Suggestions?
>>
>> Try marking the osd out or partially out (ceph osd reweight N .9) to move
>> some data off, and/or adjust the full ratio up (ceph pg set_full_ratio
>> .95). Note that this becomes increasingly dangerous as OSDs get closer to
>> full; add some disks.
>>
>> sage
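For anyone hitting the same situation later, Sage's suggestions spelled out as commands look roughly like this (a sketch only; the OSD id 12 is a placeholder for whichever OSD "ceph health detail" actually reports as full):

    # See which OSD(s) tripped the full threshold.
    ceph health detail

    # Partially out the full OSD so some PGs move off of it
    # (osd.12 is a placeholder; use the id reported as full).
    ceph osd reweight 12 0.9

    # And/or raise the full ratio a bit to unblock writes while data rebalances.
    # This gets riskier the closer the OSDs are to 100% full, so add disks too.
    ceph pg set_full_ratio 0.95

    # Watch cluster state and utilization while data moves.
    ceph -s
    ceph df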