Thanks, Gregory!

My Ceph version is 0.94.1. What I'm trying to test is the worst-case situation,
when a node loses its network or becomes unresponsive. So what I do is
"killall -9 ceph-osd", then reboot.

Well, I also tried a clean reboot several times (just the "reboot" command),
but I saw no difference - there is always an IO freeze of about 30 seconds.
Btw, I'm using Fedora 20 on all nodes.

Ok, I will play with timeouts more.
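
For example, something along these lines in ceph.conf is what I plan to
experiment with (the values are just guesses to see how fast failure
detection can get, not recommendations):

  [osd]
  # ping peers more often than the default 6 seconds
  osd heartbeat interval = 3
  # report a peer down after a shorter grace period than the default
  osd heartbeat grace = 10

  [mon]
  # keep the grace period fixed instead of letting the monitors
  # stretch it for laggy OSDs
  mon osd adjust heartbeat grace = false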

Thanks again!

On Wed, May 13, 2015 at 10:46 AM, Gregory Farnum <g...@gregs42.com> wrote:

> On Tue, May 12, 2015 at 11:39 PM, Vasiliy Angapov <anga...@gmail.com>
> wrote:
> > Hi, colleagues!
> >
> > I'm testing a simple Ceph cluster in order to use it in a production
> > environment. I have 8 OSDs (1 TB SATA drives) which are evenly
> > distributed between 4 nodes.
> >
> > I've mapped an rbd image on the client node and started writing a lot
> > of data to it. Then I just reboot one node and see what happens. What
> > happens is very sad: I get a write freeze of about 20-30 seconds, which
> > is enough for the ext4 filesystem to switch to read-only.
> >
> > I wonder, is there any way to minimize this lag? AFAIK, ext filesystems
> > have a 5-second timeout before switching to read-only. So is there any
> > way to get that lag below 5 seconds? I've tried lowering different OSD
> > timeouts, but it doesn't seem to help.
> >
> > How do you deal with such situations? 20 seconds of downtime is not
> > tolerable in production.
>
> What version of Ceph are you running, and how are you rebooting it?
> Any newish version that gets a clean reboot will notify the cluster
> that it's shutting down, so you shouldn't witness blocked writes
> at all.
>
> If you're doing a reboot that involves just ending the daemon, you
> will have to wait through the timeout period before the OSD gets
> marked down, which defaults to 30 seconds. This is adjustable (look
> for docs on the "osd heartbeat grace" config option), although if you
> set it too low you'll need to change a bunch of other timeouts which I
> don't know off-hand...
> -Greg
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
