Hi Greg,

Restarting the actual service, i.e. service ceph restart osd.50, only takes a few 
seconds.

Attached is the ceph -w output from running just service ceph restart osd.50:

You can see it marks itself down pretty much straight away. It takes a little 
while to mark itself up again and finish "recovery".

If I do this to all 12 OSDs, the node goes crazy. It's almost as if the node is 
CPU bound, but it has 6 cores, and the load average goes to 300+.

http://pastie.org/pastes/8968950/text?key=0e0bs1ojbm2arnexn52iwq

Regards,
Quenten

-----Original Message-----
From: Gregory Farnum [mailto:g...@inktank.com] 
Sent: Wednesday, 26 March 2014 2:02 AM
To: Quenten Grasso
Cc: Kyle Bader; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] OSD Restarts cause excessively high load average and 
"requests are blocked > 32 sec"

How long does it take for the OSDs to restart? Are you just issuing a restart 
command via upstart/sysvinit/whatever? How many OSDMaps are generated from the 
time you issue that command to the time the cluster is healthy again?
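
One rough way to count the OSDMaps is to compare the map epoch before the restart with the epoch once the cluster is healthy again. The sketch below assumes the ceph CLI is on PATH and that the JSON from ceph osd dump --format json carries a top-level "epoch" field; the helper names are made up for illustration.

```python
import json
import subprocess


def osdmap_epoch(dump_json: str) -> int:
    """Extract the OSDMap epoch from `ceph osd dump --format json` output.

    Assumption: the JSON document has a top-level "epoch" field.
    """
    return int(json.loads(dump_json)["epoch"])


def current_epoch() -> int:
    """Ask the cluster for its current OSDMap epoch.

    Assumption: the `ceph` CLI is installed and can reach the monitors.
    """
    out = subprocess.check_output(["ceph", "osd", "dump", "--format", "json"])
    return osdmap_epoch(out.decode())


def maps_generated(before: int, after: int) -> int:
    """Number of new OSDMap epochs published between two readings."""
    return after - before
```

Call current_epoch() just before issuing the restart and again once the cluster is healthy, then pass both values to maps_generated().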

This sounds like an issue we had for a while where OSDs would start peering 
before they had processed the maps they needed to look at; the fix might not 
have been backported to Emperor. But I'd like to be sure this isn't some other 
issue you're seeing.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Sat, Mar 22, 2014 at 8:16 PM, Quenten Grasso <qgra...@onq.com.au> wrote:
> Hi Kyle,
>
> Thanks, I turned on debug ms = 1 and debug osd = 10 and restarted osd.54; 
> here's the log for that one.
>
> ceph-osd.54.log.bz2
> http://www67.zippyshare.com/v/99704627/file.html
>
>
> Strace of osd.53:
> strace.zip
> http://www43.zippyshare.com/v/17581165/file.html
>
>
> Thanks,
> Quenten
> -----Original Message-----
> From: Kyle Bader [mailto:kyle.ba...@gmail.com]
> Sent: Sunday, 23 March 2014 12:10 PM
> To: Quenten Grasso
> Subject: Re: [ceph-users] OSD Restarts cause excessively high load average 
> and "requests are blocked > 32 sec"
>
>> Any ideas on why the load average goes so crazy & starts to block IO?
>
> Could you turn on "debug ms = 1" and "debug osd = 10" prior to restarting the 
> OSDs on one of your hosts, and share the logs so we can take a look?
>
> It also might be worthwhile to strace one of the OSDs to try to determine 
> what it's working so hard on, maybe:
>
> strace -fc -o strace.osd1.log -p <osd pid>
>
> Thanks!
>
> --
>
> Kyle
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com