Hi, I can see ~17% hardware interrupts, which I find a little high - can you check /proc/interrupts and make sure the interrupt load is spread over all your cores?
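
Something like this quick, untested sketch (it just sums the per-CPU columns of /proc/interrupts and assumes the usual layout) should show you roughly how the interrupts are spread:

#!/usr/bin/env python
# Sum the per-CPU interrupt counts in /proc/interrupts to see whether
# most IRQs land on a single core.
with open("/proc/interrupts") as f:
    cpus = f.readline().split()            # header line: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in f:
        fields = line.split()[1:]          # drop the "NN:" / "NMI:" label
        for i in range(min(len(cpus), len(fields))):
            if fields[i].isdigit():
                totals[i] += int(fields[i])
grand = float(sum(totals)) or 1.0
for cpu, count in zip(cpus, totals):
    print("%s: %12d (%.1f%%)" % (cpu, count, 100.0 * count / grand))

If one core is taking most of them, irqbalance or pinning the NIC/HBA IRQs by hand usually helps.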
What about disk util once you restart them? Are they all 100% utilized, or is it 'only' mostly CPU-bound? (There's a quick sketch for checking this at the bottom of this mail.) Also, you're running a monitor on this node - how does the load on the nodes where you run a monitor compare to those where you don't?

Cheers,
Martin

On Thu, Mar 20, 2014 at 10:18 AM, Quenten Grasso <qgra...@onq.com.au> wrote:
> Hi All,
>
> I left out my OS/kernel version: Ubuntu 12.04.4 LTS with kernel
> 3.10.33-031033-generic (we upgrade our kernels to 3.10 due to Dell drivers).
>
> Here's an example of starting all the OSDs after a reboot:
>
> top - 09:10:51 up 2 min, 1 user, load average: 332.93, 112.28, 39.96
> Tasks: 310 total, 1 running, 309 sleeping, 0 stopped, 0 zombie
> Cpu(s): 50.3%us, 32.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 17.2%hi, 0.0%si, 0.0%st
> Mem: 32917276k total, 6331224k used, 26586052k free, 1332k buffers
> Swap: 33496060k total, 0k used, 33496060k free, 1474084k cached
>
>   PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
> 15875 root 20  0  910m 381m 50m S   60  1.2 0:50.57 ceph-osd
>  2996 root 20  0  867m 330m 44m S   59  1.0 0:58.32 ceph-osd
>  4502 root 20  0  907m 372m 47m S   58  1.2 0:55.14 ceph-osd
> 12465 root 20  0  949m 418m 55m S   58  1.3 0:51.79 ceph-osd
>  4171 root 20  0  886m 348m 45m S   57  1.1 0:56.17 ceph-osd
>  3707 root 20  0  941m 405m 50m S   57  1.3 0:59.68 ceph-osd
>  3560 root 20  0  924m 394m 51m S   56  1.2 0:59.37 ceph-osd
>  4318 root 20  0  965m 435m 55m S   56  1.4 0:54.80 ceph-osd
>  3337 root 20  0  935m 407m 51m S   56  1.3 1:01.96 ceph-osd
>  3854 root 20  0  897m 366m 48m S   55  1.1 1:00.55 ceph-osd
>  3143 root 20  0 1364m 424m 24m S   16  1.3 1:08.72 ceph-osd
>  2509 root 20  0  652m 261m 62m S    2  0.8 0:26.42 ceph-mon
>     4 root 20  0     0    0   0 S    0  0.0 0:00.08 kworker/0:0
>
> Regards,
> Quenten Grasso
>
> From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Quenten Grasso
> Sent: Tuesday, 18 March 2014 10:19 PM
> To: 'ceph-users@lists.ceph.com'
> Subject: [ceph-users] OSD Restarts cause excessively high load average and "requests are blocked > 32 sec"
>
> Hi All,
>
> I'm trying to troubleshoot a strange issue with my Ceph cluster.
>
> We're running Ceph version 0.72.2. All nodes are Dell R515s with a 6-core AMD CPU, 32 GB RAM, 12 x 3 TB Nearline SAS drives and 2 x 100 GB Intel DC S3700 SSDs for journals. All pools have a replica count of 2 or better, e.g. metadata has a replica count of 3.
>
> I have 55 OSDs in the cluster across 5 nodes. When I restart the OSDs on a single node (any node), the load average of that node shoots up to 230+ and the whole cluster starts blocking IO requests until it settles down, after which it's fine again.
>
> Any ideas on why the load average goes so crazy and starts to block IO?
>
> <snips from my ceph.conf>
> [osd]
> osd data = /var/ceph/osd.$id
> osd journal size = 15000
> osd mkfs type = xfs
> osd mkfs options xfs = "-i size=2048 -f"
> osd mount options xfs = "rw,noexec,nodev,noatime,nodiratime,barrier=0,inode64,logbufs=8,logbsize=256k"
> osd max backfills = 5
> osd recovery max active = 3
>
> [osd.0]
> host = pbnerbd01
> public addr = 10.100.96.10
> cluster addr = 10.100.128.10
> osd journal = /dev/disk/by-id/scsi-36b8ca3a0eaa2660019deaf8d3a40bec4-part1
> devs = /dev/sda4
> </end>
>
> Thanks,
> Quenten
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
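
PS: for the disk-util question above, if you don't have iostat/sysstat handy, here is an untested sketch that gets roughly the same %util figure as iostat -x by sampling /proc/diskstats twice (it assumes your data disks show up as whole sdX devices):

#!/usr/bin/env python
# Rough per-disk %util: delta of the "time spent doing I/O (ms)" column
# of /proc/diskstats over a sampling interval, same idea as iostat -x.
import time

def io_ms():
    busy = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            dev = fields[2]
            # whole disks only (sda, sdb, ...); skip partitions like sda1
            if dev.startswith("sd") and not dev[-1].isdigit():
                busy[dev] = int(fields[12])   # 10th stats field: ms doing I/O
    return busy

interval = 5.0
before = io_ms()
time.sleep(interval)
after = io_ms()
for dev in sorted(after):
    util = (after[dev] - before.get(dev, 0)) / (interval * 10.0)  # ms -> %
    print("%-6s %5.1f%% util" % (dev, util))

If the spinners all sit near 100% during the restart, it is the disks; if they don't, the CPU/interrupt side is the more likely bottleneck.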
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com