Hi, 

We have recently deployed a Ceph cluster with 

12 OSD nodes (16 cores + 200 GB RAM + 30 x 14 TB disks each), running CentOS 8
3 monitor nodes (8 cores + 16 GB RAM), running CentOS 8

We are running Ceph Octopus and using RBD block devices.

We have three Ceph client nodes (16 cores + 30 GB RAM, running CentOS 8) across 
which RBDs are mapped and mounted, 25 RBDs on each client node. Each RBD is 
10 TB in size and formatted with an ext4 file system. 

On the network side, we have a 10 Gbps active/passive bond on all the Ceph 
cluster nodes, including the clients. Jumbo frames are enabled and the MTU is 9000.

This is a new cluster and its health reports OK, but we see high I/O wait 
during writes. 

From one of the clients:

15:14:30        CPU     %user     %nice   %system   %iowait    %steal     %idle
15:14:31        all      0.06      0.00      1.00     45.03      0.00     53.91
15:14:32        all      0.06      0.00      0.94     41.28      0.00     57.72
15:14:33        all      0.06      0.00      1.25     45.78      0.00     52.91
15:14:34        all      0.00      0.00      1.06     40.07      0.00     58.86
15:14:35        all      0.19      0.00      1.38     41.04      0.00     57.39
Average:        all      0.08      0.00      1.13     42.64      0.00     56.16

and the system load is very high:

top - 15:19:15 up 34 days, 41 min,  2 users,  load average: 13.49, 13.62, 13.83


From 'atop', the CPU line shows:

CPU | sys 7% | user 1% | irq 2% | idle 1394% | wait 195% | steal 0% | guest 0% | ipc initial | cycl initial | curf 806MHz | curscal ?%

On the OSD nodes, we don't see much per-disk %utilization. 
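For reference, this is the kind of per-disk spot check we mean on an OSD node (a sketch; the 80% threshold and the sd* device-name pattern are our assumptions, not Ceph defaults):

```shell
# Sample extended device stats every 2 s, 5 samples; iostat -x prints
# %util in the last column. Flag anything over 80% (arbitrary threshold).
# The /^sd/ pattern assumes data disks are named sda, sdb, ...
iostat -dx 2 5 | awk '$1 ~ /^sd/ && $NF+0 > 80 {print $1, $NF "% util"}'
```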

The RBD cache settings are at their defaults. 
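On the defaults point, one way to confirm the effective client-side cache settings (a sketch; assumes an admin keyring is available on the node). Worth noting: these are librbd options, so kernel-mapped RBDs ('rbd map', as in our setup) bypass them and use the kernel page cache instead.

```shell
# Effective values of the standard librbd cache options
# (defaults unless overridden in ceph.conf or the mon config store).
ceph config get client rbd_cache
ceph config get client rbd_cache_size
ceph config get client rbd_cache_max_dirty
ceph config get client rbd_cache_writethrough_until_flush
```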

Are we overlooking some configuration item?

Thanks and Regards,

At
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
