I know a few other people here have been battling with the occasional issue of 
OSDs being extremely slow to start.

I personally run OSDs mixed with KVM guests on the same nodes, and was baffled 
by this issue occurring mostly on the most idle (emptiest) machines.
I thought it was some kind of race condition where the OSD started too fast and 
the disks couldn’t catch up, so I was investigating the latency of CPUs and 
cards on mostly idle hardware, etc., with no improvement.

But in the end, most of my issues were caused by the page cache using too much 
memory. This doesn’t cause any problems while the OSDs already have their 
memory allocated and are running, but when an OSD is (re)started, the OS 
struggles to allocate contiguous blocks of memory for it and its buffers, 
because it first has to reclaim them from the cache.
This could also be why I’m seeing such an improvement with my NUMA pinning 
script: reclaiming memory on a single NUMA node is probably easier and doesn’t 
block allocations on the other nodes.
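
If you want to see this for yourself, /proc/buddyinfo shows how many free 
blocks of each order are left, broken down per NUMA node and zone, and 
/proc/meminfo shows how much memory is sitting in the page cache; on a box with 
this problem the higher-order columns are typically close to empty while 
“Cached” is huge. A minimal check, nothing Ceph-specific, just standard kernel 
interfaces:

# free blocks per order (column N = free blocks of 2^N pages), per NUMA node and zone
cat /proc/buddyinfo
# how much memory is actually free vs. held by the page cache
grep -E '^(MemFree|Cached):' /proc/meminfo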

How can you tell if this is your case? When restarting an OSD that has this 
issue, watch the CPU usage of the “kswapd” processes. If it is >0 while the OSD 
starts, you have this problem and would benefit from the max_ratio setting 
below.
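
A quick way to watch it, assuming nothing fancier than plain procps top, is to 
run top in batch mode in one terminal while restarting the OSD in another:

# print the kswapd lines roughly once per second; any non-zero CPU while the OSD
# starts means the kernel is busy reclaiming page cache to satisfy its allocations
top -b -d 1 | grep kswapd

If kswapd does show up burning CPU, apply the limit: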

# for every mounted OSD data dir, take its backing disk name (e.g. /dev/sdb1 -> sdb)
# and cap its bdi/max_ratio at 1 (percent):
for i in $(mount | grep "ceph/osd" | cut -d' ' -f1 | cut -d'/' -f3 | tr -d '[0-9]'); do echo 1 > /sys/block/$i/bdi/max_ratio; done
(another option is echo 1 > /proc/sys/vm/drop_caches before starting the OSD, 
but that’s a bit brutal)

What the max_ratio setting does is limit the page cache size for each block 
device to 1% of physical memory. I’d like to limit it even further, but the 
tunable only takes whole percentages, so it doesn’t understand “0.3”...
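
To double-check that the limit took, the same device-name extraction can be 
reused to print the current value; it should read 1 for every OSD data disk:

# show the current bdi/max_ratio for each disk backing an OSD
for i in $(mount | grep "ceph/osd" | cut -d' ' -f1 | cut -d'/' -f3 | tr -d '[0-9]'); do echo -n "$i: "; cat /sys/block/$i/bdi/max_ratio; done

Keep in mind the sysfs value doesn’t survive a reboot, so it needs to be 
reapplied at boot, e.g. from rc.local.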

Let me know if it helps; I haven’t been able to test whether this cures the 
problem completely, but there was no regression after setting it.

Jan

P.S. This is for the ancient 2.6.32 kernel in RHEL 6 / CentOS 6; newer kernels 
have tunables to limit the overall page cache size. You could also set limits 
via cgroups, but I’m afraid that won’t help in this case, as you can only cap 
the whole memory footprint, within which the cache and the OSD will still 
battle for allocations anyway.