On Fri, Jul 31, 2015 at 5:47 PM, Jan Schermer <j...@schermer.cz> wrote:
> I know a few other people here were battling the occasional issue of an OSD 
> being extremely slow to start.
>
> I personally run OSDs mixed with KVM guests on the same nodes, and was 
> baffled by this issue occurring mostly on the most idle (empty) machines.
> Thought it was some kind of race condition where OSD started too fast and 
> disks couldn’t catch up, was investigating latency of CPUs and cards on a 
> mostly idle hardware etc. - with no improvement.
>
> But in the end, most of my issues were caused by page cache using too much 
> memory. This doesn’t cause any problems when the OSDs have their memory 
> allocated and are running, but when the OSD is (re)started, OS struggles to 
> allocate contiguous blocks of memory for it and its buffers.
> This could also be why I’m seeing such an improvement with my NUMA pinning 
> script - cleaning memory on one node is probably easier and doesn’t block 
> allocations on other nodes.
>

This makes sense to me, but it still shocks me that page cache pressure or
memory fragmentation can cause slow requests!
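
One way to actually see the fragmentation Jan describes is /proc/buddyinfo; a
minimal check, assuming a standard Linux procfs:

```shell
# /proc/buddyinfo: per memory zone, column N is the number of free blocks
# of order N (order 0 = one 4 KiB page, order 10 = 4 MiB). Plenty of free
# order-0 pages but empty high-order columns means a large contiguous
# allocation will stall in reclaim/compaction, i.e. the restart hiccup.
cat /proc/buddyinfo
```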

> How can you tell if this is your case? When restarting an OSD that has this 
> issue, look for CPU usage of “kswapd” processes. If it is >0 then you have 
> this issue and would benefit from setting this:
>
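A sketch of that check, reading the per-thread CPU counters straight from
procfs (assumes the usual kswapd0/kswapd1 kernel thread names):

```shell
# Show accumulated CPU time (user+system jiffies, fields 14/15 of
# /proc/<pid>/stat) for each kswapd thread. Re-run this while the OSD is
# starting: if the counters climb, page reclaim is in the restart path.
for pid in $(pgrep kswapd); do
    awk '{print $2, "utime=" $14, "stime=" $15}' "/proc/$pid/stat"
done
echo "checked $(pgrep -c kswapd) kswapd thread(s)"
```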
> for i in $(mount | grep "ceph/osd" | cut -d' ' -f1 | cut -d'/' -f3 | tr -d '[0-9]'); do echo 1 > /sys/block/$i/bdi/max_ratio; done
> (another option is echo 1 > /proc/sys/vm/drop_caches before starting the OSD, but that’s a 
> bit brutal)
>
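Unrolled for readability, Jan's one-liner does the following (a sketch; it
assumes OSD data partitions like /dev/sdb1 mounted under a path containing
"ceph/osd"):

```shell
# For every block device backing a "ceph/osd" mount, derive the base
# device name (e.g. /dev/sdb1 -> sdb) and cap that device's share of
# dirtyable page-cache memory at 1% via its BDI max_ratio knob.
for dev in $(mount | grep "ceph/osd" | cut -d' ' -f1 \
             | cut -d'/' -f3 | tr -d '[0-9]'); do
    echo 1 > "/sys/block/$dev/bdi/max_ratio"
done
```

Note the `tr -d '[0-9]'` step strips the partition number so the sysfs path
points at the whole device, not the partition.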
> What this does is limit the page cache share of each block device to 1% of 
> physical memory. I’d like to limit it even further, but the tunable only takes 
> whole integers, so it doesn’t understand “0.3”...
>
> Let me know if it helps, I’ve not been able to test if this cures the problem 
> completely, but there was no regression after setting it.
>
> Jan
>
> P.S. This is for the ancient 2.6.32 kernel in RHEL 6 / CentOS 6; newer kernels 
> have tunables to limit the overall page cache size. You can also set limits in 
> cgroups, but I’m afraid that won’t help here, since you can only cap the 
> whole memory footprint, within which the allocations will still battle each other.
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
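
On kernels with compaction support (>= 2.6.35, CONFIG_COMPACTION), one hedged
alternative to dropping caches is to defragment free memory just before
restarting the OSD:

```shell
# Ask the kernel to compact free memory on all nodes, so the restarted OSD
# can allocate contiguous blocks without stalling in reclaim. A harmless
# no-op where the knob is absent or we lack the privilege to write it.
{ echo 1 > /proc/sys/vm/compact_memory; } 2>/dev/null || true
```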



-- 
Best Regards,

Wheat