On Fri, Jul 31, 2015 at 5:47 PM, Jan Schermer <j...@schermer.cz> wrote:
> I know a few other people here were battling with the occasional issue of an
> OSD being extremely slow when starting.
>
> I personally run OSDs mixed with KVM guests on the same nodes, and was
> baffled by this issue occurring mostly on the most idle (empty) machines. I
> thought it was some kind of race condition where the OSD started too fast and
> the disks couldn’t catch up, and was investigating latency of CPUs and cards
> on mostly idle hardware etc. - with no improvement.
>
> But in the end, most of my issues were caused by the page cache using too
> much memory. This doesn’t cause any problems while the OSDs have their memory
> allocated and are running, but when an OSD is (re)started, the OS struggles
> to allocate contiguous blocks of memory for it and its buffers.
> This could also be why I’m seeing such an improvement with my NUMA pinning
> script - cleaning memory on one node is probably easier and doesn’t block
> allocations on the other nodes.
>
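(A quick aside: Jan’s pinning script isn’t posted in this thread, but the idea
can be sketched with numactl - the OSD id, config path, and node number below
are only examples:

    # run the OSD with both its threads and its memory allocations pinned to NUMA node 0
    numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf

With --membind, the kernel only has to reclaim pages on node 0 for the new
process, which fits Jan’s observation that cleaning memory on one node doesn’t
block allocations on the others.)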
Although this makes sense to me, I’m still shocked that freeing page cache (or
memory fragmentation) can cause slow requests!

> How can you tell if this is your case? When restarting an OSD that has this
> issue, look at the CPU usage of the “kswapd” processes. If it is >0 then you
> have this issue and would benefit from setting this:
>
> for i in $(mount | grep "ceph/osd" | cut -d' ' -f1 | cut -d'/' -f3 | tr -d '[0-9]'); do echo 1 > /sys/block/$i/bdi/max_ratio ; done
>
> (another option is echo 1 > /proc/sys/vm/drop_caches before starting the OSD,
> but that’s a bit brutal)
>
> What this does is limit the page cache usage for each block device to 1% of
> physical memory. I’d like to limit it even further, but the tunable doesn’t
> accept “0.3”...
>
> Let me know if it helps. I’ve not been able to test whether it cures the
> problem completely, but there was no regression after setting it.
>
> Jan
>
> P.S. This is for the ancient 2.6.32 kernel on RHEL 6 / CentOS 6; newer
> kernels have tunables to limit the overall page cache size. You can also set
> limits in cgroups, but I’m afraid that won’t help in this case, since you can
> only cap the whole memory footprint, inside which the page cache will battle
> for allocations anyway.

--
Best Regards,
Wheat
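P.S. For reference, the cgroup route Jan mentions would look roughly like this
with the cgroup v1 memory controller (the group name, the 4G cap, and the
$OSD_PID placeholder are only examples; on RHEL 6 the hierarchy typically lives
under /cgroup rather than /sys/fs/cgroup):

    # create a memory cgroup, cap its total footprint, and move the OSD into it
    mkdir /sys/fs/cgroup/memory/ceph-osd
    echo 4G > /sys/fs/cgroup/memory/ceph-osd/memory.limit_in_bytes
    echo $OSD_PID > /sys/fs/cgroup/memory/ceph-osd/tasks    # one pid per write

As Jan says, this caps heap and page cache together, so the OSD’s allocations
still compete with its own cache inside the limit.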