> On Nov 30, 2015, at 6:52 PM, Laurent GUERBY <laur...@guerby.net> wrote:
> 
> Hi,
> 
> We lost a disk today in our ceph cluster so we added a new machine with
> 4 disks to replace the capacity and we activated straw1 tunable too
> (we also tried straw2 but we quickly backed up this change).
> 
> During recovery OSD started crashing on all of our machines
> the issue being OSD RAM usage that goes very high, eg:
> 
> 24078 root      20   0 27.784g 0.026t  10888 S   5.9 84.9
> 16:23.63 /usr/bin/ceph-osd --cluster=ceph -i 41 -f
> /dev/sda1       2.7T  2.2T  514G  82% /var/lib/ceph/osd/ceph-41
> 
> That's about 8GB resident RAM per TB of disk, way above
> what we provisionned ~ 2-4 GB RAM/TB.

We had something vaguely similar (not nearly that dramatic though!) happen to 
us. During a recovery (actually, I think this was rebalancing after upgrading 
from an earlier version of ceph), our OSDs took so much memory they would get 
killed by oom_killer and we couldn't keep the cluster up long enough to get 
back to healthy. 

A solution for us was to enable zswap; previously we had been running with no 
swap at all. 

If you are running kernel 3.11 or newer (you might want more recent than 
that as I believe there were major fixes after 3.17), then enabling zswap 
allows the kernel to compress pages in memory before needing to touch disk. The 
default max pool size for this is 20% of memory. There is extra CPU time to 
compress/decompress, but it's much faster than going to disk, and the OSD data 
appears to be quite compressible. For us, nothing actually made it to the disk, 
but a swapfile must be enabled for zswap to do its work. 

https://www.kernel.org/doc/Documentation/vm/zswap.txt
http://askubuntu.com/questions/471912/zram-vs-zswap-vs-zcache-ultimate-guide-when-to-use-which-one

Add "zswap.enabled=1" to your kernel boot parameters and reboot. 
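
After the reboot it is worth confirming that zswap is really on. Here is a 
minimal sketch, assuming your kernel exposes the module parameters under 
/sys/module/zswap/parameters (the function name zswap_status is just something 
I made up for illustration; the optional directory argument is only there so it 
can be pointed at a test directory):

```shell
# Print the zswap settings the kernel reports.
# Assumption: parameters live in /sys/module/zswap/parameters on this kernel.
zswap_status() {
    dir="${1:-/sys/module/zswap/parameters}"
    if [ ! -d "$dir" ]; then
        echo "zswap: not available (no $dir)"
        return 1
    fi
    # Not every kernel version exposes every parameter, so skip missing ones.
    for p in enabled max_pool_percent compressor; do
        if [ -r "$dir/$p" ]; then
            echo "$p=$(cat "$dir/$p")"
        fi
    done
}

zswap_status || true
```

You want to see "enabled=Y". On kernels that allow it, writing 1 to the 
"enabled" file as root may also turn zswap on without a reboot, but I would not 
count on that for older kernels.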

If you have no swap file/partition/disk/whatever, then you need one for zswap 
to actually do anything. Here is an example, but use whatever size, location, 
and process you prefer:

dd if=/dev/zero of=/var/swap bs=1M count=8192
chmod 600 /var/swap
mkswap /var/swap
swapon /var/swap

Consider adding it to /etc/fstab:
/var/swap       swap    swap    defaults 0 0 
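
To check that the swap is active, "cat /proc/swaps" should list it. To see 
whether zswap is doing anything, the kernel keeps counters in debugfs. Here is 
a rough sketch, assuming debugfs is mounted at /sys/kernel/debug, that you run 
it as root, and that your kernel names the counters stored_pages, 
pool_total_size (in bytes) and written_back_pages; those names have changed 
between kernel versions, so treat them as assumptions. The function name 
zswap_ratio is made up for this example:

```shell
# Report how well zswap is compressing and whether pages reached the disk.
# Assumptions: counter names below match this kernel; pool_total_size is bytes.
zswap_ratio() {
    dir="${1:-/sys/kernel/debug/zswap}"
    page_size="${2:-$(getconf PAGESIZE)}"
    stored=$(cat "$dir/stored_pages") || return 1
    pool=$(cat "$dir/pool_total_size") || return 1
    if [ "$pool" -eq 0 ]; then
        echo "zswap pool is empty"
        return 0
    fi
    # Integer math only: compute ratio * 100, then print two decimals.
    ratio=$(( stored * page_size * 100 / pool ))
    echo "compression ratio: $(( ratio / 100 )).$(printf '%02d' $(( ratio % 100 )))"
    echo "written_back_pages=$(cat "$dir/written_back_pages")"
}

# Only run against the live system if the counters are readable.
if [ -r /sys/kernel/debug/zswap/stored_pages ]; then
    zswap_ratio
fi
```

A written_back_pages value that stays at 0 would match what we saw, i.e. 
nothing actually made it to the disk.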

This got us through the rebalancing. The OSDs eventually returned to normal, 
but we've just left zswap enabled with no apparent problems. I don't know that 
it will be enough for your situation, but it might help. 

Ryan
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
