We
On Fri, Dec 28, 2018, 4:23 PM Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:

> On Fri, Dec 7, 2018 at 12:43 PM Oleksandr Shulgin
> <oleksandr.shul...@zalando.de> wrote:
>
>> After a fresh JVM start the memory allocation looks roughly like this:
>>
>>              total       used       free     shared    buffers     cached
>> Mem:           14G        14G       173M       1.1M        12M       3.2G
>> -/+ buffers/cache:        11G       3.4G
>> Swap:           0B         0B         0B
>>
>> Then, within a number of days, the allocated disk cache shrinks all the
>> way down to unreasonably low numbers, like only 150M. At the same time
>> "free" stays at the original level and "used" grows all the way up to
>> 14G. Shortly after that the node becomes unavailable because of the IO,
>> and ultimately, after some time, the JVM gets killed.
>>
>> Most importantly, the resident size of the JVM process stays at around
>> 11-12G the whole time, just as it was shortly after the start. How can
>> we find where the rest of the memory gets allocated? Is it just some
>> sort of malloc fragmentation?
>
> For those following along at home, here's what we have ended up with so far:
>
> 0. Switched to the next bigger EC2 instance type, r4.xlarge, and the
> symptoms are gone. Our bill is dominated by the price of EBS storage, so
> the total increase is much less than 2x.
>
> 1. We've noticed that the increased memory usage correlates with the
> number of SSTables on disk: when the number of files on disk decreases,
> available memory increases. This leads us to think that the extra memory
> allocation is indeed due to the use of mmap. It is not clear how we could
> account for that.
>
> 2. Improved our monitoring to include the number of files (via total
> minus free inodes).
>
> Given the cluster's resource utilization, it still feels like r4.large
> would be a good fit, if only we could figure out those few "missing" GB
> of RAM. ;-)
>
> Cheers!
> --
> Alex
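
To put a number on the mmap suspicion in point 1, one option is to walk
/proc/<pid>/smaps of the Cassandra JVM and total the resident size of its
file-backed mappings. A minimal sketch, assuming Linux /proc is available
and the Cassandra PID is passed as the only argument (the top-20 cutoff is
arbitrary):

#!/usr/bin/env python3
# Sum the resident size of file-backed mmap regions of a process.
# /proc/<pid>/smaps reports all values in kB.
import sys
from collections import defaultdict


def mmap_rss_by_file(pid):
    totals = defaultdict(int)   # path -> resident kB
    path = None
    with open(f"/proc/{pid}/smaps") as smaps:
        for line in smaps:
            fields = line.split()
            if not fields[0].endswith(":"):
                # Mapping header line, e.g.:
                # 7f1c8c000000-7f1c8c021000 r--s 00000000 ca:01 1234 /.../mc-1-big-Data.db
                path = fields[5] if len(fields) >= 6 else None
            elif fields[0] == "Rss:" and path and path.startswith("/"):
                totals[path] += int(fields[1])
    return totals


if __name__ == "__main__":
    totals = mmap_rss_by_file(sys.argv[1])
    print(f"file-backed mmap RSS: {sum(totals.values()) / 1024 / 1024:.1f} GiB")
    for p, kb in sorted(totals.items(), key=lambda kv: -kv[1])[:20]:
        print(f"{kb / 1024:10.1f} MiB  {p}")

If mmap'ed SSTable segments are really what eats the "missing" memory, the
per-file breakdown should be dominated by *-Data.db and *-Index.db
components and should grow and shrink with the SSTable count.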
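
The file count from point 2 can be sampled without walking the data
directory, since statvfs() reports both total and free inodes. A one-off
sketch, assuming the data volume is mounted at the default
/var/lib/cassandra/data:

import os

# Adjust to wherever the Cassandra data volume is mounted.
DATA_MOUNT = "/var/lib/cassandra/data"

st = os.statvfs(DATA_MOUNT)
used_inodes = st.f_files - st.f_ffree   # total inodes minus free inodes
print(f"files (used inodes) on {DATA_MOUNT}: {used_inodes}")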