Hi Frank,

I'm having trouble getting hold of the exact version of Ceph you used to create this heap profile. Could you run the google-pprof --text steps at [1] and share the output?
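For reference, the step I mean from that page looks roughly like this (the binary path and dump filename below are only the usual defaults -- point it at the ceph-osd binary from the exact version that produced the dumps, and at the .heap files you already collected):

  google-pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.<id>.profile.0001.heap > profile.txt

Running that against the most recent dump or two and pasting the text output here should be enough.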
Thanks, Dan

[1] https://docs.ceph.com/docs/master/rados/troubleshooting/memory-profiling/

On Tue, Aug 11, 2020 at 2:37 PM Frank Schilder <fr...@dtu.dk> wrote:
>
> Hi Mark,
>
> here is a first collection of heap profiling data (link valid 30 days):
>
> https://files.dtu.dk/u/53HHic_xx5P1cceJ/heap_profiling-2020-08-03.tgz?l
>
> This was collected with the following config settings:
>
>   osd   dev     osd_memory_cache_min   805306368
>   osd   basic   osd_memory_target      2147483648
>
> Setting the cache_min value seems to help keep cache space available.
> Unfortunately, the above collection covers only 12 days. I needed to
> restart the OSD and will need to restart it again soon; I hope I can
> then run a longer sample. The profiling does cause slow ops, though.
>
> Maybe you can see something already? It seems to have collected some
> leaked memory. Unfortunately, it was a period of extremely low load:
> basically, from the day recording started, utilization dropped to
> almost zero.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Frank Schilder <fr...@dtu.dk>
> Sent: 21 July 2020 12:57:32
> To: Mark Nelson; Dan van der Ster
> Cc: ceph-users
> Subject: [ceph-users] Re: OSD memory leak?
>
> Quick question: Is there a way to change the frequency of heap dumps?
> On this page http://goog-perftools.sourceforge.net/doc/heap_profiler.html
> a function HeapProfilerSetAllocationInterval() is mentioned, but no
> other way of configuring this. Is there a config parameter or a ceph
> daemon call to adjust this?
>
> If not, can I change the dump path?
>
> It's likely to overrun my log partition quickly if I cannot adjust
> either of the two.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Frank Schilder <fr...@dtu.dk>
> Sent: 20 July 2020 15:19:05
> To: Mark Nelson; Dan van der Ster
> Cc: ceph-users
> Subject: [ceph-users] Re: OSD memory leak?
>
> Dear Mark,
>
> thank you very much for the very helpful answers. I will raise
> osd_memory_cache_min, leave everything else alone and watch what
> happens. I will report back here.
>
> Thanks also for raising this as an issue.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Mark Nelson <mnel...@redhat.com>
> Sent: 20 July 2020 15:08:11
> To: Frank Schilder; Dan van der Ster
> Cc: ceph-users
> Subject: Re: [ceph-users] Re: OSD memory leak?
>
> On 7/20/20 3:23 AM, Frank Schilder wrote:
> > Dear Mark and Dan,
> >
> > I'm in the process of restarting all OSDs and could use some quick
> > advice on bluestore cache settings. My plan is to set higher minimum
> > values and deal with accumulated excess usage via regular restarts.
> > Looking at the documentation
> > (https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/),
> > I find the following relevant options (with defaults):
> >
> > # Automatic Cache Sizing
> > osd_memory_target       {4294967296}        # 4GB
> > osd_memory_base         {805306368}         # 768MB
> > osd_memory_cache_min    {134217728}         # 128MB
> >
> > # Manual Cache Sizing
> > bluestore_cache_meta_ratio  {.4}            # 40% ?
> > bluestore_cache_kv_ratio    {.4}            # 40% ?
> > bluestore_cache_kv_max      {512 * 1024*1024}  # 512MB
> >
> > Q1) If I increase osd_memory_cache_min, should I also increase
> > osd_memory_base by the same or some other amount?
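> >
> > For concreteness, I would be changing these via the monitors' config
> > database, along the lines of (the values here are just the ones I am
> > considering, not a recommendation from the docs):
> >
> >   ceph config set osd osd_memory_cache_min 805306368
> >   ceph config set osd osd_memory_target    2147483648
> >
> > so please read "increase" above as bumping these two values for all
> > OSDs.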
>
> osd_memory_base is a hint at how much memory the OSD could consume
> outside the cache once it has reached steady state. It basically sets a
> hard cap on how much memory the cache will use, to avoid over-committing
> memory and thrashing when we exceed the memory limit. It's not necessary
> to get it right; it just helps smooth things out by making the automatic
> memory tuning less aggressive. I.e., if you have a 2 GB memory target
> and a 512 MB base, you'll never assign more than 1.5 GB to the cache, on
> the assumption that the rest of the OSD will eventually need 512 MB to
> operate even if it's not using that much right now. I think you can
> probably just leave it alone. What you and Dan appear to be seeing is
> that this number isn't static in your case but increases over time
> anyway. Eventually I'm hoping that we can automatically account for more
> and more of that memory by reading the data from the mempools.
>
> > Q2) The cache ratio options are shown under the section "Manual Cache
> > Sizing". Do they also apply when cache auto tuning is enabled? If so,
> > is it worth changing these defaults for higher values of
> > osd_memory_cache_min?
>
> They actually do have an effect on the automatic cache sizing and
> probably shouldn't only be under the manual section. When you have the
> automatic cache sizing enabled, those options affect the "fair share"
> values of the different caches at each cache priority level. I.e., at
> priority level 0, if both caches want more memory than is available,
> those ratios determine how much each cache gets. If there is more memory
> available than requested, each cache gets as much as it wants and we
> move on to the next priority level and do the same thing again. So in
> this case the ratios end up being more like fallback settings for when
> you don't have enough memory to fulfill all cache requests at a given
> priority level; otherwise they are not used until we hit that limit.
> The goal with this scheme is to make sure that "high priority" items in
> each cache get first dibs at the memory even if it might skew the
> ratios. This might be things like rocksdb bloom filters and indexes, or
> potentially very recent hot items in one cache vs. very old items in
> another cache. The ratios become more like guidelines than hard limits.
>
> When you change to manual mode, you set an overall bluestore cache size
> and each cache gets a flat percentage of it based on the ratios. With
> 0.4/0.4 you will always have 40% for onode, 40% for omap, and 20% for
> data, even if one of those caches does not use all of its memory.
>
> > Many thanks for your help with this. I can't find answers to these
> > questions in the docs.
> >
> > There might be two reasons for high osd_map memory usage. One is that
> > our OSDs seem to hold a large number of OSD maps:
>
> I brought this up in our core team standup last week. Not sure if
> anyone has had time to look at it yet though.
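>
> In the meantime, the mempool stats should give you a rough idea of how
> much of that memory is actually sitting in osdmaps -- something along
> these lines (osd.0 is just a placeholder for one of your OSDs):
>
>   ceph daemon osd.0 dump_mempools   # look at the "osdmap" and "osdmap_mapping" pools
>   ceph daemon osd.0 status          # "oldest_map" / "newest_map" give the range of maps held
>
> That won't explain why the OSDs hold so many maps, but it should at
> least confirm whether osdmaps account for a meaningful chunk of the
> growth.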