Hi Frank,

I'm having trouble getting hold of the exact version of Ceph you used
to create this heap profile.
Could you run the google-pprof --text steps at [1] and share the output?
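
Something along these lines, run on the OSD host itself so the binary
matches the profile (the osd id, paths, and profile filename are just
examples, see the doc for details):

  google-pprof --text /usr/bin/ceph-osd \
      /var/log/ceph/osd.0.profile.0001.heap > osd.0.profile.txt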

Thanks, Dan

[1] https://docs.ceph.com/docs/master/rados/troubleshooting/memory-profiling/


On Tue, Aug 11, 2020 at 2:37 PM Frank Schilder <fr...@dtu.dk> wrote:
>
> Hi Mark,
>
> here is a first collection of heap profiling data (valid 30 days):
>
> https://files.dtu.dk/u/53HHic_xx5P1cceJ/heap_profiling-2020-08-03.tgz?l
>
> This was collected with the following config settings:
>
>   osd                      dev      osd_memory_cache_min      805306368
>   osd                      basic    osd_memory_target         2147483648
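>
> For reference, these can be set at runtime roughly like this (osd.0 below is 
> just an example id for the verification step):
>
>   ceph config set osd osd_memory_cache_min 805306368    # 768 MiB
>   ceph config set osd osd_memory_target    2147483648   # 2 GiB
>   # check what a running OSD actually picked up:
>   ceph daemon osd.0 config get osd_memory_cache_min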
>
> Setting the cache_min value seems to help keep cache space available. 
> Unfortunately, the above collection is for 12 days only. I needed to restart 
> the OSD and will need to restart it soon again. I hope I can then run a 
> longer sample. The profiling does cause slow ops though.
>
> Maybe you can see something already? It seems to have collected some leaked 
> memory. Unfortunately, it was a period of extremely low load; utilization 
> dropped to almost zero on the day the recording started.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Frank Schilder <fr...@dtu.dk>
> Sent: 21 July 2020 12:57:32
> To: Mark Nelson; Dan van der Ster
> Cc: ceph-users
> Subject: [ceph-users] Re: OSD memory leak?
>
> Quick question: Is there a way to change the frequency of heap dumps? On this 
> page http://goog-perftools.sourceforge.net/doc/heap_profiler.html a function 
> HeapProfilerSetAllocationInterval() is mentioned, but no other way of 
> configuring this is described. Is there a config parameter or a ceph daemon 
> call to adjust this?
>
> If not, can I change the dump path?
>
> It's likely to overrun my log partition quickly if I cannot adjust either of 
> the two.
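>
> In case it helps, the gperftools docs also list environment variables that 
> seem to control the dump interval; I have not tested whether the OSD's 
> embedded profiler honours them, so this is only a sketch:
>
>   # dump a profile every 1 GiB of cumulative allocation (gperftools default)
>   export HEAP_PROFILE_ALLOCATION_INTERVAL=1073741824
>   # or: dump whenever in-use memory grows by this many bytes
>   export HEAP_PROFILE_INUSE_INTERVAL=104857600
>   # these would have to be in the OSD's environment before it starts, e.g.
>   # via a systemd drop-in for the ceph-osd@ service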
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Frank Schilder <fr...@dtu.dk>
> Sent: 20 July 2020 15:19:05
> To: Mark Nelson; Dan van der Ster
> Cc: ceph-users
> Subject: [ceph-users] Re: OSD memory leak?
>
> Dear Mark,
>
> thank you very much for the very helpful answers. I will raise 
> osd_memory_cache_min, leave everything else alone and watch what happens. I 
> will report back here.
>
> Thanks also for raising this as an issue.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Mark Nelson <mnel...@redhat.com>
> Sent: 20 July 2020 15:08:11
> To: Frank Schilder; Dan van der Ster
> Cc: ceph-users
> Subject: Re: [ceph-users] Re: OSD memory leak?
>
> On 7/20/20 3:23 AM, Frank Schilder wrote:
> > Dear Mark and Dan,
> >
> > I'm in the process of restarting all OSDs and could use some quick advice 
> > on bluestore cache settings. My plan is to set higher minimum values and 
> > deal with accumulated excess usage via regular restarts. Looking at the 
> > documentation 
> > (https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/),
> >  I find the following relevant options (with defaults):
> >
> > # Automatic Cache Sizing
> > osd_memory_target {4294967296} # 4GB
> > osd_memory_base {805306368} # 768MB
> > osd_memory_cache_min {134217728} # 128MB
> >
> > # Manual Cache Sizing
> > bluestore_cache_meta_ratio {.4} # 40% ?
> > bluestore_cache_kv_ratio {.4} # 40% ?
> > bluestore_cache_kv_max {512 * 1024*1024} # 512MB
> >
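> > To check what a running OSD currently has in effect, I query the admin 
> > socket (osd.0 below is just an example id):
> >
> >   ceph daemon osd.0 config get osd_memory_target
> >   ceph daemon osd.0 config get osd_memory_cache_min
> >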
> > Q1) If I increase osd_memory_cache_min, should I also increase 
> > osd_memory_base by the same or some other amount?
>
>
> osd_memory_base is a hint at how much memory the OSD could consume
> outside the cache once it's reached steady state.  It basically sets a
> hard cap on how much memory the cache will use, to avoid over-committing
> memory and thrashing when we exceed the memory limit.  It's not necessary
> to get it right; it just helps smooth things out by making the automatic
> memory tuning less aggressive.  I.e. if you have a 2 GB memory target and
> a 512 MB base, you'll never assign more than 1.5 GB to the cache, on the
> assumption that the rest of the OSD will eventually need 512 MB to
> operate even if it's not using that much right now.  I think you can
> probably just leave it alone.  What you and Dan appear to be seeing is
> that this number isn't static in your case but increases over time
> anyway.  Eventually I'm hoping that we can automatically account for
> more and more of that memory by reading the data from the mempools.
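>
> In shell terms, roughly (illustrative arithmetic only, not the exact
> tuner code):
>
>   osd_memory_target=2147483648   # 2 GiB
>   osd_memory_base=536870912      # 512 MiB
>   # upper bound the autotuner will ever assign to the caches:
>   echo $(( osd_memory_target - osd_memory_base ))   # 1610612736 = 1.5 GiB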
>
> > Q2) The cache ratio options are shown under the section "Manual Cache 
> > Sizing". Do they also apply when cache auto tuning is enabled? If so, is it 
> > worth changing these defaults for higher values of osd_memory_cache_min?
>
>
> They actually do have an effect on the automatic cache sizing and
> probably shouldn't only be under the manual section.  When you have the
> automatic cache sizing enabled, those options will affect the "fair
> share" values of the different caches at each cache priority level.  I.e.
> at priority level 0, if both caches want more memory than is available,
> those ratios will determine how much each cache gets.  If there is more
> memory available than requested, each cache gets as much as it wants
> and we move on to the next priority level and do the same thing again.
> So in this case the ratios end up being more like fallback settings for
> when you don't have enough memory to fulfill all cache requests at a
> given priority level, but otherwise are not utilized until we hit that
> limit.  The goal with this scheme is to make sure that "high priority"
> items in each cache get first dibs on the memory even if it might skew
> the ratios.  This might be things like rocksdb bloom filters and
> indexes, or potentially very recent hot items in one cache vs very old
> items in another cache.  The ratios become more like guidelines than
> hard limits.
>
>
> When you change to manual mode, you set an overall bluestore cache size
> and each cache gets a flat percentage of it based on the ratios.  With
> 0.4/0.4 you will always have 40% for onode, 40% for omap, and 20% for
> data, even if one of those caches does not use all of its memory.
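>
> E.g. with a 1 GiB bluestore_cache_size and 0.4/0.4 ratios (illustrative
> arithmetic only):
>
>   bluestore_cache_size=1073741824
>   echo $(( bluestore_cache_size * 40 / 100 ))   # onode (meta): ~410 MiB
>   echo $(( bluestore_cache_size * 40 / 100 ))   # omap (kv):    ~410 MiB
>   echo $(( bluestore_cache_size * 20 / 100 ))   # data:         ~205 MiB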
>
>
> >
> > Many thanks for your help with this. I can't find answers to these 
> > questions in the docs.
> >
> > There might be two reasons for high osd_map memory usage. One is that our 
> > OSDs seem to hold a large number of OSD maps:
>
>
> I brought this up in our core team standup last week.  Not sure if
> anyone has had time to look at it yet though.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
