Hi Hongfei,

Recently, we hit a memory leak when following redirects with 8.1.1[*1].
The fix[*2] is coming in the next releases, 8.1.2 and 9.0.1.
If you didn't see the leak with 8.1.0, it might be the same issue.

> 1. Inside HttpSM, which states allocate/re-use ioBufs? Is there a
> way to put a ceiling on each slot or total allocation?

IOBuffers are allocated in many states. In general, AddressSanitizer
(ASan) is helpful for tracking down memory leaks:

1. Build ATS with the `--enable-asan` option
2. Run Traffic Server without the freelist (`-f -F`)
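
As a rough sketch, assuming a typical autotools build (the exact
freelist flags can differ between versions, so please double-check them
in the traffic_server command-line help):

    ./configure --enable-asan
    make && make install
    traffic_server -f -F

With an ASan build, leak reports are printed when the process exits,
which makes it much easier to see which allocation paths are still
holding memory.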

[*1] https://github.com/apache/trafficserver/issues/7380
[*2] https://github.com/apache/trafficserver/pull/7401

Thanks,
Masaori

On Tue, Jan 12, 2021 at 4:19 AM Hongfei Zhang <hongfei...@gmail.com> wrote:

> Thanks Fei.  There weren’t SSL/TLS sessions in our environment, but I
> do feel some of the memory is being held by ‘dormant’ sessions. The
> total amount of memory held by the freelists (44G) was, however,
> surprisingly high. The majority of that (99%) is allocated through and
> held by ioBufAllocator. I am wondering if there is any way to limit
> the size of these freelists, and I am also curious what caused
> ‘Allocated’ to keep going up and why ‘In-Use’ did not drop to zero
> after user traffic stopped (and all of the keep-alive sessions timed
> out).
>
> I am also puzzled that the memory/RamCacheLRUEntry line shows only
> 5.2M, whereas traffic_top shows about 32GB of RAM cache used.
>
>
> Thanks,
> -Hongfei
>
> > On Dec 17, 2020, at 3:32 PM, Fei Deng <duke8...@apache.org> wrote:
> >
> > Not saying this is the exact cause, but we've seen similar behavior
> > previously. In our case, the session cache size was set too big
> > relative to the RAM size; since sessions stored in the cache are only
> > removed when the cache is full, inserting new sessions kept
> > triggering *removeOldestSession*. You might want to check the
> > configuration related to this feature,
> > *proxy.config.ssl.session_cache.size*.
> >
> > On Thu, Dec 17, 2020 at 1:52 PM Hongfei Zhang <hongfei...@gmail.com>
> > wrote:
> >
> >> Hi Folks,
> >>
> >> Based on the information provided at
> >> https://docs.trafficserver.apache.org/en/8.1.x/admin-guide/performance/index.en.html#memory-allocation
> >> and with a fixed ram_cache.size setting (32GB), we expected memory
> >> usage to plateau after a couple of days of use. This is, however,
> >> not what we saw in multiple production environments. It seemed that
> >> memory usage increased steadily over time, albeit at a slow pace
> >> once the system’s memory usage reached 80-85% (there aren’t many
> >> other processes running on the system), until the ATS process was
> >> killed by the kernel (OOM kill) or by human intervention (server
> >> restart). On a system with 192GB of RAM (32GB used for a RAM disk,
> >> and ATS configured to use up to 32GB of RAM cache), with streaming
> >> throughput peaking at 10Gbps, ATS has to be killed/restarted roughly
> >> every 2 weeks. At peak hours, there are about 5k-6k client
> >> connections and fewer than 1k upstream connections (to mid-tier
> >> caches).
> >>
> >> We did some analysis on the freelist dump (kill -USR1 pid) output
> >> (an example is attached) and found that the ‘Allocated’ values in
> >> the ioBufAllocator[0-14] slots appeared to be the main contributor
> >> to the total, and also the likely source of the increase over time.
> >>
> >> In terms of configuration and plugin usage, in addition to setting
> >> the ram_cache size to 32GB, we also changed
> >> proxy.config.http.default_buffer_water_mark INT 15000000 (from the
> >> 64k default) to allow an entire video segment to be buffered on the
> >> upstream connection, avoiding a client-starvation issue when the
> >> first client comes in over a slow-draining link, and
> >> proxy.config.cache.target_fragment_size INT 4096 to allow upstream
> >> chunked responses to be written into cache storage promptly. There
> >> is no connection limit (the number of connections always appeared to
> >> be in the normal range). The inactivity timeout values are fairly
> >> low (<120 secs). The only plugin we use is header_rewrite.so. No
> >> HTTPS, no HTTP/2.
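> >>
> >> For reference, in records.config form those two overrides are (same
> >> values as stated above):
> >>
> >>   CONFIG proxy.config.http.default_buffer_water_mark INT 15000000
> >>   CONFIG proxy.config.cache.target_fragment_size INT 4096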
> >>
> >> I would appreciate it if someone could shed some light on how to
> >> track this down further, and share any practical tips for short-term
> >> mitigation. In particular:
> >> 1. Inside HttpSM, which states allocate/re-use ioBufs? Is there a
> >> way to put a ceiling on each slot or total allocation?
> >> 2. Is the ioBuf allocation ceiling a function of total connections,
> >> in which case should I set a connection limit?
> >> 3. memory/RamCacheLRUEntry shows 5.2M; how does this relate to the
> >> actual ram_cache usage reported by traffic_top (32GB used)?
> >> 4. At the time of the freelist dump, the ATS process size was 78GB,
> >> the freelist total showed about 44GB, and 32GB of ram_cache was in
> >> use (per traffic_top). Assuming those two numbers do not overlap,
> >> and knowing the in-memory (disk) directory entry cache takes at
> >> least 10GB, these numbers don't add up: 44 + 32 + 10 > 78. What am I
> >> missing?
> >>
> >>
> >> Thanks,
> >> -Hongfei
> >>
>
>
