Hi Hongfei,

Recently we ran into a memory leak in redirect following with 8.1.1 [*1]. The
fix [*2] is coming in the next releases, 8.1.2 and 9.0.1. If you didn't have
your leak with 8.1.0, it might be the same leak.

> 1. Inside HttpSM, which states need to allocate/re-use ioBufs? Is there a
> way to put a ceiling on each slot or on total allocation?

IOBuffers are allocated in many states, and each ioBufAllocator[n] slot
corresponds to one buffer size class (a sketch of the mapping follows).
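To make the freelist dump easier to read: the size classes are, to my
understanding, powers of two starting from a 128-byte base
(DEFAULT_BUFFER_BASE_SIZE in the ATS sources), so slot 14 serves 2 MB
buffers. A quick way to print the mapping, assuming the stock base size:

    # Print the buffer size served by each ioBufAllocator slot,
    # assuming the stock 128-byte DEFAULT_BUFFER_BASE_SIZE.
    for i in $(seq 0 14); do
      printf 'ioBufAllocator[%d] -> %d bytes\n' "$i" $((128 << i))
    done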
In general, AddressSanitizer (ASan) is helpful to detect memory leaks:

1. Build ATS with the `--enable-asan` option
2. Run Traffic Server without the freelist (--fF); a sketch of both steps is below
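A minimal sketch of those two steps, assuming an autotools build from a
source release; I read `--fF` as the `-f`/`-F` freelist switches, so please
double-check against `traffic_server --help` for your version:

    # 1. Build ATS with ASan instrumentation
    ./configure --enable-asan
    make -j"$(nproc)"
    sudo make install

    # 2. Run traffic_server with the freelist allocators disabled
    #    so ASan tracks each allocation and can report leaks
    sudo traffic_server -f -F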
[*1] https://github.com/apache/trafficserver/issues/7380
[*2] https://github.com/apache/trafficserver/pull/7401

Thanks,
Masaori

On Tue, Jan 12, 2021 at 4:19 AM Hongfei Zhang <hongfei...@gmail.com> wrote:
> Thanks Fei. There weren't SSL/TLS sessions in our environment, but I do
> feel some memory is being held by 'dormant' sessions. The total amount of
> memory held by the freelist (44 GB) was, however, surprisingly high. The
> majority of that (99%) is allocated through and held by ioBufAllocator. I
> am wondering if there is any way to limit the size of these freelists, and
> I am also curious what caused the 'Allocated' numbers to keep going up and
> why 'In-Use' did not go to zero after user traffic stopped (and all of the
> keep-alive sessions timed out).
>
> I am also puzzled that the memory/RamCacheLRUEntry line shows only 5.2M,
> whereas traffic_top shows about 32 GB of RAM cache used.
>
> Thanks,
> -Hongfei
>
> > On Dec 17, 2020, at 3:32 PM, Fei Deng <duke8...@apache.org> wrote:
> >
> > Not saying this is the exact cause, but we've seen similar behavior
> > previously. In our case, the session cache size was set too big compared
> > to the RAM size; since sessions stored in the cache are only removed
> > when the cache is full, inserting new sessions kept triggering
> > *removeOldestSession*. You might want to check your configuration
> > related to this feature: *proxy.config.ssl.session_cache.size*.
> >
> > On Thu, Dec 17, 2020 at 1:52 PM Hongfei Zhang <hongfei...@gmail.com>
> > wrote:
> >
> >> Hi Folks,
> >>
> >> Based on the information provided at
> >> https://docs.trafficserver.apache.org/en/8.1.x/admin-guide/performance/index.en.html#memory-allocation
> >> and with a fixed ram_cache.size setting (32 GB), we expected memory
> >> usage to plateau after a couple of days of usage. This is, however, not
> >> what we saw in multiple production environments. Memory usage seemed to
> >> increase steadily over time, albeit at a slower pace once the system's
> >> memory usage reached 80-85% (there aren't many other processes running
> >> on the system), until the ATS process was killed by the kernel (OOM
> >> kill) or by human intervention (server restart). On a system with
> >> 192 GB of RAM (32 GB used for the RAM disk, and ATS configured to use
> >> up to 32 GB of RAM cache), with streaming throughput peaking at
> >> 10 Gbps, ATS has to be killed/restarted about every two weeks. At peak
> >> hours, there are about 5k-6k client connections and fewer than 1k
> >> upstream connections (to mid-tier caches).
> >>
> >> We did some analysis on the freelist dump (kill -USR1 pid) output (an
> >> example is attached) and found that the 'Allocated' numbers in the
> >> ioBufAllocator[0-14] slots appeared to be the main contributor to the
> >> total and also the likely source of the increase over time.
> >>
> >> In terms of configuration and plugin usage, in addition to setting
> >> ram_cache to 32 GB, we also changed
> >> proxy.config.http.default_buffer_water_mark INT 15000000 (from the 64k
> >> default) to allow an entire video segment to be buffered on the
> >> upstream connection, avoiding a client starvation issue when the first
> >> client comes over a slow-draining link, and
> >> proxy.config.cache.target_fragment_size INT 4096 to allow upstream
> >> chunked responses to be written into cache storage in a timely manner.
> >> There are no connection limits (the number of connections always
> >> appeared to be in the normal range). The inactivity timeout values are
> >> fairly low (<120 secs). The only plugin we use is header_rewrite.so.
> >> No HTTPS, no HTTP/2.
> >>
> >> I would appreciate it if someone could shed some light on how to track
> >> this down further, and any practical tips for short-term mitigation.
> >> In particular:
> >> 1. Inside HttpSM, which states need to allocate/re-use ioBufs? Is
> >> there a way to put a ceiling on each slot or on total allocation?
> >> 2. Is the ioBuf allocation ceiling a function of total connections, in
> >> which case should I set a connection limit?
> >> 3. memory/RamCacheLRUEntry shows 5.2M; how is this related to the
> >> actual ram_cache usage reported by traffic_top (32 GB used)?
> >> 4. At the point of the freelist dump, the ATS process size was 78 GB
> >> and the freelist total showed about 44 GB, with 32 GB of ram_cache
> >> used (per traffic_top). Assuming those two numbers do not overlap, and
> >> knowing the in-memory (disk) directory entry cache takes at least
> >> 10 GB, these numbers don't add up: 44 + 32 + 10 = 86 > 78. What am I
> >> missing?
> >>
> >> Thanks,
> >> -Hongfei
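P.S. If you want to experiment with the two overrides quoted above, they can
be applied at runtime with traffic_ctl. A minimal sketch, assuming stock 8.x
tooling and using the values from your mail (not recommendations):

    # Apply the overrides discussed in the thread, then reload
    traffic_ctl config set proxy.config.http.default_buffer_water_mark 15000000
    traffic_ctl config set proxy.config.cache.target_fragment_size 4096
    traffic_ctl config reload   # non-reloadable records need a restart

    # Verify the running values
    traffic_ctl config get proxy.config.http.default_buffer_water_mark
    traffic_ctl config get proxy.config.cache.target_fragment_size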