Not saying this is the exact cause, but we've seen similar behavior
previously. The cause of our issue was that the session cache size was set
too large relative to the available RAM; since sessions stored in the cache
are only removed once the cache is full, inserting new sessions eventually
triggered *removeOldestSession*. You might want to check your configuration
for this feature, in particular
*proxy.config.ssl.session_cache.size*.
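
For reference, a minimal records.config sketch of what we ended up checking;
the values below are placeholders rather than recommendations, and the first
setting's semantics are from memory, so verify against the docs for your ATS
version:

  # TLS session cache implementation (we believe 2 selects the ATS-side cache)
  CONFIG proxy.config.ssl.session_cache INT 2
  # upper bound on the session cache; this was the knob that bit us
  CONFIG proxy.config.ssl.session_cache.size INT 102400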

On Thu, Dec 17, 2020 at 1:52 PM Hongfei Zhang <hongfei...@gmail.com> wrote:

> Hi Folks,
>
> Based on the information provided at
> https://docs.trafficserver.apache.org/en/8.1.x/admin-guide/performance/index.en.html#memory-allocation
> and with a fixed ram_cache.size setting (32GB), we expected memory usage
> to plateau after a couple of days of usage. This is not, however, what we
> saw in multiple production environments. Memory usage seems to increase
> steadily over time, albeit at a slow pace once the system's memory usage
> reaches 80-85% (there aren't many other processes running on the system),
> until the ATS process is killed by the kernel (OOM kill) or by human
> intervention (server restart). On a system with 192GB of RAM (32GB used
> for a RAM disk, and ATS configured to use up to 32GB of RAM cache), with
> peak streaming throughput around 10Gbps, ATS has to be killed/restarted
> about every 2 weeks. At peak hours there are about 5k-6k client
> connections and fewer than 1k upstream connections (to mid-tier caches).
>
> We did some analysis on the Freelist dump (kill -USR1 pid) output (an
> example is attached) and found that the memory allocated in the
> ioBufAllocator[0-14] slots appeared to be the main contributor to the
> total, and also the likely source of the increase over time.
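>
> For reference, this is roughly how we trigger and inspect the dump (the
> output location below is from our environment and may differ on yours):
>
>   kill -USR1 $(pidof traffic_server)   # ask traffic_server to dump allocator stats
>   grep ioBufAllocator /var/log/trafficserver/traffic.out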
>
> In terms of configuration and plugin usage, in addition to setting
> ram_cache to 32GB, we also changed
> proxy.config.http.default_buffer_water_mark INT 15000000 (from the default
> 64k) to allow an entire video segment to be buffered on the upstream
> connection, avoiding client starvation when the first client comes in over
> a slow-draining link, and proxy.config.cache.target_fragment_size INT 4096
> to allow upstream chunked responses to be written into cache storage in a
> timely manner. There are no connection limits (the number of connections
> appeared to always be in the normal range). The inactivity timeout values
> are fairly low (<120 secs). The only plugin we use is header_rewrite.so.
> No HTTPS, no HTTP/2.
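>
> For completeness, a sketch of the corresponding records.config lines as we
> have them (the 32GB ram cache is expressed in bytes):
>
>   CONFIG proxy.config.cache.ram_cache.size INT 34359738368
>   CONFIG proxy.config.http.default_buffer_water_mark INT 15000000
>   CONFIG proxy.config.cache.target_fragment_size INT 4096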
>
> I would appreciate it if someone could shed some light on how to further
> track this down, along with any practical tips for short-term mitigation.
> In particular:
> 1. Inside HttpSM, which states require allocating/reusing ioBufs? Is there
> a way to put a ceiling on each slot or on the total allocation?
> 2. Is the ioBufAllocator ceiling a function of total connections, in which
> case I should set a connection limit?
> 3. The memory/RamCacheLRUEntry stat shows 5.2M; how is this related to the
> actual ram_cache usage reported by traffic_top (32GB used)?
> 4. At the time of the freelist dump, the ATS process size was 78GB, the
> freelist total showed about 44GB, and 32GB of ram_cache was in use (per
> traffic_top). Assuming those two numbers do not overlap, and knowing that
> the in-memory (disk) directory entry cache takes at least 10GB, these
> numbers do not add up: 44 + 32 + 10 = 86 > 78. What am I missing?
>
>
> Thanks,
> -Hongfei
>
