Hi Folks,

Based on the information provided at https://docs.trafficserver.apache.org/en/8.1.x/admin-guide/performance/index.en.html#memory-allocation and with a fixed ram_cache.size setting (32GB), we expected memory usage to plateau after a couple of days of use. This is not, however, what we saw in multiple production environments. Memory usage appears to increase steadily over time, albeit at a slow pace once the system’s memory usage reaches 80-85% (there aren’t many other processes running on the system), until the ATS process is killed by the kernel (OOM kill) or by human intervention (server restart).

On a system with 192GB of RAM (32GB used for a RAM disk, and ATS configured to use up to 32GB of RAM cache), with streaming throughput peaking at 10Gbps, ATS has to be killed/restarted after about 2 weeks. At peak hours, there are about 5k-6k client connections and fewer than 1k upstream connections (to mid-tier caches).

We did some analysis on the freelist dump (kill -USR1 pid) output (an example is attached) and found that the allocations in the ioBufAllocator[0-14] slots appear to be the main contributor to the total, and are also likely to be the source of the increase over time.

In terms of configuration and plugin usage, in addition to setting ram_cache to 32GB, we also changed proxy.config.http.default_buffer_water_mark INT 15000000 (from the default of 64k) to allow an entire video segment to be buffered on the upstream connection, which avoids a client starvation issue when the first client is on a slow-draining link, and proxy.config.cache.target_fragment_size INT 4096 to allow upstream chunked responses to be written into cache storage in a timely manner. There are no connection limits (the number of connections always appeared to be in the normal range). The inactivity timeout values are fairly low (<120 secs). The only plugin we use is header_rewrite.so. No HTTPS, no HTTP/2.

I would appreciate it if someone could shed some light on how to track this down further, and offer any practical tips for short-term mitigation.
In particular:

1. Inside HttpSM, which states need to allocate or re-use an ioBuf? Is there a way to put a ceiling on each slot or on the total allocation?
2. Is the ioBuf allocation ceiling a function of the total number of connections, in which case I should set a connection limit?
3. memory/RamCacheLRUEntry shows 5.2M. How is this related to the actual ram_cache usage reported by traffic_top (32GB used)?
4. At the point of the freelist dump, the ATS process size was 78GB and the freelist total showed about 44GB, with 32GB of ram_cache used (as reported by traffic_top). Assuming these two numbers do not overlap, and given that the in-memory (disk) directory entry cache takes at least 10GB, these numbers do not add up: 44 + 32 + 10 > 78. What am I missing?

Thanks,
-Hongfei
     Allocated      |        In-Use      | Type Size  | Free List Name
--------------------|--------------------|------------|----------------------------------
         8388608000 |         4125097984 |    2097152 | memory/ioBufAllocator[14]
        28689039360 |        24322768896 |    1048576 | memory/ioBufAllocator[13]
         3573547008 |         2275934208 |     524288 | memory/ioBufAllocator[12]
          947912704 |          637534208 |     262144 | memory/ioBufAllocator[11]
          679477248 |          454950912 |     131072 | memory/ioBufAllocator[10]
          801112064 |          275644416 |      65536 | memory/ioBufAllocator[9]
          473956352 |           99221504 |      32768 | memory/ioBufAllocator[8]
           33554432 |            7667712 |      16384 | memory/ioBufAllocator[7]
           64749568 |           20176896 |       8192 | memory/ioBufAllocator[6]
          120586240 |           77496320 |       4096 | memory/ioBufAllocator[5]
             524288 |                  0 |       2048 | memory/ioBufAllocator[4]
             131072 |                  0 |       1024 | memory/ioBufAllocator[3]
             131072 |               1536 |        512 | memory/ioBufAllocator[2]
              65536 |               3584 |        256 | memory/ioBufAllocator[1]
           45350912 |              92416 |        128 | memory/ioBufAllocator[0]
            2285568 |             670560 |         96 | memory/eventAllocator
            2766240 |            1918400 |         80 | memory/mutexAllocator
           47153152 |            1697920 |         64 | memory/ioBlockAllocator
           20954880 |            4207968 |         48 | memory/ioDataAllocator
            5940480 |            5809200 |        240 | memory/ioAllocator
                  0 |                  0 |        432 | memory/socksAllocator
                  0 |                  0 |        128 | memory/udpReadContAllocator
                  0 |                  0 |        160 | memory/udpPacketAllocator
           16619200 |            9008208 |        752 | memory/netVCAllocator
                  0 |                  0 |        128 | memory/UDPIOEventAllocator
                  0 |                  0 |        880 | memory/sslNetVCAllocator
            5201920 |            4293824 |         64 | memory/RamCacheLRUEntry
                  0 |                  0 |         96 | memory/RamCacheCLFUSEntry
             245760 |             239200 |        160 | memory/openDirEntry
                  0 |                  0 |         48 | memory/evacuationKey
               8192 |                  0 |         64 | memory/cacheRemoveCont
            1044480 |            1032672 |         96 | memory/evacuationBlock
           10431200 |           10361344 |        944 | memory/cacheVConnection
                  0 |                  0 |         48 | memory/ClusterVConnectionCache::Entry
                  0 |                  0 |        576 | memory/cacheContAllocator
                  0 |                  0 |         32 | memory/byteBankAllocator
                  0 |                  0 |        592 | memory/clusterVCAllocator
                  0 |                  0 |        112 | memory/inControlAllocator
                  0 |                  0 |        128 | memory/outControlAllocator
                  0 |                  0 |         16 | memory/DNSRequestDataAllocator
             135424 |              33856 |      33856 | memory/dnsBufAllocator
             163840 |                  0 |       1280 | memory/dnsEntryAllocator
               4096 |                320 |         16 | memory/expiryQueueEntry
               8192 |               1280 |         64 | memory/refCountCacheHashingValueAllocator
                  0 |                  0 |         96 | memory/hostDBFileContAllocator
           22568960 |               2320 |       2320 | memory/hostDBContAllocator
                  0 |                  0 |        128 | memory/OneWayTunnelAllocator
           54525952 |           49588224 |       2048 | memory/hdrStrHeap
           63438848 |           46405632 |       2048 | memory/hdrHeap
             622592 |              34304 |        256 | memory/httpCacheAltAllocator
                  0 |                  0 |         48 | memory/CongestRequestParamAllocator
                  0 |                  0 |        160 | memory/CongestionDBContAllocator
                  0 |                  0 |        128 | memory/RemapPluginsAlloc
                  0 |                  0 |        960 | memory/http2ClientSessionAllocator
                  0 |                  0 |        960 | memory/http2StreamAllocator
                  0 |                  0 |         48 | memory/CacheLookupHttpConfigAllocator
            1179648 |            1179264 |        192 | memory/httpServerSessionAllocator
           80621568 |            5007744 |       7776 | memory/httpSMAllocator
           15253504 |           15223040 |        896 | memory/http1ClientSessionAllocator
                  0 |                  0 |        128 | memory/socksProxyAllocator
               4096 |                  0 |         32 | memory/MIMEFieldSDKHandle
                  0 |                  0 |        240 | memory/INKVConnAllocator
             921600 |              34656 |         96 | memory/INKContAllocator
            1224704 |              43840 |         32 | memory/apiHookAllocator
                  0 |                  0 |        592 | memory/ICPRequestCont_allocator
                  0 |                  0 |        128 | memory/ICPPeerReadContAllocator
                  0 |                  0 |        432 | memory/PeerReadDataAllocator
                  0 |                  0 |        512 | memory/FetchSMAllocator
           10616832 |             659456 |       1024 | memory/ArenaBlock
        44182686784 |        32454043824 |            | TOTAL
-----------------------------------------------------------------------------------------
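
For anyone who wants to repeat the tally on their own dump, a minimal Python sketch along the following lines can total the rows of one allocator family and show how much of the allocation is sitting idle on the freelist. It assumes the four-column "Allocated | In-Use | Type Size | Free List Name" layout of the dump above. The names `parse_freelist_dump` and `summarize` are illustrative and not part of ATS, and the embedded sample contains only a handful of rows copied from the dump, so its totals are not the totals of the full dump.

```python
import re

# A few rows copied from the dump above, for illustration only.
# In practice, pass in the full text captured after `kill -USR1 <pid>`.
SAMPLE_DUMP = """\
     Allocated      |        In-Use      | Type Size  | Free List Name
--------------------|--------------------|------------|----------------------------------
         8388608000 |         4125097984 |    2097152 | memory/ioBufAllocator[14]
        28689039360 |        24322768896 |    1048576 | memory/ioBufAllocator[13]
         3573547008 |         2275934208 |     524288 | memory/ioBufAllocator[12]
            5201920 |            4293824 |         64 | memory/RamCacheLRUEntry
           80621568 |            5007744 |       7776 | memory/httpSMAllocator
        44182686784 |        32454043824 |            | TOTAL
"""

# Matches data rows only: three integer columns followed by a memory/... name.
# The header, separator, and TOTAL lines do not match and are skipped.
ROW = re.compile(r"^\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(memory/\S+)\s*$")


def parse_freelist_dump(text):
    """Return a list of (name, allocated, in_use, type_size) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            allocated, in_use, type_size = (int(m.group(i)) for i in (1, 2, 3))
            rows.append((m.group(4), allocated, in_use, type_size))
    return rows


def summarize(rows, prefix):
    """Sum allocated and in-use bytes for every freelist whose name starts with prefix."""
    allocated = sum(r[1] for r in rows if r[0].startswith(prefix))
    in_use = sum(r[2] for r in rows if r[0].startswith(prefix))
    # "Idle" here means allocated but not currently in use.
    return allocated, in_use, allocated - in_use


rows = parse_freelist_dump(SAMPLE_DUMP)
alloc, used, idle = summarize(rows, "memory/ioBufAllocator")
gib = 1024 ** 3
print(f"ioBufAllocator: allocated={alloc / gib:.2f} GiB, "
      f"in-use={used / gib:.2f} GiB, idle={idle / gib:.2f} GiB")
```

Running the same summary on dumps taken a few days apart would show whether the growth is in the in-use bytes or in the idle bytes held on the freelists, which may help narrow down where to look next.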