Hi All, I think that this is the second memory leak that I mentioned below. It does not have to do with the send-queue-full message, but only with simply configuring more than one active host. For example, if you configure three hosts such as “{ host1, host2, host3 }”, then the following leak will happen --
LogHostList::preproc_and_try_delete(LogBuffer *lb) { … ink_release_assert(lb->m_references == 0); m_references starts as zero nr_host = nr = count(); ink_atomic_increment(&lb->m_references, nr_host); increments the m_references by 3 _not_ by 1 for (LogHost *host = first(); host && nr; host = next(host)) { LogHost *lh = host; available_host = lh; do { ink_atomic_increment(&lb->m_references, 1); increments the m_references by 1 for each host tried success = lh->preproc_and_try_delete(lb); LogHost::preproc_and_try_delete() will consume 1 ref-count need_orphan = need_orphan && (success == false); } while (lb && (success == false) && (lh = lh->failover_link.next)); nr--; } … if (lb != nullptr) { LogBuffer::destroy(lb); this only consumes 1 ref-count } return 0; } So since the ink_atomic_increment() and LogBuffer::destroy() are not balanced this will produce a memory leak. In my test scenario, at the end of 100K transactions, ATS had allocated 5961 and only freed 4348 LogBuffers. Proposed fix is to change ink_atomic_increment(&lb->m_references, nr_host) to ink_atomic_increment(&lb->m_references, 1) to balance things. Appreciate any second opinions, and I opened a PR #4041 with both memory leaks. Thanks, Peter -----Original Message----- From: Chou, Peter Sent: Thursday, July 26, 2018 1:42 PM To: dev <dev@trafficserver.apache.org> Subject: RE: Issue #2416 - Log Collation Memory Leak. Leif, 1) No problem. It is a two line change so we can carry in our custom 7.1.x if necessary. 2) Yeah this feature has issues. After fixing this memory leak, we realized that there is ANOTHER leak if you over-run the log collation transfer capacity triggering the "send-queue full" message. Perhaps this can be mitigated by increasing the send queue size (collation_max_send_buffers). This second issue can be triggered by increased transaction rate and/or configuring multiple active hosts, i.e., { host1, host2 } rather than active-standby { { host1 , host2 } }, to over-load the log sending capacity. I will take your proposal back to discuss with my users. Thanks, Peter -----Original Message----- From: Leif Hedstrom [mailto:zw...@apache.org] Sent: Thursday, July 26, 2018 12:54 PM To: dev <dev@trafficserver.apache.org> Subject: Re: Issue #2416 - Log Collation Memory Leak. > On Jul 26, 2018, at 3:14 AM, Chou, Peter <pbc...@labs.att.com> wrote: > > Hi All, > > After some additional study on how the ink_atomic_increment() and > LogBuffer::destroy() work together in the LogFile::, LogHost::, and > LogHostList::preproc_and_try_delete() functions, the following is probably a > better patch to address this issue. Appreciate any second opinions on this > before I open a PR. Nice find. However, I have two comments on this: 1) I’d really prefer not to respin the 7.1.x release over this. Most people don’t use this (IMO broken) feature. I hope that’s ok. 2) I’d like to restart the discussions of removing this feature from v9.0.0 again (in fact, I kinda wish we could nuke its from 8.0.0, but a little late for that). For #2, I feel that there are a lot better tools out there, like Kafka, Elastic Search (or Splunk for a commercial solution). I’ll make a separate thread on asking for the deprecation and removal of this feature :-) Cheers, — Leif