RE: Issue #2416 - Log Collation Memory Leak.

Chou, Peter Mon, 30 Jul 2018 23:57:02 -0700

Hi All,

I think that this is the second memory leak that I mentioned below. It does not 
have to do with the send-queue-full message, but only with simply configuring 
more than one active host. For example, if you configure three hosts such as “{ 
host1, host2, host3 }”, then the following leak will happen --

LogHostList::preproc_and_try_delete(LogBuffer *lb)
{
…
  ink_release_assert(lb->m_references == 0);                                 
m_references starts as zero

  nr_host = nr = count();
  ink_atomic_increment(&lb->m_references, nr_host);                          
increments the m_references by 3 _not_ by 1 

  for (LogHost *host = first(); host && nr; host = next(host)) {
    LogHost *lh    = host;
    available_host = lh;

    do {
      ink_atomic_increment(&lb->m_references, 1);                            
increments the m_references by 1 for each host tried
      success     = lh->preproc_and_try_delete(lb);                            
LogHost::preproc_and_try_delete() will consume 1 ref-count
      need_orphan = need_orphan && (success == false);
    } while (lb && (success == false) && (lh = lh->failover_link.next));

    nr--;
  }
…
  if (lb != nullptr) {
    LogBuffer::destroy(lb);                                                  
this only consumes 1 ref-count 
  }
  return 0;
}

So since the ink_atomic_increment() and LogBuffer::destroy() are not balanced 
this will produce a memory leak. In my test scenario, at the end of 100K 
transactions, ATS had allocated 5961 and only freed 4348 LogBuffers. Proposed 
fix is to change ink_atomic_increment(&lb->m_references, nr_host) to 
ink_atomic_increment(&lb->m_references, 1) to balance things. Appreciate any 
second opinions, and I opened a PR #4041 with both memory leaks.

Thanks,
Peter

-----Original Message-----
From: Chou, Peter 
Sent: Thursday, July 26, 2018 1:42 PM
To: dev <dev@trafficserver.apache.org>
Subject: RE: Issue #2416 - Log Collation Memory Leak.

Leif,

1) No problem. It is a two line change so we can carry in our custom 7.1.x if 
necessary.

2) Yeah this feature has issues. After fixing this memory leak, we realized 
that there is ANOTHER leak if you over-run the log collation transfer capacity 
triggering the "send-queue full" message. Perhaps this can be mitigated by 
increasing the send queue size (collation_max_send_buffers). This second issue 
can be triggered by increased transaction rate and/or configuring multiple 
active hosts, i.e., { host1, host2 } rather than active-standby { { host1 , 
host2 } }, to over-load the log sending capacity. I will take your proposal 
back to discuss with my users.

Thanks,
Peter

-----Original Message-----
From: Leif Hedstrom [mailto:zw...@apache.org] 
Sent: Thursday, July 26, 2018 12:54 PM
To: dev <dev@trafficserver.apache.org>
Subject: Re: Issue #2416 - Log Collation Memory Leak.

> On Jul 26, 2018, at 3:14 AM, Chou, Peter <pbc...@labs.att.com> wrote:
> 
> Hi All,
> 
> After some additional study on how the ink_atomic_increment() and 
> LogBuffer::destroy() work together in the LogFile::, LogHost::, and 
> LogHostList::preproc_and_try_delete() functions, the following is probably a 
> better patch to address this issue. Appreciate any second opinions on this 
> before I open a PR.

Nice find.

However, I have two comments on this:

1) I’d really prefer not to respin the 7.1.x release over this. Most people 
don’t use this (IMO broken) feature. I hope that’s ok.

2) I’d like to restart the discussions of removing this feature from v9.0.0 
again (in fact, I kinda wish we could nuke its from 8.0.0, but a little late 
for that).

For #2, I feel that there are a lot better tools out there, like Kafka, Elastic 
Search (or Splunk for a commercial solution).

I’ll make a separate thread on asking for the deprecation and removal of this 
feature :-)

Cheers,

— Leif

RE: Issue #2416 - Log Collation Memory Leak.

Reply via email to