Bug#530440: apt-cacher-ng: Thread leak due to race condition

Alexander Inyukhin Sun, 24 May 2009 17:28:17 -0700

Thanks for the quick answer.

I was not quite right about the locking.
It is still a race condition involving nSpareThreads, but a more complicated one.


When the variable is decremented, it is assumed that some waiting thread
will be woken up, but another worker may return and take the job first.
In this case the counter nSpareThreads has been decremented,
but the actual number of spare threads has not changed.
In the default case, a returning thread is added to the spare pool,
incrementing this variable back.
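
The interleaving described above can be replayed in a small single-threaded
model. This is only a sketch: PoolModel and all function names are
hypothetical and are not taken from the acng sources; only nSpareThreads
is the name from the real code.

```cpp
#include <cassert>
#include <deque>

// Hypothetical model of the accounting, not the actual acng code.
struct PoolModel {
    int nSpareThreads = 0;  // counter kept by the program
    int idle = 0;           // threads that are really waiting for a job
    int total = 0;          // threads spawned so far
    std::deque<int> jobs;   // queued connections
};

// Acceptor: queue the job, then reserve a spare thread or spawn a new one.
void acceptJob(PoolModel& p, int job) {
    p.jobs.push_back(job);
    if (p.nSpareThreads > 0)
        --p.nSpareThreads;  // assumes a waiting thread will wake up for it
    else
        ++p.total;          // no spare thread according to the counter
}

// A waiting thread wakes up and takes the job (the expected path).
void idleThreadTakesJob(PoolModel& p) { --p.idle; p.jobs.pop_front(); }

// A busy worker returns, finds the queue non-empty and takes the job
// itself; it never enters the spare pool, so nothing is incremented.
void returningWorkerTakesJob(PoolModel& p) { p.jobs.pop_front(); }

// A freshly spawned thread takes its first job.
void newThreadTakesJob(PoolModel& p) { p.jobs.pop_front(); }

// A worker finishes with an empty queue and joins the spare pool.
void workerBecomesIdle(PoolModel& p) { ++p.idle; ++p.nSpareThreads; }

PoolModel raceScenario() {
    PoolModel p;
    p.total = 2; p.idle = 1; p.nSpareThreads = 1;  // A waits, B is busy

    acceptJob(p, 1);             // counter 1 -> 0, A is signalled
    returningWorkerTakesJob(p);  // but B grabs the job first; A keeps waiting
    workerBecomesIdle(p);        // B finishes: idle = 2, counter = 1

    acceptJob(p, 2);             // counter 1 -> 0
    idleThreadTakesJob(p);       // idle = 1, counter = 0
    acceptJob(p, 3);             // counter is 0, so a third thread is spawned
    newThreadTakesJob(p);        // although one thread is still idle
    return p;
}
```

In this model the scenario ends with nSpareThreads at 0 while one thread
is still idle, and a third thread has been spawned for a job that the idle
thread could have taken.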

AFAIU, the root of the problem is that, from the decrementer's point of view,
this counter is not the number of threads but the number
of outstanding requests. I think it is sufficient to check
whether nSpareThreads is non-zero when deciding whether to spawn a new thread,
and to move all thread accounting into ThreadAction.

This solution seems to work for me.
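
A rough sketch of that idea with std::thread, under my own assumptions:
WorkerPool, submit, shutdown and runDemo are hypothetical names and this is
not the acng code. The acceptor only reads nSpareThreads; the worker
function increments it before waiting and decrements it after waking up.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical pool, not the apt-cacher-ng implementation.
class WorkerPool {
public:
    ~WorkerPool() { shutdown(); }

    // Acceptor side: reads nSpareThreads but never modifies it.
    void submit(std::function<void()> job) {
        std::lock_guard<std::mutex> g(mx);
        jobs.push_back(std::move(job));
        if (nSpareThreads > 0)
            cv.notify_one();   // some waiting thread will pick it up
        else
            threads.emplace_back([this] { threadAction(); });
    }

    // Let the workers drain the queue, then wait for them to exit.
    // Must not be called concurrently with submit().
    void shutdown() {
        {
            std::lock_guard<std::mutex> g(mx);
            stopping = true;
        }
        cv.notify_all();
        for (auto& t : threads)
            if (t.joinable()) t.join();
    }

private:
    // Worker side: all accounting of nSpareThreads happens here.
    void threadAction() {
        std::unique_lock<std::mutex> g(mx);
        for (;;) {
            ++nSpareThreads;   // about to wait, so this thread is spare
            cv.wait(g, [this] { return stopping || !jobs.empty(); });
            --nSpareThreads;   // woken up, no longer spare
            if (jobs.empty()) return;  // stopping and nothing left to do
            auto job = std::move(jobs.front());
            jobs.pop_front();
            g.unlock();
            job();             // run the job without holding the lock
            g.lock();
        }
    }

    std::mutex mx;
    std::condition_variable cv;
    std::deque<std::function<void()>> jobs;
    std::vector<std::thread> threads;
    int nSpareThreads = 0;     // threads currently inside cv.wait()
    bool stopping = false;
};

// Submit n trivial jobs and return how many of them were executed.
int runDemo(int n) {
    std::atomic<int> done{0};
    WorkerPool pool;
    for (int i = 0; i < n; ++i)
        pool.submit([&done] { ++done; });
    pool.shutdown();
    return done.load();
}
```

One consequence of this sketch: if several jobs arrive while a single spare
thread is still waking up, they are all left to that thread (or to a
returning worker) instead of triggering a spawn.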



I will reply later with details on the questions below.
It takes some time to reach this overflow )

On Mon, May 25, 2009 at 01:38:41AM +0200, Eduard Bloch wrote:
> #include <hallo.h>
> * Alexander Inyukhin [Mon, May 25 2009, 01:30:38AM]:
> 
> > I have noticed, that acng eats all available to a process
> > virtual memory after some days of work, and it starts
> > to return 503 to all requests.
> > It spawns a lot of threads and keeps them running.
> 
> How many exactly? (ps -L ...)
> What exactly is in the HTTP status line (after 503)?
> 
> > The reason of this behavior is race condition while counting spare threads.
> > Variable nSpareThreads must change under mutually exclusive lock,
> > but in the function ThreadAction it is guarded with reLock function,
> > which allows all workers to enter this critical section simultaneously.
> > 
> > Due to this nSpareThreads is increased slower, than it should, and it allows
> > threads to leak.
> 
> Nice idea, but I don't think so. Reason: both code positions
> (decrementing and incrementing) are covered by the mutex which is inside
> of the global object "cond" (to which lockguard helpers are connected in
> both cases).
> 
> Further, reaching thread limit would have different symptoms (not
> throwing 503... just grep for "503", it's not used in conserver.cc at all).
> 
> However, your problem might be somehow connected to 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=529744 . That problem
> looks like many downloader objects not being released (according to
> pipe/socket ratio) which might also be caused by hanging user connection
> threads. And receiving two heavy bug reports within one week after no such
> problem has been reported for months, that's very suspicious.
> 
> I just don't have a good idea yet. Version 0.3.12 was released few
> minutes ago and should appear on incoming.debian.org now. It adds proper
> handling for EINTR on close(). Please take that one for further
> tests. If the problem disappears -> great, if not: please provide thread
> count and status of file handles (lsof) and last lines of apt-cacher.err
> file.



