Re: "Too Many Connections" exceptions after moving to Tomcat 8

Phil Steitz Sat, 01 Aug 2015 08:12:48 -0700

On 7/31/15 8:16 AM, Christopher Schultz wrote:
> Phil,
>
> On 7/30/15 8:05 PM, Phil Steitz wrote:
> > On 7/30/15 9:03 AM, Christopher Schultz wrote:
> >> Jerry,
> >>
> >> On 7/29/15 3:25 PM, Jerry Malcolm wrote:
> >>> Well, it appears that we are slowly getting to the bottom of
> >>> this. But with every answer, I get a few more questions....
> >>
> >>> First, I installed the latest TC8 on my laptop, copied my
> >>> server.xml and conf/catalina folder to it and started it up
> >>> just to see what errors I got.  After changing out an obsolete
> >>> listener, it came up.  I found all of the <resource> parm
> >>> exceptions in stderr.  So that question is cleared up.  Thanks
> >>> for clarifying where to find that.
> >>
> >> If you have an obsolete Listener, you probably copied your
> >> server.xml from Tomcat 7 to Tomcat 8 which, while being less of a
> >> disaster than with previous version-pairs, is not good practice.
> >>
> >> Instead, start with the stock server.xml that comes with your
> >> Tomcat version and modify it to suit. These days, you should
> >> pretty much only have to configure the <Executor> and <Connector>
> >> elements, unless you have a particularly exotic <Host>
> >> configuration.
> >>
> >>> The site is a wedding vendor advertiser site that spans two
> >>> major cities.  There is no user login.  Simply a very huge
> >>> online catalog. I'm certain it's deployed only once.  Whether I
> >>> need that many connections is a valid question.  As far as I
> >>> know, I haven't hit the limit in normal operation until now.
> >>> Could possibly reduce the count if I collect statistics.
> >>
> >> Our user load is roughly 250 concurrently logged-in users per
> >> Tomcat node, and we have maxTotal="20". I never get alarms about
> >> hitting that maximum. Your requirements may be different.
> >>
> >>> I've been monitoring the production server logs all day
> >>> watching to be sure connection pool doesn't dry up again.
> >>> About an hour ago, there was a single huge dump in stdout of
> >>> approx 2000 'logAbandoned' exceptions. They showed connections
> >>> from 1am right after my last bounce of the server thru 1:35pm.
> >>
> >> It looks like your startup process (likely loading and caching
> >> stuff from the db on launch) is leaky. That can run up your
> >> connection count quite quickly.
> >>
> >>> The good news is with the stack trace on one of them I was able
> >>> to see the bug causing the leak.
> >>
> >> Good.
> >>> But why did it decide to wait over 12 hours accumulating
> >>> abandoned connections before dumping them back in the pool?
> >>
> >> I was about to say the following, but markt says it might be a
> >> bug in DBCP.
>
> > The bug is in Commons Pool (POOL-300).  It is not flushing its
> > abandoned object log.  That means abandoned traces won't appear
> > (in the default System.out configuration) until some have been
> > accumulated.
>
> Thanks for the correction.
>
> >> I'll say it anyway:
> >>
> >> DBCP 2 looks like it only checks for abandoned connections "on
> >> borrow" so it might only log their abandonment when you see a
> >> flurry of connection-checkouts occurring, not when the
> >> connections are returned to the pool. DBCP 1 would complain
> >> pretty much immediately when the timeout was reached and the
> >> connection hadn't been returned.
>
> > > When DBCP checks for abandoned connections depends on its
> > > configuration properties.  There are two relevant properties:
> > > removeAbandonedOnBorrow and removeAbandonedOnMaintenance.
>
> Right. I think most people don't use the "maintenance" mode, so I was
> being sloppy. I haven't read the code, but the configuration options
> make it sound like the connection isn't checked until it's borrowed
> again from the pool, which could be a very long time after it would be
> expected to have been "abandoned".


Sorry the docs are not that clear.  The problem is that the config
properties work together to determine behavior and documenting them
individually makes it hard to put the whole picture together. 
Improvement patches most welcome :)  In any case, here is how it works:

Connections are evaluated for abandonment when they are out in
circulation - checked out to clients.  If you have set
timeBetweenEvictionRunsMillis to a positive value, pool maintenance
runs every timeBetweenEvictionRunsMillis milliseconds.  If you have
removeAbandonedOnMaintenance set to true, each time maintenance runs
the pool removes abandoned connections.  If you have
removeAbandonedOnBorrow set to true, the pool removes abandoned
connections if it is nearing depletion when a borrow request
arrives.  In both cases, the pool looks at the statistics that it
maintains on the list of all objects checked out by clients to
determine which ones appear to be abandoned.  To appear abandoned
means to be checked out by a client but not used for longer than
removeAbandonedTimeout (in seconds).  For DBCP, "used" means that the
connection itself has been used, for example to create a statement or
to execute a query through one.
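
To make the rules above concrete, here is a minimal sketch of the
decision logic in plain Java.  It is a toy model, not the DBCP or
Commons Pool source: the class AbandonmentSketch, the Borrowed record
and the methods sweep, onMaintenance and onBorrow are hypothetical
names, and "nearing depletion" is passed in as a flag rather than
computed the way the real pool computes it.  Only the three property
names come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the abandonment rules; NOT the real pool implementation.
class AbandonmentSketch {

    /** One checked-out connection, tracked only by when it was last used. */
    static final class Borrowed {
        final String name;
        final long lastUsedMillis;

        Borrowed(String name, long lastUsedMillis) {
            this.name = name;
            this.lastUsedMillis = lastUsedMillis;
        }
    }

    // Field names mirror the DBCP properties; the defaults here are assumptions.
    boolean removeAbandonedOnBorrow = false;
    boolean removeAbandonedOnMaintenance = false;
    int removeAbandonedTimeoutSeconds = 300;

    /** Everything currently checked out to clients. */
    final List<Borrowed> checkedOut = new ArrayList<>();

    /** Checked out, but not used for longer than the timeout. */
    boolean appearsAbandoned(Borrowed b, long nowMillis) {
        return nowMillis - b.lastUsedMillis > removeAbandonedTimeoutSeconds * 1000L;
    }

    /** Removes connections that appear abandoned and returns their names. */
    List<String> sweep(long nowMillis) {
        List<String> removed = new ArrayList<>();
        checkedOut.removeIf(b -> {
            boolean abandoned = appearsAbandoned(b, nowMillis);
            if (abandoned) {
                removed.add(b.name);
            }
            return abandoned;
        });
        return removed;
    }

    /** Stands in for a maintenance run (every timeBetweenEvictionRunsMillis). */
    List<String> onMaintenance(long nowMillis) {
        return removeAbandonedOnMaintenance ? sweep(nowMillis) : new ArrayList<>();
    }

    /** Stands in for a borrow request; the caller decides nearingDepletion. */
    List<String> onBorrow(long nowMillis, boolean nearingDepletion) {
        return (removeAbandonedOnBorrow && nearingDepletion)
                ? sweep(nowMillis)
                : new ArrayList<>();
    }

    public static void main(String[] args) {
        AbandonmentSketch pool = new AbandonmentSketch();
        pool.removeAbandonedOnMaintenance = true;
        pool.checkedOut.add(new Borrowed("stale", 0L));
        pool.checkedOut.add(new Borrowed("fresh", 290_000L));

        long now = 301_000L; // just past the 300 second timeout for "stale"
        System.out.println("on borrow:      " + pool.onBorrow(now, true));
        System.out.println("on maintenance: " + pool.onMaintenance(now));
    }
}
```

With only removeAbandonedOnBorrow set, nothing is reclaimed in this
model until a borrow arrives while the pool is nearly empty, which is
why a slow leak can sit unnoticed for hours on a lightly loaded pool.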

Phil

>
> >>> I realize from now knowing the code bug that the leak is a
> >>> slow drip that is continually leaking on a regular basis. But
> >>> since that last 12-hour accumulated dump, the abandoning has
> >>> returned to silence. Since leaks are occurring regularly and
> >>> would be timing out regularly, shouldn't I see a similar 'slow
> >>> drip' of logAbandoned entries in stdout instead of a big dump
> >>> every 12 hours?
> >>
> >>> It's going to take a day or two to fix the leak, test, and
> >>> deploy.
> >>
> >> For testing, set maxTotal="1". You'll find your leaks *very
> >> quickly* that way, because everything will come grinding to a
> >> halt when you try to fetch that second connection from the pool.
> >>
> >>> If indeed abandoned connections are now correctly being
> >>> returned to the pool, then I presume we are back to working the
> >>> way it did on TC7. Still not sure why it started working now.
> >>> But I guess once I get the leak fixed and if TC8 is now
> >>> configured to handle abandoned connections, I'm good.  Still
> >>> would like to know about the mega-dump vs. trickle of abandoned
> >>> connections being logged.
> >>
> >> You should be able to run in testing with an upgraded DBCP 2.
> >> You might have to build it from trunk, though. I'm not sure if
> >> you are okay with that, but it might help you with your testing.
>
> > The thing to swap out is Commons Pool.
>
> Yep, thanks for the clarification.
>
> > There is a release VOTE in progress now for an RC including a fix
> > for POOL-300.
>
> Good news!
>
> > A workaround that should work is to get a reference to the
> > BasicDataSource instance, say, bds and do
>
> > bds.setAbandonedLogWriter(new PrintWriter(System.out, true));
>
> > before using the pool.
>
> > Not sure if this will work correctly to get the output properly
> > directed under tomcat; but it is worth a try.
>
> Definitely.
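
The second constructor argument in the workaround quoted above is the
auto-flush flag.  The sketch below uses only java.io to show the
difference it makes: a PrintWriter without auto-flush holds a short
line in its buffer, while one created with "true" pushes it to the
underlying stream on println.  The class and method names are
hypothetical, the line written is a stand-in for an abandoned trace,
and whether the pool's default log writer is built exactly like the
unflushed case here is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

// Illustration of PrintWriter auto-flush; not code from DBCP or Commons Pool.
class AutoFlushSketch {

    /**
     * Writes one short line through a PrintWriter and reports how many
     * bytes have reached the sink before any explicit flush() call.
     */
    static int bytesVisibleAfterPrintln(boolean autoFlush) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(sink, autoFlush);
        writer.println("stand-in for one abandoned-connection trace line");
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println("autoFlush=false, bytes visible: "
                + bytesVisibleAfterPrintln(false));
        System.out.println("autoFlush=true, bytes visible:  "
                + bytesVisibleAfterPrintln(true));
    }
}
```

In the unflushed case the text only shows up once the buffer fills or
something calls flush(), which matches the "big dump" pattern reported
earlier in the thread.
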
>
> -chris
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org