Thanks, Romain, this is awesome. I would really like to find a way to get this kind of thing implemented in [pool] or via enhanced factories. See more on that below.
On Tue, Feb 13, 2024 at 1:27 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> Hi Phil,
>
> What I used in the past for this kind of thing was to rely on the timeout of the pool, plus a trigger in the healthcheck - external to the pool (the simplest was "if 5 healthchecks fail without any success in between", for example). Such a trigger will spawn a task (think thread, even if it uses an executor, but guaranteed to have a place for this task) which will retry, but at a faster pace (instead of every 30s it is 5 times in a run - the number was tunable but 5 was my default).
>
> If still detected as down - as opposed to just overloaded or the like - it will consider the database down and spawn a task which will retry every 30 seconds. If the database comes back - I added some business check, but the idea is not just to check the connection but that the tables are accessible, because often after such a downtime the db does not come back all at once - just destroy/recreate the pool. The destroy/recreate was handled using a DataSource proxy in front of the pool, changing the delegate.

It seems to me that all of this might be possible using what I was calling a ResilientFactory. The factory could implement the health-checking itself, using pluggable strategies for how to check, how often, what counts as an outage, etc. And the factory could (if so configured and in the right state) bounce the pool. I like the model of escalating concern.
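
To make that a bit more concrete, here is roughly the contract I have in mind. The names and methods below are purely illustrative - nothing like this exists in [pool] today:

import org.apache.commons.pool2.PooledObjectFactory;

// Illustrative sketch only - none of these types or methods exist in pool2.
public interface ResilientPooledObjectFactory<T> extends PooledObjectFactory<T> {

    // Pluggable check, e.g. a fast "SELECT 1" against the database for a JDBC-backed factory.
    @FunctionalInterface
    interface HealthCheck {
        boolean isHealthy();
    }

    // How often to check and how many consecutive failures count as an outage.
    void setHealthCheck(HealthCheck check, java.time.Duration interval, int failuresBeforeDown);

    // True while the factory considers its backing resource unavailable.
    boolean isDown();

    // Invoked once the resource is healthy again, so the factory (or its pool) can
    // create instances for any parked take waiters, up to capacity.
    void whenBackUp(Runnable refillAction);
}

The escalating retry, the faster re-check pace, and even the destroy/recreate could then be default behaviors behind a contract like that, with the resource-specific checks plugged in.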
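
And just so we are looking at the same thing, I assume the swappable-delegate proxy you describe is roughly this shape (a sketch only, not an existing [dbcp] class):

import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.logging.Logger;
import javax.sql.DataSource;

// Sketch of the "proxy in front of the pool" idea - not dbcp code.
public final class SwappableDataSource implements DataSource {

    private final AtomicReference<DataSource> delegate;

    public SwappableDataSource(DataSource initial) {
        this.delegate = new AtomicReference<>(initial);
    }

    // Atomically replace the pooled DataSource, e.g. after destroying and recreating the pool.
    public DataSource swap(DataSource replacement) {
        return delegate.getAndSet(replacement);
    }

    @Override public Connection getConnection() throws SQLException {
        return delegate.get().getConnection();
    }
    @Override public Connection getConnection(String user, String password) throws SQLException {
        return delegate.get().getConnection(user, password);
    }
    @Override public PrintWriter getLogWriter() throws SQLException { return delegate.get().getLogWriter(); }
    @Override public void setLogWriter(PrintWriter out) throws SQLException { delegate.get().setLogWriter(out); }
    @Override public void setLoginTimeout(int seconds) throws SQLException { delegate.get().setLoginTimeout(seconds); }
    @Override public int getLoginTimeout() throws SQLException { return delegate.get().getLoginTimeout(); }
    @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException();
    }
    @Override public <T> T unwrap(Class<T> iface) throws SQLException { return delegate.get().unwrap(iface); }
    @Override public boolean isWrapperFor(Class<?> iface) throws SQLException { return delegate.get().isWrapperFor(iface); }
}

The bounce then becomes: build a new pooled DataSource, swap it in, and dispose of the old one once its borrowed connections have drained.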
> Indeed it is not magic inside the pool, but it can only work better than the pool solution because you can integrate it with your already existing checks and add more advanced checks - if you have JPA, just do a fast query on any table to validate the db is back, for example.
>
> In the end the code is pretty simple and has another big advantage: you can circuit break the database completely while you consider the db down, letting through only 10% - or whatever ratio you want - of the requests (a kind of canary testing which avoids too much pressure on the pool).
>
> I guess it was not exactly the answer you expected, but I think it can be a good solution and ultimately could sit in a new package in dbcp or alike?

I don't see anything here that is really specific to database connections (other than the proxy setup to gracefully handle bounces), so I want to keep thinking about how to solve the general problem by somehow enhancing factories and/or pools.

Phil

> Best,
> Romain Manni-Bucau
> @rmannibucau <https://twitter.com/rmannibucau> | Blog <https://rmannibucau.metawerx.net/> | Old Blog <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>
> On Tue, Feb 13, 2024 at 9:11 PM Phil Steitz <phil.ste...@gmail.com> wrote:
>
> > POOL-407 tracks a basic liveness problem that we have never been able to solve:
> >
> > A factory "goes down", resulting in either failed object creation or failed validation during the outage. The pool has capacity to create, but the factory fails to serve threads as they arrive, so they end up parked waiting on the idle object pool. After a possibly very brief interruption, the factory heals itself (maybe a database comes back up) and the waiting threads can be served, but until other threads arrive, get served, and return instances to the pool, the parked threads remain blocked.
> >
> > Configuring minIdle and pool maintenance (timeBetweenEvictionRuns > 0) can improve the situation, but running the evictor at a high enough frequency to handle every transient failure is not a great solution.
> >
> > I am stuck on how to improve this. I have experimented with the idea of a ResilientFactory, placing the responsibility on the factory to know when it is down and when it comes back up, and when it does, to keep calling its pool's create as long as it has take waiters and capacity; but I am not sure that is the best approach. The advantage of this is that resource-specific failure and recovery detection can be implemented.
> >
> > Another option that I have played with is to have the pool keep track of factory failures and, when it observes enough failures over a long enough time, start a thread to do some kind of exponential backoff to keep retrying the factory. Once the factory comes back, the recovery thread creates as many instances as it can without exceeding capacity and adds them to the pool.
> >
> > I don't really like either of these. Anyone have any better ideas?
> >
> > Phil
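
P.S. For anyone following along on the list, the minIdle / evictor workaround mentioned in my original mail above is just standard pool2 configuration, roughly like this (the factory type and the numbers are placeholders):

import java.time.Duration;
import org.apache.commons.pool2.PooledObjectFactory;
import org.apache.commons.pool2.impl.GenericObjectPool;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

final class EvictorWorkaround {
    static <T> GenericObjectPool<T> newPool(PooledObjectFactory<T> factory) {
        GenericObjectPoolConfig<T> config = new GenericObjectPoolConfig<>();
        config.setMaxTotal(20);
        config.setMinIdle(5);            // evictor re-creates idle instances after an outage
        config.setTestWhileIdle(true);   // idle instances are validated on each evictor run
        // The evictor has to run this often to catch every transient failure,
        // which is exactly what makes this an unsatisfying fix.
        config.setTimeBetweenEvictionRuns(Duration.ofSeconds(5));
        return new GenericObjectPool<>(factory, config);
    }
}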
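
And the second option I mentioned (pool-side failure tracking plus a recovery thread) looks roughly like this in my experiments. Again, the class and its wiring are made up - nothing here is in [pool]:

import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.commons.pool2.impl.GenericObjectPool;

// Illustrative only - sketches the "recovery thread with exponential backoff" idea.
public final class FactoryRecoveryMonitor<T> {

    private final GenericObjectPool<T> pool;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public FactoryRecoveryMonitor(GenericObjectPool<T> pool) {
        this.pool = pool;
    }

    // Call this once enough consecutive create/validate failures have been observed;
    // the caller is responsible for not starting it twice.
    public void onSuspectedOutage() {
        retry(Duration.ofMillis(500));
    }

    private void retry(Duration delay) {
        scheduler.schedule(() -> {
            try {
                // addObject() exercises the factory; success means it is back.
                pool.addObject();
                refill();
            } catch (Exception stillDown) {
                // Exponential backoff, capped at 30 seconds.
                Duration next = delay.multipliedBy(2);
                retry(next.compareTo(Duration.ofSeconds(30)) > 0 ? Duration.ofSeconds(30) : next);
            }
        }, delay.toMillis(), TimeUnit.MILLISECONDS);
    }

    // Create instances for parked takers, without exceeding capacity.
    private void refill() throws Exception {
        while (pool.getNumWaiters() > 0
                && pool.getNumActive() + pool.getNumIdle() < pool.getMaxTotal()) {
            pool.addObject();
        }
    }
}

The unresolved part is still where "enough failures" gets detected - inside the factory, which knows the resource, or inside the pool, which sees every create and validate.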