Thanks, Romain, this is awesome. I would really like to find a way to get this kind of thing implemented in [pool] or via enhanced factories. See more on that below.
On Tue, Feb 13, 2024 at 1:27 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> Hi Phil,
>
> What I used in the past for this kind of thing was to rely on the timeout of the pool, plus a trigger in the healthcheck - external to the pool (the simplest was "if 5 healthchecks fail without any success in between", for example). Such a trigger will spawn a task (think thread, even if it uses an executor, but guaranteed to have a place for this task) which will retry, but at a faster pace (instead of every 30s it is 5 times in a run - the number was tunable but 5 was my default).
>
> If still detected as down - as opposed to just overloaded or the like - it will consider the database down and spawn a task which will retry every 30 seconds. If the database comes back - I added some business check, but the idea is not just to check the connection but that the tables are accessible, because often after such a downtime the db does not come back all at once - just destroy/recreate the pool. The destroy/recreate was handled using a DataSource proxy in front of the pool, changing the delegate.

It seems to me that all of this might be possible using what I was calling a ResilientFactory. The factory could implement the health-checking itself, using pluggable strategies for how to check, how often, what counts as an outage, etc. And the factory could (if so configured and in the right state) bounce the pool. I like the model of escalating concern.
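
To make that a bit more concrete, here is roughly the contract I have in mind. The names and methods below are purely illustrative - nothing like this exists in [pool] today:

import org.apache.commons.pool2.PooledObjectFactory;

// Illustrative sketch only - none of these types or methods exist in pool2.
public interface ResilientPooledObjectFactory<T> extends PooledObjectFactory<T> {

    // Pluggable check, e.g. a fast "SELECT 1" against the database for a JDBC-backed factory.
    @FunctionalInterface
    interface HealthCheck {
        boolean isHealthy();
    }

    // How often to check and how many consecutive failures count as an outage.
    void setHealthCheck(HealthCheck check, java.time.Duration interval, int failuresBeforeDown);

    // True while the factory considers its backing resource unavailable.
    boolean isDown();

    // Invoked once the resource is healthy again, so the factory (or its pool) can
    // create instances for any parked take waiters, up to capacity.
    void whenBackUp(Runnable refillAction);
}

The escalating retry, the faster re-check pace, and even the destroy/recreate could then be default behaviors behind a contract like that, with the resource-specific checks plugged in.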
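
And just so we are looking at the same thing, I assume the swappable-delegate proxy you describe is roughly this shape (a sketch only, not an existing [dbcp] class):

import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.logging.Logger;
import javax.sql.DataSource;

// Sketch of the "proxy in front of the pool" idea - not dbcp code.
public final class SwappableDataSource implements DataSource {

    private final AtomicReference<DataSource> delegate;

    public SwappableDataSource(DataSource initial) {
        this.delegate = new AtomicReference<>(initial);
    }

    // Atomically replace the pooled DataSource, e.g. after destroying and recreating the pool.
    public DataSource swap(DataSource replacement) {
        return delegate.getAndSet(replacement);
    }

    @Override public Connection getConnection() throws SQLException {
        return delegate.get().getConnection();
    }
    @Override public Connection getConnection(String user, String password) throws SQLException {
        return delegate.get().getConnection(user, password);
    }
    @Override public PrintWriter getLogWriter() throws SQLException { return delegate.get().getLogWriter(); }
    @Override public void setLogWriter(PrintWriter out) throws SQLException { delegate.get().setLogWriter(out); }
    @Override public void setLoginTimeout(int seconds) throws SQLException { delegate.get().setLoginTimeout(seconds); }
    @Override public int getLoginTimeout() throws SQLException { return delegate.get().getLoginTimeout(); }
    @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException();
    }
    @Override public <T> T unwrap(Class<T> iface) throws SQLException { return delegate.get().unwrap(iface); }
    @Override public boolean isWrapperFor(Class<?> iface) throws SQLException { return delegate.get().isWrapperFor(iface); }
}

The bounce then becomes: build a new pooled DataSource, swap it in, and dispose of the old one once its borrowed connections have drained.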
> Indeed it is not magic inside the pool, but it can only work better than the pool solution because you can integrate it with your already existing checks and add more advanced checks - if you have JPA, just do a fast query on any table to validate the db is back, for example.
>
> In the end the code is pretty simple and has another big advantage: you can circuit break the database completely while you consider the db down, letting through only 10% - or whatever ratio you want - of the requests (a kind of canary testing which avoids too much pressure on the pool).
>
> I guess it was not exactly the answer you expected, but I think it can be a good solution and ultimately could sit in a new package in dbcp or alike?

I don't see anything here that is really specific to database connections (other than the proxy setup to gracefully handle bounces), so I want to keep thinking about how to solve the general problem by somehow enhancing factories and/or pools.

Phil

> Best,
> Romain Manni-Bucau
> @rmannibucau <https://twitter.com/rmannibucau> | Blog <https://rmannibucau.metawerx.net/> | Old Blog <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> | LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>
> On Tue, Feb 13, 2024 at 9:11 PM Phil Steitz <phil.ste...@gmail.com> wrote:
>
> > POOL-407 tracks a basic liveness problem that we have never been able to solve:
> >
> > A factory "goes down", resulting in either failed object creation or failed validation during the outage. The pool has capacity to create, but the factory fails to serve threads as they arrive, so they end up parked waiting on the idle object pool. After a possibly very brief interruption, the factory heals itself (maybe a database comes back up) and the waiting threads can be served, but until other threads arrive, get served, and return instances to the pool, the parked threads remain blocked.
> >
> > Configuring minIdle and pool maintenance (timeBetweenEvictionRuns > 0) can improve the situation, but running the evictor at a high enough frequency to handle every transient failure is not a great solution.
> >
> > I am stuck on how to improve this. I have experimented with the idea of a ResilientFactory, placing the responsibility on the factory to know when it is down and when it comes back up, and when it does, to keep calling its pool's create as long as it has take waiters and capacity; but I am not sure that is the best approach. The advantage of this is that resource-specific failure and recovery detection can be implemented.
> >
> > Another option that I have played with is to have the pool keep track of factory failures and, when it observes enough failures over a long enough time, start a thread to do some kind of exponential backoff to keep retrying the factory. Once the factory comes back, the recovery thread creates as many instances as it can without exceeding capacity and adds them to the pool.
> >
> > I don't really like either of these. Anyone have any better ideas?
> >
> > Phil
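
P.S. For anyone following along on the list, the minIdle / evictor workaround mentioned in my original mail above is just standard pool2 configuration, roughly like this (the factory type and the numbers are placeholders):

import java.time.Duration;
import org.apache.commons.pool2.PooledObjectFactory;
import org.apache.commons.pool2.impl.GenericObjectPool;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

final class EvictorWorkaround {
    static <T> GenericObjectPool<T> newPool(PooledObjectFactory<T> factory) {
        GenericObjectPoolConfig<T> config = new GenericObjectPoolConfig<>();
        config.setMaxTotal(20);
        config.setMinIdle(5);            // evictor re-creates idle instances after an outage
        config.setTestWhileIdle(true);   // idle instances are validated on each evictor run
        // The evictor has to run this often to catch every transient failure,
        // which is exactly what makes this an unsatisfying fix.
        config.setTimeBetweenEvictionRuns(Duration.ofSeconds(5));
        return new GenericObjectPool<>(factory, config);
    }
}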
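
And the second option I mentioned (pool-side failure tracking plus a recovery thread) looks roughly like this in my experiments. Again, the class and its wiring are made up - nothing here is in [pool]:

import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.commons.pool2.impl.GenericObjectPool;

// Illustrative only - sketches the "recovery thread with exponential backoff" idea.
public final class FactoryRecoveryMonitor<T> {

    private final GenericObjectPool<T> pool;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public FactoryRecoveryMonitor(GenericObjectPool<T> pool) {
        this.pool = pool;
    }

    // Call this once enough consecutive create/validate failures have been observed;
    // the caller is responsible for not starting it twice.
    public void onSuspectedOutage() {
        retry(Duration.ofMillis(500));
    }

    private void retry(Duration delay) {
        scheduler.schedule(() -> {
            try {
                // addObject() exercises the factory; success means it is back.
                pool.addObject();
                refill();
            } catch (Exception stillDown) {
                // Exponential backoff, capped at 30 seconds.
                Duration next = delay.multipliedBy(2);
                retry(next.compareTo(Duration.ofSeconds(30)) > 0 ? Duration.ofSeconds(30) : next);
            }
        }, delay.toMillis(), TimeUnit.MILLISECONDS);
    }

    // Create instances for parked takers, without exceeding capacity.
    private void refill() throws Exception {
        while (pool.getNumWaiters() > 0
                && pool.getNumActive() + pool.getNumIdle() < pool.getMaxTotal()) {
            pool.addObject();
        }
    }
}

The unresolved part is still where "enough failures" gets detected - inside the factory, which knows the resource, or inside the pool, which sees every create and validate.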