Hi,

On 09/16/2010 07:47 PM, Robert Haas wrote:
> It would be nice if there were a way to create
> a general facility here that we could then build various applications
> on, but I'm not sure whether that's the case.  We had some
> back-and-forth about what is best for replication vs. what is best for
> vacuum vs. what is best for parallel query.  If we could somehow
> conceive of a system that could serve all of those needs without
> introducing any more configuration complexity than what we have now,
> that would of course be very interesting.

Let's think about this again from a little distance. We have the existing autovacuum and the Postgres-R project. Then there are the potential features 'parallel querying' and 'autonomous transactions', which could in principle benefit from the bgworker infrastructure.

For all of those, one could head for a multi-threaded, a multi-process, or an async, event-based approach. Multi-threading seems to be out of the question for Postgres. We don't have much of an async event framework anywhere, so at least for parallel querying that seems out of the question as well. Only the 'autonomous transactions' feature seems simple enough to be doable within a single process. That approach would still miss the isolation that a separate process provides (not sure that's required, but 'autonomous' sounds like it could be a good thing to have).

So let's assume we use the multi-process approach provided by bgworkers for both potential features. What are the requirements?

autovacuum: only very few jobs at a time, not very resource intensive, not passing around lots of data

Postgres-R: lots of concurrent jobs, easily more than normal backends, depending on the number of nodes in the cluster and the read/write ratio; lots of data to be passed around

parallel querying: a couple dozen concurrent jobs (bounded by the number of CPUs or spindles available?), more doesn't help; lots of data to be passed around

autonomous transactions: max. one per normal backend (correct?), way fewer should suffice in most cases; only control data to be passed around


So, for both potential features as well as for autovacuum, a ratio of 1:10 (or even less) for max_bgworkers:max_connections would suffice. Postgres-R clearly seems to be the outlier here. It needs special configuration anyway, so I'd have no problem with defaults that target the other use cases.

All of the potential users of bgworkers benefit from a pre-connected bgworker. That means having at least one spare bgworker around per database could be beneficial, potentially more depending on how often spike loads occur. As long as there are only a few databases, it's easily possible to have at least one spare process around per database, but with thousands of databases, that might get prohibitively expensive (not sure where the boundary between win and lose lies, though: cost of idle backends vs. cost of connecting).

Nonetheless, bgworkers would make the above features easier to implement, as they provide the controlled background worker process infrastructure, including job handling (and even queuing) in the coordinator process. Having spare workers available is not a prerequisite for using bgworkers; it's just an optimization.
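To make the job handling part a bit more concrete, the coordinator's dispatch decision could boil down to something like the following pseudo-C sketch (all names are invented for illustration; this is not the actual patch code):

    /* Hypothetical dispatch logic in the coordinator; names invented. */
    typedef struct Job      Job;       /* one queued background job  */
    typedef struct BgWorker BgWorker;  /* one background worker slot */

    static void
    dispatch_job(Job *job)
    {
        /* prefer a spare worker already connected to the target database */
        BgWorker *worker = find_idle_worker(job_database(job));

        if (worker != NULL)
            assign_job(worker, job);        /* no fork, no connect needed */
        else if (total_worker_count() < max_background_workers)
            fork_bgworker_for(job);         /* fork and connect on demand */
        else
            enqueue_job(job);               /* all slots busy: queue it   */
    }

With spare workers around, the first branch is the common case, which is exactly where the pre-connection pays off.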

Autovacuum could possibly benefit from bgworkers by enabling a finer-grained choice of what database and table to vacuum, and when. I didn't look too much into that, though.

Regarding the additional configuration overhead of the bgworkers patch: autovacuum_max_workers gets turned into max_background_workers, so the only additional GUCs currently are min_spare_background_workers and max_spare_background_workers (sorry, I thought I had named them idle workers; looks like I've gone with spare workers for the GUCs).
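For illustration, a configuration along the lines of the 1:10 ratio mentioned above could look as follows (the values are made up for the example, not suggested defaults):

    max_connections              = 100
    max_background_workers       = 10   # replaces autovacuum_max_workers
    min_spare_background_workers = 1    # lower bound on idle workers per database
    max_spare_background_workers = 3    # upper bound on idle workers per database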

Those are used to control and limit (in both directions) the number of spare workers per database. It's the simplest possible variant I could think of, but I'm open to other mechanisms, especially ones that require less configuration. Simply keeping spare workers around for a given timeout *could* be a replacement and would save us one GUC.
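In pseudo-C, the per-database pool maintenance these two GUCs describe could look roughly like this (again, all names are invented for illustration; this is not the actual patch code):

    /* Hypothetical per-database spare pool maintenance; names invented. */
    static void
    maintain_spare_pool(Oid dboid)
    {
        /* fork ahead of demand, but respect the global worker limit */
        while (idle_worker_count(dboid) < min_spare_background_workers &&
               total_worker_count() < max_background_workers)
            fork_bgworker(dboid);

        /* trim excess idle workers to free their resources again */
        while (idle_worker_count(dboid) > max_spare_background_workers)
            terminate_idle_worker(dboid);
    }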

However, I feel like the timeout approach gives less control over how the bgworkers are used. For example, I'd prefer to be able to prevent the system from allocating all bgworkers to a single database at once. And as mentioned above, it also makes sense to pre-fork some bgworkers if there are still enough available. A pure timeout doesn't take care of that; it merely assumes that the past is a good indicator of future use.

Hope that sheds some more light on how bgworkers could be useful. Maybe I just need to describe the job handling features of the coordinator better as well? (Simon also requested better documentation...)

Regards

Markus Wanner
