čt 10. 9. 2020 v 5:02 odesílatel Craig Ringer <cr...@2ndquadrant.com> napsal:
> Hi all > > As I've gained experience working on background workers, it's become > increasingly clear that they're a bit too different to normal backends for > many nontrivial uses. > > I thought I'd take a moment to note some of it here, along with some > proposals for things we could potentially do to make it much easier to use > bgworkers correctly especially when using them to run queries. > > This is NOT A PATCH SET. It's a set of discussion proposals and it's also > intended as a bit of a helper for people just getting started on bgworkers. > There are a lot of subtle differences in the runtime environment a basic > bgworker provides vs the runtime environment extension authors will be used > to when writing fmgr-callable C functions. > > (It looks like pg12 and pg13 have some improvements, so some of the issues > I was going to mention with error cleanup paths and locking aren't relevant > anymore.) > > DIFFERENCES WHEN CODING FOR BGWORKERS > ---- > > Some of the most important differences vs normal backends that I've > noticed are: > > * There's no supported way to recover from an error and continue > execution. Each bgworker has to roll its own logic based on > PostgresMain()'s sigsetjmp() handler and maintain it. > * The bgworker infrastructure doesn't set up the the application_name in > PGPROC > * Example bgworkers roll their own signal handlers rather than using die, > procsignal_sigusr1_handler, and PostgresSigHupHandler. This makes them seem > to work OK, but fail to respect CHECK_FOR_INTERRUPTS() and other expected > behaviour when they call into the executor etc. > * Each bgworker has to roll its own mainloop with ConfigReloadPending / > ProcessConfigFile() etc, and this isn't illustrated at all in the examples > * Example workers don't interact with the pgstat subsystem > > Managing bgworker lifecycle is also quite difficult. The postmaster > doesn't offer any help and we don't expose much help from core. Any > extension that spawns workers needs to do its own handling of worker > startup registration in shmem, worker exit reaping, handling workers that > exit before registering, etc. This can be seen in the logical replication > code, which rolls its own management per > src/backend/replication/logical/launcher.c . > > PROPOSED CHANGES IN EXAMPLE BGWORKERS/CONTRIBS > ----- > > So I'd like to make these changes to example bgworkers and to contrib > extensions: > > * Change example bgworkers in contrib to set up normal default signal > handlers, not define their own > * Add ConfigReloadPending and ProcessConfigFile() to all example bgworkers > * Show pgstat use in example bgworkers > > PROPOSED BGW SETUP AND MAINLOOP HELPERS > ---- > > and I'm wondering if anyone thinks it's a good idea to add some bgworker > mainloop helper API, where one call is made at the start of the bgw, before > BackgroundWorkerInitializeConnection(), to: > > * set application_name > * set process title > * set up default signal handlers and unblock signals > * creates and assigns a BGWMemoryContext named after the bgw > * set state to idle in pgstat > * sets up MyProcPort (or teach InitPostgres to do so for a bgw) > * takes a function pointer for a before_shmem_exit cleanup handler > to encourage ext authors to use this approach > > then a mainloop helper that: > > * Checks that CurrentMemoryContext == BGWMemoryContext > * Runs CHECK_FOR_INTERRUPTS() > * Checks for postmaster death and exits > * Checks for and performs config file reload > > > > PROPOSED ERROR HANDLING HELPER > ---- > > It'd also be beneficial to have a helper for bgw mainloops with error > recovery, generalizing and extracting the pattern I see in all these > routines: > > src/backend/postmaster/autovacuum.c=AutoVacLauncherMain > src/backend/postmaster/autovacuum.c=AutoVacWorkerMain > src/backend/postmaster/bgworker.c=StartBackgroundWorker > src/backend/postmaster/bgwriter.c=BackgroundWriterMain > src/backend/postmaster/checkpointer.c=CheckpointerMain > src/backend/postmaster/walwriter.c=WalWriterMain > src/backend/tcop/postgres.c=PostgresMain > > which all do their own error reset loops. There's way too much copy/paste > going on there IMO, even before we consider bgworkers, and it's nearly > impossible for anyone who isn't quite an experienced Pg developer to write > a bgw that can do anything with errors except promote ERROR to FATAL and > die. > > This would be modeled on PosgresMain(). The worker would only need to add > its own logic to the sigsetjmp() error jump path, not duplicate all the > core cleanup work. > > PROPOSED GENERALISED WORKER MANAGEMENT > ---- > > Finally I'm wondering if there's any interest in generalizing the logical > rep worker management for other bgworkers. I've done a ton of work with > worker management and it's something I'm sure I could take on but I don't > want to write it without knowing there's some chance of acceptance. > > The general idea is to provide a way for bgworkers to start up managers > for pools / sets of workers. They launch them and have a function they can > call in their mainloop that watches their child worker states, invoking > callbacks when they fail to launch, launch successfully, exit cleanly after > finishing their work, or die with an error. Workers are tracked in a shmem > seg where the start of the seg must be a key struct (akin to how the hash > API works). We would provide calls to look up a worker shmem struct by key, > signal a worker by key, wait for a worker to exit (up to timeout), etc. > Like in the logical rep code, access to the worker registration shmem would > be controlled by LWLock. The extension code can put whatever it wants in > the worker shmem entries after the key, including various unions or > whatever - the worker management API won't care. > > This abstracts all the low level mess away from bgworker implementations > and lets them focus on writing the code they want to run. > > I'd probably suggest doing so by extracting the logical rep worker > management, and making the logical rep code use the generalized worker > management. So it'd be proven, and have in core users. > > PROPOSED XACT API WRAPPER > ---- > > Additionally there are some more general postgres API patterns that the > team working on pglogical and BDR has landed up writing wrappers for or > compensating for. One of the most important is transaction management. In > many places in pglogical where we start lightweight transactions in > pglogical we use a wrapper that starts the txn if there isn't already one > open and remembers whether one was started. Then the end helper closes it > only if the start helper opened it. Finally, the memory context is always > reset to the memory context before txn start once the txn ends, instead of > being left at TopMemoryContext, since we've learned that that's an > extremely common source of bugs. > > I propose a wrapper for lightweight txn management, like the one we use in > many places in pglogical. This is not a copy of that code, it's just a > scratched out example of roughly how we do it: > > LocalTransactionState txn; > StartTransactionIfNotStarted(&txn, LXACT_WANT_SNAPSHOT); > ... > CommitTransactionCommandIfStarted(&txn); > > where > > static inline void > StartTransactionIfNotStarted(LocalTransactionState *txn, int flags) > { > txn->started = !TransactionInProgress(); > txn->save_mctx = CurrentMemoryContext; > if (txn->started) > { > txn->pushed_snapshot = false; > StartTransactionCommand(); > if (flags & LXACT_WANT_SNAPSHOT) > { > txn->pushed_snapshot = true; > PushActiveSnapshot(GetTransactionSnapshot()); > } > } > } > > static inline void > CommitTransactionIfNotStarted(LocalTransactionState *txn) > { > if (txn->started) > { > Assert(TransactionInProgress()); > Assert(CurrentMemoryContext == TopTransactionContext); > if (txn->pushed_snapshot) > PopActiveSnapshot(); > CommitTransactionCommand(); > } > else > { > /* Caller didn't change context between start and end */ > Assert(CurrentMemoryContext == txn->save_mctx); > } > (void) MemoryContextSwitchTo(txn->save_mctx); > } > any from these proposal looks like good idea Regards Pavel > > -- > Craig Ringer http://www.2ndQuadrant.com/ > 2ndQuadrant - PostgreSQL Solutions for the Enterprise >