Hi, On 2023-01-27 16:59:10 -0800, Nathan Bossart wrote: > On Fri, Jan 27, 2023 at 04:23:19PM -0800, Andres Freund wrote: > >> +typedef bool (*RecoveryRestoreCB) (const char *file, const char *path, > >> + const char > >> *lastRestartPointFileName); > >> +typedef void (*RecoveryArchiveCleanupCB) (const char > >> *lastRestartPointFileName); > >> +typedef void (*RecoveryEndCB) (const char *lastRestartPointFileName); > >> +typedef void (*RecoveryShutdownCB) (void); > > > > I think the signature of these forces bad coding practices, because there's > > no > > way to have state within a recovery module (as there's no parameter for it). > > > > It's possible we would eventually support multiple modules, e.g. restoring > > from shorter term file based archiving and from longer term archiving in > > some > > blob store. Then we'll regret not having a varible for this. > > Are you suggesting that we add a "void *arg" to each one of these?
Yes. Or pass a pointer to a struct with a "private_data" style field to all of them. > >> +extern RecoveryModuleCallbacks RecoveryContext; > > > > I think that'll typically be interpreteted as a MemoryContext by readers. > > How about RecoveryCallbacks? Callbacks is better than Context here imo, but I think 'Recovery' makes it sound like this actually performs WAL replay or such. Seems like it should be RestoreCallbacks at the very least? > > Also, why is this a global var? Exported too? > > It's needed in xlog.c, xlogarchive.c, and xlogrecovery.c. Would you rather > it be static to xlogarchive.c and provide accessors for the others? Maybe? Something about this feels wrong to me, but I can't entirely put my finger on it. > >> +/* > >> + * Type of the shared library symbol _PG_recovery_module_init that is > >> looked up > >> + * when loading a recovery library. > >> + */ > >> +typedef void (*RecoveryModuleInit) (RecoveryModuleCallbacks *cb); > > > > I think this is a bad way to return callbacks. This way the > > RecoveryModuleCallbacks needs to be modifiable, which makes the job for the > > compiler harder (and isn't the greatest for security). > > > > I strongly encourage you to follow the model used e.g. by tableam. The init > > function should return a pointer to a *constant* struct. Which is > > compile-time > > initialized with the function pointers. > > > > See the bottom of heapam_handler.c for how that ends up looking. > > Hm. I used the existing strategy for archive modules and logical decoding > output plugins here. Unfortunately I didn't realize the problem when I was designing the output plugin interface. But there's probably too many users of it out there now to change it. The interface does at least provide a way to have its own "per instance" state, via the startup callback and LogicalDecodingContext->output_plugin_private. The worst interface in this area is index AMs - the handler returns a pointer to a palloced struct with callbacks. That then is copied into a new allocation in the relcache entry. We have hundreds to thousands of copies of what bthandler() sets up in memory. Without any sort of benefit. > I think it would be weird for the archive module and > recovery module interfaces to look so different, but if that's okay, I can > change it. I'm a bit sad about the archive module case - I wonder if we should change it now, there can't be many users of it out there. And I think it's more likely that we'll eventually want multiple archiving scripts to run concurrently - which will be quite hard with the current interface (no private state). Just btw: It's imo a bit awkward for the definition of the archiving plugin interface to be in pgarch.h: "Exports from postmaster/pgarch.c" doesn't describe that well. A dedicated header seems cleaner. > > >> +void > >> +LoadRecoveryCallbacks(void) > >> +{ > >> + RecoveryModuleInit init; > >> + > >> + /* > >> + * If the shell command is enabled, use our special initialization > >> + * function. Otherwise, load the library and call its > >> + * _PG_recovery_module_init(). > >> + */ > >> + if (restoreLibrary[0] == '\0') > >> + init = shell_restore_init; > >> + else > >> + init = (RecoveryModuleInit) > >> + load_external_function(restoreLibrary, > >> "_PG_recovery_module_init", > >> + false, NULL); > > > > Why a special rule for shell, instead of just defaulting the GUC to it? > > I'm not following this one. The default value of the restore_library GUC > is an empty string, which means that the shell commands should be used. I was wondering why we implement "shell" via a separate mechanism from restore_library. I.e. a) why doesn't restore_library default to 'shell', instead of an empty string, b) why aren't restore_command et al implemented using a restore module. > >> + /* > >> + * If using shell commands, remove callbacks for any commands that are > >> not > >> + * set. > >> + */ > >> + if (restoreLibrary[0] == '\0') > >> + { > >> + if (recoveryRestoreCommand[0] == '\0') > >> + RecoveryContext.restore_cb = NULL; > >> + if (archiveCleanupCommand[0] == '\0') > >> + RecoveryContext.archive_cleanup_cb = NULL; > >> + if (recoveryEndCommand[0] == '\0') > >> + RecoveryContext.recovery_end_cb = NULL; > > > > I'd just mandate that these are implemented and that the module has to > > handle > > if it doesn't need to do anything. > > Wouldn't this just force module authors to write empty functions for the > parts they don't need? Yes. But what's the point of a restore library that doesn't implement a restore command? Making some/all callbacks mandatory and validating mandatory callbacks are set, during load, IME makes it easier to evolve the interface over time, because problems become immediately apparent, rather than having to wait for a certain callback to be hit. It's not actually clear to me why another restore library shouldn't be able to use restore_command etc, given that we have the parameters. One quite useful module would be a version of the "shell" interface that runs multiple restore commands in parallel, assuming we'll need subsequent files as well. The fact that restore_command are not run in parallel, and that many useful restore commands have a fair bit of latency, is an issue. So a shell_parallel restore library would e.g. be useful? Greetings, Andres Freund