Hi,

[1] reminded me of a topic that I wanted to bring up at some point:
To me the division of labor between initdb and bootstrap doesn't make much sense anymore: initdb reads postgres.bki, replaces a few tokens, starts postgres in bootstrap mode, and then painstakingly feeds the processed postgres.bki lines to the server through a pipe.

Given that bootstrap mode parsing is a dedicated parser, only invoked from a single point, what's the point of initdb doing the preprocessing and then incurring pipe overhead?

Sure, there are a few tokens that we replace in initdb. As it turns out, there are only two rows that are actually variable: the username of the initial superuser in pg_authid, and the pg_database row for template1, where encoding, lc_collate and lc_ctype vary. The rest are all compile-time constant replacements we could do as part of genbki.pl.

It seems we could save a good number of context switches by opening postgres.bki just before boot_yyparse() in BootstrapModeMain() and having the parser read it directly. The pg_authid / pg_database rows we could just create via explicit insertions in BootstrapModeMain(), with the values provided by command-line args?

Similarly, since the introduction of extensions at the latest, the server knows how to execute SQL from a file. Why don't we just process information_schema.sql, system_views.sql et al. that way?

If we don't need a dedicated "input" mode feeding boot_yyparse() in bootstrap mode anymore (because bootstrap mode feeds it from postgres.bki directly), we likely could avoid the restart between bootstrap and single-user mode. Afaics that restart is only really needed because we need to send SQL after bootstrap_template1(). Avoiding it would likely be a nice speedup, because we wouldn't need to write the bootstrap contents from shared buffers to the OS just to read them back in single-user mode.

I don't plan to work on this immediately, but I thought it's worth bringing up anyway.

Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/20220216012953.6d7bzmsblqou3ru4%40alap3.anarazel.de
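For context on the per-line preprocessing initdb currently does before writing to the pipe: it amounts to plain string substitution of placeholder tokens. The following is a minimal C sketch of that kind of substitution, not initdb's actual code. The function name replace_token and the placeholder names used in the usage note (ENCODING, LC_COLLATE, LC_CTYPE) are assumptions chosen for illustration, loosely modeled on the variable template1 fields mentioned above.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not initdb's real implementation): return a newly
 * malloc'd copy of `line` with every non-overlapping occurrence of `token`
 * replaced by `value`. Returns NULL on allocation failure; caller frees.
 */
static char *
replace_token(const char *line, const char *token, const char *value)
{
    size_t      linelen = strlen(line);
    size_t      toklen = strlen(token);
    size_t      vallen = strlen(value);
    size_t      count = 0;
    const char *p;
    const char *start;
    char       *result;
    char       *out;

    /* An empty token would match everywhere; just copy the line. */
    if (toklen == 0)
    {
        result = malloc(linelen + 1);
        if (result != NULL)
            memcpy(result, line, linelen + 1);
        return result;
    }

    /* First pass: count occurrences so the output can be sized exactly. */
    for (p = line; (p = strstr(p, token)) != NULL; p += toklen)
        count++;

    /* linelen >= count * toklen, so this subtraction cannot underflow. */
    result = malloc(linelen + count * vallen - count * toklen + 1);
    if (result == NULL)
        return NULL;

    /* Second pass: copy the text between matches, substituting each match. */
    out = result;
    start = line;
    while ((p = strstr(start, token)) != NULL)
    {
        memcpy(out, start, (size_t) (p - start));
        out += p - start;
        memcpy(out, value, vallen);
        out += vallen;
        start = p + toklen;
    }
    strcpy(out, start);         /* remaining tail, including the NUL */
    return result;
}
```

As a usage example, applying it to the illustrative line "insert ( 1 template1 ENCODING LC_COLLATE LC_CTYPE )" with ENCODING → 6, then LC_COLLATE → C, then LC_CTYPE → C yields "insert ( 1 template1 6 C C )". Under the proposal above, only substitutions of this runtime-dependent kind would remain; the compile-time constant ones would move into genbki.pl.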