On Thu, Aug 10, 2023 at 7:07 PM Masahiko Sawada <sawada.m...@gmail.com> wrote: > > On Thu, Aug 10, 2023 at 12:52 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > > > Are you suggesting doing this before we start the old cluster or after > > we stop the old cluster? I was thinking about the pros and cons of > > doing this check when the server is 'on' (along with other upgrade > > checks something like the patch is doing now) versus when the server > > is 'off'. I think the advantage of doing it when the server is 'off' > > (after check_and_dump_old_cluster()) is that it will be ensured that > > there is no extra WAL that could be generated during the upgrade and > > has not been verified against confirmed_flush_lsn location. But OTOH, > > to retrieve slot information when the server is 'off', we need a > > separate utility or probably a functionality for the same in > > pg_upgrade and also some WAL reading stuff which sounds to me like a > > larger change that may not be warranted here. I think anyway the extra > > WAL (if any got generated during the upgrade) won't be required after > > the upgrade so not convinced to make such a check while the server is > > 'off'. Are there reasons which make it better to do this while the old > > cluster is 'off'? > > What I imagined is that we do this check before > check_and_dump_old_cluster() while the server is 'off'. Reading the > slot state file would be simple and I guess we would not need a tool > or cli program for that. We need to expose RepliactionSlotOnDisk, > though. >
Won't that require a lot of version-specific checks as across versions the file format could be different? For the case of the control file, we use version-specific pg_controldata (for the old cluster, the corresponding version's pg_controldata) utility to read the old version control file. I thought we need something similar here if we want to do what you are suggesting. > > After reading the control file and the slots' state files we > check if slot's confirmed_flush_lsn matches the latest checkpoint LSN > in the control file (BTW maybe we can get slot name and plugin name > here instead of using pg_dump?). But isn't the advantage of doing via pg_dump (in binary_mode) that we allow some outside core in-place upgrade tool to also use it if required? If we don't think that would be required then we can probably use the info we retrieve it in pg_upgrade. > > the first commit. Or another idea would be to allow users to mark > replication slots "upgradable" so that pg_upgrade skips the > confirmed_flush_lsn check. > I guess for that we need to ask users to ensure that confirm_flush_lsn is up-to-date and then provide some slot-level API to mark the slots with the required status. If so, that sounds a bit complicated for users. -- With Regards, Amit Kapila.