On Thu, Sep 7, 2023 at 3:34 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Thu, Sep 7, 2023 at 12:55 PM Michael Paquier <mich...@paquier.xyz> wrote: > > > > (Just dumping what I have in mind while reading the thread.) > > > > On Sat, Sep 02, 2023 at 10:08:51AM +0530, Amit Kapila wrote: > > > During pg_upgrade, we start the server for the old cluster which can > > > allow the checkpointer to remove the WAL files. It has been noticed > > > that we do generate certain types of WAL records (e.g > > > XLOG_RUNNING_XACTS, XLOG_CHECKPOINT_ONLINE, and XLOG_FPI_FOR_HINT) > > > even during pg_upgrade for old cluster, so additional WAL records > > > could let checkpointer decide that certain WAL segments can be removed > > > (e.g. say wal size crosses max_slot_wal_keep_size_mb) and invalidate > > > the slots. Currently, I can't see any problem with this but for future > > > work where we want to migrate logical slots during an upgrade[1], we > > > need to decide what to do for such cases. The initial idea we had was > > > that if the old cluster has some invalid slots, we won't allow an > > > upgrade unless the user removes such slots or uses some option like > > > --exclude-slots. It is quite possible that slots got invalidated > > > during pg_upgrade due to no user activity. Now, even though the > > > possibility of the same is less I think it is worth considering what > > > should be the behavior. > > > > > > The other possibilities apart from not allowing an upgrade in such a > > > case could be (a) Before starting the old cluster, we fetch the slots > > > directly from the disk using some tool like [2] and make the decisions > > > based on that state; (b) During the upgrade, we don't allow WAL to be > > > removed if it can invalidate slots; (c) Copy/Migrate the invalid slots > > > as well but for that, we need to expose an API to invalidate the > > > slots; (d) somehow distinguish the slots that are invalidated during > > > an upgrade and then simply copy such slots because anyway we ensure > > > that all the WAL required by slot is sent before shutdown. > > > > Checking for any invalid slots at the beginning of the upgrade and > > complain sounds like a good thing to do, because these are not > > expected to begin with, no? That's similar to a pre-check requirement > > that the slots should have fed the subscribers with all the data > > available up to the shutdown checkpoint when the publisher was stopped > > before running pg_upgrade. So (a) is a good idea to prevent an > > upgrade based on a state we don't expect from the start, as something > > in check.c, I assume. > > > > On top of that, (b) sounds like a good idea to me anyway to be more > > defensive. But couldn't we do that just like we do for autovacuum and > > force the GUCs that could remove the slot's WAL to not do their work > > instead? > > > > I think if we just make max_slot_wal_keep_size to -1 that should be > sufficient to not let any slots get invalidated during upgrade. Do you > have anything else in mind?
This seems like a good solution to the problem. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com