At Mon, 30 Oct 2023 14:55:01 +0530, Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote in > > I get it. I agree to go with just the assert because the GUC > > check_hook kinda tightens the screws against setting > > max_slot_wal_keep_size to a value other than -1 during the binary > > upgrade,
Thanks for being on the same page. > A few comments on your inhibit_m_s_w_k_s_during_upgrade_2.txt: > > 1. > + * All WAL files on the publisher node must be retained during an upgrade to > + * maintain connections from downstreams. While pg_upgrade explicitly sets > > It's not just the publisher, anyone using logical slots. Also, no > downstream please. If you want, you can repurpose the comment that's > added by 29d0a77f. > > /* > * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the > * checkpointer process. If WALs required by logical replication slots > * are removed, the slots are unusable. This setting prevents the > * invalidation of slots during the upgrade. We set this option when > * cluster is PG17 or later because logical replication slots can only be > * migrated since then. Besides, max_slot_wal_keep_size is added in PG13. > */ It is helpful. Thanks! > 2. > At present, only logical slots really require > + * this. > > Can we remove the above comment as the code with SlotIsLogical(s) > explains it all? max_slot_wal_keep_size affects both logical and physical slots. Therefore, if we are interested only one of the two types of slots, it's important to clarify the rationale. Regardless of any potential exntension to physical slots, I believe it's essential to clarify the rationale. I couldn't determine from that extensive thread whether there's a possible extension to physical slots. Could you inform me if such an extension can happen and, if not, provide the reason? > 3. > + GUC_check_errdetail("\"max_slot_wal_keep_size\" must be set > to -1 during the upgrade."); > + return false; > > How about we be explicit like the following which helps users reason > about this restriction instead of them looking at the comments/docs? > > GUC_check_errcode(ERRCODE_INVALID_PARAMETER_VALUE); > GUC_check_errmsg(""\"max_slot_wal_keep_size\" must be set > to -1 when in binary upgrade mode"); > GUC_check_errdetail("A value of -1 prevents the removal of > WAL required for logical slots upgrade."); > return false; I don't quite see the reason to provide such a detailed explanation just for this error. Additionally, since this check is performed regardless of the presence or absense of logical slots, I think the errdetail message might potentially confuse those whosee it. Adding "binary" looks fine as is and done in the attached. > 4. I think a test case to hit the error in the check hook in > 003_logical_slots.pl will help a lot here - not only covers the code > but also helps demonstrate how one can reach the error. Yeah, of course. I was planning to add tests once the direction of the discussion became clear. I will add them in the next version. > 5. I think the check_hook is better defined in xlog.c the place where > it's actually being declared and in action. IMV, there's no reason for > it to be in slot.c as it doesn't deal with any slot related > variables/structs. This also avoids an unnecessary "utils/guc_hooks.h" > inclusion in slot.c. > +bool > +check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source) > +{ Sounds reasonable. Moved. I simply moved it to xlog.c, but the function comment was thoroughly written only for this moved function, making it somewhat stand out.. > 5. A possible problem with this check_hook approach is that it doesn't > let anyone setting max_slot_wal_keep_size to a value other than -1 > during pg_ugprade even if someone doesn't have logical slots or > doesn't want to upgrade logical slots in which case the WAL file > growth during pg_upgrade may be huge (transiently) unless the > pg_resetwal step of pg_upgrade removes it at the end. While I doubt anyone wishes to set the variable to a specific value during upgrade, think there are individuals who might be reluctant to edit the config file due to unclear reasons. While we could consider an alternative - checking for logical slots during binary upgrade- it's debatable if the effort is justified. (I haven't verified its feasibility, however.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index b541be8eec..d83b55da68 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2063,6 +2063,27 @@ check_wal_segment_size(int *newval, void **extra, GucSource source) return true; } +/* + * GUC check_hook for max_slot_wal_keep_size + * + * If WALs required by logical replication slots are removed, the slots are + * unusable. While pg_upgrade sets this variable to -1 via the command line to + * attempt to prevent such removal during binary upgrade, there are ways for + * users to override it. For the sake of completing the objective, ensure that + * this variable remains unchanged. See InvalidatePossiblyObsoleteSlot() and + * start_postmaster() in pg_upgrade for more details. + */ +bool +check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source) +{ + if (IsBinaryUpgrade && *newval != -1) + { + GUC_check_errdetail("\"max_slot_wal_keep_size\" must be set to -1 during binary upgrade mode."); + return false; + } + return true; +} + /* * At a checkpoint, how many WAL segments to recycle as preallocated future * XLOG segments? Returns the highest segment that should be preallocated. diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 99823df3c7..b161a0603a 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1424,18 +1424,12 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause, SpinLockRelease(&s->mutex); /* - * The logical replication slots shouldn't be invalidated as - * max_slot_wal_keep_size GUC is set to -1 during the upgrade. - * - * The following is just a sanity check. + * check_max_slot_wal_keep_size() ensures max_slot_wal_keep_size is set + * to -1, so, slot invalidation for logical slots shouldn't happen + * during an upgrade. At present, only logical slots really require + * this. */ - if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade) - { - ereport(ERROR, - errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("replication slots must not be invalidated during the upgrade"), - errhint("\"max_slot_wal_keep_size\" must be set to -1 during the upgrade")); - } + Assert (!*invalidated || !SlotIsLogical(s) || !IsBinaryUpgrade); if (active_pid != 0) { diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 7605eff9b9..818ffdb2ae 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -2845,7 +2845,7 @@ struct config_int ConfigureNamesInt[] = }, &max_slot_wal_keep_size_mb, -1, -1, MAX_KILOBYTES, - NULL, NULL, NULL + check_max_slot_wal_keep_size, NULL, NULL }, { diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h index 2a191830a8..3d74483f44 100644 --- a/src/include/utils/guc_hooks.h +++ b/src/include/utils/guc_hooks.h @@ -84,6 +84,8 @@ extern bool check_maintenance_io_concurrency(int *newval, void **extra, extern void assign_maintenance_io_concurrency(int newval, void *extra); extern bool check_max_connections(int *newval, void **extra, GucSource source); extern bool check_max_wal_senders(int *newval, void **extra, GucSource source); +extern bool check_max_slot_wal_keep_size(int *newval, void **extra, + GucSource source); extern void assign_max_wal_size(int newval, void *extra); extern bool check_max_worker_processes(int *newval, void **extra, GucSource source);