On Fri, 14 Apr 2023 at 16:00, Hayato Kuroda (Fujitsu)
<kuroda.hay...@fujitsu.com> wrote:
>
> Dear Julien,
>
> > Sorry for the delay, I didn't have time to come back to it until this
> > afternoon.
>
> No issues, everyone is busy :-).
>
> > I don't think that your analysis is correct.  Slots are guaranteed to be
> > stopped after all the normal backends have been stopped, exactly to avoid
> > such extraneous records.
> >
> > What is happening here is that the slot's confirmed_flush_lsn is properly
> > updated in memory and ends up being the same as the current LSN before the
> > shutdown.  But as it's a logical slot and those records aren't decoded, the
> > slot isn't marked as dirty and therefore isn't saved to disk.  You don't
> > see that behavior when doing a manual checkpoint before (per your script
> > comment), as in that case the checkpoint also tries to save the slot to
> > disk but then finds a slot that was marked as dirty and therefore saves it.
> >
> > In your script's scenario, when you restart the server the previous slot
> > data is restored and the confirmed_flush_lsn goes backward, which explains
> > those extraneous records.
>
> So you are saying that the key point is that records which are not sent to
> the subscriber do not mark the slot as dirty, and hence the updated
> confirmed_flush is not written into the slot file. Is that right?
> LogicalConfirmReceivedLocation() is called by the walsender when the process
> gets a reply from the worker process, so your analysis seems correct.
>
> > It's probably totally harmless to throw away that value for now (and
> > probably also doesn't lead to a crazy amount of work after restart, I
> > really don't know much about the logical slot code), but it clearly
> > becomes problematic with your use case.  One easy way to fix this is to
> > teach the checkpoint code to force saving the logical slots to disk even
> > if they're not marked as dirty during a shutdown checkpoint, as done in
> > the attached v1 patch (renamed as .txt to not interfere with the cfbot).
> > With this patch applied I reliably only see a final shutdown checkpoint
> > record with your scenario.
> >
> > Now such a change will make shutdown a bit more expensive when using
> > logical replication, even if in 99% of cases you will not need to save the
> > confirmed_flush_lsn value, so I don't know if that's acceptable or not.
>
> In any case these records must be advanced. IIUC, such records are currently
> read after a restart but ignored, and this patch just skips them. I have not
> measured it, but this may not be additional overhead so much as a trade-off.
>
> I have not come up with another solution so far, so I have included your
> patch. Please see 0002.
>
> Additionally, I added a checking function in 0003.
> According to pg_resetwal and other code, the length of a CHECKPOINT_SHUTDOWN
> record seems to be (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort +
> sizeof(CheckPoint)). Therefore, the function ensures that the difference
> between the current insert position and confirmed_flush_lsn is less than the
> above plus a page header.
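For reference, here is a rough sketch of the shutdown-checkpoint change
described in the quoted text above (writing out logical slots even when they
are not marked dirty). It is only an illustration based on
CheckPointReplicationSlots() in src/backend/replication/slot.c, not the actual
v1/0002 patch; the is_shutdown flag passed down from the checkpointer is an
assumption:

/*
 * Illustrative sketch only.  The is_shutdown argument is assumed to be
 * supplied by the checkpointer when it runs the shutdown checkpoint.
 */
void
CheckPointReplicationSlots(bool is_shutdown)
{
	int			i;

	elog(DEBUG1, "performing replication slot checkpoint");

	/* Prevent slots from being created/dropped while we iterate. */
	LWLockAcquire(ReplicationSlotAllocationLock, LW_SHARED);

	for (i = 0; i < max_replication_slots; i++)
	{
		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
		char		path[MAXPGPATH];

		if (!s->in_use)
			continue;

		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));

		/*
		 * confirmed_flush updates alone do not dirty a logical slot, so the
		 * latest value can be lost across a restart.  On the shutdown
		 * checkpoint, force the slot to disk so the on-disk copy matches
		 * what was confirmed in memory.
		 */
		if (is_shutdown && SlotIsLogical(s))
		{
			SpinLockAcquire(&s->mutex);
			s->just_dirtied = true;
			s->dirty = true;
			SpinLockRelease(&s->mutex);
		}

		/* save the slot to disk; locking is handled in SaveSlotToPath() */
		SaveSlotToPath(s, path, LOG);
	}
	LWLockRelease(ReplicationSlotAllocationLock);
}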
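And an illustration of the kind of distance check described for 0003. The
function name is hypothetical and the actual patch may perform this check
elsewhere (e.g. driven from pg_upgrade); the constants are the ones named in
the quoted text (xlogrecord.h / xlog_internal.h):

/*
 * Illustration only: after a clean shutdown, the only WAL allowed past the
 * slot's confirmed_flush should be the shutdown checkpoint record itself,
 * possibly preceded by a page header.
 */
#define SHUTDOWN_CHKPT_RECORD_LEN \
	(SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint))

static void
ensure_slot_caught_up(XLogRecPtr confirmed_flush)
{
	XLogRecPtr	insert_pos = GetXLogInsertRecPtr();

	if (insert_pos - confirmed_flush >
		SHUTDOWN_CHKPT_RECORD_LEN + SizeOfXLogLongPHD)
		ereport(ERROR,
				(errmsg("logical replication slot has not consumed all WAL")));
}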
Logical replication slots can be created only if wal_level >= logical, but
currently there is no check that wal_level >= logical when the
"--include-logical-replication-slots" option is specified. If
--include-logical-replication-slots is specified, pg_upgrade creates the
replication slots only after a long series of steps (performing prechecks,
analyzing, freezing, deleting, restoring, copying, and setting up related
objects), so we end up erroring out only after a lot of time has already been
spent (in many cases pg_upgrade takes hours to perform these operations). I
feel it would be better to add a check at the beginning, somewhere in
check_new_cluster(), to verify that wal_level is set appropriately when
--include-logical-replication-slots is given, so the error is detected and
thrown early. A rough sketch of such a check follows below.

Regards,
Vignesh
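For illustration only (not an actual patch): the function name and the
user_opts field are assumptions, while connectToServer(), executeQueryOrDie()
and pg_fatal() are the existing pg_upgrade helpers.

/* Sketch: early verification, called from check_new_cluster(). */
static void
check_new_cluster_logical_replication_slots(void)
{
	PGconn	   *conn;
	PGresult   *res;

	/* user_opts.include_logical_slots is an assumed option field */
	if (!user_opts.include_logical_slots)
		return;

	conn = connectToServer(&new_cluster, "template1");
	res = executeQueryOrDie(conn, "SHOW wal_level");

	if (strcmp(PQgetvalue(res, 0, 0), "logical") != 0)
		pg_fatal("wal_level must be \"logical\" on the new cluster when "
				 "--include-logical-replication-slots is specified");

	PQclear(res);
	PQfinish(conn);
}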