I am unfortunately not really familiar with Google Cloud SQL internals as well. But we have seen this happen on Amazon RDS as well.
Could it be possible that we are requesting a future WAL segment, say WAL upto X is written and we are asking for X + 1? It could be that the error message is misleading. I do not have the information from pg_replication_slots as I have terminated the test. I am fairly certain that I can reproduce this again. I will gather both the restart_lsn and contents of pg_wal for the failed segment. Is there any other information that would help debug this further? On Fri, Dec 29, 2023 at 4:16 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > On 12/27/23 16:31, Kaushik Iska wrote: > > Hi all, > > > > I'm including additional details, as I am able to reproduce this issue a > > little more reliably. > > > > Postgres Version: POSTGRES_14_9.R20230830.01_07 > > Vendor: Google Cloud SQL > > Logical Replication Protocol version 1 > > > > I don't know much about Google Cloud SQL internals. Is it relatively > close to Postgres (as e.g. RDS) or are the internals very different / > modified for cloud environments? > > > Here are the logs of attempt succeeding right after it fails: > > > > 2023-12-27 01:12:40.581 UTC [59790]: [6-1] db=postgres,user=postgres > > STATEMENT: START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL > > 6/5AE67D79 (proto_version '1', publication_names > > 'peerflow_pub_wal_testing_2') <- FAILS > > 2023-12-27 01:12:41.087 UTC [59790]: [7-1] db=postgres,user=postgres > > ERROR: requested WAL segment 000000010000000600000059 has already been > > removed > > 2023-12-27 01:12:44.581 UTC [59794]: [3-1] db=postgres,user=postgres > > STATEMENT: START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL > > 6/5AE67D79 (proto_version '1', publication_names > > 'peerflow_pub_wal_testing_2') <- SUCCEEDS > > 2023-12-27 01:12:44.582 UTC [59794]: [4-1] db=postgres,user=postgres > > LOG: logical decoding found consistent point at 6/5A31F050 > > > > Happy to include any additional details of my setup. > > > > I personally don't see how could this fail and then succeed, unless > Google does something smart with the WAL segments under the hood. Surely > we try to open the same WAL segment (given the LSN is the same), so how > could it not exist and then exist? > > As Ron already suggested, it might be useful to see information for the > replication slot peerflow_slot_wal_testing_2 (especially the restart_lsn > value). Also, maybe show the contents of pg_wal (especially for the > segment referenced in the error message). > > Can you reproduce this outside Google cloud environment? > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >