I'm not sure this fits -hackers.. At Tue, 2 Aug 2022 18:57:41 -0700, Dmitry Koterov <dmitry.kote...@gmail.com> wrote in > Hi. > > Suppose on master, I run a *multi-query* using PQexec and save the value > returned by pg_current_wal_insert_lsn: > > master_lsn = query(master, "INSERT INTO some VALUES (...); SELECT > pg_current_wal_insert_lsn()") > > Then I run a PQexec query on a replica and save the value returned by > pg_last_wal_replay_lsn: > > replica_lsn = query(replica, "SELECT pg_last_wal_replay_lsn()") > > The question to experts in PG internals: *is it guaranteed that, as long as > replica_lsn >= master_lsn (GREATER OR EQUAL, not just greater), then a > subsequent read from replica will always return me the inserted record* > (i.e. the replica is up to date), considering noone updates/deletes in that > table?
https://www.postgresql.org/docs/devel/libpq-exec.html > The command string can include multiple SQL commands (separated by > semicolons). Multiple queries sent in a single PQexec call are > processed in a single transaction, unless there are explicit > BEGIN/COMMIT commands included in the query string to divide it into > multiple transactions. If the query() runs PQexec() with the same string, the call to pg_current_wal_insert_lsn() is made before the insert is commited. That behavior can be emulated on psql. (The backslash before semicolon is crucial. It lets the connected queries be sent in a single PQexec()) =# select pg_current_wal_insert_lsn(); pg_current_wal_insert_lsn --------------------------- 0/68E5038 (1 row) =# insert into t values(0)\; select pg_current_wal_lsn(); INSERT 0 1 pg_current_wal_lsn -------------------- 0/68E5038 (1 row) =# select pg_current_wal_insert_lsn(); pg_current_wal_insert_lsn --------------------------- 0/68E50A0 (1 row) $ pg_waldump -s'0/68E5038' -e'0/68E50A0' $PGDATA/pg_wal/000000010000000000000006 rmgr: Heap len (rec/tot): 59/ 59, tx: 770, lsn: 0/068E5038, prev 0/068E5000, desc: INSERT off 15 flags 0x00, blkref #0: rel 1663/5/16424 blk 0 rmgr: Transaction len (rec/tot): 34/ 34, tx: 770, lsn: 0/068E5078, prev 0/068E5038, desc: COMMIT 2022-08-03 15:49:43.749158 JST So, the replica cannot show the inserted data at the LSN the function returned. If you explicitly ended transaction before pg_current_wal_insert_lsn() call, the expected LSN would be returned. =# select pg_current_wal_insert_lsn(); pg_current_wal_insert_lsn --------------------------- 0/68E75C8 (1 row) =# begin\;insert into t values(0)\;commit\; select pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/68E7958 $ pg_waldump -s'0/68E75C8' -e'0/68E7958' $PGDATA/pg_wal/000000010000000000000006 prev 0/068E7590, desc: INSERT off 22 flags 0x00, blkref #0: rel 1663/5/16424 blk 0 FPW rmgr: Transaction len (rec/tot): 34/ 34, tx: 777, lsn: 0/068E7930, prev 0/068E75C8, desc: COMMIT 2022-08-03 16:09:13.516498 JST > I'm asking, because according to some hints in the docs, this should be > true. But for some reason, we have to use "greater" (not "greater or > equals") condition in the real code, since with just ">=" the replica > doesn't sometimes read the written data. Thus the wrong LSN appears to have caused the behavior. regards. -- Kyotaro Horiguchi NTT Open Source Software Center