Hello,

Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> writes:
> restart_lsn stays at the beginning of a transaction until the
> transaction ends so just using restart_lsn allows repeated
> decoding of a transaction, in short, rewinding occurs. The
> function works only for inactive slot so the current code works
> fine on this point.

Sorry, I do not follow. restart_lsn is advanced whenever there is a
consistent snapshot dumped (in xl_running_xacts) which is old enough to
wholly decode all xacts not yet confirmed by the client. Could you
please elaborate on what's wrong with that?

> Addition to that restart_lsn also can be on a
> page boundary.

Do you have an example of that? restart_lsn is set initially to the WAL
insert position at ReplicationSlotReserveWal, and later it always points
to an xl_running_xacts record with a consistent snapshot dumped.

> So directly set ctx->reader->EndRecPtr by startlsn fixes the
> problem, but I found another problem here.

There is a minor issue with the patch. With it, slot advancement hangs
polling for new WAL on my example from [1]; most probably because we
must exit the loop when ctx->reader->EndRecPtr == moveto.

> The function accepts any LSN even if it is not at the beginning
> of a record. We will see errors or crashes or infinite waiting or
> maybe any kind of trouble by such values. The moved LSN must
> always be at the "end of a record" (that is, at the start of the
> next record). The attached patch also fixes this.

Indeed, but we have these problems only if we are trying to read WAL
starting from confirmed_flush.

[1] https://www.postgresql.org/message-id/873720e4hf.fsf%40ars-thinkpad

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
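
For illustration only, the loop-exit guess above can be modelled with a
toy C function. This is not PostgreSQL code: ToyLsn, ToyResult and
toy_advance are hypothetical names, and the two exit tests are only an
assumption about how a loop that stops (or fails to stop) at
EndRecPtr == moveto might behave. The "would block" result stands in for
polling for WAL that has not been written yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names exist in PostgreSQL. */
typedef uint64_t ToyLsn;

typedef enum { TOY_DONE, TOY_WOULD_BLOCK } ToyResult;

/*
 * rec_ends[i] is the end position of record i (equal to the start of
 * record i + 1), in strictly increasing order.  Reading begins at
 * `start`, which is assumed to be a record boundary.  On return,
 * *end_rec_ptr holds the end of the last record read, or `start` if no
 * record was read.
 */
ToyResult
toy_advance(const ToyLsn *rec_ends, size_t nrecs, ToyLsn start,
            ToyLsn moveto, bool stop_when_equal, ToyLsn *end_rec_ptr)
{
    size_t next = 0;

    assert(rec_ends != NULL && end_rec_ptr != NULL);

    /* Skip records that end at or before the starting position. */
    while (next < nrecs && rec_ends[next] <= start)
        next++;

    *end_rec_ptr = start;
    for (;;)
    {
        /* "<" stops at equality; "<=" asks for one more record. */
        bool keep_going = stop_when_equal ? (*end_rec_ptr < moveto)
                                          : (*end_rec_ptr <= moveto);

        if (!keep_going)
            return TOY_DONE;
        if (next == nrecs)
            return TOY_WOULD_BLOCK; /* real code would poll for new WAL */
        *end_rec_ptr = rec_ends[next++];
    }
}
```

With records ending at 100, 200 and 300, start = 100 and moveto = 300,
the strict test returns TOY_DONE with the position at 300, while the
non-strict test reads the same records and then reports TOY_WOULD_BLOCK.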