On Fri, 2021-06-11 at 10:13 +0530, Amit Kapila wrote:
> Because sometimes clients don't have to do anything for xlog records.
> One example is WAL for DDL where logical decoding didn't produce
> anything for the client but later with keepalive we send the LSN of
> WAL where DDL has finished and the client just responds with the
> position sent by the server as it doesn't have any other pending
> transactions.

If I understand correctly, in this situation it avoids the cost of a
write on the client just to update its stored LSN progress value when
there's no real data to be written. In that case the client would need
to rely on the server's confirmed_flush_lsn instead of its own stored
LSN progress value.

That's a reasonable thing for the *client* to do explicitly, e.g. by
just reading the slot's confirmed_flush_lsn and comparing it to its own
stored LSN. But I don't think it's reasonable for the server to just
skip over data requested by the client because it thinks it knows best.

> I think because there is no need to process the WAL that has been
> confirmed by the client. Do you see any problems with this scheme?

Several:

* Replication setups are complex, and it can be easy to misconfigure
  something or have a bug in some control code. An error is valuable
  for detecting the problem closer to the source.

* There are plausible configurations where things could go badly
  wrong. For instance, suppose you are storing the decoded data in
  another postgres server with synchronous_commit=off, and
  acknowledging LSNs before they are durable. After a crash, the
  destination system would be consistent, but it would be missing some
  data earlier than the confirmed_flush_lsn. The client would then
  request the data starting at its stored LSN progress value, but the
  server would skip ahead to the confirmed_flush_lsn, silently
  dropping the data in between.

* It's contradicted by the docs: "Instructs server to start streaming
  WAL for logical replication, starting at WAL location XXX/XXX."

* The comment acknowledges that a user might expect an error in that
  case, but doesn't really address why the user would expect an error,
  and why it's OK to violate that expectation.

Regards,
	Jeff Davis
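
P.S. To make the explicit client-side check mentioned above concrete,
here is a minimal sketch in Python. It is only an illustration under
stated assumptions: the names parse_lsn, choose_start_lsn, LsnGapError
and the allow_skip flag are hypothetical and not part of PostgreSQL or
any client library. It assumes the client has already read the slot's
confirmed_flush_lsn (for example from the pg_replication_slots view)
and has its own durably stored LSN, both in the usual XXX/XXX text
form.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in 'XXX/XXX' hex text form to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


class LsnGapError(Exception):
    """The client's stored position is behind the slot's confirmed position."""


def choose_start_lsn(stored_lsn: str, confirmed_flush_lsn: str,
                     allow_skip: bool = False) -> str:
    """Hypothetical client-side policy for picking the start position.

    If the client's stored LSN is at or past confirmed_flush_lsn, start
    from the stored LSN. If it is behind, either skip ahead because the
    caller explicitly opted in (e.g. it knows only keepalives were
    acknowledged), or raise so that a possible data gap is noticed.
    """
    stored = parse_lsn(stored_lsn)
    confirmed = parse_lsn(confirmed_flush_lsn)
    if stored >= confirmed:
        return stored_lsn
    if allow_skip:
        return confirmed_flush_lsn
    raise LsnGapError(
        f"stored LSN {stored_lsn} is behind confirmed_flush_lsn "
        f"{confirmed_flush_lsn}; data in between may be missing"
    )
```

The comparison is done on the parsed integers rather than on the text,
since comparing the strings directly would order "A/0" after "10/0".
The point of the allow_skip flag is that skipping ahead becomes a
decision the client makes knowingly, rather than something that
happens silently.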