On 11/11/24 23:22, Amit Kapila wrote:
On Mon, Nov 11, 2024 at 9:05 PM Tomas Vondra <to...@vondra.me> wrote:

Alternatively, we could simply stop relying on the timestamps recorded
in the commit, and instead derive "monotonic" commit timestamps after
the fact. For example, we could track timestamps for some subset of
commits, and then approximate the rest using LSN.

AFAIK the inversion can happen only for concurrent commits, and there's
can't be that many of those. So if we record the (LSN, timestamp) for
every 1MB of WAL, we approximate timestamps for commits in that 1MB by
linear approximation.

Of course, there's a lot of questions and details to solve - e.g. how
often would it need to happen, when exactly would it happen, etc. And
also how would that integrate with the logical decoding - it's easy to
just get the timestamp from the WAL record, this would require more work
to actually calculate it. It's only a very rough idea.


I think for logical decoding it would be probably easy because it
reads all the WAL. So, it can remember the commit time of the previous
commit, and if any future commit has a commit timestamp lower than
that it can fix it by incrementing it. But outside logical decoding,
it would be tricky because we may need a separate process to fix up
commit timestamps by using linear approximation. IIUC, the bigger
challenge is that such a solution would require us to read the WAL on
a continuous basis and keep fixing commit timestamps or we need to
read the extra WAL before using or relying on commit timestamp. This
sounds to be a somewhat complex and costlier solution though the cost
is outside the hot-code path but still, it matters as it leads to
extra read I/O.


I had originally experimented with adjusting the timestamps sent to the subscriber in the output plugin. This works for the most part but has two major flaws:

1) In case the system is restarted in a situation where it has some forward clock skew and the restart LSN is right in the middle of it, the output plugin has no knowledge of that and again emits backwards running timestamps.

2) The origin and the subscriber now have a different timestamp associated with the same transaction. That can lead to different behavior in conflict resolution when a third node gets involved.


Best Regards, Jan


Reply via email to