On Tue, Nov 12, 2024 at 8:35 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Mon, Nov 11, 2024 at 9:05 PM Tomas Vondra <to...@vondra.me> wrote: > > > > On 11/11/24 09:19, Amit Kapila wrote: > > > > > > I can't think of a solution other than the current proposal where we > > > do both the operations (reserve WAL space for commit and adjust > > > commit_timestamp, if required) together to resolve LSN<->Timestamp > > > invertion issue. Please let us know if you have any other ideas for > > > solving this. Even, if we want to change ReserveXLogInsertLocation to > > > make the locked portion an atomic operation, we can still do it for > > > records other than commit. Now, if we don't have any other solution > > > for LSN<->Timestamp inversion issue, changing the entire locked > > > portion to atomic will close the door to solving this problem forever > > > unless we have some other way to solve it which can make it difficult > > > to rely on commit_timestamps for certain scenarios. > > > > > > > I don't know what the solution is, isn't the problem that > > > > (a) we record both values (LSN and timestamp) during commit > > > > (b) reading timestamp from system clock can be quite expensive > > > > It seems to me that if either of those two issues disappeared, we > > wouldn't have such an issue. > > > > For example, imagine getting a timestamp is super cheap - just reading > > and updating a simple counter from shmem, just like we do for the LSN. > > Wouldn't that solve the problem? > > > > Yeah, this is exactly what I thought. > > > For example, let's say XLogCtlInsert has two fields > > > > int64 CommitTimestamp; > > > > and that ReserveXLogInsertLocation() also does this for each commit: > > > > commit_timestamp = Insert->commit_timestamp++; > > > > while holding the insertpos_lock. Now we have the LSN and timestamp > > perfectly correlated. > > > > Right, and the patch sent by Hou-San [1] (based on the original patch > by Jan) is somewhat on these lines. The idea you have shared or > implemented by the patch is a logical clock stored in shared memory. > So, what the patch does is that if the time recorded by the current > commit record is lesser than or equal to the logical clock (which > means after we record time in the commit code path and before we > reserve the LSN, there happens a concurrent transaction), we shall > increment the value of logical clock by one and use that as commit > time. > > So, in most cases, we need to perform one additional "if check" and > "an assignment" under spinlock, and that too only for the commit > record. In some cases, when inversion happens, there will be "one > increment" and "two assignments." >
As the inversion issue can mainly hamper logical replication-based solutions we can do any of this additional work under spinlock only when the current record is a commit record (which the currently proposed patch is already doing) and "wal_level = logical" and also can have another option at the subscription level to enable this new code path. I am not sure what is best but just sharing the ideas here. -- With Regards, Amit Kapila.