On Thu, May 27, 2021 at 7:04 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Thu, May 27, 2021 at 1:46 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > On Thu, May 27, 2021 at 2:48 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > >
> > > Okay, that makes sense but still not sure how will you identify if we
> > > need to reset XID in case of failure doing that in the previous
> > > attempt.
> >
> > It's a just idea but we can record the failed transaction with XID as
> > well as its commit LSN passed? The sequence I'm thinking is,
> >
> > 1. the worker records the XID and commit LSN of the failed transaction
> > to a catalog.
> >
>
> When will you record this info? I am not sure if we can try to update
> this when an error has occurred. We can think of using try..catch in
> apply worker and then record it in catch on error but would that be
> advisable? One random thought that occurred to me is to that apply
> worker notifies such information to the launcher (or maybe another
> process) which will log this information.

Yeah, I was concerned about that too and had the same idea. The
information still could not be written if the server crashes before the
launcher writes it. But I think that's acceptable.

> >
> > 2. the user specifies how to resolve that conflict transaction
> > (currently only 'skip' is supported) and writes to the catalog.
> > 3. the worker does the resolution method according to the catalog. If
> > the worker didn't start to apply those changes, it can skip the entire
> > transaction. If did, it rollbacks the transaction and ignores the
> > remaining.
> >
> > The worker needs neither to reset information of the last failed
> > transaction nor to mark the conflicted transaction as resolved. The
> > worker will ignore that information when checking the catalog if the
> > commit LSN is passed.
> >
>
> So won't this require us to check the required info in the catalog
> before applying each transaction? If so, that might be overhead, maybe
> we can build some cache of the highest commitLSN that can be consulted
> rather than the catalog table.

I think workers can cache that information when they start, and
invalidate and reload the cache when the catalog gets updated.
Specifying an XID to skip will update the catalog, invalidating the
cache.

> I think we need to think about when to
> remove rows for which conflict has been resolved as we can't let that
> information grow infinitely.

I guess we can update the catalog tuple in place the next time another
conflict happens. The catalog tuple should be of fixed size. An
already-resolved conflict will have a commit LSN older than its
replication origin's LSN.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
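
For illustration only, the check the worker would do against its cached
copy of the catalog entry could look roughly like the sketch below. This
is not from any patch. The struct, the enum and the function name are
hypothetical, and the two typedefs are local stand-ins so that the
snippet compiles on its own. It assumes one fixed-size entry per
subscription and treats an entry as stale once its commit LSN is older
than the replication origin's LSN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins so the sketch is self-contained. */
typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

/* Hypothetical resolution methods; only 'skip' is discussed so far. */
typedef enum
{
    RESOLVE_NONE,
    RESOLVE_SKIP
} ConflictResolution;

/* Hypothetical cached copy of the fixed-size catalog entry. */
typedef struct FailedXactEntry
{
    bool               valid;       /* false until loaded from the catalog */
    TransactionId      xid;         /* XID of the failed remote transaction */
    XLogRecPtr         commit_lsn;  /* its commit LSN */
    ConflictResolution resolution;  /* what the user asked for */
} FailedXactEntry;

/*
 * Decide whether the transaction about to be applied should be skipped.
 * No reset or "resolved" flag is needed: an entry whose commit LSN is
 * older than the origin's LSN is simply ignored.
 */
bool
should_skip_transaction(const FailedXactEntry *entry,
                        TransactionId remote_xid,
                        XLogRecPtr origin_lsn)
{
    if (!entry->valid)
        return false;

    /* Origin has already passed this commit; the entry is stale. */
    if (entry->commit_lsn < origin_lsn)
        return false;

    return entry->xid == remote_xid &&
           entry->resolution == RESOLVE_SKIP;
}
```

The worker would call this once per incoming transaction using the
cached entry, and reload the entry only when the catalog change
invalidates the cache, so the catalog itself is not read on every
transaction.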