On Sat, Feb 19, 2022 at 1:17 AM David G. Johnston <david.g.johns...@gmail.com> wrote: > > On Fri, Feb 18, 2022 at 1:26 AM Masahiko Sawada <sawada.m...@gmail.com> wrote: >> >> >> Here is the summary of the discussion, changes, and plan. >> >> 1. Move some error information such as the error message to a new >> system catalog, pg_subscription_error. The pg_subscription_error table >> would have the following columns: >> >> * sesubid : subscription Oid. >> * serelid : relation Oid (NULL for apply worker). >> * seerrlsn : commit-LSN or the error transaction. >> * seerrcmd : command (INSERT, UPDATE, etc.) of the error transaction. >> * seerrmsg : error message > > > Not a fan of the "se" prefix but overall yes. We should include a timestamp. >
How about naming it pg_subscription_worker_error as Peter E. has suggested in one of his emails? I find pg_subscription_error slightly odd as one could imagine that even the errors related to subscription commands like Alter Subscription. >> >> The tuple is inserted or updated when an apply worker or a tablesync >> worker raises an error. If the same error occurs in a row, the update >> is skipped. > Are you going to query table to check if it is same error? > > I disagree with this - I would treat every new instance of an error to be > important and insert on conflict (sesubid, serelid) the new entry, updating > lsn/cmd/msg with the new values. > I don't think that will be a problem but what advantage are you envisioning with updating the same info except for timestamp? >> The tuple is removed in the following cases: >> >> * the subscription is dropped. >> * the table is dropped. >> >> * the table is removed from the subscription. >> * the worker successfully committed a non-empty transaction. > > > Correct. This handles the "end of sync worker" just fine since its final > action should be a successful commit of a non-empty transaction. >> I think for tablesync workers, we may need slightly different handling as there could probably be no transactions to apply apart from the initial copy. Now, I think for tablesync worker, we can't postpone it till after we update the rel state as SUBREL_STATE_SYNCDONE because if we do it after that and there is some error updating/deleting the tuple, the tablesync worker won't be launched again and that entry will remain in the system for a longer duration. -- With Regards, Amit Kapila.