On Mon, Jul 19, 2021 at 2:22 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Fri, Jul 16, 2021 at 8:33 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > On Wed, Jul 14, 2021 at 5:14 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > >
> > > Sounds good. I'll incorporate this in the next version patch that I'm
> > > planning to submit this week.
> >
> > Sorry, I could not make it this week. I'll submit them early next week.
> >
>
> No problem.
>
> > While updating the patch I thought we need to have more design
> > discussion on two points of clearing error details after the error is
> > resolved:
> >
> > 1. How to clear apply worker errors. IIUC we've discussed that once
> > the apply worker skipped the transaction we leave the error entry
> > itself but clear its fields except for some fields such as failure
> > counts. But given that the stats messages could be lost, how can we
> > ensure to clear those error details? For table sync workers' error, we
> > can have autovacuum workers periodically check entries of
> > pg_subscription_rel and clear the error entry if the table sync worker
> > completes table sync (e.g., checking if srsubstate = 'r'). But there
> > is no such information for the apply workers and subscriptions.
> >
>
> But won't the corresponding subscription (pg_subscription) have the
> XID as InvalidTransactionId once the xid is skipped or at least a
> different XID than we would have in pg_stat view? Can we use that to
> reset entry via vacuum?

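To make the states concrete, here is a minimal sketch of the check being
suggested. It is a Python model, not PostgreSQL code: Subscription,
ErrorEntry, skip_xid, failed_xid and can_clear_by_xid are hypothetical
names, the XID 740 is made up, and it assumes the apply worker resets the
XID to InvalidTransactionId after the skip.

```python
from dataclasses import dataclass

INVALID_XID = 0  # stand-in for InvalidTransactionId


@dataclass
class Subscription:
    # Hypothetical: the XID the user asked the apply worker to skip.
    skip_xid: int = INVALID_XID


@dataclass
class ErrorEntry:
    # Hypothetical: apply worker error details kept in the stats view.
    failed_xid: int
    failure_count: int = 1


def can_clear_by_xid(sub: Subscription, err: ErrorEntry) -> bool:
    """Suggested rule: clear the entry when the subscription's XID is
    invalid or differs from the XID recorded with the error."""
    return sub.skip_xid == INVALID_XID or sub.skip_xid != err.failed_xid


err = ErrorEntry(failed_xid=740)

# State A: the error was reported, the user has not set an XID yet.
before_user_sets = Subscription()

# State B: the user set 740, the worker skipped it and reset the XID
# (the reset is an assumption of this sketch).
after_skip = Subscription(skip_xid=740)
after_skip.skip_xid = INVALID_XID

# Both states look identical to a vacuum-time check.
same_state = before_user_sets == after_skip
verdicts = (can_clear_by_xid(before_user_sets, err),
            can_clear_by_xid(after_skip, err))
```

In State A the error is still unresolved, so clearing it would be wrong,
yet the rule sees exactly the same subscription row as in State B.
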
I think the XID is InvalidTransactionId until the user specifies it. So I
don't think we can tell whether we are before or after the skip from the
transaction ID alone. No?

> > In
> > addition to sending the message clearing the error details just after
> > skipping the transaction, I thought that we can have apply workers
> > periodically send the message clearing the error details but it seems
> > not good.
> >
>
> Yeah, such things should be a last resort.
>
> > 2. Do we really want to leave the table sync worker error even after
> > the error is resolved and the table sync completes? Unlike the apply
> > worker error, the number of table sync worker errors could be very
> > large, for example, if a subscriber subscribes to many tables. If we
> > leave those errors in the stats view, it uses more memory space and
> > could affect writing and reading stats file performance. If such left
> > table sync error entries are not helpful in practice I think we can
> > remove them rather than clear some fields. What do you think?
> >
>
> Sounds reasonable to me. One might think to update the subscription
> error count by including table_sync errors but not sure if that is
> helpful and even if that is helpful, we can extend it later.

Agreed.

Regards,

-- 
Masahiko Sawada
EDB: https://www.enterprisedb.com/