> On Jun 17, 2021, at 11:34 PM, Peter Smith <smithpb2...@gmail.com> wrote:
> 
> I tried your patch.

Thanks for the quick and thorough review!

> (2) New column "disabled_by_error".
> 
> I wondered if there was actually any need for this column. Isn't the
> same information conveyed by just having "subenabled" = false, at same
> time as as non-empty "suberrmsg"? This would remove any confusion for
> having 2 booleans which both indicate disabled.

Yeah, I wondered about that before posting v1.  I removed the disabled_by_error 
field for v2.

> (3) New columns "disabled_by_error", "disabled_on_error".
> 
> All other columns of the pg_subscription have a "sub" prefix.

I don't feel strongly about this.  How about "subdisableonerr"?  I used that in 
v2.

> I did not find any code using that newly added member "errhint".

Thanks for catching that.  I had tried to remove all references to "errhint" 
before posting v1.  The original idea was that both the message and hint of the 
error would be kept, but in testing I found the hint field was typically empty, 
so I removed it.  Sorry that I left one mention of it lying around.

> (5) dump.c

I didn't bother getting pg_dump working before posting v1, and I still have not 
done so, as I mainly want to solicit feedback on whether the basic direction I 
am going will work for the community.

> (6) Patch only handles errors only from the Apply worker.
> 
> Tablesync can give similar errors (e.g. PK violation during DATASYNC
> phase) which will trigger re-launch forever regardless of the setting
> of "disabled_on_error".
> (confirmed by observations below)

Yes, this is a good point, and also mentioned by Amit.  I have fixed it in v2 
and adjusted the regression test to trigger an automatic disabling for initial 
table sync as well as for change replication.

> 2021-06-18 15:12:45.905 AEST [25904] LOG:  edata is true for
> subscription 'tap_sub': message = "duplicate key value violates unique
> constraint "test_tab_pkey"", hint = "<NONE>"

You didn't call this out, but FYI, I don't intend to leave this particular log 
message in the patch.  It was for development only.  I have removed it for v2 
and have added a different log message much sooner after catching the error, to 
avoid squashing the error in case some other action fails.

The regression test shows this, if you open 
tmp_check/log/022_disable_on_error_subscriber.log:

2021-06-18 16:25:20.138 PDT [56926] LOG:  logical replication subscription "s1" 
will be disabled due to error: duplicate key value violates unique constraint 
"s1_tbl_unique"
2021-06-18 16:25:20.139 PDT [56926] ERROR:  duplicate key value violates unique 
constraint "s1_tbl_unique"
2021-06-18 16:25:20.139 PDT [56926] DETAIL:  Key (i)=(1) already exists.
2021-06-18 16:25:20.139 PDT [56926] CONTEXT:  COPY tbl, line 2

The first line logs the error prior to attempting to disable the subscription, 
and the next three lines are due to rethrowing the error after committing the 
successful disabling of the subscription.  If the attempt to disable the 
subscription itself throws, these additional three lines won't show up, but the 
first one should.  Amit mentioned this upthread.  Do you think this will be ok, 
or would you like to also have a suberrdetail field so that the detail doesn't 
get lost?  I haven't added such an extra field, and am inclined to think it 
would be excessive, but maybe others feel differently?


> ======
> 
> Test: Cause a PK violation in the Tablesync copy (DATASYNC) phase.
> (when disable_on_error = true)
> Observation: This patch changes nothing for this case. The Tablesyn
> re-launchs in a forever loop the same as current functionality.

In v2, tablesync copy errors should also be caught.  The test has been extended 
to cover this also.

Attachment: v2-0001-Optionally-disabling-subscriptions-on-error.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Reply via email to