Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

Andres Freund Thu, 01 Jun 2017 18:15:06 -0700

On 2017-06-02 10:05:21 +0900, Michael Paquier wrote:
> On Fri, Jun 2, 2017 at 9:29 AM, Andres Freund <[email protected]> wrote:
> > On 2017-06-02 08:38:51 +0900, Michael Paquier wrote:
> >> On Fri, Jun 2, 2017 at 7:05 AM, Andres Freund <[email protected]> wrote:
> >> > I'm a unhappy how this is reusing SIGINT for WalSndLastCycleHandler.
> >> > Normally INT is used cancel interrupts, and since walsender is now also
> >> > working as a normal backend, this overlap is bad.  Even for plain
> >> > walsender backend this seems bad, because now non-superusers replication
> >> > users will terminate replication connections when they do
> >> > pg_cancel_backend().  For replication=dbname users it's especially bad
> >> > because there can be completely legitimate uses of pg_cancel_backend().
> >>
> >> Signals for WAL senders are set in WalSndSignals() which uses SIG_IGN
> >> for SIGINT now in ~9.6, and StatementCancelHandler does not get set up
> >> for a non-am_walsender backend. Am I missing something?
> >
> > Yes, but nothing in those observeration actually addresses my point?
> 
> I am still confused by your previous email, which, at least it seems
> to me, implies that logical WAL senders have been working correctly
> with query cancellations. Now SIGINT is just ignored, which means that
> pg_cancel_backend() has never worked for WAL senders until now, and
> this behavior has not changed with 086221c. So there is no new
> breakage introduced by this commit. I get your point to reuse SIGINT
> for query cancellations though, but that's a new feature.


The issue is that the commit made a non-existant feature
(pg_cancel_backend() to walsenders) into a broken one (pg_cancel_backend
terminates walsenders).  Additionally v10 added something new
(walsenders executing SQL), and that will need at least some signal
handling fixes - hard to do if e.g. SIGINT is reused for something else.

> > a) upon shutdown checkpointer (so we can use procsignal), not
> >    postmaster, sends PROCSIG_WALSND_INIT_STOPPING to all walsenders (so
> >    we don't have to use up a normal signal handler)
> 
> You'll need a second one that wakes up the latch of the WAL senders to
> send more WAL records.

Don't think so, procsignal_sigusr1_handler serves quite well for that.
There's nearby discussion that we need to do so anyway, to fix recovery
conflict interrupts, parallelism interrupts and such.


> > b) Upon reception walsenders immediately exit if !replication_active,
> >    otherwise sets got_STOPPING
> 
> Okay, that's what happens now anyway, any new replication command
> received results in an error. I actually prefer the way of doing in
> HEAD, which at least reports an error.

Err, no. What happens right now is that plainly nothing is done if a
connection is idle or busy executing things.  Only if a new command is
sent we error out - that makes very little sense.


> > c) Logical decoding will finish if got_STOPPING is sent, *WITHOUT*, as
> >    currently done, moving to WALSNDSTATE_STOPPING.  Not yet quite sure
> >    how to best handle the WALSNDSTATE_STOPPING transition in WalSndLoop().
> 
> Wouldn't it make sense to have the logical receivers be able to
> receive WAL up to the end of checkpoint record?

Yea, that's what I'm doing.  For that we really only need to change the
check in WalSndWaitForWal() check of got_SIGINT to got_STOPPING, and add
a XLogSendLogical() check in the WalSndCaughtUp if() that sets
got_SIGUSR2 *without* setting WALSNDSTATE_STOPPING (otherwise we'd
possibly continue to trigger wal records until the send buffer is
emptied).

- Andres


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] logical replication and PANIC during shutdown checkpoint in publisher

Reply via email to