On 11 March 2016 at 20:15, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:

> Craig Ringer wrote:
> > Hi all
> >
> > I think I found a couple of logical decoding issues while writing tests for
> > failover slots.
> >
> > Despite the docs' claim that a logical slot will replay data "exactly
> > once", a slot's confirmed_lsn can go backwards and the SQL functions can
> > replay the same data more than once. We don't mark a slot as dirty if only
> > its confirmed_lsn is advanced, so it isn't flushed to disk. For failover
> > slots this means it also doesn't get replicated via WAL. After a master
> > crash, or for failover slots after a promote event, the confirmed_lsn will
> > go backwards. Users of the SQL interface must keep track of the safely
> > locally flushed slot position themselves and throw the repeated data away.
> > Unlike with the walsender protocol it has no way to ask the server to skip
> > that data.
> >
> > Worse, because we don't dirty the slot even a *clean shutdown* causes slot
> > confirmed_lsn to go backwards. That's a bug IMO. We should force a flush of
> > all slots at the shutdown checkpoint, whether dirty or not, to address it.
>
> Why don't we mark the slot dirty when confirmed_lsn advances?  If we fix
> that, doesn't it fix the other problems too?
>

Yes, it does.

That'll cause slots to be written out at checkpoints when they otherwise
wouldn't have to be, but I'd rather do a little more work in this case.
Compared to the disk activity from WAL decoding etc., the effect should
be undetectable anyway.
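
To make the effect concrete, here is a toy model of the behaviour, not
the actual slot code. ToySlot, confirm(), checkpoint() and restart() are
hypothetical names invented for this sketch, and LSNs are plain integers.
It assumes a checkpoint writes out only dirty slots and that a restart
reloads whatever was last written.

```python
from dataclasses import dataclass


@dataclass
class ToySlot:
    """Toy stand-in for a logical slot; not PostgreSQL's real structure."""
    confirmed_lsn: int = 0  # in-memory confirmed position
    on_disk_lsn: int = 0    # position last written to disk
    dirty: bool = False

    def confirm(self, lsn: int, mark_dirty: bool) -> None:
        # Advance the in-memory position; optionally flag the slot dirty.
        if lsn > self.confirmed_lsn:
            self.confirmed_lsn = lsn
            if mark_dirty:
                self.dirty = True

    def checkpoint(self) -> None:
        # Assumption: only dirty slots are written out at a checkpoint.
        if self.dirty:
            self.on_disk_lsn = self.confirmed_lsn
            self.dirty = False

    def restart(self) -> None:
        # After a restart the in-memory state is reloaded from disk.
        self.confirmed_lsn = self.on_disk_lsn
        self.dirty = False


def lsn_after_restart(mark_dirty: bool) -> int:
    """Confirm up to LSN 100, checkpoint, restart, report the position."""
    slot = ToySlot()
    slot.confirm(100, mark_dirty)
    slot.checkpoint()
    slot.restart()
    return slot.confirmed_lsn
```

In this model, lsn_after_restart(False) falls back to 0 while
lsn_after_restart(True) keeps 100, at the cost of one extra slot write
at the checkpoint.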

Andres? Any objection to dirtying a slot when the confirmed_lsn advances,
so we write it out at the next checkpoint?
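
For reference, the client-side bookkeeping that SQL-interface users need
today might look roughly like the sketch below. It is a simplification
under stated assumptions: discard_replayed() is a hypothetical helper,
each item is treated as one whole transaction keyed by an integer commit
LSN, and items arrive in increasing commit-LSN order. A real client would
also have to persist last_flushed_lsn together with the data it applies.

```python
from typing import Iterable, List, Tuple

Change = Tuple[int, str]  # (commit_lsn, payload); hypothetical shape


def discard_replayed(
    changes: Iterable[Change], last_flushed_lsn: int
) -> Tuple[List[Change], int]:
    """Drop transactions at or before the locally flushed position.

    Returns the transactions that still need applying, plus the new
    position to record once they have been safely flushed locally.
    """
    fresh: List[Change] = []
    for commit_lsn, payload in changes:
        if commit_lsn <= last_flushed_lsn:
            # Already applied before the slot position went backwards.
            continue
        fresh.append((commit_lsn, payload))
        last_flushed_lsn = commit_lsn
    return fresh, last_flushed_lsn
```

If the client had flushed up to LSN 20 and the slot replays from 10, the
items at 10 and 20 are skipped and only later ones are applied.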

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
