On 11 March 2016 at 08:19, Craig Ringer <cr...@2ndquadrant.com> wrote:
> Hi all > > I think I found a couple of logical decoding issues while writing tests > for failover slots. > > Despite the docs' claim that a logical slot will replay data "exactly > once", a slot's confirmed_lsn can go backwards and the SQL functions can > replay the same data more than once.We don't mark a slot as dirty if only > its confirmed_lsn is advanced, so it isn't flushed to disk. For failover > slots this means it also doesn't get replicated via WAL. After a master > crash, or for failover slots after a promote event, the confirmed_lsn will > go backwards. Users of the SQL interface must keep track of the safely > locally flushed slot position themselves and throw the repeated data away. > Unlike with the walsender protocol it has no way to ask the server to skip > that data. > Yeh, I read that and thought it was wrong. "At least once, in order" is what we want, how it works and how it should be described. > Worse, because we don't dirty the slot even a *clean shutdown* causes slot > confirmed_lsn to go backwards. That's a bug IMO. We should force a flush of > all slots at the shutdown checkpoint, whether dirty or not, to address it. > Given the earlier understanding, above, I'm not sure I see why that must happen. But I agree it should happen. > Barring objections I intend to submit a fix to: > > - Document that slots can replay data more than once > - Force flush of all slots on a shutdown checkpoint > +1 > Also, pg_logical_slot_get_changes and its _peek_ variant should have a > param specifying the starting LSN to read and return. If this is lower than > the restart_lsn but non-null it should ERROR; if it's greater than or equal > it should use this position instead of starting at the confirmed_lsn. > Maybe. I don't really like changing APIs. Can we add new funcs? Make sure a NULL LSN can be passed as well. Is the return type of pg_logical_slot_peek_changes() incorrect in the docs? > Time permitting I'd like to add a pg_logical_slot_confirm function, so you > can aternate _peek_changes and _confirm, making it possible to get > walsender-interface-like behaviour via the SQL interface. > I thought thats what replorigins do. > Right now you can't reliably use the SQL interface because _get_changes > can succeed locally and advance the slot but the client can fail to receive > all the changes due to network issues etc. Sure, the SQL interface is meant > mainly for testing, but especially for !postgresql downstreams I strongly > suspect people will want to use it for "real" work rather than have to > modify each client driver to support replication protocol extensions. > -- Simon Riggs http://www.2ndQuadrant.com/ <http://www.2ndquadrant.com/> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services