Re: [HACKERS] Logical decoding of sequence advances, part II

Craig Ringer Mon, 22 Aug 2016 16:40:17 -0700

On 23 Aug 2016 05:43, "Kevin Grittner" <kgri...@gmail.com> wrote:
>
> On Mon, Aug 22, 2016 at 3:29 PM, Robert Haas <robertmh...@gmail.com> wrote:
>
> > it seems to me that
> > this is just one facet of a much more general problem: given two
> > transactions T1 and T2, the order of replay must match the order of
> > commit unless you can prove that there are no dependencies between
> > them.  I don't see why it matters whether the operations are sequence
> > operations or data operations; it's just a question of whether they're
> > modifying the same "stuff".


It matters because sequence operations aren't transactional in pg. Except
when they are: operations on a newly CREATEd sequence, or on one where we
did a TRUNCATE ... RESTART IDENTITY.

But we don't store the xid of the xact associated with a transactional
sequence update along with the sequence update anywhere. We just rely on no
other xact knowing to look at the sequence relfilenode we're changing.
That doesn't work so well in logical rep.

We also don't store knowledge of whether or not the sequence advance is
transactional. That is important too, because for two xacts T1 and T2:

* Sequence last value is 50

* T1 calls nextval. Needs a new chunk because all cached values have been
used. Writes sequence wal advancing seq last_value to 100, returns 51.

* T2 calls nextval, gets cached value 52.

* T2 commits

* Master crashes and we fail over to replica.

This is fine for physical rep. We replay the sequence advance and all is
well.

But for logical rep the sequence advance can't be treated as part of T1. If
T1 rolls back, or we fail over before replaying it, we might return value 52
from nextval on the replica even though we replayed and committed T2, which
used value 52. Oops.
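A toy model of the scenario above, as a sketch only. The names ToySequence
and replica_position, and the chunk size of 50, are made up for
illustration; real PostgreSQL sequence caching and WAL logging differ in
detail.

```python
class ToySequence:
    """Toy sequence that writes one WAL record per chunk of values."""

    def __init__(self, last_value, chunk=50):
        self.last_value = last_value      # last value handed out
        self.logged_up_to = last_value    # highest value covered by WAL
        self.chunk = chunk                # assumed chunk size

    def nextval(self, xid, wal):
        # Only the call that exhausts the chunk writes a WAL record,
        # and that record happens to be written by whichever xact asked.
        if self.last_value >= self.logged_up_to:
            self.logged_up_to = self.last_value + self.chunk
            wal.append(("seq_advance", xid, self.logged_up_to))
        self.last_value += 1
        return self.last_value


def replica_position(wal, committed_xids, start, treat_as_transactional):
    """Sequence position a logical replica ends up with after replay."""
    pos = start
    for kind, xid, value in wal:
        if kind != "seq_advance":
            continue
        if treat_as_transactional and xid not in committed_xids:
            continue  # advance is dropped along with the uncommitted xact
        pos = max(pos, value)
    return pos


wal = []
seq = ToySequence(last_value=50)
v1 = seq.nextval("T1", wal)   # T1 writes the WAL record
v2 = seq.nextval("T2", wal)   # T2 uses a value from the same chunk
committed = {"T2"}            # T2 commits, T1 never does

behind = replica_position(wal, committed, 50, treat_as_transactional=True)
ahead = replica_position(wal, committed, 50, treat_as_transactional=False)
```

With the advance tied to T1, the replica stays at the old position and a
later nextval there can hand out T2's value again. Applied independently of
T1, the replica is past every value T2 could have seen.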

However if some xact t3 creates a sequence we can't replay updates to it
until the sequence relation is committed. And it's even more fun with
TRUNCATE ... RESTART IDENTITY where we need rollback behaviour too.

Make sense? It's hard because sequences are sometimes but not always exempt
from transactional behaviour and pg doesn't record when, since it can rely
on physical WAL redo order and can apply sequence advances before the
sequence relation is committed.
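One way to picture the distinction, purely as a hypothetical sketch and not
a description of anything pg does today: a decoder that remembers which
sequence relfilenodes were created by still-uncommitted xacts, and treats
advances on those as transactional. ToyDecoder and all of its methods are
invented for illustration.

```python
class ToyDecoder:
    """Hypothetical decoder that splits sequence advances into
    transactional (sequence not yet committed) and non-transactional."""

    def __init__(self):
        self.pending_creates = {}  # relfilenode -> xid that created it
        self.buffered = {}         # xid -> [(relfilenode, last_value)]
        self.applied = {}          # relfilenode -> value visible on replica

    def on_create_sequence(self, xid, relfilenode):
        self.pending_creates[relfilenode] = xid

    def on_sequence_advance(self, relfilenode, last_value):
        xid = self.pending_creates.get(relfilenode)
        if xid is None:
            # Sequence already committed: apply now, never move backwards.
            current = self.applied.get(relfilenode, 0)
            self.applied[relfilenode] = max(current, last_value)
        else:
            # Sequence belongs to an uncommitted xact: hold the advance.
            self.buffered.setdefault(xid, []).append((relfilenode, last_value))

    def on_commit(self, xid):
        for relfilenode, last_value in self.buffered.pop(xid, []):
            self.applied[relfilenode] = last_value
        self._forget(xid)

    def on_abort(self, xid):
        self.buffered.pop(xid, None)  # rollback discards held advances
        self._forget(xid)

    def _forget(self, xid):
        self.pending_creates = {
            r: x for r, x in self.pending_creates.items() if x != xid
        }


d = ToyDecoder()
d.on_sequence_advance(1000, 100)   # existing sequence: applied at once
d.on_create_sequence("T3", 2000)
d.on_sequence_advance(2000, 33)    # held until T3 commits
held_before_commit = 2000 not in d.applied
d.on_commit("T3")
d.on_create_sequence("T4", 3000)
d.on_sequence_advance(3000, 5)
d.on_abort("T4")                   # sequence and its advance vanish
```

The catch described above is that the WAL does not carry the information
this sketch assumes is available for free.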

>
> The commit order is the simplest and safest *unless* there is a
> read-write anti-dependency a/k/a read-write dependency a/k/a
> rw-conflict: where a read from one transaction sees the "before"
> version of data modified by the other transaction.  In such a case
> it is necessary for correct serializable transaction behavior for
> the transaction that read the "before" image to be replayed before
> the write it didn't see, regardless of commit order.  If you're not
> trying to avoid serialization anomalies, it is less clear to me
> what is best.

Could you provide an example of a case where xacts replayed in commit order
will produce incorrect results?

Remember that we aren't doing statement based replication in pg logical
decoding/replication. We don't care how a row got changed, only that we
make consistent transitions from before state to after state for each
transaction, such that the data committed and visible on the master is
visible on the standby and no uncommitted or not yet visible data on the
master is committed/visible on the replica. The replica should have visible
committed data matching the master as it was when it originally executed
the xact we most recently replayed.
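A minimal sketch of that idea, with an invented record format and a made-up
function name (replay_in_commit_order): row changes are buffered per xact
and only applied to the replica when that xact's commit is seen, so replay
happens in commit order and aborted work never becomes visible.

```python
def replay_in_commit_order(wal):
    """Apply row changes to a toy replica, one whole xact per commit."""
    pending = {}   # xid -> [(key, new_value)] not yet committed
    replica = {}   # committed, visible state on the replica
    for record in wal:
        kind, xid = record[0], record[1]
        if kind == "change":
            _, _, key, new_value = record
            pending.setdefault(xid, []).append((key, new_value))
        elif kind == "commit":
            # Only the after state is applied; how the row got there
            # on the master is irrelevant.
            for key, new_value in pending.pop(xid, []):
                replica[key] = new_value
        elif kind == "abort":
            pending.pop(xid, None)
    return replica


wal = [
    ("change", "T1", "a", 1),
    ("change", "T2", "b", 2),
    ("commit", "T2"),
    ("change", "T3", "c", 9),
    ("abort", "T3"),
    ("commit", "T1"),
]
state = replay_in_commit_order(wal)
```

Here T2 becomes visible before T1 even though T1 wrote first, and T3's
change is never applied.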

No locking is decoded or replayed. It is not expected that a normal
non-replication client executing some other concurrent xact will have the
same effect if run on the standby as on the master.

It's replication, not tightly coupled clustering. If/when we have things
like parallel decoding and replay of concurrent xacts then issues like the
dependencies you mention will start to become a concern. We are a long way
from there.

For sequences the requirement IMO is that the sequence advances on the
replica to or past the position it was at on the master when the first xact
that saw those sequence values committed. We should never see the sequence
'behind' such that calling nextval on the replica can produce a value
already seen and stored by some committed xact on the replica. Being a bit
ahead is ok, much like pg discards sequence values on crash.
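The requirement can be stated as a small check. This is a sketch under
assumed names (apply_sequence_advance, replica_is_safe) and it assumes an
ascending sequence with increment 1.

```python
def apply_sequence_advance(replica_last_value, decoded_last_value):
    """Move the replica's sequence forward, never backward."""
    return max(replica_last_value, decoded_last_value)


def replica_is_safe(replica_last_value, committed_values):
    """True if nextval on the replica cannot return a value that some
    committed xact on the replica has already stored."""
    next_value = replica_last_value + 1
    return all(next_value > v for v in committed_values)


stored = [51, 52]                          # values used by committed xacts
pos = apply_sequence_advance(50, 100)      # ahead of the master: fine
late = apply_sequence_advance(pos, 60)     # stale advance is ignored
```

Being ahead only wastes values; being behind risks a duplicate.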

That's not that hard. The problems arise when the sequence itself isn't
committed yet, per above.
