On 4 October 2017 at 07:35, Petr Jelinek <petr.jeli...@2ndquadrant.com> wrote:
> On 02/10/17 18:59, Petr Jelinek wrote:
> >>
> >> Now fix the trigger function:
> >> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
> >> BEGIN
> >> RETURN NEW;
> >> END $$ LANGUAGE plpgsql;
> >>
> >> And manually perform at master two updates inside one transaction:
> >>
> >> postgres=# begin;
> >> BEGIN
> >> postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
> >> UPDATE 1
> >> postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
> >> UPDATE 1
> >> postgres=# commit;
> >> <hangs>
> >>
> >> and in replica log we see:
> >> 2017-10-02 18:40:26.094 MSK [2954] LOG: logical replication apply
> >> worker for subscription "sub" has started
> >> 2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible
> >> tuple
> >> 2017-10-02 18:40:26.102 MSK [2882] LOG: worker process: logical
> >> replication worker for subscription 16403 (PID 2954) exited with exit
> >> code 1
> >>
> >> Error happens in trigger.c:
> >>
> >> #3 0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50,
> >> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
> >> lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
> >> trigger.c:3103
> >> #4 0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
> >> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
> >> fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
> >> #5 0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
> >> epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
> >> at execReplication.c:461
> >> #6 0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at
> >> worker.c:736
> >
> > We have locked the same tuple in RelationFindReplTupleByIndex() just
> > before this gets called and didn't get the same error. I guess we do
> > something wrong with snapshots. Will need to investigate more.
> >
>
> Okay, so it's not snapshot. It's the fact that we don't set the
> es_output_cid in replication worker which GetTupleForTrigger is using
> when locking the tuple. Attached one-liner fixes it.
>

This seems like a clear-cut bug with a simple fix.

Let's get this committed so we don't lose it. The rest of the thread is
going off into the weeds a bit, into issues unrelated to the original
problem.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services