On Wed, 10 Apr 2019 at 21:39, Andres Freund <and...@anarazel.de> wrote:
>
> Hi,
>
> On 2019-04-10 12:11:21 +0530, tushar wrote:
> >
> > On 03/13/2019 08:40 PM, tushar wrote:
> > > Hi ,
> > >
> > > I am getting a server crash on standby while executing
> > > pg_logical_slot_get_changes function , please refer this scenario
> > >
> > > Master cluster( ./initdb -D master)
> > > set wal_level='hot_standby' in master/postgresql.conf file
> > > start the server , connect to psql terminal and create a physical
> > > replication slot ( SELECT * from
> > > pg_create_physical_replication_slot('p1');)
> > >
> > > perform pg_basebackup using --slot 'p1' (./pg_basebackup -D slave/ -R
> > > --slot p1 -v)
> > > set wal_level='logical' , hot_standby_feedback=on,
> > > primary_slot_name='p1' in slave/postgresql.conf file
> > > start the server , connect to psql terminal and create a logical
> > > replication slot ( SELECT * from
> > > pg_create_logical_replication_slot('t','test_decoding');)
> > >
> > > run pgbench ( ./pgbench -i -s 10 postgres) on master and select
> > > pg_logical_slot_get_changes on Slave database
> > >
> > > postgres=# select * from pg_logical_slot_get_changes('t',null,null);
> > > 2019-03-13 20:34:50.274 IST [26817] LOG: starting logical decoding for
> > > slot "t"
> > > 2019-03-13 20:34:50.274 IST [26817] DETAIL: Streaming transactions
> > > committing after 0/6C000060, reading WAL from 0/6C000028.
> > > 2019-03-13 20:34:50.274 IST [26817] STATEMENT: select * from
> > > pg_logical_slot_get_changes('t',null,null);
> > > 2019-03-13 20:34:50.275 IST [26817] LOG: logical decoding found
> > > consistent point at 0/6C000028
> > > 2019-03-13 20:34:50.275 IST [26817] DETAIL: There are no running
> > > transactions.
> > > 2019-03-13 20:34:50.275 IST [26817] STATEMENT: select * from
> > > pg_logical_slot_get_changes('t',null,null);
> > > TRAP: FailedAssertion("!(data == tupledata + tuplelen)", File:
> > > "decode.c", Line: 977)
> > > server closed the connection unexpectedly
> > > This probably means the server terminated abnormally
> > > before or while processing the request.
> > > The connection to the server was lost. Attempting reset: 2019-03-13
> > > 20:34:50.276 IST [26809] LOG: server process (PID 26817) was terminated
> > > by signal 6: Aborted
> > >
> > Andres - Do you think - this is an issue which needs to be fixed ?
>
> Yes, it definitely needs to be fixed. I just haven't had sufficient time
> to look into it. Have you reproduced this with Amit's latest version?
>
> Amit, have you spent any time looking into it? I know that you're not
> that deeply steeped into the internals of logical decoding, but perhaps
> there's something obvious going on.

I tried to quickly understand what's going on.

Here, the master's wal_level is hot_standby, not logical, although the slave's wal_level is logical.

On the slave, when pg_logical_slot_get_changes() is run, DecodeMultiInsert() does not see any WAL records with XLH_INSERT_CONTAINS_NEW_TUPLE set. So the data pointer is never incremented; it stays at tupledata. As a result, this assertion at the end of the function fails:

Assert(data == tupledata + tuplelen);

because data is still at tupledata while tuplelen is non-zero.

I'm not sure why this is happening. On the slave, wal_level is logical, so logical records should have tuple data. I'm also not sure what the master's wal_level has to do with it: everything should be there on the slave after it replays the inserts.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
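To make the failure mode above concrete, here is a minimal C sketch of the pointer walk. It is not the actual decode.c code. The record layout (a 2-byte length header before each tuple), the struct `SketchMultiInsert`, the flag constant `SKETCH_CONTAINS_NEW_TUPLE` and both helper functions are hypothetical simplifications. The sketch only shows why `data` never moves, and the end-of-loop check fails, when the payload holds tuple bytes but the flag is not set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for XLH_INSERT_CONTAINS_NEW_TUPLE; the value is
 * illustrative, not copied from PostgreSQL's headers. */
#define SKETCH_CONTAINS_NEW_TUPLE 0x08

/* Hypothetical, simplified multi-insert record: the payload holds ntuples
 * entries, each a 2-byte length header followed by that many data bytes. */
typedef struct
{
    uint8_t     flags;
    int         ntuples;
    const char *tupledata;
    size_t      tuplelen;
} SketchMultiInsert;

/* Walks the payload like the decode loop described above: the data pointer
 * only advances inside the branch guarded by the flag. Returns true when the
 * pointer ends exactly at tupledata + tuplelen (the condition the Assert
 * checks); *decoded receives the number of tuples consumed. */
bool
sketch_decode_multi_insert(const SketchMultiInsert *rec, int *decoded)
{
    const char *data = rec->tupledata;

    *decoded = 0;
    for (int i = 0; i < rec->ntuples; i++)
    {
        if (rec->flags & SKETCH_CONTAINS_NEW_TUPLE)
        {
            uint16_t datalen;

            memcpy(&datalen, data, sizeof(datalen));
            data += sizeof(datalen);
            /* a real decoder would copy datalen bytes into a tuple here */
            data += datalen;
            (*decoded)++;
        }
        /* without the flag nothing is consumed, so data stays put */
    }
    return data == rec->tupledata + rec->tuplelen;
}

/* Fills buf (capacity cap) with two tuples, "abc" and "de", in the layout
 * above and returns the payload length in bytes. */
size_t
sketch_build_payload(char *buf, size_t cap)
{
    const char *tuples[] = {"abc", "de"};
    size_t      off = 0;

    for (int i = 0; i < 2; i++)
    {
        uint16_t len = (uint16_t) strlen(tuples[i]);

        assert(off + sizeof(len) + len <= cap);
        memcpy(buf + off, &len, sizeof(len));
        off += sizeof(len);
        memcpy(buf + off, tuples[i], len);
        off += len;
    }
    return off;
}
```

Building the two-tuple payload with `sketch_build_payload()` and decoding it with the flag set returns true with both tuples consumed. Decoding the same payload with the flag cleared returns false with none consumed, which is the situation the Assert catches.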