Hi hackers,

I saw a problem in logical replication: when the target table on the subscriber is a partitioned table, the apply worker only checks whether the Replica Identity of the partitioned table is consistent with the publisher, and doesn't check the Replica Identity of the partition.
For example:

-- publisher --
create table tbl (a int not null, b int);
create unique INDEX ON tbl (a);
alter table tbl replica identity using INDEX tbl_a_idx;
create publication pub for table tbl;

-- subscriber --
-- table tbl (parent table) has RI index, while table child has no RI index.
create table tbl (a int not null, b int) partition by range (a);
create table child partition of tbl default;
create unique INDEX ON tbl (a);
alter table tbl replica identity using INDEX tbl_a_idx;
create subscription sub connection 'port=5432 dbname=postgres' publication pub;

-- publisher --
insert into tbl values (11,11);
update tbl set a=a+1;

It caused an assertion failure on the subscriber:

TRAP: FailedAssertion("OidIsValid(idxoid) || (remoterel->replident == REPLICA_IDENTITY_FULL)", File: "worker.c", Line: 2088, PID: 1616523)

The backtrace is attached.

We got the assertion failure because idxoid is invalid, as table child has no Replica Identity or Primary Key. We have a check in check_relation_updatable(), but it checks table tbl (the parent table), which passes the check.

I think one approach to fix it is to check the target partition in this case, instead of the partitioned table.

When trying to fix it, I saw some other problems with updating the partition map cache.

a) In logicalrep_partmap_invalidate_cb(), the type of the entry in LogicalRepPartMap should be LogicalRepPartMapEntry, instead of LogicalRepRelMapEntry.

b) logicalrep_partition_open() doesn't check whether the entry is valid.

c) When the publisher sends a new relation mapping, only the relation map cache is updated, and the partition map cache is not. I think it should also be updated, because it holds remote relation information, too.

Attached are two patches that try to fix them.
0001 patch: fix the above three problems with the partition map cache.
0002 patch: fix the assertion failure by checking the Replica Identity of the partition if the target table is a partitioned table.
Thanks to Hou Zhijie for helping me with the first patch.

I will add a test for it later if no one objects to this fix.

Regards,
Shi yu
v1-0001-Fix-partition-map-cache-issues.patch
v1-0002-Check-partition-table-replica-identity-on-subscri.patch
#0 0x00007fd35e76970f in raise () from /lib64/libc.so.6
#1 0x00007fd35e753b25 in abort () from /lib64/libc.so.6
#2 0x0000000000fff9b4 in ExceptionalCondition (conditionName=0x1237a80 "OidIsValid(idxoid) || (remoterel->replident == REPLICA_IDENTITY_FULL)", errorType=0x1237208 "FailedAssertion", fileName=0x123737f "worker.c", lineNumber=2088) at assert.c:69
#3 0x0000000000c354d4 in FindReplTupleInLocalRel (estate=0x1c92240, localrel=0x7fd352ce7bf8, remoterel=0x1ca5b28, remoteslot=0x1c962a0, localslot=0x7ffc8b70c360) at worker.c:2087
#4 0x0000000000c35a0f in apply_handle_tuple_routing (edata=0x1c7b3b0, remoteslot=0x1c7b808, newtup=0x7ffc8b70c420, operation=CMD_UPDATE) at worker.c:2192
#5 0x0000000000c34a1d in apply_handle_update (s=0x7ffc8b70c4f0) at worker.c:1855
#6 0x0000000000c36cab in apply_dispatch (s=0x7ffc8b70c4f0) at worker.c:2481
#7 0x0000000000c37834 in LogicalRepApplyLoop (last_received=22131584) at worker.c:2757
#8 0x0000000000c39ee5 in start_apply (origin_startpos=0) at worker.c:3531
#9 0x0000000000c3ae14 in ApplyWorkerMain (main_arg=0) at worker.c:3787
#10 0x0000000000bbd903 in StartBackgroundWorker () at bgworker.c:858
#11 0x0000000000bd175a in do_start_bgworker (rw=0x1bd4910) at postmaster.c:5815
#12 0x0000000000bd1ea5 in maybe_start_bgworkers () at postmaster.c:6039
#13 0x0000000000bcfdb7 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5204
#14 <signal handler called>
#15 0x00007fd35e8254ab in select () from /lib64/libc.so.6
#16 0x0000000000bc74fc in ServerLoop () at postmaster.c:1770
#17 0x0000000000bc6a67 in PostmasterMain (argc=3, argv=0x1ba9b00) at postmaster.c:1478
#18 0x0000000000a10b78 in main (argc=3, argv=0x1ba9b00) at main.c:202