On Tuesday, December 14, 2021 3:42 PM houzj.f...@fujitsu.com <houzj.f...@fujitsu.com> wrote:
>
> On Sat, Nov 20, 2021 7:31 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > >
> > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote <amitlangot...@gmail.com> wrote:
> > > >
> > > > The problematic case is attaching the partition *after* the
> > > > subscriber has already marked the root parent as synced and/or
> > > > ready for replication. Refreshing the subscription doesn't help
> > > > it discover the newly attached partition, because a
> > > > publish_via_partition_root only ever tells about the root parent,
> > > > which would be already synced, so the subscriber would think
> > > > there's nothing to copy.
> > > >
> > >
> > > Okay, I see this could be a problem but I haven't tried to reproduce it.
> >
> > One more thing you mentioned is that the initial sync won't work after
> > refresh but later changes will be replicated but I noticed that later
> > changes also don't get streamed till we restart the subscriber server.
> > I am not sure but we might not be invalidating apply workers cache due
> > to which it didn't notice the same.
>
> I investigated this bug recently, and I think the reason is that when
> receiving relcache invalidation message, the callback function[1] in
> walsender only reset the schema sent status while it doesn't reset the
> replicate_valid flag. So, it won’t rebuild the publication actions of
> the relation.
>
> [1]
> static void
> rel_sync_cache_relation_cb(Datum arg, Oid relid)
> ...
>     /*
>      * Reset schema sent status as the relation definition may have changed.
>      * Also free any objects that depended on the earlier definition.
>      */
>     if (entry != NULL)
>     {
>         entry->schema_sent = false;
>         list_free(entry->streamed_txns);
> ...
>
> Also, when you DETACH a partition, the publication won’t be rebuilt too
> because of the same reason. Which could cause unexpected behavior if we
> modify the detached table's data. And the bug happens regardless of
> whether pubviaroot is set or not.
>
> For the fix:
>
> I think if we also reset replicate_valid flag in
> rel_sync_cache_relation_cb, then the bug can be fixed. I have a bit
> hesitation about this approach, because it could increase the frequency
> of invalidating and rebuilding the publication action. But I haven't
> produced some other better approaches.
>
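To make the stale-cache problem from my earlier mail easier to follow, here is a minimal standalone sketch. It is not the pgoutput code: the struct and function names (SketchSyncEntry, SketchCatalog, sketch_*) are simplified stand-ins I made up for illustration, and only the two flags schema_sent and replicate_valid correspond to fields mentioned above. The sketch assumes the cached publish decision is recomputed only when replicate_valid is false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a per-relation cache entry (hypothetical). */
typedef struct SketchSyncEntry
{
    bool schema_sent;     /* has the schema been sent downstream? */
    bool replicate_valid; /* is the cached decision below still valid? */
    bool pubinsert;       /* cached "publish inserts" decision */
} SketchSyncEntry;

/* Hypothetical catalog state: is the relation currently published? */
typedef struct SketchCatalog
{
    bool rel_is_published;
} SketchCatalog;

/* Return the cached decision, rebuilding it only if flagged invalid. */
static bool
sketch_should_publish_insert(SketchSyncEntry *entry, const SketchCatalog *cat)
{
    assert(entry != NULL && cat != NULL);

    if (!entry->replicate_valid)
    {
        entry->pubinsert = cat->rel_is_published;
        entry->replicate_valid = true;
    }
    return entry->pubinsert;
}

/* Invalidation that resets only schema_sent: cached decision stays stale. */
static void
sketch_invalidate_schema_only(SketchSyncEntry *entry)
{
    if (entry != NULL)
        entry->schema_sent = false;
}

/* Invalidation that also resets replicate_valid: forces a rebuild. */
static void
sketch_invalidate_full(SketchSyncEntry *entry)
{
    if (entry != NULL)
    {
        entry->schema_sent = false;
        entry->replicate_valid = false;
    }
}
```

In this sketch, if the catalog state changes (say, a partition is attached and becomes published) and only sketch_invalidate_schema_only() runs, the next call to sketch_should_publish_insert() still returns the old decision. After sketch_invalidate_full(), the decision is rebuilt from the current catalog state, at the cost of a rebuild on every invalidation.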
I have confirmed that the ATTACH PARTITION bug has been fixed by the recent
commit 7f481b8. We now always invalidate the RelationSyncCache when a
partition is attached, so the pubactions of the newly attached partition
are rebuilt correctly.

Best regards,
Hou zj