On Mon, Sep 14, 2020 at 4:50 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Mon, Sep 14, 2020 at 1:23 PM Dilip Kumar <dilipbal...@gmail.com> wrote: > > > > On Mon, Sep 14, 2020 at 8:48 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > > > > > > Yeah, this is right, and here is some initial analysis. It seems to be > > > failing in below code: > > > rel_sync_cache_relation_cb(){ ...list_free(entry->streamed_txns);..} > > > > > > This list can have elements only in 'streaming' mode (need to enable > > > 'streaming' with Create Subscription command) whereas none of the > > > tests in 010_truncate.pl is using 'streaming', so this list should be > > > empty (NULL). The two different assertion failures shown in BF reports > > > in list_free code are as below: > > > Assert(list->length > 0); > > > Assert(list->length <= list->max_length); > > > > > > It seems to me that this list is not initialized properly when it is > > > not used or maybe that is true in some special circumstances because > > > we initialize it in get_rel_sync_entry(). I am not sure if CCI build > > > is impacting this in some way. > > > > > > Even I have analyzed this but did not find any reason why the > > streamed_txns list should be anything other than NULL. The only thing > > is we are initializing the entry->streamed_txns to NULL and the list > > free is checking "if (list == NIL)" then return. However IMHO, that > > should not be an issue becase NIL is defined as (List*) NULL. > > > > Yeah, that is not the issue but it is better to initialize it with NIL > for the sake of consistency. The basic issue here was we were trying > to open/lock the relation(s) before initializing this list. Now, when > we process the invalidations during open relation, we try to access > this list in rel_sync_cache_relation_cb and that leads to assertion > failure. I have reproduced the exact scenario of 010_truncate.pl via > debugger. Basically, the backend on publisher has sent the > invalidation after truncating the relation 'tab1' and while processing > the truncate message if WALSender receives that message exactly after > creating the RelSyncEntry for 'tab1', the Assertion shown in BF can be > reproduced.
Yeah, this is an issue and I am also able to reproduce this manually using gdb. Basically, I have inserted some data in publication table and after that, I stopped in get_rel_sync_entry after creating the reentry and before calling GetRelationPublications. Meanwhile, I have truncated this table and then it hit the same issue you pointed here. > The attached patch will fix the issue. What do you think? The patch looks good to me and fixing the reported issue. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com