On Friday, May 21, 2021 4:43 PM Amit Langote <amitlangot...@gmail.com> wrote: > On Fri, May 21, 2021 at 3:55 PM osumi.takami...@fujitsu.com > <osumi.takami...@fujitsu.com> wrote: > > But, I've detected segmentation faults caused by the patch, which can > > happen during 100_bugs.pl in src/test/subscription. > > This happens more than one in ten times. > > > > This problem would be a timing issue and has been introduced by v3 > already. > > I used v5 for HEAD also and reproduced this failure, while OSS HEAD > > doesn't reproduce this, even when I executed 100_bugs.pl 200 times in a > tight loop. > > I aligned the commit id 4f586fe2 for all check. Below logs are ones I got > > from > v3. > > > > My first guess of the cause is that between the timing to get an entry > > from hash_search() in get_rel_sync_entry() and to set the map by > > convert_tuples_by_name() in maybe_send_schema(), we had invalidation > message, which tries to free unset descs in the entry ? > > Hmm, maybe get_rel_syn_entry() should explicitly set map to NULL when first > initializing an entry. It's possible that without doing so, the map remains > set > to a garbage value, which causes the invalidation callback that runs into such > partially initialized entry to segfault upon trying to deference that garbage > pointer. Just in case, I prepared a new PG and did a check to make get_rel_sync_entry() print its first pointer value with elog. Here, when I executed 100_bugs.pl, I got some garbage below.
* The change I did: @@ -1011,6 +1011,7 @@ get_rel_sync_entry(PGOutputData *data, Oid relid) entry->pubactions.pubinsert = entry->pubactions.pubupdate = entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false; entry->publish_as_relid = InvalidOid; + elog(LOG, "**> the pointer's default value : %p", entry->map); } * Grep result of all logs from 100_bugs.pl 2021-05-21 09:05:56.132 UTC [29122] sub1 LOG: **> the pointer's default value : (nil) 2021-05-21 09:06:11.556 UTC [30198] testsub1 LOG: **> the pointer's default value : (nil) 2021-05-21 09:06:11.561 UTC [30200] pg_16389_sync_16384_6964667281140237667 LOG: **> the pointer's default value : 0x7f7f7f7f7f7f7f7f 2021-05-21 09:06:11.567 UTC [30191] testsub2 LOG: **> the pointer's default value : (nil) 2021-05-21 09:06:11.570 UTC [30194] pg_16387_sync_16384_6964667292923737489 LOG: **> the pointer's default value : 0x7f7f7f7f7f7f7f7f 2021-05-21 09:06:02.513 UTC [29809] testsub LOG: **> the pointer's default value : (nil) 2021-05-21 09:06:02.557 UTC [29809] testsub LOG: **> the pointer's default value : (nil) So, your solution is right, I think. > I've tried that in the attached v6 patches. Please check. With this fix, I don't get the failure. I executed 100_bugs.pl 100 times in a loop and didn't face that problem. Again, I conducted one make check-world for each combination * use OSS HEAD or PG13 * apply only the first patch or both two patches Those all passed. Best Regards, Takamichi Osumi