On Thu, May 20, 2021 at 11:18 AM Pengchengliu <pengcheng...@tju.edu.cn> wrote: > > Hi Greg, > Thanks a lot for you explanation and your fix. > > I think your fix can resolve the core dump issue. As with your fix, > parallel process reset Transaction Xmin from ActiveSnapshot. > But it will change Transaction snapshot for all parallel scenarios. I > don't know whether it bring in other issue. > For test only, I think it is enough. > > So is there anybody can explain what's exactly difference between > ActiveSnapshot and TransactionSnapshot in parallel work process. > Then maybe we can find a better solution and try to fix it really. >
I am proposing the attached patch to fix this issue (patches for both PG13.2 and latest PG14 source are provided). Perhaps this will trigger others who better know the intricacies of snapshot handling, and know the reasons and history behind why the transaction_snapshot and active_snapshot are currently passed separately to parallel workers, to speak up here and discuss the issue further, and check my fix (and note, I haven't attempted to modify README.parallel in the patch until I get further clarification on this issue). Perhaps someone can explain the purpose of calling GetTransactionSnapshot() AGAIN in InitializeParallelDSM() and how is this consistent with the current ActiveSnapshot? AFAICS, that doesn't seem correct, and that's why the patch removes it. I've rebuilt Postgres with the patch applied and run the regression tests, with and without "force_parallel_mode=regress", and all tests pass. So if the fix is wrong, no tests currently exist that detect issues with it. Regards, Greg Nancarrow Fujitsu Australia
v1-0001-PG14-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data
v1-0001-PG13_2-Fix-parallel-worker-failed-assertion-and-coredump.patch
Description: Binary data