On Tue, Aug 29, 2017 at 10:05 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapil...@gmail.com> writes:
>> On Tue, Aug 29, 2017 at 8:32 AM, Robert Haas <robertmh...@gmail.com> wrote:
>>> There's already ExecParallelReinitialize, which could be made to walk
>>> the nodes in addition to what it does already, but I don't understand
>>> exactly what else needs fixing.
>
>> Sure, but it is not advisable to reset state of all the nodes below
>> gather at that place, otherwise, it will be more or less like we are
>> forcing rescan of each node.  I think there we can reset the shared
>> parallel state of parallel-aware nodes (or anything related) and then
>> allow rescan to reset the master backend specific state for all nodes
>> beneath gather.
>
> Right, the idea is to make this happen separately from the "rescan"
> logic.  In general, it's a good idea for ExecReScanFoo to do as little
> as possible, so that you don't pay if a node is rescanned more than
> once before it's asked to do anything, or indeed if no rows are ever
> demanded from it at all.
>
> Attached is a WIP patch along this line.
>
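Just to confirm my understanding of the division of labor: the new step
walks the plan tree below Gather and resets only the shared (DSM) state
of the parallel-aware nodes, leaving all backend-local resets to
ExecReScan.  Something roughly along these lines, I imagine (only a
sketch to check my reading; the function names below are placeholders of
mine, not necessarily what the attached patch uses):

static bool
ExecParallelReInitializeDSM(PlanState *planstate, ParallelContext *pcxt)
{
	if (planstate == NULL)
		return false;

	/* Reset only the shared (DSM) state of parallel-aware nodes. */
	if (planstate->plan->parallel_aware)
	{
		switch (nodeTag(planstate))
		{
			case T_SeqScanState:
				ExecSeqScanReInitializeDSM((SeqScanState *) planstate, pcxt);
				break;
				/* ... likewise for the other parallel-aware node types ... */
			default:
				break;
		}
	}

	/* Backend-local state is left alone; ExecReScan will deal with that. */
	return planstate_tree_walker(planstate, ExecParallelReInitializeDSM, pcxt);
}
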
The idea looks sane to me.

> It's unfinished because
> I've not done the tedium of extending the FDW and CustomScan APIs
> to support this new type of per-node operation; but that part seems
> straightforward enough.  The core code is complete and survived
> light testing.
>

I have also played a bit with both of the patches together and didn't
find any problem.

In your second patch, I have a minor comment:

 void
 ExecReScanGather(GatherState *node)
 {
!	/* Make sure any existing workers are gracefully shut down */
 	ExecShutdownGatherWorkers(node);

The above call doesn't actually ensure that the workers have shut down.
It only ensures that we have received all messages from the parallel
workers; in particular, it doesn't call WaitForParallelWorkersToExit.

> I'm pretty happy with the results --- note in
> particular how we get rid of some very dubious coding in
> ExecReScanIndexScan and ExecReScanIndexOnlyScan.
>
> If you try the test case from a2b70c89c on this patch alone, you'll notice
> that instead of sometimes reporting too-small counts during the rescans,
> it pretty consistently reports too-large counts.  This is because we are
> now successfully resetting the shared state for the parallel seqscan, but
> we haven't done anything about the leader's HashAgg node deciding that it
> can re-use its old hashtable.  So on the first scan, the leader typically
> scans all or most of the table because of its startup time advantage, and
> saves those counts in its hashtable.  On later scans, the workers read all
> of the table while the leader decides it need do no scanning.  So we get
> counts that reflect all of the table (from the workers) plus whatever part
> of the table the leader read the first time.  So this by no means removes
> the need for my other patch.
>
> If no objections, I'll do the additional legwork and push.
>

No objections.

> As before,
> I think we can probably get away without fixing 9.6, even though it's
> nominally got the same bug.
>

+1.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
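P.S. To spell out what I'd find clearer there, I would word that comment
along the following lines (only a suggestion; the call chain I'm
describing is ExecShutdownGatherWorkers -> ExecParallelFinish ->
WaitForParallelWorkersToFinish, as I read the current code):

 void
 ExecReScanGather(GatherState *node)
 {
	/*
	 * Make sure any existing workers have finished sending us their
	 * tuples and messages.  Note that this waits via
	 * WaitForParallelWorkersToFinish(); it does not call
	 * WaitForParallelWorkersToExit(), so the worker processes
	 * themselves may not have exited yet.
	 */
	ExecShutdownGatherWorkers(node);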