On Fri, Oct 9, 2020 at 11:58 PM Greg Nancarrow <gregn4...@gmail.com> wrote:
> On Fri, Oct 9, 2020 at 8:41 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > That will be true for the number of rows/pages we need to scan not for
> > the number of tuples we need to return as a result. The formula here
> > considers the number of rows the parallel scan will return and the
> > more the number of rows each parallel node needs to pass via shared
> > memory to gather node the more costly it will be.
> >
> > We do consider the total pages we need to scan in
> > compute_parallel_worker() where we use a logarithmic formula to
> > determine the number of workers.
>
> Despite all the best intentions, the current costings seem to be
> geared towards selection of a non-parallel plan over a parallel plan,
> the more rows there are in the table. Yet the performance of a
> parallel plan appears to be better than non-parallel-plan the more
> rows there are in the table.
Right, but as Amit said, we still have to account for the cost of schlepping tuples between processes. Hmm... could the problem be that we're incorrectly estimating that the Insert (without RETURNING) will send a bazillion tuples, even though that isn't true? I haven't looked at the code, but that's what the plan seems to imply when it says things like "Gather (cost=15428.00..16101.14 rows=1000000 width=4)".

I suppose the row estimates for ModifyTable paths are based on what they write, not what they emit, and in the past that distinction didn't matter much because it wasn't something that was used for comparing alternative plans. Now it is.