On Fri, Oct 9, 2020 at 4:28 PM Greg Nancarrow <gregn4...@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 8:41 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Fri, Oct 9, 2020 at 2:37 PM Greg Nancarrow <gregn4...@gmail.com> wrote:
> > >
> > > Speaking of costing, I'm not sure I really agree with the current
> > > costing of a Gather node. Just considering a simple Parallel SeqScan
> > > case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of
> > > Gather cost always completely drowns out any other path costs when a
> > > large number of rows are involved (at least with default
> > > parallel-related GUC values), such that Parallel SeqScan would never
> > > be the cheapest path. This linear relationship in the costing based on
> > > the rows and a parallel_tuple_cost doesn't make sense to me. Surely
> > > after a certain amount of rows, the overhead of launching workers will
> > > be out-weighed by the benefit of their parallel work, such that the
> > > more rows, the more likely a Parallel SeqScan will benefit.
> > >
> >
> > That will be true for the number of rows/pages we need to scan not for
> > the number of tuples we need to return as a result. The formula here
> > considers the number of rows the parallel scan will return and the
> > more the number of rows each parallel node needs to pass via shared
> > memory to gather node the more costly it will be.
> >
> > We do consider the total pages we need to scan in
> > compute_parallel_worker() where we use a logarithmic formula to
> > determine the number of workers.
> >
>
> Despite all the best intentions, the current costings seem to be
> geared towards selection of a non-parallel plan over a parallel plan,
> the more rows there are in the table. Yet the performance of a
> parallel plan appears to be better than non-parallel-plan the more
> rows there are in the table.
> This doesn't seem right to me. Is there a rationale behind this costing model?
>
Yes. AFAIK, there is no proof that we gain much, if anything, by
dividing the I/O among workers. It is primarily the division of CPU
effort that gives the benefit. So parallel plans show the greatest
benefit when we have to scan a large table and then project far fewer
rows.

--
With Regards,
Amit Kapila.
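
To make the rationale above concrete, here is a rough sketch of the cost shape being discussed. It is a deliberately simplified model, not the planner's actual code: it ignores qual evaluation cost, keeps the I/O cost undivided, divides only the per-tuple CPU cost among the workers, and then adds the setup charge and the per-returned-row term quoted above. The constants are the default values of the corresponding GUCs; the function names, the table size, and the divisor of 2.4 are assumptions chosen for illustration.

```python
# Simplified illustration of the costing discussed above.
# NOT the planner's real code: qual costs and many other details are ignored.

SEQ_PAGE_COST = 1.0           # default seq_page_cost
CPU_TUPLE_COST = 0.01         # default cpu_tuple_cost
PARALLEL_SETUP_COST = 1000.0  # default parallel_setup_cost
PARALLEL_TUPLE_COST = 0.1     # default parallel_tuple_cost


def seqscan_cost(pages, tuples, cpu_divisor=1.0):
    """Scan cost where only the CPU part is divided among workers."""
    io_cost = SEQ_PAGE_COST * pages                   # I/O is not divided
    cpu_cost = CPU_TUPLE_COST * tuples / cpu_divisor  # CPU effort is divided
    return io_cost + cpu_cost


def gather_cost(pages, tuples, rows_returned, cpu_divisor):
    """Parallel scan cost plus the Gather setup and per-row transfer charges."""
    subpath = seqscan_cost(pages, tuples, cpu_divisor)
    return (PARALLEL_SETUP_COST
            + subpath
            + PARALLEL_TUPLE_COST * rows_returned)


# Hypothetical table: 100,000 pages holding 10 million tuples.
pages, tuples = 100_000, 10_000_000
divisor = 2.4  # assumed effective divisor for this illustration

serial = seqscan_cost(pages, tuples)
all_rows = gather_cost(pages, tuples, rows_returned=tuples, cpu_divisor=divisor)
few_rows = gather_cost(pages, tuples, rows_returned=1_000, cpu_divisor=divisor)

print("serial scan:              ", serial)
print("gather, all rows returned:", all_rows)
print("gather, 1000 rows returned:", few_rows)
```

Under these assumptions, the parallel plan is costed above the serial one when every row must pass through the Gather node, and below it when the same scan returns only a small fraction of the rows.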