On Mon, May 22, 2017 at 2:54 PM, Rafia Sabih <rafia.sa...@enterprisedb.com> wrote:
> On Wed, May 17, 2017 at 2:57 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> On Tue, May 16, 2017 at 2:14 PM, Ashutosh Bapat
>> <ashutosh.ba...@enterprisedb.com> wrote:
>>> On Mon, May 15, 2017 at 9:23 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>>
>>> Also, looking at the patch, it doesn't look like it take enough care
>>> to build execution state of new worker so that it can participate in a
>>> running query. I may be wrong, but the execution state initialization
>>> routines are written with the assumption that all the workers start
>>> simultaneously?
>>>
>>
>> No such assumptions, workers started later can also join the execution
>> of the query.
>>
> If we are talking of run-time allocation of workers I'd like to
> propose an idea to safeguard parallelism from selectivity-estimation
> errors. Start each query (if it qualifies for the use of parallelism)
> with a minimum number of workers (say 2) irrespective of the #planned
> workers. Then as query proceeds and we find that there is more work to
> do, we allocate more workers.
>
> Let's get to the details a little, we'll have following new variables,
> - T_int - a time interval at which we'll periodically check if the
> query requires more workers,
> - work_remaining - a variable which estimates the work yet to do. This
> will use the selectivity estimates to find the total work done and the
> remaining work accordingly. Once, the actual number of rows crosses
> the estimated number of rows, take maximum possible tuples for that
> operator as the new estimate.
>
> Now, we'll check at gather, after each T_int if the work is remaining
> and allocate another 2 (say) workers. This way we'll keep on adding
> the workers in small chunks and not in one go. Thus, saving resources
> in case over-estimation is done.
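To check that I am reading the proposal correctly, here is a rough sketch of the periodic check at Gather, written as a small Python simulation rather than executor code. T_int and work_remaining come from the proposal above; GatherState, check_at_interval, step and worker_cap are names I made up for illustration, and the cap on workers is my own assumption, not something the proposal specifies.

```python
from dataclasses import dataclass


@dataclass
class GatherState:
    """Hypothetical per-Gather bookkeeping; not an actual PostgreSQL struct."""
    estimated_rows: int      # planner's row estimate for the operator
    max_possible_rows: int   # upper bound on tuples the operator can produce
    rows_done: int = 0       # rows actually processed so far
    workers: int = 2         # start with a small minimum, as proposed


def work_remaining(state: GatherState) -> int:
    """Estimate the work left, correcting the estimate once it is exceeded."""
    if state.rows_done > state.estimated_rows:
        # The selectivity estimate was too low: fall back to the maximum
        # possible tuples for this operator as the new estimate.
        state.estimated_rows = state.max_possible_rows
    return max(state.estimated_rows - state.rows_done, 0)


def check_at_interval(state: GatherState, step: int = 2, worker_cap: int = 8) -> int:
    """Meant to be called once every T_int: add `step` workers if work remains.

    `worker_cap` is an assumed upper limit so the sketch cannot grow forever.
    """
    if work_remaining(state) > 0:
        state.workers = min(state.workers + step, worker_cap)
    return state.workers
```

With an estimate of 100 rows and at most 1000 possible, a check at 50 rows done raises the worker count from 2 to 4; a check at 150 rows done replaces the estimate with 1000 and raises the count to 6; once all 1000 rows are done, no further workers are added.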
I understand your concern about selectivity estimation errors, which also affect the number of workers planned. But in that case, I would rather fix the optimizer so that it calculates the number of workers correctly. If the optimizer thinks that we should start with n workers, we probably SHOULD start with n workers.
However, errors in selectivity estimation (the root of all evil, the Achilles Heel of query optimization, according to Guy Lohman et al. :)) can always prove the optimizer wrong. In that case, +1 for your suggested approach of dynamically adding or killing workers based on the estimated work left to do.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com