On Tue, Jul 11, 2017 at 9:02 AM, Jeff Janes <jeff.ja...@gmail.com> wrote: > If I have a slow function which is evaluated in a simple seq scan, I do not > get parallel execution, even though it would be massively useful. Unless > force_parallel_mode=ON, then I get a dummy parallel plan with one worker. > > explain select aid,slow(abalance) from pgbench_accounts;
After analysing this, I see multiple reasons of this getting not selected 1. The query is selecting all the tuple and the benefit what we are getting by parallelism is by dividing cpu_tuple_cost which is 0.01 but for each tuple sent from worker to gather there is parallel_tuple_cost which is 0.1 for each tuple. (which will be very less in case of aggregate). Maybe you can try some selecting with some condition. like below: postgres=# explain select slow(abalance) from pgbench_accounts where abalance > 1; QUERY PLAN ----------------------------------------------------------------------------------- Gather (cost=0.00..46602.33 rows=1 width=4) Workers Planned: 2 -> Parallel Seq Scan on pgbench_accounts (cost=0.00..46602.33 rows=1 width=4) Filter: (abalance > 1) 2. The second problem I am seeing is that (maybe the code problem), the "slow" function is very costly (10000000) and in apply_projection_to_path we account for this cost. But, I have noticed that for gather node also we are adding this cost to all the rows but actually, if the lower node is already doing the projection then gather node just need to send out the tuple instead of actually applying the projection. In below function, we always multiply the target->cost.per_tuple with path->rows, but in case of gather it should multiply this with subpath->rows apply_projection_to_path() .... path->startup_cost += target->cost.startup - oldcost.startup; path->total_cost += target->cost.startup - oldcost.startup + (target->cost.per_tuple - oldcost.per_tuple) * path->rows; So because of this high projection cost the seqpath and parallel path both have fuzzily same cost but seqpath is winning because it's parallel safe. > > CREATE OR REPLACE FUNCTION slow(integer) > RETURNS integer > LANGUAGE plperl > IMMUTABLE PARALLEL SAFE STRICT COST 10000000 > AS $function$ > my $thing=$_[0]; > foreach (1..1_000_000) { > $thing = sqrt($thing); > $thing *= $thing; > }; > return ($thing+0); > $function$; > > The partial path is getting added to the list of paths, it is just not > getting chosen, even if parallel_*_cost are set to zero. Why not? > > If I do an aggregate, then it does use parallel workers: > > explain select sum(slow(abalance)) from pgbench_accounts; > > It doesn't use as many as I would like, because there is a limit based on > the logarithm of the table size (I'm using -s 10 and get 3 parallel > processes) , but at least I know how to start looking into that. > > Also, how do you debug stuff like this? Are there some gdb tricks to make > this easier to introspect into the plans? > > Cheers, > > Jeff -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers