Re: [HACKERS] why not parallel seq scan for slow functions

Amit Kapila Tue, 08 Aug 2017 00:51:07 -0700

On Wed, Aug 2, 2017 at 11:12 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Wed, Jul 12, 2017 at 7:08 PM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
>>
>> On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> > On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila <amit.kapil...@gmail.com>
>> > wrote:
>> >>
>> >> On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff.ja...@gmail.com>
>> >> wrote:
>> >> > On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbal...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> So because of this high projection cost the seqpath and parallel
>> >> >> path both have fuzzily same cost but seqpath is winning because
>> >> >> it's parallel safe.
>> >> >
>> >> >
>> >> > I think you are correct.  However, unless parallel_tuple_cost is set
>> >> > very low, apply_projection_to_path never gets called with the Gather
>> >> > path as an argument.  It gets ruled out at some earlier stage,
>> >> > presumably because it assumes the projection step cannot make it
>> >> > win if it is already behind by enough.
>> >> >
>> >>
>> >> I think that is genuine because tuple communication cost is very high.
>> >
>> >
>> > Sorry, I don't know which you think is genuine, the early pruning or my
>> > complaint about the early pruning.
>> >
>>
>> Early pruning.  See, currently, we don't have a way to maintain both
>> parallel and non-parallel paths until a later stage and then decide which
>> one is better. If we want to maintain both parallel and non-parallel
>> paths, it can increase planning cost substantially in the case of
>> joins.  Now, surely it can have benefits in many cases, so it is a
>> worthwhile direction to pursue.
>
>
> If I understand it correctly, we have a way; it just can lead to an
> exponential explosion problem, so we are afraid to use it, correct?  If I
> just lobotomize the path domination code (make pathnode.c line 466 always
> test false)
>
>                 if (JJ_all_paths==0 && costcmp != COSTS_DIFFERENT)
>
> Then it keeps the parallel plan and later chooses to use it (after applying
> your other patch in this thread) as the overall best plan.  It doesn't even
> slow down "make installcheck-parallel" by very much, which I guess just
> means the regression tests don't have a lot of complex joins.
>
> But what is an acceptable solution?  Is there a heuristic for when retaining
> a parallel path could be helpful, the same way there is for fast-start
> paths?  It seems like the best thing would be to include the evaluation
> costs in the first place at this step.
>
> Why is the path-cost domination code run before the cost of the function
> evaluation is included?


Because the function evaluation is part of the target list, and we create
the path target only after the base paths have been created (see the call
to create_pathtarget at planner.c:1696).
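To make the ordering problem concrete, here is a back-of-the-envelope sketch with made-up numbers. Apart from the default values of parallel_setup_cost (1000) and parallel_tuple_cost (0.1), every constant is an assumption, and serial_cost/gather_cost are hypothetical helpers, not the real formulas in costsize.c. It assumes the slow function is parallel safe, so the target list can be evaluated in the workers, and that the work splits evenly over three processes (the real planner computes its parallel divisor differently).

```c
/* Illustrative numbers only; NOT the planner's real cost model. */
#define ROWS                100000.0
#define SERIAL_SCAN_COST    2000.0  /* assumed cost of a plain seq scan */
#define PARALLEL_SHARE      3.0     /* assumed: 2 workers + leader split work evenly */
#define PARALLEL_SETUP_COST 1000.0  /* default parallel_setup_cost */
#define PARALLEL_TUPLE_COST 0.1     /* default parallel_tuple_cost */

/* Serial seq scan that evaluates the target list once per row. */
static double
serial_cost(double per_row_eval)
{
    return SERIAL_SCAN_COST + ROWS * per_row_eval;
}

/*
 * Gather over a parallel seq scan: scanning and target-list evaluation are
 * shared among the processes, but every row still pays the transfer cost.
 */
static double
gather_cost(double per_row_eval)
{
    double worker_cost = (SERIAL_SCAN_COST + ROWS * per_row_eval) / PARALLEL_SHARE;
    double transfer_cost = PARALLEL_SETUP_COST + PARALLEL_TUPLE_COST * ROWS;

    return worker_cost + transfer_cost;
}
```

With these numbers, ignoring the target list (per_row_eval = 0.0) gives serial_cost = 2000 against gather_cost of about 11666.67, so the Gather path looks hopeless and gets pruned. With an assumed evaluation cost of 1.0 per row, serial_cost becomes 102000 while gather_cost is 45000, so the parallel path would have won had it survived until the target cost was added.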

>  Is that because the information needed to compute
> it is not available at that point,

Right.

I see two ways to include the cost of the target list for parallel
paths before rejecting them:
(a) Don't reject parallel paths (Gather/GatherMerge) during add_path.
This risks a path explosion.
(b) For parallel paths, try to identify whether the path has a costly
target list (maybe just check whether the target list has anything
other than Vars) and use that as a heuristic to decide whether a
parallel path can be retained.
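A minimal sketch of how the two options could plug into a cost-domination check. Everything here is hypothetical: the Expr/TargetList types, should_keep_path, the keep_all_parallel flag and the 1.01 fuzz factor are illustrative assumptions, not PostgreSQL's add_path() or its data structures, and only total cost is compared.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for planner structures;
 * these are NOT PostgreSQL's real Node, List, or PathTarget types. */
typedef enum { EXPR_VAR, EXPR_CONST, EXPR_FUNC } ExprKind;

typedef struct
{
    ExprKind kind;
} Expr;

typedef struct
{
    const Expr *items;
    size_t      length;
} TargetList;

/* Assumed fuzz factor: costs within 1% of the best count as "not worse". */
#define FUZZ_FACTOR 1.01

/* Heuristic (b): a target list is "costly" if it has anything other than Vars. */
static bool
target_list_is_costly(const TargetList *tlist)
{
    for (size_t i = 0; i < tlist->length; i++)
    {
        if (tlist->items[i].kind != EXPR_VAR)
            return true;
    }
    return false;
}

/*
 * Should a new path be kept, given the cheapest existing total cost?
 * Only total cost is compared; the real add_path() also weighs startup
 * cost, pathkeys, rows and parameterization.
 */
static bool
should_keep_path(double path_cost, double best_cost, bool is_gather_path,
                 const TargetList *tlist, bool keep_all_parallel)
{
    if (path_cost <= best_cost * FUZZ_FACTOR)
        return true;                        /* not dominated on cost */
    if (!is_gather_path)
        return false;                       /* ordinary dominated path: prune */
    if (keep_all_parallel)
        return true;                        /* (a): never prune Gather paths */
    return target_list_is_costly(tlist);    /* (b): keep if projection may be expensive */
}
```

Note that the literal "anything other than Vars" rule also flags a Const, so it is a crude filter; a refinement could compare an estimated per-row evaluation cost against a threshold instead.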

I think the preference will be to do something along the lines of
approach (b), but I am not sure whether we can do that easily.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
