Re: [HACKERS] why not parallel seq scan for slow functions

Amit Kapila Fri, 16 Feb 2018 18:48:06 -0800

On Fri, Feb 16, 2018 at 9:29 AM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> On Thu, Feb 15, 2018 at 7:47 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> On Thu, Feb 15, 2018 at 4:18 PM, Ashutosh Bapat
>> <ashutosh.ba...@enterprisedb.com> wrote:
>>> I happened to look at the patch for something else. But here are some
>>> comments. If any of those have been already discussed, please feel
>>> free to ignore. I have gone through the thread cursorily, so I might
>>> have missed something.
>>>
>>> In grouping_planner() we call query_planner() first which builds the
>>> join tree and creates paths, then calculate the target for scan/join
>>> rel which is applied on the topmost scan join rel. I am wondering
>>> whether we can reverse this order to calculate the target list of
>>> scan/join relation and pass it to standard_join_search() (or the hook)
>>> through query_planner().
>>>
>>
>> I think the reason for doing in that order is that we can't compute
>> target's width till after query_planner().  See the below note in
>> code:
>>
>> /*
>> * Convert the query's result tlist into PathTarget format.
>> *
>> * Note: it's desirable to not do this till after query_planner(),
>> * because the target width estimates can use per-Var width numbers
>> * that were obtained within query_planner().
>> */
>> final_target = create_pathtarget(root, tlist);
>>
>> Now, I think we can try to juggle the code in a way that the width can
>> be computed later, but the rest can be done earlier.
>
> /* Convenience macro to get a PathTarget with valid cost/width fields */
> #define create_pathtarget(root, tlist) \
>     set_pathtarget_cost_width(root, make_pathtarget_from_tlist(tlist))
> create_pathtarget already works that way. We will need to split it.
>
> Create the Pathtarget without widths. Apply width estimates once we
> know the width of Vars somewhere here in query_planner()
>     /*
>      * We should now have size estimates for every actual table involved in
>      * the query, and we also know which if any have been deleted from the
>      * query by join removal; so we can compute total_table_pages.
>      *
>      * Note that appendrels are not double-counted here, even though we don't
>      * bother to distinguish RelOptInfos for appendrel parents, because the
>      * parents will still have size zero.
>      *
> The next step is building the join tree. Set the pathtarget before that.
>
>> However, I think
>> that will be somewhat major change
>
> I agree.
>
>>  still won't address all kind of
>> cases (like for ordered paths) unless we can try to get all kind of
>> targets pushed down in the call stack.
>
> I didn't understand that.
>


The places where we use a target different than the target which is
pushed via query planner can cause a similar problem (ex. see the
usage of adjust_paths_for_srfs) because the cost of that target
wouldn't be taken into consideration for Gather's costing.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] why not parallel seq scan for slow functions

Reply via email to