On Thu, 22 Aug 2024 at 15:02, Ranier Vilela <ranier...@gmail.com> wrote:

> Hi.
>
> Em qui., 22 de ago. de 2024 às 04:34, Richard Guo <guofengli...@gmail.com>
> escreveu:
>
>> I ran into a query plan where the Result node seems redundant to me:
>>
>> create table t (a int, b int, c int);
>> insert into t select i%10, i%10, i%10 from generate_series(1,100)i;
>> create index on t (a, b);
>> analyze t;
>>
>> set enable_hashagg to off;
>> set enable_seqscan to off;
>>
>> explain (verbose, costs off)
>> select distinct b, a from t order by a, b;
>>                        QUERY PLAN
>> ---------------------------------------------------------
>>  Result
>>    Output: b, a
>>    ->  Unique
>>          Output: a, b
>>          ->  Index Only Scan using t_a_b_idx on public.t
>>                Output: a, b
>> (6 rows)
>>
>> What I expect is that both the Scan node and the Unique node output
>> 'b, a', and we do not need an additional projection step, something
>> like:
>>
>> explain (verbose, costs off)
>> select distinct b, a from t order by a, b;
>>                     QUERY PLAN
>> ---------------------------------------------------
>>  Unique
>>    Output: b, a
>>    ->  Index Only Scan using t_a_b_idx on public.t
>>          Output: b, a
>> (4 rows)
>>
>> I looked into this a little bit and found that in function
>> create_ordered_paths, we decide whether a projection step is needed
>> based on a simple pointer comparison between sorted_path->pathtarget
>> and final_target.
>>
>>     /* Add projection step if needed */
>>     if (sorted_path->pathtarget != target)
>>         sorted_path = apply_projection_to_path(root, ordered_rel,
>>                                                sorted_path, target);
>>
>> This does not seem right to me, as PathTargets are not canonical, so
>> we cannot guarantee that two identical PathTargets will have the same
>> pointer.  Actually, for the query above, the two PathTargets are
>> identical but have different pointers.
>>
> Could memcmp solve this?
>
> With patch attached, using memcmp to compare the pointers.
>
> select distinct b, a from t order by a, b;
>             QUERY PLAN
> ----------------------------------
>  Sort
>    Output: b, a
>    Sort Key: t.a, t.b
>    ->  HashAggregate
>          Output: b, a
>          Group Key: t.a, t.b
>          ->  Seq Scan on public.t
>                Output: a, b, c
> (8 rows)
>
> attached patch for consideration.
>
> best regards,
> Ranier Vilela
>

+1 for the idea of removing this redundant node.
I had a look in this patch, and I was wondering if we still need
sorted_path->pathtarget != target in the condition.

Apart from that,
-               if (sorted_path->pathtarget != target)
+               if (sorted_path->pathtarget != target &&
+                   memcmp(sorted_path->pathtarget, target,
sizeof(PathTarget)) != 0)
An extra space is there, please fix it.

Some regression tests should be added for this.

-- 
Regards,
Rafia Sabih

Reply via email to