On Thu, 22 Aug 2024 at 15:02, Ranier Vilela <ranier...@gmail.com> wrote:
> Hi. > > Em qui., 22 de ago. de 2024 às 04:34, Richard Guo <guofengli...@gmail.com> > escreveu: > >> I ran into a query plan where the Result node seems redundant to me: >> >> create table t (a int, b int, c int); >> insert into t select i%10, i%10, i%10 from generate_series(1,100)i; >> create index on t (a, b); >> analyze t; >> >> set enable_hashagg to off; >> set enable_seqscan to off; >> >> explain (verbose, costs off) >> select distinct b, a from t order by a, b; >> QUERY PLAN >> --------------------------------------------------------- >> Result >> Output: b, a >> -> Unique >> Output: a, b >> -> Index Only Scan using t_a_b_idx on public.t >> Output: a, b >> (6 rows) >> >> What I expect is that both the Scan node and the Unique node output >> 'b, a', and we do not need an additional projection step, something >> like: >> >> explain (verbose, costs off) >> select distinct b, a from t order by a, b; >> QUERY PLAN >> --------------------------------------------------- >> Unique >> Output: b, a >> -> Index Only Scan using t_a_b_idx on public.t >> Output: b, a >> (4 rows) >> >> I looked into this a little bit and found that in function >> create_ordered_paths, we decide whether a projection step is needed >> based on a simple pointer comparison between sorted_path->pathtarget >> and final_target. >> >> /* Add projection step if needed */ >> if (sorted_path->pathtarget != target) >> sorted_path = apply_projection_to_path(root, ordered_rel, >> sorted_path, target); >> >> This does not seem right to me, as PathTargets are not canonical, so >> we cannot guarantee that two identical PathTargets will have the same >> pointer. Actually, for the query above, the two PathTargets are >> identical but have different pointers. >> > Could memcmp solve this? > > With patch attached, using memcmp to compare the pointers. > > select distinct b, a from t order by a, b; > QUERY PLAN > ---------------------------------- > Sort > Output: b, a > Sort Key: t.a, t.b > -> HashAggregate > Output: b, a > Group Key: t.a, t.b > -> Seq Scan on public.t > Output: a, b, c > (8 rows) > > attached patch for consideration. > > best regards, > Ranier Vilela > +1 for the idea of removing this redundant node. I had a look in this patch, and I was wondering if we still need sorted_path->pathtarget != target in the condition. Apart from that, - if (sorted_path->pathtarget != target) + if (sorted_path->pathtarget != target && + memcmp(sorted_path->pathtarget, target, sizeof(PathTarget)) != 0) An extra space is there, please fix it. Some regression tests should be added for this. -- Regards, Rafia Sabih