Re: POC: GROUP BY optimization

Andrei Lepikhov Wed, 29 May 2024 21:23:19 -0700

On 5/29/24 19:53, Alexander Korotkov wrote:

Hi, Andrei!


Thank you for your feedback.

On Wed, May 29, 2024 at 11:08 AM Andrei Lepikhov
<[email protected]> wrote:

On 5/27/24 19:41, Alexander Korotkov wrote:

Any thoughts?

About 0001:
Having overviewed it, I don't see any issues (but I'm the author),
except grammatical ones - but I'm not a native to judge it.
Also, the sentence 'turning GROUP BY clauses  into pathkeys' is unclear
to me. It may be better to write something like:  'building pathkeys by
the list of grouping clauses'.


OK, thank you.  I'll run once again for the grammar issues.

0002:
The part under USE_ASSERT_CHECKING looks good to me. But the code in
group_keys_reorder_by_pathkeys looks suspicious: of course, we do some
doubtful work without any possible way to reproduce, but if we envision
some duplicated elements in the group_clauses, we should avoid usage of
the list_concat_unique_ptr.


As I understand Tom, there is a risk that clauses list may contain
multiple instances of equivalent SortGroupClause, not duplicate
pointers.

Maybe. I just implemented the worst-case scenario with the intention ofmaximum safety. So, it's up to you.

What's more, why do you not exit from
foreach_ptr immediately after SortGroupClause has been found? I think
the new_group_clauses should be consistent with the new_group_pathkeys.


I wanted this to be consistent with preprocess_groupclause(), where
duplicate SortGroupClause'es are grouped together.  Otherwise, we
could just delete redundant SortGroupClause'es.

Hm, preprocess_groupclause is called before the standard_qp_callbackforms the 'canonical form' of group_pathkeys and such behaviour needed.But the code that chooses the reordering strategy uses already processedgrouping clauses, where we don't have duplicates in the firstnum_groupby_pathkeys elements, do we?

0004:
I was also thinking about reintroducing the preprocess_groupclause
because with the re-arrangement of GROUP-BY clauses according to
incoming pathkeys, it doesn't make sense to have a user-defined order—at
least while cost_sort doesn't differ costs for alternative column orderings.
So, I'm okay with the code. But why don't you use the same approach with
foreach_ptr as before?


I restored the function as it was before 0452b461bc with minimal edits
to support the incremental sort.  I think it would be more valuable to
keep the difference with pg16 code small rather than refactor to
simplify existing code.

Got it

--
regards,
Andrei Lepikhov
Postgres Professional

Re: POC: GROUP BY optimization

Reply via email to