Your email client seems to be adding additional vertical space to your emails. I've removed the additional newlines in the quotes. Are you able to fix the client so it does not do that?
On Sun, 8 Jan 2023 at 00:10, Ankit Kumar Pandey <itsanki...@gmail.com> wrote: > > On 07/01/23 09:58, David Rowley wrote: > > You also don't seem to be considering the fact that the query might > > have a DISTINCT clause. > > Major reason for this was that I am not exactly aware of what distinct > clause means (especially in > > context of window functions) and how it is different from other > sortClauses (partition, order by, group). I'm talking about the query's DISTINCT clause. i.e SELECT DISTINCT. If you look in the grouping_planner() function, you'll see that create_distinct_paths() is called between create_window_paths() and create_ordered_paths(). > Yes, this is a fair point. Multiple sort is actually beneficial in cases > like this, perhaps limits clause and runCondition should be no op too? I'm not sure what you mean by "no op". Do you mean not apply the optimization? > > I think the patch should also be using pathkeys_contained_in() and > > Lists of pathkeys rather than concatenating lists of SortGroupClauses > > together. That should allow things to work correctly when a given > > pathkey has become redundant due to either duplication or a Const in > > the Eclass. > > Make sense, I actually duplicated that logic from > make_pathkeys_for_window. We should make this changes there as well because > if we have SELECT rank() OVER (PARTITION BY a ORDER BY a) > (weird example but you get the idea), it leads to duplicates in > window_sortclauses. It won't lead to duplicate pathkeys. Look in make_pathkeys_for_sortclauses() and what pathkey_is_redundant() does. Notice that it checks if the pathkey already exists in the list before appending. > Agree with runConditions part but for limit clause, row reduction happens > at the last, so whether we use patched version or master version, > none of sorts would benefit/degrade from that, right? Maybe you're right. Just be aware that a sort done for a query with an ORDER BY LIMIT will perform a top-n sort. top-n sorts only need to store the top-n tuples and that can significantly reduce the memory required for sorting perhaps resulting in the sort fitting in memory rather than spilling out to disk. You might want to test this by having the leading sort column as an INT, and then the 2nd one as a long text column of maybe around two kilobytes. Make all the leading column values the same so that the comparison for the text column is always performed. Make the LIMIT small compared to the total number of rows, that should test the worse case. Check the performance with and without the limitCount != NULL part of the patch that disables the optimization for LIMIT. > Is it okay > to add comments in test cases? I don't see it much on existing cases > so kind of reluctant to add but it makes intentions much more clear. I think tests should always have a comment to state what they're testing. Not many people seem to do that, unfortunately. The problem with not stating what the test is testing is that if, for example, the test is checking that the EXPLAIN output is showing a Sort, what if at some point in the future someone adjusts some costing code and the plan changes to an Index Scan. If there's no comment to state that we're looking for a Sort plan, then the author of the patch that's adjusting the costs might just think it's ok to change the expected plan to an Index Scan. I've seen this problem occur even when the comments *do* exist. There's just about no hope of such a test continuing to do what it's meant to if the comments don't exist. David