Andrei Lepikhov писал(а) 2025-04-24 16:01:
On 3/28/25 09:19, Alexander Pyhalov wrote:
Andy Fan писал(а) 2024-10-17 03:33:
I've updated patch. One more interesting case which we found - when
fractional path is selected, it still can be more expensive than
sorted cheapest total path (as we look only on indexes whith necessary
pathkeys, not on indexes which allow efficiently fetch data).
So far couldn't find artificial example, but we've seen inadequate
index selection due to this issue - instead of using index suited for
filters in where, index, suitable for sorting was selected as one
having the cheapest fractional cost.
I think it is necessary to generalise the approach a little.
Each MergeAppend subpath candidate that fits pathkeys should be
compared to the overall-optimal path + explicit Sort node. Let's label
this two-node composition as base_path. There are three cases exist:
startup-optimal, total-optimal and fractional-optimal.
In the startup case, base_path should use cheapest_startup_path, the
total-optimal case - cheapest_total_path and for a fractional case, we
may employ the get_cheapest_fractional_path routine to detect proper
base_path.
It may provide a more effective plan either in full, fractional and
partial scan cases:
1. The Sort node may be pushed down to subpaths under a parallel or
async Append.
2. When a minor set of subpaths doesn't have a proper index, and it is
profitable to sort them instead of switching to plain Append.
In general, analysing the regression tests changed by this code, I see
that the cost model prefers a series of small sortings instead of a
single massive one. May be it will show some profit for execution time.
I am not afraid of any palpable planning overhead here because we just
do cheap cost estimation and comparison operations that don't need
additional memory allocations. The caller is responsible for building a
proper Sort node if this method is chosen as optimal.
In the attachment, see the patch written according to the idea. There
are I introduced two new routines:
get_cheapest_path_for_pathkeys_ext
get_cheapest_fractional_path_for_pathkeys_ext
Hi. I'm a bit confused that
get_cheapest_fractional_path_for_pathkeys_ext() looks only on sorting
cheapest fractional path, and get_cheapest_path_for_pathkeys_ext() in
STARTUP_COST case looks only on sorting cheapest_startup_path.
Usually, sorted cheapest_total_path will be cheaper than sorted
fractional/startup path at least by startup cost (as after sorting it
includes total_cost of input path). But we ignore this case when
selecting cheapest_startup and cheapest_fractional subpaths. As result
selected cheapest_startup and cheapest_fractional can be not cheapest
for startup or selecting a fraction of rows.
Consider the partition with the following access paths:
1) cheapest_startup without required pathkeys:
startup_cost: 0.42
total_cost: 4004
2) some index_path with required pathkeys:
startup_cost: 6.6
total_cost: 2000
3) cheapest_total_path:
startup_cost: 0.42
total_cost: 3.48
Here, when selecting cheapest startup subpath we'll compare costs of
index path (2) and sorted cheapest_startup (1), but will ignore sorted
cheapest_total_path (3), despite the fact that it really can be the
cheapest startup path, providing required sorting order.
--
Best regards,
Alexander Pyhalov,
Postgres Professional