Re: On disable_cost

Heikki Linnakangas Tue, 02 Jul 2024 07:57:45 -0700

On 28/06/2024 18:46, Robert Haas wrote:

On Wed, Jun 12, 2024 at 11:35 AM Robert Haas <robertmh...@gmail.com> wrote:

Well, that didn't generate much discussion, but here I am trying
again. Here I've got patches 0001 and 0002 from my previous posting;
I've dropped 0003 and 0004 from the previous set for now so as not to
distract from the main event, but they may still be a good idea.
Instead I've got an 0003 and an 0004 that implement the "count of
disabled nodes" approach that we have discussed previously. This seems
to work fine, unlike the approaches I tried earlier. I think this is
the right direction to go, but I'd like to know what concerns people
might have.


Here is a rebased patch set, where I also fixed pgindent damage and a
couple of small oversights in 0004.

I am hoping to get these committed some time in July. So if somebody
thinks that's too soon or thinks it shouldn't happen at all, please
don't wait too long to let me know about that.


v3-0001-Remove-grotty-use-of-disable_cost-for-TID-scan-pl.patch:

+1, this seems ready for commit

v3-0002-Rationalize-behavior-of-enable_indexscan-and-enab.patch:

I fear this will break people's applications, if they are currentlyforcing a sequential scan with "set enable_indexscan=off". Now they willneed to do "set enable_indexscan=off; set enable_indexonlyscan=off" forthe same effect. Maybe it's acceptable, disabling sequential scans toforce an index scan is much more common than the other way round.


v3-0003-Treat-number-of-disabled-nodes-in-a-path-as-a-sep.patch:

@@ -1318,6 +1342,12 @@ cost_tidscan(Path *path, PlannerInfo *root,
        startup_cost += path->pathtarget->cost.startup;
        run_cost += path->pathtarget->cost.per_tuple * path->rows;

+ /*

+        * There are assertions above verifying that we only reach this function
+        * either when enable_tidscan=true or when the TID scan is the only 
legal
+        * path, so it's safe to set disabled_nodes to zero here.
+        */
+       path->disabled_nodes = 0;
        path->startup_cost = startup_cost;
        path->total_cost = startup_cost + run_cost;
 }

So if you have enable_tidscan=off, and have a query with "WHERE CURRENTOF foo" that is planned with a TID scan, we set disable_nodes = 0? Thatsounds wrong, shouldn't disable_nodes be 1 in that case? It probablycannot affect the rest of the plan, given that "WHERE CURRENT OF" isonly valid in an UPDATE or DELETE, but still. At least it deserves abetter explanation in the comment.

diff --git a/src/backend/optimizer/plan/createplan.c 
b/src/backend/optimizer/plan/createplan.c
index 6b64c4a362..20236e8c4d 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -25,6 +25,7 @@
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/print.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"


left over from debugging?

@@ -68,6 +68,15 @@ static bool pathlist_is_reparameterizable_by_child(List 
*pathlist,
 int
 compare_path_costs(Path *path1, Path *path2, CostSelector criterion)
 {
+       /* Number of disabled nodes, if different, trumps all else. */
+       if (unlikely(path1->disabled_nodes != path2->disabled_nodes))
+       {
+               if (path1->disabled_nodes < path2->disabled_nodes)
+                       return -1;
+               else
+                       return +1;
+       }
+
        if (criterion == STARTUP_COST)
        {
                if (path1->startup_cost < path2->startup_cost)

Is "unlikely()" really appropriate here (and elsewhere in the patch)? Ifyou run with enable_seqscan=off but have no indexes, you could take thatpath pretty often.

If this function needs optimizing, I'd suggest splitting it into twofunctions, one for comparing the startup cost and another for the totalcost. Almost all callers pass a constant for that argument, so theymight as well call the correct function directly and avoid the branchfor that.

@@ -658,6 +704,20 @@ add_path_precheck(RelOptInfo *parent_rel,
                Path       *old_path = (Path *) lfirst(p1);
                PathKeysComparison keyscmp;

+ /*

+                * Since the pathlist is sorted by disabled_nodes and then by
+                * total_cost, we can stop looking once we reach a path with 
more
+                * disabled nodes, or the same number of disabled nodes plus a
+                * total_cost larger than the new path's.
+                */
+               if (unlikely(old_path->disabled_nodes != disabled_nodes))
+               {
+                       if (disabled_nodes < old_path->disabled_nodes)
+                               break;
+               }
+               else if (total_cost <= old_path->total_cost * STD_FUZZ_FACTOR)
+                       break;
+
                /*
                 * We are looking for an old_path with the same 
parameterization (and
                 * by assumption the same rowcount) that dominates the new path 
on
@@ -666,39 +726,27 @@ add_path_precheck(RelOptInfo *parent_rel,
                 *
                 * Cost comparisons here should match 
compare_path_costs_fuzzily.
                 */
-               if (total_cost > old_path->total_cost * STD_FUZZ_FACTOR)
+               /* new path can win on startup cost only if consider_startup */
+               if (startup_cost > old_path->startup_cost * STD_FUZZ_FACTOR ||
+                       !consider_startup)
                {

The "Cost comparisons here should match compare_path_costs_fuzzily"comment also applies to the check on total_cost that you moved up. Maybemove up the comment to the beginning of the loop.


v3-0004-Show-number-of-disabled-nodes-in-EXPLAIN-ANALYZE-.patch:

It's surprising that the "Disable Nodes" is printed even with the COSTSOFF option. It's handy for our regression tests, it's good to print themthere, but it feels wrong.

Could we cram it into the "cost=... rows=..." part? And perhaps a markerthat a node was disabled would be more user friendly than showing thecumulative count? Something like:


postgres=# set enable_material=off;
SET
postgres=# set enable_seqscan=off;
SET
postgres=# set enable_bitmapscan=off;
SET
postgres=# explain select * from foo, bar;

QUERY PLAN

------------------------------------------------------------------------------------
 Nested Loop  (cost=0.15..155632.40 rows=6502500 width=8)

-> Index Only Scan using foo_i_idx on foo (cost=0.15..82.41rows=2550 width=4)

   ->  Seq Scan on bar  (cost=0.00..35.50 (disabled) rows=2550 width=4)
(5 rows)

--
Heikki Linnakangas
Neon (https://neon.tech)

Re: On disable_cost

Reply via email to