Hi,

currently partial indexes end up not using index only scans in most cases, because check_index_only() is overly conservative, as explained in this comment:

 * XXX this is overly conservative for partial indexes, since we will
 * consider attributes involved in the index predicate as required even
 * though the predicate won't need to be checked at runtime. (The same
 * is true for attributes used only in index quals, if we are certain
 * that the index is not lossy.)  However, it would be quite expensive
 * to determine that accurately at this point, so for now we take the
 * easy way out.

In other words, unless you include columns from the index predicate to the index, the planner will decide index only scans are not possible. Which is a bit unfortunate, because those columns are not needed at runtime, and will only increase the index size (and the main benefit of partial indexes is size reduction).

The attached patch fixes this by only considering clauses that are not implied by the index predicate. The effect is simple:

    create table t as select i as a, i as b from
                      generate_series(1,10000000) s(i);

    create index tidx_partial on t(b) where a > 1000 and a < 2000;

    vacuum freeze t;
    analyze t;

explain analyze select count(b) from t where a > 1000 and a < 2000;

                                QUERY PLAN
-----------------------------------------------------------------------
 Aggregate  (cost=39.44..39.45 rows=1 width=4)
            (actual time=8.350..8.354 rows=1 loops=1)
   ->  Index Scan using tidx_partial on t
            (cost=0.28..37.98 rows=585 width=4)
            (actual time=0.034..4.368 rows=999 loops=1)
 Planning time: 0.197 ms
 Execution time: 8.441 ms
(4 rows)

explain analyze select count(b) from t where a > 1000 and a < 2000;

                                QUERY PLAN
-----------------------------------------------------------------------
 Aggregate  (cost=33.44..33.45 rows=1 width=4)
            (actual time=8.019..8.023 rows=1 loops=1)
   ->  Index Only Scan using tidx_partial on t
            (cost=0.28..31.98 rows=585 width=4)
            (actual time=0.036..4.165 rows=999 loops=1)
         Heap Fetches: 0
 Planning time: 0.188 ms
 Execution time: 8.106 ms
(5 rows)


I've done a bunch of tests, and I do see small (hardly noticeable) increase in planning time with long list of WHERE clauses, because all those need to be checked against the index predicate. Not sure if this is what's meant by 'quite expensive' in the comment. Moreover, this was more than compensated by the IOS benefits (even with everything in RAM).

But maybe it's possible to fix that somehow? For example, we're certainly doing those checks elsewhere when deciding which clauses need to be evaluated at run-time, so maybe we could cache that somehow?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 9da5444..113747c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1798,13 +1798,13 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 	 * Check that all needed attributes of the relation are available from the
 	 * index.
 	 *
-	 * XXX this is overly conservative for partial indexes, since we will
-	 * consider attributes involved in the index predicate as required even
-	 * though the predicate won't need to be checked at runtime.  (The same is
-	 * true for attributes used only in index quals, if we are certain that
-	 * the index is not lossy.)  However, it would be quite expensive to
-	 * determine that accurately at this point, so for now we take the easy
-	 * way out.
+	 * For partial indexes we won't consider attributes involved in clauses
+	 * implied by the index predicate, as those won't be needed at runtime.
+	 *
+	 * XXX The same is true for attributes used only in index quals, if we
+	 * are certain that the index is not lossy. However, it would be quite
+	 * expensive to determine that accurately at this point, so for now we
+	 * take the easy way out.
 	 */
 
 	/*
@@ -1819,6 +1819,28 @@ check_index_only(RelOptInfo *rel, IndexOptInfo *index)
 	{
 		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
 
+		/*
+		 * If the index is partial, we won't consider the clauses that are
+		 * implied by the index predicate (those are not needed at runtime).
+		 */
+		if (index->indpred != NIL)
+		{
+			bool	implied = false;
+			List   *clauses  = NIL;
+
+			/* need a list for the 'implied_by' call */
+			clauses = lappend(clauses, (Node *) rinfo->clause);
+
+			implied = predicate_implied_by(clauses, index->indpred);
+
+			/* we generally don't free memory, but well ... */
+			list_free(clauses);
+
+			/* if the clause is implied by index predicate, skip it */
+			if (implied)
+				continue;
+		}
+
 		pull_varattnos((Node *) rinfo->clause, rel->relid, &attrs_used);
 	}
 
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to