Hello everyone,

I made a new patch based on the previous comments. It is simpler now, adding only a few checks to the bitmap heap scan node: when the target list of the bitmap heap scan is empty, there is no filter, the bitmap page generated by the index scan is exact, and the corresponding heap page is visible to all transactions, we don't fetch it.
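
For concreteness, this is the kind of query the patch targets, using the pglist table and its pglist_fts_rum index from the benchmark at the end of this mail (a rough sketch only; the heap pages actually have to be all-visible, e.g. after a VACUUM, for the skip to kick in):

    -- count(*) with no other quals and no columns needed from the heap
    VACUUM pglist;  -- set the visibility map bits the patch relies on
    EXPLAIN ANALYZE
    SELECT count(*) FROM pglist
    WHERE fts @@ to_tsquery('simple', 'tom & lane');
    -- with the patch, the Bitmap Heap Scan should report something like
    -- "Heap Fetches: 0" for exact, all-visible pages (see the index-only
    -- runs in the benchmark below)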

Performance is better than with the previous patch because it can now leverage the parallel heap scan logic. A simple benchmark is attached below: this patch is more than ten times faster on a frequent search term and roughly twice as fast on an infrequent one.

Still, there is one thing that bothers me. I use an empty targetlist as the marker that tuples do not have to be fetched. Because of that, I have to make sure that use_physical_tlist() doesn't touch empty tlists. Consequently, if count(*) sits on top of a subquery, that subquery has to project and cannot be removed (see trivial_subqueryscan). There is such a query in the select_distinct regression test: "select count(*) from (select distinct two, four, two from tenk1);". For that particular query it shouldn't matter much, so I changed the expected output, but the broader implications of this escape me at the moment.
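
To make that interaction concrete, this is the regression query in question; with the patch, the subquery keeps its (non-empty) targetlist and therefore survives as a Subquery Scan node, as the updated select_distinct expected output further down shows:

    EXPLAIN (VERBOSE, COSTS OFF)
    SELECT count(*) FROM (SELECT DISTINCT two, four, two FROM tenk1) ss;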

The cost estimation is very much simplified for now: I just halve the number of pages to be fetched. The most important missing part is checking whether there are any quals that are not checked by the index: if there are, we have to fetch all the tuples anyway. Finding non-index qualifications is somewhat convoluted for a bitmap index scan tree involving multiple indexes, so I haven't implemented it yet. We could also consider estimating the number of lossy pages in the TID bitmap given the current work_mem setting.
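
As a sketch of the case the current estimate gets wrong (the body_text column is made up; any column not covered by the index would do): the extra condition ends up as a Filter on the Bitmap Heap Scan, so every heap page has to be fetched anyway, yet the cost model still halves pages_fetched because the pathtarget is empty.

    SELECT count(*) FROM pglist
    WHERE fts @@ to_tsquery('simple', 'tom & lane')
      AND body_text LIKE '%regression%';  -- body_text is hypothetical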

I'll be glad to hear your thoughts on this.
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 79f534e4e9..d7ea6f6929 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -39,6 +39,7 @@
 
 #include "access/relscan.h"
 #include "access/transam.h"
+#include "access/visibilitymap.h"
 #include "executor/execdebug.h"
 #include "executor/nodeBitmapHeapscan.h"
 #include "miscadmin.h"
@@ -225,9 +226,25 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			}
 
 			/*
-			 * Fetch the current heap page and identify candidate tuples.
+			 * If we don't need the tuple contents and are only counting them,
+			 * we can skip fetching the page when the bitmap doesn't need
+			 * rechecking and the heap page is marked all-visible.
 			 */
-			bitgetpage(scan, tbmres);
+			node->bhs_nofetch = !tbmres->recheck
+				&& node->ss.ps.qual == NULL
+				&& node->ss.ps.plan->targetlist == NIL
+				&& VM_ALL_VISIBLE(node->ss.ss_currentRelation, tbmres->blockno,
+								  &node->bhs_vmbuffer);
+
+			if (node->bhs_nofetch)
+				scan->rs_ntuples = tbmres->ntuples;
+			else
+			{
+				/*
+				 * Fetch the current heap page and identify candidate tuples.
+				 */
+				bitgetpage(scan, tbmres);
+			}
 
 			if (tbmres->ntuples >= 0)
 				node->exact_pages++;
@@ -289,45 +306,58 @@ BitmapHeapNext(BitmapHeapScanState *node)
 		 */
 		BitmapPrefetch(node, scan);
 
-		/*
-		 * Okay to fetch the tuple
-		 */
-		targoffset = scan->rs_vistuples[scan->rs_cindex];
-		dp = (Page) BufferGetPage(scan->rs_cbuf);
-		lp = PageGetItemId(dp, targoffset);
-		Assert(ItemIdIsNormal(lp));
 
-		scan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
-		scan->rs_ctup.t_len = ItemIdGetLength(lp);
-		scan->rs_ctup.t_tableOid = scan->rs_rd->rd_id;
-		ItemPointerSet(&scan->rs_ctup.t_self, tbmres->blockno, targoffset);
+		if (node->bhs_nofetch)
+		{
+			/*
+			 * If we don't have to fetch the tuple, just return a dummy
+			 * tuple with no valid attributes.
+			 */
+			slot->tts_isempty = false;
+			slot->tts_nvalid = 0;
+		}
+		else
+		{
+			/*
+			 * Okay to fetch the tuple
+			 */
+			targoffset = scan->rs_vistuples[scan->rs_cindex];
+			dp = (Page) BufferGetPage(scan->rs_cbuf);
+			lp = PageGetItemId(dp, targoffset);
+			Assert(ItemIdIsNormal(lp));
 
-		pgstat_count_heap_fetch(scan->rs_rd);
+			scan->rs_ctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
+			scan->rs_ctup.t_len = ItemIdGetLength(lp);
+			scan->rs_ctup.t_tableOid = scan->rs_rd->rd_id;
+			ItemPointerSet(&scan->rs_ctup.t_self, tbmres->blockno, targoffset);
 
-		/*
-		 * Set up the result slot to point to this tuple. Note that the slot
-		 * acquires a pin on the buffer.
-		 */
-		ExecStoreTuple(&scan->rs_ctup,
-					   slot,
-					   scan->rs_cbuf,
-					   false);
+			pgstat_count_heap_fetch(scan->rs_rd);
 
-		/*
-		 * If we are using lossy info, we have to recheck the qual conditions
-		 * at every tuple.
-		 */
-		if (tbmres->recheck)
-		{
-			econtext->ecxt_scantuple = slot;
-			ResetExprContext(econtext);
+			/*
+			 * Set up the result slot to point to this tuple. Note that the slot
+			 * acquires a pin on the buffer.
+			 */
+			ExecStoreTuple(&scan->rs_ctup,
+						   slot,
+						   scan->rs_cbuf,
+						   false);
 
-			if (!ExecQual(node->bitmapqualorig, econtext))
+			/*
+			 * If we are using lossy info, we have to recheck the qual conditions
+			 * at every tuple.
+			 */
+			if (tbmres->recheck)
 			{
-				/* Fails recheck, so drop it and loop back for another */
-				InstrCountFiltered2(node, 1);
-				ExecClearTuple(slot);
-				continue;
+				econtext->ecxt_scantuple = slot;
+				ResetExprContext(econtext);
+
+				if (!ExecQual(node->bitmapqualorig, econtext))
+				{
+					/* Fails recheck, so drop it and loop back for another */
+					InstrCountFiltered2(node, 1);
+					ExecClearTuple(slot);
+					continue;
+				}
 			}
 		}
 
@@ -573,6 +603,15 @@ BitmapPrefetch(BitmapHeapScanState *node, HeapScanDesc scan)
 #ifdef USE_PREFETCH
 	ParallelBitmapHeapState *pstate = node->pstate;
 
+	if (node->bhs_nofetch)
+	{
+		/*
+		 * If we did not need to fetch the current page,
+		 * we probably will not need to fetch the next.
+		 */
+		return;
+	}
+
 	if (pstate == NULL)
 	{
 		TBMIterator *prefetch_iterator = node->prefetch_iterator;
@@ -778,6 +817,11 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
 	if (node->shared_prefetch_iterator)
 		tbm_end_shared_iterate(node->shared_prefetch_iterator);
 
+	if (node->bhs_vmbuffer != InvalidBuffer)
+	{
+		ReleaseBuffer(node->bhs_vmbuffer);
+	}
+
 	/*
 	 * close heap scan
 	 */
@@ -827,6 +871,8 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 	scanstate->prefetch_iterator = NULL;
 	scanstate->prefetch_pages = 0;
 	scanstate->prefetch_target = 0;
+	scanstate->bhs_vmbuffer = InvalidBuffer;
+	scanstate->bhs_nofetch = false;
 	/* may be updated below */
 	scanstate->prefetch_maximum = target_prefetch_pages;
 	scanstate->pscan_len = 0;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b35acb7bdc..13fb8d3153 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -982,6 +982,14 @@ cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
 	else
 		cost_per_page = spc_random_page_cost;
 
+	/*
+	 * Bitmap heap scan tries not to fetch the heap pages if the targetlist
+	 * is empty. It should be reflected in the cost somehow; for now, just
+	 * halve the number of pages to be fetched.
+	 */
+	if (path->pathtarget->exprs == NIL)
+		pages_fetched = pages_fetched / 2;
+
 	run_cost += pages_fetched * cost_per_page;
 
 	/*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 5c934f223d..0874871b61 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -778,6 +778,13 @@ use_physical_tlist(PlannerInfo *root, Path *path, int flags)
 		return false;
 
 	/*
+	 * If the tlist is empty, keep it as is, so that the
+	 * bitmap heap scan could try skipping heap page fetches.
+	 */
+	if (path->pathtarget->exprs == NIL)
+		return false;
+
+	/*
 	 * We can do this for real relation scans, subquery scans, function scans,
 	 * tablefunc scans, values scans, and CTE scans (but not for, eg, joins).
 	 */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 35c28a6143..6f567f25e8 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1334,6 +1334,8 @@ typedef struct ParallelBitmapHeapState
  *		shared_tbmiterator	   shared iterator
  *		shared_prefetch_iterator shared iterator for prefetching
  *		pstate			   shared state for parallel bitmap scan
+ * 		bhs_vmbuffer	   buffer for visibility map lookups
+ * 		bhs_nofetch		   can we not fetch the heap page and return an empty tuple?
  * ----------------
  */
 typedef struct BitmapHeapScanState
@@ -1354,6 +1356,8 @@ typedef struct BitmapHeapScanState
 	TBMSharedIterator *shared_tbmiterator;
 	TBMSharedIterator *shared_prefetch_iterator;
 	ParallelBitmapHeapState *pstate;
+	Buffer bhs_vmbuffer;
+	bool bhs_nofetch;
 } BitmapHeapScanState;
 
 /* ----------------
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9f4c88dab4..2e82ab1ec0 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3746,7 +3746,6 @@ where tt1.f1 = ss1.c0;
                      Output: tt1.f1
                      Filter: (tt1.f1 = 'foo'::text)
                ->  Seq Scan on public.text_tbl tt2
-                     Output: tt2.f1
          ->  Materialize
                Output: tt4.f1
                ->  Nested Loop Left Join
@@ -3765,7 +3764,7 @@ where tt1.f1 = ss1.c0;
                Output: (tt4.f1)
                ->  Seq Scan on public.text_tbl tt5
                      Output: tt4.f1
-(29 rows)
+(28 rows)
 
 select 1 from
   text_tbl as tt1
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index f3696c6d1d..d29c1adaab 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -130,16 +130,17 @@ SELECT DISTINCT p.age FROM person* p ORDER BY age using >;
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT count(*) FROM
   (SELECT DISTINCT two, four, two FROM tenk1) ss;
-                       QUERY PLAN                       
---------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Aggregate
    Output: count(*)
-   ->  HashAggregate
-         Output: tenk1.two, tenk1.four, tenk1.two
-         Group Key: tenk1.two, tenk1.four, tenk1.two
-         ->  Seq Scan on public.tenk1
+   ->  Subquery Scan on ss
+         ->  HashAggregate
                Output: tenk1.two, tenk1.four, tenk1.two
-(7 rows)
+               Group Key: tenk1.two, tenk1.four, tenk1.two
+               ->  Seq Scan on public.tenk1
+                     Output: tenk1.two, tenk1.four, tenk1.two
+(8 rows)
 
 SELECT count(*) FROM
   (SELECT DISTINCT two, four, two FROM tenk1) ss;
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index fc91f3ce36..4c4ee4d808 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -136,12 +136,15 @@ SELECT count(*) FROM tenk2;
 (1 row)
 
 -- do an indexscan
+-- disable bitmap scan because it can skip fetching the heap tuples
+SET enable_bitmapscan to OFF;
 SELECT count(*) FROM tenk2 WHERE unique1 = 1;
  count 
 -------
      1
 (1 row)
 
+SET enable_bitmapscan to ON;
 -- We can't just call wait_for_stats() at this point, because we only
 -- transmit stats when the session goes idle, and we probably didn't
 -- transmit the last couple of counts yet thanks to the rate-limiting logic
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 2090a411fe..fc1b512a35 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -1658,20 +1658,22 @@ UPDATE rw_view1 SET a = a + 5; -- should fail
 ERROR:  new row violates check option for view "rw_view1"
 DETAIL:  Failing row contains (15).
 EXPLAIN (costs off) INSERT INTO rw_view1 VALUES (5);
-                          QUERY PLAN                           
----------------------------------------------------------------
+                      QUERY PLAN                       
+-------------------------------------------------------
  Insert on base_tbl b
    ->  Result
          SubPlan 1
-           ->  Index Only Scan using ref_tbl_pkey on ref_tbl r
-                 Index Cond: (a = b.a)
+           ->  Bitmap Heap Scan on ref_tbl r
+                 Recheck Cond: (a = b.a)
+                 ->  Bitmap Index Scan on ref_tbl_pkey
+                       Index Cond: (a = b.a)
          SubPlan 2
            ->  Seq Scan on ref_tbl r_1
-(7 rows)
+(9 rows)
 
 EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5;
-                           QUERY PLAN                            
------------------------------------------------------------------
+                      QUERY PLAN                       
+-------------------------------------------------------
  Update on base_tbl b
    ->  Hash Join
          Hash Cond: (b.a = r.a)
@@ -1679,11 +1681,13 @@ EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5;
          ->  Hash
                ->  Seq Scan on ref_tbl r
          SubPlan 1
-           ->  Index Only Scan using ref_tbl_pkey on ref_tbl r_1
-                 Index Cond: (a = b.a)
+           ->  Bitmap Heap Scan on ref_tbl r_1
+                 Recheck Cond: (a = b.a)
+                 ->  Bitmap Index Scan on ref_tbl_pkey
+                       Index Cond: (a = b.a)
          SubPlan 2
            ->  Seq Scan on ref_tbl r_2
-(11 rows)
+(13 rows)
 
 DROP TABLE base_tbl, ref_tbl CASCADE;
 NOTICE:  drop cascades to view rw_view1
@@ -2024,24 +2028,28 @@ EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
 DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
 NOTICE:  snooped value: Row 1
 EXPLAIN (costs off) INSERT INTO rw_view1 VALUES (2, 'New row 2');
-                        QUERY PLAN                         
------------------------------------------------------------
+                       QUERY PLAN                       
+--------------------------------------------------------
  Insert on base_tbl
    InitPlan 1 (returns $0)
-     ->  Index Only Scan using base_tbl_pkey on base_tbl t
-           Index Cond: (id = 2)
+     ->  Bitmap Heap Scan on base_tbl t
+           Recheck Cond: (id = 2)
+           ->  Bitmap Index Scan on base_tbl_pkey
+                 Index Cond: (id = 2)
    ->  Result
          One-Time Filter: ($0 IS NOT TRUE)
  
  Update on base_tbl
    InitPlan 1 (returns $0)
-     ->  Index Only Scan using base_tbl_pkey on base_tbl t
-           Index Cond: (id = 2)
+     ->  Bitmap Heap Scan on base_tbl t
+           Recheck Cond: (id = 2)
+           ->  Bitmap Index Scan on base_tbl_pkey
+                 Index Cond: (id = 2)
    ->  Result
          One-Time Filter: $0
          ->  Index Scan using base_tbl_pkey on base_tbl
                Index Cond: (id = 2)
-(15 rows)
+(19 rows)
 
 INSERT INTO rw_view1 VALUES (2, 'New row 2');
 SELECT * FROM base_tbl;
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 6e882bf3ac..cec32cff2d 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -138,7 +138,10 @@ ROLLBACK;
 -- do a seqscan
 SELECT count(*) FROM tenk2;
 -- do an indexscan
+-- disable bitmap scan because it can skip fetching the heap tuples
+SET enable_bitmapscan to OFF;
 SELECT count(*) FROM tenk2 WHERE unique1 = 1;
+SET enable_bitmapscan to ON;
 
 -- We can't just call wait_for_stats() at this point, because we only
 -- transmit stats when the session goes idle, and we probably didn't
********* master, frequent term:

test=# explain analyze select count(*) from pglist where fts @@ to_tsquery( 'simple', 'tom & lane' );

                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=112620.76..112620.77 rows=1 width=8) (actual time=19862.702..19862.702 rows=1 loops=1)
   ->  Bitmap Heap Scan on pglist  (cost=542.90..112486.92 rows=53535 width=0) (actual time=1502.176..19743.815 rows=222813 loops=1)
         Recheck Cond: (fts @@ '''tom'' & ''lane'''::tsquery)
         Heap Blocks: exact=105992
         ->  Bitmap Index Scan on pglist_fts_rum  (cost=0.00..529.51 rows=53535 width=0) (actual time=1441.195..1441.195 rows=222813 loops=1)
               Index Cond: (fts @@ '''tom'' & ''lane'''::tsquery)
 Planning time: 180.332 ms
 Execution time: 19865.841 ms
(8 rows)


********* index-only, frequent term:

test=# explain analyze select count(*) from pglist where fts @@ to_tsquery( 'simple', 'tom & lane' );

                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=56911.69..56911.70 rows=1 width=8) (actual time=1543.316..1543.316 rows=1 loops=1)
   ->  Bitmap Heap Scan on pglist  (cost=542.01..56778.14 rows=53420 width=0) (actual time=1496.301..1529.001 rows=222813 loops=1)
         Recheck Cond: (fts @@ '''tom'' & ''lane'''::tsquery)
         Heap Blocks: exact=105992
         Heap Fetches: 0
         ->  Bitmap Index Scan on pglist_fts_rum  (cost=0.00..528.65 rows=53420 width=0) (actual time=1457.597..1457.597 rows=222813 loops=1)
               Index Cond: (fts @@ '''tom'' & ''lane'''::tsquery)
 Planning time: 221.965 ms
 Execution time: 1545.217 ms
(9 rows)


********* master, infrequent term:

test=# set work_mem to 8192; explain analyze select count(*) from pglist where fts @@ to_tsquery( 'simple', 'ildus' );
SET
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=17514.67..17514.68 rows=1 width=8) (actual time=331.773..331.774 rows=1 loops=1)
   ->  Bitmap Heap Scan on pglist  (cost=71.28..17502.00 rows=5069 width=0) (actual time=103.387..331.366 rows=143 loops=1)
         Recheck Cond: (fts @@ '''ildus'''::tsquery)
         Heap Blocks: exact=121
         ->  Bitmap Index Scan on pglist_fts_rum  (cost=0.00..70.02 rows=5069 width=0) (actual time=64.870..64.870 rows=143 loops=1)
               Index Cond: (fts @@ '''ildus'''::tsquery)
 Planning time: 205.332 ms
 Execution time: 332.806 ms


********* index-only, infrequent term:

test=# set work_mem to 8192; explain analyze select count(*) from pglist where fts @@ to_tsquery( 'simple', 'ildus' );
SET
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8830.99..8831.00 rows=1 width=8) (actual time=145.989..145.989 rows=1 loops=1)
   ->  Bitmap Heap Scan on pglist  (cost=71.28..8818.32 rows=5069 width=0) (actual time=115.310..145.924 rows=143 loops=1)
         Recheck Cond: (fts @@ '''ildus'''::tsquery)
         Heap Blocks: exact=121
         ->  Bitmap Index Scan on pglist_fts_rum  (cost=0.00..70.02 rows=5069 width=0) (actual time=57.375..57.375 rows=143 loops=1)
               Index Cond: (fts @@ '''ildus'''::tsquery)
 Planning time: 196.365 ms
 Execution time: 146.677 ms
(8 rows)
