Incorrect result of bitmap heap scan.

Konstantin Knizhnik Mon, 02 Dec 2024 13:22:59 -0800

Hi hackers,

Attached script reproduces the problem with incorrect results of `selectcount(*)` (it returns larger number of records than really available inthe table).It is not always reproduced, so you may need to repeat it multiple times- at my system it failed 3 times from 10.


The problem takes place with pg16/17/18 (other versions I have not checked).

The test is called `test_ios` (index-only-scan), but it is not correct.Index-only scan is not used in this case.And this is actually the first question to PG17/18: IOS is not used whennumber of records is less than 100k (for this particular table):

postgres=# create table t(pk integer primary key); CREATE TABLEpostgres=# set enable_seqscan = off; SET postgres=# set enable_indexscan= off; SET postgres=# insert into t values (generate_series(1,1000));INSERT 0 1000 postgres=# vacuum t; VACUUM postgres=# explain selectcount(*) from t; QUERY PLAN-------------------------------------------------------------------------------Aggregate (cost=43.02..43.03 rows=1 width=8) -> Bitmap Heap Scan on t(cost=25.52..40.52 rows=1000 width=0) -> Bitmap Index Scan on t_pkey(cost=0.00..25.27 rows=1000 width=0) (3 rows) postgres=# setenable_bitmapscan = off; SET postgres=# explain select count(*) from t;QUERY PLAN ------------------------------------------------------------Aggregate (cost=17.50..17.51 rows=1 width=8) -> Seq Scan on t(cost=0.00..15.00 rows=1000 width=0) Disabled: true (3 rows)

So, as you can see, Postgres prefers to use disabled seqscan, but notIOS. It is different from pg16 where disabling bitmap scan makesoptimizer to choose index-only scan:

postgres=# explain select count(*) from t; QUERY PLAN----------------------------------------------------------- Aggregate(cost=41.88..41.88 rows=1 width=8) -> Seq Scan on t (cost=0.00..35.50rows=2550 width=0) (2 rows) postgres=# set enable_seqscan = off; SETpostgres=# explain select count(*) from t; QUERY PLAN-------------------------------------------------------------------------------Aggregate (cost=75.54..75.55 rows=1 width=8) -> Bitmap Heap Scan on t(cost=33.67..69.17 rows=2550 width=0) -> Bitmap Index Scan on t_pkey(cost=0.00..33.03 rows=2550 width=0) (3 rows) postgres=# setenable_bitmapscan = off; SET postgres=# explain select count(*) from t;QUERY PLAN-------------------------------------------------------------------------------Aggregate (cost=45.77..45.78 rows=1 width=8) -> Index Only Scan usingt_pkey on t (cost=0.28..43.27 rows=1000 width=0) (2 rows)

This is strange behavior of pg17 which for some reasons rejects IOS (butit is used if number of records in the table is 100k or more). But themain problem is that used plan Bitmap Heap Scan + Bitmap Index Scan mayreturn incorrect result.

Replacing `select count(*)` with `select count(pk)` eliminates theproblem, as well as disabling of autovacuum. It seems to be clear thatthe problem is with visibility map.

We have the following code in heap bitmap scan: /* * We can skipfetching the heap page if we don't need any fields from the * heap, thebitmap entries don't need rechecking, and all tuples on the * page arevisible to our transaction. */ if (!(scan->rs_flags & SO_NEED_TUPLES) &&!tbmres->recheck && VM_ALL_VISIBLE(scan->rs_rd, tbmres->blockno,&hscan->rs_vmbuffer)) { /* can't be lossy in the skip_fetch case */Assert(tbmres->ntuples >= 0); Assert(hscan->rs_empty_tuples_pending >=0); hscan->rs_empty_tuples_pending += tbmres->ntuples; return true; }

So if we do not need tuples (|count(*)|case) and page is marked asall-visible in VM, then we just count|tbmres->ntuples|elements withoutextra checks.I almost not so familiar with internals of executor, but it is not clearto me how we avoid race condition between VM update and heap bitmap scan?

Assume that bitmap scan index marks all tids available in index. Someelements in this bitmap can refer old (invisible) versions. Then vacuumcomes, removes dead elements and mark page as all-visible. After it westart heap bitmap scan, see that page is all-visible and count allmarked elements on this page including dead (which are not present inthe page any more).

Which lock or check should prevent such scenario?

import random
import threading
import time
import psycopg2

def test_ios():
    con = psycopg2.connect(database="postgres")
    n_records = 1000
    n_oltp_writers = 10
    n_oltp_readers = 5
    n_olap_readers = 1

    con.autocommit = True
    cur = con.cursor()

    cur.execute("DROP table if exists t")
    cur.execute("CREATE TABLE t(pk bigint not null)")
    cur.execute(f"insert into t values (generate_series(1,{n_records}))")
    cur.execute("create index on t(pk)")
    cur.execute("vacuum t")

    running = True

    def oltp_writer():
        con = psycopg2.connect(database="postgres")
        con.autocommit = True
        cur = con.cursor()
        while running:
            pk = random.randrange(1, n_records)
            cur.execute(f"update t set pk={n_records-pk+1} where pk={pk}")

    def oltp_reader():
        con = psycopg2.connect(database="postgres")
        con.autocommit = True
        cur = con.cursor()
        cur.execute("set effective_io_concurrency=0")
        cur.execute("set max_parallel_workers_per_gather=0")
        cur.execute("set enable_seqscan=off")
        cur.execute("set enable_indexscan=off")
        while running:
            pk = random.randrange(1, n_records)
            cur.execute(f"select count(*) from t where pk between {pk} and {pk+1000}")

    def olap_reader():
        con = psycopg2.connect(database="postgres")
        con.autocommit = True
        cur = con.cursor()
        cur.execute("set effective_io_concurrency=0")
        cur.execute("set max_parallel_workers_per_gather=0")
        cur.execute("set enable_seqscan=off")
        cur.execute("set enable_indexscan=off")
        while running:
            cur.execute("select count(*) from t")
            count = cur.fetchall()[0][0]
            assert count == n_records

    oltp_writers = []
    for _ in range(n_oltp_writers):
        t = threading.Thread(target=oltp_writer)
        oltp_writers.append(t)
        t.start()

    oltp_readers = []
    for _ in range(n_oltp_readers):
        t = threading.Thread(target=oltp_reader)
        oltp_readers.append(t)
        t.start()

    olap_readers = []
    for _ in range(n_olap_readers):
        t = threading.Thread(target=olap_reader)
        olap_readers.append(t)
        t.start()

    time.sleep(100)
    running = False

    for t in oltp_writers:
        t.join()
    for t in oltp_readers:
        t.join()
    for t in olap_readers:
        t.join()

if __name__=="__main__":
    test_ios()

Incorrect result of bitmap heap scan.

Reply via email to