Hello hackers,

I'd like to share my investigation of a mysterious stats.sql failure that
occurred in December: [1].
The relevant part of the regression diff is:
--- C:/prog/bf/root/HEAD/pgsql/src/test/regress/expected/stats.out 2024-09-18 
19:31:14.665516500 +0000
+++ C:/prog/bf/root/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/stats.out 2024-12-17 09:57:08.944588500 +0000
@@ -1291,7 +1291,7 @@
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
  ?column?
 ----------
- t
+ f
 (1 row)

027_stream_regress_primary.log contains:
2024-12-17 09:57:06.778 UTC [8568:109] pg_regress/stats LOG: statement: SELECT 
sum(writes) AS writes, sum(fsyncs) AS fsyncs
      FROM pg_stat_io
      WHERE object = 'relation'
2024-12-17 09:57:06.779 UTC [8568:110] pg_regress/stats LOG: statement: CREATE 
TABLE test_io_shared(a int);
2024-12-17 09:57:06.780 UTC [8568:111] pg_regress/stats LOG: statement: INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
...
2024-12-17 09:57:06.782 UTC [8568:115] pg_regress/stats LOG: statement: 
CHECKPOINT;
2024-12-17 09:57:06.794 UTC [3664:40] LOG:  checkpoint starting: immediate 
force wait
2024-12-17 09:57:06.856 UTC [3664:41] LOG:  checkpoint complete: wrote 0 buffers (0.0%), wrote 1 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.062 s; sync files=0, longest=0.000 s, average=0.000 s; distance=1875 kB, estimate=52682 kB; lsn=0/14A2F410, redo lsn=0/14A2F3B8
2024-12-17 09:57:06.857 UTC [8568:116] pg_regress/stats LOG: statement: 
CHECKPOINT;
2024-12-17 09:57:06.857 UTC [3664:42] LOG:  checkpoint starting: immediate 
force wait
2024-12-17 09:57:06.859 UTC [3664:43] LOG:  checkpoint complete: wrote 0 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.002 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=47414 kB; lsn=0/14A2F4E0, redo lsn=0/14A2F488
2024-12-17 09:57:06.859 UTC [8568:117] pg_regress/stats LOG: statement: SELECT 
sum(writes) AS writes, sum(fsyncs) AS fsyncs
      FROM pg_stat_io
      WHERE object = 'relation'
2024-12-17 09:57:06.860 UTC [8568:118] pg_regress/stats LOG: statement: SELECT 
77693 > 77693;

The corresponding test case (added by 10a082bf7, dated 2023-02-11) is:
SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
  FROM pg_stat_io
  WHERE object = 'relation' \gset io_sum_shared_before_
...
CREATE TABLE test_io_shared(a int);
INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
...
CHECKPOINT;
CHECKPOINT;
...
SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
  FROM pg_stat_io
  WHERE object = 'relation' \gset io_sum_shared_after_
SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
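
(For reference: psql's \gset stores each output column of the preceding
query into variables with the given prefix, so the final comparison is
evaluated with literal values; that is why the log above shows
"SELECT 77693 > 77693;". A minimal psql illustration:

SELECT 1 AS writes \gset io_sum_shared_before_
SELECT 2 AS writes \gset io_sum_shared_after_
SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
-- yields t; in the failed run both substituted values were 77693, hence f
)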

So, as we can see from the log above, both checkpoints wrote 0 buffers;
moreover, no other process/backend wrote any buffers.

After a series of experiments, I found that this situation is possible when
another backend steals shared buffers from the checkpointer without making
new buffers dirty, and then delays flushing its statistics.
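
To illustrate the visibility gap (just a sketch, assuming a server version
that has pg_stat_force_next_flush()): a backend's I/O counters reach the
cluster-wide pg_stat_io view only once that backend flushes its pending
stats, so until then the writes it performed are invisible to other
sessions:

-- run in the backend that evicted (wrote out) the shared buffers;
-- before this flush, other sessions querying pg_stat_io may still
-- see the old, smaller counters:
SELECT pg_stat_force_next_flush();
SELECT sum(writes) FROM pg_stat_io WHERE object = 'relation';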

Please find attached a reproducer for this issue: I've added two sleeps
to stats.sql and reduced parallel_schedule, but I believe this is a
legitimate change that doesn't essentially affect the test case.

With bufchurn.sql running concurrently, and TEMP_CONFIG containing
shared_buffers = 1MB (as in the 027_stream_regress test in question), I can
see exactly the same failure:
$ echo 'shared_buffers = 1MB' >/tmp/extra.config; TEMP_CONFIG=/tmp/extra.config 
make -s check
# +++ regress check in src/test/regress +++
# initializing database system by copying initdb template
# using temp instance on port 65312 with PID 129700
ok 1         - test_setup                                356 ms
ok 2         - create_index                             1002 ms
# parallel group (2 tests):  stats bufchurn
not ok 3     + stats                                   21176 ms
...

diff -U3 .../src/test/regress/expected/stats.out 
.../src/test/regress/results/stats.out
--- .../src/test/regress/expected/stats.out  2025-01-06 08:03:07.720275980 +0200
+++ .../src/test/regress/results/stats.out   2025-01-06 08:15:34.602170721 +0200
@@ -1321,7 +1321,7 @@
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
  ?column?
 ----------
- t
+ f
 (1 row)
...

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-12-17%2008%3A59%3A44

Best regards,
Alexander
diff --git a/src/test/regress/expected/bufchurn.out b/src/test/regress/expected/bufchurn.out
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index a0317b7208..651898b9f4 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -834,6 +834,12 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
 SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
 CREATE TEMP TABLE test_stats_temp AS SELECT 17;
 DROP TABLE test_stats_temp;
+SELECT pg_sleep(10);
+ pg_sleep 
+----------
+ 
+(1 row)
+
 -- Checkpoint twice: The checkpointer reports stats after reporting completion
 -- of the checkpoint. But after a second checkpoint we'll see at least the
 -- results of the first.
@@ -1298,6 +1304,12 @@ SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT pg_sleep(10);
+ pg_sleep 
+----------
+ 
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs in the global stats (usually not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1edd9e45eb..2142842a31 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -1,138 +1,4 @@
-# ----------
-# src/test/regress/parallel_schedule
-#
-# Most test scripts can be run after running just test_setup and possibly
-# create_index.  Exceptions to this rule are documented below.
-#
-# By convention, we put no more than twenty tests in any one parallel group;
-# this limits the number of connections needed to run the tests.
-# ----------
-
-# required setup steps
 test: test_setup
-
-# ----------
-# The first group of parallel tests
-# ----------
-test: boolean char name varchar text int2 int4 int8 oid float4 float8 bit numeric txid uuid enum money rangetypes pg_lsn regproc
-
-# ----------
-# The second group of parallel tests
-# multirangetypes depends on rangetypes
-# multirangetypes shouldn't run concurrently with type_sanity
-# ----------
-test: strings md5 numerology point lseg line box path polygon circle date time timetz timestamp timestamptz interval inet macaddr macaddr8 multirangetypes
-
-# ----------
-# Another group of parallel tests
-# geometry depends on point, lseg, line, box, path, polygon, circle
-# horology depends on date, time, timetz, timestamp, timestamptz, interval
-# ----------
-test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comments expressions unicode xid mvcc database stats_import
-
-# ----------
-# Load huge amounts of data
-# We should split the data files into single files and then
-# execute two copy tests in parallel, to check that copy itself
-# is concurrent safe.
-# ----------
-test: copy copyselect copydml copyencoding insert insert_conflict
-
-# ----------
-# More groups of parallel tests
-# Note: many of the tests in later groups depend on create_index
-# ----------
-test: create_function_c create_misc create_operator create_procedure create_table create_type create_schema
-test: create_index create_index_spgist create_view index_including index_including_gist
-
-# ----------
-# Another group of parallel tests
-# ----------
-test: create_aggregate create_function_sql create_cast constraints triggers select inherit typed_table vacuum drop_if_exists updatable_views roleattributes create_am hash_func errors infinite_recurse
-
-# ----------
-# sanity_check does a vacuum, affecting the sort order of SELECT *
-# results. So it should not run parallel to other tests.
-# ----------
-test: sanity_check
-
-# ----------
-# Another group of parallel tests
-# aggregates depends on create_aggregate
-# join depends on create_misc
-# ----------
-test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update delete namespace prepared_xacts
-
-# ----------
-# Another group of parallel tests
-# ----------
-test: brin gin gist spgist privileges init_privs security_label collate matview lock replica_identity rowsecurity object_address tablesample groupingsets drop_operator password identity generated_stored join_hash
-
-# ----------
-# Additional BRIN tests
-# ----------
-test: brin_bloom brin_multi
-
-# ----------
-# Another group of parallel tests
-# psql depends on create_am
-# amutils depends on geometry, create_index_spgist, hash_index, brin
-# ----------
-test: create_table_like alter_generic alter_operator misc async dbsize merge misc_functions sysviews tsrf tid tidscan tidrangescan collate.utf8 collate.icu.utf8 incremental_sort create_role without_overlaps
-
-# collate.linux.utf8 and collate.icu.utf8 tests cannot be run in parallel with each other
-test: rules psql psql_crosstab amutils stats_ext collate.linux.utf8 collate.windows.win1252
-
-# ----------
-# Run these alone so they don't run out of parallel workers
-# select_parallel depends on create_misc
-# ----------
-test: select_parallel
-test: write_parallel
-test: vacuum_parallel
-
-# no relation related tests can be put in this group
-test: publication subscription
-
-# ----------
-# Another group of parallel tests
-# select_views depends on create_view
-# ----------
-test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock indirect_toast equivclass
-
-# ----------
-# Another group of parallel tests (JSON related)
-# ----------
-test: json jsonb json_encoding jsonpath jsonpath_encoding jsonb_jsonpath sqljson sqljson_queryfuncs sqljson_jsontable
-
-# ----------
-# Another group of parallel tests
-# with depends on create_misc
-# NB: temp.sql does a reconnect which transiently uses 2 connections,
-# so keep this parallel group to at most 19 tests
-# ----------
-test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion truncate alter_table sequence polymorphism rowtypes returning largeobject with xml
-
-# ----------
-# Another group of parallel tests
-#
-# The stats test resets stats, so nothing else needing stats access can be in
-# this group.
-# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
-
-# event_trigger depends on create_am and cannot run concurrently with
-# any test that runs DDL
-# oidjoins is read-only, though, and should run late for best coverage
-test: oidjoins event_trigger
-
-# event_trigger_login cannot run concurrently with any other tests because
-# on-login event handling could catch connection of a concurrent test.
-test: event_trigger_login
-
-# this test also uses event triggers, so likewise run it by itself
-test: fast_default
-
-# run tablespace test at the end because it drops the tablespace created during
-# setup that other tests may use.
-test: tablespace
+test: create_index
+test: stats bufchurn
+# test: ministats bufchurn
diff --git a/src/test/regress/sql/bufchurn.sql b/src/test/regress/sql/bufchurn.sql
new file mode 100644
index 0000000000..c29ac4acf9
--- /dev/null
+++ b/src/test/regress/sql/bufchurn.sql
@@ -0,0 +1,22 @@
+SELECT pg_sleep(2);
+CREATE TABLE t0(i int, t text);
+ALTER TABLE t0 ALTER t SET STORAGE PLAIN;
+INSERT INTO t0 SELECT g, g || repeat('x', 8000) FROM generate_series(1, 256) g;
+
+CREATE TABLE t1(i int, t text);
+ALTER TABLE t1 ALTER t SET STORAGE PLAIN;
+INSERT INTO t1 SELECT g, g || repeat('x', 8000) FROM generate_series(1, 256) g;
+
+EXPLAIN ANALYZE SELECT SUM(i) FROM t0 GROUP BY t;
+EXPLAIN ANALYZE SELECT SUM(i) FROM t1 GROUP BY t;
+
+\c -
+SELECT 'SELECT pg_sleep(1);' FROM generate_series(1, 10)
+\gexec
+
+BEGIN;
+SELECT 'EXPLAIN ANALYZE SELECT SUM(i) FROM t0; EXPLAIN ANALYZE SELECT SUM(i) FROM t1;' FROM generate_series(1, 20)
+\gexec
+
+SELECT pg_sleep(10);
+ROLLBACK;
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 399c72bbcf..fcec1be377 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -426,6 +426,7 @@ SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
 CREATE TEMP TABLE test_stats_temp AS SELECT 17;
 DROP TABLE test_stats_temp;
 
+SELECT pg_sleep(10);
 -- Checkpoint twice: The checkpointer reports stats after reporting completion
 -- of the checkpoint. But after a second checkpoint we'll see at least the
 -- results of the first.
@@ -632,6 +633,7 @@ SELECT sum(extends) AS my_io_sum_shared_after_extends
   WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
 
+SELECT pg_sleep(10);
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs in the global stats (usually not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
