Re: One-off failure in "cluster" test

Thomas Munro Sun, 16 Aug 2020 19:53:43 -0700

On Mon, Aug 17, 2020 at 1:27 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
>
> On Mon, Aug 17, 2020 at 1:20 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Thomas Munro <thomas.mu...@gmail.com> writes:
> > > I wonder what caused this[1] one-off failure to see tuples in clustered order:
> > > ...
> > > I guess a synchronised scan could cause that, but I wouldn't expect one here.
> >
> > Looking at its configuration, chipmunk uses
> >
> >  'extra_config' => {
> >  ...
> >                                                       'shared_buffers = 10MB',


Ahh, I see what's happening.  You don't need a concurrent process
scanning *your* table for scan order to be nondeterministic.  The
preceding CLUSTER command can leave the start block anywhere if its
call to ss_report_location() fails to acquire SyncScanLock
conditionally.  So I think we just need to disable synchronized scans
for this test, as in the attached patch.
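
To make the mechanism above concrete, here is a toy Python model of it.
This is only a sketch and not PostgreSQL source: the names SyncScanHint,
seq_scan and lock_available, and the 16-block reporting interval, are
all assumptions made up for illustration.  It shows a single session
scanning a table twice.  If the last position report of the first scan
loses the try-lock, the shared hint stays at a stale block and the
second scan starts there.

```python
REPORT_INTERVAL = 16  # assumed reporting interval in blocks (illustrative)


class SyncScanHint:
    """Toy stand-in for the shared start-block hint guarded by a try-lock."""

    def __init__(self):
        self.location = 0

    def report(self, block, lock_available):
        # Only every REPORT_INTERVAL-th block is reported.
        if block % REPORT_INTERVAL != 0:
            return
        # Conditional acquire: if the lock is busy, skip the update instead
        # of waiting, so the previously reported location stays in place.
        if lock_available(block):
            self.location = block


def seq_scan(nblocks, hint, lock_available, synchronize=True):
    """Return the order in which one circular scan visits the blocks."""
    start = hint.location if synchronize else 0
    order = []
    block = start
    while True:
        order.append(block)
        block = (block + 1) % nblocks
        if synchronize:
            hint.report(block, lock_available)
        if block == start:
            return order


if __name__ == "__main__":
    hint = SyncScanHint()
    # First scan: the final report (wrap-around to block 0) loses the lock.
    seq_scan(64, hint, lambda b: b != 0)
    # Second scan from the same session now starts at the stale hint.
    print(seq_scan(64, hint, lambda b: True)[:3])  # prints [48, 49, 50]
```

With synchronize=False the scan ignores the hint and always starts at
block 0, which is the effect the test wants from turning
synchronize_seqscans off.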

From b12d7952feb6153cc760cd8c028182927e8b0198 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Mon, 17 Aug 2020 14:41:42 +1200
Subject: [PATCH] Fix rare failure in cluster test.

Don't allow synchronized scans of the table used in the "cluster"
regression test, because the conditional locking strategy used for
synchronization means that even a non-concurrent scan of a table from
the same session causes nondeterminism in subsequent scans.

Back-patch to 9.6, where the test arrived.

Discussion: https://postgr.es/m/CA%2BhUKGLTK6ZuEkpeJ05-MEmvmgZveCh%2B_w013m7%2ByKWFSmRcDA%40mail.gmail.com
---
 src/test/regress/expected/cluster.out | 3 +++
 src/test/regress/sql/cluster.sql      | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index bdae8fe00c..6e3191b84e 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -452,6 +452,8 @@ create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
 -- ensure we don't use the index in CLUSTER nor the checking SELECTs
 set enable_indexscan = off;
+-- make sure our test will always read blocks from the start of the table
+set synchronize_seqscans = off;
 -- Use external sort:
 set maintenance_work_mem = '1MB';
 cluster clstr_4 using cluster_sort;
@@ -464,6 +466,7 @@ where row(hundred, thousand, tenthous) <= row(lhundred, lthousand, ltenthous);
 ---------+----------+----------+-----------+----------+-----------
 (0 rows)
 
+reset synchronize_seqscans;
 reset enable_indexscan;
 reset maintenance_work_mem;
 -- test CLUSTER on expression index
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 188183647c..1bd5e933e8 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -210,6 +210,9 @@ create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
 -- ensure we don't use the index in CLUSTER nor the checking SELECTs
 set enable_indexscan = off;
 
+-- make sure our test will always read blocks from the start of the table
+set synchronize_seqscans = off;
+
 -- Use external sort:
 set maintenance_work_mem = '1MB';
 cluster clstr_4 using cluster_sort;
@@ -219,6 +222,7 @@ select * from
         tenthous, lag(tenthous) over () as ltenthous from clstr_4) ss
 where row(hundred, thousand, tenthous) <= row(lhundred, lthousand, ltenthous);
 
+reset synchronize_seqscans;
 reset enable_indexscan;
 reset maintenance_work_mem;
 
-- 
2.20.1
