On Sat, Jul 04, 2015 at 08:07:49PM -0400, Noah Misch wrote:
> On Sun, Jul 05, 2015 at 12:54:43AM +0200, Andres Freund wrote:
> > On 2015-07-04 18:40:41 -0400, Noah Misch wrote:
> > > (1) "IBM XL C/C++ for AIX, V12.1 (5765-J02, 5725-C72)".  Getting it 
> > > working
> > > required the attached patch.
> 
> > Will you apply? Having the ability to test change seems to put you in a
> > much better spot then me.
> 
> I will.

That commit (0d32d2e) permitted things to compile and usually pass tests, but
I missed the synchronization bug.  Since 2015-10-01, the buildfarm has seen
sixteen duplicate-catalog-OID failures.  The compiler has always been xlC, and
the branch has been HEAD or 9.5.  Examples:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-12-15%2004%3A12%3A49
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-12-08%2001%3A20%3A16
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2015-12-11%2005%3A25%3A54

These suggested OidGenLock wasn't doing its job.  I've seen similar symptoms
around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73,
5765-J08)" for ppc64le.  The problem is generic-xlc.h
pg_atomic_compare_exchange_u32_impl() issuing __isync() before
__compare_and_swap().  __isync() shall follow __compare_and_swap(); see our
own s_lock.h, its references, and other projects' usage:

https://github.com/boostorg/smart_ptr/blob/boost-1.60.0/include/boost/smart_ptr/detail/sp_counted_base_vacpp_ppc.hpp#L55
http://redmine.gromacs.org/projects/gromacs/repository/entry/src/external/thread_mpi/include/thread_mpi/atomic/xlc_ppc.h?rev=v5.1.2#L112
https://www-01.ibm.com/support/knowledgecenter/SSGH2K_13.1.2/com.ibm.xlc131.aix.doc/language_ref/asm_example.html

This patch's test case would have failed about half the time under today's
generic-xlc.h.  Fast machines run it in about 1s.  A test that detects the bug
99% of the time would run far longer, hence this compromise.


Archaeology enthusiasts may enjoy the bug's changing presentation over time.
LWLockAcquire() exposed it after commit ab5194e redesigned lwlock.c in terms
of atomics.  Assert builds were unaffected for commits in [ab5194e, bbfd7ed);
atomics.h asserts fattened code enough to mask the bug.  However, removing the
AssertPointerAlignment() from pg_atomic_fetch_or_u32() would have sufficed to
surface it in assert builds.  All builds demonstrated the bug once bbfd7ed
introduced xlC pg_attribute_noreturn(), which elicits a simpler instruction
sequence around the ExceptionalCondition() call embedded in each Assert.

nm
diff --git a/src/bin/pgbench/.gitignore b/src/bin/pgbench/.gitignore
index aae819e..983a3cd 100644
--- a/src/bin/pgbench/.gitignore
+++ b/src/bin/pgbench/.gitignore
@@ -1,3 +1,4 @@
 /exprparse.c
 /exprscan.c
 /pgbench
+/tmp_check/
diff --git a/src/bin/pgbench/Makefile b/src/bin/pgbench/Makefile
index 18fdf58..e9b1b74 100644
--- a/src/bin/pgbench/Makefile
+++ b/src/bin/pgbench/Makefile
@@ -40,3 +40,9 @@ clean distclean:
 
 maintainer-clean: distclean
        rm -f exprparse.c exprscan.c
+
+check:
+       $(prove_check)
+
+installcheck:
+       $(prove_installcheck)
diff --git a/src/bin/pgbench/t/001_pgbench.pl b/src/bin/pgbench/t/001_pgbench.pl
new file mode 100644
index 0000000..88e83ab
--- /dev/null
+++ b/src/bin/pgbench/t/001_pgbench.pl
@@ -0,0 +1,25 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 3;
+
+# Test concurrent insertion into table with UNIQUE oid column.  DDL expects
+# GetNewOidWithIndex() to successfully avoid violating uniqueness for indexes
+# like pg_class_oid_index and pg_proc_oid_index.  This indirectly exercises
+# LWLock and spinlock concurrency.  This test makes a 5-MiB table.
+my $node = get_new_node('main');
+$node->init;
+$node->start;
+$node->psql('postgres',
+           'CREATE UNLOGGED TABLE oid_tbl () WITH OIDS; '
+         . 'ALTER TABLE oid_tbl ADD UNIQUE (oid);');
+my $script = $node->basedir . '/pgbench_script';
+append_to_file($script,
+       'INSERT INTO oid_tbl SELECT FROM generate_series(1,1000);');
+$node->command_like(
+       [   qw(pgbench --no-vacuum --client=5 --protocol=prepared
+                 --transactions=25 --file), $script ],
+       qr{processed: 125/125},
+       'concurrent OID generation');
diff --git a/src/include/port/atomics/generic-xlc.h 
b/src/include/port/atomics/generic-xlc.h
index 1e8a11e..f24e3af 100644
--- a/src/include/port/atomics/generic-xlc.h
+++ b/src/include/port/atomics/generic-xlc.h
@@ -41,18 +41,22 @@ pg_atomic_compare_exchange_u32_impl(volatile 
pg_atomic_uint32 *ptr,
                                                                        uint32 
*expected, uint32 newval)
 {
        /*
+        * XXX: __compare_and_swap is defined to take signed parameters, but 
that
+        * shouldn't matter since we don't perform any arithmetic operations.
+        */
+       bool            ret = __compare_and_swap((volatile int*)&ptr->value,
+                                                                               
 (int *)expected, (int)newval);
+
+       /*
         * xlc's documentation tells us:
         * "If __compare_and_swap is used as a locking primitive, insert a call 
to
         * the __isync built-in function at the start of any critical sections."
+        *
+        * The critical section begins immediately after __compare_and_swap().
         */
        __isync();
 
-       /*
-        * XXX: __compare_and_swap is defined to take signed parameters, but 
that
-        * shouldn't matter since we don't perform any arithmetic operations.
-        */
-       return __compare_and_swap((volatile int*)&ptr->value,
-                                                         (int *)expected, 
(int)newval);
+       return ret;
 }
 
 #define PG_HAVE_ATOMIC_FETCH_ADD_U32
@@ -69,10 +73,12 @@ static inline bool
 pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
                                                                        uint64 
*expected, uint64 newval)
 {
+       bool            ret = __compare_and_swaplp((volatile long*)&ptr->value,
+                                                                               
   (long *)expected, (long)newval);
+
        __isync();
 
-       return __compare_and_swaplp((volatile long*)&ptr->value,
-                                                               (long 
*)expected, (long)newval);;
+       return ret;
 }
 
 #define PG_HAVE_ATOMIC_FETCH_ADD_U64
diff --git a/src/tools/msvc/clean.bat b/src/tools/msvc/clean.bat
index feb0fe5..d21692f 100755
--- a/src/tools/msvc/clean.bat
+++ b/src/tools/msvc/clean.bat
@@ -94,6 +94,7 @@ if exist src\bin\pg_basebackup\tmp_check rd /s /q 
src\bin\pg_basebackup\tmp_chec
 if exist src\bin\pg_config\tmp_check rd /s /q src\bin\pg_config\tmp_check
 if exist src\bin\pg_ctl\tmp_check rd /s /q src\bin\pg_ctl\tmp_check
 if exist src\bin\pg_rewind\tmp_check rd /s /q src\bin\pg_rewind\tmp_check
+if exist src\bin\pgbench\tmp_check rd /s /q src\bin\pgbench\tmp_check
 if exist src\bin\scripts\tmp_check rd /s /q src\bin\scripts\tmp_check
 
 REM Clean up datafiles built with contrib
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to