On 2021-02-16 16:59, Fujii Masao wrote:
On 2021/02/15 15:17, Fujii Masao wrote:
On 2021/02/10 10:43, Fujii Masao wrote:
On 2021/02/09 23:31, torikoshia wrote:
On 2021-02-09 22:54, Fujii Masao wrote:
On 2021/02/09 19:11, Fujii Masao wrote:
On 2021/02/09 18:13, Fujii Masao wrote:
On 2021/02/09 17:48, torikoshia wrote:
On 2021-02-05 18:49, Fujii Masao wrote:
On 2021/02/05 0:03, torikoshia wrote:
On 2021-02-03 11:23, Fujii Masao wrote:
64-bit fetches are not atomic on some platforms. So isn't a spinlock
necessary when updating "waitStart" without holding the partition lock?
And doesn't GetLockStatusData() need a spinlock when reading "waitStart"?
It might also be worth considering 64-bit atomic operations like
pg_atomic_read_u64() for that.
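(For illustration only: a minimal sketch of what that suggestion amounts to.
The field and variable names are assumptions based on this thread, not the
actual patch.)

    /* In PGPROC (src/include/storage/proc.h): time at which the lock wait
     * started, or 0 if the process is not waiting.  Declared as a 64-bit
     * atomic so it can be read and written without holding a lock table
     * partition lock. */
    pg_atomic_uint64	waitStart;

    /* In GetLockStatusData(): read without holding the partition lock.
     * TimestampTz is int64, so cast from the unsigned atomic value. */
    instance->waitStart = (TimestampTz) pg_atomic_read_u64(&proc->waitStart);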
Thanks for your suggestion and advice!
In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64().
waitStart is TimestampTz, i.e., int64, but pg_atomic_read_xxx and
pg_atomic_write_xxx seem to support only unsigned 64-bit integers, so I
cast the type.
I may not be using these functions correctly, so if something is wrong,
I would appreciate any comments.
About the documentation, since your suggestion seems better than v6,
I used it as is.
Thanks for updating the patch!
+	if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+		pg_atomic_write_u64(&MyProc->waitStart,
+							pg_atomic_read_u64((pg_atomic_uint64 *) &now));
Is pg_atomic_read_u64() really necessary here? I think that
"pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.
+	deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+	pg_atomic_write_u64(&MyProc->waitStart,
+						pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));
Same as above.
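(Editor's sketch of the simplification being suggested in both hunks, assuming
"now" and "deadlockStart" are local TimestampTz variables as in the quoted
code; not the final committed form.)

    /* The locals need no atomic read; only the shared field needs the
     * atomic write. */
    if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
        pg_atomic_write_u64(&MyProc->waitStart, now);

    deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
    pg_atomic_write_u64(&MyProc->waitStart, deadlockStart);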
+	/*
+	 * Record waitStart reusing the deadlock timeout timer.
+	 *
+	 * It would be ideal if this could be done synchronously with updating
+	 * the lock information.  However, since holding partitionLock longer
+	 * hurts performance, we do it here asynchronously.
+	 */
IMO it's better for the comment to explain why we reuse the deadlock
timeout timer.
 	proc->waitStatus = waitStatus;
+	pg_atomic_init_u64(&MyProc->waitStart, 0);
Shouldn't pg_atomic_write_u64() be used there instead? waitStart can be
accessed concurrently at that point.
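(That is, something along these lines, sketched from the quoted hunk:
pg_atomic_init_u64() is only for first-time initialization, while a plain
atomic write is safe with concurrent readers.)

    proc->waitStatus = waitStatus;
    /* waitStart may already be read concurrently via pg_locks at this
     * point, so reset it with an atomic write rather than re-running
     * first-time initialization. */
    pg_atomic_write_u64(&MyProc->waitStart, 0);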
I updated the patch and addressed the above review comments.
Patch attached.
Barring any objection, I will commit this version.
Thanks for modifying the patch!
I agree with your comments.
BTW, I ran pgbench several times before and after applying this patch.
The environment is a virtual machine (CentOS 8), so this is just for
reference, but there was no significant difference in latency or TPS
(both changed by less than 1%).
Thanks for the test! I pushed the patch.
But I reverted the patch because buildfarm members rorqual and prion
don't like it. I'm trying to investigate the cause of these failures.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10
-    relation     | locktype |        mode
------------------+----------+---------------------
- test_prepared_1 | relation | RowExclusiveLock
- test_prepared_1 | relation | AccessExclusiveLock
-(2 rows)
-
+ERROR: invalid spinlock number: 0
"rorqual" reported that the above error happened in the server
built with
--disable-atomics --disable-spinlocks when reading pg_locks after
the transaction was prepared. The cause of this issue is that
"waitStart"
atomic variable in the dummy proc created at the end of prepare
transaction
was not initialized. I updated the patch so that
pg_atomic_init_u64() is
called for the "waitStart" in the dummy proc for prepared
transaction.
Patch attached. I confirmed that the patched server built with
--disable-atomics --disable-spinlocks passed all the regression
tests.
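(Sketch of the kind of fix described; as noted later in the thread, v9
placed this in TwoPhaseGetDummyProc(). Variable names are illustrative.)

    /*
     * Under --disable-atomics --disable-spinlocks, pg_atomic_uint64 falls
     * back to a spinlock-based emulation, so an atomic that never went
     * through pg_atomic_init_u64() fails with "invalid spinlock number: 0"
     * on first access.  Initialize waitStart in the dummy PGPROC used for
     * the prepared transaction.
     */
    pg_atomic_init_u64(&dummyproc->waitStart, 0);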
Thanks for fixing the bug. I also tested v9.patch configured with
--disable-atomics --disable-spinlocks on my environment and confirmed
that all tests have passed.
Thanks for the test!
I found another bug in the patch. InitProcess() initializes "waitStart",
but InitAuxiliaryProcess() previously did not. This could cause an
"invalid spinlock number" error when reading pg_locks on a standby server.
I fixed that. Attached is the updated version of the patch.
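(Sketch of the gist: InitAuxiliaryProcess() gains the same initialization
InitProcess() already performs; the actual patch may differ.)

    /* In InitAuxiliaryProcess(): auxiliary processes (e.g. the startup
     * process on a standby) also show up when pg_locks is read, so their
     * waitStart must be initialized as well. */
    pg_atomic_init_u64(&MyProc->waitStart, 0);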
I pushed this version. Thanks!
While reading the patch again, I found two minor things.
1. As discussed in another thread [1], the atomic variable "waitStart"
should be initialized at postmaster startup rather than at the startup of
each child process. I changed "waitStart" so that it's initialized in
InitProcGlobal() and reset to 0 with pg_atomic_write_u64() in
InitProcess() and InitAuxiliaryProcess().

2. Thanks to the above change, InitProcGlobal() initializes "waitStart"
even in the PGPROC entries used for prepared transactions. But those
entries are zeroed in MarkAsPreparingGuts(), so "waitStart" needs to be
initialized again. Currently TwoPhaseGetDummyProc() initializes
"waitStart" in the PGPROC entry for a prepared transaction, but it's
better to do that in MarkAsPreparingGuts() instead, because that function
initializes the other PGPROC variables. So I moved that initialization
code from TwoPhaseGetDummyProc() to MarkAsPreparingGuts().

Patch attached. Thoughts?
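(A rough sketch of the shape of those two changes, for readers following
along; the exact code is in the attached patch and may differ.)

    /* 1. InitProcGlobal(): initialize the atomic once at postmaster
     *    startup, for every PGPROC slot. */
    pg_atomic_init_u64(&(proc->waitStart), 0);

    /* 1. InitProcess() / InitAuxiliaryProcess(): only reset the value,
     *    since the variable is already initialized and may be read
     *    concurrently. */
    pg_atomic_write_u64(&MyProc->waitStart, 0);

    /* 2. MarkAsPreparingGuts(): the dummy PGPROC has just been zeroed,
     *    which wipes the (possibly spinlock-backed) atomic state, so
     *    initialize waitStart here instead of in TwoPhaseGetDummyProc(). */
    pg_atomic_init_u64(&proc->waitStart, 0);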
Thanks for updating the patch!
It seems to me that the modification is right.
I ran some regression tests and didn't find any problems.
Regards,
--
Atsushi Torikoshi