Hi Yura,
On 11/27/2017 07:41 AM, Юрий Соколов wrote:
I looked at the assembly and remembered that the last commit simplifies
`init_local_spin_delay` to just two or three writes of zeroes (it looks
like the compiler combines two 4-byte writes into one 8-byte write).
Compared to the surrounding code (especially in LWLockAcquire itself),
this overhead is negligible.
However, I found that there is a benefit in calling LWLockAttemptLockOnce
before entering the loop that calls LWLockAttemptLockOrQueue in
LWLockAcquire (if there is not much contention). This way, the `inline`
decorator for LWLockAttemptLockOrQueue could be omitted. Given that clang
doesn't want to inline this function, this could be the best approach.
Attached is a version with LWLockAttemptLockOnce called before entering
the loop in LWLockAcquire.
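
For illustration only (this is not the attached patch): a minimal sketch
of the "try once before entering the loop" idea, using C11 atomics
rather than PostgreSQL's own atomics API. All `demo_*` names are
hypothetical, the lock state is reduced to a single free/held flag, and
the slow path simply retries instead of spinning with a delay and
queueing as the real code would.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an LWLock: 0 = free, 1 = held exclusively. */
typedef struct
{
    atomic_uint state;
} demo_lock;

/* A single compare-and-swap attempt: no spinning, no queueing. */
static bool
demo_attempt_once(demo_lock *lock)
{
    unsigned expected = 0;

    return atomic_compare_exchange_strong(&lock->state, &expected, 1);
}

/*
 * Slow-path stand-in. It only retries the attempt until it succeeds;
 * spin delays and wait-queue handling are deliberately left out.
 */
static void
demo_attempt_or_retry(demo_lock *lock)
{
    while (!demo_attempt_once(lock))
        ;
}

/*
 * Acquire: one cheap attempt first, and only on failure enter the loop.
 * Returns true if the lock was taken on the fast path.
 */
static bool
demo_acquire(demo_lock *lock)
{
    if (demo_attempt_once(lock))
        return true;            /* uncontended case ends here */

    demo_attempt_or_retry(lock);
    return false;
}

static void
demo_release(demo_lock *lock)
{
    atomic_store(&lock->state, 0);
}
```

With this shape the uncontended case never reaches the loop, so whether
the compiler inlines the slow-path function should matter less.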
Oh... there was a stupid error in the previous file.
Attached fixed version.
I can reconfirm my performance findings with this patch; system same as
up-thread.
Thanks!
Best regards,
Jesper