Hi, Nathan.

I just realized I had almost forgotten about this thread :)

> The result looks great, but the discussion in [0] shows that the result may
> vary among different ARM chips. Could you provide the chip model of this
> test? So that we can do a cross validation of this patch. Not sure if compiler
> version is necessary too. I'm willing to test it on Alibaba Cloud Yitian 710
> if I have time.

I ran some benchmarks on Yitian 710.

On c8y.16xlarge (64 cores):

Without the patch:
  80.31%  postgres               [.] __aarch64_swp4_acq
   1.77%  postgres               [.] __aarch64_ldadd4_acq_rel
   1.13%  postgres               [.] hash_search_with_hash_value
   0.87%  pg_stat_statements.so  [.] __aarch64_swp4_acq
   0.72%  postgres               [.] perform_spin_delay
   0.44%  postgres               [.] _bt_compare

tps = 295272.628421 (including connections establishing)
tps = 295335.660323 (excluding connections establishing)

Patched:
   9.94%  postgres               [.] s_lock
   6.07%  postgres               [.] __aarch64_swp4_acq
   5.73%  postgres               [.] hash_search_with_hash_value
   2.81%  postgres               [.] perform_spin_delay
   2.29%  postgres               [.] _bt_compare
   2.15%  postgres               [.] PinBuffer

tps = 864519.764125 (including connections establishing)
tps = 864638.244443 (excluding connections establishing)


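It seems a large gain is possible when s_lock contention is severe: here the
patched build reaches roughly 2.9x the tps of the unpatched one, and
__aarch64_swp4_acq in postgres drops from 80.31% to 6.07% of samples. Severe
contention is probably more likely on bigger machines.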

On c8y.2xlarge (8 cores), I could not get s_lock severely contended, and as a
result the patch made no difference beyond noise.


Regards,
Jingtang