Hi, Nathan. I just realized that I almost forgot about this thread :)
> The result looks great, but the discussion in [0] shows that the result may
> vary among different ARM chips. Could you provide the chip model of this
> test? So that we can do a cross validation of this patch. Not sure if compiler
> version is necessary too. I'm willing to test it on Alibaba Cloud Yitian 710
> if I have time.

I ran some benchmarks on Yitian 710.

On c8y.16xlarge (64 cores):

Without the patch:

  80.31%  postgres               [.] __aarch64_swp4_acq
   1.77%  postgres               [.] __aarch64_ldadd4_acq_rel
   1.13%  postgres               [.] hash_search_with_hash_value
   0.87%  pg_stat_statements.so  [.] __aarch64_swp4_acq
   0.72%  postgres               [.] perform_spin_delay
   0.44%  postgres               [.] _bt_compare

tps = 295272.628421 (including connections establishing)
tps = 295335.660323 (excluding connections establishing)

Patched:

   9.94%  postgres               [.] s_lock
   6.07%  postgres               [.] __aarch64_swp4_acq
   5.73%  postgres               [.] hash_search_with_hash_value
   2.81%  postgres               [.] perform_spin_delay
   2.29%  postgres               [.] _bt_compare
   2.15%  postgres               [.] PinBuffer

tps = 864519.764125 (including connections establishing)
tps = 864638.244443 (excluding connections establishing)

It seems that a large performance gain (about 2.9x tps in this test) is
possible when s_lock contention is severe. This may be more likely to happen
on bigger machines.

On c8y.2xlarge (8 cores), I failed to produce severe s_lock contention, and as
a result the patch didn’t make any difference outside the noise.

Regards,
Jingtang