RE: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()

Floris Van Nee, Sun, 08 Mar 2020 04:24:12 -0700

> Attached is v5, which inlines in a targeted fashion, pretty much in the same
> way as the earliest version. This is the same as v4 in every other way.
> Perhaps you can test this.
>


Thank you for the new patch. With it, I am indeed able to reproduce a
performance increase. It is very difficult to reproduce reliably on a large
number of cores, though, because of the NUMA architecture.

For tests with a small number of cores, I pin both pgbench and Postgres to the
same node. With that setup, I see a significant performance increase for v5
compared to master. However, if I pin pgbench to a different node from the one
Postgres is pinned to, performance drops by 20% compared to having everything
on the same node, and the stddev roughly doubles (with or without the patch).
That makes it very difficult to see any performance increase from the patch.

With a large number of pgbench workers, I cannot pin each pgbench worker to the
same node as the Postgres backend handling its connection. Leaving placement to
the OS gives very unreliable results: when I run the 30 workers / 30
connections test, it sometimes runs 'correctly' for periods of up to 30
minutes, but it can also randomly run at the 20% lower performance level for a
long time. So far, my best explanation for this is the cross-node NUMA
performance hit. I'd like to be able to schedule specific Postgres backends on
one node while other backends run on a different node, but this isn't
straightforward.

For now, though, I see no issues with the patch. In real-life situations,
however, there may be other, more important optimizations for people who use
big multi-node machines.

Thoughts?

-Floris
