On 03.09.2020 11:18, Michael Paquier wrote:
On Sun, Aug 16, 2020 at 02:26:57PM -0700, Andres Freund wrote:
So we get some buildfarm results while thinking about this.
Andres, there is an entry in the CF for this thread:
https://commitfest.postgresql.org/29/2500/

A lot of work has been committed with 623a9ba, 73487a6, 5788e25, etc.
Now that PGXACT is done, how much work is remaining here?
--
Michael

Andres,
First of all, many thanks for this work.
Improving Postgres connection scalability is very important.

The reported results look very impressive.
But I tried to reproduce them and did not observe similar behavior.
So I am wondering what the difference could be and what I am doing wrong.

I have tried two different systems.
The first one is an IBM Power2 server with 384 cores and 8 TB of RAM.
I ran the same read-only pgbench test as you. I do not think the size of the
database matters, so I used scale 100;
it seems to be enough to avoid frequent buffer conflicts.
Then I ran the same scripts as you:

 for ((n=100; n < 1000; n+=100)); do echo $n; pgbench -M prepared -c $n -T 100 -j $n -S -n postgres; done
 for ((n=1000; n <= 5000; n+=1000)); do echo $n; pgbench -M prepared -c $n -T 100 -j $n -S -n postgres; done
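
A minimal sketch of how both TPS figures could be collected per client count
from the pgbench output. It assumes the summary lines printed by pgbench
before PostgreSQL 14 ("tps = N (including connections establishing)" and
"tps = N (excluding connections establishing)"); the helper name extract_tps
is hypothetical and not part of pgbench.

```shell
# Hypothetical helper: read pgbench output on stdin and print
# "<tps including connections> <tps excluding connections>" on one line.
# Assumes the pre-v14 pgbench summary format.
extract_tps() {
    awk '
        /^tps = .*including connections establishing/ { inc = $3 }
        /^tps = .*excluding connections establishing/ { exc = $3 }
        END { print inc, exc }
    '
}

# Usage with the same pgbench options as in the loops above (bash):
# for ((n=1000; n <= 5000; n+=1000)); do
#     printf '%s ' "$n"
#     pgbench -M prepared -c "$n" -j "$n" -T 100 -S -n postgres | extract_tps
# done
```

Each output line would then hold the client count followed by the two TPS
values, which is the layout used in the results table further down.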


I have compared current master with the version of Postgres just before your
scalability commits: a9a4a7ad56

For every number of connections the older version shows slightly better
results, for example for 500 clients: 475k TPS vs. 450k TPS for current master.

This is quite an exotic server and I do not currently have access to it.
So I repeated the experiments on an Intel server.
It has 160 cores, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, and 256 GB of RAM.

With the same database and the same script, the results (TPS) are as follows:

Clients    old/inc    old/exl    new/inc    new/exl
1000       1105750    1163292    1206105    1212701
2000       1050933    1124688    1149706    1164942
3000       1063667    1195158    1118087    1144216
4000       1040065    1290432    1107348    1163906
5000        943813    1258643    1103790    1160251

I show the results including (inc) and excluding (exl) connection
establishment separately,
because in the new version there is almost no difference between them,
but for the old version the gap between them is noticeable.
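
The size of that gap can be put into numbers. A small sketch, with the rows
copied from the table above and a hypothetical helper name conn_gap, that
prints for each client count how much higher (in percent) the "excluding"
figure is than the "including" one, first for the old build and then for the
new one:

```shell
# Hypothetical helper: input columns are
#   clients old/inc old/exl new/inc new/exl
# Output: clients, old gap in percent, new gap in percent.
conn_gap() {
    awk '{ printf "%d %.1f %.1f\n", $1, ($3 - $2) / $2 * 100, ($5 - $4) / $4 * 100 }'
}

# Rows copied from the results table above.
conn_gap <<'EOF'
1000 1105750 1163292 1206105 1212701
2000 1050933 1124688 1149706 1164942
3000 1063667 1195158 1118087 1144216
4000 1040065 1290432 1107348 1163906
5000 943813 1258643 1103790 1160251
EOF
```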

The configuration file differs from the default postgres config as follows:

max_connections = 10000                 # (change requires restart)
shared_buffers = 8GB                    # min 128kB


These results contradict yours and make me ask the following questions:

1. Why is performance in your case almost two times higher (2 million vs. 1)?
The hardware in my case seems to be at least no worse than yours...
Maybe there are some other improvements in the version you tested which
are not yet committed to master?

2. You wrote: "This is on a machine with 2 Intel(R) Xeon(R) Platinum 8168,
but virtualized (2 sockets of 18 cores/36 threads)"

According to the Intel specification, the Intel® Xeon® Platinum 8168 processor
has 24 cores:
https://ark.intel.com/content/www/us/en/ark/products/120504/intel-xeon-platinum-8168-processor-33m-cache-2-70-ghz.html

And on your graph we can see an almost linear increase in speed up to 40
connections.

But the most suspicious word for me is "virtualized". What is the actual
hardware and how is it virtualized?

Do you have any idea why in my case the master version (with your commits)
behaves almost the same as the non-patched version?
Below is yet another table showing scalability from 10 to 100 connections,
combining your results (first two columns) and my results (last two columns):


Clients    old master    pgxact-split-cache    current master    revision 9a4a7ad56
10             367883                375682            358984                347067
20             748000                810964            668631                630304
30             999231               1288276            920255                848244
40             991672               1573310           1100745                970717
50            1017561               1715762           1193928               1008755
60             993943               1789698           1255629                917788
70             971379               1819477           1277634                873022
80             966276               1842248           1266523                830197
90             901175               1847823           1255260                736550
100            803175               1865795           1241143                736756


Maybe it is because of the more complex architecture of my server?
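
To compare the two pairs of columns directly, the patched/unpatched ratio can
be computed per client count. A sketch with the rows copied from the table
above; the helper name speedup is hypothetical, and the column order is
assumed to be the one in the table header (old master, pgxact-split-cache,
current master, revision 9a4a7ad56):

```shell
# Hypothetical helper: input columns are
#   clients old_master pgxact_split_cache current_master revision
# Output: clients, pgxact-split-cache / old master,
#         current master / revision 9a4a7ad56.
speedup() {
    awk '{ printf "%d %.2f %.2f\n", $1, $3 / $2, $4 / $5 }'
}

# Rows copied from the combined table above.
speedup <<'EOF'
10 367883 375682 358984 347067
20 748000 810964 668631 630304
30 999231 1288276 920255 848244
40 991672 1573310 1100745 970717
50 1017561 1715762 1193928 1008755
60 993943 1789698 1255629 917788
70 971379 1819477 1277634 873022
80 966276 1842248 1266523 830197
90 901175 1847823 1255260 736550
100 803175 1865795 1241143 736756
EOF
```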

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
