On 20.01.2025 17:20, Ilia Evdokimov wrote:

On 15.01.2025 20:16, Sami Imseih wrote:
Probably, but first I suggest benchmarking with sampling applied to all queries. If the results are good, we can later filter certain queries based on different characteristics.
Absolutely. The benchmark numbers to justify this feature are
the next step. Thanks for your work on this!

Regards,

Sami

Hi hackers,

I’d like to share the results of my benchmark.

To stress the spinlock in pg_stat_statements while incrementing counters for popular entries, it's important to use the same query repeatedly. To avoid overloading pgss with normalization, the queries should not contain constants. I found that using the query 'SELECT now()' works best for this purpose.

I ran the benchmark on a machine with 48 CPUs, which may not be enough to fully exercise sampling, but I was able to reach around 1.5 million TPS with 'SELECT now()'. To load the CPUs to about 85-90%, I ran pgbench with -c 45 -j 45 and a custom script file, pgbench_script.sql, containing just 'SELECT now()'. The benchmark was run on a newly created 'pgbench' database, with autovacuum, fsync, and frequent checkpoints disabled (e.g., checkpoint_timeout = '24h').
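
For reference, since the script is a single line, the whole setup can be reproduced with just:

    echo "SELECT now();" > pgbench_script.sql
    createdb pgbench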

I tested various 'sample_rate' values and compared the performance with pgss both enabled and disabled. The detailed results are attached in the 'sample_rate.txt' file, along with pgbench progress reports every 10 seconds.


pgbench -c 45 -j 45 -T 200 -f pgbench_script.sql --progress=10 pgbench

sample_rate |                 tps | lat (ms) | stddev (ms) | num of transactions

        1.0 |      1207127.572580 |    0.037 |       0.030 |           241400836
        0.8 |      1403551.516338 |    0.032 |       0.031 |           280673286
        0.5 |      1658596.614064 |    0.027 |       0.012 |           331679344
        0.2 |      1757747.509476 |    0.025 |       0.008 |           351507156
        0.0 |      1760055.986259 |    0.025 |       0.008 |           351965559
   pgss off |      1828743.791205 |    0.024 |       0.008 |           365703762


If anyone is able to run this benchmark on a machine with more CPUs, or with different queries, that would be very helpful. I'd appreciate any suggestions or feedback.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.


Hi hackers,


I'd like to share the results of a new benchmark, run with pgbench using the -M prepared -S flags.


Previously, I shared a benchmark focused on extreme high-load scenarios, but now I’ve conducted a more realistic test case based on Alexander’s suggestion [0].

Machine Characteristics:

 * 48 CPUs
 * 374 GB RAM
 * 3 TB Disk

To avoid interference from background activity, I disabled autovacuum, fsync, and frequent checkpoints using the following settings in postgresql.auto.conf:
    autovacuum = off
    fsync = off
    checkpoint_timeout = '24h'
    shared_preload_libraries = 'pg_stat_statements'

Create a test database:
    createdb pgbench
    pgbench -i -s 3000 pgbench

If pg_stat_statements is enabled, reset its hash table before starting the benchmark:
    psql -c 'SELECT pg_stat_statements_reset()'

Run the benchmark with the following command:
    pgbench -c 46 -j 46 -T 200 -M prepared -S --progress=10 pgbench
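
Between runs, only pg_stat_statements.sample_rate changes. One way to set it before each run (assuming the GUC takes effect on a reload, with no restart needed) is:

    psql -d pgbench -c "ALTER SYSTEM SET pg_stat_statements.sample_rate = 0.5"
    psql -d pgbench -c "SELECT pg_reload_conf()"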


Here are the results with and without pg_stat_statements, varying the sample_rate parameter:

sample_rate |                 tps | lat (ms) | stddev (ms) | num of transactions
        1.0 |      1299070.105018 |    0.035 |       0.055 |           259781895
        0.8 |      1393327.873249 |    0.033 |       0.039 |           278630669
        0.5 |      1420303.856480 |    0.032 |       0.037 |           284023396
        0.2 |      1432482.672301 |    0.032 |       0.037 |           286461834
        0.0 |      1451036.547496 |    0.032 |       0.036 |           290169584
   pgss off |      1460920.805259 |    0.031 |       0.036 |           292144256

With pg_stat_statements fully enabled (sample_rate = 1.0), TPS drops by roughly 11% compared to running without the extension. However, the new sampling feature (sample_rate) makes it possible to recover most of that overhead. For the detailed benchmark results with the '-M prepared -S' flags, please see the attached file 'sample_rate_prepared.txt'.
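
As a quick sanity check that sampling behaves as expected, the calls counter for the hot query in the attached output tracks sample_rate times the number of executed transactions. For example, for the 0.8 run:

    -- numbers taken from sample_rate_prepared.txt (sample_rate = 0.8):
    -- 278630669 transactions executed, 222902226 calls recorded
    SELECT 222902226::numeric / 278630669;   -- ~0.7999, i.e. close to sample_rate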


P.S. Previously, I shared results under extreme high-load conditions using the query 'SELECT now()'. For easier comparison, here are those results again in the same format:

sample_rate |                 tps | lat (ms) | stddev (ms) | num of transactions
        1.0 |      1207127.572580 |    0.037 |       0.030 |           241400836
        0.8 |      1403551.516338 |    0.032 |       0.031 |           280673286
        0.5 |      1658596.614064 |    0.027 |       0.012 |           331679344
        0.2 |      1757747.509476 |    0.025 |       0.008 |           351507156
        0.0 |      1760055.986259 |    0.025 |       0.008 |           351965559
   pgss off |      1828743.791205 |    0.024 |       0.008 |           365703762

Here, we observed a significant ~34% drop in TPS when pgss was enabled without sampling.
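
For completeness, both overhead figures quoted above fall straight out of the tables:

    SELECT 1 - 1207127.572580 / 1828743.791205;  -- ~0.34, 'SELECT now()' workload
    SELECT 1 - 1299070.105018 / 1460920.805259;  -- ~0.11, '-M prepared -S' workload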


Any thoughts?


[0]: https://www.postgresql.org/message-id/CAPpHfdsTKAQqC3A48-MGQhrhfEamXZPb64w%3Dutk7thQcOMNr7Q%40mail.gmail.com

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
pg_stat_statements.sample_rate = 1.0

pgbench -c 46 -j 46 -T 200 -M prepared -S --progress=10 pgbench
pgbench (16.6)
starting vacuum...end.
progress: 10.0 s, 314978.6 tps, lat 0.146 ms stddev 0.257, 0 failed
progress: 20.0 s, 849655.3 tps, lat 0.054 ms stddev 0.108, 0 failed
progress: 30.0 s, 1338227.1 tps, lat 0.034 ms stddev 0.045, 0 failed
progress: 40.0 s, 1375800.0 tps, lat 0.033 ms stddev 0.042, 0 failed
progress: 50.0 s, 1383403.2 tps, lat 0.033 ms stddev 0.039, 0 failed
progress: 60.0 s, 1381348.2 tps, lat 0.033 ms stddev 0.044, 0 failed
progress: 70.0 s, 1386079.3 tps, lat 0.033 ms stddev 0.040, 0 failed
progress: 80.0 s, 1372314.9 tps, lat 0.033 ms stddev 0.040, 0 failed
progress: 90.0 s, 1383231.2 tps, lat 0.033 ms stddev 0.042, 0 failed
progress: 100.0 s, 1376702.3 tps, lat 0.033 ms stddev 0.042, 0 failed
progress: 110.0 s, 1375622.4 tps, lat 0.033 ms stddev 0.043, 0 failed
progress: 120.0 s, 1387577.0 tps, lat 0.033 ms stddev 0.040, 0 failed
progress: 130.0 s, 1383540.3 tps, lat 0.033 ms stddev 0.039, 0 failed
progress: 140.0 s, 1383138.6 tps, lat 0.033 ms stddev 0.040, 0 failed
progress: 150.0 s, 1377247.6 tps, lat 0.033 ms stddev 0.042, 0 failed
progress: 160.0 s, 1382830.9 tps, lat 0.033 ms stddev 0.039, 0 failed
progress: 170.0 s, 1383637.7 tps, lat 0.033 ms stddev 0.041, 0 failed
progress: 180.0 s, 1381811.7 tps, lat 0.033 ms stddev 0.041, 0 failed
progress: 190.0 s, 1382384.7 tps, lat 0.033 ms stddev 0.041, 0 failed
transaction type: <builtin: select only>
scaling factor: 3000
query mode: prepared
number of clients: 46
number of threads: 46
maximum number of tries: 1
duration: 200 s
number of transactions actually processed: 259781895
number of failed transactions: 0 (0.000%)
latency average = 0.035 ms
latency stddev = 0.055 ms
initial connection time = 26.448 ms
tps = 1299070.105018 (without initial connection time)

pgbench=# select query, calls from pg_stat_statements where calls > 1000;
                        query                         |   calls   
------------------------------------------------------+-----------
 SELECT abalance FROM pgbench_accounts WHERE aid = $1 | 259781895
(1 row)


pgbench=# show pg_stat_statements.sample_rate ;
 pg_stat_statements.sample_rate 
--------------------------------
 1
(1 row)

pgbench=# select count(*) from pg_stat_statements;
 count 
-------
   130
(1 row)

========================================================================

pg_stat_statements.sample_rate = 0.8

pgbench -c 46 -j 46 -T 200 -M prepared -S --progress=10 pgbench
pgbench (16.6)
starting vacuum...end.
progress: 10.0 s, 319046.3 tps, lat 0.144 ms stddev 0.257, 0 failed
progress: 20.0 s, 893116.1 tps, lat 0.051 ms stddev 0.102, 0 failed
progress: 30.0 s, 1440539.1 tps, lat 0.032 ms stddev 0.025, 0 failed
progress: 40.0 s, 1484261.8 tps, lat 0.031 ms stddev 0.015, 0 failed
progress: 50.0 s, 1483957.5 tps, lat 0.031 ms stddev 0.018, 0 failed
progress: 60.0 s, 1491030.9 tps, lat 0.031 ms stddev 0.014, 0 failed
progress: 70.0 s, 1483171.0 tps, lat 0.031 ms stddev 0.014, 0 failed
progress: 80.0 s, 1475368.5 tps, lat 0.031 ms stddev 0.015, 0 failed
progress: 90.0 s, 1487719.7 tps, lat 0.031 ms stddev 0.015, 0 failed
progress: 100.0 s, 1485387.0 tps, lat 0.031 ms stddev 0.016, 0 failed
progress: 110.0 s, 1474178.4 tps, lat 0.031 ms stddev 0.020, 0 failed
progress: 120.0 s, 1489178.2 tps, lat 0.031 ms stddev 0.016, 0 failed
progress: 130.0 s, 1480493.7 tps, lat 0.031 ms stddev 0.015, 0 failed
progress: 140.0 s, 1477666.3 tps, lat 0.031 ms stddev 0.014, 0 failed
progress: 150.0 s, 1483067.4 tps, lat 0.031 ms stddev 0.016, 0 failed
progress: 160.0 s, 1480664.3 tps, lat 0.031 ms stddev 0.014, 0 failed
progress: 170.0 s, 1482848.3 tps, lat 0.031 ms stddev 0.019, 0 failed
progress: 180.0 s, 1482780.2 tps, lat 0.031 ms stddev 0.015, 0 failed
progress: 190.0 s, 1487093.0 tps, lat 0.031 ms stddev 0.015, 0 failed
transaction type: <builtin: select only>
scaling factor: 3000
query mode: prepared
number of clients: 46
number of threads: 46
maximum number of tries: 1
duration: 200 s
number of transactions actually processed: 278630669
number of failed transactions: 0 (0.000%)
latency average = 0.033 ms
latency stddev = 0.039 ms
initial connection time = 26.004 ms
tps = 1393327.873249 (without initial connection time)

pgbench=# select query, calls from pg_stat_statements where calls > 1000;
                        query                         |   calls   
------------------------------------------------------+-----------
 SELECT abalance FROM pgbench_accounts WHERE aid = $1 | 222902226
(1 row)

pgbench=# show pg_stat_statements.sample_rate ;
 pg_stat_statements.sample_rate 
--------------------------------
 0.8
(1 row)

pgbench=# select count(*) from pg_stat_statements;
 count 
-------
   117
(1 row)


========================================================================

pg_stat_statements.sample_rate = 0.5

pgbench -c 46 -j 46 -T 200 -M prepared -S --progress=10 pgbench
pgbench (16.6)
starting vacuum...end.
progress: 10.0 s, 313361.7 tps, lat 0.146 ms stddev 0.263, 0 failed
progress: 20.0 s, 907177.0 tps, lat 0.051 ms stddev 0.103, 0 failed
progress: 30.0 s, 1467741.3 tps, lat 0.031 ms stddev 0.022, 0 failed
progress: 40.0 s, 1514111.4 tps, lat 0.030 ms stddev 0.009, 0 failed
progress: 50.0 s, 1508402.2 tps, lat 0.030 ms stddev 0.008, 0 failed
progress: 60.0 s, 1511752.6 tps, lat 0.030 ms stddev 0.008, 0 failed
progress: 70.0 s, 1505456.0 tps, lat 0.030 ms stddev 0.009, 0 failed
progress: 80.0 s, 1502666.4 tps, lat 0.030 ms stddev 0.008, 0 failed
progress: 90.0 s, 1512033.1 tps, lat 0.030 ms stddev 0.007, 0 failed
progress: 100.0 s, 1503996.7 tps, lat 0.030 ms stddev 0.007, 0 failed
progress: 110.0 s, 1506571.7 tps, lat 0.030 ms stddev 0.007, 0 failed
progress: 120.0 s, 1523809.6 tps, lat 0.030 ms stddev 0.013, 0 failed
progress: 130.0 s, 1515853.1 tps, lat 0.030 ms stddev 0.008, 0 failed
progress: 140.0 s, 1511753.9 tps, lat 0.030 ms stddev 0.010, 0 failed
progress: 150.0 s, 1511626.1 tps, lat 0.030 ms stddev 0.010, 0 failed
progress: 160.0 s, 1513088.5 tps, lat 0.030 ms stddev 0.008, 0 failed
progress: 170.0 s, 1520756.9 tps, lat 0.030 ms stddev 0.009, 0 failed
progress: 180.0 s, 1519851.7 tps, lat 0.030 ms stddev 0.010, 0 failed
progress: 190.0 s, 1517730.6 tps, lat 0.030 ms stddev 0.009, 0 failed
transaction type: <builtin: select only>
scaling factor: 3000
query mode: prepared
number of clients: 46
number of threads: 46
maximum number of tries: 1
duration: 200 s
number of transactions actually processed: 284023396
number of failed transactions: 0 (0.000%)
latency average = 0.032 ms
latency stddev = 0.037 ms
initial connection time = 27.242 ms
tps = 1420303.856480 (without initial connection time)

pgbench=# select query, calls from pg_stat_statements where calls > 1000;
                        query                         |   calls   
------------------------------------------------------+-----------
 SELECT abalance FROM pgbench_accounts WHERE aid = $1 | 142014320
(1 row)

pgbench=# show pg_stat_statements.sample_rate ;
 pg_stat_statements.sample_rate 
--------------------------------
 0.5
(1 row)

pgbench=# select count(*) from pg_stat_statements;
 count 
-------
    83
(1 row)

========================================================================

pg_stat_statements.sample_rate = 0.2

pgbench -c 46 -j 46 -T 200 -M prepared -S --progress=10 pgbench
pgbench (16.6)
starting vacuum...end.
progress: 10.0 s, 305735.8 tps, lat 0.150 ms stddev 0.266, 0 failed
progress: 20.0 s, 886211.5 tps, lat 0.052 ms stddev 0.105, 0 failed
progress: 30.0 s, 1477449.3 tps, lat 0.031 ms stddev 0.023, 0 failed
progress: 40.0 s, 1533608.6 tps, lat 0.030 ms stddev 0.007, 0 failed
progress: 50.0 s, 1520686.0 tps, lat 0.030 ms stddev 0.007, 0 failed
progress: 60.0 s, 1521163.7 tps, lat 0.030 ms stddev 0.007, 0 failed
progress: 70.0 s, 1506093.3 tps, lat 0.030 ms stddev 0.007, 0 failed
progress: 80.0 s, 1521926.6 tps, lat 0.030 ms stddev 0.009, 0 failed
progress: 90.0 s, 1532715.4 tps, lat 0.030 ms stddev 0.006, 0 failed
progress: 100.0 s, 1536818.2 tps, lat 0.030 ms stddev 0.008, 0 failed
progress: 110.0 s, 1532156.5 tps, lat 0.030 ms stddev 0.008, 0 failed
progress: 120.0 s, 1531077.6 tps, lat 0.030 ms stddev 0.007, 0 failed
progress: 130.0 s, 1533445.0 tps, lat 0.030 ms stddev 0.011, 0 failed
progress: 140.0 s, 1530465.1 tps, lat 0.030 ms stddev 0.006, 0 failed
progress: 150.0 s, 1533264.8 tps, lat 0.030 ms stddev 0.006, 0 failed
progress: 160.0 s, 1531706.4 tps, lat 0.030 ms stddev 0.009, 0 failed
progress: 170.0 s, 1527768.1 tps, lat 0.030 ms stddev 0.006, 0 failed
progress: 180.0 s, 1523762.9 tps, lat 0.030 ms stddev 0.008, 0 failed
progress: 190.0 s, 1531264.7 tps, lat 0.030 ms stddev 0.008, 0 failed
transaction type: <builtin: select only>
scaling factor: 3000
query mode: prepared
number of clients: 46
number of threads: 46
maximum number of tries: 1
duration: 200 s
number of transactions actually processed: 286461834
number of failed transactions: 0 (0.000%)
latency average = 0.032 ms
latency stddev = 0.037 ms
initial connection time = 26.189 ms
tps = 1432482.672301 (without initial connection time)

pgbench=# select query, calls from pg_stat_statements where calls > 1000;
                        query                         |  calls   
------------------------------------------------------+----------
 SELECT abalance FROM pgbench_accounts WHERE aid = $1 | 57294631
(1 row)

pgbench=# show pg_stat_statements.sample_rate ;
 pg_stat_statements.sample_rate 
--------------------------------
 0.2
(1 row)

pgbench=# select count(*) from pg_stat_statements;
 count 
-------
    56
(1 row)


========================================================================

pg_stat_statements.sample_rate = 0.0

pgbench -c 46 -j 46 -T 200 -M prepared -S --progress=10 pgbench
pgbench (16.6)
starting vacuum...end.
progress: 10.0 s, 308612.3 tps, lat 0.149 ms stddev 0.263, 0 failed
progress: 20.0 s, 904494.3 tps, lat 0.051 ms stddev 0.103, 0 failed
progress: 30.0 s, 1497609.8 tps, lat 0.031 ms stddev 0.023, 0 failed
progress: 40.0 s, 1549945.7 tps, lat 0.030 ms stddev 0.007, 0 failed
progress: 50.0 s, 1552001.6 tps, lat 0.030 ms stddev 0.009, 0 failed
progress: 60.0 s, 1550998.6 tps, lat 0.030 ms stddev 0.007, 0 failed
progress: 70.0 s, 1552726.1 tps, lat 0.029 ms stddev 0.006, 0 failed
progress: 80.0 s, 1551847.2 tps, lat 0.030 ms stddev 0.006, 0 failed
progress: 90.0 s, 1542375.5 tps, lat 0.030 ms stddev 0.008, 0 failed
progress: 100.0 s, 1552666.8 tps, lat 0.029 ms stddev 0.006, 0 failed
progress: 110.0 s, 1542779.7 tps, lat 0.030 ms stddev 0.010, 0 failed
progress: 120.0 s, 1533717.7 tps, lat 0.030 ms stddev 0.007, 0 failed
progress: 130.0 s, 1532413.9 tps, lat 0.030 ms stddev 0.007, 0 failed
progress: 140.0 s, 1544919.6 tps, lat 0.030 ms stddev 0.006, 0 failed
progress: 150.0 s, 1539568.5 tps, lat 0.030 ms stddev 0.006, 0 failed
progress: 160.0 s, 1551072.9 tps, lat 0.030 ms stddev 0.007, 0 failed
progress: 170.0 s, 1558352.5 tps, lat 0.029 ms stddev 0.009, 0 failed
progress: 180.0 s, 1556477.4 tps, lat 0.029 ms stddev 0.007, 0 failed
progress: 190.0 s, 1549669.6 tps, lat 0.030 ms stddev 0.007, 0 failed
transaction type: <builtin: select only>
scaling factor: 3000
query mode: prepared
number of clients: 46
number of threads: 46
maximum number of tries: 1
duration: 200 s
number of transactions actually processed: 290169584
number of failed transactions: 0 (0.000%)
latency average = 0.032 ms
latency stddev = 0.036 ms
initial connection time = 27.090 ms
tps = 1451036.547496 (without initial connection time)

pgbench=# select query, calls from pg_stat_statements where calls > 1000;
 query | calls 
-------+-------
(0 rows)

pgbench=# show pg_stat_statements.sample_rate ;
 pg_stat_statements.sample_rate 
--------------------------------
 0
(1 row)

pgbench=# select count(*) from pg_stat_statements;
 count 
-------
     0
(1 row)


========================================================================

pg_stat_statements is off

pgbench -c 46 -j 46 -T 200 -M prepared -S --progress=10 pgbench
pgbench (16.6)
starting vacuum...end.
progress: 10.0 s, 316410.4 tps, lat 0.145 ms stddev 0.257, 0 failed
progress: 20.0 s, 919027.2 tps, lat 0.050 ms stddev 0.103, 0 failed
progress: 30.0 s, 1513827.2 tps, lat 0.030 ms stddev 0.021, 0 failed
progress: 40.0 s, 1555707.4 tps, lat 0.029 ms stddev 0.006, 0 failed
progress: 50.0 s, 1552820.8 tps, lat 0.029 ms stddev 0.006, 0 failed
progress: 60.0 s, 1564985.9 tps, lat 0.029 ms stddev 0.006, 0 failed
progress: 70.0 s, 1559507.4 tps, lat 0.029 ms stddev 0.007, 0 failed
progress: 80.0 s, 1561871.2 tps, lat 0.029 ms stddev 0.006, 0 failed
progress: 90.0 s, 1563076.5 tps, lat 0.029 ms stddev 0.006, 0 failed
progress: 100.0 s, 1562030.4 tps, lat 0.029 ms stddev 0.006, 0 failed
progress: 110.0 s, 1555040.3 tps, lat 0.029 ms stddev 0.008, 0 failed
progress: 120.0 s, 1557295.8 tps, lat 0.029 ms stddev 0.008, 0 failed
progress: 130.0 s, 1552556.9 tps, lat 0.029 ms stddev 0.008, 0 failed
progress: 140.0 s, 1547155.3 tps, lat 0.030 ms stddev 0.006, 0 failed
progress: 150.0 s, 1543266.5 tps, lat 0.030 ms stddev 0.008, 0 failed
progress: 160.0 s, 1561086.8 tps, lat 0.029 ms stddev 0.007, 0 failed
progress: 170.0 s, 1558993.9 tps, lat 0.029 ms stddev 0.009, 0 failed
progress: 180.0 s, 1554616.8 tps, lat 0.029 ms stddev 0.006, 0 failed
progress: 190.0 s, 1557972.0 tps, lat 0.029 ms stddev 0.007, 0 failed
transaction type: <builtin: select only>
scaling factor: 3000
query mode: prepared
number of clients: 46
number of threads: 46
maximum number of tries: 1
duration: 200 s
number of transactions actually processed: 292144256
number of failed transactions: 0 (0.000%)
latency average = 0.031 ms
latency stddev = 0.036 ms
initial connection time = 29.085 ms
tps = 1460920.805259 (without initial connection time)
