Hi, Thanks to your efforts the performance of the partitioned table has improved greatly. Since I evaluated the performance by combining the performance improvement patches proposed in PG12, I share it.
The purpose of this evaluation is to organize performance issues of the table which is partitioned large number like over thousand and I want to bring the partitioned table performance close to the no partitioned table performance. I used pgbench. On the premise that tables don't have data and plan_cache_mode is auto. - source: master(38763d6778 Date: Thu Sep 20 15:52:39 2018 +1200) + v9 patch[1] + v5 patch[2] + v3 patch[3] [1] Reduce partition tuple routing overheads https://commitfest.postgresql.org/19/1690/ [2] Revise executor's handling of range table relations https://commitfest.postgresql.org/19/1758/ [3] Speed up planning with partitions https://commitfest.postgresql.org/19/1778/ - table definition: When 6400 items of data is inserted to parent table, the each leaf partitions have the same number of items. For example, the following DDL is the number of leaf partitions is a hundred. create table test.accounts_history(id serial, aid int, delta int, mtime timestamp without time zone) partition by range(aid); create table test.account_part_1 partition of test.accounts for values from (1) to (65); create table test.account_part_2 partition of test.accounts for values from (65) to (129); ... create table test.account_part_100 partition of test.accounts for values from (6337) to (6400); create table test.ah_part_1 partition of test.accounts_history for values from (1) to (65); create table test.ah_part_2 partition of test.accounts_history for values from (65) to (129); ... create table test.ah_part_100 partition of test.accounts_history for values from (6337) to (6400); - benchmark script: I make SQL which only one leaf partition is targeted in each case of SELECT/UPDATE/DELETE/INSERT. \set aid random(1, 6400) \set delta random(-5000, 5000) SELECT abalance FROM test.accounts WHERE aid = :aid; or INSERT INTO test.accounts_history (aid, delta, mtime) VALUES (:aid, :delta, CURRENT_TIMESTAMP); or UPDATE test.accounts SET abalance = abalance + :delta WHERE aid = :aid; or DELETE FROM test.accounts where aid = :aid; - results: 1. simple mode results: part_num is the number of partition and 0 means no partitioned case. tps_ex is tps of excluding connections establishing. Also, after pgbench, I evaluate Planning time and Execution Time using explain analyze. plan_time_avg and execute_time_avg are average of explain analyze when executed ten times. pgbench -n -T 60 -r -f select.sql part_num | tps_ex | plan_time_avg | execute_time_avg ----------+-------------+---------------+------------------ 0 | 8285.83582 | 0.0528 | 0.0222 100 | 5948.1711 | 0.1342 | 0.0306 200 | 5436.438478 | 0.15 | 0.0298 400 | 4523.867744 | 0.148 | 0.03 800 | 3460.625739 | 0.1447 | 0.0305 1600 | 2428.795542 | 0.1528 | 0.0303 3200 | 1539.672214 | 0.1552 | 0.0316 6400 | 880.965232 | 0.1704 | 0.0288 (8 rows) pgbench -n -T 60 -r -f update.sql part_num | tps_ex | plan_time_avg | execute_time_avg ----------+-------------+---------------+------------------ 0 | 7360.58261 | 0.0596 | 0.0417 100 | 4633.880563 | 0.1564 | 0.105 200 | 3972.737702 | 0.152 | 0.1007 400 | 3000.23471 | 0.1594 | 0.1039 800 | 2139.676379 | 0.1664 | 0.1055 1600 | 1348.553673 | 0.165 | 0.1056 3200 | 787.48559 | 0.1774 | 0.1124 6400 | 411.575671 | 0.1823 | 0.1089 (8 rows) pgbench -n -T 60 -r -f delete.sql part_num | tps_ex | plan_time_avg | execute_time_avg ----------+-------------+---------------+------------------ 0 | 8133.84019 | 0.057 | 0.0403 100 | 5150.452458 | 0.1398 | 0.0936 200 | 4352.69018 | 0.1414 | 0.0964 400 | 3298.86364 | 0.1459 | 0.099 800 | 2245.946178 | 0.1559 | 0.1029 1600 | 1386.92366 | 0.1591 | 0.1048 3200 | 802.024765 | 0.1617 | 0.1042 6400 | 407.214158 | 0.168 | 0.1087 (8 rows) pgbench -n -T 60 -r -f insert.sql part_num | tps_ex | plan_time_avg | execute_time_avg ----------+-------------+---------------+------------------ 0 | 5246.142416 | 0.0307 | 0.0601 100 | 2190.331571 | 0.0311 | 0.3587 200 | 1452.601752 | 0.0301 | 0.5065 400 | 863.771879 | 0.031 | 0.7864 800 | 482.528223 | 0.0308 | 1.361 1600 | 254.026173 | 0.0325 | 2.4999 3200 | 133.04952 | 0.0327 | 4.94 6400 | 68.541207 | 0.0335 | 9.8271 (8 rows) SELECT/UPDATE/DELETE case Planning Time and Execution Time are almost same. but, tps decreases as the number of partitions increases. why? INSERT case Execution Time is longer as the number of partitions increase. This is why find_all_inheritors locks all leaf partitions. Improvement idea was already proposed in [1] but, not discussion started yet. 2. prepared mode results: part_num and tps_ex are same with simple mode results. plan_time_avg and execute_time are the average of the 7th to 10th out of 10 times. pgbench -n -M prepared -T 60 -r -f select.sql part_num | tps_ex | plan_time_avg | execute_time_avg ----------+-------------+---------------+------------------ 0 | 10939.35898 | 0.007 | 0.021 100 | 4705.814419 | 0.063 | 0.025 200 | 3884.922764 | 0.067 | 0.0255 400 | 2762.569518 | 0.06575 | 0.0255 800 | 1826.824659 | 0.07 | 0.02625 1600 | 1119.961247 | 0.065 | 0.021 3200 | 588.001498 | 0.08725 | 0.02775 6400 | 218.817738 | 0.1325 | 0.03325 (8 rows) pgbench -n -M prepared -T 60 -r -f update.sql part_num | tps_ex | plan_time_avg | execute_time_avg ----------+--------------+---------------+------------------ 0 | 10996.995001 | 0.01475 | 0.029 100 | 3711.790226 | 0.07975 | 0.074 200 | 2724.792289 | 0.0865 | 0.1055 400 | 1837.348771 | 0.09175 | 0.104 800 | 1105.378276 | 0.0925 | 0.10425 1600 | 619.924085 | 0.09425 | 0.1115 3200 | 289.202665 | 0.109 | 0.12475 6400 | 110.149098 | 0.1695 | 0.1585 (8 rows) pgbench -n -M prepared -T 60 -r -f delete.sql part_num | tps_ex | plan_time_avg | execute_time_avg ----------+--------------+---------------+------------------ 0 | 11398.458657 | 0.009 | 0.02625 100 | 3997.105459 | 0.072 | 0.08675 200 | 3124.704202 | 0.0745 | 0.09725 400 | 2113.749027 | 0.083 | 0.10125 800 | 1365.142607 | 0.081 | 0.107 1600 | 740.248148 | 0.0895 | 0.10925 3200 | 363.054579 | 0.09525 | 0.11375 6400 | 124.795721 | 0.14725 | 0.158 (8 rows) pgbench -n -M prepared -T 60 -r -f insert.sql part_num | tps_ex | plan_time_avg | execute_time_avg ----------+-------------+---------------+------------------ 0 | 6308.66377 | 0.01175 | 0.04225 100 | 2476.919703 | 0.014 | 0.26375 200 | 1569.59956 | 0.01325 | 0.38625 400 | 912.4026 | 0.014 | 0.6055 800 | 502.504371 | 0.0145 | 1.05725 1600 | 259.97686 | 0.01425 | 1.9835 3200 | 136.809811 | 0.01225 | 3.877 6400 | 70.489791 | 0.0105 | 7.76575 (8 rows) SELECT/UPDATE/DELETE case As with simple mode results, tps decreases as the number of partitions increases. why? Also, I wonder why no prepared case is faster than prepared case. When I see all 10 times results of the explain analyze, I realize Planning Time of 6th explain analyze is much longer. why does it take so long to build a new generic plan? part_num | num | plan_time | execute_time ----------+-----+-----------+-------------- 6400 | 1 | 0.459 | 0.043 6400 | 2 | 0.168 | 0.028 6400 | 3 | 0.169 | 0.028 6400 | 4 | 0.151 | 0.027 6400 | 5 | 0.148 | 0.027 6400 | 6 | 449.213 | 0.052 6400 | 7 | 0.142 | 0.033 6400 | 8 | 0.12 | 0.034 6400 | 9 | 0.136 | 0.033 6400 | 10 | 0.132 | 0.033 INSERT case I think the cause of performance degradation is the same as simple mode. In the case of prepared statement, even if the target of SELECT statement is one leaf partition, since LockRelationOid of AcquireExecutorLocks is executed for all leaves, the performance decreases as the number of partitions increases. Can AcquireExecutorLocks lock only the target leaf partition? I am planning to investigate using a system TAP etc. for other bottlenecks. If you have any other convenient method, please let me know. Also, if there is something already known as a bottleneck, please let me know. regards, -- Sho Kato