Hi, I've not yet been involved in postgresql's development process, so here's a first. Please find attached a patch for improving the BT binary search speeds, with accompanying performance data.
v1-0001 adapts _bt_compare and _bt_binsrch* to make use of static properties of L&Y-style nbtree indexes to speed up finding an initial position in the nbtee. I alter the logic from _bt_compare to accept a 'start-comparing-at-column' argument, and to also return which column the comparison result came from. This allows us to keep track of which columns we've already checked for equality, and when getting deeper into the index this allows us to skip checking the already equal key columns. Specifically, when binary-searching through a page, we keep track of which column was checked for the high-key, and which for the low-key. The first column of these that is not yet equal will be used for the next comparison. Any columns before this column must be equal, as both the high- and the low-keys in that page have already been validated as having an equal prefix. The column number is then passed on through the insertion key, allowing the optimization to be used in the climbing of the nbtree index. v1-0001 will be especially performant in nbtree indexes with a relatively low initial cardinality and high compare cost. My own performance testing was done on a laptop (4c/8t), after first getting it warm with other benchmarks & compilations, so the results are a bit unstable. While testing this I also noticed that there were a lot of pipeline stalls around the arguments and results of _bt_compare in the hot loops of _bt_binsrch* where the new functionality would be used, so I've moved the core logic of _bt_compare to a static inline function _bt_compare_inline, which helped the code to go from a 2% TPS decrease for single-column indexes to ~ 8% TPS increase for an insert-only workload, and for 3-column text all-collated indexes a TPS increase of 20% on my system. This also allowed me to keep the signature of _bt_compare the same as it was, limiting the scope of the changes to only the named functions. Finally, this could be a great start on prefix truncation for btree indexes, though that is _not_ the goal of this patch. This patch skips, but does not truncate, the common prefix. Kind regards, Matthias van de Meent P.S. One more thing I noticed in benchmarking is that a significant part of the costs of non-default collations is in looking up the collation twice in the collation cache. My guess from flame graphs is that there could be a large throughput increase for short strings if the collation lookup from lc_collate_is_c() in varstr_cmp could be reused in the subsequent call to pg_newlocale_from_collation()
System: Laptop (XPS 13 2018, i7-8550U 4c/8t, 16GB) Compile configuration: ./configure CFLAGS= -fno-omit-frame-pointer -ggdb -O2 Postgres configuration: default postgresql config files. All tests were run after getting the system hot with 15 minutes of other multi-thread benchmarking (~50% all-core load), and all tests had perf running attached to one of the backends for 900 seconds of the 930s benchmark PGBench command: pgbench postgres --random-seed=101 -P 10 -r -T 930 -c 10 -M prepared -s 100 -f bench_multicolumn.sql ----- \d+ pgbench_history Table "public.pgbench_history" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+-----------------------------+-----------+----------+---------+----------+--------------+------------- tid | integer | | | | plain | | bid | integer | | | | plain | | aid | integer | | | | plain | | delta | integer | | | | plain | | mtime | timestamp without time zone | | | | plain | | filler | character(22) | | | | extended | | Indexes: "pgbench_history_aid_idx" btree (aid) "pgbench_history_aid_idx1" btree (aid) "pgbench_history_aid_idx2" btree (aid) "pgbench_history_aid_idx3" btree (aid) "pgbench_history_aid_idx4" btree (aid) Access method: heap \d+ pgbench_history_2 Table "public.pgbench_history_2" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+-----------------------------+-----------+----------+---------+----------+--------------+------------- tid | integer | | | | plain | | bid | integer | | | | plain | | aid | integer | | | | plain | | delta | integer | | | | plain | | mtime | timestamp without time zone | | | | plain | | filler | character(22) | | | | extended | | Indexes: "pgbench_history_2_bid_tid_aid_idx" btree ((bid::text) COLLATE "en_GB.utf8", (tid::text) COLLATE "en_GB.utf8", (aid::text) COLLATE "en_GB.utf8") "pgbench_history_2_bid_tid_aid_idx1" btree ((bid::text) COLLATE "en_GB.utf8", (tid::text) COLLATE "en_GB.utf8", (aid::text) COLLATE "en_GB.utf8") "pgbench_history_2_bid_tid_aid_idx2" btree ((bid::text) COLLATE "en_GB.utf8", (tid::text) COLLATE "en_GB.utf8", (aid::text) COLLATE "en_GB.utf8") Access method: heap ----- # cat bench.sql \set aid random(1, 100000 * :scale) \set bid random(1, 1 * :scale) \set tid random(1, 10 * :scale) \set delta random(-5000, 5000) \set aid2 random(1, 100000 * :scale) \set bid2 random(1, 1 * :scale) \set tid2 random(1, 10 * :scale) \set delta2 random(-5000, 5000) \set aid3 random(1, 100000 * :scale) \set bid3 random(1, 1 * :scale) \set tid3 random(1, 10 * :scale) \set delta3 random(-5000, 5000) \set aid4 random(1, 100000 * :scale) \set bid4 random(1, 1 * :scale) \set tid4 random(1, 10 * :scale) \set delta4 random(-5000, 5000) \set aidsel random(1, 100000 * :scale) \set bidsel random(1, 1 * :scale) \set tidsel random(1, 10 * :scale) BEGIN; INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) SELECT tid.val, bid.val, aid.val, delta.val, CURRENT_TIMESTAMP FROM unnest(ARRAY[ :tid, :tid2, :tid3, :tid4 ]::int[]) AS tid(val) CROSS JOIN unnest(ARRAY[ :aid, :aid2, :aid3, :aid4 ]::int[]) AS aid(val) CROSS JOIN unnest(ARRAY[ :bid, :bid2, :bid3, :bid4 ]::int[]) AS bid(val) CROSS JOIN unnest(ARRAY[ :delta, :delta2, :delta3, :delta4 ]::int[]) AS delta(val) ORDER BY 1,2,3,4; END; ----- # cat bench_multicolumn.sql \set aid random(1, 100000 * :scale) \set bid random(1, 1 * :scale) \set tid random(1, 10 * :scale) \set delta random(-5000, 5000) \set aid2 random(1, 100000 * :scale) \set bid2 random(1, 1 * :scale) \set tid2 random(1, 10 * :scale) \set delta2 random(-5000, 5000) \set aid3 random(1, 100000 * :scale) \set bid3 random(1, 1 * :scale) \set tid3 random(1, 10 * :scale) \set delta3 random(-5000, 5000) \set aid4 random(1, 100000 * :scale) \set bid4 random(1, 1 * :scale) \set tid4 random(1, 10 * :scale) \set delta4 random(-5000, 5000) \set aidsel random(1, 100000 * :scale) \set bidsel random(1, 1 * :scale) \set tidsel random(1, 10 * :scale) BEGIN; INSERT INTO pgbench_history_2 (tid, bid, aid, delta, mtime) SELECT tid.val, bid.val, aid.val, delta.val, CURRENT_TIMESTAMP FROM unnest(ARRAY[ :tid, :tid2, :tid3, :tid4 ]::int[]) AS tid(val) CROSS JOIN unnest(ARRAY[ :aid, :aid2, :aid3, :aid4 ]::int[]) AS aid(val) CROSS JOIN unnest(ARRAY[ :bid, :bid2, :bid3, :bid4 ]::int[]) AS bid(val) CROSS JOIN unnest(ARRAY[ :delta, :delta2, :delta3, :delta4 ]::int[]) AS delta(val) ORDER BY 1,2,3,4; END; ------ patched, 930 s, bench.sql pgbench: setting random seed to 101 starting vacuum...end. progress: 10.0 s, 341.0 tps, lat 29.089 ms stddev 30.810 progress: 20.0 s, 319.8 tps, lat 31.171 ms stddev 10.435 progress: 30.0 s, 247.4 tps, lat 38.007 ms stddev 18.556 progress: 40.0 s, 264.2 tps, lat 39.939 ms stddev 76.057 progress: 50.0 s, 227.6 tps, lat 43.968 ms stddev 28.792 progress: 60.0 s, 301.9 tps, lat 33.063 ms stddev 40.975 progress: 70.0 s, 229.6 tps, lat 43.487 ms stddev 108.210 progress: 80.0 s, 335.9 tps, lat 29.665 ms stddev 24.645 progress: 90.0 s, 172.6 tps, lat 57.101 ms stddev 143.803 progress: 100.0 s, 271.4 tps, lat 37.329 ms stddev 101.465 progress: 110.0 s, 223.8 tps, lat 44.595 ms stddev 54.963 progress: 120.0 s, 297.0 tps, lat 33.564 ms stddev 11.235 progress: 130.0 s, 190.0 tps, lat 52.578 ms stddev 84.232 progress: 140.0 s, 287.3 tps, lat 34.508 ms stddev 22.400 progress: 150.0 s, 244.5 tps, lat 41.034 ms stddev 59.983 progress: 160.0 s, 144.1 tps, lat 51.613 ms stddev 104.385 progress: 170.0 s, 253.4 tps, lat 49.536 ms stddev 182.602 progress: 180.0 s, 129.6 tps, lat 67.945 ms stddev 115.509 progress: 190.0 s, 219.4 tps, lat 50.908 ms stddev 171.248 progress: 200.0 s, 217.8 tps, lat 45.827 ms stddev 41.240 progress: 210.0 s, 275.8 tps, lat 36.161 ms stddev 13.430 progress: 220.0 s, 126.5 tps, lat 79.032 ms stddev 212.278 progress: 230.0 s, 239.1 tps, lat 41.366 ms stddev 32.036 progress: 240.0 s, 227.8 tps, lat 44.042 ms stddev 59.363 progress: 250.0 s, 152.7 tps, lat 62.596 ms stddev 118.120 progress: 260.0 s, 262.3 tps, lat 39.728 ms stddev 34.226 progress: 270.0 s, 208.1 tps, lat 47.917 ms stddev 76.345 progress: 280.0 s, 181.4 tps, lat 54.180 ms stddev 78.126 progress: 290.0 s, 222.6 tps, lat 45.635 ms stddev 87.984 progress: 300.0 s, 173.2 tps, lat 57.669 ms stddev 100.865 progress: 310.0 s, 244.9 tps, lat 40.697 ms stddev 11.285 progress: 320.0 s, 146.3 tps, lat 68.471 ms stddev 256.766 progress: 330.0 s, 203.7 tps, lat 47.753 ms stddev 41.044 progress: 340.0 s, 159.0 tps, lat 64.396 ms stddev 217.094 progress: 350.0 s, 136.2 tps, lat 73.353 ms stddev 227.920 progress: 360.0 s, 215.2 tps, lat 46.403 ms stddev 46.676 progress: 370.0 s, 144.9 tps, lat 68.682 ms stddev 122.000 progress: 380.0 s, 247.4 tps, lat 40.382 ms stddev 41.599 progress: 390.0 s, 200.9 tps, lat 49.744 ms stddev 98.925 progress: 400.0 s, 185.9 tps, lat 50.971 ms stddev 73.078 progress: 410.0 s, 210.1 tps, lat 49.879 ms stddev 104.948 progress: 420.0 s, 152.3 tps, lat 65.604 ms stddev 118.946 progress: 430.0 s, 255.7 tps, lat 39.041 ms stddev 11.417 progress: 440.0 s, 158.8 tps, lat 62.968 ms stddev 121.171 progress: 450.0 s, 168.5 tps, lat 57.488 ms stddev 84.297 progress: 460.0 s, 191.6 tps, lat 53.493 ms stddev 124.024 progress: 470.0 s, 116.5 tps, lat 79.623 ms stddev 277.348 progress: 480.0 s, 233.9 tps, lat 45.812 ms stddev 86.531 progress: 490.0 s, 138.4 tps, lat 72.205 ms stddev 142.392 progress: 500.0 s, 186.6 tps, lat 53.620 ms stddev 129.372 progress: 510.0 s, 200.3 tps, lat 49.802 ms stddev 46.425 progress: 520.0 s, 172.8 tps, lat 55.253 ms stddev 82.574 progress: 530.0 s, 153.3 tps, lat 67.972 ms stddev 168.677 progress: 540.0 s, 165.7 tps, lat 60.225 ms stddev 90.945 progress: 550.0 s, 261.4 tps, lat 38.208 ms stddev 14.819 progress: 560.0 s, 118.0 tps, lat 84.580 ms stddev 164.559 progress: 570.0 s, 218.8 tps, lat 45.462 ms stddev 49.899 progress: 580.0 s, 198.9 tps, lat 50.508 ms stddev 89.160 progress: 590.0 s, 125.4 tps, lat 79.599 ms stddev 149.763 progress: 600.0 s, 232.0 tps, lat 42.878 ms stddev 18.589 progress: 610.0 s, 142.0 tps, lat 70.308 ms stddev 149.538 progress: 620.0 s, 211.2 tps, lat 44.240 ms stddev 52.681 progress: 630.0 s, 181.0 tps, lat 58.747 ms stddev 193.505 progress: 640.0 s, 121.0 tps, lat 79.193 ms stddev 271.643 progress: 650.0 s, 168.6 tps, lat 61.646 ms stddev 168.953 progress: 660.0 s, 130.9 tps, lat 75.944 ms stddev 219.201 progress: 670.0 s, 196.8 tps, lat 50.921 ms stddev 55.397 progress: 680.0 s, 118.3 tps, lat 84.445 ms stddev 154.030 progress: 690.0 s, 238.6 tps, lat 41.867 ms stddev 32.984 progress: 700.0 s, 139.4 tps, lat 71.821 ms stddev 124.981 progress: 710.0 s, 202.1 tps, lat 48.222 ms stddev 24.230 progress: 720.0 s, 127.5 tps, lat 80.099 ms stddev 215.664 progress: 730.0 s, 172.4 tps, lat 56.097 ms stddev 54.861 progress: 740.0 s, 165.8 tps, lat 62.131 ms stddev 167.580 progress: 750.0 s, 133.3 tps, lat 73.846 ms stddev 215.322 progress: 760.0 s, 189.3 tps, lat 53.416 ms stddev 65.091 progress: 770.0 s, 115.6 tps, lat 86.455 ms stddev 168.831 progress: 780.0 s, 172.2 tps, lat 58.051 ms stddev 76.124 progress: 790.0 s, 146.3 tps, lat 68.235 ms stddev 137.139 progress: 800.0 s, 224.1 tps, lat 44.547 ms stddev 16.364 progress: 810.0 s, 126.2 tps, lat 79.209 ms stddev 229.841 progress: 820.0 s, 205.1 tps, lat 46.008 ms stddev 10.014 progress: 830.0 s, 127.7 tps, lat 81.160 ms stddev 175.965 progress: 840.0 s, 140.6 tps, lat 72.159 ms stddev 204.774 progress: 850.0 s, 183.8 tps, lat 54.245 ms stddev 42.740 progress: 860.0 s, 143.2 tps, lat 68.757 ms stddev 106.735 progress: 870.0 s, 154.1 tps, lat 65.654 ms stddev 158.173 progress: 880.0 s, 137.5 tps, lat 70.474 ms stddev 110.565 progress: 890.0 s, 184.7 tps, lat 55.848 ms stddev 87.179 progress: 900.0 s, 115.7 tps, lat 86.003 ms stddev 299.908 progress: 910.0 s, 194.3 tps, lat 51.392 ms stddev 13.637 progress: 920.0 s, 108.3 tps, lat 92.466 ms stddev 209.959 progress: 930.0 s, 168.1 tps, lat 47.101 ms stddev 36.065 transaction type: bench.sql scaling factor: 100 query mode: prepared number of clients: 10 number of threads: 1 duration: 930 s number of transactions actually processed: 178177 latency average = 52.129 ms latency stddev = 114.687 ms tps = 191.518158 (including connections establishing) tps = 191.518738 (excluding connections establishing) statement latencies in milliseconds: 0.003 \set aid random(1, 100000 * :scale) 0.001 \set bid random(1, 1 * :scale) 0.001 \set tid random(1, 10 * :scale) 0.001 \set delta random(-5000, 5000) 0.001 \set aid2 random(1, 100000 * :scale) 0.001 \set bid2 random(1, 1 * :scale) 0.001 \set tid2 random(1, 10 * :scale) 0.001 \set delta2 random(-5000, 5000) 0.001 \set aid3 random(1, 100000 * :scale) 0.001 \set bid3 random(1, 1 * :scale) 0.001 \set tid3 random(1, 10 * :scale) 0.001 \set delta3 random(-5000, 5000) 0.001 \set aid4 random(1, 100000 * :scale) 0.001 \set bid4 random(1, 1 * :scale) 0.001 \set tid4 random(1, 10 * :scale) 0.001 \set delta4 random(-5000, 5000) 0.001 \set aidsel random(1, 100000 * :scale) 0.001 \set bidsel random(1, 1 * :scale) 0.001 \set tidsel random(1, 10 * :scale) 0.356 BEGIN; 17.408 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) 34.344 END; ----- master, 930 s, bench.sql pgbench: setting random seed to 101 starting vacuum...end. progress: 10.0 s, 312.5 tps, lat 31.717 ms stddev 14.195 progress: 20.0 s, 262.3 tps, lat 37.993 ms stddev 11.230 progress: 30.0 s, 272.7 tps, lat 36.714 ms stddev 23.265 progress: 40.0 s, 263.0 tps, lat 37.879 ms stddev 65.930 progress: 50.0 s, 224.8 tps, lat 44.470 ms stddev 86.014 progress: 60.0 s, 286.2 tps, lat 34.900 ms stddev 13.527 progress: 70.0 s, 165.2 tps, lat 60.233 ms stddev 171.420 progress: 80.0 s, 248.7 tps, lat 40.269 ms stddev 36.533 progress: 90.0 s, 190.4 tps, lat 52.448 ms stddev 41.025 progress: 100.0 s, 115.8 tps, lat 85.946 ms stddev 293.640 progress: 110.0 s, 286.4 tps, lat 34.936 ms stddev 14.323 progress: 120.0 s, 160.8 tps, lat 62.172 ms stddev 92.608 progress: 130.0 s, 215.8 tps, lat 46.146 ms stddev 131.800 progress: 140.0 s, 182.3 tps, lat 54.950 ms stddev 71.712 progress: 150.0 s, 256.6 tps, lat 38.815 ms stddev 11.481 progress: 160.0 s, 127.3 tps, lat 72.627 ms stddev 275.988 progress: 170.0 s, 191.9 tps, lat 55.868 ms stddev 185.480 progress: 180.0 s, 214.0 tps, lat 45.943 ms stddev 39.222 progress: 190.0 s, 160.0 tps, lat 63.257 ms stddev 130.757 progress: 200.0 s, 219.6 tps, lat 43.744 ms stddev 28.191 progress: 210.0 s, 181.0 tps, lat 57.440 ms stddev 99.531 progress: 220.0 s, 151.7 tps, lat 65.776 ms stddev 213.306 progress: 230.0 s, 159.7 tps, lat 62.497 ms stddev 136.030 progress: 240.0 s, 238.4 tps, lat 39.954 ms stddev 11.079 progress: 250.0 s, 132.1 tps, lat 79.017 ms stddev 143.541 progress: 260.0 s, 226.1 tps, lat 43.954 ms stddev 35.270 progress: 270.0 s, 204.8 tps, lat 49.023 ms stddev 69.802 progress: 280.0 s, 147.9 tps, lat 55.598 ms stddev 74.641 progress: 290.0 s, 206.6 tps, lat 56.805 ms stddev 187.694 progress: 300.0 s, 143.2 tps, lat 67.343 ms stddev 118.650 progress: 310.0 s, 214.7 tps, lat 43.773 ms stddev 35.842 progress: 320.0 s, 164.4 tps, lat 66.286 ms stddev 180.354 progress: 330.0 s, 218.8 tps, lat 45.652 ms stddev 12.087 progress: 340.0 s, 84.8 tps, lat 117.904 ms stddev 273.770 progress: 350.0 s, 184.5 tps, lat 54.002 ms stddev 138.251 progress: 360.0 s, 137.6 tps, lat 72.837 ms stddev 132.434 progress: 370.0 s, 226.8 tps, lat 44.034 ms stddev 13.589 progress: 380.0 s, 129.8 tps, lat 76.705 ms stddev 184.436 progress: 390.0 s, 211.2 tps, lat 47.189 ms stddev 11.393 progress: 400.0 s, 105.4 tps, lat 94.978 ms stddev 204.029 progress: 410.0 s, 192.0 tps, lat 51.845 ms stddev 79.272 progress: 420.0 s, 153.4 tps, lat 65.260 ms stddev 117.110 progress: 430.0 s, 221.9 tps, lat 45.092 ms stddev 12.958 progress: 440.0 s, 133.4 tps, lat 74.657 ms stddev 248.591 progress: 450.0 s, 197.3 tps, lat 50.618 ms stddev 37.545 progress: 460.0 s, 127.0 tps, lat 78.603 ms stddev 178.486 progress: 470.0 s, 175.3 tps, lat 57.002 ms stddev 162.394 progress: 480.0 s, 147.5 tps, lat 67.667 ms stddev 141.490 progress: 490.0 s, 228.5 tps, lat 43.423 ms stddev 11.420 progress: 500.0 s, 126.2 tps, lat 79.533 ms stddev 198.813 progress: 510.0 s, 188.5 tps, lat 51.785 ms stddev 60.370 progress: 520.0 s, 167.0 tps, lat 61.005 ms stddev 119.650 progress: 530.0 s, 133.8 tps, lat 74.877 ms stddev 230.466 progress: 540.0 s, 210.7 tps, lat 47.299 ms stddev 31.189 progress: 550.0 s, 114.6 tps, lat 80.473 ms stddev 167.048 progress: 560.0 s, 185.0 tps, lat 58.132 ms stddev 152.718 progress: 570.0 s, 160.7 tps, lat 62.098 ms stddev 77.087 progress: 580.0 s, 204.3 tps, lat 48.962 ms stddev 18.831 progress: 590.0 s, 101.2 tps, lat 98.628 ms stddev 290.047 progress: 600.0 s, 201.8 tps, lat 49.564 ms stddev 14.297 progress: 610.0 s, 114.3 tps, lat 87.298 ms stddev 188.860 progress: 620.0 s, 194.8 tps, lat 47.937 ms stddev 85.735 progress: 630.0 s, 165.2 tps, lat 64.231 ms stddev 140.820 progress: 640.0 s, 182.4 tps, lat 54.844 ms stddev 76.489 progress: 650.0 s, 69.6 tps, lat 143.706 ms stddev 303.705 progress: 660.0 s, 210.7 tps, lat 47.316 ms stddev 14.321 progress: 670.0 s, 108.9 tps, lat 91.912 ms stddev 222.168 progress: 680.0 s, 206.9 tps, lat 47.959 ms stddev 12.506 progress: 690.0 s, 135.4 tps, lat 74.120 ms stddev 246.657 progress: 700.0 s, 202.9 tps, lat 46.287 ms stddev 11.305 progress: 710.0 s, 119.2 tps, lat 88.569 ms stddev 186.168 progress: 720.0 s, 181.4 tps, lat 54.055 ms stddev 120.903 progress: 730.0 s, 175.1 tps, lat 58.069 ms stddev 71.869 progress: 740.0 s, 131.0 tps, lat 73.536 ms stddev 152.709 progress: 750.0 s, 216.3 tps, lat 47.885 ms stddev 82.389 progress: 760.0 s, 128.8 tps, lat 77.422 ms stddev 166.961 progress: 770.0 s, 228.6 tps, lat 43.780 ms stddev 19.923 progress: 780.0 s, 131.7 tps, lat 75.922 ms stddev 205.226 progress: 790.0 s, 192.8 tps, lat 50.344 ms stddev 52.140 progress: 800.0 s, 177.0 tps, lat 57.879 ms stddev 152.870 progress: 810.0 s, 153.3 tps, lat 65.338 ms stddev 239.970 progress: 820.0 s, 231.9 tps, lat 43.010 ms stddev 12.916 progress: 830.0 s, 124.9 tps, lat 79.828 ms stddev 169.748 progress: 840.0 s, 168.6 tps, lat 59.181 ms stddev 139.497 progress: 850.0 s, 181.9 tps, lat 54.637 ms stddev 63.153 progress: 860.0 s, 154.8 tps, lat 62.182 ms stddev 76.714 progress: 870.0 s, 130.3 tps, lat 79.718 ms stddev 248.145 progress: 880.0 s, 160.3 tps, lat 54.626 ms stddev 57.105 progress: 890.0 s, 112.6 tps, lat 99.791 ms stddev 287.384 progress: 900.0 s, 109.1 tps, lat 91.646 ms stddev 318.205 progress: 910.0 s, 135.5 tps, lat 73.656 ms stddev 138.860 progress: 920.0 s, 216.3 tps, lat 46.173 ms stddev 11.084 progress: 930.0 s, 92.5 tps, lat 108.177 ms stddev 219.292 transaction type: bench.sql scaling factor: 100 query mode: prepared number of clients: 10 number of threads: 1 duration: 930 s number of transactions actually processed: 164419 latency average = 56.479 ms latency stddev = 127.463 ms tps = 176.784451 (including connections establishing) tps = 176.785074 (excluding connections establishing) statement latencies in milliseconds: 0.003 \set aid random(1, 100000 * :scale) 0.001 \set bid random(1, 1 * :scale) 0.001 \set tid random(1, 10 * :scale) 0.001 \set delta random(-5000, 5000) 0.001 \set aid2 random(1, 100000 * :scale) 0.001 \set bid2 random(1, 1 * :scale) 0.001 \set tid2 random(1, 10 * :scale) 0.001 \set delta2 random(-5000, 5000) 0.001 \set aid3 random(1, 100000 * :scale) 0.001 \set bid3 random(1, 1 * :scale) 0.001 \set tid3 random(1, 10 * :scale) 0.001 \set delta3 random(-5000, 5000) 0.001 \set aid4 random(1, 100000 * :scale) 0.001 \set bid4 random(1, 1 * :scale) 0.001 \set tid4 random(1, 10 * :scale) 0.001 \set delta4 random(-5000, 5000) 0.001 \set aidsel random(1, 100000 * :scale) 0.001 \set bidsel random(1, 1 * :scale) 0.001 \set tidsel random(1, 10 * :scale) 0.361 BEGIN; 18.019 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) 38.078 END; ----- master, 930s, bench_multicolumn.sql pgbench: setting random seed to 101 starting vacuum...end. progress: 10.0 s, 430.8 tps, lat 23.133 ms stddev 8.663 progress: 20.0 s, 396.7 tps, lat 25.159 ms stddev 9.469 progress: 30.0 s, 268.1 tps, lat 37.273 ms stddev 19.749 progress: 40.0 s, 213.6 tps, lat 46.806 ms stddev 119.588 progress: 50.0 s, 266.8 tps, lat 36.114 ms stddev 22.595 progress: 60.0 s, 190.5 tps, lat 53.182 ms stddev 112.206 progress: 70.0 s, 224.8 tps, lat 45.354 ms stddev 190.966 progress: 80.0 s, 267.4 tps, lat 37.350 ms stddev 18.173 progress: 90.0 s, 90.9 tps, lat 110.253 ms stddev 375.539 progress: 100.0 s, 236.9 tps, lat 42.115 ms stddev 213.869 progress: 110.0 s, 247.8 tps, lat 40.288 ms stddev 43.067 progress: 120.0 s, 163.6 tps, lat 61.188 ms stddev 169.101 progress: 130.0 s, 199.6 tps, lat 32.540 ms stddev 19.470 progress: 140.0 s, 115.6 tps, lat 107.340 ms stddev 352.879 progress: 150.0 s, 252.4 tps, lat 43.892 ms stddev 127.978 progress: 160.0 s, 150.2 tps, lat 49.491 ms stddev 21.708 progress: 170.0 s, 149.9 tps, lat 83.520 ms stddev 418.274 progress: 180.0 s, 225.5 tps, lat 39.943 ms stddev 20.760 progress: 190.0 s, 100.5 tps, lat 101.205 ms stddev 280.655 progress: 200.0 s, 207.0 tps, lat 52.350 ms stddev 131.781 progress: 210.0 s, 105.1 tps, lat 94.512 ms stddev 243.827 progress: 220.0 s, 219.0 tps, lat 43.176 ms stddev 21.080 progress: 230.0 s, 105.2 tps, lat 100.327 ms stddev 357.633 ^[OS^[ ~^[progress: 240.0 s, 233.4 tps, lat 42.653 ms stddev 41.913 progress: 250.0 s, 123.6 tps, lat 81.513 ms stddev 219.729 progress: 260.0 s, 142.1 tps, lat 70.376 ms stddev 200.171 progress: 270.0 s, 159.8 tps, lat 62.572 ms stddev 142.659 progress: 280.0 s, 148.8 tps, lat 63.117 ms stddev 116.285 progress: 290.0 s, 175.1 tps, lat 60.219 ms stddev 182.777 progress: 300.0 s, 95.2 tps, lat 104.471 ms stddev 230.961 progress: 310.0 s, 213.3 tps, lat 47.123 ms stddev 26.251 progress: 320.0 s, 70.4 tps, lat 142.142 ms stddev 468.454 progress: 330.0 s, 192.0 tps, lat 51.311 ms stddev 38.811 progress: 340.0 s, 89.6 tps, lat 113.461 ms stddev 311.851 progress: 350.0 s, 186.9 tps, lat 52.970 ms stddev 107.756 progress: 360.0 s, 149.4 tps, lat 67.484 ms stddev 173.089 progress: 370.0 s, 101.3 tps, lat 98.464 ms stddev 363.240 progress: 380.0 s, 208.9 tps, lat 47.840 ms stddev 17.286 progress: 390.0 s, 129.3 tps, lat 77.385 ms stddev 251.631 progress: 400.0 s, 123.8 tps, lat 76.163 ms stddev 168.368 progress: 410.0 s, 167.0 tps, lat 62.836 ms stddev 121.803 progress: 420.0 s, 112.4 tps, lat 89.379 ms stddev 209.648 progress: 430.0 s, 165.4 tps, lat 52.842 ms stddev 21.411 progress: 440.0 s, 127.3 tps, lat 88.522 ms stddev 281.637 progress: 450.0 s, 94.0 tps, lat 98.969 ms stddev 214.125 progress: 460.0 s, 196.7 tps, lat 54.223 ms stddev 115.539 progress: 470.0 s, 108.8 tps, lat 91.794 ms stddev 319.129 progress: 480.0 s, 122.1 tps, lat 80.696 ms stddev 196.234 progress: 490.0 s, 170.5 tps, lat 59.599 ms stddev 122.153 progress: 500.0 s, 99.2 tps, lat 100.730 ms stddev 277.295 progress: 510.0 s, 148.5 tps, lat 63.083 ms stddev 129.138 progress: 520.0 s, 149.8 tps, lat 70.758 ms stddev 246.407 progress: 530.0 s, 88.5 tps, lat 113.259 ms stddev 332.573 progress: 540.0 s, 170.2 tps, lat 56.910 ms stddev 107.577 progress: 550.0 s, 135.0 tps, lat 76.030 ms stddev 210.734 progress: 560.0 s, 79.9 tps, lat 125.229 ms stddev 244.893 progress: 570.0 s, 181.1 tps, lat 52.076 ms stddev 23.068 progress: 580.0 s, 126.9 tps, lat 83.354 ms stddev 220.131 progress: 590.0 s, 92.7 tps, lat 107.679 ms stddev 237.332 progress: 600.0 s, 172.7 tps, lat 57.274 ms stddev 84.826 progress: 610.0 s, 145.2 tps, lat 69.548 ms stddev 149.931 progress: 620.0 s, 90.6 tps, lat 110.333 ms stddev 251.717 progress: 630.0 s, 147.6 tps, lat 60.933 ms stddev 92.366 progress: 640.0 s, 187.2 tps, lat 58.374 ms stddev 152.668 progress: 650.0 s, 187.9 tps, lat 53.002 ms stddev 110.319 progress: 660.0 s, 145.5 tps, lat 69.447 ms stddev 162.522 progress: 670.0 s, 131.2 tps, lat 76.227 ms stddev 200.570 progress: 680.0 s, 121.4 tps, lat 81.743 ms stddev 160.592 progress: 690.0 s, 155.2 tps, lat 52.632 ms stddev 70.630 progress: 700.0 s, 181.6 tps, lat 62.099 ms stddev 158.982 progress: 710.0 s, 156.5 tps, lat 65.605 ms stddev 150.245 progress: 720.0 s, 141.2 tps, lat 73.262 ms stddev 164.604 progress: 730.0 s, 120.1 tps, lat 82.681 ms stddev 140.803 progress: 740.0 s, 175.7 tps, lat 57.051 ms stddev 114.034 progress: 750.0 s, 147.6 tps, lat 67.595 ms stddev 147.916 progress: 760.0 s, 169.0 tps, lat 59.360 ms stddev 103.024 progress: 770.0 s, 156.3 tps, lat 56.774 ms stddev 64.596 progress: 780.0 s, 173.8 tps, lat 60.229 ms stddev 145.519 progress: 790.0 s, 171.7 tps, lat 60.409 ms stddev 139.820 progress: 800.0 s, 171.4 tps, lat 56.860 ms stddev 101.722 progress: 810.0 s, 136.4 tps, lat 72.898 ms stddev 193.728 progress: 820.0 s, 157.6 tps, lat 66.915 ms stddev 136.713 progress: 830.0 s, 197.3 tps, lat 50.840 ms stddev 111.424 progress: 840.0 s, 120.8 tps, lat 82.673 ms stddev 270.600 progress: 850.0 s, 151.8 tps, lat 65.834 ms stddev 132.691 progress: 860.0 s, 184.0 tps, lat 54.286 ms stddev 106.034 progress: 870.0 s, 132.6 tps, lat 75.403 ms stddev 175.190 progress: 880.0 s, 155.6 tps, lat 64.321 ms stddev 144.043 progress: 890.0 s, 138.1 tps, lat 72.183 ms stddev 149.395 progress: 900.0 s, 146.5 tps, lat 68.217 ms stddev 129.466 progress: 910.0 s, 155.5 tps, lat 59.979 ms stddev 69.778 progress: 920.0 s, 109.1 tps, lat 97.855 ms stddev 169.143 progress: 930.0 s, 115.1 tps, lat 86.888 ms stddev 157.241 transaction type: bench_multicolumn.sql scaling factor: 100 query mode: prepared number of clients: 10 number of threads: 1 duration: 930 s number of transactions actually processed: 150577 latency average = 61.729 ms latency stddev = 165.584 ms tps = 161.902177 (including connections establishing) tps = 161.902494 (excluding connections establishing) statement latencies in milliseconds: 0.002 \set aid random(1, 100000 * :scale) 0.001 \set bid random(1, 1 * :scale) 0.001 \set tid random(1, 10 * :scale) 0.001 \set delta random(-5000, 5000) 0.001 \set aid2 random(1, 100000 * :scale) 0.001 \set bid2 random(1, 1 * :scale) 0.001 \set tid2 random(1, 10 * :scale) 0.001 \set delta2 random(-5000, 5000) 0.000 \set aid3 random(1, 100000 * :scale) 0.001 \set bid3 random(1, 1 * :scale) 0.001 \set tid3 random(1, 10 * :scale) 0.001 \set delta3 random(-5000, 5000) 0.001 \set aid4 random(1, 100000 * :scale) 0.001 \set bid4 random(1, 1 * :scale) 0.001 \set tid4 random(1, 10 * :scale) 0.001 \set delta4 random(-5000, 5000) 0.001 \set aidsel random(1, 100000 * :scale) 0.001 \set bidsel random(1, 1 * :scale) 0.000 \set tidsel random(1, 10 * :scale) 0.256 BEGIN; 23.732 INSERT INTO pgbench_history_2 (tid, bid, aid, delta, mtime) 37.728 END; patched, 930 s, bench_multicolumn.sql pgbench: setting random seed to 101 starting vacuum...end. progress: 10.0 s, 245.9 tps, lat 40.426 ms stddev 13.338 progress: 20.0 s, 244.9 tps, lat 40.778 ms stddev 11.499 progress: 30.0 s, 220.2 tps, lat 44.843 ms stddev 13.181 progress: 40.0 s, 176.9 tps, lat 57.154 ms stddev 39.953 progress: 50.0 s, 213.6 tps, lat 46.762 ms stddev 13.346 progress: 60.0 s, 488.2 tps, lat 20.475 ms stddev 17.779 progress: 70.0 s, 470.3 tps, lat 21.239 ms stddev 24.805 progress: 80.0 s, 387.4 tps, lat 25.815 ms stddev 34.607 progress: 90.0 s, 400.3 tps, lat 24.957 ms stddev 23.723 progress: 100.0 s, 366.5 tps, lat 27.286 ms stddev 66.979 progress: 110.0 s, 367.3 tps, lat 27.176 ms stddev 28.625 progress: 120.0 s, 338.6 tps, lat 29.514 ms stddev 45.125 progress: 130.0 s, 304.8 tps, lat 32.780 ms stddev 61.692 progress: 140.0 s, 269.0 tps, lat 37.148 ms stddev 73.160 progress: 150.0 s, 263.4 tps, lat 37.251 ms stddev 79.585 progress: 160.0 s, 299.4 tps, lat 32.048 ms stddev 47.968 progress: 170.0 s, 261.5 tps, lat 40.426 ms stddev 84.604 progress: 180.0 s, 223.9 tps, lat 44.454 ms stddev 100.991 progress: 190.0 s, 169.4 tps, lat 59.277 ms stddev 237.757 progress: 200.0 s, 242.9 tps, lat 41.131 ms stddev 75.598 progress: 210.0 s, 297.7 tps, lat 33.600 ms stddev 61.503 progress: 220.0 s, 265.3 tps, lat 37.649 ms stddev 100.443 progress: 230.0 s, 263.1 tps, lat 37.962 ms stddev 77.040 progress: 240.0 s, 223.7 tps, lat 44.671 ms stddev 96.937 progress: 250.0 s, 208.9 tps, lat 47.850 ms stddev 163.998 progress: 260.0 s, 153.5 tps, lat 54.204 ms stddev 125.948 progress: 270.0 s, 225.4 tps, lat 49.184 ms stddev 139.220 progress: 280.0 s, 223.1 tps, lat 46.299 ms stddev 110.193 progress: 290.0 s, 249.5 tps, lat 40.970 ms stddev 91.191 progress: 300.0 s, 251.4 tps, lat 39.775 ms stddev 80.429 progress: 310.0 s, 213.1 tps, lat 46.846 ms stddev 105.240 progress: 320.0 s, 197.4 tps, lat 50.748 ms stddev 111.180 progress: 330.0 s, 165.0 tps, lat 60.244 ms stddev 151.391 progress: 340.0 s, 195.7 tps, lat 51.197 ms stddev 109.990 progress: 350.0 s, 199.4 tps, lat 48.199 ms stddev 83.645 progress: 360.0 s, 134.4 tps, lat 77.168 ms stddev 257.218 progress: 370.0 s, 109.1 tps, lat 91.737 ms stddev 239.082 progress: 380.0 s, 159.5 tps, lat 57.498 ms stddev 129.741 progress: 390.0 s, 193.0 tps, lat 54.895 ms stddev 116.293 progress: 400.0 s, 156.1 tps, lat 65.559 ms stddev 135.288 progress: 410.0 s, 210.0 tps, lat 47.507 ms stddev 93.252 progress: 420.0 s, 174.9 tps, lat 57.200 ms stddev 165.627 progress: 430.0 s, 165.3 tps, lat 60.328 ms stddev 135.303 progress: 440.0 s, 114.1 tps, lat 87.840 ms stddev 190.481 progress: 450.0 s, 138.3 tps, lat 59.793 ms stddev 102.914 progress: 460.0 s, 182.2 tps, lat 63.602 ms stddev 197.301 progress: 470.0 s, 138.0 tps, lat 65.635 ms stddev 200.784 progress: 480.0 s, 170.1 tps, lat 64.948 ms stddev 173.358 progress: 490.0 s, 169.6 tps, lat 58.762 ms stddev 139.054 progress: 500.0 s, 135.5 tps, lat 72.901 ms stddev 213.248 progress: 510.0 s, 148.8 tps, lat 68.151 ms stddev 155.461 progress: 520.0 s, 156.1 tps, lat 63.845 ms stddev 147.884 progress: 530.0 s, 173.6 tps, lat 47.442 ms stddev 64.936 progress: 540.0 s, 180.7 tps, lat 65.152 ms stddev 207.246 progress: 550.0 s, 179.4 tps, lat 55.658 ms stddev 105.423 progress: 560.0 s, 173.8 tps, lat 57.615 ms stddev 94.866 progress: 570.0 s, 179.6 tps, lat 55.548 ms stddev 130.517 progress: 580.0 s, 181.0 tps, lat 55.044 ms stddev 126.109 progress: 590.0 s, 161.5 tps, lat 62.217 ms stddev 136.538 progress: 600.0 s, 161.3 tps, lat 51.697 ms stddev 87.159 progress: 610.0 s, 167.6 tps, lat 64.527 ms stddev 199.725 progress: 620.0 s, 150.8 tps, lat 71.611 ms stddev 256.386 progress: 630.0 s, 150.7 tps, lat 66.379 ms stddev 178.892 progress: 640.0 s, 139.3 tps, lat 71.740 ms stddev 215.406 progress: 650.0 s, 162.0 tps, lat 61.436 ms stddev 214.835 progress: 660.0 s, 169.1 tps, lat 59.385 ms stddev 136.375 progress: 670.0 s, 174.4 tps, lat 57.328 ms stddev 142.646 progress: 680.0 s, 163.9 tps, lat 60.961 ms stddev 147.686 progress: 690.0 s, 168.2 tps, lat 59.408 ms stddev 113.007 progress: 700.0 s, 218.1 tps, lat 43.138 ms stddev 63.753 progress: 710.0 s, 165.2 tps, lat 61.004 ms stddev 124.893 progress: 720.0 s, 163.4 tps, lat 60.585 ms stddev 142.039 progress: 730.0 s, 172.2 tps, lat 61.205 ms stddev 158.038 progress: 740.0 s, 158.2 tps, lat 63.365 ms stddev 151.284 progress: 750.0 s, 154.9 tps, lat 64.682 ms stddev 150.947 progress: 760.0 s, 168.5 tps, lat 59.308 ms stddev 181.120 progress: 770.0 s, 145.6 tps, lat 68.708 ms stddev 203.380 progress: 780.0 s, 153.4 tps, lat 59.480 ms stddev 95.686 progress: 790.0 s, 157.3 tps, lat 68.311 ms stddev 176.298 progress: 800.0 s, 155.9 tps, lat 63.335 ms stddev 122.997 progress: 810.0 s, 136.9 tps, lat 74.575 ms stddev 186.505 progress: 820.0 s, 152.7 tps, lat 62.206 ms stddev 166.385 progress: 830.0 s, 143.3 tps, lat 73.374 ms stddev 152.323 progress: 840.0 s, 185.8 tps, lat 53.480 ms stddev 133.932 progress: 850.0 s, 154.1 tps, lat 61.599 ms stddev 192.194 progress: 860.0 s, 144.2 tps, lat 73.218 ms stddev 199.194 progress: 870.0 s, 159.3 tps, lat 60.034 ms stddev 133.204 progress: 880.0 s, 151.1 tps, lat 64.337 ms stddev 147.333 progress: 890.0 s, 140.8 tps, lat 68.930 ms stddev 172.148 progress: 900.0 s, 159.7 tps, lat 68.639 ms stddev 159.429 progress: 910.0 s, 149.5 tps, lat 66.574 ms stddev 169.599 progress: 920.0 s, 141.4 tps, lat 71.256 ms stddev 179.652 progress: 930.0 s, 111.0 tps, lat 89.830 ms stddev 177.565 transaction type: bench_multicolumn.sql scaling factor: 100 query mode: prepared number of clients: 10 number of threads: 1 duration: 930 s number of transactions actually processed: 185220 latency average = 50.178 ms latency stddev = 127.798 ms tps = 199.157086 (including connections establishing) tps = 199.157644 (excluding connections establishing) statement latencies in milliseconds: 0.002 \set aid random(1, 100000 * :scale) 0.001 \set bid random(1, 1 * :scale) 0.001 \set tid random(1, 10 * :scale) 0.001 \set delta random(-5000, 5000) 0.001 \set aid2 random(1, 100000 * :scale) 0.001 \set bid2 random(1, 1 * :scale) 0.001 \set tid2 random(1, 10 * :scale) 0.001 \set delta2 random(-5000, 5000) 0.001 \set aid3 random(1, 100000 * :scale) 0.000 \set bid3 random(1, 1 * :scale) 0.001 \set tid3 random(1, 10 * :scale) 0.001 \set delta3 random(-5000, 5000) 0.001 \set aid4 random(1, 100000 * :scale) 0.001 \set bid4 random(1, 1 * :scale) 0.001 \set tid4 random(1, 10 * :scale) 0.001 \set delta4 random(-5000, 5000) 0.001 \set aidsel random(1, 100000 * :scale) 0.001 \set bidsel random(1, 1 * :scale) 0.000 \set tidsel random(1, 10 * :scale) 0.237 BEGIN; 20.808 INSERT INTO pgbench_history_2 (tid, bid, aid, delta, mtime) 29.121 END;
From 0fc050033563f5bef074401197d3b38425478857 Mon Sep 17 00:00:00 2001 From: Matthias van de Meent <boekew...@gmail.com> Date: Wed, 2 Sep 2020 11:51:12 +0200 Subject: [PATCH] Update _bt_binsrch* to account for prefix column equality This improves the search speed for multi-column search indexes, as the prefixes of hikey and lokey do not need to be compared when high key and low key both have N columns common with the searchkey. Signed-off-by: Matthias van de Meent <boekew...@gmail.com> --- src/backend/access/nbtree/nbtinsert.c | 9 ++ src/backend/access/nbtree/nbtsearch.c | 164 +++++++++++++++++++------- src/backend/access/nbtree/nbtutils.c | 1 + src/include/access/nbtree.h | 20 ++++ 4 files changed, 153 insertions(+), 41 deletions(-) diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c index d36f7557c8..2bf6115040 100644 --- a/src/backend/access/nbtree/nbtinsert.c +++ b/src/backend/access/nbtree/nbtinsert.c @@ -208,6 +208,7 @@ search: /* start over... */ if (stack) _bt_freestack(stack); + itup_key->searchcolneq = 1; goto search; } @@ -402,6 +403,7 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel, bool inposting = false; bool prevalldead = true; int curposti = 0; + int searchcolneq = itup_key->searchcolneq; /* Assume unique until we find a duplicate */ *is_unique = true; @@ -559,6 +561,8 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel, if (nbuf != InvalidBuffer) _bt_relbuf(rel, nbuf); *is_unique = false; + /* Reset the search state to the state as we were called */ + itup_key->searchcolneq = searchcolneq; return InvalidTransactionId; } @@ -577,6 +581,8 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel, *speculativeToken = SnapshotDirty.speculativeToken; /* Caller releases lock on buf immediately */ insertstate->bounds_valid = false; + /* Reset the search state to the state as we were called */ + itup_key->searchcolneq = searchcolneq; return xwait; } @@ -751,6 +757,9 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel, if (nbuf != InvalidBuffer) _bt_relbuf(rel, nbuf); + /* Reset the search state to the state as we were called */ + itup_key->searchcolneq = searchcolneq; + return InvalidTransactionId; } diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c index 1e628a33d7..aefc8733fd 100644 --- a/src/backend/access/nbtree/nbtsearch.c +++ b/src/backend/access/nbtree/nbtsearch.c @@ -45,6 +45,7 @@ static bool _bt_parallel_readpage(IndexScanDesc scan, BlockNumber blkno, static Buffer _bt_walk_left(Relation rel, Buffer buf, Snapshot snapshot); static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir); static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir); +static inline BTCompareResult _bt_compare_inline(Relation rel, BTScanInsert key, Page page, OffsetNumber offnum, int scanStartCol); /* @@ -348,8 +349,10 @@ _bt_binsrch(Relation rel, BTPageOpaque opaque; OffsetNumber low, high; - int32 result, - cmpval; + int32 cmpval; + int higheqcol, + loweqcol; + BTCompareResult result; page = BufferGetPage(buf); opaque = (BTPageOpaque) PageGetSpecialPointer(page); @@ -388,20 +391,33 @@ _bt_binsrch(Relation rel, cmpval = key->nextkey ? 0 : 1; /* select comparison value */ + higheqcol = key->searchcolneq; + loweqcol = key->searchcolneq; + while (high > low) { OffsetNumber mid = low + ((high - low) / 2); /* We have low <= mid < high, so mid points at a real slot */ - result = _bt_compare(rel, key, page, mid); + result = _bt_compare_inline(rel, key, page, mid, Min(higheqcol, loweqcol)); - if (result >= cmpval) + Assert(result.compareResult == _bt_compare(rel, key, page, mid)); + + if (result.compareResult >= cmpval) + { low = mid + 1; + loweqcol = result.comparedColumn; + } else + { high = mid; + higheqcol = result.comparedColumn; + } } + key->searchcolneq = Min(loweqcol, higheqcol); + /* * At this point we have high == low, but be careful: they could point * past the last slot on the page. @@ -452,8 +468,10 @@ _bt_binsrch_insert(Relation rel, BTInsertState insertstate) OffsetNumber low, high, stricthigh; - int32 result, - cmpval; + int32 cmpval; + int higheqcol, + loweqcol; + BTCompareResult result; page = BufferGetPage(insertstate->buf); opaque = (BTPageOpaque) PageGetSpecialPointer(page); @@ -500,6 +518,8 @@ _bt_binsrch_insert(Relation rel, BTInsertState insertstate) stricthigh = high; /* high initially strictly higher */ cmpval = 1; /* !nextkey comparison value */ + loweqcol = key->searchcolneq; + higheqcol = key->searchcolneq; while (high > low) { @@ -507,14 +527,20 @@ _bt_binsrch_insert(Relation rel, BTInsertState insertstate) /* We have low <= mid < high, so mid points at a real slot */ - result = _bt_compare(rel, key, page, mid); + result = _bt_compare_inline(rel, key, page, mid, Min(loweqcol, higheqcol)); + Assert(result.compareResult == _bt_compare(rel, key, page, mid)); - if (result >= cmpval) + if (result.compareResult >= cmpval) + { low = mid + 1; + loweqcol = result.comparedColumn; + } else { high = mid; - if (result != 0) + higheqcol = result.comparedColumn; + + if (result.compareResult != 0) stricthigh = high; } @@ -525,10 +551,12 @@ _bt_binsrch_insert(Relation rel, BTInsertState insertstate) * posting list when postingoff is set. This should happen * infrequently. */ - if (unlikely(result == 0 && key->scantid != NULL)) + if (unlikely(result.compareResult == 0 && key->scantid != NULL)) insertstate->postingoff = _bt_binsrch_posting(key, page, mid); } + key->searchcolneq = Min(loweqcol, higheqcol); + /* * On a leaf page, a binary search always returns the first key >= scan * key (at least in !nextkey case), which could be the last slot + 1. This @@ -623,6 +651,36 @@ _bt_binsrch_posting(BTScanInsert key, Page page, OffsetNumber offnum) * 0 if scankey == tuple at offnum; * >0 if scankey > tuple at offnum. * + * See _bt_compare_inline() for more specific information. + *---------- + */ +int32 +_bt_compare(Relation rel, + BTScanInsert key, + Page page, + OffsetNumber offnum) +{ + BTCompareResult result = _bt_compare_inline(rel, key, page, offnum, 1); + return result.compareResult; +} + +/*---------- + * _bt_compare_inline() -- Compare insertion-type scankey to tuple on a page + * + * page/offnum: location of btree item to be compared to. + * scanStartCol: first column of key to start comparing. 1-based + * + * This routine returns: + * + * BTCompareResult.compareResult: + * <0 if scankey < tuple at offnum; + * 0 if scankey == tuple at offnum; + * >0 if scankey > tuple at offnum. + * BTCompareResult.comparedColumn: + * if all columns are equal, the number of indexed + * columns in the scankey + 1, otherwise the 1-indexed column + * number that was compared to get the non-0 result. + * * NULLs in the keys are treated as sortable values. Therefore * "equality" does not necessarily mean that the item should be returned * to the caller as a matching key. Similarly, an insertion scankey @@ -640,11 +698,12 @@ _bt_binsrch_posting(BTScanInsert key, Page page, OffsetNumber offnum) * key. See backend/access/nbtree/README for details. *---------- */ -int32 -_bt_compare(Relation rel, - BTScanInsert key, - Page page, - OffsetNumber offnum) +inline static BTCompareResult +_bt_compare_inline(Relation rel, + BTScanInsert key, + Page page, + OffsetNumber offnum, + int scanStartCol) { TupleDesc itupdesc = RelationGetDescr(rel); BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page); @@ -653,18 +712,23 @@ _bt_compare(Relation rel, ScanKey scankey; int ncmpkey; int ntupatts; - int32 result; + BTCompareResult result; Assert(_bt_check_natts(rel, key->heapkeyspace, page, offnum)); Assert(key->keysz <= IndexRelationGetNumberOfKeyAttributes(rel)); Assert(key->heapkeyspace || key->scantid == NULL); + Assert(scanStartCol >= 1); /* * Force result ">" if target item is first data item on an internal page * --- see NOTE above. */ if (!P_ISLEAF(opaque) && offnum == P_FIRSTDATAKEY(opaque)) - return 1; + { + result.compareResult = 1; + result.comparedColumn = scanStartCol; + return result; + } itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum)); ntupatts = BTreeTupleGetNAtts(itup, rel); @@ -684,8 +748,12 @@ _bt_compare(Relation rel, ncmpkey = Min(ntupatts, key->keysz); Assert(key->heapkeyspace || ncmpkey == key->keysz); Assert(!BTreeTupleIsPosting(itup) || key->allequalimage); - scankey = key->scankeys; - for (int i = 1; i <= ncmpkey; i++) + Assert(scanStartCol <= ncmpkey + 1); + + scankey = key->scankeys + (scanStartCol - 1); + result.comparedColumn = scanStartCol; + + for (; result.comparedColumn <= ncmpkey; result.comparedColumn++) { Datum datum; bool isNull; @@ -695,18 +763,18 @@ _bt_compare(Relation rel, if (scankey->sk_flags & SK_ISNULL) /* key is NULL */ { if (isNull) - result = 0; /* NULL "=" NULL */ + result.compareResult = 0; /* NULL "=" NULL */ else if (scankey->sk_flags & SK_BT_NULLS_FIRST) - result = -1; /* NULL "<" NOT_NULL */ + result.compareResult = -1; /* NULL "<" NOT_NULL */ else - result = 1; /* NULL ">" NOT_NULL */ + result.compareResult = 1; /* NULL ">" NOT_NULL */ } else if (isNull) /* key is NOT_NULL and item is NULL */ { if (scankey->sk_flags & SK_BT_NULLS_FIRST) - result = 1; /* NOT_NULL ">" NULL */ + result.compareResult = 1; /* NOT_NULL ">" NULL */ else - result = -1; /* NOT_NULL "<" NULL */ + result.compareResult = -1; /* NOT_NULL "<" NULL */ } else { @@ -718,17 +786,17 @@ _bt_compare(Relation rel, * to flip the sign of the comparison result. (Unless it's a DESC * column, in which case we *don't* flip the sign.) */ - result = DatumGetInt32(FunctionCall2Coll(&scankey->sk_func, - scankey->sk_collation, - datum, - scankey->sk_argument)); + result.compareResult = DatumGetInt32(FunctionCall2Coll(&scankey->sk_func, + scankey->sk_collation, + datum, + scankey->sk_argument)); if (!(scankey->sk_flags & SK_BT_DESC)) - INVERT_COMPARE_RESULT(result); + INVERT_COMPARE_RESULT(result.compareResult); } /* if the keys are unequal, return the difference */ - if (result != 0) + if (result.compareResult != 0) return result; scankey++; @@ -744,7 +812,10 @@ _bt_compare(Relation rel, * necessary. */ if (key->keysz > ntupatts) - return 1; + { + result.compareResult = 1; + return result; + } /* * Use the heap TID attribute and scantid to try to break the tie. The @@ -789,10 +860,14 @@ _bt_compare(Relation rel, */ if (key->heapkeyspace && !key->pivotsearch && key->keysz == ntupatts && heapTid == NULL) - return 1; + { + result.compareResult = 1; + return result; + } /* All provided scankey arguments found to be equal */ - return 0; + result.compareResult = 0; + return result; } /* @@ -801,7 +876,10 @@ _bt_compare(Relation rel, */ Assert(key->keysz == IndexRelationGetNumberOfKeyAttributes(rel)); if (heapTid == NULL) - return 1; + { + result.compareResult = 1; + return result; + } /* * Scankey must be treated as equal to a posting list tuple if its scantid @@ -810,18 +888,19 @@ _bt_compare(Relation rel, * with scantid. */ Assert(ntupatts >= IndexRelationGetNumberOfKeyAttributes(rel)); - result = ItemPointerCompare(key->scantid, heapTid); - if (result <= 0 || !BTreeTupleIsPosting(itup)) + result.compareResult = ItemPointerCompare(key->scantid, heapTid); + if (result.compareResult <= 0 || !BTreeTupleIsPosting(itup)) return result; else { - result = ItemPointerCompare(key->scantid, - BTreeTupleGetMaxHeapTID(itup)); - if (result > 0) - return 1; + result.compareResult = ItemPointerCompare(key->scantid, + BTreeTupleGetMaxHeapTID(itup)); + if (result.compareResult > 0) + return result; } - return 0; + result.compareResult = 0; + return result; } /* @@ -1340,6 +1419,7 @@ _bt_first(IndexScanDesc scan, ScanDirection dir) inskey.anynullkeys = false; /* unused */ inskey.nextkey = nextkey; inskey.pivotsearch = false; + inskey.searchcolneq = 1; inskey.scantid = NULL; inskey.keysz = keysCount; @@ -1375,6 +1455,8 @@ _bt_first(IndexScanDesc scan, ScanDirection dir) _bt_initialize_more_data(so, dir); + /* reset the column search position for the following statement */ + inskey.searchcolneq = 1; /* position to the precise item on the page */ offnum = _bt_binsrch(rel, &inskey, buf); diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c index 81589b9056..5c058ca9a8 100644 --- a/src/backend/access/nbtree/nbtutils.c +++ b/src/backend/access/nbtree/nbtutils.c @@ -122,6 +122,7 @@ _bt_mkscankey(Relation rel, IndexTuple itup) key->anynullkeys = false; /* initial assumption */ key->nextkey = false; key->pivotsearch = false; + key->searchcolneq = 1; key->keysz = Min(indnkeyatts, tupnatts); key->scantid = key->heapkeyspace && itup ? BTreeTupleGetHeapTID(itup) : NULL; diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h index 65d9698b89..9aebff05c0 100644 --- a/src/include/access/nbtree.h +++ b/src/include/access/nbtree.h @@ -639,6 +639,12 @@ typedef BTStackData *BTStack; * using a scankey built from a leaf page's high key. Most callers set this * to false. * + * scancolneq contains an integer that is the first column number of the + * current index scan location that has not yet proven to equal the search + * key's columns. This helps by not repeatedly having to compare columns + * that we know are equal. + * This is incremented by _bt_search and _bt_search_insert + * * scantid is the heap TID that is used as a final tiebreaker attribute. It * is set to NULL when index scan doesn't need to find a position for a * specific physical tuple. Must be set when inserting new tuples into @@ -662,6 +668,7 @@ typedef struct BTScanInsertData bool nextkey; bool pivotsearch; ItemPointer scantid; /* tiebreaker for scankeys */ + int searchcolneq; /* column that is not yet equal */ int keysz; /* Size of scankeys array */ ScanKeyData scankeys[INDEX_MAX_KEYS]; /* Must appear last */ } BTScanInsertData; @@ -945,6 +952,19 @@ typedef struct BTScanOpaqueData typedef BTScanOpaqueData *BTScanOpaque; +/* + * When two tuples are compared, we get a comparison result for one of the + * columns. While the result itself is the most interesting, the column + * can be used to further optimize searching in the index. + * + * See also: _bt_compare() and _bt_compare_inline() + */ +typedef struct BTCompareResult +{ + int32 compareResult; /* the value of the comparison between the first non-equal columns */ + int comparedColumn; /* the column number of the first non-equal column */ +} BTCompareResult; + /* * We use some private sk_flags bits in preprocessed scan keys. We're allowed * to use bits 16-31 (see skey.h). The uppermost bits are copied from the -- 2.20.1