On Mon, 9 Nov 2020 at 03:52, Andy Fan <zhihui.fan1...@gmail.com> wrote: > then I did a perf on the 2 version, Is it possible that you called > tts_minimal_clear twice in > the v9 version? Both ExecClearTuple and ExecStoreMinimalTuple called > tts_minimal_clear > on the same slot. > > With the following changes:
Thanks for finding that. After applying that fix I did a fresh set of benchmarks on the latest master, latest master + v8 and latest master + v9 using the attached script. (resultcachebench2.sh.txt) I ran this on my zen2 AMD64 machine and formatted the results into the attached resultcache_master_vs_v8_vs_v9.csv file If I load this into PostgreSQL: # create table resultcache_bench (tbl text, target text, col text, latency_master numeric(10,3), latency_v8 numeric(10,3), latency_v9 numeric(10,3)); # copy resultcache_bench from '/path/to/resultcache_master_vs_v8_vs_v9.csv' with(format csv); and run: # select col,tbl,target, sum(latency_v8) v8, sum(latency_v9) v9, round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9 from resultcache_bench group by 1,2,3 order by 2,1,3; I've attached the results of the above query. (resultcache_v8_vs_v9.txt) Out of the 24 tests done on each branch, only 6 of 24 are better on v9 compared to v8. So v8 wins on 75% of the tests. v9 never wins using the lookup1 table (1 row per lookup). It only runs on 50% of the lookup100 queries (100 inner rows per outer row). However, despite the draw in won tests for the lookup100 test, v8 takes less time overall, as indicated by the following query: postgres=# select round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9 from resultcache_bench where tbl='lookup100'; v8_vs_v9 ---------- 99.3 (1 row) Ditching the WHERE clause and simply doing: postgres=# select round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9 from resultcache_bench; v8_vs_v9 ---------- 96.2 (1 row) indicates that v8 is 3.8% faster than v9. Altering that query accordingly indicates v8 is 11.5% faster than master and v9 is only 7% faster than master. Of course, scaling up the test will yield both versions being even more favourable then master, but the point here is comparing v8 to v9. David
#!/bin/bash seconds=60 psql -c "drop table if exists hundredk, lookup1, lookup100;" postgres psql -c "create table hundredk (hundredk int, tenk int, thousand int, hundred int, ten int, one int);" postgres psql -c "insert into hundredk select x%100000,x%10000,x%1000,x%100,x%10,1 from generate_Series(1,100000) x;" postgres psql -c "create table lookup100 (a int);" postgres psql -c "insert into lookup100 select x from generate_Series(1,100000)x,generate_Series(1,100);" postgres psql -c "create index on lookup100 (a);" postgres psql -c "create table lookup1 (a int);" postgres psql -c "insert into lookup1 select x from generate_Series(1,100000)x;" postgres psql -c "create index on lookup1 (a);" postgres psql -c "vacuum analyze lookup1, lookup100, hundredk;" postgres for branch in master resultcache_v8 resultcache_v9 do cd pg_src git checkout $branch make clean -s make -j -s make install -s cd ./ss.sh sleep 1 psql -c "select pg_prewarm('lookup1'), pg_prewarm('lookup100'), pg_prewarm('hundredk');" postgres for tbl in lookup1 lookup100 do for target in "count(*)" "count(l.a)" "'*'" do for col in thousand hundred ten one do echo "select $target from hundredk hk inner join $tbl l on hk.$col = l.a" > bench.sql echo Testing $branch $tbl $target $col >> bench.log pgbench -n -T $seconds -f bench.sql postgres | grep latency >> bench.log done done done done
resultcache_master_vs_v8_vs_v9.csv
Description: MS-Excel spreadsheet
col | tbl | target | v8 | v9 | v8_vs_v9 ----------+-----------+------------+----------+----------+---------- hundred | lookup1 | '*' | 42.484 | 44.511 | 95.4 hundred | lookup1 | count(*) | 30.513 | 33.016 | 92.4 hundred | lookup1 | count(l.a) | 32.651 | 35.471 | 92.0 one | lookup1 | '*' | 42.084 | 43.668 | 96.4 one | lookup1 | count(*) | 29.255 | 32.162 | 91.0 one | lookup1 | count(l.a) | 31.772 | 35.139 | 90.4 ten | lookup1 | '*' | 40.286 | 42.439 | 94.9 ten | lookup1 | count(*) | 29.286 | 32.009 | 91.5 ten | lookup1 | count(l.a) | 31.392 | 34.053 | 92.2 thousand | lookup1 | '*' | 43.771 | 45.711 | 95.8 thousand | lookup1 | count(*) | 31.531 | 33.845 | 93.2 thousand | lookup1 | count(l.a) | 33.339 | 35.903 | 92.9 hundred | lookup100 | '*' | 1494.440 | 1471.999 | 101.5 hundred | lookup100 | count(*) | 266.988 | 265.659 | 100.5 hundred | lookup100 | count(l.a) | 268.414 | 273.851 | 98.0 one | lookup100 | '*' | 1503.697 | 1553.174 | 96.8 one | lookup100 | count(*) | 952.287 | 967.910 | 98.4 one | lookup100 | count(l.a) | 970.547 | 990.562 | 98.0 ten | lookup100 | '*' | 1354.550 | 1407.461 | 96.2 ten | lookup100 | count(*) | 944.390 | 929.513 | 101.6 ten | lookup100 | count(l.a) | 958.143 | 944.583 | 101.4 thousand | lookup100 | '*' | 1528.326 | 1507.617 | 101.4 thousand | lookup100 | count(*) | 203.801 | 203.707 | 100.0 thousand | lookup100 | count(l.a) | 204.645 | 209.883 | 97.5