Hello. At Thu, 28 Nov 2019 17:23:19 -0500, Noah Misch <n...@leadboat.com> wrote in > On Thu, Nov 28, 2019 at 09:35:08PM +0900, Kyotaro Horiguchi wrote: > > I measured the performance with the latest patch set. > > > > > 1. Determine $DDL_COUNT, a number of DDL transactions that take about one > > > minute when done via syncs. > > > 2. Start "pgbench -rP1 --progress-timestamp -T180 -c10 -j10". > > > 3. Wait 10s. > > > 4. Start one DDL backend that runs $DDL_COUNT transactions. > > > 5. Save DDL start timestamp, DDL end timestamp, and pgbench output. > > If you have the raw data requested in (5), please share them here so folks > have the option to reproduce your graphs and calculations.
Sorry, I forgot to attach the scripts. The raw data was vanished into unstable connection and the steps was quite crude. I prioritized on showing some numbers at the time. I revised the scripts into more automated way and will take numbers again. > > > 2. Start server with wal_level = replica (all other variables are not > > > changed) then run the attached ./bench.sh > > > > The bench.sh attachment was missing; please attach it. Please give the > > output > > of this command: > > > > select name, setting from pg_settings where setting <> boot_val; (I intentionally show all the results..) =# select name, setting from pg_settings where setting<> boot_val; name | setting ----------------------------+-------------------- application_name | psql archive_command | (disabled) client_encoding | UTF8 data_directory_mode | 0700 default_text_search_config | pg_catalog.english lc_collate | en_US.UTF-8 lc_ctype | en_US.UTF-8 lc_messages | en_US.UTF-8 lc_monetary | en_US.UTF-8 lc_numeric | en_US.UTF-8 lc_time | en_US.UTF-8 log_checkpoints | on log_file_mode | 0600 log_timezone | Asia/Tokyo max_stack_depth | 2048 max_wal_senders | 0 max_wal_size | 10240 server_encoding | UTF8 shared_buffers | 16384 TimeZone | Asia/Tokyo unix_socket_permissions | 0777 wal_buffers | 512 wal_level | minimal (23 rows) The result for "replica" setting in the benchmark script are used as base numbers (or the denominator of the percentages). > > 3. Restart server with wal_level = replica then run the bench.sh > > twice. > > I assume this is wal_level=minimal, not wal_level=replica. Oops! It's wrong I ran once with replica, then twice with minimal. Anyway, I revised the benchmarking scripts and attached them. The parameters written in benchmain.sh were decided as ./bench2.pl 5 <count> <pages> s with wal_level=minimal server takes around 60 seconds. I'll send the complete data tomorrow (in JST). The attached f.txt is the result of preliminary test only with pages=100 and 250 (with HDD). The attached files are: benchmain.sh - main script bench2.sh - run a benchmark with a single set of parameters bench1.pl - behchmark client program summarize.pl - script to summarize benchmain.sh's output f.txt.gz - result only for pages=100, DDL count = 2200 (not 2250) How to run: $ /..unpatched_path../initdb -D <unpatched_datadir> (wal_level=replica, max_wal_senders=0, log_checkpoints=yes, max_wal_size=10GB) $ /..patched_path../initdb -D <patched_datadir> (wal_level=minimal, max_wal_senders=0, log_checkpoints=yes, max_wal_size=10GB) $./benchmain.sh > <result_file> # output raw data $./summarize.pl [-v] < <result_file> # show summary With the attached f.txt, summarize.pl gives the following output. WAL wins with the that pages. $ cat f.txt | ./summarize.pl ## params: wal_level=replica mode=none pages=100 count=353 scale=20 (% are relative to "before") before: tps 262.3 (100.0%), lat 39.840 ms (100.0%) (29 samples) during: tps 120.7 ( 46.0%), lat 112.508 ms (282.4%) (35 samples) after: tps 106.3 ( 40.5%), lat 163.492 ms (410.4%) (86 samples) DDL time: 34883 ms ( 100.0% relative to mode=none) ## params: wal_level=minimal mode=sync pages=100 count=353 scale=20 (% are relative to "before") before: tps 226.3 (100.0%), lat 48.091 ms (100.0%) (29 samples) during: tps 83.0 ( 36.7%), lat 184.942 ms (384.6%) (100 samples) after: tps 82.6 ( 36.5%), lat 196.863 ms (409.4%) (21 samples) DDL time: 99239 ms ( 284.5% relative to mode=none) ## params: wal_level=minimal mode=WAL pages=100 count=353 scale=20 (% are relative to "before") before: tps 240.3 (100.0%), lat 44.686 ms (100.0%) (29 samples) during: tps 129.6 ( 53.9%), lat 113.585 ms (254.2%) (31 samples) after: tps 124.5 ( 51.8%), lat 141.992 ms (317.8%) (90 samples) DDL time: 30392 ms ( 87.1% relative to mode=none) ## params: wal_level=replica mode=none pages=250 count=258 scale=20 (% are relative to "before") before: tps 266.3 (100.0%), lat 45.884 ms (100.0%) (29 samples) during: tps 87.9 ( 33.0%), lat 148.433 ms (323.5%) (54 samples) after: tps 105.6 ( 39.6%), lat 153.216 ms (333.9%) (67 samples) DDL time: 53176 ms ( 100.0% relative to mode=none) ## params: wal_level=minimal mode=sync pages=250 count=258 scale=20 (% are relative to "before") before: tps 225.1 (100.0%), lat 47.705 ms (100.0%) (29 samples) during: tps 93.7 ( 41.6%), lat 143.231 ms (300.2%) (83 samples) after: tps 93.8 ( 41.7%), lat 186.097 ms (390.1%) (38 samples) DDL time: 82104 ms ( 154.4% relative to mode=none) ## params: wal_level=minimal mode=WAL pages=250 count=258 scale=20 (% are relative to "before") before: tps 230.2 (100.0%), lat 48.472 ms (100.0%) (29 samples) during: tps 90.3 ( 39.2%), lat 183.365 ms (378.3%) (48 samples) after: tps 123.9 ( 53.8%), lat 131.129 ms (270.5%) (73 samples) DDL time: 47660 ms ( 89.6% relative to mode=none) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
#! /usr/bin/bash PGBENCH_SCALE=20 export PATH=/home/horiguti/bin/pgsql_pendsync/bin:$PATH function run_one_param() { # bench2.sh <ddl_count> <insert_pages> <method (s:sync, w:WAL, n:not-set)> <pgbench scale> ./bench2.sh $1 $2 n ${PGBENCH_SCALE} 2>&1 ./bench2.sh $1 $2 s ${PGBENCH_SCALE} 2>&1 ./bench2.sh $1 $2 w ${PGBENCH_SCALE} 2>&1 } killall -9 postgres sleep 1 #run_one_param <ddl_count> <insert_pages> # On slow HDD export PGDATA=/home/horiguti/storage/hdd/data/data_pendsync run_one_param 353 100 run_one_param 258 250 run_one_param 185 500 run_one_param 118 1000 run_one_param 58 2500 run_one_param 32 5000 run_one_param 18 10000 # On M.2 SSD export PGDATA=/home/horiguti/storage/ssd/data/data_pendsync run_one_param 2250 100 run_one_param 1162 250 run_one_param 564 500 run_one_param 297 1000 run_one_param 123 2500 run_one_param 63 5000 run_one_param 32 10000
#! /usr/bin/bash if [ "$PGDATA" == "" ]; then echo "\$PGDATA should be set" exit; fi; rm -r $PGDATA/* initdb if [ "$3" == "n" ]; then wal_level=replica else wal_level=minimal fi cat <<EOF >> $PGDATA/postgresql.conf wal_level=$wal_level max_wal_senders=0 log_checkpoints=yes max_wal_size = 10GB EOF binary=`which postgres` scale=$4 echo "## params: count=$1 pages=$2 mode=$3 binary=$binary scale=$scale wal_level=$wal_level" pg_ctl stop -m f pg_ctl start 2>&1 pgbench -i -s ${scale} psql -c 'checkpoint;' ((sleep 30; echo "START"; ./bench1.pl 5 $1 $2 $3; echo "END") & pgbench -rP1 --progress-timestamp -T150 -c10 -j10) 2>&1 pg_ctl stop -m i
#! /usr/bin/perl use strict; use IPC::Open2; use Time::HiRes qw (gettimeofday tv_interval); my $tupperpage = 226; my $large_size = 100000000; my @time = (); sub bench { my ($header, $nprocs, $ntups, $threshold, $ddlcount) = @_; my @result = (); my @rds = (); for (my $ip = 0 ; $ip < $nprocs ; $ip++) { pipe(my $rd, my $wr); $rds[$ip] = $rd; my $pid = fork(); die "fork failed: $!\n" if ($pid < 0); if ($pid == 0) { close($rd); my $pid = open2(my $psqlrd, my $psqlwr, "psql postgres > /dev/null"); if ($threshold >= 0) { print $psqlwr "SET wal_skip_threshold to $threshold;\n"; } print $psqlwr "DROP TABLE IF EXISTS t$ip;"; print $psqlwr "CREATE TABLE t$ip (a int);\n"; my @st = gettimeofday(); for (my $i = 0 ; $i < $ddlcount ; $i++) { print $psqlwr "BEGIN;"; print $psqlwr "TRUNCATE t$ip;"; print $psqlwr "INSERT INTO t$ip (SELECT a FROM generate_series(1, $ntups) a);"; print $psqlwr "COMMIT;"; } close($psqlwr); waitpid($pid, 0); print $wr $ip, " ", 1000 * tv_interval(\@st, [gettimeofday]), "\n"; exit; } close($wr); } my $rpid; while (($rpid = wait()) == 0) {} my $sum = 0; for (my $ip = 0 ; $ip < $nprocs ; $ip++) { my $ret = readline($rds[$ip]); die "format? $ret\n" if ($ret !~ /^([0-9]+) ([0-9.]+)$/); $sum += $2; } printf "$header: procs $nprocs: time %.0f\n", $sum / $nprocs; } sub log10 { return log($_[0]) / log(10); } # benchmark for wal_level = replica, the third parameter of bench # doesn't affect sub bench1 { my $ddlcount = 5; $ddlcount = $ARGV[1] if ($#ARGV > 0); print "benchmark for wal_level = replica\n"; for (my $s = 0 ; $s <= 4 ; $s += 0.25) { my $ss = int(10 ** $s); bench("size $ss", 1, $ss * $tupperpage, 0, $ddlcount); } } # benchmark for wal_level = minimal. sub bench2 { my $ddlcount = 5; $ddlcount = $ARGV[1] if ($#ARGV > 0); print "benchmark for wal_level = minimal\n"; for (my $s = 0 ; $s <= 4.5 ; $s += 0.25) { my $ss = int(10 ** $s); bench("size $ss: SYNC ", 1, $ss * $tupperpage, 0, $ddlcount); bench("size $ss: WAL ", 1, $ss * $tupperpage, $large_size, $ddlcount); } } # find crossing point of WAL and SYNC by bisecting sub bench3 { my $ddlcount = 5; $ddlcount = $ARGV[1] if ($#ARGV > 0); print "find crossing point of WAL and SYNC by bisecting\n"; bench("SYNC: size 0", 1, 1, 8); bench("WAL : size 0", 1, 1, 16); my $s = 1; my $st = 10000; while (1) { my $ts = bench("SYNC: size $s", $tupperpage * $s, 0, $ddlcount); my $tw = bench("WAL : size $s", $tupperpage * $s, $large_size, $ddlcount); if ($st < 1.0){ print "DONE\n"; exit(0); } if ($ts > $tw) { $s += $st; $st /= 2; } else { $s -= $st; $st /= 2; } } } # benchmark with multiple processes sub bench4 { my $ddlcount = 5; my $nprocs = 10; $nprocs = $ARGV[1] if ($#ARGV > 0); $ddlcount = $ARGV[2] if ($#ARGV > 1); print "benchmark for wal_level = minimal, $nprocs processes\n"; print "bench 4: nprocs = $nprocs, DDL count = $ddlcount\n"; for (my $s = 1.0 ; $s <= 3.5 ; $s += 0.25) { my $ss = int(10 ** $s); bench("pages $ss: SYNC ", $nprocs, $ss * $tupperpage, 0, 5); bench("pages $ss: WAL ", $nprocs, $ss * $tupperpage, $large_size, 5); } } sub bench5 { my $ddlcount = 5; my $pages = 100; my $mode = "s"; my $threshold = 0; $ddlcount = $ARGV[1] if ($#ARGV > 0); $pages = $ARGV[2] if ($#ARGV > 1); $mode = $ARGV[3] if ($#ARGV > 2); if ($mode eq 's') { $threshold = 0; } elsif ($mode eq 's') { $threshold = $large_size; } elsif ($mode eq 'n') { $threshold = -1; } elsif ($mode eq 'w') { $threshold = $large_size; } else { die "mode must be s, w or n\n"; } print "bench 5: mode = $mode, DDL count = $ddlcount, pages = $pages\n"; bench("size $pages", 1, $pages * $tupperpage, $threshold, $ddlcount); } bench1() if ($ARGV[0] == 1); bench2() if ($ARGV[0] == 2); bench3() if ($ARGV[0] == 3); bench4() if ($ARGV[0] == 4); bench5() if ($ARGV[0] == 5);
#! /usr/bin/perl use strict; use POSIX qw(floor); my $state = 0; my $wal_level = ''; my $pages = 0; my $binary = ''; my $scale = 0; my $paramcount = 0; my $mode = ''; my $sumtps = 0; my $sumlat = 0; my $count = 0; my $trig = 0; my $title = "(undef)"; my $ddltime = 0; my @lines = (); my $trailstr = ''; my $verbose = 1 if ($ARGV[0] eq '-v'); my %tps = (); my %lat = (); my %ddltime = (); my %modestr=("n", "none", "s", "sync", "w", "WAL"); while (<STDIN>) { push(@lines, $_); chomp; next if (/^(END|START)$/); next if (/NOTICE: /); # print "$state: $_\n"; if ($state == 0) { if (/^## params: count=([0-9.]+) pages=([0-9.]+) mode=(.) binary=([^ ]+) scale=([0-9.]+) wal_level=([a-z]+)/) { $paramcount = $1; $pages = $2; $mode = $3; $binary = $4; $scale = $5; $wal_level = $6; my $modestr = $modestr{$mode}; print "## params: wal_level=$wal_level mode=$modestr pages=$pages count=$paramcount scale=$scale\n"; print "(% are relative to \"before\")\n"; $state = 1; next; } else { next; } } elsif ($state == 1) { if (/^starting vacuum/) { $state = 2; } next; } elsif ($state == 2) { if (/^bench.*/) { $trig = 1; $title = "before"; $state = 3; } } elsif ($state == 3) { if (/^size ([0-9]+): procs ([0-9]+): time ([0-9]+)$/) { $ddltime{$mode} = $3; $trig = 1; $title = "during"; $trailstr = ''; $state = 4; } } elsif ($state == 4) { if (/^transaction type: /) { $trig = 1; $title = "after"; $trailstr = "# $_\n"; $state = 5; } } elsif ($state == 5) { if (!/^statement latencies /) { $trailstr .= "# $_\n"; next; } printf "DDL time: %6.0f ms (%6.1f%% relative to mode=%s)\n", $ddltime{$mode}, floor(1000.0 * $ddltime{$mode} / $ddltime{n} + 0.5) / 10, $modestr{n}; $trailstr .= "# $_\n"; $state = 6; next; } elsif ($state == 6) { if (/^ {8}/) { $trailstr.= "# $_\n"; next; } if ($verbose) { print $trailstr; } $state = 0; next; } if ($trig) { die "count 0?\n" if ($count == 0); $tps{$title} = $sumtps / $count; $lat{$title} = $sumlat / $count; printf "%6s: tps %6.1f (%5.1f%%), lat %9.3f ms (%5.1f%%) (%d samples)\n", $title, $tps{$title}, floor(1000.0 * $tps{$title} / $tps{before} + 0.5)/10, $lat{$title}, floor(1000.0 * $lat{$title} / $lat{before} + 0.5)/10, $count; $sumtps = $sumlat = $count = 0; $trig = 0; next; } if (!/^progress: ([0-9.]+) s, ([0-9.]+) tps, lat ([0-9.]+) ms stddev ([0-9.]+|NaN)$/) { last; } $sumtps += $2; $sumlat += $3; $count++; } if ($state != 0) { print "Wrong state after EOF: state = $state\n"; print "=====================================\n"; foreach (-10 .. -1) { printf "%d: %s", ($. + $_), $lines[$. + $_]; } print "=====================================\n"; exit(1); } die "uncounted lines?\n" if ($count > 0);
f.txt.gz
Description: Binary data