On 2017-02-25 00:40, Petr Jelinek wrote:
0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
0002-Fix-after-trigger-execution-in-logical-replication.patch
0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
snapbuild-v3-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
snapbuild-v3-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
snapbuild-v3-0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
snapbuild-v3-0004-Skip-unnecessary-snapshot-builds.patch
0001-Logical-replication-support-for-initial-data-copy-v6.patch
Here are some results. There is improvement although it's not an
unqualified success.
Several repeat runs of pgbench_derail2.sh, with different values for the
number of clients, yielded one output file each.
Those show that logrep is now pretty stable when there is only 1 client
(pgbench -c 1). But it starts making mistakes with 4, 8, 16 clients.
I'll just show a grep of the output files; I think it is
self-explanatory:
Output-files (lines counted with grep | sort | uniq -c):
-- out_20170225_0129.txt
    250 -- pgbench -c 1 -j 8 -T 10 -P 5 -n
    250 -- All is well.
-- out_20170225_0654.txt
     25 -- pgbench -c 4 -j 8 -T 10 -P 5 -n
     24 -- All is well.
      1 -- Not good, but breaking out of wait (waited more than 60s)
-- out_20170225_0711.txt
     25 -- pgbench -c 8 -j 8 -T 10 -P 5 -n
     23 -- All is well.
      2 -- Not good, but breaking out of wait (waited more than 60s)
-- out_20170225_0803.txt
     25 -- pgbench -c 16 -j 8 -T 10 -P 5 -n
     11 -- All is well.
     14 -- Not good, but breaking out of wait (waited more than 60s)
So, that says:
1 client: 250x success, zero fail (250 is not a typo; I ran this overnight)
4 clients: 24x success, 1 fail
8 clients: 23x success, 2 fail
16 clients: 11x success, 14 fail
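
For anyone who wants to produce the same kind of tally, here is a minimal
sketch of the grep | sort | uniq -c step. The function name tally and the
grep pattern are assumptions, inferred only from the marker lines shown
above; the real output files may contain other lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of pgbench_derail2.sh): count the marker
# lines in one output file. Assumes marker lines start with "-- ".
tally() {
    grep -E -- '^-- (pgbench|All is well|Not good)' "$1" | sort | uniq -c
}

# Possible usage over all output files:
#   for f in out_*.txt; do echo "-- $f"; tally "$f"; done
```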
I want to repeat what I said a few emails back: problems seem to
disappear when a short wait state is introduced (directly after the
'alter subscription sub1 enable' line) to give the logrep machinery time
to 'settle'. It makes one think of a timing error somewhere (now don't
ask me where..).
To show that, here is pgbench_derail2.sh output from a run that waited
10 seconds (INIT_WAIT in the script) as such a 'settle' period; it works
faultlessly (with 16 clients):
-- out_20170225_0852.txt
     25 -- pgbench -c 16 -j 8 -T 10 -P 5 -n
     25 -- All is well.
QED.
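
In rough outline, the 'settle' wait amounts to something like the sketch
below. Only INIT_WAIT, the 'alter subscription sub1 enable' statement and
the pgbench flags come from the above; the wrapper function
enable_then_load and the psql options in the usage comment are
placeholders, not what the script literally does.

```shell
#!/bin/sh
# Sketch only: seconds to let the logical replication machinery 'settle'.
INIT_WAIT=${INIT_WAIT:-10}

# Hypothetical wrapper: run the enable step, pause, then start the load.
# Both arguments are shell command strings.
enable_then_load() {
    enable_cmd=$1
    load_cmd=$2
    sh -c "$enable_cmd" || return 1   # stop here if enabling failed
    sleep "$INIT_WAIT"                # the 'settle' period
    sh -c "$load_cmd"
}

# Possible usage (connection options are placeholders):
#   enable_then_load 'psql -qX -c "alter subscription sub1 enable"' \
#                    'pgbench -c 16 -j 8 -T 10 -P 5 -n'
```

Setting INIT_WAIT=0 gives the no-wait behaviour that produced the failures
listed earlier.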
(By the way, no hung sessions so far, so that's good)
thanks
Erik Rijkers
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)