On 09/05/17 00:03, Erik Rijkers wrote:
> On 2017-05-05 02:00, Andres Freund wrote:
>>
>> Could you have a look?
>
> Running tests with these three patches:
>
>> 0001-WIP-Fix-off-by-one-around-GetLastImportantRecPtr.patch+
>> 0002-WIP-Possibly-more-robust-snapbuild-approach.patch +
>> fix-statistics-reporting-in-logical-replication-work.patch
> (on top of 44c528810)
>
> I test by 15-minute pgbench runs while there is a logical replication
> connection. Primary and replica are on the same machine.
>
> I have seen errors on 3 different machines (where error means: at least
> 1 of the 4 pgbench tables is not md5-equal). It seems better, faster
> machines yield less errors.
>
> Normally I see in pg_stat_replication (on master) one process in state
> 'streaming'.
>
>   pid  |     wal     | replay_loc  |   diff   |   state   |   app   | sync_state
>  16495 | 11/EDBC0000 | 11/EA3FEEE8 | 58462488 | streaming | derail2 | async
>
> Often there are another two processes in pg_stat_replication that remain
> in state 'startup'.
>
> In the failing sessions the 'streaming'-state process is missing; in
> failing sessions there are only the two processes that are and remain in
> 'startup'.
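As a side note on the quoted pg_stat_replication row: the diff column appears to be the byte distance between the wal and replay_loc values. A minimal sketch of that arithmetic follows, assuming the usual "high/low" hexadecimal LSN notation, where the part before the slash is the upper 32 bits. The helper names lsn_to_int and lsn_diff are hypothetical, chosen only for this illustration.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN written as 'HIGH/LOW' (both hex) to a 64-bit integer.

    Hypothetical helper for illustration, not part of PostgreSQL.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(newer: str, older: str) -> int:
    """Return the distance in bytes between two LSNs."""
    return lsn_to_int(newer) - lsn_to_int(older)


# Values taken from the row quoted above.
print(lsn_diff("11/EDBC0000", "11/EA3FEEE8"))  # 58462488
```

The result matches the diff value in the quoted row, so in that sample the replica was about 58 MB behind in replay.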
Hmm, 'startup' is the state in which slot creation is happening. I wonder
if it's just taking a long time to create the snapshot because of the 5th
issue, which is not yet fixed (and the original patch will not apply on top
of this change). Alternatively, there is a bug in this patch.

Did you see high CPU usage during the test while those walsenders were in
the 'startup' state?

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services