Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
> Simon Riggs wrote:
>> Replication lag tracking for walsenders
>>
>> Adds write_lag, flush_lag and replay_lag cols to pg_stat_replication.
> Did anyone notice that this seems to be causing buildfarm member 'tern'
> to fail the recovery check?  See here:
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=tern&dt=2017-04-21%2012%3A48%3A09&stg=recovery-check
> which has
> TRAP: FailedAssertion("!(lsn >= prev.lsn)", File: "walsender.c", Line: 3331)
> Line 3331 was added by this commit.

Note that while that commit was some time back, tern has only just started
running recovery-check, following its update to the latest buildfarm
script.  It looks like it's run that test four times and failed twice,
so far.  So, not 100% reproducible, but there's something rotten there.
Timing-dependent, maybe?

Some excavation in the buildfarm database says that the coverage for
the recovery-check test has been mighty darn thin up until just
recently.  These are all the reports we have:

pgbfprod=> select sysname, min(snapshot) as oldest, count(*) from build_status_log where log_stage = 'recovery-check.log' group by 1 order by 2;
 sysname  |       oldest        | count
----------+---------------------+-------
 hamster  | 2016-03-01 02:34:26 |   182
 crake    | 2017-04-09 01:58:15 |    80
 nightjar | 2017-04-11 15:54:34 |    52
 longfin  | 2017-04-19 16:29:39 |     9
 hornet   | 2017-04-20 14:12:08 |     4
 mandrill | 2017-04-20 14:14:08 |     4
 sungazer | 2017-04-20 14:16:08 |     4
 tern     | 2017-04-20 14:18:08 |     4
 prion    | 2017-04-20 14:23:05 |     8
 jacana   | 2017-04-20 15:00:17 |     3
(10 rows)

So, other than hamster which is certainly going to have its own spin
on the timing question, we have next to no track record for this test.
I wouldn't bet that this issue is unique to tern; more likely, that's
just the first critter to show an intermittent issue.

			regards, tom lane