Hello Bertrand and Michael,

23.01.2024 11:07, Bertrand Drouvot wrote:
On Tue, Jan 23, 2024 at 02:50:06PM +0900, Michael Paquier wrote:

Anyway, that's not the end of it.  What should we do for snapshot
snapshot records coming from the bgwriter?
What about?

3) depending on how stabilized this test (and others that suffer from "random"
xl_running_xacts) is, then think about the bgwriter.

A recent buildfarm failure [1] reminds me of that remaining question.
Here we have a slow machine (a successful run, for example [2], shows
541.13s duration of the test) and the following information logged:

[13:55:13.725](34.411s) ok 25 - inactiveslot slot invalidation is logged with 
vacuum on pg_class
[13:55:13.727](0.002s) not ok 26 - activeslot slot invalidation is logged with 
vacuum on pg_class
[13:55:13.728](0.001s) #   Failed test 'activeslot slot invalidation is logged 
with vacuum on pg_class'
#   at 
C:/prog/bf/root/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl 
line 229.
[14:27:42.995](1949.267s) # poll_query_until timed out executing this query:
# select (confl_active_logicalslot = 1) from pg_stat_database_conflicts where 
datname = 'testdb'
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
[14:27:42.999](0.004s) not ok 27 - confl_active_logicalslot updated
[14:27:43.000](0.001s) #   Failed test 'confl_active_logicalslot updated'
#   at 
C:/prog/bf/root/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl 
line 235.
Timed out waiting confl_active_logicalslot to be updated at C:/prog/bf/root/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 235.

---
035_standby_logical_decoding_standby.log:
2024-06-06 13:55:07.715 UTC [9172:7] LOG:  invalidating obsolete replication slot 
"row_removal_inactiveslot"
2024-06-06 13:55:07.715 UTC [9172:8] DETAIL:  The slot conflicted with xid 
horizon 754.
2024-06-06 13:55:07.715 UTC [9172:9] CONTEXT:  WAL redo at 0/4020A80 for Heap2/PRUNE_ON_ACCESS: snapshotConflictHorizon: 754, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 1, nunused: 0, dead: [48]; blkref #0: rel 1663/16384/2610, blk 0
2024-06-06 13:55:14.372 UTC [7532:1] [unknown] LOG:  connection received: 
host=127.0.0.1 port=55328
2024-06-06 13:55:14.381 UTC [7532:2] [unknown] LOG:  connection authenticated: identity="EC2AMAZ-P7KGG90\\pgrunner" method=sspi (C:/prog/bf/root/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata/pg_hba.conf:2) 2024-06-06 13:55:14.381 UTC [7532:3] [unknown] LOG:  connection authorized: user=pgrunner database=postgres application_name=035_standby_logical_decoding.pl 2024-06-06 13:55:14.443 UTC [7532:4] 035_standby_logical_decoding.pl LOG:  statement: select (confl_active_logicalslot = 1) from pg_stat_database_conflicts where datname = 'testdb' 2024-06-06 13:55:14.452 UTC [7532:5] 035_standby_logical_decoding.pl LOG:  disconnection: session time: 0:00:00.090 user=pgrunner database=postgres host=127.0.0.1 port=55328
# (there is no `invalidating obsolete replication slot 
"row_removal_activeslot"` message)
...
2024-06-06 14:27:42.675 UTC [4032:4] 035_standby_logical_decoding.pl LOG:  statement: select (confl_active_logicalslot = 1) from pg_stat_database_conflicts where datname = 'testdb' 2024-06-06 14:27:42.681 UTC [4032:5] 035_standby_logical_decoding.pl LOG:  disconnection: session time: 0:00:00.080 user=pgrunner database=postgres host=127.0.0.1 port=58713 2024-06-06 14:27:43.095 UTC [7892:2] FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

It's hard to determine from this info, why row_removal_activeslot was not
invalidated, but running this test on a slowed down Windows VM, I (still)
get the same looking failures caused by RUNNING_XACTS appeared just before
`invalidating obsolete replication slot "row_removal_inactiveslot"`.
So I would consider this failure as yet another result of bgwriter activity
and add it to the list of known failures as such...

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-06-06%2012%3A36%3A11
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=drongo&dt=2024-06-05%2017%3A03%3A13&stg=misc-check

Best regards,
Alexander


Reply via email to