Hi, we have an error happening on a catalog table on one of the dbs in the instance. This is the error report form:
A description of what you are trying to achieve and what results you expect.: Any kind of vacuum fails on pg_authid table, I would expect it to succeed. This is occasionaly blocking autovacuums, so we must resort to manual vacuum of busy tables, which is bearable but inconvenient. PostgreSQL version number you are running: PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28), 64-bit How you installed PostgreSQL: >From the pgdg yum repositories. Changes made to the settings in the postgresql.conf file: see Server Configuration for a quick way to list them all. name | current_setting | source ------------------------------+------------------------------------------------------------------------------+---------------------- application_name | psql | client archive_command | rsync -a %p barman@x.x.x.x:/data/backup/barman/ubipgsql-10/incoming/%f | configuration file archive_mode | on | configuration file checkpoint_completion_target | 0.9 | configuration file checkpoint_timeout | 25min | configuration file client_encoding | UTF8 | client DateStyle | ISO, MDY | configuration file default_text_search_config | pg_catalog.english | configuration file dynamic_shared_memory_type | posix | configuration file effective_cache_size | 64GB | configuration file hot_standby | on | configuration file hot_standby_feedback | on | configuration file lc_messages | en_US.UTF-8 | configuration file lc_monetary | en_US.UTF-8 | configuration file lc_numeric | en_US.UTF-8 | configuration file lc_time | en_US.UTF-8 | configuration file listen_addresses | * | configuration file log_autovacuum_min_duration | 1s | configuration file log_checkpoints | on | configuration file log_destination | syslog | configuration file log_directory | log | configuration file log_filename | postgresql-%a.log | configuration file log_line_prefix | user=%u,db=%d,client=%h | configuration file log_lock_waits | on | configuration file log_min_duration_statement | 1s | configuration file log_min_messages | info | configuration file log_rotation_age | 1d | configuration file log_rotation_size | 0 | configuration file log_statement | ddl | configuration file log_temp_files | 1MB | configuration file log_timezone | Europe/Rome | configuration file log_truncate_on_rotation | on | configuration file logging_collector | on | configuration file maintenance_work_mem | 1GB | configuration file max_connections | 1000 | configuration file max_stack_depth | 2MB | environment variable max_wal_size | 12GB | configuration file min_wal_size | 80MB | configuration file password_encryption | scram-sha-256 | configuration file pg_stat_statements.max | 10000 | configuration file pg_stat_statements.track | all | configuration file shared_buffers | 32GB | configuration file shared_preload_libraries | pg_stat_statements | configuration file syslog_facility | local0 | configuration file syslog_ident | postgres | configuration file TimeZone | Europe/Rome | configuration file track_activity_query_size | 2048 | configuration file track_functions | all | configuration file track_io_timing | on | configuration file wal_keep_segments | 200 | configuration file wal_level | logical | configuration file (51 rows) Operating system and version: CentOS Linux release 7.4.1708 (Core) Linux xx.xx.com 3.10.0-693.11.1.el7.x86_64 #1 SMP Mon Dec 4 23:52:40 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux What program you're using to connect to PostgreSQL: psql, jdbc driver. It's not an application side error. Is there anything relevant or unusual in the PostgreSQL server logs?: Yes, we see these kind of errors: May 20 16:06:04 ubipgsql01 postgres[26739]: [1380-1] user=,db=,client= ERROR: found xmin 2889675859 from before relfrozenxid 400011439 May 20 16:06:04 ubipgsql01 postgres[26739]: [1380-2] user=,db=,client= CONTEXT: automatic vacuum of table "postgres.pg_catalog.pg_authid" When these errors pop up, autovacuum repeatedly fails on this very same table, and we have to resort to manual vacuums For questions about any kind of error: What you were doing when the error happened / how to cause the error: The first time we saw the error, we found an idle transaction was left open for several days. The user was not monitored for long running transactions. We killed the session, deleted the offending row (the one with the xmin value reported) and then vacuumed the table. However, teh same error pops up every time we try to add a new user. We would like to fix these error once and for all, so we can add new users. The EXACT TEXT of the error message you're getting, if there is one: (Copy and paste the message to the email, do not send a screenshot) This is the error: psql (10.4) Type "help" for help. postgres=# vacuum pg_authid; ERROR: found xmin 3031994631 from before relfrozenxid 400011439 Some datas: postgres=# select xmin from pg_authid ; xmin ------------ 1 1 1 1 1 557 7216348 110077819 110511334 3031994631 3032044199 3032044199 3032044199 3032070282 (14 rows) postgres=# select relfrozenxid from pg_class where relname='pg_authid'; relfrozenxid -------------- 400011439 (1 row) postgres=# Is this a sympthom of data corruption or transaction wraparound due to the long running transaction that we killed weeks ago? This is the only table in the whole cluster that has this error. We are monitoring transactions wraparound with the the check_postgres.pl script, the check is still running fine and no alert was given at all since the cluster has been running. Thank you in advance for any answer. Paolo Crosato