Hi hackers, I am proposing to add the reason for the oldest xmin to VACUUM logs. This feature would be useful for identifying why dead tuples cannot be removed, thereby helping to diagnose and prevent table bloat.
The current logs only indicate that dead tuples could not be reclaimed due to the oldest xmin, but they do not reveal the underlying reason. To identify the cause, it is necessary to query multiple views: pg_stat_activity (for active transactions), pg_prepared_xacts (for prepared statements), pg_replication_slots (for replication slots), and pg_stat_replication (for hot standby feedback). However, because the data in these views is volatile, it is difficult to retroactively determine what was holding the oldest xmin at the specific time the log message was generated. This PoC patch addresses this problem. The implementation now outputs the reason for the oldest xmin and, where applicable, the backend PID. This information was originally discarded when calculating the oldest xmin horizon, and the computation required to retrieve these reasons is considered reasonable. The patch is attached. What do you think? -- Best regards, Shinya Kato NTT OSS Center
v1-0001-Report-oldest-xmin-source-when-autovacuum-cannot-.patch
Description: Binary data
