Hello All, I'm experiencing significant read query blocking on Aurora PostgreSQL read replicas during WAL replay of VACUUM relation truncation, particularly for TOAST tables. This affects a high-traffic service (~3000 req/sec) and causes application downtime.
*Problem Summary:*
WAL replay of relation truncation on read replicas triggers buffer invalidation under a replayed AccessExclusive lock, which blocks concurrent read queries for extended periods.

*Environment Details:*
- Aurora PostgreSQL (read replica setup)
- Workload: async writes to the primary, read-only queries on the replica
- TOAST table with ~4KB average compressed column size
- maintenance_work_mem: 2087MB

*Observed Behavior:*
Read queries on the replica block waiting for AccessShareLock:

    [23541]: select gzipped_dto from <table> dl1_0 where (dl1_0.entity_id,dl1_0.language_code) in (($1,$2))
    2025-06-28 11:57:34 UTC: process 23574 still waiting for AccessShareLock on relation 20655 after 1000.035 ms

The blocking coincides with substantial TOAST table truncation:

    2025-06-28 11:57:39 UTC::@:[8399]:LOG: automatic vacuum of table "delivery.pg_toast.pg_toast_20652": index scans: 1
      pages: 212964 removed, 434375 remain, 78055 scanned (12.06% of total)
      tuples: 198922 removed, 2015440 remain, 866 are dead but not yet removable
      removable cutoff: 1066590201, which was 783 XIDs old when operation ended
      frozen: 3 pages from table (0.00% of total) had 19 tuples frozen
      index scan needed: 39600 pages from table (6.12% of total) had 199413 dead item identifiers removed
      index "pg_toast_20652_index": pages: 16131 in total, 35 newly deleted, 7574 currently deleted, 7539 reusable
      I/O timings: read: 173469.911 ms, write: 0.000 ms
      avg read rate: 9.198 MB/s, avg write rate: 0.000 MB/s
      buffer usage: 220870 hits, 213040 misses, 0 dirtied
      WAL usage: 0 records, 0 full page images, 0 bytes
      system usage: CPU: user: 2.97 s, system: 1.86 s, elapsed: 180.95 s

*Analysis:*
The vacuum reclaimed 212,964 pages (about 33% of the relation), so the truncation is legitimate space reclamation. With maintenance_work_mem set to 2087MB, memory is not what limits the vacuum. However, WAL replay of the truncation on the read replica has to invalidate those pages in shared_buffers while holding the replayed AccessExclusive lock, and that conflicts with ongoing read queries.

*Questions for Discussion:*
1. Batch buffer invalidation: Could buffer invalidation during WAL replay be batched or deferred to shorten the lock contention window?
2. Replica-specific truncation policy: Should read replicas have different truncation thresholds (REL_TRUNCATE_MINIMUM / REL_TRUNCATE_FRACTION) to balance space reclamation against query availability?
3. Cloud-native considerations: In cloud environments like Aurora with a separate storage layer, is immediate buffer invalidation during truncation replay necessary, or could it be optimized?
4. Lock duration optimization: The current truncation replay holds the AccessExclusive lock for the entire invalidation. Could this be shortened through incremental processing?

*Potential Approaches:*
- Implement configurable truncation behavior for standby servers (a per-table stopgap we are evaluating is sketched in the P.S. below)
- Add batching/throttling to buffer invalidation during WAL replay
- Provide a way to defer truncation replay during periods of high read activity

This issue particularly affects TOAST tables, since their chunked storage pattern creates more opportunities for dead space at the end of the relation, but the core problem applies to any significant relation truncation replayed on a read replica.

Has anyone else encountered this issue? Are there existing configuration options or patches that address WAL replay buffer invalidation conflicts?

Thanks for any insights.

Thanks,
Dharin
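P.S. For completeness, here is the stopgap we are evaluating. It is only a sketch: it assumes community PostgreSQL 12+ semantics for the vacuum_truncate reloption and the VACUUM (TRUNCATE false) option, I have not yet confirmed whether Aurora honours them identically, and "my_table" is just a placeholder for the relation behind the <table> in the log above. Disabling truncation on the primary avoids the truncation WAL record entirely, at the cost of leaving the freed pages at the end of the relation:

    -- Placeholder table name; assumes PostgreSQL 12+ reloptions are honoured by Aurora.
    -- Disable end-of-relation truncation for the main table and its TOAST table:
    ALTER TABLE my_table SET (vacuum_truncate = off);
    ALTER TABLE my_table SET (toast.vacuum_truncate = off);

    -- One-off manual vacuum that skips truncation:
    VACUUM (TRUNCATE false) my_table;

    -- On the replica, confirm that query cancellations are due to lock conflicts:
    SELECT datname, confl_lock, confl_snapshot, confl_bufferpin
      FROM pg_stat_database_conflicts
     WHERE datname = current_database();

This trades disk space for availability and does not answer the questions above; it only sidesteps the replayed AccessExclusive lock.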