Hi,

The document explains that the "lost" value reported by
pg_replication_slots.wal_status means some WAL files are definitely lost
and this slot cannot be used to resume replication anymore.

However, I observed the "lost" value while inserting lots of records, yet
replication could continue normally. So I wonder whether
pg_replication_slots.wal_status has a bug.

wal_status is calculated in GetWALAvailability(), and I think I found
some issues in it.

    keepSegs = ConvertToXSegs(Max(max_wal_size_mb, wal_keep_segments),
                              wal_segment_size) + 1;

max_wal_size_mb is a number of megabytes. wal_keep_segments is a number
of WAL segment files. So it's strange to take the max of them.
Shouldn't the above be the following?

    Max(ConvertToXSegs(max_wal_size_mb, wal_segment_size), wal_keep_segments) + 1

    if ((max_slot_wal_keep_size_mb <= 0 ||
         max_slot_wal_keep_size_mb >= max_wal_size_mb) &&
        oldestSegMaxWalSize <= targetSeg)
        return WALAVAIL_NORMAL;

This code means that wal_status reports "normal" only when
max_slot_wal_keep_size is zero or negative, or is equal to or larger than
max_wal_size. Why is this condition necessary? The document explains
"normal means that the claimed files are within max_wal_size". So whatever
the max_slot_wal_keep_size value is, IMO "normal" should be reported if the
WAL files claimed by the slot are within max_wal_size. Thoughts?

Or, if that condition is really necessary, the document should be updated
to add a note about the condition.

If the WAL files claimed by the slot exceed max_slot_wal_keep_size but
none of those WAL files have been removed yet, wal_status seems to report
"lost". Is this expected behavior? Per the meaning of "lost" described in
the document, I think "lost" should be reported only when some of the
claimed files have been removed. Thoughts? Or is this behavior expected and
the document incorrect?

BTW, if we want to implement GetWALAvailability() as the document
advertises, we can simplify it like the attached POC patch.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
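
As a minimal standalone sketch of the unit mismatch in the keepSegs calculation, the two variants can be compared outside the server. ConvertToXSegs and Max here are local re-creations written for this illustration (assumed to behave like the macros in the PostgreSQL source), the function names are hypothetical, and the settings used afterwards are made-up example values.

```c
#include <stdint.h>

/* Local stand-in: convert a size in megabytes to a number of WAL segments. */
#define ConvertToXSegs(x, segsize) ((x) / ((segsize) / (1024 * 1024)))
#define Max(a, b) ((a) > (b) ? (a) : (b))

/*
 * Current form: compares a megabyte value with a segment count,
 * then converts the winner as if it were megabytes.
 */
uint64_t
keep_segs_current(int max_wal_size_mb, int wal_keep_segments,
                  int wal_segment_size)
{
    return (uint64_t) ConvertToXSegs(Max(max_wal_size_mb, wal_keep_segments),
                                     wal_segment_size) + 1;
}

/*
 * Suggested form: convert max_wal_size to segments first,
 * so both operands of Max are segment counts.
 */
uint64_t
keep_segs_proposed(int max_wal_size_mb, int wal_keep_segments,
                   int wal_segment_size)
{
    return (uint64_t) Max(ConvertToXSegs(max_wal_size_mb, wal_segment_size),
                          wal_keep_segments) + 1;
}
```

With the hypothetical settings max_wal_size = 1024MB, wal_keep_segments = 2000 and 16MB segments, the current form yields 126 segments while the suggested form yields 2001. When wal_keep_segments is 0, both yield 65.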
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 55cac186dc..0b9cca2173 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9504,62 +9504,29 @@ GetWALAvailability(XLogRecPtr targetLSN)
 	XLogSegNo	currSeg;		/* segid of currpos */
 	XLogSegNo	targetSeg;		/* segid of targetLSN */
 	XLogSegNo	oldestSeg;		/* actual oldest segid */
-	XLogSegNo	oldestSegMaxWalSize;	/* oldest segid kept by max_wal_size */
-	XLogSegNo	oldestSlotSeg = InvalidXLogRecPtr;	/* oldest segid kept by
-														 * slot */
 	uint64		keepSegs;
 
 	/* slot does not reserve WAL. Either deactivated, or has never been active */
 	if (XLogRecPtrIsInvalid(targetLSN))
 		return WALAVAIL_INVALID_LSN;
 
-	currpos = GetXLogWriteRecPtr();
-
 	/* calculate oldest segment currently needed by slots */
 	XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
-	KeepLogSeg(currpos, &oldestSlotSeg);
 
-	/*
-	 * Find the oldest extant segment file. We get 1 until checkpoint removes
-	 * the first WAL segment file since startup, which causes the status being
-	 * wrong under certain abnormal conditions but that doesn't actually harm.
-	 */
-	oldestSeg = XLogGetLastRemovedSegno() + 1;
+	/* Find the oldest extant segment file */
+	oldestSeg = XLogGetLastRemovedSegno();
 
-	/* calculate oldest segment by max_wal_size and wal_keep_segments */
+	if (targetSeg <= oldestSeg)
+		return WALAVAIL_REMOVED;
+
+	currpos = GetXLogWriteRecPtr();
 	XLByteToSeg(currpos, currSeg, wal_segment_size);
-	keepSegs = ConvertToXSegs(Max(max_wal_size_mb, wal_keep_segments),
-							  wal_segment_size) + 1;
+	keepSegs = ConvertToXSegs(max_wal_size_mb, wal_segment_size);
 
-	if (currSeg > keepSegs)
-		oldestSegMaxWalSize = currSeg - keepSegs;
-	else
-		oldestSegMaxWalSize = 1;
-
-	/*
-	 * If max_slot_wal_keep_size has changed after the last call, the segment
-	 * that would been kept by the current setting might have been lost by the
-	 * previous setting. No point in showing normal or keeping status values
-	 * if the targetSeg is known to be lost.
-	 */
-	if (targetSeg >= oldestSeg)
-	{
-		/*
-		 * show "normal" when targetSeg is within max_wal_size, even if
-		 * max_slot_wal_keep_size is smaller than max_wal_size.
-		 */
-		if ((max_slot_wal_keep_size_mb <= 0 ||
-			 max_slot_wal_keep_size_mb >= max_wal_size_mb) &&
-			oldestSegMaxWalSize <= targetSeg)
+	if (currSeg - targetSeg <= keepSegs)
 			return WALAVAIL_NORMAL;
 
-		/* being retained by slots */
-		if (oldestSlotSeg <= targetSeg)
-			return WALAVAIL_RESERVED;
-	}
-
-	/* Definitely lost */
-	return WALAVAIL_REMOVED;
+	return WALAVAIL_RESERVED;
 }
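
The decision order that the POC patch leaves in GetWALAvailability() can be sketched as a standalone function on plain segment numbers. This is only an illustration under stated assumptions: the enum and function names are hypothetical stand-ins rather than PostgreSQL identifiers, a last-removed segment number of 0 is assumed to mean that nothing has been removed yet, and the invalid-LSN case is left out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the WALAVAIL_* values used in the patch. */
typedef enum
{
    SKETCH_NORMAL,      /* claimed files are within max_wal_size */
    SKETCH_RESERVED,    /* beyond max_wal_size but not removed */
    SKETCH_REMOVED      /* a claimed file has been removed ("lost") */
} SketchStatus;

/*
 * target_seg:       segment holding the slot's restart position
 * last_removed_seg: newest segment already removed (assumed 0 if none)
 * curr_seg:         segment holding the current write position;
 *                   assumed to be >= target_seg
 * keep_segs:        max_wal_size expressed in segments
 */
SketchStatus
sketch_wal_status(uint64_t target_seg, uint64_t last_removed_seg,
                  uint64_t curr_seg, uint64_t keep_segs)
{
    /* Report "lost" only when the claimed segment is actually gone. */
    if (target_seg <= last_removed_seg)
        return SKETCH_REMOVED;

    /* Within max_wal_size of the current write position. */
    if (curr_seg - target_seg <= keep_segs)
        return SKETCH_NORMAL;

    /* Still on disk, but further back than max_wal_size. */
    return SKETCH_RESERVED;
}
```

For example, with the current segment at 100, 64 segments allowed by max_wal_size and segment 5 being the last one removed, a slot at segment 50 is classified as normal, a slot at segment 10 as reserved, and a slot at segment 5 as removed.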