On Wed, Dec 18, 2024 at 10:33 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
Hi Amit! > On Thu, Dec 5, 2024 at 3:14 PM Jakub Wartak > <jakub.war...@enterprisedb.com> wrote: > > > > One of our customers ran into a very odd case, where hot standby feedback > > backend_xmin propagation stopped working due to major (hours/days) clock > > time shifts on hypervisor-managed VMs. This happens (and is fully > > reproducible) e.g. in scenarios where standby connects and its own VM is > > having time from the future (relative to primary) and then that time goes > > back to "normal". In such situation "sends hot_standby_feedback xmin" > > timestamp messages are stopped being transferred, e.g.: > > > > 2024-12-05 02:02:35 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot > > standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0 > > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending > > write 6/E9015230 flush 6/E9015230 apply 6/E9015230 > > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot > > standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0 > > <-- clock readjustment and no further "sending hot standby feedback" > ... > > > > I can share reproduction steps if anyone is interested. This basically > > happens due to usage of TimestampDifferenceExceeds() in > > XLogWalRcvSendHSFeedback(), but I bet there are other similiar scenarios. > > > > We started to use a different mechanism in HEAD. See > XLogWalRcvSendHSFeedback(). Yes, you are correct somewhat because I was looking on REL13_STABLE, but I've taken a fresh quick look at 05a7be93558 and tested it too. Sadly, PG17 still maintains the same behavior of lack of proper backend_xmin propagation (it stops sending hot standby feedback once time on standby jumps forward). I believe this is the case because walreceiver schedules next wakeup in far far future (when the clock / now() is way ahead, see WalRcvComputeNextWakeup()), so when the clock is back to normal (resetted back -X hours/days), the next wakeup seems to be +X hours/days ahead. > > What I was kind of surprised about was the lack of recommendation for > > having primary/standby to have clocks synced when using > > hot_standby_feedback, but such a thing is mentioned for > > recovery_min_apply_delay. So I would like to add at least one sentence to > > hot_standby_feedback to warn about this too, patch attached. > > > > IIUC, this issue doesn't occur because the primary and standby clocks > are not synchronized. It happened because the clock on standby moved > backward. In PG17 it would be because the clock moved way forward too much on the standby. I don't know how it happened to that customer, but it was probably done somehow by the hypervisor in that scenario (so time wasn't slewed improperly by ntpd AFAIR, edge case, I know...) > This is quite unlike the 'recovery_min_apply_delay' where > non-synchronization of clocks between primary and standby can lead to > unexpected results. This is because we don't compare any time on the > primary with the time on standby. If this understanding is correct > then the wording proposed by your patch should be changed accordingly. .. if my understanding is correct, it is both depending on version :^) I was thinking about backpatching docs (of what is the recommended policy here? to just update new-release docs?), so I'm proposing something more generic than earlier, but it takes Your point into account - would something like below be good enough? - <para> - Using this option requires the primary and standby(s) to have system - clocks synchronized, otherwise it may lead to prolonged risk of not - removing dead rows on primary for extended periods of time as the - feedback mechanism is based on timestamps exchanged between primary - and standby(s). - </para> + <para> + Using this option requires the primary and standby(s) to have system + clocks synchronized (without big time jumps), otherwise it may lead to + prolonged risk of not removing dead rows on primary for extended periods + of time as the feedback mechanism implementation is timestamp based. + </para> -J.