On Wed, Dec 18, 2024 at 10:33 AM Amit Kapila <amit.kapil...@gmail.com> wrote:

Hi Amit!

> On Thu, Dec 5, 2024 at 3:14 PM Jakub Wartak
> <jakub.war...@enterprisedb.com> wrote:
> >
> > One of our customers ran into a very odd case, where hot standby feedback 
> > backend_xmin propagation stopped working due to major (hours/days) clock 
> > time shifts on hypervisor-managed VMs. This happens (and is fully 
> > reproducible) e.g. in scenarios where standby connects and its own VM is 
> > having time from the future (relative to primary) and then that time goes 
> > back to "normal". In such situation "sends hot_standby_feedback xmin" 
> > timestamp messages are stopped being transferred, e.g.:
> >
> > 2024-12-05 02:02:35 UTC [6002]: db=,user=,app=,client= DEBUG:  sending hot 
> > standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
> > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG:  sending 
> > write 6/E9015230 flush 6/E9015230 apply 6/E9015230
> > 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG:  sending hot 
> > standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
> > <-- clock readjustment and no further "sending hot standby feedback"
> ...
> >
> > I can share reproduction steps if anyone is interested. This basically 
> > happens due to usage of TimestampDifferenceExceeds() in 
> > XLogWalRcvSendHSFeedback(), but I bet there are other similiar scenarios.
> >
>
> We started to use a different mechanism in HEAD. See 
> XLogWalRcvSendHSFeedback().

Yes, you are correct somewhat because I was looking on REL13_STABLE,
but I've taken a fresh quick look at 05a7be93558 and tested it too.
Sadly, PG17 still maintains the same behavior of lack of proper
backend_xmin propagation (it stops sending hot standby feedback once
time on standby jumps forward). I believe this is the case because
walreceiver schedules next wakeup in far far future (when the clock /
now() is way ahead, see WalRcvComputeNextWakeup()), so when the clock
is back to normal (resetted back -X hours/days), the next wakeup seems
to be +X hours/days ahead.

> > What I was kind of surprised about was the lack of recommendation for 
> > having primary/standby to have clocks synced when using 
> > hot_standby_feedback, but such a thing is mentioned for 
> > recovery_min_apply_delay. So I would like to add at least one sentence to 
> > hot_standby_feedback to warn about this too, patch attached.
> >
>
> IIUC, this issue doesn't occur because the primary and standby clocks
> are not synchronized. It happened because the clock on standby moved
> backward.

In PG17 it would be because the clock moved way forward too much on
the standby. I don't know how it happened to that customer, but it was
probably done somehow by the hypervisor in that scenario (so time
wasn't slewed improperly by ntpd AFAIR, edge case, I know...)

> This is quite unlike the 'recovery_min_apply_delay' where
> non-synchronization of clocks between primary and standby can lead to
> unexpected results. This is because we don't compare any time on the
> primary with the time on standby. If this understanding is correct
> then the wording proposed by your patch should be changed accordingly.

.. if my understanding is correct, it is both depending on version :^)
I was thinking about backpatching docs (of what is the recommended
policy here? to just update new-release docs?), so I'm proposing
something more generic than earlier, but it takes Your point into
account - would something like below be good enough?

-       <para>
-        Using this option requires the primary and standby(s) to have system
-        clocks synchronized, otherwise it may lead to prolonged risk of not
-        removing dead rows on primary for extended periods of time as the
-        feedback mechanism is based on timestamps exchanged between primary
-        and standby(s).
-       </para>

+       <para>
+        Using this option requires the primary and standby(s) to have system
+        clocks synchronized (without big time jumps), otherwise it may lead to
+        prolonged risk of not removing dead rows on primary for
extended periods
+        of time as the feedback mechanism implementation is timestamp based.
+       </para>

-J.


Reply via email to