Re: Add LSN <-> time conversion functionality

Andrey M. Borodin Thu, 01 Aug 2024 03:37:34 -0700

This is a copy of my message for pgsql-hackers mailing list. Unfortunately 
original message was rejected due to one of recipients addresses is blocked.


> On 1 Aug 2024, at 10:54, Andrey M. Borodin <x4...@yandex-team.ru> wrote:
> 
> 
> 
>> On 1 Aug 2024, at 05:44, Melanie Plageman <melanieplage...@gmail.com> wrote:
>> 
>> Thanks for the review! v6 attached.
>> 
>> On Sat, Jul 6, 2024 at 1:36 PM Andrey M. Borodin <x4...@yandex-team.ru> 
>> wrote:
>>> 
>>> PgStartLSN = GetXLogInsertRecPtr();
>>> Should this be kind of RecoveryEndPtr? How is it expected to behave on 
>>> Standby in HA cluster, which was doing a crash recovery of 1y WALs in a 
>>> day, then is in startup for a year as a Hot Standby, and then is promoted?
>> 
>> So, I don't think we will allow use of the LSNTimeStream on a standby,
>> since it is unclear what it would mean on a standby. For example, do
>> you want to know the time the LSN was generated or the time it was
>> replayed? Note that bgwriter won't insert values to the time stream on
>> a standby (it explicitly checks).
> 
> Yes, I mentioned Standby because PgStartLSN is not what it says it is.
> 
>> 
>> But, you bring up an issue that I don't quite know what to do about.
>> If the standby doesn't have an LSNTimeStream, then when it is
>> promoted, LSN <-> time conversions of LSNs and times before the
>> promotion seem impossible. Maybe if the stats file is getting written
>> out at checkpoints, we could restore from that previous primary's file
>> after promotion?
> 
> I’m afraid that clocks of a Primary from previous timeline might be not in 
> sync with ours.
> It’s OK if it causes error, we just need to be prepared when they indicate 
> values from future. Perhaps, by shifting their last point to our “PgStartLSN”.
> 
>> 
>> This brings up a second issue, which is that, currently, bgwriter
>> won't insert into the time stream when wal_level is minimal. I'm not
>> sure exactly how to resolve it because I am relying on the "last
>> important rec pointer" and the LOG_SNAPSHOT_INTERVAL_MS to throttle
>> when the bgwriter actually inserts new records into the LSNTimeStream.
>> I could come up with a different timeout interval for updating the
>> time stream. Perhaps I should do that?
> 
> IDK. My knowledge of bgwriter is not enough to give a meaningful advise here.
> 
>> 
>>> lsn_ts_calculate_error_area() is prone to overflow. Even int64 does not 
>>> seem capable to accommodate LSN*time. And the function may return negative 
>>> result, despite claiming area as a result. It’s intended, but a little 
>>> misleading.
>> 
>> Ah, great point. I've fixed this.
> 
> Well, not exactly. Result of lsn_ts_calculate_error_area() is still fabs()’ed 
> upon usage. I’d recommend to fabs in function.
> BTW lsn_ts_calculate_error_area() have no prototype.
> 
> Also, I’m not a big fan of using IEEE 754 float in this function. This data 
> type have 24 bits of significand bits.
> Consider that current timestamp has 50 binary digits. Let’s assume realistic 
> LSNs have same 50 bits.
> Then our rounding error is 2^76 byte*microseconds.
> Let’s assume we are interested to measure time on a scale of 1GB WAL records.
> This gives us rounding error of 2^46 microseconds = 2^26 seconds = 64 million 
> seconds = 2 years.
> Seems like a gross error.
> 
> If we use IEEE 754 doubles we have 53 significand bytes. And rounding error 
> will be on a scale of 128 microseconds per GB, which is acceptable.
> 
> So I think double is better than float here.
> 
> Nitpicking, but I’d prefer to sum up (triangle2 + triangle3 + rectangle_part) 
> before subtracting. This can save a bit of precision (smaller figures can 
> have lesser exponent).
> 
> 
>>>> On 29 Jun 2024, at 03:09, Melanie Plageman <melanieplage...@gmail.com> 
>>>> wrote:
>>>> change the user-facing functions for estimating an
>>>> LSN/time conversion to instead return a floor and a ceiling -- instead
>>>> of linearly interpolating a guess. This would be a way to keep users
>>>> from misunderstanding the accuracy of the functions to translate LSN
>>>> <-> time.
>>> 
>>> I think this is a good idea. And it covers well “server restart problem”. 
>>> If API just returns -inf as a boundary, caller can correctly interpret this 
>>> situation.
>> 
>> Providing "ceiling" and "floor" user functions is still a TODO for me,
>> however, I think that the patch mostly does handle server restarts.
>> 
>> In the event of a restart, the cumulative stats system will have
>> persisted our time stream, so the LSNTimeStream will just be read back
>> in with the rest of the stats. I've added logic to ensure that if the
>> PgStartLSN is newer than our oldest LSNTimeStream entry, we use the
>> oldest entry instead of PgStartLSN when doing conversions LSN <->
>> time.
>> 
>> As for a crash, stats do not persist crashes, but I think Michael's
>> patch will go in to write out the stats file at checkpoints, and then
>> this should be good enough.
>> 
>> Is there anything else you think that is an issue with restarts?
> 
> Nope, looks good to me.
> 
>> 
>>> Thanks! Looking forward to more timely freezing.
>> 
>> Thanks! I'll be posting a new version of the opportunistic freezing
>> patch that uses the time stream quite soon, so I hope you'll take a
>> look at that as well!
> 
> Great! Thank you!
> Besides your TODOs and my nitpicking this patch series looks RfC to me.
> 
> I have to address some review comments on my patches, then I hope I’ll switch 
> to reviewing opportunistic freezing.
> 
> 
> Best regards, Andrey Borodin.

Re: Add LSN <-> time conversion functionality

Reply via email to