This is a copy of my message for pgsql-hackers mailing list. Unfortunately original message was rejected due to one of recipients addresses is blocked.
> On 1 Aug 2024, at 10:54, Andrey M. Borodin <x4...@yandex-team.ru> wrote: > > > >> On 1 Aug 2024, at 05:44, Melanie Plageman <melanieplage...@gmail.com> wrote: >> >> Thanks for the review! v6 attached. >> >> On Sat, Jul 6, 2024 at 1:36 PM Andrey M. Borodin <x4...@yandex-team.ru> >> wrote: >>> >>> PgStartLSN = GetXLogInsertRecPtr(); >>> Should this be kind of RecoveryEndPtr? How is it expected to behave on >>> Standby in HA cluster, which was doing a crash recovery of 1y WALs in a >>> day, then is in startup for a year as a Hot Standby, and then is promoted? >> >> So, I don't think we will allow use of the LSNTimeStream on a standby, >> since it is unclear what it would mean on a standby. For example, do >> you want to know the time the LSN was generated or the time it was >> replayed? Note that bgwriter won't insert values to the time stream on >> a standby (it explicitly checks). > > Yes, I mentioned Standby because PgStartLSN is not what it says it is. > >> >> But, you bring up an issue that I don't quite know what to do about. >> If the standby doesn't have an LSNTimeStream, then when it is >> promoted, LSN <-> time conversions of LSNs and times before the >> promotion seem impossible. Maybe if the stats file is getting written >> out at checkpoints, we could restore from that previous primary's file >> after promotion? > > I’m afraid that clocks of a Primary from previous timeline might be not in > sync with ours. > It’s OK if it causes error, we just need to be prepared when they indicate > values from future. Perhaps, by shifting their last point to our “PgStartLSN”. > >> >> This brings up a second issue, which is that, currently, bgwriter >> won't insert into the time stream when wal_level is minimal. I'm not >> sure exactly how to resolve it because I am relying on the "last >> important rec pointer" and the LOG_SNAPSHOT_INTERVAL_MS to throttle >> when the bgwriter actually inserts new records into the LSNTimeStream. >> I could come up with a different timeout interval for updating the >> time stream. Perhaps I should do that? > > IDK. My knowledge of bgwriter is not enough to give a meaningful advise here. > >> >>> lsn_ts_calculate_error_area() is prone to overflow. Even int64 does not >>> seem capable to accommodate LSN*time. And the function may return negative >>> result, despite claiming area as a result. It’s intended, but a little >>> misleading. >> >> Ah, great point. I've fixed this. > > Well, not exactly. Result of lsn_ts_calculate_error_area() is still fabs()’ed > upon usage. I’d recommend to fabs in function. > BTW lsn_ts_calculate_error_area() have no prototype. > > Also, I’m not a big fan of using IEEE 754 float in this function. This data > type have 24 bits of significand bits. > Consider that current timestamp has 50 binary digits. Let’s assume realistic > LSNs have same 50 bits. > Then our rounding error is 2^76 byte*microseconds. > Let’s assume we are interested to measure time on a scale of 1GB WAL records. > This gives us rounding error of 2^46 microseconds = 2^26 seconds = 64 million > seconds = 2 years. > Seems like a gross error. > > If we use IEEE 754 doubles we have 53 significand bytes. And rounding error > will be on a scale of 128 microseconds per GB, which is acceptable. > > So I think double is better than float here. > > Nitpicking, but I’d prefer to sum up (triangle2 + triangle3 + rectangle_part) > before subtracting. This can save a bit of precision (smaller figures can > have lesser exponent). > > >>>> On 29 Jun 2024, at 03:09, Melanie Plageman <melanieplage...@gmail.com> >>>> wrote: >>>> change the user-facing functions for estimating an >>>> LSN/time conversion to instead return a floor and a ceiling -- instead >>>> of linearly interpolating a guess. This would be a way to keep users >>>> from misunderstanding the accuracy of the functions to translate LSN >>>> <-> time. >>> >>> I think this is a good idea. And it covers well “server restart problem”. >>> If API just returns -inf as a boundary, caller can correctly interpret this >>> situation. >> >> Providing "ceiling" and "floor" user functions is still a TODO for me, >> however, I think that the patch mostly does handle server restarts. >> >> In the event of a restart, the cumulative stats system will have >> persisted our time stream, so the LSNTimeStream will just be read back >> in with the rest of the stats. I've added logic to ensure that if the >> PgStartLSN is newer than our oldest LSNTimeStream entry, we use the >> oldest entry instead of PgStartLSN when doing conversions LSN <-> >> time. >> >> As for a crash, stats do not persist crashes, but I think Michael's >> patch will go in to write out the stats file at checkpoints, and then >> this should be good enough. >> >> Is there anything else you think that is an issue with restarts? > > Nope, looks good to me. > >> >>> Thanks! Looking forward to more timely freezing. >> >> Thanks! I'll be posting a new version of the opportunistic freezing >> patch that uses the time stream quite soon, so I hope you'll take a >> look at that as well! > > Great! Thank you! > Besides your TODOs and my nitpicking this patch series looks RfC to me. > > I have to address some review comments on my patches, then I hope I’ll switch > to reviewing opportunistic freezing. > > > Best regards, Andrey Borodin.