Hi!

I’m doing another iteration over the patchset.

PgStartLSN = GetXLogInsertRecPtr();
Should this be kind of RecoveryEndPtr? How is it expected to behave on Standby 
in HA cluster, which was doing a crash recovery of 1y WALs in a day, then is in 
startup for a year as a Hot Standby, and then is promoted?

lsn_ts_calculate_error_area() is prone to overflow. Even int64 does not seem 
capable to accommodate LSN*time. And the function may return negative result, 
despite claiming area as a result. It’s intended, but a little misleading.

i-- > 0
Is there a point to do a backward count in the loop?
Consider dropping not one by one, but half of a stream, LSNTimeStream is ~2Kb 
of cache and it’s loaded as a whole to the cache..



> On 27 Jun 2024, at 07:18, Melanie Plageman <melanieplage...@gmail.com> wrote:
> 
>> 2. Some benchmarks to proof the patch does not have CPU footprint.
> 
> This is still a todo. Typically when designing a benchmark like this,
> I would want to pick a worst-case workload to see how bad it could be.
> I wonder if just a write heavy workload like pgbench builtin tpcb-like
> would be sufficient?

Increasing background writer activity to maximum and not seeing LSNTimeStream 
function in `perf top` seems enough to me.

> 
>> === Nits ===
>> "Timeline" term is already taken.
> 
> I changed it to LSNTimeStream. What do you think?
Sounds good to me.


> 
>> Tests fail on Windows.
> 
> I think this was because of the compiler warnings, but I need to
> double-check now.
Nope, it really looks more serious.
[12:31:25.701] FAILED: 
src/backend/postgres_lib.a.p/utils_activity_pgstat_wal.c.obj 
[12:31:25.701] "cl" "-Isrc\backend\postgres_lib.a.p" "-Isrc\include" 
"-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" 
"-I..\src\include\port\win32_msvc" "/MDd" "/FIpostgres_pch.h" 
"/Yupostgres_pch.h" "/Fpsrc\backend\postgres_lib.a.p\postgres_pch.pch" 
"/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/DWIN32" "/DWINDOWS" 
"/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" 
"/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" 
"/wd4090" "/wd4267" "-DBUILDING_DLL" "/FS" 
"/FdC:\cirrus\build\src\backend\postgres_lib.pdb" 
/Fosrc/backend/postgres_lib.a.p/utils_activity_pgstat_wal.c.obj "/c" 
../src/backend/utils/activity/pgstat_wal.c
[12:31:25.701] ../src/backend/utils/activity/pgstat_wal.c(433): error C2375: 
'pg_estimate_lsn_at_time': redefinition; different linkage
[12:31:25.701] c:\cirrus\build\src\include\utils/fmgrprotos.h(2906): note: see 
declaration of 'pg_estimate_lsn_at_time'
[12:31:25.701] ../src/backend/utils/activity/pgstat_wal.c(434): error C2375: 
'pg_estimate_time_at_lsn': redefinition; different linkage
[12:31:25.701] c:\cirrus\build\src\include\utils/fmgrprotos.h(2905): note: see 
declaration of 'pg_estimate_time_at_lsn'
[12:31:25.701] ../src/backend/utils/activity/pgstat_wal.c(435): error C2375: 
'pg_lsntime_stream': redefinition; different linkage
[12:31:25.858] c:\cirrus\build\src\include\utils/fmgrprotos.h(2904): note: see 
declaration of 'pg_lsntime_stream'


> 
>> The patch lacks tests.
> 
> I thought about this a bit. I wonder what kind of tests make sense.
> 
> I could
> 1) Add tests with the existing stats tests
> (src/test/regress/sql/stats.sql) and just test that bgwriter is in
> fact adding to the time stream.
> 
> 2) Or should I add some infrastructure to be able to create an
> LSNTimeStream and then insert values to it and do some validations of
> what is added? I did a version of this but it is just much more
> annoying with C & SQL than with python (where I tried out my
> algorithms) [2].

I think just a test which calls functions and discards the result would greatly 
increase coverage.


> On 29 Jun 2024, at 03:09, Melanie Plageman <melanieplage...@gmail.com> wrote:
> change the user-facing functions for estimating an
> LSN/time conversion to instead return a floor and a ceiling -- instead
> of linearly interpolating a guess. This would be a way to keep users
> from misunderstanding the accuracy of the functions to translate LSN
> <-> time.

I think this is a good idea. And it covers well “server restart problem”. If 
API just returns -inf as a boundary, caller can correctly interpret this 
situation.

Thanks! Looking forward to more timely freezing.


Best regards, Andrey Borodin.

Reply via email to