Hi! I’m doing another iteration over the patchset.
PgStartLSN = GetXLogInsertRecPtr(); Should this be kind of RecoveryEndPtr? How is it expected to behave on Standby in HA cluster, which was doing a crash recovery of 1y WALs in a day, then is in startup for a year as a Hot Standby, and then is promoted? lsn_ts_calculate_error_area() is prone to overflow. Even int64 does not seem capable to accommodate LSN*time. And the function may return negative result, despite claiming area as a result. It’s intended, but a little misleading. i-- > 0 Is there a point to do a backward count in the loop? Consider dropping not one by one, but half of a stream, LSNTimeStream is ~2Kb of cache and it’s loaded as a whole to the cache.. > On 27 Jun 2024, at 07:18, Melanie Plageman <melanieplage...@gmail.com> wrote: > >> 2. Some benchmarks to proof the patch does not have CPU footprint. > > This is still a todo. Typically when designing a benchmark like this, > I would want to pick a worst-case workload to see how bad it could be. > I wonder if just a write heavy workload like pgbench builtin tpcb-like > would be sufficient? Increasing background writer activity to maximum and not seeing LSNTimeStream function in `perf top` seems enough to me. > >> === Nits === >> "Timeline" term is already taken. > > I changed it to LSNTimeStream. What do you think? Sounds good to me. > >> Tests fail on Windows. > > I think this was because of the compiler warnings, but I need to > double-check now. Nope, it really looks more serious. [12:31:25.701] FAILED: src/backend/postgres_lib.a.p/utils_activity_pgstat_wal.c.obj [12:31:25.701] "cl" "-Isrc\backend\postgres_lib.a.p" "-Isrc\include" "-I..\src\include" "-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "/MDd" "/FIpostgres_pch.h" "/Yupostgres_pch.h" "/Fpsrc\backend\postgres_lib.a.p\postgres_pch.pch" "/nologo" "/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "-DBUILDING_DLL" "/FS" "/FdC:\cirrus\build\src\backend\postgres_lib.pdb" /Fosrc/backend/postgres_lib.a.p/utils_activity_pgstat_wal.c.obj "/c" ../src/backend/utils/activity/pgstat_wal.c [12:31:25.701] ../src/backend/utils/activity/pgstat_wal.c(433): error C2375: 'pg_estimate_lsn_at_time': redefinition; different linkage [12:31:25.701] c:\cirrus\build\src\include\utils/fmgrprotos.h(2906): note: see declaration of 'pg_estimate_lsn_at_time' [12:31:25.701] ../src/backend/utils/activity/pgstat_wal.c(434): error C2375: 'pg_estimate_time_at_lsn': redefinition; different linkage [12:31:25.701] c:\cirrus\build\src\include\utils/fmgrprotos.h(2905): note: see declaration of 'pg_estimate_time_at_lsn' [12:31:25.701] ../src/backend/utils/activity/pgstat_wal.c(435): error C2375: 'pg_lsntime_stream': redefinition; different linkage [12:31:25.858] c:\cirrus\build\src\include\utils/fmgrprotos.h(2904): note: see declaration of 'pg_lsntime_stream' > >> The patch lacks tests. > > I thought about this a bit. I wonder what kind of tests make sense. > > I could > 1) Add tests with the existing stats tests > (src/test/regress/sql/stats.sql) and just test that bgwriter is in > fact adding to the time stream. > > 2) Or should I add some infrastructure to be able to create an > LSNTimeStream and then insert values to it and do some validations of > what is added? I did a version of this but it is just much more > annoying with C & SQL than with python (where I tried out my > algorithms) [2]. I think just a test which calls functions and discards the result would greatly increase coverage. > On 29 Jun 2024, at 03:09, Melanie Plageman <melanieplage...@gmail.com> wrote: > change the user-facing functions for estimating an > LSN/time conversion to instead return a floor and a ceiling -- instead > of linearly interpolating a guess. This would be a way to keep users > from misunderstanding the accuracy of the functions to translate LSN > <-> time. I think this is a good idea. And it covers well “server restart problem”. If API just returns -inf as a boundary, caller can correctly interpret this situation. Thanks! Looking forward to more timely freezing. Best regards, Andrey Borodin.