Attached v7 changes the SQL-callable functions to return ranges of
LSNs and times covering the target time or LSN instead of linearly
interpolating an approximate answer.

I also changed the frequency and conditions under which the background
writer updates the global LSNTimeStream. There is now a dedicated
interval at which the LSNTimeStream is updated (instead of reusing the
log standby snapshot interval).

I also found that it is incorrect to set PgStartLSN to the insert LSN
in PostmasterMain(). The XLog buffer cache is not guaranteed to be
initialized in time. Instead of trying to provide an LSN lower bound
for locating times before those recorded on the global LSNTimeStream,
I simply return a lower bound of InvalidXLogRecPtr. Similarly, I
provide a lower bound of -infinity when locating LSNs before those
recorded on the global LSNTimeStream.

On Thu, Aug 1, 2024 at 3:55 AM Andrey M. Borodin <x4...@yandex-team.ru> wrote:
>
> > On 1 Aug 2024, at 05:44, Melanie Plageman <melanieplage...@gmail.com> wrote:
> >
> > On Sat, Jul 6, 2024 at 1:36 PM Andrey M. Borodin <x4...@yandex-team.ru> 
> > wrote:
> >>
> >> PgStartLSN = GetXLogInsertRecPtr();
> >> Should this be kind of RecoveryEndPtr? How is it expected to behave on 
> >> Standby in HA cluster, which was doing a crash recovery of 1y WALs in a 
> >> day, then is in startup for a year as a Hot Standby, and then is promoted?
> >
> > So, I don't think we will allow use of the LSNTimeStream on a standby,
> > since it is unclear what it would mean on a standby. For example, do
> > you want to know the time the LSN was generated or the time it was
> > replayed? Note that bgwriter won't insert values to the time stream on
> > a standby (it explicitly checks).
>
> Yes, I mentioned Standby because PgStartLSN is not what it says it is.

Right, I've found another way of dealing with this since PgStartLSN
was incorrect.

> > But, you bring up an issue that I don't quite know what to do about.
> > If the standby doesn't have an LSNTimeStream, then when it is
> > promoted, LSN <-> time conversions of LSNs and times before the
> > promotion seem impossible. Maybe if the stats file is getting written
> > out at checkpoints, we could restore from that previous primary's file
> > after promotion?
>
> I’m afraid that clocks of a Primary from previous timeline might be not in 
> sync with ours.
> It’s OK if it causes error, we just need to be prepared when they indicate 
> values from future. Perhaps, by shifting their last point to our “PgStartLSN”.

Regarding a standby being promoted. I plan to make a version of the
LSNTimeStream functions which works on a standby by using
getRecordTimestamp() and inserts an LSN from the last record replayed
and the associated timestamp. That should mean the LSNTimeStream on
the standby is roughly the same as the one on the primary since those
records were inserted on the primary.

As for time going backwards in general, I've now made it so that
values are only inserted if the times are monotonically increasing and
the LSN is the same or increasing. This should handle time going
backwards, either on the primary itself or after a standby is promoted
if the timeline wasn't a perfect match.

> > This brings up a second issue, which is that, currently, bgwriter
> > won't insert into the time stream when wal_level is minimal. I'm not
> > sure exactly how to resolve it because I am relying on the "last
> > important rec pointer" and the LOG_SNAPSHOT_INTERVAL_MS to throttle
> > when the bgwriter actually inserts new records into the LSNTimeStream.
> > I could come up with a different timeout interval for updating the
> > time stream. Perhaps I should do that?
>
> IDK. My knowledge of bgwriter is not enough to give a meaningful advise here.

See my note at top of the email.

> >> lsn_ts_calculate_error_area() is prone to overflow. Even int64 does not 
> >> seem capable to accommodate LSN*time. And the function may return negative 
> >> result, despite claiming area as a result. It’s intended, but a little 
> >> misleading.
> >
> > Ah, great point. I've fixed this.
>
> Well, not exactly. Result of lsn_ts_calculate_error_area() is still fabs()’ed 
> upon usage. I’d recommend to fabs in function.
> BTW lsn_ts_calculate_error_area() have no prototype.
>
> Also, I’m not a big fan of using IEEE 754 float in this function. This data 
> type have 24 bits of significand bits.
> Consider that current timestamp has 50 binary digits. Let’s assume realistic 
> LSNs have same 50 bits.
> Then our rounding error is 2^76 byte*microseconds.
> Let’s assume we are interested to measure time on a scale of 1GB WAL records.
> This gives us rounding error of 2^46 microseconds = 2^26 seconds = 64 million 
> seconds = 2 years.
> Seems like a gross error.
>
> If we use IEEE 754 doubles we have 53 significand bytes. And rounding error 
> will be on a scale of 128 microseconds per GB, which is acceptable.
>
> So I think double is better than float here.
>
> Nitpicking, but I’d prefer to sum up (triangle2 + triangle3 + rectangle_part) 
> before subtracting. This can save a bit of precision (smaller figures can 
> have lesser exponent).

Okay, thanks for the detail. See what you think about v7.

Some perf testing of bgwriter updates are still a todo. I was thinking
that it might be bad to take an exclusive lock on the WAL stats data
structure for the entire time I am inserting a new value to the
LSNTimeStream. I was thinking maybe I should take a share lock and
calculate which element to drop first and then take the exclusive
lock? Or maybe I should make a separate lock for just the stream
member of PgStat_WalStats. Maybe it isn't worth it? I'm not sure.

- Melanie
From 1e8bf7042e1c652c490f9ccd2940d200617cbfee Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Mon, 5 Aug 2024 20:33:15 -0400
Subject: [PATCH v7 2/4] Add global LSNTimeStream to PgStat_WalStats

Add a globally maintained instance of an LSNTimeStream to
PgStat_WalStats and a utility function to insert new values.
The WAL generation rate is meant to be used for statistical purposes, so
it makes sense for it to live in the WAL stats data structure.

Background writer is tasked with inserting new LSN, time pairs to the
global stream in its main loop at regular intervals. There is precedent
for background writer performing such tasks: bgwriter already
periodically logs snapshots into the WAL for the benefit of standbys.
---
 src/backend/postmaster/bgwriter.c       | 81 +++++++++++++++++++++----
 src/backend/utils/activity/pgstat_wal.c | 15 +++++
 src/include/pgstat.h                    |  5 ++
 3 files changed, 89 insertions(+), 12 deletions(-)

diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 0f75548759a..bb5e2d8ec5d 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -76,6 +76,21 @@ int			BgWriterDelay = 200;
 static TimestampTz last_snapshot_ts;
 static XLogRecPtr last_snapshot_lsn = InvalidXLogRecPtr;
 
+/*
+ * Interval at which new LSN, time pairs are added into the global
+ * LSNTimeStream, in milliseconds.
+ */
+#define LOG_STREAM_INTERVAL_MS 30000
+
+/*
+ * The timestamp at which we last checked whether or not to update the global
+ * LSNTimeStream.
+ */
+static TimestampTz last_stream_check_ts;
+
+/* The LSN we last updated the LSNTimeStream with */
+static XLogRecPtr last_stream_update_lsn = InvalidXLogRecPtr;
+
 
 /*
  * Main entry point for bgwriter process
@@ -119,6 +134,12 @@ BackgroundWriterMain(char *startup_data, size_t startup_data_len)
 	 */
 	last_snapshot_ts = GetCurrentTimestamp();
 
+	/* Insert an entry to the global LSNTimeStream as soon as we can. */
+	last_stream_check_ts = last_snapshot_ts;
+	last_stream_update_lsn = GetXLogInsertRecPtr();
+	pgstat_wal_update_lsntime_stream(last_stream_update_lsn,
+									 last_stream_check_ts);
+
 	/*
 	 * Create a memory context that we will do all our work in.  We do this so
 	 * that we can reset the context during error recovery and thereby avoid
@@ -269,26 +290,62 @@ BackgroundWriterMain(char *startup_data, size_t startup_data_len)
 		 * Checkpointer, when active, is barely ever in its mainloop and thus
 		 * makes it hard to log regularly.
 		 */
-		if (XLogStandbyInfoActive() && !RecoveryInProgress())
+
+		if (!RecoveryInProgress())
 		{
 			TimestampTz timeout = 0;
 			TimestampTz now = GetCurrentTimestamp();
 
-			timeout = TimestampTzPlusMilliseconds(last_snapshot_ts,
-												  LOG_SNAPSHOT_INTERVAL_MS);
+			if (XLogStandbyInfoActive())
+			{
+				timeout = TimestampTzPlusMilliseconds(last_snapshot_ts,
+													  LOG_SNAPSHOT_INTERVAL_MS);
+
+				/*
+				 * Only log if enough time has passed and interesting records
+				 * have been inserted since the last snapshot.  Have to
+				 * compare with <= instead of < because
+				 * GetLastImportantRecPtr() points at the start of a record,
+				 * whereas last_snapshot_lsn points just past the end of the
+				 * record.
+				 */
+				if (now >= timeout &&
+					last_snapshot_lsn <= GetLastImportantRecPtr())
+				{
+					last_snapshot_lsn = LogStandbySnapshot();
+					last_snapshot_ts = now;
+				}
+			}
+
+			timeout = TimestampTzPlusMilliseconds(last_stream_check_ts,
+												  LOG_STREAM_INTERVAL_MS);
 
 			/*
-			 * Only log if enough time has passed and interesting records have
-			 * been inserted since the last snapshot.  Have to compare with <=
-			 * instead of < because GetLastImportantRecPtr() points at the
-			 * start of a record, whereas last_snapshot_lsn points just past
-			 * the end of the record.
+			 * Periodically insert a new LSNTime into the global
+			 * LSNTimeStream. It makes sense for the background writer to
+			 * maintain the global LSNTimeStream because it runs regularly and
+			 * returns to its main loop frequently.
 			 */
-			if (now >= timeout &&
-				last_snapshot_lsn <= GetLastImportantRecPtr())
+			if (now >= timeout)
 			{
-				last_snapshot_lsn = LogStandbySnapshot();
-				last_snapshot_ts = now;
+				XLogRecPtr	insert_lsn = GetXLogInsertRecPtr();
+
+				Assert(insert_lsn != InvalidXLogRecPtr);
+
+				/*
+				 * We only insert an LSNTime if the LSN has changed since the
+				 * last update. This sacrifices accuracy on LSN -> time
+				 * conversions but saves space, which increases the accuracy
+				 * of time -> LSN conversions.
+				 */
+				if (insert_lsn > last_stream_update_lsn)
+				{
+					pgstat_wal_update_lsntime_stream(insert_lsn,
+													 now);
+					last_stream_update_lsn = insert_lsn;
+				}
+
+				last_stream_check_ts = now;
 			}
 		}
 
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 95ec65a51ff..1ce9060641c 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -369,6 +369,21 @@ lsntime_insert(LSNTimeStream *stream, TimestampTz time,
 }
 
 
+/*
+ * Utility function for inserting a new member into the LSNTimeStream member
+ * of WAL stats.
+ */
+void
+pgstat_wal_update_lsntime_stream(XLogRecPtr lsn, TimestampTz time)
+{
+	PgStatShared_Wal *stats_shmem = &pgStatLocal.shmem->wal;
+
+	LWLockAcquire(&stats_shmem->lock, LW_EXCLUSIVE);
+	lsntime_insert(&stats_shmem->stats.stream, time, lsn);
+	LWLockRelease(&stats_shmem->lock);
+}
+
+
 /*
  * Returns a range of LSNTimes starting at lower and ending at upper and
  * covering the target_time. If target_time is before the stream, lower will
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 13856e2bef3..43df60ce24c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -507,6 +507,7 @@ typedef struct PgStat_WalStats
 	PgStat_Counter wal_sync;
 	PgStat_Counter wal_write_time;
 	PgStat_Counter wal_sync_time;
+	LSNTimeStream stream;
 	TimestampTz stat_reset_timestamp;
 } PgStat_WalStats;
 
@@ -795,6 +796,10 @@ extern void time_bounds_for_lsn(const LSNTimeStream *stream,
 								XLogRecPtr target_lsn,
 								LSNTime *lower, LSNTime *upper);
 
+/* Helper for maintaining the global LSNTimeStream */
+extern void pgstat_wal_update_lsntime_stream(XLogRecPtr lsn,
+											 TimestampTz time);
+
 
 /*
  * Variables in pgstat.c
-- 
2.34.1

From eeb936816924b91a194ed03a4296b4f669e72071 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Wed, 7 Aug 2024 10:57:45 -0400
Subject: [PATCH v7 3/4] Add time <-> LSN translation range functions

Previous commits added a global LSNTimeStream, maintained by background
writer and functions to return a range of LSNs covering a time or time
covering an LSN.

Add SQL-callable functions to produce these ranges and a SQL-callable
function returning the entire LSNTimeStream.

This could be useful in combination with SQL-callable functions
accessing a page LSN to approximate the time of last modification of a
page or estimating the LSN consumption rate to moderate maintenance
processes and balance system resource utilization.
---
 doc/src/sgml/monitoring.sgml        |  75 +++++++++++++++++++
 src/backend/utils/adt/pgstatfuncs.c | 107 ++++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat     |  27 +++++++
 src/test/regress/expected/stats.out |  43 +++++++++++
 src/test/regress/sql/stats.sql      |  28 ++++++++
 5 files changed, 280 insertions(+)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 55417a6fa9d..9b63659900b 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3195,6 +3195,81 @@ description | Waiting for a newly initialized WAL file to reach durable storage
    </tgroup>
   </table>
 
+  <para>
+  In addition to these WAL stats, a stream of LSN-time pairs is accessible
+  via the functions shown in <xref linkend="functions-lsn-time-stream"/>.
+  </para>
+
+  <table id="functions-lsn-time-stream">
+   <title>LSN Time Stream Information Functions</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="func_table_entry"><para role="func_signature">
+       Function
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="func_table_entry"><para role="func_signature">
+       <indexterm>
+        <primary>pg_stat_lsn_bounds_for_time</primary>
+       </indexterm>
+       <function>pg_stat_lsn_bounds_for_time</function>
+       ( <type>timestamp with time zone</type> )
+       <returnvalue>record</returnvalue>
+       (<parameter>lsn</parameter> <type>pg_lsn</type>,
+       <parameter>lsn</parameter> <type>pg_lsn</type>)
+      </para>
+      <para>
+       Returns the upper and lower bound of the LSN range on the global
+       LSNTimeLine in which the time falls.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="func_table_entry"><para role="func_signature">
+       <indexterm>
+        <primary>pg_stat_time_bounds_for_lsn</primary>
+       </indexterm>
+       <function>pg_stat_time_bounds_for_lsn</function>
+       ( <type>pg_lsn</type> )
+       <returnvalue>record</returnvalue>
+       ( <parameter>time</parameter> <type>timestamp with time zone</type>,
+       <parameter>time</parameter> <type>timestamp with time zone</type>)
+      </para>
+      <para>
+       Returns the upper and lower bound of the time range on the global
+       LSNTimeLine in which the LSN falls.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="func_table_entry"><para role="func_signature">
+       <indexterm>
+        <primary>pg_stat_lsntime_stream</primary>
+       </indexterm>
+       <function>pg_stat_lsntime_stream</function> ()
+       <returnvalue>setof record</returnvalue>
+       ( <parameter>lsn</parameter> <type>pg_lsn</type>,
+       <parameter>time</parameter> <type>timestamp with time zone</type>)
+      </para>
+      <para>
+       Returns all of the LSN-time pairs currently in the global LSN time
+       stream.
+      </para></entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+
+
 </sect2>
 
  <sect2 id="monitoring-pg-stat-database-view">
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 32211371237..ac862fb679a 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -30,6 +30,7 @@
 #include "storage/procarray.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 #include "utils/timestamp.h"
 
 #define UINT32_ACCESS_ONCE(var)		 ((uint32)(*((volatile uint32 *)&(var))))
@@ -1526,6 +1527,112 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
 
+/*
+ * Returns the LSN, time pairs making up the global LSNTimeStream maintained
+ * in WAL statistics.
+ */
+Datum
+pg_stat_lsntime_stream(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_WalStats *stats;
+	LSNTimeStream *stream;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	stats = pgstat_fetch_stat_wal();
+	stream = &stats->stream;
+
+	for (size_t i = 0; i < stream->length; i++)
+	{
+		Datum		values[2] = {0};
+		bool		nulls[2] = {0};
+
+		values[0] = stream->data[i].lsn;
+		values[1] = stream->data[i].time;
+		tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+							 values, nulls);
+	}
+
+	return (Datum) 0;
+}
+
+/*
+ * Returns the upper and lower bounds of an LSN range covering the passed-in
+ * time. If the passed-in time is far enough in the past that we don't have
+ * data, the lower bound will be InvalidXLogRecPtr. If it is in the future,
+ * the upper bound will be FFFFFFFF/FFFFFFFF.
+ */
+Datum
+pg_stat_lsn_bounds_for_time(PG_FUNCTION_ARGS)
+{
+	PgStat_WalStats *wal_stats;
+	TimestampTz target_time;
+	LSNTime		lower,
+				upper;
+	TupleDesc	tupdesc;
+	Datum		values[2] = {0};
+	bool		nulls[2] = {0};
+
+	target_time = PG_GETARG_TIMESTAMPTZ(0);
+
+	tupdesc = CreateTemplateTupleDesc(2);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "lower",
+					   PG_LSNOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "upper",
+					   PG_LSNOID, -1, 0);
+	BlessTupleDesc(tupdesc);
+
+	wal_stats = pgstat_fetch_stat_wal();
+	lsn_bounds_for_time(&wal_stats->stream, target_time, &lower, &upper);
+
+	values[0] = lower.lsn;
+	values[1] = upper.lsn;
+
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc,
+													  values,
+													  nulls)));
+}
+
+
+/*
+ * Returns the upper and lower bounds of a TimestampTz range covering the
+ * passed-in LSN. If the passed-in LSN is far enough in the past that we don't
+ * have data, the lower bound will be -infinity. If the passed-in LSN is in
+ * the future, the upper bound will be infinity.
+ */
+Datum
+pg_stat_time_bounds_for_lsn(PG_FUNCTION_ARGS)
+{
+	PgStat_WalStats *wal_stats;
+	XLogRecPtr	target_lsn;
+	LSNTime		lower,
+				upper;
+	TupleDesc	tupdesc;
+	Datum		values[2] = {0};
+	bool		nulls[2] = {0};
+
+	target_lsn = PG_GETARG_LSN(0);
+
+	tupdesc = CreateTemplateTupleDesc(2);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "lower",
+					   TIMESTAMPTZOID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "upper",
+					   TIMESTAMPTZOID, -1, 0);
+	BlessTupleDesc(tupdesc);
+
+	wal_stats = pgstat_fetch_stat_wal();
+	time_bounds_for_lsn(&wal_stats->stream, target_lsn, &lower, &upper);
+
+	values[0] = lower.time;
+	values[1] = upper.time;
+
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc,
+													  values,
+													  nulls)));
+}
+
 /*
  * Returns statistics of SLRU caches.
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d36f6001bb1..c59f42bc974 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6375,6 +6375,33 @@
   prorettype => 'timestamptz', proargtypes => 'xid',
   prosrc => 'pg_xact_commit_timestamp' },
 
+{ oid => '9997', descr => 'get upper and lower time bounds for LSN',
+  proname => 'pg_stat_time_bounds_for_lsn', provolatile => 'v',
+  proisstrict => 't', proparallel => 'u',
+  prorettype => 'record', proargtypes => 'pg_lsn',
+  proallargtypes => '{pg_lsn,timestamptz,timestamptz}',
+  proargmodes => '{i,o,o}',
+  proargnames => '{target_lsn, lower, upper}',
+  prosrc => 'pg_stat_time_bounds_for_lsn' },
+
+{ oid => '9996', descr => 'get upper and lower LSN bounds for time',
+  proname => 'pg_stat_lsn_bounds_for_time', provolatile => 'v',
+  proisstrict => 't', proparallel => 'u',
+  prorettype => 'record', proargtypes => 'timestamptz',
+  proallargtypes => '{timestamptz,pg_lsn,pg_lsn}',
+  proargmodes => '{i,o,o}',
+  proargnames => '{target_time, lower, upper}',
+  prosrc => 'pg_stat_lsn_bounds_for_time' },
+
+{ oid => '9994',
+  descr => 'print the LSN Time Stream',
+  proname => 'pg_stat_lsntime_stream', prorows => '64',
+  provolatile => 'v', proparallel => 'u',
+  proretset => 't', prorettype => 'record',
+  proargtypes => '', proallargtypes => '{pg_lsn,timestamptz}',
+  proargmodes => '{o,o}', proargnames => '{lsn,time}',
+  prosrc => 'pg_stat_lsntime_stream' },
+
 { oid => '6168',
   descr => 'get commit timestamp and replication origin of a transaction',
   proname => 'pg_xact_commit_timestamp_origin', provolatile => 'v',
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 6e08898b183..5f32e3bd9e0 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -813,6 +813,49 @@ SELECT (n_tup_ins + n_tup_upd) > 0 AS has_data FROM pg_stat_all_tables
 -----
 -- Test that various stats views are being properly populated
 -----
+-- Test the functions querying the global LSNTimeStream stored in WAL stats.
+-- An LSN range covering a time 100 years in the past should be from 0 to a
+-- non-zero LSN (either the oldest LSN in the stream or the current insert
+-- LSN).
+SELECT lower = pg_lsn(0),
+       upper > pg_lsn(0)
+  FROM pg_stat_lsn_bounds_for_time(now() - make_interval(years=> 100));
+ ?column? | ?column? 
+----------+----------
+ t        | t
+(1 row)
+
+-- An LSN range covering a time 100 years in the future should be from roughly
+-- the current time to FFFFFFFF/FFFFFFFF (UINT64_MAX).
+SELECT lower > pg_lsn(0),
+       upper = pg_lsn('FFFFFFFF/FFFFFFFF')
+    FROM pg_stat_lsn_bounds_for_time(now() + make_interval(years=> 100));
+ ?column? | ?column? 
+----------+----------
+ t        | t
+(1 row)
+
+-- A TimestampTz range covering LSN 0 should be from -infinity to a positive
+-- time (either the oldest time in the stream or the current time).
+SELECT lower = timestamptz('-infinity'),
+       upper::time > 'allballs'::time
+    FROM pg_stat_time_bounds_for_lsn(pg_lsn(0));
+ ?column? | ?column? 
+----------+----------
+ t        | t
+(1 row)
+
+-- A TimestampTz range covering an LSN 1 GB in the future should be from
+-- roughly the current time to infinity.
+SELECT lower::time > 'allballs'::time,
+       upper = timestamptz('infinity')
+    FROM pg_stat_time_bounds_for_lsn(
+         pg_current_wal_insert_lsn() + 1000000000);
+ ?column? | ?column? 
+----------+----------
+ t        | t
+(1 row)
+
 -- Test that sessions is incremented when a new session is started in pg_stat_database
 SELECT sessions AS db_stat_sessions FROM pg_stat_database WHERE datname = (SELECT current_database()) \gset
 \c
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index d8ac0d06f48..0260779141c 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -411,6 +411,34 @@ SELECT (n_tup_ins + n_tup_upd) > 0 AS has_data FROM pg_stat_all_tables
 -- Test that various stats views are being properly populated
 -----
 
+-- Test the functions querying the global LSNTimeStream stored in WAL stats.
+
+-- An LSN range covering a time 100 years in the past should be from 0 to a
+-- non-zero LSN (either the oldest LSN in the stream or the current insert
+-- LSN).
+SELECT lower = pg_lsn(0),
+       upper > pg_lsn(0)
+  FROM pg_stat_lsn_bounds_for_time(now() - make_interval(years=> 100));
+
+-- An LSN range covering a time 100 years in the future should be from roughly
+-- the current time to FFFFFFFF/FFFFFFFF (UINT64_MAX).
+SELECT lower > pg_lsn(0),
+       upper = pg_lsn('FFFFFFFF/FFFFFFFF')
+    FROM pg_stat_lsn_bounds_for_time(now() + make_interval(years=> 100));
+
+-- A TimestampTz range covering LSN 0 should be from -infinity to a positive
+-- time (either the oldest time in the stream or the current time).
+SELECT lower = timestamptz('-infinity'),
+       upper::time > 'allballs'::time
+    FROM pg_stat_time_bounds_for_lsn(pg_lsn(0));
+
+-- A TimestampTz range covering an LSN 1 GB in the future should be from
+-- roughly the current time to infinity.
+SELECT lower::time > 'allballs'::time,
+       upper = timestamptz('infinity')
+    FROM pg_stat_time_bounds_for_lsn(
+         pg_current_wal_insert_lsn() + 1000000000);
+
 -- Test that sessions is incremented when a new session is started in pg_stat_database
 SELECT sessions AS db_stat_sessions FROM pg_stat_database WHERE datname = (SELECT current_database()) \gset
 \c
-- 
2.34.1

From 54d6baf71e0e73131bb03fe641fd9bdaddf18a93 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Mon, 5 Aug 2024 20:29:51 -0400
Subject: [PATCH v7 1/4] Add LSNTimeStream API for converting LSN <-> time

Add a new structure, LSNTimeStream, consisting of LSNTimes -- each an
LSN, time pair. This structure is intended to reflect the WAL generation
rate. It can be used to determine a time range in which an LSN was
inserted or an LSN range covering a particular time. These could be used
to interpolate a more specific point in the range.

It produces ranges and not specific time <-> LSN conversions because an
LSNTimeStream is lossy. An LSNTimeStream is fixed size, so when a new
LSNTime is inserted to a full LSNTimeStream, an LSNTime is dropped and
the new LSNTime is inserted. We drop the LSNTime whose absence would
cause the least error when interpolating between its adjoining points.

This commit does not add any instances of LSNTimeStream.
---
 src/backend/utils/activity/pgstat_wal.c | 414 ++++++++++++++++++++++++
 src/include/pgstat.h                    |  45 +++
 src/tools/pgindent/typedefs.list        |   2 +
 3 files changed, 461 insertions(+)

diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index e2a3f6b865c..95ec65a51ff 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -17,8 +17,11 @@
 
 #include "postgres.h"
 
+#include "access/xlog.h"
 #include "executor/instrument.h"
+#include "math.h"
 #include "utils/pgstat_internal.h"
+#include "utils/timestamp.h"
 
 
 PgStat_PendingWalStats PendingWalStats = {0};
@@ -31,6 +34,23 @@ PgStat_PendingWalStats PendingWalStats = {0};
  */
 static WalUsage prevWalUsage;
 
+static double lsn_ts_calculate_error_area(LSNTime *left,
+										  LSNTime *mid,
+										  LSNTime *right);
+static unsigned char lsntime_to_drop(LSNTimeStream *stream);
+static void lsntime_insert(LSNTimeStream *stream, TimestampTz time,
+						   XLogRecPtr lsn);
+
+static void stream_get_bounds_for_lsn(const LSNTimeStream *stream,
+									  XLogRecPtr target_lsn,
+									  LSNTime *lower,
+									  LSNTime *upper);
+
+static void stream_get_bounds_for_time(const LSNTimeStream *stream,
+									   TimestampTz target_time,
+									   LSNTime *lower,
+									   LSNTime *upper);
+
 
 /*
  * Calculate how much WAL usage counters have increased and update
@@ -192,3 +212,397 @@ pgstat_wal_snapshot_cb(void)
 		   sizeof(pgStatLocal.snapshot.wal));
 	LWLockRelease(&stats_shmem->lock);
 }
+
+/*
+ * Given three LSNTimes, calculate the area of the triangle they form were
+ * they plotted with time on the X axis and LSN on the Y axis. An
+ * illustration:
+ *
+ *   LSN
+ *    |
+ *    |                                                         * right
+ *    |
+ *    |
+ *    |
+ *    |                                                * mid    * C
+ *    |
+ *    |
+ *    |
+ *    |  * left                                        * B      * A
+ *    |
+ *    +------------------------------------------------------------------
+ *
+ * The area of the triangle with vertices (left, mid, right) is the error
+ * incurred over the interval [left, right] were we to interpolate with just
+ * [left, right] rather than [left, mid] and [mid, right].
+ */
+static double
+lsn_ts_calculate_error_area(LSNTime *left, LSNTime *mid, LSNTime *right)
+{
+	double		left_time = left->time,
+				left_lsn = left->lsn;
+	double		mid_time = mid->time,
+				mid_lsn = mid->lsn;
+	double		right_time = right->time,
+				right_lsn = right->lsn;
+	double		rectangle_all,
+				triangle1,
+				triangle2,
+				triangle3,
+				rectangle_part,
+				area_to_subtract;
+
+	/* Area of the rectangle with opposing corners left and right */
+	rectangle_all = (right_time - left_time) * (right_lsn - left_lsn);
+
+	/* Area of the right triangle with vertices left, right, and A */
+	triangle1 = rectangle_all / 2;
+
+	/* Area of the right triangle with vertices left, mid, and B */
+	triangle2 = (mid_lsn - left_lsn) * (mid_time - left_time) / 2;
+
+	/* Area of the right triangle with vertices mid, right, and C */
+	triangle3 = (right_lsn - mid_lsn) * (right_time - mid_time) / 2;
+
+	/* Area of the rectangle with vertices mid, A, B, and C */
+	rectangle_part = (right_lsn - mid_lsn) * (mid_time - left_time);
+
+	/* Sum up the area to subtract first to produce a more precise answer */
+	area_to_subtract = triangle2 + triangle3 + rectangle_part;
+
+	/* Area of the triangle with vertices left, mid, and right */
+	return fabs(triangle1 - area_to_subtract);
+}
+
+/*
+ * Determine which LSNTime to drop from a full LSNTimeStream.
+ * Drop the LSNTime whose absence would introduce the least error into future
+ * linear interpolation on the stream.
+ *
+ * We determine the error that would be introduced by dropping a point on the
+ * stream by calculating the area of the triangle formed by the LSNTime and
+ * its adjacent LSNTimes. We do this for each LSNTime in the stream (except
+ * for the first and last LSNTimes) and choose the LSNTime with the smallest
+ * error (area).
+ *
+ * We avoid extrapolation by never dropping the first or last points.
+ */
+static unsigned char
+lsntime_to_drop(LSNTimeStream *stream)
+{
+	double		min_area;
+	unsigned char target_point;
+
+	/* Don't drop points if free spots available are available */
+	Assert(stream->length == LSNTIMESTREAM_VOLUME);
+	StaticAssertStmt(LSNTIMESTREAM_VOLUME >= 3, "LSNTIMESTREAM_VOLUME < 3");
+
+	min_area = lsn_ts_calculate_error_area(&stream->data[0],
+										   &stream->data[1],
+										   &stream->data[2]);
+
+	target_point = 1;
+
+	for (size_t i = 2; i < stream->length - 1; i++)
+	{
+		LSNTime    *left = &stream->data[i - 1];
+		LSNTime    *mid = &stream->data[i];
+		LSNTime    *right = &stream->data[i + 1];
+		double		area = lsn_ts_calculate_error_area(left, mid, right);
+
+		if (area < min_area)
+		{
+			min_area = area;
+			target_point = i;
+		}
+	}
+
+	return target_point;
+}
+
+/*
+ * Insert a new LSNTime into the LSNTimeStream in the first available element.
+ * If there are no empty elements, drop an LSNTime from the stream to make
+ * room for the new LSNTime.
+ */
+static void
+lsntime_insert(LSNTimeStream *stream, TimestampTz time,
+			   XLogRecPtr lsn)
+{
+	unsigned char drop;
+	LSNTime		entrant = {.lsn = lsn,.time = time};
+
+	if (stream->length < LSNTIMESTREAM_VOLUME)
+	{
+		/*
+		 * Time must move forward on the stream. If the clock moves backwards,
+		 * for example in an NTP correction, we'll just skip inserting this
+		 * LSNTime.
+		 *
+		 * Translating LSN <-> time is most meaningful if the LSNTimeStream
+		 * entries are the position of a single location in the WAL over time.
+		 * Though time must monotonically increase, it is valid to insert
+		 * multiple LSNTimes with the same LSN. Imagine a period of time in
+		 * which no new WAL records are inserted.
+		 */
+		if (stream->length > 0 &&
+			(time <= stream->data[stream->length - 1].time ||
+			 lsn < stream->data[stream->length - 1].lsn))
+		{
+			ereport(WARNING,
+					errmsg("Won't insert non-monotonic \"%lu, %s\" to LSNTimeStream.",
+						   lsn, timestamptz_to_str(time)));
+			return;
+		}
+
+		stream->data[stream->length++] = entrant;
+		return;
+	}
+
+	drop = lsntime_to_drop(stream);
+
+	memmove(&stream->data[drop],
+			&stream->data[drop + 1],
+			sizeof(LSNTime) * (stream->length - 1 - drop));
+
+	stream->data[stream->length - 1] = entrant;
+}
+
+
+/*
+ * Returns a range of LSNTimes starting at lower and ending at upper and
+ * covering the target_time. If target_time is before the stream, lower will
+ * contain the minimum values for the datatypes. If target_time is newer than
+ * the stream, upper will contain the maximum values for the datatypes.
+ */
+static void
+stream_get_bounds_for_time(const LSNTimeStream *stream,
+						   TimestampTz target_time,
+						   LSNTime *lower,
+						   LSNTime *upper)
+{
+	Assert(lower && upper);
+
+	/*
+	 * If the target_time is "off the stream" -- either the stream has no
+	 * members or the target_time is older than all values in the stream or
+	 * newer than all values -- the lower and/or upper bounds may be the min
+	 * or max value for the datatypes, respectively.
+	 */
+	*lower = LSNTIME_INIT(InvalidXLogRecPtr, INT64_MIN);
+	*upper = LSNTIME_INIT(UINT64_MAX, INT64_MAX);
+
+	/*
+	 * If the LSNTimeStream has no members, it provides no information about
+	 * the range.
+	 */
+	if (stream->length == 0)
+	{
+		elog(DEBUG1,
+			 "Attempt to identify LSN bounds for time: \"%s\" using empty LSNTimeStream.",
+			 timestamptz_to_str(target_time));
+		return;
+	}
+
+	/*
+	 * If the target_time is older than the stream, the oldest member in the
+	 * stream is our upper bound.
+	 */
+	if (target_time <= stream->data[0].time)
+	{
+		*upper = stream->data[0];
+		if (target_time == stream->data[0].time)
+			*lower = stream->data[0];
+		return;
+	}
+
+	/*
+	 * Loop through the stream and stop at the first LSNTime newer than or
+	 * equal to our target time. Skip the first LSNTime, as we know it is
+	 * older than our target time.
+	 */
+	for (size_t i = 1; i < stream->length; i++)
+	{
+		if (target_time == stream->data[i].time)
+		{
+			*lower = stream->data[i];
+			*upper = stream->data[i];
+			return;
+		}
+
+		if (target_time < stream->data[i].time)
+		{
+			/* Time must increase monotonically on the stream. */
+			Assert(stream->data[i - 1].time <
+				   stream->data[i].time);
+			*lower = stream->data[i - 1];
+			*upper = stream->data[i];
+			return;
+		}
+	}
+
+	/*
+	 * target_time is newer than the stream, so the newest member in the
+	 * stream is our lower bound.
+	 */
+	*lower = stream->data[stream->length - 1];
+}
+
+/*
+ * Try to find an upper and lower bound for the possible LSN values at the
+ * provided target_time. If the target_time doesn't fall on the provided
+ * LSNTimeStream, we compare the target_time to the current time and see if we
+ * can fill in a missing boundary. Note that we do not consult the
+ * current time if the target_time fell on the stream -- even if doing so
+ * might provide a tighter range.
+ */
+void
+lsn_bounds_for_time(const LSNTimeStream *stream, TimestampTz target_time,
+					LSNTime *lower, LSNTime *upper)
+{
+	TimestampTz current_time;
+	XLogRecPtr	current_lsn;
+
+	stream_get_bounds_for_time(stream, target_time, lower, upper);
+
+	/*
+	 * We found valid upper and lower bounds for target_time, so we're done.
+	 */
+	if (lower->lsn != InvalidXLogRecPtr && upper->lsn != UINT64_MAX)
+		return;
+
+	/*
+	 * The target_time was either off the stream or the stream has no members.
+	 * In either case, see if we can use the current time and LSN to provide
+	 * one (or both) of the bounds.
+	 */
+	current_time = GetCurrentTimestamp();
+	current_lsn = GetXLogInsertRecPtr();
+
+	if (lower->lsn == InvalidXLogRecPtr && target_time >= current_time)
+		*lower = LSNTIME_INIT(current_lsn, current_time);
+
+	if (upper->lsn == UINT64_MAX && target_time <= current_time)
+		*upper = LSNTIME_INIT(current_lsn, current_time);
+
+	Assert(upper->lsn >= lower->lsn);
+}
+
+/*
+ * Returns a range of LSNTimes starting at lower and ending at upper and
+ * covering the target_lsn. If target_lsn is before the stream, lower will
+ * contain the minimum values for the datatypes. If target_time is newer than
+ * the stream, upper will contain the maximum values for the datatypes.
+ */
+static void
+stream_get_bounds_for_lsn(const LSNTimeStream *stream,
+						  XLogRecPtr target_lsn,
+						  LSNTime *lower,
+						  LSNTime *upper)
+{
+	Assert(lower && upper);
+
+	/*
+	 * If the target_time is "off the stream" -- either the stream has no
+	 * members or the target_time is older than all values in the stream or
+	 * newer than all values -- the lower and/or upper bounds may be the min
+	 * or max value for the datatypes, respectively.
+	 */
+	*lower = LSNTIME_INIT(InvalidXLogRecPtr, INT64_MIN);
+	*upper = LSNTIME_INIT(UINT64_MAX, INT64_MAX);
+
+	/*
+	 * If the LSNTimeStream has no members, it provides no information about
+	 * the range.
+	 */
+	if (stream->length == 0)
+	{
+		elog(DEBUG1,
+			 "Attempt to identify time bounds for LSN: \"%lu\" using empty LSNTimeStream.",
+			 target_lsn);
+		return;
+	}
+
+	/*
+	 * If the target_lsn is older than the stream, the oldest member in the
+	 * stream is our upper bound.
+	 */
+	if (target_lsn <= stream->data[0].lsn)
+	{
+		*upper = stream->data[0];
+		if (target_lsn == stream->data[0].lsn)
+			*lower = stream->data[0];
+		return;
+	}
+
+	/*
+	 * Loop through the stream and stop at the first LSNTime newer than or
+	 * equal to our target time. Skip the first LSNTime, as we know it is
+	 * older than our target time.
+	 */
+	for (size_t i = 1; i < stream->length; i++)
+	{
+		if (target_lsn == stream->data[i].lsn)
+		{
+			*lower = stream->data[i - 1];
+			*upper = stream->data[i];
+			return;
+		}
+
+		if (target_lsn < stream->data[i].lsn)
+		{
+			/* LSNs must not decrease on the stream. */
+			Assert(stream->data[i - 1].lsn <=
+				   stream->data[i].lsn);
+			*lower = stream->data[i - 1];
+			*upper = stream->data[i];
+			return;
+		}
+	}
+
+	/*
+	 * target_lsn is newer than the stream, so the newest member in the stream
+	 * is our lower bound.
+	 */
+	*lower = stream->data[stream->length - 1];
+}
+
+/*
+ * Try to find an upper and lower bound for the possible times covering the
+ * provided target_lsn. If the target_lsn doesn't fall on the provided
+ * LSNTimeStream, we compare the target_lsn to the current insert LSN and see
+ * if we can fill in a missing boundary. Note that we do not consult
+ * the current insert LSN if the target_lsn fell on the stream -- even if
+ * doing so might provide a tighter range.
+ */
+void
+time_bounds_for_lsn(const LSNTimeStream *stream, XLogRecPtr target_lsn,
+					LSNTime *lower, LSNTime *upper)
+{
+	TimestampTz current_time;
+	XLogRecPtr	current_lsn;
+
+	stream_get_bounds_for_lsn(stream, target_lsn, lower, upper);
+
+	/*
+	 * We found valid upper and lower bounds for target_lsn, so we're done.
+	 */
+	if (lower->time != INT64_MIN && upper->time != INT64_MAX)
+		return;
+
+	/*
+	 * The target_lsn was either off the stream or the stream has no members.
+	 * In either case, see if we can use the current time and LSN to provide
+	 * one (or both) of the bounds.
+	 */
+	current_time = GetCurrentTimestamp();
+	current_lsn = GetXLogInsertRecPtr();
+
+	if (lower->time == INT64_MIN && target_lsn >= current_lsn)
+		*lower = LSNTIME_INIT(current_lsn, current_time);
+
+	if (upper->time == INT64_MAX && target_lsn <= current_lsn)
+		*upper = LSNTIME_INIT(current_lsn, current_time);
+
+	Assert(upper->time >= lower->time);
+}
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f63159c55ca..13856e2bef3 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -458,6 +458,45 @@ typedef struct PgStat_StatTabEntry
 	PgStat_Counter autoanalyze_count;
 } PgStat_StatTabEntry;
 
+/*
+ * The elements of an LSNTimeStream. For the LSNTimeStream to be meaningful,
+ * the lsn should be a consistent position in the WAL over time (e.g. the
+ * insert LSN at each time in the stream or the flush LSN at each time).
+ */
+typedef struct LSNTime
+{
+	TimestampTz time;
+	XLogRecPtr	lsn;
+} LSNTime;
+
+/*
+ * Convenience macro returning an LSNTime with the time and LSN set to the
+ * passed in values.
+ */
+#define LSNTIME_INIT(i_lsn, i_time) \
+	((LSNTime) { .lsn = (i_lsn), .time = (i_time) })
+
+#define LSNTIMESTREAM_VOLUME 64
+
+/*
+ * An LSN time stream is an array consisting of LSNTimes from least to most
+ * recent. The array is filled before any element is dropped. Once the
+ * LSNTimeStream length == volume (the array is full), an LSNTime is dropped,
+ * the subsequent LSNTimes are moved down by 1, and the new LSNTime is
+ * inserted at the tail.
+ *
+ * When dropping an LSNTime, we attempt to pick the member which would
+ * introduce the least error into the stream. See lsntime_to_drop() for more
+ * details.
+ *
+ * Use the stream for LSN <-> time conversions.
+ */
+typedef struct LSNTimeStream
+{
+	unsigned char length;
+	LSNTime		data[LSNTIMESTREAM_VOLUME];
+} LSNTimeStream;
+
 typedef struct PgStat_WalStats
 {
 	PgStat_Counter wal_records;
@@ -749,6 +788,12 @@ extern void pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_
 
 extern void pgstat_report_wal(bool force);
 extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
+extern void lsn_bounds_for_time(const LSNTimeStream *stream,
+								TimestampTz target_time,
+								LSNTime *lower, LSNTime *upper);
+extern void time_bounds_for_lsn(const LSNTimeStream *stream,
+								XLogRecPtr target_lsn,
+								LSNTime *lower, LSNTime *upper);
 
 
 /*
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 547d14b3e7c..c8d84122976 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1587,6 +1587,8 @@ LogicalTapeSet
 LsnReadQueue
 LsnReadQueueNextFun
 LsnReadQueueNextStatus
+LSNTime
+LSNTimeStream
 LtreeGistOptions
 LtreeSignature
 MAGIC
-- 
2.34.1

Reply via email to