Thanks so much for reviewing! On Fri, Feb 16, 2024 at 3:41 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > > When I first read this, I immediately started wondering if this might > use the commit timestamp stuff we already have. Because for each commit > we already store the LSN and commit timestamp, right? But I'm not sure > that would be a good match - the commit_ts serves a very special purpose > of mapping XID => (LSN, timestamp), I don't see how to make it work for > (LSN=>timestmap) and (timestamp=>LSN) very easily.
I took a look at the code in commit_ts.c, and I really don't see a way of reusing any of this commit<->timestamp infrastructure for timestamp<->LSN mappings. > As for the inner workings of the patch, my understanding is this: > > - "LSNTimeline" consists of "LSNTime" entries representing (LSN,ts) > points, but those points are really "buckets" that grow larger and > larger for older periods of time. Yes, they are buckets in the sense that they represent multiple values but each contains a single LSNTime value which is the minimum of all the LSNTimes we "merged" into that single array element. In order to represent a range of time, you need to use two array elements. The linear interpolation from time <-> LSN is all done with two elements. > - AFAIK each entry represent an interval of time, and the next (older) > interval is twice as long, right? So the first interval is 1 second, > then 2 seconds, 4 seconds, 8 seconds, ... > > - But I don't understand how the LSNTimeline entries are "aging" and get > less accurate, while the "current" bucket is short. lsntime_insert() > seems to simply move to the next entry, but doesn't that mean we insert > the entries into larger and larger buckets? Because the earlier array elements can represent fewer logical members than later ones and because elements are merged into the next element when space runs out, later array elements will contain older data and more of it, so those "ranges" will be larger. But, after thinking about it and also reading your feedback, I realized my algorithm was silly because it starts merging logical members before it has even used the whole array. The attached v3 has a new algorithm. Now, LSNTimes are added from the end of the array forward until all array elements have at least one logical member (array length == volume). Once array length == volume, new LSNTimes will result in merging logical members in existing elements. We want to merge older members because those can be less precise. So, the number of logical members per array element will always monotonically increase starting from the beginning of the array (which contains the most recent data) and going to the end. We want to use all the available space in the array. That means that each LSNTime insertion will always result in a single merge. We want the timeline to be inclusive of the oldest data, so merging means taking the smaller value of two LSNTime values. I had to pick a rule for choosing which elements to merge. So, I choose the merge target as the oldest element whose logical membership is < 2x its predecessor. I merge the merge target's predecessor into the merge target. Then I move all of the intervening elements down 1. Then I insert the new LSNTime at index 0. > - The comments never really spell what amount of time the entries cover > / how granular it is. My understanding is it's simply measured in number > of entries added, which is assumed to be constant and drive by > bgwriter_delay, right? Which is 200ms by default. Which seems fine, but > isn't the hibernation (HIBERNATE_FACTOR) going to mess with it? > > Is there some case where bgwriter would just loop without sleeping, > filling the timeline much faster? (I can't think of any, but ...) bgwriter will wake up when there are buffers to flush, which is likely correlated with there being new LSNs. So, actually it seems like it might work well to rely on only filling the timeline when there are things for bgwriter to do. > - The LSNTimeline comment claims an array of size 64 is large enough to > not need to care about filling it, but maybe it should briefly explain > why we can never fill it (I guess 2^64 is just too many). The new structure fits a different number of members. I have yet to calculate that number, but it should be explained in the comments once I do. For example, if we made an LSNTimeline with volume 4, once every element had one LSNTime and we needed to start merging, the following is how many logical members each element would have after each of four merges: 1111 1112 1122 1114 1124 So, if we store the number of members as an unsigned 64-bit int and we have an LSNTimeline with volume 64, what is the maximum number of members can we store if we hold all of the invariants described in my algorithm above (we only merge when required, every element holds < 2x the number of logical members as its predecessor, we do exactly one merge every insertion [when required], membership must monotonically increase [choose the oldest element meeting the criteria when deciding what to merge])? > - I don't quite understand why 0005 adds the functions to pageinspect. > This has nothing to do with pages, right? You're right. I just couldn't think of a good place to put the functions. In version 3, I just put the SQL functions in pgstat_wal.c and made them generally available (i.e. not in a contrib module). I haven't added docs back yet. But perhaps a section near the docs describing pg_xact_commit_timestamp() [1]? I wasn't sure if I should put the SQL function source code in pgstatfuncs.c -- I kind of prefer it in pgstat_wal.c but there are no other SQL functions there. > - Not sure why we need 0001. Just so that the "estimate" functions in > 0002 have a convenient "start" point? Surely we could look at the > current LSNTimeline data and use the oldest value, or (if there's no > data) use the current timestamp/LSN? When there are 0 or 1 entries in the timeline you'll get an answer that could be very off if you just return the current timestamp or LSN. I guess that is okay? > - I wonder what happens if we lose the data - we know that if people > reset statistics for whatever reason (or just lose them because of a > crash, or because they're on a replica), bad things happen to > autovacuum. What's the (expected) impact on pruning? This is an important question. Because stats aren't crashsafe, we could return very inaccurate numbers for some time/LSN values if we crash. I don't actually know what we could do about that. When I use the LSNTimeline for the freeze heuristic it is less of an issue because the freeze heuristic has a fallback strategy when there aren't enough stats to do its calculations. But other users of this LSNTimeline will simply get highly inaccurate results (I think?). Is there anything we could do about this? It seems bad. Andres had brought up something at some point about, what if the database is simply turned off for awhile and then turned back on. Even if you cleanly shut down, will there be "gaps" in the timeline? I think that could be okay, but it is something to think about. > - What about a SRF function that outputs the whole LSNTimeline? Would be > useful for debugging / development, I think. (Just a suggestion). Good idea! I've added this. Though, maybe there was a simpler way to implement than I did. Just a note, all of my comments could use a lot of work, but I want to get consensus on the algorithm before I make sure and write about it in a perfect way. - Melanie [1] https://www.postgresql.org/docs/devel/functions-info.html#FUNCTIONS-INFO-COMMIT-TIMESTAMP
From cf9e6f507bc9781bf79e8c39766c8e84209d2ada Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Wed, 21 Feb 2024 20:06:29 -0500 Subject: [PATCH v3 5/5] Add time <-> LSN translation functions Previous commits added a global LSNTimeline, maintained by background writer, that allows approximate translations between time and LSNs. Add SQL-callable functions to convert from LSN to time and back and a SQL-callable function returning the entire LSNTimeline. This could be useful in combination with SQL-callable functions accessing a page LSN to approximate the time of last modification of a page or estimating the LSN consumption rate to moderate maintenance processes and balance system resource utilization. --- src/backend/utils/activity/pgstat_wal.c | 58 +++++++++++++++++++++++++ src/include/catalog/pg_proc.dat | 22 ++++++++++ 2 files changed, 80 insertions(+) diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c index df4c91ee3cf..27f2b23cd88 100644 --- a/src/backend/utils/activity/pgstat_wal.c +++ b/src/backend/utils/activity/pgstat_wal.c @@ -18,6 +18,7 @@ #include "postgres.h" #include "access/xlog.h" +#include "funcapi.h" #include "utils/pgstat_internal.h" #include "executor/instrument.h" #include "utils/builtins.h" @@ -418,3 +419,60 @@ pgstat_wal_update_lsntimeline(TimestampTz time, XLogRecPtr lsn) lsntime_insert(&stats_shmem->stats.timeline, time, lsn); LWLockRelease(&stats_shmem->lock); } + +PG_FUNCTION_INFO_V1(pg_estimate_lsn_at_time); +PG_FUNCTION_INFO_V1(pg_estimate_time_at_lsn); +PG_FUNCTION_INFO_V1(pg_lsntimeline); + +Datum +pg_estimate_time_at_lsn(PG_FUNCTION_ARGS) +{ + XLogRecPtr lsn = PG_GETARG_LSN(0); + PgStatShared_Wal *stats_shmem = &pgStatLocal.shmem->wal; + TimestampTz result; + + LWLockAcquire(&stats_shmem->lock, LW_SHARED); + result = estimate_time_at_lsn(&stats_shmem->stats.timeline, lsn); + LWLockRelease(&stats_shmem->lock); + + PG_RETURN_TIMESTAMPTZ(result); +} + +Datum +pg_estimate_lsn_at_time(PG_FUNCTION_ARGS) +{ + PgStatShared_Wal *stats_shmem = &pgStatLocal.shmem->wal; + TimestampTz time = PG_GETARG_TIMESTAMPTZ(0); + XLogRecPtr result; + + LWLockAcquire(&stats_shmem->lock, LW_SHARED); + result = estimate_lsn_at_time(&stats_shmem->stats.timeline, time); + LWLockRelease(&stats_shmem->lock); + + PG_RETURN_LSN(result); +} + +Datum +pg_lsntimeline(PG_FUNCTION_ARGS) +{ + ReturnSetInfo *rsinfo; + LSNTimeline *timeline; + PgStatShared_Wal *stats_shmem = &pgStatLocal.shmem->wal; + + InitMaterializedSRF(fcinfo, 0); + rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; + LWLockAcquire(&stats_shmem->lock, LW_SHARED); + timeline = &stats_shmem->stats.timeline; + for (int i = LSNTIMELINE_VOLUME - timeline->length; i < LSNTIMELINE_VOLUME; i++) + { + Datum values[2] = {0}; + bool nulls[2] = {0}; + + values[0] = timeline->data[i].time; + values[1] = timeline->data[i].lsn; + tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, + values, nulls); + } + LWLockRelease(&stats_shmem->lock); + return (Datum) 0; +} diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 9c120fc2b7f..e69cf9c2437 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -6326,6 +6326,28 @@ prorettype => 'timestamptz', proargtypes => 'xid', prosrc => 'pg_xact_commit_timestamp' }, +{ oid => '9997', + descr => 'get approximate LSN at a particular point in time', + proname => 'pg_estimate_lsn_at_time', provolatile => 'v', + prorettype => 'pg_lsn', proargtypes => 'timestamptz', + prosrc => 'pg_estimate_lsn_at_time' }, + +{ oid => '9996', + descr => 'get approximate time at a particular LSN', + proname => 'pg_estimate_time_at_lsn', provolatile => 'v', + prorettype => 'timestamptz', proargtypes => 'pg_lsn', + prosrc => 'pg_estimate_time_at_lsn' }, + +{ oid => '9994', + descr => 'print the LSN timeline', + proname => 'pg_lsntimeline', prorows => '64', + proretset => 't', provolatile => 'v', proparallel => 's', + prorettype => 'record', proargtypes => '', + proallargtypes => '{timestamptz,pg_lsn}', + proargmodes => '{o,o}', + proargnames => '{time, lsn}', + prosrc => 'pg_lsntimeline' }, + { oid => '6168', descr => 'get commit timestamp and replication origin of a transaction', proname => 'pg_xact_commit_timestamp_origin', provolatile => 'v', -- 2.37.2
From 6fd317a319de144a7a999fc6230268521e72a36f Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Wed, 21 Feb 2024 20:28:27 -0500 Subject: [PATCH v3 3/5] Add LSNTimeline to PgStat_WalStats Add a globally maintained instance of the new LSNTimeline to PgStat_WalStats and a utility function to insert new values. --- src/backend/utils/activity/pgstat_wal.c | 10 ++++++++++ src/include/pgstat.h | 4 ++++ 2 files changed, 14 insertions(+) diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c index 96e84319f6f..df4c91ee3cf 100644 --- a/src/backend/utils/activity/pgstat_wal.c +++ b/src/backend/utils/activity/pgstat_wal.c @@ -408,3 +408,13 @@ stop: result = (double) (lsn - start.lsn) / lsns_elapsed * time_elapsed + start.time; return Max(result, 0); } + +void +pgstat_wal_update_lsntimeline(TimestampTz time, XLogRecPtr lsn) +{ + PgStatShared_Wal *stats_shmem = &pgStatLocal.shmem->wal; + + LWLockAcquire(&stats_shmem->lock, LW_EXCLUSIVE); + lsntime_insert(&stats_shmem->stats.timeline, time, lsn); + LWLockRelease(&stats_shmem->lock); +} diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 1926ddb00ed..8a63f56fdd3 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -482,6 +482,7 @@ typedef struct PgStat_WalStats PgStat_Counter wal_sync; PgStat_Counter wal_write_time; PgStat_Counter wal_sync_time; + LSNTimeline timeline; TimestampTz stat_reset_timestamp; } PgStat_WalStats; @@ -764,6 +765,9 @@ extern void pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_ extern void pgstat_report_wal(bool force); extern PgStat_WalStats *pgstat_fetch_stat_wal(void); +/* Helpers for maintaining the LSNTimeline */ +extern void pgstat_wal_update_lsntimeline(TimestampTz time, XLogRecPtr lsn); + /* * Variables in pgstat.c -- 2.37.2
From 1348109617ce772835336ea8e8a9781407f6060d Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Tue, 5 Dec 2023 07:29:39 -0500 Subject: [PATCH v3 1/5] Record LSN at postmaster startup The insert_lsn at postmaster startup can be used along with PgStartTime as seed values for a timeline mapping LSNs to time. Future commits will add such a structure for LSN <-> time conversions. A start LSN allows for such conversions before even inserting a value into the timeline. The current time and current insert LSN can be used along with PgStartTime and PgStartLSN. This is WIP, as I'm not sure if I did this in the right place. --- src/backend/access/transam/xlog.c | 2 ++ src/backend/postmaster/postmaster.c | 1 + src/include/utils/builtins.h | 3 +++ 3 files changed, 6 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index c1162d55bff..3fea9f4c31f 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -146,6 +146,8 @@ bool XLOG_DEBUG = false; int wal_segment_size = DEFAULT_XLOG_SEG_SIZE; +XLogRecPtr PgStartLSN = InvalidXLogRecPtr; + /* * Number of WAL insertion locks to use. A higher value allows more insertions * to happen concurrently, but adds some CPU overhead to flushing the WAL, diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index df945a5ac4d..9e5cad60549 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -1440,6 +1440,7 @@ PostmasterMain(int argc, char *argv[]) * Remember postmaster startup time */ PgStartTime = GetCurrentTimestamp(); + PgStartLSN = GetXLogInsertRecPtr(); /* * Report postmaster status in the postmaster.pid file, to allow pg_ctl to diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h index 359c570f23e..16a7a058bc7 100644 --- a/src/include/utils/builtins.h +++ b/src/include/utils/builtins.h @@ -17,6 +17,7 @@ #include "fmgr.h" #include "nodes/nodes.h" #include "utils/fmgrprotos.h" +#include "access/xlogdefs.h" /* Sign + the most decimal digits an 8-byte number could have */ #define MAXINT8LEN 20 @@ -85,6 +86,8 @@ extern void generate_operator_clause(fmStringInfo buf, Oid opoid, const char *rightop, Oid rightoptype); +extern PGDLLIMPORT XLogRecPtr PgStartLSN; + /* varchar.c */ extern int bpchartruelen(char *s, int len); -- 2.37.2
From 6204555c6d59035cba9f850c676d0b1f530c6f43 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Wed, 27 Dec 2023 16:32:40 -0500 Subject: [PATCH v3 4/5] Bgwriter maintains global LSNTimeline Insert new LSN, time pairs to the global LSNTimeline stored in PgStat_WalStats in the background writer's main loop. This ensures that new values are added to the timeline in a regular manner. --- src/backend/postmaster/bgwriter.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c index 6364b16261f..4b4d5db60bb 100644 --- a/src/backend/postmaster/bgwriter.c +++ b/src/backend/postmaster/bgwriter.c @@ -272,6 +272,7 @@ BackgroundWriterMain(void) { TimestampTz timeout = 0; TimestampTz now = GetCurrentTimestamp(); + XLogRecPtr current_lsn = GetLastImportantRecPtr(); timeout = TimestampTzPlusMilliseconds(last_snapshot_ts, LOG_SNAPSHOT_INTERVAL_MS); @@ -284,10 +285,11 @@ BackgroundWriterMain(void) * the end of the record. */ if (now >= timeout && - last_snapshot_lsn <= GetLastImportantRecPtr()) + last_snapshot_lsn <= current_lsn) { last_snapshot_lsn = LogStandbySnapshot(); last_snapshot_ts = now; + pgstat_wal_update_lsntimeline(now, current_lsn); } } -- 2.37.2
From 800b0610f430f965e9216a374afe638bbec7bb6f Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Wed, 27 Dec 2023 16:40:27 -0500 Subject: [PATCH v3 2/5] Add LSNTimeline for converting LSN <-> time Add a new structure, LSNTimeline, consisting of LSNTimes -- each an LSN, time pair. Each LSNTime can represent multiple logical LSN, time pairs, referred to as members. LSN <-> time conversions can be done using linear interpolation with two LSNTimes on the LSNTimeline. This commit does not add a global instance of LSNTimeline. It adds the structures and functions needed to maintain and access such a timeline. --- src/backend/utils/activity/pgstat_wal.c | 224 ++++++++++++++++++++++++ src/include/pgstat.h | 44 +++++ src/tools/pgindent/typedefs.list | 2 + 3 files changed, 270 insertions(+) diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c index 1a3c0e1a669..96e84319f6f 100644 --- a/src/backend/utils/activity/pgstat_wal.c +++ b/src/backend/utils/activity/pgstat_wal.c @@ -17,8 +17,12 @@ #include "postgres.h" +#include "access/xlog.h" #include "utils/pgstat_internal.h" #include "executor/instrument.h" +#include "utils/builtins.h" +#include "utils/timestamp.h" +#include "utils/pg_lsn.h" PgStat_PendingWalStats PendingWalStats = {0}; @@ -32,6 +36,12 @@ PgStat_PendingWalStats PendingWalStats = {0}; static WalUsage prevWalUsage; +static void lsntime_merge(LSNTime *target, LSNTime *src); +static void lsntime_insert(LSNTimeline *timeline, TimestampTz time, XLogRecPtr lsn); + +XLogRecPtr estimate_lsn_at_time(const LSNTimeline *timeline, TimestampTz time); +TimestampTz estimate_time_at_lsn(const LSNTimeline *timeline, XLogRecPtr lsn); + /* * Calculate how much WAL usage counters have increased and update * shared WAL and IO statistics. @@ -184,3 +194,217 @@ pgstat_wal_snapshot_cb(void) sizeof(pgStatLocal.snapshot.wal)); LWLockRelease(&stats_shmem->lock); } + +/* + * Set *target to be the earlier of *target or *src. + */ +static void +lsntime_merge(LSNTime *target, LSNTime *src) +{ + LSNTime result; + uint64 new_members = target->members + src->members; + + if (target->time < src->time) + result = *target; + else if (src->time < target->time) + result = *src; + else if (target->lsn < src->lsn) + result = *target; + else if (src->lsn < target->lsn) + result = *src; + else + result = *target; + + *target = result; + target->members = new_members; + src->members = 1; +} + +static int +lsntime_merge_target(LSNTimeline *timeline) +{ + /* Don't merge if free space available */ + Assert(timeline->length == LSNTIMELINE_VOLUME); + + for (int i = timeline->length; i-- > 0;) + { + /* + * An array element can represent up to twice the number of members + * represented by the preceding array element. + */ + if (timeline->data[i].members < (2 * timeline->data[i - 1].members)) + return i; + } + + /* Should not be reachable or we are out of space */ + Assert(false); +} + +/* + * Insert a new LSNTime into the LSNTimeline in the first available element, + * or, if there are no empty elements, insert it into the element at index 0, + * merge the logical members of two old buckets and move the intervening + * elements down by one. + */ +void +lsntime_insert(LSNTimeline *timeline, TimestampTz time, + XLogRecPtr lsn) +{ + int merge_target; + LSNTime entrant = {.lsn = lsn,.time = time,.members = 1}; + + if (timeline->length < LSNTIMELINE_VOLUME) + { + /* + * The new entry should exceed the most recent entry to ensure time + * moves forward on the timeline. + */ + Assert(timeline->length == 0 || + (lsn >= timeline->data[LSNTIMELINE_VOLUME - timeline->length].lsn && + time >= timeline->data[LSNTIMELINE_VOLUME - timeline->length].time)); + + /* + * If there are unfilled elements in the timeline, then insert the + * passed-in LSNTime into the tail of the array. + */ + timeline->length++; + timeline->data[LSNTIMELINE_VOLUME - timeline->length] = entrant; + return; + } + + /* + * If all elements in the timeline represent at least one member, merge + * the oldest element whose membership is < 2x its predecessor with its + * preceding member. Then shift all elements preceding these two elements + * down by one and insert the passed-in LSNTime at array index 0. + */ + merge_target = lsntime_merge_target(timeline); + Assert(merge_target >= 0 && merge_target < timeline->length); + lsntime_merge(&timeline->data[merge_target], &timeline->data[merge_target - 1]); + memmove(&timeline->data[1], &timeline->data[0], sizeof(LSNTime) * merge_target - 1); + timeline->data[0] = entrant; +} + +/* + * Translate time to a LSN using the provided timeline. The timeline will not + * be modified. + */ +XLogRecPtr +estimate_lsn_at_time(const LSNTimeline *timeline, TimestampTz time) +{ + XLogRecPtr result; + int64 time_elapsed, + lsns_elapsed; + LSNTime start = {.time = PgStartTime,.lsn = PgStartLSN}; + LSNTime end = {.time = GetCurrentTimestamp(),.lsn = GetXLogInsertRecPtr()}; + + /* + * If the provided time is before DB startup, the best we can do is return + * the start LSN. + */ + if (time < start.time) + return start.lsn; + + /* + * If the provided time is after now, the current LSN is our best + * estimate. + */ + if (time >= end.time) + return end.lsn; + + /* + * Loop through the timeline. Stop at the first LSNTime earlier than our + * target time. This LSNTime will be our interpolation start point. If + * there's an LSNTime later than that, then that will be our interpolation + * end point. + */ + for (int i = LSNTIMELINE_VOLUME - timeline->length; i < LSNTIMELINE_VOLUME; i++) + { + if (timeline->data[i].time > time) + continue; + + start = timeline->data[i]; + if (i > 0) + end = timeline->data[i - 1]; + goto stop; + } + + /* + * If we exhausted the timeline, then use its earliest LSNTime as our + * interpolation end point. + */ + if (timeline->length > 0) + end = timeline->data[timeline->length - 1]; + +stop: + Assert(end.time > start.time); + Assert(end.lsn > start.lsn); + time_elapsed = end.time - start.time; + Assert(time_elapsed != 0); + lsns_elapsed = end.lsn - start.lsn; + Assert(lsns_elapsed != 0); + result = (double) (time - start.time) / time_elapsed * lsns_elapsed + start.lsn; + return Max(result, 0); +} + +/* + * Translate lsn to a time using the provided timeline. The timeline will not + * be modified. + */ +TimestampTz +estimate_time_at_lsn(const LSNTimeline *timeline, XLogRecPtr lsn) +{ + int64 time_elapsed, + lsns_elapsed; + TimestampTz result; + LSNTime start = {.time = PgStartTime,.lsn = PgStartLSN}; + LSNTime end = {.time = GetCurrentTimestamp(),.lsn = GetXLogInsertRecPtr()}; + + /* + * If the LSN is before DB startup, the best we can do is return that + * time. + */ + if (lsn <= start.lsn) + return start.time; + + /* + * If the target LSN is after the current insert LSN, the current time is + * our best estimate. + */ + if (lsn >= end.lsn) + return end.time; + + /* + * Loop through the timeline. Stop at the first LSNTime earlier than our + * target LSN. This LSNTime will be our interpolation start point. If + * there's an LSNTime later than that, then that will be our interpolation + * end point. + */ + for (int i = LSNTIMELINE_VOLUME - timeline->length; i < LSNTIMELINE_VOLUME; i++) + { + if (timeline->data[i].lsn > lsn) + continue; + + start = timeline->data[i]; + if (i > 0) + end = timeline->data[i - 1]; + goto stop; + } + + /* + * If we exhausted the timeline, then use its earliest LSNTime as our + * interpolation end point. + */ + if (timeline->length > 0) + end = timeline->data[timeline->length - 1]; + +stop: + Assert(end.time > start.time); + Assert(end.lsn > start.lsn); + time_elapsed = end.time - start.time; + Assert(time_elapsed != 0); + lsns_elapsed = end.lsn - start.lsn; + Assert(lsns_elapsed != 0); + result = (double) (lsn - start.lsn) / lsns_elapsed * time_elapsed + start.time; + return Max(result, 0); +} diff --git a/src/include/pgstat.h b/src/include/pgstat.h index 2136239710e..1926ddb00ed 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -11,6 +11,7 @@ #ifndef PGSTAT_H #define PGSTAT_H +#include "access/xlogdefs.h" #include "datatype/timestamp.h" #include "portability/instr_time.h" #include "postmaster/pgarch.h" /* for MAX_XFN_CHARS */ @@ -428,6 +429,49 @@ typedef struct PgStat_StatTabEntry PgStat_Counter autoanalyze_count; } PgStat_StatTabEntry; +/* + * The elements of an LSNTimeline. Each LSNTime represents one or more time, + * LSN pairs. The LSN is typically the insert LSN recorded at the time. Members + * is the number of logical members -- each a time, LSN pair -- represented in + * the LSNTime. + */ +typedef struct LSNTime +{ + TimestampTz time; + XLogRecPtr lsn; + uint64 members; +} LSNTime; + +#define LSNTIMELINE_VOLUME 64 +/* + * A timeline consists of LSNTimes from most to least recent. The array is + * filled from end to start before the contents of any elements are merged. + * Once the LSNTimeline length == volume (the array is full), old elements are + * merged to make space for new elements at index 0. When merging logical + * members, each element of the array in the timeline may represent twice as + * many logical members as the preceding element. + * + * This gives more recent times greater precision than less recent ones. An + * array of size 64 and an unsigned 64-bit number for the number of members + * should provide sufficient capacity without accounting for what to do when + * all elements of the array are at capacity. + * + * After every element has at least one logical member, when a new LSNTime is + * inserted, the oldest array element whose logical membership is < 2x its + * predecessor is the merge target. Its preceding element is merged into it. + * Then all of the intervening elements are moved down by one and the new + * LSNTime is inserted at index 0. + * + * Merging two elements is combining their members and assigning the lesser + * LSNTime. Use the timeline for LSN <-> time conversion using linear + * interpolation. + */ +typedef struct LSNTimeline +{ + int length; + LSNTime data[LSNTIMELINE_VOLUME]; +} LSNTimeline; + typedef struct PgStat_WalStats { PgStat_Counter wal_records; diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index d808aad8b05..aef83230836 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -1525,6 +1525,8 @@ LogicalTapeSet LsnReadQueue LsnReadQueueNextFun LsnReadQueueNextStatus +LSNTime +LSNTimeline LtreeGistOptions LtreeSignature MAGIC -- 2.37.2