> > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > > > I think the output in the kind column can be displayed as {immediate, > > force, wait, requested, time}. By the way these are all checkpoint > > flags so it is better to display it as checkpoint flags instead of > > checkpoint kind as mentioned in one of my previous comments. > > I will update in the next patch.
The current format matches with the server log message for the checkpoint start in LogCheckpointStart(). Just to be consistent, I have not changed the code. I have taken care of the rest of the comments in v5 patch for which there was clarity. Thanks & Regards, Nitin Jadhav On Mon, Mar 7, 2022 at 8:15 PM Nitin Jadhav <nitinjadhavpostg...@gmail.com> wrote: > > > + <row> > > + <entry role="catalog_table_entry"><para role="column_definition"> > > + <structfield>type</structfield> <type>text</type> > > + </para> > > + <para> > > + Type of checkpoint. See <xref linkend="checkpoint-types"/>. > > + </para></entry> > > + </row> > > + > > + <row> > > + <entry role="catalog_table_entry"><para role="column_definition"> > > + <structfield>kind</structfield> <type>text</type> > > + </para> > > + <para> > > + Kind of checkpoint. See <xref linkend="checkpoint-kinds"/>. > > + </para></entry> > > + </row> > > > > This looks a bit confusing. Two columns, one with the name "checkpoint > > types" and another "checkpoint kinds". You can probably rename > > checkpoint-kinds to checkpoint-flags and let the checkpoint-types be > > as-it-is. > > Makes sense. I will change in the next patch. > --- > > > + > > <entry><structname>pg_stat_progress_checkpoint</structname><indexterm><primary>pg_stat_progress_checkpoint</primary></indexterm></entry> > > + <entry>One row only, showing the progress of the checkpoint. > > > > Let's make this message consistent with the already existing message > > for pg_stat_wal_receiver. See description for pg_stat_wal_receiver > > view in "Dynamic Statistics Views" table. > > You want me to change "One row only" to "Only one row" ? If that is > the case then for other views in the "Collected Statistics Views" > table, it is referred as "One row only". Let me know if you are > pointing out something else. > --- > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > > > I think the output in the kind column can be displayed as {immediate, > > force, wait, requested, time}. By the way these are all checkpoint > > flags so it is better to display it as checkpoint flags instead of > > checkpoint kind as mentioned in one of my previous comments. > > I will update in the next patch. > --- > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > start_lsn | 0/14C60F8 > > start_time | 2022-03-03 18:59:56.018662+05:30 > > phase | performing two phase checkpoint > > > > This is the output I see when the checkpointer process has come out of > > the two phase checkpoint and is currently writing checkpoint xlog > > records and doing other stuff like updating control files etc. Is this > > okay? > > The idea behind choosing the phases is based on the functionality > which takes longer time to execute. Since after two phase checkpoint > till post checkpoint cleanup won't take much time to execute, I have > not added any additional phase for that. But I also agree that this > gives wrong information to the user. How about mentioning the phase > information at the end of each phase like "Initializing", > "Initialization done", ..., "two phase checkpoint done", "post > checkpoint cleanup done", .., "finalizing". Except for the first phase > ("initializing") and last phase ("finalizing"), all the other phases > describe the end of a certain operation. I feel this gives correct > information even though the phase name/description does not represent > the entire code block between two phases. For example if the current > phase is ''two phase checkpoint done". Then the user can infer that > the checkpointer has done till two phase checkpoint and it is doing > other stuff that are after that. Thoughts? > --- > > > The output of log_checkpoint shows the number of buffers written is 3 > > whereas the output of pg_stat_progress_checkpoint shows it as 0. See > > below: > > > > 2022-03-03 20:04:45.643 IST [22043] LOG: checkpoint complete: wrote 3 > > buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; > > write=24.652 s, sync=104.256 s, total=3889.625 s; sync files=2, > > longest=0.011 s, average=0.008 s; distance=0 kB, estimate=0 kB > > > > -- > > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > start_lsn | 0/14C60F8 > > start_time | 2022-03-03 18:59:56.018662+05:30 > > phase | finalizing > > buffers_total | 0 > > buffers_processed | 0 > > buffers_written | 0 > > > > Any idea why this mismatch? > > Good catch. In BufferSync() we have 'num_to_scan' (buffers_total) > which indicates the total number of buffers to be processed. Based on > that, the 'buffers_processed' and 'buffers_written' counter gets > incremented. I meant these values may reach upto 'buffers_total'. The > current pg_stat_progress_view support above information. There is > another place when 'ckpt_bufs_written' gets incremented (In > SlruInternalWritePage()). This increment is above the 'buffers_total' > value and it is included in the server log message (checkpoint end) > and not included in the view. I am a bit confused here. If we include > this increment in the view then we cannot calculate the exact > 'buffers_total' beforehand. Can we increment the 'buffers_toal' also > when 'ckpt_bufs_written' gets incremented so that we can match the > behaviour with checkpoint end message? Please share your thoughts. > --- > > > I think we can add a couple of more information to this view - > > start_time for buffer write operation and start_time for buffer sync > > operation. These are two very time consuming tasks in a checkpoint and > > people would find it useful to know how much time is being taken by > > the checkpoint in I/O operation phase. thoughts? > > The detailed progress is getting shown for these 2 phases of the > checkpoint like 'buffers_processed', 'buffers_written' and > 'files_synced'. Hence I did not think about adding start time and If > it is really required, then I can add. > > Thanks & Regards, > Nitin Jadhav > > On Thu, Mar 3, 2022 at 8:30 PM Ashutosh Sharma <ashu.coe...@gmail.com> wrote: > > > > Here are some of my review comments on the latest patch: > > > > + <row> > > + <entry role="catalog_table_entry"><para role="column_definition"> > > + <structfield>type</structfield> <type>text</type> > > + </para> > > + <para> > > + Type of checkpoint. See <xref linkend="checkpoint-types"/>. > > + </para></entry> > > + </row> > > + > > + <row> > > + <entry role="catalog_table_entry"><para role="column_definition"> > > + <structfield>kind</structfield> <type>text</type> > > + </para> > > + <para> > > + Kind of checkpoint. See <xref linkend="checkpoint-kinds"/>. > > + </para></entry> > > + </row> > > > > This looks a bit confusing. Two columns, one with the name "checkpoint > > types" and another "checkpoint kinds". You can probably rename > > checkpoint-kinds to checkpoint-flags and let the checkpoint-types be > > as-it-is. > > > > == > > > > + > > <entry><structname>pg_stat_progress_checkpoint</structname><indexterm><primary>pg_stat_progress_checkpoint</primary></indexterm></entry> > > + <entry>One row only, showing the progress of the checkpoint. > > > > Let's make this message consistent with the already existing message > > for pg_stat_wal_receiver. See description for pg_stat_wal_receiver > > view in "Dynamic Statistics Views" table. > > > > == > > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > > > I think the output in the kind column can be displayed as {immediate, > > force, wait, requested, time}. By the way these are all checkpoint > > flags so it is better to display it as checkpoint flags instead of > > checkpoint kind as mentioned in one of my previous comments. > > > > == > > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > start_lsn | 0/14C60F8 > > start_time | 2022-03-03 18:59:56.018662+05:30 > > phase | performing two phase checkpoint > > > > > > This is the output I see when the checkpointer process has come out of > > the two phase checkpoint and is currently writing checkpoint xlog > > records and doing other stuff like updating control files etc. Is this > > okay? > > > > == > > > > The output of log_checkpoint shows the number of buffers written is 3 > > whereas the output of pg_stat_progress_checkpoint shows it as 0. See > > below: > > > > 2022-03-03 20:04:45.643 IST [22043] LOG: checkpoint complete: wrote 3 > > buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; > > write=24.652 s, sync=104.256 s, total=3889.625 s; sync files=2, > > longest=0.011 s, average=0.008 s; distance=0 kB, estimate=0 kB > > > > -- > > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-----+------------------------------------- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > start_lsn | 0/14C60F8 > > start_time | 2022-03-03 18:59:56.018662+05:30 > > phase | finalizing > > buffers_total | 0 > > buffers_processed | 0 > > buffers_written | 0 > > > > Any idea why this mismatch? > > > > == > > > > I think we can add a couple of more information to this view - > > start_time for buffer write operation and start_time for buffer sync > > operation. These are two very time consuming tasks in a checkpoint and > > people would find it useful to know how much time is being taken by > > the checkpoint in I/O operation phase. thoughts? > > > > -- > > With Regards, > > Ashutosh Sharma. > > > > On Wed, Mar 2, 2022 at 4:45 PM Nitin Jadhav > > <nitinjadhavpostg...@gmail.com> wrote: > > > > > > Thanks for reviewing. > > > > > > > > > I suggested upthread to store the starting timeline instead. This > > > > > > way you can > > > > > > deduce whether it's a restartpoint or a checkpoint, but you can > > > > > > also deduce > > > > > > other information, like what was the starting WAL. > > > > > > > > > > I don't understand why we need the timeline here to just determine > > > > > whether it's a restartpoint or checkpoint. > > > > > > > > I'm not saying it's necessary, I'm saying that for the same space usage > > > > we can > > > > store something a bit more useful. If no one cares about having the > > > > starting > > > > timeline available for no extra cost then sure, let's just store the > > > > kind > > > > directly. > > > > > > Fixed. > > > > > > > 2) Can't we just have these checks inside CASE-WHEN-THEN-ELSE blocks > > > > directly instead of new function pg_stat_get_progress_checkpoint_kind? > > > > + snprintf(ckpt_kind, MAXPGPATH, "%s%s%s%s%s%s%s%s%s", > > > > + (flags == 0) ? "unknown" : "", > > > > + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "", > > > > + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "", > > > > + (flags & CHECKPOINT_IMMEDIATE) ? "immediate " : "", > > > > + (flags & CHECKPOINT_FORCE) ? "force " : "", > > > > + (flags & CHECKPOINT_WAIT) ? "wait " : "", > > > > + (flags & CHECKPOINT_CAUSE_XLOG) ? "wal " : "", > > > > + (flags & CHECKPOINT_CAUSE_TIME) ? "time " : "", > > > > + (flags & CHECKPOINT_FLUSH_ALL) ? "flush-all" : ""); > > > > > > Fixed. > > > --- > > > > > > > 5) Do we need a special phase for this checkpoint operation? I'm not > > > > sure in which cases it will take a long time, but it looks like > > > > there's a wait loop here. > > > > vxids = GetVirtualXIDsDelayingChkpt(&nvxids); > > > > if (nvxids > 0) > > > > { > > > > do > > > > { > > > > pg_usleep(10000L); /* wait for 10 msec */ > > > > } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids)); > > > > } > > > > > > Yes. It is better to add a separate phase here. > > > --- > > > > > > > Also, how about special phases for SyncPostCheckpoint(), > > > > SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(), > > > > PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but > > > > it might be increase in future (?)), TruncateSUBTRANS()? > > > > > > SyncPreCheckpoint() is just incrementing a counter and > > > PreallocXlogFiles() currently pre-allocates only 1 WAL file. I feel > > > there is no need to add any phases for these as of now. We can add in > > > the future if necessary. Added phases for SyncPostCheckpoint(), > > > InvalidateObsoleteReplicationSlots() and TruncateSUBTRANS(). > > > --- > > > > > > > 6) SLRU (Simple LRU) isn't a phase here, you can just say > > > > PROGRESS_CHECKPOINT_PHASE_PREDICATE_LOCK_PAGES. > > > > + > > > > + pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE, > > > > + PROGRESS_CHECKPOINT_PHASE_SLRU_PAGES); > > > > CheckPointPredicate(); > > > > > > > > And :s/checkpointing SLRU pages/checkpointing predicate lock pages > > > >+ WHEN 9 THEN 'checkpointing SLRU pages' > > > > > > Fixed. > > > --- > > > > > > > 7) > > > > :s/PROGRESS_CHECKPOINT_PHASE_FILE_SYNC/PROGRESS_CHECKPOINT_PHASE_PROCESS_FILE_SYNC_REQUESTS > > > > > > I feel PROGRESS_CHECKPOINT_PHASE_FILE_SYNC is a better option here as > > > it describes the purpose in less words. > > > > > > > And :s/WHEN 11 THEN 'performing sync requests'/WHEN 11 THEN > > > > 'processing file sync requests' > > > > > > Fixed. > > > --- > > > > > > > 8) :s/Finalizing/finalizing > > > > + WHEN 14 THEN 'Finalizing' > > > > > > Fixed. > > > --- > > > > > > > 9) :s/checkpointing snapshots/checkpointing logical replication > > > > snapshot files > > > > + WHEN 3 THEN 'checkpointing snapshots' > > > > :s/checkpointing logical rewrite mappings/checkpointing logical > > > > replication rewrite mapping files > > > > + WHEN 4 THEN 'checkpointing logical rewrite > > > > mappings' > > > > > > Fixed. > > > --- > > > > > > > 10) I'm not sure if it's discussed, how about adding the number of > > > > snapshot/mapping files so far the checkpoint has processed in file > > > > processing while loops of > > > > CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can > > > > be many logical snapshot or mapping files and users may be interested > > > > in knowing the so-far-processed-file-count. > > > > > > I had thought about this while sharing the v1 patch and mentioned my > > > views upthread. I feel it won't give meaningful progress information > > > (It can be treated as statistics). Hence not included. Thoughts? > > > > > > > > > As mentioned upthread, there can be multiple backends that request a > > > > > > checkpoint, so unless we want to store an array of pid we should > > > > > > store a number > > > > > > of backend that are waiting for a new checkpoint. > > > > > > > > > > Yeah, you are right. Let's not go that path and store an array of > > > > > pids. I don't see a strong use-case with the pid of the process > > > > > requesting checkpoint. If required, we can add it later once the > > > > > pg_stat_progress_checkpoint view gets in. > > > > > > > > I don't think that's really necessary to give the pid list. > > > > > > > > If you requested a new checkpoint, it doesn't matter if it's only your > > > > backend > > > > that triggered it, another backend or a few other dozen, the result > > > > will be the > > > > same and you have the information that the request has been seen. We > > > > could > > > > store just a bool for that but having a number instead also gives a bit > > > > more > > > > information and may allow you to detect some broken logic on your > > > > client code > > > > if it keeps increasing. > > > > > > It's a good metric to show in the view but the information is not > > > readily available. Additional code is required to calculate the number > > > of requests. Is it worth doing that? I feel this can be added later if > > > required. > > > > > > Please find the v4 patch attached and share your thoughts. > > > > > > Thanks & Regards, > > > Nitin Jadhav > > > > > > On Tue, Mar 1, 2022 at 2:27 PM Nitin Jadhav > > > <nitinjadhavpostg...@gmail.com> wrote: > > > > > > > > > > 3) Why do we need this extra calculation for start_lsn? > > > > > > Do you ever see a negative LSN or something here? > > > > > > + ('0/0'::pg_lsn + ( > > > > > > + CASE > > > > > > + WHEN (s.param3 < 0) THEN pow((2)::numeric, > > > > > > (64)::numeric) > > > > > > + ELSE (0)::numeric > > > > > > + END + (s.param3)::numeric)) AS start_lsn, > > > > > > > > > > Yes: LSN can take up all of an uint64; whereas the pgstat column is a > > > > > bigint type; thus the signed int64. This cast is OK as it wraps > > > > > around, but that means we have to take care to correctly display the > > > > > LSN when it is > 0x7FFF_FFFF_FFFF_FFFF; which is what we do here using > > > > > the special-casing for negative values. > > > > > > > > Yes. The extra calculation is required here as we are storing unit64 > > > > value in the variable of type int64. When we convert uint64 to int64 > > > > then the bit pattern is preserved (so no data is lost). The high-order > > > > bit becomes the sign bit and if the sign bit is set, both the sign and > > > > magnitude of the value changes. To safely get the actual uint64 value > > > > whatever was assigned, we need the above calculations. > > > > > > > > > > 4) Can't you use timestamptz_in(to_char(s.param4)) instead of > > > > > > pg_stat_get_progress_checkpoint_start_time? I don't quite understand > > > > > > the reasoning for having this function and it's named as > > > > > > *checkpoint* > > > > > > when it doesn't do anything specific to the checkpoint at all? > > > > > > > > > > I hadn't thought of using the types' inout functions, but it looks > > > > > like timestamp IO functions use a formatted timestring, which won't > > > > > work with the epoch-based timestamp stored in the view. > > > > > > > > There is a variation of to_timestamp() which takes UNIX epoch (float8) > > > > as an argument and converts it to timestamptz but we cannot directly > > > > call this function with S.param4. > > > > > > > > TimestampTz > > > > GetCurrentTimestamp(void) > > > > { > > > > TimestampTz result; > > > > struct timeval tp; > > > > > > > > gettimeofday(&tp, NULL); > > > > > > > > result = (TimestampTz) tp.tv_sec - > > > > ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY); > > > > result = (result * USECS_PER_SEC) + tp.tv_usec; > > > > > > > > return result; > > > > } > > > > > > > > S.param4 contains the output of the above function > > > > (GetCurrentTimestamp()) which returns Postgres epoch but the > > > > to_timestamp() expects UNIX epoch as input. So some calculation is > > > > required here. I feel the SQL 'to_timestamp(946684800 + > > > > (S.param4::float / 1000000)) AS start_time' works fine. The value > > > > '946684800' is equal to ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * > > > > SECS_PER_DAY). I am not sure whether it is good practice to use this > > > > way. Kindly share your thoughts. > > > > > > > > Thanks & Regards, > > > > Nitin Jadhav > > > > > > > > On Mon, Feb 28, 2022 at 6:40 PM Matthias van de Meent > > > > <boekewurm+postg...@gmail.com> wrote: > > > > > > > > > > On Sun, 27 Feb 2022 at 16:14, Bharath Rupireddy > > > > > <bharath.rupireddyforpostg...@gmail.com> wrote: > > > > > > 3) Why do we need this extra calculation for start_lsn? > > > > > > Do you ever see a negative LSN or something here? > > > > > > + ('0/0'::pg_lsn + ( > > > > > > + CASE > > > > > > + WHEN (s.param3 < 0) THEN pow((2)::numeric, > > > > > > (64)::numeric) > > > > > > + ELSE (0)::numeric > > > > > > + END + (s.param3)::numeric)) AS start_lsn, > > > > > > > > > > Yes: LSN can take up all of an uint64; whereas the pgstat column is a > > > > > bigint type; thus the signed int64. This cast is OK as it wraps > > > > > around, but that means we have to take care to correctly display the > > > > > LSN when it is > 0x7FFF_FFFF_FFFF_FFFF; which is what we do here using > > > > > the special-casing for negative values. > > > > > > > > > > As to whether it is reasonable: Generating 16GB of wal every second > > > > > (2^34 bytes /sec) is probably not impossible (cpu <> memory bandwidth > > > > > has been > 20GB/sec for a while); and that leaves you 2^29 seconds of > > > > > database runtime; or about 17 years. Seeing that a cluster can be > > > > > `pg_upgrade`d (which doesn't reset cluster LSN) since PG 9.0 from at > > > > > least version PG 8.4.0 (2009) (and through pg_migrator, from 8.3.0)), > > > > > we can assume that clusters hitting LSN=2^63 will be a reasonable > > > > > possibility within the next few years. As the lifespan of a PG release > > > > > is about 5 years, it doesn't seem impossible that there will be actual > > > > > clusters that are going to hit this naturally in the lifespan of PG15. > > > > > > > > > > It is also possible that someone fat-fingers pg_resetwal; and creates > > > > > a cluster with LSN >= 2^63; resulting in negative values in the > > > > > s.param3 field. Not likely, but we can force such situations; and as > > > > > such we should handle that gracefully. > > > > > > > > > > > 4) Can't you use timestamptz_in(to_char(s.param4)) instead of > > > > > > pg_stat_get_progress_checkpoint_start_time? I don't quite understand > > > > > > the reasoning for having this function and it's named as > > > > > > *checkpoint* > > > > > > when it doesn't do anything specific to the checkpoint at all? > > > > > > > > > > I hadn't thought of using the types' inout functions, but it looks > > > > > like timestamp IO functions use a formatted timestring, which won't > > > > > work with the epoch-based timestamp stored in the view. > > > > > > > > > > > Having 3 unnecessary functions that aren't useful to the users at > > > > > > all > > > > > > in proc.dat will simply eatup the function oids IMO. Hence, I > > > > > > suggest > > > > > > let's try to do without extra functions. > > > > > > > > > > I agree that (1) could be simplified, or at least fully expressed in > > > > > SQL without exposing too many internals. If we're fine with exposing > > > > > internals like flags and type layouts, then (2), and arguably (4), can > > > > > be expressed in SQL as well. > > > > > > > > > > -Matthias