Some comments on pg_walinspect-docc.patch this time: + <varlistentry> + <term> + <function>pg_get_wal_record_info(in_lsn pg_lsn, lsn OUT pg_lsn, prev_lsn OUT pg_lsn, xid OUT xid, resource_manager OUT text, length OUT int4, total_length OUT int4, description OUT text, block_ref OUT text, data OUT bytea, data_len OUT int4)</function> + </term>
You may shorten this by mentioning just the function input parameters and specify "returns record" like shown below. So no need to specify all the OUT params. pg_get_wal_record_info(in_lsn pg_lsn) returns record. Please check the documentation for other functions for reference. == + <term> + <function>pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn DEFAULT NULL, lsn OUT pg_lsn, prev_lsn OUT pg_lsn, xid OUT xid, resource_manager OUT text, length OUT int4, total_length OUT int4, description OUT text, block_ref OUT text, data OUT bytea, data_len OUT int4)</function> + </term> Same comment applies here as well. In the return type you can just mention - "returns setof record" like shown below: pg_get_wal_records_info(start_lsn pg_lsn, end_lsn pg_lsn) returns setof records. You may also check for such optimizations at other places. I might have missed some. == +<screen> +postgres=# select prev_lsn, xid, resource_manager, length, total_length, block_ref from pg_get_wal_records_info('0/158A7F0', '0/1591400'); + prev_lsn | xid | resource_manager | length | total_length | block_ref +-----------+-----+------------------+--------+--------------+-------------------------------------------------------------------------- + 0/158A7B8 | 735 | Heap | 54 | 7838 | blkref #0: rel 1663/5/2619 blk 18 (FPW); hole: offset: 88, length: 408 + 0/158A7F0 | 735 | Btree | 53 | 8133 | blkref #0: rel 1663/5/2696 blk 1 (FPW); hole: offset: 1632, length: 112 + 0/158C6A8 | 735 | Heap | 53 | 873 | blkref #0: rel 1663/5/1259 blk 0 (FPW); hole: offset: 212, length: 7372 Instead of specifying column names in the targetlist I think it's better to use "*" so that it will display all the output columns. Also you may shorten the gap between start and end lsn to reduce the output size. == Any reason for not specifying author name in the .sgml file. Do you want me to add my name to the author? :) <para> Ashutosh Sharma <email>ashu.coe...@gmail.com</email> </para> </sect2> -- With Regards, Ashutosh Sharma. On Thu, Mar 10, 2022 at 10:15 PM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote: > > On Thu, Mar 10, 2022 at 1:52 PM Jeff Davis <pg...@j-davis.com> wrote: > > > > On Wed, 2022-03-02 at 22:37 +0530, Bharath Rupireddy wrote: > > > > > > Attaching v6 patch set with above review comments addressed. Please > > > review it further. > > Thanks Jeff for reviewing it. I've posted the latest v7 patch-set > upthread [1] which is having more simple-yet-useful-and-effective > functions. > > > * Don't issue WARNINGs or other messages for ordinary situations, like > > when pg_get_wal_records_info() hits the end of WAL. > > v7 patch-set [1] has no warnings, but the functions will error out if > future LSN is specified. > > > * It feels like the APIs that allow waiting for the end of WAL are > > slightly off. Can't you just do pg_get_wal_records_info(start_lsn, > > least(pg_current_wal_flush_lsn(), end_lsn)) if you want the non-waiting > > behavior? Try to make the API more orthogonal, where a few basic > > functions can be combined to give you everything you need, rather than > > specifying extra parameters and issuing WARNINGs. I > > v7 patch-set [1] onwards waiting mode has been removed for all of the > functions, again to keep things simple-yet-useful-and-effective. > However, we can always add new pg_walinspect functions that wait for > future WAL in the next versions once basic stuff gets committed and if > many users ask for it. > > > * In the docs, include some example output. I don't see any output in > > the tests, which makes sense because it's mostly non-deterministic, but > > it would be helpful to see sample output of at least > > pg_get_wal_records_info(). > > +1. Added for pg_get_wal_records_info and pg_get_wal_stats. > > > * Is pg_get_wal_stats() even necessary, or can you get the same > > information with a query over pg_get_wal_records_info()? For instance, > > if you want to group by transaction ID rather than rmgr, then > > pg_get_wal_stats() is useless. > > Yes, you are right pg_get_wal_stats provides WAL stats per resource > manager which is similar to pg_waldump with --start, --end and --stats > option. It provides more information than pg_get_wal_records_info and > is a good way of getting stats than adding more columns to > pg_get_wal_records_info, calculating percentage in sql and having > group by clause. IMO, pg_get_wal_stats is more readable and useful. > > > * Would be nice to have a pg_wal_file_is_valid() or similar, which > > would test that it exists, and the header matches the filename (e.g. if > > it was recycled but not used, that would count as invalid). I think > > pg_get_first_valid_wal_record_lsn() would make some cases look invalid > > even if the file is valid -- for example, if a wal record spans many > > wal segments, the segments might look invalid because they contain no > > complete records, but the file itself is still valid and contains valid > > wal data. > > Actually I haven't tried testing a single WAL record spanning many WAL > files yet(I'm happy to try it if someone suggests such a use-case). In > that case too I assume pg_get_first_valid_wal_record_lsn() shouldn't > have a problem because it just gives the next valid LSN and it's > previous LSN using existing WAL reader API XLogFindNextRecord(). It > opens up the WAL file segments using (some dots to connect - > page_read/read_local_xlog_page, WALRead, > segment_open/wal_segment_open). Thoughts? > > I don't think it's necessary to have a function pg_wal_file_is_valid() > that given a WAL file name as input checks whether a WAL file exists > or not, probably not in the core (xlogfuncs.c) too. These kinds of > functions can open up challenges in terms of user input validation and > may cause unnecessary problems, please see some related discussion > [2]. > > > * Is there a reason you didn't include the timeline ID in > > pg_get_wal_records_info()? > > I'm right now allowing the functions to read WAL from the current > server's timeline which I have mentioned in the docs. The server's > current timeline is available via pg_control_checkpoint()'s > timeline_id. So, having timeline_id as a column doesn't make sense. > Again this is to keep things simple-yet-useful-and-effective. However, > we can add new pg_walinspect functions to read WAL from historic as > well as current timelines in the next versions once basic stuff gets > committed and if many users ask for it. > > + <para> > + All the functions of this module will provide the WAL information using the > + current server's timeline ID. > + </para> > > > * Can we mark this extension 'trusted'? I'm not 100% clear on the > > standards for that marker, but it seems reasonable for a database owner > > with the right privileges might want to install it. > > 'trusted' extensions concept is added by commit 50fc694 [3]. Since > pg_walinspect deals with WAL, we strictly want to control who creates > and can execute functions exposed by it, so I don't know if 'trusted' > is a good idea here. Also, pageinspect isn't a 'trusted' extension. > > > * pg_get_raw_wal_record() seems too powerful for pg_monitor. Maybe that > > function should require pg_read_server_files? Or at least > > pg_read_all_data? > > pg_read_all_data may not be the right choice, but pg_read_server_files > is. However, does it sound good if some functions are allowed to be > executed by users with a pg_monitor role and others > pg_get_raw_wal_record by users with pg_read_server_files? Since the > extension itself can be created by superusers, isn't the > pg_get_raw_wal_record sort of safe with pg_mointor itself? > > If hackers don't agree, I'm happy to grant execution on > pg_get_raw_wal_record() to the pg_read_server_files role. > > Attaching the v8 patch-set resolving above comments and some tests for > checking function permissions. Please review it further. > > [1] > https://www.postgresql.org/message-id/CALj2ACWtToUQ5hCCBJP%2BmKeVUcN-g7cMb9XvhAcicPxUDsdcKg%40mail.gmail.com > [2] > https://www.postgresql.org/message-id/CA%2BTgmobYrTgMEF0SV%2ByDYyCCh44DAGjZVs7BYGrD8xD3vwNjHA%40mail.gmail.com > [3] commit 50fc694e43742ce3d04a5e9f708432cb022c5f0d > Author: Tom Lane <t...@sss.pgh.pa.us> > Date: Wed Jan 29 18:42:43 2020 -0500 > > Invent "trusted" extensions, and remove the pg_pltemplate catalog. > > Regards, > Bharath Rupireddy.