On Sun, Mar 19, 2023 at 8:21 PM Kyotaro Horiguchi <horikyota....@gmail.com> wrote: > The documentation has just one section titled "General Functions" > which directly contains detailed explation of four functions, making > it hard to get clear understanding of the available functions. I > considered breaking it down into a few subsections, but that wouldn't > look great since most of them would only contain one function. > However, I feel it would be helpful to add a list of all functions at > the beginning of the section.
I like the idea of sections, even if there is only one function per section in some cases. I also think that we should add a "Tip" that advises users that they may use an "end LSN" that is the largest possible LSN, 'FFFFFFFF/FFFFFFFF' to get information about records up until the current LSN of the cluster (per commit 5c1b6628). Is there a straightforward way to get a usable LSN constant for this purpose? The simplest way I could come up with quickly is "SELECT pg_lsn(2^64.-1)" -- which still isn't very simple. Actually, it might be even worse than 'FFFFFFFF/FFFFFFFF', so perhaps we should just use that in the docs new "Tip". > I agree that adding a note about the characteristics would helpful to > avoid the misuse of pg_get_wal_block_info(). How about something like, > "Note that pg_get_wal_block_info() omits records that contains no > block references."? This should be a strict invariant. In other words, it should be part of the documented contract of pg_get_wal_block_info and pg_get_wal_records_info. The two functions should be defined in terms of each other. Their relationship is important. Users should be able to safely assume that the records that have a NULL block_ref according to pg_get_wal_records_info are *precisely* those records that won't have any entries within pg_get_wal_block_info (assuming that the same LSN range is used with both functions). pg_walinspect should explicitly promise this, and promise the corollary condition around non-NULL block_ref records. It is a useful promise from the point of view of users. It also makes it easier to understand what's really going on here without any ambiguity. I don't completely disagree with Michael about the redundancy. I just think that it's worth it on performance grounds. We might want to say that directly in the docs, too. > > Attaching v3 patch set - 0001 optimizations around block references, > > 0002 enables pg_get_wal_block_info() to emit per-record info. Any > > thoughts? > > + /* Get block references, if any, otherwise continue. */ > + if (!XLogRecHasAnyBlockRefs(xlogreader)) > + continue; > > I'm not sure, but the "continue" might be confusing since the code > "continue"s if the condition is true and continues the process > otherwise.. And it seems like a kind of "explaination of what the > code does". I feel we don't need the a comment there. +1. Also, if GetWALBlockInfo() is now supposed to only be called when XLogRecHasAnyBlockRefs() now then it should probably have an assertion to verify the precondition. > It is not an issue with this patch, but as I look at this version, I'm > starting to feel uneasy about the subtle differences between what > GetWALRecordsInfo and GetWALBlockInfo do. One solution might be to > have GetWALBlockInfo return a values array for each block, but that > could make things more complex than needed. Alternatively, could we > get GetWALRecordsInfo to call tuplestore_putvalues() internally? This > way, both functions can manage the temporary memory context within > themselves. Agreed. I'm also not sure what to do about it, though. > This means GetWALBlockInfo overwrites the last two columns generated > by GetWalRecordInfo, but I don't think this approach is clean and > stable. I agree we don't want the final columns in a block info tuple > but we don't want to duplicate the common code path. > I initially thought we could devide the function into > GetWALCommonInfo(), GetWALRecordInfo() and GetWALBlockInfo(), but it > doesn't seem that simple.. In the end, I think we should have separate > GetWALRecordInfo() and GetWALBlockInfo() that have duplicate > "values[i++] = .." lines. I agree. A little redundancy is better when the alternative is fragile code, and I'm pretty sure that that applies here -- there won't be very many duplicated lines, and the final code will be significantly clearer. There can be a comment about keeping GetWALRecordInfo and GetWALBlockInfo in sync. -- Peter Geoghegan