So I looked at this patch and I have the same basic question as Bruce. Do we really want to expose every binary tool associated with Postgres as an extension? Obviously this is tempting for cloud provider users which is not an unreasonable argument. But it does have consequences.
1) Some things like pg_waldump are running code that is not normally under user control. This could have security issues or reliability issues. On that front I'm especially concerned that pg_verify_raw_wal_record() for example would let an attacker feed arbitrary hand crafted xlog records into the parser which is not normally something a user can do. If they feed it something it's not expecting it might be easy to cause a crash and server restart. There's also a bit of concern about data retention. Generally in Postgres when rows are deleted there's very weak guarantees about the data really being wiped. We definitely don't wipe it from disk of course. And things like pageinspect could expose it long after it's been deleted. But one might imagine after pageinspect shows it's gone and/or after a vacuum full the data is actually purged. But then something like pg_walinspect would make even that insufficient. 2) There's no documentation. I'm guessing you hesitated to write documentation until the interface is settled but actually sometimes writing documentation helps expose things in the interface that look strange when you try to explain them. 3) And the interface does look a bit strange. Like what's the deal with pg_get_wal_record_info_2() ? I gather it's just a SRF version of pg_get_wal_record_info() but that's a strange name. And then what's the point of pg_get_wal_record_info() at all? Why wouldn't the SRF be sufficient even for the specific case of a single record? And the stats functions seem a bit out of place to me. If the SRF returned the data in the right format the user should be able to do aggregate queries to generate these stats easily enough. If anything a simple SQL function to do the aggregations could be provided. Now this is starting to get into the realm of bikeshedding but... Some of the code is taken straight from pg_waldump and does things like: + appendStringInfo(&rec_blk_ref, "blkref #%u: rel %u/%u/%u fork %s blk %u", But that's kind of out of place for an SQL interface. It makes it hard to write queries since things like the relid, block number etc are in the string. If I wanted to use these functions I would expect to be doing something like "select all the decoded records pertaining to block n". All that said, I don't want to gatekeep based on this kind of criticism. The existing code is based on pg_waldump and if we want an extension to expose that then that's a reasonable place to start. We can work on a better format for the data later it doesn't mean we shouldn't start with something we have today. 4) This isn't really an issue with your patch at all but why on earth do we have a bitvector for WAL compression methods?! Like, what does it mean to have multiple compression methods set? That should just be a separate field with values for each type of compression surely? I suppose this raises the issue of what happens if someone fixes that. They'll now have to update pg_waldump *and* pg_walinspect. I don't think that would actually be a lot of work but it's definitely more than just one. Also, perhaps they should be in the same contrib directory so at least people won't forget there are two.