On Thu, Feb 4, 2021 at 11:10 AM Mark Dilger <mark.dil...@enterprisedb.com> wrote: > I also made changes to clean up 0003 (formerly numbered 0004)
"deduplice" is a typo. I'm not sure that I agree with check_each_database()'s commentary about why it doesn't make sense to optimize the resolve-the-databases step. Like, suppose I type 'pg_amcheck sasquatch'. I think the way you have it coded it's going to tell me that there are no databases to check, which might make me think I used the wrong syntax or something. I want it to tell me that sasquatch does not exist. If I happen to be a cryptid believer, I may reject that explanation as inaccurate, but at least there's no question about what pg_amcheck thinks the problem is. Why does check_each_database() go out of its way to run the main query without the always-secure search path? If there's a good reason, I think it deserves a comment saying what the reason is. If there's not a good reason, then I think it should use the always-secure search path for 100% of everything. Same question applies to check_one_database(). ParallelSlotSetHandler(free_slot, VerifyHeapamSlotHandler, sql.data) could stand to be split over two lines, like you do for the nearly run_command() call, so that it doesn't go past 80 columns. I suggest having two variables instead of one for amcheck_schema. Using the same variable to store the unescaped value and then later the escaped value is, IMHO, confusing. Whatever you call the escaped version, I'd rename the function parameters elsewhere to match. "status = PQsendQuery(conn, sql) == 1" seems a bit uptight to me. Why not just make status an int and then just "status = PQsendQuery(conn, sql)" and then test for status != 0? I don't really care if you don't change this, it's not actually important. But personally I'd rather code it as if any non-zero value meant success. I think the pg_log_error() in run_command() could be worded a bit better. I don't think it's a good idea to try to include the type of object in there like this, because of the translatability guidelines around assembling messages from fragments. And I don't think it's good to say that the check failed because the reality is that we weren't able to ask for the check to be run in the first place. I would rather log this as something like "unable to send query: %s". I would also assume we need to bail out entirely if that happens. I'm not totally sure what sorts of things can make PQsendQuery() fail but I bet it boils down to having lost the server connection. Should that occur, trying to send queries for all of the remaining objects is going to result in repeating the same error many times, which isn't going to be what anybody wants. It's unclear to me whether we should give up on the whole operation but I think we have to at least give up on that connection... unless I'm confused about what the failure mode is likely to be here. It looks to me like the user won't be able to tell by the exit code what happened. What I did with pg_verifybackup, and what I suggest we do here, is exit(1) if anything went wrong, either in terms of failing to execute queries or in terms of those queries returning problem reports. With pg_verifybackup, I thought about trying to make it like 0 => backup OK, 2 => backup not OK, 2 => trouble, but I found it too hard to distinguish what should be exit(1) and what should be exit(2) and the coding wasn't trivial either, so I went with the simpler scheme. The opening line of appendDatabaseSelect() could be adjusted to put the regexps parameter on the next line, avoiding awkward wrapping. If they are being run with a safe search path, the queries in appendDatabaseSelect(), appendSchemaSelect(), etc. could be run without all the paranoia. If not, maybe they should be. The casts to text don't include the paranoia: with an unsafe search path, we need pg_catalog.text here. Or no cast at all, which seems like it ought to be fine too. Not quite sure why you are doing all that casting to text; the datatype is presumably 'name' and ought to collate like collate "C" which is probably fine. It would probably be a better idea for appendSchemaSelect to declare a PQExpBuffer and call initPQExpBuffer just once, and then resetPQExpBuffer after each use, and finally termPQExpBuffer just once. The way you have it is not expensive enough to really matter, but avoiding repeated allocate/free cycles is probably best. I wonder if a pattern like .foo.bar ends up meaning the same thing as a pattern like foo.bar, with the empty database name being treated the same as if nothing were specified. >From the way appendTableCTE() is coded, it seems to me that if I ask for tables named j* excluding tables named jam* I still might get toast tables for my jam, which seems wrong. There does not seem to be any clear benefit to defining CT_TABLE = 0 in this case, so I would let the compiler deal with it. We should not be depending on that to have any particular numeric value. Why does pg_amcheck.c have a header file pg_amcheck.h if there's only one source file? If you had multiple source files then the header would be a reasonable place to put stuff they all need, but you don't. Copying the definitions of HEAP_TABLE_AM_OID and BTREE_AM_OID into pg_amcheck.h or anywhere else seems bad. I think you just be doing #include "catalog/pg_am_d.h". I think I'm out of steam for today but I'll try to look at this more soon. In general I think this patch and the whole series are pretty close to being ready to commit, even though there are still things I think need fixing here and there. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com