On Wed, Mar 3, 2021 at 10:22 AM Mark Dilger <mark.dil...@enterprisedb.com> wrote: > > Your four messages about there being nothing to check seem like they > > could be consolidated down to one: "nothing to check for pattern > > \"%s\"". > > I anticipated your review comment, but I'm worried about the case that > somebody runs > > pg_amcheck -t "foo" -i "foo" > > and one of those matches and the other does not. The message 'nothing to > check for pattern "foo"' will be wrong (because there was something to check > for it) and unhelpful (because it doesn't say which failed to match.)
Fair point. > Changed, though I assumed your parens for corruption() were not intended. Uh, yeah. > Thanks for the review! + fprintf(stderr, "%s: no relations to check", progname); Missing newline. Generally, I would favor using pg_log_whatever as a way of reporting messages starting when option parsing is complete. In other words, starting here: + fprintf(stderr, "%s: no databases to check\n", progname); I see no real advantage in having a bunch of these using fprintf(stderr, ...), which to me seems most appropriate only for very early failures. Perhaps amcheck_sql could be spread across fewer lines, now that it doesn't have so many decorations? pg_basebackup uses -P as a short form for --progress, so maybe we should match that here. When I do "pg_amcheck --progress", it just says "259/259 (100%)" which I don't find too clear. The corresponding pg_basebackup output is "32332/32332 kB (100%), 1/1 tablespace" which has the advantage of including units. I think if you just add the word "relations" to your message it will be nicer. When I do "pg_amcheck -s public" it tells me that there are no relations to check in schemas for "public". I think "schemas matching" would read better than "schemas for." Similar with the other messages. When I try "pg_amcheck -t nessie" it tells me that there are no tables to check for "nessie" but saying that there are no tables to check matching "nessie" to me sounds more natural. The code doesn't seem real clear on the difference between a database name and a pattern. Consider: createdb rhaas createdb 'rh*s' PGDATABASE='rh*s' pg_amcheck It checks the rhaas database, which I venture to say is just plain wrong. The error message when I exclude the only checkable database is not very clear. "pg_amcheck -D rhaas" says pg_amcheck: no checkable database: "rhaas". Well, I get that there's no checkable database. But as a user I have no idea what "rhaas" is. I can even get it to issue this complaint more than once: createdb q createdb qq pg_amcheck -D 'q*' q qq Now it issues the "no checkable database" complaint twice, once for q and once for qq. But if there's no checkable database, I only need to know that once. Either the message is wrongly-worded, or it should only be issued once and doesn't need to include the pattern. I think it's the second one, but I could be wrong. Using a pattern as the only or first argument doesn't work; i.e. "pg_amcheck rhaas" works but "pg_amcheck rhaa?" fails because there is no database with that exact literal name. This seems like another instance of confusion between a literal database name and a database name pattern. I'm not quite sure what the right solution is here. We could give up on having database patterns altogether -- the comparable issue does not arise for database and schema name patterns -- or the maintenance database could default to something that's not going to be a pattern, like "postgres," rather than being taken from a command-line argument that is intended to be a pattern. Or some hybrid approach e.g. -d options are patterns, but don't set the maintenance database, while extra command line arguments are literal database names, and thus are presumably OK to use as the maintenance DB. But it's too weird IMHO to support patterns here and then have supplying one inevitably fail unless you also specify --maintenance-db. It's sorta annoying that there doesn't seem to be an easy way to find out exactly what relations got checked as a result of whatever I did. Perhaps pg_amcheck -v should print a line for each relation saying that it's checking that relation; it's not actually that verbose as things stand. If we thought that was overdoing it, we could set things up so that multiple -v options keep increasing the verbosity level, so that you can get this via pg_amcheck -vv. I submit that pg_amcheck -e is not useful for this purpose because the queries, besides being long, use the relation OIDs rather than the names, so it's not easy to see what happened. I think that something's not working in terms of schema exclusion. If I create a brand-new database and then run "pg_amcheck -S pg_catalog -S information_schema -S pg_toast" it still checks stuff. In fact it seems to check the exact same amount of stuff that it checks if I run it with no command-line options at all. In fact, if I run "pg_amcheck -S '*'" that still checks everything. Unless I'm misunderstanding what this option is supposed to do, the fact that a version of this patch where this seemingly doesn't work at all escaped to the list suggests that your testing has got some gaps. I like the semantics of --no-toast-expansion and --no-index-expansion as you now have them, but I find I don't really like the names. Could I suggest --no-dependent-indexes and --no-dependent-toast? I tried pg_amcheck --startblock=tsgsdg and got an error message without a trailing newline. I tried --startblock=-525523 and got no error. I tried --startblock=99999999999999999999999999 and got a complaint that the value was out of bounds, but without a trailing newline. Maybe there's an argument that the bounds don't need to be checked, but surely there's no argument for checking one and not the other. I haven't tried the corresponding cases with --endblock but you should. I tried --startblock=2 --endblock=1 and got a complaint that the ending block precedes the starting block, which is totally reasonable (though I might say "start block" and "end block" rather than using the -ing forms) but this message is prefixed with "pg_amcheck: " whereas the messages about an altogether invalid starting block where not so prefixed. Is there a reason not to make this consistent? I also tried using a random positive integer for startblock, and for every relation I am told "ERROR: starting block number must be between 0 and <whatever>". That makes sense, because I used a big number for the start block and I don't have any big relations, but it makes for an absolute ton of output, because every verify_heapam query is 11 lines long. This suggests a couple of possible improvements. First, maybe we should only display the query that produced the error in verbose mode. Second, maybe the verify_heapam() query should be tightened up so that it doesn't stretch across quite so many lines. I think the call to verify_heapam() could be spread across like 2 lines rather than 7, which would improve readability. On a related note, I wonder why we need every verify_heapam() call to join to pg_class and pg_namespace just to fetch the schema and table name which, presumably, we should or at least could already have. This kinda relates to my comment earlier about making -v print a message per relation so that we can see, in human-readable format, which relations are getting checked. Right now, if you got an error checking just one relation, how would you know which relation you got it from? Unless the server happens to report that information in the message, you're just in the dark, because pg_amcheck won't tell you. The line "Read the description of the amcheck contrib module for details" seems like it could be omitted. Perhaps the first line of the help message could be changed to read "pg_amcheck uses amcheck to find corruption in a PostgreSQL database." or something like that, instead. -- Robert Haas EDB: http://www.enterprisedb.com