On Wed, Mar 10, 2021 at 11:10 AM Mark Dilger <mark.dil...@enterprisedb.com> wrote: > Once again, I think you are right and have removed the objectionable > behavior, but.... > > The --startblock and --endblock options make the most sense when the user is > only checking one table, like > > pg_amcheck --startblock=17 --endblock=19 --table=my_schema.my_corrupt_table > > because the user likely has some knowledge about that table, perhaps from a > prior run of pg_amcheck. The --startblock and --endblock arguments are a bit > strange when used globally, as relations don't all have the same number of > blocks, so > > pg_amcheck --startblock=17 --endblock=19 mydb > > will very likely emit lots of error messages for tables which don't have > blocks in that range. That's not entirely pg_amcheck's fault, as it just did > what the user asked, but it also doesn't seem super helpful. I'm not going > to do anything about it in this release.
+1 to all that. I tend toward the opinion that trying to make --startblock and --endblock do anything useful in the context of checking multiple relations is not really possible, and therefore we just shouldn't put any effort into it. But if user feedback shows otherwise, we can always do something about it later. > After running 'make installcheck', if I delete all entries from pg_class > where relnamespace = 'pg_toast'::regclass, by running 'pg_amcheck > regression', I get lines that look like this: > > heap relation "regression"."public"."quad_poly_tbl": > ERROR: could not open relation with OID 17177 In this here example, the first line ends in a colon. > relation "regression"."public"."functional_dependencies", block 28, offset > 54, attribute 0 > attribute 0 with length 4294967295 ends at offset 50 beyond total tuple > length 43 But this here one does not. Seems like it should be consistent. The QUALIFIED_NAME_FIELDS macro doesn't seem to be used anywhere, which is good, because macros with unbalanced parentheses are usually not a good plan; and a macro that expands to a comma-separated list of things is suspect too. "invalid skip options\n" seems too plural. With regard to your use of strtol() for --{start,end}block, telling the user that their input is garbage seems pejorative, even though it may be accurate. Compare: [rhaas EDBAS]$ pg_dump -jdsgdsgd pg_dump: error: invalid number of parallel jobs In the message "relation end block argument precedes start block argument\n", I think you could lose both instances of the word "argument" and probably the word "relation" as well. I actually don't know why all of these messages about start and end block mention "relation". It's not like there is some other kind of non-relation-related start block with which it could be confused. The comment for run_command() explains some things about the cparams argument, but those things are false. In fact the argument is unused. Usual PostgreSQL practice when freeing memory in e.g. verify_heap_slot_handler is to set the pointers to NULL as well. The performance cost of this is trivial, and it makes debugging a lot easier should somebody accidentally write code to access one of those things after it's been freed. The documentation says that -D "does exclude any database that was listed explicitly as dbname on the command line, nor does it exclude the database chosen in the absence of any dbname argument." The first part of this makes complete sense to me, but I'm not sure about the second part. If I type pg_amcheck --all -D 'r*', I think I'm expecting that "rhaas" won't be checked. Likewise, if I say pg_amcheck -d 'bob*', I think I only want to check the bob-related databases and not rhaas. I suggest documenting --endblock as "Check table blocks up to and including the specified ending block number. An error will occur if a relation being checked has fewer than this number of blocks." And similarly for --startblock: "Check table blocks beginning with the specified block number. An error will occur, etc." Perhaps even mention something like "This option is probably only useful when checking a single table." Also, the documentation here isn't clear that this affects only table checking, not index checking. It appears that pg_amcheck sometimes makes dummy connections to the database that don't do anything, e.g. pg_amcheck -t 'q*' resulted in: 2021-03-10 15:00:14.273 EST [95473] LOG: connection received: host=[local] 2021-03-10 15:00:14.274 EST [95473] LOG: connection authorized: user=rhaas database=rhaas application_name=pg_amcheck 2021-03-10 15:00:14.286 EST [95473] LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); 2021-03-10 15:00:14.290 EST [95464] DEBUG: forked new backend, pid=95474 socket=11 2021-03-10 15:00:14.291 EST [95464] DEBUG: server process (PID 95473) exited with exit code 0 2021-03-10 15:00:14.291 EST [95474] LOG: connection received: host=[local] 2021-03-10 15:00:14.293 EST [95474] LOG: connection authorized: user=rhaas database=rhaas application_name=pg_amcheck 2021-03-10 15:00:14.296 EST [95474] LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); <...more queries from PID 95474...> 2021-03-10 15:00:14.321 EST [95464] DEBUG: server process (PID 95474) exited with exit code 0 It doesn't seem to make sense to connect to a database, set the search path, exit, and then immediately reconnect to the same database. This is slightly inconsistent: pg_amcheck: checking heap table "rhaas"."public"."foo" heap relation "rhaas"."public"."foo": ERROR: XX000: catalog is missing 144 attribute(s) for relid 16392 LOCATION: RelationBuildTupleDesc, relcache.c:652 query was: SELECT blkno, offnum, attnum, msg FROM "public".verify_heapam( relation := 16392, on_error_stop := false, check_toast := true, skip := 'none') In line 1 it's a heap table, but in line 2 it's a heap relation. That's all I've got. -- Robert Haas EDB: http://www.enterprisedb.com