> On Oct 13, 2021, at 1:43 PM, Mark Dilger <mark.dil...@enterprisedb.com> wrote: > > The issue of name parsing impacts pg_dump and pg_dumpall, also. Consider > what happens with: > > pg_dump -t production.critical.secrets > secrets.dump > dropdb production > > In v13, if your default database is "testing", and database "testing" has the > same schemas and tables (but not data) as production, you are unhappy. You > just dumped a copy of your test data and blew away the production data. > > You could end up unhappy in v14, if database "testing" has a schema named > "production" and a table that matches the pattern /^critical.secrets$/, but > otherwise, you'll get an error from pg_dump, "pg_dump: error: no matching > tables were found". Neither behavior seems correct.
With the attached patch, this scenario results in a "cross-database references are not implemented" error. > The function where the processing occurs is processSQLNamePattern, which is > called by pg_dump, pg_dumpall, and psql. All three callers expect > processSQLNamePattern to append where-clauses to a buffer, not to execute any > sql of its own. I propose that processSQLNamePattern return an error code if > the pattern contains more than three parts, but otherwise insert the database > portion into the buffer as a "pg_catalog.current_database() > OPERATOR(pg_catalog.=) <database>", where <database> is a properly escaped > representation of the database portion. Maybe someday we can change that to > OPERATOR(pg_catalog.~), but for now we lack the sufficient logic for handling > multiple matching database names. (The situation is different for > pg_dumpall, as it's using the normal logic for matching a relation name, not > for matching a database, and we'd still be fine matching that against a > pattern.) I ultimately went with your strcmp idea rather than OPERATOR(pg_catalog.=), as rejecting the database name as part of the query complicates the calling convention for no apparent benefit. I had been concerned about database names that were collation-wise equal but byte-wise unequal, but it seems we already treat those as distinct database names, so my concern was unnecessary. We already use strcmp on database names from frontend clients (fe_utils/parallel_slots.c, psql/prompt.c, pg_amcheck.c, pg_dump.c, pg_upgrade/relfilenode.c), from libpq (libpq/hba.c) and from the backend (commands/dbcommands.c, init/postinit.c). I tried testing how this plays out by handing `createdb` the name é (U+00E9 "LATIN SMALL LETTER E WITH ACCUTE") and then again the name é (U+0065 "LATIN SMALL LETTER E" followed by U+0301 "COMBINING ACCUTE ACCENT".) That results in two distinct databases, not an error about a duplicate database name: # select oid, datname, datdba, encoding, datcollate, datctype from pg_catalog.pg_database where datname IN ('é', 'é'); oid | datname | datdba | encoding | datcollate | datctype -------+---------+--------+----------+-------------+------------- 37852 | é | 10 | 6 | en_US.UTF-8 | en_US.UTF-8 37855 | é | 10 | 6 | en_US.UTF-8 | en_US.UTF-8 (2 rows) But that doesn't seem to prove much, as other tools in my locale don't treat those as equal either. (Testing with perl's "eq" operator, they compare as distinct.) I expected to find regression tests providing better coverage for this somewhere, but did not. Anybody know more about it? > For psql and pg_dump, I'm tempted to restrict the database portion (if not > quoted) to neither contain shell glob characters nor POSIX regex characters, > and return an error code if any are found, so that the clients can raise an > appropriate error to the user. With the patch, using pattern characters in an unquoted database portion results in a "database name must be literal" error. Using them in a quoted database name is allowed, but unless you are connected to a database that literally equals that name, you will get a "cross-database references are not implemented" error. > In psql, this proposal would result in no tables matching \d > wrongdb.schema.table, which would differ from v13's behavior. You wouldn't > get an error about having specified the wrong database. You'd just get no > matching relations. \d ??db??.schema.table would complain about the db > portion being a pattern. \d "??db??".schema.table would work, assuming > you're connected to a database literally named ??db?? With the patch, psql will treat \d wrongdb.schema.table as a "cross-database references are not implemented" error. > In pg_dumpall, --exclude-database=more.than.one.part would give an error > about too many dotted parts rather than simply trying to exclude the last > "part" and silently ignoring the prefix, which I think is what v13's > pg_dumpall would do. --exclude-database=db?? would work to exclude four > character database names beginning in "db". The patch implements this. > In pg_dump, the -t wrongdb.schema.table would match nothing and give the > familiar error "pg_dump: error: no matching tables were found". With the patch, pg_dump instead gives a "cross-database references are not implemented" error. > pg_dump -t too.many.dotted.names would give a different error about too > many parts. With the patch, pg_dump instead gives a "improper qualified name (too many dotted names)" error. > pg_dump -t db??.foo.bar would give an error about the database needing to be > a literal name rather than a pattern. With the patch, pg_dump gives a "database name must be literal" error. This is the only new error message in the patch, which puts a burden on translators, but I didn't see any existing message that would serve. Suggestions welcome. > I don't like your proposal to use a strcmp() rather than a pg_catalog.= > match, because it diverges from how the rest of the pattern is treated, > including in how encoding settings might interact with the name, needing to > be executed on the client side rather than in the server where the rest of > the name resolution is happening. Recanted, as discussed above. The patch only changes the behavior of pg_amcheck in that it now rejects patterns with too many parts. Using database patterns was and remains legal for this tool. The patch changes nothing about reindexdb. That's a debatable design choice, but reindexdb doesn't use string_utils's processSQLNamePattern() function as the other tools do, nor does its documentation reference psql's #APP-PSQL-PATTERNS documentation. It's --schema option only takes literal names.
v1-0001-Reject-patterns-with-too-many-parts-or-wrong-db.patch
Description: Binary data
— Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company