> On Oct 13, 2021, at 1:43 PM, Mark Dilger <mark.dil...@enterprisedb.com> wrote:
> 
> The issue of name parsing impacts pg_dump and pg_dumpall, also.  Consider 
> what happens with:
> 
> pg_dump -t production.critical.secrets > secrets.dump
> dropdb production
> 
> In v13, if your default database is "testing", and database "testing" has the 
> same schemas and tables (but not data) as production, you are unhappy.  You 
> just dumped a copy of your test data and blew away the production data.
> 
> You could end up unhappy in v14, if database "testing" has a schema named 
> "production" and a table that matches the pattern /^critical.secrets$/, but 
> otherwise, you'll get an error from pg_dump, "pg_dump: error: no matching 
> tables were found".  Neither behavior seems correct.

With the attached patch, this scenario results in a "cross-database references 
are not implemented" error.

> The function where the processing occurs is processSQLNamePattern, which is 
> called by pg_dump, pg_dumpall, and psql.  All three callers expect 
> processSQLNamePattern to append where-clauses to a buffer, not to execute any 
> sql of its own.  I propose that processSQLNamePattern return an error code if 
> the pattern contains more than three parts, but otherwise insert the database 
> portion into the buffer as a "pg_catalog.current_database() 
> OPERATOR(pg_catalog.=) <database>", where <database> is a properly escaped 
> representation of the database portion.  Maybe someday we can change that to 
> OPERATOR(pg_catalog.~), but for now we lack the sufficient logic for handling 
> multiple matching database names.  (The situation is different for 
> pg_dumpall, as it's using the normal logic for matching a relation name, not 
> for matching a database, and we'd still be fine matching that against a 
> pattern.)

I ultimately went with your strcmp idea rather than OPERATOR(pg_catalog.=), as 
rejecting the database name as part of the query complicates the calling 
convention for no apparent benefit.  I had been concerned about database names 
that were collation-wise equal but byte-wise unequal, but it seems we already 
treat those as distinct database names, so my concern was unnecessary.  We 
already use strcmp on database names from frontend clients 
(fe_utils/parallel_slots.c, psql/prompt.c, pg_amcheck.c, pg_dump.c, 
pg_upgrade/relfilenode.c), from libpq (libpq/hba.c) and from the backend 
(commands/dbcommands.c, init/postinit.c).  

I tried testing how this plays out by handing `createdb` the name é (U+00E9 
"LATIN SMALL LETTER E WITH ACCUTE") and then again the name é (U+0065 "LATIN 
SMALL LETTER E" followed by U+0301 "COMBINING ACCUTE ACCENT".)  That results in 
two distinct databases, not an error about a duplicate database name:

# select oid, datname, datdba, encoding, datcollate, datctype from 
pg_catalog.pg_database where datname IN ('é', 'é');
  oid  | datname | datdba | encoding | datcollate  |  datctype   
-------+---------+--------+----------+-------------+-------------
 37852 | é       |     10 |        6 | en_US.UTF-8 | en_US.UTF-8
 37855 | é       |     10 |        6 | en_US.UTF-8 | en_US.UTF-8
(2 rows)

But that doesn't seem to prove much, as other tools in my locale don't treat 
those as equal either.  (Testing with perl's "eq" operator, they compare as 
distinct.)  I expected to find regression tests providing better coverage for 
this somewhere, but did not.  Anybody know more about it?

> For psql and pg_dump, I'm tempted to restrict the database portion (if not 
> quoted) to neither contain shell glob characters nor POSIX regex characters, 
> and return an error code if any are found, so that the clients can raise an 
> appropriate error to the user.

With the patch, using pattern characters in an unquoted database portion 
results in a "database name must be literal" error.  Using them in a quoted 
database name is allowed, but unless you are connected to a database that 
literally equals that name, you will get a "cross-database references are not 
implemented" error.

> In psql, this proposal would result in no tables matching \d 
> wrongdb.schema.table, which would differ from v13's behavior.  You wouldn't 
> get an error about having specified the wrong database.  You'd just get no 
> matching relations.  \d ??db??.schema.table would complain about the db 
> portion being a pattern.  \d "??db??".schema.table would work, assuming 
> you're connected to a database literally named ??db??

With the patch, psql will treat \d wrongdb.schema.table as a "cross-database 
references are not implemented" error.

> In pg_dumpall, --exclude-database=more.than.one.part would give an error 
> about too many dotted parts rather than simply trying to exclude the last 
> "part" and silently ignoring the prefix, which I think is what v13's 
> pg_dumpall would do.  --exclude-database=db?? would work to exclude four 
> character database names beginning in "db".

The patch implements this.

> In pg_dump, the -t wrongdb.schema.table would match nothing and give the 
> familiar error "pg_dump: error: no matching tables were found".

With the patch, pg_dump instead gives a "cross-database references are not 
implemented" error.

>   pg_dump -t too.many.dotted.names would give a different error about too 
> many parts.

With the patch, pg_dump instead gives a "improper qualified name (too many 
dotted names)" error.

> pg_dump -t db??.foo.bar would give an error about the database needing to be 
> a literal name rather than a pattern.

With the patch, pg_dump gives a "database name must be literal" error.  This is 
the only new error message in the patch, which puts a burden on translators, 
but I didn't see any existing message that would serve.  Suggestions welcome.

> I don't like your proposal to use a strcmp() rather than a pg_catalog.= 
> match, because it diverges from how the rest of the pattern is treated, 
> including in how encoding settings might interact with the name, needing to 
> be executed on the client side rather than in the server where the rest of 
> the name resolution is happening. 

Recanted, as discussed above.


The patch only changes the behavior of pg_amcheck in that it now rejects 
patterns with too many parts.  Using database patterns was and remains legal 
for this tool.

The patch changes nothing about reindexdb.  That's a debatable design choice, 
but reindexdb doesn't use string_utils's processSQLNamePattern() function as 
the other tools do, nor does its documentation reference psql's 
#APP-PSQL-PATTERNS documentation.  It's --schema option only takes literal 
names.


Attachment: v1-0001-Reject-patterns-with-too-many-parts-or-wrong-db.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Reply via email to