Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

Bryn Llewellyn Thu, 06 Oct 2022 15:54:08 -0700

> list.pg.ga...@pendari.org wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> Does this imply a risk that a future PG version will go against the SQL 
>> standard and reject any non-latin name that is free of all punctuation 
>> characters, when used in the role of a SQL identifier, unless it's double 
>> quoted?
> 
> From my perspective this thread seems to miss the essential purposes behind 
> quote_ident(). It is part of processing external/user input… Expecting an 
> arbitrary string to be equal to itself after it has been through string 
> processing code is risky unless that processing is part of the design, and 
> quote_ident() was never designed to be part of any such arrangement.
> 
> …If the designer decides to use non-ASCII characters in the identifier they 
> can… just double-quote those identifiers.
> 
> AFAIK we never really use quote_ident() except to process external input… 
> There is no surprise for me in the subject line. There is mild surprise the 
> question was asked.


Here's why I fell into the trap that I did.

(1) I used the Oracle RDBMS for a very long time. Now I’m working hard to 
forget everything that I learned there and learn the PG way instead. I’m 
frequently caught out.

(2) ORCL provides a function with the same aim as quote_indent. Its input is 
the text of a name. And its output is the text of a legal SQL identifier for 
that name. The output *always* starts and ends with a double quote. And why 
not? Sometimes the bare text of a name is perfectly acceptable as the text of a 
SQL identifier—and yet it’s still perfectly acceptable in that way when it's 
surrounded with double quotes.

(3) The PG doc on quote_ident says this in large friendly letters:

> Quotes are added only if necessary…


Notice "only". I now know that this is very much not the case. You can compose 
an effectively unlimited number of different examples along these lines:

select quote_ident('redaktør'); → "redaktør"
create table redaktør(n int); → table successfully created

The doc might well have said that quotes are often added when they're not 
needed. I know that the doc also says "to be used as an identifier in an SQL 
statement string". But the reference doc for a function should give a precise 
and accurate specification of the rule that derives the output value from the 
input value(s). (There might well be performance caveats.) As long as this is 
reliable, then the user is free to use the function for any purpose where the 
specified behavior fits the use case.

(4) I do have a genuine use case. I know that, usually, it's best to preface a 
question to a forum like this with a clear statement of what you want to 
achieve. Well, in this case I wanted to achieve a full correct understanding of 
what quote_ident does. I think that I have this now. I accept that no change 
will ever be made to the actual behavior of quote_ident (for example, to have 
it always surround the return text with double quotes). And I accept that 
nobody is motivated to change the docs to stop PG neophytes like me being 
mislead by taking the extant wording at face value.

B.t.w., my specific use case is best solved by spending just a little effort to 
write my own function to check my own, context-specific, spec of what defines a 
bad name. Of course, it uses PG's native regular expression functionality. I 
was simply distracted from that effort when I briefly wondered if the 
ready-made quote_ident might save me some effort. I very soon realized that it 
would not. But that the same time, I couldn't help noticing that its behavior 
was at odds with the doc. This ought to surprise anybody but the most cynical 
amongst us. So naturally I asked about this.

Re: ('dog$house' = quote_ident('dog$house')) is surprisingly FALSE

Reply via email to