On Thu, Jun 9, 2011 at 1:22 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Thu, Jun 9, 2011 at 11:17 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Hmm ... while the above is easy enough to do in the backend, where we
>>> can look at pg_database_encoding_max_length, we have also got instances
>>> of this coding pattern in src/port/pgstrcasecmp.c. It's a lot less
>>> obvious how to make the test in frontend environments. Thoughts anyone?
>
>> I'm not sure if this helps at all, but an awful lot of those tests are
>> against hard-coded strings that are known to contain only ASCII
>> characters. Is there some way we can optimize this for that case?
>
> For the places where we're just looking for a match to a fixed all-ASCII
> string, an ASCII-only downcasing would be sufficient, and would
> eliminate the whole problem. But I doubt all the callers fall into that
> class.
>
> What I'm particularly worried about at the moment is whether we are
> assuming anywhere that the frontend side can duplicate the backend's
> identifier downcasing behavior. That seems like a complete morass,
> because (1) they might not have the same locale, (2) they might not
> have the same encoding, (3) even if they do, the "same" locale is known
> to behave differently on different platforms.

Right. Understood. So let's look at the cases (from git grep pg_strcasecmp and pg_strncasecmp):

contrib/dict_int: Fixed strings only, and it's all backend code anyway.
contrib/dict_xsyn: Fixed strings only, and it's all backend code anyway.
contrib/hstore: Fixed strings only, and it's all backend code anyway.
contrib/pg_upgrade: Used to compare LC_COLLATE, LC_CTYPE, and encoding names.
contrib/pgbench: Definitely front-end code, but it's all fixed strings.
contrib/pgcrypto: All fixed strings except for one instance in px_find_digest. But it's all backend code.
contrib/spi: One instance, not a fixed string, but it's backend code.
contrib/unaccent: One instance, not a fixed string, but it's backend code.
src/backend/*: Backend code, obviously.
src/bin/initdb: Strings from a constant lookup table (tsearch_config_languages) only.
src/bin/pg_basebackup: Fixed strings only.
src/bin/pg_ctl: Fixed strings only.
src/bin/pg_dump: Fixed strings only.
src/bin/psql: Fixed strings only. In a couple of cases they are not literal constants: help.c uses strings from the generated file sql_help.h, and tab-complete.c uses strings from a constant array called words_after_create[]. But these are constant lookup tables.
src/include: access/reloptions.h uses strncasecmp() as part of a macro. That should be OK as long as no one tries to include this in frontend code, which seems rather impractical.
src/interfaces/ecpg/ecpglib: Fixed strings.
src/interfaces/ecpg/pgtypeslib: Fixed strings, and strings from a constant lookup table, only.
src/interfaces/ecpg/preproc: This looks a bit worrisome. It seems we might be using it on identifiers here.
src/interfaces/libpq: This is attempting to match a wildcard certificate name against a hostname, in two different places.
src/port/chklocale.c: Fixed strings or ones from a lookup table.
src/timezone/pgtz.c: Matches input strings against filenames read from the OS.

So mostly I think these are OK.
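
For the callers that only compare against fixed all-ASCII strings, the ASCII-only downcasing Tom mentions could look roughly like the following. This is only a sketch: ascii_tolower and ascii_strcasecmp are hypothetical names, not existing functions in src/port, and the code deliberately leaves bytes with the high bit set alone so that the result does not depend on locale or encoding.

```c
/*
 * Hypothetical helper (not part of src/port): fold only 'A'..'Z'.
 * Any other byte, including those >= 0x80, is returned unchanged,
 * so no locale or encoding knowledge is needed.
 */
static unsigned char
ascii_tolower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    return ch;
}

/*
 * Hypothetical ASCII-only analogue of pg_strcasecmp().
 * Returns <0, 0, or >0 like strcmp(), comparing bytes after
 * ASCII-only downcasing.
 */
static int
ascii_strcasecmp(const char *s1, const char *s2)
{
    for (;;)
    {
        unsigned char ch1 = ascii_tolower((unsigned char) *s1++);
        unsigned char ch2 = ascii_tolower((unsigned char) *s2++);

        if (ch1 != ch2)
            return (int) ch1 - (int) ch2;
        if (ch1 == 0)
            return 0;       /* both strings ended together */
    }
}
```

Something like ascii_strcasecmp("UTF8", "utf8") would return 0 in frontend and backend alike, since no locale-dependent call is involved. It would not be a substitute wherever the comparison has to mimic the backend's identifier downcasing.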
The instance in src/interfaces/ecpg/preproc looks like the most likely candidate for a problem spot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)