Re: [HACKERS] PGDay.it collation discussion notes

Heikki Linnakangas Mon, 20 Oct 2008 02:29:42 -0700

Tom Lane wrote:

> Another objection to this design is that it's completely unclear that
> functions from text to text should necessarily yield the same collation
> that went into them, but if you treat collation as a hard-wired part of
> the expression syntax tree you aren't going to be able to do anything else.
> (What will you do about functions/operators taking more than one text
> argument?)

Whatever the spec says. Collation is intimately associated with the comparison operations, and doesn't make any sense anywhere else. The way the default collation for a given operation is determined, by bubbling up the collation from the operands, through function calls and other expressions, is just to make life a bit easier for the developer who's writing the SQL. We could demand that you always explicitly specify a collation when you use the text equality or inequality operators, but because that would be quite tiresome, a reasonable default is derived from the context.

I believe the spec stipulates how that default is derived, so I don't think we need to fret over it. We'll need it eventually, but the parser changes are not the critical part. We can start off by deriving the collation from a GUC variable, for example.
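
To make the "bubbling up" idea concrete, here is a minimal Python sketch of deriving a default collation for a comparison from its operands. It is deliberately simplified and is not the rule set from the SQL standard, which is more detailed. The node classes, the derive/merge/comparison_collation functions, and DEFAULT_COLLATION (standing in for a GUC-supplied default) are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-in for a default taken from a GUC variable.
DEFAULT_COLLATION = "C"


@dataclass
class Column:
    name: str
    collation: Optional[str] = None  # None: the column declares no collation


@dataclass
class Collate:
    arg: object
    collation: str  # models an explicit "expr COLLATE name"


@dataclass
class FuncCall:
    name: str
    args: Tuple[object, ...]


def merge(derived):
    """Combine (collation, is_explicit) pairs coming from several operands."""
    explicit = {c for c, is_explicit in derived if is_explicit}
    if len(explicit) > 1:
        raise ValueError("conflicting explicit collations")
    if explicit:
        return explicit.pop(), True  # an explicit COLLATE wins
    implicit = {c for c, _ in derived if c is not None}
    if len(implicit) > 1:
        # Simplification: this sketch just refuses to guess.
        raise ValueError("conflicting implicit collations; add COLLATE")
    if implicit:
        return implicit.pop(), False
    return None, False


def derive(node):
    """Return (collation, is_explicit) for an expression node."""
    if isinstance(node, Collate):
        return node.collation, True
    if isinstance(node, Column):
        return node.collation, False
    if isinstance(node, FuncCall):
        # Collation bubbles up through function calls from their arguments.
        return merge([derive(arg) for arg in node.args])
    return None, False  # plain constants carry no collation here


def comparison_collation(left, right) -> str:
    """Collation to use for a comparison such as left < right."""
    collation, _ = merge([derive(left), derive(right)])
    return collation if collation is not None else DEFAULT_COLLATION
```

With this sketch, comparing upper(name) against a constant picks up the collation of the name column, and comparing two bare constants falls back to the default.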

> I think it would be better to treat the collation indicator as part of
> string *values* and let it bubble up through expressions that way.
> The "expr COLLATE ident" syntax would be a simple run-time operation
> that pokes a new collation into a string value.  The notion of a column
> having a particular collation would then amount to a check constraint on
> the values going into the column.


Looking at an individual value, collation just doesn't make sense.
Collation is a property of the comparison operation, not of a value.

In the parser, we might have to do something like that though, because according to the standard you can tack the COLLATE keyword onto string constants and have it bubble up. But let's keep that ugliness just inside the parser.

One impractical way to implement collation would be to have one operator class per collation. In fact you could do that today, with no backend changes, to support multiple collations. It's totally impractical, because for starters you'd need different comparison operators, with different names, for each collation. But it's the right mental model.

I think the right approach is to invent a new concept called "operator modifier". It's basically a 3rd argument to operators. It can be specified explicitly when an operator is used, with syntax like "<left> Op <right> USING <modifier>", or in case of collation, it's derived from the context, per SQL spec. The operator modifier is tacked on to OpExprs and SortClauses in the parser, and passed as a 3rd argument to the function implementing the operator at execution time.
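
As a rough illustration of the operator modifier idea, the Python sketch below passes the modifier as a third argument to the function implementing a comparison, and to a sort. The MODIFIERS table and the text_cmp, text_eq and sort_using names are assumptions made up for this example. The "case_insensitive" entry is a toy stand-in for a real collation, chosen so the example does not depend on the locales installed on any given system.

```python
from functools import cmp_to_key

# Hypothetical modifiers: each maps a string to the key it is compared by.
MODIFIERS = {
    "C": lambda s: s,                             # plain code point order
    "case_insensitive": lambda s: s.casefold(),   # toy stand-in for a collation
}


def text_cmp(left: str, right: str, modifier: str) -> int:
    """Three-way comparison; the modifier is the operator's 3rd argument."""
    key = MODIFIERS[modifier]
    a, b = key(left), key(right)
    return (a > b) - (a < b)


def text_eq(left: str, right: str, modifier: str) -> bool:
    """Models '<left> = <right> USING <modifier>'."""
    return text_cmp(left, right, modifier) == 0


def sort_using(values, modifier: str):
    """Models a sort clause that carries the same modifier as the operator."""
    return sorted(values, key=cmp_to_key(lambda a, b: text_cmp(a, b, modifier)))
```

The same pair of values compares differently depending on the modifier: text_eq("abc", "ABC", "C") is false, while text_eq("abc", "ABC", "case_insensitive") is true. The values themselves carry no collation.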

When an index is created, if the operators in the operator class take an operator modifier, it's stored at creation time into a new column in pg_index (needs to be a vector or array to handle multi-column indexes). The planner needs to check the modifier when it determines whether an index can be used or not.
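
The planner-side check could look roughly like the following Python sketch. It only models the matching rule described above, a per-column modifier stored with the index and compared against the modifier of the operator in the query. IndexInfo, Qual, index_matches and usable_indexes are hypothetical names, not existing planner structures.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IndexInfo:
    name: str
    columns: Tuple[str, ...]
    # One entry per index column, stored at index creation time.
    # None means the operator class takes no modifier for that column.
    modifiers: Tuple[Optional[str], ...]


@dataclass(frozen=True)
class Qual:
    column: str
    modifier: Optional[str]  # modifier attached to the operator in the query


def index_matches(index: IndexInfo, qual: Qual) -> bool:
    """True if some index column matches both the column and the modifier."""
    return any(
        col == qual.column and mod == qual.modifier
        for col, mod in zip(index.columns, index.modifiers)
    )


def usable_indexes(indexes, qual: Qual):
    """Names of the indexes that could serve this qual."""
    return [index.name for index in indexes if index_matches(index, qual)]
```

An index built with one modifier is simply not considered for a qual that uses another, even though both are on the same column.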

BTW, this reminds me of the discussions we had about the tsearch default configuration. It's different, though, because in full text search, there's a separate tsvector data type, and the problem was with expression indexes, not regular ones.


Another consideration is LC_CTYPE. Just like we want to support
different collations, we should support different character
classifications for upper()/lower(). We might want to tie it into
collation, as using different ctype and collation doesn't usually make
sense, but it's something to keep in mind.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


