Re: [HACKERS] PGDay.it collation discussion notes

Gregory Stark Sat, 18 Oct 2008 09:07:56 -0700

Tom Lane <[EMAIL PROTECTED]> writes:

> It's fairly irritating to think that a string-specific option is going
> to become part of the fundamental type system --- it makes no sense to
> distinguish different collations for numeric for instance


Actually I thought of that generality as an advantage. Just because we can't
think of any right now doesn't mean there aren't applications of this. The
only example I could think of was a comparison operator on numeric which
specifies a significant precision. That doesn't sound super useful but there
are a lot of data types out there and I don't see any reason to think text is
the only one in the world that will have more than one reasonable ordering.
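
To make the numeric example concrete, here is a minimal sketch of a comparison
that only looks at a chosen number of significant digits. The names
precision_key and compare_at_precision are made up for illustration and are not
anything in PostgreSQL; the rounding mode is an arbitrary assumption.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN


def precision_key(value, digits):
    """Round value to `digits` significant digits (hypothetical helper)."""
    # create_decimal applies the context precision and rounding to the value.
    ctx = Context(prec=digits, rounding=ROUND_HALF_EVEN)
    return ctx.create_decimal(value)


def compare_at_precision(a, b, digits):
    """Three-way comparison: -1, 0 or 1, ignoring digits past `digits`."""
    ka, kb = precision_key(a, digits), precision_key(b, digits)
    return (ka > kb) - (ka < kb)


if __name__ == "__main__":
    a, b = Decimal("1.2345"), Decimal("1.2349")
    print(compare_at_precision(a, b, 3))  # equal at 3 significant digits
    print(compare_at_precision(a, b, 5))  # distinct at 5 significant digits
```

The same values compare as equal or unequal depending on the precision chosen,
which is the sense in which a non-text type could have more than one
reasonable ordering.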

> I think it would be better to treat the collation indicator as part of
> string *values* and let it bubble up through expressions that way.
> The "expr COLLATE ident" syntax would be a simple run-time operation
> that pokes a new collation into a string value.  The notion of a column
> having a particular collation would then amount to a check constraint on
> the values going into the column.

I'm not super familiar with the spec here but from what I understood I think
this would be very different.

For instance, I think you need to be able to set the default collation on a
whole column after the fact. Rewriting the whole table to handle a collation
change seems like a non-starter.

Also, if the column doesn't have a default collation specified then you need
to use the default collation for a more general object -- I'm not sure if it's
table or schema next.

Thirdly, to resolve conflicting default collations you need to track where
each collation came from, i.e., whether it was a default or an explicit choice
made by the query.
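
A rough sketch of the kind of bookkeeping that implies, assuming a simplified
rule set (explicit beats implicit, two different explicit collations are an
error, two different implicit ones leave the result undetermined). This is my
reading of the general idea, not a transcription of the spec, and the
Derivation and resolve names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Derivation:
    """A collation plus where it came from (hypothetical structure)."""
    collation: Optional[str]  # None means no collation could be determined
    explicit: bool            # True if set by COLLATE in the query


def resolve(left: Derivation, right: Derivation) -> Derivation:
    """Combine the derivations of two operands of one expression."""
    if left.explicit and right.explicit:
        if left.collation != right.collation:
            raise ValueError("conflicting explicit collations")
        return left
    if left.explicit:
        return left           # an explicit choice overrides a default
    if right.explicit:
        return right
    if left.collation == right.collation:
        return left           # both defaults agree
    # Two different defaults: nothing to prefer, so leave it undetermined.
    return Derivation(None, False)


if __name__ == "__main__":
    column_default = Derivation("it_IT", explicit=False)
    query_choice = Derivation("en_US", explicit=True)
    print(resolve(column_default, query_choice))
```

Without the explicit flag the first and last cases are indistinguishable,
which is why the source of the collation has to travel with the expression.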

Collation isn't really a property of the text at all. This design would force
the sources of text to pick a collation that will be used by other parts of
the application that they know nothing about. How is a DBA using COPY to
populate a table going to know what collation the web app which eventually
uses the data in that table will want to use?

The other side of the coin is that given the spec-compliant behaviour you can
always emulate the behaviour you're describing by adding another column. It
would be more useful too since you'll have a "language" column which may be
useful independently from the text content.

And of course the scheme you're describing would waste a huge amount of space
in every string on disk. For short strings it could triple the amount of space
(plus I think the explicit vs implicit collation would make it even worse).
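
The arithmetic behind that can be sketched as follows. The 4-byte collation
tag and the 1-byte explicit/implicit flag are assumptions picked purely for
illustration, not a proposed on-disk format, and any existing per-value header
is ignored.

```python
def stored_size(text_bytes: int, tag_bytes: int = 4, flag_bytes: int = 0) -> int:
    """Bytes needed for a string that carries its own collation tag.

    tag_bytes and flag_bytes are assumed sizes, not real PostgreSQL values.
    """
    return text_bytes + tag_bytes + flag_bytes


def growth_factor(text_bytes: int, tag_bytes: int = 4, flag_bytes: int = 0) -> float:
    """How many times larger the tagged string is than the bare text."""
    if text_bytes <= 0:
        raise ValueError("text_bytes must be positive")
    return stored_size(text_bytes, tag_bytes, flag_bytes) / text_bytes


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, growth_factor(n), growth_factor(n, flag_bytes=1))
```

Under these assumptions a 2-byte string grows by a factor of 3, while a
100-byte string grows by only 4%, so the cost falls almost entirely on short
values.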

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!
