On 19 June 2012 18:57, Kevin Grittner <kevin.gritt...@wicourts.gov> wrote: > We weren't using en_US.UTF-8 collation (or any other "proper" > collation) on Sybase -- I'm not sure whether they even supported > proper collation sequences on the versions we used. I'm thinking of > when we were using their "case insensitive" sorting. I don't know > the implementation details, but the behavior was consistent with > including each character-based column twice: once in the requested > position in the ORDER BY clause but folded to a consistent case, and > again after all the columns in the ORDER BY clause in original form, > with C collation. > > I wasn't aware that en_US.UTF-8 doesn't have equivalence without > equality.
Not that many do. The underlying cause of the problem back in 2005 was the tacit assumption that none do, which presumably we got away with for a while. I mentioned Hungarian, but it happens a bit in Swedish too. PostgreSQL supported Unicode before 2005, when the tie-breaker was introduced. I know at least one Swede who used Postgres95. I just took a look at the REL6_4 branch, and it looks much the same in 1999 as it did in 2005, in that there is no tie-breaker after the strcoll(). Now, that being the case, and Hungarian in particular having a whole bunch of these equivalencies, I have to wonder if the original complainant's problem really was diagnosed correctly. It could of had something to do with the fact that texteq() was confused about whether it reported equality or equivalency - it may have taken that long for the (len1 != len2) fastpath thing (only holds for equality, not equivalence, despite the fact that the 2005-era strcoll() call checks equivalence within texteq() ) to trip someone out, because texteq() would have thereby given inconsistent answers in a very subtle way, that were not correct either according to the Hungarian locale, nor according to simple bitwise equality. That's mostly speculation, but the question must be asked. > I guess that surprising result in my last post is just > plain inevitable with that collation then. Bummer. Is there > actually anyone who finds that to be a useful behavior? Looking at the details again and assuming a US locale, yeah, it is. The substance of your complain holds though. > For a collation which considered upper-case and lower-case to be > equivalent, would PostgreSQL sort as I wanted, or is it doing some > tie-break per column within equivalent values? You could do that, and some people do use custom collations for various reasons. That's obviously very much of minority interest though. Most people will just use citext or something. However, since citext is itself a client of varstr_cmp(), this won't help you. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers