2010/11/16 Peter Eisentraut <pete...@gmx.net>:
> On tis, 2010-11-16 at 20:00 +0100, Pavel Stehule wrote:
>> yes - my first question is: why do we need to specify the encoding when
>> only one encoding is supported? I can't use cs_CZ.iso88592 when my db
>> uses UTF8 - btw the message is misleading:
>>
>> yyy=# select * from jmena order by jmeno collate "cs_CZ.iso88592";
>> ERROR:  collation "cs_CZ.iso88592" for current database encoding
>> "UTF8" does not exist
>> LINE 1: select * from jmena order by jmeno collate "cs_CZ.iso88592";
>>                                            ^
>
> Sorry, is there some mistake in that message?
>
It is unclear - I would expect something like "cannot use collation
cs_CZ.iso88592, because your database uses the UTF8 encoding".

>> I don't know why, but the preferred encoding for Czech is iso88592 now -
>> but I can't use it - so I can't use the names "czech" or "cs_CZ". I
>> always have to use the full name "cs_CZ.utf8". That's wrong. What's more,
>> from that moment my application depends on the encoding that was used
>> first - I can't change the encoding without refactoring my SQL
>> statements, because the encoding is hard-coded there (in the collation
>> clause).
>
> I can only look at the locales that the operating system provides. We
> could conceivably make some simplifications like stripping off the
> ".utf8", but then how far do we go and where do we stop? Locale names
> on Windows look different too. But in general, how do you suppose we
> should map an operating system locale name to an "acceptable" SQL
> identifier? You might hope, for example, that we could look through the
> list of operating system locale names and map, say,
>
> cs_CZ -> "czech"
> cs_CZ.iso88592 -> "czech"
> cs_CZ.utf8 -> "czech"
> czech -> "czech"
>
> but we have no way to actually know that these are semantically similar,
> so this illustrated mapping is AI-complete. We need to take the locale
> names as is, and they may or may not carry encoding information.
>
>> So I don't understand why you fill the pg_collation table with thousands
>> of collations that are impossible to use. If I use utf8, then there
>> should be just utf8-based collations. And if you need to work with a
>> wide set of collations, then I am for preferring utf8 - at least for the
>> central European region. If somebody wants to use collations here, he
>> will use a combination of cs, de and en - so he has to use latin2 and
>> latin1, or utf8. I think the encoding should not be part of the
>> collation name when that is possible.
>
> Different databases can have different encodings, but the pg_collation
> catalog is copied from the template database in any case. We can't make
> any changes to system catalogs as we create new databases, so the
> "useless" collations have to be there. There are only a few hundred,
> actually, so it's not really a lot of wasted space.

I don't have a problem with the size. I just think the current behaviour
isn't practical. When the database encoding is utf8, I expect "cs_CZ" or
"czech" to refer to the utf8 variants. I understand that template0 must
carry all the locales, and I understand why the current behaviour is what
it is, but it is very user unfriendly.

Actually, only old applications in the Czech Republic use latin2; almost
all use UTF-8, but now the latin2 variant is the preferred one. This is bad
and should be solved.

Regards

Pavel
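
For reference, a catalog query along these lines should list the collations
that are actually usable in a given database. This is only a sketch assuming
the pg_collation columns from the patch under discussion (collname,
collcollate, collctype, collencoding, with collencoding = -1 marking
encoding-independent entries such as "C" and "POSIX"); the details may still
change before commit.

    -- collations recorded for this database's encoding, plus the
    -- encoding-independent ones
    SELECT collname, collcollate, collctype
      FROM pg_collation
     WHERE collencoding = -1
        OR collencoding = (SELECT encoding
                             FROM pg_database
                            WHERE datname = current_database());

Run in psql against the database in question; under the same assumptions,
the query in a latin2 database would show the iso88592-based entries
instead of the utf8 ones.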