2010/11/16 Peter Eisentraut <pete...@gmx.net>:
> On tis, 2010-11-16 at 20:00 +0100, Pavel Stehule wrote:
>> yes - my first question is: why do we need to specify the encoding when
>> only one encoding is supported? I can't use cs_CZ.iso88592 when my db
>> uses UTF8 - btw the message is misleading:
>>
>> yyy=# select * from jmena order by jmeno collate "cs_CZ.iso88592";
>> ERROR:  collation "cs_CZ.iso88592" for current database encoding
>> "UTF8" does not exist
>> LINE 1: select * from jmena order by jmeno collate "cs_CZ.iso88592";
>>                                            ^
>
> Sorry, is there some mistake in that message?
>
It is unclear - I would expect something like "cannot use collation
cs_CZ.iso88592, because your database uses the UTF8 encoding".

>> I don't know why, but the preferred encoding for Czech is iso88592 now -
>> but I can't use it - so I can't use the names "czech" or "cs_CZ". I
>> always have to use the full name "cs_CZ.utf8". That's wrong. What's more,
>> from that moment my application depends on the encoding that was used
>> first - I can't change the encoding without refactoring my SQL
>> statements, because the encoding is hard-coded there (in the collation
>> clause).
>
> I can only look at the locales that the operating system provides. We
> could conceivably make some simplifications like stripping off the
> ".utf8", but then how far do we go and where do we stop? Locale names
> on Windows look different too. But in general, how do you suppose we
> should map an operating system locale name to an "acceptable" SQL
> identifier? You might hope, for example, that we could look through the
> list of operating system locale names and map, say,
>
> cs_CZ -> "czech"
> cs_CZ.iso88592 -> "czech"
> cs_CZ.utf8 -> "czech"
> czech -> "czech"
>
> but we have no way to actually know that these are semantically similar,
> so this illustrated mapping is AI-complete. We need to take the locale
> names as is, and they may or may not carry encoding information.
>
>> So I don't understand why you fill the pg_collation table with thousands
>> of collations that are impossible to use. If I use utf8, then there
>> should be just utf8-based collations. And if you need to work with a
>> wide set of collations, then I am for preferring utf8 - at least for the
>> central European region. If somebody wants to use collations here, he
>> will use a combination of cs, de and en - so he has to use latin2 and
>> latin1, or utf8. I think the encoding should not be part of the
>> collation name when that is possible.
>
> Different databases can have different encodings, but the pg_collation
> catalog is copied from the template database in any case. We can't make
> any changes to system catalogs as we create new databases, so the
> "useless" collations have to be there. There are only a few hundred,
> actually, so it's not really a lot of wasted space.

I don't have a problem with the size. I just think the current behaviour
isn't practical. When the database encoding is utf8, I expect "cs_CZ" or
"czech" to refer to the utf8 variants. I understand that template0 must
carry all the locales, and I understand why the current behaviour is what
it is, but it is very user unfriendly.

Actually, only old applications in the Czech Republic use latin2; almost
all use UTF-8, but now the latin2 variant is the preferred one. This is bad
and should be solved.

Regards

Pavel
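
For reference, a catalog query along these lines should list the collations
that are actually usable in a given database. This is only a sketch assuming
the pg_collation columns from the patch under discussion (collname,
collcollate, collctype, collencoding, with collencoding = -1 marking
encoding-independent entries such as "C" and "POSIX"); the details may still
change before commit.

    -- collations recorded for this database's encoding, plus the
    -- encoding-independent ones
    SELECT collname, collcollate, collctype
      FROM pg_collation
     WHERE collencoding = -1
        OR collencoding = (SELECT encoding
                             FROM pg_database
                            WHERE datname = current_database());

Run in psql against the database in question; under the same assumptions,
the query in a latin2 database would show the iso88592-based entries
instead of the utf8 ones.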