Hello, (I've recovered the lost Cc recipients so far)

At Mon, 8 Aug 2016 12:52:11 +0300, Victor Wagner <vi...@wagner.pp.ru> wrote in <20160808125211.1361c...@fafnir.local.vm>
> On Mon, 08 Aug 2016 18:28:57 +0900 (Tokyo Standard Time)
> Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote:
>
> > I don't see charset compatibility to be easily detectable,
>
> In the worst case we can hardcode explicit compatibility table.

We could have lists of the languages compatible with some
language-bound encodings. For example, take LATIN1 (ISO/IEC 8859-1)
as described on Wikipedia:

https://en.wikipedia.org/wiki/ISO/IEC_8859-1

According to the list there, we might have the following
compatibility list of locales, maybe without region.

{{"UTF8", "LATIN1"}, "af", "sq", "eu", "da", "en", "fo", "en"}... and so on.

The biggest problem with this is that at least *I* cannot confirm
the validity of the list, neither the completeness of LATIN1's
coverage of every language in the list nor the omission of any
language it could possibly cover.

Nonetheless, we could use such lists if we accept the possible
imperfection, which would eventually result in either the original
error (conversion failure) or an excess fallback for languages that
are actually convertible. Unfortunately, the latter would be
unacceptable for table data.

> There is limited set of languages, which have translated error messages,
> and limited (albeit wide) set of encodings, supported by PostgreSQL. So

Yes, we can have a negative list of languages already known to be
incompatible.

{{"UTF8", "LATIN1"}, "ru", .. er..what else?}

ISO639-1 seems to have about 190 languages, and most of them are
apparently incompatible with the LATIN1 encoding. A haphazardly made
negative list doesn't seem good to me.

> it is possible to define complete list of encodings, compatible with
> some translation. And fall back to untranslated messages if client
> encoding is not in this list.
>
> > because locale (or character set) is not a matter of PostgreSQL
> > (except for some encodings bound to one particular character
> > set)... So the conversion-fallback might be a only available
> > solution.
>
> Conversion fallback may be a solution for data. For NLS-messages I think
> it is better to fall back to English (untranslated) messages than use of
> transliteration or something alike.

I suppose that 'fallback' means "have a try then use English if
failed", so I think it is suitable for messages rather than for
data, and it doesn't need any a priori information about
compatibility. It seems to me that, for data, PostgreSQL refuses to
ignore or conceal conversion errors and return a broken or unwanted
byte sequence. Things are different for error messages: it is
preferable for them to be readable somehow than to be abandoned
entirely.

> I think that for now we can assume that the best effort is already done
> for the data, and think how to improve situation with messages.

Is there any source that tells us the compatibility for any
combination of language and encoding? Maybe we need some grounds for
the list.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
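
For illustration only, the "have a try then use English if failed"
idea discussed above could be sketched roughly as follows. This is
Python rather than PostgreSQL code; the function name render_message
and the use of Python's codecs as a stand-in for the server's
encoding conversion are assumptions made for the sketch, not
anything taken from the PostgreSQL source.

```python
def render_message(translated: str, english: str, client_encoding: str) -> bytes:
    """Hypothetical helper: encode an NLS message for the client.

    Tries the translated text first and falls back to the untranslated
    (English) text if the client encoding cannot represent it. No
    language-vs-encoding compatibility table is consulted.
    """
    try:
        return translated.encode(client_encoding)
    except UnicodeEncodeError:
        # Assumption: the English message is plain ASCII, so it is
        # representable in any ASCII-compatible client encoding.
        return english.encode(client_encoding)


if __name__ == "__main__":
    # A Russian message cannot be expressed in LATIN1, so English is used.
    print(render_message("ошибка", "error", "latin1"))
    # A German message fits in LATIN1, so the translation is kept.
    print(render_message("ungültig", "invalid", "latin1"))
```

Note that the sketch applies only to messages. For table data the
conversion error would still have to be reported, as argued above.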