On Thursday 8 May 2003, Jérôme Marant wrote:
> Hello again,

Good evening (again),

> One more question about UTF-8.
>
> Wouldn't it be a step forward to provide, starting now, both an
> iso-8859-1 or -15 version and a UTF-8 version of the debconf
> templates (fr.po and fr.UTF-8.po respectively) if we want to
> encourage the move to UTF-8?
>
> To do this, it is enough to use msgconv, which ships with
> gettext: msgconv -t UTF-8 input_file > output_file
> It takes no additional effort.
>
> I think this should be done for .po files in general,
> but that goes beyond the scope of the project.

OK, I found msgconv in my chrooted sid and ran a small test, and it
doesn't work, at least not for what I am trying to do for the ddts.

Here is what I sent to Grisu, with no reaction so far:

Subject: iconv from iso-8859-15 0xBD (unicode 0x153) to non-existing iso-8859-1 fail

On Tuesday 8 April 2003, Nicolas Bertolissio wrote:
> Hi,
>
> I found the bug, this is because iconv cannot convert a character from
> iso-8859-15 to iso-8859-1
>
> applicatif entièrement mis en œuvre au sein de vat : vous n'avez besoin
> d'aucune
>                               ^
>
> this character is œ but does not exist in iso-8859-1.
>
> I don't know how we can fix this bug for the moment. We could store all
> descriptions in utf8 but we would have the same trouble when sending a
> description in iso-8859-1, as conversion from utf8 would not be possible.

Hypothesis:
===========

For the sake of an example, let's say we want to use iso-8859-15 but
translations are stored in iso-8859-1, and we look particularly at
character 0x153 (unicode), which doesn't exist in iso-8859-1 but can be
recoded as 'oe', and which exists in iso-8859-15 as 'œ' (0xBD).

Facts:
======

If someone sends back an iso-8859-15 encoded translation, 'iconv' fails
converting 0xBD to iso-8859-1 as this character doesn't exist in
iso-8859-1.

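To make the failure concrete, here is a minimal sketch in Python. It is
an illustration only, not ddts code: Python's strict codecs stand in for
'iconv' without transliteration, and the helper name convert() is made
up for this example.

```python
# Illustration only: Python's strict codecs stand in for 'iconv' without
# transliteration. convert() is a hypothetical helper, not ddts code.

def convert(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from `source` and re-encode them as `target` (strict)."""
    return data.decode(source).encode(target)

# The character in question, as a translator using iso-8859-15 sends it.
OE_LATIN9 = b"\xbd"

# Converting to UTF-8 works: 0xBD is decoded as U+0153.
as_utf8 = convert(OE_LATIN9, "iso-8859-15", "utf-8")

# Converting to iso-8859-1 fails: U+0153 has no code point there.
try:
    convert(OE_LATIN9, "iso-8859-15", "iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(as_utf8.decode("utf-8") == "\u0153", latin1_ok)
```

The same byte converts cleanly to utf8 but cannot be written as
iso-8859-1 without replacing it by something else, such as 'oe'.
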
Possible solutions:
===================

- use 'recode' instead of 'iconv'
  it will successfully convert iso-8859-15 0xBD to iso-8859-1 'oe'
  problem: this doesn't work the other way, iso-8859-1 'oe' is recoded
  as iso-8859-15 'oe'

- use utf8 for translated descriptions
  all conversions from iso-8859-1 or iso-8859-15 to utf8 will succeed
  problem: 'iconv' will fail to convert unicode 0x153 to iso-8859-1 as
  this character doesn't exist there

So neither of these solutions is acceptable. The last one I thought
about:

Use utf8 AND 'recode'
=====================

sending translation:
--------------------

conversions from unicode 0x153 to iso-8859-15 0xBD and to iso-8859-1
'oe' will both succeed, so we can send data in any requested encoding.

receiving translation:
----------------------

conversion from iso-8859-15 0xBD to utf8 will give unicode 0x153, so the
comparison between the database translation and the received one will
match in utf8.

conversion from iso-8859-1 'oe' to utf8 will give unicode 'oe', so the
comparison between the database translation and the received one will
NOT match in utf8.

So we cannot use utf8 for the comparison. But as conversion FROM unicode
0x153 will give either iso-8859-15 0xBD or iso-8859-1 'oe', we can use
the received encoding for the comparison, as it will match.

Troubles:
---------

yes, there are some :(

- If a translator uses iso-8859-1 'oe', it will be stored as unicode
  'oe' in the database. If a reviewer uses iso-8859-15, unicode 'oe'
  will be sent to him as iso-8859-15 'oe'; he will certainly change it
  into iso-8859-15 0xBD, so a bug will be issued with unicode 0x153.
  The bug will be sent as iso-8859-1 'oe', so the translator will see
  no difference. If he accepts the changes (which he cannot see), the
  database will be updated with unicode 0x153 and all is fine.
  BUT: if he just closes the bug, the reviewer will certainly send back
  the same review, and this is an endless circle. Review comments may
  help here, but reviewers will have to write them; we cannot generate
  them automatically.

- When a bug report is issued, it will always be issued in iso-8859-1,
  so even if the translator uses iso-8859-15 he will receive an
  iso-8859-1 'oe' and HAVE TO change it back into iso-8859-15 0xBD.

One possible solution for this last one is to store the
translator/reviewer encoding in a new database, so we can issue the bug
report with the right encoding.

Any comments?

So, this is just an idea for the moment, and this is perhaps, surely,
not the right list to discuss it. But comments are welcome.

Nicolas
--