Re: debconf PO files and UTF-8

Nicolas Bertolissio Thu, 08 May 2003 14:40:54 -0500

On Thursday 8 May 2003, Jérôme Marant wrote:
> Hello again,
Good evening (hello again),


>   Another question about UTF-8.
> 
>   Wouldn't it be a step forward to provide, starting now,
>   both an iso-8859-1 or -15 version and a UTF-8 version of
>   the debconf templates (fr.po and fr.UTF-8.po respectively)
>   if we want to encourage the move to UTF-8?
> 
>   To do this, you only need msgconv, shipped with GNU
>   gettext: msgconv -t UTF-8 input_file > output_file
>   No extra effort is required.
> 
>   I think this should be done for .po files in general,
>   but that goes beyond the scope of this project.
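As a rough illustration of what converting a PO file to UTF-8 involves, here is a minimal Python sketch: decode the file using the charset declared in its header, rewrite the `charset=` field, and re-encode as UTF-8. The helper name `po_to_utf8` and the header regex are assumptions made for this example; it only approximates the effect of `msgconv -t UTF-8`, which parses the PO file properly instead of patching text.

```python
import re

# Matches the charset declared in the PO header, e.g.
# "Content-Type: text/plain; charset=ISO-8859-15\n" (stops before the backslash).
CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)")


def po_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: re-encode a PO file's bytes as UTF-8.

    Assumes the header declares a charset that Python knows about.
    """
    match = CHARSET_RE.search(data)
    if match is None:
        raise ValueError("no charset declared in PO header")
    source = match.group(1).decode("ascii")
    text = data.decode(source)  # strict: fails on bytes invalid in source
    # Update the declared charset so gettext reads the result correctly.
    text = re.sub(r"charset=[A-Za-z0-9_.:-]+", "charset=UTF-8", text, count=1)
    return text.encode("utf-8")


if __name__ == "__main__":
    po = (
        'msgid ""\n'
        'msgstr ""\n'
        '"Content-Type: text/plain; charset=ISO-8859-15\\n"\n'
        "\n"
        'msgid "work"\n'
        'msgstr "\u0153uvre"\n'
    ).encode("iso-8859-15")
    print(po_to_utf8(po).decode("utf-8"))
```

Converting to UTF-8 in this direction cannot lose characters, since every iso-8859-1/-15 character exists in Unicode; the trouble discussed below is the reverse direction.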
Well, I found msgconv in my sid chroot and ran a quick test; it doesn't
work, at least not for what I'm trying to do for the DDTS. Here is what I
sent to Grisu, with no response so far:

Subject: iconv fails converting iso-8859-15 0xBD (Unicode 0x153) to iso-8859-1, where it does not exist

On Tuesday 8 April 2003, Nicolas Bertolissio wrote:
> Hi,
> 
> I found the bug: iconv cannot convert a character from
> iso-8859-15 to iso-8859-1:
> 
>  applicatif entièrement mis en œuvre au sein de vat : vous n'avez besoin d'aucune
>                                ^
> 
> This character is &oelig; (œ), which does not exist in iso-8859-1.
> 
> I don't know how we can fix this bug for now. We could store all
> descriptions in UTF-8, but we would hit the same problem when sending a
> description in iso-8859-1, as conversion from UTF-8 would not be possible.

Hypothesis:
===========

For the sake of example, let's say we want to use iso-8859-15 but
translations are stored in iso-8859-1, and let's look specifically at
character 0x153 (Unicode), which doesn't exist in iso-8859-1 but can be
transliterated as 'oe', and which exists in iso-8859-15 as 'œ' (0xBD).
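The character facts in this hypothesis can be checked directly with Python's built-in codecs. This is only an illustration; the helper name `encode_or_none` is made up for the example.

```python
from typing import Optional

OE = "\u0153"  # Unicode 0x153, LATIN SMALL LIGATURE OE ("œ")


def encode_or_none(text: str, charset: str) -> Optional[bytes]:
    """Return text encoded in charset, or None if charset cannot represent it."""
    try:
        return text.encode(charset)  # strict: no fallback characters
    except UnicodeEncodeError:
        return None


for charset in ("iso-8859-15", "iso-8859-1", "utf-8"):
    print(charset, encode_or_none(OE, charset))
# iso-8859-15 b'\xbd'
# iso-8859-1 None
# utf-8 b'\xc5\x93'
```

Note that byte 0xBD does exist in iso-8859-1, as '½', so merely relabelling iso-8859-15 bytes as iso-8859-1 would silently corrupt the text instead of failing.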



Facts:
======

If someone sends back an iso-8859-15-encoded translation, 'iconv' fails
to convert 0xBD to iso-8859-1, as this character doesn't exist in
iso-8859-1.
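A small Python sketch of this failure, assuming a strict conversion (decode from the source charset, re-encode into the target with no fallback) behaves like plain 'iconv' here. The helper `iconv_like` is hypothetical, not part of any tool.

```python
def iconv_like(data: bytes, src: str, dst: str) -> bytes:
    """Strictly convert bytes from charset src to charset dst.

    Hypothetical helper approximating plain iconv: any character that dst
    cannot represent raises UnicodeEncodeError instead of being replaced.
    """
    return data.decode(src).encode(dst)


# Translation sent back in iso-8859-15: "mis en œuvre", with œ as byte 0xBD.
received = b"mis en \xbduvre"

try:
    iconv_like(received, "iso-8859-15", "iso-8859-1")
except UnicodeEncodeError as err:
    # err.object is the decoded text; err.start points at the culprit,
    # like the caret in the bug report above.
    print(f"failed at position {err.start}: {err.object[err.start]!r}")
# failed at position 7: 'œ'
```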


Possible solutions:
===================

- use 'recode' instead of 'iconv'

  it successfully converts iso-8859-15 0xBD to iso-8859-1 'oe'

  problem: this doesn't work the other way: iso-8859-1 'oe' is recoded
  as iso-8859-15 'oe' (two letters), not as 0xBD


- use UTF-8 for translated descriptions

  all conversions from iso-8859-1 or iso-8859-15 to UTF-8 will succeed

  problem: 'iconv' will fail to convert Unicode 0x153 to iso-8859-1, as
  this character doesn't exist there
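Both options can be sketched with Python codecs. The transliteration table `TRANSLIT` and the `"translit"` error handler are assumptions standing in for what 'recode' does; they are not recode's actual tables. `recode_like` models the first option and `via_utf8` the second, using a strict (iconv-like) conversion on the way out.

```python
import codecs

# Hypothetical fallback table standing in for recode's transliteration.
TRANSLIT = {"\u0153": "oe", "\u0152": "OE"}


def _translit(exc):
    """Encode-error handler: replace unencodable characters via TRANSLIT."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(TRANSLIT.get(ch, "?") for ch in bad), exc.end


codecs.register_error("translit", _translit)


def recode_like(data: bytes, src: str, dst: str) -> bytes:
    """Option 1: convert src -> dst, transliterating what dst lacks."""
    return data.decode(src).encode(dst, errors="translit")


def via_utf8(data: bytes, src: str, dst: str) -> bytes:
    """Option 2: store as UTF-8, then convert strictly (like iconv) to dst."""
    stored = data.decode(src).encode("utf-8")  # never fails for Latin-1/9 input
    return stored.decode("utf-8").encode(dst)  # may raise UnicodeEncodeError


# Option 1 works one way only: 0xBD -> "oe", but "oe" never goes back to 0xBD.
to_latin1 = recode_like(b"\xbduvre", "iso-8859-15", "iso-8859-1")
back = recode_like(to_latin1, "iso-8859-1", "iso-8859-15")
print(to_latin1, back)  # b'oeuvre' b'oeuvre'

# Option 2: storing succeeds, but sending Unicode 0x153 in iso-8859-1 fails.
try:
    via_utf8(b"\xbduvre", "iso-8859-15", "iso-8859-1")
except UnicodeEncodeError as err:
    print("strict send from UTF-8 storage failed:", err.reason)
```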


So neither of these solutions is acceptable.

The last one I thought of:


Use UTF-8 AND 'recode'
======================

sending translation:
--------------------
Conversions from Unicode 0x153 to iso-8859-15 0xBD and to iso-8859-1
'oe' will both succeed, so we can send data in any requested encoding.
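A minimal sketch of the sending side, assuming descriptions are stored as UTF-8 and a recode-like transliteration is applied when the requested charset lacks a character. `TRANSLIT`, the `"translit"` handler and `send` are illustrative names, not existing DDTS code.

```python
import codecs

TRANSLIT = {"\u0153": "oe", "\u0152": "OE"}  # hypothetical stand-in for recode


def _translit(exc):
    """Encode-error handler: replace unencodable characters via TRANSLIT."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(TRANSLIT.get(ch, "?") for ch in bad), exc.end


codecs.register_error("translit", _translit)


def send(stored_utf8: bytes, requested: str) -> bytes:
    """Encode a UTF-8 database entry in the charset the user asked for."""
    return stored_utf8.decode("utf-8").encode(requested, "translit")


entry = "\u0153uvre".encode("utf-8")  # the database holds Unicode 0x153
print(send(entry, "iso-8859-15"))  # b'\xbduvre'
print(send(entry, "iso-8859-1"))   # b'oeuvre'
```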


receiving translation:
----------------------
Conversion from iso-8859-15 0xBD to UTF-8 gives Unicode 0x153, so the
comparison between the database translation and the received one will
match in UTF-8.

Conversion from iso-8859-1 'oe' to UTF-8 gives Unicode 'oe', so the
comparison between the database translation and the received one will
NOT match in UTF-8.

So we cannot use UTF-8 for comparison.
But since conversion FROM Unicode 0x153 gives either iso-8859-15 0xBD or
iso-8859-1 'oe', we can compare in the received encoding, where the two
will match.
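The two comparison strategies can be contrasted in a short Python sketch, again assuming a hypothetical `"translit"` error handler in place of 'recode'. `same_in_utf8` converts the received text to UTF-8; `same_in_received` converts the database entry to the received charset instead.

```python
import codecs

TRANSLIT = {"\u0153": "oe", "\u0152": "OE"}  # hypothetical stand-in for recode


def _translit(exc):
    """Encode-error handler: replace unencodable characters via TRANSLIT."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(TRANSLIT.get(ch, "?") for ch in bad), exc.end


codecs.register_error("translit", _translit)


def same_in_utf8(db_utf8: bytes, received: bytes, charset: str) -> bool:
    """Compare after converting the received text to UTF-8."""
    return received.decode(charset).encode("utf-8") == db_utf8


def same_in_received(db_utf8: bytes, received: bytes, charset: str) -> bool:
    """Compare after converting the database text to the received charset."""
    return db_utf8.decode("utf-8").encode(charset, "translit") == received


db_entry = "\u0153uvre".encode("utf-8")
for received, charset in ((b"\xbduvre", "iso-8859-15"), (b"oeuvre", "iso-8859-1")):
    print(charset, same_in_utf8(db_entry, received, charset),
          same_in_received(db_entry, received, charset))
# iso-8859-15 True True
# iso-8859-1 False True
```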


Problems:
---------
yes, there are some :(


- If a translator uses iso-8859-1 'oe', it will be stored as Unicode
  'oe' in the database. If a reviewer uses iso-8859-15, Unicode 'oe'
  will be sent to him as iso-8859-15 'oe'; he will most likely change it
  into iso-8859-15 0xBD, so a bug will be filed with Unicode 0x153. The
  bug will be sent to the translator as iso-8859-1 'oe', so he will see
  no difference.

  If he accepts the changes (which he cannot see), the database will be
  updated with Unicode 0x153 and all is fine.

  BUT: if he just closes the bug, the reviewer will most likely send back
  the same review, and this becomes an endless loop.

  Review comments may help here, but reviewers will have to write them;
  we cannot automate them.


- When a bug report is filed, it will always be sent in iso-8859-1, so
  even if the translator uses iso-8859-15 he will receive an iso-8859-1
  'oe' and HAVE TO change it back into iso-8859-15 0xBD.

  One possible solution for this last problem is to store each
  translator's/reviewer's encoding in a new database, so we can send the
  bug report in the right encoding.
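Both problems can be illustrated in one Python sketch. The first part shows why the reviewer's fix is invisible in an iso-8859-1 bug report; the second sketches the proposed per-user encoding store. The `user_charset` table, the login names, `encode_report` and the `"translit"` handler are all hypothetical, chosen only for illustration.

```python
import codecs
import sqlite3

TRANSLIT = {"\u0153": "oe", "\u0152": "OE"}  # hypothetical stand-in for recode


def _translit(exc):
    """Encode-error handler: replace unencodable characters via TRANSLIT."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(TRANSLIT.get(ch, "?") for ch in bad), exc.end


codecs.register_error("translit", _translit)

# First problem: database entry vs. reviewer's correction.
old, new = "oeuvre", "\u0153uvre"
print(old == new)  # False: the stored texts differ
print(old.encode("iso-8859-1", "translit") == new.encode("iso-8859-1", "translit"))
# True: in an iso-8859-1 bug report the translator sees no change

# Second problem, proposed fix: remember each user's charset (hypothetical table).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE user_charset (login TEXT PRIMARY KEY, charset TEXT NOT NULL)")
db.execute("INSERT INTO user_charset VALUES (?, ?)", ("translator", "iso-8859-15"))


def encode_report(login: str, text: str) -> bytes:
    """Encode a bug report in the recipient's charset (default iso-8859-1)."""
    row = db.execute(
        "SELECT charset FROM user_charset WHERE login = ?", (login,)
    ).fetchone()
    charset = row[0] if row else "iso-8859-1"
    return text.encode(charset, "translit")


print(encode_report("translator", new))    # b'\xbduvre'
print(encode_report("someone_else", new))  # b'oeuvre'
```

Defaulting to iso-8859-1 for unknown users keeps today's behaviour for anyone who has not recorded an encoding.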



Any comments?


So, it's just an idea for now, but this is probably, almost certainly,
not the right list to discuss it on.

Still, comments are welcome.



Nicolas
--
