On Tue, 29 Jul 2008 14:44:03 -0300, Blower, Andy <[EMAIL PROTECTED]> wrote:

Thiago,

Hi!

Sorry, I don't understand your objection. Could you expand on it, please? Especially where you say "have a memory and bandwidth penalty using 2 bytes to encode many characters that would be encoded as 1 in UTF-8".

Oooops, typo of mine.
Most Portuguese accented characters are encoded as 2 bytes in UTF-8 and 1 byte in *ISO-8859-1*, AFAIK. So, every time I write "não" ("no"), UTF-8 takes 4 bytes while ISO-8859-1 takes 3.
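
Just to illustrate with plain Java (a quick sketch, nothing Tapestry-specific; class name is just for the example, and it assumes the source file is compiled with the right encoding for the "não" literal):

import java.io.UnsupportedEncodingException;

public class EncodingSizes {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String word = "não";
        // UTF-8: 'n' and 'o' take 1 byte each, 'ã' takes 2 -> 4 bytes
        System.out.println(word.getBytes("UTF-8").length);      // 4
        // ISO-8859-1: every Latin-1 character fits in a single byte -> 3 bytes
        System.out.println(word.getBytes("ISO-8859-1").length); // 3
        // UTF-16BE (no BOM), like Java's internal char representation -> 6 bytes
        System.out.println(word.getBytes("UTF-16BE").length);   // 6
    }
}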

In my experience char encoding can be an absolute nightmare, and having as much as possible in UTF-8 is highly desirable. IIRC Java uses UTF-16 internally, which does use 2 bytes for each char, but UTF-8 only uses 2 bytes for unusual chars, which is why it's the ideal external charset.

Agreed, but in many languages the characters that look unusual (from the point of view of a speaker of English or any other language without accents) are not unusual at all; they are frequent.

I hope I've worded my ideas better now.

Regarding database encodings, I think I got confused. It was not the ISO-8859-1-encoded database that was the problem, but the ISO-8859-1-encoded Tapestry templates. Every time an accented character was submitted in a form, I would get 2 characters unless I added accepted-encoding="iso-8859-1" to every form tag.
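
For the record, the "2 characters" symptom is the classic sign of UTF-8 bytes being decoded as a single-byte charset. A plain-Java sketch of what I believe was happening (nothing Tapestry-specific, just my guess at the mismatch):

import java.io.UnsupportedEncodingException;

public class EncodingMismatch {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // The browser submits the form data as UTF-8 bytes...
        byte[] submitted = "não".getBytes("UTF-8");
        // ...but the server side decodes them as ISO-8859-1, so the two
        // UTF-8 bytes of 'ã' (0xC3 0xA3) come back as the two characters "Ã£".
        System.out.println(new String(submitted, "ISO-8859-1")); // prints "nÃ£o"
    }
}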

Thiago
