Tom Christiansen <tchr...@perl.com> added the comment:

Please do not call this "utf-8-java". It is called "cesu-8" per UTR#26 at:

  http://unicode.org/reports/tr26/

CESU-8 is *not* a valid Unicode Transformation Format and should not be called 
UTF-8. It is a real pain in the butt, caused by people who misunderstand Unicode 
and mis-encode UCS-2 into UTF-8, screwing it up. I understand the need to be able 
to read it, but call it what it is, please.
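
To make the difference concrete, here is a minimal sketch of a CESU-8 encoder. 
The function name cesu8_encode is hypothetical and is not part of any proposed 
patch; it only illustrates how a code point above U+FFFF ends up as two 3-byte 
surrogate sequences instead of one 4-byte UTF-8 sequence.

```python
def cesu8_encode(text):
    """Illustrative CESU-8 encoder (hypothetical helper, not a stdlib codec)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split the supplementary code point into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 | (cp >> 10)
            low = 0xDC00 | (cp & 0x3FF)
            # Encode each surrogate as its own 3-byte sequence, which real
            # UTF-8 forbids; "surrogatepass" lets Python emit those bytes.
            for surrogate in (high, low):
                out += chr(surrogate).encode("utf-8", "surrogatepass")
        else:
            # BMP code points are encoded the same way in both schemes.
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


ch = "\U00010400"  # DESERET CAPITAL LETTER LONG I, the example used in TR26
print(ch.encode("utf-8").hex(" "))   # f0 90 90 80
print(cesu8_encode(ch).hex(" "))     # ed a0 81 ed b0 80
```

A strict UTF-8 decoder rejects the second byte string, which is why the two 
encodings need distinct names.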

Despite the talk about Lucene, I note that the Perl port of Lucene uses real 
UTF-8, not CESU-8.

----------
nosy: +tchrist

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue2857>
_______________________________________