Hi, I have the following problem:
I wrote a minimal class implementing a SAX handler (attached to this message). The class does something very simple:

    public void characters(char[] ch, int start, int length) throws SAXException {
        String s = new String(ch, start, length);
        System.out.print(s);
    }

I apply this class to the following minimal document:

    <?xml version="1.0" encoding="utf-8"?>
    <a>dégénéré</a>

where the "é" characters are encoded in UTF-8 (bytes 0xC3 0xA9). When I compile the class with the latest version of Xerces-J and run it on Mac OS X 10.6, I get a very surprising result: the string dégénéré, but with each "é" represented by the single byte 0x8E. This was the position of the letter "é" in MacRoman, the old (Mac OS 9) encoding.
What I don't understand is (a) why does Xerces change the encoding, and (b) why does it choose a completely obsolete Mac encoding?
I have tried the same class under Windows XP: when I run it under Eclipse I get correct UTF-8 output, and when I run it in a Windows terminal I get the output in Windows Latin-1 ("é" is represented by byte 0xE9), which is again a 1-byte encoding.
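To make the byte values above concrete, here is a small sketch that does not involve Xerces at all. It only shows which bytes a Java String containing "é" turns into under different charsets. The class name EncodingDemo and the helper hex are made up for this message (they are not part of SaxTest.java), and I am assuming Java 7 or later for StandardCharsets. ISO-8859-1 stands in for Windows Latin-1 here, since both place "é" at 0xE9, and x-MacRoman is only tried if the JVM provides it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of SaxTest.java.
class EncodingDemo {
    // Returns a hex dump of the bytes produced when s is encoded in charset cs.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "é" written as a Unicode escape so the source file encoding does not matter.
        String e = "\u00e9";
        System.out.println("UTF-8:      " + hex(e, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + hex(e, StandardCharsets.ISO_8859_1));
        // x-MacRoman is an optional charset, so guard in case this JVM lacks it.
        if (Charset.isSupported("x-MacRoman")) {
            System.out.println("MacRoman:   " + hex(e, Charset.forName("x-MacRoman")));
        }
        // The charset the JVM uses when none is specified explicitly.
        System.out.println("Default:    " + Charset.defaultCharset());
    }
}
```

If the last line reports MacRoman on the Mac and a Windows code page in the Windows terminal, that would suggest the conversion happens when the string is printed rather than inside the parser, but I have not confirmed this.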
Could you please tell me what to add to my code so that I always obtain UTF-8, regardless of the platform? (I used this code a few years ago and never had this problem.)
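One thing I could imagine trying, on the assumption (unverified) that the bytes are chosen by System.out and not by Xerces, is to print through a PrintStream whose encoding is fixed explicitly. The following is only a sketch: the class name Utf8Out and the helper utf8 are hypothetical, and the string is hard-coded instead of coming from the characters() callback.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, not part of SaxTest.java.
class Utf8Out {
    // Wraps a stream in an auto-flushing PrintStream that always encodes text as UTF-8,
    // independently of the platform default encoding.
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // Every Java platform is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // Same text as in the test document, written with escapes for "é".
        out.print("d\u00e9g\u00e9n\u00e9r\u00e9");
        out.println();
    }
}
```

In the handler, this would mean creating the stream once and calling out.print(s) in place of System.out.print(s). A terminal that is not itself set to UTF-8 would presumably still display the bytes incorrectly, even though the bytes written are UTF-8.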
Thanks in advance!
SaxTest.java
--
Yannis Haralambous, Ph.D.
Professor, Computer Science Department, Telecom Bretagne
Technopole de Brest Iroise, CS 83818, 29238 Brest Cedex 3, France
yannis.haralamb...@telecom-bretagne.eu
http://omega.enstb.org/ yannis
twitter: y_haralambous
Tel. +33 (0) 2.29.00.14.27
Fax +33 (0) 2.29.00.12.82
Google-Earth coordinates: 48°21'31.57"N 4°34'16.76"W

...to tell the outside of an aquarium, it is better not to be a fish
...the ball I threw while playing in the park has not yet reached the ground
There was a time when I spoke of Schubert only reluctantly, and wished only to tell the trees and stars about him at night.