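Hi Inma,

On the last line of your first block you have:

    return baos.toString();

Note that calling toString() on the ByteArrayOutputStream decodes its bytes with the platform default charset and returns a Java String (UTF-16 internally), not UTF-8. I'm guessing that in your next block of code, xmlutf8 is the result of the first block. If so, calling getBytes() on it (which again uses the default charset) gives you bytes that are no longer valid UTF-8, even though the XML declaration still says utf-8. That is why the parser complains about an invalid UTF-8 sequence.

If your default charset is windows-1252, for example, the second UTF-8 bytes of 'Á' (0x81) and 'Í' (0x8D) have no mapping there and come back as '?', which would explain why only those two characters are damaged.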
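Here is a minimal sketch of one way around it, assuming you can pass the document between the two steps as a byte[] instead of a String. The class name EncodingRoundTrip and the helper reencode are made up for illustration, and StandardCharsets needs Java 7 or later (on older JDKs the charset name strings "UTF-8" and "ISO-8859-1" do the same job).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, not part of Xerces or Xalan.
final class EncodingRoundTrip {

    // Re-encodes an XML document held as raw bytes. The parser takes the
    // source encoding from the XML declaration, so the bytes are never
    // decoded with the platform default charset.
    static byte[] reencode(byte[] xml, String targetEncoding) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, targetEncoding);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        transformer.transform(new StreamSource(new ByteArrayInputStream(xml)),
                              new StreamResult(baos));
        return baos.toByteArray(); // keep the bytes; do not call toString()
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<DOCUMENTO><PERFILES>\u00C1</PERFILES>"
                + "<PERFILES>\u00CD</PERFILES></DOCUMENTO>";

        // Encode the test document explicitly as ISO-8859-1 to match its declaration.
        byte[] iso = xml.getBytes(StandardCharsets.ISO_8859_1);

        byte[] utf8 = reencode(iso, "UTF-8");
        byte[] back = reencode(utf8, "ISO-8859-1");

        // Decode with an explicit charset only when a String is needed, e.g. for display.
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
        System.out.println(new String(back, StandardCharsets.ISO_8859_1));
    }
}
```

If you really do need Strings between the two steps, the same idea applies: name the charset on both sides, i.e. baos.toString("UTF-8") in the first block and xmlutf8.getBytes("UTF-8") in the second, so that the bytes always match the encoding in the XML declaration.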
HTH,

From: Inma Marín López [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 02, 2007 12:53 AM
To: j-users@xerces.apache.org
Subject: Problems with ISO-8859-1 and UTF-8 encodings

Hi all,

I have some problems with ISO-8859-1 and UTF-8 encodings in XML documents. Specifically, I have this ISO-8859-1-encoded XML document:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <DOCUMENTO>
    <PERFILES>Á</PERFILES>
    <PERFILES>É</PERFILES>
    <PERFILES>Í</PERFILES>
    <PERFILES>Ó</PERFILES>
    <PERFILES>Ú</PERFILES>
    </DOCUMENTO>

Then I encode it as UTF-8 with the following piece of code:

    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    StreamSource ds = new StreamSource(new ByteArrayInputStream(xmliso88191.getBytes()));
    transformer.setOutputProperty(OutputKeys.ENCODING,"utf-8");
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    transformer.transform(ds,new StreamResult(baos));
    return baos.toString();

to obtain this XML document:

    <?xml version="1.0" encoding="utf-8"?>
    <DOCUMENTO>
    <PERFILES>Ã?</PERFILES>
    <PERFILES>É</PERFILES>
    <PERFILES>Ã?</PERFILES>
    <PERFILES>Ó</PERFILES>
    <PERFILES>Ú</PERFILES>
    </DOCUMENTO>

Next, I try to encode this (UTF-8-encoded) document as ISO-8859-1:

    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    StreamSource ds = new StreamSource(new ByteArrayInputStream(xmlutf8.getBytes()));
    transformer.setOutputProperty(OutputKeys.ENCODING,"iso-8859-1");
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    transformer.transform(ds,new StreamResult(baos));
    return baos.toString();

But it does not work. Instead, I get the following exception:

    [Fatal Error] :8:11: Invalid byte 2 of 2-byte UTF-8 sequence.
    javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: Invalid byte 2 of 2-byte UTF-8 sequence.
        at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:449)
        at codificacion.PruebasCodificacion.encodeISO88891(PruebasCodificacion.java:302)
        at codificacion.PruebasCodificacion.prueba(PruebasCodificacion.java:73)
        at codificacion.PruebasCodificacion.main(PruebasCodificacion.java:356)
    Caused by: org.xml.sax.SAXParseException: Invalid byte 2 of 2-byte UTF-8 sequence.
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:432)

Is this process correct? Supposing that it is, the exception seems to be caused by the 'Ã?' characters (the UTF-8 encoding of 'Á' and 'Í'), so I would like to know how I can encode the 'Á' and 'Í' characters as UTF-8 and then convert them back to ISO-8859-1.

Could anybody be so kind as to help me, please?

Thank you very much in advance.

Inma.