Hi Inma,

xmlutf8.getBytes() doesn't return what you think. Both ByteArrayOutputStream.toString() [1] and String.getBytes() [2] use the platform default encoding (which is probably ISO-8859-1 on your system) when converting bytes to chars and chars to bytes. You can fix this by specifying the encoding on these methods, but if I were you I'd avoid the conversions altogether and just create the StreamSource/StreamResult with a java.io.StringReader/java.io.StringWriter instead.
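A minimal sketch of the StringReader/StringWriter approach, assuming the XML is already held in a Java String. The class name EncodingRoundTrip and the helper reencode() are hypothetical names used only for illustration; the JAXP calls are the same ones used in the original code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class EncodingRoundTrip {

    // Hypothetical helper (not from the original code): re-serializes an XML
    // document held in a String, declaring the given encoding in the output.
    static String reencode(String xml, String encoding) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);

        // Character streams on both sides: no byte <-> char conversion happens
        // here, so the platform default encoding never gets involved.
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // \u00C1 is the capital A with acute accent; the escape keeps this
        // source file ASCII-only.
        String iso = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<DOCUMENTO><PERFILES>\u00C1</PERFILES></DOCUMENTO>";

        String utf8 = reencode(iso, "UTF-8");       // declaration now says UTF-8
        String back = reencode(utf8, "ISO-8859-1"); // and back again

        System.out.println(utf8);
        System.out.println(back);
    }
}
```

If bytes are really needed at some point (for example to write a file), the encoding can be named explicitly so that it matches the XML declaration, e.g. baos.toString("UTF-8") and xml.getBytes("UTF-8") rather than the no-argument versions.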

Thanks.

[1] http://java.sun.com/javase/6/docs/api/java/io/ByteArrayOutputStream.html#toString()
[2] http://java.sun.com/javase/6/docs/api/java/lang/String.html#getBytes()

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
E-mail: [EMAIL PROTECTED]

Robert Houben <[EMAIL PROTECTED]> wrote on 08/02/2007 11:36:34 AM:

> Hi Inma,
>
> In the last line of your first block you have:
> return baos.toString();
> Note that when you do "toString()" on the byte array it will return
> a string in Java internal form, not UTF8. I'm guessing that in your
> next block of code, xmlutf8 is the result of the first block. This
> means that when you getBytes() from it, you are getting bytes that
> are no longer in UTF8 form.
>
> HTH,
>
> From: Inma Marín López [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 02, 2007 12:53 AM
> To: j-users@xerces.apache.org
> Subject: Problems with ISO-8859-1 and UTF-8 encodings
>
> Hi all,
>
> I have some problems with ISO-8859-1 and UTF-8 encodings in XML
> documents. Concretely, I have this ISO-8859-1-encoded XML document:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <DOCUMENTO>
> <PERFILES>Á</PERFILES>
> <PERFILES>É</PERFILES>
> <PERFILES>Í</PERFILES>
> <PERFILES>Ó</PERFILES>
> <PERFILES>Ú</PERFILES>
> </DOCUMENTO>
>
> Then I UTF-8-encode it, by means of the following piece of code:
>
> Transformer transformer = TransformerFactory.newInstance().newTransformer();
> StreamSource ds = new StreamSource(new ByteArrayInputStream(xmliso88191.getBytes()));
> transformer.setOutputProperty(OutputKeys.ENCODING,"utf-8");
> ByteArrayOutputStream baos = new ByteArrayOutputStream();
> transformer.transform(ds,new StreamResult(baos));
> return baos.toString();
>
> to obtain this XML document:
>
> <?xml version="1.0" encoding="utf-8"?>
> <DOCUMENTO>
> <PERFILES>Ã?</PERFILES>
> <PERFILES>Ã?</PERFILES>
> <PERFILES>Ã?</PERFILES>
> <PERFILES>Ã?</PERFILES>
> <PERFILES>Ã?</PERFILES>
> </DOCUMENTO>
>
> Next, I ISO-8859-1-encode this (UTF-8-encoded) document:
>
> Transformer transformer = TransformerFactory.newInstance().newTransformer();
> StreamSource ds = new StreamSource(new ByteArrayInputStream(xmlutf8.getBytes()));
> transformer.setOutputProperty(OutputKeys.ENCODING,"iso-8859-1");
> ByteArrayOutputStream baos = new ByteArrayOutputStream();
> transformer.transform(ds,new StreamResult(baos));
> return baos.toString();
>
> But I cannot get it. Instead, I obtain the following exception:
>
> [Fatal Error] :8:11: Invalid byte 2 of 2-byte UTF-8 sequence.
> javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: Invalid byte 2 of 2-byte UTF-8 sequence.
>     at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:449)
>     at codificacion.PruebasCodificacion.encodeISO88891(PruebasCodificacion.java:302)
>     at codificacion.PruebasCodificacion.prueba(PruebasCodificacion.java:73)
>     at codificacion.PruebasCodificacion.main(PruebasCodificacion.java:356)
> Caused by: org.xml.sax.SAXParseException: Invalid byte 2 of 2-byte UTF-8 sequence.
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>     at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:432)
>
> Is this process correct? Supposing that it is, it seems the
> exception is due to the 'Ã?' characters ('Á' and 'Í' in UTF-8 encoding),
> so I would like to know how I could UTF-8-encode the 'Á' and 'Í'
> characters and then convert them back to ISO-8859-1 encoding.
>
> Could anybody be so kind as to help me, please?
>
> Thank you very much in advance.
> Inma.