UTF-16 is a poor choice for this XML: it takes at least two bytes per
character, is byte order sensitive, and tools that expect ASCII-compatible
bytes will not recognize the XML tags.
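UTF-8 is the correct encoding!  Any 31-bit character in the ISO 10646
specification can be represented in UTF-8 as originally defined (up to six
bytes per character).  The original 16-bit Unicode repertoire corresponds to
the first 65,536 code points of ISO 10646 (the Basic Multilingual Plane).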
A CJK character with a code point value of 0x6123 is represented in UTF-8 as
the three bytes E6 84 A3.
What byte values are you seeing for the encoding of a given Chinese code
point?
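
As a quick way to check, here is a minimal sketch that prints the UTF-8
bytes for a code point so they can be compared with what the client
receives. The class name Utf8Check and the helper toHex are made up for
illustration and are not part of the code quoted below. It assumes a JDK
recent enough to have java.nio.charset.StandardCharsets; on an older JDK,
getBytes("UTF-8") would be the equivalent call.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Hypothetical helper: returns the UTF-8 bytes of a single code point
    // as space-separated uppercase hex.
    static String toHex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values format as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(0x6123)); // prints "E6 84 A3"
    }
}
```

If the bytes arriving at the client for this character differ from
E6 84 A3, the difference should hint at where the encoding is being
altered.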

----- Original Message -----
From: Zhu Ming <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, February 05, 2001 4:24 AM
Subject: RE: serializing XML to a ServletOutputStream fails


> Hi,
>
> Maybe you should not use the character set "UTF-8". I remember
> that it's 8-bit Unicode. As far as I know, Chinese and Korean have
> 16-bit codes. So at least you should try 16-bit Unicode.
> I forgot the name; maybe it's "UTF-16". But I'm not sure whether
> the JDK fully supports "UTF-16".
>
> I'm not a Unicode expert. I'll be happy if what I say can
> be a hint toward solving this problem.
>
> Ming
>
>
> -----Original Message-----
> From: Michael Mealling [mailto:[EMAIL PROTECTED]]
> Sent: Monday, February 05, 2001 03:04
> To: [EMAIL PROTECTED]
> Subject: serializing XML to a ServletOutputStream fails
>
>
> (This might be a bug so I'm cc-ing to tomcat-dev)
> Hi,
>     I'm trying to serialize some XML out to a ServletOutputStream but
> the resulting XML on the client side contains corrupted Unicode
> characters (the DOM I'm serializing out contains Chinese, Korean,
> English, etc). Here's the code in question:
>
>         response.setContentType("text/xml; charset=UTF-8");
>         ServletOutputStream out = response.getOutputStream();
>
>         out.print("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
>                    "<!DOCTYPE cnrp PUBLIC \"-//IETF//DTD CNRP 1.0//EN\"" +
>                    " \"http://www.ietf.org/cnrp.dtd\">\n");
>         out.flush();
>         OutputFormat format = new OutputFormat(document);
>         format.setOmitXMLDeclaration(true);
>         format.setIndenting(true); // it makes debugging easier
>         format.setEncoding("UTF-8"); // this is the default anyway
>         XMLSerializer serializer = new XMLSerializer(out, format);
>         serializer.serialize(document.getDocumentElement());
>
> The XML that the client gets is fine except that the non-ASCII subset
> of the UTF-8 encoded Unicode characters are garbled. I can serialize
> the XML out to a FileOutputStream and it works just fine.
>
> I'm running Tomcat 3.2.1 as the backend for a remote
> Apache 1.3.17 server using ajp13 (and thus mod_jk).
>
> This code looks like it's the right way to do this, but either
> I've hit a bug or I'm missing something (an encoding step somewhere
> between a Stream and a Writer?)
>
> -MM
>
> --
> --------------------------------------------------------------------------
> Michael Mealling |      Vote Libertarian!       | www.rwhois.net/michael
> Sr. Research Engineer   |   www.ga.lp.org/gwinnett     | ICQ#: 14198821
> Network Solutions |          www.lp.org          |  [EMAIL PROTECTED]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, email: [EMAIL PROTECTED]

