Hi,

I think the original poster needs some help with Java and encodings, so I'll take the liberty of adding some (simplified) background here. (Apologies, since this is pure Java and outside the scope of Tomcat.)

A Java char is a 16-bit UTF-16 code unit, and a String is a sequence of those. Because UTF-16 is an encoding of Unicode, practically any character from any of the world's writing systems can be represented in a Java String (characters outside the Basic Multilingual Plane take two chars, a surrogate pair).

The challenge here is that the original file was not written using UTF-8 but most likely a Chinese encoding such as GBK or GB2312, which means that the actual binary data in the file is different from the binary data you would have had if the file were encoded in UTF-8.

To turn the bytes of a file into the correct characters, Java simply must know the encoding the file was originally written with; there is no other way. When the file is read, what you get in the first place is a byte[]. Java comes with Reader classes (InputStreamReader, for example) that convert bytes to chars for you using a charset you name or the platform default, but none of them is capable of "encoding guessing".

What is finally happening is:

    byte[] bytes = ....  // your raw bytes here
    String s = new String(bytes, "UTF-8");   // garbage due to wrong encoding

The problem in the original poster's case is that the byte[] above contains the bytes as they were originally written, so in order to reconstruct the original characters, you need to name that encoding here:

    String s = new String(bytes, "GB2312");  // no garbage if file was encoded with GB2312

GBK, which the original poster already reports as working, is a superset of GB2312. IIRC, GB2312 is a superset of ASCII (not of ISO-8859-1), so plain ASCII text decodes the same way, but every byte above 0x7F means something different than it does in UTF-8 or ISO-8859-1, which is why you get garbage.

HTH,
Robert

> -----Original Message-----
> From: David Delbecq [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 18, 2005 12:08 PM
> To: Tomcat Users List
> Subject: Re: Character Encoding -ISo-8859-1 Vs UTF-8 Vs GBK
>
> Hi,
>
> UTF-8 can handle European and Chinese characters very well.
> If you can't read either of them using UTF-8, this simply
> means your text file is not saved in UTF-8.
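
To make the quoted point concrete (if decoding as UTF-8 fails, the file is not UTF-8), here is a minimal sketch of a strict decoding check. It is only an illustration under some assumptions: the class name EncodingCheck and the methods decodeStrict and isValid are made up for this example, and it assumes the JRE ships the GBK charset, which standard JDK builds do but which is not guaranteed everywhere. Unlike new String(bytes, charset), which silently substitutes replacement characters, a CharsetDecoder configured with CodingErrorAction.REPORT throws on malformed input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EncodingCheck {

    // Decodes strictly: malformed or unmappable input raises an exception
    // instead of being replaced with U+FFFD.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if the bytes form a well-formed sequence in the given charset.
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            decodeStrict(bytes, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Charset gbk = Charset.forName("GBK"); // assumption: available in this JRE

        // Two Chinese characters plus ASCII, written with Unicode escapes so the
        // example does not depend on the encoding of this source file.
        String original = "\u4e2d\u6587 abc";
        byte[] gbkBytes = original.getBytes(gbk); // stands in for the file content

        System.out.println(isValid(gbkBytes, StandardCharsets.UTF_8));      // false
        System.out.println(decodeStrict(gbkBytes, gbk).equals(original));   // true
        // ISO-8859-1 maps every byte to some character, so it never fails,
        // even though the result here is not the original text.
        System.out.println(isValid(gbkBytes, StandardCharsets.ISO_8859_1)); // true
    }
}
```

Note that the check only works in one direction: a failure proves the bytes are not in that encoding, but success does not prove they are, as the ISO-8859-1 line shows. You still have to know which encoding the file was written with.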
>
> [EMAIL PROTECTED] wrote:
>
> >Hi,
> >I am trying to read universal characters from a text file into my Java
> >application, which stores them in a database. When I use encoding type
> >"GBK" I can read all special characters in Chinese; when I use encoding
> >"ISO-8859-1" I can read Latin but not Chinese; but when I use encoding
> >"UTF-8", where I think I am supposed to read both Chinese and Latin
> >correctly, I am not able to read either of them. Can anyone give me
> >pointers to a solution?
> >Further, the beta is converted to ss in Latin-1.
> >
> >thanks in advance
> >Birendar S Waldiya
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: [EMAIL PROTECTED]
> >For additional commands, e-mail: [EMAIL PROTECTED]