Hi Robert,

How do I get the chance to interpret the characters with HttpCli? I don't set any property whatsoever regarding the encoding of the data I'm receiving. The data in the TStringStream already looks the way I showed in my last message ... How do I get the "raw" data or something?

Cheers,
Marcelo Grossi

----- Original Message -----
From: "Robert Chafer" <[EMAIL PROTECTED]>
To: "ICS support mailing" <twsocket@elists.org>
Sent: Thursday, July 20, 2006 2:30 PM
Subject: Re: [twsocket] HttpCli UTF-8 Coding Issue

It depends on how you interpret the characters you are downloading. Look at this page:

http://www.expansys.fr/

Now change the encoding from ISO8859-1 to UTF-8 (in IE, right-click the page and choose Encoding; in FF, View->Character Encoding). You see how (in IE) the accented characters turn into Chinese? This is because the way you process the characters depends on the encoding used to send them.

On Thu, 20 Jul 2006 14:23:06 -0300, you wrote:

> Hello,
>
> I posted a message a few days ago about an HTML page being retrieved with weird chars (through ICS's HttpCli). As JP rightly suggested in his reply to my message, the page was indeed UTF-8 encoded. But the question remains (as I am currently building a converter for the weird chars as they appear on the captured page ... [yes, very dumb on my part]): how can I get the retrieved characters as UTF-8? I mean, UTF-8 uses more than 1 Byte per char, and on the TStringStream I'm using to retrieve the data from the HttpCli I get mixed-type chars.
>
> All the letters (a..z, A..Z, 0..9 and some other chars) are being retrieved as 1 ASCII Byte, except for some weird chars that are coming in some other format using more than 1 Byte (by more than 1 Byte I don't mean 2 Bytes, I mean 2 or 3 Bytes depending on the case). Below I send you some example strings taken directly from my application:
>
> What I get:
> a história do municÃpio de .. estrela do agronegócio â?oprêmio é acima de tudo o reconhecimento do jornalismo, com foco no cidadão, que estamos fazendo. Ã? o resultado de um trabalho feito dentro de uma empresa pública de comunicaçãoâ?o
>
> What I was supposed to get:
> a história do município de ..
> estrela do agronegócio "prêmio é acima de tudo o reconhecimento do jornalismo, com foco no cidadão, que estamos fazendo. É o resultado de um trabalho feito dentro de uma empresa pública de comunicação"
>
> Note: The weird chars can come in 2 or 3 Bytes. The char " comes as 3 Bytes (â?o). On the other hand, the char É comes in 2 Bytes (Ã?).
> Note 2: The texts are in Brazilian Portuguese.
>
> The question is: Is the problem in the TStringStream, which for some reason is returning some ASCII chars and some UTF-8 chars? Or is the problem that I missed some property of THttpCli, making the retrieved page look so strange? Or does the problem lie somewhere else, far beyond my little knowledge?
>
> Please help! :'(
>
> Best regards,
>
> Marcelo Grossi

--
Rob Chafer
Silverfrost

--
To unsubscribe or change your settings for TWSocket mailing list please goto http://www.elists.org/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be
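
For anyone reading this thread later: the point in the reply (the same bytes read differently depending on the encoding you assume) can be tried out with a small sketch. This is Python, not Delphi, and it says nothing about how ICS or TStringStream behave internally. The function names are made up for illustration, and ISO-8859-1 as the "wrong guess" is an assumption based on the symptoms described in the thread.

```python
def utf8_bytes(text: str) -> bytes:
    """Bytes a UTF-8 server would send for this text."""
    return text.encode("utf-8")


def misread_as_latin1(raw: bytes) -> str:
    """Wrong guess: treat every single byte as one ISO-8859-1 character."""
    return raw.decode("latin-1")


def read_as_utf8(raw: bytes) -> str:
    """Right guess: decode the multi-byte sequences as UTF-8."""
    return raw.decode("utf-8")


def repair(mojibake: str) -> str:
    """Undo the wrong guess: recover the original bytes, then decode as UTF-8."""
    return mojibake.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    # ASCII letters take 1 byte, accented letters 2, the curly quote 3.
    for ch in ["a", "\u00c9", "\u201c"]:
        print(repr(ch), len(utf8_bytes(ch)), "byte(s)")

    raw = utf8_bytes("munic\u00edpio")
    print(repr(misread_as_latin1(raw)))  # garbled: one char per byte
    print(read_as_utf8(raw))             # the intended text
```

The bytes in the stream are the "raw" data in both cases; only the decoding step differs. That is why plain ASCII letters look fine under either interpretation while accented letters and curly quotes turn into 2 or 3 odd characters.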