Re: [twsocket] HttpCli UTF-8 Coding Issue

Arno Garrels Fri, 21 Jul 2006 07:02:41 -0700

Robert Chafer wrote:
> the first 128 code points of UTF-8 (the 7-bit values) are ASCII; it
> uses bytes with the high bit set, in multi-byte sequences, to
> represent all the other Unicode characters.  Take a look at the JEDI
> library; it has converters.
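
To make the byte layout concrete, here is a minimal sketch in Python (not
Delphi, purely for illustration). The helper name utf8_info and the sample
characters are assumptions made up for this example. It shows that an ASCII
character takes one byte with the high bit clear, while other characters
take two to four bytes, each with the high bit set.

```python
# Illustration only: how many bytes UTF-8 uses for a few sample characters.
# utf8_info is a hypothetical helper, not part of any library.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji


def utf8_info(ch):
    """Return (byte_count, hex_bytes, high_bit_set_on_every_byte) for one character."""
    encoded = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    high = all(b & 0x80 for b in encoded)
    return len(encoded), hex_bytes, high


for ch in samples:
    count, hex_bytes, high = utf8_info(ch)
    print(f"U+{ord(ch):04X}: {count} byte(s) [{hex_bytes}] high bit on every byte: {high}")
```

A reader that treats each byte as one character will therefore show two or
more "weird" characters wherever the page contains a non-ASCII one.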


This easy-to-understand article may help as well:
http://www.joelonsoftware.com/articles/Unicode.html
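
On the point made further down the thread, that a browser interprets both
the document and the header, the following is a rough sketch in Python (for
illustration only, not ICS code). It reads the charset from a Content-Type
header value, decodes the raw bytes accordingly, and then approximates the
text in ASCII. The header value, the sample bytes, the helper names and the
ISO-8859-1 fallback are all assumptions chosen for this example.

```python
import unicodedata
from email.message import Message


def charset_from_content_type(header_value, default="iso-8859-1"):
    """Extract the charset parameter from a Content-Type header value.

    The fallback charset is an assumption for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)


def to_ascii_approx(text):
    """Approximate text in ASCII: drop accents, replace anything else with '?'."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.encode("ascii", "replace").decode("ascii")


# Hypothetical response: a header value and the raw body bytes as stored in a stream.
content_type = "text/html; charset=UTF-8"
raw_body = b"Jo\xc3\xa3o pagou 5 \xe2\x82\xac"

text = raw_body.decode(charset_from_content_type(content_type))
print(to_ascii_approx(text))  # prints "Joao pagou 5 ?"
```

Accented letters survive as their base letter; characters with no ASCII
counterpart (here the euro sign) become "?", so the conversion is lossy by
design.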

---
Arno Garrels [TeamICS]
http://www.overbyte.be/eng/overbyte/teamics.html


> 
> On Fri, 21 Jul 2006 10:25:17 -0300, you wrote:
> 
>>  Thank you all for your answers,
>> 
>>  I found the error. It was, as most of you have probably realized
>>  by now, me! : ) I read the UTF-8 specs on Wiki and it says clearly
>>  to my face: "uses up to 4 bytes per character depending on the
>>  character ...". Dunno how I missed that ..
>>  So, what I have to do now is find a UTF-8 to ASCII converter (by
>>  approximation of course) or build one (which I was already doing).
>>  Anyways, thanks to all of you folks that took some time to answer
>>  me!
>> 
>>  Really appreciate it!
>> 
>>  Marcelo Grossi
>> 
>>  ----- Original Message -----
>>  From: "Francois PIETTE" <[EMAIL PROTECTED]>
>>  To: "ICS support mailing" <twsocket@elists.org>
>>  Sent: Friday, July 21, 2006 4:44 AM
>>  Subject: Re: [twsocket] HttpCli UTF-8 Coding Issue
>> 
>> 
>>  >> With the HTTP component, you always get the data exactly as the
>>  >> server sent it. The HTTP component does not do any processing on
>>  >> the data itself. It is stored as is in the stream you provide
>>  >> for storage.
>> 
>>  > Then how come Mozilla Firefox doesn't have this weird char
>>  > problem?
>> 
>>  Firefox is much more than an HTTP component. It has an engine which
>>  interprets the document AND the header sent by the server.
>> 
>>  > I just used a TMemoryStream instead of using my old TStringStream,
>>  > debugged the contents of the Buffer and it is as buggy as it was.
>> 
>>  How do you know it is buggy? I'm sure the problem is that you don't
>>  interpret the data as it is encoded. There are many, many ways to
>>  represent characters. This is not only about the encoding used (one
>>  byte, two bytes, multiple bytes, varying number of bytes) but also
>>  about character sets (the mapping between a given code and the
>>  character "image").
>> 
>>  > How come the server is sending me something and the browser
>>  > something else?
>> 
>>  The browser doesn't send anything. The browser interprets what the
>>  server sent.
>>  It may happen that the server doesn't send the same thing to your
>>  program as it sends to the browser. Why? Because an HTTP request is
>>  composed of a URL but also a header with many kinds of information
>>  the client gives to help the server send the correct content.
>> 
>>  Use a sniffer to compare the request the browser sends (pay
>>  attention to the header lines) and what the server returns. Build
>>  the same request with the HTTP component and verify that the server
>>  sends the exact same content (it will for sure if the request is the
>>  same in all details).
>> 
>> 
>>  > Because I truly don't believe that Mozilla Firefox is parsing
>>  > that kind of data. It doesn't even respect the same number of
>>  > bytes per char ...). I don't get it.. Me stupid!!! 8/
>> 
>>  I'm sure the browser parses the data and the header to show you the
>>  correct page.
>> 
>>  Contribute to the SSL Effort. Visit http://www.overbyte.be/eng/ssl.html
>>  --
>>  [EMAIL PROTECTED]
>>  http://www.overbyte.be
>> 
>> 
>>  --
>>  To unsubscribe or change your settings for TWSocket mailing list
>>  please goto http://www.elists.org/mailman/listinfo/twsocket
>>  Visit our website at http://www.overbyte.be
> --
> 
> Rob Chafer
> Silverfrost
