Re: [twsocket] pop3, buffer and character encoding

Zvone Fri, 02 Jul 2010 09:56:39 -0700

> However, as long as this codepage was one of the windows-xyz,
> single byte character sets converting back to Ansi with the
> same codepage should work without data loss and give you back
> the raw bytes (hopefully). This won't work, for example, with
> Japanese locale settings.


But there is a string type for that purpose. It is called
RawByteString. It is defined as AnsiString($ffff), which in effect
means it is an AnsiString with no encoding attached to it, so you can
use it to pass data in and out of functions and avoid codepage conversions.

Yes, by default it uses the default system code page for conversions to Unicode.

RawByteString is a single-byte character type, but unlike AnsiString it
does not have a specific encoding attached to it. That means it can
be used to pass values to and from functions that do Unicode
conversions. It is not intended to be used for storing data, mostly
just for the input/output of functions, as the official
documentation specifies.

So my best guess is that it would be best to receive the raw byte buffer
(unsigned char or BYTE type), place it into a RawByteString and
return that value. This should avoid conversions.
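
To show what I mean by "place it into a string without conversion",
here is a minimal sketch in standard C++. It is only an illustration:
std::string stands in for RawByteString (both are just bytes with no
code page attached), and BufferToRawBytes is a made-up helper name, not
something from ICS or the VCL.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copy a received buffer into a byte string verbatim.
// std::string is used here as a stand-in for RawByteString; it carries no
// code page, so nothing is converted on assignment or on return.
std::string BufferToRawBytes(const unsigned char* buf, std::size_t len)
{
    if (buf == nullptr || len == 0)
        return std::string();
    // The (pointer, length) constructor copies exactly len bytes,
    // including embedded zero bytes and bytes above 0x7F.
    return std::string(reinterpret_cast<const char*>(buf), len);
}
```

The point is that the bytes 0x80..0xFF and even 0x00 come out exactly as
they went in, and the decision about which charset to decode with is
left to the caller.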

In your own functions you can take RawByteString as the input type and use
conversion functions to convert from RawByteString to any codepage you
like (or store it as binary data). There are some functions that do
this; I think the ones you need are SetCodePage() and StringCodePage().

Other than that, AnsiString can be declared with various codepages. For
example, you can declare a
typedef AnsiStringT<28591> Latin1String; and store data in the
Latin1String type. This ensures that the codepage conversions
always use the same codepage and do not depend on the system code
page. But I think RawByteString is better for this purpose.
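
Why a fixed Latin-1 codepage (28591, i.e. ISO-8859-1) is safe for this:
every byte value maps to the Unicode code point with the same number, so
a round trip through Unicode cannot lose anything. A rough sketch in
standard C++ follows; the function names are made up for illustration,
and std::string / std::u16string stand in for the AnsiString and
UnicodeString types.

```cpp
#include <cassert>
#include <string>

// Decode ISO-8859-1 bytes: each byte value equals its Unicode code point,
// so widening the byte is the whole conversion.
std::u16string Latin1ToUtf16(const std::string& bytes)
{
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes)
        out.push_back(static_cast<char16_t>(b));
    return out;
}

// Encode back to ISO-8859-1. Returns false if the text contains a code
// unit above U+00FF, which Latin-1 cannot represent.
bool Utf16ToLatin1(const std::u16string& text, std::string& out)
{
    out.clear();
    out.reserve(text.size());
    for (char16_t c : text) {
        if (c > 0xFF)
            return false;
        out.push_back(static_cast<char>(static_cast<unsigned char>(c)));
    }
    return true;
}
```

Feeding all 256 byte values through Latin1ToUtf16 and back through
Utf16ToLatin1 gives the original bytes again, which is not guaranteed
when the conversion uses whatever the system code page happens to be.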

Both AnsiString and RawByteString are made of single-byte elements, unlike
UnicodeString, which uses 16-bit elements. UTF8String is also byte-based,
but one character can take several bytes.

Take a look here:
http://www.micro-isv.asia/2008/08/using-rawbytestring-effectively/

> Note that in MIME headers 8-bit characters are not allowed!
> Email clients must encode 8-bit characters in header lines
> properly. MIME text parts might include 8-bit characters with
> a charset specified in the content-type header, in those cases
> the (AnsiString) text content has to be converted to Unicode
> with the charset specified.

Yes, no 8-bit in headers, but you still have to receive that buffer, and
as far as I know there is no way to get the buffer contents directly
at the moment.
