> However, as long as this codepage was one of the windows-xyz,
> single byte character sets converting back to Ansi with the
> same codepage should work without data loss and give you back
> the raw bytes (hopefully). This won't work, for example, with
> Japanese locale settings.
But there is a string type for that purpose. It is called RawByteString. It is defined as AnsiString($ffff), which in effect means it is an AnsiString with no encoding attached to it, so you can use it to transfer data to and from functions and avoid codepage conversions. Yes, by default the system code page is used for conversions to Unicode.

RawByteString is a byte-based string type, but unlike AnsiString it does not have a specific encoding attached to it. That means it can be used to pass values to and from functions that do Unicode conversions. It is not intended to be used for storing data, mostly just for the input/output of functions, as the official documentation specifies.

So my best bet is that it would be best to receive the raw byte buffer (unsigned char or BYTE type), place it into a RawByteString and return that value. This should avoid conversions. In your own functions you can declare RawByteString as the input parameter type and use conversion functions to convert from RawByteString to any codepage you like (or store it as binary data). There are some functions that do this; I think the ones you need are SetCodePage() and StringCodePage().

Other than that, AnsiString can be defined with various codepages. For example, you can declare a typedef AnsiStringT<28591> Latin1String; and store data in the Latin1String type. This ensures that conversions always use the same codepage and do not depend on the system code page. But I think RawByteString is better for this purpose.

Both AnsiString and RawByteString are made of 8-bit char elements, unlike UnicodeString, which uses 16-bit elements. (UTF8String is also 8-bit based, but a single character can take several bytes.)

Take a look here:
http://www.micro-isv.asia/2008/08/using-rawbytestring-effectively/

> Note that in MIME headers 8-bit characters are not allowed!
> Email clients must encode 8-bit characters in header lines
> properly.
> MIME text parts might include 8-bit characters with
> a charset specified in the content-type header, in those cases
> the (AnsiString) text content has to be converted to Unicode
> with the charset specified.

Yes, no 8-bit characters are allowed in headers, but you still have to receive that buffer, and as far as I know there is currently no way to get the buffer contents directly.
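To illustrate the idea of keeping the received bytes untouched and converting only once the charset is known, here is a minimal sketch in plain standard C++ (not the C++Builder RTL and not ICS code). It assumes the charset is ISO-8859-1 (code page 28591), where each byte maps to the Unicode code point of the same value. The function names latin1_to_utf8 and utf8_to_latin1 are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode ISO-8859-1 bytes into UTF-8.
// Each Latin-1 byte equals its Unicode code point (U+0000..U+00FF),
// so bytes >= 0x80 become a two-byte UTF-8 sequence.
std::string latin1_to_utf8(const std::string& raw) {
    std::string out;
    for (unsigned char b : raw) {
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}

// Hypothetical helper: encode UTF-8 back into ISO-8859-1.
// Returns false if the text contains a code point above U+00FF
// (or a malformed sequence), i.e. when the round trip would lose data.
bool utf8_to_latin1(const std::string& utf8, std::string& raw) {
    raw.clear();
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c < 0x80) {
            raw.push_back(static_cast<char>(c));
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < utf8.size()) {
            unsigned char d = static_cast<unsigned char>(utf8[i + 1]);
            if ((d & 0xC0) != 0x80) return false;  // not a continuation byte
            raw.push_back(static_cast<char>(((c & 0x03) << 6) | (d & 0x3F)));
            ++i;  // consumed the continuation byte
        } else {
            return false;  // outside the Latin-1 range
        }
    }
    return true;
}
```

With the raw bytes 63 61 66 E9 ("café" in Latin-1) the first function gives 63 61 66 C3 A9, and the second one returns the original four bytes. A character outside the charset, such as the euro sign, makes the reverse conversion fail instead of silently turning into '?', which is the kind of data loss the quoted text warns about for mismatched codepages.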