This looks good. Also, WRT utf8_t, utf16_t, and utf32_t: can we not just use utf32_t and then mask it down to the lower 8 or 16 bits? We can still define utf8_t as char so that sizeof works right, and use sizeof(utf8_t)*2 to get the size of a utf16_t.
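To make the suggestion concrete, here is a minimal sketch of what I have in mind. The typedefs and the helper names (as_utf8_unit, as_utf16_unit) are made up for illustration and are not the definitions in the tree. It only shows truncating a stored value to a code-unit width; it does no transcoding.

```c
#include <stdint.h>

/* Hypothetical typedefs for illustration only; the real ones may differ. */
typedef char     utf8_t;   /* kept as char so sizeof(utf8_t) is 1 */
typedef uint32_t utf32_t;  /* the single storage type being proposed */

/* Keep only the low 8 bits of a stored value (one UTF-8 code unit). */
static utf32_t as_utf8_unit(utf32_t v)
{
    return v & 0xFFu;
}

/* Keep only the low 16 bits of a stored value (one UTF-16 code unit). */
static utf32_t as_utf16_unit(utf32_t v)
{
    return v & 0xFFFFu;
}

/* Size of a UTF-16 code unit derived from utf8_t, as suggested above. */
enum { UTF16_UNIT_SIZE = sizeof(utf8_t) * 2 };
```

With these, as_utf16_unit(0x12345678) gives 0x5678 and UTF16_UNIT_SIZE is 2.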
-----Original Message-----
From: Tom Hughes
To: [EMAIL PROTECTED]
Sent: 10/8/2001 6:51 PM
Subject: RE: Transcoding patch

In message <[EMAIL PROTECTED]>
        Gibbs Tanton - tgibbs <[EMAIL PROTECTED]> wrote:

> This is good, unless someone has objections I'll commit this. However, we
> also need the ability to do unicode in the assembler (I'll do this later
> today if no one beats me to it), and we need some way to communicate the
> encoding number between the C and the Perl code.

The attached patch solves the assembler issue by allowing quoted
strings to be prefixed with U8, U16 or U32 to indicate a unicode
string of the appropriate type, so:

  set_s_sc S1, U8"Hello World"

creates a UTF-8 string in S1 containing the specified data.

I don't particularly like that syntax so if anybody has any better
ideas then please say... Most of the patch is useful whatever the
syntax though - it will just need tweaking to recognise the
appropriate syntax.

The patch also adds support for \x escapes in strings as it is
difficult to write unicode string constants without that.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

 <<utfstr.patch>>
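
For anyone following along without the patch, recognising the U8/U16/U32 prefix described above could look roughly like this. It is a sketch in C under my own assumptions, not the code from utfstr.patch; the function name unicode_prefix_width is hypothetical, and \x escape handling is left out.

```c
#include <string.h>

/* Hypothetical helper, not taken from the patch.
 * Returns the code-unit width (8, 16 or 32) named by a U8/U16/U32 prefix
 * that sits directly before an opening double quote, or 0 if the token has
 * no such prefix. On return *body points at the opening quote when a prefix
 * was found, and at the start of the token otherwise. */
static int unicode_prefix_width(const char *tok, const char **body)
{
    static const struct { const char *prefix; int width; } table[] = {
        { "U8\"", 8 }, { "U16\"", 16 }, { "U32\"", 32 },
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        size_t n = strlen(table[i].prefix);
        if (strncmp(tok, table[i].prefix, n) == 0) {
            *body = tok + n - 1;  /* leave the pointer on the quote */
            return table[i].width;
        }
    }
    *body = tok;
    return 0;
}
```

Called on the operand U8"Hello World", this returns 8 and leaves *body pointing at the opening quote, so the ordinary quoted-string code can take over from there.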