This looks good.

Also, WRT utf8_t, utf16_t, and utf32_t: can we not just use utf32_t and
then mask it down to the lower 8 or 16 bits?  We can still define utf8_t
as char so that sizeof works correctly, and use sizeof(utf8_t)*2 to get
the size of utf16_t.
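
A rough sketch of the masking idea, in Python for brevity. The names
MASKS and narrow are made up for illustration and are not from the
patch; the sketch only models the width of a stored unit and does no
transcoding.

```python
# Hypothetical table: how many low bits each unit type keeps.
MASKS = {"utf8": 0xFF, "utf16": 0xFFFF, "utf32": 0xFFFFFFFF}


def narrow(unit, encoding):
    """Keep only the low bits that a unit of the given type can hold."""
    return unit & MASKS[encoding]
```

For example, narrow(0x12345678, "utf16") keeps just the low 16 bits of
the 32-bit value.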

-----Original Message-----
From: Tom Hughes
To: [EMAIL PROTECTED]
Sent: 10/8/2001 6:51 PM
Subject: RE: Transcoding patch

In message
<[EMAIL PROTECTED]>
          Gibbs Tanton - tgibbs <[EMAIL PROTECTED]> wrote:

> This is good, unless someone has objections I'll commit this. However, we
> also need the ability to do unicode in the assembler (I'll do this later
> today if no one beats me to it), and we need some way to communicate the
> encoding number between the C and the Perl code.

The attached patch solves the assembler issue by allowing quoted
strings to be prefixed with U8, U16 or U32 to indicate a unicode
string of the appropriate type, so:

  set_s_sc S1, U8"Hello World"

creates a UTF-8 string in S1 containing the specified data. I don't
particularly like that syntax so if anybody has any better ideas
then please say... Most of the patch is useful whatever the syntax
though - it will just need tweaking to recognise the appropriate
syntax.
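
To make the prefix idea concrete, here is a minimal sketch of how a
tokenizer might split off the prefix and encode the string body. It is
in Python rather than the assembler's own code, and everything in it is
an assumption for illustration: the names CODECS, parse_literal and
encode_literal, the little-endian byte order for U16/U32, and the
single-byte treatment of unprefixed strings.

```python
import re

# Assumed mapping from the prefixes in the message to Python codec names.
# Byte order for U16/U32 is a guess (little-endian, no BOM).
CODECS = {"U8": "utf-8", "U16": "utf-16-le", "U32": "utf-32-le"}

# Optional prefix, then a double-quoted body with no embedded quotes.
LITERAL = re.compile(r'(U8|U16|U32)?"([^"]*)"')


def parse_literal(token):
    """Split a quoted string token into (prefix or None, body)."""
    m = LITERAL.fullmatch(token)
    if m is None:
        raise ValueError(f"not a string literal: {token!r}")
    return m.group(1), m.group(2)


def encode_literal(token):
    """Encode the body according to its prefix."""
    prefix, body = parse_literal(token)
    # Assumption: unprefixed strings are stored one byte per character.
    return body.encode(CODECS.get(prefix, "latin-1"))
```

With this sketch, parse_literal('U8"Hello World"') gives the pair
("U8", "Hello World"), and a different prefix syntax would only change
the LITERAL pattern.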

The patch also adds support for \x escapes in strings as it is
difficult to write unicode string constants without that.
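
As an illustration of what \x expansion could look like, here is a
small Python sketch. The exact escape forms the patch accepts are not
stated above, so the two forms handled here (\xHH and \x{H...}) are
borrowed from Perl as an assumption, and the name expand_x_escapes is
hypothetical.

```python
import re

# Assumed escape forms (borrowed from Perl, not confirmed by the patch):
#   \xHH      exactly two hex digits
#   \x{H...}  one to six hex digits in braces
ESCAPE = re.compile(r'\\x(?:\{([0-9A-Fa-f]{1,6})\}|([0-9A-Fa-f]{2}))')


def expand_x_escapes(text):
    """Replace each \\x escape in text with the character it names."""
    def repl(m):
        # Exactly one of the two groups matched; chr() raises ValueError
        # for values beyond U+10FFFF.
        return chr(int(m.group(1) or m.group(2), 16))
    return ESCAPE.sub(repl, text)
```

A string body can be run through expand_x_escapes before it is encoded,
so the same escape works for U8, U16 and U32 strings.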

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
 <<utfstr.patch>> 
