> struct perl_string {
> void *string_buffer;
> UV length;
> UV allocated;
> UV flags;
> }
>
> The low three bits of the flags field is reserved for the type of the
> string. The various types are:
>
> =over 4
>
> =item BINARY (0)
>
> =item ASCII (1)
>
> =item EBCDIC (2)
>
> =item UTF_8 (3)
>
> =item UTF_32 (4)
>
> =item NATIVE_1 (5) through NATIVE_3 (7)
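To make the quoted layout concrete, here is a minimal sketch of how the type could be read from and written to the low three bits of the flags field. The enum names, the mask macro, the helper functions, and the `unsigned long` stand-in for UV are all assumptions made for illustration; the draft itself gives only the numeric values.

```c
#include <assert.h>

/* Stand-in for Perl's UV; the real typedef is platform dependent. */
typedef unsigned long UV;

/* Hypothetical names for the type codes listed in the quoted draft. */
enum perl_string_type {
    PERL_STR_BINARY   = 0,
    PERL_STR_ASCII    = 1,
    PERL_STR_EBCDIC   = 2,
    PERL_STR_UTF_8    = 3,
    PERL_STR_UTF_32   = 4,
    PERL_STR_NATIVE_1 = 5,
    PERL_STR_NATIVE_2 = 6,
    PERL_STR_NATIVE_3 = 7
};

/* The low three bits of flags hold the string type. */
#define PERL_STR_TYPE_MASK 0x7UL

/* Extract the type code, ignoring any other flag bits. */
enum perl_string_type perl_string_type_of(UV flags)
{
    return (enum perl_string_type)(flags & PERL_STR_TYPE_MASK);
}

/* Return flags with the type replaced and all other bits preserved. */
UV perl_string_set_type(UV flags, enum perl_string_type type)
{
    assert((unsigned)type <= 7u);
    return (flags & ~PERL_STR_TYPE_MASK) | (UV)type;
}
```

With these helpers, `perl_string_type_of(0x1BUL)` yields `PERL_STR_UTF_8`, since only the low three bits (binary 011) are consulted.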
Some thoughts about string encoding. Because of Unicode normalization
and canonical equivalence, a character that takes one code point in one
representation may take two or more in another; this mainly affects
vowels with diacritics. As a result, substr() may give different
results depending on the string's current encoding.

Here is an example: "résumé" takes 6 characters in Latin-1, but could
take 8 code points in decomposed Unicode, where each "é" becomes "e"
followed by a combining accent. All Perl functions that deal directly
with character position and length will be sensitive to encoding.
I wonder how we should handle this case.
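The length difference can be checked with a small sketch like the following. It hard-codes the bytes for "résumé" in Latin-1, in precomposed UTF-8, and in decomposed UTF-8 (using U+0301, the combining acute accent), and counts code points by skipping UTF-8 continuation bytes. The function and array names are made up for illustration, and the counter assumes well-formed UTF-8 input; it performs no normalization itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated, well-formed UTF-8 string.
 * Continuation bytes have the bit pattern 10xxxxxx and are skipped. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* "résumé" in Latin-1: one byte per character (e-acute is 0xE9). */
const char RESUME_LATIN1[] = "r\xE9" "sum\xE9";

/* Precomposed UTF-8: e-acute is U+00E9, encoded as C3 A9. */
const char RESUME_PRECOMPOSED[] = "r\xC3\xA9" "sum\xC3\xA9";

/* Decomposed UTF-8: 'e' followed by U+0301, encoded as CC 81. */
const char RESUME_DECOMPOSED[] = "re\xCC\x81" "sume\xCC\x81";
```

Here `strlen(RESUME_LATIN1)` is 6, `utf8_codepoints(RESUME_PRECOMPOSED)` is 6, and `utf8_codepoints(RESUME_DECOMPOSED)` is 8, so a position-based function such as substr() would see different offsets for the same visible text.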
Hong
