> >Here is an example, "re`sume`" takes 6 characters in Latin-1, but
> >could take 8 characters in Unicode. All Perl functions that directly
> >deal with character position and length will be sensitive to encoding.
> >I wonder how we should handle this case.
>
> My first inclination is to force normalization on any data we manipulate.

That was one of the reasons I proposed UTF-8 string encoding. If we don't do normalization (that is, if we keep multiple encodings), we have to avoid using character position, string length and ord(), since they are encoding-specific. Perl users will face all kinds of problems when they try to deal with individual characters.

In any case, we need to make sure that regexes do not have any problems with normalization.

Hong
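To make the length and position problem concrete, here is a small illustration. It uses Python's standard `unicodedata` module purely as a convenient stand-in (it is not Perl, and says nothing about how Perl itself would implement strings). It assumes "re`sume`" in the quote means "résumé", written once with precomposed U+00E9 (as Latin-1 stores it) and once in decomposed form (e + U+0301 COMBINING ACUTE ACCENT).

```python
import unicodedata

# Precomposed form: each "é" is a single code point U+00E9, as in Latin-1.
composed = "r\u00e9sum\u00e9"

# Decomposed form (NFD): each "é" becomes "e" + U+0301 combining acute accent.
decomposed = unicodedata.normalize("NFD", composed)

# Same visible text, different length.
print(len(composed))    # 6
print(len(decomposed))  # 8

# Position-based access and ord() disagree between the two forms.
print(hex(ord(composed[1])))    # 0xe9 (precomposed é)
print(hex(ord(decomposed[1])))  # 0x65 (plain "e"; the accent is at index 2)

# A naive comparison sees two different strings...
print(composed == decomposed)  # False

# ...but normalizing both to the same form (here NFC) makes them equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

Normalizing all data to a single form on the way in, as suggested above, is what lets length(), substr()-style positions and ord() give one consistent answer.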