One more comment: we should take into account that most data will not use 
combining characters, and we should optimize for that case.

Most text will consist solely of characters with combining class = 0.

We can therefore scan backwards, copying characters while cc=0.

Only if we see a non-zero cc do we need to do anything special for combining 
chars.
In that case you can use the breakiterator to continue, or if you prefer to do 
it on your own, keep scanning backwards over characters with cc<>0 until you 
reach the next character with cc=0, and then copy that one base character and 
its trailing combining chars to the end of the result string.
Repeat until the beginning of the string.
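
To make the steps above concrete, here is a rough sketch. It is in Python only
so that it runs without ICU: unicodedata.combining() stands in for
u_getCombiningClass(), and the function name reverse_with_combining is made up
for this example. It is an illustration of the idea, not the proposed strrev()
code.

```python
import unicodedata


def reverse_with_combining(text: str) -> str:
    """Reverse text, keeping each base character ahead of its combining marks.

    Hypothetical helper for illustration; unicodedata.combining() is used
    as a stand-in for ICU's u_getCombiningClass().
    """
    out = []
    i = len(text) - 1
    while i >= 0:
        if unicodedata.combining(text[i]) == 0:
            # Fast path: cc=0, so copy the single character as-is.
            out.append(text[i])
            i -= 1
            continue
        # Slow path: text[i] is a combining mark. Scan backwards over all
        # characters with cc<>0 until the base character (cc=0) is found.
        end = i + 1
        while i >= 0 and unicodedata.combining(text[i]) != 0:
            i -= 1
        # If the string begins with marks and has no base, start at index 0.
        start = max(i, 0)
        # Copy the base character and its trailing marks in original order.
        out.append(text[start:end])
        i = start - 1
    return "".join(out)
```

With Andrei's example sequence (a + U+0304 + U+0325, classes 0 + 230 + 220)
followed by "b", the result is "b" followed by the untouched a + U+0304 +
U+0325 sequence. Note that this only looks at the combining class, so
sequences held together by characters with cc=0 (a zero width joiner, for
instance) are still split; the breakiterator is the safer choice for those.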

Tex Texin
Internationalization Architect,   Yahoo! Inc.
 
 


> -----Original Message-----
> From: Tex Texin [mailto:[EMAIL PROTECTED] 
> Sent: Monday, August 22, 2005 5:14 PM
> To: 'Andrei Zmievski'; 'Rolland Santimano'
> Cc: 'PHP Developers Mailing List'
> Subject: RE: [PHP-DEV] Re: strrev() impl - Re: [PHP-DEV] Re: 
> [PHP-CVS] cvs: php-src /ext/standard string.c 
> 
> 
> Or use the breakiterator function. Also allows for locale 
> differences and some special cases. 
> http://icu.sourceforge.net/apiref/icu4c/classBreakIterator.html#_details
> 
> Although we might prefer it to be locale independent.
> 
> Separately, I was wondering if we should create a variation 
> of the explode function that splits strings into characters 
> representing whole graphemes (base+combining characters).
> 
> 
> > -----Original Message-----
> > From: Andrei Zmievski [mailto:[EMAIL PROTECTED]
> > Sent: Monday, August 22, 2005 10:12 AM
> > To: Rolland Santimano
> > Cc: 'PHP Developers Mailing List'
> > Subject: [PHP-DEV] Re: strrev() impl - Re: [PHP-DEV] Re: 
> > [PHP-CVS] cvs: php-src /ext/standard string.c 
> > 
> > 
> > Not quite. The base characters have a combining class of 0, while for
> > the combining ones that value is > 0 but not necessarily the same. For
> > example:
> > 
> >    a +ˉ + ˳  (0 + 230 + 220)
> > 
> > So the code needs to capture sequences starting with class of 0 and
> > followed by one or more chars with class > 0:
> > 
> > int32_t prev; /* Last class transition */
> > uint8_t class = 0;
> > 
> > while ( /* iterate backward over string */ ) {
> >      while (u_getCombiningClass(codept) > 0) {
> >             /* Get 'next' codept */
> >      }
> >      /* Copy codepts from 'next' to 'prev' */
> >      prev = current point of iteration;
> > }
> > 
> > 
> > -Andrei
> > 
> > On Aug 22, 2005, at 3:48 AM, Rolland Santimano wrote:
> > 
> > > OK, so I guess the code should track the combining class and copy out
> > > chunks of codepoints with the same class, something like:
> > >
> > > int32_t prev; /* Last class transition */
> > > uint8_t class = 0;
> > >
> > > while ( /* iterate backward over string */ ) {
> > >     while (u_getCombiningClass(codept) == class) {
> > >        /* Get 'next' codept */
> > >     }
> > >     /* Copy codepts from 'next' to 'prev' */
> > > }
> > >
> > > Is that correct ?
> > >
> > > --
> > > Rolland
> > 
> > --
> > PHP Internals - PHP Runtime Development Mailing List
> > To unsubscribe, visit: http://www.php.net/unsub.php
> > 
> > 
> > 
> > 
> 
> 
> 
> 
> 

