Or use ICU's BreakIterator. It also handles locale differences and some special cases:
http://icu.sourceforge.net/apiref/icu4c/classBreakIterator.html#_details
That said, we might prefer the segmentation to be locale-independent.

Separately, I was wondering whether we should create a variant of explode() that splits strings into whole graphemes (a base character plus its combining characters).

> -----Original Message-----
> From: Andrei Zmievski [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 22, 2005 10:12 AM
> To: Rolland Santimano
> Cc: 'PHP Developers Mailing List'
> Subject: [PHP-DEV] Re: strrev() impl - Re: [PHP-DEV] Re: [PHP-CVS] cvs: php-src /ext/standard string.c
>
>
> Not quite. The base characters have a combining class of 0, while for
> the combining ones that value is > 0 but not necessarily the same. For
> example:
>
> a +ˉ + ˳ (0 + 230 + 220)
>
> So the code needs to capture sequences starting with class of 0 and
> followed by one or more chars with class > 0:
>
> int32_t prev; /* Last class transition */
> uint8_t class = 0;
>
> while ( /* iterate backward over string */ ) {
>     while (u_getCombiningClass(codept) > 0) {
>         /* Get 'next' codept */
>     }
>     /* Copy codepts from 'next' to 'prev' */
>     prev = current point of iteration;
> }
>
>
> -Andrei
>
> On Aug 22, 2005, at 3:48 AM, Rolland Santimano wrote:
>
> > OK, so I guess the code should track the combining class and copy out
> > chunks of codepoints with the same class, something like:
> >
> > int32_t prev; /* Last class transition */
> > uint8_t class = 0;
> >
> > while ( /* iterate backward over string */ ) {
> >     while (u_getCombiningClass(codept) == class) {
> >         /* Get 'next' codept */
> >     }
> >     /* Copy codepts from 'next' to 'prev' */
> > }
> >
> > Is that correct ?
> >
> > --
> > Rolland
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php