Andrei, it was controlled by an ini setting. Certain APIs take or return offsets, so translation was done in those instances depending on the setting. Here's an example (it's not currently implemented this way, though). Since my concern was only the extension, I didn't touch the engine itself.
pardon the formatting....

<code>
/* {{{ proto long BreakIterator::next([long offset]) */
static
ZEND_BEGIN_ARG_INFO_EX(arginfo_breakiterator_next, 0, 0, 0)
    ZEND_ARG_INFO(0, offset)
ZEND_END_ARG_INFO();

BREAKITERATOR_METHOD(next)
{
    php_breakiterator_obj *obj = (php_breakiterator_obj *)zend_object_store_get_object(getThis() TSRMLS_CC);
    BreakIterator *iter = (BreakIterator *)obj->ptr;
    UnicodeString *text = obj->text;
    long offset;

    if (0 == ZEND_NUM_ARGS()) {
        /* No argument: advance to the next boundary. */
        offset = (long)iter->next();
    } else {
        long start = 0;

        if (FAILURE == zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "l", &start)) {
            return;
        }

        if (ICUG(codepoint_semantics)) {
            /* Caller passed a code point index: translate it to a code unit offset first. */
            FROM_CODEPOINT_INDEX(text->getBuffer(), text->length(), start, offset);
            offset = (long)iter->next(offset);
        } else {
            offset = (long)iter->next(start);
        }
    }

    if (ICUG(codepoint_semantics)) {
        /* ICU returned a code unit offset: translate it back to a code point index. */
        long result;

        TO_CODEPOINT_INDEX(text->getBuffer(), text->length(), offset, result);
        RETURN_LONG(result);
    } else {
        RETURN_LONG(offset);
    }
}
/* }}} */
</code>

clayton

"Andrei Zmievski" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> And this was controlled how and from where?
>
> -Andrei
>
>
> On Aug 14, 2005, at 12:29 PM, <[EMAIL PROTECTED]>
> <[EMAIL PROTECTED]> wrote:
>
>> Back in the early days of the extension, i had a request global
>> ICUG(codepoint_semantics) which controlled this. Setting this to false
>> would revert to code-unit indexing (which ICU does internally).
>>
>> clayton
>>
>> "Andrei Zmievski" <[EMAIL PROTECTED]> wrote in message
>> news:[EMAIL PROTECTED]
>>
>>>
>>> Then why don't we put our collective brains together and think of a
>>> solution for this that does not involve hacks?
>>>
>>> -Andrei
>>>
>>> On Aug 14, 2005, at 3:51 AM, Derick Rethans wrote:
>>>
>>>>
>>>> In quite some cases for me i'm sure there are no surrogates in the
>>>> text I'm parsing. Having to deal with rescanning the string for every
>>>> access to a character is not really wanted.
>>>>
>>>> Derick
>>>>
>>>>
>>
>> --
>> PHP Internals - PHP Runtime Development Mailing List
>> To unsubscribe, visit: http://www.php.net/unsub.php
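
The example in this message uses FROM_CODEPOINT_INDEX and TO_CODEPOINT_INDEX without showing what they do. As a rough illustration only, here is a minimal sketch of that kind of translation over a UTF-16 buffer. The function names `from_codepoint_index` and `to_codepoint_index` are hypothetical, and the code assumes plain `std::u16string` input rather than the extension's actual macros or ICU's UnicodeString.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration; not the extension's real macros.
static bool is_lead(char16_t c)  { return c >= 0xD800 && c <= 0xDBFF; }
static bool is_trail(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Number of UTF-16 code units used by the code point starting at `unit`.
static std::size_t units_at(const std::u16string &text, std::size_t unit) {
    bool pair = is_lead(text[unit]) && unit + 1 < text.size() && is_trail(text[unit + 1]);
    return pair ? 2 : 1;
}

// Code point index -> code unit offset. Clamps to the end of the text.
std::size_t from_codepoint_index(const std::u16string &text, std::size_t cp_index) {
    std::size_t unit = 0;
    while (cp_index > 0 && unit < text.size()) {
        unit += units_at(text, unit);
        --cp_index;
    }
    return unit;
}

// Code unit offset -> code point index. An offset that falls inside a
// surrogate pair counts the whole pair.
std::size_t to_codepoint_index(const std::u16string &text, std::size_t unit_offset) {
    if (unit_offset > text.size()) {
        unit_offset = text.size();
    }
    std::size_t count = 0;
    for (std::size_t unit = 0; unit < unit_offset; unit += units_at(text, unit)) {
        ++count;
    }
    return count;
}
```

Both directions scan from the start of the string, so each call is linear in the offset. That is the rescanning cost Derick objects to in the quoted message, and text without surrogates pays it for no benefit, which is presumably why a switch such as ICUG(codepoint_semantics) was attractive.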