Hi!

> I've wrapped ICU's BreakIterator and RuleBasedBreakIterator. I stopped
> short of adding a procedural interface. I think there's a larger
> expectation of having an OOP interface when working with iterators.
> What do you think? If there's no procedural interface, I'll change the
> instances of zend_parse_methods to zpp for performance.
Nice! I remember we had TextIterator in PHP 6; IIRC that was the reason BreakIterator never found its way into intl.

> BreakIterator also exposes other native methods:
> getAvailableLocales(), getLocale() and factory methods to build
> several predefined types of BreakIterators: createWordInstance()
> for word boundaries, createCharacterInstance() for locale
> dependent notions of "characters", createSentenceInstance() for
> sentences, createLineInstance() and createTitleInstance() -- for
> title casing breaks. These factories currently return

One thing I notice here is that with this API it is not possible to choose the iteration unit programmatically; you'd have to write a switch for that. Do you think it would be a good idea to have a generic function that lets you choose the unit programmatically?

What is the notion of "characters" here: is it grapheme clusters? Is there an option to iterate over code points too? Not sure if it's useful, just curious, as we used to have it in PHP 6 IIRC.

About getAvailableLocales(): what does it actually do? Does it list all available locales in the system, the ones that have BreakIterator rules, or something else? If it's not related to BI, I'm not sure we need to have it in BI. What is its intended usage? Maybe it should be part of the Locale class?

> Note that BreakIterator is an iterator only in the sense of the
> first 'Iterator' in 'IteratorIterator', i.e., it does not
> implement the Iterator interface. The reason is that there is
> no sensible implementation for Iterator::key(). Using it for

Doesn't it have a notion of current position? If so, key() should return the current position.

Will this BreakIterator be usable in foreach? I can't tell from this description. Understanding this without any usage examples, RFCs or code snippets showing the intended usage is really hard, and I think we should really start with those.
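To make the questions above concrete, here is a rough pure-PHP sketch of the usage model I mean. It is not the proposed extension's API: the class BoundaryIteratorSketch and its $unit parameter are hypothetical, and the 'word' rule (a boundary wherever a run of spaces starts or ends) and the 'character' rule (one boundary per byte, so ASCII only) are crude stand-ins for ICU's real rules. What it shows is the unit chosen by an ordinary parameter instead of a switch over factory methods, and key() returning the current boundary position, so the object works directly in foreach. It needs PHP 8.0+.

```php
<?php
// Hypothetical pure-PHP sketch, NOT the intl extension's API (PHP 8.0+).
// Keys are boundary positions (byte offsets); values are the text between
// consecutive boundaries.
final class BoundaryIteratorSketch implements Iterator
{
    /** @var int[] boundary byte offsets, starting at 0 */
    private array $boundaries;
    private int $index = 0;

    // The unit is a plain parameter, so callers can pick it at runtime instead
    // of switching over createWordInstance()/createCharacterInstance().
    public function __construct(private string $text, string $unit)
    {
        $this->boundaries = match ($unit) {
            // Crude stand-in for word rules: break where a run of spaces starts or ends.
            'word' => self::spaceBoundaries($text),
            // Crude stand-in for characters: one boundary per byte (ASCII only).
            'character' => range(0, strlen($text)),
            default => throw new InvalidArgumentException("Unknown unit: $unit"),
        };
    }

    private static function spaceBoundaries(string $text): array
    {
        $n = strlen($text);
        $boundaries = [0];
        for ($i = 1; $i < $n; $i++) {
            if (($text[$i - 1] === ' ') !== ($text[$i] === ' ')) {
                $boundaries[] = $i;
            }
        }
        if ($n > 0) {
            $boundaries[] = $n;
        }
        return $boundaries;
    }

    public function rewind(): void { $this->index = 0; }

    // There is one segment fewer than there are boundaries.
    public function valid(): bool { return $this->index < count($this->boundaries) - 1; }

    // key() is simply the position of the boundary the current segment starts at.
    public function key(): int { return $this->boundaries[$this->index]; }

    public function current(): string
    {
        $start = $this->boundaries[$this->index];
        return substr($this->text, $start, $this->boundaries[$this->index + 1] - $start);
    }

    public function next(): void { $this->index++; }
}

foreach (new BoundaryIteratorSketch("blah blah blah", 'word') as $pos => $part) {
    echo "Part at position $pos: '$part'\n";
}
// Part at position 0: 'blah'
// Part at position 4: ' '
// Part at position 5: 'blah'
// Part at position 9: ' '
// Part at position 10: 'blah'
```

Using the starting offset as the key gives the "position" flavour of foreach; a running segment counter would be an equally simple choice. Note that the sketch's offsets are bytes, which is exactly the kind of detail the real API would need to document.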
I would expect this class to work like this:

    foreach (BreakIterator::createWordInstance("blah blah blah") as $i => $word) {
        echo "Word number $i is $word\n";
    }

or at least like this:

    foreach (BreakIterator::createWordInstance("blah blah blah") as $i => $word) {
        echo "Next word at position $i is: $word\n";
    }

Is that the model? If not, I think we need to wrap the C API to make this possible, because this is what people in PHP expect from an iterator.

> Finally, I added a convenience method to BreakIterator:
> getPartsIterator(). This provides an IntlIterator, backed
> by the BreakIterator PHP object (i.e. moving the pointer or
> changing the text in BreakIterator affects the iterator
> and also moving the iterator affects the backing BreakIterator),
> which allows traversing the text between each boundary.

How is that text traversed: by code points, characters, graphemes or bytes?

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php