Hi!

>> I understand that, but I have no idea how to write proper rules for word
>> boundaries. I just want to tell it "give me word boundaries", not by
>> saying createWordBoundaries() but by doing createIterator($type) where
>> $type == WORD_BOUNDARIES.
> 
> Why? This makes no sense to me. Why would createIterator(WORD_BOUNDARIES)  
> be better than BreakIterator::createWordInstance()? Especially in a  

Libraries, for example. Say I want to make a widget that displays
wrapped text and gives the user the option to wrap on words, on
sentences, or on any character. I need an underlying library that wraps
properly based on some value in the config.
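
What such a config-driven selection boils down to can be sketched in
userland on top of the existing factory methods. This is a minimal
sketch, assuming the intl extension as released (where the class is
exposed as IntlBreakIterator rather than BreakIterator); the helper
createIteratorForMode() and the mode strings are hypothetical:

```php
<?php
// Hypothetical helper, not part of ext/intl: picks one of the existing
// IntlBreakIterator factory methods based on a config string.
// The mode names below are illustrative assumptions.
function createIteratorForMode(string $mode, ?string $locale = null): IntlBreakIterator
{
    switch ($mode) {
        case 'character':
            $bi = IntlBreakIterator::createCharacterInstance($locale);
            break;
        case 'word':
            $bi = IntlBreakIterator::createWordInstance($locale);
            break;
        case 'sentence':
            $bi = IntlBreakIterator::createSentenceInstance($locale);
            break;
        default:
            throw new InvalidArgumentException("Unknown break mode: $mode");
    }
    // The factories may return null if ICU fails to build the iterator.
    if ($bi === null) {
        throw new RuntimeException("ICU could not create a '$mode' iterator");
    }
    return $bi;
}

// Usage: the widget reads the mode from its (hypothetical) config.
$config = ['wrap' => 'word'];
$bi = createIteratorForMode($config['wrap']);
$bi->setText('Hello world');
foreach ($bi->getPartsIterator() as $part) {
    echo "[$part]";
}
```

The widget itself never names a specific factory; switching the config
value from 'word' to 'sentence' is enough to change how it wraps.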

However, looking at the ICU API, I see that the methods for creating
instances programmatically are all private, so as far as I can see
there's no API access to that. So I guess this one won't work out.

> I have no special love for it, but your statement is inaccurate in one
> aspect -- I've added a similar function in IntlCalendar... whose  
> implementation is basically the same:

Same goes for that one, then. We need to make the API consistent: if
those are useful, let's have them everywhere; if not, let's leave them
for Locale.

>> Another thing I notice here: why not make:
>> $bi = BreakIterator::createWordInstance(NULL);
>> $bi->setText($foo);
>>
>> into:
>> $bi = BreakIterator::createWordInstance(NULL, $foo);
>>
> 
> Two reasons:
> 
> * it encourages bad behavior, namely not reusing the BreakIterator objects.
> * that's not the ICU signature. If ICU in the future adds overloads with a  
> string in the second argument, we'll find ourselves with odd signatures.

In 99% of cases the BreakIterator object will not be reused anyway,
since the code will be dealing with one text, doing its thing with it
and then forgetting about it. Of course, you can have bigger frameworks
and optimizations; the text parameter is optional, there just to
capture the most common case and avoid boilerplate code.
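
A sketch of the proposed convenience, written here as a userland
wrapper rather than as a change to the factory signature. It assumes
the released class name IntlBreakIterator; wordIterator() is a
hypothetical name:

```php
<?php
// Hypothetical wrapper, not part of ext/intl: the optional $text argument
// covers the common "one text, then discard" case in a single call.
function wordIterator(?string $locale = null, ?string $text = null): IntlBreakIterator
{
    $bi = IntlBreakIterator::createWordInstance($locale);
    if ($bi === null) {
        throw new RuntimeException('ICU could not create a word iterator');
    }
    // Only set the text when the caller supplied one.
    if ($text !== null) {
        $bi->setText($text);
    }
    return $bi;
}

// Common case: create the iterator and set the text in one call.
$bi = wordIterator(null, 'one two');

// Reuse is still possible: call setText() again on the same object.
$bi->setText('three four');
```

Reuse is unaffected: callers who want to keep one object around can
still create it without text and feed it new strings with setText().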

I think we need to think bigger than copying ICU signatures one to one.
PHP is not C and not Java, so why should PHP users follow exactly what
C or Java API users do? PHP is no longer a tiny wrapper over C; most PHP
users have never touched C and don't want to parse through the ICU C
docs to figure out how stuff works. We need to make it a one-stop shop.

> The BreakIterator cannot throw away text. You have to look at the rule
> statuses. Example:
> 
> $text = 'This is a phrase... with some punctuation.';
> $bi = BreakIterator::createWordInstance(NULL);
> $bi->setText($text);
> foreach ($bi->getPartsIterator() as $v) {
>       if ($bi->getRuleStatus() >= BreakIterator::WORD_NONE_LIMIT)
>               var_dump($v);
> }

Could the PartsIterator object keep an internal status that would
abstract such things away and provide some API for the common cases?
Again, documentation for these APIs is sorely missing right now. I, for
example, have no idea what they can actually do, e.g. what does
getRuleStatus() do?
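
A rough idea of what such a common-case API could hide, written as a
userland generator around the quoted example. wordsOf() is hypothetical;
it assumes IntlBreakIterator and its WORD_* rule status constants. ICU
gives numbers the status WORD_NUMBER, which equals WORD_NONE_LIMIT, so
the check below uses >= to keep them:

```php
<?php
// Hypothetical helper, not part of ext/intl: yields only segments whose
// rule status marks them as numbers, letters, kana or ideographs,
// skipping whitespace and punctuation (status below WORD_NONE_LIMIT).
function wordsOf(string $text, ?string $locale = null): Generator
{
    $bi = IntlBreakIterator::createWordInstance($locale);
    if ($bi === null) {
        throw new RuntimeException('ICU could not create a word iterator');
    }
    $bi->setText($text);
    foreach ($bi->getPartsIterator() as $part) {
        // The parts iterator advances $bi itself, so getRuleStatus()
        // reports the status of the boundary that ends $part.
        if ($bi->getRuleStatus() >= IntlBreakIterator::WORD_NONE_LIMIT) {
            yield $part;
        }
    }
}
```

With this, iterating wordsOf('This is a phrase... with some punctuation.')
skips the spaces and dots without the caller ever calling
getRuleStatus() directly.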

> The ICU docs only say "Compute a hash code for this BreakIterator." If I'm  
> not mistaken from my quick glance at the source, it just returns the  
> length of the forward rules.

Why do we need this function? What use would it be to a PHP user?

-- 
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
