On 06/20/2011 10:25 AM, John Crenshaw wrote:
Doing this with an explicit iterator object is a fine idea. The syntax
becomes something like:
foreach(new TextIterator($s, 'UTF8') as $pos=>$c)
{
...
}
On the other hand, I think that trying to support iteration without using an
iterator object to mediate would be a disaster, and I'm opposed to doing
something like that because:
1. The code just looks wrong. PHP developers are generally insulated from the
char-arrayness of strings. In addition, since PHP isn't typesafe, the code
becomes highly ambiguous. Is the code iterating an array, or a string? It is
very hard to tell just by looking. It may be convenient to write, but it's
certainly not convenient to read or maintain later. On the other hand, with a
mediating iterator object, the intent becomes obvious, and the code is highly
readable.
2. The odds of iterating any given string are slim at best. Supporting current,
key, next, etc. would require the string object internally to get bloated with
additional unnecessary data that is almost never used. This bloat isn't a
single int either. For optimal performance it would need to consist of no less
than two size_t (char position and binary position), and one encoding indicator.
3. Iteration cannot work without knowing which encoding to use for the string.
Is it UTF8? UTF16? UTF7? Binary or some single byte encoding? Some other exotic
wide encoding? Without an iterator object in the middle, there is no way to
specify this encoding. Always treating the string as binary would also be a
mistake, since that is almost never the correct behavior, even though it may
often appear to work with simple inputs.
4. I've had simple mistakes caught numerous times when foreach complains about
getting a scalar rather than an array. So far, it has been exactly right every
time. Allowing strings to be iterated would, in the name of convenience,
increase the probability of stupid mistakes evading detection. Even worse, the
code itself would look logically correct until the developer finally realizes
that they have a string and not an array. Errors like this are probably far
more common in most projects than the need to iterate a string, so making this
change hurts debugging in the common case, for the sake of syntactic sugar in
the rare case. Not a good trade.
John Crenshaw
Priacta, Inc.
I would echo John's statements here. foreach() directly iterating a
string is going to make my life substantially harder. I work in
array-heavy systems, and "bad first argument for foreach()" is already a
hard enough error to track down. It means "somewhere, somehow, you put
a string where you meant to put an array. GLWT." Adding automatic
string iteration would take away even that error message and leave me
with no way to figure out why my code is randomly misbehaving. Just
looking at the code, I would have no way of knowing that such a bug
lurks within. That's the downside of a weakly typed but still typed
language.
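
For illustration, a minimal sketch of the safety net described above. It
assumes PHP 7 or 8, where foreach over a string raises a warning and skips
the loop body. The exact message text varies by version, and the variable
names are made up for the example.

```php
<?php
// Collect warnings instead of printing them, so the behavior can be inspected.
$warnings = [];
set_error_handler(function (int $errno, string $errstr) use (&$warnings): bool {
    $warnings[] = $errstr;
    return true; // mark the warning as handled
});

// Hypothetical bug: a string ended up where an array was expected.
$items = 'not an array';

$iterations = 0;
foreach ($items as $item) {
    $iterations++;
}

restore_error_handler();

// The loop body never ran, and foreach complained exactly once.
echo "iterations: ", $iterations, "\n";
echo "warnings: ", count($warnings), "\n";
```

If strings were iterable, the loop would instead run once per byte or
character and no warning would be recorded.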
A proper iterator class, however, makes a great deal of sense. It could
be implemented in user space fairly easily, no doubt, but for strings of
any appreciable size (like the OP seems to be talking about for code
parsing) I suspect performance and memory usage would be far better if
it were implemented in C.
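
As a rough idea of what a user-space version might look like, here is a
sketch under stated assumptions: the class name TextIterator is borrowed
from John's example, but its behavior here (UTF-8 only, keys are character
positions, invalid lead bytes treated as single bytes) is hypothetical and
not an existing PHP API. It assumes PHP 7.4 or later.

```php
<?php
/**
 * Hypothetical user-space TextIterator (UTF-8 only in this sketch).
 * Keys are character positions; values are single characters.
 */
final class TextIterator implements Iterator
{
    private string $s;
    private int $bytePos = 0; // offset into the raw bytes
    private int $charPos = 0; // number of characters already passed

    public function __construct(string $s, string $encoding = 'UTF8')
    {
        if (!in_array(strtoupper($encoding), ['UTF8', 'UTF-8'], true)) {
            throw new InvalidArgumentException("Unsupported encoding: $encoding");
        }
        $this->s = $s;
    }

    /** Byte length of the current character, derived from its lead byte. */
    private function charLength(): int
    {
        $lead = ord($this->s[$this->bytePos]);
        if (($lead & 0xE0) === 0xC0) { return 2; }
        if (($lead & 0xF0) === 0xE0) { return 3; }
        if (($lead & 0xF8) === 0xF0) { return 4; }
        return 1; // ASCII, or an invalid lead byte treated as one byte
    }

    public function rewind(): void
    {
        $this->bytePos = 0;
        $this->charPos = 0;
    }

    public function valid(): bool
    {
        return $this->bytePos < strlen($this->s);
    }

    public function key(): int
    {
        return $this->charPos;
    }

    // Only meaningful while valid() is true, which is how foreach calls it.
    public function current(): string
    {
        return substr($this->s, $this->bytePos, $this->charLength());
    }

    public function next(): void
    {
        if ($this->valid()) {
            $this->bytePos += $this->charLength();
            $this->charPos++;
        }
    }
}

$s = "h\u{e9}llo"; // "héllo" encoded as UTF-8
foreach (new TextIterator($s, 'UTF8') as $pos => $c) {
    echo $pos, ' => ', $c, "\n";
}
```

Note that the iterator carries exactly the state John lists in his point 2
(a character position, a byte position, and an encoding check), but only
for strings that are actually iterated.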
Whether it's a byte-based iterator or a character-set-aware,
character-based one... honestly I don't care as long as it's documented
properly.
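
To make the byte-versus-character distinction concrete, a small sketch,
assuming the input is UTF-8 (the sample string is made up for the example):

```php
<?php
$s = "na\u{ef}ve"; // "naïve": the ï takes two bytes in UTF-8

// Byte-based view: str_split() cuts the string into single bytes.
$bytes = str_split($s);

// Character-based view: the /u modifier makes PCRE treat $s as UTF-8.
$chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), " bytes, ", count($chars), " characters\n";
// prints "6 bytes, 5 characters"
```

Either behavior is defensible for an iterator, which is why the
documentation needs to say which one it is.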
--Larry Garfield