Re: Treating a split() as an array

Rob Dixon Sat, 03 Mar 2007 19:35:57 -0800

Jay Savage wrote:
>
> On 3/3/07, John W. Krahn wrote:
>>
>> Jay Savage wrote:
>>>
>>> On 3/2/07, Robert Boone wrote:
>>>>
>>>> I think this is all you do:
>>>>
>>>> $piid = (split(/\t/, $row))[0];
>>>
>>> Split also takes an optional limit that keeps it from splitting the
>>> string into more than n parts. This keeps split from performing
>>> useless operations when you only want the first n-1 items, or when you
>>> want to lump all the items >= n into a single lump:
>>>
>>>    $piid = (split(/\t/,$row,2))[0];
>>>
>>> Most of the time it probably doesn't matter, but adding a limit will
>>> be markedly more efficient if $row is particularly long or you are
>>> looping through an extremely long list of rows.
>>>
>>> As always, see perldoc -f split for the details.
>>
>> perldoc -f split
>>
>> [ snip ]
>>
>>         The LIMIT parameter can be used to split a line partially
>>
>>             ($login, $passwd, $remainder) = split(/:/, $_, 3);
>>
>>         When assigning to a list, if LIMIT is omitted, or zero, Perl
>>         supplies a LIMIT one larger than the number of variables in the
>>         list, to avoid unnecessary work.  For the list above LIMIT would
>>         have been 4 by default. In time critical applications it behooves
>>         you not to split into more fields than you really need.
>>
>> The limit is supplied automagically if the size of the list is known at
>> compile time, as in your example above, so using the limit argument is
>> superfluous.
>
> I read the doc to say that, given a list of size n, perl will perform
> n + 1 splits by default.


Almost. It will perform n splits, resulting in n+1 pieces. The document says
earlier on:

    If LIMIT is specified and positive, it represents the maximum
    number of fields the EXPR will be split into, though the actual
    number of fields returned depends on the number of times PATTERN
    matches within EXPR.

> limit has diminishing returns as n increases, but for a list of length one,
> not supplying a limit means double the work, since (n + 1) = 2n when n is 1.

Not sure what you mean here. The returns diminish as n approaches the number
of separators in the string, which is - sort of - as n increases. But when the
list has length one the default behaviour is to split into two pieces. What
work is doubled there? Were you thinking it would split twice into three
pieces? Why?
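
A minimal sketch of that point, assuming a made-up four-field row (the
sample data and the @pieces name are illustrative only, not from the
thread). It shows that a limit of 2 yields two pieces from one split:

```perl
use strict;
use warnings;

# Hypothetical sample row: four tab-separated fields.
my $row = "1001\talpha\tbeta\tgamma";

# An explicit LIMIT of 2 stops after the first separator, so the
# remainder of the string comes back unsplit as the second piece.
my @pieces = split /\t/, $row, 2;

# Assigning to a list of one variable: per perldoc, Perl supplies
# the same limit (one more than the number of variables).
my ($piid) = split /\t/, $row;

print scalar(@pieces), " pieces; first is $piid\n";
print "remainder still contains tabs\n" if $pieces[1] =~ /\t/;
```

This should print "2 pieces; first is 1001" followed by the remainder
message.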

> Furthermore, it's not clear to me what the default limit is in the
> case of a slice. Consider
>
>    $piid = (split(/\t/,$row))[-1];
>    $piid = (split(/\t/,$row))[4];

Since the result of split() isn't being assigned to a list here, the excerpt
from perldoc doesn't apply. It's possible that Perl also optimises in the case
of a list slice with constant indices, but I doubt it. The rule here is

    If LIMIT is unspecified or zero, trailing null fields are stripped
    (which potential users of "pop" would do well to remember).

This implies that the string is split at all field separators found (so as to
be able to omit trailing null fields) and is the behaviour I would have
expected.
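
The stripping of trailing null fields is easy to see with a small
example. This is only a sketch; the row contents and variable names are
invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical row ending in two empty fields.
my $row = "a\tb\t\t";

# No LIMIT: trailing empty fields are stripped.
my @stripped = split /\t/, $row;

# A negative LIMIT keeps every field, including trailing empty ones.
my @kept = split /\t/, $row, -1;

# So a [-1] slice without a limit gives the last non-empty field.
my $last = (split /\t/, $row)[-1];

printf "stripped: %d fields, kept: %d fields, last: '%s'\n",
    scalar(@stripped), scalar(@kept), $last;
```

Here the default form returns 2 fields and the slice returns 'b', while
the -1 form returns all 4 fields.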

> It seems to me that in the case of a slice, split must split the
> entire string, and then return the appropriate element. Wanting a
> single element and wanting the first element are two different things.
>
> Maybe the compiler optimizes for the case of a slice with index [0]?
>
> It may, but it's not obvious to me from the docs that it does.

It would be quite possible for it to optimise for any maximum constant index,
such as

    $piid = (split(/\t/,$row))[4];

which would use a limit value of 6. But, as I said, I doubt if it does that.
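
As a sketch of why 6 would be the right value for index 4, assuming an
invented ten-field row: the limit has to be one more than the number of
fields wanted, so that the unsplit remainder lands in its own final
piece instead of being glued onto element 4.

```perl
use strict;
use warnings;

# Hypothetical row with ten single-letter fields, 'a' through 'j'.
my $row = join "\t", 'a' .. 'j';

# Full split, then take element 4.
my $full = (split /\t/, $row)[4];

# Explicit limit of 6: fields 0..4 are split out, the rest stays
# together as the sixth piece.
my @parts   = split /\t/, $row, 6;
my $limited = $parts[4];

# A limit of 5 would be one too few: element 4 absorbs the remainder.
my $too_small = (split /\t/, $row, 5)[4];

print "full=$full limited=$limited pieces=", scalar(@parts), "\n";
```

Both $full and $limited come out as 'e', whereas $too_small holds 'e'
plus everything after it.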

Instead of supplying an explicit value for the limit, the way to get the
default behaviour to work in this instance is to write

  ($piid) = split /\t/, $row;

which would be optimised to

  ($piid) = split /\t/, $row, 2;

as is documented by perldoc.
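
For comparison, here is a short sketch of the three spellings side by
side, using an invented sample row. It only confirms that they return
the same value; it does not measure how much work each one does.

```perl
use strict;
use warnings;

# Hypothetical sample row.
my $row = "1001\talpha\tbeta";

# List assignment: the limit is supplied implicitly.
my ($implicit) = split /\t/, $row;

# The same thing with the limit written out.
my ($explicit) = split /\t/, $row, 2;

# List slice: no list assignment, so no implicit limit is documented.
my $sliced = (split /\t/, $row)[0];

print "implicit=$implicit explicit=$explicit sliced=$sliced\n";
```

All three variables end up holding "1001".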

HTH,

Rob


