Re: Grep frustration

Jay Savage Fri, 10 Mar 2006 14:05:56 -0800

On 3/9/06, Tom Phoenix <[EMAIL PROTECTED]> wrote:
> On 3/9/06, Brian McKee <[EMAIL PROTECTED]> wrote:
>
> > What is a pattern match good for if it isn't for finding a substring
> > in a string?
>
> That's a fair question. A pattern match finds a match for a pattern,
> not a substring. Patterns can have metacharacters, they can be
> case-insensitive, they can be anchored, they can save data in memory
> variables like $3. But index() looks for a matching identical
> substring, and that's all. No metacharacters to worry about.
>
> Because it's a simpler operation, using index() can be faster than the
> corresponding pattern match. (Then again, maybe not: A lot of work has
> gone into optimizing Perl's regular expression engine.) But speed
> isn't the main reason to choose index(); it's clarity. It's a simpler
> operation to understand than an escaped pattern match, so I usually
> (but not always) go with index().
>
> Thanks for asking!
>
> --Tom Phoenix
> Stonehenge Perl Training


I've actually found it depends partly on architecture, too; the regex
engine seems better optimized on some platforms than others. I was
quite surprised once when benchmarking a script on a fairly modern OS
X/PPC machine (750MHz CRT iMac) and an ancient Linux box (166MHz PII
Dell Dimension XPS running SuSE 9.1), both running Perl 5.8.6: the
regex solution to the problem I was working on ran faster on the
Dell, but the index version ran faster on the Mac.

The issue seemed to be the number of function calls. index beat the
pants off m// on both machines for finding literals, of course, but
once I had to perform two indexes, combining them both into a single
regex ran faster on one machine and was only about 1% slower on the
other.

The relevance here is that once you're looking for several things, I'd
at least want to benchmark the regex, especially since we're firing up
the regex engine for the first match anyway.
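
A minimal sketch of that kind of comparison, using the core Benchmark
module. The date and filename are made-up sample values, and the sub
names are only for illustration; the numbers will differ by machine and
Perl version, which is the whole point of measuring.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data, not taken from the original thread.
my $feed_date = '20060309';
my $name      = "show_${feed_date}_segment1.wav";

# Two literal searches with index().
sub two_indexes {
    my ($s) = @_;
    return index($s, $feed_date) != -1 && index($s, '.wav') != -1;
}

# One pattern covering both conditions. Unlike the index() version,
# this also requires the date to come before ".wav".
sub one_regex {
    my ($s) = @_;
    return $s =~ /\Q$feed_date\E.*\.wav/s;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    two_indexes => sub { two_indexes($name) },
    one_regex   => sub { one_regex($name) },
});
```

Both candidates pay the same sub-call overhead here, so the table shows
the relative difference rather than the raw cost of each operation.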

My advice to the OP would be to benchmark all four of the following
and see which comes out on top for him:

    grep /_${feed_date}_.*?\.wav\z/o, @dir_files;
    grep { index($_, $feed_date) != -1 && index($_, ".wav") != -1 }
        @dir_files;
    grep /\.wav\z/ && (index($_, $feed_date) != -1), @dir_files;
    grep { /_${feed_date}_/ && /\.wav$/ } @dir_files;
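
Before comparing speed, it may be worth checking that the candidates
select the same files on your data, because they are not strictly
equivalent: the index() tests accept the date without the surrounding
underscores, and a plain index() for ".wav" accepts it anywhere in the
name. A rough sketch, with a made-up date, made-up filenames, and
labels chosen only for illustration:

```perl
use strict;
use warnings;

# Hypothetical values; the real $feed_date and @dir_files come from
# the OP's script.
my $feed_date = '20060309';
my @dir_files = (
    "show_${feed_date}_a.wav",      # the intended kind of match
    "show_${feed_date}_a.wav.bak",  # ".wav" present but not at the end
    "show${feed_date}a.wav",        # date present, but no underscores
    "show_20060310_a.wav",          # wrong date
);

# The four candidates, each wrapped in a sub so they can be run
# against the same list.
my %candidates = (
    regex_single => sub { grep { /_${feed_date}_.*?\.wav\z/ } @_ },
    index_both   => sub {
        grep { index($_, $feed_date) != -1 && index($_, ".wav") != -1 } @_;
    },
    regex_index  => sub { grep { /\.wav\z/ && index($_, $feed_date) != -1 } @_ },
    regex_both   => sub { grep { /_${feed_date}_/ && /\.wav$/ } @_ },
);

# Report what each candidate selects.
for my $label (sort keys %candidates) {
    my @hits = $candidates{$label}->(@dir_files);
    printf "%-12s %d match(es): %s\n", $label, scalar @hits, join(', ', @hits);
}
```

If the counts differ on real directory listings, the fastest candidate
is only a fair winner among those that return the right files.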


-- jay
--------------------------------------------------
This email and attachment(s): [  ] blogable; [ x ] ask first; [  ]
private and confidential

daggerquill [at] gmail [dot] com
http://www.tuaw.com  http://www.dpguru.com  http://www.engatiki.org

values of β will give rise to dom!
