On Wed, Aug 11, 2010 at 04:01:27AM -0700, Robert Ransom wrote:
> On Wed, 11 Aug 2010 06:41:30 -0400
> Kris Maglione <maglion...@gmail.com> wrote:
> > On Wed, Aug 11, 2010 at 06:14:55AM -0400, Joseph Xu wrote:
> > > Hi everyone,
> > >
> > > I was playing around with awk and ran into some surprising data for the
> > > function match(s, r), which returns the position of the first match of
> > > regular expression r in string s. Here's the test I ran:
> > >
> > > $ yes | head -10000 | tr -d '\n' >/tmp/big
> > > $ yes | head -1000000 | tr -d '\n' >/tmp/bigger
> > > $ time awk '{for (i=1; i < 100; i++) { match($0, "y")} }' /tmp/big
> > >
> > > real 0m0.056s
> > > user 0m0.053s
> > > sys 0m0.000s
> > > $ time awk '{for (i=1; i < 100; i++) { match($0, "y")} }' /tmp/bigger
> > >
> > > real 0m5.695s
> > > user 0m5.140s
> > > sys 0m0.553s
> > >
> > > The difference is almost exactly 100x, which is the size difference
> > > between the input files. It seems ridiculous that the amount of time
> > > taken to match the first character of a string grows linearly with the
> > > size of the string. The time it takes to load the contents of the file
> > > does not contribute significantly to this increase.
> > You don't make sense. The second test performs the match two
> > orders of magnitude more times, so it should be two orders of
> > magnitude slower. It's not comparing a single string that is 100
> > times as long, but rather running the same test 100 times for
> > either 10⁴ or 10⁶ identical strings.
> No, he stripped out the newlines with tr.
He might have, if he used GNU tr. Even in that case, though, my
results are nowhere near as dramatic as his.
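
The disputed point is easy to check directly. A minimal sketch, assuming a
POSIX tr and wc and reusing the /tmp paths from the original test: wc -l
counts newline characters, so a count of 0 means each file is a single long
record from awk's point of view, not 10⁴ or 10⁶ separate strings.

```shell
yes | head -10000   | tr -d '\n' > /tmp/big
yes | head -1000000 | tr -d '\n' > /tmp/bigger
# wc -l counts newlines; 0 means awk sees one long record per file
wc -l /tmp/big /tmp/bigger
```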
--
Kris Maglione
There are two ways to write error-free programs; only the third works.
--Alan J. Perlis
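
One detail worth checking about the original report: match() finds its hit
at position 1 in both files, so whatever accounts for the slowdown, it is
not time spent scanning forward to locate the match. A minimal check,
rebuilding the /tmp/bigger file from the original test:

```shell
yes | head -1000000 | tr -d '\n' > /tmp/bigger
# match() returns the position of the first match and also sets
# RSTART and RLENGTH; against a string of all "y"s the first match
# is at position 1 with length 1
awk '{ print match($0, "y"), RSTART, RLENGTH }' /tmp/bigger
# prints: 1 1 1
```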