On Wed, Aug 11, 2010 at 03:06:13PM -0400, Joseph Xu wrote:
I was using GNU tr. The input files were single lines with 10000 or 1000000 y's, so I was doing 100 matches in each case (from the for loop) on the same line. I guess I should have made that more explicit, sorry. I'm interested in what results you're getting.

Sorry, I get tetchy when I haven't had enough sleep:

0.00u 0.00s 0.00r        nawk {for (i=1; i < 100; i++) { match($0, "y")} } big
0.01u 0.00s 0.02r        nawk {for (i=1; i < 100; i++) { match($0, "y")} } 
bigger
0.08u 0.00s 0.08r        gawk {for (i=1; i < 100; i++) { match($0, "y")} } big
7.80u 0.03s 8.08r        gawk {for (i=1; i < 100; i++) { match($0, "y")} } 
bigger
0.04u 0.00s 0.05r        /usr/local/plan9/bin/awk {for (i=1; i < 100; i++) { match($0, 
"y")} } big
5.00u 0.01s 5.20r        /usr/local/plan9/bin/awk {for (i=1; i < 100; i++) { match($0, 
"y")} } bigger

nawk is one-true-awk from FreeBSD. I find the results strange, namely because Plan 9's awk is also one-true-awk. It also produces reasonable results with "^y" instead of "y", while gawk doesn't. I know that nawk uses a combination NFA/DFA, but I see that Plan 9's awk instead pre-process the expression and uses Plan 9's pure-NFA engine. My guess is that, because they're greedy algorithms, they both traverse the entire string looking for a possibly longer match, but my understanding of Plan 9's altorithm was otherwise. Looking at gawk, it also uses a DFA algorithm, so I really can't explain the shortcoming, especially when the anchor is used, but I refuse to look deeper because gawk is written to GNU coding standards and I value my sanity.

As for my previous result, I have a shell function defined that replaces awk with nawk, and it seems that I called awk rather than gawk explicitly.

Oh, and looking at my sig, it's serenditipously by BWK...

--
Kris Maglione

As we said in the preface to the first edition, C "wears well as one's
experience with it grows." With a decade more experience, we still
feel that way.
        --Brian Kernighan and Dennis Ritchie


Reply via email to