On Sat, Aug 14, 2010 at 12:07 PM, Kris Maglione <maglion...@gmail.com> wrote: > On Wed, Aug 11, 2010 at 03:06:13PM -0400, Joseph Xu wrote: >> >> I was using GNU tr. The input files were single lines with 10000 or >> 1000000 y's, so I was doing 100 matches in each case (from the for loop) on >> the same line. I guess I should have made that more explicit, sorry. I'm >> interested in what results you're getting. > > Sorry, I get tetchy when I haven't had enough sleep: > > 0.00u 0.00s 0.00r nawk {for (i=1; i < 100; i++) { match($0, "y")} } > big > 0.01u 0.00s 0.02r nawk {for (i=1; i < 100; i++) { match($0, "y")} } > bigger > 0.08u 0.00s 0.08r gawk {for (i=1; i < 100; i++) { match($0, "y")} } > big > 7.80u 0.03s 8.08r gawk {for (i=1; i < 100; i++) { match($0, "y")} } > bigger > 0.04u 0.00s 0.05r /usr/local/plan9/bin/awk {for (i=1; i < 100; i++) { > match($0, "y")} } big > 5.00u 0.01s 5.20r /usr/local/plan9/bin/awk {for (i=1; i < 100; i++) { > match($0, "y")} } bigger > > nawk is one-true-awk from FreeBSD. I find the results strange, namely > because Plan 9's awk is also one-true-awk. It also produces reasonable > results with "^y" instead of "y", while gawk doesn't. I know that nawk uses > a combination NFA/DFA, but I see that Plan 9's awk instead pre-process the > expression and uses Plan 9's pure-NFA engine. My guess is that, because > they're greedy algorithms, they both traverse the entire string looking for > a possibly longer match, but my understanding of Plan 9's altorithm was > otherwise.
I think one of the very few differences between Plan 9's awk and bwk's awk is UTF-8 support, not sure why that would make a difference, but I thought they were mostly identical otherwise. uriel > Looking at gawk, it also uses a DFA algorithm, so I really can't > explain the shortcoming, especially when the anchor is used, but I refuse to > look deeper because gawk is written to GNU coding standards and I value my > sanity. > > As for my previous result, I have a shell function defined that replaces awk > with nawk, and it seems that I called awk rather than gawk explicitly. > > Oh, and looking at my sig, it's serenditipously by BWK... > > -- > Kris Maglione > > As we said in the preface to the first edition, C "wears well as one's > experience with it grows." With a decade more experience, we still > feel that way. > --Brian Kernighan and Dennis Ritchie > > >