James Howard scribbled this message on Jul 29:
> On Thu, 29 Jul 1999, Tim Vanderhoek wrote:
>
> > fgetln() does a complete copy of the line buffer whenever an
> > excessively long line is found. On this point, it's hard to do better
> > without using mmap(), but mmap() has its own disadvantages. My last
> > suggestion to James was to assume a worst case for long lines and mark
> > the worst worst case with an XXX "this is unfortunate".
>
> <warning type="Anything said here wrong is my fault, not DES's">
>
> DES tells me he has a new version (0.10) which mmap()s. It supposedly
> cuts the run time down significantly, I do not have the numbers in front
> of me. Unfortunately he has not posted this version yet so I cannot
> download it and run it myself. He also says that if mmap fails, he drops
> back to stdio. This should only happen in the NFS case, the > 2G case,
> etc.
>
> </warning>
>
> > [Never mind that it should be spending near 100% of its time in
> > procline...that just means he's still got work to do... :-]
>
> I'd rather see it spending 100% of its time in regexec(), then I can just
> blame Henry Spencer :)
>
> Someone said there was new regex code out, is this true? Can anyone with
> a copy test grep with it?
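For anyone who hasn't seen DES's 0.10, here is a minimal sketch of the mmap()-with-stdio-fallback idea described above. It is not his code (that hasn't been posted); `struct filebuf`, `filebuf_open()`, `filebuf_close()` and `count_lines()` are hypothetical names. The file is mapped read-only when it is a regular, non-empty file whose size fits in `size_t`; anything else, or any mmap() failure, is read through stdio instead. Either way the whole file ends up in one contiguous buffer, so lines can be located in place with memchr() rather than copied out one at a time, which sidesteps the buffer copy fgetln() does for long lines.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical type: a whole file held in one contiguous buffer. */
struct filebuf {
	char	*data;
	size_t	 len;
	int	 mapped;	/* 1: from mmap(); 0: read with stdio */
};

/*
 * Map the file read-only if possible; otherwise read it with stdio into
 * a malloc()ed buffer.  Returns 0 on success, -1 on error.
 */
static int
filebuf_open(const char *path, struct filebuf *fb)
{
	struct stat st;
	FILE *fp;
	char *np;
	size_t cap = 0, n;
	void *p;
	int fd;

	*fb = (struct filebuf){ NULL, 0, 0 };
	if ((fd = open(path, O_RDONLY)) == -1)
		return (-1);
	/* mmap() needs a regular, non-empty file whose size fits in size_t. */
	if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0 &&
	    (uintmax_t)st.st_size <= SIZE_MAX) {
		p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
		if (p != MAP_FAILED) {
			close(fd);	/* the mapping outlives the descriptor */
			*fb = (struct filebuf){ p, (size_t)st.st_size, 1 };
			return (0);
		}
	}
	/* Fallback for anything mmap() refused: read the file with stdio. */
	if ((fp = fdopen(fd, "r")) == NULL) {
		close(fd);
		return (-1);
	}
	for (;;) {
		if (fb->len == cap) {	/* grow the buffer geometrically */
			cap = cap ? cap * 2 : 8192;
			if ((np = realloc(fb->data, cap)) == NULL)
				goto fail;
			fb->data = np;
		}
		if ((n = fread(fb->data + fb->len, 1, cap - fb->len, fp)) == 0)
			break;
		fb->len += n;
	}
	if (ferror(fp))
		goto fail;
	fclose(fp);
	return (0);
fail:
	fclose(fp);
	free(fb->data);
	*fb = (struct filebuf){ NULL, 0, 0 };
	return (-1);
}

/* Release the buffer according to how it was obtained. */
static void
filebuf_close(struct filebuf *fb)
{
	if (fb->mapped)
		munmap(fb->data, fb->len);
	else
		free(fb->data);
	*fb = (struct filebuf){ NULL, 0, 0 };
}

/* Count lines by scanning the buffer in place; no line is copied. */
static size_t
count_lines(const struct filebuf *fb)
{
	const char *p = fb->data, *end = fb->data + fb->len, *nl;
	size_t lines = 0;

	assert(fb->data != NULL);	/* holds after a successful open */
	for (; p < end; lines++) {
		nl = memchr(p, '\n', (size_t)(end - p));
		p = (nl == NULL) ? end : nl + 1;
	}
	return (lines);
}
```

One caveat the sketch ignores: if another process truncates the file while it is mapped, touching the vanished pages raises SIGBUS, which a stdio reader never has to deal with.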
OK, I just made a patch that eliminates the copy that was happening in procfile(), and it sped up a grep of a 5 MB termcap from about 2.9 seconds down to 0.6 seconds, and that includes the time spent profiling the program. GNU grep without profiling takes only 0.15 seconds, so we ARE getting closer to GNU grep.

It was VERY simple to do, and the patch is attached. Instead of copying each line into a freshly malloc()ed, NUL-terminated buffer, it passes the REG_STARTEND flag to regexec(), which then takes the bounds of the line from pmatch.rm_so and pmatch.rm_eo rather than from a terminating NUL. All of the code to use REG_STARTEND was already there; it just needed to be enabled.

Enjoy!

--
John-Mark Gurney                              Voice: +1 541 684 8449
Cu Networking                                 P.O. Box 5693, 97405

"The soul contains in itself the event that shall presently befall it.
The event is only the actualizing of its thought." -- Ralph Waldo Emerson
diff -u grep-0.10.orig/util.c grep-0.10/util.c
--- grep-0.10.orig/util.c	Thu Jul 29 05:00:15 1999
+++ grep-0.10/util.c	Thu Jul 29 16:38:06 1999
@@ -93,7 +93,6 @@
 	file_t *f;
 	str_t ln;
 	int c, t, z;
-	char *tmp;
 
 	if (fn == NULL) {
 		fn = "(standard input)";
@@ -119,13 +118,8 @@
 	initqueue();
 	for (c = 0; !(lflag && c);) {
 		ln.off = grep_tell(f);
-		if ((tmp = grep_fgetln(f, &ln.len)) == NULL)
+		if ((ln.dat = grep_fgetln(f, &ln.len)) == NULL)
 			break;
-		ln.dat = grep_malloc(ln.len + 1);
-		memcpy(ln.dat, tmp, ln.len);
-		ln.dat[ln.len] = 0;
-		if (ln.len > 0 && ln.dat[ln.len - 1] == '\n')
-			ln.dat[--ln.len] = 0;
 		ln.line_no++;
 
 		z = tail;
@@ -133,7 +127,6 @@
 			enqueue(&ln);
 			linesqueued++;
 		}
-		free(ln.dat);
 		c += t;
 	}
 	if (Bflag > 0)
@@ -174,7 +167,8 @@
 	pmatch.rm_so = 0;
 	pmatch.rm_eo = l->len;
 	for (c = i = 0; i < patterns; i++) {
-		r = regexec(&r_pattern[i], l->dat, 0, &pmatch, eflags);
+		r = regexec(&r_pattern[i], l->dat, 0, &pmatch,
+		    eflags | REG_STARTEND);
 		if (r == REG_NOMATCH && t == 0)
 			continue;
 		if (wflag && r == 0) {
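A self-contained sketch of the same technique outside grep may help. `line_matches()` and `count_matching_lines()` are hypothetical helpers, not grep 0.10 functions; like procline() after the patch, they set `rm_so` to 0 and `rm_eo` to the line length and pass REG_STARTEND, so regexec() never needs a NUL after the line. One assumption: the lines the patch removes from procfile() also stripped the trailing newline, and the patch doesn't show whether grep_fgetln() now does that itself, so the sketch trims it before matching to keep `$` anchored at end of line. Keeping `rm_so` at 0, as the patch does, also avoids relying on how a particular libc treats a nonzero start offset.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* REG_STARTEND is an extension (BSD libc, glibc), not plain POSIX. */
#ifndef REG_STARTEND
#error "this sketch needs regexec()'s REG_STARTEND extension"
#endif

/*
 * Hypothetical helper modelled on the patched procline(): match one line
 * that sits inside a larger buffer and is not NUL-terminated.  With
 * REG_STARTEND, regexec() takes the line's bounds from pm.rm_so and
 * pm.rm_eo instead of looking for a NUL, so the line is never copied.
 * Returns 1 on a match, 0 on no match, -1 on a regexec() error.
 */
static int
line_matches(const regex_t *re, const char *line, size_t len)
{
	regmatch_t pm;
	int r;

	/* Keep the newline out of the range so '$' anchors at end of line. */
	if (len > 0 && line[len - 1] == '\n')
		len--;
	pm.rm_so = 0;
	pm.rm_eo = (regoff_t)len;
	r = regexec(re, line, 0, &pm, REG_STARTEND);
	if (r == 0)
		return (1);
	return (r == REG_NOMATCH ? 0 : -1);
}

/* Count matching lines in buf[0 .. len - 1], matching each line in place. */
static size_t
count_matching_lines(const regex_t *re, const char *buf, size_t len)
{
	const char *p = buf, *end = buf + len, *nl;
	size_t n, hits = 0;

	assert(buf != NULL);
	while (p < end) {
		nl = memchr(p, '\n', (size_t)(end - p));
		n = (nl == NULL) ? (size_t)(end - p) : (size_t)(nl - p) + 1;
		if (line_matches(re, p, n) == 1)
			hits++;
		p += n;
	}
	return (hits);
}
```

With the pattern `bar.baz`, a plain regexec() over the whole buffer `"foo bar\nbaz\nfoo\n"` succeeds, because without REG_NEWLINE `.` matches a newline; count_matching_lines() returns 0 for the same buffer, since each match is confined to a single line.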