Your bug report
#44754: Extreme performance degradation in GNU grep 3.4 / 3.6
which was filed against the grep package, has been closed.
The explanation is attached below, along with your original report.
If you require more details, please reply to 44...@debbugs.gnu.org.
--
44754: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=44754
GNU Bug Tracking System
Contact help-debb...@gnu.org with problems
--- Begin Message ---
On Thu, Nov 26, 2020 at 9:03 AM Jim Meyering <j...@meyering.net> wrote:
>
> On Wed, Nov 25, 2020 at 3:12 PM Jim Meyering <j...@meyering.net> wrote:
> > Thank you for the fine bug report.
> > The grep-3.6 bug you've exposed is due to the fact that your input
> > triggers excessive hash collisions when using the code modeled after
> > gnulib/lib/hash-pjw.c. That made the new pattern-preprocessing phase
> > take O(N^2) time for N patterns. In the attached, I've switched grep
> > to use the djb2 hash function, and that resolves the problem. I'll
> > also add a NEWS entry and a test before pushing this.
>
> Timings suggest that grep-3.6's preprocessing came closer to O(N^3).
> Here's an example that would take 2-3 days with grep-3.6 and only
> seconds with this fix:
>
> : | grep -Ff <(seq 6400000 | tr 0-9 A-J)
>
> Here's a complete patch.
> I'll push it later today.
Pushed along with two gnulib-related changes.
--- End Message ---
--- Begin Message ---
Hi,
I have a use case where I run grep with a large number of search
patterns on a large text file. It works well with grep-3.3, but with
grep-3.4 it quickly burned through GBs of memory and almost locked
up my system due to swapping.
To avoid attaching those large files, I could mostly reproduce the
effects like this:
ulimit -d 5000000 # avoid system lockup due to excessive swapping
export LC_ALL=C # make sure no Unicode case conversions are needed
% time ./grep-3.3 -Fwif <(seq 300000 | tr 0-9 A-J) <<<foo
real 0m0.054s
user 0m0.048s
sys 0m0.012s
% time ./grep-3.4 -Fwif <(seq 30000 | tr 0-9 A-J) <<<foo
./grep-3.4: Memory exhausted
Aborted
real 0m1.291s
user 0m0.696s
sys 0m0.599s
% time ./grep-3.6 -Fwif <(seq 300000 | tr 0-9 A-J) <<<foo
real 0m13.162s
user 0m12.955s
sys 0m0.211s
grep-3.3 behaves well, even with much larger number of patterns.
Time seems to grow linearly, and memory usage is constant.
grep-3.4 behaves the worst of these 3 versions. Even with just 30000
patterns it exceeds the ulimit of 5 GB.
grep-3.6 behaves a bit better than 3.4, but still bad. Time seems to
be quadratic in the number of patterns, and though memory usage in
this case seems to be almost constant, in my actual use case it also
runs out of memory where grep-3.3 works well with just a few 100 MB
used.
Without "-i", grep-3.4 seems to run as fast as grep-3.3, but
grep-3.6 is almost as slow as with "-i".
So there might actually be two different issues here, one that
affects 3.4 with "-i" and one that affects 3.6 with or without "-i".
Regards,
Frank
--- End Message ---