On Wed, Nov 25, 2020 at 3:12 PM Jim Meyering <j...@meyering.net> wrote: > Thank you for the fine bug report. > The grep-3.6 bug you've exposed is due to the fact that your input > triggers excessive hash collisions when using the code modeled after > gnulib/lib/hash-pjw.c. That made the new pattern-preprocessing phase > take O(N^2) time for N patterns. In the attached, I've switched grep > to use the djb2 hash function, and that resolves the problem. I'll > also add a NEWS entry and a test before pushing this.
Timings suggest that grep-3.6's preprocessing came closer to O(N^3). Here's an example that would take 2-3 days with grep-3.6 and only seconds with this fix: : | grep -Ff <(seq 6400000 | tr 0-9 A-J) Here's a complete patch. I'll push it later today.
0001-grep-avoid-performance-regression-with-many-patterns.patch
Description: Binary data