On 2016-12-11 Norihiro Tanaka wrote: > The changes switch used algorithm. They convert grep -w -F to grep > -w.
Hi, thanks for helping! Sorry, yes, I forgot I was using --fixed-strings (-F), so yes, my example should have used -F. > Try following test case and before and after the changes, please. > > $ yes $(printf %040d 0) | head -10000000 >k > $ time -p env LC_ALL=C grep -w -F 0 k > > It did not finish in 40s on my machine without the changes, but > finished in 0.54s after the changes, and following case finished in > 1.35s regardless whether the changes applied or not. > > $ time -p env LC_ALL=C grep -w 0 k OK, I see now where you are going with your examples. You are trying to show why the change was made in the first place. That makes sense. > It may be an extreme example, but if a lot of people assume that grep > -F is faster than grep, I think that it is a bug. I've read in numerous places (O'Reilly books mostly) that grep or pcre is often/sometimes faster than fgrep, so I think it is (somewhat) common knowledge and I wouldn't worry too much about that perception. (Even though what you say is intuitive.) > > grep -w -f /usr/share/dict/words /tmp/greptest > > (good older version: 2 seconds to complete, minimal memory) > > (any version after the above commits: 10 or more minutes, never > > waited for it to finish, 1.2GB RAM usage and 100% cpu) > > I believe that it can not happen, as the changes work for grep -F > only. Is it grep -F -w, correctly? grep -F -w is not good at long > pattern after the changes. Yes, you are correct, I mistakenly left out the -F in my example. > I think that the real issue of bug#21763, bug#22239 and bug#22357 is > that dfa uses a lot of memory for a long pattern, but We have no idea > to improve it currently. Let me digress for a moment... I think it's important you can reproduce my test case symptom before we spend time on the solution: > However, the poor performance will be improved partly. Following case > is returned in 2.4s after patched on my machine. > > $ env LC_ALL=C grep -F -w -f /usr/share/dict/words /dev/null On my box the above runs for >2m (never completes before I ^C) on the version **AFTER** the commits (v2.22). On the test build just *BEFORE* the commits (2.21.73-8058), it runs in <2s. So for me, I had a working command (-F -w -f) that used to run quickly, that suddenly became slow/useless. And none of the suggestions I've read can alleviate the problem for me, not locale settings, nor anything else. If you can confirm this is the case on your box, then maybe we can figure out why? The main question I think for my case is why is "-F -w -f" switching modes to the slow DFA at all? AFAIK there's no encoding flaw in the input, so why is grep even switching modes in this use case? What was wrong with just using the old mode? From what I've read, people don't want DFA-mode when using -F and -f. Or, there should be an option to grep that we can specify to turn off this new behavior to get back the old (<2s) performance. Thanks!