On 2016-12-12 Norihiro Tanaka wrote: > > On my box the above runs for >2m (never completes before I ^C) on > > the version **AFTER** the commits (v2.22). On the test build just > > *BEFORE* the commits (2.21.73-8058), it runs in <2s. So for me, I > > had a working command (-F -w -f) that used to run quickly, that > > suddenly became slow/useless. And none of the suggestions I've > > read can alleviate the problem for me, not locale settings, nor > > anything else. If you can confirm this is the case on your box, > > then maybe we can figure out why? > > Can you re-run with current master, as dfa has been improved > frequently?
env LC_ALL=C grep -F -w -f /usr/share/dict/words /dev/null ran for 3+ minutes and then quit; no change then. I had run on HEAD before my git bisect anyway, just to ensure it hadn't been fixed in the meantime. If HEAD helps, then it still isn't good enough, as taking 2+ orders of magnitude more time is still not a viable solution for me. > First is faster after the changes, and second is slower after the The problem, I think, isn't "faster" or "slower", it's usable vs. unusable. I think it's not possible to change a working-for-decades use case to increase runtime by 2, 3 or 10 orders of magnitude without leaving a bunch of previously-happy end users stranded. Hence all the bug reports. Conversely, not many other people are going to notice or care that their other use case improved by 10 or 20% runtime. This change is exact that tradeoff: under 1 order of magnitude improvement for some people, at the expense of many orders of magnitude detriment for others. > If we think simple algorithm as we select kws matcher for grep -F -w > -f and select grep dfa matcher for -F -w -e, we will get extreamly > different performance for following two cases. OK, I looked at the code a lot more closely. Now I see a more salient point perhaps we should be looking at: "In a unibyte locale, switch from fgrep to grep if the pattern matches words (where grep is typically faster)." Why is this even something that's being done? If a user bothered to specify fgrep or -F then that user knows what they are doing! Only advanced users even know that fgrep/-F exists and use it. What the above comment says is that "since we think it'll be faster, let's override the will of the user because we think we know better"! Why not give the user credit that they know what they are doing? And in this case, again consider my above point that you are gaining tiny improvements for some at the expense of horrendous penalties for others. Perhaps undoing the commit and simply expounding on the issue in the docs (man page, etc) would be a better approach? Then the user has control of the situation in their choice of -F or no -F. What is wrong with that? > changes. It's a trade-off. Can you have any idea to select the > better matcher for both two cases? > > - $ env LC_ALL=C grep -F -w -f /usr/share/dict/words /dev/null > > - $ env LC_ALL=C grep -F -w -e "`cat /usr/share/dict/words`" /dev/null Yes, you are correct, if this commit is to stand and we want to find some way to solve my (and the other bug reporters') use case performance problems, then that question will have to be answered. I would say empirical tests would have to be run to discover where the performance gain/loss crossover point lies (perhaps based on input/expression size) and switch based on that. However, my hunch is not trying to outsmart the user in the first place is the correct approach. Thanks for your time and input!