bug#21763: bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-28 Thread Norihiro Tanaka
On Tue, 27 Dec 2016 22:37:25 -0800 Paul Eggert wrote: > Norihiro Tanaka wrote: > > So I wrote the patch to use fgrep matcher for both. > > Thanks, I installed that after tweaking the commit message and omitting > unnecessary parens. Thanks, I confirmed it.

bug#22239: bug#21763: bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-27 Thread Paul Eggert
Norihiro Tanaka wrote: So I wrote the patch to use fgrep matcher for both. Thanks, I installed that after tweaking the commit message and omitting unnecessary parens.

bug#22239: bug#21763: bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-27 Thread Norihiro Tanaka
On Mon, 26 Dec 2016 12:07:49 -0800 Paul Eggert wrote: > Norihiro Tanaka wrote: > > Hmm, how about the following test cases, although it is extreame? > > I don't think we need to worry about performance for the case when -w > is given, and a pattern matches data that contains non-word > characte

bug#22239: bug#21763: bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-26 Thread Paul Eggert
Norihiro Tanaka wrote: Hmm, how about the following test cases, although it is extreame? I don't think we need to worry about performance for the case when -w is given, and a pattern matches data that contains non-word characters. In practice, such cases are rare. I expect that most users wou

bug#22357: bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-26 Thread Jim Meyering
On Mon, Dec 26, 2016 at 1:06 PM, Norihiro Tanaka wrote: > > On Fri, 23 Dec 2016 17:38:42 -0800 > Paul Eggert wrote: > >> No. Thanks, I hadn't considered that possibility. I looked into the >> slowdown and installed the attached patches, which cause 'grep' to >> run about as fast on this test case

bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-26 Thread Norihiro Tanaka
On Fri, 23 Dec 2016 17:38:42 -0800 Paul Eggert wrote: > No. Thanks, I hadn't considered that possibility. I looked into the > slowdown and installed the attached patches, which cause 'grep' to > run about as fast on this test case as grep 2.25 (though not as fast > as grep 2.26). The main fix is

bug#21763: bug#22239: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-23 Thread Paul Eggert
Norihiro Tanaka wrote: are you aware of extreme slowdown in the following cases after third patch? yes $(printf %040d 0) | head -1000 >inp printf '0\n1\n' >pat env LC_ALL=C src/grep -w -f pat inp No. Thanks, I hadn't considered that possibility. I looked into the slowdown and instal

bug#21763: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-22 Thread Jim Meyering
On Wed, Dec 21, 2016 at 3:04 PM, Norihiro Tanaka wrote: > yes $(printf %040d 0) | head -1000 >inp > printf '0\n1\n' >pat > env LC_ALL=C src/grep -w -f pat inp Thanks for mentioning that. Here is some actual data (Fedora 25 on an i7-4770S): for i in $(env ls -1dv /p/p/grep-*/bin/grep) s

bug#21763: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-21 Thread Norihiro Tanaka
On Tue, 20 Dec 2016 21:17:01 -0800 Paul Eggert wrote: > I installed the attached patches into grep master. These fix the > performance regressions noted at the start of Bug#22357. I see that > the related performance problems noted in Bug#21763 seem to be fixed > too, I expect because of Norihir

bug#21763: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-20 Thread Paul Eggert
I installed the attached patches into grep master. These fix the performance regressions noted at the start of Bug#22357. I see that the related performance problems noted in Bug#21763 seem to be fixed too, I expect because of Norihiro Tanaka's recent changes, so I'll boldly close both bug repor

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-20 Thread Norihiro Tanaka
On Mon, 19 Dec 2016 15:38:12 -0800 Paul Eggert wrote: > but the old 'replace' called 'delete' up to N times, Yes, but constraint == 0 does not happen mostly, so in delete() in "while" does not pass normally. > Anyway, I verified that the change improved performance on the test > case with -f /

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-19 Thread Paul Eggert
On 12/19/2016 02:38 PM, Norihiro Tanaka wrote: BTW, What case do you the worst? One more I think previous 'replace' is not O(N*(N + log N)) but O(N + N log N) i.e. O(N log N) . Well, perhaps I misunderstood it, but the old 'replace' called 'delete' up to N times, and 'delete' took O(log N) to

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-19 Thread Norihiro Tanaka
On Sun, 18 Dec 2016 23:48:10 -0800 Paul Eggert wrote: > >> 'delete' is > >> O(N); 'replace' calls 'delete' in a loop and is therefore O(N**2). > >> 'epsclosure' calls 'replace' in a loop and so I suppose it is O(N**3). > >> I haven't looked into how likely the worst-case performance is, though.

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-18 Thread Paul Eggert
'delete' is O(N); 'replace' calls 'delete' in a loop and is therefore O(N**2). 'epsclosure' calls 'replace' in a loop and so I suppose it is O(N**3). I haven't looked into how likely the worst-case performance is, though. Yes. It is same both before and after with my proposed patch, but It see

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-17 Thread Norihiro Tanaka
On Wed, 14 Dec 2016 17:19:27 -0800 Paul Eggert wrote: > I was referring to code with his proposed patch installed. 'delete' is > O(N); 'replace' calls 'delete' in a loop and is therefore O(N**2). > 'epsclosure' calls 'replace' in a loop and so I suppose it is O(N**3). > I haven't looked into how

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-15 Thread Paul Jackson
An interesting discussion ... thanks :). I tried "grep -f" on a couple of large text files I had. I have a file of "words", one word per line, all lower case ascii, that I had accumulated from various spell checkers and such. This "words" file has 395,696 unique (more or less) English words in it

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-15 Thread L.A. Walsh
Norihiro Tanaka wrote: dfa matcher is not always slower than kws matcher. - $ env LC_ALL=C grep -F -w 0 k - $ env LC_ALL=C grep -F -w -f /usr/share/dict/words /dev/null First is faster after the changes, and second is slower after the changes. It's a trade-off. Can you have any idea to select

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-14 Thread Paul Eggert
On 12/12/2016 03:49 AM, Bruno Haible wrote: Part of the problem appears to be that position-set merging, even with his latest proposed changes, is O(N**2) where N is the pattern size I'm confused. Which code are you talking about? I was referring to code with his proposed patch installed

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-12 Thread Paul Eggert
Trevor Cordes wrote: On 2016-12-12 Paul Eggert wrote: grep should just work. It is not working now and we need to fix that By that rationale, the commit that causes the problems for me should be backed out as a first step, and then a proper solution should be found. But it may be better to i

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-12 Thread Trevor Cordes
On 2016-12-12 Paul Eggert wrote: > grep should just work. It is > not working now and we need to fix that By that rationale, the commit that causes the problems for me should be backed out as a first step, and then a proper solution should be found. If you look at the commit in isolation, it: - a

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-12 Thread Paul Eggert
On 12/12/2016 02:36 AM, Trevor Cordes wrote: If a user bothered to specify fgrep or -F then that user knows what they are doing! Here I have to disagree, and to agree with Bruno. The user should not have to know what grep's algorithm is. grep should just work. It is not working now and we nee

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-12 Thread Norihiro Tanaka
On Sun, 11 Dec 2016 18:55:32 +0100 Bruno Haible wrote: > Norihiro Tanaka wrote: > > dfa matcher is not always slower than kws matcher. ... > > It's a trade-off. Can you have any idea to select the better > > matcher for both two cases? > > There are at least two approaches. > > 1) If you have

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-12 Thread Bruno Haible
Hi Paul, > Before installing anything like that, I'd first like to look into improving > the > DFA performance, along the lines suggested by Norhiro Tanaka. Part of the > problem appears to be that position-set merging, even with his latest > proposed > changes, is O(N**2) where N is the patt

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-12 Thread Trevor Cordes
On 2016-12-12 Norihiro Tanaka wrote: > > On my box the above runs for >2m (never completes before I ^C) on > > the version **AFTER** the commits (v2.22). On the test build just > > *BEFORE* the commits (2.21.73-8058), it runs in <2s. So for me, I > > had a working command (-F -w -f) that used to

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-11 Thread Paul Eggert
arn...@skeeve.com wrote: I'm sure that Paul and Jim would welcome patches. I wrote a patch that prefers -F when the input looks like a troublesome case. It merely uses heuristics though, nothing scientific like what Bruno Haible suggested. Before installing anything like that, I'd first like

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-11 Thread Trevor Cordes
On 2016-12-11 Norihiro Tanaka wrote: > The changes switch used algorithm. They convert grep -w -F to grep > -w. Hi, thanks for helping! Sorry, yes, I forgot I was using --fixed-strings (-F), so yes, my example should have used -F. > Try following test case and before and after the changes, plea

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-11 Thread arnold
Bruno Haible wrote: > Finally, code this formula into the 'grep' program. I'm sure that Paul and Jim would welcome patches. Arnold

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-11 Thread Bruno Haible
Trevor Cordes wrote: > I've read in numerous places (O'Reilly books mostly) that grep or pcre > is often/sometimes faster than fgrep, so I think it is (somewhat) common > knowledge and I wouldn't worry too much about that perception. It is wrong to put the burden of the algorithm choice on the use

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-11 Thread Norihiro Tanaka
On Sun, 11 Dec 2016 05:28:56 -0600 Trevor Cordes wrote: > On my box the above runs for >2m (never completes before I ^C) on the > version **AFTER** the commits (v2.22). On the test build just *BEFORE* > the commits (2.21.73-8058), it runs in <2s. So for me, I had a working > command (-F -w -f)

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-10 Thread Norihiro Tanaka
On Fri, 9 Dec 2016 01:24:19 -0600 Trevor Cordes wrote: > I bisected this bug to commits: > 662b19f2d0edc0bf07f1a2c1421080252df4a37c > 468d5217ed6ec1679512dec208c7f30fb8612957 > (can't narrow it down because the latter doesn't compile for me) The changes switch used algorithm. They convert grep

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-09 Thread Trevor Cordes
I bisected this bug to commits: 662b19f2d0edc0bf07f1a2c1421080252df4a37c 468d5217ed6ec1679512dec208c7f30fb8612957 (can't narrow it down because the latter doesn't compile for me) This bug has hit me hard. I have a script that wants to do: grep -w -f /usr/share/dict/words /tmp/greptest (good olde

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-03-14 Thread Bruce Dubbs
JQK wrote: 【without option "-F"】 # env time grep -w -f <(seq 20) <(shuf -i 1-20 -n 250) : 288.77user 64.23system 10:35.71elapsed 55%CPU (0avgtext+0avgdata 3492784maxresident)k 8967032inputs+0outputs (154389major+1493890minor)pagefaults 0swaps I certainly have different results. There

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-03-14 Thread Norihiro Tanaka
On Mon, 14 Mar 2016 14:31:50 +0800 JQK wrote: > # env time grep -w -f <(seq 20) <(shuf -i 1-20 -n 250) > : > 288.77user 64.23system 10:35.71elapsed 55%CPU (0avgtext+0avgdata > 3492784maxresident)k > 8967032inputs+0outputs (154389major+1493890minor)pagefaults 0swaps The issue also reproced

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-03-13 Thread JQK
On 03/12/2016 04:17 AM, Jim Meyering wrote: > [resending to keep the list on Cc] > On Thu, Mar 10, 2016 at 10:05 PM, JQK wrote: >> On 03/11/2016 01:26 AM, Jim Meyering wrote: >>> On Thu, Mar 10, 2016 at 3:00 AM, JQK wrote: If in the following situation, === file1 has n

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-03-11 Thread Jim Meyering
[resending to keep the list on Cc] On Thu, Mar 10, 2016 at 10:05 PM, JQK wrote: > On 03/11/2016 01:26 AM, Jim Meyering wrote: >> On Thu, Mar 10, 2016 at 3:00 AM, JQK wrote: >>> If in the following situation, >>> >>> === >>> file1 has numbers from 1 to 20, 20 lines >>> file2 has se

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-03-10 Thread JQK
On 03/11/2016 01:26 AM, Jim Meyering wrote: > On Thu, Mar 10, 2016 at 3:00 AM, JQK wrote: >> If in the following situation, >> >> === >> file1 has numbers from 1 to 20, 20 lines >> file2 has several lines(about 200 ~300lines) of random numbers in the >> range of 1-20 >> ===

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-03-10 Thread Jim Meyering
On Thu, Mar 10, 2016 at 3:00 AM, JQK wrote: > If in the following situation, > > === > file1 has numbers from 1 to 20, 20 lines > file2 has several lines(about 200 ~300lines) of random numbers in the > range of 1-20 > === > > The time cost for finishing the following co

bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-03-10 Thread JQK
If in the following situation, === file1 has numbers from 1 to 20, 20 lines file2 has several lines(about 200 ~300lines) of random numbers in the range of 1-20 === The time cost for finishing the following command could be over 15 minutes on linux -- a little huge. $