On Tue, 27 Dec 2016 22:37:25 -0800
Paul Eggert wrote:
> Norihiro Tanaka wrote:
> > So I wrote the patch to use fgrep matcher for both.
>
> Thanks, I installed that after tweaking the commit message and omitting
> unnecessary parens.
Thanks, I confirmed it.
Norihiro Tanaka wrote:
So I wrote the patch to use fgrep matcher for both.
Thanks, I installed that after tweaking the commit message and omitting
unnecessary parens.
On Mon, 26 Dec 2016 12:07:49 -0800
Paul Eggert wrote:
> Norihiro Tanaka wrote:
> > Hmm, how about the following test cases, although it is extreame?
>
> I don't think we need to worry about performance for the case when -w
> is given, and a pattern matches data that contains non-word
> characte
Norihiro Tanaka wrote:
Hmm, how about the following test cases, although it is extreame?
I don't think we need to worry about performance for the case when -w is given,
and a pattern matches data that contains non-word characters. In practice, such
cases are rare. I expect that most users wou
On Mon, Dec 26, 2016 at 1:06 PM, Norihiro Tanaka wrote:
>
> On Fri, 23 Dec 2016 17:38:42 -0800
> Paul Eggert wrote:
>
>> No. Thanks, I hadn't considered that possibility. I looked into the
>> slowdown and installed the attached patches, which cause 'grep' to
>> run about as fast on this test case
On Fri, 23 Dec 2016 17:38:42 -0800
Paul Eggert wrote:
> No. Thanks, I hadn't considered that possibility. I looked into the
> slowdown and installed the attached patches, which cause 'grep' to
> run about as fast on this test case as grep 2.25 (though not as fast
> as grep 2.26). The main fix is
Norihiro Tanaka wrote:
are you aware of extreme slowdown in the following cases after third patch?
yes $(printf %040d 0) | head -1000 >inp
printf '0\n1\n' >pat
env LC_ALL=C src/grep -w -f pat inp
No. Thanks, I hadn't considered that possibility. I looked into the slowdown and
instal
On Wed, Dec 21, 2016 at 3:04 PM, Norihiro Tanaka wrote:
> yes $(printf %040d 0) | head -1000 >inp
> printf '0\n1\n' >pat
> env LC_ALL=C src/grep -w -f pat inp
Thanks for mentioning that.
Here is some actual data (Fedora 25 on an i7-4770S):
for i in $(env ls -1dv /p/p/grep-*/bin/grep) s
On Tue, 20 Dec 2016 21:17:01 -0800
Paul Eggert wrote:
> I installed the attached patches into grep master. These fix the
> performance regressions noted at the start of Bug#22357. I see that
> the related performance problems noted in Bug#21763 seem to be fixed
> too, I expect because of Norihir
I installed the attached patches into grep master. These fix the performance
regressions noted at the start of Bug#22357. I see that the related performance
problems noted in Bug#21763 seem to be fixed too, I expect because of Norihiro
Tanaka's recent changes, so I'll boldly close both bug repor
On Mon, 19 Dec 2016 15:38:12 -0800
Paul Eggert wrote:
> but the old 'replace' called 'delete' up to N times,
Yes, but constraint == 0 does not happen mostly, so in delete() in
"while" does not pass normally.
> Anyway, I verified that the change improved performance on the test
> case with -f /
On 12/19/2016 02:38 PM, Norihiro Tanaka wrote:
BTW, What case do you the worst? One more I think previous 'replace' is
not O(N*(N + log N)) but O(N + N log N) i.e. O(N log N) .
Well, perhaps I misunderstood it, but the old 'replace' called 'delete'
up to N times, and 'delete' took O(log N) to
On Sun, 18 Dec 2016 23:48:10 -0800
Paul Eggert wrote:
> >> 'delete' is
> >> O(N); 'replace' calls 'delete' in a loop and is therefore O(N**2).
> >> 'epsclosure' calls 'replace' in a loop and so I suppose it is O(N**3).
> >> I haven't looked into how likely the worst-case performance is, though.
'delete' is
O(N); 'replace' calls 'delete' in a loop and is therefore O(N**2).
'epsclosure' calls 'replace' in a loop and so I suppose it is O(N**3).
I haven't looked into how likely the worst-case performance is, though.
Yes. It is same both before and after with my proposed patch, but It
see
On Wed, 14 Dec 2016 17:19:27 -0800
Paul Eggert wrote:
> I was referring to code with his proposed patch installed. 'delete' is
> O(N); 'replace' calls 'delete' in a loop and is therefore O(N**2).
> 'epsclosure' calls 'replace' in a loop and so I suppose it is O(N**3).
> I haven't looked into how
An interesting discussion ... thanks :).
I tried "grep -f" on a couple of large text files I had.
I have a file of "words", one word per line, all lower case ascii,
that I had accumulated from various spell checkers and such.
This "words" file has 395,696 unique (more or less) English words in it
Norihiro Tanaka wrote:
dfa matcher is not always slower than kws matcher.
- $ env LC_ALL=C grep -F -w 0 k
- $ env LC_ALL=C grep -F -w -f /usr/share/dict/words /dev/null
First is faster after the changes, and second is slower after the
changes. It's a trade-off. Can you have any idea to select
On 12/12/2016 03:49 AM, Bruno Haible wrote:
Part of the
problem appears to be that position-set merging, even with his latest proposed
changes, is O(N**2) where N is the pattern size
I'm confused. Which code are you talking about?
I was referring to code with his proposed patch installed
Trevor Cordes wrote:
On 2016-12-12 Paul Eggert wrote:
grep should just work. It is
not working now and we need to fix that
By that rationale, the commit that causes the problems for me should be
backed out as a first step, and then a proper solution should be
found.
But it may be better to i
On 2016-12-12 Paul Eggert wrote:
> grep should just work. It is
> not working now and we need to fix that
By that rationale, the commit that causes the problems for me should be
backed out as a first step, and then a proper solution should be
found. If you look at the commit in isolation, it:
- a
On 12/12/2016 02:36 AM, Trevor Cordes wrote:
If a user bothered to
specify fgrep or -F then that user knows what they are doing!
Here I have to disagree, and to agree with Bruno. The user should not
have to know what grep's algorithm is. grep should just work. It is not
working now and we nee
On Sun, 11 Dec 2016 18:55:32 +0100
Bruno Haible wrote:
> Norihiro Tanaka wrote:
> > dfa matcher is not always slower than kws matcher. ...
> > It's a trade-off. Can you have any idea to select the better
> > matcher for both two cases?
>
> There are at least two approaches.
>
> 1) If you have
Hi Paul,
> Before installing anything like that, I'd first like to look into improving
> the
> DFA performance, along the lines suggested by Norhiro Tanaka. Part of the
> problem appears to be that position-set merging, even with his latest
> proposed
> changes, is O(N**2) where N is the patt
On 2016-12-12 Norihiro Tanaka wrote:
> > On my box the above runs for >2m (never completes before I ^C) on
> > the version **AFTER** the commits (v2.22). On the test build just
> > *BEFORE* the commits (2.21.73-8058), it runs in <2s. So for me, I
> > had a working command (-F -w -f) that used to
arn...@skeeve.com wrote:
I'm sure that Paul and Jim would welcome patches.
I wrote a patch that prefers -F when the input looks like a troublesome case. It
merely uses heuristics though, nothing scientific like what Bruno Haible suggested.
Before installing anything like that, I'd first like
On 2016-12-11 Norihiro Tanaka wrote:
> The changes switch used algorithm. They convert grep -w -F to grep
> -w.
Hi, thanks for helping! Sorry, yes, I forgot I was using
--fixed-strings (-F), so yes, my example should have used -F.
> Try following test case and before and after the changes, plea
Bruno Haible wrote:
> Finally, code this formula into the 'grep' program.
I'm sure that Paul and Jim would welcome patches.
Arnold
Trevor Cordes wrote:
> I've read in numerous places (O'Reilly books mostly) that grep or pcre
> is often/sometimes faster than fgrep, so I think it is (somewhat) common
> knowledge and I wouldn't worry too much about that perception.
It is wrong to put the burden of the algorithm choice on the use
On Sun, 11 Dec 2016 05:28:56 -0600
Trevor Cordes wrote:
> On my box the above runs for >2m (never completes before I ^C) on the
> version **AFTER** the commits (v2.22). On the test build just *BEFORE*
> the commits (2.21.73-8058), it runs in <2s. So for me, I had a working
> command (-F -w -f)
On Fri, 9 Dec 2016 01:24:19 -0600
Trevor Cordes wrote:
> I bisected this bug to commits:
> 662b19f2d0edc0bf07f1a2c1421080252df4a37c
> 468d5217ed6ec1679512dec208c7f30fb8612957
> (can't narrow it down because the latter doesn't compile for me)
The changes switch used algorithm. They convert grep
I bisected this bug to commits:
662b19f2d0edc0bf07f1a2c1421080252df4a37c
468d5217ed6ec1679512dec208c7f30fb8612957
(can't narrow it down because the latter doesn't compile for me)
This bug has hit me hard. I have a script that wants to do:
grep -w -f /usr/share/dict/words /tmp/greptest
(good olde
JQK wrote:
【without option "-F"】
# env time grep -w -f <(seq 20) <(shuf -i 1-20 -n 250)
:
288.77user 64.23system 10:35.71elapsed 55%CPU (0avgtext+0avgdata
3492784maxresident)k
8967032inputs+0outputs (154389major+1493890minor)pagefaults 0swaps
I certainly have different results. There
On Mon, 14 Mar 2016 14:31:50 +0800
JQK wrote:
> # env time grep -w -f <(seq 20) <(shuf -i 1-20 -n 250)
> :
> 288.77user 64.23system 10:35.71elapsed 55%CPU (0avgtext+0avgdata
> 3492784maxresident)k
> 8967032inputs+0outputs (154389major+1493890minor)pagefaults 0swaps
The issue also reproced
On 03/12/2016 04:17 AM, Jim Meyering wrote:
> [resending to keep the list on Cc]
> On Thu, Mar 10, 2016 at 10:05 PM, JQK wrote:
>> On 03/11/2016 01:26 AM, Jim Meyering wrote:
>>> On Thu, Mar 10, 2016 at 3:00 AM, JQK wrote:
If in the following situation,
===
file1 has n
[resending to keep the list on Cc]
On Thu, Mar 10, 2016 at 10:05 PM, JQK wrote:
> On 03/11/2016 01:26 AM, Jim Meyering wrote:
>> On Thu, Mar 10, 2016 at 3:00 AM, JQK wrote:
>>> If in the following situation,
>>>
>>> ===
>>> file1 has numbers from 1 to 20, 20 lines
>>> file2 has se
On 03/11/2016 01:26 AM, Jim Meyering wrote:
> On Thu, Mar 10, 2016 at 3:00 AM, JQK wrote:
>> If in the following situation,
>>
>> ===
>> file1 has numbers from 1 to 20, 20 lines
>> file2 has several lines(about 200 ~300lines) of random numbers in the
>> range of 1-20
>> ===
On Thu, Mar 10, 2016 at 3:00 AM, JQK wrote:
> If in the following situation,
>
> ===
> file1 has numbers from 1 to 20, 20 lines
> file2 has several lines(about 200 ~300lines) of random numbers in the
> range of 1-20
> ===
>
> The time cost for finishing the following co
If in the following situation,
===
file1 has numbers from 1 to 20, 20 lines
file2 has several lines(about 200 ~300lines) of random numbers in the
range of 1-20
===
The time cost for finishing the following command could be over 15
minutes on linux -- a little huge.
$
38 matches
Mail list logo