Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-12 Thread Jim Meyering
On Thu, Sep 11, 2014 at 12:10 PM, Paul Eggert wrote:
> On 09/11/2014 11:37 AM, Jim Meyering wrote:
>> Would you mind adding a test to trigger that one?
>
> Ordinarily I would have done that already but this -P stuff is so buggy and
> slow that I got discouraged. (If we keep having trouble with

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-11 Thread Vincent Lefevre
On 2014-09-11 10:07:49 -0700, Paul Eggert wrote:
> Vincent Lefevre wrote:
> >I've just reported a new Debian concerning the performance problem.
>
> It's not clear from http://bugs.debian.org/761157 that the performance
> problem occurs only with -P, but I assume that's what is meant.

It's specif

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-11 Thread Paul Eggert
On 09/11/2014 11:37 AM, Jim Meyering wrote:
> Would you mind adding a test to trigger that one?

Ordinarily I would have done that already but this -P stuff is so buggy and slow that I got discouraged. (If we keep having trouble with -P I may start lobbying to remove it) Anyway, I gave it a

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-11 Thread Jim Meyering
On Thu, Sep 11, 2014 at 10:07 AM, Paul Eggert wrote:
> Vincent Lefevre wrote:
>
>> I've just reported a new Debian concerning the performance problem.
>
> It's not clear from http://bugs.debian.org/761157 that the performance
> problem occurs only with -P, but I assume that's what is meant.
>

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-11 Thread Paul Eggert
Vincent Lefevre wrote:
> I've just reported a new Debian concerning the performance problem.

It's not clear from http://bugs.debian.org/761157 that the performance problem occurs only with -P, but I assume that's what is meant. Since this is a performance bug with PCRE, I suggest moving the D

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-11 Thread Vincent Lefevre
On 2014-09-10 13:22:36 +0200, Santiago wrote:
> Thanks! I'm including this fix in the current debian package.

Unfortunately, it is very slow, with a large slowdown factor. I've just reported a new Debian concerning the performance problem.

-- 
Vincent Lefèvre - Web: 100

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-10 Thread Norihiro Tanaka
Thanks. I have confirmed that new version has expected response as following. $ env LC_ALL=en_US.utf8 src/grep -P '.?b' in ab -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-10 Thread Santiago
On 10/09/14 at 00:08, Paul Eggert wrote:
> Paul Eggert wrote:
> >perhaps there's a PCRE version dependency here?
>
> I found a PCRE-version-dependent problem that may be relevant, and installed
> the attached further patch to fix it.

Thanks! I'm including this fix in the current debian pack

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-10 Thread Paul Eggert
Paul Eggert wrote:
> perhaps there's a PCRE version dependency here?

I found a PCRE-version-dependent problem that may be relevant, and installed the attached further patch to fix it.

From dc7d532d16dec740d11b6817c9b558543aca0136 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Wed, 10 Sep 2014

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-09 Thread Paul Eggert
Norihiro Tanaka wrote:
> I see that new version has no response for following test which was used previously.
>
> printf '\x80ab\n' | env LC_ALL=en_US.utf8 src/grep -P '.?b'

Thanks for reporting that. The test case works for me (Fedora 20 x86-64, GCC 4.9.1):

$ printf '\x80ab\n' | env LC_AL

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-09 Thread Norihiro Tanaka
I see that new version has no response for following test which was used previously. printf '\x80ab\n' | env LC_ALL=en_US.utf8 src/grep -P '.?b'

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-09 Thread Paul Eggert
Norihiro Tanaka wrote:
> I'm worried that to re-run for invalid UTF-8 makes slowness for searching of the large number of binary files.

Yes, that could be a problem, but even so it's better for grep to report matches than to give up and fail. Perhaps someone could optimize this better later, b

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-09 Thread Norihiro Tanaka
I'm worried that to re-run for invalid UTF-8 makes slowness for searching of the large number of binary files.

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-29 Thread Eric Blake
On 08/28/2014 11:47 PM, Santiago wrote:
> On 16/08/14 at 11:36, Paul Eggert wrote:
>> Santiago wrote:
>>> Another solution would be to don't check if binary files are valid
>>> (passing PCRE_NO_UTF8_CHECK to pcre_exec), but I don't know if that'd
>>> avoid security holes
>>

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-16 Thread Paul Eggert
Santiago wrote:
> Another solution would be to don't check if binary files are valid (passing PCRE_NO_UTF8_CHECK to pcre_exec), but I don't know if that'd avoid security holes

It wouldn't. (We already tried it.)

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-16 Thread Santiago
On 16/08/14 at 18:26, Vincent Lefevre wrote:
> On 2014-08-16 16:01:27 +0200, Santiago wrote:
> > Workaround attached. It's too slow against binary files, but I haven't
> > found a simpler solution.
>
> To avoid the slowness, I think that it would be better to detect
> (directly, not via PCRE

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-16 Thread Vincent Lefevre
On 2014-08-16 16:01:27 +0200, Santiago wrote:
> Workaround attached. It's too slow against binary files, but I haven't
> found a simpler solution.

To avoid the slowness, I think that it would be better to detect (directly, not via PCRE) invalid UTF-8 sequences and replace them by null bytes *in-pl
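
The in-place replacement discussed in this thread can be illustrated with a minimal C sketch. This is not grep's actual code and not the patch that was installed; the function names utf8_seq_len and scrub_invalid_utf8 and the replacement parameter are hypothetical, chosen only for illustration. The sketch assumes the buffer may be modified, validates UTF-8 directly (without calling PCRE), and overwrites each byte that cannot start a well-formed sequence with a single replacement byte, so the buffer length never changes. Passing 0 gives the null-byte variant; passing '?' gives the other variant suggested in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the length (1..4) of the well-formed UTF-8
   sequence starting at p, or 0 if the bytes at p do not form one.
   avail is the number of bytes available starting at p (at least 1). */
static size_t utf8_seq_len(const unsigned char *p, size_t avail)
{
    unsigned char c = p[0];
    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of the 2nd byte */
    size_t need;                          /* number of continuation bytes */

    if (c < 0x80)
        return 1;                         /* ASCII */
    if (c >= 0xC2 && c <= 0xDF) {
        need = 1;
    } else if (c >= 0xE0 && c <= 0xEF) {
        need = 2;
        if (c == 0xE0) lo = 0xA0;         /* reject overlong encodings */
        else if (c == 0xED) hi = 0x9F;    /* reject UTF-16 surrogates */
    } else if (c >= 0xF0 && c <= 0xF4) {
        need = 3;
        if (c == 0xF0) lo = 0x90;         /* reject overlong encodings */
        else if (c == 0xF4) hi = 0x8F;    /* reject values above U+10FFFF */
    } else {
        return 0;                         /* stray continuation or bad lead */
    }

    if (avail < need + 1)
        return 0;                         /* sequence truncated by buffer end */
    if (p[1] < lo || p[1] > hi)
        return 0;
    for (size_t i = 2; i <= need; i++)
        if (p[i] < 0x80 || p[i] > 0xBF)
            return 0;
    return need + 1;
}

/* Hypothetical helper: overwrite every byte that does not begin a
   well-formed UTF-8 sequence with `replacement`, in place.
   Returns the number of bytes replaced; len is never changed. */
size_t scrub_invalid_utf8(unsigned char *buf, size_t len,
                          unsigned char replacement)
{
    size_t replaced = 0;
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        size_t n = utf8_seq_len(buf + i, len - i);
        if (n == 0) {
            buf[i] = replacement;         /* one byte in, one byte out */
            replaced++;
            i++;
        } else {
            i += n;                       /* skip the whole valid sequence */
        }
    }
    return replaced;
}
```

Because each invalid byte maps to exactly one replacement byte, match offsets computed on the scrubbed buffer still line up with the original input, which is the advantage claimed in the thread over substituting the three-byte U+FFFD sequence. A caller that still needs to print the original bytes would have to scrub a copy, or remember which positions were changed.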

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-16 Thread Santiago
On 14/08/14 at 14:33, Paul Eggert wrote:
> Vincent Lefevre wrote:
> >On input, using null bytes may be better if one wants to be able to
> >match real replacement characters without false positives.
>
> Maybe, though this is no place to get fancy. It's simple to tell users "an
> invalid byt

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Paul Eggert
Vincent Lefevre wrote:
> On input, using null bytes may be better if one wants to be able to match real replacement characters without false positives.

Maybe, though this is no place to get fancy. It's simple to tell users "an invalid byte acts like '?'". Simple is good. Anyway, this is a ma

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Vincent Lefevre
On 2014-08-14 13:13:45 -0700, Paul Eggert wrote:
> Vincent Lefevre wrote:
> >The problem with this solution is that it would change the length
> >of the text, while replacing invalid bytes by zero bytes could be
> >done in place (if allowed), with very little change of the code,
> >I think.
>
> Tr

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Vincent Lefevre
On 2014-08-14 11:19:28 -0700, Paul Eggert wrote:
> grep should work correctly even if the input contains NUL bytes, so perhaps
> it would be better to replace an invalid byte by the UTF-8 sequence for
> U+FFFD REPLACEMENT CHARACTER, as that's one standard way to deal with this
> problem. Or perhap

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Paul Eggert
Vincent Lefevre wrote:
> The problem with this solution is that it would change the length of the text, while replacing invalid bytes by zero bytes could be done in place (if allowed), with very little change of the code, I think.

True. Though it might be more user-friendly to use '?' as the re

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Paul Eggert
Vincent Lefevre wrote:
> it would be better to replace invalid UTF-8 sequences by zero bytes before passing them to libpcre. Is it allowed to do that in Pexecute()?

Sorry, I don't know. I was hoping that the volunteer (whoever it is) could figure all this stuff out. grep should work correctl

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Vincent Lefevre
On 2014-08-14 09:15:58 -0700, Paul Eggert wrote:
> That commit was necessary to avoid undefined behavior in libpcre. We can't
> simply undo the commit (unless you want to reintroduce security holes into
> grep :-). The current behavior is the best we can do, unless someone fixes
> libpcre (which

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Paul Eggert
Santiago wrote:
> Please, revert ca7868cc27db3d9deafaa2e0ac5a2bb0aa8ef373

That commit was necessary to avoid undefined behavior in libpcre. We can't simply undo the commit (unless you want to reintroduce security holes into grep :-). The current behavior is the best we can do, unless someone