bug#42762: GREP does not support Unicode

2020-08-08 Thread carlo
It's 2020. GREP really should support Unicode (UTF-16, UTF-8, with and without signature). Format recognition wouldn't have to be automatic; command line switches would be sufficient. I am using Git for Windows v2.25.0. Kind regards

bug#42762: GREP does not support Unicode

2020-08-08 Thread carlo
: GREP does not support Unicode Hi Carlo! On Sat, 8 Aug 2020 15:13:40 +0200 wrote: > Its 2020. > > GREP really should support Unicode. (UTF-16, UTF-8, with and without > signature) Format recognition wouldn't have to be automatic; command > line switches would be sufficie

bug#51231: increase performance and usability of binary search with -P

2021-10-15 Thread Carlo Arenas
The following patch increases the performance of grep when looking at binary data, without any side effects: Summary 'cd grep; ./src/grep -Pc foo /Users/carlo/Downloads/FreeBSD-13.0-BETA2-amd64.vhd' ran 1.77 ± 0.02 times faster than 'cd grep.base; ./src/grep -Pc foo /Users/

bug#51231: disregard patch

2021-10-16 Thread Carlo Arenas
And of course it has side effects (as shown by the test suite), and would only help (if fixed) when the needle is a fixed string, which is 3x slower than doing -F, -G or -E. Apologies for the distraction. Carlo

bug#51235: resolve old FIXME in PCRE implementation to allow more than 1 expression

2021-10-16 Thread Carlo Arenas
, and JIT might be able to run the alternation fast enough for most cases. Hopefully this tiny change is better than the status quo, though. Carlo 0001-pcre-allow-more-than-1-regular-expression.patch Description: Binary data

bug#51235: resolve old FIXME in PCRE implementation to allow more than 1 expression

2021-10-16 Thread Carlo Arenas
On Sat, Oct 16, 2021 at 12:50 AM Paul Eggert wrote: > > On 10/16/21 12:00 AM, Carlo Arenas wrote: > > With this patch, multiple expressions (from -e or -f) are now > > acceptable with -P for easier side by side comparison with the other > > supported engines. > >

bug#47264: [PATCH] pcre: migrate to pcre2

2021-11-08 Thread Carlo Arenas
On Sun, Nov 7, 2021 at 4:30 PM Paul Eggert wrote: > > On 11/7/21 11:26, Carlo Marcelo Arenas Belón wrote: > > Mostly a bug by bug translation of the original code to the PCRE2 API. > > but includes a couple of fixes as well that might be worth doing in > > independen

bug#47264: [PATCH] pcre: migrate to pcre2

2021-11-08 Thread Carlo Arenas
On Mon, Nov 8, 2021 at 11:53 AM Paul Eggert wrote: > > On 11/8/21 01:47, Carlo Arenas wrote: > > On Sun, Nov 7, 2021 at 4:30 PM Paul Eggert wrote: > > > Let me know how to help otherwise. > > The main thing from my point of view is that I'd like to know what tho

bug#51710: [PATCH] pcre: avoid overflow in PCRE JIT stack resizing

2021-11-09 Thread Carlo Arenas
No, PCRE2 uses size_t, and it remains the same (or a similar) unsigned type when passed to sljit, so there is no undefined behaviour or overflow. We might keep the limit in PCRE2 though, as it should IMHO be far smaller anyway. Carlo Car On Tue, Nov 9, 2021 at 10:28 AM Paul Eggert wrote: > > Than

bug#51727: add an optional flag to -P to disable JIT

2021-11-10 Thread Carlo Arenas
On Tue, Nov 9, 2021 at 4:40 PM Paul Eggert wrote: > > On 11/9/21 11:04, Carlo Marcelo Arenas Belón wrote: > > Severity: wishlist > > > > There are times, when the expression is too simple or will not be used too > > often to justify the extra time in -P that i

bug#47264: [PATCH v2] pcre: migrate to pcre2

2021-11-14 Thread Carlo Arenas
On Sun, Nov 14, 2021 at 12:45 PM Paul Eggert wrote: > > On 11/9/21 02:58, Carlo Marcelo Arenas Belón wrote: > > Sadly, hadn't been able to generate a release, > > Does this mean you're having trouble running 'make dist'? If so, what's > the troub

bug#47264: [PATCH v2] pcre: migrate to pcre2

2021-11-14 Thread Carlo Arenas
On Sun, Nov 14, 2021 at 2:45 PM Jeffrey Walton wrote: > > On Sun, Nov 14, 2021 at 5:26 PM Carlo Arenas wrote: > > On Sun, Nov 14, 2021 at 12:45 PM Paul Eggert wrote: > > > ... > > using idx_t instead of size_t should be fine (if only halves the max > > size

bug#47264: [PATCH v2] pcre: migrate to pcre2

2021-11-14 Thread Carlo Arenas
On Sun, Nov 14, 2021 at 3:18 PM Carlo Arenas wrote: > On Sun, Nov 14, 2021 at 2:45 PM Jeffrey Walton wrote: > > On Sun, Nov 14, 2021 at 5:26 PM Carlo Arenas wrote: > > > On Sun, Nov 14, 2021 at 12:45 PM Paul Eggert wrote: > > > > ... > > > using idx_t in

bug#47264: [PATCH v2] pcre: migrate to pcre2

2021-11-14 Thread Carlo Arenas
On Sun, Nov 14, 2021 at 7:18 PM Paul Eggert wrote: > On 11/14/21 14:25, Carlo Arenas wrote: > > using idx_t instead of size_t should be fine (if only halves the max > > size of the objects managed), but I am concerned that assuming > > PCRE2_SIZE_MAX is always equivalent to

bug#56888: 'echo message | grep []' is affected by files in local directory when using bracket

2022-08-02 Thread Carlo Arenas
you want for your use case and why it would be better if you quote it. time echo "axyz" | grep '[abcd]xyz' should behave as you expect, regardless of what the current directory has. Carlo
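
The effect described above can be reproduced with a small sketch. It assumes a POSIX shell and a scratch directory; the file name `bxyz` is hypothetical and chosen only because it matches the unquoted bracket expression.

```shell
#!/bin/sh
# Work in a throwaway directory so the demonstration does not touch real files.
dir=$(mktemp -d)
cd "$dir" || exit 1

# A file whose name matches the glob [abcd]xyz (hypothetical name for illustration).
touch bxyz

# Unquoted: the shell expands [abcd]xyz to "bxyz" before grep runs,
# so grep searches for the literal string "bxyz" and finds nothing in "axyz".
unquoted=$(echo "axyz" | grep [abcd]xyz)

# Quoted: grep receives the bracket expression unchanged and matches "axyz".
quoted=$(echo "axyz" | grep '[abcd]xyz')

printf 'unquoted: "%s"\n' "$unquoted"
printf 'quoted:   "%s"\n' "$quoted"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

Quoting the pattern keeps the shell from treating the brackets as a filename glob, which is why the result no longer depends on the directory contents.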

bug#60618: unicode characters are not identified as such for \w and \b with -P

2023-01-06 Thread Carlo Arenas
Reported to PCRE[1] with mention of GNU grep being also affected. [1] https://github.com/PCRE2Project/pcre2/issues/185 From c2d4a43b5b15df7c8853d591bf6ae872c602ed14 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= Date: Fri, 6 Jan 2023 19:34:56 -0800 Subject

bug#60708: pcre: improve support for linking with a library without unicode

2023-01-10 Thread Carlo Arenas
Noticed while testing the previous patch, and which resulted in tests being skipped for the wrong reason. Carlo 0001-pcre-only-use-UTF-when-available-in-the-library.patch Description: Binary data

bug#60708: pcre: improve support for linking with a library without unicode

2023-01-11 Thread Carlo Arenas
introduce any changing behaviour or even code changes (because of the expected optimization), but agree might have been too clever without a corresponding explanation. Carlo

bug#60708: pcre: improve support for linking with a library without unicode

2023-01-11 Thread Carlo Arenas
Your suggested code doesn't address that, it merely changes the error message with one that would be IMHO even less clear and worsens the problem. Using a non Unicode PCRE library is perfectly fine, and there is no "undefined behavior" risk, and indeed `grep -P` without the UTF flag is exactly what the alternate path uses and what is recommended for speed, so? Carlo

bug#60708: pcre: improve support for linking with a library without unicode

2023-01-12 Thread Carlo Arenas
unicode is missing, and take into consideration those tests that set multibyte locale were successful after my change, so they will also need changes as they would misbehave silently otherwise. Carlo

bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate

2023-04-02 Thread Carlo Arenas
gly; the loop is broken if any character is added to any of the `()` branches which might mean that this is also unlikely to happen in well formed expressions. Carlo PS. -P doesn't loop, and neither does `echo a | grep -E '((a|())|())+'`, nor `'(()|(a|()))+'`, nor `'(()|(()|a))+'`

bug#62657: PCRE2-related workarounds that GNU grep might need

2023-04-03 Thread Carlo Arenas
On Mon, Apr 3, 2023 at 2:50 PM Paul Eggert wrote: > >* Disable PCRE2_UCP unless PCRE2 10.35 or higher. this is because of a bug in JIT, alternatively JIT could be disabled >* If ignoring case and PCRE2_MATCH_INVALID_UTF is defined, then > enable PCRE2_NO_START_OPTIMIZE unless PCRE2 10.36

bug#60690: -P '\d' in GNU and git grep

2023-04-03 Thread Carlo Arenas
the next PCRE2 release. Presume PCRE2 is a typo and should have been "git" here? FWIW the PCRE2 fix[1] has been released already with 10.35 and backporting to the Ubuntu 20.04 package that crashed in the original report would also solve the crash with 10.34. Carlo [1] https://gith

bug#62657: PCRE2-related workarounds that GNU grep might need

2023-04-04 Thread Carlo Arenas
On Mon, Apr 3, 2023 at 11:23 PM Paul Eggert wrote: > > On 2023-04-03 23:17, Carlo Arenas wrote: > > On Mon, Apr 3, 2023 at 2:50 PM Paul Eggert wrote: > >> > >> * Disable PCRE2_UCP unless PCRE2 10.35 or higher. > > > > this is because of a bug i

bug#60690: -P '\d' in GNU and git grep

2023-04-05 Thread Carlo Arenas
therefore `\d` meaning `[0-9]` seems "normal". Carlo CC: changed to the real email address for PCRE2 development, for full context on this thread use [4] [1] https://github.com/PCRE2Project/pcre2/pull/186 [2] https://unicode.org/reports/tr18/ [3] https://regex101.com/r/S5RW4c/1 [4] htt

bug#60690: -P '\d' in GNU and git grep

2023-04-07 Thread Carlo Arenas
t PCRE2 already does not implement every recommended aspect > of UTS#18 syntax. PCRE2 also doesn't match Perl, which does support > "\p{gc=Decimal_Number}". Not sure I follow the whole logic here, but PCRE2[3] (search for "general category" which is what the "

bug#62769: pcre: correct overpessimistic error checking of pcre2_jit_compile()

2023-04-10 Thread Carlo Arenas
The original code was done in a way that would be useful during porting, but that would hinder future work unnecessarily. Carlo 0001-pcre-correct-overpessimistic-error-checking-of-pcre2.patch Description: Binary data

bug#62769: pcre: correct overpessimistic error checking of pcre2_jit_compile()

2023-04-11 Thread Carlo Arenas
On Tue, Apr 11, 2023 at 3:11 PM Paul Eggert wrote: > > On 4/10/23 23:47, Carlo Arenas wrote: > > The original code was done in a way that would be useful during > > porting, but that would hinder future work unnecessarily. > > Thanks, but wouldn't the attached patch

bug#62745: Color only capture group

2023-04-11 Thread Carlo Arenas
You can do that already with PCRE2 and a lookbehind: echo abcedc|ggrep --color -P '(?=b)c'

bug#62745: Color only capture group

2023-04-11 Thread Carlo Arenas
On Tue, Apr 11, 2023 at 11:51 PM Carlo Arenas wrote: > > echo abcedc|ggrep --color -P '(?=b)c' typo: echo abcedc|ggrep --color -P '(?<=b)c' `ggrep`, would be called grep in your environment
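
A minimal sketch of the corrected lookbehind follows. It uses `-o` instead of `--color` so that the selected text is visible in plain output, and it assumes a grep built with PCRE2 support; the probe at the top is only there to skip the example when `-P` is unavailable.

```shell
#!/bin/sh
# -P needs a grep built with PCRE2 support; probe for it first.
if echo a | grep -P 'a' >/dev/null 2>&1; then
  # (?<=b) asserts that "b" precedes the match without being part of it,
  # so only the "c" that follows "b" is selected; -o prints just that part.
  match=$(echo abcedc | grep -oP '(?<=b)c')
  printf 'matched: %s\n' "$match"
else
  match=skipped
  echo 'grep -P is not available in this build'
fi
```

Because the lookbehind consumes no characters, the `b` itself is never part of the match, which is what limits the coloring to the text after it.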

bug#62983: workaround PCRE2 bug affecting at least \D and \W

2023-04-29 Thread Carlo Arenas
Just some nitpicking, but could we use single quotes around the '𝄞' character in pcre-utf8-bug224 instead of double quotes? Carlo

bug#63484: FAIL: y2038-vs-32-bit

2023-05-13 Thread Carlo Arenas
On Sat, May 13, 2023 at 7:48 AM Andreas Schwab wrote: > > On Mai 13 2023, Carlo Marcelo Arenas Belón wrote: > > > on linux m68k. > > ??? Well; the report didn't provide much information, so I made an educated guess. Would you provide a more accurate description? Al

bug#63533: test-mbrlen5.sh failure

2023-05-15 Thread Carlo Arenas
That is a test for a bug that your system image has but that is not relevant to grep (mbrlen doesn't correctly handle a call with a len of 0). Carlo

bug#63533: test-mbrlen5.sh failure

2023-05-19 Thread Carlo Arenas
On Fri, May 19, 2023 at 12:43 PM Carlo Marcelo Arenas Belón wrote: > > On Thu, May 18, 2023 at 10:09:38PM +0200, Jim Meyering wrote: > > On Thu, May 18, 2023 at 2:44 PM Carlo Marcelo Arenas Belón > > wrote: > > > On Wed, May 17, 2023 at 09:09:02PM

bug#63965: grep-3.11: 'make check' fails with glibc-2.37.9000

2023-06-09 Thread Carlo Arenas
On Fri, Jun 9, 2023 at 12:06 AM Jaroslav Škarvada wrote: > diff: in: Value too large for defined data type This has nothing to do with the new glibc, but with the fact that your diff is affected by bug#63492. Upgrading to diffutils 3.10 should address that. Carlo

bug#65416: Feature request: include first line of file in output

2023-08-22 Thread Carlo Arenas
ing is the solution, but grep already has a feature that could be used to provide a solution as shown by the following scriptlet (including a scaled-down data file): $ cat > c.csv USER,TIP john,0 jane,10 carenas,100 $ ( grep -m1 USER && grep carenas ) < c.csv USER,TIP carenas,100 Carlo
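
The scriptlet above can be written out as a self-contained sketch. It assumes GNU grep, whose `-m` option leaves standard input positioned just after the last selected line when the input is a regular file; other grep implementations may not do this. The temporary file stands in for `c.csv`.

```shell
#!/bin/sh
# Recreate the sample data in a temporary file.
data=$(mktemp)
printf 'USER,TIP\njohn,0\njane,10\ncarenas,100\n' > "$data"

# Both greps share one redirected file descriptor: the first prints the
# header and stops after one match, the second continues from that offset.
result=$( ( grep -m1 USER && grep carenas ) < "$data" )
printf '%s\n' "$result"

rm -f "$data"
```

The redirection has to be a regular file for this to work; with a pipe, the first grep may read past the header and leave nothing for the second one.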

bug#66251: make [[:digit:]] consistent with \d when UCP mode is enabled in -P

2023-09-28 Thread Carlo Arenas
Enable the PCRE2 flag that will be released with 10.43 to keep [[:digit:]] ASCII just like it was done already for `\d`. Carlo 0001-pcre-make-d-and-digit-consistent-in-UCP-mode.patch Description: Binary data

bug#47264: [PATCH] pcre: migrate to pcre2

2021-11-07 Thread Carlo Marcelo Arenas Belón
sets a strict minimum of 10.34 as that is required to pass all tests, even if the issues are minimal and likely to be real bugs that the old PCRE just hid, there is likely more work pending in this area. Performance seems equivalent, and it also seems functionally complete. Signed-off-by: Carlo

bug#51458: grep PCRE - '^' and '$' are not recognized as begin and end of line for multiline strings

2021-11-08 Thread Carlo Marcelo Arenas Belón
example: /\A(?m:\s*^(?:#\w+.*\s*|extern\s+.+)$)*+(?<namespace>\s*namespace(?:\s+utTestNamespace\s*(?>(?<block>{(?:[^{}]*(?&block)*)*}))|(\s*[\w:]*\s*{)(?&namespace)\s*}))\s*\z/ Carlo [1] https://www.pcre.org/current/doc/html/pcre2pattern.html#internaloptions

bug#51710: [PATCH] pcre: avoid overflow in PCRE JIT stack resizing

2021-11-09 Thread Carlo Marcelo Arenas Belón
like value by sljit. Alternatively, a smaller maximum could be selected as it has been documented[1] that more than 1MB would be unrealistic. [1] https://www.pcre.org/original/doc/html/pcrejit.html#SEC8 Signed-off-by: Carlo Marcelo Arenas Belón --- src/pcresearch.c | 4 1 file changed, 4

bug#47264: [PATCH v2] pcre: migrate to pcre2

2021-11-09 Thread Carlo Marcelo Arenas Belón
in #51710[1] Carlo [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=51710 From 29c2f2238ed58ceb4101687f3aae7265f6839025 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= Date: Mon, 8 Nov 2021 21:27:03 -0800 Subject: [PATCH v2] pcre: migrate to pcre2 MIME-Version: 1.

bug#51727: add an optional flag to -P to disable JIT

2021-11-09 Thread Carlo Marcelo Arenas Belón
rom caeca5e806fe1b2e368833f05bb4cfb75763d1b3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= Date: Sat, 16 Oct 2021 01:38:11 -0700 Subject: [PATCH] pcre: add a flag to disable JIT MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8

bug#51735: [PATCH] tests: fix test logic for pcre-context

2021-11-09 Thread Carlo Marcelo Arenas Belón
expected LF characters, but a full fix will have to wait until PCRE2. Signed-off-by: Carlo Marcelo Arenas Belón --- tests/pcre-context | 40 ++-- 1 file changed, 22 insertions(+), 18 deletions(-) diff --git a/tests/pcre-context b/tests/pcre-context index

bug#47264: [PATCH v2] pcre: migrate to pcre2

2021-11-15 Thread Carlo Marcelo Arenas Belón
On Mon, Nov 15, 2021 at 08:17:02AM -0800, Paul Eggert wrote: > On 11/14/21 20:44, Carlo Arenas wrote: > > > > This shouldn't be a problem in practice. Surely PCRE2_SIZE_MAX is for > > > forward compatibility to a potential future version of PCRE2 that may > > >

bug#47264: [PATCH v2] pcre: migrate to pcre2

2021-11-15 Thread Carlo Marcelo Arenas Belón
On Mon, Nov 15, 2021 at 03:24:41PM -0800, Paul Eggert wrote: > On 11/15/21 12:49, Carlo Marcelo Arenas Belón wrote: > > > Apologies, I realize it is difficult to talk about code in abstract when > > not inlined, but I think it will better addressed by "fixing" it

bug#62983: workaround PCRE2 bug affecting at least \D and \W

2023-04-20 Thread Carlo Marcelo Arenas Belón
instead. Alternatively JIT could be disabled instead, but the option selected has less of an impact on performance. Carlo From 9194c8e9f9ca7315c2e8c25a7986d0690fb31d7c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= Date: Thu, 20 Apr 2023 18:37:20 -0700 Subject: [PA

bug#62983: workaround PCRE2 bug affecting at least \D and \W

2023-04-21 Thread Carlo Marcelo Arenas Belón
On Fri, Apr 21, 2023 at 11:42:50AM -0700, Paul Eggert wrote: > On 2023-04-20 19:04, Carlo Marcelo Arenas Belón wrote: > > All versions of PCRE2 that include PCRE2_MATCH_INVALID_UTF had a bug on > > its JIT implementation that results in failure to match for the negative > > pe

bug#63016: make it easier to build with development versions of PCRE2

2023-04-22 Thread Carlo Marcelo Arenas Belón
Building against a different version of PCRE2 than the one that is provided with the system is complicated by the fact that unlike what is advertised, if a pkg-config module for libpcre2-8 is found, it will override the values that were provided with PCRE_CFLAGS and PCRE_LIBS. Carlo >F

bug#63484: FAIL: y2038-vs-32-bit

2023-05-13 Thread Carlo Marcelo Arenas Belón
Would the attached work around the issue? Carlo From 1fb2147cead1d201b64f4b17154181cd6278eb7f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= Date: Sat, 13 May 2023 07:28:35 -0700 Subject: [PATCH] tests: skip y2038 test upon compare failure * tests/y2038-vs

bug#63484: FAIL: y2038-vs-32-bit

2023-05-13 Thread Carlo Marcelo Arenas Belón
Could you apply the attached patch? Carlo From b19df9fa4402349e8ae3c35f0e3738f66d354d59 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= Date: Sat, 13 May 2023 07:28:35 -0700 Subject: [PATCH v2] tests: protect y2038 against diff failures * tests/y2038-vs-32-

bug#63484: FAIL: y2038-vs-32-bit

2023-05-13 Thread Carlo Marcelo Arenas Belón
would work around the diffutils bug in the test, and show that grep is working. Carlo [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=63492 From 635b53c17492dbf0233c9b803e5a21c82e36d7f5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= Date: Sat, 13 May 2023

bug#63533: test-mbrlen5.sh failure

2023-05-17 Thread Carlo Marcelo Arenas Belón
see this is part of the gnulib tests. Carlo From d1adf4035c89d4f215ccff48643df7784fbde5ba Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= Date: Tue, 16 May 2023 00:11:24 -0700 Subject: [PATCH] gnulib: avoid mbrlen-tests Since e319a8 (grep: improve perfor

bug#63533: test-mbrlen5.sh failure

2023-05-19 Thread Carlo Marcelo Arenas Belón
On Thu, May 18, 2023 at 10:09:38PM +0200, Jim Meyering wrote: > On Thu, May 18, 2023 at 2:44 PM Carlo Marcelo Arenas Belón > wrote: > > On Wed, May 17, 2023 at 09:09:02PM -0400, Caleb Zulawski wrote: > > > > > > Isn’t this test too strict, then? > > > >