Hi. I just did a fresh download of the CVS repository, ran
./autogen.sh
./configure && make && make check
And got some failures from the check. The results are attached.
Fedora 7, GCC 4.1.2 (a la Redhat).
I am running with LC_ALL=C.
Thanks,
--
Aharon (Arnold) Robbins
> Date: Fri, 23 May 2008 03:03:11 -0300
> From: Tony Abou-Assaleh <[EMAIL PROTECTED]>
> Subject: Re: [bug #23321] Epsclosure speedup patch
> To: Johan Walles <[EMAIL PROTECTED]>
> Cc: bug-grep@gnu.org
>
> Johan Walles wrote:
> > Follow-up Comment #1, bug #23321 (project grep):
> >
> > The just att
I wrote:
> So, I suggest looking at dfaanalyze also. I'm going to look there to see if
> an int -> char change may help too.
Looks like int -> char doesn't help there, and that nalloc could profitably
be of type size_t instead of int, but I'm leaving it alone.
Arnold
Hi. Here is a patch for a bug that I just (re-)fixed in gawk.
It looks like there has not been much activity in the CVS in a long
while. Sigh.
In any case, valgrind is your friend; this showed up in gawk in a UTF-8
locale.
In general, it is worth comparing the gawk dfa.c with the grep one; I
hav
Tony,
Hi.
> Date: Sat, 13 Dec 2008 13:39:19 -0400
> From: Tony Abou-Assaleh
> To: bug-grep@gnu.org
> Subject: Re: dfa.c: fix memory leak
>
> Arnold,
>
> > In general, it is worth comparing the gawk dfa.c with the grep one; I
> > have fixed a few bugs there.
>
> The gawk-stable's and gawk-devel's
Hi Jim.
Has anyone looked at issues with dfa.c? There is a difference in
how dfa handles x{0} from how regex handles it; dfa, IIRC, treats
it as x{1}. Gawk checks for this and doesn't use dfa in such a case,
but it'd be nice if it were fixed.
There may be other fixes in the gawk dfa.[ch] tha
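A guard of the kind described above (detect `x{0}` and skip the dfa matcher) could be sketched as follows. This is only an illustration: the name `has_zero_repeat` is hypothetical and is not gawk's actual code, and the scan deliberately ignores escapes and bracket expressions.

```c
#include <string.h>

/* Hypothetical helper, not gawk's real check: report whether PATTERN
   contains a literal "{0}" interval, so that a caller could bypass the
   dfa matcher and use the regex routines instead.  Escaped braces and
   braces inside bracket expressions are not handled in this sketch.  */
int
has_zero_repeat (const char *pattern)
{
  return strstr (pattern, "{0}") != NULL;
}
```

A caller would test the pattern once at compile time and remember the answer, rather than rescanning it on every match.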
I think the Debian guys sent me the bug report and I sent them a fix.
Glad you were able to put it into grep.
Arnold
> From: Jim Meyering
> To: arn...@skeeve.com
> Cc: ani...@debian.org, bug-grep@gnu.org
> Subject: Re: grep-2.6 is imminent: pending patches, bug reports?
> Date: Thu, 04 Mar 2010
Jim,
You're right - this is a problem on K&R compilers as well as on some
compilers with ANSI prototypes that aren't really ANSI (Ultrix seems to
want to come to mind).
It was once upon a time an issue for gawk; I don't know if it still is
or if the world has since made enough progress.
I will p
NOTE: I'm replying manually since I can't deal with Savannah.
> From: Paolo Bonzini
> Date: Mon, 08 Mar 2010 18:22:01 +
> Subject: [patch #6899] Speed-up for searching in multibyte and ignore-icase.
>
> Especially now that there is a
> good DFA-based matcher in glibc,
Has anybody proven this
Hi Paolo,
> The main problems with the glibc DFA matcher are because it has to
> track subexpression boundaries. So it won't be as fast as dfa.c in
> the general case. Never.
I think this means we can't give up dfa.c, then.
> On the other hand it developed many optimizations for UTF-8 that rig
Thanks for the explanations of the dfa vs. regex.
> > I think I have an obligation at this point to mention:
> >
> > http://swtch.com/~rsc/regexp/
> >
> > In particular, there is code there for an "Efficient (non-backtracking)
> > NFA implementation with submatch tracking. Accepts UTF-8 and
Hi All.
> Patches 1 to 9 are simple cleanups, .
> . The dfa.c after this patch is
> suitable for merging into gawk.
Jim - Please signal me off list as to when I should pull this in.
> Patch 10 adds more UTF-8 test cases (and multibyte in general) to make
> sure nothing breaks.
>
> Patch
I'm not happy with removing the null checks in calls to free(); there
were systems out there that would throw a fatal error if you passed
null to free(). I'd prefer to leave those checks in.
Thanks,
Arnold
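For readers unfamiliar with the idiom under discussion, here is a minimal sketch of a guarded free. The wrapper name `checked_free` is hypothetical; dfa.c itself simply wrote the `if` test inline. Standard C (C89 onward) defines free(NULL) as a no-op, so the guard only matters on the older libraries mentioned above.

```c
#include <stdlib.h>

/* Hypothetical wrapper: call free() only for a non-null pointer.
   Returns 1 if free() was called and 0 if the pointer was null.  */
int
checked_free (void *p)
{
  if (p == NULL)
    return 0;
  free (p);
  return 1;
}
```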
Hi. I just finished importing dfa.[hc] into gawk.
\s(20 THANK YOU !!! \s for all the work. It passes make check. Attached
is a diff from grep to gawk to retain additional things I need (e.g. VMS).
FYI, I added more years in the copyright to dfa.c based on the years in
which it was published as
Hi Jim.
> I've made the above two changes with a commit in your name.
Great, thanks.
> However, I'll pass on the other changes.
>
> > +#ifdef HAVE_CONFIG_H
> > #include
> > +#endif
I can remove that from dfa.c, no big deal.
> > #include
> > #include
> > #include
> > +
> > +#ifndef VMS
Hi.
> Date: Thu, 25 Mar 2010 14:27:14 +0100
> From: Paolo Bonzini
> To: Jim Meyering
> Cc: bug-grep@gnu.org
> Subject: Re: [PATCH v3] dfa/grep: fix compilation with MBS_SUPPORT
>
> On 03/25/2010 02:11 PM, Jim Meyering wrote:
> >> > Unfortunately, using wchar.h unconditionally would not be okay
Hi. This was needed (along with another, irrelevant-to-grep change) to allow
gawk to compile on latest cygwin.
Beats me how it even compiled ok under Linux. :-)
Thanks,
Arnold
---
Mon Mar 29 05:41:35 2010 Corinna Vinschen
* dfa.c: Include h
Index: dfa.c
===
RCS file: /d/mongo/cvsrep/gawk-stable/dfa.c,v
retrieving revision 1.27
diff -u -r1.27 dfa.c
--- dfa.c 29 Mar 2010 02:58:08 - 1.27
+++ dfa.c 31 Mar 2010 12:45:16 -
@@ -2781,6 +2781,7 @@
unsig
I have testers for SCO Unix using GCC 2.95 (!), as well as VMS and z/OS.
The latter has a vendor supplied C89 compiler but no GCC at all, nor
is it likely to get GCC particularly soon.
I personally don't see having variables at the top of functions as a
big price to pay for continued portability t
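As an illustration of the C89 constraint being argued for, the sketch below keeps every declaration ahead of the first statement in its block. The function `count_char` is made up for the example and does not come from dfa.c.

```c
#include <stddef.h>

/* C89 style: all declarations precede the first statement, so the
   function also compiles with vendor compilers that reject
   declarations after statements.  */
size_t
count_char (const char *s, char ch)
{
  size_t n;
  const char *p;

  n = 0;
  for (p = s; *p != '\0'; p++)
    if (*p == ch)
      n++;
  return n;
}
```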
> Date: Thu, 01 Apr 2010 10:02:20 +0200
> From: Paolo Bonzini
> To: Aharon Robbins
> CC: j...@meyering.net, r...@sc3d.org, bug-grep@gnu.org
> Subject: Re: dfa.c fix for C89 compilers
>
> On 04/01/2010 08:25 AM, Aharon Robbins wrote:
> > I have testers for SCO Unix us
Hi. I successfully built 2.6.3 on a PPC Mac:
$ uname -a
Darwin Macintosh.local 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:57:01
PDT 2009; root:xnu-1228.15.4~1/RELEASE_PPC Power Macintosh
$ gcc -v
Using built-in specs.
Target: powerpc-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5465~16/s
Hi All.
From an older mail thread:
> Maybe by the time dfa.c is ready, gnulib's policy will permit at least
> C89+declaration-after-stmt.
There remain systems for which neither a vendor C99 compiler nor GCC
are available.
But I don't think I'm getting anywhere with this issue. So, I will quote
Hi. The Cygwin maintainer tells me that libsigsegv on Windows pulls in the
dreaded header file which defines WCHAR as wchar_t, causing a
conflict with the WCHAR in the enum in dfa.h. I propose the following diff
which compiles OK under Linux and moves the details into dfa.c.
Thanks,
Arnold
Hi. My z/OS maintainer indicates that in_coll_range() doesn't compile
there. He suggests the following patch:
*** dfa.c.orig Fri Apr 2 06:00:20 2010
--- dfa.c Fri Apr 2 05:59:38 2010
***
*** 408,414
--- 408,419
static int
in_coll_range (char ch, char from, char to)
{
Hi. The code
enum token_enum;
typedef token_enum token;
isn't acceptable to at least two otherwise more-or-less C89 compilers.
So, I went whole-hog and moved all the dfa internals into dfa.c. This
fixes the long-standing wish in dfa.h about not exposing internals.
Grep passes it
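The general technique of hiding internals behind an incomplete type can be sketched in a single file as below. All names here (`struct matcher`, `matcher_alloc`, `token_kind` and so on) are hypothetical and are not the real dfa.h interface; the point is only that the enum is fully defined before the typedef uses it, and that callers never see the struct layout.

```c
#include <stdlib.h>

/* What a public header could expose (hypothetical names).  */
struct matcher;                         /* incomplete type: layout hidden */
struct matcher *matcher_alloc (void);
int matcher_ntokens (const struct matcher *m);
void matcher_free (struct matcher *m);

/* What stays private in the .c file.  The enum is completely defined
   before the typedef refers to it, which any C89 compiler accepts.  */
enum token_kind { TOK_END, TOK_CHAR, TOK_STAR };
typedef enum token_kind token;

struct matcher
{
  token *tokens;
  int ntokens;
};

struct matcher *
matcher_alloc (void)
{
  struct matcher *m = malloc (sizeof *m);

  if (m != NULL)
    {
      m->tokens = NULL;
      m->ntokens = 0;
    }
  return m;
}

int
matcher_ntokens (const struct matcher *m)
{
  return m->ntokens;
}

void
matcher_free (struct matcher *m)
{
  if (m != NULL)
    {
      free (m->tokens);
      free (m);
    }
}
```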
From 1d9a811652ab4811a817c3be06e09d08e3d8a09b Mon Sep 17 00:00:00 2001
From: Arnold D. Robbins
Date: Thu, 8 Apr 2010 20:42:34 +0300
Subject: [PATCH] Fix declaration of dfabroken in dfa.h
* dfa.h (dfabroken): Fix declaration to match that in dfa.c.
---
src/dfa.h |2 +-
1 files changed, 1 ins
> The only change I made was to add "[GAWK]" to the log entry...
Great. I'll try to remember to do that for any future changes that
are gawk-only.
Thanks,
Arnold
Now that grep has acquired a full-blown HACKING file, should
README-hacking be removed?
Thanks,
Arnold
I just did a completely fresh git pull of grep. I then did
./bootstrap
./configure && make
make check
The latter dies with:
make[3]: Entering directory `/d/local/src/Gnu/grep/gnulib-tests'
make libtests.a test-alloca-opt test-argmatch test-atexit test-binary-io
test-bi
From 4a422ff749cb4bbed270776038adb9a5977f47b6 Mon Sep 17 00:00:00 2001
From: Arnold D. Robbins
Date: Tue, 20 Apr 2010 13:45:53 +0300
Subject: [PATCH] Fix add_utf8_anychar to work in non-MBS situation
dfa.c (add_utf8_anychar): Bracket body in `#if MBS_SUPPORT'.
Reported by Anders Wallin, fixed by
It was a clone.
I just tried again, and it all worked, so we'll just call it a phase
of the moon error and get on with life... :-)
Sorry for the noise.
Arnold
> From: Jim Meyering
> To: Aharon Robbins
> Cc: bug-grep@gnu.org
> Subject: Re: problem with make check
> Date:
Hi All.
As part of testing gawk, Nelson Beebe builds it on a bunch of systems.
On MirBSD (which is some fringe BSD variant), the regx8bit test fails.
When gawk is run to bypass dfa, the test passes.
Grep 2.6.3 does not pass all its tests on that system. The ones that
fail are:
FAIL: test-mbrtowc
Hi.
> Date: Sat, 01 May 2010 13:34:57 +0200
> From: Paolo Bonzini
> To: Aharon Robbins
> CC: bug-grep@gnu.org
> Subject: Re: dfa.c problems on MirBSD...
>
> On 04/30/2010 11:10 AM, Aharon Robbins wrote:
> > Hi All.
> >
> > As part of testing gawk, Nelso
> Date: Sat, 01 May 2010 13:34:57 +0200
> From: Paolo Bonzini
> To: Aharon Robbins
> CC: bug-grep@gnu.org
> Subject: Re: dfa.c problems on MirBSD...
>
> On 04/30/2010 11:10 AM, Aharon Robbins wrote:
> > Hi All.
> >
> > As part of testing gawk, Nelso
> From: Jim Meyering
> To: Paolo Bonzini
> Date: Wed, 05 May 2010 12:05:20 +0200
> Cc: bug-grep@gnu.org
> Subject: Re: [PATCH 2/2] tests: add test for newly-fixed performance problem
>
> Paolo Bonzini wrote:
> > On 05/04/2010 07:32 PM, Jim Meyering wrote:
> >>$(AWK) 'BEGIN {for (i=0; i<13000;
Hi All. In an old thread with this subject, I was asked if gawk checked
for regexps like [:space:] which should be [[:space:]]. Paolo asked
that I send my reply to the list.
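A check for the `[:space:]` mistake could look roughly like the sketch below. It is not gawk's implementation; the function names are hypothetical, and the scan assumes POSIX bracket-expression syntax with class names made of letters only.

```c
#include <ctype.h>
#include <string.h>

/* Skip the bracket expression that starts at P (P points at '[').
   Returns a pointer just past the closing ']', or to the terminating
   NUL if the expression is unterminated.  */
static const char *
skip_bracket (const char *p)
{
  p++;                          /* step over the opening '[' */
  if (*p == '^')
    p++;
  if (*p == ']')
    p++;                        /* a leading ']' is a literal */
  while (*p != '\0' && *p != ']')
    {
      if (p[0] == '[' && p[1] == ':')
        {
          const char *end = strstr (p + 2, ":]");
          if (end != NULL)
            {
              p = end + 2;      /* step over a nested [:class:] */
              continue;
            }
        }
      p++;
    }
  return *p == ']' ? p + 1 : p;
}

/* Hypothetical helper: return 1 if PATTERN contains "[:name:]" used
   as a whole bracket expression instead of "[[:name:]]".  */
int
has_bare_class (const char *pattern)
{
  const char *p = pattern;

  while (*p != '\0')
    {
      if (p[0] == '\\' && p[1] != '\0')
        p += 2;                 /* escaped character: not a bracket */
      else if (p[0] == '[' && p[1] == ':')
        {
          const char *q = p + 2;

          while (isalpha ((unsigned char) *q))
            q++;
          if (q > p + 2 && q[0] == ':' && q[1] == ']')
            return 1;           /* "[:name:]" at top level */
          p = skip_bracket (p);
        }
      else if (p[0] == '[')
        p = skip_bracket (p);
      else
        p++;
    }
  return 0;
}
```

A program would most likely use such a result only to print a warning, since `[:space:]` is still a valid bracket expression that matches the characters `:`, `s`, `p`, `a`, `c` and `e`.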
> On 07/06/2010 06:00 AM, Aharon Robbins wrote:
> > Hi Guys.
> >
> > Sorry for the long delay in replyi
Hi. The following patch is needed if compiling without MBS support.
Thanks,
Arnold
--- /usr/local/src/Gnu/grep/src/dfa.c 2010-09-15 08:25:31.0 +0200
+++ dfa.c 2010-09-15 08:26:49.0 +0200
@@ -3122,8 +3150,6 @@
return s1;
}
-#endif /* MBS_SUPPORT */
-
/* Initialize
Sorry for chiming in on this rather late...
> Date: Fri, 24 Sep 2010 16:27:53 -0600
> From: Eric Blake
> To: Bruno Haible
> Cc: Paolo Bonzini , Paul Eggert ,
> bug-grep@gnu.org, Jim Meyering
> Subject: Re: character ranges in regular expressions
>
> On 09/24/2010 03:52 PM, Bruno Haible
Hi.
Upon comparing grep's dfa.c to the development gawk, I suggest the
following fixes.
1. Remove the #ifdef GAWK - upcoming gawk supports \s and \S
2. Move add_utf8_anychar into the MBS_SUPPORT #ifdef. This latter has
been reported on the list before - I thought it'd been checked in even.
arriving in my inbox out of order (not sure why that
is!). I will try to review and reply.
Thanks,
Arnold
> Date: Thu, 09 Jun 2011 10:14:01 -0700
> From: Paul Eggert
> To: Paolo Bonzini
> CC: Aharon Robbins , bug-grep ,
> bug-gnulib , k...@freefriends.org
> Subject
Hi All.
> Date: Thu, 09 Jun 2011 10:14:01 -0700
> From: Paul Eggert
> To: Paolo Bonzini
> CC: Aharon Robbins , bug-grep ,
> bug-gnulib , k...@freefriends.org
> Subject: Re: Dealing with character ranges in grep
>
> On 06/08/2011 10:14 PM, Aharon Robbins wrote:
>
Hi.
> From: Paolo Bonzini
> Date: Tue, 14 Jun 2011 13:11:32 +0200
> Subject: Re: Dealing with character ranges in grep
> To: Aharon Robbins
> Cc: egg...@cs.ucla.edu, k...@freefriends.org, bug-grep@gnu.org,
> bug-gnu...@gnu.org
>
> > ? In principle, I'm al
Hi All.
Can I get a clear "yes, grep and sed are going to change to Reasonable
Range Interpretation"?
I was looking into the code, in terms of not using RE_RANGES_IGNORE_LOCALES
but simply always doing it based on character set ordering.
Doing so lets us throw away hard_locale.[ch] also.
Befor
Hi.
> From: Jim Meyering
> To: Bruno Haible
> Cc: Paolo Bonzini , Aharon Robbins ,
> bug-gnu...@gnu.org, bug-grep , k...@freefriends.org
> Subject: Re: Dealing with character ranges in grep
> Date: Thu, 16 Jun 2011 07:58:05 +0200
>
> To make this proposed
Hi All.
> Date: Wed, 15 Jun 2011 14:09:45 -0600
> From: Eric Blake
> To: Paul Eggert
> CC: Aharon Robbins , bonz...@gnu.org, bug-grep@gnu.org,
> bug-gnu...@gnu.org, k...@freefriends.org
> Subject: Re: Dealing with character ranges in grep
>
> > Doesn'
> Date: Mon, 27 Jun 2011 15:10:43 +0200
> From: Paolo Bonzini
> To: Aharon Robbins
> CC: egg...@cs.ucla.edu, ebl...@redhat.com, bug-grep@gnu.org,
> bug-gnu...@gnu.org, k...@freefriends.org
> Subject: Re: Dealing with character ranges in grep
>
> On 06/16/2011
Hi Grep Guys.
A while back David Millis reported a rather strange problem with gawk 4.0.0
on Windows:
> Date: Sat, 10 Sep 2011 23:13:25 -0700 (PDT)
> From: David Millis
> To: bug-g...@gnu.org
> Subject: [bug-gawk] 4.0.0 Regex Patterns Choke on Exotic Chars
>
> # A bug in GNU AWK 4.0.0's regex ha
Hi. Just now catching up on email...
> Date: Fri, 16 Sep 2011 15:12:37 +0200
> From: Paolo Bonzini
> To: arn...@skeeve.com
> CC: bug-grep@gnu.org
> Subject: Re: [PATCH 1/5] maint: ensure that MB_CUR_MAX is defined even when
> !MBS_SUPPORT
>
> On 09/16/2011 03:03 PM, arn...@skeeve.com wrote:
> >
Thanks Jim & Eli.
Eli - please submit a patch for the windows bits in gawk.
Thanks!
Arnold
Hi.
> > Having variables grep_mb_cur_max and dfa_mb_cur_max (separate for the
> > reasons Arnold explained) would work, but it would make it impossible
> > for the compiler to throw away the multibyte code when MBS_SUPPORT is zero.
>
> Why?
>
> #if MBS_SUPPORT
> int grep_mb_cur_max = MB_CUR_MAX;
Hi.
1. The setbit_wc() function for the non-MBS_SUPPORT case has return type bool
but does not return a value.
2. A test for `defined MBS_SUPPORT' should just be `MBS_SUPPORT' since
MBS_SUPPORT is now always defined.
Diff below.
Thanks,
Arnold
--
--- /usr/local/src/Gnu/g
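Both points in the message above can be illustrated with a small sketch. It assumes the build always defines MBS_SUPPORT to 0 or 1 (the fallback `#define` exists only so the example compiles on its own). The function name `setbit_wc_stub`, its parameter types, and the choice of `false` as the return value are all hypothetical, not the code from the actual diff.

```c
#include <stdbool.h>

/* Assumption for this sketch: MBS_SUPPORT is always defined, as 0 or 1.
   The fallback below only makes the example self-contained.  */
#ifndef MBS_SUPPORT
# define MBS_SUPPORT 0
#endif

/* Because the macro is always defined, test its value.  A test of
   `defined MBS_SUPPORT' would be true even when the value is 0.  */
#if !MBS_SUPPORT
/* Hypothetical stub for the non-multibyte build.  A function declared
   to return bool must return a value on every path.  */
bool
setbit_wc_stub (int wc, unsigned char *bits)
{
  (void) wc;
  (void) bits;
  return false;
}
#endif
```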
Hi. Back in the spring we discussed providing Rational Range Interpretation
in GNU tools. I took the plunge with gawk 4.0.
I can provide diffs for grep without much trouble.
Is there still interest in this? If so, do I need to update the doc too?
Thanks,
Arnold
Hi. Here are the patches. I did my best to use git the way y'all want but
if not, please just fix it up...
First patch is for regcomp.c in gnulib. Paolo, you should be able
to use this in sed. Please do. :-)
Second is for dfa.c and grep.texi in grep.
Thanks,
Arnold
---
Hi Paul.
> Date: Tue, 29 Nov 2011 09:17:48 -0800
> From: Paul Eggert
> To: Aharon Robbins
> CC: bug-grep@gnu.org
> Subject: Re: any interest in Rational Range Interpretation?
>
> On 11/28/11 09:36, Aharon Robbins wrote:
> > Hi. Back in the spring we discuss
I decided to donate some time to the grep project by reading the manual
and seeing if I could help it out. Attached is a patch that makes some
things more consistent and makes the printed manual look nicer. I also
fixed a few mistakes here and there.
Enjoy,
Arnold
From c1c9db1b37427924defa
Hi Paolo.
> Date: Sat, 10 Dec 2011 18:10:13 +0100
> From: Paolo Bonzini
> To: Aharon Robbins
> CC: bug-grep@gnu.org
> Subject: Re: Rational Range Interpretation patches
>
> On 12/01/2011 09:21 PM, Aharon Robbins wrote:
> > diff --git a/src/dfa.c b/src/dfa.c
> >
Hi.
> From: Jim Meyering
> To: arn...@skeeve.com
> Cc: bug-grep@gnu.org
> Subject: Re: STREQ in dfa.c
> Date: Sat, 31 Dec 2011 18:48:15 +0100
>
> arn...@skeeve.com wrote:
>
> > There are not a lot of uses of STREQ in dfa.c. I'd be happy if you
> > expanded the macro there.
>
> Sorry, but that wou
Hi. This came up in the context of gawk on DJGPP, which doesn't have
MBS_SUPPORT or btowc. It was failing several tests. This looks to me
to be the right change.
Thoughts?
Thanks,
Arnold
-
diff --git a/dfa.c b/dfa.c
index 172ff79..b16bf06 100644
--- a/dfa.c
+++ b/dfa.c
@@
Thanks Paul! I will merge into the gawk sources.
Arnold
Hi. This is a minor documentation change in dfa.c.
I hope next week to make another attempt at RRI patches.
Thanks,
Arnold
-
diff --git a/src/dfa.c b/src/dfa.c
index 6ab0ab4..1f79fc0 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -46,7 +46,7 @@
#include "gettext.h"
#def
Hello All.
Here is my 2nd try at RRI. The 3 patches are for dfa.c, grep.texi,
and gnulib/reg*.c.
Paolo - I still think that when compiling the dfa my change is correct,
since the regex routines were called only for checking if the range is
valid.
Thanks,
Arnold
---
From 5d4a1e56
From 8b8d37d4e4bfae93920d17f7f1c81b8db26fcefd Mon Sep 17 00:00:00 2001
From: Arnold D. Robbins
Date: Mon, 16 Jan 2012 22:17:58 +0200
Subject: [PATCH] Implement Rational Range Interpretation.
* regcomp.c (build_range_exp): Compare the wide characters
directly instead of using wcscoll.
* regexec
From 366cc2f4170f8dfbaa2137602e4ccc35e854766a Mon Sep 17 00:00:00 2001
From: Arnold D. Robbins
Date: Mon, 16 Jan 2012 22:07:40 +0200
Subject: [PATCH 2/2] Document Rational Range Interpretation.
---
doc/grep.texi | 21 -
1 files changed, 16 insertions(+), 5 deletions(-)
di
Hi All.
See below for results of running make check on a PPC Mac
$ uname -a
Darwin arnold-robbinss-powerbook-g4-12.local 9.8.0 Darwin Kernel Version 9.8.0:
Wed Jul 15 16:57:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_PPC Power Macintosh
Hi. I just applied the below to gawk's dfa.c.
Thanks,
Arnold
---
diff --git a/dfa.c b/dfa.c
index 64ce8f7..2bce294 100644
--- a/dfa.c
+++ b/dfa.c
@@ -876,7 +876,7 @@ static token
parse_bracket_exp (void)
{
int invert;
- int c, c1, c2;
+ int c = 0, c1 = 0, c2 = 0;
Hi Paul.
> Date: Wed, 15 Feb 2012 15:57:09 -0800
> From: Paul Eggert
> To: Aharon Robbins
> CC: bug-grep@gnu.org
> Subject: Re: avoid gcc 4.6.2 'may be used before set' warnings in dfa.c
>
> On 02/15/2012 10:54 AM, Aharon Robbins wrote:
> > - int c, c1,
Hi.
> That's the problem.
> If this definition from dfa.h can be improved, please report
> the details of the offending compiler:
>
> #if __GNUC__ < 2 || (__GNUC__ == 2 && __GNUC_MINOR__ < 6) || __STRICT_ANSI__
> # define __attribute__(x)
> #endif
I have sent out a query to my testers list
Hi Jim.
> From: Jim Meyering
> To: GNU
> Cc: Arnold Robbins
> Subject: reformatting dfa.c, now is the time
> Date: Thu, 01 Mar 2012 17:15:51 +0100
>
> Some recent review challenges have highlighted the need
> for a consistent formatting style in dfa.c, and since the changes
> Paul pushed today
Hi Jim, Paul, Paolo, anyone else I missed,
I sent in paperwork for copyright assignments in grep and gnulib to the
FSF; they were mailed last Friday so they should be there by now.
Can we move ahead getting the Rational Range Interpretation patches
integrated?
I expect that they would drop in ea
Hi Paolo.
> Date: Fri, 27 Apr 2012 10:39:21 +0200
> From: Paolo Bonzini
> To: Aharon Robbins
> CC: bug-grep@gnu.org, k...@gnu.org, bug-gnulib
> Subject: Re: RRI - copyright assignment mailed in
>
> Il 26/04/2012 22:40, Aharon Robbins ha scritto:
> > Hi Jim, Paul,
Here are the updated RRI patches for grep. First one is for dfa.c and
doc/grep.texi. NOT handled is removal of hard-locale.[ch] from lib/ and
from the make infrastructure.
The second patch is for gnulib. Both are relative to master in both
git repos as of less than an hour ago.
Thanks,
Arnold
Hi Paolo.
> Il 27/04/2012 10:47, Aharon Robbins ha scritto:
> > What I sent to the list a while back should do. If not, I will recreate
> > the patches but I can't promise they'll 100% follow Jim's standards.
>
> Yes, please recreate them.
Done. Sent.
And
Hi Guys,
I just got the below bug report. Apparently it has to do with the
fact that xalloc.h includes .
Any objection to changing the order of includes as suggested? Or
how about moving the include of into dfa.c itself?
Thoughts welcome,
Thanks,
Arnold
> Date: Tue, 15 Jan 2013 18:37:51 +01
Hello All.
On systems where limits.h defines RE_DUP_MAX to be very small (such as
OpenVMS, which defines it to be -1) compiling a regexp can fail.
Although undefs and redefines RE_DUP_MAX, it's included too
early in the process. I am pushing the following change to gawk's copy
of dfa.c. I submit
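The ordering problem described above can be sketched like this: pull in limits.h first, then override RE_DUP_MAX, so that no later system header can reintroduce a small value. The value 0x7fff and the helper `interval_ok` are assumptions made for the illustration, not the text of the actual change.

```c
#include <limits.h>     /* may define RE_DUP_MAX, possibly very small */

/* Override whatever the system headers supplied.  0x7fff is an assumed
   value for this sketch; POSIX only guarantees a minimum of 255.  */
#undef RE_DUP_MAX
#define RE_DUP_MAX 0x7fff

/* Hypothetical helper: is COUNT usable as an interval bound such as
   the 3 in "x{3}"?  */
int
interval_ok (long count)
{
  return 0 <= count && count <= RE_DUP_MAX;
}
```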
Hi Eric.
> >> Rather, doesn't that mean that should #include
> >> prior to redefining RE_DUP_MAX, to be sure that it overrides any limits
> >> that would otherwise be present in the system headers?
> >
> > That would be nice, but hell will likely freeze over before glibc
> > updates it for us.
The last I'll say on this...
> Date: Thu, 31 Jan 2013 12:31:52 -0800
> From: Paul Eggert
> To: Aharon Robbins
> CC: ebl...@redhat.com, bug-grep@gnu.org
> Subject: Re: dfa.c order of include problem
>
> On 01/31/13 12:24, Aharon Robbins wrote:
> >> Glibc do
Hello.
This is based on a bug report submitted for gawk, but I have reproduced
it with the dfa.c in grep 2.14.
Attached are the following files:
testdfa.c: A harness that pulls in all the gawk stuff needed
to call dfa the same way gawk does.
dfa_test_re1: A regexp that cau
Hi.
> This is based on a bug report submitted for gawk, but I have reproduced
> it with the dfa.c in grep 2.14.
>
> Attached are the following files:
>
> testdfa.c:A harness that pulls in all the gawk stuff needed
> to call dfa the same way gawk does.
>
Has anyone had a cha
Hi All.
A few weeks ago I reported a bug in dfa that was causing an assertion
failure in gawk. It occurred in UTF locales. Attached are two test
programs, one that fails, and one that doesn't, and input. The failure
is seen in gawk; I'm not sure how to reproduce it in grep.
Mike Haertel was kind e
Is this file: gl/lib/regex_internal.h.diff supposed to be in the repo?
Thanks,
Arnold
The following fix to dfa.c was suggested by a static checking tool.
I'm applying it in the gawk code base.
Basically, it's theoretically possible for len to have run off the end
of the `str' array.
Thanks,
Arnold
diff --git a/dfa.c b/dfa.c
index 8b79eb7..490a075 100644
--- a/dfa.c
+++ b/dfa.c
@
# optional
printf '' | ./gawk '/.../' # your tests here. :-)
Much thanks!
> From: Jim Meyering
> Date: Mon, 23 Sep 2013 14:04:09 -0700
> Subject: Re: bug#15440: [PATCH] dfa: fix \s and \S to work for multibyte
> To: Aharon Robbins , 15...@debbugs.gnu.o
Hello All.
> >>> After updating from 2.14 to 2.15 grep has started to fail to match
> >>> patterns
> >>> that contain '\s*' or '\s\+'
>
> And here's a proper patch, including NEWS and test suite additions:
FWIW, I can't reproduce this in gawk (gawk-4.1-stable branch).
The program below correctl
Hi.
> > The program below correctly produces no output, with and without the fix
> > in dfa.c:lex. (I have added the fix anyway.)
Also with LC_ALL=en_US.utf8, without the fix the program still passes.
So, any ideas?
Thanks,
Arnold
Hi.
> > Hi.
> >
> >> > The program below correctly produces no output, with and without the fix
> >> > in dfa.c:lex. (I have added the fix anyway.)
> >
> > Also with LC_ALL=en_US.utf8, without the fix the program still passes.
> >
> > So, any ideas?
>
> Hi Arnold,
> I don't recall how gawk uses df
> > > Hi.
> > >
> > >> > The program below correctly produces no output, with and without the
> > >> > fix
> > >> > in dfa.c:lex. (I have added the fix anyway.)
> > >
> > > Also with LC_ALL=en_US.utf8, without the fix the program still passes.
> > >
> > > So, any ideas?
> >
> > Hi Arnold,
> > I do
Hello All.
I believe that the code in dfa.c that deals with character ranges
is incorrect with respect to Rational Range Interpretation.
This shows up in the following test case:
$ echo \\ | src/grep -Xawk '[\[-\]]'
$
Whereas with gawk:
$ echo \\ | gawk '/[\[-\]]/'
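Under Rational Range Interpretation, range membership is decided by comparing character codes and not by locale collation. A minimal sketch for single-byte characters follows; the function name `in_rri_range` is hypothetical. In an ASCII-based character set `[` is 0x5B, `\` is 0x5C and `]` is 0x5D, so the backslash falls inside the range in the test case above.

```c
/* Rational Range Interpretation for one single-byte character: C is in
   the range FROM-TO exactly when its numeric value lies between the
   two endpoints.  No collation order is consulted.  */
int
in_rri_range (unsigned char c, unsigned char from, unsigned char to)
{
  return from <= c && c <= to;
}
```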
Hi Paul.
> Thanks for continuing to bird-dog this.
It's either "tenacity" or "stubbornness". :-)
> > I do think that gawk's code is the correct thing to be doing for RRI.
>
> I agree, and installed the second patch enclosed below to
> implement this.
Cool! Hurray! One more bit that comes into
Hi Paul.
> > What happens if you compile them in and run the grep test suite?
>
> The test suite passes, but grep is bigger and (I presume) slower. The
> GREP-related changes are for performance, and shouldn't affect behavior.
>
> How about if we apply the attached patch to dfa.c, in both gawk a
Hi Paul & Jim,
> > What happens if you compile them in and run the grep test suite?
>
> The test suite passes, but grep is bigger and (I presume) slower. The
> GREP-related changes are for performance, and shouldn't affect behavior.
>
> How about if we apply the attached patch to dfa.c, in both
Hi.
> Date: Sat, 25 Jan 2014 10:56:29 -0800
> From: Paul Eggert
> To: Aharon Robbins , 16...@debbugs.gnu.org
> Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
>
> Aharon Robbins wrote:
> > I don't think it's right to remove this code, but I don&
Hi.
The code in atom() looks to me like it could use a little refactoring
and simplification. I suggest the diff below. With it both grep and gawk
still pass their tests.
Thanks,
Arnold
diff --git a/src/dfa.c b/src/dfa.c
index b79c604..d2916ee 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -1725,17
Hi Paul.
I skimmed the patch.
All that exclusive-ORing looks a little scary to me. Will that work,
for example, on EBCDIC systems? Gawk supports z/OS - a POSIX environment
on top of OS/390. Will it work on systems using some of the older
Far Eastern, non-Unicode locales?
What is it even doing?
> > I suggest starting with the XOR changes for unibyte locales - they seem
> > (to me) to be good no matter what. And then separately try to deal with
> > the multibyte case.
>
> Unfortunately the changes don't work even for unibyte locales, since
> unibyte locales can have the same problem, i.e.
Hi.
I'm just wondering - does the regex code have the same issue with
title case characters? This is an issue for gawk. I will try to run
your test on gawk, but if you have time to check you can do so by
setting GAWK_NO_DFA in the environment and then gawk will bypass the
dfa matcher.
Thanks!
Hi Paul.
> > As a point of information, it does happen for gawk.
>
> Could you please say where that happens? I just now looked at the gawk
> trunk, and the only two places I saw it calling dfaexec
> (helpers/testdfa.c and re.c), it passed a nonnull backref argument.
OK, I was wrong and spoke
Hi Paul.
> Subject: bug#16895: [PATCH] grep: fix multiple bugs with bracket expressions
> To: 16...@debbugs.gnu.org
> Date: Thu, 27 Feb 2014 09:34:33 -0800
> From: Paul Eggert
>
> I'm afraid there are several problems in the dfa code. I still don't
> have a handle on all of them, but here's my
Hi Paul.
> Date: Thu, 27 Feb 2014 13:24:53 -0800
> From: Paul Eggert
> Organization: UCLA Computer Science Department
> To: Aharon Robbins , 16...@debbugs.gnu.org
> Subject: Re: bug#16895: [PATCH] grep: fix multiple bugs with bracket
> expressions
OK - I tried out tha
Hi.
This turned up in testing on DJGPP. Thanks,
Arnold
--
diff --git a/dfa.c b/dfa.c
index 8771bbe..813c239 100644
--- a/dfa.c
+++ b/dfa.c
@@ -820,9 +820,13 @@ using_simple_locale (void)
static int unibyte_c = -1;
if (unibyte_c < 0)
{
+#ifdef LC_ALL
FYI
Arnold
---
> Date: Tue, 18 Mar 2014 13:44:57 -0600 (MDT)
> From: "Nelson H. F. Beebe"
> To: "Arnold Robbins"
> Cc: be...@math.utah.edu
> Subject: gawk-4.1.0f: a patch for a failed build
>
> On SGI IRIX MIPS, gawk-4.1.0a had built and installed without problems
> on 13-Dec-201