> echo Y | LC_ALL=en_US.UTF-8 ./grep -i '[y]'
I think gawk's dfa fixes this. It rings a vague bell.
OK, I'll try to start using git - I tend to send in patches based
on my copy.
I'll sync to your version for this change.
Arnold
Reported by Corinna Vinschen, vinsc...@redhat.com
Thanks.
compilers (I have users
with such compilers), and the declaration after executable code won't fly
in that case.
This also makes only one call to wctob() instead of two. :-)
Thanks,
Arnold
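For reference, a C89 compiler rejects a declaration that follows a statement in the same block. A minimal sketch of the style being requested, assuming a hypothetical helper `byte_for_wchar` (not taken from dfa.c): every declaration sits at the top of the block, and `wctob()` is called once with its result reused, rather than called twice.

```c
#include <stdio.h>   /* EOF */
#include <wchar.h>   /* wctob, wint_t, WEOF */

/* Hypothetical helper: return the single-byte form of WC, or -1 if WC has
   no single-byte representation in the current locale.  */
static int
byte_for_wchar (wint_t wc)
{
  int b;              /* C89: declarations precede the first statement */

  b = wctob (wc);     /* one call; the result is reused below */
  if (b == EOF)
    return -1;
  return b;
}

/* The two-call form being avoided looks like:
     if (wctob (wc) != EOF)
       c = wctob (wc);  */
```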
Hi Jim.
> You appear to have misread the patch, since that is precisely
> what resulted when I made that change.
You're right. Not enough sleep, or else I need new glasses. Or something.
Sorry for the noise.
Thanks,
Arnold
Hi Jim.
I just now saw this series of patches... Did not get to look at them
yet.
Do you by chance have a few minutes to see if they apply to the dfa.c
in gawk and if so if they break anything?
If not, I'll do the work when I get to it.
Thanks,
Arnold
ticable amount of time calling
it a lot. So I set up a global variable gawk_mb_cur_max and initialize
it in main(), since the result should never change during a single run of
the program. It made a difference.
Hope this helps.
Thanks,
Arnold
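A minimal sketch of the caching idea above, assuming the value being cached is `MB_CUR_MAX` (the name `gawk_mb_cur_max` suggests so; on glibc the macro expands to a function call). The function name `init_mb_cur_max` is hypothetical, and the sketch assumes the locale is set once at startup and never changed, since otherwise the cache goes stale.

```c
#include <locale.h>  /* setlocale, used by the caller in main() */
#include <stdlib.h>  /* MB_CUR_MAX */

/* Cached copy of MB_CUR_MAX for the whole run.  */
int gawk_mb_cur_max;

/* Hypothetical initializer: call once from main(), after setlocale().  */
static void
init_mb_cur_max (void)
{
  gawk_mb_cur_max = (int) MB_CUR_MAX;
}
```

In `main()` this would follow `setlocale (LC_ALL, "");`, and hot paths would then test `gawk_mb_cur_max > 1` instead of evaluating `MB_CUR_MAX` on every call.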
g the difference in interpretation?
3. If it is a dfa vs. regex issue then someone should decide how to
bring the two matchers back into consistency with each other.
Thanks,
Arnold
Thanks Jim.
Eli - let us know what happens, please.
Thanks,
Arnold
I would also like to see grep not use decl after statement. I have at least
one platform and I think two where it's not supported, and I have to
keep forward porting the changes in dfa.c.
Thanks,
Arnold
e at least
> > one platform and I think two where it's not supported, and I have to
> > keep forward porting the changes in dfa.c.
>
> Please describe the systems for which you find this necessary.
z/OS - which is a POSIX-ish environment on top of OS/390, and VMS.
Thanks,
Arnold
porter for gawk to z/OS and he
doesn't use GCC, on purpose. I will let you know what I hear back.
I will also double check as to the status of the compiler on Alpha
and Itanium VMS systems.
Thanks,
Arnold
re a reason you don't
> > use it for building gawk? I suggested that the grep team stop using
> > declarations after statements because of z/OS, and got pointed back to
> > your page on z/OS GCC and the mvsgcc project.
> >
> > Thanks,
> >
> > Arnold
> but in dfa.c, the impact is minimal, so I've made the changes there.
T H A N K Y O U !
I really appreciate this.
Arnold
Forgot: You can git rm hard-locale.h and DTRT in the makefiles...
Thanks,
Arnold
with a
test and appropriate doc. We're trying to change the way the world
relates to ranges to be rational. This won't necessarily be easy. :-)
Thanks,
Arnold
Hi all. Using grep 2.10 on
$ uname -a
Darwin arnold-robbinss-powerbook-g4-12.local 9.8.0 Darwin Kernel Version 9.8.0:
Wed Jul 15 16:57:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_PPC Power Macintosh
I got an unexpected pass in the test suite, word-delim-multibyte. Let me know
if you want more
1 / Vax
days. I should probably get rid of it.
Thanks,
Arnold
s that the quantity has been
decreasing over time.
Once I get back to RRI for grep, the amount will go down even more. :-)
Thanks,
Arnold
here any reason, in this day and age, to still be using defines
instead of enums for something like this?
Thanks,
Arnold
eal reason, except going along with the rest of the dfa code.
Sounds like a good excuse to upgrade the rest of dfa.c :-)
> Note that you wouldn't always get meaningful names when debugging,
> because more than one bit can be set.
Yes, I know. But it's still an improvement.
Just a suggestion.
Thanks,
Arnold
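To make the suggestion concrete, here is a sketch with hypothetical flag names (not the ones in dfa.c). With `#define`, a debugger sees only raw numbers; with an enum, single-bit values can be shown by name, while a value with several bits set may still appear as a plain number, which is the caveat quoted above.

```c
/* Hypothetical flag names, for illustration only.  */

/* Old style: only numeric values are visible when debugging.  */
#define OLD_FLAG_LETTER  0x1
#define OLD_FLAG_NEWLINE 0x2

/* Enum style: same values, but with names the debugger knows about.  */
enum char_flags
{
  FLAG_NONE    = 0,
  FLAG_LETTER  = 1 << 0,
  FLAG_NEWLINE = 1 << 1,
  FLAG_ANY     = FLAG_LETTER | FLAG_NEWLINE
};

/* Nonzero if any bit of FLAG is set in SET.  */
static int
has_flag (enum char_flags set, enum char_flags flag)
{
  return (set & flag) != 0;
}
```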
eeded? None of the other bits of
> inserted code use bool.
Lots of the existing code does use it. :-) The hard_locale.h file
(which is now no longer needed) did the include of <stdbool.h>.
With that file gone, stdbool.h has to be included directly.
After all, I assume that you still want the code to compile... :-)
Thanks,
Arnold
; after the change. I assume LC_COLLATE still affects equivalence
> classes, such as those in '[[=a=]b]', and that it still
> affects collating symbols, such as the 'ch' in '[[.ch.]]'
The truth is, I have no idea what it's used for, so I didn't
say anything. Can someone state definitively?
Thanks,
Arnold
nontrivial equivalence classes and collating symbols, to check.
> If it works, then we can document it; if not, then we can document
> *that*.
I suspect it will only work on GLIBC and maybe Solaris systems, and
that too should be documented.
Thanks,
Arnold
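One way to run the check suggested above is a small program around POSIX `regcomp()`/`regexec()`. The helper `posix_match` below is hypothetical. In the "C" locale `[[=a=]]` is just `a`, so the interesting run is after `setlocale (LC_ALL, "")` under a locale with nontrivial equivalence classes; which locales qualify is system-dependent, so no result is claimed for them here.

```c
#include <locale.h>  /* setlocale, for the locale-specific runs */
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if STR matches the POSIX ERE PAT, 0 if not,
   -1 if PAT fails to compile (e.g. an unknown collating symbol).  */
static int
posix_match (const char *pat, const char *str)
{
  regex_t re;
  int rc;

  if (regcomp (&re, pat, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, str, 0, NULL, 0) == 0;
  regfree (&re);
  return rc;
}
```

A collating-symbol pattern such as `"[[.ch.]]"` can be tried the same way; a -1 result means the current locale's regex implementation rejected it.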
The definition of is_valid_unibyte_character needs to match what's
currently in HEAD...
Thanks,
Arnold
I still have to do paperwork for the RRI changes. I will get to that,
I hope, within the next week and will let you know when the forms are
in the mail.
Thanks!
Arnold
This is what I wanted, so I tried rewriting the spec as:
>
> grep --color '^[[:alpha:]]+' xxx
> lists nothing.
For grep, you need '^[[:alpha:]]\+'.
HTH,
Arnold
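The difference comes from basic (BRE, plain grep) versus extended (ERE, `grep -E`) syntax. In a POSIX BRE, `+` is an ordinary character; `\+` as "one or more" is a GNU extension. A sketch with POSIX `regcomp()`, using a hypothetical helper `re_matches`:

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if STR matches PAT, 0 if not, -1 on a compile
   error.  EXTENDED selects ERE syntax (like grep -E); otherwise BRE.  */
static int
re_matches (const char *pat, const char *str, int extended)
{
  regex_t re;
  int rc;

  if (regcomp (&re, pat, (extended ? REG_EXTENDED : 0) | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, str, 0, NULL, 0) == 0;
  regfree (&re);
  return rc;
}
```

As a BRE, `^[[:alpha:]]+` matches a letter followed by a literal `+`, so it matches `a+` but not `abc`; as an ERE it matches `abc`.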
> Now I'm going into the docs to find where it says I need a backslash for
> the + but nothing for the *.
It has to do with history and different flavors of regular expressions
in different tools. I do believe it's documented in grep.texi.
> Thank you for the insight.
Glad to help.
Arnold
It looks like I can just use the code as it is now in grep. I have asked
for compile failures and haven't gotten any.
That will be the simplest thing for me to do, since I don't then need
to add another file to the distribution.
Thank you for the consideration, though!
Arnold
Hi Paul.
I prefer the _Noreturn solution and updating configure.ac vs adding
yet another file to the dist just to support dfa.
Thanks!
Arnold
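A `_Noreturn` approach usually needs a fallback for pre-C11 compilers, chosen either by `configure` or by a preprocessor test. A minimal sketch of the preprocessor variant; `MY_NORETURN` and `fatal` are hypothetical names, not what dfa.c or gawk's configure.ac actually use.

```c
#include <stdio.h>
#include <stdlib.h>

/* C11 _Noreturn where available, the GCC attribute on older
   GCC-compatible compilers, and nothing otherwise.  */
#if defined __STDC_VERSION__ && 201112L <= __STDC_VERSION__
# define MY_NORETURN _Noreturn
#elif defined __GNUC__
# define MY_NORETURN __attribute__ ((__noreturn__))
#else
# define MY_NORETURN
#endif

/* Hypothetical fatal-error function: prints MSG and never returns.  */
MY_NORETURN static void
fatal (const char *msg)
{
  fprintf (stderr, "fatal: %s\n", msg);
  exit (EXIT_FAILURE);
}
```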
g else will happen later.
Thanks,
Arnold
Paolo,
I'm sorry. I give up.
Please implement RRI to suit the way you want it done. Feel free to use
whatever from my code / doc as you like, or not.
I am happy with what I have now in gawk and will eventually pick up
grep's changes from dfa if they suit me.
Thanks,
Arnold
running with it.
I will review and integrate.
I am *thrilled* that RRI has finally made its way into grep.
(As an aside - I'm guessing that sed will pick up RRI with the changes
to gnulib?)
Next step: bash. :-)
Thanks,
Arnold
Hi Paolo.
It looks like in this patch you have declarations after executable code.
I thought Jim agreed earlier that for dfa.c it'd be OK to be C89 compatible
and have all declarations before code; I need this for at least one
platform I support.
Thanks,
Arnold
n practice, not least because it fixes
> the denial-of-service problem. POSIX allows this.
Please remember that for gawk dfa.c needs to continue to be able
to match '\0'.
(There's an issue related to that which I need to discuss with Paolo,
off list. Will try to get to it soon. :-)
Thanks,
Arnold
speedup.
More recent versions of grep have had a lot of work done in this area.
You may wish to try downloading and building the most recent release
and then see if it runs faster.
HTH,
Arnold
Paulo Nogueira wrote:
> Hello,
>
> this is not about a bug in the usual sense, rather
nst multiple includes. On GLIBC systems you can
make that assumption. I'm not willing to do so for the broader range
of systems out there.
I personally prefer to include all system headers first, which allows me,
in the files that I control, to then compensate for system idiosyncrasies.
To each his own.
Thanks,
Arnold
I suspect that the Solaris sh doesn't understand $(...), but
rather only `...`
Arnold
the routines ..._other_case instead of just _other?
Thanks,
Arnold
d then separately try to deal with
the multibyte case.
And just to increase the need for Aspirin, any idea how regex handles
this case? I would not be surprised if the code there also doesn't
catch this. Whew! :-)
Arnold
ase folding could probably
still use a little refactoring. I can try to give that a whirl if no one
beats me to it.
Thanks,
Arnold
BC bits out into
the standalone regex somehow.
In response to another question: Making GLIBC's regex support RRI isn't
hard - getting the GLIBC maintainers to accept the patch, is. :-(
My two cents: Jim & Paul will have to decide.
Thanks,
Arnold
heir loss. To date I know of no distro that does this. And the world is
bigger than just GNU systems.
We've gone around on this before and we continue to disagree. I have
nothing else to add to this discussion.
Arnold
Hi Jim, Paul.
Here is the small refactoring I suggest for dfa.c
Thanks,
Arnold
diff --git a/dfa.c b/dfa.c
index d5e7fdf..dcd28e5 100644
--- a/dfa.c
+++ b/dfa.c
@@ -1767,18 +1767,19 @@ add_utf8_anychar (void)
static void
atom (void)
{
- if (0)
+ if (MBS_SUPPORT &&
grep team must decide for itself how important it is to be the source
for code that works correctly in other projects as well. The existence
of the option to supply brackref = NULL indicates that being able to
work correctly in both cases is supposed to be possible (at least
conceptually).
Thanks,
Arnold
gnore to precede the two memchr tests.
Hi Jim.
Why copy the using_utf8() routine out of dfa.c? Why not just link
to it instead? If it's static, make it extern... That way if the
logic ever changes then it only has to be changed in one place.
Just a thought. :-)
Arnold
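A sketch of the single-definition approach: `using_utf8()` is defined once and declared `extern` in a shared header, so other files link to it instead of carrying a copy. The body shown (probing whether the bytes C4 80 decode to U+0100) is an assumed way to write the test, not necessarily the dfa.c code.

```c
#include <string.h>  /* memset */
#include <wchar.h>   /* mbrtowc, mbstate_t */

/* In a shared header (hypothetical):  extern int using_utf8 (void);  */

/* Nonzero if the current locale's multibyte encoding is UTF-8.  The result
   is computed on the first call and cached, so call it only after
   setlocale(), and don't rely on it if the locale changes later.  */
int
using_utf8 (void)
{
  static int utf8 = -1;   /* -1 means "not yet computed" */

  if (utf8 < 0)
    {
      wchar_t wc;
      mbstate_t mbs;

      memset (&mbs, 0, sizeof mbs);
      /* U+0100 is encoded in UTF-8 as the two bytes C4 80.  */
      utf8 = mbrtowc (&wc, "\xc4\x80", 2, &mbs) == 2 && wc == 0x100;
    }
  return utf8;
}
```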
>
Consider MinGW and/or VMS and/or z/OS (OS/390). In some cases, "a Unix-like
shell" isn't an option.
Thanks,
Arnold
e defaults
from configure, and not for portability.)
I understand that grep doesn't need it, but dfa serves more than
one master. I appreciate the grep team's accommodating gawk's needs.
Thanks,
Arnold
--
Paolo Bonzini wrote:
> This partia
grep and gawk and could also include the #define for
> GREP and GAWK symbols.
I could live with that. It's a good idea.
Thanks,
Arnold
gt; not shared between grep and gawk and could also include the #define for
> > GREP and GAWK symbols.
>
> I could live with that. It's a good idea.
I should point out that defining GAWK in dfacompat.h is not the right
thing - I define it globally since there are a few gawk-specific bits in
the regex code too.
Thanks,
Arnold
dfa portability doesn't really belong there.
I respectfully suggest dfaconfig.h (or dfacustom.h) with
#ifdef GAWK
#ifndef HAVE_SETLOCALE
#define setlocale(x, y) NULL
#endif
#define static_assert(cond) ...
#endif
as the initial contents.
Much thanks,
Arnold
onfig.h and that a separate file is needed.
Thanks,
Arnold
UPPORT. Opaque to dfa.c.
* xalloc.h - provide declarations (and maybe definitions) for **alloc,
opaque to dfa.c
* dfaconfig.h - provide other definitions needed by dfa.c which may not be
in standard headers for when gnulib isn't being used
Thanks,
Arnold
s.gnu.org.
Moving dfa.h back to where it was is likely to break OpenVMS. See
this in the gawk ChangeLog:
2013-01-31 Arnold D. Robbins
* dfa.c: Include "dfa.h" which includes regex.h after limits.h
so that RE_DUP_MAX gets the correct value. Especially ne
oy your vacation and complete the fixes before the
release?
Thanks,
Arnold
f the
header?
Thanks,
Arnold
Please reconsider the trivial revert and restoration of the
separate binaries.
Thanks,
Arnold
id
the headaches all around?
Then do another clean-up release after that. The urgency to release
RIGHT NOW seems to be self-induced, at least to me, looking in from the
outside.
'nuff said.
Arnold
..
>
> Do we cater to such systems? Without execlp/execvp, many
> of the programs in coreutils will fail to build, including these:
No one is talking about coreutils. We're talking about grep/fgrep/egrep.
In any case, it looks like the decision's been made.
Thanks,
Arnold
Is strstr() even a good idea? dfa needs to be able to match NUL
bytes in the data. If this prevents that, then it's a problem.
I'm merely asking - I didn't look hard at where the change was made.
If matching NUL bytes isn't affected, then no problem.
Thanks,
Arnold
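The underlying concern: `strstr()` treats its arguments as NUL-terminated strings, so it never looks past an embedded NUL byte. A length-based search avoids that. This sketch uses a hypothetical name, `find_bytes`, and only `memchr()` and `memcmp()` (glibc's `memmem()` does the same job but is not in ISO C).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical NUL-safe search: return a pointer to the first occurrence
   of NEEDLE[0..NLEN) in HAY[0..HAYLEN), or NULL.  NUL bytes are ordinary
   data here, unlike with strstr().  */
static const char *
find_bytes (const char *hay, size_t haylen, const char *needle, size_t nlen)
{
  const char *p = hay;
  const char *end = hay + haylen;

  if (nlen == 0)
    return hay;
  while (nlen <= (size_t) (end - p))
    {
      /* Only positions with room for the whole needle can match.  */
      p = memchr (p, needle[0], (size_t) (end - p) - nlen + 1);
      if (p == NULL)
        return NULL;
      if (memcmp (p, needle, nlen) == 0)
        return p;
      p++;
    }
  return NULL;
}
```

For a buffer holding `a b \0 c d`, `strstr (buf, "cd")` returns NULL because it stops at the NUL, while `find_bytes (buf, 5, "cd", 2)` returns `buf + 3`.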
Norihiro Tanaka wrote:
> If we don't use KWset, struct dfamust doesn't have to build. This patch
> make a change that it's built on demand.
Gawk doesn't use KWset - does this patch affect gawk?
Thanks,
Arnold
tax bits.)
HTH,
Arnold
Nathan Weeks wrote:
> GNU grep 2.20 disallows the use of an unmatched right parenthesis in
> an extended regular expression:
>
>
> $ echo ')' | grep -E ')'
> grep: Unmatched ) or \)
> ~
.
I would think adding a check for '\r' would be safe and would help
too; given that on Windows systems '\r' generally occurs just as
frequently as '\n', it should give a nice speedup for gawk on those
systems.
The other characters that Erik cited seem less like a big issue to me.
Thanks,
Arnold
Adding a check for \r isn't a big deal in any case, but of the 5
characters Erik mentioned originally, that is the only one where I
see a potential for a check to really make a difference.
Thanks!
Arnold
Norihiro Tanaka wrote:
> Thanks, but it seem that it is also unportable. On Solaris 10 and AIX 7,
> below. Need Gawk for tests?
>
> $ awk 'BEGIN { printf "\x41" }'
> \x41
If you use octal it should work with any awk.
Arnold
regex and, I believe, dfa routines don't accept this.
Fixing either of them is beyond my skill range, so I thought I'd
pass this one upstream to you folks.
Thanks!
Arnold
principle.
> To permit that change, I'll move the inclusion of xalloc.h --
> the header that defines it -- from dfa.c to dfa.h.
>
>
Please don't. I'd prefer not to have to deal with having that symbol
visible.
Thanks,
Arnold
Jim Meyering wrote:
> and moved
> some declarations "down".
Can we keep declarations C89 style please? Gawk still supports
some environments that don't allow declarations in the middle
of executable code.
Thanks,
Arnold
Hi.
This looks like a nice patch that gawk would benefit from.
I have a minor suggestion, which is to make dfabackref into a
static function.
Thanks,
Arnold
--
Norihiro Tanaka wrote:
> On Sat, 18 Jul 2015 22:15:33 -0700
> Jim Meyering wrote:
>
> > Hello
)(.*)(.*)\3\2\1'
grep: regexec.c:1413: pop_fail_stack: Assertion `((Idx) (num) < ((Idx) -2))'
failed.
Aborted (core dumped)
I looked at it in a debugger: fs->num appears to be -1 just before
the --fs->num executes.
Thanks,
Arnold
nt
the headache of maintaining those changes.
So - Caveat Emptor; you may be twisting your code base for the benefit
of just a single system that's WAAAY out in left field.
My two cents worth.
Thanks,
Arnold
ng this patch I looked at Gawk and noticed that it
> > already has its own equivalent of this patch's new mbrtowc_cache variable.
> > Gawk obtains its cache via btowc; although this doesn't work on MirOS BSD
> > due to its buggy btowc, Arnold says he's not worried about
ded it.
And I know that, practically speaking, this is unlikely; I'm coming at it
more from a perspective of bullet-proofing the code.)
My two cents worth, of course.
Thanks,
Arnold
vior if output from multiple
files comes out interleaved, instead of in the order the files were
specified on the command line.
My two cents, of course.
Thanks,
Arnold
The silence in response to this has been thundering. :-(
Ignoring the gawk bits, is the grep team willing to incorporate the
dfa.[ch] changes?
Should this wait until after other pending changes to dfa are applied?
Thanks,
Arnold
Aharon Robbins wrote:
> Hi.
>
> Here is my proposed
t
gawk.
Thanks,
Arnold
ully move to C90.
Thanks!
Arnold
-
--- ../grep/src/dfa.c 2016-08-16 11:40:09.008803100 +0300
+++ dfa.c 2016-08-16 11:46:52.901810300 +0300
@@ -3900,9 +3939,11 @@
bool exact = false;
bool begline = false;
bool endline = false;
+ size_t rj;
bool need_begline
orking versions of
modern software. I think there's a happy medium to be found. :-)
In any case, if C90 code is really bothering you, then undo the changes
and I'll manage.
Thanks,
Arnold
and dfa easily, I expect it is applied before dfa is moved
> to gnulib.
Can you update the comment to dfasyntax to explain the new fourth argument?
How would gawk call dfasyntax? It's not clear to me what you've just
done.
Thanks,
Arnold
Hi.
Just wondering. Do we think that dfa has settled down, or are there still
more changes waiting in the wings?
Thanks,
Arnold
esting it with gawk.
I would appreciate getting the test cases you used so that I can
add them to the gawk test suite. Every time I've merged from grep
my tests pass.
Thanks,
Arnold
mber of bug
reports about this ... :-)
Arnold
st as an option, even if that means that e-acute can
> never be matched to [d-f].
Now, if we could get GLIBC to move to that, we'd have something.
I've tried to submit patches in the past that weren't accepted,
but maybe it's worth trying again.
At least gawk and gnulib-based programs generally do so.
Arnold
Bruno Haible wrote:
> Finally, code this formula into the 'grep' program.
I'm sure that Paul and Jim would welcome patches.
Arnold
Create a shell script named grep with
/usr/bin/grep --color "$@"
in it, and put it in a directory in your search path that is
found before the standard grep.
HTH,
Arnold
Thomas Güttler wrote:
> I am not happy with GREP_OPTIONS being deprecated.
>
> I asked
rstanding what you want, but something like
awk '/pattern to match/ { print; next }
     { exit 0 }' file
might do what I think you want: exit on the first non-matching line.
If gawk can do the same matching you're doing with grep -Pno, that
is a different question.
HTH,
Arnold
If I may beg to differ, I see no reason that GNULIB can't be
ahead of GLIBC. In particular for the benefit of programs that use
it, like grep/sed/gawk.
If necessary, I can copy/paste the change from the bug report, but
it'd be nicer if you'd just push it to GNULIB.
Thanks,
Arno
it's not that important.
If it drags out, it becomes a hassle.
Do you have an ETA on when the fix will get pushed to GNULIB?
Thanks,
Arnold
a bit later than Feb. 1).
Excellent. Thanks!
Arnold
Hi.
Norihiro Tanaka wrote:
> Missing a patch for dfa. Re-send correct patch file.
Paul - is this going to be merged into GNULIB? If so, I'll put it into
gawk now; I want to make a release soon.
Thanks,
Arnold
[
erencing errors. No output written to bash
This is the lack of the readline library.
> However configure works without bash around.
What you tested was bash's configure. The original
query was about grep's configure.
HTH,
Arnold
Paul Eggert wrote:
> Arnold Robbins wrote:
> > I seem to recall that Norihiro Tanaka had sent in some patches to
> > dfa a few months back, but I don't think I saw them integrated into
> > Gnulib. Am I imagining things? If not, any ETA on that?
>
> You'
Hi Karl.
See the attached patch. Less than 10 minutes' work. :-)
Grep guys - I'm pretty sure I've signed paperwork for grep. Feel free
to incorporate this patch.
I chose '-g' since that letter was unused. It has no mnemonic value.
Thanks,
Arnold
Karl Berry wrote:
> See the attached patch. Less than 10 minutes' work. :-)
>
> Thanks Arnold!
You're welcome.
> I chose '-g' since that letter was unused. It has no mnemonic value.
>
> If a one-letter option is going to be used (I thought that m
e -I and then eventually
> repurpose it? But in the meantime this particular feature request is
> done, so I'm taking the liberty of closing the bug report.
Much thanks, Paul.
Karl --- Enjoy! :-)
Arnold
then I could
live with ssize_t (as returned by read(2), for example), but I
would find ptrdiff_t to be ugly and unintuitive.
> PS. Arnold, the above discusses all the changes I know about for dfa.c
> and dfa.h. The proposed API change (size_t->ptrdiff_t) could be
> installed eit
the API.
Thanks!
Arnold
arn...@skeeve.com wrote:
> Other than this, I think internally too, I'd prefer that you
>
> 1,$s/ptrdiff_t/ssize_t/g
I did this, just to see. gawk passes its test suite, both in
64- and 32-bit mode.
FWIW.
Thanks,
Arnold
ing ssize_t.
In any case, as I said, I can live with ptrdiff_t in the implementation,
even though I don't like it that much. (A nice block comment at the
top of dfa.c explaining why ptrdiff_t is used would be appropriate.)
But I really don't want ptrdiff_t in the API.
Thanks,
Arnold
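Some background on the two candidates: `ptrdiff_t` is ISO C (`<stddef.h>`) and is the type of a pointer difference, while `ssize_t` is POSIX (`<sys/types.h>`), so it may need a fallback definition on non-POSIX targets. Both are signed, which is what permits -1 as an out-of-band result. A hypothetical function written with `ptrdiff_t`:

```c
#include <stddef.h>  /* ptrdiff_t (ISO C) */

/* Hypothetical: offset of the first C in [BEGIN, END), or -1 if absent.
   p - begin already has type ptrdiff_t, so no cast is needed.  */
static ptrdiff_t
offset_of_byte (const char *begin, const char *end, char c)
{
  const char *p;

  for (p = begin; p < end; p++)
    if (*p == c)
      return p - begin;
  return -1;
}
```

Substituting `ssize_t` (and `<sys/types.h>`) leaves the logic unchanged; what differs is which standard defines the type and how it reads in an API.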
Thanks,
Arnold
arn...@skeeve.com wrote:
> But I really don't want ptrdiff_t in the API.
I see that Paul has made the change to the API over my objections.
Jim --- do you have an opinion on this?
Thanks,
Arnold