On Fri, Jan 24, 2025 at 07:26:00PM +0000, Peter White wrote:
> On Fri, Jan 24, 2025 at 01:27:13PM +0000, Andreas BROCKMANN via Bug reports 
> for GNU grep wrote:
> > Hi,
> > 
> > The 1st command below correctly reports trailing spaces, for Unix and 
> > Windows format files.
> > The 2nd one incorrectly reports all lines.
> > 
> >   grep -sHn -i " [[:cntrl:]]*$" *.vhd
> >   grep -sHn -i "\s[[:cntrl:]]*$" *.vhd
> As someone who just today made a similar mistake I would like to point
> out that the pattern does as intended because '*' matches *zero* or more
> occurrences of the preceding atom. So the second pattern matches
> any line that contains a *literal* 's' followed by zero or more control
> chars, which is any line because of the newline at the end which is a
> control char. Since you did not ask for perl regex (-P) grep uses basic
> POSIX regex instead; at least I *think* you want perl syntax given that
> '\s' is only valid in PCRE, IIRC.

Turns out that last part is not true, sorry. I was going by the grep(1)
man page instead of `info grep`, which does say that '\s' is shorthand
for '[[:space:]]'. Still, the 2nd pattern is incorrect. IIUC this is
what it should look like:

        # '-i' is bogus since there is no upper/lower case whitespace
        grep --color=never -sHn '[[:blank:]][[:cntrl:]]*$' *.vhd

[:blank:] is the more correct char class because it only covers space
and tab, whereas '\s' (i.e. [[:space:]]) also matches <CR>, <LF>, <VT>
and <FF>. DOS files have the <CR> in front of <LF> (a.k.a. '$'), so on
such files '\s' already matches the <CR> at the end of every line, which
is why the original pattern did match *correctly*. Contrary to the claim
in the OP I could only reproduce the "false" behavior with DOS and not
UNIX files. And now I understand why '[[:cntrl:]]' is in the pattern
(sorry for my initial misunderstanding): it covers the ASCII range 0-31
(plus <DEL>[127]), which includes <CR>, so it skips over the <CR> between
a trailing blank and the end of the line. DOS, the gift that keeps on
giving. :P
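
For anyone who wants to reproduce the difference, here is a minimal
sketch. It assumes GNU grep on a Unix-like system; the two sample files
are made up for illustration and stand in for the *.vhd files from the
original report.

```shell
# Hypothetical sample files: one Unix (LF) and one DOS (CRLF) file,
# each with one clean line and one line ending in a trailing space.
tmpdir=$(mktemp -d)
printf 'clean\ntrailing \n' > "$tmpdir/unix.txt"
printf 'clean\r\ntrailing \r\n' > "$tmpdir/dos.txt"

# '\s' is GNU shorthand for [[:space:]], which includes <CR>.
# grep -c prints the number of matching lines.
s_unix=$(grep -c '\s[[:cntrl:]]*$' "$tmpdir/unix.txt")
s_dos=$(grep -c '\s[[:cntrl:]]*$' "$tmpdir/dos.txt")

# [[:blank:]] is only space and tab, so a <CR> alone is not enough;
# it has to be consumed by [[:cntrl:]]* after a real blank.
b_unix=$(grep -c '[[:blank:]][[:cntrl:]]*$' "$tmpdir/unix.txt")
b_dos=$(grep -c '[[:blank:]][[:cntrl:]]*$' "$tmpdir/dos.txt")

printf '%s unix=%s dos=%s\n' '\s' "$s_unix" "$s_dos"
printf '%s unix=%s dos=%s\n' '[:blank:]' "$b_unix" "$b_dos"
rm -rf "$tmpdir"
```

On my understanding of the above, the '\s' pattern should count every
line of the DOS file (2) but only the trailing-space line of the Unix
file (1), while the [:blank:] pattern should count 1 for both.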

Also note the '--color=never'. I don't know how relevant this is on
Windows but on my terminal emulator (with --color=auto) the <CR> at the
end of a line in DOS files would be printed as part of the highlighted
match. The terminal obeyed it and moved the cursor back to the start of
the line, so the output got overwritten, leaving empty lines without the
match text. Another "gift", I guess.
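
A quick way to see what actually reaches the terminal is to pipe the
output through od(1) instead of looking at it directly. This is only a
sketch, again assuming GNU grep; the sample file is hypothetical.

```shell
# Hypothetical one-line DOS file with a trailing space before <CR><LF>.
tmpfile=$(mktemp)
printf 'trailing \r\n' > "$tmpfile"

# --color=always forces the escape sequences even though stdout is a pipe.
# od -An -c shows every byte; ESC appears as 033 and <CR> as \r.
colored=$(grep --color=always '[[:blank:]][[:cntrl:]]*$' "$tmpfile" | od -An -c)
plain=$(grep --color=never '[[:blank:]][[:cntrl:]]*$' "$tmpfile" | od -An -c)

printf 'with color:\n%s\n' "$colored"
printf 'without color:\n%s\n' "$plain"
rm -f "$tmpfile"
```

In the colored dump the <CR> shows up among the escape sequences, which
is what the terminal then acts on; the plain dump contains no escape
bytes at all.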


PW


