bug#15440: [PATCH] dfa: fix \s and \S to work for multibyte

2013-10-01 Thread Jim Meyering
On Tue, Sep 24, 2013 at 5:24 AM, Aharon Robbins wrote:
> Hi Jim.
>
> I should note that gawk uses its own regex, although it does rely
> on glibc for isspace / iswspace etc...
...
close 15440
thanks
I've pushed my grep patches, but chose to omit 4 multibyte space characters from the list in the

bug#15440: [PATCH] dfa: fix \s and \S to work for multibyte

2013-09-24 Thread Aharon Robbins
# optional
printf '' | ./gawk '/.../' # your tests here. :-)
Much thanks!
> From: Jim Meyering
> Date: Mon, 23 Sep 2013 14:04:09 -0700
> Subject: Re: bug#15440: [PATCH] dfa: fix \s and \S to work for multibyte
> To: Aharon Robbins, 15...@debbugs.gnu.o

bug#15440: [PATCH] dfa: fix \s and \S to work for multibyte

2013-09-23 Thread Jim Meyering
[using the right bug address, this time]
On Mon, Sep 23, 2013 at 11:26 AM, Aharon Robbins wrote:
> Hi.
>
>> $ printf '\x82\n' > in; ./grep -q '\S' in && echo match
>> match
>>
>> Now, require a back-reference (forcing switch from grep's DFA matcher
>> to use of the regex functions), and y

bug#15440: [PATCH] dfa: fix \s and \S to work for multibyte

2013-09-22 Thread Jim Meyering
This one really surprised me. Learning that multibyte \s and \S had been broken since grep-2.6 did not make my day. But fixing it helped. Here's how it started:
To demonstrate the (first) bug, set up to use a UTF8 locale:
  export LC_ALL=en_US.UTF-8
then run this and note that it matches: