bug#24975: Matching issues with characters whose encoding ends in some other character

2016-11-28 Thread Jim Meyering
On Mon, Nov 28, 2016 at 5:49 AM, Norihiro Tanaka wrote:
> Jim Meyering wrote:
>
>> I suspect this won't be the last word in this area, because it feels
>> like we should be able to adjust DFA's tables so that people using
>> such locales can retain DFA's efficiency without the bug in the
>> curre

bug#24975: Matching issues with characters whose encoding ends in some other character

2016-11-28 Thread Paul Eggert
Thanks for that DFA fix, which should be much better than the previous workaround. I installed it into gnulib and installed the attached patch into grep.

From 76348c87e73b37d44caec3bb5b24c33c1455ed96 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Mon, 28 Nov 2016 08:39:37 -0800
Subject: [PATCH

bug#24975: Matching issues with characters whose encoding ends in some other character

2016-11-28 Thread Norihiro Tanaka
Jim Meyering wrote:
> I suspect this won't be the last word in this area, because it feels
> like we should be able to adjust DFA's tables so that people using
> such locales can retain DFA's efficiency without the bug in the
> current implementation.

Hi Jim,

It is a bug in dfa for period expre

bug#24975: Matching issues with characters whose encoding ends in some other character

2016-11-27 Thread Jim Meyering
On Sun, Nov 20, 2016 at 9:53 PM, Jim Meyering wrote:
> On Sun, Nov 20, 2016 at 2:59 PM, Stephane Chazelas
> wrote:
>> 2016-11-20 21:50:28 +, Stephane Chazelas:
>>> $ locale charmap
>>> GB18030
>>> $ printf '\uC9\n' | grep '.*7' | hd
>>> 81 30 87 37 0a

bug#24975: Matching issues with characters whose encoding ends in some other character

2016-11-20 Thread Jim Meyering
On Sun, Nov 20, 2016 at 2:59 PM, Stephane Chazelas wrote:
> 2016-11-20 21:50:28 +, Stephane Chazelas:
>> $ locale charmap
>> GB18030
>> $ printf '\uC9\n' | grep '.*7' | hd
>> 81 30 87 37 0a|.0.7.|
>> 0005
>>
>> U+00C9's encoding does end in t

bug#24975: Matching issues with characters whose encoding ends in some other character

2016-11-20 Thread Stephane Chazelas
2016-11-20 21:50:28 +, Stephane Chazelas:
> $ locale charmap
> GB18030
> $ printf '\uC9\n' | grep '.*7' | hd
> 81 30 87 37 0a|.0.7.|
> 0005
>
> U+00C9's encoding does end in the 0x37 byte (7 in ASCII and GB18030).
[...]
> Reproduced with 2.25

bug#24975: Matching issues with characters whose encoding ends in some other character

2016-11-20 Thread Stephane Chazelas
$ locale charmap
GB18030
$ printf '\uC9\n' | grep '.*7' | hd
81 30 87 37 0a|.0.7.|
0005

U+00C9's encoding does end in the 0x37 byte (7 in ASCII and GB18030).

$ printf '\uC9\n' | grep '.*0'

fails.

$ printf '\uC9\n' | grep -o '.*7'

returns wi
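
The byte-level observation in the report can be checked without a GB18030 locale. The following is a minimal sketch in Python, not grep's DFA code: it uses Python's built-in `gb18030` codec and the `re` module only to contrast a byte-wise view of the input with a character-wise view. The variable names are illustrative assumptions, and the byte-wise match merely mimics the kind of result a byte-oriented matcher could produce.

```python
import re

# Encode U+00C9 in GB18030; the report shows the bytes 81 30 87 37.
encoded = "\u00c9".encode("gb18030")
print(encoded.hex(" "))

# Byte-wise view: the ASCII bytes for '0' (0x30) and '7' (0x37)
# occur inside the four-byte sequence.
has_byte_7 = b"7" in encoded
has_byte_0 = b"0" in encoded

# Character-wise view: the decoded text is one character, and it
# contains neither '7' nor '0'.
decoded = encoded.decode("gb18030")
has_char_7 = "7" in decoded

# A bytes pattern treats every byte as a separate unit, so '.*7' matches.
byte_match = re.search(rb".*7", encoded)
# A str pattern works on whole characters, so '.*7' does not match.
char_match = re.search(r".*7", decoded)

print(has_byte_7, has_byte_0, has_char_7)
print(byte_match is not None, char_match is not None)
```

A matcher that respects character boundaries should behave like the `str` case here: the trailing 0x37 belongs to the encoding of U+00C9 and is not a separate `7`.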