At Mon, 11 Oct 2004 23:29:15 +0900 (JST),
Tatsuya Kinoshita wrote:

> > Package: gawk
> > Version: 1:3.1.4-1
>
> > Executing the following line in a shell:
> >
> >   echo -e '--- orig/lisp/ChangeLog\n+++ mod/lisp/ChangeLog' | LANG=ja_JP gawk '/[Cc]hangeLog/ { print }'
> >
> > yields not the expected two lines of output, but instead only the first one:
> >
> >   --- orig/lisp/ChangeLog
> >
> >
> > If the LANG-setting portion is changed to use C, then it works as
> > expected (others such as "de" seem to work too):
> >
> >   echo -e '--- orig/lisp/ChangeLog\n+++ mod/lisp/ChangeLog' | LANG=C gawk '/[Cc]hangeLog/ { print }'
> >
> > yields:
> >
> >   --- orig/lisp/ChangeLog
> >   +++ mod/lisp/ChangeLog
> >
> >
> > I'm not sure if the actual encoding has any impact -- ja_JP, ja_JP.utf8,
> > and ja_JP.eucjp all exhibit the same problem.
>
> ko_KR, zh_CN, and zh_TW exhibit the same problem.  On CJK
> locales, this bug causes gawk scripts unusable.
>
> Downgrading gawk to version 1:3.1.3-3 prevents the problem.
>
> Could anyone fix this bug?
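
To check the report across several locales in one go, something like the script below may help. It is only a sketch: `count_matches` and the `AWK` variable are names made up here, not part of gawk. It sets LC_ALL rather than LANG so that an LC_ALL already present in the environment cannot mask the setting, and it uses printf instead of `echo -e` for portability. It also assumes the listed locales are installed; a locale that is missing is typically treated as C, so the bug would not show up there.

```shell
#!/bin/sh
# Sketch only: count_matches and AWK are illustrative names, not part of gawk.

# Print how many of the two sample lines awk outputs under the locale given in $1.
count_matches() {
    printf '%s\n' '--- orig/lisp/ChangeLog' '+++ mod/lisp/ChangeLog' |
        LC_ALL="$1" "${AWK:-gawk}" '/[Cc]hangeLog/ { print }' |
        wc -l | tr -d ' '
}

# Locale names are the ones mentioned in the report; adjust to what is installed.
for loc in C ja_JP ja_JP.utf8 ja_JP.eucjp ko_KR zh_CN zh_TW; do
    echo "$loc: $(count_matches "$loc") of 2 lines printed"
done
```

Going by the report, an affected gawk should print 1 for the CJK locales and 2 for C.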

One possible workaround is to use GAWK_NO_DFA=1:

% echo -e '--- orig/lisp/ChangeLog\n+++ mod/lisp/ChangeLog' | LANG=ja_JP.eucJP GAWK_NO_DFA=1 gawk '/[Cc]hangeLog/ { print }'
--- orig/lisp/ChangeLog
+++ mod/lisp/ChangeLog

I may have found the cause of this bug.  The pattern string has
changed, but begin and end still point to the same addresses, so
mblen_buf and inputwcs are not updated.

For example, the following patch fixes the problem, but it may slow
matching down, so I think a better fix should be made.

--- dfa.c~	2004-07-26 23:11:41.000000000 +0900
+++ dfa.c	2004-10-12 01:05:14.000000000 +0900
@@ -2872,13 +2872,14 @@
 {
   int remain_bytes, i;
   buf_begin -= buf_offset;
+#if 0
   if (buf_begin <= (unsigned char const *)begin && (unsigned char const *) end <= buf_end) {
     buf_offset = (unsigned char const *)begin - buf_begin;
     buf_begin = begin;
     buf_end = end;
     goto go_fast;
   }
-
+#endif
   buf_offset = 0;
   buf_begin = begin;
   buf_end = end;

Regards,
Fumitoshi UKAI <[EMAIL PROTECTED]> / <[EMAIL PROTECTED]>
Hewlett-Packard Laboratories Japan
http://ecardfile.com/id/ukai