Norihiro Tanaka <nori...@kcn.ne.jp> wrote:
> Eric Blake <ebl...@redhat.com> wrote:
> > Is it worth extending your optimization to all five of the
> > POSIX-guaranteed single byte characters?
>
> Thanks, but I don't want to do that immediately.  DFA already treats
> newline as a single byte character, but not the others yet.  So we may
> need to make many changes to handle invalid locales and sequences that
> do not conform to the rule.  If we omitted that, we might have to limit
> the locales that DFA can be applied to.  Therefore, it should be done
> carefully.
I would think adding a check for '\r' would be safe and would help too;
given that on Windows systems '\r' generally occurs just as frequently
as '\n', it should give a nice speedup for gawk on those systems.

The other characters that Eric cited seem like less of an issue to me.

Thanks,

Arnold
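
For illustration only, here is a minimal sketch of the kind of locale check under discussion: whether a given byte, taken on its own, decodes as one complete character in the current locale. The helper name byte_is_single_char is hypothetical and does not come from dfa.c or gawk; only the standard mbrtowc() call is assumed. Note that this is a necessary condition, not a sufficient one: it does not show that the byte never appears as a trailing byte inside a longer multibyte sequence, which is part of why the change needs care.

```c
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not taken from dfa.c: return true if byte B,
   decoded on its own from the initial shift state, forms exactly one
   complete character in the current locale.  */
bool
byte_is_single_char (unsigned char b)
{
  char c = (char) b;
  wchar_t wc;
  mbstate_t st;
  size_t n;

  /* Start from the initial conversion state.  */
  memset (&st, 0, sizeof st);

  n = mbrtowc (&wc, &c, 1, &st);

  /* mbrtowc returns 0 for the null character, 1 for any other
     complete one-byte character, (size_t) -1 for an invalid
     sequence, and (size_t) -2 for an incomplete one.  */
  return n == 1 || (b == 0 && n == 0);
}
```

A caller would typically run setlocale (LC_ALL, "") first and then test byte_is_single_char ('\r') before enabling a '\r' fast path like the one that already exists for '\n'.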