Norihiro Tanaka <nori...@kcn.ne.jp> wrote:

> Eric Blake <ebl...@redhat.com> wrote:
> > Is it worth extending your optimization to all five of the
> > POSIX-guaranteed single byte characters?
>
> Thanks, but I don't want to do that immediately.  DFA already treats
> newline as a single-byte character, but not the others yet.  So we
> may need many changes to handle invalid locales and sequences that
> do not conform to the rule.  If we omitted that, limits might have to
> be placed on the locales that DFA can be applied to.  Therefore, it
> should be done carefully.

I would think adding a check for '\r' would be safe and would help
too; given that on Windows systems '\r' generally occurs just as
frequently as '\n', it should give a nice speedup for gawk on those
systems.
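
As a rough sketch of the kind of check meant here, assuming that '\n'
and '\r' are each always a complete single-byte character (the helper
names below are hypothetical and not taken from dfa.c):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of dfa.c: true for the bytes this
   sketch assumes are always a complete single-byte character.  */
static bool
always_single_byte (unsigned char c)
{
  return c == '\n' || c == '\r';
}

/* Return how many leading bytes of BUF[0..LEN) can be stepped over
   one at a time without calling a multibyte decoder.  */
static size_t
skip_known_single_bytes (const char *buf, size_t len)
{
  size_t i = 0;
  while (i < len && always_single_byte ((unsigned char) buf[i]))
    i++;
  return i;
}
```

With CRLF line endings, every line terminator would take this fast path
instead of only the '\n' half of it.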

The other characters that Eric cited seem like less of an issue to me.

Thanks,

Arnold


