arn...@skeeve.com wrote:
> Gawk does not remove CR in advance, unless someone specifically
> set RS = "\r\n", in which case the full regex matcher is used
> to first find \r\n in the raw input buffer.

Thanks, I also confirmed this in the Gawk source code.

> So for gawk, adding a check for (c == eolbyte || c == '\r')
> should produce more speedup on Windows.
>
> (Hmm, on Windows the default is probably text mode which causes
> the library/OS to hide the \r anyway. Harumph. But if binary mode
> was requested then it could still make a difference.)

I think it's better to build a KWset than to rely on checking for '\r'
in the non-UTF-8 multibyte mode of the DFA. Furthermore, even if we add
a check for '\r' to the DFA, I don't think we can use it for a speedup
on Windows, because the DFA can't correctly locate the matched position
unless the pattern is a fixed string.
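
For reference, here is a minimal sketch of the kind of check being discussed, as I understand it. The function name find_record_end and its signature are made up for illustration; this is not code from gawk or dfa.c. It only finds where a record would end if '\r' were treated like eolbyte.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from gawk or dfa.c.
   Return the offset of the first byte in buf[0..len) that is either
   eolbyte or '\r', or len if no such byte exists.  */
size_t
find_record_end (const char *buf, size_t len, char eolbyte)
{
  for (size_t i = 0; i < len; i++)
    {
      char c = buf[i];
      /* The check proposed in the quoted text.  */
      if (c == eolbyte || c == '\r')
        return i;
    }
  return len;
}
```

A scan like this only bounds the record. It says nothing about where inside the record a match begins, which is the limitation I mentioned for patterns that are not fixed strings.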