Hi.

Norihiro Tanaka <nori...@kcn.ne.jp> wrote:
> arn...@skeeve.com wrote:
> > I would think adding a check for '\r' would be safe and would help
> > too; given that on Windows systems '\r' generally occurs just as
> > frequently as '\n', it should give a nice speedup for gawk on those
> > systems.
>
> As I recognize that DFA and regex aren't support multiple eolbytes as
> CR-LF, I can't understand where we can use the change.  Grep converts
> Windows text to Unix text by removal of CR in advance.

Gawk does not remove CR in advance, unless someone specifically sets
RS = "\r\n", in which case the full regex matcher is used to first find
\r\n in the raw input buffer.

So for gawk, adding a check for (c == eolbyte || c == '\r') should
produce more speedup on Windows.

(Hmm, on Windows the default is probably text mode, which causes the
library/OS to hide the \r anyway.  Harumph.  But if binary mode was
requested then it could still make a difference.)

> BTW, although I say `newline', correctly notice that it's `eolbyte'
> which mayn't be either LF or NUL.

Understood and agreed.

Adding a check for \r isn't a big deal in any case, but of the 5
characters Erik mentioned originally, that is the only one where I see
a potential for a check to really make a difference.

Thanks!

Arnold
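
P.S. To make the (c == eolbyte || c == '\r') idea concrete, here is a rough sketch of the kind of test I mean. This is not code from gawk or dfa.c; the function name find_line_end and its signature are made up for illustration, and it only shows the comparison itself, not where such a check would actually go in the matcher.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only (not gawk or dfa.c code).
   Return the offset of the first byte in buf[0..len) that is either
   eolbyte or '\r', or len if no such byte is present.  */
size_t
find_line_end (const char *buf, size_t len, char eolbyte)
{
  assert (buf != NULL || len == 0);

  for (size_t i = 0; i < len; i++)
    {
      char c = buf[i];

      /* The extra '\r' comparison is the whole point: with CR-LF input
         read in binary mode, the scan stops at the CR instead of
         carrying on to the LF.  */
      if (c == eolbyte || c == '\r')
        return i;
    }

  return len;
}
```

Note that eolbyte stays a parameter here, since as noted above it may be something other than LF (NUL, for instance).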