Hi Roger,

On Wed, 9 Aug 2017 07:03:24 +0000, Roger Krebs wrote:
> I've added a BEGIN section at the beginning of the awk script file, setting the
> record separator explicitly for the input file (RS) as well as for the output
> file (ORS):
>
> BEGIN {
>     RS="\r\n"
>     ORS="\r\n"
> }
> {
>     ... your script
> }
>
> Especially the RS parameter wasn't necessary in the past but now it is.

Which is pretty much a pain when no easy fallback is provided for such a major
change. E.g. for sed - if I understand the reference to sed in
https://cygwin.com/ml/cygwin/2017-08/msg00033.html correctly - a separate
switch '-b' was added. For the latest gawk version I cannot see anything like
that, which means that all of our awk scripts run against cygwin's gawk break
unless they are tweaked, unless I am missing something here.

This is - to say the least - unpleasant in the light of what Cygwin claims to
be, namely 'a large collection of GNU and Open Source tools which provide
functionality similar to a Linux distribution on Windows' (from the top of the
home page www.cygwin.com). Again, admittedly I did not dive into the discussion
and the substance of the reasoning behind this change to gawk | sed | grep.

Now I can see the following *easy* solutions to the situation here (input only
for now):

1 - Inserting the BEGIN section as you suggested into more than 1k scripts
    (not feasible due to the additional regression test workload)
2 - Calling 'gawk -vRS=\r\n -vORS=\r\n' instead of 'gawk' (a hack to undo the
    additional complexity of the latest gawk, wrapper needed)
3 - Wrapping a d2u/u2d pipe solution (additional app and wrapper needed again)
4 - Using another compiled version of gawk which does *not* disable the
    out-of-the-box gawk feature to swallow CRs (cf., e.g.,
    http://git.savannah.gnu.org/cgit/gawk.git/tree/awkgram.y#n3543), i.e.
    without the artificial obstacle of now having to know the EOL type of the
    input file before running gawk.

> It works in all my cases. The only disadvantage: you have to know what kind

... plus the disadvantage of having to systematically amend all the scripts
instead of having an external solution

> of files you want to handle in the awk script. The same awk script will not
> work for DOS files as well as for linux files.

... another issue caused by the change, one which didn't exist before.
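
For what it's worth, here is a minimal sketch of how a wrapper along the lines of option 2 might also cover that last point. It is untested against our scripts. It assumes that a multi-character RS is treated as a regular expression, which is a gawk extension that POSIX leaves unspecified, so that an optional CR before the LF is swallowed for DOS and Linux files alike. The function name awk_anyeol and the AWK variable are hypothetical names of mine, not anything gawk or Cygwin provides.

```shell
#!/bin/sh
# Hypothetical wrapper (the name is an assumption, not part of gawk or Cygwin).
# It sets RS to the regex "optional CR, then LF", so the same script body
# reads CRLF and LF input alike without a BEGIN block in every script.
awk_anyeol() {
    # AWK is a hypothetical override, e.g. AWK=gawk to force a specific binary.
    # Escape sequences in -v assignments are processed by awk itself, hence
    # the single quotes to keep the shell from touching the backslashes.
    "${AWK:-awk}" -v 'RS=\r?\n' "$@"
}

# Usage: the script text is unchanged; only the command name differs.
printf 'a b\r\nc d\r\n' | awk_anyeol '{ print $2 }'
printf 'a b\nc d\n'     | awk_anyeol '{ print $2 }'
```

Output is written with the default ORS, i.e. with plain LF. If CRLF output is needed, an additional -v 'ORS=\r\n' would have to be passed, which again requires knowing the EOL type wanted on the output side.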

> Best
>
> Roger

Please don't get me wrong, but this raises a real issue here and I am not sure
which rationale other than 'let's get more of the Linux-feel' drove the
decision.

All the best,
J.

> -----Original Message-----
> From: cygwin-ow...@cygwin.com [mailto:cygwin-ow...@cygwin.com] On
> Behalf Of Jannick
> Sent: Wednesday, 9 August 2017 02:48
> To: cygwin@cygwin.com
> Subject: RE: gawk 4.1.4: CR separate char for CRLF files
>
> On Tue, 08 Aug 2017 16:23:40 -0700 (PDT), Steven Penny wrote:
> > On Wed, 9 Aug 2017 01:15:08, "Jannick" wrote:
> > > the current version 4.1.4 of gawk appears to unpleasantly treat CR
> > > for CRLF files, i.e. CR is not gracefully swallowed, but is a
> > > separate character.
> > >
> > > This makes some, if not all, of the scripts we are working with here
> > > useless, unless the input files are converted to LF which certainly
> > > is not feasible. IIRC the issue did not show up some versions back.
> > >
> > > Is this a bug - or am I missing something here?
> >
> > Learn to read:
> >
> > http://cygwin.com/ml/cygwin/2017-08/msg00033.html
>
> Thanks - quickly done.
>
> The link reveals that CRLF/LF conversion is now mandatory to work with
> cygwin's gawk on DOS machines. As far as I can see there is no legacy
> solution like for, e.g., sed (-b switch) to have an easy solution for the
> issue, especially when invoking gawk from makefiles (piping).
>
> I consider this bad news while admittedly not fully understanding the whole
> background of the move which is not necessary for now.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple