> Propose a patch, and I will consider it. In my opinion, it was much > easier to do igncr as an all or none option than it was to parse the first > line and discard \r on a per-file basis, not to mention that all-or-none > is easily configurable so that those of us who WANT literal \r as required > by POSIX can do so. To date, I have been the only person proposing > patches in the area of \r\n handling, rather than just ideas (which is > probably why I am the maintainer of the cygwin bash port). And I have no > personal interest in redoing this patch to a more complex algorithm at > this time.
I can understand your point of view, I'm sure it is a huge pain to deal with this and then have to answer all the questions and complaints from people who don't know why their scripts broke. Given the number of support requests on the mailing list, it definitely seems worthwhile to have this issue handled more transparently. The most compelling argument for not translating \r\n, as I understand it, is that searching for the \r slows down processing of large scripts, such as configure, when it is not necessary. On the other hand, most user-created scripts that do contain \r\n line endings are not large enough for the overhead to be problematic. For that reason, I think it makes the most sense to enable or disable igncr on a per-file basis. The other obvious alternative would be to have igncr turned on by default, but then users would have to explicitly disable it to get the performance enhancement when running configure scripts, and most would simply not know they needed to do it. Below is a patch to shell.c which implements my suggestion (It is just the output of diff -u, is that OK? The patch is relative to the 3.2-4 version.) I just modified the open_shell_script() function; now after it reads in 80 characters to make sure the script isn't binary, it also scans that buffer for the first \n it finds and if it is preceded by \r, then it enables igncr. If necessary it reads more characters, 80 bytes at a time. In most cases the performance decrease for this check will be entirely negligible, particularly since the first line of the script is almost always going to just be the #!/bin/bash line anyway. The only downside I can see to this is that bash will behave differently depending whether it is reading from a file or from a pipe. But that is already true in other cases as well, for instance the check for binary files is not performed when reading from a pipe. It would be possible to extend the \r\n check to piped input as well, although it would be somewhat more annoying to implement. If you think this overall concept is a good idea, I am happy to take everyone's suggestions and implement them. It's a bit of a struggle for me to write C code instead of C++, but I can give it a shot. -Lewis --------- --- shell.c 2006-10-25 15:40:53.625000000 -0400 +++ shell_patched.c 2006-10-25 15:41:02.093750000 -0400 @@ -1335,6 +1335,7 @@ open_shell_script (script_name) char *script_name; { + extern int igncr; int fd, e, fd_is_tty; char *filename, *path_filename, *t; char sample[80]; @@ -1426,8 +1427,32 @@ internal_error ("%s: cannot execute binary file", filename); exit (EX_BINARY_FILE); } + + /* Now scan the first line of the file, if it ends with \r\n, then + enable the igncr shell option. */ + + while(sample_len>1) + { + int i; + for(i=0; i!=sample_len; ++i) if(sample[i]=='\n') + { + if(i>0 && sample[i-1]=='\r') igncr=1; + break; + } + if(i<sample_len) break; + + /* else need to read more data */ + sample_len = read(fd, sample, sizeof(sample)); + if(sample_len<0) + { + file_error(filename); + exit(EX_NOEXEC); + } + } + /* Now rewind the file back to the beginning. */ lseek (fd, 0L, 0); + } /* Open the script. But try to move the file descriptor to a randomly ------------ -- Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple Problem reports: http://cygwin.com/problems.html Documentation: http://cygwin.com/docs.html FAQ: http://cygwin.com/faq/