I looked into my scripts a little harder and now have better results and
some new conclusions:
-----------------------------------------------------
line ending | mount mode | igncr | "user" time
-----------------------------------------------------
CRLF | text | set | 1.0114s
-----------------------------------------------------
CRLF | text | clear | 0.984s
-----------------------------------------------------
LF | text | set | 0.56995s
-----------------------------------------------------
LF | text | clear | 0.5653s
-----------------------------------------------------
CRLF | bin | set | 0.59435s
-----------------------------------------------------
CRLF | bin | clear | whoops!
-----------------------------------------------------
LF | bin | set | 0.5545s
-----------------------------------------------------
LF | bin | clear | 0.5576s
-----------------------------------------------------
The worst cases are still text mounts with CRLF files (further impugning
text mode mounts) but my statement below about "not bash's fault" is
apparently not completely true.
In the bin mode section (the Cygwin-recommended mount mode), note that
there's an approximately 7% penalty between the most accommodating case
(CRLF on a bin mode mount with igncr set) and the most restrictive case
(LF only on a bin mode mount with igncr clear). Less than a 10% penalty
on this perverse benchmark (handling _nothing_ but linefeeds) seems like
a small price for compatibility.
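
For anyone who wants to check the arithmetic, here is a small sketch of
how that percentage falls out of the "user" times in the table above. The
function and variable names are mine, purely for illustration.

```python
def relative_penalty(slow: float, fast: float) -> float:
    """Return how much slower `slow` is than `fast`, as a percentage of `fast`."""
    return (slow - fast) / fast * 100.0


# "user" times in seconds, copied from the table above.
crlf_bin_igncr_set = 0.59435   # most accommodating case
lf_bin_igncr_clear = 0.5576    # most restrictive case

penalty = relative_penalty(crlf_bin_igncr_set, lf_bin_igncr_clear)
print(f"penalty: {penalty:.1f}%")
```

This comes out a little under 7%, in line with the figure quoted above.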
-Rob
Rob Walker wrote:
Larry Hall (Cygwin) wrote:
On 10/12/2006, Rob Walker wrote:
If you're referring to the performance gain realized, I think it
could have been accomplished (if not as trivially) without breaking
CRLF handling. This seems to be indicated in other posts, ones that
talk about reworking line parsing.
I believe the response to this is <http://cygwin.com/acronyms/#PTC>. In
other words, if your belief is strong enough and you have the knowledge to
back up that belief, you just need the persistence to follow through on all
that to show everyone your concrete ideas. Since we've had a lot of
opinionated discussions on topics like this from the uninformed or those who
lack the conviction to actually submit a patch to back up their point of
view, it's important to realize here that patches speak louder than words
(hm, PSLTW - acronym alert? ;-) )
Actually, though, I was asking about a bigger-picture strategy. One
that appears to be steering Cygwin away from interoperability of the
past, towards a more rigid interpretation of what Cygwin's suitable
uses are. Do you have a set of guiding principles you consult when
deciding the fate of Cygwin? Who do you consider Cygwin's customers
to be?
The basic strategy is that in cases where decisions have to be made between
supporting Linux-like behavior or Windows conventions, err on the side of
Linux. Since the tools are meant to support the Linux way of doing things,
it's important they do. Otherwise people who are looking for and expecting
this behavior are left out.
Are you saying that these people expect bash to treat CRLF as if the
CR were non-whitespace? Can you give me an example where this would
be a useful feature?
They are the ones these tools are built to support. That said, various
Windows ways and conventions are supported by default, when they don't
conflict with the above. But when there is a conflict, Linux-like behavior
is the goal.
I guess you're saying (in this case) that the performance benefit of
barfing on CRLF outweighs the usefulness of bash's invisible handling
of CRLF?
To test this assertion, I benchmarked bash (3.1-9). The script I used
to test is essentially empty, with nothing but the shebang, a call to
shopt, and 50k empty lines. I chose empty lines to keep bash's other
complexities out of the picture. I only wanted to measure how long
it takes bash to parse lines.
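
A script of that shape could be generated along the following lines. This
is a sketch rather than the generator actually used: the exact wording of
the shopt line (including the trailing "#" that keeps a stray CR inside a
comment), the use of "shopt -u" for the igncr-clear case, and the
build_test_script name are all assumptions made for illustration.

```python
def build_test_script(newline: bytes, set_igncr: bool,
                      empty_lines: int = 50_000) -> bytes:
    """Return the bytes of a near-empty bash script for the parsing benchmark."""
    # Assumed spelling of the shopt call; "-s" sets igncr, "-u" clears it.
    flag = b"-s" if set_igncr else b"-u"
    header = [b"#!/bin/bash", b"shopt " + flag + b" igncr #"]
    body = [b""] * empty_lines
    # Terminate every line, including the last, with the chosen line ending.
    return newline.join(header + body) + newline


crlf_script = build_test_script(b"\r\n", set_igncr=True)
lf_script = build_test_script(b"\n", set_igncr=False)
print(len(crlf_script) - len(lf_script))  # one extra CR byte per line
```

The result has to be written to disk in binary mode (open(path, "wb")) so
that the line endings are not translated on the way out.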
Here are my results:
-----------------------------------------------------
line ending | mount mode | igncr | time ./test.sh
-----------------------------------------------------
| | | real 0m4.219s
CRLF | text | set | user 0m0.983s
| | | sys 0m3.202s
-----------------------------------------------------
| | | real 0m4.312s
CRLF | text | clear | user 0m1.062s
| | | sys 0m3.265s
-----------------------------------------------------
| | | real 0m2.109s
LF | text | set | user 0m0.608s
| | | sys 0m1.499s
-----------------------------------------------------
| | | real 0m2.125s
LF | text | clear | user 0m0.592s
| | | sys 0m1.546s
-----------------------------------------------------
| | | real 0m2.125s
CRLF | bin | set | user 0m0.546s
| | | sys 0m1.530s
-----------------------------------------------------
| | |
CRLF | bin | clear | Whoops!
| | |
-----------------------------------------------------
| | | real 0m2.188s
LF | bin | set | user 0m0.608s
| | | sys 0m1.546s
-----------------------------------------------------
| | | real 0m2.141s
LF | bin | clear | user 0m0.640s
| | | sys 0m1.515s
-----------------------------------------------------
My conclusions:
1) CRLF vs. LF line endings have essentially no effect on the
performance of this version of bash, even on a test where bash is
doing nothing but handling linefeeds.
2) Ignoring CR on a binmode mount has no performance penalty over a
clean LF-only file. In fact, the margin of error in this test was
higher than the performance penalty.
3) CRLF on a text mode mount is really, really bad. This isn't bash's
fault (note the time spent in user mode is the same as on binary
mounts, all the time is spent in sys), and so to me looks like a
non-solution to the problem of bash not handling CRLF; to say nothing
of the other issues with text mode mounts.
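
As a cross-check on where the extra time goes, here is a sketch that
converts the "time" figures from the table into seconds and splits the
cost of CRLF on a text mount into its user and sys parts. The to_seconds
helper and the dictionary layout are illustrative only; the numbers are
copied from the igncr-set rows above.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert bash `time` output such as '0m3.202s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (user, sys) figures for CRLF files with igncr set, from the table above.
runs = {
    "text": ("0m0.983s", "0m3.202s"),
    "bin": ("0m0.546s", "0m1.530s"),
}

text_user, text_sys = (to_seconds(s) for s in runs["text"])
bin_user, bin_sys = (to_seconds(s) for s in runs["bin"])

extra_user = text_user - bin_user
extra_sys = text_sys - bin_sys
print(f"extra user: {extra_user:.3f}s, extra sys: {extra_sys:.3f}s")
```

On these figures the sys component accounts for most of the extra time,
though the user component is not zero.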
Looks like making igncr the default in Cygwin is a no-cost solution in
terms of performance, and a big win for compatibility.
Has anyone else done anything like this? Any flaws in my analysis?
Thanks for reading.
-Rob
--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Problem reports: http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/