Amaury Forgeot d'Arc <amaur...@gmail.com> added the comment:

I think there's actually a bug in the MSVCRT read() function, which was not too 
hard to spot (see explanation below).  In short, when a file with CRLF line 
endings is opened in text mode, a newline may be silently dropped once the 
file position is past 4GB.

I'm re-closing the issue as "won't fix". There's really nothing we can do about 
it.  But note that Python 3.x is not affected (raw files are always opened in 
binary mode and CRLF translation is done by Python); with 2.7, you can work 
around the problem by using io.open() instead of the built-in open().
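
As an illustration of the io.open() workaround, here is a minimal sketch. The 
helper name read_lines_universal and the utf-8 encoding are my own assumptions, 
not something from the original report. The point is only that io.open() does 
the CRLF translation in Python rather than in the C runtime's text mode.

```python
import io


def read_lines_universal(path, encoding="utf-8"):
    """Return the lines of a text file, with CRLF translated to LF by Python.

    Hypothetical helper for illustration. io.open() reads the underlying
    file in binary mode and translates newlines itself (newline=None means
    universal newlines), so the C runtime's text mode is not involved.
    """
    with io.open(path, "r", encoding=encoding, newline=None) as f:
        return list(f)
```

Note that on 2.7, io.open() returns unicode strings rather than str, so callers 
may need adjusting.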

Other issues: issue1142, issue1672853 and issue1451466 also report the same 
end-of-line problem on Windows (I just searched for "windows gb" in the 
tracker...).  I'll close them as well.

Now, the explanation of the bug; it's not easy to reproduce because it depends 
both on the internal FILE buffer size and the number of chars passed to fread().
In the Microsoft CRT source code, in open.c, there is a block starting with 
this encouraging comment: "This is the hard part.  We found a CR at end of 
buffer.  We must peek ahead to see if next char is an LF."
Oddly, there is an almost exact copy of this function in Perl source code:
http://perl5.git.perl.org/perl.git/blob/4342f4d6df6a7dfa22a470aa21e54a5622c009f3:/win32/win32.c#l3668
The problem is in the call to SetFilePointer(), used to step back one position 
after the lookahead; once the file position is past 4GB the call fails, because 
it is unable to return the current position in a 32-bit DWORD. [The fix is 
easy; do you see it?]
At this point, the function thinks that the next read() will return the LF, but 
it won't because the file pointer was not moved back, so the LF is lost.
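
To make the failure mode concrete, here is a toy model in Python. It is not 
the CRT code: the class ToyTextReader, its base_offset parameter (which 
pretends the data starts at some large absolute file offset) and the 
seek_is_32bit flag are all made up for illustration. It only mimics the logic 
described above: peek one byte after a trailing CR, then try to step back, and 
ignore the case where stepping back fails.

```python
import io

DWORD_MAX = 0xFFFFFFFF  # largest offset that fits in a 32-bit DWORD


class ToyTextReader(object):
    """Toy CRLF translator with one-byte lookahead (illustration only)."""

    def __init__(self, data, base_offset=0, seek_is_32bit=True):
        self.raw = io.BytesIO(data)
        self.base_offset = base_offset      # pretend absolute offset of data[0]
        self.seek_is_32bit = seek_is_32bit  # model the 32-bit seek limitation

    def _step_back_one(self):
        # Models the failing seek: if the resulting position does not fit in
        # 32 bits, the call fails and the file position is left unchanged.
        target = self.base_offset + self.raw.tell() - 1
        if self.seek_is_32bit and target > DWORD_MAX:
            return False
        self.raw.seek(-1, io.SEEK_CUR)
        return True

    def read(self, size):
        chunk = self.raw.read(size)
        out = bytearray()
        i = 0
        while i < len(chunk):
            byte = chunk[i:i + 1]
            following = chunk[i + 1:i + 2]
            if byte != b"\r":
                out += byte
            elif following == b"\n":
                out += b"\n"   # CRLF inside the buffer becomes LF
                i += 1
            elif following:
                out += b"\r"   # lone CR is kept
            else:
                # The hard part: CR at end of buffer, peek at the next byte.
                peeked = self.raw.read(1)
                if peeked:
                    self._step_back_one()  # result ignored, as in the bug
                if peeked != b"\n":
                    out += b"\r"
                # If an LF was peeked, the CR is dropped and the next read()
                # is expected to return the LF.
            i += 1
        return bytes(out)


data = b"ab\r\ncd\r\n"

below_4gb = ToyTextReader(data)
print(below_4gb.read(3) + below_4gb.read(100))   # b'ab\ncd\n'

above_4gb = ToyTextReader(data, base_offset=2 ** 32)
print(above_4gb.read(3) + above_4gb.read(100))   # b'abcd\n', a newline is lost
```

The newline is only lost when a CR happens to be the last byte of a read, which 
is why the bug depends on the buffer size and on the count passed to fread().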

----------
nosy: +amaury.forgeotdarc
resolution: invalid -> wont fix

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1744752>
_______________________________________
