Bugs item #1744752, was opened at 2007-06-28 04:23 Message generated for change (Comment added) made by nnorwitz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1744752&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Library Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Rune Devik (runedevik) Assigned to: Nobody/Anonymous (nobody) Summary: Newline skipped in "for line in file" Initial Comment: Creating new ticket for the bug described here since it was closed (and I was not able to reopen it): http://sourceforge.net/tracker/index.php?func=detail&aid=1636950&group_id=5470&atid=105470 The problem is that when you open a hughe file on windows with the "r" mode it will sometimes merge two lines. As I said in the ticket above (but probably ignored since I updated a closed ticket): Hi I have the same problem with a huge file (8GB) containing long lines. Sometimes two lines are merged into one and rerunning the test script that reads the file it's always the same lines that are merged. Also the merging happens more frequently towards the end of the file it seems. I tried to reproduce with a smaller data set (10 lines before the two lines that get merged, the two lines that gets merged and the 10 lines after that) but I was not able to reproduce on this smaller data set. However if you open this huge file in "rb" mode instead of "r" mode everything works as it should and no lines are merged at all! If I copy the file over to linux and rerun the test script no lines are merged (regardless if mode is "r" or "rb") so this is windows specific and might have something todo with the adding of \r\n if only \n is found when you open the file in "r" mode maybe? Also I have reproduced it on both python 2.3.5 and 2.5c1 on both windows XP and windows 2003. More stats on the input file in both "r" mode and "rb" mode below: Input file size: 8 695 828 KB fp = open(file, "r"): - total number of lines read: 668909 - length of the longest line: 13179792 - length of the shortest line: 89 - 56 lines contains the content of two lines - Always just two lines that are merged into one! - Always the same lines that are merged rerunning the test on the same file. open(file, "rb"): - total number of lines read: 668965 - length of the longest line: 13179793 - length of the shortest line: 90 - no lines merged Regards, Rune Devik ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2007-07-03 22:16 Message: Logged In: YES user_id=33168 Originator: NO Without a reproducible test case, there's really nothing we can do. You will need to debug this on your own. Try setting a breakpoint in the debugger in the file object, probably in get_line(). If you can make a self contained test case, then we can help. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1744752&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com