[issue1597011] Reading with bz2.BZ2File() returns one garbage character

Sean Reifschneider Tue, 28 Aug 2007 03:26:15 -0700

Sean Reifschneider added the comment:

There are some bugs in the bz2 module.  The problem boils down to the
following code, notice how *c is assigned *BEFORE* the check to see if
there was a read error:


   do {
      BZ2_bzRead(&bzerror, f->fp, &c, 1);
      f->pos++;
      *buf++ = c;
   } while (bzerror == BZ_OK && c != '\n' && buf != end);

This could be fixed by putting a "if (bzerror == BZ_OK) break;" after
the BZ2_bzRead() call.

However, I also noticed that in the universal newline section of the
code it is reading a character, incrementing f->pos, *THEN* checking if
buf == end and if so is throwing away the character.

I changed the code around so that the read loop is unified between
universal newlines and regular newlines.  I guess this is a small
performance penalty, since it's checking the newline mode for each
character read, however we're already doing a system call for every
character so one additional comparison and jump to merge duplicate code
for maintenance reasons is probably a good plan.  Especially since the
reason for this bug only existed in one of the two duplicated parts of
the code.

Please let me know if this looks good to commit.

----------
nosy: +jafo

_____________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1597011>
_____________________________________

Index: Modules/bz2module.c
===================================================================
--- Modules/bz2module.c	(revision 57479)
+++ Modules/bz2module.c	(working copy)
@@ -249,24 +249,20 @@
 
 	for (;;) {
 		Py_BEGIN_ALLOW_THREADS
-		if (univ_newline) {
-			while (1) {
-				BZ2_bzRead(&bzerror, f->fp, &c, 1);
-				f->pos++;
-				if (bzerror != BZ_OK || buf == end)
-					break;
+		while (buf != end) {
+			BZ2_bzRead(&bzerror, f->fp, &c, 1);
+			if (bzerror != BZ_OK) break;
+			f->pos++;
+			if (univ_newline) {
 				if (skipnextlf) {
 					skipnextlf = 0;
 					if (c == '\n') {
-						/* Seeing a \n here with
-						 * skipnextlf true means we
+						/* Seeing a \n here with skipnextlf true means we
 						 * saw a \r before.
 						 */
 						newlinetypes |= NEWLINE_CRLF;
-						BZ2_bzRead(&bzerror, f->fp,
-							   &c, 1);
-						if (bzerror != BZ_OK)
-							break;
+						BZ2_bzRead(&bzerror, f->fp, &c, 1);
+						if (bzerror != BZ_OK) break;
 					} else {
 						newlinetypes |= NEWLINE_CR;
 					}
@@ -276,17 +272,12 @@
 					c = '\n';
 				} else if ( c == '\n')
 					newlinetypes |= NEWLINE_LF;
-				*buf++ = c;
-				if (c == '\n') break;
 			}
-			if (bzerror == BZ_STREAM_END && skipnextlf)
-				newlinetypes |= NEWLINE_CR;
-		} else /* If not universal newlines use the normal loop */
-			do {
-				BZ2_bzRead(&bzerror, f->fp, &c, 1);
-				f->pos++;
-				*buf++ = c;
-			} while (bzerror == BZ_OK && c != '\n' && buf != end);
+			*buf++ = c;
+			if (c == '\n') break;
+		}
+		if (univ_newline && bzerror == BZ_STREAM_END && skipnextlf)
+			newlinetypes |= NEWLINE_CR;
 		Py_END_ALLOW_THREADS
 		f->f_newlinetypes = newlinetypes;
 		f->f_skipnextlf = skipnextlf;

_______________________________________________
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue1597011] Reading with bz2.BZ2File() returns one garbage character

Reply via email to