Michael Fox added the comment:

I thought about it some more and the only bug here is mine, failing to explicitly set mode='rt'.
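For illustration, here is a minimal sketch of what setting the mode explicitly might look like when the opener is picked by file extension. The helper name open_text, the _OPENERS table, and the utf-8 default are assumptions made for this example, not anything in the standard library.

```python
import bz2
import gzip
import lzma

# Hypothetical mapping from file suffix to the matching open() function.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_text(filename, encoding="utf-8"):
    """Open a plain or compressed file for line-oriented text reads."""
    for suffix, opener in _OPENERS.items():
        if filename.endswith(suffix):
            # gzip/bz2/lzma.open() default to 'rb', so ask for text explicitly.
            return opener(filename, mode="rt", encoding=encoding)
    # The builtin already defaults to text mode; spelling it out keeps
    # both branches symmetrical.
    return open(filename, mode="rt", encoding=encoding)
```

With something like this, `for line in open_text(filename):` should yield str lines whether or not the file is compressed, because every branch passes the same mode.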
Maybe back when someone invented text and binary modes they should have been clear which was to be the default for all things. Maybe when someone made the base class, io.IOBase, they should have defined an .open() with a mode that matched open(). Maybe when someone first implemented gzip.open() they should have matched open(). But that's not what happened and there's going to be lots of code out there relying on the default 'rt' for open() and 'rb' for gzip/bz2/lzma.open(). There are going to be lots of bugs in the future as people familiar with one thing assume the default is the same for the other. But oh well. It's too late to change.

On Mon, May 20, 2013 at 9:50 AM, Michael Fox <rep...@bugs.python.org> wrote:
>
> Michael Fox added the comment:
>
> I thought of an even more hazardous case:
>
> if compression == 'gz':
>     import gzip
>     open = gzip.open
> elif compression == 'xz':
>     import lzma
>     open = lzma.open
> else:
>     pass
>
> On Mon, May 20, 2013 at 9:41 AM, Michael Fox <rep...@bugs.python.org> wrote:
>>
>> Michael Fox added the comment:
>>
>> You're right. In fact, what doesn't make sense is to be doing
>> line-oriented reads on a binary file. Why was I doing that?
>>
>> I do have another quibble though. The open() function is like this:
>>
>> open(file, mode='r', buffering=-1, encoding=None,
>>      errors=None, newline=None, closefd=True, opener=None) -> file object
>>
>> The lzma.open() function is like this:
>>
>> lzma.open = open(filename, mode='rb', *, format=None, check=-1,
>>     preset=None, filters=None, encoding=None, errors=None, newline=None)
>>
>> It seems to me that it would be best for them to be as congruent as
>> possible. Because people will try to do this (I did):
>>
>> if filename.endswith('.xz'):
>>     f = lzma.open(filename)
>> else:
>>     f = open(filename)
>> for line in f: ...
>>
>> And then they will be in for a surprise. Would you consider changing
>> the default mode of lzma.open() to 'rt' and implementing the 'buffering'
>> parameter as it is implemented in open()? And further, can we discuss
>> whether "duck typing" is becoming generally problematic in an
>> expanding standard library and whether there should be some process,
>> language, testing or something to ensure the ducks really quack the
>> same?
>>
>> For example, there could be a standard testsuite which everything
>> purporting to implement an open() function should be subject to.
>>
>> On Mon, May 20, 2013 at 7:42 AM, Nadeem Vawda <rep...@bugs.python.org> wrote:
>>>
>>> Nadeem Vawda added the comment:
>>>
>>> No, that is the intended behavior for binary streams - they operate at
>>> the level of individual bytes. If you want to treat your input file as
>>> Unicode-encoded text, you should open it in text mode. This will return a
>>> TextIOWrapper which handles the decoding and line splitting properly.
>>>
>>> ----------
>>>
>>> _______________________________________
>>> Python tracker <rep...@bugs.python.org>
>>> <http://bugs.python.org/issue18003>
>>> _______________________________________

--

-
Michael