[issue18003] New lzma crazy slow with line-oriented reading.

Michael Fox Mon, 20 May 2013 09:52:19 -0700

Michael Fox added the comment:

I thought of an even more hazardous case:


if compression == 'gz':
    import gzip
    open = gzip.open
elif compression == 'xz':
    import lzma
    open = lzma.open
else:
    pass

On Mon, May 20, 2013 at 9:41 AM, Michael Fox <rep...@bugs.python.org> wrote:
>
> Michael Fox added the comment:
>
> You're right. In fact, what doesn't make sense is to be doing
> line-oriented reads on a binary file. Why was I doing that?
>
> I do have another quibble though. The open() function is like this:
>
> open(file, mode='r', buffering=-1, encoding=None,
>          errors=None, newline=None, closefd=True, opener=None) -> file object
>
> The lzma.open() function is like this:
>
> lzma.open = open(filename, mode='rb', *, format=None, check=-1,
> preset=None, filters=None, encoding=None, errors=None, newline=None)
>
> It seems to me that it would be best for them to be as congruent as
> possible. Because people will try to do this (I did):
>
> if filename.endswith('.xz'):
>     f = lzma.open(filename)
> else:
>     f = open(filename)
> for line in f: ...
>
> And then they will be in for a surprise. Would you consider changing
> the default mode of lzma.open() to 'rt' and implement the 'buffering'
> parameter as it is implemented in open()? And further, can we discuss
> whether "duck typing" is becoming generally problematic in an
> expanding standard library and whether there should be some process,
> language, testing or something to ensure the ducks really quack the
> same?
>
> For example, there could be a standard testsuite which everything
> purporting to implement an open() function should be subject to.
>
> On Mon, May 20, 2013 at 7:42 AM, Nadeem Vawda <rep...@bugs.python.org> wrote:
>>
>> Nadeem Vawda added the comment:
>>
>> No, that is the intended behavior for binary streams - they operate at
>> the level of individual byes. If you want to treat your input file as
>> Unicode-encoded text, you should open it in text mode. This will return a
>> TextIOWrapper which handles the decoding and line splitting properly.
>>
>> ----------
>>
>> _______________________________________
>> Python tracker <rep...@bugs.python.org>
>> <http://bugs.python.org/issue18003>
>> _______________________________________
>
> --
>
> -
> Michael
>
> ----------
>
> _______________________________________
> Python tracker <rep...@bugs.python.org>
> <http://bugs.python.org/issue18003>
> _______________________________________

-- 

-
Michael

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18003>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue18003] New lzma crazy slow with line-oriented reading.

Reply via email to