New submission from Peter:

Under Python 2, gzip.open defaults to giving (non-unicode) strings.

Under Python 3, gzip.open defaults to giving bytes. Therefore it was fixed to 
allow text mode to be specified, see http://bugs.python.org/issue13989

In order to write Python 2 and 3 compatible code to get strings from gzip, I 
now use:

>>> import gzip
>>> handle = gzip.open(filename, "rt")

In general mode="rt" works great, but I just found that it fails under Windows 
XP running Python 2.7. The examples below use the following gzipped plain text 
file:

https://github.com/biopython/biopython/blob/master/Doc/examples/ls_orchid.gbk.gz

This works perfectly on Linux giving strings on both Python 2 and 3 - note I am 
printing with repr to confirm we have a string object:

$ python2.7 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 
'rt').readline())); import sys; print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 
30-NOV-2006\n'
2.7.10 (default, Sep 28 2015, 13:58:31) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)]

Also with a slightly newer Python 2.7,

$ /mnt/apps/python/2.7/bin/python  -c "import gzip; 
print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; 
print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 
30-NOV-2006\n'
2.7.13 (default, Mar  9 2017, 15:07:48) 
[GCC 4.9.2 20150212 (Red Hat 4.9.2-6)]

$ python3.5 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 
'rt').readline())); import sys; print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 
30-NOV-2006\n'
3.5.0 (default, Sep 28 2015, 11:25:31) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)]

$ python3.4 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 
'rt').readline())); import sys; print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 
30-NOV-2006\n'
3.4.3 (default, Aug 21 2015, 11:12:32) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]

$ python3.3 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 
'rt').readline())); import sys; print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 
30-NOV-2006\n'
3.3.0 (default, Nov  7 2012, 21:52:39) 
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)]


This works perfectly on macOS giving strings on both Python 2 and 3:


$ python2.7 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 
'rt').readline())); import sys; print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 
30-NOV-2006\n'
2.7.10 (default, Jul 30 2016, 19:40:32) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)]

$ python3.6 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 
'rt').readline())); import sys; print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 
30-NOV-2006\n'
3.6.0 (v3.6.0:41df79263a11, Dec 22 2016, 17:23:13) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]


This also works perfectly on Python 3 running on Windows XP:


C:\repositories\biopython\Doc\examples>c:\Python33\python.exe -c "import gzip; 
print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; 
print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 
30-NOV-2006\n'
3.3.5 (v3.3.5:62cf4e77f785, Mar  9 2014, 10:37:12) [MSC v.1600 32 bit (Intel)]

C:\repositories\biopython\Doc\examples>C:\Python34\python.exe -c "import gzip; 
print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; 
print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 
30-NOV-2006\n'
3.4.4 (v3.4.4:737efcadf5a6, Dec 20 2015, 19:28:18) [MSC v.1600 32 bit (Intel)]



However, it fails on Windows XP running Python 2.7.11 and, after upgrading, 
Python 2.7.13:


C:\repositories\biopython\Doc\examples>c:\Python27\python -c "import sys; 
print(sys.version); import gzip; 
print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readlines()))"
2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:42:59) [MSC v.1500 32 bit (Intel)]

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\Python27\lib\gzip.py", line 34, in open
    return GzipFile(filename, mode, compresslevel)
  File "c:\Python27\lib\gzip.py", line 94, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
ValueError: Invalid mode ('rtb')
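
The 'rtb' in the traceback suggests that Python 2.7's gzip module appends 'b' 
to any mode string that lacks it before calling the builtin open. The sketch 
below is a simplified paraphrase of that behaviour inferred from the output in 
this report, not the actual gzip.py source, and effective_mode is a 
hypothetical name used only for illustration:

```python
def effective_mode(mode):
    """Approximate the mode string passed on to the builtin open().

    Hypothetical helper: a simplified paraphrase of the behaviour
    seen in this report, not code copied from gzip.py.
    """
    # A 'b' is appended whenever the caller did not ask for binary mode,
    # so a text-mode request such as 'rt' becomes the contradictory 'rtb'.
    if mode and "b" not in mode:
        mode += "b"
    # Matches the "mode or 'rb'" fallback visible in the traceback.
    return mode or "rb"


print(effective_mode("rt"))
print(effective_mode("rb"))
print(effective_mode(None))
```

Under that assumption, "rt" turns into "rtb", while "rb" and an omitted mode 
both end up as "rb".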


Note that the strangely contradictory mode 'rtb' seems to be accepted by 
Python 2.7 under Linux or macOS:


$ python
Python 2.7.10 (default, Sep 28 2015, 13:58:31) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> gzip.open('ls_orchid.gbk.gz', 'rt')
<gzip open file 'ls_orchid.gbk.gz', mode 'rtb' at 0x7f9af30c2f60 0x7f9aed1e5e50>
>>> quit()


$ python2.7
Python 2.7.10 (default, Jul 30 2016, 19:40:32) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> gzip.open('ls_orchid.gbk.gz', 'rt')
<gzip open file 'ls_orchid.gbk.gz', mode 'rtb' at 0x10282c6f0 0x10287ef10>
>>> quit()
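
Until this is resolved, one possible workaround is to pick the mode by Python 
version, relying on the fact noted at the top that Python 2 already gives 
(non-unicode) strings by default. This is only a sketch: open_gzip_text is a 
hypothetical helper name, and I have not tested it on the Windows XP setup 
described above.

```python
import gzip
import sys


def open_gzip_text(filename):
    """Open a gzipped file so that reads give native str on Python 2 and 3.

    Hypothetical helper, shown as a sketch only.
    """
    if sys.version_info[0] < 3:
        # Python 2: binary mode already yields (non-unicode) str objects,
        # and it avoids passing the 'rtb' mode on to the builtin open().
        return gzip.open(filename, "rb")
    # Python 3: text mode is supported and yields str.
    return gzip.open(filename, "rt")
```

Usage would then be handle = open_gzip_text('ls_orchid.gbk.gz') in place of 
gzip.open('ls_orchid.gbk.gz', 'rt'). Note that on Python 2 this does no newline 
translation, so files with Windows line endings would keep their '\r\n'.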

----------
components: Library (Lib)
messages: 291259
nosy: maubp
priority: normal
severity: normal
status: open
title: gzip.open(filename, "rt") fails on Python 2.7.11 on win32, invalid mode 
rtb
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30012>
_______________________________________