New submission from Peter:

Under Python 2, gzip.open defaults to giving (non-unicode) strings.
Under Python 3, gzip.open defaults to giving bytes. Therefore it was fixed to allow text mode to be specified, see http://bugs.python.org/issue13989

In order to write Python 2 and 3 compatible code to get strings from gzip, I now use:

>>> import gzip
>>> handle = gzip.open(filename, "rt")

In general mode="rt" works great, but I just found this fails under Windows XP running Python 2.7. The examples below use the following gzipped plain text file:

https://github.com/biopython/biopython/blob/master/Doc/examples/ls_orchid.gbk.gz

This works perfectly on Linux giving strings on both Python 2 and 3 (note I am printing with repr to confirm we have a string object):

$ python2.7 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS Z78533 740 bp DNA linear PLN 30-NOV-2006\n'
2.7.10 (default, Sep 28 2015, 13:58:31)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)]

Also with a slightly newer Python 2.7,

$ /mnt/apps/python/2.7/bin/python -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS Z78533 740 bp DNA linear PLN 30-NOV-2006\n'
2.7.13 (default, Mar 9 2017, 15:07:48)
[GCC 4.9.2 20150212 (Red Hat 4.9.2-6)]

$ python3.5 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS Z78533 740 bp DNA linear PLN 30-NOV-2006\n'
3.5.0 (default, Sep 28 2015, 11:25:31)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)]

$ python3.4 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS Z78533 740 bp DNA linear PLN 30-NOV-2006\n'
3.4.3 (default, Aug 21 2015, 11:12:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]

$ python3.3 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS Z78533 740 bp DNA linear PLN 30-NOV-2006\n'
3.3.0 (default, Nov 7 2012, 21:52:39)
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)]

This works perfectly
on macOS giving strings on both Python 2 and 3:

$ python2.7 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS Z78533 740 bp DNA linear PLN 30-NOV-2006\n'
2.7.10 (default, Jul 30 2016, 19:40:32)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)]

$ python3.6 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS Z78533 740 bp DNA linear PLN 30-NOV-2006\n'
3.6.0 (v3.6.0:41df79263a11, Dec 22 2016, 17:23:13)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

This works perfectly on Python 3 running on Windows XP,

C:\repositories\biopython\Doc\examples>c:\Python33\python.exe -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS Z78533 740 bp DNA linear PLN 30-NOV-2006\n'
3.3.5 (v3.3.5:62cf4e77f785, Mar 9 2014, 10:37:12) [MSC v.1600 32 bit (Intel)]

C:\repositories\biopython\Doc\examples>C:\Python34\python.exe -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS Z78533 740 bp DNA linear PLN 30-NOV-2006\n'
3.4.4 (v3.4.4:737efcadf5a6, Dec 20 2015, 19:28:18) [MSC v.1600 32 bit (Intel)]

However, it fails on Windows XP running Python 2.7.11 and (after upgrading) Python 2.7.13:

C:\repositories\biopython\Doc\examples>c:\Python27\python -c "import sys; print(sys.version); import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readlines()))"
2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:42:59) [MSC v.1500 32 bit (Intel)]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\Python27\lib\gzip.py", line 34, in open
    return GzipFile(filename, mode, compresslevel)
  File "c:\Python27\lib\gzip.py", line 94, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
ValueError: Invalid mode ('rtb')

Note that the strangely contradictory mode seems to be accepted by Python
2.7 under Linux or macOS:

$ python
Python 2.7.10 (default, Sep 28 2015, 13:58:31)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> gzip.open('ls_orchid.gbk.gz', 'rt')
<gzip open file 'ls_orchid.gbk.gz', mode 'rtb' at 0x7f9af30c2f60 0x7f9aed1e5e50>
>>> quit()

$ python2.7
Python 2.7.10 (default, Jul 30 2016, 19:40:32)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> gzip.open('ls_orchid.gbk.gz', 'rt')
<gzip open file 'ls_orchid.gbk.gz', mode 'rtb' at 0x10282c6f0 0x10287ef10>
>>> quit()

----------
components: Library (Lib)
messages: 291259
nosy: maubp
priority: normal
severity: normal
status: open
title: gzip.open(filename, "rt") fails on Python 2.7.11 on win32, invalid mode rtb
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30012>
_______________________________________
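
A possible workaround for the failure reported above, offered only as a sketch. The traceback shows Python 2.7's gzip module passing the combined mode 'rtb' to the built-in open, which the Windows build rejects. Since Python 2 already returns str from a gzip handle opened in binary mode, one option is to request "rt" only on Python 3. The helper name open_gzip_text and the temporary demo file are hypothetical, not part of the gzip module or of Biopython, and on Python 2 this path does no decoding or newline translation.

```python
import gzip
import os
import sys
import tempfile


def open_gzip_text(filename):
    """Hypothetical helper: open a gzip file so that reads give native str."""
    if sys.version_info[0] >= 3:
        # Python 3: "rt" asks gzip.open for a text-mode (decoding) wrapper.
        return gzip.open(filename, "rt")
    # Python 2: binary mode already yields str, and it avoids the 'rtb'
    # mode that is rejected on Windows in the report above.
    return gzip.open(filename, "rb")


# Self-check with a small made-up file (not the ls_orchid.gbk.gz example).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt.gz")
with gzip.open(path, "wb") as out:
    out.write(b"LOCUS demo\n")

handle = open_gzip_text(path)
first_line = handle.readline()
handle.close()

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(tmp_dir)

print(repr(first_line))
```

This keeps the calling code identical on both major versions, at the cost of a version check inside the helper. It does not address whether gzip.open in Python 2.7 should itself accept or strip the "t" flag.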