New submission from Serhiy Storchaka:

TarFile.list() fails on some files, in particular on Lib/test/testtar.tar.

>>> import tarfile
>>> tarfile.open('Lib/test/testtar.tar').list()
?rw-r--r-- tarfile/tarfile 7011 2003-01-06 01:19:43 ustar/conttype
?rw-r--r-- tarfile/tarfile 7011 2003-01-06 01:19:43 ustar/regtype
?rwxr-xr-x tarfile/tarfile 0 2003-01-06 01:19:43 ustar/dirtype/
?rwxr-xr-x tarfile/tarfile 255 2003-01-06 01:19:43 ustar/dirtype-with-size/
?rw-r--r-- tarfile/tarfile 0 2003-01-06 01:19:43 ustar/lnktype link to ustar/regtype
?rwxrwxrwx tarfile/tarfile 0 2003-01-06 01:19:43 ustar/symtype -> regtype
?rw-rw---- tarfile/tarfile 3,0 2003-01-06 01:19:43 ustar/blktype
?rw-rw-rw- tarfile/tarfile 1,3 2003-01-06 01:19:43 ustar/chrtype
?rw-r--r-- tarfile/tarfile 0 2003-01-06 01:19:43 ustar/fifotype
?rw-r--r-- tarfile/tarfile 86016 2003-01-06 01:19:43 ustar/sparse
?rw-r--r-- tarfile/tarfile 7011 2003-01-06 01:19:43 Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/serhiy/py/cpython/Lib/tarfile.py", line 1846, in list
    print(tarinfo.name + ("/" if tarinfo.isdir() else ""), end=' ')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in position 14: surrogates not allowed

Command-line interface of the tarfile module also fails:

$ ./python -m tarfile -v -l Lib/test/testtar.tar
?rw-r--r-- tarfile/tarfile 7011 2003-01-06 01:19:43 ustar/conttype
?rw-r--r-- tarfile/tarfile 7011 2003-01-06 01:19:43 ustar/regtype
?rwxr-xr-x tarfile/tarfile 0 2003-01-06 01:19:43 ustar/dirtype/
?rwxr-xr-x tarfile/tarfile 255 2003-01-06 01:19:43 ustar/dirtype-with-size/
?rw-r--r-- tarfile/tarfile 0 2003-01-06 01:19:43 ustar/lnktype link to ustar/regtype
?rwxrwxrwx tarfile/tarfile 0 2003-01-06 01:19:43 ustar/symtype -> regtype
?rw-rw---- tarfile/tarfile 3,0 2003-01-06 01:19:43 ustar/blktype
?rw-rw-rw- tarfile/tarfile 1,3 2003-01-06 01:19:43 ustar/chrtype
?rw-r--r-- tarfile/tarfile 0 2003-01-06 01:19:43 ustar/fifotype
?rw-r--r-- tarfile/tarfile 86016 2003-01-06 01:19:43 ustar/sparse
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/runpy.py", line 160, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/serhiy/py/cpython/Lib/runpy.py", line 73, in _run_code
    exec(code, run_globals)
  File "/home/serhiy/py/cpython/Lib/tarfile.py", line 2500, in <module>
    main()
  File "/home/serhiy/py/cpython/Lib/tarfile.py", line 2444, in main
    tf.list(verbose=args.verbose)
  File "/home/serhiy/py/cpython/Lib/tarfile.py", line 1846, in list
    print(tarinfo.name + ("/" if tarinfo.isdir() else ""), end=' ')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in position 14: surrogates not allowed
?rw-r--r-- tarfile/tarfile 7011 2003-01-06 01:19:43 serhiy@raxxla:~/py/cpython$

----------
components: IO, Library (Lib), Unicode
messages: 205475
nosy: benjamin.peterson, ezio.melotti, haypo, lars.gustaebel, lemburg, pitrou, serhiy.storchaka
priority: normal
severity: normal
status: open
title: TarFile.list() fails on some files
type: behavior
versions: Python 2.7, Python 3.3, Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19920>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
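
For readers unfamiliar with the error above, here is a minimal sketch of what probably goes wrong. It assumes the member name was decoded with the "surrogateescape" error handler, which is what the '\udcc4' in the message suggests (a lone surrogate standing in for the undecodable byte 0xC4). The byte string and the helper printable() are made up for illustration; they are not taken from testtar.tar or from tarfile.py, and this is not necessarily how the module should be fixed.

```python
def printable(text, encoding="utf-8"):
    """Hypothetical helper: escape characters that `encoding` cannot represent."""
    return text.encode(encoding, "backslashreplace").decode(encoding)

# Illustrative name containing a byte (0xC4) that is not valid UTF-8.
raw = b"example/name-\xc4"

# Decoding with surrogateescape maps the bad byte to the lone surrogate U+DCC4.
name = raw.decode("utf-8", "surrogateescape")

# A strict UTF-8 encode (what print() needs on a UTF-8 stdout) rejects it.
try:
    name.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# Escaping first yields text that any UTF-8 stream can accept.
safe = printable(name)
print(safe)
```

With an approach like this the listing would show the escaped form of the name instead of aborting halfway through the archive.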