New submission from K Richard Pixley:

There's a problem with tarfile. Write a program to traverse the contents of a modest-sized tar archive. Make sure your tar archive is compressed. Then read the tar archive with your program.
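A traversal program of the kind described here might look like the sketch below. It is not the reporter's actual program: the function names and the sample-archive helper are hypothetical, it is written for Python 3 rather than the 2.7 named in the report, and reading the whole decompressed stream into memory is only one guess at what decompressing with gzip before handing the data to tarfile could mean. Both functions read every regular member, so they can be timed against each other on a real archive.

```python
import gzip
import io
import tarfile


def build_sample_archive(path, members=5, size=1024):
    """Create a small gzip-compressed tar archive to experiment with (hypothetical helper)."""
    with tarfile.open(path, "w:gz") as tar:
        for i in range(members):
            data = str(i).encode("ascii") * size
            info = tarfile.TarInfo(name="member_%d.txt" % i)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def traverse_with_tarfile(path):
    """Let tarfile decompress the archive itself; look members up by name."""
    total = 0
    with tarfile.open(path, "r:gz") as tar:
        for name in tar.getnames():
            member = tar.getmember(name)
            if member.isfile():
                total += len(tar.extractfile(member).read())
    return total


def traverse_with_gzip_first(path):
    """Decompress once with gzip, then give the plain tar bytes to tarfile.

    Assumption: the decompressed archive fits in memory.
    """
    with gzip.open(path, "rb") as gz:
        raw = gz.read()
    total = 0
    # mode "r:" means: read, with no compression layer
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                total += len(tar.extractfile(member).read())
    return total
```

Wrapping each call with `time.time()` on a larger archive gives a rough comparison; the sample archive built here is too small to show a meaningful difference.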
I'm finding that allowing tarfile to read a compressed archive costs me a performance penalty on the order of 60x compared with opening the file with gzip and then passing the gzip contents to tarfile. Programs that could take a few minutes are literally taking a few hours when using tarfile.

This seems stupid. The tarfile library could do the same thing I'm doing manually. In fact, I had assumed that it would, and I was surprised by the performance I was seeing, so I ran with the profiler and saw millions of decompression calls. It's almost as though the tarfile library is decompressing the entire archive for every member extraction.

Note that you can get even worse performance if you sort the member names and then extract in that order. I'm not sure whether this "should" matter, since the tar file order is sequential.

----------
components: Library (Lib)
messages: 195232
nosy: teamnoir
priority: normal
severity: normal
status: open
title: pathological performance using tarfile
type: performance
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18744>
_______________________________________