K Richard Pixley added the comment:

I see your point.

The alternative would be to limit the size of archive that can be extracted 
from to the size of virtual memory, which is essentially what I'm doing 
manually.  Either way, someone will be surprised.  I'm not sure which way will 
result in the least surprise, since I suspect that far more people will be 
extracting from compressed archives than will be extracting very large 
archives.  The failure mode with a limited archive size seems much less 
frequent, but also much more annoying.  In comparison, the failure when 
reading compressed archives (and the pathological case is effectively a 
failure) seems much more common to me, although, granted, it is not a total 
failure.

I think this should be mentioned in the doc because I, at least, was extremely 
surprised by this behavior and it cost me some time to track it down.  I might 
suggest something along the lines of:

Be careful when working with compressed archives.  In order to support the 
largest file sizes possible, some access patterns may result in pathological 
behavior, causing the original archive to be decompressed, in full, many times.  
You should be able to avoid this behavior if you traverse the TarInfo items in 
file order.  You might also consider decompressing the archive into memory 
first and then handing the in-memory copy to tarfile for processing.

----------
status: pending -> open

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18744>
_______________________________________