Lars Gustäbel added the comment:

Let me present for discussion a proposal (and a patch with documentation) with 
an approach that is a little different, but in my opinion the most effective. I 
hope that it will appeal to all involved.

My proposal consists of a new class SafeTarFile, that is a subclass and drop-in 
replacement for the TarFile class and can be employed whenever the user feels 
the necessity.  It can be used the same way as TarFile, with the difference 
that SafeTarFile is equipped with a wide range of tests and as soon as it 
detects anything bad it interrupts the current operation with a SecurityError 
exception. This way damage is effectively averted, and it is up to the 
developer to decide whether he rejects the archive altogether (which is the 
obvious and recommended measure) or he wants to continue to process it in a 
subsequent step (on his own responsibility).

To simplify a few common operations, SafeTarFile has three more methods: 
analyze(), filter() and is_safe(). These methods will allow access to the 
archive without SecurityError exceptions being raised. The analyze() method is 
a kind of low-level iterator that produces each TarInfo object together with a 
list of warnings (if the member is bad) as a tuple. This gives a developer 
access to all the information he needs to implement his own more differentiated 
way of handling bad archives. The filter() method is a convenience method that 
provides an iterator over all the "good" members of an archive leaving out all 
the "bad" ones. It can be used as an argument to SafeTarFile.extractall() for 
example. is_safe() is a high-level shortcut method that reduces the result of 
the analysis to a simple True or False.

SafeTarFile has a variety of checks that test e.g. for bad pathnames, bad 
permissions and duplicate files. Also, to prevent denial-of-service scenarios, 
it enforces user-defined limits upon the archive, such as a maximum number of 
files or a maxmimum size of unpacked data.

The main advantage of this approach is the higher degree of security. The 
practice of rewriting paths (e.g. like in Daniel.Garcia's patch) is 
error-prone, has side-effects and is hard to maintain because of its tendency 
towards regression. It just adds another layer of complexity to an already 
complex and delicate problem.

SafeTarFile (or whatever it will be called) is backward compatible and easy to 
maintain, because it is an isolated addition to the tarfile module. It is 
easily subclassable to add more tests. It can be used as a standalone tool to 
check for bad archives and possible denial-of-service scenarios. Its analyze() 
method tells the user exactly what's wrong with the archive instead of keeping 
it away from him. Instead of silently extracting files to locations they 
weren't expected to be stored (i.e. after "fixing" their pathnames), 
SafeTarFile simply refuses to extract them at all. This way it is far more 
transparent and understandable to the user what happens.

Feedback is welcome.

----------
assignee:  -> lars.gustaebel
priority: release blocker -> normal
versions:  -Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4
Added file: http://bugs.python.org/file35127/safetarfile-1.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21109>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to