Lars Gustäbel added the comment:

In the past, our answer to these kinds of bug reports has always been that you 
must not extract an archive from an untrusted source without making sure that 
it has no malicious contents. And that tarfile conforms to the posix 
specifications with respect to extraction of files and pathname resolution.  
That's why we put this prominent warning in the documentation, and I think its 
advice still holds.

I don't think that this issue should be marked as a release blocker, because 
the way tarfile currently works was a conscious decision, not an accident.  
tarfile does what it is designed to do: it processes a sequence of instructions 
to store a number of files in the filesystem. So the attack that is described 
by Daniel Garcia exploits neither a bug in tarfile nor a loophole in the tar 
archive format. A necessary condition for this attack to work is that the 
attacker has to trick the user into extracting the malicious archive first. 
After that, tarfile interprets the contained instructions word-for-word but 
still only within the boundaries defined by the user's privileges.

I think it is obvious that it is potentially dangerous to extract tar archives 
we didn't create ourselves, because we actually give another person direct 
access to our filesystem. tarfile could mitigate some of the adverse effects, 
but this will not change the fact that it remains unsafe to use tarfile to a 
certain degree unless you use it with your own data or take reasonable 
precautions.

Anyway, if we come to the conclusion that we want to eliminate this kind of 
attack, we must be aware that there is a lot more to do than that. tarfile as 
it is today is vulnerable to all known attacks against tar programs, and maybe 
even a few more that rely on its specific implementation.


1. Path traversal:

    The archive contains files names e.g. /etc/passwd or ../etc/passwd.

2. Symlink file attack:

    foo links to /etc/passwd.
    Another member named foo follows, its data overwrites the target file's 
data.

3. Symlink directory attack:

    foo links to /etc.
    The following member foo/passwd overwrites /etc/passwd.

4. Hardlink attack:

    Hardlink member foo links to /etc/passwd.
    tarfile creates the hardlink to /etc/passwd because it cannot find it 
inside the archive and falls back to the one in the filesystem.
    Another file named foo follows, its data overwrites /etc/passwd's data.

5. Permission manipulation:

    The archive contains an executable that is placed somewhere in PATH with 
its setuid flag set, so that an unprivileged user is able to gain root 
privileges.

6. Device file attacks:

    The archive contains a device node foo with the same major and minor 
numbers as an attached device.
    Another member named foo follows, its data is written to the device.

7. Huge zero file attacks:

    Bzip2 and lzma allow it to store huge blobs of repetetive data in tiny 
archives. When unpacked this data may fill up an entire filesystem.

8. Excessive memory usage:

    tarfile saves one TarInfo object per member it finds in an archive. If the 
archive contains several millions of members, this may fill up the memory.

9. Saving a huge sparse file:

    tarfile is unable to detect holes in sparse files and thus cannot store 
them efficiently. Archiving a huge sparse file can take very long and may lead 
to a very big archive that fills up the filesystem.


Additionally, there are more issues mentioned in the GNU tar manual:

  https://www.gnu.org/software/tar/manual/html_node/Security.html


In conclusion, I like to emphasize that tarfile is a library, it is no 
replacement for GNU tar. And as a library it has a different focus, it is 
merely a building block for an application, and has to be used with a little 
bit of responsibility. And even if we start to implement all possible checks, 
I'm afraid we never can do without a warning in the documentation that reminds 
everyone to keep an eye on what they're doing.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21109>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to