Bugs item #1527974, was opened at 2006-07-24 23:00
Message generated for change (Comment added) made by arve_knudsen
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1527974&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Arve Knudsen (arve_knudsen)
Assigned to: Nobody/Anonymous (nobody)
Summary: tarfile chokes on ipython archive on Windows

Initial Comment:
I'm trying to extract files from the latest ipython tar
archive, available from
http://ipython.scipy.org/dist/ipython-0.7.2.tar.gz,
using tarfile. This is on Windows XP, using Python
2.4.3. There is only a problem if I open the archive in
stream mode (the "mode" argument to tarfile.open is
"r|gz"), in which case tarfile raises StreamError. I'd
be happy if this error could be sorted out.

The following script should trigger the error:

import tarfile

f = file(r"ipython-0.7.2.tar.gz", "rb")
tar = tarfile.open(fileobj=f, mode="r|gz")
try:
    for m in tar:
        tar.extract(m)
finally:
    tar.close()
    f.close()

The resulting exception:
Traceback (most recent call last):
  File "tst.py", line 7, in ?
    tar.extract(m)
  File "C:\Program Files\Python24\lib\tarfile.py", line 1335, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "C:\Program Files\Python24\lib\tarfile.py", line 1431, in _extract_member
    self.makelink(tarinfo, targetpath)
  File "C:\Program Files\Python24\lib\tarfile.py", line 1515, in makelink
    self._extract_member(self.getmember(linkpath), targetpath)
  File "C:\Program Files\Python24\lib\tarfile.py", line 1423, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "C:\Program Files\Python24\lib\tarfile.py", line 1461, in makefile
    copyfileobj(source, target)
  File "C:\Program Files\Python24\lib\tarfile.py", line 158, in copyfileobj
    shutil.copyfileobj(src, dst)
  File "C:\Program Files\Python24\lib\shutil.py", line 22, in copyfileobj
    buf = fsrc.read(length)
  File "C:\Program Files\Python24\lib\tarfile.py", line 551, in _readnormal
    self.fileobj.seek(self.offset + self.pos)
  File "C:\Program Files\Python24\lib\tarfile.py", line 420, in seek
    raise StreamError, "seeking backwards is not allowed"
tarfile.StreamError: seeking backwards is not allowed

----------------------------------------------------------------------

>Comment By: Arve Knudsen (arve_knudsen)
Date: 2006-07-27 00:20

Message:
Logged In: YES 
user_id=1522083

Regarding my last comment, sorry about the noise. After
giving it some more thought I realized it was not very
realistic implementation-wise, seeing as you can't know
whether a file will be linked to when you encounter it in
the stream (right?).

So I followed your suggestion instead and handled the links 
on the client level. What I think I'd like to see in 
TarFile though is an 'extractall' method with the ability 
to report progress to an optional callback, since I'm only 
opening in stream mode as a hack to implement this myself 
(by monitoring file position). From browsing tarfile's 
source it seems it might require some effort though (with 
e.g. BZ2File you can't know the amount of data without 
decompressing everything?).
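
A minimal sketch of the "monitor file position" idea described above, written for a current Python 3 rather than the 2.4 series discussed in this thread. The names CountingReader and extract_with_progress are hypothetical and not part of tarfile. It counts compressed bytes read from the archive file, so progress is measured against the size of the .tar.gz on disk, not against the decompressed data:

```python
import os
import tarfile


class CountingReader:
    """Wrap a binary file object and count the bytes read through it."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self.bytes_read = 0

    def read(self, size=-1):
        data = self._fileobj.read(size)
        self.bytes_read += len(data)
        return data


def extract_with_progress(archive_path, dest, callback):
    """Extract a .tar.gz in stream mode, reporting compressed bytes consumed.

    callback(bytes_read, total_bytes) is called after each extracted member.
    """
    total = os.path.getsize(archive_path)
    # Newer Python versions accept an extraction filter; older ones do not.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with open(archive_path, "rb") as raw:
        counter = CountingReader(raw)
        with tarfile.open(fileobj=counter, mode="r|gz") as tar:
            for member in tar:
                tar.extract(member, path=dest, **extra)
                callback(counter.bytes_read, total)
```

Because the stream is read in blocks, the reported position advances in steps and can run ahead of the member that was just written, so it is only a rough indicator.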

----------------------------------------------------------------------

Comment By: Arve Knudsen (arve_knudsen)
Date: 2006-07-25 11:58

Message:
Logged In: YES 
user_id=1522083

Yes, I admit that is a weakness of my proposed approach.
Perhaps it would be a better idea to extract hardlinked
files to a temporary location and copy those files when
needed, as a cache? The only problem that I can think of
with this approach is the overhead, but perhaps this could
be configurable through a keyword if you think it would pose
a significant problem (i.e. keeping extra copies of
potentially huge files)?

The temporary cache would be private to tarfile, so there
should be no need to worry about modifications to the
contained files.

----------------------------------------------------------------------

Comment By: Lars Gustäbel (gustaebel)
Date: 2006-07-25 11:31

Message:
Logged In: YES 
user_id=642936

Copying the previously extracted file is not an option. When the
archive is extracted inside a loop, you never know what
happens between two extract() calls. The original file could
have been renamed, changed or removed. Suppose you want to
extract just those members which are hard links:

for tarinfo in tar:
    if tarinfo.islnk():
        tar.extract(tarinfo)

I agree with you that the error message is bad because it
does not give the slightest idea of what's going wrong. I'll
see what I can do about that.

To work around your particular problem, my idea is to
subclass the TarFile class and replace the makelink() method
with one that simply copies the file as you proposed.
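
A rough sketch of the copy-instead-of-link workaround, written for a current Python 3. Instead of overriding makelink(), which is an internal method whose signature may differ between versions, it handles hard links in the extraction loop using only TarInfo.islnk() and TarInfo.linkname. The function name extract_stream_copying_links is hypothetical. It assumes the link target was extracted earlier in the same run and has not been renamed, changed or removed since, which is exactly the caveat raised above:

```python
import os
import shutil
import tarfile


def extract_stream_copying_links(fileobj, dest):
    """Extract a gzip-compressed tar stream, copying files in place of hard links.

    Assumes a trusted archive: link paths are not checked for path traversal.
    """
    # Newer Python versions accept an extraction filter; older ones do not.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            if member.islnk():
                # linkname is the archive path of the member the link refers to.
                source = os.path.join(dest, member.linkname)
                target = os.path.join(dest, member.name)
                os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
                # No backwards seek is needed: the data comes from disk.
                shutil.copyfile(source, target)
            else:
                tar.extract(member, path=dest, **extra)
```

If the link target is missing on disk, shutil.copyfile raises an error, which at least points at the real cause.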

----------------------------------------------------------------------

Comment By: Arve Knudsen (arve_knudsen)
Date: 2006-07-25 10:59

Message:
Logged In: YES 
user_id=1522083

Thanks for the clarification, Lars. I'd prefer to continue
with my current approach however, since it allows me to
report progress as the tarfile is unpacked/decompressed.
Also, I don't think it would be satisfactory at all if
tarfile would just die with a mysterious error in such cases.

In order to resolve this, why must tarfile extract the file
again, can't it copy the already extracted file?

----------------------------------------------------------------------

Comment By: Lars Gustäbel (gustaebel)
Date: 2006-07-25 10:42

Message:
Logged In: YES 
user_id=642936

The traceback tells me that there is a hard link inside the
archive, which means that a file in the archive is referenced
twice. This hard link can be extracted only on platforms
that have an os.link() function. On Win32 they're not
supported by the file system, but tarfile works around this
by extracting the referenced file twice. In order to extract
the file the second time it is necessary that tarfile seeks
back in the input file to access the file's data again. But
"seeking backwards is not allowed" when a file is opened in
streaming mode ;-)
If you do not necessarily need streaming mode for your
application, better use "r:gz" or "r" and the problem will
be gone.
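
To check whether a given archive contains hard links at all, something like the following sketch could be used (current Python 3 syntax; find_hard_links is a hypothetical helper name). It opens the archive in the seekable "r:gz" mode recommended above and only reads member headers:

```python
import tarfile


def find_hard_links(archive_path):
    """Return (name, linkname) pairs for every hard-link member."""
    with tarfile.open(archive_path, mode="r:gz") as tar:
        return [(m.name, m.linkname) for m in tar if m.islnk()]
```

An empty result would suggest that stream mode is unlikely to hit the backwards-seek error for that archive.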

----------------------------------------------------------------------

Comment By: Arve Knudsen (arve_knudsen)
Date: 2006-07-25 10:04

Message:
Logged In: YES 
user_id=1522083

Ok, I've verified now that the problem persists with Python
2.4.4 (from the 2.4 branch in svn). The exact same thing
happens.

----------------------------------------------------------------------

Comment By: Arve Knudsen (arve_knudsen)
Date: 2006-07-25 09:29

Message:
Logged In: YES 
user_id=1522083

Well yeah, it appears to be Windows specific. I just tested
on Linux (Ubuntu), also with Python 2.4.3. I'll try 2.4.3+
on Windows to see if it makes any difference. Come to think
of it, I think I experienced this problem in the past on
Linux, but then I solved it by repacking ipython. Also, if I
pack it myself on Windows using bsdtar it works fine.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-07-25 05:35

Message:
Logged In: YES 
user_id=33168

I tested this on Linux with both 2.5 and 2.4.3+ without
problems.  I believe there were some fixes in this area. 
Could you try testing with the 2.4.3+ current which will
become 2.4.4 (or 2.5b2)?  If this is still a problem, it
looks like it may be Windows specific.

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list 