Re: Problem with tarfile module to open *.tar.gz files - unreliable ?

Dave Angel Fri, 20 Aug 2010 07:14:14 -0700

m_ahlenius wrote:

On Aug 20, 6:57 am, m_ahlenius <ahleni...@gmail.com> wrote:

On Aug 20, 5:34 am, Dave Angel <da...@ieee.org> wrote:

m_ahlenius wrote:

Hi,

I am relatively new to doing serious work in Python. I am using it to
access a large number of log files.  Some of the logs get corrupted
and I need to detect that when processing them.  This code seems to
work for quite a few of the logs (all same structure).  It also
correctly identifies some corrupt logs but then it identifies others
as being corrupt when they are not.

example error msg from below code:

Could not open the log file: '/disk/7-29-04-02-01.console.log.tar.gz'
Exception: CRC check failed 0x8967e931 != 0x4e5f1036L

When I manually examine the supposed corrupt log file and use
"tar -xzvof /disk/7-29-04-02-01.console.log.tar.gz" on it, it opens
just fine.

Is there anything wrong with how I am using this module? (extra code
removed for clarity)

if tarfile.is_tarfile( file ):
        try:
            xf = tarfile.open( file, "r:gz" )
            for locFile in xf:
                logfile = xf.extractfile( locFile )
                validFileFlag = True
                # iterate through each log file, grab the first and the last lines
                lines = iter( logfile )
                firstLine = lines.next()
                for nextLine in lines:
                    ....
                        continue
                logfile.close()
                ...
            xf.close()
        except Exception, e:
            validFileFlag = False
            msg = "\nCould not open the log file: " + repr(file) + " Exception: " + str(e) + "\n"
else:
        validFileFlag = False
        lTime = extractFileNameTime( file )
        msg = ">>>>>>> Warning " + file + " is NOT a valid tar archive\n"
        print msg

I haven't used tarfile, but this feels like a problem with the Win/Unix
line endings.  I'm going to assume you're running on Windows, which
could trigger the problem I'm going to describe.

You use 'file' to hold something, but don't show us what. In fact, it's
a lousy name, since it's already a Python builtin. But if it's holding
fileobj, that you've separately opened, then you need to change that
open to use mode 'rb'.

The problem, if I've guessed right, is that occasionally you'll
accidentally encounter a 0d0a sequence in the middle of the (binary)
compressed data.  If you're on Windows, and use the default 'r' mode,
it'll be changed into a 0a byte.  Thus corrupting the checksum, and
eventually the contents.
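
To make the effect concrete, here is a minimal sketch in Python 3 (the thread itself uses Python 2). It does not need Windows: it imitates the text-mode read by replacing 0d0a with 0a by hand. The helper names and the random test payload are made up for illustration, and the search loop simply draws payloads until the compressed bytes happen to contain a 0d0a pair.

```python
import gzip
import random
import zlib


def find_payload_with_crlf(size=65536):
    """Return (payload, compressed) where the compressed bytes contain 0d 0a."""
    seed = 0
    while True:
        rng = random.Random(seed)
        payload = bytes(rng.getrandbits(8) for _ in range(size))
        # mtime=0 keeps a timestamp out of the gzip header
        compressed = gzip.compress(payload, mtime=0)
        if b"\r\n" in compressed:
            return payload, compressed
        seed += 1


def simulate_text_mode_read(raw):
    """Imitate a Windows text-mode read: every CR LF pair becomes LF."""
    return raw.replace(b"\r\n", b"\n")


def survives(compressed, payload):
    """True only if the bytes still decompress to the original payload."""
    try:
        return gzip.decompress(compressed) == payload
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile (CRC or length mismatch) is a subclass of OSError
        return False


payload, good = find_payload_with_crlf()
bad = simulate_text_mode_read(good)
print("intact stream ok:    ", survives(good, payload))
print("translated stream ok:", survives(bad, payload))
```

The untouched stream should decompress cleanly, while the translated one should fail, typically with a CRC or end-of-stream error much like the one reported above.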

DaveA

Hi,

thanks for the comments - I'll change the variable name.

I am running this on Linux so I don't think it's a Windows issue.  So if
that's the case, is the 0d0a still an issue?

'mark


Oh, and what's currently stored in the file var is just the unopened
pathname to the target file I want to open.

No, on Linux, there should be no such problem. And I have to assume
that if you pass the filename as a string, the library would use 'rb'
anyway. It's just if you pass a fileobj, AND are on Windows.
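
For reference, the two ways of handing the archive to tarfile look roughly like this. This is a sketch in Python 3 rather than the Python 2 of the original post; the sample archive, the member name, and the helper functions are invented for the example.

```python
import io
import os
import tarfile
import tempfile


def build_sample_archive(path):
    """Write a tiny .tar.gz with one log member (sample data, made up here)."""
    data = b"first line\nmiddle line\nlast line\n"
    info = tarfile.TarInfo(name="sample.console.log")
    info.size = len(data)
    with tarfile.open(path, "w:gz") as tar:
        tar.addfile(info, io.BytesIO(data))


def first_and_last_lines(tar):
    """Return {member name: (first line, last line)} for regular files."""
    result = {}
    for member in tar:
        if not member.isfile():
            continue  # extractfile() returns None for directories etc.
        with tar.extractfile(member) as log:
            lines = log.read().splitlines()
        if lines:
            result[member.name] = (lines[0], lines[-1])
    return result


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.tar.gz")
    build_sample_archive(path)

    # Case 1: pass the pathname; tarfile opens the file itself.
    with tarfile.open(path, "r:gz") as tar:
        by_name = first_and_last_lines(tar)

    # Case 2: pass an already-open file object; it must be binary ("rb").
    with open(path, "rb") as raw:
        with tarfile.open(fileobj=raw, mode="r:gz") as tar:
            by_fileobj = first_and_last_lines(tar)

print(by_name)
print(by_fileobj)
```

Both cases should give the same result. Only the second one leaves the open mode up to the caller, which is where a text-mode open on Windows could bite.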

Sorry I wasted your time, but nobody else had answered, and I hoped it
might help.


DaveA

--
http://mail.python.org/mailman/listinfo/python-list
