Re: Checking Common File Types

rusi Sun, 01 Dec 2013 19:12:08 -0800

On Monday, December 2, 2013 5:11:15 AM UTC+5:30, jade wrote:
> > To: pytho...@python.org
> > From: wlf...@ix.netcom.com
> > Subject: Re: Checking Common File Types
> > Date: Sun, 1 Dec 2013 18:23:22 -0500
> > 
> > On Sun, 1 Dec 2013 18:27:16 +0000, jade <jade...@msn.com> declaimed the
> > following:
> > 
> > >Hello, 
> > >I'm trying to create a script that checks all the files in my 'downloaded' 
> > >directory against common file types and then tells me how many of the 
> > >files in that directory aren't either a GIF or a JPG file. I'm familiar 
> > >with basic Python but this is the first time I've attempted anything like 
> > >this and I'm looking for a little help or a point in the right direction? 
> > >
> > >file_sigs = {'\xFF\xD8\xFF':('JPEG','jpg'),  '\x47\x49\x46':('GIF','gif')}
> > 
> >     Apparently you presume the file extensions are inaccurate, as you are
> > digging into the files for signatures.
> > 
> > >def readFile():
> > >    filename = r'c:/temp/downloads'
> > >    fh = open(filename, 'r')
> > >    file_sig = fh.read(4)
> > >    print '[*] check_sig() File:',filename #, 'Hash Sig:', binascii.hexlify(file_sig)
> > 
> >     Note: if you are hardcoding forward slashes, you don't need the raw
> > indicator...
> > 
> >     That said, what is "c:/temp/downloads"? You apparently are opening IT
> > as the file to be examined. Is it supposed to be a directory containing
> > many files, a file containing a list of files, ???
> > 
> >     What is "check_sig" -- it looks like a function you haven't defined --
> > but it's inside the quotes making a string literal that will never be
> > called anyway.
> > 
> >     If you are just concerned with one directory of files, you might want
> > to read the help file on the glob module, along with os.path
> > (join/splitext/etc). Or just string methods...
> > 
> > >>> import glob
> > >>> import os.path
> > >>> TARGET = os.path.join(os.environ["USERPROFILE"],
> > ...         "documents/BW-conversion/*")
> > >>> files = glob.glob(TARGET)
> > >>> for fn in files:
> > ...         fp, fx = os.path.splitext(fn)
> > ...         print "File %s purports to be of type %s" % (fn, fx.upper())
> > ... 
> > File C:\Users\Wulfraed\documents/BW-conversion\BW-1.jpg purports to be of
> > type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\BW-2.jpg purports to be of
> > type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\BW-3.jpg purports to be of
> > type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\BW-4.jpg purports to be of
> > type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\BWConv.html purports to be
> > of type .HTML
> > File C:\Users\Wulfraed\documents/BW-conversion\roo_b1.jpg purports to be of
> > type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\roo_b2.jpg purports to be of
> > type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\roo_b3.jpg purports to be of
> > type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\roo_b4.jpg purports to be of
> > type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\roo_b5.jpg purports to be of
> > type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\roo_b6.jpg purports to be of
> > type .JPG
> > File C:\Users\Wulfraed\documents/BW-conversion\roo_col.jpg purports to be
> > of type .JPG
> > >>> 
> > -- 
> >     Wulfraed                 Dennis Lee Bieber         AF6VN
> >     wlf...@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
> > 
> > -- 
> > https://mail.python.org/mailman/listinfo/python-list
>
>
>
> Hi, thanks for all your replies. I realised pretty soon after I asked for 
> help that I was trying to read the wrong number of bytes and set about 
> completely rewriting my code (after a coffee break).
>
> import sys, os, binascii
>
> def readfile():
>
>
>     dictionary = {'474946':('GIF', 'gif'), 'ffd8ff':('JPEG', 'jpeg')}
>     try:
>         files = os.listdir('C:\\Temp\\downloads')        
>         for item in files:
>             f = open('C:\\Temp\\downloads\\'+ item, 'r')
>             file_sig = f.read(3)
>             file_sig_hex = binascii.hexlify(file_sig)
>                         
>             if file_sig_hex in dictionary:
>                 print item + ' is a image file, it is a ' + file_sig
>
>             else:
>                 print item + ' is not an image file, it is' +file_sig
>
>             print file_sig_hex
>
>     
>
>     except:
>         print 'Error. Try again'
>
>     finally:
>         if 'f' in locals():
>             f.close()
>
> def main():
>  
>     readfile()
>
> if __name__ == '__main__':
>     main()
>
> As of right now my script prints out 'Error. Try again', but when I comment
> out this part of the code:
>
>             if file_sig_hex in dictionary:
>                 print item + ' is a image file' + dictionary
>
>             else:
>                 print item + ' is not an image file, is it' +dictionary
>
>             
>
> it prints the file signatures to the screen. However, what I'm trying to do 
> with the if statement is tell me whether the file is an image and give me 
> its signature; if it is not, I want it to tell me so, still give me its 
> signature, and tell me what type of file it is. Can anyone point out an 
> obvious error?



You are catching all exceptions, which throws away all the debugging help that
Python offers you. Don't. Remove the bare except and the traceback will show
you the real error and the line it came from.
http://stackoverflow.com/questions/10594113/bad-idea-to-catch-all-exceptions-in-python
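
To make the point concrete, here is a minimal sketch (written for Python 3, so print is a function) of what the bare except is probably hiding. It assumes the failing line is the one from the commented-out snippet, which concatenates a string with the dictionary itself. The names `signatures`, `item` and `read_signature` are made up for illustration; they are not from the original script.

```python
signatures = {'474946': ('GIF', 'gif'), 'ffd8ff': ('JPEG', 'jpeg')}
item = 'example.gif'  # hypothetical file name

# Concatenating a str with a dict raises TypeError. A bare "except:" would
# swallow it and print only 'Error. Try again'.
try:
    message = item + ' is a image file' + signatures
except TypeError as exc:
    message = 'TypeError: %s' % exc

print(message)


def read_signature(path, length=3):
    """Return the first `length` bytes of `path`, or None if it cannot be read."""
    try:
        with open(path, 'rb') as handle:
            return handle.read(length)
    except (IOError, OSError) as exc:
        # Only I/O failures are handled here; programming errors such as the
        # TypeError above still surface with a full traceback.
        print('Could not read %s: %s' % (path, exc))
        return None
```

Catching only the I/O errors keeps the "file could not be read" case friendly while leaving genuine bugs visible.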

On a different note: You seem to be using Google Groups.
It causes some nuisance to people on the list:
https://wiki.python.org/moin/GoogleGroupsPython

Here's a more automated solution; see my post here:
https://groups.google.com/forum/#!topic/comp.lang.python/Cf6adRN3KGs
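
Separately from the linked post, the signature check the quoted script is attempting could be sketched roughly as follows. This is only an illustration under some assumptions: it is written for Python 3, the function names `identify` and `count_non_images` are made up, and the files are opened in binary mode ('rb') rather than 'r' so that the bytes are read unmodified.

```python
import binascii
import os

# Hex signature -> (description, usual extension), as in the quoted script.
SIGNATURES = {'474946': ('GIF', 'gif'), 'ffd8ff': ('JPEG', 'jpeg')}


def identify(path):
    """Return (hex signature, type name or None) for one file."""
    with open(path, 'rb') as handle:  # binary mode: no newline translation
        head = handle.read(3)
    sig_hex = binascii.hexlify(head).decode('ascii')
    kind = SIGNATURES.get(sig_hex)
    return sig_hex, (kind[0] if kind else None)


def count_non_images(directory):
    """Print a line per file and return how many are neither GIF nor JPEG."""
    others = 0
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        sig_hex, kind = identify(path)
        if kind is None:
            others += 1
            print('%s is not a GIF or JPEG (signature %s)' % (name, sig_hex))
        else:
            print('%s is a %s file (signature %s)' % (name, kind, sig_hex))
    return others
```

Calling `count_non_images` on the downloads directory would then report each file and return the count of non-image files. Note that the message is built from the looked-up type name, not from the dictionary itself.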
-- 
https://mail.python.org/mailman/listinfo/python-list
