thanks for the responses.  i'm having quite a good time learning python.

On Thu, Sep 18, 2014 at 11:45 AM, Chris Kaynor <ckay...@zindagigames.com> wrote:
>
> Additionally, you may want to specify binary mode by using open(file_path,
> 'rb') to ensure platform-independence ('r' uses Universal newlines, which
> means on Windows, Python will convert "\r\n" to "\n" while reading the
> file). Additionally, some platforms will treat binary files differently.
>
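to convince myself why this matters for checksums, here is a minimal sketch. it is not from chris's message: the sample bytes and the helper name md5_text_vs_binary are made up for illustration, and it assumes io.open, whose text mode translates "\r\n" to "\n" on every platform (on python 3 the builtin open is the same function). the idea is that a text-mode read can hand back different bytes than what is on disk, so the md5 would differ.

```python
import hashlib
import io
import os
import tempfile

# hypothetical sample data: two lines with windows-style line endings
RAW = b"first line\r\nsecond line\r\n"

def md5_text_vs_binary(raw=RAW):
    """write raw to a temp file; return (md5 of binary read, md5 of text read)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(raw)
        # binary mode: the bytes come back exactly as stored on disk
        with io.open(path, 'rb') as f:
            as_binary = f.read()
        # text mode: io.open translates "\r\n" to "\n" while reading
        with io.open(path, 'r', encoding='ascii') as f:
            as_text = f.read().encode('ascii')
    finally:
        os.remove(path)
    return hashlib.md5(as_binary).hexdigest(), hashlib.md5(as_text).hexdigest()

binary_md5, text_md5 = md5_text_vs_binary()
print(binary_md5 == hashlib.md5(RAW).hexdigest())  # binary read matches the stored bytes
print(binary_md5 == text_md5)                      # text read was altered, so the digests differ
```

so for anything that hashes or compares file contents, 'rb' looks like the safer choice; for files i actually want to process as lines of text, the translation may be exactly what i want.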
would it be good to use 'rb' all the time?

On Thu, Sep 18, 2014 at 11:48 AM, Chris Angelico <ros...@gmail.com> wrote:
> On Fri, Sep 19, 2014 at 4:11 AM, David Alban <exta...@extasia.org> wrote:
> > exit( 0 )
>
> Unnecessary - if you omit this, you'll exit 0 implicitly at the end of
> the script.
>

aha.  i've been doing this for years even with perl, and apparently it's not necessary in perl either.  i was influenced by shell.

this shell code:

    [[ -n $report_mode ]] && do_report

    exit 0

is an example of why you want the last normally executed shell statement to be "exit 0".  if you omit the exit statement in this example, and $report_mode is not set, your shell program will give a non-zero return code and appear to have terminated with an error.  in shell the last command executed determines the return code to the os.  (the equivalent "if ... fi" form does not have this problem, because an if whose condition is false returns zero.)

ok, i don't need to do this in python.

On Thu, Sep 18, 2014 at 1:23 PM, Peter Otten <__pete...@web.de> wrote:
>
> file_path may contain newlines, therefore you should probably use "\0" to
> separate the records.

i chose to stick with ascii nul as the default field separator, but i added a --field-separator option in case someone wants human readable output.

style question:  if there is only one, possibly short statement in a block, do folks usually move it up to the line starting the block?

    if not S_ISREG( mode ) or S_ISLNK( mode ):
        return

vs.

    if not S_ISREG( mode ) or S_ISLNK( mode ): return

or even:

    with open( file_path, 'rb' ) as f: md5sum = md5_for_file( file_path )

fyi, here are my changes:

usage: dupscan [-h] [--start-directory START_DIRECTORY]
               [--field-separator FIELD_SEPARATOR]

scan files in a tree and print a line of information about each regular file

optional arguments:
  -h, --help            show this help message and exit
  --start-directory START_DIRECTORY, -d START_DIRECTORY
                        Specify the root of the filesystem tree to be
                        processed. The default is '.'
  --field-separator FIELD_SEPARATOR, -s FIELD_SEPARATOR
                        Specify the string to use as a field separator in
                        output. The default is the ascii nul character.

#!/usr/bin/python

import argparse
import hashlib
import os

from platform import node
from stat import S_ISREG, S_ISLNK

ASCII_NUL = chr(0)

# from: http://stackoverflow.com/questions/1131220/get-md5-hash-of-big-files-in-python
# except that i use hexdigest() rather than digest()
def md5_for_file( path, block_size=2**20 ):
    md5 = hashlib.md5()
    with open( path, 'rb' ) as f:
        while True:
            data = f.read(block_size)
            if not data:
                break
            md5.update(data)
    return md5.hexdigest()

def file_info( directory, basename, field_separator=ASCII_NUL ):
    file_path = os.path.join( directory, basename )
    st = os.lstat( file_path )

    mode = st.st_mode
    if not S_ISREG( mode ) or S_ISLNK( mode ):
        return

    with open( file_path, 'rb' ) as f:
        md5sum = md5_for_file( file_path )

    return field_separator.join( [ thishost, md5sum, str( st.st_dev ), str( st.st_ino ), str( st.st_nlink ), str( st.st_size ), file_path ] )

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='scan files in a tree and print a line of information about each regular file')
    parser.add_argument('--start-directory', '-d', default='.', help='''Specify the root of the filesystem tree to be processed.  The default is '.' ''')
    parser.add_argument('--field-separator', '-s', default=ASCII_NUL, help='Specify the string to use as a field separator in output.  The default is the ascii nul character.')
    args = parser.parse_args()

    start_directory = args.start_directory.rstrip('/')
    field_separator = args.field_separator

    thishost = node()
    if thishost == '':
        thishost='[UNKNOWN]'

    for directory_path, directory_names, file_names in os.walk( start_directory ):
        for file_name in file_names:
            print file_info( directory_path, file_name, field_separator )

-- 
Live in a world of your own, but always welcome visitors.
-- https://mail.python.org/mailman/listinfo/python-list