On Fri, Sep 19, 2014 at 4:11 AM, David Alban <exta...@extasia.org> wrote: > i'm a long time perl programmer who is learning python. i'd be interested > in any comments you might have on my code below. feel free to respond > privately if you prefer. i'd like to know if i'm on the right track.
Sure! Happy to help out. But first, a comment about your English, as shown above: It's conventional to capitalize in certain places, and it does make your prose more readable. Just as PEP 8 does for Python code, tidy spelling and linguistic conventions make it easier for other "English programmers" (wordsmiths?) to follow what you're doing. I've trimmed out any parts of your code that I don't have comments on. > ascii_nul = chr(0) You use this in exactly one place, to initialize sep. I'd either set sep up here, or use ascii_nul down below (probably the former). But if you're going to have a named constant, the Python convention is to upper-case it: ASCII_NUL = chr(0). > def md5_for_file(f, block_size=2**20): > md5 = hashlib.md5() > while True: > data = f.read(block_size) > if not data: > break > md5.update(data) > return md5.hexdigest() This is used by opening a file and then passing the open file to this function. Recommendation: Pass a file name, and have this function open and close the file itself. Then it'll also be a candidate for the obvious tidyup of using the file as a context manager - the 'with' statement guarantees that the file will be promptly closed. > thishost = socket.gethostname() (Be aware that this won't always give you a useful value. You may want to provide a default.) > start_directory = re.sub( '/+$', '', args.start_directory ) Hello, Perl programmer :) When there's text to manipulate, you first reach for a regular expression. What's this looking to do, exactly? Trim off any trailing slashes? Python has a clearer way to write that: start_directory = args.start_directory.rstrip("/") Lots of text manipulation in Python is done with methods on the string itself. Have an explore - there's all sorts of powerful stuff there. > for directory_path, directory_names, file_names in os.walk( start_directory > ): Personally, I believe in Huffman coding my names. For locals that exist for the sole purpose of loop iteration (particularly the third one, which you use in one place), I'd use shorter names: for path, dirs, files in os.walk(start_directory): for fn in files: > lstat_info = os.lstat( file_path ) > dev = lstat_info.st_dev > ino = lstat_info.st_ino > nlink = lstat_info.st_nlink > size = lstat_info.st_size > > sep = ascii_nul > > print "%s%c%s%c%d%c%d%c%d%c%d%c%s" % ( thishost, sep, md5sum, sep, dev, > sep, ino, sep, nlink, sep, size, sep, file_path ) Python 3 turns print into a function, which is able to do this rather more cleanly. You can call on the new behaviour in a Python 2 program by putting this immediately under your shebang: from __future__ import print_function And then you can write the print call like this: print(thishost, md5sum, dev, ino, nlink, size, file_path, sep=chr(0)) Or, eliminating all the assignment to locals: st = os.lstat( file_path ) print(thishost, md5sum, st.st_dev, st.st_ino, st.st_nlink, st.st_size, file_path, sep=chr(0)) That's shorter than your original line, *and* it doesn't have all the assignments :) > exit( 0 ) Unnecessary - if you omit this, you'll exit 0 implicitly at the end of the script. Standard rule of programming: There's always another way to do it. Standard rule of asking for advice: There's always someone who will advocate another way of doing it. It's up to you to decide which advice is worth following, which is worth taking note of but not actually following, and which is to be thrown out as rubbish :) All the best! ChrisA -- https://mail.python.org/mailman/listinfo/python-list