On Thursday, April 2, 2015 at 8:03:53 PM UTC-4, Dennis Lee Bieber wrote:
> On Thu, 2 Apr 2015 05:46:57 -0700 (PDT), Saran A
> <ahlusar.ahluwa...@gmail.com> declaimed the following:
>
> >
> >@ChrisA - this is a smaller function that will take the most updated file.
> >My intention is the following:
> >
> >* Monitor a folder for files that are dropped throughout the day
> >
> I would suggest that your first prototype is to be a program that
> contains a function whose only purpose is to report on the files it finds
> -- forget about all the processing/moving of the files until you can
> successfully loop around the work of fetching the directory and handling
> the file names found (by maybe printing the names of the ones determined to
> be new since last fetch).
>
> >* When a file is dropped in the folder the program should scan the file
> >
> >o IF all the contents in the file have the same length (let's assume line
> >length)
> >
> >o THEN the file should be moved to a "success" folder and a text file
> >written indicating the total number of records/lines/words processed
> >
> >o IF the file is empty OR the contents are not all of the same length
> >
> >o THEN the file should be moved to a "failure" folder and a text file
> >written indicating the cause for failure (for example: Empty file or line
> >100 was not the same length as the rest).
> >
> You still haven't defined how you determine the "correct length" of the
> record. What if the first line is 79 characters, and all the others are 80
> characters? Do you report ALL lines EXCEPT the first as being the wrong
> length, when really it is the first line that is wrong?
>
> Also, if the files are Unicode (UTF-8, in particular) -- the byte
> length of a line could differ but the character length could be the same.
>
> >Here is the code I have written:
> >
> >import os
> >import time
> >import glob
> >import sys
> >
> >def initialize_logger(output_dir):
> >    logger = logging.getLogger()
> >    logger.setLevel(logging.DEBUG)
> >
> >    # create console handler and set level to info
> >    handler = logging.StreamHandler()
> >    handler.setLevel(logging.INFO)
> >    formatter = logging.Formatter("%(levelname)s - %(message)s")
> >    handler.setFormatter(formatter)
> >    logger.addHandler(handler)
> >
> >    # create error file handler and set level to error
> >    handler = logging.FileHandler(os.path.join(output_dir, "error.log"),"w",
> >        encoding=None, delay="true")
> >    handler.setLevel(logging.ERROR)
> >    formatter = logging.Formatter("%(levelname)s - %(message)s")
> >    handler.setFormatter(formatter)
> >    logger.addHandler(handler)
> >
> >    # create debug file handler and set level to debug
> >    handler = logging.FileHandler(os.path.join(output_dir, "all.log"),"w")
> >    handler.setLevel(logging.DEBUG)
> >    formatter = logging.Formatter("%(levelname)s - %(message)s")
> >    handler.setFormatter(formatter)
> >    logger.addHandler(handler)
> >
> >#Helper Functions for the Success and Failure Folder Outcomes, respectively
> >
> >#checks the length of the file
> > def file_len(filename
> >     with open(filename) as f:
> >         for i, l in enumerate(f):
> >             pass
> >     return i + 1
> >
> >#copies file to new destination
> >
> > def copyFile(src, dest):
> >     try:
> >         shutil.copy(src, dest)
> >     # eg. src and dest are the same file
> >     except shutil.Error as e:
> >         print('Error: %s' % e)
> >     # eg. source or destination doesn't exist
> >     except IOError as e:
> >         print('Error: %s' % e.strerror)
> >
> >#Failure Folder
> >
> >def move_to_failure_folder_and_return_error_file():
> >    os.mkdir('Failure')
> >    copyFile(filename, 'Failure')
> >    initialize_logger('rootdir/Failure')
> >    logging.error("Either this file is empty or the lines")
> >
> ># Success Folder Requirement
> >
> >def move_to_success_folder_and_read(file):
> >    os.mkdir('Success')
> >    copyFile(filename, 'Success')
> >    print("Success", file)
> >    return file_len()
> >
> >
> >#This simply checks the file information by name
> >
> >def fileinfo(file):
> >    filename = os.path.basename(file)
> >    rootdir = os.path.dirname(file)
> >    lastmod = time.ctime(os.path.getmtime(file))
> >    creation = time.ctime(os.path.getctime(file))
> >    filesize = os.path.getsize(file)
> >    return filename, rootdir, lastmod, creation, filesize
> >
> >if __name__ == '__main__':
> >    import sys
> >    validate_files(sys.argv[1:])
>
> Yeesh... Did you even try running that?
>
>   validate_files   is not defined
>   file_len         is at the wrong indentation
>                    is syntactically garbage
>                    is a big time-waste (you read the file just to
>                    enumerate the number of lines? Why didn't you count
>                    the lines while checking the line lengths)
>   copyFile         is at the wrong indentation
>                    (after a bunch of word_word, why camelCase here)
>
> Correct all the edit errors and copy/paste the actual file that at
> least attempts to run.
>
> You might also want to look at os.stat, rather than using three os.path
> calls.
> --
>   Wulfraed    Dennis Lee Bieber    AF6VN
>   wlfr...@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
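
Two of the points in the quoted reply, counting lines in the same pass that checks their lengths and using os.stat instead of several os.path calls, can be sketched roughly as follows. This is only an illustration and not code from the thread. The names check_line_lengths and stat_summary are made up, and treating the most common line length as the "correct" one is an assumption, since the thread never settles how the correct length is decided.

```python
import io
import os
import time
from collections import Counter


def check_line_lengths(path, encoding="utf-8"):
    """Return (line_count, expected_length, bad_line_numbers).

    The file is read once. ASSUMPTION (not from the thread): the
    "correct" length is the most common line length, so one odd first
    line is reported as the outlier instead of every other line.
    """
    lengths = []
    # Text mode decodes bytes to characters, so len() below counts
    # characters rather than UTF-8 bytes.
    with io.open(path, "r", encoding=encoding) as handle:
        for line in handle:
            lengths.append(len(line.rstrip("\r\n")))

    if not lengths:
        # Empty file: nothing to compare against.
        return 0, None, []

    expected = Counter(lengths).most_common(1)[0][0]
    bad = [number for number, length in enumerate(lengths, start=1)
           if length != expected]
    return len(lengths), expected, bad


def stat_summary(path):
    """Collect size and timestamps with a single os.stat call."""
    info = os.stat(path)
    return {
        "size": info.st_size,
        "modified": time.ctime(info.st_mtime),
        # On Unix, st_ctime is the last metadata change, not creation.
        "changed": time.ctime(info.st_ctime),
    }
```

An empty file comes back as (0, None, []), which the caller can treat as the "empty file" failure. When most lines share one length, any line that differs is listed by its 1-based line number, which gives the caller the text for a message such as "line 100 was not the same length as the rest".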
@Dennis: Below is my full program (so far). Please feel free to tear it apart and provide me with constructive criticism. I have been programming for 8 months now and this is a huge learning experience for me. Feedback and modifications are very welcome.

What would be a better name for dirlist?

# Without data to examine here, I can only guess, based on this requirement's
# language, that fixed records are in the input.
# I made the assumption that the directories are in the same filesystem.
#
# Takes the function fileinfo as a starting point and demonstrates calling a
# function from within a function.
# I tested this little sample on a small set of files created with MD5
# checksums. I wrote the Python in such a way that it would work with
# Python 2.x or 3.x (note the __future__ at the top).
#
# There are so many wonderful ways of failure, so, from a development
# standpoint, I would probably spend a bit more time trying to determine which
# failure(s) I would want to report to the user, and how (perhaps creating my
# own Exceptions).
#
# The only other comments I would make are about safe-file handling.
#
#   #1: Question: After a user has created a file that has failed (in
#       processing), can the user create a file with the same name?
#       If so, then you will probably want to look at some sort
#       of file-naming strategy to avoid overwriting evidence of
#       earlier failures.
#
# File naming is a tricky thing. I referenced the tempfile module [1] and the
# Maildir naming scheme to see two different types of solutions to the problem
# of choosing a unique filename.
## I am assuming that all of my files are going to be specified in unicode

## Utilized Spyder's Scientific Computing IDE to debug, check for indentation errors and test function suite

from __future__ import print_function

import os.path
import time
import difflib
import logging

def initialize_logger(output_dir):
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)

    # create console handler and set level to info
    handler = logging.StreamHandler()
    handler.setLevel(logging.INFO)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # create error file handler and set level to error
    handler = logging.FileHandler(os.path.join(output_dir, "error.log"),"w", encoding=None, delay="true")
    handler.setLevel(logging.ERROR)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # create debug file handler and set level to debug
    handler = logging.FileHandler(os.path.join(output_dir, "all.log"),"w")
    handler.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(levelname)s - %(message)s")
    handler.setFormatter(formatter)
    logger.addHandler(handler)

#This function's purpose is to obtain the filename, rootdir and filesize

def fileinfo(f):
    filename = os.path.basename(f)
    rootdir = os.path.dirname(f)
    filesize = os.path.getsize(f)
    return filename, rootdir, filesize

#This helper function returns the length of the file
def file_len(f):
    with open(f) as f:
        for i, l in enumerate(f):
            pass
    return i + 1

#This helper function attempts to copy file and move file to the respective directory
#I am assuming that the directories are in the same filesystem

# If directories ARE in different file systems, I would use the following helper function:

# def move(src, dest):
#     shutil.move(src, dest)

def copy_and_move_file(src, dest):
    try:
        os.rename(src, dest)
        # eg. src and dest are the same file
    except IOError as e:
        print('Error: %s' % e.strerror)


path = "."
dirlist = os.listdir(path)

# Caveats of the "main" function is that it does not scale well
#(although it is appropriate if one assumes that there will be few changes)

# It does not account for updated files existing in the directory - only new files "dropped" in
# (If this was included in the requirements, os.stat would be appropriate here)

def main(dirlist):
    before = dict([(f, 0) for f in dirlist])
    while True:
        time.sleep(1) #time between update check
        after = dict([(f, None) for f in dirlist])
        added = [f for f in after if not f in before]
        if added:
            f = ''.join(added)
            print('Sucessfully added %s file - ready to validate') %(f)
            return validate_files(f)
        else:
            return move_to_failure_folder_and_return_error_file(f)

def validate_files(f):
    creation = time.ctime(os.path.getctime(f))
    lastmod = time.ctime(os.path.getmtime(f))
    if creation == lastmod and file_len(f) > 0:
        return move_to_success_folder_and_read(f)
    if file_len < 0 and creation != lastmod:
        return move_to_success_folder_and_read(f)
    else:
        return move_to_failure_folder_and_return_error_file(f)

# Failure/Success Folder Functions

def move_to_failure_folder_and_return_error_file():
    filename, rootdir, lastmod, creation, filesize = fileinfo(file)
    os.mkdir('Failure')
    copy_and_move_file( 'Failure')
    initialize_logger('rootdir/Failure')
    logging.error("Either this file is empty or there are no lines")

def move_to_success_folder_and_read():
    filename, rootdir, lastmod, creation, filesize = fileinfo(file)
    os.mkdir('Success')
    copy_and_move_file(rootdir, 'Success') #file name
    print("Success", file)
    return file_len(file)

if __name__ == '__main__':
    main(dirlist)
--
https://mail.python.org/mailman/listinfo/python-list
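
For comparison, here is a minimal sketch of the "report which files are new since the last fetch" prototype suggested earlier in the thread. It is an illustration under stated assumptions, not a drop-in replacement for the program above. The names new_files and watch, and the max_polls parameter, are hypothetical. The main difference from main(dirlist) is that the directory is listed again on every poll, so the comparison is between two different listings rather than two copies of the same one.

```python
import os
import time


def new_files(path, seen):
    """Return the names in `path` that are not yet in `seen`.

    `seen` is a set that is updated in place, so each name is
    reported only once.
    """
    current = set(os.listdir(path))  # fresh listing on every call
    added = sorted(current - seen)
    seen.update(added)
    return added


def watch(path=".", interval=1.0, max_polls=None):
    """Print the name of each file that appears in `path`.

    ASSUMPTION: files already present at start-up are not "new".
    `max_polls` is a hypothetical knob so the loop can be stopped
    in tests; None means poll forever.
    """
    seen = set(os.listdir(path))
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)  # time between directory checks
        for name in new_files(path, seen):
            print("New file: %s" % name)
        polls += 1
```

Calling watch("incoming", interval=1.0), where "incoming" is a placeholder directory name, would print each dropped file once. The validation and the move to a Success or Failure folder can then be added at the point where the name is printed.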