Hi Alex, On 2010-08-16 18:44, Alex van der Spek wrote: > Anybody catches any other ways to improve my program (attached), you are > most welcome. Help me learn, that is one of the objectives of this > newsgroup, right? Or is it all about exchanging the next to impossible > solution to the never to happen unreal world problems?
I don't know what a concordance table is, and I haven't looked a lot into your program, but anyway here are some things I noticed at a glance: | #! usr/bin/env python | # Merge log files to autolog file | import os | import fileinput | #top='C:\\Documents and Settings\\avanderspek\\My Documents\\CiDRAdata\\Syncrude\\CSL\\August2010' | top='C:\\Users\\ZDoor\\Documents\\CiDRA\\Syncrude\CSL\\August2010' If you have backslashes in strings, you might want to use "raw strings". Instead of "c:\\Users\\ZDoor" you'd write r"c:\Users\ZDoor" (notice the r in front of the string). | i,j,k=0,0,0 | date={} I suggest to use more spacing to make the code more readable. Have a look at http://www.python.org/dev/peps/pep-0008/ for more formatting (and other) tips. | fps=0.3048 | tab='\t' | | bt='-999.25'+'\t''-999.25'+'\t''-999.25'+'\t''-999.25'+'\t'+'-999.25' If these numbers are always the same, you should use something like NUMBER = "-999.25" COLUMNS = 5 bt = "\t".join(COLUMNS * [NUMBER]) (with better naming, of course). Why don't you use `tab` here? I _highly_ recommend to use longer (unabbreviated) names. | al='Status'+'\t'+'State'+'\t'+'-999.25' | | for root,dirs,files in os.walk(top): | #Build a concordance table of days on which data was collected | for name in files: | ext=name.split('.',1)[1] There's a function `splitext` in `os.path`. | if ext=='txt': | dat=name.split('_')[1].split('y')[1] | if dat in date.keys(): You can just write `if dat in date` (in Python versions >= 2.2, I think). | date[dat]+=1 | else: | date[dat]=1 | print 'Concordance table of days:' | print date | print 'List of files processed:' | #Build a list of same day filenames, 5 max for a profile meter,skip first and last days | for f in sorted(date.keys())[2:-1]: | logs=[] | for name in files: | ext=name.split('.')[1] | if ext=='txt': | dat=name.split('_')[1].split('y')[1] I guess I'd move the parsing stuff (`x.split(s)[i]` etc.) into small functions with meaningful names. After that I'd probably notice there's much redundancy and refactor them. ;) | if dat==f: | logs.append(os.path.join(root,name)) | #Open the files and read line by line | datsec=False | lines=[[] for i in range(5)] One thing to watch out for: The above is different from `[[]] * 5` which uses the _same_ empty list for all entries. Probably the semantics you chose is correct. | fn=0 | for line in fileinput.input(logs): | if line.split()[0]=='DataID': | datsec=True | ln=0 | if datsec: | lines[fn].append(line.split()) | ln+=1 | if ln==10255: This looks like a "magic number" and should be turned into a constant. | datsec=False | fileinput.nextfile() | fn+=1 | print fileinput.filename().rsplit('\\',1)[1] | fileinput.close() | aut='000_AutoLog'+f+'.log' | out=os.path.join(root,aut) | alf=open(out,'w') | alf.write('Timestamp (mm/dd/yyyy hh:mm:ss) VF 1 VF 2 VF 3 VF 4 VF 5 Q 1 Q 2 Q 3 Q 4 Q 5 Status State Metric Band Temperature 1 Band Temperature 2 Band Temperature 3 Band Temperature 4 Band Temperature 5 SPL 1 SPL 2 SPL 3 SPL 4 SPL 5'+'\n') | for wn in range(1,10255,1): You don't need to write the step argument if it's 1. | for i in range(5): | lines[i][wn][2]=str(float(lines[i][wn][2])/fps) | tp=lines[0][wn][0]+' '+lines[0][wn][1] | vf=tab.join([lines[i][wn][2] for i in range(5)]) | vq=tab.join([lines[i][wn][3] for i in range(5)]) | vs=tab.join([lines[i][wn][4] for i in range(5)]) | #sf=tab.join([lines[i][wn][5] for i in range(5)]) | #sq=tab.join([lines[i][wn][6] for i in range(5)]) | #ss=tab.join([lines[i][wn][7] for i in range(5)]) Maybe use an extra function? def choose_a_better_name(): return tab.join([lines[index][wn][2] for index in range(5)]) Moreover, the repetition of this line looks as if you wanted to put the right hand sides of the assignments in a list, instead of assigning to distinct names (`vf` etc.). By the way, you use the number 5 a lot. I guess this should be a constant, too. | alf.write(tp+'\t'+vf+'\t'+vq+'\t'+al+'\t'+bt+'\t'+vs+'\n') Suggestion: Use tab.join([tp, vf, vq, al, bt, vs]) + "\n" Again, not using distinct variables would have an advantage here. | alf.close() | print "Done" Stefan -- http://mail.python.org/mailman/listinfo/python-list