Even though I am starting to get the hang of Python, I keep running into problems that I cannot solve. I have never used dictionaries before, and I gather that they help a lot with efficiency when analyzing large amounts of data (compared with nested loops).
Basically, I have two files containing data. My program takes an entry from the first file and checks whether it exists in the second. If it finds a match, it writes the data to an output file.

---------------

Right now, the code opens file1 and stores all of its contents in a list. Then it does the same for file2. Then it loops over list1 and inserts each item into a hash table.

I am trying to find a way to make this code more efficient. Here is what I would rather have: when I open file1, send its contents directly to the hash table, skipping the intermediate list entirely. Is this possible?

    def fcompare(f1name, f2name):
        import re
        mailsrch = re.compile(r'[EMAIL PROTECTED],4}')
        f1 = fopen(f1name)
        f2 = fopen(f2name)
        if not f1 or not f2:
            return 0

        a = f1.readlines(); f1.close()
        b = f2.readlines(); f2.close()
        file1List = []
        print "starting list 1"
        for c in a:
            file1List.extend(mailsrch.findall(c))
        print "storing File1 in dictionary."
        d1 = {}
        for item in file1List:
            d1[item] = None
        print "finished storing information in lists."
        print "starting list 2"
        file2List = []
        for d in b:
            file2List.extend(mailsrch.findall(d))
        utp = open("match.txt", "w")
        for item in file2List:
            if d1.has_key(item):
                utp.write(item + '\n')
        utp.close()
        #del file1List
        #del file2List
        print "finished comparing 2 lists."
        #return 1
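
To make the question concrete, here is a rough sketch of the kind of thing I have in mind. It is not a tested solution. It assumes Python 3 (unlike the code above), uses a set instead of a dictionary with None values, and iterates over each file object directly so that no readlines() list is ever built. The regular expression is only a stand-in, because the pattern in my code above is not readable as posted, and the names collect_addresses and fcompare_direct are made up for illustration.

```python
import re

# Stand-in pattern (an assumption): the original regex is not readable as
# posted, so this simple e-mail matcher is used for illustration only.
MAIL_PATTERN = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')


def collect_addresses(path):
    """Return the set of addresses found in the file at `path`."""
    found = set()
    with open(path) as handle:
        # Iterating over the file object reads one line at a time,
        # so there is no readlines() list and no intermediate list.
        for line in handle:
            found.update(MAIL_PATTERN.findall(line))
    return found


def fcompare_direct(f1name, f2name, outname="match.txt"):
    """Write each address in f2name that also occurs in f1name.

    Returns the number of addresses written (hypothetical helper).
    """
    known = collect_addresses(f1name)
    matches = 0
    with open(f2name) as handle, open(outname, "w") as out:
        for line in handle:
            for address in MAIL_PATTERN.findall(line):
                # Set membership is a hash lookup, like d1.has_key(item).
                if address in known:
                    out.write(address + "\n")
                    matches += 1
    return matches
```

With this layout the second file is never stored at all: each line is searched, checked against the set, and then discarded, so memory use depends mainly on the number of distinct addresses in the first file.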