On Sat, Feb 20, 2010 at 4:21 PM, Vincent Davis <vinc...@vincentdavis.net> wrote:
> Thanks for the help, this is considerably faster and easier to read (see
> below). I changed it to avoid the "break" and I think it makes it easy to
> understand. I am checking the conditions each time slows it but it is worth
> it to me at this time.
>

It seems you are beginning to understand that programmer time is more
valuable than machine time. Congratulations.

> def read_data_file(filename):
>     reader = csv.reader(open(filename, "U"),delimiter='\t')
>
>     data = []
>     mask = []
>     outliers = []
>     modified = []
>
>     data_append = data.append
>     mask_append = mask.append
>     outliers_append = outliers.append
>     modified_append = modified.append
>

I know some people do this to speed things up. Really, I don't think it's
necessary or wise to do so.

>     maskcount = 0
>     outliercount = 0
>     modifiedcount = 0
>
>     for row in reader:
>         if '[MASKS]' in row:
>             maskcount += 1
>         if '[OUTLIERS]' in row:
>             outliercount += 1
>         if '[MODIFIED]' in row:
>             modifiedcount += 1
>         if not any((maskcount, outliercount, modifiedcount, not row)):
>             data_append(row)
>         elif not any((outliercount, modifiedcount, not row)):
>             mask_append(row)
>         elif not any((modifiedcount, not row)):
>             outliers_append(row)
>         else:
>             if row: modified_append(row)
>

Just playing with the logic here:

1. Notice that if "not row" is True, nothing happens? Pull it out
explicitly.

2. Notice how it switches from mode to mode? Program it more explicitly.

Here's my suggestion:

def parse_masks(reader):
    for row in reader:
        if not row: continue
        # A nested parser consumes the rest of the reader, so the section
        # header row itself must not fall through to the append.
        elif '[OUTLIERS]' in row: parse_outliers(reader)
        elif '[MODIFIED]' in row: parse_modified(reader)
        else: masks.append(row)

def parse_outliers(reader):
    for row in reader:
        if not row: continue
        elif '[MODIFIED]' in row: parse_modified(reader)
        else: outliers.append(row)

def parse_modified(reader):
    for row in reader:
        if not row: continue
        modified.append(row)

for row in reader:
    if not row: continue
    elif '[MASKS]' in row: parse_masks(reader)
    elif '[OUTLIERS]' in row: parse_outliers(reader)
    elif '[MODIFIED]' in row: parse_modified(reader)
    else: data.append(row)

Since there is global state involved (the data, masks, outliers, and
modified lists), you may want to save yourself some trouble in the future
and put the above in a class, so that each parser instance keeps its own
state.

It looks like your program is turning into a regular old parser. Any format
that is more than trivial to parse will need a real parser like the one
above.

--
Jonathan Gardner
jgard...@jonathangardner.net
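
P.S. In case it helps, here is a rough sketch of what the class-based version could look like. The class name SectionedFileParser, the MARKERS table, and the sample input are all made up for illustration. I'm assuming Python 3, and that the marker rows themselves should not be stored. Instead of nested functions it uses one loop with a "current section" variable, which, unlike the functions above, would accept the sections in any order. Treat it as a starting point, not a finished parser.

```python
import csv
import io


class SectionedFileParser:
    """Hypothetical sketch: collect rows into per-section lists."""

    # Marker cell -> name of the attribute that receives the rows after it.
    MARKERS = {
        '[MASKS]': 'masks',
        '[OUTLIERS]': 'outliers',
        '[MODIFIED]': 'modified',
    }

    def __init__(self):
        # Each instance owns its lists, so separate parsers stay separate.
        self.data = []
        self.masks = []
        self.outliers = []
        self.modified = []

    def parse(self, reader):
        current = self.data  # rows before any marker are plain data
        for row in reader:
            if not row:
                continue  # blank line: nothing to do
            # Same test as "'[MASKS]' in row": is any cell a known marker?
            marker = next((cell for cell in row if cell in self.MARKERS), None)
            if marker is not None:
                # Switch mode; the marker row is not stored (an assumption).
                current = getattr(self, self.MARKERS[marker])
                continue
            current.append(row)
        return self


# Made-up tab-delimited input, only to exercise the sketch.
sample = "a\t1\nb\t2\n\n[MASKS]\nm\t3\n[OUTLIERS]\no\t4\n[MODIFIED]\nx\t5\n"
parser = SectionedFileParser().parse(
    csv.reader(io.StringIO(sample), delimiter='\t'))
```

To read a real file you would pass csv.reader(open(filename, newline=''), delimiter='\t') to parse() instead of the StringIO object.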
-- http://mail.python.org/mailman/listinfo/python-list