On Monday 31 December 2007 10:36, Chris Fuller wrote:
> lin = re.findall('\s*([^\s]+)\s+([^\s]+)\s+(\d+)( [kM])?bytes', s)
This is incorrect. The first version of the script I wrote split the file into records by calling split('bytes'). I erroneously assumed I would get the desired results by simply adding "bytes" to the RE. The original RE could have been written so that this would have worked (and it would have been a little "cleaner"), but it wasn't. The space should be obligatory, and not included in the [kM] group.

I tried some of Kent's suggestions and compared the run times. Nested split()s are faster than REs! Python isn't as slow as you'd think :)

# separate into records (drop the trailing piece after the last 'bytes')
lin = [i.split() for i in s.split('bytes')[:-1]]

for fields in lin:
    try:
        if fields[3] == 'M':
            mul = 1000000
        elif fields[3] == 'k':
            mul = 1000
    except IndexError:
        mul = 1

    lout.append( (fields[0], fields[1], int(fields[2])*mul) )

Cheers
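P.S. For anyone who wants to check the two approaches against each other, here is a minimal sketch of what the corrected RE might look like, with the whitespace before the unit made obligatory and kept outside the [kM] group. The sample data, the MULTIPLIERS table, and the function names are my own assumptions for illustration; the real input file may differ.

```python
import re

# Hypothetical sample input; the real file layout is assumed, not known.
SAMPLE = "alpha /tmp/a 12 kbytes\nbeta /tmp/b 3 Mbytes\ngamma /tmp/c 512 bytes\n"

# Map the optional unit prefix to its multiplier ("" means plain bytes).
MULTIPLIERS = {"": 1, "k": 1000, "M": 1000000}

# Whitespace before the unit is obligatory and sits outside the [kM] group,
# so the group captures "k", "M", or an empty string.
RECORD_RE = re.compile(r"(\S+)\s+(\S+)\s+(\d+)\s+([kM]?)bytes")


def parse_with_re(s):
    """Parse records with a single regular expression."""
    return [(a, b, int(n) * MULTIPLIERS[unit])
            for a, b, n, unit in RECORD_RE.findall(s)]


def parse_with_split(s):
    """Parse records with nested split() calls, as in the post."""
    out = []
    # The last piece after the final 'bytes' is only trailing whitespace.
    for fields in (rec.split() for rec in s.split("bytes")[:-1]):
        unit = fields[3] if len(fields) > 3 else ""
        out.append((fields[0], fields[1], int(fields[2]) * MULTIPLIERS[unit]))
    return out
```

Both functions should return the same list of tuples for the same input. To compare run times on your own data, something like timeit.timeit(lambda: parse_with_split(s), number=10000) against the same call for parse_with_re would do.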