On Tue, Apr 23, 2013 at 11:19 AM, Rodrick Brown <rodrick.br...@gmail.com> wrote:
> with gzip.open(args.inputfile) as datafile:
>     for line in datafile:
>         outfile = '{}{}{}_combined.log'.format(dateobj.year,
>             dateobj.month, dateobj.day)
>         outdir = (args.outputdir + os.sep + siteurl)
>
>         with open(outdir + os.sep + outfile, 'w+') as outf:
>             outf.write(line)

You're opening files and closing them again for every line. This
wouldn't cause you to spin the CPU (more likely it'd thrash the hard
disk - unless you have an SSD), but it is certainly an optimization
target.

Can you know in advance what files you need? If not, I'd try something
like this:

outf = {} # Might want a better name though
.....
    outfile = ...
    if outfile not in outf:
        os.makedirs(...)
        outf[outfile] = open(...)
    outf[outfile].write(line)
for f in outf.values(): f.close()

Open files only as needed, close 'em all at the end.

ChrisA
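
For reference, the open-once, close-at-the-end idea can be written out as a small self-contained sketch. This is only an illustration under assumptions: the function name write_grouped, the (relative_path, line) record format, and the use of mode 'w' are all hypothetical and not taken from the original script, which derives its paths from dateobj and siteurl in code not shown here.

```python
import os


def write_grouped(records, outputdir):
    """Write (relative_path, line) pairs, opening each output file only once.

    Hypothetical helper: `records` stands in for whatever yields the
    per-line destination in the real script.
    """
    handles = {}  # maps full output path -> open file object
    try:
        for relpath, line in records:
            path = os.path.join(outputdir, relpath)
            if path not in handles:
                # First time we see this file: create its directory and open it.
                os.makedirs(os.path.dirname(path), exist_ok=True)
                handles[path] = open(path, 'w')
            handles[path].write(line)
    finally:
        # Close everything at the end, even if a write failed part-way.
        for f in handles.values():
            f.close()
    return sorted(handles)
```

Note that opening with 'w+' inside the loop, as in the quoted code, truncates the file on every open, so only the last line written to each file would survive; keeping the handle open avoids that as well. If the input can fan out to a very large number of distinct files, the OS limit on open file descriptors may become a concern.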