In article <mailman.944.1366680414.3114.python-l...@python.org>, Rodrick Brown <rodrick.br...@gmail.com> wrote:
> I would like some feedback on possible solutions to make this script run
> faster.

If I had to guess, I would think this stuff:

> line = line.replace('mediacdn.xxx.com', 'media.xxx.com')
> line = line.replace('staticcdn.xxx.co.uk', 'static.xxx.co.uk')
> line = line.replace('cdn.xxx', 'www.xxx')
> line = line.replace('cdn.xxx', 'www.xxx')
> line = line.replace('cdn.xx', 'www.xx')
> siteurl = line.split()[6].split('/')[2]
> line = re.sub(r'\bhttps?://%s\b' % siteurl, "", line, 1)

You make 6 copies of every line. That's slow.

But I'm also going to quote something I wrote here a couple of months back:

> I've been doing some log analysis. It's been taking a grovelingly long
> time, so I decided to fire up the profiler and see what's taking so
> long. I had a pretty good idea of where the ONLY TWO POSSIBLE hotspots
> might be (looking up IP addresses in the geolocation database, or
> producing some pretty pictures using matplotlib). It was just a matter
> of figuring out which it was.
>
> As with most attempts to out-guess the profiler, I was totally,
> absolutely, and embarrassingly wrong.

So, my real advice to you is to fire up the profiler and see what it says.
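For what it's worth, here is a rough sketch of what "fire up the profiler" could look like with the standard library's cProfile and pstats. The names rewrite_line, process, profile and the SAMPLE log line are made up for illustration; only the body of rewrite_line comes from the quoted snippet, and I'm assuming the log has the URL in the seventh whitespace-separated field, as the original indexing implies.

```python
import cProfile
import io
import pstats
import re

# Hypothetical log line; the real log format is not shown in the thread.
SAMPLE = ('1.2.3.4 - - [23/Apr/2013:01:00:00 +0000] '
          '"GET http://cdn.xxx.com/a/b.png HTTP/1.1" 200 512')


def rewrite_line(line):
    # Same chain of operations as the quoted snippet, unchanged.
    line = line.replace('mediacdn.xxx.com', 'media.xxx.com')
    line = line.replace('staticcdn.xxx.co.uk', 'static.xxx.co.uk')
    line = line.replace('cdn.xxx', 'www.xxx')
    line = line.replace('cdn.xxx', 'www.xxx')
    line = line.replace('cdn.xx', 'www.xx')
    siteurl = line.split()[6].split('/')[2]
    # Note: siteurl is not escaped, so '.' matches any character here,
    # exactly as in the original code.
    return re.sub(r'\bhttps?://%s\b' % siteurl, "", line, 1)


def process(lines):
    # Stand-in for the script's main loop.
    return [rewrite_line(line) for line in lines]


def profile(func, *args):
    # Run func under cProfile and return its result plus a text report
    # of the ten most expensive entries by cumulative time.
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats('cumulative').print_stats(10)
    return result, out.getvalue()


if __name__ == '__main__':
    _, report = profile(process, [SAMPLE] * 100000)
    print(report)
```

If you'd rather not touch the script at all, running it as "python -m cProfile -s cumulative yourscript.py" gives the same kind of table for the whole program. Either way, look at where the cumulative time actually goes before rewriting anything.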