On Apr 11, 2005, at 8:00 AM, Joshua Ginsberg wrote:

On Apr 10, 2005, at 4:14 PM, Bob Ippolito wrote:


On Apr 10, 2005, at 2:46 PM, Joshua Ginsberg wrote:

I'm writing some Python code to analyze my mail logs. I took a 10,000-line snippet from them (the full files are usually about 5-6 million lines) to test my code with. I'm developing it on a 1.2GHz PowerBook G4 with 1.25GB of RAM and the Apple-distributed Python*, and I tested my code on the 10,000-line snippet. It took 2 minutes and 10 seconds to process that snippet. Way too slow -- I'd be looking at about 20 hours to process a single daily log file.

Just for fun, I copied the same code and the same log snippet to a dual-proc P3 500MHz machine running Fedora Core 2* with 1GB of RAM and tested it there. This machine provides web services and domain control for my network, so it's moderately utilized. The same code took six seconds to execute.

Granted, I've got the GUI and all of that bogging down my Mac. However, nothing else was competing for CPU cycles, and I had 700MB of RAM free when I ran my tests. Even so, what would account for such a wide, wide, wide variation in the time required to process the data file? The code is 90% regular expressions and string finds.

* versions are:
Python 2.3 (#1, Sep 13 2003, 00:49:11)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
and
Python 2.3.3 (#1, May  7 2004, 10:31:40)
[GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2
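Since the workload is described as mostly regular expressions and string finds, one way to compare the two machines is a micro-benchmark of just those two operations, run under each interpreter. This is a minimal sketch for modern Python 3, not the poster's code; SAMPLE, CONNECT_RE, with_regex and with_find are hypothetical stand-ins for the real log lines and patterns.

```python
import re
import timeit

# Hypothetical mail-log line; real lines and patterns will differ.
SAMPLE = ("Apr 10 14:46:01 mail postfix/smtpd[1234]: "
          "connect from unknown[192.0.2.10]")

# Compile once at module level and reuse the pattern object on every line.
CONNECT_RE = re.compile(r"connect from (\S+)\[([\d.]+)\]")


def with_regex(line=SAMPLE):
    """Extract the client IP with a precompiled regular expression."""
    m = CONNECT_RE.search(line)
    return m.group(2) if m else None


def with_find(line=SAMPLE):
    """Extract the same IP with plain string searches (last [...] pair)."""
    start = line.rfind("[")
    end = line.find("]", start)
    if start == -1 or end == -1:
        return None
    return line[start + 1:end]


if __name__ == "__main__":
    n = 100_000
    for fn in (with_regex, with_find):
        secs = timeit.timeit(fn, number=n)
        print(f"{fn.__name__}: {secs / n * 1e6:.2f} us per call")
```

If the per-call numbers differ between the two machines by roughly the same factor as the whole script, the slowdown lies in these primitives; if not, it lies elsewhere in the code.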

Try it with a newer version of Python on Mac OS X. I had a similar problem, and it turned out to be Python 2.3.0's fault. Specifically, the implementation of the datetime module's parser was really, really, really stupid and slow in early versions of Python 2.3.
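One way to test this theory in isolation is to time the date parsing by itself and extrapolate to a full day's log: if parsing is the bottleneck, its per-call cost multiplied by the line count should account for most of the runtime. This is a minimal sketch for modern Python 3, assuming timestamps shaped like the hypothetical STAMP below; parse_manual is a swapped-in, hand-rolled alternative for comparison, not something either poster used.

```python
import re
import time
import timeit

# Hypothetical timestamp. Syslog-style mail logs omit the year, so one is
# prepended here (recent Pythons warn about parsing a day of month with no year).
STAMP = "2005 Apr 10 14:46:01"
FMT = "%Y %b %d %H:%M:%S"

# Assumes English month abbreviations, which %b matches in the default C locale.
MONTHS = {name: num for num, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}
STAMP_RE = re.compile(r"(\d{4}) (\w{3}) (\d{1,2}) (\d{2}):(\d{2}):(\d{2})$")


def parse_strptime(s=STAMP):
    """Parse with the library routine; returns (Y, M, D, h, m, s)."""
    return tuple(time.strptime(s, FMT)[:6])


def parse_manual(s=STAMP):
    """Hypothetical alternative: one precompiled regex plus int() calls."""
    y, mon, d, hh, mm, ss = STAMP_RE.match(s).groups()
    return (int(y), MONTHS[mon], int(d), int(hh), int(mm), int(ss))


if __name__ == "__main__":
    n = 50_000
    lines_per_day = 5_500_000  # midpoint of the 5-6 million lines per file
    for fn in (parse_strptime, parse_manual):
        per_call = timeit.timeit(fn, number=n) / n
        print(f"{fn.__name__}: {per_call * 1e6:.2f} us/call, "
              f"~{per_call * lines_per_day:.0f} s per daily log")
```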


Well, I compiled a fresh version of Python 2.3.5 from python.org to test the datetime theory... and I'm still getting 150sec execution times. :-/ I'm gonna test the string vs. strop now...

Use Python's profiling tools and/or Apple's Shark to see what's slow.
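As a sketch of the profiling route, the snippet below wraps a hypothetical analyze() function (a stand-in for the real log-analysis code) in cProfile and prints the ten most expensive calls by cumulative time. It targets modern Python 3; cProfile first shipped with Python 2.5, so on the 2.3 interpreters in this thread the older pure-Python profile module (for example profile.run) fills the same role, with more overhead.

```python
import cProfile
import pstats
import re

# Hypothetical pattern: counts Postfix log lines per daemon name.
LINE_RE = re.compile(r"postfix/(\w+)\[\d+\]")


def analyze(lines):
    """Hypothetical stand-in for the real log-analysis code."""
    counts = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[m.group(1)] = counts.get(m.group(1), 0) + 1
    return counts


if __name__ == "__main__":
    sample = ["Apr 10 14:46:01 mail postfix/smtpd[1234]: "
              "connect from unknown[192.0.2.10]"] * 10_000

    profiler = cProfile.Profile()
    profiler.enable()
    analyze(sample)
    profiler.disable()

    # Sort by time spent in each function including its callees.
    stats = pstats.Stats(profiler).sort_stats("cumulative")
    stats.print_stats(10)
```

From a shell, `python -m cProfile -s cumulative yourscript.py` produces a similar report without modifying the code (yourscript.py is a placeholder name). Running the same profile on both machines shows which functions account for the difference.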

-bob

--
http://mail.python.org/mailman/listinfo/python-list
