On Apr 11, 2005, at 8:00 AM, Joshua Ginsberg wrote:
On Apr 10, 2005, at 4:14 PM, Bob Ippolito wrote:
On Apr 10, 2005, at 2:46 PM, Joshua Ginsberg wrote:
I'm writing some Python code to do some analysis of my mail logs. I
took a 10,000-line snippet from them (the full files are usually
about 5-6 million lines) to test my code with. I'm developing it on
a PowerBook G4 1.2GHz with 1.25GB of RAM and the Apple-distributed
Python*, and I tested my code on the 10,000-line snippet. It took 2
minutes and 10 seconds to process that snippet. Way too slow --
I'd be looking at about 20 hours to process a single daily log
file.
Just for fun, I copied the same code and the same log snippet to a
dual-proc P3 500MHz machine running Fedora Core 2* with 1GB of RAM
and tested it there. This machine provides web services and domain
control for my network, so it's moderately utilized. The same code
took six seconds to execute.
Granted, I've got the GUI and all of that bogging down my Mac.
However, I had nothing else fighting for CPU cycles, and 700MB of
RAM was free when my testing was done. Even so, what would account
for such a wide, wide, wide variation in the time required to
process the data file? The code is 90% regular expressions and
string finds.
* versions are:
Python 2.3 (#1, Sep 13 2003, 00:49:11)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
and
Python 2.3.3 (#1, May 7 2004, 10:31:40)
[GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2
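The kind of per-line processing described (90% regular expressions and string finds) might look roughly like this. This is only an illustrative sketch: the syslog-style format, the regex, and the field names are invented, not taken from the original code.

```python
import re

# Hypothetical mail-log line: "Mon DD HH:MM:SS host daemon[pid]: message"
# Groups: timestamp, host, daemon name, pid, rest of the message.
LINE_RE = re.compile(r'^(\w{3} +\d+ [\d:]+) (\S+) (\w+)\[(\d+)\]: (.*)$')

def process(lines):
    """Count log entries per daemon (e.g. smtpd, qmgr)."""
    counts = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that don't match the expected format
        daemon = m.group(3)
        counts[daemon] = counts.get(daemon, 0) + 1
    return counts

sample = [
    "Apr 10 14:46:01 mailhost smtpd[1234]: connect from unknown",
    "Apr 10 14:46:02 mailhost qmgr[987]: A1B2C3: removed",
]
print(process(sample))  # {'smtpd': 1, 'qmgr': 1}
```

Compiling the regex once outside the loop, as above, matters at this scale; re-compiling per line would add noticeable overhead over millions of lines.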
Try it with a newer version of Python on Mac OS X. I had a similar
problem, and it turned out to be Python 2.3.0's fault.
Specifically, the implementation of the datetime module's parser was
really, really, really stupid and slow in early versions of Python
2.3.
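For what it's worth, the usual workaround when timestamp parsing dominates is to skip format-string parsing entirely and slice the string, since log timestamps have a fixed layout. A hedged sketch (the timestamp format here is assumed, not taken from the original logs):

```python
from datetime import datetime
import timeit

stamp = "2005-04-10 14:46:01"

def parse_strptime(s):
    # Convenient but comparatively slow: interprets the format every call.
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")

def parse_manual(s):
    # Fixed-width slicing straight into the datetime constructor.
    return datetime(int(s[0:4]), int(s[5:7]), int(s[8:10]),
                    int(s[11:13]), int(s[14:16]), int(s[17:19]))

assert parse_strptime(stamp) == parse_manual(stamp)

t1 = timeit.timeit(lambda: parse_strptime(stamp), number=10000)
t2 = timeit.timeit(lambda: parse_manual(stamp), number=10000)
print("strptime: %.3fs  manual: %.3fs" % (t1, t2))
```

On a fixed-format log processed millions of lines at a time, the manual version is typically several times faster.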
Well, I compiled a fresh version of Python 2.3.5 from python.org to
test the datetime theory... and I'm still getting 150sec execution
times. :-/ I'm gonna test the string vs. strop now...
Use Python's profiling tools and/or Apple's Shark to see what's slow.
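A minimal sketch of that profiling advice using the stdlib profiler; `hot_loop` here is a stand-in for the real log-processing function, not the original code:

```python
import cProfile
import pstats
import io
import re

def hot_loop():
    # Stand-in workload: regex search over many short strings.
    pat = re.compile(r'\d+')
    total = 0
    for i in range(10000):
        m = pat.search("line %d of the log" % i)
        if m:
            total += int(m.group(0))
    return total

profiler = cProfile.Profile()
result = profiler.runcall(hot_loop)

# Print the five most expensive entries by cumulative time.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

The per-function breakdown usually makes it obvious whether the time is going into regex matching, string handling, or something unexpected like datetime parsing.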
-bob
--
http://mail.python.org/mailman/listinfo/python-list