It's often said that you shouldn't try to guess what's slow, but should use profiling tools to measure it. I had a great example of that yesterday. We have some web server code that does a big database (MongoDB) query and some post-processing of the data in Python. It worked fine in testing, but in production we started seeing slowness.
We had one query which was taking about 15 seconds to return 8000 values. We looked at the post-processing code and convinced ourselves everything was O(n log n) or better. We looked at the database and convinced ourselves that all the required indexes existed. So, I pulled out cProfile.

It turned out that fully 50% of the time was being spent in the gettz() calls in this routine:

    def to_timestamp(d):
        """Convert d into a unix timestamp.

        d is assumed to be in UTC, as the pymongo module returns datetimes
        based in UTC, but without actual timezone information attached.
        """
        d = d.replace(tzinfo=dateutil.tz.gettz('UTC'))
        d = d.astimezone(dateutil.tz.gettz())
        return time.mktime(d.timetuple()) + (d.microsecond / 1e6)

which was being called once per returned value from the database. Factoring out the gettz() calls got us a 2x speedup in page load time. Never would have found that without profiling.

Profiling also showed that most of the rest of the time was spent in the BSON deserialization routines. We found a way to reduce the amount of data being returned, and eliminated most of that too. If we'd stared at the code long enough we probably would have realized that on our own, but we never would have guessed the gettz() calls were so expensive.

-- http://mail.python.org/mailman/listinfo/python-list
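For readers who haven't used it, running a piece of code under cProfile and sorting the report by cumulative time takes only a few lines. This is a minimal sketch, not the poster's actual setup; post_process here is a hypothetical stand-in for the real post-processing step:

    import cProfile
    import pstats


    def post_process(rows):
        # Hypothetical stand-in for the real post-processing step.
        return [r * 2 for r in rows]


    rows = list(range(8000))

    # Run the work under the profiler, then print the functions that
    # account for the most cumulative time.
    profiler = cProfile.Profile()
    profiler.enable()
    post_process(rows)
    profiler.disable()

    pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)

Sorting by cumulative time is what surfaces a cheap-looking helper such as gettz() when it happens to be called thousands of times.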
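The post doesn't show the fixed routine, but factoring out the gettz() calls presumably amounts to looking the timezone objects up once at module level and reusing them on every call. A sketch along those lines, assuming the same dateutil/pymongo setup as the original:

    import time

    import dateutil.tz

    # Look the timezones up once; gettz() can be expensive per call (it may
    # have to locate and parse timezone data), so cache the results here.
    UTC = dateutil.tz.gettz('UTC')
    LOCAL = dateutil.tz.gettz()


    def to_timestamp(d):
        """Convert d into a unix timestamp.

        d is assumed to be in UTC, as the pymongo module returns datetimes
        based in UTC, but without actual timezone information attached.
        """
        d = d.replace(tzinfo=UTC)
        d = d.astimezone(LOCAL)
        return time.mktime(d.timetuple()) + (d.microsecond / 1e6)

Hoisting an expensive lookup out of a function that runs once per row is a small change, which is exactly why it's so easy to miss without a profile pointing at it.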