First, this thread already touches on the issue (I found it after doing my own profiling with guppy/heapy): http://groups.google.com/group/web2py/browse_thread/thread/534961af8e972300/212d7a59f2a7f239
The general idea is that all SQL statements are stored as-is in db._timings. This isn't an issue, except if:

A) You are running very, very large SQL statements. I'm uploading lots of files and storing them in the DB, which triggers this to a lesser extent.

B) You are running a "background job" as detailed in the book. This includes queues, etc.

Basically, if you keep using the same DB object for a considerable period of time, or if you're doing large inserts in a batch/background job, this issue will probably bite you.

Here's my setup: I'm putting through roughly 60k transactions in just under 3 minutes, across 4 servers running 3 instances of web2py each. Performance is staggeringly slow (for the traffic I'm pushing through) if I use the DAL directly, so I'm using redis as a "transactional" storage place. I've got batch jobs set up to pull out of redis and sync with the historical database. In addition to the pure transactions, I'm also pushing roughly 25k recordings (wav format, about 3-8 seconds each) into the database. (Using a shared drive isn't an option for several reasons, and believe it or not, storing the recordings in the DB as a blob field actually works pretty well.) My batch jobs run for 6 hours before restarting themselves. There's one additional machine for the batch jobs, running 1 job per type, 3 jobs total (one per core, with one core free). The jobs are configured as cron jobs in web2py. I've built a redis-based locking system to prevent multiple job spawns and to restart a job within a minute if it stops processing.

The problem: over a batch job's run, db._timings stores the text of every SQL statement passed through. Unfortunately, that text includes the content of blob fields as well as some very large (>5k) logging archive fields, etc. This means that within a few minutes I have multiple hundreds of MB sitting in memory for data that has already been inserted into the DB.

The solution (interim, plus a suggestion for the code): I have a thread in each batch job that maintains lock states, manages job queues, and syncs various minor systems. It also makes sure that db.commit gets called at least every 5 seconds (to ensure the data is available within a reasonable time frame and to limit data loss in case of a crash). Adding "db._timings = []" right after the db.commit call keeps long-term memory use for DB access down (see the sketch below).

Suggestion for the code: why wouldn't db.commit clear db._timings within the DAL itself? I'm not 100% sure why _timings is there in the first place, other than for debugging.
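For reference, here's roughly what my maintenance thread does. This is a simplified sketch, not the exact code from my jobs (the function and variable names are just illustrative):

    import threading
    import time

    def db_maintenance(db, interval=5):
        # Commit pending work every `interval` seconds so data becomes
        # visible quickly, then drop the accumulated SQL log (which
        # includes the full text of blob inserts) so memory stays flat.
        while True:
            time.sleep(interval)
            try:
                db.commit()
            except Exception:
                db.rollback()
            db._timings = []

    # started once at the top of the batch job:
    worker = threading.Thread(target=db_maintenance, args=(db,))
    worker.daemon = True
    worker.start()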
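And a rough sketch of what I'm suggesting for the DAL: commit clears the log as a side effect. Shown here as a monkey-patch purely to illustrate the idea; the real change would just be one extra line inside the DAL's own commit:

    # Illustrative only: wrap the existing commit so every commit also
    # clears the stored SQL log.
    _original_commit = db.commit

    def commit_and_clear(*args, **kwargs):
        result = _original_commit(*args, **kwargs)
        db._timings = []
        return result

    db.commit = commit_and_clear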