Can you submit an issue? http://code.google.com/p/web2py/issues/list
On Tuesday, October 18, 2011 10:56:56 AM UTC-4, Dragonfyre13 wrote:
> First, this post actually already talks about it (found after doing my
> own profiling with guppy/heapy):
> http://groups.google.com/group/web2py/browse_thread/thread/534961af8e972300/212d7a59f2a7f239
>
> The general idea is that all SQL statements are stored as-is in
> db._timings. This isn't an issue unless:
> A) You are running very large SQL statements. I'm uploading lots of
> files and storing them in the DB, which triggers this to a lesser
> extent.
> B) You are running a "background job" as detailed in the book. This
> includes queues, etc. Basically, if you keep using the same DB object
> for any reason for a considerable period of time, or if you're doing
> large inserts in a batch/background job, this issue will probably
> bite you.
>
> Here's my setup:
> I'm putting through roughly 60k transactions in just under 3 minutes,
> across 4 servers running 3 instances of web2py each. Performance is
> staggeringly slow (for the traffic I'm pushing through) if I use the
> DAL, so I'm using redis as a "transactional" storage place. I have
> batch jobs set up to pull out of redis and sync with the historical
> database. In addition to the pure transactions, I'm also pushing
> roughly 25k recordings (WAV format, about 3-8 seconds each) into the
> database. (Using a shared drive isn't an option for several reasons,
> and believe it or not, storing the recordings in the DB as a blob
> field actually works pretty well.)
> My batch jobs run for 6 hours before restarting themselves. There's
> one additional machine for the batch jobs, running one job per type,
> 3 jobs total (one per core, with one core free). The jobs are
> configured as cron jobs in web2py. I've created a redis-based locking
> system to prevent multiple job spawns and to restart a job within a
> minute if it stops processing.
>
> The problem:
> Within the batch job's run time, db._timings stores the content of
> every SQL statement passed through. Unfortunately, this includes the
> content of blob fields as well as some very large (>5k) logging
> archive fields, etc. This means that within a few minutes I have
> multiple hundreds of MB in memory that have already been inserted
> into the DB.
>
> The solution (interim, and a suggestion for the code):
> I have a thread in each batch job that maintains lock states, manages
> job queues, and syncs various minor systems. It also makes sure that
> db.commit() gets called at least every 5 seconds (to ensure the data
> is available within a reasonable time frame and to limit data loss in
> case of a crash). Adding "db._timings = []" to that db.commit step
> reduces the amount of memory used for DB access long-term. My
> suggestion for the code: why wouldn't db.commit clear db._timings
> within the DAL itself? I'm not 100% sure why _timings is there in the
> first place, other than for some debugging.
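For context on the problem paragraph above: as far as I can tell, db._timings in the web2py DAL is a plain Python list of (sql, seconds) tuples appended on every execute, so its footprint can be eyeballed directly. A rough sketch, assuming that structure:

    # Rough sketch: see how much raw SQL text db._timings is holding.
    # Assumes db is the DAL instance from the application's model files
    # and that _timings is a list of (sql_string, seconds) tuples.
    statements = len(db._timings)
    total_bytes = sum(len(sql) for sql, dt in db._timings)
    print('%d statements, %d bytes of SQL text retained' % (statements, total_bytes))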
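Here is a minimal sketch of the interim workaround described in the quoted post: a maintenance thread that commits at least every few seconds and resets db._timings in the same step. The connection URI, table name, and 5-second interval are placeholders, and the import path is the one used by web2py around that era; treat the details as assumptions rather than a drop-in implementation.

    import threading
    import time
    from gluon.dal import DAL, Field   # web2py ~1.99 layout; adjust if different

    db = DAL('postgres://user:pass@localhost/history')      # placeholder URI
    db.define_table('recording', Field('payload', 'blob'))  # placeholder table

    def maintenance(db, interval=5):
        """Commit at least every `interval` seconds so data is visible and
        crash exposure is bounded, and drop the logged SQL text so blob
        inserts do not pile up in memory via db._timings."""
        while True:
            time.sleep(interval)
            db.commit()
            db._timings = []   # the interim fix from the post

    worker = threading.Thread(target=maintenance, args=(db,))
    worker.daemon = True   # let the batch job exit without joining the thread
    worker.start()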
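And a sketch of the suggestion at the end, shown from application code rather than as a change to the DAL itself: a small wrapper so every commit also trims the timing log. The helper name commit_and_clear and the table/field names in the usage comment are invented here for illustration; the DAL's actual commit does not do this.

    def commit_and_clear(db):
        """Commit the current transaction, then discard the accumulated
        (sql, seconds) entries so a long-running job stays flat on memory."""
        db.commit()
        del db._timings[:]   # clear in place, in case other code holds a reference
        return db

    # usage inside a batch loop (hypothetical table/field names):
    # for item in pending_items:
    #     db.recording.insert(payload=item.wav_bytes)
    #     commit_and_clear(db)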