Can you submit an issue: http://code.google.com/p/web2py/issues/list

On Tuesday, October 18, 2011 10:56:56 AM UTC-4, Dragonfyre13 wrote:
>
> First, this thread actually already covers it (I found it after doing my 
> own profiling with guppy/heapy): 
>
> http://groups.google.com/group/web2py/browse_thread/thread/534961af8e972300/212d7a59f2a7f239
>  
>
> The general idea is that every SQL statement is stored verbatim in 
> db._timings. That isn't a problem, except if 
> A) you are executing very, very large SQL statements. I'm uploading lots 
> of files and storing them in the DB, which triggers this to a lesser 
> extent; or 
> B) you are running a "background job" as described in the book. This 
> includes queues, etc. Basically, if you keep using the same DB object 
> for any reason for a considerable period of time, or if you're doing 
> large inserts in a batch/background job, this issue will probably bite 
> you (see the sketch below). 
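>
> Roughly what that looks like, as a sketch (the standalone DAL import 
> path and the exact layout of the _timings entries vary a bit between 
> web2py versions): 
>
>     from gluon.dal import DAL, Field   # import path differs by version
>
>     db = DAL('sqlite://test_timings.sqlite')
>     db.define_table('recording', Field('payload', 'blob'))
>
>     # stand-in for the wav uploads in the real batch job
>     chunks = ['x' * 500000 for i in range(50)]
>
>     for chunk in chunks:
>         db.recording.insert(payload=chunk)
>         db.commit()        # commits fine, but _timings keeps growing
>
>     # one entry per statement executed, full INSERT text included
>     print(len(db._timings))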
>
> Here's my setup: 
> I'm pushing roughly 60k transactions through in just under 3 minutes, 
> across 4 servers running 3 instances of web2py each. Performance is 
> staggeringly slow (for the traffic I'm pushing) if I go through the 
> DAL directly, so I'm using redis as a "transactional" storage place. 
> I've got batch jobs set up to then pull out of redis and sync with the 
> historical database. In addition to the pure transactions being 
> stored, I'm also pushing roughly 25k recordings, wav format, about 3-8 
> seconds each, into the database. (Using a shared drive isn't an option 
> for several reasons, and believe it or not, storing the recordings in 
> the DB as a blob field actually works pretty well.) 
> My batch jobs run for 6 hours before restarting themselves. There's 
> one additional machine for the batch jobs, running 1 job per type, 3 
> jobs total (one per core, with one core free). The jobs are configured 
> as cron jobs in web2py. I've created a redis-based locking system to 
> prevent multiple job spawns and to restart a job within a minute if it 
> stops processing (see the sketch below). 
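>
> The locking piece is roughly this (a sketch using redis-py; the key 
> name and TTL here are made up, and the real system does more 
> bookkeeping): 
>
>     import os
>     import socket
>     import redis
>
>     r = redis.Redis()                  # assumes a local redis instance
>     LOCK_KEY = 'batchjob:sync:lock'    # hypothetical key name
>     LOCK_TTL = 60                      # restart window of about a minute
>
>     def acquire_lock():
>         token = '%s:%s' % (socket.gethostname(), os.getpid())
>         # SETNX + EXPIRE: only one instance of a job type holds the
>         # lock; if the holder dies, the key expires and another
>         # instance can take over within a minute (not atomic, but
>         # good enough as an illustration)
>         if r.setnx(LOCK_KEY, token):
>             r.expire(LOCK_KEY, LOCK_TTL)
>             return True
>         return False
>
>     def refresh_lock():
>         # called from the job's maintenance thread while it's healthy
>         r.expire(LOCK_KEY, LOCK_TTL)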
>
> The problem: 
> Over the run time of a batch job, db._timings stores the full text of 
> every SQL statement passed through the connection. Unfortunately, that 
> includes the contents of blob fields as well as some very large (>5k) 
> logging/archive fields. This means that within a few minutes I have 
> hundreds of MB of already-inserted data sitting in memory. 
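>
> You can see the footprint from a shell session (assuming the 
> (sql, seconds) tuple layout of the _timings entries): 
>
>     # total bytes of SQL text held by this connection so far
>     print(sum(len(sql) for sql, dt in db._timings))
>
>     # guppy/heapy shows the same thing: str objects dominating the heap
>     from guppy import hpy
>     print(hpy().heap())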
>
> The solution (interim, plus a suggestion for the code): 
> Each batch job has a thread that maintains lock state, manages job 
> queues, and syncs various minor systems. It also makes sure that 
> db.commit() gets called at least every 5 seconds (to ensure the data is 
> available within a reasonable time frame and to limit data loss in case 
> of a crash). Adding "db._timings = []" next to that db.commit() call 
> keeps the memory used for DB access flat over the long term. My 
> suggestion for the code: why wouldn't db.commit() clear db._timings 
> within the DAL? I'm not 100% sure why _timings is there in the first 
> place, other than for some debugging.
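>
> The interim fix, in code, is roughly this (a sketch given the job's db 
> object; the real maintenance thread also handles the lock refresh and 
> queue syncing described above): 
>
>     import threading
>     import time
>
>     def maintenance_loop(db, interval=5):
>         while True:
>             db.commit()       # flush work at least every 5 seconds
>             db._timings = []  # drop accumulated SQL text; memory stays flat
>             time.sleep(interval)
>
>     t = threading.Thread(target=maintenance_loop, args=(db,))
>     t.daemon = True           # don't block job shutdown/restart
>     t.start()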
