Luis, the problem of churning through huge amounts of data can be handled very well in Python. The same task can be very slow with the wrong methods and amazingly fast with the right mix of them (e.g. doing the work row by row in plain 'for' loops adds a large overhead).
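For example, here is a toy, made-up illustration of the kind of pattern the links below discuss (building one big string out of many small pieces):

    # made-up example: concatenating many small strings
    pieces = [str(i) for i in range(100000)]

    # slow pattern: += inside a for loop may re-copy the growing string each time
    slow = ''
    for p in pieces:
        slow += p

    # fast pattern: build the list first, then join it in a single pass
    fast = ''.join(pieces)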
I have gone through the same situation before. At that time, I found the following links very helpful and they let me optimize my code to a large extent:

http://wiki.python.org/moin/PythonSpeed/PerformanceTips
http://www.python.org/doc/essays/list2str.html
http://www.skymind.com/~ocrow/python_string/

Maybe I can throw some more light if you show your actual code.

--Vineet

On Jul 23, 8:00 am, Luis Goncalves <lgoncal...@gmail.com> wrote:
> I am trying to use web2py's DAL for a project that is not a webapp.
>
> I have a database with about 8M entries, and I want to process the data.
>
> Suppose I create a query that returns a lot of results
> (extreme case example):
>
> q = db.table.id>0
>
> How do I iterate through all the results of a large query, q,
> *without* having to retrieve all the data to memory?
>
> Is there something better than:
>
> # q = a db query that returns a huge number of results
> n = q.count()
> s = db(q)
>
> for i in range(1, n):
>     r = s.select(limitby=(i, 1)).first()
>     # do something with r
>     ...
>
> I've tried this out (interactively, to see what is happening),
> and when I get to i=2,
>
> s.select(limitby=(2, 1)).first()
>
> the computer starts to swap and hangs for minutes.
>
> So my question, again, is:
>
> Is there an efficient way to iterate through a large query (or set = db(query))
> that avoids overloading the system memory?
>
> Thanks,
> Luis.
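Regarding the limitby loop in the quoted code above: one row per select means one database round trip per row. For what it's worth, a rough, untested sketch of what I would try first is fetching in fixed-size batches (web2py DAL; the table/field names and batch size here are just placeholders, not your actual schema):

    # rough sketch: pull rows in batches instead of one select per row
    query = db.table.id > 0
    batch = 1000                      # rows fetched per database round trip
    offset = 0
    while True:
        # limitby=(min, max) asks the backend for only this slice,
        # so at most 'batch' rows are held in memory at a time
        rows = db(query).select(orderby=db.table.id,
                                limitby=(offset, offset + batch))
        if not rows:
            break
        for r in rows:
            # do something with r
            pass
        offset += batch

Keeping an explicit orderby makes the paging deterministic; without it the backend is free to return rows in a different order on each select, so batches could overlap or skip rows.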