Luis,
Churning through huge amounts of data is something Python handles very well.
The task can be painfully slow with the wrong approach, yet surprisingly
fast with the right mix of techniques (explicit 'for' loops, for example,
add a lot of overhead).
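For example (just a toy illustration, nothing to do with your actual data):

    values = range(8000000)

    # slower: the loop body runs as Python bytecode for every element
    total = 0
    for v in values:
        total += v

    # faster: the iteration happens inside the C implementation of sum()
    total = sum(values)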

I have been in the same situation before.
Back then, the following links helped me a lot and let me optimize my
code considerably:
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
http://www.python.org/doc/essays/list2str.html
http://www.skymind.com/~ocrow/python_string/
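
The string-related ones basically boil down to this pattern (simplified
sketch):

    pieces = [str(i) for i in range(100000)]

    # slow pattern: each += copies the whole growing string again
    text = ''
    for p in pieces:
        text += p

    # recommended pattern: collect the pieces and join them once
    text = ''.join(pieces)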

Maybe I can throw some more light if you show your actual code.
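
For the iteration problem below, one option is to pull the rows in
fixed-size batches with limitby instead of issuing one select per row, so
only one batch sits in memory at a time. A rough sketch (the helper name
and batch size are just placeholders):

    CHUNK = 1000

    def iter_rows(dbset, chunk=CHUNK):
        """Yield rows of a DAL set one batch at a time."""
        offset = 0
        while True:
            rows = dbset.select(limitby=(offset, offset + chunk))
            if not rows:
                break
            for row in rows:
                yield row
            if len(rows) < chunk:
                break
            offset += chunk

    # usage with your example query:
    # q = db.table.id > 0
    # for row in iter_rows(db(q)):
    #     ...  # do something with row

Depending on the backend you may also want to pass an explicit orderby
(e.g. orderby=db.table.id) so consecutive batches don't overlap.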

--Vineet

On Jul 23, 8:00 am, Luis Goncalves <lgoncal...@gmail.com> wrote:
> I am trying to use web2py's DAL for a project that is not a webapp.
>
> I have a database with about 8M entries, and I want to process the data.
>
> Suppose I create a query that returns a lot of results:
> extreme case example:
>
> q = db.table.id>0
>
> How do I iterate through all the results of a large query, q, *without*
> having to retrieve all the data to memory?
>
> Is there something better than:
>
> # q = a db query that returns a huge number of results
> n = q.count()
> s = db(q)
>
> for i in range(1,n):
>     r = s.select( limitby=(i,1)).first()
>     # do something with r
>     ...
>
> I've tried this out (interactively, to see what is happening),
> and when I get to i=2,
>
> s.select( limitby=(2,1)).first()
>
> the computer starts to swap and hangs for minutes.
>
> So my question, again, is:
>
> Is there an efficient way to iterate through a large query (or set = db(query))
> that avoids overloading the system memory?
>
> Thanks,
> Luis.
