You never want to do this:
# inefficient: one SELECT (one round trip) per row
s = db(q)
n = s.count()
for i in range(n):
   r = s.select(limitby=(i, 1)).first()
   # do something with r

Instead, consider something like this:

i, m = 0, 1000   # m is the batch size
while True:
   rows = db(q).select(limitby=(i*m, (i+1)*m))
   for r in rows:
      pass  # do something with r
   if len(rows) < m:
      break
   i += 1
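
If you need this in more than one place, the same pattern can be wrapped in a
generator. This is only a sketch: the name iterrows, the batch_size default,
and the optional orderby argument are mine, not part of the DAL. Passing a
unique field as orderby (e.g. db.table.id) keeps the paging stable across
SELECTs:

def iterrows(dbset, batch_size=1000, orderby=None):
   """Yield rows from a Set (e.g. db(q)) one batch at a time."""
   i = 0
   while True:
      # limitby=(min, max) are absolute row offsets in the DAL
      kwargs = dict(limitby=(i*batch_size, (i+1)*batch_size))
      if orderby is not None:
         kwargs['orderby'] = orderby
      rows = dbset.select(**kwargs)
      for r in rows:
         yield r
      if len(rows) < batch_size:
         break
      i += 1

Then "for r in iterrows(db(q), orderby=db.table.id): ..." reads like an
ordinary loop but still fetches only batch_size rows per query.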


On Jul 22, 10:00 pm, Luis Goncalves <lgoncal...@gmail.com> wrote:
> I am trying to use web2py's DAL for a project that is not a webapp.
>
> I have a database with about 8M entries, and I want to process the data.
>
> Suppose I create a query that returns a lot of results. An extreme-case
> example:
>
> q = db.table.id>0
>
> How do I iterate through all the results of a large query, q, *without*
> having to retrieve all the data to memory?
>
> Is there something better than:
>
> # q = a db query that returns a huge number of results
> n = q.count()
> s = db(q)
>
> for i in range(1,n):
>    r = s.select( limitby=(i,1)).first()
>    # do something with r
> ...
>
> I've tried this out (interactively, to see what is happening),
> and when I get to i=2,
>
> s.select( limitby=(2,1)).first()
>
> the computer starts to swap and hangs for minutes.
>
> So my question, again, is:
>
> Is there an efficient way to iterate through a large query (or set = db(query))
> that avoids overloading the system memory?
>
> Thanks,
> Luis.
