You never want to do this:

    n = q.count()
    s = db(q)
    for i in range(1, n):
        r = s.select(limitby=(i, 1)).first()
        # do something with r
Instead, consider something like this:

    i, m = 0, 1000
    while True:
        rows = db(q).select(limitby=(i * m, (i + 1) * m))
        for r in rows:
            # do something with r
            ...
        if len(rows) < m:
            break
        i += 1

On Jul 22, 10:00 pm, Luis Goncalves <lgoncal...@gmail.com> wrote:
> I am trying to use web2py's DAL for a project that is not a webapp.
>
> I have a database with about 8M entries, and I want to process the data.
>
> Suppose I create a query that returns a lot of results;
> an extreme-case example:
>
> q = db.table.id > 0
>
> How do I iterate through all the results of a large query, q,
> *without* having to retrieve all the data into memory?
>
> Is there something better than:
>
> # q = a db query that returns a huge number of results
> n = q.count()
> s = db(q)
>
> for i in range(1, n):
>     r = s.select(limitby=(i, 1)).first()
>     # do something with r
>     ...
>
> I've tried this out (interactively, to see what is happening),
> and when I get to i=2,
>
> s.select(limitby=(2, 1)).first()
>
> the computer starts to swap and hangs for minutes.
>
> So my question, again, is:
>
> Is there an efficient way to iterate through a large query
> (or set = db(query)) that avoids overloading the system memory?
>
> Thanks,
> Luis.
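If you need this in more than one place, the same paging loop can be wrapped in a generator so the calling code just iterates over rows. This is only a sketch of the idea above, assuming `db` and `q` are the DAL connection and query from the example; the helper name `iter_rows`, the page size, and `process` are illustrative, not part of the DAL:

    def iter_rows(db, q, pagesize=1000):
        # Yield rows of db(q) one page at a time, so at most
        # `pagesize` rows are held in memory at once.
        i = 0
        while True:
            # limitby=(offset, upper) fetches one page of results
            rows = db(q).select(limitby=(i * pagesize, (i + 1) * pagesize))
            for r in rows:
                yield r
            if len(rows) < pagesize:
                break
            i += 1

    # usage (table name taken from the example above):
    # for r in iter_rows(db, db.table.id > 0):
    #     process(r)

One thing to keep in mind: without an explicit orderby the database is free to return rows in any order, so for deterministic paging you probably want to add something like orderby=db.table.id to the select.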