@pbreit : if I'm not mistaken, the "parse()" function in DAL handles a lot ... basically read just any value returned from the database to the correct type (blob, integer, double, datetime, .... list:*, just to say a few), handles virtualfields and prepares the Rows object making the Storage(). All the resultset is fetched, "parsed()" line by line, column by column, and returned as a whole list. Also if operations were to be kept at a minimum, building a simple list of 1 million record take some time, (and memory). Obviously the first shortcut is use limitby. The second is selecting only the needed columns (best practice anyway to limit the amount of wire transfer). No other shoutcuts.
@all: Giving that a million record is hardly displayed, SQLTABLEd, SQLFORM.gridded, pickled, jsoned, etc.....I think the "need" here is having a Storage() object instead of a tuple for accessing the values as row.id instead of row[0], etc as in current executesql. I see two roads (to be considered as 1 AND 2 and also as 1 OR 2) here, but I don't know the implications of having to deal with SQLFORM.grid, for instance. 1) skip the parse() and return a quasi-rows object "a la executesql", just turning the list of tuples as list of row objects, skipping virtual fields, references, etc. 2) approach the returned values as a generator, and optionally as an only-forward iterator "a la web.py iterbetter()" (https://github.com/webpy/webpy/blob/master/web/db.py#L646), i.e. you can fetch the results and cycle through them only one time Downsides for method 1): - some types could not be converted at all, references would be unavailable (but for million records you should anyway have done a join), no virtualfields - building a list of million records still requires time - for large datasets having to work on them only "after" all the list is prepared could be taking some time Downsides for method 2): - may work as expected only if underlying db driver supports it - in the need of cycling the resultset for the second time another query is required or you have to come up with you own memoization (i.e. prepare an empty list to append results you're interested in cycling again) I used web.py a lot and also for millions of records it's very fast for returning results. NB: the "time cut" here is the one passing from the "db is ready to return results" to "the queryset is done and prepared for me to work on it". The assumption on 2) is that rarely you need to access randomly a million of records - i.e. myrows[31289], myrows[129459] - and what you can do in 2 loops on the same dataset is usually "accomplishable" (that word even exist ? :D) in a single one. --