Thank you for the feedback, everyone. The main reason I fetch them all first is to make sure I'm not inserting duplicate records. We have a lot of files with thousands of records each, and some of those records are duplicates. I hash a few columns from each record, and if that hash already exists I skip the insert. If there is a more efficient way to do this, please let me know.
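For clarity, here is roughly what I do today, as a simplified sketch. The table, field names, and the record_hash() helper below are placeholders for illustration, not my actual schema:

```python
import hashlib

# Placeholder helper: hash a few identifying columns of a record.
def record_hash(rec):
    key = '|'.join([str(rec['supplier']), str(rec['amount']), str(rec['pay_date'])])
    return hashlib.md5(key.encode('utf-8')).hexdigest()

# Fetch the existing records once and remember their hashes.
existing = set()
for r in db(db.mytable.id > 0).select(db.mytable.supplier,
                                      db.mytable.amount,
                                      db.mytable.pay_date):
    existing.add(record_hash(r))

# Insert only the incoming rows whose hash is not already present.
for rec in incoming_rows:  # rows parsed from the uploaded file
    h = record_hash(rec)
    if h not in existing:
        db.mytable.insert(**rec)
        existing.add(h)
```

One thing I've wondered about is storing the hash in its own column with a unique index and letting MySQL reject the duplicates, which would avoid the big up-front select entirely.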
On Monday, March 16, 2015 at 6:26:50 PM UTC-4, Niphlod wrote:
>
> I have 300m+ records too, but I don't need to fetch all of those in a
> single query. It's not a matter of indexes or anything else: the "system
> hang" you're incurring is not the time the database takes from the moment
> you send the query to the first row returned, but the time you take to
> "accumulate" 1m rows into memory.
>
> On Monday, March 16, 2015 at 10:46:39 PM UTC+1, Dane Wright wrote:
>>
>> I have a table which currently contains 10m+ records (local government
>> spending in www.appgov.org). A native SQL count(*) will take 3-5 secs,
>> but reading all the records like this via the DAL will result in the
>> system hanging. I try not to read too many of these records in any one
>> online transaction by allowing only selections and calculating subtotals
>> offline. I also create MySQL indexes outside of Web2py to speed things up
>> where necessary. Of course I also can't ever say db.laspend.id>0 in the
>> Database Admin facility either!
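On the memory point above: if I do end up scanning the whole table, I'm assuming something like a chunked read with limitby would keep only one slice of rows in memory at a time. A rough sketch (db.mytable, CHUNK, and process() are placeholders, not real code from my app):

```python
# Read a large table in fixed-size chunks so that only one chunk of rows
# is ever held in memory, instead of accumulating the full result set.
CHUNK = 10000
offset = 0
while True:
    rows = db(db.mytable.id > 0).select(
        db.mytable.ALL,
        orderby=db.mytable.id,
        limitby=(offset, offset + CHUNK))
    if not rows:
        break
    for r in rows:
        process(r)  # placeholder for whatever per-row work is needed
    offset += CHUNK
```

Offset-based paging does get slower toward the end of a very large table, so filtering on id ranges instead of limitby would probably scale better, but the idea is the same.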