Thank you for the advice, everyone. I'm getting a lot of great input.

Dave,

I'm working with a large JSON file which I'm converting to dictionaries so I can insert the key values. I want to use web2py's bulk_insert feature, but the data has duplicates, so I first have to remove all the duplicates from the JSON, rebuild each record dictionary, append it to a list, and then bulk_insert from there. I'm using MySQL; is there a preferred way to handle bulk inserts with web2py?
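For concreteness, the flow I described looks roughly like this (just a sketch: db is the usual DAL instance from the model file, and the table db.record with fields name, amount, and row_hash is a stand-in for the real schema):

import hashlib
import json

# assumed: the JSON file holds a list of flat record dicts
with open('data.json') as f:
    records = json.load(f)

seen = set()
rows = []
for rec in records:
    # hash the columns that define a duplicate (placeholder field names)
    digest = hashlib.md5(
        ('%s|%s' % (rec['name'], rec['amount'])).encode('utf-8')
    ).hexdigest()
    if digest in seen:
        continue                    # duplicate within the file, skip it
    seen.add(digest)
    rows.append(dict(name=rec['name'], amount=rec['amount'],
                     row_hash=digest))

db.record.bulk_insert(rows)         # one multi-row insert via the DAL
db.commit()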
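And if I instead lean on MySQL to enforce the "must be unique" rule as you suggest, I'm guessing the model side would look something like this (again placeholder names; note that with a unique index a single bulk_insert fails on the first duplicate, so the rows would have to go in one at a time or be deduped first):

# store the digest and let MySQL enforce uniqueness on it
db.define_table('record',
    Field('name'),
    Field('amount', 'double'),
    Field('row_hash', length=32, unique=True),  # UNIQUE index in MySQL
)

for row in rows:
    try:
        db.record.insert(**row)
        db.commit()
    except Exception:   # the driver raises IntegrityError on a duplicate key
        db.rollback()   # duplicate row_hash, skip it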
On Monday, March 16, 2015 at 7:58:15 PM UTC-4, Dave S wrote:
>
> On Monday, March 16, 2015 at 4:14:20 PM UTC-7, LoveWeb2py wrote:
>>
>> Thank you for the feedback, everyone.
>>
>> The main reason I fetch them all first is to make sure I'm not
>> inserting duplicate records. We have a lot of files with thousands of
>> records, and sometimes they contain duplicates. I hash a few columns
>> from each record, and if the value is the same I don't insert the
>> record. If there is a more efficient way to do this, please let me
>> know.
>>
> Are you doing bulk inserts from files generated elsewhere? Merging two
> (or more) databases?
>
> If you are doing individual inserts, I'd get the maybe-new record in,
> do a select based on one or more fields in the record, and then do the
> column-hash comparison on the results that were returned.
>
> For bulk inserts, I'd be inclined to do the import with the DB engine's
> management tools, and maybe lean on having the DBE enforce "must be
> unique" rules.
>
> /dps
>
>> On Monday, March 16, 2015 at 6:26:50 PM UTC-4, Niphlod wrote:
>>>
>>> I have 300m+ records too, but I don't need to fetch all of them in a
>>> single query. It's not a matter of indexes or anything else: the
>>> "system hang" you're running into is not the time the database takes
>>> from the moment you send the query to the first row returned, but the
>>> time you take to accumulate 1m rows in memory.
>>>
>>> On Monday, March 16, 2015 at 10:46:39 PM UTC+1, Dane Wright wrote:
>>>>
>>>> I have a table which currently contains 10m+ records (local
>>>> government spending at www.appgov.org). A native SQL count(*) takes
>>>> 3-5 secs, but reading all the records via the DAL hangs the system.
>>>> I try not to read too many of these records in any one online
>>>> transaction by allowing only selections and calculating subtotals
>>>> offline. I also create MySQL indexes outside of web2py to speed
>>>> things up where necessary. Of course, I also can't ever say
>>>> db.laspend.id>0 in the Database Admin facility either!
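P.S. On the quoted point about large selects hanging: the DAL can count and page without accumulating everything in memory, e.g. (a sketch against Dane's db.laspend table):

# COUNT(*) runs inside MySQL; no rows are shipped to Python
total = db(db.laspend.id > 0).count()

# page through a big table instead of one giant select
page = 10000
for start in range(0, total, page):
    for row in db(db.laspend.id > 0).select(
            orderby=db.laspend.id, limitby=(start, start + page)):
        pass                        # per-row work goes here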