Thank you for the advice, everyone. I'm getting a lot of great input.

Dave,

I'm working with a large JSON file which I'm converting to a dictionary so 
I can insert each key's values. I want to use web2py's bulk_insert feature, 
but the data has duplicates, so I first have to remove the duplicates from 
the JSON, recreate each dictionary record, append it to a list, and then 
bulk_insert from there. I'm using MySQL. Is there a preferred way to handle 
bulk inserts through web2py?
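
Roughly what I'm doing looks like the sketch below (the file name, the 
db.record table and the hashed fields are placeholders, not my real schema):

import json
import hashlib

# Sketch only: data.json, db.record and the hashed fields are placeholders.
with open('data.json') as f:
    raw = json.load(f)          # list of dicts parsed from the JSON file

seen = set()
rows = []
for item in raw:
    # Hash a few columns to spot duplicates within the file; skip the
    # record if the same hash has already been seen.
    parts = [str(item.get('name')), str(item.get('date')), str(item.get('amount'))]
    key = hashlib.md5('|'.join(parts).encode('utf-8')).hexdigest()
    if key in seen:
        continue
    seen.add(key)
    rows.append(dict(name=item.get('name'),
                     date=item.get('date'),
                     amount=item.get('amount'),
                     row_hash=key))

db.record.bulk_insert(rows)     # one call instead of one insert per record
db.commit()

That deduplicated list is what I'd hand to bulk_insert; I just don't know 
whether that's the best route against MySQL or whether I should be loading 
the file some other way.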

On Monday, March 16, 2015 at 7:58:15 PM UTC-4, Dave S wrote:
>
> On Monday, March 16, 2015 at 4:14:20 PM UTC-7, LoveWeb2py wrote:
>>
>> Thank you for the feedback everyone.
>>
>> The main reason I fetch them all first is to make sure I'm not inserting 
>> duplicate records. We have a lot of files with thousands of records each, 
>> and sometimes they contain duplicates. I hash a few columns from each 
>> record, and if the value is the same I don't insert the record. If there 
>> is a more efficient way to do this, please let me know.
>>
>>
> Are you doing bulk inserts from files generated elsewhere?  Merging two 
> (or more) databases?
>
> If you are doing individual inserts, I'd read the maybe-new record in, do a 
> select based on one or more of its fields, and then do the column-hash 
> comparison on the rows that come back.
>
> For bulk inserts, I'd be inclined to do the import with the DB Engine's 
> management tools, and maybe lean on having the DBE enforce "must be unique" 
> rules.
>
> /dps
>
>> On Monday, March 16, 2015 at 6:26:50 PM UTC-4, Niphlod wrote:
>>>
>>> I have 300m+ records too, but I don't need to fetch all of them in a 
>>> single query. 
>>> It's not a matter of indexes or anything else: the "system hang" you're 
>>> running into is not the time the database takes from the moment you send 
>>> the query to the first row returned, but the time it takes to "accumulate" 
>>> 1m rows in memory.
>>>
>>> On Monday, March 16, 2015 at 10:46:39 PM UTC+1, Dane Wright wrote:
>>>>
>>>> I have a table which currently contains 10m+ records (local government 
>>>> spending, at www.appgov.org). A native SQL count(*) takes 3-5 secs, but 
>>>> reading all the records like this via the DAL makes the system hang. I 
>>>> try not to read too many of these records in any one online transaction, 
>>>> allowing only limited selections online and calculating subtotals 
>>>> offline. I also create MySQL indexes outside of Web2py to speed things 
>>>> up where necessary. Of course I also can't ever say db.laspend.id>0 in 
>>>> the Database Admin facility either!
>>>>
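
Dave, your idea of having the DBE enforce a "must be unique" rule looks 
doable from the DAL side by giving the hash its own column with unique=True. 
Again just a sketch with the same placeholder table, reusing the rows list 
from the sketch above:

db.define_table('record',
    Field('name'),
    Field('date', 'date'),
    Field('amount', 'double'),
    Field('row_hash', length=32, unique=True))   # DAL asks MySQL for a UNIQUE index

# A duplicate then fails at the database rather than in my code, so the
# insert loop just catches the error and skips that record:
for row in rows:
    try:
        db.record.insert(**row)
        db.commit()
    except Exception:   # duplicate row_hash rejected by MySQL
        db.rollback()

The trade-off I see is that this goes back to one insert per record so MySQL 
can reject each duplicate on its own, instead of a single bulk_insert call.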
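
And Niphlod, your point that the hang is the time spent accumulating rows in 
memory rather than the query itself makes sense. When I do need to look at 
what's already in the table, I could read just the hashes in slices with 
limitby instead of fetching everything at once (sketch, made-up chunk size):

existing = set()
chunk = 10000
start = 0
while True:
    # Pull only the hash column, one slice at a time, ordered for stable paging.
    batch = db(db.record.id > 0).select(db.record.row_hash,
                                        orderby=db.record.id,
                                        limitby=(start, start + chunk))
    if not batch:
        break
    existing.update(r.row_hash for r in batch)
    start += chunk

That keeps only the hashes in memory instead of whole rows.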
