Bulk inserts are the way to go if you can. When inserting a bunch of
data, avoid using the Django ORM and do it in plain SQL; the overhead
of creating Django ORM model instances is far too expensive. Although
it may not be a bulk insert in the sense Nick mentioned above, I wrote
DSE [ http://pypi.python.org/pypi/dse/0.3.1 ] for that exact purpose:
to insert or update a bunch of data using plain SQL. It hasn't been
tested much, though, so don't use it in production. The thing it
solves can be done in plain SQL:

Use cursor.executemany("<prepared insert statement>", <list of tuples
with params>).
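
For example, here is a minimal sketch using Django's raw DB connection.
The table and column names are just placeholders for your own schema,
and the commit call assumes the older (pre-1.6) transaction handling:

    from django.db import connection, transaction

    def bulk_insert_terms(rows):
        # rows: a list of (term, created) tuples matching the placeholders below
        cursor = connection.cursor()
        cursor.executemany(
            "INSERT INTO search_searchterm (term, created) VALUES (%s, %s)",
            rows,
        )
        transaction.commit_unless_managed()  # commit if not inside a managed transaction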

DSE takes care of creating SQL statements for inserts and updates based
on your models, handles any default values you might have defined in
your model, caches lists of params and executes cursor.executemany
when the list of cached items reaches a specified limit. In my
experience the performance gain from this approach, or a similar one,
is huge. Using cursor.executemany might be what Nick meant by bulk
insert, but I think different DB backends handle it differently; I
don't know. Anyway, I've inserted many thousands of records using DSE
and it takes a fraction of the time compared to doing it with the
ORM.
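
To give an idea of the batching pattern (this is just a rough sketch of
the idea, not DSE's actual API):

    from django.db import connection

    class BulkInserter(object):
        def __init__(self, sql, limit=1000):
            self.sql = sql        # prepared INSERT/UPDATE statement with %s placeholders
            self.limit = limit    # flush when this many rows are cached
            self.params = []

        def add(self, row):
            # Cache one row of params; flush automatically at the limit.
            self.params.append(row)
            if len(self.params) >= self.limit:
                self.flush()

        def flush(self):
            # Send all cached rows to the DB in a single executemany call.
            if self.params:
                connection.cursor().executemany(self.sql, self.params)
                self.params = []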


NB! DSE is a proof-of-concept project more than anything else. It
needs a good rewrite, extensive testing and docs, but it might be
helpful.

Thomas

On Wed, Jan 19, 2011 at 2:35 AM, Nick Arnett <nick.arn...@gmail.com> wrote:
>
>
> On Tue, Jan 18, 2011 at 12:04 PM, Sithembewena Lloyd Dube
> <zebr...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I am building a search app. that will query an API. The app. will also
>> store search terms in a very simple table structure.
>>
>> Big question: if the app. eventually hit 10 million searches and I was
>> storing every single search term, would the table hold or would I run into
>> issues?
>
> As someone else said, 10 million records is no big deal for MySQL, in
> principle.
> However, you probably would do better to avoid all the overhead of a
> database transaction for storing each of these.  I'm going to assume that
> there will be duplicates, especially if you normalize the queries.  It would
> make a lot more sense to log the queries into a text file, which has
> extremely low overhead.  Then you'd periodically process the log files,
> normalizing and eliminating duplicates, producing a bulk insert to load into
> the database.  Bulk inserts will be FAR more efficient than using Django.
> Nick
>
>



-- 
Mvh/Best regards,
Thomas Weholt
http://www.weholt.org

