Bulk inserts are the way to go if you can. When inserting a lot of data, avoid the Django ORM and do it in plain SQL; the overhead of creating Django ORM model instances is far too expensive. Although it may not be bulk insert in the sense Nick mentioned, I wrote DSE [ http://pypi.python.org/pypi/dse/0.3.1 ] for exactly that purpose: to insert or update a lot of data using plain SQL. It hasn't been tested much, though, so don't use it in production. What it does boils down to plain SQL: use cursor.executemany("<prepared insert statement>", <list of tuples with params>).
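Something like this untested sketch ("app_searchterm" and "term" are made-up names for illustration -- substitute your own table and column; the batching mirrors the cached-params limit described below):

from django.db import connection

BATCH_SIZE = 1000  # flush once this many rows have piled up

def bulk_insert_search_terms(terms):
    # One prepared INSERT executed per batch, instead of one ORM save() per row.
    # Django's raw cursor expects %s placeholders regardless of backend.
    sql = "INSERT INTO app_searchterm (term) VALUES (%s)"
    cursor = connection.cursor()
    batch = []
    for t in terms:
        batch.append((t,))                  # list of tuples with params
        if len(batch) >= BATCH_SIZE:
            cursor.executemany(sql, batch)  # one round trip for the whole batch
            batch = []
    if batch:
        cursor.executemany(sql, batch)      # flush the remainder

Depending on your Django version and transaction handling you may need an explicit commit after writing through a raw cursor.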
DSE takes care of creating the SQL statements for inserts and updates based on your models, handles any default values you might have defined in your model, caches lists of params, and calls cursor.executemany when the list of cached items reaches a specified limit. My experience is that the performance gain from this solution, or a similar one, is huge. Using cursor.executemany might be what Nick meant by bulk insert, but I think different DB backends handle it differently; I don't know. Anyway, I've inserted many thousands of records using DSE and it takes a fraction of the time compared to doing it with the ORM.

NB! DSE is a proof-of-concept project more than anything else. It needs a good rewrite, extensive testing and docs, but it might be helpful.

Thomas

On Wed, Jan 19, 2011 at 2:35 AM, Nick Arnett <nick.arn...@gmail.com> wrote:
>
> On Tue, Jan 18, 2011 at 12:04 PM, Sithembewena Lloyd Dube
> <zebr...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I am building a search app. that will query an API. The app. will also
>> store search terms in a very simple table structure.
>>
>> Big question: if the app. eventually hit 10 million searches and I was
>> storing every single search term, would the table hold or would I run into
>> issues?
>
> As someone else said, 10 million records is no big deal for MySQL, in
> principle.
> However, you probably would do better to avoid all the overhead of a
> database transaction for storing each of these. I'm going to assume that
> there will be duplicates, especially if you normalize the queries. It would
> make a lot more sense to log the queries into a text file, which has
> extremely low overhead. Then you'd periodically process the log files,
> normalizing and eliminating duplicates, producing a bulk insert to load into
> the database. Bulk inserts will be FAR more efficient than using Django.
>
> Nick

--
Mvh/Best regards,
Thomas Weholt
http://www.weholt.org
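For completeness, a rough sketch of the log-then-bulk-load approach Nick describes in the quoted message, reusing the assumed app_searchterm table plus an invented hits column and log path; a real job would also rotate or truncate the log between runs:

from collections import Counter
from django.db import connection

LOG_FILE = "/var/log/myapp/search_terms.log"  # hypothetical path

def log_search(term):
    # Cheap append at request time -- no database work per search.
    with open(LOG_FILE, "a") as f:
        f.write(term.strip().lower() + "\n")  # normalize while logging

def load_logged_searches():
    # Periodic job: collapse duplicates, then one bulk insert for the whole file.
    with open(LOG_FILE) as f:
        counts = Counter(line.strip() for line in f if line.strip())
    cursor = connection.cursor()
    cursor.executemany(
        "INSERT INTO app_searchterm (term, hits) VALUES (%s, %s)",
        [(term, n) for term, n in counts.items()],
    )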