I created a POC patch that does bulk loading of fixtures. The work can be found on GitHub: https://github.com/akaariai/django/tree/fixture_loading. Be warned, it really is a POC :)
The results are somewhat depressing for Django tests. The reason is that there just isn't that much fixture loading going on. Some statistics I gathered using the patch:

    total_objects: 11200
    average batch size: 2.51069012179
    selects done: 4291        # Need to do a select to see which objects are already in the DB
    updates done: 1756        # Those that were are updated in the regular way
    raw inserts done: 159     # Can't use bulk_create for inherited models, so use save(force_insert=True) instead
    batch_inserts done: 9277  # Number of objects inserted using bulk_create
    batch_count: 3695

So, before the patch there would have been 22400 queries (each save is a select plus an update or insert); after the patch there were 4291 + 1756 + 159 + 3695 = 9901 queries. Running the tests was about 3% faster using sqlite3.

Things look a bit better when loading a dump of 10000 objects into the DB: for sqlite3 I got 4-5x better timing, and for PostgreSQL I got 3-4x better timing for that test. If the DB were non-local, the result would be even better.

All tests pass on SQLite3, except two tests in modeltests/fixtures which I suspect are QuerySet ordering related. There was one model that was missing default ordering, and it caused some breakages due to a different order of results. The remaining broken tests are a bit complicated to debug. The broken tests are two full dumps of the DB, one in XML and one in JSON format. The dumps are then compared character for character to the expected output. The dumps are somewhat large and I am somewhat tired, I think there is a row in a different order somewhere in there...

I am currently running the PostgreSQL tests, all I know is the PostgreSQL tests take a long time :)

The main issues with the patch are:
- I expect it to break when given models with lots of columns in big batches: too many SQL parameters for the backend.
- Signals are sent, but they are sent in batches.
- Generally more complicated object saving.
  The old way was very easy to understand: each object is saved when read from the fixture. Now that isn't true any more.
- Having the same model with the same pk multiple times in one fixture does not work. Django doesn't create fixtures like this.
- I might be missing something obvious in the patch, fixture loading is a new area to me.

The feature could be useful if there are users loading big fixture files regularly. Otherwise it complicates fixture loading for little gain.

I am not going to create a ticket for this one. I have too many tickets in POC+DDN state already.

 - Anssi
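For anyone who wants to double-check the query arithmetic from the statistics above, here is a minimal Python sketch that recomputes the totals from the reported counters. The variable names are made up for illustration and are not taken from the patch; the only model assumed is the one stated above (one select plus one update or insert per save before, one query per select, update, raw insert and batch after).

```python
# Counters as reported in the statistics above.
total_objects = 11200
selects = 4291        # selects to see which objects are already in the DB
updates = 1756        # existing objects updated the regular way
raw_inserts = 159     # save(force_insert=True) for inherited models
batch_inserts = 9277  # objects inserted using bulk_create
batch_count = 3695    # number of bulk_create calls

# Before the patch: each save is a select plus an update or insert.
queries_before = 2 * total_objects

# After the patch: one query per select, update, raw insert and batch.
queries_after = selects + updates + raw_inserts + batch_count

# Average number of objects per bulk_create call.
avg_batch_size = batch_inserts / batch_count

# Fraction of queries saved by batching.
reduction = 1 - queries_after / queries_before

print(queries_before, queries_after)
print(round(avg_batch_size, 5), round(reduction, 3))
```

This reproduces the 22400 and 9901 figures and comes out at roughly 56% fewer queries. The small average batch size is consistent with the modest speedup seen in the test suite.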
