I created a POC patch that does bulk loading of fixtures. The work can
be found on GitHub: https://github.com/akaariai/django/tree/fixture_loading.
Be warned, it really is a POC :)

The results are somewhat depressing for Django's own tests. The reason
is that there just isn't that much fixture loading going on. Some
statistics I gathered using the patch:

total_objects: 11200
average batch size: 2.51069012179
selects done: 4291 # Need to do a select to see which objects are already in the DB
updates done: 1756 # Those that were are updated in the regular way
raw inserts done: 159 # Can't use bulk_insert for inherited models, so use save(force_insert=True) instead
batch_inserts done: 9277 # Amount of objects inserted using bulk_create
batch_count: 3695
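
To make the counters above concrete, here is a minimal sketch of how one batch could be split into the three paths (update, raw insert, bulk insert). It is not the code from the branch: FixtureObj, its inherited flag and plan_batch are hypothetical names for illustration, and the set of existing pks stands in for the result of the select.

```python
from dataclasses import dataclass


@dataclass
class FixtureObj:
    # Hypothetical stand-in for a deserialized fixture object.
    model: str
    pk: int
    inherited: bool = False  # assumed flag: True for multi-table inherited models


def plan_batch(batch, existing_pks):
    """Split one batch into (updates, raw_inserts, bulk_inserts).

    existing_pks is assumed to be the set of pks the select found in the DB.
    """
    updates, raw_inserts, bulk_inserts = [], [], []
    for obj in batch:
        if obj.pk in existing_pks:
            updates.append(obj)        # already in the DB: regular update
        elif obj.inherited:
            raw_inserts.append(obj)    # would use save(force_insert=True)
        else:
            bulk_inserts.append(obj)   # would go into one bulk_create call
    return updates, raw_inserts, bulk_inserts


batch = [
    FixtureObj("a", 1),
    FixtureObj("a", 2),
    FixtureObj("child", 3, inherited=True),
]
updates, raw_inserts, bulk_inserts = plan_batch(batch, existing_pks={1})
```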

So, before the patch there would have been 22400 queries (each save is
a select + update or insert); after the patch there were 4291 + 1756 +
159 + 3695 = 9901 queries.
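
The arithmetic can be double-checked with a few lines of Python. The numbers are the ones reported above; the variable names are just for illustration, and the "two queries per save" rule is the assumption stated in the text.

```python
# Counters reported by the patch.
total_objects = 11200
selects = 4291
updates = 1756
raw_inserts = 159
batch_inserts = 9277
batch_count = 3695

# Before: every save is a select plus an update or an insert.
queries_before = total_objects * 2

# After: selects, per-object updates and raw inserts, one query per bulk batch.
queries_after = selects + updates + raw_inserts + batch_count

# Share of queries saved, and objects per bulk_create call.
reduction_pct = round((1 - queries_after / queries_before) * 100, 1)
average_batch_size = batch_inserts / batch_count

print(queries_before, queries_after, reduction_pct, average_batch_size)
```

So the query count drops by a bit more than half, which fits the small average batch size.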

Running the tests was about 3% faster using sqlite3.

Things look a bit better when loading a dump of 10000 objects into the
DB: for sqlite3 I got 4-5x better timing, and for PostgreSQL 3-4x
better timing for that test. If the DB were non-local, the result
would likely be even better.

All tests pass on SQLite3 except two in modeltests/fixtures, which I
suspect are related to QuerySet ordering. There was one model that was
missing a default ordering, and it caused some breakage because the
results came back in a different order. The remaining broken tests are
a bit complicated to debug. They are two full dumps of the DB, one in
XML and one in JSON format, and the dumps are compared character for
character to the expected output. The dumps are somewhat large and I
am somewhat tired; I think there is a row in a different order
somewhere in there...

I am currently running the PostgreSQL tests, all I know is the
PostgreSQL tests take a long time :)

The main issues with the patch are:
  - I expect it to break when given models with lots of columns in big
    batches: too many SQL parameters for the backend.
  - Signals are sent, but they are sent in batches.
  - Object saving is generally more complicated. The old way was very
    easy to understand: each object is saved when read from the
    fixture. Now that isn't true any more.
  - The same model with the same pk multiple times in one fixture does
    not work. Django doesn't create fixtures like this.
  - I might be missing something obvious in the patch; fixture loading
    is a new area to me.
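
For the first issue, one possible approach (not something the patch does) would be to cap the batch size by the number of columns. A minimal sketch, assuming each row uses one SQL parameter per column; the 999 default is an assumption based on SQLite's historical default variable limit, and both helper names are hypothetical:

```python
def max_rows_per_batch(num_columns, max_params=999):
    """Largest number of rows whose parameters fit under max_params.

    Assumes one SQL parameter per column per row. max_params=999 is an
    assumed backend limit, not a value taken from the patch.
    """
    if num_columns < 1:
        raise ValueError("num_columns must be at least 1")
    # Always allow at least one row, even for very wide models.
    return max(1, max_params // num_columns)


def chunked(objs, size):
    """Yield consecutive slices of objs with at most size items each."""
    for start in range(0, len(objs), size):
        yield objs[start:start + size]


# A 50-column model would be inserted at most 19 rows at a time.
rows = list(range(45))
batches = list(chunked(rows, max_rows_per_batch(50)))
```

Each chunk would then be passed to its own bulk insert, trading a few extra queries for staying under the backend's limit.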

The feature could be useful if there are users loading big fixture
files regularly. Otherwise it complicates fixture loading for little
gain.

I am not going to create a ticket for this one. I have too many
tickets in POC+DDN state already.

 - Anssi
