Hi everybody!
I would like to know what options exist when you have a huge migration
that obviously cannot be run on your production server.
I have split a model into two smaller ones and then wrote a migration to
populate the new models. There are around 250,000 original objects, and
I also have a few references to update. In the end, the migration took
more than 30 minutes on my machine (16 GB RAM, and it was swapping a
lot), and it failed on another machine because it ran out of RAM (the
process was using about 13 GB at that point). The production server has
even less RAM, so running the migration as it stands is out of the
question.
I have tried all the Django mechanisms I know of to optimize the
queries: select_related, prefetch_related, bulk_create,
QuerySet.update... Currently, the migration uses
bulk_create(batch_size=None) and processes the whole queryset at once.
Earlier, when the migration had two fewer references to update and so
did not take as long, I tried other values for batch_size and also
processed the queryset in pages of a few hundred or a few thousand
objects. The results were no better than with batch_size=None and "all
at once", which is why I ended up with the "basic settings" (the
migration took about 5 minutes at that point). I will have to
reintroduce some tweaks, because the extra updates for the two relations
I mentioned make a big difference.
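For reference, the paging I have in mind looks roughly like the sketch
below. It is plain Python with no Django import, so it can be tried in
isolation. The names chunked, migrate_in_pages, build and flush are
hypothetical helpers of mine, not Django API, and the sketch only
addresses memory use, not locking.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        page = list(islice(iterator, size))
        if not page:
            return
        yield page


def migrate_in_pages(rows, build, flush, page_size=1000):
    """Convert `rows` page by page so only one page is built at a time.

    rows  -- any iterable of source objects (stand-in for a queryset)
    build -- callable turning one source object into one new object
    flush -- callable that persists a list of new objects
    Returns the number of objects written.
    """
    written = 0
    for page in chunked(rows, page_size):
        new_objects = [build(row) for row in page]
        flush(new_objects)  # one write per page instead of one for everything
        written += len(new_objects)
    return written


if __name__ == "__main__":
    # Toy run with integers instead of model instances.
    saved = []
    total = migrate_in_pages(range(5), lambda n: n * 10, saved.append, page_size=2)
    print(total, saved)
```

In the real migration, rows would be something like
OldModel.objects.iterator() and flush would be
NewModel.objects.bulk_create, where OldModel and NewModel are
placeholders for my actual models.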
I am wondering whether anyone has already been in a similar situation,
and what solution you ended up with.
A long-running migration is not a problem in itself, but I don't want to
lock the database for 15 minutes. The thing is, I don't know what
happens during the migration process: what is locked by what?
I will split the migration into "pages" to use less RAM anyway, but I
was also thinking of migrating in two separate steps *or* files, in
order to process separately the objects that are not editable (most of
them; we keep them for history, but they are read-only) and the others
(which should be much faster, so people working will not be blocked for
long). Does that make sense? Any other ideas?
Thanks a lot!
Adrien