Hi everyone,

We're working on upgrading our product from Django 1.6 to 1.8, and one of 
the biggest tasks is moving from Django Evolution to Django's migrations. 
Please correct me if I'm missing something, but from my testing and 
investigation into the code, it seems that the migrate command will always 
apply each migration file one-at-a-time, individually, rather than 
pre-optimizing all the changes that need to be made across all pending 
migrations. This results in a lot of unnecessary, potentially expensive SQL 
operations. I'm wondering if there are plans to optimize this to reduce 
migration time.

For example, say we have a Book model. Over the course of 5 migrations, 
we've added 3 fields, changed one field, and removed one more. (Maybe there 
are other migrations for other models being applied in-between these 
migrations.)

Now let's say either a new database is being created for the first time, or 
an existing database is being upgraded from a state prior to the creation 
of the Book model. My expectation is that the database would only need to 
issue a single CREATE TABLE statement (and possibly follow-up ALTER TABLE 
ADD CONSTRAINTs). However, it's looking like it's instead executing a 
CREATE TABLE and then multiple ALTER TABLEs (even multiple ones per field 
within one migration).

While not too terrible for a brand new table, it is a problem for an 
existing table with a lot of data. If you have 5 new field changes to an 
existing table containing a million rows and you're running on MySQL (for 
instance), you'll end up with 5 ALTER TABLEs instead of one, causing MySQL 
to dump data/alter table/import data 5 times.

Django Evolution used to do this as well, and it made for very slow 
database migrations for tables with a lot of data. We had complaints of 
multi-day-long upgrades from some big users. Nowadays, it contains an 
optimizer 
(https://github.com/beanbaginc/django-evolution/blob/master/django_evolution/mutators.py#L377)
 
that first gathers all the operations that need to be performed, 
consolidates them whenever possible (such as rolling up any follow-up ALTER 
TABLEs into an earlier CREATE TABLE, filters out anything table-related for 
a table that will just be deleted later, and consolidates ALTER TABLEs into 
as few statements as possible.). This has done wonders for database 
migration time.

I know there's squashmigrations, but that's only really useful if you're in 
full control over all affected code and databases. (The documentation 
states, "if you’re a third-party project, just ensure your users upgrade 
releases in order without skipping any," which isn't feasible in shipping 
products.). It seems that the same optimization work could be done at 
application time though, couldn't it?

Are there already plans to add an optimizer for the full migration? What 
are the big issues blocking it?

Thanks,

Christian

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/6e91e997-4e17-48fe-9ca4-3f34accfca41%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to