Hello Raphael,

> Have there ever been any surveys about how the size of Django projects?
> I don't know the value of investigating this further except for our own
> usage.

I'm not aware of any similar surveys in recent years, but in my experience *240 models across 90 apps, with roughly 500 migrations* would be considered a really large project.

Did you look into squashing these 500 migrations by any chance? Something we did at $DAYJOB to speed up the test bootstrapping process is to prebuild containers with the migrations that are already applied in production, so CI runs for PRs only apply new migrations on top of them.

> Does the caching of ModelState.render as done in this PR (still need to
> work through a couple failing tests) sound reasonable? Or is this veering
> too far in the performance/safety guarantee tradeoff?

While the layer you added seems to yield significant benefits, I would argue that it complicates an already too complex apps rendering caching layer. As you'll probably come to discover while trying to resolve the currently failing tests, models.Field equality is not implemented how you'd expect it to be[0] and thus requires costly deconstruction to be used as a cache staleness predicate[1].

> Is the migration operation infrastructure considered a public API? As in,
> would changing the Operation model API (potentially breaking subclasses)
> be considered a major undertaking? Or would it be an acceptable cost to
> pay for some performance improvements?

Given the large adoption of migrations and the fact that the Operation API is publicly documented[2], I would say the performance benefits would need to be quite substantial to justify breaking backward compatibility.

In my opinion, and I think that's something Markus Holtermann, who also worked a lot on speeding up migrations, would agree on, we should focus our efforts on avoiding model rendering at all costs. We've already made all state mutation (Operation.state_forwards) avoid all accesses to .apps, and I think the next step would be to make `database_forwards` and `database_backwards` do the same. This is something Markus worked on a few years ago[3].
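
To make the field equality point above concrete, here is a minimal sketch. It assumes, as the lines linked in [0] suggest, that equality is keyed on a per-instance creation counter and not on the field's options. `ToyField` and `same_definition` are hypothetical names used only for illustration; they are not Django APIs, and the three-element `deconstruct()` return value is a simplification.

```python
import itertools


class ToyField:
    """Hypothetical stand-in for a model field (not Django's Field class)."""

    _counter = itertools.count()

    def __init__(self, max_length=None, null=False):
        self.max_length = max_length
        self.null = null
        # Every instance receives a unique, increasing counter at creation.
        self.creation_counter = next(ToyField._counter)

    def __eq__(self, other):
        # Assumption: equality only looks at creation order, not at options.
        if isinstance(other, ToyField):
            return self.creation_counter == other.creation_counter
        return NotImplemented

    def __hash__(self):
        return hash(self.creation_counter)

    def deconstruct(self):
        # Simplified: return (path, args, kwargs) with non-default options only.
        kwargs = {}
        if self.max_length is not None:
            kwargs["max_length"] = self.max_length
        if self.null:
            kwargs["null"] = True
        return ("toy.ToyField", [], kwargs)


def same_definition(a, b):
    """Staleness predicate: compare what the two fields were built from."""
    return a.deconstruct() == b.deconstruct()


old = ToyField(max_length=100)
new = ToyField(max_length=100)       # identical options, separate instance
changed = ToyField(max_length=200)   # different options

print(old == new)                     # False
print(same_definition(old, new))      # True
print(same_definition(old, changed))  # False
```

With equality defined this way, `==` cannot tell a cache whether a field definition is unchanged, so every lookup has to pay for a deconstruction, which is the cost referred to above.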

Cheers,
Simon

[0] https://github.com/django/django/blob/1d0bab0bfd77edcf1228d45bf654457a8ff1890d/django/db/models/fields/__init__.py#L495-L499
[1] https://github.com/django/django/blob/1d0bab0bfd77edcf1228d45bf654457a8ff1890d/django/db/migrations/autodetector.py#L49-L87
[2] https://docs.djangoproject.com/en/2.2/ref/migration-operations/#writing-your-own
[3] https://github.com/django/django/compare/master...MarkusH:schemaeditor-modelstate

On Sunday, May 19, 2019 at 22:13:03 UTC-4, Raphael Gaschignard wrote:
>
> Hi Developers,
>
> We have a decently-sized "large project", around 240 models across 90
> apps, with roughly 500 migrations to work off of. We do periodically squash
> migrations to keep the migration count under control, but because of all
> this migrations in our testing server take 3-5 minutes to run to
> completion.
>
> I am not sure about what the size of a typical Django project is (or
> rather, a typical "large project") so it's hard for me to quantify how big
> of an issue this is.
>
> Looking through the migration code and some profiling I found a place
> where caching was possible (on the ModelState -> Model rendering, based
> on some of the invariants stated in ModelState code), which would cut
> *our* full migration from 230 seconds to 50 seconds (on my machine at
> least). On the specific caching I did, I was hitting a 90% cache hit rate
> on our full migration run.
>
> Caching is always a bit scary, though, and there are a *lot* of places in
> the apps registry code/model registration code in particular where caches
> are constantly being wiped. So this stuff scares me quite a bit. In my
> personal ideal, I would love to be able to check in my caching thing but
> have it be behind some MIGRATIONS_FASTER_BUT_MAYBE_UNSAFE flag. I am not
> recommending this for Django because it's not how the project tends to do
> things, this is just my personal feeling. After all, you're rarely running
> all your migrations in production, so this is a testing problem more than
> anything.
>
> I do think there would be an alternative way to move forward though.
> Currently the migrations Operation class relies on having the from_state
> and to_state for DB operations in particular. But I think that we could
> change up this API based on how these properties are used in
> Django-provided Operation classes to avoid having to copy the state to
> provide from_state and to_state. I haven't gone through with this
> investigation too much yet but I think this would improve things a bit.
>
> So this is a multi-pronged question:
>
> - Have there ever been any surveys about how the size of Django projects?
>   I don't know the value of investigating this further except for our own
>   usage.
>
> - Does the caching of ModelState.render as done in this PR
>   <https://github.com/django/django/pull/11388> (still need to work through
>   a couple failing tests) sound reasonable? Or is this veering too far in the
>   performance/safety guarantee tradeoff?
>
> - Is the migration operation infrastructure considered a public API? As
>   in, would changing the Operation model API (potentially breaking
>   subclasses) be considered a major undertaking? Or would it be an acceptable
>   cost to pay for some performance improvements?
>
> I am still trying to wrap my head around some of this problem space, so
> any insight will be very appreciated
>
> Thanks,
> Raphael
