I was thinking of something like that ... but I am not a good enough PL/pgSQL programmer to figure out how to do the chunking in SQL. I needed a way to do the data calculations a thousand rows at a time, rather than all ninety million in one gulp. So I have Python do the chunking and SQL do the update.
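Here is a minimal sketch of that split, where Python picks the chunk and SQL does the update. It is not the code from this thread. It assumes a hypothetical table `big_table` with an integer primary key `id`, an existing column `old_value`, and the newly added nullable column `new_value`. The `old_value * 2` calculation is a placeholder. It uses the standard-library `sqlite3` module only so that it runs anywhere. Against PostgreSQL the same loop would use a psycopg2 connection and `%s` placeholders instead of `?`.

```python
import sqlite3

# Hypothetical update: fill the new nullable column from an existing one.
# The SQL lives in a triple-quoted string; Python only chooses the id range.
UPDATE_CHUNK_SQL = """
    UPDATE big_table
       SET new_value = old_value * 2
     WHERE id >= ? AND id < ?
       AND new_value IS NULL
"""


def fill_in_chunks(conn, chunk_size=1000):
    """Run UPDATE_CHUNK_SQL over successive id ranges, committing each chunk.

    Returns the total number of rows updated.
    """
    cur = conn.cursor()
    cur.execute("SELECT MIN(id), MAX(id) FROM big_table")
    low, high = cur.fetchone()
    if low is None:  # empty table, nothing to do
        return 0
    total = 0
    start = low
    while start <= high:
        cur.execute(UPDATE_CHUNK_SQL, (start, start + chunk_size))
        total += cur.rowcount
        conn.commit()  # each chunk is committed as its own transaction
        start += chunk_size
    return total
```

The chunks are keyed on primary-key ranges, so each UPDATE touches a bounded slice of the table. A crash therefore loses at most one uncommitted chunk. The `new_value IS NULL` guard means the loop can simply be rerun afterwards, and rows that are already filled are skipped.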
There was also the documentation aspect: "Where do I store the SQL code and the instructions for how to do the migration manually?" This way, everything is contained in the migration module. The SQL is stored in Python triple-quoted string literals, along with the code to run it, and the database connection information comes straight from the Django settings module.

On Thursday, May 14, 2015 at 4:16:43 PM UTC-6, John Fabiani wrote:
>
> As a newbie - would it be better to use pgadmin (or psql) to make the
> changes and migrate --fake so that Django would be happy?
>
> Johnf
>
> On 05/14/2015 02:43 PM, Vernon D. Cole wrote:
>
> I have learned the hard way this week that data migrations in django 1.8,
> as wonderful as they are, do not scale.
>
> My test data table is now sitting at about 90,000,000 rows. I was able to
> add a "null=True" field in an instant, as documented. Then came my attempt
> to fill it -- I tried using RunSQL, as suggested, and the migration ran for
> more than a day before crashing. The entire migration is treated as a
> single transaction, so none of the work was committed. My next attempt was
> to use RunPython with manual transaction control, but that cannot be done
> because the "atomic=False" argument to RunPython does not work on a
> Postgres database.
>
> My final solution (which should be done propagating my new field about
> three days from now) was to have the migration module spawn a copy of
> itself as a no-wait subprocess. It then runs as a main program, it opens
> its own connection to the database, and does the conversion a chunk at a
> time with manual transaction control.
>
> I think that my solution may be of benefit to others who might be able to
> adapt my code to their own situation -- but I am not sure how to publish
> it. It is not a module, so publication on PyPI or such would be wrong.
> What would be effective?
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-users...@googlegroups.com.
> To post to this group, send email to django...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-users.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/58bbdd64-2b6d-491f-9429-3cba8a54d94f%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
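Two details in this thread are the connection information taken from the Django settings module and the migration spawning a copy of itself as a no-wait subprocess. Those two pieces might look roughly like the sketch below. It is an illustration under assumptions, not the code described in the thread. `db_kwargs_from_settings` and `spawn_detached` are hypothetical helper names. The key mapping assumes a PostgreSQL `DATABASES` entry whose values are handed to `psycopg2.connect`, which accepts `dbname`, `user`, `password`, `host` and `port`.

```python
import subprocess
import sys


def db_kwargs_from_settings(databases, alias="default"):
    """Translate a Django DATABASES entry into psycopg2.connect() keyword args.

    Hypothetical helper. Empty values are dropped so libpq can fall back to
    its own defaults (e.g. a local socket when HOST is blank).
    """
    entry = databases[alias]
    mapping = {
        "NAME": "dbname",
        "USER": "user",
        "PASSWORD": "password",
        "HOST": "host",
        "PORT": "port",
    }
    return {pg_key: entry[dj_key]
            for dj_key, pg_key in mapping.items()
            if entry.get(dj_key)}


def spawn_detached(script_path):
    """Start `python script_path` as a separate process without waiting for it.

    Hypothetical helper: the caller returns immediately while the child
    process keeps running on its own.
    """
    return subprocess.Popen([sys.executable, script_path])
```

In a migration module, a `RunPython` function could call `spawn_detached(__file__)` and return at once. An `if __name__ == "__main__":` block at the bottom of the same file would then need to configure Django (set `DJANGO_SETTINGS_MODULE` and call `django.setup()`) before reading `settings.DATABASES`. It could then open its own connection with `psycopg2.connect(**db_kwargs_from_settings(settings.DATABASES))` and run the chunked update with a commit per chunk. On POSIX systems, passing `start_new_session=True` to `Popen` is one way to keep the child from receiving the terminal's hangup signal.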