OK, have just gotten off a chat with the folks at summit.

I am glad that I've managed to get my concerns about this approach out there. For people reading my notes here, I've gotten the answer to my question about how database access code is written for a system that is moving from some particular schema structure A to a new one B.

Essentially, suppose we've released "L", and we are now in development for "M". Over the course of M, we are adding new objects, e.g. tables and columns; let's call these M1, M2, M3. These objects are meant to replace older objects in L, say L1, L2, L3.

As M is being developed, the model and database access layer must at all times handle both L1, L2, L3 and M1, M2, M3 at the same time. This means the notion of a schema migration as something you commit, after which you can cleanly change your model, is gone. The model needs to be able to load data from the L1/L2/L3 objects and ensure that it gets copied to M1, M2, M3, either as the APIs are accessed under normal use, or via a "background process" that moves data over from L to M. Only when the data is fully moved to M, and when *all database-connected applications* have been moved up as well, can the "contract" phase be run. The "contract" phase will then drop every object in the database that is not in the currently running model, including any additional objects that were added by the operator.
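
To illustrate, here is a minimal, hypothetical sketch of what that kind of bridging access code tends to look like; the Instance model, column names, and lookup function are made up for illustration and are not Nova's actual schema:

# Hypothetical sketch only: an "M"-era model that still carries an "L"-era
# column, with access code that copies data forward as rows are touched.
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Instance(Base):
    __tablename__ = "instances"

    id = Column(Integer, primary_key=True)
    # old column from release "L", not yet dropped (an "L1"-style object)
    flavor_name = Column(String(255), nullable=True)
    # new column added in release "M" (an "M1"-style object)
    flavor_id = Column(Integer, nullable=True)


def get_flavor_id(session, instance, lookup_flavor_id):
    """Read the new column, falling back to the old one and copying the
    value forward as a side effect of normal API access."""
    if instance.flavor_id is None and instance.flavor_name is not None:
        instance.flavor_id = lookup_flavor_id(instance.flavor_name)
        session.add(instance)  # the data migration rides along with the request
    return instance.flavor_id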

Now, the approach of having a model that can bridge the gap between two schemas, in order to delay the full migration of changes, is in fact very common in the real world of database applications, and it is often a necessary technique.

What is dramatically different in Nova's case is that something which is normally just a particularly tedious tool one can choose to use in specific situations, namely the model that must bridge two different schema designs and slowly migrate data, now becomes an absolute hard requirement in all cases. It is no longer considered tenable for developers to decide on a case-by-case basis which kinds of migrations are trivial and can safely be run during an "expand"-type phase, versus those that are data- and lock-intensive if done in bulk and therefore should be carefully rolled out over time at low scale.

Let me be clear that one of the big things I want to work on is cleaning up the model and database access code I see in Nova and many other Openstack applications. Right now it's complicated, slow, and riddled with evidence that people didn't always have a firm grasp of the APIs when they wrote it. But what we are talking about is creating a hard link between the complexity of the model/DB access code and the ability to make necessary changes and improvements to the schema. It means that every schema change now injects verbosity and complexity directly into the model and database access logic, rather than living in a self-contained, write-once-and-forget-it database migration script elsewhere; data migrations and business-model access code are to be literally merged together, in many cases most likely into the same functions. This is definitely going to make my job of cleaning up, simplifying, and vastly improving the performance of this logic that much more difficult. This is the squeeze point within the whole approach, and it is also the one on which the Nova team could offer the least specifics. While simple things like column transitions shouldn't be too terrible, more significant changes like table moves or restructurings will be very difficult; and while this might be fine for Nova, it definitely is not appropriate for less mature Openstack projects that are just starting out with new schema designs and will have a greater need for periodic refactorings.

The rationale for this hard-edged decision is that all-at-once data migrations are slow and place an enormous load on the database, and therefore must be banned in all cases, no matter how trivial. An anecdotal reference to some obviously serious outage that occurred during a Nova migration was cited as evidence.

I'm generally not in favor of this approach to a problem. The driving philosophy of SQLAlchemy and related tools is one of developer empowerment, not of shuttling database details away behind one-size-fits-all abstractions that keep developers as far from the pointy and sharp edges as possible; the edges aren't as sharp as you remember, and a good developer is more deft with tools than you think. This philosophy is one I developed over many years working at companies and watching how various forms of technical anxiety led to all kinds of obtuse, awkward, and sometimes outright byzantine ways of operating, all because something a long time ago failed to work as expected and was therefore banned forever. It was usually my job to extricate teams from these ways of thinking and re-acquaint them with more flexible and fluent approaches, while assuaging their anxiety by showing that we can in fact use our brains to solve problems correctly as they come up, rather than relying on iron-bound constraints that are extremely difficult to modify from a technical perspective.

My proposal to Nova is not that they shouldn't go with this approach, but only that they proceed with a version of the idea that has an escape hatch, and at the same time that we make clear to other projects that this approach is a very specific road to travel and it should not be assumed to be appropriate for everyone. If Nova goes full on with online schema migrations, it means there will no longer be any fixed schema migration files, and no way that even the most trivial data migration can be implemented without going through the new system of building out a model and database access layer that talks to both logical schemas and has to migrate its own data over time. If OTOH they implement the exact same workflow, such that the migrations are still generated into files that represent discrete and fixed states of a schema, they will be able to maintain that approach to a varying degree, as they are ultimately exercising the new workflow on top of a traditional system which can still allow for tuning and version control of schema changes as well as inline data migrations where appropriate. As a bonus, the system works in a fixed way and won't delete the objects planted by the operator; it also allows for a traditional dependency model that will ensure that certain moves always happen before others, such as ensuring a "contract" against the previous version is completed before the next version's "expand" proceeds, thus allowing the database to remain in a clean and defined state. If I understood correctly, the current plan is that "contract" is an optional thing that perhaps some operators might never do at all; they'd just have a database which has old tables and columns from many versions ago still lying around.
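
To illustrate what I mean by fixed files, the generated output could be something like the following pair of ordinary Alembic revision files. The revision identifiers, table, and column names here are made up; the point is only that the dependency chain pins the previous release's "contract" ahead of the next release's "expand":

# --- hypothetical generated file: versions/m_expand_01_expand_for_m.py ---
"""expand for release M: add instances.flavor_id (illustrative only)"""
from alembic import op
import sqlalchemy as sa

revision = "m_expand_01"
down_revision = "l_contract_01"   # previous release's contract must already be applied

def upgrade():
    op.add_column("instances", sa.Column("flavor_id", sa.Integer(), nullable=True))


# --- hypothetical generated file: versions/m_contract_01_contract_for_m.py ---
"""contract for release M: drop the old instances.flavor_name (illustrative only)"""
from alembic import op

revision = "m_contract_01"
down_revision = "m_expand_01"     # intended to run only once expand and the data copy are complete

def upgrade():
    op.drop_column("instances", "flavor_name")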

The one objection raised to my alternative proposal is based on the notion that a certain kind of database "move" might apply in one way to a particular target database and in a different way to another. In the general case, this notion doesn't hold a lot of validity, because the system is emitting Alembic directives in any case, and those directives are themselves database-agnostic; I only propose that we render the directives into a fixed file first. The specific concept that was raised, however, regards a schema operation that in one case wants to be done in the "expand" phase and in another wants to be done in the "migrate" phase. When I asked for an example, what was raised was the issue of certain indexes that behave differently on different MySQL versions: an index addition that would be a performance blocker during the "migrate" phase on an older MySQL version should be blocked from the "expand" phase, but might be safer to run within "expand" for later versions of MySQL.

But again, this is not a very difficult issue to overcome. The current online schema migration code already has within it a ruleset that can accommodate such a rule; we simply move that rule to be within the migration directive itself, so that in the expand phase, instead of "op.create_index(indexname)", we have some kind of qualifier such as "op.create_index(indexname, create_rule=requires_high_rowcount)", or similar. Again, this is not manually coded in a migration; it is rendered out by the autogenerate facilities, which would be utilized by the online schema engine that has already been built. The original online schema blueprint referred to the advantage of working with "declarative" structures vs. "imperative" structures, and I certainly agree; that's why Alembic's directives are themselves declarative, and why these new rules, as embedded, would be very high-level and declarative themselves. I doubt very many of these new directives will be needed, and they will be simple to implement in any case.
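
For concreteness, here is a small sketch of both forms. The create_rule keyword and requires_high_rowcount rule are hypothetical (Alembic's real signature is op.create_index(index_name, table_name, columns)), the index and table names are made up, and the hand-coded version below is only how the same decision could be expressed in a fixed migration file today:

# Proposed declarative form (hypothetical; create_rule is not a real Alembic
# keyword, and requires_high_rowcount would be supplied by the online-schema
# ruleset):
#
#     op.create_index("ix_instances_uuid", "instances", ["uuid"],
#                     create_rule=requires_high_rowcount)
#
# The same decision hand-coded in a fixed migration file today:
from alembic import op

def upgrade():
    bind = op.get_bind()
    # MySQL 5.6+ can add secondary indexes online; earlier versions block
    online_ddl = (
        bind.dialect.name == "mysql"
        and bind.dialect.server_version_info >= (5, 6)
    )
    if online_ddl:
        op.create_index("ix_instances_uuid", "instances", ["uuid"])
    # otherwise, leave the index to the online "migrate" phase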

Alembic also supports alternative migration flows for tables, such as "copy and move", which can be evaluated as options in some cases. In any case, a system where we can manually establish that particular migrations work in certain ways is more flexible than one where a ruleset has to know ahead of time how to detect and adapt to certain tables and conditions on the fly, with no pre-defined direction. Without any place to establish migration behaviors declaratively other than the model itself, I can imagine that we would ultimately have to start adding "hints" to our models, like "use_migration_style_X_on_mysql" on mapped classes, so that the online schema system has clues as to what we want to happen in certain cases. That version of the system would also be tasked with making guesses in some cases; after all, a declaration in the model itself doesn't actually know what it's being migrated *from*, since online schema changes start from a schema that is essentially in an undefined state. It would be better if migration directives still had a dedicated place of their own where they can be explicitly laid out, version-controlled, built against a specific and known previous state, and tuned and configured as needed, without mixing them into the object model's declarations.
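
For what it's worth, the "copy and move" flow I'm referring to is what Alembic's batch operations do, and the choice is made explicitly inside a particular migration file rather than inferred from the model. A rough example, with table and column names that are again purely illustrative:

# "Copy and move" via Alembic batch operations, chosen explicitly in one
# migration file.  recreate="always" forces the copy-and-move strategy even
# on databases that could ALTER in place.
from alembic import op
import sqlalchemy as sa

def upgrade():
    with op.batch_alter_table("instances", recreate="always") as batch_op:
        batch_op.alter_column(
            "flavor_name",
            existing_type=sa.String(255),
            type_=sa.String(64),
        )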

Having the expand/contract workflow available in a way that is compatible with traditional migration files means that this becomes something Alembic can continue to add support for, and could even become a feature within Alembic itself. Right now, the "autogenerate" feature has a pretty straightforward job: gather a list of changes and spit out a single migration file. It would be a great idea to open up this API so that different kinds of workflows can be plugged in, such that rulesets can interpret the autogenerate change stream into different revision stream structures. We'd get all the capabilities of the expand/contract workflow without being rigidly welded to it, and as an Alembic feature it would mean the production of new kinds of workflows would be available through an upstream declarative system with the benefit of real-world use by teams outside of Openstack.
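
To sketch what "plugging in" could look like, assume a hook that hands the autogenerate change stream to user code before rendering (something with the shape of Alembic's process_revision_directives hook). A ruleset that keeps a generated revision "expand"-only might look roughly like this; the partitioning rule is deliberately simplistic and purely illustrative, and a fuller implementation would emit the stripped-out drops as a separate "contract" revision:

# Rough sketch of a pluggable autogenerate workflow: strip destructive
# operations from the generated revision so it remains "expand"-only.
from alembic.operations import ops

CONTRACT_OPS = (ops.DropTableOp, ops.DropColumnOp, ops.DropIndexOp)

def expand_only(context, revision, directives):
    script = directives[0]
    for table_ops in list(script.upgrade_ops.ops):
        if isinstance(table_ops, ops.ModifyTableOps):
            # per-table container: filter out column and index drops
            table_ops.ops = [
                op_ for op_ in table_ops.ops
                if not isinstance(op_, CONTRACT_OPS)
            ]
        elif isinstance(table_ops, CONTRACT_OPS):
            # top-level table drops are removed entirely
            script.upgrade_ops.ops.remove(table_ops)

# wired up in env.py:
#   context.configure(..., process_revision_directives=expand_only)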

The impact on other projects, not just other Openstack projects but also Alembic itself, is really why I'm motivated to comment on this system. It's not that it's so important to me whether Nova has a certain process or not (though I do want to clean up their database access code). It's more that Nova is always looked upon as the driver for how all other Openstack applications do things; what they do is what we will all be doing soon enough. Just look at the subject of this email thread; it's not about Nova at all, it's about the Heat project, which is eagerly looking to copy Nova's approach and wondering if they should just do that instead of migrating to Alembic traditionally. This is why I really want to get my reservations out there. While I do want all projects to be on similar database approaches that ultimately derive from the oslo.* namespace, I'd hope that this one can be opened up a bit before it is taken on by everyone else. The Nova team seemed to hear me on this, and they'd like to encourage other projects to wait on moving to this approach until it can be proven. But they also agreed that yeah, everyone likes to copy Nova a lot. The proof, they say, will be if this approach fails completely and they decide it isn't working. I don't think that will actually happen. Patterns like these just as often drag development down in a slower and more subtle way, and over time contribute towards that calcified "don't change it!" culture that takes months or years to develop. For reference, google "Frog in the Water". It's a known thing. :)

Allowing certain patterns while always providing for a flexible "escape hatch" to work at different levels simultaneously, combined with a strong emphasis on explicitness, has always been the driving philosophy of SQLAlchemy. That's why there's a Core and an ORM which are separate but highly interactive together. It's an approach that works and I'd like to continue to encourage Openstack projects to subscribe to this philosophy.


