On 08/25/2016 01:13 PM, Steve Martinelli wrote:
The keystone team is pursuing a trigger-based approach to support
rolling, zero-downtime upgrades. The proposed operator experience is
documented here:

  http://docs.openstack.org/developer/keystone/upgrading.html

This differs from Nova and Neutron's approaches to solve for rolling
upgrades (which use oslo.versionedobjects), however Keystone is one of
the few services that doesn't need to manage communication between
multiple releases of multiple service components talking over the
message bus (which is the original use case for oslo.versionedobjects,
and for which it is aptly suited). Keystone simply scales horizontally
and every node talks directly to the database.

Database triggers are obviously a new challenge for developers to write,
honestly challenging to debug (being side effects), and are made even
more difficult by having to hand write triggers for MySQL, PostgreSQL,
and SQLite independently (SQLAlchemy offers no assistance in this case),
as seen in this patch:

  https://review.openstack.org/#/c/355618/

However, implementing an application-layer solution with
oslo.versionedobjects is not an easy task either; refer to Neutron's
implementation:


https://review.openstack.org/#/q/topic:bp/adopt-oslo-versioned-objects-for-db

Our primary concern at this point are how to effectively test the
triggers we write against our supported database systems, and their
various deployment variations. We might be able to easily drop SQLite
support (as it's only supported for our own test suite), but should we
expect variation in support and/or actual behavior of triggers across
the MySQLs, MariaDBs, Perconas, etc, of the world that would make it
necessary to test each of them independently? If you have operational
experience working with triggers at scale: are there landmines that we
need to be aware of? What is it going to take for us to say we support
*zero* dowtime upgrades with confidence?

I would really hold off doing anything triggers related until there was sufficient testing for that, especially with potentially dirty data.

Triggers also really bring in a whole new DSL that people need to learn and understand, not just across this boundary, but in the future debugging issues. And it means that any errors happening here are now in a place outside of normal logging / recovery mechanisms.

There is a lot of value that in these hard problem spaces like zero down uptime we keep to common patterns between projects because there are limited folks with the domain knowledge, and splitting that even further makes it hard to make this more universal among projects.

        -Sean

--
Sean Dague
http://dague.net

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Reply via email to