On 04/22/2016 04:27 PM, Ed Leafe wrote:
OK, so I know that Friday afternoons are usually the worst times to write a blog post and start an email discussion, and that the Friday immediately before a Summit is the absolute worst, but I did it anyway. http://blog.leafe.com/index.php/2016/04/22/distributed_data_nova/ Summary: we are creating way too much complexity by trying to make Nova handle things that are best handled by a distributed database. The recent split of the Nova DB into an API database and separate cell databases is the glaring example of going down the wrong road. Anyway, read it on your flight (or, in my case, drive) to Austin, and feel free to pull me aside to explain just how wrong I am. ;-)
Distributed databases and SQL databases aren't mutually exclusive. I am only vaguely familiar with Cells and how it divides data into entirely separate databases that share the same schema, and perhaps it wasn't executed well. However, a discussion like this needs to separate the concept of "distributed" from the notion that "that means we need a database that advertises itself as distributed!".
The general problem Cells is solving strikes me very much as a traditional horizontal sharding problem. While key stores like to advertise that cross-database sharding is very easy with plain key/values, that comes at the expense of the enormous amount of functionality you give up, including ACID and the relational model. There's no reason you can't horizontally shard a relational database, and while Cells seems to have made this approach somewhat rigid, it doesn't have to be that way. SQLAlchemy has long had a horizontal sharding extension, and relational databases like PostgreSQL also include horizontal partitioning structures built in (see http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html). If you shard your data into compartments the way Cells does, you can still pretty much keep ACID local to one database at a time, or if you want to distribute a transaction you can use two-phase commit, which MySQL and PostgreSQL both support.
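To make the sharding idea concrete, here is a minimal sketch, not how Cells or SQLAlchemy's extension actually does it. It assumes two in-memory SQLite databases standing in for separate per-cell database servers, and the table layout and the names make_shards, shard_for, add_instance, get_instance and all_instances are all made up for illustration. Each write is an ordinary ACID transaction local to one shard; reads across all shards just fan out and merge.

```python
import sqlite3
import zlib

# Hypothetical schema shared by every shard (an assumption for this sketch).
SCHEMA = "CREATE TABLE instances (uuid TEXT PRIMARY KEY, host TEXT NOT NULL)"


def make_shards(count):
    """Create `count` independent databases that all use the same schema."""
    shards = []
    for _ in range(count):
        conn = sqlite3.connect(":memory:")
        conn.execute(SCHEMA)
        shards.append(conn)
    return shards


def shard_for(shards, uuid):
    """Route a key to a shard with a stable hash, so lookups find it again."""
    return shards[zlib.crc32(uuid.encode("utf-8")) % len(shards)]


def add_instance(shards, uuid, host):
    """Insert one row; the transaction is local to a single shard."""
    conn = shard_for(shards, uuid)
    with conn:  # commits on success, rolls back if the INSERT raises
        conn.execute(
            "INSERT INTO instances (uuid, host) VALUES (?, ?)", (uuid, host)
        )


def get_instance(shards, uuid):
    """Look up one row by key; only the owning shard is queried."""
    cursor = shard_for(shards, uuid).execute(
        "SELECT uuid, host FROM instances WHERE uuid = ?", (uuid,)
    )
    return cursor.fetchone()


def all_instances(shards):
    """Cross-shard read: query every shard and merge the results."""
    rows = []
    for conn in shards:
        rows.extend(conn.execute("SELECT uuid, host FROM instances").fetchall())
    return sorted(rows)
```

The routing rule here is a plain hash of the key; a Cells-like layout would more likely route on something such as a cell identifier, and a write that has to span shards would need two-phase commit, which SQLite does not provide and this sketch does not attempt.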
A key reason the NoSQL movement failed to completely replace relational databases, as its advocates seemed to think would happen about five years ago, is that they spent lots of time claiming to solve problems in SQL that weren't actually problems. Examples include the idea that "schemaless" is easier to work with (there's always a schema; NoSQL just has no way of validating or enforcing it), or that you just couldn't do key/value transactions nearly as fast with ACID (until PostgreSQL made a few tweaks; it now beats MongoDB at this task).
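The "there's always a schema" point can be illustrated with a small sketch. The assumptions are loud ones: a plain Python dict stands in for a schemaless key/value store (real stores may offer optional validation), and the table and function names are invented for this example. The relational side rejects a malformed row at write time, while the schemaless side accepts it and leaves the problem for whoever reads it later.

```python
import sqlite3

# Relational side: the schema is declared once and enforced by the database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE instances (
           uuid  TEXT PRIMARY KEY,
           vcpus INTEGER NOT NULL CHECK (vcpus > 0)
       )"""
)


def insert_relational(uuid, vcpus):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO instances (uuid, vcpus) VALUES (?, ?)", (uuid, vcpus)
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Schemaless side: a dict standing in for a key/value store (an assumption).
kv_store = {}


def insert_schemaless(uuid, document):
    """Store whatever is given; nothing checks the document's shape."""
    kv_store[uuid] = document
    return True
```

With these definitions, insert_relational("b", None) and insert_relational("c", -1) are refused by the NOT NULL and CHECK constraints, whereas insert_schemaless("b", {"vcpu": "two"}) succeeds even though the field name and type are both wrong for any code that expects an integer "vcpus".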
It may or may not be the case that "Cells didn't do a very good job of distributing SQL", but that doesn't mean "SQL is not appropriate for distributing data". Facebook and LinkedIn have built distributed database systems based on MySQL at profoundly massive scales. OpenStack's problem, I'm going to guess, isn't as hard as that.
__________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev