Re: [openstack-dev] [nova] Distributed Database

Mike Bayer Sat, 23 Apr 2016 21:37:59 -0700


On 04/22/2016 04:27 PM, Ed Leafe wrote:

OK, so I know that Friday afternoons are usually the worst times to
write a blog post and start an email discussion, and that the Friday
immediately before a Summit is the absolute worst, but I did it anyway.

http://blog.leafe.com/index.php/2016/04/22/distributed_data_nova/

Summary: we are creating way too much complexity by trying to make Nova
handle things that are best handled by a distributed database. The
recent split of the Nova DB into an API database and separate cell
databases is the glaring example of going down the wrong road.

Anyway, read it on your flight (or, in my case, drive) to Austin, and
feel free to pull me aside to explain just how wrong I am. ;-)

Distributed databases and SQL databases aren't mutually exclusive. I am only vaguely familiar with Cells and how it divides up data into entirely different databases of the same schema, and perhaps it wasn't executed well; however, a discussion like this would need to separate the concept of "distributed" from the notion that "that means we need a database that advertises itself as distributed!".

The general problem Cells is solving strikes me very much as a traditional horizontal sharding problem. While key/value stores like to advertise that cross-database sharding is very easy with plain key/values, that's at the expense of the enormous amount of functionality you give up, including ACID and the relational model. There's no reason you can't horizontally shard a relational database, and while Cells seems like it's made this approach somewhat rigid, it doesn't have to be that way. SQLAlchemy has long had a horizontal sharding extension, and relational databases like PostgreSQL also include horizontal sharding structures built in (see http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html). If you shard your data into compartments the way Cells does, you can still pretty much keep ACID local to one database at a time, or if you want to distribute a transaction you can use two-phase commit, which MySQL and PostgreSQL both support.
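To make the idea of sharding a relational database concrete, here is a rough sketch using only Python's standard-library sqlite3 module. It is not how Cells works and it is not SQLAlchemy's horizontal sharding extension; the shard names, the instances table, and the helper functions are all made up for illustration. Each row is routed to one of several databases that share a schema, and each write is an ordinary ACID transaction local to that one database.

```python
import sqlite3
import zlib

# Hypothetical shard layout: two in-memory SQLite databases, same schema.
SHARD_NAMES = ["cell0", "cell1"]


def make_shard():
    """Create one shard with the (illustrative) shared schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances ("
        " uuid TEXT PRIMARY KEY,"
        " host TEXT NOT NULL)"
    )
    conn.commit()
    return conn


shards = {name: make_shard() for name in SHARD_NAMES}


def shard_for(uuid):
    """Pick a shard with a stable hash, so a key always maps to the same shard."""
    index = zlib.crc32(uuid.encode("utf-8")) % len(SHARD_NAMES)
    return SHARD_NAMES[index]


def add_instance(uuid, host):
    """Insert a row; the transaction is local to the one shard that owns it."""
    conn = shards[shard_for(uuid)]
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO instances (uuid, host) VALUES (?, ?)", (uuid, host)
        )


def get_instance(uuid):
    """Look a row up by key; only the owning shard is queried."""
    conn = shards[shard_for(uuid)]
    return conn.execute(
        "SELECT uuid, host FROM instances WHERE uuid = ?", (uuid,)
    ).fetchone()


def count_all():
    """A cross-shard query: run it on every shard and combine the results."""
    return sum(
        conn.execute("SELECT COUNT(*) FROM instances").fetchone()[0]
        for conn in shards.values()
    )
```

Calling add_instance("some-uuid", "some-host") and then get_instance("some-uuid") touches a single shard, while count_all() fans out to all of them. The schema constraints (primary key, NOT NULL) are still enforced inside each shard, which is the point: sharding by key does not require giving up the relational model. A write spanning two shards would need something more, such as the two-phase commit mentioned above, which this sketch does not attempt.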

A key reason the NoSQL movement failed to completely replace relational databases, as its advocates seemed to think would happen about five years ago, was that they spent lots of time claiming to solve problems in SQL that weren't actually problems, such as the idea that "schemaless" is easier to work with (there's always a schema, NoSQL just has no way of validating or enforcing it), or that you just couldn't do key/value transactions nearly as fast with ACID (until PostgreSQL made a few tweaks; it now successfully beats MongoDB at this task).

It may or may not be the case that "Cells didn't do a very good job of distributing SQL" but that doesn't mean "SQL is not appropriate for distributing data". Facebook and LinkedIn have built distributed database systems based on MySQL at profoundly massive scales. OpenStack's problem, I'm going to guess, isn't as hard as that.





__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


