Re: [openstack-dev] [nova] Distributed Database

Mike Bayer Mon, 02 May 2016 08:55:30 -0700


On 05/02/2016 07:38 AM, Matthieu Simonin wrote:



As far as we understand the idea of an ORM is to hide the relational database 
with an Object oriented API.

I actually disagree with that completely. The reason ORMs are so maligned is because of this misconception; developers attempt to use an ORM so that they need not have any awareness of their database, how queries are constructed, or even its schema's design; witness tools such as Django ORM and Rails ActiveRecord which promise this. You then end up with an inefficient and unextensible mess because the developers never considered anything about how the database works or how it is queried, nor do they even have easy ways to monitor or control it while still making use of the tool. There are many blog posts and articles that discuss this and it is in general known as the "object relational impedance mismatch".

SQLAlchemy's success comes from its rejection of this entire philosophy. The purpose of SQLAlchemy's ORM is not to "hide" anything but rather to apply automation to the many aspects of relational database communication as well as row->object mapping that otherwise express themselves in an application as either a large amount of repetitive boilerplate throughout an application or as an awkward series of ad-hoc abstractions that don't really do the job very well. SQLAlchemy is designed to expose both the schema design as well as the structure of queries completely. My talk at [1] goes into this topic in detail including specific API architectures that facilitate this concept.
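
As a rough illustration of "automating row->object mapping without hiding the SQL", here is a minimal sketch. It uses only Python's standard sqlite3 and dataclasses modules, so it is not SQLAlchemy's API; the `instances` table, the `Instance` class and the `fetch_all` helper are hypothetical names invented for this example. The schema and the query text stay fully visible to the caller; only the repetitive tuple-to-object step is automated.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Instance:
    # Hypothetical table layout, chosen only for illustration.
    id: int
    hostname: str
    vcpus: int


def fetch_all(conn, cls, sql, params=()):
    """Run caller-supplied SQL and map each row onto cls by column name."""
    names = [f.name for f in fields(cls)]
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    return [cls(**{n: row[columns.index(n)] for n in names}) for row in cur]


conn = sqlite3.connect(":memory:")
# The schema is written out explicitly rather than hidden behind the objects.
conn.execute(
    "CREATE TABLE instances ("
    "id INTEGER PRIMARY KEY, hostname TEXT NOT NULL, vcpus INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO instances (id, hostname, vcpus) VALUES (?, ?, ?)",
    [(1, "web-1", 2), (2, "db-1", 8)],
)

# The query is plain SQL; only the row-to-object mapping is automated.
big = fetch_all(
    conn,
    Instance,
    "SELECT id, hostname, vcpus FROM instances WHERE vcpus >= ? ORDER BY id",
    (4,),
)
```

A real ORM adds much more on top of this (identity tracking, relationships, unit of work), but the division of labor is the same: the tool removes boilerplate while the developer still decides the schema and the shape of each query.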

It's for that reason that I've always rejected notions of attempting to apply SQLAlchemy directly on top of a datastore that is explicitly non-relational. By doing so, you remove a vast portion of the functionality that relational databases provide and there's really no point in using a tool like SQLAlchemy that is very explicit about DDL and SQL on top of that kind of database.

To effectively put SQLAlchemy on top of a non-relational datastore, what you really want to do is build an entire SQL engine on top of it. This is actually feasible; I was doing work for the now-defunct FoundationDB (was bought by Apple) who had a very good implementation of SQL-on-top-of-distributed keystore going, and the Cockroach and TiDB projects you mention are definitely the most appropriate choice to take if a certain variety of distribution underneath SQL is desired.


 Concerning SQLAlchemy,

relational aspect of the underlying database may also be used by the user but 
we observed that in Nova, most
of the db interactions are written in an Object-oriented style (few queries are 
using SQL),
thus we don't think that Nova requires a relational database, it just requires 
an object oriented abstraction to manipulate a database.

Well IMO that's actually often a problem. My goal across OpenStack projects in general is to allow them to make use of SQL more effectively than they do right now; for example, in Neutron I am helping them to move a block of code that inefficiently needs to load a block of data into memory, scan it for CIDR overlaps, and then push data back out. This approach prevents it from performing a single UPDATE statement and ushers in the need for pessimistic locking against concurrent transactions. Instead, I've written for them a simple stored function proof-of-concept [2] that will allow the entire operation to be performed on the database side alone in a single statement. Wins like these are much less feasible if not impossible when a project decides it wants to split its backend store between dramatically different databases which don't offer such features.
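
The difference between "load, check in memory, write back" and "one statement on the database side" can be sketched with a deliberately simplified stand-in. This is not the Neutron code and not the stored function from [2]; the `pools` table, its `free` counter and both helper functions are hypothetical, and Python's standard sqlite3 module is assumed only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pools (id INTEGER PRIMARY KEY, free INTEGER NOT NULL)")
conn.execute("INSERT INTO pools (id, free) VALUES (1, 10)")


def claim_in_python(conn, pool_id, n):
    """Load the row, check it in application memory, then write it back.

    Another transaction could change the row between the SELECT and the
    UPDATE, which is why this style tends to require pessimistic locking.
    """
    (free,) = conn.execute(
        "SELECT free FROM pools WHERE id = ?", (pool_id,)
    ).fetchone()
    if free < n:
        return False
    conn.execute("UPDATE pools SET free = ? WHERE id = ?", (free - n, pool_id))
    return True


def claim_in_sql(conn, pool_id, n):
    """One conditional UPDATE: the check and the write happen in the database."""
    cur = conn.execute(
        "UPDATE pools SET free = free - ? WHERE id = ? AND free >= ?",
        (n, pool_id, n),
    )
    # rowcount is 1 if the condition held and the row was updated, else 0.
    return cur.rowcount == 1


ok_first = claim_in_sql(conn, 1, 7)   # 10 available, so 7 can be claimed
ok_second = claim_in_sql(conn, 1, 7)  # only 3 remain, so this is refused
remaining = conn.execute("SELECT free FROM pools WHERE id = 1").fetchone()[0]
```

The second form relies on the database evaluating the condition and applying the change as one statement, which is exactly the kind of feature that is lost when the backend may be a store that does not support it.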


Concretely, we think that there are three possible approaches:
     1) We can use the SQLAlchemy API as the common denominator between a 
relational and non-relational implementation of the db.api component. These two 
implementations could continue to converge by sharing a large amount of code.
     2) We create a new non-relational implementation (from scratch) of the 
db.api component. It would require probably more work.
     3) We are also studying a last alternative: writing a SQLAlchemy engine 
that targets NewSQL databases (scalability + ACID):
      - https://github.com/cockroachdb/cockroach
      - https://github.com/pingcap/tidb

Going with a NewSQL backend is by far the best approach here. That way, very little needs to be reinvented and the application's approach to data doesn't need to dramatically change.

But also, w.r.t. Cells there seems to be some remaining debate over why exactly a distributed approach is even needed. As others have posted, a single MySQL database, replicated across Galera or not, scales just fine for far more data than Nova ever needs to store. So it's not clear why a dramatic rewrite of its datastore is called for.

[1] http://www.sqlalchemy.org/library.html#handcodedapplicationswithsqlalchemy


[2] https://gist.github.com/zzzeek/a3bccad40610b9b69803531cc71a79b1

Matthieu Simonin
for the discovery project
https://beyondtheclouds.github.io/


[1] https://github.com/BeyondTheClouds/rome

[2]
https://github.com/BeyondTheClouds/rome/blob/master/lib/rome/core/expression/expression.py#L172

[3]
https://github.com/BeyondTheClouds/rome/blob/master/lib/rome/core/expression/expression.py#L102



-- Ed Leafe






__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

