On 05/02/2016 07:38 AM, Matthieu Simonin wrote:

As far as we understand, the idea of an ORM is to hide the relational database behind an object-oriented API.

I actually disagree with that completely. ORMs are so maligned precisely because of this misconception: developers attempt to use an ORM so that they need no awareness of their database, of how queries are constructed, or even of the schema's design; witness tools such as Django ORM and Rails ActiveRecord, which promise this. You then end up with an inefficient and unextensible mess because the developers never considered how the database works or how it is queried, and they have no easy way to monitor or control it while still making use of the tool. Many blog posts and articles discuss this, and it is generally known as the "object-relational impedance mismatch".

SQLAlchemy's success comes from its rejection of this entire philosophy. The purpose of SQLAlchemy's ORM is not to "hide" anything but rather to automate the many aspects of relational database communication and row-to-object mapping that otherwise show up in an application either as a large amount of repetitive boilerplate or as an awkward series of ad-hoc abstractions that don't really do the job very well. SQLAlchemy is designed to expose both the schema design and the structure of queries completely. My talk at [1] goes into this topic in detail, including specific API architectures that facilitate this concept.

It's for that reason that I've always rejected notions of attempting to apply SQLAlchemy directly on top of a datastore that is explicitly non-relational. By doing so, you remove a vast portion of the functionality that relational databases provide and there's really no point in using a tool like SQLAlchemy that is very explicit about DDL and SQL on top of that kind of database.

To effectively put SQLAlchemy on top of a non-relational datastore, what you really want to do is build an entire SQL engine on top of it. This is actually feasible; I did work for the now-defunct FoundationDB (since bought by Apple), which had a very good implementation of SQL on top of a distributed key store, and the Cockroach and TiDB projects you mention are definitely the most appropriate choices if a certain variety of distribution underneath SQL is desired.

Concerning SQLAlchemy, the relational aspect of the underlying database may also be used by the user, but we observed that in Nova most of the db interactions are written in an object-oriented style (few queries use SQL); thus we don't think that Nova requires a relational database, it just requires an object-oriented abstraction to manipulate a database.

Well IMO that's actually often a problem. My goal across OpenStack projects in general is to allow them to make use of SQL more effectively than they do right now; for example, in Neutron I am helping them to rework a block of code that inefficiently loads a block of data into memory, scans it for CIDR overlaps, and then pushes data back out. This approach prevents it from performing a single UPDATE statement and ushers in the need for pessimistic locking against concurrent transactions. Instead, I've written for them a simple stored function proof-of-concept [2] that will allow the entire operation to be performed on the database side alone in a single statement. Wins like these are much less feasible, if not impossible, when a project decides it wants to split its backend store between dramatically different databases which don't offer such features.
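
As a rough illustration of the difference described above, here is a minimal sketch using only Python's standard library. It is not the stored function from [2] and not Neutron's code: the `subnet` table, its integer `first_ip`/`last_ip` columns, and both function names are assumptions made up for this example, and SQLite stands in for the real database. The first function loads every row and scans for overlaps in Python; the second pushes the overlap check and the write into one statement.

```python
import ipaddress
import sqlite3


def cidr_bounds(cidr):
    """Return the (first, last) integer addresses of an IPv4 CIDR block."""
    net = ipaddress.ip_network(cidr)
    return int(net.network_address), int(net.broadcast_address)


def setup():
    """Create an in-memory database with a hypothetical subnet table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE subnet ("
        " id INTEGER PRIMARY KEY,"
        " cidr TEXT NOT NULL,"
        " first_ip INTEGER NOT NULL,"
        " last_ip INTEGER NOT NULL)"
    )
    return conn


def add_subnet_in_memory(conn, cidr):
    """Load every row, scan for overlaps in Python, then write back."""
    first, last = cidr_bounds(cidr)
    rows = conn.execute("SELECT first_ip, last_ip FROM subnet").fetchall()
    for other_first, other_last in rows:
        if other_first <= last and other_last >= first:
            return False
    # A concurrent transaction could add an overlapping row at this point,
    # which is why this style tends to call for pessimistic locking.
    conn.execute(
        "INSERT INTO subnet (cidr, first_ip, last_ip) VALUES (?, ?, ?)",
        (cidr, first, last),
    )
    return True


def add_subnet_single_statement(conn, cidr):
    """Do the overlap check and the insert on the database side at once."""
    first, last = cidr_bounds(cidr)
    cur = conn.execute(
        "INSERT INTO subnet (cidr, first_ip, last_ip) "
        "SELECT ?, ?, ? "
        "WHERE NOT EXISTS ("
        " SELECT 1 FROM subnet WHERE first_ip <= ? AND last_ip >= ?)",
        (cidr, first, last, last, first),
    )
    # rowcount is 1 if the row was inserted, 0 if an overlap blocked it.
    return cur.rowcount == 1
```

Both functions give the same answer for a single caller; the difference is that the second never pulls the table into the application, so the check and the write cannot be separated by another transaction's insert.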

Concretely, we think that there are three possible approaches:
     1) We can use the SQLAlchemy API as the common denominator between a relational and non-relational implementation of the db.api component. These two implementations could continue to converge by sharing a large amount of code.
     2) We create a new non-relational implementation (from scratch) of the db.api component. It would probably require more work.
     3) We are also studying a last alternative: writing a SQLAlchemy engine that targets NewSQL databases (scalability + ACID):
      - https://github.com/cockroachdb/cockroach
      - https://github.com/pingcap/tidb

Going with a NewSQL backend is by far the best approach here. That way, very little needs to be reinvented and the application's approach to data doesn't need to dramatically change.

But also, w.r.t. Cells, there seems to be some remaining debate over why exactly a distributed approach is even needed. As others have posted, a single MySQL database, replicated across Galera or not, scales just fine for far more data than Nova ever needs to store. So it's not clear why a dramatic rewrite of its datastore is called for.

[1] http://www.sqlalchemy.org/library.html#handcodedapplicationswithsqlalchemy

[2] https://gist.github.com/zzzeek/a3bccad40610b9b69803531cc71a79b1

Matthieu Simonin
for the discovery project

[1] https://github.com/BeyondTheClouds/rome



-- Ed Leafe

OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
