Re: [openstack-dev] [nova] Distributed Database

Matthieu Simonin Mon, 02 May 2016 04:43:44 -0700


----- Original Message -----
> From: "Mike Bayer" <mba...@redhat.com>
> To: "OpenStack Development Mailing List (not for usage questions)" 
> <openstack-dev@lists.openstack.org>
> Sent: Thursday, 28 April 2016 18:57:59
> Subject: Re: [openstack-dev] [nova] Distributed Database
> 
> 
> 
> On 04/28/2016 08:44 AM, Edward Leafe wrote:
> > On Apr 24, 2016, at 3:28 PM, Robert Collins <robe...@robertcollins.net>
> > wrote:
> >
> >> For instance, the things I think are essential for a distributed
> >> database based datastore:
> >> - good single-machine developer story. Must not need a physical
> >> cluster to hack on OpenStack
> >> - deal gracefully with single node/rack/site failures (when deployed
> >> appropriately) - allow limiting failure domain impact
> >> - straightforward programming model: wrong uses should be obvious to
> >> reviewers
> >> - low latency performance with big datasets: e.g. nova list as an
> >> admin should be able to get the Nth page as rapidly as the 2nd or 3rd.
> >> - code to deliver that should be (approximately) no worse than the current
> >> code
> >
> > Agree on all of these points, as well as the rest of your post.
> >
> > After several hallway track discussions, as well as yesterday’s Cells V2
> > discussion, I’ve written a follow-up post:
> >
> > http://blog.leafe.com/index.php/2016/04/28/fragmented-data/
> >
> > Feedback, of course, is welcomed!
> 
> 
> Regarding ROME [1], I've taken a look at its source code and while it is
> certainly interesting, I wouldn't recommend lifting and moving all of
> Nova's database infrastructure onto it as a dependency within the near
> term, as the state of this code is very immature.  SQLAlchemy itself was
> once immature as well, so there is no sin here, but that was eleven
> years ago.



We definitely agree that the code is not mature.
This code is a proof of concept (PoC) made by a PhD in order to see whether the
Nova (and Glance) code can rely on a NoSQL system without requiring intrusive
modifications.
In addition, it allowed us to run some performance tests, which showed that 80%
of the requests issued when booting a VM were faster with the current version of
ROME+Redis.
We wanted to present this PoC to the OpenStack community in order to get some
feedback on its relevance and, if possible, advice on how we can move from a
PoC to a concrete upstream contribution.
You can refer to this PDF for further information:
https://github.com/BeyondTheClouds/BeyondTheClouds.github.io/raw/master/DOCS/PAPERS/2016/RR-804/RR-480.pdf

> 
> The internals here are not only highly dependent on SQLAlchemy internals
> (pinned at the 0.9 series which is obsolete), 

We are thinking about how to change this (see below).

> it is using these APIs in
> a very brittle and non-performant way [2]. 

Yes. As mentioned, hopefully it will be much better soon.

> In this code example, the
> internal elements of SQLAlchemy expression objects are repeatedly run
> through str() which on each call runs a full string compilation step in
> order to test for what their actual type is.  It can't be overstated how
> inappropriate this approach is and the author of the library would have
> benefited from reaching out to me in order to get some guidance on the
> correct way to introspect SQLAlchemy expression objects.  Basic Python
> idioms like type checking also seem to be misunderstood [3].
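
To make the point quoted above concrete, here is a minimal sketch of the
difference between guessing a node's type from its rendered string and asking
for the type directly. Column, Literal, stats and both helper functions are
hypothetical stand-ins invented for this illustration; they are not SQLAlchemy
classes and they are not ROME's code. The counter only mimics the cost of a
string compilation step.

```python
from dataclasses import dataclass

# Counts how often a node is rendered to text (stand-in for a compile step).
stats = {"compiles": 0}


@dataclass
class Column:
    """Hypothetical expression node for a column reference."""
    name: str

    def __str__(self) -> str:
        stats["compiles"] += 1
        return self.name


@dataclass
class Literal:
    """Hypothetical expression node for a constant value."""
    value: object

    def __str__(self) -> str:
        stats["compiles"] += 1
        return repr(self.value)


def is_column_by_string(node) -> bool:
    # Anti-pattern: render the node, then guess its type from the text.
    # Every call pays for a render, and the guess can be wrong.
    return str(node).isidentifier()


def is_column(node) -> bool:
    # Direct type check: nothing is rendered and the answer is exact.
    return isinstance(node, Column)


if __name__ == "__main__":
    for node in (Column("vcpus"), Literal(None), Literal(4)):
        print(type(node).__name__, is_column_by_string(node), is_column(node))
    print("renders triggered:", stats["compiles"])
```

In this sketch the string-based check also misclassifies Literal(None), because
its rendered text happens to look like an identifier, while the isinstance
check never renders anything.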
> 
> I don't think anyone denies that Nova can use any kind of database
> backend but the point was raised that to start from scratch with an
> entirely new database approach is an enormous job.   If the first step
> of that job is in fact "port SQLAlchemy and the relational model to
> Redis", that makes the job extremely more involved and I'd disagree with
> your post's assertion that "It's not too late" if this is the case.
> If the admission of ROME for Nova is that the relational model is in
> fact necessary for Nova, then that disqualifies NoSQL databases out of
> the gate - it's one thing to lament that MySQL is not as "distributed"
> out of the box as a NoSQL database, but it's another to lament that
> non-relational databases are not in fact relational.


As far as we understand, the idea of an ORM is to hide the relational database
behind an object-oriented API. With SQLAlchemy, the relational aspects of the
underlying database can also be used directly by the user, but we observed that
in Nova most of the db interactions are written in an object-oriented style
(few queries use SQL). We therefore don't think that Nova requires a relational
database; it just requires an object-oriented abstraction to manipulate a
database.
The dependency on SQLAlchemy exists today because it simplified the
implementation of our PoC. If having such a NoSQL db driver makes sense for the
community (the DragonFlow project's developers told us that they are also
interested in such a driver), this dependency can be removed in order to
go directly from the object model to the NoSQL one.


Concretely, we think that there are three possible approaches:
    1) We can use the SQLAlchemy API as the common denominator between a 
relational and a non-relational implementation of the db.api component. These
two implementations could continue to converge by sharing a large amount of code.
    2) We create a new non-relational implementation (from scratch) of the 
db.api component. This would probably require more work.
    3) We are also studying a last alternative: writing a SQLAlchemy engine 
that targets NewSQL databases (scalability + ACID):
     - https://github.com/cockroachdb/cockroach
     - https://github.com/pingcap/tidb
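
As a rough illustration of what "two implementations of the db.api component"
could look like, here is a minimal sketch with one interface and two backends.
Everything in it is an assumption made for the example: InstanceAPI,
RelationalBackend, KeyValueBackend and the two method names are hypothetical
and are not Nova's actual db.api; sqlite3 stands in for the relational
database and a plain dict stands in for a key-value store such as Redis.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, List, Set


class InstanceAPI(ABC):
    """Hypothetical slice of a db.api-style interface."""

    @abstractmethod
    def instance_create(self, uuid: str, host: str) -> None: ...

    @abstractmethod
    def instance_get_all_by_host(self, host: str) -> List[str]: ...


class RelationalBackend(InstanceAPI):
    """Relational implementation; in-memory SQLite is used for the sketch."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE instances (uuid TEXT PRIMARY KEY, host TEXT)")

    def instance_create(self, uuid: str, host: str) -> None:
        self._conn.execute(
            "INSERT INTO instances (uuid, host) VALUES (?, ?)", (uuid, host))

    def instance_get_all_by_host(self, host: str) -> List[str]:
        rows = self._conn.execute(
            "SELECT uuid FROM instances WHERE host = ? ORDER BY uuid", (host,))
        return [row[0] for row in rows]


class KeyValueBackend(InstanceAPI):
    """Non-relational implementation; dicts stand in for a key-value store."""

    def __init__(self) -> None:
        self._instances: Dict[str, Dict[str, str]] = {}  # uuid -> record
        self._by_host: Dict[str, Set[str]] = {}  # hand-maintained index

    def instance_create(self, uuid: str, host: str) -> None:
        self._instances[uuid] = {"uuid": uuid, "host": host}
        self._by_host.setdefault(host, set()).add(uuid)

    def instance_get_all_by_host(self, host: str) -> List[str]:
        return sorted(self._by_host.get(host, set()))


if __name__ == "__main__":
    for backend in (RelationalBackend(), KeyValueBackend()):
        backend.instance_create("uuid-1", "host-a")
        print(type(backend).__name__, backend.instance_get_all_by_host("host-a"))
```

The callers only see InstanceAPI, so the two backends can be swapped. The
key-value backend has to maintain its own secondary index for the lookup by
host, which the relational backend gets from its WHERE clause.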

Last but not least, we expect to have a meeting with Joshua Harlow in order 
to see whether ROME can become an optional oslo.db driver. We plan to have 
this discussion by mid-May. 
Depending on the outcome of that discussion, we can rewrite ROME in a more
Pythonic way. To that end, note that a full-time engineer will join our team
on July 1st. He can re-implement ROME from scratch in an appropriate way
(we now know what is required to make Nova work with Redis, and with the 
support of OpenStack core developers we would be able to improve our 
proposal and continue to increase its performance).

Regarding all the AMQP / Cells V2 remarks, it probably makes sense to create 
another thread, as it seems that several points need to be discussed/clarified 
(on our side, we would be interested in contributing performance 
evaluations of AMQP solutions such as 0MQ for instance). 


Matthieu Simonin 
for the discovery project
https://beyondtheclouds.github.io/

> 
> [1] https://github.com/BeyondTheClouds/rome
> 
> [2]
> https://github.com/BeyondTheClouds/rome/blob/master/lib/rome/core/expression/expression.py#L172
> 
> [3]
> https://github.com/BeyondTheClouds/rome/blob/master/lib/rome/core/expression/expression.py#L102
> 
> >
> >
> > -- Ed Leafe
> >
> >
> >
> >
> >
> >
> > __________________________________________________________________________
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> 