On 09/30/2014 08:03 AM, Soren Hansen wrote:
2014-09-12 1:05 GMT+02:00 Jay Pipes <jaypi...@gmail.com>:
If Nova were to take Soren's advice and implement its data-access layer
on top of Cassandra or Riak, we would just end up re-inventing SQL
Joins in Python-land.

I may very well be wrong(!), but this statement makes it sound like you've
never used e.g. Riak. Or, if you have, not done so in the way it's
supposed to be used.

If you embrace an alternative way of storing your data, you wouldn't just
blindly create a container for each table in your RDBMS.

For example: In Nova's SQL-based datastore we have a table for security
groups and another for security group rules. Rows in the security group
rules table have a foreign key referencing the security group to which
they belong. In a datastore like Riak, you could have a security group
container where each value contains not just the security group
information, but also all the security group rules. No joins in
Python-land necessary.
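
A minimal sketch of that layout, assuming a plain Python dict stands in for the Riak container (no real Riak client is used, and the names put_group, get_group and the rule fields are hypothetical):

```python
import json

# Hypothetical stand-in for a key/value container; not a real Riak client.
security_groups = {}


def put_group(bucket, group_id, name, rules):
    """Store the group and all of its rules together as one JSON value."""
    bucket[group_id] = json.dumps(
        {"id": group_id, "name": name, "rules": rules}
    )


def get_group(bucket, group_id):
    """A single key lookup returns the group along with its rules."""
    return json.loads(bucket[group_id])


put_group(security_groups, "sg-1", "web", [
    {"protocol": "tcp", "from_port": 80, "to_port": 80, "cidr": "0.0.0.0/0"},
    {"protocol": "tcp", "from_port": 443, "to_port": 443, "cidr": "0.0.0.0/0"},
])

group = get_group(security_groups, "sg-1")
for rule in group["rules"]:
    print(group["name"], rule["protocol"], rule["from_port"])
```

The trade-off is that the rules can only be reached through their group; any query that starts from a rule would need a separate index or a scan.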

OK, that's all fine for a simple one-to-many relation.

How would I go about getting the associated fixed IPs for a network? The query to get associated fixed IPs for a network [1] in Nova looks like this:

SELECT
 fip.address,
 fip.instance_uuid,
 fip.network_id,
 fip.virtual_interface_id,
 vif.address,
 i.hostname,
 i.updated_at,
 i.created_at,
 fip.allocated,
 fip.leased,
 vif2.id
FROM fixed_ips fip
LEFT JOIN virtual_interfaces vif
 ON vif.id = fip.virtual_interface_id
 AND vif.deleted = 0
LEFT JOIN instances i
 ON fip.instance_uuid = i.uuid
 AND i.deleted = 0
LEFT JOIN (
 SELECT MIN(vi.id) AS id, vi.instance_uuid
 FROM virtual_interfaces vi
 GROUP BY instance_uuid
) as vif2
WHERE fip.deleted = 0
AND fip.network_id = :network_id
AND fip.virtual_interface_id IS NOT NULL
AND fip.instance_uuid IS NOT NULL
AND i.host = :host

Would I have a Riak container for virtual_interfaces that would also hold instance information, network information, and fixed_ip information? How would I accomplish the query against a derived table that gets the minimum virtual interface ID for each instance UUID?

More than likely, I would end up having to put a bunch of indexes and relations into my Riak containers and structures just so I could do queries like the above. Failing that, I'd need to do multiple queries to multiple Riak containers and then join the resulting projection in memory, in Python. And that is why I say you will just end up implementing joins in Python.
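
To make the "joins in Python" point concrete, here is a rough sketch of what the query above might turn into if each table lived in its own container. Plain dicts stand in for the containers, all sample data and function names are hypothetical, and the sketch assumes the derived table is matched to the fixed IP on instance_uuid, which the SQL above does not spell out. The timestamp and allocated/leased columns are omitted for brevity.

```python
# Hypothetical containers, one per former table; plain dicts stand in
# for the key/value store.
fixed_ips = {
    "10.0.0.2": {"network_id": 1, "instance_uuid": "a",
                 "virtual_interface_id": 11, "deleted": 0},
    "10.0.0.3": {"network_id": 1, "instance_uuid": "b",
                 "virtual_interface_id": 20, "deleted": 0},
    "10.0.1.2": {"network_id": 2, "instance_uuid": "a",
                 "virtual_interface_id": 12, "deleted": 0},
}
virtual_interfaces = {
    10: {"instance_uuid": "a", "address": "fa:16:3e:00:00:10", "deleted": 0},
    11: {"instance_uuid": "a", "address": "fa:16:3e:00:00:11", "deleted": 0},
    12: {"instance_uuid": "a", "address": "fa:16:3e:00:00:12", "deleted": 0},
    20: {"instance_uuid": "b", "address": "fa:16:3e:00:00:20", "deleted": 0},
}
instances = {
    "a": {"hostname": "vm-a", "host": "compute1", "deleted": 0},
    "b": {"hostname": "vm-b", "host": "compute2", "deleted": 0},
}


def min_vif_per_instance(vifs):
    """Rebuild the derived table: MIN(id) grouped by instance_uuid."""
    lowest = {}
    for vif_id, vif in vifs.items():
        uuid = vif["instance_uuid"]
        if uuid not in lowest or vif_id < lowest[uuid]:
            lowest[uuid] = vif_id
    return lowest


def associated_fixed_ips(network_id, host):
    """Join three containers in memory, mirroring the WHERE clause."""
    lowest = min_vif_per_instance(virtual_interfaces)
    rows = []
    for address, fip in fixed_ips.items():
        if fip["deleted"] or fip["network_id"] != network_id:
            continue
        if fip["virtual_interface_id"] is None or fip["instance_uuid"] is None:
            continue
        # i.host = :host in the WHERE clause drops rows with no live instance.
        inst = instances.get(fip["instance_uuid"])
        if inst is None or inst["deleted"] or inst["host"] != host:
            continue
        # LEFT JOIN semantics: a missing or deleted vif yields None.
        vif = virtual_interfaces.get(fip["virtual_interface_id"])
        if vif is not None and vif["deleted"]:
            vif = None
        rows.append({
            "address": address,
            "instance_uuid": fip["instance_uuid"],
            "network_id": fip["network_id"],
            "virtual_interface_id": fip["virtual_interface_id"],
            "vif_address": vif["address"] if vif else None,
            "hostname": inst["hostname"],
            "min_vif_id": lowest.get(fip["instance_uuid"]),
        })
    return rows


rows = associated_fixed_ips(network_id=1, host="compute1")
```

Every filter, lookup and grouping step that the RDBMS query planner would handle is now application code that scans whole containers unless extra indexes are maintained by hand.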

A relational database was built for the above types of queries, and that's why I said it's the best tool for the job *in this specific case*.

Now... that said...

Is it possible to go through the Nova schema and identify mini-schemas that could be pulled out of the RDBMS and placed into Riak or Cassandra? Absolutely yes! The service group and compute node usage records are good candidates for that, in my opinion. With the nova.objects work that was completed over the last few cycles, we might actually now have the foundation in place to make doing this a reality. I welcome your contributions in this area.

[1] https://github.com/openstack/nova/blob/stable/icehouse/nova/db/sqlalchemy/api.py#L2608

I've said it before, and I'll say it again. In Nova at least, the SQL
schema is complex because the problem domain is complex. That means
lots of relations, lots of JOINs, and that means the best way to query
for that data is via an RDBMS.

I was really hoping you could be more specific than "best"/"most
appropriate" so that we could have a focused discussion.

I don't think relying on a central data store is in any conceivable way
appropriate for a project like OpenStack. Least of all Nova.

I don't see how we can build a highly available, distributed service on
top of a centralized data store like MySQL.

Tens or hundreds of thousands of nodes, spread across many, many racks
and datacentre halls are going to experience connectivity problems[1].

This means that some percentage of your infrastructure (possibly many
thousands of nodes, affecting many, many thousands of customers) will
find certain functionality not working on account of your datastore not
being reachable from the part of the control plane they're attempting to
use (or possibly only being able to read from it).

I say over and over again that people should own their own uptime.
Expect things to fail all the time. Do whatever you need to do to ensure
your service keeps working even when something goes wrong. Of course
this applies to our customers too. Even if we take the greatest care to
avoid downtime, customers should spread their workloads across multiple
availability zones and/or regions and probably even multiple cloud
providers. Their service towards their users is their responsibility.

However, our service towards our users is our responsibility. We should
take the greatest care to avoid having internal problems affect our
users.  Building a massively distributed system like Nova on top of a
centralized data store is practically a guarantee of the opposite.

I don't disagree with anything you say above. At all. I welcome the coming cycles where we will get to split pieces out of Nova (which will afford us the opportunity to decouple certain mini-schemas from the RDBMS and use more appropriate distributed data stores like Cassandra or Riak for those smaller schemas).

For complex control plane software like Nova, though, an RDBMS is the
best tool for the job given the current lay of the land in open source
data storage solutions matched with Nova's complex query and
transactional requirements.

What transactional requirements?

https://github.com/openstack/nova/blob/stable/icehouse/nova/db/sqlalchemy/api.py#L1654

When you delete an instance, you don't want the delete to just stop half-way through the transaction and leave around a bunch of orphaned children. Similarly, when you reserve something, it helps to not have a half-finished state change that you need to go clean up if something goes boom.
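
As an illustration of the all-or-nothing behaviour described here, a small sketch using Python's built-in sqlite3 module. The two-table schema and the destroy_instance helper are simplified, hypothetical stand-ins and not Nova's actual code; the point is only that a failure between the two updates leaves no half-deleted state behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema: one parent table and one child table.
conn.executescript("""
    CREATE TABLE instances (
        uuid TEXT PRIMARY KEY,
        deleted INTEGER DEFAULT 0
    );
    CREATE TABLE instance_metadata (
        instance_uuid TEXT,
        deleted INTEGER DEFAULT 0
    );
    INSERT INTO instances (uuid) VALUES ('a');
    INSERT INTO instance_metadata (instance_uuid) VALUES ('a');
""")


def destroy_instance(conn, uuid, fail_midway=False):
    """Soft-delete an instance and its child rows in one transaction."""
    # The connection context manager commits on success and rolls back
    # if an exception escapes the block.
    with conn:
        conn.execute(
            "UPDATE instances SET deleted = 1 WHERE uuid = ?", (uuid,))
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute(
            "UPDATE instance_metadata SET deleted = 1 "
            "WHERE instance_uuid = ?", (uuid,))


def deleted_flags(conn, uuid):
    """Return (instance.deleted, metadata.deleted) for one instance."""
    inst = conn.execute(
        "SELECT deleted FROM instances WHERE uuid = ?", (uuid,)).fetchone()[0]
    meta = conn.execute(
        "SELECT deleted FROM instance_metadata WHERE instance_uuid = ?",
        (uuid,)).fetchone()[0]
    return inst, meta


try:
    destroy_instance(conn, "a", fail_midway=True)
except RuntimeError:
    pass
after_failure = deleted_flags(conn, "a")  # parent update was rolled back

destroy_instance(conn, "a")
after_success = deleted_flags(conn, "a")  # both rows marked deleted
```

Without a transaction, the simulated failure would leave the instance marked deleted while its metadata row stayed live, which is exactly the orphaned-children cleanup problem.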

https://github.com/openstack/nova/blob/stable/icehouse/nova/db/sqlalchemy/api.py#L3054

Folks in these other programs have actually, you know, thought about
these kinds of things and had serious discussions about alternatives.
It would be nice to have someone acknowledge that instead of snarky
comments implying everyone else "has it wrong".

I'm terribly sorry, but repeating over and over that an RDBMS is "the
best tool" without further qualification than "Nova's data model is
really complex" reads *exactly* like a snarky comment implying everyone
else "has it wrong".

Sorry if I sound snarky. I thought your blog post was the definition of snark.

Best,
-jay

[1]: http://aphyr.com/posts/288-the-network-is-reliable

