On 09/30/2014 08:03 AM, Soren Hansen wrote:
2014-09-12 1:05 GMT+02:00 Jay Pipes <jaypi...@gmail.com>:
If Nova were to take Soren's advice and implement its data-access layer
on top of Cassandra or Riak, we would just end up re-inventing SQL
Joins in Python-land.
I may very well be wrong(!), but this statement makes it sound like you've
never used e.g. Riak. Or, if you have, not done so in the way it's
supposed to be used.
If you embrace an alternative way of storing your data, you wouldn't just
blindly create a container for each table in your RDBMS.
For example: In Nova's SQL-based datastore we have a table for security
groups and another for security group rules. Rows in the security group
rules table have a foreign key referencing the security group to which
they belong. In a datastore like Riak, you could have a security group
container where each value contains not just the security group
information, but also all the security group rules. No joins in
Python-land necessary.
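To make the shape of such a value concrete, here is a minimal sketch in plain Python. It is only an illustration: a dict stands in for the security group container, no real Riak client API is used, and the field names and the put_group/rules_for helpers are hypothetical.

```python
import json

# A plain dict stands in for the security group container (hypothetical layout).
security_groups = {}


def put_group(group_id, name, rules):
    # Store the group and all of its rules as a single JSON value under one key.
    security_groups[group_id] = json.dumps(
        {"id": group_id, "name": name, "rules": rules}
    )


def rules_for(group_id):
    # One key lookup returns the group together with its rules; nothing to join.
    return json.loads(security_groups[group_id])["rules"]


put_group(
    "sg-1",
    "web",
    [
        {"protocol": "tcp", "from_port": 80, "to_port": 80, "cidr": "0.0.0.0/0"},
        {"protocol": "tcp", "from_port": 443, "to_port": 443, "cidr": "0.0.0.0/0"},
    ],
)
```

The trade-off in this layout is that every rule change rewrites the whole group value, which is acceptable as long as rules are only ever read through their group.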
OK, that's all fine for a simple one-to-many relation.
How would I go about getting the associated fixed IPs for a network? The
query to get associated fixed IPs for a network [1] in Nova looks like this:
SELECT
    fip.address,
    fip.instance_uuid,
    fip.network_id,
    fip.virtual_interface_id,
    vif.address,
    i.hostname,
    i.updated_at,
    i.created_at,
    fip.allocated,
    fip.leased,
    vif2.id
FROM fixed_ips fip
LEFT JOIN virtual_interfaces vif
    ON vif.id = fip.virtual_interface_id
    AND vif.deleted = 0
LEFT JOIN instances i
    ON fip.instance_uuid = i.uuid
    AND i.deleted = 0
LEFT JOIN (
    SELECT MIN(vi.id) AS id, vi.instance_uuid
    FROM virtual_interfaces vi
    GROUP BY vi.instance_uuid
) AS vif2
    -- ON clause was missing from the query as posted; assumed from [1]
    ON vif2.id = fip.virtual_interface_id
    AND vif2.instance_uuid = vif.instance_uuid
WHERE fip.deleted = 0
    AND fip.network_id = :network_id
    AND fip.virtual_interface_id IS NOT NULL
    AND fip.instance_uuid IS NOT NULL
    AND i.host = :host
Would I have a Riak container for virtual_interfaces that would also
have instance information, network information, and fixed_ip information?
How would I accomplish the query against a derived table that gets the
minimum virtual interface ID for each instance UUID?
More than likely, I would end up having to put a bunch of indexes and
relations into my Riak containers and structures just so I could do
queries like the above. Failing that, I'd need to do multiple queries to
multiple Riak containers and then join the resulting projection in
memory, in Python. And that is why I say you will just end up
implementing joins in Python.
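A rough sketch of what that in-memory join might look like is below. Everything in it is hypothetical: plain lists and dicts stand in for three separate containers, the field names only loosely follow the SQL query, and the on_first_vif flag reflects one reading of the derived table (whether the fixed IP sits on the instance's lowest-numbered interface).

```python
# Plain Python structures stand in for three separate containers (hypothetical).
fixed_ips = [
    {"address": "10.0.0.2", "network_id": 1, "instance_uuid": "a",
     "virtual_interface_id": 11, "deleted": 0},
    {"address": "10.0.0.3", "network_id": 1, "instance_uuid": "a",
     "virtual_interface_id": 12, "deleted": 0},
    {"address": "10.0.0.4", "network_id": 1, "instance_uuid": "b",
     "virtual_interface_id": 13, "deleted": 0},
    {"address": "10.0.1.2", "network_id": 2, "instance_uuid": "a",
     "virtual_interface_id": 11, "deleted": 0},
]
virtual_interfaces = {
    11: {"id": 11, "address": "fa:16:3e:00:00:01", "instance_uuid": "a", "deleted": 0},
    12: {"id": 12, "address": "fa:16:3e:00:00:02", "instance_uuid": "a", "deleted": 0},
    13: {"id": 13, "address": "fa:16:3e:00:00:03", "instance_uuid": "b", "deleted": 0},
}
instances = {
    "a": {"uuid": "a", "hostname": "vm-a", "host": "compute1", "deleted": 0},
    "b": {"uuid": "b", "hostname": "vm-b", "host": "compute2", "deleted": 0},
}


def associated_fixed_ips(network_id, host):
    # Hand-rolled "derived table": minimum interface id per instance UUID.
    first_vif = {}
    for vif in virtual_interfaces.values():
        uuid = vif["instance_uuid"]
        first_vif[uuid] = min(vif["id"], first_vif.get(uuid, vif["id"]))

    rows = []
    for fip in fixed_ips:
        # WHERE clause on the fixed IP itself.
        if fip["deleted"] or fip["network_id"] != network_id:
            continue
        if fip["virtual_interface_id"] is None or fip["instance_uuid"] is None:
            continue
        # Join to instances; the host filter makes this effectively an inner join.
        inst = instances.get(fip["instance_uuid"])
        if inst is None or inst["deleted"] or inst["host"] != host:
            continue
        # Left join to virtual interfaces: keep the row even without a match.
        vif = virtual_interfaces.get(fip["virtual_interface_id"])
        if vif is not None and vif["deleted"]:
            vif = None
        rows.append({
            "address": fip["address"],
            "hostname": inst["hostname"],
            "vif_address": vif["address"] if vif else None,
            "on_first_vif": first_vif.get(fip["instance_uuid"])
            == fip["virtual_interface_id"],
        })
    return rows
```

Each filter, lookup, and grouping step here is work that the SQL engine does for the single query shown earlier.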
A relational database was built for the above types of queries, and
that's why I said it's the best tool for the job *in this specific case*.
Now... that said...
Is it possible to go through the Nova schema and identify mini-schemas
that could be pulled out of the RDBMS and placed into Riak or Cassandra?
Absolutely yes! The service group and compute node usage records are
good candidates for that, in my opinion. With the nova.objects work that
was completed over the last few cycles, we might actually now have the
foundation in place to make doing this a reality. I welcome your
contributions in this area.
[1] https://github.com/openstack/nova/blob/stable/icehouse/nova/db/sqlalchemy/api.py#L2608
I've said it before, and I'll say it again. In Nova at least, the SQL
schema is complex because the problem domain is complex. That means
lots of relations, lots of JOINs, and that means the best way to query
for that data is via an RDBMS.
I was really hoping you could be more specific than "best"/"most
appropriate" so that we could have a focused discussion.
I don't think relying on a central data store is in any conceivable way
appropriate for a project like OpenStack. Least of all Nova.
I don't see how we can build a highly available, distributed service on
top of a centralized data store like MySQL.
Tens or hundreds of thousands of nodes, spread across many, many racks
and datacentre halls are going to experience connectivity problems[1].
This means that some percentage of your infrastructure (possibly many
thousands of nodes, affecting many, many thousands of customers) will
find certain functionality not working on account of your datastore not
being reachable from the part of the control plane they're attempting to
use (or possibly only being able to read from it).
I say over and over again that people should own their own uptime.
Expect things to fail all the time. Do whatever you need to do to ensure
your service keeps working even when something goes wrong. Of course
this applies to our customers too. Even if we take the greatest care to
avoid downtime, customers should spread their workloads across multiple
availability zones and/or regions and probably even multiple cloud
providers. Their service towards their users is their responsibility.
However, our service towards our users is our responsibility. We should
take the greatest care to avoid having internal problems affect our
users. Building a massively distributed system like Nova on top of a
centralized data store is practically a guarantee of the opposite.
I don't disagree with anything you say above. At all. I welcome the
coming cycles where we will get to split pieces out of Nova (which will
afford us the opportunity to decouple certain mini-schemas from the
RDBMS and use more appropriate distributed data stores like Cassandra or
Riak for those smaller schemas).
For complex control plane software like Nova, though, an RDBMS is the
best tool for the job given the current lay of the land in open source
data storage solutions matched with Nova's complex query and
transactional requirements.
What transactional requirements?
https://github.com/openstack/nova/blob/stable/icehouse/nova/db/sqlalchemy/api.py#L1654
When you delete an instance, you don't want the delete to just stop
half-way through the transaction and leave around a bunch of orphaned
children. Similarly, when you reserve something, it helps to not have a
half-finished state change that you need to go clean up if something
goes boom.
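As an illustration of that all-or-nothing behaviour, here is a small sketch using Python's standard sqlite3 module. It is not Nova's code: the two tables are heavily simplified, the destroy_instance helper and its fail_midway switch are hypothetical, and the failure is simulated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema: one parent table and one child table.
conn.executescript("""
    CREATE TABLE instances (uuid TEXT PRIMARY KEY, deleted INTEGER DEFAULT 0);
    CREATE TABLE security_group_instance_association (
        instance_uuid TEXT, deleted INTEGER DEFAULT 0);
    INSERT INTO instances (uuid) VALUES ('a');
    INSERT INTO security_group_instance_association (instance_uuid) VALUES ('a');
""")


def destroy_instance(conn, uuid, fail_midway=False):
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE instances SET deleted = 1 WHERE uuid = ?", (uuid,))
        if fail_midway:
            # Simulated crash between the parent update and the child update.
            raise RuntimeError("something went boom")
        conn.execute(
            "UPDATE security_group_instance_association "
            "SET deleted = 1 WHERE instance_uuid = ?",
            (uuid,),
        )


try:
    destroy_instance(conn, "a", fail_midway=True)
except RuntimeError:
    pass

# The parent row is still undeleted: the half-finished change was rolled back.
still_there = conn.execute(
    "SELECT deleted FROM instances WHERE uuid = 'a'"
).fetchone()[0]
```

Without the transaction, the failed call would leave the instance marked deleted and its association row orphaned, which is the cleanup burden described above.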
https://github.com/openstack/nova/blob/stable/icehouse/nova/db/sqlalchemy/api.py#L3054
Folks in these other programs have actually, you know, thought about
these kinds of things and had serious discussions about alternatives.
It would be nice to have someone acknowledge that instead of snarky
comments implying everyone else "has it wrong".
I'm terribly sorry, but repeating over and over that an RDBMS is "the
best tool" without further qualification than "Nova's data model is
really complex" reads *exactly* like a snarky comment implying everyone
else "has it wrong".
Sorry if I sound snarky. I thought your blog post was the definition of
snark.
Best,
-jay
[1]: http://aphyr.com/posts/288-the-network-is-reliable