Re: [openstack-dev] [oslo][nova] Messaging: everything can talk to everything, and that is a bad thing

Adam Young Tue, 22 Mar 2016 14:26:46 -0700

On 03/22/2016 09:15 AM, Flavio Percoco wrote:

On 21/03/16 21:43 -0400, Adam Young wrote:
I had a good discussion with the Nova folks in IRC today.
My goal was to understand what could talk to what, and the short answer, according to dansmith, is:
"any node in nova land has to be able to talk to the queue for any other one for the most part: compute->compute, compute->conductor, conductor->compute, api->everything. There might be a few exceptions, but not worth it, IMHO, in the current architecture."
Longer conversation is here:
http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2016-03-21.log.html#t2016-03-21T17:54:27
Right now, the message queue is a nightmare. All sorts of sensitive information flows over the message queue: Tokens (including admin) are the most obvious. Every piece of audit data. All notifications and all control messages.
Before we continue down the path of "anything can talk to anything" can we please map out what needs to talk to what, and why? Many of the use cases seem to be based on something that should be kicked off by the conductor, such as "migrate, resize, live-migrate" and it sounds like there are plans to make that happen.
So, let's assume we can get to the point where, if node 1 needs to talk to node 2, it will do so only via the conductor. With that in place, we can put an access control rule in place:
I don't think this is going to scale well. Eventually, this will require
evolving the conductor to some sort of message scheduler, which is pretty much
what the message bus is supposed to do.

I'll limit this to what happens with Rabbit and QPID (AMQP 1.0) and leave 0mq out of it for now. I'll use rabbit as shorthand for both of these, but the rules are the same for qpid.

For, say, a migrate operation, the call goes to API, controller, and eventually down to one of the compute nodes. Source? Target? I don't know the code well enough to say, but let's say it is the source. It sends an RPC message to the target node. The message goes to the central broker right now, and then back down to the target node. Meanwhile, the source node has set up a reply queue, and that queue name has gone into the message. The target machine responds by getting a reference to the response queue and sending a message. This message goes up to the broker, and then down to the source node.
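
To make the round trip above concrete, here is a minimal sketch of the request/reply pattern using an in-memory stand-in for the broker. Everything in it is an assumption for illustration: `ToyBroker`, `rpc_call`, `serve_one`, the queue name `compute.node-2` and the `reply_` prefix are hypothetical, and none of this is oslo.messaging code.

```python
import uuid
from collections import defaultdict, deque


class ToyBroker:
    """In-memory stand-in for the central broker: named FIFO queues only."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def publish(self, queue, message):
        self.queues[queue].append(message)

    def consume(self, queue):
        return self.queues[queue].popleft()


def rpc_call(broker, target_queue, payload):
    """Source side: create a private reply queue and name it in the request."""
    reply_queue = "reply_" + uuid.uuid4().hex
    broker.publish(target_queue, {"payload": payload, "reply_to": reply_queue})
    return reply_queue


def serve_one(broker, own_queue):
    """Target side: take one request and answer on the queue it names."""
    request = broker.consume(own_queue)
    broker.publish(request["reply_to"], {"result": "ack:" + request["payload"]})


broker = ToyBroker()
# Source node sends the request up to the broker ...
reply_queue = rpc_call(broker, "compute.node-2", "prepare_migration")
# ... the target node reads it and replies via the broker ...
serve_one(broker, "compute.node-2")
# ... and the source node picks the reply off its own reply queue.
reply = broker.consume(reply_queue)
```

Note that the target trusts whatever `reply_to` value arrives in the message, and every hop passes through the broker.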

A man in the middle could sit there and also read off the queue. It could modify a message to carry its own response queue, and happily transfer things back and forth.

So, we have the HMAC proposal, which then puts crypto and key distribution all over the place. Yes, it would guard against a MITM attack, but the cost in complexity and processor time is high.
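
For readers who have not seen message signing in practice, a rough sketch of the general idea follows, using Python's standard `hmac` module. It is not the format of the actual HMAC proposal; the envelope layout, the `sign`/`verify` helpers and the shared key are assumptions made up for this example. It shows only the integrity check, and says nothing about how the key reaches each node, which is the expensive part.

```python
import hashlib
import hmac
import json


def sign(key: bytes, message: dict) -> dict:
    """Serialize the message and attach an HMAC-SHA256 tag over the bytes."""
    body = json.dumps(message, sort_keys=True)
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "hmac": tag}


def verify(key: bytes, envelope: dict) -> dict:
    """Recompute the tag and refuse the message if it does not match."""
    expected = hmac.new(
        key, envelope["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, envelope["hmac"]):
        raise ValueError("HMAC mismatch: message was altered or key is wrong")
    return json.loads(envelope["body"])


# Hypothetical key that both nodes would somehow have to share.
shared_key = b"example-key-shared-by-node-1-and-node-2"
envelope = sign(shared_key, {"method": "migrate", "reply_to": "reply_abc"})
original = verify(shared_key, envelope)

# A man in the middle swaps in its own reply queue but cannot fix the tag.
tampered = dict(envelope, body=envelope["body"].replace("reply_abc", "reply_evil"))
try:
    verify(shared_key, tampered)
    tamper_detected = False
except ValueError:
    tamper_detected = True
```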

Rabbit does not have a very flexible ACL scheme: basically, a regex per Rabbit user. However, we could easily spin up a new queue for direct node-to-node communication that did meet an ACL regex. For example, if we said that the regex was that the node could only read/write queues that have its name in them, then to make a request and response queue between node-1 and node-2 we could create the queues:



node-1-node-2
node-1-node-2-<uuid>-reply

So, instead of a single queue request, there are two. And conductor could tell the target node: start listening on this queue.
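
A small sketch of that naming scheme, assuming node names like `node-1` and a per-node regex that only matches queue names containing the node's own name as a whole dash-delimited component. The helpers `pair_queues` and `node_acl` are hypothetical, and the regex is my guess at what such a rule might look like, not something tested against a real broker.

```python
import re
import uuid


def pair_queues(source: str, target: str) -> tuple[str, str]:
    """Return (request_queue, reply_queue) names for one source/target pair."""
    request = f"{source}-{target}"
    reply = f"{request}-{uuid.uuid4()}-reply"
    return request, reply


def node_acl(node: str) -> re.Pattern:
    """Regex matching only queue names that contain this node's name.

    The name must be bounded by dashes or the ends of the string, so the
    rule for node-1 does not accidentally match queues for node-10.
    """
    return re.compile(rf"^(?:.*-)?{re.escape(node)}(?:-.*)?$")


request_queue, reply_queue = pair_queues("node-1", "node-2")

# Both ends of the pair may use both queues; an unrelated node may not.
node_1_ok = bool(node_acl("node-1").match(request_queue))
node_2_ok = bool(node_acl("node-2").match(reply_queue))
node_3_ok = bool(node_acl("node-3").match(request_queue))
```

The per-request UUID in the reply queue name keeps concurrent requests between the same two nodes from sharing a reply queue.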

Or, we could pass the message through the conductor. The request message goes from node-1 to conductor, where conductor validates the business logic of the message, then puts it into the message queue for node-2. Responses can then go directly back from node-2 to node-1 the way they do now.

OR...we could set up a direct socket between the two nodes, with the socket setup info going over the broker. OR we could use a web server, OR send it over SNMP. Or SMTP, OR TFTP. There are many ways to get the messages from node to node.

If we are going to use the message broker to do this, we should at least make it possible to secure it, even if it is not the default approach.

It might be possible to use a broker-specific technology to optimize this, but I am not a Rabbit expert. Maybe there is some way of filtering messages?

1. Compute nodes can only read from the queue compute.<name>-novacompute-<index>.localdomain
2. Compute nodes can only write to response queues in the RPC vhost
3. Compute nodes can only write to notification queues in the notification vhost.
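
The three rules above could be expressed as per-vhost read/write patterns for each compute node's broker user. The sketch below is only a model of the rules, not broker configuration: the vhost names `/rpc` and `/notifications`, the `reply_` and `notifications.` queue prefixes, and the helpers `compute_node_rules` and `allowed` are all assumptions for illustration.

```python
import re


def compute_node_rules(name: str, index: int) -> dict:
    """Build read/write patterns for one compute node's broker user."""
    own_queue = f"compute.{name}-novacompute-{index}.localdomain"
    return {
        # Rules 1 and 2: read only the node's own queue, write only replies.
        "/rpc": {"read": re.escape(own_queue), "write": r"reply_.*"},
        # Rule 3: write notifications; no "read" entry means no read access.
        "/notifications": {"write": r"notifications\..*"},
    }


def allowed(rules: dict, vhost: str, action: str, queue: str) -> bool:
    """Return True only if a pattern exists and matches the whole queue name."""
    pattern = rules.get(vhost, {}).get(action)
    if pattern is None:
        return False
    return re.fullmatch(pattern, queue) is not None


# Hypothetical node name and index, following the queue name format above.
rules = compute_node_rules("overcloud", 0)
can_read_own = allowed(rules, "/rpc", "read", "compute.overcloud-novacompute-0.localdomain")
can_read_other = allowed(rules, "/rpc", "read", "compute.overcloud-novacompute-1.localdomain")
can_write_reply = allowed(rules, "/rpc", "write", "reply_abc123")
can_write_notification = allowed(rules, "/notifications", "write", "notifications.info")
```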
I know that with AMQP, we should be able to identify the writer of a message. This means that each compute node should have its own user. I have identified how to do that for Rabbit and QPID. I assume for 0mq it would make sense to use ZAP (http://rfc.zeromq.org/spec:27) but I'd rather the 0mq maintainers chime in here.
NOTE: Gentle reminder that qpidd has been removed from oslo.messaging.

Yes, but QPID is Proton is AMQP 1.0, and I did a proof of concept with it last summer. It supports encryption and authentication over GSSAPI and is, I think, the best option for securing messaging in an OpenStack deployment at the moment.

I think you can configure rabbit, amqp1 and other technologies to do what you're
suggesting here without much trouble. TBH, I'm not sure how many changes would
be required in Nova (or even oslo.messaging) but I'd dare to say none are
required.
I think it is safe (and sane) to have the same user on the compute node communicate with Neutron, Nova, and Ceilometer. This will avoid a false sense of security: if one is compromised, they are all going to be compromised. Plan accordingly.
Beyond that, we should have message broker users for each of the components that is a client of the broker.
Applications that run on top of the cloud, and that do not get presence on the compute nodes, should have their own VHost. I see Sahara on my Tripleo deploy, but I assume there are others. Either they completely get their own vhost, or the apps should share one separate from the RPC/Notification vhosts we currently have. Even Heat might fall into this category.
Note that those application users can be allowed to read from the notification queues if necessary. They just should not be using the same vhost for their own traffic.
Please tell me if/where I am blindingly wrong in my analysis.
I guess my question is: Have you identified things that need to be changed in
any of the projects for this to be possible? Or is it a pure deployment
recommendation/decision?

There are certainly deployment changes we need to make that help. And we can likely make it such that the compute nodes can only read from their own appropriate queues. However, without changing the queue naming scheme, I can't see how to control who can write to where. Right now, it's a free-for-all.

I'd argue that any changes (assuming changes are required) are likely to happen
in specific projects (Nova, Neutron, etc) and that once this scenario is
supported, it'll remain a deployment choice to follow it or not. If I want my
undercloud services to use a single vhost and a single user, I must be able to
do that. The proposal in this email complicates deployments significantly,
despite it making sense from a security standpoint.

So, nothing I am saying is preventing that. OTOH, there is insufficient support from the RPC approach to do a more secure ACL.

One more thing. Depending on the messaging technology, having different virtual
hosts may have an impact on the performance when running under huge loads given
the fact that the data will be partitioned differently and, therefore,
written/read differently. I don't have good data at hand about this, sorry.

So, I think that performance can be optimized many ways, including having multiple brokers involved in a deployment. I've seen architecture diagrams to that effect, but have not had to put it into production myself.


Flavio



__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
