On Wed, Jul 6, 2016 at 7:24 AM, Aaron Conole <acon...@redhat.com> wrote:
> Aaron Conole <acon...@redhat.com> writes:
>
>> Daniele Di Proietto <diproiet...@vmware.com> writes:
>>
>>> On 10/06/2016 10:51, "Aaron Conole" <acon...@redhat.com> wrote:
>>>
>>>>Aaron Conole <acon...@redhat.com> writes:
>>>>
>>>>> Christian Ehrhardt <christian.ehrha...@canonical.com> writes:
>>>>>
>>>>>> On Tue, May 24, 2016 at 4:10 PM, Aaron Conole <acon...@redhat.com> wrote:
>>>>>>
>>>>>>> Daniele Di Proietto <diproiet...@vmware.com> writes:
>>>>>>>
>>>>>>> > Hi Aaron,
>>>>>>> >
>>>>>>> > I'm still a little bit nervous about calling chown on a (partially)
>>>>>>> > user controlled file name.
>>>>>>>
>>>>>>> I agree, that always seems scary.
>>>>>>>
>>>>>>> > Before moving forward I wanted to discuss a couple of other options:
>>>>>>> >
>>>>>>> > * Ansis (in CC) suggested using -runas parameter in qemu.  This way
>>>>>>> > qemu can open the socket as root and drop privileges before starting
>>>>>>> > guest execution.
>>>>>>>
>>>>>>> I'm not sure how to do this with libvirt, or via the OpenStack Neutron
>>>>>>> plugin.  I also don't know if it would be an acceptable workaround for
>>>>>>> users.  Additionally, I recall there being something of a "don't even
>>>>>>> know if this works" around it.  Maybe Christian or Ansis (both in CC)
>>>>>>> can expound on it.
>>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>> IIRC we kind of agree that long term a proper MAC will be much better but
>>>>>> most involved people needed something to get it working like "now".
>>>>>> Since they are complementary (other than the fix removing a bit of the
>>>>>> urgency for more MAC) it was kind of the least bad option.
>>>>>>
>>>>>> You have to be aware that I brought up the discussion on d...@dpdk.org - 
>>>>>> see
>>>>>> [1] and [2]:
>>>>>> But this will take time and eventually still be the applications task to
>>>>>> "do something" - no matter if via API or via the chmod's right now.
>>>>>> So Aaron is trying to get something that works now until the long term
>>>>>> things are in place, which I appreciate.
>>>>>>
>>>>>> FYI - I was even more in a hurry as it was clear that OVS-2.5 won't get
>>>>>> this in time I run with [3] for now.
>>>>>> I never intended to suggest that, but with the discussion in place, one
>>>>>> could ask if you (Aaron) want to pick up that instead.
>>>>>> That would keep OVS free for now until DPDK made up the API (see [2]) for
>>>>>> socket ownership control and this then could be implemented in OVS?
>>>>>>
>>>>>> (I hope) In some months/years we will all be happy to drop this bunch of
>>>>>> interim solutions, never the less we need it for now.
>>>>>>
>>>>>> [1]: http://dpdk.org/dev/patchwork/patch/12222/
>>>>>> [2]: http://dpdk.org/ml/archives/dev/2015-December/030326.html
>>>>>> [3]:
>>>>>> https://git.launchpad.net/~ubuntu-server/dpdk/commit/?h=ubuntu-xenial-to-dpdk2.2&id=f3c7aa1b2ddea8e092ad4a89e41a0e19d01ed4e7
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>
>>>>>>> I think originally we quickly discussed 4 possible solutions (and
>>>>>>> hopefully I captured them correctly):
>>>>>>>
>>>>>>> 1. OVS downgrades to the ovs user, and kvm runs under the ovs
>>>>>>>    group.  I don't actually like this solution because kvm could then
>>>>>>>    pollute the ovs database.
>>>>>>>
>>>>>>> 2. OVS runs as some user and sets the user/group ownership of the socket
>>>>>>>    via chown/chmod where permissions come from the database (the
>>>>>>>    original context had ovs running as root - but as I described above
>>>>>>>    it doesn't need to be root provided ovs+DPDK can start without root).
>>>>>>>
>>>>>>> 3. OVS runs as some user, kvm starts as root, opens the socket and
>>>>>>>    downgrades.  IIRC, this doesn't actually work, or it may have
>>>>>>>    implications on other projects.  I don't remember exactly what was
>>>>>>>    not as great about this solution, TBH.
>>>>>>>
>>>>>>> 4. OVS and KVM run as whatever users; MAC is used to enforce the
>>>>>>>    layering between them.
>>>>>>>
>>>>>>> I think solution 2 and solution 4 don't actually interfere with each
>>>>>>> other, and can be used to a complementary effect (if implemented
>>>>>>> properly) so that the MAC layer enforces access, but even without MAC,
>>>>>>> the DAC layer can provide appropriate whitelisting behavior.
>>>>>>>
>>>>>>
>>>>>> I also remember several complex changes needed for the #1 and #3 that
>>>>>> always would end up with huge effort and a high risk not being accepted.
>>>>>> Probably that is what you refer to with "implications on other projects".
>>>>>>
>>>>>> Also keep in mind the position of dpdk out of the last few discussions
>>>>>> which I'd like to summarize as "dpdk got this path from an app, so this 
>>>>>> app
>>>>>> OWNS that path".
>>>>>
>>>>> I'd like to continue on, but I am not sure what the concerns are right
>>>>> now.  Is it possible to enumerate them point by point so that I can
>>>>> understand them?  I think there are two outstanding concerns right now:
>>>>>
>>>>> 1. the proposed approach is not good enough (vis-a-vis DAC vs. MAC)
>>>>>
>>>>> 2. the proposed approach would be better implemented in the utility
>>>>>    that wants access to the sockets (vis-a-vis the libvirt discussion)
>>>>>
>>>>> Am I understanding the concerns correctly?
>>>>
>>>>Ping?
>>>
>>> I found another theoretical problem with the chmod approach, let me try to
>>> explain:
>>>
>>> There's an extremely small race window between the socket creation and the
>>> chmod which could theoretically be exploited to change the owner of a socket
>>> (e.g. ovnsb_db.sock) in ovs rundir, by controlling the name of the port:
>>>
>>> 1. There's no southbound database running, because it's not yet been
>>>    started or because it's being restarted.
>>> 2. The user creates a vhost port, naming it ovnsb_db.sock.
>>>    rte_vhost_driver_register() succeeds and creates a socket in the file
>>>    system.
>>> 3. The southbound database is started, it removes ovnsb_db.sock and 
>>> recreates
>>>    it.
>>> 4. Now OVS changes the owner and the permission of what it thinks is a
>>>    vhost-user socket.
>>>
>>> If 3 manages to get between 2 and 4, we have a problem. It's a pretty small
>>> window, and it's unlikely that an attacker can control when the southbound
>>> database is restarted.
>>>
>>> I feel like I'm nitpicking, but I'm not sure how serious is the security
>>> impact of what I'm describing.
>>>
>>> I suggested an alternative approach, and I've tried implementing a quick
>>> POC on top on your patch:
>>>
>>> ---8<---
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 24ebb41..d7adc66 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -30,6 +30,7 @@
>>>  #include <sys/types.h>
>>>  #include <sys/stat.h>
>>>  #include <getopt.h>
>>> +#include <sys/fsuid.h>
>>>
>>>  #include "chutil.h"
>>>  #include "dirs.h"
>>> @@ -891,6 +892,17 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>>       */
>>>      snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
>>>               vhost_sock_dir, name);
>>> +    uid_t orig_u = geteuid();
>>> +    gid_t orig_g = getegid();
>>> +    if (vhost_sock_def_owner) {
>>> +        uid_t u;
>>> +        gid_t g;
>>> +        if (!ovs_strtousr(vhost_sock_def_owner, &u, NULL, &g, false)) {
>>> +            VLOG_INFO("UID: %d GID: %d", u, g);
>>> +            setfsuid(u);
>>> +            setfsgid(g);
>>> +        }
>>> +    }
>>>
>>>      err = rte_vhost_driver_register(dev->vhost_id);
>>>      if (err) {
>>> @@ -903,16 +915,12 @@ netdev_dpdk_vhost_user_construct(struct netdev 
>>> *netdev)
>>>          err = vhost_construct_helper(netdev);
>>>      }
>>>
>>> -    ovs_mutex_unlock(&dpdk_mutex);
>>> -    if (!err && vhost_sock_def_owner &&
>>> -        (err = ovs_chown(dev->vhost_id, vhost_sock_def_owner))) {
>>> -        VLOG_ERR("vhost-user socket device ownership change failed.");
>>> +    if (vhost_sock_def_owner) {
>>> +        setfsuid(orig_u);
>>> +        setfsgid(orig_g);
>>>      }
>>>
>>> -    if (!err && vhost_sock_def_perms &&
>>> -        (err = ovs_chmod(dev->vhost_id, vhost_sock_def_perms))) {
>>> -        VLOG_ERR("vhost-user socket device permission change failed.");
>>> -    }
>>> +    ovs_mutex_unlock(&dpdk_mutex);
>>>
>>>      return err;
>>>  }
>>>
>>> ---8<---
>>>
>>> Compared to the chmod approach this has some limitation:
>>>
>>> 1. It doesn't support changing permissions, only the owner.  This
>>>    could be done with umask, but I couldn't find any system call
>>>    to change the umask for a single thread.
>>> 2. Unless vhost-sock-dir is owned by the target owner, the socket
>>>    cannot be created.  I'm not sure whether this is a reasonable
>>>    limitation for the use cases you have in mind.
>>> 3. setfsuid() is Linux specific and somehow deprecated according
>>>    to the manpage:
>>>
>>>    "Thus, setfsuid() is nowadays unneeded and should be avoided
>>>    in new applications"
>>>
>>>    I haven't used seteuid, because it changes the euid of the whole
>>>    process and that may interfere with other operations on OVS.
>>
>> Thanks for this PoC, and explanation.  I agree, there is a race, and I'd
>> like to work on trying to solve the problem.
>>
>>> If these limitations are unacceptable, I can see how we can use
>>> chmod.  After all, as you point out, it's probably better to do it
>>> in OVS than in some script.
>>
>> I think fchmod and fchown may actually be the correct calls to have, and
>> will refactor these chown/chmod utils functions as such, which (I
>> believe) avoids the race as you describe.
>>
>>> Thanks for your patience in solving this problem,
>>
>> Thanks for your reviews!
>>
>>> Daniele
>
> I've been trying to find a way to solve this in a portable fashion, but
> it appears to either require changes in DPDK to let the application
> create the unix server socket, or a linux-only guarantee with a
> non-linux racy fallback.  My attempt at the former change was NAK'd by
> upstream DPDK.  If the latter is unacceptable, then we can drop these
> patches and hope that client mode will be a workable solution.
>
> Thoughts?
Do you mean as if Open vSwitch would be the one connecting to DPDK
socket (the client and server roles swapped)?

If so, then I sent a similar patch, but for OpenFlow controller
sockets - http://openvswitch.org/pipermail/dev/2016-June/073733.html.
Basically --no-self-confinement flag makes OVS to enter in mode where
it can connect to sockets outside the /var/run/openvswitch directory.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to