On 3/18/21 8:47 PM, Ilya Maximets wrote:
> On 3/18/21 6:52 PM, Stefan Hajnoczi wrote:
>> On Wed, Mar 17, 2021 at 09:25:26PM +0100, Ilya Maximets wrote:
>> Hi,
>> Some questions to understand the problems that SocketPair Broker solves:
>>
>>> Even more configuration tricks required in order to share some sockets
>>> between different containers and not only with the host, e.g. to
>>> create service chains.
>>
>> How does SocketPair Broker solve this?  I guess the idea is that
>> SocketPair Broker must be started before other containers.  That way
>> applications don't need to sleep and reconnect when a socket isn't
>> available yet.
>>
>> On the other hand, the SocketPair Broker might be unavailable (OOM
>> killer, crash, etc), so applications still need to sleep and reconnect
>> to the broker itself.  I'm not sure the problem has actually been
>> solved unless there is a reason why the broker is always guaranteed
>> to be available?
>
> Hi, Stefan.  Thanks for your feedback!
>
> The idea is to have the SocketPair Broker running right from the boot
> of the host.  If it uses systemd socket-based service activation, the
> socket should persist as long as systemd is alive, IIUC.  An OOM kill,
> crash or restart of the broker should not affect the existence of the
> socket, and systemd will re-spawn the service if it's not running for
> any reason, without losing incoming connections.
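
Just to illustrate the point above: with socket activation the listening
socket is created and owned by systemd and only handed over to the broker
on start-up, so a broker crash doesn't remove it.  A minimal sketch of
what the broker start-up could look like (assumes libsystemd; the function
name is made up and error handling is omitted):

    #include <stdio.h>
    #include <systemd/sd-daemon.h>

    /* Pick up the listening socket passed by systemd, if any. */
    int broker_listen_fd(void)
    {
        int n = sd_listen_fds(0);  /* fds handed over by systemd start at 3 */

        if (n < 1) {
            fprintf(stderr, "not socket-activated\n");
            return -1;             /* fall back to socket()/bind()/listen() */
        }

        /* The socket is created, bound and owned by systemd, so it survives
         * broker crashes/restarts, and incoming connections simply queue up
         * until the service is (re-)spawned. */
        return SD_LISTEN_FDS_START;
    }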

>>
>>> And some housekeeping usually required for applications in case the
>>> socket server terminated abnormally and socket files left on a file
>>> system:
>>> "failed to bind to vhu: Address already in use; remove it and try again"
>>
>> QEMU avoids this by unlinking before binding. The drawback is that users
>> might accidentally hijack an existing listen socket, but that can be
>> solved with a pidfile.
>
> How exactly could this be solved with a pidfile?  And what if it is a
> different application that tries to create a socket on the same path?
> E.g. QEMU creates a socket (started in server mode) and the user
> accidentally creates a dpdkvhostuser port in Open vSwitch instead of
> dpdkvhostuserclient.  In that case the rte_vhost library will try to
> bind to the existing socket file and fail, and port creation in OVS
> will subsequently fail.  We can't allow OVS to unlink files, because
> that would give OVS users the ability to unlink arbitrary sockets that
> OVS has access to, and we also have no idea whether the file was
> created by QEMU, a virtio-user application or someone else.
> There are probably ways to detect whether any live process has the
> socket open, but that sounds like too much for this purpose, and I'm
> not sure it's even possible if the actual user is in a different
> container.
> So I don't see a good, reliable way to detect these conditions.  It
> falls on the shoulders of higher-level management software or the user
> to clean these socket files up before adding ports.
>
>>
>>> Additionally, all applications (system and user's!) should follow
>>> naming conventions and place socket files in particular location on a
>>> file system to make things work.
>>
>> Does SocketPair Broker solve this? Applications now need to use a naming
>> convention for keys, so it seems like this issue has not been
>> eliminated.
>
> The key is an arbitrary sequence of bytes, so it's hard to call it a
> naming convention.  But applications do need to know the keys, you're
> right.  And, to be careful, I said "eliminates most of the
> inconveniences".  :)
>
>>
>>> This patch-set aims to eliminate most of the inconveniences by
>>> leveraging an infrastructure service provided by a SocketPair Broker.
>>
>> I don't understand yet why this is useful for vhost-user, where the
>> creation of the vhost-user device backend and its use by a VMM are
>> closely managed by one piece of software:
>>
>> 1. Unlink the socket path.
>> 2. Create, bind, and listen on the socket path.
>> 3. Instantiate the vhost-user device backend (e.g. talk to DPDK/SPDK
>>    RPC, spawn a process, etc) and pass in the listen fd.
>> 4. In the meantime the VMM can open the socket path and call connect(2).
>>    As soon as the vhost-user device backend calls accept(2) the
>>    connection will proceed (there is no need for sleeping).
>>
>> This approach works across containers without a broker.
>
> Not sure if I fully understood the question here, but anyway.
>
> This approach works fine if you know which application to run.  In
> case of a k8s cluster, it might be a random DPDK application with
> virtio-user ports running inside a container that wants to have a
> network connection.  Also, this application needs to run virtio-user
> in server mode, otherwise a restart of OVS will require a restart of
> the application.  So you basically need to rely on a third-party
> application to create a socket with the right name and in the correct
> location that is shared with the host, so that OVS can find it and
> connect.
>
> In the VM world everything is much simpler, since you have libvirt and
> QEMU that take care of all of this and are also under the full control
> of the management software and the system administrator.  In case of a
> container with a "random" DPDK application inside, there is no such
> entity that can help.  Of course, some solution might be implemented
> in the docker/podman daemon to create and manage outside-facing
> sockets for an application inside the container, but that is not
> available today AFAIK, and I'm not sure it ever will be.
>
>>
>> BTW what is the security model of the broker? Unlike pathname UNIX
>> domain sockets there is no ownership permission check.
>
> I thought about this.  Yes, we would have to allow connections to this
> socket from a wide group of applications, and that might be a problem.
> However, two applications need to know the (at most) 1024-byte key in
> order to connect to each other.  This might be considered a sufficient
> security model as long as these keys are not predictable.
> Suggestions on how to make this more secure are welcome.

Digging more into unix sockets, I think that the broker might use
SO_PEERCRED to identify at least the uid and gid of a client.  This way
we can implement policies, e.g. a client might request to be paired
only with clients from the same group or from the same user.  This is
actually a great extension for the SocketPair Broker Protocol.  We
might even use SO_PEERSEC to enforce even stricter policies based on
the SELinux context.
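
Roughly, the broker could do something like this for every accepted
client (just a sketch; the function name is made up, error handling is
minimal, and the Linux-specific SO_PEERCRED option is assumed):

    #define _GNU_SOURCE             /* struct ucred on glibc */
    #include <stdio.h>
    #include <sys/socket.h>

    /* Read the credentials of the process on the other end of 'client_fd'. */
    int check_peer_creds(int client_fd)
    {
        struct ucred cred;
        socklen_t len = sizeof cred;

        if (getsockopt(client_fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0) {
            return -1;
        }

        /* A pairing policy could require, for example, that cred.uid or
         * cred.gid match between the two clients that requested the same
         * key. */
        printf("peer pid=%d uid=%u gid=%u\n",
               (int) cred.pid, (unsigned) cred.uid, (unsigned) cred.gid);
        return 0;
    }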

>
> If it's really necessary to completely isolate some connections from
> other ones, one more broker could be started.  But I'm not sure what
> that use case would be.
>
> The broker itself closes the socketpair on its side, so the connection
> between the 2 applications is direct and should be secure, as long as
> the kernel doesn't allow other system processes to intercept data on
> arbitrary unix sockets.
>
> Best regards,
> Ilya Maximets.
>