Ben Pfaff <b...@ovn.org> writes: > On Thu, Aug 11, 2016 at 11:33:59AM -0400, Aaron Conole wrote: >> Ben Pfaff <b...@ovn.org> writes: >> >> > On Mon, Aug 01, 2016 at 08:24:40AM -0400, Aaron Conole wrote: >> >> Aaron Conole <acon...@redhat.com> writes: >> >> >> >> > Currently, when using Open vSwitch with DPDK and qemu guests, the >> >> > recommended >> >> > method for joining the guests is via the dpdkvhostuser interface. This >> >> > interface uses Unix Domain sockets to communicate. When these sockets >> >> > are >> >> > created, they inherit the permissions and ownership from the >> >> > vswitchd process. >> >> > This can lead to an undesirable state where the QEMU process >> >> > cannot use the >> >> > socket file until manual intervention is performed (via `chown` >> >> > and/or `chmod` >> >> > calls). >> >> > >> >> > This patchset gives the ability to set the permissions and >> >> > ownership of all >> >> > dpdkvhostuser sockets from the database, avoiding the manual >> >> > intervention >> >> > required to connect QEMU and OVS via DPDK. >> >> > >> >> > The first patch adds chmod and chown calls to lib, with unit tests. The >> >> > second patch hooks those calls into the netdev_dpdk_vhost_user_construct >> >> > function, after the socket is created. >> >> > >> >> > Changes from v2: >> >> > * Added a new 2nd patch to series for chmod/chown on already >> >> > opened files. >> >> > There exist known implementations for other systems, including >> >> > FreeBSD, but >> >> > only linux is implemented. ENOTSUP is set when these calls fail >> >> > on non-linux >> >> > systems. >> >> >> >> Ping. I was hoping to have this applied before the 2.6 branch was cut; >> >> is that possible? >> > >> > The code in patch 2, which goes through every fd in /proc to find the >> > socket it needs because DPDK will not tell it the fd and will not set >> > the needed permissions, is appalling. It may be the biggest kluge I've >> > seen this year. >> >> Ouch. I'm sure that's an accomplishment I don't want associated with my >> name. Do you take issue with the design (I agree it's awful to need >> anything like this)? Did the implementation really smell that bad? > > It's not about the code (and it's certainly not ad hominem). It's about > the whole technique. The idea of opening a socket, then scanning the > whole fd table by reading through the file system to determine the > number of the fd that was opened, is appalling. > >> > This would be slightly less appalling if it used fstat() and >> > getsockname() instead of /proc. Still not great. >> >> I disagree - patch 2/3 wouldn't exist. [...] > > The approach I was talking about was to scan the fd table using a loop > over the valid fds, calling fstat() on each to find out whether it's a > socket then getsockname() to find out whether it's the right socket. > > Come to think of it, I don't know what happens if you try to > getsockname() on a socket whose name is longer than whatever fits in > sockaddr_un. Dammit. > >> [...] That goes to the last sentence >> in this. Both of those calls will work for a descriptor (and notice >> that patch 1/3 uses them). The issue I have is trying to go from a >> pathname to a descriptor (which is not a standardized interface for a >> plethora of reasons). If I had the descriptor, though, patch 2/3 would >> never have been submitted (because it would be unneeded - I do >> fchmod/fchown in patch 1/3). > > Yes, right. I understand the problem, I just don't understand why it > exists. The DPDK maintainers can solve it instantly in 5 minutes of > work. > >> > This definitely needs comments to explain why it exists and what it's >> > doing. > > Thanks for the explanation. In this case, I meant that the *code* needs > to have comments, since it had none. It took quite a while from looking > at the code to figure out what was going on, and longer to figure out > why. Comments should make this obvious to the reader. > > [...] > >> > This could be properly fixed in DPDK by adding a 1-line helper function >> > that returns the fd for a vhost-user server. >> >> +1 >> I agree strongly. However, Christian Ehrhardt and myself have each >> submitted patches[5][6] to try and make this kind of thing non-odius. >> These patches have not been accepted. > > Neither of those is a simple "return server->listen_fd;" function. Any > chance that would be accepted? It would solve the problem just fine.
I don't know, but I'll outline my plans, and if you agree, then I'll put them into action. First, I'll drop patch 2/3. Second, I'll take your suggestion below, and implement it in the netdev-dpdk area (as part of "patch 3/3", which will become 2/2). I'll submit the above by mid-day Friday, tested. Third, I'll cook up the DPDK server socket fd patch, and try to get that part of a future DPDK release. If accepted, then when OvS ports to that release, I will submit a patch removing your suggestion, and using the file descriptor instead (or whatever becomes of it). If this sounds acceptable, I'll start in on it ASAP. >> If you think that patch 2/3 is that maloderous that it cannot be >> accepted, then there will always probably be this kind of disjointed >> file-system permissions gap for vhost-user server mode interfaces. >> Unless you have a better suggestion - I'm all ears then :) > > I do have one suggestion. Something like this: > > for (int i = 0; i < 10; i++) { > /* Find the first available file descriptor, then close it. */ > int fd = socket(...); > close(fd); > > /* Create server socket. */ > server = dpdk_create_server_socket(filename, ...); > > /* Check that fd is the new server socket fd. > * This is normally true unless some other thread created a fd > * concurrently. */ > struct stat s; > struct sockaddr_un sun; > socklen_t sun_len = sizeof sun; > if (!stat(fd, &s) && S_ISSOCK(s.st_mode) && > getsockname(fd, &sun, &sun_len) && !strcmp(filename, sun.sun_path)) { > /* Yay, we found the fd. */ > return success; > } > > /* Race. Try again. */ > dpdk_close_server_socket(server); > } Thanks, Ben. I am not opposed to going with this approach. Unfortunately, there are no truly good solutions I see here at the moment. -Aaron _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev