Thanks so much for the review, Ben! Wall of text follows (apologies), so I suggest having your breakfast first :)
Ben Pfaff <b...@ovn.org> writes: > On Mon, Aug 01, 2016 at 08:24:40AM -0400, Aaron Conole wrote: >> Aaron Conole <acon...@redhat.com> writes: >> >> > Currently, when using Open vSwitch with DPDK and qemu guests, the >> > recommended >> > method for joining the guests is via the dpdkvhostuser interface. This >> > interface uses Unix Domain sockets to communicate. When these sockets are >> > created, they inherit the permissions and ownership from the >> > vswitchd process. >> > This can lead to an undesirable state where the QEMU process cannot use the >> > socket file until manual intervention is performed (via `chown` >> > and/or `chmod` >> > calls). >> > >> > This patchset gives the ability to set the permissions and ownership of all >> > dpdkvhostuser sockets from the database, avoiding the manual intervention >> > required to connect QEMU and OVS via DPDK. >> > >> > The first patch adds chmod and chown calls to lib, with unit tests. The >> > second patch hooks those calls into the netdev_dpdk_vhost_user_construct >> > function, after the socket is created. >> > >> > Changes from v2: >> > * Added a new 2nd patch to series for chmod/chown on already opened files. >> > There exist known implementations for other systems, including >> > FreeBSD, but >> > only linux is implemented. ENOTSUP is set when these calls fail >> > on non-linux >> > systems. >> >> Ping. I was hoping to have this applied before the 2.6 branch was cut; >> is that possible? > > The code in patch 2, which goes through every fd in /proc to find the > socket it needs because DPDK will not tell it the fd and will not set > the needed permissions, is appalling. It may be the biggest kluge I've > seen this year. Ouch. I'm sure that's an accomplishment I don't want associated with my name. Do you take issue with the design (I agree it's awful to need anything like this)? Did the implementation really smell that bad? > This would be slightly less appalling if it used fstat() and > getsockname() instead of /proc. Still not great. I disagree - patch 2/3 wouldn't exist. That goes to the last sentence in this. Both of those calls will work for a descriptor (and notice that patch 1/3 uses them). The issue I have is trying to go from a pathname to a descriptor (which is not a standardized interface for a plethora of reasons). If I had the descriptor, though, patch 2/3 would never have been submitted (because it would be unneeded - I do fchmod/fchown in patch 1/3). > This definitely needs comments to explain why it exists and what it's > doing. There were a few goals for me when writing this API: 1. Do not set permissions of files which OvS isn't supposed to own. 2. Avoid TOCTOU[1][2] races inherent when using POSIX calls to set permissions (and advanced things like filesystem maze attacks[3], etc). 3. Work as 'generically' as possible. I originally implemented this using openat, fstatat, and fchmodat, but it turns out that fchmodat with AT_SYMLINK_NOFOLLOW does not work under linux. The suggested workaround[4] from the glibc folks is to use openat(..., O_PATH|O_NOFOLLOW), then fstat() and fchmod() the resulting fd. However, we cannot use this approach because openat() is not for use with sockets (and a quick test just now on my system is showing that this won't work). I don't want to think about what kinds of buggy network stack we'd have to use that would allow the client to connect() and then set the permissions of the server... While sitting in my cube cursing out the kind of mess this all ends up being, I had a thought that if I can just get a way of mapping the name we passed to the open() or bind() call, then we can use fchmod/fchown (with the caveat that fchmod/fchown on unix domain sockets is linux specific behavior). This is a short-circuit of the above attempts to open / access the file descriptor. If it is already open, then the mapping is immutable anyway. For some of the "light" reading I did on the subject, see the end of this email. > This could be properly fixed in DPDK by adding a 1-line helper function > that returns the fd for a vhost-user server. +1 I agree strongly. However, Christian Ehrhardt and myself have each submitted patches[5][6] to try and make this kind of thing non-odius. These patches have not been accepted. If you think that patch 2/3 is that maloderous that it cannot be accepted, then there will always probably be this kind of disjointed file-system permissions gap for vhost-user server mode interfaces. Unless you have a better suggestion - I'm all ears then :) Thanks again, -Aaron References: 1: https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use 2: http://www.informit.com/articles/article.aspx?p=23947&seqNum=3 3: https://www.usenix.org/legacy/event/fast08/tech/full_papers/tsafrir/tsafrir_html/ 4: https://sourceware.org/bugzilla/show_bug.cgi?id=14578 5: http://dpdk.org/dev/patchwork/patch/12222/ 6: http://dev.dpdk.narkive.com/4xlJytoX/dpdk-dev-rfc-librte-vhost-add-unix-domain-socket-fd-registration Some additional reading if you feel bored (and I can keep these coming): * https://www.usenix.org/legacy/event/sec05/tech/full_papers/borisov/borisov.pdf * http://www.cercs.gatech.edu/tech-reports/tr2006/git-cercs-06-08.pdf * http://www.snia.org/sites/default/orig/sdc_archives/2008_presentations/thursday/DilmaDaSilva_TOCTTOU_Race.pdf _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev