On Fri, Nov 18, 2016 at 02:26:21AM -0500, Jeff Cody wrote: > On Wed, Nov 16, 2016 at 08:12:41AM +0000, Stefan Hajnoczi wrote: > > On Tue, Nov 15, 2016 at 10:38 PM, ashish mittal <ashmit...@gmail.com> wrote: > > > On Wed, Sep 28, 2016 at 2:45 PM, Stefan Hajnoczi <stefa...@gmail.com> > > > wrote: > > >> On Tue, Sep 27, 2016 at 09:09:49PM -0700, Ashish Mittal wrote: > > >> 5. > > >> I don't see any endianness handling or portable alignment of struct > > >> fields in the network protocol code. Binary network protocols need to > > >> take care of these issue for portability. This means libqnio compiled > > >> for different architectures will not work. Do you plan to support any > > >> other architectures besides x86? > > >> > > > > > > No, we support only x86 and do not plan to support any other arch. > > > Please let me know if this necessitates any changes to the configure > > > script. > > > > I think no change to ./configure is necessary. The library will only > > ship on x86 so other platforms will never attempt to compile the code. > > > > >> 6. > > >> The networking code doesn't look robust: kvset uses assert() on input > > >> from the network so the other side of the connection could cause SIGABRT > > >> (coredump), the client uses the msg pointer as the cookie for the > > >> response packet so the server can easily crash the client by sending a > > >> bogus cookie value, etc. Even on the client side these things are > > >> troublesome but on a server they are guaranteed security issues. I > > >> didn't look into it deeply. Please audit the code. > > >> > > > > > > By design, our solution on OpenStack platform uses a closed set of > > > nodes communicating on dedicated networks. VxHS servers on all the > > > nodes are on a dedicated network. Clients (qemu) connects to these > > > only after reading the server IP from the XML (read by libvirt). The > > > XML cannot be modified without proper access. Therefore, IMO this > > > problem would be relevant only if someone were to use qnio as a > > > generic mode of communication/data transfer, but for our use-case, we > > > will not run into this problem. Is this explanation acceptable? > > > > No. The trust model is that the guest is untrusted and in the worst > > case may gain code execution in QEMU due to security bugs. > > > > You are assuming block/vxhs.c and libqnio are trusted but that > > assumption violates the trust model. > > > > In other words: > > 1. Guest exploits a security hole inside QEMU and gains code execution > > on the host. > > 2. Guest uses VxHS client file descriptor on host to send a malicious > > packet to VxHS server. > > 3. VxHS server is compromised by guest. > > 4. Compromised VxHS server sends malicious packets to all other > > connected clients. > > 5. All clients have been compromised. > > > > This means both the VxHS client and server must be robust. They have > > to validate inputs to avoid buffer overflows, assertion failures, > > infinite loops, etc. > > > > Stefan > > > The libqnio code is important with respect to the VxHS driver. It is a bit > different than other existing external protocol drivers, in that the current > user and developer base is small, and the code itself is pretty new. So I > think for the VxHS driver here upstream, we really do need to get some of > the libqnio issues squared away. I don't know if we've ever explicitly > address the extent to which libqnio issues affect the driver > merging, so I figure it is probably worth discussing here. > > To try and consolidate libqnio discussion, here is what I think I've read / > seen from others as the major issues that should be addressed in libqnio: > > * Code auditing, static analysis, and general code cleanup. Things like > memory leaks shouldn't be happening, and some prior libqnio compiler > warnings imply that there is more code analysis that should be done with > libqnio. > > (With regards to memory leaks: Valgrind may be useful to track these down: > > # valgrind ./qemu-io -c 'write -pP 0xae 66000 128k' \ > vxhs://localhost/test.raw > > ==30369== LEAK SUMMARY: > ==30369== definitely lost: 4,168 bytes in 2 blocks > ==30369== indirectly lost: 1,207,720 bytes in 58,085 blocks) > > * Potential security issues such as buffer overruns, input validation, etc., > need to be audited. > > * Async operations need to be truly asynchronous, without blocking calls. > > * Daniel pointed out that there is no authentication method for taking to a > remote server. This seems a bit scary. Maybe all that is needed here is > some clarification of the security scheme for authentication? My > impression from above is that you are relying on the networks being > private to provide some sort of implicit authentication, though, and this > seems fragile (and doesn't protect against a compromised guest or other > process on the server, for one).
While relying on some kind of private network may have been acceptable 10 years ago, I don't think it is a credible authentication / security strategy in the current (increasingly) hostile network environments. You really have to assume as a starting position that even internal networks are compromised these days. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|