On Sun, Feb 28, 2010 at 02:57:56PM -0600, Anthony Liguori wrote:
> On 02/28/2010 11:19 AM, Michael S. Tsirkin wrote:
>>> Both have security implications so I think it's important that they
>>> be addressed. Otherwise, I'm pretty happy with how things are.
>>>
>> Care suggesting some solutions?
>>
>
> The obvious thing to do would be to use the memory notifier in vhost to
> keep track of whenever something remaps the ring's memory region and if
> that happens, issue an ioctl to vhost to change the location of the
> ring.
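If I read this right, the remap path would boil down to something like the
sketch below. The vhost_ring_remapped() hook and its arguments are made up
for illustration; only the VHOST_SET_VRING_ADDR ioctl and struct
vhost_vring_addr come from linux/vhost.h.

/* Sketch only: vhost_ring_remapped() is a hypothetical notifier callback,
 * invoked after the region backing virtqueue 'idx' has been remapped to
 * new userspace addresses. */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

static int vhost_ring_remapped(int vhost_fd, unsigned int idx,
                               void *desc, void *avail, void *used)
{
    struct vhost_vring_addr addr = {
        .index           = idx,
        .desc_user_addr  = (uint64_t)(unsigned long)desc,
        .avail_user_addr = (uint64_t)(unsigned long)avail,
        .used_user_addr  = (uint64_t)(unsigned long)used,
    };

    /* Point the in-kernel virtqueue at the ring's new location. */
    return ioctl(vhost_fd, VHOST_SET_VRING_ADDR, &addr);
}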
It would be easy to do, but what I wondered about is what happens in the
guest in the meantime. Which ring address has the correct descriptors: the
old one? The new one? Both? This question leads me to believe that a
well-behaved guest will never encounter this situation.

> Also, you would need to merge the vhost slot management code with
> the KVM slot management code.
>
> I'm sympathetic to your arguments though. As qemu is today, the above
> is definitely the right thing to do. But ram is always ram and ram
> always has a fixed (albeit non-linear) mapping within a guest. We can
> probably be smarter in qemu.
>
> There are areas of MMIO/ROM address space that *sometimes* end up
> behaving like ram, but that's a different case. The one other case to
> consider is ram hot add/remove, in which case ram may be removed or
> added (but its location will never change during its lifetime).
>
> Here's what I'll propose, and I'd really like to hear what Paul thinks
> about it before we start down this path.
>
> I think we should add a new API that's:
>
> void cpu_ram_add(target_phys_addr_t start, ram_addr_t size);
>
> This API would do two things. It would call qemu_ram_alloc() and
> cpu_register_physical_memory() just as the code does today. It would
> also add this region to a new table.
>
> There would be:
>
> void *cpu_ram_map(target_phys_addr_t start, ram_addr_t *size);
> void cpu_ram_unmap(void *mem);
>
> These calls would use this new table to look up ram addresses. These
> mappings are valid as long as the guest is executing. Within the table,
> each region would have a reference count. When it comes time to do hot
> add/remove, we would wait to remove a region until its reference count
> went to zero, to avoid unmapping during DMA.
>
> cpu_ram_add() never gets called with overlapping regions. We'll modify
> cpu_register_physical_memory() to ensure that a ram mapping is never
> changed after initial registration.
>
> vhost no longer needs to bother keeping a dynamic table up to date, so
> all of the slot management code can be removed from vhost. KVM still
> needs the code to handle rom/ram mappings, but we can take care of that
> next. virtio-net's userspace code can do the same thing as vhost and
> only map the ring once, which should be a big performance improvement.
>
> It also introduces a place to do madvise() reset registrations.
>
> This is definitely appropriate for target-i386. I suspect it is for
> other architectures too.
>
> Regards,
>
> Anthony Liguori
>
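Just so I'm sure I follow the map/unmap semantics, the table part of this
would be roughly the sketch below. Everything in it is a placeholder: the
fixed-size array, the typedefs, and malloc() standing in for the
qemu_ram_alloc() plus cpu_register_physical_memory() calls mentioned above.

#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

typedef uint64_t target_phys_addr_t;    /* placeholder typedefs */
typedef uint64_t ram_addr_t;

#define MAX_RAM_REGIONS 32

typedef struct RamRegion {
    target_phys_addr_t start;   /* guest physical address */
    ram_addr_t size;
    uint8_t *host;              /* host virtual address backing the region */
    int refcount;               /* outstanding cpu_ram_map() callers */
    int in_use;
} RamRegion;

static RamRegion ram_regions[MAX_RAM_REGIONS];

/* Register a region that stays ram, at a fixed guest address, for its
 * whole lifetime. */
void cpu_ram_add(target_phys_addr_t start, ram_addr_t size)
{
    int i;

    for (i = 0; i < MAX_RAM_REGIONS; i++) {
        if (!ram_regions[i].in_use) {
            /* The real version would call qemu_ram_alloc() and
             * cpu_register_physical_memory() here; malloc() is a stand-in. */
            ram_regions[i].start = start;
            ram_regions[i].size = size;
            ram_regions[i].host = malloc(size);
            ram_regions[i].refcount = 0;
            ram_regions[i].in_use = 1;
            return;
        }
    }
    assert(!"too many ram regions");
}

/* Return a host pointer valid until the matching cpu_ram_unmap(); *size is
 * clamped to what is contiguous within the region. */
void *cpu_ram_map(target_phys_addr_t start, ram_addr_t *size)
{
    int i;

    for (i = 0; i < MAX_RAM_REGIONS; i++) {
        RamRegion *r = &ram_regions[i];

        if (r->in_use && start >= r->start && start < r->start + r->size) {
            r->refcount++;
            *size = r->start + r->size - start;
            return r->host + (start - r->start);
        }
    }
    return NULL;    /* not ram (MMIO/ROM etc.), caller must fall back */
}

void cpu_ram_unmap(void *mem)
{
    int i;

    for (i = 0; i < MAX_RAM_REGIONS; i++) {
        RamRegion *r = &ram_regions[i];

        if (r->in_use && (uint8_t *)mem >= r->host &&
            (uint8_t *)mem < r->host + r->size) {
            /* Hot remove would wait for this to reach zero before really
             * unmapping, so DMA in flight stays safe. */
            r->refcount--;
            return;
        }
    }
}

A virtqueue backend would then call cpu_ram_map() once when the guest
programs the ring address and cpu_ram_unmap() on reset, instead of
translating on every access, which is what makes the per-device slot
tracking unnecessary.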
>>>
>>>>> Furthermore, vhost reduces a virtual machine's security. It offers an
>>>>> impressive performance boost (particularly when dealing with 10gbit+
>>>>> networking) but for a user that doesn't have such strong networking
>>>>> performance requirements, I think it's reasonable for them to not want
>>>>> to make a security trade off.
>>>>>
>>>>>
>>>> It's hard for me to see how it reduces VM security. If it does, it's
>>>> not by design and will be fixed.
>>>>
>>>>
>>> If you have a bug in vhost-net (would never happen of course) then it's
>>> a host-kernel exploit, whereas if we have a bug in virtio-net userspace,
>>> it's a local user exploit. We have a pretty robust architecture to deal
>>> with local user exploits (qemu can run unprivileged, SELinux enforces
>>> mandatory access control) but a host-kernel exploit cannot be protected
>>> against.
>>>
>>> I'm not saying that we should never put things in the kernel, but
>>> there's definitely a security vs. performance trade off here.
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>>
>> Not sure I get the argument completely. Any kernel service with a bug
>> might be exploited for privilege escalation. Yes, more kernel code
>> gives you more attack surface, but given we use rich interfaces such as
>> the ones exposed by kvm, I am not sure by how much.
>>
>> Also note that vhost net does not take qemu out of the equation for
>> everything, just for datapath operations.
>>
>>