On 02/28/2010 11:19 AM, Michael S. Tsirkin wrote:
Both have security implications so I think it's important that they
be addressed. Otherwise, I'm pretty happy with how things are.
Care suggesting some solutions?
The obvious thing to do would be to use the memory notifier in vhost to
keep track of whenever something remaps the ring's memory region and if
that happens, issue an ioctl to vhost to change the location of the
ring. Also, you would need to merge the vhost slot management code with
the KVM slot management code.
I'm sympathetic to your arguments though. As qemu is today, the above
is definitely the right thing to do. But ram is always ram and ram
always has a fixed (albeit non-linear) mapping within a guest. We can
probably be smarter in qemu.
There are areas of MMIO/ROM address space that *sometimes* end up
behaving like ram, but that's a different case. The one other case to
consider is ram hot add/remove in which case, ram may be removed or
added (but it's location will never change during its lifetime).
Here's what I'll propose, and I'd really like to hear what Paul think
about it before we start down this path.
I think we should add a new API that's:
void cpu_ram_add(target_phys_addr_t start, ram_addr_t size);
This API would do two things. It would call qemu_ram_alloc() and
cpu_register_physical_memory() just as code does today. It would also
add this region into a new table.
There would be:
void *cpu_ram_map(target_phys_addr_t start, ram_addr_t *size);
void cpu_ram_unmap(void *mem);
These calls would use this new table to lookup ram addresses. These
mappings are valid as long as the guest is executed. Within the table,
each region would have a reference count. When it comes time to do hot
add/remove, we would wait to remove a region until the reference count
went to zero to avoid unmapping during DMA.
cpu_ram_add() never gets called with overlapping regions. We'll modify
cpu_register_physical_memory() to ensure that a ram mapping is never
changed after initial registration.
vhost no longer needs to bother keeping the dynamic table up to date so
it removes all of the slot management code from vhost. KVM still needs
the code to handle rom/ram mappings but we can take care of that next.
virtio-net's userspace code can do the same thing as vhost and only map
the ring once which should be a big performance improvement.
It also introduces a place to do madvise() reset registrations.
This is definitely appropriate for target-i386. I suspect it is for
other architectures too.
Regards,
Anthony Liguori
Furthermore, vhost reduces a virtual machine's security. It offers an
impressive performance boost (particularly when dealing with 10gbit+
networking) but for a user that doesn't have such strong networking
performance requirements, I think it's reasonable for them to not want
to make a security trade off.
It's hard for me to see how it reduces VM security. If it does, it's
not by design and will be fixed.
If you have a bug in vhost-net (would never happen of course) then it's
a host-kernel exploit whereas if we have a bug in virtio-net userspace,
it's a local user exploit. We have a pretty robust architecture to deal
with local user exploits (qemu can run unprivilieged, SELinux enforces
mandatory access control) but a host-kernel can not be protected against.
I'm not saying that we should never put things in the kernel, but
there's definitely a security vs. performance trade off here.
Regards,
Anthony Liguori
Not sure I get the argument completely. Any kernel service with a bug
might be exploited for priveledge escalation. Yes, more kernel code
gives you more attack surface, but given we use rich interfaces such as
ones exposed by kvm, I am not sure by how much.
Also note that vhost net does not take qemu out of the equation for
everything, just for datapath operations.