On Sun, Feb 28, 2010 at 02:57:56PM -0600, Anthony Liguori wrote:
> On 02/28/2010 11:19 AM, Michael S. Tsirkin wrote:
>>> Both have security implications so I think it's important that they
>>> be addressed. Otherwise, I'm pretty happy with how things are.
>>>
>> Care suggesting some solutions?
>>
>
> The obvious thing to do would be to use the memory notifier in vhost to
> keep track of whenever something remaps the ring's memory region and if
> that happens, issue an ioctl to vhost to change the location of the
> ring.
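If I read this right, the remap path would boil down to something like the
sketch below. The vhost_ring_remapped() hook and its arguments are made up
for illustration; only the VHOST_SET_VRING_ADDR ioctl and struct
vhost_vring_addr come from linux/vhost.h.

/* Sketch only: vhost_ring_remapped() is a hypothetical notifier callback,
 * invoked after the region backing virtqueue 'idx' has been remapped to
 * new userspace addresses. */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

static int vhost_ring_remapped(int vhost_fd, unsigned int idx,
                               void *desc, void *avail, void *used)
{
    struct vhost_vring_addr addr = {
        .index           = idx,
        .desc_user_addr  = (uint64_t)(unsigned long)desc,
        .avail_user_addr = (uint64_t)(unsigned long)avail,
        .used_user_addr  = (uint64_t)(unsigned long)used,
    };

    /* Point the in-kernel virtqueue at the ring's new location. */
    return ioctl(vhost_fd, VHOST_SET_VRING_ADDR, &addr);
}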
It would be easy to do, but what I wondered about is what happens in the
guest in the meantime. Which ring address has the correct descriptors: the
old one? The new one? Both? This question leads me to believe that a
well-behaved guest will never encounter this situation.

> Also, you would need to merge the vhost slot management code with
> the KVM slot management code.
>
> I'm sympathetic to your arguments though. As qemu is today, the above
> is definitely the right thing to do. But ram is always ram and ram
> always has a fixed (albeit non-linear) mapping within a guest. We can
> probably be smarter in qemu.
>
> There are areas of MMIO/ROM address space that *sometimes* end up
> behaving like ram, but that's a different case. The one other case to
> consider is ram hot add/remove, in which case ram may be removed or
> added (but its location will never change during its lifetime).
>
> Here's what I'll propose, and I'd really like to hear what Paul thinks
> about it before we start down this path.
>
> I think we should add a new API that's:
>
> void cpu_ram_add(target_phys_addr_t start, ram_addr_t size);
>
> This API would do two things. It would call qemu_ram_alloc() and
> cpu_register_physical_memory() just as the code does today. It would
> also add this region to a new table.
>
> There would be:
>
> void *cpu_ram_map(target_phys_addr_t start, ram_addr_t *size);
> void cpu_ram_unmap(void *mem);
>
> These calls would use this new table to look up ram addresses. These
> mappings are valid as long as the guest is executing. Within the table,
> each region would have a reference count. When it comes time to do hot
> add/remove, we would wait to remove a region until its reference count
> went to zero, to avoid unmapping during DMA.
>
> cpu_ram_add() never gets called with overlapping regions. We'll modify
> cpu_register_physical_memory() to ensure that a ram mapping is never
> changed after initial registration.
>
> vhost no longer needs to bother keeping a dynamic table up to date, so
> all of the slot management code can be removed from vhost. KVM still
> needs the code to handle rom/ram mappings, but we can take care of that
> next. virtio-net's userspace code can do the same thing as vhost and
> only map the ring once, which should be a big performance improvement.
>
> It also introduces a place to do madvise() reset registrations.
>
> This is definitely appropriate for target-i386. I suspect it is for
> other architectures too.
>
> Regards,
>
> Anthony Liguori
>
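Just so I'm sure I follow the map/unmap semantics, the table part of this
would be roughly the sketch below. Everything in it is a placeholder: the
fixed-size array, the typedefs, and malloc() standing in for the
qemu_ram_alloc() plus cpu_register_physical_memory() calls mentioned above.

#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

typedef uint64_t target_phys_addr_t;    /* placeholder typedefs */
typedef uint64_t ram_addr_t;

#define MAX_RAM_REGIONS 32

typedef struct RamRegion {
    target_phys_addr_t start;   /* guest physical address */
    ram_addr_t size;
    uint8_t *host;              /* host virtual address backing the region */
    int refcount;               /* outstanding cpu_ram_map() callers */
    int in_use;
} RamRegion;

static RamRegion ram_regions[MAX_RAM_REGIONS];

/* Register a region that stays ram, at a fixed guest address, for its
 * whole lifetime. */
void cpu_ram_add(target_phys_addr_t start, ram_addr_t size)
{
    int i;

    for (i = 0; i < MAX_RAM_REGIONS; i++) {
        if (!ram_regions[i].in_use) {
            /* The real version would call qemu_ram_alloc() and
             * cpu_register_physical_memory() here; malloc() is a stand-in. */
            ram_regions[i].start = start;
            ram_regions[i].size = size;
            ram_regions[i].host = malloc(size);
            ram_regions[i].refcount = 0;
            ram_regions[i].in_use = 1;
            return;
        }
    }
    assert(!"too many ram regions");
}

/* Return a host pointer valid until the matching cpu_ram_unmap(); *size is
 * clamped to what is contiguous within the region. */
void *cpu_ram_map(target_phys_addr_t start, ram_addr_t *size)
{
    int i;

    for (i = 0; i < MAX_RAM_REGIONS; i++) {
        RamRegion *r = &ram_regions[i];

        if (r->in_use && start >= r->start && start < r->start + r->size) {
            r->refcount++;
            *size = r->start + r->size - start;
            return r->host + (start - r->start);
        }
    }
    return NULL;    /* not ram (MMIO/ROM etc.), caller must fall back */
}

void cpu_ram_unmap(void *mem)
{
    int i;

    for (i = 0; i < MAX_RAM_REGIONS; i++) {
        RamRegion *r = &ram_regions[i];

        if (r->in_use && (uint8_t *)mem >= r->host &&
            (uint8_t *)mem < r->host + r->size) {
            /* Hot remove would wait for this to reach zero before really
             * unmapping, so DMA in flight stays safe. */
            r->refcount--;
            return;
        }
    }
}

A virtqueue backend would then call cpu_ram_map() once when the guest
programs the ring address and cpu_ram_unmap() on reset, instead of
translating on every access, which is what makes the per-device slot
tracking unnecessary.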
>>>
>>>>> Furthermore, vhost reduces a virtual machine's security. It offers an
>>>>> impressive performance boost (particularly when dealing with 10gbit+
>>>>> networking) but for a user that doesn't have such strong networking
>>>>> performance requirements, I think it's reasonable for them to not want
>>>>> to make a security trade off.
>>>>>
>>>>>
>>>> It's hard for me to see how it reduces VM security. If it does, it's
>>>> not by design and will be fixed.
>>>>
>>>>
>>> If you have a bug in vhost-net (would never happen of course) then it's
>>> a host-kernel exploit, whereas if we have a bug in virtio-net userspace,
>>> it's a local user exploit. We have a pretty robust architecture to deal
>>> with local user exploits (qemu can run unprivileged, SELinux enforces
>>> mandatory access control) but a host-kernel exploit cannot be protected
>>> against.
>>>
>>> I'm not saying that we should never put things in the kernel, but
>>> there's definitely a security vs. performance trade off here.
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>>
>> Not sure I get the argument completely. Any kernel service with a bug
>> might be exploited for privilege escalation. Yes, more kernel code
>> gives you more attack surface, but given we use rich interfaces such as
>> the ones exposed by kvm, I am not sure by how much.
>>
>> Also note that vhost net does not take qemu out of the equation for
>> everything, just for datapath operations.
>>
>>