* Igor Mammedov (imamm...@redhat.com) wrote:
> On Mon, 11 Dec 2017 11:03:00 +0000
> "Dr. David Alan Gilbert" <dgilb...@redhat.com> wrote:
>
> > * Igor Mammedov (imamm...@redhat.com) wrote:
> > > On Fri, 8 Dec 2017 17:51:56 +0000
> > > "Dr. David Alan Gilbert" <dgilb...@redhat.com> wrote:
> > >
> > > > * Igor Mammedov (imamm...@redhat.com) wrote:
> > > > > On Thu, 7 Dec 2017 18:17:51 +0000
> > > > > "Dr. David Alan Gilbert" <dgilb...@redhat.com> wrote:
> > > > >
> > > > > > * Igor Mammedov (imamm...@redhat.com) wrote:
> > > > > > > vhost_verify_ring_mappings() is used to verify that
> > > > > > > rings are still accessible and that related memory
> > > > > > > hasn't been moved after the flatview is updated.
> > > > > > >
> > > > > > > It does this by mapping the ring's GPA+len and
> > > > > > > checking that the HVA hasn't changed with the new
> > > > > > > memory map. To avoid a possibly expensive mapping
> > > > > > > call, we identify the address range that changed
> > > > > > > and do the mapping only if the ring is in the
> > > > > > > changed range.
> > > > > > >
> > > > > > > However, it's not necessary to perform the ring's
> > > > > > > GPA mapping, as we already have its current HVA;
> > > > > > > all we need is to verify that the ring's GPA
> > > > > > > translates to the same HVA in the updated flatview.
> > > > > > >
> > > > > > > That can be done while we are building the vhost
> > > > > > > memmap, where vhost_update_mem_cb() already maps
> > > > > > > every RAM MemoryRegionSection to get the section
> > > > > > > HVA. This fact can be used to check that the ring
> > > > > > > belongs to the same MemoryRegion in the same place,
> > > > > > > like this:
> > > > > > >
> > > > > > >   hva_ring_offset = GPA(ring) - GPA(MemoryRegionSection)
> > > > > > >   ring_hva == HVA(MemoryRegionSection) + hva_ring_offset
> > > > > > >
> > > > > > > Doing this allows us to drop ~100 LOC of code which
> > > > > > > figures out the changed memory range, and to avoid
> > > > > > > calling the unneeded reverse
> > > > > > > vhost_memory_map(is_write=true), which is overkill
> > > > > > > for the task at hand.
> > > > > >
> > > > > > There are a few things about this I don't understand; however if
> > > > > > it's right I agree that it means we can kill off my comparison
> > > > > > code.
> > > > > >
> > > > > > First, can I check what constraints 'verify_ring' needs to check;
> > > > > > if I'm understanding the original code it's checking that:
> > > > > >   a) If a queue falls within a region, that the whole queue can
> > > > > >      be mapped
> > > > > see vhost_verify_ring_part_mapping():
> > > > >
> > > > >     p = vhost_memory_map(dev, part_addr, &l, 1);
> > > > >
> > > > >     if (!p || l != part_size)
> > > > >         error_out
> > > > >
> > > > > 1st: (!p) the requested part_addr must be mapped,
> > > > >      i.e. be part of a MemorySection in the flatview
> > > > > AND
> > > > > 2nd: (l != part_size) the mapped range size must match what we
> > > > >      asked for, i.e. be mapped as a continuous range, so that
> > > > >      [GPA, GPA + part_size) can directly correspond to
> > > > >      [HVA, HVA + part_size); that's possible only within
> > > > >      1 MemorySection && 1 MemoryRegion,
> > > > >      if I read address_space_map() correctly:
> > > > >         flatview_translate() -> GPA's MemorySection
> > > > >         flatview_extend_translation() -> 1:1 GPA->HVA range size
> > > > >
> > > > > So the answer, in the case of the RAM memory regions we are
> > > > > interested in, would be:
> > > > >    the queue falls within a MemorySection and the whole queue
> > > > >    fits into it
> > > >
> > > > Yes, OK.
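(For concreteness, a minimal standalone sketch of the check the commit
message describes: record each RAM section's GPA/HVA pair while building
the vhost memmap, then verify the ring's cached HVA against it. The names
and the helper itself are invented for illustration; this is not the
patch's code.)

    #include <stdint.h>
    #include <stdbool.h>

    /*
     * hva_ring_offset = GPA(ring) - GPA(MemoryRegionSection)
     * and the ring is unchanged iff:
     *   ring_hva == HVA(MemoryRegionSection) + hva_ring_offset
     */
    static bool ring_hva_unchanged(uint64_t ring_gpa, void *ring_hva,
                                   uint64_t sec_gpa, void *sec_hva)
    {
        uint64_t hva_ring_offset = ring_gpa - sec_gpa;

        return ring_hva == (void *)((uint8_t *)sec_hva + hva_ring_offset);
    }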
> > > >
> > > > > >   b) And the start of the queue corresponds to where we thought
> > > > > >      it used to be (in GPA?)
> > > > > It checks that the queue HVA cached at vq->desc hasn't changed
> > > > > after the flatview change:
> > > > >      if (p != part)
> > > > >           error_out
> > > >
> > > > OK, so it's the HVA that's not changed, based on taking the
> > > > part_addr (which is a GPA) and checking the map?
> > > Yes, I think so.
> > >
> > > > > > so that doesn't have any constraint on the ordering of the regions
> > > > > > or whether the region is the same size or anything.
> > > > > > Also I don't think it would spot if there was a queue that had no
> > > > > > region associated with it at all.
> > > > > >
> > > > > > Now, comments on your code below:
> > > > > >
> > > > > > > Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> > > > > > > ---
> > > > > > > PS:
> > > > > > >    should be applied on top of David's series
> > > > > > >
> > > > > > > ---
> > > > > > >  include/hw/virtio/vhost.h |   2 -
> > > > > > >  hw/virtio/vhost.c         | 195 ++++++++++++++--------------------------------
> > > > > > >  2 files changed, 57 insertions(+), 140 deletions(-)
> > > > > > >
> > > > > > > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > > > > > > index 467dc77..fc4af5c 100644
> > > > > > > --- a/include/hw/virtio/vhost.h
> > > > > > > +++ b/include/hw/virtio/vhost.h
> > > > > > > @@ -68,8 +68,6 @@ struct vhost_dev {
> > > > > > >      uint64_t log_size;
> > > > > > >      Error *migration_blocker;
> > > > > > >      bool memory_changed;
> > > > > > > -    hwaddr mem_changed_start_addr;
> > > > > > > -    hwaddr mem_changed_end_addr;
> > > > > > >      const VhostOps *vhost_ops;
> > > > > > >      void *opaque;
> > > > > > >      struct vhost_log *log;
> > > > > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > > > > > index 5b9a7d7..026bac3 100644
> > > > > > > --- a/hw/virtio/vhost.c
> > > > > > > +++ b/hw/virtio/vhost.c
> > > > > > > @@ -296,35 +296,43 @@ static void vhost_memory_unmap(struct vhost_dev *dev, void *buffer,
> > > > > > >      }
> > > > > > >  }
> > > > > > >
> > > > > > > -static int vhost_verify_ring_part_mapping(struct vhost_dev *dev,
> > > > > > > -                                          void *part,
> > > > > > > -                                          uint64_t part_addr,
> > > > > > > -                                          uint64_t part_size,
> > > > > > > -                                          uint64_t start_addr,
> > > > > > > -                                          uint64_t size)
> > > > > > > +static int vhost_verify_ring_part_mapping(void *ring_hva,
> > > > > > > +                                          uint64_t ring_gpa,
> > > > > > > +                                          uint64_t ring_size,
> > > > > > > +                                          void *reg_hva,
> > > > > > > +                                          uint64_t reg_gpa,
> > > > > > > +                                          uint64_t reg_size)
> > > > > > >  {
> > > > > > > -    hwaddr l;
> > > > > > > -    void *p;
> > > > > > > -    int r = 0;
> > > > > > > +    uint64_t hva_ring_offset;
> > > > > > > +    uint64_t ring_last = range_get_last(ring_gpa, ring_size);
> > > > > > > +    uint64_t reg_last = range_get_last(reg_gpa, reg_size);
> > > > > > >
> > > > > > > -    if (!ranges_overlap(start_addr, size, part_addr, part_size)) {
> > > > > > > +    /* start check from the end so that the rest of checks
> > > > > > > +     * are done when whole ring is in merged sections range
> > > > > > > +     */
> > > > > > > +    if (ring_last < reg_last || ring_gpa > reg_last) {
> > > > > > >          return 0;
> > > > > > >      }
> > > > > >
> > > > > > so does that mean if our region never grows to be as big as the ring
> > > > > > we won't spot the problem?
> > > > > I think there is a mistake here; it should be:
> > > > >     ring_last < reg_gpa || ring_gpa > reg_last
> > > > >
> > > > > so if the ring is outside of the continuous region, we do not
> > > > > care => no change
> > > >
> > > > OK, I don't see how that corresponds to your 'start check from the end'
> > > > comment - I thought it was doing something smarter to deal with this
> > > > being called from the _cb routine potentially before another part of the
> > > > range had been joined onto it.
> > > > In that case, we can just use ranges_overlap like the original
> > > > routine did.
> > > I suppose a range check will do as well; the reason for checking from
> > > the end was an optimization, to make fewer checks than ranges_overlap(),
> > > considering we are comparing against a vhost_memory_region.
> >
> > Have a look at the RFC v2 I've posted; I've reworked it a bit so
> > we call this code afterwards rather than from the _cb.
> I've skimmed through it already and it looks good to me at first glance
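(Returning to the range check above: a standalone sketch of the test with
Igor's fix applied. range_get_last() is reimplemented here as in QEMU's
include/qemu/range.h; the rest is invented for illustration and is not
the patch itself.)

    #include <stdint.h>
    #include <stdbool.h>

    /* as in QEMU's include/qemu/range.h: inclusive last byte of a range */
    static inline uint64_t range_get_last(uint64_t offset, uint64_t len)
    {
        return offset + len - 1;
    }

    /*
     * Corrected early-out: skip the region only when the ring lies
     * entirely below it (ring_last < reg_gpa) or entirely above it
     * (ring_gpa > reg_last), i.e. exactly when they don't overlap.
     * The posted patch compared ring_last against reg_last instead,
     * so the check was skipped whenever the region didn't (yet)
     * extend past the ring's end.
     */
    static bool ring_outside_region(uint64_t ring_gpa, uint64_t ring_size,
                                    uint64_t reg_gpa, uint64_t reg_size)
    {
        uint64_t ring_last = range_get_last(ring_gpa, ring_size);
        uint64_t reg_last = range_get_last(reg_gpa, reg_size);

        return ring_last < reg_gpa || ring_gpa > reg_last;
    }

(This is just !ranges_overlap(reg_gpa, reg_size, ring_gpa, ring_size), so
falling back to ranges_overlap(), as suggested above, loses nothing.)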
OK, great.

> /it needs spelling fixes at least/

Sounds normal for my code! I'll go through and try to clean them up;
I'll also try to make sure it's bisectable.

> I'll do a line-by-line review and testing this week

Thanks.

> > > But it seems like a bit of overkill, since by commit time the
> > > flatview would contain a 1:1 mapping of
> > > MemorySection:vhost_memory_region (modulo sections we are not
> > > interested in). It should be unlikely to get 2 MemoryRegions
> > > allocated one after another and merged in vhost_memory_region()
> > > into one vhost_memory_region. Maybe we would be better off just
> > > copying MemorySections into vhost_memory_region in
> > > vhost_update_mem_cb() and dropping merging altogether.
> >
> > Except I need the merging for the hugepage stuff that's coming in the
> > postcopy world.
> >
> > > I'd even consider 'non deterministic' merging of 2 MemoryRegions
> > > harmful to migration, since vhost might see a different memory map
> > > on the target vs the source.
> >
> > I think that's come up before and it was decided it's not a problem
> > as long as the regions are still mapped; the only consistency required
> > is between the qemu and the vhost (either the kernel or the vhost-user);
> > it shouldn't be a guest visible issue if I understand it correctly.
> Non-consistent merging will cause a change in the number of memory
> regions in the vhost memmap, and even if it's not visible to guests,
> it's still a migration blocker in borderline cases where the number of
> regions is around vhost_backend_memslots_limit(), i.e. the source QEMU
> starts fine by luck (merging happens) but the target fails as it hits
> the limit (merging fails).

Yep, but it's weighed against merging meaning that you take up fewer
slots, so you're less likely to run into the limit.
IMHO the real problem is having a fixed, small, arbitrary number of
slots rather than something dynamic.

> Anyway, it's a problem that exists in the current code as well and is
> not related to this series, so we can ponder it later
> (perhaps we don't see the issue because merging doesn't happen
> in practice).

I suspect it does, but you get very little variation and very rarely
get near the limit with merging; I think the worst case is there's
a VGA mapping case which can be a bit random, but that probably
happens rarely with a lot of regions.

I tested a simple world with real vhost (rather than vhost-user) and it
is surviving RAM hotplug.

Dave

> >
> > Dave
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK