On Wed, Oct 03, 2018 at 03:56:37PM +1000, David Gibson wrote:
> On Tue, Oct 02, 2018 at 09:31:22PM +1000, Paul Mackerras wrote:
> > From: Suraj Jitindar Singh <sjitindarsi...@gmail.com>
> > 
> > When a host (L0) page which is mapped into an (L1) guest is in turn
> > mapped through to a nested (L2) guest, we keep a reverse mapping (rmap)
> > so that these mappings can be retrieved later.
> > 
> > Whenever we create an entry in a shadow_pgtable for a nested guest we
> > create a corresponding rmap entry and add it to the list for the
> > L1 guest memslot at the index of the L1 guest page it maps. This means
> > that in the L1 guest memslot we end up with lists of rmaps.
> > 
> > When we are notified of a host page being invalidated which has been
> > mapped through to an (L1) guest, we can then walk the rmap list for that
> > guest page, and find and invalidate all of the corresponding
> > shadow_pgtable entries.
> > 
> > In order to reduce memory consumption, we compress the information for
> > each rmap entry down to 52 bits -- 12 bits for the LPID and 40 bits
> > for the guest real page frame number -- which will fit in a single
> > unsigned long.  To avoid a scenario where a guest can trigger
> > unbounded memory allocations, we scan the list when adding an entry to
> > see if there is already an entry with the contents we need.  Duplicates
> > can occur because we never remove entries from the middle of a list.
> > 
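As a rough sketch of the packing described above (minimal and
illustrative only; the names here are made up for the example, not
taken from the patch):

#define RMAP_LPID_SHIFT		52
#define RMAP_LPID_MASK		(0xfffUL << RMAP_LPID_SHIFT)	/* 12-bit lpid  */
#define RMAP_GPFN_SHIFT		12
#define RMAP_GPFN_MASK		0x000ffffffffff000UL		/* 40-bit frame */
#define RMAP_IS_SINGLE_ENTRY	0x1UL		/* low bit tags a final entry */

static inline unsigned long rmap_encode(unsigned int lpid, unsigned long gfn)
{
	/* lpid in bits 63:52, 4k guest frame number in bits 51:12 */
	return ((unsigned long)lpid << RMAP_LPID_SHIFT) |
	       ((gfn << RMAP_GPFN_SHIFT) & RMAP_GPFN_MASK);
}

Since entries are only ever appended, the insert path would first scan
the list for an equal encoded value before allocating a new node.
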
> > A struct nested guest rmap is a list pointer and an rmap entry:
> > ----------------
> > | next pointer |
> > ----------------
> > | rmap entry   |
> > ----------------
> > 
> > Thus the rmap pointer for each guest frame number in the memslot can be
> > either NULL, a single entry, or a pointer to a list of nested rmap entries.
> > 
> > gfn  memslot rmap array
> >     -------------------------
> >  0  | NULL                  |       (no rmap entry)
> >     -------------------------
> >  1  | single rmap entry     |       (rmap entry with low bit set)
> >     -------------------------
> >  2  | list head pointer     |       (list of rmap entries)
> >     -------------------------
> > 
> > The final entry always has the lowest bit set and is stored in the next
> > pointer of the last list entry, or as a single rmap entry.
> > A list of rmap entries thus looks like:
> > 
> > -----------------       -----------------       -------------------------
> > | list head ptr | ----> | next pointer  | ----> | single rmap entry     |
> > -----------------       -----------------       -------------------------
> >                         | rmap entry    |       | rmap entry            |
> >                         -----------------       -------------------------
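
As a sketch of how a per-gfn slot of that shape might be walked (again
with illustrative names, reusing the example defines above; the low bit
distinguishes an inline entry from a pointer to a list node):

struct rmap_node {
	unsigned long next;	/* next node, or final entry with low bit set */
	unsigned long rmap;	/* encoded lpid + guest frame number */
};

/* Apply fn to every rmap entry reachable from one memslot rmap slot. */
static void walk_rmap_slot(unsigned long slot, void (*fn)(unsigned long))
{
	if (!slot)				/* NULL: no rmap entry */
		return;
	while (!(slot & RMAP_IS_SINGLE_ENTRY)) {
		struct rmap_node *node = (struct rmap_node *)slot;

		fn(node->rmap);			/* entry held in this node */
		slot = node->next;		/* may be the tagged final entry */
	}
	fn(slot & ~RMAP_IS_SINGLE_ENTRY);	/* single or final entry */
}
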
> > 
> > Signed-off-by: Suraj Jitindar Singh <sjitindarsi...@gmail.com>
> > Signed-off-by: Paul Mackerras <pau...@ozlabs.org>
> > ---
> >  arch/powerpc/include/asm/kvm_book3s.h    |   3 +
> >  arch/powerpc/include/asm/kvm_book3s_64.h |  70 ++++++++++++++++-
> >  arch/powerpc/kvm/book3s_64_mmu_radix.c   |  44 +++++++----
> >  arch/powerpc/kvm/book3s_hv.c             |   1 +
> >  arch/powerpc/kvm/book3s_hv_nested.c      | 130 ++++++++++++++++++++++++++++++-
> >  5 files changed, 233 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
> > index d983778..1d2286d 100644
> > --- a/arch/powerpc/include/asm/kvm_book3s.h
> > +++ b/arch/powerpc/include/asm/kvm_book3s.h
> > @@ -196,6 +196,9 @@ extern int kvmppc_mmu_radix_translate_table(struct kvm_vcpu *vcpu, gva_t eaddr,
> >                     int table_index, u64 *pte_ret_p);
> >  extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
> >                     struct kvmppc_pte *gpte, bool data, bool iswrite);
> > +extern void kvmppc_unmap_pte(struct kvm *kvm, pte_t *pte, unsigned long gpa,
> > +                   unsigned int shift, struct kvm_memory_slot *memslot,
> > +                   unsigned int lpid);
> >  extern bool kvmppc_hv_handle_set_rc(struct kvm *kvm, pgd_t *pgtable,
> >                                 bool writing, unsigned long gpa,
> >                                 unsigned int lpid);
> > diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
> > index 5496152..38614f0 100644
> > --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> > +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> > @@ -53,6 +53,66 @@ struct kvm_nested_guest {
> >     struct kvm_nested_guest *next;
> >  };
> >  
> > +/*
> > + * We define a nested rmap entry as a single 64-bit quantity
> > + * 0xFFF0000000000000      12-bit lpid field
> > + * 0x000FFFFFFFFFF000      40-bit guest physical address field
> 
> I thought we could potentially support guests with >1TiB of RAM..?

We can; that's really a (4k) page frame number, not a physical
address.  We can support 52-bit guest physical addresses.
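
(Concretely: with 4k pages, a 40-bit frame number covers 2^40 * 2^12 =
2^52 bytes of guest physical address space, i.e. 4 PiB; only if the
field were a byte address would the limit be 1 TiB.)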

Paul.
