Re: [Qemu-devel] [PATCH QEMU] transparent hugepage support

Paul Brook Thu, 11 Mar 2010 10:10:41 -0800

> On Thu, Mar 11, 2010 at 04:28:04PM +0000, Paul Brook wrote:
> > > > +               /*
> > > > +                * Align on HPAGE_SIZE so "(gfn ^ pfn)&
> > > > +                * (HPAGE_SIZE-1) == 0" to allow KVM to take advantage
> > > > +                * of hugepages with NPT/EPT.
> > > > +                */
> > > > +               new_block->host = qemu_memalign(1 << TARGET_HPAGE_BITS, size);
> >
> > This should not be target dependent. i.e. it should be the host page
> > size.
> 
> Yep I noticed. I'm not aware of an official way to get that
> information out of the kernel (hugepagesize in /proc/meminfo is
> dependent on hugetlbfs which in turn is not a dependency for
> transparent hugepage support) but hey I can add it myself to
> /sys/kernel/mm/transparent_hugepage/hugepage_size !


sysconf(_SC_HUGEPAGESIZE) would seem to be the obvious answer.
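
Roughly what I have in mind, as a sketch only. The helper name
host_hpage_size() and the 2 MiB fallback are my assumptions, and the
sysconf() call is guarded because I would not count on every C library
defining _SC_HUGEPAGESIZE:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Assumed fallback: 2 MiB, the common x86-64 huge page size. */
#define FALLBACK_HPAGE_SIZE ((size_t)2 << 20)

/* Hypothetical helper: best guess at the host huge page size, in bytes. */
static size_t host_hpage_size(void)
{
    size_t sz = FALLBACK_HPAGE_SIZE;

#ifdef _SC_HUGEPAGESIZE
    /* Only compiled where the C library actually provides this name. */
    long ret = sysconf(_SC_HUGEPAGESIZE);
    if (ret > 0) {
        sz = (size_t)ret;
    }
#endif

    /* Later alignment arithmetic relies on a power of two. */
    assert((sz & (sz - 1)) == 0);
    return sz;
}
```

Where the name is missing this silently degrades to the fallback, so the
caller never sees a target-dependent constant.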
 
> > > That is a little wasteful.  How about a hint to mmap() requesting
> > > proper alignment (MAP_HPAGE_ALIGN)?
> >
> > I'd kinda hope that we wouldn't need to. i.e. the host kernel is smart
> > enough to automatically align large allocations anyway.
> 
> Kernel won't do that, and the main reason is to avoid creating more
> vmas, it's more efficient to waste virtual space and have userland
> allocate more than needed, than ask the kernel alignment and force it
> to create more vmas because of holes generated out of it. virtual
> memory costs nothing.

Huh. That seems unfortunate :-(

> Also khugepaged can later zero out the pte_none regions to create a
> full segment all backed by hugepages, however if we do that khugepaged
> will eat into the free memory space. At the moment I kept khugepaged a
> zero-memory-footprint thing. But I'm currently adding an option called
> collapse_unmapped to allow khugepaged to collapse unmapped pages too
> so if there are only 2/3 pages in the region before the memalign, they
> also can be mapped by a large tlb to allow qemu run faster.

I don't really understand what you're getting at here. Surely a naturally 
aligned block is always going to be easier to defragment than a misaligned 
block.

If the allocation size is not a multiple of the preferred alignment, then you 
probably lose either way, and we shouldn't be requesting increased alignment.
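
To make that concrete, a minimal sketch (not the patch itself). The names
choose_alignment() and alloc_ram_block() are made up for illustration, and
plain posix_memalign() stands in for qemu_memalign():

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: request huge page alignment only when the block
 * is a whole number of huge pages, otherwise fall back to the base
 * page size.
 */
static size_t choose_alignment(size_t size, size_t hpage_size,
                               size_t page_size)
{
    /* hpage_size must be a non-zero power of two. */
    assert(hpage_size != 0 && (hpage_size & (hpage_size - 1)) == 0);

    if (size != 0 && size % hpage_size == 0) {
        return hpage_size;
    }
    return page_size;
}

/* Hypothetical allocator built on the rule above. */
static void *alloc_ram_block(size_t size, size_t hpage_size)
{
    void *ptr = NULL;
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t align = choose_alignment(size, hpage_size, page_size);

    /* posix_memalign() returns 0 on success, an error number otherwise. */
    if (posix_memalign(&ptr, align, size) != 0) {
        return NULL;
    }
    return ptr;
}
```

With a 2 MiB huge page, a 4 MiB block gets 2 MiB alignment, while a block of
4 MiB plus one 4 KiB page is only page aligned.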

> > This is probably a useful optimization regardless of KVM.
> 
> HPAGE alignment is only useful with KVM because it can only payoff
> with EPT/NPT, transparent hugepage already works fine without that
> (but ok it'd be a microoptimization for the first and last few pages
> in the whole vma). This is why I made it conditional to
> kvm_enabled(). I can remove the kvm_enabled() check if you worry about
> the first and last pages in the huge anon vma.

I wouldn't be surprised if putting the start of guest ram on a large TLB entry 
was a win. Your guest kernel often lives there!

> OTOH the madvise(MADV_HUGEPAGE) is surely good idea for qemu too. KVM
> normally runs on 64bit hosts, so it's no big deal if we waste 1M of
> virtual memory here and there but I thought on qemu you preferred not
> to have alignment and have the first few and last few pages in a vma
> not backed by large tlb. Ideally we should also align on hpage size if
> sizeof(long) = 8. Not sure what's the recommended way to code that
> though and it'll make it a bit more complex for little good.

Assuming we're allocating in large chunks, I doubt an extra hugepage worth of 
VMA is a big issue.
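
For reference, the "waste one hugepage of virtual space" approach could look
something like the sketch below. The helper name mmap_hpage_aligned() is
hypothetical, the madvise() hint is guarded because MADV_HUGEPAGE only exists
with suitably recent kernel headers, and the caller is assumed to keep the raw
mapping around for munmap():

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map size + hpage_size bytes of anonymous memory
 * and return the first huge-page-aligned address inside the mapping.
 * *raw and *raw_len describe the whole mapping, for a later munmap().
 */
static void *mmap_hpage_aligned(size_t size, size_t hpage_size,
                                void **raw, size_t *raw_len)
{
    /* hpage_size must be a non-zero power of two. */
    assert(hpage_size != 0 && (hpage_size & (hpage_size - 1)) == 0);

    size_t len = size + hpage_size;
    void *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return NULL;
    }

    /* Round the start up to the next huge page boundary. */
    uintptr_t aligned = ((uintptr_t)base + hpage_size - 1)
                        & ~((uintptr_t)hpage_size - 1);

#ifdef MADV_HUGEPAGE
    /* Purely advisory: a failure here is not an allocation failure. */
    (void)madvise((void *)aligned, size, MADV_HUGEPAGE);
#endif

    *raw = base;
    *raw_len = len;
    return (void *)aligned;
}
```

The unused head and tail are never touched, so they cost address space but no
physical memory.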

Either way I'd argue that this isn't something qemu should have to care about, 
and is actually a bug in posix_memalign.

Paul
