On Thu, Mar 11, 2010 at 05:55:10PM +0000, Paul Brook wrote:
> sysconf(_SC_HUGEPAGESIZE); would seem to be the obvious answer.
There's not just one hugepage size, and that sysconf doesn't exist yet;
adding it would also require changes to glibc. If it existed I could use
it, but I think this is better:

$ cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
2097152

Ok? If this file doesn't exist we won't align, so we also align on qemu,
not only on kvm, for the concern below about the first and last bytes.

> > Also khugepaged can later zero out the pte_none regions to create a
> > full segment all backed by hugepages, however if we do that khugepaged
> > will eat into the free memory space. At the moment I kept khugepaged a
> > zero-memory-footprint thing. But I'm currently adding an option called
> > collapse_unmapped to allow khugepaged to collapse unmapped pages too,
> > so if there are only 2/3 pages in the region before the memalign, they
> > also can be mapped by a large tlb to allow qemu to run faster.
>
> I don't really understand what you're getting at here. Surely a naturally
> aligned block is always going to be easier to defragment than a misaligned
> block.

Basically what I was saying is this: suppose only subpages 0 and 1 of a
hugepage-sized virtual range are touched. posix_memalign then extends the
vma, and nobody is ever going to touch pages 2-511 because those are the
virtual addresses wasted. Before, khugepaged couldn't allocate a hugepage
for only pages 0 and 1 because the vma stopped there, but after the vma is
extended it can. So previously I wasn't mapping this range with a
hugepage, but now I'm mapping it with a hugepage too. A sysfs control
will select the max number of unmapped subpages allowed for the collapse
to happen: with just 1 subpage mapped in the hugepage virtual range, it
doesn't make sense to use a large tlb and waste 511 pages of ram.

> If the allocation size is not a multiple of the preferred alignment, then
> you probably lose either way, and we shouldn't be requesting increased
> alignment.

That's probably a good idea. Also note, if we were to allocate the
0-640k and 1m-end regions separately, for NPT to work we'd need to start
the second block misaligned, at a 1m address. So maybe I should move the
alignment out of qemu_ram_alloc and have it in the caller?

> I wouldn't be surprised if putting the start of guest ram on a large TLB
> entry was a win. Your guest kernel often lives there!

Yep, that's easy to handle with the hpage_pmd_size ;).

> Assuming we're allocating in large chunks, I doubt an extra hugepage worth
> of VMA is a big issue.
>
> Either way I'd argue that this isn't something qemu should have to care
> about, and is actually a bug in posix_memalign.

Hmm, the last is a weird claim considering posix_memalign gets an explicit
alignment parameter and it surely can't choose what alignment to use. We
can argue about the kernel side having to align automatically, but again,
if it did that it'd generate unnecessary vma holes, which we don't want.

I think it's quite simple: just use my new sysfs control, and if it
exists, always use that alignment instead of the default. We've only to
decide whether to align inside or outside of qemu_ram_alloc.
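To make the idea concrete, here is a minimal sketch (not the actual
patch; the helper names qemu_thp_alignment and qemu_ram_alloc_aligned are
hypothetical) of reading the proposed hpage_pmd_size file and using it as
the posix_memalign alignment, falling back to plain page alignment when
the file isn't there:

  /* Sketch only: pick the allocation alignment from the proposed
   * transparent hugepage sysfs file, or fall back to the page size
   * when THP isn't available. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  static size_t qemu_thp_alignment(void)
  {
      FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size",
                      "r");
      unsigned long size;

      if (!f) {
          /* no THP: don't request any extra alignment */
          return sysconf(_SC_PAGESIZE);
      }
      if (fscanf(f, "%lu", &size) != 1) {
          size = sysconf(_SC_PAGESIZE);
      }
      fclose(f);
      return size;          /* 2097152 on x86-64 with 2M pmds */
  }

  static void *qemu_ram_alloc_aligned(size_t len)
  {
      void *ptr;

      if (posix_memalign(&ptr, qemu_thp_alignment(), len)) {
          return NULL;
      }
      return ptr;
  }

Whether something like this sits inside qemu_ram_alloc or in its caller
is exactly the open question above: for the 0-640k / 1m-end split, only
the caller knows the second block has to start at a 1m guest address, so
the caller may be the better place to decide the alignment.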