> On Thu, Mar 11, 2010 at 04:28:04PM +0000, Paul Brook wrote:
> > > > + /*
> > > > + * Align on HPAGE_SIZE so "(gfn ^ pfn)&
> > > > + * (HPAGE_SIZE-1) == 0" to allow KVM to take advantage
> > > > + * of hugepages with NPT/EPT.
> > > > + */
> > > > + new_block->host = qemu_memalign(1<< TARGET_HPAGE_BITS,
> > > > size);
> >
> > This should not be target dependent. i.e. it should be the host page
> > size.
>
> Yep I noticed. I'm not aware of an official way to get that
> information out of the kernel (hugepagesize in /proc/meminfo is
> dependent on hugetlbfs which in turn is not a dependency for
> transparent hugepage support) but hey I can add it myself to
> /sys/kernel/mm/transparent_hugepage/hugepage_size !

sysconf(_SC_HUGEPAGESIZE); would seem to be the obvious answer.

> > > That is a little wasteful. How about a hint to mmap() requesting
> > > proper alignment (MAP_HPAGE_ALIGN)?
> >
> > I'd kinda hope that we wouldn't need to. i.e. the host kernel is smart
> > enough to automatically align large allocations anyway.
>
> Kernel won't do that, and the main reason is to avoid creating more
> vmas, it's more efficient to waste virtual space and have userland
> allocate more than needed, than ask the kernel alignment and force it
> to create more vmas because of holes generated out of it. virtual
> memory costs nothing.

Huh. That seems unfortunate :-(

> Also khugepaged can later zero out the pte_none regions to create a
> full segment all backed by hugepages, however if we do that khugepaged
> will eat into the free memory space. At the moment I kept khugepaged a
> zero-memory-footprint thing. But I'm currently adding an option called
> collapse_unmapped to allow khugepaged to collapse unmapped pages too
> so if there are only 2/3 pages in the region before the memalign, they
> also can be mapped by a large tlb to allow qemu run faster.

I don't really understand what you're getting at here. Surely a naturally
aligned block is always going to be easier to defragment than a misaligned
block.

If the allocation size is not a multiple of the preferred alignment, then you
probably lose either way, and we shouldn't be requesting increased alignment.

> > This is probably a useful optimization regardless of KVM.
>
> HPAGE alignment is only useful with KVM because it can only payoff
> with EPT/NPT, transparent hugepage already works fine without that
> (but ok it'd be a microoptimization for the first and last few pages
> in the whole vma). This is why I made it conditional to
> kvm_enabled(). I can remove the kvm_enabled() check if you worry about
> the first and last pages in the huge anon vma.

I wouldn't be surprised if putting the start of guest ram on a large TLB entry
was a win. Your guest kernel often lives there!

> OTOH the madvise(MADV_HUGEPAGE) is surely good idea for qemu too. KVM
> normally runs on 64bit hosts, so it's no big deal if we waste 1M of
> virtual memory here and there but I thought on qemu you preferred not
> to have alignment and have the first few and last few pages in a vma
> not backed by large tlb. Ideally we should also align on hpage size if
> sizeof(long) = 8. Not sure what's the recommended way to code that
> though and it'll make it a bit more complex for little good.

Assuming we're allocating in large chunks, I doubt an extra hugepage worth of
VMA is a big issue.

Either way I'd argue that this isn't something qemu should have to care about,
and is actually a bug in posix_memalign.

Paul
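
For illustration only, here is a rough sketch of the allocation policy discussed
above: request hugepage alignment only when the size is a multiple of the huge
page size, then hint the kernel with madvise(MADV_HUGEPAGE). The 2 MiB constant
is an assumption (typical for x86-64) rather than a value queried from the host,
and alloc_hpage_aligned() and is_hpage_aligned() are hypothetical names, not
existing QEMU functions.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages; a real implementation would ask the host. */
#define ASSUMED_HPAGE_SIZE ((size_t)2 << 20)

/* Hypothetical helper: true if ptr sits on an (assumed) huge page boundary. */
static int is_hpage_aligned(const void *ptr)
{
    return ((uintptr_t)ptr & (ASSUMED_HPAGE_SIZE - 1)) == 0;
}

/* Hypothetical helper, not QEMU's qemu_memalign(). Free the result with free(). */
static void *alloc_hpage_aligned(size_t size)
{
    void *ptr = NULL;

    /* If size is not a multiple of the huge page size, skip the extra alignment. */
    int want_hpage = size > 0 && size % ASSUMED_HPAGE_SIZE == 0;
    size_t align = want_hpage ? ASSUMED_HPAGE_SIZE : sizeof(void *);

    if (posix_memalign(&ptr, align, size) != 0) {
        return NULL;
    }
    if (want_hpage) {
        assert(is_hpage_aligned(ptr));
#ifdef MADV_HUGEPAGE
        /* Advisory only: ignore failure, e.g. on a kernel built without THP. */
        (void)madvise(ptr, size, MADV_HUGEPAGE);
#endif
    }
    return ptr;
}
```

A caller would use alloc_hpage_aligned(ram_size) in place of a plain allocation.
Whether the alignment should also be conditional on kvm_enabled() is exactly the
open question in this thread, so the sketch leaves that out.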