Hi,

On 1/5/2017 6:16 PM, Sergio Gonzalez Monroy wrote:
On 05/01/2017 10:09, Sergio Gonzalez Monroy wrote:
On 04/01/2017 21:34, Walker, Benjamin wrote:
On Wed, 2017-01-04 at 19:39 +0800, Tan, Jianfeng wrote:
Hi Benjamin,


On 12/30/2016 4:41 AM, Walker, Benjamin wrote:
DPDK today begins by allocating all of the required
hugepages, then finds all of the physical addresses for
those hugepages using /proc/self/pagemap, sorts the
hugepages by physical address, then remaps the pages to
contiguous virtual addresses. Later on, if vfio is
enabled, it asks vfio to pin the hugepages and to set their
DMA addresses in the IOMMU to be the physical addresses
discovered earlier. Of course, running as an unprivileged
user means all of the physical addresses in
/proc/self/pagemap are just 0, so this doesn't end up
working. Further, there is no real reason to choose the
physical address as the DMA address in the IOMMU - it would
be better to just count up starting at 0.
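
For readers following along, the pagemap lookup described above can be sketched roughly as follows. This is an illustration in Python, not DPDK's actual C code, and the helper names decode_pagemap_entry and virt_to_phys are made up for this example. It assumes the entry layout from the kernel's pagemap documentation: bit 63 means the page is present, and bits 0-54 hold the page frame number (PFN).

```python
import os
import struct

PFN_MASK = (1 << 55) - 1      # bits 0-54 hold the page frame number
PRESENT_BIT = 1 << 63         # bit 63 is set when the page is in RAM


def decode_pagemap_entry(entry):
    """Split a 64-bit pagemap entry into (present, pfn)."""
    return bool(entry & PRESENT_BIT), entry & PFN_MASK


def virt_to_phys(vaddr, pagemap_path="/proc/self/pagemap"):
    """Return the physical address backing vaddr, or None if unknown.

    None is returned when the page is not present or when the PFN
    reads back as 0, which is what an unprivileged user sees.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(pagemap_path, "rb") as f:
        f.seek((vaddr // page_size) * 8)   # one 8-byte entry per virtual page
        raw = f.read(8)
    if len(raw) != 8:
        return None
    present, pfn = decode_pagemap_entry(struct.unpack("=Q", raw)[0])
    if not present or pfn == 0:
        return None
    return pfn * page_size + (vaddr % page_size)
```

Nothing in this lookup pins the page, so the value returned is only a snapshot of the mapping at the time of the read.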
Why not just use the virtual address as the DMA address in this case, to
avoid maintaining another kind of address?
That's a valid choice, although I'm just storing the DMA address in the
physical address field that already exists. You either have a physical
address or a DMA address and never both.

Also, because the
pages are pinned after the virtual to physical mapping is
looked up, there is a window where a page could be moved.
Hugepage mappings can be moved on more recent kernels (at
least 4.x), and the reliability of hugepages having static
mappings decreases with every kernel release.
Do you mean the kernel might take back a physical page after mapping it to a
virtual page (maybe copying the data to another physical page)? Could you
please share some links or kernel commits?
Yes - the kernel can move a physical page to another physical page
and change the virtual mapping at any time. For a concise example
see 'man 2 migrate_pages', or for a more serious example see the code
that performs memory page compaction in the kernel, which was
recently extended to support hugepages.

Before we go down the path of me proving that the mapping isn't static,
let me turn that line of thinking around. Do you have any documentation
demonstrating that the mapping is static? It's not static for 4k pages, so
why are we assuming that it is static for 2MB pages? I understand that
it happened to be static for some versions of the kernel, but my understanding
is that this was purely by coincidence and never by intention.

It looks to me as if you are talking about transparent hugepages, and not hugetlbfs-managed hugepages (the DPDK use case). AFAIK, hugepages managed by hugetlbfs are not compacted and/or moved; they are not part of the kernel memory management.


Please forgive my loose/poor choice of words here. By "they are not part of the kernel memory management", I mean that they are not part of the kernel memory management processes you were mentioning, i.e. compacting, moving, etc.

Sergio

So again, do you have some references to code/articles where this "dynamic" behavior of hugepages managed by hugetlbfs is mentioned?

Sergio

Following the information Benjamin provided, I did some homework and found this kernel config option, CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION, and also the function hugepage_migration_supported().

It seems there are at least three ways for this behavior to happen (I'm going by Linux 4.8.1):

a) through the migrate_pages() syscall;
b) through the move_pages() syscall;
c) since some kernel version, there is a kthread named kcompactd for each NUMA node that performs memory compaction.
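
To check whether case c) applies on a given machine, one can look for the kcompactd threads in procfs. The following is a minimal sketch in Python; the function name find_kcompactd_threads is made up for this example, and it assumes the threads show up with names such as kcompactd0 in /proc/<pid>/comm.

```python
import os


def find_kcompactd_threads(proc_root="/proc"):
    """Return the names of running kcompactd kernel threads, sorted."""
    names = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return names              # no procfs available
    for entry in entries:
        if not entry.isdigit():
            continue              # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue              # process exited while we were scanning
        if comm.startswith("kcompactd"):
            names.append(comm)
    return sorted(names)
```

An empty result only means no such thread was found; it says nothing about cases a) and b), which are triggered from userspace.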

Thanks,
Jianfeng


Note that this
probably means that using uio on recent kernels is subtly
broken and cannot be supported going forward because there
is no uio mechanism to pin the memory.

The first open question I have is whether DPDK should allow
uio at all on recent (4.x) kernels. My current understanding
is that there is no way to pin memory and hugepages can now
be moved around, so uio would be unsafe. What does the
community think here?

My second question is whether the user should be allowed to
mix uio and vfio usage simultaneously. For vfio, the
physical addresses are really DMA addresses and are best
when arbitrarily chosen to appear sequential relative to
their virtual addresses.
Why "sequential relative to their virtual addresses"? The IOMMU table is for
DMA addr -> physical addr mapping. So do we need DMA addresses
"sequential relative to their physical addresses"? Based on your analysis
above of how hugepages are initialized, are virtual addresses a good
candidate for DMA addresses?
The code already goes through a separate organizational step on all of
the pages that remaps the virtual addresses such that they're sequential
relative to the physical backing pages, so this mostly ends up as the same
thing.
Choosing to use the virtual address is a totally valid choice, but I worry it
may lead to confusion during debugging or in a multi-process scenario.
I'm open to making this choice instead of starting from zero, though.
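
The two options discussed here (counting up from zero, or reusing the virtual address) can be compared with a small sketch. This is a Python illustration only, not DPDK code; the function name assign_dma_addresses and the (virtual_addr, length) segment representation are assumptions made for this example.

```python
def assign_dma_addresses(segments, use_virtual=False):
    """Map each (virtual_addr, length) segment to a DMA address.

    With use_virtual=False the DMA addresses count up from 0 in
    virtual-address order; with use_virtual=True each segment simply
    reuses its virtual address as its DMA address.
    """
    mapping = {}
    next_dma = 0
    for vaddr, length in sorted(segments):
        mapping[vaddr] = vaddr if use_virtual else next_dma
        next_dma += length
    return mapping
```

In both cases, segments that are adjacent in virtual address space are also adjacent in DMA address space; the only difference is the base address a device sees.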

Thanks,
Jianfeng



