On 21-Dec-17 9:38 PM, Walker, Benjamin wrote:
On Tue, 2017-12-19 at 11:14 +0000, Anatoly Burakov wrote:
Quick outline of all changes done as part of this patchset:
* Malloc heap adjusted to handle holes in address space
* Single memseg list replaced by multiple expandable memseg lists
* VA space for hugepages is preallocated in advance
* Added dynamic alloc/free for pages, happening as needed on malloc/free
SPDK will need some way to register for a notification when pages are allocated
or freed. For storage, the number of requests per second is (relative to
networking) fairly small (hundreds of thousands per second in a traditional
block storage stack, or a few million per second with SPDK). Given that, we can
afford to do a dynamic lookup from va to pa/iova on each request in order to
greatly simplify our APIs (users can just pass pointers around instead of
mbufs). DPDK has a way to look up the pa from a given va, but it does so by
scanning /proc/self/pagemap and is very slow. SPDK instead handles this by
implementing a lookup table of va to pa/iova which we populate by scanning
through the DPDK memory segments at start up, so the lookup in our table is
sufficiently fast for storage use cases. If the list of memory segments changes,
we need to know about it in order to update our map.
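For illustration, a minimal sketch of the kind of va to pa/iova table
described above. This is not SPDK's actual data structure: every name
here (map_register, map_unregister, map_translate, IOVA_NONE), the
2 MiB page size and the fixed-capacity hash table are assumptions made
for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: none of these names are SPDK or DPDK API. */
#define PAGE_SHIFT 21                          /* assumes 2 MiB hugepages */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define MAP_SLOTS  1024                        /* power of two, fixed capacity */
#define IOVA_NONE  UINT64_MAX                  /* "no translation" marker */

struct map_slot {
    uint64_t vfn;    /* virtual frame number: va >> PAGE_SHIFT */
    uint64_t iova;   /* IOVA of the start of that page */
    int      state;  /* 0 = empty, 1 = in use, 2 = deleted (tombstone) */
};

static struct map_slot g_map[MAP_SLOTS];

static size_t slot_hash(uint64_t vfn)
{
    return (size_t)((vfn * 0x9E3779B97F4A7C15ull) >> 32) & (MAP_SLOTS - 1);
}

/* Linear probing; tombstones are skipped, an empty slot ends the search. */
static struct map_slot *map_find(uint64_t vfn)
{
    size_t i = slot_hash(vfn);
    for (size_t n = 0; n < MAP_SLOTS; n++, i = (i + 1) & (MAP_SLOTS - 1)) {
        if (g_map[i].state == 0)
            return NULL;
        if (g_map[i].state == 1 && g_map[i].vfn == vfn)
            return &g_map[i];
    }
    return NULL;
}

static int map_set_page(uint64_t vfn, uint64_t iova)
{
    struct map_slot *s = map_find(vfn);
    if (s == NULL) {
        /* Not present yet: take the first empty or deleted slot. */
        size_t i = slot_hash(vfn);
        for (size_t n = 0; n < MAP_SLOTS; n++, i = (i + 1) & (MAP_SLOTS - 1)) {
            if (g_map[i].state != 1) {
                s = &g_map[i];
                break;
            }
        }
        if (s == NULL)
            return -1;                         /* table is full */
    }
    s->vfn = vfn;
    s->iova = iova;
    s->state = 1;
    return 0;
}

/* Record an IOVA-contiguous, page-aligned region, one entry per page. */
int map_register(uint64_t va, uint64_t len, uint64_t iova)
{
    assert((va | len | iova) % PAGE_SIZE == 0);
    for (uint64_t off = 0; off < len; off += PAGE_SIZE)
        if (map_set_page((va + off) >> PAGE_SHIFT, iova + off) != 0)
            return -1;
    return 0;
}

/* Drop the entries for a region, e.g. when its pages are freed. */
void map_unregister(uint64_t va, uint64_t len)
{
    for (uint64_t off = 0; off < len; off += PAGE_SIZE) {
        struct map_slot *s = map_find((va + off) >> PAGE_SHIFT);
        if (s != NULL)
            s->state = 2;
    }
}

/* Per-request lookup: returns IOVA_NONE if the page is not in the map. */
uint64_t map_translate(uint64_t va)
{
    struct map_slot *s = map_find(va >> PAGE_SHIFT);
    return s != NULL ? s->iova + (va & (PAGE_SIZE - 1)) : IOVA_NONE;
}
```

The point of the sketch is that map_register and map_unregister have to
be called whenever the set of memory segments changes, which is why a
notification is needed.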
Hi Benjamin,
So, in other words, we need callbacks on alloc/free. What information
would SPDK need when receiving this notification? Since we can't really
know in advance how many pages we allocate (it may be one, it may be a
thousand) and they are no longer guaranteed to be contiguous, would a
per-page callback be OK? Alternatively, we could have one callback per
operation, but only provide VA and size of allocated memory, while
leaving everything else to the user. I do add a virt2memseg() function
which would let you look up segment physical addresses more easily, so
you won't have to manually scan memseg lists to get IOVA for a given VA.
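To make the second option concrete, here is a rough sketch of what a
one-callback-per-operation interface could look like, passing only the
event type, VA and length. All names (mem_event_register,
mem_event_notify, mem_event_cb, count_bytes) are hypothetical and not
part of this patchset; the subscriber would do its own IOVA lookups for
the range it is given.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface sketch, not an existing DPDK API. */
enum mem_event { MEM_EVENT_ALLOC, MEM_EVENT_FREE };

typedef void (*mem_event_cb)(enum mem_event ev, const void *va,
                             size_t len, void *arg);

#define MAX_CBS 8

static struct {
    mem_event_cb fn;
    void *arg;
} g_cbs[MAX_CBS];
static int g_ncbs;

/* Subscribe to alloc/free events; returns -1 when the table is full. */
int mem_event_register(mem_event_cb fn, void *arg)
{
    assert(fn != NULL);
    if (g_ncbs == MAX_CBS)
        return -1;
    g_cbs[g_ncbs].fn = fn;
    g_cbs[g_ncbs].arg = arg;
    g_ncbs++;
    return 0;
}

/* Called once per alloc/free operation with the affected VA range. */
void mem_event_notify(enum mem_event ev, const void *va, size_t len)
{
    for (int i = 0; i < g_ncbs; i++)
        g_cbs[i].fn(ev, va, len, g_cbs[i].arg);
}

/* Example subscriber: keeps a running total of bytes currently mapped. */
void count_bytes(enum mem_event ev, const void *va, size_t len, void *arg)
{
    size_t *total = arg;

    (void)va;   /* a real subscriber would update its va -> iova map here */
    if (ev == MEM_EVENT_ALLOC)
        *total += len;
    else
        *total -= len;
}
```

A per-page variant would have the same shape, only with
mem_event_notify invoked once for each page instead of once for the
whole range.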
Thanks for your feedback and suggestions!
Having the map also enables a number of other nice things - for instance we
allow users to register memory that wasn't allocated through DPDK and use it for
DMA operations. We keep that va to pa/iova mapping in the same map. I appreciate
you adding APIs to dynamically register this type of memory with the IOMMU on
our behalf. That allows us to eliminate a nasty hack where we were looking up
the vfio file descriptor through sysfs in order to send the registration ioctl.
* Added contiguous memory allocation APIs for rte_malloc and rte_memzone
* Integrated Pawel Wodkowski's patch [1] for registering/unregistering memory
with VFIO
--
Thanks,
Anatoly