On 05-Feb-18 10:18 AM, Nélio Laranjeiro wrote:
On Mon, Feb 05, 2018 at 10:03:35AM +0000, Burakov, Anatoly wrote:
On 02-Feb-18 7:28 PM, Yongseok Koh wrote:
On Tue, Dec 26, 2017 at 05:19:25PM +0000, Walker, Benjamin wrote:
On Fri, 2017-12-22 at 09:13 +0000, Burakov, Anatoly wrote:
On 21-Dec-17 9:38 PM, Walker, Benjamin wrote:
SPDK will need some way to register for a notification when pages are
allocated or freed. For storage, the number of requests per second is
(relative to networking) fairly small (hundreds of thousands per second in a
traditional block storage stack, or a few million per second with SPDK). Given
that, we can afford to do a dynamic lookup from va to pa/iova on each request
in order to greatly simplify our APIs (users can just pass pointers around
instead of mbufs). DPDK has a way to look up the pa from a given va, but it
does so by scanning /proc/self/pagemap and is very slow. SPDK instead handles
this by implementing a lookup table of va to pa/iova which we populate by
scanning through the DPDK memory segments at start up, so the lookup in our
table is sufficiently fast for storage use cases. If the list of memory
segments changes, we need to know about it in order to update our map.
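
A minimal sketch of the kind of va to pa/iova table described above, assuming
the segments are non-overlapping and each one is physically contiguous. The
names (struct seg, va_map_build, va_map_translate) are hypothetical and are
not SPDK's or DPDK's actual API; SPDK's real table may be organized quite
differently. The table is built once from a scanned segment list and then
answers each lookup with a binary search instead of reading
/proc/self/pagemap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAP_BAD_IOVA UINT64_MAX

/* Hypothetical descriptor: one physically contiguous memory segment. */
struct seg {
    uintptr_t va;   /* start virtual address */
    uint64_t iova;  /* pa/iova matching 'va' */
    size_t len;     /* length in bytes */
};

struct va_map {
    struct seg *segs; /* sorted by va after va_map_build() */
    size_t count;
};

static int seg_cmp(const void *a, const void *b)
{
    const struct seg *x = a, *y = b;
    return (x->va > y->va) - (x->va < y->va);
}

/* Build the table once at start up; sorts 'segs' in place. */
static void va_map_build(struct va_map *m, struct seg *segs, size_t count)
{
    assert(m != NULL && (segs != NULL || count == 0));
    qsort(segs, count, sizeof(*segs), seg_cmp);
    m->segs = segs;
    m->count = count;
}

/* Per-request lookup: binary search over the sorted segments. */
static uint64_t va_map_translate(const struct va_map *m, uintptr_t va)
{
    size_t lo = 0, hi = m->count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct seg *s = &m->segs[mid];

        if (va < s->va)
            hi = mid;
        else if (va - s->va >= s->len)
            lo = mid + 1;
        else
            return s->iova + (va - s->va);
    }
    return MAP_BAD_IOVA; /* address is not in any known segment */
}
```

With a static segment list this table never changes after start up; once
segments can come and go, it has to be rebuilt or patched on every change,
which is why a notification is needed.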

Hi Benjamin,

So, in other words, we need callbacks on alloc/free. What information
would SPDK need when receiving this notification? Since we can't really
know in advance how many pages we allocate (it may be one, it may be a
thousand) and they no longer are guaranteed to be contiguous, would a
per-page callback be OK? Alternatively, we could have one callback per
operation, but only provide VA and size of allocated memory, while
leaving everything else to the user. I do add a virt2memseg() function
which would allow you to look up segment physical addresses more easily, so
you won't have to manually scan memseg lists to get IOVA for a given VA.

Thanks for your feedback and suggestions!

Yes - callbacks on alloc/free would be perfect. Ideally for us we want one
callback per virtual memory region allocated, plus a function we can call to
find the physical addresses/page break points on that virtual region. The
function that finds the physical addresses does not have to be efficient - we'll
just call that once when the new region is allocated and store the results in a
fast lookup table. One call per virtual region is better for us than one call
per physical page because we're actually keeping multiple different types of
memory address translation tables in SPDK. One translates from va to pa/iova, so
for this one we need to break this up into physical pages and it doesn't matter
if you do one call per virtual region or one per physical page. However another
one translates from va to RDMA lkey, so it is much more efficient if we can
register large virtual regions in a single call.
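
A rough sketch of the per-region callback plus page walk described above,
assuming 2 MB pages and page-aligned regions. Everything here is hypothetical
(on_mem_event, page_iova_fn, struct page_table); it is not the interface of
the patch series. demo_page_iova is only a stand-in for the slow address
lookup and deliberately returns a non-contiguous layout. The slow lookup is
called once per page when a region is allocated, and the results are cached
for later translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ   ((size_t)1 << 21) /* assumption: 2 MB hugepages */
#define MAX_PAGES 1024

enum mem_event { MEM_EVENT_ALLOC, MEM_EVENT_FREE };

/* Hypothetical slow lookup: IOVA of the page starting at page_va. */
typedef uint64_t (*page_iova_fn)(uintptr_t page_va);

struct page_table {
    uintptr_t va[MAX_PAGES];
    uint64_t iova[MAX_PAGES];
    size_t count;
};

/* Stand-in layout that swaps adjacent pages, so it is not contiguous. */
static uint64_t demo_page_iova(uintptr_t page_va)
{
    uint64_t idx = (uint64_t)(page_va / PAGE_SZ);
    return UINT64_C(0x40000000) + (idx ^ 1u) * (uint64_t)PAGE_SZ;
}

/* One call per virtual region; the consumer splits it into pages. */
static int on_mem_event(struct page_table *t, enum mem_event ev,
                        uintptr_t va, size_t len, page_iova_fn lookup)
{
    assert(va % PAGE_SZ == 0 && len % PAGE_SZ == 0);

    if (ev == MEM_EVENT_ALLOC) {
        if (len / PAGE_SZ > MAX_PAGES - t->count)
            return -1; /* table full */
        for (size_t off = 0; off < len; off += PAGE_SZ) {
            t->va[t->count] = va + off;
            t->iova[t->count] = lookup(va + off); /* slow, done once */
            t->count++;
        }
        return 0;
    }

    /* MEM_EVENT_FREE: drop every cached page inside the region. */
    size_t i = 0;
    while (i < t->count) {
        if (t->va[i] >= va && t->va[i] - va < len) {
            t->count--;
            t->va[i] = t->va[t->count];
            t->iova[i] = t->iova[t->count];
        } else {
            i++;
        }
    }
    return 0;
}

/* Lookup against the cached pages (linear scan kept for brevity). */
static uint64_t pt_translate(const struct page_table *t, uintptr_t va)
{
    uintptr_t page = va - (va % PAGE_SZ);

    for (size_t i = 0; i < t->count; i++)
        if (t->va[i] == page)
            return t->iova[i] + (va - page);
    return UINT64_MAX;
}
```

The same region-level event could feed a second table, such as the va to RDMA
lkey one, without being split into pages at all.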

Another yes to callbacks. Like Benjamin mentioned about RDMA, MLX PMD has to
look up the LKEY for each packet DMA. Let me briefly explain this for your
understanding. For security reasons, we don't allow an application to initiate
a DMA transaction with arbitrary physical addresses. Instead, the va-to-pa
mapping (we call it a Memory Region) should be pre-registered, and the LKEY is
the index of the translation entry registered in the device. With the current
static memory model, it is easy to manage because the v-p mapping is unchanged
over time. But if it becomes dynamic, MLX PMD should get notified of the event
so it can register/unregister the Memory Region.

For MLX PMD, it is also enough to get one notification per allocation/free of a
virtual memory region. It doesn't need to be a per-page call, as Benjamin
mentioned, because the PA of a region doesn't need to be contiguous for
registration. The PMD also doesn't need to know the physical addresses of the
region (I'm not saying that information is unnecessary, just FYI :-).
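
To illustrate the bookkeeping described above, here is a small sketch of a
Memory Region cache keyed by virtual address range. All names (struct
mr_cache, mr_on_alloc, mr_on_free, mr_lookup) are made up for illustration and
are not taken from the MLX PMD. demo_register is a placeholder that just hands
out increasing numbers; a real driver would register the region with the
device and get the LKEY back from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MR_MAX       32
#define LKEY_INVALID UINT32_MAX

struct mr_entry {
    uintptr_t va;  /* start of the registered virtual region */
    size_t len;    /* length in bytes */
    uint32_t lkey; /* key returned by registration */
};

struct mr_cache {
    struct mr_entry e[MR_MAX];
    size_t count;
    uint32_t next_lkey; /* used only by the placeholder below */
};

/* Placeholder for device registration (assumption, not a real API). */
static uint32_t demo_register(struct mr_cache *c)
{
    return c->next_lkey++;
}

/* Alloc notification: one registration for the whole virtual region. */
static int mr_on_alloc(struct mr_cache *c, uintptr_t va, size_t len)
{
    struct mr_entry ent;

    if (c->count == MR_MAX || len == 0)
        return -1;
    ent.va = va;
    ent.len = len;
    ent.lkey = demo_register(c);
    c->e[c->count] = ent;
    c->count++;
    return 0;
}

/* Free notification: forget the region so its LKEY is never reused. */
static void mr_on_free(struct mr_cache *c, uintptr_t va, size_t len)
{
    for (size_t i = 0; i < c->count; i++) {
        if (c->e[i].va == va && c->e[i].len == len) {
            c->count--;
            c->e[i] = c->e[c->count];
            return;
        }
    }
}

/* Per-packet lookup: which registered region covers this address? */
static uint32_t mr_lookup(const struct mr_cache *c, uintptr_t addr)
{
    for (size_t i = 0; i < c->count; i++)
        if (addr >= c->e[i].va && addr - c->e[i].va < c->e[i].len)
            return c->e[i].lkey;
    return LKEY_INVALID;
}
```

Note that nothing in this cache looks at physical addresses; only the virtual
range and the key matter for the lookup.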

Thanks,
Yongseok


Thanks for your feedback, good to hear we're on the right track. I already
have a prototype implementation of this working, due for v1 submission :)

Hi Anatoly,

Good to know.
Do you see any performance impact with this series?

Thanks,


In the general case, no impact is noticeable, since e.g. the underlying ring
implementation does not depend on the IO space layout at all. In certain
specific cases, some optimizations that assumed physical space is contiguous
(e.g. calculating an offset spanning several pages) would no longer be
possible unless VFIO is in use: because the IO space layout is unpredictable,
each page has to be checked individually rather than sharing a common base
offset.
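
As an illustration of the per-page check mentioned above, the sketch below
tests whether a virtual range is also contiguous in IO space before a shared
base offset is used. The function and type names are hypothetical, and the two
demo_* lookups are stand-ins for whatever address lookup is available; they
are not part of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup: IOVA of the page starting at page_va. */
typedef uint64_t (*page_iova_fn)(uintptr_t page_va);

/* Stand-in layout where IO space mirrors virtual space. */
static uint64_t demo_linear(uintptr_t page_va)
{
    return UINT64_C(0x40000000) + (uint64_t)page_va;
}

/* Stand-in layout with adjacent 2 MB pages swapped. */
static uint64_t demo_scattered(uintptr_t page_va)
{
    uint64_t idx = (uint64_t)page_va >> 21;
    return UINT64_C(0x40000000) + ((idx ^ 1u) << 21);
}

/*
 * Walk every page touched by [va, va + len) and verify that each one
 * follows the previous page in IO space. Only if this holds can a
 * single base offset be applied to the whole range.
 */
static bool range_is_iova_contiguous(uintptr_t va, size_t len,
                                     size_t page_sz, page_iova_fn lookup)
{
    assert(len > 0 && page_sz > 0);

    uintptr_t end = va + len - 1;
    uintptr_t first = va - (va % page_sz);
    uintptr_t last = end - (end % page_sz);
    uint64_t expected = lookup(first);

    for (uintptr_t p = first; p != last; ) {
        p += page_sz;
        expected += page_sz;
        if (lookup(p) != expected)
            return false; /* gap or reordering between two pages */
    }
    return true;
}
```

A range that fits inside one page always passes, which matches the point that
only offsets spanning several pages are affected.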

--
Thanks,
Anatoly
