On 27-Sep-18 11:49 AM, Kiran Kumar wrote:
With the current KNI implementation, the kernel module works only in
IOVA=PA mode. This patch adds support for the kernel module to work
in IOVA=VA mode.

The idea is to maintain a mapping in the KNI module between user pages
and kernel pages and, in the fast path, perform a lookup in this table to
get the kernel virtual address for the corresponding user virtual address.

In IOVA=VA mode, the memory allocated to the pool is physically
and virtually contiguous. We will take advantage of this and create a
mapping in the kernel. In the kernel we need mappings for the queues
(tx_q, rx_q,... slow path) and the mbuf memory (fast path).

At KNI init time, in the slow path, we will create a mapping for the
queues and mbufs using get_user_pages, similar to af_xdp. Using the pool
memory base address, we will create a page map table for the mbufs,
which we will use in the fast path for kernel page translation.

At KNI init time, we will pass the base address of the pool and size of
the pool to kernel. In kernel, using get_user_pages API, we will get
the pages with size PAGE_SIZE and store the mapping and start address
of user space in a table.

In the fast path, for any user address, shift right by PAGE_SHIFT
(user_addr >> PAGE_SHIFT) and subtract the user start page number from
this value; this gives the index of the kernel page within the page map
table. Adding the in-page offset to this kernel page address gives the
kernel address for this user virtual address.

For example, suppose the user pool base address is X and its size is S,
both passed to the kernel. In the kernel we will create a mapping for
this using get_user_pages.
Our page map table will look like [Y, Y+PAGE_SIZE, Y+(PAGE_SIZE*2) ....]
and the user start page will be U (we will get it from X >> PAGE_SHIFT).

For any user address Z we will get the index of the page map table using
((Z >> PAGE_SHIFT) - U). Adding the offset (Z & (PAGE_SIZE - 1)) to the
kernel page address stored at that index gives the kernel virtual address.
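
The lookup described above can be sketched in plain user-space C as
follows. This is only an illustration of the arithmetic, not the code
from the patch: struct page_map, page_map_translate and the SKETCH_*
constants are hypothetical names, a 4 KiB page size is assumed, and the
table is a simple array of per-page kernel addresses that the slow path
would have filled in.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed 4 KiB pages; the kernel would use PAGE_SHIFT / PAGE_SIZE. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uintptr_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical page map table, filled once at init time (slow path). */
struct page_map {
	uintptr_t user_start_page;  /* U = X >> PAGE_SHIFT */
	size_t nb_pages;            /* number of entries in kern_page_addr */
	uintptr_t *kern_page_addr;  /* entry i: kernel address of page i */
};

/*
 * Fast-path translation of user address Z:
 *   index  = (Z >> PAGE_SHIFT) - U
 *   result = kern_page_addr[index] + (Z & (PAGE_SIZE - 1))
 * Returns 0 if Z lies outside the mapped pool.
 */
static inline uintptr_t
page_map_translate(const struct page_map *pm, uintptr_t user_addr)
{
	uintptr_t page = user_addr >> SKETCH_PAGE_SHIFT;

	if (page < pm->user_start_page)
		return 0;

	size_t idx = (size_t)(page - pm->user_start_page);

	if (idx >= pm->nb_pages)
		return 0;

	return pm->kern_page_addr[idx] + (user_addr & (SKETCH_PAGE_SIZE - 1));
}
```

Because the table stores one kernel address per page, the lookup does not
rely on the kernel-side pages being contiguous; the bounds checks are an
addition of this sketch and are not mentioned in the description above.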

Signed-off-by: Kiran Kumar <kkokkilaga...@caviumnetworks.com>
---

<snip>

+
+       /* IOVA mode. 1 = VA, 0 = PA */
+       uint8_t iova_mode;

Wouldn't it be easier to understand if the same values (e.g. iova_mode == RTE_IOVA_VA) were used here as they are in DPDK?

--
Thanks,
Anatoly
