Re: [dpdk-dev] IOVA_CONTIG flag needed in kni initialization

Ferruh Yigit Tue, 04 Feb 2020 05:03:01 -0800

On 1/30/2020 8:19 PM, Scott Wasson wrote:
> Hi,
>  
> We’re seeing an issue since upgrading to 19.08, the kni FIFO’s apparently 
> aren’t contiguous.  From user-space’s perspective, the kni’s tx_q straddles 
> the 2MB pageboundary at 0x17a600000.  The mbuf pointers in the ring prior to 
> this address are valid.  The tx_q’s write pointer is indicating there are 
> mbufs at 0x17a600000 and beyond, but the pointers are all NULL.
>  
> Because the rte_kni kernel module is loaded:
>  
> In eal.c:
>                                 /* Workaround for KNI which requires physical 
> address to work */
>                                 if (iova_mode == RTE_IOVA_VA &&
>                                                                 
> rte_eal_check_module("rte_kni") == 1) {
>                                                 if (phys_addrs) {
>                                                                 iova_mode = 
> RTE_IOVA_PA;
>  
> Iova_mode is forced to PA.
>  
> Through brute-force and experimentation, we determined that enabling 
> --legacy-mem caused the problem to go away.  But this caused the locations of 
> the kni’s data structures to move, so they no longer straddled a hugepages 
> boundary.  Our concern is that the furniture may move around again and bring 
> us back to where we were.  Being tied to using --legacy-mem is undesirable in 
> the long-term, anyway.
>  
> Through further brute-force and experimentation, we found that the following 
> code patch helps (even without --legacy-mem):
>  
> index 3d2ffb2..5cc9d69 100644
> --- a/lib/librte_kni/rte_kni.c
> +++ b/lib/librte_kni/rte_kni.c
> @@ -143,31 +143,31 @@ kni_reserve_mz(struct rte_kni *kni)
>         char mz_name[RTE_MEMZONE_NAMESIZE];
>         snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_TX_Q_MZ_NAME_FMT, 
> kni->name);
> -       kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, 
> SOCKET_ID_ANY, 0);
> +       kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, 
> SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
>         KNI_MEM_CHECK(kni->m_tx_q == NULL, tx_q_fail);
>         snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RX_Q_MZ_NAME_FMT, 
> kni->name);
> -       kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, 
> SOCKET_ID_ANY, 0);
> +       kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, 
> SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
>         KNI_MEM_CHECK(kni->m_rx_q == NULL, rx_q_fail);
>         snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_ALLOC_Q_MZ_NAME_FMT, 
> kni->name);
> -       kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, 
> SOCKET_ID_ANY, 0);
> +       kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, 
> SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
>         KNI_MEM_CHECK(kni->m_alloc_q == NULL, alloc_q_fail);
>         snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_FREE_Q_MZ_NAME_FMT, 
> kni->name);
> -       kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, 
> SOCKET_ID_ANY, 0);
> +       kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, 
> SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
>         KNI_MEM_CHECK(kni->m_free_q == NULL, free_q_fail);
>         snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_REQ_Q_MZ_NAME_FMT, 
> kni->name);
> -       kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, 
> SOCKET_ID_ANY, 0);
> +       kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, 
> SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
>         KNI_MEM_CHECK(kni->m_req_q == NULL, req_q_fail);
>         snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RESP_Q_MZ_NAME_FMT, 
> kni->name);
> -       kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, 
> SOCKET_ID_ANY, 0);
> +       kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, 
> SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
>         KNI_MEM_CHECK(kni->m_resp_q == NULL, resp_q_fail);
>         snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_SYNC_ADDR_MZ_NAME_FMT, 
> kni->name);
> -       kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, 
> SOCKET_ID_ANY, 0);
> +       kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, 
> SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
>         KNI_MEM_CHECK(kni->m_sync_addr == NULL, sync_addr_fail);
>         return 0;
>  
> I removed --legacy-mem, the tx_q still straddles the same 2MB page boundary, 
> yet now it’s been running for a few hours and everything seems OK.
>  
> This would seem to follow precedent in rte_mempool.c:
>  
>                                 /* if we're trying to reserve contiguous 
> memory, add appropriate
>                                 * memzone flag.
>                                 */
>                                 if (try_contig)
>                                                 flags |= 
> RTE_MEMZONE_IOVA_CONTIG;
>  
> which I think explains why our mbufs haven’t seen data truncation issues.
>  
> Could you please why RTE_MEMZONE_IOVA_CONTIG is necessary in PA mode?  Isn’t 
> contiguousness a fundamental property of physical addressing?
>  
> Are we still potentially vulnerable with --legacy-mem and without the above 
> code change?  Did we just get lucky because the furniture moved and doesn’t 
> straddle a page boundary at the moment?
>  
> We also tested with stock 19.11 and did not see the crash.  However the 
> FIFO’s were not straddling a page boundary, and so we believe it is also 
> vulnerable.
>


cc'ed Anatoly.

Hi Scott,

Yes, KNI FIFO's needs to be physically contiguous, after memory rework
'rte_memzone_reserve()' doesn't guarantee the physically contiguous memory and
requires 'RTE_MEMZONE_IOVA_CONTIG' as you did above.

With '--legacy-mem' you are getting old behavior which provides physically
contiguous and it should be safe already.


Can you please send a patch for above change?


Thanks,
ferruh

Re: [dpdk-dev] IOVA_CONTIG flag needed in kni initialization

Reply via email to