On 1/30/2020 8:19 PM, Scott Wasson wrote: > Hi, > > We’re seeing an issue since upgrading to 19.08, the kni FIFO’s apparently > aren’t contiguous. From user-space’s perspective, the kni’s tx_q straddles > the 2MB pageboundary at 0x17a600000. The mbuf pointers in the ring prior to > this address are valid. The tx_q’s write pointer is indicating there are > mbufs at 0x17a600000 and beyond, but the pointers are all NULL. > > Because the rte_kni kernel module is loaded: > > In eal.c: > /* Workaround for KNI which requires physical > address to work */ > if (iova_mode == RTE_IOVA_VA && > > rte_eal_check_module("rte_kni") == 1) { > if (phys_addrs) { > iova_mode = > RTE_IOVA_PA; > > Iova_mode is forced to PA. > > Through brute-force and experimentation, we determined that enabling > --legacy-mem caused the problem to go away. But this caused the locations of > the kni’s data structures to move, so they no longer straddled a hugepages > boundary. Our concern is that the furniture may move around again and bring > us back to where we were. Being tied to using --legacy-mem is undesirable in > the long-term, anyway. > > Through further brute-force and experimentation, we found that the following > code patch helps (even without --legacy-mem): > > index 3d2ffb2..5cc9d69 100644 > --- a/lib/librte_kni/rte_kni.c > +++ b/lib/librte_kni/rte_kni.c > @@ -143,31 +143,31 @@ kni_reserve_mz(struct rte_kni *kni) > char mz_name[RTE_MEMZONE_NAMESIZE]; > snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_TX_Q_MZ_NAME_FMT, > kni->name); > - kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, > SOCKET_ID_ANY, 0); > + kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, > SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG); > KNI_MEM_CHECK(kni->m_tx_q == NULL, tx_q_fail); > snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RX_Q_MZ_NAME_FMT, > kni->name); > - kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, > SOCKET_ID_ANY, 0); > + kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, > SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG); > KNI_MEM_CHECK(kni->m_rx_q == NULL, rx_q_fail); > snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_ALLOC_Q_MZ_NAME_FMT, > kni->name); > - kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, > SOCKET_ID_ANY, 0); > + kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, > SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG); > KNI_MEM_CHECK(kni->m_alloc_q == NULL, alloc_q_fail); > snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_FREE_Q_MZ_NAME_FMT, > kni->name); > - kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, > SOCKET_ID_ANY, 0); > + kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, > SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG); > KNI_MEM_CHECK(kni->m_free_q == NULL, free_q_fail); > snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_REQ_Q_MZ_NAME_FMT, > kni->name); > - kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, > SOCKET_ID_ANY, 0); > + kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, > SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG); > KNI_MEM_CHECK(kni->m_req_q == NULL, req_q_fail); > snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RESP_Q_MZ_NAME_FMT, > kni->name); > - kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, > SOCKET_ID_ANY, 0); > + kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, > SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG); > KNI_MEM_CHECK(kni->m_resp_q == NULL, resp_q_fail); > snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_SYNC_ADDR_MZ_NAME_FMT, > kni->name); > - kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, > SOCKET_ID_ANY, 0); > + kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, > SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG); > KNI_MEM_CHECK(kni->m_sync_addr == NULL, sync_addr_fail); > return 0; > > I removed --legacy-mem, the tx_q still straddles the same 2MB page boundary, > yet now it’s been running for a few hours and everything seems OK. > > This would seem to follow precedent in rte_mempool.c: > > /* if we're trying to reserve contiguous > memory, add appropriate > * memzone flag. > */ > if (try_contig) > flags |= > RTE_MEMZONE_IOVA_CONTIG; > > which I think explains why our mbufs haven’t seen data truncation issues. > > Could you please why RTE_MEMZONE_IOVA_CONTIG is necessary in PA mode? Isn’t > contiguousness a fundamental property of physical addressing? > > Are we still potentially vulnerable with --legacy-mem and without the above > code change? Did we just get lucky because the furniture moved and doesn’t > straddle a page boundary at the moment? > > We also tested with stock 19.11 and did not see the crash. However the > FIFO’s were not straddling a page boundary, and so we believe it is also > vulnerable. >
cc'ed Anatoly. Hi Scott, Yes, KNI FIFO's needs to be physically contiguous, after memory rework 'rte_memzone_reserve()' doesn't guarantee the physically contiguous memory and requires 'RTE_MEMZONE_IOVA_CONTIG' as you did above. With '--legacy-mem' you are getting old behavior which provides physically contiguous and it should be safe already. Can you please send a patch for above change? Thanks, ferruh