> -----Original Message-----
> From: Elena Agostini <eagost...@nvidia.com>
> Sent: Tuesday, September 7, 2021 01:23
> To: Wang, Haiyue <haiyue.w...@intel.com>; Jerin Jacob <jerinjac...@gmail.com>
> Cc: NBU-Contact-Thomas Monjalon <tho...@monjalon.net>; Jerin Jacob <jer...@marvell.com>; dpdk-dev <dev@dpdk.org>; Stephen Hemminger <step...@networkplumber.org>; David Marchand <david.march...@redhat.com>; Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Yigit, Ferruh <ferruh.yi...@intel.com>; techbo...@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
>
> > -----Original Message-----
> > From: Wang, Haiyue <haiyue.w...@intel.com>
> > Sent: Monday, September 6, 2021 7:15 PM
> > To: Elena Agostini <eagost...@nvidia.com>; Jerin Jacob <jerinjac...@gmail.com>
> > Cc: NBU-Contact-Thomas Monjalon <tho...@monjalon.net>; Jerin Jacob <jer...@marvell.com>; dpdk-dev <dev@dpdk.org>; Stephen Hemminger <step...@networkplumber.org>; David Marchand <david.march...@redhat.com>; Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Yigit, Ferruh <ferruh.yi...@intel.com>; techbo...@dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
> >
> > > -----Original Message-----
> > > From: Elena Agostini <eagost...@nvidia.com>
> > > Sent: Tuesday, September 7, 2021 00:11
> > > To: Jerin Jacob <jerinjac...@gmail.com>
> > > Cc: Wang, Haiyue <haiyue.w...@intel.com>; NBU-Contact-Thomas Monjalon <tho...@monjalon.net>; Jerin Jacob <jer...@marvell.com>; dpdk-dev <dev@dpdk.org>; Stephen Hemminger <step...@networkplumber.org>; David Marchand <david.march...@redhat.com>; Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Yigit, Ferruh <ferruh.yi...@intel.com>; techbo...@dpdk.org
> > > Subject: RE: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
> > >
> > > > > I'd like to introduce (with a dedicated option) the memory API in
> > > > > testpmd to provide an example of how to TX/RX packets using device
> > > > > memory.
> > > >
> > > > Not sure, without embedding a sideband communication mechanism, how it
> > > > can notify the GPU and get back to the CPU. If you could share the example
> > > > API sequence, that would help us understand the level of coupling with testpmd.
> > > >
> > >
> > > There is no need for a communication mechanism here.
> > > Assuming there is no workload to process network packets (to not
> > > complicate things), the steps are:
> > > 1) Create a DPDK mempool with device external memory using the hcdev
> > > (or gpudev) library
> > > 2) Use that mempool to tx/rx/fwd packets
> > >
> > > As an example, you can look at my l2fwd-nv application here:
> > > https://github.com/NVIDIA/l2fwd-nv
> > >
> >
> > To enhance the 'rte_extmem_register' / 'rte_pktmbuf_pool_create_extbuf' ?
> >
>
> The purpose of these two functions is different.
> Here DPDK allows the user to use any kind of memory to rx/tx packets.
> It's not about allocating memory.
>
> Maybe I'm missing the point here: what's the main objection to having a GPU
> library?

Exactly. ;-) Maybe some real device code is worthwhile for people to get the whole picture; rough sketches of both steps follow the quoted code below.

> >     if (l2fwd_mem_type == MEM_HOST_PINNED) {
> >             ext_mem.buf_ptr = rte_malloc("extmem", ext_mem.buf_len, 0);
> >             CUDA_CHECK(cudaHostRegister(ext_mem.buf_ptr, ext_mem.buf_len,
> >                                         cudaHostRegisterMapped));
> >             void *pDevice;
> >             CUDA_CHECK(cudaHostGetDevicePointer(&pDevice, ext_mem.buf_ptr, 0));
> >             if (pDevice != ext_mem.buf_ptr)
> >                     rte_exit(EXIT_FAILURE, "GPU pointer does not match CPU pointer\n");
> >     } else {
> >             ext_mem.buf_iova = RTE_BAD_IOVA;
> >             CUDA_CHECK(cudaMalloc(&ext_mem.buf_ptr, ext_mem.buf_len));
> >             if (ext_mem.buf_ptr == NULL)
> >                     rte_exit(EXIT_FAILURE, "Could not allocate GPU memory\n");
> >
> >             unsigned int flag = 1;
> >             CUresult status = cuPointerSetAttribute(&flag,
> >                             CU_POINTER_ATTRIBUTE_SYNC_MEMOPS, (CUdeviceptr)ext_mem.buf_ptr);
> >             if (CUDA_SUCCESS != status) {
> >                     rte_exit(EXIT_FAILURE, "Could not set SYNC MEMOP attribute "
> >                                     "for GPU memory at %llx\n", (CUdeviceptr)ext_mem.buf_ptr);
> >             }
> >             ret = rte_extmem_register(ext_mem.buf_ptr, ext_mem.buf_len,
> >                                       NULL, ext_mem.buf_iova, GPU_PAGE_SIZE);
> >             if (ret)
> >                     rte_exit(EXIT_FAILURE, "Could not register GPU memory\n");
> >     }
> >
> >     ret = rte_dev_dma_map(rte_eth_devices[l2fwd_port_id].device,
> >                           ext_mem.buf_ptr, ext_mem.buf_iova, ext_mem.buf_len);
> >     if (ret)
> >             rte_exit(EXIT_FAILURE, "Could not DMA map EXT memory\n");
> >
> >     mpool_payload = rte_pktmbuf_pool_create_extbuf("payload_mpool", l2fwd_nb_mbufs,
> >                                                    0, 0, ext_mem.elt_size,
> >                                                    rte_socket_id(), &ext_mem, 1);
> >     if (mpool_payload == NULL)
> >             rte_exit(EXIT_FAILURE, "Could not create EXT memory mempool\n");
> >
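To spell out the two steps Elena lists above, here is a rough sketch of step 1: allocate device memory and wrap it in an external-memory mempool. The rte_gpu_mem_alloc() call, its (dev_id, size) signature and the <rte_gpudev.h> header are only my guess at what the hcdev/gpudev RFC will end up exposing, and gpu_dev_id, port_id, nb_mbufs and the 64 KiB GPU_PAGE_SIZE are placeholders; the rte_extmem_register() / rte_dev_dma_map() / rte_pktmbuf_pool_create_extbuf() calls follow the quoted l2fwd-nv pattern.

    #include <stdlib.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_memory.h>
    #include <rte_dev.h>
    #include <rte_lcore.h>
    #include <rte_debug.h>
    #include <rte_gpudev.h>             /* assumed header name for the hcdev/gpudev RFC */

    #define GPU_PAGE_SIZE (1UL << 16)   /* placeholder GPU page granularity */

    /* Step 1 (sketch): back a pktmbuf pool with device memory. */
    static struct rte_mempool *
    setup_gpu_mempool(int16_t gpu_dev_id, uint16_t port_id, unsigned int nb_mbufs)
    {
            struct rte_pktmbuf_extmem ext_mem = {
                    .buf_iova = RTE_BAD_IOVA,
                    .elt_size = RTE_MBUF_DEFAULT_BUF_SIZE,
            };
            ext_mem.buf_len = RTE_ALIGN((size_t)nb_mbufs * ext_mem.elt_size, GPU_PAGE_SIZE);

            /* Assumed allocation call from the RFC; it replaces the raw
             * cudaMalloc() used in the quoted snippet. */
            ext_mem.buf_ptr = rte_gpu_mem_alloc(gpu_dev_id, ext_mem.buf_len);
            if (ext_mem.buf_ptr == NULL)
                    rte_exit(EXIT_FAILURE, "Could not allocate device memory\n");

            /* Make the device memory known to DPDK and DMA-map it for the NIC,
             * exactly as the quoted l2fwd-nv code does. */
            if (rte_extmem_register(ext_mem.buf_ptr, ext_mem.buf_len,
                                    NULL, ext_mem.buf_iova, GPU_PAGE_SIZE))
                    rte_exit(EXIT_FAILURE, "Could not register device memory\n");
            if (rte_dev_dma_map(rte_eth_devices[port_id].device,
                                ext_mem.buf_ptr, ext_mem.buf_iova, ext_mem.buf_len))
                    rte_exit(EXIT_FAILURE, "Could not DMA map device memory\n");

            /* mbuf headers stay in host memory; only the payload buffers are
             * carved out of the registered device memory. */
            struct rte_mempool *mp = rte_pktmbuf_pool_create_extbuf("gpu_mpool",
                            nb_mbufs, 0, 0, ext_mem.elt_size,
                            rte_socket_id(), &ext_mem, 1);
            if (mp == NULL)
                    rte_exit(EXIT_FAILURE, "Could not create external-memory mempool\n");
            return mp;
    }

Note that, as Elena says, the library itself is not about creating the mempool: the extmem/extbuf machinery above already exists in DPDK, and the new layer only standardizes how the device memory is obtained and managed.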
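Step 2 then needs nothing GPU-specific: the pool is handed to the rx queue and the normal burst API moves the packets, whose payloads now land directly in device memory. A minimal sketch, assuming a single queue pair, default port configuration and the pool returned by setup_gpu_mempool() above:

    /* Step 2 (sketch): plain rx/tx forwarding out of the GPU-backed pool. */
    static void
    fwd_loop(uint16_t port_id, struct rte_mempool *mp)
    {
            struct rte_eth_conf port_conf = { 0 };
            struct rte_mbuf *pkts[32];
            uint16_t nb_rx, nb_tx;

            if (rte_eth_dev_configure(port_id, 1, 1, &port_conf) < 0 ||
                rte_eth_rx_queue_setup(port_id, 0, 1024,
                                       rte_eth_dev_socket_id(port_id), NULL, mp) < 0 ||
                rte_eth_tx_queue_setup(port_id, 0, 1024,
                                       rte_eth_dev_socket_id(port_id), NULL) < 0 ||
                rte_eth_dev_start(port_id) < 0)
                    rte_exit(EXIT_FAILURE, "Port setup failed\n");

            for (;;) {
                    nb_rx = rte_eth_rx_burst(port_id, 0, pkts, RTE_DIM(pkts));
                    if (nb_rx == 0)
                            continue;
                    /* A real workload would launch a GPU kernel on the payloads
                     * here; this sketch just echoes the packets back out. */
                    nb_tx = rte_eth_tx_burst(port_id, 0, pkts, nb_rx);
                    while (nb_tx < nb_rx)
                            rte_pktmbuf_free(pkts[nb_tx++]);
            }
    }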