On Thu, Oct 28, 2021 at 11:48 PM Radha Mohan <mohun...@gmail.com> wrote:
>
> On Tue, Oct 26, 2021 at 1:49 AM Jerin Jacob <jerinjac...@gmail.com> wrote:
> >
> > On Tue, Oct 26, 2021 at 9:43 AM Radha Mohan Chintakuntla
> > <rad...@marvell.com> wrote:
> > >
> > > Add functions for the dmadev vchan setup and DMA operations.
> > >
> > > Signed-off-by: Radha Mohan Chintakuntla <rad...@marvell.com>
> > > +static int
> > > +cnxk_dmadev_copy(void *dev_private, uint16_t vchan, rte_iova_t src,
> > > +		 rte_iova_t dst, uint32_t length, uint64_t flags)
> > > +{
> > > +	uint64_t cmd[DPI_MAX_CMD_SIZE] = {0};
> > > +	union dpi_instr_hdr_s *header = (union dpi_instr_hdr_s *)&cmd[0];
> > > +	rte_iova_t fptr, lptr;
> > > +	struct cnxk_dpi_vf_s *dpivf = dev_private;
> > > +	struct cnxk_dpi_compl_s *comp_ptr;
> > > +	int num_words = 0;
> > > +	int rc;
> > > +
> > > +	RTE_SET_USED(vchan);
> > > +
> > > +	header->s.xtype = dpivf->conf.direction;
> > > +	header->s.pt = DPI_HDR_PT_ZBW_CA;
> > > +	comp_ptr = dpivf->conf.c_desc.compl_ptr[dpivf->conf.c_desc.tail];
> > > +	comp_ptr->cdata = DPI_REQ_CDATA;
> > > +	header->s.ptr = (uint64_t)comp_ptr;
> > > +	STRM_INC(dpivf->conf.c_desc);
> > > +
> > > +	/* pvfe should be set for inbound and outbound only */
> > > +	if (header->s.xtype <= 1)
> > > +		header->s.pvfe = 1;
> > > +	num_words += 4;
> > > +
> > > +	header->s.nfst = 1;
> > > +	header->s.nlst = 1;
> >
> > Including filling zeros in cmd and the rest of the filling can be
> > moved to slow path..
> >
> > Please change the logic to populate the static items based on
> > configure/channel setup
> > in slowpath and update only per transfer-specific items to have better
> > performance.
> >
> These are instruction header values that we are filling. If you look
> at it there is really one 64bit field that can be filled beforehand
> a.k.a slowpath in vchan_setup().
> Rest of the header can only be filled here like nlst, nfst (these are
> number of pointers to be DMA'ed) and completion pointer. So just for
> that I do not see a value in moving around the code.

Two things:

1) By doing it like below,

> > > +	header->s.nfst = 1;
> > > +	header->s.nlst = 1;

it will generate multiple stores. One option is to use a local u64
variable, form the required descriptor in it, and write it in one shot.
This is a standard optimization strategy used in the fast path.
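A minimal sketch of point 1 in C, assuming a made-up header layout: the bit positions and the names HDR_*_SHIFT, hdr_build() and hdr_write_once() are hypothetical and do not match the real union dpi_instr_hdr_s. The whole header word is assembled in a local uint64_t (normally kept in a register) and then written to the command buffer with a single 64-bit store, instead of several separate bitfield assignments through the header pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions in header word 0; illustrative only,
 * NOT the real dpi_instr_hdr_s layout. */
#define HDR_NFST_SHIFT  0  /* number of first pointers (4 bits) */
#define HDR_NLST_SHIFT  4  /* number of last pointers  (4 bits) */
#define HDR_XTYPE_SHIFT 8  /* transfer direction       (2 bits) */
#define HDR_PVFE_SHIFT  10 /* pvfe flag                (1 bit)  */
#define HDR_PT_SHIFT    11 /* completion pointer type  (2 bits) */

/* Compile-time check that the hypothetical fields do not overlap. */
static_assert(HDR_NLST_SHIFT + 4 <= HDR_XTYPE_SHIFT &&
	      HDR_XTYPE_SHIFT + 2 <= HDR_PVFE_SHIFT &&
	      HDR_PVFE_SHIFT + 1 <= HDR_PT_SHIFT, "header fields overlap");

/* Form the header word in a local variable. */
static inline uint64_t
hdr_build(uint64_t xtype, uint64_t pt, uint64_t nfst, uint64_t nlst)
{
	uint64_t w = 0;

	w |= (nfst & 0xF) << HDR_NFST_SHIFT;
	w |= (nlst & 0xF) << HDR_NLST_SHIFT;
	w |= (xtype & 0x3) << HDR_XTYPE_SHIFT;
	/* Mirrors the patch: pvfe only for inbound/outbound (xtype <= 1). */
	if (xtype <= 1)
		w |= UINT64_C(1) << HDR_PVFE_SHIFT;
	w |= (pt & 0x3) << HDR_PT_SHIFT;
	return w;
}

/* Write the fully formed header into the command buffer in one shot. */
static inline void
hdr_write_once(uint64_t *cmd, uint64_t xtype, uint64_t pt,
	       uint64_t nfst, uint64_t nlst)
{
	cmd[0] = hdr_build(xtype, pt, nfst, nlst); /* single 64-bit store */
}
```

With this layout, hdr_write_once(cmd, 0, 1, 1, 1) stores 0xC11 in cmd[0]. Whether the original bitfield version really emits several stores depends on the compiler; building the word explicitly removes that dependency.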
2) uint64_t cmd[DPI_MAX_CMD_SIZE] = {0};

This will result in a memset of 64B on every call. That is the reason
creating a template based on the vchan makes sense.

It looks like moving to a template-based scheme needs a lot of rework
in the driver. I will leave it to you to decide performance vs. other
aspects, as you are maintaining the driver. No strong opinion.

> > <snip>
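A minimal sketch of the template idea from point 2, again with a hypothetical layout: struct vchan_tmpl, tmpl_init(), build_copy_cmd(), MAX_CMD_WORDS, the HDR_*_SHIFT positions and the length/address pointer encoding are illustrative assumptions, not the driver's actual code. The static header words are built once at vchan setup (slow path); the fast path copies them and patches only the per-transfer fields, so the full command array is never zeroed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HDR_WORDS     4  /* header occupies cmd[0..3] (num_words += 4 in the patch) */
#define MAX_CMD_WORDS 16 /* hypothetical stand-in for DPI_MAX_CMD_SIZE */

/* Hypothetical bit positions in header word 0; not the real layout. */
#define HDR_NFST_SHIFT  0
#define HDR_NLST_SHIFT  4
#define HDR_XTYPE_SHIFT 8
#define HDR_PVFE_SHIFT  10
#define HDR_PT_SHIFT    11

/* Per-vchan template of the header words that do not change per transfer. */
struct vchan_tmpl {
	uint64_t hdr[HDR_WORDS];
};

/* Slow path: called once from vchan setup. */
static void
tmpl_init(struct vchan_tmpl *t, uint64_t xtype, uint64_t pt)
{
	memset(t, 0, sizeof(*t));
	t->hdr[0] = ((xtype & 0x3) << HDR_XTYPE_SHIFT) |
		    ((pt & 0x3) << HDR_PT_SHIFT);
	/* pvfe only for inbound/outbound, as in the patch. */
	if (xtype <= 1)
		t->hdr[0] |= UINT64_C(1) << HDR_PVFE_SHIFT;
}

/* Fast path: fill one single-pointer copy command, return words used.
 * Assumption: only the returned number of words is submitted, so the
 * tail of cmd[] is left untouched instead of being zeroed. */
static int
build_copy_cmd(uint64_t *cmd, const struct vchan_tmpl *t, uint64_t compl_ptr,
	       uint64_t src, uint64_t dst, uint32_t len)
{
	/* Template word 0 plus per-transfer nfst/nlst, written in one store. */
	cmd[0] = t->hdr[0] | (UINT64_C(1) << HDR_NFST_SHIFT) |
		 (UINT64_C(1) << HDR_NLST_SHIFT);
	cmd[1] = compl_ptr;  /* hypothetical: completion pointer in word 1 */
	cmd[2] = t->hdr[2];
	cmd[3] = t->hdr[3];
	/* Hypothetical pointer encoding: (length, address) per pointer. */
	cmd[HDR_WORDS + 0] = len;
	cmd[HDR_WORDS + 1] = src;
	cmd[HDR_WORDS + 2] = len;
	cmd[HDR_WORDS + 3] = dst;
	return HDR_WORDS + 4;
}
```

If the queue write path instead read the whole array regardless of the word count, the tail would still need clearing and most of the saving would disappear, which is part of the rework trade-off mentioned above.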