On Thu, Sep 22, 2016 at 09:36:15AM +0000, Ananyev, Konstantin wrote:

Hi Konstantin,
> 
> Hi Jerin,
> 
> > > Hi Jerin,
> > 
> > Hi Konstantin,
> > 
> > > > > > [...]
> > > > > > > +
> > > > > > > +#ifdef RTE_ETHDEV_TX_PREP
> > > > > > 
> > > > > > Sorry for being a bit late on that discussion, but what's the
> > > > > > point of having that config macro (RTE_ETHDEV_TX_PREP) at all?
> > > > > > As I can see right now, if the driver doesn't set up tx_pkt_prep,
> > > > > > then nb_pkts would be returned anyway...
> > > > > > 
> > > > > > BTW, I have another question - should it be that way?
> > > > > > Shouldn't we return 0 (and set rte_errno=ENOTSUP) here if
> > > > > > dev->tx_pkt_prep == NULL?
> > > > > 
> > > > > It's an answer to Jerin's request discussed here:
> > > > > http://dpdk.org/ml/archives/dev/2016-September/046437.html
> > > > > 
> > > > > When a driver doesn't support tx_prep, the default behavior is "we don't
> > > > > know the requirements, so we have nothing to do here". It will simplify
> > > > > application logic and improve performance for these drivers, I think.
> > > > > Catching this error with every burst may be problematic.
> > > > > 
> > > > > As for the RTE_ETHDEV_TX_PREP macro, suggested by Jerin in the same
> > > > > thread, I still don't think it's the best solution to the problem
> > > > > described by him. I have added it here for further discussion.
> > > > > 
> > > > > Jerin, have you something to add?
> > > > 
> > > > Nothing very specific to add here. I think I have tried to share the
> > > > rationale in http://dpdk.org/ml/archives/dev/2016-September/046437.html
> > > 
> > > Ok, not sure I fully understand your intention here.
> > > I think I understand why you propose rte_eth_tx_prep() to do:
> > > 
> > > if (!dev->tx_pkt_prep)
> > > 	return nb_pkts;
> > > 
> > > That allows drivers to NOOP the tx_prep functionality without paying
> > > the price for a callback invocation.
> > 
> > In a true sense, returning nb_pkts makes it a functional NOP as well
> > (i.e. the PMD does not have any limitation on the Tx side, so all
> > packets are _good_ (no preparation is required)).
> > 
> > > What I don't understand is why, with that in place, we also need a NOOP
> > > for the whole rte_eth_tx_prep():
> > > 
> > > +static inline uint16_t
> > > +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
> > > +		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)
> > > +{
> > > +	return nb_pkts;
> > > +}
> > > +
> > > +#endif
> > > 
> > > What are you trying to save here: just reading 'dev->tx_pkt_prep'?
> > > If so, then it seems not that performance critical to me.
> > > Something else?
> > 
> > The proposed scheme can make it a true NOOP from the compiler
> > perspective too, if a target decides to do that. I have checked the
> > instruction generation in arm assembly; a non-true compiler NOOP has
> > the following instruction overhead at minimum.
> > 
> > # 1 load
> > # 1 mov
> > if (!dev->tx_pkt_prep)
> > 	return nb_pkts;
> 
> Yep.
> 
> > # the compiler can't predict whether this function needs to be executed
> > # or not, so there is pressure on register allocation, and most likely
> > # it calls for some stack push and pop based on the load in the outer
> > # function (as it is an inline function)
> 
> Well, I suppose the compiler wouldn't try to fill function argument
> registers before the branch above.

Not the case with the arm gcc compiler (maybe due to the outer function's load).
The recent external pool manager function pointer conversion reduced
throughput by around 700 kpps/core in local cache mode (even though the
function pointers are not executed).

> > return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id], tx_pkts,
> > 		nb_pkts);
> > 
> > # 1 branch
> > if (unlikely(nb_prep < nb_rx)) {
> > 	# bunch of code not executed, but pressure on i-cache
> > 	int i;
> > 	for (i = nb_prep; i < nb_rx; i++)
> > 		rte_pktmbuf_free(pkts_burst[i]);
> > }
> > 
> > For a server target (IA or high-end armv8) with an external PCIe based
> > system it makes sense to have the RTE_ETHDEV_TX_PREP option enabled
> > (which is the case in the proposed patch), but for low-end arm platforms
> > with
> > - limited cores
> > - less i-cache
> > - IPC == 1
> > - running around 1GHz
> > - most importantly, an _integrated_ nic controller with no external
> >   PCIe support
> > it does not make much sense to waste the cycles/time for it.
> > A cycle saved is a cycle earned.
> 
> Ok, so it is all to save one memory de-reference and a comparison plus
> branch. Do you have any estimation of how badly it would hit low-end cpu
> performance?

Around 400 kpps/core. On four-core systems, around 2 Mpps (4 cores with
2x10G ports).

> The reason I am asking: obviously I would prefer to avoid introducing a
> new build config option (that's the common dpdk coding practice these
> days) unless it is really important.

Practice is something we need to revisit based on the new use case/usage.
I think the scheme of non-external-PCIe-based NW cards is new to DPDK.

> > Since DPDK compilation is based on the _target_, I really don't see any
> > issue with this approach, nor does it hurt anything on a server target.
> > So, IMO, it should be up to the target to decide what works better for
> > the target.
> > 
> > Jerin
> > 
> > > From my point of view a NOOP on the driver level is more than enough.
> > > Again, I would prefer not to introduce a new config option, if possible.
> > > 
> > > Konstantin
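
To make the trade-off under discussion concrete, here is a minimal sketch
of the two wrapper variants being debated, built from the snippets quoted
above. It assumes the names used in the patch under review
(rte_eth_tx_prep, dev->tx_pkt_prep, RTE_ETHDEV_TX_PREP) and standard
ethdev fields; it is an illustration of the proposal, not the final code.

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#ifdef RTE_ETHDEV_TX_PREP

static inline uint16_t
rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
		struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
{
	struct rte_eth_dev *dev = &rte_eth_devices[port_id];

	/* Runtime NOOP: the PMD declares no Tx requirements, so every
	 * packet is considered already prepared. */
	if (!dev->tx_pkt_prep)
		return nb_pkts;

	return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
			tx_pkts, nb_pkts);
}

#else

/* Compile-time NOOP: no pointer load, no branch, no callback at all. */
static inline uint16_t
rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
		struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)
{
	return nb_pkts;
}

#endif

An application would call it just before the Tx burst and drop whatever
could not be prepared, following the pattern quoted in the thread (names
such as pkts_burst, nb_rx and nb_tx are placeholders for illustration):

	uint16_t nb_prep, nb_tx;

	nb_prep = rte_eth_tx_prep(port_id, queue_id, pkts_burst, nb_rx);
	if (unlikely(nb_prep < nb_rx)) {
		uint16_t i;

		/* Free the packets that failed preparation. */
		for (i = nb_prep; i < nb_rx; i++)
			rte_pktmbuf_free(pkts_burst[i]);
	}
	nb_tx = rte_eth_tx_burst(port_id, queue_id, pkts_burst, nb_prep);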