> -----Original Message----- > From: Akhil Goyal [mailto:akhil.go...@nxp.com] > Sent: Monday, December 24, 2018 10:54 AM > To: Ananyev, Konstantin <konstantin.anan...@intel.com>; dev@dpdk.org > Cc: Nicolau, Radu <radu.nico...@intel.com>; Horton, Remy > <remy.hor...@intel.com> > Subject: Re: [dpdk-dev] [PATCH v4 1/9] examples/ipsec-secgw: avoid to request > unused TX offloads > > > > On 12/24/2018 3:49 PM, Ananyev, Konstantin wrote: > > > >>>> On 12/14/2018 10:10 PM, Konstantin Ananyev wrote: > >>>>> ipsec-secgw always enables TX offloads > >>>>> (DEV_TX_OFFLOAD_MULTI_SEGS, DEV_TX_OFFLOAD_SECURITY), > >>>>> even when they are not requested by the config. > >>>>> That causes many PMD to choose full-featured TX function, > >>>>> which in many cases is much slower then one without offloads. > >>>>> That patch adds checks to enabled extra HW offloads, only when > >>>>> they were requested. > >>>>> Plus it enables DEV_TX_OFFLOAD_IPV4_CKSUM, > >>>>> only when other HW TX ofloads are going to be enabled. > >>>>> Otherwise SW version of ip cksum calculation is used. > >>>>> That allows to use vector TX function, when inline-ipsec is not > >>>>> requested. > >>>>> > >>>>> Signed-off-by: Remy Horton <remy.hor...@intel.com> > >>>>> Signed-off-by: Konstantin Ananyev <konstantin.anan...@intel.com> > >>>>> Acked-by: Radu Nicolau <radu.nico...@intel.com> > >>>>> --- > >>>>> examples/ipsec-secgw/ipsec-secgw.c | 44 +++++++++++++++-------- > >>>>> examples/ipsec-secgw/ipsec.h | 6 ++++ > >>>>> examples/ipsec-secgw/sa.c | 56 > >>>>> ++++++++++++++++++++++++++++++ > >>>>> 3 files changed, 91 insertions(+), 15 deletions(-) > >>>>> > >>>>> diff --git a/examples/ipsec-secgw/ipsec-secgw.c > >>>>> b/examples/ipsec-secgw/ipsec-secgw.c > >>>>> index 1bc0b5b50..cfc2b05e5 100644 > >>>>> --- a/examples/ipsec-secgw/ipsec-secgw.c > >>>>> +++ b/examples/ipsec-secgw/ipsec-secgw.c > >>>>> @@ -208,8 +208,6 @@ static struct rte_eth_conf port_conf = { > >>>>> }, > >>>>> .txmode = { > >>>>> .mq_mode = ETH_MQ_TX_NONE, > >>>>> - .offloads = (DEV_TX_OFFLOAD_IPV4_CKSUM | > >>>>> - DEV_TX_OFFLOAD_MULTI_SEGS), > >>>> I believe this is disabling checksum offload for all cases and then > >>>> enabling only for inline crypto and inline proto. > >>> Yes. > >>> > >>>> This is breaking lookaside proto and lookaside none cases. Please > >>>> correct me if I am wrong. > >>> Why breaking? > >> reduction in performance is kind of breaking the code. > > I didn’t observe any performance drop with that patch. > > In fact there was a tiny improvement (see below). > > Did you see any regression with this patch on your HW? > NXP hardware are low -end to mid end devices and we are always > bottleneck by core cycles. > So we would like to have as much offloads to HW as possible.
Ok, then I suppose we need to introduce new cmd-line options, Something like: --txoffloads=<tx_offload_mask> --rx_offloads=<rx_offload_mask> to keep everyone happy. Are you ok with that? Konstantin > > > >>> For cases when HW cksum offload is disabled, IPv4 cksum calculation > >>> will be done in SW, see below: > >>> prepare_tx_pkt(...) > >>> { > >>> ... > >>> + > >>> + /* calculate IPv4 cksum in SW */ > >>> + if ((pkt->ol_flags & PKT_TX_IP_CKSUM) == 0) > >>> + ip->ip_sum = rte_ipv4_cksum((struct ipv4_hdr > >>> *)ip); > >>> > >>> > >>> We tested lookaside-none case quite extensively - all works well, > >>> in fact on Intel NICs it became even a bit faster because of that change > >>> (though not much). > >> yes, it may work well on one hardware, but may not perform good in other > >> hardware where cores are limited. > > Could you elaborate a bit more what do you mean by 'cores are limited' here? > we have single core devices as well on which we run ipsec-secgw. > > Do you mean that for some low end cpus calculating IPv4 cksum in SW is too > > expensive? > yes, limited by core cycles and not by HW > > Note that prepare_tx_pkts() and friends read/write L2/L3 packet headers > > anyway - > > so IPv4 header will be in L1 cache already. > Agreed, but still it will consume some cycles which are more than that > of HW. > > > >>> Disabling HW offloads when they are not really required has 2 benefits: > >>> 1) allows app to be run on NICs without HW offloads support. > >>> 2) allows dev_configure() for TX path to select simple/vector TX > >>> functions > >>> which for many NICs are significantly faster. > >>> > >>> Konstantin > >>> > >>>> So a NACK for this if my understanding is correct. > >>>>