Re: [EXT] Re: [PATCH v1 1/1] net/qede: fix redundant condition in debug code

2022-01-10 Thread Igor Russkikh


> On Tue, Nov 30, 2021 at 10:29 PM Anatoly Burakov
>  wrote:
>>
>> Expression "a && 1" is equivalent to just "a", so fix the accidental
>> inclusion of a literal in code.
>>
>> Cc: sta...@dpdk.org
>>
>> Fixes: ec55c118792b ("net/qede: add infrastructure for debug data 
>> collection")
>> Cc: rm...@marvell.com
> 
> Hi @Rasesh Mody @Devendra Singh Rawat  @Igor Russkikh
> 
> Please review this patch to merge.
> 
> 
>>
>> Signed-off-by: Anatoly Burakov 

Reviewed-by: Igor Russkikh 

Thanks!


RE: [EXT] [PATCH v1 1/1] net/qede: fix redundant condition in debug code

2022-01-10 Thread Rasesh Mody
> From: Anatoly Burakov 
> Sent: Tuesday, November 30, 2021 10:29 PM
> 
> Expression "a && 1" is equivalent to just "a", so fix the accidental 
> inclusion of
> a literal in code.
> 
> Cc: sta...@dpdk.org
> 
> Fixes: ec55c118792b ("net/qede: add infrastructure for debug data
> collection")
> Cc: rm...@marvell.com
> 
> Signed-off-by: Anatoly Burakov 

Acked-by: Rasesh Mody 

Thanks! 

> ---
> 
> Notes:
> This isn't a bug, this is just a syntactic anomaly, likely a remnant of 
> some
> kind of debugging code.
> 
> This issue was found with Control Flag [1], which I ran on the DPDK codebase
> just out of curiosity. This was the only issue worth addressing that the tool
> produced output for.
> 
> [1] https://github.com/IntelLabs/control-flag
> 
>  drivers/net/qede/qede_debug.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/qede/qede_debug.c
> b/drivers/net/qede/qede_debug.c index 2297d245c4..ba807ea680 100644
> --- a/drivers/net/qede/qede_debug.c
> +++ b/drivers/net/qede/qede_debug.c
> @@ -3522,7 +3522,7 @@ static enum dbg_status qed_grc_dump(struct
> ecore_hwfn *p_hwfn,
> 
>   /* Dump MCP HW Dump */
>   if (qed_grc_is_included(p_hwfn,
> DBG_GRC_PARAM_DUMP_MCP_HW_DUMP) &&
> - !qed_grc_get_param(p_hwfn, DBG_GRC_PARAM_NO_MCP) &&
> 1)
> + !qed_grc_get_param(p_hwfn, DBG_GRC_PARAM_NO_MCP))
>   offset += qed_grc_dump_mcp_hw_dump(p_hwfn,
>  p_ptt,
>  dump_buf + offset, dump);
> --
> 2.25.1



Re: [PATCH 1/2] net/cnxk: update meter bpf ID in rq

2022-01-10 Thread Jerin Jacob
On Tue, Nov 30, 2021 at 12:47 PM Sunil Kumar Kori  wrote:
>
> >-Original Message-
> >From: Rakesh Kudurumalla 
> >Sent: Tuesday, November 30, 2021 12:12 PM
> >To: Nithin Kumar Dabilpuram ; Kiran Kumar
> >Kokkilagadda ; Sunil Kumar Kori
> >; Satha Koteswara Rao Kottidi
> >
> >Cc: dev@dpdk.org; Rakesh Kudurumalla 
> >Subject: [PATCH 1/2] net/cnxk: update meter bpf ID in rq
> >
> >Patch updates the configured meter bpf ID in the rq context during meter creation
> >
> >Signed-off-by: Rakesh Kudurumalla 

Series applied to dpdk-next-net-mrvl/for-next-net. Thanks.


> >---
> > drivers/net/cnxk/cn10k_rte_flow.c  |  9 -
> >drivers/net/cnxk/cnxk_ethdev_mtr.c | 25 ++---
> > 2 files changed, 22 insertions(+), 12 deletions(-)
> >
> >diff --git a/drivers/net/cnxk/cn10k_rte_flow.c
> >b/drivers/net/cnxk/cn10k_rte_flow.c
> >index b830abe63e..402bb1c72f 100644
> >--- a/drivers/net/cnxk/cn10k_rte_flow.c
> >+++ b/drivers/net/cnxk/cn10k_rte_flow.c
> >@@ -36,20 +36,20 @@ cn10k_mtr_configure(struct rte_eth_dev *eth_dev,
> >   for (i = 0; actions[i].type != RTE_FLOW_ACTION_TYPE_END; i++) {
> >   if (actions[i].type == RTE_FLOW_ACTION_TYPE_METER) {
> >   mtr_conf = (const struct rte_flow_action_meter
> >-  *)(actions->conf);
> >+  *)(actions[i].conf);
> >   mtr_id = mtr_conf->mtr_id;
> >   is_mtr_act = true;
> >   }
> >   if (actions[i].type == RTE_FLOW_ACTION_TYPE_QUEUE) {
> >   q_conf = (const struct rte_flow_action_queue
> >-*)(actions->conf);
> >+*)(actions[i].conf);
> >   if (is_mtr_act)
> >   nix_mtr_rq_update(eth_dev, mtr_id, 1,
> > &q_conf->index);
> >   }
> >   if (actions[i].type == RTE_FLOW_ACTION_TYPE_RSS) {
> >   rss_conf = (const struct rte_flow_action_rss
> >-  *)(actions->conf);
> >+  *)(actions[i].conf);
> >   if (is_mtr_act)
> >   nix_mtr_rq_update(eth_dev, mtr_id,
> > rss_conf->queue_num,
> >@@ -98,7 +98,7 @@ cn10k_rss_action_validate(struct rte_eth_dev *eth_dev,
> >   return -EINVAL;
> >   }
> >
> >-  if (eth_dev->data->dev_conf.rxmode.mq_mode !=
> >RTE_ETH_MQ_RX_RSS) {
> >+  if (eth_dev->data->dev_conf.rxmode.mq_mode != ETH_MQ_RX_RSS) {
> >   plt_err("multi-queue mode is disabled");
> >   return -ENOTSUP;
> >   }
> >@@ -171,7 +171,6 @@ cn10k_flow_create(struct rte_eth_dev *eth_dev, const
> >struct rte_flow_attr *attr,
> >   return NULL;
> >   }
> >   }
> >-
> >   for (i = 0; actions[i].type != RTE_FLOW_ACTION_TYPE_END; i++) {
> >   if (actions[i].type == RTE_FLOW_ACTION_TYPE_METER) {
> >   mtr = (const struct rte_flow_action_meter *)actions[i]
> >diff --git a/drivers/net/cnxk/cnxk_ethdev_mtr.c
> >b/drivers/net/cnxk/cnxk_ethdev_mtr.c
> >index 39d8563826..a36fcb8aaf 100644
> >--- a/drivers/net/cnxk/cnxk_ethdev_mtr.c
> >+++ b/drivers/net/cnxk/cnxk_ethdev_mtr.c
> >@@ -35,7 +35,6 @@ static struct rte_mtr_capabilities mtr_capa = {
> >   .chaining_n_mtrs_per_flow_max = NIX_MTR_COUNT_PER_FLOW,
> >   .chaining_use_prev_mtr_color_supported = true,
> >   .chaining_use_prev_mtr_color_enforced = true,
> >-  .meter_rate_max = NIX_BPF_RATE_MAX / 8, /* Bytes per second */
> >   .color_aware_srtcm_rfc2697_supported = true,
> >   .color_aware_trtcm_rfc2698_supported = true,
> >   .color_aware_trtcm_rfc4115_supported = true, @@ -180,20 +179,20
> >@@ cnxk_nix_mtr_capabilities_get(struct rte_eth_dev *dev,
> > struct rte_mtr_capabilities *capa,
> > struct rte_mtr_error *error)
> > {
> >-  struct cnxk_eth_dev *eth_dev = cnxk_eth_pmd_priv(dev);
> >-  uint16_t count[ROC_NIX_BPF_LEVEL_MAX] = {0};
> >   uint8_t lvl_mask = ROC_NIX_BPF_LEVEL_F_LEAF |
> >ROC_NIX_BPF_LEVEL_F_MID |
> >  ROC_NIX_BPF_LEVEL_F_TOP;
> >+  struct cnxk_eth_dev *eth_dev = cnxk_eth_pmd_priv(dev);
> >+  uint16_t count[ROC_NIX_BPF_LEVEL_MAX] = {0};
> >   struct roc_nix *nix = ð_dev->nix;
> >-  int rc;
> >-  int i;
> >+  uint32_t time_unit;
> >+  int rc, i;
> >
> >   RTE_SET_USED(dev);
> >
> >   if (!capa)
> >   return -rte_mtr_error_set(error, EINVAL,
> >-  RTE_MTR_ERROR_TYPE_MTR_PARAMS, NULL,
> >-  "NULL input parameter");
> >+
> >RTE_MTR_ERROR_TYPE_MTR_PARAMS, NULL,
> >+"NU

RE: [EXT] Re: [dpdk-stable] [PATCH v2] test: avoid hang if queues are full and Tx fails

2022-01-10 Thread Rakesh Kudurumalla
ping

> -Original Message-
> From: Rakesh Kudurumalla
> Sent: Monday, December 13, 2021 12:10 PM
> To: Thomas Monjalon ; Jerin Jacob Kollanukkaran
> 
> Cc: sta...@dpdk.org; dev@dpdk.org; david.march...@redhat.com;
> ferruh.yi...@intel.com; andrew.rybche...@oktetlabs.ru;
> ajit.khapa...@broadcom.com
> Subject: RE: [EXT] Re: [dpdk-stable] [PATCH v2] test: avoid hang if queues are
> full and Tx fails
> 
> 
> 
> > -Original Message-
> > From: Thomas Monjalon 
> > Sent: Monday, November 29, 2021 2:44 PM
> > To: Rakesh Kudurumalla ; Jerin Jacob
> > Kollanukkaran 
> > Cc: sta...@dpdk.org; dev@dpdk.org; david.march...@redhat.com;
> > ferruh.yi...@intel.com; andrew.rybche...@oktetlabs.ru;
> > ajit.khapa...@broadcom.com
> > Subject: Re: [EXT] Re: [dpdk-stable] [PATCH v2] test: avoid hang if
> > queues are full and Tx fails
> >
> > 29/11/2021 09:52, Rakesh Kudurumalla:
> > > From: Thomas Monjalon 
> > > > 22/11/2021 08:59, Rakesh Kudurumalla:
> > > > > From: Thomas Monjalon 
> > > > > > 20/07/2021 18:50, Rakesh Kudurumalla:
> > > > > > > Current pmd_perf_autotest() in continuous mode tries to
> > > > > > > enqueue MAX_TRAFFIC_BURST completely before starting the test.
> > > > > > > Some drivers cannot accept complete MAX_TRAFFIC_BURST even
> > > > > > > though rx+tx desc count can fit it.
> > > > > >
> > > > > > Which driver is failing to do so?
> > > > > > Why it cannot enqueue 32 packets?
> > > > >
> > > > > Octeontx2 driver is failing to enqueue because hardware buffers
> > > > > are full before the test.
> >
> > Aren't you stopping the support of octeontx2?
> > Why do you care now?
> > Yes, we are not supporting octeontx2, but this issue is observed in the
> > cnxk driver; the current patch fixes the same.
> > > >
> > > > Why hardware buffers are full?
> > > Hardware buffers are full because the number of descriptors in
> > > continuous mode is less than MAX_TRAFFIC_BURST, so if the enqueue fails,
> > > there is no way the hardware can drop the packets. The pmd_perf_autotest
> > > application evaluates performance after enqueueing packets initially.
> > > >
> > > > > pmd_perf_autotest() in continuous mode tries to enqueue
> > > > > MAX_TRAFFIC_BURST (2048) before starting the test.
> > > > >
> > > > > > > This patch changes behaviour to stop enqueuing after few retries.
> > > > > >
> > > > > > If there is a real limitation, there will be issues in more
> > > > > > places than this test program.
> > > > > > I feel it should be addressed either in the driver or at ethdev 
> > > > > > level.
> > > > > >
> > > > > > [...]
> > > > > > > @@ -480,10 +483,19 @@ main_loop(__rte_unused void *args)
> > > > > > >   nb_tx = RTE_MIN(MAX_PKT_BURST, num);
> > > > > > >   nb_tx = rte_eth_tx_burst(portid, 0, &tx_burst[idx], nb_tx);
> > > > > > > + if (nb_tx == 0)
> > > > > > > + retry_cnt++;
> > > > > > >   num -= nb_tx;
> > > > > > >   idx += nb_tx;
> > > > > > > + if (retry_cnt == MAX_RETRY_COUNT) {
> > > > > > > + retry_cnt = 0;
> > > > > > > + break;
> > > > > > > + }
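[Editor's note] For readability, a self-contained sketch of the bounded-retry
transmit loop the patch introduces. This is an illustration only: MAX_PKT_BURST
and MAX_RETRY_COUNT belong to the test application (the placeholder values
below are assumptions), and the actual patch counts retries slightly
differently.

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#ifndef MAX_PKT_BURST
#define MAX_PKT_BURST 32        /* placeholder, per the discussion above */
#endif
#ifndef MAX_RETRY_COUNT
#define MAX_RETRY_COUNT 4       /* placeholder value for illustration only */
#endif

/* Enqueue up to 'num' mbufs, but give up after MAX_RETRY_COUNT consecutive
 * empty bursts so the loop cannot hang when the hardware queues stay full. */
static void
enqueue_with_retry(uint16_t portid, struct rte_mbuf **tx_burst, uint32_t num)
{
	uint32_t idx = 0, retry_cnt = 0;

	while (num > 0) {
		uint16_t nb_tx = RTE_MIN(MAX_PKT_BURST, num);

		nb_tx = rte_eth_tx_burst(portid, 0, &tx_burst[idx], nb_tx);
		if (nb_tx == 0) {
			if (++retry_cnt == MAX_RETRY_COUNT)
				break;  /* driver keeps rejecting packets: stop */
			continue;
		}
		retry_cnt = 0;
		num -= nb_tx;
		idx += nb_tx;
	}
}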
> >
> >



RE: [dpdk-dev] [dpdk-users] A question about the link status of the intel E810 nic can not be up?

2022-01-10 Thread Zhang, Qi Z



> -Original Message-
> From: wangyunjian 
> Sent: Monday, January 10, 2022 11:20 AM
> To: dev@dpdk.org; us...@dpdk.org; Yang, Qiming ;
> Zhang, Qi Z 
> Cc: Huangshaozhang ; dingxiaoxiong
> 
> Subject: [dpdk-dev] [dpdk-users] A question about the link status of the intel
> E810 nic can not be up?
> 
> Hi, All:
> 
> I am using Intel E810 with DPDK v21.11 to create a dpdkbond, but there is a
> probability that the failure will occur.
> 
> During the test, the bonding is repeatedly added and deleted. Sometimes, the
> link status of the NIC is Down. And if ice_dev_set_link_up is called again,
> the link status of the NIC can be recovered.
> 
> Alternatively, the problem can be avoided by modifying the ice pmd driver.
> 
> diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c index
> 13a7a9702a..fcd22e20a5 100644
> --- a/drivers/net/ice/ice_ethdev.c
> +++ b/drivers/net/ice/ice_ethdev.c
> @@ -3604,7 +3604,7 @@ ice_dev_start(struct rte_eth_dev *dev)
> ice_dev_set_link_up(dev);
> 
> /* Call get_link_info aq commond to enable/disable LSE */
> -   ice_link_update(dev, 0);
> +   ice_link_update(dev, 1);

This looks good to me; it's reasonable to wait for completion right after
setting the link up, as this is not a link status change interrupt handling
scenario.

We will try to reproduce this issue; meanwhile, could you help with the
following 2 things:

1. Share the device ID / firmware version with which you met this issue.
2. Send a v2 with a reworded title and commit log as a normal patch if you
want to contribute.

Thanks
Qi

> 
> pf->adapter_stopped = false;


Re: [dpdk-dev] [RFC PATCH] ethdev: mtr: enhance input color table features

2022-01-10 Thread Jerin Jacob
On Tue, Dec 7, 2021 at 11:52 PM Dumitrescu, Cristian
 wrote:
>
> HI Jerin,

Hi Cristian,


>
> Sorry for my delay. I am currently on vacation until the beginning of January
> 2022, so my response is slower than usual.

I was on vacation too.

>
> Thanks for the explanation of the capabilities you are trying to enable in 
> the API. Based on this, I am reconsidering some of my previous input. Here 
> are my current thoughts:
>
> a) Each meter object can be configured with multiple methods to determine the 
> input color: IP DSCP and VLAN PCP are just two of them, others are possible 
> (as also listed by you). I think the problem we are trying to solve here is: 
> in case multiple such methods are enabled for a given meter object AND more 
> than one enabled method is applicable for a particular input packet at 
> run-time (e.g. the packet is an IP packet, but it also contains at least one 
> VLAN label, and both IP DSCP and the VLAN PCP methods are enabled for the 
> meter object), how do we decide which method is to be applied? So, IMO we 
> need to define a priority scheme that always picks a single method:
>
> - a unique priority for each method;
>
> - a default method to be used when none of the enabled methods is 
> applicable to the input packet.
>
> b) The default method must be usable even when the input packet type is 
> unknown to the HW (unsupported), e.g. we should still be able to decide the 
> input color of a packet that is not an IP packet, such as ARP, and it does 
> not contain any VLAN labels. For this, I suggest we add a field ins struct 
> rte_mtr_params called default_input_color of type enum rte_color:
>
> struct rte_mtr_params {
> enum rte_color default_input_color; /* Input color to be set 
> for the input packet when none of the enabled input color methods is 
> applicable to the current input packet. Ignored when this method is set to 
> the color blind method. */
> };
>
> c) In terms of methods and their priority, I propose to start with the 
> following options, each of which referring to a set of methods, with a clear 
> way to select a unique method. This is the minimal set needed to support the 
> HW capabilities that you mention, this set can be extended as needed in the 
> future:
>
> enum rte_mtr_input_color_method {
> RTE_MTR_INPUT_COLOR_METHOD_COLOR_BLIND, /* The input color is 
> always green. The default_input_color is ignored for this method. */
> RTE_MTR_INPUT_COLOR_METHOD_IP_DSCP, /* If the input packet 
> is IPv4 or IPv6, its input color is detected by the DSCP field indexing into 
> the dscp_table. Otherwise, the default_input_color is applied. */
> RTE_MTR_INPUT_COLOR_METHOD_OUTER_VLAN_PCP, /* If the input 
> packet has at least one VLAN label, its input color is detected by the 
> outermost VLAN PCP indexing into the vlan_table. Otherwise, the 
> default_input_color is applied. */
> RTE_MTR_INPUT_COLOR_METHOD_OUTER_VLAN_PCP_IP_DSCP, /* If the 
> input packet has at least one VLAN label, its input color is detected by the 
> outermost VLAN PCP indexing into the vlan_table. Otherwise, if the packet is 
> IPv4 or IPv6, its input color is detected by the DSCP field indexing into the 
> dscp_table. Otherwise, the default_input_color is applied. */
> RTE_MTR_INPUT_COLOR_METHOD_INNER_VLAN_PCP, /* If the input 
> packet has at least one VLAN label, its input color is detected by the 
> innermost VLAN PCP indexing into the vlan_table. Otherwise, the 
> default_input_color is applied. */
> RTE_MTR_INPUT_COLOR_METHOD_INNER_VLAN_PCP_IP_DSCP, /* If the 
> input packet has at least one VLAN label, its input color is detected by the 
> innermost VLAN PCP indexing into the vlan_table. Otherwise, if the packet is 
> IPv4 or IPv6, its input color is detected by the DSCP field indexing into the 
> dscp_table. Otherwise, the default_input_color is applied. */
> };
>
> struct rte_mtr_params {
> enum rte_mtr_input_color_method input_color_method;
> };
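[Editor's note] To make the priority scheme above concrete, a minimal sketch of
how the OUTER_VLAN_PCP_IP_DSCP method could be resolved per packet. The config
struct mirrors the *proposed* fields (they are not part of the current rte_mtr
API); a single stripped VLAN tag in mbuf->vlan_tci is assumed, the PMD is
assumed to fill packet_type, and IPv6 handling is omitted for brevity.

#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ethdev.h>
#include <rte_ip.h>
#include <rte_meter.h>

/* Local mirror of the proposed rte_mtr_params additions discussed above. */
struct input_color_cfg {
	enum rte_color *vlan_table;          /* 8 entries, may be NULL  */
	enum rte_color *ip_dscp_table;       /* 64 entries, may be NULL */
	enum rte_color default_input_color;
};

/* Priority: outermost VLAN PCP first, then IP DSCP, then the default. */
static enum rte_color
resolve_input_color(const struct rte_mbuf *m,
		    const struct input_color_cfg *cfg)
{
	if (m->ol_flags & RTE_MBUF_F_RX_VLAN) {
		uint8_t pcp = (m->vlan_tci >> 13) & 0x7;

		return cfg->vlan_table != NULL ?
			cfg->vlan_table[pcp] : RTE_COLOR_GREEN;
	}
	if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) {
		const struct rte_ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m,
			const struct rte_ipv4_hdr *,
			sizeof(struct rte_ether_hdr));
		uint8_t dscp = ip->type_of_service >> 2;

		return cfg->ip_dscp_table != NULL ?
			cfg->ip_dscp_table[dscp] : RTE_COLOR_GREEN;
	}
	/* Neither method applies (e.g. ARP): use the default input color. */
	return cfg->default_input_color;
}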
>
> d) The above point means that all the protocol dependent tables are 
> independent of each other, so they can all exist at the same time, so we 
> cannot put all of them into a union or a single unified input color 
> translation table. In case any of these tables is enabled/required by the 
> input color scheme, but it is set to NULL, it must automatically resolve to 
> an "all-green" table (as it is already the case for the existing dscp_table).
>
> struct rte_mtr_params {
> enum rte_color *ip_dscp_table; /* Used when the IP DSCP input 
> color method is selected for the current input packet. If set to NULL, it 
> defaults to 64 elements of RTE_COLOR_GREEN. */
> enum rte_color *vlan_table; /* Used when the 
> outermost/innermost VLAN PCP input color method is selected for the current 
> input pa

RE: [PATCH v2] net/ice: fix Tx Checksum offload

2022-01-10 Thread Zhang, Qi Z



> -Original Message-
> From: Liu, KevinX 
> Sent: Sunday, December 12, 2021 10:35 PM
> To: dev@dpdk.org
> Cc: Yang, Qiming ; Zhang, Qi Z
> ; Yang, SteveX ; Liu, KevinX
> ; sta...@dpdk.org
> Subject: [PATCH v2] net/ice: fix Tx Checksum offload
> 
> Tunnel packets are missing some information after Tx forwarding.
> 
> In ice_txd_enable_offload, when Tx checksum offload is enabled for tunnel
> packets, td_offset should be set with the outer l2/l3 len instead of the
> inner l2/l3 len.
> 
> In ice_txd_enable_checksum, td_offset should also be set with the outer
> l3 len.
> 
> This patch fixes the bug so that the checksum engine can correctly forward
> IPv4/IPv6 tunnel packets.
> 
> Fixes: 28f9002ab67f ("net/ice: add Tx AVX512 offload path")
> Fixes: 17c7d0f9d6a4 ("net/ice: support basic Rx/Tx")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Kevin Liu 

Acked-by: Qi Zhang 

Applied to dpdk-next-net-intel.

Thanks
Qi



Re: [RFC] mempool: modify flush threshold

2022-01-10 Thread Bruce Richardson
On Sat, Jan 08, 2022 at 12:00:17PM +0100, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richard...@intel.com]
> > Sent: Friday, 7 January 2022 16.12
> > 
> > On Tue, Dec 28, 2021 at 03:28:45PM +0100, Morten Brørup wrote:
> > > Hi mempool maintainers and DPDK team.
> > >
> > > Does anyone know the reason or history why
> > CACHE_FLUSHTHRESH_MULTIPLIER was chosen to be 1.5? I think it is
> > counterintuitive.
> > >
> > > The mempool cache flush threshold was introduced in DPDK version 1.3;
> > it was not in DPDK version 1.2. The copyright notice for rte_mempool.c
> > says year 2012.
> > >
> > >
> > > Here is my analysis:
> > >
> > > With the multiplier of 1.5, a mempool cache is allowed to be filled
> > up to 50 % above than its target size before its excess entries are
> > flushed to the mempool (thereby reducing the cache length to the target
> > size).
> > >
> > > In the opposite direction, a mempool cache is allowed to be drained
> > completely, i.e. up to 100 % below its target size.
> > >
> > > My instinct tells me that it would be more natural to let a mempool
> > cache go the same amount above and below its target size, i.e. using a
> > flush multiplier of 2 instead of 1.5.
> > >
> > > Also, the cache should be allowed to fill up to and including the
> > flush threshold, so it is flushed when the threshold is exceeded,
> > instead of when it is reached.
> > >
> > > Here is a simplified example:
> > >
> > > Imagine a cache target size of 32, corresponding to a typical packet
> > burst. With a flush threshold of 2 (and len > threshold instead of len
> > >= threshold), the cache could hold 1 +/-1 packet bursts. With the
> > current multiplier it can only hold [0 .. 1.5[ packet bursts, not
> > really providing a lot of elasticity.
> > >
> > Hi Morten,
> > 
> > Interesting to see this being looked at again. The original idea of
> > adding
> > in some extra room above the requested value was to avoid the worst-
> > case
> > scenario of a pool oscillating between full and empty repeatedly due to
> > the
> > addition/removal of perhaps a single packet. As for why 1.5 was chosen
> > as
> > the value, I don't recall any particular reason for it myself. The main
> > objective was to have a separate flush and size values so that we could
> > go
> > a bit above full, and when flushing, not emptying the entire cache down
> > to
> > zero.
> 
> Thanks for providing the historical background for this feature, Bruce.
> 
> > 
> > In terms of the behavioural points you make above, I wonder if symmetry
> > is
> > actually necessary or desirable in this case. After all, the ideal case
> > is
> > probably to keep the mempool neither full nor empty, so that both
> > allocations or frees can be done without having to go to the underlying
> > shared data structure. To accomodate this, the mempool will only flush
> > when
> > the number of elements is greater than size * 1.5, and then it only
> > flushes
> > elements down to size, ensuring that allocations can still take place.
> > On allocation, new buffers are taken when we don't have enough in the
> > cache
> > to fullfil the request, and then the cache is filled up to size, not to
> > the
> > flush threshold.
> 
> I agree with the ideal case.
> 
> However, it looks like the addition of the flush threshold also changed the 
> "size" parameter to effectively become "desired length" instead. This 
> interpretation is also supported by the flush algorithm, which doesn't flush 
> below the "size", but to the "size". So based on interpretation, I was 
> wondering why it is not symmetrical around the "desired length", but allowed 
> to go 100 % below and only 50 % above.
> 
> > 
> > Now, for the scenario you describe - where the mempool cache size is
> > set to
> > be the same as the burst size, this scheme probably does break down, in
> > that we don't really have any burst elasticity. However, I would query
> > if
> > that is a configuration that is used, since to the user it should
> > appear
> > correctly to provide no elasticity. Looking at testpmd, and our example
> > apps, the standard there is a burst size of 32 and mempool cache of
> > ~256.
> > In OVS code, netdev-dpdk.c seems to initialize the mempool  with cache
> > size
> > of RTE_MEMPOOL_CACHE_MAX_SIZE (through define MP_CACHE_SZ). In all
> > these
> > cases, I think the 1.5 threshold should work just fine for us.
> 
> My example was for demonstration only, and hopefully not being used by any 
> applications.
> 
> The simplified example was intended to demonstrate the theoretical effect of 
> the unbalance in using the 1.5 threshold. It will be similar with a cache 
> size of 256 objects: You will be allowed to go 8 bursts (calculation: 256 
> objects / 32 objects per burst) below the cache "size" without retrieving 
> objects from the backing pool, but only 4 bursts above the cache "size" 
> before flushing objects to the backing pool.
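
[Editor's note] For reference, the asymmetry described above can be checked
with a few lines of arithmetic (a standalone illustration using the 1.5
multiplier and the numbers from this example; this is not mempool code).

#include <stdio.h>

int main(void)
{
	unsigned int size = 256, burst = 32;
	unsigned int flushthresh = size * 3 / 2;   /* 1.5 * size = 384 */

	/* Room above the target size before a put triggers a flush. */
	unsigned int above = flushthresh - size;   /* 128 objects */
	/* Room below the target size before a get hits the backing pool. */
	unsigned int below = size;                 /* 256 objects */

	printf("bursts above: %u, bursts below: %u\n",
	       above / burst, below / burst);      /* 4 vs. 8 */
	return 0;
}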
> 
> > That said,
> > if you want to bump it up to 2x, I can't s

RE: [PATCH] net/iavf: fix segmentation offload condition

2022-01-10 Thread Zhang, Qi Z



> -Original Message-
> From: Nicolau, Radu 
> Sent: Monday, November 22, 2021 11:30 PM
> To: Wu, Jingjing ; Xing, Beilei 
> Cc: dev@dpdk.org; Zhang, Qi Z ; Nicolau, Radu
> 
> Subject: [PATCH] net/iavf: fix segmentation offload condition
> 
> Apply segmentation offload when requested for non-tunneled packets, e.g.
> IPsec transport mode.
> 
> Fixes: 1e728b01120c ("net/iavf: rework Tx path")
> 
> Signed-off-by: Radu Nicolau 

Reviewed-by: Qi Zhang 

Applied to dpdk-next-net-intel.

Thanks
Qi



RE: Understanding Flow API action RSS

2022-01-10 Thread Ori Kam
Hi Ivan,

> -Original Message-
> From: Ivan Malov 
> Sent: Sunday, January 9, 2022 3:03 PM
> Subject: RE: Understanding Flow API action RSS
> 
> Hi Ori,
> 
> On Sun, 9 Jan 2022, Ori Kam wrote:
> 
> > Hi Stephen and Ivan
> >
> >> -Original Message-
> >> From: Stephen Hemminger 
> >> Sent: Tuesday, January 4, 2022 11:56 PM
> >> Subject: Re: Understanding Flow API action RSS
> >>
> >> On Tue, 4 Jan 2022 21:29:14 +0300 (MSK)
> >> Ivan Malov  wrote:
> >>
> >>> Hi Stephen,
> >>>
> >>> On Tue, 4 Jan 2022, Stephen Hemminger wrote:
> >>>
>  On Tue, 04 Jan 2022 13:41:55 +0100
>  Thomas Monjalon  wrote:
> 
> > +Cc Ori Kam, rte_flow maintainer
> >
> > 29/12/2021 15:34, Ivan Malov:
> >> Hi all,
> >>
> >> In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it, 'queue' 
> >> is
> >> to provide "Queue indices to use". But it is unclear whether the order 
> >> of
> >> elements is meaningful or not. Does that matter? Can queue indices 
> >> repeat?
> 
>  The order probably doesn't matter, it is like the RSS indirection table.
> >>>
> >>> Sorry, but RSS indirection table (RETA) assumes some structure. In it,
> >>> queue indices can repeat, and the order is meaningful. In DPDK, RETA
> >>> may comprise multiple "groups", each one comprising 64 entries.
> >>>
> >>> This 'queue' array in flow action RSS does not stick with the same
> >>> terminology, it does not reuse the definition of RETA "group", etc.
> >>> Just "queue indices to use". No definition of order, no structure.
> >>>
> >>> The API contract is not clear. Neither to users, nor to PMDs.
> >>>
> >> From the API point of view, in RSS the queues are simply the queue IDs;
> > order doesn't matter. Duplicating the queue may affect the spread,
> > depending on the HW/PMD. In the common case each queue should appear only
> > once and the PMD may duplicate entries to get the best performance.
> 
> Look. In a DPDK PMD, one has "global" RSS table. Consider the following
> example: 0, 0, 1, 1, 2, 2, 3, 3 ... and so on. As you may see, queue
> indices may repeat. They may have different order: 1, 1, 0, 0, ... .
> The order is of great importance. If you send a packet to a
> DPDK-powered server, you can know in advance its hash value.
> Hence, you may strictly predict which RSS table entry this
> hash will point at. That predicts the target Rx queue.
> 
> So the questions which one should attempt to clarify, are as follows:
> 1) Is the 'queue' array ordered? (Does the order of elements matter?)
> 2) Can its elements repeat? (*allowed* or *not allowed*?)
> 
From an API point of view the array is ordered, and may have repeating elements.

> >
> 
> rx queue = RSS_indirection_table[RSS_hash_value % RSS_indirection_table_size]
> 
>  So you could play with multiple queues matching same hash value, but that
>  would be uncommon.
> 
> >> An ethdev may have "global" RSS setting with an indirection table of 
> >> some
> >> fixed size (say, 512). In what comes to flow rules, does that size 
> >> matter?
> 
>  Global RSS is only used if the incoming packet does not match any 
>  rte_flow
>  action. If there is a a RTE_FLOW_ACTION_TYPE_QUEUE or 
>  RTE_FLOW_ACTION_TYPE_RSS
>  these take precedence.
> >>>
> >>> Yes, I know all of that. The question is how does the PMD select RETA size
> >>> for this action? Can it select an arbitrary value? Or should it stick with
> >>> the "global" one (eg. 512)? How does the user know the table size?
> >>>
> >>> If the user simply wants to spread traffic across the given queues,
> >>> the effective table size is a don't care to them, and the existing
> >>> API contract is fine. But if the user expects that certain packets
> >>> hit some precise queues, they need to know the table size for that.
> >>>
> Just like you said, RSS simply spreads the traffic to the given queues.
> 
> Yes, to the given queues. The question is whether the 'queue' array
> has RETA properties (order matters; elements can repeat) or not.
> 

Yes order matters and elements can repeat.

> If the application wants to send traffic to some queue, it should use the
> queue action.
> 
> Yes, but that's not what I mean. Consider the following example. The user
> generates packets with random IP addresses at machine A. These packets
> hit DPDK at machine B. For a given *packet*, the sender (A) can
> compute its RSS hash in software. This will point out the RETA
> entry index. But, in order to predict the exact *queue* index,
> the sender has to know the table (its contents, its size).
> 
Why does the application need this info?

> For a "global" DPDK RSS setting, the table can be easily obtained with
> an ethdev callback / API. Very simple. Fixed-size table, and it can
> be queried. But how does one obtain similar knowledge for RSS action?
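
[Editor's note] For the "global" table case mentioned above, a minimal sketch
of querying the RETA through the ethdev API and predicting the Rx queue for a
known hash value (assumes a configured port with reta_size of at most 512;
error handling omitted).

#include <stdint.h>
#include <rte_ethdev.h>

static uint16_t
predict_rx_queue(uint16_t port_id, uint32_t rss_hash)
{
	struct rte_eth_dev_info info;
	struct rte_eth_rss_reta_entry64 reta[8];   /* 8 * 64 = 512 entries */
	uint16_t i, idx;

	rte_eth_dev_info_get(port_id, &info);

	/* Mark every entry of every group so the whole table is returned. */
	for (i = 0; i < info.reta_size / RTE_ETH_RETA_GROUP_SIZE; i++)
		reta[i].mask = UINT64_MAX;
	rte_eth_dev_rss_reta_query(port_id, reta, info.reta_size);

	/* rx queue = RETA[hash % reta_size], as in the formula above. */
	idx = rss_hash % info.reta_size;
	return reta[idx / RTE_ETH_RETA_GROUP_SIZE]
		   .reta[idx % RTE_ETH_RETA_GROUP_SIZE];
}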
> 
The RSS action was designed to allow balanced traffic spread.
The size of the reta is PMD dependent, in some PMD the size wi

RE: [PATCH] common/qat: enable gen4 b devices

2022-01-10 Thread Zhang, Roy Fan
> -Original Message-
> From: Kusztal, ArkadiuszX 
> Sent: Tuesday, December 28, 2021 9:50 AM
> To: dev@dpdk.org
> Cc: gak...@marvell.com; Zhang, Roy Fan ; Kusztal,
> ArkadiuszX 
> Subject: [PATCH] common/qat: enable gen4 b devices
> 
> This commit enables CPM2.0b devices in Intel QuickAssist
> Technology PMD.
> 
> Signed-off-by: Arek Kusztal 
> ---
Acked-by: Fan Zhang 


RE: [PATCH] mempool: optimize incomplete cache handling

2022-01-10 Thread Morten Brørup
+Bruce; you seemed interested in my work in this area.

> From: Jerin Jacob [mailto:jerinjac...@gmail.com]
> Sent: Monday, 10 January 2022 08.27
> 
> On Fri, Jan 7, 2022 at 2:16 PM Morten Brørup 
> wrote:
> >
> > > From: Jerin Jacob [mailto:jerinjac...@gmail.com]
> > > Sent: Thursday, 6 January 2022 17.55
> > >
> > > On Thu, Jan 6, 2022 at 5:54 PM Morten Brørup
> 
> > > wrote:
> > > >
> > > > A flush threshold for the mempool cache was introduced in DPDK
> > > version
> > > > 1.3, but rte_mempool_do_generic_get() was not completely updated
> back
> > > > then.
> > > >
> > > > The incompleteness did not cause any functional bugs, so this
> patch
> > > > could be considered refactoring for the purpose of cleaning up.
> > > >
> > > > This patch completes the update of rte_mempool_do_generic_get()
> as
> > > > follows:
> > > >
> > > > 1. A few comments were misplaced or no longer correct.
> > > > Some comments have been updated/added/corrected.
> > > >
> > > > 2. The code that initially screens the cache request was not
> updated.
> > > > The initial screening compared the request length to the cache
> size,
> > > > which was correct before, but became irrelevant with the
> introduction
> > > of
> > > > the flush threshold. E.g. the cache can hold up to flushthresh
> > > objects,
> > > > which is more than its size, so some requests were not served
> from
> > > the
> > > > cache, even though they could be.
> > > > The initial screening has now been corrected to match the initial
> > > > screening in rte_mempool_do_generic_put(), which verifies that a
> > > cache
> > > > is present, and that the length of the request does not overflow
> the
> > > > memory allocated for the cache.
> > > >
> > > > 3. The code flow for satisfying the request from the cache was
> weird.
> > > > The likely code path where the objects are simply served from the
> > > cache
> > > > was treated as unlikely; now it is treated as likely.
> > > > And in the code path where the cache was backfilled first,
> numbers
> > > were
> > > > added and subtracted from the cache length; now this code path
> simply
> > > > sets the cache length to its final value.
> > > >
> > > > 4. The objects were returned in reverse order.
> > > > Returning the objects in reverse order is not necessary, so
> > > rte_memcpy()
> > > > is now used instead.
> > >
> > > Have you checked the performance with network workload?
> > > IMO, reverse order makes sense (LIFO vs FIFO).
> > > The LIFO makes the cache warm as the same buffers are reused
> > > frequently.
> >
> > I have not done any performance testing. We probably agree that the
> only major difference lies in how the objects are returned. And we
> probably also agree that rte_memcpy() is faster than the copy loop it
> replaced, especially when n is constant at compile time. So the
> performance difference mainly depends on the application, which I will
> discuss below.
> >
> > Let's first consider LIFO vs. FIFO.
> >
> > The key argument for the rte_memcpy() optimization is that we are
> still getting the burst of objects from the top of the stack (LIFO);
> only the order of the objects inside the burst is not reverse anymore.
> >
> > Here is an example:
> >
> > The cache initially contains 8 objects: 01234567.
> >
> > 8 more objects are put into the cache: 89ABCDEF.
> >
> > The cache now holds: 0123456789ABCDEF.
> 
> Agree. However, I think it may matter on machines with smaller L1 caches and
> larger burst sizes, where what can stay in L1 under this scheme plays a role.

Good point! Thinking further about it made me realize that the mempool cache 
flushing algorithm is fundamentally flawed, at least in some cases...


rte_mempool_do_generic_put():

When putting objects into the cache, and the cache length exceeds the flush 
threshold, the most recent (hot) objects are flushed to the ring, thus leaving 
the less recent (colder) objects at the top of the cache stack.

Example (cache size: 8, flush threshold: 12, put 8 objects):

Initial cache: 01234567

Cache after putting (hot) objects 89ABCDEF: 0123456789ABCDEF

Cache flush threshold reached. Resulting cache: 01234567

Furthermore, the cache has to be completely depleted before the hot objects 
that were flushed to the ring are retrieved from the ring again.


rte_mempool_do_generic_get():

When getting objects from the cache, and the cache does not hold the requested 
number of objects, the cache will first be backfilled from the ring, thus 
putting colder objects at the top of the cache stack, and then the objects will 
be returned from the top of the cache stack, i.e. the backfilled (cold) objects 
will be returned first.

Example (cache size: 8, get 8 objects):

Initial cache: 0123 (hot or lukewarm)

Cache after backfill to size + requested objects: 0123456789ABCDEF

Returned objects: FEDCBA98 (cold)

Cache after returning objects: 01234567 (i.e. cold objects at the top)
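
[Editor's note] A toy model of the two behaviors described above (sizes and
thresholds follow the examples in this mail; this is an illustration, not the
rte_mempool implementation).

#include <stdio.h>
#include <string.h>

enum { CACHE_SIZE = 8, FLUSH_THRESH = 12 };

static char cache[32];
static int cache_len;

/* Put: push on top; once the threshold is reached, everything above
 * CACHE_SIZE (the most recently added, hot objects) goes to the ring. */
static void cache_put(const char *objs, int n)
{
	memcpy(&cache[cache_len], objs, n);
	cache_len += n;
	if (cache_len >= FLUSH_THRESH)
		cache_len = CACHE_SIZE;
}

/* Get: when short, backfill cold objects ('r') from the ring on top of
 * the stack, then pop from the top. */
static void cache_get(char *out, int n)
{
	if (cache_len < n)
		while (cache_len < CACHE_SIZE + n)
			cache[cache_len++] = 'r';
	while (n-- > 0)
		*out++ = cache[--cache_len];
}

int main(void)
{
	char got[9] = {0};

	memcpy(cache, "01234567", 8); cache_len = 8;
	cache_put("89ABCDEF", 8);
	printf("after put:  %.*s\n", cache_len, cache);  /* 01234567 */

	memcpy(cache, "0123", 4); cache_len = 4;
	cache_get(got, 8);
	printf("returned:   %s\n", got);                 /* cold objects first */
	printf("after get:  %.*s\n", cache_len, cache);  /* 0123rrrr */
	return 0;
}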


> 
> I would suggest splitting each performance improvement as a separate
> patch for be

Re: Rss hash on mellanox 100G card

2022-01-10 Thread Medvedkin, Vladimir

Hi Yaron,

On 09/01/2022 13:20, Yaron Illouz wrote:
I am using a Mellanox Technologies MT27800 Family [ConnectX-5] NIC with DPDK
19, with multiple Rx queues and RSS:
port_conf.rx_adv_conf.rss_conf.rss_hf = (ETH_RSS_IP | ETH_RSS_UDP |
ETH_RSS_TCP)


I analyze traffic and need all packets of the same session to arrive at the
same process (a session for now can be IP + port).


So packets that have the same IP + port arrive at the same queue.

But if some packets are IP fragmented, they arrive at a different process.
That is a problem!


How can I calculate the hash value in my C++ code, like it is done in the
NIC, so I can reassemble the packets and send them to the same process as
the non-fragmented packets?




I think you need rte_softrss()/rte_thash_gfni(); details can be found here:
https://doc.dpdk.org/guides/prog_guide/toeplitz_hash_lib.html

Alternatively, you can configure your NIC hash function to calculate the hash
on L3 fields only.
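
[Editor's note] As a rough illustration of the software option, a sketch that
computes the Toeplitz hash over the L3 fields only, using the RSS key the port
is configured with (the 64-byte key buffer size is an assumption; error
handling omitted).

#include <stdint.h>
#include <rte_thash.h>
#include <rte_byteorder.h>
#include <rte_ethdev.h>

static uint32_t
soft_rss_ipv4_l3(uint16_t port_id, rte_be32_t src_addr, rte_be32_t dst_addr)
{
	uint8_t key[64];
	struct rte_eth_rss_conf rss_conf = {
		.rss_key = key,
		.rss_key_len = sizeof(key),
	};
	union rte_thash_tuple tuple;

	/* Use the same key as the NIC, otherwise the values will not match. */
	rte_eth_dev_rss_hash_conf_get(port_id, &rss_conf);

	/* rte_softrss() expects the tuple in host byte order. */
	tuple.v4.src_addr = rte_be_to_cpu_32(src_addr);
	tuple.v4.dst_addr = rte_be_to_cpu_32(dst_addr);

	return rte_softrss((uint32_t *)&tuple, RTE_THASH_V4_L3_LEN, key);
}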


--
Regards,
Vladimir


axgbe PMD fixes and updates

2022-01-10 Thread ssebasti
Updates to axgbe pmd driver 




[PATCH v1 1/6] net/axgbe: always attempt link training in KR mode

2022-01-10 Thread ssebasti
From: Selwin Sebastian 

Link training is always attempted when in KR mode, but the code is
structured to check if link training has been enabled before attempting
to perform it. Since that check will always be true, simplify the code
to always enable and start link training during KR auto-negotiation.

Signed-off-by: Selwin Sebastian 
---
 drivers/net/axgbe/axgbe_mdio.c | 62 --
 1 file changed, 15 insertions(+), 47 deletions(-)

diff --git a/drivers/net/axgbe/axgbe_mdio.c b/drivers/net/axgbe/axgbe_mdio.c
index 32d8c666f9..913ceada0d 100644
--- a/drivers/net/axgbe/axgbe_mdio.c
+++ b/drivers/net/axgbe/axgbe_mdio.c
@@ -80,31 +80,10 @@ static void axgbe_an_clear_interrupts_all(struct axgbe_port 
*pdata)
axgbe_an37_clear_interrupts(pdata);
 }
 
-static void axgbe_an73_enable_kr_training(struct axgbe_port *pdata)
-{
-   unsigned int reg;
-
-   reg = XMDIO_READ(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL);
 
-   reg |= AXGBE_KR_TRAINING_ENABLE;
-   XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL, reg);
-}
-
-static void axgbe_an73_disable_kr_training(struct axgbe_port *pdata)
-{
-   unsigned int reg;
-
-   reg = XMDIO_READ(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL);
-
-   reg &= ~AXGBE_KR_TRAINING_ENABLE;
-   XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL, reg);
-}
 
 static void axgbe_kr_mode(struct axgbe_port *pdata)
 {
-   /* Enable KR training */
-   axgbe_an73_enable_kr_training(pdata);
-
/* Set MAC to 10G speed */
pdata->hw_if.set_speed(pdata, SPEED_1);
 
@@ -114,9 +93,6 @@ static void axgbe_kr_mode(struct axgbe_port *pdata)
 
 static void axgbe_kx_2500_mode(struct axgbe_port *pdata)
 {
-   /* Disable KR training */
-   axgbe_an73_disable_kr_training(pdata);
-
/* Set MAC to 2.5G speed */
pdata->hw_if.set_speed(pdata, SPEED_2500);
 
@@ -126,9 +102,6 @@ static void axgbe_kx_2500_mode(struct axgbe_port *pdata)
 
 static void axgbe_kx_1000_mode(struct axgbe_port *pdata)
 {
-   /* Disable KR training */
-   axgbe_an73_disable_kr_training(pdata);
-
/* Set MAC to 1G speed */
pdata->hw_if.set_speed(pdata, SPEED_1000);
 
@@ -142,8 +115,6 @@ static void axgbe_sfi_mode(struct axgbe_port *pdata)
if (pdata->kr_redrv)
return axgbe_kr_mode(pdata);
 
-   /* Disable KR training */
-   axgbe_an73_disable_kr_training(pdata);
 
/* Set MAC to 10G speed */
pdata->hw_if.set_speed(pdata, SPEED_1);
@@ -154,8 +125,6 @@ static void axgbe_sfi_mode(struct axgbe_port *pdata)
 
 static void axgbe_x_mode(struct axgbe_port *pdata)
 {
-   /* Disable KR training */
-   axgbe_an73_disable_kr_training(pdata);
 
/* Set MAC to 1G speed */
pdata->hw_if.set_speed(pdata, SPEED_1000);
@@ -166,8 +135,6 @@ static void axgbe_x_mode(struct axgbe_port *pdata)
 
 static void axgbe_sgmii_1000_mode(struct axgbe_port *pdata)
 {
-   /* Disable KR training */
-   axgbe_an73_disable_kr_training(pdata);
 
/* Set MAC to 1G speed */
pdata->hw_if.set_speed(pdata, SPEED_1000);
@@ -178,8 +145,6 @@ static void axgbe_sgmii_1000_mode(struct axgbe_port *pdata)
 
 static void axgbe_sgmii_100_mode(struct axgbe_port *pdata)
 {
-   /* Disable KR training */
-   axgbe_an73_disable_kr_training(pdata);
 
/* Set MAC to 1G speed */
pdata->hw_if.set_speed(pdata, SPEED_1000);
@@ -284,6 +249,12 @@ static void axgbe_an73_set(struct axgbe_port *pdata, bool 
enable,
 {
unsigned int reg;
 
+   /* Disable KR training for now */
+   reg = XMDIO_READ(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL);
+   reg &= ~AXGBE_KR_TRAINING_ENABLE;
+   XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL, reg);
+
+   /* Update AN settings */
reg = XMDIO_READ(pdata, MDIO_MMD_AN, MDIO_CTRL1);
reg &= ~MDIO_AN_CTRL1_ENABLE;
 
@@ -379,20 +350,17 @@ static enum axgbe_an axgbe_an73_tx_training(struct 
axgbe_port *pdata,
XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_FECCTRL, reg);
 
/* Start KR training */
-   reg = XMDIO_READ(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL);
-   if (reg & AXGBE_KR_TRAINING_ENABLE) {
-   if (pdata->phy_if.phy_impl.kr_training_pre)
-   pdata->phy_if.phy_impl.kr_training_pre(pdata);
+   if (pdata->phy_if.phy_impl.kr_training_pre)
+   pdata->phy_if.phy_impl.kr_training_pre(pdata);
 
-   reg |= AXGBE_KR_TRAINING_START;
-   XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL,
-   reg);
-
-   PMD_DRV_LOG(DEBUG, "KR training initiated\n");
+   reg = XMDIO_READ(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL);
+   reg |= AXGBE_KR_TRAINING_ENABLE;
+   reg |= AXGBE_KR_TRAINING_START;
+   XMDIO_WRITE(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_10GBR_PMD_CTRL, reg);
 
-   if (p

[PATCH v1 2/6] net/axgbe: toggle PLL settings during rate change

2022-01-10 Thread ssebasti
From: Selwin Sebastian 

For each rate change command submission, the FW has to do a phy
power off sequence internally. For this to happen correctly, the
PLL re-initialization control setting has to be turned off before
sending mailbox commands and re-enabled once the command submission
is complete. Without the PLL control setting, the link up takes
longer time in a fixed phy configuration.

Signed-off-by: Selwin Sebastian 
---
 drivers/net/axgbe/axgbe_common.h   |  9 +
 drivers/net/axgbe/axgbe_phy_impl.c | 22 --
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/net/axgbe/axgbe_common.h b/drivers/net/axgbe/axgbe_common.h
index df0aa21a9b..5a7ac35b6a 100644
--- a/drivers/net/axgbe/axgbe_common.h
+++ b/drivers/net/axgbe/axgbe_common.h
@@ -1314,6 +1314,11 @@
 #define MDIO_VEND2_PMA_CDR_CONTROL 0x8056
 #endif
 
+#ifndef MDIO_VEND2_PMA_MISC_CTRL0
+#define MDIO_VEND2_PMA_MISC_CTRL0  0x8090
+#endif
+
+
 #ifndef MDIO_CTRL1_SPEED1G
 #define MDIO_CTRL1_SPEED1G (MDIO_CTRL1_SPEED10G & ~BMCR_SPEED100)
 #endif
@@ -1392,6 +1397,10 @@ static inline uint32_t high32_value(uint64_t addr)
return (addr >> 32) & 0x0;
 }
 
+#define XGBE_PMA_PLL_CTRL_MASK BIT(15)
+#define XGBE_PMA_PLL_CTRL_SET  BIT(15)
+#define XGBE_PMA_PLL_CTRL_CLEAR0x
+
 /*END*/
 
 /* Bit setting and getting macros
diff --git a/drivers/net/axgbe/axgbe_phy_impl.c 
b/drivers/net/axgbe/axgbe_phy_impl.c
index 02236ec192..dc9489f0aa 100644
--- a/drivers/net/axgbe/axgbe_phy_impl.c
+++ b/drivers/net/axgbe/axgbe_phy_impl.c
@@ -1196,8 +1196,22 @@ static void axgbe_phy_set_redrv_mode(struct axgbe_port 
*pdata)
axgbe_phy_put_comm_ownership(pdata);
 }
 
+static void axgbe_phy_pll_ctrl(struct axgbe_port *pdata, bool enable)
+{
+   XMDIO_WRITE_BITS(pdata, MDIO_MMD_PMAPMD, MDIO_VEND2_PMA_MISC_CTRL0,
+   XGBE_PMA_PLL_CTRL_MASK,
+   enable ? XGBE_PMA_PLL_CTRL_SET
+   : XGBE_PMA_PLL_CTRL_CLEAR);
+
+   /* Wait for command to complete */
+   rte_delay_us(150);
+}
+
 static void axgbe_phy_start_ratechange(struct axgbe_port *pdata)
 {
+   /* Clear the PLL so that it helps in power down sequence */
+   axgbe_phy_pll_ctrl(pdata, false);
+
/* Log if a previous command did not complete */
if (XP_IOREAD_BITS(pdata, XP_DRIVER_INT_RO, STATUS))
PMD_DRV_LOG(NOTICE, "firmware mailbox not ready for command\n");
@@ -1213,10 +1227,14 @@ static void axgbe_phy_complete_ratechange(struct 
axgbe_port *pdata)
wait = AXGBE_RATECHANGE_COUNT;
while (wait--) {
if (!XP_IOREAD_BITS(pdata, XP_DRIVER_INT_RO, STATUS))
-   return;
-
+   goto reenable_pll;
rte_delay_us(1500);
}
+
+reenable_pll:
+/* Re-enable the PLL control */
+   axgbe_phy_pll_ctrl(pdata, true);
+
PMD_DRV_LOG(NOTICE, "firmware mailbox command did not complete\n");
 }
 
-- 
2.25.1



[PATCH v1 3/6] net/axgbe: simplify mailbox interface rate change code

2022-01-10 Thread ssebasti
From: Selwin Sebastian 

Simplify and centralize the mailbox command rate change interface by
having a single function perform the writes to the mailbox registers
to issue the request.

Signed-off-by: Selwin Sebastian 
---
 drivers/net/axgbe/axgbe_phy_impl.c | 95 --
 1 file changed, 23 insertions(+), 72 deletions(-)

diff --git a/drivers/net/axgbe/axgbe_phy_impl.c 
b/drivers/net/axgbe/axgbe_phy_impl.c
index dc9489f0aa..0894dbf74b 100644
--- a/drivers/net/axgbe/axgbe_phy_impl.c
+++ b/drivers/net/axgbe/axgbe_phy_impl.c
@@ -1207,21 +1207,26 @@ static void axgbe_phy_pll_ctrl(struct axgbe_port 
*pdata, bool enable)
rte_delay_us(150);
 }
 
-static void axgbe_phy_start_ratechange(struct axgbe_port *pdata)
+static void axgbe_phy_perform_ratechange(struct axgbe_port *pdata,
+   unsigned int cmd, unsigned int sub_cmd)
 {
+   unsigned int s0 = 0;
+   unsigned int wait;
/* Clear the PLL so that it helps in power down sequence */
axgbe_phy_pll_ctrl(pdata, false);
 
/* Log if a previous command did not complete */
if (XP_IOREAD_BITS(pdata, XP_DRIVER_INT_RO, STATUS))
PMD_DRV_LOG(NOTICE, "firmware mailbox not ready for command\n");
-   else
-   return;
-}
 
-static void axgbe_phy_complete_ratechange(struct axgbe_port *pdata)
-{
-   unsigned int wait;
+   /* Construct the command */
+   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, COMMAND, cmd);
+   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, sub_cmd);
+
+   /* Issue the command */
+   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_0, s0);
+   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_1, 0);
+   XP_IOWRITE_BITS(pdata, XP_DRIVER_INT_REQ, REQUEST, 1);
 
/* Wait for command to complete */
wait = AXGBE_RATECHANGE_COUNT;
@@ -1240,21 +1245,10 @@ static void axgbe_phy_complete_ratechange(struct 
axgbe_port *pdata)
 
 static void axgbe_phy_rrc(struct axgbe_port *pdata)
 {
-   unsigned int s0;
 
-   axgbe_phy_start_ratechange(pdata);
 
/* Receiver Reset Cycle */
-   s0 = 0;
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, COMMAND, 5);
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, 0);
-
-   /* Call FW to make the change */
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_0, s0);
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_1, 0);
-   XP_IOWRITE_BITS(pdata, XP_DRIVER_INT_REQ, REQUEST, 1);
-
-   axgbe_phy_complete_ratechange(pdata);
+   axgbe_phy_perform_ratechange(pdata, 5, 0);
 
PMD_DRV_LOG(DEBUG, "receiver reset complete\n");
 }
@@ -1263,13 +1257,9 @@ static void axgbe_phy_power_off(struct axgbe_port *pdata)
 {
struct axgbe_phy_data *phy_data = pdata->phy_data;
 
-   axgbe_phy_start_ratechange(pdata);
+   /* Power off */
+   axgbe_phy_perform_ratechange(pdata, 0, 0);
 
-   /* Call FW to make the change */
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_0, 0);
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_1, 0);
-   XP_IOWRITE_BITS(pdata, XP_DRIVER_INT_REQ, REQUEST, 1);
-   axgbe_phy_complete_ratechange(pdata);
phy_data->cur_mode = AXGBE_MODE_UNKNOWN;
 
PMD_DRV_LOG(DEBUG, "phy powered off\n");
@@ -1278,31 +1268,21 @@ static void axgbe_phy_power_off(struct axgbe_port 
*pdata)
 static void axgbe_phy_sfi_mode(struct axgbe_port *pdata)
 {
struct axgbe_phy_data *phy_data = pdata->phy_data;
-   unsigned int s0;
 
axgbe_phy_set_redrv_mode(pdata);
 
-   axgbe_phy_start_ratechange(pdata);
-
/* 10G/SFI */
-   s0 = 0;
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, COMMAND, 3);
if (phy_data->sfp_cable != AXGBE_SFP_CABLE_PASSIVE) {
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, 0);
+   axgbe_phy_perform_ratechange(pdata, 3, 0);
} else {
if (phy_data->sfp_cable_len <= 1)
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, 1);
+   axgbe_phy_perform_ratechange(pdata, 3, 1);
else if (phy_data->sfp_cable_len <= 3)
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, 2);
+   axgbe_phy_perform_ratechange(pdata, 3, 2);
else
-   XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, SUB_COMMAND, 3);
+   axgbe_phy_perform_ratechange(pdata, 3, 3);
}
 
-   /* Call FW to make the change */
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_0, s0);
-   XP_IOWRITE(pdata, XP_DRIVER_SCRATCH_1, 0);
-   XP_IOWRITE_BITS(pdata, XP_DRIVER_INT_REQ, REQUEST, 1);
-   axgbe_phy_complete_ratechange(pdata);
phy_data->cur_mode = AXGBE_MODE_SFI;
 
PMD_DRV_LOG(DEBUG, "10GbE SFI mode set\n");
@@ -1311,22 +1291,11 @@ static void axgbe_phy_sfi_mode(struct axgbe_port *pdata)
 static void axgbe_phy_kr_mode(struct axgbe_port *pdata)
 {
struct axgbe_phy_data *phy_data = pdata->phy_data;
-   unsigned int

[PATCH v1 6/6] net/axgbe: alter the port speed bit range

2022-01-10 Thread ssebasti
From: Selwin Sebastian 

Newer generation Hardware uses the slightly different
port speed bit widths, so alter the existing port speed
bit range to extend support to the newer generation hardware
while maintaining the backward compatibility with older
generation hardware.

The previously reserved bits are now being used which
then requires the adjustment to the BIT values, e.g.:

Before:
   PORT_PROPERTY_0[22:21] - Reserved
   PORT_PROPERTY_0[26:23] - Supported Speeds

After:
   PORT_PROPERTY_0[21] - Reserved
   PORT_PROPERTY_0[26:22] - Supported Speeds

To make this backwards compatible, the existing BIT
definitions for the port speeds are incremented by one
to maintain the original position.

Signed-off-by: Selwin Sebastian 
---
 drivers/net/axgbe/axgbe_common.h   | 4 ++--
 drivers/net/axgbe/axgbe_phy_impl.c | 8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/axgbe/axgbe_common.h b/drivers/net/axgbe/axgbe_common.h
index a5431dd998..5310ac54f5 100644
--- a/drivers/net/axgbe/axgbe_common.h
+++ b/drivers/net/axgbe/axgbe_common.h
@@ -1032,8 +1032,8 @@
 #define XP_PROP_0_PORT_ID_WIDTH8
 #define XP_PROP_0_PORT_MODE_INDEX  8
 #define XP_PROP_0_PORT_MODE_WIDTH  4
-#define XP_PROP_0_PORT_SPEEDS_INDEX23
-#define XP_PROP_0_PORT_SPEEDS_WIDTH4
+#define XP_PROP_0_PORT_SPEEDS_INDEX22
+#define XP_PROP_0_PORT_SPEEDS_WIDTH5
 #define XP_PROP_1_MAX_RX_DMA_INDEX 24
 #define XP_PROP_1_MAX_RX_DMA_WIDTH 5
 #define XP_PROP_1_MAX_RX_QUEUES_INDEX  8
diff --git a/drivers/net/axgbe/axgbe_phy_impl.c 
b/drivers/net/axgbe/axgbe_phy_impl.c
index 2aad8babd2..776144696a 100644
--- a/drivers/net/axgbe/axgbe_phy_impl.c
+++ b/drivers/net/axgbe/axgbe_phy_impl.c
@@ -7,10 +7,10 @@
 #include "axgbe_common.h"
 #include "axgbe_phy.h"
 
-#define AXGBE_PHY_PORT_SPEED_100   BIT(0)
-#define AXGBE_PHY_PORT_SPEED_1000  BIT(1)
-#define AXGBE_PHY_PORT_SPEED_2500  BIT(2)
-#define AXGBE_PHY_PORT_SPEED_1 BIT(3)
+#define AXGBE_PHY_PORT_SPEED_100   BIT(1)
+#define AXGBE_PHY_PORT_SPEED_1000  BIT(2)
+#define AXGBE_PHY_PORT_SPEED_2500  BIT(3)
+#define AXGBE_PHY_PORT_SPEED_1 BIT(4)
 
 #define AXGBE_MUTEX_RELEASE0x8000
 
-- 
2.25.1



[PATCH v1 4/6] net/axgbe: reset PHY Rx when mailbox command timeout

2022-01-10 Thread ssebasti
From: Selwin Sebastian 

Sometimes mailbox commands time out when the RX data path becomes
unresponsive. This prevents the submission of new mailbox commands
to DXIO. This patch identifies the timeout and resets the RX data
path so that the next message can be submitted properly.

Signed-off-by: Selwin Sebastian 
---
 drivers/net/axgbe/axgbe_common.h   | 14 ++
 drivers/net/axgbe/axgbe_phy_impl.c | 29 -
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/drivers/net/axgbe/axgbe_common.h b/drivers/net/axgbe/axgbe_common.h
index 5a7ac35b6a..a5431dd998 100644
--- a/drivers/net/axgbe/axgbe_common.h
+++ b/drivers/net/axgbe/axgbe_common.h
@@ -1270,10 +1270,18 @@
 #define MDIO_PMA_10GBR_FECCTRL 0x00ab
 #endif
 
+#ifndef MDIO_PMA_RX_CTRL1
+#define MDIO_PMA_RX_CTRL1  0x8051
+#endif
+
 #ifndef MDIO_PCS_DIG_CTRL
 #define MDIO_PCS_DIG_CTRL  0x8000
 #endif
 
+#ifndef MDIO_PCS_DIGITAL_STAT
+#define MDIO_PCS_DIGITAL_STAT  0x8010
+#endif
+
 #ifndef MDIO_AN_XNP
 #define MDIO_AN_XNP0x0016
 #endif
@@ -1354,6 +1362,8 @@
 #define AXGBE_KR_TRAINING_ENABLE   BIT(1)
 
 #define AXGBE_PCS_CL37_BP  BIT(12)
+#define XGBE_PCS_PSEQ_STATE_MASK   0x1c
+#define XGBE_PCS_PSEQ_STATE_POWER_GOOD 0x10
 
 #define AXGBE_AN_CL37_INT_CMPLTBIT(0)
 #define AXGBE_AN_CL37_INT_MASK 0x01
@@ -1401,6 +1411,10 @@ static inline uint32_t high32_value(uint64_t addr)
 #define XGBE_PMA_PLL_CTRL_SET  BIT(15)
 #define XGBE_PMA_PLL_CTRL_CLEAR0x
 
+#define XGBE_PMA_RX_RST_0_MASK BIT(4)
+#define XGBE_PMA_RX_RST_0_RESET_ON 0x10
+#define XGBE_PMA_RX_RST_0_RESET_OFF0x00
+
 /*END*/
 
 /* Bit setting and getting macros
diff --git a/drivers/net/axgbe/axgbe_phy_impl.c 
b/drivers/net/axgbe/axgbe_phy_impl.c
index 0894dbf74b..e52dbb9585 100644
--- a/drivers/net/axgbe/axgbe_phy_impl.c
+++ b/drivers/net/axgbe/axgbe_phy_impl.c
@@ -1196,6 +1196,28 @@ static void axgbe_phy_set_redrv_mode(struct axgbe_port 
*pdata)
axgbe_phy_put_comm_ownership(pdata);
 }
 
+static void axgbe_phy_rx_reset(struct axgbe_port *pdata)
+{
+   int reg;
+
+   reg = XMDIO_READ_BITS(pdata, MDIO_MMD_PCS, MDIO_PCS_DIGITAL_STAT,
+ XGBE_PCS_PSEQ_STATE_MASK);
+   if (reg == XGBE_PCS_PSEQ_STATE_POWER_GOOD) {
+   /* Mailbox command timed out, reset of RX block is required.
+* This can be done by asserting the reset bit and waiting for
+* its completion.
+*/
+   XMDIO_WRITE_BITS(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_RX_CTRL1,
+XGBE_PMA_RX_RST_0_MASK, 
XGBE_PMA_RX_RST_0_RESET_ON);
+   rte_delay_us(20);
+   XMDIO_WRITE_BITS(pdata, MDIO_MMD_PMAPMD, MDIO_PMA_RX_CTRL1,
+XGBE_PMA_RX_RST_0_MASK, 
XGBE_PMA_RX_RST_0_RESET_OFF);
+   rte_delay_us(45);
+   PMD_DRV_LOG(ERR, "firmware mailbox reset performed\n");
+   }
+}
+
+
 static void axgbe_phy_pll_ctrl(struct axgbe_port *pdata, bool enable)
 {
XMDIO_WRITE_BITS(pdata, MDIO_MMD_PMAPMD, MDIO_VEND2_PMA_MISC_CTRL0,
@@ -1216,8 +1238,10 @@ static void axgbe_phy_perform_ratechange(struct 
axgbe_port *pdata,
axgbe_phy_pll_ctrl(pdata, false);
 
/* Log if a previous command did not complete */
-   if (XP_IOREAD_BITS(pdata, XP_DRIVER_INT_RO, STATUS))
+   if (XP_IOREAD_BITS(pdata, XP_DRIVER_INT_RO, STATUS)) {
PMD_DRV_LOG(NOTICE, "firmware mailbox not ready for command\n");
+   axgbe_phy_rx_reset(pdata);
+   }
 
/* Construct the command */
XP_SET_BITS(s0, XP_DRIVER_SCRATCH_0, COMMAND, cmd);
@@ -1235,6 +1259,9 @@ static void axgbe_phy_perform_ratechange(struct 
axgbe_port *pdata,
goto reenable_pll;
rte_delay_us(1500);
}
+   PMD_DRV_LOG(NOTICE, "firmware mailbox command did not complete\n");
+   /* Reset on error */
+   axgbe_phy_rx_reset(pdata);
 
 reenable_pll:
 /* Re-enable the PLL control */
-- 
2.25.1



[PATCH v1 5/6] net/axgbe: add support for new port mode

2022-01-10 Thread ssebasti
From: Selwin Sebastian 

Add support for a new port mode that is a backplane
connection without support for auto-negotiation.

Signed-off-by: Selwin Sebastian 
---
 drivers/net/axgbe/axgbe_phy_impl.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/axgbe/axgbe_phy_impl.c 
b/drivers/net/axgbe/axgbe_phy_impl.c
index e52dbb9585..2aad8babd2 100644
--- a/drivers/net/axgbe/axgbe_phy_impl.c
+++ b/drivers/net/axgbe/axgbe_phy_impl.c
@@ -46,6 +46,7 @@ enum axgbe_port_mode {
AXGBE_PORT_MODE_10GBASE_T,
AXGBE_PORT_MODE_10GBASE_R,
AXGBE_PORT_MODE_SFP,
+   AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG,
AXGBE_PORT_MODE_MAX,
 };
 
@@ -885,6 +886,7 @@ static enum axgbe_mode axgbe_phy_an73_redrv_outcome(struct 
axgbe_port *pdata)
if (ad_reg & 0x80) {
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
mode = AXGBE_MODE_KR;
break;
default:
@@ -894,6 +896,7 @@ static enum axgbe_mode axgbe_phy_an73_redrv_outcome(struct 
axgbe_port *pdata)
} else if (ad_reg & 0x20) {
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
mode = AXGBE_MODE_KX_1000;
break;
case AXGBE_PORT_MODE_1000BASE_X:
@@ -1052,6 +1055,7 @@ static unsigned int axgbe_phy_an_advertising(struct 
axgbe_port *pdata)
 
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
advertising |= ADVERTISED_1baseKR_Full;
break;
case AXGBE_PORT_MODE_BACKPLANE_2500:
@@ -1122,6 +1126,7 @@ static enum axgbe_an_mode axgbe_phy_an_mode(struct 
axgbe_port *pdata)
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
return AXGBE_AN_MODE_CL73;
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
case AXGBE_PORT_MODE_BACKPLANE_2500:
return AXGBE_AN_MODE_NONE;
case AXGBE_PORT_MODE_1000BASE_T:
@@ -1400,6 +1405,7 @@ static enum axgbe_mode axgbe_phy_switch_mode(struct 
axgbe_port *pdata)
 
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
return axgbe_phy_switch_bp_mode(pdata);
case AXGBE_PORT_MODE_BACKPLANE_2500:
return axgbe_phy_switch_bp_2500_mode(pdata);
@@ -1495,6 +1501,7 @@ static enum axgbe_mode axgbe_phy_get_mode(struct 
axgbe_port *pdata,
 
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
return axgbe_phy_get_bp_mode(speed);
case AXGBE_PORT_MODE_BACKPLANE_2500:
return axgbe_phy_get_bp_2500_mode(speed);
@@ -1644,6 +1651,7 @@ static bool axgbe_phy_use_mode(struct axgbe_port *pdata, 
enum axgbe_mode mode)
 
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
return axgbe_phy_use_bp_mode(pdata, mode);
case AXGBE_PORT_MODE_BACKPLANE_2500:
return axgbe_phy_use_bp_2500_mode(pdata, mode);
@@ -1806,6 +1814,7 @@ static bool axgbe_phy_port_mode_mismatch(struct 
axgbe_port *pdata)
 
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
if ((phy_data->port_speeds & AXGBE_PHY_PORT_SPEED_1000) ||
(phy_data->port_speeds & AXGBE_PHY_PORT_SPEED_1))
return false;
@@ -1858,6 +1867,7 @@ static bool axgbe_phy_conn_type_mismatch(struct 
axgbe_port *pdata)
 
switch (phy_data->port_mode) {
case AXGBE_PORT_MODE_BACKPLANE:
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
case AXGBE_PORT_MODE_BACKPLANE_2500:
if (phy_data->conn_type == AXGBE_CONN_TYPE_BACKPLANE)
return false;
@@ -2122,6 +2132,8 @@ static int axgbe_phy_init(struct axgbe_port *pdata)
/* Backplane support */
case AXGBE_PORT_MODE_BACKPLANE:
pdata->phy.supported |= SUPPORTED_Autoneg;
+   /*fallthrough;*/
+   case AXGBE_PORT_MODE_BACKPLANE_NO_AUTONEG:
pdata->phy.supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
pdata->phy.supported |= SUPPORTED_Backplane;
if (phy_data->port_speeds & AXGBE_PHY_PORT_SPEED_1000) {
-- 
2.25.1



RE: [PATCH] dma/idxd: fix burst capacity calculation

2022-01-10 Thread Pai G, Sunil
Hi Bruce, Kevin

This patch seems to have uncovered a bug in the driver.
On applying it, the idxd_burst_capacity API seems to return 0 even in cases
where there are batch descriptors and ring space available.
It seems a wraparound is missing when calculating the descriptor ring space,
causing this behavior.

Below change seems to fix the issue.

@@ -483,7 +496,7 @@ idxd_burst_capacity(const void *dev_private, uint16_t vchan __rte_unused)
	/* For descriptors, check for wrap-around on write but not read */
	if (idxd->ids_returned > write_idx)
		write_idx += idxd->desc_ring_mask + 1;
-	used_space = write_idx - idxd->ids_returned;
+	used_space = (write_idx - idxd->ids_returned) & idxd->desc_ring_mask;



Could we include this fix in the current patch ?

Thanks and regards,
Sunil


Re: [PATCH] dma/idxd: fix burst capacity calculation

2022-01-10 Thread Bruce Richardson
On Mon, Jan 10, 2022 at 01:09:02PM +, Pai G, Sunil wrote:
> Hi Bruce, Kevin
> 
> This patch seems to have uncovered a bug in the driver.
> On applying, the idxd_burst_capacity API seems to return 0 for cases even 
> when there are batch descriptors and ring space available.
> Seems like there is a wraparound missing when calculating the descriptor ring 
> space, causing this behavior.
> 
> Below change seems to fix the issue.
> 
> @@ -483,7 +496,7 @@ idxd_burst_capacity(const void *dev_private, uint16_t 
> vchan __rte_unused)
> /* For descriptors, check for wrap-around on write but not read */
> if (idxd->ids_returned > write_idx)
> write_idx += idxd->desc_ring_mask + 1;
> -   used_space = write_idx - idxd->ids_returned;
> +   used_space = (write_idx - idxd->ids_returned)&idxd->desc_ring_mask;
> 
> 
> 
> Could we include this fix in the current patch ?
>
Hi Sunil,

what values for the write_idx and ids_returned vars give this error, and
how does masking help? I'd expect masking to increase the number of times
the function returns zero, rather than decreasing it.

/Bruce 


RE: [PATCH] dma/idxd: fix burst capacity calculation

2022-01-10 Thread Pai G, Sunil
Hi Bruce,

> what values for the write_idx and ids_returned vars give this error, and how
> does masking help? I'd expect masking to increase the number of times the
> function returns zero, rather than decreasing it.


Here are the values from the idxd dump:
dev_capa: 0x50051 - mem2mem sva handles_errors copy fill
max_vchans_supported: 1
nb_vchans_configured: 1
silent_mode: off
 IDXD Private Data ==
Portal: 0x77ffb000
Config: { ring_size: 4096 }
Batch ring (sz = 129, max_batches = 128):
62370  62402  62434  62466  62498  62530  62562  62594  62626  62658  62690  
62722  62754  62786  62818  62850  62882  62914  62946  62978  63010  63042  
63074  6
3106  63138  63170  63202  63234  63266  63298  63330  63362  63394  63426  
63458  63490  63522  63554  63586  63618  63650  63682  63714  63746  63778  
63810  63842  63874  63906  63938  63970  64002  64034  64066  64098  6413
0  64162  64194  64226  64258  64290  64322  64354  64386  64418  64450  64482  
64514  64546  64578  64610  64642  64674  64706  64738  64770  64802  64834  
64866  64898  64930  64962  64994  65026  65058  65090  65122  65154  
65186  65218  65250  65282  65314  65346  65378  65410  65442  65474  65506 [rd 
ptr]  2 [wr ptr]  61442  61474  61506  61538  61570  61602  61634  61666  61698 
 61730  61762  61794  61826  61858  61890  61922  61954  61986  620
18  62050  62082  62114  62146  62178  62210  62242  62274  62306  62338
  Curr batch: start = 2, size = 0
  IDS: avail = 65506, returned: 65506
max packets per batch from hw: 1024
batches left: 127, ring space left: 4064
idxd->desc_ring_mask: 4095, used_space: 4128, used_space: 4128, 
idxd->max_batch_size: 1024, idxd->batch_size: 0
write_idx: 4098, idxd->batch_idx_read: 98, idxd->batch_idx_write: 99, 
idxd->desc_ring_mask - used_space: 65503

relevant data from above:
write_idx: 4098 , IDS returned: 65506, idxd->desc_ring_mask: 4095

without the fix :
used_space = write_idx - idxd->ids_returned; (4098 - 65506)   = -61408

with fix: 
used_space = (write_idx - idxd->ids_returned)& idxd->desc_ring_mask; (4098 - 
65506)&4095   = 32 
which seems to match the rd ptr and wr ptr.

Thanks and regards,
Sunil
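
For reference, the same arithmetic as a minimal standalone sketch (values taken from the dump above; the 16-bit types mirror the driver's counters):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Values from the dump above. */
	const uint16_t write_idx = 4098;
	const uint16_t ids_returned = 65506;
	const uint16_t mask = 4095;

	/* Unsigned 16-bit subtraction wraps to 4128, larger than the ring mask. */
	uint16_t unmasked = write_idx - ids_returned;
	/* Masking brings it back into ring-offset range: 32, matching the
	 * distance between the rd ptr and wr ptr in the dump. */
	uint16_t masked = (write_idx - ids_returned) & mask;

	printf("unmasked=%u masked=%u\n", unmasked, masked);
	return 0;
}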


RE: [PATCH v3 2/2] examples/l3fwd: add config file support for EM

2022-01-10 Thread Ananyev, Konstantin


 
> Add support to define ipv4 and ipv6 forwarding tables
> from reading from a config file for EM with a format
> similar to l3fwd-acl one.
> 
> With the removal of the hardcoded route tables for IPv4
> and IPv6 from 'l3fwd_em', these routes have been moved
> to a separate default config file for use with EM.
> 
> Related l3fwd docs have been updated to reflect these
> changes.
> 
> Signed-off-by: Sean Morrissey 
> Signed-off-by: Ravi Kerur 
> ---

Acked-by: Konstantin Ananyev 

> 2.25.1



RE: [dpdk-dev] [PATCH] ring: fix error return value when creating ring

2022-01-10 Thread Ananyev, Konstantin
> The error value set in rte_errno by rte_ring_create_elem() should be a
> positive integer. However, if the rte_ring_get_memsize_elem() function
> fails, the negative number it returns is assigned to rte_errno directly.
> As a result, external callers that check rte_errno (such as
> rte_mempool_create()) will fail to interpret the error correctly.
> 
> Fixes: a182620042aa ("ring: get size in memory")
> Cc: sta...@dpdk.org
> 
> Reported-by: Nan Zhou 
> Signed-off-by: Yunjian Wang 
> ---
>  lib/ring/rte_ring.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/ring/rte_ring.c b/lib/ring/rte_ring.c
> index f17bd966be..185f9be798 100644
> --- a/lib/ring/rte_ring.c
> +++ b/lib/ring/rte_ring.c
> @@ -267,7 +267,7 @@ rte_ring_create_elem(const char *name, unsigned int 
> esize, unsigned int count,
> 
>   ring_size = rte_ring_get_memsize_elem(esize, count);
>   if (ring_size < 0) {
> - rte_errno = ring_size;
> + rte_errno = -ring_size;
>   return NULL;
>   }
> 
> --

Acked-by: Konstantin Ananyev 

> 2.27.0
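
A minimal caller-side sketch of why rte_errno must hold a positive errno value (the creation parameters below are arbitrary, illustrative only):

#include <stdio.h>
#include <rte_ring.h>
#include <rte_errno.h>

static void create_ring_example(void)
{
	/* esize must be a multiple of 4; the values here are arbitrary. */
	struct rte_ring *r = rte_ring_create_elem("demo_ring", 8, 1024,
						  SOCKET_ID_ANY, 0);

	/* rte_strerror() expects a positive errno value in rte_errno,
	 * which is what the fix above guarantees on failure. */
	if (r == NULL)
		printf("ring creation failed: %s\n", rte_strerror(rte_errno));
}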



RE: [PATCH] ring: update Doxygen comments re RING_F_EXACT_SZ

2022-01-10 Thread Ananyev, Konstantin



> - Add RING_F_EXACT_SZ description to rte_ring_init and
>   rte_ring_create param comments.
> - Fix ring size comments.
> 
> Signed-off-by: Robert Sanford 
> ---

Acked-by: Konstantin Ananyev 

> 2.7.4



Re: [PATCH v1] maintainers: update for testpmd

2022-01-10 Thread Ferruh Yigit

On 1/7/2022 9:10 AM, Li, Xiaoyun wrote:

-Original Message-
From: Zhang, Yuying 
Sent: Friday, January 7, 2022 16:54
To: dev@dpdk.org; Li, Xiaoyun ; Yigit, Ferruh
; tho...@monjalon.net
Cc: Zhang, Yuying 
Subject: [PATCH v1] maintainers: update for testpmd

Add Yuying Zhang as a co-maintainer.

Signed-off-by: Yuying Zhang 


Acked-by: Xiaoyun Li 



Acked-by: Ferruh Yigit 


Thanks for volunteering.



+1


---
  MAINTAINERS | 1 +
  1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 18d9edaf88..852595fe91 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1694,6 +1694,7 @@ F: app/test/sample_packet_forward.h  Networking
drivers testing tool
  M: Xiaoyun Li 
  M: Aman Singh 
+M: Yuying Zhang 
  T: git://dpdk.org/next/dpdk-next-net
  F: app/test-pmd/
  F: doc/guides/testpmd_app_ug/
--
2.25.1






RE: Understanding Flow API action RSS

2022-01-10 Thread Ivan Malov

Hi Ori,

Many-many thanks for your commentary.

The nature of 'queue' array in flow action RSS is clear now.
I hope PMD vendors and API users share this vision, too.
Probably, this should be properly documented.
We'll see what we can do in that direction.

Please see one more question below.

On Mon, 10 Jan 2022, Ori Kam wrote:


Hi Ivan,


-Original Message-
From: Ivan Malov 
Sent: Sunday, January 9, 2022 3:03 PM
Subject: RE: Understanding Flow API action RSS

Hi Ori,

On Sun, 9 Jan 2022, Ori Kam wrote:


Hi Stephen and Ivan


-Original Message-
From: Stephen Hemminger 
Sent: Tuesday, January 4, 2022 11:56 PM
Subject: Re: Understanding Flow API action RSS

On Tue, 4 Jan 2022 21:29:14 +0300 (MSK)
Ivan Malov  wrote:


Hi Stephen,

On Tue, 4 Jan 2022, Stephen Hemminger wrote:


On Tue, 04 Jan 2022 13:41:55 +0100
Thomas Monjalon  wrote:


+Cc Ori Kam, rte_flow maintainer

29/12/2021 15:34, Ivan Malov:

Hi all,

In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it, 'queue' is
to provide "Queue indices to use". But it is unclear whether the order of
elements is meaningful or not. Does that matter? Can queue indices repeat?


The order probably doesn't matter, it is like the RSS indirection table.


Sorry, but RSS indirection table (RETA) assumes some structure. In it,
queue indices can repeat, and the order is meaningful. In DPDK, RETA
may comprise multiple "groups", each one comprising 64 entries.

This 'queue' array in flow action RSS does not stick with the same
terminology, it does not reuse the definition of RETA "group", etc.
Just "queue indices to use". No definition of order, no structure.

The API contract is not clear. Neither to users, nor to PMDs.


From API in RSS the queues are simply the queue ID, order doesn't matter,

Duplicating the queue may affect the spread based on the HW/PMD.
In common case each queue should appear only once and the PMD may duplicate
entries to get the best performance.


Look. In a DPDK PMD, one has "global" RSS table. Consider the following
example: 0, 0, 1, 1, 2, 2, 3, 3 ... and so on. As you may see, queue
indices may repeat. They may have different order: 1, 1, 0, 0, ... .
The order is of great importance. If you send a packet to a
DPDK-powered server, you can know in advance its hash value.
Hence, you may strictly predict which RSS table entry this
hash will point at. That predicts the target Rx queue.

So the questions which one should attempt to clarify, are as follows:
1) Is the 'queue' array ordered? (Does the order of elements matter?)
2) Can its elements repeat? (*allowed* or *not allowed*?)

From API point of view the array is ordered, and may have repeating elements.






   rx queue = RSS_indirection_table[ RSS_hash_value % RSS_indirection_table_size ]

So you could play with multiple queues matching same hash value, but that
would be uncommon.


An ethdev may have "global" RSS setting with an indirection table of some
fixed size (say, 512). In what comes to flow rules, does that size matter?


Global RSS is only used if the incoming packet does not match any rte_flow
action. If there is a a RTE_FLOW_ACTION_TYPE_QUEUE or RTE_FLOW_ACTION_TYPE_RSS
these take precedence.


Yes, I know all of that. The question is how does the PMD select RETA size
for this action? Can it select an arbitrary value? Or should it stick with
the "global" one (eg. 512)? How does the user know the table size?

If the user simply wants to spread traffic across the given queues,
the effective table size is a don't care to them, and the existing
API contract is fine. But if the user expects that certain packets
hit some precise queues, they need to know the table size for that.


Just like you said RSS simply spread the traffic to the given queues.


Yes, to the given queues. The question is whether the 'queue' array
has RETA properties (order matters; elements can repeat) or not.



Yes order matters and elements can repeat.


If application wants to send traffic to some queue it should use the queue 
action.


Yes, but that's not what I mean. Consider the following example. The user
generates packets with random IP addresses at machine A. These packets
hit DPDK at machine B. For a given *packet*, the sender (A) can
compute its RSS hash in software. This will point out the RETA
entry index. But, in order to predict the exact *queue* index,
the sender has to know the table (its contents, its size).
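
A minimal sketch of the prediction being described (illustrative only; for action RSS the 'reta' contents and 'reta_size' below are exactly the unknowns under discussion, and the hash would come from a software Toeplitz computation such as rte_softrss()):

#include <stdint.h>

static inline uint16_t
predict_rx_queue(uint32_t hash, const uint16_t *reta, uint32_t reta_size)
{
	/* Same formula as quoted earlier in the thread:
	 * rx queue = RSS_indirection_table[ hash % table_size ] */
	return reta[hash % reta_size];
}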


Why do application need this info?


For a "global" DPDK RSS setting, the table can be easily obtained with
an ethdev callback / API. Very simple. Fixed-size table, and it can
be queried. But how does one obtain similar knowledge for RSS action?


The RSS action was designed to allow balanced traffic spread.
The size of the reta is PMD dependent, in some PMD the size will be
the number of queues in others it will be the number of queues but in
power of 2, so if the app requested 8 queues the reta will also be 8.
In any case PMD should use the given order, if 

RE: [PATCH 1/1] ring: fix off by 1 mistake

2022-01-10 Thread Ananyev, Konstantin
> When enqueueing/dequeueing to/from the ring we try to optimize by manual
> loop unrolling.  The check for this optimization looks like:
> 
>   if (likely(idx + n < size)) {
> 
> where 'idx' points to the first usable element (empty slot for enqueue,
> data for dequeue).  The correct comparison here should be '<=' instead
> of '<'.
> 
> This is not a functional error since we fall back to the loop with
> correct checks on indexes.  Just a minor suboptimal behaviour for the
> case when we want to enqueue/dequeue exactly the number of elements that
> we have in the ring before wrapping to its beginning.
> 
> Signed-off-by: Andrzej Ostruszka 
> ---
>  lib/ring/rte_ring_elem_pvt.h | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/ring/rte_ring_elem_pvt.h b/lib/ring/rte_ring_elem_pvt.h
> index 275ec55393..83788c56e6 100644
> --- a/lib/ring/rte_ring_elem_pvt.h
> +++ b/lib/ring/rte_ring_elem_pvt.h
> @@ -17,7 +17,7 @@ __rte_ring_enqueue_elems_32(struct rte_ring *r, const 
> uint32_t size,
>   unsigned int i;
>   uint32_t *ring = (uint32_t *)&r[1];
>   const uint32_t *obj = (const uint32_t *)obj_table;
> - if (likely(idx + n < size)) {
> + if (likely(idx + n <= size)) {
>   for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
>   ring[idx] = obj[i];
>   ring[idx + 1] = obj[i + 1];
> @@ -62,7 +62,7 @@ __rte_ring_enqueue_elems_64(struct rte_ring *r, uint32_t 
> prod_head,
>   uint32_t idx = prod_head & r->mask;
>   uint64_t *ring = (uint64_t *)&r[1];
>   const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table;
> - if (likely(idx + n < size)) {
> + if (likely(idx + n <= size)) {
>   for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
>   ring[idx] = obj[i];
>   ring[idx + 1] = obj[i + 1];
> @@ -95,7 +95,7 @@ __rte_ring_enqueue_elems_128(struct rte_ring *r, uint32_t 
> prod_head,
>   uint32_t idx = prod_head & r->mask;
>   rte_int128_t *ring = (rte_int128_t *)&r[1];
>   const rte_int128_t *obj = (const rte_int128_t *)obj_table;
> - if (likely(idx + n < size)) {
> + if (likely(idx + n <= size)) {
>   for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
>   memcpy((void *)(ring + idx),
>   (const void *)(obj + i), 32);
> @@ -151,7 +151,7 @@ __rte_ring_dequeue_elems_32(struct rte_ring *r, const 
> uint32_t size,
>   unsigned int i;
>   uint32_t *ring = (uint32_t *)&r[1];
>   uint32_t *obj = (uint32_t *)obj_table;
> - if (likely(idx + n < size)) {
> + if (likely(idx + n <= size)) {
>   for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
>   obj[i] = ring[idx];
>   obj[i + 1] = ring[idx + 1];
> @@ -196,7 +196,7 @@ __rte_ring_dequeue_elems_64(struct rte_ring *r, uint32_t 
> prod_head,
>   uint32_t idx = prod_head & r->mask;
>   uint64_t *ring = (uint64_t *)&r[1];
>   unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table;
> - if (likely(idx + n < size)) {
> + if (likely(idx + n <= size)) {
>   for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
>   obj[i] = ring[idx];
>   obj[i + 1] = ring[idx + 1];
> @@ -229,7 +229,7 @@ __rte_ring_dequeue_elems_128(struct rte_ring *r, uint32_t 
> prod_head,
>   uint32_t idx = prod_head & r->mask;
>   rte_int128_t *ring = (rte_int128_t *)&r[1];
>   rte_int128_t *obj = (rte_int128_t *)obj_table;
> - if (likely(idx + n < size)) {
> + if (likely(idx + n <= size)) {
>   for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
>   memcpy((void *)(obj + i), (void *)(ring + idx), 32);
>   switch (n & 0x1) {
> --

Acked-by: Konstantin Ananyev 

> 2.34.1.448.ga2b2bfdf31-goog
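
A concrete boundary case (illustrative numbers): with size = 1024, idx = 1020 and n = 4, idx + n == size, and the unrolled copy touches ring[1020]..ring[1023] -- all valid slots -- so the fast path is safe and '<=' is the right comparison; with the old '<' check this exact-fit case needlessly fell through to the slower wrap-handling loop.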



Re: [PATCH] dma/idxd: fix burst capacity calculation

2022-01-10 Thread Bruce Richardson
On Mon, Jan 10, 2022 at 01:44:06PM +, Pai G, Sunil wrote:
> Hi Bruce,
> 
> > what values for the write_idx and ids_returned vars give this error, and how
> > does masking help? I'd expect masking to increase the number of times the
> > function returns zero, rather than decreasing it.
> 
> 
> Here are the values from the idxd dump:
> dev_capa: 0x50051 - mem2mem sva handles_errors copy fill
> max_vchans_supported: 1
> nb_vchans_configured: 1
> silent_mode: off
>  IDXD Private Data ==
> Portal: 0x77ffb000
> Config: { ring_size: 4096 }
> Batch ring (sz = 129, max_batches = 128):
> 62370  62402  62434  62466  62498  62530  62562  62594  62626  62658  62690  
> 62722  62754  62786  62818  62850  62882  62914  62946  62978  63010  63042  
> 63074  6
> 3106  63138  63170  63202  63234  63266  63298  63330  63362  63394  63426  
> 63458  63490  63522  63554  63586  63618  63650  63682  63714  63746  63778  
> 63810  63842  63874  63906  63938  63970  64002  64034  64066  64098  6413
> 0  64162  64194  64226  64258  64290  64322  64354  64386  64418  64450  
> 64482  64514  64546  64578  64610  64642  64674  64706  64738  64770  64802  
> 64834  64866  64898  64930  64962  64994  65026  65058  65090  65122  65154
> 65186  65218  65250  65282  65314  65346  65378  65410  65442  65474  65506 
> [rd ptr]  2 [wr ptr]  61442  61474  61506  61538  61570  61602  61634  61666  
> 61698  61730  61762  61794  61826  61858  61890  61922  61954  61986  620
> 18  62050  62082  62114  62146  62178  62210  62242  62274  62306  62338
>   Curr batch: start = 2, size = 0
>   IDS: avail = 65506, returned: 65506
> max packets per batch from hw: 1024
> batches left: 127, ring space left: 4064
> idxd->desc_ring_mask: 4095, used_space: 4128, used_space: 4128, 
> idxd->max_batch_size: 1024, idxd->batch_size: 0
> write_idx: 4098, idxd->batch_idx_read: 98, idxd->batch_idx_write: 99, 
> idxd->desc_ring_mask - used_space: 65503
> 
> relevant data from above:
> write_idx: 4098 , IDS returned: 65506, idxd->desc_ring_mask: 4095
> 
> without the fix :
> used_space = write_idx - idxd->ids_returned; (4098 - 65506)   = -61408
> 
> with fix:
> used_space = (write_idx - idxd->ids_returned)& idxd->desc_ring_mask; (4098 - 
> 65506)&4095   = 32
> which seems to match the rd ptr and wr ptr.
> 
Thanks, Sunil, that's clear now.
Rather than clamping at the end, I think it may be more logical to clamp
the ids_returned value at the start instead. How about the following diff -
does that also fix it for you?

/Bruce

--- a/drivers/dma/idxd/idxd_common.c
+++ b/drivers/dma/idxd/idxd_common.c
@@ -472,6 +472,7 @@ uint16_t
 idxd_burst_capacity(const void *dev_private, uint16_t vchan __rte_unused)
 {
const struct idxd_dmadev *idxd = dev_private;
+   const uint16_t read_idx = idxd->ids_returned & idxd->desc_ring_mask;
uint16_t write_idx = idxd->batch_start + idxd->batch_size;
uint16_t used_space;
 
@@ -481,9 +482,9 @@ idxd_burst_capacity(const void *dev_private, uint16_t vchan 
__rte_unused)
return 0;
 
/* For descriptors, check for wrap-around on write but not read */
-   if (idxd->ids_returned > write_idx)
+   if (read_idx > write_idx)
write_idx += idxd->desc_ring_mask + 1;
-   used_space = write_idx - idxd->ids_returned;
+   used_space = write_idx - read_idx;
 
return RTE_MIN((idxd->desc_ring_mask - used_space), 
idxd->max_batch_size);
 }
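
As a quick cross-check against the same dump values (arithmetic only): read_idx = 65506 & 4095 = 4066; since 4066 <= 4098 there is no wrap adjustment, and used_space = 4098 - 4066 = 32 -- the same result as the masked calculation above.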



RE: Understanding Flow API action RSS

2022-01-10 Thread Ori Kam
Hi Ivan,


> -Original Message-
> From: Ivan Malov 
> Subject: RE: Understanding Flow API action RSS
> 
> Hi Ori,
> 
> Many-many thanks for your commentary.
> 
> The nature of 'queue' array in flow action RSS is clear now.
> I hope PMD vendors and API users share this vision, too.
> Propably, this should be properly documented.
> We'll see what we cad do in that direction.
> 
> Please see one more question below.
> 
> On Mon, 10 Jan 2022, Ori Kam wrote:
> 
> > Hi Ivan,
> >
> >> -Original Message-
> >> From: Ivan Malov 
> >> Sent: Sunday, January 9, 2022 3:03 PM
> >> Subject: RE: Understanding Flow API action RSS
> >>
> >> Hi Ori,
> >>
> >> On Sun, 9 Jan 2022, Ori Kam wrote:
> >>
> >>> Hi Stephen and Ivan
> >>>
>  -Original Message-
>  From: Stephen Hemminger 
>  Sent: Tuesday, January 4, 2022 11:56 PM
>  Subject: Re: Understanding Flow API action RSS
> 
>  On Tue, 4 Jan 2022 21:29:14 +0300 (MSK)
>  Ivan Malov  wrote:
> 
> > Hi Stephen,
> >
> > On Tue, 4 Jan 2022, Stephen Hemminger wrote:
> >
> >> On Tue, 04 Jan 2022 13:41:55 +0100
> >> Thomas Monjalon  wrote:
> >>
> >>> +Cc Ori Kam, rte_flow maintainer
> >>>
> >>> 29/12/2021 15:34, Ivan Malov:
>  Hi all,
> 
>  In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it, 
>  'queue' is
>  to provide "Queue indices to use". But it is unclear whether the 
>  order of
>  elements is meaningful or not. Does that matter? Can queue indices 
>  repeat?
> >>
> >> The order probably doesn't matter, it is like the RSS indirection 
> >> table.
> >
> > Sorry, but RSS indirection table (RETA) assumes some structure. In it,
> > queue indices can repeat, and the order is meaningful. In DPDK, RETA
> > may comprise multiple "groups", each one comprising 64 entries.
> >
> > This 'queue' array in flow action RSS does not stick with the same
> > terminology, it does not reuse the definition of RETA "group", etc.
> > Just "queue indices to use". No definition of order, no structure.
> >
> > The API contract is not clear. Neither to users, nor to PMDs.
> >
>  From API in RSS the queues are simply the queue ID, order doesn't matter,
> >>> Duplicating the queue may affect the the spread based on the HW/PMD.
> >>> In common case each queue should appear only once and the PMD may 
> >>> duplicate
> >>> entries to get the best performance.
> >>
> >> Look. In a DPDK PMD, one has "global" RSS table. Consider the following
> >> example: 0, 0, 1, 1, 2, 2, 3, 3 ... and so on. As you may see, queue
> >> indices may repeat. They may have different order: 1, 1, 0, 0, ... .
> >> The order is of great importance. If you send a packet to a
> >> DPDK-powered server, you can know in advance its hash value.
> >> Hence, you may strictly predict which RSS table entry this
> >> hash will point at. That predicts the target Rx queue.
> >>
> >> So the questions which one should attempt to clarify, are as follows:
> >> 1) Is the 'queue' array ordered? (Does the order of elements matter?)
> >> 2) Can its elements repeat? (*allowed* or *not allowed*?)
> >>
> >> From API point of view the array is ordered, and may have repeating 
> >> elements.
> >
> >>>
> >>
> >>rx queue = RSS_indirection_table[ RSS_hash_value % 
> >> RSS_indirection_table_size ]
> >>
> >> So you could play with multiple queues matching same hash value, but 
> >> that
> >> would be uncommon.
> >>
>  An ethdev may have "global" RSS setting with an indirection table of 
>  some
>  fixed size (say, 512). In what comes to flow rules, does that size 
>  matter?
> >>
> >> Global RSS is only used if the incoming packet does not match any 
> >> rte_flow
> >> action. If there is a a RTE_FLOW_ACTION_TYPE_QUEUE or 
> >> RTE_FLOW_ACTION_TYPE_RSS
> >> these take precedence.
> >
> > Yes, I know all of that. The question is how does the PMD select RETA 
> > size
> > for this action? Can it select an arbitrary value? Or should it stick 
> > with
> > the "global" one (eg. 512)? How does the user know the table size?
> >
> > If the user simply wants to spread traffic across the given queues,
> > the effective table size is a don't care to them, and the existing
> > API contract is fine. But if the user expects that certain packets
> > hit some precise queues, they need to know the table size for that.
> >
> >>> Just like you said RSS simply spread the traffic to the given queues.
> >>
> >> Yes, to the given queues. The question is whether the 'queue' array
> >> has RETA properties (order matters; elements can repeat) or not.
> >>
> >
> > Yes order matters and elements can repeat.
> >
> >>> If application wants to send traffic to some queue it should use the 
> >>> queue action.
> >>
> >> Yes, but that's

[PATCH] net/sfc: validate queue span when parsing flow action RSS

2022-01-10 Thread Ivan Malov
The current code silently shrinks the value if it exceeds
the supported maximum. Do not do that. Validate the value.

Fixes: d77d07391d4d ("net/sfc: support flow API RSS action")
Cc: sta...@dpdk.org

Signed-off-by: Ivan Malov 
Reviewed-by: Andrew Rybchenko 
---
 drivers/net/sfc/sfc_flow.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index fc74c8035e..509fde4a86 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -1477,6 +1477,9 @@ sfc_flow_parse_rss(struct sfc_adapter *sa,
rxq_hw_index_max = rxq->hw_index;
}
 
+   if (rxq_hw_index_max - rxq_hw_index_min + 1 > EFX_MAXRSS)
+   return -EINVAL;
+
switch (action_rss->func) {
case RTE_ETH_HASH_FUNCTION_DEFAULT:
case RTE_ETH_HASH_FUNCTION_TOEPLITZ:
@@ -1612,9 +1615,8 @@ sfc_flow_filter_insert(struct sfc_adapter *sa,
uint8_t *rss_key;
 
if (spec_filter->rss) {
-   rss_spread = MIN(flow_rss->rxq_hw_index_max -
-   flow_rss->rxq_hw_index_min + 1,
-   EFX_MAXRSS);
+   rss_spread = flow_rss->rxq_hw_index_max -
+flow_rss->rxq_hw_index_min + 1;
rss_hash_types = flow_rss->rss_hash_types;
rss_key = flow_rss->rss_key;
} else {
-- 
2.30.2



RE: [PATCH v3] net/ixgbe: add vector Rx parameter check

2022-01-10 Thread Zhang, Qi Z



> -Original Message-
> From: Wang, Haiyue 
> Sent: Tuesday, December 14, 2021 3:29 PM
> To: Rong, Leyi ; Bin Zheng
> ; dev@dpdk.org
> Cc: lian...@liangbit.com; sta...@dpdk.org; jia@intel.com
> Subject: RE: [PATCH v3] net/ixgbe: add vector Rx parameter check
> 
> > -Original Message-
> > From: Rong, Leyi 
> > Sent: Monday, December 13, 2021 11:03
> > To: Bin Zheng ; dev@dpdk.org
> > Cc: Wang, Haiyue ; lian...@liangbit.com;
> > sta...@dpdk.org; jia@intel.com
> > Subject: RE: [PATCH v3] net/ixgbe: add vector Rx parameter check
> >
> >
> > > -Original Message-
> > > From: Bin Zheng 
> > > Sent: Friday, December 10, 2021 4:22 PM
> > > To: dev@dpdk.org
> > > Cc: Wang, Haiyue ; lian...@liangbit.com;
> > > sta...@dpdk.org; Rong, Leyi ; Bin Zheng
> > > ; jia@intel.com
> > > Subject: [PATCH v3] net/ixgbe: add vector Rx parameter check
> > >
> > > Under the circumstance that `rx_tail` wrap back to zero and the
> > > advance speed of `rx_tail` is greater than `rxrearm_start`,
> > > `rx_tail` will catch up with `rxrearm_start` and surpass it.
> > > This may cause some mbufs be reused by application.
> > >
> > > So we need to make some restrictions to ensure that  `rx_tail` will
> > > not exceed `rxrearm_start`.
> > >
> > > e.g.
> > >
> > > RDH: 972 RDT: 991 rxrearm_nb: 991 rxrearm_start: 992 rx_tail: 959
> > > RDH: 1004 RDT: 1023 rxrearm_nb: 991 rxrearm_start: 0 rx_tail: 991
> > > RDH: 12 RDT: 31 rxrearm_nb: 991 rxrearm_start: 32 rx_tail: 1023
> > > RDH: 31 RDT: 63 rxrearm_nb: 960 rxrearm_start: 64 rx_tail: 0
> > > RDH: 95 RDT: 95 rxrearm_nb: 1016 rxrearm_start: 96 rx_tail: 88
> > > RDH: 95 RDT: 127 rxrearm_nb: 991 rxrearm_start: 128 rx_tail: 95 ...
> > > RDH: 908 RDT: 927 rxrearm_nb: 991 rxrearm_start: 928 rx_tail: 895
> > > RDH: 940 RDT: 959 rxrearm_nb: 991 rxrearm_start: 960 rx_tail: 927
> > > RDH: 980 RDT: 991 rxrearm_nb: 991 rxrearm_start: 992 rx_tail: 959
> > > RDH: 991 RDT: 991 rxrearm_nb: 1026 rxrearm_start: 992 rx_tail: 994
> > >
> > > when `rx_tail` catches up with `rxrearm_start`,
> > > 2(994 - 992) mbufs be reused by application !
> > >
> > > Bugzilla ID: 882
> > > Fixes: 5a3cca342417 ("net/ixgbe: fix vector Rx")
> > > Cc: jia@intel.com
> > > Cc: sta...@dpdk.org
> > >
> > > Signed-off-by: Bin Zheng 
> > > ---
> > >  drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 11 +++
> > >  1 file changed, 11 insertions(+)
> > >
> 
> 
> >
> > Acked-by: Leyi Rong 
> 
> Reviewed-by: Haiyue Wang 

Applied to dpdk-next-net-intel.

Thanks
Qi


RE: [PATCH 1/1] mempool: implement index-based per core cache

2022-01-10 Thread Ananyev, Konstantin



 
> Current mempool per core cache implementation stores pointers to mbufs
> On 64b architectures, each pointer consumes 8B
> This patch replaces it with index-based implementation,
> where in each buffer is addressed by (pool base address + index)
> It reduces the amount of memory/cache required for per core cache
> 
> L3Fwd performance testing reveals minor improvements in the cache
> performance (L1 and L2 misses reduced by 0.60%)
> with no change in throughput

I feel really sceptical about that patch and the whole idea in general:
- From what I read above there is no real performance improvement observed.
  (In fact on my IA boxes mempool_perf_autotest reports ~20% slowdown,
  see below for more details). 
- Space utilization difference looks negligible too.
- The change introduces a new build time config option with a major limitation:
   All memzones in a pool have to be within the same 4GB boundary. 
   To address it properly, extra changes will be required in init(/populate) 
part of the code.
   All that will complicate mempool code, will make it more error prone
   and harder to maintain.
But, as there is no real gain in return - no point to add such extra complexity 
at all.
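
For scale (back-of-the-envelope arithmetic only): the per-lcore cache array holds RTE_MEMPOOL_CACHE_MAX_SIZE * 3 = 1536 entries, i.e. 12 KiB of 8-byte pointers versus 6 KiB of 32-bit indices -- a saving of 6 KiB per lcore, which is the space difference being weighed here.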

Konstantin

CSX 2.1 GHz
==

echo 'mempool_perf_autotest' | ./dpdk-test -n 4 --lcores='6-13' --no-pci

params : rate_persec (normal/index-based/diff %)

(with cache)
cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=32 : 
740989337.00/504116019.00/-31.97
cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=128 : 
756495155.00/615002931.00/-18.70
cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=32 : 
1483499110.00/1007248997.00/-32.10
cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=128 : 
1512439807.00/1229927218.00/-18.68
cache=512 cores=8 n_get_bulk=32 n_put_bulk=32 n_keep=32 : 
5933668757.00/4029048421.00/-32.10
cache=512 cores=8 n_get_bulk=32 n_put_bulk=32 n_keep=128 : 
6049234942.00/492344.00/-18.65

(with user-owned cache)
cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=32 : 
630600499.00/504312627.00/-20.03
 cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=128 : 
756259225.00/615042252.00/-18.67
 cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=32 : 
1262052966.00/1007039283.00/-20.21
 cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=128 : 
1517853081.00/1230818508.00/-18.91
 cache=512 cores=8 n_get_bulk=32 n_put_bulk=32 n_keep=32 
:5054529533.00/4028052273.00/-20.31
 cache=512 cores=8 n_get_bulk=32 n_put_bulk=32 n_keep=128 : 
6059340592.00/4912893129.00/-18.92

> 
> Suggested-by: Honnappa Nagarahalli 
> Signed-off-by: Dharmik Thakkar 
> Reviewed-by: Ruifeng Wang 
> ---
>  lib/mempool/rte_mempool.h | 114 +-
>  lib/mempool/rte_mempool_ops_default.c |   7 ++
>  2 files changed, 119 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 1e7a3c15273c..4fabd3b1920b 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -50,6 +50,10 @@
>  #include 
>  #include 
> 
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> +#include 
> +#endif
> +
>  #include "rte_mempool_trace_fp.h"
> 
>  #ifdef __cplusplus
> @@ -239,6 +243,9 @@ struct rte_mempool {
>   int32_t ops_index;
> 
>   struct rte_mempool_cache *local_cache; /**< Per-lcore local cache */
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> + void *pool_base_value; /**< Base value to calculate indices */
> +#endif
> 
>   uint32_t populated_size; /**< Number of populated objects. */
>   struct rte_mempool_objhdr_list elt_list; /**< List of objects in pool */
> @@ -1314,7 +1321,19 @@ rte_mempool_cache_flush(struct rte_mempool_cache 
> *cache,
>   if (cache == NULL || cache->len == 0)
>   return;
>   rte_mempool_trace_cache_flush(cache, mp);
> +
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> + unsigned int i;
> + unsigned int cache_len = cache->len;
> + void *obj_table[RTE_MEMPOOL_CACHE_MAX_SIZE * 3];
> + void *base_value = mp->pool_base_value;
> + uint32_t *cache_objs = (uint32_t *) cache->objs;
> + for (i = 0; i < cache_len; i++)
> + obj_table[i] = (void *) RTE_PTR_ADD(base_value, cache_objs[i]);
> + rte_mempool_ops_enqueue_bulk(mp, obj_table, cache->len);
> +#else
>   rte_mempool_ops_enqueue_bulk(mp, cache->objs, cache->len);
> +#endif
>   cache->len = 0;
>  }
> 
> @@ -1334,8 +1353,13 @@ static __rte_always_inline void
>  rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>  unsigned int n, struct rte_mempool_cache *cache)
>  {
> +#ifdef RTE_MEMPOOL_INDEX_BASED_LCORE_CACHE
> + uint32_t *cache_objs;
> + void *base_value;
> + uint32_t i;
> +#else
>   void *

RE: [RFC 1/3] ethdev: support GRE optional fields

2022-01-10 Thread Sean Zhang (Networking SW)
Hi Ori,

> -Original Message-
> From: Ori Kam 
> Sent: Sunday, January 9, 2022 8:30 PM
> To: Sean Zhang (Networking SW) ; Matan Azrad
> ; NBU-Contact-Thomas Monjalon (EXTERNAL)
> ; Ferruh Yigit ; Andrew
> Rybchenko 
> Cc: dev@dpdk.org
> Subject: RE: [RFC 1/3] ethdev: support GRE optional fields
> 
> Hi Sean,
> 
> 
> > -Original Message-
> > From: Sean Zhang 
> > Subject: [RFC 1/3] ethdev: support GRE optional fields
> >
> > Add flow pattern items and header format for matching optional fields
> > (checksum/key/sequence) in GRE header. And the flags in gre item
> > should be correspondingly set with the new added items.
> >
> > Signed-off-by: Sean Zhang 
> > ---
> >  doc/guides/prog_guide/rte_flow.rst | 16 
> >  lib/ethdev/rte_flow.c  |  1 +
> >  lib/ethdev/rte_flow.h  | 18 ++
> >  3 files changed, 35 insertions(+)
> >
> > diff --git a/doc/guides/prog_guide/rte_flow.rst
> > b/doc/guides/prog_guide/rte_flow.rst
> > index c51ed88..48d5685 100644
> > --- a/doc/guides/prog_guide/rte_flow.rst
> > +++ b/doc/guides/prog_guide/rte_flow.rst
> > @@ -1113,6 +1113,22 @@ This should be preceded by item ``GRE``.
> >  - Value to be matched is a big-endian 32 bit integer.
> >  - When this item present it implicitly match K bit in default mask as "1"
> >
> > +Item: ``GRE_OPTION``
> > +
> > +
> > +Matches a GRE optional fields (checksum/key/sequence).
> > +This should be preceded by item ``GRE``.
> > +
> > +- ``checksum``: checksum.
> > +- ``key``: key.
> > +- ``sequence``: sequence.
> > +- The items in GRE_OPTION do not change bit flags(c_bit/k_bit/s_bit)
> > +in GRE
> > +  item. The bit flags need be set with GRE item by application. When
> > +the items
> > +  present, the corresponding bits in GRE spec and mask should be set
> > +"1" by
> > +  application, it means to match specified value of the fields. When
> > +the items
> > +  no present, but the corresponding bits in GRE spec and mask is "1",
> > +it means
> > +  to match any value of the fields.
> > +
> >  Item: ``FUZZY``
> >  ^^^
> >
> > diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c index
> > a93f68a..03bd1df 100644
> > --- a/lib/ethdev/rte_flow.c
> > +++ b/lib/ethdev/rte_flow.c
> > @@ -139,6 +139,7 @@ struct rte_flow_desc_data {
> > MK_FLOW_ITEM(META, sizeof(struct rte_flow_item_meta)),
> > MK_FLOW_ITEM(TAG, sizeof(struct rte_flow_item_tag)),
> > MK_FLOW_ITEM(GRE_KEY, sizeof(rte_be32_t)),
> > +   MK_FLOW_ITEM(GRE_OPTION, sizeof(struct rte_gre_hdr_option)),
> 
> I think that this new item is making the gre_key redundant, why not
> deprecate it?

Do you mean to add a description like below?

  Item: ``GRE_KEY``
  ^^^
 +This action is deprecated. Consider `Item: GRE_OPTION`.

> 
> > MK_FLOW_ITEM(GTP_PSC, sizeof(struct rte_flow_item_gtp_psc)),
> > MK_FLOW_ITEM(PPPOES, sizeof(struct rte_flow_item_pppoe)),
> > MK_FLOW_ITEM(PPPOED, sizeof(struct rte_flow_item_pppoe)), diff -
> -git
> > a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index 1031fb2..27b4140
> > 100644
> > --- a/lib/ethdev/rte_flow.h
> > +++ b/lib/ethdev/rte_flow.h
> > @@ -660,6 +660,13 @@ enum rte_flow_item_type {
> >  * See struct rte_flow_item_ppp.
> >  */
> > RTE_FLOW_ITEM_TYPE_PPP,
> > +
> > +   /**
> > +* Matches GRE optional fields.
> > +*
> > +* See struct rte_gre_hdr_option.
> > +*/
> > +   RTE_FLOW_ITEM_TYPE_GRE_OPTION,
> >  };
> >
> >  /**
> > @@ -1196,6 +1203,17 @@ struct rte_flow_item_gre {  #endif
> >
> >  /**
> > + * RTE_FLOW_ITEM_TYPE_GRE_OPTION.
> > + *
> > + * Matches GRE optional fields in header.
> > + */
> > +struct rte_gre_hdr_option {
> > +   rte_be16_t checksum;
> > +   rte_be32_t key;
> > +   rte_be32_t sequence;
> > +};
> > +
> > +/**
> >   * RTE_FLOW_ITEM_TYPE_FUZZY
> >   *
> >   * Fuzzy pattern match, expect faster than default.
> > --
> > 1.8.3.1
> 
> Best,
> Ori

Thanks,
Sean
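
A sketch of how an application might combine the proposed item with the existing GRE item, following the doc text quoted above (assumes the RFC is applied; the key value and the surrounding pattern are arbitrary):

#include <rte_flow.h>
#include <rte_byteorder.h>

/* Match GRE packets carrying key 0x1234: the K bit (0x2000) is set in the
 * GRE item's spec and mask, and the key itself goes in GRE_OPTION. */
static struct rte_flow_item_gre gre_spec = { .c_rsvd0_ver = RTE_BE16(0x2000) };
static struct rte_flow_item_gre gre_mask = { .c_rsvd0_ver = RTE_BE16(0x2000) };
static struct rte_gre_hdr_option opt_spec = { .key = RTE_BE32(0x1234) };
static struct rte_gre_hdr_option opt_mask = { .key = RTE_BE32(0xffffffff) };

static struct rte_flow_item pattern[] = {
	{ .type = RTE_FLOW_ITEM_TYPE_ETH },
	{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
	{ .type = RTE_FLOW_ITEM_TYPE_GRE, .spec = &gre_spec, .mask = &gre_mask },
	{ .type = RTE_FLOW_ITEM_TYPE_GRE_OPTION, .spec = &opt_spec, .mask = &opt_mask },
	{ .type = RTE_FLOW_ITEM_TYPE_END },
};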


[PATCH] app/testpmd: update raw flow to take hex input

2022-01-10 Thread nipun . gupta
From: Nipun Gupta 

This patch adds a method to provide the key and mask for raw rules
as hexadecimal values. A new parameter, pattern_hex, is added to
support this.

Signed-off-by: Nipun Gupta 
---
 app/test-pmd/cmdline_flow.c | 15 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 13 +
 2 files changed, 28 insertions(+)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index bbe3dc0115..cfa68abc40 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -166,6 +166,7 @@ enum index {
ITEM_RAW_OFFSET,
ITEM_RAW_LIMIT,
ITEM_RAW_PATTERN,
+   ITEM_RAW_PATTERN_HEX,
ITEM_ETH,
ITEM_ETH_DST,
ITEM_ETH_SRC,
@@ -1107,6 +1108,7 @@ static const enum index item_raw[] = {
ITEM_RAW_OFFSET,
ITEM_RAW_LIMIT,
ITEM_RAW_PATTERN,
+   ITEM_RAW_PATTERN_HEX,
ITEM_NEXT,
ZERO,
 };
@@ -2664,6 +2666,19 @@ static const struct token token_list[] = {
 ARGS_ENTRY_ARB(sizeof(struct rte_flow_item_raw),
ITEM_RAW_PATTERN_SIZE)),
},
+   [ITEM_RAW_PATTERN_HEX] = {
+   .name = "pattern_hex",
+   .help = "hex string to look for",
+   .next = NEXT(item_raw,
+NEXT_ENTRY(COMMON_HEX),
+NEXT_ENTRY(ITEM_PARAM_IS,
+   ITEM_PARAM_SPEC,
+   ITEM_PARAM_MASK)),
+   .args = ARGS(ARGS_ENTRY(struct rte_flow_item_raw, pattern),
+ARGS_ENTRY(struct rte_flow_item_raw, length),
+ARGS_ENTRY_ARB(sizeof(struct rte_flow_item_raw),
+   ITEM_RAW_PATTERN_SIZE)),
+   },
[ITEM_ETH] = {
.name = "eth",
.help = "match Ethernet header",
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst 
b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 44228cd7d2..68c216c805 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3634,6 +3634,7 @@ This section lists supported pattern items and their 
attributes, if any.
   - ``offset {integer}``: absolute or relative offset for pattern.
   - ``limit {unsigned}``: search area limit for start of pattern.
   - ``pattern {string}``: byte string to look for.
+  - ``pattern_hex {string}``: byte string (provided in hexadecimal) to look 
for.
 
 - ``eth``: match Ethernet header.
 
@@ -5080,6 +5081,18 @@ PPPoL2TPv2oUDP RSS rules can be created by the following 
commands::
  testpmd> flow create 0 ingress pattern eth / ipv6 / udp / l2tpv2 / ppp / ipv6
   / end actions rss types ipv6 end queues end / end
 
+Sample RAW rule
+~~~
+
+A RAW rule can be created as follows using the ``pattern_hex`` key and mask.
+
+::
+
+testpmd> flow create 0 group 0 priority 1 ingress pattern raw relative is 0 search is 0 offset
+ is 0 limit is 0 pattern_hex spec 0a0a0a0a
+ pattern_hex mask  / end actions
+ queue index 4 / end
+
 BPF Functions
 --
 
-- 
2.17.1



[PATCH] examples/l3fwd: fix Rx burst size for event mode

2022-01-10 Thread nipun . gupta
From: Nipun Gupta 

While dequeuing packets from the event device, the burst size
is provided in the API. This was not getting properly
configured in the application. This patch correctly configures
the burst size.

Fixes: aaf58cb85b62 ("examples/l3fwd: add event port and queue setup")
Cc: sta...@dpdk.org

Signed-off-by: Nipun Gupta 
---
 examples/l3fwd/l3fwd_event_internal_port.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/examples/l3fwd/l3fwd_event_internal_port.c 
b/examples/l3fwd/l3fwd_event_internal_port.c
index 1e8f46bc11..32cf657148 100644
--- a/examples/l3fwd/l3fwd_event_internal_port.c
+++ b/examples/l3fwd/l3fwd_event_internal_port.c
@@ -118,6 +118,8 @@ l3fwd_event_port_setup_internal_port(void)
event_p_conf.event_port_cfg |=
RTE_EVENT_PORT_CFG_DISABLE_IMPL_REL;
 
+   evt_rsrc->deq_depth = def_p_conf.dequeue_depth;
+
for (event_p_id = 0; event_p_id < evt_rsrc->evp.nb_ports;
event_p_id++) {
ret = rte_event_port_setup(event_d_id, event_p_id,
-- 
2.17.1
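
A simplified sketch of the worker-side dequeue that consumes this burst size (names and sizes are assumed for illustration, not taken from the l3fwd sources):

#include <stdbool.h>
#include <rte_common.h>
#include <rte_eventdev.h>

#define SKETCH_BURST 32 /* assumed worker burst, analogous to l3fwd's MAX_PKT_BURST */

static void
event_worker_sketch(uint8_t event_d_id, uint8_t event_p_id, uint16_t deq_depth,
		    volatile bool *force_quit)
{
	struct rte_event ev[SKETCH_BURST];

	while (!*force_quit) {
		/* deq_depth is the burst size the fix above now configures. */
		uint16_t nb_deq = rte_event_dequeue_burst(event_d_id, event_p_id, ev,
							  RTE_MIN(deq_depth, SKETCH_BURST), 0);
		if (nb_deq == 0)
			continue;
		/* ... process and forward nb_deq events ... */
	}
}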



[PATCH] vdpa/sfc: make MCDI memzone name unique

2022-01-10 Thread abhimanyu.saini
From: Abhimanyu Saini 

The buffer for the MCDI channel is allocated using
rte_memzone_reserve_aligned with the zone name 'mcdi'. Since multiple
MCDI channels are needed to support multiple VFs, and
rte_memzone_reserve_aligned expects unique zone names, append the PCI
address to the zone name to make it unique.

Signed-off-by: Abhimanyu Saini 
---
 drivers/vdpa/sfc/sfc_vdpa_hw.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/sfc/sfc_vdpa_hw.c b/drivers/vdpa/sfc/sfc_vdpa_hw.c
index fd1fee7..a7018b1 100644
--- a/drivers/vdpa/sfc/sfc_vdpa_hw.c
+++ b/drivers/vdpa/sfc/sfc_vdpa_hw.c
@@ -25,21 +25,30 @@
 {
uint64_t mcdi_iova;
size_t mcdi_buff_size;
+   char mz_name[RTE_MEMZONE_NAMESIZE];
const struct rte_memzone *mz = NULL;
int numa_node = sva->pdev->device.numa_node;
int ret;

mcdi_buff_size = RTE_ALIGN_CEIL(len, PAGE_SIZE);
+   ret = snprintf(mz_name, RTE_MEMZONE_NAMESIZE, "%s_%s",
+  sva->pdev->name, name);
+   if (ret < 0 || ret >= RTE_MEMZONE_NAMESIZE) {
+   sfc_vdpa_err(sva, "%s_%s too long to fit in mz_name",
+sva->pdev->name, name);
+   return -EINVAL;
+   }

-   sfc_vdpa_log_init(sva, "name=%s, len=%zu", name, len);
+   sfc_vdpa_log_init(sva, "name=%s, len=%zu", mz_name, len);

-   mz = rte_memzone_reserve_aligned(name, mcdi_buff_size,
+   mz = rte_memzone_reserve_aligned(mz_name, mcdi_buff_size,
 numa_node,
 RTE_MEMZONE_IOVA_CONTIG,
 PAGE_SIZE);
if (mz == NULL) {
sfc_vdpa_err(sva, "cannot reserve memory for %s: len=%#x: %s",
-name, (unsigned int)len, rte_strerror(rte_errno));
+mz_name, (unsigned int)len,
+rte_strerror(rte_errno));
return -ENOMEM;
}

--
1.8.3.1
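
With this change each MCDI buffer gets a device-scoped zone name -- for example something like "0000:af:00.2_mcdi" (address purely illustrative) -- which satisfies the uniqueness requirement of rte_memzone_reserve_aligned() when several VFs are in use.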



RE: [RFC 1/3] ethdev: support GRE optional fields

2022-01-10 Thread Ori Kam
Hi Sean,

> -Original Message-
> From: Sean Zhang (Networking SW) 
> Sent: Tuesday, January 11, 2022 5:45 AM
> Subject: RE: [RFC 1/3] ethdev: support GRE optional fields
> 
> Hi Ori,
> 
> > -Original Message-
> > From: Ori Kam 
> > Sent: Sunday, January 9, 2022 8:30 PM
> > To: Sean Zhang (Networking SW) ; Matan Azrad
> > ; NBU-Contact-Thomas Monjalon (EXTERNAL)
> > ; Ferruh Yigit ; Andrew
> > Rybchenko 
> > Cc: dev@dpdk.org
> > Subject: RE: [RFC 1/3] ethdev: support GRE optional fields
> >
> > Hi Sean,
> >
> >
> > > -Original Message-
> > > From: Sean Zhang 
> > > Subject: [RFC 1/3] ethdev: support GRE optional fields
> > >
> > > Add flow pattern items and header format for matching optional fields
> > > (checksum/key/sequence) in GRE header. And the flags in gre item
> > > should be correspondingly set with the new added items.
> > >
> > > Signed-off-by: Sean Zhang 
> > > ---
> > >  doc/guides/prog_guide/rte_flow.rst | 16 
> > >  lib/ethdev/rte_flow.c  |  1 +
> > >  lib/ethdev/rte_flow.h  | 18 ++
> > >  3 files changed, 35 insertions(+)
> > >
> > > diff --git a/doc/guides/prog_guide/rte_flow.rst
> > > b/doc/guides/prog_guide/rte_flow.rst
> > > index c51ed88..48d5685 100644
> > > --- a/doc/guides/prog_guide/rte_flow.rst
> > > +++ b/doc/guides/prog_guide/rte_flow.rst
> > > @@ -1113,6 +1113,22 @@ This should be preceded by item ``GRE``.
> > >  - Value to be matched is a big-endian 32 bit integer.
> > >  - When this item present it implicitly match K bit in default mask as "1"
> > >
> > > +Item: ``GRE_OPTION``
> > > +
> > > +
> > > +Matches a GRE optional fields (checksum/key/sequence).
> > > +This should be preceded by item ``GRE``.
> > > +
> > > +- ``checksum``: checksum.
> > > +- ``key``: key.
> > > +- ``sequence``: sequence.
> > > +- The items in GRE_OPTION do not change bit flags(c_bit/k_bit/s_bit)
> > > +in GRE
> > > +  item. The bit flags need be set with GRE item by application. When
> > > +the items
> > > +  present, the corresponding bits in GRE spec and mask should be set
> > > +"1" by
> > > +  application, it means to match specified value of the fields. When
> > > +the items
> > > +  no present, but the corresponding bits in GRE spec and mask is "1",
> > > +it means
> > > +  to match any value of the fields.
> > > +
> > >  Item: ``FUZZY``
> > >  ^^^
> > >
> > > diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c index
> > > a93f68a..03bd1df 100644
> > > --- a/lib/ethdev/rte_flow.c
> > > +++ b/lib/ethdev/rte_flow.c
> > > @@ -139,6 +139,7 @@ struct rte_flow_desc_data {
> > >   MK_FLOW_ITEM(META, sizeof(struct rte_flow_item_meta)),
> > >   MK_FLOW_ITEM(TAG, sizeof(struct rte_flow_item_tag)),
> > >   MK_FLOW_ITEM(GRE_KEY, sizeof(rte_be32_t)),
> > > + MK_FLOW_ITEM(GRE_OPTION, sizeof(struct rte_gre_hdr_option)),
> >
> > I think that this new item is making the gre_key redundant, why not
> > deprecate it?
> 
> Do you mean to add description like bellow?
> 
>   Item: ``GRE_KEY``
>   ^^^
>  +This action is deprecated. Consider `Item: GRE_OPTION`.

Yes, and also add the deprecation notice in the release notes;
there is also a need to see if other PMDs are using GRE_KEY.
But to be clear, do not remove this support now, just send a notice that it
should be removed.

> 
> >
> > >   MK_FLOW_ITEM(GTP_PSC, sizeof(struct rte_flow_item_gtp_psc)),
> > >   MK_FLOW_ITEM(PPPOES, sizeof(struct rte_flow_item_pppoe)),
> > >   MK_FLOW_ITEM(PPPOED, sizeof(struct rte_flow_item_pppoe)), diff -
> > -git
> > > a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index 1031fb2..27b4140
> > > 100644
> > > --- a/lib/ethdev/rte_flow.h
> > > +++ b/lib/ethdev/rte_flow.h
> > > @@ -660,6 +660,13 @@ enum rte_flow_item_type {
> > >* See struct rte_flow_item_ppp.
> > >*/
> > >   RTE_FLOW_ITEM_TYPE_PPP,
> > > +
> > > + /**
> > > +  * Matches GRE optional fields.
> > > +  *
> > > +  * See struct rte_gre_hdr_option.
> > > +  */
> > > + RTE_FLOW_ITEM_TYPE_GRE_OPTION,
> > >  };
> > >
> > >  /**
> > > @@ -1196,6 +1203,17 @@ struct rte_flow_item_gre {  #endif
> > >
> > >  /**
> > > + * RTE_FLOW_ITEM_TYPE_GRE_OPTION.
> > > + *
> > > + * Matches GRE optional fields in header.
> > > + */
> > > +struct rte_gre_hdr_option {
> > > + rte_be16_t checksum;
> > > + rte_be32_t key;
> > > + rte_be32_t sequence;
> > > +};
> > > +
> > > +/**
> > >   * RTE_FLOW_ITEM_TYPE_FUZZY
> > >   *
> > >   * Fuzzy pattern match, expect faster than default.
> > > --
> > > 1.8.3.1
> >
> > Best,
> > Ori
> 
> Thanks,
> Sean


RE: [dpdk-dev] [dpdk-users] A question about Mellanox ConnectX-5 and ConnectX-4 Lx nic can't send packets?

2022-01-10 Thread Dmitry Kozlyuk
Hello,

Thanks for attaching all the details.
Can you please reproduce it with --log-level=pmd.common.mlx5:debug
and send the logs?

> For example, if the environment is configured with 10GB hugepages
> but each hugepage is physically discontinuous, this problem
> can be reproduced.

What is the hugepage size?
In general, net/mlx5 does not rely on physical addresses.
(You probably mean that a range of hugepages is discontiguous,
because **each** hugepage is contiguous by definition.)

> This problem is introduced by this patch:
> https://git.dpdk.org/dpdk/commit/?id=fec28ca0e3a93143829f3b41a28a8da933f28499.

Did you find it with bisection?