Re: [dpdk-dev] [PATCH v3] lib/rte_rib6: fix stack buffer overflow

2021-06-22 Thread David Marchand
On Mon, Jun 21, 2021 at 3:28 PM  wrote:
>
> From: Owen Hilyard 

Hi Owen, Vladimir,


Owen, two comments on the patch title.

- We (try to) never prefix with lib/, as it gives no additional info.
The prefix should be the library name.
There have been some transgressions against this rule, but that was Thomas
or me being absent-minded.

For other parts of the tree, it is a bit more complex, but if unsure,
the simplest is to look at the git history.
Here this is the rib library, so "rib: " is enough.


- The title's purpose is to give a hint of the functional impact: people
looking for fixes for a type of bug can find them more easily.

Here, just indicating that we are fixing a buffer overflow won't help judge
in which use case the issue happened.
How about: "rib: fix max depth IPv6 lookup"?


>
> ASAN found a stack buffer overflow in lib/rib/rte_rib6.c:get_dir.
> The fix for the stack buffer overflow was to make sure depth
> was always < 128, since when depth = 128 it caused the index
> into the ip address to be 16, which read off the end of the array.
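For illustration only (this is not the actual rte_rib6 code, just the depth arithmetic in isolation), the out-of-bounds index follows directly from the byte-index computation:

```c
#include <assert.h>
#include <stdint.h>

#define IPV6_ADDR_SIZE 16 /* bytes in an IPv6 address: valid indexes 0..15 */

/* Byte index derived from a prefix depth, as in a get_dir()-style lookup.
 * For depth == 128 this yields 16, one byte past the end of the address
 * array -- the out-of-bounds read that ASAN reported. */
static int depth_to_byte_index(uint8_t depth)
{
	return depth >> 3; /* same as depth / 8 */
}
```

Restricting depth to < 128 (or special-casing a full-length prefix) keeps the index within 0..15.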
>
> While trying to solve the buffer overflow, I noticed that a few
> changes could be made to remove the for loop entirely.
>
> Fixes: f7e861e21c ("rib: support IPv6")
Cc: sta...@dpdk.org

>
> Signed-off-by: Owen Hilyard 


Vladimir, can you review this fix?

Thanks!

-- 
David Marchand



Re: [dpdk-dev] [PATCH] lib/flow_classify: fix leaking rules on delete

2021-06-22 Thread David Marchand
On Wed, Jun 16, 2021 at 9:57 PM  wrote:
>
> From: Owen Hilyard 
>
> Rules in a classify table were not freed if the table
> had a delete function.
>
> Fixes: be41ac2a3 ("flow_classify: introduce flow classify library")
Cc: sta...@dpdk.org

>
> Signed-off-by: Owen Hilyard 
> ---
>  lib/flow_classify/rte_flow_classify.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/flow_classify/rte_flow_classify.c 
> b/lib/flow_classify/rte_flow_classify.c
> index f125267e8..06aed3b70 100644
> --- a/lib/flow_classify/rte_flow_classify.c
> +++ b/lib/flow_classify/rte_flow_classify.c
> @@ -579,7 +579,7 @@ rte_flow_classify_table_entry_delete(struct 
> rte_flow_classifier *cls,
> &rule->u.key.key_del,
> &rule->key_found,
> &rule->entry);
> -
> +   free(rule);
> return ret;
> }
> }

I find it strange to free the rule regardless of the result of the
f_delete() op.
The same is done outside the loop, which means this function can return
-EINVAL and still free the rule in that case too.
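As a sketch of the more conservative behaviour being suggested (the type and helper names here are hypothetical stand-ins, not the actual rte_flow_classify code):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the classify-table types; the real code
 * lives in lib/flow_classify/rte_flow_classify.c. */
struct rule {
	int id;
};

static int delete_ok(struct rule *r)   { (void)r; return 0; }
static int delete_fail(struct rule *r) { (void)r; return -22; /* -EINVAL */ }

/* Free the rule only when the table's delete op succeeds; on failure the
 * caller keeps ownership, so nothing is freed behind its back. */
static int table_entry_delete(struct rule *r, int (*f_delete)(struct rule *))
{
	int ret = f_delete(r);

	if (ret == 0)
		free(r); /* released only on successful delete */
	return ret;
}
```

With this shape, the -EINVAL path leaves the rule allocated and the caller decides whether to retry or free it.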

Bernard, Ferruh, can you review please?

Thanks!


-- 
David Marchand



Re: [dpdk-dev] [PATCH v6] ethdev: add new ext hdr for gtp psc

2021-06-22 Thread Singh, Aman Deep

Hi Raslan,

Can you please provide a link to this RFC 38415-g30?
I just had some doubt about the byte-order conversion as per RFC 1700.



Regards
Aman



Re: [dpdk-dev] [PATCH] kni: fix wrong mbuf alloc count in kni_allocate_mbufs

2021-06-22 Thread wangyunjian
> -Original Message-
> From: Ferruh Yigit [mailto:ferruh.yi...@intel.com]
> Sent: Monday, June 21, 2021 7:26 PM
> To: wangyunjian ; dev@dpdk.org
> Cc: liucheng (J) ; dingxiaoxiong
> 
> Subject: Re: [dpdk-dev] [PATCH] kni: fix wrong mbuf alloc count in
> kni_allocate_mbufs
> 
> On 6/21/2021 4:27 AM, wangyunjian wrote:
> >> -Original Message-
> >> From: Ferruh Yigit [mailto:ferruh.yi...@intel.com]
> >> Sent: Friday, June 18, 2021 9:37 PM
> >> To: wangyunjian ; dev@dpdk.org
> >> Cc: liucheng (J) ; dingxiaoxiong
> >> 
> >> Subject: Re: [dpdk-dev] [PATCH] kni: fix wrong mbuf alloc count in
> >> kni_allocate_mbufs
> >>
> >> On 5/31/2021 1:09 PM, wangyunjian wrote:
> >>> From: Yunjian Wang 
> >>>
> >>> In kni_allocate_mbufs(), we alloc mbuf for alloc_q as this code.
> >>> allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1) \
> >>>   & (MAX_MBUF_BURST_NUM - 1);
> >>> The value of allocq_free maybe zero (e.g 32 & (32 - 1) = 0), and it
> >>> will not fill the alloc_q. When the alloc_q's free count is zero, it
> >>> will drop the packet in kernel kni.
> >>>
> >>
> >> nack
> >>
> >> Both 'read' & 'write' pointers can be max 'len-1', so 'read - write -
> >> 1' can't be 'len'.
> >> For above example first part can't be '32'.
> >>
> >> But if you are observing a problem, can you please describe it a
> >> little more, it may be because of something else.
> >
> > The ring size is 1024. After init, write = read = 0. Then we fill 
> > kni->alloc_q to
> full. At this time, write = 1023, read = 0.
> > Then the kernel send 32 packets to userspace. At this time, write = 1023,
> read = 32.
> > And then the userspace recieve this 32 packets. Then fill the kni->alloc_q, 
> > (32
> - 1023 - 1)&31 = 0, fill nothing.
> > ...
> > Then the kernel send 32 packets to userspace. At this time, write = 1023,
> read = 992.
> > And then the userspace recieve this 32 packets. Then fill the kni->alloc_q,
> (992 - 1023 - 1)&31 = 0, fill nothing.
> > Then the kernel send 32 packets to userspace. The kni->alloc_q only has 31
> mbufs and will drop one packet.
> >
> > Absolutely, this is a special scene. Normally, it will fill some mbufs 
> > everytime,
> but may not enough for the kernel to use.
> > In this patch, we always keep the kni->alloc_q to full for the kernel to 
> > use.
> >
> 
> I see now, yes it is technically possible to have above scenario and it can 
> cause
> glitch in the datapath.
> 
> Below fix looks good, +1 to use 'kni_fifo_free_count()' instead of calculation
> within the function which may be wrong for the 'RTE_USE_C11_MEM_MODEL'
> case.

I compiled them on the ARM and x86 platforms with 'RTE_USE_C11_MEM_MODEL'
enabled, and no error was reported.

> 
> Can you please add fixes line too?

OK, will include it in next version.

Thanks

> 
> > Thanks
> >
> >>
> >>> In this patch, we set the allocq_free as the min between
> >>> MAX_MBUF_BURST_NUM and the free count of the alloc_q.
> >>>
> >>> Signed-off-by: Cheng Liu 
> >>> Signed-off-by: Yunjian Wang 
> >>> ---
> >>>  lib/kni/rte_kni.c | 5 +++--
> >>>  1 file changed, 3 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/lib/kni/rte_kni.c b/lib/kni/rte_kni.c index
> >>> 9dae6a8d7c..20d8f20cef 100644
> >>> --- a/lib/kni/rte_kni.c
> >>> +++ b/lib/kni/rte_kni.c
> >>> @@ -677,8 +677,9 @@ kni_allocate_mbufs(struct rte_kni *kni)
> >>>   return;
> >>>   }
> >>>
> >>> - allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1)
> >>> - & (MAX_MBUF_BURST_NUM - 1);
> >>> + allocq_free = kni_fifo_free_count(kni->alloc_q);
> >>> + allocq_free = (allocq_free > MAX_MBUF_BURST_NUM) ?
> >>> +   MAX_MBUF_BURST_NUM : allocq_free;
> >>>   for (i = 0; i < allocq_free; i++) {
> >>>   pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
> >>>   if (unlikely(pkts[i] == NULL)) {
> >>>
> >



Re: [dpdk-dev] [RFC PATCH v3 1/3] sched: add PIE based congestion management

2021-06-22 Thread Liguzinski, WojciechX


> -Original Message-
> From: Stephen Hemminger  
> Sent: Monday, June 21, 2021 8:18 PM
> To: Liguzinski, WojciechX 
> Cc: dev@dpdk.org; Singh, Jasvinder ; Dumitrescu, 
> Cristian ; Dharmappa, Savinay 
> ; Ajmera, Megha 
> Subject: Re: [dpdk-dev] [RFC PATCH v3 1/3] sched: add PIE based congestion 
> management
>
> On Mon, 21 Jun 2021 08:35:04 +0100
> "Liguzinski, WojciechX"  wrote:
>
> > +/**
> > + * @brief Initialises run-time data
> > + *
> > + * @param pie [in,out] data pointer to PIE runtime data
> > + *
> > + * @return Operation status
> > + * @retval 0 success
> > + * @retval !0 error
> > + */
> > +int
> > +rte_pie_rt_data_init(struct rte_pie *pie);
>
> All the new code needs to be marked experimental.
> Why return an error on the init() function, then you are going to make 
> application check the result and lead to lots more code.
>
> Other places in DPDK use void for init functions.

Thanks for the comments.
I'll apply the necessary updates to v4 of the RFC patches.

BR,
Wojciech


Re: [dpdk-dev] [PATCH] kni: fix wrong mbuf alloc count in kni_allocate_mbufs

2021-06-22 Thread Ferruh Yigit
On 6/22/2021 8:32 AM, wangyunjian wrote:
>> -Original Message-
>> From: Ferruh Yigit [mailto:ferruh.yi...@intel.com]
>> Sent: Monday, June 21, 2021 7:26 PM
>> To: wangyunjian ; dev@dpdk.org
>> Cc: liucheng (J) ; dingxiaoxiong
>> 
>> Subject: Re: [dpdk-dev] [PATCH] kni: fix wrong mbuf alloc count in
>> kni_allocate_mbufs
>>
>> On 6/21/2021 4:27 AM, wangyunjian wrote:
 -Original Message-
 From: Ferruh Yigit [mailto:ferruh.yi...@intel.com]
 Sent: Friday, June 18, 2021 9:37 PM
 To: wangyunjian ; dev@dpdk.org
 Cc: liucheng (J) ; dingxiaoxiong
 
 Subject: Re: [dpdk-dev] [PATCH] kni: fix wrong mbuf alloc count in
 kni_allocate_mbufs

 On 5/31/2021 1:09 PM, wangyunjian wrote:
> From: Yunjian Wang 
>
> In kni_allocate_mbufs(), we alloc mbuf for alloc_q as this code.
> allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1) \
>   & (MAX_MBUF_BURST_NUM - 1);
> The value of allocq_free maybe zero (e.g 32 & (32 - 1) = 0), and it
> will not fill the alloc_q. When the alloc_q's free count is zero, it
> will drop the packet in kernel kni.
>

 nack

 Both 'read' & 'write' pointers can be max 'len-1', so 'read - write -
 1' can't be 'len'.
 For above example first part can't be '32'.

 But if you are observing a problem, can you please describe it a
 little more, it may be because of something else.
>>>
>>> The ring size is 1024. After init, write = read = 0. Then we fill 
>>> kni->alloc_q to
>> full. At this time, write = 1023, read = 0.
>>> Then the kernel send 32 packets to userspace. At this time, write = 1023,
>> read = 32.
>>> And then the userspace recieve this 32 packets. Then fill the kni->alloc_q, 
>>> (32
>> - 1023 - 1)&31 = 0, fill nothing.
>>> ...
>>> Then the kernel send 32 packets to userspace. At this time, write = 1023,
>> read = 992.
>>> And then the userspace recieve this 32 packets. Then fill the kni->alloc_q,
>> (992 - 1023 - 1)&31 = 0, fill nothing.
>>> Then the kernel send 32 packets to userspace. The kni->alloc_q only has 31
>> mbufs and will drop one packet.
>>>
>>> Absolutely, this is a special scene. Normally, it will fill some mbufs 
>>> everytime,
>> but may not enough for the kernel to use.
>>> In this patch, we always keep the kni->alloc_q to full for the kernel to 
>>> use.
>>>
>>
>> I see now, yes it is technically possible to have above scenario and it can 
>> cause
>> glitch in the datapath.
>>
>> Below fix looks good, +1 to use 'kni_fifo_free_count()' instead of 
>> calculation
>> within the function which may be wrong for the 'RTE_USE_C11_MEM_MODEL'
>> case.
> 
> I compiled them on the ARM and x86 platforms with the 'RTE_USE_C11_MEM_MODEL'
> case, and no error is reported.
> 

It may not be a build error, but in the 'RTE_USE_C11_MEM_MODEL' case
'read'/'write' are not volatile and need to be read via C11 atomic
instructions. The 'allocq_free' calculation in 'kni_allocate_mbufs()' doesn't
do that, which is why it is better to replace the calculation with
'kni_fifo_free_count()'.
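For reference, the zero-refill cases from the scenario described above can be reproduced with the old mask arithmetic in isolation (a standalone sketch; only the burst-size constant mirrors rte_kni.c):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_MBUF_BURST_NUM 32 /* burst size used by rte_kni.c */

/* The old free-slot calculation from kni_allocate_mbufs(), in isolation.
 * 'read' and 'write' are the FIFO indexes; the ring size is 1024. */
static unsigned int old_allocq_free(uint32_t read, uint32_t write)
{
	return (read - write - 1) & (MAX_MBUF_BURST_NUM - 1);
}
```

Whenever (read - write - 1) is a multiple of 32, the mask yields 0 and the alloc_q is never refilled, which is exactly the glitch the patch removes.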

>>
>> Can you please add fixes line too?
> 
> OK, will include it in next version.
> 

Thanks.

> Thanks
> 
>>
>>> Thanks
>>>

> In this patch, we set the allocq_free as the min between
> MAX_MBUF_BURST_NUM and the free count of the alloc_q.
>
> Signed-off-by: Cheng Liu 
> Signed-off-by: Yunjian Wang 
> ---
>  lib/kni/rte_kni.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/lib/kni/rte_kni.c b/lib/kni/rte_kni.c index
> 9dae6a8d7c..20d8f20cef 100644
> --- a/lib/kni/rte_kni.c
> +++ b/lib/kni/rte_kni.c
> @@ -677,8 +677,9 @@ kni_allocate_mbufs(struct rte_kni *kni)
>   return;
>   }
>
> - allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1)
> - & (MAX_MBUF_BURST_NUM - 1);
> + allocq_free = kni_fifo_free_count(kni->alloc_q);
> + allocq_free = (allocq_free > MAX_MBUF_BURST_NUM) ?
> +   MAX_MBUF_BURST_NUM : allocq_free;
>   for (i = 0; i < allocq_free; i++) {
>   pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
>   if (unlikely(pkts[i] == NULL)) {
>
>>>
> 



Re: [dpdk-dev] [PATCH v3] ethdev: add IPv4 and L4 checksum RSS offload types

2021-06-22 Thread Singh, Aman Deep

Acked-by: Aman Deep Singh 



Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-22 Thread Ananyev, Konstantin
> 
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev,
> > Konstantin
> >
> > >
> > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev,
> > > > Konstantin
> > > >
> > > > > > How can we hide the callbacks since they are used by inline
> > burst
> > > > functions.
> > > > >
> > > > > I probably I owe a better explanation to what I meant in first
> > mail.
> > > > > Otherwise it sounds confusing.
> > > > > I'll try to write a more detailed one in next few days.
> > > >
> > > > Actually I gave it another thought over weekend, and might be we
> > can
> > > > hide rte_eth_dev_cb even in a simpler way. I'd use eth_rx_burst()
> > as
> > > > an example, but the same principle applies to other 'fast'
> > functions.
> > > >
> > > >  1. Needed changes for PMDs rx_pkt_burst():
> > > > a) change function prototype to accept 'uint16_t port_id' and
> > > > 'uint16_t queue_id',
> > > >  instead of current 'void *'.
> > > > b) Each PMD rx_pkt_burst() will have to call
> > rte_eth_rx_epilog()
> > > > function at return.
> > > >  This  inline function will do all CB calls for that queue.
> > > >
> > > > To be more specific, let say we have some PMD: xyz with RX
> > function:
> > > >
> > > > uint16_t
> > > > xyz_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t
> > > > nb_pkts)
> > > > {
> > > >  struct xyz_rx_queue *rxq = rx_queue;
> > > >  uint16_t nb_rx = 0;
> > > >
> > > >  /* do actual stuff here */
> > > > 
> > > > return nb_rx;
> > > > }
> > > >
> > > > It will be transformed to:
> > > >
> > > > uint16_t
> > > > xyz_recv_pkts(uint16_t port_id, uint16_t queue_id, struct rte_mbuf
> > > > **rx_pkts, uint16_t nb_pkts)
> > > > {
> > > >  struct xyz_rx_queue *rxq;
> > > >  uint16_t nb_rx;
> > > >
> > > >  rxq = _rte_eth_rx_prolog(port_id, queue_id);
> > > >  if (rxq == NULL)
> > > >  return 0;
> > > >  nb_rx = _xyz_real_recv_pkts(rxq, rx_pkts, nb_pkts);
> > > >  return _rte_eth_rx_epilog(port_id, queue_id, rx_pkts,
> > > > nb_pkts);
> > > > }
> > > >
> > > > And somewhere in ethdev_private.h:
> > > >
> > > > static inline void *
> > > > _rte_eth_rx_prolog(uint16_t port_id, uint16_t queue_id);
> > > > {
> > > >struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > > >
> > > > #ifdef RTE_ETHDEV_DEBUG_RX
> > > > RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
> > > > RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, NULL);
> > > >
> > > > if (queue_id >= dev->data->nb_rx_queues) {
> > > > RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
> > > > queue_id);
> > > > return NULL;
> > > > }
> > > > #endif
> > > >   return dev->data->rx_queues[queue_id];
> > > > }
> > > >
> > > > static inline uint16_t
> > > > _rte_eth_rx_epilog(uint16_t port_id, uint16_t queue_id, struct
> > rte_mbuf
> > > > **rx_pkts, const uint16_t nb_pkts);
> > > > {
> > > > struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > > >
> > > > #ifdef RTE_ETHDEV_RXTX_CALLBACKS
> > > > struct rte_eth_rxtx_callback *cb;
> > > >
> > > > /* __ATOMIC_RELEASE memory order was used when the
> > > >  * call back was inserted into the list.
> > > >  * Since there is a clear dependency between loading
> > > >  * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
> > > >  * not required.
> > > >  */
> > > > cb = __atomic_load_n(&dev->post_rx_burst_cbs[queue_id],
> > > > __ATOMIC_RELAXED);
> > > >
> > > > if (unlikely(cb != NULL)) {
> > > > do {
> > > > nb_rx = cb->fn.rx(port_id, queue_id,
> > rx_pkts,
> > > > nb_rx,
> > > > nb_pkts, cb-
> > >param);
> > > > cb = cb->next;
> > > > } while (cb != NULL);
> > > > }
> > > > #endif
> > > >
> > > > rte_ethdev_trace_rx_burst(port_id, queue_id, (void
> > **)rx_pkts,
> > > > nb_rx);
> > > > return nb_rx;
> > > >  }
> > >
> > > That would make the compiler inline _rte_eth_rx_epilog() into the
> > driver when compiling the DPDK library. But
> > > RTE_ETHDEV_RXTX_CALLBACKS is a definition for the application
> > developer to use when compiling the DPDK application.
> >
> > I believe it is for both - user app and DPDK drivers.
> > AFAIK, they both have to use the same rte_config.h, otherwise things
> > will be broken.
> > If let say RTE_ETHDEV_RXTX_CALLBACKS is not enabled in ethdev, then
> > user wouldn't be able to add a callback at first place.
> 
> In the case of RTE_ETHDEV_RXTX_CALLBACKS, it is independent:

Not really.
There are a few libraries within DPDK that rely on Rx/Tx callbacks:
bpf, latencystat, pdump, power.
With the approach above, their functionality will be broken:
setup functions will return success, but the actual callbacks will not be invoked.
From othe

Re: [dpdk-dev] [PATCH v1 4/7] power: remove thread safety from PMD power API's

2021-06-22 Thread Ananyev, Konstantin


> Currently, we expect that only one callback can be active at any given
> moment, for a particular queue configuration, which is relatively easy
> to implement in a thread-safe way. However, we're about to add support
> for multiple queues per lcore, which will greatly increase the
> possibility of various race conditions.
> 
> We could have used something like an RCU for this use case, but absent
> of a pressing need for thread safety we'll go the easy way and just
> mandate that the API's are to be called when all affected ports are
> stopped, and document this limitation. This greatly simplifies the
> `rte_power_monitor`-related code.

I think you need to update the release notes with that too.
Another thing: do you really need the whole port stopped?
From what I understand, you work on queues, so it is enough for you
that the related Rx queue is stopped.
So, to make things a bit more robust, in pmgmt_queue_enable/disable
you can call rte_eth_rx_queue_info_get() and check the queue state.
 
> Signed-off-by: Anatoly Burakov 
> ---
>  lib/power/meson.build  |   3 +
>  lib/power/rte_power_pmd_mgmt.c | 106 -
>  lib/power/rte_power_pmd_mgmt.h |   6 ++
>  3 files changed, 35 insertions(+), 80 deletions(-)
> 
> diff --git a/lib/power/meson.build b/lib/power/meson.build
> index c1097d32f1..4f6a242364 100644
> --- a/lib/power/meson.build
> +++ b/lib/power/meson.build
> @@ -21,4 +21,7 @@ headers = files(
>  'rte_power_pmd_mgmt.h',
>  'rte_power_guest_channel.h',
>  )
> +if cc.has_argument('-Wno-cast-qual')
> +cflags += '-Wno-cast-qual'
> +endif
>  deps += ['timer', 'ethdev']
> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
> index db03cbf420..0707c60a4f 100644
> --- a/lib/power/rte_power_pmd_mgmt.c
> +++ b/lib/power/rte_power_pmd_mgmt.c
> @@ -40,8 +40,6 @@ struct pmd_queue_cfg {
>   /**< Callback mode for this queue */
>   const struct rte_eth_rxtx_callback *cur_cb;
>   /**< Callback instance */
> - volatile bool umwait_in_progress;
> - /**< are we currently sleeping? */
>   uint64_t empty_poll_stats;
>   /**< Number of empty polls */
>  } __rte_cache_aligned;
> @@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct 
> rte_mbuf **pkts __rte_unused,
>   struct rte_power_monitor_cond pmc;
>   uint16_t ret;
> 
> - /*
> -  * we might get a cancellation request while being
> -  * inside the callback, in which case the wakeup
> -  * wouldn't work because it would've arrived too early.
> -  *
> -  * to get around this, we notify the other thread that
> -  * we're sleeping, so that it can spin until we're done.
> -  * unsolicited wakeups are perfectly safe.
> -  */
> - q_conf->umwait_in_progress = true;
> -
> - rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> -
> - /* check if we need to cancel sleep */
> - if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
> - /* use monitoring condition to sleep */
> - ret = rte_eth_get_monitor_addr(port_id, qidx,
> - &pmc);
> - if (ret == 0)
> - rte_power_monitor(&pmc, UINT64_MAX);
> - }
> - q_conf->umwait_in_progress = false;
> -
> - rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> + /* use monitoring condition to sleep */
> + ret = rte_eth_get_monitor_addr(port_id, qidx,
> + &pmc);
> + if (ret == 0)
> + rte_power_monitor(&pmc, UINT64_MAX);
>   }
>   } else
>   q_conf->empty_poll_stats = 0;
> @@ -183,6 +162,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int 
> lcore_id, uint16_t port_id,
>  {
>   struct pmd_queue_cfg *queue_cfg;
>   struct rte_eth_dev_info info;
> + rte_rx_callback_fn clb;
>   int ret;
> 
>   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> @@ -232,17 +212,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int 
> lcore_id, uint16_t port_id,
>   ret = -ENOTSUP;
>   goto end;
>   }
> - /* initialize data before enabling the callback */
> - queue_cfg->empty_poll_stats = 0;
> - queue_cfg->cb_mode = mode;
> - queue_cfg->umwait_in_progress = false;
> - queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
> -
> - /* ensure we update our state before callback starts */
> - rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
> -
> - queue_cfg->cur_cb 

Re: [dpdk-dev] [PATCH v1 1/2] devtools: add relative path support for ABI compatibility check

2021-06-22 Thread Bruce Richardson
On Tue, Jun 01, 2021 at 09:56:52AM +0800, Feifei Wang wrote:
> From: Phil Yang 
> 
> Because dpdk guide does not limit the relative path for ABI
> compatibility check, users maybe set 'DPDK_ABI_REF_DIR' as a relative
> path:
> 
> ~/dpdk/devtools$ DPDK_ABI_REF_VERSION=v19.11 DPDK_ABI_REF_DIR=build-gcc-shared
> ./test-meson-builds.sh
> 
> And if the DESTDIR is not an absolute path, ninja complains:
> + install_target build-gcc-shared/v19.11/build 
> build-gcc-shared/v19.11/build-gcc-shared
> + rm -rf build-gcc-shared/v19.11/build-gcc-shared
> + echo 'DESTDIR=build-gcc-shared/v19.11/build-gcc-shared ninja -C 
> build-gcc-shared/v19.11/build install'
> + DESTDIR=build-gcc-shared/v19.11/build-gcc-shared
> + ninja -C build-gcc-shared/v19.11/build install
> ...
> ValueError: dst_dir must be absolute, got 
> build-gcc-shared/v19.11/build-gcc-shared/usr/local/share/dpdk/
> examples/bbdev_app
> ...
> Error: install directory 'build-gcc-shared/v19.11/build-gcc-shared' does not 
> exist.
> 
> To fix this, add relative path support using 'readlink -f'.
> 
> Signed-off-by: Phil Yang 
> Signed-off-by: Feifei Wang 
> Reviewed-by: Juraj Linkeš 
> Reviewed-by: Ruifeng Wang 
> ---
>  devtools/test-meson-builds.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/devtools/test-meson-builds.sh b/devtools/test-meson-builds.sh
> index daf817ac3e..43b906598d 100755
> --- a/devtools/test-meson-builds.sh
> +++ b/devtools/test-meson-builds.sh
> @@ -168,7 +168,8 @@ build () #check> [meson options]
>   config $srcdir $builds_dir/$targetdir $cross --werror $*
>   compile $builds_dir/$targetdir
>   if [ -n "$DPDK_ABI_REF_VERSION" -a "$abicheck" = ABI ] ; then
> - abirefdir=${DPDK_ABI_REF_DIR:-reference}/$DPDK_ABI_REF_VERSION
> + abirefdir=$(readlink -f \
> + ${DPDK_ABI_REF_DIR:-reference}/$DPDK_ABI_REF_VERSION)
>   if [ ! -d $abirefdir/$targetdir ]; then
>   # clone current sources
>   if [ ! -d $abirefdir/src ]; then

This looks like a simple enough change.

Acked-by: Bruce Richardson 


[dpdk-dev] [PATCH 0/3] net/bonding: make dedicated queues work with mlx5

2021-06-22 Thread Martin Havlik
This patchset fixes the inability to use dedicated queues on the mlx5 PMD,
caused by attempting to create the rte_flow rule prior to starting the
device.
A missing return value check and a copy-paste error near the rule
creation have also been fixed.

Cc: Chas Williams 
Cc: "Min Hu (Connor)" 
Cc: Declan Doherty 
Cc: Tomasz Kulasek 
Cc: Jan Viktorin 

Martin Havlik (3):
  net/bonding: fix proper return value check and correct log message
  net/bonding: fix not checked return value
  net/bonding: start ethdev prior to setting 8023ad flow

 drivers/net/bonding/rte_eth_bond_pmd.c | 33 +++---
 1 file changed, 25 insertions(+), 8 deletions(-)

-- 
2.27.0



[dpdk-dev] [PATCH 1/3] net/bonding: fix proper return value check and correct log message

2021-06-22 Thread Martin Havlik
The return value is now saved to errval, and the log message on error
reports the correct function name, no longer uses q_id (which was out of
context), and uses the up-to-date errval.

Fixes: 112891cd27e5 ("net/bonding: add dedicated HW queues for LACP control")
Cc: tomaszx.kula...@intel.com

Signed-off-by: Martin Havlik 
Cc: Jan Viktorin 
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index b01ef003e..4c43bf916 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1805,12 +1805,13 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
!= 0)
return errval;
 
-   if (bond_ethdev_8023ad_flow_verify(bonded_eth_dev,
-   slave_eth_dev->data->port_id) != 0) {
+   errval = bond_ethdev_8023ad_flow_verify(bonded_eth_dev,
+   slave_eth_dev->data->port_id);
+   if (errval != 0) {
RTE_BOND_LOG(ERR,
-   "rte_eth_tx_queue_setup: port=%d queue_id %d, 
err (%d)",
-   slave_eth_dev->data->port_id, q_id, errval);
-   return -1;
+   "bond_ethdev_8023ad_flow_verify: port=%d, err 
(%d)",
+   slave_eth_dev->data->port_id, errval);
+   return errval;
}
 
if 
(internals->mode4.dedicated_queues.flow[slave_eth_dev->data->port_id] != NULL)
-- 
2.27.0



[dpdk-dev] [PATCH 2/3] net/bonding: fix not checked return value

2021-06-22 Thread Martin Havlik
The return value from bond_ethdev_8023ad_flow_set() is now checked and an
appropriate message is logged on error.

Fixes: 112891cd27e5 ("net/bonding: add dedicated HW queues for LACP control")
Cc: tomaszx.kula...@intel.com

Signed-off-by: Martin Havlik 
Cc: Jan Viktorin 
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index 4c43bf916..a6755661c 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1819,8 +1819,14 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,

internals->mode4.dedicated_queues.flow[slave_eth_dev->data->port_id],
&flow_error);
 
-   bond_ethdev_8023ad_flow_set(bonded_eth_dev,
+   errval = bond_ethdev_8023ad_flow_set(bonded_eth_dev,
slave_eth_dev->data->port_id);
+   if (errval != 0) {
+   RTE_BOND_LOG(ERR,
+   "bond_ethdev_8023ad_flow_set: port=%d, err 
(%d)",
+   slave_eth_dev->data->port_id, errval);
+   return errval;
+   }
}
 
/* Start device */
-- 
2.27.0



[dpdk-dev] [PATCH 3/3] net/bonding: start ethdev prior to setting 8023ad flow

2021-06-22 Thread Martin Havlik
When dedicated queues are enabled, mlx5 PMD fails to install RTE Flows
if the underlying ethdev is not started:
bond_ethdev_8023ad_flow_set(267) - bond_ethdev_8023ad_flow_set: port not 
started (slave_port=0 queue_id=1)

Signed-off-by: Martin Havlik 
Cc: Jan Viktorin 
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index a6755661c..fea3bc537 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1818,25 +1818,35 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
rte_flow_destroy(slave_eth_dev->data->port_id,

internals->mode4.dedicated_queues.flow[slave_eth_dev->data->port_id],
&flow_error);
+   }
 
+   /* Start device */
+   errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
+   if (errval != 0) {
+   RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err (%d)",
+   slave_eth_dev->data->port_id, errval);
+   return -1;
+   }
+
+   if (internals->mode == BONDING_MODE_8023AD &&
+   internals->mode4.dedicated_queues.enabled == 1) {
errval = bond_ethdev_8023ad_flow_set(bonded_eth_dev,
slave_eth_dev->data->port_id);
if (errval != 0) {
RTE_BOND_LOG(ERR,
"bond_ethdev_8023ad_flow_set: port=%d, err 
(%d)",
slave_eth_dev->data->port_id, errval);
+
+   errval = rte_eth_dev_stop(slave_eth_dev->data->port_id);
+   if (errval < 0) {
+   RTE_BOND_LOG(ERR,
+   "rte_eth_dev_stop: port=%d, err (%d)",
+   slave_eth_dev->data->port_id, errval);
+   }
return errval;
}
}
 
-   /* Start device */
-   errval = rte_eth_dev_start(slave_eth_dev->data->port_id);
-   if (errval != 0) {
-   RTE_BOND_LOG(ERR, "rte_eth_dev_start: port=%u, err (%d)",
-   slave_eth_dev->data->port_id, errval);
-   return -1;
-   }
-
/* If RSS is enabled for bonding, synchronize RETA */
if (bonded_eth_dev->data->dev_conf.rxmode.mq_mode & ETH_MQ_RX_RSS) {
int i;
-- 
2.27.0



Re: [dpdk-dev] [PATCH v1 5/7] power: support callbacks for multiple Rx queues

2021-06-22 Thread Ananyev, Konstantin


> Currently, there is a hard limitation on the PMD power management
> support that only allows it to support a single queue per lcore. This is
> not ideal as most DPDK use cases will poll multiple queues per core.
> 
> The PMD power management mechanism relies on ethdev Rx callbacks, so it
> is very difficult to implement such support because callbacks are
> effectively stateless and have no visibility into what the other ethdev
> devices are doing.  This places limitations on what we can do within the
> framework of Rx callbacks, but the basics of this implementation are as
> follows:
> 
> - Replace per-queue structures with per-lcore ones, so that any device
>   polled from the same lcore can share data
> - Any queue that is going to be polled from a specific lcore has to be
>   added to the list of cores to poll, so that the callback is aware of
>   other queues being polled by the same lcore
> - Both the empty poll counter and the actual power saving mechanism is
>   shared between all queues polled on a particular lcore, and is only
>   activated when a special designated "power saving" queue is polled. To
>   put it another way, we have no idea which queue the user will poll in
>   what order, so we rely on them telling us that queue X is the last one
>   in the polling loop, so any power management should happen there.
> - A new API is added to mark a specific Rx queue as "power saving".

Honestly, I don't understand the logic behind that new function.
I understand that, depending on HW, we can monitor either one or multiple queues.
That's ok, but why do we now need to mark one queue as a 'very special' one?
Why can't rte_power_ethdev_pmgmt_queue_enable() just:
check whether the number of monitored queues exceeds HW/SW capabilities,
and if so, just return a failure;
otherwise add the queue to the list and treat them all equally, i.e.
go to power-save mode when the number of sequential empty polls on
all monitored queues exceeds the EMPTYPOLL_MAX threshold?
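A rough sketch of that suggested logic (all names and limits here are illustrative, not the actual power library API):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_MONITORED_QUEUES 4 /* illustrative HW/SW capability limit */
#define EMPTYPOLL_MAX 512      /* illustrative empty-poll threshold */

struct lcore_cfg {
	size_t n_queues;      /* queues monitored on this lcore */
	size_t n_empty_polls; /* sequential empty polls, all queues combined */
};

/* Reject queues beyond the capability limit; otherwise all queues are
 * treated equally, with no designated "power saving" queue. */
static int queue_enable(struct lcore_cfg *cfg)
{
	if (cfg->n_queues >= MAX_MONITORED_QUEUES)
		return -ENOTSUP;
	cfg->n_queues++;
	return 0;
}

/* Enter power-save only once every monitored queue has polled empty
 * EMPTYPOLL_MAX times in a row. */
static int should_power_save(const struct lcore_cfg *cfg)
{
	return cfg->n_queues > 0 &&
		cfg->n_empty_polls >= cfg->n_queues * EMPTYPOLL_MAX;
}
```

The trade-off is that the callback must then track empty-poll state across all queues on the lcore rather than per queue.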

>   Failing to call this API will result in no power management, however
>   when having only one queue per core it is obvious which queue is the
>   "power saving" one, so things will still work without this new API for
>   use cases that were previously working without it.
> - The limitation on UMWAIT-based polling is not removed because UMWAIT
>   is incapable of monitoring more than one address.
> 
> Signed-off-by: Anatoly Burakov 
> ---
>  lib/power/rte_power_pmd_mgmt.c | 335 ++---
>  lib/power/rte_power_pmd_mgmt.h |  34 
>  lib/power/version.map  |   3 +
>  3 files changed, 306 insertions(+), 66 deletions(-)
> 
> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
> index 0707c60a4f..60dd21a19c 100644
> --- a/lib/power/rte_power_pmd_mgmt.c
> +++ b/lib/power/rte_power_pmd_mgmt.c
> @@ -33,7 +33,19 @@ enum pmd_mgmt_state {
>   PMD_MGMT_ENABLED
>  };
> 
> -struct pmd_queue_cfg {
> +struct queue {
> + uint16_t portid;
> + uint16_t qid;
> +};

Just a thought: if it would help somehow, it can be changed to:

union queue {
	uint32_t raw;
	struct { uint16_t portid, qid; };
};

That way, the queue find/cmp functions below can operate on single raw
32-bit values.
Probably not that important, as all these functions are on the slow path,
but it might look nicer.
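A self-contained sketch of that layout (the field names follow the patch under review; the compare helper is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the suggested layout: the raw member aliases both 16-bit ids. */
union queue {
	uint32_t raw;
	struct {
		uint16_t portid;
		uint16_t qid;
	};
};

/* Equality now needs a single 32-bit compare instead of two 16-bit ones. */
static inline int queue_equal(const union queue *l, const union queue *r)
{
	return l->raw == r->raw;
}
```

Since portid and qid together cover all four bytes of raw with no padding, the single-word compare is equivalent to comparing both fields.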

> +struct pmd_core_cfg {
> + struct queue queues[RTE_MAX_ETHPORTS];

If we'll have the ability to monitor multiple queues per lcore, will it
always be enough?
On the other hand, it is updated on the control path only.
Wouldn't a normal list with malloc()/rte_malloc() be more suitable here?

> + /**< Which port-queue pairs are associated with this lcore? */
> + struct queue power_save_queue;
> + /**< When polling multiple queues, all but this one will be ignored */
> + bool power_save_queue_set;
> + /**< When polling multiple queues, power save queue must be set */
> + size_t n_queues;
> + /**< How many queues are in the list? */
>   volatile enum pmd_mgmt_state pwr_mgmt_state;
>   /**< State of power management for this queue */
>   enum rte_power_pmd_mgmt_type cb_mode;
> @@ -43,8 +55,97 @@ struct pmd_queue_cfg {
>   uint64_t empty_poll_stats;
>   /**< Number of empty polls */
>  } __rte_cache_aligned;
> +static struct pmd_core_cfg lcore_cfg[RTE_MAX_LCORE];
> 
> -static struct pmd_queue_cfg 
> port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
> +static inline bool
> +queue_equal(const struct queue *l, const struct queue *r)
> +{
> + return l->portid == r->portid && l->qid == r->qid;
> +}
> +
> +static inline void
> +queue_copy(struct queue *dst, const struct queue *src)
> +{
> + dst->portid = src->portid;
> + dst->qid = src->qid;
> +}
> +
> +static inline bool
> +queue_is_power_save(const struct pmd_core_cfg *cfg, const struct queue *q) {

Here and in other places - any reason why standard DPDK coding style is not 
used?

> + const struct queue *pw

[dpdk-dev] Re: [PATCH v1 1/2] net/i40e: improve performance for scalar Tx

2021-06-22 Thread Feifei Wang
Hi, Beilei

Thanks for your comments, please see below.

> -Original Message-
> From: Xing, Beilei 
> Sent: June 22, 2021 14:08
> To: Feifei Wang 
> Cc: dev@dpdk.org; nd ; Ruifeng Wang
> 
> Subject: RE: [PATCH v1 1/2] net/i40e: improve performance for scalar Tx
> 
> 
> 
> > -Original Message-
> > From: Feifei Wang 
> > Sent: Thursday, May 27, 2021 4:17 PM
> > To: Xing, Beilei 
> > Cc: dev@dpdk.org; n...@arm.com; Feifei Wang ;
> > Ruifeng Wang 
> > Subject: [PATCH v1 1/2] net/i40e: improve performance for scalar Tx
> >
> > For i40e scalar Tx path, if implement FAST_FREE_MBUF mode, it means
> > per- queue all mbufs come from the same mempool and have refcnt = 1.
> >
> > Thus we can use bulk free of the buffers when mbuf fast free mode is
> > enabled.
> >
> > For scalar path in arm platform:
> > In n1sdp, performance is improved by 7.8%; In thunderx2, performance
> > is improved by 6.7%.
> >
> > For scalar path in x86 platform,
> > performance is improved by 6%.
> >
> > Suggested-by: Ruifeng Wang 
> > Signed-off-by: Feifei Wang 
> > ---
> >  drivers/net/i40e/i40e_rxtx.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx.c
> > b/drivers/net/i40e/i40e_rxtx.c index
> > 6c58decece..fe7b20f750 100644
> > --- a/drivers/net/i40e/i40e_rxtx.c
> > +++ b/drivers/net/i40e/i40e_rxtx.c
> > @@ -1295,6 +1295,7 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)  {
> > struct i40e_tx_entry *txep;
> > uint16_t i;
> > +   struct rte_mbuf *free[RTE_I40E_TX_MAX_FREE_BUF_SZ];
> >
> > if ((txq->tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
> >
>   rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) != @@ -1308,9
> +1309,11
> > @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
> >
> > if (txq->offloads & DEV_TX_OFFLOAD_MBUF_FAST_FREE) {
> > for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
> > -   rte_mempool_put(txep->mbuf->pool, txep->mbuf);
> > +   free[i] = txep->mbuf;
> 
> The tx_rs_thresh can be 'nb_desc - 3', so if tx_rs_thres >
> RTE_I40E_TX_MAX_FREE_BUF_SZ, there'll be out of bounds, right?

Actually tx_rs_thresh  <=  tx_free_thresh  <  nb_desc - 3 
(i40e_dev_tx_queue_setup).
However, I don't know how it affects the relationship between tx_rs_thresh and
RTE_I40E_TX_MAX_FREE_BUF_SZ.

Furthermore, I think you are right that tx_rs_thresh can be greater than
RTE_I40E_TX_MAX_FREE_BUF_SZ in tx_simple_mode (i40e_set_tx_function_flag).

Thus, in scalar path, we can change like:
---
int n = txq->tx_rs_thresh;
int32_t i = 0, j = 0;
const int32_t k = RTE_ALIGN_FLOOR(n, RTE_I40E_TX_MAX_FREE_BUF_SZ);
const int32_t m = n % RTE_I40E_TX_MAX_FREE_BUF_SZ;
struct rte_mbuf *free[RTE_I40E_TX_MAX_FREE_BUF_SZ];

For FAST_FREE_MODE:

if (k) {
	for (j = 0; j != k - RTE_I40E_TX_MAX_FREE_BUF_SZ;
			j += RTE_I40E_TX_MAX_FREE_BUF_SZ) {
		for (i = 0; i < RTE_I40E_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
			free[i] = txep->mbuf;
			txep->mbuf = NULL;
		}
		rte_mempool_put_bulk(free[0]->pool, (void **)free,
				RTE_I40E_TX_MAX_FREE_BUF_SZ);
	}
} else {
	for (i = 0; i < m; ++i, ++txep) {
		free[i] = txep->mbuf;
		txep->mbuf = NULL;
	}
	rte_mempool_put_bulk(free[0]->pool, (void **)free, m);
}
---

Best Regards
Feifei


Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-22 Thread Morten Brørup
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev,
> Konstantin
> 
> >
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev,
> > > Konstantin
> > >
> > > >
> > > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev,
> > > > > Konstantin
> > > > >
> > > > > > > How can we hide the callbacks since they are used by inline
> > > burst
> > > > > functions.
> > > > > >
> > > > > > I probably I owe a better explanation to what I meant in
> first
> > > mail.
> > > > > > Otherwise it sounds confusing.
> > > > > > I'll try to write a more detailed one in next few days.
> > > > >
> > > > > Actually I gave it another thought over weekend, and might be
> we
> > > can
> > > > > hide rte_eth_dev_cb even in a simpler way. I'd use
> eth_rx_burst()
> > > as
> > > > > an example, but the same principle applies to other 'fast'
> > > functions.
> > > > >
> > > > >  1. Needed changes for PMDs rx_pkt_burst():
> > > > > a) change function prototype to accept 'uint16_t port_id'
> and
> > > > > 'uint16_t queue_id',
> > > > >  instead of current 'void *'.
> > > > > b) Each PMD rx_pkt_burst() will have to call
> > > rte_eth_rx_epilog()
> > > > > function at return.
> > > > >  This  inline function will do all CB calls for that
> queue.
> > > > >
> > > > > To be more specific, let say we have some PMD: xyz with RX
> > > function:
> > > > >
> > > > > uint16_t
> > > > > xyz_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
> uint16_t
> > > > > nb_pkts)
> > > > > {
> > > > >  struct xyz_rx_queue *rxq = rx_queue;
> > > > >  uint16_t nb_rx = 0;
> > > > >
> > > > >  /* do actual stuff here */
> > > > > 
> > > > > return nb_rx;
> > > > > }
> > > > >
> > > > > It will be transformed to:
> > > > >
> > > > > uint16_t
> > > > > xyz_recv_pkts(uint16_t port_id, uint16_t queue_id, struct
> rte_mbuf
> > > > > **rx_pkts, uint16_t nb_pkts)
> > > > > {
> > > > >  struct xyz_rx_queue *rxq;
> > > > >  uint16_t nb_rx;
> > > > >
> > > > >  rxq = _rte_eth_rx_prolog(port_id, queue_id);
> > > > >  if (rxq == NULL)
> > > > >  return 0;
> > > > >  nb_rx = _xyz_real_recv_pkts(rxq, rx_pkts, nb_pkts);
> > > > >  return _rte_eth_rx_epilog(port_id, queue_id, rx_pkts,
> > > > > nb_pkts);
> > > > > }
> > > > >
> > > > > And somewhere in ethdev_private.h:
> > > > >
> > > > > static inline void *
> > > > > _rte_eth_rx_prolog(uint16_t port_id, uint16_t queue_id);
> > > > > {
> > > > >struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > > > >
> > > > > #ifdef RTE_ETHDEV_DEBUG_RX
> > > > > RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
> > > > > RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, NULL);
> > > > >
> > > > > if (queue_id >= dev->data->nb_rx_queues) {
> > > > > RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
> > > > > queue_id);
> > > > > return NULL;
> > > > > }
> > > > > #endif
> > > > >   return dev->data->rx_queues[queue_id];
> > > > > }
> > > > >
> > > > > static inline uint16_t
> > > > > _rte_eth_rx_epilog(uint16_t port_id, uint16_t queue_id, struct
> > > rte_mbuf
> > > > > **rx_pkts, const uint16_t nb_pkts);
> > > > > {
> > > > > struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > > > >
> > > > > #ifdef RTE_ETHDEV_RXTX_CALLBACKS
> > > > > struct rte_eth_rxtx_callback *cb;
> > > > >
> > > > > /* __ATOMIC_RELEASE memory order was used when the
> > > > >  * call back was inserted into the list.
> > > > >  * Since there is a clear dependency between loading
> > > > >  * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory
> order is
> > > > >  * not required.
> > > > >  */
> > > > > cb = __atomic_load_n(&dev->post_rx_burst_cbs[queue_id],
> > > > > __ATOMIC_RELAXED);
> > > > >
> > > > > if (unlikely(cb != NULL)) {
> > > > > do {
> > > > > nb_rx = cb->fn.rx(port_id, queue_id,
> > > rx_pkts,
> > > > > nb_rx,
> > > > > nb_pkts, cb-
> > > >param);
> > > > > cb = cb->next;
> > > > > } while (cb != NULL);
> > > > > }
> > > > > #endif
> > > > >
> > > > > rte_ethdev_trace_rx_burst(port_id, queue_id, (void
> > > **)rx_pkts,
> > > > > nb_rx);
> > > > > return nb_rx;
> > > > >  }
> > > >
> > > > That would make the compiler inline _rte_eth_rx_epilog() into the
> > > driver when compiling the DPDK library. But
> > > > RTE_ETHDEV_RXTX_CALLBACKS is a definition for the application
> > > developer to use when compiling the DPDK application.
> > >
> > > I believe it is for both - user app and DPDK drivers.
> > > AFAIK, they both have to use the same rte_config.h, otherwise
> things
> > > will be broken.
> > > If let say RTE_ETHDEV_RXTX_CALLBACKS is not enabled in ethdev, then
> > > user wouldn't be able to 

[dpdk-dev] Re: [PATCH v1 1/2] net/i40e: improve performance for scalar Tx

2021-06-22 Thread Feifei Wang
Sorry, there was a mistake in the code; it should be:

int n = txq->tx_rs_thresh;
int32_t i = 0, j = 0;
const int32_t k = RTE_ALIGN_FLOOR(n, RTE_I40E_TX_MAX_FREE_BUF_SZ);
const int32_t m = n % RTE_I40E_TX_MAX_FREE_BUF_SZ;
struct rte_mbuf *free[RTE_I40E_TX_MAX_FREE_BUF_SZ];

For FAST_FREE_MODE:

if (k) {
	for (j = 0; j != k - RTE_I40E_TX_MAX_FREE_BUF_SZ;
			j += RTE_I40E_TX_MAX_FREE_BUF_SZ) {
		for (i = 0; i < RTE_I40E_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
			free[i] = txep->mbuf;
			txep->mbuf = NULL;
		}
		rte_mempool_put_bulk(free[0]->pool, (void **)free,
				RTE_I40E_TX_MAX_FREE_BUF_SZ);
	}
}

if (m) {
	for (i = 0; i < m; ++i, ++txep) {
		free[i] = txep->mbuf;
		txep->mbuf = NULL;
	}
	rte_mempool_put_bulk(free[0]->pool, (void **)free, m);
}



[dpdk-dev] [PATCH v5] devtools: script to track map symbols

2021-06-22 Thread Ray Kinsella
Script to track growth of stable and experimental symbols
over releases since v19.11.

Signed-off-by: Ray Kinsella 
---
v2: reworked to fix pylint errors
v3: sent with the correct in-reply-to
v4: fix typos picked up by the CI
v5: fix terminal_size & directory args

 devtools/count_symbols.py | 262 ++
 1 file changed, 262 insertions(+)
 create mode 100755 devtools/count_symbols.py

diff --git a/devtools/count_symbols.py b/devtools/count_symbols.py
new file mode 100755
index 00..96990f609f
--- /dev/null
+++ b/devtools/count_symbols.py
@@ -0,0 +1,262 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 Intel Corporation
+'''Tool to count the number of symbols in each DPDK release'''
+from pathlib import Path
+import sys
+import os
+import subprocess
+import argparse
+import re
+import datetime
+
+try:
+from parsley import makeGrammar
+except ImportError:
+print('This script uses the package Parsley to parse C Mapfiles.\n'
  'This can be installed with \"pip install parsley\".')
+sys.exit()
+
+MAP_GRAMMAR = r"""
+
+ws = (' ' | '\r' | '\n' | '\t')*
+
+ABI_VER = ({})
+DPDK_VER = ('DPDK_' ABI_VER)
+ABI_NAME = ('INTERNAL' | 'EXPERIMENTAL' | DPDK_VER)
+comment = '#' (~'\n' anything)+ '\n'
+symbol = (~(';' | '}}' | '#') anything )+:c ';' -> ''.join(c)
+global = 'global:'
+local = 'local: *;'
+symbols = comment* symbol:s ws comment* -> s
+
+abi = (abi_section+):m -> dict(m)
+abi_section = (ws ABI_NAME:e ws '{{' ws global* (~local ws symbols)*:s ws local* ws '}}' ws DPDK_VER* ';' ws) -> (e,s)
+"""
+
+def get_abi_versions():
+'''Returns a string of possible dpdk abi versions'''
+
+year = datetime.date.today().year - 2000
+tags = " |".join(['\'{}\''.format(i) \
+ for i in reversed(range(21, year + 1)) ])
+tags  = tags + ' | \'20.0.1\' | \'20.0\' | \'20\''
+
+return tags
+
+def get_dpdk_releases():
+'''Returns a list of dpdk release tag names since v19.11'''
+
+year = datetime.date.today().year - 2000
+year_range = "|".join("{}".format(i) for i in range(19,year + 1))
+pattern = re.compile(r'^\"v(' +  year_range + r')\.\d{2}\"$')
+
+cmd = ['git', 'for-each-ref', '--sort=taggerdate', '--format', '"%(tag)"']
+try:
+result = subprocess.run(cmd, \
+stdout=subprocess.PIPE, \
+stderr=subprocess.PIPE,
+check=True)
+except subprocess.CalledProcessError:
+print("Failed to interrogate git for release tags")
+sys.exit()
+
+
+tags = result.stdout.decode('utf-8').split('\n')
+
+# find the non-rcs between now and v19.11
+tags = [ tag.replace('\"','') \
+ for tag in reversed(tags) \
+ if pattern.match(tag) ][:-3]
+
+return tags
+
+def fix_directory_name(path):
+'''Prepend librte to the source directory name'''
+mapfilepath1 = str(path.parent.name)
+mapfilepath2 = str(path.parents[1])
+mapfilepath = mapfilepath2 + '/librte_' + mapfilepath1
+
+return mapfilepath
+
+def directory_renamed(path, rel):
+'''Fix removal of the librte_ from the directory names'''
+
+mapfilepath = fix_directory_name(path)
+tagfile = '{}:{}/{}'.format(rel, mapfilepath,  path.name)
+
+try:
+result = subprocess.run(['git', 'show', tagfile], \
+stdout=subprocess.PIPE, \
+stderr=subprocess.PIPE,
+check=True)
+except subprocess.CalledProcessError:
+result = None
+
+return result
+
+def mapfile_renamed(path, rel):
+'''Fix renaming of the map file'''
+newfile = None
+
+result = subprocess.run(['git', 'ls-tree', \
+ rel, str(path.parent) + '/'], \
+stdout=subprocess.PIPE, \
+stderr=subprocess.PIPE,
+check=True)
+dentries = result.stdout.decode('utf-8')
+dentries = dentries.split('\n')
+
+# filter entries looking for the map file
+dentries = [dentry for dentry in dentries if dentry.endswith('.map')]
+if len(dentries) > 1 or len(dentries) == 0:
+return None
+
+dparts = dentries[0].split('/')
+newfile = dparts[len(dparts) - 1]
+
+if newfile is not None:
+tagfile = '{}:{}/{}'.format(rel, path.parent, newfile)
+
+try:
+result = subprocess.run(['git', 'show', tagfile], \
+stdout=subprocess.PIPE, \
+stderr=subprocess.PIPE,
+check=True)
+except subprocess.CalledProcessError:
+result = None
+
+else:
+result = None
+
+return result
+
+def mapfile_and_directory_renamed(path, rel):
+'''Fix renaming of the map file & the source directory'''
+mapfilepath = Path("{}/{}".fo

Re: [dpdk-dev] [PATCH v3] lib/rte_rib6: fix stack buffer overflow

2021-06-22 Thread Medvedkin, Vladimir

Hi Owen, David,

Apart from David's comments looks good to me.


On 22/06/2021 10:10, David Marchand wrote:

On Mon, Jun 21, 2021 at 3:28 PM  wrote:


From: Owen Hilyard 


Hi Owen, Vladimir,


Owen, two comments on the patch title.

- We (try to) never prefix with lib/, as it gives no additional info.
The prefix should be the library name.
There were some transgressions to this rule, but this was Thomas or me
being absent minded.

For other parts of the tree, it is a bit more complex, but if unsure,
the simplest is to look at the git history.
Here this is the rib library, so "rib: " is enough.


- The title purpose is to give a hint of the functional impact: people
looking for fixes for a type of bug can find it more easily.

Here, just indicating we are fixing a buffer overflow won't help judge
in which use case the issue happened.
How about: "rib: fix max depth IPv6 lookup"




ASAN found a stack buffer overflow in lib/rib/rte_rib6.c:get_dir.
The fix for the stack buffer overflow was to make sure depth
was always < 128, since when depth = 128 it caused the index
into the ip address to be 16, which read off the end of the array.

While trying to solve the buffer overflow, I noticed that a few
changes could be made to remove the for loop entirely.

Fixes: f7e861e21c ("rib: support IPv6")

Cc: sta...@dpdk.org



Signed-off-by: Owen Hilyard 



Vladimir, can you review this fix?

Thanks!



Acked-by: Vladimir Medvedkin 

--
Regards,
Vladimir


[dpdk-dev] [PATCH v2] kni: fix wrong mbuf alloc count in kni_allocate_mbufs

2021-06-22 Thread wangyunjian
From: Yunjian Wang 

In kni_allocate_mbufs(), we allocate mbufs for the alloc_q with this code:
allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1) \
	& (MAX_MBUF_BURST_NUM - 1);
The value of allocq_free may be zero, for example:
The ring size is 1024. After init, write = read = 0. Then we fill
kni->alloc_q to full. At this time, write = 1023, read = 0.

Then the kernel sends 32 packets to userspace. At this time, write
= 1023, read = 32. And then the userspace receives these 32 packets.
Then it fills the kni->alloc_q: (32 - 1023 - 1) & 31 = 0, so nothing is filled.
...
Then the kernel sends 32 packets to userspace. At this time, write
= 1023, read = 992. And then the userspace receives these 32 packets.
Then it fills the kni->alloc_q: (992 - 1023 - 1) & 31 = 0, so nothing is filled.

Then the kernel sends 32 packets to userspace. The kni->alloc_q only
has 31 mbufs and will drop one packet.

Admittedly, this is a special scenario. Normally, it will fill some
mbufs every time, but that may not be enough for the kernel to use.

In this patch, we always keep the kni->alloc_q full for the kernel
to use.

Fixes: 49da4e82cf94 ("kni: allocate no more mbuf than empty slots in queue")
Cc: sta...@dpdk.org

Signed-off-by: Cheng Liu 
Signed-off-by: Yunjian Wang 
---
v2:
   add fixes tag and update commit log
---
 lib/kni/rte_kni.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/kni/rte_kni.c b/lib/kni/rte_kni.c
index 9dae6a8d7c..eb24b0d0ae 100644
--- a/lib/kni/rte_kni.c
+++ b/lib/kni/rte_kni.c
@@ -677,8 +677,9 @@ kni_allocate_mbufs(struct rte_kni *kni)
return;
}
 
-   allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1)
-   & (MAX_MBUF_BURST_NUM - 1);
+   allocq_free = kni_fifo_free_count(kni->alloc_q);
+   allocq_free = (allocq_free > MAX_MBUF_BURST_NUM) ?
+   MAX_MBUF_BURST_NUM : allocq_free;
for (i = 0; i < allocq_free; i++) {
pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
if (unlikely(pkts[i] == NULL)) {
-- 
2.23.0



Re: [dpdk-dev] [PATCH] lib/flow_classify: fix leaking rules on delete

2021-06-22 Thread Iremonger, Bernard
Hi David, Owen,

> -Original Message-
> From: David Marchand 
> Sent: Tuesday, June 22, 2021 8:24 AM
> To: Owen Hilyard ; Iremonger, Bernard
> ; Yigit, Ferruh 
> Cc: dev 
> Subject: Re: [PATCH] lib/flow_classify: fix leaking rules on delete
> 
> On Wed, Jun 16, 2021 at 9:57 PM  wrote:
> >
> > From: Owen Hilyard 
> >
> > Rules in a classify table were not freed if the table had a delete
> > function.
> >
> > Fixes: be41ac2a3 ("flow_classify: introduce flow classify library")
> Cc: sta...@dpdk.org
> 
> >
> > Signed-off-by: Owen Hilyard 
> > ---
> >  lib/flow_classify/rte_flow_classify.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/lib/flow_classify/rte_flow_classify.c
> > b/lib/flow_classify/rte_flow_classify.c
> > index f125267e8..06aed3b70 100644
> > --- a/lib/flow_classify/rte_flow_classify.c
> > +++ b/lib/flow_classify/rte_flow_classify.c
> > @@ -579,7 +579,7 @@ rte_flow_classify_table_entry_delete(struct
> rte_flow_classifier *cls,
> > &rule->u.key.key_del,
> > &rule->key_found,
> > &rule->entry);
> > -
> > +   free(rule);
> > return ret;
> > }
> > }
> 
> I find it strange to free the rule regardless of the result of the
> f_delete() op.

I agree the result of the f_delete() op should be checked before freeing the 
rule.

> The same is done out of the loop which means this function returns -EINVAL
> and frees the rule in this case too.

The free() outside the loop at line 587 does not make sense to me now and 
should be removed.

> 
> Bernard, Ferruh, can you review please?
> 
> Thanks!
> 
> 
> --
> David Marchand

Regards,

Bernard.


Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-22 Thread Ananyev, Konstantin


> 
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev,
> > Konstantin
> >
> > >
> > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev,
> > > > Konstantin
> > > >
> > > > >
> > > > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev,
> > > > > > Konstantin
> > > > > >
> > > > > > > > How can we hide the callbacks since they are used by inline
> > > > burst
> > > > > > functions.
> > > > > > >
> > > > > > > I probably I owe a better explanation to what I meant in
> > first
> > > > mail.
> > > > > > > Otherwise it sounds confusing.
> > > > > > > I'll try to write a more detailed one in next few days.
> > > > > >
> > > > > > Actually I gave it another thought over weekend, and might be
> > we
> > > > can
> > > > > > hide rte_eth_dev_cb even in a simpler way. I'd use
> > eth_rx_burst()
> > > > as
> > > > > > an example, but the same principle applies to other 'fast'
> > > > functions.
> > > > > >
> > > > > >  1. Needed changes for PMDs rx_pkt_burst():
> > > > > > a) change function prototype to accept 'uint16_t port_id'
> > and
> > > > > > 'uint16_t queue_id',
> > > > > >  instead of current 'void *'.
> > > > > > b) Each PMD rx_pkt_burst() will have to call
> > > > rte_eth_rx_epilog()
> > > > > > function at return.
> > > > > >  This  inline function will do all CB calls for that
> > queue.
> > > > > >
> > > > > > To be more specific, let say we have some PMD: xyz with RX
> > > > function:
> > > > > >
> > > > > > uint16_t
> > > > > > xyz_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
> > uint16_t
> > > > > > nb_pkts)
> > > > > > {
> > > > > >  struct xyz_rx_queue *rxq = rx_queue;
> > > > > >  uint16_t nb_rx = 0;
> > > > > >
> > > > > >  /* do actual stuff here */
> > > > > > 
> > > > > > return nb_rx;
> > > > > > }
> > > > > >
> > > > > > It will be transformed to:
> > > > > >
> > > > > > uint16_t
> > > > > > xyz_recv_pkts(uint16_t port_id, uint16_t queue_id, struct
> > rte_mbuf
> > > > > > **rx_pkts, uint16_t nb_pkts)
> > > > > > {
> > > > > >  struct xyz_rx_queue *rxq;
> > > > > >  uint16_t nb_rx;
> > > > > >
> > > > > >  rxq = _rte_eth_rx_prolog(port_id, queue_id);
> > > > > >  if (rxq == NULL)
> > > > > >  return 0;
> > > > > >  nb_rx = _xyz_real_recv_pkts(rxq, rx_pkts, nb_pkts);
> > > > > >  return _rte_eth_rx_epilog(port_id, queue_id, rx_pkts,
> > > > > > nb_pkts);
> > > > > > }
> > > > > >
> > > > > > And somewhere in ethdev_private.h:
> > > > > >
> > > > > > static inline void *
> > > > > > _rte_eth_rx_prolog(uint16_t port_id, uint16_t queue_id);
> > > > > > {
> > > > > >struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > > > > >
> > > > > > #ifdef RTE_ETHDEV_DEBUG_RX
> > > > > > RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
> > > > > > RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, NULL);
> > > > > >
> > > > > > if (queue_id >= dev->data->nb_rx_queues) {
> > > > > > RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n",
> > > > > > queue_id);
> > > > > > return NULL;
> > > > > > }
> > > > > > #endif
> > > > > >   return dev->data->rx_queues[queue_id];
> > > > > > }
> > > > > >
> > > > > > static inline uint16_t
> > > > > > _rte_eth_rx_epilog(uint16_t port_id, uint16_t queue_id, struct
> > > > rte_mbuf
> > > > > > **rx_pkts, const uint16_t nb_pkts);
> > > > > > {
> > > > > > struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > > > > >
> > > > > > #ifdef RTE_ETHDEV_RXTX_CALLBACKS
> > > > > > struct rte_eth_rxtx_callback *cb;
> > > > > >
> > > > > > /* __ATOMIC_RELEASE memory order was used when the
> > > > > >  * call back was inserted into the list.
> > > > > >  * Since there is a clear dependency between loading
> > > > > >  * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory
> > order is
> > > > > >  * not required.
> > > > > >  */
> > > > > > cb = __atomic_load_n(&dev->post_rx_burst_cbs[queue_id],
> > > > > > __ATOMIC_RELAXED);
> > > > > >
> > > > > > if (unlikely(cb != NULL)) {
> > > > > > do {
> > > > > > nb_rx = cb->fn.rx(port_id, queue_id,
> > > > rx_pkts,
> > > > > > nb_rx,
> > > > > > nb_pkts, cb-
> > > > >param);
> > > > > > cb = cb->next;
> > > > > > } while (cb != NULL);
> > > > > > }
> > > > > > #endif
> > > > > >
> > > > > > rte_ethdev_trace_rx_burst(port_id, queue_id, (void
> > > > **)rx_pkts,
> > > > > > nb_rx);
> > > > > > return nb_rx;
> > > > > >  }
> > > > >
> > > > > That would make the compiler inline _rte_eth_rx_epilog() into the
> > > > driver when compiling the DPDK library. But
> > > > > RTE_ETHDEV_RXTX_CALLBACKS is a definition for the application
> > > > developer to use when compiling the DPDK appli

Re: [dpdk-dev] [PATCH v2] kni: fix wrong mbuf alloc count in kni_allocate_mbufs

2021-06-22 Thread Ferruh Yigit
On 6/22/2021 11:57 AM, wangyunjian wrote:
> From: Yunjian Wang 
> 
> In kni_allocate_mbufs(), we alloc mbuf for alloc_q as this code.
> allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1) \
>   & (MAX_MBUF_BURST_NUM - 1);
> The value of allocq_free maybe zero, for example :
> The ring size is 1024. After init, write = read = 0. Then we fill
> kni->alloc_q to full. At this time, write = 1023, read = 0.
> 
> Then the kernel send 32 packets to userspace. At this time, write
> = 1023, read = 32. And then the userspace receive this 32 packets.
> Then fill the kni->alloc_q, (32 - 1023 - 1) & 31 = 0, fill nothing.
> ...
> Then the kernel send 32 packets to userspace. At this time, write
> = 1023, read = 992. And then the userspace receive this 32 packets.
> Then fill the kni->alloc_q, (992 - 1023 - 1) & 31 = 0, fill nothing.
> 
> Then the kernel send 32 packets to userspace. The kni->alloc_q only
> has 31 mbufs and will drop one packet.
> 
> Absolutely, this is a special scene. Normally, it will fill some
> mbufs everytime, but may not enough for the kernel to use.
> 
> In this patch, we always keep the kni->alloc_q to full for the kernel
> to use.
> 
> Fixes: 49da4e82cf94 ("kni: allocate no more mbuf than empty slots in queue")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Cheng Liu 
> Signed-off-by: Yunjian Wang 

Acked-by: Ferruh Yigit 

What do you think to change patch title to something like:
kni: fix mbuf allocation for alloc FIFO


Re: [dpdk-dev] [PATCH v2] kni: fix wrong mbuf alloc count in kni_allocate_mbufs

2021-06-22 Thread wangyunjian
> -Original Message-
> From: Ferruh Yigit [mailto:ferruh.yi...@intel.com]
> Sent: Tuesday, June 22, 2021 8:28 PM
> To: wangyunjian ; dev@dpdk.org
> Cc: tho...@monjalon.net; gowrishanka...@linux.vnet.ibm.com;
> dingxiaoxiong ; sta...@dpdk.org; liucheng (J)
> 
> Subject: Re: [PATCH v2] kni: fix wrong mbuf alloc count in kni_allocate_mbufs
> 
> On 6/22/2021 11:57 AM, wangyunjian wrote:
> > From: Yunjian Wang 
> >
> > In kni_allocate_mbufs(), we alloc mbuf for alloc_q as this code.
> > allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1) \
> > & (MAX_MBUF_BURST_NUM - 1);
> > The value of allocq_free maybe zero, for example :
> > The ring size is 1024. After init, write = read = 0. Then we fill
> > kni->alloc_q to full. At this time, write = 1023, read = 0.
> >
> > Then the kernel send 32 packets to userspace. At this time, write =
> > 1023, read = 32. And then the userspace receive this 32 packets.
> > Then fill the kni->alloc_q, (32 - 1023 - 1) & 31 = 0, fill nothing.
> > ...
> > Then the kernel send 32 packets to userspace. At this time, write =
> > 1023, read = 992. And then the userspace receive this 32 packets.
> > Then fill the kni->alloc_q, (992 - 1023 - 1) & 31 = 0, fill nothing.
> >
> > Then the kernel send 32 packets to userspace. The kni->alloc_q only
> > has 31 mbufs and will drop one packet.
> >
> > Absolutely, this is a special scene. Normally, it will fill some mbufs
> > everytime, but may not enough for the kernel to use.
> >
> > In this patch, we always keep the kni->alloc_q to full for the kernel
> > to use.
> >
> > Fixes: 49da4e82cf94 ("kni: allocate no more mbuf than empty slots in
> > queue")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Cheng Liu 
> > Signed-off-by: Yunjian Wang 
> 
> Acked-by: Ferruh Yigit 
> 
> What do you think to change patch title to something like:
> kni: fix mbuf allocation for alloc FIFO

OK, I will change patch title later.

Thanks


[dpdk-dev] [PATCH] kni: fix crash on userspace VA for segmented packets

2021-06-22 Thread Ferruh Yigit
When IOVA=VA, address translation for segmented packets is wrong: it
assumes the address in mbuf->next is a physical address, while it is
a VA address.

Fixing the address translation to work in both PA & VA modes.

Fixes: e73831dc6c26 ("kni: support userspace VA")
Cc: sta...@dpdk.org

Signed-off-by: Ferruh Yigit 
---
Cc: vattun...@marvell.com
---
 kernel/linux/kni/kni_net.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index f259327954b2..611719b5ee27 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -245,7 +245,7 @@ kni_fifo_trans_pa2va(struct kni_dev *kni,
break;
 
prev_kva = kva;
-   kva = pa2kva(kva->next);
+   kva = get_kva(kni, kva->next);
/* Convert physical address to virtual address 
*/
prev_kva->next = pa2va(prev_kva->next, kva);
}
@@ -422,7 +422,7 @@ kni_net_rx_normal(struct kni_dev *kni)
break;
 
prev_kva = kva;
-   kva = pa2kva(kva->next);
+   kva = get_kva(kni, kva->next);
data_kva = kva2data_kva(kva);
/* Convert physical address to virtual address 
*/
prev_kva->next = pa2va(prev_kva->next, kva);
@@ -501,7 +501,7 @@ kni_net_rx_lo_fifo(struct kni_dev *kni)
kni->va[i] = pa2va(kni->pa[i], kva);
 
while (kva->next) {
-   next_kva = pa2kva(kva->next);
+   next_kva = get_kva(kni, kva->next);
/* Convert physical address to virtual address 
*/
kva->next = pa2va(kva->next, next_kva);
kva = next_kva;
-- 
2.31.1



[dpdk-dev] [PATCH v2 1/2] power: don't use rte prefix in internal code

2021-06-22 Thread David Hunt
From: Anatoly Burakov 

Currently, ACPI code uses rte_power_info as the struct name, which
gives the appearance that this is an externally visible API. Fix to
use internal namespace.

Signed-off-by: Anatoly Burakov 
Acked-by: David Hunt 
---
 lib/power/power_acpi_cpufreq.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/lib/power/power_acpi_cpufreq.c b/lib/power/power_acpi_cpufreq.c
index d028a9947f..1b8c69cc8b 100644
--- a/lib/power/power_acpi_cpufreq.c
+++ b/lib/power/power_acpi_cpufreq.c
@@ -78,7 +78,7 @@ enum power_state {
 /**
  * Power info per lcore.
  */
-struct rte_power_info {
+struct acpi_power_info {
unsigned int lcore_id;   /**< Logical core id */
uint32_t freqs[RTE_MAX_LCORE_FREQS]; /**< Frequency array */
uint32_t nb_freqs;   /**< number of available freqs */
@@ -90,14 +90,14 @@ struct rte_power_info {
uint16_t turbo_enable;   /**< Turbo Boost enable/disable */
 } __rte_cache_aligned;
 
-static struct rte_power_info lcore_power_info[RTE_MAX_LCORE];
+static struct acpi_power_info lcore_power_info[RTE_MAX_LCORE];
 
 /**
  * It is to set specific freq for specific logical core, according to the index
  * of supported frequencies.
  */
 static int
-set_freq_internal(struct rte_power_info *pi, uint32_t idx)
+set_freq_internal(struct acpi_power_info *pi, uint32_t idx)
 {
if (idx >= RTE_MAX_LCORE_FREQS || idx >= pi->nb_freqs) {
RTE_LOG(ERR, POWER, "Invalid frequency index %u, which "
@@ -133,7 +133,7 @@ set_freq_internal(struct rte_power_info *pi, uint32_t idx)
  * governor will be saved for rolling back.
  */
 static int
-power_set_governor_userspace(struct rte_power_info *pi)
+power_set_governor_userspace(struct acpi_power_info *pi)
 {
FILE *f;
int ret = -1;
@@ -189,7 +189,7 @@ power_set_governor_userspace(struct rte_power_info *pi)
  * sys file.
  */
 static int
-power_get_available_freqs(struct rte_power_info *pi)
+power_get_available_freqs(struct acpi_power_info *pi)
 {
FILE *f;
int ret = -1, i, count;
@@ -259,7 +259,7 @@ power_get_available_freqs(struct rte_power_info *pi)
  * It is to fopen the sys file for the future setting the lcore frequency.
  */
 static int
-power_init_for_setting_freq(struct rte_power_info *pi)
+power_init_for_setting_freq(struct acpi_power_info *pi)
 {
FILE *f;
char fullpath[PATH_MAX];
@@ -299,7 +299,7 @@ power_acpi_cpufreq_check_supported(void)
 int
 power_acpi_cpufreq_init(unsigned int lcore_id)
 {
-   struct rte_power_info *pi;
+   struct acpi_power_info *pi;
uint32_t exp_state;
 
if (lcore_id >= RTE_MAX_LCORE) {
@@ -374,7 +374,7 @@ power_acpi_cpufreq_init(unsigned int lcore_id)
  * needed by writing the sys file.
  */
 static int
-power_set_governor_original(struct rte_power_info *pi)
+power_set_governor_original(struct acpi_power_info *pi)
 {
FILE *f;
int ret = -1;
@@ -420,7 +420,7 @@ power_set_governor_original(struct rte_power_info *pi)
 int
 power_acpi_cpufreq_exit(unsigned int lcore_id)
 {
-   struct rte_power_info *pi;
+   struct acpi_power_info *pi;
uint32_t exp_state;
 
if (lcore_id >= RTE_MAX_LCORE) {
@@ -475,7 +475,7 @@ power_acpi_cpufreq_exit(unsigned int lcore_id)
 uint32_t
 power_acpi_cpufreq_freqs(unsigned int lcore_id, uint32_t *freqs, uint32_t num)
 {
-   struct rte_power_info *pi;
+   struct acpi_power_info *pi;
 
if (lcore_id >= RTE_MAX_LCORE) {
RTE_LOG(ERR, POWER, "Invalid lcore ID\n");
@@ -522,7 +522,7 @@ power_acpi_cpufreq_set_freq(unsigned int lcore_id, uint32_t index)
 int
 power_acpi_cpufreq_freq_down(unsigned int lcore_id)
 {
-   struct rte_power_info *pi;
+   struct acpi_power_info *pi;
 
if (lcore_id >= RTE_MAX_LCORE) {
RTE_LOG(ERR, POWER, "Invalid lcore ID\n");
@@ -540,7 +540,7 @@ power_acpi_cpufreq_freq_down(unsigned int lcore_id)
 int
 power_acpi_cpufreq_freq_up(unsigned int lcore_id)
 {
-   struct rte_power_info *pi;
+   struct acpi_power_info *pi;
 
if (lcore_id >= RTE_MAX_LCORE) {
RTE_LOG(ERR, POWER, "Invalid lcore ID\n");
@@ -581,7 +581,7 @@ power_acpi_cpufreq_freq_max(unsigned int lcore_id)
 int
 power_acpi_cpufreq_freq_min(unsigned int lcore_id)
 {
-   struct rte_power_info *pi;
+   struct acpi_power_info *pi;
 
if (lcore_id >= RTE_MAX_LCORE) {
RTE_LOG(ERR, POWER, "Invalid lcore ID\n");
@@ -598,7 +598,7 @@ power_acpi_cpufreq_freq_min(unsigned int lcore_id)
 int
 power_acpi_turbo_status(unsigned int lcore_id)
 {
-   struct rte_power_info *pi;
+   struct acpi_power_info *pi;
 
if (lcore_id >= RTE_MAX_LCORE) {
RTE_LOG(ERR, POWER, "Invalid lcore ID\n");
@@ -614,7 +614,7 @@ power_acpi_turbo_status(unsigned int lcore_id)
 int
 power_acpi_enable_turbo(unsigned int lcore_id)
 {
-   struct rte_power_info *pi;
+

[dpdk-dev] [PATCH v2 2/2] power: refactor pstate and acpi code

2021-06-22 Thread David Hunt
From: Anatoly Burakov 

Currently, ACPI and PSTATE modes have lots of code duplication,
confusing logic, and a bunch of other issues that can, and have, led to
various bugs and resource leaks.

This commit factors out the common parts of sysfs reading/writing for
ACPI and PSTATE drivers.

Signed-off-by: Anatoly Burakov 
Signed-off-by: David Hunt 

---
changes in v2
* fixed bugs raised by Richael Zhuang in review - open file rw+, etc.
* removed FOPS* and FOPEN* macros, which contained control statements.
* fixed some checkpatch warnings.
---
 lib/power/meson.build|   7 +
 lib/power/power_acpi_cpufreq.c   | 192 
 lib/power/power_common.c | 146 
 lib/power/power_common.h |  17 ++
 lib/power/power_pstate_cpufreq.c | 374 ++-
 5 files changed, 335 insertions(+), 401 deletions(-)

diff --git a/lib/power/meson.build b/lib/power/meson.build
index c1097d32f1..74c5f3a294 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -5,6 +5,13 @@ if not is_linux
 build = false
 reason = 'only supported on Linux'
 endif
+
+# we do some snprintf magic so silence format-nonliteral
+flag_nonliteral = '-Wno-format-nonliteral'
+if cc.has_argument(flag_nonliteral)
+   cflags += flag_nonliteral
+endif
+
 sources = files(
 'guest_channel.c',
 'power_acpi_cpufreq.c',
diff --git a/lib/power/power_acpi_cpufreq.c b/lib/power/power_acpi_cpufreq.c
index 1b8c69cc8b..9ca8d8a8f2 100644
--- a/lib/power/power_acpi_cpufreq.c
+++ b/lib/power/power_acpi_cpufreq.c
@@ -19,41 +19,10 @@
 #include "power_acpi_cpufreq.h"
 #include "power_common.h"
 
-#ifdef RTE_LIBRTE_POWER_DEBUG
-#define POWER_DEBUG_TRACE(fmt, args...) do { \
-   RTE_LOG(ERR, POWER, "%s: " fmt, __func__, ## args); \
-} while (0)
-#else
-#define POWER_DEBUG_TRACE(fmt, args...)
-#endif
-
-#define FOPEN_OR_ERR_RET(f, retval) do { \
-   if ((f) == NULL) { \
-   RTE_LOG(ERR, POWER, "File not opened\n"); \
-   return retval; \
-   } \
-} while (0)
-
-#define FOPS_OR_NULL_GOTO(ret, label) do { \
-   if ((ret) == NULL) { \
-   RTE_LOG(ERR, POWER, "fgets returns nothing\n"); \
-   goto label; \
-   } \
-} while (0)
-
-#define FOPS_OR_ERR_GOTO(ret, label) do { \
-   if ((ret) < 0) { \
-   RTE_LOG(ERR, POWER, "File operations failed\n"); \
-   goto label; \
-   } \
-} while (0)
-
 #define STR_SIZE 1024
 #define POWER_CONVERT_TO_DECIMAL 10
 
 #define POWER_GOVERNOR_USERSPACE "userspace"
-#define POWER_SYSFILE_GOVERNOR   \
-   "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_governor"
 #define POWER_SYSFILE_AVAIL_FREQ \

"/sys/devices/system/cpu/cpu%u/cpufreq/scaling_available_frequencies"
 #define POWER_SYSFILE_SETSPEED   \
@@ -135,53 +104,18 @@ set_freq_internal(struct acpi_power_info *pi, uint32_t idx)
 static int
 power_set_governor_userspace(struct acpi_power_info *pi)
 {
-   FILE *f;
-   int ret = -1;
-   char buf[BUFSIZ];
-   char fullpath[PATH_MAX];
-   char *s;
-   int val;
-
-   snprintf(fullpath, sizeof(fullpath), POWER_SYSFILE_GOVERNOR,
-   pi->lcore_id);
-   f = fopen(fullpath, "rw+");
-   FOPEN_OR_ERR_RET(f, ret);
-
-   s = fgets(buf, sizeof(buf), f);
-   FOPS_OR_NULL_GOTO(s, out);
-   /* Strip off terminating '\n' */
-   strtok(buf, "\n");
-
-   /* Save the original governor */
-   rte_strscpy(pi->governor_ori, buf, sizeof(pi->governor_ori));
-
-   /* Check if current governor is userspace */
-   if (strncmp(buf, POWER_GOVERNOR_USERSPACE,
-   sizeof(POWER_GOVERNOR_USERSPACE)) == 0) {
-   ret = 0;
-   POWER_DEBUG_TRACE("Power management governor of lcore %u is "
-   "already userspace\n", pi->lcore_id);
-   goto out;
-   }
-
-   /* Write 'userspace' to the governor */
-   val = fseek(f, 0, SEEK_SET);
-   FOPS_OR_ERR_GOTO(val, out);
-
-   val = fputs(POWER_GOVERNOR_USERSPACE, f);
-   FOPS_OR_ERR_GOTO(val, out);
-
-   /* We need to flush to see if the fputs succeeds */
-   val = fflush(f);
-   FOPS_OR_ERR_GOTO(val, out);
-
-   ret = 0;
-   RTE_LOG(INFO, POWER, "Power management governor of lcore %u has been "
-   "set to user space successfully\n", pi->lcore_id);
-out:
-   fclose(f);
+   return power_set_governor(pi->lcore_id, POWER_GOVERNOR_USERSPACE,
+   pi->governor_ori, sizeof(pi->governor_ori));
+}
 
-   return ret;
+/**
+ * It is to check the governor and then set the original governor back if
+ * needed by writing the sys file.
+ */
+static int
+power_set_governor_original(struct acpi_power_info *pi)
+{
+   return power_set_governor(pi->lcore_id, pi->governor_or

[dpdk-dev] [PATCH v3] kni: fix mbuf allocation for alloc FIFO

2021-06-22 Thread wangyunjian
From: Yunjian Wang 

In kni_allocate_mbufs(), mbufs for alloc_q are allocated as follows:
allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1) \
& (MAX_MBUF_BURST_NUM - 1);
The value of allocq_free may be zero, for example:
The ring size is 1024. After init, write = read = 0. Then we fill
kni->alloc_q to full. At this time, write = 1023, read = 0.

Then the kernel sends 32 packets to userspace. At this time, write
= 1023, read = 32. Userspace then receives these 32 packets and
refills kni->alloc_q: (32 - 1023 - 1) & 31 = 0, so nothing is filled.
...
Then the kernel sends 32 packets to userspace. At this time, write
= 1023, read = 992. Userspace then receives these 32 packets and
refills kni->alloc_q: (992 - 1023 - 1) & 31 = 0, so nothing is filled.

Then the kernel sends 32 packets to userspace. kni->alloc_q only
holds 31 mbufs and one packet is dropped.

Admittedly, this is a corner case. Normally, some mbufs are refilled
every time, though possibly not enough for the kernel to use.

With this patch, kni->alloc_q is always kept full for the kernel
to use.
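
The wrap-around in the old formula can be checked in isolation. A minimal sketch, with MAX_MBUF_BURST_NUM assumed to be 32 (matching the mask of 31 in the code above):

```c
#include <stdint.h>

#define MAX_MBUF_BURST_NUM 32

/* Old computation from kni_allocate_mbufs(): the free-slot estimate is
 * taken modulo the burst size rather than the ring size, so it wraps to
 * zero whenever (read - write - 1) is a multiple of 32, even though the
 * ring still has free slots. */
static uint32_t
allocq_free_old(uint32_t read, uint32_t write)
{
	return (read - write - 1) & (MAX_MBUF_BURST_NUM - 1);
}

/* Fixed computation: take the real free count (kni_fifo_free_count() in
 * the patch), capped at one burst. */
static uint32_t
allocq_free_new(uint32_t free_count)
{
	return free_count > MAX_MBUF_BURST_NUM ? MAX_MBUF_BURST_NUM : free_count;
}
```

With write = 1023, both read = 32 and read = 992 make the old formula return 0, which reproduces the refill starvation described in the commit message.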

Fixes: 49da4e82cf94 ("kni: allocate no more mbuf than empty slots in queue")
Cc: sta...@dpdk.org

Signed-off-by: Cheng Liu 
Signed-off-by: Yunjian Wang 
Acked-by: Ferruh Yigit 
---
v3:
   update patch title
v2:
   add fixes tag and update commit log
---
 lib/kni/rte_kni.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/kni/rte_kni.c b/lib/kni/rte_kni.c
index 9dae6a8d7c..eb24b0d0ae 100644
--- a/lib/kni/rte_kni.c
+++ b/lib/kni/rte_kni.c
@@ -677,8 +677,9 @@ kni_allocate_mbufs(struct rte_kni *kni)
return;
}
 
-   allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1)
-   & (MAX_MBUF_BURST_NUM - 1);
+   allocq_free = kni_fifo_free_count(kni->alloc_q);
+   allocq_free = (allocq_free > MAX_MBUF_BURST_NUM) ?
+   MAX_MBUF_BURST_NUM : allocq_free;
for (i = 0; i < allocq_free; i++) {
pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
if (unlikely(pkts[i] == NULL)) {
-- 
2.23.0



[dpdk-dev] [PATCH v5 1/2] power: don't use rte prefix in internal code

2021-06-22 Thread David Hunt
From: Anatoly Burakov 

Currently, ACPI code uses rte_power_info as the struct name, which
gives the appearance that this is an externally visible API. Fix to
use internal namespace.

Signed-off-by: Anatoly Burakov 
Acked-by: David Hunt 
---
 lib/power/power_acpi_cpufreq.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/lib/power/power_acpi_cpufreq.c b/lib/power/power_acpi_cpufreq.c
index d028a9947f..1b8c69cc8b 100644
--- a/lib/power/power_acpi_cpufreq.c
+++ b/lib/power/power_acpi_cpufreq.c
@@ -78,7 +78,7 @@ enum power_state {
 /**
  * Power info per lcore.
  */
-struct rte_power_info {
+struct acpi_power_info {
unsigned int lcore_id;   /**< Logical core id */
uint32_t freqs[RTE_MAX_LCORE_FREQS]; /**< Frequency array */
uint32_t nb_freqs;   /**< number of available freqs */
@@ -90,14 +90,14 @@ struct rte_power_info {
uint16_t turbo_enable;   /**< Turbo Boost enable/disable */
 } __rte_cache_aligned;
 
-static struct rte_power_info lcore_power_info[RTE_MAX_LCORE];
+static struct acpi_power_info lcore_power_info[RTE_MAX_LCORE];
 
 /**
  * It is to set specific freq for specific logical core, according to the index
  * of supported frequencies.
  */
 static int
-set_freq_internal(struct rte_power_info *pi, uint32_t idx)
+set_freq_internal(struct acpi_power_info *pi, uint32_t idx)
 {
if (idx >= RTE_MAX_LCORE_FREQS || idx >= pi->nb_freqs) {
RTE_LOG(ERR, POWER, "Invalid frequency index %u, which "
@@ -133,7 +133,7 @@ set_freq_internal(struct rte_power_info *pi, uint32_t idx)
  * governor will be saved for rolling back.
  */
 static int
-power_set_governor_userspace(struct rte_power_info *pi)
+power_set_governor_userspace(struct acpi_power_info *pi)
 {
FILE *f;
int ret = -1;
@@ -189,7 +189,7 @@ power_set_governor_userspace(struct rte_power_info *pi)
  * sys file.
  */
 static int
-power_get_available_freqs(struct rte_power_info *pi)
+power_get_available_freqs(struct acpi_power_info *pi)
 {
FILE *f;
int ret = -1, i, count;
@@ -259,7 +259,7 @@ power_get_available_freqs(struct rte_power_info *pi)
  * It is to fopen the sys file for the future setting the lcore frequency.
  */
 static int
-power_init_for_setting_freq(struct rte_power_info *pi)
+power_init_for_setting_freq(struct acpi_power_info *pi)
 {
FILE *f;
char fullpath[PATH_MAX];
@@ -299,7 +299,7 @@ power_acpi_cpufreq_check_supported(void)
 int
 power_acpi_cpufreq_init(unsigned int lcore_id)
 {
-   struct rte_power_info *pi;
+   struct acpi_power_info *pi;
uint32_t exp_state;
 
if (lcore_id >= RTE_MAX_LCORE) {
@@ -374,7 +374,7 @@ power_acpi_cpufreq_init(unsigned int lcore_id)
  * needed by writing the sys file.
  */
 static int
-power_set_governor_original(struct rte_power_info *pi)
+power_set_governor_original(struct acpi_power_info *pi)
 {
FILE *f;
int ret = -1;
@@ -420,7 +420,7 @@ power_set_governor_original(struct rte_power_info *pi)
 int
 power_acpi_cpufreq_exit(unsigned int lcore_id)
 {
-   struct rte_power_info *pi;
+   struct acpi_power_info *pi;
uint32_t exp_state;
 
if (lcore_id >= RTE_MAX_LCORE) {
@@ -475,7 +475,7 @@ power_acpi_cpufreq_exit(unsigned int lcore_id)
 uint32_t
 power_acpi_cpufreq_freqs(unsigned int lcore_id, uint32_t *freqs, uint32_t num)
 {
-   struct rte_power_info *pi;
+   struct acpi_power_info *pi;
 
if (lcore_id >= RTE_MAX_LCORE) {
RTE_LOG(ERR, POWER, "Invalid lcore ID\n");
@@ -522,7 +522,7 @@ power_acpi_cpufreq_set_freq(unsigned int lcore_id, uint32_t index)
 int
 power_acpi_cpufreq_freq_down(unsigned int lcore_id)
 {
-   struct rte_power_info *pi;
+   struct acpi_power_info *pi;
 
if (lcore_id >= RTE_MAX_LCORE) {
RTE_LOG(ERR, POWER, "Invalid lcore ID\n");
@@ -540,7 +540,7 @@ power_acpi_cpufreq_freq_down(unsigned int lcore_id)
 int
 power_acpi_cpufreq_freq_up(unsigned int lcore_id)
 {
-   struct rte_power_info *pi;
+   struct acpi_power_info *pi;
 
if (lcore_id >= RTE_MAX_LCORE) {
RTE_LOG(ERR, POWER, "Invalid lcore ID\n");
@@ -581,7 +581,7 @@ power_acpi_cpufreq_freq_max(unsigned int lcore_id)
 int
 power_acpi_cpufreq_freq_min(unsigned int lcore_id)
 {
-   struct rte_power_info *pi;
+   struct acpi_power_info *pi;
 
if (lcore_id >= RTE_MAX_LCORE) {
RTE_LOG(ERR, POWER, "Invalid lcore ID\n");
@@ -598,7 +598,7 @@ power_acpi_cpufreq_freq_min(unsigned int lcore_id)
 int
 power_acpi_turbo_status(unsigned int lcore_id)
 {
-   struct rte_power_info *pi;
+   struct acpi_power_info *pi;
 
if (lcore_id >= RTE_MAX_LCORE) {
RTE_LOG(ERR, POWER, "Invalid lcore ID\n");
@@ -614,7 +614,7 @@ power_acpi_turbo_status(unsigned int lcore_id)
 int
 power_acpi_enable_turbo(unsigned int lcore_id)
 {
-   struct rte_power_info *pi;
+

[dpdk-dev] [PATCH v5 2/2] power: refactor pstate and acpi code

2021-06-22 Thread David Hunt
From: Anatoly Burakov 

Currently, ACPI and PSTATE modes have lots of code duplication,
confusing logic, and a bunch of other issues that can, and have, led to
various bugs and resource leaks.

This commit factors out the common parts of sysfs reading/writing for
ACPI and PSTATE drivers.

Signed-off-by: Anatoly Burakov 
Signed-off-by: David Hunt 

---
changes in v2
* fixed bugs raised by Richael Zhuang in review - open file rw+, etc.
* removed FOPS* and FOPEN* macros, which contained control statements.
* fixed some checkpatch warnings.
---
 lib/power/meson.build|   7 +
 lib/power/power_acpi_cpufreq.c   | 192 
 lib/power/power_common.c | 146 
 lib/power/power_common.h |  17 ++
 lib/power/power_pstate_cpufreq.c | 374 ++-
 5 files changed, 335 insertions(+), 401 deletions(-)

diff --git a/lib/power/meson.build b/lib/power/meson.build
index c1097d32f1..74c5f3a294 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -5,6 +5,13 @@ if not is_linux
 build = false
 reason = 'only supported on Linux'
 endif
+
+# we do some snprintf magic so silence format-nonliteral
+flag_nonliteral = '-Wno-format-nonliteral'
+if cc.has_argument(flag_nonliteral)
+   cflags += flag_nonliteral
+endif
+
 sources = files(
 'guest_channel.c',
 'power_acpi_cpufreq.c',
diff --git a/lib/power/power_acpi_cpufreq.c b/lib/power/power_acpi_cpufreq.c
index 1b8c69cc8b..9ca8d8a8f2 100644
--- a/lib/power/power_acpi_cpufreq.c
+++ b/lib/power/power_acpi_cpufreq.c
@@ -19,41 +19,10 @@
 #include "power_acpi_cpufreq.h"
 #include "power_common.h"
 
-#ifdef RTE_LIBRTE_POWER_DEBUG
-#define POWER_DEBUG_TRACE(fmt, args...) do { \
-   RTE_LOG(ERR, POWER, "%s: " fmt, __func__, ## args); \
-} while (0)
-#else
-#define POWER_DEBUG_TRACE(fmt, args...)
-#endif
-
-#define FOPEN_OR_ERR_RET(f, retval) do { \
-   if ((f) == NULL) { \
-   RTE_LOG(ERR, POWER, "File not opened\n"); \
-   return retval; \
-   } \
-} while (0)
-
-#define FOPS_OR_NULL_GOTO(ret, label) do { \
-   if ((ret) == NULL) { \
-   RTE_LOG(ERR, POWER, "fgets returns nothing\n"); \
-   goto label; \
-   } \
-} while (0)
-
-#define FOPS_OR_ERR_GOTO(ret, label) do { \
-   if ((ret) < 0) { \
-   RTE_LOG(ERR, POWER, "File operations failed\n"); \
-   goto label; \
-   } \
-} while (0)
-
 #define STR_SIZE 1024
 #define POWER_CONVERT_TO_DECIMAL 10
 
 #define POWER_GOVERNOR_USERSPACE "userspace"
-#define POWER_SYSFILE_GOVERNOR   \
-   "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_governor"
 #define POWER_SYSFILE_AVAIL_FREQ \

"/sys/devices/system/cpu/cpu%u/cpufreq/scaling_available_frequencies"
 #define POWER_SYSFILE_SETSPEED   \
@@ -135,53 +104,18 @@ set_freq_internal(struct acpi_power_info *pi, uint32_t idx)
 static int
 power_set_governor_userspace(struct acpi_power_info *pi)
 {
-   FILE *f;
-   int ret = -1;
-   char buf[BUFSIZ];
-   char fullpath[PATH_MAX];
-   char *s;
-   int val;
-
-   snprintf(fullpath, sizeof(fullpath), POWER_SYSFILE_GOVERNOR,
-   pi->lcore_id);
-   f = fopen(fullpath, "rw+");
-   FOPEN_OR_ERR_RET(f, ret);
-
-   s = fgets(buf, sizeof(buf), f);
-   FOPS_OR_NULL_GOTO(s, out);
-   /* Strip off terminating '\n' */
-   strtok(buf, "\n");
-
-   /* Save the original governor */
-   rte_strscpy(pi->governor_ori, buf, sizeof(pi->governor_ori));
-
-   /* Check if current governor is userspace */
-   if (strncmp(buf, POWER_GOVERNOR_USERSPACE,
-   sizeof(POWER_GOVERNOR_USERSPACE)) == 0) {
-   ret = 0;
-   POWER_DEBUG_TRACE("Power management governor of lcore %u is "
-   "already userspace\n", pi->lcore_id);
-   goto out;
-   }
-
-   /* Write 'userspace' to the governor */
-   val = fseek(f, 0, SEEK_SET);
-   FOPS_OR_ERR_GOTO(val, out);
-
-   val = fputs(POWER_GOVERNOR_USERSPACE, f);
-   FOPS_OR_ERR_GOTO(val, out);
-
-   /* We need to flush to see if the fputs succeeds */
-   val = fflush(f);
-   FOPS_OR_ERR_GOTO(val, out);
-
-   ret = 0;
-   RTE_LOG(INFO, POWER, "Power management governor of lcore %u has been "
-   "set to user space successfully\n", pi->lcore_id);
-out:
-   fclose(f);
+   return power_set_governor(pi->lcore_id, POWER_GOVERNOR_USERSPACE,
+   pi->governor_ori, sizeof(pi->governor_ori));
+}
 
-   return ret;
+/**
+ * It is to check the governor and then set the original governor back if
+ * needed by writing the sys file.
+ */
+static int
+power_set_governor_original(struct acpi_power_info *pi)
+{
+   return power_set_governor(pi->lcore_id, pi->governor_or
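
The factored-out power_set_governor() helper is referenced above but not shown in full in this excerpt. As a rough sketch of what a generic read/compare/write governor helper could look like (the function name, signature, and file handling here are assumptions for illustration, not the actual power_common.c code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: read the current governor from a sysfs-style
 * file, save the original value for later rollback, and overwrite it
 * with the requested one if it differs. */
static int
set_governor_path(const char *path, const char *new_gov,
		char *orig_gov, size_t orig_len)
{
	FILE *f = fopen(path, "r+");
	char buf[BUFSIZ];

	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	buf[strcspn(buf, "\n")] = '\0';	/* strip trailing '\n' */
	snprintf(orig_gov, orig_len, "%s", buf);
	if (strcmp(buf, new_gov) != 0) {
		/* sysfs attributes consume the whole write; a regular
		 * file would also need truncation, skipped here */
		rewind(f);
		fputs(new_gov, f);
		fflush(f);
	}
	fclose(f);
	return 0;
}
```

Factoring the fopen/fgets/compare/rewrite sequence into one helper is what lets both the ACPI and PSTATE drivers drop their per-driver copies and the FOPEN*/FOPS* macros.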

Re: [dpdk-dev] [PATCH v2 1/2] power: don't use rte prefix in internal code

2021-06-22 Thread David Hunt



On 22/6/2021 1:43 PM, David Hunt wrote:

From: Anatoly Burakov 

Currently, ACPI code uses rte_power_info as the struct name, which
gives the appearance that this is an externally visible API. Fix to
use internal namespace.

Signed-off-by: Anatoly Burakov 
Acked-by: David Hunt 
---
  lib/power/power_acpi_cpufreq.c | 34 +-
  1 file changed, 17 insertions(+), 17 deletions(-)



Please ignore, should have been sent as v5.




Re: [dpdk-dev] [PATCH v2 2/2] power: refactor pstate and acpi code

2021-06-22 Thread David Hunt



On 22/6/2021 1:43 PM, David Hunt wrote:

From: Anatoly Burakov 

Currently, ACPI and PSTATE modes have lots of code duplication,
confusing logic, and a bunch of other issues that can, and have, led to
various bugs and resource leaks.

This commit factors out the common parts of sysfs reading/writing for
ACPI and PSTATE drivers.

Signed-off-by: Anatoly Burakov 
Signed-off-by: David Hunt 

---
changes in v2
* fixed bugs raised by Richael Zhuang in review - open file rw+, etc.
* removed FOPS* and FOPEN* macros, which contained control statements.
* fixed some checkpatch warnings.





Please ignore, should have been sent as v5.




Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays

2021-06-22 Thread Morten Brørup
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev,
> Konstantin
> 
> >
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev,
> > > Konstantin
> > >
> > > >
> > > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev,
> > > > > Konstantin
> > > > >
> > > > > >
> > > > > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of
> Ananyev,
> > > > > > > Konstantin
> > > > > > >
> > > > > > > > > How can we hide the callbacks since they are used by
> inline
> > > > > burst
> > > > > > > functions.
> > > > > > > >
> > > > > > > > I probably I owe a better explanation to what I meant in
> > > first
> > > > > mail.
> > > > > > > > Otherwise it sounds confusing.
> > > > > > > > I'll try to write a more detailed one in next few days.
> > > > > > >
> > > > > > > Actually I gave it another thought over weekend, and might
> be
> > > we
> > > > > can
> > > > > > > hide rte_eth_dev_cb even in a simpler way. I'd use
> > > eth_rx_burst()
> > > > > as
> > > > > > > an example, but the same principle applies to other 'fast'
> > > > > functions.
> > > > > > >
> > > > > > >  1. Needed changes for PMDs rx_pkt_burst():
> > > > > > > a) change function prototype to accept 'uint16_t
> port_id'
> > > and
> > > > > > > 'uint16_t queue_id',
> > > > > > >  instead of current 'void *'.
> > > > > > > b) Each PMD rx_pkt_burst() will have to call
> > > > > rte_eth_rx_epilog()
> > > > > > > function at return.
> > > > > > >  This  inline function will do all CB calls for
> that
> > > queue.
> > > > > > >
> > > > > > > To be more specific, let say we have some PMD: xyz with RX
> > > > > function:
> > > > > > >
> > > > > > > uint16_t
> > > > > > > xyz_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
> > > uint16_t
> > > > > > > nb_pkts)
> > > > > > > {
> > > > > > >  struct xyz_rx_queue *rxq = rx_queue;
> > > > > > >  uint16_t nb_rx = 0;
> > > > > > >
> > > > > > >  /* do actual stuff here */
> > > > > > > 
> > > > > > > return nb_rx;
> > > > > > > }
> > > > > > >
> > > > > > > It will be transformed to:
> > > > > > >
> > > > > > > uint16_t
> > > > > > > xyz_recv_pkts(uint16_t port_id, uint16_t queue_id, struct
> > > rte_mbuf
> > > > > > > **rx_pkts, uint16_t nb_pkts)
> > > > > > > {
> > > > > > >  struct xyz_rx_queue *rxq;
> > > > > > >  uint16_t nb_rx;
> > > > > > >
> > > > > > >  rxq = _rte_eth_rx_prolog(port_id, queue_id);
> > > > > > >  if (rxq == NULL)
> > > > > > >  return 0;
> > > > > > >  nb_rx = _xyz_real_recv_pkts(rxq, rx_pkts,
> nb_pkts);
> > > > > > >  return _rte_eth_rx_epilog(port_id, queue_id,
> rx_pkts,
> > > > > > > nb_pkts);
> > > > > > > }
> > > > > > >
> > > > > > > And somewhere in ethdev_private.h:
> > > > > > >
> > > > > > > static inline void *
> > > > > > > _rte_eth_rx_prolog(uint16_t port_id, uint16_t queue_id);
> > > > > > > {
> > > > > > >struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > > > > > >
> > > > > > > #ifdef RTE_ETHDEV_DEBUG_RX
> > > > > > > RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
> > > > > > > RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, NULL);
> > > > > > >
> > > > > > > if (queue_id >= dev->data->nb_rx_queues) {
> > > > > > > RTE_ETHDEV_LOG(ERR, "Invalid RX
> queue_id=%u\n",
> > > > > > > queue_id);
> > > > > > > return NULL;
> > > > > > > }
> > > > > > > #endif
> > > > > > >   return dev->data->rx_queues[queue_id];
> > > > > > > }
> > > > > > >
> > > > > > > static inline uint16_t
> > > > > > > _rte_eth_rx_epilog(uint16_t port_id, uint16_t queue_id,
> struct
> > > > > rte_mbuf
> > > > > > > **rx_pkts, const uint16_t nb_pkts);
> > > > > > > {
> > > > > > > struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > > > > > >
> > > > > > > #ifdef RTE_ETHDEV_RXTX_CALLBACKS
> > > > > > > struct rte_eth_rxtx_callback *cb;
> > > > > > >
> > > > > > > /* __ATOMIC_RELEASE memory order was used when the
> > > > > > >  * call back was inserted into the list.
> > > > > > >  * Since there is a clear dependency between
> loading
> > > > > > >  * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory
> > > order is
> > > > > > >  * not required.
> > > > > > >  */
> > > > > > > cb = __atomic_load_n(&dev-
> >post_rx_burst_cbs[queue_id],
> > > > > > > __ATOMIC_RELAXED);
> > > > > > >
> > > > > > > if (unlikely(cb != NULL)) {
> > > > > > > do {
> > > > > > > nb_rx = cb->fn.rx(port_id,
> queue_id,
> > > > > rx_pkts,
> > > > > > > nb_rx,
> > > > > > > nb_pkts,
> cb-
> > > > > >param);
> > > > > > > cb = cb->next;
> > > > > > > } while (cb != NULL);
> > > > > > > }
> > > > > > > #endif
> > > > > > >
> > > > > > > rte_ethdev_trace_rx_burst(port_id, queue_i

Re: [dpdk-dev] [PATCH v5 2/2] power: refactor pstate and acpi code

2021-06-22 Thread David Hunt
Adding the people that were on the CC list of v4 of this patch set, and 
Richael, who raised some issues in v4.


On 22/6/2021 1:58 PM, David Hunt wrote:

From: Anatoly Burakov 

Currently, ACPI and PSTATE modes have lots of code duplication,
confusing logic, and a bunch of other issues that can, and have, led to
various bugs and resource leaks.

This commit factors out the common parts of sysfs reading/writing for
ACPI and PSTATE drivers.

Signed-off-by: Anatoly Burakov 
Signed-off-by: David Hunt 

---
changes in v2 (should read v5)
* fixed bugs raised by Richael Zhuang in review - open file rw+, etc.
* removed FOPS* and FOPEN* macros, which contained control statements.
* fixed some checkpatch warnings.



So in the process of posting v5, I picked the email id from v4 in 
patchwork, used that in my --in-reply-to, and somehow it screwed up the 
threading as it looks like I'm responding to v3. So I'm sending this 
email to make sure all the people CC'd in v4 are included in this (v5).


Anatoly is busy at the moment, so I'm addressing the issues raised in 
v4, and additionally addressing the checkpatch issues where it does not 
like the macros with control statements, so removing those, as I don't 
like them either.


Regards,
Dave.






[dpdk-dev] [PATCH 1/1] net/i40e: fix compilation failure on core-avx-i

2021-06-22 Thread Shahed Shaikh
i40e_rxtx_vec_sse.c fails to compile with below configuration:
- #define RTE_LIBRTE_I40E_16BYTE_RX_DESC 1 in config/rte_config.h
- cpu=core-avx-i
- gcc which supports -mavx2 (e.g. gcc 4.8.5)

This is because commit 0604b1f2208f ("net/i40e: fix crash in AVX512")
added i40e_rxq_rearm_common() to i40e_rxtx_vec_common.h which is
included by i40e_rxtx_vec_sse.c.

This function is enabled for compilation if CC_AVX2_SUPPORT is defined.
As per drivers/net/i40e/meson.build, CC_AVX2_SUPPORT is defined when
either CPU supports __AVX2__ or compiler supports -mavx2 option.

So for the given configuration, CC_AVX2_SUPPORT gets defined, but we
don't pass -mavx2 explicitly to gcc while compiling i40e_rxtx_vec_sse.c.
Hence it fails due to AVX2-specific code from i40e_rxq_rearm_common().

This patch tries to fix the compilation by moving
i40e_rxq_rearm_common() to a new header file which will only be
included by i40e_rxtx_vec_avx2.c and i40e_rxtx_vec_avx512.c.

Fixes: 0604b1f2208f ("net/i40e: fix crash in AVX512")
Cc: sta...@dpdk.org

Signed-off-by: Shahed Shaikh 
---
 drivers/net/i40e/i40e_rxtx_vec_avx2.c   |   2 +-
 drivers/net/i40e/i40e_rxtx_vec_avx512.c |   2 +-
 drivers/net/i40e/i40e_rxtx_vec_avx_common.h | 210 
 drivers/net/i40e/i40e_rxtx_vec_common.h | 201 ---
 4 files changed, 212 insertions(+), 203 deletions(-)
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_avx_common.h

diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c
index 3b9eef91a9..2afbb71b75 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c
@@ -10,7 +10,7 @@
 #include "base/i40e_type.h"
 #include "i40e_ethdev.h"
 #include "i40e_rxtx.h"
-#include "i40e_rxtx_vec_common.h"
+#include "i40e_rxtx_vec_avx_common.h"

 #include 

diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
index bd21d64223..ad225b0e54 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
@@ -10,7 +10,7 @@
 #include "base/i40e_type.h"
 #include "i40e_ethdev.h"
 #include "i40e_rxtx.h"
-#include "i40e_rxtx_vec_common.h"
+#include "i40e_rxtx_vec_avx_common.h"

 #include 

diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx_common.h b/drivers/net/i40e/i40e_rxtx_vec_avx_common.h
new file mode 100644
index 00..9f34e52efb
--- /dev/null
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx_common.h
@@ -0,0 +1,210 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Intel Corporation
+ */
+
+#ifndef _I40E_RXTX_VEC_AVX_COMMON_H_
+#define _I40E_RXTX_VEC_AVX_COMMON_H_
+
+#include "i40e_rxtx_vec_common.h"
+
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
+
+#ifdef CC_AVX2_SUPPORT
+static __rte_always_inline void
+i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512)
+{
+   int i;
+   uint16_t rx_id;
+   volatile union i40e_rx_desc *rxdp;
+   struct i40e_rx_entry *rxep = &rxq->sw_ring[rxq->rxrearm_start];
+
+   rxdp = rxq->rx_ring + rxq->rxrearm_start;
+
+   /* Pull 'n' more MBUFs into the software ring */
+   if (rte_mempool_get_bulk(rxq->mp,
+(void *)rxep,
+RTE_I40E_RXQ_REARM_THRESH) < 0) {
+   if (rxq->rxrearm_nb + RTE_I40E_RXQ_REARM_THRESH >=
+   rxq->nb_rx_desc) {
+   __m128i dma_addr0;
+   dma_addr0 = _mm_setzero_si128();
+   for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
+   rxep[i].mbuf = &rxq->fake_mbuf;
+   _mm_store_si128((__m128i *)&rxdp[i].read,
+   dma_addr0);
+   }
+   }
+   rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
+   RTE_I40E_RXQ_REARM_THRESH;
+   return;
+   }
+
+#ifndef RTE_LIBRTE_I40E_16BYTE_RX_DESC
+   struct rte_mbuf *mb0, *mb1;
+   __m128i dma_addr0, dma_addr1;
+   __m128i hdr_room = _mm_set_epi64x(RTE_PKTMBUF_HEADROOM,
+   RTE_PKTMBUF_HEADROOM);
+   /* Initialize the mbufs in vector, process 2 mbufs in one loop */
+   for (i = 0; i < RTE_I40E_RXQ_REARM_THRESH; i += 2, rxep += 2) {
+   __m128i vaddr0, vaddr1;
+
+   mb0 = rxep[0].mbuf;
+   mb1 = rxep[1].mbuf;
+
+   /* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
+   RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
+   offsetof(struct rte_mbuf, buf_addr) + 8);
+   vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr);
+   vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr);
+
+   /* convert pa to dma_addr hdr/data */
+   dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
+   dma_addr1 = _mm_unpa

Re: [dpdk-dev] [PATCH] net/e1000: fix nic ops function was no initialized in secondary process

2021-06-22 Thread Tengfei Zhang

On 2021/6/22 10:16 AM, Wang, Haiyue wrote:

From: 张 杨 
Sent: Monday, June 21, 2021 16:35
To: Wang, Haiyue 
Cc: dev@dpdk.org; Zhang, Qi Z ; Lin, Xueqin 

Subject: Re: [PATCH] net/e1000: fix nic ops function was no initialized in 
secondary process

From: Wang, Haiyue 
Sent: Monday, June 21, 2021 15:31
To: Tengfei Zhang
Cc: dev@dpdk.org; Zhang, Qi Z; Lin, Xueqin
Subject: RE: [PATCH] net/e1000: fix nic ops function was no initialized in secondary 
process



-Original Message-
From: Tengfei Zhang 
Sent: Saturday, June 19, 2021 01:27
To: Wang, Haiyue 
Cc: dev@dpdk.org; Tengfei Zhang 
Subject: [PATCH] net/e1000: fix nic ops function was no initialized in 
secondary process

'e1000_setup_init_funcs' was not called in the secondary process;
it initializes the mac, phy and nvm ops.
When the secondary process gets the link status, it will coredump.
Thanks, Tengfei.
Since primary / secondary is so complicated, AFAIK, the control path is in
primary, the secondary is mainly for rx/tx ops officially, like new Intel
ice PMD:
     if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
     ice_set_rx_function(dev);
     ice_set_tx_function(dev);
     return 0;
     }

So you can keep your patch as private for special secondary usage. ;-)
Signed-off-by: Tengfei Zhang 
---
   drivers/net/e1000/em_ethdev.c  | 1 +
   drivers/net/e1000/igb_ethdev.c | 2 ++
   2 files changed, 3 insertions(+)

diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index a0ca371b02..cd5faa4228 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -258,6 +258,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
  * has already done this work. Only check we don't need a different
  * RX function */
     if (rte_eal_process_type() != RTE_PROC_PRIMARY){
+ e1000_setup_init_funcs(hw, TRUE);
     if (eth_dev->data->scattered_rx)
     eth_dev->rx_pkt_burst =
     (eth_rx_burst_t)&eth_em_recv_scattered_pkts;
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 10ee0f3341..7d9d60497d 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -737,6 +737,7 @@ eth_igb_dev_init(struct rte_eth_dev *eth_dev)
  * has already done this work. Only check we don't need a different
  * RX function */
     if (rte_eal_process_type() != RTE_PROC_PRIMARY){
+ e1000_setup_init_funcs(hw, TRUE);
     if (eth_dev->data->scattered_rx)
     eth_dev->rx_pkt_burst = &eth_igb_recv_scattered_pkts;
     return 0;
@@ -931,6 +932,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
  * has already done this work. Only check we don't need a different
  * RX function */
     if (rte_eal_process_type() != RTE_PROC_PRIMARY){
+ e1000_setup_init_funcs(hw, TRUE);
     if (eth_dev->data->scattered_rx)
     eth_dev->rx_pkt_burst = &eth_igb_recv_scattered_pkts;
     return 0;
--
2.26.2





This issue does not appear in the ice, i40e and vmxnet3 PMD drivers, only in
the e1000 and ixgbe drivers.
The ice PMD driver gets the link status by reading registers directly.

To make primary & secondary work well, these drivers were designed from the
beginning to avoid saving global data and ops functions in shared data.



I agree with what you said: "the secondary is mainly for rx/tx ops
officially".
My opinion is that "set actions" shouldn't be called in the secondary
process, but "get actions" are very common operations; they shouldn't be banned.

It's not banned; it's just that e1000's design introduces the global data ops
in the shared data, which is not good for sharing and accessing, since the
address of the global data ops changes in the secondary process.

As you can see in "e1000_setup_init_funcs", it not only sets function
pointers but also calls them; not sure whether this will break other things
or not. ;-)


Thanks for your reply




I think "get" operation should not crash  in  any  process.

and on different  nics ,  this operation  should  be consistent.


Yes , "e1000_setup_init_funcs" not only set function pointers, this 
patch is not good enough,  just happens to  work.


I look forward to a better solution.



[dpdk-dev] [PATCH v2] examples/power: add baseline mode to PMD power

2021-06-22 Thread David Hunt
The PMD Power Management scheme currently has 3 modes:
scale, monitor and pause. However, it would be nice to
have a baseline mode for easy comparison of power savings
with and without these modes.

This patch adds a 'baseline' mode where the PMD power
management is not enabled. Use --pmd-mgmt=baseline.

Signed-off-by: David Hunt 

---
changes in v2
* added a bool for baseline mode rather than abusing enums
---
 examples/l3fwd-power/main.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index f8dfed1634..aeb2411e62 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -207,6 +207,7 @@ enum appmode {
 enum appmode app_mode;
 
 static enum rte_power_pmd_mgmt_type pmgmt_type;
+bool baseline_enabled;
 
 enum freq_scale_hint_t
 {
@@ -1617,7 +1618,7 @@ print_usage(const char *prgname)
" empty polls, full polls, and core busyness to telemetry\n"
" --interrupt-only: enable interrupt-only mode\n"
" --pmd-mgmt MODE: enable PMD power management mode. "
-   "Currently supported modes: monitor, pause, scale\n",
+   "Currently supported modes: baseline, monitor, pause, scale\n",
prgname);
 }
 
@@ -1714,6 +1715,7 @@ parse_pmd_mgmt_config(const char *name)
 #define PMD_MGMT_MONITOR "monitor"
 #define PMD_MGMT_PAUSE   "pause"
 #define PMD_MGMT_SCALE   "scale"
+#define PMD_MGMT_BASELINE  "baseline"
 
if (strncmp(PMD_MGMT_MONITOR, name, sizeof(PMD_MGMT_MONITOR)) == 0) {
pmgmt_type = RTE_POWER_MGMT_TYPE_MONITOR;
@@ -1729,6 +1731,10 @@ parse_pmd_mgmt_config(const char *name)
pmgmt_type = RTE_POWER_MGMT_TYPE_SCALE;
return 0;
}
+   if (strncmp(PMD_MGMT_BASELINE, name, sizeof(PMD_MGMT_BASELINE)) == 0) {
+   baseline_enabled = true;
+   return 0;
+   }
/* unknown PMD power management mode */
return -1;
 }
@@ -2528,6 +2534,9 @@ main(int argc, char **argv)
/* init RTE timer library to be used late */
rte_timer_subsystem_init();
 
+   /* if we're running pmd-mgmt mode, don't default to baseline mode */
+   baseline_enabled = false;
+
/* parse application arguments (after the EAL ones) */
ret = parse_args(argc, argv);
if (ret < 0)
@@ -2767,7 +2776,8 @@ main(int argc, char **argv)
 "Fail to add ptype cb\n");
}
 
-   if (app_mode == APP_MODE_PMD_MGMT) {
+   if ((app_mode == APP_MODE_PMD_MGMT) &&
+   (baseline_enabled == false)) {
ret = rte_power_ethdev_pmgmt_queue_enable(
lcore_id, portid, queueid,
pmgmt_type);
-- 
2.17.1



[dpdk-dev] [PATCH v3] net/mlx5: add TCP and IPv6 to supported flow items list in Windows

2021-06-22 Thread Tal Shnaiderman
The WinOF2 2.70 Windows kernel driver allows DevX rule creation
of types TCP and IPv6.

Added the types to the supported items in mlx5_flow_os_item_supported
to allow them to be created in the PMD.

Added a description of the new rules supported in Windows kernel driver
WinOF2 2.70 to the mlx5 driver guide.

Signed-off-by: Tal Shnaiderman 

---
v3 merge code and docu changes to a single patch.
---
---
 doc/guides/nics/mlx5.rst| 3 +++
 drivers/net/mlx5/windows/mlx5_flow_os.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 83299646dd..eb44a070b1 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -123,6 +123,9 @@ Limitations
 
 - IPv4/UDP with CVLAN filtering
 - Unicast MAC filtering
+  - Additional rules are supported from WinOF2 version 2.70:
+- IPv4/TCP with CVLAN filtering
+- L4 steering rules for port RSS of UDP, TCP and IP
 
 - For secondary process:
 
diff --git a/drivers/net/mlx5/windows/mlx5_flow_os.h 
b/drivers/net/mlx5/windows/mlx5_flow_os.h
index 26c3e59789..df92f25ce6 100644
--- a/drivers/net/mlx5/windows/mlx5_flow_os.h
+++ b/drivers/net/mlx5/windows/mlx5_flow_os.h
@@ -42,6 +42,8 @@ mlx5_flow_os_item_supported(int item)
case RTE_FLOW_ITEM_TYPE_ETH:
case RTE_FLOW_ITEM_TYPE_IPV4:
case RTE_FLOW_ITEM_TYPE_UDP:
+   case RTE_FLOW_ITEM_TYPE_TCP:
+   case RTE_FLOW_ITEM_TYPE_IPV6:
return true;
default:
return false;
-- 
2.16.1.windows.4



[dpdk-dev] [PATCH v1] doc: update ABI in MAINTAINERS file

2021-06-22 Thread Ray Kinsella
Update to ABI MAINTAINERS.

Signed-off-by: Ray Kinsella 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5877a16971..dab8883a4f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -117,7 +117,6 @@ F: .ci/
 
 ABI Policy & Versioning
 M: Ray Kinsella 
-M: Neil Horman 
 F: lib/eal/include/rte_compat.h
 F: lib/eal/include/rte_function_versioning.h
 F: doc/guides/contributing/abi_*.rst
-- 
2.26.2



[dpdk-dev] [PATCH] net/mlx5: fix multi-segment inline for the first segment

2021-06-22 Thread Viacheslav Ovsiienko
If the first segment in a multi-segment packet is short
and below the inline threshold, it should be inlined into
the WQE to improve performance. For example, the T-Rex
traffic generator might use small leading segments to
handle packet headers, and performance was affected.

Fixes: cacb44a09962 ("net/mlx5: add no-inline Tx flag")
Cc: sta...@dpdk.org

Signed-off-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/mlx5_tx.h | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index e8b1c0f108..1a35919371 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -2041,6 +2041,8 @@ mlx5_tx_packet_multi_inline(struct mlx5_txq_data 
*__rte_restrict txq,
unsigned int nxlen;
uintptr_t start;
 
+   mbuf = loc->mbuf;
+   nxlen = rte_pktmbuf_data_len(mbuf);
/*
 * Packet length exceeds the allowed inline data length,
 * check whether the minimal inlining is required.
@@ -2050,28 +2052,23 @@ mlx5_tx_packet_multi_inline(struct mlx5_txq_data 
*__rte_restrict txq,
MLX5_ESEG_MIN_INLINE_SIZE);
MLX5_ASSERT(txq->inlen_mode <= txq->inlen_send);
inlen = txq->inlen_mode;
-   } else {
-   if (loc->mbuf->ol_flags & PKT_TX_DYNF_NOINLINE ||
-   !vlan || txq->vlan_en) {
-   /*
-* VLAN insertion will be done inside by HW.
-* It is not utmost effective - VLAN flag is
-* checked twice, but we should proceed the
-* inlining length correctly and take into
-* account the VLAN header being inserted.
-*/
-   return mlx5_tx_packet_multi_send
-   (txq, loc, olx);
-   }
+   } else if (vlan && !txq->vlan_en) {
+   /*
+* VLAN insertion is requested and hardware does not
+* support the offload, will do with software inline.
+*/
inlen = MLX5_ESEG_MIN_INLINE_SIZE;
+   } else if (mbuf->ol_flags & PKT_TX_DYNF_NOINLINE ||
+  nxlen > txq->inlen_send) {
+   return mlx5_tx_packet_multi_send(txq, loc, olx);
+   } else {
+   goto do_first;
}
/*
 * Now we know the minimal amount of data is requested
 * to inline. Check whether we should inline the buffers
 * from the chain beginning to eliminate some mbufs.
 */
-   mbuf = loc->mbuf;
-   nxlen = rte_pktmbuf_data_len(mbuf);
if (unlikely(nxlen <= txq->inlen_send)) {
/* We can inline first mbuf at least. */
if (nxlen < inlen) {
@@ -2093,6 +2090,7 @@ mlx5_tx_packet_multi_inline(struct mlx5_txq_data 
*__rte_restrict txq,
goto do_align;
}
}
+do_first:
do {
inlen = nxlen;
mbuf = NEXT(mbuf);
-- 
2.18.1



[dpdk-dev] [PATCH 0/2] OCTEONTX crypto adapter support

2021-06-22 Thread Shijith Thotton
Below patches add crypto adapter OP_FORWARD support for OCTEON TX PMD.

Shijith Thotton (2):
  drivers: add octeontx crypto adapter framework
  drivers: add octeontx crypto adapter data path

 doc/guides/rel_notes/release_21_08.rst|   4 +
 drivers/common/cpt/cpt_common.h   |   2 +-
 drivers/crypto/octeontx/meson.build   |   6 +
 drivers/crypto/octeontx/otx_cryptodev.c   |   4 +
 drivers/crypto/octeontx/otx_cryptodev.h   |   4 -
 .../crypto/octeontx/otx_cryptodev_hw_access.h |   1 +
 drivers/crypto/octeontx/otx_cryptodev_ops.c   | 272 +-
 drivers/crypto/octeontx/otx_cryptodev_ops.h   |   7 +
 drivers/crypto/octeontx/version.map   |   9 +
 drivers/event/octeontx/meson.build|   1 +
 drivers/event/octeontx/ssovf_evdev.c  |  68 +
 drivers/event/octeontx/ssovf_worker.c |  11 +
 drivers/event/octeontx/ssovf_worker.h |  25 +-
 .../octeontx2/otx2_evdev_crypto_adptr_rx.h|   6 +-
 14 files changed, 332 insertions(+), 88 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH 1/2] drivers: add octeontx crypto adapter framework

2021-06-22 Thread Shijith Thotton
Set crypto adapter event device slow-path callbacks.

Signed-off-by: Shijith Thotton 
---
 drivers/crypto/octeontx/meson.build   |  1 +
 drivers/crypto/octeontx/otx_cryptodev.c   |  4 ++
 drivers/crypto/octeontx/otx_cryptodev.h   |  4 --
 .../crypto/octeontx/otx_cryptodev_hw_access.h |  1 +
 drivers/event/octeontx/meson.build|  1 +
 drivers/event/octeontx/ssovf_evdev.c  | 67 +++
 6 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/octeontx/meson.build 
b/drivers/crypto/octeontx/meson.build
index daef47a72f..37603c5c89 100644
--- a/drivers/crypto/octeontx/meson.build
+++ b/drivers/crypto/octeontx/meson.build
@@ -7,6 +7,7 @@ endif
 
 deps += ['bus_pci']
 deps += ['common_cpt']
+deps += ['eventdev']
 
 sources = files(
 'otx_cryptodev.c',
diff --git a/drivers/crypto/octeontx/otx_cryptodev.c 
b/drivers/crypto/octeontx/otx_cryptodev.c
index ba73c2f939..7207909abb 100644
--- a/drivers/crypto/octeontx/otx_cryptodev.c
+++ b/drivers/crypto/octeontx/otx_cryptodev.c
@@ -14,6 +14,10 @@
 
 #include "cpt_pmd_logs.h"
 
+/* Device ID */
+#define PCI_VENDOR_ID_CAVIUM   0x177d
+#define CPT_81XX_PCI_VF_DEVICE_ID  0xa041
+
 uint8_t otx_cryptodev_driver_id;
 
 static struct rte_pci_id pci_id_cpt_table[] = {
diff --git a/drivers/crypto/octeontx/otx_cryptodev.h 
b/drivers/crypto/octeontx/otx_cryptodev.h
index b66ef4a8f7..5d8607eafb 100644
--- a/drivers/crypto/octeontx/otx_cryptodev.h
+++ b/drivers/crypto/octeontx/otx_cryptodev.h
@@ -8,10 +8,6 @@
 /* Cavium OCTEON TX crypto PMD device name */
 #define CRYPTODEV_NAME_OCTEONTX_PMDcrypto_octeontx
 
-/* Device ID */
-#define PCI_VENDOR_ID_CAVIUM   0x177d
-#define CPT_81XX_PCI_VF_DEVICE_ID  0xa041
-
 #define CPT_LOGTYPE otx_cpt_logtype
 
 extern int otx_cpt_logtype;
diff --git a/drivers/crypto/octeontx/otx_cryptodev_hw_access.h 
b/drivers/crypto/octeontx/otx_cryptodev_hw_access.h
index 0ec258157a..f7b1e93402 100644
--- a/drivers/crypto/octeontx/otx_cryptodev_hw_access.h
+++ b/drivers/crypto/octeontx/otx_cryptodev_hw_access.h
@@ -45,6 +45,7 @@ struct cpt_instance {
struct rte_mempool *sess_mp;
struct rte_mempool *sess_mp_priv;
struct cpt_qp_meta_info meta_info;
+   uint8_t ca_enabled;
 };
 
 struct command_chunk {
diff --git a/drivers/event/octeontx/meson.build 
b/drivers/event/octeontx/meson.build
index 3cb140b4de..0d9eec3f2e 100644
--- a/drivers/event/octeontx/meson.build
+++ b/drivers/event/octeontx/meson.build
@@ -12,3 +12,4 @@ sources = files(
 )
 
 deps += ['common_octeontx', 'mempool_octeontx', 'bus_vdev', 'net_octeontx']
+deps += ['crypto_octeontx']
diff --git a/drivers/event/octeontx/ssovf_evdev.c 
b/drivers/event/octeontx/ssovf_evdev.c
index d8b359801a..25bf207db6 100644
--- a/drivers/event/octeontx/ssovf_evdev.c
+++ b/drivers/event/octeontx/ssovf_evdev.c
@@ -5,6 +5,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -19,6 +20,7 @@
 
 #include "ssovf_evdev.h"
 #include "timvf_evdev.h"
+#include "otx_cryptodev_hw_access.h"
 
 static uint8_t timvf_enable_stats;
 
@@ -725,6 +727,67 @@ ssovf_timvf_caps_get(const struct rte_eventdev *dev, 
uint64_t flags,
timvf_enable_stats);
 }
 
+static int
+ssovf_crypto_adapter_caps_get(const struct rte_eventdev *dev,
+ const struct rte_cryptodev *cdev, uint32_t *caps)
+{
+   RTE_SET_USED(dev);
+   RTE_SET_USED(cdev);
+
+   *caps = 0;
+
+   return 0;
+}
+
+static int
+ssovf_crypto_adapter_qp_add(const struct rte_eventdev *dev,
+   const struct rte_cryptodev *cdev,
+   int32_t queue_pair_id,
+   const struct rte_event *event)
+{
+   struct cpt_instance *qp;
+   uint8_t qp_id;
+
+   RTE_SET_USED(event);
+
+   if (queue_pair_id == -1) {
+   for (qp_id = 0; qp_id < cdev->data->nb_queue_pairs; qp_id++) {
+   qp = cdev->data->queue_pairs[qp_id];
+   qp->ca_enabled = 1;
+   }
+   } else {
+   qp = cdev->data->queue_pairs[queue_pair_id];
+   qp->ca_enabled = 1;
+   }
+
+   ssovf_fastpath_fns_set((struct rte_eventdev *)(uintptr_t)dev);
+
+   return 0;
+}
+
+static int
+ssovf_crypto_adapter_qp_del(const struct rte_eventdev *dev,
+   const struct rte_cryptodev *cdev,
+   int32_t queue_pair_id)
+{
+   struct cpt_instance *qp;
+   uint8_t qp_id;
+
+   RTE_SET_USED(dev);
+
+   if (queue_pair_id == -1) {
+   for (qp_id = 0; qp_id < cdev->data->nb_queue_pairs; qp_id++) {
+   qp = cdev->data->queue_pairs[qp_id];
+   qp->ca_enabled = 0;
+   }
+   } else {
+   qp = cdev->data->queue_pairs[queue_pair_id];
+   qp->ca_enabled = 0;
+   }
+
+   return 0;
+}
+
 /* Initialize and register 

[dpdk-dev] [PATCH 2/2] drivers: add octeontx crypto adapter data path

2021-06-22 Thread Shijith Thotton
Added support for crypto adapter OP_FORWARD mode.

As OcteonTx CPT crypto completions could be out of order, each crypto op
is enqueued to CPT, dequeued from CPT and enqueued to SSO one-by-one.

Signed-off-by: Shijith Thotton 
---
 doc/guides/rel_notes/release_21_08.rst|   4 +
 drivers/common/cpt/cpt_common.h   |   2 +-
 drivers/crypto/octeontx/meson.build   |   5 +
 drivers/crypto/octeontx/otx_cryptodev_ops.c   | 272 +-
 drivers/crypto/octeontx/otx_cryptodev_ops.h   |   7 +
 drivers/crypto/octeontx/version.map   |   9 +
 drivers/event/octeontx/ssovf_evdev.c  |   3 +-
 drivers/event/octeontx/ssovf_worker.c |  11 +
 drivers/event/octeontx/ssovf_worker.h |  25 +-
 .../octeontx2/otx2_evdev_crypto_adptr_rx.h|   6 +-
 10 files changed, 259 insertions(+), 85 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst 
b/doc/guides/rel_notes/release_21_08.rst
index a6ecfdf3ce..70a91cc654 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -55,6 +55,10 @@ New Features
  Also, make sure to start the actual text at the margin.
  ===
 
+* **Updated Marvell OCTEON TX PMD.**
+
+  Added support for crypto adapter OP_FORWARD mode.
+
 
 Removed Items
 -
diff --git a/drivers/common/cpt/cpt_common.h b/drivers/common/cpt/cpt_common.h
index 7fea0ca879..724e5ec736 100644
--- a/drivers/common/cpt/cpt_common.h
+++ b/drivers/common/cpt/cpt_common.h
@@ -54,7 +54,7 @@ struct cpt_request_info {
uint64_t ei2;
} ist;
uint8_t *rptr;
-   const struct otx2_cpt_qp *qp;
+   const void *qp;
 
/** Control path fields */
uint64_t time_out;
diff --git a/drivers/crypto/octeontx/meson.build 
b/drivers/crypto/octeontx/meson.build
index 37603c5c89..3ae6729e8f 100644
--- a/drivers/crypto/octeontx/meson.build
+++ b/drivers/crypto/octeontx/meson.build
@@ -6,6 +6,7 @@ if not is_linux
 endif
 
 deps += ['bus_pci']
+deps += ['bus_vdev']
 deps += ['common_cpt']
 deps += ['eventdev']
 
@@ -18,3 +19,7 @@ sources = files(
 )
 
 includes += include_directories('../../common/cpt')
+includes += include_directories('../../common/octeontx')
+includes += include_directories('../../event/octeontx')
+includes += include_directories('../../mempool/octeontx')
+includes += include_directories('../../net/octeontx')
diff --git a/drivers/crypto/octeontx/otx_cryptodev_ops.c 
b/drivers/crypto/octeontx/otx_cryptodev_ops.c
index d75f4b5f81..2ec95bbca4 100644
--- a/drivers/crypto/octeontx/otx_cryptodev_ops.c
+++ b/drivers/crypto/octeontx/otx_cryptodev_ops.c
@@ -6,6 +6,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -21,6 +23,8 @@
 #include "cpt_ucode.h"
 #include "cpt_ucode_asym.h"
 
+#include "ssovf_worker.h"
+
 static uint64_t otx_fpm_iova[CPT_EC_ID_PMAX];
 
 /* Forward declarations */
@@ -412,15 +416,17 @@ otx_cpt_asym_session_clear(struct rte_cryptodev *dev,
rte_mempool_put(sess_mp, priv);
 }
 
-static __rte_always_inline int32_t __rte_hot
+static __rte_always_inline void * __rte_hot
 otx_cpt_request_enqueue(struct cpt_instance *instance,
struct pending_queue *pqueue,
void *req, uint64_t cpt_inst_w7)
 {
struct cpt_request_info *user_req = (struct cpt_request_info *)req;
 
-   if (unlikely(pqueue->pending_count >= DEFAULT_CMD_QLEN))
-   return -EAGAIN;
+   if (unlikely(pqueue->pending_count >= DEFAULT_CMD_QLEN)) {
+   rte_errno = EAGAIN;
+   return NULL;
+   }
 
fill_cpt_inst(instance, req, cpt_inst_w7);
 
@@ -434,18 +440,12 @@ otx_cpt_request_enqueue(struct cpt_instance *instance,
/* Default mode of software queue */
mark_cpt_inst(instance);
 
-   pqueue->req_queue[pqueue->enq_tail] = (uintptr_t)user_req;
-
-   /* We will use soft queue length here to limit requests */
-   MOD_INC(pqueue->enq_tail, DEFAULT_CMD_QLEN);
-   pqueue->pending_count += 1;
-
CPT_LOG_DP_DEBUG("Submitted NB cmd with request: %p "
 "op: %p", user_req, user_req->op);
-   return 0;
+   return req;
 }
 
-static __rte_always_inline int __rte_hot
+static __rte_always_inline void * __rte_hot
 otx_cpt_enq_single_asym(struct cpt_instance *instance,
struct rte_crypto_op *op,
struct pending_queue *pqueue)
@@ -456,11 +456,13 @@ otx_cpt_enq_single_asym(struct cpt_instance *instance,
struct cpt_asym_sess_misc *sess;
uintptr_t *cop;
void *mdata;
+   void *req;
int ret;
 
if (unlikely(rte_mempool_get(minfo->pool, &mdata) < 0)) {
CPT_LOG_DP_ERR("Could not allocate meta buffer for request");
-   return -ENOMEM;
+   rte_errno = ENOMEM;
+   return NULL;
}
 
sess = get_

Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-22 Thread Jerin Jacob
On Fri, Jun 18, 2021 at 3:11 PM fengchengwen  wrote:
>
> On 2021/6/18 13:52, Jerin Jacob wrote:
> > On Thu, Jun 17, 2021 at 2:46 PM Bruce Richardson
> >  wrote:
> >>
> >> On Wed, Jun 16, 2021 at 08:07:26PM +0530, Jerin Jacob wrote:
> >>> On Wed, Jun 16, 2021 at 3:47 PM fengchengwen  
> >>> wrote:
> 
>  On 2021/6/16 15:09, Morten Brørup wrote:
> >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson
> >> Sent: Tuesday, 15 June 2021 18.39
> >>
> >> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> >>> This patch introduces 'dmadevice' which is a generic type of DMA
> >>> device.
> >>>
> >>> The APIs of dmadev library exposes some generic operations which can
> >>> enable configuration and I/O with the DMA devices.
> >>>
> >>> Signed-off-by: Chengwen Feng 
> >>> ---
> >> Thanks for sending this.
> >>
> >> Of most interest to me right now are the key data-plane APIs. While we
> >> are
> >> still in the prototyping phase, below is a draft of what we are
> >> thinking
> >> for the key enqueue/perform_ops/completed_ops APIs.
> >>
> >> Some key differences I note in below vs your original RFC:
> >> * Use of void pointers rather than iova addresses. While using iova's
> >> makes
> >>   sense in the general case when using hardware, in that it can work
> >> with
> >>   both physical addresses and virtual addresses, if we change the APIs
> >> to use
> >>   void pointers instead it will still work for DPDK in VA mode, while
> >> at the
> >>   same time allow use of software fallbacks in error cases, and also a
> >> stub
> >>   driver than uses memcpy in the background. Finally, using iova's
> >> makes the
> >>   APIs a lot more awkward to use with anything but mbufs or similar
> >> buffers
> >>   where we already have a pre-computed physical address.
> >> * Use of id values rather than user-provided handles. Allowing the
> >> user/app
> >>   to manage the amount of data stored per operation is a better
> >> solution, I
> >>   feel than proscribing a certain amount of in-driver tracking. Some
> >> apps may
> >>   not care about anything other than a job being completed, while other
> >> apps
> >>   may have significant metadata to be tracked. Taking the user-context
> >>   handles out of the API also makes the driver code simpler.
> >> * I've kept a single combined API for completions, which differs from
> >> the
> >>   separate error handling completion API you propose. I need to give
> >> the
> >>   two function approach a bit of thought, but likely both could work.
> >> If we
> >>   (likely) never expect failed ops, then the specifics of error
> >> handling
> >>   should not matter that much.
> >>
> >> For the rest, the control / setup APIs are likely to be rather
> >> uncontroversial, I suspect. However, I think that rather than xstats
> >> APIs,
> >> the library should first provide a set of standardized stats like
> >> ethdev
> >> does. If driver-specific stats are needed, we can add xstats later to
> >> the
> >> API.
> >>
> >> Appreciate your further thoughts on this, thanks.
> >>
> >> Regards,
> >> /Bruce
> >
> > I generally agree with Bruce's points above.
> >
> > I would like to share a couple of ideas for further discussion:
> >>>
> >>>
> >>> I believe some of the other requirements and comments for generic DMA 
> >>> will be
> >>>
> >>> 1) Support for the _channel_, Each channel may have different
> >>> capabilities and functionalities.
> >>> Typical cases are, each channel have separate source and destination
> >>> devices like
> >>> DMA between PCIe EP to Host memory, Host memory to Host memory, PCIe
> >>> EP to PCIe EP.
> >>> So we need some notion of the channel in the specification.
> >>>
> >>
> >> Can you share a bit more detail on what constitutes a channel in this case?
> >> Is it equivalent to a device queue (which we are flattening to individual
> >> devices in this API), or to a specific configuration on a queue?
> >
> > It is not a queue. It is one of the attributes for transfer.
> > I.e in the same queue, for a given transfer it can specify the
> > different "source" and "destination" device.
> > Like CPU to Sound card, CPU to network card etc.
> >
> >
> >>
> >>> 2) I assume current data plane APIs are not thread-safe. Is it right?
> >>>
> >> Yes.
> >>
> >>>
> >>> 3) Cookie scheme outlined earlier looks good to me. Instead of having
> >>> generic dequeue() API
> >>>
> >>> 4) Can split the rte_dmadev_enqueue_copy(uint16_t dev_id, void * src,
> >>> void * dst, unsigned int length);
> >>> to two stage API like, Where one will be used in fastpath and other
> >>> one will use used in slowpath.
> >>>
> >>> - slowpath API will for take channel and take other attributes for 
> >>

Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-22 Thread Jerin Jacob
On Fri, Jun 18, 2021 at 3:25 PM Bruce Richardson
 wrote:
>
> On Fri, Jun 18, 2021 at 11:22:28AM +0530, Jerin Jacob wrote:
> > On Thu, Jun 17, 2021 at 2:46 PM Bruce Richardson
> >  wrote:
> > >
> > > On Wed, Jun 16, 2021 at 08:07:26PM +0530, Jerin Jacob wrote:
> > > > On Wed, Jun 16, 2021 at 3:47 PM fengchengwen  
> > > > wrote:
> > > > >
> > > > > On 2021/6/16 15:09, Morten Brørup wrote:
> > > > > >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce 
> > > > > >> Richardson
> > > > > >> Sent: Tuesday, 15 June 2021 18.39
> > > > > >>
> > > > > >> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> > > > > >>> This patch introduces 'dmadevice' which is a generic type of DMA
> > > > > >>> device.
> > > > > >>>
> > > > > >>> The APIs of dmadev library exposes some generic operations which 
> > > > > >>> can
> > > > > >>> enable configuration and I/O with the DMA devices.
> > > > > >>>
> > > > > >>> Signed-off-by: Chengwen Feng 
> > > > > >>> ---
> > > > > >> Thanks for sending this.
> > > > > >>
> > > > > >> Of most interest to me right now are the key data-plane APIs. 
> > > > > >> While we
> > > > > >> are
> > > > > >> still in the prototyping phase, below is a draft of what we are
> > > > > >> thinking
> > > > > >> for the key enqueue/perform_ops/completed_ops APIs.
> > > > > >>
> > > > > >> Some key differences I note in below vs your original RFC:
> > > > > >> * Use of void pointers rather than iova addresses. While using 
> > > > > >> iova's
> > > > > >> makes
> > > > > >>   sense in the general case when using hardware, in that it can 
> > > > > >> work
> > > > > >> with
> > > > > >>   both physical addresses and virtual addresses, if we change the 
> > > > > >> APIs
> > > > > >> to use
> > > > > >>   void pointers instead it will still work for DPDK in VA mode, 
> > > > > >> while
> > > > > >> at the
> > > > > >>   same time allow use of software fallbacks in error cases, and 
> > > > > >> also a
> > > > > >> stub
> > > > > >>   driver that uses memcpy in the background. Finally, using iova's
> > > > > >> makes the
> > > > > >>   APIs a lot more awkward to use with anything but mbufs or similar
> > > > > >> buffers
> > > > > >>   where we already have a pre-computed physical address.
> > > > > >> * Use of id values rather than user-provided handles. Allowing the
> > > > > >> user/app
> > > > > >>   to manage the amount of data stored per operation is a better
> > > > > >> solution, I
> > > > > >>   feel than proscribing a certain amount of in-driver tracking. Some
> > > > > >> apps may
> > > > > >>   not care about anything other than a job being completed, while 
> > > > > >> other
> > > > > >> apps
> > > > > >>   may have significant metadata to be tracked. Taking the 
> > > > > >> user-context
> > > > > >>   handles out of the API also makes the driver code simpler.
> > > > > >> * I've kept a single combined API for completions, which differs 
> > > > > >> from
> > > > > >> the
> > > > > >>   separate error handling completion API you propose. I need to 
> > > > > >> give
> > > > > >> the
> > > > > >>   two function approach a bit of thought, but likely both could 
> > > > > >> work.
> > > > > >> If we
> > > > > >>   (likely) never expect failed ops, then the specifics of error
> > > > > >> handling
> > > > > >>   should not matter that much.
> > > > > >>
> > > > > >> For the rest, the control / setup APIs are likely to be rather
> > > > > >> uncontroversial, I suspect. However, I think that rather than 
> > > > > >> xstats
> > > > > >> APIs,
> > > > > >> the library should first provide a set of standardized stats like
> > > > > >> ethdev
> > > > > >> does. If driver-specific stats are needed, we can add xstats later 
> > > > > >> to
> > > > > >> the
> > > > > >> API.
> > > > > >>
> > > > > >> Appreciate your further thoughts on this, thanks.
> > > > > >>
> > > > > >> Regards,
> > > > > >> /Bruce
> > > > > >
> > > > > > I generally agree with Bruce's points above.
> > > > > >
> > > > > > I would like to share a couple of ideas for further discussion:
> > > >
> > > >
> > > > I believe some of the other requirements and comments for generic DMA 
> > > > will be
> > > >
> > > > 1) Support for the _channel_, Each channel may have different
> > > > capabilities and functionalities.
> > > > Typical cases are, each channel have separate source and destination
> > > > devices like
> > > > DMA between PCIe EP to Host memory, Host memory to Host memory, PCIe
> > > > EP to PCIe EP.
> > > > So we need some notion of the channel in the specification.
> > > >
> > >
> > > Can you share a bit more detail on what constitutes a channel in this 
> > > case?
> > > Is it equivalent to a device queue (which we are flattening to individual
> > > devices in this API), or to a specific configuration on a queue?
> >
> > It is not a queue. It is one of the attributes for transfer.
> > I.e in the same queue, for a given transfer it can specify the
> > different "source" and "destination" device.
> > Like CPU

Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-22 Thread Jerin Jacob
On Fri, Jun 18, 2021 at 3:34 PM Bruce Richardson
 wrote:
>
> On Fri, Jun 18, 2021 at 10:46:08AM +0530, Jerin Jacob wrote:
> > On Thu, Jun 17, 2021 at 1:30 PM Bruce Richardson
> >  wrote:
> > >
> > > On Thu, Jun 17, 2021 at 01:12:22PM +0530, Jerin Jacob wrote:
> > > > On Thu, Jun 17, 2021 at 12:43 AM Bruce Richardson
> > > >  wrote:
> > > > >
> > > > > On Wed, Jun 16, 2021 at 11:38:08PM +0530, Jerin Jacob wrote:
> > > > > > On Wed, Jun 16, 2021 at 11:01 PM Bruce Richardson
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Wed, Jun 16, 2021 at 05:41:45PM +0800, fengchengwen wrote:
> > > > > > > > On 2021/6/16 0:38, Bruce Richardson wrote:
> > > > > > > > > On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> > > > > > > > >> This patch introduces 'dmadevice' which is a generic type of 
> > > > > > > > >> DMA
> > > > > > > > >> device.
> > > > > > > > >>
> > > > > > > > >> The APIs of dmadev library exposes some generic operations 
> > > > > > > > >> which can
> > > > > > > > >> enable configuration and I/O with the DMA devices.
> > > > > > > > >>
> > > > > > > > >> Signed-off-by: Chengwen Feng 
> > > > > > > > >> ---
> > > > > > > > > Thanks for sending this.
> > > > > > > > >
> > > > > > > > > Of most interest to me right now are the key data-plane APIs. 
> > > > > > > > > While we are
> > > > > > > > > still in the prototyping phase, below is a draft of what we 
> > > > > > > > > are thinking
> > > > > > > > > for the key enqueue/perform_ops/completed_ops APIs.
> > > > > > > > >
> > > > > > > > > Some key differences I note in below vs your original RFC:
> > > > > > > > > * Use of void pointers rather than iova addresses. While 
> > > > > > > > > using iova's makes
> > > > > > > > >   sense in the general case when using hardware, in that it 
> > > > > > > > > can work with
> > > > > > > > >   both physical addresses and virtual addresses, if we change 
> > > > > > > > > the APIs to use
> > > > > > > > >   void pointers instead it will still work for DPDK in VA 
> > > > > > > > > mode, while at the
> > > > > > > > >   same time allow use of software fallbacks in error cases, 
> > > > > > > > > and also a stub
> > > > > > > > >   driver than uses memcpy in the background. Finally, using 
> > > > > > > > > iova's makes the
> > > > > > > > >   APIs a lot more awkward to use with anything but mbufs or 
> > > > > > > > > similar buffers
> > > > > > > > >   where we already have a pre-computed physical address.
> > > > > > > >
> > > > > > > > The iova is an hint to application, and widely used in DPDK.
> > > > > > > > If switch to void, how to pass the address (iova or just va ?)
> > > > > > > > this may introduce implementation dependencies here.
> > > > > > > >
> > > > > > > > Or always pass the va, and the driver performs address 
> > > > > > > > translation, and this
> > > > > > > > translation may cost too much cpu I think.
> > > > > > > >
> > > > > > >
> > > > > > > On the latter point, about driver doing address translation I 
> > > > > > > would agree.
> > > > > > > However, we probably need more discussion about the use of iova 
> > > > > > > vs just
> > > > > > > virtual addresses. My thinking on this is that if we specify the 
> > > > > > > API using
> > > > > > > iovas it will severely hurt usability of the API, since it forces 
> > > > > > > the user
> > > > > > > to take more inefficient codepaths in a large number of cases. 
> > > > > > > Given a
> > > > > > > pointer to the middle of an mbuf, one cannot just pass that 
> > > > > > > straight as an
> > > > > > > iova but must instead do a translation into offset from mbuf 
> > > > > > > pointer and
> > > > > > > then readd the offset to the mbuf base address.
> > > > > > >
> > > > > > > My preference therefore is to require the use of an IOMMU when 
> > > > > > > using a
> > > > > > > dmadev, so that it can be a much closer analog of memcpy. Once an 
> > > > > > > iommu is
> > > > > > > present, DPDK will run in VA mode, allowing virtual addresses to 
> > > > > > > our
> > > > > > > hugepage memory to be sent directly to hardware. Also, when using
> > > > > > > dmadevs on top of an in-kernel driver, that kernel driver may do 
> > > > > > > all iommu
> > > > > > > management for the app, removing further the restrictions on what 
> > > > > > > memory
> > > > > > > can be addressed by hardware.
> > > > > >
> > > > > >
> > > > > > One issue of keeping void * is that memory can come from stack or
> > > > > > heap, which HW cannot really operate on.
> > > > >
> > > > > when kernel driver is managing the IOMMU all process memory can be 
> > > > > worked
> > > > > on, not just hugepage memory, so using iova is wrong in these cases.
> > > >
> > > > But not for stack and heap memory. Right?
> > > >
> > > Yes, even stack and heap can be accessed.
> >
> > The HW device cannot, as that memory is NOT mapped to the IOMMU. It will
> > result in a transaction fault.
> >
>
> Not if the kernel driver rather than DPDK is managing the 

Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-22 Thread Jerin Jacob
On Fri, Jun 18, 2021 at 2:22 PM fengchengwen  wrote:
>
> On 2021/6/17 22:18, Bruce Richardson wrote:
> > On Thu, Jun 17, 2021 at 12:02:00PM +0100, Bruce Richardson wrote:
> >> On Thu, Jun 17, 2021 at 05:48:05PM +0800, fengchengwen wrote:
> >>> On 2021/6/17 1:31, Bruce Richardson wrote:
>  On Wed, Jun 16, 2021 at 05:41:45PM +0800, fengchengwen wrote:
> > On 2021/6/16 0:38, Bruce Richardson wrote:
> >> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> >>> This patch introduces 'dmadevice' which is a generic type of DMA
> >>> device.
> >>>
> >>> The APIs of dmadev library exposes some generic operations which can
> >>> enable configuration and I/O with the DMA devices.
> >>>
> >>> Signed-off-by: Chengwen Feng 
> >>> ---
> >> Thanks for sending this.
> >>
> >> Of most interest to me right now are the key data-plane APIs. While we 
> >> are
> >> still in the prototyping phase, below is a draft of what we are 
> >> thinking
> >> for the key enqueue/perform_ops/completed_ops APIs.
> >>
> >> Some key differences I note in below vs your original RFC:
> >> * Use of void pointers rather than iova addresses. While using iova's 
> >> makes
> >>   sense in the general case when using hardware, in that it can work 
> >> with
> >>   both physical addresses and virtual addresses, if we change the APIs 
> >> to use
> >>   void pointers instead it will still work for DPDK in VA mode, while 
> >> at the
> >>   same time allow use of software fallbacks in error cases, and also a 
> >> stub
> >>   driver that uses memcpy in the background. Finally, using iova's 
> >> makes the
> >>   APIs a lot more awkward to use with anything but mbufs or similar 
> >> buffers
> >>   where we already have a pre-computed physical address.
> >
> > The iova is a hint to the application, and widely used in DPDK.
> > If we switch to void *, how do we pass the address (iova or just va?)
> > This may introduce implementation dependencies here.
> >
> > Or always pass the va, and the driver performs address translation, and 
> > this
> > translation may cost too much cpu I think.
> >
> 
>  On the latter point, about driver doing address translation I would 
>  agree.
>  However, we probably need more discussion about the use of iova vs just
>  virtual addresses. My thinking on this is that if we specify the API 
>  using
>  iovas it will severely hurt usability of the API, since it forces the 
>  user
>  to take more inefficient codepaths in a large number of cases. Given a
>  pointer to the middle of an mbuf, one cannot just pass that straight as 
>  an
>  iova but must instead do a translation into offset from mbuf pointer and
>  then readd the offset to the mbuf base address.
> 
>  My preference therefore is to require the use of an IOMMU when using a
>  dmadev, so that it can be a much closer analog of memcpy. Once an iommu 
>  is
>  present, DPDK will run in VA mode, allowing virtual addresses to our
>  hugepage memory to be sent directly to hardware. Also, when using
>  dmadevs on top of an in-kernel driver, that kernel driver may do all 
>  iommu
>  management for the app, removing further the restrictions on what memory
>  can be addressed by hardware.
> >>>
> >>> Some DMA devices may not support IOMMU or IOMMU bypass by default, so the
> >>> driver may need to call rte_mem_virt2phy() to do the address translation,
> >>> but the rte_mem_virt2phy() call costs too many CPU cycles.
> >>>
> >>> If the API defined as iova, it will work fine in:
> >>> 1) If DMA don't support IOMMU or IOMMU bypass, then start application with
> >>>--iova-mode=pa
> >>> 2) If DMA support IOMMU, --iova-mode=pa/va work both fine
> >>>
> >>
> >> I suppose if we keep the iova as the datatype, we can just cast "void *"
> >> pointers to that in the case that virtual addresses can be used directly. I
> >> believe your RFC included a capability query API - "uses void * as iova"
> >> should probably be one of those capabilities, and that would resolve this.
> >> If DPDK is in iova=va mode because of the presence of an iommu, all drivers
> >> could report this capability too.
> >>
> 
> >> * Use of id values rather than user-provided handles. Allowing the 
> >> user/app
> >>   to manage the amount of data stored per operation is a better 
> >>   to manage the amount of data stored per operation is a better solution,
> >>   I feel, than prescribing a certain amount of in-driver tracking. Some
> >>   apps may
> >> other apps
> >>   may have significant metadata to be tracked. Taking the user-context
> >>   handles out of the API also makes the driver code simpler.
> >
> > The user-provided handle was mainly used to simplify application
> > implementatio

Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-22 Thread Bruce Richardson
On Tue, Jun 22, 2021 at 11:01:47PM +0530, Jerin Jacob wrote:
> On Fri, Jun 18, 2021 at 3:25 PM Bruce Richardson
>  wrote:
> >
> > >
> > Taking the case of a simple copy op, the parameters we need are:
> >
> > * src
> > * dst
> > * length
> 
> OK. Is it the case that no other attribute is supported in HW, or are you
> not planning to expose them through the DPDK generic DMA API?
> 
Only other parameters that might be needed can all be specified as flags,
so all we need for a copy op is a general flags field for future expansion.

> >
> > Depending on the specific hardware there will also be passed in the
> > descriptor a completion address, but we plan for these cases to always have
> > the completions written back to a set location so that we have essentially
> > ring-writeback, as with the hardware which doesn't explicitly have a
> > separate completion address. Beyond that, I believe the only descriptor
> > fields we will use are just the flags field indicating the op type etc.
> 
> OK. In HW, we need to have an IOVA for the completion address; that's the
> only constraint. The rest looks good to me.
> 
That's like what we have, but I was not planning on exposing the completion
address through the API at all, but have it internal to the driver and let
the "completion" APIs just inform the app what is done or not. If we expose
completion addresses, then that leaves the app open to having to parse
different completion formats, so it needs to be internal IMHO.

/Bruce


[dpdk-dev] [PATCH 1/2] common/octeontx2: send link event to VF

2021-06-22 Thread Harman Kalra
Currently a link event is sent by the AF only to the PF as soon as it comes
up, or in case of any physical change in the link. The PF will broadcast
these link events to all its VFs as soon as it receives them.
But no event is sent when a new VF comes up, hence it will not have
the link status.
Adding support for sending link status to the VF once it comes up
successfully.

Signed-off-by: Harman Kalra 
---
 drivers/common/octeontx2/otx2_dev.c | 26 ++
 drivers/common/octeontx2/otx2_dev.h | 10 +++---
 2 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/drivers/common/octeontx2/otx2_dev.c 
b/drivers/common/octeontx2/otx2_dev.c
index 6a84df2344..1485e2b357 100644
--- a/drivers/common/octeontx2/otx2_dev.c
+++ b/drivers/common/octeontx2/otx2_dev.c
@@ -163,6 +163,32 @@ af_pf_wait_msg(struct otx2_dev *dev, uint16_t vf, int 
num_msg)
rsp->rc = msg->rc;
rsp->pcifunc = msg->pcifunc;
 
+   /* Whenever a PF comes up, AF sends the link status to it but
+* when VF comes up no such event is sent to respective VF.
+* Using MBOX_MSG_NIX_LF_START_RX response from AF for the
+* purpose and send the link status of PF to VF.
+*/
+   if (msg->id == MBOX_MSG_NIX_LF_START_RX) {
+   /* Send link status to VF */
+   struct cgx_link_user_info linfo;
+   struct mbox_msghdr *vf_msg;
+
+   /* Get the link status */
+   if (dev->ops && dev->ops->link_status_get)
+   dev->ops->link_status_get(dev, &linfo);
+
+   /* Prepare the message to be sent */
+   vf_msg = otx2_mbox_alloc_msg(&dev->mbox_vfpf_up, vf,
+size);
+   otx2_mbox_req_init(MBOX_MSG_CGX_LINK_EVENT, vf_msg);
+   memcpy((uint8_t *)vf_msg + sizeof(struct mbox_msghdr),
+  &linfo, sizeof(struct cgx_link_user_info));
+
+   vf_msg->rc = msg->rc;
+   vf_msg->pcifunc = msg->pcifunc;
+   /* Send to VF */
+   otx2_mbox_msg_send(&dev->mbox_vfpf_up, vf);
+   }
offset = mbox->rx_start + msg->next_msgoff;
}
rte_spinlock_unlock(&mdev->mbox_lock);
diff --git a/drivers/common/octeontx2/otx2_dev.h 
b/drivers/common/octeontx2/otx2_dev.h
index cd4fe517db..be0faacc6a 100644
--- a/drivers/common/octeontx2/otx2_dev.h
+++ b/drivers/common/octeontx2/otx2_dev.h
@@ -57,15 +57,19 @@
 
 struct otx2_dev;
 
-/* Link status callback */
-typedef void (*otx2_link_status_t)(struct otx2_dev *dev,
+/* Link status update callback */
+typedef void (*otx2_link_status_update_t)(struct otx2_dev *dev,
   struct cgx_link_user_info *link);
 /* PTP info callback */
 typedef int (*otx2_ptp_info_t)(struct otx2_dev *dev, bool ptp_en);
+/* Link status get callback */
+typedef void (*otx2_link_status_get_t)(struct otx2_dev *dev,
+  struct cgx_link_user_info *link);
 
 struct otx2_dev_ops {
-   otx2_link_status_t link_status_update;
+   otx2_link_status_update_t link_status_update;
otx2_ptp_info_t ptp_info_update;
+   otx2_link_status_get_t link_status_get;
 };
 
 #define OTX2_DEV   \
-- 
2.18.0



[dpdk-dev] [PATCH 2/2] net/octeontx2: callback for getting link status

2021-06-22 Thread Harman Kalra
Adding a new callback for reading the link status. The PF can read its
link status and forward it to a VF once the VF comes up.

Signed-off-by: Harman Kalra 
---
 drivers/net/octeontx2/otx2_ethdev.c |  8 +++-
 drivers/net/octeontx2/otx2_ethdev.h |  2 ++
 drivers/net/octeontx2/otx2_link.c   | 23 +++
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/net/octeontx2/otx2_ethdev.c 
b/drivers/net/octeontx2/otx2_ethdev.c
index 0834de0cb1..471fa34da1 100644
--- a/drivers/net/octeontx2/otx2_ethdev.c
+++ b/drivers/net/octeontx2/otx2_ethdev.c
@@ -42,7 +42,8 @@ nix_get_tx_offload_capa(struct otx2_eth_dev *dev)
 
 static const struct otx2_dev_ops otx2_dev_ops = {
.link_status_update = otx2_eth_dev_link_status_update,
-   .ptp_info_update = otx2_eth_dev_ptp_info_update
+   .ptp_info_update = otx2_eth_dev_ptp_info_update,
+   .link_status_get = otx2_eth_dev_link_status_get
 };
 
 static int
@@ -2625,6 +2626,11 @@ otx2_eth_dev_uninit(struct rte_eth_dev *eth_dev, bool 
mbox_close)
 
nix_cgx_stop_link_event(dev);
 
+   /* Unregister the dev ops, this is required to stop VFs from
+* receiving link status updates on exit path.
+*/
+   dev->ops = NULL;
+
/* Free up SQs */
for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
otx2_nix_tx_queue_release(eth_dev->data->tx_queues[i]);
diff --git a/drivers/net/octeontx2/otx2_ethdev.h 
b/drivers/net/octeontx2/otx2_ethdev.h
index ac50da7b18..08f9a2a7bc 100644
--- a/drivers/net/octeontx2/otx2_ethdev.h
+++ b/drivers/net/octeontx2/otx2_ethdev.h
@@ -453,6 +453,8 @@ void otx2_nix_toggle_flag_link_cfg(struct otx2_eth_dev 
*dev, bool set);
 int otx2_nix_link_update(struct rte_eth_dev *eth_dev, int wait_to_complete);
 void otx2_eth_dev_link_status_update(struct otx2_dev *dev,
 struct cgx_link_user_info *link);
+void otx2_eth_dev_link_status_get(struct otx2_dev *dev,
+ struct cgx_link_user_info *link);
 int otx2_nix_dev_set_link_up(struct rte_eth_dev *eth_dev);
 int otx2_nix_dev_set_link_down(struct rte_eth_dev *eth_dev);
 int otx2_apply_link_speed(struct rte_eth_dev *eth_dev);
diff --git a/drivers/net/octeontx2/otx2_link.c 
b/drivers/net/octeontx2/otx2_link.c
index a79b997376..5378e5c3b9 100644
--- a/drivers/net/octeontx2/otx2_link.c
+++ b/drivers/net/octeontx2/otx2_link.c
@@ -47,6 +47,29 @@ nix_link_status_print(struct rte_eth_dev *eth_dev, struct 
rte_eth_link *link)
otx2_info("Port %d: Link Down", (int)(eth_dev->data->port_id));
 }
 
+void
+otx2_eth_dev_link_status_get(struct otx2_dev *dev,
+struct cgx_link_user_info *link)
+{
+   struct otx2_eth_dev *otx2_dev = (struct otx2_eth_dev *)dev;
+   struct rte_eth_link eth_link;
+   struct rte_eth_dev *eth_dev;
+
+   if (!link || !dev)
+   return;
+
+   eth_dev = otx2_dev->eth_dev;
+   if (!eth_dev)
+   return;
+
+   rte_eth_linkstatus_get(eth_dev, ð_link);
+
+   link->link_up = eth_link.link_status;
+   link->speed = eth_link.link_speed;
+   link->an = eth_link.link_autoneg;
+   link->full_duplex = eth_link.link_duplex;
+}
+
 void
 otx2_eth_dev_link_status_update(struct otx2_dev *dev,
struct cgx_link_user_info *link)
-- 
2.18.0



Re: [dpdk-dev] [dpdk-stable] [PATCH v3] kni: fix mbuf allocation for alloc FIFO

2021-06-22 Thread Thomas Monjalon
22/06/2021 14:44, wangyunjian:
> From: Yunjian Wang 
> 
> In kni_allocate_mbufs(), we alloc mbuf for alloc_q as this code.
> allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1) \
>   & (MAX_MBUF_BURST_NUM - 1);
> The value of allocq_free may be zero, for example:
> The ring size is 1024. After init, write = read = 0. Then we fill
> kni->alloc_q to full. At this time, write = 1023, read = 0.
> 
> Then the kernel send 32 packets to userspace. At this time, write
> = 1023, read = 32. And then the userspace receive this 32 packets.
> Then fill the kni->alloc_q, (32 - 1023 - 1) & 31 = 0, fill nothing.
> ...
> Then the kernel send 32 packets to userspace. At this time, write
> = 1023, read = 992. And then the userspace receive this 32 packets.
> Then fill the kni->alloc_q, (992 - 1023 - 1) & 31 = 0, fill nothing.
> 
> Then the kernel send 32 packets to userspace. The kni->alloc_q only
> has 31 mbufs and will drop one packet.
> 
> Admittedly, this is a special scenario. Normally it will fill some
> mbufs every time, but that may not be enough for the kernel to use.
> 
> In this patch, we always keep the kni->alloc_q to full for the kernel
> to use.
> 
> Fixes: 49da4e82cf94 ("kni: allocate no more mbuf than empty slots in queue")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Cheng Liu 
> Signed-off-by: Yunjian Wang 
> Acked-by: Ferruh Yigit 
> ---
> v3:
>update patch title
> v2:
>add fixes tag and update commit log
> ---
>  lib/kni/rte_kni.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/kni/rte_kni.c b/lib/kni/rte_kni.c
> index 9dae6a8d7c..eb24b0d0ae 100644
> --- a/lib/kni/rte_kni.c
> +++ b/lib/kni/rte_kni.c
> @@ -677,8 +677,9 @@ kni_allocate_mbufs(struct rte_kni *kni)
>   return;
>   }
>  
> - allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1)
> - & (MAX_MBUF_BURST_NUM - 1);
> + allocq_free = kni_fifo_free_count(kni->alloc_q);

Can we insert a comment here to explain the logic?

> + allocq_free = (allocq_free > MAX_MBUF_BURST_NUM) ?
> + MAX_MBUF_BURST_NUM : allocq_free;
>   for (i = 0; i < allocq_free; i++) {
>   pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
>   if (unlikely(pkts[i] == NULL)) {

About the title, I don't understand the part "for alloc FIFO",
given all mbufs are in a FIFO queue in KNI, right?




Re: [dpdk-dev] [PATCH] raw/ioat: fix memory leak in device configure

2021-06-22 Thread Thomas Monjalon
17/06/2021 16:20, Bruce Richardson:
> On Thu, Jun 17, 2021 at 02:17:52PM +, Kevin Laatz wrote:
> > During device configure, memory is allocated for "hdl_ring_flags". In the
> > event of another call to the device configure function (reconfigure), a
> > memory leak would occur. This patch fixes the memory leak by freeing the
> > memory before reallocating it.
> > 
> > Fixes: 245efe544d8e ("raw/ioat: report status of completed jobs")
> > 
> > Signed-off-by: Kevin Laatz 
> > ---
> 
> Thanks, Kevin.
> Acked-by: Bruce Richardson 

Cc: sta...@dpdk.org

Applied, thanks




Re: [dpdk-dev] [dpdk-stable] [PATCH] raw/ioat: fix missing ring pointer reset

2021-06-22 Thread Thomas Monjalon
17/06/2021 16:21, Bruce Richardson:
> On Thu, Jun 17, 2021 at 02:18:15PM +, Kevin Laatz wrote:
> > In the event of a device reconfigure, "hdls_avail" is not being reset. This
> > can lead to miscalculations in rte_ioat_completed_ops(), causing the
> > function to report an incorrect amount of completed operations. This patch
> > fixes the issue by resetting "hdls_avail" during the device configure.
> > 
> > Fixes: 74464005a2af ("raw/ioat: rework SW ring layout")
> > 
> > Signed-off-by: Kevin Laatz 
> > ---
> To catch more of these reconfigure issues, we should look to add an
> appropriate unit test.
> 
> Acked-by: Bruce Richardson 

Cc: sta...@dpdk.org

Applied, thanks




Re: [dpdk-dev] rte_memcpy - fence and stream

2021-06-22 Thread Morten Brørup
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Morten Brørup
> Sent: Thursday, 27 May 2021 20.15
> 
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Thursday, 27 May 2021 19.22
> >
> > On Thu, May 27, 2021 at 10:39:59PM +0530, Manish Sharma wrote:
> > >For the case I have, hardly 2% of the data buffers which are
> being
> > >copied get looked at - mostly its for DMA.

Which data buffers are you not looking at, Manish? The original data buffers, 
or the copies, or both?

> > >Having a version of DPDK
> > >memcopy that does non temporal copies would definitely be good.
> > >If in my case, I have a lot of CPUs doing the copy in parallel,
> > would
> > >I/OAT driver copy accelerator still help?
> > >
> > It will depend upon the size of the copies being done. For bigger
> > packets
> > the accelerator can help free up CPU cycles for other things.
> >
> > However, if only 2% of the data which is being copied gets looked at,
> > why
> > does it need to be copied? Can the original buffers not be used in
> that
> > case?
> 
> I can only speak for myself here...
> 
> Our firmware has a packet capture feature with a filter.
> 
> If a packet matches the capture filter, a metadata header and the
> relevant part of the packet contents ("snap length" in tcpdump
> terminology) is appended to a large memory area (the "capture buffer")
> using rte_pktmbuf_read/rte_memcpy. This capture buffer is only read
> through the GUI or management API by the network administrator, i.e. it
> will only be read minutes or hours later, so there is no need to put
> any of it in any CPU cache.
> 
> It does not make sense to clone and hold on to many thousands of mbufs
> when we only need some of their contents. So we copy the contents
> instead of increasing the mbuf refcount.
> 
> We currently only use our packet capture feature for R&D purposes, so
> we have not optimized it yet. However, we will need to optimize it for
> production use at some point. So I find this discussion initiated by
> Manish very interesting.
> 
> -Morten

Here's some code for inspiration. I haven't tested it yet. And it can be 
further optimized.

/**
 * Copy 16 bytes from one location to another, using non-temporal storage
 * at the destination.
 * The locations must not overlap.
 *
 * @param dst
 *   Pointer to the destination of the data.
 *   Must be aligned on a 16-byte boundary.
 * @param src
 *   Pointer to the source data.
 *   Does not need to be aligned on any particular boundary.
 */
static __rte_always_inline void
rte_mov16_aligned16_non_temporal(uint8_t *dst, const uint8_t *src)
{
__m128i xmm0;

xmm0 = _mm_loadu_si128((const __m128i *)src);
_mm_stream_si128((__m128i *)dst, xmm0);
}

/**
 * Copy bytes from one location to another, using non-temporal storage
 * at the destination.
 * The locations must not overlap.
 *
 * @param dst
 *   Pointer to the destination of the data.
 *   Must be aligned on a 16-byte boundary.
 * @param src
 *   Pointer to the source data.
 *   Does not need to be aligned on any particular boundary.
 * @param n
 *   Number of bytes to copy.
 *   Must be divisible by 4.
 * @return
 *   Pointer to the destination data.
 */
static __rte_always_inline void *
rte_memcpy_aligned16_non_temporal(void *dst, const void *src, size_t n)
{
void * const ret = dst;

RTE_ASSERT(!((uintptr_t)dst & 0xF));
RTE_ASSERT(!(n & 3));

while (n >= 16) {
rte_mov16_aligned16_non_temporal(dst, src);
src = (const uint8_t *)src + 16;
dst = (uint8_t *)dst + 16;
n -= 16;
}
if (n & 8) {
int64_t a = *(const int64_t *)src;
_mm_stream_si64((long long int *)dst, a);
src = (const uint8_t *)src + 8;
dst = (uint8_t *)dst + 8;
n -= 8;
}
if (n & 4) {
int32_t a = *(const int32_t *)src;
_mm_stream_si32((int32_t *)dst, a);
src = (const uint8_t *)src + 4;
dst = (uint8_t *)dst + 4;
n -= 4;
}

return ret;
}



Re: [dpdk-dev] [PATCH v4 2/2] bus/auxiliary: introduce auxiliary bus

2021-06-22 Thread Xueming(Steven) Li



> -Original Message-
> From: Thomas Monjalon 
> Sent: Tuesday, June 22, 2021 12:11 AM
> To: Parav Pandit ; Xueming(Steven) Li 
> Cc: dev@dpdk.org; Wang Haiyue ; Kinsella Ray 
> ; david.march...@redhat.com;
> ferruh.yi...@intel.com
> Subject: Re: [dpdk-dev] [PATCH v4 2/2] bus/auxiliary: introduce auxiliary bus
> 
> 13/06/2021 14:58, Xueming Li:
> > Auxiliary bus [1] provides a way to split function into child-devices
> > representing sub-domains of functionality. Each auxiliary device
> > represents a part of its parent functionality.
> >
> > Auxiliary device is identified by unique device name, sysfs path:
> >   /sys/bus/auxiliary/devices/
> >
> > Devargs syntax of auxiliary device:
> >   -a auxiliary:[,args...]
> 
> What about suggesting the new generic syntax?

I'll list both.

> 
> > [1] kernel auxiliary bus document:
> > https://www.kernel.org/doc/html/latest/driver-api/auxiliary_bus.html
> >
> > Signed-off-by: Xueming Li 
> [...]
> > --- a/doc/guides/rel_notes/release_21_08.rst
> > +++ b/doc/guides/rel_notes/release_21_08.rst
> > @@ -55,6 +55,13 @@ New Features
> >   Also, make sure to start the actual text at the margin.
> >   ===
> >
> > +* **Added auxiliary bus support.**
> > +
> > +  * Auxiliary bus provides a way to split function into child-devices
> > +representing sub-domains of functionality. Each auxiliary device
> > +represents a part of its parent functionality.
> > +  * Devargs syntax of auxiliary device: -a auxiliary:[,args...]
> 
> I am not sure the release notes are the right place to provide a guide of the 
> syntax, and this syntax is not the new generice one with
> "bus=" that we want to promote.
> I would just remove this last line from the release notes
> 
> > --- /dev/null
> > +++ b/drivers/bus/auxiliary/auxiliary_common.c
> > @@ -0,0 +1,419 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright 2021 Mellanox Technologies, Ltd
> 
> I think we should use the NVIDIA copyright now.

Good catch!

> 
> > +static struct rte_devargs *
> > +auxiliary_devargs_lookup(const char *name) {
> > +   struct rte_devargs *devargs;
> > +
> > +   RTE_EAL_DEVARGS_FOREACH(RTE_BUS_AXILIARY_NAME, devargs) {
> 
> Missing an "U" in RTE_BUS_AXILIARY_NAME
> 
> [...]
> > +/*
> > + * Scan the content of the auxiliary bus, and the devices in the
> > +devices
> > + * list
> 
> Simpler: Scan the devices in the auxiliary bus.
> 
> [...]
> > +/**
> > + * Update a device being scanned.
> 
> Not clear what is updated.
> It seems to be just the devargs part?
> 
> > + *
> > + * @param aux_dev
> > + * AUXILIARY device.
> > + */
> 
> Should not be a doxygen comment.
> 
> > +void
> > +auxiliary_on_scan(struct rte_auxiliary_device *aux_dev) {
> > +   aux_dev->device.devargs = auxiliary_devargs_lookup(aux_dev->name);
> > +}
> 
> [...]
> > +static int
> > +rte_auxiliary_probe_one_driver(struct rte_auxiliary_driver *dr,
> > +  struct rte_auxiliary_device *dev) {
> > +   enum rte_iova_mode iova_mode;
> > +   int ret;
> > +
> > +   if ((dr == NULL) || (dev == NULL))
> > +   return -EINVAL;
> > +
> > +   /* The device is not blocked; Check if driver supports it. */
> 
> I don't understand why the comment about "not blocked" here.
> The policy check is below.
> 
> > +   if (!auxiliary_match(dr, dev))
> > +   /* Match of device and driver failed */
> > +   return 1;
> > +
> > +   AUXILIARY_LOG(DEBUG, "Auxiliary device %s on NUMA socket %i\n",
> > + dev->name, dev->device.numa_node);
> > +
> > +   /* No initialization when marked as blocked, return without error. */
> > +   if (dev->device.devargs != NULL &&
> > +   dev->device.devargs->policy == RTE_DEV_BLOCKED) {
> > +   AUXILIARY_LOG(INFO, "  Device is blocked, not initializing\n");
> 
> Please no indent inside logs.
> And no \n as it is already in the macro.
> 
> > +   return -1;
> > +   }
> 
> [...]
> > +static int
> > +rte_auxiliary_driver_remove_dev(struct rte_auxiliary_device *dev) {
> > +   struct rte_auxiliary_driver *dr;
> 
> Not sure this variable is needed.
> If you keep it, please "drv" is better.
> 
> > +   int ret = 0;
> > +
> > +   if (dev == NULL)
> > +   return -EINVAL;
> > +
> > +   dr = dev->driver;
> > +
> > +   AUXILIARY_LOG(DEBUG, "Auxiliary device %s on NUMA socket %i\n",
> > + dev->name, dev->device.numa_node);
> > +
> > +   AUXILIARY_LOG(DEBUG, "  remove driver: %s %s\n",
> > + dev->name, dr->driver.name);
> > +
> > +   if (dr->remove) {
> > +   ret = dr->remove(dev);
> > +   if (ret < 0)
> > +   return ret;
> > +   }
> 
> [...]
> > +/*
> > + * Scan the content of the auxiliary bus, and call the probe()
> > +function for
> > + *
> > + * all registered drivers that have a matching entry in its id_table
> > + * for discovered devices.
> 
> Please elaborate what is the id_table.

Hmm, legacy code from pci bus, remo

[dpdk-dev] [PATCH v5 1/2] devargs: add common key definition

2021-06-22 Thread Xueming Li
Adds common devargs key definition for "bus", "class" and "driver".

Cc: Thomas Monjalon 
Signed-off-by: Xueming Li 
---
 drivers/common/mlx5/mlx5_common.h   |  2 --
 drivers/common/mlx5/mlx5_common_pci.c   |  2 +-
 drivers/common/sfc_efx/sfc_efx.c|  7 +++
 drivers/common/sfc_efx/sfc_efx.h|  2 --
 drivers/net/bonding/rte_eth_bond_args.c |  2 +-
 drivers/net/i40e/i40e_ethdev_vf.c   |  5 ++---
 drivers/net/iavf/iavf_ethdev.c  |  5 ++---
 drivers/net/mlx5/mlx5.c |  4 ++--
 drivers/net/sfc/sfc_kvargs.c|  2 +-
 drivers/vdpa/mlx5/mlx5_vdpa.c   |  2 +-
 lib/eal/common/eal_common_devargs.c | 12 ++--
 lib/eal/include/rte_devargs.h   | 24 
 12 files changed, 43 insertions(+), 26 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_common.h 
b/drivers/common/mlx5/mlx5_common.h
index 1fbefe0fa6..306f2f1ab7 100644
--- a/drivers/common/mlx5/mlx5_common.h
+++ b/drivers/common/mlx5/mlx5_common.h
@@ -208,8 +208,6 @@ __rte_internal
 int mlx5_get_ifname_sysfs(const char *ibdev_path, char *ifname);
 
 
-#define MLX5_CLASS_ARG_NAME "class"
-
 enum mlx5_class {
MLX5_CLASS_INVALID,
MLX5_CLASS_NET = RTE_BIT64(0),
diff --git a/drivers/common/mlx5/mlx5_common_pci.c 
b/drivers/common/mlx5/mlx5_common_pci.c
index 3f16cd21cf..34747c4e07 100644
--- a/drivers/common/mlx5/mlx5_common_pci.c
+++ b/drivers/common/mlx5/mlx5_common_pci.c
@@ -118,7 +118,7 @@ bus_cmdline_options_handler(__rte_unused const char *key,
 static int
 parse_class_options(const struct rte_devargs *devargs)
 {
-   const char *key = MLX5_CLASS_ARG_NAME;
+   const char *key = RTE_DEVARGS_KEY_CLASS;
struct rte_kvargs *kvlist;
int ret = 0;
 
diff --git a/drivers/common/sfc_efx/sfc_efx.c b/drivers/common/sfc_efx/sfc_efx.c
index 0b78933d9f..2dc5545760 100644
--- a/drivers/common/sfc_efx/sfc_efx.c
+++ b/drivers/common/sfc_efx/sfc_efx.c
@@ -42,7 +42,6 @@ enum sfc_efx_dev_class
 sfc_efx_dev_class_get(struct rte_devargs *devargs)
 {
struct rte_kvargs *kvargs;
-   const char *key = SFC_EFX_KVARG_DEV_CLASS;
enum sfc_efx_dev_class dev_class = SFC_EFX_DEV_CLASS_NET;
 
if (devargs == NULL)
@@ -52,9 +51,9 @@ sfc_efx_dev_class_get(struct rte_devargs *devargs)
if (kvargs == NULL)
return dev_class;
 
-   if (rte_kvargs_count(kvargs, key) != 0) {
-   rte_kvargs_process(kvargs, key, sfc_efx_kvarg_dev_class_handler,
-  &dev_class);
+   if (rte_kvargs_count(kvargs, RTE_DEVARGS_KEY_CLASS) != 0) {
+   rte_kvargs_process(kvargs, RTE_DEVARGS_KEY_CLASS,
+  sfc_efx_kvarg_dev_class_handler, &dev_class);
}
 
rte_kvargs_free(kvargs);
diff --git a/drivers/common/sfc_efx/sfc_efx.h b/drivers/common/sfc_efx/sfc_efx.h
index 6b6164cb1f..c16eca60f3 100644
--- a/drivers/common/sfc_efx/sfc_efx.h
+++ b/drivers/common/sfc_efx/sfc_efx.h
@@ -19,8 +19,6 @@
 extern "C" {
 #endif
 
-#define SFC_EFX_KVARG_DEV_CLASS"class"
-
 enum sfc_efx_dev_class {
SFC_EFX_DEV_CLASS_INVALID = 0,
SFC_EFX_DEV_CLASS_NET,
diff --git a/drivers/net/bonding/rte_eth_bond_args.c 
b/drivers/net/bonding/rte_eth_bond_args.c
index 764b1b8c8e..5406e1c934 100644
--- a/drivers/net/bonding/rte_eth_bond_args.c
+++ b/drivers/net/bonding/rte_eth_bond_args.c
@@ -18,7 +18,7 @@ const char *pmd_bond_init_valid_arguments[] = {
PMD_BOND_SOCKET_ID_KVARG,
PMD_BOND_MAC_ADDR_KVARG,
PMD_BOND_AGG_MODE_KVARG,
-   "driver",
+   RTE_DEVARGS_KEY_DRIVER,
NULL
 };
 
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c 
b/drivers/net/i40e/i40e_ethdev_vf.c
index 385ebedcd3..0cfe13b7b2 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -1660,7 +1660,6 @@ static int
 i40evf_driver_selected(struct rte_devargs *devargs)
 {
struct rte_kvargs *kvlist;
-   const char *key = "driver";
int ret = 0;
 
if (devargs == NULL)
@@ -1670,13 +1669,13 @@ i40evf_driver_selected(struct rte_devargs *devargs)
if (kvlist == NULL)
return 0;
 
-   if (!rte_kvargs_count(kvlist, key))
+   if (!rte_kvargs_count(kvlist, RTE_DEVARGS_KEY_DRIVER))
goto exit;
 
/* i40evf driver selected when there's a key-value pair:
 * driver=i40evf
 */
-   if (rte_kvargs_process(kvlist, key,
+   if (rte_kvargs_process(kvlist, RTE_DEVARGS_KEY_DRIVER,
   i40evf_check_driver_handler, NULL) < 0)
goto exit;
 
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index a7ef7a6d4d..6793bcef08 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -2448,7 +2448,6 @@ static int
 iavf_drv_i40evf_selected(struct rte_devargs *devargs, uint16_t device_id)
 {
struct rte_kvargs *kvlist;
-   const char *key = "drive

[dpdk-dev] [PATCH v5 2/2] bus/auxiliary: introduce auxiliary bus

2021-06-22 Thread Xueming Li
Auxiliary bus [1] provides a way to split a function into child-devices
representing sub-domains of functionality. Each auxiliary device
represents a part of its parent functionality.

Auxiliary device is identified by a unique device name, sysfs path:
  /sys/bus/auxiliary/devices/<name>

Devargs legacy syntax of auxiliary device:
  -a auxiliary:<name>[,args...]
Devargs generic syntax of auxiliary device:
  -a bus=auxiliary,name=<name>[,args...]/class=<class>[,args...]/driver=<driver>[,args...]

[1] kernel auxiliary bus document:
https://www.kernel.org/doc/html/latest/driver-api/auxiliary_bus.html

Signed-off-by: Xueming Li 
Cc: Wang Haiyue 
Cc: Thomas Monjalon 
Cc: Kinsella Ray 
---
 MAINTAINERS   |   5 +
 doc/guides/rel_notes/release_21_08.rst|   6 +
 drivers/bus/auxiliary/auxiliary_common.c  | 418 ++
 drivers/bus/auxiliary/auxiliary_params.c  |  58 +++
 drivers/bus/auxiliary/linux/auxiliary.c   | 142 
 drivers/bus/auxiliary/meson.build |  16 +
 drivers/bus/auxiliary/private.h   |  75 
 drivers/bus/auxiliary/rte_bus_auxiliary.h | 199 ++
 drivers/bus/auxiliary/version.map |   3 +
 drivers/bus/meson.build   |   1 +
 10 files changed, 923 insertions(+)
 create mode 100644 drivers/bus/auxiliary/auxiliary_common.c
 create mode 100644 drivers/bus/auxiliary/auxiliary_params.c
 create mode 100644 drivers/bus/auxiliary/linux/auxiliary.c
 create mode 100644 drivers/bus/auxiliary/meson.build
 create mode 100644 drivers/bus/auxiliary/private.h
 create mode 100644 drivers/bus/auxiliary/rte_bus_auxiliary.h
 create mode 100644 drivers/bus/auxiliary/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 5877a16971..eaf691ca6a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -525,6 +525,11 @@ F: doc/guides/mempool/octeontx2.rst
 Bus Drivers
 ---
 
+Auxiliary bus driver
+M: Parav Pandit 
+M: Xueming Li 
+F: drivers/bus/auxiliary/
+
 Intel FPGA bus
 M: Rosen Xu 
 F: drivers/bus/ifpga/
diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index a6ecfdf3ce..e7ef4c8a05 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -55,6 +55,12 @@ New Features
  Also, make sure to start the actual text at the margin.
  ===
 
+* **Added auxiliary bus support.**
+
+  Auxiliary bus provides a way to split function into child-devices
+  representing sub-domains of functionality. Each auxiliary device
+  represents a part of its parent functionality.
+
 
 Removed Items
 -
diff --git a/drivers/bus/auxiliary/auxiliary_common.c b/drivers/bus/auxiliary/auxiliary_common.c
new file mode 100644
index 00..25d6802f24
--- /dev/null
+++ b/drivers/bus/auxiliary/auxiliary_common.c
@@ -0,0 +1,418 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "private.h"
+#include "rte_bus_auxiliary.h"
+
+static struct rte_devargs *
+auxiliary_devargs_lookup(const char *name)
+{
+   struct rte_devargs *devargs;
+
+   RTE_EAL_DEVARGS_FOREACH(RTE_BUS_AUXILIARY_NAME, devargs) {
+   if (strcmp(devargs->name, name) == 0)
+   return devargs;
+   }
+   return NULL;
+}
+
+/*
+ * Test whether the auxiliary device exist
+ *
+ * Stub for OS not supporting auxiliary bus.
+ */
+__rte_weak bool
+auxiliary_dev_exists(const char *name)
+{
+   RTE_SET_USED(name);
+   return false;
+}
+
+/*
+ * Scan the devices in the auxiliary bus.
+ *
+ * Stub for OS not supporting auxiliary bus.
+ */
+__rte_weak int
+auxiliary_scan(void)
+{
+   return 0;
+}
+
+/*
+ * Update a device's devargs being scanned.
+ *
+ * @param aux_dev
+ * AUXILIARY device.
+ */
+void
+auxiliary_on_scan(struct rte_auxiliary_device *aux_dev)
+{
+   aux_dev->device.devargs = auxiliary_devargs_lookup(aux_dev->name);
+}
+
+/*
+ * Match the auxiliary driver and device using driver function.
+ */
+bool
+auxiliary_match(const struct rte_auxiliary_driver *aux_drv,
+   const struct rte_auxiliary_device *aux_dev)
+{
+   if (aux_drv->match == NULL)
+   return false;
+   return aux_drv->match(aux_dev->name);
+}
+
+/*
+ * Call the probe() function of the driver.
+ */
+static int
+rte_auxiliary_probe_one_driver(struct rte_auxiliary_driver *drv,
+  struct rte_auxiliary_device *dev)
+{
+   enum rte_iova_mode iova_mode;
+   int ret;
+
+   if ((drv == NULL) || (dev == NULL))
+   return -EINVAL;
+
+   /* Check if driver supports it. */
+   if (!auxiliary_match(drv, dev))
+   /* Match of device and driver failed */
+   return 1;
+
+   AUXILIARY_LOG(DEBUG, "Auxiliary device %

[dpdk-dev] [PATCH v3 0/4] vhost: support async dequeue for split ring

2021-06-22 Thread Wenwu Ma
This patch implements asynchronous dequeue data path for split ring.
A new asynchronous dequeue function is introduced. With this function,
the application can try to receive packets from the guest while
offloading large copies to the DMA engine, thus saving precious CPU
cycles.

v3:
- Fix compilation warning and error in arm platform.
- Restore the removed function virtio_dev_pktmbuf_alloc;
  the async dequeue path allocates packets separately.

v2:
- Refactor vhost datapath as preliminary patch for this series.
- The change of using new API in examples/vhost is put into a
  dedicated patch.
- Check queue_id value before using it.
- Async dequeue performance enhancement. 160% performance improvement
  for v2 vs. v1.
- Async dequeue API name change from rte_vhost_try_dequeue_burst to
  rte_vhost_async_try_dequeue_burst.
- Completed packets update the used ring directly.

Wenwu Ma (3):
  examples/vhost: refactor vhost enqueue and dequeue datapaths.
  examples/vhost: use a new API to query remaining ring space
  examples/vhost: support vhost async dequeue data path

Yuan Wang (1):
  vhost: support async dequeue for split ring

 doc/guides/prog_guide/vhost_lib.rst |  10 +
 doc/guides/sample_app_ug/vhost.rst  |   9 +-
 examples/vhost/ioat.c   |  67 +++-
 examples/vhost/ioat.h   |  25 ++
 examples/vhost/main.c   | 224 +++
 examples/vhost/main.h   |  33 +-
 examples/vhost/virtio_net.c |  16 +-
 lib/vhost/rte_vhost_async.h |  44 ++-
 lib/vhost/version.map   |   3 +
 lib/vhost/virtio_net.c  | 579 
 10 files changed, 902 insertions(+), 108 deletions(-)

-- 
2.25.1



[dpdk-dev] [PATCH v3 1/4] examples/vhost: refactor vhost enqueue and dequeue datapaths.

2021-06-22 Thread Wenwu Ma
Previously, we checked a flag in the data path to decide which
enqueue/dequeue function to call.

Now we use an ops table that is initialized when the Vhost device is
created, so the data path can call the ops directly without any
further flag checks.

Signed-off-by: Wenwu Ma 
---
 examples/vhost/main.c   | 112 
 examples/vhost/main.h   |  33 +--
 examples/vhost/virtio_net.c |  16 +-
 3 files changed, 105 insertions(+), 56 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index d2179eadb9..aebdc3a566 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -106,6 +106,8 @@ static uint32_t burst_rx_retry_num = BURST_RX_RETRIES;
 static char *socket_files;
 static int nb_sockets;
 
+static struct vhost_queue_ops vdev_queue_ops[MAX_VHOST_DEVICE];
+
 /* empty vmdq configuration structure. Filled in programatically */
 static struct rte_eth_conf vmdq_conf_default = {
.rxmode = {
@@ -885,27 +887,8 @@ drain_vhost(struct vhost_dev *vdev)
uint16_t nr_xmit = vhost_txbuff[buff_idx]->len;
struct rte_mbuf **m = vhost_txbuff[buff_idx]->m_table;
 
-   if (builtin_net_driver) {
-   ret = vs_enqueue_pkts(vdev, VIRTIO_RXQ, m, nr_xmit);
-   } else if (async_vhost_driver) {
-   uint32_t cpu_cpl_nr = 0;
-   uint16_t enqueue_fail = 0;
-   struct rte_mbuf *m_cpu_cpl[nr_xmit];
-
-   complete_async_pkts(vdev);
-   ret = rte_vhost_submit_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-   m, nr_xmit, m_cpu_cpl, &cpu_cpl_nr);
-
-   if (cpu_cpl_nr)
-   free_pkts(m_cpu_cpl, cpu_cpl_nr);
-
-   enqueue_fail = nr_xmit - ret;
-   if (enqueue_fail)
-   free_pkts(&m[ret], nr_xmit - ret);
-   } else {
-   ret = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-   m, nr_xmit);
-   }
+   ret = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+   VIRTIO_RXQ, m, nr_xmit);
 
if (enable_stats) {
__atomic_add_fetch(&vdev->stats.rx_total_atomic, nr_xmit,
@@ -1184,6 +1167,36 @@ drain_mbuf_table(struct mbuf_table *tx_q)
}
 }
 
+uint16_t
+async_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
+   struct rte_mbuf **pkts, uint32_t rx_count)
+{
+   uint16_t enqueue_count;
+   uint32_t cpu_cpl_nr = 0;
+   uint16_t enqueue_fail = 0;
+   struct rte_mbuf *m_cpu_cpl[MAX_PKT_BURST];
+
+   complete_async_pkts(vdev);
+   enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
+   queue_id, pkts, rx_count,
+   m_cpu_cpl, &cpu_cpl_nr);
+   if (cpu_cpl_nr)
+   free_pkts(m_cpu_cpl, cpu_cpl_nr);
+
+   enqueue_fail = rx_count - enqueue_count;
+   if (enqueue_fail)
+   free_pkts(&pkts[enqueue_count], enqueue_fail);
+
+   return enqueue_count;
+}
+
+uint16_t
+sync_enqueue_pkts(struct vhost_dev *vdev, uint16_t queue_id,
+   struct rte_mbuf **pkts, uint32_t rx_count)
+{
+   return rte_vhost_enqueue_burst(vdev->vid, queue_id, pkts, rx_count);
+}
+
 static __rte_always_inline void
 drain_eth_rx(struct vhost_dev *vdev)
 {
@@ -1214,29 +1227,8 @@ drain_eth_rx(struct vhost_dev *vdev)
}
}
 
-   if (builtin_net_driver) {
-   enqueue_count = vs_enqueue_pkts(vdev, VIRTIO_RXQ,
-   pkts, rx_count);
-   } else if (async_vhost_driver) {
-   uint32_t cpu_cpl_nr = 0;
-   uint16_t enqueue_fail = 0;
-   struct rte_mbuf *m_cpu_cpl[MAX_PKT_BURST];
-
-   complete_async_pkts(vdev);
-   enqueue_count = rte_vhost_submit_enqueue_burst(vdev->vid,
-   VIRTIO_RXQ, pkts, rx_count,
-   m_cpu_cpl, &cpu_cpl_nr);
-   if (cpu_cpl_nr)
-   free_pkts(m_cpu_cpl, cpu_cpl_nr);
-
-   enqueue_fail = rx_count - enqueue_count;
-   if (enqueue_fail)
-   free_pkts(&pkts[enqueue_count], enqueue_fail);
-
-   } else {
-   enqueue_count = rte_vhost_enqueue_burst(vdev->vid, VIRTIO_RXQ,
-   pkts, rx_count);
-   }
+   enqueue_count = vdev_queue_ops[vdev->vid].enqueue_pkt_burst(vdev,
+   VIRTIO_RXQ, pkts, rx_count);
 
if (enable_stats) {
__atomic_add_fetch(&vdev->stats.rx_total_atomic, rx_count,
@@ -1249,6 +1241,14 @@ drain_eth_rx(struct vhost_dev *vdev)
free_pkts(pkts, rx_count);
 }
 
+uint16_t sync_dequeue_pkts(struct vhost_dev *dev, uint16_t queue_id,
+   struct rte_mempool *mbuf_pool,
+   

[dpdk-dev] [PATCH v3 2/4] examples/vhost: use a new API to query remaining ring space

2021-06-22 Thread Wenwu Ma
A new API for querying the remaining descriptor ring capacity
is available, so use it instead of tracking the free space manually.

Signed-off-by: Wenwu Ma 
---
 examples/vhost/ioat.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c
index 2a2c2d7202..bf4e033bdb 100644
--- a/examples/vhost/ioat.c
+++ b/examples/vhost/ioat.c
@@ -17,7 +17,6 @@ struct packet_tracker {
unsigned short next_read;
unsigned short next_write;
unsigned short last_remain;
-   unsigned short ioat_space;
 };
 
 struct packet_tracker cb_tracker[MAX_VHOST_DEVICE];
@@ -113,7 +112,6 @@ open_ioat(const char *value)
goto out;
}
rte_rawdev_start(dev_id);
-   cb_tracker[dev_id].ioat_space = IOAT_RING_SIZE - 1;
dma_info->nr++;
i++;
}
@@ -140,7 +138,7 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
src = descs[i_desc].src;
dst = descs[i_desc].dst;
i_seg = 0;
-   if (cb_tracker[dev_id].ioat_space < src->nr_segs)
+   if (rte_ioat_burst_capacity(dev_id) < src->nr_segs)
break;
while (i_seg < src->nr_segs) {
rte_ioat_enqueue_copy(dev_id,
@@ -155,7 +153,6 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
}
write &= mask;
cb_tracker[dev_id].size_track[write] = src->nr_segs;
-   cb_tracker[dev_id].ioat_space -= src->nr_segs;
write++;
}
} else {
@@ -194,7 +191,6 @@ ioat_check_completed_copies_cb(int vid, uint16_t queue_id,
if (n_seg == 0)
return 0;
 
-   cb_tracker[dev_id].ioat_space += n_seg;
n_seg += cb_tracker[dev_id].last_remain;
 
read = cb_tracker[dev_id].next_read;
-- 
2.25.1



[dpdk-dev] [PATCH v3 3/4] vhost: support async dequeue for split ring

2021-06-22 Thread Wenwu Ma
From: Yuan Wang 

This patch implements asynchronous dequeue data path for split ring.
A new asynchronous dequeue function is introduced. With this function,
the application can try to receive packets from the guest while
offloading large copies to the DMA engine, thus saving precious CPU
cycles.

Signed-off-by: Yuan Wang 
Signed-off-by: Jiayu Hu 
Signed-off-by: Wenwu Ma 
---
 doc/guides/prog_guide/vhost_lib.rst |  10 +
 lib/vhost/rte_vhost_async.h |  44 ++-
 lib/vhost/version.map   |   3 +
 lib/vhost/virtio_net.c  | 579 
 4 files changed, 633 insertions(+), 3 deletions(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index d18fb98910..05c42c9b11 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -281,6 +281,16 @@ The following is an overview of some key Vhost API functions:
   Poll enqueue completion status from async data path. Completed packets
   are returned to applications through ``pkts``.
 
+* ``rte_vhost_async_try_dequeue_burst(vid, queue_id, mbuf_pool, pkts, count, nr_inflight)``
+
+  Try to receive packets from the guest with offloading large packets
+  to the DMA engine. Successfully dequeued packets are transfer
+  completed and returned in ``pkts``. But there may be other packets
+  that are sent from the guest but being transferred by the DMA engine,
+  called in-flight packets. This function will return in-flight packets
+  only after the DMA engine finishes transferring. The amount of
+  in-flight packets by now is returned in ``nr_inflight``.
+
 Vhost-user Implementations
 --
 
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index 6faa31f5ad..58019408f1 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -84,13 +84,21 @@ struct rte_vhost_async_channel_ops {
 };
 
 /**
- * inflight async packet information
+ * in-flight async packet information
  */
+struct async_nethdr {
+   struct virtio_net_hdr hdr;
+   bool valid;
+};
+
 struct async_inflight_info {
struct rte_mbuf *mbuf;
-   uint16_t descs; /* num of descs inflight */
+   union {
+   uint16_t descs; /* num of descs in-flight */
+   struct async_nethdr nethdr;
+   };
uint16_t nr_buffers; /* num of buffers inflight for packed ring */
-};
+} __rte_cache_aligned;
 
 /**
  *  dma channel feature bit definition
@@ -193,4 +201,34 @@ __rte_experimental
 uint16_t rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id,
struct rte_mbuf **pkts, uint16_t count);
 
+/**
+ * This function tries to receive packets from the guest with offloading
+ * large copies to the DMA engine. Successfully dequeued packets are
+ * transfer completed, either by the CPU or the DMA engine, and they are
+ * returned in "pkts". There may be other packets that are sent from
+ * the guest but being transferred by the DMA engine, called in-flight
+ * packets. The amount of in-flight packets by now is returned in
+ * "nr_inflight". This function will return in-flight packets only after
+ * the DMA engine finishes transferring.
+ *
+ * @param vid
+ *  id of vhost device to dequeue data
+ * @param queue_id
+ *  queue id to dequeue data
+ * @param pkts
+ *  blank array to keep successfully dequeued packets
+ * @param count
+ *  size of the packet array
+ * @param nr_inflight
+ *  the amount of in-flight packets by now. If error occurred, its
+ *  value is set to -1.
+ * @return
+ *  num of successfully dequeued packets
+ */
+__rte_experimental
+uint16_t
+rte_vhost_async_try_dequeue_burst(int vid, uint16_t queue_id,
+   struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count,
+   int *nr_inflight);
+
 #endif /* _RTE_VHOST_ASYNC_H_ */
diff --git a/lib/vhost/version.map b/lib/vhost/version.map
index 9103a23cd4..a320f889cd 100644
--- a/lib/vhost/version.map
+++ b/lib/vhost/version.map
@@ -79,4 +79,7 @@ EXPERIMENTAL {
 
# added in 21.05
rte_vhost_get_negotiated_protocol_features;
+
+   # added in 21.08
+   rte_vhost_async_try_dequeue_burst;
 };
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index b93482587c..89a6715e7a 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2673,6 +2673,32 @@ virtio_dev_pktmbuf_prep(struct virtio_net *dev, struct rte_mbuf *pkt,
return -1;
 }
 
+/*
+ * Allocate a host supported pktmbuf.
+ */
+static __rte_always_inline struct rte_mbuf *
+virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp,
+uint32_t data_len)
+{
+   struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp);
+
+   if (unlikely(pkt == NULL)) {
+   VHOST_LOG_DATA(ERR,
+   "Failed to allocate memory for mbuf.\n");
+   return NULL;
+   }
+
+   if (virtio_dev_pktmbuf_prep(dev, pkt, data_len)) {
+   /* Data does

[dpdk-dev] [PATCH v3 4/4] examples/vhost: support vhost async dequeue data path

2021-06-22 Thread Wenwu Ma
This patch adds the vhost async dequeue data path to the vhost sample.
The vswitch can leverage IOAT to accelerate the async dequeue data path.

Signed-off-by: Wenwu Ma 
---
 doc/guides/sample_app_ug/vhost.rst |   9 +-
 examples/vhost/ioat.c  |  61 ++---
 examples/vhost/ioat.h  |  25 ++
 examples/vhost/main.c  | 140 -
 4 files changed, 177 insertions(+), 58 deletions(-)

diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
index 9afde9c7f5..63dcf181e1 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -169,9 +169,12 @@ demonstrates how to use the async vhost APIs. It's used in combination with dmas
 **--dmas**
 This parameter is used to specify the assigned DMA device of a vhost device.
 Async vhost-user net driver will be used if --dmas is set. For example
---dmas [txd0@00:04.0,txd1@00:04.1] means use DMA channel 00:04.0 for vhost
-device 0 enqueue operation and use DMA channel 00:04.1 for vhost device 1
-enqueue operation.
+--dmas [txd0@00:04.0,txd1@00:04.1,rxd0@00:04.2,rxd1@00:04.3] means use
+DMA channel 00:04.0/00:04.2 for vhost device 0 enqueue/dequeue operation
+and use DMA channel 00:04.1/00:04.3 for vhost device 1 enqueue/dequeue
+operation. The index of the device corresponds to the socket file in order,
+that means vhost device 0 is created through the first socket file, vhost
+device 1 is created through the second socket file, and so on.
 
 Common Issues
 -
diff --git a/examples/vhost/ioat.c b/examples/vhost/ioat.c
index bf4e033bdb..a305100b47 100644
--- a/examples/vhost/ioat.c
+++ b/examples/vhost/ioat.c
@@ -21,6 +21,8 @@ struct packet_tracker {
 
 struct packet_tracker cb_tracker[MAX_VHOST_DEVICE];
 
+int vid2socketid[MAX_VHOST_DEVICE];
+
 int
 open_ioat(const char *value)
 {
@@ -29,7 +31,7 @@ open_ioat(const char *value)
char *addrs = input;
char *ptrs[2];
char *start, *end, *substr;
-   int64_t vid, vring_id;
+   int64_t socketid, vring_id;
struct rte_ioat_rawdev_config config;
struct rte_rawdev_info info = { .dev_private = &config };
char name[32];
@@ -60,6 +62,8 @@ open_ioat(const char *value)
goto out;
}
while (i < args_nr) {
+   char *txd, *rxd;
+   bool is_txd;
char *arg_temp = dma_arg[i];
uint8_t sub_nr;
sub_nr = rte_strsplit(arg_temp, strlen(arg_temp), ptrs, 2, '@');
@@ -68,27 +72,38 @@ open_ioat(const char *value)
goto out;
}
 
-   start = strstr(ptrs[0], "txd");
-   if (start == NULL) {
+   int async_flag;
+   txd = strstr(ptrs[0], "txd");
+   rxd = strstr(ptrs[0], "rxd");
+   if (txd == NULL && rxd == NULL) {
ret = -1;
goto out;
+   } else if (txd) {
+   is_txd = true;
+   start = txd;
+   async_flag = ASYNC_RX_VHOST;
+   } else {
+   is_txd = false;
+   start = rxd;
+   async_flag = ASYNC_TX_VHOST;
}
 
start += 3;
-   vid = strtol(start, &end, 0);
+   socketid = strtol(start, &end, 0);
if (end == start) {
ret = -1;
goto out;
}
 
-   vring_id = 0 + VIRTIO_RXQ;
+   vring_id = is_txd ? VIRTIO_RXQ : VIRTIO_TXQ;
+
if (rte_pci_addr_parse(ptrs[1],
-   &(dma_info + vid)->dmas[vring_id].addr) < 0) {
+   &(dma_info + socketid)->dmas[vring_id].addr) < 0) {
ret = -1;
goto out;
}
 
-   rte_pci_device_name(&(dma_info + vid)->dmas[vring_id].addr,
+   rte_pci_device_name(&(dma_info + socketid)->dmas[vring_id].addr,
name, sizeof(name));
dev_id = rte_rawdev_get_dev_id(name);
if (dev_id == (uint16_t)(-ENODEV) ||
@@ -103,8 +118,9 @@ open_ioat(const char *value)
goto out;
}
 
-   (dma_info + vid)->dmas[vring_id].dev_id = dev_id;
-   (dma_info + vid)->dmas[vring_id].is_valid = true;
+   (dma_info + socketid)->dmas[vring_id].dev_id = dev_id;
+   (dma_info + socketid)->dmas[vring_id].is_valid = true;
+   (dma_info + socketid)->async_flag |= async_flag;
config.ring_size = IOAT_RING_SIZE;
config.hdls_disable = true;
if (rte_rawdev_configure(dev_id, &info, sizeof(config)) < 0) {
@@ -126,13 +142,16 @@ ioat_transfer_data_cb(int vid, uint16_t queue_id,
struct rte_vhost

Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-22 Thread fengchengwen
On 2021/6/23 1:25, Jerin Jacob wrote:
> On Fri, Jun 18, 2021 at 3:11 PM fengchengwen  wrote:
>>
>> On 2021/6/18 13:52, Jerin Jacob wrote:
>>> On Thu, Jun 17, 2021 at 2:46 PM Bruce Richardson
>>>  wrote:

 On Wed, Jun 16, 2021 at 08:07:26PM +0530, Jerin Jacob wrote:
> On Wed, Jun 16, 2021 at 3:47 PM fengchengwen  
> wrote:
>>
>> On 2021/6/16 15:09, Morten Brørup wrote:
 From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson
 Sent: Tuesday, 15 June 2021 18.39

 On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> This patch introduces 'dmadevice' which is a generic type of DMA
> device.
>
> The APIs of dmadev library exposes some generic operations which can
> enable configuration and I/O with the DMA devices.
>
> Signed-off-by: Chengwen Feng 
> ---
 Thanks for sending this.

 Of most interest to me right now are the key data-plane APIs. While we
 are
 still in the prototyping phase, below is a draft of what we are
 thinking
 for the key enqueue/perform_ops/completed_ops APIs.

 Some key differences I note in below vs your original RFC:
 * Use of void pointers rather than iova addresses. While using iova's
 makes
   sense in the general case when using hardware, in that it can work
 with
   both physical addresses and virtual addresses, if we change the APIs
 to use
   void pointers instead it will still work for DPDK in VA mode, while
 at the
   same time allow use of software fallbacks in error cases, and also a
 stub
   driver than uses memcpy in the background. Finally, using iova's
 makes the
   APIs a lot more awkward to use with anything but mbufs or similar
 buffers
   where we already have a pre-computed physical address.
 * Use of id values rather than user-provided handles. Allowing the
 user/app
   to manage the amount of data stored per operation is a better
 solution, I
   feel than proscribing a certain about of in-driver tracking. Some
 apps may
   not care about anything other than a job being completed, while other
 apps
   may have significant metadata to be tracked. Taking the user-context
   handles out of the API also makes the driver code simpler.
 * I've kept a single combined API for completions, which differs from
 the
   separate error handling completion API you propose. I need to give
 the
   two function approach a bit of thought, but likely both could work.
 If we
   (likely) never expect failed ops, then the specifics of error
 handling
   should not matter that much.

 For the rest, the control / setup APIs are likely to be rather
 uncontroversial, I suspect. However, I think that rather than xstats
 APIs,
 the library should first provide a set of standardized stats like
 ethdev
 does. If driver-specific stats are needed, we can add xstats later to
 the
 API.

 Appreciate your further thoughts on this, thanks.

 Regards,
 /Bruce
>>>
>>> I generally agree with Bruce's points above.
>>>
>>> I would like to share a couple of ideas for further discussion:
>
>
> I believe some of the other requirements and comments for generic DMA 
> will be
>
> 1) Support for the _channel_, Each channel may have different
> capabilities and functionalities.
> Typical cases are, each channel have separate source and destination
> devices like
> DMA between PCIe EP to Host memory, Host memory to Host memory, PCIe
> EP to PCIe EP.
> So we need some notion of the channel in the specification.
>

 Can you share a bit more detail on what constitutes a channel in this case?
 Is it equivalent to a device queue (which we are flattening to individual
 devices in this API), or to a specific configuration on a queue?
>>>
>>> It not a queue. It is one of the attributes for transfer.
>>> I.e in the same queue, for a given transfer it can specify the
>>> different "source" and "destination" device.
>>> Like CPU to Sound card, CPU to network card etc.
>>>
>>>

> 2) I assume current data plane APIs are not thread-safe. Is it right?
>
 Yes.

>
> 3) Cookie scheme outlined earlier looks good to me. Instead of having
> generic dequeue() API
>
> 4) Can split the rte_dmadev_enqueue_copy(uint16_t dev_id, void * src,
> void * dst, unsigned int length);
> to two stage API like, Where one will be used in fastpath and other
> one will use used in slowpath.
>
> - slowpath API will for take c

Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

2021-06-22 Thread fengchengwen
On 2021/6/23 1:51, Jerin Jacob wrote:
> On Fri, Jun 18, 2021 at 2:22 PM fengchengwen  wrote:
>>
>> On 2021/6/17 22:18, Bruce Richardson wrote:
>>> On Thu, Jun 17, 2021 at 12:02:00PM +0100, Bruce Richardson wrote:
 On Thu, Jun 17, 2021 at 05:48:05PM +0800, fengchengwen wrote:
> On 2021/6/17 1:31, Bruce Richardson wrote:
>> On Wed, Jun 16, 2021 at 05:41:45PM +0800, fengchengwen wrote:
>>> On 2021/6/16 0:38, Bruce Richardson wrote:
 On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> This patch introduces 'dmadevice' which is a generic type of DMA
> device.
>
> The APIs of dmadev library exposes some generic operations which can
> enable configuration and I/O with the DMA devices.
>
> Signed-off-by: Chengwen Feng 
> ---
 Thanks for sending this.

 Of most interest to me right now are the key data-plane APIs. While we 
 are
 still in the prototyping phase, below is a draft of what we are 
 thinking
 for the key enqueue/perform_ops/completed_ops APIs.

 Some key differences I note in below vs your original RFC:
 * Use of void pointers rather than iova addresses. While using iova's 
 makes
   sense in the general case when using hardware, in that it can work 
 with
   both physical addresses and virtual addresses, if we change the APIs 
 to use
   void pointers instead it will still work for DPDK in VA mode, while 
 at the
   same time allow use of software fallbacks in error cases, and also a 
 stub
   driver than uses memcpy in the background. Finally, using iova's 
 makes the
   APIs a lot more awkward to use with anything but mbufs or similar 
 buffers
   where we already have a pre-computed physical address.
>>>
>>> The iova is an hint to application, and widely used in DPDK.
>>> If switch to void, how to pass the address (iova or just va ?)
>>> this may introduce implementation dependencies here.
>>>
>>> Or always pass the va, and the driver performs address translation, and 
>>> this
>>> translation may cost too much cpu I think.
>>>
>>
>> On the latter point, about driver doing address translation I would 
>> agree.
>> However, we probably need more discussion about the use of iova vs just
>> virtual addresses. My thinking on this is that if we specify the API 
>> using
>> iovas it will severely hurt usability of the API, since it forces the 
>> user
>> to take more inefficient codepaths in a large number of cases. Given a
>> pointer to the middle of an mbuf, one cannot just pass that straight as 
>> an
>> iova but must instead do a translation into offset from mbuf pointer and
>> then readd the offset to the mbuf base address.
>>
>> My preference therefore is to require the use of an IOMMU when using a
>> dmadev, so that it can be a much closer analog of memcpy. Once an iommu 
>> is
>> present, DPDK will run in VA mode, allowing virtual addresses to our
>> hugepage memory to be sent directly to hardware. Also, when using
>> dmadevs on top of an in-kernel driver, that kernel driver may do all 
>> iommu
>> management for the app, removing further the restrictions on what memory
>> can be addressed by hardware.
>
> Some DMA devices many don't support IOMMU or IOMMU bypass default, so 
> driver may
> should call rte_mem_virt2phy() do the address translate, but the 
> rte_mem_virt2phy()
> cost too many CPU cycles.
>
> If the API defined as iova, it will work fine in:
> 1) If DMA don't support IOMMU or IOMMU bypass, then start application with
>--iova-mode=pa
> 2) If DMA support IOMMU, --iova-mode=pa/va work both fine
>

 I suppose if we keep the iova as the datatype, we can just cast "void *"
 pointers to that in the case that virtual addresses can be used directly. I
 believe your RFC included a capability query API - "uses void * as iova"
 should probably be one of those capabilities, and that would resolve this.
 If DPDK is in iova=va mode because of the presence of an iommu, all drivers
 could report this capability too.

>>
 * Use of id values rather than user-provided handles. Allowing the 
 user/app
   to manage the amount of data stored per operation is a better 
 solution, I
   feel than proscribing a certain about of in-driver tracking. Some 
 apps may
   not care about anything other than a job being completed, while 
 other apps
   may have significant metadata to be tracked. Taking the user-context
   handles out of the API also makes the driver code simpler.
>>>
>>> The user-provided handle was mainly used to

[dpdk-dev] [PATCH v4 0/2] power: add support for cppc cpufreq driver

2021-06-22 Thread Richael Zhuang
v4:
rebase on Anatoly's patch: http://dpdk.org/patch/94676

Richael Zhuang (2):
  power: add support for cppc cpufreq
  test/power: round cpuinfo cur freq only when using CPPC cpufreq

 app/test/test_power.c  |   3 +-
 app/test/test_power_cpufreq.c  |  26 +-
 lib/power/meson.build  |   1 +
 lib/power/power_cppc_cpufreq.c | 632 +
 lib/power/power_cppc_cpufreq.h | 229 
 lib/power/rte_power.c  |  26 ++
 lib/power/rte_power.h  |   2 +-
 7 files changed, 907 insertions(+), 12 deletions(-)
 create mode 100644 lib/power/power_cppc_cpufreq.c
 create mode 100644 lib/power/power_cppc_cpufreq.h

-- 
2.20.1



[dpdk-dev] [PATCH v4 1/2] power: add support for cppc cpufreq

2021-06-22 Thread Richael Zhuang
Currently DPDK supports only the acpi_cpufreq and pstate_cpufreq
drivers, neither of which is available on arm64 platforms. Add
support for the cppc_cpufreq driver, which works on most arm64 platforms.

Signed-off-by: Richael Zhuang 
---
 app/test/test_power.c  |   3 +-
 app/test/test_power_cpufreq.c  |   3 +-
 lib/power/meson.build  |   1 +
 lib/power/power_cppc_cpufreq.c | 632 +
 lib/power/power_cppc_cpufreq.h | 229 
 lib/power/rte_power.c  |  26 ++
 lib/power/rte_power.h  |   2 +-
 7 files changed, 893 insertions(+), 3 deletions(-)
 create mode 100644 lib/power/power_cppc_cpufreq.c
 create mode 100644 lib/power/power_cppc_cpufreq.h

diff --git a/app/test/test_power.c b/app/test/test_power.c
index da1d67c0a..b7b556134 100644
--- a/app/test/test_power.c
+++ b/app/test/test_power.c
@@ -133,7 +133,8 @@ test_power(void)
/* Perform tests for valid environments.*/
const enum power_management_env envs[] = {PM_ENV_ACPI_CPUFREQ,
PM_ENV_KVM_VM,
-   PM_ENV_PSTATE_CPUFREQ};
+   PM_ENV_PSTATE_CPUFREQ,
+   PM_ENV_CPPC_CPUFREQ};
 
unsigned int i;
for (i = 0; i < RTE_DIM(envs); ++i) {
diff --git a/app/test/test_power_cpufreq.c b/app/test/test_power_cpufreq.c
index 0c3adc5f3..8516df4ca 100644
--- a/app/test/test_power_cpufreq.c
+++ b/app/test/test_power_cpufreq.c
@@ -496,7 +496,8 @@ test_power_cpufreq(void)
 
/* Test environment configuration */
env = rte_power_get_env();
-   if ((env != PM_ENV_ACPI_CPUFREQ) && (env != PM_ENV_PSTATE_CPUFREQ)) {
+   if ((env != PM_ENV_ACPI_CPUFREQ) && (env != PM_ENV_PSTATE_CPUFREQ) &&
+   (env != PM_ENV_CPPC_CPUFREQ)) {
printf("Unexpectedly got an environment other than ACPI/PSTATE\n");
goto fail_all;
}
diff --git a/lib/power/meson.build b/lib/power/meson.build
index 74c5f3a29..4a5b07292 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -21,6 +21,7 @@ sources = files(
 'rte_power.c',
 'rte_power_empty_poll.c',
 'rte_power_pmd_mgmt.c',
+   'power_cppc_cpufreq.c',
 )
 headers = files(
 'rte_power.h',
diff --git a/lib/power/power_cppc_cpufreq.c b/lib/power/power_cppc_cpufreq.c
new file mode 100644
index 0..fd4483e52
--- /dev/null
+++ b/lib/power/power_cppc_cpufreq.c
@@ -0,0 +1,632 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Arm Limited
+ */
+
+#include 
+#include 
+
+#include "power_cppc_cpufreq.h"
+#include "power_common.h"
+
+/* macros used for rounding frequency to nearest 100000 */
+#define FREQ_ROUNDING_DELTA 50000
+#define ROUND_FREQ_TO_N_100000 100000
+
+/* the unit of highest_perf and nominal_perf differs on different arm
+ * platforms. For highest_perf, it maybe 300 or 3000000, both means 3.0GHz.
+ */
+#define UNIT_DIFF 10000
+
+#define POWER_CONVERT_TO_DECIMAL 10
+
+#define POWER_GOVERNOR_USERSPACE "userspace"
+#define POWER_SYSFILE_SETSPEED   \
+   "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed"
+#define POWER_SYSFILE_SCALING_MAX_FREQ \
+   "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_max_freq"
+#define POWER_SYSFILE_SCALING_MIN_FREQ  \
+   "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_min_freq"
+#define POWER_SYSFILE_HIGHEST_PERF \
+   "/sys/devices/system/cpu/cpu%u/acpi_cppc/highest_perf"
+#define POWER_SYSFILE_NOMINAL_PERF \
+   "/sys/devices/system/cpu/cpu%u/acpi_cppc/nominal_perf"
+#define POWER_SYSFILE_SYS_MAX \
+   "/sys/devices/system/cpu/cpu%u/cpufreq/cpuinfo_max_freq"
+
+#define POWER_CPPC_DRIVER "cppc-cpufreq"
+#define BUS_FREQ 100000
+
+enum power_state {
+   POWER_IDLE = 0,
+   POWER_ONGOING,
+   POWER_USED,
+   POWER_UNKNOWN
+};
+
+/**
+ * Power info per lcore.
+ */
+struct cppc_power_info {
+   unsigned int lcore_id;   /**< Logical core id */
+   uint32_t state;  /**< Power in use state */
+   FILE *f; /**< FD of scaling_setspeed */
+   char governor_ori[32];   /**< Original governor name */
+   uint32_t curr_idx;   /**< Freq index in freqs array */
+   uint32_t highest_perf;   /**< system wide max freq */
+   uint32_t nominal_perf;   /**< system wide nominal freq */
+   uint16_t turbo_available;/**< Turbo Boost available */
+   uint16_t turbo_enable;   /**< Turbo Boost enable/disable */
+   uint32_t nb_freqs;   /**< number of available freqs */
+   uint32_t freqs[RTE_MAX_LCORE_FREQS]; /**< Frequency array */
+} __rte_cache_aligned;
+
+static struct cppc_power_info lcore_power_info[RTE_MAX_LCORE];
+
+/**
+ * It is to set specific freq for specific logical core, according to the index
+ * of supported frequencies.
+ 

[dpdk-dev] [PATCH v4 2/2] test/power: round cpuinfo cur freq only when using CPPC cpufreq

2021-06-22 Thread Richael Zhuang
On arm platforms, the value in "/sys/.../cpuinfo_cur_freq" may not
be exactly the same as what was set when using the CPPC cpufreq driver.
For other cpufreq drivers, there is currently no need to round it;
otherwise this check will fail with turbo enabled. For example, with
acpi_cpufreq, cpuinfo_cur_freq can be 2401000, which is equal to
freqs[0]. It should not be rounded to 2400000.

Fixes: 606a234c6d360 ("test/power: round CPU frequency to check")
Cc: richael.zhu...@arm.com
Cc: sta...@dpdk.org

Signed-off-by: Richael Zhuang 
---
 app/test/test_power_cpufreq.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/app/test/test_power_cpufreq.c b/app/test/test_power_cpufreq.c
index 8516df4ca..b8fc53925 100644
--- a/app/test/test_power_cpufreq.c
+++ b/app/test/test_power_cpufreq.c
@@ -55,7 +55,9 @@ check_cur_freq(unsigned int lcore_id, uint32_t idx, bool 
turbo)
FILE *f;
char fullpath[PATH_MAX];
char buf[BUFSIZ];
+   enum power_management_env env;
uint32_t cur_freq;
+   uint32_t freq_conv;
int ret = -1;
int i;
 
@@ -80,15 +82,18 @@ check_cur_freq(unsigned int lcore_id, uint32_t idx, bool 
turbo)
goto fail_all;
 
cur_freq = strtoul(buf, NULL, TEST_POWER_CONVERT_TO_DECIMAL);
-
-   /* convert the frequency to nearest 100000 value
-* Ex: if cur_freq=1396789 then freq_conv=1400000
-* Ex: if cur_freq=800030 then freq_conv=800000
-*/
-   unsigned int freq_conv = 0;
-   freq_conv = (cur_freq + TEST_FREQ_ROUNDING_DELTA)
-   / TEST_ROUND_FREQ_TO_N_100000;
-   freq_conv = freq_conv * TEST_ROUND_FREQ_TO_N_100000;
+   freq_conv = cur_freq;
+
+   env = rte_power_get_env();
+   if (env == PM_ENV_CPPC_CPUFREQ) {
+   /* convert the frequency to nearest 100000 value
+* Ex: if cur_freq=1396789 then freq_conv=1400000
+* Ex: if cur_freq=800030 then freq_conv=800000
+*/
+   freq_conv = (cur_freq + TEST_FREQ_ROUNDING_DELTA)
+   / TEST_ROUND_FREQ_TO_N_100000;
+   freq_conv = freq_conv * TEST_ROUND_FREQ_TO_N_100000;
+   }
 
if (turbo)
ret = (freqs[idx] <= freq_conv ? 0 : -1);
-- 
2.20.1



[dpdk-dev] [PATCH v4 00/62] Marvell CNXK Ethdev Driver

2021-06-22 Thread Nithin Dabilpuram
This patchset adds support for the Marvell CN106XX SoC based on the
'common/cnxk' driver. In future, CN9K (a.k.a. octeontx2) will also be
supported by the same driver when the code is ready, and 'net/octeontx2'
will be deprecated.

Harman Kalra (1):
  common/cnxk: allocate lmt region in userspace

Jerin Jacob (7):
  common/cnxk: fix batch alloc completion poll logic
  net/cnxk: add Rx burst for cn9k
  net/cnxk: add Rx vector version for cn9k
  net/cnxk: add Tx burst for cn9k
  net/cnxk: add Rx burst for cn10k
  net/cnxk: add Rx vector version for cn10k
  net/cnxk: add Tx burst for cn10k

Kiran Kumar K (2):
  net/cnxk: add support to configure npc
  net/cnxk: support initial version of rte flow

Nithin Dabilpuram (18):
  common/cnxk: change model API to not use camel case
  net/cnxk: add build infra and common probe
  net/cnxk: add platform specific probe and remove
  net/cnxk: add common devargs parsing function
  net/cnxk: support common dev infos get
  net/cnxk: add device configuration operation
  net/cnxk: support link status update
  net/cnxk: add Rx queue setup and release
  net/cnxk: add Tx queue setup and release
  net/cnxk: support packet type
  net/cnxk: support queue start and stop
  net/cnxk: add Rx multi-segmented version for cn9k
  net/cnxk: add Tx multi-segment version for cn9k
  net/cnxk: add Tx vector version for cn9k
  net/cnxk: add Rx multi-segment version for cn10k
  net/cnxk: add Tx multi-segment version for cn10k
  net/cnxk: add Tx vector version for cn10k
  net/cnxk: add device start and stop operations

Satha Rao (8):
  common/cnxk: add support to lock NIX RQ contexts
  common/cnxk: add provision to enable RED on RQ
  net/cnxk: add port/queue stats
  net/cnxk: add xstats apis
  net/cnxk: add rxq/txq info get operations
  net/cnxk: add ethdev firmware version get
  net/cnxk: add get register operation
  net/cnxk: added RETA and RSS hash operations

Satheesh Paul (6):
  common/cnxk: add support to dump flow entries
  common/cnxk: support for mark and flag flow actions
  common/cnxk: support for VLAN push and pop flow actions
  net/cnxk: add flow ops get operation
  net/cnxk: support for RSS in rte flow
  net/cnxk: support marking and VLAN tagging

Sunil Kumar Kori (20):
  net/cnxk: add MAC address set ops
  net/cnxk: add MTU set device operation
  net/cnxk: add promiscuous mode enable and disable
  net/cnxk: support DMAC filter
  net/cnxk: add all multicast enable/disable ethops
  net/cnxk: add Rx/Tx burst mode get ops
  net/cnxk: add flow ctrl set/get ops
  net/cnxk: add link up/down operations
  net/cnxk: add EEPROM module info get operations
  net/cnxk: add Rx queue interrupt enable/disable ops
  net/cnxk: add validation API for mempool ops
  net/cnxk: add device close and reset operations
  net/cnxk: add pending Tx mbuf cleanup operation
  net/cnxk: register callback to get PTP status
  net/cnxk: support base PTP timesync
  net/cnxk: add timesync enable/disable operations
  net/cnxk: add Rx/Tx timestamp read operations
  net/cnxk: add time read/write/adjust operations
  net/cnxk: add read clock operation
  net/cnxk: support multicast filter

--

v4:
- Fixed build issue with gcc 4.8
- Shortened subject lines of few commits
- Removed camel case for model API
- Updated rte_flow features in cnxk_vec.ini and cnxk_vf.ini
- Added CC stable to "fix batch alloc.." patch
- Squashed cn98xx flow create related common patch to
  VLAN push and pop flow actions patch.
- Changed INTERNAL to DPDK_21 in version.map

v3:
- Updated release notes
- Removed RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS flag and add support for queue
  stats in xstats
- Fixed issue with LSO format indices
- Removed mbox sync changes patch from this series
- Fixed documentation issues
- Removed repetitive code in fast path SIMD
- Optimize cn10k LMTST logic
- Make rte_flow_create implementation specific
  to handle VLAN Stripping and MARK actions/offloads
- Use rte_atomic_thread_fence() instead of rte_rmb()
- Handle other comments from Jerin.
- Merged rte flow dump API patch to flow ops get patch
- Added marking and vlan tagging support.
- Fixed some checkpatch and git check log issues.

v2:
- Fixed issue with flow validate and flow create for 98xx
- Fixed issue batch alloc logic
- Fix lmtline allocation to be cached
- Sync Inline IPSec Rx mbox with kernel
- Add support for mark and flag flow actions
- Add reta key and hash update ops
- Added PTP and multicast filter support
 
 MAINTAINERS |5 +-
 doc/guides/nics/cnxk.rst|  232 +
 doc/guides/nics/features/cnxk.ini   |   90 ++
 doc/guides/nics/features/cnxk_vec.ini   |   86 ++
 doc/guides/nics/features/cnxk_vf.ini|   82 ++
 doc/guides/nics/index.rst   |1 +
 doc/guides/platform/cnxk.rst|3 +
 doc/guides/rel_notes/release_21_08.rst  |5 +
 drivers/common/cnxk/hw/npc.h|2 +
 drivers/common/cnxk/meson.build |1 +
 drivers/common/cnxk/roc_api.h   |2 +
 drivers/common/cnxk/roc_dev.

[dpdk-dev] [PATCH v4 01/62] common/cnxk: add support to lock NIX RQ contexts

2021-06-22 Thread Nithin Dabilpuram
From: Satha Rao 

This patch adds a device argument to lock the RSS table
in NIX.

It also includes a few misc fixes, such as disabling the NIX Tx
VLAN insertion config in SMQ, enabling SSO in the NIX Tx SQ
for Tx completions, and a TM-related stats API.

Signed-off-by: Satha Rao 
---
 drivers/common/cnxk/roc_nix.h  | 31 ++--
 drivers/common/cnxk/roc_nix_queue.c|  2 +
 drivers/common/cnxk/roc_nix_rss.c  | 51 ++--
 drivers/common/cnxk/roc_nix_tm_utils.c | 86 +-
 drivers/common/cnxk/roc_platform.h |  2 +
 drivers/common/cnxk/version.map|  1 +
 6 files changed, 163 insertions(+), 10 deletions(-)

diff --git a/drivers/common/cnxk/roc_nix.h b/drivers/common/cnxk/roc_nix.h
index b39f461..6d9ac10 100644
--- a/drivers/common/cnxk/roc_nix.h
+++ b/drivers/common/cnxk/roc_nix.h
@@ -85,10 +85,11 @@ struct roc_nix_eeprom_info {
 #define ROC_NIX_LF_RX_CFG_LEN_OL3 BIT_ULL(41)
 
 /* Group 0 will be used for RSS, 1 -7 will be used for npc_flow RSS action*/
-#define ROC_NIX_RSS_GROUP_DEFAULT 0
-#define ROC_NIX_RSS_GRPS 8
-#define ROC_NIX_RSS_RETA_MAX ROC_NIX_RSS_RETA_SZ_256
-#define ROC_NIX_RSS_KEY_LEN  48 /* 352 Bits */
+#define ROC_NIX_RSS_GROUP_DEFAULT    0
+#define ROC_NIX_RSS_GRPS             8
+#define ROC_NIX_RSS_RETA_MAX         ROC_NIX_RSS_RETA_SZ_256
+#define ROC_NIX_RSS_KEY_LEN 48 /* 352 Bits */
+#define ROC_NIX_RSS_MCAM_IDX_DEFAULT (-1)
 
 #define ROC_NIX_DEFAULT_HW_FRS 1514
 
@@ -184,6 +185,7 @@ struct roc_nix_sq {
enum roc_nix_sq_max_sqe_sz max_sqe_sz;
uint32_t nb_desc;
uint16_t qid;
+   bool sso_ena;
/* End of Input parameters */
uint16_t sqes_per_sqb_log2;
struct roc_nix *roc_nix;
@@ -241,6 +243,8 @@ struct roc_nix {
uint16_t max_sqb_count;
enum roc_nix_rss_reta_sz reta_sz;
bool enable_loop;
+   bool hw_vlan_ins;
+   uint8_t lock_rx_ctx;
/* End of input parameters */
/* LMT line base for "Per Core Tx LMT line" mode*/
uintptr_t lmt_base;
@@ -371,6 +375,22 @@ struct roc_nix_tm_shaper_profile {
void (*free_fn)(void *profile);
 };
 
+enum roc_nix_tm_node_stats_type {
+   ROC_NIX_TM_NODE_PKTS_DROPPED,
+   ROC_NIX_TM_NODE_BYTES_DROPPED,
+   ROC_NIX_TM_NODE_GREEN_PKTS,
+   ROC_NIX_TM_NODE_GREEN_BYTES,
+   ROC_NIX_TM_NODE_YELLOW_PKTS,
+   ROC_NIX_TM_NODE_YELLOW_BYTES,
+   ROC_NIX_TM_NODE_RED_PKTS,
+   ROC_NIX_TM_NODE_RED_BYTES,
+   ROC_NIX_TM_NODE_STATS_MAX,
+};
+
+struct roc_nix_tm_node_stats {
+   uint64_t stats[ROC_NIX_TM_NODE_STATS_MAX];
+};
+
 int __roc_api roc_nix_tm_node_add(struct roc_nix *roc_nix,
  struct roc_nix_tm_node *roc_node);
 int __roc_api roc_nix_tm_node_delete(struct roc_nix *roc_nix, uint32_t node_id,
@@ -408,6 +428,9 @@ roc_nix_tm_shaper_profile_get(struct roc_nix *roc_nix, 
uint32_t profile_id);
 struct roc_nix_tm_shaper_profile *__roc_api roc_nix_tm_shaper_profile_next(
struct roc_nix *roc_nix, struct roc_nix_tm_shaper_profile *__prev);
 
+int __roc_api roc_nix_tm_node_stats_get(struct roc_nix *roc_nix,
+   uint32_t node_id, bool clear,
+   struct roc_nix_tm_node_stats *stats);
 /*
  * TM ratelimit tree API.
  */
diff --git a/drivers/common/cnxk/roc_nix_queue.c 
b/drivers/common/cnxk/roc_nix_queue.c
index fbf7efa..1c62aa2 100644
--- a/drivers/common/cnxk/roc_nix_queue.c
+++ b/drivers/common/cnxk/roc_nix_queue.c
@@ -582,6 +582,7 @@ sq_cn9k_init(struct nix *nix, struct roc_nix_sq *sq, 
uint32_t rr_quantum,
aq->sq.default_chan = nix->tx_chan_base;
aq->sq.sqe_stype = NIX_STYPE_STF;
aq->sq.ena = 1;
+   aq->sq.sso_ena = !!sq->sso_ena;
if (aq->sq.max_sqe_size == NIX_MAXSQESZ_W8)
aq->sq.sqe_stype = NIX_STYPE_STP;
aq->sq.sqb_aura = roc_npa_aura_handle_to_aura(sq->aura_handle);
@@ -679,6 +680,7 @@ sq_init(struct nix *nix, struct roc_nix_sq *sq, uint32_t 
rr_quantum,
aq->sq.default_chan = nix->tx_chan_base;
aq->sq.sqe_stype = NIX_STYPE_STF;
aq->sq.ena = 1;
+   aq->sq.sso_ena = !!sq->sso_ena;
if (aq->sq.max_sqe_size == NIX_MAXSQESZ_W8)
aq->sq.sqe_stype = NIX_STYPE_STP;
aq->sq.sqb_aura = roc_npa_aura_handle_to_aura(sq->aura_handle);
diff --git a/drivers/common/cnxk/roc_nix_rss.c 
b/drivers/common/cnxk/roc_nix_rss.c
index 2d7b84a..7de69aa 100644
--- a/drivers/common/cnxk/roc_nix_rss.c
+++ b/drivers/common/cnxk/roc_nix_rss.c
@@ -52,7 +52,7 @@ roc_nix_rss_key_get(struct roc_nix *roc_nix, uint8_t 
key[ROC_NIX_RSS_KEY_LEN])
 
 static int
 nix_cn9k_rss_reta_set(struct nix *nix, uint8_t group,
- uint16_t reta[ROC_NIX_RSS_RETA_MAX])
+ uint16_t reta[ROC_NIX_RSS_RETA_MAX], uint8_t lock_rx_ctx)
 {
struct mbox *mbox = (&nix->dev)->mbox;
struct nix_aq_enq_req *req;
@@ -77,6 +77,27 @@

[dpdk-dev] [PATCH v4 02/62] common/cnxk: fix batch alloc completion poll logic

2021-06-22 Thread Nithin Dabilpuram
From: Jerin Jacob 

The instruction generation was not correct due to the
fact that volatile should have been used with the ccode
variable as well.

Change the logic to use a GCC atomic builtin, which
simplifies the code and avoids an explicit volatile.

Fixes: 81af26789316 ("common/cnxk: support NPA batch alloc/free")
Cc: sta...@dpdk.org

Signed-off-by: Jerin Jacob 
Signed-off-by: Ashwin Sekhar T K 
---
 drivers/common/cnxk/roc_npa.c |  2 +-
 drivers/common/cnxk/roc_npa.h | 30 +++---
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/common/cnxk/roc_npa.c b/drivers/common/cnxk/roc_npa.c
index f1e03b7..5ba6e81 100644
--- a/drivers/common/cnxk/roc_npa.c
+++ b/drivers/common/cnxk/roc_npa.c
@@ -236,7 +236,7 @@ npa_aura_pool_pair_alloc(struct npa_lf *lf, const uint32_t 
block_size,
 
/* Block size should be cache line aligned and in range of 128B-128KB */
if (block_size % ROC_ALIGN || block_size < 128 ||
-   block_size > 128 * 1024)
+   block_size > ROC_NPA_MAX_BLOCK_SZ)
return NPA_ERR_INVALID_BLOCK_SZ;
 
pos = 0;
diff --git a/drivers/common/cnxk/roc_npa.h b/drivers/common/cnxk/roc_npa.h
index 89f5c6f..59d6223 100644
--- a/drivers/common/cnxk/roc_npa.h
+++ b/drivers/common/cnxk/roc_npa.h
@@ -8,6 +8,7 @@
 #define ROC_AURA_ID_MASK   (BIT_ULL(16) - 1)
 #define ROC_AURA_OP_LIMIT_MASK (BIT_ULL(36) - 1)
 
+#define ROC_NPA_MAX_BLOCK_SZ  (128 * 1024)
 #define ROC_CN10K_NPA_BATCH_ALLOC_MAX_PTRS 512
 #define ROC_CN10K_NPA_BATCH_FREE_MAX_PTRS  15
 
@@ -219,6 +220,17 @@ roc_npa_aura_batch_alloc_issue(uint64_t aura_handle, 
uint64_t *buf,
return 0;
 }
 
+static inline void
+roc_npa_batch_alloc_wait(uint64_t *cache_line)
+{
+   /* Batch alloc status code is updated in bits [5:6] of the first word
+* of the 128 byte cache line.
+*/
+   while (((__atomic_load_n(cache_line, __ATOMIC_RELAXED) >> 5) & 0x3) ==
+  ALLOC_CCODE_INVAL)
+   ;
+}
+
 static inline unsigned int
 roc_npa_aura_batch_alloc_count(uint64_t *aligned_buf, unsigned int num)
 {
@@ -231,17 +243,10 @@ roc_npa_aura_batch_alloc_count(uint64_t *aligned_buf, 
unsigned int num)
/* Check each ROC cache line one by one */
for (i = 0; i < num; i += (ROC_ALIGN >> 3)) {
struct npa_batch_alloc_status_s *status;
-   int ccode;
 
status = (struct npa_batch_alloc_status_s *)&aligned_buf[i];
 
-   /* Status is updated in first 7 bits of each 128 byte cache
-* line. Wait until the status gets updated.
-*/
-   do {
-   ccode = (volatile int)status->ccode;
-   } while (ccode == ALLOC_CCODE_INVAL);
-
+   roc_npa_batch_alloc_wait(&aligned_buf[i]);
count += status->count;
}
 
@@ -261,16 +266,11 @@ roc_npa_aura_batch_alloc_extract(uint64_t *buf, uint64_t 
*aligned_buf,
/* Check each ROC cache line one by one */
for (i = 0; i < num; i += (ROC_ALIGN >> 3)) {
struct npa_batch_alloc_status_s *status;
-   int line_count, ccode;
+   int line_count;
 
status = (struct npa_batch_alloc_status_s *)&aligned_buf[i];
 
-   /* Status is updated in first 7 bits of each 128 byte cache
-* line. Wait until the status gets updated.
-*/
-   do {
-   ccode = (volatile int)status->ccode;
-   } while (ccode == ALLOC_CCODE_INVAL);
+   roc_npa_batch_alloc_wait(&aligned_buf[i]);
 
line_count = status->count;
 
-- 
2.8.4



[dpdk-dev] [PATCH v4 03/62] common/cnxk: add support to dump flow entries

2021-06-22 Thread Nithin Dabilpuram
From: Satheesh Paul 

Add an NPC API to dump created flow entries.

Signed-off-by: Satheesh Paul 
---
 drivers/common/cnxk/hw/npc.h|   2 +
 drivers/common/cnxk/meson.build |   1 +
 drivers/common/cnxk/roc_npc.c   |  20 ++
 drivers/common/cnxk/roc_npc.h   |  12 +-
 drivers/common/cnxk/roc_npc_mcam_dump.c | 611 
 drivers/common/cnxk/roc_npc_priv.h  |   2 +-
 drivers/common/cnxk/roc_npc_utils.c |   4 +
 drivers/common/cnxk/version.map |   2 +
 8 files changed, 652 insertions(+), 2 deletions(-)
 create mode 100644 drivers/common/cnxk/roc_npc_mcam_dump.c

diff --git a/drivers/common/cnxk/hw/npc.h b/drivers/common/cnxk/hw/npc.h
index e0f06bf..68c5037 100644
--- a/drivers/common/cnxk/hw/npc.h
+++ b/drivers/common/cnxk/hw/npc.h
@@ -193,6 +193,7 @@ enum npc_kpu_lb_ltype {
NPC_LT_LB_EXDSA,
NPC_LT_LB_EXDSA_VLAN,
NPC_LT_LB_FDSA,
+   NPC_LT_LB_VLAN_EXDSA,
NPC_LT_LB_CUSTOM0 = 0xE,
NPC_LT_LB_CUSTOM1 = 0xF,
 };
@@ -208,6 +209,7 @@ enum npc_kpu_lc_ltype {
NPC_LT_LC_MPLS,
NPC_LT_LC_NSH,
NPC_LT_LC_FCOE,
+   NPC_LT_LC_NGIO,
NPC_LT_LC_CUSTOM0 = 0xE,
NPC_LT_LC_CUSTOM1 = 0xF,
 };
diff --git a/drivers/common/cnxk/meson.build b/drivers/common/cnxk/meson.build
index 178bce7..e7ab79f 100644
--- a/drivers/common/cnxk/meson.build
+++ b/drivers/common/cnxk/meson.build
@@ -37,6 +37,7 @@ sources = files(
 'roc_npa_irq.c',
 'roc_npc.c',
 'roc_npc_mcam.c',
+'roc_npc_mcam_dump.c',
 'roc_npc_parse.c',
 'roc_npc_utils.c',
 'roc_platform.c',
diff --git a/drivers/common/cnxk/roc_npc.c b/drivers/common/cnxk/roc_npc.c
index abaef77..81c7fd9 100644
--- a/drivers/common/cnxk/roc_npc.c
+++ b/drivers/common/cnxk/roc_npc.c
@@ -870,3 +870,23 @@ roc_npc_flow_destroy(struct roc_npc *roc_npc, struct 
roc_npc_flow *flow)
plt_free(flow);
return 0;
 }
+
+void
+roc_npc_flow_dump(FILE *file, struct roc_npc *roc_npc)
+{
+   struct npc *npc = roc_npc_to_npc_priv(roc_npc);
+   struct roc_npc_flow *flow_iter;
+   struct npc_flow_list *list;
+   uint32_t max_prio, i;
+
+   max_prio = npc->flow_max_priority;
+
+   for (i = 0; i < max_prio; i++) {
+   list = &npc->flow_list[i];
+
+   /* List in ascending order of mcam entries */
+   TAILQ_FOREACH(flow_iter, list, next) {
+   roc_npc_flow_mcam_dump(file, roc_npc, flow_iter);
+   }
+   }
+}
diff --git a/drivers/common/cnxk/roc_npc.h b/drivers/common/cnxk/roc_npc.h
index 223c4ba..115bcd5 100644
--- a/drivers/common/cnxk/roc_npc.h
+++ b/drivers/common/cnxk/roc_npc.h
@@ -90,6 +90,11 @@ struct roc_npc_attr {
uint32_t reserved : 30; /**< Reserved, must be zero. */
 };
 
+struct roc_npc_flow_dump_data {
+   uint8_t lid;
+   uint16_t ltype;
+};
+
 struct roc_npc_flow {
uint8_t nix_intf;
uint8_t enable;
@@ -102,6 +107,9 @@ struct roc_npc_flow {
uint64_t mcam_mask[ROC_NPC_MAX_MCAM_WIDTH_DWORDS];
uint64_t npc_action;
uint64_t vtag_action;
+#define ROC_NPC_MAX_FLOW_PATTERNS 32
+   struct roc_npc_flow_dump_data dump_data[ROC_NPC_MAX_FLOW_PATTERNS];
+   uint16_t num_patterns;
 
TAILQ_ENTRY(roc_npc_flow) next;
 };
@@ -185,5 +193,7 @@ int __roc_api roc_npc_mcam_clear_counter(struct roc_npc 
*roc_npc,
 uint32_t ctr_id);
 
 int __roc_api roc_npc_mcam_free_all_resources(struct roc_npc *roc_npc);
-
+void __roc_api roc_npc_flow_dump(FILE *file, struct roc_npc *roc_npc);
+void __roc_api roc_npc_flow_mcam_dump(FILE *file, struct roc_npc *roc_npc,
+ struct roc_npc_flow *mcam);
 #endif /* _ROC_NPC_H_ */
diff --git a/drivers/common/cnxk/roc_npc_mcam_dump.c 
b/drivers/common/cnxk/roc_npc_mcam_dump.c
new file mode 100644
index 000..19b4901
--- /dev/null
+++ b/drivers/common/cnxk/roc_npc_mcam_dump.c
@@ -0,0 +1,611 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#include "roc_api.h"
+#include "roc_priv.h"
+
+#define NPC_MAX_FIELD_NAME_SIZE   80
+#define NPC_RX_ACTIONOP_MASK  GENMASK(3, 0)
+#define NPC_RX_ACTION_PFFUNC_MASK  GENMASK(19, 4)
+#define NPC_RX_ACTION_INDEX_MASK   GENMASK(39, 20)
+#define NPC_RX_ACTION_MATCH_MASK   GENMASK(55, 40)
+#define NPC_RX_ACTION_FLOWKEY_MASK GENMASK(60, 56)
+
+#define NPC_TX_ACTION_INDEX_MASK GENMASK(31, 12)
+#define NPC_TX_ACTION_MATCH_MASK GENMASK(47, 32)
+
+#define NIX_RX_VTAGACT_VTAG0_RELPTR_MASK GENMASK(7, 0)
+#define NIX_RX_VTAGACT_VTAG0_LID_MASK   GENMASK(10, 8)
+#define NIX_RX_VTAGACT_VTAG0_TYPE_MASK  GENMASK(14, 12)
+#define NIX_RX_VTAGACT_VTAG0_VALID_MASK BIT_ULL(15)
+
+#define NIX_RX_VTAGACT_VTAG1_RELPTR_MASK GENMASK(39, 32)
+#define NIX_RX_VTAGACT_VTAG1_LID_MASK   GENMASK(42, 40)
+#define NIX_RX_VTAGACT_VTAG1_TYPE_MASK  GENMASK(46, 44)
+#define NIX_RX_VTAGACT

[dpdk-dev] [PATCH v4 04/62] common/cnxk: support for mark and flag flow actions

2021-06-22 Thread Nithin Dabilpuram
From: Satheesh Paul 

Add a ROC API to get mark actions.

Signed-off-by: Satheesh Paul 
---
 drivers/common/cnxk/roc_npc.c   | 17 +
 drivers/common/cnxk/roc_npc.h   |  3 +++
 drivers/common/cnxk/version.map |  2 ++
 3 files changed, 22 insertions(+)

diff --git a/drivers/common/cnxk/roc_npc.c b/drivers/common/cnxk/roc_npc.c
index 81c7fd9..e6a5036 100644
--- a/drivers/common/cnxk/roc_npc.c
+++ b/drivers/common/cnxk/roc_npc.c
@@ -757,6 +757,23 @@ npc_rss_action_program(struct roc_npc *roc_npc,
return 0;
 }
 
+int
+roc_npc_mark_actions_get(struct roc_npc *roc_npc)
+{
+   struct npc *npc = roc_npc_to_npc_priv(roc_npc);
+
+   return npc->mark_actions;
+}
+
+int
+roc_npc_mark_actions_sub_return(struct roc_npc *roc_npc, uint32_t count)
+{
+   struct npc *npc = roc_npc_to_npc_priv(roc_npc);
+
+   npc->mark_actions -= count;
+   return npc->mark_actions;
+}
+
 struct roc_npc_flow *
 roc_npc_flow_create(struct roc_npc *roc_npc, const struct roc_npc_attr *attr,
const struct roc_npc_item_info pattern[],
diff --git a/drivers/common/cnxk/roc_npc.h b/drivers/common/cnxk/roc_npc.h
index 115bcd5..cf6f732 100644
--- a/drivers/common/cnxk/roc_npc.h
+++ b/drivers/common/cnxk/roc_npc.h
@@ -196,4 +196,7 @@ int __roc_api roc_npc_mcam_free_all_resources(struct 
roc_npc *roc_npc);
 void __roc_api roc_npc_flow_dump(FILE *file, struct roc_npc *roc_npc);
 void __roc_api roc_npc_flow_mcam_dump(FILE *file, struct roc_npc *roc_npc,
  struct roc_npc_flow *mcam);
+int __roc_api roc_npc_mark_actions_get(struct roc_npc *roc_npc);
+int __roc_api roc_npc_mark_actions_sub_return(struct roc_npc *roc_npc,
+ uint32_t count);
 #endif /* _ROC_NPC_H_ */
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index a11ba4d..554459b 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -165,6 +165,8 @@ INTERNAL {
roc_npc_flow_parse;
roc_npc_get_low_priority_mcam;
roc_npc_init;
+   roc_npc_mark_actions_get;
+   roc_npc_mark_actions_sub_return;
roc_npc_mcam_alloc_entries;
roc_npc_mcam_alloc_entry;
roc_npc_mcam_clear_counter;
-- 
2.8.4



[dpdk-dev] [PATCH v4 05/62] common/cnxk: allocate lmt region in userspace

2021-06-22 Thread Nithin Dabilpuram
From: Harman Kalra 

As per the new LMTST design, userspace shall allocate the LMT region,
set up the DMA translation and share the IOVA with the kernel via MBOX.
The kernel will convert this IOVA to physical memory and update the
LMT table entry with it.
With this new design, shared mode (i.e. all PCI functions sharing
the LMT region allocated by the primary/base PCI function) remains intact.

Signed-off-by: Harman Kalra 
---
 drivers/common/cnxk/roc_api.h  |  2 +
 drivers/common/cnxk/roc_dev.c  | 98 ++
 drivers/common/cnxk/roc_dev_priv.h |  1 +
 drivers/common/cnxk/roc_mbox.h |  3 ++
 drivers/common/cnxk/roc_platform.h | 11 +
 5 files changed, 63 insertions(+), 52 deletions(-)

diff --git a/drivers/common/cnxk/roc_api.h b/drivers/common/cnxk/roc_api.h
index 67f5d13..32e383c 100644
--- a/drivers/common/cnxk/roc_api.h
+++ b/drivers/common/cnxk/roc_api.h
@@ -24,6 +24,8 @@
 /* Platform definition */
 #include "roc_platform.h"
 
+#define ROC_LMT_LINE_SZ 128
+#define ROC_NUM_LMT_LINES  2048
 #define ROC_LMT_LINES_PER_CORE_LOG2 5
 #define ROC_LMT_LINE_SIZE_LOG2 7
 #define ROC_LMT_BASE_PER_CORE_LOG2 
\
diff --git a/drivers/common/cnxk/roc_dev.c b/drivers/common/cnxk/roc_dev.c
index a39acc9..adff779 100644
--- a/drivers/common/cnxk/roc_dev.c
+++ b/drivers/common/cnxk/roc_dev.c
@@ -915,43 +915,30 @@ dev_vf_mbase_put(struct plt_pci_device *pci_dev, 
uintptr_t vf_mbase)
mbox_mem_unmap((void *)vf_mbase, MBOX_SIZE * pci_dev->max_vfs);
 }
 
-static uint16_t
-dev_pf_total_vfs(struct plt_pci_device *pci_dev)
-{
-   uint16_t total_vfs = 0;
-   int sriov_pos, rc;
-
-   sriov_pos =
-   plt_pci_find_ext_capability(pci_dev, ROC_PCI_EXT_CAP_ID_SRIOV);
-   if (sriov_pos <= 0) {
-   plt_warn("Unable to find SRIOV cap, rc=%d", sriov_pos);
-   return 0;
-   }
-
-   rc = plt_pci_read_config(pci_dev, &total_vfs, 2,
-sriov_pos + ROC_PCI_SRIOV_TOTAL_VF);
-   if (rc < 0) {
-   plt_warn("Unable to read SRIOV cap, rc=%d", rc);
-   return 0;
-   }
-
-   return total_vfs;
-}
-
 static int
-dev_setup_shared_lmt_region(struct mbox *mbox)
+dev_setup_shared_lmt_region(struct mbox *mbox, bool valid_iova, uint64_t iova)
 {
struct lmtst_tbl_setup_req *req;
 
req = mbox_alloc_msg_lmtst_tbl_setup(mbox);
-   req->pcifunc = idev_lmt_pffunc_get();
+   /* This pcifunc is defined with primary pcifunc whose LMT address
+* will be shared. If call contains valid IOVA, following pcifunc
+* field is of no use.
+*/
+   req->pcifunc = valid_iova ? 0 : idev_lmt_pffunc_get();
+   req->use_local_lmt_region = valid_iova;
+   req->lmt_iova = iova;
 
return mbox_process(mbox);
 }
 
+/* Total no of lines * size of each lmtline */
+#define LMT_REGION_SIZE (ROC_NUM_LMT_LINES * ROC_LMT_LINE_SZ)
 static int
-dev_lmt_setup(struct plt_pci_device *pci_dev, struct dev *dev)
+dev_lmt_setup(struct dev *dev)
 {
+   char name[PLT_MEMZONE_NAMESIZE];
+   const struct plt_memzone *mz;
struct idev_cfg *idev;
int rc;
 
@@ -965,8 +952,11 @@ dev_lmt_setup(struct plt_pci_device *pci_dev, struct dev 
*dev)
/* Set common lmt region from second pf_func onwards. */
if (!dev->disable_shared_lmt && idev_lmt_pffunc_get() &&
dev->pf_func != idev_lmt_pffunc_get()) {
-   rc = dev_setup_shared_lmt_region(dev->mbox);
+   rc = dev_setup_shared_lmt_region(dev->mbox, false, 0);
if (!rc) {
+   /* On success, updating lmt base of secondary pf_funcs
+* with primary pf_func's lmt base.
+*/
dev->lmt_base = roc_idev_lmt_base_addr_get();
return rc;
}
@@ -975,34 +965,30 @@ dev_lmt_setup(struct plt_pci_device *pci_dev, struct dev 
*dev)
dev->pf_func, rc);
}
 
-   if (dev_is_vf(dev)) {
-   /* VF BAR4 should always be sufficient enough to
-* hold LMT lines.
-*/
-   if (pci_dev->mem_resource[4].len <
-   (RVU_LMT_LINE_MAX * RVU_LMT_SZ)) {
-   plt_err("Not enough bar4 space for lmt lines");
-   return -EFAULT;
-   }
+   /* Allocating memory for LMT region */
+   sprintf(name, "LMT_MAP%x", dev->pf_func);
 
-   dev->lmt_base = dev->bar4;
-   } else {
-   uint64_t bar4_mbox_sz = MBOX_SIZE;
-
-   /* PF BAR4 should always be sufficient enough to
-* hold PF-AF MBOX + PF-VF MBOX + LMT lines.
-*/
-   if (pci_dev->mem_resource[4].len <
-   (bar4_mbox_sz + (RVU_LMT_LINE_MAX * RVU_LMT_SZ))) {
-   plt_err("Not enough b

[dpdk-dev] [PATCH v4 06/62] common/cnxk: add provision to enable RED on RQ

2021-06-22 Thread Nithin Dabilpuram
From: Satha Rao 

Send RED pass/drop levels, based on the RQ configuration, to the kernel.
Also fix the aura and pool shift value calculation.

Signed-off-by: Satha Rao 
---
 drivers/common/cnxk/roc_nix.h   |  8 ++
 drivers/common/cnxk/roc_nix_queue.c | 50 +
 drivers/common/cnxk/roc_npa.c   |  8 --
 drivers/common/cnxk/roc_npa.h   |  5 
 4 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/drivers/common/cnxk/roc_nix.h b/drivers/common/cnxk/roc_nix.h
index 6d9ac10..bb69027 100644
--- a/drivers/common/cnxk/roc_nix.h
+++ b/drivers/common/cnxk/roc_nix.h
@@ -161,6 +161,14 @@ struct roc_nix_rq {
uint32_t vwqe_max_sz_exp;
uint64_t vwqe_wait_tmo;
uint64_t vwqe_aura_handle;
+   /* Average LPB aura level drop threshold for RED */
+   uint8_t red_drop;
+   /* Average LPB aura level pass threshold for RED */
+   uint8_t red_pass;
+   /* Average SPB aura level drop threshold for RED */
+   uint8_t spb_red_drop;
+   /* Average SPB aura level pass threshold for RED */
+   uint8_t spb_red_pass;
/* End of Input parameters */
struct roc_nix *roc_nix;
 };
diff --git a/drivers/common/cnxk/roc_nix_queue.c 
b/drivers/common/cnxk/roc_nix_queue.c
index 1c62aa2..0604e7a 100644
--- a/drivers/common/cnxk/roc_nix_queue.c
+++ b/drivers/common/cnxk/roc_nix_queue.c
@@ -119,6 +119,15 @@ rq_cn9k_cfg(struct nix *nix, struct roc_nix_rq *rq, bool 
cfg, bool ena)
aq->rq.qint_idx = rq->qid % nix->qints;
aq->rq.xqe_drop_ena = 1;
 
+   /* If RED enabled, then fill enable for all cases */
+   if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+   aq->rq.spb_aura_pass = rq->spb_red_pass;
+   aq->rq.lpb_aura_pass = rq->red_pass;
+
+   aq->rq.spb_aura_drop = rq->spb_red_drop;
+   aq->rq.lpb_aura_drop = rq->red_drop;
+   }
+
if (cfg) {
if (rq->sso_ena) {
/* SSO mode */
@@ -155,6 +164,14 @@ rq_cn9k_cfg(struct nix *nix, struct roc_nix_rq *rq, bool 
cfg, bool ena)
aq->rq_mask.rq_int_ena = ~aq->rq_mask.rq_int_ena;
aq->rq_mask.qint_idx = ~aq->rq_mask.qint_idx;
aq->rq_mask.xqe_drop_ena = ~aq->rq_mask.xqe_drop_ena;
+
+   if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+   aq->rq_mask.spb_aura_pass = ~aq->rq_mask.spb_aura_pass;
+   aq->rq_mask.lpb_aura_pass = ~aq->rq_mask.lpb_aura_pass;
+
+   aq->rq_mask.spb_aura_drop = ~aq->rq_mask.spb_aura_drop;
+   aq->rq_mask.lpb_aura_drop = ~aq->rq_mask.lpb_aura_drop;
+   }
}
 
return 0;
@@ -244,6 +261,23 @@ rq_cfg(struct nix *nix, struct roc_nix_rq *rq, bool cfg, 
bool ena)
aq->rq.qint_idx = rq->qid % nix->qints;
aq->rq.xqe_drop_ena = 1;
 
+   /* If RED enabled, then fill enable for all cases */
+   if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+   aq->rq.spb_pool_pass = rq->red_pass;
+   aq->rq.spb_aura_pass = rq->red_pass;
+   aq->rq.lpb_pool_pass = rq->red_pass;
+   aq->rq.lpb_aura_pass = rq->red_pass;
+   aq->rq.wqe_pool_pass = rq->red_pass;
+   aq->rq.xqe_pass = rq->red_pass;
+
+   aq->rq.spb_pool_drop = rq->red_drop;
+   aq->rq.spb_aura_drop = rq->red_drop;
+   aq->rq.lpb_pool_drop = rq->red_drop;
+   aq->rq.lpb_aura_drop = rq->red_drop;
+   aq->rq.wqe_pool_drop = rq->red_drop;
+   aq->rq.xqe_drop = rq->red_drop;
+   }
+
if (cfg) {
if (rq->sso_ena) {
/* SSO mode */
@@ -296,6 +330,22 @@ rq_cfg(struct nix *nix, struct roc_nix_rq *rq, bool cfg, 
bool ena)
aq->rq_mask.rq_int_ena = ~aq->rq_mask.rq_int_ena;
aq->rq_mask.qint_idx = ~aq->rq_mask.qint_idx;
aq->rq_mask.xqe_drop_ena = ~aq->rq_mask.xqe_drop_ena;
+
+   if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+   aq->rq_mask.spb_pool_pass = ~aq->rq_mask.spb_pool_pass;
+   aq->rq_mask.spb_aura_pass = ~aq->rq_mask.spb_aura_pass;
+   aq->rq_mask.lpb_pool_pass = ~aq->rq_mask.lpb_pool_pass;
+   aq->rq_mask.lpb_aura_pass = ~aq->rq_mask.lpb_aura_pass;
+   aq->rq_mask.wqe_pool_pass = ~aq->rq_mask.wqe_pool_pass;
+   aq->rq_mask.xqe_pass = ~aq->rq_mask.xqe_pass;
+
+   aq->rq_mask.spb_pool_drop = ~aq->rq_mask.spb_pool_drop;
+   aq->rq_mask.spb_aura_drop = ~aq->rq_mask.spb_aura_drop;
+   aq->rq_mask.lpb_pool_drop = ~aq->rq_mask.lpb_pool_drop;
+   aq->rq_mask.lpb_aura_drop = ~aq->rq_mask.lpb_aura_drop;
+   aq->rq_mask.wqe_pool_drop = ~aq->rq_mask.wqe_poo
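
The configure-then-mask pattern in the hunks above (write each RED field, then set its `rq_mask` counterpart to all-ones so the admin queue applies it) can be sketched in a self-contained way. The struct, field names, and `aq_apply()` below are simplified stand-ins for the real roc_nix AQ context, not the actual hardware interface:

```c
#include <assert.h>
#include <stdint.h>

/* Two representative RED threshold fields; the real context has many more. */
struct rq_ctx {
	uint8_t spb_aura_pass;
	uint8_t spb_aura_drop;
};

/* Mirror of the patch: touch the RED fields only when RED is enabled and
 * pass >= drop, and give every written field an all-ones mask. */
static void rq_red_cfg(struct rq_ctx *rq, struct rq_ctx *mask,
		       uint8_t red_pass, uint8_t red_drop)
{
	if (red_pass && red_pass >= red_drop) {
		rq->spb_aura_pass = red_pass;
		rq->spb_aura_drop = red_drop;
		mask->spb_aura_pass = (uint8_t)~0;
		mask->spb_aura_drop = (uint8_t)~0;
	}
}

/* Apply only the masked bits of 'upd' onto 'hw', as the AQ would. */
static void aq_apply(struct rq_ctx *hw, const struct rq_ctx *upd,
		     const struct rq_ctx *mask)
{
	hw->spb_aura_pass = (hw->spb_aura_pass & ~mask->spb_aura_pass) |
			    (upd->spb_aura_pass & mask->spb_aura_pass);
	hw->spb_aura_drop = (hw->spb_aura_drop & ~mask->spb_aura_drop) |
			    (upd->spb_aura_drop & mask->spb_aura_drop);
}
```

With a zero mask the existing context is left untouched, which is why the patch sets the mask bits together with each value.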

[dpdk-dev] [PATCH v4 07/62] common/cnxk: change model API to not use camel case

2021-06-22 Thread Nithin Dabilpuram
Change the model check APIs to not use camel case in function
names.

Signed-off-by: Nithin Dabilpuram 
---
 drivers/common/cnxk/roc_model.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/common/cnxk/roc_model.h b/drivers/common/cnxk/roc_model.h
index fb774ac..5aaad53 100644
--- a/drivers/common/cnxk/roc_model.h
+++ b/drivers/common/cnxk/roc_model.h
@@ -88,19 +88,19 @@ roc_model_is_cn10k(void)
 }
 
 static inline uint64_t
-roc_model_is_cn96_A0(void)
+roc_model_is_cn96_a0(void)
 {
return roc_model->flag & ROC_MODEL_CN96xx_A0;
 }
 
 static inline uint64_t
-roc_model_is_cn96_Ax(void)
+roc_model_is_cn96_ax(void)
 {
return (roc_model->flag & ROC_MODEL_CN96xx_Ax);
 }
 
 static inline uint64_t
-roc_model_is_cn95_A0(void)
+roc_model_is_cn95_a0(void)
 {
return roc_model->flag & ROC_MODEL_CNF95xx_A0;
 }
-- 
2.8.4



[dpdk-dev] [PATCH v4 08/62] common/cnxk: support for VLAN push and pop flow actions

2021-06-22 Thread Nithin Dabilpuram
From: Satheesh Paul 

Add roc API to configure VLAN tag addition and removal.

This patch also adds 98xx support for increased MCAM
entries for rte flow.

Signed-off-by: Satheesh Paul 
---
 drivers/common/cnxk/roc_model.h|   6 +
 drivers/common/cnxk/roc_npc.c  | 257 ++---
 drivers/common/cnxk/roc_npc.h  |  24 
 drivers/common/cnxk/roc_npc_mcam.c |   2 +-
 drivers/common/cnxk/roc_npc_priv.h |   1 +
 drivers/common/cnxk/version.map|   2 +
 6 files changed, 276 insertions(+), 16 deletions(-)

diff --git a/drivers/common/cnxk/roc_model.h b/drivers/common/cnxk/roc_model.h
index 5aaad53..c1d11b7 100644
--- a/drivers/common/cnxk/roc_model.h
+++ b/drivers/common/cnxk/roc_model.h
@@ -88,6 +88,12 @@ roc_model_is_cn10k(void)
 }
 
 static inline uint64_t
+roc_model_is_cn98xx(void)
+{
+   return (roc_model->flag & ROC_MODEL_CN98xx_A0);
+}
+
+static inline uint64_t
 roc_model_is_cn96_a0(void)
 {
return roc_model->flag & ROC_MODEL_CN96xx_A0;
diff --git a/drivers/common/cnxk/roc_npc.c b/drivers/common/cnxk/roc_npc.c
index e6a5036..8a76823 100644
--- a/drivers/common/cnxk/roc_npc.c
+++ b/drivers/common/cnxk/roc_npc.c
@@ -6,6 +6,23 @@
 #include "roc_priv.h"
 
 int
+roc_npc_vtag_actions_get(struct roc_npc *roc_npc)
+{
+   struct npc *npc = roc_npc_to_npc_priv(roc_npc);
+
+   return npc->vtag_actions;
+}
+
+int
+roc_npc_vtag_actions_sub_return(struct roc_npc *roc_npc, uint32_t count)
+{
+   struct npc *npc = roc_npc_to_npc_priv(roc_npc);
+
+   npc->vtag_actions -= count;
+   return npc->vtag_actions;
+}
+
+int
 roc_npc_mcam_free_counter(struct roc_npc *roc_npc, uint16_t ctr_id)
 {
struct npc *npc = roc_npc_to_npc_priv(roc_npc);
@@ -101,7 +118,7 @@ npc_mcam_tot_entries(void)
/* FIXME: change to reading in AF from NPC_AF_CONST1/2
 * MCAM_BANK_DEPTH(_EXT) * MCAM_BANKS
 */
-   if (roc_model_is_cn10k())
+   if (roc_model_is_cn10k() || roc_model_is_cn98xx())
return 16 * 1024; /* MCAM_BANKS = 4, BANK_DEPTH_EXT = 4096 */
else
return 4 * 1024; /* MCAM_BANKS = 4, BANK_DEPTH_EXT = 1024 */
@@ -330,6 +347,7 @@ npc_parse_actions(struct npc *npc, const struct 
roc_npc_attr *attr,
const struct roc_npc_action_mark *act_mark;
const struct roc_npc_action_queue *act_q;
const struct roc_npc_action_vf *vf_act;
+   bool vlan_insert_action = false;
int sel_act, req_act = 0;
uint16_t pf_func, vf_id;
int errcode = 0;
@@ -417,25 +435,69 @@ npc_parse_actions(struct npc *npc, const struct 
roc_npc_attr *attr,
req_act |= ROC_NPC_ACTION_TYPE_SEC;
rq = 0;
break;
+   case ROC_NPC_ACTION_TYPE_VLAN_STRIP:
+   req_act |= ROC_NPC_ACTION_TYPE_VLAN_STRIP;
+   break;
+   case ROC_NPC_ACTION_TYPE_VLAN_INSERT:
+   req_act |= ROC_NPC_ACTION_TYPE_VLAN_INSERT;
+   break;
+   case ROC_NPC_ACTION_TYPE_VLAN_ETHTYPE_INSERT:
+   req_act |= ROC_NPC_ACTION_TYPE_VLAN_ETHTYPE_INSERT;
+   break;
+   case ROC_NPC_ACTION_TYPE_VLAN_PCP_INSERT:
+   req_act |= ROC_NPC_ACTION_TYPE_VLAN_PCP_INSERT;
+   break;
default:
errcode = NPC_ERR_ACTION_NOTSUP;
goto err_exit;
}
}
 
+   if (req_act & (ROC_NPC_ACTION_TYPE_VLAN_INSERT |
+  ROC_NPC_ACTION_TYPE_VLAN_ETHTYPE_INSERT |
+  ROC_NPC_ACTION_TYPE_VLAN_PCP_INSERT))
+   vlan_insert_action = true;
+
+   if ((req_act & (ROC_NPC_ACTION_TYPE_VLAN_INSERT |
+   ROC_NPC_ACTION_TYPE_VLAN_ETHTYPE_INSERT |
+   ROC_NPC_ACTION_TYPE_VLAN_PCP_INSERT)) ==
+   ROC_NPC_ACTION_TYPE_VLAN_PCP_INSERT) {
+   plt_err("PCP insert action can't be supported alone");
+   errcode = NPC_ERR_ACTION_NOTSUP;
+   goto err_exit;
+   }
+
+   /* Both STRIP and INSERT actions are not supported */
+   if (vlan_insert_action && (req_act & ROC_NPC_ACTION_TYPE_VLAN_STRIP)) {
+   errcode = NPC_ERR_ACTION_NOTSUP;
+   goto err_exit;
+   }
+
/* Check if actions specified are compatible */
if (attr->egress) {
-   /* Only DROP/COUNT is supported */
-   if (!(req_act & ROC_NPC_ACTION_TYPE_DROP)) {
+   if (req_act & ROC_NPC_ACTION_TYPE_VLAN_STRIP) {
+   plt_err("VLAN pop action is not supported on Egress");
errcode = NPC_ERR_ACTION_NOTSUP;
goto err_exit;
-   } else if (req_act & ~(ROC_NPC_ACTION_TYPE_DROP |
-  ROC_NPC_ACTION_TYPE_COUNT)) {
+   }
+
+   
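
The VLAN action compatibility rules enforced above (PCP insert never alone, strip and insert mutually exclusive) reduce to a couple of bit tests on the requested-action mask. The flag values below are stand-ins for the real `ROC_NPC_ACTION_TYPE_*` constants:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values; only their distinctness matters here. */
#define ACT_VLAN_STRIP       (1u << 0)
#define ACT_VLAN_INSERT      (1u << 1)
#define ACT_VLAN_ETHTYPE_INS (1u << 2)
#define ACT_VLAN_PCP_INS     (1u << 3)

#define ACT_VLAN_INS_ANY \
	(ACT_VLAN_INSERT | ACT_VLAN_ETHTYPE_INS | ACT_VLAN_PCP_INS)

/* Return 0 if the requested VLAN action set is coherent, -1 otherwise. */
static int vlan_actions_check(uint32_t req_act)
{
	/* PCP insert cannot be the only insert action. */
	if ((req_act & ACT_VLAN_INS_ANY) == ACT_VLAN_PCP_INS)
		return -1;
	/* Strip and insert are mutually exclusive. */
	if ((req_act & ACT_VLAN_INS_ANY) && (req_act & ACT_VLAN_STRIP))
		return -1;
	return 0;
}
```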

[dpdk-dev] [PATCH v4 09/62] net/cnxk: add build infra and common probe

2021-06-22 Thread Nithin Dabilpuram
Add build infrastructure and common probe and remove functions
for the cnxk driver, which is used by both the CN9K and CN10K SoCs.

Signed-off-by: Nithin Dabilpuram 
---
 MAINTAINERS|   5 +-
 doc/guides/nics/cnxk.rst   |  29 +
 doc/guides/nics/features/cnxk.ini  |   9 ++
 doc/guides/nics/features/cnxk_vec.ini  |   9 ++
 doc/guides/nics/features/cnxk_vf.ini   |   9 ++
 doc/guides/nics/index.rst  |   1 +
 doc/guides/platform/cnxk.rst   |   3 +
 doc/guides/rel_notes/release_21_08.rst |   5 +
 drivers/net/cnxk/cnxk_ethdev.c | 218 +
 drivers/net/cnxk/cnxk_ethdev.h |  57 +
 drivers/net/cnxk/meson.build   |  14 +++
 drivers/net/cnxk/version.map   |   3 +
 drivers/net/meson.build|   1 +
 13 files changed, 362 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/nics/cnxk.rst
 create mode 100644 doc/guides/nics/features/cnxk.ini
 create mode 100644 doc/guides/nics/features/cnxk_vec.ini
 create mode 100644 doc/guides/nics/features/cnxk_vf.ini
 create mode 100644 drivers/net/cnxk/cnxk_ethdev.c
 create mode 100644 drivers/net/cnxk/cnxk_ethdev.h
 create mode 100644 drivers/net/cnxk/meson.build
 create mode 100644 drivers/net/cnxk/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 5877a16..b39a1c2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -745,8 +745,11 @@ M: Kiran Kumar K 
 M: Sunil Kumar Kori 
 M: Satha Rao 
 T: git://dpdk.org/next/dpdk-next-net-mrvl
-F: drivers/common/cnxk/
+F: doc/guides/nics/cnxk.rst
+F: doc/guides/nics/features/cnxk*.ini
 F: doc/guides/platform/cnxk.rst
+F: drivers/common/cnxk/
+F: drivers/net/cnxk/
 
 Marvell mvpp2
 M: Liron Himi 
diff --git a/doc/guides/nics/cnxk.rst b/doc/guides/nics/cnxk.rst
new file mode 100644
index 000..ca21842
--- /dev/null
+++ b/doc/guides/nics/cnxk.rst
@@ -0,0 +1,29 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+Copyright(C) 2021 Marvell.
+
+CNXK Poll Mode driver
+=
+
+The CNXK ETHDEV PMD (**librte_net_cnxk**) provides poll mode ethdev driver
+support for the inbuilt network device found in **Marvell OCTEON CN9K/CN10K**
+SoC family as well as for their virtual functions (VF) in SR-IOV context.
+
+More information can be found at `Marvell Official Website
+`_.
+
+Features
+--------
+
+Features of the CNXK Ethdev PMD are:
+
+Prerequisites
+-------------
+
+See :doc:`../platform/cnxk` for setup information.
+
+
+Driver compilation and testing
+------------------------------
+
+Refer to the document :ref:`compiling and testing a PMD for a NIC 
`
+for details.
diff --git a/doc/guides/nics/features/cnxk.ini 
b/doc/guides/nics/features/cnxk.ini
new file mode 100644
index 000..2c23464
--- /dev/null
+++ b/doc/guides/nics/features/cnxk.ini
@@ -0,0 +1,9 @@
+;
+; Supported features of the 'cnxk' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Linux= Y
+ARMv8= Y
+Usage doc= Y
diff --git a/doc/guides/nics/features/cnxk_vec.ini 
b/doc/guides/nics/features/cnxk_vec.ini
new file mode 100644
index 000..de78516
--- /dev/null
+++ b/doc/guides/nics/features/cnxk_vec.ini
@@ -0,0 +1,9 @@
+;
+; Supported features of the 'cnxk_vec' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Linux= Y
+ARMv8= Y
+Usage doc= Y
diff --git a/doc/guides/nics/features/cnxk_vf.ini 
b/doc/guides/nics/features/cnxk_vf.ini
new file mode 100644
index 000..9c96351
--- /dev/null
+++ b/doc/guides/nics/features/cnxk_vf.ini
@@ -0,0 +1,9 @@
+;
+; Supported features of the 'cnxk_vf' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Linux= Y
+ARMv8= Y
+Usage doc= Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 799697c..c1a04d9 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -19,6 +19,7 @@ Network Interface Controller Drivers
 axgbe
 bnx2x
 bnxt
+cnxk
 cxgbe
 dpaa
 dpaa2
diff --git a/doc/guides/platform/cnxk.rst b/doc/guides/platform/cnxk.rst
index cebb3d0..b506c11 100644
--- a/doc/guides/platform/cnxk.rst
+++ b/doc/guides/platform/cnxk.rst
@@ -142,6 +142,9 @@ HW Offload Drivers
 
 This section lists dataplane H/W block(s) available in cnxk SoC.
 
+#. **Ethdev Driver**
+   See :doc:`../nics/cnxk` for NIX Ethdev driver information.
+
 #. **Mempool Driver**
See :doc:`../mempool/cnxk` for NPA mempool driver information.
 
diff --git a/doc/guides/rel_notes/release_21_08.rst 
b/doc/guides/rel_notes/release_21_08.rst
index a6ecfdf..31e49e1 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -55,6 +55,11 @@ New Features
   

[dpdk-dev] [PATCH v4 10/62] net/cnxk: add platform specific probe and remove

2021-06-22 Thread Nithin Dabilpuram
Add platform-specific probe and remove callbacks for CN9K
and CN10K, which use the common probe and remove functions.
Register the ethdev driver for CN9K and CN10K.

Signed-off-by: Nithin Dabilpuram 
---
 drivers/net/cnxk/cn10k_ethdev.c | 64 
 drivers/net/cnxk/cn10k_ethdev.h |  9 +
 drivers/net/cnxk/cn9k_ethdev.c  | 82 +
 drivers/net/cnxk/cn9k_ethdev.h  |  9 +
 drivers/net/cnxk/cnxk_ethdev.c  | 42 +
 drivers/net/cnxk/cnxk_ethdev.h  | 19 ++
 drivers/net/cnxk/meson.build|  5 +++
 7 files changed, 230 insertions(+)
 create mode 100644 drivers/net/cnxk/cn10k_ethdev.c
 create mode 100644 drivers/net/cnxk/cn10k_ethdev.h
 create mode 100644 drivers/net/cnxk/cn9k_ethdev.c
 create mode 100644 drivers/net/cnxk/cn9k_ethdev.h

diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c
new file mode 100644
index 000..ff8ce31
--- /dev/null
+++ b/drivers/net/cnxk/cn10k_ethdev.c
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+#include "cn10k_ethdev.h"
+
+static int
+cn10k_nix_remove(struct rte_pci_device *pci_dev)
+{
+   return cnxk_nix_remove(pci_dev);
+}
+
+static int
+cn10k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+   struct rte_eth_dev *eth_dev;
+   int rc;
+
+   if (RTE_CACHE_LINE_SIZE != 64) {
+   plt_err("Driver not compiled for CN10K");
+   return -EFAULT;
+   }
+
+   rc = roc_plt_init();
+   if (rc) {
+   plt_err("Failed to initialize platform model, rc=%d", rc);
+   return rc;
+   }
+
+   /* Common probe */
+   rc = cnxk_nix_probe(pci_drv, pci_dev);
+   if (rc)
+   return rc;
+
+   if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+   eth_dev = rte_eth_dev_allocated(pci_dev->device.name);
+   if (!eth_dev)
+   return -ENOENT;
+   }
+   return 0;
+}
+
+static const struct rte_pci_id cn10k_pci_nix_map[] = {
+   CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KA, PCI_DEVID_CNXK_RVU_PF),
+   CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KAS, PCI_DEVID_CNXK_RVU_PF),
+   CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KA, PCI_DEVID_CNXK_RVU_VF),
+   CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KAS, PCI_DEVID_CNXK_RVU_VF),
+   CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KA, PCI_DEVID_CNXK_RVU_AF_VF),
+   CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KAS, PCI_DEVID_CNXK_RVU_AF_VF),
+   {
+   .vendor_id = 0,
+   },
+};
+
+static struct rte_pci_driver cn10k_pci_nix = {
+   .id_table = cn10k_pci_nix_map,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA |
+RTE_PCI_DRV_INTR_LSC,
+   .probe = cn10k_nix_probe,
+   .remove = cn10k_nix_remove,
+};
+
+RTE_PMD_REGISTER_PCI(net_cn10k, cn10k_pci_nix);
+RTE_PMD_REGISTER_PCI_TABLE(net_cn10k, cn10k_pci_nix_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_cn10k, "vfio-pci");
diff --git a/drivers/net/cnxk/cn10k_ethdev.h b/drivers/net/cnxk/cn10k_ethdev.h
new file mode 100644
index 000..1bf4a65
--- /dev/null
+++ b/drivers/net/cnxk/cn10k_ethdev.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+#ifndef __CN10K_ETHDEV_H__
+#define __CN10K_ETHDEV_H__
+
+#include 
+
+#endif /* __CN10K_ETHDEV_H__ */
diff --git a/drivers/net/cnxk/cn9k_ethdev.c b/drivers/net/cnxk/cn9k_ethdev.c
new file mode 100644
index 000..98d2d3a
--- /dev/null
+++ b/drivers/net/cnxk/cn9k_ethdev.c
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+#include "cn9k_ethdev.h"
+
+static int
+cn9k_nix_remove(struct rte_pci_device *pci_dev)
+{
+   return cnxk_nix_remove(pci_dev);
+}
+
+static int
+cn9k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+   struct rte_eth_dev *eth_dev;
+   struct cnxk_eth_dev *dev;
+   int rc;
+
+   if (RTE_CACHE_LINE_SIZE != 128) {
+   plt_err("Driver not compiled for CN9K");
+   return -EFAULT;
+   }
+
+   rc = roc_plt_init();
+   if (rc) {
+   plt_err("Failed to initialize platform model, rc=%d", rc);
+   return rc;
+   }
+
+   /* Common probe */
+   rc = cnxk_nix_probe(pci_drv, pci_dev);
+   if (rc)
+   return rc;
+
+   /* Find eth dev allocated */
+   eth_dev = rte_eth_dev_allocated(pci_dev->device.name);
+   if (!eth_dev)
+   return -ENOENT;
+
+   dev = cnxk_eth_pmd_priv(eth_dev);
+   /* Update capabilities already set for TSO.
+* TSO not supported for earlier chip revisions
+*/
+   if (roc_model_is_cn96_a0() || roc_model_is_cn95_a0())
+   dev->tx_offload_capa &= ~(DEV_TX_OFFLOAD_TCP_TSO |
+ DEV_TX_OFFLOAD_VXLAN_TNL_TSO |
+ DEV_TX_OF

[dpdk-dev] [PATCH v4 11/62] net/cnxk: add common devargs parsing function

2021-06-22 Thread Nithin Dabilpuram
Add the various command-line devargs parsing functions
supported by CN9K and CN10K.

Signed-off-by: Nithin Dabilpuram 
---
 doc/guides/nics/cnxk.rst   |  94 +++
 drivers/net/cnxk/cnxk_ethdev.c |   7 ++
 drivers/net/cnxk/cnxk_ethdev.h |   9 ++
 drivers/net/cnxk/cnxk_ethdev_devargs.c | 166 +
 drivers/net/cnxk/meson.build   |   3 +-
 5 files changed, 278 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/cnxk/cnxk_ethdev_devargs.c

diff --git a/doc/guides/nics/cnxk.rst b/doc/guides/nics/cnxk.rst
index ca21842..6652e17 100644
--- a/doc/guides/nics/cnxk.rst
+++ b/doc/guides/nics/cnxk.rst
@@ -27,3 +27,97 @@ Driver compilation and testing
 
 Refer to the document :ref:`compiling and testing a PMD for a NIC 
`
 for details.
+
+Runtime Config Options
+----------------------
+
+- ``Rx&Tx scalar mode enable`` (default ``0``)
+
+   The PMD supports both scalar and vector modes; the mode may be selected at
+   runtime using the ``scalar_enable`` ``devargs`` parameter.
+
+- ``RSS reta size`` (default ``64``)
+
+   RSS redirection table size may be configured during runtime using 
``reta_size``
+   ``devargs`` parameter.
+
+   For example::
+
+  -a 0002:02:00.0,reta_size=256
+
+   With the above configuration, reta table of size 256 is populated.
+
+- ``Flow priority levels`` (default ``3``)
+
+   RTE Flow priority levels can be configured during runtime using
+   ``flow_max_priority`` ``devargs`` parameter.
+
+   For example::
+
+  -a 0002:02:00.0,flow_max_priority=10
+
+   With the above configuration, the priority level is set to 10 (0-9). The max
+   priority level supported is 32.
+
+- ``Reserve Flow entries`` (default ``8``)
+
+   RTE flow entries can be pre allocated and the size of pre allocation can be
+   selected runtime using ``flow_prealloc_size`` ``devargs`` parameter.
+
+   For example::
+
+  -a 0002:02:00.0,flow_prealloc_size=4
+
+   With the above configuration, the pre-alloc size is set to 4. The max
+   pre-alloc size supported is 32.
+
+- ``Max SQB buffer count`` (default ``512``)
+
+   Send queue descriptor buffer count may be limited during runtime using
+   ``max_sqb_count`` ``devargs`` parameter.
+
+   For example::
+
+  -a 0002:02:00.0,max_sqb_count=64
+
+   With the above configuration, each send queue's descriptor buffer count is
+   limited to a maximum of 64 buffers.
+
+- ``Switch header enable`` (default ``none``)
+
+   A port can be configured to a specific switch header type by using
+   ``switch_header`` ``devargs`` parameter.
+
+   For example::
+
+  -a 0002:02:00.0,switch_header="higig2"
+
+   With the above configuration, higig2 will be enabled on that port and the
+   traffic on this port should be higig2 traffic only. Supported switch header
+   types are "higig2", "dsa", "chlen90b" and "chlen24b".
+
+- ``RSS tag as XOR`` (default ``0``)
+
+   The HW gives two options to configure the RSS adder i.e
+
+   * ``rss_adder<7:0> = flow_tag<7:0> ^ flow_tag<15:8> ^ flow_tag<23:16> ^ 
flow_tag<31:24>``
+
+   * ``rss_adder<7:0> = flow_tag<7:0>``
+
+   The latter aligns with standard NIC behavior, whereas the former is the
+   legacy RSS adder scheme used in OCTEON TX2 products.
+
+   By default, the driver runs in the latter mode.
+   Set this flag to 1 to select the legacy mode.
+
+   For example, to select the legacy mode (RSS tag adder as XOR)::
+
+  -a 0002:02:00.0,tag_as_xor=1
+
+
+
+.. note::
+
+   The above devargs parameters are configurable per device. The user needs to
+   pass the parameters to all the PCIe devices if the application requires them
+   to be configured on all the ethdev ports.
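
The two RSS adder schemes documented above can be written out as plain bit arithmetic on the 32-bit flow tag; this is a sketch of the formulas, not the driver's code:

```c
#include <assert.h>
#include <stdint.h>

/* Legacy OCTEON TX2 scheme (selected with tag_as_xor=1):
 * XOR of all four bytes of the flow tag. */
static uint8_t rss_adder_xor(uint32_t flow_tag)
{
	return (uint8_t)((flow_tag >> 0) ^ (flow_tag >> 8) ^
			 (flow_tag >> 16) ^ (flow_tag >> 24));
}

/* Default scheme: low byte of the flow tag, as on standard NICs. */
static uint8_t rss_adder_low(uint32_t flow_tag)
{
	return (uint8_t)flow_tag;
}
```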
diff --git a/drivers/net/cnxk/cnxk_ethdev.c b/drivers/net/cnxk/cnxk_ethdev.c
index 526c19b..109fd35 100644
--- a/drivers/net/cnxk/cnxk_ethdev.c
+++ b/drivers/net/cnxk/cnxk_ethdev.c
@@ -57,6 +57,13 @@ cnxk_eth_dev_init(struct rte_eth_dev *eth_dev)
pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
rte_eth_copy_pci_info(eth_dev, pci_dev);
 
+   /* Parse devargs string */
+   rc = cnxk_ethdev_parse_devargs(eth_dev->device->devargs, dev);
+   if (rc) {
+   plt_err("Failed to parse devargs rc=%d", rc);
+   goto error;
+   }
+
/* Initialize base roc nix */
nix->pci_dev = pci_dev;
rc = roc_nix_dev_init(nix);
diff --git a/drivers/net/cnxk/cnxk_ethdev.h b/drivers/net/cnxk/cnxk_ethdev.h
index ba2bfcd..97e3a15 100644
--- a/drivers/net/cnxk/cnxk_ethdev.h
+++ b/drivers/net/cnxk/cnxk_ethdev.h
@@ -9,11 +9,15 @@
 
 #include 
 #include 
+#include 
 
 #include "roc_api.h"
 
 #define CNXK_ETH_DEV_PMD_VERSION "1.0"
 
+/* Max supported SQB count */
+#define CNXK_NIX_TX_MAX_SQB 512
+
 #define CNXK_NIX_TX_OFFLOAD_CAPA   
\
(DEV_TX_OFFLOAD_MBUF_FAST_FREE | DEV_TX_OFFLOAD_MT_LOCKFREE |  \
 DEV_TX_OFFLOAD_VLAN_INSERT | DEV_TX_OFFLOAD_QINQ_INSERT | \
@@ -38,6 +42,7 @@ struct cnxk_eth_dev {
uint

[dpdk-dev] [PATCH v4 12/62] net/cnxk: support common dev infos get

2021-06-22 Thread Nithin Dabilpuram
Add support for the dev infos get operation for CN9K and CN10K.

Signed-off-by: Nithin Dabilpuram 
---
 doc/guides/nics/cnxk.rst  |  3 ++
 doc/guides/nics/features/cnxk.ini |  4 ++
 doc/guides/nics/features/cnxk_vec.ini |  4 ++
 doc/guides/nics/features/cnxk_vf.ini  |  3 ++
 drivers/net/cnxk/cnxk_ethdev.c|  4 +-
 drivers/net/cnxk/cnxk_ethdev.h| 33 
 drivers/net/cnxk/cnxk_ethdev_ops.c| 71 +++
 drivers/net/cnxk/meson.build  |  1 +
 8 files changed, 122 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/cnxk/cnxk_ethdev_ops.c

diff --git a/doc/guides/nics/cnxk.rst b/doc/guides/nics/cnxk.rst
index 6652e17..6bd410b 100644
--- a/doc/guides/nics/cnxk.rst
+++ b/doc/guides/nics/cnxk.rst
@@ -16,6 +16,9 @@ Features
 
 Features of the CNXK Ethdev PMD are:
 
+- SR-IOV VF
+- Lock-free Tx queue
+
 Prerequisites
 -
 
diff --git a/doc/guides/nics/features/cnxk.ini 
b/doc/guides/nics/features/cnxk.ini
index 2c23464..b426340 100644
--- a/doc/guides/nics/features/cnxk.ini
+++ b/doc/guides/nics/features/cnxk.ini
@@ -4,6 +4,10 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Speed capabilities   = Y
+Lock-free Tx queue   = Y
+SR-IOV   = Y
+Multiprocess aware   = Y
 Linux= Y
 ARMv8= Y
 Usage doc= Y
diff --git a/doc/guides/nics/features/cnxk_vec.ini 
b/doc/guides/nics/features/cnxk_vec.ini
index de78516..292ac1e 100644
--- a/doc/guides/nics/features/cnxk_vec.ini
+++ b/doc/guides/nics/features/cnxk_vec.ini
@@ -4,6 +4,10 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Speed capabilities   = Y
+Lock-free Tx queue   = Y
+SR-IOV   = Y
+Multiprocess aware   = Y
 Linux= Y
 ARMv8= Y
 Usage doc= Y
diff --git a/doc/guides/nics/features/cnxk_vf.ini 
b/doc/guides/nics/features/cnxk_vf.ini
index 9c96351..bc2eb8a 100644
--- a/doc/guides/nics/features/cnxk_vf.ini
+++ b/doc/guides/nics/features/cnxk_vf.ini
@@ -4,6 +4,9 @@
 ; Refer to default.ini for the full list of available PMD features.
 ;
 [Features]
+Speed capabilities   = Y
+Lock-free Tx queue   = Y
+Multiprocess aware   = Y
 Linux= Y
 ARMv8= Y
 Usage doc= Y
diff --git a/drivers/net/cnxk/cnxk_ethdev.c b/drivers/net/cnxk/cnxk_ethdev.c
index 109fd35..066e01c 100644
--- a/drivers/net/cnxk/cnxk_ethdev.c
+++ b/drivers/net/cnxk/cnxk_ethdev.c
@@ -38,7 +38,9 @@ nix_get_speed_capa(struct cnxk_eth_dev *dev)
 }
 
 /* CNXK platform independent eth dev ops */
-struct eth_dev_ops cnxk_eth_dev_ops;
+struct eth_dev_ops cnxk_eth_dev_ops = {
+   .dev_infos_get = cnxk_nix_info_get,
+};
 
 static int
 cnxk_eth_dev_init(struct rte_eth_dev *eth_dev)
diff --git a/drivers/net/cnxk/cnxk_ethdev.h b/drivers/net/cnxk/cnxk_ethdev.h
index 97e3a15..8d9a7e0 100644
--- a/drivers/net/cnxk/cnxk_ethdev.h
+++ b/drivers/net/cnxk/cnxk_ethdev.h
@@ -15,9 +15,40 @@
 
 #define CNXK_ETH_DEV_PMD_VERSION "1.0"
 
+/* VLAN tag inserted by NIX_TX_VTAG_ACTION.
+ * In Tx space is always reserved for this in FRS.
+ */
+#define CNXK_NIX_MAX_VTAG_INS 2
+#define CNXK_NIX_MAX_VTAG_ACT_SIZE (4 * CNXK_NIX_MAX_VTAG_INS)
+
+/* ETH_HLEN+ETH_FCS+2*VLAN_HLEN */
+#define CNXK_NIX_L2_OVERHEAD (RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN + 8)
+
+#define CNXK_NIX_RX_MIN_DESC   16
+#define CNXK_NIX_RX_MIN_DESC_ALIGN  16
+#define CNXK_NIX_RX_NB_SEG_MAX 6
+#define CNXK_NIX_RX_DEFAULT_RING_SZ 4096
 /* Max supported SQB count */
 #define CNXK_NIX_TX_MAX_SQB 512
 
+/* If PTP is enabled additional SEND MEM DESC is required which
+ * takes 2 words, hence max 7 iova address are possible
+ */
+#if defined(RTE_LIBRTE_IEEE1588)
+#define CNXK_NIX_TX_NB_SEG_MAX 7
+#else
+#define CNXK_NIX_TX_NB_SEG_MAX 9
+#endif
+
+#define CNXK_NIX_RSS_L3_L4_SRC_DST 
\
+   (ETH_RSS_L3_SRC_ONLY | ETH_RSS_L3_DST_ONLY | ETH_RSS_L4_SRC_ONLY | \
+ETH_RSS_L4_DST_ONLY)
+
+#define CNXK_NIX_RSS_OFFLOAD   
\
+   (ETH_RSS_PORT | ETH_RSS_IP | ETH_RSS_UDP | ETH_RSS_TCP |   \
+ETH_RSS_SCTP | ETH_RSS_TUNNEL | ETH_RSS_L2_PAYLOAD |  \
+CNXK_NIX_RSS_L3_L4_SRC_DST | ETH_RSS_LEVEL_MASK | ETH_RSS_C_VLAN)
+
 #define CNXK_NIX_TX_OFFLOAD_CAPA   
\
(DEV_TX_OFFLOAD_MBUF_FAST_FREE | DEV_TX_OFFLOAD_MT_LOCKFREE |  \
 DEV_TX_OFFLOAD_VLAN_INSERT | DEV_TX_OFFLOAD_QINQ_INSERT | \
@@ -77,6 +108,8 @@ extern struct eth_dev_ops cnxk_eth_dev_ops;
 int cnxk_nix_probe(struct rte_pci_driver *pci_drv,
   struct rte_pci_device *pci_dev);
 int cnxk_nix_remove(struct rte_pci_device *pci_dev);
+int cnxk_nix_info_get(struct rte_eth_dev *eth_dev,
+ struct rte_eth_dev_info *dev_info);
 
 /* Devargs */
 int cnxk_ethdev_par

[dpdk-dev] [PATCH v4 13/62] net/cnxk: add device configuration operation

2021-06-22 Thread Nithin Dabilpuram
Add the device configuration op for CN9K and CN10K. Most of the
device configuration is common between the two platforms, except
for some supported offloads.

Signed-off-by: Nithin Dabilpuram 
---
 doc/guides/nics/cnxk.rst  |   2 +
 doc/guides/nics/features/cnxk.ini |   2 +
 doc/guides/nics/features/cnxk_vec.ini |   2 +
 doc/guides/nics/features/cnxk_vf.ini  |   2 +
 drivers/net/cnxk/cn10k_ethdev.c   |  34 ++
 drivers/net/cnxk/cn9k_ethdev.c|  45 +++
 drivers/net/cnxk/cnxk_ethdev.c| 568 ++
 drivers/net/cnxk/cnxk_ethdev.h|  85 +
 8 files changed, 740 insertions(+)

diff --git a/doc/guides/nics/cnxk.rst b/doc/guides/nics/cnxk.rst
index 6bd410b..0c2ea89 100644
--- a/doc/guides/nics/cnxk.rst
+++ b/doc/guides/nics/cnxk.rst
@@ -18,6 +18,8 @@ Features of the CNXK Ethdev PMD are:
 
 - SR-IOV VF
 - Lock-free Tx queue
+- Multiple queues for TX and RX
+- Receiver Side Scaling (RSS)
 
 Prerequisites
 -
diff --git a/doc/guides/nics/features/cnxk.ini 
b/doc/guides/nics/features/cnxk.ini
index b426340..96dba2a 100644
--- a/doc/guides/nics/features/cnxk.ini
+++ b/doc/guides/nics/features/cnxk.ini
@@ -8,6 +8,8 @@ Speed capabilities   = Y
 Lock-free Tx queue   = Y
 SR-IOV   = Y
 Multiprocess aware   = Y
+RSS hash = Y
+Inner RSS= Y
 Linux= Y
 ARMv8= Y
 Usage doc= Y
diff --git a/doc/guides/nics/features/cnxk_vec.ini 
b/doc/guides/nics/features/cnxk_vec.ini
index 292ac1e..616991c 100644
--- a/doc/guides/nics/features/cnxk_vec.ini
+++ b/doc/guides/nics/features/cnxk_vec.ini
@@ -8,6 +8,8 @@ Speed capabilities   = Y
 Lock-free Tx queue   = Y
 SR-IOV   = Y
 Multiprocess aware   = Y
+RSS hash = Y
+Inner RSS= Y
 Linux= Y
 ARMv8= Y
 Usage doc= Y
diff --git a/doc/guides/nics/features/cnxk_vf.ini 
b/doc/guides/nics/features/cnxk_vf.ini
index bc2eb8a..a0bd2f1 100644
--- a/doc/guides/nics/features/cnxk_vf.ini
+++ b/doc/guides/nics/features/cnxk_vf.ini
@@ -7,6 +7,8 @@
 Speed capabilities   = Y
 Lock-free Tx queue   = Y
 Multiprocess aware   = Y
+RSS hash = Y
+Inner RSS= Y
 Linux= Y
 ARMv8= Y
 Usage doc= Y
diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c
index ff8ce31..d971bbd 100644
--- a/drivers/net/cnxk/cn10k_ethdev.c
+++ b/drivers/net/cnxk/cn10k_ethdev.c
@@ -4,6 +4,38 @@
 #include "cn10k_ethdev.h"
 
 static int
+cn10k_nix_configure(struct rte_eth_dev *eth_dev)
+{
+   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+   int rc;
+
+   /* Common nix configure */
+   rc = cnxk_nix_configure(eth_dev);
+   if (rc)
+   return rc;
+
+   plt_nix_dbg("Configured port%d platform specific rx_offload_flags=%x"
+   " tx_offload_flags=0x%x",
+   eth_dev->data->port_id, dev->rx_offload_flags,
+   dev->tx_offload_flags);
+   return 0;
+}
+
+/* Update platform specific eth dev ops */
+static void
+nix_eth_dev_ops_override(void)
+{
+   static int init_once;
+
+   if (init_once)
+   return;
+   init_once = 1;
+
+   /* Update platform specific ops */
+   cnxk_eth_dev_ops.dev_configure = cn10k_nix_configure;
+}
+
+static int
 cn10k_nix_remove(struct rte_pci_device *pci_dev)
 {
return cnxk_nix_remove(pci_dev);
@@ -26,6 +58,8 @@ cn10k_nix_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
return rc;
}
 
+   nix_eth_dev_ops_override();
+
/* Common probe */
rc = cnxk_nix_probe(pci_drv, pci_dev);
if (rc)
diff --git a/drivers/net/cnxk/cn9k_ethdev.c b/drivers/net/cnxk/cn9k_ethdev.c
index 98d2d3a..2fb7c14 100644
--- a/drivers/net/cnxk/cn9k_ethdev.c
+++ b/drivers/net/cnxk/cn9k_ethdev.c
@@ -4,6 +4,49 @@
 #include "cn9k_ethdev.h"
 
 static int
+cn9k_nix_configure(struct rte_eth_dev *eth_dev)
+{
+   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+   struct rte_eth_conf *conf = &eth_dev->data->dev_conf;
+   struct rte_eth_txmode *txmode = &conf->txmode;
+   int rc;
+
+   /* Platform specific checks */
+   if ((roc_model_is_cn96_a0() || roc_model_is_cn95_a0()) &&
+   (txmode->offloads & DEV_TX_OFFLOAD_SCTP_CKSUM) &&
+   ((txmode->offloads & DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM) ||
+(txmode->offloads & DEV_TX_OFFLOAD_OUTER_UDP_CKSUM))) {
+   plt_err("Outer IP and SCTP checksum unsupported");
+   return -EINVAL;
+   }
+
+   /* Common nix configure */
+   rc = cnxk_nix_configure(eth_dev);
+   if (rc)
+   return rc;
+
+   plt_nix_dbg("Configured port%d platform specific rx_offload_flags=%x"
+   " tx_offload_flags=0x%x",
+   eth_dev->data->port_id, dev->rx_offload_flags,
+   dev->tx_offl

[dpdk-dev] [PATCH v4 14/62] net/cnxk: support link status update

2021-06-22 Thread Nithin Dabilpuram
Add link status update callback to get current
link status.

Signed-off-by: Nithin Dabilpuram 
---
 doc/guides/nics/cnxk.rst  |   1 +
 doc/guides/nics/features/cnxk.ini |   2 +
 doc/guides/nics/features/cnxk_vec.ini |   2 +
 doc/guides/nics/features/cnxk_vf.ini  |   2 +
 drivers/net/cnxk/cnxk_ethdev.c|   7 +++
 drivers/net/cnxk/cnxk_ethdev.h|   8 +++
 drivers/net/cnxk/cnxk_link.c  | 102 ++
 drivers/net/cnxk/meson.build  |   3 +-
 8 files changed, 126 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/cnxk/cnxk_link.c

diff --git a/doc/guides/nics/cnxk.rst b/doc/guides/nics/cnxk.rst
index 0c2ea89..7bf6cf5 100644
--- a/doc/guides/nics/cnxk.rst
+++ b/doc/guides/nics/cnxk.rst
@@ -20,6 +20,7 @@ Features of the CNXK Ethdev PMD are:
 - Lock-free Tx queue
 - Multiple queues for TX and RX
 - Receiver Side Scaling (RSS)
+- Link state information
 
 Prerequisites
 -
diff --git a/doc/guides/nics/features/cnxk.ini 
b/doc/guides/nics/features/cnxk.ini
index 96dba2a..affbbd9 100644
--- a/doc/guides/nics/features/cnxk.ini
+++ b/doc/guides/nics/features/cnxk.ini
@@ -8,6 +8,8 @@ Speed capabilities   = Y
 Lock-free Tx queue   = Y
 SR-IOV   = Y
 Multiprocess aware   = Y
+Link status  = Y
+Link status event= Y
 RSS hash = Y
 Inner RSS= Y
 Linux= Y
diff --git a/doc/guides/nics/features/cnxk_vec.ini 
b/doc/guides/nics/features/cnxk_vec.ini
index 616991c..836cc9f 100644
--- a/doc/guides/nics/features/cnxk_vec.ini
+++ b/doc/guides/nics/features/cnxk_vec.ini
@@ -8,6 +8,8 @@ Speed capabilities   = Y
 Lock-free Tx queue   = Y
 SR-IOV   = Y
 Multiprocess aware   = Y
+Link status  = Y
+Link status event= Y
 RSS hash = Y
 Inner RSS= Y
 Linux= Y
diff --git a/doc/guides/nics/features/cnxk_vf.ini 
b/doc/guides/nics/features/cnxk_vf.ini
index a0bd2f1..29bb24f 100644
--- a/doc/guides/nics/features/cnxk_vf.ini
+++ b/doc/guides/nics/features/cnxk_vf.ini
@@ -7,6 +7,8 @@
 Speed capabilities   = Y
 Lock-free Tx queue   = Y
 Multiprocess aware   = Y
+Link status  = Y
+Link status event= Y
 RSS hash = Y
 Inner RSS= Y
 Linux= Y
diff --git a/drivers/net/cnxk/cnxk_ethdev.c b/drivers/net/cnxk/cnxk_ethdev.c
index 251d6eb..ea49809 100644
--- a/drivers/net/cnxk/cnxk_ethdev.c
+++ b/drivers/net/cnxk/cnxk_ethdev.c
@@ -601,6 +601,7 @@ cnxk_nix_configure(struct rte_eth_dev *eth_dev)
 /* CNXK platform independent eth dev ops */
 struct eth_dev_ops cnxk_eth_dev_ops = {
.dev_infos_get = cnxk_nix_info_get,
+   .link_update = cnxk_nix_link_update,
 };
 
 static int
@@ -635,6 +636,9 @@ cnxk_eth_dev_init(struct rte_eth_dev *eth_dev)
goto error;
}
 
+   /* Register up msg callbacks */
+   roc_nix_mac_link_cb_register(nix, cnxk_eth_dev_link_status_cb);
+
dev->eth_dev = eth_dev;
dev->configured = 0;
 
@@ -723,6 +727,9 @@ cnxk_eth_dev_uninit(struct rte_eth_dev *eth_dev, bool 
mbox_close)
 
roc_nix_npc_rx_ena_dis(nix, false);
 
+   /* Disable link status events */
+   roc_nix_mac_link_event_start_stop(nix, false);
+
/* Free up SQs */
for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
dev_ops->tx_queue_release(eth_dev->data->tx_queues[i]);
diff --git a/drivers/net/cnxk/cnxk_ethdev.h b/drivers/net/cnxk/cnxk_ethdev.h
index 291f5f9..daa87af 100644
--- a/drivers/net/cnxk/cnxk_ethdev.h
+++ b/drivers/net/cnxk/cnxk_ethdev.h
@@ -15,6 +15,9 @@
 
 #define CNXK_ETH_DEV_PMD_VERSION "1.0"
 
+/* Used for struct cnxk_eth_dev::flags */
+#define CNXK_LINK_CFG_IN_PROGRESS_F BIT_ULL(0)
+
 /* VLAN tag inserted by NIX_TX_VTAG_ACTION.
  * In Tx space is always reserved for this in FRS.
  */
@@ -196,6 +199,11 @@ int cnxk_nix_configure(struct rte_eth_dev *eth_dev);
 uint32_t cnxk_rss_ethdev_to_nix(struct cnxk_eth_dev *dev, uint64_t ethdev_rss,
uint8_t rss_level);
 
+/* Link */
+void cnxk_eth_dev_link_status_cb(struct roc_nix *nix,
+struct roc_nix_link_info *link);
+int cnxk_nix_link_update(struct rte_eth_dev *eth_dev, int wait_to_complete);
+
 /* Devargs */
 int cnxk_ethdev_parse_devargs(struct rte_devargs *devargs,
  struct cnxk_eth_dev *dev);
diff --git a/drivers/net/cnxk/cnxk_link.c b/drivers/net/cnxk/cnxk_link.c
new file mode 100644
index 000..b0273e7
--- /dev/null
+++ b/drivers/net/cnxk/cnxk_link.c
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#include "cnxk_ethdev.h"
+
+static inline int
+nix_wait_for_link_cfg(struct cnxk_eth_dev *dev)
+{
+   uint16_t wait = 1000;
+
+   do {
+   rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+   if (!(dev->flags & CNXK_LINK_CFG_IN_PROGRESS_F))
+   break;
+   wait--;
+ 

[dpdk-dev] [PATCH v4 15/62] net/cnxk: add Rx queue setup and release

2021-06-22 Thread Nithin Dabilpuram
Add Rx queue setup and release ops for the CN9K and CN10K
SoCs. Release is completely common, while setup is platform
dependent due to fast path Rx queue structure variation.
The fast path differs between the platforms partly because
of the core cacheline size difference.

Signed-off-by: Nithin Dabilpuram 
---
 doc/guides/nics/features/cnxk.ini |   1 +
 doc/guides/nics/features/cnxk_vec.ini |   1 +
 doc/guides/nics/features/cnxk_vf.ini  |   1 +
 drivers/net/cnxk/cn10k_ethdev.c   |  44 +
 drivers/net/cnxk/cn10k_ethdev.h   |  14 +++
 drivers/net/cnxk/cn9k_ethdev.c|  44 +
 drivers/net/cnxk/cn9k_ethdev.h|  14 +++
 drivers/net/cnxk/cnxk_ethdev.c| 172 ++
 drivers/net/cnxk/cnxk_ethdev.h|   9 ++
 9 files changed, 300 insertions(+)

diff --git a/doc/guides/nics/features/cnxk.ini 
b/doc/guides/nics/features/cnxk.ini
index affbbd9..a9d2b03 100644
--- a/doc/guides/nics/features/cnxk.ini
+++ b/doc/guides/nics/features/cnxk.ini
@@ -10,6 +10,7 @@ SR-IOV   = Y
 Multiprocess aware   = Y
 Link status  = Y
 Link status event= Y
+Runtime Rx queue setup = Y
 RSS hash = Y
 Inner RSS= Y
 Linux= Y
diff --git a/doc/guides/nics/features/cnxk_vec.ini 
b/doc/guides/nics/features/cnxk_vec.ini
index 836cc9f..6a8ca1f 100644
--- a/doc/guides/nics/features/cnxk_vec.ini
+++ b/doc/guides/nics/features/cnxk_vec.ini
@@ -10,6 +10,7 @@ SR-IOV   = Y
 Multiprocess aware   = Y
 Link status  = Y
 Link status event= Y
+Runtime Rx queue setup = Y
 RSS hash = Y
 Inner RSS= Y
 Linux= Y
diff --git a/doc/guides/nics/features/cnxk_vf.ini 
b/doc/guides/nics/features/cnxk_vf.ini
index 29bb24f..f761638 100644
--- a/doc/guides/nics/features/cnxk_vf.ini
+++ b/doc/guides/nics/features/cnxk_vf.ini
@@ -9,6 +9,7 @@ Lock-free Tx queue   = Y
 Multiprocess aware   = Y
 Link status  = Y
 Link status event= Y
+Runtime Rx queue setup = Y
 RSS hash = Y
 Inner RSS= Y
 Linux= Y
diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c
index d971bbd..b87c4e5 100644
--- a/drivers/net/cnxk/cn10k_ethdev.c
+++ b/drivers/net/cnxk/cn10k_ethdev.c
@@ -4,6 +4,49 @@
 #include "cn10k_ethdev.h"
 
 static int
+cn10k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid,
+uint16_t nb_desc, unsigned int socket,
+const struct rte_eth_rxconf *rx_conf,
+struct rte_mempool *mp)
+{
+   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+   struct cn10k_eth_rxq *rxq;
+   struct roc_nix_rq *rq;
+   struct roc_nix_cq *cq;
+   int rc;
+
+   RTE_SET_USED(socket);
+
+   /* CQ Errata needs min 4K ring */
+   if (dev->cq_min_4k && nb_desc < 4096)
+   nb_desc = 4096;
+
+   /* Common Rx queue setup */
+   rc = cnxk_nix_rx_queue_setup(eth_dev, qid, nb_desc,
+sizeof(struct cn10k_eth_rxq), rx_conf, mp);
+   if (rc)
+   return rc;
+
+   rq = &dev->rqs[qid];
+   cq = &dev->cqs[qid];
+
+   /* Update fast path queue */
+   rxq = eth_dev->data->rx_queues[qid];
+   rxq->rq = qid;
+   rxq->desc = (uintptr_t)cq->desc_base;
+   rxq->cq_door = cq->door;
+   rxq->cq_status = cq->status;
+   rxq->wdata = cq->wdata;
+   rxq->head = cq->head;
+   rxq->qmask = cq->qmask;
+
+   /* Data offset from data to start of mbuf is first_skip */
+   rxq->data_off = rq->first_skip;
+   rxq->mbuf_initializer = cnxk_nix_rxq_mbuf_setup(dev);
+   return 0;
+}
+
+static int
 cn10k_nix_configure(struct rte_eth_dev *eth_dev)
 {
struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
@@ -33,6 +76,7 @@ nix_eth_dev_ops_override(void)
 
/* Update platform specific ops */
cnxk_eth_dev_ops.dev_configure = cn10k_nix_configure;
+   cnxk_eth_dev_ops.rx_queue_setup = cn10k_nix_rx_queue_setup;
 }
 
 static int
diff --git a/drivers/net/cnxk/cn10k_ethdev.h b/drivers/net/cnxk/cn10k_ethdev.h
index 1bf4a65..08e11bb 100644
--- a/drivers/net/cnxk/cn10k_ethdev.h
+++ b/drivers/net/cnxk/cn10k_ethdev.h
@@ -6,4 +6,18 @@
 
 #include 
 
+struct cn10k_eth_rxq {
+   uint64_t mbuf_initializer;
+   uintptr_t desc;
+   void *lookup_mem;
+   uintptr_t cq_door;
+   uint64_t wdata;
+   int64_t *cq_status;
+   uint32_t head;
+   uint32_t qmask;
+   uint32_t available;
+   uint16_t data_off;
+   uint16_t rq;
+} __plt_cache_aligned;
+
 #endif /* __CN10K_ETHDEV_H__ */
diff --git a/drivers/net/cnxk/cn9k_ethdev.c b/drivers/net/cnxk/cn9k_ethdev.c
index 2fb7c14..2ab035e 100644
--- a/drivers/net/cnxk/cn9k_ethdev.c
+++ b/drivers/net/cnxk/cn9k_ethdev.c
@@ -4,6 +4,49 @@
 #include "cn9k_ethdev.h"
 
 static int
+cn9k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid,
+   uint16_t nb_

[dpdk-dev] [PATCH v4 16/62] net/cnxk: add Tx queue setup and release

2021-06-22 Thread Nithin Dabilpuram
Add Tx queue setup and release for CN9K and CN10K.
Release is common while setup is platform dependent due
to differences in fast path Tx queue structures.

Signed-off-by: Nithin Dabilpuram 
---
 doc/guides/nics/features/cnxk.ini |  1 +
 doc/guides/nics/features/cnxk_vec.ini |  1 +
 doc/guides/nics/features/cnxk_vf.ini  |  1 +
 drivers/net/cnxk/cn10k_ethdev.c   | 72 +
 drivers/net/cnxk/cn10k_ethdev.h   | 13 +
 drivers/net/cnxk/cn10k_tx.h   | 13 +
 drivers/net/cnxk/cn9k_ethdev.c| 70 +
 drivers/net/cnxk/cn9k_ethdev.h| 11 
 drivers/net/cnxk/cn9k_tx.h| 13 +
 drivers/net/cnxk/cnxk_ethdev.c| 98 +++
 drivers/net/cnxk/cnxk_ethdev.h|  3 ++
 11 files changed, 296 insertions(+)
 create mode 100644 drivers/net/cnxk/cn10k_tx.h
 create mode 100644 drivers/net/cnxk/cn9k_tx.h

diff --git a/doc/guides/nics/features/cnxk.ini 
b/doc/guides/nics/features/cnxk.ini
index a9d2b03..462d7c4 100644
--- a/doc/guides/nics/features/cnxk.ini
+++ b/doc/guides/nics/features/cnxk.ini
@@ -11,6 +11,7 @@ Multiprocess aware   = Y
 Link status  = Y
 Link status event= Y
 Runtime Rx queue setup = Y
+Runtime Tx queue setup = Y
 RSS hash = Y
 Inner RSS= Y
 Linux= Y
diff --git a/doc/guides/nics/features/cnxk_vec.ini 
b/doc/guides/nics/features/cnxk_vec.ini
index 6a8ca1f..09e0d3a 100644
--- a/doc/guides/nics/features/cnxk_vec.ini
+++ b/doc/guides/nics/features/cnxk_vec.ini
@@ -11,6 +11,7 @@ Multiprocess aware   = Y
 Link status  = Y
 Link status event= Y
 Runtime Rx queue setup = Y
+Runtime Tx queue setup = Y
 RSS hash = Y
 Inner RSS= Y
 Linux= Y
diff --git a/doc/guides/nics/features/cnxk_vf.ini 
b/doc/guides/nics/features/cnxk_vf.ini
index f761638..4a93a35 100644
--- a/doc/guides/nics/features/cnxk_vf.ini
+++ b/doc/guides/nics/features/cnxk_vf.ini
@@ -10,6 +10,7 @@ Multiprocess aware   = Y
 Link status  = Y
 Link status event= Y
 Runtime Rx queue setup = Y
+Runtime Tx queue setup = Y
 RSS hash = Y
 Inner RSS= Y
 Linux= Y
diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c
index b87c4e5..454c8ca 100644
--- a/drivers/net/cnxk/cn10k_ethdev.c
+++ b/drivers/net/cnxk/cn10k_ethdev.c
@@ -2,6 +2,77 @@
  * Copyright(C) 2021 Marvell.
  */
 #include "cn10k_ethdev.h"
+#include "cn10k_tx.h"
+
+static void
+nix_form_default_desc(struct cnxk_eth_dev *dev, struct cn10k_eth_txq *txq,
+ uint16_t qid)
+{
+   struct nix_send_ext_s *send_hdr_ext;
+   union nix_send_hdr_w0_u send_hdr_w0;
+   union nix_send_sg_s sg_w0;
+
+   RTE_SET_USED(dev);
+
+   /* Initialize the fields based on basic single segment packet */
+   memset(&txq->cmd, 0, sizeof(txq->cmd));
+   send_hdr_w0.u = 0;
+   sg_w0.u = 0;
+
+   if (dev->tx_offload_flags & NIX_TX_NEED_EXT_HDR) {
+   /* 2(HDR) + 2(EXT_HDR) + 1(SG) + 1(IOVA) = 6/2 - 1 = 2 */
+   send_hdr_w0.sizem1 = 2;
+
+   send_hdr_ext = (struct nix_send_ext_s *)&txq->cmd[0];
+   send_hdr_ext->w0.subdc = NIX_SUBDC_EXT;
+   } else {
+   /* 2(HDR) + 1(SG) + 1(IOVA) = 4/2 - 1 = 1 */
+   send_hdr_w0.sizem1 = 1;
+   }
+
+   send_hdr_w0.sq = qid;
+   sg_w0.subdc = NIX_SUBDC_SG;
+   sg_w0.segs = 1;
+   sg_w0.ld_type = NIX_SENDLDTYPE_LDD;
+
+   txq->send_hdr_w0 = send_hdr_w0.u;
+   txq->sg_w0 = sg_w0.u;
+
+   rte_wmb();
+}
+
+static int
+cn10k_nix_tx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid,
+uint16_t nb_desc, unsigned int socket,
+const struct rte_eth_txconf *tx_conf)
+{
+   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+   struct cn10k_eth_txq *txq;
+   struct roc_nix_sq *sq;
+   int rc;
+
+   RTE_SET_USED(socket);
+
+   /* Common Tx queue setup */
+   rc = cnxk_nix_tx_queue_setup(eth_dev, qid, nb_desc,
+sizeof(struct cn10k_eth_txq), tx_conf);
+   if (rc)
+   return rc;
+
+   sq = &dev->sqs[qid];
+   /* Update fast path queue */
+   txq = eth_dev->data->tx_queues[qid];
+   txq->fc_mem = sq->fc;
+   /* Store lmt base in tx queue for easy access */
+   txq->lmt_base = dev->nix.lmt_base;
+   txq->io_addr = sq->io_addr;
+   txq->nb_sqb_bufs_adj = sq->nb_sqb_bufs_adj;
+   txq->sqes_per_sqb_log2 = sq->sqes_per_sqb_log2;
+
+   nix_form_default_desc(dev, txq, qid);
+   txq->lso_tun_fmt = dev->lso_tun_fmt;
+   return 0;
+}
 
 static int
 cn10k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid,
@@ -76,6 +147,7 @@ nix_eth_dev_ops_override(void)
 
/* Update platform specific ops */
cnxk_eth_dev_ops.dev_configure = cn10k_nix_configure;
+   cnxk_

[dpdk-dev] [PATCH v4 17/62] net/cnxk: support packet type

2021-06-22 Thread Nithin Dabilpuram
Add support for packet type lookup on Rx to translate HW
specific types to RTE_PTYPE_* defines.

Signed-off-by: Nithin Dabilpuram 
---
 doc/guides/nics/cnxk.rst  |   1 +
 doc/guides/nics/features/cnxk.ini |   1 +
 doc/guides/nics/features/cnxk_vec.ini |   1 +
 doc/guides/nics/features/cnxk_vf.ini  |   1 +
 drivers/net/cnxk/cn10k_ethdev.c   |  21 +++
 drivers/net/cnxk/cn10k_rx.h   |  11 ++
 drivers/net/cnxk/cn9k_ethdev.c|  21 +++
 drivers/net/cnxk/cn9k_rx.h|  12 ++
 drivers/net/cnxk/cnxk_ethdev.c|   2 +
 drivers/net/cnxk/cnxk_ethdev.h|  14 ++
 drivers/net/cnxk/cnxk_lookup.c| 326 ++
 drivers/net/cnxk/meson.build  |   3 +-
 12 files changed, 413 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/cnxk/cn10k_rx.h
 create mode 100644 drivers/net/cnxk/cn9k_rx.h
 create mode 100644 drivers/net/cnxk/cnxk_lookup.c

diff --git a/doc/guides/nics/cnxk.rst b/doc/guides/nics/cnxk.rst
index 7bf6cf5..8bc85c0 100644
--- a/doc/guides/nics/cnxk.rst
+++ b/doc/guides/nics/cnxk.rst
@@ -16,6 +16,7 @@ Features
 
 Features of the CNXK Ethdev PMD are:
 
+- Packet type information
 - SR-IOV VF
 - Lock-free Tx queue
 - Multiple queues for TX and RX
diff --git a/doc/guides/nics/features/cnxk.ini 
b/doc/guides/nics/features/cnxk.ini
index 462d7c4..503582c 100644
--- a/doc/guides/nics/features/cnxk.ini
+++ b/doc/guides/nics/features/cnxk.ini
@@ -14,6 +14,7 @@ Runtime Rx queue setup = Y
 Runtime Tx queue setup = Y
 RSS hash = Y
 Inner RSS= Y
+Packet type parsing  = Y
 Linux= Y
 ARMv8= Y
 Usage doc= Y
diff --git a/doc/guides/nics/features/cnxk_vec.ini 
b/doc/guides/nics/features/cnxk_vec.ini
index 09e0d3a..9ad225a 100644
--- a/doc/guides/nics/features/cnxk_vec.ini
+++ b/doc/guides/nics/features/cnxk_vec.ini
@@ -14,6 +14,7 @@ Runtime Rx queue setup = Y
 Runtime Tx queue setup = Y
 RSS hash = Y
 Inner RSS= Y
+Packet type parsing  = Y
 Linux= Y
 ARMv8= Y
 Usage doc= Y
diff --git a/doc/guides/nics/features/cnxk_vf.ini 
b/doc/guides/nics/features/cnxk_vf.ini
index 4a93a35..8c93ba7 100644
--- a/doc/guides/nics/features/cnxk_vf.ini
+++ b/doc/guides/nics/features/cnxk_vf.ini
@@ -13,6 +13,7 @@ Runtime Rx queue setup = Y
 Runtime Tx queue setup = Y
 RSS hash = Y
 Inner RSS= Y
+Packet type parsing  = Y
 Linux= Y
 ARMv8= Y
 Usage doc= Y
diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c
index 454c8ca..f79d03c 100644
--- a/drivers/net/cnxk/cn10k_ethdev.c
+++ b/drivers/net/cnxk/cn10k_ethdev.c
@@ -2,8 +2,25 @@
  * Copyright(C) 2021 Marvell.
  */
 #include "cn10k_ethdev.h"
+#include "cn10k_rx.h"
 #include "cn10k_tx.h"
 
+static int
+cn10k_nix_ptypes_set(struct rte_eth_dev *eth_dev, uint32_t ptype_mask)
+{
+   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+   if (ptype_mask) {
+   dev->rx_offload_flags |= NIX_RX_OFFLOAD_PTYPE_F;
+   dev->ptype_disable = 0;
+   } else {
+   dev->rx_offload_flags &= ~NIX_RX_OFFLOAD_PTYPE_F;
+   dev->ptype_disable = 1;
+   }
+
+   return 0;
+}
+
 static void
 nix_form_default_desc(struct cnxk_eth_dev *dev, struct cn10k_eth_txq *txq,
  uint16_t qid)
@@ -114,6 +131,9 @@ cn10k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, 
uint16_t qid,
/* Data offset from data to start of mbuf is first_skip */
rxq->data_off = rq->first_skip;
rxq->mbuf_initializer = cnxk_nix_rxq_mbuf_setup(dev);
+
+   /* Lookup mem */
+   rxq->lookup_mem = cnxk_nix_fastpath_lookup_mem_get();
return 0;
 }
 
@@ -149,6 +169,7 @@ nix_eth_dev_ops_override(void)
cnxk_eth_dev_ops.dev_configure = cn10k_nix_configure;
cnxk_eth_dev_ops.tx_queue_setup = cn10k_nix_tx_queue_setup;
cnxk_eth_dev_ops.rx_queue_setup = cn10k_nix_rx_queue_setup;
+   cnxk_eth_dev_ops.dev_ptypes_set = cn10k_nix_ptypes_set;
 }
 
 static int
diff --git a/drivers/net/cnxk/cn10k_rx.h b/drivers/net/cnxk/cn10k_rx.h
new file mode 100644
index 000..d3d1661
--- /dev/null
+++ b/drivers/net/cnxk/cn10k_rx.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+#ifndef __CN10K_RX_H__
+#define __CN10K_RX_H__
+
+#include 
+
+#define NIX_RX_OFFLOAD_PTYPE_F  BIT(1)
+
+#endif /* __CN10K_RX_H__ */
diff --git a/drivers/net/cnxk/cn9k_ethdev.c b/drivers/net/cnxk/cn9k_ethdev.c
index 5c696c8..19b3727 100644
--- a/drivers/net/cnxk/cn9k_ethdev.c
+++ b/drivers/net/cnxk/cn9k_ethdev.c
@@ -2,8 +2,25 @@
  * Copyright(C) 2021 Marvell.
  */
 #include "cn9k_ethdev.h"
+#include "cn9k_rx.h"
 #include "cn9k_tx.h"
 
+static int
+cn9k_nix_ptypes_set(struct rte_eth_dev *eth_dev, uint32_t ptype_mask)
+{
+   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+   if

[dpdk-dev] [PATCH v4 18/62] net/cnxk: support queue start and stop

2021-06-22 Thread Nithin Dabilpuram
Add Rx/Tx queue start and stop callbacks for
CN9K and CN10K.

Signed-off-by: Nithin Dabilpuram 
---
 doc/guides/nics/features/cnxk.ini |  1 +
 doc/guides/nics/features/cnxk_vec.ini |  1 +
 doc/guides/nics/features/cnxk_vf.ini  |  1 +
 drivers/net/cnxk/cn10k_ethdev.c   | 16 ++
 drivers/net/cnxk/cn9k_ethdev.c| 16 ++
 drivers/net/cnxk/cnxk_ethdev.c| 92 +++
 drivers/net/cnxk/cnxk_ethdev.h|  1 +
 7 files changed, 128 insertions(+)

diff --git a/doc/guides/nics/features/cnxk.ini 
b/doc/guides/nics/features/cnxk.ini
index 503582c..712f8d5 100644
--- a/doc/guides/nics/features/cnxk.ini
+++ b/doc/guides/nics/features/cnxk.ini
@@ -12,6 +12,7 @@ Link status  = Y
 Link status event= Y
 Runtime Rx queue setup = Y
 Runtime Tx queue setup = Y
+Queue start/stop = Y
 RSS hash = Y
 Inner RSS= Y
 Packet type parsing  = Y
diff --git a/doc/guides/nics/features/cnxk_vec.ini 
b/doc/guides/nics/features/cnxk_vec.ini
index 9ad225a..82f2af0 100644
--- a/doc/guides/nics/features/cnxk_vec.ini
+++ b/doc/guides/nics/features/cnxk_vec.ini
@@ -12,6 +12,7 @@ Link status  = Y
 Link status event= Y
 Runtime Rx queue setup = Y
 Runtime Tx queue setup = Y
+Queue start/stop = Y
 RSS hash = Y
 Inner RSS= Y
 Packet type parsing  = Y
diff --git a/doc/guides/nics/features/cnxk_vf.ini 
b/doc/guides/nics/features/cnxk_vf.ini
index 8c93ba7..61fed11 100644
--- a/doc/guides/nics/features/cnxk_vf.ini
+++ b/doc/guides/nics/features/cnxk_vf.ini
@@ -11,6 +11,7 @@ Link status  = Y
 Link status event= Y
 Runtime Rx queue setup = Y
 Runtime Tx queue setup = Y
+Queue start/stop = Y
 RSS hash = Y
 Inner RSS= Y
 Packet type parsing  = Y
diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c
index f79d03c..d70ab00 100644
--- a/drivers/net/cnxk/cn10k_ethdev.c
+++ b/drivers/net/cnxk/cn10k_ethdev.c
@@ -138,6 +138,21 @@ cn10k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, 
uint16_t qid,
 }
 
 static int
+cn10k_nix_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t qidx)
+{
+   struct cn10k_eth_txq *txq = eth_dev->data->tx_queues[qidx];
+   int rc;
+
+   rc = cnxk_nix_tx_queue_stop(eth_dev, qidx);
+   if (rc)
+   return rc;
+
+   /* Clear fc cache pkts to trigger worker stop */
+   txq->fc_cache_pkts = 0;
+   return 0;
+}
+
+static int
 cn10k_nix_configure(struct rte_eth_dev *eth_dev)
 {
struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
@@ -169,6 +184,7 @@ nix_eth_dev_ops_override(void)
cnxk_eth_dev_ops.dev_configure = cn10k_nix_configure;
cnxk_eth_dev_ops.tx_queue_setup = cn10k_nix_tx_queue_setup;
cnxk_eth_dev_ops.rx_queue_setup = cn10k_nix_rx_queue_setup;
+   cnxk_eth_dev_ops.tx_queue_stop = cn10k_nix_tx_queue_stop;
cnxk_eth_dev_ops.dev_ptypes_set = cn10k_nix_ptypes_set;
 }
 
diff --git a/drivers/net/cnxk/cn9k_ethdev.c b/drivers/net/cnxk/cn9k_ethdev.c
index 19b3727..806e95f 100644
--- a/drivers/net/cnxk/cn9k_ethdev.c
+++ b/drivers/net/cnxk/cn9k_ethdev.c
@@ -136,6 +136,21 @@ cn9k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, 
uint16_t qid,
 }
 
 static int
+cn9k_nix_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t qidx)
+{
+   struct cn9k_eth_txq *txq = eth_dev->data->tx_queues[qidx];
+   int rc;
+
+   rc = cnxk_nix_tx_queue_stop(eth_dev, qidx);
+   if (rc)
+   return rc;
+
+   /* Clear fc cache pkts to trigger worker stop */
+   txq->fc_cache_pkts = 0;
+   return 0;
+}
+
+static int
 cn9k_nix_configure(struct rte_eth_dev *eth_dev)
 {
struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
@@ -178,6 +193,7 @@ nix_eth_dev_ops_override(void)
cnxk_eth_dev_ops.dev_configure = cn9k_nix_configure;
cnxk_eth_dev_ops.tx_queue_setup = cn9k_nix_tx_queue_setup;
cnxk_eth_dev_ops.rx_queue_setup = cn9k_nix_rx_queue_setup;
+   cnxk_eth_dev_ops.tx_queue_stop = cn9k_nix_tx_queue_stop;
cnxk_eth_dev_ops.dev_ptypes_set = cn9k_nix_ptypes_set;
 }
 
diff --git a/drivers/net/cnxk/cnxk_ethdev.c b/drivers/net/cnxk/cnxk_ethdev.c
index b1ed046..6c20098 100644
--- a/drivers/net/cnxk/cnxk_ethdev.c
+++ b/drivers/net/cnxk/cnxk_ethdev.c
@@ -866,12 +866,104 @@ cnxk_nix_configure(struct rte_eth_dev *eth_dev)
return rc;
 }
 
+static int
+cnxk_nix_tx_queue_start(struct rte_eth_dev *eth_dev, uint16_t qid)
+{
+   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+   struct rte_eth_dev_data *data = eth_dev->data;
+   struct roc_nix_sq *sq = &dev->sqs[qid];
+   int rc = -EINVAL;
+
+   if (data->tx_queue_state[qid] == RTE_ETH_QUEUE_STATE_STARTED)
+   return 0;
+
+   rc = roc_nix_tm_sq_aura_fc(sq, true);
+   if (rc) {
+   plt_err("Failed to enable sq aura fc, txq=%u, rc=%d", qid, rc);
+   goto done;
+   }
+
+   data->tx_queue_state[qid] = 

[dpdk-dev] [PATCH v4 19/62] net/cnxk: add Rx burst for cn9k

2021-06-22 Thread Nithin Dabilpuram
From: Jerin Jacob 

Add Rx burst scalar version for CN9K.

Signed-off-by: Jerin Jacob 
---
 drivers/net/cnxk/cn9k_ethdev.h |   3 +
 drivers/net/cnxk/cn9k_rx.c |  46 
 drivers/net/cnxk/cn9k_rx.h | 237 +
 drivers/net/cnxk/cnxk_ethdev.h |   3 +
 drivers/net/cnxk/meson.build   |   3 +-
 5 files changed, 291 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/cnxk/cn9k_rx.c

diff --git a/drivers/net/cnxk/cn9k_ethdev.h b/drivers/net/cnxk/cn9k_ethdev.h
index bd7bf50..bab5540 100644
--- a/drivers/net/cnxk/cn9k_ethdev.h
+++ b/drivers/net/cnxk/cn9k_ethdev.h
@@ -31,4 +31,7 @@ struct cn9k_eth_rxq {
uint16_t rq;
 } __plt_cache_aligned;
 
+/* Rx and Tx routines */
+void cn9k_eth_set_rx_function(struct rte_eth_dev *eth_dev);
+
 #endif /* __CN9K_ETHDEV_H__ */
diff --git a/drivers/net/cnxk/cn9k_rx.c b/drivers/net/cnxk/cn9k_rx.c
new file mode 100644
index 000..a4297f9
--- /dev/null
+++ b/drivers/net/cnxk/cn9k_rx.c
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#include "cn9k_ethdev.h"
+#include "cn9k_rx.h"
+
+#define R(name, f3, f2, f1, f0, flags)\
+   uint16_t __rte_noinline __rte_hot cn9k_nix_recv_pkts_##name(   \
+   void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)  \
+   {  \
+   return cn9k_nix_recv_pkts(rx_queue, rx_pkts, pkts, (flags));   \
+   }
+
+NIX_RX_FASTPATH_MODES
+#undef R
+
+static inline void
+pick_rx_func(struct rte_eth_dev *eth_dev,
+const eth_rx_burst_t rx_burst[2][2][2][2])
+{
+   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+   /* [MARK] [CKSUM] [PTYPE] [RSS] */
+   eth_dev->rx_pkt_burst = rx_burst
+   [!!(dev->rx_offload_flags & NIX_RX_OFFLOAD_MARK_UPDATE_F)]
+   [!!(dev->rx_offload_flags & NIX_RX_OFFLOAD_CHECKSUM_F)]
+   [!!(dev->rx_offload_flags & NIX_RX_OFFLOAD_PTYPE_F)]
+   [!!(dev->rx_offload_flags & NIX_RX_OFFLOAD_RSS_F)];
+}
+
+void
+cn9k_eth_set_rx_function(struct rte_eth_dev *eth_dev)
+{
+   const eth_rx_burst_t nix_eth_rx_burst[2][2][2][2] = {
+#define R(name, f3, f2, f1, f0, flags) \
+   [f3][f2][f1][f0] = cn9k_nix_recv_pkts_##name,
+
+   NIX_RX_FASTPATH_MODES
+#undef R
+   };
+
+   pick_rx_func(eth_dev, nix_eth_rx_burst);
+
+   rte_mb();
+}
diff --git a/drivers/net/cnxk/cn9k_rx.h b/drivers/net/cnxk/cn9k_rx.h
index 95a1e69..92f3c7c 100644
--- a/drivers/net/cnxk/cn9k_rx.h
+++ b/drivers/net/cnxk/cn9k_rx.h
@@ -7,6 +7,243 @@
 
 #include 
 
+#define NIX_RX_OFFLOAD_NONE (0)
 #define NIX_RX_OFFLOAD_RSS_F BIT(0)
 #define NIX_RX_OFFLOAD_PTYPE_F  BIT(1)
+#define NIX_RX_OFFLOAD_CHECKSUM_F BIT(2)
+#define NIX_RX_OFFLOAD_MARK_UPDATE_F BIT(3)
+
+/* Flags to control cqe_to_mbuf conversion function.
+ * Defining it from backwards to denote its been
+ * not used as offload flags to pick function
+ */
+#define NIX_RX_MULTI_SEG_F BIT(15)
+
+#define CNXK_NIX_CQ_ENTRY_SZ 128
+#define NIX_DESCS_PER_LOOP   4
+#define CQE_CAST(x) ((struct nix_cqe_hdr_s *)(x))
+#define CQE_SZ(x)   ((x) * CNXK_NIX_CQ_ENTRY_SZ)
+
+union mbuf_initializer {
+   struct {
+   uint16_t data_off;
+   uint16_t refcnt;
+   uint16_t nb_segs;
+   uint16_t port;
+   } fields;
+   uint64_t value;
+};
+
+static __rte_always_inline uint64_t
+nix_clear_data_off(uint64_t oldval)
+{
+   union mbuf_initializer mbuf_init = {.value = oldval};
+
+   mbuf_init.fields.data_off = 0;
+   return mbuf_init.value;
+}
+
+static __rte_always_inline struct rte_mbuf *
+nix_get_mbuf_from_cqe(void *cq, const uint64_t data_off)
+{
+   rte_iova_t buff;
+
+   /* Skip CQE, NIX_RX_PARSE_S and SG HDR(9 DWORDs) and peek buff addr */
+   buff = *((rte_iova_t *)((uint64_t *)cq + 9));
+   return (struct rte_mbuf *)(buff - data_off);
+}
+
+static __rte_always_inline uint32_t
+nix_ptype_get(const void *const lookup_mem, const uint64_t in)
+{
+   const uint16_t *const ptype = lookup_mem;
+   const uint16_t lh_lg_lf = (in & 0xFFF0000000000000) >> 52;
+   const uint16_t tu_l2 = ptype[(in & 0x000FFFF000000000) >> 36];
+   const uint16_t il4_tu = ptype[PTYPE_NON_TUNNEL_ARRAY_SZ + lh_lg_lf];
+
+   return (il4_tu << PTYPE_NON_TUNNEL_WIDTH) | tu_l2;
+}
+
+static __rte_always_inline uint32_t
+nix_rx_olflags_get(const void *const lookup_mem, const uint64_t in)
+{
+   const uint32_t *const ol_flags =
+   (const uint32_t *)((const uint8_t *)lookup_mem +
+  PTYPE_ARRAY_SZ);
+
+   return ol_flags[(in & 0xfff00000) >> 20];
+}
+
+static inline uint64_t
+nix_update_match_id(const uint16_t match_id, uint64_t ol_flags,
+   struct rte_mbuf *mbuf)
+{
+ 

[dpdk-dev] [PATCH v4 20/62] net/cnxk: add Rx multi-segmented version for cn9k

2021-06-22 Thread Nithin Dabilpuram
Add Rx burst multi-segmented version for CN9K.

Signed-off-by: Nithin Dabilpuram 
Signed-off-by: Pavan Nikhilesh 
---
 drivers/net/cnxk/cn9k_rx.c  | 17 
 drivers/net/cnxk/cn9k_rx.h  | 60 ++---
 drivers/net/cnxk/cn9k_rx_mseg.c | 17 
 drivers/net/cnxk/cnxk_ethdev.h  |  3 +++
 drivers/net/cnxk/meson.build|  3 ++-
 5 files changed, 96 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/cnxk/cn9k_rx_mseg.c

diff --git a/drivers/net/cnxk/cn9k_rx.c b/drivers/net/cnxk/cn9k_rx.c
index a4297f9..87a62c9 100644
--- a/drivers/net/cnxk/cn9k_rx.c
+++ b/drivers/net/cnxk/cn9k_rx.c
@@ -32,6 +32,8 @@ pick_rx_func(struct rte_eth_dev *eth_dev,
 void
 cn9k_eth_set_rx_function(struct rte_eth_dev *eth_dev)
 {
+   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
const eth_rx_burst_t nix_eth_rx_burst[2][2][2][2] = {
 #define R(name, f3, f2, f1, f0, flags) \
[f3][f2][f1][f0] = cn9k_nix_recv_pkts_##name,
@@ -40,7 +42,22 @@ cn9k_eth_set_rx_function(struct rte_eth_dev *eth_dev)
 #undef R
};
 
+   const eth_rx_burst_t nix_eth_rx_burst_mseg[2][2][2][2] = {
+#define R(name, f3, f2, f1, f0, flags) \
+   [f3][f2][f1][f0] = cn9k_nix_recv_pkts_mseg_##name,
+
+   NIX_RX_FASTPATH_MODES
+#undef R
+   };
+
pick_rx_func(eth_dev, nix_eth_rx_burst);
 
+   if (dev->rx_offloads & DEV_RX_OFFLOAD_SCATTER)
+   pick_rx_func(eth_dev, nix_eth_rx_burst_mseg);
+
+   /* Copy multi seg version with no offload for tear down sequence */
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+   dev->rx_pkt_burst_no_offload =
+   nix_eth_rx_burst_mseg[0][0][0][0];
rte_mb();
 }
diff --git a/drivers/net/cnxk/cn9k_rx.h b/drivers/net/cnxk/cn9k_rx.h
index 92f3c7c..49f80ce 100644
--- a/drivers/net/cnxk/cn9k_rx.h
+++ b/drivers/net/cnxk/cn9k_rx.h
@@ -104,6 +104,53 @@ nix_update_match_id(const uint16_t match_id, uint64_t 
ol_flags,
 }
 
 static __rte_always_inline void
+nix_cqe_xtract_mseg(const union nix_rx_parse_u *rx, struct rte_mbuf *mbuf,
+   uint64_t rearm)
+{
+   const rte_iova_t *iova_list;
+   struct rte_mbuf *head;
+   const rte_iova_t *eol;
+   uint8_t nb_segs;
+   uint64_t sg;
+
+   sg = *(const uint64_t *)(rx + 1);
+   nb_segs = (sg >> 48) & 0x3;
+   mbuf->nb_segs = nb_segs;
+   mbuf->data_len = sg & 0xFFFF;
+   sg = sg >> 16;
+
+   eol = ((const rte_iova_t *)(rx + 1) +
+  ((rx->cn9k.desc_sizem1 + 1) << 1));
+   /* Skip SG_S and first IOVA*/
+   iova_list = ((const rte_iova_t *)(rx + 1)) + 2;
+   nb_segs--;
+
+   rearm = rearm & ~0xFFFF;
+
+   head = mbuf;
+   while (nb_segs) {
+   mbuf->next = ((struct rte_mbuf *)*iova_list) - 1;
+   mbuf = mbuf->next;
+
+   __mempool_check_cookies(mbuf->pool, (void **)&mbuf, 1, 1);
+
+   mbuf->data_len = sg & 0xFFFF;
+   sg = sg >> 16;
+   *(uint64_t *)(&mbuf->rearm_data) = rearm;
+   nb_segs--;
+   iova_list++;
+
+   if (!nb_segs && (iova_list + 1 < eol)) {
+   sg = *(const uint64_t *)(iova_list);
+   nb_segs = (sg >> 48) & 0x3;
+   head->nb_segs += nb_segs;
+   iova_list = (const rte_iova_t *)(iova_list + 1);
+   }
+   }
+   mbuf->next = NULL;
+}
+
+static __rte_always_inline void
 cn9k_nix_cqe_to_mbuf(const struct nix_cqe_hdr_s *cq, const uint32_t tag,
 struct rte_mbuf *mbuf, const void *lookup_mem,
 const uint64_t val, const uint16_t flag)
@@ -138,8 +185,12 @@ cn9k_nix_cqe_to_mbuf(const struct nix_cqe_hdr_s *cq, const 
uint32_t tag,
*(uint64_t *)(&mbuf->rearm_data) = val;
mbuf->pkt_len = len;
 
-   mbuf->data_len = len;
-   mbuf->next = NULL;
+   if (flag & NIX_RX_MULTI_SEG_F) {
+   nix_cqe_xtract_mseg(rx, mbuf, val);
+   } else {
+   mbuf->data_len = len;
+   mbuf->next = NULL;
+   }
 }
 
 static inline uint16_t
@@ -239,8 +290,11 @@ R(mark_cksum_rss,  1, 1, 0, 1, MARK_F | CKSUM_F | 
RSS_F)  \
 R(mark_cksum_ptype,1, 1, 1, 0, MARK_F | CKSUM_F | PTYPE_F)\
 R(mark_cksum_ptype_rss,1, 1, 1, 1, MARK_F | CKSUM_F | PTYPE_F 
| RSS_F)
 
-#define R(name, f3, f2, f1, f0, flags)\
+#define R(name, f3, f2, f1, f0, flags) 
\
uint16_t __rte_noinline __rte_hot cn9k_nix_recv_pkts_##name(   \
+   void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts); \
+  \
+   uint16_t __rte_noinline __rte_hot cn9k_nix_recv_pkts_mseg_##name(  \
  

[dpdk-dev] [PATCH v4 21/62] net/cnxk: add Rx vector version for cn9k

2021-06-22 Thread Nithin Dabilpuram
From: Jerin Jacob 

Add Rx burst vector version for CN9K.

Signed-off-by: Jerin Jacob 
Signed-off-by: Nithin Dabilpuram 
---
 drivers/net/cnxk/cn9k_rx.c |  13 ++-
 drivers/net/cnxk/cn9k_rx.h | 221 +
 drivers/net/cnxk/cn9k_rx_vec.c |  17 
 drivers/net/cnxk/meson.build   |  11 +-
 4 files changed, 260 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/cnxk/cn9k_rx_vec.c

diff --git a/drivers/net/cnxk/cn9k_rx.c b/drivers/net/cnxk/cn9k_rx.c
index 87a62c9..01eb21f 100644
--- a/drivers/net/cnxk/cn9k_rx.c
+++ b/drivers/net/cnxk/cn9k_rx.c
@@ -50,7 +50,18 @@ cn9k_eth_set_rx_function(struct rte_eth_dev *eth_dev)
 #undef R
};
 
-   pick_rx_func(eth_dev, nix_eth_rx_burst);
+   const eth_rx_burst_t nix_eth_rx_vec_burst[2][2][2][2] = {
+#define R(name, f3, f2, f1, f0, flags) \
+   [f3][f2][f1][f0] = cn9k_nix_recv_pkts_vec_##name,
+
+   NIX_RX_FASTPATH_MODES
+#undef R
+   };
+
+   if (dev->scalar_ena)
+   pick_rx_func(eth_dev, nix_eth_rx_burst);
+   else
+   pick_rx_func(eth_dev, nix_eth_rx_vec_burst);
 
if (dev->rx_offloads & DEV_RX_OFFLOAD_SCATTER)
pick_rx_func(eth_dev, nix_eth_rx_burst_mseg);
diff --git a/drivers/net/cnxk/cn9k_rx.h b/drivers/net/cnxk/cn9k_rx.h
index 49f80ce..bc04f5c 100644
--- a/drivers/net/cnxk/cn9k_rx.h
+++ b/drivers/net/cnxk/cn9k_rx.h
@@ -6,6 +6,7 @@
 #define __CN9K_RX_H__
 
 #include 
+#include 
 
 #define NIX_RX_OFFLOAD_NONE (0)
 #define NIX_RX_OFFLOAD_RSS_F BIT(0)
@@ -266,6 +267,223 @@ cn9k_nix_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t pkts,
return nb_pkts;
 }
 
+#if defined(RTE_ARCH_ARM64)
+
+static __rte_always_inline uint16_t
+cn9k_nix_recv_pkts_vector(void *rx_queue, struct rte_mbuf **rx_pkts,
+ uint16_t pkts, const uint16_t flags)
+{
+   struct cn9k_eth_rxq *rxq = rx_queue;
+   uint16_t packets = 0;
+   uint64x2_t cq0_w8, cq1_w8, cq2_w8, cq3_w8, mbuf01, mbuf23;
+   const uint64_t mbuf_initializer = rxq->mbuf_initializer;
+   const uint64x2_t data_off = vdupq_n_u64(rxq->data_off);
+   uint64_t ol_flags0, ol_flags1, ol_flags2, ol_flags3;
+   uint64x2_t rearm0 = vdupq_n_u64(mbuf_initializer);
+   uint64x2_t rearm1 = vdupq_n_u64(mbuf_initializer);
+   uint64x2_t rearm2 = vdupq_n_u64(mbuf_initializer);
+   uint64x2_t rearm3 = vdupq_n_u64(mbuf_initializer);
+   struct rte_mbuf *mbuf0, *mbuf1, *mbuf2, *mbuf3;
+   const uint16_t *lookup_mem = rxq->lookup_mem;
+   const uint32_t qmask = rxq->qmask;
+   const uint64_t wdata = rxq->wdata;
+   const uintptr_t desc = rxq->desc;
+   uint8x16_t f0, f1, f2, f3;
+   uint32_t head = rxq->head;
+   uint16_t pkts_left;
+
+   pkts = nix_rx_nb_pkts(rxq, wdata, pkts, qmask);
+   pkts_left = pkts & (NIX_DESCS_PER_LOOP - 1);
+
+   /* Packets has to be floor-aligned to NIX_DESCS_PER_LOOP */
+   pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
+
+   while (packets < pkts) {
+   /* Exit loop if head is about to wrap and become unaligned */
+   if (((head + NIX_DESCS_PER_LOOP - 1) & qmask) <
+   NIX_DESCS_PER_LOOP) {
+   pkts_left += (pkts - packets);
+   break;
+   }
+
+   const uintptr_t cq0 = desc + CQE_SZ(head);
+
+   /* Prefetch N desc ahead */
+   rte_prefetch_non_temporal((void *)(cq0 + CQE_SZ(8)));
+   rte_prefetch_non_temporal((void *)(cq0 + CQE_SZ(9)));
+   rte_prefetch_non_temporal((void *)(cq0 + CQE_SZ(10)));
+   rte_prefetch_non_temporal((void *)(cq0 + CQE_SZ(11)));
+
+   /* Get NIX_RX_SG_S for size and buffer pointer */
+   cq0_w8 = vld1q_u64((uint64_t *)(cq0 + CQE_SZ(0) + 64));
+   cq1_w8 = vld1q_u64((uint64_t *)(cq0 + CQE_SZ(1) + 64));
+   cq2_w8 = vld1q_u64((uint64_t *)(cq0 + CQE_SZ(2) + 64));
+   cq3_w8 = vld1q_u64((uint64_t *)(cq0 + CQE_SZ(3) + 64));
+
+   /* Extract mbuf from NIX_RX_SG_S */
+   mbuf01 = vzip2q_u64(cq0_w8, cq1_w8);
+   mbuf23 = vzip2q_u64(cq2_w8, cq3_w8);
+   mbuf01 = vqsubq_u64(mbuf01, data_off);
+   mbuf23 = vqsubq_u64(mbuf23, data_off);
+
+   /* Move mbufs to scalar registers for future use */
+   mbuf0 = (struct rte_mbuf *)vgetq_lane_u64(mbuf01, 0);
+   mbuf1 = (struct rte_mbuf *)vgetq_lane_u64(mbuf01, 1);
+   mbuf2 = (struct rte_mbuf *)vgetq_lane_u64(mbuf23, 0);
+   mbuf3 = (struct rte_mbuf *)vgetq_lane_u64(mbuf23, 1);
+
+   /* Mask to get packet len from NIX_RX_SG_S */
+   const uint8x16_t shuf_msk = {
+   0xFF, 0xFF, /* pkt_type set as unknown */
+   0xFF, 0xFF, /* pkt_type set as unkn
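The burst split near the top of this routine (RTE_ALIGN_FLOOR plus the mask that leaves a scalar tail) can be sketched in isolation. A minimal sketch, assuming only that the loop width is a power of two; `DESCS_PER_LOOP` is a stand-in for NIX_DESCS_PER_LOOP:

```c
#include <stdint.h>

/* Sketch of the burst split used before the vector loop: the count is
 * floored to a multiple of DESCS_PER_LOOP for the NEON path, and the
 * masked remainder is left for the scalar tail. */
#define DESCS_PER_LOOP 4 /* stand-in for NIX_DESCS_PER_LOOP */

static inline uint16_t vec_pkts(uint16_t pkts)
{
	/* Equivalent of RTE_ALIGN_FLOOR(pkts, DESCS_PER_LOOP) */
	return (uint16_t)(pkts & ~(DESCS_PER_LOOP - 1));
}

static inline uint16_t tail_pkts(uint16_t pkts)
{
	return (uint16_t)(pkts & (DESCS_PER_LOOP - 1));
}
```

Because the two masks partition the bits, `vec_pkts(n) + tail_pkts(n) == n` for any burst size.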

[dpdk-dev] [PATCH v4 22/62] net/cnxk: add Tx burst for cn9k

2021-06-22 Thread Nithin Dabilpuram
From: Jerin Jacob 

Add Tx burst scalar version for CN9K.

Signed-off-by: Jerin Jacob 
Signed-off-by: Nithin Dabilpuram 
Signed-off-by: Pavan Nikhilesh 
Signed-off-by: Harman Kalra 
---
 drivers/net/cnxk/cn9k_ethdev.h |   1 +
 drivers/net/cnxk/cn9k_tx.c |  53 ++
 drivers/net/cnxk/cn9k_tx.h | 419 +
 drivers/net/cnxk/cnxk_ethdev.h |  71 +++
 drivers/net/cnxk/meson.build   |   3 +-
 5 files changed, 546 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/cnxk/cn9k_tx.c

diff --git a/drivers/net/cnxk/cn9k_ethdev.h b/drivers/net/cnxk/cn9k_ethdev.h
index bab5540..f8344e3 100644
--- a/drivers/net/cnxk/cn9k_ethdev.h
+++ b/drivers/net/cnxk/cn9k_ethdev.h
@@ -33,5 +33,6 @@ struct cn9k_eth_rxq {
 
 /* Rx and Tx routines */
 void cn9k_eth_set_rx_function(struct rte_eth_dev *eth_dev);
+void cn9k_eth_set_tx_function(struct rte_eth_dev *eth_dev);
 
 #endif /* __CN9K_ETHDEV_H__ */
diff --git a/drivers/net/cnxk/cn9k_tx.c b/drivers/net/cnxk/cn9k_tx.c
new file mode 100644
index 000..a0b022a
--- /dev/null
+++ b/drivers/net/cnxk/cn9k_tx.c
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#include "cn9k_ethdev.h"
+#include "cn9k_tx.h"
+
+#define T(name, f4, f3, f2, f1, f0, sz, flags)\
+   uint16_t __rte_noinline __rte_hot cn9k_nix_xmit_pkts_##name(   \
+   void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts)  \
+   {  \
+   uint64_t cmd[sz];  \
+  \
+   /* For TSO inner checksum is a must */ \
+   if (((flags) & NIX_TX_OFFLOAD_TSO_F) &&\
+   !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F))  \
+   return 0;  \
+   return cn9k_nix_xmit_pkts(tx_queue, tx_pkts, pkts, cmd, flags);\
+   }
+
+NIX_TX_FASTPATH_MODES
+#undef T
+
+static inline void
+pick_tx_func(struct rte_eth_dev *eth_dev,
+const eth_tx_burst_t tx_burst[2][2][2][2][2])
+{
+   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+   /* [TSO] [NOFF] [VLAN] [OL3_OL4_CSUM] [IL3_IL4_CSUM] */
+   eth_dev->tx_pkt_burst = tx_burst
+   [!!(dev->tx_offload_flags & NIX_TX_OFFLOAD_TSO_F)]
+   [!!(dev->tx_offload_flags & NIX_TX_OFFLOAD_MBUF_NOFF_F)]
+   [!!(dev->tx_offload_flags & NIX_TX_OFFLOAD_VLAN_QINQ_F)]
+   [!!(dev->tx_offload_flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F)]
+   [!!(dev->tx_offload_flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F)];
+}
+
+void
+cn9k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
+{
+   const eth_tx_burst_t nix_eth_tx_burst[2][2][2][2][2] = {
+#define T(name, f4, f3, f2, f1, f0, sz, flags)\
+   [f4][f3][f2][f1][f0] = cn9k_nix_xmit_pkts_##name,
+
+   NIX_TX_FASTPATH_MODES
+#undef T
+   };
+
+   pick_tx_func(eth_dev, nix_eth_tx_burst);
+
+   rte_mb();
+}
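The table-driven dispatch in pick_tx_func() above — collapsing each offload flag to 0/1 with `!!` and using the results as indices into a dense function table — can be illustrated with a reduced two-flag sketch. The flag and function names here are illustrative only, not part of the driver:

```c
#include <stdint.h>

/* Two hypothetical offload flags standing in for the five NIX_TX_* bits */
#define OFF_A (1u << 0)
#define OFF_B (1u << 1)

typedef int (*burst_fn)(void);

static int burst_none(void) { return 0; }
static int burst_a(void)    { return 1; }
static int burst_b(void)    { return 2; }
static int burst_ab(void)   { return 3; }

/* Mirror of pick_tx_func(): !! maps any set bit to 1, so each flag
 * selects one dimension of the table. */
static burst_fn pick(uint32_t flags)
{
	static const burst_fn tbl[2][2] = {
		/* [B][A] */
		{ burst_none, burst_a },
		{ burst_b, burst_ab },
	};
	return tbl[!!(flags & OFF_B)][!!(flags & OFF_A)];
}
```

With five flags, as in the driver, the same scheme yields the `[2][2][2][2][2]` table populated by the `T()` macro expansion.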
diff --git a/drivers/net/cnxk/cn9k_tx.h b/drivers/net/cnxk/cn9k_tx.h
index bb6379b..7acecc6 100644
--- a/drivers/net/cnxk/cn9k_tx.h
+++ b/drivers/net/cnxk/cn9k_tx.h
@@ -4,10 +4,429 @@
 #ifndef __CN9K_TX_H__
 #define __CN9K_TX_H__
 
+#define NIX_TX_OFFLOAD_NONE  (0)
+#define NIX_TX_OFFLOAD_L3_L4_CSUM_F   BIT(0)
+#define NIX_TX_OFFLOAD_OL3_OL4_CSUM_F BIT(1)
 #define NIX_TX_OFFLOAD_VLAN_QINQ_F BIT(2)
+#define NIX_TX_OFFLOAD_MBUF_NOFF_F BIT(3)
 #define NIX_TX_OFFLOAD_TSO_F BIT(4)
 
+/* Flag to control the xmit_prepare function.
+ * Defined from the top of the flag space to denote that it is
+ * not used as an offload flag to pick the burst function.
+ */
+#define NIX_TX_MULTI_SEG_F BIT(15)
+
+#define NIX_TX_NEED_SEND_HDR_W1
\
+   (NIX_TX_OFFLOAD_L3_L4_CSUM_F | NIX_TX_OFFLOAD_OL3_OL4_CSUM_F | \
+NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSO_F)
+
 #define NIX_TX_NEED_EXT_HDR
\
(NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSO_F)
 
+#define NIX_XMIT_FC_OR_RETURN(txq, pkts)   
\
+   do {   \
+   /* Cached value is low, Update the fc_cache_pkts */\
+   if (unlikely((txq)->fc_cache_pkts < (pkts))) { \
+   /* Multiply with sqe_per_sqb to express in pkts */ \
+   (txq)->fc_cache_pkts = \
+   ((txq)->nb_sqb_bufs_adj - *(txq)->fc_mem)  \
+   << (txq)->sqes_per_sqb_log2;   \
+   /* Check it
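The credit refresh inside NIX_XMIT_FC_OR_RETURN converts a free SQB (send-queue buffer) count into a packet credit by multiplying with SQEs-per-SQB, a power of two, hence the shift. A minimal sketch, assuming the same field meanings as the txq layout above:

```c
#include <stdint.h>

/* Sketch of the fc_cache_pkts refresh: (adjusted SQB budget minus the
 * HW-updated in-use count) scaled by SQEs per SQB (log2, so a shift)
 * gives the number of packets that can still be queued. */
static uint16_t fc_refresh(uint64_t nb_sqb_bufs_adj, uint64_t fc_mem_val,
			   uint16_t sqes_per_sqb_log2)
{
	return (uint16_t)((nb_sqb_bufs_adj - fc_mem_val) << sqes_per_sqb_log2);
}
```

The macro then truncates the requested burst to this credit, so the fast path never reads the HW counter more than once per refill.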

[dpdk-dev] [PATCH v4 23/62] net/cnxk: add Tx multi-segment version for cn9k

2021-06-22 Thread Nithin Dabilpuram
Add Tx burst multi-segment version for CN9K.

Signed-off-by: Nithin Dabilpuram 
Signed-off-by: Pavan Nikhilesh 
---
 drivers/net/cnxk/cn9k_tx.c  |  14 
 drivers/net/cnxk/cn9k_tx.h  | 150 
 drivers/net/cnxk/cn9k_tx_mseg.c |  25 +++
 drivers/net/cnxk/cnxk_ethdev.h  |   4 ++
 drivers/net/cnxk/meson.build|   3 +-
 5 files changed, 195 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/cnxk/cn9k_tx_mseg.c

diff --git a/drivers/net/cnxk/cn9k_tx.c b/drivers/net/cnxk/cn9k_tx.c
index a0b022a..8f1d5f5 100644
--- a/drivers/net/cnxk/cn9k_tx.c
+++ b/drivers/net/cnxk/cn9k_tx.c
@@ -21,6 +21,7 @@
 NIX_TX_FASTPATH_MODES
 #undef T
 
+
 static inline void
 pick_tx_func(struct rte_eth_dev *eth_dev,
 const eth_tx_burst_t tx_burst[2][2][2][2][2])
@@ -39,6 +40,8 @@ pick_tx_func(struct rte_eth_dev *eth_dev,
 void
 cn9k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
 {
+   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
const eth_tx_burst_t nix_eth_tx_burst[2][2][2][2][2] = {
 #define T(name, f4, f3, f2, f1, f0, sz, flags)\
[f4][f3][f2][f1][f0] = cn9k_nix_xmit_pkts_##name,
@@ -47,7 +50,18 @@ cn9k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
 #undef T
};
 
+   const eth_tx_burst_t nix_eth_tx_burst_mseg[2][2][2][2][2] = {
+#define T(name, f4, f3, f2, f1, f0, sz, flags)\
+   [f4][f3][f2][f1][f0] = cn9k_nix_xmit_pkts_mseg_##name,
+
+   NIX_TX_FASTPATH_MODES
+#undef T
+   };
+
pick_tx_func(eth_dev, nix_eth_tx_burst);
 
+   if (dev->tx_offloads & DEV_TX_OFFLOAD_MULTI_SEGS)
+   pick_tx_func(eth_dev, nix_eth_tx_burst_mseg);
+
rte_mb();
 }
diff --git a/drivers/net/cnxk/cn9k_tx.h b/drivers/net/cnxk/cn9k_tx.h
index 7acecc6..d9aa406 100644
--- a/drivers/net/cnxk/cn9k_tx.h
+++ b/drivers/net/cnxk/cn9k_tx.h
@@ -311,6 +311,111 @@ cn9k_nix_xmit_submit_lmt_release(const rte_iova_t io_addr)
 }
 
 static __rte_always_inline uint16_t
+cn9k_nix_prepare_mseg(struct rte_mbuf *m, uint64_t *cmd, const uint16_t flags)
+{
+   struct nix_send_hdr_s *send_hdr;
+   union nix_send_sg_s *sg;
+   struct rte_mbuf *m_next;
+   uint64_t *slist, sg_u;
+   uint64_t nb_segs;
+   uint64_t segdw;
+   uint8_t off, i;
+
+   send_hdr = (struct nix_send_hdr_s *)cmd;
+   send_hdr->w0.total = m->pkt_len;
+   send_hdr->w0.aura = roc_npa_aura_handle_to_aura(m->pool->pool_id);
+
+   if (flags & NIX_TX_NEED_EXT_HDR)
+   off = 2;
+   else
+   off = 0;
+
+   sg = (union nix_send_sg_s *)&cmd[2 + off];
+   /* Clear sg->u header before use */
+   sg->u &= 0xFC00000000000000;
+   sg_u = sg->u;
+   slist = &cmd[3 + off];
+
+   i = 0;
+   nb_segs = m->nb_segs;
+
+   /* Fill mbuf segments */
+   do {
+   m_next = m->next;
+   sg_u = sg_u | ((uint64_t)m->data_len << (i << 4));
+   *slist = rte_mbuf_data_iova(m);
+   /* Set invert df if buffer is not to be freed by H/W */
+   if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+   sg_u |= (cnxk_nix_prefree_seg(m) << (i + 55));
+   /* Commit changes to mbuf */
+   rte_io_wmb();
+   }
+   /* Mark mempool object as "put" since it is freed by NIX */
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+   if (!(sg_u & (1ULL << (i + 55))))
+   __mempool_check_cookies(m->pool, (void **)&m, 1, 0);
+   rte_io_wmb();
+#endif
+   slist++;
+   i++;
+   nb_segs--;
+   if (i > 2 && nb_segs) {
+   i = 0;
+   /* Next SG subdesc */
+   *(uint64_t *)slist = sg_u & 0xFC00000000000000;
+   sg->u = sg_u;
+   sg->segs = 3;
+   sg = (union nix_send_sg_s *)slist;
+   sg_u = sg->u;
+   slist++;
+   }
+   m = m_next;
+   } while (nb_segs);
+
+   sg->u = sg_u;
+   sg->segs = i;
+   segdw = (uint64_t *)slist - (uint64_t *)&cmd[2 + off];
+   /* Roundup extra dwords to multiple of 2 */
+   segdw = (segdw >> 1) + (segdw & 0x1);
+   /* Default dwords */
+   segdw += (off >> 1) + 1;
+   send_hdr->w0.sizem1 = segdw - 1;
+
+   return segdw;
+}
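The dword accounting at the end of cn9k_nix_prepare_mseg() above — 64-bit words written for the segment list rounded up to 128-bit (2-dword) units, then the fixed header dwords added — can be checked in isolation; `off` is 2 when the extended header is present, as in the function:

```c
#include <stdint.h>

/* Sketch of the descriptor size computation: round the 64-bit word
 * count of the segment list up to a multiple of two, then add the
 * send header (1 dword) and optional extended header (off/2 dwords). */
static uint64_t seg_dwords(uint64_t words64, uint8_t off)
{
	uint64_t segdw = (words64 >> 1) + (words64 & 0x1); /* ceil(words64 / 2) */

	segdw += (off >> 1) + 1; /* default header dwords */
	return segdw;
}
```

The hardware consumes descriptors in 128-bit units, which is why an odd word count costs a full extra dword pair.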
+
+static __rte_always_inline void
+cn9k_nix_xmit_mseg_prep_lmt(uint64_t *cmd, void *lmt_addr, uint16_t segdw)
+{
+   roc_lmt_mov_seg(lmt_addr, (const void *)cmd, segdw);
+}
+
+static __rte_always_inline void
+cn9k_nix_xmit_mseg_one(uint64_t *cmd, void *lmt_addr, rte_iova_t io_addr,
+  uint16_t segdw)
+{
+   uint64_t lmt_status;
+
+   do {
+   roc_lmt_mov_seg(lmt_addr, (const void *)cmd, segdw);
+   lmt_status = r
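The `(i << 4)` shift in the segment-fill loop above packs up to three 16-bit `data_len` values into a single NIX_SEND_SG word, segment i landing at bit i*16. Sketched as a standalone helper:

```c
#include <stdint.h>

/* Sketch of the SG length packing: OR the 16-bit segment length into
 * the 64-bit SG word at slot i (bits i*16 .. i*16+15, i in 0..2). */
static uint64_t pack_seg_len(uint64_t sg_u, uint16_t data_len, uint8_t i)
{
	return sg_u | ((uint64_t)data_len << (i << 4));
}
```

After three segments the loop above resets i and starts a fresh SG subdescriptor, since the top bits of the word hold the subdc header.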

[dpdk-dev] [PATCH v4 24/62] net/cnxk: add Tx vector version for cn9k

2021-06-22 Thread Nithin Dabilpuram
Add Tx burst vector version for CN9K.

Signed-off-by: Nithin Dabilpuram 
Signed-off-by: Pavan Nikhilesh 
---
 drivers/net/cnxk/cn9k_tx.c |  16 +-
 drivers/net/cnxk/cn9k_tx.h | 743 +
 drivers/net/cnxk/cn9k_tx_vec.c |  25 ++
 drivers/net/cnxk/meson.build   |   3 +-
 4 files changed, 784 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/cnxk/cn9k_tx_vec.c

diff --git a/drivers/net/cnxk/cn9k_tx.c b/drivers/net/cnxk/cn9k_tx.c
index 8f1d5f5..2ff9720 100644
--- a/drivers/net/cnxk/cn9k_tx.c
+++ b/drivers/net/cnxk/cn9k_tx.c
@@ -21,7 +21,6 @@
 NIX_TX_FASTPATH_MODES
 #undef T
 
-
 static inline void
 pick_tx_func(struct rte_eth_dev *eth_dev,
 const eth_tx_burst_t tx_burst[2][2][2][2][2])
@@ -58,7 +57,20 @@ cn9k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
 #undef T
};
 
-   pick_tx_func(eth_dev, nix_eth_tx_burst);
+   const eth_tx_burst_t nix_eth_tx_vec_burst[2][2][2][2][2] = {
+#define T(name, f4, f3, f2, f1, f0, sz, flags)\
+   [f4][f3][f2][f1][f0] = cn9k_nix_xmit_pkts_vec_##name,
+
+   NIX_TX_FASTPATH_MODES
+#undef T
+   };
+
+   if (dev->scalar_ena ||
+   (dev->tx_offload_flags &
+(NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSO_F)))
+   pick_tx_func(eth_dev, nix_eth_tx_burst);
+   else
+   pick_tx_func(eth_dev, nix_eth_tx_vec_burst);
 
if (dev->tx_offloads & DEV_TX_OFFLOAD_MULTI_SEGS)
pick_tx_func(eth_dev, nix_eth_tx_burst_mseg);
diff --git a/drivers/net/cnxk/cn9k_tx.h b/drivers/net/cnxk/cn9k_tx.h
index d9aa406..7b0d536 100644
--- a/drivers/net/cnxk/cn9k_tx.h
+++ b/drivers/net/cnxk/cn9k_tx.h
@@ -4,6 +4,8 @@
 #ifndef __CN9K_TX_H__
 #define __CN9K_TX_H__
 
+#include 
+
 #define NIX_TX_OFFLOAD_NONE  (0)
 #define NIX_TX_OFFLOAD_L3_L4_CSUM_F   BIT(0)
 #define NIX_TX_OFFLOAD_OL3_OL4_CSUM_F BIT(1)
@@ -495,6 +497,744 @@ cn9k_nix_xmit_pkts_mseg(void *tx_queue, struct rte_mbuf 
**tx_pkts,
return pkts;
 }
 
+#if defined(RTE_ARCH_ARM64)
+
+#define NIX_DESCS_PER_LOOP 4
+static __rte_always_inline uint16_t
+cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts,
+ uint16_t pkts, uint64_t *cmd, const uint16_t flags)
+{
+   uint64x2_t dataoff_iova0, dataoff_iova1, dataoff_iova2, dataoff_iova3;
+   uint64x2_t len_olflags0, len_olflags1, len_olflags2, len_olflags3;
+   uint64x2_t cmd0[NIX_DESCS_PER_LOOP], cmd1[NIX_DESCS_PER_LOOP];
+   uint64_t *mbuf0, *mbuf1, *mbuf2, *mbuf3;
+   uint64x2_t senddesc01_w0, senddesc23_w0;
+   uint64x2_t senddesc01_w1, senddesc23_w1;
+   uint64x2_t sgdesc01_w0, sgdesc23_w0;
+   uint64x2_t sgdesc01_w1, sgdesc23_w1;
+   struct cn9k_eth_txq *txq = tx_queue;
+   uint64_t *lmt_addr = txq->lmt_addr;
+   rte_iova_t io_addr = txq->io_addr;
+   uint64x2_t ltypes01, ltypes23;
+   uint64x2_t xtmp128, ytmp128;
+   uint64x2_t xmask01, xmask23;
+   uint64_t lmt_status, i;
+   uint16_t pkts_left;
+
+   NIX_XMIT_FC_OR_RETURN(txq, pkts);
+
+   pkts_left = pkts & (NIX_DESCS_PER_LOOP - 1);
+   pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
+
+   /* Reduce the cached count */
+   txq->fc_cache_pkts -= pkts;
+
+   /* Lets commit any changes in the packet here as no further changes
+* to the packet will be done unless no fast free is enabled.
+*/
+   if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F))
+   rte_io_wmb();
+
+   senddesc01_w0 = vld1q_dup_u64(&txq->cmd[0]);
+   senddesc23_w0 = senddesc01_w0;
+   senddesc01_w1 = vdupq_n_u64(0);
+   senddesc23_w1 = senddesc01_w1;
+   sgdesc01_w0 = vld1q_dup_u64(&txq->cmd[2]);
+   sgdesc23_w0 = sgdesc01_w0;
+
+   for (i = 0; i < pkts; i += NIX_DESCS_PER_LOOP) {
+   /* Clear lower 32bit of SEND_HDR_W0 and SEND_SG_W0 */
+   senddesc01_w0 =
+   vbicq_u64(senddesc01_w0, vdupq_n_u64(0xFFFFFFFF));
+   sgdesc01_w0 = vbicq_u64(sgdesc01_w0, vdupq_n_u64(0xFFFFFFFF));
+
+   senddesc23_w0 = senddesc01_w0;
+   sgdesc23_w0 = sgdesc01_w0;
+
+   /* Move mbufs to iova */
+   mbuf0 = (uint64_t *)tx_pkts[0];
+   mbuf1 = (uint64_t *)tx_pkts[1];
+   mbuf2 = (uint64_t *)tx_pkts[2];
+   mbuf3 = (uint64_t *)tx_pkts[3];
+
+   mbuf0 = (uint64_t *)((uintptr_t)mbuf0 +
+offsetof(struct rte_mbuf, buf_iova));
+   mbuf1 = (uint64_t *)((uintptr_t)mbuf1 +
+offsetof(struct rte_mbuf, buf_iova));
+   mbuf2 = (uint64_t *)((uintptr_t)mbuf2 +
+offsetof(struct rte_mbuf, buf_iova));
+   mbuf3 = (uint64_t *)((uintptr_t)mbuf3 +
+offsetof(struct rte_mbuf, buf_iova));
+   /*
+
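The pointer arithmetic above (uintptr_t plus offsetof) forms a raw pointer to the `buf_iova` field so it can later be handed to a vector load instead of dereferencing the member directly. A minimal scalar sketch with `struct pkt` as a stand-in for rte_mbuf:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for rte_mbuf; only the field layout matters here. */
struct pkt {
	void *pool;
	uint64_t buf_iova;
	uint16_t data_len;
};

/* Sketch of the offsetof() addressing used in the vector Tx path:
 * compute the field address as base + offset, then load through it. */
static uint64_t load_iova(const struct pkt *p)
{
	const uint64_t *f =
		(const uint64_t *)((uintptr_t)p + offsetof(struct pkt, buf_iova));
	return *f;
}
```

In the driver the resulting addresses feed NEON lane loads, which want plain pointers rather than struct member accesses.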

[dpdk-dev] [PATCH v4 25/62] net/cnxk: add Rx burst for cn10k

2021-06-22 Thread Nithin Dabilpuram
From: Jerin Jacob 

Add Rx burst support for CN10K SoC.

Signed-off-by: Jerin Jacob 
Signed-off-by: Nithin Dabilpuram 
Signed-off-by: Pavan Nikhilesh 
Signed-off-by: Harman Kalra 
---
 drivers/net/cnxk/cn10k_ethdev.h |   3 +
 drivers/net/cnxk/cn10k_rx.c |  45 
 drivers/net/cnxk/cn10k_rx.h | 236 
 drivers/net/cnxk/meson.build|   3 +-
 4 files changed, 286 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/cnxk/cn10k_rx.c

diff --git a/drivers/net/cnxk/cn10k_ethdev.h b/drivers/net/cnxk/cn10k_ethdev.h
index 18deb95..596985f 100644
--- a/drivers/net/cnxk/cn10k_ethdev.h
+++ b/drivers/net/cnxk/cn10k_ethdev.h
@@ -33,4 +33,7 @@ struct cn10k_eth_rxq {
uint16_t rq;
 } __plt_cache_aligned;
 
+/* Rx and Tx routines */
+void cn10k_eth_set_rx_function(struct rte_eth_dev *eth_dev);
+
 #endif /* __CN10K_ETHDEV_H__ */
diff --git a/drivers/net/cnxk/cn10k_rx.c b/drivers/net/cnxk/cn10k_rx.c
new file mode 100644
index 000..8b422d0
--- /dev/null
+++ b/drivers/net/cnxk/cn10k_rx.c
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#include "cn10k_ethdev.h"
+#include "cn10k_rx.h"
+
+#define R(name, f3, f2, f1, f0, flags)\
+   uint16_t __rte_noinline __rte_hot cn10k_nix_recv_pkts_##name(  \
+   void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)  \
+   {  \
+   return cn10k_nix_recv_pkts(rx_queue, rx_pkts, pkts, (flags));  \
+   }
+
+NIX_RX_FASTPATH_MODES
+#undef R
+
+static inline void
+pick_rx_func(struct rte_eth_dev *eth_dev,
+const eth_rx_burst_t rx_burst[2][2][2][2])
+{
+   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+   /* [MARK] [CKSUM] [PTYPE] [RSS] */
+   eth_dev->rx_pkt_burst = rx_burst
+   [!!(dev->rx_offload_flags & NIX_RX_OFFLOAD_MARK_UPDATE_F)]
+   [!!(dev->rx_offload_flags & NIX_RX_OFFLOAD_CHECKSUM_F)]
+   [!!(dev->rx_offload_flags & NIX_RX_OFFLOAD_PTYPE_F)]
+   [!!(dev->rx_offload_flags & NIX_RX_OFFLOAD_RSS_F)];
+}
+
+void
+cn10k_eth_set_rx_function(struct rte_eth_dev *eth_dev)
+{
+   const eth_rx_burst_t nix_eth_rx_burst[2][2][2][2] = {
+#define R(name, f3, f2, f1, f0, flags)   \
+   [f3][f2][f1][f0] = cn10k_nix_recv_pkts_##name,
+
+   NIX_RX_FASTPATH_MODES
+#undef R
+   };
+
+   pick_rx_func(eth_dev, nix_eth_rx_burst);
+   rte_mb();
+}
diff --git a/drivers/net/cnxk/cn10k_rx.h b/drivers/net/cnxk/cn10k_rx.h
index d3d1661..01c9d29 100644
--- a/drivers/net/cnxk/cn10k_rx.h
+++ b/drivers/net/cnxk/cn10k_rx.h
@@ -6,6 +6,242 @@
 
 #include 
 
+#define NIX_RX_OFFLOAD_NONE (0)
+#define NIX_RX_OFFLOAD_RSS_F BIT(0)
 #define NIX_RX_OFFLOAD_PTYPE_F  BIT(1)
+#define NIX_RX_OFFLOAD_CHECKSUM_F BIT(2)
+#define NIX_RX_OFFLOAD_MARK_UPDATE_F BIT(3)
+
+/* Flag to control the cqe_to_mbuf conversion function.
+ * Defined from the top of the flag space to denote that it is
+ * not used as an offload flag to pick the burst function.
+ */
+#define NIX_RX_MULTI_SEG_F BIT(15)
+
+#define CNXK_NIX_CQ_ENTRY_SZ 128
+#define NIX_DESCS_PER_LOOP   4
+#define CQE_CAST(x) ((struct nix_cqe_hdr_s *)(x))
+#define CQE_SZ(x)   ((x) * CNXK_NIX_CQ_ENTRY_SZ)
+
+union mbuf_initializer {
+   struct {
+   uint16_t data_off;
+   uint16_t refcnt;
+   uint16_t nb_segs;
+   uint16_t port;
+   } fields;
+   uint64_t value;
+};
+
+static __rte_always_inline uint64_t
+nix_clear_data_off(uint64_t oldval)
+{
+   union mbuf_initializer mbuf_init = {.value = oldval};
+
+   mbuf_init.fields.data_off = 0;
+   return mbuf_init.value;
+}
+
+static __rte_always_inline struct rte_mbuf *
+nix_get_mbuf_from_cqe(void *cq, const uint64_t data_off)
+{
+   rte_iova_t buff;
+
+   /* Skip CQE, NIX_RX_PARSE_S and SG HDR(9 DWORDs) and peek buff addr */
+   buff = *((rte_iova_t *)((uint64_t *)cq + 9));
+   return (struct rte_mbuf *)(buff - data_off);
+}
+
+static __rte_always_inline uint32_t
+nix_ptype_get(const void *const lookup_mem, const uint64_t in)
+{
+   const uint16_t *const ptype = lookup_mem;
+   const uint16_t lh_lg_lf = (in & 0xFFF0000000000000) >> 52;
+   const uint16_t tu_l2 = ptype[(in & 0x000FFFF000000000) >> 36];
+   const uint16_t il4_tu = ptype[PTYPE_NON_TUNNEL_ARRAY_SZ + lh_lg_lf];
+
+   return (il4_tu << PTYPE_NON_TUNNEL_WIDTH) | tu_l2;
+}
+
+static __rte_always_inline uint32_t
+nix_rx_olflags_get(const void *const lookup_mem, const uint64_t in)
+{
+   const uint32_t *const ol_flags =
+   (const uint32_t *)((const uint8_t *)lookup_mem +
+  PTYPE_ARRAY_SZ);
+
+   return ol_flags[(in & 0xfff00000) >> 20];
+}
+
+static inline uint64_t
+nix_update_match_id(const uint16_t ma

[dpdk-dev] [PATCH v4 26/62] net/cnxk: add Rx multi-segment version for cn10k

2021-06-22 Thread Nithin Dabilpuram
Add Rx burst multi-segment version for CN10K.

Signed-off-by: Nithin Dabilpuram 
Signed-off-by: Pavan Nikhilesh 
---
 doc/guides/nics/cnxk.rst  |  2 ++
 doc/guides/nics/features/cnxk.ini |  2 ++
 doc/guides/nics/features/cnxk_vec.ini |  1 +
 doc/guides/nics/features/cnxk_vf.ini  |  2 ++
 drivers/net/cnxk/cn10k_rx.c   | 20 +++-
 drivers/net/cnxk/cn10k_rx.h   | 57 +--
 drivers/net/cnxk/cn10k_rx_mseg.c  | 17 +++
 drivers/net/cnxk/meson.build  |  3 +-
 8 files changed, 100 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/cnxk/cn10k_rx_mseg.c

diff --git a/doc/guides/nics/cnxk.rst b/doc/guides/nics/cnxk.rst
index 8bc85c0..fd7f2dd 100644
--- a/doc/guides/nics/cnxk.rst
+++ b/doc/guides/nics/cnxk.rst
@@ -17,11 +17,13 @@ Features
 Features of the CNXK Ethdev PMD are:
 
 - Packet type information
+- Jumbo frames
 - SR-IOV VF
 - Lock-free Tx queue
 - Multiple queues for TX and RX
 - Receiver Side Scaling (RSS)
 - Link state information
+- Scatter-Gather IO support
 
 Prerequisites
 -
diff --git a/doc/guides/nics/features/cnxk.ini 
b/doc/guides/nics/features/cnxk.ini
index 712f8d5..23564b7 100644
--- a/doc/guides/nics/features/cnxk.ini
+++ b/doc/guides/nics/features/cnxk.ini
@@ -15,6 +15,8 @@ Runtime Tx queue setup = Y
 Queue start/stop = Y
 RSS hash = Y
Inner RSS = Y
+Jumbo frame  = Y
+Scattered Rx = Y
 Packet type parsing  = Y
Linux = Y
ARMv8 = Y
diff --git a/doc/guides/nics/features/cnxk_vec.ini 
b/doc/guides/nics/features/cnxk_vec.ini
index 82f2af0..421048d 100644
--- a/doc/guides/nics/features/cnxk_vec.ini
+++ b/doc/guides/nics/features/cnxk_vec.ini
@@ -15,6 +15,7 @@ Runtime Tx queue setup = Y
 Queue start/stop = Y
 RSS hash = Y
Inner RSS = Y
+Jumbo frame  = Y
 Packet type parsing  = Y
Linux = Y
ARMv8 = Y
diff --git a/doc/guides/nics/features/cnxk_vf.ini 
b/doc/guides/nics/features/cnxk_vf.ini
index 61fed11..e901fa2 100644
--- a/doc/guides/nics/features/cnxk_vf.ini
+++ b/doc/guides/nics/features/cnxk_vf.ini
@@ -14,6 +14,8 @@ Runtime Tx queue setup = Y
 Queue start/stop = Y
 RSS hash = Y
Inner RSS = Y
+Jumbo frame  = Y
+Scattered Rx = Y
 Packet type parsing  = Y
Linux = Y
ARMv8 = Y
diff --git a/drivers/net/cnxk/cn10k_rx.c b/drivers/net/cnxk/cn10k_rx.c
index 8b422d0..ce2cfee 100644
--- a/drivers/net/cnxk/cn10k_rx.c
+++ b/drivers/net/cnxk/cn10k_rx.c
@@ -10,7 +10,7 @@
void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)  \
{  \
return cn10k_nix_recv_pkts(rx_queue, rx_pkts, pkts, (flags));  \
-   }
+   }  \
 
 NIX_RX_FASTPATH_MODES
 #undef R
@@ -32,6 +32,8 @@ pick_rx_func(struct rte_eth_dev *eth_dev,
 void
 cn10k_eth_set_rx_function(struct rte_eth_dev *eth_dev)
 {
+   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
const eth_rx_burst_t nix_eth_rx_burst[2][2][2][2] = {
 #define R(name, f3, f2, f1, f0, flags)   \
[f3][f2][f1][f0] = cn10k_nix_recv_pkts_##name,
@@ -40,6 +42,22 @@ cn10k_eth_set_rx_function(struct rte_eth_dev *eth_dev)
 #undef R
};
 
+   const eth_rx_burst_t nix_eth_rx_burst_mseg[2][2][2][2] = {
+#define R(name, f3, f2, f1, f0, flags)   \
+   [f3][f2][f1][f0] = cn10k_nix_recv_pkts_mseg_##name,
+
+   NIX_RX_FASTPATH_MODES
+#undef R
+   };
+
pick_rx_func(eth_dev, nix_eth_rx_burst);
+
+   if (dev->rx_offloads & DEV_RX_OFFLOAD_SCATTER)
+   pick_rx_func(eth_dev, nix_eth_rx_burst_mseg);
+
+   /* Copy multi seg version with no offload for tear down sequence */
+   if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+   dev->rx_pkt_burst_no_offload =
+   nix_eth_rx_burst_mseg[0][0][0][0];
rte_mb();
 }
diff --git a/drivers/net/cnxk/cn10k_rx.h b/drivers/net/cnxk/cn10k_rx.h
index 01c9d29..c667c9a 100644
--- a/drivers/net/cnxk/cn10k_rx.h
+++ b/drivers/net/cnxk/cn10k_rx.h
@@ -103,6 +103,52 @@ nix_update_match_id(const uint16_t match_id, uint64_t 
ol_flags,
 }
 
 static __rte_always_inline void
+nix_cqe_xtract_mseg(const union nix_rx_parse_u *rx, struct rte_mbuf *mbuf,
+   uint64_t rearm)
+{
+   const rte_iova_t *iova_list;
+   struct rte_mbuf *head;
+   const rte_iova_t *eol;
+   uint8_t nb_segs;
+   uint64_t sg;
+
+   sg = *(const uint64_t *)(rx + 1);
+   nb_segs = (sg >> 48) & 0x3;
+   mbuf->nb_segs = nb_segs;
+   mbuf->data_len = sg & 0xFFFF;
+   sg = sg >> 16;
+
+   eol = ((const rte_iova_t *)(rx + 1) + ((rx->desc_sizem1 + 1) << 1));
+ 
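The SG-word decode at the start of nix_cqe_xtract_mseg() above — segment count in bits 48-49, first segment length in the low 16 bits, then the word shifted down for the next length — can be sketched as two standalone accessors:

```c
#include <stdint.h>

/* Sketch of the NIX SG word layout used in nix_cqe_xtract_mseg():
 * bits 48-49 hold the number of segments in this subdescriptor. */
static uint8_t sg_nb_segs(uint64_t sg)
{
	return (uint8_t)((sg >> 48) & 0x3);
}

/* Low 16 bits hold the first segment's data length; the caller then
 * shifts sg right by 16 to expose the next length. */
static uint16_t sg_first_len(uint64_t sg)
{
	return (uint16_t)(sg & 0xFFFF);
}
```

Each SG subdescriptor thus carries up to three lengths in one 64-bit word, matching the three-IOVA layout walked by `iova_list`.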

[dpdk-dev] [PATCH v4 27/62] net/cnxk: add Rx vector version for cn10k

2021-06-22 Thread Nithin Dabilpuram
From: Jerin Jacob 

Add Rx burst vector version for CN10K.

Signed-off-by: Jerin Jacob 
Signed-off-by: Nithin Dabilpuram 
---
 doc/guides/nics/cnxk.rst|   1 +
 drivers/net/cnxk/cn10k_rx.c |  13 ++-
 drivers/net/cnxk/cn10k_rx.h | 222 
 drivers/net/cnxk/cn10k_rx_vec.c |  19 
 drivers/net/cnxk/meson.build|   3 +-
 5 files changed, 256 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/cnxk/cn10k_rx_vec.c

diff --git a/doc/guides/nics/cnxk.rst b/doc/guides/nics/cnxk.rst
index fd7f2dd..481bc7e 100644
--- a/doc/guides/nics/cnxk.rst
+++ b/doc/guides/nics/cnxk.rst
@@ -24,6 +24,7 @@ Features of the CNXK Ethdev PMD are:
 - Receiver Side Scaling (RSS)
 - Link state information
 - Scatter-Gather IO support
+- Vector Poll mode driver
 
 Prerequisites
 -
diff --git a/drivers/net/cnxk/cn10k_rx.c b/drivers/net/cnxk/cn10k_rx.c
index ce2cfee..0598111 100644
--- a/drivers/net/cnxk/cn10k_rx.c
+++ b/drivers/net/cnxk/cn10k_rx.c
@@ -50,7 +50,18 @@ cn10k_eth_set_rx_function(struct rte_eth_dev *eth_dev)
 #undef R
};
 
-   pick_rx_func(eth_dev, nix_eth_rx_burst);
+   const eth_rx_burst_t nix_eth_rx_vec_burst[2][2][2][2] = {
+#define R(name, f3, f2, f1, f0, flags)   \
+   [f3][f2][f1][f0] = cn10k_nix_recv_pkts_vec_##name,
+
+   NIX_RX_FASTPATH_MODES
+#undef R
+   };
+
+   if (dev->scalar_ena)
+   pick_rx_func(eth_dev, nix_eth_rx_burst);
+   else
+   pick_rx_func(eth_dev, nix_eth_rx_vec_burst);
 
if (dev->rx_offloads & DEV_RX_OFFLOAD_SCATTER)
pick_rx_func(eth_dev, nix_eth_rx_burst_mseg);
diff --git a/drivers/net/cnxk/cn10k_rx.h b/drivers/net/cnxk/cn10k_rx.h
index c667c9a..7bb9dd8 100644
--- a/drivers/net/cnxk/cn10k_rx.h
+++ b/drivers/net/cnxk/cn10k_rx.h
@@ -5,6 +5,7 @@
 #define __CN10K_RX_H__
 
 #include 
+#include 
 
 #define NIX_RX_OFFLOAD_NONE (0)
 #define NIX_RX_OFFLOAD_RSS_F BIT(0)
@@ -263,6 +264,224 @@ cn10k_nix_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t pkts,
return nb_pkts;
 }
 
+#if defined(RTE_ARCH_ARM64)
+
+static __rte_always_inline uint16_t
+cn10k_nix_recv_pkts_vector(void *rx_queue, struct rte_mbuf **rx_pkts,
+  uint16_t pkts, const uint16_t flags)
+{
+   struct cn10k_eth_rxq *rxq = rx_queue;
+   uint16_t packets = 0;
+   uint64x2_t cq0_w8, cq1_w8, cq2_w8, cq3_w8, mbuf01, mbuf23;
+   const uint64_t mbuf_initializer = rxq->mbuf_initializer;
+   const uint64x2_t data_off = vdupq_n_u64(rxq->data_off);
+   uint64_t ol_flags0, ol_flags1, ol_flags2, ol_flags3;
+   uint64x2_t rearm0 = vdupq_n_u64(mbuf_initializer);
+   uint64x2_t rearm1 = vdupq_n_u64(mbuf_initializer);
+   uint64x2_t rearm2 = vdupq_n_u64(mbuf_initializer);
+   uint64x2_t rearm3 = vdupq_n_u64(mbuf_initializer);
+   struct rte_mbuf *mbuf0, *mbuf1, *mbuf2, *mbuf3;
+   const uint16_t *lookup_mem = rxq->lookup_mem;
+   const uint32_t qmask = rxq->qmask;
+   const uint64_t wdata = rxq->wdata;
+   const uintptr_t desc = rxq->desc;
+   uint8x16_t f0, f1, f2, f3;
+   uint32_t head = rxq->head;
+   uint16_t pkts_left;
+
+   pkts = nix_rx_nb_pkts(rxq, wdata, pkts, qmask);
+   pkts_left = pkts & (NIX_DESCS_PER_LOOP - 1);
+
+   /* Packets have to be floor-aligned to NIX_DESCS_PER_LOOP */
+   pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
+
+   while (packets < pkts) {
+   /* Exit loop if head is about to wrap and become unaligned */
+   if (((head + NIX_DESCS_PER_LOOP - 1) & qmask) <
+   NIX_DESCS_PER_LOOP) {
+   pkts_left += (pkts - packets);
+   break;
+   }
+
+   const uintptr_t cq0 = desc + CQE_SZ(head);
+
+   /* Prefetch N desc ahead */
+   rte_prefetch_non_temporal((void *)(cq0 + CQE_SZ(8)));
+   rte_prefetch_non_temporal((void *)(cq0 + CQE_SZ(9)));
+   rte_prefetch_non_temporal((void *)(cq0 + CQE_SZ(10)));
+   rte_prefetch_non_temporal((void *)(cq0 + CQE_SZ(11)));
+
+   /* Get NIX_RX_SG_S for size and buffer pointer */
+   cq0_w8 = vld1q_u64((uint64_t *)(cq0 + CQE_SZ(0) + 64));
+   cq1_w8 = vld1q_u64((uint64_t *)(cq0 + CQE_SZ(1) + 64));
+   cq2_w8 = vld1q_u64((uint64_t *)(cq0 + CQE_SZ(2) + 64));
+   cq3_w8 = vld1q_u64((uint64_t *)(cq0 + CQE_SZ(3) + 64));
+
+   /* Extract mbuf from NIX_RX_SG_S */
+   mbuf01 = vzip2q_u64(cq0_w8, cq1_w8);
+   mbuf23 = vzip2q_u64(cq2_w8, cq3_w8);
+   mbuf01 = vqsubq_u64(mbuf01, data_off);
+   mbuf23 = vqsubq_u64(mbuf23, data_off);
+
+   /* Move mbufs to scalar registers for future use */
+   mbuf0 = (struct rte_mbuf *)vgetq_lane_u64(mbuf01, 0);
+   mbuf1

[dpdk-dev] [PATCH v4 28/62] net/cnxk: add Tx burst for cn10k

2021-06-22 Thread Nithin Dabilpuram
From: Jerin Jacob 

Add Tx burst scalar version for CN10K.

Signed-off-by: Jerin Jacob 
Signed-off-by: Nithin Dabilpuram 
Signed-off-by: Pavan Nikhilesh 
Signed-off-by: Harman Kalra 
---
 doc/guides/nics/cnxk.rst  |   1 +
 doc/guides/nics/features/cnxk.ini |   7 +
 doc/guides/nics/features/cnxk_vec.ini |   6 +
 doc/guides/nics/features/cnxk_vf.ini  |   7 +
 drivers/net/cnxk/cn10k_ethdev.h   |   1 +
 drivers/net/cnxk/cn10k_tx.c   |  54 
 drivers/net/cnxk/cn10k_tx.h   | 491 ++
 drivers/net/cnxk/meson.build  |   7 +-
 8 files changed, 571 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/cnxk/cn10k_tx.c

diff --git a/doc/guides/nics/cnxk.rst b/doc/guides/nics/cnxk.rst
index 481bc7e..17da141 100644
--- a/doc/guides/nics/cnxk.rst
+++ b/doc/guides/nics/cnxk.rst
@@ -22,6 +22,7 @@ Features of the CNXK Ethdev PMD are:
 - Lock-free Tx queue
 - Multiple queues for TX and RX
 - Receiver Side Scaling (RSS)
+- Inner and Outer Checksum offload
 - Link state information
 - Scatter-Gather IO support
 - Vector Poll mode driver
diff --git a/doc/guides/nics/features/cnxk.ini 
b/doc/guides/nics/features/cnxk.ini
index 23564b7..02be26b 100644
--- a/doc/guides/nics/features/cnxk.ini
+++ b/doc/guides/nics/features/cnxk.ini
@@ -12,11 +12,18 @@ Link status  = Y
 Link status event= Y
 Runtime Rx queue setup = Y
 Runtime Tx queue setup = Y
+Fast mbuf free   = Y
+Free Tx mbuf on demand = Y
 Queue start/stop = Y
+TSO  = Y
 RSS hash = Y
Inner RSS = Y
 Jumbo frame  = Y
 Scattered Rx = Y
+L3 checksum offload  = Y
+L4 checksum offload  = Y
+Inner L3 checksum= Y
+Inner L4 checksum= Y
 Packet type parsing  = Y
Linux = Y
ARMv8 = Y
diff --git a/doc/guides/nics/features/cnxk_vec.ini 
b/doc/guides/nics/features/cnxk_vec.ini
index 421048d..8c63853 100644
--- a/doc/guides/nics/features/cnxk_vec.ini
+++ b/doc/guides/nics/features/cnxk_vec.ini
@@ -12,10 +12,16 @@ Link status  = Y
 Link status event= Y
 Runtime Rx queue setup = Y
 Runtime Tx queue setup = Y
+Fast mbuf free   = Y
+Free Tx mbuf on demand = Y
 Queue start/stop = Y
 RSS hash = Y
Inner RSS = Y
 Jumbo frame  = Y
+L3 checksum offload  = Y
+L4 checksum offload  = Y
+Inner L3 checksum= Y
+Inner L4 checksum= Y
 Packet type parsing  = Y
Linux = Y
ARMv8 = Y
diff --git a/doc/guides/nics/features/cnxk_vf.ini 
b/doc/guides/nics/features/cnxk_vf.ini
index e901fa2..a1bd49b 100644
--- a/doc/guides/nics/features/cnxk_vf.ini
+++ b/doc/guides/nics/features/cnxk_vf.ini
@@ -11,11 +11,18 @@ Link status  = Y
 Link status event= Y
 Runtime Rx queue setup = Y
 Runtime Tx queue setup = Y
+Fast mbuf free   = Y
+Free Tx mbuf on demand = Y
 Queue start/stop = Y
+TSO  = Y
 RSS hash = Y
Inner RSS = Y
 Jumbo frame  = Y
 Scattered Rx = Y
+L3 checksum offload  = Y
+L4 checksum offload  = Y
+Inner L3 checksum= Y
+Inner L4 checksum= Y
 Packet type parsing  = Y
Linux = Y
ARMv8 = Y
diff --git a/drivers/net/cnxk/cn10k_ethdev.h b/drivers/net/cnxk/cn10k_ethdev.h
index 596985f..d39ca31 100644
--- a/drivers/net/cnxk/cn10k_ethdev.h
+++ b/drivers/net/cnxk/cn10k_ethdev.h
@@ -35,5 +35,6 @@ struct cn10k_eth_rxq {
 
 /* Rx and Tx routines */
 void cn10k_eth_set_rx_function(struct rte_eth_dev *eth_dev);
+void cn10k_eth_set_tx_function(struct rte_eth_dev *eth_dev);
 
 #endif /* __CN10K_ETHDEV_H__ */
diff --git a/drivers/net/cnxk/cn10k_tx.c b/drivers/net/cnxk/cn10k_tx.c
new file mode 100644
index 000..13c605f
--- /dev/null
+++ b/drivers/net/cnxk/cn10k_tx.c
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+#include "cn10k_ethdev.h"
+#include "cn10k_tx.h"
+
+#define T(name, f4, f3, f2, f1, f0, sz, flags)\
+   uint16_t __rte_noinline __rte_hot cn10k_nix_xmit_pkts_##name(  \
+   void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts)  \
+   {  \
+   uint64_t cmd[sz];  \
+  \
+   /* For TSO inner checksum is a must */ \
+   if (((flags) & NIX_TX_OFFLOAD_TSO_F) &&\
+   !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F))  \
+   return 0;  \
+   return cn10k_nix_xmit_pkts(tx_queue, tx_pkts, pkts, cmd,   \
+  flags); \
+   }
+
+NIX_TX_FASTPATH_MODES
+#undef T
+
+static inline void
+pi

[dpdk-dev] [PATCH v4 29/62] net/cnxk: add Tx multi-segment version for cn10k

2021-06-22 Thread Nithin Dabilpuram
Add Tx burst multi-segment version for CN10K.

Signed-off-by: Nithin Dabilpuram 
Signed-off-by: Pavan Nikhilesh 
---
 drivers/net/cnxk/cn10k_tx.c  |  18 -
 drivers/net/cnxk/cn10k_tx.h  | 171 +++
 drivers/net/cnxk/cn10k_tx_mseg.c |  25 ++
 drivers/net/cnxk/meson.build |   3 +-
 4 files changed, 215 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/cnxk/cn10k_tx_mseg.c

diff --git a/drivers/net/cnxk/cn10k_tx.c b/drivers/net/cnxk/cn10k_tx.c
index 13c605f..9803002 100644
--- a/drivers/net/cnxk/cn10k_tx.c
+++ b/drivers/net/cnxk/cn10k_tx.c
@@ -40,6 +40,8 @@ pick_tx_func(struct rte_eth_dev *eth_dev,
 void
 cn10k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
 {
+   struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
const eth_tx_burst_t nix_eth_tx_burst[2][2][2][2][2] = {
 #define T(name, f4, f3, f2, f1, f0, sz, flags) \
[f4][f3][f2][f1][f0] = cn10k_nix_xmit_pkts_##name,
@@ -48,7 +50,21 @@ cn10k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
 #undef T
};
 
-   pick_tx_func(eth_dev, nix_eth_tx_burst);
+   const eth_tx_burst_t nix_eth_tx_burst_mseg[2][2][2][2][2] = {
+#define T(name, f4, f3, f2, f1, f0, sz, flags) \
+   [f4][f3][f2][f1][f0] = cn10k_nix_xmit_pkts_mseg_##name,
+
+   NIX_TX_FASTPATH_MODES
+#undef T
+   };
+
+   if (dev->scalar_ena ||
+   (dev->tx_offload_flags &
+(NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSO_F)))
+   pick_tx_func(eth_dev, nix_eth_tx_burst);
+
+   if (dev->tx_offloads & DEV_TX_OFFLOAD_MULTI_SEGS)
+   pick_tx_func(eth_dev, nix_eth_tx_burst_mseg);
 
rte_mb();
 }
diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h
index c54fbfe..63e9848 100644
--- a/drivers/net/cnxk/cn10k_tx.h
+++ b/drivers/net/cnxk/cn10k_tx.h
@@ -339,6 +339,77 @@ cn10k_nix_xmit_prepare(struct rte_mbuf *m, uint64_t *cmd, uintptr_t lmt_addr,
 }
 
 static __rte_always_inline uint16_t
+cn10k_nix_prepare_mseg(struct rte_mbuf *m, uint64_t *cmd, const uint16_t flags)
+{
+   struct nix_send_hdr_s *send_hdr;
+   union nix_send_sg_s *sg;
+   struct rte_mbuf *m_next;
+   uint64_t *slist, sg_u;
+   uint64_t nb_segs;
+   uint64_t segdw;
+   uint8_t off, i;
+
+   send_hdr = (struct nix_send_hdr_s *)cmd;
+   send_hdr->w0.total = m->pkt_len;
+   send_hdr->w0.aura = roc_npa_aura_handle_to_aura(m->pool->pool_id);
+
+   if (flags & NIX_TX_NEED_EXT_HDR)
+   off = 2;
+   else
+   off = 0;
+
+   sg = (union nix_send_sg_s *)&cmd[2 + off];
+   /* Clear sg->u header before use */
+   sg->u &= 0xFC00;
+   sg_u = sg->u;
+   slist = &cmd[3 + off];
+
+   i = 0;
+   nb_segs = m->nb_segs;
+
+   /* Fill mbuf segments */
+   do {
+   m_next = m->next;
+   sg_u = sg_u | ((uint64_t)m->data_len << (i << 4));
+   *slist = rte_mbuf_data_iova(m);
+   /* Set invert df if buffer is not to be freed by H/W */
+   if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F)
+   sg_u |= (cnxk_nix_prefree_seg(m) << (i + 55));
+   /* Mark mempool object as "put" since it is freed by NIX
+*/
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+   if (!(sg_u & (1ULL << (i + 55
+   __mempool_check_cookies(m->pool, (void **)&m, 1, 0);
+#endif
+   slist++;
+   i++;
+   nb_segs--;
+   if (i > 2 && nb_segs) {
+   i = 0;
+   /* Next SG subdesc */
+   *(uint64_t *)slist = sg_u & 0xFC00;
+   sg->u = sg_u;
+   sg->segs = 3;
+   sg = (union nix_send_sg_s *)slist;
+   sg_u = sg->u;
+   slist++;
+   }
+   m = m_next;
+   } while (nb_segs);
+
+   sg->u = sg_u;
+   sg->segs = i;
+   segdw = (uint64_t *)slist - (uint64_t *)&cmd[2 + off];
+   /* Roundup extra dwords to multiple of 2 */
+   segdw = (segdw >> 1) + (segdw & 0x1);
+   /* Default dwords */
+   segdw += (off >> 1) + 1;
+   send_hdr->w0.sizem1 = segdw - 1;
+
+   return segdw;
+}
+
+static __rte_always_inline uint16_t
 cn10k_nix_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts,
uint64_t *cmd, const uint16_t flags)
 {
@@ -421,6 +492,103 @@ cn10k_nix_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts,
return pkts;
 }
 
+static __rte_always_inline uint16_t
+cn10k_nix_xmit_pkts_mseg(void *tx_queue, struct rte_mbuf **tx_pkts,
+uint16_t pkts, uint64_t *cmd, const uint16_t flags)
+{
+   struct cn10k_eth_txq *txq = tx_queue;
+   uintptr_t pa0, pa1, lmt_addr = txq->lmt

[dpdk-dev] [PATCH v4 30/62] net/cnxk: add Tx vector version for cn10k

2021-06-22 Thread Nithin Dabilpuram
Add Tx burst vector version for CN10K.

Signed-off-by: Nithin Dabilpuram 
Signed-off-by: Pavan Nikhilesh 
---
 drivers/net/cnxk/cn10k_tx.c |  10 +
 drivers/net/cnxk/cn10k_tx.h | 815 
 drivers/net/cnxk/cn10k_tx_vec.c |  25 ++
 drivers/net/cnxk/meson.build|   3 +-
 4 files changed, 852 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/cnxk/cn10k_tx_vec.c

diff --git a/drivers/net/cnxk/cn10k_tx.c b/drivers/net/cnxk/cn10k_tx.c
index 9803002..e6eb101 100644
--- a/drivers/net/cnxk/cn10k_tx.c
+++ b/drivers/net/cnxk/cn10k_tx.c
@@ -58,10 +58,20 @@ cn10k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
 #undef T
};
 
+   const eth_tx_burst_t nix_eth_tx_vec_burst[2][2][2][2][2] = {
+#define T(name, f4, f3, f2, f1, f0, sz, flags) \
+   [f4][f3][f2][f1][f0] = cn10k_nix_xmit_pkts_vec_##name,
+
+   NIX_TX_FASTPATH_MODES
+#undef T
+   };
+
if (dev->scalar_ena ||
(dev->tx_offload_flags &
 (NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSO_F)))
pick_tx_func(eth_dev, nix_eth_tx_burst);
+   else
+   pick_tx_func(eth_dev, nix_eth_tx_vec_burst);
 
if (dev->tx_offloads & DEV_TX_OFFLOAD_MULTI_SEGS)
pick_tx_func(eth_dev, nix_eth_tx_burst_mseg);
diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h
index 63e9848..b74df10 100644
--- a/drivers/net/cnxk/cn10k_tx.h
+++ b/drivers/net/cnxk/cn10k_tx.h
@@ -4,6 +4,8 @@
 #ifndef __CN10K_TX_H__
 #define __CN10K_TX_H__
 
+#include 
+
 #define NIX_TX_OFFLOAD_NONE  (0)
 #define NIX_TX_OFFLOAD_L3_L4_CSUM_F   BIT(0)
 #define NIX_TX_OFFLOAD_OL3_OL4_CSUM_F BIT(1)
@@ -38,6 +40,9 @@
}  \
} while (0)
 
+#define LMT_OFF(lmt_addr, lmt_num, offset)                                    \
+   (void *)((lmt_addr) + ((lmt_num) << ROC_LMT_LINE_SIZE_LOG2) + (offset))
+
 /* Function to determine no of tx subdesc required in case ext
  * sub desc is enabled.
  */
@@ -48,6 +53,14 @@ cn10k_nix_tx_ext_subs(const uint16_t flags)
(NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSO_F)) ? 1 : 0;
 }
 
+static __rte_always_inline uint8_t
+cn10k_nix_pkts_per_vec_brst(const uint16_t flags)
+{
+   RTE_SET_USED(flags);
+   /* We can pack up to 4 packets per LMTLINE if there are no offloads. */
+   return 4 << ROC_LMT_LINES_PER_CORE_LOG2;
+}
+
 static __rte_always_inline uint64_t
 cn10k_nix_tx_steor_data(const uint16_t flags)
 {
@@ -76,6 +89,35 @@ cn10k_nix_tx_steor_data(const uint16_t flags)
return data;
 }
 
+static __rte_always_inline uint64_t
+cn10k_nix_tx_steor_vec_data(const uint16_t flags)
+{
+   const uint64_t dw_m1 = 0x7;
+   uint64_t data;
+
+   RTE_SET_USED(flags);
+   /* This will be moved to addr area */
+   data = dw_m1;
+   /* 15 vector sizes for single seg */
+   data |= dw_m1 << 19;
+   data |= dw_m1 << 22;
+   data |= dw_m1 << 25;
+   data |= dw_m1 << 28;
+   data |= dw_m1 << 31;
+   data |= dw_m1 << 34;
+   data |= dw_m1 << 37;
+   data |= dw_m1 << 40;
+   data |= dw_m1 << 43;
+   data |= dw_m1 << 46;
+   data |= dw_m1 << 49;
+   data |= dw_m1 << 52;
+   data |= dw_m1 << 55;
+   data |= dw_m1 << 58;
+   data |= dw_m1 << 61;
+
+   return data;
+}
+
 static __rte_always_inline void
 cn10k_nix_tx_skeleton(const struct cn10k_eth_txq *txq, uint64_t *cmd,
  const uint16_t flags)
@@ -589,6 +631,776 @@ cn10k_nix_xmit_pkts_mseg(void *tx_queue, struct rte_mbuf **tx_pkts,
return pkts;
 }
 
+#if defined(RTE_ARCH_ARM64)
+
+#define NIX_DESCS_PER_LOOP 4
+static __rte_always_inline uint16_t
+cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts,
+  uint16_t pkts, uint64_t *cmd, const uint16_t flags)
+{
+   uint64x2_t dataoff_iova0, dataoff_iova1, dataoff_iova2, dataoff_iova3;
+   uint64x2_t len_olflags0, len_olflags1, len_olflags2, len_olflags3;
+   uint64x2_t cmd0[NIX_DESCS_PER_LOOP], cmd1[NIX_DESCS_PER_LOOP];
+   uint64_t *mbuf0, *mbuf1, *mbuf2, *mbuf3, data, pa;
+   uint64x2_t senddesc01_w0, senddesc23_w0;
+   uint64x2_t senddesc01_w1, senddesc23_w1;
+   uint16_t left, scalar, burst, i, lmt_id;
+   uint64x2_t sgdesc01_w0, sgdesc23_w0;
+   uint64x2_t sgdesc01_w1, sgdesc23_w1;
+   struct cn10k_eth_txq *txq = tx_queue;
+   uintptr_t laddr = txq->lmt_base;
+   rte_iova_t io_addr = txq->io_addr;
+   uint64x2_t ltypes01, ltypes23;
+   uint64x2_t xtmp128, ytmp128;
+   uint64x2_t xmask01, xmask23;
+   uint8_t lnum;
+
+   NIX_XMIT_FC_OR_RETURN(txq, pkts);
+
+   scalar = pkts & (NIX_DESCS_PER_LOOP - 1);
+   pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
+
+   /* Reduce the cached count */
+   txq->fc_cache_pkts -= pkts;
+
+   senddesc01_w0 = vl

[dpdk-dev] [PATCH v4 31/62] net/cnxk: add device start and stop operations

2021-06-22 Thread Nithin Dabilpuram
Add device start and stop operation callbacks for
CN9K and CN10K. Device stop is common for both platforms,
while device start has a platform dependent portion where
the platform specific offload flags are recomputed and
the right Rx/Tx burst function is chosen.

Signed-off-by: Nithin Dabilpuram 
---
 doc/guides/nics/cnxk.rst|  84 ++
 drivers/net/cnxk/cn10k_ethdev.c | 124 +++
 drivers/net/cnxk/cn9k_ethdev.c  | 127 
 drivers/net/cnxk/cnxk_ethdev.c  |  90 
 drivers/net/cnxk/cnxk_ethdev.h  |   2 +
 drivers/net/cnxk/cnxk_link.c|  11 
 6 files changed, 438 insertions(+)

diff --git a/doc/guides/nics/cnxk.rst b/doc/guides/nics/cnxk.rst
index 17da141..15911ee 100644
--- a/doc/guides/nics/cnxk.rst
+++ b/doc/guides/nics/cnxk.rst
@@ -39,6 +39,58 @@ Driver compilation and testing
 Refer to the document :ref:`compiling and testing a PMD for a NIC `
 for details.
 
+#. Running testpmd:
+
+   Follow instructions available in the document
+   :ref:`compiling and testing a PMD for a NIC `
+   to run testpmd.
+
+   Example output:
+
+   .. code-block:: console
+
  .//app/dpdk-testpmd -c 0xc -a 0002:02:00.0 -- --portmask=0x1 --nb-cores=1 --port-topology=loop --rxq=1 --txq=1
+  EAL: Detected 4 lcore(s)
+  EAL: Detected 1 NUMA nodes
+  EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
+  EAL: Selected IOVA mode 'VA'
+  EAL: No available hugepages reported in hugepages-16777216kB
+  EAL: No available hugepages reported in hugepages-2048kB
+  EAL: Probing VFIO support...
+  EAL: VFIO support initialized
+  EAL:   using IOMMU type 1 (Type 1)
+  [ 2003.202721] vfio-pci 0002:02:00.0: vfio_cap_init: hiding cap 0x14@0x98
  EAL: Probe PCI driver: net_cn10k (177d:a063) device: 0002:02:00.0 (socket 0)
+  PMD: RoC Model: cn10k
+  EAL: No legacy callbacks, legacy socket not created
  testpmd: create a new mbuf pool : n=155456, size=2176, socket=0
+  testpmd: preferred mempool ops selected: cn10k_mempool_ops
+  Configuring Port 0 (socket 0)
+  PMD: Port 0: Link Up - speed 25000 Mbps - full-duplex
+
+  Port 0: link state change event
+  Port 0: 96:D4:99:72:A5:BF
+  Checking link statuses...
+  Done
+  No commandline core given, start packet forwarding
  io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP allocation mode: native
+  Logical Core 3 (socket 0) forwards packets on 1 streams:
+RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
+
+io packet forwarding packets/burst=32
+nb forwarding cores=1 - nb forwarding ports=1
+port 0: RX queue number: 1 Tx queue number: 1
+  Rx offloads=0x0 Tx offloads=0x1
+  RX queue: 0
+RX desc=4096 - RX free threshold=0
+RX threshold registers: pthresh=0 hthresh=0  wthresh=0
+RX Offloads=0x0
+  TX queue: 0
+TX desc=512 - TX free threshold=0
+TX threshold registers: pthresh=0 hthresh=0  wthresh=0
+TX offloads=0x0 - TX RS bit threshold=0
+  Press enter to exit
+
 Runtime Config Options
----------------------
 
@@ -132,3 +184,35 @@ Runtime Config Options
Above devarg parameters are configurable per device, user needs to pass the
parameters to all the PCIe devices if application requires to configure on
all the ethdev ports.
+
+Limitations
+-----------
+
+``mempool_cnxk`` external mempool handler dependency
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The OCTEON CN9K/CN10K SoC family NIC has inbuilt HW assisted external mempool manager.
+``net_cnxk`` pmd only works with ``mempool_cnxk`` mempool handler
+as it is the most effective way, performance wise, for packet allocation and Tx buffer
+recycling on the OCTEON CN9K/CN10K SoC platform.
+
+CRC stripping
+~
+
+The OCTEON CN9K/CN10K SoC family NICs strip the CRC for every packet being received by
+the host interface irrespective of the offload configuration.
+
+Debugging Options
+-
+
+.. _table_cnxk_ethdev_debug_options:
+
+.. table:: cnxk ethdev debug options
+
+   +---++---+
+   | # | Component  | EAL log command   |
+   +===++===+
+   | 1 | NIX| --log-level='pmd\.net.cnxk,8' |
+   +---++---+
+   | 2 | NPC| --log-level='pmd\.net.cnxk\.flow,8'   |
+   +---++---+
diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c
index d70ab00..5ff36bb 100644
--- a/drivers/net/cnxk/cn10k_ethdev.c
+++ b/drivers/net/cnxk/cn10k_ethdev.
