Re: [dpdk-dev] [PATCH] eal/linux: fix build when VFIO is disabled

2019-12-22 Thread Ali Alnubani
> -Original Message-
> From: David Marchand 
> Sent: Monday, December 16, 2019 7:22 PM
> To: Ali Alnubani 
> Cc: dev@dpdk.org; vattun...@marvell.com; sta...@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] eal/linux: fix build when VFIO is disabled
> 
> On Wed, Dec 11, 2019 at 7:26 PM Ali Alnubani 
> wrote:
> > The header linux/version.h isn't included when CONFIG_RTE_EAL_VFIO is
> > explicitly disabled. LINUX_VERSION_CODE and KERNEL_VERSION are
> > therefore undefined, causing the build failure:
> >
> >   lib/librte_eal/linux/eal/eal.c: In function ‘rte_eal_init’:
> >   lib/librte_eal/linux/eal/eal.c:1076:32: error: "LINUX_VERSION_CODE" is
> > not defined, evaluates to 0 [-Werror=undef]
> 
> The patch itself is ok.
> But, out of curiosity, why would you disable vfio support?

No reason. I noticed the build error by accident actually.
> 
> 
> --
> David Marchand

Regards,
Ali


Re: [dpdk-dev] [dpdk-users] Should "shmget" not be used to consume hugepages in DPDK?

2019-12-22 Thread Byonggon Chun
x-posting to dev mailing list.

Hi all.
I'm a Kubernetes contributor and I'm working on container isolation of
hugepages, which allows us to set a hugepages limit per container cgroup.
(At this point, limits are set on the pod-level cgroup even though we requested
hugepages as a container-level resource.)

I tore down testPMD and some parts of the DPDK library, and a question came
up after I found there is no usage of "shmget" in DPDK.

My question is: should "shmget" not be used to consume hugepages in
DPDK?
And here are the follow-up questions:
1) If we don't have to use "shmget", why? Does it affect performance?
2) If I use "shmget" to get hugepages, should I call the "mlock" syscall for it?

For more detail, as far as I know there are three ways to consume hugepages in
Kubernetes:
1) shmget with SHM_HUGETLB
2) mmap with hugetlbs filebacking
3) mmap with MAP_ANONYMOUS | MAP_HUGETLB

And I found that testPMD calls the mlock syscall when it maps anonymous
hugepages or externally allocated hugepages:
https://github.com/DPDK/dpdk/blob/924e55fb340623f03fdf2ff7fbcfd78819d1db25/app/test-pmd/testpmd.c#L896
https://github.com/DPDK/dpdk/blob/924e55fb340623f03fdf2ff7fbcfd78819d1db25/app/test-pmd/testpmd.c#L916

Thanks.





On Fri, Dec 20, 2019 at 9:42 PM Byonggon Chun 
wrote:

> > shmget is a legacy Unix API and there is no point in using it.
>
> Yeah, I agree with it,
> I also prefer to use mmap with hugetlbfs in a DPDK container.
>
> The reason why I started this mail thread is some DPDK users still use
> shmget to consume hugepages, and I just wanted to find a good rationale to
> convince them to use mmap.
>
> But, at this point, I have only one rationale: shmget is a legacy Unix
> API.
>
> On Fri, Dec 20, 2019 at 6:06 AM Stephen Hemminger <
> step...@networkplumber.org> wrote:
>
>> On Fri, 20 Dec 2019 01:23:50 +0900
>> Byonggon Chun  wrote:
>>
>> > Hi all.
>> > I'm Kubernetes contributors and I'm working to make container isolation
>> of
>> > hugepages that allows us to set hugepages limit per container cgroup.
>> > (At this point, limits are set on pod level cgroup even though we asked
>> > hugepages as the container level resource)
>> >
>> > I tore down testPMD and some parts of DPDK lib and I got a question
>> after i
>> > found there is no usage of "shmget" in DPDK.
>> >
>> > My question is that Should "shmget" not be used to consume hugepages in
>> > DPDK?
>> > And here is following questions:
>> > 1) If we don't have to use "shmget", Why? Does it affect performance?
>> > 2) If I use "shmget" to get hugepages, should I call "mlock" syscall
>> for it?
>> >
>> > For more details, as I know there are three ways to consume hugepages in
>> > kubernetes.
>> > 1) shmget with SHM_HUGETLB
>> > 2) mmap with hugetlbs filebacking
>> > 3) mmap with MAP_ANONYMOUS | MAP_HUGETLB
>> >
>> > And I found that testPMD calls mlock syscall when it maps an anonymous
>> > hugepages or external allocated hugepages.
>> >
>> https://github.com/DPDK/dpdk/blob/924e55fb340623f03fdf2ff7fbcfd78819d1db25/app/test-pmd/testpmd.c#L896
>> >
>> https://github.com/DPDK/dpdk/blob/924e55fb340623f03fdf2ff7fbcfd78819d1db25/app/test-pmd/testpmd.c#L916
>> >
>> > Thanks.
>>
>> shmget is a legacy Unix API and there is no point in using it.
>> For new applications libhugetlbfs is preferable.
>>
>


Re: [dpdk-dev] DPDK techboard minutes @18/12/2019

2019-12-22 Thread Matan Azrad
Hi

> From: Ananyev, Konstantin
> Minutes of Technical Board Meeting, 2019-12-18
> 
> Members Attending
> -
> -Bruce
> -Ferruh
> -Hemant
> -Honnappa
> -Jerin
> -Kevin
> -Konstantin (Chair)
> -Maxime
> -Olivier
> -Stephen
> -Thomas
> 
> NOTE: The technical board meets every second Wednesday on IRC
> channel #dpdk-board, at 3pm UTC. Meetings are public and DPDK
> community members are welcome to attend.
> 
> NOTE: Next meeting will be on Wednesday 2020-01-15 @3pm UTC, and will
> be chaired by Maxime Coquelin
> 
> 1) drivers/vdpa introduction discussion.
> Currently, at dpdk.org we have only the IFC vDPA driver available (located
> in drivers/net), but more drivers are arriving (Mellanox, Virtio vDPA in
> v20.02). Some others may be expected in later releases.
> Mellanox's proposal is to introduce a new vDPA subsystem (drivers/vdpa),
> and move code common to both net and vdpa PMDs to the drivers/common
> directory.
> drivers/net/ifc will be moved entirely into drivers/vdpa/ifc. While this
> seems to provide a better code layout for drivers in the long run, in the
> short term it would probably require significant code refactoring for both
> the mlx and virtio PMDs, and could cause some difficulties for stable
> release maintainers.
> 
> AR to Maxime (virtio) and Matan (mlx) to estimate how big code refactoring
> will be required for such move.

Steps:
common/mlx5:
Move the glue and DevX mechanism (4 files) from net/mlx5 to common/mlx5.
Move some shared functions/macros from net/mlx5 to common/mlx5.
Mainly move code and files as-is (even without name changes) and
update build files accordingly.

After introducing the new vdpa class:
Move the IFC code and its docs to the new class directories (update build
files accordingly).
Locate the new Mellanox vDPA driver in this class.

The plan is to send the class introduction and the IFC move this week.
Later on, introduce the mlx5 common dir and the new Mellanox vDPA driver
in the vdpa class - in ~3 weeks.

> AR to stable release maintainers (Kevin and Luca) to estimate impact on
> stable releases.
> Decision at next TB meeting based on provided estimations/feedback.
> 
> 2) AR from Thomas to other TB members to submit missing notes for
> previous TB meeting.
> 
> 3) AR for Thomas to add the approved email IDs to the security pre-release
> mailing list.
> Reply from Thomas: done.
 



[dpdk-dev] [PATCH] kni: fix kernel deadlock when using mlx devices

2019-12-22 Thread Stephen Hemminger
This fixes a deadlock when using KNI with bifurcated drivers.
Bringing the kni device up always times out when using Mellanox
devices.

The kernel KNI driver sends a message to userspace to complete
the request. In the case of a bifurcated driver, this may involve
an additional request to the kernel to change state. This request
would deadlock because KNI was holding the RTNL mutex.

This was a bad design which goes back to the original code.
A workaround is for the KNI driver to drop RTNL while waiting.
To prevent the device from disappearing while the operation
is in progress, it needs to hold a reference to the network device
while waiting.

As an added benefit, a useless error check can also be removed.

Fixes: 3fc5ca2f6352 ("kni: initial import")
Cc: sta...@dpdk.org
Signed-off-by: Stephen Hemminger 
---
 kernel/linux/kni/kni_net.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index 1ba9b1b99f66..b7337c1410b8 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -102,17 +103,15 @@ get_data_kva(struct kni_dev *kni, void *pkt_kva)
  * It can be called to process the request.
  */
 static int
-kni_net_process_request(struct kni_dev *kni, struct rte_kni_request *req)
+kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
 {
+   struct kni_dev *kni = netdev_priv(dev);
int ret = -1;
void *resp_va;
uint32_t num;
int ret_val;
 
-   if (!kni || !req) {
-   pr_err("No kni instance or request\n");
-   return -EINVAL;
-   }
+   ASSERT_RTNL();
 
mutex_lock(&kni->sync_lock);
 
@@ -125,8 +124,17 @@ kni_net_process_request(struct kni_dev *kni, struct 
rte_kni_request *req)
goto fail;
}
 
+   /* Since we need to wait and the RTNL mutex is held
+* drop the mutex and hold a reference to keep the device
+*/
+   dev_hold(dev);
+   rtnl_unlock();
+
ret_val = wait_event_interruptible_timeout(kni->wq,
kni_fifo_count(kni->resp_q), 3 * HZ);
+   rtnl_lock();
+   dev_put(dev);
+
if (signal_pending(current) || ret_val <= 0) {
ret = -ETIME;
goto fail;
@@ -155,7 +163,6 @@ kni_net_open(struct net_device *dev)
 {
int ret;
struct rte_kni_request req;
-   struct kni_dev *kni = netdev_priv(dev);
 
netif_start_queue(dev);
if (dflt_carrier == 1)
@@ -168,7 +175,7 @@ kni_net_open(struct net_device *dev)
 
/* Setting if_up to non-zero means up */
req.if_up = 1;
-   ret = kni_net_process_request(kni, &req);
+   ret = kni_net_process_request(dev, &req);
 
return (ret == 0) ? req.result : ret;
 }
@@ -178,7 +185,6 @@ kni_net_release(struct net_device *dev)
 {
int ret;
struct rte_kni_request req;
-   struct kni_dev *kni = netdev_priv(dev);
 
netif_stop_queue(dev); /* can't transmit any more */
netif_carrier_off(dev);
@@ -188,7 +194,7 @@ kni_net_release(struct net_device *dev)
 
/* Setting if_up to 0 means down */
req.if_up = 0;
-   ret = kni_net_process_request(kni, &req);
+   ret = kni_net_process_request(dev, &req);
 
return (ret == 0) ? req.result : ret;
 }
@@ -638,14 +644,13 @@ kni_net_change_mtu(struct net_device *dev, int new_mtu)
 {
int ret;
struct rte_kni_request req;
-   struct kni_dev *kni = netdev_priv(dev);
 
pr_debug("kni_net_change_mtu new mtu %d to be set\n", new_mtu);
 
memset(&req, 0, sizeof(req));
req.req_id = RTE_KNI_REQ_CHANGE_MTU;
req.new_mtu = new_mtu;
-   ret = kni_net_process_request(kni, &req);
+   ret = kni_net_process_request(dev, &req);
if (ret == 0 && req.result == 0)
dev->mtu = new_mtu;
 
@@ -656,7 +661,6 @@ static void
 kni_net_change_rx_flags(struct net_device *netdev, int flags)
 {
struct rte_kni_request req;
-   struct kni_dev *kni = netdev_priv(netdev);
 
memset(&req, 0, sizeof(req));
 
@@ -678,7 +682,7 @@ kni_net_change_rx_flags(struct net_device *netdev, int 
flags)
req.promiscusity = 0;
}
 
-   kni_net_process_request(kni, &req);
+   kni_net_process_request(netdev, &req);
 }
 
 /*
@@ -737,7 +741,6 @@ kni_net_set_mac(struct net_device *netdev, void *p)
 {
int ret;
struct rte_kni_request req;
-   struct kni_dev *kni;
struct sockaddr *addr = p;
 
memset(&req, 0, sizeof(req));
@@ -749,8 +752,7 @@ kni_net_set_mac(struct net_device *netdev, void *p)
memcpy(req.mac_addr, addr->sa_data, netdev->addr_len);
memcpy(netdev->dev_addr, addr->sa_data, netdev->addr_len);
 
-   kni = netdev_priv(netdev);
-   ret = kni_net_process_request(kni, &req);
+   ret = kni_net_process_request(netdev, &req);

Re: [dpdk-dev] [PATCH v2] net/virtio: packed ring notification data feature support

2019-12-22 Thread Jiang, Cheng1
Hi Gavin,

> -Original Message-
> From: Gavin Hu [mailto:gavin...@arm.com]
> Sent: Wednesday, December 18, 2019 5:52 PM
> To: Jiang, Cheng1 ; maxime.coque...@redhat.com;
> Bie, Tiwei ; Wang, Zhihong
> ; Liu, Yong 
> Cc: dev@dpdk.org; nd 
> Subject: RE: [dpdk-dev] [PATCH v2] net/virtio: packed ring notification data
> feature support
> 
> Hi Jiang,
> 
> > -Original Message-
> > From: dev  On Behalf Of Cheng Jiang
> > Sent: Wednesday, December 18, 2019 4:17 PM
> > To: maxime.coque...@redhat.com; tiwei@intel.com;
> > zhihong.w...@intel.com; yong@intel.com
> > Cc: dev@dpdk.org; Cheng Jiang 
> > Subject: [dpdk-dev] [PATCH v2] net/virtio: packed ring notification
> > data feature support
> >
> > This patch supports the feature that the driver passes extra data
> > (besides identifying the virtqueue) in its device notifications.
> Can the commit log be more specific about the extra data? Something like
> "expand the notification to include the avail index and avail wrap counter, if
> any"?

That makes more sense. I will add a more specific commit log in the next version.
Thank you.

> >
> > Signed-off-by: Cheng Jiang 
> > ---
> >
> > v2:
> > * Removed rte_unused attribute in *hw.
> > * Added some comments on notify_data.
> >
> >  drivers/net/virtio/virtio_ethdev.h |  3 ++-
> >  drivers/net/virtio/virtio_pci.c| 27 +--
> >  drivers/net/virtio/virtio_pci.h|  6 ++
> >  3 files changed, 33 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio/virtio_ethdev.h
> > b/drivers/net/virtio/virtio_ethdev.h
> > index a10111758..cd8947656 100644
> > --- a/drivers/net/virtio/virtio_ethdev.h
> > +++ b/drivers/net/virtio/virtio_ethdev.h
> > @@ -36,7 +36,8 @@
> >  1ULL << VIRTIO_F_IN_ORDER| \
> >  1ULL << VIRTIO_F_RING_PACKED | \
> >  1ULL << VIRTIO_F_IOMMU_PLATFORM  | \
> > -1ULL << VIRTIO_F_ORDER_PLATFORM)
> > +1ULL << VIRTIO_F_ORDER_PLATFORM  | \
> > +1ULL << VIRTIO_F_NOTIFICATION_DATA)
> >
> >  #define VIRTIO_PMD_SUPPORTED_GUEST_FEATURES\
> > (VIRTIO_PMD_DEFAULT_GUEST_FEATURES |\
> > diff --git a/drivers/net/virtio/virtio_pci.c
> > b/drivers/net/virtio/virtio_pci.c index 4468e89cb..8b4e001a1 100644
> > --- a/drivers/net/virtio/virtio_pci.c
> > +++ b/drivers/net/virtio/virtio_pci.c
> > @@ -416,9 +416,32 @@ modern_del_queue(struct virtio_hw *hw, struct
> > virtqueue *vq)  }
> >
> >  static void
> > -modern_notify_queue(struct virtio_hw *hw __rte_unused, struct
> > virtqueue
> > *vq)
> > +modern_notify_queue(struct virtio_hw *hw, struct virtqueue *vq)
> >  {
> > -   rte_write16(vq->vq_queue_index, vq->notify_addr);
> > +   uint32_t notify_data;
> > +
> > +   if (!vtpci_with_feature(hw, VIRTIO_F_NOTIFICATION_DATA)) {
> > +   rte_write16(vq->vq_queue_index, vq->notify_addr);
> > +   return;
> > +   }
> > +
> > +   if (vtpci_with_feature(hw, VIRTIO_F_RING_PACKED)) {
> > +   /*
> > +* Bit[0:15]: vq queue index
> > +* Bit[16:30]: avail index
> > +* Bit[31]: avail wrap counter
> > +*/
> > +   notify_data = ((((uint32_t)vq->vq_packed.used_wrap_counter << 15) |
> Why not do ' used_wrap_counter << 31' to be more straightforward and
> matches the above comment?

Sure, you're right. I'll modify the patch in the next version. 

Thanks again.
Cheng

> > +   vq->vq_avail_idx) << 16) | vq->vq_queue_index;
> 
> > +   } else {
> > +   /*
> > +* Bit[0:15]: vq queue index
> > +* Bit[16:31]: avail index
> > +*/
> > +   notify_data = ((uint32_t)vq->vq_avail_idx << 16) |
> > +   vq->vq_queue_index;
> > +   }
> > +   rte_write32(notify_data, vq->notify_addr);
> >  }
> >
> >  const struct virtio_pci_ops modern_ops = { diff --git
> > a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
> > index a38cb45ad..7433d2f08 100644
> > --- a/drivers/net/virtio/virtio_pci.h
> > +++ b/drivers/net/virtio/virtio_pci.h
> > @@ -135,6 +135,12 @@ struct virtnet_ctl;
> >   */
> >  #define VIRTIO_F_ORDER_PLATFORM 36
> >
> > +/*
> > + * This feature indicates that the driver passes extra data (besides
> > + * identifying the virtqueue) in its device notifications.
> > + */
> > +#define VIRTIO_F_NOTIFICATION_DATA 38
> > +
> >  /* The Guest publishes the used index for which it expects an interrupt
> >   * at the end of the avail ring. Host should ignore the avail->flags
> > field. */
> >  /* The Host publishes the avail index for which it expects a kick
> > --
> > 2.17.1



Re: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue

2019-12-22 Thread Wu, Jingjing



> -Original Message-
> From: Li, Xiaoyun 
> Sent: Monday, December 16, 2019 9:59 AM
> To: Wu, Jingjing 
> Cc: dev@dpdk.org; Maslekar, Omkar ; Li, Xiaoyun
> ; sta...@dpdk.org
> Subject: [PATCH v2] raw/ntb: fix write memory barrier issue
> 
> All buffers and ring info should be written before the tail register update.
> This patch relocates the write memory barrier to before the tail register
> update, to avoid potential issues.
> 
> Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Xiaoyun Li 
Acked-by: Jingjing Wu 

> ---
> v2:
>  * Replaced rte_wmb with rte_io_wmb since rte_io_wmb is enough.
> ---
>  drivers/raw/ntb/ntb.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
> index ad7f6abfd..c7de86f36 100644
> --- a/drivers/raw/ntb/ntb.c
> +++ b/drivers/raw/ntb/ntb.c
> @@ -683,8 +683,8 @@ ntb_enqueue_bufs(struct rte_rawdev *dev,
>  sizeof(struct ntb_used) * nb1);
>   rte_memcpy(txq->tx_used_ring, tx_used + nb1,
>  sizeof(struct ntb_used) * nb2);
> + rte_io_wmb();
>   *txq->used_cnt = txq->last_used;
> - rte_wmb();
> 
>   /* update queue stats */
>   hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes;
> @@ -789,8 +789,8 @@ ntb_dequeue_bufs(struct rte_rawdev *dev,
>  sizeof(struct ntb_desc) * nb1);
>   rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1,
>  sizeof(struct ntb_desc) * nb2);
> + rte_io_wmb();
>   *rxq->avail_cnt = rxq->last_avail;
> - rte_wmb();
> 
>   /* update queue stats */
>   off = NTB_XSTATS_NUM * ((size_t)context + 1);
> --
> 2.17.1



Re: [dpdk-dev] [PATCH] doc: fix a typo in ntb guide

2019-12-22 Thread Wu, Jingjing



> -Original Message-
> From: Li, Xiaoyun 
> Sent: Wednesday, December 4, 2019 11:20 PM
> To: Wu, Jingjing 
> Cc: dev@dpdk.org; Li, Xiaoyun ; sta...@dpdk.org
> Subject: [PATCH] doc: fix a typo in ntb guide
> 
> In prerequisites of ntb guide, the correct flag when loading igb_uio
> module should be `wc_activate=1`, not `wc_active=1`.
> 
> Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Xiaoyun Li 
Acked-by: Jingjing Wu 

> ---
>  doc/guides/rawdevs/ntb.rst | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/doc/guides/rawdevs/ntb.rst b/doc/guides/rawdevs/ntb.rst
> index 58472135f..aa7d80964 100644
> --- a/doc/guides/rawdevs/ntb.rst
> +++ b/doc/guides/rawdevs/ntb.rst
> @@ -52,11 +52,11 @@ NTB PMD needs kernel PCI driver to support write 
> combining (WC) to
> get
>  better performance. The difference will be more than 10 times.
>  To enable WC, there are 2 ways.
> 
> -- Insert igb_uio with ``wc_active=1`` flag if use igb_uio driver.
> +- Insert igb_uio with ``wc_activate=1`` flag if use igb_uio driver.
> 
>  .. code-block:: console
> 
> -  insmod igb_uio.ko wc_active=1
> +  insmod igb_uio.ko wc_activate=1
> 
>  - Enable WC for NTB device's Bar 2 and Bar 4 (Mapped memory) manually.
>The reference is https://www.kernel.org/doc/html/latest/x86/mtrr.html
> --
> 2.17.1



[dpdk-dev] [PATCH] net/i40e: fix TSO pkt exceeds allowed buf size issue

2019-12-22 Thread Xiaoyun Li
Hardware requires that the max buffer size per Tx descriptor be
(16K-1)B. So when TSO is enabled, the mbuf data size may exceed the
limit, which the NIC treats as malicious behaviour. This patch fixes
the issue by using more Tx descriptors for such large buffers.

Fixes: 4861cde46116 ("i40e: new poll mode driver")
Cc: sta...@dpdk.org

Signed-off-by: Xiaoyun Li 
---
 drivers/net/i40e/i40e_rxtx.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 17dc8c78f..8269dc022 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -989,6 +989,8 @@ i40e_set_tso_ctx(struct rte_mbuf *mbuf, union 
i40e_tx_offload tx_offload)
return ctx_desc;
 }
 
+/* HW requires that Tx buffer size ranges from 1B up to (16K-1)B. */
+#define I40E_MAX_DATA_PER_TXD  (16 * 1024 - 1)
 uint16_t
 i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
@@ -1046,8 +1048,15 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts, uint16_t nb_pkts)
 * The number of descriptors that must be allocated for
 * a packet equals to the number of the segments of that
 * packet plus 1 context descriptor if needed.
+* Recalculate the needed tx descs when TSO enabled in case
+* the mbuf data size exceeds max data size that hw allows
+* per tx desc.
 */
-   nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
+   if (ol_flags & PKT_TX_TCP_SEG)
+   nb_used = (uint16_t)(DIV_ROUND_UP(tx_pkt->data_len,
+I40E_MAX_DATA_PER_TXD) + nb_ctx);
+   else
+   nb_used = (uint16_t)(tx_pkt->nb_segs + nb_ctx);
tx_last = (uint16_t)(tx_id + nb_used - 1);
 
/* Circular ring */
@@ -1160,6 +1169,24 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts, uint16_t nb_pkts)
slen = m_seg->data_len;
buf_dma_addr = rte_mbuf_data_iova(m_seg);
 
+   while ((ol_flags & PKT_TX_TCP_SEG) &&
+   unlikely(slen > I40E_MAX_DATA_PER_TXD)) {
+   txd->buffer_addr =
+   rte_cpu_to_le_64(buf_dma_addr);
+   txd->cmd_type_offset_bsz =
+   i40e_build_ctob(td_cmd,
+   td_offset, I40E_MAX_DATA_PER_TXD,
+   td_tag);
+
+   buf_dma_addr += I40E_MAX_DATA_PER_TXD;
+   slen -= I40E_MAX_DATA_PER_TXD;
+
+   txe->last_id = tx_last;
+   tx_id = txe->next_id;
+   txe = txn;
+   txd = &txr[tx_id];
+   txn = &sw_ring[txe->next_id];
+   }
PMD_TX_LOG(DEBUG, "mbuf: %p, TDD[%u]:\n"
"buf_dma_addr: %#"PRIx64";\n"
"td_cmd: %#x;\n"
-- 
2.17.1



Re: [dpdk-dev] [PATCH v6 1/6] lib/eal: implement the family of rte bit operation APIs

2019-12-22 Thread Honnappa Nagarahalli


> 
> On Sat, 21 Dec 2019 16:07:23 +
> Honnappa Nagarahalli  wrote:
> 
> > Converting these into macros will help remove the size based duplication of
> APIs. I came up with the following macro:
> >
> > #define RTE_GET_BIT(nr, var, ret, memorder) \ ({ \
> > if (sizeof(var) == sizeof(uint32_t)) { \
> > uint32_t mask1 = 1U << (nr)%32; \
> > ret = __atomic_load_n(&var, (memorder)) & mask1;\
> > } \
> > else {\
> > uint64_t mask2 = 1UL << (nr)%64;\
> > ret = __atomic_load_n(&var, (memorder)) & mask2;\
> > } \
> > })
> 
> Macros are more error prone. Especially because this is in exposed header file
That's another question I have. Why do we need to have these APIs in a public
header file? They will add to the ABI burden as well. These APIs should be in
a common-but-not-public header file. I am also not sure how helpful these APIs
are for applications, as they seem to have considered requirements only
from the PMDs.


[dpdk-dev] [PATCH v3] net/virtio: packed ring notification data feature support

2019-12-22 Thread Cheng Jiang
This patch supports the feature that the driver passes extra data
(besides identifying the virtqueue) in its device notifications,
expanding the notifications to include the avail index and avail
wrap counter.

Signed-off-by: Cheng Jiang 
---

v3:
* Modified the commit log to make it more detailed.
* Modified the shift mode of notify_data to make it more intuitive.

v2:
* Removed rte_unused attribute in *hw.
* Added some comments on notify_data.

 drivers/net/virtio/virtio_ethdev.h |  3 ++-
 drivers/net/virtio/virtio_pci.c| 28 ++--
 drivers/net/virtio/virtio_pci.h|  6 ++
 3 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.h 
b/drivers/net/virtio/virtio_ethdev.h
index a10111758..cd8947656 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -36,7 +36,8 @@
 1ULL << VIRTIO_F_IN_ORDER| \
 1ULL << VIRTIO_F_RING_PACKED | \
 1ULL << VIRTIO_F_IOMMU_PLATFORM  | \
-1ULL << VIRTIO_F_ORDER_PLATFORM)
+1ULL << VIRTIO_F_ORDER_PLATFORM  | \
+1ULL << VIRTIO_F_NOTIFICATION_DATA)
 
 #define VIRTIO_PMD_SUPPORTED_GUEST_FEATURES\
(VIRTIO_PMD_DEFAULT_GUEST_FEATURES |\
diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 4468e89cb..dedd6bdda 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -416,9 +416,33 @@ modern_del_queue(struct virtio_hw *hw, struct virtqueue 
*vq)
 }
 
 static void
-modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq)
+modern_notify_queue(struct virtio_hw *hw, struct virtqueue *vq)
 {
-   rte_write16(vq->vq_queue_index, vq->notify_addr);
+   uint32_t notify_data;
+
+   if (!vtpci_with_feature(hw, VIRTIO_F_NOTIFICATION_DATA)) {
+   rte_write16(vq->vq_queue_index, vq->notify_addr);
+   return;
+   }
+
+   if (vtpci_with_feature(hw, VIRTIO_F_RING_PACKED)) {
+   /*
+* Bit[0:15]: vq queue index
+* Bit[16:30]: avail index
+* Bit[31]: avail wrap counter
+*/
+   notify_data = ((uint32_t)vq->vq_packed.used_wrap_counter << 31) |
+   ((uint32_t)vq->vq_avail_idx << 16) |
+   vq->vq_queue_index;
+   } else {
+   /*
+* Bit[0:15]: vq queue index
+* Bit[16:31]: avail index
+*/
+   notify_data = ((uint32_t)vq->vq_avail_idx << 16) |
+   vq->vq_queue_index;
+   }
+   rte_write32(notify_data, vq->notify_addr);
 }
 
 const struct virtio_pci_ops modern_ops = {
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index a38cb45ad..7433d2f08 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -135,6 +135,12 @@ struct virtnet_ctl;
  */
 #define VIRTIO_F_ORDER_PLATFORM 36
 
+/*
+ * This feature indicates that the driver passes extra data (besides
+ * identifying the virtqueue) in its device notifications.
+ */
+#define VIRTIO_F_NOTIFICATION_DATA 38
+
 /* The Guest publishes the used index for which it expects an interrupt
  * at the end of the avail ring. Host should ignore the avail->flags field. */
 /* The Host publishes the avail index for which it expects a kick
-- 
2.17.1



Re: [dpdk-dev] [PATCH v6 1/6] lib/eal: implement the family of rte bit operation APIs

2019-12-22 Thread Honnappa Nagarahalli
> 
> On Sat, 21 Dec 2019 16:07:23 +
> Honnappa Nagarahalli  wrote:
> 
> > Converting these into macros will help remove the size based duplication of
> APIs. I came up with the following macro:
> >
> > #define RTE_GET_BIT(nr, var, ret, memorder) \ ({ \
> > if (sizeof(var) == sizeof(uint32_t)) { \
> > uint32_t mask1 = 1U << (nr)%32; \
> > ret = __atomic_load_n(&var, (memorder)) & mask1;\
> > } \
> > else {\
> > uint64_t mask2 = 1UL << (nr)%64;\
> > ret = __atomic_load_n(&var, (memorder)) & mask2;\
> > } \
> > })
> 
> 
> Follow on if you want to do it as macros, then use typeof() to make the mask
> any size.
Yes, that makes it much simple
#define RTE_GET_BIT(nr, var, ret, memorder) ({ \
 typeof(var) mask; \
 if (sizeof(var) == sizeof(uint32_t)) { \
 mask = 1U << (nr)%32; \
 } else {\
 mask = 1UL << (nr)%64;\
 } \
 ret = __atomic_load_n(&var, (memorder)) & mask;\
})


Re: [dpdk-dev] [PATCH v3] net/virtio: packed ring notification data feature support

2019-12-22 Thread Gavin Hu
Reviewed-by: Gavin Hu 


[dpdk-dev] [PATCH v5] app/test: fix build when ring PMD is disabled

2019-12-22 Thread Reshma Pattan
Some unit tests have a dependency on the ring PMD,
so this patch fixes those and other
closely related issues.

1) pdump, latency, bitrate, ring PMD and test_event_eth_tx_adapter
unit tests depend on the ring PMD, so compile those
tests only when the ring PMD is enabled; otherwise skip them.

2) Get rid of the makefile error which was added by the bond unit test
for the ring-PMD-disabled case, which is not necessary.

3) The Tx adapter UT depends on the ring PMD, but it was
observed to be missing from the run in the meson
build, so it has been added. The Tx adapter UT uses the 'sw' event and
'null' PMD drivers, so for shared builds the drivers' .so
path has to be passed to the test args of the meson UT run.

Fixes: 086eb64db3 ("test/pdump: add unit test for pdump library")
Fixes: fdeb30fa71 ("test/bitrate: add unit tests for bitrate library")
Fixes: 1e3676a06e ("test/latency: add unit tests for latencystats library")
Fixes: 46cf97e4bb ("eventdev: add test for eth Tx adapter")
Fixes: d23e09e0ef ("app/test: link with ring pmd when needed")

CC: sta...@dpdk.org
CC: Nikhil Rao 
CC: Chas Williams 
CC: Bruce Richardson 
CC: Stephen Hemminger 

Reported-by: Stephen Hemminger 
Signed-off-by: Reshma Pattan 
---
v5: remove extra blank line.
v4: fix event_eth_tx_adapter_autotest for shared build
as reported by travis-ci
https://travis-ci.com/ovsrobot/dpdk/jobs/249598391
v3: add missing test event_eth_tx_adapter_autotest.
Add link bonding mode4 test to drivers test.
v2: fix comments of v1 and combine the patches 1/2 and 2/2 of v1
---
 app/test/Makefile| 16 +---
 app/test/meson.build | 36 ++--
 app/test/process.h   |  8 
 app/test/test.c  |  2 ++
 4 files changed, 37 insertions(+), 25 deletions(-)

diff --git a/app/test/Makefile b/app/test/Makefile
index 57930c00b..1ee155009 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -151,8 +151,12 @@ SRCS-y += test_func_reentrancy.c
 
 SRCS-y += test_service_cores.c
 
+ifeq ($(CONFIG_RTE_LIBRTE_PMD_RING),y)
+SRCS-y += sample_packet_forward.c
 SRCS-$(CONFIG_RTE_LIBRTE_BITRATE) += test_bitratestats.c
 SRCS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += test_latencystats.c
+SRCS-$(CONFIG_RTE_LIBRTE_PDUMP) += test_pdump.c
+endif
 
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_num.c
@@ -181,11 +185,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += 
test_distributor_perf.c
 
 SRCS-$(CONFIG_RTE_LIBRTE_REORDER) += test_reorder.c
 
-SRCS-$(CONFIG_RTE_LIBRTE_PDUMP) += test_pdump.c
-
 SRCS-y += virtual_pmd.c
 SRCS-y += packet_burst_generator.c
-SRCS-y += sample_packet_forward.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += test_acl.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_PMD_RING),y)
@@ -215,7 +216,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_EVENTDEV),y)
 SRCS-y += test_eventdev.c
 SRCS-y += test_event_ring.c
 SRCS-y += test_event_eth_rx_adapter.c
-SRCS-y += test_event_eth_tx_adapter.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_RING) += test_event_eth_tx_adapter.c
 SRCS-y += test_event_timer_adapter.c
 SRCS-y += test_event_crypto_adapter.c
 endif
@@ -268,13 +269,6 @@ endif
 endif
 endif
 
-# Link against shared libraries when needed
-ifeq ($(CONFIG_RTE_LIBRTE_PMD_BOND),y)
-ifneq ($(CONFIG_RTE_LIBRTE_PMD_RING),y)
-$(error Link bonding tests require CONFIG_RTE_LIBRTE_PMD_RING=y)
-endif
-endif
-
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
 
 ifeq ($(CONFIG_RTE_LIBRTE_PMD_BOND),y)
diff --git a/app/test/meson.build b/app/test/meson.build
index fb49d804b..e38d97d51 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -7,13 +7,11 @@ endif
 
 test_sources = files('commands.c',
'packet_burst_generator.c',
-   'sample_packet_forward.c',
'test.c',
'test_acl.c',
'test_alarm.c',
'test_atomic.c',
'test_barrier.c',
-   'test_bitratestats.c',
'test_bpf.c',
'test_byteorder.c',
'test_cmdline.c',
@@ -43,7 +41,6 @@ test_sources = files('commands.c',
'test_event_crypto_adapter.c',
'test_event_eth_rx_adapter.c',
'test_event_ring.c',
-   'test_event_eth_tx_adapter.c',
'test_event_timer_adapter.c',
'test_eventdev.c',
'test_external_mem.c',
@@ -65,9 +62,7 @@ test_sources = files('commands.c',
'test_ipsec_sad.c',
'test_kni.c',
'test_kvargs.c',
-   'test_latencystats.c',
'test_link_bonding.c',
-   'test_link_bonding_mode4.c',
'test_link_bonding_rssconf.c',
'test_logs.c',
'test_lpm.c',
@@ -88,11 +83,8 @@ test_sources = files('commands.c',
'test_metrics.c',
'test_mcslock.c',
'test_mp_secondary.c',
-   'test_pdump.c',
'test_per_lcore.c',
'test_pmd_perf.c',
-   'test_pmd_ring.c',
-   'test_pmd_ring_perf.c',
'test_power.c',
'test_power_cpufreq.c',
'test_power_kvm_vm.c',
@@ -212,7 +204,6 @@ fast_test_names = [
 'rib_autotest',
 'rib6_autotest',
 'ring_autotest',
-'ring_pmd_autotest',
 '

[dpdk-dev] [PATCH v3 0/2] add travis ci support for aarch64

2019-12-22 Thread Ruifeng Wang
This patch set enables native aarch64 builds in Travis CI.
It leverages Travis CI multi-arch support.

As the first step, compilation jobs are added.
Unit tests are not added for now due to service limitations. We are
planning to run unit tests with no-huge in the future.


v3:
Reverse patches order. Not depending on other patches.
Restore distribution designation for aarch64 native jobs.

v2:
Removed dist designation from matrix since base dist is changing.
Add explanation for library path issue.

Ruifeng Wang (2):
  devtools: add path to additional shared object files
  ci: add travis ci support for aarch64

 .ci/linux-setup.sh| 11 +++
 .travis.yml   | 42 +-
 devtools/test-null.sh |  2 +-
 3 files changed, 49 insertions(+), 6 deletions(-)

-- 
2.17.1



[dpdk-dev] [PATCH v3 2/2] ci: add travis ci support for aarch64

2019-12-22 Thread Ruifeng Wang
Add Travis compilation jobs for aarch64. gcc/clang compilations for
static/shared libraries are added.

Some limitations of the current aarch64 Travis support:
1. A container is used, so huge pages are not available for security reasons.
2. The kernel header package is missing in the Xenial distribution.

Solutions to address the limitations:
1. Do not add unit tests for now; run tests with no-huge in the future.
2. Use the Bionic distribution for all aarch64 jobs.

Signed-off-by: Ruifeng Wang 
Reviewed-by: Gavin Hu 
---
 .ci/linux-setup.sh | 11 +++
 .travis.yml| 42 +-
 2 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/.ci/linux-setup.sh b/.ci/linux-setup.sh
index dfb9d4a20..a92978037 100755
--- a/.ci/linux-setup.sh
+++ b/.ci/linux-setup.sh
@@ -3,7 +3,10 @@
 # need to install as 'root' since some of the unit tests won't run without it
 sudo python3 -m pip install --upgrade meson
 
-# setup hugepages
-cat /proc/meminfo
-sudo sh -c 'echo 1024 > /proc/sys/vm/nr_hugepages'
-cat /proc/meminfo
+# hugepage settings are skipped on aarch64 due to environment limitation
+if [ "$TRAVIS_ARCH" != "aarch64" ]; then
+# setup hugepages
+cat /proc/meminfo
+sudo sh -c 'echo 1024 > /proc/sys/vm/nr_hugepages'
+cat /proc/meminfo
+fi
diff --git a/.travis.yml b/.travis.yml
index 8f90d06f2..980c7605d 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -115,6 +115,46 @@ matrix:
   apt:
 packages:
   - *extra_packages
-
+  - env: DEF_LIB="static"
+arch: arm64
+compiler: gcc
+dist: bionic
+addons:
+  apt:
+packages:
+  - *required_packages
+  - env: DEF_LIB="shared"
+arch: arm64
+compiler: gcc
+dist: bionic
+addons:
+  apt:
+packages:
+  - *required_packages
+  - env: DEF_LIB="static"
+arch: arm64
+dist: bionic
+compiler: clang
+addons:
+  apt:
+packages:
+  - *required_packages
+  - env: DEF_LIB="shared"
+arch: arm64
+dist: bionic
+compiler: clang
+addons:
+  apt:
+packages:
+  - *required_packages
+  - env: DEF_LIB="shared" OPTS="-Denable_kmods=false" BUILD_DOCS=1
+arch: arm64
+compiler: gcc
+dist: bionic
+addons:
+  apt:
+packages:
+  - *required_packages
+  - *doc_packages
 
 script: ./.ci/${TRAVIS_OS_NAME}-build.sh
-- 
2.17.1



[dpdk-dev] [PATCH v3 1/2] devtools: add path to additional shared object files

2019-12-22 Thread Ruifeng Wang
Drivers librte_mempool_ring.so and librte_pmd_null.so are loaded by
librte_eal.so when running testpmd.
In Ubuntu Xenial, the driver path is recorded as RPATH in testpmd, which
allows librte_eal.so to find the drivers.
However, in Ubuntu Bionic, the driver path is recorded as RUNPATH instead.
Unlike RPATH, the RUNPATH of testpmd is not used when librte_eal.so
resolves its own dependencies, which leads to a driver load failure:

EAL: Detected 32 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: librte_mempool_ring.so: cannot open shared object file:
No such file or directory
EAL: FATAL: Cannot init plugins
EAL: Cannot init plugins

Add 'drivers' into LD_LIBRARY_PATH so that testpmd can find and make
use of these shared libraries.

Signed-off-by: Ruifeng Wang 
Reviewed-by: Gavin Hu 
---
 devtools/test-null.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/devtools/test-null.sh b/devtools/test-null.sh
index f39af2c06..548de8113 100755
--- a/devtools/test-null.sh
+++ b/devtools/test-null.sh
@@ -20,7 +20,7 @@ if [ ! -f "$testpmd" ] ; then
 fi
 
 if ldd $testpmd | grep -q librte_ ; then
-   export LD_LIBRARY_PATH=$build/lib:$LD_LIBRARY_PATH
+   export LD_LIBRARY_PATH=$build/drivers:$build/lib:$LD_LIBRARY_PATH
libs='-d librte_mempool_ring.so -d librte_pmd_null.so'
 else
libs=
-- 
2.17.1



[dpdk-dev] [PATCH v2] net/iavf: fix virtual channel return value error

2019-12-22 Thread Yahui Cao
In iavf_handle_virtchnl_msg(), it is not appropriate to use _clear_cmd()
as a notification to the foreground thread. So introduce _notify_cmd()
to fix this error.

Messages from VF to PF are mainly sent by calling iavf_execute_vf_cmd();
the whole virtchnl message flow looks like this:

iavf_execute_vf_cmd() calls iavf_aq_send_msg_to_pf() to send the
message and then polls the command-done flag as "if (vf->pend_cmd ==
VIRTCHNL_OP_UNKNOWN)".

When the reply is returned by the PF, iavf_handle_virtchnl_msg() in the
ISR reads the message return value via "vf->cmd_retval = msg_ret" and
immediately sets the command-done flag by calling _clear_cmd() to
notify iavf_execute_vf_cmd().

iavf_execute_vf_cmd() finds the command-done flag set and then checks
whether the command return value vf->cmd_retval indicates success or
not.

However, _clear_cmd() also resets vf->cmd_retval to success,
overwriting the actual return value used for diagnosis. As a
result, iavf_execute_vf_cmd() always finds vf->cmd_retval to be
success and returns success.

Fixes: 22b123a36d07 ("net/avf: initialize PMD")
Cc: sta...@dpdk.org

Signed-off-by: Yahui Cao 
---
 drivers/net/iavf/iavf.h   | 10 ++
 drivers/net/iavf/iavf_vchnl.c |  2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h
index bbd4d75d0..215a1f28f 100644
--- a/drivers/net/iavf/iavf.h
+++ b/drivers/net/iavf/iavf.h
@@ -173,6 +173,16 @@ struct iavf_cmd_info {
uint32_t out_size;  /* buffer size for response */
 };
 
+/* notify current command done. Only call in case execute
+ * _atomic_set_cmd successfully.
+ */
+static inline void
+_notify_cmd(struct iavf_info *vf)
+{
+   rte_wmb();
+   vf->pend_cmd = VIRTCHNL_OP_UNKNOWN;
+}
+
 /* clear current command. Only call in case execute
  * _atomic_set_cmd successfully.
  */
diff --git a/drivers/net/iavf/iavf_vchnl.c b/drivers/net/iavf/iavf_vchnl.c
index 14395fed3..285b1fc97 100644
--- a/drivers/net/iavf/iavf_vchnl.c
+++ b/drivers/net/iavf/iavf_vchnl.c
@@ -214,7 +214,7 @@ iavf_handle_virtchnl_msg(struct rte_eth_dev *dev)
vf->cmd_retval = msg_ret;
/* prevent compiler reordering */
rte_compiler_barrier();
-   _clear_cmd(vf);
+   _notify_cmd(vf);
} else
PMD_DRV_LOG(ERR, "command mismatch,"
"expect %u, get %u",
-- 
2.17.1



[dpdk-dev] [PATCH v4] net/virtio-user: fix packed ring server mode

2019-12-22 Thread Xuan Ding
This patch fixes the situation where the datapath does not work properly
when vhost reconnects to virtio in server mode with a packed ring.

Currently, virtio and vhost share the vring memory. For a split ring,
vhost can read the status of the descriptors directly from the available
ring and the used ring during reconnection. Therefore, the datapath can
continue.

But for a packed ring, when reconnecting to virtio, vhost cannot recover
the status of the descriptors through the descriptor ring alone. By
resetting the Tx and Rx queues, the datapath can restart from the
beginning.

Fixes: 4c3f5822eb214 ("net/virtio: add packed virtqueue defines")
Cc: sta...@dpdk.org

Signed-off-by: Xuan Ding 
---

v4:
* Moved change log below '---' marker.

v3:
* Removed an extra asterisk from a comment.
* Renamed device reset function and moved it to virtio_user_ethdev.c.

v2:
* Renamed queue reset functions and moved them to virtqueue.c.

 drivers/net/virtio/virtio_ethdev.c  |  4 +-
 drivers/net/virtio/virtio_user_ethdev.c | 40 ++
 drivers/net/virtio/virtqueue.c  | 71 +
 drivers/net/virtio/virtqueue.h  |  4 ++
 4 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 044eb10a7..f9d0ea70d 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1913,6 +1913,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
goto err_vtpci_init;
}
 
+   rte_spinlock_init(&hw->state_lock);
+
/* reset device and negotiate default features */
ret = virtio_init_device(eth_dev, VIRTIO_PMD_DEFAULT_GUEST_FEATURES);
if (ret < 0)
@@ -2155,8 +2157,6 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return -EBUSY;
}
 
-   rte_spinlock_init(&hw->state_lock);
-
hw->use_simple_rx = 1;
 
if (vtpci_with_feature(hw, VIRTIO_F_IN_ORDER)) {
diff --git a/drivers/net/virtio/virtio_user_ethdev.c 
b/drivers/net/virtio/virtio_user_ethdev.c
index 3fc172573..425f48230 100644
--- a/drivers/net/virtio/virtio_user_ethdev.c
+++ b/drivers/net/virtio/virtio_user_ethdev.c
@@ -25,12 +25,48 @@
 #define virtio_user_get_dev(hw) \
((struct virtio_user_dev *)(hw)->virtio_user_dev)
 
+static void
+virtio_user_reset_queues_packed(struct rte_eth_dev *dev)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+   struct virtnet_rx *rxvq;
+   struct virtnet_tx *txvq;
+   uint16_t i;
+
+   /* Add lock to avoid queue contention. */
+   rte_spinlock_lock(&hw->state_lock);
+   hw->started = 0;
+
+   /*
+* Waitting for datapath to complete before resetting queues.
+* 1 ms should be enough for the ongoing Tx/Rx function to finish.
+*/
+   rte_delay_ms(1);
+
+   /* Vring reset for each Tx queue and Rx queue. */
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   rxvq = dev->data->rx_queues[i];
+   virtqueue_rxvq_reset_packed(rxvq->vq);
+   virtio_dev_rx_queue_setup_finish(dev, i);
+   }
+
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   txvq = dev->data->tx_queues[i];
+   virtqueue_txvq_reset_packed(txvq->vq);
+   }
+
+   hw->started = 1;
+   rte_spinlock_unlock(&hw->state_lock);
+}
+
+
 static int
 virtio_user_server_reconnect(struct virtio_user_dev *dev)
 {
int ret;
int connectfd;
struct rte_eth_dev *eth_dev = &rte_eth_devices[dev->port_id];
+   struct virtio_hw *hw = eth_dev->data->dev_private;
 
connectfd = accept(dev->listenfd, NULL, NULL);
if (connectfd < 0)
@@ -51,6 +87,10 @@ virtio_user_server_reconnect(struct virtio_user_dev *dev)
 
dev->features &= dev->device_features;
 
+   /* For packed ring, resetting queues is required in reconnection. */
+   if (vtpci_packed_queue(hw))
+   virtio_user_reset_queues_packed(eth_dev);
+
ret = virtio_user_start_device(dev);
if (ret < 0)
return -1;
diff --git a/drivers/net/virtio/virtqueue.c b/drivers/net/virtio/virtqueue.c
index 5ff1e3587..0b4e3bf3e 100644
--- a/drivers/net/virtio/virtqueue.c
+++ b/drivers/net/virtio/virtqueue.c
@@ -141,3 +141,74 @@ virtqueue_rxvq_flush(struct virtqueue *vq)
else
virtqueue_rxvq_flush_split(vq);
 }
+
+int
+virtqueue_rxvq_reset_packed(struct virtqueue *vq)
+{
+   int size = vq->vq_nentries;
+   struct vq_desc_extra *dxp;
+   struct virtnet_rx *rxvq;
+   uint16_t desc_idx;
+
+   vq->vq_used_cons_idx = 0;
+   vq->vq_desc_head_idx = 0;
+   vq->vq_avail_idx = 0;
+   vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1);
+   vq->vq_free_cnt = vq->vq_nentries;
+
+   vq->vq_packed.used_wrap_counter = 1;
+   vq->vq_packed.cached_flags = VRING_PACKED_DESC_F_AVAIL;
+   vq->vq_packed.event_flags_shadow = 0;
+   vq->vq_p

Re: [dpdk-dev] [PATCH] net/ixgbe: add or remove MAC address

2019-12-22 Thread Ye Xiaolong
Hi Guinan,

For the title, better to use

Add support for vf MAC address add and remove

or something like that.

On 12/03, Guinan Sun wrote:
>Ixgbe PMD pf host code needs to support ixgbevf mac address
>add and remove. For this purpose, a response was added
>between pf and vf to update the mac address.

Does this mean each VF can have multiple MAC addresses after this patch,
or does this `add` actually mean replacing the old MAC address with the new one?

>
>Signed-off-by: Guinan Sun 
>---
> drivers/net/ixgbe/ixgbe_ethdev.h |  1 +
> drivers/net/ixgbe/ixgbe_pf.c | 35 
> 2 files changed, 36 insertions(+)
>
>diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h 
>b/drivers/net/ixgbe/ixgbe_ethdev.h
>index 76a1b9d18..e1cd8fd16 100644
>--- a/drivers/net/ixgbe/ixgbe_ethdev.h
>+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
>@@ -270,6 +270,7 @@ struct ixgbe_vf_info {
>   uint8_t api_version;
>   uint16_t switch_domain_id;
>   uint16_t xcast_mode;
>+  uint16_t mac_count;

How is this mac_count initialized? 

> };
> 
> /*
>diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c
>index d0d85e138..76dbed2ab 100644
>--- a/drivers/net/ixgbe/ixgbe_pf.c
>+++ b/drivers/net/ixgbe/ixgbe_pf.c
>@@ -748,6 +748,37 @@ ixgbe_set_vf_mc_promisc(struct rte_eth_dev *dev, uint32_t 
>vf, uint32_t *msgbuf)
>   return 0;
> }
> 
>+static int
>+ixgbe_set_vf_macvlan_msg(struct rte_eth_dev *dev, uint32_t vf, uint32_t 
>*msgbuf)
>+{
>+  struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>+  struct ixgbe_vf_info *vf_info =
>+  *(IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private));
>+  uint8_t *new_mac = (uint8_t *)(&msgbuf[1]);
>+  int index = (msgbuf[0] & IXGBE_VT_MSGINFO_MASK) >>
>+  IXGBE_VT_MSGINFO_SHIFT;
>+
>+  if (index) {
>+  if (!rte_is_valid_assigned_ether_addr(
>+  (struct rte_ether_addr *)new_mac)) {
>+  RTE_LOG(ERR, PMD, "set invalid mac vf:%d\n", vf);
>+  return -1;
>+  }
>+
>+  if (new_mac == NULL)
>+  return -1;

I feel the null check should come before the valid ether addr check; otherwise
there might be a null pointer dereference issue.

Thanks,
Xiaolong

>+
>+  vf_info[vf].mac_count++;
>+
>+  hw->mac.ops.set_rar(hw, vf_info[vf].mac_count,
>+  new_mac, vf, IXGBE_RAH_AV);
>+  } else {
>+  hw->mac.ops.clear_rar(hw, vf_info[vf].mac_count);
>+  vf_info[vf].mac_count = 0;
>+  }
>+  return 0;
>+}
>+
> static int
> ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t vf)
> {
>@@ -835,6 +866,10 @@ ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t 
>vf)
>   if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED)
>   retval = ixgbe_set_vf_mc_promisc(dev, vf, msgbuf);
>   break;
>+  case IXGBE_VF_SET_MACVLAN:
>+  if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED)
>+  retval = ixgbe_set_vf_macvlan_msg(dev, vf, msgbuf);
>+  break;
>   default:
>   PMD_DRV_LOG(DEBUG, "Unhandled Msg %8.8x", (unsigned)msgbuf[0]);
>   retval = IXGBE_ERR_MBX;
>-- 
>2.17.1
>


Re: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue

2019-12-22 Thread Li, Xiaoyun
Hi
I reconsidered and retested this issue.
I still need to use rte_wmb instead of rte_io_wmb.

To achieve high performance, ntb needs to turn on the WC (write combining)
feature. The perf difference with and without WC enabled is more than 20X.
And when WC is enabled, rte_io_wmb cannot make sure the stores are in order;
only rte_wmb can make sure of that.

And in my retest, when sending 64-byte packets, using rte_io_wmb caused
an out-of-order issue and memory corruption on the rx side, while using
rte_wmb is fine.

So I can only use the v1 patch and suspend the v2 patch in patchwork.

Best Regards
Xiaoyun Li

> -Original Message-
> From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com]
> Sent: Monday, December 16, 2019 18:50
> To: Li, Xiaoyun ; Wu, Jingjing 
> Cc: dev@dpdk.org; Maslekar, Omkar ;
> sta...@dpdk.org; nd 
> Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue
> 
> 
> 
> > -Original Message-
> > From: dev  On Behalf Of Xiaoyun Li
> > Sent: Monday, December 16, 2019 9:59 AM
> > To: jingjing...@intel.com
> > Cc: dev@dpdk.org; omkar.masle...@intel.com; Xiaoyun Li
> > ; sta...@dpdk.org
> > Subject: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue
> >
> > All buffers and ring info should be written before tail register update.
> > This patch relocates the write memory barrier before updating tail
> > register to avoid potential issues.
> >
> > Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Xiaoyun Li 
> > ---
> > v2:
> >  * Replaced rte_wmb with rte_io_wmb since rte_io_wmb is enough.
> > ---
> >  drivers/raw/ntb/ntb.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c index
> > ad7f6abfd..c7de86f36 100644
> > --- a/drivers/raw/ntb/ntb.c
> > +++ b/drivers/raw/ntb/ntb.c
> > @@ -683,8 +683,8 @@ ntb_enqueue_bufs(struct rte_rawdev *dev,
> >sizeof(struct ntb_used) * nb1);
> > rte_memcpy(txq->tx_used_ring, tx_used + nb1,
> >sizeof(struct ntb_used) * nb2);
> > +   rte_io_wmb();
> As both txq->tx_used_ring and *txq->used_cnt physically reside on the PCI
> device side, rte_io_wmb is correct to ensure the ordering.
> 
> > *txq->used_cnt = txq->last_used;
> > -   rte_wmb();
> >
> > /* update queue stats */
> > hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes; @@ -789,8
> +789,8 @@
> > ntb_dequeue_bufs(struct rte_rawdev *dev,
> >sizeof(struct ntb_desc) * nb1);
> > rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1,
> >sizeof(struct ntb_desc) * nb2);
> > +   rte_io_wmb();
> > *rxq->avail_cnt = rxq->last_avail;
> > -   rte_wmb();
> >
> > /* update queue stats */
> > off = NTB_XSTATS_NUM * ((size_t)context + 1);
> > --
> > 2.17.1
> 
> Reviewed-by: Gavin Hu