Re: [dpdk-dev] [PATCH] eal: force gcc to inline rte_movX function

2018-04-17 Thread Chen, Junjie J
Thanks for pointing this out. I agree with the title change.

Do you want me to send a v2 patch, or can you handle it when committing?

> > > Sometimes gcc does not inline a function despite the *inline* keyword;
> > > we observe that rte_movX is not inlined when doing performance profiling,
> > > so use the *always_inline* keyword to force gcc to inline the function.
> > >
> > > Signed-off-by: Chen, Junjie 
> > > ---
> > >  .../common/include/arch/x86/rte_memcpy.h   | 22
> +++---
> > >  1 file changed, 11 insertions(+), 11 deletions(-)
> >
> > The title should start with "eal/x86:"
> > Something like that:
> > eal/x86: force inlining of memcpy sub-functions
> >
> > Bruce, Konstantin, any review of the content/optimization?
> >
> No objection here.
> 
> Acked-by: Bruce Richardson 



Re: [dpdk-dev] [PATCH] net/vhost: Initialise vid to -1

2018-05-06 Thread Chen, Junjie J
Hi Maxime

I saw Yuanhan already did this. Two patches already in 17.11 LTS.

Cheers
JJ

> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Friday, May 4, 2018 9:09 PM
> To: Loftus, Ciara ; dev@dpdk.org
> Cc: Chen, Junjie J 
> Subject: Re: [dpdk-dev] [PATCH] net/vhost: Initialise vid to -1
> 
> 
> 
> On 05/03/2018 03:18 PM, Loftus, Ciara wrote:
> >>
> >>>
> >>> On 04/27/2018 04:19 PM, Ciara Loftus wrote:
> >>>> rte_eth_vhost_get_vid_from_port_id returns a value of 0 if called
> >>>> before the first call to the new_device callback. A vid value >=0
> >>>> suggests the device is active which is not the case in this
> >>>> instance. Initialise vid to a negative value to prevent this.
> >>>>
> >>>> Signed-off-by: Ciara Loftus 
> >>>> ---
> >>>>drivers/net/vhost/rte_eth_vhost.c | 1 +
> >>>>1 file changed, 1 insertion(+)
> >>>>
> >>>> diff --git a/drivers/net/vhost/rte_eth_vhost.c
> >>> b/drivers/net/vhost/rte_eth_vhost.c
> >>>> index 99a7727..f47950c 100644
> >>>> --- a/drivers/net/vhost/rte_eth_vhost.c
> >>>> +++ b/drivers/net/vhost/rte_eth_vhost.c
> >>>> @@ -1051,6 +1051,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
> >>> uint16_t rx_queue_id,
> >>>>  return -ENOMEM;
> >>>>  }
> >>>>
> >>>> +vq->vid = -1;
> >>>>  vq->mb_pool = mb_pool;
> >>>>  vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
> >>>>  dev->data->rx_queues[rx_queue_id] = vq;
> >>>>
> >>>
> >>> Reviewed-by: Maxime Coquelin 
> >>>
> >>> Thanks,
> >>> Maxime
> >>
> >> On second thoughts, self-NACK.
> >>
> >> We need to provision for the case where we want to call
> >> eth_rx_queue_setup AFTER new_device. For instance when we want to
> >> change the mb_pool. In this case we need to maintain the same vid and
> >> not reset it to -1.
> >>
> >> Without this patch the original problem still exists and need to find
> >> an alternative workaround.
> >
> > Junjie's patches fix the issue I was observing. Thanks Junjie!
> >
> > https://dpdk.org/browse/dpdk/commit/?id=30a701a53737a0b6f7953412cc3b3d36c1d49122
> > https://dpdk.org/browse/dpdk/commit/?id=e6722dee533cda3756fbc5c9ea4ddfbf30276f1b
> >
> > Along with the v2 of this patch could they be considered for the 17.11 stable branch?
> 
> Yes, I think it is a good idea.
> It wasn't planned for -stable initially as it fixed a new use-case in v18.05.
> 
> Junjie, can you please generate a backport against v17.11 and post it to
> sta...@dpdk.org, adding "PATCH v17.11 LTS" as subject prefix, and using -x
> option of when cherry-picking so that it references the patches in master?
> 
> Thanks in advance,
> Maxime
> > Thanks,
> > Ciara
> >
> >>
> >> Thanks,
> >> Ciara


Re: [dpdk-dev] [PATCH v3] net/vhost: fix segfault when creating vdev dynamically

2018-04-10 Thread Chen, Junjie J
Thanks for the report. I just submitted this patch as a fix:
https://dpdk.org/dev/patchwork/patch/37765/ 

> 
> Hi,
> 
> On Fri, Mar 30, 2018 at 02:58:31PM +0800, Junjie Chen wrote:
> >When creating vdev dynamically, vhost pmd driver starts directly
> >without checking TX/RX queues are ready or not, and thus causes
> >segmentation fault when vhost library accesses queues. This patch adds
> >a flag to check whether queues are setup or not, and adds queues setup
> >into dev_start function to allow user to start them after setting up.
> 
> for me, with this patch vhost enqueue/dequeue code is never called because
> 
>   if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> 
> this check in eth_vhost_rx() is always true.
> 
> When I revert this patch it works as usual.
> 
> My testpmd cmdline is:
> 
> gdb --args $RTE_SDK/install/bin/testpmd -l 0,2,3,4,5 --socket-mem=1024 -n 4 \
> --vdev 'net_vhost0,iface=/tmp/vhost-user1' \
> --vdev 'net_vhost1,iface=/tmp/vhost-user2' -- \
> --portmask=f  --rxq=1 --txq=1 \
> --nb-cores=4 --forward-mode=io  -i
> 
> After starting testpmd I issue commands "set portlist 0,2,1,3", start my guest
> and start another testpmd issue in the guest.
> 
> Another problem I see: Before this patch I could start testpmd, issue the
> portlist command and type "start". If I do this now I get an infinite loop of
> "VHOST CONFIG device not found" messages.
> 
> 
> regards,
> Jens


Re: [dpdk-dev] [PATCH] net/vhost: fix vhost invalid state

2018-04-10 Thread Chen, Junjie J
Thanks for pointing this out. I forgot to run 'git add' before generating the patch.

The following change is also needed:
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -655,9 +655,8 @@ destroy_device(int vid)
eth_dev = list->eth_dev;
internal = eth_dev->data->dev_private;

-   rte_atomic32_set(&internal->started, 0);
-   update_queuing_status(eth_dev);
rte_atomic32_set(&internal->dev_attached, 0);
+   update_queuing_status(eth_dev);

eth_dev->data->dev_link.link_status = ETH_LINK_DOWN;


I will send out a v2 including this change.


Cheers
JJ


> -Original Message-
> From: Jens Freimann [mailto:jfreim...@redhat.com]
> Sent: Tuesday, April 10, 2018 7:13 PM
> To: Chen, Junjie J 
> Cc: Tan, Jianfeng ; maxime.coque...@redhat.com;
> mtetsu...@gmail.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] net/vhost: fix vhost invalid state
> 
> On Tue, Apr 10, 2018 at 10:18:09AM -0400, Junjie Chen wrote:
> >dev_start sets *dev_attached* after setup queues, this sets device to
> >invalid state since no frontend is attached. Also destroy_device set
> >*started* to zero which makes *allow_queuing* always zero until
> >dev_start get called again. Actually, we should not determine queues
> >existence by
> >*dev_attached* but by queues pointers or other separated variable(s).
> >
> >Fixes: 30a701a53737 ("net/vhost: fix crash when creating vdev
> >dynamically")
> >
> >Signed-off-by: Junjie Chen 
> 
> So this fixes the problem I saw with allow_queueing always being zero and the
> error message "VHOST_CONFIG: (0) device not found".
> 
> However with this patch on top of virtio-next/master no packets are being
> forwarded to the guest and back anymore.
> 
> When I use virtio-next/master and revert 30a701a53737 both works fine.
> 
> regards,
> Jens
> 
> >---
> > drivers/net/vhost/rte_eth_vhost.c | 64
> >+++
> > 1 file changed, 38 insertions(+), 26 deletions(-)
> >
> >diff --git a/drivers/net/vhost/rte_eth_vhost.c
> >b/drivers/net/vhost/rte_eth_vhost.c
> >index 11b6076..6a2ff76 100644
> >--- a/drivers/net/vhost/rte_eth_vhost.c
> >+++ b/drivers/net/vhost/rte_eth_vhost.c
> >@@ -118,6 +118,7 @@ struct pmd_internal {
> > char *iface_name;
> > uint16_t max_queues;
> > uint16_t vid;
> >+uint16_t queue_ready;
> > rte_atomic32_t started;
> > };
> >
> >@@ -528,10 +529,13 @@ update_queuing_status(struct rte_eth_dev *dev)
> > unsigned int i;
> > int allow_queuing = 1;
> >
> >-if (rte_atomic32_read(&internal->dev_attached) == 0)
> >+if (!dev->data->rx_queues || !dev->data->tx_queues) {
> >+RTE_LOG(ERR, PMD, "RX/TX queues not setup yet\n");
> > return;
> >+}
> >
> >-if (rte_atomic32_read(&internal->started) == 0)
> >+if (rte_atomic32_read(&internal->started) == 0 ||
> >+rte_atomic32_read(&internal->dev_attached) == 0)
> > allow_queuing = 0;
> >
> > /* Wait until rx/tx_pkt_burst stops accessing vhost device */ @@
> >-576,6 +580,8 @@ queue_setup(struct rte_eth_dev *eth_dev, struct
> pmd_internal *internal)
> > vq->internal = internal;
> > vq->port = eth_dev->data->port_id;
> > }
> >+
> >+internal->queue_ready = 1;
> > }
> >
> > static int
> >@@ -607,13 +613,10 @@ new_device(int vid)  #endif
> >
> > internal->vid = vid;
> >-if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
> >+if (eth_dev->data->rx_queues && eth_dev->data->tx_queues)
> > queue_setup(eth_dev, internal);
> >-rte_atomic32_set(&internal->dev_attached, 1);
> >-} else {
> >-RTE_LOG(INFO, PMD, "RX/TX queues have not setup yet\n");
> >-rte_atomic32_set(&internal->dev_attached, 0);
> >-}
> >+else
> >+RTE_LOG(INFO, PMD, "RX/TX queues not setup yet\n");
> >
> > for (i = 0; i < rte_vhost_get_vring_num(vid); i++)
> > rte_vhost_enable_guest_notification(vid, i, 0); @@ -622,6 +625,7
> @@
> >new_device(int vid)
> >
> > eth_dev->data->dev_link.link_status = ETH_LINK_UP;
> >
> >+rte_atomic32_set(&internal->dev_attached, 1);
> > update_queuing_status(eth_dev);
> >
> > RTE_LOG(INFO, PMD, "Vhost 

Re: [dpdk-dev] [PATCH v2] net/vhost: fix vhost invalid state

2018-04-11 Thread Chen, Junjie J
> 
> 
> On 4/11/2018 6:53 PM, Junjie Chen wrote:
> > dev_start sets *dev_attached* after setup queues, this sets device to
> > invalid state since no frontend is attached. Also destroy_device set
> > *started* to zero which makes *allow_queuing* always zero until
> > dev_start get called again. Actually, we should not determine queues
> > existence by
> > *dev_attached* but by queues pointers or other separated variable(s).
> 
> I see two issues from your description:
> 
> - We shall fix the wrong use of dev_attached by commit 30a701a53737.
> - The allow_queuing is not set correctly. But I doubt if it's a real issue, 
> as we do
> update the queue status in dev_start(), dev_stop(), and new_device(). Can you
> explain more?
The *started* flag is set to 0 in destroy_device by commit 30a701a53737, so
allow_queuing is always set to 0 once update_queuing_status gets called. We
then have to use "port start" to reset it to 1.
> 
> >
> > Fixes: 30a701a53737 ("net/vhost: fix crash when creating vdev
> > dynamically")
> >
> > Signed-off-by: Junjie Chen 
> > ---
> > Changes in v2:
> > - use started to determine vhost queues readiness
> > - revert setting started to zero in destroy_device
> >   drivers/net/vhost/rte_eth_vhost.c | 61
> ---
> >   1 file changed, 31 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/net/vhost/rte_eth_vhost.c
> b/drivers/net/vhost/rte_eth_vhost.c
> > index 11b6076..ff462b3 100644
> > --- a/drivers/net/vhost/rte_eth_vhost.c
> > +++ b/drivers/net/vhost/rte_eth_vhost.c
> > @@ -528,10 +528,13 @@ update_queuing_status(struct rte_eth_dev *dev)
> > unsigned int i;
> > int allow_queuing = 1;
> >
> > -   if (rte_atomic32_read(&internal->dev_attached) == 0)
> > +   if (!dev->data->rx_queues || !dev->data->tx_queues) {
> > +   RTE_LOG(ERR, PMD, "RX/TX queues not exist yet\n");
> 
> Even it's not introduced in this patch, but I think we shall not report
> error log here.
If you attach a port with "port attach" (no vhost queues set up yet) and then
execute "port stop", the queues do not exist yet, right?
> 
> > return;
> > +   }
> >
> > -   if (rte_atomic32_read(&internal->started) == 0)
> > +   if (rte_atomic32_read(&internal->started) == 0 ||
> > +   rte_atomic32_read(&internal->dev_attached) == 0)
> > allow_queuing = 0;
> >
> > /* Wait until rx/tx_pkt_burst stops accessing vhost device */
> > @@ -607,13 +610,10 @@ new_device(int vid)
> >   #endif
> >
> > internal->vid = vid;
> > -   if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
> > +   if (rte_atomic32_read(&internal->started) == 1)
> > queue_setup(eth_dev, internal);
> > -   rte_atomic32_set(&internal->dev_attached, 1);
> > -   } else {
> > -   RTE_LOG(INFO, PMD, "RX/TX queues have not setup yet\n");
> > -   rte_atomic32_set(&internal->dev_attached, 0);
> > -   }
> > +   else
> > +   RTE_LOG(INFO, PMD, "RX/TX queues not exist yet\n");
> >
> > for (i = 0; i < rte_vhost_get_vring_num(vid); i++)
> > rte_vhost_enable_guest_notification(vid, i, 0);
> > @@ -622,6 +622,7 @@ new_device(int vid)
> >
> > eth_dev->data->dev_link.link_status = ETH_LINK_UP;
> >
> > +   rte_atomic32_set(&internal->dev_attached, 1);
> > update_queuing_status(eth_dev);
> >
> > RTE_LOG(INFO, PMD, "Vhost device %d created\n", vid);
> > @@ -651,23 +652,24 @@ destroy_device(int vid)
> > eth_dev = list->eth_dev;
> > internal = eth_dev->data->dev_private;
> >
> > -   rte_atomic32_set(&internal->started, 0);
> > -   update_queuing_status(eth_dev);
> > rte_atomic32_set(&internal->dev_attached, 0);
> > +   update_queuing_status(eth_dev);
> >
> > eth_dev->data->dev_link.link_status = ETH_LINK_DOWN;
> >
> > -   for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> > -   vq = eth_dev->data->rx_queues[i];
> > -   if (vq == NULL)
> > -   continue;
> > -   vq->vid = -1;
> > -   }
> > -   for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> > -   vq = eth_dev->data->tx_queues[i];
> > -   if (vq == NULL)
> > -   continue;
> > -   vq->vid = -1;
> > +   if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
> > +   for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> > +   vq = eth_dev->data->rx_queues[i];
> > +   if (!vq)
> > +   continue;
> > +   vq->vid = -1;
> > +   }
> > +   for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> > +   vq = eth_dev->data->tx_queues[i];
> > +   if (!vq)
> > +   continue;
> > +   vq->vid = -1;
> > +   }
> > }
> >
> > state = vring_states[eth_dev->data->port_id];
> > @@ -792,11 +794,7 @@ eth_dev_start(struct rte_eth_dev *eth_dev)
> >   {
> > struct pmd_internal *internal = eth_dev->data->dev_private;
> >

Re: [dpdk-dev] [PATCH v2] net/vhost: fix vhost invalid state

2018-04-11 Thread Chen, Junjie J
Hi Jianfeng.

> On 4/11/2018 4:35 PM, Chen, Junjie J wrote:
> >>
> >> On 4/11/2018 6:53 PM, Junjie Chen wrote:
> >>> dev_start sets *dev_attached* after setup queues, this sets device
> >>> to invalid state since no frontend is attached. Also destroy_device
> >>> set
> >>> *started* to zero which makes *allow_queuing* always zero until
> >>> dev_start get called again. Actually, we should not determine queues
> >>> existence by
> >>> *dev_attached* but by queues pointers or other separated variable(s).
> >> I see two issues from your description:
> >>
> >> - We shall fix the wrong use of dev_attached by commit 30a701a53737.
> >> - The allow_queuing is not set correctly. But I doubt if it's a real
> >> issue, as we do update the queue status in dev_start(), dev_stop(),
> >> and new_device(). Can you explain more?
> > The started is set to 0 in destroy_device in 30a701a53737 commit, so
> > allow_queuing is always set to 0 once update_queuing_status get called. We
> have to use "port start" to reset to 1.
> 
> OK, that means it is also due to the wrong use of dev_attached?
Yes, the two states were mixed up. That is why I tried to use a new variable to
track the queues: one variable for one state.
> 
> >>> Fixes: 30a701a53737 ("net/vhost: fix crash when creating vdev
> >>> dynamically")
> >>>
> >>> Signed-off-by: Junjie Chen 
> >>> ---
> >>> Changes in v2:
> >>> - use started to determine vhost queues readiness
> >>> - revert setting started to zero in destroy_device
> >>>drivers/net/vhost/rte_eth_vhost.c | 61
> >> ---
> >>>1 file changed, 31 insertions(+), 30 deletions(-)
> >>>
> >>> diff --git a/drivers/net/vhost/rte_eth_vhost.c
> >> b/drivers/net/vhost/rte_eth_vhost.c
> >>> index 11b6076..ff462b3 100644
> >>> --- a/drivers/net/vhost/rte_eth_vhost.c
> >>> +++ b/drivers/net/vhost/rte_eth_vhost.c
> >>> @@ -528,10 +528,13 @@ update_queuing_status(struct rte_eth_dev
> *dev)
> >>>   unsigned int i;
> >>>   int allow_queuing = 1;
> >>>
> >>> - if (rte_atomic32_read(&internal->dev_attached) == 0)
> >>> + if (!dev->data->rx_queues || !dev->data->tx_queues) {
> >>> + RTE_LOG(ERR, PMD, "RX/TX queues not exist yet\n");
> >> Even it's not introduced in this patch, but I think we shall not
> >> report error log here.
> > if you attach one port with "port attach" (no vhost queue setup yet),
> > and then execute "port stop", the queue status is not exist yet, right?
> 
> My point is that we allow a dev is not started but attached, so it's not an 
> error,
> and we shall not print error log.
OK, I will remove it in v3.

> 
> >>>   return;
> >>> + }
> >>>
> >>> - if (rte_atomic32_read(&internal->started) == 0)
> >>> + if (rte_atomic32_read(&internal->started) == 0 ||
> >>> + rte_atomic32_read(&internal->dev_attached) == 0)
> >>>   allow_queuing = 0;
> >>>
> >>>   /* Wait until rx/tx_pkt_burst stops accessing vhost device */ @@
> >>> -607,13 +610,10 @@ new_device(int vid)
> >>>#endif
> >>>
> >>>   internal->vid = vid;
> >>> - if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
> >>> + if (rte_atomic32_read(&internal->started) == 1)
> >>>   queue_setup(eth_dev, internal);
> >>> - rte_atomic32_set(&internal->dev_attached, 1);
> >>> - } else {
> >>> - RTE_LOG(INFO, PMD, "RX/TX queues have not setup yet\n");
> >>> - rte_atomic32_set(&internal->dev_attached, 0);
> >>> - }
> >>> + else
> >>> + RTE_LOG(INFO, PMD, "RX/TX queues not exist yet\n");
> >>>
> >>>   for (i = 0; i < rte_vhost_get_vring_num(vid); i++)
> >>>   rte_vhost_enable_guest_notification(vid, i, 0); @@ 
> >>> -622,6
> >>> +622,7 @@ new_device(int vid)
> >>>
> >>>   eth_dev->data->dev_link.link_status = ETH_LINK_UP;
> >>>
> >>> + rte_atomic32_set(&internal->dev_attached, 1);
> >>> 

Re: [dpdk-dev] [PATCH v3] net/vhost: fix vhost invalid state

2018-04-12 Thread Chen, Junjie J
> 
> 
> 
> On 04/12/2018 09:21 AM, Tan, Jianfeng wrote:
> >
> >
> > On 4/12/2018 1:02 AM, Junjie Chen wrote:
> >> dev_start sets *dev_attached* after setup queues, this sets device to
> >> invalid state since no frontend is attached. Also destroy_device set
> >> *started* to zero which makes *allow_queuing* always zero until
> >> dev_start get called again. Actually, we should not determine queues
> >> existence by
> >> *dev_attached* but by queues pointers or other separated variable(s).
> >>
> >> Fixes: 30a701a53737 ("net/vhost: fix crash when creating vdev
> >> dynamically")
> >>
> >> Signed-off-by: Junjie Chen 
> >> Tested-by: Jens Freimann 
> >
> > Overall, looks great to me except a nit below.
> >
> > Reviewed-by: Jianfeng Tan 
> 
> Thanks Jianfeng, I can handle the small change while applying.
> 
> Can you confirm that it is implied that the queue are already allocated, else 
> we
> wouldn't find the internal resource and quit earlier (in case of eth_dev_close
> called twice for example)?

That is required; otherwise it generates a segfault if we close the device
before queue setup, for example when executing the following steps in testpmd:
1. port attach
2. Ctrl+D

> 
> Thanks,
> Maxime
> 
> >
> >> ---
> >> Changes in v3:
> >> - remove useless log in queue status showing Changes in v2:
> >> - use started to determine vhost queues readiness
> >> - revert setting started to zero in destroy_device
> >>   drivers/net/vhost/rte_eth_vhost.c | 59
> >> +++
> >>   1 file changed, 29 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/drivers/net/vhost/rte_eth_vhost.c
> >> b/drivers/net/vhost/rte_eth_vhost.c
> >> index 11b6076..e392d71 100644
> >> --- a/drivers/net/vhost/rte_eth_vhost.c
> >> +++ b/drivers/net/vhost/rte_eth_vhost.c
> >> @@ -528,10 +528,11 @@ update_queuing_status(struct rte_eth_dev
> *dev)
> >>   unsigned int i;
> >>   int allow_queuing = 1;
> >> -    if (rte_atomic32_read(&internal->dev_attached) == 0)
> >> +    if (!dev->data->rx_queues || !dev->data->tx_queues)
> >>   return;
> >> -    if (rte_atomic32_read(&internal->started) == 0)
> >> +    if (rte_atomic32_read(&internal->started) == 0 ||
> >> +    rte_atomic32_read(&internal->dev_attached) == 0)
> >>   allow_queuing = 0;
> >>   /* Wait until rx/tx_pkt_burst stops accessing vhost device */
> >> @@ -607,13 +608,10 @@ new_device(int vid)
> >>   #endif
> >>   internal->vid = vid;
> >> -    if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
> >> +    if (rte_atomic32_read(&internal->started) == 1)
> >>   queue_setup(eth_dev, internal);
> >> -    rte_atomic32_set(&internal->dev_attached, 1);
> >> -    } else {
> >> -    RTE_LOG(INFO, PMD, "RX/TX queues have not setup yet\n");
> >> -    rte_atomic32_set(&internal->dev_attached, 0);
> >> -    }
> >> +    else
> >> +    RTE_LOG(INFO, PMD, "RX/TX queues not exist yet\n");
> >>   for (i = 0; i < rte_vhost_get_vring_num(vid); i++)
> >>   rte_vhost_enable_guest_notification(vid, i, 0); @@ -622,6
> >> +620,7 @@ new_device(int vid)
> >>   eth_dev->data->dev_link.link_status = ETH_LINK_UP;
> >> +    rte_atomic32_set(&internal->dev_attached, 1);
> >>   update_queuing_status(eth_dev);
> >>   RTE_LOG(INFO, PMD, "Vhost device %d created\n", vid); @@
> >> -651,23 +650,24 @@ destroy_device(int vid)
> >>   eth_dev = list->eth_dev;
> >>   internal = eth_dev->data->dev_private;
> >> -    rte_atomic32_set(&internal->started, 0);
> >> -    update_queuing_status(eth_dev);
> >>   rte_atomic32_set(&internal->dev_attached, 0);
> >> +    update_queuing_status(eth_dev);
> >>   eth_dev->data->dev_link.link_status = ETH_LINK_DOWN;
> >> -    for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> >> -    vq = eth_dev->data->rx_queues[i];
> >> -    if (vq == NULL)
> >> -    continue;
> >> -    vq->vid = -1;
> >> -    }
> >> -    for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> >> -    vq = eth_dev->data->tx_queues[i];
> >> -    if (vq == NULL)
> >> -    continue;
> >> -    vq->vid = -1;
> >> +    if (eth_dev->data->rx_queues && eth_dev->data->tx_queues) {
> >> +    for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> >> +    vq = eth_dev->data->rx_queues[i];
> >> +    if (!vq)
> >> +    continue;
> >> +    vq->vid = -1;
> >> +    }
> >> +    for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> >> +    vq = eth_dev->data->tx_queues[i];
> >> +    if (!vq)
> >> +    continue;
> >> +    vq->vid = -1;
> >> +    }
> >>   }
> >>   state = vring_states[eth_dev->data->port_id];
> >> @@ -792,11 +792,7 @@ eth_dev_start(struct rte_eth_dev *eth_dev)
> >>   {
> >>   struct pmd_internal *internal = eth_dev->data->dev_private;
> >> -    if (unlikely(rte_atomic32_read(&internal->dev_attached) == 0)) {
> >> -    queue_setup(eth_dev, internal);
> >> -    rte_a

Re: [dpdk-dev] [PATCH v6 1/2] vhost: add support for interrupt mode

2018-04-13 Thread Chen, Junjie J
Hi 
> Please next time run checkpatch before posting, I had to fix below errors when

Thanks. I ran it for v1-v5 but was too confident about the last minor change.
> applying.
> 
> Thanks,
> Maxime
> 
> ### [dpdk-dev,v6,1/2] vhost: add support for interrupt mode
> 
> CHECK:BRACES: braces {} should be used on all arms of this statement
> #208: FILE: drivers/net/vhost/rte_eth_vhost.c:743:
> + if (rte_atomic32_read(&internal->started) == 1) {
> [...]
>   else
> [...]
> 
> ERROR:TRAILING_WHITESPACE: trailing whitespace
> #237: FILE: drivers/net/vhost/rte_eth_vhost.c:939:
> +^I$
> 
> ERROR:TRAILING_WHITESPACE: trailing whitespace
> #282: FILE: lib/librte_vhost/vhost.c:554:
> +^Ielse^I$
> 
> total: 2 errors, 0 warnings, 225 lines checked
> 
> 0/1 valid patch


Re: [dpdk-dev] [PATCH] vhost: do deep copy while reallocate vq

2018-01-15 Thread Chen, Junjie J
Hi

> > @@ -227,6 +227,7 @@ vhost_user_set_vring_num(struct virtio_net *dev,
> > "zero copy is force disabled\n");
> > dev->dequeue_zero_copy = 0;
> > }
> > +   TAILQ_INIT(&vq->zmbuf_list);
> > }
> >
> > vq->shadow_used_ring = rte_malloc(NULL, @@ -261,6 +262,9 @@
> > numa_realloc(struct virtio_net *dev, int index)
> > int oldnode, newnode;
> > struct virtio_net *old_dev;
> > struct vhost_virtqueue *old_vq, *vq;
> > +   struct zcopy_mbuf *new_zmbuf;
> > +   struct vring_used_elem *new_shadow_used_ring;
> > +   struct batch_copy_elem *new_batch_copy_elems;
> > int ret;
> >
> > old_dev = dev;
> > @@ -285,6 +289,33 @@ numa_realloc(struct virtio_net *dev, int index)
> > return dev;
> >
> > memcpy(vq, old_vq, sizeof(*vq));
> > +   TAILQ_INIT(&vq->zmbuf_list);
> > +
> > +   new_zmbuf = rte_malloc_socket(NULL, vq->zmbuf_size *
> > +   sizeof(struct zcopy_mbuf), 0, newnode);
> > +   if (new_zmbuf) {
> > +   rte_free(vq->zmbufs);
> > +   vq->zmbufs = new_zmbuf;
> > +   }
> 
> You need to consider how to handle the case  ( rte_malloc_socket return
> NULL).

If allocating new_zmbuf fails, it keeps using the old zmbufs so that vhost
stays alive.

> 
> > +   new_shadow_used_ring = rte_malloc_socket(NULL,
> > +   vq->size * sizeof(struct vring_used_elem),
> > +   RTE_CACHE_LINE_SIZE,
> > +   newnode);
> > +   if (new_shadow_used_ring) {
> > +   rte_free(vq->shadow_used_ring);
> > +   vq->shadow_used_ring = new_shadow_used_ring;
> > +   }
> > +
> 
> Ditto
> 
> > +   new_batch_copy_elems = rte_malloc_socket(NULL,
> > +   vq->size * sizeof(struct batch_copy_elem),
> > +   RTE_CACHE_LINE_SIZE,
> > +   newnode);
> > +   if (new_batch_copy_elems) {
> > +   rte_free(vq->batch_copy_elems);
> > +   vq->batch_copy_elems = new_batch_copy_elems;
> > +   }
> 
> Ditto
> 
> > +
> > rte_free(old_vq);
> > }
> >
> > --
> > 2.0.1



Re: [dpdk-dev] [PATCH] vhost: do deep copy while reallocate vq

2018-01-15 Thread Chen, Junjie J
Hi
> > > > @@ -227,6 +227,7 @@ vhost_user_set_vring_num(struct virtio_net
> *dev,
> > > > "zero copy is force disabled\n");
> > > > dev->dequeue_zero_copy = 0;
> > > > }
> > > > +   TAILQ_INIT(&vq->zmbuf_list);
> > > > }
> > > >
> > > > vq->shadow_used_ring = rte_malloc(NULL, @@ -261,6 +262,9
> @@
> > > > numa_realloc(struct virtio_net *dev, int index)
> > > > int oldnode, newnode;
> > > > struct virtio_net *old_dev;
> > > > struct vhost_virtqueue *old_vq, *vq;
> > > > +   struct zcopy_mbuf *new_zmbuf;
> > > > +   struct vring_used_elem *new_shadow_used_ring;
> > > > +   struct batch_copy_elem *new_batch_copy_elems;
> > > > int ret;
> > > >
> > > > old_dev = dev;
> > > > @@ -285,6 +289,33 @@ numa_realloc(struct virtio_net *dev, int
> index)
> > > > return dev;
> > > >
> > > > memcpy(vq, old_vq, sizeof(*vq));
> > > > +   TAILQ_INIT(&vq->zmbuf_list);
> > > > +
> > > > +   new_zmbuf = rte_malloc_socket(NULL, vq->zmbuf_size *
> > > > +   sizeof(struct zcopy_mbuf), 0, newnode);
> > > > +   if (new_zmbuf) {
> > > > +   rte_free(vq->zmbufs);
> > > > +   vq->zmbufs = new_zmbuf;
> > > > +   }
> > >
> > > You need to consider how to handle the case  ( rte_malloc_socket
> > > return NULL).
> >
> > If it failed to allocate new_zmbuf, it uses old zmbufs, so as to keep
> > vhost alive.
> 
> It sounds reasonable, another question is, for the 3 blocks of memory being
> allocated,  If some succeed , others fails,  Does it mean that the code will
> run on different socket?  What's the perf impact if it happens.

The original code doesn't do a deep copy and thus accesses memory on a
different socket; this patch mitigates that situation. It only accesses remote
memory when one of the above allocations fails.

I saw some performance improvement (24.8 Gbit/s -> 26.1 Gbit/s) on my dev
machine when reallocating only the zmbufs, while I didn't see a significant
performance difference when reallocating vring_used_elem and batch_copy_elem.



Re: [dpdk-dev] [PATCH] vhost: dequeue zero copy should restore mbuf before return to pool

2018-01-17 Thread Chen, Junjie J
> >
> > dequeue zero copy change buf_addr and buf_iova of mbuf, and return to
> > mbuf pool without restore them, it breaks vm memory if others allocate
> > mbuf from same pool since mbuf reset doesn't reset buf_addr and
> buf_iova.
> >
> > Signed-off-by: Junjie Chen 
> > ---
> >  lib/librte_vhost/virtio_net.c | 21 +
> >  1 file changed, 21 insertions(+)
> >
> > diff --git a/lib/librte_vhost/virtio_net.c
> > b/lib/librte_vhost/virtio_net.c index 568ad0e..e9aaf6d 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -1158,6 +1158,26 @@ mbuf_is_consumed(struct rte_mbuf *m)
> > return true;
> >  }
> >
> > +
> > +static __rte_always_inline void
> > +restore_mbuf(struct rte_mbuf *m)
> > +{
> > +   uint32_t mbuf_size, priv_size;
> > +
> > +   while (m) {
> > +   priv_size = rte_pktmbuf_priv_size(m->pool);
> > +   mbuf_size = sizeof(struct rte_mbuf) + priv_size;
> > +   /* start of buffer is after mbuf structure and priv data */
> > +   m->priv_size = priv_size;
> 
> I don't think we need to restore priv_size. Refer to its definition in 
> rte_mbuf:
> "Size of the application private data. In case of an indirect mbuf, it
> stores the direct mbuf private data size."
> 
> Thanks,
> Jianfeng

You are right. I also removed the restore of data_len since it is reset on
allocation. Please see v2.
Thanks.

> 
> > +
> > +   m->buf_addr = (char *)m + mbuf_size;
> > +   m->buf_iova = rte_mempool_virt2iova(m) + mbuf_size;
> > +   m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM,
> > +   (uint16_t)m->buf_len);
> > +   m = m->next;
> > +   }
> > +}
> > +
> >  uint16_t
> >  rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> > struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> > count)
> > @@ -1209,6 +1229,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t
> > queue_id,
> > nr_updated += 1;
> >
> > TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
> > +   restore_mbuf(zmbuf->mbuf);
> > rte_pktmbuf_free(zmbuf->mbuf);
> > put_zmbuf(zmbuf);
> > vq->nr_zmbuf -= 1;
> > --
> > 2.0.1



Re: [dpdk-dev] About : Enable optional dequeue zero copy for vHost User

2018-01-23 Thread Chen, Junjie J
Hi
Which version of DPDK are you using? I have some fixes for dequeue zero copy
that are now in 18.02-rc1; you can try 18.02-rc1.

Cheers
JJ

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Loftus, Ciara
> Sent: Tuesday, January 23, 2018 6:46 PM
> To: liyang07 
> Cc: dev@dpdk.org; d...@openvswitch.org
> Subject: Re: [dpdk-dev] About : Enable optional dequeue zero copy for vHost
> User
> 
> Hi,
> 
> For the meantime this feature is proposed as ‘experimental’ for OVS DPDK.
> Unless you are transmitting to a NIC, you don’t need to set the n_txq_desc.
> My testing has been only with a DPDK driver in the guest. Have you tried that
> option?
> 
> Thanks,
> Ciara
> 
> From: liyang07 [mailto:liyan...@corp.netease.com]
> Sent: Wednesday, January 17, 2018 10:41 AM
> To: Loftus, Ciara 
> Cc: dev@dpdk.org
> Subject: About : Enable optional dequeue zero copy for vHost User
> 
> 
> Hi Ciara,
> 
> I am tesing the function of "vHost dequeue zero copy" for vm2vm on a
> host, and I have some problems:
> 
> 1. The networking is OK before run iperf, I can ping successed from vm1
> to vm2,but after run iperf, the networking between vm1 and vm2 is down;(I
> think n_txq_desc cause the problem)
> 
> 2. I know the limitation about n_txq_desc, but I cannot set the
> n_txq_desc for dpdkport while the vms on the same host, because there is
> no dpdkports work fow my testing;
> 
> 
> 
> Thus, how can I resolve it, thanks.
> 
> 
> 



Re: [dpdk-dev] [PATCH] doc: add driver limitation for vhost dequeue zero copy

2018-03-13 Thread Chen, Junjie J
Done, thanks for the reminder.

Cheers
JJ

> -Original Message-
> From: Kovacevic, Marko
> Sent: Tuesday, March 13, 2018 5:13 PM
> To: Tan, Jianfeng ; Chen, Junjie J
> ; dev@dpdk.org
> Cc: y...@fridaylinux.org; maxime.coque...@redhat.com
> Subject: RE: [dpdk-dev] [PATCH] doc: add driver limitation for vhost dequeue
> zero copy
> 
> > > Signed-off-by: Junjie Chen 
> > > ---
> > >  doc/guides/sample_app_ug/vhost.rst | 5 -
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/doc/guides/sample_app_ug/vhost.rst
> > > b/doc/guides/sample_app_ug/vhost.rst
> > > index a4bdc6a..1591a31 100644
> > > --- a/doc/guides/sample_app_ug/vhost.rst
> > > +++ b/doc/guides/sample_app_ug/vhost.rst
> 
> 
> Can you please supersede this patch and the V2 if you have sent a V3 to the
> mailing list.
> 
> Thanks


Re: [dpdk-dev] [PATCH v3] doc: add driver limitation for vhost dequeue zero copy

2018-03-13 Thread Chen, Junjie J
Hi Jianfeng.

> -Original Message-
> From: Tan, Jianfeng
> Sent: Monday, March 12, 2018 11:15 AM
> To: Chen, Junjie J ; y...@fridaylinux.org;
> maxime.coqu...@redhat.com
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v3] doc: add driver limitation for vhost dequeue zero copy
> 
> 
> 
> On 3/9/2018 6:07 PM, Junjie Chen wrote:
> > In vhost-switch example, when binding nic to vfio-pci with iommu
> > enabled, dequeue zero copy cannot work in VM2NIC mode due to no iommu
> > dma mapping is setup for guest memory currently.
> >
> > Signed-off-by: Junjie Chen 
> > ---
> > Changes in V3:
> > - update limitation to iommu
> > Changes in V2:
> > - add doc in vhost lib
> >
> >   doc/guides/prog_guide/vhost_lib.rst | 5 +
> >   doc/guides/sample_app_ug/vhost.rst  | 5 -
> >   2 files changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/guides/prog_guide/vhost_lib.rst
> > b/doc/guides/prog_guide/vhost_lib.rst
> > index 18227b6..06d214f 100644
> > --- a/doc/guides/prog_guide/vhost_lib.rst
> > +++ b/doc/guides/prog_guide/vhost_lib.rst
> > @@ -83,6 +83,11 @@ The following is an overview of some key Vhost API
> functions:
> > of those segments, thus the fewer the segments, the quicker we will
> get
> > the mapping. NOTE: we may speed it by using tree searching in
> future.
> >
> > +* zero copy does not work when using driver with iommu mode
> > + currently, this
> 
> Considering FreeBSD driver nic_uio does not support iommu, we can make
> "driver" more explicitly. For example,
> 
> "driver with iommu mode" -> "vfio-pci with iommu mode"
Will update.
> 
> > +  is because we don't setup iommu dma mapping for guest memory.
> 
> > For example, when you bind device to vfio-pci driver, you need to set 
> > driver to
> work in noiommu mode.
> 
> Reword the above sentence a little bit:
> "If vfio-pci is a must in your case, insert vfio-pci kernel module in noiommu
> mode."
Will update, thanks.

> > +
> > - ``RTE_VHOST_USER_IOMMU_SUPPORT``
> >
> >   IOMMU support will be enabled when this flag is set. It is
> > disabled by diff --git a/doc/guides/sample_app_ug/vhost.rst
> > b/doc/guides/sample_app_ug/vhost.rst
> > index a4bdc6a..f0bb169 100644
> > --- a/doc/guides/sample_app_ug/vhost.rst
> > +++ b/doc/guides/sample_app_ug/vhost.rst
> > @@ -147,7 +147,10 @@ retries on an RX burst, it takes effect only when rx
> retry is enabled. The
> >   default value is 15.
> >
> >   **--dequeue-zero-copy**
> > -Dequeue zero copy will be enabled when this option is given.
> > +Dequeue zero copy will be enabled when this option is given. it is
> > +worth to note that if NIC is binded to driver with iommu enabled,
> > +dequeue zero copy cannot work at VM2NIC mode (vm2vm=0) due to
> > +currently we don't setup iommu dma mapping for guest memory.
> >
> >   **--vlan-strip 0|1**
> >   VLAN strip option is removed, because different NICs have different
> > behaviors



Re: [dpdk-dev] [PATCH] net/vhost: fix segfault when creating vdev dynamically

2018-03-27 Thread Chen, Junjie J
Hi Jianfeng

> On 3/28/2018 12:05 AM, Junjie Chen wrote:
> > when creating vdev dynamically, vhost pmd driver start directly
> > without checking TX/RX queues ready or not, and thus cause
> > segmentation fault when vhost library accessing queues. This patch add
> > flag to check whether queues setup or not, and add driver start call
> > into dev_start to allow user start it after setting up queue.
> 
> The issue is clear now. But this patch just puts the situation before below 
> fix:
> "it doesn't create the actual datagram socket until you call .dev_start()."

No, if the queues exist, the datagram socket still gets created in the
vhost_create API, since the vhost_driver_register call is still there in
vhost_create.

> 
> To fix this issue, we might delay the queue setup until we really have queues?
> 
> > Fixes: aed0b12930b33("net/vhost: fix socket file deleted on stop")
> 
> Missed the space between commit number and title. And we keep 12
> characters for the commit id.

Thanks for the reminder, will update in v2.
> 
> Thanks,
> Jianfeng
> 
> > Signed-off-by: Junjie Chen 
> > ---
> >   drivers/net/vhost/rte_eth_vhost.c | 21 +++--
> >   1 file changed, 19 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/vhost/rte_eth_vhost.c
> > b/drivers/net/vhost/rte_eth_vhost.c
> > index 3aae01c..719a150 100644
> > --- a/drivers/net/vhost/rte_eth_vhost.c
> > +++ b/drivers/net/vhost/rte_eth_vhost.c
> > @@ -118,6 +118,7 @@ struct pmd_internal {
> > char *iface_name;
> > uint16_t max_queues;
> > rte_atomic32_t started;
> > +   rte_atomic32_t once;
> >   };
> >
> >   struct internal_list {
> > @@ -772,12 +773,24 @@ rte_eth_vhost_get_vid_from_port_id(uint16_t
> port_id)
> >   static int
> >   eth_dev_start(struct rte_eth_dev *dev)
> >   {
> > +   int ret = 0;
> > struct pmd_internal *internal = dev->data->dev_private;
> >
> > +   if (unlikely(rte_atomic32_read(&internal->once) == 0)) {
> > +   ret = rte_vhost_driver_start(internal->iface_name);
> > +   if (ret < 0) {
> > +   RTE_LOG(ERR, PMD, "Failed to start driver for %s\n",
> > +   internal->iface_name);
> > +   return ret;
> > +   }
> > +
> > +   rte_atomic32_set(&internal->once, 1);
> > +   }
> > +
> > rte_atomic32_set(&internal->started, 1);
> > update_queuing_status(dev);
> >
> > -   return 0;
> > +   return ret;
> >   }
> >
> >   static void
> > @@ -1101,7 +1114,11 @@ eth_dev_vhost_create(struct rte_vdev_device
> *dev, char *iface_name,
> > goto error;
> > }
> >
> > -   if (rte_vhost_driver_start(iface_name) < 0) {
> > +   if (!data->rx_queues || !data->tx_queues) {
> > +   RTE_LOG(INFO, PMD,
> > +   "TX/RX queue is not ready, driver will not start\n");
> > +   rte_atomic32_set(&internal->once, 0);
> > +   } else if (rte_vhost_driver_start(iface_name) < 0) {
> > RTE_LOG(ERR, PMD, "Failed to start driver for %s\n",
> > iface_name);
> > goto error;



Re: [dpdk-dev] [PATCH] net/vhost: fix segfault when creating vdev dynamically

2018-03-27 Thread Chen, Junjie J

> >
> >> On 3/28/2018 12:05 AM, Junjie Chen wrote:
> >>> when creating vdev dynamically, vhost pmd driver start directly
> >>> without checking TX/RX queues ready or not, and thus cause
> >>> segmentation fault when vhost library accessing queues. This patch
> >>> add flag to check whether queues setup or not, and add driver start
> >>> call into dev_start to allow user start it after setting up queue.
> >> The issue is clear now. But this patch just puts the situation before below
> fix:
> >> "it doesn't create the actual datagram socket until you call .dev_start()."
> > No, if the queue exist, the datagram socket still get created in 
> > vhost_create
> API, since the vhost_driver_register still exist in vhost_create.
> 
> The queue can never be created, as it's still not probed.

I think we need to separate this into two cases:
Statically created vdev: the datagram recreate logic is still there
since the queues already exist, so this patch doesn't change anything.
Dynamically created vdev: as you pointed out, the queue can never be created,
but that case should not be valid, since in the normal process of creating a
vdev dynamically we always need to configure queues. Correct me if I'm wrong.

In summary, I think the previous commit fixes the static code path and this
patch fixes the dynamic code path (we need to set up the queues at least once).
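To make the two code paths concrete, here is a small Python model of the behaviour under discussion (a hypothetical simplification, not DPDK code): on the static path the driver, and hence its unix socket, starts as soon as the vdev is created because the queues already exist; on the dynamic path the start is deferred until dev_start() after queue setup.

```python
class VhostDev:
    """Toy model (not DPDK code) of the vhost PMD start logic discussed above."""

    def __init__(self, queues_ready):
        self.driver_started = False
        # Static path: queues already exist when the vdev is created,
        # so the driver (and its unix socket) starts immediately.
        if queues_ready:
            self._driver_start()

    def _driver_start(self):
        # Stands in for rte_vhost_driver_start() creating the unix socket.
        self.driver_started = True

    def dev_start(self):
        # Dynamic path: the user sets up queues first, then dev_start()
        # performs the deferred driver start.
        if not self.driver_started:
            self._driver_start()

static_dev = VhostDev(queues_ready=True)
assert static_dev.driver_started            # socket exists at create time

dynamic_dev = VhostDev(queues_ready=False)
assert not dynamic_dev.driver_started       # no socket until queues are set up
dynamic_dev.dev_start()
assert dynamic_dev.driver_started           # socket created on dev_start()
```

This mirrors the role of the `once` flag in the patch: it only records whether the deferred start has already happened.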


Re: [dpdk-dev] [PATCH] net/vhost: fix segfault when creating vdev dynamically

2018-03-27 Thread Chen, Junjie J

> 
> On 3/27/2018 5:24 PM, Chen, Junjie J wrote:
> >>>> On 3/28/2018 12:05 AM, Junjie Chen wrote:
> >>>>> when creating vdev dynamically, vhost pmd driver start directly
> >>>>> without checking TX/RX queues ready or not, and thus cause
> >>>>> segmentation fault when vhost library accessing queues. This patch
> >>>>> add flag to check whether queues setup or not, and add driver
> >>>>> start call into dev_start to allow user start it after setting up queue.
> >>>> The issue is clear now. But this patch just puts the situation
> >>>> before below
> >> fix:
> >>>> "it doesn't create the actual datagram socket until you call 
> >>>> .dev_start()."
> >>> No, if the queue exist, the datagram socket still get created in
> >>> vhost_create
> >> API, since the vhost_driver_register still exist in vhost_create.
> >>
> >> The queue can never be created, as it's still not probed.
> > I think we need to separate this into two cases:
> > Statically create vdev, the datagram recreate logical is still there 
> > since
> queues are exist already, this patch doesn't change anything.
> > Dynamic create vdev, as you pointed out, queue can never be created,
> while this should be not valid since In normal process of creating vdev
> dynamically, we always need to config queues. Correct me if I'm wrong.
> 
> My point is, either vdev is created statically or dynamically, when probe(),
> queues are not setup yet definitely, then *the unix socket will not be 
> created*
> until we set up the queues and do dev_start(). If the unix socket is not 
> created,
> then VM cannot connect to it.

Yes, I agree with this. 
In this patch, it just checks whether the queues are set up or not and gives
the user a chance to set up queues before dev_start; it doesn't revert the
logic from the previous commit.

So the logic changes to: stop creating the unix socket before queue setup.
What do you think about this? 
> 
> > In summary, I think the previously commit fixes the static code path and 
> > this
> patch fixes the dynamic code path (we need to at least setup queue once).



Re: [dpdk-dev] [PATCH v2] vhost: add support for interrupt mode

2018-04-02 Thread Chen, Junjie J
> 
> 
> 
> On 3/29/2018 10:37 PM, Junjie Chen wrote:
> > In some cases we want vhost dequeue work in interrupt mode to release
> > cpus to others when no data to transmit. So we install interrupt
> > handler of vhost device and interrupt vectors for each rx queue when
> > creating new backend according to vhost intrerupt configuration. Thus,
> > applications could register a epoll event fd to associate rx queues
> > with interrupt vectors.
> >
> > Signed-off-by: Junjie Chen 
> > ---
> > Changes in v2:
> > - update rx queue index
> > - fill efd_counter_size for intr handler
> > - update log
> >
> >   drivers/net/vhost/rte_eth_vhost.c | 130
> ++
> >   1 file changed, 130 insertions(+)
> >
> > diff --git a/drivers/net/vhost/rte_eth_vhost.c
> > b/drivers/net/vhost/rte_eth_vhost.c
> > index 3aae01c..b2d925e 100644
> > --- a/drivers/net/vhost/rte_eth_vhost.c
> > +++ b/drivers/net/vhost/rte_eth_vhost.c
> > @@ -552,12 +552,129 @@ update_queuing_status(struct rte_eth_dev
> *dev)
> >   }
> >
> >   static int
> > +eth_rxq_intr_enable(struct rte_eth_dev *dev, uint16_t qid) {
> > +   struct rte_vhost_vring vring;
> > +   struct vhost_queue *vq;
> > +   int ret = 0;
> > +
> > +   vq = dev->data->rx_queues[qid];
> > +   if (!vq) {
> > +   RTE_LOG(ERR, PMD, "rxq%d is not setup yet\n", qid);
> > +   return -1;
> > +   }
> > +
> > +   ret = rte_vhost_get_vhost_vring(vq->vid, (qid << 1) + 1, &vring);
> > +   if (ret < 0) {
> > +   RTE_LOG(ERR, PMD, "Failed to get rxq%d's vring\n", qid);
> > +   return ret;
> > +   }
> > +   RTE_LOG(INFO, PMD, "Enable interrupt for rxq%d\n", qid);
> > +   vring.used->flags &= (~VRING_USED_F_NO_NOTIFY);
> 
> Instead, you might want to use the API, rte_vhost_enable_guest_notification.
Will do.
> 
> > +   rte_wmb();
> > +
> > +   return ret;
> > +}
> > +
> > +static int
> > +eth_rxq_intr_disable(struct rte_eth_dev *dev, uint16_t qid) {
> > +   struct rte_vhost_vring vring;
> > +   struct vhost_queue *vq;
> > +   int ret = 0;
> > +
> > +   vq = dev->data->rx_queues[qid];
> > +   if (!vq) {
> > +   RTE_LOG(ERR, PMD, "rxq%d is not setup yet\n", qid);
> > +   return -1;
> > +   }
> > +
> > +   ret = rte_vhost_get_vhost_vring(vq->vid, (qid << 1) + 1, &vring);
> > +   if (ret < 0) {
> > +   RTE_LOG(ERR, PMD, "Failed to get rxq%d's vring", qid);
> > +   return ret;
> > +   }
> > +   RTE_LOG(INFO, PMD, "Disable interrupt for rxq%d\n", qid);
> > +   vring.used->flags |= VRING_USED_F_NO_NOTIFY;
> > +   rte_wmb();
> 
> Ditto.
Will do.
> 
> > +
> > +   return 0;
> > +}
> > +
> > +static void
> > +eth_vhost_uninstall_intr(struct rte_eth_dev *dev) {
> > +   struct rte_intr_handle *intr_handle = dev->intr_handle;
> > +
> > +   if (intr_handle) {
> > +   if (intr_handle->intr_vec)
> > +   free(intr_handle->intr_vec);
> > +   free(intr_handle);
> > +   }
> > +
> > +   dev->intr_handle = NULL;
> > +}
> > +
> > +static int
> > +eth_vhost_install_intr(struct rte_eth_dev *dev) {
> > +   struct rte_vhost_vring vring;
> > +   struct vhost_queue *vq;
> > +   int count = 0;
> > +   int nb_rxq = dev->data->nb_rx_queues;
> > +   int i;
> > +   int ret;
> > +
> > +   /* uninstall firstly if we are reconnecting */
> > +   if (dev->intr_handle)
> > +   eth_vhost_uninstall_intr(dev);
> > +
> > +   dev->intr_handle = malloc(sizeof(*dev->intr_handle));
> > +   if (!dev->intr_handle) {
> > +   RTE_LOG(ERR, PMD, "Fail to allocate intr_handle\n");
> > +   return -ENOMEM;
> > +   }
> > +   memset(dev->intr_handle, 0, sizeof(*dev->intr_handle));
> > +
> > +   dev->intr_handle->efd_counter_size = sizeof(uint64_t);
> 
> I did not read-clean this for virtio-user. But we'd better keep consistent, to
> avoid confusion. If we need read-clean this, we could fix virtio-user.
So, do you want to change this in this patch, or in a separate one?
> 
> > +
> > +   dev->intr_handle->intr_vec =
> > +   malloc(nb_rxq * sizeof(dev->intr_handle->intr_vec[0]));
> > +
> > +   if (!dev->intr_handle->intr_vec) {
> > +   RTE_LOG(ERR, PMD,
> > +   "Failed to allocate memory for interrupt vector\n");
> 
> For completeness, we need to uninstall before return.
Will do.
> 
> > +   return -ENOMEM;
> > +   }
> > +
> > +   for (i = 0; i < nb_rxq; i++) {
> > +   vq = dev->data->rx_queues[i];
> > +   if (!vq)
> > +   continue;
> 
> Put such check at the entry of this function?
> 
> > +   ret = rte_vhost_get_vhost_vring(vq->vid, (i << 1) + 1, &vring);
> > +   if (ret < 0)
> 
> Does that mean we shall take care of this failure?

For the above two checks, I had a fix recently to check overall queue setup in
patch: https://dpdk.org/dev/patchwork/patch/36766/.
So I would like to put the install_intr function into the queue_setup helper
function in patch 36766, and for the internal vq, vring, and kickfd checks of
install_intr, I'd like to sk

Re: [dpdk-dev] [PATCH] doc: add a restriction to multi-process support

2017-11-29 Thread Chen, Junjie J
> -Original Message-
> From: Richardson, Bruce
> Sent: Tuesday, November 28, 2017 7:22 PM
> To: Chen, Junjie J 
> Cc: Gonzalez Monroy, Sergio ; Mcnamara,
> John ; Tahhan, Maryam
> ; Pattan, Reshma ;
> dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] doc: add a restriction to multi-process 
> support
> 
> On Tue, Nov 28, 2017 at 04:56:54AM -0500, junjie.j.c...@intel.com wrote:
> > From: Junjie Chen 
> >
> > This patch add a restriction to multi-process support: secondary
> > processes should only run alongside primary process with same DPDK
> > version, so that secondary processes can use the same hugepage mmap
> > layout as primary process.
> >
> > Signed-off-by: Junjie Chen 
> > ---
> >  doc/guides/prog_guide/multi_proc_support.rst | 1 +
> >  doc/guides/tools/proc_info.rst   | 3 ++-
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/guides/prog_guide/multi_proc_support.rst
> > b/doc/guides/prog_guide/multi_proc_support.rst
> > index 9a9dca7fe..0cd67ae2d 100644
> > --- a/doc/guides/prog_guide/multi_proc_support.rst
> > +++ b/doc/guides/prog_guide/multi_proc_support.rst
> > @@ -51,6 +51,7 @@ For now, there are two types of process specified:
> >  Standalone DPDK processes are primary processes,  while secondary
> > processes can only run alongside a primary process or  after a primary
> > process has already configured the hugepage shared memory for them.
> > +And secondary processes should only run alongside primary process with
> same DPDK version.
> >
> >  To support these two process types, and other multi-process setups
> > described later,  two additional command-line parameters are available to
> the EAL:
> > diff --git a/doc/guides/tools/proc_info.rst
> > b/doc/guides/tools/proc_info.rst index fd17e278c..512fd3263 100644
> > --- a/doc/guides/tools/proc_info.rst
> > +++ b/doc/guides/tools/proc_info.rst
> > @@ -36,7 +36,8 @@ The dpdk-procinfo application is a Data Plane
> > Development Kit (DPDK) application  that runs as a DPDK secondary
> > process and is capable of retrieving port  statistics, resetting port 
> > statistics
> and printing DPDK memory information.
> >  This application extends the original functionality that was
> > supported by -dump_cfg.
> > +dump_cfg. Note that dpdk-procinfo can only run alongside primary
> > +process with same DPDK version.
> >
> >  Running the Application
> >  ---
> > --
> Good to add to the docs. I suggest that you add them using the proper "note"
> markup to make it clearer in the docs.

> /Bruce

Thanks, done.

Junjie


Re: [dpdk-dev] [PATCH v5] vhost_user: protect active rings from async ring changes

2018-01-30 Thread Chen, Junjie J
Hi,
May I know why trylock is not also used in the enqueue path?

Cheers
JJ

> 
> When performing live migration or memory hot-plugging, the changes to the
> device and vrings made by message handler done independently from vring
> usage by PMD threads.
> 
> This causes for example segfaults during live-migration with MQ enable, but
> in general virtually any request sent by qemu changing the state of device
> can cause problems.
> 
> These patches fixes all above issues by adding a spinlock to every vring and
> requiring message handler to start operation only after ensuring that all PMD
> threads related to the device are out of critical section accessing the vring
> data.
> 
> Each vring has its own lock in order to not create contention between PMD
> threads of different vrings and to prevent performance degradation by
> scaling queue pair number.
> 
> See https://bugzilla.redhat.com/show_bug.cgi?id=1450680
> 
> Signed-off-by: Victor Kaplansky 
> ---
> v5:
>  o get rid of spinlock wrapping functions in vhost.h
> 
> v4:
> 
>  o moved access_unlock before accessing enable flag and
>access_unlock after iommu_unlock consistently.
>  o cosmetics: removed blank line.
>  o the access_lock variable moved to be in the same
>cache line with enable and access_ok flags.
>  o dequeue path is now guarded with trylock and returning
>zero if unsuccessful.
>  o GET_VRING_BASE operation is not guarded by access lock
>to avoid deadlock with device_destroy. See the comment
>in the code.
>  o Fixed error path exit from enqueue and dequeue carefully
>unlocking access and iommu locks as appropriate.
> 
> v3:
>o Added locking to enqueue flow.
>o Enqueue path guarded as well as dequeue path.
>o Changed name of active_lock.
>o Added initialization of guarding spinlock.
>o Reworked functions skimming over all virt-queues.
>o Performance measurements done by Maxime Coquelin shows
>  no degradation in bandwidth and throughput.
>o Spelling.
>o Taking lock only on set operations.
>o IOMMU messages are not guarded by access lock.
> 
> v2:
>o Fixed checkpatch complains.
>o Added Signed-off-by.
>o Refined placement of guard to exclude IOMMU messages.
>o TODO: performance degradation measurement.
> 
>  lib/librte_vhost/vhost.h  |  6 ++--
>  lib/librte_vhost/vhost.c  |  1 +
>  lib/librte_vhost/vhost_user.c | 70
> +++
>  lib/librte_vhost/virtio_net.c | 28 ++---
>  4 files changed, 99 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index
> 1cc81c17..c8f2a817 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -108,12 +108,14 @@ struct vhost_virtqueue {
> 
>   /* Backend value to determine if device should started/stopped */
>   int backend;
> + int enabled;
> + int access_ok;
> + rte_spinlock_t  access_lock;
> +
>   /* Used to notify the guest (trigger interrupt) */
>   int callfd;
>   /* Currently unused as polling mode is enabled */
>   int kickfd;
> - int enabled;
> - int access_ok;
> 
>   /* Physical address of used ring, for logging */
>   uint64_tlog_guest_addr;
> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index
> 4f8b73a0..dcc42fc7 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -259,6 +259,7 @@ alloc_vring_queue(struct virtio_net *dev, uint32_t
> vring_idx)
> 
>   dev->virtqueue[vring_idx] = vq;
>   init_vring_queue(dev, vring_idx);
> + rte_spinlock_init(&vq->access_lock);
> 
>   dev->nr_vring += 1;
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index f4c7ce46..0685d4e7 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -1190,12 +1190,47 @@
> vhost_user_check_and_alloc_queue_pair(struct virtio_net *dev,
> VhostUserMsg *msg)
>   return alloc_vring_queue(dev, vring_idx);  }
> 
> +static void
> +vhost_user_lock_all_queue_pairs(struct virtio_net *dev) {
> + unsigned int i = 0;
> + unsigned int vq_num = 0;
> +
> + while (vq_num < dev->nr_vring) {
> + struct vhost_virtqueue *vq = dev->virtqueue[i];
> +
> + if (vq) {
> + rte_spinlock_lock(&vq->access_lock);
> + vq_num++;
> + }
> + i++;
> + }
> +}
> +
> +static void
> +vhost_user_unlock_all_queue_pairs(struct virtio_net *dev) {
> + unsigned int i = 0;
> + unsigned int vq_num = 0;
> +
> + while (vq_num < dev->nr_vring) {
> + struct vhost_virtqueue *vq = dev->virtqueue[i];
> +
> + if (vq) {
> + rte_spinlock_unlock(&vq->access_lock);
> + vq_num++;
> +  
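
The locking scheme in the quoted patch can be modeled roughly as follows (a hypothetical Python illustration, not the C implementation): the message handler takes every vring's lock before changing device state, while a datapath thread only trylocks its own vring and returns 0 packets if the lock is busy, so PMD threads on different vrings never contend with each other and never deadlock with the handler.

```python
import threading

class Vring:
    def __init__(self):
        self.lock = threading.Lock()    # models the per-vring access_lock

class Device:
    def __init__(self, nr_vring):
        self.vrings = [Vring() for _ in range(nr_vring)]

    def lock_all(self):                 # message-handler side: take every lock
        for vq in self.vrings:
            vq.lock.acquire()

    def unlock_all(self):
        for vq in self.vrings:
            vq.lock.release()

def dequeue(dev, qid):
    # Datapath side: trylock only; back off if the handler owns the vring.
    vq = dev.vrings[qid]
    if not vq.lock.acquire(blocking=False):
        return 0                        # caller simply retries on the next poll
    try:
        return 1                        # pretend one packet was dequeued
    finally:
        vq.lock.release()

dev = Device(2)
dev.lock_all()                          # handler is reconfiguring the device
assert dequeue(dev, 0) == 0             # datapath backs off, no deadlock
dev.unlock_all()
assert dequeue(dev, 0) == 1             # normal operation resumes
```

Returning 0 from a dequeue trylock failure is harmless because the caller polls again later, which is the motivation for the trylock choice discussed in this thread.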

Re: [dpdk-dev] [PATCH v5] vhost_user: protect active rings from async ring changes

2018-01-31 Thread Chen, Junjie J
Makes sense, thanks.

Cheers
JJ


> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Wednesday, January 31, 2018 4:13 PM
> To: Chen, Junjie J ; Victor Kaplansky
> 
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v5] vhost_user: protect active rings from
> async ring changes
> 
> Hi,
> 
> On 01/31/2018 07:51 AM, Chen, Junjie J wrote:
> > Hi
> > May I know why not use trylock also in enqueue path?
> 
> Because if rte_vhost_enqueue_burst() returns 0, the caller is likely to drop
> the packets. This is what happens for example with OVS:
> 
> static void
> __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>   struct dp_packet **pkts, int cnt) { ...
>  do {
>  int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
>  unsigned int tx_pkts;
> 
>  tx_pkts = rte_vhost_enqueue_burst(vid, vhost_qid, cur_pkts, cnt);
>  if (OVS_LIKELY(tx_pkts)) {
>  /* Packets have been sent.*/
>  cnt -= tx_pkts;
>  /* Prepare for possible retry.*/
>  cur_pkts = &cur_pkts[tx_pkts];
>  } else {
>  /* No packets sent - do not retry.*/
>  break;
>  }
>  } while (cnt && (retries++ <= VHOST_ENQ_RETRY_NUM)); ...
> 
> 
> Maxime
> > Cheers
> > JJ
> >
> >>
> >> When performing live migration or memory hot-plugging, the changes to
> >> the device and vrings made by message handler done independently from
> >> vring usage by PMD threads.
> >>
> >> This causes for example segfaults during live-migration with MQ
> >> enable, but in general virtually any request sent by qemu changing
> >> the state of device can cause problems.
> >>
> >> These patches fixes all above issues by adding a spinlock to every
> >> vring and requiring message handler to start operation only after
> >> ensuring that all PMD threads related to the device are out of
> >> critical section accessing the vring data.
> >>
> >> Each vring has its own lock in order to not create contention between
> >> PMD threads of different vrings and to prevent performance
> >> degradation by scaling queue pair number.
> >>
> >> See https://bugzilla.redhat.com/show_bug.cgi?id=1450680
> >>
> >> Signed-off-by: Victor Kaplansky 
> >> ---
> >> v5:
> >>   o get rid of spinlock wrapping functions in vhost.h
> >>
> >> v4:
> >>
> >>   o moved access_unlock before accessing enable flag and
> >> access_unlock after iommu_unlock consistently.
> >>   o cosmetics: removed blank line.
> >>   o the access_lock variable moved to be in the same
> >> cache line with enable and access_ok flags.
> >>   o dequeue path is now guarded with trylock and returning
> >> zero if unsuccessful.
> >>   o GET_VRING_BASE operation is not guarded by access lock
> >> to avoid deadlock with device_destroy. See the comment
> >> in the code.
> >>   o Fixed error path exit from enqueue and dequeue carefully
> >> unlocking access and iommu locks as appropriate.
> >>
> >> v3:
> >> o Added locking to enqueue flow.
> >> o Enqueue path guarded as well as dequeue path.
> >> o Changed name of active_lock.
> >> o Added initialization of guarding spinlock.
> >> o Reworked functions skimming over all virt-queues.
> >> o Performance measurements done by Maxime Coquelin shows
> >>   no degradation in bandwidth and throughput.
> >> o Spelling.
> >> o Taking lock only on set operations.
> >> o IOMMU messages are not guarded by access lock.
> >>
> >> v2:
> >> o Fixed checkpatch complains.
> >> o Added Signed-off-by.
> >> o Refined placement of guard to exclude IOMMU messages.
> >> o TODO: performance degradation measurement.
> >>
> >>   lib/librte_vhost/vhost.h  |  6 ++--
> >>   lib/librte_vhost/vhost.c  |  1 +
> >>   lib/librte_vhost/vhost_user.c | 70
> >> +++
> >>   lib/librte_vhost/virtio_net.c | 28 ++---
> >>   4 files changed, 99 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> >> index
> >> 1cc81c17..c8f2a817 100644
> >> --- a/lib/librte_vhost/vhost.h
> >> +++ b/lib/lib

Re: [dpdk-dev] [PATCH v2] doc: add driver limitation for vhost dequeue zero copy

2018-03-08 Thread Chen, Junjie J


> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Tuesday, March 6, 2018 4:57 PM
> To: Chen, Junjie J ; y...@fridaylinux.org; Tan,
> Jianfeng 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v2] doc: add driver limitation for vhost dequeue zero copy
> 
> 
> 
> On 02/27/2018 10:21 AM, Junjie Chen wrote:
> > In vhost-switch example, when binding nic to vfio-pci, dequeue zero
> > copy cannot work in VM2NIC mode due to no iommu dma mapping is setup
> > for guest memory currently.
> >
> > Signed-off-by: Junjie Chen 
> > ---
> > Changes in V2:
> >   - add doc in vhost lib
> >
> >   doc/guides/prog_guide/vhost_lib.rst | 3 +++
> >   doc/guides/sample_app_ug/vhost.rst  | 5 -
> >   2 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/doc/guides/prog_guide/vhost_lib.rst
> > b/doc/guides/prog_guide/vhost_lib.rst
> > index 18227b6..bdf77d6 100644
> > --- a/doc/guides/prog_guide/vhost_lib.rst
> > +++ b/doc/guides/prog_guide/vhost_lib.rst
> > @@ -83,6 +83,9 @@ The following is an overview of some key Vhost API
> functions:
> > of those segments, thus the fewer the segments, the quicker we
> will get
> > the mapping. NOTE: we may speed it by using tree searching in
> future.
> >
> > +* zero copy does not work when using vfio-pci driver currently, this is
> > +  because we don't setup iommu dma mapping for guest memory.
> > +
> 
> I guess that it should work with vfio-pci in noiommu mode? Maybe worth to
> clarify.
You are right, it works in noiommu mode. Updated the patch accordingly. 
> 
> > - ``RTE_VHOST_USER_IOMMU_SUPPORT``
> >
> >   IOMMU support will be enabled when this flag is set. It is
> > disabled by diff --git a/doc/guides/sample_app_ug/vhost.rst
> > b/doc/guides/sample_app_ug/vhost.rst
> > index a4bdc6a..840c1fd 100644
> > --- a/doc/guides/sample_app_ug/vhost.rst
> > +++ b/doc/guides/sample_app_ug/vhost.rst
> > @@ -147,7 +147,10 @@ retries on an RX burst, it takes effect only when rx
> retry is enabled. The
> >   default value is 15.
> >
> >   **--dequeue-zero-copy**
> > -Dequeue zero copy will be enabled when this option is given.
> > +Dequeue zero copy will be enabled when this option is given, it is
> > +worth to note that if NIC is binded to vfio-pci driver, dequeue zero
> > +copy cannot work at VM2NIC mode (vm2vm=0) due to currently we don't
> > +setup iommu dma mapping for guest memory.
> >
> >   **--vlan-strip 0|1**
> >   VLAN strip option is removed, because different NICs have different
> > behaviors
> >


Re: [dpdk-dev] [PATCH v3] doc: add a restriction to multi-process support

2017-12-12 Thread Chen, Junjie J
Done!

Cheers
JJ

> -Original Message-
> From: Mcnamara, John
> Sent: Tuesday, December 12, 2017 8:41 PM
> To: Chen, Junjie J ; Gonzalez Monroy, Sergio
> ; Tahhan, Maryam
> ; dev@dpdk.org
> Subject: RE: [PATCH v3] doc: add a restriction to multi-process support
> 
> 
> 
> > -Original Message-
> > From: Chen, Junjie J
> > Sent: Wednesday, November 29, 2017 5:17 PM
> > To: Gonzalez Monroy, Sergio ;
> > Mcnamara, John ; Tahhan, Maryam
> > ; dev@dpdk.org
> > Cc: Chen, Junjie J 
> > Subject: [PATCH v3] doc: add a restriction to multi-process support
> >
> > This patch add a restriction to multi-process support: secondary
> > processes should only run alongside primary process with same DPDK
> > version, so that secondary processes can use the same hugepage mmap
> > layout as primary process.
> >
> > Signed-off-by: Junjie Chen 
> 
> Could you mark the previous version of this patch as "Superseded":
> 
> http://dpdk.org/dev/patchwork/project/dpdk/list/?submitter=&state=&q=restri
> ction+to+multi-process+support&archive=&delegate=
> 
> 
> Otherwise:
> 
> Acked-by: John McNamara 
> 



Re: [dpdk-dev] [PATCH v3 2/2] gro: support VxLAN GRO

2017-12-22 Thread Chen, Junjie J
Hi Jiayu

> -Original Message-
> From: Hu, Jiayu
> Sent: Friday, December 22, 2017 3:26 PM
> To: dev@dpdk.org
> Cc: Tan, Jianfeng ; Chen, Junjie J
> ; Ananyev, Konstantin
> ; step...@networkplumber.org; Yigit,
> Ferruh ; Yao, Lei A ; Hu, Jiayu
> 
> Subject: [PATCH v3 2/2] gro: support VxLAN GRO
> 
> This patch adds a framework that allows GRO on tunneled packets.
> Furthermore, it leverages that framework to provide GRO support for
> VxLAN-encapsulated packets. Supported VxLAN packets must have an outer
> IPv4 header, and contain an inner TCP/IPv4 packet.
> 
Re: [dpdk-dev] [PATCH v3 0/2] Support VxLAN GRO

2017-12-28 Thread Chen, Junjie J
> -Original Message-
> From: Hu, Jiayu
> Sent: Friday, December 22, 2017 3:26 PM
> To: dev@dpdk.org
> Cc: Tan, Jianfeng ; Chen, Junjie J
> ; Ananyev, Konstantin
> ; step...@networkplumber.org; Yigit,
> Ferruh ; Yao, Lei A ; Hu, Jiayu
> 
> Subject: [PATCH v3 0/2] Support VxLAN GRO
> 
> VxLAN is one of the most widely used tunneled protocols. Providing GRO
> support for VxLAN-encapsulated packets can benefit many per-packet based
> applications, like Open vSwitch.
> 
> This patchset is to support VxLAN GRO. The first patch cleans up current gro
> codes for the sake of supporting tunneled GRO. The second patch supports
> GRO on the VxLAN packets which have an outer IPv4 header and an inner
> TCP/IPv4 packet.
> 
> Change log
> ===
> v3:
> - remove needless check
> - add "likely()" and "unlikely()" to optimize branch prediction
> - fix a bug in merge_two_tcp4_packets(): for VxLAN packets, check if
>   the outer IPv4 packet length is less than or equal to UINT16_MAX,
>   rather than the inner IPv4 packet length.
> - fix a bug in rte_gro.h: change RTE_GRO_TYPE_SUPPORT_NUM to 2
> - Avoid inserting timestamp in rte_gro_reassemble_burst(), since all
>   packets in the tables will be flushed.
> - fix typos
> v2:
> - comply RFC 6864 to process IP ID fields. Specifically, we require the
>   IP ID fields of neighbor packets whose DF bit is 0 to be increased by
>   1. We don't check IP ID for the packets whose DF bit is 1.
>   Additionally, packets whose DF bits are different cannot be merged.
> - update the programmer guide and function comments
> 
> Jiayu Hu (2):
>   gro: code cleanup
>   gro: support VxLAN GRO
> 
>  .../prog_guide/generic_receive_offload_lib.rst | 269 ++-
>  doc/guides/prog_guide/img/gro-key-algorithm.png| Bin 0 -> 28231
> bytes
>  lib/librte_gro/Makefile|   1 +
>  lib/librte_gro/gro_tcp4.c  | 330 +
>  lib/librte_gro/gro_tcp4.h  | 253 +++---
>  lib/librte_gro/gro_vxlan_tcp4.c| 515
> +
>  lib/librte_gro/gro_vxlan_tcp4.h| 184 
>  lib/librte_gro/rte_gro.c   | 199 +---
>  lib/librte_gro/rte_gro.h   |  97 ++--
>  9 files changed, 1337 insertions(+), 511 deletions(-)
>  create mode 100644 doc/guides/prog_guide/img/gro-key-algorithm.png
>  create mode 100644 lib/librte_gro/gro_vxlan_tcp4.c
>  create mode 100644 lib/librte_gro/gro_vxlan_tcp4.h
> 
> --
> 2.7.4

Reviewed-by: Junjie Chen

Thanks



Re: [dpdk-dev] [PATCH v3 1/2] gro: code cleanup

2017-12-28 Thread Chen, Junjie J

> -Original Message-
> From: Hu, Jiayu
> Sent: Friday, December 22, 2017 3:26 PM
> To: dev@dpdk.org
> Cc: Tan, Jianfeng ; Chen, Junjie J
> ; Ananyev, Konstantin
> ; step...@networkplumber.org; Yigit,
> Ferruh ; Yao, Lei A ; Hu, Jiayu
> 
> Subject: [PATCH v3 1/2] gro: code cleanup
> 
> - Remove needless check and variants
> - For better understanding, update the programmer guide and rename
>   internal functions and variants
> - For supporting tunneled gro, move common internal functions from
>   gro_tcp4.c to gro_tcp4.h
> - Comply RFC 6864 to process the IPv4 ID field
> 
> Signed-off-by: Jiayu Hu 
> ---
>  .../prog_guide/generic_receive_offload_lib.rst | 246 ---
>  doc/guides/prog_guide/img/gro-key-algorithm.png| Bin 0 -> 28231
> bytes
>  lib/librte_gro/gro_tcp4.c  | 330
> +++--
>  lib/librte_gro/gro_tcp4.h  | 253
> +++-
>  lib/librte_gro/rte_gro.c   |  98 +++---
>  lib/librte_gro/rte_gro.h   |  92 +++---
>  6 files changed, 518 insertions(+), 501 deletions(-)
>  create mode 100644 doc/guides/prog_guide/img/gro-key-algorithm.png
> 
> diff --git a/doc/guides/prog_guide/generic_receive_offload_lib.rst
> b/doc/guides/prog_guide/generic_receive_offload_lib.rst
> index 22e50ec..c2d7a41 100644
> --- a/doc/guides/prog_guide/generic_receive_offload_lib.rst
> +++ b/doc/guides/prog_guide/generic_receive_offload_lib.rst
> @@ -32,128 +32,162 @@ Generic Receive Offload Library
>  ===
> 
>  Generic Receive Offload (GRO) is a widely used SW-based offloading
> -technique to reduce per-packet processing overhead. It gains performance
> -by reassembling small packets into large ones. To enable more flexibility
> -to applications, DPDK implements GRO as a standalone library. Applications
> -explicitly use the GRO library to merge small packets into large ones.
> -
> -The GRO library assumes all input packets have correct checksums. In
> -addition, the GRO library doesn't re-calculate checksums for merged
> -packets. If input packets are IP fragmented, the GRO library assumes
> -they are complete packets (i.e. with L4 headers).
> -
> -Currently, the GRO library implements TCP/IPv4 packet reassembly.
> -
> -Reassembly Modes
> -
> -
> -The GRO library provides two reassembly modes: lightweight and
> -heavyweight mode. If applications want to merge packets in a simple way,
> -they can use the lightweight mode API. If applications want more
> -fine-grained controls, they can choose the heavyweight mode API.
> -
> -Lightweight Mode
> -
> -
> -The ``rte_gro_reassemble_burst()`` function is used for reassembly in
> -lightweight mode. It tries to merge N input packets at a time, where
> -N should be less than or equal to ``RTE_GRO_MAX_BURST_ITEM_NUM``.
> -
> -In each invocation, ``rte_gro_reassemble_burst()`` allocates temporary
> -reassembly tables for the desired GRO types. Note that the reassembly
> -table is a table structure used to reassemble packets and different GRO
> -types (e.g. TCP/IPv4 GRO and TCP/IPv6 GRO) have different reassembly table
> -structures. The ``rte_gro_reassemble_burst()`` function uses the reassembly
> -tables to merge the N input packets.
> -
> -For applications, performing GRO in lightweight mode is simple. They
> -just need to invoke ``rte_gro_reassemble_burst()``. Applications can get
> -GROed packets as soon as ``rte_gro_reassemble_burst()`` returns.
> -
> -Heavyweight Mode
> -
> -
> -The ``rte_gro_reassemble()`` function is used for reassembly in heavyweight
> -mode. Compared with the lightweight mode, performing GRO in heavyweight
> mode
> -is relatively complicated.
> -
> -Before performing GRO, applications need to create a GRO context object
> -by calling ``rte_gro_ctx_create()``. A GRO context object holds the
> -reassembly tables of desired GRO types. Note that all update/lookup
> -operations on the context object are not thread safe. So if different
> -processes or threads want to access the same context object simultaneously,
> -some external syncing mechanisms must be used.
> -
> -Once the GRO context is created, applications can then use the
> -``rte_gro_reassemble()`` function to merge packets. In each invocation,
> -``rte_gro_reassemble()`` tries to merge input packets with the packets
> -in the reassembly tables. If an input packet is an unsupported GRO type,
> -or other errors happen (e.g. SYN bit is set), ``rte_gro_reassemble()``
> -returns the packet to applications. Otherwise, the input packet is either
> -merged or inserted into a reassembly table.
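The quoted guide describes the key-based reassembly step: an input packet is matched against a flow, then either merged with a stored packet or inserted into the table as a new item. The sketch below models just that merge decision, including the RFC 6864 rule that the IPv4 ID is only checked for packets whose DF bit is 0. The names (`gro_key`, `gro_item`, `try_merge`) and the simplified flow key are illustrative; this is not the DPDK API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct gro_key {                 /* simplified flow key; real GRO also matches MACs */
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

struct gro_item {                /* one stored packet awaiting merge */
	struct gro_key key;
	uint32_t seq;            /* TCP sequence number of the first payload byte */
	uint16_t payload_len;
	uint16_t ip_id;
	uint8_t  df;             /* IPv4 DF bit */
};

/* Return 1 and extend *item when pkt is its right-hand neighbor;
 * return 0 so the caller inserts pkt into the table as a new item. */
static int try_merge(struct gro_item *item, const struct gro_item *pkt)
{
	if (memcmp(&item->key, &pkt->key, sizeof(item->key)) != 0)
		return 0;        /* different flow */
	if (item->df != pkt->df)
		return 0;        /* packets with different DF bits never merge */
	if (pkt->seq != item->seq + item->payload_len)
		return 0;        /* not contiguous in TCP sequence space */
	if (item->df == 0 && pkt->ip_id != (uint16_t)(item->ip_id + 1))
		return 0;        /* RFC 6864: ID must advance by 1 when DF==0 */
	item->payload_len += pkt->payload_len;
	item->ip_id = pkt->ip_id;        /* remember the new tail packet's ID */
	return 1;
}
```

A caller would run `try_merge()` against the items of the matching flow and, on failure, append the packet as a new item — mirroring the "merged or inserted" outcome described above.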

Re: [dpdk-dev] [PATCH v3 2/2] gro: support VxLAN GRO

2017-12-28 Thread Chen, Junjie J

> -Original Message-
> From: Hu, Jiayu
> Sent: Friday, December 22, 2017 3:26 PM
> To: dev@dpdk.org
> Cc: Tan, Jianfeng ; Chen, Junjie J
> ; Ananyev, Konstantin
> ; step...@networkplumber.org; Yigit,
> Ferruh ; Yao, Lei A ; Hu, Jiayu
> 
> Subject: [PATCH v3 2/2] gro: support VxLAN GRO
> 
> This patch adds a framework that allows GRO on tunneled packets.
> Furthermore, it leverages that framework to provide GRO support for
> VxLAN-encapsulated packets. Supported VxLAN packets must have an outer
> IPv4 header, and contain an inner TCP/IPv4 packet.
> 
> VxLAN GRO doesn't check if input packets have correct checksums and doesn't
> update checksums for output packets. Additionally, it assumes the packets are
> complete (i.e., MF==0 && frag_off==0), when IP fragmentation is possible
> (i.e., DF==0).
> 
> Signed-off-by: Jiayu Hu 
> ---
>  .../prog_guide/generic_receive_offload_lib.rst |  31 +-
>  lib/librte_gro/Makefile|   1 +
>  lib/librte_gro/gro_vxlan_tcp4.c| 515
> +
>  lib/librte_gro/gro_vxlan_tcp4.h| 184 
>  lib/librte_gro/rte_gro.c   | 129 +-
>  lib/librte_gro/rte_gro.h   |   5 +-
>  6 files changed, 837 insertions(+), 28 deletions(-)
>  create mode 100644 lib/librte_gro/gro_vxlan_tcp4.c
>  create mode 100644 lib/librte_gro/gro_vxlan_tcp4.h
> 
> diff --git a/doc/guides/prog_guide/generic_receive_offload_lib.rst
> b/doc/guides/prog_guide/generic_receive_offload_lib.rst
> index c2d7a41..078bec0 100644
> --- a/doc/guides/prog_guide/generic_receive_offload_lib.rst
> +++ b/doc/guides/prog_guide/generic_receive_offload_lib.rst
> @@ -57,7 +57,9 @@ assumes the packets are complete (i.e., MF==0 &&
> frag_off==0), when IP fragmentation is possible (i.e., DF==0). Additionally, it
> complies RFC 6864 to process the IPv4 ID field.
> 
> -Currently, the GRO library provides GRO supports for TCP/IPv4 packets.
> +Currently, the GRO library provides GRO supports for TCP/IPv4 packets
> +and VxLAN packets which contain an outer IPv4 header and an inner
> +TCP/IPv4 packet.
> 
>  Two Sets of API
>  ---
> @@ -108,7 +110,8 @@ Reassembly Algorithm
> 
>  The reassembly algorithm is used for reassembling packets. In the GRO
> library, different GRO types can use different algorithms. In this
> -section, we will introduce an algorithm, which is used by TCP/IPv4 GRO.
> +section, we will introduce an algorithm, which is used by TCP/IPv4 GRO
> +and VxLAN GRO.
> 
>  Challenges
>  ~~
> @@ -185,6 +188,30 @@ Header fields deciding if two packets are neighbors
> include:
>  - IPv4 ID. The IPv4 ID fields of the packets, whose DF bit is 0, should
>be increased by 1.
> 
> +VxLAN GRO
> +-
> +
> +The table structure used by VxLAN GRO, which is in charge of processing
> +VxLAN packets with an outer IPv4 header and inner TCP/IPv4 packet, is
> +similar with that of TCP/IPv4 GRO. Differently, the header fields used
> +to define a VxLAN flow include:
> +
> +- outer source and destination: Ethernet and IP address, UDP port
> +
> +- VxLAN header (VNI and flag)
> +
> +- inner source and destination: Ethernet and IP address, TCP port
> +
> +Header fields deciding if packets are neighbors include:
> +
> +- outer IPv4 ID. The IPv4 ID fields of the packets, whose DF bit in the
> +  outer IPv4 header is 0, should be increased by 1.
> +
> +- inner TCP sequence number
> +
> +- inner IPv4 ID. The IPv4 ID fields of the packets, whose DF bit in the
> +  inner IPv4 header is 0, should be increased by 1.
> +
>  .. note::
>  We comply RFC 6864 to process the IPv4 ID field. Specifically,
> we check IPv4 ID fields for the packets whose DF bit is 0 and
> 
> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> index eb423cc..0110455 100644
> --- a/lib/librte_gro/Makefile
> +++ b/lib/librte_gro/Makefile
> @@ -45,6 +45,7 @@ LIBABIVER := 1
>  # source files
>  SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
>  SRCS-$(CONFIG_RTE_LIBRTE_GRO) += gro_tcp4.c
> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += gro_vxlan_tcp4.c
> 
>  # install this header file
>  SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> 
> diff --git a/lib/librte_gro/gro_vxlan_tcp4.c b/lib/librte_gro/gro_vxlan_tcp4.c
> new file mode 100644
> index 000..6567779
> --- /dev/null
> +++ b/lib/librte_gro/gro_vxlan_tcp4.c
> @@ -0,0 +1,515 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
>
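The three neighbor rules quoted above (outer IPv4 ID, inner TCP sequence number, inner IPv4 ID) collapse into one check. The sketch below is an illustrative model only — `vxlan_item` and `vxlan_is_neighbor` are hypothetical names, and the flow-level fields (MAC/IP addresses, UDP port, VNI) are assumed to have already matched during the flow lookup.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative item holding only the fields the neighbor test needs. */
struct vxlan_item {
	uint16_t outer_ip_id;
	uint8_t  outer_df;           /* DF bit of the outer IPv4 header */
	uint32_t inner_seq;          /* inner TCP sequence number */
	uint16_t inner_payload_len;
	uint16_t inner_ip_id;
	uint8_t  inner_df;           /* DF bit of the inner IPv4 header */
};

/* 1 if pkt directly follows tail so the two can be merged. */
static int vxlan_is_neighbor(const struct vxlan_item *tail,
			     const struct vxlan_item *pkt)
{
	/* outer IPv4 ID must advance by 1 when the outer DF bit is 0 */
	if (tail->outer_df == 0 &&
	    pkt->outer_ip_id != (uint16_t)(tail->outer_ip_id + 1))
		return 0;
	/* inner TCP payload must be contiguous in sequence space */
	if (pkt->inner_seq != tail->inner_seq + tail->inner_payload_len)
		return 0;
	/* inner IPv4 ID must advance by 1 when the inner DF bit is 0 */
	if (tail->inner_df == 0 &&
	    pkt->inner_ip_id != (uint16_t)(tail->inner_ip_id + 1))
		return 0;
	return 1;
}
```

This is the same shape as the plain TCP/IPv4 neighbor test, just applied twice: once to the outer IPv4 header and once to the inner TCP/IPv4 headers.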

Re: [dpdk-dev] [PATCH] examples/vhost: fix sending arp packet to self

2017-12-28 Thread Chen, Junjie J
Thanks Zhiyong

Updated in v2.

> -Original Message-
> From: Yang, Zhiyong
> Sent: Friday, December 29, 2017 2:27 PM
> To: Chen, Junjie J ; y...@fridaylinux.org;
> maxime.coque...@redhat.com
> Cc: dev@dpdk.org; Chen, Junjie J 
> Subject: RE: [dpdk-dev] [PATCH] examples/vhost: fix sending arp packet to self
> 
> Hi Junjie,
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Junjie Chen
> > Sent: Friday, December 29, 2017 6:14 PM
> > To: y...@fridaylinux.org; maxime.coque...@redhat.com
> > Cc: dev@dpdk.org; Chen, Junjie J 
> > Subject: [dpdk-dev] [PATCH] examples/vhost: fix sending arp packet to
> > self
> >
> > ARP packets are not dropped when dest vdev is itself, which breaks RX
> > ring inconspicuously.
> >
> If you are fixing a bug, it's better to add a Fixes line before the SOB tag
> that describes which commit caused the issue.
> For example,
> Fixes: 756ce64b1ecd ("eal: introduce PCI ioport API")
> 
> Thanks
> Zhiyong
> 
> > Signed-off-by: Junjie Chen 
> > ---
> ...
> 


Re: [dpdk-dev] [PATCH v6] vhost: support virtqueue interrupt/notification suppression

2018-01-08 Thread Chen, Junjie J
Hi

> -Original Message-
> From: Yuanhan Liu [mailto:y...@fridaylinux.org]
> Sent: Monday, January 8, 2018 10:07 PM
> To: Chen, Junjie J 
> Cc: Wang, Xiao W ; maxime.coque...@redhat.com;
> Bie, Tiwei ; dev@dpdk.org; Yao, Lei A
> 
> Subject: Re: [PATCH v6] vhost: support virtqueue interrupt/notification
> suppression
> 
> On Tue, Dec 26, 2017 at 12:43:10PM -0500, Junjie Chen wrote:
> > The driver can suppress interrupts when the VIRTIO_F_EVENT_IDX feature bit
> > is negotiated. The driver sets the vring flags to 0, and MAY use used_event
> > in the available ring to suppress device interrupts until the used index
> > reaches the index specified by used_event. The device ignores the lower bit
> > of the vring flags, and sends an interrupt when the index reaches used_event.
> >
> > The device can suppress notifications in a manner analogous to the way the
> > driver suppresses interrupts. The device manipulates flags or avail_event
> > in the used ring in the same way the driver manipulates flags or
> > used_event in the available ring.
> >
> > This patch is to enable this feature in vhost.
> >
> > Signed-off-by: Junjie Chen 
> 
> You need put "---" before the change log. Otherwise, it will be tracked in the
> commit log.

Will update this.
> 
> > +#define vhost_used_event(vr) \
> > +   (*(volatile uint16_t*)&(vr)->avail->ring[(vr)->size])
> > +
> > +static __rte_always_inline void
> > +vhost_notify(struct virtio_net *dev,  struct vhost_virtqueue *vq) {
> > +   /* Don't notify guest if we don't reach index specified by guest. */
> > +   if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) {
> > +   uint16_t old = vq->signalled_used;
> > +   uint16_t new = vq->last_used_idx;
> > +
> > +   LOG_DEBUG(VHOST_DATA, "%s: used_event_idx=%d, old=%d,
> new=%d\n",
> > +   __func__,
> > +   vhost_used_event(vq),
> > +   old, new);
> > +   if (vring_need_event(vhost_used_event(vq), new, old)
> 
> It's a bit weird that you use one from the standard linux header file
> (vring_need_event), while you define your own one (vhost_used_event).
> Note that the system header file also has "vring_used_event()" defined.
vring_used_event is defined and used by virtio in the kernel; the kernel defines a 
separate vhost_used_event in vhost.c for the vhost side, so I likewise use a 
separate macro on the vhost end.

I'd like to define both vhost_need_event and vhost_used_event in vhost.h, to 
avoid potential build issues on old Linux distributions and to keep things 
consistent. Is that OK with you?

> 
> Besides that, I have few more comments (and some requirements):
> 
> - It'd be much better if there is a Tested-by tag. Expeclitly,
>   I'm asking a test with Linux kernel virtio-net driver in guest.
Sure.
> 
> - I also hope you could have done a build test on some old distributions.
>   AFAIK, the two macros (vring_need_event and vring_used_event) come
>   from kernel 3.0 (or above). Any kernel older than that would fail
>   the build.
> 
> - I'd be great if you could make a new one based on top of my latest
>   tree: I have just applied a patchset that should conflict with this
>   one.

Sure, will do this.
> 
>   --yliu
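For reference, the suppression test discussed in this thread reduces to the wrap-safe 16-bit window check used by `vring_need_event()` in `linux/virtio_ring.h`. The sketch below reproduces that arithmetic under an illustrative name (to avoid clashing with the system header): it signals only when the used index has moved past `event_idx` since the last notification, i.e. when `event_idx` lies in `[old, new_idx)` modulo 2^16.

```c
#include <assert.h>
#include <stdint.h>

/* Notify only when event_idx is in the window [old, new_idx), computed
 * with wrap-safe 16-bit arithmetic; same formula as vring_need_event()
 * in linux/virtio_ring.h, renamed here to avoid a clash. */
static int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}
```

With EVENT_IDX negotiated, vhost would call this with the guest's used_event as `event_idx`, the current last_used_idx as `new_idx`, and the index at the previous signal (signalled_used) as `old` — matching the shape of the `vhost_notify()` snippet quoted above.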