[dpdk-dev] [PATCH 0/9] qede: update qede PMD to 1.1.0.1

2016-05-23 Thread Rasesh Mody
Hi Bruce,

> From: Rasesh Mody [mailto:rasesh.mody at qlogic.com]
> Sent: Friday, May 06, 2016 9:30 PM
> 
> This patch set adds support for enabling 100G mode for QEDE PMD.
> It also adds support for APIs like
>  - mtu_set
>  - reta_update
>  - reta_query
>  - rss_hash_update
>  - rss_hash_conf_get
>  - xstats_get
> The changes include enablement of vf-vf traffic and updated the driver
> version to 8.7.9.0_1.1.0.1
> 
> The patches have been generated and tested against dpdk-next-net
> rel_16_07 branch.
> 
> Please apply!
> 
> Harish Patil (2):
>   qede: add 100g mode support
>   qede: update version to 8.7.9.0_1.1.0.1
> 
> Rasesh Mody (1):
>   qede: add support for xstats
> 
> Sony Chacko (6):
>   qede: update hash config
>   qede: get hash configuration
>   qede: rss redirection table update
>   qede: rss redirection table query
>   qede: set mtu
>   qede: enable vf-vf traffic with unmatched dest addr
> 
>  config/common_base   |2 +-
>  doc/guides/nics/overview.rst |6 +-
>  doc/guides/nics/qede.rst |   11 +-
>  drivers/net/qede/base/ecore_l2.c |   19 +-
>  drivers/net/qede/qede_eth_if.c   |   45 +++--
>  drivers/net/qede/qede_eth_if.h   |1 +
>  drivers/net/qede/qede_ethdev.c   |  390
> +-
>  drivers/net/qede/qede_ethdev.h   |   10 +-
>  drivers/net/qede/qede_rxtx.c |   42 ++--
>  9 files changed, 466 insertions(+), 60 deletions(-)
> 
> --
> 1.7.10.3

Did you get a chance to review these patches for QEDE PMD?

Thanks!
Rasesh


[dpdk-dev] [PATCH v4] i40e: configure MTU

2016-05-23 Thread Wu, Jingjing


> -Original Message-
> From: Xing, Beilei
> Sent: Friday, May 20, 2016 11:17 PM
> To: Wu, Jingjing
> Cc: dev at dpdk.org; Xing, Beilei
> Subject: [PATCH v4] i40e: configure MTU
> 
> This patch enables configuring MTU for i40e.
> Since changing MTU needs to reconfigure queue, stop port first before
> configuring MTU.
> 
> Signed-off-by: Beilei Xing 

Acked-by: Jingjing Wu 



[dpdk-dev] Suggestions for the dpdk stable tree

2016-05-23 Thread Yuanhan Liu
On Fri, May 20, 2016 at 02:49:31PM +, Mcnamara, John wrote:
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Christian Ehrhardt
> > Sent: Friday, May 20, 2016 9:07 AM
> > To: dev ; Stephen Hemminger 
> > Subject: Re: [dpdk-dev] Suggestions for the dpdk stable tree
> > 
> > Hi,
> > I guess over time/releases less people mind the 2.2-stable.
> > But I still see a lot of people referring to 2.2 - so why not giving this
> > thread a ping again.
> > 
> > ack / nack / opinions ?
> 
> Hi Christian,
> 
> We are interested in having a LTS/Stable tree.

I didn't notice this thread, otherwise, I would have commented earlier:
TBH, I have also thought of LTS tree few months before. But I was thinking,
hmm, it's just a library, what's the big deal of maintaining a stable
tree for it. I then hide it deep inside of my mind, silently.

> We have been looking at identifying a maintainer and validation engineer 
> internally to support the effort but haven't be able to finalize that. Once 
> we do we will come back to the mailing list with a proposal and a request for 
> comments.

I would nominate myself as the LTS tree maintainer, if it makes sense
to have one.

> We would probably be looking at 16.04 or even 16.07 as the basis for the LTS 
> at this stage.

Just one opinion from the view of vhost: since 16.07 is a vhost ABI/API
refactoring release, I'd suggest to base on 16.07, and then we could
have less conflicts to apply later bug fix patches.

However, I'm very open to choose any others as the base, say, even v2.2.

--yliu

> It would be great if we could get support from you or others as well.
> 
> John.
> -- 
> 


[dpdk-dev] [PATCH] examples/ethtool: include case for 64-bit registers

2016-05-23 Thread Zyta Szpak
Hi,
sorry on my late reply I was on sick leave. Sure I can do that. This fix 
was the fastest possible without interfering with DPDK API. I will add 
the callback then.

Regards,
Zyta Szpak

On 20.05.2016 10:25, Remy Horton wrote:
> Morning,
>
> On 11/05/2016 11:48, zr at semihalf.com wrote:
>> From: Zyta Szpak 
>>
>> rte_eth_dev_get_reg_length and rte_eth_dev_get_reg callbacks
>> do not provide register size to the app in any way. Example assuming
>> they are 32-bit wide always allocates not enough memory if the
>> registers are 64-bit wide. It results in memory corruption.
>> This commit is a quick fix to make enough room for 64-bit
>> register values when this returned value is given to malloc.
> [..]
>
> This is a loose end that needs to be fixed but my feeling is that it 
> ought to be done via querying the driver rather than overstating 
> register bank size. My suggestion would be to add something like 
> get_reg_wordsize to struct eth_dev_ops and then to use sizeof(uint32) 
> as fallback for drivers that don't implement the callback.
>
> Regards,
>
> ..R?my



[dpdk-dev] [PATCH] mk: Fix cruft getting installed by "make install" with tar >= 1.29

2016-05-23 Thread Panu Matilainen
--exclude became a positional option in tar 1.29, breaking the
test app filtering in "make install", causing .map files and all test
apps to get installed in bindir. Adjust the tar arguments accordingly,
this is compatible with older versions too since they do not care about
the order.

Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1337864

Signed-off-by: Panu Matilainen 
---
 mk/rte.sdkinstall.mk | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mk/rte.sdkinstall.mk b/mk/rte.sdkinstall.mk
index 68e56b6..abdab0f 100644
--- a/mk/rte.sdkinstall.mk
+++ b/mk/rte.sdkinstall.mk
@@ -116,9 +116,9 @@ install-runtime:
$(Q)$(call rte_mkdir, $(DESTDIR)$(libdir))
$(Q)cp -a$O/lib/* $(DESTDIR)$(libdir)
$(Q)$(call rte_mkdir, $(DESTDIR)$(bindir))
-   $(Q)tar -cf -  -C $O app  --exclude 'app/*.map' \
+   $(Q)tar -cf -  -C $O --exclude 'app/*.map' \
--exclude 'app/cmdline*' --exclude app/test \
-   --exclude app/testacl --exclude app/testpipeline | \
+   --exclude app/testacl --exclude app/testpipeline app | \
tar -xf -  -C $(DESTDIR)$(bindir) --strip-components=1 \
--keep-newer-files --warning=no-ignore-newer
$(Q)$(call rte_mkdir,  $(DESTDIR)$(datadir))
-- 
2.5.5



[dpdk-dev] [PATCH v3 00/35] mempool: rework memory allocation

2016-05-23 Thread Olivier Matz
Hi Panu, Thomas,

On 05/20/2016 11:09 AM, Thomas Monjalon wrote:
> 2016-05-20 11:42, Panu Matilainen:
>> Just noticed this series "breaks" --no-huge as a regular user, commit 
>> 593a084afc2b to be exact:
>>
>> mmap(NULL, 4194304, PROT_READ|PROT_WRITE, 
>> MAP_PRIVATE|MAP_ANONYMOUS|MAP_LOCKED, 0, 0) = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> write(1, "EAL: rte_eal_hugepage_init: mmap"..., 76EAL: 
>> rte_eal_hugepage_init: mmap() failed: Resource temporarily unavailable
>>
>> "Breaks" in quotes because I guess it always was broken (as the 
>> non-locked pages might not be in physical memory) and because its
>> possible to adjust resourse limits to allow the operation to succeed.
>> If you're root, that is.
>>
>> I was just looking into making the test-suite runnable by a regular user 
>> with no special privileges,
> 
> I have the same dream, to make sure every developer can run the unit tests
> easily and quickly.

Thanks Panu for the feedback on this, I didn't notice this regression
for a regular user.

The goal of this commit was to do a step forward in the direction
of a working --no-huge: locking the pages in physical memory is
mandatory for most physical drivers. But as described at the end
of http://dpdk.org/ml/archives/dev/2016-May/039229.html , the
--no-huge option is still not working because the physical addresses
are not correct.

So I think it wouldn't be a problem to revert this commit if it breaks
something.

>> primarily to make it possible to run the 
>> testsuite as part of rpm package builds (in %check), and no special 
>> setup or extra privileges can be assumed there. Such tests are of course 
>> of limited coverage but still better than nothing, and --no-huge was my 
>> ticket there. Talk about bad timing :)
>>
>> It'd be fine to have limited subset of tests to run when non-privileged 
>> but since this one lives inside rte_eal_init() it practically prevents 
>> everything, unless I'm missing some other magic switch or such. Thoughts?
> 
> This change was done for mbuf allocation because they are passed to the
> hardware. We should not have any hardware constraint in the unit tests.
> So I'd say it is a requirement for the memory rework. We must be capable
> to allocate some locked pages if required, and some standard pages for
> other usages.
> Please jump in this thread:
>   http://dpdk.org/ml/archives/dev/2016-April/037444.html

Yes, I agree, this is something that could be managed in the
memory rework task.


Olivier


[dpdk-dev] [PATCH] mbuf: remove unused Rx error flags

2016-05-23 Thread Olivier Matz
Following the discussions from:
http://dpdk.org/ml/archives/dev/2015-July/021721.html
http://dpdk.org/ml/archives/dev/2016-April/038143.html

The value of these flags is 0, making them useless. Today, no example
application checks them on Rx, and only few drivers sets them and
silently give wrong packets to the application, which should not happen.

This patch removes the unused flags from rte_mbuf and their use in the
drivers. The i40e and fm10k are kept as they are today and should be
fixed to drop bad packets. The enic driver is managed by its maintainer
in another patch.

Fixes: c22265f6 ("mbuf: add new packet flags for i40e")
Signed-off-by: Olivier Matz 
---

RFC -> v1:
- remove the enic part of the patch, it is managed separately
  in this patch:
  http://dpdk.org/ml/archives/dev/2016-May/039452.html
  Note that this enic series should be applied before this patch
  to avoid breaking the compilation.
- fix title according to scripts/check-git-log.sh


 drivers/net/fm10k/fm10k_rxtx.c | 16 
 drivers/net/fm10k/fm10k_rxtx_vec.c |  2 +-
 drivers/net/i40e/i40e_rxtx.c   | 15 ---
 lib/librte_mbuf/rte_mbuf.c |  4 
 lib/librte_mbuf/rte_mbuf.h |  5 +
 5 files changed, 2 insertions(+), 40 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index 81ed4e7..dd92a91 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -96,22 +96,6 @@ rx_desc_to_ol_flags(struct rte_mbuf *m, const union 
fm10k_rx_desc *d)

if (d->w.pkt_info & FM10K_RXD_RSSTYPE_MASK)
m->ol_flags |= PKT_RX_RSS_HASH;
-
-   if (unlikely((d->d.staterr &
-   (FM10K_RXD_STATUS_IPCS | FM10K_RXD_STATUS_IPE)) ==
-   (FM10K_RXD_STATUS_IPCS | FM10K_RXD_STATUS_IPE)))
-   m->ol_flags |= PKT_RX_IP_CKSUM_BAD;
-
-   if (unlikely((d->d.staterr &
-   (FM10K_RXD_STATUS_L4CS | FM10K_RXD_STATUS_L4E)) ==
-   (FM10K_RXD_STATUS_L4CS | FM10K_RXD_STATUS_L4E)))
-   m->ol_flags |= PKT_RX_L4_CKSUM_BAD;
-
-   if (unlikely(d->d.staterr & FM10K_RXD_STATUS_HBO))
-   m->ol_flags |= PKT_RX_HBUF_OVERFLOW;
-
-   if (unlikely(d->d.staterr & FM10K_RXD_STATUS_RXE))
-   m->ol_flags |= PKT_RX_RECIP_ERR;
 }

 uint16_t
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 03e4a5c..d417c96 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -101,7 +101,7 @@ fm10k_desc_to_olflags_v(__m128i descs[4], struct rte_mbuf 
**rx_pkts)
const __m128i rxe_flag = _mm_set_epi8(0, 0, 0, 0,
0, 0, 0, 0,
0, 0, 0, 0,
-   0, 0, PKT_RX_RECIP_ERR, 0);
+   0, 0, 0, 0);

/* map rss type to rss hash flag */
const __m128i rss_flags = _mm_set_epi8(0, 0, 0, 0,
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index c833aa3..c010770 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -140,27 +140,12 @@ i40e_rxd_error_to_pkt_flags(uint64_t qword)
 #define I40E_RX_ERR_BITS 0x3f
if (likely((error_bits & I40E_RX_ERR_BITS) == 0))
return flags;
-   /* If RXE bit set, all other status bits are meaningless */
-   if (unlikely(error_bits & (1 << I40E_RX_DESC_ERROR_RXE_SHIFT))) {
-   flags |= PKT_RX_MAC_ERR;
-   return flags;
-   }
-
-   /* If RECIPE bit set, all other status indications should be ignored */
-   if (unlikely(error_bits & (1 << I40E_RX_DESC_ERROR_RECIPE_SHIFT))) {
-   flags |= PKT_RX_RECIP_ERR;
-   return flags;
-   }
-   if (unlikely(error_bits & (1 << I40E_RX_DESC_ERROR_HBO_SHIFT)))
-   flags |= PKT_RX_HBUF_OVERFLOW;
if (unlikely(error_bits & (1 << I40E_RX_DESC_ERROR_IPE_SHIFT)))
flags |= PKT_RX_IP_CKSUM_BAD;
if (unlikely(error_bits & (1 << I40E_RX_DESC_ERROR_L4E_SHIFT)))
flags |= PKT_RX_L4_CKSUM_BAD;
if (unlikely(error_bits & (1 << I40E_RX_DESC_ERROR_EIPE_SHIFT)))
flags |= PKT_RX_EIP_CKSUM_BAD;
-   if (unlikely(error_bits & (1 << I40E_RX_DESC_ERROR_OVERSIZE_SHIFT)))
-   flags |= PKT_RX_OVERSIZE;

return flags;
 }
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index eec1456..878db89 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -254,10 +254,6 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
case PKT_RX_L4_CKSUM_BAD: return "PKT_RX_L4_CKSUM_BAD";
case PKT_RX_IP_CKSUM_BAD: return "PKT_RX_IP_CKSUM_BAD";
case PKT_RX_EIP_CKSUM_BAD: return "PKT_RX_EIP_CKSUM_BAD";
-   /* case PKT_RX_OVERSIZE: return "PKT_RX_OVERSIZE"; */
-   /* case PKT_RX_HBUF_OVERFLOW: return "PKT_RX_HBUF_OVERFLOW"; */
-   /* case PKT_RX_RECIP_ERR: return "PKT_RX

[dpdk-dev] [RFC] mbuf: new flag when vlan is stripped

2016-05-23 Thread Olivier Matz
Hi John,

On 05/12/2016 10:36 PM, John Daley (johndale) wrote:
>> ... This is a draft patch that implements what was previously
>> discussed, except the packet_type, which does not exist for vlan
>> today (and I think it is not mandatory for now, let's do it in
>> another patch).
>> 
>> After doing this patch, it appeared that ixgbe was the only driver
>> that had a different behavior for the PKT_RX_VLAN_PKT flag. An
>> alternative to this patch would be to only change the behavior of
>> the ixgbe driver, and just document better document the
>> PKT_RX_VLAN_PKT flags in rte_mbuf.h without adding new flags. I
>> think this is a better option.
>> 
>> Comments are welcome.
>> 
> There are applications depending on the current behavior of
> PKT_RX_VLAN_PKT as confusing as it may be.  I know of one that has
> VLAN stripping disabled and uses the flag to determine if the packet
> delivered to the app has a VLAN tag. This is actually how the flag
> behaves for ixgbe, and they patched enic and maybe other drivers to
> act accordingly. To avoid breaking the app (and any others like it),
> I think we should keep the flag behavior the same and add the new
> flag PKT_RX_VLAN_STRIPPED .

OK, thanks for your comment.
So it means the v1 will be the same than RFC. I'm submitting it.

> We should follow on with the new packet type since it enables a nice
> performance improvement by not forcing apps to break open the packet
> just to see if there is a VLAN tag.

Yep, agree.

Thanks,
Olivier


[dpdk-dev] limited tx performance with VMware VM + PCI pass-through

2016-05-23 Thread Grisha Freilikhman
Setup
ThinkServer RD650, Intel Xeon E5-2680 v3
2*x710 ports pass-through (i40e driver)
Dpdk 2.1
Dpdk app doing forwarding from one port to another

Results
When more than 7.5(50% of wire speed) MPPS sent with 64 bytes, Tx 
fails(rte_eth_tx_burst return with less than given packets to send).
Rx on 1 port -> output 7.5 MPPS
Rx on 2 ports -> each outputs 3.75 MPPS (total 7.5)

This behavior happening only in virtual environment, bare Linux there 
is wire speed performance.

Question
Is there any known issue? Or someone found encountered and fixed this?


Thanks,
Grisha Freilikhman
SW Engineer
Allot Communications
Tel +972 9 7628476
Cell +972 54 3063662
gfreilikhman at allot.com 
www.allot.com





[dpdk-dev] [PATCH] mbuf: new flag when Vlan is stripped

2016-05-23 Thread Olivier Matz
The behavior of PKT_RX_VLAN_PKT was not very well defined, resulting in
PMDs not advertising the same flags in similar conditions.

Following discussion in [1], introduce 2 new flags PKT_RX_VLAN_STRIPPED
and PKT_RX_QINQ_STRIPPED that are better defined:

  PKT_RX_VLAN_STRIPPED: a vlan has been stripped by the hardware and its
  tci is saved in mbuf->vlan_tci. This can only happen if vlan stripping
  is enabled in the RX configuration of the PMD.

For now, the old flag PKT_RX_VLAN_PKT is kept but marked as deprecated.
It should be removed from applications and PMDs in a future revision.

This patch also updates the drivers. For PKT_RX_VLAN_PKT:

- e1000, enic, i40e, mlx5, nfp, vmxnet3: done, PKT_RX_VLAN_PKT already
  had the same meaning than PKT_RX_VLAN_STRIPPED, minor update is
  required.
- fm10k: done, PKT_RX_VLAN_PKT already had the same meaning than
  PKT_RX_VLAN_STRIPPED, and vlan stripping is always enabled on fm10k.
- ixgbe: modification done for standard mode (vector does not support
  vlan stripping)
- the other drivers do not support vlan stripping.

For PKT_RX_QINQ_PKT, it was only supported on i40e, and the meaning was
already correct, so we can reuse the same value for PKT_RX_QINQ_STRIPPED.

[1] http://dpdk.org/ml/archives/dev/2016-April/037837.html,

Signed-off-by: Olivier Matz 
---

RFC -> v1:
- fix checkpatch and check-git-log.sh issues
- add a deprecation notice for the old vlan flags
- rebase on head


 app/test-pmd/rxonly.c|  4 +--
 doc/guides/rel_notes/deprecation.rst |  5 
 drivers/net/e1000/em_rxtx.c  |  3 ++-
 drivers/net/e1000/igb_rxtx.c |  3 ++-
 drivers/net/enic/enic_rx.c   |  2 +-
 drivers/net/i40e/i40e_rxtx.c |  2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c |  7 +
 drivers/net/ixgbe/ixgbe_rxtx.c   | 21 +++
 drivers/net/ixgbe/ixgbe_rxtx.h   |  1 +
 drivers/net/mlx5/mlx5_rxtx.c |  6 +++--
 drivers/net/nfp/nfp_net.c|  2 +-
 drivers/net/vmxnet3/vmxnet3_rxtx.c   |  2 +-
 lib/librte_mbuf/rte_mbuf.c   |  2 ++
 lib/librte_mbuf/rte_mbuf.h   | 50 
 14 files changed, 90 insertions(+), 20 deletions(-)

diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index 14555ab..c69b344 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -156,9 +156,9 @@ pkt_burst_receive(struct fwd_stream *fs)
printf("hash=0x%x ID=0x%x ",
   mb->hash.fdir.hash, mb->hash.fdir.id);
}
-   if (ol_flags & PKT_RX_VLAN_PKT)
+   if (ol_flags & PKT_RX_VLAN_STRIPPED)
printf(" - VLAN tci=0x%x", mb->vlan_tci);
-   if (ol_flags & PKT_RX_QINQ_PKT)
+   if (ol_flags & PKT_RX_QINQ_STRIPPED)
printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x",
mb->vlan_tci, mb->vlan_tci_outer);
if (mb->packet_type) {
diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index ad05eba..2233a90 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -57,3 +57,8 @@ Deprecation Notices
   a handle, like the way kernel exposes an fd to user for locating a
   specific file, and to keep all major structures internally, so that
   we are likely to be free from ABI violations in future.
+
+* The mbuf flags PKT_RX_VLAN_PKT and PKT_RX_QINQ_PKT are deprecated and
+  are respectively replaced by PKT_RX_VLAN_STRIPPED and
+  PKT_RX_QINQ_STRIPPED, that are better described. The old flags and
+  their behavior will be kept in 16.07 and will be removed in 16.11.
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 3d36f21..6d8750a 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -629,7 +629,8 @@ rx_desc_status_to_pkt_flags(uint32_t rx_status)
uint64_t pkt_flags;

/* Check if VLAN present */
-   pkt_flags = ((rx_status & E1000_RXD_STAT_VP) ?  PKT_RX_VLAN_PKT : 0);
+   pkt_flags = ((rx_status & E1000_RXD_STAT_VP) ?
+   PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED : 0);

return pkt_flags;
 }
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 18aeead..9d80a0b 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -729,7 +729,8 @@ rx_desc_status_to_pkt_flags(uint32_t rx_status)
uint64_t pkt_flags;

/* Check if VLAN present */
-   pkt_flags = (rx_status & E1000_RXD_STAT_VP) ?  PKT_RX_VLAN_PKT : 0;
+   pkt_flags = ((rx_status & E1000_RXD_STAT_VP) ?
+   PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED : 0);

 #if defined(RTE_LIBRTE_IEEE1588)
if (rx_status & E1000_RXD_STAT_TMST)
diff --git a/drivers/net/enic/enic_rx.c b/drivers/net/enic/enic_rx.c
index f92f6bc..6459e97 100644
--- a/drivers/net/enic/enic_rx.c
+++ b/drivers/net/enic/en

[dpdk-dev] [PATCH] mbuf: new flag when Vlan is stripped

2016-05-23 Thread Ananyev, Konstantin
Hi Olivier,

> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Monday, May 23, 2016 9:47 AM
> To: dev at dpdk.org
> Cc: johndale at cisco.com; Ananyev, Konstantin; Zhang, Helin; 
> adrien.mazarguil at 6wind.com; rahul.lakkireddy at chelsio.com;
> alejandro.lucero at netronome.com; sony.chacko at qlogic.com
> Subject: [PATCH] mbuf: new flag when Vlan is stripped
> 
> The behavior of PKT_RX_VLAN_PKT was not very well defined, resulting in
> PMDs not advertising the same flags in similar conditions.
> 
> Following discussion in [1], introduce 2 new flags PKT_RX_VLAN_STRIPPED
> and PKT_RX_QINQ_STRIPPED that are better defined:
> 
>   PKT_RX_VLAN_STRIPPED: a vlan has been stripped by the hardware and its
>   tci is saved in mbuf->vlan_tci. This can only happen if vlan stripping
>   is enabled in the RX configuration of the PMD.
> 
> For now, the old flag PKT_RX_VLAN_PKT is kept but marked as deprecated.
> It should be removed from applications and PMDs in a future revision.
> 
> This patch also updates the drivers. For PKT_RX_VLAN_PKT:
> 
> - e1000, enic, i40e, mlx5, nfp, vmxnet3: done, PKT_RX_VLAN_PKT already
>   had the same meaning than PKT_RX_VLAN_STRIPPED, minor update is
>   required.
> - fm10k: done, PKT_RX_VLAN_PKT already had the same meaning than
>   PKT_RX_VLAN_STRIPPED, and vlan stripping is always enabled on fm10k.
> - ixgbe: modification done for standard mode (vector does not support
>   vlan stripping)
> - the other drivers do not support vlan stripping.
> 
> For PKT_RX_QINQ_PKT, it was only supported on i40e, and the meaning was
> already correct, so we can reuse the same value for PKT_RX_QINQ_STRIPPED.
> 
> [1] http://dpdk.org/ml/archives/dev/2016-April/037837.html,
> 
> Signed-off-by: Olivier Matz 
> ---
> 
> RFC -> v1:
> - fix checkpatch and check-git-log.sh issues
> - add a deprecation notice for the old vlan flags
> - rebase on head
> 
> 
>  app/test-pmd/rxonly.c|  4 +--
>  doc/guides/rel_notes/deprecation.rst |  5 
>  drivers/net/e1000/em_rxtx.c  |  3 ++-
>  drivers/net/e1000/igb_rxtx.c |  3 ++-
>  drivers/net/enic/enic_rx.c   |  2 +-
>  drivers/net/i40e/i40e_rxtx.c |  2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c |  7 +
>  drivers/net/ixgbe/ixgbe_rxtx.c   | 21 +++
>  drivers/net/ixgbe/ixgbe_rxtx.h   |  1 +
>  drivers/net/mlx5/mlx5_rxtx.c |  6 +++--
>  drivers/net/nfp/nfp_net.c|  2 +-
>  drivers/net/vmxnet3/vmxnet3_rxtx.c   |  2 +-
>  lib/librte_mbuf/rte_mbuf.c   |  2 ++
>  lib/librte_mbuf/rte_mbuf.h   | 50 
> 
>  14 files changed, 90 insertions(+), 20 deletions(-)


I don't see ixgbe/i4oe_rxtx_vec.c updated.
Would it be another patch for them?
Thanks
Konstantin

> 
> diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
> index 14555ab..c69b344 100644
> --- a/app/test-pmd/rxonly.c
> +++ b/app/test-pmd/rxonly.c
> @@ -156,9 +156,9 @@ pkt_burst_receive(struct fwd_stream *fs)
>   printf("hash=0x%x ID=0x%x ",
>  mb->hash.fdir.hash, mb->hash.fdir.id);
>   }
> - if (ol_flags & PKT_RX_VLAN_PKT)
> + if (ol_flags & PKT_RX_VLAN_STRIPPED)
>   printf(" - VLAN tci=0x%x", mb->vlan_tci);
> - if (ol_flags & PKT_RX_QINQ_PKT)
> + if (ol_flags & PKT_RX_QINQ_STRIPPED)
>   printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x",
>   mb->vlan_tci, mb->vlan_tci_outer);
>   if (mb->packet_type) {
> diff --git a/doc/guides/rel_notes/deprecation.rst 
> b/doc/guides/rel_notes/deprecation.rst
> index ad05eba..2233a90 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -57,3 +57,8 @@ Deprecation Notices
>a handle, like the way kernel exposes an fd to user for locating a
>specific file, and to keep all major structures internally, so that
>we are likely to be free from ABI violations in future.
> +
> +* The mbuf flags PKT_RX_VLAN_PKT and PKT_RX_QINQ_PKT are deprecated and
> +  are respectively replaced by PKT_RX_VLAN_STRIPPED and
> +  PKT_RX_QINQ_STRIPPED, that are better described. The old flags and
> +  their behavior will be kept in 16.07 and will be removed in 16.11.
> diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
> index 3d36f21..6d8750a 100644
> --- a/drivers/net/e1000/em_rxtx.c
> +++ b/drivers/net/e1000/em_rxtx.c
> @@ -629,7 +629,8 @@ rx_desc_status_to_pkt_flags(uint32_t rx_status)
>   uint64_t pkt_flags;
> 
>   /* Check if VLAN present */
> - pkt_flags = ((rx_status & E1000_RXD_STAT_VP) ?  PKT_RX_VLAN_PKT : 0);
> + pkt_flags = ((rx_status & E1000_RXD_STAT_VP) ?
> + PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED : 0);
> 
>   return pkt_flags;
>  }
> diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1

[dpdk-dev] [PATCH] mbuf: new flag when Vlan is stripped

2016-05-23 Thread Olivier Matz
Hi Konstantin,

On 05/23/2016 10:59 AM, Ananyev, Konstantin wrote:
> Hi Olivier,
> 
>> -Original Message-
>> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
>> Sent: Monday, May 23, 2016 9:47 AM
>> To: dev at dpdk.org
>> Cc: johndale at cisco.com; Ananyev, Konstantin; Zhang, Helin; 
>> adrien.mazarguil at 6wind.com; rahul.lakkireddy at chelsio.com;
>> alejandro.lucero at netronome.com; sony.chacko at qlogic.com
>> Subject: [PATCH] mbuf: new flag when Vlan is stripped
>>
>> The behavior of PKT_RX_VLAN_PKT was not very well defined, resulting in
>> PMDs not advertising the same flags in similar conditions.
>>
>> Following discussion in [1], introduce 2 new flags PKT_RX_VLAN_STRIPPED
>> and PKT_RX_QINQ_STRIPPED that are better defined:
>>
>>   PKT_RX_VLAN_STRIPPED: a vlan has been stripped by the hardware and its
>>   tci is saved in mbuf->vlan_tci. This can only happen if vlan stripping
>>   is enabled in the RX configuration of the PMD.
>>
>> For now, the old flag PKT_RX_VLAN_PKT is kept but marked as deprecated.
>> It should be removed from applications and PMDs in a future revision.
>>
>> This patch also updates the drivers. For PKT_RX_VLAN_PKT:
>>
>> - e1000, enic, i40e, mlx5, nfp, vmxnet3: done, PKT_RX_VLAN_PKT already
>>   had the same meaning than PKT_RX_VLAN_STRIPPED, minor update is
>>   required.
>> - fm10k: done, PKT_RX_VLAN_PKT already had the same meaning than
>>   PKT_RX_VLAN_STRIPPED, and vlan stripping is always enabled on fm10k.
>> - ixgbe: modification done for standard mode (vector does not support
>>   vlan stripping)
>> - the other drivers do not support vlan stripping.
>>
>> For PKT_RX_QINQ_PKT, it was only supported on i40e, and the meaning was
>> already correct, so we can reuse the same value for PKT_RX_QINQ_STRIPPED.
>>
> 
> I don't see ixgbe/i4oe_rxtx_vec.c updated.
> Would it be another patch for them?

The ixgbe vector and i40e vector do not support vlan stripping, so
from what I see there is nothing to do:
- The new flag PKT_RX_VLAN_STRIPPED is never returned
- We keep the old behavior for PKT_RX_VLAN_PKT.

Thanks,
Olivier


[dpdk-dev] [PATCH] mbuf: new flag when Vlan is stripped

2016-05-23 Thread Ananyev, Konstantin


> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Monday, May 23, 2016 9:47 AM
> To: dev at dpdk.org
> Cc: johndale at cisco.com; Ananyev, Konstantin; Zhang, Helin; 
> adrien.mazarguil at 6wind.com; rahul.lakkireddy at chelsio.com;
> alejandro.lucero at netronome.com; sony.chacko at qlogic.com
> Subject: [PATCH] mbuf: new flag when Vlan is stripped
> 
> The behavior of PKT_RX_VLAN_PKT was not very well defined, resulting in
> PMDs not advertising the same flags in similar conditions.
> 
> Following discussion in [1], introduce 2 new flags PKT_RX_VLAN_STRIPPED
> and PKT_RX_QINQ_STRIPPED that are better defined:
> 
>   PKT_RX_VLAN_STRIPPED: a vlan has been stripped by the hardware and its
>   tci is saved in mbuf->vlan_tci. This can only happen if vlan stripping
>   is enabled in the RX configuration of the PMD.
> 
> For now, the old flag PKT_RX_VLAN_PKT is kept but marked as deprecated.
> It should be removed from applications and PMDs in a future revision.
> 
> This patch also updates the drivers. For PKT_RX_VLAN_PKT:
> 
> - e1000, enic, i40e, mlx5, nfp, vmxnet3: done, PKT_RX_VLAN_PKT already
>   had the same meaning than PKT_RX_VLAN_STRIPPED, minor update is
>   required.
> - fm10k: done, PKT_RX_VLAN_PKT already had the same meaning than
>   PKT_RX_VLAN_STRIPPED, and vlan stripping is always enabled on fm10k.
> - ixgbe: modification done for standard mode (vector does not support
>   vlan stripping)
> - the other drivers do not support vlan stripping.
> 
> For PKT_RX_QINQ_PKT, it was only supported on i40e, and the meaning was
> already correct, so we can reuse the same value for PKT_RX_QINQ_STRIPPED.
> 
> [1] http://dpdk.org/ml/archives/dev/2016-April/037837.html,
> 
> Signed-off-by: Olivier Matz 
> ---
> 
> RFC -> v1:
> - fix checkpatch and check-git-log.sh issues
> - add a deprecation notice for the old vlan flags
> - rebase on head
> 
> 
>  app/test-pmd/rxonly.c|  4 +--
>  doc/guides/rel_notes/deprecation.rst |  5 
>  drivers/net/e1000/em_rxtx.c  |  3 ++-
>  drivers/net/e1000/igb_rxtx.c |  3 ++-
>  drivers/net/enic/enic_rx.c   |  2 +-
>  drivers/net/i40e/i40e_rxtx.c |  2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c |  7 +
>  drivers/net/ixgbe/ixgbe_rxtx.c   | 21 +++
>  drivers/net/ixgbe/ixgbe_rxtx.h   |  1 +
>  drivers/net/mlx5/mlx5_rxtx.c |  6 +++--
>  drivers/net/nfp/nfp_net.c|  2 +-
>  drivers/net/vmxnet3/vmxnet3_rxtx.c   |  2 +-
>  lib/librte_mbuf/rte_mbuf.c   |  2 ++
>  lib/librte_mbuf/rte_mbuf.h   | 50 
> 
>  14 files changed, 90 insertions(+), 20 deletions(-)
> 
> diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
> index 14555ab..c69b344 100644
> --- a/app/test-pmd/rxonly.c
> +++ b/app/test-pmd/rxonly.c
> @@ -156,9 +156,9 @@ pkt_burst_receive(struct fwd_stream *fs)
>   printf("hash=0x%x ID=0x%x ",
>  mb->hash.fdir.hash, mb->hash.fdir.id);
>   }
> - if (ol_flags & PKT_RX_VLAN_PKT)
> + if (ol_flags & PKT_RX_VLAN_STRIPPED)
>   printf(" - VLAN tci=0x%x", mb->vlan_tci);
> - if (ol_flags & PKT_RX_QINQ_PKT)
> + if (ol_flags & PKT_RX_QINQ_STRIPPED)
>   printf(" - QinQ VLAN tci=0x%x, VLAN tci outer=0x%x",
>   mb->vlan_tci, mb->vlan_tci_outer);
>   if (mb->packet_type) {
> diff --git a/doc/guides/rel_notes/deprecation.rst 
> b/doc/guides/rel_notes/deprecation.rst
> index ad05eba..2233a90 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -57,3 +57,8 @@ Deprecation Notices
>a handle, like the way kernel exposes an fd to user for locating a
>specific file, and to keep all major structures internally, so that
>we are likely to be free from ABI violations in future.
> +
> +* The mbuf flags PKT_RX_VLAN_PKT and PKT_RX_QINQ_PKT are deprecated and
> +  are respectively replaced by PKT_RX_VLAN_STRIPPED and
> +  PKT_RX_QINQ_STRIPPED, that are better described. The old flags and
> +  their behavior will be kept in 16.07 and will be removed in 16.11.
> diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
> index 3d36f21..6d8750a 100644
> --- a/drivers/net/e1000/em_rxtx.c
> +++ b/drivers/net/e1000/em_rxtx.c
> @@ -629,7 +629,8 @@ rx_desc_status_to_pkt_flags(uint32_t rx_status)
>   uint64_t pkt_flags;
> 
>   /* Check if VLAN present */
> - pkt_flags = ((rx_status & E1000_RXD_STAT_VP) ?  PKT_RX_VLAN_PKT : 0);
> + pkt_flags = ((rx_status & E1000_RXD_STAT_VP) ?
> + PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED : 0);
> 
>   return pkt_flags;
>  }
> diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
> index 18aeead..9d80a0b 100644
> --- a/drivers/net/e1000/igb_rxtx.c
> +++ b/drivers/net/e1000/i

[dpdk-dev] [PATCH] mbuf: new flag when Vlan is stripped

2016-05-23 Thread Ananyev, Konstantin


> -Original Message-
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Monday, May 23, 2016 10:13 AM
> To: Ananyev, Konstantin; dev at dpdk.org
> Cc: johndale at cisco.com; Zhang, Helin; adrien.mazarguil at 6wind.com; 
> rahul.lakkireddy at chelsio.com;
> alejandro.lucero at netronome.com; sony.chacko at qlogic.com
> Subject: Re: [PATCH] mbuf: new flag when Vlan is stripped
> 
> Hi Konstantin,
> 
> On 05/23/2016 10:59 AM, Ananyev, Konstantin wrote:
> > Hi Olivier,
> >
> >> -Original Message-
> >> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> >> Sent: Monday, May 23, 2016 9:47 AM
> >> To: dev at dpdk.org
> >> Cc: johndale at cisco.com; Ananyev, Konstantin; Zhang, Helin; 
> >> adrien.mazarguil at 6wind.com; rahul.lakkireddy at chelsio.com;
> >> alejandro.lucero at netronome.com; sony.chacko at qlogic.com
> >> Subject: [PATCH] mbuf: new flag when Vlan is stripped
> >>
> >> The behavior of PKT_RX_VLAN_PKT was not very well defined, resulting in
> >> PMDs not advertising the same flags in similar conditions.
> >>
> >> Following discussion in [1], introduce 2 new flags PKT_RX_VLAN_STRIPPED
> >> and PKT_RX_QINQ_STRIPPED that are better defined:
> >>
> >>   PKT_RX_VLAN_STRIPPED: a vlan has been stripped by the hardware and its
> >>   tci is saved in mbuf->vlan_tci. This can only happen if vlan stripping
> >>   is enabled in the RX configuration of the PMD.
> >>
> >> For now, the old flag PKT_RX_VLAN_PKT is kept but marked as deprecated.
> >> It should be removed from applications and PMDs in a future revision.
> >>
> >> This patch also updates the drivers. For PKT_RX_VLAN_PKT:
> >>
> >> - e1000, enic, i40e, mlx5, nfp, vmxnet3: done, PKT_RX_VLAN_PKT already
> >>   had the same meaning than PKT_RX_VLAN_STRIPPED, minor update is
> >>   required.
> >> - fm10k: done, PKT_RX_VLAN_PKT already had the same meaning than
> >>   PKT_RX_VLAN_STRIPPED, and vlan stripping is always enabled on fm10k.
> >> - ixgbe: modification done for standard mode (vector does not support
> >>   vlan stripping)
> >> - the other drivers do not support vlan stripping.
> >>
> >> For PKT_RX_QINQ_PKT, it was only supported on i40e, and the meaning was
> >> already correct, so we can reuse the same value for PKT_RX_QINQ_STRIPPED.
> >>
> >
> > I don't see ixgbe/i4oe_rxtx_vec.c updated.
> > Would it be another patch for them?
> 
> The ixgbe vector and i40e vector do not support vlan stripping, 

As I remember, they do.
Konstantin

>so  from what I see there is nothing to do:
> - The new flag PKT_RX_VLAN_STRIPPED is never returned
> - We keep the old behavior for PKT_RX_VLAN_PKT.
> 
> Thanks,
> Olivier


[dpdk-dev] [PATCH] mbuf: new flag when Vlan is stripped

2016-05-23 Thread Olivier Matz
Hi,

>>> I don't see ixgbe/i4oe_rxtx_vec.c updated.
>>> Would it be another patch for them?
>>
>> The ixgbe vector and i40e vector do not support vlan stripping, 
> 
> As I remember, they do.

You are right, I misinterpreted this code in condition_check():

#ifndef RTE_IXGBE_RX_OLFLAGS_ENABLE
/* whithout rx ol_flags, no VP flag report */
if (rxmode->hw_vlan_strip != 0 ||
rxmode->hw_vlan_extend != 0)
return -1;
#endif

I'll update the patch accordingly, thanks for reviewing.

Olivier


[dpdk-dev] [PATCH] mbuf: new flag when Vlan is stripped

2016-05-23 Thread Olivier Matz


>>  static inline uint64_t
>> -rx_desc_status_to_pkt_flags(uint32_t rx_status)
>> +rx_desc_status_to_pkt_flags(uint32_t rx_status, uint8_t vlan_strip)
>>  {
>>  uint64_t pkt_flags;
>> +uint64_t vlan_flags;
>> +
>> +/* if vlan is stripped, set the proper flag */
>> +if (vlan_strip)
>> +vlan_flags = PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED;
>> +else
>> +vlan_flags = PKT_RX_VLAN_PKT;
>>
>>  /*
>>   * Check if VLAN present only.
>>   * Do not check whether L3/L4 rx checksum done by NIC or not,
>>   * That can be found from rte_eth_rxmode.hw_ip_checksum flag
>>   */
>> -pkt_flags = (rx_status & IXGBE_RXD_STAT_VP) ?  PKT_RX_VLAN_PKT : 0;
>> +pkt_flags = (rx_status & IXGBE_RXD_STAT_VP) ?  vlan_flags : 0;
> 
> 
> Instead of storing in rxq (and passing as a parameter) a bool value for 
> vlan_strip (=on/off),
> you probably can store in rxq and pass as a parameter here uint64_t 
> vlan_flags;
> Then it will be:
>  
> rx_desc_status_to_pkt_flags(uint32_t rx_status, uint64_t vlan_flags)
> {
>...
>pkt_flags = (rx_status & IXGBE_RXD_STAT_VP) ?  vlan_flags : 0;
>...
> }
> 
> ...
> pkt_flags = rx_desc_status_to_pkt_flags(s[j], rxq->vlan_flags);
> 
> Might help to save few cycles here.

Yep, will do in v2.


Olivier


[dpdk-dev] [PATCH] vhost: fix segfault on bad descriptor address.

2016-05-23 Thread Yuanhan Liu
On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
> In current implementation guest application can reinitialize vrings
> by executing start after stop. In the same time host application
> can still poll virtqueue while device stopped in guest and it will
> crash with segmentation fault while vring reinitialization because
> of dereferencing of bad descriptor addresses.
> 
> OVS crash for example:
> <>
> [test-pmd inside guest VM]
> 
>   testpmd> port stop all
>   Stopping ports...
>   Checking link statuses...
>   Port 0 Link Up - speed 1 Mbps - full-duplex
>   Done
>   testpmd> port config all rxq 2
>   testpmd> port config all txq 2
>   testpmd> port start all
>   Configuring Port 0 (socket 0)
>   Port 0: 52:54:00:CB:44:C8
>   Checking link statuses...
>   Port 0 Link Up - speed 1 Mbps - full-duplex
>   Done

I actually didn't manage to reproduce it on my side, with the
vhost-example instead of OVS though. Is that all the commands
to reproduce it, and run them just after start test-pmd?

> [OVS on host]
>   Program received signal SIGSEGV, Segmentation fault.
>   rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h
> 
>   (gdb) bt
>   #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
>   #1  copy_desc_to_mbuf
>   #2  rte_vhost_dequeue_burst
>   #3  netdev_dpdk_vhost_rxq_recv
>   ...
> 
>   (gdb) bt full
>   #0  rte_memcpy
>   ...
>   #1  copy_desc_to_mbuf
>   desc_addr = 0
>   mbuf_offset = 0
>   desc_offset = 12
>   ...
> <>
> 
> Fix that by checking addresses of descriptors before using them.
> 
> Note: For mergeable buffers this patch checks only guest's address for
> zero, but in non-meargeable case host's address checked. This is done
> because checking of host's address in mergeable case requires additional
> refactoring to keep virtqueue in consistent state in case of error.
> 
> Signed-off-by: Ilya Maximets 
> ---
> 
> Actually, current virtio implementation looks broken for me. Because
> 'virtio_dev_start' breaks virtqueue while it still available from the vhost
> side.
> 
> There was 2 patches about this behaviour:
> 
>   1. a85786dc816f ("virtio: fix states handling during initialization")
>   2. 9a0615af7746 ("virtio: fix restart")
> 
> The second patch fixes somehow issue intoduced in the first patch, but 
> actually
> also breaks vhost in the way described above.
> It's not pretty clear for me what to do in current situation with virtio,
> because it will be broken for guest application even if vhost will not crash.
> 
> May be it'll be better to forbid stopping of virtio device and force user to
> exit and start again (may be implemented in hidden from user way)?
> 
> This patch adds additional sane checks, so it should be applied anyway, IMHO.

Agreed.

--yliu


[dpdk-dev] [PATCH] vhost: fix segfault on bad descriptor address.

2016-05-23 Thread Ilya Maximets
On 23.05.2016 13:57, Yuanhan Liu wrote:
> On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
>> In current implementation guest application can reinitialize vrings
>> by executing start after stop. In the same time host application
>> can still poll virtqueue while device stopped in guest and it will
>> crash with segmentation fault while vring reinitialization because
>> of dereferencing of bad descriptor addresses.
>>
>> OVS crash for example:
>> <>
>> [test-pmd inside guest VM]
>>
>>  testpmd> port stop all
>>  Stopping ports...
>>  Checking link statuses...
>>  Port 0 Link Up - speed 1 Mbps - full-duplex
>>  Done
>>  testpmd> port config all rxq 2
>>  testpmd> port config all txq 2
>>  testpmd> port start all
>>  Configuring Port 0 (socket 0)
>>  Port 0: 52:54:00:CB:44:C8
>>  Checking link statuses...
>>  Port 0 Link Up - speed 1 Mbps - full-duplex
>>  Done
> 
> I actually didn't manage to reproduce it on my side, with the
> vhost-example instead of OVS though. Is that all the commands
> to reproduce it, and run them just after start test-pmd?

Actually, I think, packet flow should be enabled while performing
above actions and some traffic already should be sent through port
to change last used idx on vhost side.

Something like:
start
..wait a while.. see that packets are flowing.
stop
port stop
port config
port config
port start
> 
>> [OVS on host]
>>  Program received signal SIGSEGV, Segmentation fault.
>>  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h
>>
>>  (gdb) bt
>>  #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
>>  #1  copy_desc_to_mbuf
>>  #2  rte_vhost_dequeue_burst
>>  #3  netdev_dpdk_vhost_rxq_recv
>>  ...
>>
>>  (gdb) bt full
>>  #0  rte_memcpy
>>  ...
>>  #1  copy_desc_to_mbuf
>>  desc_addr = 0
>>  mbuf_offset = 0
>>  desc_offset = 12
>>  ...
>> <>
>>
>> Fix that by checking addresses of descriptors before using them.
>>
>> Note: For mergeable buffers this patch checks only guest's address for
>> zero, but in non-meargeable case host's address checked. This is done
>> because checking of host's address in mergeable case requires additional
>> refactoring to keep virtqueue in consistent state in case of error.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>
>> Actually, current virtio implementation looks broken for me. Because
>> 'virtio_dev_start' breaks virtqueue while it still available from the vhost
>> side.
>>
>> There was 2 patches about this behaviour:
>>
>>  1. a85786dc816f ("virtio: fix states handling during initialization")
>>  2. 9a0615af7746 ("virtio: fix restart")
>>
>> The second patch fixes somehow issue intoduced in the first patch, but 
>> actually
>> also breaks vhost in the way described above.
>> It's not pretty clear for me what to do in current situation with virtio,
>> because it will be broken for guest application even if vhost will not crash.
>>
>> May be it'll be better to forbid stopping of virtio device and force user to
>> exit and start again (may be implemented in hidden from user way)?
>>
>> This patch adds additional sane checks, so it should be applied anyway, IMHO.
> 
> Agreed.
> 
>   --yliu
> 
> 


[dpdk-dev] [PATCH] mbuf: make rearm_data address naturally aligned

2016-05-23 Thread Olivier Matz
Hi,

On 05/19/2016 05:50 PM, Thomas Monjalon wrote:
> 2016-05-19 19:05, Jerin Jacob:
>> On Thu, May 19, 2016 at 12:18:57PM +, Ananyev, Konstantin wrote:
 On Thu, May 19, 2016 at 12:20:16AM +0530, Jerin Jacob wrote:
> On Wed, May 18, 2016 at 05:43:00PM +0100, Bruce Richardson wrote:
>> On Wed, May 18, 2016 at 07:27:43PM +0530, Jerin Jacob wrote:
>>> I wonder does anyone really use mbuf port field?
>>> My though was - could we to drop it completely?
>>> Actually, after discussing it with Bruce offline, an interesting idea came 
>>> out:
>>> if we'll drop port and make mbuf_prefree() to reset nb_segs=1, then
>>> we can reduce RX rearm_data to 4B. So with that layout:
>>>
>>> struct rte_mbuf {
>>>
>>>  MARKER cacheline0;
>>>
>>> void *buf_addr;   
>>> phys_addr_t buf_physaddr; 
>>> uint16_t buf_len;
>>> uint8_t nb_segs;
>>> uint8_t reserved_1byte;   /* former port */
>>> 
>>> MARKER32 rearm_data;
>>> uint16_t data_off;
>>>uint16_t refcnt;
>>>
>>> uint64_t ol_flags;
>>> ...
>>>
>>> We can keep buf_len at its place and avoid 2B gap, while making rearm_data
>>> 4B long and 4B aligned.
>>
>> Couple of comments,
>> - IMO, It is good if nb_segs can move under rearm_data, as some
>> drivers(not in ixgbe may be) can write nb_segs in one shot also
>> in segmented rx handler case
>> - I think, it makes sense to keep port in mbuf so that application
>> can make use of it(Not sure what real application developers think of
>> this)
> 
> I agree we could try to remove the port id from mbuf.
> These mbuf data are a common base to pass infos between drivers and apps.
> If you need to store some data which are read and write from the app only,
> you can have use the private zone (see rte_pktmbuf_priv_size).

At the first read, I was in favor of keeping the port_id in the
mbuf. But after checking the examples and applications, I'm not
opposed to remove it. Indeed, this information could go in an
application-specific part or it could be an additional function
parameter in the application processing function.

The same question could be raised for nb_seg: it seems this info
is not used a lot, and having a 8 bits value here also prevents from
having long chains (ex: for socket buffer in a tcp stack).

Just an idea thrown in the air: if these 2 fields are removed, it
brings some room for the m->next field to go in the first cache line.
This was mentioned several times (at least [1]).

[1] http://dpdk.org/ml/archives/dev/2015-June/019182.html

By the way, the "pahole" utility gives a nice representation
of structures with the field sizes and offsets. Example on the
current rte_mbuf structure on x86_64:

$ make config T=x86_64-native-linuxapp-gcc
$ make -j4 EXTRA_CFLAGS="-O -g"
$ pahole -C rte_mbuf build/app/testpmd

struct rte_mbuf {
 MARKER cacheline0;   /* 0 0 */
 void * buf_addr; /* 0 8 */
 phys_addr_tbuf_physaddr; /* 8 8 */
 uint16_t   buf_len;  /*16 2 */
 MARKER8rearm_data;   /*18 0 */
 uint16_t   data_off; /*18 2 */
 union {
 rte_atomic16_t refcnt_atomic;/*   2 */
 uint16_t   refcnt;   /*   2 */
 };   /*20 2 */
 uint8_tnb_segs;  /*22 1 */
 uint8_tport; /*23 1 */
 uint64_t   ol_flags; /*24 8 */
 MARKER rx_descriptor_fields1; /*32 0 */
 union {
 uint32_t   packet_type;  /*   4 */
 struct {
 uint32_t   l2_type:4;/*32:28  4 */
 uint32_t   l3_type:4;/*32:24  4 */
 uint32_t   l4_type:4;/*32:20  4 */
 uint32_t   tun_type:4;   /*32:16  4 */
 uint32_t   inner_l2_type:4;  /*32:12  4 */
 uint32_t   inner_l3_type:4;  /*32: 8  4 */
 uint32_t   inner_l4_type:4;  /*32: 4  4 */
 };   /*   4 */
 };   /*32 4 */
 uint32_t   pkt_len;  /*36 4 */
 uint16_t   data_len; /*40 2 */
 uint16_t   vlan_tci; /*42 2 */
 union {
 uint32_t   rss;  /*   4 */
 struct {
 union {
 struct {
   

[dpdk-dev] [PATCH v2] rte mempool: division or modulo by zero

2016-05-23 Thread Olivier Matz


On 05/19/2016 02:36 PM, Slawomir Mrozowicz wrote:
> Fix issue reported by Coverity.
> 
> Coverity ID 13243: Division or modulo by zero
> In function call rte_mempool_xmem_size, division by expression total_size
> which may be zero has undefined behavior.
> 
> Fixes: 148f963fb532 ("xen: core library changes")
> 
> Signed-off-by: Slawomir Mrozowicz 

Acked-by: Olivier Matz 


[dpdk-dev] [PATCH 4/4] pmd_hw_support.py: Add tool to query binaries for hw support information

2016-05-23 Thread Panu Matilainen
On 05/20/2016 05:06 PM, Neil Horman wrote:
[...]
> Look, I think we're simply not going to agree on this issue at all.  What 
> about
> this in the way of compromise.  I simply am not comfortable with automatically
> trying to guess what hardware support will exist in an application based on 
> the
> transient contents of a plugin directory, because of all the reasons we've
> already gone over, but I do understand the desire to get information about 
> what
> _might_ be automatically loaded for an application.  what if we added a 
> 'plugin
> mode' to pmdinfo. In this mode you would dpecify a dpdk installation directory
> and an appropriate mode option.  When specified pmdinfo would scan librte_eal 
> in
> the specified directory, looking for an exported json string that informs us 
> of
> the configured plugin directory.  If found, we iterate through all the 
> libraries
> there displaying hw support.  That allows you to query the plugin directory 
> for
> available hardware support, while not implying that the application is
> guaranteed to get it (because you didn't specifically state on the command 
> line
> that you wanted to know about the applications hardware support).

That brings it all to one tiny step away from what I've been asking: 
have the plugin mode automatically locate librte_eal from an executable. 
So I'm not quite sure where the compromise is supposed to be here :)

I do appreciate wanting to differentiate between "physically" linked-in 
and runtime-loaded hardware support, they obviously *are* different from 
a technical POV. But its also a difference an average user might not 
even know about or understand, let alone care about, they just want to 
know "will it work?"

- Panu -



[dpdk-dev] [PATCH v1 1/1] examples/quota_watermark: fix low_watermark variable placement

2016-05-23 Thread Piotr Azarewicz
qw app at its init stage reserve 2*sizeof(int) memory space for quota
and low_watermark shared variables, but both apps (qw and qwctl) assign
wrong address for low_watermark pointer (out of reserved memzone space)
due to wrong pointer arithmetic.

CID 30709 : Extra sizeof expression (SIZEOF_MISMATCH)
suspicious_pointer_arithmetic: Adding 4UL /* sizeof (int) */ to pointer
(unsigned int *)(*qw_memzone).addr of type unsigned int * is suspicious
because adding an integral value to this pointer automatically scales
that value by the size, 4 bytes, of the pointed-to type, unsigned int.
Most likely, sizeof (int) is extraneous and should be replaced with 1.

Coverity issue: 30709
Fixes: 1d6c3ee3321a ("examples/quota_watermark: initial import")

Signed-off-by: Piotr Azarewicz 
---
 examples/quota_watermark/qw/init.c |2 +-
 examples/quota_watermark/qwctl/qwctl.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/quota_watermark/qw/init.c 
b/examples/quota_watermark/qw/init.c
index afc1366..c208721 100644
--- a/examples/quota_watermark/qw/init.c
+++ b/examples/quota_watermark/qw/init.c
@@ -170,5 +170,5 @@ setup_shared_variables(void)
 rte_exit(EXIT_FAILURE, "%s\n", rte_strerror(rte_errno));

 quota = qw_memzone->addr;
-low_watermark = (unsigned int *) qw_memzone->addr + sizeof(int);
+low_watermark = (unsigned int *) qw_memzone->addr + 1;
 }
diff --git a/examples/quota_watermark/qwctl/qwctl.c 
b/examples/quota_watermark/qwctl/qwctl.c
index eb2f618..4961089 100644
--- a/examples/quota_watermark/qwctl/qwctl.c
+++ b/examples/quota_watermark/qwctl/qwctl.c
@@ -68,7 +68,7 @@ setup_shared_variables(void)
 rte_exit(EXIT_FAILURE, "Couldn't find memzone\n");

 quota = qw_memzone->addr;
-low_watermark = (unsigned int *) qw_memzone->addr + sizeof(int);
+low_watermark = (unsigned int *) qw_memzone->addr + 1;
 }

 int main(int argc, char **argv)
-- 
1.7.9.5



[dpdk-dev] [PATCH] i40e: Unchecked return value

2016-05-23 Thread Slawomir Mrozowicz
Calling i40e_switch_tx_queue without checking return value.
Fixed by add warning log information if return failed.

Fixes: 71d35259ff67 ("i40e: tear down flow director")
Coverity ID 13208

Signed-off-by: Slawomir Mrozowicz 
---
 drivers/net/i40e/i40e_fdir.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_fdir.c b/drivers/net/i40e/i40e_fdir.c
index 8aa41e5..d0bdf2c 100644
--- a/drivers/net/i40e/i40e_fdir.c
+++ b/drivers/net/i40e/i40e_fdir.c
@@ -288,11 +288,14 @@ i40e_fdir_teardown(struct i40e_pf *pf)
 {
struct i40e_hw *hw = I40E_PF_TO_HW(pf);
struct i40e_vsi *vsi;
+   int err = I40E_SUCCESS;

vsi = pf->fdir.fdir_vsi;
if (!vsi)
return;
-   i40e_switch_tx_queue(hw, vsi->base_queue, FALSE);
+   err = i40e_switch_tx_queue(hw, vsi->base_queue, FALSE);
+   if (err)
+   PMD_DRV_LOG(WARNING, "Failed to do FDIR TX switch off.");
i40e_switch_rx_queue(hw, vsi->base_queue, FALSE);
i40e_dev_rx_queue_release(pf->fdir.rxq);
pf->fdir.rxq = NULL;
-- 
1.9.1



[dpdk-dev] [dpdk-dev,v5,1/3] mempool: support external handler

2016-05-23 Thread Jan Viktorin
Hello David,

please, see my comments inline.

I didn't see the previous versions of the mempool (well, only very roughly) so 
I am
probably missing some points... My point of view is as a user of the handler 
API.
I need to understand the API to implement a custom handler for my purposes.

On Thu, 19 May 2016 14:44:59 +0100
David Hunt  wrote:

> Until now, the objects stored in mempool mempool were internally stored a

s/mempool mempool/mempool/

stored _in_ a ring?

> ring. This patch introduce the possibility to register external handlers
> replacing the ring.
> 
> The default behavior remains unchanged, but calling the new function
> rte_mempool_set_handler() right after rte_mempool_create_empty() allows to
> change the handler that will be used when populating the mempool.
> 
> v5 changes: rebasing on top of 35 patch set mempool work.
> 
> Signed-off-by: David Hunt 
> Signed-off-by: Olivier Matz 
> 
> ---
> app/test/test_mempool_perf.c   |   1 -
>  lib/librte_mempool/Makefile|   2 +
>  lib/librte_mempool/rte_mempool.c   |  73 --
>  lib/librte_mempool/rte_mempool.h   | 212 
> +
>  lib/librte_mempool/rte_mempool_default.c   | 147 
>  lib/librte_mempool/rte_mempool_handler.c   | 139 +++
>  lib/librte_mempool/rte_mempool_version.map |   4 +
>  7 files changed, 506 insertions(+), 72 deletions(-)
>  create mode 100644 lib/librte_mempool/rte_mempool_default.c
>  create mode 100644 lib/librte_mempool/rte_mempool_handler.c
> 
> diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
> index cdc02a0..091c1df 100644
> --- a/app/test/test_mempool_perf.c
> +++ b/app/test/test_mempool_perf.c
> @@ -161,7 +161,6 @@ per_lcore_mempool_test(__attribute__((unused)) void *arg)
>  n_get_bulk);
>   if (unlikely(ret < 0)) {
>   rte_mempool_dump(stdout, mp);
> - rte_ring_dump(stdout, mp->ring);
>   /* in this case, objects are lost... */
>   return -1;
>   }

I think, this should be in a separate patch explaining the reason to remove it.

> diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
> index 43423e0..f19366e 100644
> --- a/lib/librte_mempool/Makefile
> +++ b/lib/librte_mempool/Makefile
> @@ -42,6 +42,8 @@ LIBABIVER := 2
>  
>  # all source are stored in SRCS-y
>  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
> +SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_handler.c
> +SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_default.c
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
>  
> diff --git a/lib/librte_mempool/rte_mempool.c 
> b/lib/librte_mempool/rte_mempool.c
> index 1ab6701..6ec2b3f 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -148,7 +148,7 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
> phys_addr_t physaddr)
>  #endif
>  
>   /* enqueue in ring */
> - rte_ring_sp_enqueue(mp->ring, obj);
> + rte_mempool_ext_put_bulk(mp, &obj, 1);

I suppose this is OK, however, replacing "enqueue" by "put" (semantically) 
sounds to me
like a bug. Enqueue is put into a queue. Put is to drop a reference.

>  }
>  
>  /* call obj_cb() for each mempool element */
> @@ -300,40 +300,6 @@ rte_mempool_xmem_usage(__rte_unused void *vaddr, 
> uint32_t elt_num,
>   return (size_t)paddr_idx << pg_shift;
>  }
>  
> -/* create the internal ring */
> -static int
> -rte_mempool_ring_create(struct rte_mempool *mp)
> -{
> - int rg_flags = 0, ret;
> - char rg_name[RTE_RING_NAMESIZE];
> - struct rte_ring *r;
> -
> - ret = snprintf(rg_name, sizeof(rg_name),
> - RTE_MEMPOOL_MZ_FORMAT, mp->name);
> - if (ret < 0 || ret >= (int)sizeof(rg_name))
> - return -ENAMETOOLONG;
> -
> - /* ring flags */
> - if (mp->flags & MEMPOOL_F_SP_PUT)
> - rg_flags |= RING_F_SP_ENQ;
> - if (mp->flags & MEMPOOL_F_SC_GET)
> - rg_flags |= RING_F_SC_DEQ;
> -
> - /* Allocate the ring that will be used to store objects.
> -  * Ring functions will return appropriate errors if we are
> -  * running as a secondary process etc., so no checks made
> -  * in this function for that condition.
> -  */
> - r = rte_ring_create(rg_name, rte_align32pow2(mp->size + 1),
> - mp->socket_id, rg_flags);
> - if (r == NULL)
> - return -rte_errno;
> -
> - mp->ring = r;
> - mp->flags |= MEMPOOL_F_RING_CREATED;
> - return 0;
> -}

This is a big change. I suggest (if possible) to make a separate patch with
something like "replace rte_mempool_ring_create by ...". Where is this code
placed now?

> -
>  /* free a memchunk allocated with rte_me

[dpdk-dev] [dpdk-dev, v5, 3/3] mbuf: get default mempool handler from configuration

2016-05-23 Thread Jan Viktorin
On Thu, 19 May 2016 14:45:01 +0100
David Hunt  wrote:

> By default, the mempool handler used for mbuf allocations is a multi
> producer and multi consumer ring. We could imagine a target (maybe some
> network processors?) that provides an hardware-assisted pool
> mechanism. In this case, the default configuration for this architecture
> would contain a different value for RTE_MBUF_DEFAULT_MEMPOOL_HANDLER.
> 
> Signed-off-by: David Hunt 
> Signed-off-by: Olivier Matz 
> 
> ---
> config/common_base |  1 +
>  lib/librte_mbuf/rte_mbuf.c | 21 +
>  2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/config/common_base b/config/common_base
> index 3535c6e..5cf5e52 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -394,6 +394,7 @@ CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n
>  #
>  CONFIG_RTE_LIBRTE_MBUF=y
>  CONFIG_RTE_LIBRTE_MBUF_DEBUG=n
> +CONFIG_RTE_MBUF_DEFAULT_MEMPOOL_HANDLER="ring_mp_mc"
>  CONFIG_RTE_MBUF_REFCNT_ATOMIC=y
>  CONFIG_RTE_PKTMBUF_HEADROOM=128
>  
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index eec1456..5dcdc05 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -153,6 +153,7 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
>   unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
>   int socket_id)
>  {
> + struct rte_mempool *mp;
>   struct rte_pktmbuf_pool_private mbp_priv;
>   unsigned elt_size;
>  
> @@ -167,10 +168,22 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
>   mbp_priv.mbuf_data_room_size = data_room_size;
>   mbp_priv.mbuf_priv_size = priv_size;
>  
> - return rte_mempool_create(name, n, elt_size,
> - cache_size, sizeof(struct rte_pktmbuf_pool_private),
> - rte_pktmbuf_pool_init, &mbp_priv, rte_pktmbuf_init, NULL,
> - socket_id, 0);
> + mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
> +  sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
> + if (mp == NULL)
> + return NULL;
> +
> + rte_mempool_set_handler(mp, RTE_MBUF_DEFAULT_MEMPOOL_HANDLER);

Check for a failure is missing here. Especially -EEXIST.

> + rte_pktmbuf_pool_init(mp, &mbp_priv);
> +
> + if (rte_mempool_populate_default(mp) < 0) {
> + rte_mempool_free(mp);
> + return NULL;
> + }
> +
> + rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
> +
> + return mp;
>  }
>  
>  /* do some sanity checks on a mbuf: panic if it fails */



-- 
   Jan Viktorin  E-mail: Viktorin at RehiveTech.com
   System Architect  Web:www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic


[dpdk-dev] [dpdk-dev, v5, 2/3] app/test: test external mempool handler

2016-05-23 Thread Jan Viktorin
On Thu, 19 May 2016 14:45:00 +0100
David Hunt  wrote:

> Use a minimal custom mempool external handler and check that it also
> passes basic mempool autotests.
> 
> Signed-off-by: Olivier Matz 
> Signed-off-by: David Hunt 
> 
> ---
> app/test/test_mempool.c | 113 
>  1 file changed, 113 insertions(+)
> 
> diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
> index 9f02758..f55d126 100644
> --- a/app/test/test_mempool.c
> +++ b/app/test/test_mempool.c
> @@ -85,6 +85,96 @@
>  static rte_atomic32_t synchro;
>  
>  /*
> + * Simple example of custom mempool structure. Holds pointers to all the
> + * elements which are simply malloc'd in this example.
> + */
> +struct custom_mempool {
> + rte_spinlock_t lock;
> + unsigned count;
> + unsigned size;
> + void *elts[];
> +};
> +
> +/*
> + * Loop though all the element pointers and allocate a chunk of memory, then

s/though/through/

> + * insert that memory into the ring.
> + */
> +static void *
> +custom_mempool_alloc(struct rte_mempool *mp)
> +{
> + struct custom_mempool *cm;
> +
> + cm = rte_zmalloc("custom_mempool",
> + sizeof(struct custom_mempool) + mp->size * sizeof(void *), 0);
> + if (cm == NULL)
> + return NULL;
> +
> + rte_spinlock_init(&cm->lock);
> + cm->count = 0;
> + cm->size = mp->size;
> + return cm;
> +}
> +
> +static void
> +custom_mempool_free(void *p)
> +{
> + rte_free(p);
> +}
> +
> +static int
> +custom_mempool_put(void *p, void * const *obj_table, unsigned n)
> +{
> + struct custom_mempool *cm = (struct custom_mempool *)p;
> + int ret = 0;
> +
> + rte_spinlock_lock(&cm->lock);
> + if (cm->count + n > cm->size) {
> + ret = -ENOBUFS;
> + } else {
> + memcpy(&cm->elts[cm->count], obj_table, sizeof(void *) * n);
> + cm->count += n;
> + }
> + rte_spinlock_unlock(&cm->lock);
> + return ret;
> +}
> +
> +
> +static int
> +custom_mempool_get(void *p, void **obj_table, unsigned n)
> +{
> + struct custom_mempool *cm = (struct custom_mempool *)p;
> + int ret = 0;
> +
> + rte_spinlock_lock(&cm->lock);
> + if (n > cm->count) {
> + ret = -ENOENT;
> + } else {
> + cm->count -= n;
> + memcpy(obj_table, &cm->elts[cm->count], sizeof(void *) * n);
> + }
> + rte_spinlock_unlock(&cm->lock);
> + return ret;
> +}
> +
> +static unsigned
> +custom_mempool_get_count(void *p)
> +{
> + struct custom_mempool *cm = (struct custom_mempool *)p;
> + return cm->count;
> +}
> +
> +static struct rte_mempool_handler mempool_handler_custom = {
> + .name = "custom_handler",
> + .alloc = custom_mempool_alloc,
> + .free = custom_mempool_free,
> + .put = custom_mempool_put,
> + .get = custom_mempool_get,
> + .get_count = custom_mempool_get_count,
> +};
> +
> +MEMPOOL_REGISTER_HANDLER(mempool_handler_custom);

What about to drop the rte_mempool_handler.name field and derive the
name from the variable name given to the MEMPOOL_REGISTER_HANDLER.
The MEMPOOL_REGISTER_HANDLER sould do some macro magic inside and call

  rte_mempool_handler_register(name, handler);

Just an idea...

> +
> +/*
>   * save the object number in the first 4 bytes of object data. All
>   * other bytes are set to 0.
>   */
> @@ -479,6 +569,7 @@ test_mempool(void)
>  {
>   struct rte_mempool *mp_cache = NULL;
>   struct rte_mempool *mp_nocache = NULL;
> + struct rte_mempool *mp_ext = NULL;
>  
>   rte_atomic32_init(&synchro);
>  
> @@ -507,6 +598,27 @@ test_mempool(void)
>   goto err;
>   }
>  
> + /* create a mempool with an external handler */
> + mp_ext = rte_mempool_create_empty("test_ext",
> + MEMPOOL_SIZE,
> + MEMPOOL_ELT_SIZE,
> + RTE_MEMPOOL_CACHE_MAX_SIZE, 0,
> + SOCKET_ID_ANY, 0);
> +
> + if (mp_ext == NULL) {
> + printf("cannot allocate mp_ext mempool\n");
> + goto err;
> + }
> + if (rte_mempool_set_handler(mp_ext, "custom_handler") < 0) {
> + printf("cannot set custom handler\n");
> + goto err;
> + }
> + if (rte_mempool_populate_default(mp_ext) < 0) {
> + printf("cannot populate mp_ext mempool\n");
> + goto err;
> + }
> + rte_mempool_obj_iter(mp_ext, my_obj_init, NULL);
> +

The test becomes quite complex. What about having several smaller
tests with a clear setup and cleanup steps?

>   /* retrieve the mempool from its name */
>   if (rte_mempool_lookup("test_nocache") != mp_nocache) {
>   printf("Cannot lookup mempool from its name\n");
> @@ -547,6 +659,7 @@ test_mempool(void)
>  err:
>   rte_mempool_free(mp_nocache);
>   rte_mempool_free(mp_cache);
> + rte_mempool_free(mp_ext);
>   return -1;
>  }
>  


[dpdk-dev] [PATCH v2 1/3] mempool: add stack (lifo) mempool handler

2016-05-23 Thread Olivier Matz
Hi David,

Please find some comments below.

On 05/19/2016 04:48 PM, David Hunt wrote:
> This is a mempool handler that is useful for pipelining apps, where
> the mempool cache doesn't really work - example, where we have one
> core doing rx (and alloc), and another core doing Tx (and return). In
> such a case, the mempool ring simply cycles through all the mbufs,
> resulting in a LLC miss on every mbuf allocated when the number of
> mbufs is large. A stack recycles buffers more effectively in this
> case.
> 
> v2: cleanup based on mailing list comments. Mainly removal of
> unnecessary casts and comments.
> 
> Signed-off-by: David Hunt 
> ---
>  lib/librte_mempool/Makefile|   1 +
>  lib/librte_mempool/rte_mempool_stack.c | 145 
> +
>  2 files changed, 146 insertions(+)
>  create mode 100644 lib/librte_mempool/rte_mempool_stack.c
> 
> diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
> index f19366e..5aa9ef8 100644
> --- a/lib/librte_mempool/Makefile
> +++ b/lib/librte_mempool/Makefile
> @@ -44,6 +44,7 @@ LIBABIVER := 2
>  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
>  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_handler.c
>  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_default.c
> +SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_stack.c
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
>  
> diff --git a/lib/librte_mempool/rte_mempool_stack.c 
> b/lib/librte_mempool/rte_mempool_stack.c
> new file mode 100644
> index 000..6e25028
> --- /dev/null
> +++ b/lib/librte_mempool/rte_mempool_stack.c
> @@ -0,0 +1,145 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.

Should be 2016?

> ...
> +
> +static void *
> +common_stack_alloc(struct rte_mempool *mp)
> +{
> + struct rte_mempool_common_stack *s;
> + unsigned n = mp->size;
> + int size = sizeof(*s) + (n+16)*sizeof(void *);
> +
> + /* Allocate our local memory structure */
> + s = rte_zmalloc_socket("common-stack",

"mempool-stack" ?

> + size,
> + RTE_CACHE_LINE_SIZE,
> + mp->socket_id);
> + if (s == NULL) {
> + RTE_LOG(ERR, MEMPOOL, "Cannot allocate stack!\n");
> + return NULL;
> + }
> +
> + rte_spinlock_init(&s->sl);
> +
> + s->size = n;
> + mp->pool = s;
> + rte_mempool_set_handler(mp, "stack");

rte_mempool_set_handler() is a user function, it should be called here

> +
> + return s;
> +}
> +
> +static int common_stack_put(void *p, void * const *obj_table,
> + unsigned n)
> +{
> + struct rte_mempool_common_stack *s = p;
> + void **cache_objs;
> + unsigned index;
> +
> + rte_spinlock_lock(&s->sl);
> + cache_objs = &s->objs[s->len];
> +
> + /* Is there sufficient space in the stack ? */
> + if ((s->len + n) > s->size) {
> + rte_spinlock_unlock(&s->sl);
> + return -ENOENT;
> + }

The usual return value for a failing put() is ENOBUFS (see in rte_ring).


After reading it, I realize that it's nearly exactly the same code than
in "app/test: test external mempool handler".
http://patchwork.dpdk.org/dev/patchwork/patch/12896/

We should drop one of them. If this stack handler is really useful for
a performance use-case, it could go in librte_mempool. At the first
read, the code looks like a demo example : it uses a simple spinlock for
concurrent accesses to the common pool. Maybe the mempool cache hides
this cost, in this case we could also consider removing the use of the
rte_ring.

Do you have some some performance numbers? Do you know if it scales
with the number of cores?

If we can identify the conditions where this mempool handler
overperforms the default handler, it would be valuable to have them
in the documentation.


Regards,
Olivier


[dpdk-dev] [PATCH v2 2/3] mempool: make declaration of handler structs const

2016-05-23 Thread Olivier Matz
Hi David,

On 05/19/2016 04:48 PM, David Hunt wrote:
> For security, any data structure with function pointers should be const
> 
> Signed-off-by: David Hunt 

I think this should be merged in the patchset adding the mempool
handler feature.



[dpdk-dev] [PATCH v2 5/7] eal/linux: mmap ioports on ppc64

2016-05-23 Thread Yuanhan Liu
On Tue, May 17, 2016 at 05:54:01PM +0200, David Marchand wrote:
> > +pci_uio_ioport_map(struct rte_pci_device *dev, int bar,
> > +  struct rte_pci_ioport *p)
> > +{
> > +   FILE *f;
> > +   char buf[BUFSIZ];
> > +   char filename[PATH_MAX];
> > +   uint64_t phys_addr, end_addr, flags;
> > +   int fd, i;
> > +   void *addr;
> > +
> > +   /* open and read addresses of the corresponding resource in sysfs */
> > +   snprintf(filename, sizeof(filename), "%s/" PCI_PRI_FMT "/resource",
> > +   SYSFS_PCI_DEVICES, dev->addr.domain, dev->addr.bus,
> > +dev->addr.devid, dev->addr.function);
> > +   f = fopen(filename, "r");
> > +   if (f == NULL) {
> > +   RTE_LOG(ERR, EAL, "Cannot open sysfs resource: %s\n",
> > +   strerror(errno));
> > +   return -1;
> > +   }
> > +   for (i = 0; i < bar + 1; i++) {
> > +   if (fgets(buf, sizeof(buf), f) == NULL) {
> > +   RTE_LOG(ERR, EAL, "Cannot read sysfs resource\n");
> > +   goto error;
> > +   }
> > +   }
> > +   if (pci_parse_one_sysfs_resource(buf, sizeof(buf), &phys_addr,
> > +   &end_addr, &flags) < 0)
> > +   goto error;
> > +   if ((flags & IORESOURCE_IO) == 0) {
> > +   RTE_LOG(ERR, EAL, "BAR %d is not an IO resource\n", bar);
> > +   goto error;
> > +   }
> > +   snprintf(filename, sizeof(filename), "%s/" PCI_PRI_FMT 
> > "/resource%d",
> > +   SYSFS_PCI_DEVICES, dev->addr.domain, dev->addr.bus,
> > +dev->addr.devid, dev->addr.function, bar);
> > +
> > +   /* mmap the pci resource */
> > +   fd = open(filename, O_RDWR);
> > +   if (fd < 0) {
> > +   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
> > +   strerror(errno));
> > +   goto error;
> > +   }
> > +   addr = mmap(NULL, end_addr + 1, PROT_READ | PROT_WRITE,
> > +   MAP_SHARED, fd, 0);
> 
> Sorry, did not catch it in v1, but a close(fd) is missing here.
> With this, I think the patchset looks good.
> 
> Just missing some opinion from the virtio maintainers ?

Apologize for being late for review. Assuming you have done proper
test, this patch set looks good to me. (well, I don't quite like
the tons of "#ifdef ... #else ..#end" block though)

A side note is that I noticed an ABI breakage introduced in this
patch, so, this release is not a good fit?

--yliu


[dpdk-dev] [PATCH v2] vhost: add support for dynamic vhost PMD creation

2016-05-23 Thread Yuanhan Liu
On Fri, May 20, 2016 at 11:37:47AM +0100, Bruce Richardson wrote:
> On Thu, May 19, 2016 at 06:44:44PM +0200, Thomas Monjalon wrote:
> > 2016-05-19 17:28, Ferruh Yigit:
> > > On 5/19/2016 9:33 AM, Thomas Monjalon wrote:
> > > > 2016-05-18 18:10, Ferruh Yigit:
> > > >> Add rte_eth_from_vhost() API to create vhost PMD dynamically from
> > > >> applications.
> > > > 
> > > > How is it different from rte_eth_dev_attach() calling 
> > > > rte_eal_vdev_init()?
> > > > 
> > > 
> > > When used rte_eth_dev_attach(), application also needs to do:
> > > rte_eth_dev_configure()
> > > rte_eth_rx_queue_setup()
> > > rte_eth_tx_queue_setup()
> > > rte_eth_dev_start()
> > > 
> > > rte_eth_from_vhost() does these internally, easier to use for 
> > > applications.
> > 
> > This argument is not sufficient.
> > We are not going to add new APIs just for wrapping others.
> 
> Why not - if there is a sufficient increase in developer usability by doing 
> so?
> Having one API that saves an app from having to call 5 other APIs looks like
> something that should always be given fair consideration.

Good point. Judging that vhost is not the only virtual device we
support, and it may also look reasonable to add something similar
for others in future (say, IIRC, you proposed two more internally
that also introduced similar APIs). So, instead of introducing a
new API for each such vdev, may we introduce a common one? Say,
a refined rte_eth_dev_attach(), including dev_configure(),
queue_setup(), etc.

Makes sense?

--yliu


[dpdk-dev] virtio: crash when using multiple processes (16.04 regression)

2016-05-23 Thread Yuanhan Liu
On Thu, May 19, 2016 at 04:20:40PM +, Yoni Gilad wrote:
> Hi,
> 
> We have encountered a crash in virtio_xmit_pkts (specifically, in the call to 
> virtqueue_notify) when running DPDK in a multi-process setup. This is a 
> regression in DPDK 16.04.
> 
> The culprit seems to be the field vtpci_ops in the virtio_hw structure. This 
> field is stored in shared memory, but points to a struct in the primary 
> process's address space. If the same struct was loaded in a different address 
> in the secondary process, it will lead to a crash or other issues when this 
> field is dereferenced there. The referenced virtio_pci_ops struct contains 
> function pointers, which can also be different in the secondary process.


That indeed sounds like to be the culprit. Function pointers is known
for not friendly for multiple processes: see the 18.c section of DPDK
programmers guide 
(http://dpdk.org/doc/guides/prog_guide/multi_proc_support.html):

The use of function pointers between multiple processes running based of
different compiled binaries is not supported, since the location of a
given function in one process may be different to its location in a
second. This prevents the librte_hash library from behaving properly as
in a multi-threaded instance, since it uses a pointer to the hash
function internally.


TBH, I missed this bit (multiple processes) while introducing this
function pointer; well, we never tested it before, either.

We could fix/workaround it by getting the right function pointer set
dynamically, but that far from being perfect.

--yliu


[dpdk-dev] unable to bind to vfio_pci inside RHEL virtual machine.

2016-05-23 Thread DING, TAO

Hello dpdk dev,

Do you know if the vfio_pci can be bind to network interface from within RedHat 
virtual machine ? I read the doc that igb_uio  should not be used ; it is not 
stable. (http://people.redhat.com/~pmatilai/dpdk-guide/index.html) however I 
cannot use vfio_pci driver from inside VM.

Currently I am working on a project that migrating a network package capture 
application into virtual machines so that it can be hosted on  cloud. My intent 
is using SR-IOR to ensure the data sending from physical NIC to vNIC  in line 
speed; using DPDK inside VM to read data from vNIC to get good performance 
because the libpcap does not perform well  inside VM.

Following dpdk instruction, I was able to set up the SR-IOV and bind the 
vfio_pci to virtual Function on the host. Once the VM starts, the Virtual 
Functions bind to vfio-pci automatically on the host. . The following is the 
output from host.
Option: 22


Network devices using DPDK-compatible driver

:04:10.4 'X540 Ethernet Controller Virtual Function' drv=vfio-pci unused=
:04:10.6 'X540 Ethernet Controller Virtual Function' drv=vfio-pci unused=
:04:11.4 'X540 Ethernet Controller Virtual Function' drv=vfio-pci unused=
:04:11.6 'X540 Ethernet Controller Virtual Function' drv=vfio-pci unused=

Network devices using kernel driver
===
:01:00.0 'NetXtreme BCM5720 Gigabit Ethernet PCIe' if=em1 drv=tg3 
unused=vfio-pci
:01:00.1 'NetXtreme BCM5720 Gigabit Ethernet PCIe' if=em2 drv=tg3 
unused=vfio-pci
:02:00.0 'NetXtreme BCM5720 Gigabit Ethernet PCIe' if=em3 drv=tg3 
unused=vfio-pci
:02:00.1 'NetXtreme BCM5720 Gigabit Ethernet PCIe' if=em4 drv=tg3 
unused=vfio-pci
:04:00.0 'Ethernet Controller 10-Gigabit X540-AT2' if=p3p1 drv=ixgbe 
unused=vfio-pci *Active*
:04:00.1 'Ethernet Controller 10-Gigabit X540-AT2' if=p3p2 drv=ixgbe 
unused=vfio-pci
:04:10.0 'X540 Ethernet Controller Virtual Function' if=p3p1_0 drv=ixgbevf 
unused=vfio-pci
:04:10.2 'X540 Ethernet Controller Virtual Function' if=p3p1_1 drv=ixgbevf 
unused=vfio-pci
:04:11.0 'X540 Ethernet Controller Virtual Function' if=p3p1_4 drv=ixgbevf 
unused=vfio-pci
:04:11.2 'X540 Ethernet Controller Virtual Function' if=p3p1_5 drv=ixgbevf 
unused=vfio-pci

I repeated the same set up within the VM which has 4 Virtual Functions assigned 
to it, I could not successfully bind any of network devices  to vfio-pci. I 
followed different suggestion from the web , but no luck.   (however I was able 
to bind UIO driver to the network devices inside the VM)
One difference I noticed between VM and host is the outcome of IOMMU setting.  
On the host, the /sys/kernel/iommu_groups/ is NOT empty , but on the VM , it is 
empty. I rebooted VM several time. There's not luck.

The following is the output from inside the VM. The driver vfio_pci  is visible 
to lsmod , but not visible to driverctl.

[root at hn14vm3 tools]# driverctl -v list-devices | grep -i net
:00:03.0 ixgbevf (X540 Ethernet Controller Virtual Function)
:00:08.0 ixgbevf (X540 Ethernet Controller Virtual Function)
:00:09.0 ixgbevf (X540 Ethernet Controller Virtual Function)
:00:0b.0 ixgbevf (X540 Ethernet Controller Virtual Function)
:00:0c.0 e1000 (82540EM Gigabit Ethernet Controller (QEMU Virtual Machine))
:00:0d.0 e1000 (82540EM Gigabit Ethernet Controller (QEMU Virtual Machine))
[root at hn14vm3 tools]# lsmod |grep uio
igb_uio13224  0
uio19259  1 igb_uio
[root at hn14vm3 tools]# lsmod |grep vfio
vfio_pci   36735  0
vfio_iommu_type1   17632  0
vfio   25291  2 vfio_iommu_type1,vfio_pci
[root at hn14vm3 tools]# driverctl set-override :00:03.0 vfio_pci
driverctl: failed to bind device :00:03.0 to driver vfio_pci
[root at hn14vm3 tools]# driverctl set-override :00:03.0 igb_uio
[root at hn14vm3 tools]# driverctl -v list-devices | grep -i net
:00:03.0 igb_uio [*] (X540 Ethernet Controller Virtual Function)
:00:08.0 ixgbevf (X540 Ethernet Controller Virtual Function)
:00:09.0 ixgbevf (X540 Ethernet Controller Virtual Function)
:00:0b.0 ixgbevf (X540 Ethernet Controller Virtual Function)
:00:0c.0 e1000 (82540EM Gigabit Ethernet Controller (QEMU Virtual Machine))
:00:0d.0 e1000 (82540EM Gigabit Ethernet Controller (QEMU Virtual Machine))

Here are some info about host and vm.
OS, -- both host and VM , redhat enterprise 7.2.  Dell R630 with intel 10GB, 
(and one 1GB interface).
Dpdk version -- dpdk-16.04.
VM has 4 cores and 10G RAM.
VM created  with virt-manager , used the default setting except the "IDE Disk 
1/Advanced options/Performance options/Cache mode chose writeback"

Any pointers would be appreciated
Thanks a lot for your time.





Tao Ding



[dpdk-dev] [PATCH v2 5/7] eal/linux: mmap ioports on ppc64

2016-05-23 Thread Olivier Matz
Hi Yuanhan,

On 05/23/2016 03:07 PM, Yuanhan Liu wrote:
> On Tue, May 17, 2016 at 05:54:01PM +0200, David Marchand wrote:
>>> +pci_uio_ioport_map(struct rte_pci_device *dev, int bar,
>>> +  struct rte_pci_ioport *p)
>>> +{
>>> +   FILE *f;
>>> +   char buf[BUFSIZ];
>>> +   char filename[PATH_MAX];
>>> +   uint64_t phys_addr, end_addr, flags;
>>> +   int fd, i;
>>> +   void *addr;
>>> +
>>> +   /* open and read addresses of the corresponding resource in sysfs */
>>> +   snprintf(filename, sizeof(filename), "%s/" PCI_PRI_FMT "/resource",
>>> +   SYSFS_PCI_DEVICES, dev->addr.domain, dev->addr.bus,
>>> +dev->addr.devid, dev->addr.function);
>>> +   f = fopen(filename, "r");
>>> +   if (f == NULL) {
>>> +   RTE_LOG(ERR, EAL, "Cannot open sysfs resource: %s\n",
>>> +   strerror(errno));
>>> +   return -1;
>>> +   }
>>> +   for (i = 0; i < bar + 1; i++) {
>>> +   if (fgets(buf, sizeof(buf), f) == NULL) {
>>> +   RTE_LOG(ERR, EAL, "Cannot read sysfs resource\n");
>>> +   goto error;
>>> +   }
>>> +   }
>>> +   if (pci_parse_one_sysfs_resource(buf, sizeof(buf), &phys_addr,
>>> +   &end_addr, &flags) < 0)
>>> +   goto error;
>>> +   if ((flags & IORESOURCE_IO) == 0) {
>>> +   RTE_LOG(ERR, EAL, "BAR %d is not an IO resource\n", bar);
>>> +   goto error;
>>> +   }
>>> +   snprintf(filename, sizeof(filename), "%s/" PCI_PRI_FMT 
>>> "/resource%d",
>>> +   SYSFS_PCI_DEVICES, dev->addr.domain, dev->addr.bus,
>>> +dev->addr.devid, dev->addr.function, bar);
>>> +
>>> +   /* mmap the pci resource */
>>> +   fd = open(filename, O_RDWR);
>>> +   if (fd < 0) {
>>> +   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
>>> +   strerror(errno));
>>> +   goto error;
>>> +   }
>>> +   addr = mmap(NULL, end_addr + 1, PROT_READ | PROT_WRITE,
>>> +   MAP_SHARED, fd, 0);
>>
>> Sorry, did not catch it in v1, but a close(fd) is missing here.
>> With this, I think the patchset looks good.
>>
>> Just missing some opinion from the virtio maintainers ?
> 
> Apologize for being late for review. Assuming you have done proper
> test, this patch set looks good to me. (well, I don't quite like
> the tons of "#ifdef ... #else ..#end" block though)
> 
> A side note is that I noticed an ABI breakage introduced in this
> patch, so, this release is not a good fit?

Thank you for the review.

For reference, here is the report of the ABI checker for EAL:

[?] struct rte_pci_ioport (2)

 1 Field len has been added to this type.
   1) This field will not be initialized by old clients.
   2) Size of the inclusive type has been changed.
  NOTE: this field should be accessed only from the new library
  functions, otherwise it may result in crash or incorrect behavior
  of applications.
 2 Size of this type has been changed from 16 bytes to 24 bytes.
   The fields or parameters of such data type may be incorrectly
   initialized or accessed by old client applications.

[?] affected symbols (4)
 rte_eal_pci_ioport_map ( struct rte_pci_device* dev, int bar,
struct rte_pci_ioport* p ) @@ DPDK_16.04
 3rd parameter 'p' (pointer) has base type 'struct rte_pci_ioport'.
 rte_eal_pci_ioport_read ( struct rte_pci_ioport* p, void* data,
size_t len, off_t offset ) @@ DPDK_16.04
 1st parameter 'p' (pointer) has base type 'struct rte_pci_ioport'.
 rte_eal_pci_ioport_unmap ( struct rte_pci_ioport* p ) @@ DPDK_16.04
 1st parameter 'p' (pointer) has base type 'struct rte_pci_ioport'.
 rte_eal_pci_ioport_write ( struct rte_pci_ioport* p, void const* data,
size_t len, off_t offset ) @@ DPDK_16.04
 1st parameter 'p' (pointer) has base type 'struct rte_pci_ioport'.


My understanding of the comment for this structure is that it's
internal to EAL:

/**
 * A structure used to access io resources for a pci device.
 * rte_pci_ioport is arch, os, driver specific, and should not be used
outside
 * of pci ioport api.
 */
struct rte_pci_ioport {
 ...
}

So I'd say it's ok to have it integrated for 16.07.

Regards,
Olivier


[dpdk-dev] [PATCH] virtio: check mbuf is direct when using any layout

2016-05-23 Thread Yuanhan Liu
On Mon, May 09, 2016 at 06:19:35PM +0200, Olivier Matz wrote:
> The commit dd856dfcb9e74 introduced an optimization that prepends virtio
> header to mbuf data. It can be used when the tx mbuf is writeable, so we
> need to check that the mbuf is direct (i.e. it embeds its own data).
> 
> Fixes: dd856dfcb9e74 "virtio: use any layout on Tx"

Missing "()" here. Fixed and applied to dpdk-next-virtio.

Thanks.

--yliu


[dpdk-dev] [PATCH 4/4] pmd_hw_support.py: Add tool to query binaries for hw support information

2016-05-23 Thread Neil Horman
On Mon, May 23, 2016 at 02:56:18PM +0300, Panu Matilainen wrote:
> On 05/20/2016 05:06 PM, Neil Horman wrote:
> [...]
> > Look, I think we're simply not going to agree on this issue at all.  What 
> > about
> > this in the way of compromise.  I simply am not comfortable with 
> > automatically
> > trying to guess what hardware support will exist in an application based on 
> > the
> > transient contents of a plugin directory, because of all the reasons we've
> > already gone over, but I do understand the desire to get information about 
> > what
> > _might_ be automatically loaded for an application.  what if we added a 
> > 'plugin
> > mode' to pmdinfo. In this mode you would dpecify a dpdk installation 
> > directory
> > and an appropriate mode option.  When specified pmdinfo would scan 
> > librte_eal in
> > the specified directory, looking for an exported json string that informs 
> > us of
> > the configured plugin directory.  If found, we iterate through all the 
> > libraries
> > there displaying hw support.  That allows you to query the plugin directory 
> > for
> > available hardware support, while not implying that the application is
> > guaranteed to get it (because you didn't specifically state on the command 
> > line
> > that you wanted to know about the applications hardware support).
> 
> That brings it all to one tiny step away from what I've been asking: have
> the plugin mode automatically locate librte_eal from an executable. So I'm
> not quite sure where the compromise is supposed to be here :)
The compromise is that I'm not willing to quietly assume that a given
application linked to the dpdk library in /usr/lib64/dpdk-, will get
hardware support for the cxgb4, mlx5 and ixgbe pmds, because those DSO's are in
the exported RTE_EAL_PMD_PATH.  With this method, you at least have to tell the
pmdinfo application that I wish to scan that path for pmds and report on
hardware support for whatever is found there.  Thats a different operation from
getting a report on what hardware an application supports.  i.e. its the
difference between asking the questions:

"What hardware does the application /usr/bin/foo support"
and
"What hardware support is available via the plugin DSO's pointed to by the dpdk
version in /usr/lib64/dpdk-2.2"

I feel its important for users to understand that autoloading doesn't not
guarantee support for the hardware that is autoloaded to any application.  My
compromise is to provide what your asking for, but doing so in a way that
attempts to make that differentiation clear.

> 
> I do appreciate wanting to differentiate between "physically" linked-in and
> runtime-loaded hardware support, they obviously *are* different from a
> technical POV. But its also a difference an average user might not even know
> about or understand, let alone care about, they just want to know "will it
> work?"
> 

Which average user are we talking about here?  Keep in mind the DPDK is
anything but mainstream.  Its a toolkit to implement high speed network
communications for niche or custom purpose applications.  The users of tools
like this are going to be people who understand the nuance of how
applications are built and want to tune them to work at their most efficient
point, not some person just trying to get firefox to connect to digg.  I think
its reasonable to assume that people who are running the tool have sufficient
knoweldge to understand that DSO's and static binaries may embody hardware
support differently, and that things which are loaded at run time may not be
reported at scan time (by way of corressponding example, though its not perfect,
people seem to understand that iscsid supports lots of different HBA's, but the
specific hardware support for those HBA's isn't codified in /usr/sbin/iscsid,
but rather in the modules under /lib/modules/   - Panu -
> 
> 


[dpdk-dev] [PATCH v2] doc: add known issue with EAL argv

2016-05-23 Thread Jingjing Wu
This patch docs the issue on EAL argument that the last EAL
argument is replaced by program name in argv[].

Reported-by: Ziye Yang 
Signed-off-by: Jingjing Wu 
---
 doc/guides/rel_notes/known_issues.rst | 21 +
 1 file changed, 21 insertions(+)

diff --git a/doc/guides/rel_notes/known_issues.rst 
b/doc/guides/rel_notes/known_issues.rst
index 923a202..8ca2a86 100644
--- a/doc/guides/rel_notes/known_issues.rst
+++ b/doc/guides/rel_notes/known_issues.rst
@@ -618,3 +618,24 @@ DPDK may not build on some Intel CPUs using clang < 3.7.0

 **Driver/Module**:
Environment Abstraction Layer (EAL).
+
+
+The last EAL argument is replaced by the program name in argv[]
+---
+
+**Description**:
+   The last EAL argument is replaced by program name in ``argv[]`` after 
``eal_parse_args`` is called.
+   This is the intended behavior but it causes the pointer to the last EAL 
argument to be lost.
+
+**Implication**:
+  If the last EAL argument in ``argv[]`` is generated by a malloc function, 
changing it will cause memory
+  issues when freeing the argument.
+
+**Resolution/Workaround**:
+   An application should not consider the value in ``argv[]`` as unchanged.
+
+**Affected Environment/Platform**:
+   ALL.
+
+**Driver/Module**:
+   Environment Abstraction Layer (EAL).
-- 
2.4.0



[dpdk-dev] [PATCH] crypto: fix control issues in aesni pmd

2016-05-23 Thread Deepak Kumar Jain
From: "Jain, Deepak K" 

Fix the control issues for return value

Fixes: 924e84f87306 ("aesni_mb: add driver for multi buffer based crypto")
Coverity ID 126585

Signed-off-by: Deepak Kumar Jain 
---
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c 
b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
index 3415ac1..9c42f88 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
@@ -560,7 +560,7 @@ aesni_mb_pmd_enqueue_burst(void *queue_pair, struct 
rte_crypto_op **ops,
goto flush_jobs;
else
qp->stats.enqueued_count += processed_jobs;
-   return i;
+   return i;

 flush_jobs:
/*
-- 
2.5.5



[dpdk-dev] [PATCH] crypto: fix null pointer dereferencing

2016-05-23 Thread Deepak Kumar Jain
From: "Jain, Deepak K" 

Fix null pointer dereferencing by reporing if null and
exiting the function.

Fixes: c0f87eb5252b ("cryptodev: change burst API to be crypto op oriented")
Coverity issue: 126584

Signed-off-by: Deepak Kumar Jain 
---
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c 
b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
index 9c42f88..31784e1 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
@@ -379,9 +379,11 @@ process_crypto_op(struct aesni_mb_qp *qp, struct 
rte_crypto_op *op,
/* append space for output data to mbuf */
char *odata = rte_pktmbuf_append(m_dst,
rte_pktmbuf_data_len(op->sym->m_src));
-   if (odata == NULL)
+   if (odata == NULL) {
MB_LOG_ERR("failed to allocate space in destination "
"mbuf for source data");
+   return NULL;
+   }

memcpy(odata, rte_pktmbuf_mtod(op->sym->m_src, void*),
rte_pktmbuf_data_len(op->sym->m_src));
-- 
2.5.5



[dpdk-dev] [PATCH 1/2] rte_kni: Fix documentation.

2016-05-23 Thread ALeX Wang
Sorry for the delayed reply, just sent V2~

On 18 May 2016 at 03:33, Ferruh Yigit  wrote:

> On 5/14/2016 7:22 PM, Alex Wang wrote:
> > From: Alex Wang 
> >
> > The 'mbufs' alloc/free descriptions for 'rte_kni_tx_burst()'
> > and 'rte_kni_rx_burst()' should be inverted.
> >
> > Signed-off-by: Alex Wang 
> > ---
> >  lib/librte_kni/rte_kni.h | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/librte_kni/rte_kni.h b/lib/librte_kni/rte_kni.h
> > index ef9faa9..25fa45e 100644
> > --- a/lib/librte_kni/rte_kni.h
> > +++ b/lib/librte_kni/rte_kni.h
> > @@ -161,8 +161,8 @@ extern int rte_kni_handle_request(struct rte_kni
> *kni);
> >  /**
> >   * Retrieve a burst of packets from a KNI interface. The retrieved
> packets are
> >   * stored in rte_mbuf structures whose pointers are supplied in the
> array of
> > - * mbufs, and the maximum number is indicated by num. It handles the
> freeing of
> > - * the mbufs in the free queue of KNI interface.
> > + * mbufs, and the maximum number is indicated by num. It handles
> allocating
> > + * the mbufs for KNI interface alloc queue.
> >   *
> >   * @param kni
> >   *  The KNI interface context.
> > @@ -180,8 +180,8 @@ extern unsigned rte_kni_rx_burst(struct rte_kni *kni,
> >  /**
> >   * Send a burst of packets to a KNI interface. The packets to be sent
> out are
> >   * stored in rte_mbuf structures whose pointers are supplied in the
> array of
> > - * mbufs, and the maximum number is indicated by num. It handles
> allocating
> > - * the mbufs for KNI interface alloc queue.
> > + * mbufs, and the maximum number is indicated by num. It handles the
> freeing of
> > + * the mbufs in the free queue of KNI interface.
> >   *
> >   * @param kni
> >   *  The KNI interface context.
> >
>
> Hi Alex,
>
> Can you please update the patch subject,
> - replace "rte_kni" tag with a "kni",
> - after space start with lowercase,
> - remove the "." at the end of the sentences,
> like:
> "kni: fix documentation"
> (these are defined in
>
> http://dpdk.org/doc/guides/contributing/patches.html#commit-messages-subject-line
> )
>
> Also can you please add a "Fixes" line, more details on:
> http://dpdk.org/doc/guides/contributing/patches.html#commit-messages-body
>
> Although this information converted into documentation, this is not
> really the documentation, and the patch title gives little information,
> if possible can you please add more information while keeping it around
> 50 chars limit.
>
> finally, patch content is OK.
>
> Thanks,
> ferruh
>
>


-- 
Alex Wang,
Open vSwitch developer


[dpdk-dev] [PATCH 1/2 v2] kni: fix inverted function comments

2016-05-23 Thread Ferruh Yigit
On 5/21/2016 8:25 AM, Alex Wang wrote:
> From: Alex Wang 
> 
> The 'mbufs' alloc/free descriptions for
> 'rte_kni_tx_burst()' and 'rte_kni_rx_burst()'
> should be inverted.
> 
> Fixes: 3fc5ca2 (kni: initial import)
> 
> Signed-off-by: Alex Wang 

Acked-by: Ferruh Yigit 



[dpdk-dev] [PATCH 2/2 v2] kni: Add documentation for the mempool capacity

2016-05-23 Thread Ferruh Yigit
On 5/21/2016 8:25 AM, Alex Wang wrote:
> From: Alex Wang 
> 
> Function like 'rte_kni_rx_burst()' keeps
> allocating 'MAX_MBUF_BURST_NUM' mbufs to
> kni fifo queue unless the queue's capacity
> ('KNI_FIFO_COUNT_MAX') is reached.  So, if
> the mempool is under-provisioned, user may
> run into "Out of Memory" logs from KNI code.
> This commit documents the need to provision
> mempool capacity of more than
> "2 x KNI_FIFO_COUNT_MAX" for each KNI interface.
> 
> Signed-off-by: Alex Wang 

Acked-by: Ferruh Yigit 



[dpdk-dev] [PATCH v2] vhost: add support for dynamic vhost PMD creation

2016-05-23 Thread Ferruh Yigit
On 5/23/2016 2:24 PM, Yuanhan Liu wrote:
> On Fri, May 20, 2016 at 11:37:47AM +0100, Bruce Richardson wrote:
>> On Thu, May 19, 2016 at 06:44:44PM +0200, Thomas Monjalon wrote:
>>> 2016-05-19 17:28, Ferruh Yigit:
 On 5/19/2016 9:33 AM, Thomas Monjalon wrote:
> 2016-05-18 18:10, Ferruh Yigit:
>> Add rte_eth_from_vhost() API to create vhost PMD dynamically from
>> applications.
>
> How is it different from rte_eth_dev_attach() calling rte_eal_vdev_init()?
>

 When used rte_eth_dev_attach(), application also needs to do:
 rte_eth_dev_configure()
 rte_eth_rx_queue_setup()
 rte_eth_tx_queue_setup()
 rte_eth_dev_start()

 rte_eth_from_vhost() does these internally, easier to use for applications.
>>>
>>> This argument is not sufficient.
>>> We are not going to add new APIs just for wrapping others.
>>
>> Why not - if there is a sufficient increase in developer usability by doing 
>> so?
>> Having one API that saves an app from having to call 5 other APIs looks like
>> something that should always be given fair consideration.
> 
> Good point. Judging that vhost is not the only virtual device we
> support, and it may also look reasonable to add something similar
> for others in future (say, IIRC, you proposed two more internally
> that also introduced similar APIs). So, instead of introducing a
> new API for each such vdev, may we introduce a common one? Say,
> a refined rte_eth_dev_attach(), including dev_configure(),
> queue_setup(), etc.
> 

This sounds good to me. If there is not objection, I will send a patch
and we can discuss based on patch.
Something like: rte_eth_dev_attach_and_setup()

Regards,
ferruh



[dpdk-dev] [PATCH 2/2 v2] kni: Add documentation for the mempool capacity

2016-05-23 Thread Ferruh Yigit
On 5/23/2016 6:00 PM, Ferruh Yigit wrote:
> On 5/21/2016 8:25 AM, Alex Wang wrote:
>> From: Alex Wang 
>>
>> Function like 'rte_kni_rx_burst()' keeps
>> allocating 'MAX_MBUF_BURST_NUM' mbufs to
>> kni fifo queue unless the queue's capacity
>> ('KNI_FIFO_COUNT_MAX') is reached.  So, if
>> the mempool is under-provisioned, user may
>> run into "Out of Memory" logs from KNI code.
>> This commit documents the need to provision
>> mempool capacity of more than
>> "2 x KNI_FIFO_COUNT_MAX" for each KNI interface.
>>
>> Signed-off-by: Alex Wang 
> 
> Acked-by: Ferruh Yigit 
> 

Hi Alex,

This is detail but I just recognized patch subject after tag starts with
uppercase. Would you mind sending another patch? You can keep my ack
with it.

Thanks,
ferruh


[dpdk-dev] [PATCH v4 4/9] librte_ether: make rte_eth_dev_get_port_by_name api public

2016-05-23 Thread Reshma Pattan
Converted rte_eth_dev_get_port_by_name to public API.

Signed-off-by: Reshma Pattan 
---
 lib/librte_ether/rte_ethdev.c  |  2 +-
 lib/librte_ether/rte_ethdev.h  | 15 +++
 lib/librte_ether/rte_ether_version.map |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 3ee5b9f..a50bb1e 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -427,7 +427,7 @@ rte_eth_dev_get_name_by_port(uint8_t port_id, char *name)
return 0;
 }

-static int
+int
 rte_eth_dev_get_port_by_name(const char *name, uint8_t *port_id)
 {
int i;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 106318f..b20f5cd 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -4283,6 +4283,21 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
  uint32_t mask,
  uint8_t en);

+/**
+* Get the port id from pci adrress or device name
+* Ex: :2:00.0 or vdev name eth_pcap0
+*
+* @param name
+*  pci address or name of the device
+* @param port_id
+*   pointer to port identifier of the device
+* @return
+*   - (0) if successful.
+*   - (-ENODEV) on failure.
+*/
+int
+rte_eth_dev_get_port_by_name(const char *name, uint8_t *port_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ether/rte_ether_version.map 
b/lib/librte_ether/rte_ether_version.map
index d06d648..512f38f 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -137,5 +137,6 @@ DPDK_16.07 {
global:

rte_eth_add_first_rx_callback;
+   rte_eth_dev_get_port_by_name;
rte_eth_dev_info_get;
 } DPDK_16.04;
-- 
2.5.0



[dpdk-dev] [PATCH v4 0/9] add packet capture framework

2016-05-23 Thread Reshma Pattan
This patchset include below changes

1)Changes to librte_ether.
2)New library librte_pdump added for packet capture framework.
3)New app/pdump tool added for packet capturing.
4)Test pmd changes done to initialize packet capture framework.
5)Documentation update.

1)librte_pdump
==
To support packet capturing on dpdk ethernet devices, a new library librte_pdump
is added.Users can develop their own packet capturing application using new 
library APIs.

Operation:
--
Pdump library provides APIs to support packet capturing on dpdk ethernet 
devices.
Library provides APIs to initialize the packet capture framework, enable/disable
the packet capture and un initialize the packet capture framework.

Pdump library works on server and client based model.

Sever is responsible for enabling/disabling the packet captures.
Clients are responsible for requesting enable/disable of the
packet captures.

As part of packet capture framework initialization, pthread and
the server socket is created. Only one server socket is allowed on the system.
As part of enabling/disabling the packet capture, client sockets are created
and multiple client sockets are allowed.
Who ever calls initialization first they will succeed with the initialization,
next subsequent calls of initialization are not allowed. So next users can only
request enabling/disabling the packet capture.

Applications using below APIs need to pass port/device_id, queue, mempool and
ring parameters. Library uses user provided ring and mempool to mirror the rx/tx
packets of the port for users. Users need to deque the rings and write the 
packets
to vdev(pcap/tuntap) to view the packets using any standard tools.

Note:
Mempool and Ring should be mc/mp supportable.
Mempool mbuf size should be big enough to handle the rx/tx packets of a port.

APIs:
-
rte_pdump_init()
rte_pdump_enable()
rte_pdump_enable_by_deviceid()
rte_pdump_disable()
rte_pdump_disable_by_deviceid()
rte_pdump_uninit()

2)app/pdump tool

Tool app/pdump is based on librte_pdump for packet capturing.
This tool by default runs as secondary process, and provides the support for
the command line options for packet capture.

./build/app/dpdk_pdump --
   --pdump '(port= | device_id=),
(queue=),
(rx-dev= |
 tx-dev=),
[ring-size=],
[mbuf-size=],
[total-num-mbufs=]'

Parameters inside the parenthesis represents the mandatory parameters.
Parameters inside the square brackets represents optional parameters.
User has to pass on packet capture parameters under --pdump parameters, 
multiples of
--pdump can be passed to capture packets on different port and queue 
combinations

Operation:
--
*Tool parse the user command line arguments,
creates the mempool, ring and the PCAP PMD vdev with 'tx_stream' as either
of the device passed in rx-dev|tx-dev parameters.

*Then calls the APIs of librte_pdump i.e. 
rte_pdump_enable()/rte_pdump_enable_by_deviceid()
to enable packet capturing on a specific port/device_id and queue by passing on
port|device_id, queue, mempool and ring info.

*Tool runs in while loop to dequeue the packets from the ring and write them to 
pcap device.

*Tool can be stopped using SIGINT, upon which tool calls
rte_pdump_disable()/rte_pdump_disable_by_deviceid() and free the allocated 
resources.

Note:
CONFIG_RTE_LIBRTE_PMD_PCAP flag should be set to yes to compile and run the 
pdump tool.

3)Test-pmd changes
==
Changes are done to test-pmd application to initialize/uninitialize the packet 
capture framework.
So app/pdump tool can be run to see packets of dpdk ports that are used by 
test-pmd.

Similarly any application which needs packet capture should call 
initialize/uninitialize apis of
librate_pdump and use pdump tool to start the capture.

4)Packet capture flow between pdump tool and librte_pdump
=
* Pdump tool (Secondary process) requests packet capture
for specific port|device_id and queue combinations.

*Library in secondary process context creates client socket and communicates
the port|device_id, queue, ring and mempool to server.

*Library initializes server in primary process 'test-pmd' context and serves 
client
request to enable ethernet rxtx call-backs for given port|device_id and queue.?

*Copy the rx/tx packets to passed mempool and enqueue the packets to ring for 
secondary process.

*Pdump tool will dequeue the packets from ring and writes them to PCAPMD vdev,
so ultimately packets will be seen on device passed in rx-dev|tx-dev.

*Once the pdump tool is terminated with SIGINT it will disable packet capturing.

*Library receives the disable packet capture request, communicate the info to 
server,
server will remove the ethernet rxtx call-backs.

*Packet captur

[dpdk-dev] [PATCH v4 1/9] librte_ether: protect add/remove of rxtx callbacks with spinlocks

2016-05-23 Thread Reshma Pattan
Added spinlocks around add/remove logic of rxtx callbacks to
avoid corruption of callback lists in multithreaded context.

Signed-off-by: Reshma Pattan 
---
 lib/librte_ether/rte_ethdev.c | 82 +--
 1 file changed, 40 insertions(+), 42 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a31018e..cf25d8a 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -77,6 +77,12 @@ static uint8_t nb_ports;
 /* spinlock for eth device callbacks */
 static rte_spinlock_t rte_eth_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;

+/* spinlock for add/remove rx callbacks */
+static rte_spinlock_t rte_eth_rx_cb_lock = RTE_SPINLOCK_INITIALIZER;
+
+/* spinlock for add/remove tx callbacks */
+static rte_spinlock_t rte_eth_tx_cb_lock = RTE_SPINLOCK_INITIALIZER;
+
 /* store statistics names and its offset in stats structure  */
 struct rte_eth_xstats_name_off {
char name[RTE_ETH_XSTATS_NAME_SIZE];
@@ -1639,7 +1645,6 @@ rte_eth_dev_set_rx_queue_stats_mapping(uint8_t port_id, 
uint16_t rx_queue_id,
STAT_QMAP_RX);
 }

-
 void
 rte_eth_dev_info_get(uint8_t port_id, struct rte_eth_dev_info *dev_info)
 {
@@ -2925,7 +2930,6 @@ rte_eth_add_rx_callback(uint8_t port_id, uint16_t 
queue_id,
rte_errno = EINVAL;
return NULL;
}
-
struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);

if (cb == NULL) {
@@ -2936,6 +2940,7 @@ rte_eth_add_rx_callback(uint8_t port_id, uint16_t 
queue_id,
cb->fn.rx = fn;
cb->param = user_param;

+   rte_spinlock_lock(&rte_eth_rx_cb_lock);
/* Add the callbacks in fifo order. */
struct rte_eth_rxtx_callback *tail =
rte_eth_devices[port_id].post_rx_burst_cbs[queue_id];
@@ -2948,6 +2953,7 @@ rte_eth_add_rx_callback(uint8_t port_id, uint16_t 
queue_id,
tail = tail->next;
tail->next = cb;
}
+   rte_spinlock_unlock(&rte_eth_rx_cb_lock);

return cb;
 }
@@ -2977,6 +2983,7 @@ rte_eth_add_tx_callback(uint8_t port_id, uint16_t 
queue_id,
cb->fn.tx = fn;
cb->param = user_param;

+   rte_spinlock_lock(&rte_eth_tx_cb_lock);
/* Add the callbacks in fifo order. */
struct rte_eth_rxtx_callback *tail =
rte_eth_devices[port_id].pre_tx_burst_cbs[queue_id];
@@ -2989,6 +2996,7 @@ rte_eth_add_tx_callback(uint8_t port_id, uint16_t 
queue_id,
tail = tail->next;
tail->next = cb;
}
+   rte_spinlock_unlock(&rte_eth_tx_cb_lock);

return cb;
 }
@@ -3007,29 +3015,24 @@ rte_eth_remove_rx_callback(uint8_t port_id, uint16_t 
queue_id,
}

struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-   struct rte_eth_rxtx_callback *cb = dev->post_rx_burst_cbs[queue_id];
-   struct rte_eth_rxtx_callback *prev_cb;
-
-   /* Reset head pointer and remove user cb if first in the list. */
-   if (cb == user_cb) {
-   dev->post_rx_burst_cbs[queue_id] = user_cb->next;
-   return 0;
-   }
-
-   /* Remove the user cb from the callback list. */
-   do {
-   prev_cb = cb;
-   cb = cb->next;
-
+   struct rte_eth_rxtx_callback *cb;
+   struct rte_eth_rxtx_callback **prev_cb;
+   int ret = -EINVAL;
+
+   rte_spinlock_lock(&rte_eth_rx_cb_lock);
+   prev_cb = &dev->post_rx_burst_cbs[queue_id];
+   for (; *prev_cb != NULL; prev_cb = &cb->next) {
+   cb = *prev_cb;
if (cb == user_cb) {
-   prev_cb->next = user_cb->next;
-   return 0;
+   /* Remove the user cb from the callback list. */
+   *prev_cb = cb->next;
+   ret = 0;
+   break;
}
+   }
+   rte_spinlock_unlock(&rte_eth_rx_cb_lock);

-   } while (cb != NULL);
-
-   /* Callback wasn't found. */
-   return -EINVAL;
+   return ret;
 }

 int
@@ -3046,29 +3049,24 @@ rte_eth_remove_tx_callback(uint8_t port_id, uint16_t 
queue_id,
}

struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-   struct rte_eth_rxtx_callback *cb = dev->pre_tx_burst_cbs[queue_id];
-   struct rte_eth_rxtx_callback *prev_cb;
-
-   /* Reset head pointer and remove user cb if first in the list. */
-   if (cb == user_cb) {
-   dev->pre_tx_burst_cbs[queue_id] = user_cb->next;
-   return 0;
-   }
-
-   /* Remove the user cb from the callback list. */
-   do {
-   prev_cb = cb;
-   cb = cb->next;
-
+   int ret = -EINVAL;
+   struct rte_eth_rxtx_callback *cb;
+   struct rte_eth_rxtx_callback **prev_cb;
+
+   rte_spinlock_lock(&rte_eth_tx_cb_lock);
+   prev_cb = &dev->pre_tx_burst_cbs[queue_id];
+   for (; *prev_cb != NULL; prev_

[dpdk-dev] [PATCH v4 9/9] doc: announce ABI change for rte_eth_dev_info structure

2016-05-23 Thread Reshma Pattan
New fields nb_rx_queues and nb_tx_queues will be added to
rte_eth_dev_info structure.
Changes to API rte_eth_dev_info_get() will be done to update
these new fields to rte_eth_dev_info object.

Signed-off-by: Reshma Pattan 
---
 doc/guides/rel_notes/deprecation.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index ad05eba..04316fb 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -57,3 +57,9 @@ Deprecation Notices
   a handle, like the way kernel exposes an fd to user for locating a
   specific file, and to keep all major structures internally, so that
   we are likely to be free from ABI violations in future.
+
+* A librte_ether public structure ``rte_eth_dev_info`` will be changed in 
16.07.
+  The proposed change will add new parameters ``nb_rx_queues``, 
``nb_tx_queues``
+  to the structure. These are the number of queues configured by software.
+  Modification to definition of ``rte_eth_dev_info_get()`` will be done
+  to update new parameters to ``rte_eth_dev_info`` object.
-- 
2.5.0



[dpdk-dev] [PATCH v4 6/9] app/pdump: add pdump tool for packet capturing

2016-05-23 Thread Reshma Pattan
New tool added for packet capturing on dpdk.
This tool supports command line options.
This tool runs as secondary process by default.

Command line supports various parameters to capture
the packets.

User should pass on a)port and queue (or) b)pci address
and queue (or) c)device name and queue to capture
the packets.

Users also need to pass on either pcap file name or
any linux iface, on to which packets captured from dpdk
ports will be sent on for the users to view using tcpdump.

Users have option to capture packets either a) in RX
direction, b)(or) in TX direction c)(or) from both the
directions.

User can pass on ring_size and mempool parameters using
command line, but these are optional parameters.
These are used to create ring and mempool objects for packet
mirroring from primary application to tool. If user doesn't
provide any values, default values will be used internally
for the creation of the ring and mempool.

Signed-off-by: Reshma Pattan 
---
 MAINTAINERS|   1 +
 app/Makefile   |   1 +
 app/pdump/Makefile |  45 +++
 app/pdump/main.c   | 888 +
 4 files changed, 935 insertions(+)
 create mode 100644 app/pdump/Makefile
 create mode 100644 app/pdump/main.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 8656239..ae706b9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -436,6 +436,7 @@ F: doc/guides/sample_app_ug/packet_ordering.rst
 Pdump
 M: Reshma Pattan 
 F: lib/librte_pdump/
+F: app/pdump/

 Hierarchical scheduler
 M: Cristian Dumitrescu 
diff --git a/app/Makefile b/app/Makefile
index 1151e09..c593efa 100644
--- a/app/Makefile
+++ b/app/Makefile
@@ -37,5 +37,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += test-pipeline
 DIRS-$(CONFIG_RTE_TEST_PMD) += test-pmd
 DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += cmdline_test
 DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += proc_info
+DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += pdump

 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/app/pdump/Makefile b/app/pdump/Makefile
new file mode 100644
index 000..96bb4af
--- /dev/null
+++ b/app/pdump/Makefile
@@ -0,0 +1,45 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+APP = dpdk_pdump
+
+CFLAGS += $(WERROR_FLAGS)
+
+# all source are stored in SRCS-y
+
+SRCS-y := main.c
+
+# this application needs libraries first
+DEPDIRS-y += lib
+
+include $(RTE_SDK)/mk/rte.app.mk
diff --git a/app/pdump/main.c b/app/pdump/main.c
new file mode 100644
index 000..f507f3b
--- /dev/null
+++ b/app/pdump/main.c
@@ -0,0 +1,888 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from t

[dpdk-dev] [PATCH v4 3/9] librte_ether: add new fields to rte_eth_dev_info struct

2016-05-23 Thread Reshma Pattan
Add new fields to rte_eth_dev_info struct
New fields nb_rx_queues and nb_tx_queues are added to
rte_eth_dev_info structure.
Changes to API rte_eth_dev_info_get() are done to update
these new fields to rte_eth_dev_info object.

Signed-off-by: Reshma Pattan 
---
 lib/librte_ether/rte_ethdev.c  | 2 ++
 lib/librte_ether/rte_ethdev.h  | 3 +++
 lib/librte_ether/rte_ether_version.map | 1 +
 3 files changed, 6 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 6331759..3ee5b9f 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1666,6 +1666,8 @@ rte_eth_dev_info_get(uint8_t port_id, struct 
rte_eth_dev_info *dev_info)
(*dev->dev_ops->dev_infos_get)(dev, dev_info);
dev_info->pci_dev = dev->pci_dev;
dev_info->driver_name = dev->data->drv_name;
+   dev_info->nb_rx_queues = dev->data->nb_rx_queues;
+   dev_info->nb_tx_queues = dev->data->nb_tx_queues;
 }

 int
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 92b07a9..106318f 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -882,6 +882,9 @@ struct rte_eth_dev_info {
struct rte_eth_desc_lim rx_desc_lim;  /**< RX descriptors limits */
struct rte_eth_desc_lim tx_desc_lim;  /**< TX descriptors limits */
uint32_t speed_capa;  /**< Supported speeds bitmap (ETH_LINK_SPEED_). */
+   /** Configured number of rx/tx queues */
+   uint16_t nb_rx_queues; /**< Number of RX queues. */
+   uint16_t nb_tx_queues; /**< Number of TX queues. */
 };

 /**
diff --git a/lib/librte_ether/rte_ether_version.map 
b/lib/librte_ether/rte_ether_version.map
index c990b04..d06d648 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -137,4 +137,5 @@ DPDK_16.07 {
global:

rte_eth_add_first_rx_callback;
+   rte_eth_dev_info_get;
 } DPDK_16.04;
-- 
2.5.0



[dpdk-dev] [PATCH v4 8/9] doc: update doc for packet capture framework

2016-05-23 Thread Reshma Pattan
Added programmers guide for librte_pdump.
Added sample application guide for app/pdump application.
Updated release note for packet capture framework changes.

Signed-off-by: Reshma Pattan 
---
 MAINTAINERS |   3 +
 doc/guides/prog_guide/index.rst |   1 +
 doc/guides/prog_guide/pdump_library.rst | 106 +
 doc/guides/rel_notes/release_16_07.rst  |  11 +++
 doc/guides/sample_app_ug/index.rst  |   1 +
 doc/guides/sample_app_ug/pdump.rst  | 115 
 6 files changed, 237 insertions(+)
 create mode 100644 doc/guides/prog_guide/pdump_library.rst
 create mode 100644 doc/guides/sample_app_ug/pdump.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index ae706b9..8b00f41 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -437,6 +437,9 @@ Pdump
 M: Reshma Pattan 
 F: lib/librte_pdump/
 F: app/pdump/
+F: doc/guides/prog_guide/pdump_library.rst
+F: doc/guides/sample_app_ug/pdump.rst
+

 Hierarchical scheduler
 M: Cristian Dumitrescu 
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index b862d0c..4caf969 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -71,6 +71,7 @@ Programmer's Guide
 writing_efficient_code
 profile_app
 glossary
+pdump_library


 **Figures**
diff --git a/doc/guides/prog_guide/pdump_library.rst 
b/doc/guides/prog_guide/pdump_library.rst
new file mode 100644
index 000..8d9ef29
--- /dev/null
+++ b/doc/guides/prog_guide/pdump_library.rst
@@ -0,0 +1,106 @@
+..  BSD LICENSE
+Copyright(c) 2016 Intel Corporation. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+.. _Pdump_Library:
+
+pdump Library
+=
+
+The ``pdump`` library provides the framework for the packet capturing on DPDK.
+Library provides the below APIs to initialize the packet capture framework, to 
enable
+or disable the packet capture and to un initialize the packet capture 
framework.
+
+``rte_pdump_init()``:
+This API initializes the packet capture framework.
+
+``rte_pdump_enable()``:
+This API enables the packet capture on a given port and the queue.
+Note: filter option in the API is the place holder for the future enhancements.
+
+``rte_pdump_enable_by_deviceid()``:
+This API enables the packet capture on a given device id(``vdev name or pci 
address``) and the queue.
+Note: filter option in the API is the place holder for the future enhancements.
+
+``rte_pdump_disable()``:
+This API disables the packet capture on a given port and the queue.
+
+``rte_pdump_disable_by_deviceid()``:
+This API disables the packet capture on a given device id(``vdev name or pci 
address``) and the queue.
+
+``rte_pdump_uninit()``:
+This API un initializes the packet capture framework.
+
+
+Operation
+-
+
+The ``pdump`` library works on the server and the client based model. The 
sever is responsible for enabling or
+disabling the packet capture and the clients are responsible to request enable 
or disable the packet capture.
+
+The packet capture framework, as part of it's initialization, creates the 
pthread and creates the server socket in
+the pthread. The application who calls the framework initialization first, 
will have the server socket created and
+the further calls to the framework initialization by same application or other 
applications is not allowed i.e. only
+one server socket is allowed on the system. So the other applications, can 
only r

[dpdk-dev] [PATCH v4 2/9] librte_ether: add new api rte_eth_add_first_rx_callback

2016-05-23 Thread Reshma Pattan
Added new public api rte_eth_add_first_rx_callback to add given
callback as head of list.

Signed-off-by: Reshma Pattan 
---
 lib/librte_ether/rte_ethdev.c  | 35 ++
 lib/librte_ether/rte_ethdev.h  | 27 ++
 lib/librte_ether/rte_ether_version.map |  6 ++
 3 files changed, 68 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index cf25d8a..6331759 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -2959,6 +2959,41 @@ rte_eth_add_rx_callback(uint8_t port_id, uint16_t 
queue_id,
 }

 void *
+rte_eth_add_first_rx_callback(uint8_t port_id, uint16_t queue_id,
+   rte_rx_callback_fn fn, void *user_param)
+{
+#ifndef RTE_ETHDEV_RXTX_CALLBACKS
+   rte_errno = ENOTSUP;
+   return NULL;
+#endif
+   /* check input parameters */
+   if (!rte_eth_dev_is_valid_port(port_id) || fn == NULL ||
+   queue_id >= 
rte_eth_devices[port_id].data->nb_rx_queues) {
+   rte_errno = EINVAL;
+   return NULL;
+   }
+
+   struct rte_eth_rxtx_callback *cb = rte_zmalloc(NULL, sizeof(*cb), 0);
+
+   if (cb == NULL) {
+   rte_errno = ENOMEM;
+   return NULL;
+   }
+
+   cb->fn.rx = fn;
+   cb->param = user_param;
+
+   rte_spinlock_lock(&rte_eth_rx_cb_lock);
+   /* Add the callbacks at fisrt position*/
+   cb->next = rte_eth_devices[port_id].post_rx_burst_cbs[queue_id];
+   rte_smp_wmb();
+   rte_eth_devices[port_id].post_rx_burst_cbs[queue_id] = cb;
+   rte_spinlock_unlock(&rte_eth_rx_cb_lock);
+
+   return cb;
+}
+
+void *
 rte_eth_add_tx_callback(uint8_t port_id, uint16_t queue_id,
rte_tx_callback_fn fn, void *user_param)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2757510..92b07a9 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -3825,6 +3825,33 @@ int rte_eth_dev_get_dcb_info(uint8_t port_id,
 void *rte_eth_add_rx_callback(uint8_t port_id, uint16_t queue_id,
rte_rx_callback_fn fn, void *user_param);

+/*
+* Add a callback that must be called first on packet RX on a given port and 
queue.
+*
+* This API configures a first function to be called for each burst of
+* packets received on a given NIC port queue. The return value is a pointer
+* that can be used to later remove the callback using
+* rte_eth_remove_rx_callback().
+*
+* Multiple functions are called in the order that they are added.
+*
+* @param port_id
+*   The port identifier of the Ethernet device.
+* @param queue_id
+*   The queue on the Ethernet device on which the callback is to be added.
+* @param fn
+*   The callback function
+* @param user_param
+*   A generic pointer parameter which will be passed to each invocation of the
+*   callback function on this port and queue.
+*
+* @return
+*   NULL on error.
+*   On success, a pointer value which can later be used to remove the callback.
+*/
+void *rte_eth_add_first_rx_callback(uint8_t port_id, uint16_t queue_id,
+   rte_rx_callback_fn fn, void *user_param);
+
 /**
  * Add a callback to be called on packet TX on a given port and queue.
  *
diff --git a/lib/librte_ether/rte_ether_version.map 
b/lib/librte_ether/rte_ether_version.map
index 214ecc7..c990b04 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -132,3 +132,9 @@ DPDK_16.04 {
rte_eth_tx_buffer_set_err_callback;

 } DPDK_2.2;
+
+DPDK_16.07 {
+   global:
+
+   rte_eth_add_first_rx_callback;
+} DPDK_16.04;
-- 
2.5.0



[dpdk-dev] [PATCH v4 5/9] lib/librte_pdump: add new library for packet capturing support

2016-05-23 Thread Reshma Pattan
Added new library for packet capturing support.

Added public api rte_pdump_init, applications should call
this as part of their application setup to have packet
capturing framework ready.

Added public api rte_pdump_uninit to un initialize the packet
capturing framework.

Added public apis rte_pdump_enable and rte_pdump_disable to
enable and disable packet capturing on specific port and queue.

Added public apis rte_pdump_enable_by_deviceid and
rte_pdump_disable_by_deviceid to enable and disable packet
capturing on a specific device (pci address or name) and queue.

Signed-off-by: Reshma Pattan 
---
 MAINTAINERS|   4 +
 config/common_base |   5 +
 lib/Makefile   |   1 +
 lib/librte_pdump/Makefile  |  55 +++
 lib/librte_pdump/rte_pdump.c   | 816 +
 lib/librte_pdump/rte_pdump.h   | 186 
 lib/librte_pdump/rte_pdump_version.map |  12 +
 mk/rte.app.mk  |   1 +
 8 files changed, 1080 insertions(+)
 create mode 100644 lib/librte_pdump/Makefile
 create mode 100644 lib/librte_pdump/rte_pdump.c
 create mode 100644 lib/librte_pdump/rte_pdump.h
 create mode 100644 lib/librte_pdump/rte_pdump_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 9dd0738..8656239 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -433,6 +433,10 @@ F: app/test/test_reorder*
 F: examples/packet_ordering/
 F: doc/guides/sample_app_ug/packet_ordering.rst

+Pdump
+M: Reshma Pattan 
+F: lib/librte_pdump/
+
 Hierarchical scheduler
 M: Cristian Dumitrescu 
 F: lib/librte_sched/
diff --git a/config/common_base b/config/common_base
index 3535c6e..259bf0a 100644
--- a/config/common_base
+++ b/config/common_base
@@ -484,6 +484,11 @@ CONFIG_RTE_LIBRTE_DISTRIBUTOR=y
 CONFIG_RTE_LIBRTE_REORDER=y

 #
+# Compile the pdump library
+#
+CONFIG_RTE_LIBRTE_PDUMP=y
+
+#
 # Compile librte_port
 #
 CONFIG_RTE_LIBRTE_PORT=y
diff --git a/lib/Makefile b/lib/Makefile
index f254dba..ca7c02f 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -57,6 +57,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PORT) += librte_port
 DIRS-$(CONFIG_RTE_LIBRTE_TABLE) += librte_table
 DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
 DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
+DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump

 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_pdump/Makefile b/lib/librte_pdump/Makefile
new file mode 100644
index 000..af81a28
--- /dev/null
+++ b/lib/librte_pdump/Makefile
@@ -0,0 +1,55 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_pdump.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+CFLAGS += -D_GNU_SOURCE
+
+EXPORT_MAP := rte_pdump_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_PDUMP) := rte_pdump.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_PDUMP)-include := rte_pdump.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += lib/librte_ether
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
new file mode 100644
index 000..

[dpdk-dev] [PATCH v4 7/9] app/test-pmd: add pdump initialization uninitialization

2016-05-23 Thread Reshma Pattan
Call rte_pdump_init and rte_pdump_uninit for packet
capturing initialization and uninitialization.

Signed-off-by: Reshma Pattan 
---
 app/test-pmd/testpmd.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9d11830..645bf50 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -75,6 +75,7 @@
 #ifdef RTE_LIBRTE_PMD_XENVIRT
 #include 
 #endif
+#include 

 #include "testpmd.h"

@@ -2023,6 +2024,8 @@ signal_handler(int signum)
if (signum == SIGINT || signum == SIGTERM) {
printf("\nSignal %d received, preparing to exit...\n",
signum);
+   /* uninitialize packet capture framework */
+   rte_pdump_uninit();
force_quit();
/* exit with the expected status */
signal(signum, SIG_DFL);
@@ -2043,6 +2046,9 @@ main(int argc, char** argv)
if (diag < 0)
rte_panic("Cannot init EAL\n");

+   /* initialize packet capture framework */
+   rte_pdump_init();
+
nb_ports = (portid_t) rte_eth_dev_count();
if (nb_ports == 0)
RTE_LOG(WARNING, EAL, "No probed ethernet devices\n");
-- 
2.5.0



[dpdk-dev] [PATCH v4 3/9] librte_ether: add new fields to rte_eth_dev_info struct

2016-05-23 Thread Stephen Hemminger
On Mon, 23 May 2016 22:38:26 +0100
Reshma Pattan  wrote:

> Add new fields to rte_eth_dev_info struct
> New fields nb_rx_queues and nb_tx_queues are added to
> rte_eth_dev_info structure.
> Changes to API rte_eth_dev_info_get() are done to update
> these new fields to rte_eth_dev_info object.
> 
> Signed-off-by: Reshma Pattan 

This is an ABI break because rte_dev_info_get will clobber the
the stack of the caller if the caller thinks dev_info is old size.


[dpdk-dev] [PATCH 2/2 v2] kni: Add documentation for the mempool capacity

2016-05-23 Thread Alex Wang
Sht, sorry for missing that, sending V3,

On Mon, May 23, 2016 at 10:10 AM, Ferruh Yigit 
wrote:

> On 5/23/2016 6:00 PM, Ferruh Yigit wrote:
> > On 5/21/2016 8:25 AM, Alex Wang wrote:
> >> From: Alex Wang 
> >>
> >> Function like 'rte_kni_rx_burst()' keeps
> >> allocating 'MAX_MBUF_BURST_NUM' mbufs to
> >> kni fifo queue unless the queue's capacity
> >> ('KNI_FIFO_COUNT_MAX') is reached.  So, if
> >> the mempool is under-provisioned, user may
> >> run into "Out of Memory" logs from KNI code.
> >> This commit documents the need to provision
> >> mempool capacity of more than
> >> "2 x KNI_FIFO_COUNT_MAX" for each KNI interface.
> >>
> >> Signed-off-by: Alex Wang 
> >
> > Acked-by: Ferruh Yigit 
> >
>
> Hi Alex,
>
> This is detail but I just recognized patch subject after tag starts with
> uppercase. Would you mind sending another patch? You can keep my ack
> with it.
>
> Thanks,
> ferruh
>


[dpdk-dev] [PATCH v2 00/11] enic counter fixes and Tx optimization

2016-05-23 Thread John Daley
The first 3 patches are related to drop counters. The remaining
patches make up a refactoring, cleanup and optimization of the Tx path.

Changes since v1 are:
 - subject line fixups after running check-git-log.sh. 
 - Errors reported from patchworks fixed. Note: ./scripts/checkpatches.pl was
   run locally before upstreaming v1 and the same errors were not caught.
   The local host had perl version v5.16.3.

John Daley (11):
  enic: fix Rx drop counters
  enic: drop bad packets and remove unused Rx error flag
  enic: count truncated packets
  enic: put Tx and Rx functions into same file
  enic: remove some unused functions in Tx path
  enic: streamline mbuf handling in Tx path
  enic: use Tx completion messages instead of descriptors
  enic: refactor Tx mbuf recycling
  enic: optimize the Tx function
  enic: remove unused files and functions and variables
  enic: add an enic assert macro

 drivers/net/enic/Makefile|   2 +-
 drivers/net/enic/base/enic_vnic_wq.h |  79 --
 drivers/net/enic/base/vnic_cq.h  |  44 
 drivers/net/enic/base/vnic_wq.c  |  80 ++
 drivers/net/enic/base/vnic_wq.h  | 118 ++---
 drivers/net/enic/enic.h  |  47 +++-
 drivers/net/enic/enic_ethdev.c   |  67 +
 drivers/net/enic/enic_main.c | 156 +---
 drivers/net/enic/enic_res.h  |  80 +-
 drivers/net/enic/enic_rx.c   | 351 -
 drivers/net/enic/enic_rxtx.c | 482 +++
 11 files changed, 634 insertions(+), 872 deletions(-)
 delete mode 100644 drivers/net/enic/base/enic_vnic_wq.h
 delete mode 100644 drivers/net/enic/enic_rx.c
 create mode 100644 drivers/net/enic/enic_rxtx.c

-- 
2.7.0



[dpdk-dev] [PATCH v2 01/11] enic: fix Rx drop counters

2016-05-23 Thread John Daley
rx_no_bufs is a hardware counter of packets dropped on the
interface due to no host buffers and should be used to update
r_stats->imissed counter instead of rx_nombuf.

Include rx_drop in ierrors. rx_drop is incremented if packets
arrive when the receive queue is disabled.

Add a structure and functions for initializing and clearing
software counters. Add count of Rx mbuf allocation failures
(rx_nombuf) as the first counter.

Fixes: fefed3d1e62c ("enic: new driver")
Signed-off-by: John Daley 
---
 drivers/net/enic/enic.h  |  7 +++
 drivers/net/enic/enic_main.c | 24 +---
 drivers/net/enic/enic_rx.c   |  5 +
 3 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 09f3853..584d49b 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -91,6 +91,10 @@ struct enic_fdir {
struct enic_fdir_node *nodes[ENICPMD_FDIR_MAX];
 };

+struct enic_soft_stats {
+   rte_atomic64_t rx_nombuf;
+};
+
 /* Per-instance private data structure */
 struct enic {
struct enic *next;
@@ -133,6 +137,9 @@ struct enic {
/* interrupt resource */
struct vnic_intr intr;
unsigned int intr_count;
+
+   /* software counters */
+   struct enic_soft_stats soft_stats;
 };

 static inline unsigned int enic_cq_rq(__rte_unused struct enic *enic, unsigned 
int rq)
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index bbbe660..c002ef3 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -211,15 +211,30 @@ void enic_send_pkt(struct enic *enic, struct vnic_wq *wq,
  0 /*wrid*/);
 }

+static void enic_clear_soft_stats(struct enic *enic)
+{
+   struct enic_soft_stats *soft_stats = &enic->soft_stats;
+   rte_atomic64_clear(&soft_stats->rx_nombuf);
+}
+
+static void enic_init_soft_stats(struct enic *enic)
+{
+   struct enic_soft_stats *soft_stats = &enic->soft_stats;
+   rte_atomic64_init(&soft_stats->rx_nombuf);
+   enic_clear_soft_stats(enic);
+}
+
 void enic_dev_stats_clear(struct enic *enic)
 {
if (vnic_dev_stats_clear(enic->vdev))
dev_err(enic, "Error in clearing stats\n");
+   enic_clear_soft_stats(enic);
 }

 void enic_dev_stats_get(struct enic *enic, struct rte_eth_stats *r_stats)
 {
struct vnic_stats *stats;
+   struct enic_soft_stats *soft_stats;

if (vnic_dev_stats_dump(enic->vdev, &stats)) {
dev_err(enic, "Error in getting stats\n");
@@ -232,12 +247,13 @@ void enic_dev_stats_get(struct enic *enic, struct 
rte_eth_stats *r_stats)
r_stats->ibytes = stats->rx.rx_bytes_ok;
r_stats->obytes = stats->tx.tx_bytes_ok;

-   r_stats->ierrors = stats->rx.rx_errors;
+   r_stats->ierrors = stats->rx.rx_errors + stats->rx.rx_drop;
r_stats->oerrors = stats->tx.tx_errors;

-   r_stats->imissed = stats->rx.rx_drop;
+   r_stats->imissed = stats->rx.rx_no_bufs;

-   r_stats->rx_nombuf = stats->rx.rx_no_bufs;
+   soft_stats = &enic->soft_stats;
+   r_stats->rx_nombuf = rte_atomic64_read(&soft_stats->rx_nombuf);
 }

 void enic_del_mac_address(struct enic *enic)
@@ -795,6 +811,8 @@ int enic_setup_finish(struct enic *enic)
 {
int ret;

+   enic_init_soft_stats(enic);
+
ret = enic_set_rss_nic_cfg(enic);
if (ret) {
dev_err(enic, "Failed to config nic, aborting.\n");
diff --git a/drivers/net/enic/enic_rx.c b/drivers/net/enic/enic_rx.c
index f92f6bc..89c62ce 100644
--- a/drivers/net/enic/enic_rx.c
+++ b/drivers/net/enic/enic_rx.c
@@ -275,10 +275,7 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
/* allocate a new mbuf */
nmb = rte_mbuf_raw_alloc(rq->mp);
if (nmb == NULL) {
-   dev_err(enic, "RX mbuf alloc failed port=%u qid=%u",
-   enic->port_id, (unsigned)rq->index);
-   rte_eth_devices[enic->port_id].
-   data->rx_mbuf_alloc_failed++;
+   rte_atomic64_inc(&enic->soft_stats.rx_nombuf);
break;
}

-- 
2.7.0



[dpdk-dev] [PATCH v2 03/11] enic: count truncated packets

2016-05-23 Thread John Daley
Truncated packets occur on enic if an mbuf is not big enough to
receive it or there aren't enough mbufs if rx scatter is in use.
They show up as error packets but unlike other error packets (like
packets bad FCS) there are no nic drop counts incremented for them.
Truncated packets are calculated by subtracting hardware errors from
software errors. Note: this causes transient inaccuracies in the
ipackets count. Also, the length of truncated packets are counted
in ibytes even though truncated packets are dropped which can make
ibytes be slightly higher than it should be.

Signed-off-by: Nelson Escobar 
Signed-off-by: John Daley 
---
 drivers/net/enic/enic.h  |  1 +
 drivers/net/enic/enic_main.c | 21 +
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 584d49b..9b6f349 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -93,6 +93,7 @@ struct enic_fdir {

 struct enic_soft_stats {
rte_atomic64_t rx_nombuf;
+   rte_atomic64_t rx_packet_errors;
 };

 /* Per-instance private data structure */
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index c002ef3..e4ccc7d 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -215,12 +215,14 @@ static void enic_clear_soft_stats(struct enic *enic)
 {
struct enic_soft_stats *soft_stats = &enic->soft_stats;
rte_atomic64_clear(&soft_stats->rx_nombuf);
+   rte_atomic64_clear(&soft_stats->rx_packet_errors);
 }

 static void enic_init_soft_stats(struct enic *enic)
 {
struct enic_soft_stats *soft_stats = &enic->soft_stats;
rte_atomic64_init(&soft_stats->rx_nombuf);
+   rte_atomic64_init(&soft_stats->rx_packet_errors);
enic_clear_soft_stats(enic);
 }

@@ -234,14 +236,26 @@ void enic_dev_stats_clear(struct enic *enic)
 void enic_dev_stats_get(struct enic *enic, struct rte_eth_stats *r_stats)
 {
struct vnic_stats *stats;
-   struct enic_soft_stats *soft_stats;
+   struct enic_soft_stats *soft_stats = &enic->soft_stats;
+   int64_t rx_truncated;
+   uint64_t rx_packet_errors;

if (vnic_dev_stats_dump(enic->vdev, &stats)) {
dev_err(enic, "Error in getting stats\n");
return;
}

-   r_stats->ipackets = stats->rx.rx_frames_ok;
+   /* The number of truncated packets can only be calculated by
+* subtracting a hardware counter from error packets received by
+* the driver. Note: this causes transient inaccuracies in the
+* ipackets count. Also, the length of truncated packets are
+* counted in ibytes even though truncated packets are dropped
+* which can make ibytes be slightly higher than it should be.
+*/
+   rx_packet_errors = rte_atomic64_read(&soft_stats->rx_packet_errors);
+   rx_truncated = rx_packet_errors - stats->rx.rx_errors;
+
+   r_stats->ipackets = stats->rx.rx_frames_ok - rx_truncated;
r_stats->opackets = stats->tx.tx_frames_ok;

r_stats->ibytes = stats->rx.rx_bytes_ok;
@@ -250,9 +264,8 @@ void enic_dev_stats_get(struct enic *enic, struct 
rte_eth_stats *r_stats)
r_stats->ierrors = stats->rx.rx_errors + stats->rx.rx_drop;
r_stats->oerrors = stats->tx.tx_errors;

-   r_stats->imissed = stats->rx.rx_no_bufs;
+   r_stats->imissed = stats->rx.rx_no_bufs + rx_truncated;

-   soft_stats = &enic->soft_stats;
r_stats->rx_nombuf = rte_atomic64_read(&soft_stats->rx_nombuf);
 }

-- 
2.7.0



[dpdk-dev] [PATCH v2 02/11] enic: drop bad packets and remove unused Rx error flag

2016-05-23 Thread John Daley
Following the discussions from:
http://dpdk.org/ml/archives/dev/2015-July/021721.html
http://dpdk.org/ml/archives/dev/2016-April/038143.html

Remove the unused flag from enic driver. Also, the enic driver is
modified to drop bad packets.

Signed-off-by: Olivier Matz 
Signed-off-by: John Daley 
---
 drivers/net/enic/enic_rx.c | 39 +--
 1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/drivers/net/enic/enic_rx.c b/drivers/net/enic/enic_rx.c
index 89c62ce..c72a80a 100644
--- a/drivers/net/enic/enic_rx.c
+++ b/drivers/net/enic/enic_rx.c
@@ -134,20 +134,15 @@ enic_cq_rx_desc_n_bytes(struct cq_desc *cqd)
 }

 static inline uint8_t
-enic_cq_rx_to_pkt_err_flags(struct cq_desc *cqd, uint64_t *pkt_err_flags_out)
+enic_cq_rx_check_err(struct cq_desc *cqd)
 {
struct cq_enet_rq_desc *cqrd = (struct cq_enet_rq_desc *)cqd;
uint16_t bwflags;
-   int ret = 0;
-   uint64_t pkt_err_flags = 0;

bwflags = enic_cq_rx_desc_bwflags(cqrd);
-   if (unlikely(enic_cq_rx_desc_packet_error(bwflags))) {
-   pkt_err_flags = PKT_RX_MAC_ERR;
-   ret = 1;
-   }
-   *pkt_err_flags_out = pkt_err_flags;
-   return ret;
+   if (unlikely(enic_cq_rx_desc_packet_error(bwflags)))
+   return 1;
+   return 0;
 }

 /*
@@ -243,7 +238,7 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
struct enic *enic = vnic_dev_priv(rq->vdev);
unsigned int rx_id;
struct rte_mbuf *nmb, *rxmb;
-   uint16_t nb_rx = 0;
+   uint16_t nb_rx = 0, nb_err = 0;
uint16_t nb_hold;
struct vnic_cq *cq;
volatile struct cq_desc *cqd_ptr;
@@ -259,7 +254,6 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
volatile struct rq_enet_desc *rqd_ptr;
dma_addr_t dma_addr;
struct cq_desc cqd;
-   uint64_t ol_err_flags;
uint8_t packet_error;

/* Check for pkts available */
@@ -280,7 +274,7 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
}

/* A packet error means descriptor and data are untrusted */
-   packet_error = enic_cq_rx_to_pkt_err_flags(&cqd, &ol_err_flags);
+   packet_error = enic_cq_rx_check_err(&cqd);

/* Get the mbuf to return and replace with one just allocated */
rxmb = rq->mbuf_ring[rx_id];
@@ -307,20 +301,21 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
rqd_ptr->length_type = cpu_to_le16(nmb->buf_len
   - RTE_PKTMBUF_HEADROOM);

+   /* Drop incoming bad packet */
+   if (unlikely(packet_error)) {
+   rte_pktmbuf_free(rxmb);
+   nb_err++;
+   continue;
+   }
+
/* Fill in the rest of the mbuf */
rxmb->data_off = RTE_PKTMBUF_HEADROOM;
rxmb->nb_segs = 1;
rxmb->next = NULL;
rxmb->port = enic->port_id;
-   if (!packet_error) {
-   rxmb->pkt_len = enic_cq_rx_desc_n_bytes(&cqd);
-   rxmb->packet_type = enic_cq_rx_flags_to_pkt_type(&cqd);
-   enic_cq_rx_to_pkt_flags(&cqd, rxmb);
-   } else {
-   rxmb->pkt_len = 0;
-   rxmb->packet_type = 0;
-   rxmb->ol_flags = 0;
-   }
+   rxmb->pkt_len = enic_cq_rx_desc_n_bytes(&cqd);
+   rxmb->packet_type = enic_cq_rx_flags_to_pkt_type(&cqd);
+   enic_cq_rx_to_pkt_flags(&cqd, rxmb);
rxmb->data_len = rxmb->pkt_len;

/* prefetch mbuf data for caller */
@@ -331,7 +326,7 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
rx_pkts[nb_rx++] = rxmb;
}

-   nb_hold += nb_rx;
+   nb_hold += nb_rx + nb_err;
cq->to_clean = rx_id;

if (nb_hold > rq->rx_free_thresh) {
-- 
2.7.0



[dpdk-dev] [PATCH v2 11/11] enic: add an enic assert macro

2016-05-23 Thread John Daley
Add an ASSERT macro for the enic driver which is enabled when the log
level is >= RTE_LOG_DEBUG. Assert that number of mbufs to return to
the pool in the Tx function is never greater than the max allowed.

Signed-off-by: John Daley 
---
 drivers/net/enic/enic.h  | 12 
 drivers/net/enic/enic_rxtx.c |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 4eb28ee..4b76e6d 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -187,6 +187,18 @@ enic_ring_incr(uint32_t n_descriptors, uint32_t idx)
return idx;
 }

+#if RTE_LOG_LEVEL >= RTE_LOG_DEBUG
+#define ENIC_ASSERT(cond)  \
+   do {\
+   if (unlikely(!(cond))) {\
+   rte_panic("line %d\tassert \"" #cond "\""   \
+   "failed\n", __LINE__);  \
+   }   \
+   } while (0)
+#else
+#define ENIC_ASSERT(cond) do {} while (0)
+#endif
+
 extern void enic_fdir_stats_get(struct enic *enic,
struct rte_eth_fdir_stats *stats);
 extern int enic_fdir_add_fltr(struct enic *enic,
diff --git a/drivers/net/enic/enic_rxtx.c b/drivers/net/enic/enic_rxtx.c
index a04ebd0..7527bce 100644
--- a/drivers/net/enic/enic_rxtx.c
+++ b/drivers/net/enic/enic_rxtx.c
@@ -343,6 +343,7 @@ static inline void enic_free_wq_bufs(struct vnic_wq *wq, 
u16 completed_index)
buf = &wq->bufs[tail_idx];
m = (struct rte_mbuf *)(buf->mb);
if (likely(m->pool == pool)) {
+   ENIC_ASSERT(nb_free < ENIC_MAX_WQ_DESCS);
free[nb_free++] = m;
} else {
rte_mempool_put_bulk(pool, (void *)free, nb_free);
-- 
2.7.0



[dpdk-dev] [PATCH v2 05/11] enic: remove some unused functions in Tx path

2016-05-23 Thread John Daley
Functions existed which were never called. Removed them. Also
rename the 'pmd' from the name of the Tx function to improve clarity.

Signed-off-by: John Daley 
---
 drivers/net/enic/base/vnic_wq.h | 45 
 drivers/net/enic/enic.h |  5 ++-
 drivers/net/enic/enic_ethdev.c  |  2 +-
 drivers/net/enic/enic_res.h | 78 -
 drivers/net/enic/enic_rxtx.c|  2 +-
 5 files changed, 4 insertions(+), 128 deletions(-)

diff --git a/drivers/net/enic/base/vnic_wq.h b/drivers/net/enic/base/vnic_wq.h
index c23de62..d8660e4 100644
--- a/drivers/net/enic/base/vnic_wq.h
+++ b/drivers/net/enic/base/vnic_wq.h
@@ -191,51 +191,6 @@ static inline u64 vnic_cached_posted_index(dma_addr_t 
addr, unsigned int len,
PI_PREFETCH_ADDR_MASK) << PI_PREFETCH_ADDR_OFF);
 }

-static inline void vnic_wq_post(struct vnic_wq *wq,
-   void *os_buf, dma_addr_t dma_addr,
-   unsigned int len, int sop, int eop,
-   uint8_t desc_skip_cnt, uint8_t cq_entry,
-   uint8_t compressed_send, uint64_t wrid)
-{
-   struct vnic_wq_buf *buf = wq->to_use;
-
-   buf->sop = sop;
-   buf->cq_entry = cq_entry;
-   buf->compressed_send = compressed_send;
-   buf->desc_skip_cnt = desc_skip_cnt;
-   buf->os_buf = os_buf;
-   buf->dma_addr = dma_addr;
-   buf->len = len;
-   buf->wr_id = wrid;
-
-   buf = buf->next;
-   if (eop) {
-#ifdef DO_PREFETCH
-   uint64_t wr = vnic_cached_posted_index(dma_addr, len,
-   buf->index);
-#endif
-   /* Adding write memory barrier prevents compiler and/or CPU
-* reordering, thus avoiding descriptor posting before
-* descriptor is initialized. Otherwise, hardware can read
-* stale descriptor fields.
-*/
-   wmb();
-#ifdef DO_PREFETCH
-   /* Intel chipsets seem to limit the rate of PIOs that we can
-* push on the bus.  Thus, it is very important to do a single
-* 64 bit write here.  With two 32-bit writes, my maximum
-* pkt/sec rate was cut almost in half. -AJF
-*/
-   iowrite64((uint64_t)wr, &wq->ctrl->posted_index);
-#else
-   iowrite32(buf->index, &wq->ctrl->posted_index);
-#endif
-   }
-   wq->to_use = buf;
-
-   wq->ring.desc_avail -= desc_skip_cnt;
-}
-
 static inline void vnic_wq_service(struct vnic_wq *wq,
struct cq_desc *cq_desc, u16 completed_index,
void (*buf_service)(struct vnic_wq *wq,
diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 62a8c12..5b58a65 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -207,8 +207,7 @@ extern int enic_clsf_init(struct enic *enic);
 extern void enic_clsf_destroy(struct enic *enic);
 uint16_t enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);
-
-uint16_t enicpmd_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
-  uint16_t nb_pkts);
+uint16_t enic_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+  uint16_t nb_pkts);
 void enic_free_wq_buf(__rte_unused struct vnic_wq *wq, struct vnic_wq_buf 
*buf);
 #endif /* _ENIC_H_ */
diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index fab8124..697ff82 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -577,7 +577,7 @@ static int eth_enicpmd_dev_init(struct rte_eth_dev *eth_dev)
enic->rte_dev = eth_dev;
eth_dev->dev_ops = &enicpmd_eth_dev_ops;
eth_dev->rx_pkt_burst = &enic_recv_pkts;
-   eth_dev->tx_pkt_burst = &enicpmd_xmit_pkts;
+   eth_dev->tx_pkt_burst = &enic_xmit_pkts;

pdev = eth_dev->pci_dev;
rte_eth_copy_pci_info(eth_dev, pdev);
diff --git a/drivers/net/enic/enic_res.h b/drivers/net/enic/enic_res.h
index 00fa71d..955db71 100644
--- a/drivers/net/enic/enic_res.h
+++ b/drivers/net/enic/enic_res.h
@@ -57,84 +57,6 @@

 #define ENIC_SETTING(enic, f) ((enic->config.flags & VENETF_##f) ? 1 : 0)

-static inline void enic_queue_wq_desc_ex(struct vnic_wq *wq,
-   void *os_buf, dma_addr_t dma_addr, unsigned int len,
-   unsigned int mss_or_csum_offset, unsigned int hdr_len,
-   int vlan_tag_insert, unsigned int vlan_tag,
-   int offload_mode, int cq_entry, int sop, int eop, int loopback)
-{
-   struct wq_enet_desc *desc = vnic_wq_next_desc(wq);
-   u8 desc_skip_cnt = 1;
-   u8 compressed_send = 0;
-   u64 wrid = 0;
-
-   wq_enet_desc_enc(desc,
-   (u64)dma_addr | VNIC_PADDR_TARGET,
-   (u16)len,
-   (u16)mss_or_csum_offset,
-   (u16)hdr_len, (u8)offload_mode,
-   (u8)eop, (u8)cq_entry,
-   0, /* fcoe_encap */
-   (u8)vlan_tag_insert,
-   (u16)vlan_tag,
- 

[dpdk-dev] [PATCH v2 09/11] enic: optimize the Tx function

2016-05-23 Thread John Daley
Reduce host CPU overhead of Tx packet processing:
* Use local variables inside per packet loop instead of fields in structs.
* Factor book keeping and conditionals out of the per packet loop where
  possible.
* Post buffers to the nic at a maximum of every 64 packets

Signed-off-by: Nelson Escobar 
Signed-off-by: John Daley 
---
 drivers/net/enic/base/vnic_wq.h |   1 +
 drivers/net/enic/enic_res.h |   2 +-
 drivers/net/enic/enic_rxtx.c| 167 +++-
 3 files changed, 83 insertions(+), 87 deletions(-)

diff --git a/drivers/net/enic/base/vnic_wq.h b/drivers/net/enic/base/vnic_wq.h
index 689b81c..7a66813 100644
--- a/drivers/net/enic/base/vnic_wq.h
+++ b/drivers/net/enic/base/vnic_wq.h
@@ -67,6 +67,7 @@ struct vnic_wq_ctrl {

 /* 16 bytes */
 struct vnic_wq_buf {
+   struct rte_mempool *pool;
void *mb;
 };

diff --git a/drivers/net/enic/enic_res.h b/drivers/net/enic/enic_res.h
index 955db71..3c8e303 100644
--- a/drivers/net/enic/enic_res.h
+++ b/drivers/net/enic/enic_res.h
@@ -53,7 +53,7 @@

 #define ENIC_NON_TSO_MAX_DESC  16
 #define ENIC_DEFAULT_RX_FREE_THRESH32
-#define ENIC_TX_POST_THRESH(ENIC_MIN_WQ_DESCS / 2)
+#define ENIC_TX_XMIT_MAX   64

 #define ENIC_SETTING(enic, f) ((enic->config.flags & VENETF_##f) ? 1 : 0)

diff --git a/drivers/net/enic/enic_rxtx.c b/drivers/net/enic/enic_rxtx.c
index ec8d90a..ba15670 100644
--- a/drivers/net/enic/enic_rxtx.c
+++ b/drivers/net/enic/enic_rxtx.c
@@ -374,114 +374,109 @@ unsigned int enic_cleanup_wq(__rte_unused struct enic 
*enic, struct vnic_wq *wq)
return 0;
 }

-void enic_post_wq_index(struct vnic_wq *wq)
-{
-   enic_vnic_post_wq_index(wq);
-}
-
-void enic_send_pkt(struct enic *enic, struct vnic_wq *wq,
-  struct rte_mbuf *tx_pkt, unsigned short len,
-  uint8_t sop, uint8_t eop, uint8_t cq_entry,
-  uint16_t ol_flags, uint16_t vlan_tag)
-{
-   struct wq_enet_desc *desc, *descs;
-   uint16_t mss = 0;
-   uint8_t vlan_tag_insert = 0;
-   uint64_t bus_addr = (dma_addr_t)
-   (tx_pkt->buf_physaddr + tx_pkt->data_off);
-
-   descs = (struct wq_enet_desc *)wq->ring.descs;
-   desc = descs + wq->head_idx;
-
-   if (sop) {
-   if (ol_flags & PKT_TX_VLAN_PKT)
-   vlan_tag_insert = 1;
-
-   if (enic->hw_ip_checksum) {
-   if (ol_flags & PKT_TX_IP_CKSUM)
-   mss |= ENIC_CALC_IP_CKSUM;
-
-   if (ol_flags & PKT_TX_TCP_UDP_CKSUM)
-   mss |= ENIC_CALC_TCP_UDP_CKSUM;
-   }
-   }
-
-   wq_enet_desc_enc(desc,
-   bus_addr,
-   len,
-   mss,
-   0 /* header_length */,
-   0 /* offload_mode WQ_ENET_OFFLOAD_MODE_CSUM */,
-   eop,
-   cq_entry,
-   0 /* fcoe_encap */,
-   vlan_tag_insert,
-   vlan_tag,
-   0 /* loopback */);
-
-   enic_vnic_post_wq(wq, (void *)tx_pkt, cq_entry);
-}
-
 uint16_t enic_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts)
 {
uint16_t index;
-   unsigned int frags;
-   unsigned int pkt_len;
-   unsigned int seg_len;
-   unsigned int inc_len;
+   unsigned int pkt_len, data_len;
unsigned int nb_segs;
-   struct rte_mbuf *tx_pkt, *next_tx_pkt;
+   struct rte_mbuf *tx_pkt;
struct vnic_wq *wq = (struct vnic_wq *)tx_queue;
struct enic *enic = vnic_dev_priv(wq->vdev);
unsigned short vlan_id;
unsigned short ol_flags;
-   uint8_t last_seg, eop;
-   unsigned int host_tx_descs = 0;
+   unsigned int wq_desc_avail;
+   int head_idx;
+   struct vnic_wq_buf *buf;
+   unsigned int hw_ip_cksum_enabled;
+   unsigned int desc_count;
+   struct wq_enet_desc *descs, *desc_p, desc_tmp;
+   uint16_t mss;
+   uint8_t vlan_tag_insert;
+   uint8_t eop;
+   uint64_t bus_addr;

+   enic_cleanup_wq(enic, wq);
+   wq_desc_avail = vnic_wq_desc_avail(wq);
+   head_idx = wq->head_idx;
+   desc_count = wq->ring.desc_count;
+
+   nb_pkts = RTE_MIN(nb_pkts, ENIC_TX_XMIT_MAX);
+
+   hw_ip_cksum_enabled = enic->hw_ip_checksum;
for (index = 0; index < nb_pkts; index++) {
tx_pkt = *tx_pkts++;
-   inc_len = 0;
nb_segs = tx_pkt->nb_segs;
-   if (nb_segs > vnic_wq_desc_avail(wq)) {
+   if (nb_segs > wq_desc_avail) {
if (index > 0)
-   enic_post_wq_index(wq);
-
-   /* wq cleanup and try again */
-   if (!enic_cleanup_wq(enic, wq) ||
-   (nb_segs > vnic_wq_desc_avail(wq))) {
-   return index;
-   }
+

[dpdk-dev] [PATCH v2 04/11] enic: put Tx and Rx functions into same file

2016-05-23 Thread John Daley
Signed-off-by: John Daley 
---
 drivers/net/enic/Makefile  |   2 +-
 drivers/net/enic/enic.h|   3 +
 drivers/net/enic/enic_ethdev.c |  65 --
 drivers/net/enic/enic_main.c   |  82 +--
 drivers/net/enic/enic_rx.c | 343 -
 drivers/net/enic/enic_rxtx.c   | 481 +
 6 files changed, 486 insertions(+), 490 deletions(-)
 delete mode 100644 drivers/net/enic/enic_rx.c
 create mode 100644 drivers/net/enic/enic_rxtx.c

diff --git a/drivers/net/enic/Makefile b/drivers/net/enic/Makefile
index f316274..3926b79 100644
--- a/drivers/net/enic/Makefile
+++ b/drivers/net/enic/Makefile
@@ -53,7 +53,7 @@ VPATH += $(SRCDIR)/src
 #
 SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_main.c
-SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_rx.c
+SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_clsf.c
 SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_res.c
 SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += base/vnic_cq.c
diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 9b6f349..62a8c12 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -208,4 +208,7 @@ extern void enic_clsf_destroy(struct enic *enic);
 uint16_t enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);

+uint16_t enicpmd_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
+  uint16_t nb_pkts);
+void enic_free_wq_buf(__rte_unused struct vnic_wq *wq, struct vnic_wq_buf 
*buf);
 #endif /* _ENIC_H_ */
diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index 6bea940..fab8124 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -519,71 +519,6 @@ static void enicpmd_remove_mac_addr(struct rte_eth_dev 
*eth_dev, __rte_unused ui
enic_del_mac_address(enic);
 }

-
-static uint16_t enicpmd_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
-   uint16_t nb_pkts)
-{
-   uint16_t index;
-   unsigned int frags;
-   unsigned int pkt_len;
-   unsigned int seg_len;
-   unsigned int inc_len;
-   unsigned int nb_segs;
-   struct rte_mbuf *tx_pkt, *next_tx_pkt;
-   struct vnic_wq *wq = (struct vnic_wq *)tx_queue;
-   struct enic *enic = vnic_dev_priv(wq->vdev);
-   unsigned short vlan_id;
-   unsigned short ol_flags;
-   uint8_t last_seg, eop;
-   unsigned int host_tx_descs = 0;
-
-   for (index = 0; index < nb_pkts; index++) {
-   tx_pkt = *tx_pkts++;
-   inc_len = 0;
-   nb_segs = tx_pkt->nb_segs;
-   if (nb_segs > vnic_wq_desc_avail(wq)) {
-   if (index > 0)
-   enic_post_wq_index(wq);
-
-   /* wq cleanup and try again */
-   if (!enic_cleanup_wq(enic, wq) ||
-   (nb_segs > vnic_wq_desc_avail(wq))) {
-   return index;
-   }
-   }
-
-   pkt_len = tx_pkt->pkt_len;
-   vlan_id = tx_pkt->vlan_tci;
-   ol_flags = tx_pkt->ol_flags;
-   for (frags = 0; inc_len < pkt_len; frags++) {
-   if (!tx_pkt)
-   break;
-   next_tx_pkt = tx_pkt->next;
-   seg_len = tx_pkt->data_len;
-   inc_len += seg_len;
-
-   host_tx_descs++;
-   last_seg = 0;
-   eop = 0;
-   if ((pkt_len == inc_len) || !next_tx_pkt) {
-   eop = 1;
-   /* post if last packet in batch or > thresh */
-   if ((index == (nb_pkts - 1)) ||
-  (host_tx_descs > ENIC_TX_POST_THRESH)) {
-   last_seg = 1;
-   host_tx_descs = 0;
-   }
-   }
-   enic_send_pkt(enic, wq, tx_pkt, (unsigned short)seg_len,
- !frags, eop, last_seg, ol_flags, vlan_id);
-   tx_pkt = next_tx_pkt;
-   }
-   }
-
-   enic_cleanup_wq(enic, wq);
-   return index;
-}
-
 static const struct eth_dev_ops enicpmd_eth_dev_ops = {
.dev_configure= enicpmd_dev_configure,
.dev_start= enicpmd_dev_start,
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index e4ccc7d..f41ef86 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -58,7 +58,6 @@
 #include "vnic_cq.h"
 #include "vnic_intr.h"
 #include "vnic_nic.h"
-#include "enic_vnic_wq.h"

 static inline int enic_is_sriov_vf(struct enic *enic)
 {
@@ -104,7 +103,7 @@ void enic_set_hdr_split_

[dpdk-dev] [PATCH v2 06/11] enic: streamline mbuf handling in Tx path

2016-05-23 Thread John Daley
The list of mbufs held by the driver on Tx was allocated in chunks
(a hold-over from the enic kernel mode driver). The structure used
next pointers across chunks which led to cache misses.

Allocate the array used to hold mbufs in flight on Tx with
rte_zmalloc_socket(). Remove unnecessary fields from the structure
and use head and tail pointers instead of next pointers.

Signed-off-by: John Daley 
---
 drivers/net/enic/base/enic_vnic_wq.h | 26 +++--
 drivers/net/enic/base/vnic_wq.c  | 75 +---
 drivers/net/enic/base/vnic_wq.h  | 56 ++-
 drivers/net/enic/enic.h  | 24 
 drivers/net/enic/enic_main.c |  4 +-
 drivers/net/enic/enic_rxtx.c | 23 +++
 6 files changed, 75 insertions(+), 133 deletions(-)

diff --git a/drivers/net/enic/base/enic_vnic_wq.h 
b/drivers/net/enic/base/enic_vnic_wq.h
index b019109..55c5664 100644
--- a/drivers/net/enic/base/enic_vnic_wq.h
+++ b/drivers/net/enic/base/enic_vnic_wq.h
@@ -40,37 +40,21 @@

 static inline void enic_vnic_post_wq_index(struct vnic_wq *wq)
 {
-   struct vnic_wq_buf *buf = wq->to_use;
-
/* Adding write memory barrier prevents compiler and/or CPU
 * reordering, thus avoiding descriptor posting before
 * descriptor is initialized. Otherwise, hardware can read
 * stale descriptor fields.
*/
wmb();
-   iowrite32(buf->index, &wq->ctrl->posted_index);
+   iowrite32(wq->head_idx, &wq->ctrl->posted_index);
 }

 static inline void enic_vnic_post_wq(struct vnic_wq *wq,
-void *os_buf, dma_addr_t dma_addr,
-unsigned int len, int sop,
-uint8_t desc_skip_cnt, uint8_t cq_entry,
-uint8_t compressed_send, uint64_t wrid)
+void *os_buf, uint8_t cq_entry)
 {
-   struct vnic_wq_buf *buf = wq->to_use;
-
-   buf->sop = sop;
-   buf->cq_entry = cq_entry;
-   buf->compressed_send = compressed_send;
-   buf->desc_skip_cnt = desc_skip_cnt;
-   buf->os_buf = os_buf;
-   buf->dma_addr = dma_addr;
-   buf->len = len;
-   buf->wr_id = wrid;
-
-   buf = buf->next;
-   wq->ring.desc_avail -= desc_skip_cnt;
-   wq->to_use = buf;
+   struct vnic_wq_buf *buf = &wq->bufs[wq->head_idx];
+   buf->mb = os_buf;
+   wq->head_idx = enic_ring_incr(wq->ring.desc_count, wq->head_idx);

if (cq_entry)
enic_vnic_post_wq_index(wq);
diff --git a/drivers/net/enic/base/vnic_wq.c b/drivers/net/enic/base/vnic_wq.c
index a3ef417..ab81c7e 100644
--- a/drivers/net/enic/base/vnic_wq.c
+++ b/drivers/net/enic/base/vnic_wq.c
@@ -59,71 +59,30 @@ int vnic_wq_alloc_ring(struct vnic_dev *vdev, struct 
vnic_wq *wq,

 static int vnic_wq_alloc_bufs(struct vnic_wq *wq)
 {
-   struct vnic_wq_buf *buf;
-   unsigned int i, j, count = wq->ring.desc_count;
-   unsigned int blks = VNIC_WQ_BUF_BLKS_NEEDED(count);
-
-   for (i = 0; i < blks; i++) {
-   wq->bufs[i] = kzalloc(VNIC_WQ_BUF_BLK_SZ(count), GFP_ATOMIC);
-   if (!wq->bufs[i])
-   return -ENOMEM;
-   }
-
-   for (i = 0; i < blks; i++) {
-   buf = wq->bufs[i];
-   for (j = 0; j < VNIC_WQ_BUF_BLK_ENTRIES(count); j++) {
-   buf->index = i * VNIC_WQ_BUF_BLK_ENTRIES(count) + j;
-   buf->desc = (u8 *)wq->ring.descs +
-   wq->ring.desc_size * buf->index;
-   if (buf->index + 1 == count) {
-   buf->next = wq->bufs[0];
-   break;
-   } else if (j + 1 == VNIC_WQ_BUF_BLK_ENTRIES(count)) {
-   buf->next = wq->bufs[i + 1];
-   } else {
-   buf->next = buf + 1;
-   buf++;
-   }
-   }
-   }
-
-   wq->to_use = wq->to_clean = wq->bufs[0];
-
+   unsigned int count = wq->ring.desc_count;
+   /* Allocate the mbuf ring */
+   wq->bufs = (struct vnic_wq_buf *)rte_zmalloc_socket("wq->bufs",
+   sizeof(struct vnic_wq_buf) * count,
+   RTE_CACHE_LINE_SIZE, wq->socket_id);
+   wq->head_idx = 0;
+   wq->tail_idx = 0;
+   if (wq->bufs == NULL)
+   return -ENOMEM;
return 0;
 }

 void vnic_wq_free(struct vnic_wq *wq)
 {
struct vnic_dev *vdev;
-   unsigned int i;

vdev = wq->vdev;

vnic_dev_free_desc_ring(vdev, &wq->ring);

-   for (i = 0; i < VNIC_WQ_BUF_BLKS_MAX; i++) {
-   if (wq->bufs[i]) {
-   kfree(wq->bufs[i]);
-   wq->bufs[i] = NULL;
-   }
-   }
-
+   rte_free(wq->bufs);
wq->ctrl = NULL;
 }

-int

[dpdk-dev] [PATCH v2 07/11] enic: use Tx completion messages instead of descriptors

2016-05-23 Thread John Daley
The NIC can either DMA a separate completion message for each
completed send or periodically just DMA an index of the last
completed send. Switch to the second method which improves
cache locality and performance.

Signed-off-by: John Daley 
---
 drivers/net/enic/base/vnic_wq.c |  1 +
 drivers/net/enic/base/vnic_wq.h |  3 +++
 drivers/net/enic/enic_main.c| 43 -
 drivers/net/enic/enic_rxtx.c| 11 +++
 4 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/drivers/net/enic/base/vnic_wq.c b/drivers/net/enic/base/vnic_wq.c
index ab81c7e..cfef1af 100644
--- a/drivers/net/enic/base/vnic_wq.c
+++ b/drivers/net/enic/base/vnic_wq.c
@@ -142,6 +142,7 @@ void vnic_wq_init(struct vnic_wq *wq, unsigned int cq_index,
vnic_wq_init_start(wq, cq_index, 0, 0,
error_interrupt_enable,
error_interrupt_offset);
+   wq->last_completed_index = 0;
 }

 void vnic_wq_error_out(struct vnic_wq *wq, unsigned int error)
diff --git a/drivers/net/enic/base/vnic_wq.h b/drivers/net/enic/base/vnic_wq.h
index a6759f5..fe46bb4 100644
--- a/drivers/net/enic/base/vnic_wq.h
+++ b/drivers/net/enic/base/vnic_wq.h
@@ -38,6 +38,7 @@

 #include "vnic_dev.h"
 #include "vnic_cq.h"
+#include 

 /* Work queue control */
 struct vnic_wq_ctrl {
@@ -79,6 +80,8 @@ struct vnic_wq {
unsigned int tail_idx;
unsigned int pkts_outstanding;
unsigned int socket_id;
+   const struct rte_memzone *cqmsg_rz;
+   uint16_t last_completed_index;
 };

 static inline unsigned int vnic_wq_desc_avail(struct vnic_wq *wq)
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 5bf5fcf..eaa206e 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -97,7 +97,6 @@ enic_rxmbuf_queue_release(struct enic *enic, struct vnic_rq 
*rq)
}
 }

-
 void enic_set_hdr_split_size(struct enic *enic, u16 split_hdr_size)
 {
vnic_set_hdr_split_size(enic->vdev, split_hdr_size);
@@ -235,12 +234,26 @@ void enic_init_vnic_resources(struct enic *enic)
unsigned int error_interrupt_enable = 1;
unsigned int error_interrupt_offset = 0;
unsigned int index = 0;
+   unsigned int cq_idx;

for (index = 0; index < enic->rq_count; index++) {
vnic_rq_init(&enic->rq[index],
enic_cq_rq(enic, index),
error_interrupt_enable,
error_interrupt_offset);
+
+   cq_idx = enic_cq_rq(enic, index);
+   vnic_cq_init(&enic->cq[cq_idx],
+   0 /* flow_control_enable */,
+   1 /* color_enable */,
+   0 /* cq_head */,
+   0 /* cq_tail */,
+   1 /* cq_tail_color */,
+   0 /* interrupt_enable */,
+   1 /* cq_entry_enable */,
+   0 /* cq_message_enable */,
+   0 /* interrupt offset */,
+   0 /* cq_message_addr */);
}

for (index = 0; index < enic->wq_count; index++) {
@@ -248,22 +261,19 @@ void enic_init_vnic_resources(struct enic *enic)
enic_cq_wq(enic, index),
error_interrupt_enable,
error_interrupt_offset);
-   }

-   vnic_dev_stats_clear(enic->vdev);
-
-   for (index = 0; index < enic->cq_count; index++) {
-   vnic_cq_init(&enic->cq[index],
+   cq_idx = enic_cq_wq(enic, index);
+   vnic_cq_init(&enic->cq[cq_idx],
0 /* flow_control_enable */,
1 /* color_enable */,
0 /* cq_head */,
0 /* cq_tail */,
1 /* cq_tail_color */,
0 /* interrupt_enable */,
-   1 /* cq_entry_enable */,
-   0 /* cq_message_enable */,
+   0 /* cq_entry_enable */,
+   1 /* cq_message_enable */,
0 /* interrupt offset */,
-   0 /* cq_message_addr */);
+   (u64)enic->wq[index].cqmsg_rz->phys_addr);
}

vnic_intr_init(&enic->intr,
@@ -507,6 +517,7 @@ void enic_free_wq(void *txq)
struct vnic_wq *wq = (struct vnic_wq *)txq;
struct enic *enic = vnic_dev_priv(wq->vdev);

+   rte_memzone_free(wq->cqmsg_rz);
vnic_wq_free(wq);
vnic_cq_free(&enic->cq[enic->rq_count + wq->index]);
 }
@@ -517,6 +528,8 @@ int enic_alloc_wq(struct enic *enic, uint16_t queue_idx,
int err;
struct vnic_wq *wq = &enic->wq[queue_idx];
unsigned int cq_index = enic_cq_wq(enic, queue_idx);
+   char name[NAME_MAX];
+   static int instance;

wq->socket_id = socket_id;
if (nb_desc) {
@@ -552,6 +565,18 @@ int enic_alloc_wq(struct enic *enic, uint16_t queue_

[dpdk-dev] [PATCH v2 08/11] enic: refactor Tx mbuf recycling

2016-05-23 Thread John Daley
Mbufs were returned to the pool one at a time. Use rte_mempool_put_bulk
instead. There were muiltiple function calls for each buffer returned.
Refactor this code into just 2 functions.

Signed-off-by: John Daley 
---
 drivers/net/enic/base/vnic_wq.h | 27 -
 drivers/net/enic/enic_rxtx.c| 54 ++---
 2 files changed, 35 insertions(+), 46 deletions(-)

diff --git a/drivers/net/enic/base/vnic_wq.h b/drivers/net/enic/base/vnic_wq.h
index fe46bb4..689b81c 100644
--- a/drivers/net/enic/base/vnic_wq.h
+++ b/drivers/net/enic/base/vnic_wq.h
@@ -177,33 +177,6 @@ buf_idx_incr(uint32_t n_descriptors, uint32_t idx)
return idx;
 }

-static inline void vnic_wq_service(struct vnic_wq *wq,
-   struct cq_desc *cq_desc, u16 completed_index,
-   void (*buf_service)(struct vnic_wq *wq,
-   struct cq_desc *cq_desc, struct vnic_wq_buf *buf, void *opaque),
-   void *opaque)
-{
-   struct vnic_wq_buf *buf;
-   unsigned int to_clean = wq->tail_idx;
-
-   buf = &wq->bufs[to_clean];
-   while (1) {
-
-   (*buf_service)(wq, cq_desc, buf, opaque);
-
-   wq->ring.desc_avail++;
-
-
-   to_clean = buf_idx_incr(wq->ring.desc_count, to_clean);
-
-   if (to_clean == completed_index)
-   break;
-
-   buf = &wq->bufs[to_clean];
-   }
-   wq->tail_idx = to_clean;
-}
-
 void vnic_wq_free(struct vnic_wq *wq);
 int vnic_wq_alloc(struct vnic_dev *vdev, struct vnic_wq *wq, unsigned int 
index,
unsigned int desc_count, unsigned int desc_size);
diff --git a/drivers/net/enic/enic_rxtx.c b/drivers/net/enic/enic_rxtx.c
index 2a54333..ec8d90a 100644
--- a/drivers/net/enic/enic_rxtx.c
+++ b/drivers/net/enic/enic_rxtx.c
@@ -326,33 +326,49 @@ enic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
return nb_rx;
 }

-static void enic_wq_free_buf(struct vnic_wq *wq,
-   __rte_unused struct cq_desc *cq_desc,
-   struct vnic_wq_buf *buf,
-   __rte_unused void *opaque)
+static inline void enic_free_wq_bufs(struct vnic_wq *wq, u16 completed_index)
 {
-   enic_free_wq_buf(wq, buf);
-}
-
-static int enic_wq_service(struct vnic_dev *vdev, struct cq_desc *cq_desc,
-   __rte_unused u8 type, u16 q_number, u16 completed_index, void *opaque)
-{
-   struct enic *enic = vnic_dev_priv(vdev);
+   struct vnic_wq_buf *buf;
+   struct rte_mbuf *m, *free[ENIC_MAX_WQ_DESCS];
+   unsigned int nb_to_free, nb_free = 0, i;
+   struct rte_mempool *pool;
+   unsigned int tail_idx;
+   unsigned int desc_count = wq->ring.desc_count;
+
+   nb_to_free = enic_ring_sub(desc_count, wq->tail_idx, completed_index)
+  + 1;
+   tail_idx = wq->tail_idx;
+   buf = &wq->bufs[tail_idx];
+   pool = ((struct rte_mbuf *)buf->mb)->pool;
+   for (i = 0; i < nb_to_free; i++) {
+   buf = &wq->bufs[tail_idx];
+   m = (struct rte_mbuf *)(buf->mb);
+   if (likely(m->pool == pool)) {
+   free[nb_free++] = m;
+   } else {
+   rte_mempool_put_bulk(pool, (void *)free, nb_free);
+   free[0] = m;
+   nb_free = 1;
+   pool = m->pool;
+   }
+   tail_idx = enic_ring_incr(desc_count, tail_idx);
+   buf->mb = NULL;
+   }

-   vnic_wq_service(&enic->wq[q_number], cq_desc,
-   completed_index, enic_wq_free_buf,
-   opaque);
+   rte_mempool_put_bulk(pool, (void **)free, nb_free);

-   return 0;
+   wq->tail_idx = tail_idx;
+   wq->ring.desc_avail += nb_to_free;
 }

-unsigned int enic_cleanup_wq(struct enic *enic, struct vnic_wq *wq)
+unsigned int enic_cleanup_wq(__rte_unused struct enic *enic, struct vnic_wq 
*wq)
 {
-   u16 completed_index = *((uint32_t *)wq->cqmsg_rz->addr) & 0x;
+   u16 completed_index;
+
+   completed_index = *((uint32_t *)wq->cqmsg_rz->addr) & 0x;

if (wq->last_completed_index != completed_index) {
-   enic_wq_service(enic->vdev, NULL, 0, wq->index,
-   completed_index, NULL);
+   enic_free_wq_bufs(wq, completed_index);
wq->last_completed_index = completed_index;
}
return 0;
-- 
2.7.0



[dpdk-dev] [PATCH v2 10/11] enic: remove unused files and functions and variables

2016-05-23 Thread John Daley
Remove some files, functions and variables left unused after
Tx performance improvements.

Signed-off-by: John Daley 
---
 drivers/net/enic/base/enic_vnic_wq.h | 63 
 drivers/net/enic/base/vnic_cq.h  | 44 -
 drivers/net/enic/base/vnic_wq.c  |  4 +--
 drivers/net/enic/base/vnic_wq.h  |  6 +---
 drivers/net/enic/enic.h  |  1 -
 drivers/net/enic/enic_main.c |  2 +-
 drivers/net/enic/enic_rxtx.c |  1 -
 7 files changed, 4 insertions(+), 117 deletions(-)
 delete mode 100644 drivers/net/enic/base/enic_vnic_wq.h

diff --git a/drivers/net/enic/base/enic_vnic_wq.h 
b/drivers/net/enic/base/enic_vnic_wq.h
deleted file mode 100644
index 55c5664..000
--- a/drivers/net/enic/base/enic_vnic_wq.h
+++ /dev/null
@@ -1,63 +0,0 @@
-/*
- * Copyright 2008-2015 Cisco Systems, Inc.  All rights reserved.
- * Copyright 2007 Nuova Systems, Inc.  All rights reserved.
- *
- * Copyright (c) 2015, Cisco Systems, Inc.
- * All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- *
- * 1. Redistributions of source code must retain the above copyright
- * notice, this list of conditions and the following disclaimer.
- *
- * 2. Redistributions in binary form must reproduce the above copyright
- * notice, this list of conditions and the following disclaimer in
- * the documentation and/or other materials provided with the
- * distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
- * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
- * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
- * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
- * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
- * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
- * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
- * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
- * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
- * POSSIBILITY OF SUCH DAMAGE.
- *
- */
-
-#ifndef _ENIC_VNIC_WQ_H_
-#define _ENIC_VNIC_WQ_H_
-
-#include "vnic_dev.h"
-#include "vnic_cq.h"
-
-static inline void enic_vnic_post_wq_index(struct vnic_wq *wq)
-{
-   /* Adding write memory barrier prevents compiler and/or CPU
-* reordering, thus avoiding descriptor posting before
-* descriptor is initialized. Otherwise, hardware can read
-* stale descriptor fields.
-   */
-   wmb();
-   iowrite32(wq->head_idx, &wq->ctrl->posted_index);
-}
-
-static inline void enic_vnic_post_wq(struct vnic_wq *wq,
-void *os_buf, uint8_t cq_entry)
-{
-   struct vnic_wq_buf *buf = &wq->bufs[wq->head_idx];
-   buf->mb = os_buf;
-   wq->head_idx = enic_ring_incr(wq->ring.desc_count, wq->head_idx);
-
-   if (cq_entry)
-   enic_vnic_post_wq_index(wq);
-}
-
-#endif /* _ENIC_VNIC_WQ_H_ */
diff --git a/drivers/net/enic/base/vnic_cq.h b/drivers/net/enic/base/vnic_cq.h
index 922391b..13ab87c 100644
--- a/drivers/net/enic/base/vnic_cq.h
+++ b/drivers/net/enic/base/vnic_cq.h
@@ -90,50 +90,6 @@ struct vnic_cq {
 #endif
 };

-static inline unsigned int vnic_cq_service(struct vnic_cq *cq,
-   unsigned int work_to_do,
-   int (*q_service)(struct vnic_dev *vdev, struct cq_desc *cq_desc,
-   u8 type, u16 q_number, u16 completed_index, void *opaque),
-   void *opaque)
-{
-   struct cq_desc *cq_desc;
-   unsigned int work_done = 0;
-   u16 q_number, completed_index;
-   u8 type, color;
-   struct rte_mbuf **rx_pkts = opaque;
-   unsigned int ret;
-
-   cq_desc = (struct cq_desc *)((u8 *)cq->ring.descs +
-   cq->ring.desc_size * cq->to_clean);
-   cq_desc_dec(cq_desc, &type, &color,
-   &q_number, &completed_index);
-
-   while (color != cq->last_color) {
-   if (opaque)
-   opaque = (void *)&(rx_pkts[work_done]);
-
-   ret = (*q_service)(cq->vdev, cq_desc, type,
-   q_number, completed_index, opaque);
-   cq->to_clean++;
-   if (cq->to_clean == cq->ring.desc_count) {
-   cq->to_clean = 0;
-   cq->last_color = cq->last_color ? 0 : 1;
-   }
-
-   cq_desc = (struct cq_desc *)((u8 *)cq->ring.descs +
-   cq->ring.desc_size * cq->to_clean);
-   cq_desc_dec(cq_desc, &type, &color,
-   &q_number, &completed_index);
-
-   if (ret)
-   work_done++;
-   if (work_done >= work_to_do)
-