Re: [PATCH v2] kni: fix possible alloc_q starvation when mbufs are exhausted

2022-11-11 Thread Matt
On Thu, Nov 10, 2022 at 12:39 AM Stephen Hemminger <
step...@networkplumber.org> wrote:

> On Wed,  9 Nov 2022 14:04:34 +0800
> Yangchao Zhou  wrote:
>
> > In some scenarios, mbufs returned by rte_kni_rx_burst are not freed
> > immediately, so kni_allocate_mbufs may fail without us knowing.
> >
> > Even worse, when alloc_q is completely exhausted, kni_net_tx in
> > rte_kni.ko will drop all tx packets. kni_allocate_mbufs is never
> > called again, even if the mbufs are eventually freed.
> >
> > In this patch, we always try to allocate mbufs for alloc_q.
> >
> > Don't worry about alloc_q being allocated too many mbufs, in fact,
> > the old logic will gradually fill up alloc_q.
> > Also, the cost of more calls to kni_allocate_mbufs should be acceptable.
> >
> > Fixes: 3e12a98fe397 ("kni: optimize Rx burst")
> > Cc: hem...@freescale.com
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Yangchao Zhou 
>
> Since fifo_get returning 0 (no buffers) is very common, would this
> change impact performance?
>
It does add a little cost, but there is no extra mbuf allocation
or deallocation.

>
> If the problem is pool draining might be better to make the pool
> bigger.
>
Yes, using a larger pool can avoid this problem. But it may waste
resources, and calculating the full requirement is a challenge for
developers, as it involves the mempool caching mechanism, the IP fragment
cache, the ARP cache, NIC TX queues, other transit queues, etc.

Mbuf allocation failures can also occur in many NIC drivers, but
there a failed allocation leaves the descriptor in place, so it can
be recovered by a later retry.
KNI currently has no such take-out-and-recover mechanism.
We could consider implementing something similar to the NIC drivers,
but that would mean more changes and other overheads.


Re: [PATCH v3] kni: fix possible alloc_q starvation when mbufs are exhausted

2023-01-04 Thread Matt
Hi Ferruh,

In my case, the traffic is not large, so I can't see the impact.
I also tested under high load (>2 Mpps with 2 DPDK cores and 2 kernel threads)
and found no significant difference in performance either.
I think the reason is that 'kni_fifo_count(kni->alloc_q) == 0' is
never hit under high load.

On Tue, Jan 3, 2023 at 8:47 PM Ferruh Yigit  wrote:

> On 12/30/2022 4:23 AM, Yangchao Zhou wrote:
> > In some scenarios, mbufs returned by rte_kni_rx_burst are not freed
> > immediately, so kni_allocate_mbufs may fail without us knowing.
> >
> > Even worse, when alloc_q is completely exhausted, kni_net_tx in
> > rte_kni.ko will drop all tx packets. kni_allocate_mbufs is never
> > called again, even if the mbufs are eventually freed.
> >
> > In this patch, we try to allocate mbufs for alloc_q when it is empty.
> >
> > According to historical experience, the performance bottleneck of KNI
> > is often the usleep_range call in the rte_kni.ko thread.
> > The check of kni_fifo_count is trivial and the cost should be acceptable.
> >
>
> Hi Yangchao,
>
> Are you observing any performance impact with this change in your use case?
>
>
> > Fixes: 3e12a98fe397 ("kni: optimize Rx burst")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Yangchao Zhou 
> > ---
> >  lib/kni/rte_kni.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/kni/rte_kni.c b/lib/kni/rte_kni.c
> > index 8ab6c47153..bfa6a001ff 100644
> > --- a/lib/kni/rte_kni.c
> > +++ b/lib/kni/rte_kni.c
> > @@ -634,8 +634,8 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf **mbufs, unsigned int num)
> >  {
> >   unsigned int ret = kni_fifo_get(kni->tx_q, (void **)mbufs, num);
> >
> > - /* If buffers removed, allocate mbufs and then put them into alloc_q */
> > - if (ret)
> > + /* If buffers removed or alloc_q is empty, allocate mbufs and then put them into alloc_q */
> > + if (ret || (kni_fifo_count(kni->alloc_q) == 0))
> >   kni_allocate_mbufs(kni);
> >
> >   return ret;
>
>


[dpdk-dev] segmented recv ixgbevf

2014-11-05 Thread Matt Laswell
Hey Folks,

I ran into the same issue that Alex is describing here, and I wanted to
expand just a little bit on his comments, as the documentation isn't very
clear.

Per the documentation, the two arguments to rte_pktmbuf_pool_init() are a
pointer to the memory pool that contains the newly-allocated mbufs and an
opaque pointer.  The docs are pretty vague about what the opaque pointer
should point to or what its contents mean; all of the examples I looked at
just pass a NULL pointer. The docs for this function describe the opaque
pointer this way:

"A pointer that can be used by the user to retrieve useful information for
mbuf initialization. This pointer comes from the init_arg parameter of
rte_mempool_create()
<http://www.dpdk.org/doc/api/rte__mempool_8h.html#a7dc1d01a45144e3203c36d1800cb8f17>
."

This is a little bit misleading.  Under the covers, rte_pktmbuf_pool_init()
doesn't treat the opaque pointer as a pointer at all.  Rather, it just
converts it to a uint16_t which contains the desired mbuf size.  If it
receives 0 (in other words, if you passed in a NULL pointer), it will use
2048 bytes + RTE_PKTMBUF_HEADROOM.  Hence, incoming jumbo frames will be
segmented into 2K chunks.

Any chance we could get an improvement to the documentation about this
parameter?  It seems as though the opaque pointer isn't a pointer and
probably shouldn't be opaque.

Hope this helps the next person who comes across this behavior.

--
Matt Laswell
infinite io, inc.

On Thu, Oct 30, 2014 at 7:48 AM, Alex Markuze  wrote:

> For posterity.
>
> 1. When using an MTU larger than 2K, it's advised to provide the value
> to rte_pktmbuf_pool_init.
> 2. ixgbevf rounds down ("MBUF size" - RTE_PKTMBUF_HEADROOM) to the
> nearest 1K multiple when deciding on the receive capabilities [buffer
> size] of the buffers in the pool.
> (The SRRCTL register is involved here, for some reason.)
>


[dpdk-dev] A question about hugepage initialization time

2014-12-09 Thread Matt Laswell
Hey Folks,

Our DPDK application deals with very large in memory data structures, and
can potentially use tens or even hundreds of gigabytes of hugepage memory.
During the course of development, we've noticed that as the number of huge
pages increases, the memory initialization time during EAL init gets to be
quite long, lasting several minutes at present.  The growth in init time
doesn't appear to be linear, which is concerning.

This is a minor inconvenience for us and our customers, as memory
initialization makes our boot times a lot longer than it would otherwise
be.  Also, my experience has been that really long operations often are
hiding errors - what you think is merely a slow operation is actually a
timeout of some sort, often due to misconfiguration. This leads to two
questions:

1. Does the long initialization time suggest that there's an error
happening under the covers?
2. If not, is there any simple way that we can shorten memory
initialization time?

Thanks in advance for your insights.

--
Matt Laswell
laswell at infiniteio.com
infinite io, inc.


[dpdk-dev] A question about hugepage initialization time

2014-12-09 Thread Matt Laswell
Hey Everybody,

Thanks for the feedback.  Yeah, we're pretty sure that the amount of memory
we work with is atypical, and we're hitting something that isn't an issue
for most DPDK users.

To clarify, yes, we're using 1GB hugepages, and we set them up via
hugepagesz and hugepages= in our kernel's grub line.  We find that when we
use four 1GB huge pages, eal memory init takes a couple of seconds, which
is no big deal.  When we use 128 1GB pages, though, memory init can take
several minutes.   The concern is that we will very likely use even more
memory in the future.  Our boot time is mostly just a nuisance now;
nonlinear growth in memory init time may transform it into a larger problem.

We've had to disable transparent hugepages due to latency issues with
in-memory databases.  I'll have to look at the possibility of alternative
memset implementations.  Perhaps some profiler time is in my future.

Again, thanks to everybody for the useful information.

--
Matt Laswell
laswell at infiniteio.com
infinite io, inc.

On Tue, Dec 9, 2014 at 1:06 PM, Matthew Hall  wrote:

> On Tue, Dec 09, 2014 at 10:33:59AM -0600, Matt Laswell wrote:
> > Our DPDK application deals with very large in memory data structures, and
> > can potentially use tens or even hundreds of gigabytes of hugepage
> memory.
>
> What you're doing is an unusual use case and this is open source code where
> nobody might have tested and QA'ed this yet.
>
> So my recommendation would be adding some rte_log statements to measure the
> various steps in the process to see what's going on. Also using the Linux
> Perf
> framework to do low-overhead sampling-based profiling, and making sure
> you've
> got everything compiled with debug symbols so you can see what's consuming
> the
> execution time.
>
> You might find that it makes sense to use some custom allocators like
> jemalloc
> alongside of the DPDK allocators, including perhaps "transparent hugepage
> mode" in your process, and some larger page sizes to reduce the number of
> pages.
>
> You can also use these handy kernel options: hugepagesz= hugepages=N .
> This creates guaranteed-contiguous known-good hugepages during boot which
> initialize much more quickly with less trouble and glitches in my
> experience.
>
> https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
> https://www.kernel.org/doc/Documentation/vm/transhuge.txt
>
> There is no one-size-fits-all solution but these are some possibilities.
>
> Good Luck,
> Matthew.
>


[dpdk-dev] Ability to/impact of running with smaller page sizes

2014-06-30 Thread Matt Laswell
Hey Folks,

I'm seeing some design considerations in a project I'm working on that
push me towards the use of smaller memory page sizes.  I'm curious - is it
possible in practical terms to run DPDK without hugepages?  If so, does
anybody have any practical experience (or a back-of-the-envelope estimate)
of how badly such a configuration would hurt performance?  For the sake of
argument, assume that virtually all of the memory being used is in
pre-allocated mempools (e.g. lots of rte_mempool_create(), very little
rte_malloc()).

Thanks in advance for your help.

-- 
Matt Laswell


[dpdk-dev] Ability to/impact of running with smaller page sizes

2014-07-01 Thread Matt Laswell
Thanks everybody,

It sounds as though what I'm looking for may be possible, especially with
1.7, but will require some tweaking and there will most definitely be a
performance hit.  That's great information.  This is still just an
experiment for us, and it's not at all guaranteed that I'm going to move
towards smaller pages, but I very much appreciate the insights.

--
Matt Laswell


On Tue, Jul 1, 2014 at 6:51 AM, Burakov, Anatoly 
wrote:

> Hi Matt,
>
> > I'm curious - is it possible in practical terms to run DPDK without
> hugepages?
>
> Starting with release 1.7.0, support for VFIO was added, which allows
> using DPDK without hugepages at all (including RX/TX rings) via the
> --no-huge command-line parameter. Bear in mind though that you'll have to
> have IOMMU/VT-d enabled (i.e. no VM support, only host-based) and a
> supported kernel version (3.6+) to use VFIO; the memory size will
> be limited to 1G, and it won't work with multiprocess. I don't have any
> performance figures on that unfortunately.
>
> Best regards,
> Anatoly Burakov
> DPDK SW Engineer
>


[dpdk-dev] DPDK with Ubuntu 14.04?

2014-07-10 Thread Matt Laswell
Hey Folks,

I know that official support hasn't moved past Ubuntu 12.04 LTS yet, but
does anybody have any practical experience running with 14.04 LTS?  My team
has run into one compilation error so far with 1.7, but other than that
things look OK at first blush.  I'd like to move my product to 14.04 for a
variety of reasons, but would hate to spend time chasing down subtle
incompatibilities.  I'm guessing we're not the first ones to try this...

Thanks.

--
Matt Laswell
infinite io


[dpdk-dev] DPDK with Ubuntu 14.04?

2014-07-11 Thread Matt Laswell
Thanks Roger,

We saw similar issues with regard to kcompat.h.  Can I ask if you've done
anything beyond the example applications under 14.04?

--
Matt Laswell
infinite io


On Thu, Jul 10, 2014 at 7:07 PM, Wiles, Roger Keith <
keith.wiles at windriver.com> wrote:

>  The one problem I had with 14.04 was the kcompat.h file. It looks like a
> hash routine has changed its arguments. I edited the kcompat.h file and was
> able to change the code to allow DPDK to build. It is not a proper fix, but
> it worked for me.
>
>  lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
>
>  /*  Changed the next line to use (3,13,8) instead of (3,14,0) KeithW
> */
> #if ( LINUX_VERSION_CODE < KERNEL_VERSION(3,13,8) )
> #if (!(RHEL_RELEASE_CODE && RHEL_RELEASE_CODE >=
> RHEL_RELEASE_VERSION(7,0)))
> #ifdef NETIF_F_RXHASH
> #define PKT_HASH_TYPE_L3 0
>
>  *Hope that works.*
>
>  *Keith **Wiles*, Principal Technologist with CTO office, *Wind River*
> mobile 972-213-5533
>
>
>  On Jul 10, 2014, at 5:56 PM, Matt Laswell  wrote:
>
> Hey Folks,
>
> I know that official support hasn't moved past Ubuntu 12.04 LTS yet, but
> does anybody have any practical experience running with 14.04 LTS?  My team
> has run into one compilation error so far with 1.7, but other than that
> things look OK at first blush.  I'd like to move my product to 14.04 for a
> variety of reasons, but would hate to spend time chasing down subtle
> incompatibilities.  I'm guessing we're not the first ones to try this...
>
> Thanks.
>
> --
> Matt Laswell
> infinite io
>
>
>


[dpdk-dev] Question about ASLR

2014-09-05 Thread Matt Laswell
Hey Folks,

A colleague noticed warnings in section 23.3 of the programmer's guide
about the use of address space layout randomization with multiprocess DPDK
applications.  And, upon inspection, it appears that ASLR is enabled on our
target systems.  We've never seen a problem that we could trace back to
ASLR, and we've never seen a warning during EAL memory initialization,
either, which is strange.

Given the choice, we would prefer to keep ASLR for security reasons.  Given
that in our problem domain:
   - We are running a multiprocess DPDK application
   - We run only one DPDK application, which is a single compiled binary
   - We have exactly one process running per logical core
   - We're OK with interrupts coming just to the primary
   - We handle interaction from our control plane via a separate shared
memory space

Is it OK in this circumstance to leave ASLR enabled?  I think it probably
is, but would love to hear reasons why not and/or pitfalls that we need to
avoid.

Thanks in advance.

--
Matt Laswell
*infinite io*


[dpdk-dev] Question about ASLR

2014-09-08 Thread Matt Laswell
Bruce,

That's tremendously helpful.  Thanks for the information.

--
Matt Laswell
*infinite io*


On Sun, Sep 7, 2014 at 2:52 PM, Richardson, Bruce <
bruce.richardson at intel.com> wrote:

> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matt Laswell
> > Sent: Friday, September 05, 2014 7:57 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] Question about ASLR
> >
> > Hey Folks,
> >
> > A colleague noticed warnings in section 23.3 of the programmer's guide
> > about the use of address space layout randomization with multiprocess
> DPDK
> > applications.  And, upon inspection, it appears that ASLR is enabled on
> our
> > target systems.  We've never seen a problem that we could trace back to
> > ASLR, and we've never seen a warning during EAL memory initialization,
> > either, which is strange.
> >
> > Given the choice, we would prefer to keep ASLR for security reasons.
> Given
> > that in our problem domain:
> >- We are running a multiprocess DPDK application
> >- We run only one DPDK application, which is a single compiled binary
> >- We have exactly one process running per logical core
> >- We're OK with interrupts coming just to the primary
> >- We handle interaction from our control plane via a separate shared
> > memory space
> >
> > Is it OK in this circumstance to leave ASLR enabled?  I think it probably
> > is, but would love to hear reasons why not and/or pitfalls that we need
> to
> > avoid.
> >
> > Thanks in advance.
> >
> > --
> > Matt Laswell
> > *infinite io*
>
> Having ASLR enabled will just introduce a small element of uncertainty in
> the application startup process, as the memory mappings used by your app
> will move about from run to run. In certain cases we've seen some of the
> secondary multi-process application examples fail to start at random once
> every few hundred times (IIRC - this was some time back).
> Presumably the chances of the secondary failing to start will vary
> depending on how ASLR has adjusted the memory mappings in the primary.
> So, with ASLR on, we've found occasionally that mappings will fail, in
> which case the solution is really just to retry the app again and ASLR will
> re-randomise it differently and it will likely start. Disabling ASLR gives
> repeatability in this regard - your app will always start successfully - or
> if there is something blocking the memory maps from being replicated -
> always fail to start (in which case you try passing EAL parameters to hint
> the primary process to use different mapping addresses).
>
> In your case, you are not seeing any problems thus far, so likely if
> secondary process startup failures do occur, they should hopefully work
> fine by just trying again! Whether this element of uncertainty is
> acceptable or not is your choice :-). One thing you could try, to find out
> what the issues might be with your app, is to just try running it
> repeatedly in a script, killing it after a couple of seconds. This should
> tell you how often, if ever, initialization failures are to be expected
> when using ASLR.
>
> Hope this helps,
> Regards,
> /Bruce
>


[dpdk-dev] Beyond DPDK 2.0

2015-04-24 Thread Matt Laswell
On Fri, Apr 24, 2015 at 12:39 PM, Jay Rolette 
wrote:
>
> I can tell you that if DPDK were GPL-based, my company wouldn't be using
> it. I suspect we wouldn't be the only ones...
>

I want to emphasize this point.  It's unsurprising that Jay and I agree,
since we work together.  But I can say with quite a bit of confidence that
my last employer also would stop using DPDK if it were GPL licensed.   Or,
if they didn't jettison it entirely, they would never move beyond the last
BSD-licensed version.  If you want to incentivize companies to support
DPDK, the first step is to ensure they're using it.  For that reason, GPL
seems like a step in the wrong direction to me.

- Matt


Re: [dpdk-dev] Occasional instability in RSS Hashes/Queues from X540 NIC

2017-07-18 Thread Matt Laswell
Hi Qiming,

That's fantastic news.  Thank you very much for taking the time to figure
the issue out.

Would it be possible to backport the fix to the 16.11 LTS release?   This
kind of problem seems tailor-made for LTS.

--
Matt Laswell
lasw...@infinite.io


On Tue, Jul 18, 2017 at 3:58 AM, Yang, Qiming  wrote:

> Hi Matt,
>
> We can reproduce this RSS issue on 16.04 but not on 17.02, so this issue
> was fixed in 17.02.
> We suggest using the new version.
>
> Qiming
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Matt Laswell
> > Sent: Friday, May 5, 2017 9:05 PM
> > To: dev@dpdk.org
> > Subject: Re: [dpdk-dev] Occasional instability in RSS Hashes/Queues from
> X540
> > NIC
> >
> > On Thu, May 4, 2017 at 1:15 PM, Matt Laswell 
> wrote:
> >
> > > Hey Keith,
> > >
> > > Here is a hexdump of a subset of one of my packet captures.  In this
> > > capture, all of the packets are part of the same TCP connection, which
> > > happens to be NFSv3 traffic. All of them except packet number 6 get
> > > the correct RSS hash and go to the right queue.  Packet number 6 (an
> NFS
> > rename
> > > reply with an NFS error) gets RSS hash 0 and goes to queue 0.
>  Whenever I
> > > repeat this test, the reply to this particular rename attempt always
> > > goes to the wrong core, though it seemingly differs from the rest of
> > > the flow only in layers 4-7.
> > >
> > >  I'll also attach a pcap to this email, in case that's a more
> > > convenient way to interact with the packets.
> > >
> > > --
> > > Matt Laswell
> > > lasw...@infinite.io
> > >
> > >
> > > 16:08:37.093306 IP 10.151.3.81.disclose > 10.151.3.161.nfsd: Flags
> > > [P.], seq 3173509264:3173509380, ack 3244259549, win 580, options
> > > [nop,nop,TS val
> > > 23060466 ecr 490971270], length 116: NFS request xid 2690728524 112
> > > access fh
> > >
> > Unknown/8B6BFEBB0400CFABD1030100DABC0502
> > 01
> > > 00 NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_
> > > ACCESS_EXTEND|NFS_ACCESS_DELETE
> > > 0x:  4500 00a8 6d0f 4000 4006 b121 0a97 0351  E...m.@.@..!...Q
> > > 0x0010:  0a97 03a1 029b 0801 bd27 e890 c15f 78dd  .'..._x.
> > > 0x0020:  8018 0244 1cba  0101 080a 015f dff2  ...D._..
> > > 0x0030:  1d43 a086 8000 0070 a061 424c    .C.p.aBL
> > > 0x0040:   0002 0001 86a3  0003  0004  
> > > 0x0050:   0001  0020 0107 8d2f  0007  .../
> > > 0x0060:  6573 7869 3275 3100      esxi2u1.
> > > 0x0070:   0001        
> > > 0x0080:   0020 8b6b febb 0400  cfab d103  .k..
> > > 0x0090:  0100      dabc 0502  
> > > 0x00a0:  0100   001f  
> > > 16:08:37.095837 IP 10.151.3.161.nfsd > 10.151.3.81.disclose: Flags
> > > [P.], seq 1:125, ack 116, win 28688, options [nop,nop,TS val 490971270
> > > ecr 23060466], length 124: NFS reply xid 2690728524 reply ok 120
> > > access c 001f
> > > 0x:  4500 00b0 1b80 4000 4006 02a9 0a97 03a1  E.@.@...
> > > 0x0010:  0a97 0351 0801 029b c15f 78dd bd27 e904  ...Q._x..'..
> > > 0x0020:  8018 7010 a61a  0101 080a 1d43 a086  ..p..C..
> > > 0x0030:  015f dff2 8000 0078 a061 424c  0001  ._.x.aBL
> > > 0x0040:           
> > > 0x0050:     0001  0002  01ed  
> > > 0x0060:   0003        
> > > 0x0070:   0029    0800  00ff  ...)
> > > 0x0080:   00ff   bbfe 6b8b  0001  ..k.
> > > 0x0090:  03d1 abcf 5908 f554 3272 e4e6 5908 f554  Y..T2r..Y..T
> > > 0x00a0:  3272 e4e6 5908 f554 3365 2612  001f  2r..Y..T3e&.
> > > 16:08:37.096235 IP 10.151.3.81.disclose > 10.151.3.161.nfsd: Flags
> > > [P.], seq 256:372, ack 285, win 589, options [nop,nop,TS val 23060467
> > > ecr 490971270], length 116: NFS request xid 2724282956 112 access fh
> > > Unknown/
> > >
> > 8B6BFEBB0400D0ABD1030100DABC05020100
> > > NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_
> > > ACCESS_EXTEND|NFS_ACCESS_DELETE

[dpdk-dev] Occasional instability in RSS Hashes/Queues from X540 NIC

2017-05-04 Thread Matt Laswell
Hey Folks,

I'm seeing some strange behavior with regard to the RSS hash values in my
applications and was hoping somebody might have some pointers on where to
look.  In my application, I'm using RSS to divide work among multiple
cores, each of which services a single RX queue.  When dealing with a
single long-lived TCP connection, I occasionally see packets going to the
wrong core.   That is, almost all of the packets in the connection go to
core 5 in this case, but every once in a while, one goes to core 0 instead.

Upon further investigation, I find two problems are occurring.  The first
is that problem packets have the RSS hash value in the mbuf incorrectly set
to zero.  They are therefore put in queue zero, where they are read by core
zero.  Other packets from the same connection that occur immediately before
and after the packet in question have the correct hash value and therefore
go to a different core.   The second problem is that we sometimes see
packets in which the RSS hash in the mbuf appears correct, but the packets
are incorrectly put into queue zero.  As with the first, this results in
the wrong core getting the packet.  Either one of these confuses the state
tracking we're doing per-core.

A few details:

   - Using an Intel X540-AT2 NIC and the igb_uio driver
   - DPDK 16.04
   - A particular packet in our workflow always encounters this problem.
   - Retransmissions of the packet in question also encounter the problem
   - The packet is IPv4, with header length of 20 (so no options), no
   fragmentation.
   - The only differences I can see in the IP header between packets that
   get the right hash value and those that get the wrong one are in the IP ID,
   total length, and checksum fields.
   - Using ETH_RSS_IPV4
   - The packet is TCP with about 100 bytes of payload - it's not a jumbo
   or a runt
   - We fill the key in with 0x6d5a to get symmetric hashing of both sides
   of the connection
   - We only configure RSS information at boot; things like the key or
   header fields are not being changed dynamically
   - Traffic load is light when the problem occurs

Is anybody aware of an erratum, either in the NIC or the PMD's configuration
of it, that might explain something like this?  Failing that, if you ran
into this sort of behavior, how would you approach finding the reason for
the error?  Every failure mode I can think of would tend to affect all of
the packets in the connection consistently, even if incorrectly.

Thanks in advance for any ideas.

--
Matt Laswell
lasw...@infinite.io


Re: [dpdk-dev] Occasional instability in RSS Hashes/Queues from X540 NIC

2017-05-04 Thread Matt Laswell
Hey Keith,

Here is a hexdump of a subset of one of my packet captures.  In this
capture, all of the packets are part of the same TCP connection, which
happens to be NFSv3 traffic. All of them except packet number 6 get the
correct RSS hash and go to the right queue.  Packet number 6 (an NFS rename
reply with an NFS error) gets RSS hash 0 and goes to queue 0.   Whenever I
repeat this test, the reply to this particular rename attempt always goes
to the wrong core, though it seemingly differs from the rest of the flow
only in layers 4-7.

 I'll also attach a pcap to this email, in case that's a more convenient
way to interact with the packets.

--
Matt Laswell
lasw...@infinite.io


16:08:37.093306 IP 10.151.3.81.disclose > 10.151.3.161.nfsd: Flags [P.],
seq 3173509264:3173509380, ack 3244259549, win 580, options [nop,nop,TS val
23060466 ecr 490971270], length 116: NFS request xid 2690728524 112 access
fh Unknown/8B6BFEBB0400CFABD1030100DABC05020100
NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_ACCESS_EXTEND|NFS_ACCESS_DELETE
0x:  4500 00a8 6d0f 4000 4006 b121 0a97 0351  E...m.@.@..!...Q
0x0010:  0a97 03a1 029b 0801 bd27 e890 c15f 78dd  .'..._x.
0x0020:  8018 0244 1cba  0101 080a 015f dff2  ...D._..
0x0030:  1d43 a086 8000 0070 a061 424c    .C.p.aBL
0x0040:   0002 0001 86a3  0003  0004  
0x0050:   0001  0020 0107 8d2f  0007  .../
0x0060:  6573 7869 3275 3100      esxi2u1.
0x0070:   0001        
0x0080:   0020 8b6b febb 0400  cfab d103  .k..
0x0090:  0100      dabc 0502  
0x00a0:  0100   001f  
16:08:37.095837 IP 10.151.3.161.nfsd > 10.151.3.81.disclose: Flags [P.],
seq 1:125, ack 116, win 28688, options [nop,nop,TS val 490971270 ecr
23060466], length 124: NFS reply xid 2690728524 reply ok 120 access c 001f
0x:  4500 00b0 1b80 4000 4006 02a9 0a97 03a1  E.@.@...
0x0010:  0a97 0351 0801 029b c15f 78dd bd27 e904  ...Q._x..'..
0x0020:  8018 7010 a61a  0101 080a 1d43 a086  ..p..C..
0x0030:  015f dff2 8000 0078 a061 424c  0001  ._.x.aBL
0x0040:           
0x0050:     0001  0002  01ed  
0x0060:   0003        
0x0070:   0029    0800  00ff  ...)
0x0080:   00ff   bbfe 6b8b  0001  ..k.
0x0090:  03d1 abcf 5908 f554 3272 e4e6 5908 f554  Y..T2r..Y..T
0x00a0:  3272 e4e6 5908 f554 3365 2612  001f  2r..Y..T3e&.
16:08:37.096235 IP 10.151.3.81.disclose > 10.151.3.161.nfsd: Flags [P.],
seq 256:372, ack 285, win 589, options [nop,nop,TS val 23060467 ecr
490971270], length 116: NFS request xid 2724282956 112 access fh
Unknown/8B6BFEBB0400D0ABD1030100DABC05020100
NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_ACCESS_EXTEND|NFS_ACCESS_DELETE
0x:  4500 00a8 6d11 4000 4006 b11f 0a97 0351  E...m.@.@..Q
0x0010:  0a97 03a1 029b 0801 bd27 e990 c15f 79f9  .'..._y.
0x0020:  8018 024d 1cba  0101 080a 015f dff3  ...M._..
0x0030:  1d43 a086 8000 0070 a261 424c    .C.p.aBL
0x0040:   0002 0001 86a3  0003  0004  
0x0050:   0001  0020 0107 8d2f  0007  .../
0x0060:  6573 7869 3275 3100      esxi2u1.
0x0070:   0001        
0x0080:   0020 8b6b febb 0400  d0ab d103  .k..
0x0090:  0100      dabc 0502  
0x00a0:  0100   001f  
16:08:37.098361 IP 10.151.3.161.nfsd > 10.151.3.81.disclose: Flags [P.],
seq 285:409, ack 372, win 28688, options [nop,nop,TS val 490971270 ecr
23060467], length 124: NFS reply xid 2724282956 reply ok 120 access c 001f
0x:  4500 00b0 1b81 4000 4006 02a8 0a97 03a1  E.@.@...
0x0010:  0a97 0351 0801 029b c15f 79f9 bd27 ea04  ...Q._y..'..
0x0020:  8018 7010 ec45  0101 080a 1d43 a086  ..p..E...C..
0x0030:  015f dff3 8000 0078 a261 424c  0001  ._.x.aBL
0x0040:           
0x0050:     0001  0002  01ed  
0x0060:   0004        
0x0070:   0050    0800  00ff  ...P
0x0080:   00ff   bbfe 6b8b  0001  ..k.
0x0090:  03d1 abd0 5908 f554 3536 88ea 5908 f554  Y..T56..Y..T
0x00a0:  3536 88ea 5908 f555 01ff bf76  001f  56..Y..U...v
16:08:37.099013 IP 10.151.3.81.disclose > 10.151.3.161.nfsd: Flags [P.],
seq 652:856, ack 813, win 605, options [nop,nop,TS val 230

Re: [dpdk-dev] Occasional instability in RSS Hashes/Queues from X540 NIC

2017-05-05 Thread Matt Laswell
On Thu, May 4, 2017 at 1:15 PM, Matt Laswell  wrote:

> Hey Keith,
>
> Here is a hexdump of a subset of one of my packet captures.  In this
> capture, all of the packets are part of the same TCP connection, which
> happens to be NFSv3 traffic. All of them except packet number 6 get the
> correct RSS hash and go to the right queue.  Packet number 6 (an NFS rename
> reply with an NFS error) gets RSS hash 0 and goes to queue 0.   Whenever I
> repeat this test, the reply to this particular rename attempt always goes
> to the wrong core, though it seemingly differs from the rest of the flow
> only in layers 4-7.
>
>  I'll also attach a pcap to this email, in case that's a more convenient
> way to interact with the packets.
>
> --
> Matt Laswell
> lasw...@infinite.io
>
>
> 16:08:37.093306 IP 10.151.3.81.disclose > 10.151.3.161.nfsd: Flags [P.],
> seq 3173509264:3173509380, ack 3244259549, win 580, options [nop,nop,TS val
> 23060466 ecr 490971270], length 116: NFS request xid 2690728524 112 access
> fh Unknown/8B6BFEBB0400CFABD1030100DABC05020100
> NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_
> ACCESS_EXTEND|NFS_ACCESS_DELETE
> 0x:  4500 00a8 6d0f 4000 4006 b121 0a97 0351  E...m.@.@..!...Q
> 0x0010:  0a97 03a1 029b 0801 bd27 e890 c15f 78dd  .'..._x.
> 0x0020:  8018 0244 1cba  0101 080a 015f dff2  ...D._..
> 0x0030:  1d43 a086 8000 0070 a061 424c    .C.p.aBL
> 0x0040:   0002 0001 86a3  0003  0004  
> 0x0050:   0001  0020 0107 8d2f  0007  .../
> 0x0060:  6573 7869 3275 3100      esxi2u1.
> 0x0070:   0001        
> 0x0080:   0020 8b6b febb 0400  cfab d103  .k..
> 0x0090:  0100      dabc 0502  
> 0x00a0:  0100   001f  
> 16:08:37.095837 IP 10.151.3.161.nfsd > 10.151.3.81.disclose: Flags [P.],
> seq 1:125, ack 116, win 28688, options [nop,nop,TS val 490971270 ecr
> 23060466], length 124: NFS reply xid 2690728524 reply ok 120 access c 001f
> 0x:  4500 00b0 1b80 4000 4006 02a9 0a97 03a1  E.@.@...
> 0x0010:  0a97 0351 0801 029b c15f 78dd bd27 e904  ...Q._x..'..
> 0x0020:  8018 7010 a61a  0101 080a 1d43 a086  ..p..C..
> 0x0030:  015f dff2 8000 0078 a061 424c  0001  ._.x.aBL
> 0x0040:           
> 0x0050:     0001  0002  01ed  
> 0x0060:   0003        
> 0x0070:   0029    0800  00ff  ...)
> 0x0080:   00ff   bbfe 6b8b  0001  ..k.
> 0x0090:  03d1 abcf 5908 f554 3272 e4e6 5908 f554  Y..T2r..Y..T
> 0x00a0:  3272 e4e6 5908 f554 3365 2612  001f  2r..Y..T3e&.
> 16:08:37.096235 IP 10.151.3.81.disclose > 10.151.3.161.nfsd: Flags [P.],
> seq 256:372, ack 285, win 589, options [nop,nop,TS val 23060467 ecr
> 490971270], length 116: NFS request xid 2724282956 112 access fh Unknown/
> 8B6BFEBB0400D0ABD1030100DABC05020100
> NFS_ACCESS_READ|NFS_ACCESS_LOOKUP|NFS_ACCESS_MODIFY|NFS_
> ACCESS_EXTEND|NFS_ACCESS_DELETE
> 0x:  4500 00a8 6d11 4000 4006 b11f 0a97 0351  E...m.@.@..Q
> 0x0010:  0a97 03a1 029b 0801 bd27 e990 c15f 79f9  .'..._y.
> 0x0020:  8018 024d 1cba  0101 080a 015f dff3  ...M._..
> 0x0030:  1d43 a086 8000 0070 a261 424c    .C.p.aBL
> 0x0040:   0002 0001 86a3  0003  0004  
> 0x0050:   0001  0020 0107 8d2f  0007  .../
> 0x0060:  6573 7869 3275 3100      esxi2u1.
> 0x0070:   0001        
> 0x0080:   0020 8b6b febb 0400  d0ab d103  .k..
> 0x0090:  0100      dabc 0502  
> 0x00a0:  0100   001f  
> 16:08:37.098361 IP 10.151.3.161.nfsd > 10.151.3.81.disclose: Flags [P.],
> seq 285:409, ack 372, win 28688, options [nop,nop,TS val 490971270 ecr
> 23060467], length 124: NFS reply xid 2724282956 reply ok 120 access c 001f
> 0x:  4500 00b0 1b81 4000 4006 02a8 0a97 03a1  E.@.@...
> 0x0010:  0a97 0351 0801 029b c15f 79f9 bd27 ea04  ...Q._y..'..
> 0x0020:  8018 7010 ec45  0101 080a 1d43 a086  ..p..E...C..
> 0x0030:  015f dff3 8000 0078 a261 424c  0001  ._.x.aBL
> 0x0040:           
> 0x0050:     0001  0002  01ed  
> 0x0060:   0004    000

[dpdk-dev] backtracing from within the code

2016-06-27 Thread Matt Laswell
I've done something similar to what's described in the link below.  But
it's worth pointing out that it's using printf() inside a signal handler,
which isn't safe. If your use case is catching SIGSEGV, for example,
solutions built on printf() will usually work, but can deadlock.  One way
around the problem is to call write() directly, passing it stdout's file
descriptor.

For example, I have this in my code:
#define WRITE_STRING(fd, s) write (fd, s, strlen (s))

In my signal handlers, I use the above like this:
WRITE_STRING(STDOUT_FILENO, "Stack trace:\n");

This approach is a little bit more cumbersome to code, but safer.

The last time that I looked, the DPDK rte_dump_stack() was using vfprintf(),
which isn't safe in a signal handler.  However, it's been several DPDK
releases since I peeked at the details.

--
Matt Laswell
Principal Software Engineer
infinite io, inc.
laswell at infinite.io


On Sat, Jun 25, 2016 at 9:07 AM, Rosen, Rami  wrote:

> Hi,
> If you are willing to skip static methods and use the GCC backtrace, you
> can
> try this example (it worked for me, but it was quite a time ago):
> http://www.helicontech.co.il/?id=linuxbt
>
> Regards,
> Rami Rosen
> Intel Corporation
>
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Friday, June 24, 2016 8:46 PM
> To: Thomas Monjalon 
> Cc: Catalin Vasile ; dev at dpdk.org; Dumitrescu,
> Cristian 
> Subject: Re: [dpdk-dev] backtracing from within the code
>
> On Fri, 24 Jun 2016 12:05:26 +0200
> Thomas Monjalon  wrote:
>
> > 2016-06-24 09:25, Dumitrescu, Cristian:
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Catalin Vasile
> > > > I'm trying to add a feature to DPDK and I'm having a hard time
> printing a
> > > > backtrace.
> > > > I tried using this[1] functions for printing, but it does not print
> more than one
> > > > function. Maybe it lacks the symbols it needs.
> > [...]
> > > It eventually calls rte_dump_stack() in file
> lib/lirte_eal/linuxapp/eal/eal_debug.c, which calls backtrace(), which is
> probably what you are looking for.
> >
> > Example:
> > 5: [build/app/testpmd(_start+0x29) [0x416f69]]
> > 4: [/usr/lib/libc.so.6(__libc_start_main+0xf0) [0x7eff3b757610]]
> > 3: [build/app/testpmd(main+0x2ff) [0x416b3f]]
> > 2: [build/app/testpmd(init_port_config+0x88) [0x419a78]]
> > 1: [build/lib/librte_eal.so.2.1(rte_dump_stack+0x18) [0x7eff3c126488]]
> >
> > Please tell us if you have some cases where rte_dump_stack() does not
> work.
> > I do not remember what are the constraints to have it working.
> > Your binary is not stripped?
>
> The GCC backtrace doesn't work well because it can't find static functions.
> I ended up using libunwind to get a better back trace.
>


[dpdk-dev] Appropriate DPDK data structures for TCP sockets

2015-02-23 Thread Matt Laswell
Hey Matthew,

I've mostly worked on stackless systems over the last few years, but I have
done a fair bit of work on high performance, highly scalable connection
tracking data structures.  In that spirit, here are a few counterintuitive
insights I've gained over the years.  Perhaps they'll be useful to you.
Apologies in advance for likely being a bit long-winded.

First, you really need to take cache performance into account when you're
choosing a data structure.  Something like a balanced tree can seem awfully
appealing at first blush, either on its own or as a chaining mechanism for
a hash table.  But the problem with trees is that there really isn't much
locality of reference in your memory use - every single step in your
descent ends up being a cache miss.  This hurts you twice: once that you
end up stalled waiting for the next node in the tree to load from main
memory, and again when you have to reload whatever you pushed out of cache
to get it.

It's often better if, instead of a tree, you do linear search across arrays
of hash values.  It's easy to size the array so that it is exactly one
cache line long, and you can generally do linear search of the whole thing
in less time than it takes to do a single cache line fill.   If you find a
match, you can do full verification against the full tuple as needed.

Second, rather than synchronizing (perhaps with locks, perhaps with
lockless data structures), it's often beneficial to create multiple
threads, each of which holds a fraction of your connection tracking data.
Every connection belongs to a single one of these threads, selected perhaps
by hash or RSS value, and all packets from the connection go through that
single thread.  This approach has a couple of advantages.  First,
obviously, no slowdowns for synchronization.  But, second, I've found that
when you are spreading packets from a single connection across many compute
elements, you're inevitably going to start putting packets out of order.
In many applications, this ultimately leads to some additional processing
to put things back in order, which gives away the performance gains you
achieved.  Of course, this approach brings its own set of complexities, and
challenges for your application, and doesn't always spread the work as
efficiently across all of your cores.  But it might be worth considering.

Third, it's very worthwhile to have a cache for the most recently accessed
connection.  First, because network traffic is bursty, and you'll
frequently see multiple packets from the same connection in succession.
Second, because it can make life easier for your application code.  If you
have multiple places that need to access connection data, you don't have to
worry so much about the cost of repeated searches.  Again, this may or may
not matter for your particular application.  But for ones I've worked on,
it's been a win.

Anyway, as predicted, this post has gone far too long for a Monday
morning.  Regardless, I hope you found it useful.  Let me know if you have
questions or comments.

--
Matt Laswell
infinite io, inc.
laswell at infiniteio.com

On Sun, Feb 22, 2015 at 10:50 PM, Matthew Hall 
wrote:

>
> On Feb 22, 2015, at 4:02 PM, Stephen Hemminger 
> wrote:
> > Use userspace RCU? or BSD RB_TREE
>
> Thanks Stephen,
>
> I think the RB_TREE stuff is single threaded mostly.
>
> But user-space RCU looks quite good indeed, I didn't know somebody ported
> it out of the kernel. I'll check it out.
>
> Matthew.


[dpdk-dev] Question about link up/down events and transmit queues

2015-03-10 Thread Matt Laswell
Hey Folks,

I'm running into an issue that I hope is obvious and simple.  We're running
DPDK 1.6.2 with an 82599 NIC.  We find that if, while running traffic, we
disconnect a port and then later reconnect it, we never regain the ability
to transmit packets out of that port after it comes back up.
Specifically, our calls to rte_eth_tx_burst() get return values that
indicate that no packets could be sent.

Is there an additional step that we have to do on link down/up operations,
perhaps to tell the NIC to flush its descriptor ring?

Thanks in advance for your help.

--
Matt Laswell
*infinite io, inc.*
laswell at infiniteio.com


[dpdk-dev] Question about link up/down events and transmit queues

2015-03-10 Thread Matt Laswell
Just a bit more on this.  We've found that when a link goes down, the TX
descriptor ring appears to fill up with packets fairly quickly, and then
calls to rte_eth_tx_burst() start returning zero.  Our application handles
this case, and frees the mbufs that could not be sent.

However, when link is reestablished, the TX descriptor ring appears to stay
full.  Hence, subsequent calls to rte_eth_tx_burst() continue to return
zero, and we continue to free the mbufs without sending them.  Frankly,
this was surprising, as I had assumed that the TX descriptor ring would
be emptied when the link came back up, either by sending the enqueued
packets, or by reinitializing.

I've tried calling rte_eth_dev_start() and rte_eth_promiscuous_enable() in
order to restart everything.  That appears to work, at least on the
combination of drivers that I tested with.  Can somebody please tell me
whether this is the preferred way to recover from link down?
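For concreteness, what I'm doing now looks roughly like this (against 1.6-era ethdev calls; the stop before the start is my assumption about what some PMDs need, and error handling is trimmed):

```c
#include <rte_ethdev.h>

/* Hedged sketch: recover a port after link-down filled the TX ring.
 * Stopping and restarting re-runs the PMD's queue setup, which is
 * what appears to clear the stuck TX descriptor ring. */
static void port_relink(uint8_t port_id)
{
    rte_eth_dev_stop(port_id);
    if (rte_eth_dev_start(port_id) != 0)
        return;                       /* caller decides how to handle */
    rte_eth_promiscuous_enable(port_id);
}
```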

Thanks,

--
Matt Laswell
*infinite io, inc.*
laswell at infiniteio.com


On Tue, Mar 10, 2015 at 10:47 AM, Matt Laswell 
wrote:

> Hey Folks,
>
> I'm running into an issue that I hope is obvious and simple.  We're
> running DPDK 1.6.2 with an 82599 NIC.  We find that if, while running
> traffic, we disconnect a port and then later reconnect it, we never regain
> the ability to transmit packets out of that port after it comes back up.
> Specifically, our calls to rte_eth_tx_burst() get return values that
> indicate that no packets could be sent.
>
> Is there an additional step that we have to do on link down/up operations,
> perhaps to tell the NIC to flush its descriptor ring?
>
> Thanks in advance for your help.
>
> --
> Matt Laswell
> *infinite io, inc.*
> laswell at infiniteio.com
>


[dpdk-dev] pktgen rx errors with intel 82599

2015-03-13 Thread Matt Smith

Hi,

I've been using DPDK pktgen 2.8.0 (built against DPDK 1.8.0 libraries) to send 
traffic on a server using an Intel 82599 (X520-2). Traffic gets sent out port 1 
through another server, which also has an Intel 82599 installed, and is forwarded 
back into port 0. When I send using a single source and destination IP address, 
this works fine and packets arrive on port 0 at close to the maximum line rate. 

If I change port 1 to range mode and send traffic from a range of source IP 
addresses to a single destination IP address, for a second or two the display 
indicates that some packets were received on port 0 but then the rate of 
received packets on the display goes to 0 and all incoming packets on port 0 
are registered as rx errors.

The server that traffic is being forwarded through is running the ip_pipeline 
example app. I ruled this out as the source of the problem by sending directly 
from port 1 to port 0 of the pktgen box. The issue still occurs when the 
traffic is not being forwarded through the other box. Since ip_pipeline is able 
to receive the packets and forward them without getting rx errors and it's 
running with the same model of NIC as pktgen is using, I checked to see if 
there were any differences in initialization of the rx port between ip_pipeline 
and pktgen. I noticed that pktgen has a setting that ip_pipeline doesn't:

const struct rte_eth_conf port_conf = {
.rxmode = {
.mq_mode = ETH_MQ_RX_RSS,

If I comment out the .mq_mode setting and rebuild pktgen, the problem no longer 
occurs and I now receive packets on port 0 at near line rate when testing from 
a range of source addresses.

I recall reading in the past that if a receive queue fills up on an 82599, 
receiving stalls for all of the other queues and no more packets can be 
received. Could that be happening with pktgen? Is there any debugging I can do 
to help track it down?

The command line I have been launching pktgen with is: 

pktgen -c f -n 3 -m 512 -- -p 0x3 -P -m 1.0,2.1

Thanks,

-Matt Smith







[dpdk-dev] Symmetric RSS Hashing, Part 2

2015-03-23 Thread Matt Laswell
Hey Folks,

I have essentially the same question as Matthew.  Has there been progress
in this area?

--
Matt Laswell
infinite io, inc.
laswell at infiniteio.com


On Sat, Mar 14, 2015 at 3:47 PM, Matthew Hall  wrote:

> A few months ago we had this thread about symmetric hashing of TCP in RSS:
>
> http://dpdk.org/ml/archives/dev/2014-December/010148.html
>
> I was wondering if we ever did figure out how to get the 0x6d5a hash key
> mentioned in there to work, or another alternative one.
>
> Thanks,
> Matthew.


[dpdk-dev] pktgen rx errors with intel 82599

2015-03-23 Thread Matt Smith

> On Mar 14, 2015, at 1:33 PM, Wiles, Keith  wrote:
> 
> Hi Matt,
> 
> On 3/14/15, 8:47 AM, "Wiles, Keith"  wrote:
> 
>> Hi Matt
>> 
>> On 3/13/15, 3:49 PM, "Matt Smith"  wrote:
>> 
>>> 
>>> Hi,
>>> 
>>> I've been using DPDK pktgen 2.8.0 (built against DPDK 1.8.0 libraries) to
>>> send traffic on a server using an Intel 82599 (X520-2). Traffic gets sent
>>> out port 1 through another server which also an Intel 82599 installed and
>>> is forwarded back into port 0. When I send using a single source and
>>> destination IP address, this works fine and packets arrive on port 0 at
>>> close to the maximum line rate.
>>> 
>>> If I change port 1 to range mode and send traffic from a range of source
>>> IP addresses to a single destination IP address, for a second or two the
>>> display indicates that some packets were received on port 0 but then the
>>> rate of received packets on the display goes to 0 and all incoming
>>> packets on port 0 are registered as rx errors.
>>> 
>>> The server that traffic is being forwarded through is running the
>>> ip_pipeline example app. I ruled this out as the source of the problem by
>>> sending directly from port 1 to port 0 of the pktgen box. The issue still
>>> occurs when the traffic is not being forwarded through the other box.
>>> Since ip_pipeline is able to receive the packets and forward them without
>>> getting rx errors and it's running with the same model of NIC as pktgen
>>> is using, I checked to see if there were any differences in
>>> initialization of the rx port between ip_pipeline and pktgen. I noticed
>>> that pktgen has a setting that ip_pipeline doesn't:
>>> 
>>> const struct rte_eth_conf port_conf = {
>>>   .rxmode = {
>>>   .mq_mode = ETH_MQ_RX_RSS,
>>> 
>>> If I comment out the .mq_mode setting and rebuild pktgen, the problem no
>>> longer occurs and I now receive packets on port 0 at near line rate when
>>> testing from a range of source addresses.
>>> 
>>> I recall reading in the past that if a receive queue fills up on an 82599
>>> , that receiving stalls for all of the other queues and no more packets
>>> can be received. Could that be happening with pktgen? Is there any
>>> debugging I can do to help track it down?
>> 
>> I have seen this problem on some platforms a few times and it looks like
>> you may have found a possible solution to the problem. I will have to look
>> into the change and see if this is the problem, but it does seem to
>> suggest this may be the issue. When the port gets into this state the port
>> receives the number mbufs matching the number of descriptors and the rest
>> are "missed" frames at the wire. The RX counter is the number of missed
>> frames.
>> 
>> Thanks for the input
>> ++Keith
> 
> I added code to hopefully setup the correct RX/TX conf values. The HEAD of
> the Pktgen-DPDK v2.8.4 should build and work with DPDK 1.8.0 or 2.0.0-rc1.
> I did still see some RX errors and reduced bit rate, but the traffic does
> not stop on my machine. Please give version 2.8.4 a try and let me know if
> you still see problems.
> 
> Regards,
> ++Keith

Hi Keith,

Sorry for the delay in responding, I have been out of town.

Thanks for your attention to the problem. I pulled the latest code from git and 
moved to the pktgen-2.8.4 tag. I had one issue building:

  CC pktgen-port-cfg.o
/root/dpdk/pktgen-dpdk/app/pktgen-port-cfg.c: In function 'pktgen_config_ports':
/root/dpdk/pktgen-dpdk/app/pktgen-port-cfg.c:300:11: error: variable 'k' set 
but not used [-Werror=unused-but-set-variable]
  uint64_t k;
   ^
cc1: all warnings being treated as errors
make[2]: *** [pktgen-port-cfg.o] Error 1
make[1]: *** [all] Error 2
make: *** [app] Error 2


I prepended '__attribute__((unused))' to the declaration of k and then I was 
able to build successfully. I did not see any receive errors running the 
updated binary. So once I got past the initial build problem, the issue seems 
to be resolved.

Thanks,
-Matt




[dpdk-dev] Symmetric RSS Hashing, Part 2

2015-03-30 Thread Matt Laswell
That's really encouraging.  Thanks!

One thing I'll note is that if my reading of the original paper is
accurate, the 0x6d5a value isn't there in order to cause symmetry - other
repeated 16 bit values will do that, as you've seen.  What the 0x6d5a value
gets you is symmetry while preserving RSS's effectiveness at load spreading
with typical traffic data.  Not all 16 bit values will do this.

--
Matt Laswell
infinite io, inc.
laswell at infiniteio.com

On Mon, Mar 30, 2015 at 10:00 AM, Vladimir Medvedkin 
wrote:

> Matthew,
>
> I don't use any special tricks to make symmetric RSS work. Furthermore, it
> works not only with 0x6d5a.
>
> Regards,
> Vladimir
>
> 2015-03-28 23:11 GMT+03:00 Matthew Hall :
>
> > On Sat, Mar 28, 2015 at 12:10:20PM +0300, Vladimir Medvedkin wrote:
> > > I just verify RSS symmetric in my code, all works great.
> > > ...
> > > By the way, maybe it will be usefull to add softrss function in DPDK?
> >
> > Vladimir,
> >
> > All of this is super-awesome code. I agree having SW RSS would be quite
> > nice.
> > Then you could more easily support things like virtio-net and other stuff
> > which doesn't have RSS.
> >
> > Did you have to use any special tricks to get the 0x6d5a to work? I
> wasn't
> > quite
> > sure how to initialize that and get it to run right.
> >
> > Matthew.
> >
>


[dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE instructions.

2015-05-08 Thread Matt Laswell
On Fri, May 8, 2015 at 4:19 PM, Ravi Kerur  wrote:

> This patch replaces memcmp in librte_hash with rte_memcmp which is
> implemented with AVX/SSE instructions.
>
> +static inline int
> +rte_memcmp(const void *_src_1, const void *_src_2, size_t n)
> +{
> +   const uint8_t *src_1 = (const uint8_t *)_src_1;
> +   const uint8_t *src_2 = (const uint8_t *)_src_2;
> +   int ret = 0;
> +
> +   if (n & 0x80)
> +   return rte_cmp128(src_1, src_2);
> +
> +   if (n & 0x40)
> +   return rte_cmp64(src_1, src_2);
> +
> +   if (n & 0x20) {
> +   ret = rte_cmp32(src_1, src_2);
> +   n -= 0x20;
> +   src_1 += 0x20;
> +   src_2 += 0x20;
> +   }
>
>
Pardon me for butting in, but this seems incorrect for the first two cases
listed above, as the function as written will only compare the first 128 or
64 bytes of each source and return the result.  The pattern expressed in
the 32 byte case appears more correct, as it compares the first 32 bytes
and then lets later pieces of the function handle the smaller remaining
bits of the sources. Also, if this function is to handle arbitrarily large
source data, the 128 byte case needs to be in a loop.

What am I missing?

--
Matt Laswell
infinite io, inc.
laswell at infiniteio.com


[dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE instructions.

2015-05-08 Thread Matt Laswell
On Fri, May 8, 2015 at 5:54 PM, Ravi Kerur  wrote:

>
>
> On Fri, May 8, 2015 at 3:29 PM, Matt Laswell 
> wrote:
>
>>
>>
>> On Fri, May 8, 2015 at 4:19 PM, Ravi Kerur  wrote:
>>
>>> This patch replaces memcmp in librte_hash with rte_memcmp which is
>>> implemented with AVX/SSE instructions.
>>>
>>> +static inline int
>>> +rte_memcmp(const void *_src_1, const void *_src_2, size_t n)
>>> +{
>>> +   const uint8_t *src_1 = (const uint8_t *)_src_1;
>>> +   const uint8_t *src_2 = (const uint8_t *)_src_2;
>>> +   int ret = 0;
>>> +
>>> +   if (n & 0x80)
>>> +   return rte_cmp128(src_1, src_2);
>>> +
>>> +   if (n & 0x40)
>>> +   return rte_cmp64(src_1, src_2);
>>> +
>>> +   if (n & 0x20) {
>>> +   ret = rte_cmp32(src_1, src_2);
>>> +   n -= 0x20;
>>> +   src_1 += 0x20;
>>> +   src_2 += 0x20;
>>> +   }
>>>
>>>
>> Pardon me for butting in, but this seems incorrect for the first two
>> cases listed above, as the function as written will only compare the first
>> 128 or 64 bytes of each source and return the result.  The pattern
>> expressed in the 32 byte case appears more correct, as it compares the
>> first 32 bytes and then lets later pieces of the function handle the
>> smaller remaining bits of the sources. Also, if this function is to handle
>> arbitrarily large source data, the 128 byte case needs to be in a loop.
>>
>> What am I missing?
>>
>
> Current max hash key length supported is 64 bytes, hence no comparison is
> done after 64 bytes. 128 bytes comparison is added to measure performance
> only and there is no use-case as of now. With the current use-cases its not
> required but if there is a need to handle large arbitrary data upto 128
> bytes it can be modified.
>

Ah, gotcha.  I misunderstood and thought that this was meant to be a
generic AVX/SSE enabled memcmp() replacement, and that the use of it in
rte_hash was meant merely as a test case.   If it's more limited than that,
carry on, though you might want to make a note of it in the documentation.
I suspect others will misinterpret the name as I did.

--
Matt Laswell
infinite io, inc.
laswell at infiniteio.com


[dpdk-dev] Load-balancing position field in DPDK load_balancer sample app vs. Hash table

2014-11-14 Thread Matt Laswell
Hey Folks,

This thread has been tremendously helpful, as I'm looking at adding
RSS-based load balancing to my application in the not too distant future.
Many thanks to all who have contributed, especially regarding symmetric RSS.

Not to derail the conversation too badly, but could one of you point me to
some example code that demonstrates the steps needed to configure RSS?
We're using Niantic NICs, so I assume that this is pretty standard stuff,
but having an example to study is a real leg up.

Again, thanks for all of the information.

--
Matt Laswell
laswell at infiniteio.com
infinite io, inc.

On Fri, Nov 14, 2014 at 10:57 AM, Chilikin, Andrey <
andrey.chilikin at intel.com> wrote:

> Fortville supports symmetrical hashing on HW level, a patch for i40e PMD
> was submitted a couple of weeks ago. For Niantic you can use symmetrical
> rss key recommended by Konstantin.
>
> Regards,
> Andrey
>
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Friday, November 14, 2014 4:50 PM
> To: Yerden Zhumabekov; Kamraan Nasim; dev at dpdk.org
> Cc: Yuanzhang Hu
> Subject: Re: [dpdk-dev] Load-balancing position field in DPDK
> load_balancer sample app vs. Hash table
>
> > -Original Message-
> > From: Yerden Zhumabekov [mailto:e_zhumabekov at sts.kz]
> > Sent: Friday, November 14, 2014 4:23 PM
> > To: Ananyev, Konstantin; Kamraan Nasim; dev at dpdk.org
> > Cc: Yuanzhang Hu
> > Subject: Re: [dpdk-dev] Load-balancing position field in DPDK
> > load_balancer sample app vs. Hash table
> >
> > I'd like to interject a question here.
> >
> > In case of flow classification, one might possibly prefer for packets
> > from the same flow to fall on the same logical core. With this '%'
> > load balancing, it would require to get the same RSS hash value for
> > packets with direct (src to dst) and swapped (dst to src) IPs and
> > ports. Am I correct that hardware RSS calculation cannot provide this
> symmetry?
>
> As I remember, it is possible but you have to tweak rss key values.
> Here is a paper describing how to do that:
> http://www.ndsl.kaist.edu/~shinae/papers/TR-symRSS.pdf
>
> Konstantin
>
> >
> > 14.11.2014 20:44, Ananyev, Konstantin ?:
> > > If you have a NIC that is capable to do HW hash computation, then
> > > you can do your load balancing based on that value.
> > > Let say ixgbe/igb/i40e NICs can calculate RSS hash value based on
> > > different combinations of dst/src Ips, dst/src ports.
> > > This value can be stored inside mbuf for each RX packet by PMD RX
> function.
> > > Then you can do:
> > > worker_id = mbuf->hash.rss % n_workersl
> > >
> > > That might to provide better balancing then using just one byte
> > > value, plus should be a bit faster, as in that case your balancer code
> don't need to touch packet's data.
> > >
> > > Konstantin
> >
> > --
> > Sincerely,
> >
> > Yerden Zhumabekov
> > State Technical Service
> > Astana, KZ
> >
>
>


[dpdk-dev] Load-balancing position field in DPDK load_balancer sample app vs. Hash table

2014-11-15 Thread Matt Laswell
Fantastic.  Thanks for the assist.

--
Matt Laswell
laswell at infiniteio.com
infinite io, inc.


On Sat, Nov 15, 2014 at 1:10 AM, Yerden Zhumabekov 
wrote:

>  Hello Matt,
>
> You can specify RSS configuration through rte_eth_dev_configure() function
> supplied with this structure:
>
> struct rte_eth_conf port_conf = {
> .rxmode = {
> .mq_mode= ETH_MQ_RX_RSS,
>  ...
> },
> .rx_adv_conf = {
> .rss_conf = {
> .rss_key = NULL,
> .rss_hf = ETH_RSS_IPV4 | ETH_RSS_IPV6,
> },
> },
> .
> };
>
> In this case, RSS-hash is calculated over IP addresses only and with
> default RSS key. Look at lib/librte_ether/rte_ethdev.h for other
> definitions.
>
>
> 15.11.2014 0:49, Matt Laswell wrote:
>
> Hey Folks,
>
>  This thread has been tremendously helpful, as I'm looking at adding
> RSS-based load balancing to my application in the not too distant future.
> Many thanks to all who have contributed, especially regarding symmetric RSS.
>
>  Not to derail the conversation too badly, but could one of you point me
> to some example code that demonstrates the steps needed to configure RSS?
> We're using Niantic NICs, so I assume that this is pretty standard stuff,
> but having an example to study is a real leg up.
>
>  Again, thanks for all of the information.
>
>  --
> Matt Laswell
> laswell at infiniteio.com
> infinite io, inc.
>
> On Fri, Nov 14, 2014 at 10:57 AM, Chilikin, Andrey <
> andrey.chilikin at intel.com> wrote:
>
>> Fortville supports symmetrical hashing on HW level, a patch for i40e PMD
>> was submitted a couple of weeks ago. For Niantic you can use symmetrical
>> rss key recommended by Konstantin.
>>
>> Regards,
>> Andrey
>>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
>> Sent: Friday, November 14, 2014 4:50 PM
>> To: Yerden Zhumabekov; Kamraan Nasim; dev at dpdk.org
>> Cc: Yuanzhang Hu
>> Subject: Re: [dpdk-dev] Load-balancing position field in DPDK
>> load_balancer sample app vs. Hash table
>>
>> > -Original Message-
>> > From: Yerden Zhumabekov [mailto:e_zhumabekov at sts.kz]
>> > Sent: Friday, November 14, 2014 4:23 PM
>> > To: Ananyev, Konstantin; Kamraan Nasim; dev at dpdk.org
>> > Cc: Yuanzhang Hu
>> > Subject: Re: [dpdk-dev] Load-balancing position field in DPDK
>> > load_balancer sample app vs. Hash table
>> >
>> > I'd like to interject a question here.
>> >
>> > In case of flow classification, one might possibly prefer for packets
>> > from the same flow to fall on the same logical core. With this '%'
>> > load balancing, it would require to get the same RSS hash value for
>> > packets with direct (src to dst) and swapped (dst to src) IPs and
>> > ports. Am I correct that hardware RSS calculation cannot provide this
>> symmetry?
>>
>> As I remember, it is possible but you have to tweak rss key values.
>> Here is a paper describing how to do that:
>> http://www.ndsl.kaist.edu/~shinae/papers/TR-symRSS.pdf
>>
>> Konstantin
>>
>> >
>> > 14.11.2014 20:44, Ananyev, Konstantin wrote:
>> > > If you have a NIC that is capable to do HW hash computation, then
>> > > you can do your load balancing based on that value.
>> > > Let say ixgbe/igb/i40e NICs can calculate RSS hash value based on
>> > > different combinations of dst/src Ips, dst/src ports.
>> > > This value can be stored inside mbuf for each RX packet by PMD RX
>> function.
>> > > Then you can do:
>> > > worker_id = mbuf->hash.rss % n_workersl
>> > >
>> > > That might to provide better balancing then using just one byte
>> > > value, plus should be a bit faster, as in that case your balancer
>> code don't need to touch packet's data.
>> > >
>> > > Konstantin
>> >
>> > --
>> > Sincerely,
>> >
>> > Yerden Zhumabekov
>> > State Technical Service
>> > Astana, KZ
>> >
>>
>>
>
> --
> Sincerely,
>
> Yerden Zhumabekov
> State Technical Service
> Astana, KZ
>
>


[dpdk-dev] capture packets on VM

2016-07-15 Thread Matt Laswell
Hey Raja,

When you bind the ports to the DPDK poll mode drivers, the kernel no longer
has visibility into them.  This makes some sense intuitively - it would be
very bad for both the kernel and a user mode application to both attempt to
control the ports.  This is why tools like tcpdump and wireshark don't work
(and why the ports don't show up in ifconfig generally).

If you just want to know that packets are flowing, an easy way to do it is
simply to emit messages (via printf or the logging subsystem of your
choice) or increment counters when you receive packets.  If you want to
verify a little bit of information about the packets but don't need full
capture, you can either add some parsing information to your messages, or
build out more stats.

However, if you want to actually capture the packet contents, it's a little
trickier.  You can write your own packet-capture application, of course,
but that might be a bigger task than you're looking for.  You can also
instantiate a KNI interface and either copy or forward the packets to it
(and, from there, you can do tcpdump on the kernel side of the interface).
  I seem to recall that there's been some work done on tcpdump-like
applications within DPDK, but don't remember what state those efforts are
in presently.

--
Matt Laswell
laswell at infinite.io
infinite io, inc.

On Fri, Jul 15, 2016 at 12:54 AM, Raja Jayapal  wrote:

> Hi All,
>
> I have installed dpdk on VM and would like to know how to capture the
> packets on dpdk ports.
> I am sending traffic from host  and want to know how to confirm whether
> the packets are flowing via dpdk ports.
> I tried with tcpdump and wireshark but could not capture the packets
> inside VM.
> setup : bridge1(Host)--- VM(Guest with DPDK) - bridge2(Host)
>
> Please suggest.
>
> Thanks,
> Raja
>
> =-=-=
> Notice: The information contained in this e-mail
> message and/or attachments to it may contain
> confidential or privileged information. If you are
> not the intended recipient, any dissemination, use,
> review, distribution, printing or copying of the
> information contained in this e-mail message
> and/or attachments to it are strictly prohibited. If
> you have received this communication in error,
> please notify us by reply e-mail or telephone and
> immediately and permanently delete the message
> and any attachments. Thank you
>
>
>


[dpdk-dev] Packet Cloning

2015-05-28 Thread Matt Laswell
Since Padam is going to be altering payload, he likely cannot use that API.
The rte_pktmbuf_clone() API doesn't make a copy of the payload.  Instead,
it gives you a second mbuf whose payload pointer points back to the
contents of the first (and also increments the reference counter on the
first so that it isn't actually freed until all clones are accounted for).
This is very fast, which is good.  However, since there's only really one
buffer full of payload, changes in the original also affect the clone and
vice versa.  This can have surprising and unpleasant side effects that may
not show up until you are under load, which is awesome*.

For what it's worth, if you need to be able to modify the copy while
leaving the original alone, I don't believe that there's a good solution
within DPDK.   However, writing your own API to copy rather than clone a
packet mbuf isn't difficult.
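
For what it's worth, a minimal version of such a copy routine might look
like the sketch below (DPDK 2.x mbuf API; the name pktmbuf_copy and the
error handling are ours, and growing the destination with extra segments is
omitted for brevity):

```c
/* Deep copy, as opposed to rte_pktmbuf_clone(): the new mbuf gets its
 * own payload, so the original can be modified independently afterward.
 * Sketch only -- assumes the copied data fits in one destination buffer. */
static struct rte_mbuf *
pktmbuf_copy(const struct rte_mbuf *md, struct rte_mempool *mp)
{
    struct rte_mbuf *mc = rte_pktmbuf_alloc(mp);

    if (mc == NULL)
        return NULL;
    while (md != NULL) {
        /* reserve space at the tail of the copy, then fill it */
        char *dst = rte_pktmbuf_append(mc, rte_pktmbuf_data_len(md));

        if (dst == NULL) {              /* out of tailroom */
            rte_pktmbuf_free(mc);
            return NULL;
        }
        rte_memcpy(dst, rte_pktmbuf_mtod(md, const char *),
                   rte_pktmbuf_data_len(md));
        md = md->next;                  /* walk the segment chain */
    }
    return mc;
}
```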

-- 
Matt Laswell
infinite io, inc.
laswell at infiniteio.com

* Don't ask me how I know how much awesome fun this can be, though I
suspect you can guess.

On Thu, May 28, 2015 at 9:52 AM, Stephen Hemminger <
stephen at networkplumber.org> wrote:

> On Thu, 28 May 2015 17:15:42 +0530
> Padam Jeet Singh  wrote:
>
> > Hello,
> >
> > Is there a function in DPDK to completely clone a pkt_mbuf including the
> segments?
> >
> > I am trying to build a packet mirroring application which sends packet
> out through two separate interfaces, but the packet payload needs to be
> altered before send.
> >
> > Thanks,
> > Padam
> >
> >
>
> Isn't this what you want?
>
> /**
>  * Creates a "clone" of the given packet mbuf.
>  *
>  * Walks through all segments of the given packet mbuf, and for each of
> them:
>  *  - Creates a new packet mbuf from the given pool.
>  *  - Attaches newly created mbuf to the segment.
>  * Then updates pkt_len and nb_segs of the "clone" packet mbuf to match
> values
>  * from the original packet mbuf.
>  *
>  * @param md
>  *   The packet mbuf to be cloned.
>  * @param mp
>  *   The mempool from which the "clone" mbufs are allocated.
>  * @return
>  *   - The pointer to the new "clone" mbuf on success.
>  *   - NULL if allocation fails.
>  */
> static inline struct rte_mbuf *rte_pktmbuf_clone(struct rte_mbuf *md,
> struct rte_mempool *mp)
>


[dpdk-dev] Packet Cloning

2015-05-28 Thread Matt Laswell
Hey Kyle,

That's one way you can handle it, though I suspect you'll end up with some
complexity elsewhere in your code to deal with remembering whether you
should look at the original data or the copied and modified data.  Another
way is just to make a copy of the original mbuf, but have your copy API
stop after it reaches some particular point.  Perhaps just the L2-L4
headers, perhaps a few hundred bytes into payload, or perhaps something
else entirely. This all gets very application dependent, of course.  How
much is "enough" is going to depend heavily on what you're trying to
accomplish.

-- 
Matt Laswell
infinite io, inc.
laswell at infiniteio.com


On Thu, May 28, 2015 at 10:38 AM, Kyle Larose  wrote:

> I'm fairly new to dpdk, so I may be completely out to lunch on this, but
> here's an idea to possibly improve performance compared to a straight copy
> of the entire packet. If this idea makes sense, perhaps it could be added
> to the mbuf library as an extension of the clone functionality?
>
> If you are only modifying the headers (say the Ethernet header), is it
> possible to make a copy of only the first N bytes (say 32 bytes)?
>
> For example, you make two new "main" mbufs, which contain duplicate
> metadata, and a copy of the first 32 bytes of the packet. Call them A and
> B. Have both A and B chain to the original mbuf (call it O), which is
> reference counted as with the normal clone functionality. Then, you adjust
> the O such that its start data is 32 bytes into the packet.
>
> When you transmit A, it will send its own copy of the 32 bytes, plus the
> unaltered remaining data contained in O. A will be freed, and the refcount
> of O decremented. When you transmit B, it will work the same as with the
> previous one, except that when the refcount on O is decremented, it reaches
> zero and it is freed as well.
>
> I'm not sure if this makes sense in all cases (for example, maybe it's
> just faster to allocate separate mbufs for 64-byte packets). Perhaps that
> could also be handled transparently underneath the hood.
>
> Thoughts?
>
> Thanks,
>
> Kyle
>
> On Thu, May 28, 2015 at 11:10 AM, Matt Laswell 
> wrote:
>
>> Since Padam is going to be altering payload, he likely cannot use that
>> API.
>> The rte_pktmbuf_clone() API doesn't make a copy of the payload.  Instead,
>> it gives you a second mbuf whose payload pointer points back to the
>> contents of the first (and also increments the reference counter on the
>> first so that it isn't actually freed until all clones are accounted for).
>> This is very fast, which is good.  However, since there's only really one
>> buffer full of payload, changes in the original also affect the clone and
>> vice versa.  This can have surprising and unpleasant side effects that may
>> not show up until you are under load, which is awesome*.
>>
>> For what it's worth, if you need to be able to modify the copy while
>> leaving the original alone, I don't believe that there's a good solution
>> within DPDK.   However, writing your own API to copy rather than clone a
>> packet mbuf isn't difficult.
>>
>> --
>> Matt Laswell
>> infinite io, inc.
>> laswell at infiniteio.com
>>
>> * Don't ask me how I know how much awesome fun this can be, though I
>> suspect you can guess.
>>
>> On Thu, May 28, 2015 at 9:52 AM, Stephen Hemminger <
>> stephen at networkplumber.org> wrote:
>>
>> > On Thu, 28 May 2015 17:15:42 +0530
>> > Padam Jeet Singh  wrote:
>> >
>> > > Hello,
>> > >
>> > > Is there a function in DPDK to completely clone a pkt_mbuf including
>> the
>> > segments?
>> > >
>> > > I am trying to build a packet mirroring application which sends packet
>> > out through two separate interfaces, but the packet payload needs to be
>> > altered before send.
>> > >
>> > > Thanks,
>> > > Padam
>> > >
>> > >
>> >
>> > Isn't this what you want?
>> >
>> > /**
>> >  * Creates a "clone" of the given packet mbuf.
>> >  *
>> >  * Walks through all segments of the given packet mbuf, and for each of
>> > them:
>> >  *  - Creates a new packet mbuf from the given pool.
>> >  *  - Attaches newly created mbuf to the segment.
>> >  * Then updates pkt_len and nb_segs of the "clone" packet mbuf to match
>> > values
>> >  * from the original packet mbuf.
>> >  *
>> >  * @param md
>> >  *   The packet mbuf to be cloned.
>> >  * @param mp
>> >  *   The mempool from which the "clone" mbufs are allocated.
>> >  * @return
>> >  *   - The pointer to the new "clone" mbuf on success.
>> >  *   - NULL if allocation fails.
>> >  */
>> > static inline struct rte_mbuf *rte_pktmbuf_clone(struct rte_mbuf *md,
>> > struct rte_mempool *mp)
>> >
>>
>
>


[dpdk-dev] How to approach packet TX lockups

2015-11-16 Thread Matt Laswell
Hey Folks,

I sent this to the users email list, but I'm not sure how many people are
actively reading that list at this point.  I'm dealing with a situation in
which my application loses the ability to transmit packets out of a port
during times of moderate stress.  I'd love to hear suggestions for how to
approach this problem, as I'm a bit at a loss at the moment.

Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on Haswell
processors.  I'm using the 82599 controller, configured to spread packets
across multiple queues.  Each queue is accessed by a different lcore in my
application; there is therefore concurrent access to the controller, but
not to any of the queues.  We're binding the ports to the igb_uio driver.
The symptoms I see are these:


   - All transmit out of a particular port stops
   - rte_eth_tx_burst() indicates that it is sending all of the packets
   that I give to it
   - rte_eth_stats_get() gives me stats indicating that no packets are
   being sent on the affected port.  Also, no tx errors, and no pause frames
   sent or received (opackets = 0, obytes = 0, oerrors = 0, etc.)
   - All other ports continue to work normally
   - The affected port continues to receive packets without problems; only
   TX is affected
   - Resetting the port via rte_eth_dev_stop() and rte_eth_dev_start()
   restores things and packets can flow again
   - The problem is replicable on multiple devices, and doesn't follow one
   particular port

I've tried calling rte_mbuf_sanity_check() on all packets before sending
them.  I've also instrumented my code to look for packets that have already
been sent or freed, as well as cycles in chained packets being sent.  I
also put a lock around all accesses to rte_eth* calls to synchronize access
to the NIC.  Given some recent discussion here, I also tried changing the
TX RS threshold from 0 to 32, 16, and 1.  None of these strategies proved
effective.
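
One stopgap I'm considering, given that a stop/start cycle recovers the
port, is a watchdog that spots the stuck-TX signature (tx_burst accepting
packets while opackets stays flat) and bounces the port. A sketch only;
the detection threshold is arbitrary and this assumes a single port:

```c
/* Crude detect-and-reset workaround: if we've handed a batch of packets
 * to rte_eth_tx_burst() but the port's opackets counter hasn't moved,
 * bounce the port. Not a fix, just containment while debugging. */
static void
check_tx_stuck(uint8_t port, uint64_t handed_to_nic)
{
    static uint64_t last_opackets, last_handed;
    struct rte_eth_stats stats;

    rte_eth_stats_get(port, &stats);
    if (handed_to_nic - last_handed > 1000 &&
        stats.opackets == last_opackets) {
        rte_eth_dev_stop(port);
        rte_eth_dev_start(port);
    }
    last_opackets = stats.opackets;
    last_handed = handed_to_nic;
}
```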

Like I said at the top, I'm a little at a loss at this point.  If you were
dealing with this set of symptoms, how would you proceed?

Thanks in advance.

--
Matt Laswell
infinite io, inc.
laswell at infiniteio.com


[dpdk-dev] How to approach packet TX lockups

2015-11-16 Thread Matt Laswell
Hey Stephen,

Thanks a lot; that's really useful information.  Unfortunately, I'm at a
stage in our release cycle where upgrading to a new version of DPDK isn't
feasible.  Any chance you (or others reading this) has a pointer to the
relevant changes?  While I can't afford to upgrade DPDK entirely,
backporting targeted fixes is more doable.

Again, thanks.

- Matt


On Mon, Nov 16, 2015 at 6:12 PM, Stephen Hemminger <
stephen at networkplumber.org> wrote:

> On Mon, 16 Nov 2015 17:48:35 -0600
> Matt Laswell  wrote:
>
> > Hey Folks,
> >
> > I sent this to the users email list, but I'm not sure how many people are
> > actively reading that list at this point.  I'm dealing with a situation
> in
> > which my application loses the ability to transmit packets out of a port
> > during times of moderate stress.  I'd love to hear suggestions for how to
> > approach this problem, as I'm a bit at a loss at the moment.
> >
> > Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on Haswell
> > processors.  I'm using the 82599 controller, configured to spread packets
> > across multiple queues.  Each queue is accessed by a different lcore in
> my
> > application; there is therefore concurrent access to the controller, but
> > not to any of the queues.  We're binding the ports to the igb_uio driver.
> > The symptoms I see are these:
> >
> >
> >- All transmit out of a particular port stops
> >- rte_eth_tx_burst() indicates that it is sending all of the packets
> >that I give to it
> >- rte_eth_stats_get() gives me stats indicating that no packets are
> >being sent on the affected port.  Also, no tx errors, and no pause
> frames
> >sent or received (opackets = 0, obytes = 0, oerrors = 0, etc.)
> >- All other ports continue to work normally
> >- The affected port continues to receive packets without problems;
> only
> >TX is affected
> >- Resetting the port via rte_eth_dev_stop() and rte_eth_dev_start()
> >restores things and packets can flow again
> >- The problem is replicable on multiple devices, and doesn't follow
> one
> >particular port
> >
> > I've tried calling rte_mbuf_sanity_check() on all packets before sending
> > them.  I've also instrumented my code to look for packets that have
> already
> > been sent or freed, as well as cycles in chained packets being sent.  I
> > also put a lock around all accesses to rte_eth* calls to synchronize
> access
> > to the NIC.  Given some recent discussion here, I also tried changing the
> > TX RS threshold from 0 to 32, 16, and 1.  None of these strategies proved
> > effective.
> >
> > Like I said at the top, I'm a little at a loss at this point.  If you
> were
> > dealing with this set of symptoms, how would you proceed?
> >
>
> I remember some issues with old DPDK 1.6 with some of the prefetch
> thresholds on 82599. You would be better off going to a later DPDK
> version.
>


[dpdk-dev] How to approach packet TX lockups

2015-11-17 Thread Matt Laswell
Yes, we're on 1.6r2.  That said, I've tried a number of different values
for the thresholds without a lot of luck.  Setting wthresh/hthresh/pthresh
to 0/0/32 or 0/0/0 doesn't appear to fix things.  And, as Matthew
suggested, I'm pretty sure using 0 for the thresholds leads to auto-config
by the driver.  I also tried 1/1/32, which required that I also change the
rs_thresh value from 0 to 1 to work around a panic in PMD initialization
("TX WTHRESH must be set to 0 if tx_rs_thresh is greater than 1").

Any other suggestions?

On Mon, Nov 16, 2015 at 7:31 PM, Stephen Hemminger <
stephen at networkplumber.org> wrote:

> On Mon, 16 Nov 2015 18:49:15 -0600
> Matt Laswell  wrote:
>
> > Hey Stephen,
> >
> > Thanks a lot; that's really useful information.  Unfortunately, I'm at a
> > stage in our release cycle where upgrading to a new version of DPDK isn't
> > feasible.  Any chance you (or others reading this) has a pointer to the
> > relevant changes?  While I can't afford to upgrade DPDK entirely,
> > backporting targeted fixes is more doable.
> >
> > Again, thanks.
> >
> > - Matt
> >
> >
> > On Mon, Nov 16, 2015 at 6:12 PM, Stephen Hemminger <
> > stephen at networkplumber.org> wrote:
> >
> > > On Mon, 16 Nov 2015 17:48:35 -0600
> > > Matt Laswell  wrote:
> > >
> > > > Hey Folks,
> > > >
> > > > I sent this to the users email list, but I'm not sure how many
> people are
> > > > actively reading that list at this point.  I'm dealing with a
> situation
> > > in
> > > > which my application loses the ability to transmit packets out of a
> port
> > > > during times of moderate stress.  I'd love to hear suggestions for
> how to
> > > > approach this problem, as I'm a bit at a loss at the moment.
> > > >
> > > > Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on
> Haswell
> > > > processors.  I'm using the 82599 controller, configured to spread
> packets
> > > > across multiple queues.  Each queue is accessed by a different lcore
> in
> > > my
> > > > application; there is therefore concurrent access to the controller,
> but
> > > > not to any of the queues.  We're binding the ports to the igb_uio
> driver.
> > > > The symptoms I see are these:
> > > >
> > > >
> > > >- All transmit out of a particular port stops
> > > >- rte_eth_tx_burst() indicates that it is sending all of the
> packets
> > > >that I give to it
> > > >- rte_eth_stats_get() gives me stats indicating that no packets
> are
> > > >being sent on the affected port.  Also, no tx errors, and no pause
> > > frames
> > > >sent or received (opackets = 0, obytes = 0, oerrors = 0, etc.)
> > > >- All other ports continue to work normally
> > > >- The affected port continues to receive packets without problems;
> > > only
> > > >TX is affected
> > > >- Resetting the port via rte_eth_dev_stop() and
> rte_eth_dev_start()
> > > >restores things and packets can flow again
> > > >- The problem is replicable on multiple devices, and doesn't
> follow
> > > one
> > > >particular port
> > > >
> > > > I've tried calling rte_mbuf_sanity_check() on all packets before
> sending
> > > > them.  I've also instrumented my code to look for packets that have
> > > already
> > > > been sent or freed, as well as cycles in chained packets being
> sent.  I
> > > > also put a lock around all accesses to rte_eth* calls to synchronize
> > > access
> > > > to the NIC.  Given some recent discussion here, I also tried
> changing the
> > > > TX RS threshold from 0 to 32, 16, and 1.  None of these strategies
> proved
> > > > effective.
> > > >
> > > > Like I said at the top, I'm a little at a loss at this point.  If you
> > > were
> > > > dealing with this set of symptoms, how would you proceed?
> > > >
> > >
> > > I remember some issues with old DPDK 1.6 with some of the prefetch
> > > thresholds on 82599. You would be better off going to a later DPDK
> > > version.
> > >
>
> I hope you are on 1.6.0r2 at least??
>
> With older DPDK there was no way to get driver to tell you what the
> preferred settings were for pthresh/hthresh/wthresh. And the values
> in Intel sample applications were broken on some hardware.
>
> I remember reverse engineering the safe values from reading the Linux
> driver.
>
> The Linux driver is much better tested than the DPDK one...
> In the Linux driver, the Transmit Descriptor Controller (txdctl)
> is fixed at (for transmit)
>wthresh = 1
>hthresh = 1
>pthresh = 32
>
> The DPDK 2.2 driver uses:
> wthresh = 0
> hthresh = 0
> pthresh = 32
>
>
>
>
>
>
>
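
For anyone following along, those values plug into the TX queue setup
roughly as below (a config sketch; field names per the rte_eth_txconf of
that era, with tx_rs_thresh=1 as the conservative choice discussed above):

```c
/* TX thresholds mirroring the Linux ixgbe driver's fixed values. */
static const struct rte_eth_txconf tx_conf = {
    .tx_thresh = {
        .pthresh = 32,      /* prefetch threshold */
        .hthresh = 1,       /* host threshold */
        .wthresh = 1,       /* write-back threshold */
    },
    .tx_rs_thresh   = 1,    /* must be <= 1 when wthresh > 0 */
    .tx_free_thresh = 0,    /* 0 = driver default */
};
```

This would then be passed as the last argument to
rte_eth_tx_queue_setup() for each queue.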


[dpdk-dev] How to approach packet TX lockups

2015-11-17 Thread Matt Laswell
Hey Konstantin,

Moving from 1.6r2 to 2.2 is going to be a pretty significant change due to
things like changes in the MBuf format, API differences, etc.  Even as an
experiment, that's an awfully large change to absorb.  Is there a subset
that you're referring to that could be more readily included without
modifying so many touch points into DPDK?

For reference, my transmit function is  rte_eth_tx_burst().  It seems to
reliably tell me that it has enqueued all of the packets that I gave it,
however the stats from rte_eth_stats_get() indicate that no packets are
actually being sent.

Thanks,

- Matt

On Tue, Nov 17, 2015 at 8:44 AM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matt Laswell
> > Sent: Tuesday, November 17, 2015 2:24 PM
> > To: Stephen Hemminger
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] How to approach packet TX lockups
> >
> > Yes, we're on 1.6r2.  That said, I've tried a number of different values
> > for the thresholds without a lot of luck.  Setting
> wthresh/hthresh/pthresh
> > to 0/0/32 or 0/0/0 doesn't appear to fix things.  And, as Matthew
> > suggested, I'm pretty sure using 0 for the thresholds leads to
> auto-config
> > by the driver.  I also tried 1/1/32, which required that I also change
> the
> > rs_thresh value from 0 to 1 to work around a panic in PMD initialization
> > ("TX WTHRESH must be set to 0 if tx_rs_thresh is greater than 1").
> >
> > Any other suggestions?
>
> That's not only DPDK code changed since 1.6.
> I am pretty sure that we also have a new update of shared code since then
> (and as I remember probably more than one).
> One suggestion would be at least try to upgrade the shared code up to the
> latest.
> Another one - even if you can't upgrade to 2.2 in you production
> environment,
> it probably worth to do that in some test environment and then check does
> the problem persist.
> If yes,  then we'll need some guidance how to reproduce it.
>
> Another question it is not clear what TX function do you use?
> Konstantin
>
> >
> > On Mon, Nov 16, 2015 at 7:31 PM, Stephen Hemminger <
> > stephen at networkplumber.org> wrote:
> >
> > > On Mon, 16 Nov 2015 18:49:15 -0600
> > > Matt Laswell  wrote:
> > >
> > > > Hey Stephen,
> > > >
> > > > Thanks a lot; that's really useful information.  Unfortunately, I'm
> at a
> > > > stage in our release cycle where upgrading to a new version of DPDK
> isn't
> > > > feasible.  Any chance you (or others reading this) has a pointer to
> the
> > > > relevant changes?  While I can't afford to upgrade DPDK entirely,
> > > > backporting targeted fixes is more doable.
> > > >
> > > > Again, thanks.
> > > >
> > > > - Matt
> > > >
> > > >
> > > > On Mon, Nov 16, 2015 at 6:12 PM, Stephen Hemminger <
> > > > stephen at networkplumber.org> wrote:
> > > >
> > > > > On Mon, 16 Nov 2015 17:48:35 -0600
> > > > > Matt Laswell  wrote:
> > > > >
> > > > > > Hey Folks,
> > > > > >
> > > > > > I sent this to the users email list, but I'm not sure how many
> > > people are
> > > > > > actively reading that list at this point.  I'm dealing with a
> > > situation
> > > > > in
> > > > > > which my application loses the ability to transmit packets out
> of a
> > > port
> > > > > > during times of moderate stress.  I'd love to hear suggestions
> for
> > > how to
> > > > > > approach this problem, as I'm a bit at a loss at the moment.
> > > > > >
> > > > > > Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on
> > > Haswell
> > > > > > processors.  I'm using the 82599 controller, configured to spread
> > > packets
> > > > > > across multiple queues.  Each queue is accessed by a different
> lcore
> > > in
> > > > > my
> > > > > > application; there is therefore concurrent access to the
> controller,
> > > but
> > > > > > not to any of the queues.  We're binding the ports to the igb_uio
> > > driver.
> > > > > > The symptoms I see are these:
> > > > > >
>

[dpdk-dev] How to approach packet TX lockups

2015-11-17 Thread Matt Laswell
Thanks, I'll give that a try.

In my environment, I'm pretty sure we're using the fully-featured
ixgbe_xmit_pkts() and not _simple().   If setting rs_thresh=1 is safer,
I'll stick with that.

Again, thanks to all for the assistance.

- Matt

On Tue, Nov 17, 2015 at 10:20 AM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

> Hi Matt,
>
>
>
> As I said, at least  try to upgrade contents of shared code to the latest
> one.
>
> In previous releases: lib/librte_pmd_ixgbe/ixgbe, now located at:
> drivers/net/ixgbe/.
>
>
>
> > For reference, my transmit function is  rte_eth_tx_burst().
>
> I meant what ixgbe TX function it points to: ixgbe_xmit_pkts or
> ixgbe_xmit_pkts_simple()?
>
> For ixgbe_xmit_pkts_simple() don't set tx_rs_thresh > 32,
>
> for ixgbe_xmit_pkts() the safest way is to set  tx_rs_thresh=1.
>
> Though as I understand from your previous mails, you already did that, and
> it didn't help.
>
> Konstantin
>
>
>
>
>
> *From:* Matt Laswell [mailto:laswell at infiniteio.com]
> *Sent:* Tuesday, November 17, 2015 3:05 PM
> *To:* Ananyev, Konstantin
> *Cc:* Stephen Hemminger; dev at dpdk.org
>
> *Subject:* Re: [dpdk-dev] How to approach packet TX lockups
>
>
>
> Hey Konstantin,
>
>
>
> Moving from 1.6r2 to 2.2 is going to be a pretty significant change due to
> things like changes in the MBuf format, API differences, etc.  Even as an
> experiment, that's an awfully large change to absorb.  Is there a subset
> that you're referring to that could be more readily included without
> modifying so many touch points into DPDK?
>
>
>
> For reference, my transmit function is  rte_eth_tx_burst().  It seems to
> reliably tell me that it has enqueued all of the packets that I gave it,
> however the stats from rte_eth_stats_get() indicate that no packets are
> actually being sent.
>
>
>
> Thanks,
>
>
>
> - Matt
>
>
>
> On Tue, Nov 17, 2015 at 8:44 AM, Ananyev, Konstantin <
> konstantin.ananyev at intel.com> wrote:
>
>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matt Laswell
> > Sent: Tuesday, November 17, 2015 2:24 PM
> > To: Stephen Hemminger
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] How to approach packet TX lockups
> >
> > Yes, we're on 1.6r2.  That said, I've tried a number of different values
> > for the thresholds without a lot of luck.  Setting wthresh/hthresh/
> pthresh
> > to 0/0/32 or 0/0/0 doesn't appear to fix things.  And, as Matthew
> > suggested, I'm pretty sure using 0 for the thresholds leads to auto-
> config
> > by the driver.  I also tried 1/1/32, which required that I also change
> the
> > rs_thresh value from 0 to 1 to work around a panic in PMD initialization
> > ("TX WTHRESH must be set to 0 if tx_rs_thresh is greater than 1").
> >
> > Any other suggestions?
>
> That's not only DPDK code changed since 1.6.
> I am pretty sure that we also have a new update of shared code since then
> (and as I remember probably more than one).
> One suggestion would be at least try to upgrade the shared code up to the
> latest.
> Another one - even if you can't upgrade to 2.2 in you production
> environment,
> it probably worth to do that in some test environment and then check does
> the problem persist.
> If yes,  then we'll need some guidance how to reproduce it.
>
> Another question it is not clear what TX function do you use?
> Konstantin
>
>
> >
> > On Mon, Nov 16, 2015 at 7:31 PM, Stephen Hemminger <
> > stephen at networkplumber.org> wrote:
> >
> > > On Mon, 16 Nov 2015 18:49:15 -0600
> > > Matt Laswell  wrote:
> > >
> > > > Hey Stephen,
> > > >
> > > > Thanks a lot; that's really useful information.  Unfortunately, I'm
> at a
> > > > stage in our release cycle where upgrading to a new version of DPDK
> isn't
> > > > feasible.  Any chance you (or others reading this) has a pointer to
> the
> > > > relevant changes?  While I can't afford to upgrade DPDK entirely,
> > > > backporting targeted fixes is more doable.
> > > >
> > > > Again, thanks.
> > > >
> > > > - Matt
> > > >
> > > >
> > > > On Mon, Nov 16, 2015 at 6:12 PM, Stephen Hemminger <
> > > > stephen at networkplumber.org> wrote:
> > > >
> > > > > On Mon, 16 Nov 2015 17:48:35 -0600
> > > > > Matt Laswell  wrote:
>

[dpdk-dev] DPDK Port Mirroring

2015-07-09 Thread Matt Laswell
Keith speaks truth.  If I were going to do what you're describing, I would
do the following:

1. Start with the l2fwd example application.
2. Remove the part where it modifies the ethernet MAC address of received
packets.
3. Add a call in to clone mbufs via rte_pktmbuf_clone() and send the cloned
packets out of the port of your choice

As long as you don't need to modify the packets - and if you're mirroring,
you shouldn't - simply cloning received packets and sending them out your
mirror port should get you most of the way there.
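
In rough outline, the receive loop ends up looking something like this
(a sketch, assuming a DPDK environment; mirror_port, clone_pool and
BURST_SIZE stand in for your own configuration):

```c
/* Forwarding-loop body: clone each received packet and send the clone
 * out the mirror port. Clones share payload with the originals, which
 * is fine here because mirrored packets are never modified. */
struct rte_mbuf *pkts[BURST_SIZE];
uint16_t nb_rx, i;

nb_rx = rte_eth_rx_burst(rx_port, 0, pkts, BURST_SIZE);
for (i = 0; i < nb_rx; i++) {
    struct rte_mbuf *mc = rte_pktmbuf_clone(pkts[i], clone_pool);

    if (mc != NULL && rte_eth_tx_burst(mirror_port, 0, &mc, 1) == 0)
        rte_pktmbuf_free(mc);   /* mirror queue full: drop the clone */
}
/* ...then forward the originals as l2fwd normally would... */
```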

On Thu, Jul 9, 2015 at 3:17 PM, Wiles, Keith  wrote:

>
>
> On 7/9/15, 12:26 PM, "dev on behalf of Assaad, Sami (Sami)"
>  
> wrote:
>
> >Hello,
> >
> >I want to build a DPDK app that is able to port-mirror all ingress
> >traffic from two 10G interfaces.
> >
> >1.   Is it possible in port-mirroring traffic consisting of 450byte
> >packets at 20G without losing more than 5% of traffic?
> >
> >2.   Would you have any performance results due to packet copying?
>
> Do you need to copy the packet if you increment the reference count you
> can send the packet to both ports without having to copy the packet.
> >
> >3.   Would you have any port mirroring DPDK sample code?
>
> DPDK does not have port mirroring example, but you could grab the l2fwd or
> l3fwd and modify it to do what you want.
> >
> >Thanks in advance.
> >
> >Best Regards,
> >Sami Assaad.
>
>


[dpdk-dev] Kernel panic in KNI

2016-04-07 Thread Matt Laswell
Hey Robert,

Thanks for the insight.  I work with Jay on the code he's asking about; we
only have one mbuf pool that we use for all packets.  Mostly, this is for
the reasons that you describe, as well as for the sake of simplicity.  As
it happens, the stack trace we're seeing makes it look as though either the
mbuf's data pointer is screwed up, or the VA translation done on it is.  I
suspect that we're getting to a failure mode similar to the one you
experienced, though perhaps for different reasons.
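
For the archives, the invariant Robert describes boils down to the
following (a sketch; pkt_pool, conf and ops are illustrative names):

```c
/* The KNI kernel module translates mbuf addresses using the mempool the
 * KNI context was created with, so the same pool must back both the NIC
 * RX queues and the KNI interface. */
struct rte_kni *kni = rte_kni_alloc(pkt_pool /* same pool as RX queues */,
                                    &conf, &ops);
/* Later: every mbuf passed to rte_kni_tx_burst(kni, ...) must have been
 * allocated from pkt_pool, or the in-kernel VA translation is garbage. */
```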

Thanks,
Matt

On Wed, Apr 6, 2016 at 5:30 PM, Sanford, Robert  wrote:

> Hi Jay,
>
> I won't try to interpret your kernel stack trace. But, I'll tell you about
> a KNI-related problem that we once experienced, and the symptom was a
> kernel hang.
>
> The problem was that we were passing mbufs allocated out of one mempool,
> to a KNI context that we had set up with a different mempool (on a
> different CPU socket). The KNI kernel driver, converts the user-space mbuf
> virtual address (VA) to a kernel VA by adding the difference between the
> user and kernel VAs of the mempool used to create the KNI context. So, if
> an mbuf comes from a different mempool, the calculated address will
> probably be VERY BAD.
>
> Could this be your problem?
>
> --
> Robert
>
>
> On 4/6/16 4:16 PM, "Jay Rolette"  wrote:
>
> >I had a system lockup hard a couple of days ago and all we were able to
> >get
> >was a photo of the LCD monitor with most of the kernel panic on it. No way
> >to scroll back the buffer and nothing in the logs after we rebooted. Not
> >surprising with a kernel panic due to an exception during interrupt
> >processing. We have a serial console attached in case we are able to get
> >it
> >to happen again, but it's not easy to reproduce (hours of runtime for this
> >instance).
> >
> >Ran the photo through OCR software to get a text version of the dump, so
> >possible I missed some fixups in this:
> >
> >[39178.433262] RDX: 00ba RSI: 881fd2f350ee RDI:
> >a12520669126180a
> >[39178.464020] RBP: 880433966970 R08: a12520669126180a R09:
> >881fd2f35000
> >[39178.495091] R10:  R11: 881fd2f88000 R12:
> >883fdla75ee8
> >[39178.526594] R13: 00ba R14: 7fdad5a66780 R15:
> >883715ab6780
> >[39178.559011] FS:  77fea740() GS:88lfffc0()
> >knlGS:
> >[39178.592005] CS:  0010 DS:  ES:  CR0: 80050033
> >[39178.623931] CR2: 77ea2000 CR3: 001fd156f000 CR4:
> >001407f0
> >[39178.656187] Stack:
> >[39178.689025] c067c7ef 00ba 00ba
> >881fd2f88000
> >[39178.722682] 4000 8B3fd0bbd09c 883fdla75ee8
> >8804339bb9c8
> >[39178.756525] 81658456 881fcd2ec40c c0680700
> >880436bad800
> >[39178.790577] Call Trace:
> >[39178.824420] [] ? kni_net_tx+0xef/0x1a0 [rte_kni]
> >[39178.859190] [] dev_hard_start_xmit+0x316/0x5c0
> >[39178.893426] [] sch_direct_xmit+0xee/0xic0
> >[39178.927435] [l __dev_queue_xmit+0x200/0x4d0
> >[39178.961684] [l dev_queue_xmit+0x10/0x20
> >[39178.996194] [] neigh_connected_output+0x67/0x100
> >[39179.031098] [] ip_finish_output+0xid8/0x850
> >[39179.066709] [l ip_output+0x58/0x90
> >[39179.101551] [] ip_local_out_sk+0x30/0x40
> >[39179.136823] [] ip_queue_xmit+0xl3f/0x3d0
> >[39179.171742] [] tcp_transmit_skb+0x47c/0x900
> >[39179.206854] [l tcp_write_xmit+0x110/0xcb0
> >[39179.242335] [] __tcp_push_pending_frames+0x2e/0xc0
> >[39179.277632] [] tcp_push+0xec/0x120
> >[39179.311768] [] tcp_sendmsg+0xb9/0xce0
> >[39179.346934] [] ? tcp_recvmsg+0x6e2/0xba0
> >[39179.385586] [] inet_sendmsg+0x64/0x60
> >[39179.424228] [] ? apparmor_socket_sendmsg+0x21/0x30
> >[39179.4586581 [] sock_sendmsg+0x86/0xc0
> >[39179.493220] [] ? __inet_stream_connect+0xa5/0x320
> >[39179.528033] [] ? __fdget+0x13/0x20
> >[39179.561214] [] SYSC_sendto+0x121/0x1c0
> >[39179.594665] [] ? aa_sk_perm.isra.4+0x6d/0x150
> >[39179.6268931 [] ? read_tsc+0x9/0x20
> >[39179.6586541 [] ? ktime_get_ts+0x48/0xe0
> >[39179.689944] [] SyS_sendto+0xe/0x10
> >[39179.719575] [] system_call_fastpath+0xia/0xif
> >[39179.748760] Code: 43 58 48 Zb 43 50 88 43 4e 5b 5d c3 66 Of if 84 00 00
> >00 00 00 e8 fb fb ff ff eb e2 90 90 90 90 90 90 90
> > 90 48 89 f8 48 89 d1  a4 c3 03 83 eZ 07 f3 48 .15 89 di f3 a4 c3 20
> >4c
> >8b % 4c 86
> >[39179.808690] RIP  [] memcpy+0x6/0x110
> >[39179.837238]  RSP 
> >[39179.933755] ---[ end trace

Re: [dpdk-dev] [PATCH 1/2] test: replace license text with SPDX tag

2019-08-14 Thread Peters, Matt
> -Original Message-
> From: Legacy, Allain
> Sent: Tuesday, August 13, 2019 8:20 AM
> To: hemant.agra...@nxp.com
> Cc: dev@dpdk.org; john.mcnam...@intel.com; marko.kovace...@intel.com;
> cristian.dumitre...@intel.com; Peters, Matt
> Subject: [PATCH 1/2] test: replace license text with SPDX tag
> 
> Replacing full license text with SPDX tag.
> 
> Signed-off-by: Allain Legacy 
> ---
Acked-by: Matt Peters 


Re: [dpdk-dev] [PATCH 2/2] doc: replace license text with SPDX tag

2019-08-14 Thread Peters, Matt
> -Original Message-
> From: Legacy, Allain
> Sent: Tuesday, August 13, 2019 8:20 AM
> To: hemant.agra...@nxp.com
> Cc: dev@dpdk.org; john.mcnam...@intel.com; marko.kovace...@intel.com;
> cristian.dumitre...@intel.com; Peters, Matt
> Subject: [PATCH 2/2] doc: replace license text with SPDX tag
> 
> Replace full license text with SPDX tag.
> 
> Signed-off-by: Allain Legacy 
Acked-by: Matt Peters 


Re: [dpdk-dev] [PATCH v3] net/avp: remove resources when port is closed

2019-06-19 Thread Peters, Matt
> -Original Message-
> From: Legacy, Allain
> Sent: Tuesday, June 18, 2019 3:19 PM
> To: tho...@monjalon.net
> Cc: dev@dpdk.org; ferruh.yi...@intel.com; Peters, Matt
> Subject: [PATCH v3] net/avp: remove resources when port is closed
> 
> The rte_eth_dev_close() function now handles freeing resources for
> devices (e.g., mac_addrs).  To conform with the new close() behaviour we
> are asserting the RTE_ETH_DEV_CLOSE_REMOVE flag so that
> rte_eth_dev_close() releases all device level dynamic memory.
> 
> Second level memory allocated to each individual rx/tx queue is now
> freed as part of the close() operation therefore making it safe for the
> rte_eth_dev_close() function to free the device private data without
> orphaning the rx/tx queue pointers.
> 
> Cc: Matt Peters 
> Signed-off-by: Allain Legacy 

Acked-by: Matt Peters 


Re: [dpdk-dev] [PATCH v2] net/avp: remove resources when port is closed

2019-05-31 Thread Peters, Matt



> -Original Message-
> From: Legacy, Allain
> Sent: Monday, May 27, 2019 1:03 PM
> To: tho...@monjalon.net
> Cc: dev@dpdk.org; ferruh.yi...@intel.com; Peters, Matt
> Subject: [PATCH v2] net/avp: remove resources when port is closed
> 
> The rte_eth_dev_close() function now handles freeing resources for
> devices (e.g., mac_addrs).  To conform with the new close() behaviour we
> are asserting the RTE_ETH_DEV_CLOSE_REMOVE flag so that
> rte_eth_dev_close() releases all device level dynamic memory.
> 
> Second level memory allocated to each individual rx/tx queue is now
> freed as part of the close() operation therefore making it safe for the
> rte_eth_dev_close() function to free the device private data without
> orphaning the rx/tx queue pointers.
> 
> Cc: Matt Peters 
> Signed-off-by: Allain Legacy 
> ---
Acked-by: Matt Peters 


[dpdk-dev] [dpdk-moving] Draft Project Charter

2016-11-08 Thread Matt Spencer
I think we need a discussion about the levels of membership - possibly at next 
week's meeting?


My feeling is that we need more than one level

  - One to enable contribution of hardware to the lab, as the lab will add cost 
to the overall project budget

  - A second to enable contribution to the marketing aspects of the project and 
to allow association for marketing purposes


Calling these Gold and Silver is fine with me, but as I say, let's discuss this 
at next week's meeting.


Matt


From: moving  on behalf of O'Driscoll, Tim 

Sent: 08 November 2016 03:57:36
To: Vincent JARDIN; moving at dpdk.org
Cc: dev at dpdk.org
Subject: Re: [dpdk-moving] [dpdk-dev] Draft Project Charter


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vincent JARDIN
> Sent: Tuesday, November 8, 2016 11:41 AM
> To: moving at dpdk.org
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [dpdk-moving] Draft Project Charter
>
> Tim,
>
> Thanks for your draft, but it is not a good proposal. It is not written
> in the spirit that we have discussed in Dublin:
>- you create the status of "Gold" members that we do not want from
> Linux Foundation,

As I said in the email, I put in two levels of membership as a placeholder. The 
first thing we need to decide is if we want to have a budget and membership, or 
if we want the OVS model with 0 budget and no membership. We can discuss that 
at today's meeting.

If we do want a membership model then we'll need to decide if everybody 
contributes at the same rate or if we support multiple levels. So, for now, the 
text on having two levels is just an example to show what a membership model 
might look like.

>- you start with "DPDK's first $1,000,000", it is far from the $O
> that we agreed based on OVS model.

That's just standard text that I see in all the LF charters. It's even in the 
OVS charter (http://openvswitch.org/charter/charter.pdf) even though they have 
0 budget. I assumed it's standard text for the LF. I'm sure Mike Dolan can 
clarify.

>
> Please, explain why you did change it?
>
> Thank you,
>Vincent


[dpdk-dev] Running kni with low amount of cores

2014-07-09 Thread Olson, Matt Lyle
Hello,

I have two NIC devices and a quad-core system that I'm trying to run kni on. I 
would like to leave two cores for general use and two cores for kni. When I run 
kni on just one of the ports, everything works fine and I can use that vEth 
normally. The exact command I run is this:

./kni -c 0x0c -n 2 -- -P -p 0x1 --config="(0,2,3)"

But when I try to run kni on both ports, I can't find a configuration that 
makes it work. Here are all the configs I have tried, but none of them seem to 
work properly the way a single port does: "(0,2,3),(1,2,3)", "(0,2,3),(1,3,2)", 
"(0,2,2),(1,3,3)".

I'm wondering if it is supposed to work this way, where each port needs its own 
Tx and Rx core, or if there is a way to get around it. If it is supposed to 
work this way, would it be worth my time to edit the code to allow me to have 
all Rx information dealt with on one core and all Tx on another?

Thanks,
Matt Olson
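[Editor's note: for readers hitting the same question: the kni example app's
--config option takes one (port,lcore_rx,lcore_tx) triple per port, and it is
not clear from the message why the shared-core attempts failed. One thing worth
checking is that the port mask also covers both ports: -p 0x1 enables only port
0, so a two-port run needs -p 0x3. A hypothetical invocation for the layout
above, assuming cores 2 and 3 remain reserved for KNI:]

```shell
# Sketch only -- assumes the standard DPDK kni example app and the core
# layout from the message above. -p 0x3 enables ports 0 and 1; both
# ports share core 2 for Rx and core 3 for Tx via their config triples.
./kni -c 0x0c -n 2 -- -P -p 0x3 --config="(0,2,3),(1,2,3)"
```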