[dpdk-dev] failure with gmail accounts

2014-11-03 Thread Jay Rolette
What was the period of the issue? Also, does this apply just to gmail.com
email addresses, or also to other domains hosted by Google?

On Mon, Nov 3, 2014 at 3:25 AM, Thomas Monjalon 
wrote:

> Hi all,
>
> There was a major failure with gmail accounts.
> Due to bounce errors, the mailing list system has removed these accounts.
> They are now restored but some preferences may be lost.
> Obviously, these users may have missed some emails.
> Please check the archives:
> http://dpdk.org/ml/archives/dev/2014-November/date.html
>
> --
> Thomas
>


[dpdk-dev] VMware Fusion + DPDK and KNI

2014-08-18 Thread Jay Rolette
Thought I'd put this out there in case anyone else runs into it.

Using DPDK 1.6 on Ubuntu 14.04 LTS in a hardware appliance. Also using KNI
to share the data ports with an app that needs a normal TCP/IP stack
interface.

We had everything working reasonably well on the hardware, but most of our
developers run their code in a VM on their laptops: OS X (Mavericks) +
VMware Fusion 6 Pro.

On some VMs, we were getting errors trying to configure KNI ports:

$ sudo ifconfig ffEth0 10.111.2.100 netmask 255.255.0.0 up
SIOCSIFFLAGS: Timer expired
SIOCSIFFLAGS: Timer expired

Skipping the "fun" involved with trying to track down the problem, here's
what ended up fixing it.

We had 4 network ports on the VM:

   - eth0 - Management port
   - eth1 - "other" function not related to the problem
   - eth2 & eth3 - inline datapath (bump-in-the-wire), but also KNI mapped
   to ffEth0 & ffEth1 by our DPDK app

If eth2 and eth3 are on the same vmnet, you'll get the "SIOCSIFFLAGS: Timer
expired" errors. Depending on which parameters you try to set, ifconfig may
think some of them took effect (they didn't) or it may report the failure
directly (setting the MTU, etc.).

If you put eth2 and eth3 on separate vmnets, then no issues and you can
configure the KNI ports via ifconfig as expected.

No idea why having the ports on the same vmnet matters, since our app
doesn't care, but I haven't gone spelunking through the KNI source to find
the root cause.

Doubtful this will matter to many (any?), but maybe it'll save someone some
time.

Jay Rolette
*infinite io*


[dpdk-dev] [PATCH 01/11] ixgbe: clean log messages

2014-08-26 Thread Jay Rolette
Why are you adding newlines to log message strings? Shouldn't that be up to
whatever the messages end up getting routed to?

Jay


On Tue, Aug 26, 2014 at 9:09 AM, David Marchand 
wrote:

> Clean log messages:
> - remove superfluous \n in log macros and add some \n where needed,
> - remove leading \n in some messages,
> - split multi lines messages,
> - replace some PMD_INIT_LOG(DEBUG, "some_func\n") with
> PMD_INIT_FUNC_TRACE().
>
> Signed-off-by: David Marchand 
> ---
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   99
> +--
>  lib/librte_pmd_ixgbe/ixgbe_fdir.c   |   12 ++---
>  lib/librte_pmd_ixgbe/ixgbe_logs.h   |   12 ++---
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c   |   14 ++---
>  4 files changed, 68 insertions(+), 69 deletions(-)
>
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> index 59122a1..ac00323 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> @@ -737,7 +737,7 @@ eth_ixgbe_dev_init(__attribute__((unused)) struct
> eth_driver *eth_drv,
>  #endif /* RTE_NIC_BYPASS */
>
> if (diag != IXGBE_SUCCESS) {
> -   PMD_INIT_LOG(ERR, "Shared code init failed: %d", diag);
> +   PMD_INIT_LOG(ERR, "Shared code init failed: %d\n", diag);
> return -EIO;
> }
>
> @@ -763,7 +763,7 @@ eth_ixgbe_dev_init(__attribute__((unused)) struct
> eth_driver *eth_drv,
> /* Make sure we have a good EEPROM before we read from it */
> diag = ixgbe_validate_eeprom_checksum(hw, &csum);
> if (diag != IXGBE_SUCCESS) {
> -   PMD_INIT_LOG(ERR, "The EEPROM checksum is not valid: %d",
> diag);
> +   PMD_INIT_LOG(ERR, "The EEPROM checksum is not valid:
> %d\n", diag);
> return -EIO;
> }
>
> @@ -791,13 +791,14 @@ eth_ixgbe_dev_init(__attribute__((unused)) struct
> eth_driver *eth_drv,
> if (diag == IXGBE_ERR_EEPROM_VERSION) {
> PMD_INIT_LOG(ERR, "This device is a pre-production
> adapter/"
> "LOM.  Please be aware there may be issues associated "
> -   "with your hardware.\n If you are experiencing
> problems "
> +   "with your hardware.\n");
> +   PMD_INIT_LOG(ERR, "If you are experiencing problems "
> "please contact your Intel or hardware representative "
> "who provided you with this hardware.\n");
> } else if (diag == IXGBE_ERR_SFP_NOT_SUPPORTED)
> PMD_INIT_LOG(ERR, "Unsupported SFP+ Module\n");
> if (diag) {
> -   PMD_INIT_LOG(ERR, "Hardware Initialization Failure: %d",
> diag);
> +   PMD_INIT_LOG(ERR, "Hardware Initialization Failure: %d\n",
> diag);
> return -EIO;
> }
>
> @@ -813,7 +814,7 @@ eth_ixgbe_dev_init(__attribute__((unused)) struct
> eth_driver *eth_drv,
> if (eth_dev->data->mac_addrs == NULL) {
> PMD_INIT_LOG(ERR,
> "Failed to allocate %u bytes needed to store "
> -   "MAC addresses",
> +   "MAC addresses\n",
> ETHER_ADDR_LEN * hw->mac.num_rar_entries);
> return -ENOMEM;
> }
> @@ -826,7 +827,8 @@ eth_ixgbe_dev_init(__attribute__((unused)) struct
> eth_driver *eth_drv,
> IXGBE_VMDQ_NUM_UC_MAC, 0);
> if (eth_dev->data->hash_mac_addrs == NULL) {
> PMD_INIT_LOG(ERR,
> -   "Failed to allocate %d bytes needed to store MAC
> addresses",
> +   "Failed to allocate %d bytes needed to store "
> +   "MAC addresses\n",
> ETHER_ADDR_LEN * IXGBE_VMDQ_NUM_UC_MAC);
> return -ENOMEM;
> }
> @@ -850,14 +852,14 @@ eth_ixgbe_dev_init(__attribute__((unused)) struct
> eth_driver *eth_drv,
>
> if (ixgbe_is_sfp(hw) && hw->phy.sfp_type !=
> ixgbe_sfp_type_not_present)
> PMD_INIT_LOG(DEBUG,
> -"MAC: %d, PHY: %d, SFP+: %d",
> +"MAC: %d, PHY: %d, SFP+: %d\n",
>  (int) hw->mac.type, (int) hw->phy.type,
>  (int) hw->phy.sfp_type);
> else
> PMD_INIT_LOG(DEBUG, "MAC: %d, PHY: %d\n",
>  (int) hw->mac.type, (int) hw->phy.type);
>
> -   PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
> +   PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x\n",
> eth_dev->data->port_id, pci_dev->id.vendor_id,
> pci_dev->id.device_id);
>
> @@ -933,7 +935,7 @@ eth_ixgbevf_dev_init(__attribute__((unused)) struct
> eth_driver *eth_drv,
>
> IXGBE_DEV_PRIVATE_TO_HWSTRIP_BITMAP(eth_dev->data->dev_private);
> struct ether_addr *perm_addr = (struct ether_addr *)
> hw->mac.perm_addr;
>
> -   P

[dpdk-dev] [PATCH 01/11] ixgbe: clean log messages

2014-08-27 Thread Jay Rolette
Hi David,

The updated output is definitely an improvement, but if you go with the
first solution you described (adding \n in the log macro), it works much
better for products using DPDK. For developer use, I think it ends up being
a wash either way.

At least for my product (an embedded network appliance), we want to capture
anything DPDK logs in syslog. The log macros in DPDK make that very
reasonable to do, but if there are embedded newlines, it ends up screwing
up log parsing when someone wants to process the logs later.

We end up having to remove all of those newlines, which makes it harder to
merge as new releases come out.

I'm assuming most products have similar requirements for logging. That's at
least been the case for the products I've been involved with over the years.

If the PMDs are using PMD_LOG as a replacement macro for debugging
printf's, I can see where this might be a bit more painful, but PMD_LOG is
a lot more useful if it integrates easily with central logging facilities.
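The first option described above — appending the newline once, inside the log macro itself — can be sketched as follows. This is a minimal illustration, not DPDK's actual macro; `APP_LOG` and `app_log_write` are made-up names, and the backend is a stand-in for a syslog hook:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical backend; in a product this would forward each complete,
 * newline-free record to syslog(3) or a central log pipeline. */
static char last_record[256];

static void app_log_write(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(last_record, sizeof(last_record), fmt, ap);
	va_end(ap);
	fputs(last_record, stderr);
}

/* The '\n' is appended exactly once, here, so no caller ever embeds
 * newlines and multi-line records can't leak into the logs. */
#define APP_LOG(fmt, ...) app_log_write(fmt "\n", ##__VA_ARGS__)
```

A caller then writes `APP_LOG("Shared code init failed: %d", diag);` and a downstream syslog parser sees exactly one line per event.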

Thanks,
Jay


On Tue, Aug 26, 2014 at 9:55 AM, David Marchand 
wrote:

> Hello Jay,
>
> On Tue, Aug 26, 2014 at 4:23 PM, Jay Rolette 
> wrote:
>
>> Why are you adding newlines to log message strings? Shouldn't that be up
>> to whatever the messages end up getting routed to?
>>
>
> Actually, I wanted to have consistent log formats in the PMDs so that the
> log messages displayed at runtime are aligned (and not bouncing with \n
> before or after the log message).
> There were two solutions from my point of view:
> - either always add a \n in the log macro and remove all trailing \n
> - do the opposite
>
> I preferred the latter as it let users of the macro set the message format
> as they want.
>
>
> Before this change :
>
> Configuring Port 0 (socket 1)
> PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f3e59cd8080
> hw_ring=0x7f3e59d02080 dma_addr=0x727702080
>
> PMD: ixgbe_dev_tx_queue_setup(): Using simple tx code path
>
> PMD: ixgbe_dev_tx_queue_setup(): Vector tx enabled.
>
> PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f3e59cd57c0
> hw_ring=0x7f3e59d12080 dma_addr=0x727712080
>
> PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are
> satisfied. Rx Burst Bulk Alloc function will be used on port=0, queue=0.
>
> PMD: ixgbe_dev_rx_queue_setup(): Vector rx enabled, please make sure RX
> burst size no less than 32.
>
>
> After this change :
>
> Configuring Port 0 (socket 1)
> PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7fd47ecd8080
> hw_ring=0x7fd47ed02080 dma_addr=0x727702080
> PMD: ixgbe_dev_tx_queue_setup(): Using simple tx code path
> PMD: ixgbe_dev_tx_queue_setup(): Vector tx enabled.
> PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7fd47ecd57c0
> hw_ring=0x7fd47ed12080 dma_addr=0x727712080
> PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are
> satisfied. Rx Burst Bulk Alloc function will be used on port=0, queue=0.
> PMD: ixgbe_dev_rx_queue_setup(): Vector rx enabled, please make sure RX
> burst size no less than 32.
> Port 0: 90:E2:BA:29:DF:58
>
>
> --
> David Marchand
>


[dpdk-dev] Clang Scan build results

2014-08-27 Thread Jay Rolette
> We could run something like PC-Lint or Coverity, but they cost money :-)
Pretty sure Coverity is free for open source projects...

Jay

On Wed, Aug 27, 2014 at 10:13 AM, Wiles, Roger Keith <
keith.wiles at windriver.com> wrote:

> Hi Everyone,
>
> I built dpdk with Clang and used the scan-build analyzer to produce a
> report. The report is about 13M in size so not very nice to send to the
> list. I decided to place the report on github.com if you want to see the
> results.
>
> While running scan-build the build would fail, but I forced the build to
> continue using the scan-build option to get as much of the report as
> possible. I did not have time yet to understand why the build stopped and I
> hope to dig into it more later.
>
> Running scan-build is pretty simple, so would be a nice test report if you
> wanted to add it to dpdk.org site and maintain its output for review. It
> would be nice to run once in a while to weed out any basic problems. We
> could run something like PC-Lint or Coverity, but they cost money :-)
>
> Here is the link to my Pktgen-DPDK repos on github:
>
> # git clone git://github.com/Pktgen/dpdk-scan-build-results.git
>
> Let me know if you have any questions or suggestions.
>
> Thanks
> ++Keith
>
>
> Keith Wiles, Principal Technologist with CTO office, Wind River mobile
> 972-213-5533
>
>


[dpdk-dev] Clang Scan build results

2014-08-27 Thread Jay Rolette
Here's the link you want: https://scan.coverity.com/

Check the FAQ for terms and you'll need to register the product, but I
didn't see anything obvious that should get in the way from being able to
use it for DPDK.

Agree with Coverity not being cheap. I bought it for my last company.
Really liked that Coverity had a much higher signal-to-noise ratio than
other tools at the time. Possibly the gap has closed since then, but you
know it can't be as bad as lint!


On Wed, Aug 27, 2014 at 10:52 AM, Wiles, Roger Keith <
keith.wiles at windriver.com> wrote:

>  Nice, we had to buy one and that was not cheap :-) I groped around on the
> Coverity site and did not find any statement about being free to open
> source, but I may have just missed it.
>
>  I did find that PC-Lint, Coverity, scan-build, etc. all seem to test
> different parts of your code and some are better than others. Some have a
> lot of false positives and some report a huge number of issues, but it just
> depends on the type and level of scan you want. One thing I found was you
> need to run different tools to find different problems, as none of them do
> everything IMO.
>
>  On Aug 27, 2014, at 10:24 AM, Jay Rolette  wrote:
>
>  > We could run something like PC-Lint or Coverity, but they cost money
> > :-)
> Pretty sure Coverity is free for open source projects...
>
>  Jay
>
> On Wed, Aug 27, 2014 at 10:13 AM, Wiles, Roger Keith <
> keith.wiles at windriver.com> wrote:
>
>> Hi Everyone,
>>
>> I built dpdk with Clang and used the scan-build analyzer to produce a
>> report. The report is about 13M in size so not very nice to send to the
>> list. I decided to place the report on github.com if you want to see the
>> results.
>>
>> While running scan-build the build would fail, but I forced the build to
>> continue using the scan-build option to get as much of the report as
>> possible. I did not have time yet to understand why the build stopped and I
>> hope to dig into it more later.
>>
>> Running scan-build is pretty simple, so would be a nice test report if
>> you wanted to add it to dpdk.org site and maintain its output for
>> review. It would be nice to run once in a while to weed out any basic
>> problems. We could run something like PC-Lint or Coverity, but they cost
>> money :-)
>>
>> Here is the link to my Pktgen-DPDK repos on github:
>>
>> # git clone git://github.com/Pktgen/dpdk-scan-build-results.git
>>
>> Let me know if you have any questions or suggestions.
>>
>> Thanks
>> ++Keith
>>
>>
>> Keith Wiles, Principal Technologist with CTO office, Wind River mobile
>> 972-213-5533
>>
>>
>
>  Keith Wiles, Principal Technologist with CTO office, Wind River mobile
> 972-213-5533
>
>


[dpdk-dev] [PATCH] Avoid possible memory copy when sorting hugepages

2014-12-10 Thread Jay Rolette
On Wed, Dec 10, 2014 at 5:08 AM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

> I wonder why we do need to write our own bubble sort procedure?
> Why we can't use standard qsort() here?
>

Sadly, even bubble sort would be better than the selection sort being used
here. It's guaranteed to be O(n^2) in all cases.

I just got through replacing that entire function in my repo with a call to
qsort() from the standard library last night myself. Faster (although
probably not material to most deployments) and less code.

Jay


[dpdk-dev] [PATCH] Avoid possible memory copy when sorting hugepages

2014-12-10 Thread Jay Rolette
On Wed, Dec 10, 2014 at 12:39 PM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

> > I just got through replacing that entire function in my repo with a call
> to qsort() from the standard library last night myself. Faster
> > (although probably not material to most deployments) and less code.
>
> If you feel like it is worth it, why not to submit a patch? :)


On Haswell and IvyBridge Xeons, with 128 1G huge pages, it doesn't make a
user-noticeable difference in the time required for
rte_eal_hugepage_init(). The reason I went ahead and checked it in my repo
is because:

a) it eats at my soul to see an O(n^2) case for something where qsort() is
trivial to use
b) we will increase that up to ~232 1G huge pages soon. Likely doesn't
matter at that point either, but since it was already written...

What *does* chew up a lot of time in init is where the huge pages are being
explicitly zeroed in map_all_hugepages().

Removing that memset() makes find_numasocket() blow up, but I was able to
do a quick test where I only memset 1 byte on each page. That cut init time
by 30% (~20 seconds in my test).  Significant, but since I'm not entirely
sure it is safe, I'm not making that change right now.
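The quick test described above — touching one byte per page instead of clearing the whole mapping — looks roughly like this. It is a sketch under the same caveat stated above: one write per page is enough to fault the page in so the NUMA lookup works, but it does not prove the rest of the page can safely go uncleared. `touch_one_byte_per_page` is an illustrative name, not a DPDK function:

```c
#include <stddef.h>
#include <stdint.h>

/* Fault in each page of a mapping with a single 1-byte write instead of
 * a full memset().  Enough for a page-table walk (e.g. find_numasocket())
 * to resolve each page, but it does NOT guarantee the remaining bytes are
 * zeroed -- which is why this change wasn't submitted upstream. */
static void touch_one_byte_per_page(uint8_t *base, size_t nr_pages,
				    size_t page_sz)
{
	size_t i;

	for (i = 0; i < nr_pages; i++)
		base[i * page_sz] = 0; /* one write faults in page i */
}
```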

On Linux, shared memory that isn't file-backed is automatically zeroed
before the app gets it. However, I haven't had a chance to chase down
whether that applies to huge pages or not, much less how hugetlbfs factors
into the equation.

Back to the question about the patch, if you guys are interested in it,
I'll have to figure out your patch submission process. Shouldn't be a huge
deal other than the fact that we are on DPDK 1.6 (r2).

Cheers,
Jay


[dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr() with qsort() from standard library

2014-12-11 Thread Jay Rolette
Signed-off-by: Jay Rolette 
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 59
+++-
 1 file changed, 20 insertions(+), 39 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index bae2507..3656515 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -670,6 +670,25 @@ error:
  return -1;
 }

+static int
+cmp_physaddr(const void *a, const void *b)
+{
+#ifndef RTE_ARCH_PPC_64
+ const struct hugepage_file *p1 = (const struct hugepage_file *)a;
+ const struct hugepage_file *p2 = (const struct hugepage_file *)b;
+#else
+ // PowerPC needs memory sorted in reverse order from x86
+ const struct hugepage_file *p1 = (const struct hugepage_file *)b;
+ const struct hugepage_file *p2 = (const struct hugepage_file *)a;
+#endif
+ if (p1->physaddr < p2->physaddr)
+ return -1;
+ else if (p1->physaddr > p2->physaddr)
+ return 1;
+ else
+ return 0;
+}
+
 /*
  * Sort the hugepg_tbl by physical address (lower addresses first on x86,
  * higher address first on powerpc). We use a slow algorithm, but we won't
@@ -678,45 +697,7 @@ error:
 static int
 sort_by_physaddr(struct hugepage_file *hugepg_tbl, struct hugepage_info
*hpi)
 {
- unsigned i, j;
- int compare_idx;
- uint64_t compare_addr;
- struct hugepage_file tmp;
-
- for (i = 0; i < hpi->num_pages[0]; i++) {
- compare_addr = 0;
- compare_idx = -1;
-
- /*
- * browse all entries starting at 'i', and find the
- * entry with the smallest addr
- */
- for (j=i; j< hpi->num_pages[0]; j++) {
-
- if (compare_addr == 0 ||
-#ifdef RTE_ARCH_PPC_64
- hugepg_tbl[j].physaddr > compare_addr) {
-#else
- hugepg_tbl[j].physaddr < compare_addr) {
-#endif
- compare_addr = hugepg_tbl[j].physaddr;
- compare_idx = j;
- }
- }
-
- /* should not happen */
- if (compare_idx == -1) {
- RTE_LOG(ERR, EAL, "%s(): error in physaddr sorting\n", __func__);
- return -1;
- }
-
- /* swap the 2 entries in the table */
- memcpy(&tmp, &hugepg_tbl[compare_idx],
- sizeof(struct hugepage_file));
- memcpy(&hugepg_tbl[compare_idx], &hugepg_tbl[i],
- sizeof(struct hugepage_file));
- memcpy(&hugepg_tbl[i], &tmp, sizeof(struct hugepage_file));
- }
+ qsort(hugepg_tbl, hpi->num_pages[0], sizeof(struct hugepage_file),
cmp_physaddr);
  return 0;
 }

--


[dpdk-dev] [PATCH] Fixed spam from kni_allocate_mbufs() when no mbufs are free.

2014-12-12 Thread Jay Rolette
Fixed spam from kni_allocate_mbufs() when no mbufs are free.
If mbufs are exhausted, the 'out of memory' message is logged at extremely
high rates. It is now logged no more than once per 10 minutes.

Signed-off-by: Jay Rolette 
---
 lib/librte_kni/rte_kni.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index fdb7509..f89319c 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -40,6 +40,7 @@
 #include 
 #include 

+#include 
 #include 
 #include 
 #include 
@@ -61,6 +62,9 @@

 #define KNI_MEM_CHECK(cond) do { if (cond) goto kni_fail; } while (0)

+// Configure how often we log "out of memory" messages (in seconds)
+#define KNI_SPAM_SUPPRESSION_PERIOD (60*10)
+
 /**
  * KNI context
  */
@@ -592,6 +596,10 @@ kni_free_mbufs(struct rte_kni *kni)
 static void
 kni_allocate_mbufs(struct rte_kni *kni)
 {
+ static uint64_t no_mbufs = 0;
+ static uint64_t spam_filter = 0;
+ static uint64_t delayPeriod = 0;
+
  int i, ret;
  struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];

@@ -620,7 +628,18 @@ kni_allocate_mbufs(struct rte_kni *kni)
  pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
  if (unlikely(pkts[i] == NULL)) {
  /* Out of memory */
- RTE_LOG(ERR, KNI, "Out of memory\n");
+ no_mbufs++;
+
+ // Memory leak or need to tune? Regardless, if we get here once,
+ // we will get here a *lot*. Don't spam the logs!
+ uint64_t now = rte_get_tsc_cycles();
+ if (!delayPeriod)
+delayPeriod = rte_get_tsc_hz() * KNI_SPAM_SUPPRESSION_PERIOD;
+
+ if (!spam_filter || (now - spam_filter) > delayPeriod) {
+ RTE_LOG(ERR, KNI, "No mbufs available (%llu)\n", (unsigned long
long)no_mbufs);
+ spam_filter = now;
+ }
  break;
  }
  }
--


[dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr() with qsort() from standard library

2014-12-15 Thread Jay Rolette
Because it doesn't work correctly :-)

On Mon, Dec 15, 2014 at 3:05 AM, Wodkowski, PawelX <
pawelx.wodkowski at intel.com> wrote:
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jay Rolette
> > Sent: Thursday, December 11, 2014 5:06 PM
> > To: Dev
> > Subject: [dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr()
> with
> > qsort() from standard library
> >
> > Signed-off-by: Jay Rolette 
> > ---
> >  lib/librte_eal/linuxapp/eal/eal_memory.c | 59
> > +++-
> >  1 file changed, 20 insertions(+), 39 deletions(-)
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c
> > b/lib/librte_eal/linuxapp/eal/eal_memory.c
> > index bae2507..3656515 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> > @@ -670,6 +670,25 @@ error:
> >   return -1;
> >  }
> >
> > +static int
> > +cmp_physaddr(const void *a, const void *b)
> > +{
> > +#ifndef RTE_ARCH_PPC_64
> > + const struct hugepage_file *p1 = (const struct hugepage_file *)a;
> > + const struct hugepage_file *p2 = (const struct hugepage_file *)b;
> > +#else
> > + // PowerPC needs memory sorted in reverse order from x86
> > + const struct hugepage_file *p1 = (const struct hugepage_file *)b;
> > + const struct hugepage_file *p2 = (const struct hugepage_file *)a;
> > +#endif
> > + if (p1->physaddr < p2->physaddr)
> > + return -1;
> > + else if (p1->physaddr > p2->physaddr)
> > + return 1;
> > + else
> > + return 0;
> > +}
> > +
>
> Why not simply
>
> return (int)(p1->physaddr - p2->physaddr);
>
>


[dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr() with qsort() from standard library

2014-12-15 Thread Jay Rolette
FWIW, it surprised the heck out of me as well.

Turns out that even though I'm compiling in 64-bit mode, gcc has
sizeof(int) as 4 bytes rather than 8. That's the LP64 model used on 64-bit
Linux (int stays 32-bit; long and pointers are 64-bit), and the standard
leaves it up to the compiler. I'm used to int always being the "natural
size" on the CPU/OS, so for a 64-bit executable on Intel, I assumed int
would be 64-bit.

Being an embedded developer for so many years, I rarely use semi-defined
data types just to avoid these sorts of problems.
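The failure mode discussed above can be shown in a few lines of standalone C (illustrative code, not from the patch): two physical addresses that differ only above bit 31 compare as "equal" once the 64-bit difference is cast down to a 32-bit int.

```c
#include <stdint.h>

/* The one-liner suggested in review: truncates the 64-bit difference
 * of two physical addresses to a 32-bit int. */
static int cmp_truncating(uint64_t a, uint64_t b)
{
	return (int)(a - b); /* buggy: discards the high 32 bits */
}

/* Safe comparison, as in the patch: compare, don't subtract. */
static int cmp_correct(uint64_t a, uint64_t b)
{
	if (a < b)
		return -1;
	if (a > b)
		return 1;
	return 0;
}

/* Example: 0x0 vs 0x100000000 differ only above bit 31, so the
 * truncated difference is 0 and the buggy comparator calls them equal,
 * while the correct comparator sorts 0x0 first. */
```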

On Mon, Dec 15, 2014 at 7:20 AM, Wodkowski, PawelX <
pawelx.wodkowski at intel.com> wrote:
>
> > Because it doesn't work correctly :-)
>
> It should, what I am missing here? :P
>
> Pawel
>
>


[dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr() with qsort() from standard library

2014-12-16 Thread Jay Rolette
Thanks Konstantin. Yes, I'll resend. Not sure why gmail removed the
whitespace when I sent it in Plain Text mode.

Ultimately I'll need to figure out how to properly configure git to send
these directly instead of handling them more manually. The examples I saw
assumed you were using a gmail.com email rather than a corporate email
hosted via google apps.
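For what it's worth, a Google-hosted corporate domain uses the same SMTP endpoints as a plain gmail.com account, so a git send-email setup can be sketched like this (the user name below is illustrative, not Jay's actual configuration):

```ini
# ~/.gitconfig -- send patches through Google-hosted mail
[sendemail]
	smtpserver = smtp.gmail.com
	smtpserverport = 587
	smtpencryption = tls
	smtpuser = user@example-corp.com
```

With that in place, `git send-email --to dev@dpdk.org 0001-my-patch.patch` sends the patch exactly as git formatted it, avoiding the whitespace mangling a webmail client introduces.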

Jay

On Tue, Dec 16, 2014 at 12:39 PM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:
>
>
> Hi Jay,
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jay Rolette
> > Sent: Thursday, December 11, 2014 4:06 PM
> > To: Dev
> > Subject: [dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr()
> with qsort() from standard library
> >
> > Signed-off-by: Jay Rolette 
>
> The patch itself looks good to me.
> Though it seems something wrong with formatting - all lines start with
> offset 0.
> Probably your mail client?
> Konstantin
>
>
> > ---
> >  lib/librte_eal/linuxapp/eal/eal_memory.c | 59
> > +++-
> >  1 file changed, 20 insertions(+), 39 deletions(-)
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c
> > b/lib/librte_eal/linuxapp/eal/eal_memory.c
> > index bae2507..3656515 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> > @@ -670,6 +670,25 @@ error:
> >   return -1;
> >  }
> >
> > +static int
> > +cmp_physaddr(const void *a, const void *b)
> > +{
> > +#ifndef RTE_ARCH_PPC_64
> > + const struct hugepage_file *p1 = (const struct hugepage_file *)a;
> > + const struct hugepage_file *p2 = (const struct hugepage_file *)b;
> > +#else
> > + // PowerPC needs memory sorted in reverse order from x86
> > + const struct hugepage_file *p1 = (const struct hugepage_file *)b;
> > + const struct hugepage_file *p2 = (const struct hugepage_file *)a;
> > +#endif
> > + if (p1->physaddr < p2->physaddr)
> > + return -1;
> > + else if (p1->physaddr > p2->physaddr)
> > + return 1;
> > + else
> > + return 0;
> > +}
> > +
> >  /*
> >   * Sort the hugepg_tbl by physical address (lower addresses first on
> x86,
> >   * higher address first on powerpc). We use a slow algorithm, but we
> won't
> > @@ -678,45 +697,7 @@ error:
> >  static int
> >  sort_by_physaddr(struct hugepage_file *hugepg_tbl, struct hugepage_info
> > *hpi)
> >  {
> > - unsigned i, j;
> > - int compare_idx;
> > - uint64_t compare_addr;
> > - struct hugepage_file tmp;
> > -
> > - for (i = 0; i < hpi->num_pages[0]; i++) {
> > - compare_addr = 0;
> > - compare_idx = -1;
> > -
> > - /*
> > - * browse all entries starting at 'i', and find the
> > - * entry with the smallest addr
> > - */
> > - for (j=i; j< hpi->num_pages[0]; j++) {
> > -
> > - if (compare_addr == 0 ||
> > -#ifdef RTE_ARCH_PPC_64
> > - hugepg_tbl[j].physaddr > compare_addr) {
> > -#else
> > - hugepg_tbl[j].physaddr < compare_addr) {
> > -#endif
> > - compare_addr = hugepg_tbl[j].physaddr;
> > - compare_idx = j;
> > - }
> > - }
> > -
> > - /* should not happen */
> > - if (compare_idx == -1) {
> > - RTE_LOG(ERR, EAL, "%s(): error in physaddr sorting\n", __func__);
> > - return -1;
> > - }
> > -
> > - /* swap the 2 entries in the table */
> > - memcpy(&tmp, &hugepg_tbl[compare_idx],
> > - sizeof(struct hugepage_file));
> > - memcpy(&hugepg_tbl[compare_idx], &hugepg_tbl[i],
> > - sizeof(struct hugepage_file));
> > - memcpy(&hugepg_tbl[i], &tmp, sizeof(struct hugepage_file));
> > - }
> > + qsort(hugepg_tbl, hpi->num_pages[0], sizeof(struct hugepage_file),
> > cmp_physaddr);
> >   return 0;
> >  }
> >
> > --
>


[dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr() with qsort() from standard library

2014-12-16 Thread Jay Rolette
Actually, I just relooked at the email I sent and it looks correct
(properly indented, etc.). Any suggestions for what might be going on?

On Tue, Dec 16, 2014 at 1:18 PM, Jay Rolette  wrote:
>
> Thanks Konstantin. Yes, I'll resend. Not sure why gmail removed the
> whitespace when I sent it in Plain Text mode.
>
> Ultimately I'll need to figure out how to properly configure git to send
> these directly instead of handling them more manually. The examples I saw
> assumed you were using a gmail.com email rather than a corporate email
> hosted via google apps.
>
> Jay
>
> On Tue, Dec 16, 2014 at 12:39 PM, Ananyev, Konstantin <
> konstantin.ananyev at intel.com> wrote:
>>
>>
>> Hi Jay,
>>
>> > -----Original Message-----
>> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jay Rolette
>> > Sent: Thursday, December 11, 2014 4:06 PM
>> > To: Dev
>> > Subject: [dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr()
>> with qsort() from standard library
>> >
>> > Signed-off-by: Jay Rolette 
>>
>> The patch itself looks good to me.
>> Though it seems something wrong with formatting - all lines start with
>> offset 0.
>> Probably your mail client?
>> Konstantin
>>
>>
>> > ---
>> >  lib/librte_eal/linuxapp/eal/eal_memory.c | 59
>> > +++-
>> >  1 file changed, 20 insertions(+), 39 deletions(-)
>> >
>> > diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c
>> > b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> > index bae2507..3656515 100644
>> > --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>> > +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> > @@ -670,6 +670,25 @@ error:
>> >   return -1;
>> >  }
>> >
>> > +static int
>> > +cmp_physaddr(const void *a, const void *b)
>> > +{
>> > +#ifndef RTE_ARCH_PPC_64
>> > + const struct hugepage_file *p1 = (const struct hugepage_file *)a;
>> > + const struct hugepage_file *p2 = (const struct hugepage_file *)b;
>> > +#else
>> > + // PowerPC needs memory sorted in reverse order from x86
>> > + const struct hugepage_file *p1 = (const struct hugepage_file *)b;
>> > + const struct hugepage_file *p2 = (const struct hugepage_file *)a;
>> > +#endif
>> > + if (p1->physaddr < p2->physaddr)
>> > + return -1;
>> > + else if (p1->physaddr > p2->physaddr)
>> > + return 1;
>> > + else
>> > + return 0;
>> > +}
>> > +
>> >  /*
>> >   * Sort the hugepg_tbl by physical address (lower addresses first on
>> x86,
>> >   * higher address first on powerpc). We use a slow algorithm, but we
>> won't
>> > @@ -678,45 +697,7 @@ error:
>> >  static int
>> >  sort_by_physaddr(struct hugepage_file *hugepg_tbl, struct hugepage_info
>> > *hpi)
>> >  {
>> > - unsigned i, j;
>> > - int compare_idx;
>> > - uint64_t compare_addr;
>> > - struct hugepage_file tmp;
>> > -
>> > - for (i = 0; i < hpi->num_pages[0]; i++) {
>> > - compare_addr = 0;
>> > - compare_idx = -1;
>> > -
>> > - /*
>> > - * browse all entries starting at 'i', and find the
>> > - * entry with the smallest addr
>> > - */
>> > - for (j=i; j< hpi->num_pages[0]; j++) {
>> > -
>> > - if (compare_addr == 0 ||
>> > -#ifdef RTE_ARCH_PPC_64
>> > - hugepg_tbl[j].physaddr > compare_addr) {
>> > -#else
>> > - hugepg_tbl[j].physaddr < compare_addr) {
>> > -#endif
>> > - compare_addr = hugepg_tbl[j].physaddr;
>> > - compare_idx = j;
>> > - }
>> > - }
>> > -
>> > - /* should not happen */
>> > - if (compare_idx == -1) {
>> > - RTE_LOG(ERR, EAL, "%s(): error in physaddr sorting\n", __func__);
>> > - return -1;
>> > - }
>> > -
>> > - /* swap the 2 entries in the table */
>> > - memcpy(&tmp, &hugepg_tbl[compare_idx],
>> > - sizeof(struct hugepage_file));
>> > - memcpy(&hugepg_tbl[compare_idx], &hugepg_tbl[i],
>> > - sizeof(struct hugepage_file));
>> > - memcpy(&hugepg_tbl[i], &tmp, sizeof(struct hugepage_file));
>> > - }
>> > + qsort(hugepg_tbl, hpi->num_pages[0], sizeof(struct hugepage_file),
>> > cmp_physaddr);
>> >   return 0;
>> >  }
>> >
>> > --
>>
>


[dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr() with qsort() from standard library

2014-12-17 Thread Jay Rolette
Signed-off-by: Jay Rolette 
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 59 +++-
 1 file changed, 20 insertions(+), 39 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index bae2507..3656515 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -670,6 +670,25 @@ error:
return -1;
 }

+static int
+cmp_physaddr(const void *a, const void *b)
+{
+#ifndef RTE_ARCH_PPC_64
+   const struct hugepage_file *p1 = (const struct hugepage_file *)a;
+   const struct hugepage_file *p2 = (const struct hugepage_file *)b;
+#else
+   // PowerPC needs memory sorted in reverse order from x86
+   const struct hugepage_file *p1 = (const struct hugepage_file *)b;
+   const struct hugepage_file *p2 = (const struct hugepage_file *)a;
+#endif
+   if (p1->physaddr < p2->physaddr)
+   return -1;
+   else if (p1->physaddr > p2->physaddr)
+   return 1;
+   else
+   return 0;
+}
+
 /*
  * Sort the hugepg_tbl by physical address (lower addresses first on x86,
  * higher address first on powerpc). We use a slow algorithm, but we won't
@@ -678,45 +697,7 @@ error:
 static int
 sort_by_physaddr(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi)
 {
-   unsigned i, j;
-   int compare_idx;
-   uint64_t compare_addr;
-   struct hugepage_file tmp;
-
-   for (i = 0; i < hpi->num_pages[0]; i++) {
-   compare_addr = 0;
-   compare_idx = -1;
-
-   /*
-* browse all entries starting at 'i', and find the
-* entry with the smallest addr
-*/
-   for (j=i; j< hpi->num_pages[0]; j++) {
-
-   if (compare_addr == 0 ||
-#ifdef RTE_ARCH_PPC_64
-   hugepg_tbl[j].physaddr > compare_addr) {
-#else
-   hugepg_tbl[j].physaddr < compare_addr) {
-#endif
-   compare_addr = hugepg_tbl[j].physaddr;
-   compare_idx = j;
-   }
-   }
-
-   /* should not happen */
-   if (compare_idx == -1) {
-   RTE_LOG(ERR, EAL, "%s(): error in physaddr sorting\n", 
__func__);
-   return -1;
-   }
-
-   /* swap the 2 entries in the table */
-   memcpy(&tmp, &hugepg_tbl[compare_idx],
-   sizeof(struct hugepage_file));
-   memcpy(&hugepg_tbl[compare_idx], &hugepg_tbl[i],
-   sizeof(struct hugepage_file));
-   memcpy(&hugepg_tbl[i], &tmp, sizeof(struct hugepage_file));
-   }
+   qsort(hugepg_tbl, hpi->num_pages[0], sizeof(struct hugepage_file), 
cmp_physaddr);
return 0;
 }

-- 
1.9.3 (Apple Git-50)



[dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr() with qsort() from standard library

2014-12-17 Thread Jay Rolette
Thanks to some help from Matthew Hall, it looks like I have it working now.
I just resent the patch directly from git. Please let me know if it looks
ok now.

Sorry for the hassles. We use Mercurial internally, so while there is a lot
of overlap, sending patches isn't something I have to worry about
day-to-day. Appreciate the help getting things configured!

Jay

On Wed, Dec 17, 2014 at 7:08 AM, Qiu, Michael  wrote:
>
> On 12/17/2014 7:01 PM, Ananyev, Konstantin wrote:
> > Hi Jay,
> >
> > From: Jay Rolette [mailto:rolette at infiniteio.com]
> > Sent: Tuesday, December 16, 2014 7:21 PM
> > To: Ananyev, Konstantin
> > Cc: Dev
> > Subject: Re: [dpdk-dev] [PATCH] replaced O(n^2) sort in
> sort_by_physaddr() with qsort() from standard library
> >
> > Actually, I just relooked at the email I sent and it looks correct
> (properly indented, etc.). Any suggestions for what might be going on?
> >
> > On Tue, Dec 16, 2014 at 1:18 PM, Jay Rolette  <mailto:rolette at infiniteio.com>> wrote:
> > Thanks Konstantin. Yes, I'll resend. Not sure why gmail is removing
> whitespace when I send in Plain Text mode.
> >
> > Sorry, don't know either.
> > Wonder, did you see this:
> > https://www.kernel.org/pub/software/scm/git/docs/git-format-patch.html
> > There is a section about different mailers, etc.
> > Konstantin
> >
> > Ultimately I'll need to figure out how to properly configure git to send
> these directly instead of handling them more manually. The examples I saw
> assumed you were using a gmail.com<http://gmail.com> email rather than a
> corporate email hosted via google apps.
> Hi Jay,
>
> I have used git send-email with my gmail account before; it works.
>
> So did you configure git send-email correctly?
>
> Thanks,
> Michael
>
>
> > Jay
> >
> > On Tue, Dec 16, 2014 at 12:39 PM, Ananyev, Konstantin <
> konstantin.ananyev at intel.com<mailto:konstantin.ananyev at intel.com>> 
> wrote:
> >
> > Hi Jay,
> >
> >> -Original Message-
> >> From: dev [mailto:dev-bounces at dpdk.org<mailto:dev-bounces at dpdk.org>]
> On Behalf Of Jay Rolette
> >> Sent: Thursday, December 11, 2014 4:06 PM
> >> To: Dev
> >> Subject: [dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr()
> with qsort() from standard library
> >>
> >> Signed-off-by: Jay Rolette  rolette at infiniteio.com>>
> > The patch itself looks good to me.
> > Though it seems something wrong with formatting - all lines start with
> offset 0.
> > Probably your mail client?
> > Konstantin
> >
> >
> >> ---
> >>  lib/librte_eal/linuxapp/eal/eal_memory.c | 59
> >> +++-
> >>  1 file changed, 20 insertions(+), 39 deletions(-)
> >>
> >> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c
> >> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> >> index bae2507..3656515 100644
> >> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> >> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> >> @@ -670,6 +670,25 @@ error:
> >>   return -1;
> >>  }
> >>
> >> +static int
> >> +cmp_physaddr(const void *a, const void *b)
> >> +{
> >> +#ifndef RTE_ARCH_PPC_64
> >> + const struct hugepage_file *p1 = (const struct hugepage_file *)a;
> >> + const struct hugepage_file *p2 = (const struct hugepage_file *)b;
> >> +#else
> >> + // PowerPC needs memory sorted in reverse order from x86
> >> + const struct hugepage_file *p1 = (const struct hugepage_file *)b;
> >> + const struct hugepage_file *p2 = (const struct hugepage_file *)a;
> >> +#endif
> >> + if (p1->physaddr < p2->physaddr)
> >> + return -1;
> >> + else if (p1->physaddr > p2->physaddr)
> >> + return 1;
> >> + else
> >> + return 0;
> >> +}
> >> +
> >>  /*
> >>   * Sort the hugepg_tbl by physical address (lower addresses first on
> x86,
> >>   * higher address first on powerpc). We use a slow algorithm, but we
> won't
> >> @@ -678,45 +697,7 @@ error:
> >>  static int
> >>  sort_by_physaddr(struct hugepage_file *hugepg_tbl, struct hugepage_info
> >> *hpi)
> >>  {
> >> - unsigned i, j;
> >> - int compare_idx;
> >> - uint64_t compare_addr;
> >> - struct hugepage_file tmp;
> >> -
> >> - for (i = 0; i < hpi->num_pages[0]; i++) {
> >> - compare_addr = 0;
> >> - compare_idx = -1;
> 
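For anyone hitting the same whitespace mangling, the usual fix is to let git
send-email deliver the patch itself instead of pasting it into a mail client.
A minimal sketch, assuming a Google-Apps-hosted address (the server/port
values are standard Gmail SMTP settings; the address shown is illustrative):

```shell
# One-time setup: point git send-email at Google's SMTP servers.
# The same settings work for @gmail.com and Google-Apps-hosted domains.
git config --global sendemail.smtpserver smtp.gmail.com
git config --global sendemail.smtpserverport 587
git config --global sendemail.smtpencryption tls
git config --global sendemail.smtpuser rolette@infiniteio.com

# Then send the patch straight from the repo, bypassing the web client:
git format-patch -1
git send-email --to dev@dpdk.org 0001-*.patch
```

Because the patch never passes through the mail client's composer, indentation
and tabs survive intact.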

[dpdk-dev] [PATCH] Fixed spam from kni_allocate_mbufs() when no mbufs are free.

2014-12-17 Thread Jay Rolette
self-NAK now that I know that gmail web client borks up the patches. I'll
re-submit.

Jay

On Fri, Dec 12, 2014 at 7:28 PM, Jay Rolette  wrote:
>
> Fixed spam from kni_allocate_mbufs() when no mbufs are free.
> If mbufs exhausted, 'out of memory' message logged at EXTREMELY high
> rates. Now logs no more than once per 10 mins
>
> Signed-off-by: Jay Rolette 
> ---
>  lib/librte_kni/rte_kni.c | 21 -
>  1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
> index fdb7509..f89319c 100644
> --- a/lib/librte_kni/rte_kni.c
> +++ b/lib/librte_kni/rte_kni.c
> @@ -40,6 +40,7 @@
>  #include 
>  #include 
>
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -61,6 +62,9 @@
>
>  #define KNI_MEM_CHECK(cond) do { if (cond) goto kni_fail; } while (0)
>
> +// Configure how often we log "out of memory" messages (in seconds)
> +#define KNI_SPAM_SUPPRESSION_PERIOD 60*10
> +
>  /**
>   * KNI context
>   */
> @@ -592,6 +596,10 @@ kni_free_mbufs(struct rte_kni *kni)
>  static void
>  kni_allocate_mbufs(struct rte_kni *kni)
>  {
> + static uint64_t no_mbufs = 0;
> + static uint64_t spam_filter = 0;
> + static uint64_t delayPeriod = 0;
> +
>   int i, ret;
>   struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];
>
> @@ -620,7 +628,18 @@ kni_allocate_mbufs(struct rte_kni *kni)
>   pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
>   if (unlikely(pkts[i] == NULL)) {
>   /* Out of memory */
> - RTE_LOG(ERR, KNI, "Out of memory\n");
> + no_mbufs++;
> +
> + // Memory leak or need to tune? Regardless, if we get here once,
> + // we will get here a *lot*. Don't spam the logs!
> + uint64_t now = rte_get_tsc_cycles();
> + if (!delayPeriod)
> +delayPeriod = rte_get_tsc_hz() * KNI_SPAM_SUPPRESSION_PERIOD;
> +
> + if (!spam_filter || (now - spam_filter) > delayPeriod) {
> + RTE_LOG(ERR, KNI, "No mbufs available (%llu)\n", (unsigned long
> long)no_mbufs);
> + spam_filter = now;
> + }
>   break;
>   }
>   }
> --
>
>


[dpdk-dev] [PATCH] Fixed spam from kni_allocate_mbufs() when no mbufs are free. If mbufs exhausted, 'out of memory' message logged at EXTREMELY high rates. Now logs no more than once per 10 mins

2014-12-17 Thread Jay Rolette
Signed-off-by: Jay Rolette 
---
 lib/librte_kni/rte_kni.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index fdb7509..f89319c 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -40,6 +40,7 @@
 #include 
 #include 

+#include 
 #include 
 #include 
 #include 
@@ -61,6 +62,9 @@

 #define KNI_MEM_CHECK(cond) do { if (cond) goto kni_fail; } while (0)

+// Configure how often we log "out of memory" messages (in seconds)
+#define KNI_SPAM_SUPPRESSION_PERIOD 60*10
+
 /**
  * KNI context
  */
@@ -592,6 +596,10 @@ kni_free_mbufs(struct rte_kni *kni)
 static void
 kni_allocate_mbufs(struct rte_kni *kni)
 {
+   static uint64_t no_mbufs = 0;
+   static uint64_t spam_filter = 0;
+   static uint64_t delayPeriod = 0;
+
int i, ret;
struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];

@@ -620,7 +628,18 @@ kni_allocate_mbufs(struct rte_kni *kni)
pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
if (unlikely(pkts[i] == NULL)) {
/* Out of memory */
-   RTE_LOG(ERR, KNI, "Out of memory\n");
+   no_mbufs++;
+
+   // Memory leak or need to tune? Regardless, if we get 
here once,
+   // we will get here a *lot*. Don't spam the logs!
+   uint64_t now = rte_get_tsc_cycles();
+   if (!delayPeriod)
+   delayPeriod = rte_get_tsc_hz() * 
KNI_SPAM_SUPPRESSION_PERIOD;
+
+   if (!spam_filter || (now - spam_filter) > delayPeriod) {
+   RTE_LOG(ERR, KNI, "No mbufs available 
(%llu)\n", (unsigned long long)no_mbufs);
+   spam_filter = now;
+   }
break;
}
}
-- 
1.9.3 (Apple Git-50)



[dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr() with qsort() from standard library

2014-12-17 Thread Jay Rolette
Hi Thomas,

Please read http://dpdk.org/dev#send for submission guidelines.
>

I did when I was figuring out how to submit the patch, but it's possible I'm
missing something in the tooling needed to include the commit comment
correctly.

A description of why you do it would be welcome in the commit log.
>

How much are you looking for here? I thought replacing an O(n^2) algorithm
with qsort() was fairly self-evident: less code, and it takes advantage of
standard library code that is faster.

I get it in the general case. Just didn't seem necessary for this one.


> > +static int
> > +cmp_physaddr(const void *a, const void *b)
> > +{
> > +#ifndef RTE_ARCH_PPC_64
> > + const struct hugepage_file *p1 = (const struct hugepage_file *)a;
> > + const struct hugepage_file *p2 = (const struct hugepage_file *)b;
> > +#else
> > + // PowerPC needs memory sorted in reverse order from x86
>
> Comments shall be C-style (/* */).
>

Single line comments ('//') have been part of the C standard since C99. Is
DPDK following C89 or is this just a style thing? If it is a style thing, a
link to a page with the rubric would be helpful. I didn't see one on the
submission guidelines.


> > + const struct hugepage_file *p1 = (const struct hugepage_file *)b;
> > + const struct hugepage_file *p2 = (const struct hugepage_file *)a;
> > +#endif
> > + if (p1->physaddr < p2->physaddr)
> > + return -1;
> > + else if (p1->physaddr > p2->physaddr)
> > + return 1;
> > + else
> > + return 0;
> > +}
>
> One of the goal of EAL is to avoid #ifdef.
> So that function would probably be better located in
> lib/librte_eal/common/include/arch/* with different implemenations
> depending of the architecture.
>

Hmm... I was following the approach already used in the module. See
map_all_hugepages(), remap_all_hugepages().

Regards,
Jay


[dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE instructions.

2015-05-13 Thread Jay Rolette
On Tue, May 12, 2015 at 8:16 PM, Ravi Kerur  wrote:

> On Mon, May 11, 2015 at 3:29 PM, Don Provan  wrote:
>
> > I probably shouldn't stick my nose into this, but I can't help myself.
> >
> > An experienced programmer will tend to ignore the documentation for
> > a routine named "blahblah_memcmp" and just assume it functions like
> > memcmp. Whether or not there's currently a use case in DPDK is
> > completely irrelevant because as soon as there *is* a use case, some
> > poor DPDK developer will try to use rte_memcmp for that and may or
> > may not have a test case that reveals their mistake.
> >
>
> In general I agree with you. However, in networking a comparison is
> generally either a hit (equal) or a miss (unequal). I haven't seen cases
> where "less than" or "greater than" has mattered.
>

It's useful when you need to make sure packets from both sides of a
conversation go to the same processing queue/thread. Instead of hashing the
5-tuple from the packet as src.ip, dst.ip, src.port, dst.port, etc., you
can use lesser.ip, higher.ip, lesser.port, higher.port, etc.

Very common when you are doing deep packet inspection.

Jay


[dpdk-dev] [PATCH v2 01/17] ixgbe: use the right debug macro

2014-09-02 Thread Jay Rolette
Hi David,

A couple of minor comments inline below. Looks good otherwise.

Jay


On Mon, Sep 1, 2014 at 5:24 AM, David Marchand 
wrote:

> - We should not use DEBUGOUT*/DEBUGFUNC macros in non-shared code.
> These macros come as compat wrappers for shared code.
> - We should avoid calling RTE_LOG directly as pmd provides a wrapper for
> logs.
>
> Signed-off-by: David Marchand 
> ---
>  lib/librte_pmd_ixgbe/ixgbe_82599_bypass.c |   14 
>  lib/librte_pmd_ixgbe/ixgbe_bypass.c   |   26 +++---
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c   |   27 +++
>  lib/librte_pmd_ixgbe/ixgbe_pf.c   |4 +--
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c |   53
> +++--
>  5 files changed, 63 insertions(+), 61 deletions(-)
>
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_82599_bypass.c
> b/lib/librte_pmd_ixgbe/ixgbe_82599_bypass.c
> index 0fc..2623419 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_82599_bypass.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_82599_bypass.c
> @@ -63,7 +63,7 @@ ixgbe_set_fiber_fixed_speed(struct ixgbe_hw *hw,
> ixgbe_link_speed speed)
> rs = IXGBE_SFF_SOFT_RS_SELECT_1G;
> break;
> default:
> -   DEBUGOUT("Invalid fixed module speed\n");
> +   PMD_DRV_LOG("Invalid fixed module speed");
> return;
> }
>
> @@ -72,7 +72,7 @@ ixgbe_set_fiber_fixed_speed(struct ixgbe_hw *hw,
> ixgbe_link_speed speed)
>IXGBE_I2C_EEPROM_DEV_ADDR2,
>&eeprom_data);
> if (status) {
> -   DEBUGOUT("Failed to read Rx Rate Select RS0\n");
> +   PMD_DRV_LOG("Failed to read Rx Rate Select RS0");
> goto out;
> }
>
> @@ -82,7 +82,7 @@ ixgbe_set_fiber_fixed_speed(struct ixgbe_hw *hw,
> ixgbe_link_speed speed)
> IXGBE_I2C_EEPROM_DEV_ADDR2,
> eeprom_data);
> if (status) {
> -   DEBUGOUT("Failed to write Rx Rate Select RS0\n");
> +   PMD_DRV_LOG("Failed to write Rx Rate Select RS0");
> goto out;
> }
>
> @@ -91,7 +91,7 @@ ixgbe_set_fiber_fixed_speed(struct ixgbe_hw *hw,
> ixgbe_link_speed speed)
>IXGBE_I2C_EEPROM_DEV_ADDR2,
>&eeprom_data);
> if (status) {
> -   DEBUGOUT("Failed to read Rx Rate Select RS1\n");
> +   PMD_DRV_LOG("Failed to read Rx Rate Select RS1");
> goto out;
> }
>
> @@ -101,7 +101,7 @@ ixgbe_set_fiber_fixed_speed(struct ixgbe_hw *hw,
> ixgbe_link_speed speed)
> IXGBE_I2C_EEPROM_DEV_ADDR2,
> eeprom_data);
> if (status) {
> -   DEBUGOUT("Failed to write Rx Rate Select RS1\n");
> +   PMD_DRV_LOG("Failed to write Rx Rate Select RS1");
> goto out;
> }
>  out:
> @@ -130,7 +130,7 @@ ixgbe_setup_mac_link_multispeed_fixed_fiber(struct
> ixgbe_hw *hw,
> bool link_up = false;
> bool negotiation;
>
> -   DEBUGFUNC("");
> +   PMD_INIT_FUNC_TRACE();
>
> /* Mask off requested but non-supported speeds */
> status = ixgbe_get_link_capabilities(hw, &link_speed,
> &negotiation);
> @@ -261,7 +261,7 @@ ixgbe_bypass_get_media_type(struct ixgbe_hw *hw)
>  {
> enum ixgbe_media_type media_type;
>
> -   DEBUGFUNC("");
> +   PMD_INIT_FUNC_TRACE();
>
> if (hw->device_id == IXGBE_DEV_ID_82599_BYPASS) {
> media_type = ixgbe_media_type_fiber;
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_bypass.c
> b/lib/librte_pmd_ixgbe/ixgbe_bypass.c
> index 1d21dc0..1a980b8 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_bypass.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_bypass.c
> @@ -40,20 +40,20 @@
>  #defineBYPASS_STATUS_OFF_MASK  3
>
>  /* Macros to check for invlaid function pointers. */
> -#defineFUNC_PTR_OR_ERR_RET(func, retval) do { \
> -   if ((func) == NULL) {  \
> -   DEBUGOUT("%s:%d function not supported\n", \
> -   __func__, __LINE__);   \
> -   return (retval);   \
> -   }  \
> +#defineFUNC_PTR_OR_ERR_RET(func, retval) do {  \
> +   if ((func) == NULL) {   \
> +   PMD_DRV_LOG("%s:%d function not supported", \
> +   __func__, __LINE__);\
> +   return retval;\
>
Need to keep the parens around retval in your macro


> +   }   \
>  } while(0)
>
> -#defineFUNC_PTR_OR_RET(func) do { \
>

[dpdk-dev] [PATCH v2 03/17] ixgbe: clean log messages

2014-09-02 Thread Jay Rolette
t;);
> break;
> }
>
> @@ -3210,7 +3213,7 @@ ixgbe_alloc_rx_queue_mbufs(struct igb_rx_queue *rxq)
> volatile union ixgbe_adv_rx_desc *rxd;
> struct rte_mbuf *mbuf = rte_rxmbuf_alloc(rxq->mb_pool);
> if (mbuf == NULL) {
> -   PMD_INIT_LOG(ERR, "RX mbuf al

[dpdk-dev] [PATCH v2 01/17] ixgbe: use the right debug macro

2014-09-02 Thread Jay Rolette
On Tue, Sep 2, 2014 at 9:21 AM, Thomas Monjalon 
wrote:

> >> -#defineFUNC_PTR_OR_ERR_RET(func, retval) do { \
> > >> -   if ((func) == NULL) {  \
> > >> -   DEBUGOUT("%s:%d function not supported\n", \
> > >> -   __func__, __LINE__);   \
> > >> -   return (retval);   \
> > >> -   }  \
> > >> +#defineFUNC_PTR_OR_ERR_RET(func, retval) do {  \
> > >> +   if ((func) == NULL) {   \
> > >> +   PMD_DRV_LOG("%s:%d function not supported", \
> > >> +   __func__, __LINE__);\
> > >> +   return retval;\
> > >>
> > > Need to keep the parens around retval in your macro
> >
> > Actually, checkpatch complained about this.
> > So I can keep the parenthesis, but then I don't want Thomas to tell me my
> > patch does not pass checkpatch :-)
>
> You're right, I care about checkpatch :)
> I don't see a case where parens are needed with return. Please give an
> example.


Looking at it again, you are correct in this specific case. It is good
hygiene to always use parens around macro arguments, but this one is ok
without them. It adds a small bit of danger if retval ends up getting used
elsewhere in the macro, but that's about it. Probably not worth redoing the
patch over.

Jay


[dpdk-dev] [PATCH v2 07/17] i40e: use the right debug macro

2014-09-02 Thread Jay Rolette
-253,10 +253,8 @@ i40e_pf_host_send_msg_to_vf(struct i40e_pf_vf *vf,
> ret = i40e_aq_send_msg_to_vf(hw, abs_vf_id, opcode, retval,
> msg, msglen, NULL);
> if (ret) {
> -   PMD_DRV_LOG(ERR, "Fail to send message to VF, err %u\n",
> -   hw->aq.asq_last_status);
> -   printf("Fail to send message to VF, err %u\n",
> -   hw->aq.asq_last_status);
> +   PMD_INIT_LOG(ERR, "Fail to send message to VF, err %u\n",
> +hw->aq.asq_last_status);
> }
>
> return ret;
> diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c
> b/lib/librte_pmd_i40e/i40e_rxtx.c
> index f153844..6987200 100644
> --- a/lib/librte_pmd_i40e/i40e_rxtx.c
> +++ b/lib/librte_pmd_i40e/i40e_rxtx.c
> @@ -1788,50 +1788,50 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ?
> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH);
> if (tx_rs_thresh >= (nb_desc - 2)) {
> -   RTE_LOG(ERR, PMD, "tx_rs_thresh must be less than the "
> -   "number of TX descriptors minus 2. "
> -   "(tx_rs_thresh=%u port=%d queue=%d)\n",
> -   (unsigned int)tx_rs_thresh,
> -   (int)dev->data->port_id,
> -   (int)queue_idx);
> +   PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the "
> +"number of TX descriptors minus 2. "
> +"(tx_rs_thresh=%u port=%d queue=%d)\n",
> +(unsigned int)tx_rs_thresh,
> +(int)dev->data->port_id,
> +(int)queue_idx);
> return I40E_ERR_PARAM;
> }
> if (tx_free_thresh >= (nb_desc - 3)) {
> -   RTE_LOG(ERR, PMD, "tx_rs_thresh must be less than the "
> -   "tx_free_thresh must be less than the "
> -   "number of TX descriptors minus 3. "
> -   "(tx_free_thresh=%u port=%d queue=%d)\n",
> -   (unsigned int)tx_free_thresh,
> -   (int)dev->data->port_id,
> -   (int)queue_idx);
> +   PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the "
> +"tx_free_thresh must be less than the "
> +"number of TX descriptors minus 3. "
> +"(tx_free_thresh=%u port=%d queue=%d)\n",
> +(unsigned int)tx_free_thresh,
> +(int)dev->data->port_id,
> +(int)queue_idx);
> return I40E_ERR_PARAM;
> }
> if (tx_rs_thresh > tx_free_thresh) {
> -   RTE_LOG(ERR, PMD, "tx_rs_thresh must be less than or "
> -       "equal to tx_free_thresh.
> (tx_free_thresh=%u"
> -   " tx_rs_thresh=%u port=%d queue=%d)\n",
> -   (unsigned
> int)tx_free_thresh,
> -   (unsigned int)tx_rs_thresh,
> -   (int)dev->data->port_id,
> -   (int)queue_idx);
> +   PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than or "
> +"equal to tx_free_thresh. (tx_free_thresh=%u"
> +" tx_rs_thresh=%u port=%d queue=%d)\n",
> +(unsigned int)tx_free_thresh,
> +(unsigned int)tx_rs_thresh,
> +(int)dev->data->port_id,
> +(int)queue_idx);
> return I40E_ERR_PARAM;
> }
> if ((nb_desc % tx_rs_thresh) != 0) {
> -   RTE_LOG(ERR, PMD, "tx_rs_thresh must be a divisor of the "
> -   "number of TX descriptors.
> (tx_rs_thresh=%u"
> -   " port=%d queue=%d)\n",
> -   (unsigned int)tx_rs_thresh,
> -   (int)dev->data->port_id,
> -   (int)queue_idx);
> +   PMD_INIT_LOG(ERR, "tx_rs_thresh must be a divisor of the "
> +"number of TX descriptors. (tx_rs_thresh=%u"
> +" port=%d queue=%d)\n",
> +(unsigned int)tx_rs_thresh,
> +(int)dev->data->port_id,
> +(int)queue_idx);
> return I40E_ERR_PARAM;
> }
> if ((tx_rs_thresh > 1) && (tx_conf->tx_thresh.wthresh != 0)) {
> -   RTE_LOG(ERR, PMD, "TX WTHRESH must be set to 0 if "
> -   "tx_rs_thresh is greater than 1. "
> -   "(tx_rs_thresh=%u port=%d queue=%d)\n",
> -   (unsigned int)tx_rs_thresh,
> -   (int)dev->data->port_id,
> -   (int)queue_idx);
> +   PMD_INIT_LOG(ERR, "TX WTHRESH must be set to 0 if "
> +"tx_rs_thresh is greater than 1. "
> +"(tx_rs_thresh=%u port=%d queue=%d)\n",
> +(unsigned int)tx_rs_thresh,
> +(int)dev->data->port_id,
> +(int)queue_idx);
> return I40E_ERR_PARAM;
> }
>
> --
> 1.7.10.4
>
> Reviewed-by: Jay Rolette 


[dpdk-dev] [PATCH v2 12/17] e1000: use the right debug macro

2014-09-02 Thread Jay Rolette
MD_DRV_LOG(DEBUG, "SMBI lock released");
> }
> e1000_put_hw_semaphore_generic(hw);
>
> @@ -416,7 +416,8 @@ igb_reset_swfw_lock(struct e1000_hw *hw)
> if (hw->bus.func > E1000_FUNC_1)
> mask <<= 2;
> if (hw->mac.ops.acquire_swfw_sync(hw, mask) < 0) {
> -   DEBUGOUT1("SWFW phy%d lock released",
> hw->bus.func);
> +   PMD_DRV_LOG(DEBUG, "SWFW phy%d lock released",
> +   hw->bus.func);
> }
> hw->mac.ops.release_swfw_sync(hw, mask);
>
> @@ -428,7 +429,7 @@ igb_reset_swfw_lock(struct e1000_hw *hw)
>  */
> mask = E1000_SWFW_EEP_SM;
> if (hw->mac.ops.acquire_swfw_sync(hw, mask) < 0) {
> -   DEBUGOUT("SWFW common locks released");
> +   PMD_DRV_LOG(DEBUG, "SWFW common locks released");
> }
> hw->mac.ops.release_swfw_sync(hw, mask);
> }
> @@ -707,7 +708,7 @@ igb_vmdq_vlan_hw_filter_enable(struct rte_eth_dev *dev)
>  static int
>  rte_igbvf_pmd_init(const char *name __rte_unused, const char *params
> __rte_unused)
>  {
> -   DEBUGFUNC("rte_igbvf_pmd_init");
> +   PMD_INIT_FUNC_TRACE();
>
> rte_eth_driver_register(&rte_igbvf_pmd);
> return (0);
> diff --git a/lib/librte_pmd_e1000/igb_pf.c b/lib/librte_pmd_e1000/igb_pf.c
> index 3d405f0..76033ad 100644
> --- a/lib/librte_pmd_e1000/igb_pf.c
> +++ b/lib/librte_pmd_e1000/igb_pf.c
> @@ -404,7 +404,7 @@ igb_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t
> vf)
>
> retval = e1000_read_mbx(hw, msgbuf, mbx_size, vf);
> if (retval) {
> -   RTE_LOG(ERR, PMD, "Error mbx recv msg from VF %d\n", vf);
> +   PMD_INIT_LOG(ERR, "Error mbx recv msg from VF %d\n", vf);
> return retval;
> }
>
> @@ -432,7 +432,8 @@ igb_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t
> vf)
> retval = igb_vf_set_vlan(dev, vf, msgbuf);
> break;
> default:
> -   RTE_LOG(DEBUG, PMD, "Unhandled Msg %8.8x\n", (unsigned)
> msgbuf[0]);
> +   PMD_INIT_LOG(DEBUG, "Unhandled Msg %8.8x\n",
> +(unsigned) msgbuf[0]);
> retval = E1000_ERR_MBX;
> break;
> }
> diff --git a/lib/librte_pmd_e1000/igb_rxtx.c
> b/lib/librte_pmd_e1000/igb_rxtx.c
> index 977c4a2..3aa9609 100644
> --- a/lib/librte_pmd_e1000/igb_rxtx.c
> +++ b/lib/librte_pmd_e1000/igb_rxtx.c
> @@ -1210,17 +1210,15 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
>  * driver.
>  */
> if (tx_conf->tx_free_thresh != 0)
> -   RTE_LOG(WARNING, PMD,
> -   "The tx_free_thresh parameter is not "
> -   "used for the 1G driver.\n");
> +   PMD_INIT_LOG(WARNING, "The tx_free_thresh parameter is not
> "
> +"used for the 1G driver.\n");
> if (tx_conf->tx_rs_thresh != 0)
> -   RTE_LOG(WARNING, PMD,
> -   "The tx_rs_thresh parameter is not "
> -   "used for the 1G driver.\n");
> +   PMD_INIT_LOG(WARNING, "The tx_rs_thresh parameter is not "
> +"used for the 1G driver.\n");
> if (tx_conf->tx_thresh.wthresh == 0)
> -   RTE_LOG(WARNING, PMD,
> -   "To improve 1G driver performance, consider
> setting "
> -   "the TX WTHRESH value to 4, 8, or 16.\n");
> +   PMD_INIT_LOG(WARNING, "To improve 1G driver performance, "
> +"consider setting the TX WTHRESH value to 4,
> 8, "
> +"or 16.\n");
>
> /* Free memory prior to re-allocation if needed */
> if (dev->data->tx_queues[queue_idx] != NULL) {
> --
> 1.7.10.4
>
> Reviewed-by: Jay Rolette 


[dpdk-dev] [PATCH v2 14/17] e1000: clean log messages

2014-09-02 Thread Jay Rolette
 PMD_INIT_LOG(ERR, "Error mbx recv msg from VF %d", vf);
> return retval;
> }
>
> @@ -432,7 +432,7 @@ igb_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t
> vf)
> retval = igb_vf_set_vlan(dev, vf, msgbuf);
> break;
> default:
> -   PMD_INIT_LOG(DEBUG, "Unhandled Msg %8.8x\n",
> +   PMD_INIT_LOG(DEBUG, "Unhandled Msg %8.8x",
>  (unsigned) msgbuf[0]);
> retval = E1000_ERR_MBX;
> break;
> diff --git a/lib/librte_pmd_e1000/igb_rxtx.c
> b/lib/librte_pmd_e1000/igb_rxtx.c
> index 3aa9609..5ca06c9 100644
> --- a/lib/librte_pmd_e1000/igb_rxtx.c
> +++ b/lib/librte_pmd_e1000/igb_rxtx.c
> @@ -396,7 +396,7 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf
> **tx_pkts,
> tx_last = (uint16_t) (tx_last - txq->nb_tx_desc);
>
> PMD_TX_LOG(DEBUG, "port_id=%u queue_id=%u pktlen=%u"
> -  " tx_first=%u tx_last=%u\n",
> +  " tx_first=%u tx_last=%u",
>(unsigned) txq->port_id,
>(unsigned) txq->queue_id,
>(unsigned) pkt_len,
> @@ -548,7 +548,7 @@ eth_igb_xmit_pkts(void *tx_queue, struct rte_mbuf
> **tx_pkts,
> txd->read.cmd_type_len |=
> rte_cpu_to_le_32(E1000_TXD_CMD_EOP |
> E1000_TXD_CMD_RS);
> }
> - end_of_tx:
> +end_of_tx:
> rte_wmb();
>
> /*
> @@ -697,8 +697,8 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts,
>  * to happen by sending specific "back-pressure" flow
> control
>  * frames to its peer(s).
>  */
> -   PMD_RX_LOG(DEBUG, "\nport_id=%u queue_id=%u rx_id=%u "
> -  "staterr=0x%x pkt_len=%u\n",
> +   PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "
> +  "staterr=0x%x pkt_len=%u",
>(unsigned) rxq->port_id, (unsigned)
> rxq->queue_id,
>(unsigned) rx_id, (unsigned) staterr,
>(unsigned)
> rte_le_to_cpu_16(rxd.wb.upper.length));
> @@ -706,7 +706,7 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts,
> nmb = rte_rxmbuf_alloc(rxq->mb_pool);
> if (nmb == NULL) {
> PMD_RX_LOG(DEBUG, "RX mbuf alloc failed port_id=%u
> "
> -  "queue_id=%u\n", (unsigned)
> rxq->port_id,
> +  "queue_id=%u", (unsigned) rxq->port_id,
>(unsigned) rxq->queue_id);
>
> rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed++;
> break;
> @@ -794,7 +794,7 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts,
> nb_hold = (uint16_t) (nb_hold + rxq->nb_rx_hold);
> if (nb_hold > rxq->rx_free_thresh) {
> PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_tail=%u "
> -  "nb_hold=%u nb_rx=%u\n",
> +  "nb_hold=%u nb_rx=%u",
>(unsigned) rxq->port_id, (unsigned)
> rxq->queue_id,
>(unsigned) rx_id, (unsigned) nb_hold,
>(unsigned) nb_rx);
> @@ -881,8 +881,8 @@ eth_igb_recv_scattered_pkts(void *rx_queue, struct
> rte_mbuf **rx_pkts,
>  * to happen by sending specific "back-pressure" flow
> control
>  * frames to its peer(s).
>  */
> -   PMD_RX_LOG(DEBUG, "\nport_id=%u queue_id=%u rx_id=%u "
> -  "staterr=0x%x data_len=%u\n",
> +   PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "
> +  "staterr=0x%x data_len=%u",
>(unsigned) rxq->port_id, (unsigned)
> rxq->queue_id,
>(unsigned) rx_id, (unsigned) staterr,
>(unsigned)
> rte_le_to_cpu_16(rxd.wb.upper.length));
> @@ -890,7 +890,7 @@ eth_igb_recv_scattered_pkts(void *rx_queue, struct
> rte_mbuf **rx_pkts,
> nmb = rte_rxmbuf_alloc(rxq->mb_pool);
>     if (nmb =

[dpdk-dev] [PATCH v2 09/17] i40e: clean log messages

2014-09-02 Thread Jay Rolette
On Mon, Sep 1, 2014 at 5:24 AM, David Marchand 
wrote:

> Clean log messages:
> - remove leading \n in some messages,
> - remove trailing \n in some messages,
> - split multi lines messages,
> - replace some PMD_INIT_LOG(DEBUG, "some_func") with PMD_INIT_FUNC_TRACE().
>
> Signed-off-by: David Marchand 
> ---
>  lib/librte_pmd_i40e/i40e_ethdev.c|  418
> +-
>  lib/librte_pmd_i40e/i40e_ethdev_vf.c |  166 +++---
>  lib/librte_pmd_i40e/i40e_pf.c|   75 +++---
>  lib/librte_pmd_i40e/i40e_rxtx.c  |  118 +-
>  4 files changed, 385 insertions(+), 392 deletions(-)
>
> diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c
> b/lib/librte_pmd_i40e/i40e_ethdev.c
> index 352beb1..1a5b55d 100644
> --- a/lib/librte_pmd_i40e/i40e_ethdev.c
> +++ b/lib/librte_pmd_i40e/i40e_ethdev.c
> @@ -371,7 +371,7 @@ eth_i40e_dev_init(__rte_unused struct eth_driver
> *eth_drv,
> hw->hw_addr = (uint8_t *)(pci_dev->mem_resource[0].addr);
> if (!hw->hw_addr) {
> PMD_INIT_LOG(ERR, "Hardware is not available, "
> -   "as address is NULL\n");
> +"as address is NULL");
> return -ENODEV;
> }
>
> @@ -395,7 +395,8 @@ eth_i40e_dev_init(__rte_unused struct eth_driver
> *eth_drv,
> /* Initialize the shared code (base driver) */
> ret = i40e_init_shared_code(hw);
> if (ret) {
> -   PMD_INIT_LOG(ERR, "Failed to init shared code (base
> driver): %d", ret);
> +   PMD_INIT_LOG(ERR, "Failed to init shared code (base
> driver):"
> +"%d", ret);
> return ret;
> }
>
> @@ -406,8 +407,7 @@ eth_i40e_dev_init(__rte_unused struct eth_driver
> *eth_drv,
> PMD_INIT_LOG(ERR, "Failed to init adminq: %d", ret);
> return -EIO;
> }
> -   PMD_INIT_LOG(INFO, "FW %d.%d API %d.%d NVM "
> -   "%02d.%02d.%02d eetrack %04x\n",
> +   PMD_INIT_LOG(INFO, "FW %d.%d API %d.%d NVM %02d.%02d.%02d eetrack
> %04x",
> hw->aq.fw_maj_ver, hw->aq.fw_min_ver,
> hw->aq.api_maj_ver, hw->aq.api_min_ver,
> ((hw->nvm.version >> 12) & 0xf),
> @@ -417,7 +417,7 @@ eth_i40e_dev_init(__rte_unused struct eth_driver
> *eth_drv,
> /* Disable LLDP */
> ret = i40e_aq_stop_lldp(hw, true, NULL);
> if (ret != I40E_SUCCESS) /* Its failure can be ignored */
> -   PMD_INIT_LOG(INFO, "Failed to stop lldp\n");
> +   PMD_INIT_LOG(INFO, "Failed to stop lldp");
>
> /* Clear PXE mode */
> i40e_clear_pxe_mode(hw);
> @@ -439,13 +439,13 @@ eth_i40e_dev_init(__rte_unused struct eth_driver
> *eth_drv,
> /* Initialize the queue management */
> ret = i40e_res_pool_init(&pf->qp_pool, 0, hw->func_caps.num_tx_qp);
> if (ret < 0) {
> -   PMD_INIT_LOG(ERR, "Failed to init queue pool\n");
> +   PMD_INIT_LOG(ERR, "Failed to init queue pool");
> goto err_qp_pool_init;
> }
> ret = i40e_res_pool_init(&pf->msix_pool, 1,
> hw->func_caps.num_msix_vectors - 1);
> if (ret < 0) {
> -   PMD_INIT_LOG(ERR, "Failed to init MSIX pool\n");
> +   PMD_INIT_LOG(ERR, "Failed to init MSIX pool");
> goto err_msix_pool_init;
> }
>
> @@ -499,8 +499,8 @@ eth_i40e_dev_init(__rte_unused struct eth_driver
> *eth_drv,
> /* Should be after VSI initialized */
> dev->data->mac_addrs = rte_zmalloc("i40e", len, 0);
> if (!dev->data->mac_addrs) {
> -   PMD_INIT_LOG(ERR, "Failed to allocated memory "
> -   "for storing mac address");
> +   PMD_INIT_LOG(ERR, "Failed to allocated memory for storing "
> +"mac address");
> goto err_get_mac_addr;
> }
> ether_addr_copy((struct ether_addr *)hw->mac.perm_addr,
> @@ -723,9 +723,9 @@ i40e_phy_conf_link(struct i40e_hw *hw, uint8_t
> abilities, uint8_t force_speed)
> phy_conf.eeer = phy_ab.eeer_val;
> phy_conf.low_power_ctrl = phy_ab.d3_lpan;
>
> -   PMD_DRV_LOG(DEBUG, "\n\tCurrent: abilities %x, link_speed %x\n"
> -   "\tConfig:  abilities %x, link_speed %x",
> -   phy_ab.abilities, phy_ab.link_speed,
> +   PMD_DRV_LOG(DEBUG, "\tCurrent: abilities %x, link_speed %x",
> +   phy_ab.abilities, phy_ab.link_speed);
> +   PMD_DRV_LOG(DEBUG, "\tConfig:  abilities %x, link_speed %x",
> phy_conf.abilities, phy_conf.link_speed);
>
> status = i40e_aq_set_phy_config(hw, &phy_conf, NULL);
> @@ -763,7 +763,7 @@ i40e_dev_start(struct rte_eth_dev *dev)
>
> if ((dev->data->dev_conf.link_duplex != ETH_LINK_AUTONEG_DUPLEX) &&
> (dev->data->dev_co

[dpdk-dev] KNI and memzones

2014-09-23 Thread Jay Rolette
*> p.s. Lately someone involved with DPDK said KNI would be deprecated in
future DPDK releases; I haven't read or listen to this before, is this
true? What would be the natural replacement then?*

KNI is a non-trivial part of the product I'm in the process of building.
I'd appreciate someone "in the know" addressing this one please. Are there
specific roadmap plans relative to KNI that we need to be aware of?

Regards,
Jay

On Tue, Sep 23, 2014 at 4:27 AM, Marc Sune  wrote:

> Hi all,
>
> So we are having some problems with KNI. In short, we have a DPDK
> application that creates KNI interfaces and destroys them during its
> lifecycle and connecting them to DOCKER containers. Interfaces may
> eventually be even named the same (see below).
>
> We were wondering why, even when calling rte_kni_release(), hugepage memory
> was rapidly being exhausted, and we also realised that even after
> destruction, you cannot reuse the same name for an interface.
>
> After close inspection of the rte_kni lib we think the core issue is
> mostly a design issue. rte_kni_alloc ends up calling kni_memzone_reserve(),
> which in turn calls rte_memzone_reserve(), which cannot be unreserved by
> rte_kni_release() (by design of memzones). The exhaustion is rapid due to
> the number of FIFOs created (6).
>
> If this would be right, we would propose and try to provide a patch as
> follows:
>
> * Create a new rte_kni_init(unsigned int max_knis);
>
> This would preallocate all the FIFO rings (TX, RX, ALLOC, FREE, Request
> and Response) * max_knis by calling kni_memzone_reserve(), and store them
> in a kni_fifo_pool. This should only be called once by DPDK applications
> at bootstrapping time.
>
> * rte_kni_alloc would just use one slot from the kni_fifo_pool (one slot
> meaning a set of 6 FIFOs)
> * rte_kni_release would return the slot to the pool.
>
> This should solve both issues. We would base the patch on 1.7.2.
>
> Thoughts?
> marc
>
> p.s. Lately someone involved with DPDK said KNI would be deprecated in
> future DPDK releases; I haven't read or listen to this before, is this
> true? What would be the natural replacement then?
>


[dpdk-dev] KNI and memzones

2014-09-23 Thread Jay Rolette
I can't discuss product details openly yet, but I'm happy to have a
detailed discussion under NDA with Intel. In fact, we had an early NDA
discussion with Intel about it a few months ago.

That said, the use case isn't tied so closely to my product that I can't
describe it in general terms...

Imagine a box that installs in your network as a transparent
bump-in-the-wire. Traffic comes in port 1 and is processed by our
DPDK-based engine, then the packets are forwarded out port 2, where they
head to their original destination. From a network topology point of view,
the box is mostly invisible.

Same process applies for traffic going the other way (RX on port 2,
special-sauce processing in DPDK app, TX on port 1).

If you are familiar with network security products, this is very much how
IPS devices work.

Where KNI comes into play is for several user-space apps that need to use
the normal network stack (sockets) to communicate over the _same_ ports
used on the main data path. We use KNI to create a virtual port with an IP
address overlaid on the "invisible" data path ports.

This isn't just for control traffic. It's obviously not line-rate
processing, but we need to get all the bandwidth we can out of it.

Let me know if that makes sense or if I need to clarify some things. If
you'd rather continue this as an NDA discussion, just shoot me an email
directly.

Regards,
Jay



On Tue, Sep 23, 2014 at 11:38 AM, Zhou, Danny  wrote:

>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jay Rolette
> > Sent: Tuesday, September 23, 2014 8:39 PM
> > To: Marc Sune
> > Cc: ; dev-team at bisdn.de
> > Subject: Re: [dpdk-dev] KNI and memzones
> >
> > *> p.s. Lately someone involved with DPDK said KNI would be deprecated in
> > future DPDK releases; I haven't read or listen to this before, is this
> > true? What would be the natural replacement then?*
> >
> > KNI is a non-trivial part of the product I'm in the process of building.
> > I'd appreciate someone "in the know" addressing this one please. Are
> there
> > specific roadmap plans relative to KNI that we need to be aware of?
> >
>
> KNI and multi-threaded KNI have several limitations:
> 1) Flow classification and packet distribution are both done in software,
> specifically in the KNI user space library, at the cost of CPU cycles.
> 2) Low performance: skb creation/free and packet copies between skb and mbuf
> kill performance significantly.
> 3) Dedicated cores in user space and kernel space are responsible for rx/tx
> of packets between the DPDK app and the KNI device, which seems to me to
> waste too many core resources.
> 4) GPL license jail, as KNI sits in the kernel.
>
> We actually have a bifurcated driver prototype that meets both the high
> performance and upstreamability requirements, which is treated as an
> alternative solution to KNI. The idea is to
> leverage the NIC's flow director capability to bifurcate data plane packets
> to DPDK, while control plane packets, or whatever packets need to go through
> the kernel's TCP/IP stack, remain
> processed in the kernel (NIC driver + stack). Basically, the kernel NIC
> driver and DPDK co-exist to drive the same NIC device, but manipulate
> different rx/tx queue pairs. Though there is a
> tough NIC control consistency issue which needs to be resolved and
> upstreamed to the kernel, whose details I do not want to expose at the moment.
>
> IMHO, KNI should NOT be removed unless there is a really good user space,
> open-source and socket backward-compatible TCP/IP stack, which should not
> become true very soon.
> The bifurcated driver approach could certainly replace KNI for some use
> cases where DPDK does not own the NIC control.
>
> Do you mind sharing your KNI use case in more detail to help determine
> whether the bifurcated driver could help with it?
>
> > Regards,
> > Jay
> >
> > On Tue, Sep 23, 2014 at 4:27 AM, Marc Sune  wrote:
> >
> > > Hi all,
> > >
> > > So we are having some problems with KNI. In short, we have a DPDK
> > > application that creates KNI interfaces and destroys them during its
> > > lifecycle and connecting them to DOCKER containers. Interfaces may
> > > eventually be even named the same (see below).
> > >
> > > We were wondering why even calling rte_kni_release() the hugepages
> memory
> > > was rapidly being exhausted, and we also realised even after
> destruction,
> > > you cannot use the same name for the interface.
> > >
> > > After close inspection of the rte_kni lib we think the core issue and
> is
> > > mostly a design issue. rte_kni_alloc ends up calling
> kni_memzone_reserve()
>

[dpdk-dev] KNI and memzones

2014-09-23 Thread Jay Rolette
Yep, good way to describe it. Not really related to network security
functions but very similar architecture.

On Tue, Sep 23, 2014 at 2:12 PM, Zhou, Danny  wrote:

>  It looks like a typical network middle box usage with IDS/IPS/DPI sort
> of functionality.  Good-enough performance rather than line-rate
> performance should be ok for this case, and multi-threaded KNI (multiple
> software rx/tx queues established between DPDK and a single vEth netdev,
> with multiple kernel threads affinitized to several lcores) should fit,
> with linear performance scaling if you can allocate multiple lcores to
> achieve satisfactory throughput for relatively big packets.
>
>
>
> Since NIC control is still in the DPDK PMD for this case, the bifurcated
> driver does not fit, unless you only use DPDK to rx/tx packets in your box.
>
>
>
> *From:* Jay Rolette [mailto:rolette at infiniteio.com]
> *Sent:* Wednesday, September 24, 2014 2:53 AM
> *To:* Zhou, Danny
> *Cc:* Marc Sune; ; dev-team at bisdn.de
>
> *Subject:* Re: [dpdk-dev] KNI and memzones
>
>
>
> I can't discuss product details openly yet, but I'm happy to have a
> detailed discussion under NDA with Intel. In fact, we had an early NDA
> discussion with Intel about it a few months ago.
>
>
>
> That said, the use case isn't tied so closely to my product that I can't
> describe it in general terms...
>
>
>
> Imagine a box that installs in your network as a transparent
> bump-in-the-wire. Traffic comes in port 1 and is processed by our
> DPDK-based engine, then the packets are forwarded out port 2, where they
> head to their original destination. From a network topology point of view,
> the box is mostly invisible.
>
>
>
> Same process applies for traffic going the other way (RX on port 2,
> special-sauce processing in DPDK app, TX on port 1).
>
>
>
> If you are familiar with network security products, this is very much how
> IPS devices work.
>
>
>
> Where KNI comes into play is for several user-space apps that need to use
> the normal network stack (sockets) to communicate over the _same_ ports
> used on the main data path. We use KNI to create a virtual port with an IP
> address overlaid on the "invisible" data path ports.
>
>
>
> This isn't just for control traffic. It's obviously not line-rate
> processing, but we need to get all the bandwidth we can out of it.
>
>
>
> Let me know if that makes sense or if I need to clarify some things. If
> you'd rather continue this as an NDA discussion, just shoot me an email
> directly.
>
>
>
> Regards,
>
> Jay
>
>
>
>
>
>
>
> On Tue, Sep 23, 2014 at 11:38 AM, Zhou, Danny 
> wrote:
>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jay Rolette
> > Sent: Tuesday, September 23, 2014 8:39 PM
> > To: Marc Sune
> > Cc: ; dev-team at bisdn.de
> > Subject: Re: [dpdk-dev] KNI and memzones
> >
> > *> p.s. Lately someone involved with DPDK said KNI would be deprecated in
> > future DPDK releases; I haven't read or listen to this before, is this
> > true? What would be the natural replacement then?*
> >
> > KNI is a non-trivial part of the product I'm in the process of building.
> > I'd appreciate someone "in the know" addressing this one please. Are
> there
> > specific roadmap plans relative to KNI that we need to be aware of?
> >
>
> KNI and multi-threaded KNI have several limitations:
> 1) Flow classification and packet distribution are both done in software,
> specifically in the KNI user space library, at the cost of CPU cycles.
> 2) Low performance: skb creation/free and packet copies between skb and mbuf
> kill performance significantly.
> 3) Dedicated cores in user space and kernel space are responsible for rx/tx
> of packets between the DPDK app and the KNI device, which seems to me to
> waste too many core resources.
> 4) GPL license jail, as KNI sits in the kernel.
>
> We actually have a bifurcated driver prototype that meets both the high
> performance and upstreamability requirements, which is treated as an
> alternative solution to KNI. The idea is to
> leverage the NIC's flow director capability to bifurcate data plane packets
> to DPDK, while control plane packets, or whatever packets need to go through
> the kernel's TCP/IP stack, remain
> processed in the kernel (NIC driver + stack). Basically, the kernel NIC
> driver and DPDK co-exist to drive the same NIC device, but manipulate
> different rx/tx queue pairs. Though there is some
> tough consistent NIC control issue which needs to be resolved and
> upstreamed to kernel, which I do not want to expose detai

[dpdk-dev] [PATCH] eal: decrease the memory init time with many hugepages setup

2015-04-02 Thread Jay Rolette
On Thu, Apr 2, 2015 at 7:55 AM, Thomas Monjalon 
wrote:

> 2015-04-02 19:30, jerry.lilijun at huawei.com:
> > From: Lilijun 
> >
> > In the function map_all_hugepages(), hugepage memory is truly allocated
> by
> > memset(virtaddr, 0, hugepage_sz). Then it costs about 40s to finish the
> > dpdk memory initialization when 4 2M hugepages are set up in the host OS.
>
> Yes it's something we should try to reduce.
>

I have a patch in my tree that does the same opto, but it is commented out
right now. In our case, 2/3's of the startup time for our entire app was
due to that particular call - memset(virtaddr, 0, hugepage_sz). Just
zeroing 1 byte per huge page reduces that by 30% in my tests.

The only reason I have it commented out is that I didn't have time to make
sure there weren't side-effects for DPDK or my app. For normal shared
memory on Linux, pages are initialized to zero automatically once they are
touched, so the memset isn't required but I wasn't sure whether that
applied to huge pages. Also wasn't sure how hugetlbfs factored into the
equation.

Hopefully someone can chime in on that. Would love to uncomment the opto :)

> In fact we can write only one byte to finish the allocation.
>
> Isn't it a security hole?
>

Not necessarily. If the kernel pre-zeros the huge pages via CoW like normal
pages, then definitely not.

Even if the kernel doesn't pre-zero the pages, if DPDK takes care of
properly initializing memory structures on startup as they are carved out
of the huge pages, then it isn't a security hole. However, that approach is
susceptible to bit rot... You can audit the code and make sure everything
is kosher at first, but you have to worry about new code making assumptions
about how memory is initialized.


> This article speaks about "prezeroing optimizations" in Linux kernel:
> http://landley.net/writing/memory-faq.txt


I read through that when I was trying to figure out whether huge pages
were pre-zeroed or not. It doesn't talk about huge pages much beyond why
they are useful for reducing TLB swaps.

Jay


[dpdk-dev] tools brainstorming

2015-04-08 Thread Jay Rolette
"C comments" includes //, right? It's been part of the C standard for a long 
time now...

Sent from my iPhone

> On Apr 8, 2015, at 8:40 AM, Butler, Siobhan A  
> wrote:
> 
> 
> 
>> -Original Message-
>> From: Neil Horman [mailto:nhorman at tuxdriver.com]
>> Sent: Wednesday, April 8, 2015 2:11 PM
>> To: Butler, Siobhan A
>> Cc: Thomas Monjalon; dev at dpdk.org
>> Subject: Re: [dpdk-dev] tools brainstorming
>> 
>>> On Wed, Apr 08, 2015 at 12:16:10PM +, Butler, Siobhan A wrote:
>>> 
>>> 
 -Original Message-
 From: Neil Horman [mailto:nhorman at tuxdriver.com]
 Sent: Wednesday, April 8, 2015 12:44 PM
 To: Butler, Siobhan A
 Cc: Thomas Monjalon; dev at dpdk.org
 Subject: Re: [dpdk-dev] tools brainstorming
 
> On Wed, Apr 08, 2015 at 10:43:53AM +, Butler, Siobhan A wrote:
> Hi all,
> To add to the tools brainstorming - I propose we use the following
> Coding
 Standards as the basis of guidelines on coding style going forward.
> The style outlined below is in alignment with the current
> convention used
 for the majority of the project.
> Any thoughts/suggestions or feedback welcome.
> Thanks
> Siobhan :)
> 
> 
> 
> 
> Coding Style
> ~~
> 
> Description
> ---
> 
> This document specifies the preferred style for source files in
> the DPDK
 source tree.
> It is based on the Linux Kernel coding guidelines and the FreeBSD
> 7.2 Kernel Developer's Manual (see man style(9)), but was heavily
> modified for
 the needs of the DPDK. Many of the style rules are implicit in the
>> examples.
> Be careful to check the examples before assuming that style is
> silent on an
 issue.
> 
> General Guidelines
> --
> 
> The rules and guidelines given in this document cannot cover every
 situation, so the following general guidelines should be used as a 
 fallback:
> The code style should be consistent within each individual file,
> and within each file in a given directory or module - in the case
> of creating new
 files The primary reason for coding standards is to increase code
 readability and comprehensibility, therefore always use whatever
 option will make the code easiest to read.
> 
> The following more specific recommendations apply to all sections,
> both for
 C and assembly code:
> Line length is recommended to be not more than 80 characters,
> including comments. [Tab stop size should be assumed to be at
> least 4-
 characters wide] Indentation should be to no more than 3 levels deep.
> NOTE The above are recommendations, and not hard limits. However,
> it is
 expected that the recommendations should be followed in all but the
 rarest situations.
> C Comment Style
> 
> Usual Comments
> --
> 
> These comments should be used in normal cases. To document a
> public
 API, a doxygen-like format must be used: refer to Doxygen
>> Documentation.
> /*
>  * VERY important single-line comments look like this.
>  */
> 
> /* Most single-line comments look like this. */
> 
> /*
>  * Multi-line comments look like this.  Make them real sentences. Fill
>  * them so they look like real paragraphs.
>  */
> 
> License Header
> --
> 
> Each file should begin with a special comment tag which will
> contain the
 appropriate copyright and license for the file (Generally BSD License).
> After any copyright header, a blank line should be left before any
> other
 contents, e.g. include statements in a C file.
 
 This can become very confusing.  There already is a LICENSE.GPL and
 LICENSE.LGPL file at the top of the project, allowing others to
 insert their own licenses within their files can make it difficult
 for end user to determine if it is legally safe to use a given project.
 
 I'd suggest instead that contributions be disallowed from including
 license files in their code, relying instead on only a single
 license at the top of the project (which should likely be BSD or LGPL, 
 since
>> we're shipping a library).
 
 IANAL, but it seems to me to be dangerous to do anything else.  For
 example, all the code that is dual licensed in the library (like
 rte_pci_dev_ids.h).  It allows for a BSD or GPL license.  If someone
 builds dpdk as a DSO and distributes it under the former, every
 application that links against that re-distribution may arguably
 itself become GPL licensed.  While I'm personally fine with that, I
 can see it as being a big deal to some end users.  Unifying the
 license makes the re-distribution license issue more clear for everyone.
 
 Neil
>>> 
>>> 
>>> Input appreciated Neil thank you, would it be best to

[dpdk-dev] tools brainstorming

2015-04-09 Thread Jay Rolette
On Wed, Apr 8, 2015 at 5:38 PM, Stephen Hemminger <
stephen at networkplumber.org> wrote:

> On Wed, 8 Apr 2015 16:29:54 -0600
> Jay Rolette  wrote:
>
> > "C comments" includes //, right? It's been part of the C standard for a
> long time now...
>
> Yes but.
> I like to use checkpatch and checkpatch enforces kernel style which does
> not allow // for
> comments.
>

Fork checkpatch and disable that bit? DPDK isn't the kernel, so no
requirement to follow all of its rules


[dpdk-dev] tools brainstorming

2015-04-09 Thread Jay Rolette
On Thu, Apr 9, 2015 at 2:16 PM, Neil Horman  wrote:

> On Thu, Apr 09, 2015 at 11:31:39AM -0500, Jay Rolette wrote:
> > On Wed, Apr 8, 2015 at 5:38 PM, Stephen Hemminger <
> > stephen at networkplumber.org> wrote:
> >
> > > On Wed, 8 Apr 2015 16:29:54 -0600
> > > Jay Rolette  wrote:
> > >
> > > > "C comments" includes //, right? It's been part of the C standard
> for a
> > > long time now...
> > >
> > > Yes but.
> > > I like to use checkpatch and checkpatch enforces kernel style which
> does
> > > not allow // for
> > > comments.
> > >
> >
> > Fork checkpatch and disable that bit? DPDK isn't the kernel, so no
> > requirement to follow all of its rules
> >
>
> Doesn't that beg the question, why?  I understand the DPDK isn't the
> kernel, but
> we're talking about clarity of code here, not anything functional to that
> code.
> It seems we would be better served by just taking something that works here
> rather than re-inventing the wheel and digging into the minuate of what
> type of
> comments should be allowed (unless there is a compelling reason to change
> it
> that supersedes the available tools).  If not checkpatch, then some other
> tool,
> but it seems to me that coding style is one of those things where we can
> bend to
> the tool rather than taking the time to make the tool do exactly what's
> desired,
> at least until someone gets the time to modify it.
>

Fair question.

It depends a bit on how much you want to encourage patch contributions. Is
it worth adding more pain for folks trying to contribute patches for things
like this?

Should we force someone to spend time redoing a patch because of which way
they do their parenthesis? What about number of spaces to indent code? //
vs /* */ comments? None of these matter functionally and they don't affect
maintenance generally.

If someone is modifying existing code, then yeah, they should follow the
prevailing style (indention level, brace alignment, etc.) of the file they
are in. It helps readability, which makes maintenance easier. However, IMO,
mixing // and /* */ for comments doesn't affect the readability of the
source.

I know if I submit a patch and the only feedback is that I should have used
/* */ for comments, I'm extremely unlikely to spend extra time resubmitting
the patch for pedantry.


[dpdk-dev] [PATCH] Fixed spam from kni_allocate_mbufs() when no mbufs are free. If mbufs exhausted, 'out of memory' message logged at EXTREMELY high rates. Now logs no more than once per 10 mins

2015-04-14 Thread Jay Rolette
Hi Stephen,

Thanks for the feedback. Comments and questions inline below.

Jay

On Mon, Apr 13, 2015 at 8:09 PM, Stephen Hemminger <
stephen at networkplumber.org> wrote:

> On Wed, 17 Dec 2014 07:57:02 -0600
> Jay Rolette  wrote:
>
> > Signed-off-by: Jay Rolette 
> > ---
> >  lib/librte_kni/rte_kni.c | 21 -
> >  1 file changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
> > index fdb7509..f89319c 100644
> > --- a/lib/librte_kni/rte_kni.c
> > +++ b/lib/librte_kni/rte_kni.c
> > @@ -40,6 +40,7 @@
> >  #include 
> >  #include 
> >
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -61,6 +62,9 @@
> >
> >  #define KNI_MEM_CHECK(cond) do { if (cond) goto kni_fail; } while (0)
> >
> > +// Configure how often we log "out of memory" messages (in seconds)
> > +#define KNI_SPAM_SUPPRESSION_PERIOD 60*10
> > +
> >  /**
> >   * KNI context
> >   */
> > @@ -592,6 +596,10 @@ kni_free_mbufs(struct rte_kni *kni)
> >  static void
> >  kni_allocate_mbufs(struct rte_kni *kni)
> >  {
> > + static uint64_t no_mbufs = 0;
> > + static uint64_t spam_filter = 0;
> > + static uint64_t delayPeriod = 0;
> > +
> >   int i, ret;
> >   struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];
> >
> > @@ -620,7 +628,18 @@ kni_allocate_mbufs(struct rte_kni *kni)
> >   pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
> >   if (unlikely(pkts[i] == NULL)) {
> >   /* Out of memory */
> > - RTE_LOG(ERR, KNI, "Out of memory\n");
> > + no_mbufs++;
> > +
> > + // Memory leak or need to tune? Regardless, if we
> get here once,
> > + // we will get here a *lot*. Don't spam the logs!
> > + now = rte_get_tsc_cycles();
> > + if (!delayPeriod)
> > + delayPeriod = rte_get_tsc_hz() *
> KNI_SPAM_SUPPRESSION_PERIOD;
> > +
> > + if (!spam_filter || (now - spam_filter) >
> delayPeriod) {
> > + RTE_LOG(ERR, KNI, "No mbufs available
> (%llu)\n", (unsigned long long)no_mbufs);
> > + spam_filter = now;
> > + }
> >   break;
> >   }
> >   }
>
> I agree whole completely with the intent of this.
> But just remove the log message completely. It doesn't
> help at all, use a statistic instead.
>

I'm fine with removing the log message completely. Can you point me to
where DPDK keeps stats generally? Stats like this are only useful if they
are accessible from some sort of monitoring process. There aren't any stats
in struct rte_kni right now.

If you want to do ratelimiting, then it is better to create
> a common function (see net_ratelimit() in Linux kernel) to
> have all code use the same method, rather than reinventing private
> code to do it.
>

I'll remove it.
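For reference, a generic helper in the spirit of the kernel's net_ratelimit() might look like the sketch below. The names (ratelimit_ok, struct ratelimit) are hypothetical, not an existing DPDK API; the caller supplies the current cycle count, which in a real DPDK build would come from something like rte_get_timer_cycles():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct ratelimit {
    uint64_t interval;  /* minimum cycles between allowed events */
    uint64_t last;      /* cycle count of the last allowed event */
    uint64_t missed;    /* events suppressed since the last allowed one */
};

/* Returns true if the caller may log now; otherwise counts the miss so
 * the suppressed total can be reported on the next allowed event. */
static bool ratelimit_ok(struct ratelimit *rl, uint64_t now)
{
    if (rl->last != 0 && now - rl->last < rl->interval) {
        rl->missed++;
        return false;
    }
    rl->last = now;
    rl->missed = 0;
    return true;
}
```

A single shared helper like this avoids every call site reinventing the same static-variable dance that the patch above open-codes in kni_allocate_mbufs().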


> Minor style complaints:
>   * don't use camelCase, it goes against the style of the rest of the code.
>

ok


>   * don't use C++ style comments.
>

I didn't. I used C99 style comments :)


>   * always use rte_cycles() not TSC for things like this.
>

I don't see rte_cycles() defined anywhere. Did you mean some other function?

Please resubmit removing the log message and adding a statistic.
>


[dpdk-dev] Beyond DPDK 2.0

2015-04-24 Thread Jay Rolette
On Fri, Apr 24, 2015 at 2:47 AM, Luke Gorrie  wrote:

> 2. How will DPDK users justify contributing to DPDK upstream?
>
> Engineers in network equipment vendors want to contribute to open source,
> but what is the incentive for the companies to support this? This would be
> easy if DPDK were GPL'd (they are compelled) or if everybody were
> dynamically linking with the upstream libdpdk (can't have private patches).
> However, in a world where DPDK is BSD-licensed and statically linked, is it
> not both cheaper and competitively advantageous to keep fixes and
> optimizations in house?
>

The main incentive for most companies to support it is that it reduces
their maintenance load. It makes it easier to not get "stuck" on a
particular version of DPDK and they don't have to waste time constantly
back-porting improvements and bug fixes.

I can tell you that if DPDK were GPL-based, my company wouldn't be using
it. I suspect we wouldn't be the only ones...

Jay


[dpdk-dev] Beyond DPDK 2.0

2015-04-24 Thread Jay Rolette
On Fri, Apr 24, 2015 at 1:51 PM, Neil Horman  wrote:

> So, I hear your arguments, and its understandable that you might not want
> a GPL
> licensed product, given that the DPDK is a library (though I'm not sure
> what the
> aversion to LGPL would be).  Regardless, I think this conversation is a
> bit more
> about participation than license choice.  While you are correct, in that
> the
> first step to support (by which I presume you mean participation in the
> community) is use, the goal here is to get people contributing patches and
> helping increase the usefulness of DPDK.


> Given that DPDK is primarily licensed as BSD now, whats preventing you, or
> what
> would encourage you to participate in the community?  I see emails from
> infiniteio addresss in the archives asking questions and making
> suggestions on
> occasion, but no patches.  What would get you (or others in a simmilar
> situation) to submit those?
>

36 hours in the day? :)

It's not a lot, but we've submitted a couple of small patches. It's mostly
a matter of opportunity. We submit patches as we come across DPDK bugs or
find useful optos.

*Patches*

   - replaced O(n^2) sort in sort_by_physaddr() with qsort() from standard
   library 
   - Fixed spam from kni_allocate_mbufs() when no mbufs are free. If mbufs
   exhausted, 'out of memory' message logged at EXTREMELY high rates. Now logs
   no more than once per 10 mins 

*Reviews*

   - kni: optimizing the rte_kni_rx_burst
   
   - [PATCH RFC] librte_reorder: new reorder library
   
   - [PATCH v2 09/17] i40e: clean log messages
    (several in
   that series, but I figure 1 link is plenty)

*Other*
Not really patches or reviews, but trying to participate in the community:

   - VMware Fusion + DPDK and KNI
   
   - Appropriate DPDK data structures for TCP sockets
   
   - kernel: BUG: soft lockup - CPU#1 stuck for 22s! [kni_single:1782]
   
   - segmented recv ixgbevf
   

Jay


[dpdk-dev] Beyond DPDK 2.0

2015-04-27 Thread Jay Rolette
On Sat, Apr 25, 2015 at 7:10 AM, Neil Horman  wrote:

> On Fri, Apr 24, 2015 at 02:55:33PM -0500, Jay Rolette wrote:
> > On Fri, Apr 24, 2015 at 1:51 PM, Neil Horman 
> wrote:
> >
> > > So, I hear your arguments, and its understandable that you might not
> want
> > > a GPL
> > > licensed product, given that the DPDK is a library (though I'm not sure
> > > what the
> > > aversion to LGPL would be).  Regardless, I think this conversation is a
> > > bit more
> > > about participation than license choice.  While you are correct, in
> that
> > > the
> > > first step to support (by which I presume you mean participation in the
> > > community) is use, the goal here is to get people contributing patches
> and
> > > helping increase the usefulness of DPDK.
> >
> >
> > > Given that DPDK is primarily licensed as BSD now, whats preventing
> you, or
> > > what
> > > would encourage you to participate in the community?  I see emails from
> > > infiniteio addresss in the archives asking questions and making
> > > suggestions on
> > > occasion, but no patches.  What would get you (or others in a simmilar
> > > situation) to submit those?
> > >
> >
> > 36 hours in the day? :)
> >
> > It's not a lot, but we've submitted a couple of small patches. It's
> mostly
> > a matter of opportunity. We submit patches as we come across DPDK bugs or
> > find useful optos.
> >
>


>
> Understand, I'm not trying to single you out here.  I see that you, and
> others
> from infiniteio have had some participation in the project, and thats
> great, and
> appreciated. I'm more focused on why that level of participation is not
> higher
> (not only from you, but from the larger presumed user base in general).  As
> noted at the start of this thread, one of the goals of this discussion is
> to
> find ways to maximize participation in the project, and from you I'm
> hearing
> that:
>
> 1) you use the dpdk because it lowers maintenence overhead
> 2) you don't participate more in the project because your product work
> schedule
> doesn't allow you to do so.
>
> Are those both accurate statements?
>

(1) was just my response to Luke about what would motivate a company to
submit patches if the license didn't require it (GPL vs BSD discussion).
Maint overhead had little to do with why we decided to use DPDK.

(2) is certainly true enough, but not really all that big of a factor in
the reasons why our participation isn't at some higher level. I'd say there
are two primary factors:

*New Contributors*
Dropping a major patch on a project where we have no history would be
foolish and frustrating. Every open source project has a vibe to it and its
own way of operating. It normally takes some time to figure that out, but
for major contributions, it generally involves a level of trust.

Major code drops or patches aren't generally well received unless they are
from someone who is known and trusted by the community. Building up that
trust ("street cred") normally involves participating in smaller, less
risky ways. Answering questions where you can, small patches to fix bugs,
possibly reviewing code (although this can be iffy as well), etc.

This facilitates understanding the process, who the players are and what
sort of toes you are likely to step on.

It also gives you time to ramp up on the internals of the code and
philosophy/goals of the project better. With DPDK, performance is obviously
more important than portability. Until recently, very little care was given
to API stability or the impact that has on DPDK apps. Both of those are
very different approaches than typical and it affects what you might do
when approaching submitting a patch.

If you want to build up the number of folks contributing, expect them to
start small and make sure it actually GOES somewhere.

The first patch I submitted in mid-December had some quick responses and
questions back-and-forth, but then stalled on a couple of undocumented
minor stylistic things (comment style and use of #ifdefs). I never heard
anything back and 4.5 months later, a simple startup opto is still sitting
there.

The second patch I submitted (also mid-December) got no response at all for
4 months. I've replied to the feedback that came eventually, but once
again, no subsequent answers.

Neither of the patches were important, but the process doesn't exactly
inspire a strong desire to contribute more.

*Different Goals*
I see at least two different sets of people on the mailing list. The heavy
hitters generally have a product reason for their high level of involvement
with DPDK. For Intel and 6Wind, the

[dpdk-dev] Beyond DPDK 2.0

2015-04-28 Thread Jay Rolette
On Tue, Apr 28, 2015 at 12:26 PM, Neil Horman  wrote:

> On Mon, Apr 27, 2015 at 08:46:01AM -0500, Jay Rolette wrote:
> > On Sat, Apr 25, 2015 at 7:10 AM, Neil Horman 
> wrote:
> >
> > > On Fri, Apr 24, 2015 at 02:55:33PM -0500, Jay Rolette wrote:
> > > > On Fri, Apr 24, 2015 at 1:51 PM, Neil Horman 
> > > wrote:
> > > >
> > > > > So, I hear your arguments, and its understandable that you might
> not
> > > want
> > > > > a GPL
> > > > > licensed product, given that the DPDK is a library (though I'm not
> sure
> > > > > what the
> > > > > aversion to LGPL would be).  Regardless, I think this conversation
> is a
> > > > > bit more
> > > > > about participation than license choice.  While you are correct, in
> > > that
> > > > > the
> > > > > first step to support (by which I presume you mean participation
> in the
> > > > > community) is use, the goal here is to get people contributing
> patches
> > > and
> > > > > helping increase the usefulness of DPDK.
> > > >
> > > >
> > > > > Given that DPDK is primarily licensed as BSD now, whats preventing
> > > you, or
> > > > > what
> > > > > would encourage you to participate in the community?  I see emails
> from
> > > > > infiniteio addresss in the archives asking questions and making
> > > > > suggestions on
> > > > > occasion, but no patches.  What would get you (or others in a
> simmilar
> > > > > situation) to submit those?
> > > > >
> > > >
> > > > 36 hours in the day? :)
> > > >
> > > > It's not a lot, but we've submitted a couple of small patches. It's
> > > mostly
> > > > a matter of opportunity. We submit patches as we come across DPDK
> bugs or
> > > > find useful optos.
> > > >
> > >
> > 
> >
> > >
> > > Understand, I'm not trying to single you out here.  I see that you, and
> > > others
> > > from infiniteio have had some participation in the project, and thats
> > > great, and
> > > appreciated. I'm more focused on why that level of participation is not
> > > higher
> > > (not only from you, but from the larger presumed user base in
> general).  As
> > > noted at the start of this thread, one of the goals of this discussion
> is
> > > to
> > > find ways to maximize participation in the project, and from you I'm
> > > hearing
> > > that:
> > >
> > > 1) you use the dpdk because it lowers maintenence overhead
> > > 2) you don't participate more in the project because your product work
> > > schedule
> > > doesn't allow you to do so.
> > >
> > > Are those both accurate statements?
> > >
> >
> > (1) was just my response to Luke about what would motivate a company to
> > submit patches if the license didn't require it (GPL vs BSD discussion).
> > Maint overhead had little to do with why we decided to use DPDK.
> >
> > (2) is certainly true enough, but not really all that big of a factor in
> > the reasons why our participation isn't at some higher level. I'd say
> there
> > are two primary factors:
> >
> > *New Contributors*
> > Dropping a major patch on a project where we have no history would be
> > foolish and frustrating. Every open source project has a vibe to it and
> its
> > own way of operating. It normally takes some time to figure that out, but
> > for major contributions, it generally involves a level of trust.
> >
> > Major code drops or patches aren't generally well received unless it is
> > from someone that is known and trusted by the community. Building up that
> > trust ("street cred") normally involves participating in smaller, less
> > risky ways. Answering questions where you can, small patches to fix bugs,
> > possibly reviewing code (although this can be iffy as well), etc.
> >
> > This facilitates understanding the process, who the players are and what
> > sort of toes you are likely to step on.
> >
> > It also gives you time to ramp up on the internals of the code and
> > philosophy/goals of the project better. With DPDK, performance is
> obviously
> > more important than portability. Until recently, very little care was
> given
> > to API stability or the i

Re: [dpdk-dev] drops while transmitting to the kni using rte_kni_tx_burst()

2017-01-17 Thread Jay Rolette
On Mon, Jan 16, 2017 at 8:55 AM, Ferruh Yigit 
wrote:

> On 1/16/2017 2:47 PM, Shirley Avishour wrote:
> > Hi,
> > As I wrote the kernel thread runs on a dedicated lcore.
> > running top while my application is running I see kni_single and the cpu
> > usage is really low...
> > Is there any rate limitation for transmitting to the kernel interface
> > (since packets are being copied in the kernel).
>
> Yes, kind of: the kernel thread sleeps periodically, for an interval defined
> by KNI_KTHREAD_RESCHEDULE_INTERVAL. You can try tweaking this value if you
> want the thread to do more work and less sleep :)
>

YMMV, but that is unlikely to be the problem. See this thread for why the
code is misleading if you don't understand what the schedule_timeout() call
is doing:

http://dpdk.org/ml/archives/dev/2015-June/018858.html

Jay


>
> Also KNI_RX_LOOP_NUM can be updated for same purpose.
>
> >
> >
> > On Mon, Jan 16, 2017 at 4:42 PM, Ferruh Yigit  > > wrote:
> >
> > On 1/16/2017 12:20 PM, Shirley Avishour wrote:
> > > Hi,
> > > I have an application over dpdk which is consisted of the
> following threads
> > > each running on a separate core:
> > > 1) rx thread which listens on in a poll mode for traffic
> > > 2) 2 packet processing threads (for load balancing)
> > > 3) kni thread (which also runs on a separate core).
> >
> > This is kernel thread, right? Is it bind to any specific core?
> > Is it possible that this thread shares the core with 2nd processing
> > thread when enabled?
> >
> > >
> > > the rx thread receives packets and clones them and transmit a copy
> > to the
> > > kni and the other packet is sent to the packet processing unit
> > (hashing
> > > over 2 threads).
> > > the receive traffic rate is 100Mbps.
> > > When working with single packet processing thread I am able to get
> > all the
> > > 100Mbps towards the kni with no drops.
> > > but when I activate my application with 2 packet processing
> > threads I start
> > > facing drops towards the kni.
> > > the way I see it the only difference now is that I have another
> > threads
> > > which handles an mbuf and frees it once processing is completed.
> > > Can anyone assist with this case please?
> > >
> > > Thanks!
> > >
> >
> >
>
>


Re: [dpdk-dev] [PATCH 2/2] config: disable KNI ethtool by default

2017-01-17 Thread Jay Rolette
On Tue, Jan 17, 2017 at 12:01 PM, Ferruh Yigit 
wrote:

> KNI ethtool support (KNI control path) is not commonly used,
> and it tends to break the build with new version of the Linux kernel.
>
> KNI ethtool feature is disabled by default. KNI datapath is not affected
> by this update.
>
> It is possible to enable feature explicitly with config option:
> "CONFIG_RTE_KNI_KMOD_ETHTOOL=y"
>
> Signed-off-by: Ferruh Yigit 
>

Is there a test case somewhere to detect when it gets broken or is the
intent to let it bit-rot unless someone enables that option and
subsequently discovers it broken?

I know we don't do this for every config option, but given the fact that
this tends to get broken frequently, it seems riskier.

Jay


> ---
>  config/common_linuxapp | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index d4c4f0c..2483dfa 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -38,7 +38,6 @@ CONFIG_RTE_EXEC_ENV_LINUXAPP=y
>  CONFIG_RTE_EAL_IGB_UIO=y
>  CONFIG_RTE_EAL_VFIO=y
>  CONFIG_RTE_KNI_KMOD=y
> -CONFIG_RTE_KNI_KMOD_ETHTOOL=y
>  CONFIG_RTE_LIBRTE_KNI=y
>  CONFIG_RTE_LIBRTE_VHOST=y
>  CONFIG_RTE_LIBRTE_PMD_VHOST=y
> --
> 2.9.3
>
>


Re: [dpdk-dev] [PATCH] kni: fast data availability check in thread_single loop

2017-01-18 Thread Jay Rolette
On Wed, Jan 18, 2017 at 5:05 AM, Sergey Vyazmitinov <
s.vyazmiti...@brain4net.com> wrote:

> On Thu, Jan 12, 2017 at 12:29 AM, Ferruh Yigit 
>  wrote:
>
> > On 12/29/2016 11:23 PM, Sergey Vyazmitinov wrote:
> > > This allows a significant reduction in packet processing latency.
> > >
> > > Signed-off-by: Sergey Vyazmitinov 
> > > ---
> > >  .../linuxapp/eal/include/exec-env/rte_kni_common.h |  6 
> > >  lib/librte_eal/linuxapp/kni/kni_misc.c | 33
> > --
> > >  2 files changed, 30 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/lib/librte_eal/linuxapp/eal/
> include/exec-env/rte_kni_common.h
> > b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > > index 09713b0..8183a8e 100644
> > > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > > @@ -109,6 +109,12 @@ struct rte_kni_fifo {
> > >   void *volatile buffer[]; /**< The buffer contains mbuf
> > pointers */
> > >  };
> > >
> > > +static inline int
> > > +kni_fifo_empty(struct rte_kni_fifo *fifo)
> > > +{
> > > + return fifo->write == fifo->read;
> > > +}
> > > +
> > >  /*
> > >   * The kernel image of the rte_mbuf struct, with only the relevant
> > fields.
> > >   * Padding is necessary to assure the offsets of these fields
> > > diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c
> > b/lib/librte_eal/linuxapp/kni/kni_misc.c
> > > index 497db9b..4bf9bfa 100644
> > > --- a/lib/librte_eal/linuxapp/kni/kni_misc.c
> > > +++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
> > > @@ -45,6 +45,7 @@ MODULE_AUTHOR("Intel Corporation");
> > >  MODULE_DESCRIPTION("Kernel Module for managing kni devices");
> > >
> > >  #define KNI_RX_LOOP_NUM 1000
> > > +#define KNI_RX_DATA_LOOP_NUM 2500
> > >
> > >  #define KNI_MAX_DEVICES 32
> > >
> > > @@ -129,25 +130,39 @@ static struct pernet_operations kni_net_ops = {
> > >  #endif
> > >  };
> > >
> > > -static int
> > > -kni_thread_single(void *data)
> > > +static inline void
> > > +kni_thread_single_rx_data_loop(struct kni_net *knet)
> > >  {
> > > - struct kni_net *knet = data;
> > > - int j;
> > >   struct kni_dev *dev;
> > > + int i;
> > >
> > > - while (!kthread_should_stop()) {
> > > - down_read(&knet->kni_list_lock);
> > > - for (j = 0; j < KNI_RX_LOOP_NUM; j++) {
> > > - list_for_each_entry(dev, &knet->kni_list_head,
> > list) {
> > > + for (i = 0; i < KNI_RX_DATA_LOOP_NUM; ++i) {
> >
> > When there are multiple KNI interfaces, and lets assume there is traffic
> > too, this will behave like:
> >
> > KNI1x2500 data_packets + KNI2x2500 data_packets ... KNI10x2500
> >
> > After data packets, KNI1 resp_packet + KNI2 resp_packets ...
> >
> > Won't this scenario also may cause latency? And perhaps jitter according
> > KNI interface traffic loads?
> >
> > This may be good for some use cases, but not sure if this is good for
> all.
> >
> We can decrease KNI_RX_DATA_LOOP_NUM to some reasonable value.
> I can make test to find lower bound.
> Also, the point is in fast check for a new data in interface rx queue.
> May be will be better add some kind of break after several kni_net_rx
> calls.
> Without them loop ends very quickly.
> Anyway, this patch decreases average latency in my case from 4.5ms to
> 0.011ms in ping test with 10 packets.
>

If you were seeing latency of 4.5ms, then it is more likely a different
issue.

At the end of the loop where KNI is reading packets from the queue, it
calls *schedule_timeout_interruptible()* with (by default) a 5us timeout.
However, that call just guarantees that the thread will sleep for AT LEAST
5us.

For most x86 Linux distros, HZ = 250 in the kernel, which works out to 4ms.
I'm reasonably certain the latency you are seeing is because the KNI thread
is sleeping and not getting woken up like you might expect.

When you increased the number of loops happening before the sleep, you
increased how long KNI spends before it sleeps and it happened to be long
enough in your particular test to change your average latency. If you ran
your test for a few minutes and built a histogram of ping times, I bet
you'll see ~4ms of latency pop up regularly.

More details from when I dug into this behavior previously:
http://dpdk.org/ml/archives/dev/2015-June/018858.html

Jay



>
> >
> > > + list_for_each_entry(dev, &knet->kni_list_head, list) {
> > > + /* Burst dequeue from rx_q */
> > > + if (!kni_fifo_empty((struct rte_kni_fifo
> > *)dev->rx_q)) {
> >
> > Do we need this check, since first thing in kni_net_rx_normal() is
> > checking if there is item in the queue?
> >
> > You right. Without that check latency is even less.
>
> >  #ifdef RTE_KNI_VHOST
> > >   kni_chk_vhost_rx(dev);
> > >  #else
> > >   kni_net_rx(dev);
> > >  #endif
> > > - kni_net_poll_

Re: [dpdk-dev] [PATCH] SDK: Add scripts to initialize DPDK runtime

2016-12-12 Thread Jay Rolette
On Mon, Dec 12, 2016 at 3:12 PM, Bruce Richardson <
bruce.richard...@intel.com> wrote:

> On Mon, Dec 12, 2016 at 07:24:02PM +, Luca Boccassi wrote:
> > From: Christian Ehrhardt 
> >
> > A tools/init directory is added with dpdk-init, a script that can be
> > used to initialize a DPDK runtime environment. 2 config files with
> > default options, dpdk.conf and interfaces, are provided as well
> > together with a SysV init script and a systemd service unit.
> >
> > Signed-off-by: Luca Boccassi 
> > Signed-off-by: Christian Ehrhardt 
> > ---
> 
> > new file mode 100755
> > index 000..89e0399
> > --- /dev/null
> > +++ b/tools/init/dpdk-init.in
> > @@ -0,0 +1,256 @@
> > +#!/bin/sh
> > +#
> > +# dpdk-init: startup script to initialize a dpdk runtime environment
> > +#
> > +# Copyright 2015-2016 Canonical Ltd.
> > +# Author: Stefan Bader 
> > +# Author: Christian Ehrhardt 
> > +#
> > +#This program is free software: you can redistribute it and/or
> modify
> > +#it under the terms of the GNU General Public License version 3,
> > +#as published by the Free Software Foundation.
> > +#
> > +#This program is distributed in the hope that it will be useful,
> > +#but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +#MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +#GNU General Public License for more details.
> > +#
> > +#You should have received a copy of the GNU General Public License
> > +#along with this program.  If not, see <
> http://www.gnu.org/licenses/>.
> > +#
>
> Any particular reason this is licensed under GPL v3. AFAIK, all the
> userspace code in DPDK is licensed under a BSD license (with the
> exception of some dual licensed stuff which is shared between kernel and
> userspace). I just worry that adding additional licenses into the mix
> may confuse things.
>
> Regards,
> /Bruce
>

Most generally, shouldn't any patch that isn't compatible with the standard
DPDK license be rejected? With the specific exception for the KNI kernel
bits that require different licenses...

Jay


Re: [dpdk-dev] KNI broken again with 4.9 kernel

2016-12-15 Thread Jay Rolette
On Thu, Dec 15, 2016 at 6:01 AM, Mcnamara, John 
wrote:

>
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Stephen Hemminger
> > Sent: Wednesday, December 14, 2016 11:41 PM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] KNI broken again with 4.9 kernel
> >
> > /build/lib/librte_eal/linuxapp/kni/igb_main.c:2317:21: error:
> > initialization from incompatible pointer type [-Werror=incompatible-
> > pointer-types]
> >   .ndo_set_vf_vlan = igb_ndo_set_vf_vlan,
> >  ^~~
> >
> > I am sure Ferruh Yigit will fix it.
> >
> > Which raises a couple of questions:
> >  1. Why is DPDK still keeping KNI support for Intel-specific ethtool
> > functionality.
> > This always breaks, is code bloat, and means a 3rd copy of base code
> > (Linux, DPDK PMD, + KNI)
> >
> >  2. Why is KNI not upstream?
> > If not acceptable due to security or supportability then why does it
> > still exist?
> >
> >  3. If not upstream, then maintainer should track upstream kernel changes
> > and fix DPDK before
> > kernel is released.  The ABI is normally set early in the rc cycle
> > weeks before release.
>
>
> Hi Stephen,
>
> On point 2: The feedback we have always received is that the KNI code
> isn't upstreamable. Do you think there is an upstream path?
>
> > If not acceptable due to security or supportability then why does it
> > still exist?
>
> The most commonly expressed reason when we have asked this question in the
> past (and we did again at Userspace a few months ago) is that the people
> who use it want the performance.
>

We use KNI in our product. In our case, it's because it allows "normal"
non-DPDK apps in the control plane to interact with traffic on the fastpath
as needed. Having everything under the sun live in DPDK's essentially flat
memory space is not great for security or stability.

It helps time to market by being able to use existing programs that
interface to the network via sockets instead of having to limit ourselves
to the relatively tiny set of libraries out there that work directly DPDK.

Double bonus on the time-to-market argument since we can implement
functionality in other higher-level languages as appropriate.

Performance-wise, KNI is "ok" but not great. It's not clear to me why it is
so much slower than using a NIC normally (not via DPDK) via the Linux
network stack. Copying data between sk_buf and mbuf is never going to be
cheap, but comparing that to what happens through all the kernel network
stack layers, the end result seems too slow.

That said, it's still faster than TAP/TUN interfaces and similar approaches.

Jay

On point 3: We do have an internal continuous integration system that runs
> nightly compiles of DPDK against the latest kernel and flags any issues.
>
> John
>


Re: [dpdk-dev] next technical board meeting, 2017-04-06

2017-03-30 Thread Jay Rolette
On Thu, Mar 30, 2017 at 10:51 AM, Wiles, Keith 
wrote:


>
> My Soap box comment:
>I think we are limiting DPDK’s growth by only focusing on a few new
> PMDs and reworking the existing code. We need to look forward and grow DPDK
> as a community to get more people involved in adding more applications and
> new designs. I believe DPDK.org needs to be a bigger community and not just
> a I/O library called DPDK. We need to actively move the organization to
> include more then just a high speed I/O library. Some will focus on DPDK
> and others will focus on providing a higher level applications, libraries
> and features.
>
> Regards,
> Keith
>

Yes!


Re: [dpdk-dev] [PATCH] kni: fast data availability check in thread_single loop

2017-03-10 Thread Jay Rolette
On Fri, Mar 10, 2017 at 6:59 AM, Thomas Monjalon 
wrote:

> 2017-01-18 07:11, Jay Rolette:
> > On Wed, Jan 18, 2017 at 5:05 AM, Sergey Vyazmitinov <
> > s.vyazmiti...@brain4net.com> wrote:
> >
> > > On Thu, Jan 12, 2017 at 12:29 AM, Ferruh Yigit  >
> > >  wrote:
> > >
> > > > On 12/29/2016 11:23 PM, Sergey Vyazmitinov wrote:
> > > > > This allows a significant reduction in packet processing latency.
> > > > >
> > > > > Signed-off-by: Sergey Vyazmitinov 
> [...]
> > > > > --- a/lib/librte_eal/linuxapp/kni/kni_misc.c
> > > > > +++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
> > > > > @@ -45,6 +45,7 @@ MODULE_AUTHOR("Intel Corporation");
> > > > >  MODULE_DESCRIPTION("Kernel Module for managing kni devices");
> > > > >
> > > > >  #define KNI_RX_LOOP_NUM 1000
> > > > > +#define KNI_RX_DATA_LOOP_NUM 2500
> > > > >
> > > > >  #define KNI_MAX_DEVICES 32
> > > > >
> > > > > @@ -129,25 +130,39 @@ static struct pernet_operations kni_net_ops
> = {
> > > > >  #endif
> > > > >  };
> > > > >
> > > > > -static int
> > > > > -kni_thread_single(void *data)
> > > > > +static inline void
> > > > > +kni_thread_single_rx_data_loop(struct kni_net *knet)
> > > > >  {
> > > > > - struct kni_net *knet = data;
> > > > > - int j;
> > > > >   struct kni_dev *dev;
> > > > > + int i;
> > > > >
> > > > > - while (!kthread_should_stop()) {
> > > > > - down_read(&knet->kni_list_lock);
> > > > > - for (j = 0; j < KNI_RX_LOOP_NUM; j++) {
> > > > > - list_for_each_entry(dev,
> &knet->kni_list_head,
> > > > list) {
> > > > > + for (i = 0; i < KNI_RX_DATA_LOOP_NUM; ++i) {
> > > >
> > > > When there are multiple KNI interfaces, and lets assume there is
> traffic
> > > > too, this will behave like:
> > > >
> > > > KNI1x2500 data_packets + KNI2x2500 data_packets ... KNI10x2500
> > > >
> > > > After data packets, KNI1 resp_packet + KNI2 resp_packets ...
> > > >
> > > > Won't this scenario also may cause latency? And perhaps jitter
> according
> > > > KNI interface traffic loads?
> > > >
> > > > This may be good for some use cases, but not sure if this is good for
> > > all.
> > > >
> > > We can decrease KNI_RX_DATA_LOOP_NUM to some reasonable value.
> > > I can make test to find lower bound.
> > > Also, the point is in fast check for a new data in interface rx queue.
> > > May be will be better add some kind of break after several kni_net_rx
> > > calls.
> > > Without them loop ends very quickly.
> > > Anyway, this patch decreases average latency in my case from 4.5ms to
> > > 0.011ms in ping test with 10 packets.
> > >
> >
> > If you were seeing latency of 4.5ms, then it is more likely a different
> > issue.
> >
> > At the end of the loop where KNI is reading packets from the queue, it
> > calls *schedule_timeout_interruptible()* with (by default) a 5us
> timeout.
> > However, that call just guarantees that the thread will sleep for AT
> LEAST
> > 5us.
> >
> > For most x86 Linux distros, HZ = 250 in the kernel, which works out to
> 4ms.
> > I'm reasonably certain the latency you are seeing is because the KNI
> thread
> > is sleeping and not getting woken up like you might expect.
> >
> > When you increased the number of loops happening before the sleep, you
> > increased how long KNI spends before it sleeps and it happened to be long
> > enough in your particular test to change your average latency. If you ran
> > your test for a few minutes and built a histogram of ping times, I bet
> > you'll see ~4ms of latency pop up regularly.
> >
> > More details from when I dug into this behavior previously:
> > http://dpdk.org/ml/archives/dev/2015-June/018858.html
>
> No answer in this discussion.
> Should we close it in patchwork?
>

I don't believe we should merge the patch.

Jay


Re: [dpdk-dev] [RFC] Kernel Control Path (KCP)

2017-06-13 Thread Jay Rolette
On Tue, Jun 13, 2017 at 12:21 PM, Ferruh Yigit 
wrote:

> On 5/30/2017 11:55 AM, Thomas Monjalon wrote:
> > 26/05/2017 18:52, Ferruh Yigit:
> >> We are looking for re-sending [1] the Kernel Control Path (KCP)
> >> with some updates [2].
> >>
> >> Mainly this is an usability improvement for DPDK.
> >>
> >> And a quick reminder about what KCP is:
> >>
> >> "KCP is Linux virtual network interface that can control DPDK ports".
> >>
> >> So DPDK interfaces, somehow will be visible and it will be possible to
> >> use common Linux tools on DPDK interfaces.
> >
> > Reminder: the Mellanox PMDs live with their upstream kernel modules,
> > allowing such features.
> >
> > The best model would be to have control path in kernel for every PMDs.
>
> That is the intention with this feature.
>
> >
> > Anyway, do you think KCP (or NCI) could be upstreamed in any way?
>
> Unfortunately I believe the answer is same, it may not be possible to
> upstream this kernel module. Should this fact block the feature?
>

Upstream is better, but KCP is a nice quality-of-life feature that I'd like
to see go in regardless. Anything that helps make DPDK less "foreign" to
normal port configuration and status tools is goodness.

Jay


[dpdk-dev] BUG - KNI broken in 4.2 kernel

2015-08-27 Thread Jay Rolette
On Thu, Aug 27, 2015 at 10:23 AM, Zhang, Helin 
wrote:
>
>
> > -Original Message-
> > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > Sent: Wednesday, August 26, 2015 5:15 PM
> > To: dev at dpdk.org; Zhang, Helin
> > Subject: BUG - KNI broken in 4.2 kernel
> >
> > The network device ops handles changed again.
> >
> > Does KNI really need to keep yet another copy of the Intel driver code.
> Yes, it is a bit ugly. But there is no better way till now, as some
> users want to have the functionality of KNI ethtool, though limited now.
> I have an idea that is to disable KNI ethtool support by default, as most
> of users may not care about it. Then these users will not be bothered by
> these type of issues.
>

Do you know of anyone that uses KNI that doesn't care about it? Since KNI
presents as a normal network interface, it really needs to support the
normal Linux commands (ifconfig, ethtool, ...)

Jay


> For those users who are using KNI ethtool support, they need to fix the
> issues on any new versions of kernel or similar.
> Any good ideas or comments?
>
> Helin
>
> > There already are 4 versions:
> >   1. Out-of tree base driver
> >   2. In-kernel mainline Linux driver
> >   3. DPDK driver
> >   4. KNI DPDK driver
> >
> > No wonder they can't stay in sync.
>


[dpdk-dev] releases scheduling

2015-12-15 Thread Jay Rolette
+100 for the LTS

One of the bigger challenges for products using DPDK is that there is so
much change going in each release with very limited testing. Trying to even
remotely keep up is too risky. We end up back-porting various fixes and
enhancements to whatever version we are on (1.6rc2 currently).

Jay

On Tue, Dec 15, 2015 at 8:42 AM, Wiles, Keith  wrote:

> On 12/15/15, 7:37 AM, "dev on behalf of O'Driscoll, Tim" <
> dev-bounces at dpdk.org on behalf of tim.odriscoll at intel.com> wrote:
>
> >
> >> -Original Message-
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> >> Sent: Sunday, December 13, 2015 7:23 PM
> >> To: dev at dpdk.org
> >> Subject: [dpdk-dev] releases scheduling
> >>
> >> Hi all,
> >>
> >> We need to define the deadlines for the next releases.
> >> During 2015, we were doing a release every 4 months.
> >> If we keep the same pace, the next releases would be:
> >>  2.3: end of March
> >>  2.4: end of July
> >>  2.5: end of November
> >>
> >> However, things move fast and it may be a bit long to wait 4 months for
> >> a feature. That's why I suggest to progressively shorten release terms:
> >>  2.3: end of March
> >>  2.4: mid July
> >>  2.5: end of October
> >> and continue with a release every 3 months:
> >>  2.6: end of January
> >>  2.7: end of April
> >>  2.8: end of July
> >> This planning would preserve some of the major holiday periods
> >> (February, May, August, December).
> >>
> >> The first period, for the first submission of a feature, was 2 months
> >> long.
> >> Then we had 2 other months to discuss, merge and fix.
> >> We should shorten only the first period.
> >>
> >> Anyway, the next deadlines should be unchanged:
> >>  - January 31: end of first submission phase
> >>  - March 31: release 2.3
> >>
> >> Opinions are welcome.
> >
> >I think moving to quarterly releases is a good idea. Your proposal to do
> this in a gradual way, so that we don't change the 2.3 dates, also makes
> sense.
> >
> >Should we consider changing the release numbering at the same time? It's
> difficult to keep track of when each 2.x release is due, and we don't have
> any criteria in place for moving to 3.x in future. Many people like the
> Ubuntu numbering scheme of Year.Month. Should we consider adopting that
> convention? If we move in future to a model where we have long-term support
> releases (something that was touched on in Dublin), then we could append an
> LTS designation to the release number.
>
> +1 for the Ubuntu number and the LTS
> >
> >
> >Tim
> >
>
>
> Regards,
> Keith
>
>
>
>
>


Re: [dpdk-dev] KNI performance is not what is claimed

2018-09-20 Thread Jay Rolette
On Thu, Sep 20, 2018 at 1:11 PM Stephen Hemminger <
step...@networkplumber.org> wrote:

> I wonder if KNI is claiming performance that was never measured on current
> CPU, OS, DPDK.
>
> With single stream and TCP testing on IXGBE (DPDK), I see lowest
> performance with KNI.
>
> Rx  Tx
> KNI 3.2 Gbit/sec1.3 Gbit/sec
> TAP 4.9 4.7
> Virtio  5.6 8.6
>
> Perhaps for 18.11 we should change documentation to remove language
> claiming
> better performance with KNI, and then plan for future deprecation?
>

Do TAP and Virtio provide equivalent function to KNI? I can't speak for any
other products, but ours is dependent on KNI. The ability for control plane
applications to use normal Linux sockets with DPDK is key even if it isn't
performant.

Hopefully the answer is "yes", in which case I'll happily port over to
using one of the faster mechanisms.

Thanks,
Jay


Re: [dpdk-dev] KNI performance is not what is claimed

2018-09-20 Thread Jay Rolette
On Thu, Sep 20, 2018 at 3:16 PM Stephen Hemminger <
step...@networkplumber.org> wrote:

> On Thu, 20 Sep 2018 15:02:53 -0500
> Jay Rolette  wrote:
>
> > On Thu, Sep 20, 2018 at 1:11 PM Stephen Hemminger <
> > step...@networkplumber.org> wrote:
> >
> > > I wonder if KNI is claiming performance that was never measured on
> current
> > > CPU, OS, DPDK.
> > >
> > > With single stream and TCP testing on IXGBE (DPDK), I see lowest
> > > performance with KNI.
> > >
> > > Rx  Tx
> > > KNI 3.2 Gbit/sec1.3 Gbit/sec
> > > TAP 4.9 4.7
> > > Virtio  5.6 8.6
> > >
> > > Perhaps for 18.11 we should change documentation to remove language
> > > claiming
> > > better performance with KNI, and then plan for future deprecation?
> > >
> >
> > Do TAP and Virtio provide equivalent function to KNI? I can't speak for
> any
> > other products, but ours is dependent on KNI. The ability for control
> plane
> > applications to use normal Linux sockets with DPDK is key even if it
> isn't
> > performant.
> >
> > Hopefully the answer is "yes", in which case I'll happily port over to
> > using one of the faster mechanisms.
> >
> > Thanks,
> > Jay
>
> See:
>
> https://doc.dpdk.org/guides-17.11/howto/virtio_user_as_exceptional_path.html


Thanks. Looks like it's time to run some experiments again.

Jay


Re: [dpdk-dev] DPDK Release Status Meeting 24/01/2018

2019-01-25 Thread Jay Rolette
>* Questions from Intel Test about the use of the Stable Tree.
>  Do people use it? Each stable/LTS release requires a lot of
>  testing and there are currently 3 releases to be tested.

We do @ infinite io.

Jay

On Fri, Jan 25, 2019 at 3:24 AM Mcnamara, John 
wrote:

> Release Status Meeting 24/01/2018
>
> 28 June 2018
> 09:32
>
> Agenda
> --
>
> * Subtrees
> * Bugzilla
> * Dates
>
> Subtrees
> 
>
> * Next-Net
>   * There are no critical/blocking issues. There is only a bunch of
> patches merged, again nothing critical. Pulled from sub-trees.
> It can be ready for Friday for rc4.
> * Next-Crypto
>   * No patches
> * Next-virtio
>   * Have prepared branch with 4 patches.
> * Next-Eventdev
>   * No patches
> * Next-Ip-pipeline
>   * No patches
> * Master
>   * Documentation patches
>
> RC4 target for Friday 25 January.
>
>
> RC3 Testing
> ---
>
> * No critical or very big issues for RC3 - from Intel Testing
>
>
> LTS Testing
> ---
>
> * Intel cannot test the LTS until after Chinese Spring Festival
> * Should be able to start retesting in the week of Feb 18th
> * Questions from Intel Test about the use of the Stable Tree.
>   Do people use it? Each stable/LTS release requires a lot of
>   testing and there are currently 3 releases to be tested.
>
>
> Deprecation notices
> ---
>
>Please review the deprecation notices below:
>   * doc: add deprecation notice for meson:
> http://patches.dpdk.org/patch/49877/
>   * doc: add deprecation notice to remove rte meter color:
> http://patches.dpdk.org/patch/48712/
>   * doc: update API deprecation for device reset:
> http://patches.dpdk.org/patch/44990/
>   * doc: announce change of rte service API parameter type:
> http://patches.dpdk.org/patch/49987/
>   * doc: announce ring API change:
> http://patches.dpdk.org/patch/49961/
>
>
> OVS 2.11 Status
> ---
>
> * 2.11 branched from master January 18th.
>* 4 week stabilization period.
> * Release targeted February 18th~.
>
> DPDK 19.05
> --
> * Request for to try get some changes in early
> * Agreement to accept proposed dates for 19.05
> * Intel to submit patch to website to update dates
>   * Proposal Deadline: Friday  1 March 2019
>   * Merge deadline:Friday 22 March 2019
>   * RC1 release:   Friday 29 March 2019
>   * Release:   Friday 10 May   2019
>
>
>


Re: [dpdk-dev] DPDK Release Status Meeting 24/01/2018

2019-01-25 Thread Jay Rolette
Hi Thomas,

On Fri, Jan 25, 2019 at 10:40 AM Thomas Monjalon 
wrote:

> Hi Jay,
>
> 25/01/2019 14:42, Jay Rolette:
> > >* Questions from Intel Test about the use of the Stable Tree.
> > >  Do people use it? Each stable/LTS release requires a lot of
> > >  testing and there are currently 3 releases to be tested.
> >
> > We do @ infinite io.
>
> Would you be OK to share your test plan and the results please?
>

I thought Intel was asking if anyone was using the LTS tree (ie., is it
worthwhile to continue maintaining and testing it). We don't test DPDK as a
standalone entity. We have test plans and such for our product, which
includes DPDK, but nothing that would be of interest elsewhere.

Sorry if I misunderstood the original question.

Jay


[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-22 Thread Jay Rolette
On Thu, Jan 22, 2015 at 3:06 AM, Luke Gorrie  wrote:

> Here is another thought: when is it time to start thinking of packet copy
> as a cheap unit-time operation?
>

Pretty much never short of changes to memory architecture, IMO. Frankly,
there are never enough cycles for deep packet inspection applications that
need to run at/near line-rate. Don't waste any doing something you can
avoid in the first place.

Microseconds matter. Scaling up to 100GbE, nanoseconds matter.

Jay


[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization

2015-01-22 Thread Jay Rolette
On Thu, Jan 22, 2015 at 12:27 PM, Luke Gorrie  wrote:

> On 22 January 2015 at 14:29, Jay Rolette  wrote:
>
>> Microseconds matter. Scaling up to 100GbE, nanoseconds matter.
>>
>
> True. Is there a cut-off point though?
>

There are always engineering trade-offs that have to be made. If I'm
optimizing something today, I'm certainly not starting at something that
takes 1ns for an app that is doing L4-7 processing. It's all about
profiling and figuring out where the bottlenecks are.

For past networking products I've built, there was a lot of traffic that
the software didn't have to do much to. Minimal L2/L3 checks, then forward
the packet. It didn't even have to parse the headers because that was
offloaded on an FPGA. The only way to make those packets faster was to turn
them around in the FPGA and not send them to the CPU at all. That change
improved small packet performance by ~30%. That was on high-end network
processors that are significantly faster than Intel processors for packet
handling.

It seems to be a strange thing when you realize that just getting the
packets into the CPU is expensive, nevermind what you do with them after
that.

Does one nanosecond matter?
>

You just have to be careful when talking about things like a nanosecond.
It sounds really small, but the IPG for a 10G link is only 9.6ns. It's all
relative.

AVX512 will fit a 64-byte packet in one register and move that to or from
> memory with one instruction. L1/L2 cache bandwidth per server is growing on
> a double-exponential curve (both bandwidth per core and cores per CPU). I
> wonder if moving data around in cache will soon be too cheap for us to
> justify worrying about.
>

Adding cores helps with aggregate performance, but doesn't really help with
latency on a single packet. That said, I'll take advantage of anything I
can from the hardware to either let me scale up how much traffic I can
handle or the amount of features I can add at the same performance level!

Jay


[dpdk-dev] Any way to control port_id mapping?

2015-01-27 Thread Jay Rolette
Is there a way to control what ethernet port gets mapped to each port_id in
DPDK?

From what I've seen, it looks like the port_id values just get assigned
based on the order they come back in the PCI probe/scan. Ideally, I'd like
to have the port_id values map reasonably directly to the customer-facing
labels on my appliance.

Label  : port_id
------ : -------
Port 1 : 0
Port 2 : 1
Port 3 : 2
etc.

The PCI bus:slot.function values of the ethernet ports don't just sort
cleanly (no surprise).

I can work around it at higher levels, but it is a lot easier for debugging
and stats if there is a way to control how DPDK maps physical ports to
port_id values.

I didn't see anything in the docs or example apps so presumably the answer
is no, but thought I'd ask just in case.

Is that something other folks would find useful?

Thanks,
Jay


[dpdk-dev] Process question: reviewing older patches

2015-01-28 Thread Jay Rolette
There's a fairly old KNI patch (http://dpdk.org/dev/patchwork/patch/84/)
that I reviewed, but I'm not seeing how to submit my "Reviewed-by" when I
don't have any of the emails from the patch in my mail client.

I can copy the text from the 'mbox' link in Patchwork into an email, but
I'm guessing that may not make the patch toolchain happy.

What's the right way to do this?

Thanks,
Jay


[dpdk-dev] Process question: reviewing older patches

2015-01-28 Thread Jay Rolette
Thanks Thomas and Neil. Sadly, no joy. While I generally like gmail for my
mail, there's not a reasonable way to import the mbox file or to control
the message id.

If someone else wants to resend the message to the list, I can reply to
that. Otherwise, here are the relevant bits from the original patch email:

From patchwork Wed Jul 23 06:45:12 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [dpdk-dev] kni: optimizing the rte_kni_rx_burst
From: Hemant Agrawal 
X-Patchwork-Id: 84
Message-Id: <14060979121185-git-send-email-Hemant at freescale.com>
To: 
Date: Wed, 23 Jul 2014 12:15:12 +0530

The current implementation of rte_kni_rx_burst polls the fifo for buffers.
Irrespective of success or failure, it allocates the mbufs and tries to put
them into the alloc_q.
If the buffers are not added to alloc_q, it frees them.
This wastes lots of CPU cycles allocating and freeing the buffers if
alloc_q is full.

The logic has been changed to:
1. Initially alloc and add buffers (burst size) to alloc_q
2. Add buffers to alloc_q only when you are pulling out the buffers.

Signed-off-by: Hemant Agrawal 

---
 lib/librte_kni/rte_kni.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index 76feef4..01e85f8 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -263,6 +263,9 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool,

  ctx->in_use = 1;

+ /* Allocate mbufs and then put them into alloc_q */
+ kni_allocate_mbufs(ctx);
+
  return ctx;

 fail:
@@ -369,8 +372,9 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf
**mbufs, unsigned num)
 {
  unsigned ret = kni_fifo_get(kni->tx_q, (void **)mbufs, num);

- /* Allocate mbufs and then put them into alloc_q */
- kni_allocate_mbufs(kni);
+ /* If buffers removed, allocate mbufs and then put them into alloc_q */
+ if(ret)
+ kni_allocate_mbufs(kni);

  return ret;
 }

The patch looks good from a DPDK 1.6r2 viewpoint. We saw the same behavior
in our app and ended up avoiding it higher in the stack (in our code).

Reviewed-by: Jay Rolette 

Jay

On Wed, Jan 28, 2015 at 10:49 AM, Neil Horman  wrote:

> On Wed, Jan 28, 2015 at 09:52:48AM -0600, Jay Rolette wrote:
> > There's a fairly old KNI patch (http://dpdk.org/dev/patchwork/patch/84/)
> > that I reviewed, but I'm not seeing how to submit my "Reviewed-by" when I
> > don't have any of the emails from the patch in my mail client.
> >
> > I can copy the text from the 'mbox' link in Patchwork into an email, but
> > I'm guessing that may not make the patch toolchain happy.
> >
> > What's the right way to do this?
> >
> Just grab the message id from the patchwork site, and list it in the
> envelope
> headers in-reply-to: field when you respond.  You won't have the rest of
> the
> conversation field in the thread, but you will respond properly to the
> thread,
> and patchwork will pick up the ACK
> Neil
>
> > Thanks,
> > Jay
> >
>


[dpdk-dev] Process question: reviewing older patches

2015-01-29 Thread Jay Rolette
On Wed, Jan 28, 2015 at 3:23 PM, Neil Horman  wrote:

> On Wed, Jan 28, 2015 at 02:57:58PM -0600, Jay Rolette wrote:
> > Thanks Thomas and Neil. Sadly, no joy. While I generally like gmail for
> my
> > mail, there's not a reasonable way to import the mbox file or to control
> > the message id.
> >
> Sure there is, you just need to select an appropriate MUA.  You can't use
> the
> web interface for this.  Enable imap access to your gmail account, and
> setup an
> MUA like mutt to point to it.  Then the mutt client can open the mbox
> file, or
> you can fill out the in-reply-to: header manually.
>

True. I meant there's not a way to do this in the mail interface I use.
Honestly, while I was happy to do the review since I've been digging into
KNI anyway, I'm not going to spend time setting up an MUA just for this.

Jay


Re: [dpdk-dev] Announcement of SSE requirement change in dpdk

2017-08-11 Thread Jay Rolette
On Fri, Aug 11, 2017 at 3:29 PM, Bruce Richardson <
bruce.richard...@intel.com> wrote:

> On Wed, Aug 09, 2017 at 04:21:32PM -0400, Neil Horman wrote:
> > Can anyone point out to me when and where the change to require SSE4.2
> was
> > dicussed?  The first I saw of it was when the commit to the release
> notes went
> > in on August 3, and I can find no prior mention of it, save for the
> patches that
> > went in separately in the prior weeks.
> >
> > Neil
> >
> There was no real widespread discussion of it, if that's what you are
> looking for. I made the proposal via patch, and it was reviewed and
> acked by a number of folks, with nobody raising any objections at the
> time. Possibly it was a change that should have been more widely
> publicised ahead of time, but I'm not sure what form that publicization
> should have taken, since all tech discussion happens on the dev mailing
> list anyway.
> Not that I'm planning any similar changes, but for the future, what do
> you think the process for changes like this should be - and what changes
> would classify for it? If we have a process problem, let's try and fix
> it.
>
> Regards,
> /Bruce.
>

I'd suggest that removing support for any hardware (CPUs or NICs) should
require a much "louder" announcement.

This particular one didn't hit my company, but when you have hardware out
in the field that you are expected to provide on-going software updates
for, these sort of changes are painful. The LTS tree helps considerably
with this, but definitely deserves more publicity than usual.

$0.02
Jay


Re: [dpdk-dev] Minutes of technical board meeting 2017-12-06

2017-12-12 Thread Jay Rolette
On Tue, Dec 12, 2017 at 11:26 AM, Richardson, Bruce <
bruce.richard...@intel.com> wrote:


> Topic: Management of old patches in patchwork
> * Unanimous agreement that old patches should be rejected in patchwork
>   after a reasonable period, set initially at 3 releases (9 months).
> * After a release, all patches dated older than 3 releases previously,
>   e.g. after 17.11, any patches submitted before the release of 17.02,
>   will be marked as rejected and a mail reply sent to the appropriate
>   mailing list thread informing people of the same.
> * To have patches reconsidered for future inclusion, a new version of
>   the patches should be submitted to the mailing list.
>

Does this mean there is a commitment to act on submitted patches in a
timely manner? Maybe this is better now, but at least in the past, even
small patches to fix bugs would sit around with no action on them for 6+
months.

It's been a while since I submitted any patches, so if this isn't an issue
now, then nevermind :-)

Jay


[dpdk-dev] [PATCH 1/3] kcp: add kernel control path kernel module

2016-02-29 Thread Jay Rolette
On Mon, Feb 29, 2016 at 5:06 AM, Thomas Monjalon 
wrote:

> Hi,
> I totally agree with Avi's comments.
> This topic is really important for the future of DPDK.
> So I think we must give some time to continue the discussion
> and have netdev involved in the choices done.
> As a consequence, these series should not be merged in the release 16.04.
> Thanks for continuing the work.
>

I know you guys are very interested in getting rid of the out-of-tree
drivers, but please do not block incremental improvements to DPDK in the
meantime. Ferruh's patch improves the usability of KNI. Don't throw out
good and useful enhancements just because it isn't where you want to be in
the end.

I'd like to see these be merged.

Jay


[dpdk-dev] [RFC 0/3] Use common Linux tools to control DPDK ports

2016-01-18 Thread Jay Rolette
On Mon, Jan 18, 2016 at 5:12 PM, Stephen Hemminger <
stephen at networkplumber.org> wrote:

> On Fri, 15 Jan 2016 16:18:01 +
> Ferruh Yigit  wrote:
>
> > This work is to make DPDK ports more visible and to enable using common
> > Linux tools to configure DPDK ports.
> >
> > Patch is based on KNI but contains only control functionality of it,
> > also this patch does not include any Linux kernel network driver as
> > part of it.
>
> I actually would like KNI to die and be replaced by something generic.
> Right now with KNI it is driver and hardware specific. It is almost as if
> there are three drivers for ixgbe: the Linux driver, the DPDK driver, and
> the KNI driver.
>

Any ideas about what that would look like? Having the ability to send
traffic to/from DPDK-owned ports from control plane applications that live
outside of (and are ignorant of) DPDK is a platform requirement for our
product.

I'm assuming that isn't uncommon, but that could just be the nature of the
types of products I've built over the years.

That said, I'd love there to be something that performs better and plays
nicer with the system than KNI.

Jay


[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-01-22 Thread Jay Rolette
On Fri, Jan 22, 2016 at 9:22 AM, Van Haaren, Harry <
harry.van.haaren at intel.com> wrote:

> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Subject: Re: [dpdk-dev] Future Direction for rte_eth_stats_get()
>
> > + Harry
>
> Hey all,
>
> > xstats are driver agnostic and have a well-defined naming scheme.
>
> Indeed, described here:
>
> http://dpdk.org/doc/guides/prog_guide/poll_mode_drv.html#extended-statistics-api
>
> All of the rte_eth_stats statistics are presented in xstats consistently
> (independent of which PMD is running), as they are implemented in
> rte_ethdev.c:
> http://dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ethdev.c#n1500
>
>
> From: David Harton:
> > For example, what if there was a kind of "stats registry" composed of ID
> and name.  It
> > would work similar to xtats except instead of advertising strings only
> the "get" API would
> > return ID/count pairs.  If the application wishes to use the DPDK
> provided string they can
> > call another API to get the stat string via the ID.  These IDs would be
> well-defined
> > clearly explaining what the count represent.  This way the strings for
> counts will be
> > uniform across drivers and also make it clear to the users what the
> counts truly represent
> > and the application could obtain stats from any driver in a driver
> agnostic manner.
>
> What you (David Harton) describe about a well-defined name, and clearly
> explaining the value
> it is representing is what the goal was for xstats: a human readable
> string, which can be
> easily parsed and a value retrieved.
>
> The "rules" of encoding a statistic as an xstats string are pretty simple,
> and if any PMD
> breaks the rules, it should be considered broken and fixed.
>
> Consistency is key, in this case consistency in the PMD xstats
> implementations to make it
> useful for clients of the API.
>
> The big advantage xstats has over adding items to rte_eth_stats is that it
> won't break ABI.
>
> Do you see a fundamental problem with parsing the strings to retrieve
> values?
>

When you have an appliance with a large number of ports and a large number
of stats that you need to collect, having to parse those strings constantly
is very slow compared to having fixed struct members or at least known ID
values.

Strings are also error prone. While they may be consistent at this point,
it is remarkably easy for them to get out of sync along the way. Using an
ID instead avoids that neatly.

Jay


[dpdk-dev] [PATCH 2/3] rte_ctrl_if: add control interface library

2016-01-28 Thread Jay Rolette
On Thu, Jan 28, 2016 at 7:15 AM, Ferruh Yigit 
wrote:

> On Thu, Jan 28, 2016 at 11:14:47AM +, Remy Horton wrote:
> > On 27/01/2016 16:24, Ferruh Yigit wrote:
> >
> > > +   default:
> > > +   ret = -95 /* EOPNOTSUPP */;
> > > +   break;
> >
> > Is this intentional? -EOPNOTSUPP is -122 (-95 is -ENOTSOCK)..
> >
> Return value is not significant; the caller just checks for a negative value.
> I can remove the comment to prevent confusion.
>

No, please fix the return value. Return values are significant when you are
trying to debug or understand the intent of the code.

Jay


[dpdk-dev] random pkt generator PMD

2016-06-15 Thread Jay Rolette
On Wed, Jun 15, 2016 at 4:43 AM, Yerden Zhumabekov 
wrote:

> Hello everybody,
>
> DPDK already got a number of PMDs for various eth devices, it even has PMD
> emulations for backends such as pcap, sw rings etc.
>
> I've been thinking about the idea of having PMD which would generate mbufs
> on the fly in some randomized fashion. This would serve goals like, for
> example:
>
> 1) running tests for applications with network processing capabilities
> without additional software packet generators;
> 2) making performance measurements with no hw interference;
> 3) ability to run without root privileges, --no-pci, --no-huge, for CI
> build, so on.
>
> Maybe there's no such need, and these goals may be achieved by other means
> and this idea is flawed? Any thoughts?
>

Are you thinking of something along the lines of what BreakingPoint (now
part of Ixia) does, but as an open source software tool?


[dpdk-dev] random pkt generator PMD

2016-06-15 Thread Jay Rolette
On Wed, Jun 15, 2016 at 7:11 AM, Yerden Zhumabekov 
wrote:
>
>
> On 15.06.2016 17:50, Jay Rolette wrote:
>
>> On Wed, Jun 15, 2016 at 4:43 AM, Yerden Zhumabekov 
>> wrote:
>>
>> Hello everybody,
>>>
>>> DPDK already got a number of PMDs for various eth devices, it even has
>>> PMD
>>> emulations for backends such as pcap, sw rings etc.
>>>
>>> I've been thinking about the idea of having PMD which would generate
>>> mbufs
>>> on the fly in some randomized fashion. This would serve goals like, for
>>> example:
>>>
>>> 1) running tests for applications with network processing capabilities
>>> without additional software packet generators;
>>> 2) making performance measurements with no hw interference;
>>> 3) ability to run without root privileges, --no-pci, --no-huge, for CI
>>> build, so on.
>>>
>>> Maybe there's no such need, and these goals may be achieved by other
>>> means
>>> and this idea is flawed? Any thoughts?
>>>
>>> Are you thinking of something along the lines of what BreakingPoint (now
>> part of Ixia) does, but as an open source software tool?
>>
>>
> More dreaming than thinking though :) Live flows generation, malware,
> attacks simulation etc is way out of scope of PMD dev, I guess.
>

Having a DPDK-based open-source BreakingPoint app would be a _fantastic_
tool for the security community, but yes, it doesn't really make sense to
put any of that logic in the PMD itself.

Were you more after the capabilities from that sort of tool or the
experience of writing a PMD?


[dpdk-dev] [PATCH 1/3] kcp: add kernel control path kernel module

2016-03-02 Thread Jay Rolette
On Tue, Mar 1, 2016 at 8:02 PM, Stephen Hemminger <
stephen at networkplumber.org> wrote:

> On Mon, 29 Feb 2016 08:33:25 -0600
> Jay Rolette  wrote:
>
> > On Mon, Feb 29, 2016 at 5:06 AM, Thomas Monjalon <
> thomas.monjalon at 6wind.com>
> > wrote:
> >
> > > Hi,
> > > I totally agree with Avi's comments.
> > > This topic is really important for the future of DPDK.
> > > So I think we must give some time to continue the discussion
> > > and have netdev involved in the choices done.
> > > As a consequence, these series should not be merged in the release
> 16.04.
> > > Thanks for continuing the work.
> > >
> >
> > I know you guys are very interested in getting rid of the out-of-tree
> > drivers, but please do not block incremental improvements to DPDK in the
> > meantime. Ferruh's patch improves the usability of KNI. Don't throw out
> > good and useful enhancements just because it isn't where you want to be
> in
> > the end.
> >
> > I'd like to see these be merged.
> >
> > Jay
>
> The code is really not ready. I am okay with cooperative development
> but the current code needs to go into a staging type tree.
> No compatibility, no ABI guarantees, more of an RFC.
> Don't want vendors building products with it then screaming when it
> gets rebuilt/reworked/scrapped.
>

That's fair. To be clear, it wasn't my intent for code that wasn't baked
yet to be merged.

The main point of my comment was that I think it is important not to halt
incremental improvements to existing capabilities (KNI in this case) just
because there are philosophical or directional changes that the community
would like to make longer-term.

Bird in the hand vs. two in the bush...

Jay


[dpdk-dev] mmap fails with more than 40000 hugepages

2015-02-05 Thread Jay Rolette
On Thu, Feb 5, 2015 at 6:00 AM, Damjan Marion (damarion)  wrote:

> Hi,
>
> I have a system with 2 NUMA nodes and 256G RAM total. I noticed that DPDK
> crashes in rte_eal_init()
> when the number of available hugepages is around 40000 or above.
> Everything works fine with lower values (i.e. 30000).
>
> I also tried allocating 40000 on node0 and 0 on node1; the same crash
> happens.
>
>
> Any idea what might be causing this?
>

Any reason you can't switch to using 1GB hugepages? You'll get better
performance and your init time will be shorter. The systems we run on are
similar (256GB, 2 NUMA nodes) and that works fine for us.
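
For reference, the 1GB-page setup is a boot-time configuration change rather
than a code change. A sketch of the usual steps (the page count and mount
point are illustrative, and the exact bootloader details vary by distro):

```shell
# 1) Reserve 1GB pages at boot -- add to the kernel command line
#    (e.g. GRUB_CMDLINE_LINUX in /etc/default/grub, then update-grub):
#      default_hugepagesz=1G hugepagesz=1G hugepages=64
#
# 2) After reboot, mount a hugetlbfs instance for the app to use:
mkdir -p /mnt/huge_1GB
mount -t hugetlbfs -o pagesize=1G nodev /mnt/huge_1GB
#
# 3) Check how many 1GB pages each NUMA node actually got:
cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages
```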

Not directly related, but if you have to stick with 2MB hugepages, you
might want to take a look at a patch I submitted that fixes the O(n^2)
algorithm used in initializing hugepages.

Jay


[dpdk-dev] mmap fails with more than 40000 hugepages

2015-02-05 Thread Jay Rolette
On Thu, Feb 5, 2015 at 7:36 AM, Damjan Marion (damarion)  wrote:
>
>  On 05 Feb 2015, at 14:22, Jay Rolette  wrote:
>
>   Not directly related, but if you have to stick with 2MB hugepages, you
> might want to take a look at a patch I submitted that fixes the O(n^2)
> algorithm used in initializing hugepages.
>
>
>  I tried it hoping that it will change something, no luck?
>

I suggested that patch only to speed up your init time. There's nothing in
it that will help with the crash you are seeing.


[dpdk-dev] kernel: BUG: soft lockup - CPU#1 stuck for 22s! [kni_single:1782]

2015-02-10 Thread Jay Rolette
Environment:
  * DPDK 1.6.0r2
  * Ubuntu 14.04 LTS
  * kernel: 3.13.0-38-generic

When we start exercising KNI a fair bit (transferring files across it, both
sending and receiving), I'm starting to see a fair bit of these kernel
lockups:

kernel: BUG: soft lockup - CPU#1 stuck for 22s! [kni_single:1782]

Frequently I can't do much other than get a screenshot of the error message
coming across the console session once we get into this state, so debugging
what is happening is "interesting"...

I've seen this on multiple hardware platforms (so not box specific) as well
as virtual machines.

Are there any known issues with KNI that would cause kernel lockups in DPDK
1.6? Really hoping someone that knows KNI well can point me in the right
direction.

KNI in the 1.8 tree is significantly different, so it didn't look
straight-forward to back-port it, although I do see a few changes that
might be relevant.

Any suggestions, pointers or other general help for tracking this down?

Thanks!
Jay


[dpdk-dev] kernel: BUG: soft lockup - CPU#1 stuck for 22s! [kni_single:1782]

2015-02-16 Thread Jay Rolette
Thanks Alejandro.

I'll look into the kernel dump if there is one. The system is extremely
brittle once this happens. Usually I can't do much other than power-cycle
the box. Anything requiring sudo just locks the terminal up, so little to
look at besides the messages on the console.

Matthew Hall also suggested a few things for me to look into, so I'm
following up on that as well.

Jay

On Wed, Feb 11, 2015 at 10:25 AM, Alejandro Lucero <
alejandro.lucero at netronome.com> wrote:

> Hi Jay,
>
> I saw these errors when I worked in the HPC sector. They come usually with
> a kernel dump for each core in the machine so you can know, after some
> peering at the kernel code, how the soft lockup triggers. When I did that
> it was always an issue with the memory.
>
> So those times that you can still work on the machine after the problem,
> look at the kernel messages. I will be glad to look at it.
>
>
>
> On Wed, Feb 11, 2015 at 1:33 AM, Jay Rolette 
> wrote:
>
> > Environment:
> >   * DPDK 1.6.0r2
> >   * Ubuntu 14.04 LTS
> >   * kernel: 3.13.0-38-generic
> >
> > When we start exercising KNI a fair bit (transferring files across it,
> both
> > sending and receiving), I'm starting to see a fair bit of these kernel
> > lockups:
> >
> > kernel: BUG: soft lockup - CPU#1 stuck for 22s! [kni_single:1782]
> >
> > Frequently I can't do much other than get a screenshot of the error
> message
> > coming across the console session once we get into this state, so
> debugging
> > what is happening is "interesting"...
> >
> > I've seen this on multiple hardware platforms (so not box specific) as
> well
> > as virtual machines.
> >
> > Are there any known issues with KNI that would cause kernel lockups in
> DPDK
> > 1.6? Really hoping someone that knows KNI well can point me in the right
> > direction.
> >
> > KNI in the 1.8 tree is significantly different, so it didn't look
> > straight-forward to back-port it, although I do see a few changes that
> > might be relevant.
> >
> > Any suggestions, pointers or other general help for tracking this down?
> >
> > Thanks!
> > Jay
> >
>


[dpdk-dev] kernel: BUG: soft lockup - CPU#1 stuck for 22s! [kni_single:1782]

2015-02-16 Thread Jay Rolette
On Tue, Feb 10, 2015 at 7:33 PM, Jay Rolette  wrote:

> Environment:
>   * DPDK 1.6.0r2
>   * Ubuntu 14.04 LTS
>   * kernel: 3.13.0-38-generic
>
> When we start exercising KNI a fair bit (transferring files across it,
> both sending and receiving), I'm starting to see a fair bit of these kernel
> lockups:
>
> kernel: BUG: soft lockup - CPU#1 stuck for 22s! [kni_single:1782]
>
> Frequently I can't do much other than get a screenshot of the error
> message coming across the console session once we get into this state, so
> debugging what is happening is "interesting"...
>
> I've seen this on multiple hardware platforms (so not box specific) as
> well as virtual machines.
>
> Are there any known issues with KNI that would cause kernel lockups in
> DPDK 1.6? Really hoping someone that knows KNI well can point me in the
> right direction.
>
> KNI in the 1.8 tree is significantly different, so it didn't look
> straight-forward to back-port it, although I do see a few changes that
> might be relevant.
>

Found the problem. No patch to submit since it's already fixed in later
versions of DPDK, but thought I'd follow up with the details since I'm sure
we aren't the only ones trying to use bleeding-edge versions of DPDK...

In kni_net_rx_normal(), it was calling netif_receive_skb() instead of
netif_rx(). The source for netif_receive_skb() points out that it should
only be called from soft-irq context, which isn't the case for KNI.

As typical, simple fix once you track it down.

Yao-Po Wang's fix:  commit 41a6ebded53982107c1adfc0652d6cc1375a7db9.

Cheers,
Jay


[dpdk-dev] kernel: BUG: soft lockup - CPU#1 stuck for 22s! [kni_single:1782]

2015-02-17 Thread Jay Rolette
On Mon, Feb 16, 2015 at 7:00 PM, Matthew Hall  wrote:

> On Mon, Feb 16, 2015 at 10:33:52AM -0600, Jay Rolette wrote:
> > In kni_net_rx_normal(), it was calling netif_receive_skb() instead of
> > netif_rx(). The source for netif_receive_skb() points out that it should
> > only be called from soft-irq context, which isn't the case for KNI.
>
> For the uninitiated among us, what was the practical effect of the coding
> error? Waiting forever for a lock which will never be available in IRQ
> context, or causing unintended re-entrancy, or what?
>

Sadly, I'm not really one of the enlightened ones when it comes to the
Linux kernel. VxWorks? sure. Linux kernel? learning as required.

I didn't chase it down to the specific mechanism in this case. Unusual for
me, but this time I took the expedient route of finding a likely
explanation plus Yao's fix on that same code with his explanation of a
deadlock and went with it. It'll be a few more days before we've had enough
run time on it to absolutely confirm (not an easy bug to repro).

If I get hand-wavy about it, my assumption is that the requirement that
netif_receive_skb() be called in soft-irq context means it doesn't expect
to be pre-empted or reentrant. When you call netif_rx() instead, it puts
the skb on the backlog and it gets processed from there. Part of that code
disables interrupts during part of the processing. Not sure what else is
coming in and actually deadlocking things.

Honestly, I don't understand enough details of how everything works in the
Linux network stack yet. I've done tons of work on the network path of
stack-less systems, a bit of work in device drivers, but have only touched
the surface of the internals of Linux network stack. The meat of my product
avoids that like the plague because it is too slow.

Sorry, lots of words but not much light being shed this time...
Jay


[dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst

2015-02-25 Thread Jay Rolette
On Wed, Feb 25, 2015 at 6:38 AM, Marc Sune  wrote:

>
> On 25/02/15 13:24, Hemant at freescale.com wrote:
>
>> Hi OIivier
>>  Comments inline.
>> Regards,
>> Hemant
>>
>>  -Original Message-
>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Deme
>>> Sent: 25/Feb/2015 5:44 PM
>>> To: dev at dpdk.org
>>> Subject: Re: [dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst
>>>
>>> Thank you Hemant, I think there might be one issue left with the patch
>>> though.
>>> The alloc_q must initially be filled with mbufs before getting mbuf back
>>> on the
>>> tx_q.
>>>
>>> So the patch should allow rte_kni_rx_burst to check if alloc_q is empty.
>>> If so, it should invoke kni_allocate_mbufs(kni, 0) (to fill the alloc_q
>>> with
>>> MAX_MBUF_BURST_NUM mbufs)
>>>
>>> The patch for rte_kni_rx_burst would then look like:
>>>
>>> @@ -575,7 +575,7 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf
>>> **mbufs, unsigned num)
>>>
>>>/* If buffers removed, allocate mbufs and then put them into
>>> alloc_q */
>>>if (ret)
>>> -kni_allocate_mbufs(kni);
>>> +  kni_allocate_mbufs(kni, ret);
>>> +  else if (unlikely(kni->alloc_q->write == kni->alloc_q->read))
>>> +  kni_allocate_mbufs(kni, 0);
>>>
>>>  [hemant]  This will introduce a run-time check.
>>
>> I missed to include the other change in the patch.
>>   I am doing it in kni_alloc i.e. initiate the alloc_q with default burst
>> size.
>> kni_allocate_mbufs(ctx, 0);
>>
>> In a way, we are now suggesting to reduce the size of alloc_q to only
>> default burst size.
>>
>
> As an aside comment here, I think that we should allow to tweak the
> userspace <-> kernel queue sizes (rx_q, tx_q, free_q and alloc_q) . Whether
> this should be a build configuration option or a parameter to
> rte_kni_init(), it is not completely clear to me, but I guess
> rte_kni_init() is a better option.
>

rte_kni_init() is definitely a better option. It allows things to be tuned
based on individual system config rather than requiring different builds.


> Having said that, the original mail from Hemant was describing that KNI
> was giving an out-of-memory. This to me indicates that the pool is
> incorrectly dimensioned. Even if KNI will not pre-allocate in the alloc_q,
> or not completely, in the event of high load, you will get this same "out
> of memory".
>
> We can reduce the usage of buffers by the KNI subsystem in kernel space
> and in userspace, but the kernel will always need a small cache of
> pre-allocated buffers (coming from user-space), since the KNI kernel module
> does not know where to grab the packets from (which pool). So my guess is
> that the dimensioning problem experienced by Hemant would be the same, even
> with the proposed changes.
>
>
>> Can we reach a situation where the kernel is adding packets to the tx_q
>> faster than the application is able to dequeue them?
>>
>
> I think so. We cannot control much how the kernel will schedule the KNI
> thread(s), specially if the # of threads in relation to the cores is
> incorrect (not enough), hence we need at least a reasonable amount of
> buffering to prevent early dropping to those "internal" burst side effects.
>
> Marc


Strongly agree with Marc here. We *really* don't want just a single burst
worth of mbufs available to the kernel in alloc_q. That's just asking for
congestion when there's no need for it.

The original problem reported by Olivier is more of a resource tuning
problem than anything else. The number of mbufs you need in the system has
to take into account internal queue depths.

Jay


[dpdk-dev] soft lockup calling receive burst

2015-03-20 Thread Jay Rolette
On Fri, Mar 20, 2015 at 12:53 PM, Joseph Vossen  wrote:

> hello,
>
> I am working on a dpdk-based app using version 1.6 under RH7.3.  Under
> varying traffic loads, I will intermittently notice a kernel soft lockup
> and the RIP provided by the kernel always points to the same MOV
> instruction in rte_ethdev.h (line #1982).  The stack trace looks like:
>
>
> dev = &rte_eth_devices[port_id];
> return (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id], rx_pkts, nb_pkts);
>
>
> 473176: 0f b7 15 a7 68 38 00   movzwl 0x3868a7(%rip),%edx   # 7f9a24
> rte_eth_rx_burst():
> 47317d: 0f b6 31               movzbl (%rcx),%esi
> 473180: 48 89 f0               mov    %rsi,%rax
> 473183: 0f b6 71 01            movzbl 0x1(%rcx),%esi
> 473187: 48 c1 e0 06            shl    $0x6,%rax
> 47318b: 48 8b b8 70 ed 83 00   mov    0x83ed70(%rax),%rdi
> >  473192: 48 8b 0f            mov    (%rdi),%rcx
> 473195: 48 8b 3c f1            mov    (%rcx,%rsi,8),%rdi
> 473199: 4c 89 ee               mov    %r13,%rsi
> 47319c: ff 90 60 ed 83 00      callq  *0x83ed60(%rax)
>
>
> has any one else seen something like this?
>
> thanks


I haven't seen that, but there's a soft-lockup in KNI in DPDK 1.6. Here's
what I posted about that one a few weeks ago:

Found the problem. No patch to submit since it's already fixed in later
versions of DPDK, but thought I'd follow up with the details since I'm sure
we aren't the only ones trying to use bleeding-edge versions of DPDK...

In kni_net_rx_normal(), it was calling netif_receive_skb() instead of
netif_rx(). The source for netif_receive_skb() points out that it should
only be called from soft-irq context, which isn't the case for KNI.

As typical, simple fix once you track it down.

Yao-Po Wang's fix:  commit 41a6ebded53982107c1adfc0652d6cc1375a7db9.

Jay


[dpdk-dev] Kernel deadlock due to rte_kni

2015-03-25 Thread Jay Rolette
http://patchwork.dpdk.org/ml/archives/dev/2015-February/013335.html

Jay

On Wed, Mar 25, 2015 at 2:39 PM, Dey, Souvik  wrote:

> Hi All,
> There looks to be an issue with rte_kni.ko which gets the
> kernel into a deadlock. We are trying to run rte_kni.ko with multiple-thread
> support, with the threads pinned to different non-isolated cores. When we test
> with tcp/tls, the kernel hangs on a race condition. Below is the
> kernel stack trace.
>
> PID: 19942  TASK: 880227a71950  CPU: 3   COMMAND: "CE_2N_Comp_SamP"
> #0 [88043fd87ec0] crash_nmi_callback at 8101d4a8
> #1 [88043fd87ed0] notifier_call_chain at 81055b68
> #2 [88043fd87f00] notify_die at 81055be0
> #3 [88043fd87f30] do_nmi at 81009ddd
> #4 [88043fd87f50] nmi at 812ea9d0
> [exception RIP: _raw_spin_lock_bh+25]
> RIP: 812ea2a4  RSP: 880189439c88  RFLAGS: 0293
> RAX: 5b59  RBX: 880291708ec8  RCX: 045a
> RDX: 880189439d90  RSI:   RDI: 880291708ec8
> RBP: 880291708e80   R8: 047fef78   R9: 0001
> R10: 0009  R11: 8126c658  R12: 880423799a40
> R13: 880189439e08  R14: 045a  R15: 0017
> ORIG_RAX:   CS: 0010  SS: 0018
> ---  ---
> #5 [880189439c88] _raw_spin_lock_bh at 812ea2a4
> #6 [880189439c90] lock_sock_nested at 8122e948
> #7 [880189439ca0] tcp_sendmsg at 8126c676
> #8 [880189439d50] sock_aio_write at 8122bb12
> #9 [880189439e00] do_sync_write at 810c61c6
> #10 [880189439f10] vfs_write at 810c68a9
> #11 [880189439f40] sys_write at 810c6dfe
> #12 [880189439f80] system_call_fastpath at 812eab92
> RIP: 7fc7909bc0ed  RSP: 7fc787ffe108  RFLAGS: 0202
> RAX: 0001  RBX: 812eab92  RCX: 7fc7880aa170
> RDX: 045a  RSI: 04d56546  RDI: 002b
> RBP: 04d56546   R8: 047fef78   R9: 0001
> R10: 0009  R11: 0293  R12: 04d56546
> R13: 0483de10  R14: 045a  R15: 0001880008b0
> ORIG_RAX: 0001  CS: 0033  SS: 002b
>
> PID: 3598   TASK: 88043db21310  CPU: 1   COMMAND: "kni_pkt0"
> #0 [88043fc87ec0] crash_nmi_callback at 8101d4a8
> #1 [88043fc87ed0] notifier_call_chain at 81055b68
> #2 [88043fc87f00] notify_die at 81055be0
> #3 [88043fc87f30] do_nmi at 81009ddd
> #4 [88043fc87f50] nmi at 812ea9d0
> [exception RIP: _raw_spin_lock+16]
> RIP: 812ea0b1  RSP: 88043fc83e78  RFLAGS: 0297
> RAX: 5a59  RBX: 880291708e80  RCX: 0001
> RDX: 88043fc83ec0  RSI: 2f82  RDI: 880291708ec8
> RBP: 88043d8f4000   R8: 813a8d20   R9: 0001
> R10: 88043d9d8098  R11: 8101e62a  R12: 81279a3c
> R13: 88042d9b3fd8  R14: 880291708e80  R15: 88043fc83ec0
> ORIG_RAX:   CS: 0010  SS: 0018
> ---  ---
> #5 [88043fc83e78] _raw_spin_lock at 812ea0b1
> #6 [88043fc83e78] tcp_delack_timer at 81279a4e
> #7 [88043fc83e98] run_timer_softirq at 8104642d
> #8 [88043fc83f08] __do_softirq at 81041539
> #9 [88043fc83f48] call_softirq at 812ebd9c
> #10 [88043fc83f60] do_softirq at 8100b037
> #11 [88043fc83f80] irq_exit at 8104185a
> #12 [88043fc83f90] smp_apic_timer_interrupt at 8101eaef
> #13 [88043fc83fb0] apic_timer_interrupt at 812eb553
> ---  ---
> #14 [88042d9b3ae8] apic_timer_interrupt at 812eb553
> [exception RIP: tcp_rcv_established+1732]
> RIP: 812756ab  RSP: 88042d9b3b90  RFLAGS: 0202
> RAX: 0020  RBX: 88042d86f470  RCX: 020a
> RDX: 8801fc163864  RSI: 88032f8b2380  RDI: 880291708e80
> RBP: 88032f8b2380   R8: 88032f8b2380   R9: 81327a60
> R10: 000e  R11: 8112ae8f  R12: 812eb54e
> R13: 8122ea88  R14: 010c  R15: 05a8
> ORIG_RAX: ff10  CS: 0010  SS: 0018
> #15 [88042d9b3bd8] tcp_v4_do_rcv at 8127b483
> #16 [88042d9b3c48] tcp_v4_rcv at 8127d89f
> #17 [88042d9b3cb8] ip_local_deliver_finish at 81260cc2
> #18 [88042d9b3cd8] __netif_receive_skb at 81239de6
> #19 [88042d9b3d28] netif_receive_skb at 81239e7f
> #20 [88042d9b3d58] kni_net_rx_normal at a022b06f [rte_kni]
> #21 [88042d9b3ec8] kni_thread_multiple at a022a2df [rte_kni]
> #22 [88042d9b3ee8] kthread at ff

[dpdk-dev] [PATCH RFC] librte_reorder: new reorder library

2014-10-09 Thread Jay Rolette
Hi Reshma,

A few comments and questions about your design...

1) How do you envision the reorder library to be used? Based on the
description, it seems like the expectation is that packet order would be
maintained at either the interface/port level or maybe at the RX queue
level. Is that right or am I reading too much between the lines?

For my purposes (and for network security products I've developed in the
past), I mostly don't care about box or port-level order. The most
important thing is to maintain packet order within a flow. Relative order
from packets in different flows doesn't matter. If there is a way I can
process packets in parallel and transmit out-of-order transmission *within
the flow*, that's very useful. Architecturally, it helps avoid hot-spotting
in my packet processing pipeline and wasting cycles when load-balancing
isn't perfect (and it never is).

2) If the reorder library is "flow aware", then give me flexibility on
deciding what a flow is. Let me define pseudo-flows even if the protocol
itself isn't connection oriented (ie., frequently useful to treat UDP
5-tuples as a flow). I may want to include tunnels/VLANs/etc. as part of my
"flow" definition. I may need to include the physical port as part of the
flow definition.

Ideally, the library includes the common cases and gives me the option to
register a callback function for doing whatever sort of "flows" I require
for my app.

3) Is there a way to apply the reorder library to some packets and not
others? I might want to use for TCP and UDP, but not care about order for
other IP traffic (for example).

4) How are you dealing with internal congestion? If I drop a packet
somewhere in my processing pipeline, how does the TX side of the reorder
queue/buffer deal with the missing sequence number? Is there some sort of
timeout mechanism so that it will only wait for X microseconds for a
missing sequence number?

Need the ability to bound how long packets are held up in the reorder
engine before they are released.

Assuming you address this, the reorder engine will also need to deal with
slow packets that show up after "later" packets were transmitted.

Regards,
Jay


On Tue, Oct 7, 2014 at 4:33 AM, Pattan, Reshma 
wrote:

> Hi All,
>
> I am planning  to implement packet reorder library. Details are as below,
> please go through them and provide the comments.
>
> Requirement:
>To reorder out-of-order packets that are received from
> different cores.
>
> Usage:
> To be used along with the distributor library. The next version of the
> distributor is planned to distribute incoming packets to all worker cores
> irrespective of the flow type.
> In this case, to ensure in-order delivery of the packets at the output
> side, the reorder library will be used at the tx end.
>
> Assumption:
> All input packets will be marked with a sequence number in the seqn field
> of the mbuf; this will be the reference for reordering at the tx end.
> The sequence number will be of type uint32_t. A new sequence number field,
> seqn, will be added to the mbuf structure.
>
> Design:
> a) There will be a reorder buffer (circular buffer) structure maintained in
> the reorder library to store reordered packets, along with buffer details
> such as the head to drain packets from and the minimum sequence number.
> b) The library will provide insert and drain functions to
> reorder and fetch out the reordered packets respectively.
> c) Users of the library should pass packets to the insert functions for
> reordering.
>
> Insertion logic:
> The sequence number of the current packet will be used to calculate an
> offset into the reorder buffer, and the packet is written to the location
> in the reorder buffer corresponding to that offset.
> The offset is calculated as the difference between the current packet's
> sequence number and the sequence number associated with the reorder buffer.
>
> On sequence number wrapping, or wrapping over the reorder buffer size,
> before inserting the new packet we should move "offset" packets to
> another buffer called the overflow buffer, advance the head of the reorder
> buffer by "offset - reorder buffer size", and insert the new packet.
>
> Insert function:
> int rte_reorder_insert(struct rte_reorder_buffer *buffer, struct rte_mbuf
> *mbuf);
> Note: Other insert function is also under plan to insert burst of packets.
>
>Reorder buffer:
> struct rte_reorder_buffer {
> unsigned int size;  /* The size (number of entries) of the
> buffer. */
> unsigned int mask;  /* Mask (size - 1) of the buffer */
> unsigned int head;  /* Current head of buffer */
> uint32_t min_seqn;  /* latest sequence number associated with
> buffer */
> struct rte_mbuf *entries[MAX_REORDER_BUFFER_SIZE]; /* buffer to
> hold reordered mbufs */
> };
>
> d) Users can fetch out the reordered packets via the drain function provided
> by the library. Users must pass an mbuf array; the drain function will fill
> the passed mbuf array with the reordered packets.

[dpdk-dev] [PATCH RFC] librte_reorder: new reorder library

2014-10-17 Thread Jay Rolette
Thanks for the responses, Reshma.

Can you provide a little more context about the use case that your reorder
library is intended to help with? If I'm understanding your answers
correctly, the functionality seems pretty limited and not something I would
ever end up using, but that may be more about the types of products I build
(deep packet inspection and working at L4-L7 generally, even though running
at or near line-rate).

Please take my comments in the spirit intended... If the design makes sense
for different use cases and I'm not the target audience, that's perfectly
ok and there are probably different trade-offs being made. But if it is
intended to be useful for DPI applications, I'd hate to just be quiet and
end up with something that doesn't get used as much as it might.

I haven't looked at the distributor library, so entirely possible it makes
more sense in that context.

More detailed responses to your previous answers inline.

Regards,
Jay

On Fri, Oct 17, 2014 at 4:44 AM, Pattan, Reshma 
wrote:

>  Hi Jay,
>
>
>
> Please find comments inline.
>
>
>
> Thanks,
>
> Reshma
>
>
>
> *From:* Jay Rolette [mailto:rolette at infiniteio.com]
> *Sent:* Thursday, October 9, 2014 8:02 PM
> *To:* Pattan, Reshma
> *Cc:* dev at dpdk.org
> *Subject:* Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
>
>
>
> Hi Reshma,
>
>
>
> A few comments and questions about your design...
>
>
>
> 1) How do you envision the reorder library to be used? Based on the
> description, it seems like the expectation is that packet order would be
> maintained at either the interface/port level or maybe at the RX queue
> level. Is that right or am I reading too much between the lines?
>
>
>
> For my purposes (and for network security products I've developed in the
> past), I mostly don't care about box or port-level order. The most
> important thing is to maintain packet order within a flow. Relative order
> of packets in different flows doesn't matter. If there is a way I can
> process packets in parallel while avoiding out-of-order transmission *within
> the flow*, that's very useful. Architecturally, it helps avoid hot-spotting
> in my packet processing pipeline and wasting cycles when load-balancing
> isn't perfect (and it never is).
>
> [Reshma]: Generic parallel processing of packets is planned in phase2
> version of distributor based on sequence numbers, but not flow based
>  parallel processing.
>

See question at the top of my email about the intended use-case. For DPI
applications, global (box-wide or per port) reordering isn't normally
required. Maintaining order within flows is the important part. Depending
on your implementation and the guarantees you make, the impact it has on
aggregate system throughput can be significant.


>   2) If the reorder library is "flow aware", then give me flexibility on
> deciding what a flow is. Let me define pseudo-flows even if the protocol
> itself isn't connection oriented (ie., frequently useful to treat UDP
> 5-tuples as a flow). I may want to include tunnels/VLANs/etc. as part of my
> "flow" definition. I may need to include the physical port as part of the
> flow definition.
>
>
>
> Ideally, the library includes the common cases and gives me the option to
> register a callback function for doing whatever sort of "flows" I require
> for my app.
>
> [Reshma]: It is not flow aware. But to reorder packets of a particular flow,
> you can hand over that flow to the library and the library will give you
> back the reordered data.
>

I think given how a couple of other bits are described, this doesn't end up
helping. More a bit further down.


>   3) Is there a way to apply the reorder library to some packets and not
> others? I might want to use for TCP and UDP, but not care about order for
> other IP traffic (for example).
>
> [Reshma]: No, the reorder library will not have intelligence about traffic
> type (i.e. flow or protocol based).
>
> Applications can split traffic into flows or by protocol and
> hand it over to the library for reordering.
>

Ditto

  4) How are you dealing with internal congestion? If I drop a packet
> somewhere in my processing pipeline, how does the TX side of the reorder
> queue/buffer deal with the missing sequence number? Is there some sort of
> timeout mechanism so that it will only wait for X microseconds for a
> missing sequence number?
>
> [Reshma]: The library just takes care of the packets it has got. No waiting
> mechanism is used for missing packets.
>
> Reorder processing will skip the dropped packets (i.e. will create a gap in
> the reorder buffer) and proceed with allocation of slot to

[dpdk-dev] Fwd: [dpdk-announce] DPDK Features for Q1 2015

2014-10-23 Thread Jay Rolette
Tim,

Thanks for sharing this. If nothing else, I wanted to at least provide some
feedback on the parts that look useful to me for my applications/product.
Bits that make me interested in the release:



*> 2.0 (Q1 2015) DPDK Features:> Bifurcated Driver: With the Bifurcated
Driver, the kernel will retain direct control of the NIC, and will assign
specific queue pairs to DPDK. Configuration of the NIC is controlled by the
kernel via ethtool.*

Having NIC configuration, port stats, etc. available via the normal Linux
tools is very helpful - particularly on new products just getting started
with DPDK.


*> Packet Reordering: Assign a sequence number to packets on Rx, and then
provide the ability to reorder on Tx to preserve the original order.*

This could be extremely useful but it depends on where it goes. The current
design being discussed seems fundamentally flawed to me. See the thread on
the RFC for details.


*> Packet Distributor (phase 2): Implement the following enhancements to
the Packet Distributor that was originally delivered in the DPDK 1.7
release: performance improvements; the ability for packets from a flow to
be processed by multiple worker cores in parallel and then reordered on Tx
using the Packet Reordering feature; the ability to have multiple
Distributors which share Worker cores.*

TBD on this for me. The 1.0 version of our product is based on DPDK 1.6 and
I haven't had a chance to look at what is happening with Packet Distributor
yet. An area of potential interest at least.


*> Cuckoo Hash: A new hash algorithm was implemented as part of the Cuckoo
Switch project (see http://www.cs.cmu.edu/~dongz/papers/cuckooswitch.pdf
), and shows some
promising performance results. This needs to be modified to make it more
generic, and then incorporated into DPDK.*

More performance == creamy goodness, especially if it is in the plumbing
and doesn't require significant app changes.


*> Interrupt mode for PMD: Allow DPDK process to transition to interrupt
mode when load is low so that other processes can run, or else power can be
saved. This will increase latency/jitter.*

Yes! I don't care about power savings, but I do care about giving a good
product impression in the lab during evals without having to sacrifice
overall system performance when under load. Hybrid drivers that use
interrupts when load is low and poll-mode when loaded are ideal, IMO.

It seems an odd thing, but during lab testing, it is normal for customers
to fire the box up and just start running pings or some other low volume
traffic through the box. If the PMDs are configured to batch in sizes
optimal for best performance under load, the system can look *really* bad
in these initial tests. We go through a fair bit of gymnastics right now to
work around this without just giving up on batching in the PMDs.


*> DPDK Headroom: Provide a mechanism to indicate how much headroom (spare
capacity) exists in a DPDK process.*

Very helpful in the field. Anything that helps customers understand how
much headroom is left on their box before they need to take action is a
huge win. CPU utilization is a bad indicator, especially with a PMD
architecture.

Hope this type of feedback is helpful.

Regards,
Jay


[dpdk-dev] [PATCH] ethdev: distinguish between drop and error stats

2015-10-02 Thread Jay Rolette
Can you improve the comments on these counters? If you didn't happen to
follow this thread, there's no way to reasonably figure out what the
difference is from looking at the code without chasing it all the way down
and cross-referencing the NIC datasheet.

Thanks,
Jay

On Fri, Oct 2, 2015 at 7:47 AM, Maryam Tahhan 
wrote:

> Make a distinction between dropped packets and error statistics to allow
> a higher level fault management entity to interact with DPDK and take
> appropriate measures when errors are detected. It will also provide
> valuable information for any applications that collect/extract DPDK
> stats, such as Open vSwitch.
> After this patch the distinction is:
> ierrors = Total number of packets dropped by hardware (malformed
> packets, ...) Where the # of drops can ONLY be <=  the packets received
> (without overlap between registers).
> Rx_pkt_errors = Total number of erroneous received packets. Where the #
> of errors can be >= the packets received (without overlap between
> registers), this is because there may be multiple errors associated with
> a packet.
>
> Signed-off-by: Maryam Tahhan 
> ---
>  lib/librte_ether/rte_ethdev.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 8a8c82b..53dd55d 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -200,8 +200,9 @@ struct rte_eth_stats {
> /**< Deprecated; Total of RX packets with CRC error. */
> uint64_t ibadlen;
> /**< Deprecated; Total of RX packets with bad length. */
> -   uint64_t ierrors;   /**< Total number of erroneous received
> packets. */
> +   uint64_t ierrors;   /**< Total number of dropped received packets.
> */
> uint64_t oerrors;   /**< Total number of failed transmitted
> packets. */
> +   uint64_t ipkterrors;   /**< Total number of erroneous received
> packets. */
> uint64_t imcasts;
> /**< Deprecated; Total number of multicast received packets. */
> uint64_t rx_nombuf; /**< Total number of RX mbuf allocation
> failures. */
> --
> 2.4.3
>
>


[dpdk-dev] [PATCH] ethdev: distinguish between drop and error stats

2015-10-08 Thread Jay Rolette
On Thu, Oct 8, 2015 at 9:37 AM, Tahhan, Maryam 
wrote:

> Hi Jay
>
> Yeah, I will do this to make the distinction clear for the counters I'm
> modifying in my patch. But please note that these aren't counters that are
> defined in the NIC datasheets, these are the high level DPDK general
> counters for ethdevs. Their descriptions exist in the code and the
> generated DPDK docs.
>

Yeah, I know. The problem is that the generated docs don't make it clear
what the counters really are and you end up having to trace through the
code and, frequently, go look at NIC datasheets.

I just went through this a few months ago when I was integrating DPDK port
stats into our stat tracking system.

I appreciate you making the extra effort to make this clearer for app
developers :-)

Jay


> *From:* Jay Rolette [mailto:rolette at infiniteio.com]
> *Sent:* Friday, October 2, 2015 2:25 PM
> *To:* Tahhan, Maryam
> *Cc:* DPDK
> *Subject:* Re: [dpdk-dev] [PATCH] ethdev: distinguish between drop and
> error stats
>
>
>
> Can you improve the comments on these counters? If you didn't happen to
> follow this thread, there's no way to reasonably figure out what the
> difference is from looking at the code without chasing it all the way down
> and cross-referencing the NIC datasheet.
>
>
>
> Thanks,
>
> Jay
>
>
>
> On Fri, Oct 2, 2015 at 7:47 AM, Maryam Tahhan 
> wrote:
>
> Make a distinction between dropped packets and error statistics to allow
> a higher level fault management entity to interact with DPDK and take
> appropriate measures when errors are detected. It will also provide
> valuable information for any applications that collect/extract DPDK
> stats, such as Open vSwitch.
> After this patch the distinction is:
> ierrors = Total number of packets dropped by hardware (malformed
> packets, ...) Where the # of drops can ONLY be <=  the packets received
> (without overlap between registers).
> Rx_pkt_errors = Total number of erroneous received packets. Where the #
> of errors can be >= the packets received (without overlap between
> registers), this is because there may be multiple errors associated with
> a packet.
>
> Signed-off-by: Maryam Tahhan 
> ---
>  lib/librte_ether/rte_ethdev.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 8a8c82b..53dd55d 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -200,8 +200,9 @@ struct rte_eth_stats {
> /**< Deprecated; Total of RX packets with CRC error. */
> uint64_t ibadlen;
> /**< Deprecated; Total of RX packets with bad length. */
> -   uint64_t ierrors;   /**< Total number of erroneous received
> packets. */
> +   uint64_t ierrors;   /**< Total number of dropped received packets.
> */
> uint64_t oerrors;   /**< Total number of failed transmitted
> packets. */
> +   uint64_t ipkterrors;   /**< Total number of erroneous received
> packets. */
> uint64_t imcasts;
> /**< Deprecated; Total number of multicast received packets. */
> uint64_t rx_nombuf; /**< Total number of RX mbuf allocation
> failures. */
> --
> 2.4.3
>
>
>


[dpdk-dev] [PATCH] rte_alarm: modify it to make it not to be affected by discontinuous jumps in the system time

2015-10-14 Thread Jay Rolette
Back when this was first submitted in June, I mentioned that
CLOCK_MONOTONIC_RAW was ~10x slower than CLOCK_MONOTONIC:

http://dpdk.org/ml/archives/dev/2015-June/018687.html

It's not completely free from NTP frequency adjustments, but it won't have
any discontinuities.

That's what we've been using in our tree since then...

Jay


On Tue, Oct 13, 2015 at 7:33 PM, Stephen Hemminger <
stephen at networkplumber.org> wrote:

> On Fri,  5 Jun 2015 10:46:36 +0800
> Wen-Chi Yang  wrote:
>
> > Due to eal_alarm_callback() and rte_eal_alarm_set() use gettimeofday()
> > to get the current time, and gettimeofday() is affected by jumps.
> >
> > For example, set up a rte_alarm which will be triggerd next second (
> > current time + 1 second) by rte_eal_alarm_set(). And the callback
> > function of this rte_alarm sets up another rte_alarm which will be
> > triggered next second (current time + 2 second).
> > Once we change the system time when the callback function is triggered,
> > it is possiblb that rte alarm functionalities work out of expectation.
> >
> > Replace gettimeofday() with clock_gettime(CLOCK_MONOTONIC_RAW, &now)
> > could avoid this phenomenon.
> >
> > Signed-off-by: Wen-Chi Yang 
>
> Agreed, this should be applied.
> Does BSD version have same problem?
>
> Acked-by: Stephen Hemminger 
>
>


[dpdk-dev] [RFC] location for DPDK related white papers

2015-10-23 Thread Jay Rolette
On Fri, Oct 23, 2015 at 5:28 AM, Mcnamara, John 
wrote:

> Hi,
>
> We have had a few people wishing to submit DPDK related white papers.
> These tend to focus on particular aspects of DPDK and don't fit into
> any of the existing documents.
>
> Where should these be stored/made available? Some options:
>
> * In the repo, in RST format in a doc/guides/white_paper directory.
> * On dpdk.org in PDF format.
> * Elsewhere.
>
> Any comments, suggestions?
>

PDFs on dpdk.org


[dpdk-dev] rte_eal_init() alternative?

2015-09-02 Thread Jay Rolette
On Wed, Sep 2, 2015 at 7:56 AM, Bruce Richardson  wrote:

> On Wed, Sep 02, 2015 at 12:49:40PM +, Montorsi, Francesco wrote:
> > Hi all,
> >
> > Currently it seems that the only way to initialize EAL is using
> rte_eal_init() function, correct?
> >
> > I have the problem that rte_eal_init() will call rte_panic() whenever
> something fails to initialize or in other cases it will call exit().
> > In my application, I would rather like to attempt DPDK initialization.
> If it fails I don't want to exit.
> > Unfortunately I cannot even copy&paste the rte_eal_init() code into my
> application (removing rte_panic and exit calls) since it uses a lot of DPDK
> internal private functions.
> >
> > I think that my requirement (avoiding abort/exit calls when init fails) is
> a basic one... would you accept a patch that adds an alternative
> rte_eal_init() function that just returns an error code upon failure,
> instead of immediately exiting?
> >
> > Thanks for your hard work!
> >
> > Francesco Montorsi
> >
> I, for one, would welcome such a patch. I think the code is overly quick in
> many places to panic or exit the app, when an error code would be more
> appropriate.
> Feel free to also look at other libraries in DPDK too, if you like :-)
>
> Regards,
> /Bruce
>

+1


[dpdk-dev] [PATCH v1] change hugepage sorting to avoid overlapping memcpy

2015-09-08 Thread Jay Rolette
Most of the code in sort_by_physaddr() should be replaced by a call to
qsort() instead. Less code and gets rid of an O(n^2) sort. It's only init
code, but given how long EAL init takes, every bit helps.

I submitted a patch for this close to a year ago:
http://dpdk.org/dev/patchwork/patch/2061/

Jay

On Tue, Sep 8, 2015 at 7:45 AM, Gonzalez Monroy, Sergio <
sergio.gonzalez.monroy at intel.com> wrote:

> Hi Ralf,
>
> Just a few comments/suggestions:
>
> Add 'eal/linux:'  to the commit title, ie:
>   "eal/linux: change hugepage sorting to avoid overlapping memcpy"
>
> On 04/09/2015 11:14, Ralf Hoffmann wrote:
>
>> with only one hugepage or already sorted hugepage addresses, the sort
>> function called memcpy with same src and dst pointer. Debugging with
>> valgrind will issue a warning about overlapping area. This patch changes
>> the bubble sort to avoid this behavior. Also, the function cannot fail
>> any longer.
>>
>> Signed-off-by: Ralf Hoffmann 
>> ---
>>   lib/librte_eal/linuxapp/eal/eal_memory.c | 27
>> +--
>>   1 file changed, 13 insertions(+), 14 deletions(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c
>> b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> index ac2745e..6d01f61 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> @@ -699,25 +699,25 @@ error:
>>* higher address first on powerpc). We use a slow algorithm, but we
>> won't
>>* have millions of pages, and this is only done at init time.
>>*/
>> -static int
>> +static void
>>   sort_by_physaddr(struct hugepage_file *hugepg_tbl, struct hugepage_info
>> *hpi)
>>   {
>> unsigned i, j;
>> -   int compare_idx;
>> +   unsigned compare_idx;
>> uint64_t compare_addr;
>> struct hugepage_file tmp;
>> for (i = 0; i < hpi->num_pages[0]; i++) {
>> -   compare_addr = 0;
>> -   compare_idx = -1;
>> +   compare_addr = hugepg_tbl[i].physaddr;
>> +   compare_idx = i;
>> /*
>> -* browse all entries starting at 'i', and find the
>> +* browse all entries starting at 'i+1', and find the
>>  * entry with the smallest addr
>>  */
>> -   for (j=i; j< hpi->num_pages[0]; j++) {
>> +   for (j=i + 1; j < hpi->num_pages[0]; j++) {
>>
> Although there are many style/checkpatch issues in current code, we try to
> fix them
> in new patches.
> In that regard, checkpatch complains about above line with:
> ERROR:SPACING: spaces required around that '='
>
>   - if (compare_addr == 0 ||
>> +   if (
>>   #ifdef RTE_ARCH_PPC_64
>> hugepg_tbl[j].physaddr > compare_addr) {
>>   #else
>> @@ -728,10 +728,9 @@ sort_by_physaddr(struct hugepage_file *hugepg_tbl,
>> struct hugepage_info *hpi)
>> }
>> }
>>   - /* should not happen */
>> -   if (compare_idx == -1) {
>> -   RTE_LOG(ERR, EAL, "%s(): error in physaddr
>> sorting\n", __func__);
>> -   return -1;
>> +   if (compare_idx == i) {
>> +   /* no smaller page found */
>> +   continue;
>> }
>> /* swap the 2 entries in the table */
>> @@ -741,7 +740,8 @@ sort_by_physaddr(struct hugepage_file *hugepg_tbl,
>> struct hugepage_info *hpi)
>> sizeof(struct hugepage_file));
>> memcpy(&hugepg_tbl[i], &tmp, sizeof(struct
>> hugepage_file));
>> }
>> -   return 0;
>> +
>> +   return;
>>   }
>>
> I reckon checkpatch is not picking this one because the end-of-function is
> not part of the patch,
> but it is a warning:
> WARNING:RETURN_VOID: void function return statements are not generally
> useful
>
> /*
>> @@ -1164,8 +1164,7 @@ rte_eal_hugepage_init(void)
>> goto fail;
>> }
>>   - if (sort_by_physaddr(&tmp_hp[hp_offset], hpi) < 0)
>> -   goto fail;
>> +   sort_by_physaddr(&tmp_hp[hp_offset], hpi);
>> #ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
>> /* remap all hugepages into single file segments */
>>
>>
>>
> Thanks,
> Sergio
>


[dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr() with qsort() from standard library

2015-09-10 Thread Jay Rolette
Thanks for the feedback, Sergio. Responses inline below, but unfortunately
I don't have time to submit a new patch right now. I'm at the tail-end of
our release cycle. Last year when I originally submitted the patch, it
would have been easy to update it and resubmit.

Jay

On Wed, Sep 9, 2015 at 5:35 AM, Gonzalez Monroy, Sergio <
sergio.gonzalez.monroy at intel.com> wrote:

> Following conversation in
> http://dpdk.org/ml/archives/dev/2015-September/023230.html :
>
> On 17/12/2014 13:31, rolette at infiniteio.com (Jay Rolette) wrote:
>
>> Signed-off-by: Jay Rolette 
>> ---
>>
> Update commit title/description, maybe something like:
>   eal/linux: use qsort for sorting hugepages
>   Replace O(n^2) sort in sort_by_physaddr() with qsort() from standard
> library


Ok. Pretty minor, but easy enough.


>   lib/librte_eal/linuxapp/eal/eal_memory.c | 59
>> +++-
>>   1 file changed, 20 insertions(+), 39 deletions(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c
>> b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> index bae2507..3656515 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> @@ -670,6 +670,25 @@ error:
>> return -1;
>>   }
>>   +static int
>> +cmp_physaddr(const void *a, const void *b)
>> +{
>> +#ifndef RTE_ARCH_PPC_64
>> +   const struct hugepage_file *p1 = (const struct hugepage_file *)a;
>> +   const struct hugepage_file *p2 = (const struct hugepage_file *)b;
>> +#else
>> +   // PowerPC needs memory sorted in reverse order from x86
>> +   const struct hugepage_file *p1 = (const struct hugepage_file *)b;
>> +   const struct hugepage_file *p2 = (const struct hugepage_file *)a;
>> +#endif
>> +   if (p1->physaddr < p2->physaddr)
>> +   return -1;
>> +   else if (p1->physaddr > p2->physaddr)
>> +   return 1;
>> +   else
>> +   return 0;
>> +}
>> +
>>
> There were a couple of comments from Thomas about the comments style and
> #ifdef:
> - Comment style should be modified as per
> http://dpdk.org/doc/guides/contributing/coding_style.html#c-comment-style


Yep, although that came along well after I submitted the patch

- Regarding the #ifdef, I agree with Jay that it is the current approach in
> the file.
>
>   /*
>>* Sort the hugepg_tbl by physical address (lower addresses first on
>> x86,
>>* higher address first on powerpc). We use a slow algorithm, but we
>> won't
>> @@ -678,45 +697,7 @@ error:
>>   static int
>>   sort_by_physaddr(struct hugepage_file *hugepg_tbl, struct hugepage_info
>> *hpi)
>>   {
>> -   unsigned i, j;
>> -   int compare_idx;
>> -   uint64_t compare_addr;
>> -   struct hugepage_file tmp;
>> -
>> -   for (i = 0; i < hpi->num_pages[0]; i++) {
>> -   compare_addr = 0;
>> -   compare_idx = -1;
>> -
>> -   /*
>> -* browse all entries starting at 'i', and find the
>> -* entry with the smallest addr
>> -*/
>> -   for (j=i; j< hpi->num_pages[0]; j++) {
>> -
>> -   if (compare_addr == 0 ||
>> -#ifdef RTE_ARCH_PPC_64
>> -   hugepg_tbl[j].physaddr > compare_addr) {
>> -#else
>> -   hugepg_tbl[j].physaddr < compare_addr) {
>> -#endif
>> -   compare_addr = hugepg_tbl[j].physaddr;
>> -   compare_idx = j;
>> -   }
>> -   }
>> -
>> -   /* should not happen */
>> -   if (compare_idx == -1) {
>> -   RTE_LOG(ERR, EAL, "%s(): error in physaddr
>> sorting\n", __func__);
>> -   return -1;
>> -   }
>> -
>> -   /* swap the 2 entries in the table */
>> -   memcpy(&tmp, &hugepg_tbl[compare_idx],
>> -   sizeof(struct hugepage_file));
>> -   memcpy(&hugepg_tbl[compare_idx], &hugepg_tbl[i],
>> -   sizeof(struct hugepage_file));
>> -   memcpy(&hugepg_tbl[i], &tmp, sizeof(struct
>> hugepage_file));
>> -   }
>> +   qsort(hugepg_tbl, hpi->num_pages[0], sizeof(struct
>> hugepage_file), cmp_physaddr);
>> return 0;
>>   }
>>
> Why not just remove sort_by_physaddr() and call qsort() directly?


That would be fine. I hadn't bothered to check whether sort_by_physaddr()
was called anywhere else, so left it there. It's not, so no real value to
keeping it in this case.


>
> Sergio
>


[dpdk-dev] Help: How to read packet statistics from device registers via dpdk PMD?

2016-07-05 Thread Jay Rolette
On Tue, Jul 5, 2016 at 2:35 AM, Bill Bonaparte 
wrote:

> Hi:
> I am a new fish, I have tried my best to find answer about my question on
> web, but I failed. so
> I come here to ask for your help. the below is my question:
>
> I found that dpdk provides a api rte_eth_stats_get to read packet
> statistics about the interface, includes total input/output
> unicast/multicast/brodcast packets/bytes. but the api does not work on
> VMxnet interface (which is a virtual interface in VMware machine).
>

Probably something in your app or environment rather than in the API
itself. We run rte_eth_stats_get() against interfaces in VMware Fusion,
VirtualBox and real hardware and they all generally work.

Need some info before anyone can help you much:

* What version of DPDK are you running?
* What OS are you running on?
* Output from "dpdk_nic_bind.py --status"?

Jay


[dpdk-dev] Help: How to read packet statistics from device registers via dpdk PMD?

2016-07-07 Thread Jay Rolette
On Thu, Jul 7, 2016 at 12:52 AM, Bill Bonaparte 
wrote:

> I am so happy to get your reply.
> My DPDK version is 2.1, and the OS is CentOS 7.
> the following is the output from "dpdk_nic_bind.py --status":
>
> [root at APV35 ~]# dpdk_nic_bind.py --status
>
> Network devices using DPDK-compatible driver
> 
> :04:00.0 'VMXNET3 Ethernet Controller' drv=igb_uio unused=
> :0b:00.0 'VMXNET3 Ethernet Controller' drv=igb_uio unused=
> :13:00.0 'VMXNET3 Ethernet Controller' drv=igb_uio unused=
> :1b:00.0 'VMXNET3 Ethernet Controller' drv=igb_uio unused=
>
> Network devices using kernel driver
> ===
> :03:00.0 'VMXNET3 Ethernet Controller' if=ens160 drv=vmxnet3
> unused=igb_uio *Active*
>
> Other network devices
> =
> 
>

That's a different virtual NIC than what I'm running in my VMs, but given
your app isn't working directly on the hardware, I doubt that's the issue.
In case it helps along the way, here's what I see in my VM:

$ dpdk_nic_bind.py --status

Network devices using DPDK-compatible driver

:02:02.0 '82545EM Gigabit Ethernet Controller (Copper)' drv=igb_uio
unused=
:02:03.0 '82545EM Gigabit Ethernet Controller (Copper)' drv=igb_uio
unused=
:02:04.0 '82545EM Gigabit Ethernet Controller (Copper)' drv=igb_uio
unused=
:02:05.0 '82545EM Gigabit Ethernet Controller (Copper)' drv=igb_uio
unused=

Network devices using kernel driver
===
:02:01.0 '82545EM Gigabit Ethernet Controller (Copper)' if=eth0
drv=e1000 unused=igb_uio *Active*

Other network devices
=



> I tried it on the physical mathine, it still does not work. the OS is
> centOS 7, too.
> [root at AN ~]# dpdk_nic_bind.py --status
>
> Network devices using DPDK-compatible driver
> 
> :01:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio
> unused=
> :01:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' drv=igb_uio
> unused=
> :03:00.0 'I350 Gigabit Backplane Connection' drv=igb_uio unused=
> :03:00.1 'I350 Gigabit Backplane Connection' drv=igb_uio unused=
> :03:00.2 'I350 Gigabit Backplane Connection' drv=igb_uio unused=
> :03:00.3 'I350 Gigabit Backplane Connection' drv=igb_uio unused=
> :07:00.0 'I350 Gigabit Network Connection' drv=igb_uio unused=
> :07:00.1 'I350 Gigabit Network Connection' drv=igb_uio unused=
> :07:00.2 'I350 Gigabit Network Connection' drv=igb_uio unused=
> :07:00.3 'I350 Gigabit Network Connection' drv=igb_uio unused=
> :09:00.0 'I350 Gigabit Network Connection' drv=igb_uio unused=
> :09:00.1 'I350 Gigabit Network Connection' drv=igb_uio unused=
> :09:00.2 'I350 Gigabit Network Connection' drv=igb_uio unused=
> :09:00.3 'I350 Gigabit Network Connection' drv=igb_uio unused=
> :0c:00.0 'Device 0011' drv=igb_uio unused=
> :0f:00.1 'I350 Gigabit Network Connection' drv=igb_uio unused=
>
> Network devices using kernel driver
> ===========
> :0f:00.0 'I350 Gigabit Network Connection' if=enp15s0f0 drv=igb
> unused=igb_uio *Active*
>
> Other network devices
> =
> 
>

With it not working on hardware and you having devices successfully bound
to DPDK, maybe it's a problem in your app. Have you tried running any of
the sample apps that use rte_eth_stats_get() and see if it works there?

Jay


> On Tue, Jul 5, 2016 at 8:03 PM, Jay Rolette  wrote:
>
>>
>> On Tue, Jul 5, 2016 at 2:35 AM, Bill Bonaparte 
>> wrote:
>>
>>> Hi:
>>> I am a new fish, I have tried my best to find answer about my question on
>>> web, but I failed. so
>>> I come here to ask for your help. the below is my question:
>>>
>>> I found that dpdk provides a api rte_eth_stats_get to read packet
>>> statistics about the interface, includes total input/output
>>> unicast/multicast/brodcast packets/bytes. but the api does not work on
>>> VMxnet interface (which is a virtual interface in VMware machine).
>>>
>>
>> Probably something in your app or environment rather than in the API
>> itself. We run rte_eth_stats_get() against interfaces in VMware Fusion,
>> VirtualBox and real hardware and they all work generally.
>>
>> Need some info before anyone can help you much:
>>
>> * What version of DPDK are you running?
>> * What OS are you running on?
>> * Output from "dpdk_nic_bind.py --status"?
>>
>> Jay
>>
>
>


[dpdk-dev] [PATCH] doc: announce KNI ethtool removal

2016-07-21 Thread Jay Rolette
On Thu, Jul 21, 2016 at 10:33 AM, Ferruh Yigit 
wrote:

> On 7/20/2016 5:07 PM, Thomas Monjalon wrote:
> > The out-of-tree kernel code must be avoided.
> > Moreover there is no good reason to keep this legacy feature
> > which is only partially supported.
> >
> > As described earlier in this plan:
> >   http://dpdk.org/ml/archives/dev/2016-July/043606.html
> > it will help to keep PCI ids in PMD code.
> >
> > Signed-off-by: Thomas Monjalon 
> > ---
> >  doc/guides/rel_notes/deprecation.rst | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> > index f502f86..9cadf6a 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -41,3 +41,10 @@ Deprecation Notices
> >  * The mempool functions for single/multi producer/consumer are
> deprecated and
> >will be removed in 16.11.
> >It is replaced by rte_mempool_generic_get/put functions.
> > +
> > +* The ethtool support will be removed from KNI in 16.11.
> > +  It is implemented only for igb and ixgbe.
> > +  It is really hard to maintain because it requires some out-of-tree
> kernel
> > +  code to be duplicated in this kernel module.
> > +  Removing this partial support will help to restrict the PCI id
> definitions
> > +  to the PMD code.
> >
>
> KNI ethtool is functional and maintained, and it may have users!
>
> Why just removing it, specially without providing an alternative?
> Is it a good time to discuss KCP again?
>

Yes, my product uses it. Seems like we are back to the same discussion we
had a few months ago about the KNI situation...

It shouldn't be removed unless there is a replacement, ideally one that
works with the normal Linux tools like every other network device. While
the code wasn't ready at the time, it was a definite improvement over what
we have with KNI today.

Jay


[dpdk-dev] [PATCH] doc: announce KNI ethtool removal

2016-07-21 Thread Jay Rolette
On Thu, Jul 21, 2016 at 3:32 PM, Thomas Monjalon 
wrote:

> 2016-07-21 13:20, Jay Rolette:
> > On Thu, Jul 21, 2016 at 10:33 AM, Ferruh Yigit 
> > wrote:
> > > KNI ethtool is functional and maintained, and it may have users!
> > >
> > > Why just removing it, specially without providing an alternative?
> > > Is it a good time to discuss KCP again?
> >
> > Yes, my product uses it.
>
> Your product uses what? KCP? KNI? KNI ethtool?
>

Sorry, that wasn't very clear. It uses KNI + ifconfig to configure the
device/interface in Linux. I'm assuming the "ethtool" bits under discussion
are the same things that make ifconfig work with KNI to the limited extent
it does.

> Seems like we are back to the same discussion we
> > had a few months ago about the KNI situation...
> >
> > It shouldn't be removed unless there is a replacement, ideally one that
> > works with the normal Linux tools like every other network device.
>
> This ethtool module works only for igb and ixgbe!
> There is already no replacement for other drivers.
> Who works on a replacement?
>

Ferruh submitted KCP previously, but you guys didn't like the fact that it
was a kernel module. IIRC, one of the gains from that was simplified
maintenance because you didn't need driver specific support for KNI.
Assuming he's still willing to beat it into shape, we have something that
is already most of the way there.

If people are going to continue to block it because it is a kernel module,
then IMO, it's better to leave the existing support on igb / ixgbe in place
instead of stepping backwards to zero support for ethtool.

> While the code wasn't ready at the time, it was a definite improvement
> over what
> > we have with KNI today.
>


[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-01 Thread Jay Rolette
On Wed, Jun 1, 2016 at 10:00 AM, Wiles, Keith  wrote:

> Started from the link below, but did not want to highjack the thread.
> http://dpdk.org/ml/archives/dev/2016-June/040021.html
>
> I was thinking about this problem from a user perspective and command line
> options are very difficult to manage specifically when you have a large
> number of options as we have in dpdk. I see all of these options as a type
> of database of information for the DPDK and the application, because the
> application command line options are also getting very complex as well.
>
> I have been looking at a number of different options here and the
> direction I was thinking was using a file for the options and
> configurations with the data in a clean format. It could have been an INI
> file or JSON or XML, but they all seem to have some problems I do not like.
> The INI file is too flat and I wanted a hierarchy in the data, the JSON
> data is similar and XML is just hard to read. I wanted to be able to manage
> multiple applications and possibly the system the DPDK/app runs on. The problem
> with the above formats is they are just data and not easy to make decisions
> about the system and applications at runtime.
>

INI format is simplest for users to read, but if you really need hierarchy,
JSON will do that just fine. Not sure what you mean by "JSON data is
similar"...
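As an illustration of the hierarchy point, a hypothetical config file in JSON could nest the EAL settings and per-application settings side by side. Every key below is invented for the example, not an actual DPDK option:

```json
{
  "dpdk": {
    "lcores": "0-3",
    "memory": { "hugepage_mb": 1024, "socket": 0 },
    "devices": ["0000:02:02.0", "0000:02:03.0"]
  },
  "pktgen": {
    "map": ["..."],
    "more_options": 1
  }
}
```

What JSON cannot express, and Lua can, is logic such as mk_lcore_list() running at load time; that is the real trade-off in this thread, not the parsing.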


> If the "database" of information could be queried by the EAL, drivers and
> application then we do not need to try and create a complex command line.
> It would be nice to execute a DPDK applications like this:
>
> ./some_dpdk_app --config-file dpdk-config-filename
>

+1 much nicer than the mess that is EAL command line args today.


> The dpdk-config-filename could contain a lot of information and be able to
> startup multiple different applications. The dpdk-config-file could also
> include other config files to complete the configuration. The format of the
> data in the config file needs to be readable, but allow the user to put in
> new options, needs to be hierarchical in nature and have some simple
> functions to execute if required.
>
> The solution I was thinking is the file information is really just a
> fragment of a scripting language, which the DPDK application contains this
> scripting language interpreter. I was looking at using Lua lua.org as the
> scripting language interpreter it is small and easy to understand. Python
> and others are very big and require a lot of resources and Lua requires
> very few system resources. Also I did not want to have to write a parser
> (lex/yacc). The other nice feature of Lua is you can create a sandbox for
> the code to run in and limit the type of system resources and APIs that can
> be accessed by the application and configuration. Lua can be trimmed down
> to a fairly small size and builds on just about any system or we can just
> install Lua on the system without changes from a rpm or deb.
>

There are JSON and INI file parser libraries for pretty much any language
you care to use. That shouldn't be a factor in choosing file format.

The argument about "Python and others are very big and require a lot of
resources" doesn't end up mattering much since it is already required by a
couple of the DPDK tools (in particular, dpdk_nic_bind.py).


> I use Lua in pktgen at this time and the interface between ?C? and Lua is
> very simple and easy. Currently I include Lua in Pktgen, but I could have
> just used a system library.
>
> The data in the config file can be data statements along with some limited
> code to make some data changes at run time without having to modify the
> real application. Here is a simple config file I wrote: Some of the options
> do not make sense to be in the file at the same time, but wanted to see all
> of the options. The mk_lcore_list() and mk_coremap() are just Lua functions
> we can preload to help convert the simple strings into real data in this
> case tables of information. The application could be something like pktgen
> = { map = { ... }, more_options = 1, }; this allows the same file to possibly
> contain many application configurations. Needs a bit more work.
>
> dpdk_default = {
>


> }
>
> The EAL, driver, application, ? would query an API to access the data and
> the application can change his options quickly without modifying the code.
>
> Anyway comments are welcome.
>
> Regards,
> Keith
>

I like the concept overall. I'd suggest separating out the Lua thing. Lua's
fine for scripting, but nothing here really requires it or saves a lot of
development work.

Jay


[dpdk-dev] [PATCH v2] pipeline: add statistics for librte_pipeline ports and tables

2015-05-21 Thread Jay Rolette
On Thu, May 21, 2015 at 8:33 AM, Dumitrescu, Cristian <
cristian.dumitrescu at intel.com> wrote:

> > > The problem I see with this approach is that you cannot turn off debug
> > > messages while still having the statistics collected.  We should allow
> > > people to collects the stats without getting the debug messages. How do
> > > we do this?
> >
> > By setting build-time log level to debug, you would enable stats and
> debug
> > messages. Then you can adjust the log messages level with the run-time
> > option --log-level.
>
> This is a really clever trick! I have to say it took me some time to make
> sure I got it right :)
> So when RTE_LOG_LEVEL (build time option) is set to DEBUG (i.e. 8), then
> we get both the stats and the debug messages, but then we can adjust the
> level of debug messages at run-time through the --log-level EAL option.
> I can work with this option, thank you!
>
> There is one more simple proposal that came to my mind late last night:
> how about single config file option RTE_COLLECT_STATS=y/n that should be
> used by all the libs across the board to indicate whether stats should be
> enabled or not?
> This is logically very similar to the solution above, as they both look at
> a single option in the config file, but maybe it is also more intuitive for
> people.
>
> I will go with your choice. Which one do you pick?
>

As Stephen said previously, any real DPDK app needs stats. You can't
support network products operationally without them. What's the point of
making them optional other than trying to juice benchmarks?

Jay


[dpdk-dev] KNI example with pcap file

2015-07-28 Thread Jay Rolette
Maybe I just haven't had enough caffeine this morning yet, but why would
you add IP reassembly to KNI? If packets are going through KNI, they
already have IP reassembly via the normal Linux networking stack...

Jay

On Mon, Jul 27, 2015 at 8:55 PM, EaseTheWorld Mr. 
wrote:

> Hello. I'm a DPDK newbie, playing with KNI.
>
> I have a packet analysis application like tshark,
> which receives packets from network interface or pcap file.
>
> My plan with KNI is
> 1) Performance improvement without modifying my program
> 2) Add ip_reassembly, acl to kni example.
>(rte_eth_rx_burst -> ip_reassembly, acl -> rte_kni_tx_burst)
>
> If input is network interface, I think the plan is fine.
> but if input is pcap file, is it possible?
> Should I use vdev or something?
>


[dpdk-dev] add support for HTM lock elision for x86

2015-06-02 Thread Jay Rolette
On Tue, Jun 2, 2015 at 8:11 AM, Roman Dementiev 
wrote:

>
> This series of patches adds methods that use hardware memory transactions
> (HTM)
> on fast-path for DPDK locks (a.k.a. lock elision). Here the methods are
> implemented
> for x86 using Restricted Transactional Memory instructions (Intel(r)
> Transactional
> Synchronization Extensions). The implementation fall-backs to the normal
> DPDK lock
> if HTM is not available or memory transactions fail.
>

This is very interesting. Do you have any summary you can give us of what
the performance implications are from your test data?


> This is not a replacement for lock usages since not all critical sections
> protected
> by locks are friendly to HTM.
>

Any pointers to material that describes where HTM in its current incarnation
on x86 is appropriate?


>
> Roman Dementiev (3):
>  spinlock: add support for HTM lock elision for x86
>  rwlock: add support for HTM lock elision for x86
>  test scaling of HTM lock elision protecting rte_hash
>
>  app/test/Makefile  |   1 +
>  app/test/test_hash_scaling.c   | 223
> +
>  lib/librte_eal/common/Makefile |   4 +-
>  .../common/include/arch/ppc_64/rte_rwlock.h|  38 
>  .../common/include/arch/ppc_64/rte_spinlock.h  |  41 
>  lib/librte_eal/common/include/arch/x86/rte_rtm.h   |  73 +++
>  .../common/include/arch/x86/rte_rwlock.h   |  82 
>  .../common/include/arch/x86/rte_spinlock.h | 107 ++
>  lib/librte_eal/common/include/generic/rte_rwlock.h | 194
> ++
>  .../common/include/generic/rte_spinlock.h  |  75 +++
>  lib/librte_eal/common/include/rte_rwlock.h | 158 ---
>  11 files changed, 836 insertions(+), 160 deletions(-)
>

Thanks!
Jay


[dpdk-dev] rte_eal_alarm_set() is affected by discontinuous jumps in the system time

2015-06-03 Thread Jay Rolette
On Wed, Jun 3, 2015 at 7:54 AM, Bruce Richardson  wrote:

> On Wed, Jun 03, 2015 at 02:31:47PM +0800, Selmon Yang wrote:
> > Hi,
> >
> > I found that, in dpdk 2.0, rte_eal_alarm_set() is affected by
> > discontinuous jumps in the system time because eal_alarm_callback()
> > and rte_eal_alarm_set() use gettimeofday() to get the current time.
> >
> > Here is what I encountered.
> > I set up a rte eal alarm as below, and I like it to be triggered every
> second.
> > #define US_PER_S (1000 * 1000)
> > void my_alarm_cb(void *arg)
> > {
> > /* send heartbeat signal out, etc. */
> >
> > rte_eal_alarm_set(1 * US_PER_S, my_alarm_cb, NULL);
> > return;
> > }
> >
> > int main(void)
> > {
> > /* ..., do something */
> > rte_eal_alarm_set(1 * US_PER_S, my_alarm_cb, NULL);
> > /* ... do something else */
> > }
> >
> > It works fine in most of time.
> > However, if I change system time manually, it is possible that rte alarm
> > function works out of my expectation.
> > Suppose that current time is 11:00:00 AM, and eal_alarm_callback()
> > is triggered because I executed
> > rte_eal_alarm_set(1 * US_PER_S, my_alarm_cb, NULL) at 10:59:59 AM.
> > eal_alarm_callback() gets the current time (11:00:00 AM)
> > and calls my_alarm_cb() as below.
> > while ((ap = LIST_FIRST(&alarm_list)) !=NULL &&
> >   gettimeofday(&now, NULL) == 0 &&
> >   (ap->time.tv_sec < now.tv_sec ||
> > (ap->time.tv_sec == now.tv_sec &&
> >   ap->time.tv_usec <=
> > now.tv_usec))){
> > ap->executing = 1;
> > ap->executing_id = pthread_self();
> > rte_spinlock_unlock(&alarm_list_lk);
> >
> > ap->cb_fn(ap->cb_arg);
> >
> > rte_spinlock_lock(&alarm_list_lk);
> >
> > LIST_REMOVE(ap, next);
> > rte_free(ap);
> > }
> >
> > In my_alarm_cb(), rte_eal_alarm_set() is called again.
> > rte_eall_alarm_set() gets the current time (11:00:00 AM), plus 1 second,
> > and adds the new alarm entry to alarm_list.
> > /* use current time to calculate absolute time of alarm */
> > gettimeofday(&now, NULL);
> >
> > new_alarm->cb_fn = cb_fn;
> > new_alarm->cb_arg = cb_arg;
> > new_alarm->time.tv_usec = (now.tv_usec + us) % US_PER_S;
> > new_alarm->time.tv_sec = now.tv_sec + ((now.tv_usec + us) /
> US_PER_S);
> >
> > rte_spinlock_lock(&alarm_list_lk);
> > if (!handler_registered) {
> > ret |= rte_intr_callback_register(&intr_handle,
> > eal_alarm_callback, NULL);
> > handler_registered = (ret == 0) ? 1 : 0;
> > }
> >
> > if (LIST_EMPTY(&alarm_list))
> > LIST_INSERT_HEAD(&alarm_list, new_alarm, next);
> > else {
> > LIST_FOREACH(ap, &alarm_list, next) {
> > if (ap->time.tv_sec > new_alarm->time.tv_sec ||
> > (ap->time.tv_sec ==
> > new_alarm->time.tv_sec &&
> >
> > ap->time.tv_usec > new_alarm->time.tv_usec)){
> > LIST_INSERT_BEFORE(ap, new_alarm, next);
> > break;
> > }
> > if (LIST_NEXT(ap, next) == NULL) {
> > LIST_INSERT_AFTER(ap, new_alarm, next);
> > break;
> > }
> > }
> > }
> >
> > After the new alarm entry is added to alarm_list, if current time is
> > set to 8:00:00 AM manually, the current time in eal_alarm_callback()
> > will be updated to 8:00:00 AM, too.
> > Then the new alarm entry will be triggered after 3 hours and 1 second.
> >
> > I think rte alarm should not be affected by discontinuous jumps in
> > the system time.
> > I tried to replace gettimeofday() with
> clock_gettime(CLOCK_MONOTONIC_RAW, &now),
> > and it looks work fine.
> > What do you think about this modification?
> > Will you consider to modify rte_alarm functions to be not affected
> > by discontinuous jumps in the system time?
>
> I agree with you that the alarm functionality should not be affected by
> jumps
> in system time. If you have a patch that fixes this bug, it would be great
> if
> you could upstream it here.
>
> Thanks,
> /Bruce
>

I haven't looked through the RTE alarm code, but one thing to consider is
whether you want to use CLOCK_MONOTONIC_RAW or just CLOCK_MONOTONIC. The
RAW version is ~10x slower than the CLOCK_MONOTONIC.

CLOCK_MONOTONIC isn't completely protected from NTP frequency adjustments,
but it won't have discontinuities.

We've found the rte_eal_alarm calls to be surprisingly (i.e., we hadn't
bothered to look at the implementation yet) variable and inter

[dpdk-dev] [PATCH 1/3] kni: minor opto

2015-06-03 Thread Jay Rolette
Don't need the 'safe' version of list_for_each_entry() if you aren't deleting 
from the list as you iterate over it

Signed-off-by: Jay Rolette 
---
 lib/librte_eal/linuxapp/kni/kni_misc.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c 
b/lib/librte_eal/linuxapp/kni/kni_misc.c
index 1935d32..312f196 100644
--- a/lib/librte_eal/linuxapp/kni/kni_misc.c
+++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
@@ -213,13 +213,12 @@ static int
 kni_thread_single(void *unused)
 {
int j;
-   struct kni_dev *dev, *n;
+   struct kni_dev *dev;

while (!kthread_should_stop()) {
down_read(&kni_list_lock);
for (j = 0; j < KNI_RX_LOOP_NUM; j++) {
-   list_for_each_entry_safe(dev, n,
-   &kni_list_head, list) {
+   list_for_each_entry(dev, &kni_list_head, list) {
 #ifdef RTE_KNI_VHOST
kni_chk_vhost_rx(dev);
 #else
-- 
2.3.2 (Apple Git-55)



[dpdk-dev] [PATCH 2/3] kni: minor opto

2015-06-03 Thread Jay Rolette
No reason to check how many entries are in kni->rx_q prior to
actually pulling them from the fifo. You can't dequeue more than
are there anyway. Max entries to dequeue is either the max batch
size or however much space is available on kni->free_q (lesser of the two)

Signed-off-by: Jay Rolette 
---
 lib/librte_eal/linuxapp/kni/kni_net.c | 21 -
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/lib/librte_eal/linuxapp/kni/kni_net.c 
b/lib/librte_eal/linuxapp/kni/kni_net.c
index dd95db5..13ccbb8 100644
--- a/lib/librte_eal/linuxapp/kni/kni_net.c
+++ b/lib/librte_eal/linuxapp/kni/kni_net.c
@@ -131,7 +131,7 @@ kni_net_rx_normal(struct kni_dev *kni)
 {
unsigned ret;
uint32_t len;
-   unsigned i, num, num_rq, num_fq;
+   unsigned i, num, num_fq;
struct rte_kni_mbuf *kva;
struct rte_kni_mbuf *va[MBUF_BURST_SZ];
void * data_kva;
@@ -139,24 +139,19 @@ kni_net_rx_normal(struct kni_dev *kni)
struct sk_buff *skb;
struct net_device *dev = kni->net_dev;

-   /* Get the number of entries in rx_q */
-   num_rq = kni_fifo_count(kni->rx_q);
-
/* Get the number of free entries in free_q */
-   num_fq = kni_fifo_free_count(kni->free_q);
-
-   /* Calculate the number of entries to dequeue in rx_q */
-   num = min(num_rq, num_fq);
-   num = min(num, (unsigned)MBUF_BURST_SZ);
-
-   /* Return if no entry in rx_q and no free entry in free_q */
-   if (num == 0)
+   if ((num_fq = kni_fifo_free_count(kni->free_q)) == 0) {
+   /* No room on the free_q, bail out */
return;
+   }
+
+   /* Calculate the number of entries to dequeue from rx_q */
+   num = min(num_fq, (unsigned)MBUF_BURST_SZ);

/* Burst dequeue from rx_q */
ret = kni_fifo_get(kni->rx_q, (void **)va, num);
if (ret == 0)
-   return; /* Failing should not happen */
+   return;

/* Transfer received packets to netif */
for (i = 0; i < num; i++) {
-- 
2.3.2 (Apple Git-55)



[dpdk-dev] [PATCH 3/3] kni: rx loop was using the wrong counter

2015-06-03 Thread Jay Rolette
Loop processing packets dequeued from rx_q was using the number of
packets requested, not how many it actually received.

Variable rename to make code a little more clear

Signed-off-by: Jay Rolette 
---
 lib/librte_eal/linuxapp/kni/kni_net.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/linuxapp/kni/kni_net.c 
b/lib/librte_eal/linuxapp/kni/kni_net.c
index 13ccbb8..d37a6b9 100644
--- a/lib/librte_eal/linuxapp/kni/kni_net.c
+++ b/lib/librte_eal/linuxapp/kni/kni_net.c
@@ -131,7 +131,7 @@ kni_net_rx_normal(struct kni_dev *kni)
 {
unsigned ret;
uint32_t len;
-   unsigned i, num, num_fq;
+   unsigned i, num_rx, num_fq;
struct rte_kni_mbuf *kva;
struct rte_kni_mbuf *va[MBUF_BURST_SZ];
void * data_kva;
@@ -146,15 +146,15 @@ kni_net_rx_normal(struct kni_dev *kni)
}

/* Calculate the number of entries to dequeue from rx_q */
-   num = min(num_fq, (unsigned)MBUF_BURST_SZ);
+   num_rx = min(num_fq, (unsigned)MBUF_BURST_SZ);

/* Burst dequeue from rx_q */
-   ret = kni_fifo_get(kni->rx_q, (void **)va, num);
-   if (ret == 0)
+   num_rx = kni_fifo_get(kni->rx_q, (void **)va, num_rx);
+   if (num_rx == 0)
return;

/* Transfer received packets to netif */
-   for (i = 0; i < num; i++) {
+   for (i = 0; i < num_rx; i++) {
kva = (void *)va[i] - kni->mbuf_va + kni->mbuf_kva;
len = kva->data_len;
data_kva = kva->buf_addr + kva->data_off - kni->mbuf_va
@@ -184,8 +184,8 @@ kni_net_rx_normal(struct kni_dev *kni)
}

/* Burst enqueue mbufs into free_q */
-   ret = kni_fifo_put(kni->free_q, (void **)va, num);
-   if (ret != num)
+   ret = kni_fifo_put(kni->free_q, (void **)va, num_rx);
+   if (ret != num_rx)
/* Failing should not happen */
KNI_ERR("Fail to enqueue entries into free_q\n");
 }
-- 
2.3.2 (Apple Git-55)



[dpdk-dev] [PATCH 2/3] kni: minor opto

2015-06-03 Thread Jay Rolette
Some sort of hiccup sending. Not sure why 2/3 didn't come out as expected.
I'll try sending again.

Jay

On Wed, Jun 3, 2015 at 2:07 PM, Jay Rolette  wrote:

> No reason to check how many entries are in kni->rx_q prior to
> actually pulling them from the fifo. You can't dequeue more than
> are there anyway. Max entries to dequeue is either the max batch
> size or however much space is available on kni->free_q (lesser of the two)
>
> Signed-off-by: Jay Rolette 
> ---
>  lib/librte_eal/linuxapp/kni/kni_net.c | 21 -
>  1 file changed, 8 insertions(+), 13 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/kni/kni_net.c
> b/lib/librte_eal/linuxapp/kni/kni_net.c
> index dd95db5..13ccbb8 100644
> --- a/lib/librte_eal/linuxapp/kni/kni_net.c
> +++ b/lib/librte_eal/linuxapp/kni/kni_net.c
> @@ -131,7 +131,7 @@ kni_net_rx_normal(struct kni_dev *kni)
>  {
> unsigned ret;
> uint32_t len;
> -   unsigned i, num, num_rq, num_fq;
> +   unsigned i, num, num_fq;
> struct rte_kni_mbuf *kva;
> struct rte_kni_mbuf *va[MBUF_BURST_SZ];
> void * data_kva;
> @@ -139,24 +139,19 @@ kni_net_rx_normal(struct kni_dev *kni)
> struct sk_buff *skb;
> struct net_device *dev = kni->net_dev;
>
> -   /* Get the number of entries in rx_q */
> -   num_rq = kni_fifo_count(kni->rx_q);
> -
> /* Get the number of free entries in free_q */
> -   num_fq = kni_fifo_free_count(kni->free_q);
> -
> -   /* Calculate the number of entries to dequeue in rx_q */
> -   num = min(num_rq, num_fq);
> -   num = min(num, (unsigned)MBUF_BURST_SZ);
> -
> -   /* Return if no entry in rx_q and no free entry in free_q */
> -   if (num == 0)
> +   if ((num_fq = kni_fifo_free_count(kni->free_q)) == 0) {
> +   /* No room on the free_q, bail out */
> return;
> +   }
> +
> +   /* Calculate the number of entries to dequeue from rx_q */
> +   num = min(num_fq, (unsigned)MBUF_BURST_SZ);
>
> /* Burst dequeue from rx_q */
> ret = kni_fifo_get(kni->rx_q, (void **)va, num);
> if (ret == 0)
> -   return; /* Failing should not happen */
> +   return;
>
> /* Transfer received packets to netif */
> for (i = 0; i < num; i++) {
> --
> 2.3.2 (Apple Git-55)
>
>

