[dpdk-dev] Could virtio-net-pmd co-exist with virtio-net.ko?

2014-11-06 Thread Matthew Hall
On Thu, Nov 06, 2014 at 10:24:11AM +0800, GongJinrong wrote:
> Hi, Guys
> 
>  When I run virtio-net-pmd in a VM, I get a "virtio-net device is already
> used by another driver" error message. After I removed virtio-net.ko it
> worked, but now I cannot use the virtio-net driver for another virtual NIC,
> which makes normal (non-DPDK) network performance drop a lot. Could
> virtio-net-pmd co-exist with the standard virtio-net driver?
> 
> BR
> John Gong

I have no proof it will work perfectly, as I never got to use the virtio PMDs 
because neither works in VirtualBox (developer-friendly / desktop 
virtualization).

But there is a script included in DPDK, dpdk_nic_bind.py, which should let you 
configure this more intelligently on a per-VNIC basis. You could try something 
similar to this:

export RTE_SDK="${build_directory}/external/dpdk"
export RTE_TOOLS="${RTE_SDK}/tools"
export RTE_NIC_BIND="${RTE_TOOLS}/dpdk_nic_bind.py"

# show the current binding status for the device
"${RTE_NIC_BIND}" --status | fgrep "${PCI_ID}"
# unbind it from the current (kernel) driver
"${RTE_NIC_BIND}" -b none "${PCI_ID}"
# bind it to the DPDK igb_uio driver
"${RTE_NIC_BIND}" -b igb_uio "${PCI_ID}"
# confirm the new binding
"${RTE_NIC_BIND}" --status | fgrep "${PCI_ID}"

Good Luck!
Matthew.


[dpdk-dev] Valgrind and DPDK - does it work ?

2014-11-06 Thread Matthew Hall
On Fri, Nov 07, 2014 at 01:22:49AM +0100, Marc Sune wrote:
> Found some time to have a close look. I also wanted to check a DPDK app
> against valgrind. It works!
> 
> I downloaded and compiled valgrind from sources (3.10.0) and applied
> (manually) this patch:
> 
> https://bugs.kde.org/attachment.cgi?id=85950&action=edit
> 
> (Applied around line 2216)
> 
> From this post:
> 
> http://valgrind.10908.n7.nabble.com/mpich-unable-to-munmap-hugepages-td49150.html
> 
> Happy debugging
> Marc

Marc,

This is just AMAZING!!! I have wished for it for many years for DPDK, ever 
since I used it in beta before it went GA.

Would it be possible to post your modification of Valgrind in Github, 
Bitbucket, or some other repo? I'd like to try this out on my app, too.

Also, not sure if anybody sent this upstream to Valgrind, but if not, we 
really should, so it just works by default from now on.

Thanks,
Matthew.


[dpdk-dev] building shared library

2014-11-10 Thread Matthew Hall
On Mon, Nov 10, 2014 at 03:22:40PM +0100, Newman Poborsky wrote:
> is it possible to build a DPDK app as a shared library?

Yes, it will work, with a bit of performance loss from the .so symbol-lookup 
overhead. You have to set some of the build config options to get it working, 
though.
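For reference, the knob I mean looked something like this in the DPDK build config of that era (the option name is from memory and may differ between releases, so check your config/defconfig_* file):

```
# build DPDK libraries as shared objects (.so) instead of static archives
CONFIG_RTE_BUILD_SHARED_LIB=y
```

With shared libraries, PMDs can also be loaded at runtime via the EAL -d option instead of being linked into the binary.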

Matthew.


[dpdk-dev] Debugging EAL PCI / Driver Init

2014-08-01 Thread Matthew Hall
Hello,

I am running into a problem: Ethernet driver init works fine in a sample app 
and finds my NICs, and the NICs appear in rte_eal_pci_dump(stdout), but they 
don't show up in rte_eth_dev_count() even after rte_eal_pci_probe() is called 
the same way as in the sample apps, so my app won't boot.

I have a lot of experience using the older versions of the DPDK where you had 
to call the PMD init functions manually but no experience with the later 
versions where the DPDK is supposed to init the PMDs itself automatically.

What do I have to do to dump as much debug output as possible about why the 
driver list for my PCI devices always seems empty? Are there any places I 
should look to find the issue? Maybe I didn't link against the right DPDK 
libs? I used the combined DPDK static lib, libintel_dpdk.a, to keep things 
simpler, as I had seen recommended in various places.

Thanks,
Matthew.


[dpdk-dev] Debugging EAL PCI / Driver Init

2014-08-02 Thread Matthew Hall
I did a bit more experimentation and found the following. If I remove the 
static qualifier from the rte_igb_pmd_init function and call it directly from 
my code, the driver loads and the port count increments to 2:

EAL: PCI device :01:00.0 on NUMA socket -1
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL:   PCI memory mapped at 0x7f09d45f2000
EAL:   PCI memory mapped at 0x7f09d473
PMD: eth_igb_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x1521

EAL: PCI device :01:00.1 on NUMA socket -1
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL:   PCI memory mapped at 0x7f09d0f0
EAL:   PCI memory mapped at 0x7f09d472c000
PMD: eth_igb_dev_init(): port_id 1 vendorID=0x8086 deviceID=0x1521

So it seems like when you enable these options:

# Combine to one single library
CONFIG_RTE_BUILD_COMBINE_LIBS=y
CONFIG_RTE_LIBNAME="intel_dpdk"

You don't really get a working DPDK inside of "-lintel_dpdk". Someone 
suggested linking with "-Xlinker -lintel_dpdk" but that didn't seem to help.

Is there a secret to getting a single integrated static library, where all of 
the PMDs end up in the PCI driver list so rte_eal_pci_probe can find them? Or 
some secret to linking against the combined library which works properly?

Matthew.

On Fri, Aug 01, 2014 at 10:51:38AM -0700, Matthew Hall wrote:
> Hello,
> 
> I am running into a problem where Eth driver init works fine in a sample app 
> and finds my NICs, and the NICs appear in rte_eal_pci_dump(stdout) but they 
> don't show up in rte_eth_dev_count() even after rte_eal_pci_probe() is called 
> the same as the sample apps, so my app won't boot.
> 
> I have a lot of experience using the older versions of the DPDK where you had 
> to call the PMD init functions manually but no experience with the later 
> versions where the DPDK is supposed to init the PMDs itself automatically.
> 
> What do I have to do to dump the most possible debug output on why the driver 
> list for my PCI devices always seems empty? Any places I should look to see 
> the issue? Maybe I didn't link it together with the right DPDK libs? I used 
> the combined DPDK static lib libintel_dpdk.a to make things simpler as I had 
> seen recommended in various places.
> 
> Thanks,
> Matthew.


[dpdk-dev] Debugging EAL PCI / Driver Init

2014-08-02 Thread Matthew Hall
Also, when using the separate libraries the problem still happens:

-lethdev -lrte_cfgfile -lrte_cmdline -lrte_distributor -lrte_hash 
-lrte_ip_frag -lrte_lpm -lrte_malloc -lrte_mbuf -lrte_mempool -lrte_pmd_e1000 
-lrte_pmd_pcap -lrte_pmd_virtio_uio -lrte_pmd_vmxnet3_uio -lrte_port 
-lrte_ring -lrte_table -lrte_timer -lrte_eal -lbsd -ldl -lpcap -lpthread

So it seems there is a special order or link technique which must be used, or 
rte_eal's PCI code won't be able to load the PMDs during PCI probing. Is it 
documented anywhere how to get that to work?

Thanks,
Matthew.

On Sat, Aug 02, 2014 at 08:29:04AM -0700, Matthew Hall wrote:
> I did a bit more experimentation and found the following. If I unmark the 
> rte_igb_pmd_init function as static, and call it directly from my code, the 
> driver will load, and the port count increments to 2:
> 
> EAL: PCI device :01:00.0 on NUMA socket -1
> EAL:   probe driver: 8086:1521 rte_igb_pmd
> EAL:   PCI memory mapped at 0x7f09d45f2000
> EAL:   PCI memory mapped at 0x7f09d473
> PMD: eth_igb_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x1521
> 
> EAL: PCI device :01:00.1 on NUMA socket -1
> EAL:   probe driver: 8086:1521 rte_igb_pmd
> EAL:   PCI memory mapped at 0x7f09d0f0
> EAL:   PCI memory mapped at 0x7f09d472c000
> PMD: eth_igb_dev_init(): port_id 1 vendorID=0x8086 deviceID=0x1521
> 
> So it seems like when you enable these options:
> 
> # Combine to one single library
> CONFIG_RTE_BUILD_COMBINE_LIBS=y
> CONFIG_RTE_LIBNAME="intel_dpdk"
> 
> You don't really get a working DPDK inside of "-lintel_dpdk". Someone 
> suggested linking with "-Xlinker -lintel_dpdk" but that didn't seem to help.
> 
> Is there a secret to getting a single integrated static library, where all of 
> the PMD's end up in the PCI driver list so rte_eal_pci_probe can find them? 
> Or 
> some secret to linking against the combined library which works properly?
> 
> Matthew.
> 
> On Fri, Aug 01, 2014 at 10:51:38AM -0700, Matthew Hall wrote:
> > Hello,
> > 
> > I am running into a problem where Eth driver init works fine in a sample 
> > app 
> > and finds my NICs, and the NICs appear in rte_eal_pci_dump(stdout) but they 
> > don't show up in rte_eth_dev_count() even after rte_eal_pci_probe() is 
> > called 
> > the same as the sample apps, so my app won't boot.
> > 
> > I have a lot of experience using the older versions of the DPDK where you 
> > had 
> > to call the PMD init functions manually but no experience with the later 
> > versions where the DPDK is supposed to init the PMDs itself automatically.
> > 
> > What do I have to do to dump the most possible debug output on why the 
> > driver 
> > list for my PCI devices always seems empty? Any places I should look to see 
> > the issue? Maybe I didn't link it together with the right DPDK libs? I used 
> > the combined DPDK static lib libintel_dpdk.a to make things simpler as I 
> > had 
> > seen recommended in various places.
> > 
> > Thanks,
> > Matthew.


[dpdk-dev] Debugging EAL PCI / Driver Init

2014-08-02 Thread Matthew Hall
On Sun, Aug 03, 2014 at 01:37:06AM +0900, Masaru Oki wrote:
> cc links a library function from an archive only if it is called from
> another object, but the new DPDK PMD libraries have constructor sections
> and are not called directly. ld always links library functions that have
> constructor sections. Use -Xlinker, or use ld instead of cc.

Hello Oki-san,

The trick to fix it was this; I finally found it in the example Makefiles with 
the V=1 flag:

-Wl,--whole-archive -Wl,--start-group -lintel_dpdk -Wl,--end-group 
-Wl,--no-whole-archive

Thank you for the advice you provided, I couldn't have fixed it without your 
suggestions... it got me to look more closely at the linking. Importantly, 
"-Wl,--whole-archive" includes the entire archive whether or not it's called 
from other objects, so we don't lose the constructors, just like you said.

Matthew.


[dpdk-dev] Debugging EAL PCI / Driver Init

2014-08-03 Thread Matthew Hall
There is an option in the DPDK build config which compiles every DPDK lib into 
a single static lib. I don't have the name of the option in front of me but it 
had COMBINE in its name. When this option is used you can get every function in 
the whole DPDK with a single library. After that I had a lot fewer linking 
issues.

Matthew.
-- 
Sent from my mobile device.

On August 3, 2014 4:41:51 AM PDT, Alex Markuze  wrote:
>Resolved just as Matt has described.
>To remove any ambiguity (for future reference).
>
>This line in the gcc command resolves the issue in my case (a
>different nic may need a different lib):
>-Wl,--whole-archive -Wl,--start-group -lrte_pmd_ixgbe -Wl,--end-group
>-Wl,--no-whole-archive
>
>The problem is that the probe command iterates over all PCI devices and
>tries to find a matching driver. The drivers register with these macros,
>which are only invoked when the --whole-archive option is provided; that's
>actually the reason for this flag's existence. Without this flag the driver
>list is empty.
>
>PMD_REGISTER_DRIVER(rte_ixgbe_driver);
>PMD_REGISTER_DRIVER(rte_ixgbevf_driver);
>
>
>On Sun, Aug 3, 2014 at 1:38 PM, Alex Markuze  wrote:
>> Hi Matt, Dev
>> I'm trying to compile an app linking to dpdk and dpdk-based libs.
>> And I'm seeing the same issue you've reported.
>> The probe function doesn't seem to find any ixgbevf(SRIOV VM) ports.
>> Same code compiled as a dpdk app works fine.
>>
>> In your solution to this issue you are referring to  -lintel_dpdk? I
>> couldn't find any reference to it.
>>
>> Thanks
>> Alex.
>>
>>
>> On Sat, Aug 2, 2014 at 7:46 PM, Matthew Hall 
>wrote:
>>> On Sun, Aug 03, 2014 at 01:37:06AM +0900, Masaru Oki wrote:
>>>> cc links library funtion from archive only if call from other
>object.
>>>> but new dpdk pmd library has constractor section and not call
>directly.
>>>> ld always links library funtion with constractor section.
>>>> use -Xlinker, or use ld instead of cc.
>>>
>>> Hello Oki-san,
>>>
>>> The trick to fix it was this, I finally found it in the example
>Makefiles with
>>> V=1 flag.
>>>
>>> -Wl,--whole-archive -Wl,--start-group -lintel_dpdk -Wl,--end-group
>-Wl,--no-whole-archive
>>>
>>> Thank you for the advice you provided, I couldn't have fixed it
>without your
>>> suggestions... it got me to look more closely at the linking.
>Importantly,
>>> "-Wl,--whole-archive" includes the entire archive whether or not
>it's called
>>> from other objects, so we don't lose the constructors, just like you
>said.
>>>
>>> Matthew.



[dpdk-dev] Debugging EAL PCI / Driver Init

2014-08-03 Thread Matthew Hall
Yes, that's the one! It made a lot of my linking problems go away when I used 
that along with the whole-archive flags.

Matthew
-- 
Sent from my mobile device.

On August 3, 2014 11:13:55 AM PDT, "Jayakumar, Muthurajan" 
 wrote:
>
>Are you referring to CONFIG_RTE_BUILD_COMBINE_LIBS ? 
>
>(ps: referenced here
>http://dpdk.org/ml/archives/dev/2013-October/000639.html)
>
>Thanks,
>M Jay
>
>
>-Original Message-
>From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matthew Hall
>Sent: Sunday, August 03, 2014 10:23 AM
>To: Alex Markuze
>Cc: dev at dpdk.org
>Subject: Re: [dpdk-dev] Debugging EAL PCI / Driver Init
>
>There is an option in the DPDK build config which compiles every DPDK
>lib into a single static lib. I don't have the name of the option in
>front of me but it had COMBINE in its name. When this option is used
>you can get every function in the whole DPDK with a single library.
>After that I had a lot fewer linking issues.
>
>Matthew.
>--
>Sent from my mobile device.
>
>On August 3, 2014 4:41:51 AM PDT, Alex Markuze  wrote:
>>Resolved just as Matt has described.
>>To remove any ambiguity (for future reference).
>>
>>This line in the gcc command resolves the issue in my case (a
>different 
>>nic may need a different lib):
>>-Wl,--whole-archive -Wl,--start-group -lrte_pmd_ixgbe -Wl,--end-group
>>-Wl,--no-whole-archive
>>
>>The problem is that the probe command polls over all pci devices and 
>>tries to find a matching driver, these drivers register With these 
>>macros which will only be called when the --whole-archive option is 
>>provided and its actually the reason for this flags existence. Without
>
>>this flag the driver list is empty.
>>
>>PMD_REGISTER_DRIVER(rte_ixgbe_driver);
>>PMD_REGISTER_DRIVER(rte_ixgbevf_driver);
>>
>>
>>On Sun, Aug 3, 2014 at 1:38 PM, Alex Markuze  wrote:
>>> Hi Matt, Dev
>>> I'm trying to compile an app linking to dpdk and dpdk-based libs.
>>> And I'm seeing the same issue you've reported.
>>> The probe function doesn't seem to find any ixgbevf(SRIOV VM) ports.
>>> Same code compiled as a dpdk app works fine.
>>>
>>> In your solution to this issue you are referring to  -lintel_dpdk? I
>
>>> couldn't find any reference to it.
>>>
>>> Thanks
>>> Alex.
>>>
>>>
>>> On Sat, Aug 2, 2014 at 7:46 PM, Matthew Hall 
>>wrote:
>>>> On Sun, Aug 03, 2014 at 01:37:06AM +0900, Masaru Oki wrote:
>>>>> cc links library funtion from archive only if call from other
>>object.
>>>>> but new dpdk pmd library has constractor section and not call
>>directly.
>>>>> ld always links library funtion with constractor section.
>>>>> use -Xlinker, or use ld instead of cc.
>>>>
>>>> Hello Oki-san,
>>>>
>>>> The trick to fix it was this, I finally found it in the example
>>Makefiles with
>>>> V=1 flag.
>>>>
>>>> -Wl,--whole-archive -Wl,--start-group -lintel_dpdk -Wl,--end-group
>>-Wl,--no-whole-archive
>>>>
>>>> Thank you for the advice you provided, I couldn't have fixed it
>>without your
>>>> suggestions... it got me to look more closely at the linking.
>>Importantly,
>>>> "-Wl,--whole-archive" includes the entire archive whether or not
>>it's called
>>>> from other objects, so we don't lose the constructors, just like
>you
>>said.
>>>>
>>>> Matthew.



[dpdk-dev] is there any function like rte_mempool_destroy compare with rte_mempool_create

2014-08-27 Thread Matthew Hall
On Wed, Aug 27, 2014 at 07:46:26AM +, Ni, Xun wrote:
> If I run an application twice, is it possible that the app has a memory 
> leak? Or does it just not have enough memory to execute again because the 
> first run already got most of the memory without releasing it?

User-mapped memory should be freed by the kernel when the process dies.

However, if multiprocess mode, shmem, or kernel resources are being used in 
your configuration, anything could happen, including leaks.

Do you have some specifics about what you were doing / running?

Matthew.


[dpdk-dev] Clang Scan build results

2014-08-27 Thread Matthew Hall
On Wed, Aug 27, 2014 at 03:13:43PM +, Wiles, Roger Keith wrote:
> Hi Everyone,

Hi Keith,

For me the build failed with clang, but I made a series of awful patches to 
get it to compile; I'm not sure if the clang failures could be related to your 
scan-build failures. If it will help you, I can provide the patches I made to 
get it to work with clang. They are not ready for the DPDK master branch, but 
it's really good to get safe output from scan-build.

> It would be nice to run once in a while to weed out any basic problems. We 
> could run something like PC-Lint or Coverity, but they cost money :-)

Not 100% true... you can run Coverity for free on open source if you are the 
maintainer. Given that Intel, Wind River, and 6WIND all have some form of 
maintainership authority over DPDK, there should be a way to qualify via this 
avenue.

Matthew.


[dpdk-dev] rte_mbuf: documentation, meta-data, and inconsistencies

2014-08-28 Thread Matthew Hall
On Thu, Aug 28, 2014 at 08:00:59PM -0400, daniel chapiesky wrote:
> But, in the end, sharing the meta-data area with the packet headroom seems
> to be a very
> bad idea.
> 
> Sincerely,
> 
> Daniel Chapiesky

You might have picked a good time to inquire about it, as some of the Intel 
guys have been making patches to clean up rte_mbuf over the last couple of 
weeks.

Matthew.


[dpdk-dev] A question about hugepage initialization time

2014-12-09 Thread Matthew Hall
On Tue, Dec 09, 2014 at 10:33:59AM -0600, Matt Laswell wrote:
> Our DPDK application deals with very large in memory data structures, and
> can potentially use tens or even hundreds of gigabytes of hugepage memory.

What you're doing is an unusual use case, and this is open source code that 
nobody may have tested and QA'ed for it yet.

So my recommendation would be to add some rte_log statements to measure the 
various steps in the process and see what's going on. Also use the Linux perf 
framework to do low-overhead sampling-based profiling, and make sure you've 
got everything compiled with debug symbols so you can see what's consuming the 
execution time.

You might find that it makes sense to use some custom allocators like 
jemalloc alongside the DPDK allocators, perhaps with "transparent hugepage 
mode" enabled in your process, and some larger page sizes to reduce the 
number of pages.

You can also use the handy kernel boot options hugepagesz= and hugepages=N. 
These create guaranteed-contiguous, known-good hugepages during boot, which 
initialize much more quickly and with less trouble and fewer glitches in my 
experience.
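For example, a boot-time reservation might look like this on the kernel command line (the sizes and counts here are illustrative values, not a recommendation):

```
default_hugepagesz=1G hugepagesz=1G hugepages=16 hugepagesz=2M hugepages=1024
```

This reserves sixteen 1 GB pages and 1024 2 MB pages at boot, before memory gets fragmented.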

https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
https://www.kernel.org/doc/Documentation/vm/transhuge.txt

There is no one-size-fits-all solution but these are some possibilities.

Good Luck,
Matthew.


[dpdk-dev] [PATCH] examples: fix clang compilation issues

2014-07-20 Thread Matthew Hall
Contains adjustments to some warnings which prevent examples from compiling 
under clang.

Contains the ability to override RTE_SDK_BIN in rte.extvars.mk, without which 
it's impossible to compile the examples against DPDK if it was compiled into 
any build directory not named identically to the RTE_TARGET.

Signed-off-by: Matthew Hall 



[dpdk-dev] [PATCH 1/4] l3fwd: some functions are unused in l3fwd-acl

2014-07-20 Thread Matthew Hall
Signed-off-by: Matthew Hall 
---
 examples/l3fwd/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/examples/l3fwd/Makefile b/examples/l3fwd/Makefile
index 68de8fc..5cd7396 100644
--- a/examples/l3fwd/Makefile
+++ b/examples/l3fwd/Makefile
@@ -46,6 +46,7 @@ SRCS-y := main.c

 CFLAGS += -O3 $(USER_FLAGS)
 CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-unused-function

 # workaround for a gcc bug with noreturn attribute
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
-- 
1.9.1



[dpdk-dev] [PATCH 2/4] virtio-net.c: incorrect parens around equality check

2014-07-20 Thread Matthew Hall
Signed-off-by: Matthew Hall 
---
 examples/vhost/virtio-net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/vhost/virtio-net.c b/examples/vhost/virtio-net.c
index 801607a..5e659c7 100644
--- a/examples/vhost/virtio-net.c
+++ b/examples/vhost/virtio-net.c
@@ -280,8 +280,8 @@ get_config_ll_entry(struct vhost_device_ctx ctx)

/* Loop through linked list until the device_fh is found. */
while (ll_dev != NULL) {
-   if ((ll_dev->dev.device_fh == ctx.fh))
-return ll_dev;
+   if (ll_dev->dev.device_fh == ctx.fh)
+   return ll_dev;
ll_dev = ll_dev->next;
}

-- 
1.9.1



[dpdk-dev] [PATCH 3/4] examples/*: -Wno-switch required for weird ioctl() ID's

2014-07-20 Thread Matthew Hall
Signed-off-by: Matthew Hall 
---
 examples/multi_process/client_server_mp/mp_client/Makefile | 1 +
 examples/multi_process/client_server_mp/mp_server/Makefile | 1 +
 examples/multi_process/l2fwd_fork/Makefile | 1 +
 examples/multi_process/simple_mp/Makefile  | 1 +
 examples/multi_process/symmetric_mp/Makefile   | 1 +
 examples/netmap_compat/Makefile| 2 ++
 examples/netmap_compat/bridge/Makefile | 1 +
 examples/vhost/Makefile| 1 +
 8 files changed, 9 insertions(+)

diff --git a/examples/multi_process/client_server_mp/mp_client/Makefile 
b/examples/multi_process/client_server_mp/mp_client/Makefile
index 2688fed..ba2481d 100644
--- a/examples/multi_process/client_server_mp/mp_client/Makefile
+++ b/examples/multi_process/client_server_mp/mp_client/Makefile
@@ -43,6 +43,7 @@ APP = mp_client
 SRCS-y := client.c

 CFLAGS += $(WERROR_FLAGS) -O3
+CFLAGS += -Wno-switch
 CFLAGS += -I$(SRCDIR)/../shared

 include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/multi_process/client_server_mp/mp_server/Makefile 
b/examples/multi_process/client_server_mp/mp_server/Makefile
index c29e478..bf96c30 100644
--- a/examples/multi_process/client_server_mp/mp_server/Makefile
+++ b/examples/multi_process/client_server_mp/mp_server/Makefile
@@ -52,6 +52,7 @@ SRCS-y := main.c init.c args.c
 INC := $(wildcard *.h)

 CFLAGS += $(WERROR_FLAGS) -O3
+CFLAGS += -Wno-switch
 CFLAGS += -I$(SRCDIR)/../shared

 # for newer gcc, e.g. 4.4, no-strict-aliasing may not be necessary
diff --git a/examples/multi_process/l2fwd_fork/Makefile 
b/examples/multi_process/l2fwd_fork/Makefile
index ff257a3..2c5a808 100644
--- a/examples/multi_process/l2fwd_fork/Makefile
+++ b/examples/multi_process/l2fwd_fork/Makefile
@@ -46,5 +46,6 @@ SRCS-y := main.c flib.c

 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-switch

 include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/multi_process/simple_mp/Makefile 
b/examples/multi_process/simple_mp/Makefile
index 31ec0c8..dd43814 100644
--- a/examples/multi_process/simple_mp/Makefile
+++ b/examples/multi_process/simple_mp/Makefile
@@ -46,5 +46,6 @@ SRCS-y := main.c mp_commands.c

 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-switch

 include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/multi_process/symmetric_mp/Makefile 
b/examples/multi_process/symmetric_mp/Makefile
index c789f3c..9f1d184 100644
--- a/examples/multi_process/symmetric_mp/Makefile
+++ b/examples/multi_process/symmetric_mp/Makefile
@@ -46,5 +46,6 @@ SRCS-y := main.c

 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-switch

 include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/netmap_compat/Makefile b/examples/netmap_compat/Makefile
index d398e5f..4ee8c60 100644
--- a/examples/netmap_compat/Makefile
+++ b/examples/netmap_compat/Makefile
@@ -38,6 +38,8 @@ unexport RTE_SRCDIR RTE_OUTPUT RTE_EXTMK

 DIRS-y += bridge

+CFLAGS += -Wno-switch
+
 .PHONY: all clean $(DIRS-y)

 all: $(DIRS-y)
diff --git a/examples/netmap_compat/bridge/Makefile 
b/examples/netmap_compat/bridge/Makefile
index 50d96e8..f91a634 100644
--- a/examples/netmap_compat/bridge/Makefile
+++ b/examples/netmap_compat/bridge/Makefile
@@ -56,6 +56,7 @@ SRCS-y += compat_netmap.c

 CFLAGS += -O3 -I$(SRCDIR)/../lib -I$(SRCDIR)/../netmap
 CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-switch

 include $(RTE_SDK)/mk/rte.extapp.mk

diff --git a/examples/vhost/Makefile b/examples/vhost/Makefile
index f45f83f..2c1f0ba 100644
--- a/examples/vhost/Makefile
+++ b/examples/vhost/Makefile
@@ -53,6 +53,7 @@ SRCS-y := main.c virtio-net.c vhost-net-cdev.c

 CFLAGS += -O2 -I/usr/local/include -D_FILE_OFFSET_BITS=64 -Wno-unused-parameter
 CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-switch
 LDFLAGS += -lfuse

 include $(RTE_SDK)/mk/rte.extapp.mk
-- 
1.9.1



[dpdk-dev] [PATCH 4/4] rte.extvars.mk: allow user to override RTE_SDK_BIN

2014-07-20 Thread Matthew Hall
Without this patch it is impossible to compile the examples if you have
compiled the DPDK into the $(RTE_SDK)/build directory, or any other one
besides $(RTE_SDK)/$(RTE_TARGET).

Signed-off-by: Matthew Hall 
---
 mk/rte.extvars.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mk/rte.extvars.mk b/mk/rte.extvars.mk
index 3e5a990..fc583ce 100644
--- a/mk/rte.extvars.mk
+++ b/mk/rte.extvars.mk
@@ -51,7 +51,7 @@ endif
 RTE_EXTMK ?= $(RTE_SRCDIR)/Makefile
 export RTE_EXTMK

-RTE_SDK_BIN := $(RTE_SDK)/$(RTE_TARGET)
+RTE_SDK_BIN ?= $(RTE_SDK)/$(RTE_TARGET)

 #
 # Output files wil go in a separate directory: default output is
-- 
1.9.1



[dpdk-dev] [PATCH 1/4] l3fwd: some functions are unused in l3fwd-acl

2014-07-21 Thread Matthew Hall
The same code is used in l3fwd and l3fwd-acl. When it is reused in l3fwd-acl, 
a packet-processing function from the original l3fwd is no longer used.
-- 
Sent from my mobile device.

On July 21, 2014 6:44:41 AM PDT, Thomas Monjalon  
wrote:
>Hi,
>
>2014-07-20 20:47, Matthew Hall:
>> +CFLAGS += -Wno-unused-function
>
>I think it's better to fix the code instead of removing a warning.
>If there is a very good reason to not do it, it would appear in the
>log.
>
>Thanks



[dpdk-dev] [PATCH 2/4] virtio-net.c: incorrect parens around equality check

2014-07-22 Thread Matthew Hall
On Tue, Jul 22, 2014 at 03:14:51PM +0200, Thomas Monjalon wrote:
> Hi Matthew,
> 
> I think that patches 1, 3 and 4 need some rework but this one is valid
> and has no relation with other ones in the serie. So it can be integrated now.

Great thanks!

I'll work a bit more on the others when I've got time.

Discovered them during a weekend Open Source project.

Matthew.


[dpdk-dev] Performance - linking against DPDK shared vs static libraries

2014-07-23 Thread Matthew Hall
On Wed, Jul 23, 2014 at 09:43:49PM +, Kavanagh, Mark B wrote:
> I take it ... that the performance drop when using shared libraries is 
> expected behavior?

s/expected behavior/an unavoidable consequence/g

;)

Matthew Hall.


[dpdk-dev] symbol conflicts between netinet/in.h, arpa/inet.h, and rte_ip.h

2014-07-24 Thread Matthew Hall
Hello,

I ran into some weird symbol conflicts between system netinet/in.h and DPDK 
rte_ip.h. They have a lot of duplicated definitions for stuff like IPPROTO_IP 
and so on. This breaks when you want to use inet_pton from arpa/inet.h, 
because it includes netinet/in.h to define struct in_addr.

Thus with all the conflicts it's impossible to use a DPDK IP struct instead of 
all the system's sockaddr stuff, to store a value from the system copy of 
inet_pton. This would be a common operation if, for example, you want to 
configure all the IP addresses on your box from a JSON file, which is what I 
was doing.

The DPDK kludged around it internally by using a file called 
cmdline_parse_ipaddr.c with private copies of these functions. But, in my 
opinion, it very unwisely marked all of the functions static except 
cmdline_parse_ipaddr, which only works with the DPDK's proprietary argument 
handling and not with anything in a different format the user might have.

So, it would be a big help for users if the macros in the librte_net files 
checked whether the symbols already existed, or if there were sub-header files 
available to grab only the non-conflicting symbols, or if the inet_pton and 
inet_ntop code inside the cmdline lib were factored into a proper .h where 
users can access it. It would also help if there were a less ugly equivalent 
to struct sockaddr which let you work with IP addresses a bit more easily, 
such as something like this:

struct ip4_addr {
uint32_t addr;
};

typedef struct ip4_addr ip4_addr;

struct ip6_addr {
uint8_t addr[16];
};

typedef struct ip6_addr ip6_addr;

struct ip_addr {
uint8_t family;
uint8_t prefix;
union {
struct ip4_addr ipv4;
struct ip6_addr ipv6;
};
};

I had to create a bunch of duplicate code to handle it in my project, since 
the DPDK marked its copies of all these functions as "secret" and didn't make 
a .h for them. If any of it is useful I am happy to donate it, although I 
don't think I've got quite enough experience with this specific part of the 
DPDK to code it up all by myself.

Thanks,
Matthew.


[dpdk-dev] [PATCH 3/5] i40e: support selecting hash functions

2014-07-24 Thread Matthew Hall
On Thu, Jul 24, 2014 at 09:59:23AM +0200, Thomas Monjalon wrote:
> Is it really a good idea to configure this kind of thing at build time?
> Maybe yes, I'm not sure.

Whether it's safe to set at runtime probably depends on what happens to the 
card when it gets changed. Do you have to reset the card or the port? Or is 
it OK if it's more dynamic?

Matthew.


[dpdk-dev] [PATCH 3/5] i40e: support selecting hash functions

2014-07-24 Thread Matthew Hall
If no reboot of the card is needed, then it's probably better to add it to 
one of the ethtool-style APIs...
-- 
Sent from my mobile device.

On July 24, 2014 1:07:37 AM PDT, Thomas Monjalon  
wrote:
>2014-07-24 01:01, Matthew Hall:
>> On Thu, Jul 24, 2014 at 09:59:23AM +0200, Thomas Monjalon wrote:
>> > Is it really a good idea to configure this kind of thing at build
>time?
>> > Maybe yes, I'm not sure.
>> 
>> Whether it's safe to set at runtime probably depends what happens to
>the card 
>> if it gets changed. Do you have to reset the card or the port? Or is
>it OK if 
>> it's more dynamic?
>
>No, we can change configuration with rte_eth_dev_configure() before
>initializing port. So it is truly configurable.
>Requiring recompilation means it's not really configurable between 2
>runs.
>And it breaks binary packaging for Linux distributions.
>
>Many things in DPDK are configured at build time. But we should wonder
>if
>it's really a good design.



[dpdk-dev] symbol conflicts between netinet/in.h, arpa/inet.h, and rte_ip.h

2014-07-24 Thread Matthew Hall
On Thu, Jul 24, 2014 at 10:55:59PM +, Antti Kantee wrote:
> In my experience from years of fighting with more or less this exact same
> problem -- the fight is now thankfully over but the scars remain -- you
> either want to expose a complete set of types and provide support for
> everything, or you want to expose nothing.

I don't have a problem with this approach. Implementing it would require the 
DPDK to provide user-accessible functions for creating MAC addresses, IPs, 
CIDRs, and TCP/UDP port numbers. Many of the required pieces are hiding 
inside *cmdline* code where it's impossible for anybody to access them.

Matthew.


[dpdk-dev] symbol conflicts between netinet/in.h, arpa/inet.h, and rte_ip.h

2014-07-24 Thread Matthew Hall
I don't know if it will work right on both Linux and BSD, and/or if they will 
100% agree with the libc / glibc values compiled into the system's .so files. 
That's the risk you run if you don't have more complete support in the DPDK 
itself for these features.
-- 
Sent from my mobile device.

On July 24, 2014 6:12:18 PM PDT, "Wu, Jingjing"  
wrote:
>Hello,
>
>We also notice these conflicts, we just planned to fix it in our new
>feature development. The proposal is like:
>
>#ifndef _NETINET_IN_H
>#ifndef _NETINET_IN_H_
>
>#define IPPROTO_IP 0
> ... ... 
>#define IPPROTO_MAX  256
>
>#endif
>#endif
>
>Do you think it is a good idea?
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Antti Kantee
>> Sent: Friday, July 25, 2014 6:56 AM
>> To: Matthew Hall; dev at dpdk.org
>> Subject: Re: [dpdk-dev] symbol conflicts between netinet/in.h,
>arpa/inet.h, and rte_ip.h
>> 
>> On 24/07/14 07:59, Matthew Hall wrote:
>> > Hello,
>> >
>> > I ran into some weird symbol conflicts between system netinet/in.h and DPDK
>> > rte_ip.h. They have a lot of duplicated definitions for stuff like IPPROTO_IP
>> > and so on. This breaks when you want to use inet_pton from arpa/inet.h,
>> > because it includes netinet/in.h to define struct in_addr.
>> 
>> I would namespace the definitions in DPDK, i.e. make them
>> DPDK_IPPROTO_FOO etc.
>> 
>> > Thus with all the conflicts it's impossible to use a DPDK IP struct instead of
>> > all the system's sockaddr stuff, to store a value from the system copy of
>> > inet_pton. This would be a common operation if, for example, you want to
>> > configure all the IP addresses on your box from a JSON file, which is what I
>> > was doing.
>> >
>> > The DPDK kludged around it internally by using a file called
>> > cmdline_parse_ipaddr.c with private copies of these functions. But it in my
>> > opinion very unwisely marked all of the functions as static except for
>> > cmdline_parse_ipaddr, which only works on the DPDK's proprietary argument
>> > handling, and not with anything the user might have which is a different
>> > format.
>> 
>> In my experience from years of fighting with more or less this exact
>> same problem -- the fight is now thankfully over but the scars remain --
>> you either want to expose a complete set of types and provide support
>> for everything, or you want to expose nothing.  Approaches where you use
>> cute definitions and reuse some host routines is like asking for an
>> audience with Tyranthraxus when armed with a kitten.  It's doubly so
>> if you don't have to and do it anyway.
>> 
>> > So, it would be a big help for users if the macros in librte_net files would
>> > check if the symbols already existed, or if they had subheader files available
>> > to grab only non conflicting symbols, or if they would make a proper .h and
>> > factor all the inet_pton and inet_ntop inside the cmdline lib into a place
>> > where users can access them. It would also be a help if they had a less ugly
>> > equivalent to struct sockaddr, which let you work with IP addresses a bit more
>> > easily, such as something like this:
>> 
>> Again, I recommend steering away from any tightrope approaches that
>> "know" which types are non-conflicting, or pick out half-and-half from
>> the host and IP stack.  "Do, or do not, there is no half-and-half"



[dpdk-dev] symbol conflicts between netinet/in.h, arpa/inet.h, and rte_ip.h

2014-07-25 Thread Matthew Hall
If the bare metal mode is getting yanked, then I think we can go with Antti's 
advice and just yank the conflicting symbols and use the system versions.
-- 
Sent from my mobile device.

On July 25, 2014 7:40:02 AM PDT, Antti Kantee  wrote:
>On 25/07/14 10:43, Thomas Monjalon wrote:
>>> On 24/07/14 07:59, Matthew Hall wrote:
>>>> I ran into some weird symbol conflicts between system netinet/in.h and DPDK
>>>> rte_ip.h. They have a lot of duplicated definitions for stuff like IPPROTO_IP
>>>> and so on. This breaks when you want to use inet_pton from arpa/inet.h,
>>>> because it includes netinet/in.h to define struct in_addr.
>> [...]
>>> Again, I recommend steering away from any tightrope approaches that
>>> "know" which types are non-conflicting, or pick out half-and-half from
>>> the host and IP stack.  "Do, or do not, there is no half-and-half"
>>
>> The general problem here is that DPDK is conflicting with libc.
>> So the obvious question would be: "why DPDK needs to redefine libc stuff"?
>> I don't see any obvious answer since bare metal is planned to be removed.
>> (see http://dpdk.org/ml/archives/dev/2014-June/003868.html)
>
>One reason is if you want DPDK to be a portable network programming
>environment.  Especially in that case you do not want definitions based
>on hackish assumptions of some particular version of some particular
>host implementation.  However, I'm not trying to argue if DPDK should or
>shouldn't be that, just that you should either dramatically improve the
>current implementation or nuke it.



[dpdk-dev] [PATCHv5] librte_acl make it build/work for 'default' target

2014-09-02 Thread Matthew Hall
On Wed, Sep 03, 2014 at 03:29:16AM +0200, Thomas Monjalon wrote:
> > > Make ACL library to build/work on 'default' architecture:

Upon reading all the steps taken to implement the new multi-arch version of 
the code, I had a funny feeling where each time I asked, "But... what if X???" 
the next step addressed X, until I had gone through all my questions.

Now I am looking forward to using that code in the future.

Matthew.


[dpdk-dev] TCP/IP stack for DPDK

2014-09-08 Thread Matthew Hall
On Tue, Sep 09, 2014 at 08:49:44AM +0800, zimeiw wrote:
> I have porting major FreeBSD tcp/ip stack to dpdk. new tcp/ip stack is based 
> on dpdk rte_mbuf, rte_ring, rte_memory and rte_table. it is faster to 
> forwarding packets.

Hello,

This is awesome work to be doing and badly needed to use DPDK for any L4 
purposes where it is very limited. I'll be following your progress.

You didn't mention your name. Could you compare your work with 
https://github.com/rumpkernel/dpdk-rumptcpip/ , talk about behavior / 
performance, and say how long you think it'll take? I'm curious to hear 
some more comments.

I'm implementing an RX-side very basic stack myself... but I'm not using BSD 
standard APIs or doing TX-side like yours will have.

Matthew.


[dpdk-dev] TCP/IP stack for DPDK

2014-09-08 Thread Matthew Hall
On Tue, Sep 09, 2014 at 06:47:48AM +, Zhang, Helin wrote:
> That means your great works under GPL/LGPL license will not occur in DPDK 
> main line, as it is always BSD license.
> 
> Regards,
> Helin

However despite this issue, there are some cases where the Linux stack is 
greatly superior to the BSD one although normally the opposite is the case... 
AF_NETLINK for configuring 10,000+ IP addresses, especially for L4-L7 
performance testing, would be one possible example of this. Another potential 
example would be the BPF JIT compiler if you want to combine BPF filters with 
DPDK (something I'm doing right now in my own code actually).

Matthew.


[dpdk-dev] Defaults for rte_hash

2014-09-09 Thread Matthew Hall
Hello,

I was looking at the code which inits rte_hash objects in examples/l3fwd. It's 
using approx. 1M to 4M hash 'entries' depending on 32-bit vs 64-bit, but it's 
setting the 'bucket_entries' to just 4.

Normally I'm used to using somewhat deeper hash buckets than that... it seems 
like having a zillion little tiny hash buckets would cause more TLB pressure 
and memory overhead... or does 4 get shifted / exponentiated into 2**4 ?

The documentation in http://dpdk.org/doc/api/structrte__hash__parameters.html 
and http://dpdk.org/doc/api/rte__hash_8h.html isn't that clear... is there a 
better place to look for this?

In my case I'm looking to create a table of 4M or 8M entries, containing 
tables of security threat IPs / domains, to be detected in the traffic. So it 
would be good to have some understanding how not to waste a ton of memory on a 
table this huge without making it run super slow either.

Did anybody have some experience with how to get this right?

Another thing... the LPM table uses 16-bit Hop IDs. But I would probably have 
more than 64K CIDR blocks of badness on the Internet available to me for 
analysis. How would I cope with this, besides just letting some attackers 
escape unnoticed? ;)

Have we got some kind of structure which allows a greater number of CIDRs even 
if it's not quite as fast?

Thanks,
Matthew.


[dpdk-dev] Defaults for rte_hash

2014-09-09 Thread Matthew Hall
On Tue, Sep 09, 2014 at 11:42:40AM +, De Lara Guarch, Pablo wrote:
> That 4 is not shifted, so it is actually 4 entries/bucket. Actually, the 
> maximum number of entries you can use is 16, as bucket will be as big as a 
> cache line. However, regardless the number of entries, memory size will 
> remain the same, but using 4 entries/bucket, with 16-byte key, all keys 
> stored for a bucket will fit in a cache line, so performance looks to be 
> better in this case (although a non-optimal hash function could lead not to 
> be able to store all keys, as chances to fill a bucket are higher). Anyway, 
> for this example, 4 entries/bucket looks a good number to me.

So, a general purpose hash usually has some kind of conflict resolution when a 
bucket is full rather than just tossing out entries. It could be open 
addressing, chaining, secondary hashing, etc.

If I'm putting security indicators into a bucket and the buckets just toss 
stuff out without warning that's a security problem. Same thing could be true 
for firewall tables.

Also, if we're assuming a 16-byte key, what happens when I want to do matching 
against www.badness.com or www.this-is-a-really-long-malware-domain.net ?

Did anybody have a performant general purpose hash table for DPDK that doesn't 
have problems with bigger keys or depth issues in a bucket?

Matthew.


[dpdk-dev] TCP/IP stack for DPDK

2014-09-09 Thread Matthew Hall
On Tue, Sep 09, 2014 at 07:54:19AM -0700, Stephen Hemminger wrote:
> Porting Linux stack to DPDK opens up a licensing can of worms.
> Linux code is GPLv2, and DPDK code is BSD. Any combination of the two would
> end up being covered by the Linux GPLv2 license.

It would be a can of worms for a closed-source app. Which is why some others 
have used the BSD stack. But it doesn't mean it isn't useful code.

Matthew.


[dpdk-dev] TCP/IP stack for DPDK

2014-09-09 Thread Matthew Hall
On Tue, Sep 09, 2014 at 08:00:32AM -0700, Jim Thompson wrote:
> BPF JIT, or even pflua[1] should be straight-forward to put on top of DPDK.  
> (It's straight-forward to do on top of netmap.)
> 
> jim

The pflua guys made a user-space copy of Linux BPF JIT. I'm planning to use 
that because it was almost as fast as pflua with a lot fewer usage headaches 
and dependencies.

I'm making an MIT licensed app... so it isn't an issue for me personally if 
there is some GPL2 Linux code present. I don't think anybody made a non-rump 
version of the BSD one yet or I'd use that... I'm trying not to stray too far 
from the app's original purposes until it has some working features present.

Until that time comes, I just started out with libpcap offline mode BPF for 
development purposes because it's standard and already available, and allows 
operations upon raw packet pointers with no issues at all.

Matthew.


[dpdk-dev] TCP/IP stack for DPDK

2014-09-09 Thread Matthew Hall
On Tue, Sep 09, 2014 at 10:30:01PM +0100, Alexander Nasonov wrote:
> sys/net/bpfjit.c in NetBSD should be very easy to adapt to Linux.
> I was often testing it on Linux in userspace (without mbuf support).
> At the moment, I'm only allowed to work on some NetBSD projects and
> I can't adapt bpfjit to anything outside of NetBSD but when I last
> compiled bpfjit on Linux, it took me about a minute to fix compilation.
> 
> Please try github version, it's not up-to-date but it worked on Linux:
> 
> https://github.com/alnsn/bpfjit
> 
> Alex

Alex,

You rock, thanks for supplying this, I'll be sure to use it along with 
upstream changes from BSD to get a friendlier license for users of my code, 
whoever they might eventually be.

If I forked this from you and updated it to the latest code periodically for 
performance, security, and features, would you accept the pull requests?

Matthew.


[dpdk-dev] [PATCH] librte_log: add function to retrieve log_level

2014-09-14 Thread Matthew Hall
Signed-off-by: Matthew Hall 
---
 lib/librte_eal/common/eal_common_log.c  | 7 +++
 lib/librte_eal/common/include/rte_log.h | 6 ++
 2 files changed, 13 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_log.c b/lib/librte_eal/common/eal_common_log.c
index e4df0b9..d979f28 100644
--- a/lib/librte_eal/common/eal_common_log.c
+++ b/lib/librte_eal/common/eal_common_log.c
@@ -176,6 +176,13 @@ rte_set_log_level(uint32_t level)
rte_logs.level = (uint32_t)level;
 }

+/* Get global log level */
+uint32_t
+rte_get_log_level()
+{
+   return rte_logs.level;
+}
+
 /* Set global log type */
 void
 rte_set_log_type(uint32_t type, int enable)
diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h
index 565415a..7f1c2f9 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -130,6 +130,12 @@ int rte_openlog_stream(FILE *f);
 void rte_set_log_level(uint32_t level);

 /**
+ * Get the global log level.
+ *
+ */
+uint32_t rte_get_log_level(void);
+
+/**
  * Enable or disable the log type.
  *
  * @param type
-- 
1.9.1



[dpdk-dev] [PATCH] librte_log: add function to retrieve log_level

2014-09-14 Thread Matthew Hall
On Sun, Sep 14, 2014 at 01:34:46AM -0700, Matthew Hall wrote:
> Signed-off-by: Matthew Hall 
> ---
>  lib/librte_eal/common/eal_common_log.c  | 7 +++
>  lib/librte_eal/common/include/rte_log.h | 6 ++
>  2 files changed, 13 insertions(+)

I forgot to mention in the comments, this patch is helpful when you want 
outside code to cooperate with and respect log levels set in DPDK. Then you 
can avoid using duplicate incompatible log code in the DPDK and non-DPDK parts 
of the app.

Matthew.


[dpdk-dev] [PATCH] librte_log: add function to retrieve log_level

2014-09-15 Thread Matthew Hall
Thanks for the ack Bruce! Used this one to clean up a lot of grubby app-side 
code and I hate forking open source for too long if it can be avoided.

Matthew.
-- 
Sent from my mobile device.

On September 15, 2014 1:14:57 AM PDT, "Richardson, Bruce"  wrote:
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matthew Hall
>> Sent: Sunday, September 14, 2014 9:35 AM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH] librte_log: add function to retrieve log_level
>> 
>> Signed-off-by: Matthew Hall 
>Acked-by: Bruce Richardson 
>
>> ---
>>  lib/librte_eal/common/eal_common_log.c  | 7 +++
>>  lib/librte_eal/common/include/rte_log.h | 6 ++
>>  2 files changed, 13 insertions(+)
>> 
>> diff --git a/lib/librte_eal/common/eal_common_log.c
>> b/lib/librte_eal/common/eal_common_log.c
>> index e4df0b9..d979f28 100644
>> --- a/lib/librte_eal/common/eal_common_log.c
>> +++ b/lib/librte_eal/common/eal_common_log.c
>> @@ -176,6 +176,13 @@ rte_set_log_level(uint32_t level)
>>  rte_logs.level = (uint32_t)level;
>>  }
>> 
>> +/* Get global log level */
>> +uint32_t
>> +rte_get_log_level()
>> +{
>> +return rte_logs.level;
>> +}
>> +
>>  /* Set global log type */
>>  void
>>  rte_set_log_type(uint32_t type, int enable)
>> diff --git a/lib/librte_eal/common/include/rte_log.h
>> b/lib/librte_eal/common/include/rte_log.h
>> index 565415a..7f1c2f9 100644
>> --- a/lib/librte_eal/common/include/rte_log.h
>> +++ b/lib/librte_eal/common/include/rte_log.h
>> @@ -130,6 +130,12 @@ int rte_openlog_stream(FILE *f);
>>  void rte_set_log_level(uint32_t level);
>> 
>>  /**
>> + * Get the global log level.
>> + *
>> + */
>> +uint32_t rte_get_log_level(void);
>> +
>> +/**
>>   * Enable or disable the log type.
>>   *
>>   * @param type
>> --
>> 1.9.1



[dpdk-dev] [PATCH] librte_log: add function to retrieve log_level

2014-09-15 Thread Matthew Hall
The real effort was the client-side cleanup. I had to get rid of pages of logs 
for every packet flowing through and I hate yanking out logs... but some of 
them call weird functions like memdump and pktmbuf_dump. There's no good way to 
clean those up without a function to check the current loglevel. And reaching 
into private structs to get it seemed like an uncivilized thing to do.

Good news is, another couple weeks and some Coverity patches and I'll have 
shareable code to hand out.
-- 
Sent from my mobile device.

On September 15, 2014 1:20:32 AM PDT, "Richardson, Bruce"  wrote:
>> -Original Message-
>> From: Matthew Hall [mailto:mhall at mhcomputing.net]
>> Sent: Monday, September 15, 2014 9:17 AM
>> To: Richardson, Bruce; dev at dpdk.org
>> Subject: RE: [dpdk-dev] [PATCH] librte_log: add function to retrieve log_level
>> 
>> Thanks for the ack Bruce! Used this one to clean up a lot of grubby app-side
>> code and I hate forking open source for too long if it can be avoided.
>
>No problem. With such huge patches as this the code review takes many
>hours of strenuous effort! :-)
>
>/Bruce
>> 
>> Matthew.
>> --
>> Sent from my mobile device.
>> 
>> On September 15, 2014 1:14:57 AM PDT, "Richardson, Bruce"
>>  wrote:
>> >> -Original Message-
>> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matthew Hall
>> >> Sent: Sunday, September 14, 2014 9:35 AM
>> >> To: dev at dpdk.org
>> >> Subject: [dpdk-dev] [PATCH] librte_log: add function to retrieve log_level
>> >>
>> >> Signed-off-by: Matthew Hall 
>> >Acked-by: Bruce Richardson 
>> >
>> >> ---
>> >>  lib/librte_eal/common/eal_common_log.c  | 7 +++
>> >>  lib/librte_eal/common/include/rte_log.h | 6 ++
>> >>  2 files changed, 13 insertions(+)
>> >>
>> >> diff --git a/lib/librte_eal/common/eal_common_log.c
>> >> b/lib/librte_eal/common/eal_common_log.c
>> >> index e4df0b9..d979f28 100644
>> >> --- a/lib/librte_eal/common/eal_common_log.c
>> >> +++ b/lib/librte_eal/common/eal_common_log.c
>> >> @@ -176,6 +176,13 @@ rte_set_log_level(uint32_t level)
>> >>   rte_logs.level = (uint32_t)level;
>> >>  }
>> >>
>> >> +/* Get global log level */
>> >> +uint32_t
>> >> +rte_get_log_level()
>> >> +{
>> >> + return rte_logs.level;
>> >> +}
>> >> +
>> >>  /* Set global log type */
>> >>  void
>> >>  rte_set_log_type(uint32_t type, int enable)
>> >> diff --git a/lib/librte_eal/common/include/rte_log.h
>> >> b/lib/librte_eal/common/include/rte_log.h
>> >> index 565415a..7f1c2f9 100644
>> >> --- a/lib/librte_eal/common/include/rte_log.h
>> >> +++ b/lib/librte_eal/common/include/rte_log.h
>> >> @@ -130,6 +130,12 @@ int rte_openlog_stream(FILE *f);
>> >>  void rte_set_log_level(uint32_t level);
>> >>
>> >>  /**
>> >> + * Get the global log level.
>> >> + *
>> >> + */
>> >> +uint32_t rte_get_log_level(void);
>> >> +
>> >> +/**
>> >>   * Enable or disable the log type.
>> >>   *
>> >>   * @param type
>> >> --
>> >> 1.9.1



[dpdk-dev] Userland IP stack for DPDK

2014-09-17 Thread Matthew Hall
On Thu, Sep 18, 2014 at 07:13:43AM +0300, Vadim Suraev wrote:
> Hi,
> I've published the source code of Linux kernel IP stack ported to user space
> and integrated with DPDK (1.6 currently).
> I includes also some example applications & test scripts as well as
> documentation
> about what & how is done and how to use & adapt it.
> The source:
> git at github.com:vadimsu/ipaugenblick.git
> 
> two branches - master & muticore
> Documentation:
> - README at the root of the project
> ipaugenblick.net
> Please feel free to ask questions you may have
> Regards,
>  Vadim.

Hi Vadim,

The docs you wrote are very good. Not many developers offer support over Skype 
either, this is quite unique and very cool! :)

Definitely going to find some time to experiment with it. In a few weeks I'll 
release my code too.

Matthew.


[dpdk-dev] [PATCH] librte_log: add function to retrieve log_level

2014-09-18 Thread Matthew Hall
On Thu, Sep 18, 2014 at 03:29:11PM +0200, Thomas Monjalon wrote:
> void is missing here

Ah yes, sorry for missing it.

Thanks for correcting and applying, glad to have it available upstream.

Matthew.


[dpdk-dev] compile error with linuxapp-clang target on Fedora 20 with 3.15.10 kernel

2014-09-22 Thread Matthew Hall
I fixed some of the clang errors a few weeks ago. But some of my patches got 
sent back due to issues seen by others and I didn't have time to fix them yet.
-- 
Sent from my mobile device.

On September 22, 2014 6:18:51 AM PDT, Neil Horman  
wrote:
>On Mon, Sep 22, 2014 at 09:36:33AM +, Richardson, Bruce wrote:
>> Hi all,
>> 
>> just looking to see if anyone has any suggestions to help me debug an
>> issue I'm seeing here. Basically, the clang target is no longer working
>> for me on Fedora 20 - due to errors when compiling up the kernel
>> modules. The interesting thing is that the gcc target works fine, while
>> the clang target doesn't - despite the fact that gcc is used as the
>> compiler for the kernel modules in both builds. Something else about
>> the clang target is affecting the kernel compile.
>> 
>> From my investigation, it looks like the compiler flags used in both
>> cases are different, but I'm not sure why. The output of compiling up
>> the first of the kni files is shown below, first for a regular gcc
>> target and secondly for the clang target. From what I read, some
>> initial support for clang compiler went into the 3.15 kernels - is it
>> possible that clang is being incorrectly detected as the kernel
>> compiler by the kernel build system in the second case?
>> 
>> Regards,
>> /Bruce
>> 
>>  GCC TARGET COMPILE
>> == Build lib/librte_eal/linuxapp/kni
>> make -C /usr/src/kernels/3.15.10-201.fc20.x86_64 \
>> KBUILD_SRC=/usr/src/kernels/3.15.10-201.fc20.x86_64 \
>>
>KBUILD_EXTMOD="/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni"
>-f /usr/src/kernels/3.15.10-201.fc20.x86_64/Makefile \
>> 
>> test -e include/generated/autoconf.h -a -e include/config/auto.conf
>|| (\
>> echo >&2;   \
>> echo >&2 "  ERROR: Kernel configuration is invalid.";   \
>> echo >&2 " include/generated/autoconf.h or
>include/config/auto.conf are missing.";\
>> echo >&2 " Run 'make oldconfig && make prepare' on kernel src
>to fix it.";  \
>> echo >&2 ;  \
>> /bin/false)
>> mkdir -p
>/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/.tmp_versions
>; rm -f
>/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/.tmp_versions/*
>> make -f
>/usr/src/kernels/3.15.10-201.fc20.x86_64/scripts/Makefile.build
>obj=/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni
>>   gcc
>-Wp,-MD,/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/.ixgbe_main.o.d
>-nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.8.3/include
>-I/usr/src/kernels/3.15.10-201.fc20.x86_64/arch/x86/include
>-Iarch/x86/include/generated 
>-I/usr/src/kernels/3.15.10-201.fc20.x86_64/include -Iinclude
>-I/usr/src/kernels/3.15.10-201.fc20.x86_64/arch/x86/include/uapi
>-Iarch/x86/include/generated/uapi
>-I/usr/src/kernels/3.15.10-201.fc20.x86_64/include/uapi
>-Iinclude/generated/uapi -include
>/usr/src/kernels/3.15.10-201.fc20.x86_64/include/linux/kconfig.h  
>-I/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni
>-D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs
>-fno-strict-aliasing -fno-common -Werror-implicit-function-declaration
>-Wno-format-security -fno-delete-null-pointer-checks -O2 -m64 -mno-mmx
>-mno-sse -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3
>-mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time
>-maccumulate-outgoing-args -DCONFIG_AS_CFI=1
>-DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1
>-DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1
>-DCONFIG_AS_AVX2=1 -pipe -Wno-sign-compare
>-fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow
>-mno-avx -Wframe-larger-than=2048 -fstack-protector-strong
>-Wno-unused-but-set-variable -fno-omit-frame-pointer
>-fno-optimize-sibling-calls -fno-var-tracking-assignments -g -pg
>-mfentry -DCC_USING_FENTRY -Wdeclaration-after-statement
>-Wno-pointer-sign -fno-strict-overflow -fconserve-stack
>-Werror=implicit-int -Werror=strict-prototypes -DCC_HAVE_ASM_GOTO  
>-I/home/bruce/mbuf_rework/lib/librte_eal/linuxapp/kni --param
>max-inline-insns-single=50  
>-I/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/include  
>-I/home/bruce/mbuf_rework/lib/librte_eal/linuxapp/kni/ethtool/ixgbe  
>-I/home/bruce/mbuf_rework/lib/librte_eal/linuxapp/kni/ethtool/igb
>-include
>/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/include/rte_config.h
>-Wall -Werror  -DMODULE  -D"KBUILD_STR(s)=#s"
>-D"KBUILD_BASENAME=KBUILD_STR(ixgbe_main)" 
>-D"KBUILD_MODNAME=KBUILD_STR(rte_kni)" -c -o
>/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/ixgbe_main.o
>/home/bruce/mbuf_rework/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/kni/ixgbe_main.c
>> 
>>  CLANG TARGET COMPILE (MODULE COMPILED USING GCC)
>>

[dpdk-dev] compile error with linuxapp-clang target on Fedora 20 with 3.15.10 kernel

2014-09-22 Thread Matthew Hall
On Mon, Sep 22, 2014 at 04:05:29PM -0400, Neil Horman wrote:
> On Mon, Sep 22, 2014 at 12:23:36PM -0700, Matthew Hall wrote:
> > I fixed some of the clang errors a few weeks ago. But some of my patches 
> > got sent back due to issues seen by others and I didn't have time to fix 
> > them yet.
> Can you elaborate on the specific issue here?
> Neil

Sure...

Have a look at this thread. With this, I got it compiling fine with Clang on 
Ubuntu 14.04 LTS.

Some of your stuff was funky kernel problems... I probably didn't get that as 
I was using an earlier kernel release.

One of the patches was merged as it was trivial but the others involved 
disabling some warnings on certain examples... but people said they preferred 
using ifdef's instead to fix them, which I didn't get a chance to do yet.

Maybe we could try and make all of these clang fixes happen together. I really 
value the better error messages, I can fix bugs much quicker with all of 
those.

Matthew.


[dpdk-dev] LRU using DPDK 1.7

2014-09-22 Thread Matthew Hall
On Tue, Sep 23, 2014 at 01:08:21AM +, Saha, Avik (AWS) wrote:
> I was wondering if there is way to use the rte_table_hash_lru without 
> building a pipeline - Basically using the same hash table like functionality 
> of add, delete and lookup without setting up a pipeline and connect it to 
> ports etc.

I've been finding that rte_hash is designed only for some very specialized 
purposes. It doesn't work well if you use unexpected sizes of keys or want 
behavior that isn't precisely doing what the designers of the hash used it 
for... it's not very general-purpose.

I did try to point out one example of the issue but I didn't get much response 
yet to my questions about its limitations and whether a more general-purpose 
table was available, or at least some discussion what rte_hash is for and what 
it's not for.

Matthew.


[dpdk-dev] LRU using DPDK 1.7

2014-09-22 Thread Matthew Hall
On Tue, Sep 23, 2014 at 03:43:59AM +, Saha, Avik (AWS) wrote:

> So with DPDK 1.7 there are 2 separate implementations - one is the rte_hash 
> which does not support LRU (at least to my understanding - I could be wrong 
> here) and then there is the librte_table library which has support for LRU 
> in a hash table. I m a little confused as to which one you are referring to 
> Matthew.

I'm referring to the fact that rte_hash'es of all types are kind of 
special-purpose. So it's important to clarify what kind of data you're 
planning to match, using which of the hashes, and which strategy.

In my case I wanted to use them for variable-length data which will likely not 
fit onto a cacheline such as URL's, and they don't work for this application.

Matthew.


[dpdk-dev] compile error with linuxapp-clang target on Fedora 20 with 3.15.10 kernel

2014-09-23 Thread Matthew Hall
I fixed one main libs bug which blocked compile that was trivial and got it 
applied. I had examples working too but using an impolite method of doing so.

As for the latest kernel stuff, it sounds like we have to get a hand from LKML 
or a sublist to figure it out, eh? Doesn't seem like it's in the DPDK code.

Matthew.
-- 
Sent from my mobile device.

On September 23, 2014 2:59:47 AM PDT, Bruce Richardson  wrote:
>On Mon, Sep 22, 2014 at 03:12:43PM -0700, Matthew Hall wrote:
>> On Mon, Sep 22, 2014 at 04:05:29PM -0400, Neil Horman wrote:
>> > On Mon, Sep 22, 2014 at 12:23:36PM -0700, Matthew Hall wrote:
>> > > I fixed some of the clang errors a few weeks ago. But some of my
>> > > patches got sent back due to issues seen by others and I didn't have
>> > > time to fix them yet.
>> > Can you elaborate on the specific issue here?
>> > Neil
>> 
>> Sure...
>> 
>> Have a look at this thread. With this, I got it compiling fine with Clang on
>> Ubuntu 14.04 LTS.
>> 
>> Some of your stuff was funky kernel problems... I probably didn't get that as
>> I was using an earlier kernel release.
>> 
>> One of the patches was merged as it was trivial but the others involved
>> disabling some warnings on certain examples... but people said they preferred
>> using ifdef's instead to fix them, which I didn't get a chance to do yet.
>> 
>> Maybe we could try and make all of these clang fixes happen together. I really
>> value the better error messages, I can fix bugs much quicker with all of
>> those.
>> 
>> Matthew.
>
>"make examples" on all the examples has failed for some time, but the 
>compilation of the main libs used to work. I've pulled down a 3.14
>kernel 
>for fedora from koji and confirmed that building with 
>"RTE_KERNELDIR=/usr/src/kernels/3.14.9-200.fc20.x86_64/" works fine.
>It's 
>something that has changed in 3.15 and beyond that is causing clang
>flags to 
>get passed in to gcc. I've confirmed that 3.16 also doesn't work.
>
>/Bruce



[dpdk-dev] Can not init NIC after merge to DPDK 1.7 problem

2014-09-23 Thread Matthew Hall
On Tue, Sep 23, 2014 at 06:53:57PM +, Wang, Shawn wrote:
> Can someone share some light on what is magic of the dpdk Makefile to 
> correctly register the NIC type?

I had the same problem as a guy who began using it before the auto-reg, 
stopped a while, and began again after.

You have to pass the following GNU LD option:

--whole-archive

Matthew.
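In a GNU Make based app that might look like the following fragment (the PMD library names and the exact list are illustrative; they depend on your build):

```make
# Force the linker to keep every object in the PMD archives, so their
# constructor-based driver auto-registration still runs, then switch
# the behavior back off for the remaining libraries.
LDFLAGS += -Wl,--whole-archive \
           -lrte_pmd_ixgbe -lrte_pmd_e1000 \
           -Wl,--no-whole-archive
```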


[dpdk-dev] LD_PRELOAD libraries for DPDK to run unmodified applications with DPDK?

2014-09-24 Thread Matthew Hall
On Wed, Sep 24, 2014 at 01:10:32PM -0700, Malveeka Tewari wrote:
> 
> There is already a rump-kernel based  TCP/IP stack for DPDK
> https://github.com/rumpkernel/dpdk-rumptcpip/.
...
> But these solutions are again too heavy weight.

Try using this along with https://github.com/rumpkernel/rumprun-posix .

Matthew.


[dpdk-dev] DPDK Demos at IDF conference using DDIO

2014-09-25 Thread Matthew Hall
On Thu, Sep 25, 2014 at 09:11:24AM -0700, Jeff Shaw wrote:
> Intel(R) Data Direct I/O Technology (Intel(R) DDIO) is a feature introduced 
> with the Intel(R) Xeon(R) processor E5 family.
> 
> It has been around for several years and is available at least on all Xeon 
> E5 processors. DDIO is part of the platform, so any DPDK version can take 
> advantage of the feature.  There are several papers and videos available on 
> the Internet that can provide more details.

One difficulty I run into with a lot of these Intel accelerations... each one 
is described as an atomic entity independent of all the other possible 
accelerations. Nobody explains how to take all of them together to make a 
complete high-speed, low-latency packet processing solution from L1-L7.

It'd be nice to see an architecture level view of DPDK, along with the 
accelerations one could / should apply at each level, so there's some kind of 
checklist you can follow to be sure you used everything you should where you 
should. Otherwise you'll miss some stuff and waste the features you paid for.

Also, how is DDIO different from the previous DCA accelerations?

Thanks,
Matthew.


[dpdk-dev] DPDK Demos at IDF conference using DDIO

2014-09-25 Thread Matthew Hall
On Thu, Sep 25, 2014 at 07:27:21PM +, Anjali Kulkarni wrote:
> Actually, in the demo that I saw they had probably used as many of the
> accelerations as possible to get the kind of rates they described. Even if
> we could see (a documentation of) what all things they used in this
> particular application, it would help.
> From my discussions, it seemed as if there were some specific lookup APIs
> that they used to get better performance.
> 
> Anjali

Indeed it would be best if this stuff were documented first, then demoed. 
Otherwise it's hard to get reliably reproducible results.

In particular something which went all the way through the processing pipeline 
from Rx to Tx L1-L7.

Matthew.


[dpdk-dev] rc1 / call for review

2014-09-29 Thread Matthew Hall
On Mon, Sep 29, 2014 at 10:23:58PM +0200, Thomas Monjalon wrote:
>   - mbuf rework
>   - logs rework
>   - some eal cleanups

Hi Thomas,

I was curious, did we happen to know if any of these three changes affected 
the external API's much?

It would help us get some idea what to test and where to look, since mbuf, 
logs, and eal are probably the three most popular parts of DPDK for us app 
hackers to interact with regularly.

Thanks,
Matthew.


[dpdk-dev] rc1 / call for review

2014-09-29 Thread Matthew Hall
On Tue, Sep 30, 2014 at 06:52:45AM +0200, Thomas Monjalon wrote:
> You're right.
> During integration time, app hackers should be able to check the git history
> for these API changes.
> When it will be officially released, there will be some notes in the
> documentation to help porting applications.

It works for commercial apps where we have 40 hrs / week to look. But for my 
open source app I guess I just have to do step 1) compile, step 2) pray that 
it still works. ;)

Matthew.


[dpdk-dev] tools brainstorming

2015-04-08 Thread Matthew Hall
On Wed, Apr 08, 2015 at 11:16:03AM -0700, Stephen Hemminger wrote:
> I prefer the file just say that it is BSD or GPL and refer to license files 
> in the package. That way if something has to change it doesn't need a 
> massive license sweep

Hi guys,

I hope we're also enforcing some requirement that all user-space files that 
are expected to be linked into apps' address spaces must be BSD, MIT, or 
another license which allows binary redistribution, as part of these standards. 
Or we could end up causing a lot of pain for the app developers if somebody 
puts a bunch of GPL files into the user-space code which blocks their usage.

For the Linux kernel side files, we probably need to say BSD, MIT, or GPLv2 
specifically, and not GPLv3, I think that's what Linus is using, or it could 
be a problem to upstream any of those as DPDK usage grows.

For the BSD kernel side files, if any, probably need to be sure we're 
compatible with at least FreeBSD and NetBSD, and probably also OpenBSD.

Matthew.


[dpdk-dev] cost of reading tsc register

2015-04-20 Thread Matthew Hall
On Mon, Apr 20, 2015 at 02:37:53PM +, Ravi Kumar Iyer wrote:
> We were doing some code optimizations , running DPDK based applications, and 
> chanced upon the rte_rdtsc function [ to read tsc timestamp register value ] 
> consuming cpu cycles of the order of 100clock cycles with a delta of upto 
> 40cycles at times [ 60-140 cycles]
> 
> We are actually building up a cpu intensive application which is also very 
> clock cycle sensitive and this is impacting our implementation.
> 
> To validate the same using a small/vanilla application we wrote a small code 
> and tested on a single core.
> Has anyone else faced a similar issue or are we doing something really 
> atrocious here.

What happened when you tried rte_rdtsc_precise ?

Matthew.


[dpdk-dev] DCA

2015-04-21 Thread Matthew Hall
On Tue, Apr 21, 2015 at 10:27:48AM +0100, Bruce Richardson wrote:
> Can you perhaps comment on the use-case where you find this binding 
> limiting? Modern platforms have multiple NUMA nodes, but they also generally 
> have PCI slots connected to those multiple NUMA nodes also, so that you can 
> have your NIC ports similarly NUMA partitionned?

Hi Bruce,

I was wondering if you have tried to do this on COTS (commercial 
off-the-shelf) hardware before. What I found each time I tried it was that 
PCIe slots are not very evenly distributed across the NUMA nodes unlike what 
you'd expect.

Sometimes the PCIe lanes on CPU 0 get partly used up by Super IO or other 
integrated peripherals. Other times the motherboards give you 2 x8 when you 
needed 1 x16 or they give you a bunch of x4 when you needed x8, etc.

It's actually pretty difficult to find the mapping, for one, and even when you 
do, even harder to get the right slots for your cards and so on. In the ixgbe 
kernel driver you'll sometimes get some cryptic debug prints when it's been 
munged and performance will suffer. But in the ixgbe PMD driver you're on your 
own mostly.

Matthew.


[dpdk-dev] [PATCH] Use pthread_setname APIs

2015-04-22 Thread Matthew Hall
On Wed, Apr 22, 2015 at 03:57:44PM -0700, Stephen Hemminger wrote:
> Since it possible to have multiple DPDK applications in same environment,
> and the thread name size is so limited, I wonder if this is a good idea.

Why not try to opportunistically make the code easier to debug? DPDK is not 
always the easiest thing to debug, but at least it's way better than the 
kernel.

Matthew.


[dpdk-dev] [PATCH] Use pthread_setname APIs

2015-04-22 Thread Matthew Hall
On Wed, Apr 22, 2015 at 05:39:40PM -0700, Stephen Hemminger wrote:
> In our application we already use setname and have a policy for what the
> names look like.
> This won't help

Not everybody does.


[dpdk-dev] Beyond DPDK 2.0

2015-04-24 Thread Matthew Hall
On Fri, Apr 24, 2015 at 12:39:47PM -0500, Jay Rolette wrote:
> I can tell you that if DPDK were GPL-based, my company wouldn't be using
> it. I suspect we wouldn't be the only ones...
> 
> Jay

I could second this, from the past employer where I used it. Right now I am 
using it in an open source app, I have a bit of GPL here and there but I'm 
trying to get rid of it or confine it to separate address spaces, where it 
won't impact the core code written around DPDK, as I don't want to cause 
headaches for any downstream users I attract someday.

Hard-core GPL would not be possible for most. LGPL could be possible, but I 
don't think it could be worth the relicensing headache for that small change.

Instead we should make the patch process as easy as humanly possible so people 
are encouraged to send us the fixes and not cart them around their companies 
constantly.

Perhaps it means having some ReviewBoard type of tools, a clone in Github or 
Bitbucket where the less hardcore kernel-workflow types could send back their 
small bug fixes a bit more easily, this kind of stuff. Google has been getting 
good uptake since they moved most of their open source across to Github, 
because the contribution workflow was more convenient than Google Code was.

Matthew.


[dpdk-dev] What to do about UIO breakage in 2.0

2015-04-27 Thread Matthew Hall
Stephen,

This mail is a bit confusing for end users of DPDK, which might be why you 
didn't get many replies yet.

If I understand this mail right, you're saying that nothing works? Or it works, 
but igb_uio doesn't work, and the performance isn't good because MSI-X is not 
working? I am confused what you're saying exactly.

Previously I think we knew we needed to use igb_uio for almost all the 
non-virtual NIC PMDs, and some of the virtual NIC PMDs also, before they would 
load and get access to the PCIe BARs, etc. for the NICs. But now it sounds 
totally changed so I'm not sure what to reply.

Can you give a use case, from the perspective of the guy trying to bootstrap 
EAL / DPDK, what does this problem do to him if he tries it with DPDK 2.X?

Matthew.

On Apr 27, 2015, at 3:06 PM, Stephen Hemminger  
wrote:

> I raised the issue, but people seem to be ignoring that fact that igb_uio
> was broken by the introduction of UIO PCI generic in 2.0.
> 
> There are three options:
> 1. Remove IGB_UIO only use UIO PCI generic.
>Downside there is no MSI-X support for UIO PCI generic.
> 2. Revert UIO PCI generic support
> 3. Replace both of the above with something better.
> 
> I am working on #3 but it will not be ready for 2.0.1 and there
> is no solution for users of 2.0 and any future stable code.



[dpdk-dev] Performance regression in DPDK 1.8/2.0

2015-04-27 Thread Matthew Hall
On Apr 27, 2015, at 3:28 PM, Paul Emmerich  wrote:
> Let me know if you need any additional information.
> I'd also be interested in the configuration that resulted in the 20% speed-
> up that was mentioned in the original mbuf patch

Not sure if it's relevant or not, but there was another mail claiming PCIe 
MSI-X wasn't necessarily working in DPDK 2.x. Not sure if that could be causing 
slowdowns when there are massive volumes of 64-byte packets causing a lot of 
PCI activity.

Also, you are mentioning some specific patches were involved... so I have to 
ask if anybody tried git bisect yet or not. Maybe easier than trying to guess 
at the answer.
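Since git bisect came up: here is a self-contained toy run of the `git bisect run` flow (the repo path, commit contents, and check script are all invented for the demo; with DPDK the script would build the tree and run a benchmark instead):

```shell
# Build a throwaway repo with 10 commits where "perf" regresses at
# commit 6, then let git bisect find the first bad commit automatically.
rm -rf /tmp/bisect-demo
git init -q /tmp/bisect-demo && cd /tmp/bisect-demo
git config user.email demo@example.com && git config user.name demo
for i in 1 2 3 4 5 6 7 8 9 10; do
  if [ "$i" -ge 6 ]; then echo slow > perf; else echo fast > perf; fi
  git add perf && git commit -qm "commit $i"
done
# The check script exits 0 for a good commit, nonzero for a bad one.
printf '#!/bin/sh\ngrep -q fast perf\n' > check.sh && chmod +x check.sh
# HEAD shows the regression; the root commit is known good:
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" > /dev/null
git bisect run ./check.sh > /dev/null
git show -s --format=%s refs/bisect/bad   # prints: commit 6
# (run `git bisect reset` afterwards to return to the original branch)
```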

Matthew.


[dpdk-dev] Beyond DPDK 2.0

2015-04-27 Thread Matthew Hall
On Apr 25, 2015, at 5:10 AM, Neil Horman  wrote:
> I'm more focused on why that level of participation is not higher

Hi Neal,

This mail is probably way too long, but here is what I saw about 
participation. In my case I have used DPDK on two projects so far:

1) proprietary project for a L4-L7 stateful replay DPI firewall performance 
tester at a large network test and measurement corporation using the old 
original 6WIND DPDK 1.X under NDA before it became available to a wider channel

2) open-source self-created threat intelligence sensor project using DPDK 
1.7.X, under development in my spare time

I think the things holding back more DPDK contributions are ironically more 
technical and social in nature than legal or bureaucratic, as one would 
normally suspect and as you theorized in your original mail. Let me go 
through some things and see what people think.

At the social level, there are not very many people in the world who are 
familiar and comfortable with the LKML style embedded coding workflow (heavy 
mailing list usage, sending and reviewing patches in emails, putting specific 
people in To or Cc to get patches approved and ACKed in subtrees, etc.) I 
happen to know some if not most of this tribally, because I have used Linux since 
1997, but very few developers have any clue about this stuff. However I never 
participated in the actual LKML flow any further than tester / bug reporter, 
and I actually use DPDK very deliberately, to avoid fighting with the headaches 
of the linux-net code and flamewars.

This is why I was proposing that we try to find a way to allow contributions 
via Github or Bitbucket... their fork-and-pull model is much simpler for 
outsiders to comprehend and make quick patches when they find little bugs or 
issues as they integrate with the library... given we are BSD licensed, the low 
barrier to entry is the ultimate way to keep the patch velocity as high as 
possible, and keep the community going.

At the technical level, I see two or three difficulties:

1) A lot of the various performance enhancements one can use are kind of 
"magical" or "jigsaw puzzle" and not presented in a unified way, where I can 
methodically go through a checklist and enable everything my app should use 
even though I have no clue what they all are. For example, 1) let's talk about 
hashing... there is RSS hashing (symmetric or asymmetric), JHASH, CRC hash, ... 
not sure how many different ones. 2) Let's talk about CPU models... SSE, SSE4, 
SSE4.1, SSE4.2, etc. I don't know what I have myself, much less what my users 
will have, much less what I actually need or should use, without guidance from 
some processor people. 3) Let's talk about PCIe bus... there is DCA, some other 
non-DCA acceleration that's faster if you are on the same NUMA node as the PCIe 
slot, but fails to work if you aren't... etc.

A lot of the people from Intel, 6WIND, and the kernel people are just thinking 
"this stuff is obvious... we've used it since 200X and it's 2015!" That's true 
if you are a processor / kernel hacker... but if you spent your whole career on 
packet processing or network security like a lot of us app developers might 
have done, that's very orthogonal to Intel-specific and compiler-specific and 
hardware-specific performance hacks... so a lot of us have no real clue how to 
configure and test them all, much less enhance them and make some patches to 
it. We just blindly trust 6WIND and Intel to get it right, because as far as we 
can see, all the DPDK code is pretty clean and readable, and we're pretty sure 
we don't know anything better than what's already coded and put into the 
repository, if we don't have some checklists to follow to enable and test every 
combination, and find any more bugfixes to suggest.

2) A lot of the network adapters DPDK uses, especially when you begin using the 
more crazy accelerations, are either hard to obtain or expensive... for example 
from what I saw in my jobs, the 10 gigabit boards were a minimum ~$250 USD in 
manufacturing quantities. The 2-port gigabit latest-gen Intel board I got was 
$100 USD... I think a lot of higher-ed students and overseas people from less 
developed nations might have a hard time paying for some of this stuff to start 
hacking... some kind of program to get these sort of people some sample gear 
might help.

I also hit difficulties with the virtio-net driver... it doesn't work with the 
virtio-net adapter in VirtualBox... which makes it harder for me to use cool, 
convenient environments configured by the tool Vagrant, to make very simple dev 
VM environments that get new hackers up to speed on my open source project by 
running just a command or two, and by extension harder for me to show them why 
they should use DPDK for all their cool new network ${GADGET}s.

The same difficulty comes into place if I wanted to do some performance 
patches... I don't have the money to buy the VTune profiler for my spare time 
project, that you would

[dpdk-dev] [PATCH] eal/linux: fix negative value for undetermined numa_node

2015-07-31 Thread Matthew Hall
I asked about this many months ago and was informed that "-1" is a "standard 
error value" that I should expect from these APIs when NUMA is not present. 
Now we're saying I have to change my code again to handle a zero value?

Also not sure how to tell the difference between no NUMA, something running on 
socket zero, and something with multiple sockets. Seems like we need a bit of 
thought about how the NUMA APIs should behave overall.

Matthew.

On Fri, Jul 31, 2015 at 09:36:12AM +0800, Cunming Liang wrote:
> The patch sets zero as the default value of pci device numa_node
> if the socket could not be determined.
> It provides the same default value as FreeBSD which has no NUMA support,
> and makes the return value of rte_eth_dev_socket_id() be consistent
> with the API description.
> 
> Signed-off-by: Cunming Liang 


[dpdk-dev] LPM6 next hop size

2016-09-19 Thread Matthew Hall
On Mon, Sep 19, 2016 at 11:18:48PM +0200, Nikita Kozlov wrote:
> I have submitted a patch that, among other things, increase this size.
> But it needs some reviews: http://dpdk.org/dev/patchwork/patch/15295/

A whole ton of us submitted patches to fix LPM lately.

But fixing LPM6 was dropped from the roadmap and never added back in a future 
release.

So we don't get any upstream support to put the patches together and add it in 
the official release.

Matthew.


[dpdk-dev] LPM6 next hop size

2016-09-21 Thread Matthew Hall
On Tue, Sep 20, 2016 at 10:11:04PM +0200, Thomas Monjalon wrote:
> Please, will you help reviewing this patch?

Sure.

1. It adds a dependency on libbsd on Linux: bsd/sys/tree.h. Is this an 
expected dependency of DPDK already? I use it in my code but not sure it's 
expected for everybody else.

2. Some of the tabbing of the new code seems not right. Probably need to 
recheck the indentation settings.

3. The comment formatting of the new code is a bit odd. Continued lines in /* 
... */ in DPDK should have * on them I think.

4. It also contains some vendor specific comments like "/* Vyatta change to 
use red-black tree */". That's probably not needed if it's going into normal 
DPDK.

5. It uses "malloc" instead of standard DPDK allocators. That's bad for me 
because I don't want to use libc malloc in my code. Only DPDK allocators and 
jemalloc.

6. I don't see any updates to the unit test stuff. Don't we need new tests for 
the changes to deletion and re-insertion of rules and the changes in the tbl8 
allocation?

7. Some of us previously submitted code to expand LPM4 and LPM6 to 24-bit next 
hops, but the current code is only doing 16 bits. This is not pleasant for me 
because I really need the 24-bit support. When can we make this happen?

Matthew.


[dpdk-dev] LPM6 next hop size

2016-09-21 Thread Matthew Hall
On Wed, Sep 21, 2016 at 04:42:05PM -0700, Stephen Hemminger wrote:
> This was intentional because rte_malloc comes out of huge page area and that
> resource is a critical resource. It could use rte_malloc() but that makes it
> more likely to break when doing Policy Based routing or VRF.

Can we get more clarity on why PBR or VRF would break it? The performance and 
fragmentation of the default glibc allocator are quite bad. So I am trying to 
avoid it in my app for example.

Matthew.


[dpdk-dev] LPM6 next hop size

2016-09-22 Thread Matthew Hall
On Wed, Sep 21, 2016 at 10:07:46PM -0700, Stephen Hemminger wrote:
> If you have 2G of huge memory and one 16M routes then the rules start to
> kill an application.
> Since huge memory is unpageable (pinned) then it is limited.

Won't paging out routes result in very poor network performance?


[dpdk-dev] [PATCH] eal/linux: fix negative value for undetermined numa_node

2015-08-02 Thread Matthew Hall
On Mon, Aug 03, 2015 at 09:46:54AM +0800, Liang, Cunming wrote:
> According to the API definition, if the socket could not be determined, a
> default of zero will take.
> The '-1' is returned when the port_id value is out of range.

Yes, but when I asked the exact same question I was told the documentation 
was wrong, not the -1 return value.

> To your concern, "difference between no NUMA, something running on socket
> zero, and something with multiple sockets.".
> The latter two belongs to the same situation, that is the numa_node stores
> the NUMA id.
> So in fact the concern is about using '-1' or '0' when there's no NUMA
> detect.
> If we won't plan to redefine the API return value, the fix patch is
> reasonable.
> 
> Btw, if it returns '-1' when no NUMA is detected, what will you do, do
> condition check '-1' and then use node 0 instead ?
> In that way, you can't distinguish '-'1 is out of range port_id error or no
> NUMA detection error.

I asked that question also, and the answer I got was to use node 0 instead.

> If it is, why not follow the API definition.

Sure, if nobody objects like the last time I asked. But this will change the 
user-visible behavior, as I am looking for the -1 now.

> /Steve

Matthew.


[dpdk-dev] 2.3 Roadmap

2015-12-01 Thread Matthew Hall
On Mon, Nov 30, 2015 at 08:50:58PM +, O'Driscoll, Tim wrote:
> Increase Next Hops for LPM (IPv4): The number of next hops for IPv4 LPM is 
> currently limited to 256. This will be extended to allow a greater number of 
> next hops.

In other threads, we previously proposed doing increased LPM4 *and* LPM6.

Having incompatible support between both is a huge headache for me.

And I already contributed patches to fix the issue in both versions.

Thanks,
Matthew


[dpdk-dev] 2.3 Roadmap

2015-12-01 Thread Matthew Hall
On Tue, Dec 01, 2015 at 11:58:16AM +, Bruce Richardson wrote:
> Hi,
> 
> that is indeed very similar to what we are thinking ourselves. Is there any of
> what you have already done that you could contribute publically to save us
> duplicating some of your effort? [The one big difference, is that we are not
> thinking of enabling kni permanently for each port, as the ethtool support is
> only present for a couple of NIC types, and solving that is a separate 
> issue.:-)]
> 
> /Bruce

Personally I was looking at something a bit different because I wanted an 
ability to support lightning fast BPF expressions for security purposes, not 
just debugging captures.

I got hold of a copy of the bpfjit implementation, with some tweaks to support 
compiling on Linux and BSD in userspace mode, from Alexander Nasonov who made 
it for the BSD kernel, as a result of participating here.

I am planning to use this to do the captures so you don't incur the headache 
or performance issues with rte_kni.

I am curious how I might be able to link it up w/ the standard libpcap based 
tools to get an end-to-end solution with minimal loss.

Matthew.


[dpdk-dev] 2.3 Roadmap

2015-12-01 Thread Matthew Hall
On Tue, Dec 01, 2015 at 01:16:47PM +, O'Driscoll, Tim wrote:
> True. The goal is to merge the best of the various patches that were 
> submitted on this. This could involve changes to IPv6 as well as IPv4.
> 
> 
> Tim

If it's possible to fix IPv6 as well this would be good for me. Offering a 
large nexthop space on all protocols is very good for BGP / core routing and 
security inspection applications. Using this feature, I will be able to detect 
interactions with bad subnets and IPs at tremendous speed compared to 
competing solutions.

Matthew.


[dpdk-dev] 2.3 Roadmap

2015-12-01 Thread Matthew Hall
On Tue, Dec 01, 2015 at 09:45:56AM -0500, Kyle Larose wrote:
> Earlier Stephen mentioned using the named pipe behaviour of tcpdump.
> Is there an opportunity to take what you have mentioned here and marry
> it to the named pipe output to get the perf you need?

I am wondering about the same thing. But I didn't want to limit the scope of 
solutions too much so I didn't specifically enumerate this possibility.

Matthew.


[dpdk-dev] 2.3 Roadmap

2015-12-01 Thread Matthew Hall
On Tue, Dec 01, 2015 at 10:31:02AM -0500, Aaron Conole wrote:
> The benefit is no dependancy on kernel modules (just TUN/TAP support). I 
> don't have a way of signaling sampling, so right now, it's just drinking 
> from the firehose.

This is actually quite a good idea. Many years ago I coded up a simple 
connector between DPDK and TAP devices for use with some legacy applications 
that did not support DPDK.

I could definitely connect the output of user-space bpfjit to a TAP device 
quite easily.

I am somewhat less clear on how to connect tcpdump or other standard libpcap 
based entities up, so that one could change the capture filters or other 
settings from outside the DPDK application. I am hoping some of the network 
API experts can comment on this since I'm just a security specialist.

How are you letting people configure the capture filter in this scenario?

Matthew.


[dpdk-dev] 2.3 Roadmap

2015-12-01 Thread Matthew Hall
On Tue, Dec 01, 2015 at 01:57:39PM +, Bruce Richardson wrote:
> Hi Matthew,
> 
> Couple of follow-up questions on this:
> * do you need the exact same number of bits in both implementations? If we 
> support
> 21 bits of data in IPv6 and 24 in IPv4 is that an issue compared to supporting
> 21 bits just in both for compatibility.
> * related to this - how much data are you looking to store in the tables?
> 
> Thanks,
> /Bruce

Let me provide some more detailed high level examples of some security use 
cases so we could consider what makes sense.

1) Spamhaus provides a list of approximately 800 CIDR blocks which are so 
bad that they recommend null-routing them as widely as possible:

https://www.spamhaus.org/drop/
https://www.spamhaus.org/drop/drop.txt
https://www.spamhaus.org/drop/edrop.txt

In the old implementation I couldn't even fit all of those, and doing 
something like this seems to be a must-have feature for security.

2) Team Cymru provides lists of Bogons for IPv4 and IPv6. In IPv4, there are 
3600 bogon CIDR blocks because many things are in-use. But the IPv6 table has 
65000 CIDR blocks, because it is larger, newer, and more sparse.

http://www.team-cymru.org/Services/Bogons/fullbogons-ipv4.txt
http://www.team-cymru.org/Services/Bogons/fullbogons-ipv6.txt

Being able to monitor these would be another must-have for security and is 
quite popular for core routing from what I have heard.

3) At any given time, through various methods, I am aware of around 350,000 to 
2.5 million recent bad IP addresses. Technically single entries could be 
matched using rte_hash. But it is quite common in the security world, to look 
at the number of bad IPs in a class C, and then flag the entire subnet as 
suspect if more than a few bad IPs are present there.

Some support for some level of this is a must-have for security and firewall 
use cases.

4) Of course, it goes without saying that fitting the contents of the entire 
Internet BGP prefix list for IPv4 and IPv6 is a must-have for core routing 
although less needed for security. I am not an expert in this. Some very basic 
statistics I located with a quick search suggest one needs about 600,000 
prefixes (presumably for IPv4). It would help if some router experts could 
clarify it and help me know what the story is for IPv6.

http://www.cidr-report.org/as2.0/#General_Status

5) Considering all of the above, it seems like 22 or 23 unsigned lookup bits 
are required (4194304 or 8388608 entries) if you want comprehensive bad IP 
detection. And probably 21 unsigned bits for basic security support. But that 
would not necessarily leave a whole lot of headroom depending on the details.

Matthew.


[dpdk-dev] 2.3 Roadmap

2015-12-01 Thread Matthew Hall
On Wed, Dec 02, 2015 at 01:38:07AM +, Wiles, Keith wrote:
> In Pktgen I used tap interface to wireshark and that worked very nicely the 
> only problem is it was slow :-(
> 
> Having a tap PMD would be nice to be able to remove that code from Pktgen.

All these approaches we discussed so far have a serious architectural issue. 
The entire point of BPF / libpcap was to prevent crossing unnecessary system 
boundaries with an arbitrarily large volume of unfiltered packets which will 
be tossed out anyway thereafter, as they are irrelevant to a particular 
debugging objective.

In the past it was the kernel -> user boundary.

In our case it is the Data Plane -> Management Plane boundary.

If we don't use something similar to libpcap offline mode (which I am using 
presently in my code) or preferably the user-space bpfjit (which I am working 
up to using eventually), it's going to be more or less impossible for this to 
work properly and not blow stuff up with anything close to wirespeed traffic 
going through the links being monitored. Especially with 10, 40, 100, ad 
nauseam, gigabit links.

With the classic BPF / libpcap, it's absolutely possible to get it to work, 
without causing a big performance problem, or causing a huge packet dump 
meltdown, or any other issues in the process. We need to find a way to achieve 
the same objective in our new environment as well.

One possible option, if some kernel guys could assist with figuring out the 
trick, would be if we could capture the BPF ioctl / syscall / whatever it 
secretly does on the TAP or KNI interface, when it passes the capture filter 
to the kernel, and steal the filter for use in pcap offline or userspace 
bpfjit inside of DPDK. Then supply only the packets meeting the filter back 
onto said interface.

Matthew.


[dpdk-dev] 2.3 Roadmap

2015-12-02 Thread Matthew Hall
On Wed, Dec 02, 2015 at 12:35:16PM +, Bruce Richardson wrote:
> Hi Matthew,
> 
> thanks for the info, but I'm not sure I understand it correctly. It seems to
> me that you are mostly referring to the depths/sizes of the tables being used,
> rather than to the "data-size" being stored in each entry, which was actually
> what I was asking about. Is that correct? If so, it seems that - looking 
> initially
> at IPv4 LPM only - you are more looking for an increase in the number of 
> tbl8's
> for lookup, rather than necessarily an increase the 8-bit user data being 
> stored
> with each entry. [And assuming similar interest for v6] Am I right in 
> thinking this?
> 
> Thanks,
> /Bruce

This question is a result of a different way of looking at things between 
routing / networking and security. I actually need to increase the size of 
user data as I did in my patches.

1. There is an assumption, when LPM is used for routing, that many millions of 
inputs might map to a smaller number of outputs.

2. This assumption is not true in the security ecosystem. If I have several 
million CIDR blocks and bad IPs, I need a separate user data value output for 
each value input.

This is because, every time I have a bad IP, CIDR, Domain, URL, or Email, I 
create a security indicator tracking struct for each one of these. In the IP 
and CIDR case I find the struct using rte_hash (possibly for single IPs) and 
rte_lpm.

For Domain, URL, and Email, rte_hash cannot be used, because it wrongly 
assumes all inputs are equal-length. So I had to use a different hash table.

3. The struct contains things such as a unique 64-bit unsigned integer for 
each separate IP or CIDR triggered, to allow looking up contextual data about 
the threat it represents. These IDs are defined by upstream threat databases, 
so I can't crunch them down to fit inside rte_lpm. They also include stats 
regarding how many times an indicator is seen, what kind of security threat it 
represents, etc., without which you can't do any valuable security enrichment 
needed to respond to any events generated.

4. This means, if I want to support X million security indicators, regardless 
of whether they are IP, CIDR, Domain, URL, or Email, then I need X million 
distinct user data values to look up all the context that goes with them.

Matthew.


[dpdk-dev] librte_power w/ intel_pstate cpufreq governor

2015-12-05 Thread Matthew Hall
Hello all,

I wanted to ask some questions about librte_power and the great adaptive 
polling / IRQ mode example in l3fwd-power.

I am very interested in getting this to work in my project because it will 
make it much friendlier to attract new community developers if I am as 
cooperative as possible with system resources.

Let's discuss the init process for a moment. It has some problems on my 
system, and I need some help to figure out how to handle this right.

1. Begins with the call to rte_power_init.

2. Attempts to init ACPI cpufreq mode.

2.1. Sets lcore cpufreq governor to userspace mode.

2.2. Function power_get_available_freqs checks lcore CPU frequencies from:

/sys/devices/system/cpu/cpuX/cpufreq/scaling_available_frequencies

2.3. This fails with (cryptic) error "POWER: ERR: File not openned". I am 
planning to write a patch for this error a bit later.

My kernel is using the intel_pstate driver, so scaling_available_frequencies 
does not exist:

http://askubuntu.com/questions/544266/why-are-missing-the-frequency-options-on-cpufreq-utils-indicator
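A quick way to see which situation a given box is in (these are the standard Linux cpufreq sysfs paths; a VM or container may not expose cpufreq at all):

```shell
# Show the active cpufreq driver and whether the frequency list that
# librte_power's ACPI backend reads is actually published.
d=/sys/devices/system/cpu/cpu0/cpufreq
if [ -r "$d/scaling_driver" ]; then
  echo "driver: $(cat "$d/scaling_driver")"
  if [ -r "$d/scaling_available_frequencies" ]; then
    cat "$d/scaling_available_frequencies"
  else
    # intel_pstate does not create this file, which is exactly what
    # makes power_get_available_freqs fail.
    echo "no scaling_available_frequencies"
  fi
else
  echo "no cpufreq sysfs on this system"
fi
```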

3. When power_get_available_freqs fails, rte_power_acpi_cpufreq_init fails.

4. rte_power_init will try rte_power_kvm_vm_init. That will fail because it's 
a physical Skylake system not some kind of VM.

5. Now rte_power_init totally fails, with error "POWER: ERR: Unable to set 
Power Management Environment for lcore 0".

So, I have a couple of questions to figure out from here:

1. It seems bad to switch the governor into userspace before verifying the 
frequencies available in scaling_available_frequencies. If there are no 
frequencies available, it seems like it should not be trying to take over 
control of an effectively uncontrollable value.

2. If the governor is switched to userspace, and then no governing is done, it 
seems like the clock rate will necessarily always be wrong, because nothing 
will be configuring it anymore: neither the kernel, nor the failed DPDK 
userspace code, since the rte_power_freq_up / down function pointers will 
always be NULL. Is this true? This seems bad if so.

It seems that the librte_power code is basically out of date, as pstate has 
been present since Sandy Bridge, which is quite old by now for network 
processing. I am not sure how to make this work right now. So far I see a 
couple options but I really don't know much about this stuff:

1) skip rte_power_init completely, and let intel_pstate handle it using HWP 
mode

2) disable intel_pstate, switch to the legacy ACPI cpufreq (but people warned 
this old driver is mostly a no-op and the CPU ignores its frequency requests).

The Internet advice says it's possible, but not a very good idea, to switch 
from the modern intel_pstate driver to the legacy ACPI mode. The kernel docs 
(below) state that it's better to use HWP (Hardware P-State) mode:

https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt

If none of this rte_power_init stuff works, are the other CPU conservation 
measures inside the l3fwd-power example enough to work right with HWP all by 
themselves with nothing additional?

Thanks,
Matthew.


[dpdk-dev] tcpdump support in DPDK 2.3

2015-12-14 Thread Matthew Hall
FYI your last name comes in as a corrupt character for me. You might have to 
think about converting it from ISO 8859-1 / 8859-15 to UTF-8.

On Mon, Dec 14, 2015 at 10:57:10AM +0100, Morten B wrote:
> Check out the new "extcap" feature of Wireshark. It uses named pipes for the 
> packets, already mentioned by Stephen Hemminger.

I looked at it a bit. I wasn't 100% clear if there is a way to pass down the 
BPF expression for compilation and usage inside the DPDK application.

> Tcpdump is an open source application, so it should be possible to define an 
> efficient interface between DPDK and tcpdump, and implement it in both DPDK 
> and tcpdump. The same goes for libpcap.

Easier said than done. A whole ton of libpcap assumes it's talking to a very 
specific kernel interface, and the code is quite complicated.

> It possibly also has a secondary feature: passing a BPF program 
> from tcpdump/libpcap to DPDK, so packets can be filtered in DPDK and don't 
> need to be passed on to tcpdump/libpcap.

If we can figure out how to get this feature to work in extcap, I think that 
will be the winning solution by far.

> [A]dd a BPF library (librte_bpf) to DPDK, preferably with a compiler. The 
> application initially calls the library's BPF compiler function once with 
> the BPF program to compile it, and in the fast path the application calls a 
> library function that takes an mbuf and the compiled BPF program and returns 
> an integer value indicating how many bytes of the packet should be mirrored 
> by the capturing application. +1 to Matthew Hall for taking this direction!

Yes, performance wise I think this is the only way that will really work 100% 
of the time. Otherwise I think we end up in the very bad situation where the 
guy who tries to make a capture of a single flow for debugging on i40e ends up 
crashing his system or dropping all his traffic when the capture system 
unhelpfully redirects a storm of unfiltered traffic outside of DPDK to KNI or 
some pipe devices or another place it does not belong.

There is one complexity though... the list of BPF filters should probably be a 
linked list, where they get added and removed, or you can't do > 1 filter at a 
time. I know how to code some of this stuff but I only work on DPDK in my 
spare time so I don't have the cycles to do all of the work.

> The pcap file format contains a header in front of each packet, which is 
> extremely simple. But it has a timestamp (which uses 32 bit for tv_sec and 
> tv_usec in files), so it needs to be considered how to handle this 
> efficiently.

I already wrote some C code for generating the original pcap format files a 
while ago which I think could be donated. For the timestamps to work at 
highest efficiency we'd need to run an rte_timer every X microseconds that 
updates a global volatile copy of tv_sec and tv_usec.

Or writing some code that calculates the offset of rte_rdtsc from 01 January 
1970 00:00:00 UTC and uses the TSC value to generate the right tv_sec and 
tv_usec would also work fine.

Matthew.


[dpdk-dev] tcpdump support in DPDK 2.3

2015-12-14 Thread Matthew Hall
On Mon, Dec 14, 2015 at 11:14:42AM -0800, Stephen Hemminger wrote:
> There are already several BPF libraries available. I would prefer DPDK not
> start copying existing code.

I didn't copy or reduplicate any code. I was planning to use bpfjit from Alex 
Nasonov, but a userspace version instead of the kernel one. If somebody makes 
an shlib version of course I could use that instead. But I didn't hear of one 
yet.

Matthew.


[dpdk-dev] tcpdump support in DPDK 2.3

2015-12-14 Thread Matthew Hall
On Mon, Dec 14, 2015 at 02:17:12PM -0500, Aaron Conole wrote:
> Why not just use libpcap to write out pcap files? I bet it does a better
> job that any of us will ;) It's BSD licensed, so there should be no
> issues with linking against it (DPDK currently does for the pcap PMD), and
> it supports both pcap and pcap-ng (although -ng support may not be 100%,
> I expect it will get better).

It doesn't do things such as scatter-gather vector IO. So it causes a lot more 
system calls than needed. It's an issue if you are doing I40E and such. But I 
don't really care so much how it works.

> The current option is to start up with a pcap PMD configured, capture to a 
> file for a bit, then stop. I think the issues being discussed are what other 
> options to give the user. Then again, I may have my signals crossed 
> somewhere.

For me I think it's very important to make something that works even with 
tremendous load, not causing tons of writes and syscalls on packets that match 
no filters and are not even wanted. None of the solutions I saw so far could 
do this except bpfjit combined with extcap.

Matthew.


[dpdk-dev] tcpdump support in DPDK 2.3

2015-12-14 Thread Matthew Hall
On Mon, Dec 14, 2015 at 04:29:41PM -0500, Kyle Larose wrote:
> I've seen lots of ideas and options tossed around which would solve
> some or all of the above items, but nobody actually committing to
> anything. What can we do to actually agree on a solution to go and
> implement? I'm relatively new to the community, so I don't really know
> how this stuff works. Do people typically form a working group where
> they go off and discuss the problem, and then come back to the main
> community with a proposal? Or do people just submit RFCs independently
> with their own ideas?
> 
> Thanks,
> Kyle

I am getting the impression of a misplaced sense of urgency / panic. I don't 
think anybody came up with a reason why we have to answer all these questions 
tremendously quickly. It will take some more time, particularly with the 
holidays, for the developers to finish the last bug fixes on the current 
release before they have time to discuss 2.3 features.

When that happens, someone working on DPDK full time will be identified as the 
leader for the feature, who will lead the effort on PCAP and help us 
formulate the plan. Until then, what we really could use at this point is not 
necessarily more writing and speculation, but answers to some key tech 
questions, particularly from some kernel guys:

1) How do we get the pcap filter string and/or BPF opcode vector from libpcap 
/ tcpdump / tshark / wireshark, into the DPDK application? There we can 
compile it using the user-space bpfjit, so we can filter the packets at very 
high speeds and not end up breaking everything doing a ton of stupid copies 
when somebody does a capture of one flow on his i40e device or such. libpcap 
is crappy about this, as it sends it all over syscalls which are always 
assuming the kernel is on the other end, which is a bad assumption on their 
part but many decades old and not so easy to fix.

2) How do we get the matched packets back out to the extcap or libpcap? From 
what I saw extcap is tshark / wireshark only, which are 1) GPL licensed in 
various ways, 2) not as widely used as libpcap. So using only extcap might be 
kind of crappy.

3) For libpcap to work, maybe it will help if some of our kernel guys can help 
us find out how to "detect" the kernel put a BPF capture filter onto a TUN / 
TAP interface, and copy that filter to the DPDK app. Then, take any matched 
packets and write them back onto the TUN / TAP. This would also be super 
efficient and work with more off-the-shelf tools besides just tshark / 
wireshark.

If we don't find the answers for these items I don't think we have a path to a 
working solution, forgetting about all the nice-to-have points such as UX 
issues, troubleshooting, debugging, etc.

Matthew.


[dpdk-dev] tcpdump support in DPDK 2.3

2015-12-16 Thread Matthew Hall
On Wed, Dec 16, 2015 at 11:56:11AM +, Bruce Richardson wrote:
> Having this work with any application is one of our primary targets here. 
> The app author should not have to worry too much about getting basic debug 
> support. Even if it doesn't work at 40G small packet rates, you can get a 
> lot of benefit from a scheme that provides functional debugging for an app. 

I think my issue is that I don't think I buy into this particular set of 
assumptions above.

I don't think a capture mechanism that doesn't work right in the real use 
cases of the apps actually buys us much. If all we care about is quickly 
dumping some frames to a pcap for occasional debugging, I already have some C 
code for that I can donate which is a lot less complicated than the trouble 
being proposed for "basic debug support". Or we could use libpcap's 
equivalent... but it's quite a lot more complicated than the code I have.

If we're going to assign engineers to this it's costing somebody a lot of time 
and money. So I'd prefer to get them focused on something that will always 
work even with high loads, such as real bpfjit support.

Matthew.


[dpdk-dev] tcpdump support in DPDK 2.3

2015-12-16 Thread Matthew Hall
On Wed, Dec 16, 2015 at 11:45:46PM +0100, Morten Brørup wrote:
> Matthew presented a very important point a few hours ago: We don't need 
> tcpdump support for debugging the application in a lab; we already have 
> plenty of other tools for debugging what we are developing. We need tcpdump 
> support for debugging network issues in a production network.

+1

> In my "hardened network appliance" world, a solution designed purely for 
> legacy applications (tcpdump, Wireshark etc.) is useless because the network 
> technician doesn't have access to these applications on the appliance.

Maybe that's true on one exact system. But I've used a whole ton of systems 
including appliances where this was not true. I really do want to find a way 
to support them, but according to my recent discussions w/ Alex Nasonov who 
made bpfjit, I don't think it is possible without really tearing apart 
libpcap. So for now the only good hope is Wireshark's Extcap support.

> While a PC system running a DPDK based application might have plenty of 
> spare lcores for filtering, the SmartShare appliances are already using all 
> lcores for dedicated purposes, so the runtime filtering has to be done by 
> the IO lcores (otherwise we would have to rehash everything and reallocate 
> some lcores for mirroring, which I strongly oppose). Our non-DPDK firmware 
> has also always been filtering directly in the fast path.

The shared process stuff and weird leftover lcore stuff seems way too complex 
for me whether or not there are any spare lcores. To me it seems easier if I 
just call some function and hand it mbufs, and it would quickly check them 
against a linked list of active filters if filters are present, or do nothing 
and return if no filter is active.

> If the filter is so complex that it unexpectedly degrades the normal traffic 
> forwarding performance

If bpfjit is used, I think it is very hard to affect the performance much. 
Unless you do something incredibly crazy.

> Although it is generally considered bad design if a system's behavior (or 
> performance) changes unexpectedly when debugging features are being used, 

I think we can keep the behavior change quite small using something like what 
I described.

> Other companies might prefer to keep their fast path performance unaffected 
> and dedicate/reallocate some lcores for filtering.

It always starts out unaffected... then goes back to accepting a bit of 
slowness when people are forced to re-learn how bad it is with no debugging. I 
have seen it again and again in many companies. Hence my proposal for 
efficient lightweight debugging support from the beginning.

> 1. BPF filtering (... a DPDK variant of bpfjit),

+1

> 2. scalable packet queueing for the mirrored packets (probably multi 
> producer, single or multi consumer)

I hate queueing. Queueing always reduces max possible throughput because 
queueing is inefficient. It is better just to put them where they need to go 
immediately (run to completion) while the mbufs are already prefetched.

> Then the DPDK application can take care of interfacing to 
> the attached application and outputting the mirrored packets to the 
> appropriate destination

Too complicated. Pcap and extcap should be working by default.

> A note about packet ordering: Mirrored packets belonging to different flows 
> are probably out of order because of RSS, where multiple lcores contribute 
> to the mirror output.

Where I worry is weird configurations where a flow can occur in >1 cores. But 
I think most users try not to do this.


[dpdk-dev] tcpdump support in DPDK 2.3

2015-12-21 Thread Matthew Hall
On Mon, Dec 21, 2015 at 04:17:26PM +, Gray, Mark D wrote:
> Is tcpdump used in large production cloud environments? I would have 
> thought other less intrusive (and less manual) tools would be used? Isn't
> that one of the benefits of SDN.

tcpdump, tshark, wireshark, libpcap, etc. have been used every single place I 
ever worked, including in production under heavy load.

This is because nobody wants to redo the library of many tens of thousands of 
hours of protocol dissectors.

This is also why I am trying to point out what is required to get a solution 
that I am confident will really work when people are counting on it, which I 
am concerned the current proposals do not cover.

Matthew.


[dpdk-dev] Why nothing since 1.8.0?

2015-01-15 Thread Matthew Hall
On Thu, Jan 15, 2015 at 09:55:00PM +, O'driscoll, Tim wrote:
> As you said, there's a balance to be struck, and too many subtrees may 
> become unmanageable. With respect to your concern about developers having to 
> potentially develop patches against multiple subtrees, this has never been 
> raised as a concern by any of our development team. Is there any historical 
> data on the number of changes that would fall into this category so we can 
> see if it's a real problem or not?

Hi Tim,

What happens when a core API like rte_mbuf gets some changes, and you have to 
update the PMD's to fit?

Do I have to make 10-20 odd random patches to separate PMD maintainers instead 
of one set of patches to the PMD subtree?

To me it doesn't sound very nice for the guys maintaining the core. Given most 
of the changes seem to be mbuf or eal this seems like a scaling issue to me.

But maybe I misunderstood the process.

Matthew.


[dpdk-dev] Why nothing since 1.8.0?

2015-01-16 Thread Matthew Hall
On Fri, Jan 16, 2015 at 07:18:19PM +0100, Thomas Monjalon wrote:
> I'd like to try solving the review challenge first and see what else can be 
> done after that. Step by step.

FWIW, I know the kernel guys seem to really love it, but not everybody else 
has much fun trying to do the reviews reading huge patch emails. I lose a lot 
of context trying to stare at them in mutt 80x25 console etc. It would be nice 
if we could have a visual interface with syntax highlighting and comment 
capabilities, that's easier to read through quickly and clearly, like 
ReviewBoard, GitHub Pull Request UI, etc. If it had email integration to reply 
to the patch threads that'd be great too.

Also if we had some branches available where conceptually related changes are 
grouped, somebody could check out the branch with some feature they wanted to 
try, get all the related patches, integrate with their app of choice, and see 
if the app works successfully with the new feature.

Some of these things like DPDK, it isn't obvious how the feature will help or 
hurt, until you write some code against it and/or benchmark it first, because 
some of these features are kind of complicated.

Another thing... if we had some kind of wiki page, where some of the backend 
coders could mark themselves as maintainers of all the different features they 
work on, and more client-side network stack guys like me could express 
interest in certain features, we could connect the two sides so any given guy 
knows who can review his bugfix he found, or try out his new patchset to see 
if it works well in an app.

Matthew.


[dpdk-dev] Why nothing since 1.8.0?

2015-01-16 Thread Matthew Hall
On Fri, Jan 16, 2015 at 03:00:57PM -0500, Neil Horman wrote:
> Like Gerrit:
> https://code.google.com/p/gerrit/

Maybe we could work on setting up a community copy? I'd prefer if we could 
avoid n=1 and make our community as strong as possible.

Matthew.


[dpdk-dev] Why nothing since 1.8.0?

2015-01-16 Thread Matthew Hall
On Fri, Jan 16, 2015 at 04:14:38PM -0500, Neil Horman wrote:
> Sure, Its a bit orthogonal to this conversation, but I think its a fine idea 
> to
> have if it aids in general reviewing.  Thomas can it be setup on dpdk.org?
> 
> Neil

Admittedly I'm not a PMD expert to comment on all the specifics of what you 
said about subtrees. But I do like to think out of the box and look at the big 
picture of what people have to say in the various threads. Your points about 
making the community strong seemed important, so I thought about the various 
subproblems to solve the whole topic:

1) some logical subtrees as you advised,

2) a clarification of who runs the subtrees and who does the -next mergeups,

3) feature branches or some other way for end users to validate new related 
functionality all together as a kind of integration testing,

4) a MAINTAINERS file, and maybe a TESTERS file, or tester / end-user entries 
in MAINTAINERS,

5) a really good, friendly way to review the new code, like Gerrit, as you advised

I think if we attack all of these we should be in good shape as the community 
continues to grow and mature.

Matthew.


[dpdk-dev] some questions about rte_memcpy

2015-01-21 Thread Matthew Hall
On Thu, Jan 22, 2015 at 11:39:11AM +0800, Linhaifeng wrote:
> Why call memcpy when n is constant variable?

One theory. Many DPDK functions crash if they are called before rte_eal_init() 
is called. So perhaps this could be a cause, since that won't have been called 
when working on a constant?

Matthew.


[dpdk-dev] some questions about rte_memcpy

2015-01-21 Thread Matthew Hall
On Thu, Jan 22, 2015 at 01:32:04PM +0800, Linhaifeng wrote:
> Do you mean if call rte_memcpy before rte_eal_init() would crash?why?

No guarantee. But a theory. It might use some things from the EAL init to 
figure out which version of the accelerated algorithm to use.

Matthew.


[dpdk-dev] [PATCH v2 00/24] Single virtio implementation

2015-01-26 Thread Matthew Hall
On Tue, Jan 27, 2015 at 10:35:40AM +0800, Ouyang Changchun wrote:
> This is the patch set for single virtio implementation.
>  
> Why we need single virtio?
> 
> As we know currently there are at least 3 virtio PMD driver implementations:
> A) lib/librte_pmd_virtio(refer as virtio A);
> B) virtio_net_pmd by 6wind(refer as virtio B);
> C) virtio by Brocade/vyatta(refer as virtio C);
>  
> Integrating 3 implementations into one could reduce the maintaining cost and 
> time,
> in other hand, user don't need practice their application on 3 variant one by 
> one to see
> which one is the best for them;

Thank you so much for this, using virtio drivers in DPDK has been messy and 
unpleasant in the past, and you clearly wrote a lot of nice new code to help 
improve it all.

Previously I'd reported a bug, where all RTE virtio drivers I tried (A and B, 
because I did not know C existed), failed to work with the virtio-net 
interfaces exposed in VirtualBox, due to various strange errors, and they all 
only worked with the virtio-net interfaces from qemu.

I wanted to find out if we managed to fix this other problem, because I would 
really like to use the Vagrant VM deployment tool (https://www.vagrantup.com/) 
to distribute my open-source DPDK based application to everyone in the 
open source community.

The better the out-of-box experience of practical community-created DPDK-based 
real-life example applications similar to mine, the more adoption of DPDK and 
better DPDK community we will be able to have as time marches forward.

If we could manage to get it to work in VirtualBox, then I could surely help 
do some app-level testing on the new code, if we could see it in a test branch 
or test repo somewhere I could access it.

Sincerely,
Matthew Hall


[dpdk-dev] [PATCH v2 00/24] Single virtio implementation

2015-01-27 Thread Matthew Hall
On Tue, Jan 27, 2015 at 03:42:00AM +, Wiles, Keith wrote:
> There is an app note on how to get DPDK working in VirtualBox, it is a bit
> bumpy on getting it work.
> Here is the link: 
> http://plvision.eu/blog/deploying-intel-dpdk-in-oracle-virtualbox/
> 
> I have not tried it, but it was suggested to me it should work. It will be
> nice if the new driver works better :-)

I already used a derivative of these directions... "cheated" and used the igb 
driver like they did. Unlike them I automated the entire process, including 
updating the base OS to latest kernel and recompiling against it, as well as 
auto-enabling the NICs, the SSE instruction sets, etc. etc.

However their directions use an IGB NIC not a virtio-net NIC which would be 
much better for performance and resource consumption. So I really would be 
very very happy if we had a virtio-net which worked properly with both qemu 
and VirtualBox.

Matthew.


[dpdk-dev] [PATCH v2 00/24] Single virtio implementation

2015-01-27 Thread Matthew Hall
On Tue, Jan 27, 2015 at 10:02:24AM +, Stephen Hemminger wrote:
> On Mon, 26 Jan 2015 19:06:12 -0800
> Matthew Hall  wrote:
> 
> > Thank you so much for this, using virtio drivers in DPDK has been messy and 
> > unpleasant in the past, and you clearly wrote a lot of nice new code to 
> > help 
> > improve it all.
> > 
> > Previously I'd reported a bug, where all RTE virtio drivers I tried (A and 
> > B, 
> > because I did not know C existed), failed to work with the virtio-net 
> > interfaces exposed in VirtualBox, due to various strange errors, and they 
> > all 
> > only worked with the virtio-net interfaces from qemu.
> 
> I suspect a problem with features required (and not supported by VirtualBox).
> Build driver with debug enabled and send the log please.

Hi Stephen,

Here is everything that happened when I tried it before.

http://dpdk.org/ml/archives/dev/2014-October/006623.html

Matthew.


[dpdk-dev] RTM instruction compile failure for XABORT when AVX is active

2015-06-30 Thread Matthew Hall

On Jun 29, 2015, at 3:19 AM, Thomas Monjalon  
wrote:
> There is no such bug with my compiler:
>   clang version 3.6.1 (tags/RELEASE_361/final)
>   Target: x86_64-unknown-linux-gnu
> 
> Matthew, which version are you using?

Hi Thomas and Roman,

It seems to happen if I have set -mavx in CFLAGS with clang 1:3.4-1ubuntu3.

I get a different issue that only shows up at runtime in clang 
3.6.2-svn240577-1~exp1:

ERROR: This system does not support "FSGSBASE".
Please check that RTE_MACHINE is set correctly.

It appears I probably need to learn how to do a better job on my EXTRA_CFLAGS. 
Do we have some recommendations on what should be used for the different Intel 
CPUs to avoid build issues but still get the best performance? This would help a lot.

Matthew.


[dpdk-dev] RTM instruction compile failure for XABORT when AVX is active

2015-06-30 Thread Matthew Hall
To be a bit more specific, this is what I had to do to fix it for clang 3.6 SVN 
snapshot release.

I am not sure if there is a better way of handling this situation. I'd love to 
know where I could improve it.

Matthew.

diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
index f595cd0..8c883ee 100644
--- a/mk/rte.cpuflags.mk
+++ b/mk/rte.cpuflags.mk
@@ -77,13 +77,13 @@ ifneq ($(filter $(AUTO_CPUFLAGS),__RDRND__),)
 CPUFLAGS += RDRAND
 endif

-ifneq ($(filter $(AUTO_CPUFLAGS),__FSGSBASE__),)
-CPUFLAGS += FSGSBASE
-endif
+#ifneq ($(filter $(AUTO_CPUFLAGS),__FSGSBASE__),)
+#CPUFLAGS += FSGSBASE
+#endif

-ifneq ($(filter $(AUTO_CPUFLAGS),__F16C__),)
-CPUFLAGS += F16C
-endif
+#ifneq ($(filter $(AUTO_CPUFLAGS),__F16C__),)
+#CPUFLAGS += F16C
+#endif

 ifneq ($(filter $(AUTO_CPUFLAGS),__AVX2__),)
 CPUFLAGS += AVX2


[dpdk-dev] RTM instruction compile failure for XABORT when AVX is active

2015-06-30 Thread Matthew Hall
With those two items commented out, and these CFLAGS:

"-g -O0 -fPIC -msse4.2"

it looks like I can reproduce the issue in the clang 3.6 series:

/vagrant/external/dpdk/build/include/rte_rtm.h:56:15: error: invalid operand 
for inline asm constraint 'i'
asm volatile(".byte 0xc6,0xf8,%P0" :: "i" (status) : "memory");

So there are definitely some corner cases that seem to be able to trigger it.

On Jun 30, 2015, at 10:17 PM, Matthew Hall  wrote:

> To be a bit more specific, this is what I had to do to fix it for clang 3.6 
> SVN snapshot release.
> 
> I am not sure if there is a better way of handling this situation. I'd love 
> to know where I could improve it.
> 
> Matthew.
> 
> diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
> index f595cd0..8c883ee 100644
> --- a/mk/rte.cpuflags.mk
> +++ b/mk/rte.cpuflags.mk
> @@ -77,13 +77,13 @@ ifneq ($(filter $(AUTO_CPUFLAGS),__RDRND__),)
> CPUFLAGS += RDRAND
> endif
> 
> -ifneq ($(filter $(AUTO_CPUFLAGS),__FSGSBASE__),)
> -CPUFLAGS += FSGSBASE
> -endif
> +#ifneq ($(filter $(AUTO_CPUFLAGS),__FSGSBASE__),)
> +#CPUFLAGS += FSGSBASE
> +#endif
> 
> -ifneq ($(filter $(AUTO_CPUFLAGS),__F16C__),)
> -CPUFLAGS += F16C
> -endif
> +#ifneq ($(filter $(AUTO_CPUFLAGS),__F16C__),)
> +#CPUFLAGS += F16C
> +#endif
> 
> ifneq ($(filter $(AUTO_CPUFLAGS),__AVX2__),)
> CPUFLAGS += AVX2



[dpdk-dev] rte_lpm4 with expanded next hop support now available

2015-06-30 Thread Matthew Hall
Hello,

Based on the wonderful assistance from Vladimir and Stephen and a close friend 
of mine that is a hypervisor developer who helped me reverse engineer and 
rewrite rte_lpm_lookupx4, I have got a known-working version of rte_lpm4 with 
expanded 24 bit next hop support available here:

https://github.com/megahall/dpdk_mhall/tree/megahall/lpm-expansion

I'm going to be working on rte_lpm6 next. It seems to take a whole ton of 
memory to run the self-test, and it ran out of memory when I tried it, so if 
anybody knows how much it needs, that would help.

Sadly this change is not ABI compatible or performance compatible with the 
original rte_lpm because I had to hack on the bitwise layout to get more data 
in there, and it will run maybe 50% slower because it has to access some more 
memory.

Despite all this I'd really like to do the right thing and find a way to 
contribute it back, perhaps as a second kind of rte_lpm, so I wouldn't be the 
only person using it and forking the code, since I already met several others 
who needed it. I could use some ideas on how to handle the situation.

Matthew.


[dpdk-dev] RTM instruction compile failure for XABORT when AVX is active

2015-07-01 Thread Matthew Hall
Previously, with the -msse4.2 flag removed, the build failed for a different 
reason.

I can retry without it and see if it's the case in the new DPDK.

On Jul 1, 2015, at 4:10 AM, Bruce Richardson  
wrote:

> On Tue, Jun 30, 2015 at 10:49:26PM -0700, Matthew Hall wrote:
>> With those two items commented out, and these CFLAGS:
>> 
>> "-g -O0 -fPIC -msse4.2"
>> 
> 
> The recommended way of specifying a particular instruction set is via the
> RTE_MACHINE setting in your build time config. Can you perhaps reproduce the
> issue using a setting there?
> 
> /Bruce



[dpdk-dev] rte_lpm4 with expanded next hop support now available

2015-07-01 Thread Matthew Hall
On Jul 1, 2015, at 4:20 AM, Bruce Richardson  
wrote:
> Could you maybe send a patch (or set) with all your changes in it here for us
> to look at? [I did look at it in github, but I'm not very familiar with github
> and the changes seem to be spread over a whole series of commits]

Here is a view of the specific commits:

https://github.com/megahall/dpdk_mhall/compare/megahall/lpm-expansion

I'll work on emails when I get a moment. I was hoping since the branch is open 
to all for download someone could sync it and try it in an environment that has 
some kind of performance tests / known results for the self-tests as my 
development setup is not that great compared to some of the other DPDK 
engineers out there.

> In terms of ABI issues, the overall function set for lpm4 library is not that
> big, so it may be possible to maintain old a new copies of the functions in 
> parallel
> for one release, and solve the ABI issues that way. I'm quite keen to get 
> these
> changes in, since I think being limited to 255 next hops is quite a limitation
> for many cases.

Sounds good.

> A final interesting suggestion I might throw out, is: can we make the lpm 
> library
> configurable in that it can use either 8-bit, 16/24 bit or even pointer based
> next hops (I won't say 64-bit, as for pointers we might be able to get away
> with less than 64-bits being stored)? Would such a thing be useful to people?

I think this could be pretty nice, the tricky part is that, at least in the 
version Vladimir and Stephen helped me cook up, a lot of bitfield trickery was 
involved. So we'd need to switch away from bitfields to something a bit more 
flexible or easy to work with when variable configuration comes into the 
picture. Also not sure how it'd work at runtime versus compilation, etc. You 
guys know more than me about this stuff I think.

Matthew.



[dpdk-dev] DPDK Hash library

2015-07-02 Thread Matthew Hall
On Jul 2, 2015, at 4:20 AM, Dumitrescu, Cristian  wrote:
> I am wondering how can I use the hash library if I don't know the number
> of entries in the bucket (number of entries in the bucket can grow
> dynamically)
> I am trying to use the DPDK hash library for MAC table where I can't give
> the fixed number of elements in each bucket.

Another thing to keep in mind. DPDK hashes have an extremely large number of 
shallow buckets and pretty good spreading hash functions, This is for 
performance reasons to minimize the amount of cacheline loads. The chances of 
evicting buckets in this sort of hash table aren't really all that high so 
worrying about this might be overkill.

If you want a good quality hash table for weird stuff like variable-length 
keys, deeper buckets to guarantee items are preserved, etc., what I ended up 
doing was combining uthash with jemalloc in my app (you could also use 
rte_malloc but I didn't want to complicate the app too much until I get it 
feature complete and begin tuning it).

https://troydhanson.github.io/uthash/
https://github.com/troydhanson/uthash

http://www.canonware.com/jemalloc/
https://github.com/jemalloc/jemalloc

Matthew.


[dpdk-dev] DPDK Hash library

2015-07-02 Thread Matthew Hall
On Thu, Jul 02, 2015 at 05:55:20PM +, De Lara Guarch, Pablo wrote:
> You are probably talking about extendable buckets here.
> The downside of that approach is that you have to allocate memory on the fly,
> whereas with the cuckoo hash implementation, the entry can be stored in an 
> alternative bucket
> without having to reserve more memory (which also will take you more time).
> With this approach, hash tables can get a higher utilization, as other less 
> used
> buckets can be used to store keys from other busier buckets.
> 
> Pablo

Expanding and shrinking buckets constantly can also be concurrency-hostile, 
and is a lot more complicated to get right than just using a good rehash 
algorithm and a nice static hunk of memory on contiguous hugepages for minimal 
TLB / cache pressure.

If you want to do these more complex manipulations uthash is probably a better 
route. But it will be slower than the DPDK hashes by quite a ways I think. I 
used DPDK hash for my TCP socket table where everything is a very predictable 
size, but I had to use uthash for my unpredictably sized byte buffers for 
security indicators (IP, URL, Domain, Email, File Hash, etc.)

Of course, when you do this kind of stuff in your app it is going to give you 
scaling problems and you'll have to spend a lot of time tuning it.

Matthew.

