[dpdk-dev] [PATCH v3 2/5] Link Bonding PMD Library (librte_eal/librte_ether link bonding support changes)

2014-06-16 Thread Neil Horman
On Mon, Jun 16, 2014 at 04:17:03PM +, Richardson, Bruce wrote:
> 
> 
> > -Original Message-
> <...snip...>
> > > this doesn't seem like an idea solution either. I'm not 100% clear why
> > > rte_eal_pci_probe is currently called by  the application code and not 
> > > initiated
> > >  from within rte_eal_dev_init, if this was the case we would be able to 
> > > figure
> > out
> > > a dependency tree for initialization of devices, based on the parsed input
> > > arguments without having extra user input to or multiple calls of
> > rte_eal_dev_init.
> > >
> > I agree, I think you should move it into its proper place within 
> > rte_eal_init,
> > though I think multiple calls to rte_eal_dev_init is perfectly acceptible in
> > this scenario.  You certainly could parse the command line to build a 
> > dependency
> > tree if you wish, though I don't think thats crruently overly complex. The 
> > current
> > dependency tree is static at (physical devices, virtual devices, probe, 
> > stacked
> > devices).  If you need something more complex later, you can re-write it 
> > freely
> > as it will be an internal library mechanism with no external visibility.
> > 
> 
> How useful is this going to be? The more complicated the ethdev stacking 
> being done, the better suited it is to being done via the C APIs, via code 
> which can do analysis of the hardware and cores at runtime and make logical 
> decisions? After all, even if we do have applications being run to use bonded 
> devices at runtime in place of physical ones, the bonded devices don't 
> actually replace the physical ones, so some other application logic is needed 
> to ensure that the application knows to only use the bonded devices. In most 
> cases simply having a port-mask suffices, but not all apps will have a 
> port-mask parameter, and for those that do, it introduces complexity for the 
> user counting out interfaces and trying to work out what ports will have what 
> numbers to set the appropriate bits in the portmask.

Honestly?  I don't know.  As Stephen indicates the command line options might
not be overly used, as its not always the best interface to select when building
an application.  But by the same token, not everyone is building an application
that needs dynamic configuration with DPDK.  My main concern here is one of
consistency, which is really what people look for in a package within a
distribution (i.e. Fedora, what I'm doing with the dpdk right now). You're
probably correct in that lots of people will use the C api to build bonded
interfaces since its a new api and they won't have been using bonds yet.  But
I'm concerned that, with a distributed package, lots of people might also be
porting legacy applications to use the DPDK, in which case they may very well
want to create static configurations within the dpdk to lower their porting
efforts.  Those people may well be very turned off by the fact that some, but
not all interfaces can use the command line to be configured.

In the end, its all about consistency in my mind.  I get that the --vdev command
line parameter perhaps isn't the most useful interface available, but its whats
there.  And if you start creating PMD's that use separate configuration
interfaces and abandon the ones before it, you'll wind up with a hodgepodge of
apis, all of which an application will have to be aware of to provide a full
robust feature set.

If --vdev stays, we should make it work for all the PMD's.  Most might not use
it, but some will, and those that do will appreciate the consistency you
provide.  If it just doesn't make sense to keep it around anymore, lets drop it
and replace it with something better.  Perhaps a configuration file interface
would be good, so that the initialization of the pmds and the creation of
interfaces is separated from the acutal application (thats actually a good idea
I think, as it implies that all an application needs to know about is
interfaces, not how they are constructed or stacked).  But lets not just quietly
start abandoning stuff because its inconvienient.  

Neil



[dpdk-dev] [PATCH] vfio: make container open error non-fatal

2014-06-17 Thread Neil Horman
On Mon, Jun 16, 2014 at 10:30:54PM +, Richardson, Bruce wrote:
> The below patch is the quickest fix I found to make my applications work 
> again, but I'm not sure it's the best solution. Can anyone else offer other 
> suggestions to improve this?
> 
> > -Original Message-
> > From: Richardson, Bruce
> > Sent: Monday, June 16, 2014 3:29 PM
> > To: dev at dpdk.org
> > Cc: Richardson, Bruce
> > Subject: [PATCH] vfio: make container open error non-fatal
> > 
> > When setting up an app to run using the uio driver, errors caused by
> > VFIO failures should not abruptly cause the app to fail.
> > 
> > Example: on a board with 8 ports bound to igb_uio module, and no VFIO
> > configuration, a testpmd run currently fails with:
> > 
> > EAL:   cannot open VFIO container!
> > EAL:   :04:00.0 cannot open VFIO container!
> > EAL: Error - exiting with code: 1
> >   Cause: Requested device :04:00.0 cannot be used
> > 
> > With this patch applied, the problem with VFIO is ignored and testpmd
> > successfully starts up - with ignored errors with vfio - as below:
> > 
> > EAL: PCI device :04:00.0 on NUMA socket 0
> > EAL:   probe driver: 8086:1521 rte_igb_pmd
> > EAL:   unknown IOMMU driver!
> > EAL:   :04:00.0 cannot open VFIO container!
> > EAL:   :04:00.0 not managed by UIO driver, skipping
> > <...scan results for other ports skipped...>
> > EAL: PCI device :8e:00.0 on NUMA socket 1
> > EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> > EAL:   unknown IOMMU driver!
> > EAL:   :8e:00.0 cannot open VFIO container!
> > EAL:   PCI memory mapped at 0x7ff4ff5fa000
> > EAL:   PCI memory mapped at 0x7ff4ff5f6000
> > EAL: PCI device :8e:00.1 on NUMA socket 1
> > EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> > EAL:   unknown IOMMU driver!
> > EAL:   :8e:00.1 cannot open VFIO container!
> > EAL:   PCI memory mapped at 0x7ff4ff4f6000
> > EAL:   PCI memory mapped at 0x7ff4ff4f2000
> > Interactive-mode selected
> > Configuring Port 0 (socket 0)
> > <...other 7 ports ...>
> > Checking link statuses...
> > Port 0 Link Up - speed 1 Mbps - full-duplex
> > Port 1 Link Down
> > Port 2 Link Up - speed 1 Mbps - full-duplex
> > Port 3 Link Down
> > Port 4 Link Up - speed 1 Mbps - full-duplex
> > Port 5 Link Down
> > Port 6 Link Up - speed 1 Mbps - full-duplex
> > Port 7 Link Down
> > Done
> > testpmd>
> > 
> > This issue is introduced by the VFIO patch set addition, specifically
> > commit ff0b67d1.
> > 
> > Signed-off-by: Bruce Richardson 
> > ---
> >  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > index 4de6061..4af38f6 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > @@ -528,7 +528,7 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
> > int vfio_container_fd = pci_vfio_get_container_fd();
> > if (vfio_container_fd < 0) {
> > RTE_LOG(ERR, EAL, "  %s cannot open VFIO
> > container!\n", pci_addr);
> > -   return -1;
> > +   return 1;
> > }
> > 
> > vfio_cfg.vfio_container_fd = vfio_container_fd;
> > --
> > 1.9.3
> 
> 

I think it would be preferable to convert the pci_vfio_get_container_fd function
to return not -1, but some -ERRNO value, so that the caller can differentiate
between fatal and non-fatal errors (for instance, not having any vfio container
seems non-fatal, but having one with an incompatible api version may be
unworkable.

Neil



[dpdk-dev] vfio detection

2014-06-17 Thread Neil Horman
On Tue, Jun 17, 2014 at 04:38:38PM +, Richardson, Bruce wrote:
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Richardson, Bruce
> > Sent: Tuesday, June 17, 2014 9:29 AM
> > To: Burakov, Anatoly; dev at dpdk.org
> > Subject: Re: [dpdk-dev] vfio detection
> > 
> > > -Original Message-
> > > From: Burakov, Anatoly
> > > Sent: Tuesday, June 17, 2014 1:40 AM
> > > To: Richardson, Bruce; dev at dpdk.org
> > > Subject: RE: vfio detection
> > >
> > > Hi Bruce,
> > >
> > > > I have a number of NIC ports which were working correctly yesterday and
> > are
> > > > bound correctly to the igb_uio driver - and I want to keep using them
> > > > through the igb_uio driver for now, not vfio. However, whenever I run a
> > > > dpdk application today, I find that the vfio kernel module is getting 
> > > > loaded
> > > > each time - even after I manually remove it, and verify that it has been
> > > > removed by checking lsmod. Is this expected? If so, why are we loading 
> > > > the
> > > > vfio driver when I just want to continue using igb_uio which works fine?
> > >
> > > Can you elaborate a bit on what do you mean by "loading vfio driver"? Do 
> > > you
> > > mean the vfio-pci kernel gets loaded by DPDK? I certainly didn't put in 
> > > any code
> > > that would automatically load that driver, and certainly not binding 
> > > devices to
> > it.
> > 
> > The kernel module called just "vfio" is constantly getting reloaded, and 
> > there is
> > always a "/dev/vfio" directory, which triggers the vfio code handling every 
> > time I
> > run dpdk.
> > 
> > >
> > > > Secondly, then, when testpmd or any other app loads, it automatically 
> > > > tries
> > > > to map the NIC using vfio and then aborts on the very first NIC port 
> > > > when it
> > > > fails to do so.
> > >
> > > This shouldn't happen, unless you have a device bound to VFIO and have
> > another
> > > device in the same IOMMU group that is bound to something else. Can you
> > > provide a log of what you are seeing?
> > 
> > Log of testpmd run attached.
> 
> Log got stripped from mail, including below instead.
> 
> Script started on Tue 17 Jun 2014 17:23:54 IST
> 
> bruce at silpixa00372841:dpdk.org$ ./tools/dpdk_nic_bind.py --status
> 
> Network devices using DPDK-compatible driver
> 
> :84:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe
> :87:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe
> :8b:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe
> :8e:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe
> 
> Network devices using kernel driver
> ===
> :04:00.0 'I350 Gigabit Network Connection' if=em0 drv=igb unused=igb_uio 
> *Active*
> :04:00.1 'I350 Gigabit Network Connection' if=ens2f1 drv=igb 
> unused=igb_uio 
> :04:00.2 'I350 Gigabit Network Connection' if=ens2f2 drv=igb 
> unused=igb_uio 
> :04:00.3 'I350 Gigabit Network Connection' if=ens2f3 drv=igb 
> unused=igb_uio 
> 
> Other network devices
> =
> :0a:00.1 'DH8900CC Null Device' unused=igb_uio
> :0b:00.1 'DH8900CC Null Device' unused=igb_uio
> :0c:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' 
> unused=ixgbe,igb_uio
> :0c:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' 
> unused=ixgbe,igb_uio
> :84:00.1 'Ethernet Server Adapter X520-4' unused=ixgbe,igb_uio
> :87:00.1 'Ethernet Server Adapter X520-4' unused=ixgbe,igb_uio
> :8b:00.1 'Ethernet Server Adapter X520-4' unused=ixgbe,igb_uio
> :8e:00.1 'Ethernet Server Adapter X520-4' unused=ixgbe,igb_uio
> 
> 
> bruce at silpixa00372841:dpdk.org$ sudo 
> ./x86_64-native-linuxapp-gcc/app/testpmd -c F00 -n 4 -- -i
> EAL: Detected lcore 0 as core 0 on socket 0
> EAL: Detected lcore 1 as core 1 on socket 0
> EAL: Detected lcore 2 as core 2 on socket 0
> EAL: Detected lcore 3 as core 3 on socket 0
> EAL: Detected lcore 4 as core 4 on socket 0
> EAL: Detected lcore 5 as core 5 on socket 0
> EAL: Detected lcore 6 as core 6 on socket 0
> EAL: Detected lcore 7 as core 7 on socket 0
> EAL: Detected lcore 8 as core 0 on socket 1
> EAL: Detected lcore 9 as core 1 on socket 1
> EAL: Detected lcore 10 as core 2 on socket 1
> EAL: Detected lcore 11 as core 3 on socket 1
> EAL: Detected lcore 12 as core 4 on socket 1
> EAL: Detected lcore 13 as core 5 on socket 1
> EAL: Detected lcore 14 as core 6 on socket 1
> EAL: Detected lcore 15 as core 7 on socket 1
> EAL: Detected lcore 16 as core 0 on socket 0
> EAL: Detected lcore 17 as core 1 on socket 0
> EAL: Detected lcore 18 as core 2 on socket 0
> EAL: Detected lcore 19 as core 3 on socket 0
> EAL: Detected lcore 20 as core 4 on socket 0
> EAL: Detected lcore 21 as core 5 on socket 0
> EAL: Detected lcore 22 as core 6 on socket 0
> EAL: Detected lcore 23 as core 7 on socket 0
> EAL: Detected lcore 24 as core 0 on socket 1
> EAL: Detected lcore 25 as cor

[dpdk-dev] vfio detection

2014-06-17 Thread Neil Horman
On Tue, Jun 17, 2014 at 05:45:55PM +, Richardson, Bruce wrote:
> 
> 
> > -Original Message-
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Tuesday, June 17, 2014 10:42 AM
> > To: Richardson, Bruce
> > Cc: Burakov, Anatoly; dev at dpdk.org
> > Subject: Re: [dpdk-dev] vfio detection
> > 
> > On Tue, Jun 17, 2014 at 04:38:38PM +, Richardson, Bruce wrote:
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Richardson, 
> > > > Bruce
> > > > Sent: Tuesday, June 17, 2014 9:29 AM
> > > > To: Burakov, Anatoly; dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] vfio detection
> > > >
> > > > > -Original Message-
> > > > > From: Burakov, Anatoly
> > > > > Sent: Tuesday, June 17, 2014 1:40 AM
> > > > > To: Richardson, Bruce; dev at dpdk.org
> > > > > Subject: RE: vfio detection
> > > > >
> > > > > Hi Bruce,
> > > > >
> > > > > > I have a number of NIC ports which were working correctly yesterday 
> > > > > > and
> > > > are
> > > > > > bound correctly to the igb_uio driver - and I want to keep using 
> > > > > > them
> > > > > > through the igb_uio driver for now, not vfio. However, whenever I 
> > > > > > run a
> > > > > > dpdk application today, I find that the vfio kernel module is 
> > > > > > getting
> > loaded
> > > > > > each time - even after I manually remove it, and verify that it has 
> > > > > > been
> > > > > > removed by checking lsmod. Is this expected? If so, why are we 
> > > > > > loading
> > the
> > > > > > vfio driver when I just want to continue using igb_uio which works 
> > > > > > fine?
> > > > >
> > > > > Can you elaborate a bit on what do you mean by "loading vfio driver"? 
> > > > > Do
> > you
> > > > > mean the vfio-pci kernel gets loaded by DPDK? I certainly didn't put 
> > > > > in any
> > code
> > > > > that would automatically load that driver, and certainly not binding 
> > > > > devices
> > to
> > > > it.
> > > >
> > > > The kernel module called just "vfio" is constantly getting reloaded, 
> > > > and there
> > is
> > > > always a "/dev/vfio" directory, which triggers the vfio code handling 
> > > > every
> > time I
> > > > run dpdk.
> > > >
> > > > >
> > > > > > Secondly, then, when testpmd or any other app loads, it 
> > > > > > automatically
> > tries
> > > > > > to map the NIC using vfio and then aborts on the very first NIC 
> > > > > > port when
> > it
> > > > > > fails to do so.
> > > > >
> > > > > This shouldn't happen, unless you have a device bound to VFIO and have
> > > > another
> > > > > device in the same IOMMU group that is bound to something else. Can 
> > > > > you
> > > > > provide a log of what you are seeing?
> > > >
> > > > Log of testpmd run attached.
> > >
> > > Log got stripped from mail, including below instead.
> > >
> > > Script started on Tue 17 Jun 2014 17:23:54 IST
> > >
> > > bruce at silpixa00372841:dpdk.org$ ./tools/dpdk_nic_bind.py --status
> > >
> > > Network devices using DPDK-compatible driver
> > > 
> > > :84:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe
> > > :87:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe
> > > :8b:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe
> > > :8e:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe
> > >
> > > Network devices using kernel driver
> > > ===
> > > :04:00.0 'I350 Gigabit Network Connection' if=em0 drv=igb
> > unused=igb_uio *Active*
> > > :04:00.1 'I350 Gigabit Network Connection' if=ens2f1 drv=igb
> > unused=igb_uio
> > > :04:00.2 'I350 Gigabit Network Connection' if=ens2f2 drv=igb
> >

[dpdk-dev] [PATCH] vfio: correct system call error checking

2014-06-17 Thread Neil Horman
Noticed today that ioctl error code return checking was incorrect in some of the
vfio code.  ioctl can return a negative value if the system detects an error
before the target device/driver can produce a return code.  The dpdk vfio code
only checks specfically for the values that it expects, which leaves it open to
accepting unexpected error codes as success.  For instance, if the vfio layer
noted that the iommu driver hadn't finished registering yet, it would return an
-EINVAL error code, but the dpdk would accept that as success, becuase it wasn't
0.

Fix this to specifically check for < 0 error codes

Signed-off-by: Neil Horman 
CC: Thomas Monjalon 
---
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 4de6061..65aa8ad 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -319,16 +319,16 @@ pci_vfio_get_container_fd(void)

/* check VFIO API version */
ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);
-   if (ret != VFIO_API_VERSION) {
-   RTE_LOG(ERR, EAL, "  unknown VFIO API version!\n");
+   if ((ret < 0) || (ret != VFIO_API_VERSION)) {
+   RTE_LOG(ERR, EAL, "  unknown VFIO API version! errno = 
%d\n", errno);
close(vfio_container_fd);
return -1;
}

/* check if we support IOMMU type 1 */
ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION, 
VFIO_TYPE1_IOMMU);
-   if (!ret) {
-   RTE_LOG(ERR, EAL, "  unknown IOMMU driver!\n");
+   if (ret <= 0) {
+   RTE_LOG(ERR, EAL, "  unknown IOMMU driver! errno = 
%d\n", errno);
close(vfio_container_fd);
return -1;
}
-- 
1.8.3.1



[dpdk-dev] [PATCH] vfio: correct system call error checking

2014-06-17 Thread Neil Horman
On Tue, Jun 17, 2014 at 08:21:29PM +, Richardson, Bruce wrote:
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> > Sent: Tuesday, June 17, 2014 12:04 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH] vfio: correct system call error checking
> > 
> > Noticed today that ioctl error code return checking was incorrect in some 
> > of the
> > vfio code.  ioctl can return a negative value if the system detects an error
> > before the target device/driver can produce a return code.  The dpdk vfio 
> > code
> > only checks specfically for the values that it expects, which leaves it 
> > open to
> > accepting unexpected error codes as success.  For instance, if the vfio 
> > layer
> > noted that the iommu driver hadn't finished registering yet, it would 
> > return an
> > -EINVAL error code, but the dpdk would accept that as success, becuase it
> > wasn't
> > 0.
> > 
> > Fix this to specifically check for < 0 error codes
> > 
> > Signed-off-by: Neil Horman 
> > CC: Thomas Monjalon 
> > ---
> >  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > index 4de6061..65aa8ad 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > @@ -319,16 +319,16 @@ pci_vfio_get_container_fd(void)
> > 
> > /* check VFIO API version */
> > ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);
> > -   if (ret != VFIO_API_VERSION) {
> > -   RTE_LOG(ERR, EAL, "  unknown VFIO API version!\n");
> > +   if ((ret < 0) || (ret != VFIO_API_VERSION)) {
> > +   RTE_LOG(ERR, EAL, "  unknown VFIO API version! errno
> > = %d\n", errno);
> > close(vfio_container_fd);
> > return -1;
> > }
> 
> Not sure how this change improves things, since the existing check will 
> already trigger an error on all values <0. Can you please clarify why you 
> think this needs to be changed?
Ah, my bad, the ret < 0 is superfulous, as the != already catches it, but the
log message change is valuable in that it differentiates bad API version
detection from other system errors.  I can respin that if you like.
Neil

> 
> > 
> > /* check if we support IOMMU type 1 */
> > ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION,
> > VFIO_TYPE1_IOMMU);
> > -   if (!ret) {
> > -   RTE_LOG(ERR, EAL, "  unknown IOMMU driver!\n");
> > +   if (ret <= 0) {
> > +   RTE_LOG(ERR, EAL, "  unknown IOMMU driver! errno =
> > %d\n", errno);
> > close(vfio_container_fd);
> > return -1;
> > }
> 
> Ack on this change part. The previously code was incorrect according to what 
> I read in the docs for VFIO.
> 
> /Bruce
> 


[dpdk-dev] [PATCH] vfio: open VFIO container at startup rather than during init

2014-06-18 Thread Neil Horman
On Wed, Jun 18, 2014 at 10:26:08AM +, Burakov, Anatoly wrote:
> Hi Cristian,
> 
> > I would suggest we add a log message explaining which mechanism is loaded
> > (igb_uio/vfio) and why (e.g. tried vfio first but container could not be
> > opened, so falling back to igb_uio, etc).
> 
> This already happens.
> 
> If the container could not be loaded for whatever reason, the log message is 
> displayed (it did before, but the previous code didn't account for situations 
> such as Bruce was having, hence the patch). If VFIO is loaded and enabled, 
> all drivers will then report either being bound or skipped (e.g. "not bound 
> to VFIO, skipping").

I think what Thomas wants is for you to resend the patch with a proper changelog
entry added to it, so the commit has an explination of what was changed and, for
posterity.
Neil

> 
> Best regards,
> Anatoly Burakov
> DPDK SW Engineer
> 
> 
> 
> 


[dpdk-dev] [PATCH] vfio: open VFIO container at startup rather than during init

2014-06-18 Thread Neil Horman
On Wed, Jun 18, 2014 at 11:02:23AM +, Burakov, Anatoly wrote:
> Hi Neil
> 
> > I think what Thomas wants is for you to resend the patch with a proper
> > changelog entry added to it, so the commit has an explination of what was
> > changed and, for posterity.
> 
> Got it. Can I also incorporate your changes to error codes as well?
> 
If you want to take Bruce's suggestions and incoproate them as well from my
thread below, sure, as long as Thomas is ok with it.
Neil

> Best regards,
> Anatoly Burakov
> DPDK SW Engineer
> 


[dpdk-dev] [PATCH v2 0/2] Fix issues with VFIO

2014-06-18 Thread Neil Horman
On Wed, Jun 18, 2014 at 02:07:17PM +0100, Anatoly Burakov wrote:
> This patchset fixes an issue with VFIO where DPDK initialization could
> fail even if the user didn't want to use VFIO in the first place. Also,
> more verbose and descriptive error messages were added to VFIO code, for
> example distinguishing between a failed ioctl() call and an unsupported
> VFIO API version.
> 
> Anatoly Burakov (2):
>   vfio: open VFIO container at startup rather than during init
>   vfio: more verbose error messages
> 

You still need a changelog entry with each patch (the cover letter doesn't get
included with the patches in git).  You also probably want to mention that the
second patch also fixes a case in which syscall errors are erroneously treated
as a success case

Neil

>  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 63 
> --
>  1 file changed, 34 insertions(+), 29 deletions(-)
> 
> -- 
> 1.8.1.4
> 
> 


[dpdk-dev] [PATCH v5 0/6] Link Bonding Library

2014-06-18 Thread Neil Horman
On Wed, Jun 18, 2014 at 05:14:17PM +0100, Declan Doherty wrote:
> This patch contains the initial release of the Link Bonding PMD Library
> 
> Supporting bonding modes:
>  0 - Round Robin
>  1 - Active Backup
>  2 - Balance (Supporting 3 transmission polices)
>   layer 2, layer 2+3, layer 3+4
>  3 - Broadcast
> 
> Version 5 of patch set:
> Contains changes to EAL code to allow initialisation of Bonded devices from 
> application startup options. rte_eal_init now calls rte_eal_pci_probe 
> between calling rte_eal_dev_init with PRE and POST PCI probe flags. This gets
> around polluting the eal pci code with references to link bonding devices.
> Also rte_eal_pci_probe can now be called multiple times and will not try to
> re-initialize the driver if one already exists, this means that existing
> applications which currently call rte_eal_pci_probe will not be affected
> by this change
> 
> 
> Patch Set Description:
>  0001 - librte_pmd_bond + makefile changes
>  0002 - librte_ether changes to support unique naming of pmds 
>  0003 - librte_eal changes to support bonding device intialization
>  0005 - link bonding unti test suite
>  0005 - testpmd link bonding support changes
>  0006 - doxygen additions
> 
> 
> Declan Doherty (6):
>   Link Bonding Library (lib/librte_pmd_bond)
>   Support for unique interface naming of pmds
>   EAL support for link bonding device initialization
>   Link bonding Unit Tests
>   testpmd link bonding additions
>   Link Bonding Library doxygen additions
> 
>  app/test-pmd/cmdline.c  |  579 
>  app/test-pmd/config.c   |4 +-
>  app/test-pmd/parameters.c   |3 +
>  app/test-pmd/testpmd.c  |   40 +-
>  app/test-pmd/testpmd.h  |2 +
>  app/test/Makefile   |4 +-
>  app/test/commands.c |7 +
>  app/test/packet_burst_generator.c   |  287 ++
>  app/test/packet_burst_generator.h   |   78 +
>  app/test/test.h |1 +
>  app/test/test_link_bonding.c| 3958 
> +++
>  app/test/virtual_pmd.c  |  574 
>  app/test/virtual_pmd.h  |   74 +
>  config/common_bsdapp|5 +
>  config/common_linuxapp  |5 +
>  doc/doxy-api-index.md   |1 +
>  doc/doxy-api.conf   |1 +
>  lib/Makefile|1 +
>  lib/librte_eal/bsdapp/eal/eal.c |   10 +-
>  lib/librte_eal/common/eal_common_dev.c  |   58 +-
>  lib/librte_eal/common/eal_common_pci.c  |3 +
>  lib/librte_eal/common/include/eal_private.h |7 -
>  lib/librte_eal/common/include/rte_dev.h |   13 +-
>  lib/librte_eal/linuxapp/eal/eal.c   |   11 +-
>  lib/librte_ether/rte_ethdev.c   |   32 +-
>  lib/librte_ether/rte_ethdev.h   |7 +-
>  lib/librte_pmd_bond/Makefile|   32 +
>  lib/librte_pmd_bond/rte_eth_bond.c  | 2148 +++
>  lib/librte_pmd_bond/rte_eth_bond.h  |  255 ++
>  lib/librte_pmd_pcap/rte_eth_pcap.c  |   22 +-
>  lib/librte_pmd_ring/rte_eth_ring.c  |   32 +-
>  lib/librte_pmd_ring/rte_eth_ring.h  |3 +-
>  lib/librte_pmd_xenvirt/rte_eth_xenvirt.c|2 +-
>  mk/rte.app.mk   |5 +
>  34 files changed, 8193 insertions(+), 71 deletions(-)
>  create mode 100644 app/test/packet_burst_generator.c
>  create mode 100644 app/test/packet_burst_generator.h
>  create mode 100644 app/test/test_link_bonding.c
>  create mode 100644 app/test/virtual_pmd.c
>  create mode 100644 app/test/virtual_pmd.h
>  create mode 100644 lib/librte_pmd_bond/Makefile
>  create mode 100644 lib/librte_pmd_bond/rte_eth_bond.c
>  create mode 100644 lib/librte_pmd_bond/rte_eth_bond.h
> 
> 
For the series
Acked-by: Neil Horman 

Thanks for all the hard work!
Neil



[dpdk-dev] [PATCH v3 0/2] Fix issues with VFIO

2014-06-18 Thread Neil Horman
On Wed, Jun 18, 2014 at 04:07:15PM +0100, Anatoly Burakov wrote:
> This patchset fixes an issue with VFIO where DPDK initialization could
> fail even if the user didn't want to use VFIO in the first place. Also,
> more verbose and descriptive error messages were added to VFIO code, for
> example distinguishing between a failed ioctl() call and an unsupported
> VFIO API version.
> 
> Anatoly Burakov (2):
>   vfio: open VFIO container at startup rather than during init
>   vfio: more verbose error messages
> 
>  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 63 
> --
>  1 file changed, 34 insertions(+), 29 deletions(-)
> 
> -- 
> 1.8.1.4
> 
> 

Series
Acked-by: Neil Horman 
Thanks Anatoly!
Neil



[dpdk-dev] Random mbuf corruption

2014-06-24 Thread Neil Horman
On Mon, Jun 23, 2014 at 05:43:21PM -0400, Stefan Baranoff wrote:
> Paul,
> 
> Thanks for the advice; we ran memtest as well as the Dell complete system
> diagnostic and neither found an issue. The plot thickens, though!
> 
> Our admins messed up our kickstart labels and what I *thought* was CentOS
> 6.4 was actually RHEL 6.4 and the problem seems to be following the CentOS
> 6.4 installations -- the current configuration of success/failure is:
>   1 server - Westmere - RHEL 6.4 -- works
>   1 server - Sandy Bridge - RHEL 6.4 -- works
>   2 servers - Sandy Bridge - CentOS 6.4 -- fails
> 
there were several memory corruptors fixed in between RHEL 6.3 and RHEL 6.5.
Its possible that CentOS didn't get one of those patches, if it went out in a
zstream or something.  Is there an older version of RHEL that recreates the
problem for you?  If so, I can provide a list of bugs/fixes that may be related,
which you could cross check against centos for inclusion.

> Given that the hardware seems otherwise stable/checks out I'm trying to
> figure out how to determine if this is:
>   a) our software has a bug
>   b) a kernel/hugetlbfs bug
>   c) a  DPDK 1.6.0r2 bug
> 
> I have seen similar issues where calling rte_eal_init too late in a process
> also causes similar issues (things like calling 'free' on memory that was
> allocated with 'malloc' before 'rte_eal_init' is called fails/results in
> segfault in libc) which seems odd to me but in this case we are calling
> rte_eal_init as the first thing we do in main().
> 
Sounds like it might be time to add in some poisoning options to the mbuf
allocator.

Neil

> 
> Thanks,
> Stefan
> 
> 
> On Fri, Jun 20, 2014 at 9:59 AM, Paul Barrette  windriver.com>
> wrote:
> 
> >
> > On 06/20/2014 07:20 AM, Stefan Baranoff wrote:
> >
> >> All,
> >>
> >> We are seeing 'random' memory corruption in mbufs coming from the ixgbe
> >> UIO
> >> driver and I am looking for some pointers on debugging it. Our software
> >> was
> >> running flawlessly for weeks at a time on our old Westmere systems (CentOS
> >> 6.4) but since moving to a new Sandy Bridge v2 server (also CentOS 6.4) it
> >> runs for 1-2 minutes and then at least one mbuf is overwritten with
> >> arbitrary data (pointers/lengths/RSS value/num segs/etc. are all
> >> ridiculous). Both servers are using the 82599EB chipset (x520) and the
> >> DPDK
> >> version (1.6.0r2) is identical. We recently also tested on a third server
> >> running RHEL 6.4 with the same hardware as the failing Sandy Bridge based
> >> system and it is fine (days of runtime no failures).
> >>
> >> Running all of this in GDB with 'record' enabled and setting a watchpoint
> >> on the address which contains the corrupted data and executing a
> >> 'reverse-continue' never hits the watchpoint [GDB newbie here -- assuming
> >> 'watch *(uint64_t*)0x7FB.' should work]. My first thought was memory
> >> corruption but the BIOS memcheck on the ECC RAM shows no issues.
> >>
> >> Also looking at mbuf->pkt.data, as an example, the corrupt value was the
> >> same 6/12 trials but I could not find that value elsewhere in the
> >> processes
> >> memory. This doesn't seem "random" and points to a software bug but I
> >> cannot for the life of me get GDB to tell me where the program is when
> >> that
> >> memory is written to. Incidentally trying this with the PCAP driver and
> >> --no-huge to run valgrind shows no memory access errors/uninitialized
> >> values/etc.
> >>
> >> Thoughts? Pointers? Ways to rule in/out hardware other than going 1 by 1
> >> removing each of the 24 DIMMs?
> >>
> >> Thanks so much in advance!
> >> Stefan
> >>
> > Run memtest to rule out bad ram.
> >
> > Pb
> >
> 


[dpdk-dev] [PATCH] skeleton app: Very simple code for l2fwding

2014-06-26 Thread Neil Horman
On Thu, Jun 26, 2014 at 09:22:40PM +0100, Bruce Richardson wrote:
> This is a very simple example app for doing packet forwarding with the
> Intel DPDK. It's designed to serve as a start point for people new to
> the Intel DPDK and who want to develop a new app.
> 
> Therefore it's meant to:
> * have as good a performance out-of-the-box as possible, using the
>   best-known settings for configuring the PMDs, so that any new apps can
>   be based off it.
> * be kept as short as possible to make it easy to understand it and get
>   started with it.
> 
> Signed-off-by: Bruce Richardson 
Isn't there already an l2fwd example app?
Neil

> ---
>  examples/Makefile|   1 +
>  examples/skeleton_app/Makefile   |  57 ++
>  examples/skeleton_app/basicfwd.c | 236 
> +++
>  examples/skeleton_app/basicfwd.h |  46 
>  4 files changed, 340 insertions(+)
>  create mode 100644 examples/skeleton_app/Makefile
>  create mode 100644 examples/skeleton_app/basicfwd.c
>  create mode 100644 examples/skeleton_app/basicfwd.h
> 
> diff --git a/examples/Makefile b/examples/Makefile
> index 4353b84..2605d1d 100644
> --- a/examples/Makefile
> +++ b/examples/Makefile
> @@ -60,6 +60,7 @@ DIRS-y += netmap_compat/bridge
>  DIRS-$(CONFIG_RTE_LIBRTE_METER) += qos_meter
>  DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += qos_sched
>  DIRS-y += quota_watermark
> +DIRS-y += skeleton_app
>  DIRS-y += timer
>  DIRS-y += vhost
>  DIRS-$(CONFIG_RTE_LIBRTE_XEN_DOM0) += vhost_xen
> diff --git a/examples/skeleton_app/Makefile b/examples/skeleton_app/Makefile
> new file mode 100644
> index 000..244f4ef
> --- /dev/null
> +++ b/examples/skeleton_app/Makefile
> @@ -0,0 +1,57 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +# * Redistributions of source code must retain the above copyright
> +#   notice, this list of conditions and the following disclaimer.
> +# * Redistributions in binary form must reproduce the above copyright
> +#   notice, this list of conditions and the following disclaimer in
> +#   the documentation and/or other materials provided with the
> +#   distribution.
> +# * Neither the name of Intel Corporation nor the names of its
> +#   contributors may be used to endorse or promote products derived
> +#   from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +ifeq ($(RTE_SDK),)
> +$(error "Please define RTE_SDK environment variable")
> +endif
> +
> +# Default target, can be overriden by command line or environment
> +RTE_TARGET ?= x86_64-default-linuxapp-gcc
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +# binary name
> +APP = basicfwd
> +
> +# all source are stored in SRCS-y
> +SRCS-y := basicfwd.c
> +
> +CFLAGS += $(WERROR_FLAGS)
> +
> +# workaround for a gcc bug with noreturn attribute
> +# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
> +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
> +CFLAGS_main.o += -Wno-return-type
> +endif
> +
> +EXTRA_CFLAGS += -O3 -g -Wfatal-errors
> +
> +include $(RTE_SDK)/mk/rte.extapp.mk
> diff --git a/examples/skeleton_app/basicfwd.c 
> b/examples/skeleton_app/basicfwd.c
> new file mode 100644
> index 000..bf51a9d
> --- /dev/null
> +++ b/examples/skeleton_app/basicfwd.c
> @@ -0,0 +1,236 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other

[dpdk-dev] [PATCH] test: Ring PMD unit test rollback

2014-06-27 Thread Neil Horman
On Fri, Jun 27, 2014 at 03:46:39PM +0100, Pablo de Lara wrote:
> From: Pablo de Lara 
> 
> New ring PMD unit test requires extra EAL option (vdev),
> in order to pass, which I believe is incorrect, as this
> test should be focused on testing the API, without needing
> to pass any argument.
> 
> Added again all functions that create all the ring devices
> necessary for the test, and remove the last two tests
> (test_pmd_ring_pair_create and test_pmd_ring_pair_attach),
> as they call functions that are deprecated.
> 
> Signed-off-by: Pablo de Lara 

Nak, this also re-adds the need to link the pmd to the application, either
statically or at runtime (i.e. this implicitly obviates the -d option in
rte_eal_parse_args).  If you really want to re-add these api calls, I recommend
that you split them into a separate ring library and ring_pmd library.

Neil

P.S. you did remember to use the -d option in your testing, right?  Without it
the pmd won't load and the vdev option won't find the proper pmd to parse it.


> ---
>  app/test/test_pmd_ring.c |  217 
> +++---
>  1 files changed, 72 insertions(+), 145 deletions(-)
> 
> diff --git a/app/test/test_pmd_ring.c b/app/test/test_pmd_ring.c
> index 0d3d95c..6988277 100644
> --- a/app/test/test_pmd_ring.c
> +++ b/app/test/test_pmd_ring.c
> @@ -42,6 +42,7 @@
>  /* two test rings, r1 is used by two ports, r2 just by one */
>  static struct rte_ring *r1[2], *r2;
>  
> +static struct rte_ring *nullring = NULL;
>  static struct rte_mempool *mp;
>  static uint8_t start_idx; /* will store the port id of the first of our new 
> ports */
>  
> @@ -49,8 +50,6 @@ static uint8_t start_idx; /* will store the port id of the 
> first of our new port
>  #define RX_PORT (uint8_t)(start_idx + 2)
>  #define RXTX_PORT (uint8_t)(start_idx + 3)
>  #define RXTX_PORT2 (uint8_t)(start_idx + 4)
> -#define RXTX_PORT4 (uint8_t)(start_idx + 6)
> -#define RXTX_PORT5 (uint8_t)(start_idx + 7)
>  #define SOCKET0 0
>  
>  #define RING_SIZE 256
> @@ -58,6 +57,58 @@ static uint8_t start_idx; /* will store the port id of the 
> first of our new port
>  #define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)
>  #define NB_MBUF   512
>  
> +
> +static int
> +test_ring_ethdev_create(void)
> +{
> + int retval;
> + printf("Testing ring pmd create\n");
> +
> + retval = rte_eth_from_rings(NULL, 0, NULL, 0, SOCKET0);
> + if (retval < 0) {
> + printf("Failure, failed to create zero-sized RXTX ring pmd\n");
> + return -1;
> + }
> +
> + retval = rte_eth_from_rings(NULL, 0, NULL, 0, RTE_MAX_NUMA_NODES);
> + if (retval >= 0) {
> + printf("Failure, can create ring pmd on socket %d\n", 
> RTE_MAX_NUMA_NODES);
> + return -1;
> + }
> +
> + retval = rte_eth_from_rings(NULL, 1, &r2, 1, SOCKET0);
> + if (retval >= 0) {
> + printf("Failure, can create pmd with null rx rings\n");
> + return -1;
> + }
> +
> + retval = rte_eth_from_rings(r1, 1, NULL, 1, SOCKET0);
> + if (retval >= 0) {
> + printf("Failure, can create pmd with null tx rings\n");
> + return -1;
> + }
> +
> + retval = rte_eth_from_rings(&nullring, 1, r1, 2, SOCKET0);
> + if (retval < 0) {
> + printf("Failure, failed to create TX-only ring pmd\n");
> + return -1;
> + }
> +
> + retval = rte_eth_from_rings(r1, 1, &nullring, 1, SOCKET0);
> + if (retval < 0) {
> + printf("Failure, failed to create RX-only ring pmd\n");
> + return -1;
> + }
> +
> + retval = rte_eth_from_rings(&r2, 1, &r2, 1, SOCKET0);
> + if (retval < 0) {
> + printf("Failure, failed to create RXTX ring pmd\n");
> + return -1;
> + }
> +
> + return 0;
> +}
> +
>  static int
>  test_ethdev_configure(void)
>  {
> @@ -252,12 +303,26 @@ test_stats_reset(void)
>  static int
>  test_pmd_ring_init(void)
>  {
> + const char * name1 = "R3";
> + const char * name2 = "R4";
> + const char * params_null = NULL;
> + const char * params = "PARAMS";
>   struct rte_eth_stats stats;
>   struct rte_mbuf buf, *pbuf = &buf;
>   struct rte_eth_conf null_conf;
>  
>   printf("Testing ring pmd init\n");
>  
> + if (rte_pmd_ring_devinit(name1, params_null) < 0) {
> + printf("Testing ring pmd init fail\n");
> + return -1;
> + }
> +
> + if (rte_pmd_ring_devinit(name2, params) < 0) {
> + printf("Testing ring pmd init fail\n");
> + return -1;
> + }
> +
>   if (RXTX_PORT2 >= RTE_MAX_ETHPORTS) {
>   printf(" TX/RX port exceed max eth ports\n");
>   return -1;
> @@ -304,144 +369,8 @@ test_pmd_ring_init(void)
>  
>   rte_eth_dev_stop(RXTX_PORT2);
>  
> - return 0;
> -}
> -
> -static int
> -test_pmd_ring_pair_create(void)
> -{
> - struct rte_eth_stats stats, stats2;
> - struct rte_mbuf bu

[dpdk-dev] [PATCH] test: Ring PMD unit test rollback

2014-06-27 Thread Neil Horman
On Fri, Jun 27, 2014 at 03:47:07PM +, De Lara Guarch, Pablo wrote:
> Hi Neil,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> > Sent: Friday, June 27, 2014 4:03 PM
> > To: De Lara Guarch, Pablo
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] test: Ring PMD unit test rollback
> > 
> > On Fri, Jun 27, 2014 at 03:46:39PM +0100, Pablo de Lara wrote:
> > > From: Pablo de Lara 
> > >
> > > New ring PMD unit test requires extra EAL option (vdev),
> > > in order to pass, which I believe is incorrect, as this
> > > test should be focused on testing the API, without needing
> > > to pass any argument.
> > >
> > > Added again all functions that create all the ring devices
> > > necessary for the test, and remove the last two tests
> > > (test_pmd_ring_pair_create and test_pmd_ring_pair_attach),
> > > as they call functions that are deprecated.
> > >
> > > Signed-off-by: Pablo de Lara 
> > 
> > Nak, this also re-adds the need to link the pmd to the application, either
> > statically or at runtime (i.e. this implicitly obviates the -d option in
> > rte_eal_parse_args).  If you really want to re-add these api calls, I
> > recommend
> > that you split them into a separate ring library and ring_pmd library.
> > 
> > Neil
> > 
> > P.S. you did remember to use the -d option in your testing, right?  Without 
> > it
> > the pmd won't load and the vdev option won't find the proper pmd to parse
> > it.
> > 
> 
> Why does it need the -d? I could use the vdev option without loading any 
> dynamic pmds, and
You don't need it only if you build the application statically.  If you build it
as shared objects you need to specify which pmds to load at run time using the
-d option.  If you're building statically then no, you don't need it.

> it created the devices, but it is just difficult to run a simple unit test, 
> by having to know which
> ethdevs you have to create, instead of letting the test itself to create the 
> ones that it needs,
> if you know what I mean. So, if this solution is not good enough, another 
> idea could be to let 
> the unit test make calls to the application, creating the virtual devices 
> needed. What I don't 
> like is the idea of having to pass the vdev to the generic test application, 
> as it is intended to 
> run other tests and not just the ring pmd (we could do something like in the 
> eal_flags test,
> where several test app instances are run, with different parameters, although 
> I am not sure,
> if this is possible for this kind of test).
> 


Alternatively to restoring the API, you can hardcode the command line strings
into the test pmd, so you don't have to specify them at run time, and just pass
them directly to rte_eal_init.

The problem with just blindly adding back the API is that it requires any
application that might want to use the ring pmd to load it at run time
unilaterally.

Regards
Neil

> Thanks,
> Pablo
> 


[dpdk-dev] hardware neutral driver for DPDK

2014-03-06 Thread Neil Horman
Hey there-
I'm interested in doing some work on the DPDK, specifically in creating
a driver backend that interfaces to the kernel using AF_PACKET rather than a
specific card for which a UIO or VFIO driver is available in kernel.  I think it
would be nice for the DPDK to have a hardware agnostic driver for development
purposes.  Before I begin I wanted to make sure that something like this hasn't
already been done. I don't see any clear reference to one in the git repo, but I
figured I'd ask just to be sure there wasn't some external project in the works
already.

Thanks & Regards
Neil



[dpdk-dev] hardware neutral driver for DPDK

2014-03-06 Thread Neil Horman
On Thu, Mar 06, 2014 at 10:26:21PM +0100, Vincent JARDIN wrote:
> librte_pmd_pcap
>   http://dpdk.org/browse/dpdk/tree/lib/librte_pmd_pcap
> 
Perfect, thanks!
Neil

> On 06/03/2014 22:25, Neil Horman wrote:
> >Hey there-
> > I'm interested in doing some work on the DPDK, specifically in creating
> >a driver backend that interfaces to the kernel using AF_PACKET rather than a
> >specific card for which a UIO or VFIO driver is available in kernel.  I 
> >think it
> >would be nice for the DPDK to have a hardware agnostic driver for development
> >purposes.  Before I begin I wanted to make sure that something like this 
> >hasn't
> >already been done. I don't see any clear reference to one in the git repo, 
> >but I
> >figured I'd ask just to be sure there wasn't some external project in the 
> >works
> >already.
> >
> >Thanks & Regards
> >Neil
> >
> 
> 


[dpdk-dev] [PATCH] eal: fix up bad asm in rte_cpu_get_features

2014-03-18 Thread Neil Horman
The recent conversion to build dpdk as a DSO has an error in
rte_cpu_get_features.  When being build with -fpie, %ebx gets clobbered by the
cpuid instruction which is also the pic register.  Therefore the inline asm
tries to save off %ebx, but does so incorrectly.  It starts by loading
params.ebx to "D" which is %edi, but then the first instruction moves %ebx to
%edi, clobbering the input value. Then after the operation is complete, "D"
(%edi) is stored to the local ebx variable, but only after the xchgl instruction
has happened, which means ebx holds only the PIC pointer.  This behavior was
causing strange segfults for me when running the cpuid instruction.

The fix is pretty easy, split the asm into two separate directives, the first
saving ebx, and using it to grab the appropriate cpuid info (and correctly
listing %edi as a clobbered register in the process, and then a subsequent asm
directive preforming the reverse exchange (again, listing %edi as being
clobbered).

Signed-off-by: Neil Horman 
---
 lib/librte_eal/common/eal_common_cpuflags.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_cpuflags.c 
b/lib/librte_eal/common/eal_common_cpuflags.c
index 1ebf78c..2072a0c 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -208,16 +208,19 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
asm volatile ( 
 "mov %%ebx, %%edi\n"
 "cpuid\n"
-"xchgl %%ebx, %%edi;\n"
 : "=a" (eax),
-  "=D" (ebx),
+  "=b" (ebx),
   "=c" (ecx),
   "=d" (edx)
 /* input */
 : "a" (params.eax),
-  "D" (params.ebx),
+  "b" (params.ebx),
   "c" (params.ecx),
-  "d" (params.edx));
+  "d" (params.edx)
+   : "%edi");
+
+   asm volatile ("xchgl %%ebx, %%edi;\n"
+ : :);
 #endif

switch (params.return_register) {
-- 
1.8.3.1



[dpdk-dev] [PATCH v2] eal: fix up bad asm in rte_cpu_get_features

2014-03-19 Thread Neil Horman
The recent conversion to build dpdk as a DSO has an error in
rte_cpu_get_features.  When being build with -fpie, %ebx gets clobbered by the
cpuid instruction which is also the pic register.  Therefore the inline asm
tries to save off %ebx, but does so incorrectly.  It starts by loading
params.ebx to "D" which is %edi, but then the first instruction moves %ebx to
%edi, clobbering the input value. Then after the operation is complete, "D"
(%edi) is stored to the local ebx variable, but only after the xchgl instruction
has happened, which means ebx holds only the PIC pointer.  This behavior was
causing strange segfults for me when running the cpuid instruction.

The fix is pretty easy, split the asm into two separate directives, the first
saving ebx, and using it to grab the appropriate cpuid info (and correctly
listing %edi as a clobbered register in the process, and then a subsequent asm
directive preforming the reverse exchange (again, listing %edi as being
clobbered).

Signed-off-by: Neil Horman 

---
Change notes

v2) Fix constraints to ensure that ebx isn't overwritten before asm starts
---
 lib/librte_eal/common/eal_common_cpuflags.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_cpuflags.c 
b/lib/librte_eal/common/eal_common_cpuflags.c
index 1ebf78c..75b505f 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -190,7 +190,7 @@ static const struct feature_entry cpu_feature_table[] = {
 static inline int
 rte_cpu_get_features(struct cpuid_parameters_t params)
 {
-   int eax, ebx, ecx, edx;/* registers */
+   int eax, ebx, ecx, edx, oldebx;/* registers */

 #ifndef __PIC__
asm volatile ("cpuid"
@@ -206,18 +206,21 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
"d" (params.edx));
 #else
asm volatile ( 
-"mov %%ebx, %%edi\n"
+"xchgl %%ebx, %%edi\n"
 "cpuid\n"
-"xchgl %%ebx, %%edi;\n"
 : "=a" (eax),
-  "=D" (ebx),
+  "=b" (ebx),
   "=c" (ecx),
-  "=d" (edx)
+  "=d" (edx),
+ "=D" (oldebx)
 /* input */
 : "a" (params.eax),
   "D" (params.ebx),
   "c" (params.ecx),
   "d" (params.edx));
+
+   asm volatile ("xchgl %%ebx, %%edi;\n"
+ : : "D" (oldebx));
 #endif

switch (params.return_register) {
-- 
1.8.3.1



[dpdk-dev] [PATCH v2] eal: fix up bad asm in rte_cpu_get_features

2014-03-19 Thread Neil Horman
On Wed, Mar 19, 2014 at 08:44:46AM -0700, H. Peter Anvin wrote:
> On 03/19/2014 07:48 AM, Neil Horman wrote:
> > The recent conversion to build dpdk as a DSO has an error in
> > rte_cpu_get_features.  When being build with -fpie, %ebx gets clobbered by 
> > the
> > cpuid instruction which is also the pic register.  Therefore the inline asm
> > tries to save off %ebx, but does so incorrectly.  It starts by loading
> > params.ebx to "D" which is %edi, but then the first instruction moves %ebx 
> > to
> > %edi, clobbering the input value. Then after the operation is complete, "D"
> > (%edi) is stored to the local ebx variable, but only after the xchgl 
> > instruction
> > has happened, which means ebx holds only the PIC pointer.  This behavior was
> > causing strange segfults for me when running the cpuid instruction.
> > 
> > The fix is pretty easy, split the asm into two separate directives, the 
> > first
> > saving ebx, and using it to grab the appropriate cpuid info (and correctly
> > listing %edi as a clobbered register in the process, and then a subsequent 
> > asm
> > directive preforming the reverse exchange (again, listing %edi as being
> > clobbered).
> > 
> > Signed-off-by: Neil Horman 
> > 
> 
So after some discussion with hpa, I need to self NAK this again, apologies for
the noise.  Theres some clean up to be done in this area, and I'm still getting
a segfault that is in some way related to this code, but I need to dig deeper to
understand it.

Neil

> Hi Neil :)
> 
> If I'm reading this correctly, this is at the very best extremely
> dangerous (I'm confused why it would compile at all with PIC enabled),
> since it leaves the CPU state in an unexpected way between two asm
> statements, where the compiler is perfectly allowed to put code.
> 
> Instead, I would do simple xchg/xchg, which is an idiom I have used for
> this particular purpose in a lot of code.  The minimal patch is simply
> to change "mov" to "xchg" inside the asm statement.
> 
> There is no fundamental reason to nail down the register to %edi,
> though; thus I would suggest instead:
> 
> diff --git a/lib/librte_eal/common/eal_common_cpuflags.c
> b/lib/librte_eal/common/eal_common_cpuflags.c
> index 1ebf78cc2a48..6b75992fec1a 100644
> --- a/lib/librte_eal/common/eal_common_cpuflags.c
> +++ b/lib/librte_eal/common/eal_common_cpuflags.c
> @@ -206,16 +206,16 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
> "d" (params.edx));
>  #else
> asm volatile (
> -"mov %%ebx, %%edi\n"
> +"xchgl %%ebx, %1\n"
>  "cpuid\n"
> -"xchgl %%ebx, %%edi;\n"
> +"xchgl %%ebx, %1;\n"
>  : "=a" (eax),
> -  "=D" (ebx),
> +  "=r" (ebx),
>"=c" (ecx),
>"=d" (edx)
>  /* input */
>  : "a" (params.eax),
> -  "D" (params.ebx),
> +  "1" (params.ebx),
>"c" (params.ecx),
>"d" (params.edx));
>  #endif
> 
> 
> > ---
> > Change notes
> > 
> > v2) Fix constraints to ensure that ebx isn't overwritten before asm starts
> > ---
> >  lib/librte_eal/common/eal_common_cpuflags.c | 13 -
> >  1 file changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/lib/librte_eal/common/eal_common_cpuflags.c 
> > b/lib/librte_eal/common/eal_common_cpuflags.c
> > index 1ebf78c..75b505f 100644
> > --- a/lib/librte_eal/common/eal_common_cpuflags.c
> > +++ b/lib/librte_eal/common/eal_common_cpuflags.c
> > @@ -190,7 +190,7 @@ static const struct feature_entry cpu_feature_table[] = 
> > {
> >  static inline int
> >  rte_cpu_get_features(struct cpuid_parameters_t params)
> >  {
> > -   int eax, ebx, ecx, edx;/* registers */
> > +   int eax, ebx, ecx, edx, oldebx;/* registers */
> >  
> >  #ifndef __PIC__
> > asm volatile ("cpuid"
> > @@ -206,18 +206,21 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
> > "d" (params.edx));
> >  #else
> > asm volatile ( 
> > -"mov %%ebx, %%edi\n"
> > +"xchgl %%ebx, %%edi\n"
> >  "cpuid\n"
> > -"xchgl %%ebx, %%edi;\n"
> >  : "=a" (eax),
> > -  "=D" (ebx),
> > +  "=b" (ebx),
> >"=c" (ecx),
> > -  "=d" (edx)
> > +  "=d" (edx),
> > + "=D" (oldebx)
> >  /* input */
> >  : "a" (params.eax),
> >"D" (params.ebx),
> >"c" (params.ecx),
> >"d" (params.edx));
> > +
> > +   asm volatile ("xchgl %%ebx, %%edi;\n"
> > + : : "D" (oldebx));
> >  #endif
> >  
> > switch (params.return_register) {
> > 
> 
> 


[dpdk-dev] [PATCH v2] eal: fix up bad asm in rte_cpu_get_features

2014-03-20 Thread Neil Horman
On Wed, Mar 19, 2014 at 09:22:03PM -0700, H. Peter Anvin wrote:
> On 03/19/2014 05:40 PM, Neil Horman wrote:
> > So after some discussion with hpa, I need to self NAK this again, apologies 
> > for
> > the noise.  Theres some clean up to be done in this area, and I'm still 
> > getting
> > a segfault that is in some way related to this code, but I need to dig 
> > deeper to
> > understand it.
> > 
> > Neil
> 
> I still believe we should add the patch I posted in the previous email;
> I should clean it up and put a proper header on it.
> 
I agree, but the fact of the matter is that I'm still getting a segfault very
close to these instructions and I dont' understand why yet.  I'd hate to just
make the problem go away without understanding the reason why.  The patch you
propose doesn't fix (yet moving the xchgl to its own asm statement does).

> This is, if there is actually a need to feed %ebx and %edx into CPUID
> (the native instruction is sensitive to %eax and %ecx, but not %ebx or
> %edx.)
> 
> For reference, this is a version of CPUID I personally often use:
> 
> struct cpuid {
>   unsigned int eax, ecx, edx, ebx;
> };
> 
> static inline void cpuid(unsigned int leaf, unsigned int subleaf,
>struct cpuid *out)
> {
> #if defined(__i386__) && defined(__PIC__)
So, this is an additional difference and this in fact does make the problem
clear up.  By applying only this patch:

@@ -192,7 +192,7 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
 {
int eax, ebx, ecx, edx;/* registers */

-#ifndef __PIC__
+#if !defined(__PIC__) || !defined(__i386__)
asm volatile ("cpuid"
  /* output */
  : "=a" (eax),

my build compiles the cpuid instruction branch, not the mov;cpuid; xchgl branch
(its an x86_64 build).  Is there any reason that x86_64 doesn't need to save the
ebx register when running cpuid while building PIE code?

Neil



[dpdk-dev] [PATCH v2] eal: fix up bad asm in rte_cpu_get_features

2014-03-20 Thread Neil Horman
On Thu, Mar 20, 2014 at 07:03:23AM -0400, Neil Horman wrote:
> On Wed, Mar 19, 2014 at 09:22:03PM -0700, H. Peter Anvin wrote:
> > On 03/19/2014 05:40 PM, Neil Horman wrote:
> > > So after some discussion with hpa, I need to self NAK this again, 
> > > apologies for
> > > the noise.  Theres some clean up to be done in this area, and I'm still 
> > > getting
> > > a segfault that is in some way related to this code, but I need to dig 
> > > deeper to
> > > understand it.
> > > 
> > > Neil
> > 
> > I still believe we should add the patch I posted in the previous email;
> > I should clean it up and put a proper header on it.
> > 
> I agree, but the fact of the matter is that I'm still getting a segfault very
> close to these instructions and I dont' understand why yet.  I'd hate to just
> make the problem go away without understanding the reason why.  The patch you
> propose doesn't fix (yet moving the xchgl to its own asm statement does).
> 
> > This is, if there is actually a need to feed %ebx and %edx into CPUID
> > (the native instruction is sensitive to %eax and %ecx, but not %ebx or
> > %edx.)
> > 
> > For reference, this is a version of CPUID I personally often use:
> > 
> > struct cpuid {
> > unsigned int eax, ecx, edx, ebx;
> > };
> > 
> > static inline void cpuid(unsigned int leaf, unsigned int subleaf,
> >  struct cpuid *out)
> > {
> > #if defined(__i386__) && defined(__PIC__)
> So, this is an additional difference and this in fact does make the problem
> clear up.  By applying only this patch:
> 
> @@ -192,7 +192,7 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
>  {
> int eax, ebx, ecx, edx;/* registers */
>  
> -#ifndef __PIC__
> +#if !defined(__PIC__) || !defined(__i386__)
> asm volatile ("cpuid"
>   /* output */
>   : "=a" (eax),
> 
> my build compiles the cpuid instruction branch, not the mov;cpuid; xchgl 
> branch
> (its an x86_64 build).  Is there any reason that x86_64 doesn't need to save 
> the
> ebx register when running cpuid while building PIE code?
> 
> Neil
> 
> 
So, I answered my own question, sort of.  The __i386__ is clear: x86_64 uses RIP
relative addressing, making the saving of ebx not needed - thats perfectly
clear.

Whats a bit less clear to me is why it matters.  Ideally moving ebx and
restoring it with an xchg should change the register state at all.  It would
clobber the lower part of rbx I think, but looking at the disassembly that
shouldn't be used, so as long as the calling function saves its value of rbx, it
should be ok.  The odd part is, if I look at the disassembly of
rte_cpu_get_flag_enabled compiled with and without the mov and xchgl operations,
I see that without those additional instructions the compiler adds a push rbx
and pop rbx instruction at the start and end of the assembly, but not when the
mov ebx, %0 and xchgl %ebx, %0 instructions are added.  I'm not sure what the
compiler is sensitive to when adding those instructions, but it seems like it
should be sensitive to the cpuid instruction, and should be adding it to both.

I'd like your thought Peter on that, but either way it seems clear that the
mov/xchgl aren't needed for x86_64 code, so I'll clean that up and post a new
patch shortly.

Neil



[dpdk-dev] [PATCH v3] eal: Fix up assembly for x86_64 in rte_cpu_get_features

2014-03-20 Thread Neil Horman
x86_64 doesn't need to save off and restore ebx when issuing cpuid, since x86_64
uses RIP relative addressing.  Doing the save actually clobbers the lower half
of rbx, which could be used and not saved off independently, leading to
undefined behavior.  Fix up the defines so that for x86_64 we just issue the
cpuid instruction, which is safe.  Also, while we're at it, lets clean up the
input and output constraints on the inline asm, so that we don't load registers
that the cpuid instruction isn't sensitive to.

Note that this patch does alter the API, in that specifcations to ebx and edx
are ignored.  I chose to go ahead and do that because there is only a single
caller of this function and neither register is ever written currently.

Signed-off-by: Neil Horman 
CC: "H. Peter Anvin" 
---
 lib/librte_eal/common/eal_common_cpuflags.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_cpuflags.c 
b/lib/librte_eal/common/eal_common_cpuflags.c
index 1ebf78c..0a18d53 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -192,7 +192,7 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
 {
int eax, ebx, ecx, edx;/* registers */

-#ifndef __PIC__
+#if !defined(__PIC__) || !defined(__i386__)
asm volatile ("cpuid"
  /* output */
  : "=a" (eax),
@@ -201,23 +201,19 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
"=d" (edx)
  /* input */
  : "a" (params.eax),
-   "b" (params.ebx),
-   "c" (params.ecx),
-   "d" (params.edx));
+   "c" (params.ecx));
 #else
asm volatile ( 
-"mov %%ebx, %%edi\n"
+"mov %%ebx, %0\n"
 "cpuid\n"
-"xchgl %%ebx, %%edi;\n"
-: "=a" (eax),
-  "=D" (ebx),
+"xchgl %%ebx, %0\n"
+: "=r" (ebx),
+ "=a" (eax),
   "=c" (ecx),
   "=d" (edx)
 /* input */
 : "a" (params.eax),
-  "D" (params.ebx),
-  "c" (params.ecx),
-  "d" (params.edx));
+  "c" (params.ecx));
 #endif

switch (params.return_register) {
-- 
1.8.3.1



[dpdk-dev] [RFC UNTESTED PATCH] eal_common_cpuflags: Fix %rbx corruption, and simplify the code

2014-03-20 Thread Neil Horman
On Thu, Mar 20, 2014 at 08:53:50AM -0700, H. Peter Anvin wrote:
> Neil Horman reported that on x86-64 the upper half of %rbx would get
> clobbered when the code was compiled PIC or PIE, because the
> i386-specific code to preserve %ebx was incorrectly compiled.
> 
> However, the code is really way more complex than it needs to be.  For
> one thing, the CPUID instruction only needs %eax (leaf) and %ecx
> (subleaf) as parameters, and since we are testing for bits, we might
> as well list the bits explicitly.  Furthermore, we can use an array
> rather than doing a switch statement inside a structure.
> 
> Reported-by: Neil Horman 
> Signed-off-by: H. Peter Anvin 
Acked-by: Neil Horman 

> ---
>  lib/librte_eal/common/eal_common_cpuflags.c | 272 
> +---
>  1 file changed, 121 insertions(+), 151 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_cpuflags.c 
> b/lib/librte_eal/common/eal_common_cpuflags.c
> index 1ebf78cc2a48..bf66ad9d94ec 100644
> --- a/lib/librte_eal/common/eal_common_cpuflags.c
> +++ b/lib/librte_eal/common/eal_common_cpuflags.c
> @@ -54,21 +54,12 @@
>   */
>  enum cpu_register_t {
>   REG_EAX = 0,
> - REG_EBX,
>   REG_ECX,
>   REG_EDX,
> + REG_EBX,
>  };
>  
> -/**
> - * Parameters for CPUID instruction
> - */
> -struct cpuid_parameters_t {
> - uint32_t eax;
> - uint32_t ebx;
> - uint32_t ecx;
> - uint32_t edx;
> - enum cpu_register_t return_register;
> -};
> +typedef uint32_t cpuid_registers_t[4];
>  
>  #define CPU_FLAG_NAME_MAX_LEN 64
>  
> @@ -78,8 +69,10 @@ struct cpuid_parameters_t {
>  struct feature_entry {
>   enum rte_cpu_flag_t feature;/**< feature name */
>   char name[CPU_FLAG_NAME_MAX_LEN];   /**< String for printing */
> - struct cpuid_parameters_t params;   /**< cpuid parameters */
> - uint32_t feature_mask;  /**< bitmask for feature */
> + uint32_t leaf;  /**< cpuid leaf */
> + uint32_t subleaf;   /**< cpuid subleaf */
> + uint32_t reg;   /**< cpuid register */
> + uint32_t bit;   /**< cpuid register bit */
>  };
>  
>  #define FEAT_DEF(f) RTE_CPUFLAG_##f, #f
> @@ -88,97 +81,97 @@ struct feature_entry {
>   * An array that holds feature entries
>   */
>  static const struct feature_entry cpu_feature_table[] = {
> - {FEAT_DEF(SSE3),  {0x1, 0, 0, 0, REG_ECX}, 0x0001},
> - {FEAT_DEF(PCLMULQDQ), {0x1, 0, 0, 0, REG_ECX}, 0x0002},
> - {FEAT_DEF(DTES64),{0x1, 0, 0, 0, REG_ECX}, 0x0004},
> - {FEAT_DEF(MONITOR),   {0x1, 0, 0, 0, REG_ECX}, 0x0008},
> - {FEAT_DEF(DS_CPL),{0x1, 0, 0, 0, REG_ECX}, 0x0010},
> - {FEAT_DEF(VMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0020},
> - {FEAT_DEF(SMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0040},
> - {FEAT_DEF(EIST),  {0x1, 0, 0, 0, REG_ECX}, 0x0080},
> - {FEAT_DEF(TM2),   {0x1, 0, 0, 0, REG_ECX}, 0x0100},
> - {FEAT_DEF(SSSE3), {0x1, 0, 0, 0, REG_ECX}, 0x0200},
> - {FEAT_DEF(CNXT_ID),   {0x1, 0, 0, 0, REG_ECX}, 0x0400},
> - {FEAT_DEF(FMA),   {0x1, 0, 0, 0, REG_ECX}, 0x1000},
> - {FEAT_DEF(CMPXCHG16B),{0x1, 0, 0, 0, REG_ECX}, 0x2000},
> - {FEAT_DEF(XTPR),  {0x1, 0, 0, 0, REG_ECX}, 0x4000},
> - {FEAT_DEF(PDCM),  {0x1, 0, 0, 0, REG_ECX}, 0x8000},
> - {FEAT_DEF(PCID),  {0x1, 0, 0, 0, REG_ECX}, 0x0002},
> - {FEAT_DEF(DCA),   {0x1, 0, 0, 0, REG_ECX}, 0x0004},
> - {FEAT_DEF(SSE4_1),{0x1, 0, 0, 0, REG_ECX}, 0x0008},
> - {FEAT_DEF(SSE4_2),{0x1, 0, 0, 0, REG_ECX}, 0x0010},
> - {FEAT_DEF(X2APIC),{0x1, 0, 0, 0, REG_ECX}, 0x0020},
> - {FEAT_DEF(MOVBE), {0x1, 0, 0, 0, REG_ECX}, 0x0040},
> - {FEAT_DEF(POPCNT),{0x1, 0, 0, 0, REG_ECX}, 0x0080},
> - {FEAT_DEF(TSC_DEADLINE),  {0x1, 0, 0, 0, REG_ECX}, 0x0100},
> - {FEAT_DEF(AES),   {0x1, 0, 0, 0, REG_ECX}, 0x0200},
> - {FEAT_DEF(XSAVE), {0x1, 0, 0, 0, REG_ECX}, 0x0400},
> - {FEAT_DEF(OSXSAVE),   {0x1, 0, 0, 0, REG_ECX}, 0x0800},
> - {FEAT_DEF(AVX),   {0x1, 0, 0, 0, REG_ECX}, 0x1000},
> - {FEAT_DEF(F16C),  {0x1, 0, 0, 0, REG_ECX}, 0x2000},
> - {FEAT_DEF(RDRAND),{0x1, 0, 0, 0, REG_ECX}, 0x4000},
> -
> - {FEAT_DEF(FPU),   {0x1, 0, 0, 0, REG_E

[dpdk-dev] [RFC UNTESTED PATCH] eal_common_cpuflags: Fix %rbx corruption, and simplify the code

2014-03-20 Thread Neil Horman
On Thu, Mar 20, 2014 at 09:44:28AM -0700, H. Peter Anvin wrote:
> Neil Horman reported that on x86-64 the upper half of %rbx would get
> clobbered when the code was compiled PIC or PIE, because the
> i386-specific code to preserve %ebx was incorrectly compiled.
> 
> However, the code is really way more complex than it needs to be.  For
> one thing, the CPUID instruction only needs %eax (leaf) and %ecx
> (subleaf) as parameters, and since we are testing for bits, we might
> as well list the bits explicitly.  Furthermore, we can use an array
> rather than doing a switch statement inside a structure.
> 
> Reported-by: Neil Horman 
> Signed-off-by: H. Peter Anvin 
Acked-by: Neil Horman 

> ---
>  lib/librte_eal/common/eal_common_cpuflags.c | 272 
> +---
>  1 file changed, 121 insertions(+), 151 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_cpuflags.c 
> b/lib/librte_eal/common/eal_common_cpuflags.c
> index 1ebf78cc2a48..bf66ad9d94ec 100644
> --- a/lib/librte_eal/common/eal_common_cpuflags.c
> +++ b/lib/librte_eal/common/eal_common_cpuflags.c
> @@ -54,21 +54,12 @@
>   */
>  enum cpu_register_t {
>   REG_EAX = 0,
> - REG_EBX,
>   REG_ECX,
>   REG_EDX,
> + REG_EBX,
>  };
>  
> -/**
> - * Parameters for CPUID instruction
> - */
> -struct cpuid_parameters_t {
> - uint32_t eax;
> - uint32_t ebx;
> - uint32_t ecx;
> - uint32_t edx;
> - enum cpu_register_t return_register;
> -};
> +typedef uint32_t cpuid_registers_t[4];
>  
>  #define CPU_FLAG_NAME_MAX_LEN 64
>  
> @@ -78,8 +69,10 @@ struct cpuid_parameters_t {
>  struct feature_entry {
>   enum rte_cpu_flag_t feature;/**< feature name */
>   char name[CPU_FLAG_NAME_MAX_LEN];   /**< String for printing */
> - struct cpuid_parameters_t params;   /**< cpuid parameters */
> - uint32_t feature_mask;  /**< bitmask for feature */
> + uint32_t leaf;  /**< cpuid leaf */
> + uint32_t subleaf;   /**< cpuid subleaf */
> + uint32_t reg;   /**< cpuid register */
> + uint32_t bit;   /**< cpuid register bit */
>  };
>  
>  #define FEAT_DEF(f) RTE_CPUFLAG_##f, #f
> @@ -88,97 +81,97 @@ struct feature_entry {
>   * An array that holds feature entries
>   */
>  static const struct feature_entry cpu_feature_table[] = {
> - {FEAT_DEF(SSE3),  {0x1, 0, 0, 0, REG_ECX}, 0x0001},
> - {FEAT_DEF(PCLMULQDQ), {0x1, 0, 0, 0, REG_ECX}, 0x0002},
> - {FEAT_DEF(DTES64),{0x1, 0, 0, 0, REG_ECX}, 0x0004},
> - {FEAT_DEF(MONITOR),   {0x1, 0, 0, 0, REG_ECX}, 0x0008},
> - {FEAT_DEF(DS_CPL),{0x1, 0, 0, 0, REG_ECX}, 0x0010},
> - {FEAT_DEF(VMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0020},
> - {FEAT_DEF(SMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0040},
> - {FEAT_DEF(EIST),  {0x1, 0, 0, 0, REG_ECX}, 0x0080},
> - {FEAT_DEF(TM2),   {0x1, 0, 0, 0, REG_ECX}, 0x0100},
> - {FEAT_DEF(SSSE3), {0x1, 0, 0, 0, REG_ECX}, 0x0200},
> - {FEAT_DEF(CNXT_ID),   {0x1, 0, 0, 0, REG_ECX}, 0x0400},
> - {FEAT_DEF(FMA),   {0x1, 0, 0, 0, REG_ECX}, 0x1000},
> - {FEAT_DEF(CMPXCHG16B),{0x1, 0, 0, 0, REG_ECX}, 0x2000},
> - {FEAT_DEF(XTPR),  {0x1, 0, 0, 0, REG_ECX}, 0x4000},
> - {FEAT_DEF(PDCM),  {0x1, 0, 0, 0, REG_ECX}, 0x8000},
> - {FEAT_DEF(PCID),  {0x1, 0, 0, 0, REG_ECX}, 0x0002},
> - {FEAT_DEF(DCA),   {0x1, 0, 0, 0, REG_ECX}, 0x0004},
> - {FEAT_DEF(SSE4_1),{0x1, 0, 0, 0, REG_ECX}, 0x0008},
> - {FEAT_DEF(SSE4_2),{0x1, 0, 0, 0, REG_ECX}, 0x0010},
> - {FEAT_DEF(X2APIC),{0x1, 0, 0, 0, REG_ECX}, 0x0020},
> - {FEAT_DEF(MOVBE), {0x1, 0, 0, 0, REG_ECX}, 0x0040},
> - {FEAT_DEF(POPCNT),{0x1, 0, 0, 0, REG_ECX}, 0x0080},
> - {FEAT_DEF(TSC_DEADLINE),  {0x1, 0, 0, 0, REG_ECX}, 0x0100},
> - {FEAT_DEF(AES),   {0x1, 0, 0, 0, REG_ECX}, 0x0200},
> - {FEAT_DEF(XSAVE), {0x1, 0, 0, 0, REG_ECX}, 0x0400},
> - {FEAT_DEF(OSXSAVE),   {0x1, 0, 0, 0, REG_ECX}, 0x0800},
> - {FEAT_DEF(AVX),   {0x1, 0, 0, 0, REG_ECX}, 0x1000},
> - {FEAT_DEF(F16C),  {0x1, 0, 0, 0, REG_ECX}, 0x2000},
> - {FEAT_DEF(RDRAND),{0x1, 0, 0, 0, REG_ECX}, 0x4000},
> -
> - {FEAT_DEF(FPU),   {0x1, 0, 0, 0, REG_E

[dpdk-dev] [RFC UNTESTED PATCH] eal_common_cpuflags: Fix %rbx corruption, and simplify the code

2014-03-20 Thread Neil Horman
On Thu, Mar 20, 2014 at 12:39:21PM -0400, Neil Horman wrote:
> On Thu, Mar 20, 2014 at 08:53:50AM -0700, H. Peter Anvin wrote:
> > Neil Horman reported that on x86-64 the upper half of %rbx would get
> > clobbered when the code was compiled PIC or PIE, because the
> > i386-specific code to preserve %ebx was incorrectly compiled.
> > 
> > However, the code is really way more complex than it needs to be.  For
> > one thing, the CPUID instruction only needs %eax (leaf) and %ecx
> > (subleaf) as parameters, and since we are testing for bits, we might
> > as well list the bits explicitly.  Furthermore, we can use an array
> > rather than doing a switch statement inside a structure.
> > 
> > Reported-by: Neil Horman 
> > Signed-off-by: H. Peter Anvin 
> Acked-by: Neil Horman 
> 
Sorry, I'm just acking the proposed change, I've not tested it yet, though based
on our conversation, this is the right thing to do.  I'll have test reports
shortly.
Neil

> > ---
> >  lib/librte_eal/common/eal_common_cpuflags.c | 272 
> > +---
> >  1 file changed, 121 insertions(+), 151 deletions(-)
> > 
> > diff --git a/lib/librte_eal/common/eal_common_cpuflags.c 
> > b/lib/librte_eal/common/eal_common_cpuflags.c
> > index 1ebf78cc2a48..bf66ad9d94ec 100644
> > --- a/lib/librte_eal/common/eal_common_cpuflags.c
> > +++ b/lib/librte_eal/common/eal_common_cpuflags.c
> > @@ -54,21 +54,12 @@
> >   */
> >  enum cpu_register_t {
> > REG_EAX = 0,
> > -   REG_EBX,
> > REG_ECX,
> > REG_EDX,
> > +   REG_EBX,
> >  };
> >  
> > -/**
> > - * Parameters for CPUID instruction
> > - */
> > -struct cpuid_parameters_t {
> > -   uint32_t eax;
> > -   uint32_t ebx;
> > -   uint32_t ecx;
> > -   uint32_t edx;
> > -   enum cpu_register_t return_register;
> > -};
> > +typedef uint32_t cpuid_registers_t[4];
> >  
> >  #define CPU_FLAG_NAME_MAX_LEN 64
> >  
> > @@ -78,8 +69,10 @@ struct cpuid_parameters_t {
> >  struct feature_entry {
> > enum rte_cpu_flag_t feature;/**< feature name */
> > char name[CPU_FLAG_NAME_MAX_LEN];   /**< String for printing */
> > -   struct cpuid_parameters_t params;   /**< cpuid parameters */
> > -   uint32_t feature_mask;  /**< bitmask for feature */
> > +   uint32_t leaf;  /**< cpuid leaf */
> > +   uint32_t subleaf;   /**< cpuid subleaf */
> > +   uint32_t reg;   /**< cpuid register */
> > +   uint32_t bit;   /**< cpuid register bit */
> >  };
> >  
> >  #define FEAT_DEF(f) RTE_CPUFLAG_##f, #f
> > @@ -88,97 +81,97 @@ struct feature_entry {
> >   * An array that holds feature entries
> >   */
> >  static const struct feature_entry cpu_feature_table[] = {
> > -   {FEAT_DEF(SSE3),  {0x1, 0, 0, 0, REG_ECX}, 0x0001},
> > -   {FEAT_DEF(PCLMULQDQ), {0x1, 0, 0, 0, REG_ECX}, 0x0002},
> > -   {FEAT_DEF(DTES64),{0x1, 0, 0, 0, REG_ECX}, 0x0004},
> > -   {FEAT_DEF(MONITOR),   {0x1, 0, 0, 0, REG_ECX}, 0x0008},
> > -   {FEAT_DEF(DS_CPL),{0x1, 0, 0, 0, REG_ECX}, 0x0010},
> > -   {FEAT_DEF(VMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0020},
> > -   {FEAT_DEF(SMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0040},
> > -   {FEAT_DEF(EIST),  {0x1, 0, 0, 0, REG_ECX}, 0x0080},
> > -   {FEAT_DEF(TM2),   {0x1, 0, 0, 0, REG_ECX}, 0x0100},
> > -   {FEAT_DEF(SSSE3), {0x1, 0, 0, 0, REG_ECX}, 0x0200},
> > -   {FEAT_DEF(CNXT_ID),   {0x1, 0, 0, 0, REG_ECX}, 0x0400},
> > -   {FEAT_DEF(FMA),   {0x1, 0, 0, 0, REG_ECX}, 0x1000},
> > -   {FEAT_DEF(CMPXCHG16B),{0x1, 0, 0, 0, REG_ECX}, 0x2000},
> > -   {FEAT_DEF(XTPR),  {0x1, 0, 0, 0, REG_ECX}, 0x4000},
> > -   {FEAT_DEF(PDCM),  {0x1, 0, 0, 0, REG_ECX}, 0x8000},
> > -   {FEAT_DEF(PCID),  {0x1, 0, 0, 0, REG_ECX}, 0x0002},
> > -   {FEAT_DEF(DCA),   {0x1, 0, 0, 0, REG_ECX}, 0x0004},
> > -   {FEAT_DEF(SSE4_1),{0x1, 0, 0, 0, REG_ECX}, 0x0008},
> > -   {FEAT_DEF(SSE4_2),{0x1, 0, 0, 0, REG_ECX}, 0x0010},
> > -   {FEAT_DEF(X2APIC),{0x1, 0, 0, 0, REG_ECX}, 0x0020},
> > -   {FEAT_DEF(MOVBE), {0x1, 0, 0, 0, REG_ECX}, 0x0040},
> > -   {FEAT_DEF(POPCNT),{0x1, 0, 0, 0, REG_ECX}, 0x0080},
&

[dpdk-dev] [RFC UNTESTED PATCH] eal_common_cpuflags: Fix %rbx corruption, and simplify the code

2014-03-20 Thread Neil Horman
On Thu, Mar 20, 2014 at 02:04:43PM -0400, Neil Horman wrote:
> On Thu, Mar 20, 2014 at 12:39:21PM -0400, Neil Horman wrote:
> > On Thu, Mar 20, 2014 at 08:53:50AM -0700, H. Peter Anvin wrote:
> > > Neil Horman reported that on x86-64 the upper half of %rbx would get
> > > clobbered when the code was compiled PIC or PIE, because the
> > > i386-specific code to preserve %ebx was incorrectly compiled.
> > > 
> > > However, the code is really way more complex than it needs to be.  For
> > > one thing, the CPUID instruction only needs %eax (leaf) and %ecx
> > > (subleaf) as parameters, and since we are testing for bits, we might
> > > as well list the bits explicitly.  Furthermore, we can use an array
> > > rather than doing a switch statement inside a structure.
> > > 
> > > Reported-by: Neil Horman 
> > > Signed-off-by: H. Peter Anvin 
> > Acked-by: Neil Horman 
> > 
> Sorry, I'm just acking the proposed change, I've not tested it yet, though 
> based
> on our conversation, this is the right thing to do.  I'll have test reports
> shortly.
> Neil
> 
Yeah, on testing there are some build and functional errors in this.  Notes
inline


> > > ---
> > >  lib/librte_eal/common/eal_common_cpuflags.c | 272 
> > > +---
> > >  1 file changed, 121 insertions(+), 151 deletions(-)
> > > 
> > > diff --git a/lib/librte_eal/common/eal_common_cpuflags.c 
> > > b/lib/librte_eal/common/eal_common_cpuflags.c
> > > index 1ebf78cc2a48..bf66ad9d94ec 100644
> > > --- a/lib/librte_eal/common/eal_common_cpuflags.c
> > > +++ b/lib/librte_eal/common/eal_common_cpuflags.c
> > > @@ -54,21 +54,12 @@
> > >   */
> > >  enum cpu_register_t {
> > >   REG_EAX = 0,
> > > - REG_EBX,
> > >   REG_ECX,
> > >   REG_EDX,
> > > + REG_EBX,
> > >  };
> > >  
> > > -/**
> > > - * Parameters for CPUID instruction
> > > - */
> > > -struct cpuid_parameters_t {
> > > - uint32_t eax;
> > > - uint32_t ebx;
> > > - uint32_t ecx;
> > > - uint32_t edx;
> > > - enum cpu_register_t return_register;
> > > -};
> > > +typedef uint32_t cpuid_registers_t[4];
> > >  
> > >  #define CPU_FLAG_NAME_MAX_LEN 64
> > >  
> > > @@ -78,8 +69,10 @@ struct cpuid_parameters_t {
> > >  struct feature_entry {
> > >   enum rte_cpu_flag_t feature;/**< feature name */
> > >   char name[CPU_FLAG_NAME_MAX_LEN];   /**< String for printing */
> > > - struct cpuid_parameters_t params;   /**< cpuid parameters */
> > > - uint32_t feature_mask;  /**< bitmask for feature */
> > > + uint32_t leaf;  /**< cpuid leaf */
> > > + uint32_t subleaf;   /**< cpuid subleaf */
> > > + uint32_t reg;   /**< cpuid register */
> > > + uint32_t bit;   /**< cpuid register bit */
> > >  };
> > >  
> > >  #define FEAT_DEF(f) RTE_CPUFLAG_##f, #f
> > > @@ -88,97 +81,97 @@ struct feature_entry {
> > >   * An array that holds feature entries
> > >   */
> > >  static const struct feature_entry cpu_feature_table[] = {
> > > - {FEAT_DEF(SSE3),  {0x1, 0, 0, 0, REG_ECX}, 0x0001},
> > > - {FEAT_DEF(PCLMULQDQ), {0x1, 0, 0, 0, REG_ECX}, 0x0002},
> > > - {FEAT_DEF(DTES64),{0x1, 0, 0, 0, REG_ECX}, 0x0004},
> > > - {FEAT_DEF(MONITOR),   {0x1, 0, 0, 0, REG_ECX}, 0x0008},
> > > - {FEAT_DEF(DS_CPL),{0x1, 0, 0, 0, REG_ECX}, 0x0010},
> > > - {FEAT_DEF(VMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0020},
> > > - {FEAT_DEF(SMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0040},
> > > - {FEAT_DEF(EIST),  {0x1, 0, 0, 0, REG_ECX}, 0x0080},
> > > - {FEAT_DEF(TM2),   {0x1, 0, 0, 0, REG_ECX}, 0x0100},
> > > - {FEAT_DEF(SSSE3), {0x1, 0, 0, 0, REG_ECX}, 0x0200},
> > > - {FEAT_DEF(CNXT_ID),   {0x1, 0, 0, 0, REG_ECX}, 0x0400},
> > > - {FEAT_DEF(FMA),   {0x1, 0, 0, 0, REG_ECX}, 0x1000},
> > > - {FEAT_DEF(CMPXCHG16B),{0x1, 0, 0, 0, REG_ECX}, 0x2000},
> > > - {FEAT_DEF(XTPR),  {0x1, 0, 0, 0, REG_ECX}, 0x4000},
> > > - {FEAT_DEF(PDCM),  {0x1, 0, 0, 0, REG_ECX}, 0x8000},
> > > - {FEAT_DEF(PCID),  {0x1, 0, 0, 0, REG_ECX}, 0

[dpdk-dev] [PATCH v2] eal_common_cpuflags: Fix %rbx corruption, and simplify the code

2014-03-21 Thread Neil Horman
From: "H. Peter Anvin" 

Neil Horman reported that on x86-64 the upper half of %rbx would get
clobbered when the code was compiled PIC or PIE, because the
i386-specific code to preserve %ebx was incorrectly compiled.

However, the code is really way more complex than it needs to be.  For
one thing, the CPUID instruction only needs %eax (leaf) and %ecx
(subleaf) as parameters, and since we are testing for bits, we might
as well list the bits explicitly.  Furthermore, we can use an array
rather than doing a switch statement inside a structure.

Reported-by: Neil Horman 
Signed-off-by: H. Peter Anvin 
Tested-by: Neil Horman 

---
Change notes:
v2) Corrected build errors
Fixed cpuid_register_t reference passing
Fixed typedef name typo
---
 lib/librte_eal/common/eal_common_cpuflags.c | 274 +---
 1 file changed, 123 insertions(+), 151 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_cpuflags.c 
b/lib/librte_eal/common/eal_common_cpuflags.c
index 1ebf78c..438d9c5 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -54,21 +54,12 @@
  */
 enum cpu_register_t {
REG_EAX = 0,
-   REG_EBX,
REG_ECX,
REG_EDX,
+   REG_EBX,
 };

-/**
- * Parameters for CPUID instruction
- */
-struct cpuid_parameters_t {
-   uint32_t eax;
-   uint32_t ebx;
-   uint32_t ecx;
-   uint32_t edx;
-   enum cpu_register_t return_register;
-};
+typedef uint32_t cpuid_registers_t[4];

 #define CPU_FLAG_NAME_MAX_LEN 64

@@ -78,8 +69,10 @@ struct cpuid_parameters_t {
 struct feature_entry {
enum rte_cpu_flag_t feature;/**< feature name */
char name[CPU_FLAG_NAME_MAX_LEN];   /**< String for printing */
-   struct cpuid_parameters_t params;   /**< cpuid parameters */
-   uint32_t feature_mask;  /**< bitmask for feature */
+   uint32_t leaf;  /**< cpuid leaf */
+   uint32_t subleaf;   /**< cpuid subleaf */
+   uint32_t reg;   /**< cpuid register */
+   uint32_t bit;   /**< cpuid register bit */
 };

 #define FEAT_DEF(f) RTE_CPUFLAG_##f, #f
@@ -88,97 +81,97 @@ struct feature_entry {
  * An array that holds feature entries
  */
 static const struct feature_entry cpu_feature_table[] = {
-   {FEAT_DEF(SSE3),  {0x1, 0, 0, 0, REG_ECX}, 0x0001},
-   {FEAT_DEF(PCLMULQDQ), {0x1, 0, 0, 0, REG_ECX}, 0x0002},
-   {FEAT_DEF(DTES64),{0x1, 0, 0, 0, REG_ECX}, 0x0004},
-   {FEAT_DEF(MONITOR),   {0x1, 0, 0, 0, REG_ECX}, 0x0008},
-   {FEAT_DEF(DS_CPL),{0x1, 0, 0, 0, REG_ECX}, 0x0010},
-   {FEAT_DEF(VMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0020},
-   {FEAT_DEF(SMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0040},
-   {FEAT_DEF(EIST),  {0x1, 0, 0, 0, REG_ECX}, 0x0080},
-   {FEAT_DEF(TM2),   {0x1, 0, 0, 0, REG_ECX}, 0x0100},
-   {FEAT_DEF(SSSE3), {0x1, 0, 0, 0, REG_ECX}, 0x0200},
-   {FEAT_DEF(CNXT_ID),   {0x1, 0, 0, 0, REG_ECX}, 0x0400},
-   {FEAT_DEF(FMA),   {0x1, 0, 0, 0, REG_ECX}, 0x1000},
-   {FEAT_DEF(CMPXCHG16B),{0x1, 0, 0, 0, REG_ECX}, 0x2000},
-   {FEAT_DEF(XTPR),  {0x1, 0, 0, 0, REG_ECX}, 0x4000},
-   {FEAT_DEF(PDCM),  {0x1, 0, 0, 0, REG_ECX}, 0x8000},
-   {FEAT_DEF(PCID),  {0x1, 0, 0, 0, REG_ECX}, 0x0002},
-   {FEAT_DEF(DCA),   {0x1, 0, 0, 0, REG_ECX}, 0x0004},
-   {FEAT_DEF(SSE4_1),{0x1, 0, 0, 0, REG_ECX}, 0x0008},
-   {FEAT_DEF(SSE4_2),{0x1, 0, 0, 0, REG_ECX}, 0x0010},
-   {FEAT_DEF(X2APIC),{0x1, 0, 0, 0, REG_ECX}, 0x0020},
-   {FEAT_DEF(MOVBE), {0x1, 0, 0, 0, REG_ECX}, 0x0040},
-   {FEAT_DEF(POPCNT),{0x1, 0, 0, 0, REG_ECX}, 0x0080},
-   {FEAT_DEF(TSC_DEADLINE),  {0x1, 0, 0, 0, REG_ECX}, 0x0100},
-   {FEAT_DEF(AES),   {0x1, 0, 0, 0, REG_ECX}, 0x0200},
-   {FEAT_DEF(XSAVE), {0x1, 0, 0, 0, REG_ECX}, 0x0400},
-   {FEAT_DEF(OSXSAVE),   {0x1, 0, 0, 0, REG_ECX}, 0x0800},
-   {FEAT_DEF(AVX),   {0x1, 0, 0, 0, REG_ECX}, 0x1000},
-   {FEAT_DEF(F16C),  {0x1, 0, 0, 0, REG_ECX}, 0x2000},
-   {FEAT_DEF(RDRAND),{0x1, 0, 0, 0, REG_ECX}, 0x4000},
-
-   {FEAT_DEF(FPU),   {0x1, 0, 0, 0, REG_EDX}, 0x0001},
-   {FEAT_DEF(VME),   {0x1, 0, 0, 0, REG_EDX}, 0x0002},
-   {FEAT_DEF(DE),{0x1, 0, 0, 0, REG_EDX}, 0x0004},
-   {FEAT_DEF(PSE),   {0x1, 0, 0, 0, REG_EDX}, 0x0008},
-   {FEAT_DEF(TSC),   {0x1, 0, 0, 0, REG_ED

[dpdk-dev] [PATCH v2] eal_common_cpuflags: Fix %rbx corruption, and simplify the code

2014-03-21 Thread Neil Horman
On Fri, Mar 21, 2014 at 08:03:34AM -0700, H. Peter Anvin wrote:
> On 03/21/2014 07:49 AM, Neil Horman wrote:
> > From: "H. Peter Anvin" 
> > 
> > Neil Horman reported that on x86-64 the upper half of %rbx would get
> > clobbered when the code was compiled PIC or PIE, because the
> > i386-specific code to preserve %ebx was incorrectly compiled.
> > 
> > However, the code is really way more complex than it needs to be.  For
> > one thing, the CPUID instruction only needs %eax (leaf) and %ecx
> > (subleaf) as parameters, and since we are testing for bits, we might
> > as well list the bits explicitly.  Furthermore, we can use an array
> > rather than doing a switch statement inside a structure.
> > 
> > Reported-by: Neil Horman 
> > Signed-off-by: H. Peter Anvin 
> > Tested-by: Neil Horman 
> > 
> 
> Thank you for dealing with this!
> 
> On the subject of my other email... are C99 initializers acceptable in
> dpdk?  If so, I think making that change, too, would be a good idea.
> 
I'll have to defer this to others, I'm not sure what the accepted initalization
method is.  I'm guessing their fine, as both icc and gcc allow them and those
are the supported compilers for dpdk, but I'd like to hear someone in the
maintainership comment.

Best
Neil

>   -hpa
> 
> 
> 


[dpdk-dev] [RFC UNTESTED PATCH] eal_common_cpuflags: Fix %rbx corruption, and simplify the code

2014-03-24 Thread Neil Horman
On Thu, Mar 20, 2014 at 10:03:53AM -0700, H. Peter Anvin wrote:
> I just realized there is yet another oddity in this code:
> 
> > @@ -78,8 +69,10 @@ struct cpuid_parameters_t {
> >  struct feature_entry {
> > enum rte_cpu_flag_t feature;/**< feature name */
> 
> The structure contains a field with an enum value...
> 
> > char name[CPU_FLAG_NAME_MAX_LEN];   /**< String for printing */
> > -   struct cpuid_parameters_t params;   /**< cpuid parameters */
> > -   uint32_t feature_mask;  /**< bitmask for feature */
> > +   uint32_t leaf;  /**< cpuid leaf */
> > +   uint32_t subleaf;   /**< cpuid subleaf */
> > +   uint32_t reg;   /**< cpuid register */
> > +   uint32_t bit;   /**< cpuid register bit */
> >  };
> >  
> >  
> >  /*
> > @@ -240,17 +207,20 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
> >  int
> >  rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature)
> >  {
> > -   int value;
> > +   const struct feature_entry *feat;
> > +   cpu_registers_t regs;
> >  
> > if (feature >= RTE_CPUFLAG_NUMFLAGS)
> > /* Flag does not match anything in the feature tables */
> > return -ENOENT;
> >  
> > -   /* get value of the register containing the desired feature */
> > -   value = rte_cpu_get_features(cpu_feature_table[feature].params);
> > +   feat = &cpu_feature_table[feature];
> > +
> > +   /* get the cpuid leaf containing the desired feature */
> > +   rte_cpu_get_features(feat->leaf, feat->subleaf, ®s);
> >  
> > /* check if the feature is enabled */
> > -   return (cpu_feature_table[feature].feature_mask & value) > 0;
> > +   return (regs[feat->reg] >> feat->bit) & 1;
> >  }
> >  
> >  /**
> 
> ... however, this field is never actually accessed *anywhere* in the
> code; the code instead uses the enum value as the table index. There is
> absolutely no enforcement that the table contents is aligned with the enum.
> 
> If C99-style initializers are permitted in this codebase, I would
> strongly recommend using them, and then drop the enum field in struct
> feature_entry and use a macro such as:
> 
Actually, its a bit simpler than that, the enum parameter is actually completely
unused, and so can be removed entirely.  The FEAT_DEF macro does what you
suggest below already, but only for the feature and name fields.

I'll remove the enum and its definition, and augment the macro to cover the rest
of the fields.

Neil

> #define FEAT(name,leaf,subleaf,reg,bit) \
>   [RTE_CPUFLAG_##f] = { leaf, subleaf, reg, bit, #f },
> 
> (I'd move the string to the end, but that is just a microoptimization.
> I'm kind of OCD that way.)
> 
>   -hpa
> 
> 


[dpdk-dev] [PATCH v3] eal_common_cpuflags: Fix %rbx corruption, and simplify the code

2014-03-24 Thread Neil Horman
Neil Horman reported that on x86-64 the upper half of %rbx would get
clobbered when the code was compiled PIC or PIE, because the
i386-specific code to preserve %ebx was incorrectly compiled.

However, the code is really way more complex than it needs to be.  For
one thing, the CPUID instruction only needs %eax (leaf) and %ecx
(subleaf) as parameters, and since we are testing for bits, we might
as well list the bits explicitly.  Furthermore, we can use an array
rather than doing a switch statement inside a structure.

Reported-by: Neil Horman 
Signed-off-by: H. Peter Anvin 
Signed-off-by: Neil Horman 

---
Change notes:
v2) Corrected build errors
Fixed cpuid_register_t reference passing
Fixed typedef name typo

v3)
* Modified feature_entry struct to drop the name field, as its unused
* Modified cpu_feature_table to use C99 initalizers
* Updated FEAT_DEF macro to include all feature_entry fields
* Modified cpuid_reg enum to start at 1 rather than zero
* Added CPUID_REG macro to drop enum value by 1 during access
* Added check on feat->reg use to detect missing entries
* Fixed a bug in rte_cpu_check_supported in which negative errors are ignored
---
 lib/librte_eal/common/eal_common_cpuflags.c | 281 +---
 1 file changed, 134 insertions(+), 147 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_cpuflags.c 
b/lib/librte_eal/common/eal_common_cpuflags.c
index 1ebf78c..9ee0490 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -53,22 +53,15 @@
  * Enumeration of CPU registers
  */
 enum cpu_register_t {
-   REG_EAX = 0,
-   REG_EBX,
+   REG_EAX = 1,
REG_ECX,
REG_EDX,
+   REG_EBX,
 };

-/**
- * Parameters for CPUID instruction
- */
-struct cpuid_parameters_t {
-   uint32_t eax;
-   uint32_t ebx;
-   uint32_t ecx;
-   uint32_t edx;
-   enum cpu_register_t return_register;
-};
+#define CPUID_REG(reg) (reg - 1)
+
+typedef uint32_t cpuid_registers_t[4];

 #define CPU_FLAG_NAME_MAX_LEN 64

@@ -76,109 +69,111 @@ struct cpuid_parameters_t {
  * Struct to hold a processor feature entry
  */
 struct feature_entry {
-   enum rte_cpu_flag_t feature;/**< feature name */
+   uint32_t leaf;  /**< cpuid leaf */
+   uint32_t subleaf;   /**< cpuid subleaf */
+   uint32_t reg;   /**< cpuid register */
+   uint32_t bit;   /**< cpuid register bit */
char name[CPU_FLAG_NAME_MAX_LEN];   /**< String for printing */
-   struct cpuid_parameters_t params;   /**< cpuid parameters */
-   uint32_t feature_mask;  /**< bitmask for feature */
 };

-#define FEAT_DEF(f) RTE_CPUFLAG_##f, #f
+#define FEAT_DEF(name, leaf, subleaf, reg, bit) \
+   [RTE_CPUFLAG_##name] = {leaf, subleaf, reg, bit, #name },

 /**
  * An array that holds feature entries
  */
 static const struct feature_entry cpu_feature_table[] = {
-   {FEAT_DEF(SSE3),  {0x1, 0, 0, 0, REG_ECX}, 0x0001},
-   {FEAT_DEF(PCLMULQDQ), {0x1, 0, 0, 0, REG_ECX}, 0x0002},
-   {FEAT_DEF(DTES64),{0x1, 0, 0, 0, REG_ECX}, 0x0004},
-   {FEAT_DEF(MONITOR),   {0x1, 0, 0, 0, REG_ECX}, 0x0008},
-   {FEAT_DEF(DS_CPL),{0x1, 0, 0, 0, REG_ECX}, 0x0010},
-   {FEAT_DEF(VMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0020},
-   {FEAT_DEF(SMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0040},
-   {FEAT_DEF(EIST),  {0x1, 0, 0, 0, REG_ECX}, 0x0080},
-   {FEAT_DEF(TM2),   {0x1, 0, 0, 0, REG_ECX}, 0x0100},
-   {FEAT_DEF(SSSE3), {0x1, 0, 0, 0, REG_ECX}, 0x0200},
-   {FEAT_DEF(CNXT_ID),   {0x1, 0, 0, 0, REG_ECX}, 0x0400},
-   {FEAT_DEF(FMA),   {0x1, 0, 0, 0, REG_ECX}, 0x1000},
-   {FEAT_DEF(CMPXCHG16B),{0x1, 0, 0, 0, REG_ECX}, 0x2000},
-   {FEAT_DEF(XTPR),  {0x1, 0, 0, 0, REG_ECX}, 0x4000},
-   {FEAT_DEF(PDCM),  {0x1, 0, 0, 0, REG_ECX}, 0x8000},
-   {FEAT_DEF(PCID),  {0x1, 0, 0, 0, REG_ECX}, 0x0002},
-   {FEAT_DEF(DCA),   {0x1, 0, 0, 0, REG_ECX}, 0x0004},
-   {FEAT_DEF(SSE4_1),{0x1, 0, 0, 0, REG_ECX}, 0x0008},
-   {FEAT_DEF(SSE4_2),{0x1, 0, 0, 0, REG_ECX}, 0x0010},
-   {FEAT_DEF(X2APIC),{0x1, 0, 0, 0, REG_ECX}, 0x0020},
-   {FEAT_DEF(MOVBE), {0x1, 0, 0, 0, REG_ECX}, 0x0040},
-   {FEAT_DEF(POPCNT),{0x1, 0, 0, 0, REG_ECX}, 0x0080},
-   {FEAT_DEF(TSC_DEADLINE),  {0x1, 0, 0, 0, REG_ECX}, 0x0100},
-   {FEAT_DEF(AES),   {0x1, 0, 0, 0, REG_ECX}, 0x0200},
-   {FEAT_DEF(XSAVE), {0x1, 0, 0, 0, REG_ECX}, 0x0400},
-   {FEAT_DEF(OSXSAVE),  

[dpdk-dev] [PATCH v3] eal_common_cpuflags: Fix %rbx corruption, and simplify the code

2014-03-24 Thread Neil Horman
On Mon, Mar 24, 2014 at 11:09:52AM -0700, H. Peter Anvin wrote:
> On 03/24/2014 10:44 AM, Neil Horman wrote:
> > * Modified cpuid_reg enum to start at 1 rather than zero
> > * Added CPUID_REG macro to drop enum value by 1 during access
> 
> I guess I don't get it... why?
> 
To add an extra sanity check in rte_get_flag_enabled.  If we were moving to the
use of C99 initalizers, I wanted something to catch the possibility that we skip
a flag by accident (i.e. leave a zero initalized hole in the array).  Except 0
from my read is a valid value for all the fields of the array.  So I bumped up
the cpuid register enum by one and wrapped it in a macro.  That way we can test
for !feat->reg as an indicator that we're requesting feature support for a flag
thats not listed in the array.
Neil

>   -hpa
> 
> 


[dpdk-dev] [PATCH v3] eal_common_cpuflags: Fix %rbx corruption, and simplify the code

2014-03-25 Thread Neil Horman
On Mon, Mar 24, 2014 at 01:47:55PM -0700, H. Peter Anvin wrote:
> On 03/24/2014 12:52 PM, Neil Horman wrote:
> >>
> > To add an extra sanity check in rte_get_flag_enabled.  If we were moving to 
> > the
> > use of C99 initalizers, I wanted something to catch the possibility that we 
> > skip
> > a flag by accident (i.e. leave a zero initalized hole in the array).  
> > Except 0
> > from my read is a valid value for all the fields of the array.  So I bumped 
> > up
> > the cpuid register enum by one and wrapped it in a macro.  That way we can 
> > test
> > for !feat->reg as an indicator that we're requesting feature support for a 
> > flag
> > thats not listed in the array.
> 
> It actually isn't: there aren't any flags in CPUID leaf 0, so since the
> code only looks for bits it'd be perfectly okay to reject leaf 0.
> 
> Another thing that I noted is that the code doesn't actually check that
> any particular leaf is valid (by checking the maximum leaf number in
> CPUID leaf 0x:EAX).  Especially for the leaf 7 features this
> could result in false positives, which obviously would be disastrous.
> 
Thanks, I'll improve this checking today.

>   -hpa
> 
> 
> 


[dpdk-dev] [PATCH v4] eal_common_cpuflags: Fix %rbx corruption, and simplify the code

2014-03-25 Thread Neil Horman
Neil Horman reported that on x86-64 the upper half of %rbx would get
clobbered when the code was compiled PIC or PIE, because the
i386-specific code to preserve %ebx was incorrectly compiled.

However, the code is really way more complex than it needs to be.  For
one thing, the CPUID instruction only needs %eax (leaf) and %ecx
(subleaf) as parameters, and since we are testing for bits, we might
as well list the bits explicitly.  Furthermore, we can use an array
rather than doing a switch statement inside a structure.

Reported-by: Neil Horman 
Signed-off-by: H. Peter Anvin 

---
Change notes:
v2) Corrected build errors
Fixed cpuid_register_t reference passing
Fixed typedef name typo

v3)
* Modified feature_entry struct to drop the name field, as its unused
* Modified cpu_feature_table to use C99 initalizers
* Updated FEAT_DEF macro to include all feature_entry fields
* Modified cpuid_reg enum to start at 1 rather than zero
* Added CPUID_REG macro to drop enum value by 1 during access
* Added check on feat->reg use to detect missing entries
* Fixed a bug in rte_cpu_check_supported in which negative errors are ignored

v4)
* Fixed sanity checks to not offset feat->reg and just check !feat->reg
* Added a check for the sanity of the leaf node
---
 lib/librte_eal/common/eal_common_cpuflags.c | 287 ++--
 1 file changed, 141 insertions(+), 146 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_cpuflags.c 
b/lib/librte_eal/common/eal_common_cpuflags.c
index 1ebf78c..b61e271 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -54,21 +54,12 @@
  */
 enum cpu_register_t {
REG_EAX = 0,
-   REG_EBX,
REG_ECX,
REG_EDX,
+   REG_EBX,
 };

-/**
- * Parameters for CPUID instruction
- */
-struct cpuid_parameters_t {
-   uint32_t eax;
-   uint32_t ebx;
-   uint32_t ecx;
-   uint32_t edx;
-   enum cpu_register_t return_register;
-};
+typedef uint32_t cpuid_registers_t[4];

 #define CPU_FLAG_NAME_MAX_LEN 64

@@ -76,109 +67,111 @@ struct cpuid_parameters_t {
  * Struct to hold a processor feature entry
  */
 struct feature_entry {
-   enum rte_cpu_flag_t feature;/**< feature name */
+   uint32_t leaf;  /**< cpuid leaf */
+   uint32_t subleaf;   /**< cpuid subleaf */
+   uint32_t reg;   /**< cpuid register */
+   uint32_t bit;   /**< cpuid register bit */
char name[CPU_FLAG_NAME_MAX_LEN];   /**< String for printing */
-   struct cpuid_parameters_t params;   /**< cpuid parameters */
-   uint32_t feature_mask;  /**< bitmask for feature */
 };

-#define FEAT_DEF(f) RTE_CPUFLAG_##f, #f
+#define FEAT_DEF(name, leaf, subleaf, reg, bit) \
+   [RTE_CPUFLAG_##name] = {leaf, subleaf, reg, bit, #name },

 /**
  * An array that holds feature entries
  */
 static const struct feature_entry cpu_feature_table[] = {
-   {FEAT_DEF(SSE3),  {0x1, 0, 0, 0, REG_ECX}, 0x0001},
-   {FEAT_DEF(PCLMULQDQ), {0x1, 0, 0, 0, REG_ECX}, 0x0002},
-   {FEAT_DEF(DTES64),{0x1, 0, 0, 0, REG_ECX}, 0x0004},
-   {FEAT_DEF(MONITOR),   {0x1, 0, 0, 0, REG_ECX}, 0x0008},
-   {FEAT_DEF(DS_CPL),{0x1, 0, 0, 0, REG_ECX}, 0x0010},
-   {FEAT_DEF(VMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0020},
-   {FEAT_DEF(SMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0040},
-   {FEAT_DEF(EIST),  {0x1, 0, 0, 0, REG_ECX}, 0x0080},
-   {FEAT_DEF(TM2),   {0x1, 0, 0, 0, REG_ECX}, 0x0100},
-   {FEAT_DEF(SSSE3), {0x1, 0, 0, 0, REG_ECX}, 0x0200},
-   {FEAT_DEF(CNXT_ID),   {0x1, 0, 0, 0, REG_ECX}, 0x0400},
-   {FEAT_DEF(FMA),   {0x1, 0, 0, 0, REG_ECX}, 0x1000},
-   {FEAT_DEF(CMPXCHG16B),{0x1, 0, 0, 0, REG_ECX}, 0x2000},
-   {FEAT_DEF(XTPR),  {0x1, 0, 0, 0, REG_ECX}, 0x4000},
-   {FEAT_DEF(PDCM),  {0x1, 0, 0, 0, REG_ECX}, 0x8000},
-   {FEAT_DEF(PCID),  {0x1, 0, 0, 0, REG_ECX}, 0x0002},
-   {FEAT_DEF(DCA),   {0x1, 0, 0, 0, REG_ECX}, 0x0004},
-   {FEAT_DEF(SSE4_1),{0x1, 0, 0, 0, REG_ECX}, 0x0008},
-   {FEAT_DEF(SSE4_2),{0x1, 0, 0, 0, REG_ECX}, 0x0010},
-   {FEAT_DEF(X2APIC),{0x1, 0, 0, 0, REG_ECX}, 0x0020},
-   {FEAT_DEF(MOVBE), {0x1, 0, 0, 0, REG_ECX}, 0x0040},
-   {FEAT_DEF(POPCNT),{0x1, 0, 0, 0, REG_ECX}, 0x0080},
-   {FEAT_DEF(TSC_DEADLINE),  {0x1, 0, 0, 0, REG_ECX}, 0x0100},
-   {FEAT_DEF(AES),   {0x1, 0, 0, 0, REG_ECX}, 0x0200},
-   {FEAT_DEF(XSAVE), {0x1, 0, 0, 0, REG_ECX}, 0x0400},
-   {FEAT_DEF(OSXSAVE),

[dpdk-dev] [PATCH v5] eal_common_cpuflags: Fix %rbx corruption, and simplify the code

2014-03-25 Thread Neil Horman
Neil Horman reported that on x86-64 the upper half of %rbx would get
clobbered when the code was compiled PIC or PIE, because the
i386-specific code to preserve %ebx was incorrectly compiled.

However, the code is really way more complex than it needs to be.  For
one thing, the CPUID instruction only needs %eax (leaf) and %ecx
(subleaf) as parameters, and since we are testing for bits, we might
as well list the bits explicitly.  Furthermore, we can use an array
rather than doing a switch statement inside a structure.

Reported-by: Neil Horman 
Signed-off-by: H. Peter Anvin 
Signed-off-by: Neil Horman 

---
Change notes:
v2) Corrected build errors
Fixed cpuid_register_t reference passing
Fixed typedef name typo

v3)
* Modified feature_entry struct to drop the name field, as its unused
* Modified cpu_feature_table to use C99 initalizers
* Updated FEAT_DEF macro to include all feature_entry fields
* Modified cpuid_reg enum to start at 1 rather than zero
* Added CPUID_REG macro to drop enum value by 1 during access
* Added check on feat->reg use to detect missing entries
* Fixed a bug in rte_cpu_check_supported in which negative errors are ignored

v4)
* Fixed sanity checks to not offset feat->reg and just check !feat->reg
* Added a check for the sanity of the leaf node

v5)
* Fixed max leaf check to just return not supported rather than error
---
 lib/librte_eal/common/eal_common_cpuflags.c | 281 ++--
 1 file changed, 136 insertions(+), 145 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_cpuflags.c 
b/lib/librte_eal/common/eal_common_cpuflags.c
index 1ebf78c..f9c1840 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -59,16 +59,7 @@ enum cpu_register_t {
REG_EDX,
 };

-/**
- * Parameters for CPUID instruction
- */
-struct cpuid_parameters_t {
-   uint32_t eax;
-   uint32_t ebx;
-   uint32_t ecx;
-   uint32_t edx;
-   enum cpu_register_t return_register;
-};
+typedef uint32_t cpuid_registers_t[4];

 #define CPU_FLAG_NAME_MAX_LEN 64

@@ -76,109 +67,111 @@ struct cpuid_parameters_t {
  * Struct to hold a processor feature entry
  */
 struct feature_entry {
-   enum rte_cpu_flag_t feature;/**< feature name */
+   uint32_t leaf;  /**< cpuid leaf */
+   uint32_t subleaf;   /**< cpuid subleaf */
+   uint32_t reg;   /**< cpuid register */
+   uint32_t bit;   /**< cpuid register bit */
char name[CPU_FLAG_NAME_MAX_LEN];   /**< String for printing */
-   struct cpuid_parameters_t params;   /**< cpuid parameters */
-   uint32_t feature_mask;  /**< bitmask for feature */
 };

-#define FEAT_DEF(f) RTE_CPUFLAG_##f, #f
+#define FEAT_DEF(name, leaf, subleaf, reg, bit) \
+   [RTE_CPUFLAG_##name] = {leaf, subleaf, reg, bit, #name },

 /**
  * An array that holds feature entries
  */
 static const struct feature_entry cpu_feature_table[] = {
-   {FEAT_DEF(SSE3),  {0x1, 0, 0, 0, REG_ECX}, 0x0001},
-   {FEAT_DEF(PCLMULQDQ), {0x1, 0, 0, 0, REG_ECX}, 0x0002},
-   {FEAT_DEF(DTES64),{0x1, 0, 0, 0, REG_ECX}, 0x0004},
-   {FEAT_DEF(MONITOR),   {0x1, 0, 0, 0, REG_ECX}, 0x0008},
-   {FEAT_DEF(DS_CPL),{0x1, 0, 0, 0, REG_ECX}, 0x0010},
-   {FEAT_DEF(VMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0020},
-   {FEAT_DEF(SMX),   {0x1, 0, 0, 0, REG_ECX}, 0x0040},
-   {FEAT_DEF(EIST),  {0x1, 0, 0, 0, REG_ECX}, 0x0080},
-   {FEAT_DEF(TM2),   {0x1, 0, 0, 0, REG_ECX}, 0x0100},
-   {FEAT_DEF(SSSE3), {0x1, 0, 0, 0, REG_ECX}, 0x0200},
-   {FEAT_DEF(CNXT_ID),   {0x1, 0, 0, 0, REG_ECX}, 0x0400},
-   {FEAT_DEF(FMA),   {0x1, 0, 0, 0, REG_ECX}, 0x1000},
-   {FEAT_DEF(CMPXCHG16B),{0x1, 0, 0, 0, REG_ECX}, 0x2000},
-   {FEAT_DEF(XTPR),  {0x1, 0, 0, 0, REG_ECX}, 0x4000},
-   {FEAT_DEF(PDCM),  {0x1, 0, 0, 0, REG_ECX}, 0x8000},
-   {FEAT_DEF(PCID),  {0x1, 0, 0, 0, REG_ECX}, 0x0002},
-   {FEAT_DEF(DCA),   {0x1, 0, 0, 0, REG_ECX}, 0x0004},
-   {FEAT_DEF(SSE4_1),{0x1, 0, 0, 0, REG_ECX}, 0x0008},
-   {FEAT_DEF(SSE4_2),{0x1, 0, 0, 0, REG_ECX}, 0x0010},
-   {FEAT_DEF(X2APIC),{0x1, 0, 0, 0, REG_ECX}, 0x0020},
-   {FEAT_DEF(MOVBE), {0x1, 0, 0, 0, REG_ECX}, 0x0040},
-   {FEAT_DEF(POPCNT),{0x1, 0, 0, 0, REG_ECX}, 0x0080},
-   {FEAT_DEF(TSC_DEADLINE),  {0x1, 0, 0, 0, REG_ECX}, 0x0100},
-   {FEAT_DEF(AES),   {0x1, 0, 0, 0, REG_ECX}, 0x0200},
-   {FEAT_DEF(XSAVE), {0x1, 0, 0, 0, REG_ECX}, 0x040

[dpdk-dev] [PATCH] pcap: remove test for PCAP_CAN_SEND

2014-03-28 Thread Neil Horman
The libpcap library has had the ability to send packets since 2004, theres
really no need to test for it.  Especially in the way dpdk is doing as, as
according to the libpcap git tree pcap_sendpacket has never been a #define, and
dpdk tests for its existance with an #ifdef.  Its easier just to remove the test
entirely

Signed-off-by: Neil Horman 
---
 lib/librte_pmd_pcap/rte_eth_pcap.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c 
b/lib/librte_pmd_pcap/rte_eth_pcap.c
index fbafd19..fe94a79 100644
--- a/lib/librte_pmd_pcap/rte_eth_pcap.c
+++ b/lib/librte_pmd_pcap/rte_eth_pcap.c
@@ -217,7 +217,6 @@ eth_pcap_tx_dumper(void *queue,
return num_tx;
 }

-#ifdef PCAP_CAN_SEND
 /*
  * Callback to handle sending packets through a real NIC.
  */
@@ -248,17 +247,6 @@ eth_pcap_tx(void *queue,
tx_queue->err_pkts += nb_pkts - num_tx;
return num_tx;
 }
-#else
-static uint16_t
-eth_pcap_tx(__rte_unused void *queue,
-   __rte_unused struct rte_mbuf **bufs,
-   __rte_unused uint16_t nb_pkts)
-{
-   RTE_LOG(ERR, PMD, "pcap library cannot send packets, please rebuild "
- "with a more up to date libpcap\n");
-   return -1;
-}
-#endif

 static int
 eth_dev_start(struct rte_eth_dev *dev)
-- 
1.8.3.1



[dpdk-dev] [PATCH] pcap: remove test for PCAP_CAN_SEND

2014-03-29 Thread Neil Horman
On Sat, Mar 29, 2014 at 11:34:46AM +0100, Thomas Monjalon wrote:
> Hi Neil,
> 
> 28/03/2014 21:32, Neil Horman :
> > The libpcap library has had the ability to send packets since 2004, theres
> > really no need to test for it.  Especially in the way dpdk is doing as, as
> > according to the libpcap git tree pcap_sendpacket has never been a #define,
> > and dpdk tests for its existance with an #ifdef.  Its easier just to remove
> > the test entirely
> > 
> > Signed-off-by: Neil Horman 
> 
> A similar patch is already applied on HEAD:
>   http://dpdk.org/browse/dpdk/commit/?id=2a315d698510e7b33
> 
My bad, thanks!
Neil

> -- 
> Thomas
> 


[dpdk-dev] [PATCH v2 0/4] recipes for RPM packages

2014-05-01 Thread Neil Horman
On Thu, May 01, 2014 at 08:53:02AM +0200, Thomas Monjalon wrote:
> 2014-04-30 11:22, Neil Horman:
> > On Wed, Apr 30, 2014 at 01:09:38PM +0200, Thomas Monjalon wrote:
> > > The 4 spec files are used to build 4 different git trees with their own
> > > versioning:
> > >   http://dpdk.org/browse
> > > 
> > > So I think it's saner to keep them in their repository.
> [...]
> > 
> > Yeah, if they're separate git trees, they can be separate specs.  That said
> > though, it strongly begs the question as to why you are keeping open source
> > pmds outside of the dpdk library?  That really doesn't make much sense, 
> > whats preventing that integration (followed by the integration of the spec
> > files)?
> 
> These extensions have their own versioning.
That doesn't seem to be a reason to keep them separately, in fact if anything
its a reason to merge them so that versioning can be merged.

> They include PMD but also kernel modules (memnic and vmxnet3-usermap).
Thats nothing new.  The DPDK houses several PMD's that require kernel modules
which are stored as part of the DPDK source tree, and built with it

> In case of memnic, the kernel module is an alternative to DPDK PMD. So there 
> is no good reason to integrate it in DPDK.
I don't see what you're saying here.  Just because a given pmd offers an
alternate implementation to simmilar functionality isn't  reason to keep them
separate, its a reason to bring them together.  Users interested in one may well
be interested in the other, and keeping them maintained together offers the
opportunity to merge functionaty more readily.  Regardless of being maintained
in one tree or two, they still offer the user the same thing, by maintaining
them in the same tree you just offer the user a more convienient choice.

> And it's better to host both 
> drivers together in order to keep coherency and share some resources.
Thats a reason to host them in the same tree, not just co-located on the same
server.

> Extensions can also be a place to host some test applications related to its 
> PMD.
> 
Once again, you already do this for the pmd's integrated to the dpdk in the
examples directory, why not do it for the external pmds that you're also
hosting?

> If you see DPDK as a framework, it's really logical to have repositories 
> hosting some projects which are (partly) using the framework.
> 
By your reasoning, if I see DPDK as a framework, none of the PMDs should be
integrated to the dpdk core repository because none of them use every aspect of
the library.  You could certainly do this, and it would be an ok organization,
but it would be a maintenece nightmare, because to update something in the core
library that affected the pmd's would necessitate cloning N git trees for all
the supported PMDs and updating them separately. No new contributors are going
to want that headache. 

All I'm saying here is, you've got several PMD's that are meant to be used with
(and only with) the DPDK, you co-host them on the same git server, their
licensing is compatible/identical, and you're maintaining them.  You're 95% of
the way there, go the extra 5% and integrate them.  What you have currently is
effectively 3 out of tree modules for your library.  As with any out of tree
module, you'll find that, as you grow in contributors maintenece will lag on
those modules, because contibutors wont know (or won't care) to go update the
additional git trees.  It effectively marks them as second class citizens.

Neil



[dpdk-dev] [PATCH v2 0/4] recipes for RPM packages

2014-05-01 Thread Neil Horman
On Wed, Apr 30, 2014 at 02:46:41AM +0200, Thomas Monjalon wrote:
> The goal of this patch serie is to be able to package DPDK
> for RPM-based distributions.
> 
> The file naming currently doesn't allow to install different DPDK versions.
> But the packaging naming should be ready to manage different DPDK versions
> having different API/ABI for different applications:
>   - dpdk-core has full version in its name to manage API breaking
>   - extensions have a number as name suffix to manage PMD API breaking.
> When API/ABI will be stable, package names could be simpler.
> 
> I suggest to add these .spec files as a starting point for integration
> in Linux distributions.
> 
> Changes since v1:
>   - name of .spec file match package name
>   - version in package name
>   - no static library
>   - ldconfig/depmod in scriplets
> 
> Thanks for your comments/reviews.
> -- 
> Thomas
> 


I understand that this is holding up the 1.6.0r2 release, as well as the 1.7.0
integration.  As such, given that my concerns, while valid IMO, aren't required
for the release:

Acked-by: Neil Horman 



[dpdk-dev] [PATCH v2] mk: fix build ignoring other installed versions

2014-05-01 Thread Neil Horman
On Wed, Apr 30, 2014 at 10:57:00PM +0200, Thomas Monjalon wrote:
> If some DPDK libraries are installed on the system, the linker was trying
> to use them before searching in -L path.
> The obscure reason is that we were prefixing -L with -Wl, to pass it
> directly to the linker.
> But -L is also a gcc option. And allowing gcc to process this option fixes
> the issue.
> 
> Signed-off-by: Thomas Monjalon 

Acked-by: Neil Horman 



[dpdk-dev] [PATCH 01/15 v2] makefiles: Fixed -share command line option error

2014-05-02 Thread Neil Horman
On Wed, Apr 30, 2014 at 01:42:12AM +0200, Thomas Monjalon wrote:
> 2014-04-16 09:51, Neil Horman:
> > The shared libraries built with the current makefile set produce static
> > libraries rather than actual shared objects.  This is due to several missing
> > options that are required to correctly build shared objects using ld, as
> > well as a mis-specified -share option (which should be -shared). Switching
> > to the use of CC rather than LD and fixing the -shared option corrects
> > these problems and builds the DSOs correctly.
> > 
> > Signed-off-by: Neil Horman 
> 
> Applied for version 1.6.0r2.
> 
> Thanks
> -- 
> Thomas
> 

So, I just went and looked at 1.6.0r2 and noted that you applied this patch, but
the rest of the series is still missing.  This is what I was talking about
earlier when I said you weren't applying patch series atomically.  It makes it
impossible to have any clue what the upstream development head is going to look
like.  On top of that, since you're clearly integrating other changes ahead of
this, theres every likelyhood the rest of my v5 series won't apply.

the v5 series has sat out here for a few weeks now without comment. If there
aren't any objections to it, apply it.  Whats the problem here?  I'm not going
to package the DPDK until this series (or the functionality it offers) is in
place.

Neil



[dpdk-dev] [PATCH 01/15 v2] makefiles: Fixed -share command line option error

2014-05-02 Thread Neil Horman
On Fri, May 02, 2014 at 02:22:17PM +0200, Thomas Monjalon wrote:
> 2014-05-02 07:09, Neil Horman:
> > On Wed, Apr 30, 2014 at 01:42:12AM +0200, Thomas Monjalon wrote:
> > > 2014-04-16 09:51, Neil Horman:
> > > > The shared libraries built with the current makefile set produce static
> > > > libraries rather than actual shared objects.  This is due to several
> > > > missing options that are required to correctly build shared objects
> > > > using ld, as well as a mis-specified -share option (which should be
> > > > -shared). Switching to the use of CC rather than LD and fixing the
> > > > -shared option corrects these problems and builds the DSOs correctly.
> > > > 
> > > > Signed-off-by: Neil Horman 
> > > 
> > > Applied for version 1.6.0r2.
> > 
> > So, I just went and looked at 1.6.0r2 and noted that you applied this patch,
> > but the rest of the series is still missing.  This is what I was talking
> > about earlier when I said you weren't applying patch series atomically.  It
> > makes it impossible to have any clue what the upstream development head is
> > going to look like.  On top of that, since you're clearly integrating other
> > changes ahead of this, theres every likelyhood the rest of my v5 series
> > won't apply.
> > 
> > the v5 series has sat out here for a few weeks now without comment. If there
> > aren't any objections to it, apply it.  Whats the problem here?  I'm not
> > going to package the DPDK until this series (or the functionality it
> > offers) is in place.
> 
> This patch is clearly an important fix. So I took it for release 1.6.0r2.
> The other patches of the serie are enhancements which will be in 1.7.0.
> 
> The goal is to change the integration model.
> Now we'll stop integrating enhancements and big changes when first release 
> candidate is out. Then it will be clear that only fixes and mandatory changes 
> will be integrated in the last part of the release cycle.
> 
> I hope you agree we're improving the workflow.

Apologies to you Thomas, and the rest of the 6wind list.  He just explained to
me that patch applications ni 1.6.0r2 aren't inherited by 1.7.0 so the entire
patch series will need to be reapplied to 1.7.0.  The workflow is atypical to
me, but I should have seen that, given there is no master branch.  Sorry for the
outburst.
Neil

> -- 
> Thomas
> 


[dpdk-dev] [PATCH 5/5] add FILE arguement to debug functions

2014-05-04 Thread Neil Horman
On Fri, May 02, 2014 at 04:42:56PM -0700, Stephen Hemminger wrote:
> The DPDK dump functions are useful for remote debugging of an
> applications. But when application runs as a daemon, stdout
> is typically routed to /dev/null.
> 
> Instead change all these functions to take a stdio FILE * handle
> instead. An application can then use open_memstream() to capture
> the output.
> 
> Signed-off-by: Stephen Hemminger 
> 
Why not convert these to rte_log calls?  Seems like we already have the
infrastrucutre here, we just need to use it.
Neil



[dpdk-dev] [PATCH 5/5] add FILE arguement to debug functions

2014-05-05 Thread Neil Horman
On Sun, May 04, 2014 at 01:17:50PM -0700, Stephen Hemminger wrote:
> On Sun, 4 May 2014 08:20:54 -0400
> Neil Horman  wrote:
> 
> > On Fri, May 02, 2014 at 04:42:56PM -0700, Stephen Hemminger wrote:
> > > The DPDK dump functions are useful for remote debugging of an
> > > applications. But when application runs as a daemon, stdout
> > > is typically routed to /dev/null.
> > > 
> > > Instead change all these functions to take a stdio FILE * handle
> > > instead. An application can then use open_memstream() to capture
> > > the output.
> > > 
> > > Signed-off-by: Stephen Hemminger 
> > > 
> > Why not convert these to rte_log calls?  Seems like we already have the
> > infrastrucutre here, we just need to use it.
> > Neil
> > 
> 
> Because it is useful to have remote console like functionatlity,
> and dumping this to log doesn't work for that.
> 
Why not?  Looking at it it seems to me that you could get this exact same
functionality by calling rte_openlog_stream(stdout);

Neil



[dpdk-dev] [PATCH 5/5] add FILE arguement to debug functions

2014-05-06 Thread Neil Horman
On Mon, May 05, 2014 at 06:55:31PM -0700, Stephen Hemminger wrote:
> On Mon, 5 May 2014 06:53:09 -0400
> Neil Horman  wrote:
> 
> > On Sun, May 04, 2014 at 01:17:50PM -0700, Stephen Hemminger wrote:
> > > On Sun, 4 May 2014 08:20:54 -0400
> > > Neil Horman  wrote:
> > > 
> > > > On Fri, May 02, 2014 at 04:42:56PM -0700, Stephen Hemminger wrote:
> > > > > The DPDK dump functions are useful for remote debugging of an
> > > > > applications. But when application runs as a daemon, stdout
> > > > > is typically routed to /dev/null.
> > > > > 
> > > > > Instead change all these functions to take a stdio FILE * handle
> > > > > instead. An application can then use open_memstream() to capture
> > > > > the output.
> > > > > 
> > > > > Signed-off-by: Stephen Hemminger 
> > > > > 
> > > > Why not convert these to rte_log calls?  Seems like we already have the
> > > > infrastrucutre here, we just need to use it.
> > > > Neil
> > > > 
> > > 
> > > Because it is useful to have remote console like functionatlity,
> > > and dumping this to log doesn't work for that.
> > > 
> > Why not?  Looking at it it seems to me that you could get this exact same
> > functionality by calling rte_openlog_stream(stdout);
> > 
> > Neil
> > 
> 
> If you have a remote console (per connection) and a thread handling
> those requests you want to direct output of a command like 'show mempool'
> to respond to that request. An existing file descriptor don't work they are 
> global
> no per thread. There is a difference between log and response to request.
> 
ah, sorry, wasn't considering the multiuser case
Neil

> 


[dpdk-dev] compilation error with 1.6.0r2

2014-05-12 Thread Neil Horman
On Mon, May 12, 2014 at 11:11:56AM +0200, Thomas Monjalon wrote:
> Hi,
> 
> Your email is invalid because of a "proprietary disclaimer".
> 
> Please stop sending such emails.
> 
> 
Except that this totally happens if you set RHEL_RELEASE_CODE to somthing larger
than (6,4).  ethtool_adv_to_mmd_eee_adv_t and its companion functions protected
by that if statement in kcompat.h are never declared in user space, as they are
not included in the uapi/mdio.h header, on in the kernel-specific mdio.h header.
Before you add a clause to include support for 6.5 and later, you need to remove
that if conditional and declare those functions unilaterally for any relase
beyond where it was defined in the kernel.

Neil

> 2014-05-12 14:15, Prashant Upadhyaya:
> > Hi,
> > 
> > I recently picked the 1.6.0r2 from dpdk.org and tried to compile it the
> > usual way and ran into the following compilation error. I am aware I can
> > sidestep these by getting the compiler to treat them as warnings, but these
> > did not use to come with 1.6.0r1 so wanted to report it here. I am using
> > Fedora 18 to compile this version of DPDK.
> > 
> > Regards
> > -Prashant
> > 
> > 
> > [root at localhost ~]# cd dpdk-1.6.0r2/
> > [root at localhost dpdk-1.6.0r2]# make install T=x86_64-default-linuxapp-gcc
> > == Installing x86_64-default-linuxapp-gcc
> > == Build scripts
> > == Build scripts/testhost
> > == Build lib
> > == Build lib/librte_eal
> > == Build lib/librte_eal/common
> > == Build lib/librte_eal/linuxapp
> > == Build lib/librte_eal/linuxapp/igb_uio
> >   Building modules, stage 2.
> >   MODPOST 1 modules
> > == Build lib/librte_eal/linuxapp/eal
> > == Build lib/librte_eal/linuxapp/kni
> >   CC [M] 
> > /root/dpdk-1.6.0r2/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxap
> > p/kni/igb_ethtool.o
> > /root/dpdk-1.6.0r2/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxap
> > p/kni/igb_ethtool.c: In function 'igb_get_eee':
> > /root/dpdk-1.6.0r2/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxap
> > p/kni/igb_ethtool.c:2441:4: error: implicit declaration of function
> > 'mmd_eee_adv_to_ethtool_adv_t' [-Werror=implicit-function-declaration]
> > /root/dpdk-1.6.0r2/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxap
> > p/kni/igb_ethtool.c: In function 'igb_set_eee':
> > /root/dpdk-1.6.0r2/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxap
> > p/kni/igb_ethtool.c:2551:2: error: implicit declaration of function
> > 'ethtool_adv_to_mmd_eee_adv_t' [-Werror=implicit-function-declaration] cc1:
> > all warnings being treated as errors
> > make[10]: ***
> > [/root/dpdk-1.6.0r2/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxa
> > pp/kni/igb_ethtool.o] Error 1 make[9]: ***
> > [_module_/root/dpdk-1.6.0r2/x86_64-default-linuxapp-gcc/build/lib/librte_ea
> > l/linuxapp/kni] Error 2 make[8]: *** [sub-make] Error 2
> > make[7]: *** [rte_kni.ko] Error 2
> > make[6]: *** [kni] Error 2
> > make[5]: *** [linuxapp] Error 2
> > make[4]: *** [librte_eal] Error 2
> > make[3]: *** [lib] Error 2
> > make[2]: *** [all] Error 2
> > make[1]: *** [x86_64-default-linuxapp-gcc_install] Error 2
> > make: *** [install] Error 2
> > [root at localhost dpdk-1.6.0r2]#
> > 
> > 
> > 
> > "DISCLAIMER: This message is proprietary to Aricent and is intended solely
> > for the use of the individual to whom it is addressed. It may contain
> > privileged or confidential information and should not be circulated or used
> > for any purpose other than for what it is intended. If you have received
> > this message in error, please notify the originator immediately. If you are
> > not the intended recipient, you are notified that you are strictly
> > prohibited from using, copying, altering, or disclosing the contents of
> > this message. Aricent accepts no responsibility for loss or damage arising
> > from the use of the information transmitted by this email including damage
> > from virus."
> 
> 


[dpdk-dev] [PATCH RFC 06/11] mbuf: replace data pointer by an offset

2014-05-12 Thread Neil Horman
On Mon, May 12, 2014 at 02:36:12PM +, Venkatesan, Venky wrote:
> Olivier, 
> 
> This is a hugely problematic change, and has a pretty large  performance 
> impact (because the dependency to compute and access). We debated this for a 
> long time during the early days of DPDK and decided against it. This is also 
> a repeated sequence - the driver will do it twice (Rx + Tx) and the next 
> level stack will do it twice (Rx + Tx) ... 
> 
> My vote is to reject this change particular change to the mbuf. 
> 
> Regards, 
> -Venky
> 
Do you have perforamance numbers to compare throughput with and without this
change?  I always feel suspcious when I see the spectre of performane used to
support or deny a change without supporting reasoning or metrics.

Neil

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Monday, May 12, 2014 7:13 AM
> To: Olivier Matz
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH RFC 06/11] mbuf: replace data pointer by an 
> offset
> 
> Hi Olivier,
> 
> 2014-05-09 16:50, Olivier Matz:
> > The mbuf structure already contains a pointer to the beginning of the 
> > buffer (m->buf_addr). It is not needed to use 8 bytes again to store 
> > another pointer to the beginning of the data.
> > 
> > Using a 16 bits unsigned integer is enough as we know that a mbuf is 
> > never longer than 64KB. We gain 6 bytes in the structure thanks to 
> > this modification.
> > 
> > Signed-off-by: Olivier Matz 
> [...]
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -132,6 +132,13 @@ struct rte_mbuf {
> > void *buf_addr;   /**< Virtual address of segment buffer. */
> > uint64_t buf_physaddr:48; /**< Physical address of segment buffer. */
> > uint64_t buf_len:16;  /**< Length of segment buffer. */
> > +
> > +   /* valid for any segment */
> > +   struct rte_mbuf *next;/**< Next segment of scattered packet. */
> > +   uint16_t data_off;
> > +   uint16_t data_len;/**< Amount of data in segment buffer. */
> > +   uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
> > +
> >  #ifdef RTE_MBUF_REFCNT
> > /**
> >  * 16-bit Reference counter.
> > @@ -142,36 +149,30 @@ struct rte_mbuf {
> >  * config option.
> >  */
> > union {
> > -   rte_atomic16_t refcnt_atomic;   /**< Atomically accessed refcnt 
> > */
> > -   uint16_t refcnt;/**< Non-atomically accessed 
> > refcnt 
> */
> > +   rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
> > +   uint16_t refcnt;  /**< Non-atomically accessed refcnt */
> > };
> >  #else
> > -   uint16_t refcnt_reserved; /**< Do not use this field */
> > +   uint16_t refcnt_reserved; /**< Do not use this field */
> >  #endif
> > 
> > -   uint16_t ol_flags;/**< Offload features. */
> > -   uint32_t reserved; /**< Unused field. Required for padding. 
> */
> > -
> > -   /* valid for any segment */
> > -   struct rte_mbuf *next;  /**< Next segment of scattered packet. */
> > -   void* data; /**< Start address of data in segment buffer. */
> > -   uint16_t data_len;  /**< Amount of data in segment buffer. */
> > -
> > /* these fields are valid for first segment only */
> > -   uint8_t nb_segs;/**< Number of segments. */
> > -   uint8_t in_port;/**< Input port. */
> > -   uint32_t pkt_len;   /**< Total pkt len: sum of all segment data_len.
> > */ +uint8_t nb_segs;  /**< Number of segments. */
> > +   uint8_t in_port;  /**< Input port. */
> > +   uint16_t ol_flags;/**< Offload features. */
> > +   uint16_t reserved;/**< Unused field. Required for padding. */
> > 
> > /* offload features, valid for first segment only */
> > union rte_vlan_macip vlan_macip;
> > union {
> > -   uint32_t rss;   /**< RSS hash result if RSS enabled */
> > +   uint32_t rss; /**< RSS hash result if RSS enabled */
> > struct {
> > uint16_t hash;
> > uint16_t id;
> > -   } fdir; /**< Filter identifier if FDIR enabled */
> > -   uint32_t sched; /**< Hierarchical scheduler */
> > -   } hash; /**< hash information */
> > +   } fdir;   /**< Filter identifier if FDIR enabled */
> > +   uint32_t sched;   /**< Hierarchical scheduler */
> > +   } hash;   /**< hash information */
> > +   uint64_t reserved2;   /**< Unused field. Required for padding. */
> >  } __rte_cache_aligned;
> 
> There are some cosmetic changes mixed with real changes.
> It make hard to read them.
> Please split this patch.
> 
> --
> Thomas
> 


[dpdk-dev] [PATCH RFC 06/11] mbuf: replace data pointer by an offset

2014-05-12 Thread Neil Horman
On Mon, May 12, 2014 at 04:06:23PM +, Venkatesan, Venky wrote:
> Olivier, 
> 
> The impact isn't going to be felt on the driver quite as much (and can be 
> mitigated) - the driver runs a pretty low IPC (~1.7) compared to some of the 
> more optimized code above it that actually accesses the data. The problem 
> with the dependent compute is like this - in effect you are changing 
> 
> struct eth_hdr * eth = (struct eth_hdr *) m->data;
> to 
> struct eth_hdr * eth = (struct eth_hdr *) ( (char *)m->buf _addr + 
> m->data_offset);
> 
> We have some code that actually processes 4-8 packets in parallel (parse + 
> hash), with a pretty high IPC. What we've done here is essentially replaced 
> is a simple load, with  a load, load, add sequence in front of it. There is 
> no real way to do these computations in parallel for multiple packets - it 
> has to be done one or two at a time. What suffers is the IPC of the overall 
> function that does the parse/hash quite significantly. It's those functions 
> that I worry about more than the driver.  I haven't yet been able to come up 
> with a mitigation for this yet. 
> 
> Neil, 
> 
> The last time we looked at this change - and it's been a while ago, the 
> negative effect on the upper level functions built on this was on the order 
> of about 15-20%. It's probably will get worse once we tune the code even 
> more.  Hope the above explanation gives you a flavour of the problem this 
> will introduce. 
> 
I'm sorry, it doesnt.  I take you at your word that it was a problem, but I
don't think we can just categorically deny patches based on past testing of
potentially simmilar code, especially given that this series attempts to improve
some traffic patten via the implementation TSO (meaning the net result will be
different based on the use case).  

I understand what your saying above, that this code incurs a second load
operation (though I would think they could be implemented in parallel, or at the
very least accelerated by clever placement of data_offset relative to buf_addr
to ensure that the second load was cache hot).

Regardless, my point is, just saying that this can't be done because you saw a
performance hit with something simmilar in the past, isn't helpful.  If you
think thats a problem, then we really need to get details of your test case and
measurements you took so that they can be reproduced, and confirmed or refuted.

Regards
Neil.

> Regards, 
> -Venky
> 
> 
> 
> 
> -Original Message-
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com] 
> Sent: Monday, May 12, 2014 8:07 AM
> To: Neil Horman; Venkatesan, Venky
> Cc: Thomas Monjalon; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH RFC 06/11] mbuf: replace data pointer by an 
> offset
> 
> Hi Venky,
> 
> On 05/12/2014 04:41 PM, Neil Horman wrote:
> >> This is a hugely problematic change, and has a pretty large 
> >> performance impact (because the dependency to compute and access). We 
> >> debated this for a long time during the early days of DPDK and 
> >> decided against it. This is also a repeated sequence - the driver 
> >> will do it twice (Rx + Tx) and the next level stack will do it twice 
> >> (Rx + Tx) ...
> >>
> >> My vote is to reject this change particular change to the mbuf.
> >>
> >> Regards,
> >> -Venky
> >>
> > Do you have perforamance numbers to compare throughput with and 
> > without this change?  I always feel suspcious when I see the spectre 
> > of performane used to support or deny a change without supporting reasoning 
> > or metrics.
> 
> I agree with Neil. My feeling is that it won't impact performance, and it is 
> correlated with the forwarding tests I've done with this patch.
> 
> I don't really understand what would cost more by storing the offset instead 
> of the virtual address. I agree that each time the stack will access to the 
> begining of the mbuf, there will be an arithmetic operation, but it is 
> compensated by other operations that will be
> accelerated:
> 
> - When receiving a packet, the driver will do:
> 
>  m->data_off = RTE_PKTMBUF_HEADROOM;
> 
>instead of:
> 
>  m->data = (char*) rxm->buf_addr + RTE_PKTMBUF_HEADROOM;
> 
> - Each time the stack will prepend data, it has to check if the headroom
>is large enough to do the operation. This will be faster as data_off
>is the headroom.
> 
> - When transmitting a packet, the driver will get the physical address:
> 
>  phys_addr = m->buf_physaddr + m->data_off
> 
>instead of:
> 
>  phys_addr = (m->buf_physaddr +  \
>  ((char *)m->data - (char *)m->buf_addr)))
> 
> Moreover, these operations look negligible to me (few cycles) compared to the 
> large amount of arithmetic operations and tests done in the driver.
> 
> Regards,
> Olivier
> 


[dpdk-dev] Heads up: Fedora packaging plans

2014-05-13 Thread Neil Horman
Hey all-
This isn't really germaine to dpdk development, but Thomas and Vincent,
you expressed interest in my progress regarding packaging of dpdk for Fedora, so
I figured I would post here in case others were interested.

Please find here:
http://people.redhat.com/nhorman/dpdk-1.7.0-0.1.gitb20539d68.src.rpm

My current effort to do so.  I've made some changes from the stock spec file
included in dpdk:

* Modified the version and release values to be separate from the name.  I did
some reading on requirements for packaging and it seems we can be a bit more lax
with ABI version on a pre-release I think, so I setup the N-V-R to use
pre-release conventions, which makes sense, give that this is a 1.7.0
pre-release.  The git tag on the relase value will get bumped as we move forward
in the patch series.

* Added config files to match desired configs for Fedora (i.e. disabled PMD's
that require out of tree kernel modules

* Removed Packager tag (Fedora doesn't use those)

* Moved the package target directories to include N-V of the package in the path
names.  This allows for multiple versions of the dpdk to be installed in
parallel (I.e. dpdk-1.7.0 files are in /lib/dpdk-1.7.0, /usr/include/dpdk-1.7.0,
etc).  This is how java packages allow for multiple version installs, and makes
sense given ABI instability in dpdk.  It will require that developers add some
-I / -L paths to their makefiles to pull the proper version, but I think thats a
fair tradeoff.

My plan is to go through the review process with this package, and update to
tagged 1.7.0 as soon as its ready.  

Neil



[dpdk-dev] Heads up: Fedora packaging plans

2014-05-14 Thread Neil Horman
On Wed, May 14, 2014 at 12:46:38AM +0200, Vincent JARDIN wrote:
> Neil,
> 
> > Please find here:
> >   http://people.redhat.com/nhorman/dpdk-1.7.0-0.1.gitb20539d68.src.rpm
> 
> Your spec file is broken:
> 
> A simple patch:
> --- dpdk.spec 2014-05-13 22:43:15.89200 +
> +++ ../dpdk.spec.vj   2014-05-13 22:42:40.22100 +
> @@ -75,7 +75,7 @@
> 
>  %build
>  make O=%{target} T=%{target} config
> -sed -ri 's,(RTE_MACHINE=).*,\1%{machine},  %{target}/.config
> +sed -ri 's,(RTE_MACHINE=).*,\1%{machine},' %{target}/.config
>  sed -ri 's,(RTE_APP_TEST=).*,\1n,' %{target}/.config
>  sed -ri 's,(RTE_BUILD_SHARED_LIB=).*,\1y,' %{target}/.config
>  make O=%{target} %{?_smp_mflags}
> 
> Best regards,
>   Vincent
> 
Thank you, that should actually be entirely removed though.  We don't need to be
altering the configuration at all since I dropped in complete new config files
as sources .



[dpdk-dev] [PATCH v5 00/14] dpdk: Separate compile time linkage between eal lib and pmd's

2014-05-16 Thread Neil Horman
On Mon, Apr 21, 2014 at 10:59:25AM -0400, Neil Horman wrote:
> Disconnect compile time linkage between eal library / applications and pmd's
> 
> I noticed that, while tinkering with dpdk, building for shared libraries still
> resulted in all the test applications linking to all the built pmd's, despite
> not actually needing them all.  We are able to tell an application at run time
> (via the -d/--blacklist/--whitelist/--vdev options) which pmd's we want to 
> use,
> and so have no need to link them at all. The only reason they get pulled in is
> because rte_eal_non_pci_init_etherdev and rte_pmd_init_all contain static 
> lists
> to the individual pmd init functions. The result is that, even when building 
> as
> DSO's, we have to load all the pmd libraries, which is space inefficient and
> defeating of some of the purpose of shared objects.
> 
> To correct this, I developed this patch series, which introduces a new macro,
> PMD_REGISTER_DRIVER, which wraps up Oliviers work using constructors on the
> virtual device pmds, then expands on it to include the physical device pmds,
> allowing us to break linkages between dpdk applications and pmd's almost
> entirely (save for the ring and xenvirt drivers, which have additional api's
> outside of the standard dpdk code that we need to further fix).  This also
> allows us to completely remove the rte_pmd_init_all routine, hiding its 
> function
> internally to the rte_eal_init path.
> 
> I've tested this feature using the igb and pcap pmd's, both statically and
> dynamically linked with the test and testpmd sample applications, and it seems
> to work well.
> 
> Note, I encountered  a few bugs along the way, which I fixed and noted in the
> series.
> 
> Regards
> Neil
> 
> Change Notes:
> 
> 1) Reposted the entire series.  Recent changes permeated accross several
> patches, and since a few patches already had multiple versions, I've reposted
> the entire series and bumped the version number to 5, whcih skips a few
> versions, but ensures this series is seen as the most recent.
> 
> 2) Merged the PMD_REGISTER_DRIVER macro into rte_dev.h, which is a better
> location for it.  Required removing include of rte_pmd.h across most of the
> patches in the series.
> 
> 3) Cleaned up various leftover "virtual" comments in the driver registration 
> api
> that no longer belong there after making it generic
> 
> 4) Fixed the LINK_USING_CC check in rte.lib.mk so it works like other uses
> 
> 5) Fixed CPU_LDFLAGS conversion to use -Wl in case we link with gcc.
> 
> 
> 

Ping, so whats the status on the rest of this series?  You said you would
integrate it with 1.7.0, but you've applied several dozen patches ahead of it
(many of which were posted after it), and as a result, it no longer applies.  I
assume you're planning on fixing all of this up?

Neil



[dpdk-dev] [PATCH 0/3] ring: provide rte_ring_as_ethdev API

2014-05-16 Thread Neil Horman
On Fri, May 16, 2014 at 07:15:11PM +0100, Bruce Richardson wrote:
> This patch set aims to provide a shorter simpler alternative the public API 
> functions for using rings as ethdevs provided by the librte_pmd_ring library. 
> This alternative just provides simple RX and TX burst functions and a 
> conversion API, without any of the complexities present in the pmd_ring 
> version. This replacement should allow the public APIs in the pmd_ring 
> library to be deprecated in the future.
> 
> Bruce Richardson (3):
>   ethdev: Remove ethdev.h dependency on mbuf +  mempool
>   ring: add support for converting a ring to ethdev
>   ring: autotest for using ring as ethdev
> 
>  app/test-pmd/cmdline.c  |  1 +
>  app/test/test_pmd_ring.c|  1 +
>  app/test/test_ring.c| 25 
>  lib/librte_ether/rte_ethdev.h   |  4 +++-
>  lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c |  1 +
>  lib/librte_ring/Makefile|  1 +
>  lib/librte_ring/rte_ring.c  | 42 
> +
>  lib/librte_ring/rte_ring.h  | 11 +
>  8 files changed, 85 insertions(+), 1 deletion(-)
> 
> -- 
> 1.9.0
> 
> 
NAK, I don't think this makes sense.  If you want to encapsulate a ring pair as
an ethdev, then write a pmd that does so.  That will give you a standardized
ethdev that you can create using the existing --vdev librte_eal command line
options without having to widen your API surface, or having to write
applications that specifically know about the fact that your ethdev is composed
of rings under the covers.

Neil



[dpdk-dev] Heads up: Fedora packaging plans

2014-05-19 Thread Neil Horman
On Mon, May 19, 2014 at 12:11:35PM +0200, Thomas Monjalon wrote:
> Hi Neil,
> 
> Thanks for sharing your progress.
> 
No worries.

> My main concerns are about naming and extensions.
> We must keep "dpdk-core" naming in order to distinguish it from PMD 
> extensions.
I don't see why.  We can name packages whatever we want, as long as the spec and
srpm share the same name. It seems to me that the core should be the base name
of the package while the extensions should have some extension on their name.

> And then, packaging of memnic and non-uio paravirtualization PMDs 
> (virtio/vmxnet3) are missing.
> 
They're in separate repositories, I was planning on packaging them at a later
time separately, since their versioning and development is handled separately.

> 2014-05-13 15:08, Neil Horman:
> > My current effort to do so.  I've made some changes from the stock spec file
> > included in dpdk:
> 
> We should try to get .spec for Fedora and in-tree .spec as common as possible.
> There are probably some things to push.
> 
Ok, sure, just keep in mind that different distributions have different
packaging requirements that may affect the contents of the spec file, and so
attaining parity may not be possible (or even worthwhile).

> > * Modified the version and release values to be separate from the name.  I
> > did some reading on requirements for packaging and it seems we can be a bit
> > more lax with ABI version on a pre-release I think, so I setup the N-V-R to
> > use pre-release conventions, which makes sense, give that this is a 1.7.0
> > pre-release.  The git tag on the relase value will get bumped as we move
> > forward in the patch series.
> 
> I thought that we should put version in the name, in order to be able to 
> install many versions together. How is it handled by yum?
> 
So, I spent some time thinking about this, and I _really_ want to avoid the
inclusion of a version with the package name.  Doing so, while it allows yum to
install multiple versions side-by-side, is a real overhead for me, as it
requires that I go through a new pacakge review process for each released
version that we want to package.  I do not have time to do that.  If someone
from 6wind or intel wants to get involved in the Packaging process we can look
at that as a solution, but while I'm doing it, its really just too much
overhead.  This method will allow multiple version to be installed side by side
as well.  The tradeoff is that yum doesn't directly allow that, as it will just
preform an upgrade.  The multiple version solution will require that you
download older versions and install them directly using rpm commands.  I think
thats a fair tradeoff.

> > * Added config files to match desired configs for Fedora (i.e. disabled
> > PMD's that require out of tree kernel modules
> 
> It would be clearer to make your configuration changes with "sed -i".
> In a near future we would probably need a "configure" script to do it.
> 
I really disagree.  Its not clearer in my mind at all - in that the final config
file is a product of two pieces of information (the base config file, and the
sed scripts that you run on it), as opposed to one piece (the canonical modified
config specified in the source line).  Using sed also implies that you need to
list sed as a BuildRequires (minimal buildroots may not include sed when they
are spun up).

> So you don't package igb_uio but you build it because there is no option to 
> disable it currently. We should add such option.
> 
Not sure what you mean here.  The only uio code I see in the package is the uio
unbind script for igb, which should still work just fine (save for the fact that
we don't have a user space PMD to attach the hardware to).  I can certainly
remove the script though so it doesn't appear in the package until such time as
the LAD group integrates the uio code in the upstream driver.

> > * Moved the package target directories to include N-V of the package in the
> > path names.  This allows for multiple versions of the dpdk to be installed
> > in parallel (I.e. dpdk-1.7.0 files are in /lib/dpdk-1.7.0,
> > /usr/include/dpdk-1.7.0, etc).  This is how java packages allow for
> > multiple version installs, and makes sense given ABI instability in dpdk. 
> > It will require that developers add some -I / -L paths to their makefiles
> > to pull the proper version, but I think thats a fair tradeoff.
> 
> I don't see version for include directory and bin directory (testpmd).
> 
Yup, need to fix that.  Thank you!
Neil

> Thanks
> -- 
> Thomas
> 


[dpdk-dev] [PATCH 0/3] ring: provide rte_ring_as_ethdev API

2014-05-19 Thread Neil Horman
On Mon, May 19, 2014 at 10:59:18AM +, Richardson, Bruce wrote:
> 
> 
> > -Original Message-
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Friday, May 16, 2014 7:54 PM
> > To: Richardson, Bruce
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 0/3] ring: provide rte_ring_as_ethdev API
> > 
> > On Fri, May 16, 2014 at 07:15:11PM +0100, Bruce Richardson wrote:
> > >
> > NAK, I don't think this makes sense.  If you want to encapsulate a ring 
> > pair as
> > an ethdev, then write a pmd that does so.  That will give you a standardized
> > ethdev that you can create using the existing --vdev librte_eal command line
> > options without having to widen your API surface, or having to write
> > applications that specifically know about the fact that your ethdev is 
> > composed
> > of rings under the covers.
> > 
> 
> The objective is not to "encapsulate a ring pair", but instead allow a ring 
> to be "type-cast" to an ethdev for the purposes of rx and tx. 
Thats semantics.  Whatever you want to call it, you're goal is to treat a ring
pair like an ethernet interface.  You already have a mechanism to do that,
librte_pmd_ring.

> If this is provided, we can provide standard functions which work to take 
> packets in using rx_burst and which send packets out after processing using 
> tx_burst. The same code can then be used unmodified without worrying about 
> whether the packets come from/to a NIC or from another core (via ring).
Again, you can already do this, librte_pmd_ring.  You're re-inventing the wheel.

If what you want is the ability to dynamically take a ring that you've created
and use it interchangeably as a ring and an ethernet device, I would suggest one
of the following:

1) Create a loopback PMD that registers a port, where anything transmitted to it
is immediately recieved on it again.  This allows you to reuse the existing
rte_eth_* api entirely to accomplish what you need

2) Create a tap device library and PMD, akin to the linux Tun/Tap device driver.
This is some additional work of course, and still expands the api, but does so
in a controlled and generic manner, useful to other applications.  By that I
mean that other data sources besides librte_ring can be used to feed data into
the network stack (pcap files or port mirroring applications, etc).

Either way, what you have right now is doing little more than solving a generic
problem only for one data source, in a way that unnecessecarily expands the API
surface.

Neil



[dpdk-dev] Heads up: Fedora packaging plans

2014-05-19 Thread Neil Horman
On Mon, May 19, 2014 at 06:28:47PM +0200, Thomas Monjalon wrote:
> 2014-05-19 09:18, Neil Horman:
> > On Mon, May 19, 2014 at 12:11:35PM +0200, Thomas Monjalon wrote:
> > > My main concerns are about naming and extensions.
> > > We must keep "dpdk-core" naming in order to distinguish it from PMD
> > > extensions.
> > 
> > I don't see why.  We can name packages whatever we want, as long as the spec
> > and srpm share the same name. It seems to me that the core should be the
> > base name of the package while the extensions should have some extension on
> > their name.
> 
> OK
> 
> > > And then, packaging of memnic and non-uio paravirtualization PMDs
> > > (virtio/vmxnet3) are missing.
> > 
> > They're in separate repositories, I was planning on packaging them at a
> > later time separately, since their versioning and development is handled
> > separately.
> 
> OK
> 
> > > We should try to get .spec for Fedora and in-tree .spec as common as
> > > possible. There are probably some things to push.
> > 
> > Ok, sure, just keep in mind that different distributions have different
> > packaging requirements that may affect the contents of the spec file, and so
> > attaining parity may not be possible (or even worthwhile).
> 
> Yes, the in-tree file should be a common base.
> 
> > > I thought that we should put version in the name, in order to be able to
> > > install many versions together. How is it handled by yum?
> > 
> > So, I spent some time thinking about this, and I _really_ want to avoid the
> > inclusion of a version with the package name.  Doing so, while it allows yum
> > to install multiple versions side-by-side, is a real overhead for me, as it
> > requires that I go through a new pacakge review process for each released
> > version that we want to package.  I do not have time to do that.  If
> > someone from 6wind or intel wants to get involved in the Packaging process
> > we can look at that as a solution, but while I'm doing it, its really just
> > too much overhead.  This method will allow multiple version to be installed
> > side by side as well.  The tradeoff is that yum doesn't directly allow
> > that, as it will just preform an upgrade.  The multiple version solution
> > will require that you download older versions and install them directly
> > using rpm commands.  I think thats a fair tradeoff.
> 
> OK
> 
> > > > * Added config files to match desired configs for Fedora (i.e. disabled
> > > > PMD's that require out of tree kernel modules
> > > 
> > > It would be clearer to make your configuration changes with "sed -i".
> > > In a near future we would probably need a "configure" script to do it.
> > 
> > I really disagree.  Its not clearer in my mind at all - in that the final
> > config file is a product of two pieces of information (the base config
> > file, and the sed scripts that you run on it), as opposed to one piece (the
> > canonical modified config specified in the source line).  Using sed also
> > implies that you need to list sed as a BuildRequires (minimal buildroots
> > may not include sed when they are spun up).
> 
> I don't understand the logic.
> When packaging a library using autotools, you are setting some options to 
> "configure" script which are not saved elsewhere than in the spec file, right?
> Why it wouldn't be the same here (options in spec file)? 
> 
When you need to override options that can't be auto-detected, then yes.  In
that case, you can override options that can't be automatically determined from
the build environment.  Thats a common use case and its well understood by
everyone, which is fine.

But thats not what you have here.  DPDK has a build environment that more akin
to the linux kernel, in that, from an rpm build perspective a flat file is
inserted as the configuration.  the spec file just drops it in the proper place
and uses it, unmodified.  Theres no expectation that, when a configuration is
completely supplied, that it will be modified prior to build time.

Its the difference between static and dynamic generation of configuration data.
If you're using autoconf, then you know that you don't have a set configuration
until you run ./configure, which means you know to look to the spec file to
determine any options for building, and you know that your configuration will be
generated after the spec file runs configure, whatever it is.  If conversely you
have a static configuration file (as DPDK does), the assumption is that the
selected configuration is included in the sou

[dpdk-dev] [PATCH 0/4] New library: rte_distributor

2014-05-20 Thread Neil Horman
On Tue, May 20, 2014 at 11:00:53AM +0100, Bruce Richardson wrote:
> This adds a new library to the Intel DPDK whereby a set of packets can be 
> distributed one-at-a-time to a set of worker cores, with dynamic load 
> balancing being done between those workers. Flows are identified by a tag 
> within the mbuf (currently the RSS hash field, 32-bit value), which is used 
> to ensure that no two packets of the same flow are processed in parallel, 
> thereby preserving ordering.
> 
> Bruce Richardson (4):
>   eal: add tailq for new distributor component
>   distributor: new packet distributor library
>   distributor: add distributor library to build
>   distributor: add unit tests for distributor lib
> 
>  app/test/Makefile  |   2 +
>  app/test/commands.c|   7 +-
>  app/test/test.h|   2 +
>  app/test/test_distributor.c| 582 
> +
>  app/test/test_distributor_perf.c   | 274 
>  config/defconfig_i686-default-linuxapp-gcc |   5 +
>  config/defconfig_i686-default-linuxapp-icc |   5 +
>  config/defconfig_x86_64-default-bsdapp-gcc |   6 +
>  config/defconfig_x86_64-default-linuxapp-gcc   |   5 +
>  config/defconfig_x86_64-default-linuxapp-icc   |   5 +
>  lib/Makefile   |   1 +
>  lib/librte_distributor/Makefile|  50 +++
>  lib/librte_distributor/rte_distributor.c   | 417 ++
>  lib/librte_distributor/rte_distributor.h   | 173 
>  lib/librte_eal/common/include/rte_tailq_elem.h |   2 +
>  mk/rte.app.mk  |   4 +
>  16 files changed, 1539 insertions(+), 1 deletion(-)
>  create mode 100644 app/test/test_distributor.c
>  create mode 100644 app/test/test_distributor_perf.c
>  create mode 100644 lib/librte_distributor/Makefile
>  create mode 100644 lib/librte_distributor/rte_distributor.c
>  create mode 100644 lib/librte_distributor/rte_distributor.h
> 
> -- 
> 1.9.0
> 
> 
This sounds an awful lot like the team and bonding drivers.  Why implement this
as a separate application accessible api, rather than a stacked PMD?  If you do
the latter then existing applications could concievably change their
configurations to use this technology and gain the benefit of load distribution
without having to alter the application to use a new api.

Neil



[dpdk-dev] [PATCH v5 00/14] dpdk: Separate compile time linkage between eal lib and pmd's

2014-05-20 Thread Neil Horman
On Tue, May 20, 2014 at 02:45:09PM +0200, Thomas Monjalon wrote:
> 2014-04-21 10:59, Neil Horman:
> > Disconnect compile time linkage between eal library / applications and pmd's
> > 
> > I noticed that, while tinkering with dpdk, building for shared libraries
> > still resulted in all the test applications linking to all the built pmd's,
> > despite not actually needing them all.  We are able to tell an application
> > at run time (via the -d/--blacklist/--whitelist/--vdev options) which pmd's
> > we want to use, and so have no need to link them at all. The only reason
> > they get pulled in is because rte_eal_non_pci_init_etherdev and
> > rte_pmd_init_all contain static lists to the individual pmd init functions.
> > The result is that, even when building as DSO's, we have to load all the
> > pmd libraries, which is space inefficient and defeating of some of the
> > purpose of shared objects.
> > 
> > To correct this, I developed this patch series, which introduces a new
> > macro, PMD_REGISTER_DRIVER, which wraps up Oliviers work using constructors
> > on the virtual device pmds, then expands on it to include the physical
> > device pmds, allowing us to break linkages between dpdk applications and
> > pmd's almost entirely (save for the ring and xenvirt drivers, which have
> > additional api's outside of the standard dpdk code that we need to further
> > fix).  This also allows us to completely remove the rte_pmd_init_all
> > routine, hiding its function internally to the rte_eal_init path.
> > 
> > I've tested this feature using the igb and pcap pmd's, both statically and
> > dynamically linked with the test and testpmd sample applications, and it
> > seems to work well.
> > 
> > Note, I encountered  a few bugs along the way, which I fixed and noted in
> > the series.
> 
> Thanks for this nice cleanup.
> 
> Acked-by: Thomas Monjalon 
> 
> Applied for version 1.7.0.
> 
Thank you Thomas, I will pull this into the rpm I have awaiting revew for
Fedora asap.

Neil

> -- 
> Thomas
> 


[dpdk-dev] [PATCH 0/4] New library: rte_distributor

2014-05-20 Thread Neil Horman
On Tue, May 20, 2014 at 11:02:15AM +, Richardson, Bruce wrote:
> > -Original Message-
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Tuesday, May 20, 2014 11:39 AM
> > To: Richardson, Bruce
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 0/4] New library: rte_distributor
> > 
> > >
> > This sounds an awful lot like the team and bonding drivers.  Why implement 
> > this
> > as a separate application accessible api, rather than a stacked PMD?  If 
> > you do
> > the latter then existing applications could concievably change their
> > configurations to use this technology and gain the benefit of load 
> > distribution
> > without having to alter the application to use a new api.
> > 
> 
> I'm not sure I see the similarity with the bonded driver, which merges 
> multiple ports into a single logical port, i.e. you pull packets from a 
> single source which is actually pull packets from possibly multiple sources 
> behind the scenes, whereas this takes packets from an unknown source and 
> distributes them among a set of workers a single packet at a time. (While 
> handling single packets is slower than handling packet bursts, it is 
> something that is sometimes needed to support existing code which may not be 
> written to work with packet bursts.) 
> 
Ah, my bad, I was looking at the API as a way of multiplexing locally generated
data to multiple workers for transmission over multiple network interfaces, not
to demultiplex received data to multiple workers.  That makes more sense.  Sorry
for the noise.  I've got a few more comments inline with the rest of your
patches.
Neil



[dpdk-dev] [PATCH 2/4] distributor: new packet distributor library

2014-05-20 Thread Neil Horman
On Tue, May 20, 2014 at 11:00:55AM +0100, Bruce Richardson wrote:
> This adds the code for a new Intel DPDK library for packet distribution.
> The distributor is a component which is designed to pass packets
> one-at-a-time to workers, with dynamic load balancing. Using the RSS
> field in the mbuf as a tag, the distributor tracks what packet tag is
> being processed by what worker and then ensures that no two packets with
> the same tag are in-flight simultaneously. Once a tag is not in-flight,
> then the next packet with that tag will be sent to the next available
> core.
> 
> Signed-off-by: Bruce Richardson 
>

> +#define RTE_DISTRIB_GET_BUF (1)
> +#define RTE_DISTRIB_RETURN_BUF (2)
> +
Can you document the meaning of these bits please, the code makes it somewhat
confusing to differentiate them.  As I read the code, GET_BUF is used as a flag
to indicate that rte_distributor_get_pkt needs to wait while a buffer is
filled in by the processing thread, while RETURN_BUF indicates that a worker is
leaving and the buffer needs to be (re)assigned to an alternate worker, is that
correct?

> +#define RTE_DISTRIB_BACKLOG_SIZE 8
> +#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
> +
> +#define RTE_DISTRIB_MAX_RETURNS 128
> +#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
> +
> +union rte_distributor_buffer {
> + volatile int64_t bufptr64;
> + char pad[CACHE_LINE_SIZE*3];
Do you need the pad, if you mark the struct as cache aligned?
> +} __rte_cache_aligned;
> 
+
>
> +
> +struct rte_mbuf *
> +rte_distributor_get_pkt(struct rte_distributor *d,
> + unsigned worker_id, struct rte_mbuf *oldpkt,
> + unsigned reserved __rte_unused)
> +{
> + union rte_distributor_buffer *buf = &d->bufs[worker_id];
> + int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS) | 
> \
> + RTE_DISTRIB_GET_BUF;
> + while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
> + rte_pause();
> + buf->bufptr64 = req;
> + while (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
> + rte_pause();
You may want to document the fact that this is deadlock prone.  You clearly
state that only a single thread can run the processing routine, but if a user
selects a single worker thread to preform double duty, the GET_BUF_FLAG will
never get cleared here, and no other queues will get processed.

> + /* since bufptr64 is a signed value, this should be an arithmetic shift 
> */
> + int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
> + return (struct rte_mbuf *)((uintptr_t)ret);
> +}
> +
> +int
> +rte_distributor_return_pkt(struct rte_distributor *d,
> + unsigned worker_id, struct rte_mbuf *oldpkt)
> +{
Maybe some optional sanity checking, here and above, to ensure that a packet
returned through get_pkt doesn't also get returned here, mangling the flags
field?

>
> +
> +/* flush the distributor, so that there are no outstanding packets in flight 
> or
> + * queued up. */
> +int
> +rte_distributor_flush(struct rte_distributor *d)
> +{
You need to document that this function can only be called by the same thread
that is running rte_distributor_process, lest your corrupt your queue data.
Alternatively, it might be nicer to modify this functions internals to set a
flag in the distributor status bits to make the process routine do the flush
work when it gets set.  that would allow this function to be called by any
other thread, which seems like a more natural interface.

>
> +}
> +
> +/* clears the internal returns array in the distributor */
> +void
> +rte_distributor_clear_returns(struct rte_distributor *d)
> +{
This can also only be called by the same thread that runs the process routine,
lest the start and count values get mis-assigned.

> + d->returns.start = d->returns.count = 0;
> +#ifndef __OPTIMIZE__
> + memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
> +#endif
> +}
> +
> +/* creates a distributor instance */
> +struct rte_distributor *
> +rte_distributor_create(const char *name,
> + unsigned socket_id,
> + unsigned num_workers,
> + struct rte_distributor_extra_args *args __rte_unused)
> +{
> + struct rte_distributor *d;
> + struct rte_distributor_list *distributor_list;
> + char mz_name[RTE_MEMZONE_NAMESIZE];
> + const struct rte_memzone *mz;
> +
> + /* compilation-time checks */
> + RTE_BUILD_BUG_ON((sizeof(*d) & CACHE_LINE_MASK) != 0);
> + RTE_BUILD_BUG_ON((RTE_MAX_LCORE & 7) != 0);
> +
> + if (name == NULL || num_workers >= RTE_MAX_LCORE) {
> + rte_errno = EINVAL;
> + return NULL;
> + }
> + rte_snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
> + mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
> + if (mz == NULL) {
> + rte_errno = ENOMEM;
> + return NULL;
> + }
> +
> + /* check that we have an initialised tail queue *

[dpdk-dev] [PATCH 2/4] distributor: new packet distributor library

2014-05-21 Thread Neil Horman
On Wed, May 21, 2014 at 10:21:26AM +, Richardson, Bruce wrote:
> > -Original Message-
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Tuesday, May 20, 2014 7:19 PM
> > To: Richardson, Bruce
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 2/4] distributor: new packet distributor 
> > library
> > 
> > On Tue, May 20, 2014 at 11:00:55AM +0100, Bruce Richardson wrote:
> > > This adds the code for a new Intel DPDK library for packet distribution.
> > > The distributor is a component which is designed to pass packets
> > > one-at-a-time to workers, with dynamic load balancing. Using the RSS
> > > field in the mbuf as a tag, the distributor tracks what packet tag is
> > > being processed by what worker and then ensures that no two packets with
> > > the same tag are in-flight simultaneously. Once a tag is not in-flight,
> > > then the next packet with that tag will be sent to the next available
> > > core.
> > >
> > > Signed-off-by: Bruce Richardson 
> > >
> > 
> > >

> > Don't need to reserve an extra argument here.  You're not ABI safe 
> > currently,
> > and if DPDK becomes ABI safe in the future, we will use a linker script to
> > provide versions with backward compatibility easily enough.
> We may not have ABI compatibility between releases, but on the other hand we 
> try to reduce the amount of code changes that need to be made by our 
> customers who are compiling their code against the libraries - generally 
> linking against static rather than shared libraries. Since we have a 
> reasonable expectation that this field will be needed in a future release, we 
> want to include it now so that when we do need it, no code changes need to be 
> made to upgrade this particular library to a new Intel DPDK version.

I understand why you added the reserved argument, but I still don't think its a
good idea, especially since you're not ABI safe/stable at the moment.  By adding
this argument, you're forcing early users to declare a variable to pass into
your library that they know is unused, and as such likely uninitalized (or at
least initilized to an unknown value).  When you do in the future make use of
this unknown value, your internal implementation will have to support being
called by both 'old' applications that just pass in any old value, and 'new'
users who pass in valid data, and the implementation wont have any way to
differentiate between the two.  You can certainly document a reserved value that
current users must initilize that variable too, so that you can make that
differentiation, but you have to hope they do that correctly and consistently.
It seems to me it would be better to do something like:
1) Not include the reserved parameter
2) When you do add the extra parameter, rename the function as well, and
3) provide a compatibility function that preserves the old API and passes the
reserved value as the new parameter to the renamed function in (2)

That way old applications will run transparently, and you don't have to hope
they code the reserved values properly (note you can also do this with a macro
if you want to save the call instruction)

Ideally, you would just do this with a version script during linking, so that
you could include 2 versions of the same function name (v1 without the extra
paramter and v2 with the extra parameter), and old applications linked against
v1 would just continue to work, but dpdk isn't there yet :)
Neil



[dpdk-dev] [PATCH RFC 03/11] mbuf: remove rte_ctrlmbuf

2014-05-26 Thread Neil Horman
On Sun, May 25, 2014 at 09:39:22PM +, Gilmore, Walter E wrote:
> Olivier you're making an assumption that customer application code running on 
> the Intel DPDK isn't using the rte_ctrlmbuf structure. 
> Remember there are more than 300 customers using the Intel DPDK and there is 
> no way you can predict that this is not used by them. 
> The purpose of this structure is to send commands, events or any other type 
> of information between user application tasks (normally from a manager task).
> It has been there since the beginning in the original design and it's up to 
> the user to define what is in the data field and how they wish to use it. 
> It's one thing to fix a bug but to remove a structure like this because you 
> don't see it use in the other parts is asking for trouble with customers. 
> 

Not to rub salt in this, but I'd like to point out here that this strikes me as
a case of wanting cake and eating it too.  This community seems adamant against
the notion of having a fixed API for the dpdk project, yet fractures the moment
anyone tries to change something that they, or someone they are working with,
might be using.

If you want to make sure that use cases outside the scope of the project itself
stay usable, stabilize the API, and mark it with a version.  If you do this,
then you can change the API, and mark it with a new version in the link stage,
and just focus on maintaining backward compatibility.

Neil

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> Sent: Friday, May 09, 2014 10:51 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH RFC 03/11] mbuf: remove rte_ctrlmbuf
> 
> The initial role of rte_ctrlmbuf is to carry generic messages (data pointer + 
> data length) but it's not used by the DPDK or it applications.
> Keeping it implies:
>   - loosing 1 byte in the rte_mbuf structure
>   - having some dead code rte_mbuf.[ch]
> 
> This patch removes this feature. Thanks to it, it is now possible to simplify 
> the rte_mbuf structure by merging the rte_pktmbuf structure in it. This is 
> done in next commit.
> 
> Signed-off-by: Olivier Matz 
> ---
>  app/test-pmd/cmdline.c   |   1 -
>  app/test-pmd/testpmd.c   |   2 -
>  app/test-pmd/txonly.c|   2 +-
>  app/test/commands.c  |   1 -
>  app/test/test_mbuf.c |  72 +
>  examples/ipv4_multicast/main.c   |   2 +-
>  lib/librte_mbuf/rte_mbuf.c   |  65 +++-
>  lib/librte_mbuf/rte_mbuf.h   | 175 
> ++-
>  lib/librte_pmd_e1000/em_rxtx.c   |   2 +-
>  lib/librte_pmd_e1000/igb_rxtx.c  |   2 +-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c|   4 +-
>  lib/librte_pmd_virtio/virtio_rxtx.c  |   2 +-
>  lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c|   2 +-
>  lib/librte_pmd_xenvirt/rte_eth_xenvirt.c |   2 +-
>  14 files changed, 54 insertions(+), 280 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 
> 7becedc..e3d1849 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -5010,7 +5010,6 @@ dump_struct_sizes(void)  #define DUMP_SIZE(t) 
> printf("sizeof(" #t ") = %u\n", (unsigned)sizeof(t));
>   DUMP_SIZE(struct rte_mbuf);
>   DUMP_SIZE(struct rte_pktmbuf);
> - DUMP_SIZE(struct rte_ctrlmbuf);
>   DUMP_SIZE(struct rte_mempool);
>   DUMP_SIZE(struct rte_ring);
>  #undef DUMP_SIZE
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index 
> 9c56914..76b3823 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -389,13 +389,11 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
>   mb_ctor_arg = (struct mbuf_ctor_arg *) opaque_arg;
>   mb = (struct rte_mbuf *) raw_mbuf;
>  
> - mb->type = RTE_MBUF_PKT;
>   mb->pool = mp;
>   mb->buf_addr = (void *) ((char *)mb + mb_ctor_arg->seg_buf_offset);
>   mb->buf_physaddr = (uint64_t) (rte_mempool_virt2phy(mp, mb) +
>   mb_ctor_arg->seg_buf_offset);
>   mb->buf_len  = mb_ctor_arg->seg_buf_size;
> - mb->type = RTE_MBUF_PKT;
>   mb->ol_flags = 0;
>   mb->pkt.data = (char *) mb->buf_addr + RTE_PKTMBUF_HEADROOM;
>   mb->pkt.nb_segs  = 1;
> diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c index 
> 1cf2574..1f066d0 100644
> --- a/app/test-pmd/txonly.c
> +++ b/app/test-pmd/txonly.c
> @@ -93,7 +93,7 @@ tx_mbuf_alloc(struct rte_mempool *mp)
>   struct rte_mbuf *m;
>  
>   m = __rte_mbuf_raw_alloc(mp);
> - __rte_mbuf_sanity_check_raw(m, RTE_MBUF_PKT, 0);
> + __rte_mbuf_sanity_check_raw(m, 0);
>   return (m);
>  }
>  
> diff --git a/app/test/commands.c b/app/test/commands.c index b145036..c69544b 
> 100644
> --- a/app/test/commands.c
> +++ b/app/test/commands.c
> @@ -262,7 +262,6 @@ dump_struct_sizes(void)  #define DUMP_SIZE(t) 
> printf("sizeof(" #t ") = %u\n"

[dpdk-dev] [PATCH v2 0/3] Support setting TX rate for queue and VF

2014-05-26 Thread Neil Horman
On Mon, May 26, 2014 at 03:45:28PM +0800, Ouyang Changchun wrote:
> This patch v2 fixes some errors and warnings reported by checkpatch.pl.
>  
> This patch series also contain the 3 items:
> 1. Add API to support setting TX rate for a queue or a VF.
> 2. Implement the functionality of setting TX rate for queue or VF in IXGBE 
> PMD.
> 3. Add commands in testpmd to test the functionality of setting TX rate for 
> queue or VF.
> 
> Ouyang Changchun (3):
>   Add API to support set TX rate for a queue and VF.
>   Implement the functionality of setting TX rate for queue or VF in
> IXGBE PMD.
>   Add commands in testpmd to test the functionality of setting TX rate
> for queue or VF.
> 
>  app/test-pmd/cmdline.c  | 159 
> +++-
>  app/test-pmd/config.c   |  47 +++
>  app/test-pmd/testpmd.h  |   3 +
>  lib/librte_ether/rte_ethdev.c   |  71 
>  lib/librte_ether/rte_ethdev.h   |  51 
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 122 +++
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  13 ++-
>  7 files changed, 462 insertions(+), 4 deletions(-)
> 
This seems a bit backwards.  queue rate limiting is rather a generic function,
that doesn't really need to know any details about the hardware, save for its
maximum tx rate, but you're implementaiton requires that it be re-implemented
for each PMD.  Why not just export max tx rates from the PMD and write a generic
queuing libarary to do rate limitation for any PMD?

Neil

> -- 
> 1.9.0
> 
> 


[dpdk-dev] [PATCH 00/29] Packet Framework

2014-05-27 Thread Neil Horman
On Tue, May 27, 2014 at 06:09:23PM +0100, Cristian Dumitrescu wrote:
> Intel DPDK Packet Framework provides a standard methodology (logically 
> similar to OpenFlow) for rapid development of complex packet processing 
> pipelines out of ports, tables and actions.
> 
> A pipeline is constructed by connecting its input ports to its output ports 
> through a chain of lookup tables. As result of lookup operation into the 
> current table, one of the table entries (or the default table entry, in case 
> of lookup miss) is identified to provide the actions to be executed on the 
> current packet and the associated action meta-data. The behavior of user 
> actions is defined through the configurable table action handler, while the 
> reserved actions define the next hop for the current packet (either another 
> table, an output port or packet drop) and are handled transparently by the 
> framework.
> 
> Three new Intel DPDK libraries are introduced for Packet Framework: 
> librte_port, librte_table, librte_pipeline. Please check the Intel DPDK 
> Programmer's Guide for full description of the Packet Framework design.
> 
> Two sample applications are provided for Packet Framework: app/test-pipeline 
> and examples/ip_pipeline. Please check the Intel Sample Apps Guide for a 
> detailed description of how these sample apps.
> 
Isn't this at least in part functionality that OVS provides on top of DPDK?  Why
re-invent the wheel?

Neil



[dpdk-dev] [PATCH] mk: fix link with gcc

2014-05-28 Thread Neil Horman
On Tue, May 27, 2014 at 02:55:16PM +0200, Thomas Monjalon wrote:
> Some linker options were not prefixed by -Wl, when using gcc:
>   -z muldefs
>   -melf_i386 (32-bit config)
> 
> Using macro linkerprefix is fixing it.
> 
> Signed-off-by: Thomas Monjalon 
> ---
>  mk/rte.lib.mk | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
> index f5d2789..c58e68e 100644
> --- a/mk/rte.lib.mk
> +++ b/mk/rte.lib.mk
> @@ -62,6 +62,8 @@ exe2cmd = $(strip $(call dotfile,$(patsubst %,%.cmd,$(1
>  ifeq ($(LINK_USING_CC),1)
>  # Override the definition of LD here, since we're linking with CC
>  LD := $(CC)
> +LD_MULDEFS := $(call linkerprefix,-z$(comma)muldefs)
> +CPU_LDFLAGS := $(call linkerprefix,$(CPU_LDFLAGS))
>  endif
>  
Agree with Olivier, what exactly is the problem here?  Also, I don't think this
is correct, as CPU_LD_FLAGS and -z muldefs below is used in conjunction with
$LD.  It would make sense to prefix -Wl to these options if we were passing them
through $CC, but not $LD

Neil

>  O_TO_A = $(AR) crus $(LIB) $(OBJS-y)
> @@ -73,7 +75,7 @@ O_TO_A_DO = @set -e; \
>   $(O_TO_A) && \
>   echo $(O_TO_A_CMD) > $(call exe2cmd,$(@))
>  
> -O_TO_S = $(LD) $(CPU_LDFLAGS) -z muldefs -shared $(OBJS-y) -o $(LIB)
> +O_TO_S = $(LD) $(CPU_LDFLAGS) $(LD_MULDEFS) -shared $(OBJS-y) -o $(LIB)
>  O_TO_S_STR = $(subst ','\'',$(O_TO_S)) #'# fix syntax highlight
>  O_TO_S_DISP = $(if $(V),"$(O_TO_S_STR)","  LD $(@)")
>  O_TO_S_DO = @set -e; \
> @@ -89,7 +91,7 @@ O_TO_C_DO = @set -e; \
>   $(lib_dir) \
>   $(copy_obj)
>  else
> -O_TO_C = $(LD) -z muldefs -shared $(OBJS-y) -o $(LIB_ONE)
> +O_TO_C = $(LD) $(LD_MULDEFS) -shared $(OBJS-y) -o $(LIB_ONE)
>  O_TO_C_STR = $(subst ','\'',$(O_TO_C)) #'# fix syntax highlight
>  O_TO_C_DISP = $(if $(V),"$(O_TO_C_STR)","  LD_C $(@)")
>  O_TO_C_DO = @set -e; \
> -- 
> 1.9.2
> 
> 


[dpdk-dev] [PATCH 0/4] Link Bonding Library

2014-05-28 Thread Neil Horman
On Wed, May 28, 2014 at 04:32:00PM +0100, declan.doherty at intel.com wrote:
> From: Declan Doherty 
> 
> Initial release of Link Bonding Library (lib/librte_bond) with support for 
> bonding modes :
>  0 - Round Robin
>  1 - Active Backup
>  2 - Balance l2 / l23 / l34 
>  3 - Broadcast
> 
Why make this a separate library?  That requires exposure of yet another API to
applications.  Instead, why not write a PMD that can enslave other PMD's and
treat them all as a single interface?  That way this all works with the existing
API.

Neil

> patches split:
>  1 - library + makefile changes
>  2 - Unit test suite, including code to generate packet bursts for
> testing rx and tx functionality of bonded device and a
> virtual/stubbed out ethdev for use as slave ethdev in testing
>  3 - Link bonding integration into testpmd, including :
>  - Includes the ability to  create new bonded devices.
>  - Add /remove bonding slave devices. 
>  - Interogate bonded device stats/configuration
>  - Change bonding modes and select balance transmit polices
>  4 - Add Link Bonding Library to Doxygen
> 
> 
>  app/test-pmd/cmdline.c|  550 +
>  app/test-pmd/parameters.c |4 +-
>  app/test-pmd/testpmd.c|   28 +-
>  app/test-pmd/testpmd.h|2 +
>  app/test/Makefile |3 +
>  app/test/commands.c   |3 +
>  app/test/packet_burst_generator.c |  276 +++
>  app/test/packet_burst_generator.h |   85 +
>  app/test/test.h   |1 +
>  app/test/test_link_bonding.c  | 4007 
> +
>  app/test/virtual_pmd.c|  580 ++
>  app/test/virtual_pmd.h|   74 +
>  config/common_bsdapp  |5 +
>  config/common_linuxapp|5 +
>  doc/doxy-api-index.md |1 +
>  doc/doxy-api.conf |1 +
>  lib/Makefile  |1 +
>  lib/librte_bond/Makefile  |   28 +
>  lib/librte_bond/rte_bond.c| 1679 
>  lib/librte_bond/rte_bond.h|  228 +++
>  mk/rte.app.mk |5 +
>  21 files changed, 7564 insertions(+), 2 deletions(-)
>  create mode 100644 app/test/packet_burst_generator.c
>  create mode 100644 app/test/packet_burst_generator.h
>  create mode 100644 app/test/test_link_bonding.c
>  create mode 100644 app/test/virtual_pmd.c
>  create mode 100644 app/test/virtual_pmd.h
>  create mode 100644 lib/librte_bond/Makefile
>  create mode 100644 lib/librte_bond/rte_bond.c
>  create mode 100644 lib/librte_bond/rte_bond.h
> 
> -- 
> 1.8.5.3
> 
> 


[dpdk-dev] [PATCH] mk: fix link with gcc

2014-05-29 Thread Neil Horman
On Thu, May 29, 2014 at 08:24:56AM +0200, Thomas Monjalon wrote:
> Hi Neil,
> 
> 2014-05-28 10:17, Neil Horman:
> > On Tue, May 27, 2014 at 02:55:16PM +0200, Thomas Monjalon wrote:
> > > Some linker options were not prefixed by -Wl, when using gcc:
> > >   -z muldefs
> > >   -melf_i386 (32-bit config)
> > > 
> > > Using macro linkerprefix is fixing it.
> > > 
> > > Signed-off-by: Thomas Monjalon 
> [...]
> > >  ifeq ($(LINK_USING_CC),1)
> > >  # Override the definition of LD here, since we're linking with CC
> > >  LD := $(CC)
> > > 
> > > +LD_MULDEFS := $(call linkerprefix,-z$(comma)muldefs)
> > > +CPU_LDFLAGS := $(call linkerprefix,$(CPU_LDFLAGS))
> > 
> > Agree with Olivier, what exactly is the problem here?
> 
> When using CC as LD, linker options should be prefixed with -Wl.
> 
> > Also, I don't think
> > this is correct, as CPU_LD_FLAGS and -z muldefs below is used in
> > conjunction with $LD.  It would make sense to prefix -Wl to these options
> > if we were passing them through $CC, but not $LD
> 
> Yes, but options are prefixed only in the case LD = CC.
> 
ah, sorry, was looking at the wrong clause in rte.lib.mk

> Neil, this situation is funny as you're the author of the patch making LD as 
> CC and you submitted this kind of fix to prefix CPU_LDFLAGS :)
> This patch is a translation of yours with use of macro linkerprefix.
> 
Yeah, my fault, I was only looking at the lines changed, rather than the context
surrounding them.  This makes sense.
Neil

> --  
> Thomas
> 


[dpdk-dev] [PATCH 0/4] Link Bonding Library

2014-05-29 Thread Neil Horman
On Thu, May 29, 2014 at 10:33:00AM +, Doherty, Declan wrote:
> -Original Message-
> > From: Neil Horman [mailto:nhorman at tuxdriver.com] 
> > Sent: Wednesday, May 28, 2014 6:49 PM
> > To: Doherty, Declan
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 0/4] Link Bonding Library
> >
> > On Wed, May 28, 2014 at 04:32:00PM +0100, declan.doherty at intel.com wrote:
> > > From: Declan Doherty 
> > > 
> > > Initial release of Link Bonding Library (lib/librte_bond) with support 
> > > for bonding modes :
> > >  0 - Round Robin
> > >  1 - Active Backup
> > >  2 - Balance l2 / l23 / l34
> > >  3 - Broadcast
> > > 
> > Why make this a separate library?  That requires exposure of yet another 
> > API to applications.  Instead, why > not write a PMD that can enslave other 
> > PMD's and treat them all as a single interface?  That way this all >  > 
> > works with the existing API.
> >
> > Neil
> 
> Hi Neil,
> the link bonding device is essentially a software PMD, and as such supports 
> all the standard PMD APIs, the only new APIs which the link bonding library 
> introduces  are for the control operations of the bonded device which are 
> currently unsupported by the standard PMD API. Operations such as creating, 
> adding/removing slaves, and configuring the modes of operation of the device 
> have no analogous APIs in the current PMD API and required new ones to be 
> created .

Thats really only true in spirit, in the sense that this library transmits and
receives frames like a PMD does.  In practice it doesn't work and isn't
architected the same way.  You don't register the driver using the same method
as the other PMDs, which means that using --vdev on the command line wont work
for this type of device.  It also implies that applications have to be made
specifically aware of the fact that they are using a bonded interface (i.e. they
need to call the bonding setup routines to create the bond).  I would recommend:

1) Register the pmd using the PMD_DRIVER_REGISTER macro, like other PMD's
2) Use the kvargs library to support configuration via the --vdev command line
option, so bonds can be created administratively, rather than just
programatically
3) Separate the command api from the PMD functionality into separate libraries
(use control mbufs to communicate configuration changes to the pmd).  This will
allow users to dynamically load the pmd functionality (without compile or run
time linking requirements), and then optionally use the programatic interface
(or not if they want to save memory)

Regards
Neil

> 
> Declan
> --
> Intel Shannon Limited
> Registered in Ireland
> Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
> Registered Number: 308263
> Business address: Dromore House, East Park, Shannon, Co. Clare
> 
> This e-mail and any attachments may contain confidential material for the 
> sole use of the intended recipient(s). Any review or distribution by others 
> is strictly prohibited. If you are not the intended recipient, please contact 
> the sender and delete all copies.
> 
> 
> 


[dpdk-dev] [PATCH v2 2/5] distributor: new packet distributor library

2014-05-29 Thread Neil Horman
> +
> +/* flush the distributor, so that there are no outstanding packets in flight 
> or
> + * queued up. */
Its not clear to me that this is a distributor only function.  You modified the
comments to indicate that lcores can't preform double duty as both a worker and
a distributor, which is fine, but it implies that there is a clear distinction
between functions that are 'worker' functions and 'distributor' functions.
While its for the most part clear-ish (workers call rte_distributor_get_pkt and
rte_distibutor_return_pkt, distibutors calls rte_distributor_create/process.
This is in a grey area.  the analogy I'm thinking of here are kernel workqueues.
Theres a specific workqueue thread that processes the workqueue, but any process
can sync or flush the workqueue, leading me to think this process can be called
by a worker lcore.

> +int
> +rte_distributor_flush(struct rte_distributor *d)
> +{
> + unsigned wkr, total_outstanding = 0;
> + unsigned flushed = 0;
> + unsigned ret_start = d->returns.start,
> + ret_count = d->returns.count;
> +
> + for (wkr = 0; wkr < d->num_workers; wkr++)
> + total_outstanding += d->backlog[wkr].count +
> + !!(d->in_flight_tags[wkr]);
> +
> + wkr = 0;
> + while (flushed < total_outstanding) {
> +
> + if (d->in_flight_tags[wkr] != 0 || d->backlog[wkr].count) {
> + const int64_t data = d->bufs[wkr].bufptr64;
> + uintptr_t oldbuf = 0;
> +
> + if (data & RTE_DISTRIB_GET_BUF) {
> + flushed += (d->in_flight_tags[wkr] != 0);
> + if (d->backlog[wkr].count) {
> + d->bufs[wkr].bufptr64 =
> + backlog_pop(&d->backlog[wkr]);
> + /* we need to mark something as being
> +  * in-flight, but it doesn't matter what
> +  * as we never check it except
> +  * to check for non-zero.
> +  */
> + d->in_flight_tags[wkr] = 1;
> + } else {
> + d->bufs[wkr].bufptr64 =
> + RTE_DISTRIB_GET_BUF;
> + d->in_flight_tags[wkr] = 0;
> + }
> + oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
> + } else if (data & RTE_DISTRIB_RETURN_BUF) {
> + if (d->backlog[wkr].count == 0 ||
> + move_backlog(d, wkr) == 0) {
> + /* only if we move backlog,
> +  * process this packet */
> + d->bufs[wkr].bufptr64 = 0;
> + oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
> + flushed++;
> + d->in_flight_tags[wkr] = 0;
> + }
> + }
> +
> + store_return(oldbuf, d, &ret_start, &ret_count);
> + }
> +
I know the comments for move_backlog say you use that function here rather than
what you do in distributor_process because you're tracking the flush count here.
That said, if you instead recomputed the total_outstanding count on each loop
iteration, and tested it for 0, I think you could just reduce the flush
operation to a looping call to rte_distributor_process.  It would save you
having to maintain the flush code and the move_backlog code separately, which
would be a nice savings.

> + if (++wkr == d->num_workers)
> + wkr = 0;
Nit: wkr = ++wkr % d->num_workers avoids the additional branch in your loop


Regards
Neil



[dpdk-dev] DPDK Community Conference Call - Friday 31st October

2014-11-01 Thread Neil Horman
On Fri, Oct 31, 2014 at 05:36:59PM +, O'driscoll, Tim wrote:
> Thanks again to those who attended the call earlier. Hopefully people found 
> it useful.
> 
> We'll schedule a follow-up call for 2 weeks' time. One thing that we do want 
> to look into is an easy way to allow screen sharing, so that we can use some 
> slides to guide the discussion. Internally within Intel we use MS Lync. We 
> can try that, but it's not always very user-friendly for external 
> participants, and doesn't have a Linux client. Other options would include 
> GoToMeeting or WebEx. If anybody has input on a good tool for this, let me 
> know.
> 
Don't use Lync, its a horrid tool.  As you note, there is no linux client (nor a
mac client that I'm aware of).  Google hangouts offers screen sharing, and is
free for anyone to use.

> We covered the following features from our 2.0 list today, and will discuss 
> the remainder on the next call. I've called out below who on our side was 
> describing each of the features, and included their email addresses. If 
> anybody has further questions on these, feel free to either ask openly on the 
> mailing list, or else contact the relevant person directly.
> 
> Bifurcated Driver (Danny.Zhou at intel.com)
> Packet Reordering/Packet Distributor (Bruce.Richardson at intel.com)
> New Hardware Support (Walter.E.Gilmore at intel.com)
> Fortville features (Heqing.Zhu at intel.com)
> Support Multiple Threads per Core (Venky.Venkatesan at intel.com)
> Cuckoo Hash (Bruce.Richardson at intel.com, Venky.Venkatesan at intel.com)
> 
> The Cuckoo Hash paper that was mentioned is available at: 
> http://www.cs.cmu.edu/~dongz/papers/cuckooswitch.pdf.
> 
> Finally, if anybody has suggestions for topics for future calls, please let 
> me know.
> 
ABI versioning, in support of packaging the DPDK for non-rebuilding users.

Neil

> 
> Thanks,
> Tim
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of O'driscoll, Tim
> > Sent: Friday, October 31, 2014 3:35 PM
> > To: 'dev at dpdk.org'
> > Subject: Re: [dpdk-dev] DPDK Community Conference Call - Friday 31st
> > October
> > 
> > This is just a reminder for anybody who's interested that this will be on 
> > in 30
> > minutes, and that we'll be discussing the feature list for the DPDK 2.0 
> > release
> > in March 2015.
> > 
> > Audio bridge details are:
> > France: +33 1588 77298
> > Germany:+49 8999 143191
> > Israel: +972 2589 6577
> > Russia: +7 495 641 4663
> > UK: +44 1793 402663
> > USA/Canada: +1 916 356 2663 or +1-888-875-9370
> > 
> > Bridge: 5
> > Conference ID: 1264677285
> > 
> > 
> > Tim
> > 
> > > -Original Message-
> > > From: O'driscoll, Tim
> > > Sent: Friday, October 24, 2014 10:22 AM
> > > To: dev at dpdk.org
> > > Subject: DPDK Community Conference Call - Friday 31st October
> > >
> > > We're planning to hold our first community conference call on Friday
> > > 31st October. It's impossible to find a time that suits everybody, so
> > > we've chosen to do this in the afternoon/evening in Europe, which is
> > > the morning in the USA. This does unfortunately limit participation
> > > from PRC, Japan and other parts of the world. Here's the time and date in 
> > > a
> > variety of time zones:
> > >
> > > Dublin (Ireland)  Friday, October 31, 2014 at
> > > 4:00:00 PMGMT UTC
> > > Paris (France)Friday, October 31, 2014 at 
> > > 5:00:00
> > > PMCET UTC+1 hour
> > > San Francisco (U.S.A. - California)   Friday, October 31, 2014 at 
> > > 9:00:00
> > > AMPDT UTC-7 hours
> > > New York (U.S.A. - New York)  Friday, October 31, 2014 at
> > 12:00:00
> > > Noon EDT UTC-4 hours
> > > Tel Aviv (Israel) Friday, October 31, 2014 at
> > 6:00:00
> > > PMIST UTC+2 hours
> > > Moscow (Russia)   Friday, October 31, 2014 at 7:00:00
> > > PMMSK UTC+3 hours
> > >
> > >
> > > Audio bridge details are:
> > > France:   +33 1588 77298
> > > Germany:  +49 8999 143191
> > > Israel:   +972 2589 6577
> > > Russia:   +7 495 641 4663
> > > UK:   +44 1793 402663
> > > USA:  +1 916 356 2663
> > >
> > > Bridge: 5
> > > Conference ID: 1264677285
> > >
> > > If anybody needs an access number for another country, let me know.
> > >
> > >
> > > Agenda:
> > > Discuss feature list for DPDK 2.0 (Q1 2015).
> > > Suggestions for topics for future calls.
> > >
> > >
> > > Thanks,
> > > Tim
> 


[dpdk-dev] [PATCH] Add external parser support for unknown commands.

2014-11-03 Thread Neil Horman
On Mon, Nov 03, 2014 at 02:25:51PM +, Wiles, Roger Keith wrote:
> 
> > On Nov 3, 2014, at 8:16 AM, Bruce Richardson  > intel.com> wrote:
> > 
> > On Mon, Nov 03, 2014 at 02:08:46PM +, Wiles, Roger Keith wrote:
> >> 
> >>> On Nov 3, 2014, at 4:41 AM, Bruce Richardson  >>> intel.com> wrote:
> >>> 
> >>> On Sun, Nov 02, 2014 at 04:28:28PM -0600, Keith Wiles wrote:
>  Allow for a external parser to handle the command line if the
>  command is not found and the developer has called the routine
>  int cmdline_set_external_parser(struct cmdline * cl,
>    cmdline_external_parser_t parser);
>  function to set the function pointer.
>  
>  The function for the external parser function should return 
>  CMDLINE_PARSE_NOMATCH
>  if not able to match the command requested or zero is handled.
>  
>  Prototype of external routine:
>  int (*cmdline_external_parser_t)(struct cmdline * cl, const char * buy);
>  
>  Signed-off-by: Keith Wiles 
> >>> 
> >>> Hi Keith,
> >>> 
> >>> what is the expected use case for this? Is it for embedding other 
> >>> programming languages alongside the existing DPDK command-line or some 
> >>> other purpose? [Perhaps the use case could be called out in the patch 
> >>> description]
> >> 
> >> Hi Bruce,
> >> 
> >> I guess the external parser could be used for other programming languages, 
> >> but the case I was looking at was to provide a default escape from the 
> >> command line parser to allow my application to handle the commands not 
> >> understood by the parser. Now that you point it out I could use something 
> >> like ?%? to execute a single line of script code, 
> >> which is a good idea (thanks).
> >> 
> >> One case I am looking at is when you want to execute a command and do not 
> >> want to add the support into the commands.c file for every possible 
> >> command. Take the case where you have a bunch of scripts (Lua) in a 
> >> directory much like a bin directory. Then you could type foo.lua or foo on 
> >> the command line and execute the foo.lua having the application detect you 
> >> want to load and run a Lua script after it has finished parsing for the 
> >> builtin commands.
> >> 
> >> For Pktgen I had to add a command called ?run  ? to 
> >> support running a script with arguments. I also needed to add a argvlist 
> >> type to cmdline to not error out on that command and split up the args 
> >> into a argv list like format. (Maybe I need to submit that code??) It 
> >> seemed more straight forward to just pass the command line to the 
> >> application to run the command. I understand that seems like a minor 
> >> point, but it does make it easier to use and to support the features I 
> >> want to support in my PoC.
> >> 
> >> Using this method you can just type the name instead of something like 
> >> ?run foo.lua? or just ?run foo? and let the code figure out what to run. I 
> >> have more plans for this features as well and have not finished the basic 
> >> PoC yet. If you want a peek I can show you what I am working on currently.
> >> 
> >> Does this help and do I really need to add all of this to the commit 
> >> message :-)
> >> 
> > Thanks for the explanation. However, if you are looking to have the 
> > application handle a bunch of commands itself, why does it need to use the 
> > commandline library at all? Why not just have the app handle all the 
> > commands instead of some of them?
> 
> I guess that would be reasonable, but then I would have to add support for 
> all of the command line parsing being done in the cmdline code. Think of this 
> as a default case for the parser and to me that makes more sense then just 
> doing my own command line design. In the cmdline code you guys provided is a 
> lot of features like history, control key support, arg parsing (IP, MAC) and 
> many others. I would rather not have to write that code myself.
> 
> The default case is the same behavior today, with giving a no match error 
> unless they add the external parser.

It seems alot simpler than that to me.  Looking at the test applications, the
command line parser expects the application to create an array of
cmdline_parse_ctx_t structures to support new option parsing.  If your goal is
to support other languages, it seems to make more sense to just use foreign
language bindings to merge your coding language support with the DPDK
(ostensibly you will already have to do that if you want to use other parts of
the DPDK).

Am I missing something?
Neil


> > 
> > /Bruce
> 
> Keith Wiles, Principal Technologist with CTO office, Wind River mobile 
> 972-213-5533
> 


[dpdk-dev] [PATCH] Add external parser support for unknown commands.

2014-11-03 Thread Neil Horman
On Mon, Nov 03, 2014 at 03:26:50PM -0800, Stephen Hemminger wrote:
> On Mon, 3 Nov 2014 16:50:15 +
> "Wiles, Roger Keith"  wrote:
> 
> > 
> > > On Nov 3, 2014, at 10:06 AM, Neil Horman  wrote:
> > > 
> > > On Mon, Nov 03, 2014 at 02:25:51PM +, Wiles, Roger Keith wrote:
> > >> 
> > >>> On Nov 3, 2014, at 8:16 AM, Bruce Richardson  > >>> intel.com> wrote:
> > >>> 
> > >>> On Mon, Nov 03, 2014 at 02:08:46PM +, Wiles, Roger Keith wrote:
> > >>>> 
> > >>>>> On Nov 3, 2014, at 4:41 AM, Bruce Richardson  > >>>>> intel.com> wrote:
> > >>>>> 
> > >>>>> On Sun, Nov 02, 2014 at 04:28:28PM -0600, Keith Wiles wrote:
> > >>>>>> Allow for a external parser to handle the command line if the
> > >>>>>> command is not found and the developer has called the routine
> > >>>>>> int cmdline_set_external_parser(struct cmdline * cl,
> > >>>>>>  cmdline_external_parser_t parser);
> > >>>>>> function to set the function pointer.
> > >>>>>> 
> > >>>>>> The function for the external parser function should return 
> > >>>>>> CMDLINE_PARSE_NOMATCH
> > >>>>>> if not able to match the command requested or zero is handled.
> > >>>>>> 
> > >>>>>> Prototype of external routine:
> > >>>>>> int (*cmdline_external_parser_t)(struct cmdline * cl, const char * 
> > >>>>>> buy);
> > >>>>>> 
> > >>>>>> Signed-off-by: Keith Wiles 
> > >>>>> 
> > >>>>> Hi Keith,
> > >>>>> 
> > >>>>> what is the expected use case for this? Is it for embedding other 
> > >>>>> programming languages alongside the existing DPDK command-line or 
> > >>>>> some other purpose? [Perhaps the use case could be called out in the 
> > >>>>> patch description]
> > >>>> 
> > >>>> Hi Bruce,
> > >>>> 
> > >>>> I guess the external parser could be used for other programming 
> > >>>> languages, but the case I was looking at was to provide a default 
> > >>>> escape from the command line parser to allow my application to handle 
> > >>>> the commands not understood by the parser. Now that you point it out I 
> > >>>> could use something like ?%? to execute a single 
> > >>>> line of script code, which is a good idea (thanks).
> > >>>> 
> > >>>> One case I am looking at is when you want to execute a command and do 
> > >>>> not want to add the support into the commands.c file for every 
> > >>>> possible command. Take the case where you have a bunch of scripts 
> > >>>> (Lua) in a directory much like a bin directory. Then you could type 
> > >>>> foo.lua or foo on the command line and execute the foo.lua having the 
> > >>>> application detect you want to load and run a Lua script after it has 
> > >>>> finished parsing for the builtin commands.
> > >>>> 
> > >>>> For Pktgen I had to add a command called ?run  ? to 
> > >>>> support running a script with arguments. I also needed to add a 
> > >>>> argvlist type to cmdline to not error out on that command and split up 
> > >>>> the args into a argv list like format. (Maybe I need to submit that 
> > >>>> code??) It seemed more straight forward to just pass the command line 
> > >>>> to the application to run the command. I understand that seems like a 
> > >>>> minor point, but it does make it easier to use and to support the 
> > >>>> features I want to support in my PoC.
> > >>>> 
> > >>>> Using this method you can just type the name instead of something like 
> > >>>> ?run foo.lua? or just ?run foo? and let the code figure out what to 
> > >>>> run. I have more plans for this features as well and have not finished 
> > >>>> the basic PoC yet. If you want a peek I can show you what I am working 
> > >>>> on currently.
> > >>>> 
> > >>>> Does this help and do I really

[dpdk-dev] [PATCH] Add external parser support for unknown commands.

2014-11-04 Thread Neil Horman
On Tue, Nov 04, 2014 at 04:52:48AM +, Wiles, Roger Keith wrote:
> 
> > On Nov 3, 2014, at 5:42 PM, Neil Horman  wrote:
> > 
> > On Mon, Nov 03, 2014 at 03:26:50PM -0800, Stephen Hemminger wrote:
> >> On Mon, 3 Nov 2014 16:50:15 +
> >> "Wiles, Roger Keith"  wrote:
> >> 
> >>> 
> >>>> On Nov 3, 2014, at 10:06 AM, Neil Horman  
> >>>> wrote:
> >>>> 
> >>>> On Mon, Nov 03, 2014 at 02:25:51PM +, Wiles, Roger Keith wrote:
> >>>>> 
> >>>>>> On Nov 3, 2014, at 8:16 AM, Bruce Richardson  >>>>>> intel.com> wrote:
> >>>>>> 
> >>>>>> On Mon, Nov 03, 2014 at 02:08:46PM +, Wiles, Roger Keith wrote:
> >>>>>>> 
> >>>>>>>> On Nov 3, 2014, at 4:41 AM, Bruce Richardson  >>>>>>>> intel.com> wrote:
> >>>>>>>> 
> >>>>>>>> On Sun, Nov 02, 2014 at 04:28:28PM -0600, Keith Wiles wrote:
> >>>>>>>>> Allow for a external parser to handle the command line if the
> >>>>>>>>> command is not found and the developer has called the routine
> >>>>>>>>> int cmdline_set_external_parser(struct cmdline * cl,
> >>>>>>>>> cmdline_external_parser_t parser);
> >>>>>>>>> function to set the function pointer.
> >>>>>>>>> 
> >>>>>>>>> The function for the external parser function should return 
> >>>>>>>>> CMDLINE_PARSE_NOMATCH
> >>>>>>>>> if not able to match the command requested or zero is handled.
> >>>>>>>>> 
> >>>>>>>>> Prototype of external routine:
> >>>>>>>>> int (*cmdline_external_parser_t)(struct cmdline * cl, const char * 
> >>>>>>>>> buy);
> >>>>>>>>> 
> >>>>>>>>> Signed-off-by: Keith Wiles 
> >>>>>>>> 
> >>>>>>>> Hi Keith,
> >>>>>>>> 
> >>>>>>>> what is the expected use case for this? Is it for embedding other 
> >>>>>>>> programming languages alongside the existing DPDK command-line or 
> >>>>>>>> some other purpose? [Perhaps the use case could be called out in the 
> >>>>>>>> patch description]
> >>>>>>> 
> >>>>>>> Hi Bruce,
> >>>>>>> 
> >>>>>>> I guess the external parser could be used for other programming 
> >>>>>>> languages, but the case I was looking at was to provide a default 
> >>>>>>> escape from the command line parser to allow my application to handle 
> >>>>>>> the commands not understood by the parser. Now that you point it out 
> >>>>>>> I could use something like ?%? to execute a 
> >>>>>>> single line of script code, which is a good idea (thanks).
> >>>>>>> 
> >>>>>>> One case I am looking at is when you want to execute a command and do 
> >>>>>>> not want to add the support into the commands.c file for every 
> >>>>>>> possible command. Take the case where you have a bunch of scripts 
> >>>>>>> (Lua) in a directory much like a bin directory. Then you could type 
> >>>>>>> foo.lua or foo on the command line and execute the foo.lua having the 
> >>>>>>> application detect you want to load and run a Lua script after it has 
> >>>>>>> finished parsing for the builtin commands.
> >>>>>>> 
> >>>>>>> For Pktgen I had to add a command called ?run  ? to 
> >>>>>>> support running a script with arguments. I also needed to add a 
> >>>>>>> argvlist type to cmdline to not error out on that command and split 
> >>>>>>> up the args into a argv list like format. (Maybe I need to submit 
> >>>>>>> that code??) It seemed more straight forward to just pass the command 
> >>>>>>> line to the application to run the command. I understand that seems 
> >>>>>>> like a minor point, but it does make it easier to use and to s

[dpdk-dev] [PATCH] Add external parser support for unknown commands.

2014-11-04 Thread Neil Horman
On Tue, Nov 04, 2014 at 02:44:39PM +, Wiles, Roger Keith wrote:
> 
> > On Nov 4, 2014, at 5:27 AM, Neil Horman  wrote:
> > 
> > On Tue, Nov 04, 2014 at 04:52:48AM +, Wiles, Roger Keith wrote:
> >> 
> >>> On Nov 3, 2014, at 5:42 PM, Neil Horman  wrote:
> >>> 
> >>> On Mon, Nov 03, 2014 at 03:26:50PM -0800, Stephen Hemminger wrote:
> >>>> On Mon, 3 Nov 2014 16:50:15 +
> >>>> "Wiles, Roger Keith"  wrote:
> >>>> 
> >>>>> 
> >>>>>> On Nov 3, 2014, at 10:06 AM, Neil Horman  
> >>>>>> wrote:
> >>>>>> 
> >>>>>> On Mon, Nov 03, 2014 at 02:25:51PM +, Wiles, Roger Keith wrote:
> >>>>>>> 
> >>>>>>>> On Nov 3, 2014, at 8:16 AM, Bruce Richardson  >>>>>>>> intel.com> wrote:
> >>>>>>>> 
> >>>>>>>> On Mon, Nov 03, 2014 at 02:08:46PM +, Wiles, Roger Keith wrote:
> >>>>>>>>> 
> >>>>>>>>>> On Nov 3, 2014, at 4:41 AM, Bruce Richardson  >>>>>>>>>> intel.com> wrote:
> >>>>>>>>>> 
> >>>>>>>>>> On Sun, Nov 02, 2014 at 04:28:28PM -0600, Keith Wiles wrote:
> >>>>>>>>>>> Allow for a external parser to handle the command line if the
> >>>>>>>>>>> command is not found and the developer has called the routine
> >>>>>>>>>>> int cmdline_set_external_parser(struct cmdline * cl,
> >>>>>>>>>>>cmdline_external_parser_t parser);
> >>>>>>>>>>> function to set the function pointer.
> >>>>>>>>>>> 
> >>>>>>>>>>> The function for the external parser function should return 
> >>>>>>>>>>> CMDLINE_PARSE_NOMATCH
> >>>>>>>>>>> if not able to match the command requested or zero is handled.
> >>>>>>>>>>> 
> >>>>>>>>>>> Prototype of external routine:
> >>>>>>>>>>> int (*cmdline_external_parser_t)(struct cmdline * cl, const char 
> >>>>>>>>>>> * buy);
> >>>>>>>>>>> 
> >>>>>>>>>>> Signed-off-by: Keith Wiles 
> >>>>>>>>>> 
> >>>>>>>>>> Hi Keith,
> >>>>>>>>>> 
> >>>>>>>>>> what is the expected use case for this? Is it for embedding other 
> >>>>>>>>>> programming languages alongside the existing DPDK command-line or 
> >>>>>>>>>> some other purpose? [Perhaps the use case could be called out in 
> >>>>>>>>>> the patch description]
> >>>>>>>>> 
> >>>>>>>>> Hi Bruce,
> >>>>>>>>> 
> >>>>>>>>> I guess the external parser could be used for other programming 
> >>>>>>>>> languages, but the case I was looking at was to provide a default 
> >>>>>>>>> escape from the command line parser to allow my application to 
> >>>>>>>>> handle the commands not understood by the parser. Now that you 
> >>>>>>>>> point it out I could use something like ?%? to 
> >>>>>>>>> execute a single line of script code, which is a good idea (thanks).
> >>>>>>>>> 
> >>>>>>>>> One case I am looking at is when you want to execute a command and 
> >>>>>>>>> do not want to add the support into the commands.c file for every 
> >>>>>>>>> possible command. Take the case where you have a bunch of scripts 
> >>>>>>>>> (Lua) in a directory much like a bin directory. Then you could type 
> >>>>>>>>> foo.lua or foo on the command line and execute the foo.lua having 
> >>>>>>>>> the application detect you want to load and run a Lua script after 
> >>>>>>>>> it has finished parsing for the builtin commands.
> >>>>>>>>> 
> >>>>>>>>> F

[dpdk-dev] [PATCH] Add external parser support for unknown commands.

2014-11-05 Thread Neil Horman
On Tue, Nov 04, 2014 at 08:45:53PM +, Wiles, Roger Keith wrote:
> 
> >> 
> > Can you provide a real example here?  usnig vague terms like "huge" really 
> > makes
> > more of an emotional argument than a factual one.  To cite an example the
> > cmdline_test program adds a command line paramter that just lets you parse a
> > number out of the command line.  Silly, granted, but it serves the purpose. 
> >  Its
> > called cmd_num, and with functions and data all told, it looks like it takes
> > about 17 lines of code.  Now thats more than what you're adding with you 
> > patch,
> > I grant you, but I assert that the "potentially huge" argument you're making
> > above is false, especially whan you consider that some reasonably clever 
> > coding
> > can likely allow you to reuse function parsing fairly easily.
> 
> I think I gave you an example below, but here is one I am looking at now.
> 
This was the level I was looking for, thank you.

> I have a number of programs and scripts in a bin directory, one of the 
> reasons for these programs is to be able to extend the application without 
> having to rebuild the application or adding special parsing of the arguments 
> for just the one program. In my case the scripts can parse the arguments as 
> it already has the support builtin and having cmdline do any processing of 
> the commands is redundant.
> 
> The programs being loaded are shared objects via dlopen() and already have 
> its own parsing code and trying to parse an IP address in cmdline to a binary 
> format would just require me to convert it back to a string to pass that 
> string in a argc/argv format. The project I am doing now is building a 
> dpdk-shell like environment, which starts up DPDK as normal then stops at a 
> command prompt.
> 
So, just so that I'm clear, you're taking example applications from the DPDK,
rebuilding them as shared objects (which they are not currently setup to be
built as), and then running them from a program of your authorship, by loading
them (the DPDK example applications), via dlopen  and calling some arbitrary
function(s) within them.  Is that correct?  If so, that seemsodd.  Commonly
programs that extend other programs without modifying them use fork to establish
a parent/child relationship, and then use the parent to properly control the
child.  GDB is a great example here, one which handles your command line
predicament by adding a command to the gdb console to build the child command
line.  Theres no reason you can't do the same thing, and that requires even less
code than what your proposing here.  If you wanted it to be non-interactive, its
just as easy to add a command line option to your application parser that is:

--child-options=""

which passes a complete command line to the dpdk application without having to
convolute the work of two parsers.

> In the command prompt I am able to load and execute application like l2fwd or 
> l3fwd built as a shared object. Plus I still have a command prompt from the 
> dpdk-sh to launch other applications or look at stats or debug the 
> application. I am still refining the details, but being able to launch a 
> ?dpdk-sh? like system then be able to execute/debug/view stats and anything 
> else one can think of is a very reasonable use. Using this method the user 
> commands are simple and easy to remember, plus someone can build a new 
> application to debug without rebuilding DPDK.
> 
This sounds even more like the GDB example above.  Its set args command should
be your guide to a good implementation.

> I can see this type of environment as a cleaner way for new users to 
> understand DPDK and start playing with the system. I want to add support to 
> run multiple applications at the same time or at least be able to grab stats 
> and information from the application and/or DPDK without someone having to 
> add that support to his application. It is possible with some changes to 
> remove the cmdline parsing from the l3fwd application by adding the basic 
> commands into the dpdk-sh environment. ** This does not mean I am forcing 
> cmdline on applications or developers they can still use DPDK without cmdline 
> or any other feature.
> 
Sounding more and more like a GDB type of environment...

> Plus I can now execute any function within the application of loaded modules 
> or DPDK by doing a symbol lookup and call the function.
> 
Like GDB

> I am trying to build the dpdk-sh with very little modifications to DPDK and 
> as another application similar to Pktgen. Today the examples directory has 
> some great example code all developed by Intel and I would like to start 
> seeing other applications (like Pktgen) contributed to the community. It 
> would be even better (with some effort) to rewrite the examples directory to 
> use dpdk-sh instead or as well.

Then you should definately use the GDB model, as that requires zero changes to
anything in the DPDK.

> > 
> >> Lets say you have a directory on the

[dpdk-dev] [PATCH 1/2] lib/librte_vhost: code style fixes

2014-11-06 Thread Neil Horman
On Thu, Nov 06, 2014 at 07:31:41AM +0800, Huawei Xie wrote:
> fixes alignment issues, lengthy lines, misordered type and other coding style 
> issues.
> 
> Signed-off-by: Huawei Xie 
> ---
>  lib/librte_vhost/eventfd_link/eventfd_link.c | 244 ++---
>  lib/librte_vhost/eventfd_link/eventfd_link.h | 127 ++-
>  lib/librte_vhost/rte_virtio_net.h|   3 +-
>  lib/librte_vhost/vhost-net-cdev.c| 187 +---
>  lib/librte_vhost/vhost_rxtx.c|  13 +-
>  lib/librte_vhost/virtio-net.c| 317 
> +--
>  6 files changed, 494 insertions(+), 397 deletions(-)
> 
Acked-by: Neil Horman 



[dpdk-dev] [PATCH 2/2] lib/librte_vhost: printk->pr_debug

2014-11-06 Thread Neil Horman
On Thu, Nov 06, 2014 at 07:31:42AM +0800, Huawei Xie wrote:
> printk -> pr_debug
> 
> Signed-off-by: Huawei Xie 
> ---
>  lib/librte_vhost/eventfd_link/eventfd_link.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/librte_vhost/eventfd_link/eventfd_link.c 
> b/lib/librte_vhost/eventfd_link/eventfd_link.c
> index 542ec2c..7755dd6 100644
> --- a/lib/librte_vhost/eventfd_link/eventfd_link.c
> +++ b/lib/librte_vhost/eventfd_link/eventfd_link.c
> @@ -88,13 +88,13 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, 
> unsigned long arg)
>   task_target =
>   pid_task(find_vpid(eventfd_copy.target_pid), 
> PIDTYPE_PID);
>   if (task_target == NULL) {
> - printk(KERN_DEBUG "Failed to get mem ctx for target 
> pid\n");
> + pr_debug("Failed to get mem ctx for target pid\n");
>   return -EFAULT;
>   }
>  
>   files = get_files_struct(current);
>   if (files == NULL) {
> - printk(KERN_DEBUG "Failed to get files struct\n");
> + pr_debug("Failed to get files struct\n");
>   return -EFAULT;
>   }
>  
> @@ -109,7 +109,7 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, 
> unsigned long arg)
>   put_files_struct(files);
>  
>   if (file == NULL) {
> - printk(KERN_DEBUG "Failed to get file from source 
> pid\n");
> + pr_debug("Failed to get file from source pid\n");
>   return 0;
>   }
>  
> @@ -128,7 +128,7 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, 
> unsigned long arg)
>  
>   files = get_files_struct(task_target);
>   if (files == NULL) {
> - printk(KERN_DEBUG "Failed to get files struct\n");
> + pr_debug("Failed to get files struct\n");
>   return -EFAULT;
>   }
>  
> @@ -143,7 +143,7 @@ eventfd_link_ioctl(struct file *f, unsigned int ioctl, 
> unsigned long arg)
>   put_files_struct(files);
>  
>   if (file == NULL) {
> - printk(KERN_DEBUG "Failed to get file from target 
> pid\n");
> + pr_debug("Failed to get file from target pid\n");
>   return 0;
>   }
>  
> -- 
> 1.8.1.4
> 
> 
Acked-by: Neil Horman 


[dpdk-dev] [PATCH 2/7] ENIC PMD License

2014-11-07 Thread Neil Horman
On Sat, Nov 08, 2014 at 01:35:42AM +0530, Sujith Sankar wrote:
> Signed-off-by: Sujith Sankar 
> ---
>  lib/librte_pmd_enic/LICENSE | 23 +++
>  1 file changed, 23 insertions(+)
>  create mode 100644 lib/librte_pmd_enic/LICENSE
> 
> diff --git a/lib/librte_pmd_enic/LICENSE b/lib/librte_pmd_enic/LICENSE
> new file mode 100644
> index 000..589e36f
> --- /dev/null
> +++ b/lib/librte_pmd_enic/LICENSE
> @@ -0,0 +1,23 @@
> + * Copyright (c) 2014, Cisco Systems, Inc. All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in
> + * the documentation and/or other materials provided with the
> + * distribution.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> + * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
> + * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
> + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
> + * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> + * POSSIBILITY OF SUCH DAMAGE.
> -- 
> 1.9.1
> 
> 

It looks like you're trying to use the BSD 2 clause license from here.  Can you
please grab the properly formatted copy from someplace like here:
http://opensource.org/licenses/BSD-2-Clause

So that its clear.  I think it should just call out the clauses more
clearly.

Thanks
Neil



[dpdk-dev] [PATCH 1/7] DPDK changes for accommodating ENIC PMD

2014-11-07 Thread Neil Horman
On Sat, Nov 08, 2014 at 01:35:41AM +0530, Sujith Sankar wrote:
> Signed-off-by: Sujith Sankar 
> ---
>  app/test-pmd/testpmd.c | 1 +
>  config/common_linuxapp | 6 ++
>  lib/Makefile   | 1 +
>  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 7 +++
>  lib/librte_eal/linuxapp/eal/include/eal_pci_init.h | 1 +
>  mk/rte.app.mk  | 4 
>  6 files changed, 20 insertions(+)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index f76406f..4857d56 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -1874,6 +1874,7 @@ main(int argc, char** argv)
>   "check that "
> "CONFIG_RTE_LIBRTE_IGB_PMD=y and that "
> "CONFIG_RTE_LIBRTE_EM_PMD=y and that "
> +   "CONFIG_RTE_LIBRTE_ENIC_PMD=y and that "
> "CONFIG_RTE_LIBRTE_IXGBE_PMD=y in your "
> "configuration file\n");
>  
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index 57b61c9..6b5bac6 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -210,6 +210,12 @@ CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM=4
>  CONFIG_RTE_LIBRTE_I40E_ITR_INTERVAL=-1
>  
>  #
> +# Compile burst-oriented Cisco ENIC PMD driver
> +#
> +CONFIG_RTE_LIBRTE_ENIC_PMD=y
> +CONFIG_RTE_LIBRTE_ENIC_PMD_DEBUG_TRACE=n
> +
> +#
>  # Compile burst-oriented VIRTIO PMD driver
>  #
>  CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
> diff --git a/lib/Makefile b/lib/Makefile
> index e3237ff..1911790 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -43,6 +43,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += librte_cmdline
>  DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether
>  DIRS-$(CONFIG_RTE_LIBRTE_E1000_PMD) += librte_pmd_e1000
>  DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += librte_pmd_ixgbe
> +DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += librte_pmd_enic
Don't include this change until last in the series, as you haven't patched in
this directory yet.  Otherwise you'll get a failed to build problem.




[dpdk-dev] [PATCH 3/7] ENIC PMD Makefile

2014-11-07 Thread Neil Horman
On Sat, Nov 08, 2014 at 01:35:43AM +0530, Sujith Sankar wrote:
> Signed-off-by: Sujith Sankar 
> ---
>  lib/librte_pmd_enic/Makefile | 66 
> 
>  1 file changed, 66 insertions(+)
>  create mode 100644 lib/librte_pmd_enic/Makefile
> 
> diff --git a/lib/librte_pmd_enic/Makefile b/lib/librte_pmd_enic/Makefile
> new file mode 100644
> index 000..7605a8f
> --- /dev/null
> +++ b/lib/librte_pmd_enic/Makefile
> @@ -0,0 +1,66 @@
> +#   BSD LICENSE
> +# 
> +#   Copyright(c) 2010-2013 Intel Corporation. All rights reserved.
> +#   All rights reserved.
> +# 
> +#   Redistribution and use in source and binary forms, with or without 
> +#   modification, are permitted provided that the following conditions 
> +#   are met:
> +# 
> +# * Redistributions of source code must retain the above copyright 
> +#   notice, this list of conditions and the following disclaimer.
> +# * Redistributions in binary form must reproduce the above copyright 
> +#   notice, this list of conditions and the following disclaimer in 
> +#   the documentation and/or other materials provided with the 
> +#   distribution.
> +# * Neither the name of Intel Corporation nor the names of its 
> +#   contributors may be used to endorse or promote products derived 
> +#   from this software without specific prior written permission.
> +# 
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT 
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +# 
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_enic.a
> +
> +CFLAGS += -I$(RTE_SDK)/lib/librte_hash/
> +CFLAGS += -O3 -Wno-deprecated
> +
> +VPATH += $(RTE_SDK)/lib/librte_pmd_enic/src
> +
> +#
> +# all source are stored in SRCS-y
> +#
> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_main.c 
> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_clsf.c 
> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_cq.c 
> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_wq.c 
> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_dev.c 
> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_intr.c 
> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_rq.c 
> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_etherdev.c
> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_res.c
> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_rss.c
> +
> +
> +# this lib depends upon:
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += lib/librte_eal lib/librte_ether
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += lib/librte_mempool lib/librte_mbuf
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += lib/librte_net lib/librte_malloc
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> +
> -- 
> 1.9.1
> 
> 

Make this the last patch in your series, and merge it with the chunk from the
last patch that adds the enic directory to the lib/Makefile, so that a bisect
will build between these commits.

Neil



[dpdk-dev] White listing a virtual device

2014-11-07 Thread Neil Horman
On Fri, Nov 07, 2014 at 01:13:37PM +, Nicolas Pernas Maradei wrote:
> On 07/11/14 12:55, Thomas Monjalon wrote:
> >It's by design. If you add a vdev, you want to use it and there is no
> >reason to whitelist it, and especially no reason to blacklist a device
> >you created for your usage.
> >
> >Do you agree?
> 
> Hi Thomas,
> 
> Generally speaking you probably won't want to white list a virtual device -
> just using it. However it does seem an inconsistency in the design that you
> could add virtual devices but you can't white list them. If they are added
> to the main device list they should be treated just as another device.
> 
> In our particular use case we want to white list a pcap device to ensure
> that it is the only available port for testing.
> 
> Thanks,
> Nico.
> 

Then you create the pcap device with --vdev, and simply don't load the pmds for
any of your physical devices (or just don't use pci-whitelist at all if you're
doing a static build).  If you do that, then the corresponding niantic driver
won't initialize any of the hardware, you'll only get the pcap port.

Neil



[dpdk-dev] White listing a virtual device

2014-11-07 Thread Neil Horman
On Fri, Nov 07, 2014 at 01:39:52PM +, Nicolas Pernas Maradei wrote:
> 
> On 07/11/14 13:26, Neil Horman wrote:
> >Then you create the pcap device with --vdev, and simply don't load the pmds 
> >for
> >any of your physical devices (or just don't use pci-whitelist at all if 
> >you're
> >doing a static build).  If you do that, then the corresponding niantic driver
> >won't initialize any of the hardware, you'll only get the pcap port.
> >
> >Neil
> 
> Hi Neil,
> 
> What you are saying is just another way to black list the ports I don't want
> to use. I'm aware of that option (as well as using the -b option) but in our
> particular case we have several systems under test with different
> configurations and we want to use this virtual port only. Which seems to be
> a perfect use case for white listing rather than black listing or modifying
> the system configuration.
> 
> As far as I remember this option was available in previous versions.
> 
> Thanks,
> Nico.
> 

Ah, you want the -w option then, it still appears in the short options list in
my tree. That sets up the option parsing for all pci devices to require
whitelisting to be initalized.  virtual devices are exempt from this process
because declaring them with --vdev implicitly means you want to use them.

Neil



[dpdk-dev] [PATCH 3/7] ENIC PMD Makefile

2014-11-10 Thread Neil Horman
On Mon, Nov 10, 2014 at 09:59:45AM +, Sujith Sankar (ssujith) wrote:
> Neil,
> 
> If I move the DPDK patch that accommodates ENIC PMD (that is the one that
> patches lib/Makefile) to the last in the series, builds between commits
> would succeed, wouldn?t it?  Moving that to the last is anyway needed.
> 
correct, yes.
Neil

> Thanks,
> -Sujith
> 
> On 07/11/14 9:16 pm, "Sujith Sankar (ssujith)"  wrote:
> 
> >Hi Neil,
> >
> >Thanks for the comments.  I shall work on the modifications that you have
> >suggested and get back with V2.
> >
> >Regards,
> >-Sujith
> >
> >On 07/11/14 5:04 pm, "Neil Horman"  wrote:
> >
> >>On Sat, Nov 08, 2014 at 01:35:43AM +0530, Sujith Sankar wrote:
> >>> Signed-off-by: Sujith Sankar 
> >>> ---
> >>>  lib/librte_pmd_enic/Makefile | 66
> >>>
> >>>  1 file changed, 66 insertions(+)
> >>>  create mode 100644 lib/librte_pmd_enic/Makefile
> >>> 
> >>> diff --git a/lib/librte_pmd_enic/Makefile
> >>>b/lib/librte_pmd_enic/Makefile
> >>> new file mode 100644
> >>> index 000..7605a8f
> >>> --- /dev/null
> >>> +++ b/lib/librte_pmd_enic/Makefile
> >>> @@ -0,0 +1,66 @@
> >>> +#   BSD LICENSE
> >>> +# 
> >>> +#   Copyright(c) 2010-2013 Intel Corporation. All rights reserved.
> >>> +#   All rights reserved.
> >>> +# 
> >>> +#   Redistribution and use in source and binary forms, with or without
> >>> +#   modification, are permitted provided that the following conditions
> >>> +#   are met:
> >>> +# 
> >>> +# * Redistributions of source code must retain the above copyright
> >>> +#   notice, this list of conditions and the following disclaimer.
> >>> +# * Redistributions in binary form must reproduce the above
> >>>copyright 
> >>> +#   notice, this list of conditions and the following disclaimer
> >>>in 
> >>> +#   the documentation and/or other materials provided with the
> >>> +#   distribution.
> >>> +# * Neither the name of Intel Corporation nor the names of its
> >>> +#   contributors may be used to endorse or promote products
> >>>derived 
> >>> +#   from this software without specific prior written permission.
> >>> +# 
> >>> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> >>>CONTRIBUTORS 
> >>> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> >>> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> >>>FOR 
> >>> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> >>>COPYRIGHT 
> >>> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> >>>INCIDENTAL, 
> >>> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> >>> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
> >>>USE, 
> >>> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
> >>>ANY 
> >>> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
> >>>TORT 
> >>> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
> >>>USE 
> >>> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> >>>DAMAGE.
> >>> +# 
> >>> +
> >>> +include $(RTE_SDK)/mk/rte.vars.mk
> >>> +
> >>> +#
> >>> +# library name
> >>> +#
> >>> +LIB = librte_pmd_enic.a
> >>> +
> >>> +CFLAGS += -I$(RTE_SDK)/lib/librte_hash/
> >>> +CFLAGS += -O3 -Wno-deprecated
> >>> +
> >>> +VPATH += $(RTE_SDK)/lib/librte_pmd_enic/src
> >>> +
> >>> +#
> >>> +# all source are stored in SRCS-y
> >>> +#
> >>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_main.c
> >>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_clsf.c
> >>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_cq.c
> >>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_wq.c
> >>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_dev.c
> >>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_intr.c
> >>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_rq.c
> >>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_etherdev.c
> >>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic_res.c
> >>> +SRCS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += vnic_rss.c
> >>> +
> >>> +
> >>> +# this lib depends upon:
> >>> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += lib/librte_eal
> >>>lib/librte_ether
> >>> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += lib/librte_mempool
> >>>lib/librte_mbuf
> >>> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += lib/librte_net
> >>>lib/librte_malloc
> >>> +
> >>> +include $(RTE_SDK)/mk/rte.lib.mk
> >>> +
> >>> -- 
> >>> 1.9.1
> >>> 
> >>> 
> >>
> >>Make this the last patch in your series, and merge it with the chunk from
> >>the
> >>last patch that adds the enic directory to the lib/Makefile, so that a
> >>bisect
> >>will build between these commits.
> >>
> >>Neil
> >>
> >
> 
> 


[dpdk-dev] building shared library

2014-11-11 Thread Neil Horman
On Tue, Nov 11, 2014 at 03:26:04PM +, De Lara Guarch, Pablo wrote:
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Newman Poborsky
> > Sent: Tuesday, November 11, 2014 3:17 PM
> > To: Gonzalez Monroy, Sergio
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] building shared library
> > 
> > Hi,
> > 
> > after building DPDK libs as shared libraries and linking it, I'm back to my
> > first problem: rte_eal_driver_register() never gest called and my app
> > crashes since there are no drivers registered.  As previously mentioned, in
> > regular DPDK user app this functions is called for every driver before
> > main(). How?
> 
> If I am not wrong here, you have to use the -d option to specify the driver 
> you want to use.
> 
> Btw, the option you were looking for can be found in config/common_linuxapp 
> or config/common_bsdapp.
> 

Alternatively, when you link your application you can speify
-llibrte_pmd_ and your applicaion should call all the constructors when
the dynamic loader hits your binaries DT_NEEDED table.  Thats how you can avoid
the command line specification.

Neil

> Pablo
> > 
> > BR,
> > Newman
> > 
> > On Tue, Nov 11, 2014 at 3:44 PM, Newman Poborsky
> > 
> > wrote:
> > 
> > > Hi Sergio,
> > >
> > > no, that sounds good, thank you.  Since I'm not that familiar with DPDK
> > > build system, where should this option be set? In 'lib' folder's Makefile?
> > >
> > > Thank you once again!
> > >
> > > BR,
> > > Newman
> > >
> > > On Tue, Nov 11, 2014 at 3:18 PM, Sergio Gonzalez Monroy <
> > > sergio.gonzalez.monroy at intel.com> wrote:
> > >
> > >> On Tue, Nov 11, 2014 at 01:10:29PM +0100, Newman Poborsky wrote:
> > >> > Hi,
> > >> >
> > >> > I want to build one .so file with my app (it contains API that I want 
> > >> > to
> > >> > call through JNI) and all DPDK libs that I use in my app.
> > >> >
> > >> > As I've already mentioned, when I build and start my dpdk app as a
> > >> > standalone application, I can see that before main() is called, there
> > >> is a
> > >> > call to 'rte_eal_driver_register()' function for every driver. When I
> > >> build
> > >> > .so file, this does not happen and no driver is registered so everyting
> > >> > after rte_eal_init() fails.
> > >> >
> > >> Hi Newman,
> > >>
> > >> AFAIK the current build system does not support that.
> > >>
> > >> You can build DPDK as shared libs by setting the following config option:
> > >> CONFIG_RTE_BUILD_SHARED_LIB=y
> > >>
> > >> Then build your app as an .so that links against DPDK libs, so you have
> > >> explicit dependencies (such dependencies should show with ldd).
> > >>
> > >> Is there any reason why you want everything to be a single .so ?
> > >>
> > >> I don't know much about how Java loads DSOs but I reckon that it must
> > >> resolve
> > >> explicit dependencies such as libc.
> > >>
> > >> Thanks,
> > >> Sergio
> > >>
> > >>
> > >> >
> > >> > BR,
> > >> > Newman
> > >> >
> > >>
> > >
> > >


[dpdk-dev] IRC Channel, Come Guys ;)

2014-04-02 Thread Neil Horman
On Wed, Apr 02, 2014 at 10:13:30AM +0200, Thomas Monjalon wrote:
> Hello,
> 
> 2014-04-01 23:35, Fred Pedrisa:
> > I've created an unofficial channel in freenode irc network, so for those
> > interested in joining there for a better chatting here are the informations
> > 
> > Server : irc.freenode.net
> > Channel : ##dpdk (Yes, you must type two sharps).
> 
> Why not use #dpdk ?
> 
Agreed, two hashes seems like your working around something that needs to be
fixed.  #dpdk is open and already has interested parties on it.
Neil

> -- 
> Thomas
> 


[dpdk-dev] [PATCH v5] eal_common_cpuflags: Fix %rbx corruption, and simplify the code

2014-04-02 Thread Neil Horman
On Tue, Mar 25, 2014 at 01:51:04PM -0700, H. Peter Anvin wrote:
> On 03/25/2014 12:52 PM, Neil Horman wrote:
> > Neil Horman reported that on x86-64 the upper half of %rbx would get
> > clobbered when the code was compiled PIC or PIE, because the
> > i386-specific code to preserve %ebx was incorrectly compiled.
> > 
> > However, the code is really way more complex than it needs to be.  For
> > one thing, the CPUID instruction only needs %eax (leaf) and %ecx
> > (subleaf) as parameters, and since we are testing for bits, we might
> > as well list the bits explicitly.  Furthermore, we can use an array
> > rather than doing a switch statement inside a structure.
> > 
> > Reported-by: Neil Horman 
> > Signed-off-by: H. Peter Anvin 
> > Signed-off-by: Neil Horman 
> > 
> 
> Looks good to me.
> 
> Reviewed-by: H. Peter Anvin 
> 
>   -hpa
> 
> 
> 
Bump, did this get lost somewhere?  Its been over a week and I don't see it in
the tree

Regards
Neil



[dpdk-dev] [PATCH 03/16] pkg: add recipe for RPM

2014-04-02 Thread Neil Horman
On Wed, Apr 02, 2014 at 11:53:51AM +0200, Thomas Monjalon wrote:
> Hello,
> 
> Sorry for the long delay.
> 
> 2014-02-26 14:07, Thomas Graf:
> > On 02/04/2014 04:54 PM, Thomas Monjalon wrote:
> > > +Version: 1.5.2r1
> > > +Release: 1
> > 
> > What kind of upgrade strategy do you have in mind?
> > 
> > I'm raising this because Fedora and other distributions will require
> > a unique package name for every version of the package that is not
> > backwards compatible.
> > 
> > Typically libraries provide backwards compatible within a major release,
> > i.e. all 1.x.x releases would be compatible. I realize that this might
> > not be applicable yet but maybe 1.5.x?
> > 
> > Depending on the versioning schema the name would be dpdk15, dpdk16, ...
> > or dpdk152, dpdk153, ...
> 
> We are working on this but at the moment there is no restriction on API/ABI 
> breakage. So I think it's too early to define such rule.
>  
Now that you have DSO builds in place, theres no reason not to take the extra
step of versioning your API's, making backwards compatibility fairly
straightforward.  Monolithic builds are still somewhat problematic regarding API
stability, but you could certainly offer stability in the DSOs.

> > > +BuildRequires: kernel-devel, kernel-headers, doxygen
> > 
> > Is a python environment required as well?
> 
> Python is only needed to run some tools on the target. But is is optional.
> Do you think it should be written somewhere?
> 
> > > +%description
> > > +Dummy main package. Make only subpackages.
> > 
> > I would just call the main package "libdpdk152" so you don't have to
> > repeat the encoding versioning in all the subpackages.
> > 
> > > +
> > > +%package core-runtime
> > 
> > What about calling it just "libdpdk"?
> 
The version name should be left out of the library name, whatever you do.
Packaging can be responsible for versioning.

> In this case, it should be libdpdk-core in order to distinguish it from dpdk 
> extensions. But the name of the project is dpdk so it seems simpler to call 
> it 
> dpdk-core.
> Is the "lib" prefix mandatory for libraries?
> 
Not strictly, but IIRC if you don't add the lib, the linker won't find it with
the -l option, so you'll want to add it.

> > > +%files core-runtime
> > > +%dir %{datadir}
> > > +%{datadir}/config
> > > +%{datadir}/tools
> > > +%{moddir}/*
> > > +%{_sbindir}/*
> > > +%{_bindir}/*
> > > +%{_libdir}/*.so
> > 
> > This brings up the question of multiple parallel DPDK installations.
> > A specific application linking to library version X will also require
> > tools of version X, right? A second application linking against version
> > Y will require tools version Y. Right now, these could not be installed
> > in parallel. Any chance we can make the runtime version independent?
> 
> Are you thinking about installing different major versions? In my 
> understanding, we cannot install 2 different minor versions of a package.
> As long as there is no stable API, there is no major versions defined.
> So don't you think we should speak about it later?
> 
If the versioning is done properly (i.e shared libraries get version ids
attached to the library files properly), you can install as many library
versions as you like.  You can only install a single -devel package, since it
links lib.so to a specific version.

> > Same applies to header files. A good option here would be to install
> > them to /usr/include/libdpdk{version}/ and have a dpdk-1.5.2.pc which
> > provides Cflags: -I${includedir}/libdpdk${version}
> 
> Yes same applies :)
> I agree that a .pc file would be a good idea. But we also must allow to build 
> with the DPDK framework.
> 
> > You'll also need for all packages and subpackages installing shared
> > libraries:
> > 
> > %post -p /sbin/ldconfig
> > %postun -p /sbin/ldconfig
> 
> OK
> 
> Thanks for the review
> -- 
> Thomas
> 


[dpdk-dev] DPDK API/ABI Stability

2014-04-09 Thread Neil Horman
Hey all-
I was going to include this as an addendum to the packaging thread on
this list, but I can't seem to find it in my inbox, so forgive me starting a new
one.

I wanted to broach the subject of ABI/API stability on the list here.
Given the recent great efforts to make dpdk packagable by disributions, I think
we probably need to discuss API stability in more depth and come up with a plan
to implement it.  Has anyone started looking into this?  If not, it seems to me
to be reasonable to start by placing a line in the sand with the functions
documented here:

http://dpdk.org/doc/api/

It seems to me we can start reviewing the API library by library, enusring only
those functions are exported, making sure the data types are appropriate for
export, and marking them with a linker script to version them appropriately.

Thoughts?

Neil



[dpdk-dev] DPDK API/ABI Stability

2014-04-10 Thread Neil Horman
On Wed, Apr 09, 2014 at 02:08:49PM -0700, Stephen Hemminger wrote:
> On Wed, 9 Apr 2014 14:39:52 -0400
> Neil Horman  wrote:
> 
> > Hey all-
> > I was going to include this as an addendum to the packaging thread on
> > this list, but I can't seem to find it in my inbox, so forgive me starting 
> > a new
> > one.
> > 
> > I wanted to broach the subject of ABI/API stability on the list here.
> > Given the recent great efforts to make dpdk packagable by disributions, I 
> > think
> > we probably need to discuss API stability in more depth and come up with a 
> > plan
> > to implement it.  Has anyone started looking into this?  If not, it seems 
> > to me
> > to be reasonable to start by placing a line in the sand with the functions
> > documented here:
> > 
> > http://dpdk.org/doc/api/
> > 
> > It seems to me we can start reviewing the API library by library, enusring 
> > only
> > those functions are exported, making sure the data types are appropriate for
> > export, and marking them with a linker script to version them appropriately.
> 
> To what level? source? binary, internal functions?
> 
Well, I was thinking both (hence the API/ABI comment above), but at least API
stability as a start.  Stabilizing internal functions doesn't make any sense to
me since, by definition those aren't exposed to users trying to make use of the
library.

> Some of the API's could be stablized without much impact but others such
> as the device driver interface is incomplete and freezing it would make
> live hard.

But the driver interface isn't listed on the api documentation above.  Clearly
we'd need to address that eventually, but as a start it can likely be ignored,
at least we can give applications a modicum of stability.

Neil



[dpdk-dev] [PATCH 0/19] Separate compile time linkage between eal lib and pmd's

2014-04-10 Thread Neil Horman
Disconnect compile time linkage between eal library / applications and pmd's

I noticed that, while tinkering with dpdk, building for shared libraries still
resulted in all the test applications linking to all the built pmd's, despite
not actually needing them all.  We are able to tell an application at run time
(via the -d/--blacklist/--whitelist/--vdev options) which pmd's we want to use, 
and
so have no need to link them at all. The only reason they get pulled in is
because rte_eal_non_pci_init_etherdev and rte_pmd_init_all contain static lists
to the individual pmd init functions. The result is that, even when building as
DSO's, we have to load all the pmd libraries, which is space inefficient and
defeating of some of the purpose of shared objects. 

To correct this, I developed this patch series, which introduces two new macros,
PMD_INIT_NONPCI and PMD_INIT.  These two macros use constructors to register
their init routines at runtime, either prior to the execution of main() when
linked statically, or when dlopen is called on a DSO at run time.  The result is
that PMD's can be loaded at run time without the application or eal library
having to hold a reference to them.  They work in a very simmilar fashion to the
module_init routine in the linux kernel.

I've tested this feature using the igb and pcap pmd's, both statically and
dynamically linked with the test and testpmd sample applications, and it seems
to work well.

Note, I encountered  a few bugs along the way, which I fixed and noted in the
series.

Regards
Neil



[dpdk-dev] [PATCH 01/19] makefiles: Fixed -share command line option error

2014-04-10 Thread Neil Horman
The shared libraries built with the current makefile set produce static
libraries rather than actual shared objects.  This is due to several missing
options that are required to correctly build shared objects using ld, as well as
a mis-specified -share option (which should be -shared). Switching to the use of
CC rather than LD and fixing the -shared option corrects these problems and
builds the DSOs correctly.

Signed-off-by: Neil Horman 
---
 mk/rte.lib.mk  | 4 ++--
 mk/rte.sharelib.mk | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
index f75ca92..8a914db 100644
--- a/mk/rte.lib.mk
+++ b/mk/rte.lib.mk
@@ -68,7 +68,7 @@ O_TO_A_DO = @set -e; \
$(O_TO_A) && \
echo $(O_TO_A_CMD) > $(call exe2cmd,$(@))

-O_TO_S = $(LD) $(CPU_LDFLAGS) -z muldefs -share $(OBJS-y) -o $(LIB)
+O_TO_S = $(CC) $(CPU_LDFLAGS) -z muldefs -shared $(OBJS-y) -o $(LIB)
 O_TO_S_STR = $(subst ','\'',$(O_TO_S)) #'# fix syntax highlight
 O_TO_S_DISP = $(if $(V),"$(O_TO_S_STR)","  LD $(@)")
 O_TO_S_DO = @set -e; \
@@ -84,7 +84,7 @@ O_TO_C_DO = @set -e; \
$(lib_dir) \
$(copy_obj)
 else
-O_TO_C = $(LD) -z muldefs -share $(OBJS-y) -o $(LIB_ONE)
+O_TO_C = $(CC) -z muldefs -shared $(OBJS-y) -o $(LIB_ONE)
 O_TO_C_STR = $(subst ','\'',$(O_TO_C)) #'# fix syntax highlight
 O_TO_C_DISP = $(if $(V),"$(O_TO_C_STR)","  LD_C $(@)")
 O_TO_C_DO = @set -e; \
diff --git a/mk/rte.sharelib.mk b/mk/rte.sharelib.mk
index 3967b51..8e821e0 100644
--- a/mk/rte.sharelib.mk
+++ b/mk/rte.sharelib.mk
@@ -45,7 +45,7 @@ sharelib: $(LIB_ONE) FORCE

 OBJS = $(wildcard $(RTE_OUTPUT)/build/lib/*.o)

-O_TO_S = $(LD) $(CPU_LDFLAGS) -share $(OBJS) -o $(RTE_OUTPUT)/lib/$(LIB_ONE)
+O_TO_S = $(CC) $(CPU_LDFLAGS) -shared $(OBJS) -o $(RTE_OUTPUT)/lib/$(LIB_ONE)
 O_TO_S_STR = $(subst ','\'',$(O_TO_S)) #'# fix syntax highlight
 O_TO_S_DISP = $(if $(V),"$(O_TO_S_STR)","  LD $(@)")
 O_TO_S_CMD = "cmd_$@ = $(O_TO_S_STR)"
-- 
1.8.3.1



[dpdk-dev] [PATCH 02/19] pmd: Add PMD_INIT_NONPCI macros

2014-04-10 Thread Neil Horman
This macro will allow nonpci based pmd driver to register an init function with
the core eal library so that they don't need to be referenced at link time.  The
macro expands to a constructor that calls an eal library registration function,
passing a pointer to the pmd's init routine.

Signed-off-by: Neil Horman 
---
 lib/librte_eal/common/Makefile |  2 +-
 lib/librte_eal/common/eal_common_nonpci_devs.c | 60 +-
 lib/librte_eal/common/include/rte_pmd.h| 59 +
 3 files changed, 119 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pmd.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index b9f3b88..e52b665 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -38,7 +38,7 @@ INC += rte_pci_dev_ids.h rte_per_lcore.h rte_prefetch.h 
rte_random.h
 INC += rte_rwlock.h rte_spinlock.h rte_tailq.h rte_interrupts.h rte_alarm.h
 INC += rte_string_fns.h rte_cpuflags.h rte_version.h rte_tailq_elem.h
 INC += rte_eal_memconfig.h rte_malloc_heap.h
-INC += rte_hexdump.h rte_devargs.h
+INC += rte_hexdump.h rte_devargs.h rte_pmd.h

 ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
diff --git a/lib/librte_eal/common/eal_common_nonpci_devs.c 
b/lib/librte_eal/common/eal_common_nonpci_devs.c
index 71cbb1e..0f94b65 100644
--- a/lib/librte_eal/common/eal_common_nonpci_devs.c
+++ b/lib/librte_eal/common/eal_common_nonpci_devs.c
@@ -34,6 +34,8 @@

 #include 
 #include 
+#include 
+#include 
 #include 
 #ifdef RTE_LIBRTE_PMD_RING
 #include 
@@ -46,6 +48,8 @@
 #endif
 #include 
 #include 
+#include 
+#include 
 #include "eal_private.h"

 struct device_init {
@@ -79,17 +83,54 @@ struct device_init dev_types[] = {
}
 };

+TAILQ_HEAD(pmd_nonpci_list, pmd_nonpci_entry);
+
+/* Definition for shared object drivers. */
+struct pmd_nonpci_entry {
+TAILQ_ENTRY(pmd_noncpi_entry) next;
+
+charname[PATH_MAX];
+   int (*pmd_initfn)(const char *, const char *);
+};
+
+/* List of external loadable drivers */
+static struct pmd_nonpci_list pmd_nonpci_init_list =
+TAILQ_HEAD_INITIALIZER(pmd_nonpci_init_list);
+
+void rte_eal_nonpci_dev_init_register(const char *name, int 
(*dev_initfn)(const char *, const char *))
+{
+   struct pmd_nonpci_entry *new = malloc(sizeof(struct pmd_nonpci_entry));
+
+   if (!new) {
+   printf("Not enough memory to register %s\n", name);
+   goto out;
+   }
+
+   memset(new->name, 0, PATH_MAX);
+   strncpy(new->name, name, PATH_MAX);
+   new->name[PATH_MAX-1] = 0;
+   new->pmd_initfn = dev_initfn;
+   printf("REGISTERING INIT ROUTINE FOR %s\n", new->name);
+   TAILQ_INSERT_TAIL(&pmd_nonpci_init_list, new, next);
+out:
+   return; 
+}
+
+
 int
 rte_eal_non_pci_ethdev_init(void)
 {
struct rte_devargs *devargs;
+   struct pmd_nonpci_entry *entry;
uint8_t i;
+   int found = 0;

/* call the init function for each virtual device */
TAILQ_FOREACH(devargs, &devargs_list, next) {

if (devargs->type != RTE_DEVTYPE_VIRTUAL)
continue;
+   found = 0;

for (i = 0; i < NUM_DEV_TYPES; i++) {
/* search a driver prefix in virtual device name */
@@ -98,14 +139,31 @@ rte_eal_non_pci_ethdev_init(void)
 sizeof(dev_types[i].dev_prefix) - 1)) {
dev_types[i].init_fn(devargs->virtual.drv_name,
 devargs->args);
+   found = 1;
break;
}
}

-   if (i == NUM_DEV_TYPES) {
+   if (found)
+   continue;
+
+   TAILQ_FOREACH(entry, &pmd_nonpci_init_list, next) {
+   if (!entry->name)
+   continue;
+   if (!strncmp(entry->name,
+devargs->virtual.drv_name,
+strlen(entry->name))) {
+   RTE_LOG(INFO, PMD, "Initalizing pmd %s\n", 
devargs->virtual.drv_name);
+   found = 1;
+   entry->pmd_initfn(devargs->virtual.drv_name, 
devargs->args);
+   }
+   }
+
+   if (!found) {
rte_panic("no driver found for %s\n",
  devargs->virtual.drv_name);
}
}
+
return 0;
 }
diff --git a/lib/librte_eal/common/include/rte_pmd.h 
b/lib/librte_eal/common/include/rte_pmd.h
new file mode 100644
index 000..75a4895
--- /dev/nul

[dpdk-dev] [PATCH 03/19] eal: dlopen the DSO built poll mode drivers before init

2014-04-10 Thread Neil Horman
We need to call dlopen on any DSO's referenced on the command line before
calling their init routines.  This provides the DSO's the opportunity to run the
constructors created by the PMD_INIT_NONPCI macro prior to the init list being
traversed.

Signed-off-by: Neil Horman 
---
 lib/librte_eal/linuxapp/eal/eal.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 905ce37..a4ad0eb 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -1046,9 +1046,6 @@ rte_eal_init(int argc, char **argv)

rte_eal_mcfg_complete();

-   if (rte_eal_non_pci_ethdev_init() < 0)
-   rte_panic("Cannot init non-PCI eth_devs\n");
-
TAILQ_FOREACH(solib, &solib_list, next) {
solib->lib_handle = dlopen(solib->name, RTLD_NOW);
if ((solib->lib_handle == NULL) && (solib->name[0] != '/')) {
@@ -1061,6 +1058,9 @@ rte_eal_init(int argc, char **argv)
RTE_LOG(WARNING, EAL, "%s\n", dlerror());
}

+   if (rte_eal_non_pci_ethdev_init() < 0)
+   rte_panic("Cannot init non-PCI eth_devs\n");
+
RTE_LOG(DEBUG, EAL, "Master core %u is ready (tid=%x)\n",
rte_config.master_lcore, (int)thread_id);

-- 
1.8.3.1



[dpdk-dev] [PATCH 04/19] pcap: Convert pcap poll mode driver to use new PMD_INIT_NONPCI macro

2014-04-10 Thread Neil Horman
Convert the pcap poll mode driver to link itself to libpcap and register a init
routine via the PMD_INIT_NONPCI macro

Signed-off-by: Neil Horman 
---
 lib/librte_eal/common/eal_common_nonpci_devs.c | 9 -
 lib/librte_pmd_pcap/Makefile   | 2 ++
 lib/librte_pmd_pcap/rte_eth_pcap.c | 2 ++
 3 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_nonpci_devs.c 
b/lib/librte_eal/common/eal_common_nonpci_devs.c
index 0f94b65..d925b7f 100644
--- a/lib/librte_eal/common/eal_common_nonpci_devs.c
+++ b/lib/librte_eal/common/eal_common_nonpci_devs.c
@@ -40,9 +40,6 @@
 #ifdef RTE_LIBRTE_PMD_RING
 #include 
 #endif
-#ifdef RTE_LIBRTE_PMD_PCAP
-#include 
-#endif
 #ifdef RTE_LIBRTE_PMD_XENVIRT
 #include 
 #endif
@@ -65,12 +62,6 @@ struct device_init dev_types[] = {
.init_fn = rte_pmd_ring_init
},
 #endif
-#ifdef RTE_LIBRTE_PMD_PCAP
-   {
-   .dev_prefix = RTE_ETH_PCAP_PARAM_NAME,
-   .init_fn = rte_pmd_pcap_init
-   },
-#endif
 #ifdef RTE_LIBRTE_PMD_XENVIRT
{
.dev_prefix = RTE_ETH_XENVIRT_PARAM_NAME,
diff --git a/lib/librte_pmd_pcap/Makefile b/lib/librte_pmd_pcap/Makefile
index 5218f28..f7653fb 100644
--- a/lib/librte_pmd_pcap/Makefile
+++ b/lib/librte_pmd_pcap/Makefile
@@ -37,6 +37,8 @@ include $(RTE_SDK)/mk/rte.vars.mk
 #
 LIB = librte_pmd_pcap.a

+CPU_LDFLAGS += -lpcap
+
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)

diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c 
b/lib/librte_pmd_pcap/rte_eth_pcap.c
index fe94a79..5bc81f7 100644
--- a/lib/librte_pmd_pcap/rte_eth_pcap.c
+++ b/lib/librte_pmd_pcap/rte_eth_pcap.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 

@@ -766,3 +767,4 @@ rte_pmd_pcap_init(const char *name, const char *params)

 }

+PMD_INIT_NONPCI(rte_pmd_pcap_init, RTE_ETH_PCAP_PARAM_NAME);
-- 
1.8.3.1



[dpdk-dev] [PATCH 05/19] ring: convert the ring pmd driver to use the PMD_INIT_NONPCI macro

2014-04-10 Thread Neil Horman
Separate the ring pmd from calls within the core eal library so that it will
not be loaded unless needed (or compiled in statically)

Note: The demo test application calls directly into the ring pmd driver, and
that needs to be cleaned up, so that the -d option is used on the command line
to test the proper pmd.  for now I've just added linkage to the pmd directly to
avoid build breaks and will fix up the application operation in a subsequent
patch

Signed-off-by: Neil Horman 
---
 app/test/Makefile  |  2 ++
 lib/librte_eal/common/eal_common_nonpci_devs.c | 10 +-
 lib/librte_pmd_ring/rte_eth_ring.c |  3 +++
 mk/rte.app.mk  |  4 
 4 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/app/test/Makefile b/app/test/Makefile
index b49785e..978e331 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -36,6 +36,8 @@ include $(RTE_SDK)/mk/rte.vars.mk
 #
 APP = test

+LDLIBS += -lrte_ring -lrte_pmd_ring
+
 #
 # all sources are stored in SRCS-y
 #
diff --git a/lib/librte_eal/common/eal_common_nonpci_devs.c 
b/lib/librte_eal/common/eal_common_nonpci_devs.c
index d925b7f..51497fd 100644
--- a/lib/librte_eal/common/eal_common_nonpci_devs.c
+++ b/lib/librte_eal/common/eal_common_nonpci_devs.c
@@ -34,12 +34,10 @@

 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-#ifdef RTE_LIBRTE_PMD_RING
-#include 
-#endif
 #ifdef RTE_LIBRTE_PMD_XENVIRT
 #include 
 #endif
@@ -56,12 +54,6 @@ struct device_init {

 #define NUM_DEV_TYPES (sizeof(dev_types)/sizeof(dev_types[0]))
 struct device_init dev_types[] = {
-#ifdef RTE_LIBRTE_PMD_RING
-   {
-   .dev_prefix = RTE_ETH_RING_PARAM_NAME,
-   .init_fn = rte_pmd_ring_init
-   },
-#endif
 #ifdef RTE_LIBRTE_PMD_XENVIRT
{
.dev_prefix = RTE_ETH_XENVIRT_PARAM_NAME,
diff --git a/lib/librte_pmd_ring/rte_eth_ring.c 
b/lib/librte_pmd_ring/rte_eth_ring.c
index 24635f3..9fec098 100644
--- a/lib/librte_pmd_ring/rte_eth_ring.c
+++ b/lib/librte_pmd_ring/rte_eth_ring.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 

 struct ring_queue {
struct rte_ring *rng;
@@ -395,3 +396,5 @@ rte_pmd_ring_init(const char *name, const char *params)
}
return 0;
 }
+
+PMD_INIT_NONPCI(rte_pmd_ring_init, RTE_ETH_RING_PARAM_NAME);
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 072718a..cf8b942 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -133,10 +133,6 @@ ifeq ($(CONFIG_RTE_LIBRTE_ETHER),y)
 LDLIBS += -lethdev
 endif

-ifeq ($(CONFIG_RTE_LIBRTE_PMD_RING),y)
-LDLIBS += -lrte_pmd_ring
-endif
-
 ifeq ($(CONFIG_RTE_LIBRTE_MALLOC),y)
 LDLIBS += -lrte_malloc
 endif
-- 
1.8.3.1



[dpdk-dev] [PATCH 06/19] xenvert: Convert xenvirt pmd to use PMD_INIT_NONPCI macro

2014-04-10 Thread Neil Horman
Convert the xenvirt driver to use the PMD_INIT_NONPCI macro so that we can break
the linkages between it and the core library to avoid unnneded loading when
built as a DSO

Signed-off-by: Neil Horman 
---
 lib/librte_eal/common/eal_common_nonpci_devs.c | 9 -
 lib/librte_pmd_xenvirt/Makefile| 2 ++
 lib/librte_pmd_xenvirt/rte_eth_xenvirt.c   | 3 +++
 mk/rte.app.mk  | 6 --
 4 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_nonpci_devs.c 
b/lib/librte_eal/common/eal_common_nonpci_devs.c
index 51497fd..c65621e 100644
--- a/lib/librte_eal/common/eal_common_nonpci_devs.c
+++ b/lib/librte_eal/common/eal_common_nonpci_devs.c
@@ -38,9 +38,6 @@
 #include 
 #include 
 #include 
-#ifdef RTE_LIBRTE_PMD_XENVIRT
-#include 
-#endif
 #include 
 #include 
 #include 
@@ -54,12 +51,6 @@ struct device_init {

 #define NUM_DEV_TYPES (sizeof(dev_types)/sizeof(dev_types[0]))
 struct device_init dev_types[] = {
-#ifdef RTE_LIBRTE_PMD_XENVIRT
-   {
-   .dev_prefix = RTE_ETH_XENVIRT_PARAM_NAME,
-   .init_fn = rte_pmd_xenvirt_init
-   },
-#endif
{
.dev_prefix = "-nodev-",
.init_fn = NULL
diff --git a/lib/librte_pmd_xenvirt/Makefile b/lib/librte_pmd_xenvirt/Makefile
index bf6d432..d5fff3b 100644
--- a/lib/librte_pmd_xenvirt/Makefile
+++ b/lib/librte_pmd_xenvirt/Makefile
@@ -39,6 +39,8 @@ LIB = librte_pmd_xenvirt.a
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)

+CPU_LDFLAGS += -lxenvirt
+
 #
 # all source are stored in SRCS-y
 #
diff --git a/lib/librte_pmd_xenvirt/rte_eth_xenvirt.c 
b/lib/librte_pmd_xenvirt/rte_eth_xenvirt.c
index bad8dd4..20d35dc 100644
--- a/lib/librte_pmd_xenvirt/rte_eth_xenvirt.c
+++ b/lib/librte_pmd_xenvirt/rte_eth_xenvirt.c
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 

@@ -704,3 +705,5 @@ rte_pmd_xenvirt_init(const char *name, const char *params)
eth_dev_xenvirt_create(name, params, rte_socket_id(), DEV_CREATE);
return 0;
 }
+
+PMD_INIT_NONPCI(rte_pmd_xenvirt_init, RTE_ETH_XENVIRT_PARAM_NAME);
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index cf8b942..d6fdf9e 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -158,12 +158,6 @@ ifeq ($(CONFIG_RTE_LIBRTE_EAL),y)
 LDLIBS += -lrte_eal
 endif

-
-ifeq ($(CONFIG_RTE_LIBRTE_PMD_XENVIRT),y)
-LDLIBS += -lrte_pmd_xenvirt
-LDLIBS += -lxenstore
-endif
-
 ifeq ($(CONFIG_RTE_LIBRTE_CMDLINE),y)
 LDLIBS += -lrte_cmdline
 endif
-- 
1.8.3.1



[dpdk-dev] [PATCH 07/19] pmd: remove dead code

2014-04-10 Thread Neil Horman
Now that we've converted the available PMD drivers to use the PMD_INIT_NONPCI
macro, we can remove the rest of the old init code

Signed-off-by: Neil Horman 
---
 lib/librte_eal/common/eal_common_nonpci_devs.c | 34 +-
 1 file changed, 1 insertion(+), 33 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_nonpci_devs.c 
b/lib/librte_eal/common/eal_common_nonpci_devs.c
index c65621e..78bedf3 100644
--- a/lib/librte_eal/common/eal_common_nonpci_devs.c
+++ b/lib/librte_eal/common/eal_common_nonpci_devs.c
@@ -44,19 +44,6 @@
 #include 
 #include "eal_private.h"

-struct device_init {
-   const char *dev_prefix;
-   int (*init_fn)(const char*, const char *);
-};
-
-#define NUM_DEV_TYPES (sizeof(dev_types)/sizeof(dev_types[0]))
-struct device_init dev_types[] = {
-   {
-   .dev_prefix = "-nodev-",
-   .init_fn = NULL
-   }
-};
-
 TAILQ_HEAD(pmd_nonpci_list, pmd_nonpci_entry);

 /* Definition for shared object drivers. */
@@ -96,31 +83,12 @@ rte_eal_non_pci_ethdev_init(void)
 {
struct rte_devargs *devargs;
struct pmd_nonpci_entry *entry;
-   uint8_t i;
-   int found = 0;
+   int found;

-   /* call the init function for each virtual device */
TAILQ_FOREACH(devargs, &devargs_list, next) {
-
if (devargs->type != RTE_DEVTYPE_VIRTUAL)
continue;
found = 0;
-
-   for (i = 0; i < NUM_DEV_TYPES; i++) {
-   /* search a driver prefix in virtual device name */
-   if (!strncmp(dev_types[i].dev_prefix,
-   devargs->virtual.drv_name,
-sizeof(dev_types[i].dev_prefix) - 1)) {
-   dev_types[i].init_fn(devargs->virtual.drv_name,
-devargs->args);
-   found = 1;
-   break;
-   }
-   }
-
-   if (found)
-   continue;
-
TAILQ_FOREACH(entry, &pmd_nonpci_init_list, next) {
if (!entry->name)
continue;
-- 
1.8.3.1



[dpdk-dev] [PATCH 08/19] test: fix test app to dynamically link pmd_pcap when needed

2014-04-10 Thread Neil Horman
the test application calls functions in the pmd_pcap driver directly.  We 
should fix this properly, but for now just link in the library

Signed-off-by: Neil Horman 
---
 mk/rte.app.mk | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index d6fdf9e..0499050 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -162,8 +162,10 @@ ifeq ($(CONFIG_RTE_LIBRTE_CMDLINE),y)
 LDLIBS += -lrte_cmdline
 endif

+ifeq ($(RTE_BUILD_SHARED_LIB),n)
 ifeq ($(CONFIG_RTE_LIBRTE_PMD_PCAP),y)
-LDLIBS += -lrte_pmd_pcap -lpcap
+LDLIBS += -lrte_pmd_pcap
+endif
 endif

 LDLIBS += $(EXECENV_LDLIBS)
-- 
1.8.3.1



[dpdk-dev] [PATCH 09/19] testpmd: only link pcap pmd when statically linking

2014-04-10 Thread Neil Horman
when doing a static link, we need to add -lpacp to testpmd because it will need
to resolve those symbols when it links in librte_pmd_pcap.a

Signed-off-by: Neil Horman 
---
 mk/rte.app.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 0499050..2f2ff16 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -164,7 +164,7 @@ endif

 ifeq ($(RTE_BUILD_SHARED_LIB),n)
 ifeq ($(CONFIG_RTE_LIBRTE_PMD_PCAP),y)
-LDLIBS += -lrte_pmd_pcap
+LDLIBS += -lrte_pmd_pcap -lpcap
 endif
 endif

-- 
1.8.3.1



[dpdk-dev] [PATCH 10/19] make: include whole archive on static link

2014-04-10 Thread Neil Horman
This happens automatically on dyanmic linking, but when linking an archive we
need to to include the whole archive to make sure we call all the constructors.
Not doing this causes them to be discarded due to the fact theres no symbolic
reference connecting the pmds to the application.

Signed-off-by: Neil Horman 
---
 mk/rte.app.mk | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 2f2ff16..41eab08 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -198,6 +198,8 @@ endif
 ifeq ($(LINK_USING_CC),1)
 comma := ,
 LDLIBS := $(addprefix -Wl$(comma),$(LDLIBS))
+LDLIBS := -Wl$(comma)--whole-archive $(LDLIBS)
+LDLIBS += -Wl,--no-whole-archive -Wl,-z -Wl,muldefs
 LDFLAGS := $(addprefix -Wl$(comma),$(LDFLAGS))
 override EXTRA_LDFLAGS := $(addprefix -Wl$(comma),$(EXTRA_LDFLAGS))
 O_TO_EXE = $(CC) $(CFLAGS) $(LDFLAGS_$(@)) \
-- 
1.8.3.1



[dpdk-dev] [PATCH 11/19] pmd: Move rte_pmd_init_all to be non-inline

2014-04-10 Thread Neil Horman
Organizational change in support of doing dynamic init routine registration

Signed-off-by: Neil Horman 
---
 lib/librte_ether/rte_ethdev.c | 52 ++
 lib/librte_ether/rte_ethdev.h | 53 +--
 2 files changed, 53 insertions(+), 52 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a5727dd..dd378f0 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -112,6 +112,58 @@ static uint8_t nb_ports = 0;
 /* spinlock for eth device callbacks */
 static rte_spinlock_t rte_eth_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;

+int rte_pmd_init_all(void)
+{
+int ret = -ENODEV;
+
+#ifdef RTE_LIBRTE_IGB_PMD
+if ((ret = rte_igb_pmd_init()) != 0) {
+RTE_LOG(ERR, PMD, "Cannot init igb PMD\n");
+return (ret);
+}
+if ((ret = rte_igbvf_pmd_init()) != 0) {
+RTE_LOG(ERR, PMD, "Cannot init igbvf PMD\n");
+return (ret);
+}
+#endif /* RTE_LIBRTE_IGB_PMD */
+
+#ifdef RTE_LIBRTE_EM_PMD
+if ((ret = rte_em_pmd_init()) != 0) {
+RTE_LOG(ERR, PMD, "Cannot init em PMD\n");
+return (ret);
+}
+#endif /* RTE_LIBRTE_EM_PMD */
+
+#ifdef RTE_LIBRTE_IXGBE_PMD
+if ((ret = rte_ixgbe_pmd_init()) != 0) {
+RTE_LOG(ERR, PMD, "Cannot init ixgbe PMD\n");
+return (ret);
+}
+if ((ret = rte_ixgbevf_pmd_init()) != 0) {
+RTE_LOG(ERR, PMD, "Cannot init ixgbevf PMD\n");
+return (ret);
+}
+#endif /* RTE_LIBRTE_IXGBE_PMD */
+
+#ifdef RTE_LIBRTE_VIRTIO_PMD
+if ((ret = rte_virtio_pmd_init()) != 0) {
+RTE_LOG(ERR, PMD, "Cannot init virtio PMD\n");
+return (ret);
+}
+#endif /* RTE_LIBRTE_VIRTIO_PMD */
+
+#ifdef RTE_LIBRTE_VMXNET3_PMD
+if ((ret = rte_vmxnet3_pmd_init()) != 0) {
+RTE_LOG(ERR, PMD, "Cannot init vmxnet3 PMD\n");
+return (ret);
+}
+#endif /* RTE_LIBRTE_VMXNET3_PMD */
+
+if (ret == -ENODEV)
+RTE_LOG(ERR, PMD, "No PMD(s) are configured\n");
+return (ret);
+}
+
 /**
  * The user application callback description.
  *
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index dea7471..bf2ded4 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1396,58 +1396,7 @@ extern int rte_vmxnet3_pmd_init(void);
  *   -ENODEV if there are no drivers available
  *   (e.g. if all driver config options are = n).
  */
-static inline
-int rte_pmd_init_all(void)
-{
-   int ret = -ENODEV;
-
-#ifdef RTE_LIBRTE_IGB_PMD
-   if ((ret = rte_igb_pmd_init()) != 0) {
-   RTE_LOG(ERR, PMD, "Cannot init igb PMD\n");
-   return (ret);
-   }
-   if ((ret = rte_igbvf_pmd_init()) != 0) {
-   RTE_LOG(ERR, PMD, "Cannot init igbvf PMD\n");
-   return (ret);
-   }
-#endif /* RTE_LIBRTE_IGB_PMD */
-
-#ifdef RTE_LIBRTE_EM_PMD
-   if ((ret = rte_em_pmd_init()) != 0) {
-   RTE_LOG(ERR, PMD, "Cannot init em PMD\n");
-   return (ret);
-   }
-#endif /* RTE_LIBRTE_EM_PMD */
-
-#ifdef RTE_LIBRTE_IXGBE_PMD
-   if ((ret = rte_ixgbe_pmd_init()) != 0) {
-   RTE_LOG(ERR, PMD, "Cannot init ixgbe PMD\n");
-   return (ret);
-   }
-   if ((ret = rte_ixgbevf_pmd_init()) != 0) {
-   RTE_LOG(ERR, PMD, "Cannot init ixgbevf PMD\n");
-   return (ret);
-   }
-#endif /* RTE_LIBRTE_IXGBE_PMD */
-
-#ifdef RTE_LIBRTE_VIRTIO_PMD
-   if ((ret = rte_virtio_pmd_init()) != 0) {
-   RTE_LOG(ERR, PMD, "Cannot init virtio PMD\n");
-   return (ret);
-   }
-#endif /* RTE_LIBRTE_VIRTIO_PMD */
-
-#ifdef RTE_LIBRTE_VMXNET3_PMD
-   if ((ret = rte_vmxnet3_pmd_init()) != 0) {
-   RTE_LOG(ERR, PMD, "Cannot init vmxnet3 PMD\n");
-   return (ret);
-   }
-#endif /* RTE_LIBRTE_VMXNET3_PMD */
-
-   if (ret == -ENODEV)
-   RTE_LOG(ERR, PMD, "No PMD(s) are configured\n");
-   return (ret);
-}
+extern int rte_pmd_init_all(void);

 /**
  * Configure an Ethernet device.
-- 
1.8.3.1



  1   2   3   4   5   6   7   8   9   10   >