date:20140617

[dpdk-dev] [v2 00/23] Packet Framework

2014-06-17 Thread Thomas Monjalon

2014-06-04 19:08, Cristian Dumitrescu:
> > Intel DPDK Packet Framework provides a standard methodology (logically
> > similar to OpenFlow) for rapid development of complex packet processing
> > pipelines out of ports, tables and actions.
> >
> > A pipeline is constructed by connecting its input ports to its output
> > ports through a chain of lookup tables. As a result of the lookup operation
> > into the current table, one of the table entries (or the default table
> > entry, in case of lookup miss) is identified to provide the actions to
> > be executed on the current packet and the associated action meta-data.
> > The behavior of user actions is defined through the configurable table
> > action handler, while the reserved actions define the next hop for the
> > current packet (either another table, an output port or packet drop)
> > and are handled transparently by the framework.
> >
> > Three new Intel DPDK libraries are introduced for Packet Framework:
> > librte_port, librte_table, librte_pipeline.
> > Please check the Intel DPDK Programmer's Guide for full description
> > of the Packet Framework design.
> >
> > Two sample applications are provided for Packet Framework:
> > app/test-pipeline and examples/ip_pipeline.
> > Please check the Intel Sample Apps Guide for a detailed description
> > of how these sample apps work.
> 
> Acked by: Ivan Boule 

It was conflicting with vhost examples because of new logtype:
http://dpdk.org/browse/dpdk/commit/?id=7b79b2718f0d028cc0

I've ported fragmentation and reassembly ports to the new ip_frag library
instead of the duplicated code from the old example.

I've removed CONFIG_RTE_TEST_PIPELINE option. CONFIG_RTE_LIBRTE_PIPELINE
should be sufficient.
By the way, more build options conditioning could be needed in order to
disable some features (e.g. disabling LPM lib should silently skip LPM port).

Commit splitting has been reworked for atomicity, especially makefiles and
doxygen files.

Packet Framework is a big piece of code which is now applied to master branch
and should be ready for version 1.7.0.

Thanks a lot
-- 
Thomas

[dpdk-dev] [PATCH] rte_memory.h: include stdio.h for FILE

2014-06-17 Thread Xie, Huawei

Hi Shimamoto:
At least rte_tailq.h, rte_mbuf.h should also include stdio.h.

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Hiroshi Shimamoto
Sent: Thursday, June 12, 2014 4:11 PM
To: dev at dpdk.org
Cc: Hayato Momma
Subject: [dpdk-dev] [PATCH] rte_memory.h: include stdio.h for FILE

From: Hiroshi Shimamoto 

The below commit requires stdio FILE structure.

commit 591a9d7985c1230652d9f7ea1f9221e8c66ec188
Author: Stephen Hemminger 
Date:   Fri May 2 16:42:56 2014 -0700

add FILE argument to debug functions

An application which includes rte_memory.h without stdio.h will hit a
compilation failure.

/path/to/include/rte_memory.h:146:30: error: unknown type name 'FILE'
 void rte_dump_physmem_layout(FILE *f);

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 lib/librte_eal/common/include/rte_memory.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_eal/common/include/rte_memory.h 
b/lib/librte_eal/common/include/rte_memory.h
index 7f21244..4cf8ea9 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -42,6 +42,7 @@

 #include 
 #include 
+#include <stdio.h>

 #ifdef RTE_EXEC_ENV_LINUXAPP
 #include 
--
1.9.1
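
As a minimal sketch of why the include matters: any prototype that takes a
FILE * needs <stdio.h> to be visible first. The function below is a
hypothetical stand-in with the same shape as rte_dump_physmem_layout(FILE *f);
it is not DPDK code.

```c
#include <stdio.h>  /* provides FILE; without it the prototype below does not compile */

/*
 * Hypothetical dump helper (not a DPDK function): writes a one-line summary
 * to the given stream and returns the number of characters written.
 */
static int
dump_layout_sketch(FILE *f, unsigned int nsegs)
{
	return fprintf(f, "segments: %u\n", nsegs);
}
```

A header that declares such a function should include <stdio.h> itself, so
that callers do not depend on include order.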

[dpdk-dev] Need help to run l2fwd-ivshmem

2014-06-17 Thread GongJinrong

Hi, 

   Can anyone give some guidance on how to run l2fwd-ivshmem? I ran the host
binary with "host -c 7 -n 3", but got "EAL: No free hugepages reported in
hugepages-2048kB". How can I make l2fwd-ivshmem work? Another question: does
the guest receive data from the host binary?

Best Regards
John Gong

[dpdk-dev] Need help to run l2fwd-ivshmem

2014-06-17 Thread GongJinrong

The hugepages issue is solved, but now I get the message "EAL: No IVSHMEM
configuration found!". Does l2fwd-ivshmem need ovdk to create an ivshmem
port?


[dpdk-dev] vfio detection

2014-06-17 Thread Burakov, Anatoly

Hi Bruce,

> I have a number of NIC ports which were working correctly yesterday and are
> bound correctly to the igb_uio driver - and I want to keep using them
> through the igb_uio driver for now, not vfio. However, whenever I run a
> dpdk application today, I find that the vfio kernel module is getting loaded
> each time - even after I manually remove it, and verify that it has been
> removed by checking lsmod. Is this expected? If so, why are we loading the
> vfio driver when I just want to continue using igb_uio which works fine?

Can you elaborate a bit on what you mean by "loading vfio driver"? Do you
mean the vfio-pci kernel module gets loaded by DPDK? I certainly didn't put in
any code that would automatically load that driver, and certainly none that
binds devices to it.

> Secondly, then, when testpmd or any other app loads, it automatically tries
> to map the NIC using vfio and then aborts on the very first NIC port when it
> fails to do so.

This shouldn't happen, unless you have a device bound to VFIO and have another 
device in the same IOMMU group that is bound to something else. Can you provide 
a log of what you are seeing?

> This a) prevents the port from being mapped using igb_uio, and
> b) for ports which are meant to stay under linux control, forces me to start
> enumerating ports using blacklist or whitelisting, rather than having things
> "just work" on a properly configured system as before, i.e. if a port is bound
> to igb_uio or vfio it is used, if not bound, it is ignored. Again, is this by
> design and expected, because it seems a major regression in usability?

I think automatic port unbinding and binding was removed, so this again 
shouldn't happen at all.

It would be useful to have logs for all of these described situations, because 
we certainly didn't encounter any of that during the validation cycle.

Best regards,
Anatoly Burakov
DPDK SW Engineer

[dpdk-dev] [PATCH v2 0/7] add mtu and flow control handlers

2014-06-17 Thread David Marchand

Hello Konstantin,


On 06/16/2014 07:07 PM, Ananyev, Konstantin wrote:
>
> 1)  [PATCH v2 3/7] ethdev: store min rx buffer size
> @@ -879,6 +879,8 @@ rte_eth_rx_queue_setup(uint8_t port_id, uint16_t 
> rx_queue_id,
>  const struct rte_eth_rxconf *rx_conf,
>  struct rte_mempool *mp)
>   {
> ...
> + if (!ret) {
> + if (dev->data->min_rx_buf_size > mbp_buf_size)
> + dev->data->min_rx_buf_size = mbp_buf_size;
> + }
> +
> + return ret;
>
> Where do you set the initial value of min_rx_buf_size?
> Can't find it for some reason.

Hum, actually, dev->data structure is supposed to be set to 0 at init 
time or I missed something.

I would say this happens once for the whole rte_eth_dev_data array in 
rte_eth_dev_data_alloc() in primary process (first call to 
rte_eth_dev_allocate()).


>
> 2)  [PATCH v2 5/7] ethdev: add mtu accessors
> +static int
> +ixgbe_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
> +{
> ...
> + if (!dev->data->scattered_rx &&
> + frame_size > dev->data->min_rx_buf_size - RTE_PKTMBUF_HEADROOM)
> + return -EINVAL;
>
> Reading 82599 spec, 8.2.3.22.13 Max Frame Size - MAXFRS (0x04268; RW):
> "The MFS does not include the 4 bytes of the VLAN header. Packets with VLAN
> header can be as large as MFS + 4. When double VLAN is enabled, the device
> adds 8 to the MFS for any packets."
>
> So, I suppose it should be:
> if (!dev->data->scattered_rx &&
> frame_size + 2 * IXGBE_VLAN_TAG_SIZE >
> dev->data->min_rx_buf_size - RTE_PKTMBUF_HEADROOM)
>
> Like in ixgbe_dev_rx_init().

Ok, I forgot to take this part you mentioned earlier.
I will send an update later (depending on the points 1) and 3)).
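
The check suggested in point 2) can be sketched in isolation as follows. This
is only an illustration: the function name, the parameters and the
SKETCH_VLAN_TAG_SIZE constant are hypothetical stand-ins for the driver's own
fields and for IXGBE_VLAN_TAG_SIZE, and the headroom is passed in rather than
taken from RTE_PKTMBUF_HEADROOM.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed size of one VLAN tag in bytes, matching the "MFS + 4" wording above. */
#define SKETCH_VLAN_TAG_SIZE 4

/*
 * Return 0 if a frame of frame_size bytes, plus room for two VLAN tags,
 * fits in a single RX buffer, or -EINVAL otherwise. With scattered RX a
 * frame may span several buffers, so no limit is applied here.
 */
static int
check_frame_fits(uint32_t frame_size, uint32_t min_rx_buf_size,
		 uint32_t headroom, int scattered_rx)
{
	if (scattered_rx)
		return 0;
	/* Guard against unsigned underflow in the subtraction below. */
	if (min_rx_buf_size < headroom)
		return -EINVAL;
	if (frame_size + 2 * SKETCH_VLAN_TAG_SIZE > min_rx_buf_size - headroom)
		return -EINVAL;
	return 0;
}
```

For example, with a 2176-byte buffer and 128 bytes of headroom, a 2040-byte
frame is accepted (2040 + 8 = 2048) while a 2044-byte frame is rejected.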


>
> 3)  if ((mtu < 68) || (frame_size > dev_info.max_rx_pktlen))
> Can we add a new define for the min allowable MTU (68), as it is used in a
> few places?

RTE_IPV4_MIN_MTU then ?
I am not sure where this belongs, it could go in rte_ethdev.h.



-- 
David Marchand

[dpdk-dev] [PATCH] vfio: make container open error non-fatal

2014-06-17 Thread Burakov, Anatoly

Hi Bruce,

> The below patch is the quickest fix I found to make my applications work
> again, but I'm not sure it's the best solution. Can anyone else offer other
> suggestions to improve this?

Are you running things as root? If not, I suggest using the setup.sh
script to correct permissions on the VFIO container and seeing if that works.

The inability to open a container is likely a problem with permissions on
the container, and thus should be considered fatal as far as VFIO is concerned.
However, given that we try to use VFIO unconditionally, I think your suggestion
is a good solution to the problem, although I would also close the group fd's
that were already opened before returning 1.

Best regards,
Anatoly Burakov
DPDK SW Engineer

[dpdk-dev] [PATCH] vfio: make container open error non-fatal

2014-06-17 Thread Burakov, Anatoly

Hi Bruce,

> Hi Bruce,
> 
> > The below patch is the quickest fix I found to make my applications
> > work again, but I'm not sure it's the best solution. Can anyone else
> > offer other suggestions to improve this?
> 
> Are you running things as root? If not, I suggest to try and use the setup.sh
> script to correct permissions on the VFIO container and see if it works.
> 
> The inability of opening a container is likely a problem with permissions on
> the container, and thus should be considered fatal as far as VFIO is
> concerned. However, given that we try to use VFIO unconditionally, I think
> your suggestion is a good solution to the problem, however I would also
> close the group fd's that were already opened before returning 1.

On second thought, I think this may be better solved by checking access() on
the container. Right now I think PCI init checks for access on /dev/vfio (the
VFIO dir) but not /dev/vfio/vfio (the container). I will prepare a patch
shortly, so I would appreciate it if you self-NAKed yours :-)

Best regards,
Anatoly Burakov
DPDK SW Engineer
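
The access() idea from the message above can be sketched as follows. This is
an assumption-laden illustration, not the patch that was eventually posted:
the helper name and the SKETCH_VFIO_CONTAINER_PATH macro are hypothetical, and
only the path /dev/vfio/vfio comes from the message.

```c
#include <unistd.h>

/* Container path as named in the message above; the macro name is made up. */
#define SKETCH_VFIO_CONTAINER_PATH "/dev/vfio/vfio"

/*
 * Return 1 if the calling process may read and write the given path,
 * 0 otherwise (missing file, missing directory or wrong permissions).
 */
static int
vfio_container_accessible(const char *path)
{
	return access(path, R_OK | W_OK) == 0;
}
```

A caller could test vfio_container_accessible(SKETCH_VFIO_CONTAINER_PATH)
once at startup and leave VFIO disabled when it returns 0, rather than failing
later while mapping the first device.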

[dpdk-dev] [PATCH v2 0/7] add mtu and flow control handlers

2014-06-17 Thread Ananyev, Konstantin

Hi David,

>>
>> 1)  [PATCH v2 3/7] ethdev: store min rx buffer size
>> @@ -879,6 +879,8 @@ rte_eth_rx_queue_setup(uint8_t port_id, uint16_t 
>> rx_queue_id,
>> const struct rte_eth_rxconf *rx_conf,
>> struct rte_mempool *mp)
>>   {
>> ...
>> +if (!ret) {
>> +if (dev->data->min_rx_buf_size > mbp_buf_size)
>> +dev->data->min_rx_buf_size = mbp_buf_size;
>> +}
>> +
>> +return ret;
>>
>> Where do you set the initial value of min_rx_buf_size?
>> Can't find it by some reason.

>Hum, actually, dev->data structure is supposed to be set to 0 at init 
>time or I missed something.

>I would say this happens once for the whole rte_eth_dev_data array in 
>rte_eth_dev_data_alloc() in primary process (first call to 
>rte_eth_dev_allocate()).

Yes, I understand that it will be initialised to 0 together with the whole
dev->data.
But then, the condition:
if (dev->data->min_rx_buf_size > mbp_buf_size)
would never be true, and min_rx_buf_size would always remain 0?
I thought you need to initialise it with UINT32_MAX (or UINT16_MAX).
BTW, not a big deal, but I think uint16_t is enough for min_rx_buf_size.

>>
>> 3)  if ((mtu < 68) || (frame_size > dev_info.max_rx_pktlen))
>> Can we add a new define for min allowable MTU (68) as it used in few places.

>RTE_IPV4_MIN_MTU then ?

Sounds good to me.

>I am not sure where this belongs, it could go in rte_ethdev.h.

Probably rte_ether.h?

Konstantin

[dpdk-dev] [PATCH] vfio: open VFIO container at startup rather than during init

2014-06-17 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 4de6061..9eb5dcd 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -523,17 +523,6 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
rte_snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
loc->domain, loc->bus, loc->devid, loc->function);

-   /* get container fd (needs to be done only once per initialization) */
-   if (vfio_cfg.vfio_container_fd == -1) {
-   int vfio_container_fd = pci_vfio_get_container_fd();
-   if (vfio_container_fd < 0) {
-   RTE_LOG(ERR, EAL, "  %s cannot open VFIO container!\n", 
pci_addr);
-   return -1;
-   }
-
-   vfio_cfg.vfio_container_fd = vfio_container_fd;
-   }
-
/* get group number */
iommu_group_no = pci_vfio_get_group_no(pci_addr);

@@ -770,10 +759,10 @@ pci_vfio_enable(void)
vfio_cfg.vfio_groups[i].fd = -1;
vfio_cfg.vfio_groups[i].group_no = -1;
}
-   vfio_cfg.vfio_container_fd = -1;
+   vfio_cfg.vfio_container_fd = pci_vfio_get_container_fd();

/* check if we have VFIO driver enabled */
-   if (access(VFIO_DIR, F_OK) == 0)
+   if (vfio_cfg.vfio_container_fd != -1)
vfio_cfg.vfio_enabled = 1;
else
RTE_LOG(INFO, EAL, "VFIO driver not loaded or wrong 
permissions\n");
-- 
1.8.1.4

[dpdk-dev] [PATCH v2] ethdev: add Rx error counters for missed, badcrc and badlen packets

2014-06-17 Thread De Lara Guarch, Pablo

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, June 12, 2014 10:56 PM
> To: dev at dpdk.org
> Cc: david.marchand at 6wind.com; De Lara Guarch, Pablo; Ivan Boule
> Subject: [PATCH v2] ethdev: add Rx error counters for missed, badcrc and
> badlen packets
> 
> From: Ivan Boule 
> 
> Split input error stats to have a better understanding of why packets
> have been dropped.
> Keep ierrors field untouched for backward compatibility.
> 
> Signed-off-by: Ivan Boule 
> Signed-off-by: David Marchand 
> Signed-off-by: Thomas Monjalon 
> ---
>  app/test-pmd/config.c   | 24 +---
>  app/test-pmd/testpmd.c  | 34 --
>  examples/load_balancer/runtime.c|  2 +-
>  lib/librte_ether/rte_ethdev.h   |  3 +++
>  lib/librte_pmd_e1000/em_ethdev.c|  9 +++--
>  lib/librte_pmd_e1000/igb_ethdev.c   |  9 +++--
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 12 +---
>  7 files changed, 64 insertions(+), 29 deletions(-)
> 
> changes in v2:
> - fix alignments when displaying fwd and nic statistics in testpmd
> 
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index e0298c6..2137fd3 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -126,19 +126,29 @@ nic_stats_display(portid_t port_id)
>  nic_stats_border, port_id, nic_stats_border);
> 
>   if ((!port->rx_queue_stats_mapping_enabled) && (!port-
> >tx_queue_stats_mapping_enabled)) {
> - printf("  RX-packets: %-10"PRIu64" RX-errors: %-
> 10"PRIu64"RX-bytes: "
> -"%-"PRIu64"\n"
> -"  TX-packets: %-10"PRIu64" TX-errors: %-10"PRIu64"TX-
> bytes: "
> + printf("  RX-packets: %-10"PRIu64" RX-missed: %-10"PRIu64"
> RX-bytes:  "
> +"%-"PRIu64"\n",
> +stats.ipackets, stats.imissed, stats.ibytes);
> + printf("  RX-badcrc:  %-10"PRIu64" RX-badlen: %-10"PRIu64"
> RX-errors: "
> +"%-"PRIu64"\n",
> +stats.ibadcrc, stats.ibadlen, stats.ierrors);
> + printf("  RX-nombuf:  %-10"PRIu64"\n",
> +stats.rx_nombuf);
> + printf("  TX-packets: %-10"PRIu64" TX-errors: %-10"PRIu64"
> TX-bytes:  "
>  "%-"PRIu64"\n",
> -stats.ipackets, stats.ierrors, stats.ibytes,
>  stats.opackets, stats.oerrors, stats.obytes);
>   }
>   else {
>   printf("  RX-packets:  %10"PRIu64"RX-errors:
> %10"PRIu64
> -"RX-bytes: %10"PRIu64"\n"
> -"  TX-packets:  %10"PRIu64"TX-errors:
> %10"PRIu64
> +"RX-bytes: %10"PRIu64"\n",
> +stats.ipackets, stats.ierrors, stats.ibytes);
> + printf("  RX-badcrc:   %10"PRIu64"RX-badlen:
> %10"PRIu64
> +"  RX-errors:  %10"PRIu64"\n",
> +stats.ibadcrc, stats.ibadlen, stats.ierrors);
> + printf("  RX-nombuf:   %10"PRIu64"\n",
> +stats.rx_nombuf);
> + printf("  TX-packets:  %10"PRIu64"TX-errors:
> %10"PRIu64
>  "TX-bytes: %10"PRIu64"\n",
> -stats.ipackets, stats.ierrors, stats.ibytes,
>  stats.opackets, stats.oerrors, stats.obytes);
>   }
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 2529dc3..0727fb3 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -770,39 +770,45 @@ fwd_port_stats_display(portid_t port_id, struct
> rte_eth_stats *stats)
>   if ((!port->rx_queue_stats_mapping_enabled) && (!port-
> >tx_queue_stats_mapping_enabled)) {
>   printf("  RX-packets: %-14"PRIu64" RX-dropped: %-
> 14"PRIu64"RX-total: "
>  "%-"PRIu64"\n",
> -stats->ipackets, stats->ierrors,
> -(uint64_t) (stats->ipackets + stats->ierrors));
> +stats->ipackets, stats->imissed,
> +(uint64_t) (stats->ipackets + stats->imissed));
> 
>   if (cur_fwd_eng == &csum_fwd_engine)
>   printf("  Bad-ipcsum: %-14"PRIu64" Bad-l4csum: %-
> 14"PRIu64" \n",
>  port->rx_bad_ip_csum, port->rx_bad_l4_csum);
> + if (((stats->ierrors - stats->imissed) + stats->rx_nombuf) > 0) 
> {
> + printf("  RX-badcrc:  %-14"PRIu64" RX-badlen:  %-
> 14"PRIu64
> +"RX-error: %-"PRIu64"\n",
> +stats->ibadcrc, stats->ibadlen, stats->ierrors);
> + printf("  RX-nombufs: %-14"PRIu64"\n", stats-
> >rx_nombuf);
> + }
> 
>   printf("  TX-packets: %-14"PRIu64" TX-dropped: %-
> 14"PRIu64"TX-total: "
>  "%-"PRIu64"

[dpdk-dev] [PATCH v2 0/7] add mtu and flow control handlers

2014-06-17 Thread Thomas Monjalon

2014-06-17 08:57, Ananyev, Konstantin:
> >> 3)  if ((mtu < 68) || (frame_size > dev_info.max_rx_pktlen))
> >> Can we add a new define for min allowable MTU (68) as it used in few 
> >> places.
> 
> >RTE_IPV4_MIN_MTU then ?
> 
> Sounds good to me.
> 
> >I am not sure where this belongs, it could go in rte_ethdev.h.
> 
> Probably rte_ether.h?

As you know, rte_ether.h is for Ethernet definitions
(and should be located in librte_net).
For RTE_IPV4_MIN_MTU, I think librte_net/rte_ip.h is more appropriate.

-- 
Thomas

[dpdk-dev] [PATCH v2] ethdev: add Rx error counters for missed, badcrc and badlen packets

2014-06-17 Thread Thomas Monjalon

> > From: Ivan Boule 
> > 
> > Split input error stats to have a better understanding of why packets
> > have been dropped.
> > Keep ierrors field untouched for backward compatibility.
> > 
> > Signed-off-by: Ivan Boule 
> > Signed-off-by: David Marchand 
> > Signed-off-by: Thomas Monjalon 
> 
> Acked-by: Pablo de Lara 

Applied for version 1.7.0.

Thanks
-- 
Thomas

[dpdk-dev] [PATCH 0/7] Make DPDK tailqs fully local

2014-06-17 Thread Burakov, Anatoly

Found a few races, a v2 will be submitted shortly.

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Anatoly Burakov
> Sent: Friday, June 13, 2014 4:29 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 0/7] Make DPDK tailqs fully local

Best regards,
Anatoly Burakov
DPDK SW Engineer

[dpdk-dev] [PATCH v2] ixgbe: Fix for 82599 Bypass NIC, getting incorrect media type

2014-06-17 Thread Thomas Monjalon

> Function ixgbe_get_media_type_82599 returns media_type =
> ixgbe_media_type_unknown, when using an 82599 Bypass NIC,
> so that causes link status interrupt not to work properly.
> 
> change in v2: Fixed compilation error when RTE_NIC_BYPASS=n
> 
> Signed-off-by: Pablo de Lara 
> ---
>  lib/librte_pmd_ixgbe/ixgbe/ixgbe_82599.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)

Note that we shouldn't modify the Intel base driver.
Acked-by: Thomas Monjalon 

Applied for version 1.7.0.
Feel free to revert if someone thinks it's not acceptable.

-- 
Thomas

[dpdk-dev] [PATCH] examples/vmdq: Fix core id issue for TX burst

2014-06-17 Thread Thomas Monjalon

2014-06-12 15:10, Ouyang Changchun:
> This patch fixes a core id issue in the vmdq sample: when the core mask
> doesn't start with lcore_id 0 (but with 20, for instance), it should use
> core_id instead of lcore_id.
> 
> Signed-off-by: Ouyang Changchun 
> Tested-by: Waterman Cao 

Acked-by: Thomas Monjalon 

Applied for version 1.7.0.

Thanks
-- 
Thomas

[dpdk-dev] [PATCH] vfio: make container open error non-fatal

2014-06-17 Thread Neil Horman

On Mon, Jun 16, 2014 at 10:30:54PM +, Richardson, Bruce wrote:
> The below patch is the quickest fix I found to make my applications work 
> again, but I'm not sure it's the best solution. Can anyone else offer other 
> suggestions to improve this?
> 
> > -Original Message-
> > From: Richardson, Bruce
> > Sent: Monday, June 16, 2014 3:29 PM
> > To: dev at dpdk.org
> > Cc: Richardson, Bruce
> > Subject: [PATCH] vfio: make container open error non-fatal
> > 
> > When setting up an app to run using the uio driver, errors caused by
> > VFIO failures should not abruptly cause the app to fail.
> > 
> > Example: on a board with 8 ports bound to igb_uio module, and no VFIO
> > configuration, a testpmd run currently fails with:
> > 
> > EAL:   cannot open VFIO container!
> > EAL:   :04:00.0 cannot open VFIO container!
> > EAL: Error - exiting with code: 1
> >   Cause: Requested device :04:00.0 cannot be used
> > 
> > With this patch applied, the problem with VFIO is ignored and testpmd
> > successfully starts up - with ignored errors with vfio - as below:
> > 
> > EAL: PCI device :04:00.0 on NUMA socket 0
> > EAL:   probe driver: 8086:1521 rte_igb_pmd
> > EAL:   unknown IOMMU driver!
> > EAL:   :04:00.0 cannot open VFIO container!
> > EAL:   :04:00.0 not managed by UIO driver, skipping
> > <...scan results for other ports skipped...>
> > EAL: PCI device :8e:00.0 on NUMA socket 1
> > EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> > EAL:   unknown IOMMU driver!
> > EAL:   :8e:00.0 cannot open VFIO container!
> > EAL:   PCI memory mapped at 0x7ff4ff5fa000
> > EAL:   PCI memory mapped at 0x7ff4ff5f6000
> > EAL: PCI device :8e:00.1 on NUMA socket 1
> > EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> > EAL:   unknown IOMMU driver!
> > EAL:   :8e:00.1 cannot open VFIO container!
> > EAL:   PCI memory mapped at 0x7ff4ff4f6000
> > EAL:   PCI memory mapped at 0x7ff4ff4f2000
> > Interactive-mode selected
> > Configuring Port 0 (socket 0)
> > <...other 7 ports ...>
> > Checking link statuses...
> > Port 0 Link Up - speed 1 Mbps - full-duplex
> > Port 1 Link Down
> > Port 2 Link Up - speed 1 Mbps - full-duplex
> > Port 3 Link Down
> > Port 4 Link Up - speed 1 Mbps - full-duplex
> > Port 5 Link Down
> > Port 6 Link Up - speed 1 Mbps - full-duplex
> > Port 7 Link Down
> > Done
> > testpmd>
> > 
> > This issue is introduced by the VFIO patch set addition, specifically
> > commit ff0b67d1.
> > 
> > Signed-off-by: Bruce Richardson 
> > ---
> >  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > index 4de6061..4af38f6 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > @@ -528,7 +528,7 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
> > int vfio_container_fd = pci_vfio_get_container_fd();
> > if (vfio_container_fd < 0) {
> > RTE_LOG(ERR, EAL, "  %s cannot open VFIO
> > container!\n", pci_addr);
> > -   return -1;
> > +   return 1;
> > }
> > 
> > vfio_cfg.vfio_container_fd = vfio_container_fd;
> > --
> > 1.9.3
> 
> 

I think it would be preferable to convert the pci_vfio_get_container_fd function
to return not -1, but some -ERRNO value, so that the caller can differentiate
between fatal and non-fatal errors (for instance, not having any vfio container
seems non-fatal, but having one with an incompatible api version may be
unworkable).

Neil
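
A minimal sketch of the -ERRNO convention suggested above, outside of DPDK.
Both function names are hypothetical, and the choice of which errors count as
non-fatal (only a missing container here) is an assumption made for
illustration, not something decided in the thread.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Open the container at the given path. Returns a file descriptor (>= 0)
 * on success, or a negative errno value describing why the open failed.
 */
static int
open_container_sketch(const char *path)
{
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -errno;
	return fd;
}

/*
 * Classify a negative return value from open_container_sketch().
 * Assumption: a missing container just means VFIO is not in use, so it is
 * non-fatal; any other error is treated as fatal.
 */
static int
container_error_is_fatal(int err)
{
	return err != -ENOENT;
}
```

With this shape the caller can skip VFIO quietly when the container does not
exist, and still abort on errors that indicate a broken VFIO setup.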

[dpdk-dev] [PATCH v2 0/7] add mtu and flow control handlers

2014-06-17 Thread Ananyev, Konstantin

>As you know, rte_ether.h is for Ethernet definitions
>(and should be located in librte_net).
>For RTE_IPV4_MIN_MTU, I think librte_net/rte_ip.h is more appropriate.

Yes, it is.
Konstantin

[dpdk-dev] Unable to send Response packets to the same port

2014-06-17 Thread Tomasz K

Hello

We're currently testing an application based on L2FWD example.

1. The application is located on VM which has 2 VFs from 2 different PFs

2. One core simply polls both RX queues from VFs, makes simple message
processing and forwards the messages to appropriate TX queue of different
VF (so there is no multiple access to the same port)

3. However, sometimes processing a message fails, and the code needs to
send a Failure notification back to the port from which the message was
received.

The issue is that sometimes we see that packets are not being sent back
(even though rte_eth_tx_burst() is successful; checked with tcpdump on the
peer).
Instead the core receives the next packets and tries to send Failure
Indications again until it runs out of memory in the mempool.

One thing to note is that our app prioritizes latency over throughput, so
it always invokes rte_eth_tx_burst with only 1 packet to send (we
suspect this might be the issue here).

Has anyone encountered such an issue before?

Host Setup:
DL380p Gen8 Server Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz
Ubuntu 14.04: 3.13.0-24-generic
Intel 82599

VM Setup:
Ubuntu 14.04: 3.13.0-24-generic
2 VFs (each one from different PF)

[dpdk-dev] [PATCH v2 0/7] add mtu and flow control handlers

2014-06-17 Thread David Marchand

On 06/17/2014 10:57 AM, Ananyev, Konstantin wrote:
> Yes, I understand that it will be initialised to 0 together with whole 
> dev->data.
> But then, the condition:
> if (dev->data->min_rx_buf_size > mbp_buf_size)
> would never be true, and  min_rx_buf_size would always remain 0?
> I thought you need to initialise it with UINT32_MAX(or UINT16_MAX).
> BTW, not big deal, but I think uint16_t is enough for min_rx_buf_size.

- Oh, right...
We need a check on this :
if (!dev->data->min_rx_buf_size ||
 dev->data->min_rx_buf_size > mbp_buf_size)
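
The zero-means-unset check above can be sketched on its own as follows. The
struct is a hypothetical stand-in for the relevant field of the ethdev data,
not the real rte_eth_dev_data layout.

```c
#include <stdint.h>

/* Hypothetical stand-in for the part of the device data discussed here. */
struct dev_data_sketch {
	uint32_t min_rx_buf_size; /* 0 means "no RX queue configured yet" */
};

/*
 * Record the smallest RX buffer size seen so far. The first call always
 * stores its value, because a zero-initialised field is treated as unset.
 */
static void
update_min_rx_buf_size(struct dev_data_sketch *data, uint32_t mbp_buf_size)
{
	if (data->min_rx_buf_size == 0 ||
	    data->min_rx_buf_size > mbp_buf_size)
		data->min_rx_buf_size = mbp_buf_size;
}
```

Starting from a zeroed struct, calls with 2048, 4096 and 1024 leave the field
at 2048, 2048 and 1024 respectively, which avoids the need to pre-fill it
with UINT32_MAX.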


- Yep, uint16_t should be enough for min_rx_buf_size, but then, we might
want to update other places where bufsizes are compared to uint32_t as well.


- Actually, looking at the dev->data structure, there is something
suspicious to me.
From what I understood, secondary processes are not supposed to touch
dev->data, as it is shared between processes.
So I don't understand why rte_eth_dev_allocate() writes
dev->data->port_id, without looking at process type.

Likewise, later in rte_eth_dev_init(), where
eth_dev->data->rx_mbuf_alloc_failed is set to 0 (which should already be
set to 0 anyway).

I think a cleanup is required here but it can wait until 1.7 is out.
Plus, I am not sure we should let secondary processes use fdir calls, 
change vlan offloads etc...


>
>>>
>>> 3)  if ((mtu < 68) || (frame_size > dev_info.max_rx_pktlen))
>>> Can we add a new define for min allowable MTU (68) as it used in few places.
>
>> RTE_IPV4_MIN_MTU then ?
>
> Sounds good to me.
>
>> I am not sure where this belongs, it could go in rte_ethdev.h.
>
> Probably rte_ether.h?

Ok, I spoke to Ivan and Thomas off-list.
I propose to add the following definition in rte_ether.h :

#define ETHER_MIN_MTU 68
/**< Minimum MTU for IPv4 packets, see RFC 791. */

What do you think of this ?


-- 
David Marchand
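
As an illustration of how the proposed ETHER_MIN_MTU define would replace the
literal 68 in the check quoted in point 3), here is a small standalone sketch.
The function name and its parameters are hypothetical; only the define and the
shape of the condition come from the thread.

```c
#include <errno.h>
#include <stdint.h>

#define ETHER_MIN_MTU 68 /* as proposed above: minimum MTU for IPv4, RFC 791 */

/*
 * Return 0 if the requested MTU is acceptable, -EINVAL if it is below the
 * minimum or if the resulting frame exceeds what the device can receive.
 */
static int
validate_mtu_sketch(uint16_t mtu, uint32_t frame_size, uint32_t max_rx_pktlen)
{
	if (mtu < ETHER_MIN_MTU || frame_size > max_rx_pktlen)
		return -EINVAL;
	return 0;
}
```

For example, with max_rx_pktlen set to 1518, an MTU of 67 is rejected while an
MTU of 1500 with a frame size of 1518 is accepted.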

[dpdk-dev] Unable to send Response packets to the same port

2014-06-17 Thread Tomasz K

Update:

I forgot to mention that in our case, due to some internal constraints, our
code always allocates a new mbuf for each received packet and adds an
additional overhead PDU to it (both ways).

It seems that the problem lies with the same mempool being used. We've tried
to create another mempool for packet allocation, and it seems to be working
fine.

Thanks
Tomasz Kasowicz



[dpdk-dev] [PATCH 1/2] kni: fix build with kernel 3.16

2014-06-17 Thread Aaro Koskinen

SET_ETHTOOL_OPS is gone in 3.16, so modify drivers accordingly.

Signed-off-by: Aaro Koskinen 
---
 lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h   | 4 
 lib/librte_eal/linuxapp/kni/ethtool/ixgbe/kcompat.h | 5 +
 lib/librte_eal/linuxapp/kni/kni_ethtool.c   | 2 +-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h 
b/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
index 4c27d5d..f5e4435 100644
--- a/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
+++ b/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
@@ -3853,4 +3853,8 @@ skb_set_hash(struct sk_buff *skb, __u32 hash, 
__always_unused int type)
 #endif /* NETIF_F_RXHASH */
 #endif /* < 3.14.0 */

+#if ( LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0) )
+#define SET_ETHTOOL_OPS(netdev, ops) ((netdev)->ethtool_ops = (ops))
+#endif /* >= 3.16.0 */
+
 #endif /* _KCOMPAT_H_ */
diff --git a/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/kcompat.h 
b/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/kcompat.h
index 4126d14..5a6a770 100644
--- a/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/kcompat.h
+++ b/lib/librte_eal/linuxapp/kni/ethtool/ixgbe/kcompat.h
@@ -3136,4 +3136,9 @@ static inline int __kc_pci_vfs_assigned(struct pci_dev 
*dev)
 #define pci_vfs_assigned(dev) __kc_pci_vfs_assigned(dev)

 #endif
+
+#if ( LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0) )
+#define SET_ETHTOOL_OPS(netdev, ops) ((netdev)->ethtool_ops = (ops))
+#endif /* >= 3.16.0 */
+
 #endif /* _KCOMPAT_H_ */
diff --git a/lib/librte_eal/linuxapp/kni/kni_ethtool.c 
b/lib/librte_eal/linuxapp/kni/kni_ethtool.c
index d0673e5..06b6d46 100644
--- a/lib/librte_eal/linuxapp/kni/kni_ethtool.c
+++ b/lib/librte_eal/linuxapp/kni/kni_ethtool.c
@@ -213,5 +213,5 @@ struct ethtool_ops kni_ethtool_ops = {
 void
 kni_set_ethtool_ops(struct net_device *netdev)
 {
-   SET_ETHTOOL_OPS(netdev, &kni_ethtool_ops);
+   netdev->ethtool_ops = &kni_ethtool_ops;
 }
-- 
2.0.0
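
The version-gated compat macro used in the kcompat.h hunks above can be
sketched in user space as follows. Everything with a SKETCH or _sketch name is
hypothetical; only the encoding of the version number, which mirrors the
kernel's KERNEL_VERSION() macro, and the body of the macro come from the
patch.

```c
/* Same encoding as the kernel's KERNEL_VERSION() macro. */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Stand-in for LINUX_VERSION_CODE; assumed to be 3.16.0 for this sketch. */
#ifndef SKETCH_VERSION_CODE
#define SKETCH_VERSION_CODE SKETCH_KERNEL_VERSION(3, 16, 0)
#endif

/* Minimal stand-ins for the kernel structures involved. */
struct ethtool_ops_sketch { int placeholder; };
struct net_device_sketch { const struct ethtool_ops_sketch *ethtool_ops; };

/* From 3.16 on the kernel no longer provides the macro, so define it locally. */
#if SKETCH_VERSION_CODE >= SKETCH_KERNEL_VERSION(3, 16, 0)
#define SET_ETHTOOL_OPS_SKETCH(netdev, ops) ((netdev)->ethtool_ops = (ops))
#endif
```

Code that still calls the macro then keeps building on newer kernels, while
code changed to assign ethtool_ops directly, as kni_ethtool.c is in this
patch, does not need the macro at all.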

[dpdk-dev] [PATCH 2/2] kni: igb: modify rate configuration to support min/max rate fields

2014-06-17 Thread Aaro Koskinen

This follows the mainline Linux kernel commit
ed616689a3d95eb6c9bdbb1ef74b0f50cbdf276a (Add support to configure SR-IOV
VF minimum and maximum Tx rate) by Sucheta Chakraborty, and makes it possible
to build the driver against 3.16.

Signed-off-by: Aaro Koskinen 
---
 lib/librte_eal/linuxapp/kni/ethtool/igb/igb_main.c | 23 ++
 lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h  |  1 +
 2 files changed, 24 insertions(+)

diff --git a/lib/librte_eal/linuxapp/kni/ethtool/igb/igb_main.c 
b/lib/librte_eal/linuxapp/kni/ethtool/igb/igb_main.c
index 0657237..a802a02 100644
--- a/lib/librte_eal/linuxapp/kni/ethtool/igb/igb_main.c
+++ b/lib/librte_eal/linuxapp/kni/ethtool/igb/igb_main.c
@@ -200,7 +200,11 @@ static int igb_ndo_set_vf_vlan(struct net_device *netdev,
 static int igb_ndo_set_vf_spoofchk(struct net_device *netdev, int vf,
bool setting);
 #endif
+#ifdef HAVE_VF_MIN_MAX_TXRATE
+static int igb_ndo_set_vf_bw(struct net_device *, int, int, int);
+#else /* HAVE_VF_MIN_MAX_TXRATE */
 static int igb_ndo_set_vf_bw(struct net_device *netdev, int vf, int tx_rate);
+#endif /* HAVE_VF_MIN_MAX_TXRATE */
 static int igb_ndo_get_vf_config(struct net_device *netdev, int vf,
 struct ifla_vf_info *ivi);
 static void igb_check_vf_rate_limit(struct igb_adapter *);
@@ -2278,7 +2282,11 @@ static const struct net_device_ops igb_netdev_ops = {
 #ifdef IFLA_VF_MAX
.ndo_set_vf_mac = igb_ndo_set_vf_mac,
.ndo_set_vf_vlan= igb_ndo_set_vf_vlan,
+#ifdef HAVE_VF_MIN_MAX_TXRATE
+   .ndo_set_vf_rate= igb_ndo_set_vf_bw,
+#else /* HAVE_VF_MIN_MAX_TXRATE */
.ndo_set_vf_tx_rate = igb_ndo_set_vf_bw,
+#endif /* HAVE_VF_MIN_MAX_TXRATE */
.ndo_get_vf_config  = igb_ndo_get_vf_config,
 #ifdef HAVE_VF_SPOOFCHK_CONFIGURE
.ndo_set_vf_spoofchk= igb_ndo_set_vf_spoofchk,
@@ -9389,7 +9397,12 @@ static void igb_check_vf_rate_limit(struct igb_adapter *adapter)
}
 }

+#ifdef HAVE_VF_MIN_MAX_TXRATE
+static int igb_ndo_set_vf_bw(struct net_device *netdev, int vf, int min_tx_rate,
+int tx_rate)
+#else /* HAVE_VF_MIN_MAX_TXRATE */
 static int igb_ndo_set_vf_bw(struct net_device *netdev, int vf, int tx_rate)
+#endif /* HAVE_VF_MIN_MAX_TXRATE */
 {
struct igb_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
@@ -9398,6 +9411,11 @@ static int igb_ndo_set_vf_bw(struct net_device *netdev, int vf, int tx_rate)
if (hw->mac.type != e1000_82576)
return -EOPNOTSUPP;

+#ifdef HAVE_VF_MIN_MAX_TXRATE
+   if (min_tx_rate)
+   return -EINVAL;
+#endif /* HAVE_VF_MIN_MAX_TXRATE */
+
actual_link_speed = igb_link_mbps(adapter->link_speed);
if ((vf >= adapter->vfs_allocated_count) ||
(!(E1000_READ_REG(hw, E1000_STATUS) & E1000_STATUS_LU)) ||
@@ -9419,7 +9437,12 @@ static int igb_ndo_get_vf_config(struct net_device *netdev,
return -EINVAL;
ivi->vf = vf;
memcpy(&ivi->mac, adapter->vf_data[vf].vf_mac_addresses, ETH_ALEN);
+#ifdef HAVE_VF_MIN_MAX_TXRATE
+   ivi->max_tx_rate = adapter->vf_data[vf].tx_rate;
+   ivi->min_tx_rate = 0;
+#else /* HAVE_VF_MIN_MAX_TXRATE */
ivi->tx_rate = adapter->vf_data[vf].tx_rate;
+#endif /* HAVE_VF_MIN_MAX_TXRATE */
ivi->vlan = adapter->vf_data[vf].pf_vlan;
ivi->qos = adapter->vf_data[vf].pf_qos;
 #ifdef HAVE_VF_SPOOFCHK_CONFIGURE
diff --git a/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h b/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
index f5e4435..7c5d6ac 100644
--- a/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
+++ b/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
@@ -3855,6 +3855,7 @@ skb_set_hash(struct sk_buff *skb, __u32 hash, __always_unused int type)

 #if ( LINUX_VERSION_CODE >= KERNEL_VERSION(3,16,0) )
 #define SET_ETHTOOL_OPS(netdev, ops) ((netdev)->ethtool_ops = (ops))
+#define HAVE_VF_MIN_MAX_TXRATE 1
 #endif /* >= 3.16.0 */

 #endif /* _KCOMPAT_H_ */
-- 
2.0.0

[dpdk-dev] [PATCH v2 0/7] add mtu and flow control handlers

2014-06-17 Thread Ananyev, Konstantin

>- Actually, looking at the dev->data structure, there is something
>suspicious to me.
>From what I understood, secondary processes are not supposed to touch
>dev->data, as it is shared between processes.
>So I don't understand why rte_eth_dev_allocate() writes
>dev->data->port_id, without looking at process type.

It has been a while since I looked at that part...
But yes, it doesn't look right to me either.
As I remember, primary and secondary processes are supposed to have exactly
the same device list.
That's probably why it has been OK so far.

>Idem, later in rte_eth_dev_init(), where 
>eth_dev->data->rx_mbuf_alloc_failed is set to 0 (which should already be 
>set to 0 anyway).

>I think a cleanup is required here but it can wait until 1.7 is out.

Yes, agree.

>Plus, I am not sure we should let secondary processes use fdir calls, 
>change vlan offloads etc...


>Ok, I spoke to Ivan and Thomas off-list.
>I propose to add the following definition in rte_ether.h :

>#define ETHER_MIN_MTU 68
>/**< Minimum MTU for IPv4 packets, see RFC 791. */

>What do you think of this ?

That's fine too.

Konstantin

[dpdk-dev] [PATCH 9/9] rte_acl: make acl tailq fully local

2014-06-17 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 lib/librte_acl/acl.h |  1 -
 lib/librte_acl/rte_acl.c | 74 +++-
 2 files changed, 60 insertions(+), 15 deletions(-)

diff --git a/lib/librte_acl/acl.h b/lib/librte_acl/acl.h
index e6d7985..b9d63fd 100644
--- a/lib/librte_acl/acl.h
+++ b/lib/librte_acl/acl.h
@@ -149,7 +149,6 @@ struct rte_acl_bld_trie {
 };

 struct rte_acl_ctx {
-   TAILQ_ENTRY(rte_acl_ctx) next;/**< Next in list. */
char name[RTE_ACL_NAMESIZE];
/** Name of the ACL context. */
int32_t socket_id;
diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
index 129a41f..3b47ab6 100644
--- a/lib/librte_acl/rte_acl.c
+++ b/lib/librte_acl/rte_acl.c
@@ -36,13 +36,14 @@

 #define BIT_SIZEOF(x)   (sizeof(x) * CHAR_BIT)

-TAILQ_HEAD(rte_acl_list, rte_acl_ctx);
+TAILQ_HEAD(rte_acl_list, rte_tailq_entry);

 struct rte_acl_ctx *
 rte_acl_find_existing(const char *name)
 {
-   struct rte_acl_ctx *ctx;
+   struct rte_acl_ctx *ctx = NULL;
struct rte_acl_list *acl_list;
+   struct rte_tailq_entry *te;

/* check that we have an initialised tail queue */
acl_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_ACL, rte_acl_list);
@@ -52,27 +53,55 @@ rte_acl_find_existing(const char *name)
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(ctx, acl_list, next) {
+   TAILQ_FOREACH(te, acl_list, next) {
+   ctx = (struct rte_acl_ctx*) te->data;
if (strncmp(name, ctx->name, sizeof(ctx->name)) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (ctx == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }
return ctx;
 }

 void
 rte_acl_free(struct rte_acl_ctx *ctx)
 {
+   struct rte_acl_list *acl_list;
+   struct rte_tailq_entry *te;
+
if (ctx == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_ACL, rte_acl_list, ctx);
+   /* check that we have an initialised tail queue */
+   acl_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_ACL, rte_acl_list);
+   if (acl_list == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, acl_list, next) {
+   if (te->data == (void *) ctx)
+   break;
+   }
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(acl_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);

rte_free(ctx->mem);
rte_free(ctx);
+   rte_free(te);
 }

 struct rte_acl_ctx *
@@ -81,6 +110,7 @@ rte_acl_create(const struct rte_acl_param *param)
size_t sz;
struct rte_acl_ctx *ctx;
struct rte_acl_list *acl_list;
+   struct rte_tailq_entry *te;
char name[sizeof(ctx->name)];

/* check that we have an initialised tail queue */
@@ -105,15 +135,31 @@ rte_acl_create(const struct rte_acl_param *param)
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* if we already have one with that name */
-   TAILQ_FOREACH(ctx, acl_list, next) {
+   TAILQ_FOREACH(te, acl_list, next) {
+   ctx = (struct rte_acl_ctx*) te->data;
if (strncmp(param->name, ctx->name, sizeof(ctx->name)) == 0)
break;
}

/* if ACL with such name doesn't exist, then create a new one. */
-   if (ctx == NULL && (ctx = rte_zmalloc_socket(name, sz, CACHE_LINE_SIZE,
-   param->socket_id)) != NULL) {
+   if (te == NULL) {
+   ctx = NULL;
+   te = rte_zmalloc("ACL_TAILQ_ENTRY", sizeof(*te), 0);
+
+   if (te == NULL) {
+   RTE_LOG(ERR, ACL, "Cannot allocate tailq entry!\n");
+   goto exit;
+   }
+
+   ctx = rte_zmalloc_socket(name, sz, CACHE_LINE_SIZE, param->socket_id);

+   if (ctx == NULL) {
+   RTE_LOG(ERR, ACL,
+   "allocation of %zu bytes on socket %d for %s failed\n",
+   sz, param->socket_id, name);
+   rte_free(te);
+   goto exit;
+   }
/* init new allocated context. */
ctx->rules = ctx + 1;
ctx->max_rules = param->max_rule_num;
@@ -121,14 +167,12 @@ rte_acl_create(const struct rte_acl_param *param)
ctx->socket_id = param->socket_id;
rte_snprintf(ctx->name, sizeof(ctx->name), "%s", param->name);

-   TAILQ_INSERT_TAIL(acl_list, ctx, next);
+   te->data = (void *) ctx;

-   } else if (ctx == NU

[dpdk-dev] [PATCH 4/9] rte_hash: make rte_hash tailq fully local

2014-06-17 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 lib/librte_hash/rte_hash.c | 61 +++---
 lib/librte_hash/rte_hash.h |  2 --
 2 files changed, 52 insertions(+), 11 deletions(-)

diff --git a/lib/librte_hash/rte_hash.c b/lib/librte_hash/rte_hash.c
index d4221a8..eea5c01 100644
--- a/lib/librte_hash/rte_hash.c
+++ b/lib/librte_hash/rte_hash.c
@@ -60,7 +60,7 @@
 #include "rte_hash.h"


-TAILQ_HEAD(rte_hash_list, rte_hash);
+TAILQ_HEAD(rte_hash_list, rte_tailq_entry);

 /* Macro to enable/disable run-time checking of function parameters */
 #if defined(RTE_LIBRTE_HASH_DEBUG)
@@ -141,24 +141,29 @@ find_first(uint32_t sig, const uint32_t *sig_bucket, uint32_t num_sigs)
 struct rte_hash *
 rte_hash_find_existing(const char *name)
 {
-   struct rte_hash *h;
+   struct rte_hash *h = NULL;
+   struct rte_tailq_entry *te;
struct rte_hash_list *hash_list;

/* check that we have an initialised tail queue */
-   if ((hash_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_HASH, rte_hash_list)) == NULL) {
+   if ((hash_list =
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_HASH, rte_hash_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(h, hash_list, next) {
+   TAILQ_FOREACH(te, hash_list, next) {
+   h = (struct rte_hash *) te->data;
if (strncmp(name, h->name, RTE_HASH_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (h == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }
return h;
 }

@@ -166,6 +171,7 @@ struct rte_hash *
 rte_hash_create(const struct rte_hash_parameters *params)
 {
struct rte_hash *h = NULL;
+   struct rte_tailq_entry *te;
uint32_t num_buckets, sig_bucket_size, key_size,
hash_tbl_size, sig_tbl_size, key_tbl_size, mem_size;
char hash_name[RTE_HASH_NAMESIZE];
@@ -212,17 +218,25 @@ rte_hash_create(const struct rte_hash_parameters *params)
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* guarantee there's no existing */
-   TAILQ_FOREACH(h, hash_list, next) {
+   TAILQ_FOREACH(te, hash_list, next) {
+   h = (struct rte_hash *) te->data;
if (strncmp(params->name, h->name, RTE_HASH_NAMESIZE) == 0)
break;
}
-   if (h != NULL)
+   if (te != NULL)
+   goto exit;
+
+   te = rte_zmalloc("HASH_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, HASH, "tailq entry allocation failed\n");
goto exit;
+   }

h = (struct rte_hash *)rte_zmalloc_socket(hash_name, mem_size,
   CACHE_LINE_SIZE, params->socket_id);
if (h == NULL) {
RTE_LOG(ERR, HASH, "memory allocation failed\n");
+   rte_free(te);
goto exit;
}

@@ -242,7 +256,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->hash_func = (params->hash_func == NULL) ?
DEFAULT_HASH_FUNC : params->hash_func;

-   TAILQ_INSERT_TAIL(hash_list, h, next);
+   te->data = (void *) h;
+
+   TAILQ_INSERT_TAIL(hash_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -253,11 +269,38 @@ exit:
 void
 rte_hash_free(struct rte_hash *h)
 {
+   struct rte_tailq_entry *te;
+   struct rte_hash_list *hash_list;
+
if (h == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_HASH, rte_hash_list, h);
+   /* check that we have an initialised tail queue */
+   if ((hash_list =
+RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_HASH, rte_hash_list)) == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, hash_list, next) {
+   if (te->data == (void *) h)
+   break;
+   }
+
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(hash_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
rte_free(h);
+   rte_free(te);
 }

 static inline int32_t
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 5228e3a..2ecaf1a 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -86,8 +86,6 @@ struct rte_hash_parameters {

 /** A hash table structure. */
 struct rte_hash {
-   TAILQ_ENTRY(rte_hash) next;/**< Next in list. */
-
char name[RTE_HASH_NAMESIZE];   /**< Name of the hash. */
uint32_t entries;   /**< Total table entries. */
uint32_t buc

[dpdk-dev] [PATCH 0/9] Make DPDK tailqs fully local

2014-06-17 Thread Anatoly Burakov

This issue was reported by the OVS-DPDK project, and the fix should go to
upstream DPDK. This is not memnic-related - this is to do with
DPDK's rte_ivshmem library.

Every DPDK data structure has a corresponding TAILQ reserved for it in
the runtime config file. Those TAILQs are fully local to the process,
however most data structures contain pointers to the next entry in the
TAILQ.

Since data structures such as rings are shared in their entirety,
those TAILQ pointers are shared as well. This means that, after a
successful rte_ring creation, the tailq_next pointer of the last
ring in the TAILQ will be updated with a pointer to a ring which may
not be present in the address space of another process (i.e. a ring
that may be host-local or guest-local, and not shared over IVSHMEM).
Any subsequent ring create/lookup on the other side of IVSHMEM will
then try to dereference an invalid pointer.

This patchset fixes the problem by creating a default tailq entry
that may be used by any data structure that chooses to use TAILQs.
This default TAILQ entry will consist of tailq_next/tailq_prev
pointers and an opaque pointer to arbitrary data. All TAILQ
pointers in the data structures themselves will be removed and
replaced by those generic TAILQ entries, thus fixing the problem
of potentially exposing local address space to shared structures.

Technically, only the rte_ring structure requires modification, because
IVSHMEM only uses memzones (which aren't in TAILQs) and rings,
but for consistency's sake other TAILQ-based data structures were
adapted as well.
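
The generic entry described above can be sketched in plain C with
<sys/queue.h>. This is only an illustration under assumptions: struct
tailq_entry, struct shared_obj and the list_* helpers are hypothetical
names, calloc()/free() stand in for rte_zmalloc()/rte_free(), and the
RTE_EAL_TAILQ_RWLOCK locking used by the actual patches is left out.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Process-local list node: link pointers plus an opaque data pointer. */
struct tailq_entry {
	TAILQ_ENTRY(tailq_entry) next;
	void *data;
};
TAILQ_HEAD(entry_list, tailq_entry);

/* Stand-in for a shared object (ring, hash, ...): no list pointers inside. */
struct shared_obj {
	char name[32];
};

/* Wrap obj in a freshly allocated local entry and append it to the list. */
static int list_add(struct entry_list *list, struct shared_obj *obj)
{
	struct tailq_entry *te = calloc(1, sizeof(*te));

	if (te == NULL)
		return -1;
	te->data = obj;
	TAILQ_INSERT_TAIL(list, te, next);
	return 0;
}

/* Look an object up by name through the local entries. */
static struct shared_obj *list_find(struct entry_list *list, const char *name)
{
	struct tailq_entry *te;

	TAILQ_FOREACH(te, list, next) {
		struct shared_obj *obj = te->data;

		if (strncmp(name, obj->name, sizeof(obj->name)) == 0)
			return obj;
	}
	return NULL;
}

/* Find the entry that points at obj, unlink it and free only the entry. */
static int list_remove(struct entry_list *list, struct shared_obj *obj)
{
	struct tailq_entry *te;

	TAILQ_FOREACH(te, list, next) {
		if (te->data == obj)
			break;
	}
	if (te == NULL)
		return -1;
	TAILQ_REMOVE(list, te, next);
	free(te);
	return 0;
}
```

Because the link pointers live only in the process-local entry, the
shared object itself never carries an address that may be meaningless
to another process.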

v2 changes:
* fixed race conditions in *_free operations
* fixed multiprocess support for malloc heaps
* added similar changes for acl
* rebased on top of e88b42f818bc1a6d4ce6cb70371b66e37fa34f7d

Anatoly Burakov (9):
  eal: map shared config into exact same address as primary process
  rte_tailq: change rte_dummy to rte_tailq_entry, add data pointer
  rte_ring: make ring tailq fully local
  rte_hash: make rte_hash tailq fully local
  rte_fbk_hash: make rte_fbk_hash tailq fully local
  rte_mempool: make mempool tailq fully local
  rte_lpm: make lpm tailq fully local
  rte_lpm6: make lpm6 tailq fully local
  rte_acl: make acl tailq fully local

 app/test/test_tailq.c | 33 +-
 lib/librte_acl/acl.h  |  1 -
 lib/librte_acl/rte_acl.c  | 74 ++-
 lib/librte_eal/common/eal_common_tailqs.c |  2 +-
 lib/librte_eal/common/include/rte_eal_memconfig.h |  5 ++
 lib/librte_eal/common/include/rte_tailq.h |  9 +--
 lib/librte_eal/linuxapp/eal/eal.c | 31 --
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c | 17 +-
 lib/librte_hash/rte_fbk_hash.c| 73 +-
 lib/librte_hash/rte_fbk_hash.h|  3 -
 lib/librte_hash/rte_hash.c| 61 ---
 lib/librte_hash/rte_hash.h|  2 -
 lib/librte_lpm/rte_lpm.c  | 65 
 lib/librte_lpm/rte_lpm.h  |  2 -
 lib/librte_lpm/rte_lpm6.c | 62 +++
 lib/librte_mempool/Makefile   |  3 +-
 lib/librte_mempool/rte_mempool.c  | 37 +---
 lib/librte_mempool/rte_mempool.h  |  2 -
 lib/librte_ring/Makefile  |  4 +-
 lib/librte_ring/rte_ring.c| 33 +++---
 lib/librte_ring/rte_ring.h|  2 -
 21 files changed, 402 insertions(+), 119 deletions(-)

-- 
1.8.1.4

[dpdk-dev] [PATCH 8/9] rte_lpm6: make lpm6 tailq fully local

2014-06-17 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 lib/librte_lpm/rte_lpm6.c | 62 ++-
 1 file changed, 51 insertions(+), 11 deletions(-)

diff --git a/lib/librte_lpm/rte_lpm6.c b/lib/librte_lpm/rte_lpm6.c
index 56c74a1..73b48d0 100644
--- a/lib/librte_lpm/rte_lpm6.c
+++ b/lib/librte_lpm/rte_lpm6.c
@@ -77,7 +77,7 @@ enum valid_flag {
VALID
 };

-TAILQ_HEAD(rte_lpm6_list, rte_lpm6);
+TAILQ_HEAD(rte_lpm6_list, rte_tailq_entry);

 /** Tbl entry structure. It is the same for both tbl24 and tbl8 */
 struct rte_lpm6_tbl_entry {
@@ -99,8 +99,6 @@ struct rte_lpm6_rule {

 /** LPM6 structure. */
 struct rte_lpm6 {
-   TAILQ_ENTRY(rte_lpm6) next;  /**< Next in list. */
-
/* LPM metadata. */
char name[RTE_LPM6_NAMESIZE];/**< Name of the lpm. */
uint32_t max_rules;  /**< Max number of rules. */
@@ -149,6 +147,7 @@ rte_lpm6_create(const char *name, int socket_id,
 {
char mem_name[RTE_LPM6_NAMESIZE];
struct rte_lpm6 *lpm = NULL;
+   struct rte_tailq_entry *te;
uint64_t mem_size, rules_size;
struct rte_lpm6_list *lpm_list;

@@ -179,12 +178,20 @@ rte_lpm6_create(const char *name, int socket_id,
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* Guarantee there's no existing */
-   TAILQ_FOREACH(lpm, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   lpm = (struct rte_lpm6 *) te->data;
if (strncmp(name, lpm->name, RTE_LPM6_NAMESIZE) == 0)
break;
}
-   if (lpm != NULL)
+   if (te != NULL)
+   goto exit;
+
+   /* allocate tailq entry */
+   te = rte_zmalloc("LPM6_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, LPM, "Failed to allocate tailq entry!\n");
goto exit;
+   }

/* Allocate memory to store the LPM data structures. */
lpm = (struct rte_lpm6 *)rte_zmalloc_socket(mem_name, (size_t)mem_size,
@@ -192,6 +199,7 @@ rte_lpm6_create(const char *name, int socket_id,

if (lpm == NULL) {
RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
+   rte_free(te);
goto exit;
}

@@ -201,6 +209,7 @@ rte_lpm6_create(const char *name, int socket_id,
if (lpm->rules_tbl == NULL) {
RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
rte_free(lpm);
+   rte_free(te);
goto exit;
}

@@ -209,7 +218,9 @@ rte_lpm6_create(const char *name, int socket_id,
lpm->number_tbl8s = config->number_tbl8s;
rte_snprintf(lpm->name, sizeof(lpm->name), "%s", name);

-   TAILQ_INSERT_TAIL(lpm_list, lpm, next);
+   te->data = (void *) lpm;
+
+   TAILQ_INSERT_TAIL(lpm_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -223,7 +234,8 @@ exit:
 struct rte_lpm6 *
 rte_lpm6_find_existing(const char *name)
 {
-   struct rte_lpm6 *l;
+   struct rte_lpm6 *l = NULL;
+   struct rte_tailq_entry *te;
struct rte_lpm6_list *lpm_list;

/* Check that we have an initialised tail queue */
@@ -234,14 +246,17 @@ rte_lpm6_find_existing(const char *name)
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(l, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   l = (struct rte_lpm6 *) te->data;
if (strncmp(name, l->name, RTE_LPM6_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (l == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }

return l;
 }
@@ -252,13 +267,38 @@ rte_lpm6_find_existing(const char *name)
 void
 rte_lpm6_free(struct rte_lpm6 *lpm)
 {
+   struct rte_lpm6_list *lpm_list;
+   struct rte_tailq_entry *te;
+
/* Check user arguments. */
if (lpm == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_LPM6, rte_lpm6_list, lpm);
-   rte_free(lpm->rules_tbl);
+   /* check that we have an initialised tail queue */
+   if ((lpm_list =
+RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM6, rte_lpm6_list)) == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, lpm_list, next) {
+   if (te->data == (void *) lpm)
+   break;
+   }
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(lpm_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
rte_free(lpm);
+   rte_free(te);
 }

 /*
-- 
1.8.1.4

[dpdk-dev] [PATCH 5/9] rte_fbk_hash: make rte_fbk_hash tailq fully local

2014-06-17 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 lib/librte_hash/rte_fbk_hash.c | 73 ++
 lib/librte_hash/rte_fbk_hash.h |  3 --
 2 files changed, 59 insertions(+), 17 deletions(-)

diff --git a/lib/librte_hash/rte_fbk_hash.c b/lib/librte_hash/rte_fbk_hash.c
index 4d67554..1356cf4 100644
--- a/lib/librte_hash/rte_fbk_hash.c
+++ b/lib/librte_hash/rte_fbk_hash.c
@@ -54,7 +54,7 @@

 #include "rte_fbk_hash.h"

-TAILQ_HEAD(rte_fbk_hash_list, rte_fbk_hash_table);
+TAILQ_HEAD(rte_fbk_hash_list, rte_tailq_entry);

 /**
  * Performs a lookup for an existing hash table, and returns a pointer to
@@ -69,24 +69,29 @@ TAILQ_HEAD(rte_fbk_hash_list, rte_fbk_hash_table);
 struct rte_fbk_hash_table *
 rte_fbk_hash_find_existing(const char *name)
 {
-   struct rte_fbk_hash_table *h;
+   struct rte_fbk_hash_table *h = NULL;
+   struct rte_tailq_entry *te;
struct rte_fbk_hash_list *fbk_hash_list;

/* check that we have an initialised tail queue */
if ((fbk_hash_list =
-RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list)) == NULL) {
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH,
+   rte_fbk_hash_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(h, fbk_hash_list, next) {
+   TAILQ_FOREACH(te, fbk_hash_list, next) {
+   h = (struct rte_fbk_hash_table *) te->data;
if (strncmp(name, h->name, RTE_FBK_HASH_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);
-   if (h == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }
return h;
 }

@@ -104,6 +109,7 @@ struct rte_fbk_hash_table *
 rte_fbk_hash_create(const struct rte_fbk_hash_params *params)
 {
struct rte_fbk_hash_table *ht = NULL;
+   struct rte_tailq_entry *te;
char hash_name[RTE_FBK_HASH_NAMESIZE];
const uint32_t mem_size =
sizeof(*ht) + (sizeof(ht->t[0]) * params->entries);
@@ -112,7 +118,8 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params *params)

/* check that we have an initialised tail queue */
if ((fbk_hash_list =
-RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list)) == NULL) {
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH,
+   rte_fbk_hash_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}
@@ -134,20 +141,28 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params *params)
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* guarantee there's no existing */
-   TAILQ_FOREACH(ht, fbk_hash_list, next) {
+   TAILQ_FOREACH(te, fbk_hash_list, next) {
+   ht = (struct rte_fbk_hash_table *) te->data;
if (strncmp(params->name, ht->name, RTE_FBK_HASH_NAMESIZE) == 0)
break;
}
-   if (ht != NULL)
+   if (te != NULL)
goto exit;

+   te = rte_zmalloc("FBK_HASH_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, HASH, "Failed to allocate tailq entry\n");
+   goto exit;
+   }
+
/* Allocate memory for table. */
-   ht = (struct rte_fbk_hash_table *)rte_malloc_socket(hash_name, mem_size,
+   ht = (struct rte_fbk_hash_table *)rte_zmalloc_socket(hash_name, mem_size,
0, params->socket_id);
-   if (ht == NULL)
+   if (ht == NULL) {
+   RTE_LOG(ERR, HASH, "Failed to allocate fbk hash table\n");
+   rte_free(te);
goto exit;
-
-   memset(ht, 0, mem_size);
+   }

/* Set up hash table context. */
rte_snprintf(ht->name, sizeof(ht->name), "%s", params->name);
@@ -169,7 +184,9 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params *params)
ht->init_val = RTE_FBK_HASH_INIT_VAL_DEFAULT;
}

-   TAILQ_INSERT_TAIL(fbk_hash_list, ht, next);
+   te->data = (void *) ht;
+
+   TAILQ_INSERT_TAIL(fbk_hash_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -186,10 +203,38 @@ exit:
 void
 rte_fbk_hash_free(struct rte_fbk_hash_table *ht)
 {
+   struct rte_tailq_entry *te;
+   struct rte_fbk_hash_list *fbk_hash_list;
+
if (ht == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list, ht);
+   /* check that we have an initialised tail queue */
+   if ((fbk_hash_list =
+   RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH,
+   rte_fbk_hash_list)) == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   r

[dpdk-dev] [PATCH 3/9] rte_ring: make ring tailq fully local

2014-06-17 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/linuxapp/eal/eal_ivshmem.c | 17 ++--
 lib/librte_ring/Makefile  |  4 ++--
 lib/librte_ring/rte_ring.c| 33 +++
 lib/librte_ring/rte_ring.h|  2 --
 4 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
index 4ad76a7..fa5f4e3 100644
--- a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
+++ b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -101,7 +102,7 @@ static int memseg_idx;
 static int pagesz;

 /* Tailq heads to add rings to */
-TAILQ_HEAD(rte_ring_list, rte_ring);
+TAILQ_HEAD(rte_ring_list, rte_tailq_entry);

 /*
  * Utility functions
@@ -754,6 +755,7 @@ rte_eal_ivshmem_obj_init(void)
struct ivshmem_segment * seg;
struct rte_memzone * mz;
struct rte_ring * r;
+   struct rte_tailq_entry *te;
unsigned i, ms, idx;
uint64_t offset;

@@ -808,6 +810,8 @@ rte_eal_ivshmem_obj_init(void)
mcfg->memzone_idx++;
}

+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
/* find rings */
for (i = 0; i < mcfg->memzone_idx; i++) {
mz = &mcfg->memzone[i];
@@ -819,10 +823,19 @@ rte_eal_ivshmem_obj_init(void)

r = (struct rte_ring*) (mz->addr_64);

-   TAILQ_INSERT_TAIL(ring_list, r, next);
+   te = rte_zmalloc("RING_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, EAL, "Cannot allocate ring tailq entry!\n");
+   return -1;
+   }
+
+   te->data = (void *) r;
+
+   TAILQ_INSERT_TAIL(ring_list, te, next);

RTE_LOG(DEBUG, EAL, "Found ring: '%s' at %p\n", r->name, mz->addr);
}
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);

 #ifdef RTE_LIBRTE_IVSHMEM_DEBUG
rte_memzone_dump(stdout);
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 550507d..2380a43 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -42,7 +42,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h

-# this lib needs eal
-DEPDIRS-$(CONFIG_RTE_LIBRTE_RING) += lib/librte_eal
+# this lib needs eal and rte_malloc
+DEPDIRS-$(CONFIG_RTE_LIBRTE_RING) += lib/librte_eal lib/librte_malloc

 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 2fe4024..d2ff3fe 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -75,6 +75,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -89,7 +90,7 @@

 #include "rte_ring.h"

-TAILQ_HEAD(rte_ring_list, rte_ring);
+TAILQ_HEAD(rte_ring_list, rte_tailq_entry);

 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)
@@ -155,6 +156,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 {
char mz_name[RTE_MEMZONE_NAMESIZE];
struct rte_ring *r;
+   struct rte_tailq_entry *te;
const struct rte_memzone *mz;
ssize_t ring_size;
int mz_flags = 0;
@@ -173,6 +175,13 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
return NULL;
}

+   te = rte_zmalloc("RING_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, RING, "Cannot reserve memory for tailq\n");
+   rte_errno = ENOMEM;
+   return NULL;
+   }
+
rte_snprintf(mz_name, sizeof(mz_name), "%s%s", RTE_RING_MZ_PREFIX, name);

rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
@@ -186,10 +195,14 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
/* no need to check return value here, we already checked the
 * arguments above */
rte_ring_init(r, name, count, flags);
-   TAILQ_INSERT_TAIL(ring_list, r, next);
+
+   te->data = (void *) r;
+
+   TAILQ_INSERT_TAIL(ring_list, te, next);
} else {
r = NULL;
RTE_LOG(ERR, RING, "Cannot reserve memory\n");
+   rte_free(te);
}
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);

@@ -272,7 +285,7 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 void
 rte_ring_list_dump(FILE *f)
 {
-   const struct rte_ring *mp;
+   const struct rte_tailq_entry *te;
struct rte_ring_list *ring_list;

/* check that we have an initialised tail queue */
@@ -284,8 +297,8 @@ rte_ring_list_dump(FILE *f)

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);

-   TAILQ_FOREACH(mp, ring_list, next) {
-   rte_ring_dump(f, mp);
+   TAILQ_FOREACH(te, ring_list, next)

[dpdk-dev] [PATCH 6/9] rte_mempool: make mempool tailq fully local

2014-06-17 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 lib/librte_mempool/Makefile  |  3 ++-
 lib/librte_mempool/rte_mempool.c | 37 -
 lib/librte_mempool/rte_mempool.h |  2 --
 3 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index c79b306..9939e10 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -44,7 +44,8 @@ endif
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h

-# this lib needs eal
+# this lib needs eal, rte_ring and rte_malloc
 DEPDIRS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += lib/librte_eal lib/librte_ring
+DEPDIRS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += lib/librte_malloc

 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 7eebf7f..736e854 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -60,7 +61,7 @@

 #include "rte_mempool.h"

-TAILQ_HEAD(rte_mempool_list, rte_mempool);
+TAILQ_HEAD(rte_mempool_list, rte_tailq_entry);

 #define CACHE_FLUSHTHRESH_MULTIPLIER 1.5

@@ -404,6 +405,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
char mz_name[RTE_MEMZONE_NAMESIZE];
char rg_name[RTE_RING_NAMESIZE];
struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te;
struct rte_ring *r;
const struct rte_memzone *mz;
size_t mempool_size;
@@ -501,6 +503,13 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
}
}

+   /* try to allocate tailq entry */
+   te = rte_zmalloc("MEMPOOL_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, MEMPOOL, "Cannot allocate tailq entry!\n");
+   goto exit;
+   }
+
/*
 * If user provided an external memory buffer, then use it to
 * store mempool objects. Otherwise reserve memzone big enough to
@@ -527,8 +536,10 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
 * no more memory: in this case we loose previously reserved
 * space for the as we cannot free it
 */
-   if (mz == NULL)
+   if (mz == NULL) {
+   rte_free(te);
goto exit;
+   }

if (rte_eal_has_hugepages()) {
startaddr = (void*)mz->addr;
@@ -587,7 +598,9 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,

mempool_populate(mp, n, 1, obj_init, obj_init_arg);

-   RTE_EAL_TAILQ_INSERT_TAIL(RTE_TAILQ_MEMPOOL, rte_mempool_list, mp);
+   te->data = (void *) mp;
+
+   RTE_EAL_TAILQ_INSERT_TAIL(RTE_TAILQ_MEMPOOL, rte_mempool_list, te);

 exit:
rte_rwlock_write_unlock(RTE_EAL_MEMPOOL_RWLOCK);
@@ -812,6 +825,7 @@ void
 rte_mempool_list_dump(FILE *f)
 {
const struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te;
struct rte_mempool_list *mempool_list;

if ((mempool_list =
@@ -822,7 +836,8 @@ rte_mempool_list_dump(FILE *f)

rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK);

-   TAILQ_FOREACH(mp, mempool_list, next) {
+   TAILQ_FOREACH(te, mempool_list, next) {
+   mp = (struct rte_mempool *) te->data;
rte_mempool_dump(f, mp);
}

@@ -834,6 +849,7 @@ struct rte_mempool *
 rte_mempool_lookup(const char *name)
 {
struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te;
struct rte_mempool_list *mempool_list;

if ((mempool_list =
@@ -844,15 +860,18 @@ rte_mempool_lookup(const char *name)

rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK);

-   TAILQ_FOREACH(mp, mempool_list, next) {
+   TAILQ_FOREACH(te, mempool_list, next) {
+   mp = (struct rte_mempool *) te->data;
if (strncmp(name, mp->name, RTE_MEMPOOL_NAMESIZE) == 0)
break;
}

rte_rwlock_read_unlock(RTE_EAL_MEMPOOL_RWLOCK);

-   if (mp == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }

return mp;
 }
@@ -860,7 +879,7 @@ rte_mempool_lookup(const char *name)
 void rte_mempool_walk(void (*func)(const struct rte_mempool *, void *),
  void *arg)
 {
-   struct rte_mempool *mp = NULL;
+   struct rte_tailq_entry *te = NULL;
struct rte_mempool_list *mempool_list;

if ((mempool_list =
@@ -871,8 +890,8 @@ void rte_mempool_walk(void (*func)(const struct rte_mempool *, void *),

rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK);

-   TAILQ_FOREACH(mp, mempool_list, next) {
-   (*func)(mp, arg);
+   TAILQ_FOREACH(te, mempool_list, next) {
+   (*func)((struct rte_mempool *) te->data, arg);
}

rte_rwlock_read_unlock(RTE_EAL_MEMPOO

[dpdk-dev] [PATCH 1/9] eal: map shared config into exact same address as primary process

2014-06-17 Thread Anatoly Burakov

Shared config is shared across primary and secondary processes.
However, when using rte_malloc, the malloc elements keep references to
the heap inside themselves. This heap reference might not be referencing
a local heap because the heap reference points to the heap of whatever
process has allocated that malloc element. Therefore, there can be
situations when malloc elements in a given heap actually reference
different addresses for the same heap - depending on which process has
allocated the element. This can lead to segmentation faults when dealing
with malloc elements allocated on the same heap by different processes.

To fix this problem, heaps will now have the same addresses across
processes. In order to achieve that, a new field in a shared mem_config
(a structure that holds the heaps, and which is shared across processes)
was added to keep the address of where this config is mapped in the
primary process.

Secondary process will now map the config in two stages - first, it'll
map it into an arbitrary address and read the address the primary
process has allocated for the shared config. Then, the config is
unmapped and re-mapped using the address previously read.

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/include/rte_eal_memconfig.h |  5 
 lib/librte_eal/linuxapp/eal/eal.c | 31 +++
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 30ce6fc..d6359e5 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -89,6 +89,11 @@ struct rte_mem_config {

/* Heaps of Malloc per socket */
struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+
+   /* address of mem_config in primary process. used to map shared config into
+* exact same address the primary process maps it.
+*/
+   uint64_t mem_cfg_addr;
 } __attribute__((__packed__));


diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 6994303..fedd82f 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -239,6 +239,11 @@ rte_eal_config_create(void)
}
memcpy(rte_mem_cfg_addr, &early_mem_config, sizeof(early_mem_config));
rte_config.mem_config = (struct rte_mem_config *) rte_mem_cfg_addr;
+
+   /* store address of the config in the config itself so that secondary
+* processes could later map the config into this exact location */
+   rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr;
+
 }

 /* attach to an existing shared memory config */
@@ -246,6 +251,8 @@ static void
 rte_eal_config_attach(void)
 {
void *rte_mem_cfg_addr;
+   struct rte_mem_config *mem_config;
+
const char *pathname = eal_runtime_config_path();

if (internal_config.no_shconf)
@@ -257,13 +264,27 @@ rte_eal_config_attach(void)
rte_panic("Cannot open '%s' for rte_mem_config\n", pathname);
}

-   rte_mem_cfg_addr = mmap(NULL, sizeof(*rte_config.mem_config),
-   PROT_READ | PROT_WRITE, MAP_SHARED, mem_cfg_fd, 0);
-   close(mem_cfg_fd);
-   if (rte_mem_cfg_addr == MAP_FAILED)
+   /* map it as read-only first */
+   mem_config = (struct rte_mem_config *) mmap(NULL, sizeof(*mem_config),
+   PROT_READ, MAP_SHARED, mem_cfg_fd, 0);
+   if (mem_config == MAP_FAILED)
rte_panic("Cannot mmap memory for rte_config\n");

-   rte_config.mem_config = (struct rte_mem_config *) rte_mem_cfg_addr;
+   /* store address used by primary process */
+   rte_mem_cfg_addr = (void *) (uintptr_t) mem_config->mem_cfg_addr;
+
+   /* unmap the config */
+   munmap(mem_config, sizeof(*mem_config));
+
+   /* map the config again, with the proper virtual address */
+   mem_config = (struct rte_mem_config *) mmap(rte_mem_cfg_addr,
+   sizeof(*mem_config), PROT_READ | PROT_WRITE, MAP_SHARED,
+   mem_cfg_fd, 0);
+   if (mem_config == MAP_FAILED || mem_config != rte_mem_cfg_addr)
+   rte_panic("Cannot mmap memory for rte_config\n");
+   close(mem_cfg_fd);
+
+   rte_config.mem_config = mem_config;
 }

 /* Detect if we are a primary or a secondary process */
-- 
1.8.1.4
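
The two-stage mapping described in this patch can be sketched outside DPDK roughly as follows. This is a minimal illustration, not the EAL code: `struct shared_cfg`, `cfg_create()` and `cfg_attach()` are hypothetical names, and the structure holds only the address field plus padding. It assumes a POSIX system with `mmap()`. Because the second `mmap()` is called without `MAP_FIXED`, the requested address is only a hint, so the result is compared against it, as the patch does.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for struct rte_mem_config. */
struct shared_cfg {
	uint64_t cfg_addr;  /* where the creator mapped this structure */
	char payload[4096 - sizeof(uint64_t)];
};

/* "Primary" side: map the file anywhere and record the chosen address. */
static struct shared_cfg *
cfg_create(int fd)
{
	struct shared_cfg *cfg;

	if (ftruncate(fd, sizeof(*cfg)) < 0)
		return NULL;
	cfg = mmap(NULL, sizeof(*cfg), PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, 0);
	if (cfg == MAP_FAILED)
		return NULL;
	cfg->cfg_addr = (uintptr_t) cfg;
	return cfg;
}

/* "Secondary" side: peek at the address read-only, then remap there. */
static struct shared_cfg *
cfg_attach(int fd)
{
	struct shared_cfg *cfg;
	void *wanted;

	/* stage 1: map at an arbitrary address, read the recorded one */
	cfg = mmap(NULL, sizeof(*cfg), PROT_READ, MAP_SHARED, fd, 0);
	if (cfg == MAP_FAILED)
		return NULL;
	wanted = (void *) (uintptr_t) cfg->cfg_addr;
	munmap(cfg, sizeof(*cfg));

	/* stage 2: map again, asking for the recorded address */
	cfg = mmap(wanted, sizeof(*cfg), PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, 0);
	if (cfg == MAP_FAILED)
		return NULL;
	if (cfg != wanted) {
		/* the kernel placed the mapping elsewhere: give up */
		munmap(cfg, sizeof(*cfg));
		return NULL;
	}
	assert(cfg->cfg_addr == (uintptr_t) cfg);
	return cfg;
}
```

The sketch returns NULL where the patch calls rte_panic(), so that a caller can decide how to react when the address is already occupied in the attaching process.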

[dpdk-dev] [PATCH 2/9] rte_tailq: change rte_dummy to rte_tailq_entry, add data pointer

2014-06-17 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 app/test/test_tailq.c | 33 ---
 lib/librte_eal/common/eal_common_tailqs.c |  2 +-
 lib/librte_eal/common/include/rte_tailq.h |  9 +
 3 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/app/test/test_tailq.c b/app/test/test_tailq.c
index 67da009..c9b53ee 100644
--- a/app/test/test_tailq.c
+++ b/app/test/test_tailq.c
@@ -52,16 +52,16 @@

 #define DEFAULT_TAILQ (RTE_TAILQ_NUM)

-static struct rte_dummy d_elem;
+static struct rte_tailq_entry d_elem;

 static int
 test_tailq_create(void)
 {
-   struct rte_dummy_head *d_head;
+   struct rte_tailq_entry_head *d_head;
unsigned i;

/* create a first tailq and check its non-null */
-   d_head = RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error allocating dummy_q0\n");

@@ -70,13 +70,14 @@ test_tailq_create(void)
TAILQ_INSERT_TAIL(d_head, &d_elem, next);

/* try allocating dummy_q0 again, and check for failure */
-   if (RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_dummy_head) == NULL)
+   if (RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head) == NULL)
do_return("Error, non-null result returned when attemption to "
"re-allocate a tailq\n");

/* now fill up the tailq slots available and check we get an error */
for (i = RTE_TAILQ_NUM; i < RTE_MAX_TAILQ; i++){
-   if ((d_head = RTE_TAILQ_RESERVE_BY_IDX(i, rte_dummy_head)) == NULL)
+   if ((d_head = RTE_TAILQ_RESERVE_BY_IDX(i,
+   rte_tailq_entry_head)) == NULL)
break;
}

@@ -91,10 +92,10 @@ static int
 test_tailq_lookup(void)
 {
/* run successful  test - check result is found */
-   struct rte_dummy_head *d_head;
-   struct rte_dummy *d_ptr;
+   struct rte_tailq_entry_head *d_head;
+   struct rte_tailq_entry *d_ptr;

-   d_head = RTE_TAILQ_LOOKUP_BY_IDX(DEFAULT_TAILQ, rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error with tailq lookup\n");

@@ -104,7 +105,7 @@ test_tailq_lookup(void)
"expected element not found\n");

/* now try a bad/error lookup */
-   d_head = RTE_TAILQ_LOOKUP_BY_IDX(RTE_MAX_TAILQ, rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP_BY_IDX(RTE_MAX_TAILQ, rte_tailq_entry_head);
if (d_head != NULL)
do_return("Error, lookup does not return NULL for bad tailq name\n");

@@ -115,7 +116,7 @@ test_tailq_lookup(void)
 static int
 test_tailq_deprecated(void)
 {
-   struct rte_dummy_head *d_head;
+   struct rte_tailq_entry_head *d_head;

/* since TAILQ_RESERVE is not able to create new tailqs,
 * we should find an existing one (IOW, RTE_TAILQ_RESERVE behaves identical
@@ -123,29 +124,29 @@ test_tailq_deprecated(void)
 *
 * PCI_RESOURCE_LIST tailq is guaranteed to
 * be present in any DPDK app. */
-   d_head = RTE_TAILQ_RESERVE("PCI_RESOURCE_LIST", rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE("PCI_RESOURCE_LIST", rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error finding PCI_RESOURCE_LIST\n");

-   d_head = RTE_TAILQ_LOOKUP("PCI_RESOURCE_LIST", rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP("PCI_RESOURCE_LIST", rte_tailq_entry_head);
if (d_head == NULL)
do_return("Error finding PCI_RESOURCE_LIST\n");

/* try doing that with non-existent names */
-   d_head = RTE_TAILQ_RESERVE("random name", rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE("random name", rte_tailq_entry_head);
if (d_head != NULL)
do_return("Non-existent tailq found!\n");

-   d_head = RTE_TAILQ_LOOKUP("random name", rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP("random name", rte_tailq_entry_head);
if (d_head != NULL)
do_return("Non-existent tailq found!\n");

/* try doing the same with NULL names */
-   d_head = RTE_TAILQ_RESERVE(NULL, rte_dummy_head);
+   d_head = RTE_TAILQ_RESERVE(NULL, rte_tailq_entry_head);
if (d_head != NULL)
do_return("NULL tailq found!\n");

-   d_head = RTE_TAILQ_LOOKUP(NULL, rte_dummy_head);
+   d_head = RTE_TAILQ_LOOKUP(NULL, rte_tailq_entry_head);
if (d_head != NULL)
do_return("NULL tailq found!\n");

diff --git a/lib/librte_eal/common/eal_common_tailqs.c b/lib/librte_eal/common/eal_common_tailqs.c
index f294a58..db9a185 100644
--- a/lib/librte_eal/common/eal_common_tailqs.c
+++ b/lib/librte_eal/common/eal_common_tailqs.c
@@ -118,7 +118,7 @@ rte_dump_tailq(FILE *f)
rte_rwlock_rea

[dpdk-dev] [PATCH 7/9] rte_lpm: make lpm tailq fully local

2014-06-17 Thread Anatoly Burakov


Signed-off-by: Anatoly Burakov 
---
 lib/librte_lpm/rte_lpm.c | 65 
 lib/librte_lpm/rte_lpm.h |  2 --
 2 files changed, 54 insertions(+), 13 deletions(-)

diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 592750e..6a49d43 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -56,7 +56,7 @@

 #include "rte_lpm.h"

-TAILQ_HEAD(rte_lpm_list, rte_lpm);
+TAILQ_HEAD(rte_lpm_list, rte_tailq_entry);

 #define MAX_DEPTH_TBL24 24

@@ -118,24 +118,29 @@ depth_to_range(uint8_t depth)
 struct rte_lpm *
 rte_lpm_find_existing(const char *name)
 {
-   struct rte_lpm *l;
+   struct rte_lpm *l = NULL;
+   struct rte_tailq_entry *te;
struct rte_lpm_list *lpm_list;

/* check that we have an initialised tail queue */
-   if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) {
+   if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM,
+   rte_lpm_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}

rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
-   TAILQ_FOREACH(l, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   l = (struct rte_lpm *) te->data;
if (strncmp(name, l->name, RTE_LPM_NAMESIZE) == 0)
break;
}
rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);

-   if (l == NULL)
+   if (te == NULL) {
rte_errno = ENOENT;
+   return NULL;
+   }

return l;
 }
@@ -149,12 +154,13 @@ rte_lpm_create(const char *name, int socket_id, int max_rules,
 {
char mem_name[RTE_LPM_NAMESIZE];
struct rte_lpm *lpm = NULL;
+   struct rte_tailq_entry *te;
uint32_t mem_size;
struct rte_lpm_list *lpm_list;

/* check that we have an initialised tail queue */
-   if ((lpm_list =
-RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) {
+   if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM,
+   rte_lpm_list)) == NULL) {
rte_errno = E_RTE_NO_TAILQ;
return NULL;
}
@@ -176,18 +182,27 @@ rte_lpm_create(const char *name, int socket_id, int max_rules,
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);

/* guarantee there's no existing */
-   TAILQ_FOREACH(lpm, lpm_list, next) {
+   TAILQ_FOREACH(te, lpm_list, next) {
+   lpm = (struct rte_lpm *) te->data;
if (strncmp(name, lpm->name, RTE_LPM_NAMESIZE) == 0)
break;
}
-   if (lpm != NULL)
+   if (te != NULL)
goto exit;

+   /* allocate tailq entry */
+   te = rte_zmalloc("LPM_TAILQ_ENTRY", sizeof(*te), 0);
+   if (te == NULL) {
+   RTE_LOG(ERR, LPM, "Failed to allocate tailq entry\n");
+   goto exit;
+   }
+
/* Allocate memory to store the LPM data structures. */
lpm = (struct rte_lpm *)rte_zmalloc_socket(mem_name, mem_size,
CACHE_LINE_SIZE, socket_id);
if (lpm == NULL) {
RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
+   rte_free(te);
goto exit;
}

@@ -195,7 +210,9 @@ rte_lpm_create(const char *name, int socket_id, int max_rules,
lpm->max_rules = max_rules;
rte_snprintf(lpm->name, sizeof(lpm->name), "%s", name);

-   TAILQ_INSERT_TAIL(lpm_list, lpm, next);
+   te->data = (void *) lpm;
+
+   TAILQ_INSERT_TAIL(lpm_list, te, next);

 exit:
rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -209,12 +226,38 @@ exit:
 void
 rte_lpm_free(struct rte_lpm *lpm)
 {
+   struct rte_lpm_list *lpm_list;
+   struct rte_tailq_entry *te;
+
/* Check user arguments. */
if (lpm == NULL)
return;

-   RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_LPM, rte_lpm_list, lpm);
+   /* check that we have an initialised tail queue */
+   if ((lpm_list =
+RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) {
+   rte_errno = E_RTE_NO_TAILQ;
+   return;
+   }
+
+   rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+   /* find our tailq entry */
+   TAILQ_FOREACH(te, lpm_list, next) {
+   if (te->data == (void *) lpm)
+   break;
+   }
+   if (te == NULL) {
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+   return;
+   }
+
+   TAILQ_REMOVE(lpm_list, te, next);
+
+   rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
rte_free(lpm);
+   rte_free(te);
 }

 /*
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index d35565d..308f5ef 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -132,8 +132,6 @@ struct rte_lpm_rule_info {

 /** @internal LPM structure. */
 struct rte_lpm
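
The pattern this patch applies to rte_lpm (a generic list node that carries a data pointer, with the object itself no longer linked into the list) can be sketched with plain sys/queue.h as follows. This is only an illustration under stated assumptions: `struct entry`, `struct table` and the `list_*` helpers are hypothetical names, `calloc`/`free` stand in for rte_zmalloc/rte_free, and the read/write locking done in the patch is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Generic list node, analogous to rte_tailq_entry: it only points at
 * an object, it does not contain one. */
struct entry {
	TAILQ_ENTRY(entry) next;
	void *data;
};
TAILQ_HEAD(entry_list, entry);

/* Hypothetical object type standing in for struct rte_lpm. */
struct table {
	char name[32];
};

/* Allocate a node for obj and append it; returns 0 on success. */
static int
list_add(struct entry_list *list, struct table *obj)
{
	struct entry *te;

	assert(list != NULL && obj != NULL);
	te = calloc(1, sizeof(*te));
	if (te == NULL)
		return -1;
	te->data = obj;
	TAILQ_INSERT_TAIL(list, te, next);
	return 0;
}

/* Look an object up by name; returns NULL if it is not listed. */
static struct table *
list_find(struct entry_list *list, const char *name)
{
	struct entry *te;

	TAILQ_FOREACH(te, list, next) {
		struct table *t = te->data;

		if (strncmp(name, t->name, sizeof(t->name)) == 0)
			return t;
	}
	return NULL;
}

/* Unlink and free the node pointing at obj. The object itself stays
 * with the caller. Returns 0 if a node was removed, -1 otherwise. */
static int
list_remove(struct entry_list *list, const struct table *obj)
{
	struct entry *te;

	TAILQ_FOREACH(te, list, next) {
		if (te->data == (const void *) obj)
			break;
	}
	if (te == NULL)
		return -1;
	TAILQ_REMOVE(list, te, next);
	free(te);
	return 0;
}
```

Returning from inside the loop in `list_find()` sidesteps the case the patch handles by testing `te == NULL` after the loop: once the loop has run to completion, the object pointer still holds the last entry examined, so it cannot be used to tell a miss from a hit.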

[dpdk-dev] [PATCH v2] malloc: fix malloc and free linear complexity

2014-06-17 Thread De Lara Guarch, Pablo

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of
> rsanford2 at gmail.com
> Sent: Friday, May 16, 2014 1:59 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v2] malloc: fix malloc and free linear complexity
> 
> Problems with lib rte_malloc:
>  1. Rte_malloc searches a heap's entire free list looking for
> the best fit, resulting in linear complexity.
>  2. Heaps store free blocks in a singly-linked list, resulting
> in linear complexity when rte_free needs to remove an
> adjacent block.
>  3. The library inserts and removes free blocks with ad hoc,
> in-line code, rather than using linked-list functions or
> macros.
> 
> This patch addresses those problems as follows:
>  1. Replace single free list with a handful of free lists.
> Each free list contains blocks of a specified size range,
> for example:
>   list[0]: (0   , 2^7]
>   list[1]: (2^7 , 2^9]
>   list[2]: (2^9 , 2^11]
>   list[3]: (2^11, 2^13]
>   list[4]: (2^13, MAX_SIZE]
> 
> When allocating a block, start at the first list that can
> contain a big enough block. Search subsequent lists, if
> necessary. Terminate the search as soon as we find a block
> that is big enough.
>  2. Use doubly-linked lists, so that we can remove free blocks
> in constant time.
>  3. Use BSD LIST macros, as defined in sys/queue.h and the
> QUEUE(3) man page.
> 
> Signed-off-by: Robert Sanford 
> ---
>  lib/librte_eal/common/include/rte_malloc_heap.h |6 +-
>  lib/librte_malloc/malloc_elem.c |  121 
> +++
>  lib/librte_malloc/malloc_elem.h |   17 +++-
>  lib/librte_malloc/malloc_heap.c |   67 ++---
>  4 files changed, 128 insertions(+), 83 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h
> b/lib/librte_eal/common/include/rte_malloc_heap.h
> index 5e139cf..1f5d653 100644
> --- a/lib/librte_eal/common/include/rte_malloc_heap.h
> +++ b/lib/librte_eal/common/include/rte_malloc_heap.h
> @@ -35,14 +35,18 @@
>  #define _RTE_MALLOC_HEAP_H_
> 
>  #include 
> +#include 
>  #include 
> 
> +/* Number of free lists per heap, grouped by size. */
> +#define RTE_HEAP_NUM_FREELISTS  5
> +
>  /**
>   * Structure to hold malloc heap
>   */
>  struct malloc_heap {
>   rte_spinlock_t lock;
> - struct malloc_elem * volatile free_head;
> + LIST_HEAD(, malloc_elem) free_head[RTE_HEAP_NUM_FREELISTS];
>   unsigned mz_count;
>   unsigned alloc_count;
>   size_t total_size;
> diff --git a/lib/librte_malloc/malloc_elem.c b/lib/librte_malloc/malloc_elem.c
> index f0da640..13cd5d3 100644
> --- a/lib/librte_malloc/malloc_elem.c
> +++ b/lib/librte_malloc/malloc_elem.c
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include 
> @@ -60,7 +61,8 @@ malloc_elem_init(struct malloc_elem *elem,
>  {
>   elem->heap = heap;
>   elem->mz = mz;
> - elem->prev = elem->next_free = NULL;
> + elem->prev = NULL;
> + memset(&elem->free_list, 0, sizeof(elem->free_list));
>   elem->state = ELEM_FREE;
>   elem->size = size;
>   elem->pad = 0;
> @@ -125,14 +127,71 @@ split_elem(struct malloc_elem *elem, struct malloc_elem *split_pt)
>  }
> 
>  /*
> + * Given an element size, compute its freelist index.
> + * We free an element into the freelist containing similarly-sized elements.
> + * We try to allocate elements starting with the freelist containing
> + * similarly-sized elements, and if necessary, we search freelists
> + * containing larger elements.
> + *
> + * Example element size ranges for a heap with five free lists:
> + *   heap->free_head[0] - (0   , 2^7]
> + *   heap->free_head[1] - (2^7 , 2^9]
> + *   heap->free_head[2] - (2^9 , 2^11]
> + *   heap->free_head[3] - (2^11, 2^13]
> + *   heap->free_head[4] - (2^13, MAX_SIZE]
> + */
> +size_t
> +malloc_elem_free_list_index(size_t size)
> +{
> +#define MALLOC_MINSIZE_LOG2   7
> +#define MALLOC_LOG2_INCREMENT 2
> +
> + size_t log2;
> + size_t index;
> +
> + if (size <= (1UL << MALLOC_MINSIZE_LOG2))
> + return 0;
> +
> + /* Find next power of 2 >= size. */
> + log2 = sizeof(size) * 8 - __builtin_clzl(size-1);
> +
> + /* Compute freelist index, based on log2(size). */
> + index = (log2 - MALLOC_MINSIZE_LOG2 + MALLOC_LOG2_INCREMENT - 1) /
> + MALLOC_LOG2_INCREMENT;
> +
> + return (index <= RTE_HEAP_NUM_FREELISTS-1?
> + index: RTE_HEAP_NUM_FREELISTS-1);
> +}
> +
> +/*
> + * Add the specified element to its heap's free list.
> + */
> +void
> +malloc_elem_free_list_insert(struct malloc_elem *elem)
> +{
> + size_t idx = malloc_elem_free_list_index(elem->size - MALLOC_ELEM_HEADER_LEN);
> +
> + elem->state = ELEM_FREE;
> + LIST_INSERT_HEAD(&elem->heap->free_head[idx], elem, free_list);
> +}
> +
> +/*
> + * Remove the specified element from its heap's free list.
> + */
> +stati

[dpdk-dev] [PATCH v2 00/27] Add i40e PMD support

2014-06-17 Thread Thomas Monjalon

> This 2nd version of the patch series adds i40e PMD support.
> It contains the updated basic shared code and some other enhancements.
> It adds support for the latest version of the firmware.
> * Add a new i40e PMD driver in the librte_pmd_i40e folder
> * Add some necessary definitions and changes in rte_mbuf.h and eth_dev
> * Add new configurations for i40e
> * Add or modify makefiles to support i40e compilation
> * Add necessary changes in the ixgbe, e1000 and vmxnet3 PMDs, as hash flags
>   have been enlarged from 16 bits to 64 bits to support i40e
> * Add necessary changes in example applications and testpmd to use
>   ETH_RSS_IP to replace all IP hash flags, as i40e introduced more
>   hash flags.
> * Add a command in testpmd for testing port-based VLAN insertion offload
> * Add necessary changes in eth_dev to support configuring a maximum
>   packet length of less than 1518
> * Add two sys files in igb_uio to support enabling/disabling
>   'Extended Tag' and resetting 'Max Read Request Size', as these have
>   a big impact on i40e performance
> * Add necessary changes in pci to read/write the above two sys files
>   during PCI probing
> 
> Features/enhancements to be implemented later:
> * Set link speed, and physically up/down
> * Double VLAN support, flow director, VMDq and DCB
> * VLAN insertion/stripping, RSS in VF
> 
> Signed-off-by: Helin Zhang 
> Signed-off-by: Jing Chen 
> Acked-by: Cunming Liang 
> Acked-by: Jijiang Liu 
> Acked-by: Jingjing Wu 
> Acked-by: Heqing Zhu 
> Tested-by: Waterman Cao 

Applied for version 1.7.0.

Some things could be cleaned up later; in particular, the i40e-specific flags
in the generic API must be removed. Please work on a patch for the next release.

Thanks for the hard work
-- 
Thomas

[dpdk-dev] [PATCH v2] malloc: fix malloc and free linear complexity

2014-06-17 Thread Robert Sanford

Hi Pablo,

> Overall patch looks OK, but malloc unit tests fail on the last test
> (test_multi_alloc_statistics).
> Apparently, the biggest free chunk size changes as it allocates some
> memory, whereas in the previous implementation, this size did not change.
> I wonder if unit test is wrong or if there is actually an issue here.
> Could you look at this as well?

Thanks for your comments. Yes, I will investigate problems with the malloc
unit tests.

BTW, I intend to make one other adjustment to the previous (v2) set of
changes:

--- a/lib/librte_malloc/malloc_elem.c
+++ b/lib/librte_malloc/malloc_elem.c
@@ -143,7 +143,7 @@ split_elem(struct malloc_elem *elem, struct malloc_elem *split_pt)
 size_t
 malloc_elem_free_list_index(size_t size)
 {
-#define MALLOC_MINSIZE_LOG2   7
+#define MALLOC_MINSIZE_LOG2   8
 #define MALLOC_LOG2_INCREMENT 2

--
Robert
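
To see what the proposed change from 7 to 8 does to the size classes, the index computation from the v2 patch can be reproduced as a stand-alone sketch. The function below follows the arithmetic of malloc_elem_free_list_index() as posted, but takes the minimum-size exponent as a parameter (`min_log2`, a name introduced here for illustration) so that both values can be compared. It relies on the GCC/Clang builtin `__builtin_clzl`.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FREELISTS   5  /* same value as RTE_HEAP_NUM_FREELISTS */
#define LOG2_INCREMENT  2  /* same value as MALLOC_LOG2_INCREMENT */

/*
 * Map an element size to a free-list index. min_log2 plays the role of
 * MALLOC_MINSIZE_LOG2 in the patch.
 */
static size_t
free_list_index(unsigned long size, unsigned min_log2)
{
	size_t log2_size;
	size_t index;

	assert(min_log2 < sizeof(size) * 8);

	if (size <= (1UL << min_log2))
		return 0;

	/* Exponent of the smallest power of 2 that is >= size. */
	log2_size = sizeof(size) * 8 - __builtin_clzl(size - 1);

	/* Distance from the minimum exponent, rounded up to a whole class. */
	index = (log2_size - min_log2 + LOG2_INCREMENT - 1) / LOG2_INCREMENT;

	/* Everything beyond the last boundary lands in the last list. */
	return index < NUM_FREELISTS ? index : NUM_FREELISTS - 1;
}
```

With a minimum exponent of 8 and the increment left at 2, this arithmetic gives the ranges (0, 2^8], (2^8, 2^10], (2^10, 2^12], (2^12, 2^14] and (2^14, MAX_SIZE], that is, each boundary listed in the v2 commit message doubles.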

[dpdk-dev] vfio detection

2014-06-17 Thread Richardson, Bruce

> -Original Message-
> From: Burakov, Anatoly
> Sent: Tuesday, June 17, 2014 1:40 AM
> To: Richardson, Bruce; dev at dpdk.org
> Subject: RE: vfio detection
> 
> Hi Bruce,
> 
> > I have a number of NIC ports which were working correctly yesterday and are
> > bound correctly to the igb_uio driver - and I want to keep using them
> > through the igb_uio driver for now, not vfio. However, whenever I run a
> > dpdk application today, I find that the vfio kernel module is getting loaded
> > each time - even after I manually remove it, and verify that it has been
> > removed by checking lsmod. Is this expected? If so, why are we loading the
> > vfio driver when I just want to continue using igb_uio which works fine?
> 
> Can you elaborate a bit on what you mean by "loading vfio driver"? Do you
> mean the vfio-pci kernel module gets loaded by DPDK? I certainly didn't put
> in any code that would automatically load that driver, and certainly not
> binding devices to it.

The kernel module called just "vfio" is constantly getting reloaded, and there 
is always a "/dev/vfio" directory, which triggers the vfio code handling every 
time I run dpdk.

> 
> > Secondly, then, when testpmd or any other app loads, it automatically tries
> > to map the NIC using vfio and then aborts on the very first NIC port when it
> > fails to do so.
> 
> This shouldn't happen, unless you have a device bound to VFIO and have another
> device in the same IOMMU group that is bound to something else. Can you
> provide a log of what you are seeing?

Log of testpmd run attached.

> 
> > This a) prevents the port from being mapped using igb_uio, and
> > b) for ports which are meant to stay under linux control, forces me to start
> > enumerating ports using blacklist or whitelisting, rather than having things
> > "just work" on a properly configured system as before, i.e. if a port is
> > bound to igb_uio or vfio it is used, if not bound, it is ignored. Again, is
> > this by design and expected, because it seems a major regression in
> > usability?
> 
> I think automatic port unbinding and binding was removed, so this again
> shouldn't happen at all.
> 
> It would be useful to have logs for all of these described situations,
> because we certainly didn't encounter any of that during the validation
> cycle.
> 
Log of testpmd run attached. If you need any more debugging info, let me know.
I'll also test out the patch you just posted to the list - see if it makes any 
difference, and I'll send on a log from it.

/Bruce

[dpdk-dev] [PATCH 1/9] eal: map shared config into exact same address as primary process

2014-06-17 Thread Ananyev, Konstantin

Hi Anatoly

> 
> Shared config is shared across primary and secondary processes.
> However,when using rte_malloc, the malloc elements keep references to
> the heap inside themselves. This heap reference might not be referencing
> a local heap because the heap reference points to the heap of whatever
> process has allocated that malloc element. Therefore, there can be
> situations when malloc elements in a given heap actually reference
> different addresses for the same heap - depending on which process has
> allocated the element. This can lead to segmentation faults when dealing
> with malloc elements allocated on the same heap by different processes.
> 
> To fix this problem, heaps will now have the same addresses across
> processes. In order to achieve that, a new field in a shared mem_config
> (a structure that holds the heaps, and which is shared across processes)
> was added to keep the address of where this config is mapped in the
> primary process.
> 
> Secondary process will now map the config in two stages - first, it'll
> map it into an arbitrary address and read the address the primary
> process has allocated for the shared config. Then, the config is
> unmapped and re-mapped using the address previously read.
> 
> Signed-off-by: Anatoly Burakov 
> ---
>  lib/librte_eal/common/include/rte_eal_memconfig.h |  5 
>  lib/librte_eal/linuxapp/eal/eal.c | 31 
> +++
>  2 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
> index 30ce6fc..d6359e5 100644
> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> @@ -89,6 +89,11 @@ struct rte_mem_config {
> 
>   /* Heaps of Malloc per socket */
>   struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
> +
> + /* address of mem_config in primary process. used to map shared config into
> +  * exact same address the primary process maps it.
> +  */
> + uint64_t mem_cfg_addr;
>  } __attribute__((__packed__));
> 
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
> index 6994303..fedd82f 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -239,6 +239,11 @@ rte_eal_config_create(void)
>   }
>   memcpy(rte_mem_cfg_addr, &early_mem_config, sizeof(early_mem_config));
>   rte_config.mem_config = (struct rte_mem_config *) rte_mem_cfg_addr;
> +
> + /* store address of the config in the config itself so that secondary
> +  * processes could later map the config into this exact location */
> + rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr;
> +
>  }
> 
>  /* attach to an existing shared memory config */
> @@ -246,6 +251,8 @@ static void
>  rte_eal_config_attach(void)
>  {
>   void *rte_mem_cfg_addr;
> + struct rte_mem_config *mem_config;
> +
>   const char *pathname = eal_runtime_config_path();
> 
>   if (internal_config.no_shconf)
> @@ -257,13 +264,27 @@ rte_eal_config_attach(void)
>   rte_panic("Cannot open '%s' for rte_mem_config\n", pathname);
>   }
> 
> - rte_mem_cfg_addr = mmap(NULL, sizeof(*rte_config.mem_config),
> - PROT_READ | PROT_WRITE, MAP_SHARED, mem_cfg_fd, 0);
> - close(mem_cfg_fd);
> - if (rte_mem_cfg_addr == MAP_FAILED)
> + /* map it as read-only first */
> + mem_config = (struct rte_mem_config *) mmap(NULL, sizeof(*mem_config),
> + PROT_READ, MAP_SHARED, mem_cfg_fd, 0);
> + if (mem_config == MAP_FAILED)
>   rte_panic("Cannot mmap memory for rte_config\n");
> 
> - rte_config.mem_config = (struct rte_mem_config *) rte_mem_cfg_addr;
> + /* store address used by primary process */
> + rte_mem_cfg_addr = (void *) (uintptr_t) mem_config->mem_cfg_addr;
> +
> + /* unmap the config */
> + munmap(mem_config, sizeof(*mem_config));
> +
> + /* map the config again, with the proper virtual address */
> + mem_config = (struct rte_mem_config *) mmap(rte_mem_cfg_addr,
> + sizeof(*mem_config), PROT_READ | PROT_WRITE, MAP_SHARED,
> + mem_cfg_fd, 0);
> + if (mem_config == MAP_FAILED || mem_config != rte_mem_cfg_addr)
> + rte_panic("Cannot mmap memory for rte_config\n");
> + close(mem_cfg_fd);
> +
> + rte_config.mem_config = mem_config;
>  }
> 
>  /* Detect if we are a primary or a secondary process */
> --

I think we introduce a race window here.
If a secondary process does its first mmap() before
rte_config.mem_config->mem_cfg_addr has been set by the primary process,
it will attempt the second mmap() with the wrong address.
I think we need to do the second mmap() straight after
rte_eal_mcfg_wait_complete(), or even inside it.

Konstantin
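
One way to close the window described above is to have the secondary read the address only after it has seen a "config complete" marker that the primary writes last. The sketch below shows that ordering with C11 atomics. It is an assumption-laden illustration and not the EAL code: `struct shared_cfg`, `CFG_MAGIC`, `cfg_publish()` and `cfg_wait_addr()` are hypothetical names, and it assumes the 32-bit atomic is lock-free so that it works across processes sharing the mapping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_MAGIC 19820526u  /* arbitrary "initialisation complete" marker */

struct shared_cfg {
	_Atomic uint32_t magic;  /* written last by the primary */
	uint64_t cfg_addr;       /* primary's mapping address */
};

/* Primary: fill in the address first, then publish with a release store. */
static void
cfg_publish(struct shared_cfg *cfg, uint64_t addr)
{
	cfg->cfg_addr = addr;
	atomic_store_explicit(&cfg->magic, CFG_MAGIC, memory_order_release);
}

/*
 * Secondary: poll up to max_spins times. Returns 0 and stores the address
 * once the marker is visible, or -1 if the marker never appeared.
 */
static int
cfg_wait_addr(struct shared_cfg *cfg, unsigned long max_spins, uint64_t *addr)
{
	assert(cfg != NULL && addr != NULL);

	while (max_spins-- > 0) {
		if (atomic_load_explicit(&cfg->magic,
				memory_order_acquire) == CFG_MAGIC) {
			/* the acquire load orders this read after the store */
			*addr = cfg->cfg_addr;
			return 0;
		}
	}
	return -1;
}
```

The bounded spin count is a choice made for the sketch, so that a secondary started without a primary does not hang. The second mmap() would then be attempted only with an address obtained through `cfg_wait_addr()`.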

[dpdk-dev] [PATCH] vfio: make container open error non-fatal

2014-06-17 Thread Richardson, Bruce



> -Original Message-
> From: Burakov, Anatoly
> Sent: Tuesday, June 17, 2014 1:52 AM
> To: Richardson, Bruce; dev at dpdk.org
> Subject: RE: [PATCH] vfio: make container open error non-fatal
> 
> Hi Bruce,
> 
> > The below patch is the quickest fix I found to make my applications work
> > again, but I'm not sure it's the best solution. Can anyone else offer other
> > suggestions to improve this?
> 
> Are you running things as root? If not, I suggest trying the setup.sh
> script to correct permissions on the VFIO container and see if it works.
> 
Same error running with or without sudo. I'm not trying to get vfio to work, 
I'm trying to find out how to make dpdk ignore vfio since I didn't do any 
setting up of vfio. :-)

[dpdk-dev] vfio detection

2014-06-17 Thread Thomas Monjalon

Bruce, your log files have been automatically removed on the mailing list.
It's simpler to put logs in the email body.

-- 
Thomas

[dpdk-dev] vfio detection

2014-06-17 Thread Richardson, Bruce

Yes, so I see. Resending with log in body.

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, June 17, 2014 9:36 AM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] vfio detection
> 
> Bruce, your log files have been automatically removed on the mailing list.
> It's simpler to put logs in the email body.
> 
> --
> Thomas

[dpdk-dev] vfio detection

2014-06-17 Thread Richardson, Bruce

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Richardson, Bruce
> Sent: Tuesday, June 17, 2014 9:29 AM
> To: Burakov, Anatoly; dev at dpdk.org
> Subject: Re: [dpdk-dev] vfio detection
> 
> > -Original Message-
> > From: Burakov, Anatoly
> > Sent: Tuesday, June 17, 2014 1:40 AM
> > To: Richardson, Bruce; dev at dpdk.org
> > Subject: RE: vfio detection
> >
> > Hi Bruce,
> >
> > > I have a number of NIC ports which were working correctly yesterday and
> > > are bound correctly to the igb_uio driver - and I want to keep using them
> > > through the igb_uio driver for now, not vfio. However, whenever I run a
> > > dpdk application today, I find that the vfio kernel module is getting
> > > loaded each time - even after I manually remove it, and verify that it
> > > has been removed by checking lsmod. Is this expected? If so, why are we
> > > loading the vfio driver when I just want to continue using igb_uio which
> > > works fine?
> >
> > Can you elaborate a bit on what you mean by "loading vfio driver"? Do you
> > mean the vfio-pci kernel module gets loaded by DPDK? I certainly didn't put
> > in any code that would automatically load that driver, and certainly not
> > binding devices to it.
>
> The kernel module called just "vfio" is constantly getting reloaded, and
> there is always a "/dev/vfio" directory, which triggers the vfio code
> handling every time I run dpdk.
>
> >
> > > Secondly, then, when testpmd or any other app loads, it automatically
> > > tries to map the NIC using vfio and then aborts on the very first NIC
> > > port when it fails to do so.
> >
> > This shouldn't happen, unless you have a device bound to VFIO and have
> > another device in the same IOMMU group that is bound to something else.
> > Can you provide a log of what you are seeing?
> 
> Log of testpmd run attached.

Log got stripped from mail, including below instead.

Script started on Tue 17 Jun 2014 17:23:54 IST

bruce at silpixa00372841:dpdk.org$ ./tools/dpdk_nic_bind.py --status

Network devices using DPDK-compatible driver
============================================
0000:84:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe
0000:87:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe
0000:8b:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe
0000:8e:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe

Network devices using kernel driver
===================================
0000:04:00.0 'I350 Gigabit Network Connection' if=em0 drv=igb unused=igb_uio *Active*
0000:04:00.1 'I350 Gigabit Network Connection' if=ens2f1 drv=igb unused=igb_uio
0000:04:00.2 'I350 Gigabit Network Connection' if=ens2f2 drv=igb unused=igb_uio
0000:04:00.3 'I350 Gigabit Network Connection' if=ens2f3 drv=igb unused=igb_uio

Other network devices
=====================
0000:0a:00.1 'DH8900CC Null Device' unused=igb_uio
0000:0b:00.1 'DH8900CC Null Device' unused=igb_uio
0000:0c:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' unused=ixgbe,igb_uio
0000:0c:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' unused=ixgbe,igb_uio
0000:84:00.1 'Ethernet Server Adapter X520-4' unused=ixgbe,igb_uio
0000:87:00.1 'Ethernet Server Adapter X520-4' unused=ixgbe,igb_uio
0000:8b:00.1 'Ethernet Server Adapter X520-4' unused=ixgbe,igb_uio
0000:8e:00.1 'Ethernet Server Adapter X520-4' unused=ixgbe,igb_uio


bruce at silpixa00372841:dpdk.org$ sudo ./x86_64-native-linuxapp-gcc/app/testpmd -c F00 -n 4 -- -i
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 4 on socket 0
EAL: Detected lcore 5 as core 5 on socket 0
EAL: Detected lcore 6 as core 6 on socket 0
EAL: Detected lcore 7 as core 7 on socket 0
EAL: Detected lcore 8 as core 0 on socket 1
EAL: Detected lcore 9 as core 1 on socket 1
EAL: Detected lcore 10 as core 2 on socket 1
EAL: Detected lcore 11 as core 3 on socket 1
EAL: Detected lcore 12 as core 4 on socket 1
EAL: Detected lcore 13 as core 5 on socket 1
EAL: Detected lcore 14 as core 6 on socket 1
EAL: Detected lcore 15 as core 7 on socket 1
EAL: Detected lcore 16 as core 0 on socket 0
EAL: Detected lcore 17 as core 1 on socket 0
EAL: Detected lcore 18 as core 2 on socket 0
EAL: Detected lcore 19 as core 3 on socket 0
EAL: Detected lcore 20 as core 4 on socket 0
EAL: Detected lcore 21 as core 5 on socket 0
EAL: Detected lcore 22 as core 6 on socket 0
EAL: Detected lcore 23 as core 7 on socket 0
EAL: Detected lcore 24 as core 0 on socket 1
EAL: Detected lcore 25 as core 1 on socket 1
EAL: Detected lcore 26 as core 2 on socket 1
EAL: Detected lcore 27 as core 3 on socket 1
EAL: Detected lcore 28 as core 4 on socket 1
EAL: Detected lcore 29 as core 5 on socket 1
EAL: Detected lcore 30 as core 6 on socket 1
EAL: Detected lcore 31 as core 7 on socket 1
EAL: Support maximum 64 logical co

[dpdk-dev] [PATCH] vfio: open VFIO container at startup rather than during init

2014-06-17 Thread Richardson, Bruce



> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Tuesday, June 17, 2014 2:12 AM
> To: dev at dpdk.org
> Cc: Richardson, Bruce
> Subject: [PATCH] vfio: open VFIO container at startup rather than during init
> 
> 
> Signed-off-by: Anatoly Burakov 
> ---

This seems to fix the issue I was having. \o/

Acked-by: Bruce Richardson

[dpdk-dev] [PATCH] vfio: make container open error non-fatal

2014-06-17 Thread Richardson, Bruce



> -----Original Message-----
> From: Richardson, Bruce
> Sent: Monday, June 16, 2014 3:29 PM
> To: dev at dpdk.org
> Cc: Richardson, Bruce
> Subject: [PATCH] vfio: make container open error non-fatal
> 
> When setting up an app to run using the uio driver, errors caused by
> VFIO failures should not abruptly cause the app to fail.
> 
> Example: on a board with 8 ports bound to igb_uio module, and no VFIO
> configuration, a testpmd run currently fails with:
> 
> EAL:   cannot open VFIO container!
> EAL:   0000:04:00.0 cannot open VFIO container!
> EAL: Error - exiting with code: 1
>   Cause: Requested device 0000:04:00.0 cannot be used
> 
> With this patch applied, the problem with VFIO is ignored and testpmd
> successfully starts up - with ignored errors with vfio - as below:
> 
> EAL: PCI device 0000:04:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:1521 rte_igb_pmd
> EAL:   unknown IOMMU driver!
> EAL:   0000:04:00.0 cannot open VFIO container!
> EAL:   0000:04:00.0 not managed by UIO driver, skipping
> <...scan results for other ports skipped...>
> EAL: PCI device 0000:8e:00.0 on NUMA socket 1
> EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> EAL:   unknown IOMMU driver!
> EAL:   0000:8e:00.0 cannot open VFIO container!
> EAL:   PCI memory mapped at 0x7ff4ff5fa000
> EAL:   PCI memory mapped at 0x7ff4ff5f6000
> EAL: PCI device 0000:8e:00.1 on NUMA socket 1
> EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> EAL:   unknown IOMMU driver!
> EAL:   0000:8e:00.1 cannot open VFIO container!
> EAL:   PCI memory mapped at 0x7ff4ff4f6000
> EAL:   PCI memory mapped at 0x7ff4ff4f2000
> Interactive-mode selected
> Configuring Port 0 (socket 0)
> <...other 7 ports ...>
> Checking link statuses...
> Port 0 Link Up - speed 10000 Mbps - full-duplex
> Port 1 Link Down
> Port 2 Link Up - speed 10000 Mbps - full-duplex
> Port 3 Link Down
> Port 4 Link Up - speed 10000 Mbps - full-duplex
> Port 5 Link Down
> Port 6 Link Up - speed 10000 Mbps - full-duplex
> Port 7 Link Down
> Done
> testpmd>
> 
> This issue is introduced by the VFIO patch set addition, specifically
> commit ff0b67d1.
> 
> Signed-off-by: Bruce Richardson 
> ---


Self-NAK. Anatoly's patch is a better fix.
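
The behaviour both patches in this thread aim for (a missing or unusable
VFIO setup must not stop an application whose ports are bound to igb_uio)
can be sketched roughly as below. This is only an illustration, not the code
from either patch: the sk_* helper names are hypothetical, and the one
real-world detail assumed is that the VFIO container node lives at
/dev/vfio/vfio.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not DPDK code): try to open a VFIO container node.
 * Returns an open file descriptor, or -1 when the node cannot be opened.
 * The failure is only logged, so the caller can carry on with UIO.
 */
static int
sk_vfio_container_try_open(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDWR);
	if (fd < 0) {
		fprintf(stderr, "cannot open VFIO container %s: %s, skipping VFIO\n",
			path, strerror(errno));
		return -1;
	}
	return fd;
}

/*
 * Hypothetical startup check: returns 1 if the container node can be
 * opened, 0 otherwise. It never aborts the application.
 */
static int
sk_vfio_is_enabled(const char *path)
{
	int fd = sk_vfio_container_try_open(path);

	if (fd < 0)
		return 0;
	close(fd);
	return 1;
}
```

A caller would evaluate sk_vfio_is_enabled("/dev/vfio/vfio") once at startup
and skip the VFIO mapping path for every device when it returns 0, instead of
failing on the first port.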

[dpdk-dev] [PATCH] vfio: open VFIO container at startup rather than during init

2014-06-17 Thread Thomas Monjalon

> Signed-off-by: Anatoly Burakov 

Please Anatoly, could you provide a text explaining what was broken
and why you fixed it this way?

Thanks
-- 
Thomas

[dpdk-dev] [dpdk-stv] [PATCH 1/1] Fix the pointer 'ctx1' uninitialized error with gcc 4.5.1

2014-06-17 Thread Thomas Monjalon

2014-06-16 11:22, Min Cao:
> Description: This patch is aimed to fix the pointer 'ctx1' uninitialized
> error with gcc 4.5.1 as described below:
> "dpdk/lib/librte_kvargs/rte_kvargs.c:51:14: error: 'ctx1' may be used
> uninitialized in this function"
> 
> Signed-off-by: Cao Min 
> Acked-by: Liu, Jijiang 
> Tested-by: Waterman Cao 

Applied for version 1.7.0.

Thanks
-- 
Thomas
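
The patch body is not reproduced in this archive. As a hedged illustration of
the kind of warning involved: older gcc releases can report "may be used
uninitialized" for the save pointer handed to strtok_r(), and a common way to
silence it is to initialise that pointer explicitly. The sketch below assumes
that approach; sk_count_pairs is a hypothetical helper, not the rte_kvargs
code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: count "key=value" pairs in a comma-separated,
 * writable string. Returns the number of pairs, or -1 if a token has
 * no "=value" part. The string is modified in place by strtok_r().
 */
static int
sk_count_pairs(char *params)
{
	char *ctx1 = NULL;	/* explicit init avoids the gcc warning */
	char *ctx2 = NULL;
	char *pair;
	int count = 0;

	assert(params != NULL);
	for (pair = strtok_r(params, ",", &ctx1); pair != NULL;
	     pair = strtok_r(NULL, ",", &ctx1)) {
		char *key = strtok_r(pair, "=", &ctx2);
		char *val = strtok_r(NULL, "=", &ctx2);

		if (key == NULL || val == NULL)
			return -1;
		count++;
	}
	return count;
}
```

The outer and inner tokenisers keep separate save pointers, so the inner
split on '=' does not disturb the position of the outer split on ','.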

[dpdk-dev] vfio detection

2014-06-17 Thread Neil Horman

On Tue, Jun 17, 2014 at 04:38:38PM +0000, Richardson, Bruce wrote:

[dpdk-dev] vfio detection

2014-06-17 Thread Richardson, Bruce



> -----Original Message-----
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Tuesday, June 17, 2014 10:42 AM
> To: Richardson, Bruce
> Cc: Burakov, Anatoly; dev at dpdk.org
> Subject: Re: [dpdk-dev] vfio detection
> 
> On Tue, Jun 17, 2014 at 04:38:38PM +0000, Richardson, Bruce wrote:

[dpdk-dev] [PATCH v3 0/7] add mtu and flow control handlers

2014-06-17 Thread David Marchand

This patchset introduces 3 new ethdev operations: flow control parameters
retrieval and mtu get/set operations.

Changes since v1:
- compute min rx buffer size at ethdev level (to simplify pmd mtu checks)
- introduce enable_scatter rx mode so that we can advise pmd to configure
  scatter mode
- rework mtu get/set operations (based on Konstantin's comments)
- pass checkpatch.pl checks

Changes since v2:
- rebase on top of master
- fix min_rx_buf_size computation (patch 3)
- fix frame size checks for ixgbe so that vlan and double vlan frames can be
  received (patch 5 and 6)
- add a new ETHER_MIN_MTU macro in rte_ether.h (patch 5 and 6)


-- 
David Marchand

David Marchand (3):
  ethdev: add autoneg parameter in flow ctrl accessors
  ethdev: store min rx buffer size
  ethdev: introduce enable_scatter rx mode

Ivan Boule (2):
  ixgbe: add set_mtu to ixgbevf
  app/testpmd: allow to configure mtu

Samuel Gauthier (1):
  ethdev: add mtu accessors

Zijie Pan (1):
  ethdev: retrieve flow control configuration

 app/test-pmd/cmdline.c              |   54 +
 app/test-pmd/config.c               |   13
 app/test-pmd/testpmd.h              |    2 +-
 lib/librte_ether/rte_ethdev.c       |   77 +--
 lib/librte_ether/rte_ethdev.h       |   65 +++-
 lib/librte_ether/rte_ether.h        |    2 +
 lib/librte_pmd_e1000/em_ethdev.c    |   89 +
 lib/librte_pmd_e1000/em_rxtx.c      |    5 ++
 lib/librte_pmd_e1000/igb_ethdev.c   |  100
 lib/librte_pmd_e1000/igb_rxtx.c     |   10 +++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  145 ++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   |   27 ++-
 12 files changed, 573 insertions(+), 16 deletions(-)

-- 
1.7.10.4
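
The frame size checks mentioned in the v2 changes come down to simple
arithmetic between an MTU and the on-wire frame length. The sketch below is a
worked example under stated assumptions, not code from the series: the SK_*
constants mirror standard Ethernet sizes, the value 68 for the minimum MTU is
an assumption about what ETHER_MIN_MTU is set to, and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_ETHER_HDR_LEN  14u	/* dst MAC + src MAC + ethertype */
#define SK_ETHER_CRC_LEN   4u	/* frame check sequence */
#define SK_VLAN_TAG_LEN    4u	/* one 802.1Q tag */
#define SK_ETHER_MIN_MTU  68u	/* assumed value, see lead-in */

/* Frame length on the wire for a given MTU and number of VLAN tags. */
static uint32_t
sk_frame_len(uint16_t mtu, unsigned int nb_vlan_tags)
{
	assert(nb_vlan_tags <= 2);
	return (uint32_t)mtu + SK_ETHER_HDR_LEN + SK_ETHER_CRC_LEN +
	       nb_vlan_tags * SK_VLAN_TAG_LEN;
}

/*
 * Returns 1 if the MTU is at least the minimum and the resulting frame,
 * including the given number of VLAN tags, fits in max_frame_len.
 */
static int
sk_mtu_is_valid(uint16_t mtu, uint32_t max_frame_len,
		unsigned int nb_vlan_tags)
{
	if (mtu < SK_ETHER_MIN_MTU)
		return 0;
	return sk_frame_len(mtu, nb_vlan_tags) <= max_frame_len;
}
```

With an MTU of 1500 the untagged frame is 1518 bytes, which matches the
"ETHER_MAX_LEN (1518)" default quoted in patch 3; a double-tagged frame needs
8 more bytes, which is why a check against 1518 alone would reject it.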

[dpdk-dev] [PATCH v3 1/7] ethdev: retrieve flow control configuration

2014-06-17 Thread David Marchand

From: Zijie Pan 

This patch adds a new function in ethdev api to retrieve current flow control
configuration.
This operation has been implemented for rte_em_pmd, rte_igb_pmd and
rte_ixgbe_pmd.

Signed-off-by: Zijie Pan 
Signed-off-by: David Marchand 
---
 lib/librte_ether/rte_ethdev.c       |   16 ++
 lib/librte_ether/rte_ethdev.h       |   24 +--
 lib/librte_pmd_e1000/em_ethdev.c    |   44
 lib/librte_pmd_e1000/igb_ethdev.c   |   44
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   55 +--
 5 files changed, 179 insertions(+), 4 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 66eb266..9b9d5f6 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1637,6 +1637,22 @@ rte_eth_dev_fdir_set_masks(uint8_t port_id, struct rte_fdir_masks *fdir_mask)
 }

 int
+rte_eth_dev_flow_ctrl_get(uint8_t port_id, struct rte_eth_fc_conf *fc_conf)
+{
+       struct rte_eth_dev *dev;
+
+       if (port_id >= nb_ports) {
+               PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+               return (-ENODEV);
+       }
+
+       dev = &rte_eth_devices[port_id];
+       FUNC_PTR_OR_ERR_RET(*dev->dev_ops->flow_ctrl_get, -ENOTSUP);
+       memset(fc_conf, 0, sizeof(*fc_conf));
+       return (*dev->dev_ops->flow_ctrl_get)(dev, fc_conf);
+}
+
+int
 rte_eth_dev_flow_ctrl_set(uint8_t port_id, struct rte_eth_fc_conf *fc_conf)
 {
        struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 1dd1d39..1e0564d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1140,8 +1140,12 @@ typedef int (*fdir_set_masks_t)(struct rte_eth_dev *dev,
                                struct rte_fdir_masks *fdir_masks);
 /**< @internal Setup flow director masks on an Ethernet device */

+typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
+                              struct rte_eth_fc_conf *fc_conf);
+/**< @internal Get current flow control parameter on an Ethernet device */
+
 typedef int (*flow_ctrl_set_t)(struct rte_eth_dev *dev,
-                               struct rte_eth_fc_conf *fc_conf);
+                              struct rte_eth_fc_conf *fc_conf);
 /**< @internal Setup flow control parameter on an Ethernet device */

 typedef int (*priority_flow_ctrl_set_t)(struct rte_eth_dev *dev,
@@ -1389,6 +1393,7 @@ struct eth_dev_ops {
        eth_queue_release_t        tx_queue_release;/**< Release TX queue.*/
        eth_dev_led_on_t           dev_led_on;    /**< Turn on LED. */
        eth_dev_led_off_t          dev_led_off;   /**< Turn off LED. */
+       flow_ctrl_get_t            flow_ctrl_get; /**< Get flow control. */
        flow_ctrl_set_t            flow_ctrl_set; /**< Setup flow control. */
        priority_flow_ctrl_set_t   priority_flow_ctrl_set; /**< Setup priority flow control.*/
        eth_mac_addr_remove_t      mac_addr_remove; /**< Remove MAC address */
@@ -2701,6 +2706,21 @@ int  rte_eth_led_on(uint8_t port_id);
 int  rte_eth_led_off(uint8_t port_id);

 /**
+ * Get current status of the Ethernet link flow control for Ethernet device
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param fc_conf
+ *   The pointer to the structure where to store the flow control parameters.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support flow control.
+ *   - (-ENODEV)  if *port_id* invalid.
+ */
+int rte_eth_dev_flow_ctrl_get(uint8_t port_id,
+                             struct rte_eth_fc_conf *fc_conf);
+
+/**
  * Configure the Ethernet link flow control for Ethernet device
  *
  * @param port_id
@@ -2715,7 +2735,7 @@ int  rte_eth_led_off(uint8_t port_id);
  *   - (-EIO) if flow control setup failure
  */
 int rte_eth_dev_flow_ctrl_set(uint8_t port_id,
-                               struct rte_eth_fc_conf *fc_conf);
+                             struct rte_eth_fc_conf *fc_conf);

 /**
  * Configure the Ethernet priority flow control under DCB environment
diff --git a/lib/librte_pmd_e1000/em_ethdev.c b/lib/librte_pmd_e1000/em_ethdev.c
index dc0082f..58efcdf 100644
--- a/lib/librte_pmd_e1000/em_ethdev.c
+++ b/lib/librte_pmd_e1000/em_ethdev.c
@@ -77,6 +77,8 @@ static void eth_em_stats_get(struct rte_eth_dev *dev,
 static void eth_em_stats_reset(struct rte_eth_dev *dev);
 static void eth_em_infos_get(struct rte_eth_dev *dev,
                                struct rte_eth_dev_info *dev_info);
+static int eth_em_flow_ctrl_get(struct rte_eth_dev *dev,
+                               struct rte_eth_fc_conf *fc_conf);
 static int eth_em_flow_ctrl_set(struct rte_eth_dev *dev,
                                struct rte_eth_fc_conf *fc_conf);
 static int eth_em_interrupt_setup(struct rte_eth_dev *dev);
@@ -153,6 +155,7 @@ static struct eth_dev_ops eth_em_ops = {
        .tx_queue_release = eth_em_tx_queue_release,
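
The dispatch pattern used by rte_eth_dev_flow_ctrl_get() above (validate the
port, check that the driver filled in the callback, zero the output, then call
through the ops table) can be tried in isolation with the self-contained model
below. It is a sketch only: every sk_* type and function is a hypothetical
stand-in, and the explicit NULL check plays the role of the
FUNC_PTR_OR_ERR_RET macro.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

struct sk_fc_conf {
	uint32_t high_water;
	uint32_t low_water;
};

struct sk_dev;
typedef int (*sk_flow_ctrl_get_t)(struct sk_dev *dev,
				  struct sk_fc_conf *fc_conf);

/* Per-driver operations table; a NULL entry means "not supported". */
struct sk_dev_ops {
	sk_flow_ctrl_get_t flow_ctrl_get;
};

struct sk_dev {
	const struct sk_dev_ops *dev_ops;
	uint32_t hw_high_water;	/* stand-in for driver private state */
	uint32_t hw_low_water;
};

#define SK_NB_PORTS 2
static struct sk_dev sk_devices[SK_NB_PORTS];

/* Example driver callback: copy the "hardware" values out. */
static int
sk_drv_flow_ctrl_get(struct sk_dev *dev, struct sk_fc_conf *fc_conf)
{
	fc_conf->high_water = dev->hw_high_water;
	fc_conf->low_water = dev->hw_low_water;
	return 0;
}

static const struct sk_dev_ops sk_drv_ops = {
	.flow_ctrl_get = sk_drv_flow_ctrl_get,
};

/* Generic entry point, modelled on the function added by the patch. */
static int
sk_flow_ctrl_get(uint8_t port_id, struct sk_fc_conf *fc_conf)
{
	struct sk_dev *dev;

	assert(fc_conf != NULL);
	if (port_id >= SK_NB_PORTS)
		return -ENODEV;

	dev = &sk_devices[port_id];
	if (dev->dev_ops == NULL || dev->dev_ops->flow_ctrl_get == NULL)
		return -ENOTSUP;

	/* Zero first so fields the driver does not set are well defined. */
	memset(fc_conf, 0, sizeof(*fc_conf));
	return dev->dev_ops->flow_ctrl_get(dev, fc_conf);
}
```

A port whose driver leaves the callback unset reports -ENOTSUP rather than
crashing on a NULL call, which is the same contract the doxygen comment in the
patch documents.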

[dpdk-dev] [PATCH v3 2/7] ethdev: add autoneg parameter in flow ctrl accessors

2014-06-17 Thread David Marchand

Add autoneg field in flow control parameters.
This makes it easier to understand why changing some parameters does not always
have the expected result.

Changing autoneg is not supported at the moment.

Signed-off-by: David Marchand 
---
 lib/librte_ether/rte_ethdev.h       |    1 +
 lib/librte_pmd_e1000/em_ethdev.c    |    3 +++
 lib/librte_pmd_e1000/igb_ethdev.c   |    3 +++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |    3 +++
 4 files changed, 10 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 1e0564d..a410afd 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -649,6 +649,7 @@ struct rte_eth_fc_conf {
        uint16_t send_xon;    /**< Is XON frame need be sent */
        enum rte_eth_fc_mode mode;  /**< Link flow control mode */
        uint8_t mac_ctrl_frame_fwd; /**< Forward MAC control frames */
+       uint8_t autoneg;      /**< Use Pause autoneg */
 };

 /**
diff --git a/lib/librte_pmd_e1000/em_ethdev.c b/lib/librte_pmd_e1000/em_ethdev.c
index 58efcdf..7913ff0 100644
--- a/lib/librte_pmd_e1000/em_ethdev.c
+++ b/lib/librte_pmd_e1000/em_ethdev.c
@@ -1378,6 +1378,7 @@ eth_em_flow_ctrl_get(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
        fc_conf->high_water = hw->fc.high_water;
        fc_conf->low_water = hw->fc.low_water;
        fc_conf->send_xon = hw->fc.send_xon;
+       fc_conf->autoneg = hw->mac.autoneg;

        /*
         * Return rx_pause and tx_pause status according to actual setting of
@@ -1422,6 +1423,8 @@ eth_em_flow_ctrl_set(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
        uint32_t rctl;

        hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+       if (fc_conf->autoneg != hw->mac.autoneg)
+               return -ENOTSUP;
        rx_buf_size = em_get_rx_buffer_size(hw);
        PMD_INIT_LOG(DEBUG, "Rx packet buffer size = 0x%x \n", rx_buf_size);

diff --git a/lib/librte_pmd_e1000/igb_ethdev.c b/lib/librte_pmd_e1000/igb_ethdev.c
index 92ac4a8..c92b737 100644
--- a/lib/librte_pmd_e1000/igb_ethdev.c
+++ b/lib/librte_pmd_e1000/igb_ethdev.c
@@ -1871,6 +1871,7 @@ eth_igb_flow_ctrl_get(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
        fc_conf->high_water = hw->fc.high_water;
        fc_conf->low_water = hw->fc.low_water;
        fc_conf->send_xon = hw->fc.send_xon;
+       fc_conf->autoneg = hw->mac.autoneg;

        /*
         * Return rx_pause and tx_pause status according to actual setting of
@@ -1915,6 +1916,8 @@ eth_igb_flow_ctrl_set(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
        uint32_t rctl;

        hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+       if (fc_conf->autoneg != hw->mac.autoneg)
+               return -ENOTSUP;
        rx_buf_size = igb_get_rx_buffer_size(hw);
        PMD_INIT_LOG(DEBUG, "Rx packet buffer size = 0x%x \n", rx_buf_size);

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index f130080..559d246 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -2309,6 +2309,7 @@ ixgbe_flow_ctrl_get(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
        fc_conf->high_water = hw->fc.high_water[0];
        fc_conf->low_water = hw->fc.low_water[0];
        fc_conf->send_xon = hw->fc.send_xon;
+       fc_conf->autoneg = !hw->fc.disable_fc_autoneg;

        /*
         * Return rx_pause status according to actual setting of
@@ -2360,6 +2361,8 @@ ixgbe_flow_ctrl_set(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
        PMD_INIT_FUNC_TRACE();

        hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+       if (fc_conf->autoneg != !hw->fc.disable_fc_autoneg)
+               return -ENOTSUP;
        rx_buf_size = IXGBE_READ_REG(hw, IXGBE_RXPBSIZE(0));
        PMD_INIT_LOG(DEBUG, "Rx packet buffer size = 0x%x \n", rx_buf_size);

-- 
1.7.10.4
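
One practical consequence of the autoneg check above is that a caller who
fills a flow control structure from scratch (autoneg left at 0) can get
-ENOTSUP from the set operation when the hardware has autoneg enabled. A
get/modify/set sequence avoids that. The model below is a hedged sketch of
that calling pattern; all sk_* names are hypothetical and only mimic the
behaviour shown in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

struct sk_hw {
	uint8_t autoneg;	/* current hardware autoneg state */
	uint32_t high_water;
	uint32_t low_water;
};

struct sk_fc_conf {
	uint32_t high_water;
	uint32_t low_water;
	uint8_t autoneg;
};

/* Model of the "get" accessor: report the current settings. */
static int
sk_fc_get(const struct sk_hw *hw, struct sk_fc_conf *conf)
{
	assert(hw != NULL && conf != NULL);
	memset(conf, 0, sizeof(*conf));
	conf->high_water = hw->high_water;
	conf->low_water = hw->low_water;
	conf->autoneg = hw->autoneg;
	return 0;
}

/* Model of the "set" accessor: changing autoneg is refused. */
static int
sk_fc_set(struct sk_hw *hw, const struct sk_fc_conf *conf)
{
	assert(hw != NULL && conf != NULL);
	if (conf->autoneg != hw->autoneg)
		return -ENOTSUP;
	hw->high_water = conf->high_water;
	hw->low_water = conf->low_water;
	return 0;
}

/* Get, modify, set: autoneg is carried over unchanged from the get. */
static int
sk_fc_set_watermarks(struct sk_hw *hw, uint32_t high, uint32_t low)
{
	struct sk_fc_conf conf;
	int ret = sk_fc_get(hw, &conf);

	if (ret != 0)
		return ret;
	conf.high_water = high;
	conf.low_water = low;
	return sk_fc_set(hw, &conf);
}
```

Because the structure passed to the set call starts from the values the get
call returned, the autoneg comparison always succeeds and only the intended
fields change.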

[dpdk-dev] [PATCH v3 3/7] ethdev: store min rx buffer size

2014-06-17 Thread David Marchand

This avoids code duplication in PMD when dealing with mtu changes.

Signed-off-by: David Marchand 
---
 lib/librte_ether/rte_ethdev.c |   20 +++++++++++++++-----
 lib/librte_ether/rte_ethdev.h |    3 +++
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 9b9d5f6..9061c7d 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -884,6 +884,8 @@ rte_eth_rx_queue_setup(uint8_t port_id, uint16_t rx_queue_id,
                       const struct rte_eth_rxconf *rx_conf,
                       struct rte_mempool *mp)
 {
+       int ret;
+       uint32_t mbp_buf_size;
        struct rte_eth_dev *dev;
        struct rte_pktmbuf_pool_private *mbp_priv;
        struct rte_eth_dev_info dev_info;
@@ -924,13 +926,14 @@ rte_eth_rx_queue_setup(uint8_t port_id, uint16_t rx_queue_id,
                return (-ENOSPC);
        }
        mbp_priv = rte_mempool_get_priv(mp);
-       if ((uint32_t) (mbp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM) <
-           dev_info.min_rx_bufsize) {
+       mbp_buf_size = mbp_priv->mbuf_data_room_size;
+
+       if ((mbp_buf_size - RTE_PKTMBUF_HEADROOM) < dev_info.min_rx_bufsize) {
                PMD_DEBUG_TRACE("%s mbuf_data_room_size %d < %d "
                                "(RTE_PKTMBUF_HEADROOM=%d + min_rx_bufsize(dev)"
                                "=%d)\n",
                                mp->name,
-                               (int)mbp_priv->mbuf_data_room_size,
+                               (int)mbp_buf_size,
                                (int)(RTE_PKTMBUF_HEADROOM +
                                      dev_info.min_rx_bufsize),
                                (int)RTE_PKTMBUF_HEADROOM,
@@ -938,8 +941,15 @@ rte_eth_rx_queue_setup(uint8_t port_id, uint16_t rx_queue_id,
                return (-EINVAL);
        }

-       return (*dev->dev_ops->rx_queue_setup)(dev, rx_queue_id, nb_rx_desc,
-                                              socket_id, rx_conf, mp);
+       ret = (*dev->dev_ops->rx_queue_setup)(dev, rx_queue_id, nb_rx_desc,
+                                             socket_id, rx_conf, mp);
+       if (!ret) {
+               if (!dev->data->min_rx_buf_size ||
+                   dev->data->min_rx_buf_size > mbp_buf_size)
+                       dev->data->min_rx_buf_size = mbp_buf_size;
+       }
+
+       return ret;
 }

 int

 int
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index a410afd..581259d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1515,6 +1515,9 @@ struct rte_eth_dev_data {
        struct rte_eth_conf dev_conf;   /**< Configuration applied to device. */
        uint16_t max_frame_size;        /**< Default is ETHER_MAX_LEN (1518). */

+       uint32_t min_rx_buf_size;
+       /**< Common rx buffer size handled by all queues */
+
        uint64_t rx_mbuf_alloc_failed; /**< RX ring mbuf allocation failures. */
        struct ether_addr* mac_addrs;/**< Device Ethernet Link address. */
        uint64_t mac_pool_sel[ETH_NUM_RECEIVE_MAC_ADDR];
-- 
1.7.10.4

[dpdk-dev] [PATCH v3 4/7] ethdev: introduce enable_scatter rx mode

2014-06-17 Thread David Marchand

We might want to be sure the scatter packets reception handler is selected in a
pmd. This makes it possible to then change mtu later, without the need of
restarting a port.
It is then the pmd duty to tell it enabled the scatter reception handler by
setting dev->data->scattered_rx to 1.

Signed-off-by: David Marchand 
---
 lib/librte_ether/rte_ethdev.h     |    3 ++-
 lib/librte_pmd_e1000/em_rxtx.c    |    5 +++++
 lib/librte_pmd_e1000/igb_rxtx.c   |   10 ++++++++++
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c |   10 ++++++++++
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 581259d..2b98700 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -307,7 +307,8 @@ struct rte_eth_rxmode {
                hw_vlan_strip    : 1, /**< VLAN strip enable. */
                hw_vlan_extend   : 1, /**< Extended VLAN enable. */
                jumbo_frame      : 1, /**< Jumbo Frame Receipt enable. */
-               hw_strip_crc     : 1; /**< Enable CRC stripping by hardware. */
+               hw_strip_crc     : 1, /**< Enable CRC stripping by hardware. */
+               enable_scatter   : 1; /**< Enable scatter packets rx handler */
 };

 /**
diff --git a/lib/librte_pmd_e1000/em_rxtx.c b/lib/librte_pmd_e1000/em_rxtx.c
index 1575e79..e5f1933 100644
--- a/lib/librte_pmd_e1000/em_rxtx.c
+++ b/lib/librte_pmd_e1000/em_rxtx.c
@@ -1714,6 +1714,11 @@ eth_em_rx_init(struct rte_eth_dev *dev)
                }
        }

+       if (dev->data->dev_conf.rxmode.enable_scatter) {
+               dev->rx_pkt_burst = eth_em_recv_scattered_pkts;
+               dev->data->scattered_rx = 1;
+       }
+
        /*
         * Setup the Checksum Register.
         * Receive Full-Packet Checksum Offload is mutually exclusive with RSS.
diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index 9f0310d..aea898c 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -2008,6 +2008,11 @@ eth_igb_rx_init(struct rte_eth_dev *dev)
                E1000_WRITE_REG(hw, E1000_RXDCTL(rxq->reg_idx), rxdctl);
        }

+       if (dev->data->dev_conf.rxmode.enable_scatter) {
+               dev->rx_pkt_burst = eth_igb_recv_scattered_pkts;
+               dev->data->scattered_rx = 1;
+       }
+
        /*
         * Setup BSIZE field of RCTL register, if needed.
         * Buffer sizes >= 1024 are not [supposed to be] setup in the RCTL
@@ -2277,6 +2282,11 @@ eth_igbvf_rx_init(struct rte_eth_dev *dev)
                E1000_WRITE_REG(hw, E1000_RXDCTL(i), rxdctl);
        }

+       if (dev->data->dev_conf.rxmode.enable_scatter) {
+               dev->rx_pkt_burst = eth_igb_recv_scattered_pkts;
+               dev->data->scattered_rx = 1;
+       }
+
        /*
         * Setup the HW Rx Head and Tail Descriptor Pointers.
         * This needs to be done after enable.
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index ca72e75..f487859 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -3486,6 +3486,11 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
                }
        }

+       if (dev->data->dev_conf.rxmode.enable_scatter) {
+               dev->rx_pkt_burst = ixgbe_recv_scattered_pkts;
+               dev->data->scattered_rx = 1;
+       }
+
        /*
         * Device configured with multiple RX queues.
         */
@@ -3961,6 +3966,11 @@ ixgbevf_dev_rx_init(struct rte_eth_dev *dev)
                }
        }

+       if (dev->data->dev_conf.rxmode.enable_scatter) {
+               dev->rx_pkt_burst = ixgbe_recv_scattered_pkts;
+               dev->data->scattered_rx = 1;
+       }
+
        return 0;
 }

-- 
1.7.10.4

[dpdk-dev] [PATCH v3 5/7] ethdev: add mtu accessors

2014-06-17 Thread David Marchand

From: Samuel Gauthier 

This patch adds two new functions to the ethdev API: one to retrieve the
current MTU of a port and one to change it.
Only the .mtu_set operation is PMD-specific.
A PMD should update its max_rx_pkt_len when the MTU changes, if needed.

This operation has been implemented for rte_em_pmd, rte_igb_pmd and
rte_ixgbe_pmd.

Signed-off-by: Samuel Gauthier 
Signed-off-by: Ivan Boule 
Signed-off-by: David Marchand 
---
 lib/librte_ether/rte_ethdev.c   |   41 +--
 lib/librte_ether/rte_ethdev.h   |   34 +-
 lib/librte_ether/rte_ether.h|2 ++
 lib/librte_pmd_e1000/em_ethdev.c|   42 +++
 lib/librte_pmd_e1000/igb_ethdev.c   |   53 +++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   50 +
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   |2 +-
 7 files changed, 220 insertions(+), 4 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 9061c7d..7256841 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -201,9 +201,9 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv,
TAILQ_INIT(&(eth_dev->callbacks));

/*
-* Set the default maximum frame size.
+* Set the default MTU.
 */
-   eth_dev->data->max_frame_size = ETHER_MAX_LEN;
+   eth_dev->data->mtu = ETHER_MTU;

/* Invoke PMD device initialization function */
diag = (*eth_drv->eth_dev_init)(eth_drv, eth_dev);
@@ -1234,6 +1234,43 @@ rte_eth_macaddr_get(uint8_t port_id, struct ether_addr 
*mac_addr)
ether_addr_copy(&dev->data->mac_addrs[0], mac_addr);
 }

+
+int
+rte_eth_dev_get_mtu(uint8_t port_id, uint16_t *mtu)
+{
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+   *mtu = dev->data->mtu;
+   return 0;
+}
+
+int
+rte_eth_dev_set_mtu(uint8_t port_id, uint16_t mtu)
+{
+   int ret;
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->mtu_set, -ENOTSUP);
+
+   ret = (*dev->dev_ops->mtu_set)(dev, mtu);
+   if (!ret)
+   dev->data->mtu = mtu;
+
+   return ret;
+}
+
 int
 rte_eth_dev_vlan_filter(uint8_t port_id, uint16_t vlan_id, int on)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2b98700..2406e45 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1071,6 +1071,9 @@ typedef uint32_t (*eth_rx_queue_count_t)(struct 
rte_eth_dev *dev,
 typedef int (*eth_rx_descriptor_done_t)(void *rxq, uint16_t offset);
 /**< @Check DD bit of specific RX descriptor */

+typedef int (*mtu_set_t)(struct rte_eth_dev *dev, uint16_t mtu);
+/**< @internal Set MTU. */
+
 typedef int (*vlan_filter_set_t)(struct rte_eth_dev *dev,
  uint16_t vlan_id,
  int on);
@@ -1378,6 +1381,7 @@ struct eth_dev_ops {
eth_queue_stats_mapping_set_t queue_stats_mapping_set;
/**< Configure per queue stat counter mapping. */
eth_dev_infos_get_tdev_infos_get; /**< Get device info. */
+   mtu_set_t  mtu_set; /**< Set MTU. */
vlan_filter_set_t  vlan_filter_set;  /**< Filter VLAN Setup. */
vlan_tpid_set_tvlan_tpid_set;  /**< Outer VLAN TPID 
Setup. */
vlan_strip_queue_set_t vlan_strip_queue_set; /**< VLAN Stripping on 
queue. */
@@ -1514,7 +1518,7 @@ struct rte_eth_dev_data {
/**< Link-level information & status */

struct rte_eth_conf dev_conf;   /**< Configuration applied to device. */
-   uint16_t max_frame_size;/**< Default is ETHER_MAX_LEN (1518). */
+   uint16_t mtu;   /**< Maximum Transmission Unit. */

uint32_t min_rx_buf_size;
/**< Common rx buffer size handled by all queues */
@@ -2061,6 +2065,34 @@ extern void rte_eth_dev_info_get(uint8_t port_id,
 struct rte_eth_dev_info *dev_info);

 /**
+ * Retrieve the MTU of an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mtu
+ *   A pointer to a uint16_t where the retrieved MTU is to be stored.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENODEV) if *port_id* invalid.
+ */
+extern int rte_eth_dev_get_mtu(uint8_t port_id, uint16_t *mtu);
+
+/**
+ * Change the MTU of an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mtu
+ *   A uint16_t for the MTU to be applied.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if operation is not supported.
+ *   - (-ENODEV) if *port_id

[dpdk-dev] [PATCH v3 6/7] ixgbe: add set_mtu to ixgbevf

2014-06-17 Thread David Marchand

From: Ivan Boule 

Support for jumbo frames in the ixgbevf Poll Mode Driver for 10GbE
82599 VF functions consists of the following enhancements:

- Implement the mtu_set function in the ixgbevf PMD, using the IXGBE_VF_SET_LPE
  request of version 1.0 of the VF/PF mailbox API for this purpose.

- Add a detailed explanation of the VF/PF rx max frame length negotiation.

Signed-off-by: Ivan Boule 
Signed-off-by: David Marchand 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   37 +++
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   |   15 +-
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index fca8fd7..85c4a77 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -220,6 +220,8 @@ static int ixgbe_remove_5tuple_filter(struct rte_eth_dev 
*dev,
 static int ixgbe_get_5tuple_filter(struct rte_eth_dev *dev, uint16_t index,
struct rte_5tuple_filter *filter, uint16_t *rx_queue);

+static int ixgbevf_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu);
+
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
  */
@@ -376,6 +378,7 @@ static struct eth_dev_ops ixgbevf_eth_dev_ops = {
.stats_reset  = ixgbevf_dev_stats_reset,
.dev_close= ixgbevf_dev_close,
.dev_infos_get= ixgbe_dev_info_get,
+   .mtu_set  = ixgbevf_dev_set_mtu,
.vlan_filter_set  = ixgbevf_vlan_filter_set,
.vlan_strip_queue_set = ixgbevf_vlan_strip_queue_set,
.vlan_offload_set = ixgbevf_vlan_offload_set,
@@ -3937,6 +3940,40 @@ ixgbe_get_5tuple_filter(struct rte_eth_dev *dev, 
uint16_t index,
return -ENOENT;
 }

+static int
+ixgbevf_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
+{
+   struct ixgbe_hw *hw;
+   uint32_t max_frame = mtu + ETHER_HDR_LEN + ETHER_CRC_LEN;
+
+   hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   if ((mtu < ETHER_MIN_MTU) || (max_frame > ETHER_MAX_JUMBO_FRAME_LEN))
+   return -EINVAL;
+
+   /* refuse mtu that requires the support of scattered packets when this
+* feature has not been enabled before. */
+   if (!dev->data->scattered_rx &&
+   (max_frame + 2 * IXGBE_VLAN_TAG_SIZE >
+dev->data->min_rx_buf_size - RTE_PKTMBUF_HEADROOM))
+   return -EINVAL;
+
+   /*
+* When supported by the underlying PF driver, use the IXGBE_VF_SET_MTU
+* request of the version 2.0 of the mailbox API.
+* For now, use the IXGBE_VF_SET_LPE request of the version 1.0
+* of the mailbox API.
+* This call to IXGBE_SET_LPE action won't work with ixgbe pf drivers
+* prior to 3.11.33 which contains the following change:
+* "ixgbe: Enable jumbo frames support w/ SR-IOV"
+*/
+   ixgbevf_rlpml_set_vf(hw, max_frame);
+
+   /* update max frame size */
+   dev->data->dev_conf.rxmode.max_rx_pkt_len = max_frame;
+   return 0;
+}
+
 static struct rte_driver rte_ixgbe_driver = {
.type = PMD_PDEV,
.init = rte_ixgbe_pmd_init,
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 9b640e5..7f05b26 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -3883,7 +3883,20 @@ ixgbevf_dev_rx_init(struct rte_eth_dev *dev)
PMD_INIT_FUNC_TRACE();
hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);

-   /* setup MTU */
+   /*
+* When the VF driver issues a IXGBE_VF_RESET request, the PF driver
+* disables the VF receipt of packets if the PF MTU is > 1500.
+* This is done to deal with 82599 limitations that imposes
+* the PF and all VFs to share the same MTU.
+* Then, the PF driver enables again the VF receipt of packet when
+* the VF driver issues a IXGBE_VF_SET_LPE request.
+* In the meantime, the VF device cannot be used, even if the VF driver
+* and the Guest VM network stack are ready to accept packets with a
+* size up to the PF MTU.
+* As a work-around to this PF behaviour, force the call to
+* ixgbevf_rlpml_set_vf even if jumbo frames are not used. This way,
+* VF packets received can work in all cases.
+*/
ixgbevf_rlpml_set_vf(hw,
(uint16_t)dev->data->dev_conf.rxmode.max_rx_pkt_len);

-- 
1.7.10.4
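
The bounds checks in ixgbevf_dev_set_mtu above reduce to simple arithmetic on
the MTU. The following is a minimal standalone sketch of that arithmetic, not
the driver code itself. It does not use DPDK headers: the SKETCH_ constants and
sketch_check_mtu() are hypothetical stand-ins, with values assumed to match the
DPDK definitions (14-byte Ethernet header, 4-byte CRC, 68-byte minimum MTU,
0x3F00 maximum jumbo frame, 4-byte VLAN tag). The rx_buf_room parameter stands
for min_rx_buf_size - RTE_PKTMBUF_HEADROOM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, redefined locally so the sketch builds without DPDK. */
#define SKETCH_ETHER_HDR_LEN       14u     /* dst MAC + src MAC + EtherType */
#define SKETCH_ETHER_CRC_LEN       4u      /* frame check sequence */
#define SKETCH_ETHER_MIN_MTU       68u     /* minimum IPv4 MTU */
#define SKETCH_MAX_JUMBO_FRAME_LEN 0x3F00u /* largest accepted frame */
#define SKETCH_VLAN_TAG_SIZE       4u      /* one 802.1Q tag */

/*
 * Validate an MTU the same way the patch does and report the resulting
 * maximum frame length. Returns 0 on success, -EINVAL if the MTU is out
 * of range or would need scattered rx while that feature is disabled.
 */
int
sketch_check_mtu(uint16_t mtu, int scattered_rx, uint32_t rx_buf_room,
		 uint32_t *max_frame_out)
{
	uint32_t max_frame = (uint32_t)mtu + SKETCH_ETHER_HDR_LEN +
			     SKETCH_ETHER_CRC_LEN;

	assert(max_frame_out != NULL);

	if (mtu < SKETCH_ETHER_MIN_MTU ||
	    max_frame > SKETCH_MAX_JUMBO_FRAME_LEN)
		return -EINVAL;

	/* A frame with two VLAN tags must fit in a single rx buffer. */
	if (!scattered_rx &&
	    max_frame + 2 * SKETCH_VLAN_TAG_SIZE > rx_buf_room)
		return -EINVAL;

	*max_frame_out = max_frame;
	return 0;
}
```

For example, with 2048-byte buffers and an assumed 128-byte headroom
(rx_buf_room = 1920), an MTU of 1500 gives a 1518-byte frame and passes, while
an MTU of 9000 is rejected unless scattered rx is enabled.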

[dpdk-dev] [PATCH v3 7/7] app/testpmd: allow to configure mtu

2014-06-17 Thread David Marchand

From: Ivan Boule 

Take advantage of the .mtu_set ethdev function and make it possible to
configure the MTU of devices using testpmd.

Signed-off-by: Ivan Boule 
Signed-off-by: David Marchand 
---
 app/test-pmd/cmdline.c |   54 
 app/test-pmd/config.c  |   13 
 app/test-pmd/testpmd.h |2 +-
 3 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index f64f2df..e3e51fc 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -527,6 +527,8 @@ static void cmd_help_long_parsed(void *parsed_result,
"port config all (txfreet|txrst|rxfreet) (value)\n"
"Set free threshold for rx/tx, or set"
" tx rs bit threshold.\n\n"
+   "port config mtu X value\n"
+   "Set the MTU of port X to a given value\n\n"
);
}

@@ -1086,6 +1088,57 @@ cmdline_parse_inst_t cmd_config_max_pkt_len = {
},
 };

+/* *** configure port MTU *** */
+struct cmd_config_mtu_result {
+   cmdline_fixed_string_t port;
+   cmdline_fixed_string_t keyword;
+   cmdline_fixed_string_t mtu;
+   uint8_t port_id;
+   uint16_t value;
+};
+
+static void
+cmd_config_mtu_parsed(void *parsed_result,
+ __attribute__((unused)) struct cmdline *cl,
+ __attribute__((unused)) void *data)
+{
+   struct cmd_config_mtu_result *res = parsed_result;
+
+   if (res->value < ETHER_MIN_LEN) {
+   printf("mtu cannot be less than %d\n", ETHER_MIN_LEN);
+   return;
+   }
+   port_mtu_set(res->port_id, res->value);
+}
+
+cmdline_parse_token_string_t cmd_config_mtu_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_config_mtu_result, port,
+"port");
+cmdline_parse_token_string_t cmd_config_mtu_keyword =
+   TOKEN_STRING_INITIALIZER(struct cmd_config_mtu_result, keyword,
+"config");
+cmdline_parse_token_string_t cmd_config_mtu_mtu =
+   TOKEN_STRING_INITIALIZER(struct cmd_config_mtu_result, mtu,
+"mtu");
+cmdline_parse_token_num_t cmd_config_mtu_port_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_config_mtu_result, port_id, UINT8);
+cmdline_parse_token_num_t cmd_config_mtu_value =
+   TOKEN_NUM_INITIALIZER(struct cmd_config_mtu_result, value, UINT16);
+
+cmdline_parse_inst_t cmd_config_mtu = {
+   .f = cmd_config_mtu_parsed,
+   .data = NULL,
+   .help_str = "port config mtu value",
+   .tokens = {
+   (void *)&cmd_config_mtu_port,
+   (void *)&cmd_config_mtu_keyword,
+   (void *)&cmd_config_mtu_mtu,
+   (void *)&cmd_config_mtu_port_id,
+   (void *)&cmd_config_mtu_value,
+   NULL,
+   },
+};
+
 /* *** configure rx mode *** */
 struct cmd_config_rx_mode_flag {
cmdline_fixed_string_t port;
@@ -6553,6 +6606,7 @@ cmdline_parse_ctx_t main_ctx[] = {
(cmdline_parse_inst_t *)&cmd_config_speed_all,
(cmdline_parse_inst_t *)&cmd_config_speed_specific,
(cmdline_parse_inst_t *)&cmd_config_rx_tx,
+   (cmdline_parse_inst_t *)&cmd_config_mtu,
(cmdline_parse_inst_t *)&cmd_config_max_pkt_len,
(cmdline_parse_inst_t *)&cmd_config_rx_mode_flag,
(cmdline_parse_inst_t *)&cmd_config_rss,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index b654eb8..0023ab2 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -503,6 +503,19 @@ port_reg_set(portid_t port_id, uint32_t reg_off, uint32_t 
reg_v)
display_port_reg_value(port_id, reg_off, reg_v);
 }

+void
+port_mtu_set(portid_t port_id, uint16_t mtu)
+{
+   int diag;
+
+   if (port_id_is_invalid(port_id))
+   return;
+   diag = rte_eth_dev_set_mtu(port_id, mtu);
+   if (diag == 0)
+   return;
+   printf("Set MTU failed. diag=%d\n", diag);
+}
+
 /*
  * RX/TX ring descriptors display functions.
  */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index aa1942b..5839f93 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -454,7 +454,7 @@ void fwd_config_setup(void);
 void set_def_fwd_config(void);
 int init_fwd_streams(void);

-
+void port_mtu_set(portid_t port_id, uint16_t mtu);
 void port_reg_bit_display(portid_t port_id, uint32_t reg_off, uint8_t bit_pos);
 void port_reg_bit_set(portid_t port_id, uint32_t reg_off, uint8_t bit_pos,
  uint8_t bit_v);
-- 
1.7.10.4

[dpdk-dev] [PATCH] testpmd: Simplify logic in error branch

2014-06-17 Thread Bruce Richardson

Simplify the logic in the error checking branch. Rather than having a
single error branch which checked both RX and TX conditions and made
extensive use of the ? operator, move the error checking explicitly into
the RX and TX individual branches.

The original code caused compilation issues when attempting compilation
with clang on BSD, due to type mismatches. This change fixes the issues
while making the code easier to read and maintain overall.

Signed-off-by: Bruce Richardson 
---
 app/test-pmd/parameters.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index aa0e2bf..3ff4f81 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -316,14 +316,14 @@ parse_queue_stats_mapping_config(const char *q_arg, int 
is_rx)
return -1;
}

-   if (is_rx ? (nb_rx_queue_stats_mappings >= 
MAX_RX_QUEUE_STATS_MAPPINGS) :
-   (nb_tx_queue_stats_mappings >= 
MAX_TX_QUEUE_STATS_MAPPINGS)) {
-   printf("exceeded max number of %s queue statistics 
mappings: %hu\n",
-  is_rx ? "RX" : "TX",
-  is_rx ? nb_rx_queue_stats_mappings : 
nb_tx_queue_stats_mappings);
-   return -1;
-   }
if (!is_rx) {
+   if ((nb_tx_queue_stats_mappings >=
+   MAX_TX_QUEUE_STATS_MAPPINGS)) {
+   printf("exceeded max number of TX queue "
+   "statistics mappings: %hu\n",
+   nb_tx_queue_stats_mappings);
+   return -1;
+   }

tx_queue_stats_mappings_array[nb_tx_queue_stats_mappings].port_id =
(uint8_t)int_fld[FLD_PORT];

tx_queue_stats_mappings_array[nb_tx_queue_stats_mappings].queue_id =
@@ -333,6 +333,13 @@ parse_queue_stats_mapping_config(const char *q_arg, int 
is_rx)
++nb_tx_queue_stats_mappings;
}
else {
+   if ((nb_rx_queue_stats_mappings >=
+   MAX_RX_QUEUE_STATS_MAPPINGS)) {
+   printf("exceeded max number of RX queue "
+   "statistics mappings: %hu\n",
+   nb_rx_queue_stats_mappings);
+   return -1;
+   }

rx_queue_stats_mappings_array[nb_rx_queue_stats_mappings].port_id =
(uint8_t)int_fld[FLD_PORT];

rx_queue_stats_mappings_array[nb_rx_queue_stats_mappings].queue_id =
-- 
1.9.3
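
The restructuring in the patch above can be shown in isolation. This is a
minimal sketch under stated assumptions, not the testpmd code: the sketch_
names, the struct layout and the limit of 4 are hypothetical. The point is that
each direction checks its own counter against its own limit inside its own
branch, so no conditional expression has to pick between two differently typed
operands.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_MAX_MAPPINGS 4 /* hypothetical limit, small for illustration */

struct sketch_mapping {
	uint8_t port_id;
	uint16_t queue_id;
	uint8_t stats_counter_id;
};

static struct sketch_mapping sketch_tx_map[SKETCH_MAX_MAPPINGS];
static struct sketch_mapping sketch_rx_map[SKETCH_MAX_MAPPINGS];
static uint16_t sketch_nb_tx;
static uint16_t sketch_nb_rx;

/* Append a mapping to the TX or RX table; return -1 if that table is full. */
int
sketch_add_mapping(int is_rx, const struct sketch_mapping *m)
{
	assert(m != NULL);

	if (!is_rx) {
		if (sketch_nb_tx >= SKETCH_MAX_MAPPINGS) {
			printf("exceeded max number of TX queue "
			       "statistics mappings: %" PRIu16 "\n",
			       sketch_nb_tx);
			return -1;
		}
		sketch_tx_map[sketch_nb_tx++] = *m;
	} else {
		if (sketch_nb_rx >= SKETCH_MAX_MAPPINGS) {
			printf("exceeded max number of RX queue "
			       "statistics mappings: %" PRIu16 "\n",
			       sketch_nb_rx);
			return -1;
		}
		sketch_rx_map[sketch_nb_rx++] = *m;
	}
	return 0;
}
```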

[dpdk-dev] [PATCH] EAL: add format(printf) attrib. to appropriate fns

2014-06-17 Thread Bruce Richardson

Mark the rte_log, cmdline_printf and rte_snprintf functions as
being printf-style functions. This causes compilation errors
due to mis-matched parameter types, so the parameter types are
fixed where appropriate.
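
As an illustration of the mechanism, here is a minimal sketch of a printf-style
wrapper carrying the GCC/clang format attribute. The function sketch_snprintf()
is hypothetical and is not the rte_snprintf implementation. In
format(printf, 3, 4), the 3 is the position of the format string and the 4 is
the position of the first variadic argument.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper; the attribute lets the compiler type-check callers. */
int sketch_snprintf(char *buf, size_t size, const char *fmt, ...)
	__attribute__((format(printf, 3, 4)));

int
sketch_snprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && fmt != NULL);

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	return n;
}
```

With the attribute in place, a call such as
sketch_snprintf(buf, sizeof(buf), "%u", sizeof(buf)) draws a -Wformat warning
on platforms where size_t is not unsigned int, and passing a non-literal string
as the format can be flagged by -Wformat-security. These are the kinds of call
sites this patch corrects with "%zu" and an explicit "%s".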

Signed-off-by: Bruce Richardson 
---
 app/cmdline_test/commands.c|  2 +-
 app/test/test_cmdline_etheraddr.c  |  2 +-
 app/test/test_eal_flags.c  |  3 ++-
 app/test/test_mp_secondary.c   |  3 ++-
 examples/exception_path/main.c |  4 +--
 examples/netmap_compat/lib/compat_netmap.c |  8 +++---
 examples/qos_sched/args.c  | 12 ++---
 examples/qos_sched/init.c  | 17 ++---
 examples/qos_sched/main.c  | 34 ++
 lib/librte_cmdline/cmdline.h   |  3 ++-
 lib/librte_eal/common/include/rte_log.h|  3 ++-
 lib/librte_eal/common/include/rte_string_fns.h |  3 ++-
 lib/librte_kni/rte_kni.c   |  8 +++---
 13 files changed, 57 insertions(+), 45 deletions(-)

diff --git a/app/cmdline_test/commands.c b/app/cmdline_test/commands.c
index 66c8fb9..404f51a 100644
--- a/app/cmdline_test/commands.c
+++ b/app/cmdline_test/commands.c
@@ -321,7 +321,7 @@ cmd_get_history_bufsize_parsed(__attribute__((unused)) void 
*parsed_result,
struct cmdline *cl,
__attribute__((unused)) void *data)
 {
-   cmdline_printf(cl, "History buffer size: %u\n",
+   cmdline_printf(cl, "History buffer size: %zu\n",
sizeof(cl->rdl.history_buf));
 }

diff --git a/app/test/test_cmdline_etheraddr.c 
b/app/test/test_cmdline_etheraddr.c
index c67a0a5..739f249 100644
--- a/app/test/test_cmdline_etheraddr.c
+++ b/app/test/test_cmdline_etheraddr.c
@@ -147,7 +147,7 @@ test_parse_etheraddr_invalid_param(void)

/* copy string to buffer */
rte_snprintf(buf, sizeof(buf), "%s",
-   ether_addr_valid_strs[0]);
+   ether_addr_valid_strs[0].str);

ret = cmdline_parse_etheraddr(NULL, buf, NULL);
if (ret == -1) {
diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index ea4a567..2401556 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -268,7 +268,8 @@ get_current_prefix(char * prefix, int size)

/* copy string all the way from second char up to start of _config */
rte_snprintf(prefix, size, "%.*s",
-   strnlen(buf, sizeof(buf)) - sizeof("_config"), &buf[1]);
+   (int)(strnlen(buf, sizeof(buf)) - sizeof("_config")),
+   &buf[1]);

return prefix;
 }
diff --git a/app/test/test_mp_secondary.c b/app/test/test_mp_secondary.c
index 9d7d28e..5ec99a2 100644
--- a/app/test/test_mp_secondary.c
+++ b/app/test/test_mp_secondary.c
@@ -103,7 +103,8 @@ get_current_prefix(char * prefix, int size)

/* copy string all the way from second char up to start of _config */
rte_snprintf(prefix, size, "%.*s",
-   strnlen(buf, sizeof(buf)) - sizeof("_config"), &buf[1]);
+   (int)(strnlen(buf, sizeof(buf)) - sizeof("_config")),
+   &buf[1]);

return prefix;
 }
diff --git a/examples/exception_path/main.c b/examples/exception_path/main.c
index 78ed91d..3ece2ac 100644
--- a/examples/exception_path/main.c
+++ b/examples/exception_path/main.c
@@ -229,7 +229,7 @@ static int tap_create(char *name)
ifr.ifr_flags = IFF_TAP | IFF_NO_PI;

if (name && *name)
-   rte_snprintf(ifr.ifr_name, IFNAMSIZ, name);
+   rte_snprintf(ifr.ifr_name, IFNAMSIZ, "%s", name);

ret = ioctl(fd, TUNSETIFF, (void *) &ifr);
if (ret < 0) {
@@ -238,7 +238,7 @@ static int tap_create(char *name)
}

if (name)
-   rte_snprintf(name, IFNAMSIZ, ifr.ifr_name);
+   rte_snprintf(name, IFNAMSIZ, "%s", ifr.ifr_name);

return fd;
 }
diff --git a/examples/netmap_compat/lib/compat_netmap.c 
b/examples/netmap_compat/lib/compat_netmap.c
index 946fab4..190151e 100644
--- a/examples/netmap_compat/lib/compat_netmap.c
+++ b/examples/netmap_compat/lib/compat_netmap.c
@@ -717,8 +717,8 @@ rte_netmap_init_port(uint8_t portid, const struct 
rte_netmap_port_conf *conf)

if (ret < 0) {
RTE_LOG(ERR, USER1,
-   "Couldn't configure TX queue %hhu of "
-   "port %hu\n",
+   "Couldn't configure TX queue %"PRIu16" of "
+   "port %"PRIu8"\n",
i, portid);
return (ret);
}
@@ -728,8 +728,8 @@ rte_netmap_init_port(uint8_t portid, const struct 
rte_netmap_port_conf *conf)

if (ret < 0) {
RTE_LOG(ERR, USER1,
-   "Couldn't confi

[dpdk-dev] vfio detection

2014-06-17 Thread Neil Horman

On Tue, Jun 17, 2014 at 05:45:55PM +, Richardson, Bruce wrote:
> 
> 
> > -Original Message-
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Tuesday, June 17, 2014 10:42 AM
> > To: Richardson, Bruce
> > Cc: Burakov, Anatoly; dev at dpdk.org
> > Subject: Re: [dpdk-dev] vfio detection
> > 
> > On Tue, Jun 17, 2014 at 04:38:38PM +, Richardson, Bruce wrote:
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Richardson, 
> > > > Bruce
> > > > Sent: Tuesday, June 17, 2014 9:29 AM
> > > > To: Burakov, Anatoly; dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] vfio detection
> > > >
> > > > > -Original Message-
> > > > > From: Burakov, Anatoly
> > > > > Sent: Tuesday, June 17, 2014 1:40 AM
> > > > > To: Richardson, Bruce; dev at dpdk.org
> > > > > Subject: RE: vfio detection
> > > > >
> > > > > Hi Bruce,
> > > > >
> > > > > > I have a number of NIC ports which were working correctly yesterday 
> > > > > > and
> > > > are
> > > > > > bound correctly to the igb_uio driver - and I want to keep using 
> > > > > > them
> > > > > > through the igb_uio driver for now, not vfio. However, whenever I 
> > > > > > run a
> > > > > > dpdk application today, I find that the vfio kernel module is 
> > > > > > getting
> > loaded
> > > > > > each time - even after I manually remove it, and verify that it has 
> > > > > > been
> > > > > > removed by checking lsmod. Is this expected? If so, why are we 
> > > > > > loading
> > the
> > > > > > vfio driver when I just want to continue using igb_uio which works 
> > > > > > fine?
> > > > >
> > > > > Can you elaborate a bit on what do you mean by "loading vfio driver"? 
> > > > > Do
> > you
> > > > > mean the vfio-pci kernel gets loaded by DPDK? I certainly didn't put 
> > > > > in any
> > code
> > > > > that would automatically load that driver, and certainly not binding 
> > > > > devices
> > to
> > > > it.
> > > >
> > > > The kernel module called just "vfio" is constantly getting reloaded, 
> > > > and there
> > is
> > > > always a "/dev/vfio" directory, which triggers the vfio code handling 
> > > > every
> > time I
> > > > run dpdk.
> > > >
> > > > >
> > > > > > Secondly, then, when testpmd or any other app loads, it 
> > > > > > automatically
> > tries
> > > > > > to map the NIC using vfio and then aborts on the very first NIC 
> > > > > > port when
> > it
> > > > > > fails to do so.
> > > > >
> > > > > This shouldn't happen, unless you have a device bound to VFIO and have
> > > > another
> > > > > device in the same IOMMU group that is bound to something else. Can 
> > > > > you
> > > > > provide a log of what you are seeing?
> > > >
> > > > Log of testpmd run attached.
> > >
> > > Log got stripped from mail, including below instead.
> > >
> > > Script started on Tue 17 Jun 2014 17:23:54 IST
> > >
> > > bruce at silpixa00372841:dpdk.org$ ./tools/dpdk_nic_bind.py --status
> > >
> > > Network devices using DPDK-compatible driver
> > > 
> > > :84:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe
> > > :87:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe
> > > :8b:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe
> > > :8e:00.0 'Ethernet Server Adapter X520-4' drv=igb_uio unused=ixgbe
> > >
> > > Network devices using kernel driver
> > > ===
> > > :04:00.0 'I350 Gigabit Network Connection' if=em0 drv=igb
> > unused=igb_uio *Active*
> > > :04:00.1 'I350 Gigabit Network Connection' if=ens2f1 drv=igb
> > unused=igb_uio
> > > :04:00.2 'I350 Gigabit Network Connection' if=ens2f2 drv=igb
> > unused=igb_uio
> > > :04:00.3 'I350 Gigabit Network Connection' if=ens2f3 drv=igb
> > unused=igb_uio
> > >
> > > Other network devices
> > > =
> > > :0a:00.1 'DH8900CC Null Device' unused=igb_uio
> > > :0b:00.1 'DH8900CC Null Device' unused=igb_uio
> > > :0c:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
> > unused=ixgbe,igb_uio
> > > :0c:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
> > unused=ixgbe,igb_uio
> > > :84:00.1 'Ethernet Server Adapter X520-4' unused=ixgbe,igb_uio
> > > :87:00.1 'Ethernet Server Adapter X520-4' unused=ixgbe,igb_uio
> > > :8b:00.1 'Ethernet Server Adapter X520-4' unused=ixgbe,igb_uio
> > > :8e:00.1 'Ethernet Server Adapter X520-4' unused=ixgbe,igb_uio
> > >
> > >
> > > bruce at silpixa00372841:dpdk.org$ sudo ./x86_64-native-linuxapp-
> > gcc/app/testpmd -c F00 -n 4 -- -i
> > > EAL: Detected lcore 0 as core 0 on socket 0
> > > EAL: Detected lcore 1 as core 1 on socket 0
> > > EAL: Detected lcore 2 as core 2 on socket 0
> > > EAL: Detected lcore 3 as core 3 on socket 0
> > > EAL: Detected lcore 4 as core 4 on socket 0
> > > EAL: Detected lcore 5 as core 5 on socket 0
> > > EAL: Detected lcore 6 as core 6 on socket 0
> > > EAL: Detected lcore 7 as core 7 on socket 0
> > > EAL: Detec

[dpdk-dev] [PATCH] vfio: correct system call error checking

2014-06-17 Thread Neil Horman

Noticed today that ioctl error code return checking was incorrect in some of the
vfio code.  ioctl can return a negative value if the system detects an error
before the target device/driver can produce a return code.  The dpdk vfio code
only checks specifically for the values that it expects, which leaves it open to
accepting unexpected error codes as success.  For instance, if the vfio layer
noted that the iommu driver hadn't finished registering yet, it would return an
-EINVAL error code, but the dpdk would accept that as success, because it wasn't
0.

Fix this to specifically check for < 0 error codes.

Signed-off-by: Neil Horman 
CC: Thomas Monjalon 
---
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 4de6061..65aa8ad 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -319,16 +319,16 @@ pci_vfio_get_container_fd(void)

/* check VFIO API version */
ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);
-   if (ret != VFIO_API_VERSION) {
-   RTE_LOG(ERR, EAL, "  unknown VFIO API version!\n");
+   if ((ret < 0) || (ret != VFIO_API_VERSION)) {
+   RTE_LOG(ERR, EAL, "  unknown VFIO API version! errno = 
%d\n", errno);
close(vfio_container_fd);
return -1;
}

/* check if we support IOMMU type 1 */
ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION, 
VFIO_TYPE1_IOMMU);
-   if (!ret) {
-   RTE_LOG(ERR, EAL, "  unknown IOMMU driver!\n");
+   if (ret <= 0) {
+   RTE_LOG(ERR, EAL, "  unknown IOMMU driver! errno = 
%d\n", errno);
close(vfio_container_fd);
return -1;
}
-- 
1.8.3.1
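
The two checks in the patch above separate three outcomes of an ioctl-style
probe: the call itself failed, the call succeeded with an unexpected value, or
the call returned what was wanted. A minimal sketch of that classification
follows. The sketch_ names are hypothetical, and the functions take a plain
integer in place of a real ioctl() return value so that the example builds
without the VFIO headers.

```c
#include <assert.h>

enum sketch_probe {
	SKETCH_PROBE_OK,            /* call succeeded with the wanted value */
	SKETCH_PROBE_SYSCALL_ERROR, /* call failed; errno describes why */
	SKETCH_PROBE_UNEXPECTED     /* call succeeded, value not acceptable */
};

/* Classify the result of a "get API version" style call. */
enum sketch_probe
sketch_classify_version(int ret, int expected)
{
	assert(expected >= 0); /* a valid version is never negative */

	if (ret < 0)
		return SKETCH_PROBE_SYSCALL_ERROR;
	if (ret != expected)
		return SKETCH_PROBE_UNEXPECTED;
	return SKETCH_PROBE_OK;
}

/* Classify the result of a "check extension" style call (0 = unsupported). */
enum sketch_probe
sketch_classify_extension(int ret)
{
	if (ret < 0)
		return SKETCH_PROBE_SYSCALL_ERROR;
	if (ret == 0)
		return SKETCH_PROBE_UNEXPECTED;
	return SKETCH_PROBE_OK;
}
```

For the version check, a negative return can never equal a non-negative
expected version, so a plain inequality test already rejects it. For the
extension check, a test for zero alone would treat a negative return as
success, which is why that check needs ret <= 0.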

[dpdk-dev] DPDK Support for the i217 ?

2014-06-17 Thread Palin, Francois

Hi all,



Just checking once more on a question I asked back in January:



We would like to know if DPDK support for the i217 will be provided anytime 
soon.

The Supported NICs list doesn't show the i217. Intel DPDK Release Notes don't 
mention the i217 either.



The answer I received back then was:

Support for the i217 will be available in the Intel DPDK version 1.7.  Not sure 
of the timeline, but next release.

Is this still planned for version 1.7? If so, is there any estimate of when
release 1.7 will be available?



Thank you !



François Palin

[dpdk-dev] [PATCH] vfio: correct system call error checking

2014-06-17 Thread Richardson, Bruce

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> Sent: Tuesday, June 17, 2014 12:04 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] vfio: correct system call error checking
> 
> Noticed today that ioctl error code return checking was incorrect in some of 
> the
> vfio code.  ioctl can return a negative value if the system detects an error
> before the target device/driver can produce a return code.  The dpdk vfio code
> only checks specifically for the values that it expects, which leaves it open 
> to
> accepting unexpected error codes as success.  For instance, if the vfio layer
> noted that the iommu driver hadn't finished registering yet, it would return 
> an
> -EINVAL error code, but the dpdk would accept that as success, because it
> wasn't
> 0.
> 
> Fix this to specifically check for < 0 error codes
> 
> Signed-off-by: Neil Horman 
> CC: Thomas Monjalon 
> ---
>  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> index 4de6061..65aa8ad 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> @@ -319,16 +319,16 @@ pci_vfio_get_container_fd(void)
> 
>   /* check VFIO API version */
>   ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);
> - if (ret != VFIO_API_VERSION) {
> - RTE_LOG(ERR, EAL, "  unknown VFIO API version!\n");
> + if ((ret < 0) || (ret != VFIO_API_VERSION)) {
> + RTE_LOG(ERR, EAL, "  unknown VFIO API version! errno
> = %d\n", errno);
>   close(vfio_container_fd);
>   return -1;
>   }

Not sure how this change improves things, since the existing check will already 
trigger an error on all values <0. Can you please clarify why you think this 
needs to be changed?

> 
>   /* check if we support IOMMU type 1 */
>   ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION,
> VFIO_TYPE1_IOMMU);
> - if (!ret) {
> - RTE_LOG(ERR, EAL, "  unknown IOMMU driver!\n");
> + if (ret <= 0) {
> + RTE_LOG(ERR, EAL, "  unknown IOMMU driver! errno =
> %d\n", errno);
>   close(vfio_container_fd);
>   return -1;
>   }

Ack on this part of the change. The previous code was incorrect according to
what I read in the docs for VFIO.

/Bruce

[dpdk-dev] [PATCH] vfio: correct system call error checking

2014-06-17 Thread Neil Horman

On Tue, Jun 17, 2014 at 08:21:29PM +, Richardson, Bruce wrote:
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> > Sent: Tuesday, June 17, 2014 12:04 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH] vfio: correct system call error checking
> > 
> > Noticed today that ioctl error code return checking was incorrect in some 
> > of the
> > vfio code.  ioctl can return a negative value if the system detects an error
> > before the target device/driver can produce a return code.  The dpdk vfio 
> > code
> > only checks specifically for the values that it expects, which leaves it 
> > open to
> > accepting unexpected error codes as success.  For instance, if the vfio 
> > layer
> > noted that the iommu driver hadn't finished registering yet, it would 
> > return an
> > -EINVAL error code, but the dpdk would accept that as success, because it
> > wasn't
> > 0.
> > 
> > Fix this to specifically check for < 0 error codes
> > 
> > Signed-off-by: Neil Horman 
> > CC: Thomas Monjalon 
> > ---
> >  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > index 4de6061..65aa8ad 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > @@ -319,16 +319,16 @@ pci_vfio_get_container_fd(void)
> > 
> > /* check VFIO API version */
> > ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);
> > -   if (ret != VFIO_API_VERSION) {
> > -   RTE_LOG(ERR, EAL, "  unknown VFIO API version!\n");
> > +   if ((ret < 0) || (ret != VFIO_API_VERSION)) {
> > +   RTE_LOG(ERR, EAL, "  unknown VFIO API version! errno
> > = %d\n", errno);
> > close(vfio_container_fd);
> > return -1;
> > }
> 
> Not sure how this change improves things, since the existing check will 
> already trigger an error on all values <0. Can you please clarify why you 
> think this needs to be changed?
Ah, my bad, the ret < 0 is superfluous, as the != already catches it, but the
log message change is valuable in that it differentiates bad API version
detection from other system errors.  I can respin that if you like.
Neil

> 
> > 
> > /* check if we support IOMMU type 1 */
> > ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION,
> > VFIO_TYPE1_IOMMU);
> > -   if (!ret) {
> > -   RTE_LOG(ERR, EAL, "  unknown IOMMU driver!\n");
> > +   if (ret <= 0) {
> > +   RTE_LOG(ERR, EAL, "  unknown IOMMU driver! errno =
> > %d\n", errno);
> > close(vfio_container_fd);
> > return -1;
> > }
> 
> Ack on this part of the change. The previous code was incorrect according to
> what I read in the docs for VFIO.
> 
> /Bruce
>

[dpdk-dev] [PATCH] vfio: correct system call error checking

2014-06-17 Thread Richardson, Bruce

> -----Original Message-----
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Tuesday, June 17, 2014 1:40 PM
> To: Richardson, Bruce
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] vfio: correct system call error checking
> 
> On Tue, Jun 17, 2014 at 08:21:29PM +, Richardson, Bruce wrote:
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Neil Horman
> > > Sent: Tuesday, June 17, 2014 12:04 PM
> > > To: dev at dpdk.org
> > > Subject: [dpdk-dev] [PATCH] vfio: correct system call error checking
> > >
> > > > Noticed today that ioctl error code return checking was incorrect in some
> > > > of the vfio code.  ioctl can return a negative value if the system detects
> > > > an error before the target device/driver can produce a return code.  The
> > > > dpdk vfio code only checks specifically for the values that it expects,
> > > > which leaves it open to accepting unexpected error codes as success.  For
> > > > instance, if the vfio layer noted that the iommu driver hadn't finished
> > > > registering yet, it would return an -EINVAL error code, but the dpdk would
> > > > accept that as success, because it wasn't 0.
> > >
> > > Fix this to specifically check for < 0 error codes
> > >
> > > Signed-off-by: Neil Horman 
> > > CC: Thomas Monjalon 
> > > ---
> > >  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 8 
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > > b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > > index 4de6061..65aa8ad 100644
> > > --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > > +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
> > > @@ -319,16 +319,16 @@ pci_vfio_get_container_fd(void)
> > >
> > >   /* check VFIO API version */
> > >   ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);
> > > - if (ret != VFIO_API_VERSION) {
> > > - RTE_LOG(ERR, EAL, "  unknown VFIO API version!\n");
> > > + if ((ret < 0) || (ret != VFIO_API_VERSION)) {
> > > + RTE_LOG(ERR, EAL, "  unknown VFIO API version! errno
> > > = %d\n", errno);
> > >   close(vfio_container_fd);
> > >   return -1;
> > >   }
> >
> > > Not sure how this change improves things, since the existing check will
> > > already trigger an error on all values <0. Can you please clarify why you
> > > think this needs to be changed?
> Ah, my bad, the ret < 0 is superfluous, as the != already catches it, but the
> log message change is valuable in that it differentiates bad API version
> detection from other system errors.  I can respin that if you like.
> Neil

Perhaps a respin separating out ioctl errors vs version errors might be good, 
giving different error messages for each case.

/Bruce
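
A rough sketch of what such a split could look like, not the actual respin. The enum, the helper names and the SKETCH_VFIO_API_VERSION placeholder are hypothetical (the real code would use VFIO_API_VERSION from <linux/vfio.h> and RTE_LOG rather than fprintf). The idea is only to separate "the ioctl itself failed" from "the ioctl worked but reported a version we do not know".

```c
#include <stdio.h>
#include <string.h>

/* Assumption: placeholder for VFIO_API_VERSION from <linux/vfio.h>. */
#define SKETCH_VFIO_API_VERSION 0

enum api_check {
	API_CHECK_OK,          /* ioctl succeeded, version matches */
	API_CHECK_IOCTL_ERROR, /* ioctl itself failed, errno is meaningful */
	API_CHECK_BAD_VERSION  /* ioctl succeeded, version is unexpected */
};

/* Classify the return value of the VFIO_GET_API_VERSION ioctl. */
static enum api_check
classify_api_version(int ret)
{
	if (ret < 0)
		return API_CHECK_IOCTL_ERROR;
	if (ret != SKETCH_VFIO_API_VERSION)
		return API_CHECK_BAD_VERSION;
	return API_CHECK_OK;
}

/*
 * Emit a distinct message for each failure case.
 * saved_errno is the errno value captured right after the ioctl call.
 * Returns 0 on success, -1 on either failure.
 */
static int
report_api_version(int ret, int saved_errno)
{
	switch (classify_api_version(ret)) {
	case API_CHECK_IOCTL_ERROR:
		fprintf(stderr, "  VFIO_GET_API_VERSION failed: %s (errno = %d)\n",
			strerror(saved_errno), saved_errno);
		return -1;
	case API_CHECK_BAD_VERSION:
		fprintf(stderr, "  unknown VFIO API version %d\n", ret);
		return -1;
	default:
		return 0;
	}
}
```

The caller would capture errno immediately after the ioctl and close the container fd when report_api_version() returns -1, as the existing code already does.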

[dpdk-dev] [PATCH v3 0/7] add mtu and flow control handlers

2014-06-17 Thread Ananyev, Konstantin


> This patchset introduces 3 new ethdev operations: flow control parameters
> retrieval and mtu get/set operations.
> 
> Changes since v1:
> - compute min rx buffer size at ethdev level (to simplify pmd mtu checks)
> - introduce enable_scatter rx mode so that we can advise pmd to configure
>   scatter mode
> - rework mtu get/set operations (based on Konstantin comments)
> - pass checkpatch.pl checks
> 
> Changes since v2:
> - rebase on top of master
> - fix min_rx_buf_size computation (patch 3)
> - fix frame size checks for ixgbe so that vlan and double vlan frames can be
>   received (patch 5 and 6)
> - add a new ETHER_MIN_MTU macro in rte_ether.h (patch 5 and 6)
> 
> 
> --
> David Marchand
> 
> David Marchand (3):
>   ethdev: add autoneg parameter in flow ctrl accessors
>   ethdev: store min rx buffer size
>   ethdev: introduce enable_scatter rx mode
> 
> Ivan Boule (2):
>   ixgbe: add set_mtu to ixgbevf
>   app/testpmd: allow to configure mtu
> 
> Samuel Gauthier (1):
>   ethdev: add mtu accessors
> 
> Zijie Pan (1):
>   ethdev: retrieve flow control configuration
> 
>  app/test-pmd/cmdline.c  |   54 +
>  app/test-pmd/config.c   |   13 
>  app/test-pmd/testpmd.h  |2 +-
>  lib/librte_ether/rte_ethdev.c   |   77 +--
>  lib/librte_ether/rte_ethdev.h   |   65 +++-
>  lib/librte_ether/rte_ether.h|2 +
>  lib/librte_pmd_e1000/em_ethdev.c|   89 +
>  lib/librte_pmd_e1000/em_rxtx.c  |5 ++
>  lib/librte_pmd_e1000/igb_ethdev.c   |  100 
>  lib/librte_pmd_e1000/igb_rxtx.c |   10 +++
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  145 ++-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c   |   27 ++-
>  12 files changed, 573 insertions(+), 16 deletions(-)
> 
> --

Acked-by: Konstantin Ananyev
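
To make the frame size point in the changelog concrete, here is a small sketch of an MTU check that leaves room for VLAN and double VLAN tags. It is not taken from the patches: the SKETCH_* constants, both function names and the hw_max_frame parameter are hypothetical. The header, CRC and tag lengths are the standard Ethernet values, and the minimum MTU of 68 is an assumption about what ETHER_MIN_MTU is set to.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ETHER_HDR_LEN 14 /* dst MAC + src MAC + ethertype */
#define SKETCH_ETHER_CRC_LEN 4  /* frame check sequence */
#define SKETCH_VLAN_TAG_LEN  4  /* one 802.1Q tag */
#define SKETCH_ETHER_MIN_MTU 68 /* assumption: value of ETHER_MIN_MTU */

/*
 * Largest frame the port must accept for a given MTU, allowing for
 * two stacked VLAN tags on top of the Ethernet header and CRC.
 */
static uint32_t
mtu_to_max_frame(uint16_t mtu)
{
	return (uint32_t)mtu + SKETCH_ETHER_HDR_LEN + SKETCH_ETHER_CRC_LEN +
		2 * SKETCH_VLAN_TAG_LEN;
}

/*
 * An MTU is acceptable if it is not below the minimum and the
 * resulting frame still fits in what the hardware can receive.
 * hw_max_frame is a hypothetical per-device limit in bytes.
 */
static bool
mtu_is_valid(uint16_t mtu, uint32_t hw_max_frame)
{
	if (mtu < SKETCH_ETHER_MIN_MTU)
		return false;
	return mtu_to_max_frame(mtu) <= hw_max_frame;
}
```

Under these assumptions an MTU of 1500 needs room for a 1526-byte frame, so a check that only allows header plus CRC would reject tagged frames of full size.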

[dpdk-dev] [PATCH 0/8] virtio driver phase 2

2014-06-17 Thread Stephen Hemminger

On Fri, 13 Jun 2014 18:06:17 -0700
Stephen Hemminger  wrote:

> This is second group of patches for cleaning up virtio driver
> prior to the functionality changes in next phase.
> 
> 

Ping.. no comments 

I have a lot of patches stacked and ready or in testing.
And I am trying to pace them out at a rate at which review is actually possible.
