[dpdk-dev] [PATCH] net/mlx5: fix a wrong error handler

2017-02-28 Thread Nelio Laranjeiro
The returned error code was wrongly handled causing the PMD to refuse to
start.

Fixes: 91572d2a0b1a ("net/mlx5: fix startup when flow cannot be applied")

CC: sta...@dpdk.org
Signed-off-by: Nelio Laranjeiro 
---
 drivers/net/mlx5/mlx5_trigger.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 683b57c..0acbf28 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -87,7 +87,7 @@ mlx5_dev_start(struct rte_eth_dev *dev)
if (dev->data->dev_conf.fdir_conf.mode != RTE_FDIR_MODE_NONE)
priv_fdir_enable(priv);
err = priv_flow_start(priv);
-   if (!err) {
+   if (err) {
priv->started = 0;
ERROR("%p: an error occurred while configuring flows:"
  " %s",
-- 
2.1.4



Re: [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version

2017-02-28 Thread Jan Blunck
On Wed, Feb 22, 2017 at 2:24 PM, Christian Ehrhardt
 wrote:
> --- a/mk/rte.lib.mk
> +++ b/mk/rte.lib.mk
> @@ -40,6 +40,12 @@ EXTLIB_BUILD ?= n
>  # VPATH contains at least SRCDIR
>  VPATH += $(SRCDIR)
>
> +ifneq ($(CONFIG_RTE_MAJOR_ABI),)
> +ifneq ($(LIBABIVER),)
> +LIBABIVER := $(CONFIG_RTE_MAJOR_ABI)
> +endif
> +endif
> +
>  ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
>  LIB := $(patsubst %.a,%.so.$(LIBABIVER),$(LIB))
>  ifeq ($(EXTLIB_BUILD),n)
> @@ -156,11 +162,7 @@ $(RTE_OUTPUT)/lib/$(LIB): $(LIB)
> @[ -d $(RTE_OUTPUT)/lib ] || mkdir -p $(RTE_OUTPUT)/lib
> $(Q)cp -f $(LIB) $(RTE_OUTPUT)/lib
>  ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
> -ifeq ($(CONFIG_RTE_NEXT_ABI)$(EXTLIB_BUILD),yn)
> -   $(Q)ln -s -f $< $(basename $(basename $@))
> -else
> -   $(Q)ln -s -f $< $(basename $@)
> -endif
> +   $(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/')
>  endif
>

In case CONFIG_RTE_NEXT_ABI=y is set this is actually generating
shared objects with suffix:

  .so.$(CONFIG_RTE_MAJOR_ABI).1

I don't think that this is the intention.


Re: [dpdk-dev] [PATCH v3 00/10] Rework vdev probing to use rte_bus infrastructure

2017-02-28 Thread Shreyansh Jain

On Monday 27 February 2017 06:39 PM, Jan Blunck wrote:

On Sat, Feb 25, 2017 at 11:28 AM, Jan Blunck  wrote:

With the rte_bus infrastructure present in 17.02 it is possible to refactor
the virtual device probing into a bus. This series also introduces the
rte_vdev_device to better keep track of devices.

This patchset depends on:
http://dpdk.org/dev/patchwork/patch/20416/
http://dpdk.org/dev/patchwork/patch/20417/

Changes since version 2:
 * implicit bus registration through rte_eal_vdrv_register()


On a second thought I don't think that this is correct though since it
opens up the possibility of racing an alternative "virtual" bus. I
don't think that this is a good thing though. I'll fix this in v4.

Thoughts?


I prefer the RTE_REGISTER* way.
The issue of duplicate bus remains whether we use the macro or the
implicit way.

If you use RTE_*, do you think that duplicate registration issue
is worth fixing?
(It would mean rte_bus_register to return error which the caller
would then need to handle).



Thanks,
Jan


 * explicit delay probing of virtual bus in rte_bus_probe()
 * addition of rte_vdev_device_args() helper
 * make virtual driver probe and remove take rte_vdev_device

Changes since version 1:
 * addition of rte_vdev_device_name() helper
 * removed rte_eal_dev_init() from *.map files
 * use SOCKET_ID_ANY

Jan Blunck (10):
  eal: probe legacy PCI devices before other bus devices
  eal: probe new virtual bus after other bus devices
  eal: move virtual device probing into a bus
  eal: remove unused rte_eal_dev_init()
  eal: Refactor vdev driver probe/remove
  eal: add struct rte_vdev_device
  eal: add virtual device name helper function
  eal: add virtual device arguments helper function
  eal: make virtual bus use rte_vdev_device
  eal: make virtual driver probe and remove take rte_vdev_device

 drivers/crypto/null/null_crypto_pmd.c   |  18 +-
 drivers/net/af_packet/rte_eth_af_packet.c   |  11 +-
 drivers/net/bonding/rte_eth_bond_pmd.c  |  13 +-
 drivers/net/mpipe/mpipe_tilegx.c|  10 +-
 drivers/net/null/rte_eth_null.c |  13 +-
 drivers/net/pcap/rte_eth_pcap.c |  12 +-
 drivers/net/ring/rte_eth_ring.c |   9 +-
 drivers/net/tap/rte_eth_tap.c   |  10 +-
 drivers/net/vhost/rte_eth_vhost.c   |  10 +-
 drivers/net/virtio/virtio_user_ethdev.c |  18 +-
 drivers/net/xenvirt/rte_eth_xenvirt.c   |   9 +-
 lib/librte_eal/bsdapp/eal/eal.c |   9 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   1 -
 lib/librte_eal/common/eal_common_bus.c  |  16 +-
 lib/librte_eal/common/eal_common_dev.c  |  28 ---
 lib/librte_eal/common/eal_common_vdev.c | 259 +---
 lib/librte_eal/common/include/rte_dev.h |   5 -
 lib/librte_eal/common/include/rte_vdev.h|  26 ++-
 lib/librte_eal/linuxapp/eal/eal.c   |   9 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |   1 -
 20 files changed, 355 insertions(+), 132 deletions(-)

--
2.7.4







[dpdk-dev] Pktgen-DPDK build errors on ppc64le

2017-02-28 Thread Rahul Lakkireddy
Hi All,

We are seeing Pktgen-DPDK not building on ppc64le with latest dpdk git
repo.

Below build errors are seen:
-
[pktgen-dpdk]# make
== lib
== common
  CC l2p.o
In file included from /root/pktgen-dpdk/lib/common/l2p.c:88:0:
/root/pktgen-dpdk/lib/common/l2p.h: In function ‘pg_raw_dump_l2p’:
/root/pktgen-dpdk/lib/common/l2p.h:103:2: error: comparison is always true due 
to limited range of data type [-Werror=type-limits]
  for (i = 0; i  < RTE_MAX_LCORE; i++)
  ^
/root/pktgen-dpdk/lib/common/l2p.h:109:2: error: comparison is always true due 
to limited range of data type [-Werror=type-limits]
  for (j = 0; j < RTE_MAX_LCORE; j++) {
  ^
/root/pktgen-dpdk/lib/common/l2p.h: In function ‘pg_dump_l2p’:
/root/pktgen-dpdk/lib/common/l2p.h:486:2: error: comparison is always true due 
to limited range of data type [-Werror=type-limits]
  for (lid = 0; lid < RTE_MAX_LCORE; lid++) {
  ^
/root/pktgen-dpdk/lib/common/l2p.c: In function ‘pg_port_matrix_dump’:
/root/pktgen-dpdk/lib/common/l2p.c:477:3: error: large integer implicitly 
truncated to unsigned type [-Werror=overflow]
   cnt.rxtx = get_map(l2p, pid, RTE_MAX_LCORE);
   ^
/root/pktgen-dpdk/lib/common/l2p.c:488:3: error: large integer implicitly 
truncated to unsigned type [-Werror=overflow]
   cnt.rxtx = get_map(l2p, pid, RTE_MAX_LCORE);
   ^
cc1: all warnings being treated as errors
make[3]: *** [l2p.o] Error 1
make[2]: *** [all] Error 2
make[1]: *** [common] Error 2
make: *** [lib] Error 2
-

By changing CONFIG_RTE_MAX_LCORE to 128 in defconfig_ppc_64-power8-linuxapp-gcc
file, Pktgen-DPDK then builds fine.

Thanks,
Rahul


Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization

2017-02-28 Thread Jan Blunck
On Fri, Feb 24, 2017 at 3:00 PM, Olivier Matz  wrote:
>
> In my opinion, if we have the room in the first cache line, we should
> put it there. The only argument I see against is "we may find something
> more important in the future, and we won't have room for it in the
> first cache line". I don't feel we should penalize today's use cases for
> hypothetic future use cases.
>
>
>
>> 2. timestamp normalization point
>>  inside PMD RX vs somewhere later as user needs it (extra
>> function in dev_ops?).
>
> This point could be changed. My initial proposition tries to provide a
> generic API for timestamp. Let me remind it here:
>
> a- the timestamp is in nanosecond
> b- the reference is always the same for a given path: if the timestamp
>is set in a PMD, all the packets for this PMD will have the same
>reference, but for 2 different PMDs (or a sw lib), the reference
>would not be the same.
>
> We may remove a-, and just have:
>  - the reference and the unit are always the same for a given path: if
>the timestamp is set in a PMD, all the packets for this PMD will have
>the same reference and unit, but for 2 different PMDs (or a sw lib),
>they would not be the same.
>
> In both cases, we would need a conversion code (maybe in a library) if
> the application wants to work with timestamps from several sources. The
> second solution removes the normalization code in the PMD when not
> needed, it is probably better.

I agree.

>
> About having the timestamp in the packet data, I don't think it is
> a good solution for a generic API in DPDK. The timestamp is a metadata,
> it has to go in the mbuf metadata. The packet data should not be
> modified when the timestamp is enabled.

Good NICs already do that based on the packet type (e.g. NTP/PTP packets).

>
> But this would not prevent to have driver-specific features to do that.
> In that case, the application will be aware that it is using this
> specific driver and that it will receive a timestamp in the packet data.
>
> To summarize, the generic API could be:
> - an ethdev API to enable the timestamp in a PMD for received packets
> - a mbuf flag "timestamp present"
> - a mbuf 64b field to store the timestamp value
>
> Additionally, a driver-specific API can be added for a given PMD.
> Example:
> - the generic timestamp ethdev is disabled (or not supported)
> - a driver-specific feature "put timestamp in packet" is enabled
> It would have no additional cost compared to what we have today, since
> the timestamp in mbuf is not read/written.
>

Thanks for the writeup. This sounds like a reasonable approach to me.

Do you still want to call the 64bit field "timestamp" or rename it to
something neutral and document that it is used together with the mbuf
flags?


Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization

2017-02-28 Thread Ananyev, Konstantin
Hi everyone,

> >
> > In my opinion, if we have the room in the first cache line, we should
> > put it there. The only argument I see against is "we may find something
> > more important in the future, and we won't have room for it in the
> > first cache line". I don't feel we should penalize today's use cases for
> > hypothetic future use cases.
> >
> >
> >
> >> 2. timestamp normalization point
> >>  inside PMD RX vs somewhere later as user needs it (extra
> >> function in dev_ops?).
> >
> > This point could be changed. My initial proposition tries to provide a
> > generic API for timestamp. Let me remind it here:
> >
> > a- the timestamp is in nanosecond
> > b- the reference is always the same for a given path: if the timestamp
> >is set in a PMD, all the packets for this PMD will have the same
> >reference, but for 2 different PMDs (or a sw lib), the reference
> >would not be the same.
> >
> > We may remove a-, and just have:
> >  - the reference and the unit are always the same for a given path: if
> >the timestamp is set in a PMD, all the packets for this PMD will have
> >the same reference and unit, but for 2 different PMDs (or a sw lib),
> >they would not be the same.
> >
> > In both cases, we would need a conversion code (maybe in a library) if
> > the application wants to work with timestamps from several sources. The
> > second solution removes the normalization code in the PMD when not
> > needed, it is probably better.
> 
> I agree.

One question - does that mean that application would need to
keep a track from what PMD each particular packet came to do the normalization?
Konstantin

> 
> >
> > About having the timestamp in the packet data, I don't think it is
> > a good solution for a generic API in DPDK. The timestamp is a metadata,
> > it has to go in the mbuf metadata. The packet data should not be
> > modified when the timestamp is enabled.
> 
> Good NICs already do that based on the packet type (e.g. NTP/PTP packets).
> 
> >
> > But this would not prevent to have driver-specific features to do that.
> > In that case, the application will be aware that it is using this
> > specific driver and that it will receive a timestamp in the packet data.
> >
> > To summarize, the generic API could be:
> > - an ethdev API to enable the timestamp in a PMD for received packets
> > - a mbuf flag "timestamp present"
> > - a mbuf 64b field to store the timestamp value
> >
> > Additionally, a driver-specific API can be added for a given PMD.
> > Example:
> > - the generic timestamp ethdev is disabled (or not supported)
> > - a driver-specific feature "put timestamp in packet" is enabled
> > It would have no additional cost compared to what we have today, since
> > the timestamp in mbuf is not read/written.
> >
> 
> Thanks for the writeup. This sounds like a reasonable approach to me.
> 
> Do you still want to call the 64bit field "timestamp" or rename it to
> something neutral and document that it is used together with the mbuf
> flags?


Re: [dpdk-dev] [PATCH v3 00/10] Rework vdev probing to use rte_bus infrastructure

2017-02-28 Thread Jan Blunck
On Tue, Feb 28, 2017 at 9:48 AM, Shreyansh Jain  wrote:
> On Monday 27 February 2017 06:39 PM, Jan Blunck wrote:
>>
>> On Sat, Feb 25, 2017 at 11:28 AM, Jan Blunck 
>> wrote:
>>>
>>> With the rte_bus infrastructure present in 17.02 it is possible to
>>> refactor
>>> the virtual device probing into a bus. This series also introduces the
>>> rte_vdev_device to better keep track of devices.
>>>
>>> This patchset depends on:
>>> http://dpdk.org/dev/patchwork/patch/20416/
>>> http://dpdk.org/dev/patchwork/patch/20417/
>>>
>>> Changes since version 2:
>>>  * implicit bus registration through rte_eal_vdrv_register()
>>
>>
>> On a second thought I don't think that this is correct though since it
>> opens up the possibility of racing an alternative "virtual" bus. I
>> don't think that this is a good thing though. I'll fix this in v4.
>>
>> Thoughts?
>
>
> I prefer the RTE_REGISTER* way.
> The issue of duplicate bus remains whether we use the macro or the
> implicit way.
>
> If you use RTE_*, do you think that duplicate registration issue
> is worth fixing?
> (It would mean rte_bus_register to return error which the caller
> would then need to handle).
>

I don't think that there is a good way to handle this if we use the
library constructors. It is better to fail the registration of the bus
and log the error via RTE_LOG(). That introduces a dependency on the
initialization of the logging system to be available before the
library constructors run (before main()).


[dpdk-dev] [PATCH] net/mlx5: fix flow mark action handling

2017-02-28 Thread Nelio Laranjeiro
Mark value is always reported even when not requested or invalid.

Fixes: ea3bc3b1df94 ("net/mlx5: support mark flow action")

CC: sta...@dpdk.org
Reported-by: Mark Bloch 
Signed-off-by: Nelio Laranjeiro 
Acked-by: Adrien Mazarguil 
---
 drivers/net/mlx5/mlx5_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index b2b7223..e152776 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1431,7 +1431,7 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, 
uint16_t pkts_n)
}
if (rxq->mark &&
((cqe->sop_drop_qpn !=
- htonl(MLX5_FLOW_MARK_INVALID)) ||
+ htonl(MLX5_FLOW_MARK_INVALID)) &&
 (cqe->sop_drop_qpn !=
  htonl(MLX5_FLOW_MARK_DEFAULT {
pkt->hash.fdir.hi =
-- 
2.1.4



Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization

2017-02-28 Thread Olivier Matz
Hi,

On Tue, 28 Feb 2017 09:05:07 +, "Ananyev, Konstantin"
 wrote:
> Hi everyone,
> 
> > >
> > > In my opinion, if we have the room in the first cache line, we
> > > should put it there. The only argument I see against is "we may
> > > find something more important in the future, and we won't have
> > > room for it in the first cache line". I don't feel we should
> > > penalize today's use cases for hypothetic future use cases.
> > >
> > >
> > >  
> > >> 2. timestamp normalization point
> > >>  inside PMD RX vs somewhere later as user needs it (extra
> > >> function in dev_ops?).  
> > >
> > > This point could be changed. My initial proposition tries to
> > > provide a generic API for timestamp. Let me remind it here:
> > >
> > > a- the timestamp is in nanosecond
> > > b- the reference is always the same for a given path: if the
> > > timestamp is set in a PMD, all the packets for this PMD will have
> > > the same reference, but for 2 different PMDs (or a sw lib), the
> > > reference would not be the same.
> > >
> > > We may remove a-, and just have:
> > >  - the reference and the unit are always the same for a given
> > > path: if the timestamp is set in a PMD, all the packets for this
> > > PMD will have the same reference and unit, but for 2 different
> > > PMDs (or a sw lib), they would not be the same.
> > >
> > > In both cases, we would need a conversion code (maybe in a
> > > library) if the application wants to work with timestamps from
> > > several sources. The second solution removes the normalization
> > > code in the PMD when not needed, it is probably better.  
> > 
> > I agree.  
> 
> One question - does that mean that application would need to
> keep a track from what PMD each particular packet came to do the
> normalization? Konstantin

I'd say yes. It does not look very difficult to do, since the mbuf
contains the input port id.


> > > About having the timestamp in the packet data, I don't think it is
> > > a good solution for a generic API in DPDK. The timestamp is a
> > > metadata, it has to go in the mbuf metadata. The packet data
> > > should not be modified when the timestamp is enabled.  
> > 
> > Good NICs already do that based on the packet type (e.g. NTP/PTP
> > packets). 
> > >
> > > But this would not prevent to have driver-specific features to do
> > > that. In that case, the application will be aware that it is
> > > using this specific driver and that it will receive a timestamp
> > > in the packet data.
> > >
> > > To summarize, the generic API could be:
> > > - an ethdev API to enable the timestamp in a PMD for received
> > > packets
> > > - a mbuf flag "timestamp present"
> > > - a mbuf 64b field to store the timestamp value
> > >
> > > Additionally, a driver-specific API can be added for a given PMD.
> > > Example:
> > > - the generic timestamp ethdev is disabled (or not supported)
> > > - a driver-specific feature "put timestamp in packet" is enabled
> > > It would have no additional cost compared to what we have today,
> > > since the timestamp in mbuf is not read/written.
> > >  
> > 
> > Thanks for the writeup. This sounds like a reasonable approach to
> > me.
> > 
> > Do you still want to call the 64bit field "timestamp" or rename it
> > to something neutral and document that it is used together with the
> > mbuf flags?  

I think timestamp is a good name. In the current RFC patchset, we have
this comment:

/** Valid if PKT_RX_TIMESTAMP is set. The unit is nanoseconds */
uint64_t timestamp;

We could change it to something like:

/** Valid if PKT_RX_TIMESTAMP is set. The unit and time
 *  reference are not normalized but are always the same
 *  for a given port.
 */
uint64_t timestamp;


Regards,
Olivier


Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization

2017-02-28 Thread Jan Blunck
On Tue, Feb 28, 2017 at 10:05 AM, Ananyev, Konstantin
 wrote:
>> >
>> > In both cases, we would need a conversion code (maybe in a library) if
>> > the application wants to work with timestamps from several sources. The
>> > second solution removes the normalization code in the PMD when not
>> > needed, it is probably better.
>>
>> I agree.
>
> One question - does that mean that application would need to
> keep a track from what PMD each particular packet came to do the 
> normalization?

Yes. You usually do this based on mbuf->port.


Re: [dpdk-dev] [PATCH v3 00/10] Rework vdev probing to use rte_bus infrastructure

2017-02-28 Thread Shreyansh Jain
> -Original Message-
> From: jblu...@gmail.com [mailto:jblu...@gmail.com] On Behalf Of Jan Blunck
> Sent: Tuesday, February 28, 2017 2:49 PM
> To: Shreyansh Jain 
> Cc: dev ; David Marchand ; Ferruh
> Yigit 
> Subject: Re: [PATCH v3 00/10] Rework vdev probing to use rte_bus
> infrastructure
> 
> On Tue, Feb 28, 2017 at 9:48 AM, Shreyansh Jain 
> wrote:
> > On Monday 27 February 2017 06:39 PM, Jan Blunck wrote:
> >>
> >> On Sat, Feb 25, 2017 at 11:28 AM, Jan Blunck 
> >> wrote:
> >>>
> >>> With the rte_bus infrastructure present in 17.02 it is possible to
> >>> refactor
> >>> the virtual device probing into a bus. This series also introduces the
> >>> rte_vdev_device to better keep track of devices.
> >>>
> >>> This patchset depends on:
> >>> http://dpdk.org/dev/patchwork/patch/20416/
> >>> http://dpdk.org/dev/patchwork/patch/20417/
> >>>
> >>> Changes since version 2:
> >>>  * implicit bus registration through rte_eal_vdrv_register()
> >>
> >>
> >> On a second thought I don't think that this is correct though since it
> >> opens up the possibility of racing an alternative "virtual" bus. I
> >> don't think that this is a good thing though. I'll fix this in v4.
> >>
> >> Thoughts?
> >
> >
> > I prefer the RTE_REGISTER* way.
> > The issue of duplicate bus remains whether we use the macro or the
> > implicit way.
> >
> > If you use RTE_*, do you think that duplicate registration issue
> > is worth fixing?
> > (It would mean rte_bus_register to return error which the caller
> > would then need to handle).
> >
> 
> I don't think that there is a good way to handle this if we use the
> library constructors. It is better to fail the registration of the bus
> and log the error via RTE_LOG(). That introduces a dependency on the
> initialization of the logging system to be available before the
> library constructors run (before main()).

Agree. Log dependency is the tricky part. Log subsystem is in turn
dependent on users input for level.


Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization

2017-02-28 Thread Jan Blunck
On Tue, Feb 28, 2017 at 10:23 AM, Olivier Matz  wrote:
>> >
>> > Do you still want to call the 64bit field "timestamp" or rename it
>> > to something neutral and document that it is used together with the
>> > mbuf flags?
>
> I think timestamp is a good name. In the current RFC patchset, we have
> this comment:
>
> /** Valid if PKT_RX_TIMESTAMP is set. The unit is nanoseconds */
> uint64_t timestamp;
>
> We could change it to something like:
>
> /** Valid if PKT_RX_TIMESTAMP is set. The unit and time
>  *  reference are not normalized but are always the same
>  *  for a given port.
>  */
> uint64_t timestamp;
>

Looks good to me.

Thanks,
Jan


Re: [dpdk-dev] [PATCH v2 11/15] net/avp: packet receive functions

2017-02-28 Thread Bruce Richardson
On Mon, Feb 27, 2017 at 05:06:24PM +, Legacy, Allain wrote:
> 
> > -Original Message-
> > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > On Sun, 26 Feb 2017 14:08:59 -0500
> > Allain Legacy  wrote:
> > 
> > Try not to break error messages onto two lines, it makes it harder when a
> > user is trying to find the location of the error message with search tools
> That's our normal practice, but checkpatches.sh flagged them as warnings.   
> If these are acceptable exceptions to the tool then are there plans to 
> publish an exhaustive list of checkpatches.sh errors/warnings/checks that can 
> be ignored?  Alternatively, are there plans to change the tools to ignore 
> these automatically and not flag them as issues.   As a contributor it is 
> difficult to tell what will be deemed acceptable and what will not (in some 
> cases) so the default position is to fix all issues before submission. 
>
In my experience, checkpatch ignores long lines that are due to error
messages. Perhaps you need to put the error message on a separate line,
if other things before the message are of significant size.

/Bruce


Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization

2017-02-28 Thread Ananyev, Konstantin


> 
> Hi,
> 
> On Tue, 28 Feb 2017 09:05:07 +, "Ananyev, Konstantin"
>  wrote:
> > Hi everyone,
> >
> > > >
> > > > In my opinion, if we have the room in the first cache line, we
> > > > should put it there. The only argument I see against is "we may
> > > > find something more important in the future, and we won't have
> > > > room for it in the first cache line". I don't feel we should
> > > > penalize today's use cases for hypothetic future use cases.
> > > >
> > > >
> > > >
> > > >> 2. timestamp normalization point
> > > >>  inside PMD RX vs somewhere later as user needs it (extra
> > > >> function in dev_ops?).
> > > >
> > > > This point could be changed. My initial proposition tries to
> > > > provide a generic API for timestamp. Let me remind it here:
> > > >
> > > > a- the timestamp is in nanosecond
> > > > b- the reference is always the same for a given path: if the
> > > > timestamp is set in a PMD, all the packets for this PMD will have
> > > > the same reference, but for 2 different PMDs (or a sw lib), the
> > > > reference would not be the same.
> > > >
> > > > We may remove a-, and just have:
> > > >  - the reference and the unit are always the same for a given
> > > > path: if the timestamp is set in a PMD, all the packets for this
> > > > PMD will have the same reference and unit, but for 2 different
> > > > PMDs (or a sw lib), they would not be the same.
> > > >
> > > > In both cases, we would need a conversion code (maybe in a
> > > > library) if the application wants to work with timestamps from
> > > > several sources. The second solution removes the normalization
> > > > code in the PMD when not needed, it is probably better.
> > >
> > > I agree.
> >
> > One question - does that mean that application would need to
> > keep a track from what PMD each particular packet came to do the
> > normalization? Konstantin
> 
> I'd say yes. It does not look very difficult to do, since the mbuf
> contains the input port id.
> 

I understand that we can use mbuf->port here, but it means that we'll
introduce new implicit dependency between timestamp and port values.
>From my point that introduces new implications:
1. all PMDs that do set a timestamp would also have to set port value too.
Probably not a big deal as most of PMDs do set port value anyway right now,
but it means it would be hard to get rid/change mbuf->port in future.
2. Applications would not allowed to change mbuf->port value before 
normalization is done
(from what I heard some apps do update mbuf->port to store routing 
decisions).  
BTW, how the app would keep track which mbufs were already normalized, and 
which were not?
3. In theory with eth_dev_detach() - mbuf->port value might be not valid at the 
point when application
 would decide to do normalization. 

So to me all that approach with delayed normalization seems unnecessary 
overcomplicated.
Original one suggested by Olivier, when normalization is done in PMD at RX look
much cleaner and more manageable. 
Konstantin


Re: [dpdk-dev] [RFC 17.05 v1 0/3] Merge l3fwd-acl and l3fwd

2017-02-28 Thread Ananyev, Konstantin
Hi Ravi,

> 
> Thanks to Konstantin and Bruce on first internal review comments. This
> patch is RFC for 17.05 to merge l3fwd-acl and l3fwd code and add file
> read options to build LPM and EM tables.


Thanks for the patch, I think it is really useful one.
Can I suggest you re-submit it as non-RFC now, as we are in 17.05 window 
already?
About the patch itself, one question I forgot to ask you before:

+++ b/examples/l3fwd/main.c
@@ -161,7 +163,9 @@  static struct rte_eth_conf port_conf = {
.rx_adv_conf = {
.rss_conf = {
.rss_key = NULL,
-   .rss_hf = ETH_RSS_IP,
+   .rss_hf = ETH_RSS_IP | ETH_RSS_UDP |
+   ETH_RSS_TCP | ETH_RSS_SCTP,
+
},
},


Why it is necessary to change RSS hash input values?

As another nit - there are few checkpatch warnings, that probably need
to be addressed.
Apart from that looks good to me.
Thanks
Konstantin




Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization

2017-02-28 Thread Olivier Matz
On Tue, 28 Feb 2017 10:29:41 +, "Ananyev, Konstantin"
 wrote:
> > 
> > Hi,
> > 
> > On Tue, 28 Feb 2017 09:05:07 +, "Ananyev, Konstantin"
> >  wrote:  
> > > Hi everyone,
> > >  
> > > > >
> > > > > In my opinion, if we have the room in the first cache line, we
> > > > > should put it there. The only argument I see against is "we
> > > > > may find something more important in the future, and we won't
> > > > > have room for it in the first cache line". I don't feel we
> > > > > should penalize today's use cases for hypothetic future use
> > > > > cases.
> > > > >
> > > > >
> > > > >  
> > > > >> 2. timestamp normalization point
> > > > >>  inside PMD RX vs somewhere later as user needs it (extra
> > > > >> function in dev_ops?).  
> > > > >
> > > > > This point could be changed. My initial proposition tries to
> > > > > provide a generic API for timestamp. Let me remind it here:
> > > > >
> > > > > a- the timestamp is in nanosecond
> > > > > b- the reference is always the same for a given path: if the
> > > > > timestamp is set in a PMD, all the packets for this PMD will
> > > > > have the same reference, but for 2 different PMDs (or a sw
> > > > > lib), the reference would not be the same.
> > > > >
> > > > > We may remove a-, and just have:
> > > > >  - the reference and the unit are always the same for a given
> > > > > path: if the timestamp is set in a PMD, all the packets for
> > > > > this PMD will have the same reference and unit, but for 2
> > > > > different PMDs (or a sw lib), they would not be the same.
> > > > >
> > > > > In both cases, we would need a conversion code (maybe in a
> > > > > library) if the application wants to work with timestamps from
> > > > > several sources. The second solution removes the normalization
> > > > > code in the PMD when not needed, it is probably better.  
> > > >
> > > > I agree.  
> > >
> > > One question - does that mean that application would need to
> > > keep a track from what PMD each particular packet came to do the
> > > normalization? Konstantin  
> > 
> > I'd say yes. It does not look very difficult to do, since the mbuf
> > contains the input port id.
> >   
>
> I understand that we can use mbuf->port here, but it means that we'll
> introduce new implicit dependency between timestamp and port values.
> From my point that introduces new implications:
> 1. all PMDs that do set a timestamp would also have to set port value too.
> Probably not a big deal as most of PMDs do set port value anyway right 
> now,
> but it means it would be hard to get rid/change mbuf->port in future.

Currently, all PMDs must set m->port.
If in the future we remove m->port, the applications that use it will need
to store the value in a mbuf metadata, or pass it as arguments through function
calls.


> 2. Applications would not allowed to change mbuf->port value before 
> normalization is done
> (from what I heard some apps do update mbuf->port to store routing 
> decisions).  
> BTW, how the app would keep track which mbufs were already normalized, 
> and which were not?

I don't think it should be allowed to change m->port value. Applications that
are doing this are responsible of what they change.


> 3. In theory with eth_dev_detach() - mbuf->port value might be not valid at 
> the point when application
>  would decide to do normalization. 
>
> So to me all that approach with delayed normalization seems unnecessary 
> overcomplicated.
> Original one suggested by Olivier, when normalization is done in PMD at RX 
> look
> much cleaner and more manageable. 

Detaching a device requires a synchronization between control and data plane,
and not only for this use case. In the first solution, the normalization is
partial: unit is nanosecond, but the time reference is different.

So, after the discussion, I'm more convinced by the second solution.


Regards,
Olivier


Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting

2017-02-28 Thread Jerin Jacob
On Thu, Feb 23, 2017 at 05:23:54PM +, Bruce Richardson wrote:
> Users compiling DPDK should not need to know or care about the arrangement
> of cachelines in the rte_ring structure. Therefore just remove the build
> option and set the structures to be always split. For improved
> performance use 128B rather than 64B alignment since it stops the producer
> and consumer data being on adjacent cachelines.
> 
> Signed-off-by: Bruce Richardson 
> ---
>  config/common_base | 1 -
>  doc/guides/rel_notes/release_17_05.rst | 6 ++
>  lib/librte_ring/rte_ring.c | 2 --
>  lib/librte_ring/rte_ring.h | 8 ++--
>  4 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/config/common_base b/config/common_base
> index aeee13e..099ffda 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
>  #
>  CONFIG_RTE_LIBRTE_RING=y
>  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
>  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
>  
>  #
> diff --git a/doc/guides/rel_notes/release_17_05.rst 
> b/doc/guides/rel_notes/release_17_05.rst
> index e25ea9f..ea45e0c 100644
> --- a/doc/guides/rel_notes/release_17_05.rst
> +++ b/doc/guides/rel_notes/release_17_05.rst
> @@ -110,6 +110,12 @@ API Changes
> Also, make sure to start the actual text at the margin.
> =
>  
> +* **Reworked rte_ring library**
> +
> +  The rte_ring library has been reworked and updated. The following changes
> +  have been made to it:
> +
> +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
>  
>  ABI Changes
>  ---
> diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> index ca0a108..4bc6da1 100644
> --- a/lib/librte_ring/rte_ring.c
> +++ b/lib/librte_ring/rte_ring.c
> @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, 
> unsigned count,
>   /* compilation-time checks */
>   RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
> RTE_CACHE_LINE_MASK) != 0);
> -#ifdef RTE_RING_SPLIT_PROD_CONS
>   RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
> RTE_CACHE_LINE_MASK) != 0);
> -#endif
>   RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> RTE_CACHE_LINE_MASK) != 0);
>  #ifdef RTE_LIBRTE_RING_DEBUG
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 72ccca5..04fe667 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -168,7 +168,7 @@ struct rte_ring {
>   uint32_t mask;   /**< Mask (size-1) of ring. */
>   volatile uint32_t head;  /**< Producer head. */
>   volatile uint32_t tail;  /**< Producer tail. */
> - } prod __rte_cache_aligned;
> + } prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);

I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are cache line
size of 128B

> + } prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);


>  
>   /** Ring consumer status. */
>   struct cons {
> @@ -177,11 +177,7 @@ struct rte_ring {
>   uint32_t mask;   /**< Mask (size-1) of ring. */
>   volatile uint32_t head;  /**< Consumer head. */
>   volatile uint32_t tail;  /**< Consumer tail. */
> -#ifdef RTE_RING_SPLIT_PROD_CONS
> - } cons __rte_cache_aligned;
> -#else
> - } cons;
> -#endif
> + } cons __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
>  
>  #ifdef RTE_LIBRTE_RING_DEBUG
>   struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
> -- 
> 2.9.3
> 


Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization

2017-02-28 Thread Ananyev, Konstantin
> 
> On Tue, 28 Feb 2017 10:29:41 +, "Ananyev, Konstantin"
>  wrote:
> > >
> > > Hi,
> > >
> > > On Tue, 28 Feb 2017 09:05:07 +, "Ananyev, Konstantin"
> > >  wrote:
> > > > Hi everyone,
> > > >
> > > > > >
> > > > > > In my opinion, if we have the room in the first cache line, we
> > > > > > should put it there. The only argument I see against is "we
> > > > > > may find something more important in the future, and we won't
> > > > > > have room for it in the first cache line". I don't feel we
> > > > > > should penalize today's use cases for hypothetic future use
> > > > > > cases.
> > > > > >
> > > > > >
> > > > > >
> > > > > >> 2. timestamp normalization point
> > > > > >>  inside PMD RX vs somewhere later as user needs it (extra
> > > > > >> function in dev_ops?).
> > > > > >
> > > > > > This point could be changed. My initial proposition tries to
> > > > > > provide a generic API for timestamp. Let me remind it here:
> > > > > >
> > > > > > a- the timestamp is in nanosecond
> > > > > > b- the reference is always the same for a given path: if the
> > > > > > timestamp is set in a PMD, all the packets for this PMD will
> > > > > > have the same reference, but for 2 different PMDs (or a sw
> > > > > > lib), the reference would not be the same.
> > > > > >
> > > > > > We may remove a-, and just have:
> > > > > >  - the reference and the unit are always the same for a given
> > > > > > path: if the timestamp is set in a PMD, all the packets for
> > > > > > this PMD will have the same reference and unit, but for 2
> > > > > > different PMDs (or a sw lib), they would not be the same.
> > > > > >
> > > > > > In both cases, we would need a conversion code (maybe in a
> > > > > > library) if the application wants to work with timestamps from
> > > > > > several sources. The second solution removes the normalization
> > > > > > code in the PMD when not needed, it is probably better.
> > > > >
> > > > > I agree.
> > > >
> > > > One question - does that mean that application would need to
> > > > keep a track from what PMD each particular packet came to do the
> > > > normalization? Konstantin
> > >
> > > I'd say yes. It does not look very difficult to do, since the mbuf
> > > contains the input port id.
> > >
> >
> > I understand that we can use mbuf->port here, but it means that we'll
> > introduce new implicit dependency between timestamp and port values.
> > From my point that introduces new implications:
> > 1. all PMDs that do set a timestamp would also have to set port value too.
> > Probably not a big deal as most of PMDs do set port value anyway right 
> > now,
> > but it means it would be hard to get rid/change mbuf->port in future.
> 
> Currently, all PMDs must set m->port.
> If in the future we remove m->port, the applications that use it will need
> to store the value in a mbuf metadata, or pass it as arguments through 
> function
> calls.
> 
> 
> > 2. Applications would not allowed to change mbuf->port value before 
> > normalization is done
> > (from what I heard some apps do update mbuf->port to store routing 
> > decisions).
> > BTW, how the app would keep track which mbufs were already normalized, 
> > and which were not?
> 
> I don't think it should be allowed to change m->port value.

As far as I know it is allowed right now.
PMD RX routine sets mbuf->port, after that application is free to use it
in a way it likes.
What we are introducing here is basically a new dependency between 2
mbuf fields and new restriction. 

Another thing that doesn't look very convenient to me here -
We can have 2 different values of timestamp (both normalized and not)
and there is no clear way for the application to know which one is in use right 
now.
So each app writer would have to come-up with his own solution.   

> Applications that
> are doing this are responsible of what they change.
> 
> 
> > 3. In theory with eth_dev_detach() - mbuf->port value might be not valid at 
> > the point when application
> >  would decide to do normalization.
> >
> > So to me all that approach with delayed normalization seems unnecessary 
> > overcomplicated.
> > Original one suggested by Olivier, when normalization is done in PMD at RX 
> > look
> > much cleaner and more manageable.
> 
> Detaching a device requires a synchronization between control and data plane,
> and not only for this use case. 

Of course it does.
But right now it is possible to do:

eth_rx_burst(port=0, ..., &mbuf, 1);
eth_dev_detach(port=0, ...);
...
/*process previously received mbuf */

With what you are proposing it would be not always possible any more.

>In the first solution, the normalization is
> partial: unit is nanosecond, but the time reference is different.

Not sure I get you here...
Konstantin

> 
> So, after the discussion, I'm more convinced by the second solution.
> 
> 
> Regards,
> Olivier


Re: [dpdk-dev] [PATCH v2 02/15] net/avp: public header files

2017-02-28 Thread Jerin Jacob
On Sun, Feb 26, 2017 at 02:08:50PM -0500, Allain Legacy wrote:
> Adds public/exported header files for the AVP PMD.  The AVP device is a
> shared memory based device.  The structures and constants that define the
> method of operation of the device must be visible by both the PMD and the
> host DPDK application.  They must not change without proper version
> controls and updates to both the hypervisor DPDK application and the PMD.
> 
> The hypervisor DPDK application is a Wind River Systems proprietary
> virtual switch.
> 
> Signed-off-by: Allain Legacy 
> Signed-off-by: Matt Peters 
> ---
>  drivers/net/avp/rte_avp_common.h | 424 
> +++
>  drivers/net/avp/rte_avp_fifo.h   | 157 +++
>  2 files changed, 581 insertions(+)
>  create mode 100644 drivers/net/avp/rte_avp_common.h
>  create mode 100644 drivers/net/avp/rte_avp_fifo.h
> 
> diff --git a/drivers/net/avp/rte_avp_common.h 
> b/drivers/net/avp/rte_avp_common.h
> new file mode 100644
> index 000..579d5ac
> --- /dev/null
> +++ b/drivers/net/avp/rte_avp_common.h
> @@ -0,0 +1,424 @@
> +/*-
> + *   This file is provided under a dual BSD/LGPLv2 license.  When using or
> + *   redistributing this file, you may do so under either license.
> + *
> + *   GNU LESSER GENERAL PUBLIC LICENSE
> + *
> + *   Copyright(c) 2010-2013 Intel Corporation. All rights reserved.
> + *   Copyright(c) 2014-2015 Wind River Systems, Inc. All rights reserved.
> + *
> + *   This program is free software; you can redistribute it and/or modify
> + *   it under the terms of version 2.1 of the GNU Lesser General Public 
> License
> + *   as published by the Free Software Foundation.
> + *
> + *   This program is distributed in the hope that it will be useful, but
> + *   WITHOUT ANY WARRANTY; without even the implied warranty of
> + *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + *   Lesser General Public License for more details.
> + *
> + *   Contact Information:
> + *   Wind River Systems, Inc.
> + *
> + *
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2013 Intel Corporation. All rights reserved.
> + *   Copyright(c) 2014-2016 Wind River Systems, Inc. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *   * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + *   * Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in
> + * the documentation and/or other materials provided with the
> + * distribution.
> + *   * Neither the name of Intel Corporation nor the names of its
> + * contributors may be used to endorse or promote products derived
> + * from this software without specific prior written permission.
> + *
> + *THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + */
> +
> +#ifndef _RTE_AVP_COMMON_H_
> +#define _RTE_AVP_COMMON_H_
> +
> +#ifdef __KERNEL__
> +#include 
> +#endif
> +
> +/**
> + * AVP name is part of network device name.
> + */
> +#define RTE_AVP_NAMESIZE 32
> +
> +/**
> + * AVP alias is a user-defined value used for lookups from secondary
> + * processes.  Typically, this is a UUID.
> + */
> +#define RTE_AVP_ALIASSIZE 128
> +
> +/**
> + * Memory aligment (cache aligned)
> + */
> +#ifndef RTE_AVP_ALIGNMENT
> +#define RTE_AVP_ALIGNMENT 64

I think we use RTE_CACHE_LINE_SIZE here? PPC and ThunderX1 targets are
cache line size of 128B

> +#endif
> +
> +/*
> + * Request id.
> + */


Re: [dpdk-dev] [PATCH v2 08/15] net/avp: device initialization

2017-02-28 Thread Jerin Jacob
On Sun, Feb 26, 2017 at 02:08:56PM -0500, Allain Legacy wrote:
> Adds support for initialization newly probed AVP PCI devices.  Initial
> queue translations are setup in preparation for device configuration.
> 
> Signed-off-by: Allain Legacy 
> Signed-off-by: Matt Peters 
> ---
>  drivers/net/avp/avp_ethdev.c | 757 
> +++
>  1 file changed, 757 insertions(+)
> 
> diff --git a/drivers/net/avp/avp_ethdev.c b/drivers/net/avp/avp_ethdev.c
> index 6522555..3bbff33 100644
> --- a/drivers/net/avp/avp_ethdev.c
> +++ b/drivers/net/avp/avp_ethdev.c
> @@ -60,6 +60,8 @@
>  #include "avp_logs.h"
>  
>  
> +static int avp_dev_create(struct rte_pci_device *pci_dev,
> +   struct rte_eth_dev *eth_dev);
>  
>  static int eth_avp_dev_init(struct rte_eth_dev *eth_dev);
>  static int eth_avp_dev_uninit(struct rte_eth_dev *eth_dev);
> @@ -103,6 +105,16 @@
>  };
>  
>  
> +/**@{ AVP device flags */
> +#define RTE_AVP_F_PROMISC (1 << 1)
> +#define RTE_AVP_F_CONFIGURED (1 << 2)
> +#define RTE_AVP_F_LINKUP (1 << 3)
> +#define RTE_AVP_F_DETACHED (1 << 4)
> +/**@} */
> +
> +/* Ethernet device validation marker */
> +#define RTE_AVP_ETHDEV_MAGIC 0x92972862

I think, we don't need to add RTE_ for internal flags and PMD APIs etc.
> +
>  /*
>   * Defines the AVP device attributes which are attached to an RTE ethernet
>   * device
> @@ -150,11 +162,726 @@ struct avp_adapter {
>   struct avp_dev avp;
>  } __rte_cache_aligned;
>  
> +
> +/* 32-bit MMIO register write */
> +#define RTE_AVP_WRITE32(_value, _addr) ((*(uint32_t *)_addr) = (_value))
> +
> +/* 32-bit MMIO register read */
> +#define RTE_AVP_READ32(_addr) (*(uint32_t *)(_addr))

Use rte_write32 and rte_read32 API instead.

> +
>  /* Macro to cast the ethernet device private data to a AVP object */
>  #define RTE_AVP_DEV_PRIVATE_TO_HW(adapter) \
>   (&((struct avp_adapter *)adapter)->avp)
>  
>  /*
> + * Defines the structure of a AVP device queue for the purpose of handling 
> the
> + * receive and transmit burst callback functions
> + */


Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting

2017-02-28 Thread Bruce Richardson
On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote:
> On Thu, Feb 23, 2017 at 05:23:54PM +, Bruce Richardson wrote:
> > Users compiling DPDK should not need to know or care about the arrangement
> > of cachelines in the rte_ring structure. Therefore just remove the build
> > option and set the structures to be always split. For improved
> > performance use 128B rather than 64B alignment since it stops the producer
> > and consumer data being on adjacent cachelines.
> > 
> > Signed-off-by: Bruce Richardson 
> > ---
> >  config/common_base | 1 -
> >  doc/guides/rel_notes/release_17_05.rst | 6 ++
> >  lib/librte_ring/rte_ring.c | 2 --
> >  lib/librte_ring/rte_ring.h | 8 ++--
> >  4 files changed, 8 insertions(+), 9 deletions(-)
> > 
> > diff --git a/config/common_base b/config/common_base
> > index aeee13e..099ffda 100644
> > --- a/config/common_base
> > +++ b/config/common_base
> > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> >  #
> >  CONFIG_RTE_LIBRTE_RING=y
> >  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
> >  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
> >  
> >  #
> > diff --git a/doc/guides/rel_notes/release_17_05.rst 
> > b/doc/guides/rel_notes/release_17_05.rst
> > index e25ea9f..ea45e0c 100644
> > --- a/doc/guides/rel_notes/release_17_05.rst
> > +++ b/doc/guides/rel_notes/release_17_05.rst
> > @@ -110,6 +110,12 @@ API Changes
> > Also, make sure to start the actual text at the margin.
> > =
> >  
> > +* **Reworked rte_ring library**
> > +
> > +  The rte_ring library has been reworked and updated. The following changes
> > +  have been made to it:
> > +
> > +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
> >  
> >  ABI Changes
> >  ---
> > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > index ca0a108..4bc6da1 100644
> > --- a/lib/librte_ring/rte_ring.c
> > +++ b/lib/librte_ring/rte_ring.c
> > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, 
> > unsigned count,
> > /* compilation-time checks */
> > RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
> >   RTE_CACHE_LINE_MASK) != 0);
> > -#ifdef RTE_RING_SPLIT_PROD_CONS
> > RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
> >   RTE_CACHE_LINE_MASK) != 0);
> > -#endif
> > RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> >   RTE_CACHE_LINE_MASK) != 0);
> >  #ifdef RTE_LIBRTE_RING_DEBUG
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index 72ccca5..04fe667 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -168,7 +168,7 @@ struct rte_ring {
> > uint32_t mask;   /**< Mask (size-1) of ring. */
> > volatile uint32_t head;  /**< Producer head. */
> > volatile uint32_t tail;  /**< Producer tail. */
> > -   } prod __rte_cache_aligned;
> > +   } prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
> 
> I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
> RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are cache 
> line
> size of 128B
> 
Sure.

However, can you perhaps try a performance test and check to see if
there is a performance difference between the two values before I change
it? In my tests I see improved performance by having an extra blank
cache-line between the producer and consumer data.

/Bruce


[dpdk-dev] [PATCH v2 0/2] librte_net: add crc computation support

2017-02-28 Thread Jasvinder Singh
In some applications, CRC (Cyclic Redundancy Check) needs to be computed or
updated during packet processing operations. This patchset adds software
implementation of some common standard CRCs (32-bit Ethernet CRC as per
Ethernet/[ISO/IEC 8802-3] and 16-bit CCITT-CRC [ITU-T X.25]). Two
versions of each 32-bit and 16-bit CRC calculation are proposed.

The first version presents a fast and efficient CRC generation on IA
processors by using the carry-less multiplication instruction – PCLMULQDQ
(i.e SSE4.2 instrinsics). In this implementation, a parallelized folding
approach has been used to first reduce an arbitrary length buffer to a small
fixed size length buffer (16 bytes) with the help of precomputed constants.
The resultant single 16-bytes chunk is further reduced by Barrett reduction
method to generate final CRC value. For more details on the implementation,
see reference [1].

The second version presents the fallback solution to support the CRC
generation without needing any specific support from CPU (for examples-
SSE4.2 intrinsics). It is based on generic Look-Up Table(LUT) algorithm that
uses precomputed 256 element table as explained in reference[2].

Following APIs have been added;

(i) rte_net_crc_init()   
(ii)rte_net_crc_calc()

The first API (i) initalises the data structures required for CRC computation
and this api should be used only once in the application before using second
API (ii) for 16-bit and 32-bit CRC calculations.

References:
[1] Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
[2] A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
http://www.ross.net/crc/download/crc_v3.txt

v2 changes:
- fix build errors for target i686-native-linuxapp-gcc
- fix checkpatch warnings

Notes:
- Build not successful with clang version earlier than 3.7.0 due to
  missing intrinsics. Refer dpdk known issue section for more details.

Jasvinder Singh (2):
  librte_net: add crc init and compute APIs
  app/test: add unit test for CRC computation

 app/test/Makefile  |   2 +
 app/test/test_crc.c| 229 +
 lib/librte_net/Makefile|   2 +
 lib/librte_net/rte_net_crc.c   | 664 +
 lib/librte_net/rte_net_crc.h   | 101 ++
 lib/librte_net/rte_net_version.map |   8 +
 6 files changed, 1006 insertions(+)
 create mode 100644 app/test/test_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h

-- 
2.5.5



[dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs

2017-02-28 Thread Jasvinder Singh
APIs for initialising and computing the crc (16-bit and 32-bit CRCs)
are added. For CRCs calculation, scalar as well as x86 intrinsic(sse4.2)
versions are implemented.

The scalar version is based on generic Look-Up Table(LUT) algorithm,
while x86 intrinsic version uses carry-less multiplication for
fast CRC computation.

Signed-off-by: Jasvinder Singh 
---
 lib/librte_net/Makefile|   2 +
 lib/librte_net/rte_net_crc.c   | 664 +
 lib/librte_net/rte_net_crc.h   | 101 ++
 lib/librte_net/rte_net_version.map |   8 +
 4 files changed, 775 insertions(+)
 create mode 100644 lib/librte_net/rte_net_crc.c
 create mode 100644 lib/librte_net/rte_net_crc.h

diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index 20cf664..41be751 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -39,11 +39,13 @@ EXPORT_MAP := rte_net_version.map
 LIBABIVER := 1
 
 SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
 
 DEPDIRS-$(CONFIG_RTE_LIBRTE_NET) += lib/librte_eal lib/librte_mbuf
 
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
new file mode 100644
index 000..78a49dd
--- /dev/null
+++ b/lib/librte_net/rte_net_crc.c
@@ -0,0 +1,664 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+
+/* Macros for printing using RTE_LOG */
+#define RTE_LOGTYPE_CRC RTE_LOGTYPE_USER1
+
+/** CRC polynomials */
+#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
+#define CRC16_CCITT_POLYNOMIAL 0x1021U
+
+typedef int (*rte_net_crc_handler)(struct rte_net_crc_params *);
+
+static int rte_crc16_ccitt_handler(struct rte_net_crc_params *p);
+static int rte_crc32_eth_handler(struct rte_net_crc_params *p);
+static int rte_crc_invalid_handler(struct rte_net_crc_params *p);
+
+static rte_net_crc_handler *handlers;
+
+static rte_net_crc_handler handlers_scalar[] = {
+   [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
+   [RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
+   [RTE_NET_CRC_REQS] = rte_crc_invalid_handler,
+};
+
+int
+rte_crc_invalid_handler(__rte_unused struct rte_net_crc_params *p)
+{
+   RTE_LOG(ERR, CRC, "CRC type not supported!\n");
+   return -1;  /* Error */
+}
+
+#if defined RTE_ARCH_X86_64 && defined RTE_MACHINE_CPUFLAG_SSE4_2
+
+#include 
+
+/** PCLMULQDQ CRC computation context structure */
+struct crc_pclmulqdq_ctx {
+   __m128i rk1_rk2;
+   __m128i rk5_rk6;
+   __m128i rk7_rk8;
+};
+
+struct crc_pclmulqdq_ctx crc32_eth_pclmulqdq __rte_aligned(16);
+struct crc_pclmulqdq_ctx crc16_ccitt_pclmulqdq __rte_aligned(16);
+/**
+ * @brief Performs one folding round
+ *
+ * Logically function operates as follows:
+ * DATA = READ_NEXT_16BYTES();
+ * F1 = LSB8(FOLD)
+ * F2 = MSB8(FOLD)
+ * T1 = CLMUL(F1, RK1)
+ * T2 = CLMUL(F2, RK2)
+ * FOLD = XOR(T1, T2, DATA)
+ *
+ * @param data_block 16 byte data block
+ * @param precomp precomputed rk1 constanst
+ * @param fold running 16 byte folded data
+ *
+ * @

[dpdk-dev] [PATCH v2 2/2] app/test: add unit test for CRC computation

2017-02-28 Thread Jasvinder Singh
This patch provides a set of tests for verifying the functional
correctness of 16-bit and 32-bit CRC APIs.

Signed-off-by: Jasvinder Singh 
---
 app/test/Makefile   |   2 +
 app/test/test_crc.c | 229 
 2 files changed, 231 insertions(+)
 create mode 100644 app/test/test_crc.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 1a5e03d..2a497f7 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -160,6 +160,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_cirbuf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_string.c
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_lib.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_NET) += test_crc.c
+
 ifeq ($(CONFIG_RTE_LIBRTE_SCHED),y)
 SRCS-y += test_red.c
 SRCS-y += test_sched.c
diff --git a/app/test/test_crc.c b/app/test/test_crc.c
new file mode 100644
index 000..17c9161
--- /dev/null
+++ b/app/test/test_crc.c
@@ -0,0 +1,229 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+
+#include "test.h"
+
+#define MBUF_DATA_SIZE  2048
+#define NB_MBUF 64
+#define CRC_VEC_LEN32
+#define CRC32_VEC_LEN1 1512
+#define CRC32_VEC_LEN2 348
+#define CRC16_VEC_LEN1 12
+#define CRC16_VEC_LEN2 2
+
+/* CRC test vector */
+static const uint8_t crc_vec[CRC_VEC_LEN] = {
+   '0', '1', '2', '3', '4', '5', '6', '7',
+   '8', '9', 'a', 'b', 'c', 'd', 'e', 'f',
+   'g', 'h', 'i', 'j', 'A', 'B', 'C', 'D',
+   'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L',
+};
+
+/* 32-bit CRC test vector */
+static const uint8_t crc32_vec1[12] = {
+   0xBE, 0xD7, 0x23, 0x47, 0x6B, 0x8F,
+   0xB3, 0x14, 0x5E, 0xFB, 0x35, 0x59,
+};
+
+/* 16-bit CRC test vector 1*/
+static const uint8_t crc16_vec1[CRC16_VEC_LEN1] = {
+   0x0D, 0x01, 0x01, 0x23, 0x45, 0x67,
+   0x89, 0x01, 0x23, 0x45, 0x00, 0x01,
+};
+
+/* 16-bit CRC test vector 2*/
+static const uint8_t crc16_vec2[CRC16_VEC_LEN2] = {
+   0x03, 0x3f,
+};
+/** CRC results */
+static const uint32_t crc32_vec_res = 0xb491aab4;
+static const uint32_t crc32_vec1_res = 0xac54d294;
+static const uint32_t crc32_vec2_res = 0xefaae02f;
+static const uint32_t crc16_vec_res = 0x6bec;
+static const uint16_t crc16_vec1_res = 0x8cdd;
+static const uint16_t crc16_vec2_res = 0xec5b;
+
+static int
+crc_calc(struct rte_mempool *mp,
+   const uint8_t *vec,
+   uint32_t vec_len,
+   uint32_t crc_r,
+   enum rte_net_crc_type type)
+{
+   struct rte_net_crc_params p;
+   uint8_t *data;
+
+   /* alloc first mbuf from the mempool */
+   p.mbuf = rte_pktmbuf_alloc(mp);
+   if (p.mbuf == NULL) {
+   printf("rte_pktmbuf_alloc() failed!\n");
+   return -1;
+   }
+
+   /* copy parameters */
+   p.type = type;
+   p.data_offset = 0;
+   p.data_len = vec_len;
+
+   /* append data length to an mbuf */
+   data = (uint8_t *)rte_pktmbuf_append(p.mbuf, p.data_len);
+
+   /* copy ref_vector */
+   rte_memcpy(data, vec, p.data_len);
+
+   /* dump mbuf data on console*/
+   rte_pktmbuf_dump(stdout, p.mbuf, p.data_len);
+
+   /* compute CRC */
+   int ret = rte_net_crc_calc(&p);
+
+   if (crc_r != (uint32_t) ret) {
+   rte_pktmbuf_f

Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting

2017-02-28 Thread Jerin Jacob
On Tue, Feb 28, 2017 at 11:57:03AM +, Bruce Richardson wrote:
> On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote:
> > On Thu, Feb 23, 2017 at 05:23:54PM +, Bruce Richardson wrote:
> > > Users compiling DPDK should not need to know or care about the arrangement
> > > of cachelines in the rte_ring structure. Therefore just remove the build
> > > option and set the structures to be always split. For improved
> > > performance use 128B rather than 64B alignment since it stops the producer
> > > and consumer data being on adjacent cachelines.
> > > 
> > > Signed-off-by: Bruce Richardson 
> > > ---
> > >  config/common_base | 1 -
> > >  doc/guides/rel_notes/release_17_05.rst | 6 ++
> > >  lib/librte_ring/rte_ring.c | 2 --
> > >  lib/librte_ring/rte_ring.h | 8 ++--
> > >  4 files changed, 8 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/config/common_base b/config/common_base
> > > index aeee13e..099ffda 100644
> > > --- a/config/common_base
> > > +++ b/config/common_base
> > > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> > >  #
> > >  CONFIG_RTE_LIBRTE_RING=y
> > >  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> > > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
> > >  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
> > >  
> > >  #
> > > diff --git a/doc/guides/rel_notes/release_17_05.rst 
> > > b/doc/guides/rel_notes/release_17_05.rst
> > > index e25ea9f..ea45e0c 100644
> > > --- a/doc/guides/rel_notes/release_17_05.rst
> > > +++ b/doc/guides/rel_notes/release_17_05.rst
> > > @@ -110,6 +110,12 @@ API Changes
> > > Also, make sure to start the actual text at the margin.
> > > =
> > >  
> > > +* **Reworked rte_ring library**
> > > +
> > > +  The rte_ring library has been reworked and updated. The following 
> > > changes
> > > +  have been made to it:
> > > +
> > > +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
> > >  
> > >  ABI Changes
> > >  ---
> > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > index ca0a108..4bc6da1 100644
> > > --- a/lib/librte_ring/rte_ring.c
> > > +++ b/lib/librte_ring/rte_ring.c
> > > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, 
> > > unsigned count,
> > >   /* compilation-time checks */
> > >   RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
> > > RTE_CACHE_LINE_MASK) != 0);
> > > -#ifdef RTE_RING_SPLIT_PROD_CONS
> > >   RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
> > > RTE_CACHE_LINE_MASK) != 0);
> > > -#endif
> > >   RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> > > RTE_CACHE_LINE_MASK) != 0);
> > >  #ifdef RTE_LIBRTE_RING_DEBUG
> > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > index 72ccca5..04fe667 100644
> > > --- a/lib/librte_ring/rte_ring.h
> > > +++ b/lib/librte_ring/rte_ring.h
> > > @@ -168,7 +168,7 @@ struct rte_ring {
> > >   uint32_t mask;   /**< Mask (size-1) of ring. */
> > >   volatile uint32_t head;  /**< Producer head. */
> > >   volatile uint32_t tail;  /**< Producer tail. */
> > > - } prod __rte_cache_aligned;
> > > + } prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
> > 
> > I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
> > RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are cache 
> > line
> > size of 128B
> > 
> Sure.
> 
> However, can you perhaps try a performance test and check to see if
> there is a performance difference between the two values before I change
> it? In my tests I see improved performance by having an extra blank
> cache-line between the producer and consumer data.

Sure. Which test are you running to measure the performance difference?
Is it app/test/test_ring_perf.c?

> 
> /Bruce


Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs

2017-02-28 Thread Jerin Jacob
On Tue, Feb 28, 2017 at 12:08:20PM +, Jasvinder Singh wrote:
> APIs for initialising and computing the crc (16-bit and 32-bit CRCs)
> are added. For CRCs calculation, scalar as well as x86 intrinsic(sse4.2)
> versions are implemented.
> 
> The scalar version is based on generic Look-Up Table(LUT) algorithm,
> while x86 intrinsic version uses carry-less multiplication for
> fast CRC computation.
> 
> Signed-off-by: Jasvinder Singh 
> ---
>  lib/librte_net/Makefile|   2 +
>  lib/librte_net/rte_net_crc.c   | 664 
> +
>  lib/librte_net/rte_net_crc.h   | 101 ++
>  lib/librte_net/rte_net_version.map |   8 +
>  4 files changed, 775 insertions(+)
>  create mode 100644 lib/librte_net/rte_net_crc.c
>  create mode 100644 lib/librte_net/rte_net_crc.h
> 
> diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
> index 20cf664..41be751 100644
> --- a/lib/librte_net/Makefile
> +++ b/lib/librte_net/Makefile
> @@ -39,11 +39,13 @@ EXPORT_MAP := rte_net_version.map
>  LIBABIVER := 1
>  
>  SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c
> +SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c
>  
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h
>  SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
>  SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
>  
>  DEPDIRS-$(CONFIG_RTE_LIBRTE_NET) += lib/librte_eal lib/librte_mbuf
>  
> diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
> new file mode 100644
> index 000..78a49dd
> --- /dev/null
> +++ b/lib/librte_net/rte_net_crc.c
> @@ -0,0 +1,664 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of Intel Corporation nor the names of its
> + *   contributors may be used to endorse or promote products derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include 
> +#include 
> +
> +/* Macros for printing using RTE_LOG */
> +#define RTE_LOGTYPE_CRC RTE_LOGTYPE_USER1
> +
> +/** CRC polynomials */
> +#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
> +#define CRC16_CCITT_POLYNOMIAL 0x1021U
> +
> +typedef int (*rte_net_crc_handler)(struct rte_net_crc_params *);
> +
> +static int rte_crc16_ccitt_handler(struct rte_net_crc_params *p);
> +static int rte_crc32_eth_handler(struct rte_net_crc_params *p);
> +static int rte_crc_invalid_handler(struct rte_net_crc_params *p);
> +
> +static rte_net_crc_handler *handlers;
> +
> +static rte_net_crc_handler handlers_scalar[] = {
> + [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
> + [RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
> + [RTE_NET_CRC_REQS] = rte_crc_invalid_handler,
> +};
> +
> +int
> +rte_crc_invalid_handler(__rte_unused struct rte_net_crc_params *p)
> +{
> + RTE_LOG(ERR, CRC, "CRC type not supported!\n");
> + return -1;  /* Error */
> +}
> +
> +#if defined RTE_ARCH_X86_64 && defined RTE_MACHINE_CPUFLAG_SSE4_2

Could you please abstract the vector function and move the SSE implementation to
separate file for future clean neon and altivec integration.

Reference: lib/librte_lpm/rte_lpm_sse.h


Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization

2017-02-28 Thread Olivier Matz
On Tue, 28 Feb 2017 11:48:20 +, "Ananyev, Konstantin"
 wrote:
> > 
> > On Tue, 28 Feb 2017 10:29:41 +, "Ananyev, Konstantin"
> >  wrote:  
> > > >
> > > > Hi,
> > > >
> > > > On Tue, 28 Feb 2017 09:05:07 +, "Ananyev, Konstantin"
> > > >  wrote:  
> > > > > Hi everyone,
> > > > >  
> > > > > > >
> > > > > > > In my opinion, if we have the room in the first cache
> > > > > > > line, we should put it there. The only argument I see
> > > > > > > against is "we may find something more important in the
> > > > > > > future, and we won't have room for it in the first cache
> > > > > > > line". I don't feel we should penalize today's use cases
> > > > > > > for hypothetic future use cases.
> > > > > > >
> > > > > > >
> > > > > > >  
> > > > > > >> 2. timestamp normalization point
> > > > > > >>  inside PMD RX vs somewhere later as user needs it
> > > > > > >> (extra function in dev_ops?).  
> > > > > > >
> > > > > > > This point could be changed. My initial proposition tries
> > > > > > > to provide a generic API for timestamp. Let me remind it
> > > > > > > here:
> > > > > > >
> > > > > > > a- the timestamp is in nanosecond
> > > > > > > b- the reference is always the same for a given path: if
> > > > > > > the timestamp is set in a PMD, all the packets for this
> > > > > > > PMD will have the same reference, but for 2 different
> > > > > > > PMDs (or a sw lib), the reference would not be the same.
> > > > > > >
> > > > > > > We may remove a-, and just have:
> > > > > > >  - the reference and the unit are always the same for a
> > > > > > > given path: if the timestamp is set in a PMD, all the
> > > > > > > packets for this PMD will have the same reference and
> > > > > > > unit, but for 2 different PMDs (or a sw lib), they would
> > > > > > > not be the same.
> > > > > > >
> > > > > > > In both cases, we would need a conversion code (maybe in a
> > > > > > > library) if the application wants to work with timestamps
> > > > > > > from several sources. The second solution removes the
> > > > > > > normalization code in the PMD when not needed, it is
> > > > > > > probably better.  
> > > > > >
> > > > > > I agree.  
> > > > >
> > > > > One question - does that mean that application would need to
> > > > > keep a track from what PMD each particular packet came to do
> > > > > the normalization? Konstantin  
> > > >
> > > > I'd say yes. It does not look very difficult to do, since the
> > > > mbuf contains the input port id.
> > > >  
> > >
> > > I understand that we can use mbuf->port here, but it means that
> > > we'll introduce new implicit dependency between timestamp and
> > > port values. From my point that introduces new implications:
> > > 1. all PMDs that do set a timestamp would also have to set port
> > > value too. Probably not a big deal as most of PMDs do set port
> > > value anyway right now, but it means it would be hard to get
> > > rid/change mbuf->port in future.  
> > 
> > Currently, all PMDs must set m->port.
> > If in the future we remove m->port, the applications that use it
> > will need to store the value in a mbuf metadata, or pass it as
> > arguments through function calls.
> > 
> >   
> > > 2. Applications would not allowed to change mbuf->port value
> > > before normalization is done (from what I heard some apps do
> > > update mbuf->port to store routing decisions). BTW, how the app
> > > would keep track which mbufs were already normalized, and which
> > > were not?  
> > 
> > I don't think it should be allowed to change m->port value.  
> 
> As far as I know it is allowed right now.
> PMD RX routine sets mbuf->port, after that application is free to use
> it in a way it likes.

The descriptor or m->port is "Input port". If the applications stores
something else than the input port, it is its responsibility if it
breaks something else. Like changing any other field to put something
that does not match the description.


> What we are introducing here is basically a new dependency between 2
> mbuf fields and new restriction. 

On the other hand, there is no strong dependency: the API to do the
normalization can take the port as a parameter.


> 
> Another thing that doesn't look very convenient to me here -
> We can have 2 different values of timestamp (both normalized and not)
> and there is no clear way for the application to know which one is in
> use right now. So each app writer would have to come-up with his own
> solution.   

It depends:
- the solution you describe is to have the application storing the
  normalized value in its private metadata.
- another solution would be to store the normalized value in
  m->timestamp. In this case, we would need a flag to tell if the
  timestamp value is normalized.

The problem pointed out by Jan is that doing the timestamp
normalization may take some CPU cycles, even if a small part of packets
requires it.


> 
> > Applications that
> > are doing this are responsible of what they change.
> > 
> >   
> > > 3. In theory with 

[dpdk-dev] [PATCH] net/ixgbe: remove invalid declaration

2017-02-28 Thread Olivier Matz
The function is not defined anywhere, remove it.

Fixes: 0eb609239efd ("ixgbe: enable Rx queue interrupts for PF and VF")

Signed-off-by: Olivier Matz 
---
 drivers/net/ixgbe/ixgbe_ethdev.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 680d5d9..c13b10e 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -514,7 +514,6 @@ uint32_t ixgbe_dev_rx_queue_count(struct rte_eth_dev *dev,
uint16_t rx_queue_id);
 
 int ixgbe_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
-int ixgbevf_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
 
 int ixgbe_dev_rx_init(struct rte_eth_dev *dev);
 
-- 
2.8.1



Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting

2017-02-28 Thread Bruce Richardson
On Tue, Feb 28, 2017 at 05:38:34PM +0530, Jerin Jacob wrote:
> On Tue, Feb 28, 2017 at 11:57:03AM +, Bruce Richardson wrote:
> > On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote:
> > > On Thu, Feb 23, 2017 at 05:23:54PM +, Bruce Richardson wrote:
> > > > Users compiling DPDK should not need to know or care about the 
> > > > arrangement
> > > > of cachelines in the rte_ring structure. Therefore just remove the build
> > > > option and set the structures to be always split. For improved
> > > > performance use 128B rather than 64B alignment since it stops the 
> > > > producer
> > > > and consumer data being on adjacent cachelines.
> > > > 
> > > > Signed-off-by: Bruce Richardson 
> > > > ---
> > > >  config/common_base | 1 -
> > > >  doc/guides/rel_notes/release_17_05.rst | 6 ++
> > > >  lib/librte_ring/rte_ring.c | 2 --
> > > >  lib/librte_ring/rte_ring.h | 8 ++--
> > > >  4 files changed, 8 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/config/common_base b/config/common_base
> > > > index aeee13e..099ffda 100644
> > > > --- a/config/common_base
> > > > +++ b/config/common_base
> > > > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> > > >  #
> > > >  CONFIG_RTE_LIBRTE_RING=y
> > > >  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> > > > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
> > > >  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
> > > >  
> > > >  #
> > > > diff --git a/doc/guides/rel_notes/release_17_05.rst 
> > > > b/doc/guides/rel_notes/release_17_05.rst
> > > > index e25ea9f..ea45e0c 100644
> > > > --- a/doc/guides/rel_notes/release_17_05.rst
> > > > +++ b/doc/guides/rel_notes/release_17_05.rst
> > > > @@ -110,6 +110,12 @@ API Changes
> > > > Also, make sure to start the actual text at the margin.
> > > > =
> > > >  
> > > > +* **Reworked rte_ring library**
> > > > +
> > > > +  The rte_ring library has been reworked and updated. The following 
> > > > changes
> > > > +  have been made to it:
> > > > +
> > > > +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
> > > >  
> > > >  ABI Changes
> > > >  ---
> > > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > > index ca0a108..4bc6da1 100644
> > > > --- a/lib/librte_ring/rte_ring.c
> > > > +++ b/lib/librte_ring/rte_ring.c
> > > > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char 
> > > > *name, unsigned count,
> > > > /* compilation-time checks */
> > > > RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
> > > >   RTE_CACHE_LINE_MASK) != 0);
> > > > -#ifdef RTE_RING_SPLIT_PROD_CONS
> > > > RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
> > > >   RTE_CACHE_LINE_MASK) != 0);
> > > > -#endif
> > > > RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> > > >   RTE_CACHE_LINE_MASK) != 0);
> > > >  #ifdef RTE_LIBRTE_RING_DEBUG
> > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > index 72ccca5..04fe667 100644
> > > > --- a/lib/librte_ring/rte_ring.h
> > > > +++ b/lib/librte_ring/rte_ring.h
> > > > @@ -168,7 +168,7 @@ struct rte_ring {
> > > > uint32_t mask;   /**< Mask (size-1) of ring. */
> > > > volatile uint32_t head;  /**< Producer head. */
> > > > volatile uint32_t tail;  /**< Producer tail. */
> > > > -   } prod __rte_cache_aligned;
> > > > +   } prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
> > > 
> > > I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
> > > RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are 
> > > cache line
> > > size of 128B
> > > 
> > Sure.
> > 
> > However, can you perhaps try a performance test and check to see if
> > there is a performance difference between the two values before I change
> > it? In my tests I see improved performance by having an extra blank
> > cache-line between the producer and consumer data.
> 
> Sure. Which test are you running to measure the performance difference?
> Is it app/test/test_ring_perf.c?
> 
> > 
Yep, just the basic ring perf test. I look mostly at the core-to-core
numbers, since hyperthread-to-hyperthread or NUMA socket to NUMA socket
would be far less common use cases IMHO.

/Bruce



Re: [dpdk-dev] [PATCH v2] tools: add tags and cscope index file generation support

2017-02-28 Thread Jerin Jacob
On Mon, Feb 27, 2017 at 03:18:52PM +0100, Thomas Monjalon wrote:
> 2017-01-17 14:11, Jerin Jacob:
> > This script generates cscope, gtags, and tags
> > index files based on EAL environment.
> > (architecture and OS(linux/bsd))
> > 
> > Selection of the architecture and OS environment
> > is based on dpdk configuration target(T=)
> 
> What is the purpose of selecting a configuration?
> Is it to go quicker in the implementation you are interested in?

Yes. That is the hard part to do.

> In that case, I think we need a catch-all option, because I like
> being prompted by vim that several implementations exist and I can
> choose one of them.

OK. Then we can make T= as optional and if T= is not specified then script can
take all the source files. Thoughts?

> 
> > example usage:
> > make tags T=x86_64-native-linuxapp-gcc
> > make cscope T=x86_64-native-linuxapp-gcc
> > make gtags T=x86_64-native-linuxapp-gcc
> > 
> > Signed-off-by: Jerin Jacob 
> > Reviewed-by: Yuanhan Liu 
> > Reviewed-by: Ferruh Yigit 
> [...]
> >  .gitignore|   8 ++
> >  devtools/tags.sh  | 251 
> > ++
> >  mk/rte.sdkroot.mk |   4 +
> >  3 files changed, 263 insertions(+)
> 
> I think build-tags.sh would be a better name.

OK

> 
> On the implementation, I have few comments:
>   - is there a way to re-use the skip list when including the related 
> files?
>   - you can remove tile from this patch

OK

> And for the details:
>   - why a bash shebang where /bin/sh would be wider?

I will change to /bin/sh

>   - verbose option should be -v (with getopts)

OK

>   - please use $() instead of backquotes

OK

>   - please avoid one-line functions used only once like doctags

OK

> 
> I can help you with this script if needed.

OK. I can work on this next week, if you have time then feel free to
take up this.

> 


Re: [dpdk-dev] [PATCH v2 0/4] New crypto algorithm string parser API

2017-02-28 Thread De Lara Guarch, Pablo


> -Original Message-
> From: De Lara Guarch, Pablo
> Sent: Monday, February 27, 2017 2:39 PM
> To: Doherty, Declan
> Cc: dev@dpdk.org; De Lara Guarch, Pablo
> Subject: [PATCH v2 0/4] New crypto algorithm string parser API
> 
> Last release, an array with strings for the supported algorithms
> by the cryptodev library was added and used in the crypto-perf app.
> 
> This patchset creates a new API to parse strings from the user,
> to select the desired algorithm (using the array above),
> which can be used by any application, making it consistent across
> all the applications (now, L2fwd-crypto and crypto-perf apps are
> using different strings).
> 
> Changes in v2:
> 
> - Modified L2fwd-crypto document to reflect the changes
> 
> Pablo de Lara (4):
>   cryptodev: add missing algorithm strings
>   cryptodev: add algorithm string parsers
>   app/crypto-perf: use cryptodev algorithm parser
>   examples/l2fwd-crypto: use cryptodev algorithm parser
> 
>  app/test-crypto-perf/cperf_options_parsing.c   | 206 
> ++---
>  doc/guides/sample_app_ug/l2_forward_crypto.rst |  10 +-
>  examples/l2fwd-crypto/main.c   |  85 ++
>  lib/librte_cryptodev/rte_cryptodev.c   |  38 +
>  lib/librte_cryptodev/rte_cryptodev.h   |  30 
>  lib/librte_cryptodev/rte_cryptodev_version.map |   8 +
>  6 files changed, 105 insertions(+), 272 deletions(-)
> 
> --
> 2.7.4

Applied to dpdk-next-crypto.

Pablo



Re: [dpdk-dev] [PATCH v5 03/26] eal: No panic on hugepages info init

2017-02-28 Thread Bruce Richardson
On Mon, Feb 27, 2017 at 11:17:48AM -0500, Aaron Conole wrote:
> When attempting to scan hugepages, signal to the eal.c that an error has
> occurred, rather than performing a panic.
> 
> Signed-off-by: Aaron Conole 
> ---
>  lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c 
> b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
> index 18858e2..4d47eaf 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
> @@ -283,9 +283,11 @@ eal_hugepage_info_init(void)
>   struct dirent *dirent;
>  
>   dir = opendir(sys_dir_path);
> - if (dir == NULL)
> - rte_panic("Cannot open directory %s to read system hugepage "
> + if (dir == NULL) {
> + RTE_LOG(ERR, EAL, "Cannot open directory %s to read system 
> hugepage "
> "info\n", sys_dir_path);
> + return -1;
> + }

Minor nit.
The error message should go on a line on its own, without any breaks to
make it easy to "grep". This should also eliminate the checkpatch
complaint about it being too long.

/Bruce



Re: [dpdk-dev] [PATCH v5 08/26] eal: do not panic on memzone initialization fails

2017-02-28 Thread Bruce Richardson
On Mon, Feb 27, 2017 at 11:17:53AM -0500, Aaron Conole wrote:
> When memzone initialization fails, report the error to the calling
> application rather than panic().  Without a good way of detaching /
> releasing hugepages, at this point the application will have to restart.
> 
> Signed-off-by: Aaron Conole 
> ---
>  lib/librte_eal/linuxapp/eal/eal.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
> b/lib/librte_eal/linuxapp/eal/eal.c
> index a671ed4..1e54ca1 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -839,8 +839,11 @@ rte_eal_init(int argc, char **argv)
>   /* the directories are locked during eal_hugepage_info_init */
>   eal_hugedirs_unlock();
>  
> - if (rte_eal_memzone_init() < 0)
> - rte_panic("Cannot init memzone\n");
> + if (rte_eal_memzone_init() < 0) {
> + RTE_LOG(ERR, EAL, "Cannot init memzone\n");

Any particular reason why not "rte_eal_init_alert" as with the other
cases?

/Bruce



Re: [dpdk-dev] [PATCH v5 00/26] linux/eal: Remove most causes of panic on init

2017-02-28 Thread Bruce Richardson
On Mon, Feb 27, 2017 at 11:17:45AM -0500, Aaron Conole wrote:
> In many cases, it's enough to simply let the application know that the
> call to initialize DPDK has failed.  A complete halt can then be
> decided by the application based on error returned (and the app could
> even attempt a possible re-attempt after some corrective action by the
> user or application).
> 
Set looks pretty good. Just a couple of minor issues, apart from the one
checkpatch issue I flagged:

* seems to be some inconsistency between using the "rte_eal_init_alert"
  function and RTE_LOG. If there is some reason why some cases use
  the function and others don't I think that might need to be called out
  somewhere.
* check-git-log flags a number of minor errors with commit titles and
  messages - most common is commit titles starting with a capital
  letter.

Otherwise:
Series-Acked-by: Bruce Richardson 



Re: [dpdk-dev] [PATCH v5 08/26] eal: do not panic on memzone initialization fails

2017-02-28 Thread Aaron Conole
Bruce Richardson  writes:

> On Mon, Feb 27, 2017 at 11:17:53AM -0500, Aaron Conole wrote:
>> When memzone initialization fails, report the error to the calling
>> application rather than panic().  Without a good way of detaching /
>> releasing hugepages, at this point the application will have to restart.
>> 
>> Signed-off-by: Aaron Conole 
>> ---
>>  lib/librte_eal/linuxapp/eal/eal.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>> 
>> diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
>> b/lib/librte_eal/linuxapp/eal/eal.c
>> index a671ed4..1e54ca1 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal.c
>> @@ -839,8 +839,11 @@ rte_eal_init(int argc, char **argv)
>>  /* the directories are locked during eal_hugepage_info_init */
>>  eal_hugedirs_unlock();
>>  
>> -if (rte_eal_memzone_init() < 0)
>> -rte_panic("Cannot init memzone\n");
>> +if (rte_eal_memzone_init() < 0) {
>> +RTE_LOG(ERR, EAL, "Cannot init memzone\n");
>
> Any particular reason why not "rte_eal_init_alert" as with the other
> cases?

I only used rte_eal_init_alert() for cases which occur before logging
happens, but I am not opposed to switching it for all the cases.  I'll
swap them all for v6.

Thanks again for the review, Bruce!

-Aaron


Re: [dpdk-dev] [PATCH v5 03/26] eal: No panic on hugepages info init

2017-02-28 Thread Aaron Conole
Bruce Richardson  writes:

> On Mon, Feb 27, 2017 at 11:17:48AM -0500, Aaron Conole wrote:
>> When attempting to scan hugepages, signal to the eal.c that an error has
>> occurred, rather than performing a panic.
>> 
>> Signed-off-by: Aaron Conole 
>> ---
>>  lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>> 
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c 
>> b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
>> index 18858e2..4d47eaf 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
>> @@ -283,9 +283,11 @@ eal_hugepage_info_init(void)
>>  struct dirent *dirent;
>>  
>>  dir = opendir(sys_dir_path);
>> -if (dir == NULL)
>> -rte_panic("Cannot open directory %s to read system hugepage "
>> +if (dir == NULL) {
>> +RTE_LOG(ERR, EAL, "Cannot open directory %s to read system 
>> hugepage "
>>"info\n", sys_dir_path);
>> +return -1;
>> +}
>
> Minor nit.
> The error message should go on a line on its own, without any breaks to
> make it easy to "grep". This should also eliminate the checkpatch
> complaint about it being too long.

Yes, will fix it for v6.

-Aaron


Re: [dpdk-dev] Pktgen-DPDK build errors on ppc64le

2017-02-28 Thread Wiles, Keith

> On Feb 28, 2017, at 2:48 AM, Rahul Lakkireddy  
> wrote:
> 
> Hi All,
> 
> We are seeing Pktgen-DPDK not building on ppc64le with latest dpdk git
> repo.

Yes, it looks like Pktgen needs to change to account for 256 cores. It is using 
uint8_t and needs to change to uint16_t variables. I can only suggest you can 
change the sizes in the places the compiler gives an error. I will try to get 
Pktgen updated to fix this issue in the next release.

> 
> Below build errors are seen:
> -
> [pktgen-dpdk]# make
> == lib
> == common
>  CC l2p.o
> In file included from /root/pktgen-dpdk/lib/common/l2p.c:88:0:
> /root/pktgen-dpdk/lib/common/l2p.h: In function ‘pg_raw_dump_l2p’:
> /root/pktgen-dpdk/lib/common/l2p.h:103:2: error: comparison is always true 
> due to limited range of data type [-Werror=type-limits]
>  for (i = 0; i  < RTE_MAX_LCORE; i++)
>  ^
> /root/pktgen-dpdk/lib/common/l2p.h:109:2: error: comparison is always true 
> due to limited range of data type [-Werror=type-limits]
>  for (j = 0; j < RTE_MAX_LCORE; j++) {
>  ^
> /root/pktgen-dpdk/lib/common/l2p.h: In function ‘pg_dump_l2p’:
> /root/pktgen-dpdk/lib/common/l2p.h:486:2: error: comparison is always true 
> due to limited range of data type [-Werror=type-limits]
>  for (lid = 0; lid < RTE_MAX_LCORE; lid++) {
>  ^
> /root/pktgen-dpdk/lib/common/l2p.c: In function ‘pg_port_matrix_dump’:
> /root/pktgen-dpdk/lib/common/l2p.c:477:3: error: large integer implicitly 
> truncated to unsigned type [-Werror=overflow]
>   cnt.rxtx = get_map(l2p, pid, RTE_MAX_LCORE);
>   ^
> /root/pktgen-dpdk/lib/common/l2p.c:488:3: error: large integer implicitly 
> truncated to unsigned type [-Werror=overflow]
>   cnt.rxtx = get_map(l2p, pid, RTE_MAX_LCORE);
>   ^
> cc1: all warnings being treated as errors
> make[3]: *** [l2p.o] Error 1
> make[2]: *** [all] Error 2
> make[1]: *** [common] Error 2
> make: *** [lib] Error 2
> -
> 
> By changing CONFIG_RTE_MAX_LCORE to 128 in 
> defconfig_ppc_64-power8-linuxapp-gcc
> file, Pktgen-DPDK then builds fine.
> 
> Thanks,
> Rahul

Regards,
Keith



Re: [dpdk-dev] [RFC 3/8] mbuf: set mbuf fields while in pool

2017-02-28 Thread Olivier Matz
Hi Bruce,

On Tue, 24 Jan 2017 15:50:49 +, Bruce Richardson
 wrote:
> On Tue, Jan 24, 2017 at 04:19:28PM +0100, Olivier Matz wrote:
> > Set the value of m->refcnt to 1, m->nb_segs to 1 and m->next
> > to NULL when the mbuf is stored inside the mempool (unused).
> > This is done in rte_pktmbuf_prefree_seg(), before freeing or
> > recycling a mbuf.
> > 
> > Before this patch, the value of m->refcnt was expected to be 0
> > while in pool.
> > 
> > The objectives are:
> > 
> > - to avoid drivers to set m->next to NULL in the early Rx path,
> > since this field is in the second 64B of the mbuf and its access
> > could trigger a cache miss
> > 
> > - rationalize the behavior of raw_alloc/raw_free: one is now the
> >   symmetric of the other, and refcnt is never changed in these
> > functions.
> > 
> > Signed-off-by: Olivier Matz 
> > ---
> >  drivers/net/mlx5/mlx5_rxtx.c |  5 ++---
> >  drivers/net/mpipe/mpipe_tilegx.c |  1 +
> >  lib/librte_mbuf/rte_mbuf.c   |  2 ++
> >  lib/librte_mbuf/rte_mbuf.h   | 45
> > +--- 4 files changed, 38
> > insertions(+), 15 deletions(-) 
> 
> > /* compat with older versions */
> >  __rte_deprecated
> > -static inline void __attribute__((always_inline))
> > +static inline void
> >  __rte_mbuf_raw_free(struct rte_mbuf *m)
> >  {
> > rte_mbuf_raw_free(m);
> > @@ -1218,8 +1232,12 @@ static inline void rte_pktmbuf_detach(struct
> > rte_mbuf *m) m->data_len = 0;
> > m->ol_flags = 0;
> >  
> > -   if (rte_mbuf_refcnt_update(md, -1) == 0)
> > +   if (rte_mbuf_refcnt_update(md, -1) == 0) {  
> 
> Minor nit, but in the case that we only have a single reference to the
> mbufs, we are always setting that to zero just to re-increment it to 1
> again.
> 
> > +   md->next = NULL;
> > +   md->nb_segs = 1;
> > +   rte_mbuf_refcnt_set(md, 1);
> > rte_mbuf_raw_free(md);
> > +   }
> >  }
> >  
> >  /**  


I'm trying to gather the comments that have been made on this patchset.
About this one, I think it would be more complex to change the code
to avoid to set the refcnt twice:

 - we would need to duplicate code from rte_mbuf_refcnt_update(), which
   I think is not a very good idea, due to the big comment
 - it would make the detach code less readable
 - it's even not sure that it will be more performant: since
   rte_mbuf_refcnt_update() is inline, the compiler is probably able to
   do the simplification by itself.



Olivier


Re: [dpdk-dev] [PATCH v2 1/5] test: move tests to separate folder

2017-02-28 Thread Thomas Monjalon
2017-02-16 14:57, Ferruh Yigit:
> This is to logically group unit tests into their own folder,
> separating them from "app" folder.
> 
> Hopefully this will make the unit test in DPDK more visible.
> 
> Following binaries moved to "test" folder:
> cmdline-test
> test-acl
> test-pipeline
> test<-- various DPDK unit tests
> 
> Signed-off-by: Ferruh Yigit 
> Acked-by: Bruce Richardson 

I've fixed the MAINTAINERS file and applied the series, thanks


[dpdk-dev] [PATCH 2/2] net/mlx5: add hardware TSO support for VXLAN and GRE

2017-02-28 Thread Shahaf Shuler
This commit adds support for hardware TSO for tunneled packets.

Signed-off-by: Shahaf Shuler 
---
 drivers/net/mlx5/mlx5_ethdev.c | 4 +++-
 drivers/net/mlx5/mlx5_rxtx.c   | 9 +
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 3edfd49..4de3595 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -696,7 +696,9 @@ struct priv *
if (priv->tso)
info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO;
if (priv->tunnel_en)
-   info->tx_offload_capa |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM;
+   info->tx_offload_capa |= (DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM |
+ DEV_TX_OFFLOAD_VXLAN_TNL_TSO |
+ DEV_TX_OFFLOAD_GRE_TNL_TSO);
if (priv_get_ifname(priv, &ifname) == 0)
info->if_index = if_nametoindex(ifname);
/* FIXME: RETA update/query API expects the callee to know the size of
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 145daa0..98e7205 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -488,9 +488,18 @@
(1 << txq->wqe_n) *
MLX5_WQE_SIZE);
unsigned int copy_b;
+   const uint64_t is_tunneled =
+   buf->ol_flags &
+   (PKT_TX_TUNNEL_GRE |
+PKT_TX_TUNNEL_VXLAN);
 
tso_header_sz = buf->l2_len + buf->l3_len +
buf->l4_len;
+
+   if (is_tunneled && txq->tunnel_en) {
+   tso_header_sz += buf->outer_l2_len +
+buf->outer_l3_len;
+   }
if (unlikely(tso_header_sz >
 MLX5_MAX_TSO_HEADER))
break;
-- 
1.8.3.1



[dpdk-dev] [PATCH 1/2] net/mlx5: add hardware checksum offload for tunnel packets

2017-02-28 Thread Shahaf Shuler
Prior to this commit Tx checksum offload was supported only for the
inner headers.
This commit adds support for the hardware to compute the checksum for the
outer headers as well.

The support is for tunneling protocols GRE and VXLAN.

Signed-off-by: Shahaf Shuler 
---
 doc/guides/nics/features/mlx5.ini |  2 ++
 doc/guides/nics/mlx5.rst  |  3 ++-
 drivers/net/mlx5/mlx5.c   |  7 +++
 drivers/net/mlx5/mlx5.h   |  2 ++
 drivers/net/mlx5/mlx5_ethdev.c|  2 ++
 drivers/net/mlx5/mlx5_prm.h   |  6 ++
 drivers/net/mlx5/mlx5_rxtx.c  | 14 +-
 drivers/net/mlx5/mlx5_rxtx.h  |  2 ++
 drivers/net/mlx5/mlx5_txq.c   |  2 ++
 9 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/features/mlx5.ini 
b/doc/guides/nics/features/mlx5.ini
index c6948cb..c2e1c6a 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -27,6 +27,8 @@ CRC offload  = Y
 VLAN offload = Y
 L3 checksum offload  = Y
 L4 checksum offload  = Y
+Inner L3 checksum= Y
+Inner L4 checksum= Y
 Packet type parsing  = Y
 Basic stats  = Y
 Stats per queue  = Y
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 8651456..a9fab9c 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -91,13 +91,14 @@ Features
 - KVM and VMware ESX SR-IOV modes are supported.
 - RSS hash result is supported.
 - Hardware TSO.
+- Hardware checksum TX offload for VXLAN and GRE.
 
 Limitations
 ---
 
 - Inner RSS for VXLAN frames is not supported yet.
 - Port statistics through software counters only.
-- Hardware checksum offloads for VXLAN inner header are not supported yet.
+- Hardware checksum RX offloads for VXLAN inner header are not supported yet.
 - Secondary process RX is not supported.
 
 Configuration
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 3623fbe..ffa16bd 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -375,6 +375,7 @@
struct ibv_device_attr device_attr;
unsigned int sriov;
unsigned int mps;
+   unsigned int tunnel_en;
int idx;
int i;
 
@@ -429,12 +430,17 @@
 * as all ConnectX-5 devices.
 */
switch (pci_dev->id.device_id) {
+   case PCI_DEVICE_ID_MELLANOX_CONNECTX4:
+   tunnel_en = 1;
+   mps = 0;
+   break;
case PCI_DEVICE_ID_MELLANOX_CONNECTX4LX:
case PCI_DEVICE_ID_MELLANOX_CONNECTX5:
case PCI_DEVICE_ID_MELLANOX_CONNECTX5VF:
case PCI_DEVICE_ID_MELLANOX_CONNECTX5EX:
case PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF:
mps = 1;
+   tunnel_en = 1;
break;
default:
mps = 0;
@@ -539,6 +545,7 @@
priv->mtu = ETHER_MTU;
priv->mps = mps; /* Enable MPW by default if supported. */
priv->cqe_comp = 1; /* Enable compression by default. */
+   priv->tunnel_en = tunnel_en;
err = mlx5_args(priv, pci_dev->device.devargs);
if (err) {
ERROR("failed to process device arguments: %s",
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d2bb835..7ba2886 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -127,6 +127,8 @@ struct priv {
unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */
unsigned int pending_alarm:1; /* An alarm is pending. */
unsigned int tso:1; /* Whether TSO is supported. */
+   unsigned int tunnel_en:1;
+   /* Whether Tx offloads for tunneled packets are supported. */
unsigned int max_tso_payload_sz; /* Maximum TCP payload for TSO. */
unsigned int txq_inline; /* Maximum packet size for inlining. */
unsigned int txqs_inline; /* Queue number threshold for inlining. */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index d56331c..3edfd49 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -695,6 +695,8 @@ struct priv *
 DEV_TX_OFFLOAD_TCP_CKSUM);
if (priv->tso)
info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO;
+   if (priv->tunnel_en)
+   info->tx_offload_capa |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM;
if (priv_get_ifname(priv, &ifname) == 0)
info->if_index = if_nametoindex(ifname);
/* FIXME: RETA update/query API expects the callee to know the size of
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index 755b5d7..33fc386 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -120,6 +120,12 @@
 /* Tunnel packet bit in the CQE. */
 #define MLX5_CQE_RX_TUNNEL_PACKET (1u << 0)
 
+/* Inner L3

[dpdk-dev] [PATCH 0/2] net/mlx5: add Tx offloads for tunneled packets

2017-02-28 Thread Shahaf Shuler
This patchset adds support for hardware TX offloads for tunneled packets.

[PATCH 1/2] net/mlx5: add hardware checksum offload for tunnel
[PATCH 2/2] net/mlx5: add hardware TSO support for VXLAN and GRE


Re: [dpdk-dev] [PATCH 1/3] mk: clean .lib.cmd files

2017-02-28 Thread Thomas Monjalon
> Signed-off-by: Ferruh Yigit 
> ---
>  mk/rte.lib.mk | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Series merged and applied as one commit, thanks


Re: [dpdk-dev] [PATCH v5] doc: use corelist instead of coremask

2017-02-28 Thread Mcnamara, John


> -Original Message-
> From: Wiles, Keith
> Sent: Monday, February 27, 2017 7:14 PM
> To: dev@dpdk.org
> Cc: iryz...@nfware.com; thomas.monja...@6wind.com; Mcnamara, John
> 
> Subject: [PATCH v5] doc: use corelist instead of coremask
> 
> The coremask option in DPDK is difficult to use and we should be
> promoting the use of the corelist (-l) option. The patch
> adjusts the docs to use -l EAL option instead of the -c option.
> 
> The patch only changes the docs and not the code as the -c option
> will continue to exist unless it is removed in the future. The -c
> option should be kept to maintain backward compatibility.
> 
> Signed-off-by: Keith Wiles 


Thomas, since this patch affects a lot of the documentation it would be
good to merge it as early as possible in the release cycle.

Acked-by: John McNamara 



Re: [dpdk-dev] [PATCH] app/testpmd: Fix typos

2017-02-28 Thread Mcnamara, John

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Nirmoy Das
> Sent: Monday, February 27, 2017 7:24 PM
> To: dev@dpdk.org
> Cc: Nirmoy Das 
> Subject: [dpdk-dev] [PATCH] app/testpmd: Fix typos
> 
> fixes trivial typos in app/test-pmd/cmdline.c, app/test-pmd/icmpecho.c,
> app/test-pmd/testpmd.c
> 
> Signed-off-by: Nirmoy Das 


Hi Nirmoy,

Thanks for that.

Just a small heads-up since this is your first patch. The subject line should
be lowercase (apart from known abbreviations) and there is no need to list
the files in the body since git will record that anyway. Have a look at
the contribution guidelines for some more pointers:

http://dpdk.org/doc/guides/contributing/patches.html#commit-messages-subject-line

Apart from that, thanks for the patch.

Acked-by: John McNamara 




Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting

2017-02-28 Thread Jerin Jacob
On Tue, Feb 28, 2017 at 01:52:26PM +, Bruce Richardson wrote:
> On Tue, Feb 28, 2017 at 05:38:34PM +0530, Jerin Jacob wrote:
> > On Tue, Feb 28, 2017 at 11:57:03AM +, Bruce Richardson wrote:
> > > On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote:
> > > > On Thu, Feb 23, 2017 at 05:23:54PM +, Bruce Richardson wrote:
> > > > > Users compiling DPDK should not need to know or care about the 
> > > > > arrangement
> > > > > of cachelines in the rte_ring structure. Therefore just remove the 
> > > > > build
> > > > > option and set the structures to be always split. For improved
> > > > > performance use 128B rather than 64B alignment since it stops the 
> > > > > producer
> > > > > and consumer data being on adjacent cachelines.
> > > > > 
> > > > > Signed-off-by: Bruce Richardson 
> > > > > ---
> > > > >  config/common_base | 1 -
> > > > >  doc/guides/rel_notes/release_17_05.rst | 6 ++
> > > > >  lib/librte_ring/rte_ring.c | 2 --
> > > > >  lib/librte_ring/rte_ring.h | 8 ++--
> > > > >  4 files changed, 8 insertions(+), 9 deletions(-)
> > > > > 
> > > > > diff --git a/config/common_base b/config/common_base
> > > > > index aeee13e..099ffda 100644
> > > > > --- a/config/common_base
> > > > > +++ b/config/common_base
> > > > > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> > > > >  #
> > > > >  CONFIG_RTE_LIBRTE_RING=y
> > > > >  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> > > > > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
> > > > >  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
> > > > >  
> > > > >  #
> > > > > diff --git a/doc/guides/rel_notes/release_17_05.rst 
> > > > > b/doc/guides/rel_notes/release_17_05.rst
> > > > > index e25ea9f..ea45e0c 100644
> > > > > --- a/doc/guides/rel_notes/release_17_05.rst
> > > > > +++ b/doc/guides/rel_notes/release_17_05.rst
> > > > > @@ -110,6 +110,12 @@ API Changes
> > > > > Also, make sure to start the actual text at the margin.
> > > > > =
> > > > >  
> > > > > +* **Reworked rte_ring library**
> > > > > +
> > > > > +  The rte_ring library has been reworked and updated. The following 
> > > > > changes
> > > > > +  have been made to it:
> > > > > +
> > > > > +  * removed the build-time setting 
> > > > > ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
> > > > >  
> > > > >  ABI Changes
> > > > >  ---
> > > > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > > > index ca0a108..4bc6da1 100644
> > > > > --- a/lib/librte_ring/rte_ring.c
> > > > > +++ b/lib/librte_ring/rte_ring.c
> > > > > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char 
> > > > > *name, unsigned count,
> > > > >   /* compilation-time checks */
> > > > >   RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
> > > > > RTE_CACHE_LINE_MASK) != 0);
> > > > > -#ifdef RTE_RING_SPLIT_PROD_CONS
> > > > >   RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
> > > > > RTE_CACHE_LINE_MASK) != 0);
> > > > > -#endif
> > > > >   RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> > > > > RTE_CACHE_LINE_MASK) != 0);
> > > > >  #ifdef RTE_LIBRTE_RING_DEBUG
> > > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > > index 72ccca5..04fe667 100644
> > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > @@ -168,7 +168,7 @@ struct rte_ring {
> > > > >   uint32_t mask;   /**< Mask (size-1) of ring. */
> > > > >   volatile uint32_t head;  /**< Producer head. */
> > > > >   volatile uint32_t tail;  /**< Producer tail. */
> > > > > - } prod __rte_cache_aligned;
> > > > > + } prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
> > > > 
> > > > I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
> > > > RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are 
> > > > cache line
> > > > size of 128B
> > > > 
> > > Sure.
> > > 
> > > However, can you perhaps try a performance test and check to see if
> > > there is a performance difference between the two values before I change
> > > it? In my tests I see improved performance by having an extra blank
> > > cache-line between the producer and consumer data.
> > 
> > Sure. Which test are you running to measure the performance difference?
> > Is it app/test/test_ring_perf.c?
> > 
> > > 
> Yep, just the basic ring perf test. I look mostly at the core-to-core
> numbers, since hyperthread-to-hyperthread or NUMA socket to NUMA socket
> would be far less common use cases IMHO.

Performance test result shows regression with RTE_CACHE_LINE_MIN_SIZE
scheme in some use case and some use case has higher performance(Testing using
two physical cores)


# base code
RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 84
MP/MC single enq/dequeue: 301
SP/SC burst enq/dequeue (size: 8): 20
MP

[dpdk-dev] [PATCH v6 00/26] linux/eal: Remove most causes of panic on init

2017-02-28 Thread Aaron Conole
From: Aaron Conole 

In many cases, it's enough to simply let the application know that the
call to initialize DPDK has failed.  A complete halt can then be
decided by the application based on error returned (and the app could
even attempt a possible re-attempt after some corrective action by the
user or application).

Changes ->v2:
- Audited all "RTE_LOG (" calls that were introduced, and converted
  to "RTE_LOG("
- Added some fprintf(stderr, "") lines to indicate errors before logging
  is initialized
- Removed assignments to errno.
- Changed patch 14/25 to reflect EFAULT, and document in 25/25

Changes ->v3:
- Checkpatch issues in patches 3 (spelling mistake), 9 (issue with leading
  spaces), and 19 (braces around single line statement if-condition)

Changes ->v4:
- Error text cleanup.
- Add a new check around rte_bus_scan(), added during the development of
  this series.

Changes ->v5:
- checkpatch.pl cleanup in patch 02/26
- move rte_errno.h include from patch 15 to patch 02
- remove stdbool.h and use int as return type in patch 06/26

Changes ->v6:
- convert all of the initialization calls to RTE_LOG() to rte_eal_init_alert()
- run through check-git-log and checkpatches
- add Bruce's ack to the series

I kept the rte_errno reflection, since this is control-path code and the
init function returns a sentinel value of -1.

Aaron Conole (26):
  eal: cpu init will no longer panic
  eal: return error instead of panic for cpu init
  eal: do not panic on hugepage info init
  eal: do not panic on failed hugepage query
  eal: do not panic if parsing args returns error
  eal-common: introduce a way to query cpu support
  eal: do not panic when CPU isn't supported
  eal: do not panic on memzone initialization fails
  eal: set errno when exiting for already called
  eal: do not panic on log failures
  eal: do not panic on PCI-probe
  eal: do not panic on vfio failure
  eal: do not panic on memory init
  eal: do not panic on tailq init
  eal: do not panic on alarm init
  eal: convert timer init not to call panic
  eal: change the private pipe call to reflect errno
  eal: do not panic on interrupt thread init
  eal: do not error if plugins fail to init
  eal_pci: continue probing even on failures
  eal: do not panic on failed PCI-probe
  eal_common_dev: continue initializing vdevs
  eal: do not panic (or abort) if vdev init fails
  eal: do not panic when bus probe fails
  eal: do not panic on failed bus scan
  rte_eal_init: add info about various error codes

 lib/librte_eal/common/eal_common_cpuflags.c|  13 +-
 lib/librte_eal/common/eal_common_dev.c |   5 +-
 lib/librte_eal/common/eal_common_lcore.c   |   7 +-
 lib/librte_eal/common/eal_common_pci.c |  15 ++-
 lib/librte_eal/common/eal_common_tailqs.c  |   3 +-
 .../common/include/generic/rte_cpuflags.h  |   8 ++
 lib/librte_eal/common/include/rte_eal.h|  27 -
 lib/librte_eal/linuxapp/eal/eal.c  | 131 +++--
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c|   9 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c   |   5 +-
 10 files changed, 170 insertions(+), 53 deletions(-)

-- 
2.9.3



[dpdk-dev] [PATCH v6 01/26] eal: cpu init will no longer panic

2017-02-28 Thread Aaron Conole
After this change, the EAL CPU NUMA node resolution step can no longer
emit an rte_panic.  This aligns with the code in rte_eal_init, which
expects failures to return an error code.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/common/eal_common_lcore.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_lcore.c 
b/lib/librte_eal/common/eal_common_lcore.c
index 2cd4132..84fa0cb 100644
--- a/lib/librte_eal/common/eal_common_lcore.c
+++ b/lib/librte_eal/common/eal_common_lcore.c
@@ -83,16 +83,17 @@ rte_eal_cpu_init(void)
config->lcore_role[lcore_id] = ROLE_RTE;
lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
-   if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES)
+   if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
 #ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
lcore_config[lcore_id].socket_id = 0;
 #else
-   rte_panic("Socket ID (%u) is greater than "
+   RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
"RTE_MAX_NUMA_NODES (%d)\n",
lcore_config[lcore_id].socket_id,
RTE_MAX_NUMA_NODES);
+   return -1;
 #endif
-
+   }
RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
"core %u on socket %u\n",
lcore_id, lcore_config[lcore_id].core_id,
-- 
2.9.3



[dpdk-dev] [PATCH v6 02/26] eal: return error instead of panic for cpu init

2017-02-28 Thread Aaron Conole
There may be no way to gracefully recover, but the application
should be notified that a failure happened, rather than completely
aborting.  This allows the user to proceed with a "slow-path" type
solution.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index bf6b818..81692e7 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -61,6 +61,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -740,6 +741,12 @@ static int rte_eal_vfio_setup(void)
 }
 #endif
 
+static void rte_eal_init_alert(const char *msg)
+{
+   fprintf(stderr, "EAL: FATAL: %s\n", msg);
+   RTE_LOG(ERR, EAL, "%s\n", msg);
+}
+
 /* Launch threads, called at application init(). */
 int
 rte_eal_init(int argc, char **argv)
@@ -767,8 +774,11 @@ rte_eal_init(int argc, char **argv)
/* set log level as early as possible */
rte_set_log_level(internal_config.log_level);
 
-   if (rte_eal_cpu_init() < 0)
-   rte_panic("Cannot detect lcores\n");
+   if (rte_eal_cpu_init() < 0) {
+   rte_eal_init_alert("Cannot detect lcores.");
+   rte_errno = ENOTSUP;
+   return -1;
+   }
 
fctret = eal_parse_args(argc, argv);
if (fctret < 0)
-- 
2.9.3



[dpdk-dev] [PATCH v6 03/26] eal: do not panic on hugepage info init

2017-02-28 Thread Aaron Conole
When attempting to scan hugepages, signal to the eal that an error has
occurred, rather than performing a panic.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c 
b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 18858e2..7a21e8f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -283,9 +283,12 @@ eal_hugepage_info_init(void)
struct dirent *dirent;
 
dir = opendir(sys_dir_path);
-   if (dir == NULL)
-   rte_panic("Cannot open directory %s to read system hugepage "
- "info\n", sys_dir_path);
+   if (dir == NULL) {
+   RTE_LOG(ERR, EAL,
+   "Cannot open directory %s to read system hugepage 
info\n",
+   sys_dir_path);
+   return -1;
+   }
 
for (dirent = readdir(dir); dirent != NULL; dirent = readdir(dir)) {
struct hugepage_info *hpi;
-- 
2.9.3



[dpdk-dev] [PATCH v6 04/26] eal: do not panic on failed hugepage query

2017-02-28 Thread Aaron Conole
If we fail to acquire hugepage information, simply signal an error to
the application.  This clears the run_once counter, allowing the user or
application to take a corrective action and retry.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 81692e7..12bd941 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -787,8 +787,12 @@ rte_eal_init(int argc, char **argv)
if (internal_config.no_hugetlbfs == 0 &&
internal_config.process_type != RTE_PROC_SECONDARY &&
internal_config.xen_dom0_support == 0 &&
-   eal_hugepage_info_init() < 0)
-   rte_panic("Cannot get hugepage information\n");
+   eal_hugepage_info_init() < 0) {
+   rte_eal_init_alert("Cannot get hugepage information.");
+   rte_errno = EACCES;
+   rte_atomic32_clear(&run_once);
+   return -1;
+   }
 
if (internal_config.memory == 0 && internal_config.force_sockets == 0) {
if (internal_config.no_hugetlbfs)
-- 
2.9.3



[dpdk-dev] [PATCH v6 06/26] eal-common: introduce a way to query cpu support

2017-02-28 Thread Aaron Conole
This adds a new API to check for the eal cpu versions.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/common/eal_common_cpuflags.c  | 13 +++--
 lib/librte_eal/common/include/generic/rte_cpuflags.h |  8 
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_cpuflags.c 
b/lib/librte_eal/common/eal_common_cpuflags.c
index b5f76f7..9a2d080 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -43,6 +43,13 @@
 void
 rte_cpu_check_supported(void)
 {
+   if (!rte_cpu_is_supported())
+   exit(1);
+}
+
+int
+rte_cpu_is_supported(void)
+{
/* This is generated at compile-time by the build system */
static const enum rte_cpu_flag_t compile_time_flags[] = {
RTE_COMPILE_TIME_CPUFLAGS
@@ -57,14 +64,16 @@ rte_cpu_check_supported(void)
fprintf(stderr,
"ERROR: CPU feature flag lookup failed with 
error %d\n",
ret);
-   exit(1);
+   return 0;
}
if (!ret) {
fprintf(stderr,
"ERROR: This system does not support \"%s\".\n"
"Please check that RTE_MACHINE is set 
correctly.\n",
rte_cpu_get_flag_name(compile_time_flags[i]));
-   exit(1);
+   return 0;
}
}
+
+   return 1;
 }
diff --git a/lib/librte_eal/common/include/generic/rte_cpuflags.h 
b/lib/librte_eal/common/include/generic/rte_cpuflags.h
index 71321f3..8d27031 100644
--- a/lib/librte_eal/common/include/generic/rte_cpuflags.h
+++ b/lib/librte_eal/common/include/generic/rte_cpuflags.h
@@ -82,4 +82,12 @@ rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature);
 void
 rte_cpu_check_supported(void);
 
+/**
+ * This function checks that the currently used CPU supports the CPU features
+ * that were specified at compile time. It is called automatically within the
+ * EAL, so does not need to be used by applications.  This version returns a
+ * result so that decisions may be made (for instance, graceful shutdowns).
+ */
+int
+rte_cpu_is_supported(void);
 #endif /* _RTE_CPUFLAGS_H_ */
-- 
2.9.3



[dpdk-dev] [PATCH v6 05/26] eal: do not panic if parsing args returns error

2017-02-28 Thread Aaron Conole
It's possible that the application could take a corrective action here,
and either prompt the user for different arguments, or at least perform
a better logging.  Exiting this early prevents any useful information
gathering from the application layer.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 12bd941..f7511ab 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -781,8 +781,12 @@ rte_eal_init(int argc, char **argv)
}
 
fctret = eal_parse_args(argc, argv);
-   if (fctret < 0)
-   exit(1);
+   if (fctret < 0) {
+   rte_eal_init_alert("Invalid 'command line' arguments.");
+   rte_errno = EINVAL;
+   rte_atomic32_clear(&run_once);
+   return -1;
+   }
 
if (internal_config.no_hugetlbfs == 0 &&
internal_config.process_type != RTE_PROC_SECONDARY &&
-- 
2.9.3



[dpdk-dev] [PATCH v6 07/26] eal: do not panic when CPU isn't supported

2017-02-28 Thread Aaron Conole
It's now possible to gracefully exit the application, or for
applications which support non-dpdk datapaths working in concert with
DPDK datapaths, there no longer is the possibility of exiting for
unsupported CPUs.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index f7511ab..a671ed4 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -759,7 +759,11 @@ rte_eal_init(int argc, char **argv)
char thread_name[RTE_MAX_THREAD_NAME_LEN];
 
/* checks if the machine is adequate */
-   rte_cpu_check_supported();
+   if (!rte_cpu_is_supported()) {
+   rte_eal_init_alert("unsupported cpu type.");
+   rte_errno = ENOTSUP;
+   return -1;
+   }
 
if (!rte_atomic32_test_and_set(&run_once))
return -1;
-- 
2.9.3



[dpdk-dev] [PATCH v6 08/26] eal: do not panic on memzone initialization fails

2017-02-28 Thread Aaron Conole
When memzone initialization fails, report the error to the calling
application rather than panic().  Without a good way of detaching /
releasing hugepages, at this point the application will have to restart.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index a671ed4..5a92b28 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -839,8 +839,11 @@ rte_eal_init(int argc, char **argv)
/* the directories are locked during eal_hugepage_info_init */
eal_hugedirs_unlock();
 
-   if (rte_eal_memzone_init() < 0)
-   rte_panic("Cannot init memzone\n");
+   if (rte_eal_memzone_init() < 0) {
+   rte_eal_init_alert("Cannot init memzone\n");
+   rte_errno = ENODEV;
+   return -1;
+   }
 
if (rte_eal_tailqs_init() < 0)
rte_panic("Cannot init tail queues for objects\n");
-- 
2.9.3



[dpdk-dev] [PATCH v6 09/26] eal: set errno when exiting for already called

2017-02-28 Thread Aaron Conole
Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 5a92b28..564cac3 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -765,8 +765,11 @@ rte_eal_init(int argc, char **argv)
return -1;
}
 
-   if (!rte_atomic32_test_and_set(&run_once))
+   if (!rte_atomic32_test_and_set(&run_once)) {
+   rte_eal_init_alert("already called initialization.");
+   rte_errno = EALREADY;
return -1;
+   }
 
logid = strrchr(argv[0], '/');
logid = strdup(logid ? logid + 1: argv[0]);
-- 
2.9.3



[dpdk-dev] [PATCH v6 10/26] eal: do not panic on log failures

2017-02-28 Thread Aaron Conole
When log initialization fails, it's generally because the fopencookie
failed.  While this is rare in practice, it could happen, and it is
likely because of memory pressure.  So, flag the error, and allow the
user to retry.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 564cac3..e1740a6 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -825,8 +825,12 @@ rte_eal_init(int argc, char **argv)
 
rte_config_init();
 
-   if (rte_eal_log_init(logid, internal_config.syslog_facility) < 0)
-   rte_panic("Cannot init logs\n");
+   if (rte_eal_log_init(logid, internal_config.syslog_facility) < 0) {
+   rte_eal_init_alert("Cannot init logging.");
+   rte_errno = ENOMEM;
+   rte_atomic32_clear(&run_once);
+   return -1;
+   }
 
if (rte_eal_pci_init() < 0)
rte_panic("Cannot init PCI\n");
-- 
2.9.3



[dpdk-dev] [PATCH v6 11/26] eal: do not panic on PCI-probe

2017-02-28 Thread Aaron Conole
This will usually be an issue because of permissions.  However, it could
also be caused by OOM.  In either case, errno will contain the
underlying cause.  It is safe to re-init the system here, so allow the
application to take corrective action and reinit.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index e1740a6..d5ef7b5 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -832,8 +832,12 @@ rte_eal_init(int argc, char **argv)
return -1;
}
 
-   if (rte_eal_pci_init() < 0)
-   rte_panic("Cannot init PCI\n");
+   if (rte_eal_pci_init() < 0) {
+   rte_eal_init_alert("Cannot init PCI\n");
+   rte_errno = EUNATCH;
+   rte_atomic32_clear(&run_once);
+   return -1;
+   }
 
 #ifdef VFIO_PRESENT
if (rte_eal_vfio_setup() < 0)
-- 
2.9.3



[dpdk-dev] [PATCH v6 12/26] eal: do not panic on vfio failure

2017-02-28 Thread Aaron Conole
Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index d5ef7b5..10eefd3 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -840,8 +840,12 @@ rte_eal_init(int argc, char **argv)
}
 
 #ifdef VFIO_PRESENT
-   if (rte_eal_vfio_setup() < 0)
-   rte_panic("Cannot init VFIO\n");
+   if (rte_eal_vfio_setup() < 0) {
+   rte_eal_init_alert("Cannot init VFIO\n");
+   rte_errno = EAGAIN;
+   rte_atomic32_clear(&run_once);
+   return -1;
+   }
 #endif
 
if (rte_eal_memory_init() < 0)
-- 
2.9.3



[dpdk-dev] [PATCH v6 15/26] eal: do not panic on alarm init

2017-02-28 Thread Aaron Conole
rte_eal_alarm_init() call uses the linux timerfd framework to create a
poll()-able timer using standard posix file operations.  This could fail
for a few reasons given in the man-pages, but many could be
corrected by the user application.  No need to panic.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index aa10192..7aa6b6e 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -869,8 +869,11 @@ rte_eal_init(int argc, char **argv)
return -1;
}
 
-   if (rte_eal_alarm_init() < 0)
-   rte_panic("Cannot init interrupt-handling thread\n");
+   if (rte_eal_alarm_init() < 0) {
+   rte_eal_init_alert("Cannot init interrupt-handling thread\n");
+   /* rte_eal_alarm_init sets rte_errno on failure. */
+   return -1;
+   }
 
if (rte_eal_timer_init() < 0)
rte_panic("Cannot init HPET or TSC timers\n");
-- 
2.9.3



[dpdk-dev] [PATCH v6 14/26] eal: do not panic on tailq init

2017-02-28 Thread Aaron Conole
There are some theoretical racy conditions in the system that _could_
cause early tailq init to fail;  however, no need to panic the
application.  While it can't continue using DPDK, it could make better
alerts to the user.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/common/eal_common_tailqs.c | 3 +--
 lib/librte_eal/linuxapp/eal/eal.c | 7 +--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_tailqs.c 
b/lib/librte_eal/common/eal_common_tailqs.c
index bb08ec8..4f69828 100644
--- a/lib/librte_eal/common/eal_common_tailqs.c
+++ b/lib/librte_eal/common/eal_common_tailqs.c
@@ -188,8 +188,7 @@ rte_eal_tailqs_init(void)
if (t->head == NULL) {
RTE_LOG(ERR, EAL,
"Cannot initialize tailq: %s\n", t->name);
-   /* no need to TAILQ_REMOVE, we are going to panic in
-* rte_eal_init() */
+   /* TAILQ_REMOVE not needed, error is already fatal */
goto fail;
}
}
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index ae0beed..aa10192 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -863,8 +863,11 @@ rte_eal_init(int argc, char **argv)
return -1;
}
 
-   if (rte_eal_tailqs_init() < 0)
-   rte_panic("Cannot init tail queues for objects\n");
+   if (rte_eal_tailqs_init() < 0) {
+   rte_eal_init_alert("Cannot init tail queues for objects\n");
+   rte_errno = EFAULT;
+   return -1;
+   }
 
if (rte_eal_alarm_init() < 0)
rte_panic("Cannot init interrupt-handling thread\n");
-- 
2.9.3



[dpdk-dev] [PATCH v6 13/26] eal: do not panic on memory init

2017-02-28 Thread Aaron Conole
This can only happen when access to hugepages (either as primary or
secondary process) fails (and that is usually permissions).  Since the
manner of failure is not reversible, we cannot allow retry.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 10eefd3..ae0beed 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -848,8 +848,11 @@ rte_eal_init(int argc, char **argv)
}
 #endif
 
-   if (rte_eal_memory_init() < 0)
-   rte_panic("Cannot init memory\n");
+   if (rte_eal_memory_init() < 0) {
+   rte_eal_init_alert("Cannot init memory\n");
+   rte_errno = ENOMEM;
+   return -1;
+   }
 
/* the directories are locked during eal_hugepage_info_init */
eal_hugedirs_unlock();
-- 
2.9.3



[dpdk-dev] [PATCH v6 16/26] eal: convert timer init not to call panic

2017-02-28 Thread Aaron Conole
After code inspection, there is no way for eal_timer_init() to fail.  It
simply returns 0 in all cases.  As such, this test could either go-away
or stay here as 'future-proofing'.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 7aa6b6e..73a7d38 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -875,8 +875,11 @@ rte_eal_init(int argc, char **argv)
return -1;
}
 
-   if (rte_eal_timer_init() < 0)
-   rte_panic("Cannot init HPET or TSC timers\n");
+   if (rte_eal_timer_init() < 0) {
+   rte_eal_init_alert("Cannot init HPET or TSC timers\n");
+   rte_errno = ENOTSUP;
+   return -1;
+   }
 
eal_check_mem_on_local_socket();
 
-- 
2.9.3



[dpdk-dev] [PATCH v6 17/26] eal: change the private pipe call to reflect errno

2017-02-28 Thread Aaron Conole
There could be some confusion as to why the call failed - this change
will always reflect the value of the error in rte_error.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c 
b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 92a19cb..5bb833e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -898,13 +898,16 @@ rte_eal_intr_init(void)
 * create a pipe which will be waited by epoll and notified to
 * rebuild the wait list of epoll.
 */
-   if (pipe(intr_pipe.pipefd) < 0)
+   if (pipe(intr_pipe.pipefd) < 0) {
+   rte_errno = errno;
return -1;
+   }
 
/* create the host thread to wait/handle the interrupt */
ret = pthread_create(&intr_thread, NULL,
eal_intr_thread_main, NULL);
if (ret != 0) {
+   rte_errno = ret;
RTE_LOG(ERR, EAL,
"Failed to create thread for interrupt handling\n");
} else {
-- 
2.9.3



[dpdk-dev] [PATCH v6 18/26] eal: do not panic on interrupt thread init

2017-02-28 Thread Aaron Conole
When initializing the interrupt thread, there are a number of possible
reasons for failure - some of which are correctable by the application.
Do not panic() needlessly, and give the application a change to reflect
this information to the user.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 73a7d38..d174ad4 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -894,8 +894,10 @@ rte_eal_init(int argc, char **argv)
rte_config.master_lcore, (int)thread_id, cpuset,
ret == 0 ? "" : "...");
 
-   if (rte_eal_intr_init() < 0)
-   rte_panic("Cannot init interrupt-handling thread\n");
+   if (rte_eal_intr_init() < 0) {
+   rte_eal_init_alert("Cannot init interrupt-handling thread\n");
+   return -1;
+   }
 
if (rte_bus_scan())
rte_panic("Cannot scan the buses for devices\n");
-- 
2.9.3



[dpdk-dev] [PATCH v6 19/26] eal: do not error if plugins fail to init

2017-02-28 Thread Aaron Conole
Plugins are useful and important.  However, it seems crazy to abort
everything just because they don't initialize properly.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index d174ad4..0ba9766 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -884,7 +884,7 @@ rte_eal_init(int argc, char **argv)
eal_check_mem_on_local_socket();
 
if (eal_plugins_init() < 0)
-   rte_panic("Cannot init plugins\n");
+   rte_eal_init_alert("Cannot init plugins\n");
 
eal_thread_init_master(rte_config.master_lcore);
 
-- 
2.9.3



[dpdk-dev] [PATCH v6 20/26] eal_pci: continue probing even on failures

2017-02-28 Thread Aaron Conole
Some devices may be inaccessible for a variety of reasons, or the
PCI-bus may be unavailable causing the whole thing to fail.  Still,
better to continue attempts at probes.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/common/eal_common_pci.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index 72547bd..9416190 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -69,6 +69,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -416,6 +417,7 @@ rte_eal_pci_probe(void)
struct rte_pci_device *dev = NULL;
struct rte_devargs *devargs;
int probe_all = 0;
+   int ret_1 = 0;
int ret = 0;
 
if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) == 0)
@@ -430,17 +432,20 @@ rte_eal_pci_probe(void)
 
/* probe all or only whitelisted devices */
if (probe_all)
-   ret = pci_probe_all_drivers(dev);
+   ret_1 = pci_probe_all_drivers(dev);
else if (devargs != NULL &&
devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
-   ret = pci_probe_all_drivers(dev);
-   if (ret < 0)
-   rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
+   ret_1 = pci_probe_all_drivers(dev);
+   if (ret_1 < 0) {
+   RTE_LOG(ERR, EAL, "Requested device " PCI_PRI_FMT
 " cannot be used\n", dev->addr.domain, 
dev->addr.bus,
 dev->addr.devid, dev->addr.function);
+   rte_errno = errno;
+   ret = 1;
+   }
}
 
-   return 0;
+   return -ret;
 }
 
 /* dump one device */
-- 
2.9.3



[dpdk-dev] [PATCH v6 21/26] eal: do not panic on failed PCI-probe

2017-02-28 Thread Aaron Conole
Since PCI isn't neccessarily required, it may be possible to simply log
the error and continue on letting the user check the logs and restart
the application when things have failed.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 0ba9766..b13e1dd 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -943,8 +943,11 @@ rte_eal_init(int argc, char **argv)
rte_panic("Cannot probe devices\n");
 
/* Probe & Initialize PCI devices */
-   if (rte_eal_pci_probe())
-   rte_panic("Cannot probe PCI\n");
+   if (rte_eal_pci_probe()) {
+   rte_eal_init_alert("Cannot probe PCI\n");
+   rte_errno = ENOTSUP;
+   return -1;
+   }
 
if (rte_eal_dev_init() < 0)
rte_panic("Cannot init pmd devices\n");
-- 
2.9.3



[dpdk-dev] [PATCH v6 22/26] eal_common_dev: continue initializing vdevs

2017-02-28 Thread Aaron Conole
Even if one vdev should fail, there's no need to prevent further
processing.  Log the error, and reflect it to the higher levels to
decide.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/common/eal_common_dev.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_dev.c 
b/lib/librte_eal/common/eal_common_dev.c
index 4f3b493..9889997 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -80,6 +80,7 @@ int
 rte_eal_dev_init(void)
 {
struct rte_devargs *devargs;
+   int ret = 0;
 
/*
 * Note that the dev_driver_list is populated here
@@ -97,11 +98,11 @@ rte_eal_dev_init(void)
devargs->args)) {
RTE_LOG(ERR, EAL, "failed to initialize %s device\n",
devargs->virt.drv_name);
-   return -1;
+   ret = -1;
}
}
 
-   return 0;
+   return ret;
 }
 
 int rte_eal_dev_attach(const char *name, const char *devargs)
-- 
2.9.3



[dpdk-dev] [PATCH v6 24/26] eal: do not panic when bus probe fails

2017-02-28 Thread Aaron Conole
Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index ddc50f2..8274196 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -939,8 +939,11 @@ rte_eal_init(int argc, char **argv)
rte_eal_mp_wait_lcore();
 
/* Probe all the buses and devices/drivers on them */
-   if (rte_bus_probe())
-   rte_panic("Cannot probe devices\n");
+   if (rte_bus_probe()) {
+   rte_eal_init_alert("Cannot probe devices\n");
+   rte_errno = ENOTSUP;
+   return -1;
+   }
 
/* Probe & Initialize PCI devices */
if (rte_eal_pci_probe()) {
-- 
2.9.3



[dpdk-dev] [PATCH v6 25/26] eal: do not panic on failed bus scan

2017-02-28 Thread Aaron Conole
For now, do an abort.  It's likely that even aborting the initialization
is premature in this case, as it may be possible to proceed even if one
bus or another is not available.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 8274196..6d6b825 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -899,8 +899,11 @@ rte_eal_init(int argc, char **argv)
return -1;
}
 
-   if (rte_bus_scan())
-   rte_panic("Cannot scan the buses for devices\n");
+   if (rte_bus_scan()) {
+   rte_eal_init_alert("Cannot scan the buses for devices\n");
+   rte_errno = ENODEV;
+   return -1;
+   }
 
RTE_LCORE_FOREACH_SLAVE(i) {
 
-- 
2.9.3



[dpdk-dev] [PATCH v6 23/26] eal: do not panic (or abort) if vdev init fails

2017-02-28 Thread Aaron Conole
Seems like it's possible to continue.  At least, the error is reflected
properly in the logs.  A user could then go and correct or investigate
the situation.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/linuxapp/eal/eal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index b13e1dd..ddc50f2 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -950,7 +950,7 @@ rte_eal_init(int argc, char **argv)
}
 
if (rte_eal_dev_init() < 0)
-   rte_panic("Cannot init pmd devices\n");
+   rte_eal_init_alert("Cannot init pmd devices\n");
 
rte_eal_mcfg_complete();
 
-- 
2.9.3



[dpdk-dev] [PATCH v6 26/26] rte_eal_init: add info about various error codes

2017-02-28 Thread Aaron Conole
The rte_eal_init function will now pass failure reason hints to the
application.  To help app developers deciper this, add some brief
information about what the codes are indicating.

Signed-off-by: Aaron Conole 
Acked-by: Bruce Richardson 
---
 lib/librte_eal/common/include/rte_eal.h | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_eal.h 
b/lib/librte_eal/common/include/rte_eal.h
index 03fee50..9251244 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -159,7 +159,32 @@ int rte_eal_iopl_init(void);
  * function call and should not be further interpreted by the
  * application.  The EAL does not take any ownership of the memory used
  * for either the argv array, or its members.
- *   - On failure, a negative error value.
+ *   - On failure, -1 and rte_errno is set to a value indicating the cause
+ * for failure.  In some instances, the application will need to be
+ * restarted as part of clearing the issue.
+ *
+ *   Error codes returned via rte_errno:
+ * EACCES indicates a permissions issue.
+ *
+ * EAGAIN indicates either a bus or system resource was not available,
+ *setup may be attempted again.
+ *
+ * EALREADY indicates that the rte_eal_init function has already been
+ *  called, and cannot be called again.
+ *
+ * EFAULT indicates the tailq configuration name was not found in
+ *memory configuration.
+ *
+ * EINVAL indicates invalid parameters were passed as argv/argc.
+ *
+ * ENOMEM indicates failure likely caused by an out-of-memory condition.
+ *
+ * ENODEV indicates memory setup issues.
+ *
+ * ENOTSUP indicates that the EAL cannot initialize on this system.
+ *
+ * EUNATCH indicates that the PCI bus is either not present, or is not
+ * readable by the eal.
  */
 int rte_eal_init(int argc, char **argv);
 
-- 
2.9.3



Re: [dpdk-dev] [ovs-dev] [PATCH] selinux: Allow creating tap devices.

2017-02-28 Thread Aaron Conole
Daniele Di Proietto  writes:

> On 26/01/2017 12:35, "Ansis Atteka"  wrote:
>>
>>
>>On 26 January 2017 at 21:24, Aaron Conole 
>> wrote:
>>
>>Daniele Di Proietto  writes:
>>
>>> On 25/01/2017 00:01, "Ansis Atteka"  wrote:
>>>
On Jan 25, 2017 4:22 AM, "Daniele Di Proietto"  
wrote:

Current SELinux policy in RHEL and Fedora doesn't allow the creation of
TAP devices.

A tap device is used by dpif-netdev to create internal devices.

Without this patch, adding any bridge backed by the userspace datapath
would fail.

This doesn't mean that we can run Open vSwitch with DPDK under SELinux
yet, but at least we can use the userspace datapath.

Signed-off-by: Daniele Di Proietto 
>>
>>I just noticed this, sorry for jumping in late.
>>
Acked-by: Ansis Atteka 


I saw that other open source projects like OpenVPN use rw_file_perms
 shortcut macro. Not sure how relevant that is for OVS but that macro
 expands to a little more function calls than what you have
 below. Maybe we don't need it, if what you have
 just worked.
>>>
>>> Thanks a lot for the review.
>>>
>>> I cooked this up using audit2allow and I tested it on fedora 25.  I'm
>>> now able to create and delete userspace bridges, without any further
>>> complaints from selinux
>>
>>I have the following openvswitch-custom.te that did work to run
>>ovs+dpdk under selinux and pass traffic:
>>
>>
>>Thanks for posting this. I think that this is really helpful to
>> gather all necessary OVS+DPDK rules from different sources to make
>> sure that nothing is missed.
> +1, thanks a lot
>> 
>>
>>
>> 8< 
>>
>>require {
>>type openvswitch_t;
>>type openvswitch_tmp_t;
>>type openvswitch_var_run_t;
>>type ifconfig_exec_t;
>>type hostname_exec_t;
>>type vfio_device_t;
>>type kernel_t;
>>type tun_tap_device_t;
>>type hugetlbfs_t;
>>type init_t;
>>class netlink_socket { setopt getopt create connect getattr write 
>> read };
>>class file { write getattr read open execute execute_no_trans create 
>> unlink };
>>class chr_file { write getattr read open ioctl };
>>class unix_stream_socket { write getattr read connectto connect 
>> setopt getopt sendto accept bind recvfrom acceptfrom };
>>class dir { write remove_name add_name lock read };
>>}
>>
>>#= openvswitch_t ==
>>allow openvswitch_t self:netlink_socket { setopt getopt create connect 
>>getattr write read };
>>allow openvswitch_t hostname_exec_t:file { read getattr open execute 
>>execute_no_trans };
>>allow openvswitch_t ifconfig_exec_t:file { read getattr open execute 
>>execute_no_trans };
>>allow openvswitch_t openvswitch_tmp_t:file { execute execute_no_trans };
>>allow openvswitch_t openvswitch_tmp_t:unix_stream_socket { write getattr read 
>>connectto connect setopt getopt sendto accept bind recvfrom acceptfrom };
>>allow openvswitch_t vfio_device_t:chr_file { read write open ioctl getattr };
>>allow openvswitch_t tun_tap_device_t:chr_file { read write getattr open ioctl 
>>};
>>allow openvswitch_t hugetlbfs_t:dir { write remove_name add_name lock read };
>>allow openvswitch_t hugetlbfs_t:file { create unlink };
>>allow openvswitch_t kernel_t:unix_stream_socket { write getattr read 
>>connectto connect setopt getopt sendto accept bind recvfrom acceptfrom };
>>allow openvswitch_t init_t:file { read open };
>>
>> >8 
>>
>>You'll note that this change gives the openvswitch complete access to
>>hugetlbfs label, which might be the biggest scary part.
>>
>>
>>There is also option to use SELinux switches that allow to activate only 
>>subset of SElinux rules on a "per OVS feature basis" if there is risk that 
>>because of DPDK whitelise we could be unconditionally loosening up SElinux 
>>policy too much for non-DPDK
>> cases. See [https://wiki.centos.org/TipsAndTricks/SelinuxBooleans] for more 
>> details.
> Ok, so perhaps we should require tun_tap_device_t permissions only if
> we enable userspace support with a boolean.
> I just posted this piece because the corresponding code is in openvswitch 
> source tree.
> The rest of the permissions (hugepages, vfio) are required because of
> code that's in the dpdk library.  Is there a way to put these in DPDK
> and then just call a macro here, like
> dpdk_perms(openvswitch_t)

Below is an example of the macro:

 8< 

define(`dpdk_perms', `
gen_require(`
type vfio_device_t;
type kernel_t;
type hugetlbfs_t;
class file { write getattr read open execute execute_no_trans
create unlink };
class chr_file { write getattr read open ioctl };
class unix_stream_socket { write getattr read connectto connect
  

Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization

2017-02-28 Thread Ananyev, Konstantin


> -Original Message-
> From: Olivier Matz [mailto:olivier.m...@6wind.com]
> Sent: Tuesday, February 28, 2017 12:28 PM
> To: Ananyev, Konstantin 
> Cc: Jan Blunck ; Richardson, Bruce 
> ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
> 
> On Tue, 28 Feb 2017 11:48:20 +, "Ananyev, Konstantin"
>  wrote:
> > >
> > > On Tue, 28 Feb 2017 10:29:41 +, "Ananyev, Konstantin"
> > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Tue, 28 Feb 2017 09:05:07 +, "Ananyev, Konstantin"
> > > > >  wrote:
> > > > > > Hi everyone,
> > > > > >
> > > > > > > >
> > > > > > > > In my opinion, if we have the room in the first cache
> > > > > > > > line, we should put it there. The only argument I see
> > > > > > > > against is "we may find something more important in the
> > > > > > > > future, and we won't have room for it in the first cache
> > > > > > > > line". I don't feel we should penalize today's use cases
> > > > > > > > for hypothetic future use cases.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >> 2. timestamp normalization point
> > > > > > > >>  inside PMD RX vs somewhere later as user needs it
> > > > > > > >> (extra function in dev_ops?).
> > > > > > > >
> > > > > > > > This point could be changed. My initial proposition tries
> > > > > > > > to provide a generic API for timestamp. Let me remind it
> > > > > > > > here:
> > > > > > > >
> > > > > > > > a- the timestamp is in nanosecond
> > > > > > > > b- the reference is always the same for a given path: if
> > > > > > > > the timestamp is set in a PMD, all the packets for this
> > > > > > > > PMD will have the same reference, but for 2 different
> > > > > > > > PMDs (or a sw lib), the reference would not be the same.
> > > > > > > >
> > > > > > > > We may remove a-, and just have:
> > > > > > > >  - the reference and the unit are always the same for a
> > > > > > > > given path: if the timestamp is set in a PMD, all the
> > > > > > > > packets for this PMD will have the same reference and
> > > > > > > > unit, but for 2 different PMDs (or a sw lib), they would
> > > > > > > > not be the same.
> > > > > > > >
> > > > > > > > In both cases, we would need a conversion code (maybe in a
> > > > > > > > library) if the application wants to work with timestamps
> > > > > > > > from several sources. The second solution removes the
> > > > > > > > normalization code in the PMD when not needed, it is
> > > > > > > > probably better.
> > > > > > >
> > > > > > > I agree.
> > > > > >
> > > > > > One question - does that mean that application would need to
> > > > > > keep a track from what PMD each particular packet came to do
> > > > > > the normalization? Konstantin
> > > > >
> > > > > I'd say yes. It does not look very difficult to do, since the
> > > > > mbuf contains the input port id.
> > > > >
> > > >
> > > > I understand that we can use mbuf->port here, but it means that
> > > > we'll introduce new implicit dependency between timestamp and
> > > > port values. From my point that introduces new implications:
> > > > 1. all PMDs that do set a timestamp would also have to set port
> > > > value too. Probably not a big deal as most of PMDs do set port
> > > > value anyway right now, but it means it would be hard to get
> > > > rid/change mbuf->port in future.
> > >
> > > Currently, all PMDs must set m->port.
> > > If in the future we remove m->port, the applications that use it
> > > will need to store the value in a mbuf metadata, or pass it as
> > > arguments through function calls.
> > >
> > >
> > > > 2. Applications would not allowed to change mbuf->port value
> > > > before normalization is done (from what I heard some apps do
> > > > update mbuf->port to store routing decisions). BTW, how the app
> > > > would keep track which mbufs were already normalized, and which
> > > > were not?
> > >
> > > I don't think it should be allowed to change m->port value.
> >
> > As far as I know it is allowed right now.
> > PMD RX routine sets mbuf->port, after that application is free to use
> > it in a way it likes.
> 
> The descriptor or m->port is "Input port". If the applications stores
> something else than the input port, it is its responsibility if it
> breaks something else. Like changing any other field to put something
> that does not match the description.
> 
> 
> > What we are introducing here is basically a new dependency between 2
> > mbuf fields and new restriction.
> 
> On the other hand, there is no strong dependency: the API to do the
> normalization can take the port as a parameter.

Ok, that would be much better - the dependency is still there,
but at least we don't force it.

> 
> 
> >
> > Another thing that doesn't look very convenient to me here -
> > We can have 2 different values of timestamp (both normalized and not)
> > and there is no clear way for the application to know which one is in
> > use right now. So each app writer would have to come-up with his own
> > solution.

[dpdk-dev] What is the max size of packets rte_eth_tx_burst() can send practically/theoritically?

2017-02-28 Thread Joo Kim
Hello,


For example, as I understand,  ovs-dpdk code  uses a buffer of size 32
when it transmits via rte_eth_tx_burst().

I think it can transmit more packets in a bust.
I know there should be a balance between throughput and latency. But, I am
wondering what max size packets  a dpdk application can give  to
rte_eth_tx_burst()  practically/theoretically?
(I see some max macro in ixgbe_rxtx.h)

Thanks


[dpdk-dev] [PATCH 0/2] net/mlx5: add enhanced multi-packet send for ConnectX-5

2017-02-28 Thread Yongseok Koh
This patch set is to add the Enhanced Multi-Packet Send feature which is
newly introduced for ConnectX-5 families of adaptors.

Yongseok Koh (2):
  net/mlx5: add enhanced multi-packet send for ConnectX-5
  doc: update PMD options for mlx5

 doc/guides/nics/mlx5.rst   |  32 +++-
 drivers/net/mlx5/mlx5.c|  34 +++-
 drivers/net/mlx5/mlx5.h|   4 +-
 drivers/net/mlx5/mlx5_ethdev.c |   6 +-
 drivers/net/mlx5/mlx5_prm.h|  23 +++
 drivers/net/mlx5/mlx5_rxtx.c   | 405 +
 drivers/net/mlx5/mlx5_rxtx.h   |   5 +
 drivers/net/mlx5/mlx5_txq.c|  18 +-
 8 files changed, 514 insertions(+), 13 deletions(-)

-- 
2.11.0



[dpdk-dev] [PATCH 2/2] doc: update PMD options for mlx5

2017-02-28 Thread Yongseok Koh
Enhanced multi-packet send mode is newly introduced for ConnectX-5 families
of adaptors.

Signed-off-by: Yongseok Koh 
---
 doc/guides/nics/mlx5.rst | 32 
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 09922a08f..fda8bc3fe 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -181,14 +181,38 @@ Run-time configuration
 
 - ``txq_mpw_en`` parameter [int]
 
-  A nonzero value enables multi-packet send. This feature allows the TX
-  burst function to pack up to five packets in two descriptors in order to
-  save PCI bandwidth and improve performance at the cost of a slightly
-  higher CPU usage.
+  A nonzero value enables multi-packet send (MPS) for ConnectX-4 Lx and
+  enhanced multi-packet send (Enhanced MPS) for ConnectX-5. MPS allows the
+  TX burst function to pack up multiple packets in a single descriptor
+  session in order to save PCI bandwidth and improve performance at the
+  cost of a slightly higher CPU usage. When ``txq_inline`` is set along
+  with ``txq_mpw_en``, TX burst function tries to copy entire packet data
+  on to TX descriptor instead of including pointer of packet only if there
+  is enough room remained in the descriptor. ``txq_inline`` sets
+  per-descriptor space for either pointers or inlined packets. In addition,
+  Enhanced MPS supports hybrid mode - mixing inlined packets and pointers
+  in the same descriptor.
 
   It is currently only supported on the ConnectX-4 Lx and ConnectX-5
   families of adapters. Enabled by default.
 
+- ``txq_mpw_hdr_dseg_en`` parameter [int]
+
+  A nonzero value enables including two pointers in the unused space in the
+  first block of TX descriptor. This can be used to lessen CPU load for
+  memory copy.
+
+  Effective only when Enhanced MPS is supported. Disabled by default.
+
+- ``txq_max_inline_len`` parameter [int]
+
+  Maximum size of packet to be inlined. This limits the size of packet to
+  be inlined. If the size of a packet is larger than configured value, the
+  packet isn't inlined even though there's enough space remained in the
+  descriptor. Instead, the packet is included with pointer.
+
+  Effective only when Enhanced MPS is supported. The default value is 256.
+
 Prerequisites
 -
 
-- 
2.11.0



[dpdk-dev] [PATCH 1/2] net/mlx5: add enhanced multi-packet send for ConnectX-5

2017-02-28 Thread Yongseok Koh
ConnectX-5 supports enhanced version of multi-packet send (MPS). An MPS Tx
descriptor can carry multiple packets either by including pointers of
packets or by inlining packets. Inlining packet data can be helpful to
better utilize PCIe bandwidth. In addition, Enhanced MPS supports hybrid
mode - mixing inlined packets and pointers in a descriptor. This feature is
enabled by default if supported by HW.

Signed-off-by: Yongseok Koh 
---
 drivers/net/mlx5/mlx5.c|  34 +++-
 drivers/net/mlx5/mlx5.h|   4 +-
 drivers/net/mlx5/mlx5_ethdev.c |   6 +-
 drivers/net/mlx5/mlx5_prm.h|  23 +++
 drivers/net/mlx5/mlx5_rxtx.c   | 405 +
 drivers/net/mlx5/mlx5_rxtx.h   |   5 +
 drivers/net/mlx5/mlx5_txq.c|  18 +-
 7 files changed, 486 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index d4bd4696c..24e3865f0 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -84,6 +84,12 @@
 /* Device parameter to enable multi-packet send WQEs. */
 #define MLX5_TXQ_MPW_EN "txq_mpw_en"
 
+/* Device parameter to configure the number of dsegs before inlined packet. */
+#define MLX5_TXQ_MPW_HDR_DSEG_EN "txq_mpw_hdr_dseg_en"
+
+/* Device parameter to limit the size of inlining packet */
+#define MLX5_TXQ_MAX_INLINE_LEN "txq_max_inline_len"
+
 /**
  * Retrieve integer value from environment variable.
  *
@@ -289,7 +295,11 @@ mlx5_args_check(const char *key, const char *val, void 
*opaque)
} else if (strcmp(MLX5_TXQS_MIN_INLINE, key) == 0) {
priv->txqs_inline = tmp;
} else if (strcmp(MLX5_TXQ_MPW_EN, key) == 0) {
-   priv->mps &= !!tmp; /* Enable MPW only if HW supports */
+   priv->mps = !!tmp ? priv->mps : MLX5_MPW_DISABLED;
+   } else if (strcmp(MLX5_TXQ_MPW_HDR_DSEG_EN, key) == 0) {
+   priv->mpw_hdr_dseg = !!tmp;
+   } else if (strcmp(MLX5_TXQ_MAX_INLINE_LEN, key) == 0) {
+   priv->txq_max_inline_len = tmp;
} else {
WARN("%s: unknown parameter", key);
return -EINVAL;
@@ -316,6 +326,8 @@ mlx5_args(struct priv *priv, struct rte_devargs *devargs)
MLX5_TXQ_INLINE,
MLX5_TXQS_MIN_INLINE,
MLX5_TXQ_MPW_EN,
+   MLX5_TXQ_MPW_HDR_DSEG_EN,
+   MLX5_TXQ_MAX_INLINE_LEN,
NULL,
};
struct rte_kvargs *kvlist;
@@ -424,20 +436,23 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
 */
switch (pci_dev->id.device_id) {
case PCI_DEVICE_ID_MELLANOX_CONNECTX4LX:
+   mps = MLX5_MPW;
+   break;
case PCI_DEVICE_ID_MELLANOX_CONNECTX5:
case PCI_DEVICE_ID_MELLANOX_CONNECTX5VF:
case PCI_DEVICE_ID_MELLANOX_CONNECTX5EX:
case PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF:
-   mps = 1;
+   mps = MLX5_MPW_ENHANCED;
break;
default:
-   mps = 0;
+   mps = MLX5_MPW_DISABLED;
}
INFO("PCI information matches, using device \"%s\""
-" (SR-IOV: %s, MPS: %s)",
+" (SR-IOV: %s, %sMPS: %s)",
 list[i]->name,
 sriov ? "true" : "false",
-mps ? "true" : "false");
+mps == MLX5_MPW_ENHANCED ? "Enhanced " : "",
+mps != MLX5_MPW_DISABLED ? "true" : "false");
attr_ctx = ibv_open_device(list[i]);
err = errno;
break;
@@ -531,6 +546,12 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
priv->pd = pd;
priv->mtu = ETHER_MTU;
priv->mps = mps; /* Enable MPW by default if supported. */
+   /* Set default values for Enhanced MPW, a.k.a MPWv2 */
+   if (mps == MLX5_MPW_ENHANCED) {
+   priv->mpw_hdr_dseg = 0;
+   priv->txqs_inline = MLX5_EMPW_MIN_TXQS;
+   priv->txq_max_inline_len = MLX5_EMPW_MAX_INLINE_LEN;
+   }
priv->cqe_comp = 1; /* Enable compression by default. */
err = mlx5_args(priv, pci_dev->device.devargs);
if (err) {
@@ -586,6 +607,9 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
rte_pci_device *pci_dev)
err = ENOTSUP;
goto port_error;
}
+   INFO("%sMPS is %s",
+priv->mps == MLX5_MPW_ENHANCED ? "Enhanced " : "",
+priv->mps != MLX5_MPW_DISABLED ? "enabled" : "disabled");
/* Allocate and register default RSS hash keys. */
priv->rss_conf = rte_calloc(__func__, hash_rxq_in

[dpdk-dev] [PATCH 0/9] Update ixgbe base driver

2017-02-28 Thread Wenzhuo Lu
Some minor changes and fixes, including,
link block check for KR,
complete HW initialization even if SFP is not present,
add VF xcast promiscuous mode.

Wenzhuo Lu (9):
  net/ixgbe/base: make a debug message simple
  net/ixgbe/base: remove X550em SFP iXFI setup
  net/ixgbe/base: add link block check for KR
  net/ixgbe/base: add bit for enabling L3/L4 filtering
  net/ixgbe/base: complete hw init when SFP not present
  net/ixgbe/base: disable FC for 15B0
  net/ixgbe: support xcast promisc mode
  net/ixgbe/base: fix build error
  net/ixgbe/base: update readme

 doc/guides/rel_notes/release_17_05.rst |  8 +
 drivers/net/ixgbe/base/README  |  4 +--
 drivers/net/ixgbe/base/ixgbe_common.c  |  7 ++--
 drivers/net/ixgbe/base/ixgbe_mbx.h |  8 +
 drivers/net/ixgbe/base/ixgbe_phy.c | 23 ++---
 drivers/net/ixgbe/base/ixgbe_type.h|  1 +
 drivers/net/ixgbe/base/ixgbe_vf.c  |  4 +++
 drivers/net/ixgbe/base/ixgbe_x550.c| 63 +-
 drivers/net/ixgbe/ixgbe_ethdev.c   |  6 
 9 files changed, 47 insertions(+), 77 deletions(-)

-- 
1.9.3



[dpdk-dev] [PATCH 5/9] net/ixgbe/base: complete hw init when SFP not present

2017-02-28 Thread Wenzhuo Lu
If SFP module is not present, reset_hw doesn't return success.
SW should complete the initialization, or with specific module
it resulted in no link when the module was later inserted.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_common.c 
b/drivers/net/ixgbe/base/ixgbe_common.c
index 9645667..df6fbf9 100644
--- a/drivers/net/ixgbe/base/ixgbe_common.c
+++ b/drivers/net/ixgbe/base/ixgbe_common.c
@@ -495,7 +495,7 @@ s32 ixgbe_init_hw_generic(struct ixgbe_hw *hw)
/* Reset the hardware */
status = hw->mac.ops.reset_hw(hw);
 
-   if (status == IXGBE_SUCCESS) {
+   if (status == IXGBE_SUCCESS || status == IXGBE_ERR_SFP_NOT_PRESENT) {
/* Start the HW */
status = hw->mac.ops.start_hw(hw);
}
-- 
1.9.3



[dpdk-dev] [PATCH 1/9] net/ixgbe/base: make a debug message simple

2017-02-28 Thread Wenzhuo Lu
The debug message is too long. Make it shorter.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_x550.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c 
b/drivers/net/ixgbe/base/ixgbe_x550.c
index 6f9c034..cf72bba 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -2519,7 +2519,7 @@ s32 ixgbe_reset_hw_X550em(struct ixgbe_hw *hw)
  status);
 
if (status == IXGBE_ERR_SFP_NOT_SUPPORTED) {
-   DEBUGOUT("Returning from reset HW since PHY ops init returned 
IXGBE_ERR_SFP_NOT_SUPPORTED\n");
+   DEBUGOUT("Returning from reset HW due to PHY init failure\n");
return status;
}
 
-- 
1.9.3



[dpdk-dev] [PATCH 3/9] net/ixgbe/base: add link block check for KR

2017-02-28 Thread Wenzhuo Lu
When setting up link on x550 KR devices, should check
if there're constraints on link from manageability, which
may result in link loss.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_x550.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c 
b/drivers/net/ixgbe/base/ixgbe_x550.c
index a28ced1..1058684 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -2663,6 +2663,9 @@ s32 ixgbe_setup_kr_x550em(struct ixgbe_hw *hw)
if (hw->phy.autoneg_advertised & IXGBE_LINK_SPEED_2_5GB_FULL)
return IXGBE_SUCCESS;
 
+   if (ixgbe_check_reset_blocked(hw))
+   return 0;
+
return ixgbe_setup_kr_speed_x550em(hw, hw->phy.autoneg_advertised);
 }
 
-- 
1.9.3



[dpdk-dev] [PATCH 2/9] net/ixgbe/base: remove X550em SFP iXFI setup

2017-02-28 Thread Wenzhuo Lu
Removes X550em SFP iXFI setup since there is no released
HW production with SFP iXFI.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_x550.c | 57 +++--
 1 file changed, 11 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c 
b/drivers/net/ixgbe/base/ixgbe_x550.c
index cf72bba..a28ced1 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -2694,53 +2694,18 @@ s32 ixgbe_setup_mac_link_sfp_x550em(struct ixgbe_hw *hw,
if (ret_val != IXGBE_SUCCESS)
return ret_val;
 
-   if (!(hw->phy.nw_mng_if_sel & IXGBE_NW_MNG_IF_SEL_INT_PHY_MODE)) {
-   /* Configure CS4227 LINE side to 10G SR. */
-   reg_slice = IXGBE_CS4227_LINE_SPARE22_MSB +
-   (hw->bus.lan_id << 12);
-   reg_val = IXGBE_CS4227_SPEED_10G;
-   ret_val = hw->link.ops.write_link(hw, hw->link.addr, reg_slice,
- reg_val);
-
-   reg_slice = IXGBE_CS4227_LINE_SPARE24_LSB +
-   (hw->bus.lan_id << 12);
+   /* Configure internal PHY for KR/KX. */
+   ixgbe_setup_kr_speed_x550em(hw, speed);
+
+   /* Configure CS4227 LINE side to proper mode. */
+   reg_slice = IXGBE_CS4227_LINE_SPARE24_LSB +
+   (hw->bus.lan_id << 12);
+   if (setup_linear)
+   reg_val = (IXGBE_CS4227_EDC_MODE_CX1 << 1) | 0x1;
+   else
reg_val = (IXGBE_CS4227_EDC_MODE_SR << 1) | 0x1;
-   ret_val = hw->link.ops.write_link(hw, hw->link.addr, reg_slice,
- reg_val);
-
-   /* Configure CS4227 for HOST connection rate then type. */
-   reg_slice = IXGBE_CS4227_HOST_SPARE22_MSB +
-   (hw->bus.lan_id << 12);
-   reg_val = (speed & IXGBE_LINK_SPEED_10GB_FULL) ?
-   IXGBE_CS4227_SPEED_10G : IXGBE_CS4227_SPEED_1G;
-   ret_val = hw->link.ops.write_link(hw, hw->link.addr, reg_slice,
- reg_val);
-
-   reg_slice = IXGBE_CS4227_HOST_SPARE24_LSB +
-   (hw->bus.lan_id << 12);
-   if (setup_linear)
-   reg_val = (IXGBE_CS4227_EDC_MODE_CX1 << 1) | 0x1;
-   else
-   reg_val = (IXGBE_CS4227_EDC_MODE_SR << 1) | 0x1;
-   ret_val = hw->link.ops.write_link(hw, hw->link.addr, reg_slice,
- reg_val);
-
-   /* Setup XFI internal link. */
-   ret_val = ixgbe_setup_ixfi_x550em(hw, &speed);
-   } else {
-   /* Configure internal PHY for KR/KX. */
-   ixgbe_setup_kr_speed_x550em(hw, speed);
-
-   /* Configure CS4227 LINE side to proper mode. */
-   reg_slice = IXGBE_CS4227_LINE_SPARE24_LSB +
-   (hw->bus.lan_id << 12);
-   if (setup_linear)
-   reg_val = (IXGBE_CS4227_EDC_MODE_CX1 << 1) | 0x1;
-   else
-   reg_val = (IXGBE_CS4227_EDC_MODE_SR << 1) | 0x1;
-   ret_val = hw->link.ops.write_link(hw, hw->link.addr, reg_slice,
- reg_val);
-   }
+   ret_val = hw->link.ops.write_link(hw, hw->link.addr, reg_slice,
+ reg_val);
return ret_val;
 }
 
-- 
1.9.3



[dpdk-dev] [PATCH 4/9] net/ixgbe/base: add bit for enabling L3/L4 filtering

2017-02-28 Thread Wenzhuo Lu
Add a L3/L4 filtering definition of register MRQC for
the future use.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_type.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ixgbe/base/ixgbe_type.h 
b/drivers/net/ixgbe/base/ixgbe_type.h
index bb1f85b..6acd966 100644
--- a/drivers/net/ixgbe/base/ixgbe_type.h
+++ b/drivers/net/ixgbe/base/ixgbe_type.h
@@ -2622,6 +2622,7 @@ enum {
 #define IXGBE_MRQC_VMDQRSS64EN 0x000B /* VMDq2 64 pools w/ RSS */
 #define IXGBE_MRQC_VMDQRT8TCEN 0x000C /* VMDq2/RT 16 pool 8 TC */
 #define IXGBE_MRQC_VMDQRT4TCEN 0x000D /* VMDq2/RT 32 pool 4 TC */
+#define IXGBE_MRQC_L3L4TXSWEN  0x8000 /* Enable L3/L4 Tx switch */
 #define IXGBE_MRQC_RSS_FIELD_MASK  0x
 #define IXGBE_MRQC_RSS_FIELD_IPV4_TCP  0x0001
 #define IXGBE_MRQC_RSS_FIELD_IPV4  0x0002
-- 
1.9.3



[dpdk-dev] [PATCH 7/9] net/ixgbe: support xcast promisc mode

2017-02-28 Thread Wenzhuo Lu
Add the support of xcast promiscuous mode. It's
added in mailbox v1.3.
Move the definition of xcast mode to base code.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_mbx.h | 8 
 drivers/net/ixgbe/base/ixgbe_vf.c  | 4 
 drivers/net/ixgbe/ixgbe_ethdev.c   | 6 --
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_mbx.h 
b/drivers/net/ixgbe/base/ixgbe_mbx.h
index 7556a81..bde50a5 100644
--- a/drivers/net/ixgbe/base/ixgbe_mbx.h
+++ b/drivers/net/ixgbe/base/ixgbe_mbx.h
@@ -114,6 +114,14 @@ enum ixgbe_pfvf_api_rev {
 #define IXGBE_VF_GET_RSS_KEY   0x0b/* get RSS key */
 #define IXGBE_VF_UPDATE_XCAST_MODE 0x0c
 
+/* mode choices for IXGBE_VF_UPDATE_XCAST_MODE */
+enum ixgbevf_xcast_modes {
+   IXGBEVF_XCAST_MODE_NONE = 0,
+   IXGBEVF_XCAST_MODE_MULTI,
+   IXGBEVF_XCAST_MODE_ALLMULTI,
+   IXGBEVF_XCAST_MODE_PROMISC,
+};
+
 /* GET_QUEUES return data indices within the mailbox */
 #define IXGBE_VF_TX_QUEUES 1   /* number of Tx queues supported */
 #define IXGBE_VF_RX_QUEUES 2   /* number of Rx queues supported */
diff --git a/drivers/net/ixgbe/base/ixgbe_vf.c 
b/drivers/net/ixgbe/base/ixgbe_vf.c
index 8775ee5..b513190 100644
--- a/drivers/net/ixgbe/base/ixgbe_vf.c
+++ b/drivers/net/ixgbe/base/ixgbe_vf.c
@@ -432,6 +432,10 @@ s32 ixgbevf_update_xcast_mode(struct ixgbe_hw *hw, int 
xcast_mode)
 
switch (hw->api_version) {
case ixgbe_mbox_api_12:
+   /* New modes were introduced in 1.3 version */
+   if (xcast_mode > IXGBEVF_XCAST_MODE_ALLMULTI)
+   return IXGBE_ERR_FEATURE_NOT_SUPPORTED;
+   /* Fall through */
case ixgbe_mbox_api_13:
break;
default:
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 7169007..2e497a8 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -155,12 +155,6 @@
 #define IXGBE_QDE_STRIP_TAG0x0004
 #define IXGBE_VTEICR_MASK  0x07
 
-enum ixgbevf_xcast_modes {
-   IXGBEVF_XCAST_MODE_NONE = 0,
-   IXGBEVF_XCAST_MODE_MULTI,
-   IXGBEVF_XCAST_MODE_ALLMULTI,
-};
-
 #define IXGBE_EXVET_VET_EXT_SHIFT  16
 #define IXGBE_DMATXCTL_VT_MASK 0x
 
-- 
1.9.3



[dpdk-dev] [PATCH 6/9] net/ixgbe/base: disable FC for 15B0

2017-02-28 Thread Wenzhuo Lu
Make sure that ixgbe_device_supports_autoneg_fc()
returns false and hw->fc.disable_fc_autoneg is set
to true to avoid running the fc_autoneg function
for the device 15B0, as this device doesn't support
this function.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_common.c | 5 -
 drivers/net/ixgbe/base/ixgbe_x550.c   | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_common.c 
b/drivers/net/ixgbe/base/ixgbe_common.c
index df6fbf9..4dabb43 100644
--- a/drivers/net/ixgbe/base/ixgbe_common.c
+++ b/drivers/net/ixgbe/base/ixgbe_common.c
@@ -189,7 +189,10 @@ bool ixgbe_device_supports_autoneg_fc(struct ixgbe_hw *hw)
 
break;
case ixgbe_media_type_backplane:
-   supported = true;
+   if (hw->device_id == IXGBE_DEV_ID_X550EM_X_XFI)
+   supported = false;
+   else
+   supported = true;
break;
case ixgbe_media_type_copper:
/* only some copper devices support flow control autoneg */
diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c 
b/drivers/net/ixgbe/base/ixgbe_x550.c
index 1058684..3049865 100644
--- a/drivers/net/ixgbe/base/ixgbe_x550.c
+++ b/drivers/net/ixgbe/base/ixgbe_x550.c
@@ -4005,6 +4005,9 @@ s32 ixgbe_setup_fc_X550em(struct ixgbe_hw *hw)
/* This device does not fully support AN. */
hw->fc.disable_fc_autoneg = true;
break;
+   case IXGBE_DEV_ID_X550EM_X_XFI:
+   hw->fc.disable_fc_autoneg = true;
+   break;
default:
break;
}
-- 
1.9.3



[dpdk-dev] [PATCH 8/9] net/ixgbe/base: fix build error

2017-02-28 Thread Wenzhuo Lu
Fix ICC build error by removing the EWARN third parameter.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/base/ixgbe_phy.c | 23 ++-
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_phy.c 
b/drivers/net/ixgbe/base/ixgbe_phy.c
index c953805..d2d06cf 100644
--- a/drivers/net/ixgbe/base/ixgbe_phy.c
+++ b/drivers/net/ixgbe/base/ixgbe_phy.c
@@ -1540,16 +1540,7 @@ s32 ixgbe_identify_sfp_module_generic(struct ixgbe_hw 
*hw)
status = IXGBE_SUCCESS;
} else {
if (hw->allow_unsupported_sfp == true) {
-   EWARN(hw, "WARNING: Intel (R) Network "
- "Connections are quality tested "
- "using Intel (R) Ethernet Optics."
- " Using untested modules is not "
- "supported and may cause unstable"
- " operation or damage to the "
- "module or the adapter. Intel "
- "Corporation is not responsible "
- "for any harm caused by using "
- "untested modules.\n", status);
+   EWARN(hw, "WARNING: Intel (R) Network 
Connections are quality tested using Intel (R) Ethernet Optics. Using untested 
modules is not supported and may cause unstable operation or damage to the 
module or the adapter. Intel Corporation is not responsible for any harm caused 
by using untested modules.\n");
status = IXGBE_SUCCESS;
} else {
DEBUGOUT("SFP+ module not supported\n");
@@ -1802,16 +1793,7 @@ s32 ixgbe_identify_qsfp_module_generic(struct ixgbe_hw 
*hw)
status = IXGBE_SUCCESS;
} else {
if (hw->allow_unsupported_sfp == true) {
-   EWARN(hw, "WARNING: Intel (R) Network "
- "Connections are quality tested "
- "using Intel (R) Ethernet Optics."
- " Using untested modules is not "
- "supported and may cause unstable"
- " operation or damage to the "
- "module or the adapter. Intel "
- "Corporation is not responsible "
- "for any harm caused by using "
- "untested modules.\n", status);
+   EWARN(hw, "WARNING: Intel (R) Network 
Connections are quality tested using Intel (R) Ethernet Optics. Using untested 
modules is not supported and may cause unstable operation or damage to the 
module or the adapter. Intel Corporation is not responsible for any harm caused 
by using untested modules.\n");
status = IXGBE_SUCCESS;
} else {
DEBUGOUT("QSFP module not supported\n");
@@ -1836,7 +1818,6 @@ s32 ixgbe_identify_qsfp_module_generic(struct ixgbe_hw 
*hw)
return IXGBE_ERR_SFP_NOT_PRESENT;
 }
 
-
 /**
  *  ixgbe_get_sfp_init_sequence_offsets - Provides offset of PHY init sequence
  *  @hw: pointer to hardware structure
-- 
1.9.3



[dpdk-dev] [PATCH 9/9] net/ixgbe/base: update readme

2017-02-28 Thread Wenzhuo Lu
Signed-off-by: Wenzhuo Lu 
---
 doc/guides/rel_notes/release_17_05.rst | 8 
 drivers/net/ixgbe/base/README  | 4 ++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_05.rst 
b/doc/guides/rel_notes/release_17_05.rst
index e25ea9f..ab87e81 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -41,6 +41,14 @@ New Features
  Also, make sure to start the actual text at the margin.
  =
 
+* **Updated the ixgbe base driver.**
+
+  Updated the ixgbe base driver, including the following changes:
+
+  * Add link block check for KR.
+  * Complete HW initialization even if SFP is not present.
+  * Add VF xcast promiscuous mode.
+
 
 Resolved Issues
 ---
diff --git a/drivers/net/ixgbe/base/README b/drivers/net/ixgbe/base/README
index 0a6054f..49bb27c 100644
--- a/drivers/net/ixgbe/base/README
+++ b/drivers/net/ixgbe/base/README
@@ -1,7 +1,7 @@
 ..
  BSD LICENSE

- Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
@@ -34,7 +34,7 @@ Intel® IXGBE driver
 ===
 
 This directory contains source code of FreeBSD ixgbe driver of version
-cid-10g-shared-code.2017.01.05 released by the team which develop
+cid-10g-shared-code.2017.02.27 released by the team which develop
 basic drivers for any ixgbe NIC. The sub-directory of base/
 contains the original source package.
 This driver is valid for the product(s) listed below
-- 
1.9.3



Re: [dpdk-dev] net/bonding - need for 8023ad slave to be in promisc mode

2017-02-28 Thread sujith sankar
Hi Declan,

Got your name from the maintainers' list.
Could you please help me with this?

Thanks,
-Sujith

On Mon, Feb 27, 2017 at 1:40 PM, sujith sankar  wrote:
> Hi folks,
>
> Could someone clarify the need to put mode 4 bonding slaves in promiscuous
> mode?
>
> 840 void
> 841 bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev, uint8_t
> slave_id)
> 842 {
>  .
>  .
>  .
> 878 /* use this port as agregator */
> 879 port->aggregator_port_id = slave_id;
> 880 rte_eth_promiscuous_enable(slave_id);
>
>
> The Linux kernel mode bonding driver gets it working by assigning the
> bond/primary slave MAC address to the rest of the slaves.  Is there a reason
> for not using this method?
>
> I tried this using my app on DPDK 16.04 on 82599, but could not get rx
> working.  It would be great if someone could help me with this.
>
> Thanks,
> -Sujith


Re: [dpdk-dev] What is the max size of packets rte_eth_tx_burst() can send practically/theoritically?

2017-02-28 Thread Stephen Hemminger
On Tue, 28 Feb 2017 20:28:21 -0800
Joo Kim  wrote:

> Hello,
> 
> 
> For example, as I understand,  ovs-dpdk code  uses a buffer of size 32
> when it transmits via rte_eth_tx_burst().
> 
> I think it can transmit more packets in a bust.
> I know there should be a balance between throughput and latency. But, I am
> wondering what max size packets  a dpdk application can give  to
> rte_eth_tx_burst()  practically/theoretically?
> (I see some max macro in ixgbe_rxtx.h)
> 
> Thanks

You could theoretically transmit a burst size up to the configured number of TX 
descriptors.
The downside is you will add latency and have to handle the TX ring getting 
full more often.
Bigger burst sizes really don't win that much. 50% of the gain happens by just 
sending 2 at a time.


Re: [dpdk-dev] [RFC PATCH] net/virtio: Align Virtio-net header on cache line in receive path

2017-02-28 Thread Maxime Coquelin



On 02/23/2017 06:49 AM, Yuanhan Liu wrote:

On Wed, Feb 22, 2017 at 10:36:36AM +0100, Maxime Coquelin wrote:



On 02/22/2017 02:37 AM, Yuanhan Liu wrote:

On Tue, Feb 21, 2017 at 06:32:43PM +0100, Maxime Coquelin wrote:

This patch aligns the Virtio-net header on a cache-line boundary to
optimize cache utilization, as it puts the Virtio-net header (which
is always accessed) on the same cache line as the packet header.

For example with an application that forwards packets at L2 level,
a single cache-line will be accessed with this patch, instead of
two before.


I'm assuming you were testing pkt size <= (64 - hdr_size)?


No, I tested with 64 bytes packets only.


Oh, my bad, I overlooked it. While you were saying "a single cache
line", I was thinking putting the virtio net hdr and the "whole"
packet data in single cache line, which is not possible for pkt
size 64B.


I run some more tests this morning with different packet sizes,
and also with changing the mbuf size on guest side to have multi-
buffers packets:

+---+++-+
| Txpkt | Rxmbuf | v17.02 | v17.02 + vnet hdr align |
+---+++-+
|64 |   2048 |  11.05 |   11.78 |
|   128 |   2048 |  10.66 |   11.48 |
|   256 |   2048 |  10.47 |   11.21 |
|   512 |   2048 |  10.22 |   10.88 |
|  1024 |   2048 |   7.65 |7.84 |
|  1500 |   2048 |   6.25 |6.45 |
|  2000 |   2048 |   5.31 |5.43 |
|  2048 |   2048 |   5.32 |4.25 |
|  1500 |512 |   3.89 |3.98 |
|  2048 |512 |   1.96 |2.02 |
+---+++-+


Could you share more info, say is it a PVP test? Is mergeable on?
What's the fwd mode?


No, this is not PVP benchmark, I have neither another server nor a 
packet generator connected to my Haswell machine back-to-back.


This is simple micro-benchmark, vhost PMD in txonly, Virtio PMD in
rxonly. In this configuration, mergeable is ON and no offload disabled
in QEMU cmdline.

That's why I would be interested in more testing on recent hardware
with PVP benchmark. Is it something that could be run in Intel lab?

I did some more trials, and I think that most of the gain seen in this
microbenchmark  could happen in fact on vhost side.
Indeed, I monitored the number of packets dequeued at each .rx_pkt_burst()
call, and I can see there are packets in the vq only once every 20
calls. On Vhost side, monitoring shows that it always succeeds to write
its burts, i.e. the vq is never full.


In case of multi-buffers packets, next segments will be aligned on
a cache-line boundary, instead of cache-line boundary minus size of
vnet header before.


The another thing is, this patch always makes the pkt data cache
unaligned for the first packet, which makes Zhihong's optimization
on memcpy (for big packet) useless.

   commit f5472703c0bdfc29c46fc4b2ca445bce3dc08c9f
   Author: Zhihong Wang 
   Date:   Tue Dec 6 20:31:06 2016 -0500


I did run some loopback test with large packet also, an I see a small gain
with my patch (fwd io on both ends):

+---+++-+
| Txpkt | Rxmbuf | v17.02 | v17.02 + vnet hdr align |
+---+++-+
|  1500 |   2048 |   4.05 |4.14 |
+---+++-+


Wierd, that basically means Zhihong's patch doesn't work? Could you add
one more colum here: what's the data when roll back to the point without
Zhihong's commit?


I add this to my ToDo list, don't expect results before next week.



   Signed-off-by: Zhihong Wang 
   Reviewed-by: Yuanhan Liu 
   Tested-by: Lei Yao 


Does this need to be cache-line aligned?


Nope, the alignment size is different with different platforms. AVX512
needs a 64B alignment, while AVX2 needs 32B alignment.


I also tried to align pkt on 16bytes boundary, basically putting header
at HEADROOM + 4 bytes offset, but I didn't measured any gain on
Haswell,


The fast rte_memcpy path (when dst & src is well aligned) on Haswell
(with AVX2) requires 32B alignment. Even the 16B boundary would make
it into the slow path. From this point of view, the extra pad does
not change anything. Thus, no gain is expected.


and even a drop on SandyBridge.


That's weird, SandyBridge requries the 16B alignment, meaning the extra
pad should put it into fast path of rte_memcpy, whereas the performance
is worse.


Thanks for the info, I will run more tests to explain this.

Cheers,
Maxime


--yliu


I understand your point regarding aligned memcpy, but I'm surprised I
don't see its expected superiority with my benchmarks.
Any thoughts?


Re: [dpdk-dev] [PATCH 0/7] virtio/vhost: Add MTU feature support

2017-02-28 Thread Maxime Coquelin



On 02/23/2017 08:10 AM, Yuanhan Liu wrote:

On Mon, Feb 13, 2017 at 03:28:13PM +0100, Maxime Coquelin wrote:

This series adds support to new Virtio's MTU feature[1].


Seems you missed a link here?

Oops, I meant this link:
https://www.spinics.net/lists/linux-virtualization/msg28908.html

Will fix in v2.




The MTU
value is set via QEMU parameters.

If the feature is negotiated (i.e supported by both host andcguest,
and valid MTU value is set in QEMU via its host_mtu parameter), QEMU
shares the configured MTU value throught dedicated Vhost protocol
feature.

On vhost side, the value is stored in the virtio_net structure, and
made available to the application thanks to new vhost lib's
rte_vhost_mtu_get() function.

rte_vhost_mtu_set() functions is implemented, but only succeed if the
application sets the same value as the one set in QEMU. Idea is that
it would be used for the application to ensure configured MTU value
is consistent, but maybe the mtu_get() API is enough, and mtu_set()
could just be dropped.


If the vhost MTU is designed to be read-only, then we may should drop
the set_mtu function.


Ok, I will remove it in next revision.


Vhost PMD mtu_set callback is also implemented
in the same spirit.

To be able to set eth_dev's MTU value at the right time, i.e. to call
rte_vhost_mtu_get() just after Virtio features have been negotiated
and before the device is really started, a new vhost flag has been
introduced (VIRTIO_DEV_READY), because the VIRTIO_DEV_RUNNING flag is
set too late (after .new_device() ops is called).


Okay, and I think this kind of info should be in corresponding commit log.

Sure.



Regarding valid MTU values, the maximum MTU value accepted on vhost
side is 65535 bytes, as defined in Virtio Spec and supported in
Virtio-net Kernel driver. But in Virtio PMD, current maximum frame
size is 9728 bytes (~9700 bytes MTU). So maximum MTU size accepted in
Virtio PMD is the minimum between ~9700 bytes and host's MTU.

Initially, we thought about disabling the rx-mergeable feature when
MTU value was low enough to ensure all received packets would fit in
receive buffers (when offloads are disabled). Doing this, we would
save one cache-miss in the receive path. Problem is that we don't
know the buffers size at Virtio feature neogotiation time.
It might be possible for the application to call the configure
callback again once the Rx queue is set up, but it seems a bit hacky.


Worse, if multiple queue is involved, one queue could have it's own
mempool, meaning the buffer size could be different, whereas the
MTU feature is global.

Indeed, we should ensure that for every queues, the corresponding buf
size can handle a full packet to be able to disable the mergeable
feature.

Let's skip this for now and focus on improving the rx-mergeable path.

Thanks,
Maxime


--yliu


So I decided to skip this optimization for now, even if feedback and
are of course appreciated.

Finally, this series also adds MTU value printing  in testpmd's
"show port info" command when non-zero.

This series target v17.05 release.

Cheers,
Maxime

Maxime Coquelin (7):
  vhost: Enable VIRTIO_NET_F_MTU feature
  vhost: vhost-user: Add MTU protocol feature support
  vhost: Add new ready status flag
  vhost: Add API to get/set MTU value
  net/vhost: Implement mtu_set callback
  net/virtio: Add MTU feature support
  app/testpmd: print MTU value in show port info

 app/test-pmd/config.c   |  5 +
 doc/guides/nics/features/vhost.ini  |  1 +
 doc/guides/nics/features/virtio.ini |  1 +
 drivers/net/vhost/rte_eth_vhost.c   | 18 +++
 drivers/net/virtio/virtio_ethdev.c  | 22 +--
 drivers/net/virtio/virtio_ethdev.h  |  3 ++-
 drivers/net/virtio/virtio_pci.h |  3 +++
 lib/librte_vhost/rte_virtio_net.h   | 31 ++
 lib/librte_vhost/vhost.c| 42 ++-
 lib/librte_vhost/vhost.h|  9 +++-
 lib/librte_vhost/vhost_user.c   | 44 +++--
 lib/librte_vhost/vhost_user.h   |  5 -
 12 files changed, 171 insertions(+), 13 deletions(-)

--
2.9.3


[dpdk-dev] [PATCH 0/2] Add ZUC EEA3 capability to Intel(R) QuickAssist Technology driver

2017-02-28 Thread Arek Kusztal
This patchset add ZUC EEA3 stream cipher capability to Intel(R) QuickAssist 
Technology
driver and corresponding test cases to QAT test suite

Arek Kusztal (2):
  crypto/qat: add ZUC EEA3 cipher capability
  app/test: add ZUC EEA3 test cases to QAT test suite

 app/test/test_cryptodev.c| 12 +++
 doc/guides/cryptodevs/qat.rst|  2 +
 drivers/crypto/qat/qat_adf/qat_algs.h| 11 ++-
 drivers/crypto/qat/qat_adf/qat_algs_build_desc.c | 97 
 drivers/crypto/qat/qat_crypto.c  | 34 -
 5 files changed, 137 insertions(+), 19 deletions(-)

-- 
2.7.4



[dpdk-dev] [PATCH 1/2] crypto/qat: add ZUC EEA3 cipher capability

2017-02-28 Thread Arek Kusztal
This commit adds ZUC EEA3 cipher capability to Intel(R) QuickAssist
Technology driver

Signed-off-by: Arek Kusztal 
---
 doc/guides/cryptodevs/qat.rst|  2 +
 drivers/crypto/qat/qat_adf/qat_algs.h| 11 ++-
 drivers/crypto/qat/qat_adf/qat_algs_build_desc.c | 97 
 drivers/crypto/qat/qat_crypto.c  | 34 -
 4 files changed, 125 insertions(+), 19 deletions(-)

diff --git a/doc/guides/cryptodevs/qat.rst b/doc/guides/cryptodevs/qat.rst
index 9ecd19b..79b9c9d 100644
--- a/doc/guides/cryptodevs/qat.rst
+++ b/doc/guides/cryptodevs/qat.rst
@@ -55,6 +55,7 @@ Cipher algorithms:
 * ``RTE_CRYPTO_CIPHER_NULL``
 * ``RTE_CRYPTO_CIPHER_KASUMI_F8``
 * ``RTE_CRYPTO_CIPHER_DES_CBC``
+* ``RTE_CRYPTO_CIPHER_ZUC_EEA3``
 
 Hash algorithms:
 
@@ -79,6 +80,7 @@ Limitations
 * SNOW 3G (UEA2) and KASUMI (F8) supported only if cipher length, cipher 
offset fields are byte-aligned.
 * SNOW 3G (UIA2) and KASUMI (F9) supported only if hash length, hash offset 
fields are byte-aligned.
 * No BSD support as BSD QAT kernel driver not available.
+* ZUC EEA3 is not supported by dh895xcc devices
 
 
 Installation
diff --git a/drivers/crypto/qat/qat_adf/qat_algs.h 
b/drivers/crypto/qat/qat_adf/qat_algs.h
index b9e3fd6..37f64d4 100644
--- a/drivers/crypto/qat/qat_adf/qat_algs.h
+++ b/drivers/crypto/qat/qat_adf/qat_algs.h
@@ -80,6 +80,14 @@ struct qat_alg_buf {
uint64_t addr;
 } __rte_packed;
 
+enum qat_crypto_proto_flag {
+   QAT_CRYPTO_PROTO_FLAG_NONE = 0,
+   QAT_CRYPTO_PROTO_FLAG_CCM = 1,
+   QAT_CRYPTO_PROTO_FLAG_GCM = 2,
+   QAT_CRYPTO_PROTO_FLAG_SNOW3G = 3,
+   QAT_CRYPTO_PROTO_FLAG_ZUC = 4
+};
+
 /*
  * Maximum number of SGL entries
  */
@@ -143,7 +151,7 @@ int qat_alg_aead_session_create_content_desc_auth(struct 
qat_session *cdesc,
unsigned int operation);
 
 void qat_alg_init_common_hdr(struct icp_qat_fw_comn_req_hdr *header,
-   uint16_t proto);
+   enum qat_crypto_proto_flag 
proto_flags);
 
 void qat_alg_ablkcipher_init_enc(struct qat_alg_ablkcipher_cd *cd,
int alg, const uint8_t *key,
@@ -158,4 +166,5 @@ int qat_alg_validate_snow3g_key(int key_len, enum 
icp_qat_hw_cipher_algo *alg);
 int qat_alg_validate_kasumi_key(int key_len, enum icp_qat_hw_cipher_algo *alg);
 int qat_alg_validate_3des_key(int key_len, enum icp_qat_hw_cipher_algo *alg);
 int qat_alg_validate_des_key(int key_len, enum icp_qat_hw_cipher_algo *alg);
+int qat_alg_validate_zuc_key(int key_len, enum icp_qat_hw_cipher_algo *alg);
 #endif
diff --git a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c 
b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
index fbeef0a..3831d19 100644
--- a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
+++ b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c
@@ -422,7 +422,7 @@ static int qat_alg_do_precomputes(enum icp_qat_hw_auth_algo 
hash_alg,
 }
 
 void qat_alg_init_common_hdr(struct icp_qat_fw_comn_req_hdr *header,
-   uint16_t proto)
+   enum qat_crypto_proto_flag proto_flags)
 {
PMD_INIT_FUNC_TRACE();
header->hdr_flags =
@@ -435,14 +435,60 @@ void qat_alg_init_common_hdr(struct 
icp_qat_fw_comn_req_hdr *header,
  ICP_QAT_FW_LA_PARTIAL_NONE);
ICP_QAT_FW_LA_CIPH_IV_FLD_FLAG_SET(header->serv_specif_flags,
   ICP_QAT_FW_CIPH_IV_16BYTE_DATA);
-   ICP_QAT_FW_LA_PROTO_SET(header->serv_specif_flags,
-   proto);
+
+   switch (proto_flags){
+   case QAT_CRYPTO_PROTO_FLAG_NONE:
+   ICP_QAT_FW_LA_PROTO_SET(header->serv_specif_flags,
+   ICP_QAT_FW_LA_NO_PROTO);
+   break;
+   case QAT_CRYPTO_PROTO_FLAG_CCM:
+   ICP_QAT_FW_LA_PROTO_SET(header->serv_specif_flags,
+   ICP_QAT_FW_LA_CCM_PROTO);
+   break;
+   case QAT_CRYPTO_PROTO_FLAG_GCM:
+   ICP_QAT_FW_LA_PROTO_SET(header->serv_specif_flags,
+   ICP_QAT_FW_LA_GCM_PROTO);
+   break;
+   case QAT_CRYPTO_PROTO_FLAG_SNOW3G:
+   ICP_QAT_FW_LA_PROTO_SET(header->serv_specif_flags,
+   ICP_QAT_FW_LA_SNOW_3G_PROTO);
+   break;
+   case QAT_CRYPTO_PROTO_FLAG_ZUC:
+   ICP_QAT_FW_LA_ZUC_3G_PROTO_FLAG_SET(header->serv_specif_flags,
+   ICP_QAT_FW_LA_ZUC_3G_PROTO);
+   break;
+   }
+
ICP_QAT_FW_LA_UPDATE_STATE_SET(header->serv_specif_flags,
   ICP_QAT_FW_LA_NO_UPDATE_STATE);
ICP_QAT_FW_LA_DIGEST_IN_BUFFER_SET(header->serv_specif_flags,
ICP_QAT_FW_LA_NO_DIGEST_IN_BUFFER);
 

[dpdk-dev] [PATCH 2/2] app/test: add ZUC EEA3 test cases to QAT test suite

2017-02-28 Thread Arek Kusztal
This patch adds ZUC EEA3 cipher test cases to Intel(R) QuickAssist
Technology crypto test suite

Signed-off-by: Arek Kusztal 
---
 app/test/test_cryptodev.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c
index 357a92e..45ddd35 100644
--- a/app/test/test_cryptodev.c
+++ b/app/test/test_cryptodev.c
@@ -7637,6 +7637,18 @@ static struct unit_test_suite cryptodev_qat_testsuite  = 
{
TEST_CASE_ST(ut_setup, ut_teardown,
test_snow3g_auth_cipher_test_case_1),
 
+   /** ZUC encrypt only (EEA3) */
+   TEST_CASE_ST(ut_setup, ut_teardown,
+   test_zuc_encryption_test_case_1),
+   TEST_CASE_ST(ut_setup, ut_teardown,
+   test_zuc_encryption_test_case_2),
+   TEST_CASE_ST(ut_setup, ut_teardown,
+   test_zuc_encryption_test_case_3),
+   TEST_CASE_ST(ut_setup, ut_teardown,
+   test_zuc_encryption_test_case_4),
+   TEST_CASE_ST(ut_setup, ut_teardown,
+   test_zuc_encryption_test_case_5),
+
/** HMAC_MD5 Authentication */
TEST_CASE_ST(ut_setup, ut_teardown,
test_MD5_HMAC_generate_case_1),
-- 
2.7.4