[dpdk-dev] [PATCH] net/mlx5: fix a wrong error handler
The returned error code was wrongly handled causing the PMD to refuse to start. Fixes: 91572d2a0b1a ("net/mlx5: fix startup when flow cannot be applied") CC: sta...@dpdk.org Signed-off-by: Nelio Laranjeiro --- drivers/net/mlx5/mlx5_trigger.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c index 683b57c..0acbf28 100644 --- a/drivers/net/mlx5/mlx5_trigger.c +++ b/drivers/net/mlx5/mlx5_trigger.c @@ -87,7 +87,7 @@ mlx5_dev_start(struct rte_eth_dev *dev) if (dev->data->dev_conf.fdir_conf.mode != RTE_FDIR_MODE_NONE) priv_fdir_enable(priv); err = priv_flow_start(priv); - if (!err) { + if (err) { priv->started = 0; ERROR("%p: an error occurred while configuring flows:" " %s", -- 2.1.4
Re: [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version
On Wed, Feb 22, 2017 at 2:24 PM, Christian Ehrhardt wrote: > --- a/mk/rte.lib.mk > +++ b/mk/rte.lib.mk > @@ -40,6 +40,12 @@ EXTLIB_BUILD ?= n > # VPATH contains at least SRCDIR > VPATH += $(SRCDIR) > > +ifneq ($(CONFIG_RTE_MAJOR_ABI),) > +ifneq ($(LIBABIVER),) > +LIBABIVER := $(CONFIG_RTE_MAJOR_ABI) > +endif > +endif > + > ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y) > LIB := $(patsubst %.a,%.so.$(LIBABIVER),$(LIB)) > ifeq ($(EXTLIB_BUILD),n) > @@ -156,11 +162,7 @@ $(RTE_OUTPUT)/lib/$(LIB): $(LIB) > @[ -d $(RTE_OUTPUT)/lib ] || mkdir -p $(RTE_OUTPUT)/lib > $(Q)cp -f $(LIB) $(RTE_OUTPUT)/lib > ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y) > -ifeq ($(CONFIG_RTE_NEXT_ABI)$(EXTLIB_BUILD),yn) > - $(Q)ln -s -f $< $(basename $(basename $@)) > -else > - $(Q)ln -s -f $< $(basename $@) > -endif > + $(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/') > endif > In case CONFIG_RTE_NEXT_ABI=y is set this is actually generating shared objects with suffix: .so.$(CONFIG_RTE_MAJOR_ABI).1 I don't think that this is the intention.
Re: [dpdk-dev] [PATCH v3 00/10] Rework vdev probing to use rte_bus infrastructure
On Monday 27 February 2017 06:39 PM, Jan Blunck wrote: On Sat, Feb 25, 2017 at 11:28 AM, Jan Blunck wrote: With the rte_bus infrastructure present in 17.02 it is possible to refactor the virtual device probing into a bus. This series also introduces the rte_vdev_device to better keep track of devices. This patchset depends on: http://dpdk.org/dev/patchwork/patch/20416/ http://dpdk.org/dev/patchwork/patch/20417/ Changes since version 2: * implicit bus registration through rte_eal_vdrv_register() On a second thought I don't think that this is correct though since it opens up the possibility of racing an alternative "virtual" bus. I don't think that this is a good thing though. I'll fix this in v4. Thoughts? I prefer the RTE_REGISTER* way. The issue of duplicate bus remains whether we use the macro or the implicit way. If you use RTE_*, do you think that duplicate registration issue is worth fixing? (It would mean rte_bus_register to return error which the caller would then need to handle). Thanks, Jan * explicit delay probing of virtual bus in rte_bus_probe() * addition of rte_vdev_device_args() helper * make virtual driver probe and remove take rte_vdev_device Changes since version 1: * addition of rte_vdev_device_name() helper * removed rte_eal_dev_init() from *.map files * use SOCKET_ID_ANY Jan Blunck (10): eal: probe legacy PCI devices before other bus devices eal: probe new virtual bus after other bus devices eal: move virtual device probing into a bus eal: remove unused rte_eal_dev_init() eal: Refactor vdev driver probe/remove eal: add struct rte_vdev_device eal: add virtual device name helper function eal: add virtual device arguments helper function eal: make virtual bus use rte_vdev_device eal: make virtual driver probe and remove take rte_vdev_device drivers/crypto/null/null_crypto_pmd.c | 18 +- drivers/net/af_packet/rte_eth_af_packet.c | 11 +- drivers/net/bonding/rte_eth_bond_pmd.c | 13 +- drivers/net/mpipe/mpipe_tilegx.c| 10 +- drivers/net/null/rte_eth_null.c | 13 +- drivers/net/pcap/rte_eth_pcap.c | 12 +- drivers/net/ring/rte_eth_ring.c | 9 +- drivers/net/tap/rte_eth_tap.c | 10 +- drivers/net/vhost/rte_eth_vhost.c | 10 +- drivers/net/virtio/virtio_user_ethdev.c | 18 +- drivers/net/xenvirt/rte_eth_xenvirt.c | 9 +- lib/librte_eal/bsdapp/eal/eal.c | 9 +- lib/librte_eal/bsdapp/eal/rte_eal_version.map | 1 - lib/librte_eal/common/eal_common_bus.c | 16 +- lib/librte_eal/common/eal_common_dev.c | 28 --- lib/librte_eal/common/eal_common_vdev.c | 259 +--- lib/librte_eal/common/include/rte_dev.h | 5 - lib/librte_eal/common/include/rte_vdev.h| 26 ++- lib/librte_eal/linuxapp/eal/eal.c | 9 +- lib/librte_eal/linuxapp/eal/rte_eal_version.map | 1 - 20 files changed, 355 insertions(+), 132 deletions(-) -- 2.7.4
[dpdk-dev] Pktgen-DPDK build errors on ppc64le
Hi All, We are seeing Pktgen-DPDK not building on ppc64le with latest dpdk git repo. Below build errors are seen: - [pktgen-dpdk]# make == lib == common CC l2p.o In file included from /root/pktgen-dpdk/lib/common/l2p.c:88:0: /root/pktgen-dpdk/lib/common/l2p.h: In function ‘pg_raw_dump_l2p’: /root/pktgen-dpdk/lib/common/l2p.h:103:2: error: comparison is always true due to limited range of data type [-Werror=type-limits] for (i = 0; i < RTE_MAX_LCORE; i++) ^ /root/pktgen-dpdk/lib/common/l2p.h:109:2: error: comparison is always true due to limited range of data type [-Werror=type-limits] for (j = 0; j < RTE_MAX_LCORE; j++) { ^ /root/pktgen-dpdk/lib/common/l2p.h: In function ‘pg_dump_l2p’: /root/pktgen-dpdk/lib/common/l2p.h:486:2: error: comparison is always true due to limited range of data type [-Werror=type-limits] for (lid = 0; lid < RTE_MAX_LCORE; lid++) { ^ /root/pktgen-dpdk/lib/common/l2p.c: In function ‘pg_port_matrix_dump’: /root/pktgen-dpdk/lib/common/l2p.c:477:3: error: large integer implicitly truncated to unsigned type [-Werror=overflow] cnt.rxtx = get_map(l2p, pid, RTE_MAX_LCORE); ^ /root/pktgen-dpdk/lib/common/l2p.c:488:3: error: large integer implicitly truncated to unsigned type [-Werror=overflow] cnt.rxtx = get_map(l2p, pid, RTE_MAX_LCORE); ^ cc1: all warnings being treated as errors make[3]: *** [l2p.o] Error 1 make[2]: *** [all] Error 2 make[1]: *** [common] Error 2 make: *** [lib] Error 2 - By changing CONFIG_RTE_MAX_LCORE to 128 in defconfig_ppc_64-power8-linuxapp-gcc file, Pktgen-DPDK then builds fine. Thanks, Rahul
Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
On Fri, Feb 24, 2017 at 3:00 PM, Olivier Matz wrote: > > In my opinion, if we have the room in the first cache line, we should > put it there. The only argument I see against is "we may find something > more important in the future, and we won't have room for it in the > first cache line". I don't feel we should penalize today's use cases for > hypothetic future use cases. > > > >> 2. timestamp normalization point >> inside PMD RX vs somewhere later as user needs it (extra >> function in dev_ops?). > > This point could be changed. My initial proposition tries to provide a > generic API for timestamp. Let me remind it here: > > a- the timestamp is in nanosecond > b- the reference is always the same for a given path: if the timestamp >is set in a PMD, all the packets for this PMD will have the same >reference, but for 2 different PMDs (or a sw lib), the reference >would not be the same. > > We may remove a-, and just have: > - the reference and the unit are always the same for a given path: if >the timestamp is set in a PMD, all the packets for this PMD will have >the same reference and unit, but for 2 different PMDs (or a sw lib), >they would not be the same. > > In both cases, we would need a conversion code (maybe in a library) if > the application wants to work with timestamps from several sources. The > second solution removes the normalization code in the PMD when not > needed, it is probably better. I agree. > > About having the timestamp in the packet data, I don't think it is > a good solution for a generic API in DPDK. The timestamp is a metadata, > it has to go in the mbuf metadata. The packet data should not be > modified when the timestamp is enabled. Good NICs already do that based on the packet type (e.g. NTP/PTP packets). > > But this would not prevent to have driver-specific features to do that. > In that case, the application will be aware that it is using this > specific driver and that it will receive a timestamp in the packet data. > > To summarize, the generic API could be: > - an ethdev API to enable the timestamp in a PMD for received packets > - a mbuf flag "timestamp present" > - a mbuf 64b field to store the timestamp value > > Additionally, a driver-specific API can be added for a given PMD. > Example: > - the generic timestamp ethdev is disabled (or not supported) > - a driver-specific feature "put timestamp in packet" is enabled > It would have no additional cost compared to what we have today, since > the timestamp in mbuf is not read/written. > Thanks for the writeup. This sounds like a reasonable approach to me. Do you still want to call the 64bit field "timestamp" or rename it to something neutral and document that it is used together with the mbuf flags?
Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
Hi everyone, > > > > In my opinion, if we have the room in the first cache line, we should > > put it there. The only argument I see against is "we may find something > > more important in the future, and we won't have room for it in the > > first cache line". I don't feel we should penalize today's use cases for > > hypothetic future use cases. > > > > > > > >> 2. timestamp normalization point > >> inside PMD RX vs somewhere later as user needs it (extra > >> function in dev_ops?). > > > > This point could be changed. My initial proposition tries to provide a > > generic API for timestamp. Let me remind it here: > > > > a- the timestamp is in nanosecond > > b- the reference is always the same for a given path: if the timestamp > >is set in a PMD, all the packets for this PMD will have the same > >reference, but for 2 different PMDs (or a sw lib), the reference > >would not be the same. > > > > We may remove a-, and just have: > > - the reference and the unit are always the same for a given path: if > >the timestamp is set in a PMD, all the packets for this PMD will have > >the same reference and unit, but for 2 different PMDs (or a sw lib), > >they would not be the same. > > > > In both cases, we would need a conversion code (maybe in a library) if > > the application wants to work with timestamps from several sources. The > > second solution removes the normalization code in the PMD when not > > needed, it is probably better. > > I agree. One question - does that mean that application would need to keep a track from what PMD each particular packet came to do the normalization? Konstantin > > > > > About having the timestamp in the packet data, I don't think it is > > a good solution for a generic API in DPDK. The timestamp is a metadata, > > it has to go in the mbuf metadata. The packet data should not be > > modified when the timestamp is enabled. > > Good NICs already do that based on the packet type (e.g. NTP/PTP packets). > > > > > But this would not prevent to have driver-specific features to do that. > > In that case, the application will be aware that it is using this > > specific driver and that it will receive a timestamp in the packet data. > > > > To summarize, the generic API could be: > > - an ethdev API to enable the timestamp in a PMD for received packets > > - a mbuf flag "timestamp present" > > - a mbuf 64b field to store the timestamp value > > > > Additionally, a driver-specific API can be added for a given PMD. > > Example: > > - the generic timestamp ethdev is disabled (or not supported) > > - a driver-specific feature "put timestamp in packet" is enabled > > It would have no additional cost compared to what we have today, since > > the timestamp in mbuf is not read/written. > > > > Thanks for the writeup. This sounds like a reasonable approach to me. > > Do you still want to call the 64bit field "timestamp" or rename it to > something neutral and document that it is used together with the mbuf > flags?
Re: [dpdk-dev] [PATCH v3 00/10] Rework vdev probing to use rte_bus infrastructure
On Tue, Feb 28, 2017 at 9:48 AM, Shreyansh Jain wrote: > On Monday 27 February 2017 06:39 PM, Jan Blunck wrote: >> >> On Sat, Feb 25, 2017 at 11:28 AM, Jan Blunck >> wrote: >>> >>> With the rte_bus infrastructure present in 17.02 it is possible to >>> refactor >>> the virtual device probing into a bus. This series also introduces the >>> rte_vdev_device to better keep track of devices. >>> >>> This patchset depends on: >>> http://dpdk.org/dev/patchwork/patch/20416/ >>> http://dpdk.org/dev/patchwork/patch/20417/ >>> >>> Changes since version 2: >>> * implicit bus registration through rte_eal_vdrv_register() >> >> >> On a second thought I don't think that this is correct though since it >> opens up the possibility of racing an alternative "virtual" bus. I >> don't think that this is a good thing though. I'll fix this in v4. >> >> Thoughts? > > > I prefer the RTE_REGISTER* way. > The issue of duplicate bus remains whether we use the macro or the > implicit way. > > If you use RTE_*, do you think that duplicate registration issue > is worth fixing? > (It would mean rte_bus_register to return error which the caller > would then need to handle). > I don't think that there is a good way to handle this if we use the library constructors. It is better to fail the registration of the bus and log the error via RTE_LOG(). That introduces a dependency on the initialization of the logging system to be available before the library constructors run (before main()).
[dpdk-dev] [PATCH] net/mlx5: fix flow mark action handling
Mark value is always reported even when not requested or invalid. Fixes: ea3bc3b1df94 ("net/mlx5: support mark flow action") CC: sta...@dpdk.org Reported-by: Mark Bloch Signed-off-by: Nelio Laranjeiro Acked-by: Adrien Mazarguil --- drivers/net/mlx5/mlx5_rxtx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index b2b7223..e152776 100644 --- a/drivers/net/mlx5/mlx5_rxtx.c +++ b/drivers/net/mlx5/mlx5_rxtx.c @@ -1431,7 +1431,7 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n) } if (rxq->mark && ((cqe->sop_drop_qpn != - htonl(MLX5_FLOW_MARK_INVALID)) || + htonl(MLX5_FLOW_MARK_INVALID)) && (cqe->sop_drop_qpn != htonl(MLX5_FLOW_MARK_DEFAULT { pkt->hash.fdir.hi = -- 2.1.4
Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
Hi, On Tue, 28 Feb 2017 09:05:07 +, "Ananyev, Konstantin" wrote: > Hi everyone, > > > > > > > In my opinion, if we have the room in the first cache line, we > > > should put it there. The only argument I see against is "we may > > > find something more important in the future, and we won't have > > > room for it in the first cache line". I don't feel we should > > > penalize today's use cases for hypothetic future use cases. > > > > > > > > > > > >> 2. timestamp normalization point > > >> inside PMD RX vs somewhere later as user needs it (extra > > >> function in dev_ops?). > > > > > > This point could be changed. My initial proposition tries to > > > provide a generic API for timestamp. Let me remind it here: > > > > > > a- the timestamp is in nanosecond > > > b- the reference is always the same for a given path: if the > > > timestamp is set in a PMD, all the packets for this PMD will have > > > the same reference, but for 2 different PMDs (or a sw lib), the > > > reference would not be the same. > > > > > > We may remove a-, and just have: > > > - the reference and the unit are always the same for a given > > > path: if the timestamp is set in a PMD, all the packets for this > > > PMD will have the same reference and unit, but for 2 different > > > PMDs (or a sw lib), they would not be the same. > > > > > > In both cases, we would need a conversion code (maybe in a > > > library) if the application wants to work with timestamps from > > > several sources. The second solution removes the normalization > > > code in the PMD when not needed, it is probably better. > > > > I agree. > > One question - does that mean that application would need to > keep a track from what PMD each particular packet came to do the > normalization? Konstantin I'd say yes. It does not look very difficult to do, since the mbuf contains the input port id. > > > About having the timestamp in the packet data, I don't think it is > > > a good solution for a generic API in DPDK. The timestamp is a > > > metadata, it has to go in the mbuf metadata. The packet data > > > should not be modified when the timestamp is enabled. > > > > Good NICs already do that based on the packet type (e.g. NTP/PTP > > packets). > > > > > > But this would not prevent to have driver-specific features to do > > > that. In that case, the application will be aware that it is > > > using this specific driver and that it will receive a timestamp > > > in the packet data. > > > > > > To summarize, the generic API could be: > > > - an ethdev API to enable the timestamp in a PMD for received > > > packets > > > - a mbuf flag "timestamp present" > > > - a mbuf 64b field to store the timestamp value > > > > > > Additionally, a driver-specific API can be added for a given PMD. > > > Example: > > > - the generic timestamp ethdev is disabled (or not supported) > > > - a driver-specific feature "put timestamp in packet" is enabled > > > It would have no additional cost compared to what we have today, > > > since the timestamp in mbuf is not read/written. > > > > > > > Thanks for the writeup. This sounds like a reasonable approach to > > me. > > > > Do you still want to call the 64bit field "timestamp" or rename it > > to something neutral and document that it is used together with the > > mbuf flags? I think timestamp is a good name. In the current RFC patchset, we have this comment: /** Valid if PKT_RX_TIMESTAMP is set. The unit is nanoseconds */ uint64_t timestamp; We could change it to something like: /** Valid if PKT_RX_TIMESTAMP is set. The unit and time * reference are not normalized but are always the same * for a given port. */ uint64_t timestamp; Regards, Olivier
Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
On Tue, Feb 28, 2017 at 10:05 AM, Ananyev, Konstantin wrote: >> > >> > In both cases, we would need a conversion code (maybe in a library) if >> > the application wants to work with timestamps from several sources. The >> > second solution removes the normalization code in the PMD when not >> > needed, it is probably better. >> >> I agree. > > One question - does that mean that application would need to > keep a track from what PMD each particular packet came to do the > normalization? Yes. You usually do this based on mbuf->port.
Re: [dpdk-dev] [PATCH v3 00/10] Rework vdev probing to use rte_bus infrastructure
> -Original Message- > From: jblu...@gmail.com [mailto:jblu...@gmail.com] On Behalf Of Jan Blunck > Sent: Tuesday, February 28, 2017 2:49 PM > To: Shreyansh Jain > Cc: dev ; David Marchand ; Ferruh > Yigit > Subject: Re: [PATCH v3 00/10] Rework vdev probing to use rte_bus > infrastructure > > On Tue, Feb 28, 2017 at 9:48 AM, Shreyansh Jain > wrote: > > On Monday 27 February 2017 06:39 PM, Jan Blunck wrote: > >> > >> On Sat, Feb 25, 2017 at 11:28 AM, Jan Blunck > >> wrote: > >>> > >>> With the rte_bus infrastructure present in 17.02 it is possible to > >>> refactor > >>> the virtual device probing into a bus. This series also introduces the > >>> rte_vdev_device to better keep track of devices. > >>> > >>> This patchset depends on: > >>> http://dpdk.org/dev/patchwork/patch/20416/ > >>> http://dpdk.org/dev/patchwork/patch/20417/ > >>> > >>> Changes since version 2: > >>> * implicit bus registration through rte_eal_vdrv_register() > >> > >> > >> On a second thought I don't think that this is correct though since it > >> opens up the possibility of racing an alternative "virtual" bus. I > >> don't think that this is a good thing though. I'll fix this in v4. > >> > >> Thoughts? > > > > > > I prefer the RTE_REGISTER* way. > > The issue of duplicate bus remains whether we use the macro or the > > implicit way. > > > > If you use RTE_*, do you think that duplicate registration issue > > is worth fixing? > > (It would mean rte_bus_register to return error which the caller > > would then need to handle). > > > > I don't think that there is a good way to handle this if we use the > library constructors. It is better to fail the registration of the bus > and log the error via RTE_LOG(). That introduces a dependency on the > initialization of the logging system to be available before the > library constructors run (before main()). Agree. Log dependency is the tricky part. Log subsystem is in turn dependent on users input for level.
Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
On Tue, Feb 28, 2017 at 10:23 AM, Olivier Matz wrote: >> > >> > Do you still want to call the 64bit field "timestamp" or rename it >> > to something neutral and document that it is used together with the >> > mbuf flags? > > I think timestamp is a good name. In the current RFC patchset, we have > this comment: > > /** Valid if PKT_RX_TIMESTAMP is set. The unit is nanoseconds */ > uint64_t timestamp; > > We could change it to something like: > > /** Valid if PKT_RX_TIMESTAMP is set. The unit and time > * reference are not normalized but are always the same > * for a given port. > */ > uint64_t timestamp; > Looks good to me. Thanks, Jan
Re: [dpdk-dev] [PATCH v2 11/15] net/avp: packet receive functions
On Mon, Feb 27, 2017 at 05:06:24PM +, Legacy, Allain wrote: > > > -Original Message- > > From: Stephen Hemminger [mailto:step...@networkplumber.org] > > On Sun, 26 Feb 2017 14:08:59 -0500 > > Allain Legacy wrote: > > > > Try not to break error messages onto two lines, it makes it harder when a > > user is trying to find the location of the error message with search tools > That's our normal practice, but checkpatches.sh flagged them as warnings. > If these are acceptable exceptions to the tool then are there plans to > publish an exhaustive list of checkpatches.sh errors/warnings/checks that can > be ignored? Alternatively, are there plans to change the tools to ignore > these automatically and not flag them as issues. As a contributor it is > difficult to tell what will be deemed acceptable and what will not (in some > cases) so the default position is to fix all issues before submission. > In my experience, checkpatch ignores long lines that are due to error messages. Perhaps you need to put the error message on a separate line, if other things before the message are of significant size. /Bruce
Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
> > Hi, > > On Tue, 28 Feb 2017 09:05:07 +, "Ananyev, Konstantin" > wrote: > > Hi everyone, > > > > > > > > > > In my opinion, if we have the room in the first cache line, we > > > > should put it there. The only argument I see against is "we may > > > > find something more important in the future, and we won't have > > > > room for it in the first cache line". I don't feel we should > > > > penalize today's use cases for hypothetic future use cases. > > > > > > > > > > > > > > > >> 2. timestamp normalization point > > > >> inside PMD RX vs somewhere later as user needs it (extra > > > >> function in dev_ops?). > > > > > > > > This point could be changed. My initial proposition tries to > > > > provide a generic API for timestamp. Let me remind it here: > > > > > > > > a- the timestamp is in nanosecond > > > > b- the reference is always the same for a given path: if the > > > > timestamp is set in a PMD, all the packets for this PMD will have > > > > the same reference, but for 2 different PMDs (or a sw lib), the > > > > reference would not be the same. > > > > > > > > We may remove a-, and just have: > > > > - the reference and the unit are always the same for a given > > > > path: if the timestamp is set in a PMD, all the packets for this > > > > PMD will have the same reference and unit, but for 2 different > > > > PMDs (or a sw lib), they would not be the same. > > > > > > > > In both cases, we would need a conversion code (maybe in a > > > > library) if the application wants to work with timestamps from > > > > several sources. The second solution removes the normalization > > > > code in the PMD when not needed, it is probably better. > > > > > > I agree. > > > > One question - does that mean that application would need to > > keep a track from what PMD each particular packet came to do the > > normalization? Konstantin > > I'd say yes. It does not look very difficult to do, since the mbuf > contains the input port id. > I understand that we can use mbuf->port here, but it means that we'll introduce new implicit dependency between timestamp and port values. >From my point that introduces new implications: 1. all PMDs that do set a timestamp would also have to set port value too. Probably not a big deal as most of PMDs do set port value anyway right now, but it means it would be hard to get rid/change mbuf->port in future. 2. Applications would not allowed to change mbuf->port value before normalization is done (from what I heard some apps do update mbuf->port to store routing decisions). BTW, how the app would keep track which mbufs were already normalized, and which were not? 3. In theory with eth_dev_detach() - mbuf->port value might be not valid at the point when application would decide to do normalization. So to me all that approach with delayed normalization seems unnecessary overcomplicated. Original one suggested by Olivier, when normalization is done in PMD at RX look much cleaner and more manageable. Konstantin
Re: [dpdk-dev] [RFC 17.05 v1 0/3] Merge l3fwd-acl and l3fwd
Hi Ravi, > > Thanks to Konstantin and Bruce on first internal review comments. This > patch is RFC for 17.05 to merge l3fwd-acl and l3fwd code and add file > read options to build LPM and EM tables. Thanks for the patch, I think it is really useful one. Can I suggest you re-submit it as non-RFC now, as we are in 17.05 window already? About the patch itself, one question I forgot to ask you before: +++ b/examples/l3fwd/main.c @@ -161,7 +163,9 @@ static struct rte_eth_conf port_conf = { .rx_adv_conf = { .rss_conf = { .rss_key = NULL, - .rss_hf = ETH_RSS_IP, + .rss_hf = ETH_RSS_IP | ETH_RSS_UDP | + ETH_RSS_TCP | ETH_RSS_SCTP, + }, }, Why it is necessary to change RSS hash input values? As another nit - there are few checkpatch warnings, that probably need to be addressed. Apart from that looks good to me. Thanks Konstantin
Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
On Tue, 28 Feb 2017 10:29:41 +, "Ananyev, Konstantin" wrote: > > > > Hi, > > > > On Tue, 28 Feb 2017 09:05:07 +, "Ananyev, Konstantin" > > wrote: > > > Hi everyone, > > > > > > > > > > > > > In my opinion, if we have the room in the first cache line, we > > > > > should put it there. The only argument I see against is "we > > > > > may find something more important in the future, and we won't > > > > > have room for it in the first cache line". I don't feel we > > > > > should penalize today's use cases for hypothetic future use > > > > > cases. > > > > > > > > > > > > > > > > > > > >> 2. timestamp normalization point > > > > >> inside PMD RX vs somewhere later as user needs it (extra > > > > >> function in dev_ops?). > > > > > > > > > > This point could be changed. My initial proposition tries to > > > > > provide a generic API for timestamp. Let me remind it here: > > > > > > > > > > a- the timestamp is in nanosecond > > > > > b- the reference is always the same for a given path: if the > > > > > timestamp is set in a PMD, all the packets for this PMD will > > > > > have the same reference, but for 2 different PMDs (or a sw > > > > > lib), the reference would not be the same. > > > > > > > > > > We may remove a-, and just have: > > > > > - the reference and the unit are always the same for a given > > > > > path: if the timestamp is set in a PMD, all the packets for > > > > > this PMD will have the same reference and unit, but for 2 > > > > > different PMDs (or a sw lib), they would not be the same. > > > > > > > > > > In both cases, we would need a conversion code (maybe in a > > > > > library) if the application wants to work with timestamps from > > > > > several sources. The second solution removes the normalization > > > > > code in the PMD when not needed, it is probably better. > > > > > > > > I agree. > > > > > > One question - does that mean that application would need to > > > keep a track from what PMD each particular packet came to do the > > > normalization? Konstantin > > > > I'd say yes. It does not look very difficult to do, since the mbuf > > contains the input port id. > > > > I understand that we can use mbuf->port here, but it means that we'll > introduce new implicit dependency between timestamp and port values. > From my point that introduces new implications: > 1. all PMDs that do set a timestamp would also have to set port value too. > Probably not a big deal as most of PMDs do set port value anyway right > now, > but it means it would be hard to get rid/change mbuf->port in future. Currently, all PMDs must set m->port. If in the future we remove m->port, the applications that use it will need to store the value in a mbuf metadata, or pass it as arguments through function calls. > 2. Applications would not allowed to change mbuf->port value before > normalization is done > (from what I heard some apps do update mbuf->port to store routing > decisions). > BTW, how the app would keep track which mbufs were already normalized, > and which were not? I don't think it should be allowed to change m->port value. Applications that are doing this are responsible of what they change. > 3. In theory with eth_dev_detach() - mbuf->port value might be not valid at > the point when application > would decide to do normalization. > > So to me all that approach with delayed normalization seems unnecessary > overcomplicated. > Original one suggested by Olivier, when normalization is done in PMD at RX > look > much cleaner and more manageable. Detaching a device requires a synchronization between control and data plane, and not only for this use case. In the first solution, the normalization is partial: unit is nanosecond, but the time reference is different. So, after the discussion, I'm more convinced by the second solution. Regards, Olivier
Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
On Thu, Feb 23, 2017 at 05:23:54PM +, Bruce Richardson wrote: > Users compiling DPDK should not need to know or care about the arrangement > of cachelines in the rte_ring structure. Therefore just remove the build > option and set the structures to be always split. For improved > performance use 128B rather than 64B alignment since it stops the producer > and consumer data being on adjacent cachelines. > > Signed-off-by: Bruce Richardson > --- > config/common_base | 1 - > doc/guides/rel_notes/release_17_05.rst | 6 ++ > lib/librte_ring/rte_ring.c | 2 -- > lib/librte_ring/rte_ring.h | 8 ++-- > 4 files changed, 8 insertions(+), 9 deletions(-) > > diff --git a/config/common_base b/config/common_base > index aeee13e..099ffda 100644 > --- a/config/common_base > +++ b/config/common_base > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y > # > CONFIG_RTE_LIBRTE_RING=y > CONFIG_RTE_LIBRTE_RING_DEBUG=n > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n > CONFIG_RTE_RING_PAUSE_REP_COUNT=0 > > # > diff --git a/doc/guides/rel_notes/release_17_05.rst > b/doc/guides/rel_notes/release_17_05.rst > index e25ea9f..ea45e0c 100644 > --- a/doc/guides/rel_notes/release_17_05.rst > +++ b/doc/guides/rel_notes/release_17_05.rst > @@ -110,6 +110,12 @@ API Changes > Also, make sure to start the actual text at the margin. > = > > +* **Reworked rte_ring library** > + > + The rte_ring library has been reworked and updated. The following changes > + have been made to it: > + > + * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS`` > > ABI Changes > --- > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c > index ca0a108..4bc6da1 100644 > --- a/lib/librte_ring/rte_ring.c > +++ b/lib/librte_ring/rte_ring.c > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, > unsigned count, > /* compilation-time checks */ > RTE_BUILD_BUG_ON((sizeof(struct rte_ring) & > RTE_CACHE_LINE_MASK) != 0); > -#ifdef RTE_RING_SPLIT_PROD_CONS > RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) & > RTE_CACHE_LINE_MASK) != 0); > -#endif > RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) & > RTE_CACHE_LINE_MASK) != 0); > #ifdef RTE_LIBRTE_RING_DEBUG > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h > index 72ccca5..04fe667 100644 > --- a/lib/librte_ring/rte_ring.h > +++ b/lib/librte_ring/rte_ring.h > @@ -168,7 +168,7 @@ struct rte_ring { > uint32_t mask; /**< Mask (size-1) of ring. */ > volatile uint32_t head; /**< Producer head. */ > volatile uint32_t tail; /**< Producer tail. */ > - } prod __rte_cache_aligned; > + } prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2); I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are cache line size of 128B > + } prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2); > > /** Ring consumer status. */ > struct cons { > @@ -177,11 +177,7 @@ struct rte_ring { > uint32_t mask; /**< Mask (size-1) of ring. */ > volatile uint32_t head; /**< Consumer head. */ > volatile uint32_t tail; /**< Consumer tail. */ > -#ifdef RTE_RING_SPLIT_PROD_CONS > - } cons __rte_cache_aligned; > -#else > - } cons; > -#endif > + } cons __rte_aligned(RTE_CACHE_LINE_SIZE * 2); > > #ifdef RTE_LIBRTE_RING_DEBUG > struct rte_ring_debug_stats stats[RTE_MAX_LCORE]; > -- > 2.9.3 >
Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
> > On Tue, 28 Feb 2017 10:29:41 +, "Ananyev, Konstantin" > wrote: > > > > > > Hi, > > > > > > On Tue, 28 Feb 2017 09:05:07 +, "Ananyev, Konstantin" > > > wrote: > > > > Hi everyone, > > > > > > > > > > > > > > > > In my opinion, if we have the room in the first cache line, we > > > > > > should put it there. The only argument I see against is "we > > > > > > may find something more important in the future, and we won't > > > > > > have room for it in the first cache line". I don't feel we > > > > > > should penalize today's use cases for hypothetic future use > > > > > > cases. > > > > > > > > > > > > > > > > > > > > > > > >> 2. timestamp normalization point > > > > > >> inside PMD RX vs somewhere later as user needs it (extra > > > > > >> function in dev_ops?). > > > > > > > > > > > > This point could be changed. My initial proposition tries to > > > > > > provide a generic API for timestamp. Let me remind it here: > > > > > > > > > > > > a- the timestamp is in nanosecond > > > > > > b- the reference is always the same for a given path: if the > > > > > > timestamp is set in a PMD, all the packets for this PMD will > > > > > > have the same reference, but for 2 different PMDs (or a sw > > > > > > lib), the reference would not be the same. > > > > > > > > > > > > We may remove a-, and just have: > > > > > > - the reference and the unit are always the same for a given > > > > > > path: if the timestamp is set in a PMD, all the packets for > > > > > > this PMD will have the same reference and unit, but for 2 > > > > > > different PMDs (or a sw lib), they would not be the same. > > > > > > > > > > > > In both cases, we would need a conversion code (maybe in a > > > > > > library) if the application wants to work with timestamps from > > > > > > several sources. The second solution removes the normalization > > > > > > code in the PMD when not needed, it is probably better. > > > > > > > > > > I agree. > > > > > > > > One question - does that mean that application would need to > > > > keep a track from what PMD each particular packet came to do the > > > > normalization? Konstantin > > > > > > I'd say yes. It does not look very difficult to do, since the mbuf > > > contains the input port id. > > > > > > > I understand that we can use mbuf->port here, but it means that we'll > > introduce new implicit dependency between timestamp and port values. > > From my point that introduces new implications: > > 1. all PMDs that do set a timestamp would also have to set port value too. > > Probably not a big deal as most of PMDs do set port value anyway right > > now, > > but it means it would be hard to get rid/change mbuf->port in future. > > Currently, all PMDs must set m->port. > If in the future we remove m->port, the applications that use it will need > to store the value in a mbuf metadata, or pass it as arguments through > function > calls. > > > > 2. Applications would not allowed to change mbuf->port value before > > normalization is done > > (from what I heard some apps do update mbuf->port to store routing > > decisions). > > BTW, how the app would keep track which mbufs were already normalized, > > and which were not? > > I don't think it should be allowed to change m->port value. As far as I know it is allowed right now. PMD RX routine sets mbuf->port, after that application is free to use it in a way it likes. What we are introducing here is basically a new dependency between 2 mbuf fields and new restriction. Another thing that doesn't look very convenient to me here - We can have 2 different values of timestamp (both normalized and not) and there is no clear way for the application to know which one is in use right now. So each app writer would have to come-up with his own solution. > Applications that > are doing this are responsible of what they change. > > > > 3. In theory with eth_dev_detach() - mbuf->port value might be not valid at > > the point when application > > would decide to do normalization. > > > > So to me all that approach with delayed normalization seems unnecessary > > overcomplicated. > > Original one suggested by Olivier, when normalization is done in PMD at RX > > look > > much cleaner and more manageable. > > Detaching a device requires a synchronization between control and data plane, > and not only for this use case. Of course it does. But right now it is possible to do: eth_rx_burst(port=0, ..., &mbuf, 1); eth_dev_detach(port=0, ...); ... /*process previously received mbuf */ With what you are proposing it would be not always possible any more. >In the first solution, the normalization is > partial: unit is nanosecond, but the time reference is different. Not sure I get you here... Konstantin > > So, after the discussion, I'm more convinced by the second solution. > > > Regards, > Olivier
Re: [dpdk-dev] [PATCH v2 02/15] net/avp: public header files
On Sun, Feb 26, 2017 at 02:08:50PM -0500, Allain Legacy wrote: > Adds public/exported header files for the AVP PMD. The AVP device is a > shared memory based device. The structures and constants that define the > method of operation of the device must be visible by both the PMD and the > host DPDK application. They must not change without proper version > controls and updates to both the hypervisor DPDK application and the PMD. > > The hypervisor DPDK application is a Wind River Systems proprietary > virtual switch. > > Signed-off-by: Allain Legacy > Signed-off-by: Matt Peters > --- > drivers/net/avp/rte_avp_common.h | 424 > +++ > drivers/net/avp/rte_avp_fifo.h | 157 +++ > 2 files changed, 581 insertions(+) > create mode 100644 drivers/net/avp/rte_avp_common.h > create mode 100644 drivers/net/avp/rte_avp_fifo.h > > diff --git a/drivers/net/avp/rte_avp_common.h > b/drivers/net/avp/rte_avp_common.h > new file mode 100644 > index 000..579d5ac > --- /dev/null > +++ b/drivers/net/avp/rte_avp_common.h > @@ -0,0 +1,424 @@ > +/*- > + * This file is provided under a dual BSD/LGPLv2 license. When using or > + * redistributing this file, you may do so under either license. > + * > + * GNU LESSER GENERAL PUBLIC LICENSE > + * > + * Copyright(c) 2010-2013 Intel Corporation. All rights reserved. > + * Copyright(c) 2014-2015 Wind River Systems, Inc. All rights reserved. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of version 2.1 of the GNU Lesser General Public > License > + * as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. > + * > + * Contact Information: > + * Wind River Systems, Inc. > + * > + * > + * BSD LICENSE > + * > + * Copyright(c) 2010-2013 Intel Corporation. All rights reserved. > + * Copyright(c) 2014-2016 Wind River Systems, Inc. All rights reserved. > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + *THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + *"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + *LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + *A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > + *OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + *SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + *LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + *DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + *THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + *(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + *OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + * > + */ > + > +#ifndef _RTE_AVP_COMMON_H_ > +#define _RTE_AVP_COMMON_H_ > + > +#ifdef __KERNEL__ > +#include > +#endif > + > +/** > + * AVP name is part of network device name. > + */ > +#define RTE_AVP_NAMESIZE 32 > + > +/** > + * AVP alias is a user-defined value used for lookups from secondary > + * processes. Typically, this is a UUID. > + */ > +#define RTE_AVP_ALIASSIZE 128 > + > +/** > + * Memory aligment (cache aligned) > + */ > +#ifndef RTE_AVP_ALIGNMENT > +#define RTE_AVP_ALIGNMENT 64 I think we use RTE_CACHE_LINE_SIZE here? PPC and ThunderX1 targets are cache line size of 128B > +#endif > + > +/* > + * Request id. > + */
Re: [dpdk-dev] [PATCH v2 08/15] net/avp: device initialization
On Sun, Feb 26, 2017 at 02:08:56PM -0500, Allain Legacy wrote: > Adds support for initialization newly probed AVP PCI devices. Initial > queue translations are setup in preparation for device configuration. > > Signed-off-by: Allain Legacy > Signed-off-by: Matt Peters > --- > drivers/net/avp/avp_ethdev.c | 757 > +++ > 1 file changed, 757 insertions(+) > > diff --git a/drivers/net/avp/avp_ethdev.c b/drivers/net/avp/avp_ethdev.c > index 6522555..3bbff33 100644 > --- a/drivers/net/avp/avp_ethdev.c > +++ b/drivers/net/avp/avp_ethdev.c > @@ -60,6 +60,8 @@ > #include "avp_logs.h" > > > +static int avp_dev_create(struct rte_pci_device *pci_dev, > + struct rte_eth_dev *eth_dev); > > static int eth_avp_dev_init(struct rte_eth_dev *eth_dev); > static int eth_avp_dev_uninit(struct rte_eth_dev *eth_dev); > @@ -103,6 +105,16 @@ > }; > > > +/**@{ AVP device flags */ > +#define RTE_AVP_F_PROMISC (1 << 1) > +#define RTE_AVP_F_CONFIGURED (1 << 2) > +#define RTE_AVP_F_LINKUP (1 << 3) > +#define RTE_AVP_F_DETACHED (1 << 4) > +/**@} */ > + > +/* Ethernet device validation marker */ > +#define RTE_AVP_ETHDEV_MAGIC 0x92972862 I think, we don't need to add RTE_ for internal flags and PMD APIs etc. > + > /* > * Defines the AVP device attributes which are attached to an RTE ethernet > * device > @@ -150,11 +162,726 @@ struct avp_adapter { > struct avp_dev avp; > } __rte_cache_aligned; > > + > +/* 32-bit MMIO register write */ > +#define RTE_AVP_WRITE32(_value, _addr) ((*(uint32_t *)_addr) = (_value)) > + > +/* 32-bit MMIO register read */ > +#define RTE_AVP_READ32(_addr) (*(uint32_t *)(_addr)) Use rte_write32 and rte_read32 API instead. > + > /* Macro to cast the ethernet device private data to a AVP object */ > #define RTE_AVP_DEV_PRIVATE_TO_HW(adapter) \ > (&((struct avp_adapter *)adapter)->avp) > > /* > + * Defines the structure of a AVP device queue for the purpose of handling > the > + * receive and transmit burst callback functions > + */
Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote: > On Thu, Feb 23, 2017 at 05:23:54PM +, Bruce Richardson wrote: > > Users compiling DPDK should not need to know or care about the arrangement > > of cachelines in the rte_ring structure. Therefore just remove the build > > option and set the structures to be always split. For improved > > performance use 128B rather than 64B alignment since it stops the producer > > and consumer data being on adjacent cachelines. > > > > Signed-off-by: Bruce Richardson > > --- > > config/common_base | 1 - > > doc/guides/rel_notes/release_17_05.rst | 6 ++ > > lib/librte_ring/rte_ring.c | 2 -- > > lib/librte_ring/rte_ring.h | 8 ++-- > > 4 files changed, 8 insertions(+), 9 deletions(-) > > > > diff --git a/config/common_base b/config/common_base > > index aeee13e..099ffda 100644 > > --- a/config/common_base > > +++ b/config/common_base > > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y > > # > > CONFIG_RTE_LIBRTE_RING=y > > CONFIG_RTE_LIBRTE_RING_DEBUG=n > > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n > > CONFIG_RTE_RING_PAUSE_REP_COUNT=0 > > > > # > > diff --git a/doc/guides/rel_notes/release_17_05.rst > > b/doc/guides/rel_notes/release_17_05.rst > > index e25ea9f..ea45e0c 100644 > > --- a/doc/guides/rel_notes/release_17_05.rst > > +++ b/doc/guides/rel_notes/release_17_05.rst > > @@ -110,6 +110,12 @@ API Changes > > Also, make sure to start the actual text at the margin. > > = > > > > +* **Reworked rte_ring library** > > + > > + The rte_ring library has been reworked and updated. The following changes > > + have been made to it: > > + > > + * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS`` > > > > ABI Changes > > --- > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c > > index ca0a108..4bc6da1 100644 > > --- a/lib/librte_ring/rte_ring.c > > +++ b/lib/librte_ring/rte_ring.c > > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, > > unsigned count, > > /* compilation-time checks */ > > RTE_BUILD_BUG_ON((sizeof(struct rte_ring) & > > RTE_CACHE_LINE_MASK) != 0); > > -#ifdef RTE_RING_SPLIT_PROD_CONS > > RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) & > > RTE_CACHE_LINE_MASK) != 0); > > -#endif > > RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) & > > RTE_CACHE_LINE_MASK) != 0); > > #ifdef RTE_LIBRTE_RING_DEBUG > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h > > index 72ccca5..04fe667 100644 > > --- a/lib/librte_ring/rte_ring.h > > +++ b/lib/librte_ring/rte_ring.h > > @@ -168,7 +168,7 @@ struct rte_ring { > > uint32_t mask; /**< Mask (size-1) of ring. */ > > volatile uint32_t head; /**< Producer head. */ > > volatile uint32_t tail; /**< Producer tail. */ > > - } prod __rte_cache_aligned; > > + } prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2); > > I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of > RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are cache > line > size of 128B > Sure. However, can you perhaps try a performance test and check to see if there is a performance difference between the two values before I change it? In my tests I see improved performance by having an extra blank cache-line between the producer and consumer data. /Bruce
[dpdk-dev] [PATCH v2 0/2] librte_net: add crc computation support
In some applications, CRC (Cyclic Redundancy Check) needs to be computed or updated during packet processing operations. This patchset adds software implementation of some common standard CRCs (32-bit Ethernet CRC as per Ethernet/[ISO/IEC 8802-3] and 16-bit CCITT-CRC [ITU-T X.25]). Two versions of each 32-bit and 16-bit CRC calculation are proposed. The first version presents a fast and efficient CRC generation on IA processors by using the carry-less multiplication instruction – PCLMULQDQ (i.e SSE4.2 instrinsics). In this implementation, a parallelized folding approach has been used to first reduce an arbitrary length buffer to a small fixed size length buffer (16 bytes) with the help of precomputed constants. The resultant single 16-bytes chunk is further reduced by Barrett reduction method to generate final CRC value. For more details on the implementation, see reference [1]. The second version presents the fallback solution to support the CRC generation without needing any specific support from CPU (for examples- SSE4.2 intrinsics). It is based on generic Look-Up Table(LUT) algorithm that uses precomputed 256 element table as explained in reference[2]. Following APIs have been added; (i) rte_net_crc_init() (ii)rte_net_crc_calc() The first API (i) initalises the data structures required for CRC computation and this api should be used only once in the application before using second API (ii) for 16-bit and 32-bit CRC calculations. References: [1] Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf [2] A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS http://www.ross.net/crc/download/crc_v3.txt v2 changes: - fix build errors for target i686-native-linuxapp-gcc - fix checkpatch warnings Notes: - Build not successful with clang version earlier than 3.7.0 due to missing intrinsics. Refer dpdk known issue section for more details. Jasvinder Singh (2): librte_net: add crc init and compute APIs app/test: add unit test for CRC computation app/test/Makefile | 2 + app/test/test_crc.c| 229 + lib/librte_net/Makefile| 2 + lib/librte_net/rte_net_crc.c | 664 + lib/librte_net/rte_net_crc.h | 101 ++ lib/librte_net/rte_net_version.map | 8 + 6 files changed, 1006 insertions(+) create mode 100644 app/test/test_crc.c create mode 100644 lib/librte_net/rte_net_crc.c create mode 100644 lib/librte_net/rte_net_crc.h -- 2.5.5
[dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs
APIs for initialising and computing the crc (16-bit and 32-bit CRCs) are added. For CRCs calculation, scalar as well as x86 intrinsic(sse4.2) versions are implemented. The scalar version is based on generic Look-Up Table(LUT) algorithm, while x86 intrinsic version uses carry-less multiplication for fast CRC computation. Signed-off-by: Jasvinder Singh --- lib/librte_net/Makefile| 2 + lib/librte_net/rte_net_crc.c | 664 + lib/librte_net/rte_net_crc.h | 101 ++ lib/librte_net/rte_net_version.map | 8 + 4 files changed, 775 insertions(+) create mode 100644 lib/librte_net/rte_net_crc.c create mode 100644 lib/librte_net/rte_net_crc.h diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile index 20cf664..41be751 100644 --- a/lib/librte_net/Makefile +++ b/lib/librte_net/Makefile @@ -39,11 +39,13 @@ EXPORT_MAP := rte_net_version.map LIBABIVER := 1 SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c +SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c # install includes SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h +SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h DEPDIRS-$(CONFIG_RTE_LIBRTE_NET) += lib/librte_eal lib/librte_mbuf diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c new file mode 100644 index 000..78a49dd --- /dev/null +++ b/lib/librte_net/rte_net_crc.c @@ -0,0 +1,664 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2017 Intel Corporation. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include + +/* Macros for printing using RTE_LOG */ +#define RTE_LOGTYPE_CRC RTE_LOGTYPE_USER1 + +/** CRC polynomials */ +#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL +#define CRC16_CCITT_POLYNOMIAL 0x1021U + +typedef int (*rte_net_crc_handler)(struct rte_net_crc_params *); + +static int rte_crc16_ccitt_handler(struct rte_net_crc_params *p); +static int rte_crc32_eth_handler(struct rte_net_crc_params *p); +static int rte_crc_invalid_handler(struct rte_net_crc_params *p); + +static rte_net_crc_handler *handlers; + +static rte_net_crc_handler handlers_scalar[] = { + [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler, + [RTE_NET_CRC32_ETH] = rte_crc32_eth_handler, + [RTE_NET_CRC_REQS] = rte_crc_invalid_handler, +}; + +int +rte_crc_invalid_handler(__rte_unused struct rte_net_crc_params *p) +{ + RTE_LOG(ERR, CRC, "CRC type not supported!\n"); + return -1; /* Error */ +} + +#if defined RTE_ARCH_X86_64 && defined RTE_MACHINE_CPUFLAG_SSE4_2 + +#include + +/** PCLMULQDQ CRC computation context structure */ +struct crc_pclmulqdq_ctx { + __m128i rk1_rk2; + __m128i rk5_rk6; + __m128i rk7_rk8; +}; + +struct crc_pclmulqdq_ctx crc32_eth_pclmulqdq __rte_aligned(16); +struct crc_pclmulqdq_ctx crc16_ccitt_pclmulqdq __rte_aligned(16); +/** + * @brief Performs one folding round + * + * Logically function operates as follows: + * DATA = READ_NEXT_16BYTES(); + * F1 = LSB8(FOLD) + * F2 = MSB8(FOLD) + * T1 = CLMUL(F1, RK1) + * T2 = CLMUL(F2, RK2) + * FOLD = XOR(T1, T2, DATA) + * + * @param data_block 16 byte data block + * @param precomp precomputed rk1 constanst + * @param fold running 16 byte folded data + * + * @
[dpdk-dev] [PATCH v2 2/2] app/test: add unit test for CRC computation
This patch provides a set of tests for verifying the functional correctness of 16-bit and 32-bit CRC APIs. Signed-off-by: Jasvinder Singh --- app/test/Makefile | 2 + app/test/test_crc.c | 229 2 files changed, 231 insertions(+) create mode 100644 app/test/test_crc.c diff --git a/app/test/Makefile b/app/test/Makefile index 1a5e03d..2a497f7 100644 --- a/app/test/Makefile +++ b/app/test/Makefile @@ -160,6 +160,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_cirbuf.c SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_string.c SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += test_cmdline_lib.c +SRCS-$(CONFIG_RTE_LIBRTE_NET) += test_crc.c + ifeq ($(CONFIG_RTE_LIBRTE_SCHED),y) SRCS-y += test_red.c SRCS-y += test_sched.c diff --git a/app/test/test_crc.c b/app/test/test_crc.c new file mode 100644 index 000..17c9161 --- /dev/null +++ b/app/test/test_crc.c @@ -0,0 +1,229 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2017 Intel Corporation. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include + +#include "test.h" + +#define MBUF_DATA_SIZE 2048 +#define NB_MBUF 64 +#define CRC_VEC_LEN32 +#define CRC32_VEC_LEN1 1512 +#define CRC32_VEC_LEN2 348 +#define CRC16_VEC_LEN1 12 +#define CRC16_VEC_LEN2 2 + +/* CRC test vector */ +static const uint8_t crc_vec[CRC_VEC_LEN] = { + '0', '1', '2', '3', '4', '5', '6', '7', + '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', + 'g', 'h', 'i', 'j', 'A', 'B', 'C', 'D', + 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', +}; + +/* 32-bit CRC test vector */ +static const uint8_t crc32_vec1[12] = { + 0xBE, 0xD7, 0x23, 0x47, 0x6B, 0x8F, + 0xB3, 0x14, 0x5E, 0xFB, 0x35, 0x59, +}; + +/* 16-bit CRC test vector 1*/ +static const uint8_t crc16_vec1[CRC16_VEC_LEN1] = { + 0x0D, 0x01, 0x01, 0x23, 0x45, 0x67, + 0x89, 0x01, 0x23, 0x45, 0x00, 0x01, +}; + +/* 16-bit CRC test vector 2*/ +static const uint8_t crc16_vec2[CRC16_VEC_LEN2] = { + 0x03, 0x3f, +}; +/** CRC results */ +static const uint32_t crc32_vec_res = 0xb491aab4; +static const uint32_t crc32_vec1_res = 0xac54d294; +static const uint32_t crc32_vec2_res = 0xefaae02f; +static const uint32_t crc16_vec_res = 0x6bec; +static const uint16_t crc16_vec1_res = 0x8cdd; +static const uint16_t crc16_vec2_res = 0xec5b; + +static int +crc_calc(struct rte_mempool *mp, + const uint8_t *vec, + uint32_t vec_len, + uint32_t crc_r, + enum rte_net_crc_type type) +{ + struct rte_net_crc_params p; + uint8_t *data; + + /* alloc first mbuf from the mempool */ + p.mbuf = rte_pktmbuf_alloc(mp); + if (p.mbuf == NULL) { + printf("rte_pktmbuf_alloc() failed!\n"); + return -1; + } + + /* copy parameters */ + p.type = type; + p.data_offset = 0; + p.data_len = vec_len; + + /* append data length to an mbuf */ + data = (uint8_t *)rte_pktmbuf_append(p.mbuf, p.data_len); + + /* copy ref_vector */ + rte_memcpy(data, vec, p.data_len); + + /* dump mbuf data on console*/ + rte_pktmbuf_dump(stdout, p.mbuf, p.data_len); + + /* compute CRC */ + int ret = rte_net_crc_calc(&p); + + if (crc_r != (uint32_t) ret) { + rte_pktmbuf_f
Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
On Tue, Feb 28, 2017 at 11:57:03AM +, Bruce Richardson wrote: > On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote: > > On Thu, Feb 23, 2017 at 05:23:54PM +, Bruce Richardson wrote: > > > Users compiling DPDK should not need to know or care about the arrangement > > > of cachelines in the rte_ring structure. Therefore just remove the build > > > option and set the structures to be always split. For improved > > > performance use 128B rather than 64B alignment since it stops the producer > > > and consumer data being on adjacent cachelines. > > > > > > Signed-off-by: Bruce Richardson > > > --- > > > config/common_base | 1 - > > > doc/guides/rel_notes/release_17_05.rst | 6 ++ > > > lib/librte_ring/rte_ring.c | 2 -- > > > lib/librte_ring/rte_ring.h | 8 ++-- > > > 4 files changed, 8 insertions(+), 9 deletions(-) > > > > > > diff --git a/config/common_base b/config/common_base > > > index aeee13e..099ffda 100644 > > > --- a/config/common_base > > > +++ b/config/common_base > > > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y > > > # > > > CONFIG_RTE_LIBRTE_RING=y > > > CONFIG_RTE_LIBRTE_RING_DEBUG=n > > > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n > > > CONFIG_RTE_RING_PAUSE_REP_COUNT=0 > > > > > > # > > > diff --git a/doc/guides/rel_notes/release_17_05.rst > > > b/doc/guides/rel_notes/release_17_05.rst > > > index e25ea9f..ea45e0c 100644 > > > --- a/doc/guides/rel_notes/release_17_05.rst > > > +++ b/doc/guides/rel_notes/release_17_05.rst > > > @@ -110,6 +110,12 @@ API Changes > > > Also, make sure to start the actual text at the margin. > > > = > > > > > > +* **Reworked rte_ring library** > > > + > > > + The rte_ring library has been reworked and updated. The following > > > changes > > > + have been made to it: > > > + > > > + * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS`` > > > > > > ABI Changes > > > --- > > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c > > > index ca0a108..4bc6da1 100644 > > > --- a/lib/librte_ring/rte_ring.c > > > +++ b/lib/librte_ring/rte_ring.c > > > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, > > > unsigned count, > > > /* compilation-time checks */ > > > RTE_BUILD_BUG_ON((sizeof(struct rte_ring) & > > > RTE_CACHE_LINE_MASK) != 0); > > > -#ifdef RTE_RING_SPLIT_PROD_CONS > > > RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) & > > > RTE_CACHE_LINE_MASK) != 0); > > > -#endif > > > RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) & > > > RTE_CACHE_LINE_MASK) != 0); > > > #ifdef RTE_LIBRTE_RING_DEBUG > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h > > > index 72ccca5..04fe667 100644 > > > --- a/lib/librte_ring/rte_ring.h > > > +++ b/lib/librte_ring/rte_ring.h > > > @@ -168,7 +168,7 @@ struct rte_ring { > > > uint32_t mask; /**< Mask (size-1) of ring. */ > > > volatile uint32_t head; /**< Producer head. */ > > > volatile uint32_t tail; /**< Producer tail. */ > > > - } prod __rte_cache_aligned; > > > + } prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2); > > > > I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of > > RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are cache > > line > > size of 128B > > > Sure. > > However, can you perhaps try a performance test and check to see if > there is a performance difference between the two values before I change > it? In my tests I see improved performance by having an extra blank > cache-line between the producer and consumer data. Sure. Which test are you running to measure the performance difference? Is it app/test/test_ring_perf.c? > > /Bruce
Re: [dpdk-dev] [PATCH v2 1/2] librte_net: add crc init and compute APIs
On Tue, Feb 28, 2017 at 12:08:20PM +, Jasvinder Singh wrote: > APIs for initialising and computing the crc (16-bit and 32-bit CRCs) > are added. For CRCs calculation, scalar as well as x86 intrinsic(sse4.2) > versions are implemented. > > The scalar version is based on generic Look-Up Table(LUT) algorithm, > while x86 intrinsic version uses carry-less multiplication for > fast CRC computation. > > Signed-off-by: Jasvinder Singh > --- > lib/librte_net/Makefile| 2 + > lib/librte_net/rte_net_crc.c | 664 > + > lib/librte_net/rte_net_crc.h | 101 ++ > lib/librte_net/rte_net_version.map | 8 + > 4 files changed, 775 insertions(+) > create mode 100644 lib/librte_net/rte_net_crc.c > create mode 100644 lib/librte_net/rte_net_crc.h > > diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile > index 20cf664..41be751 100644 > --- a/lib/librte_net/Makefile > +++ b/lib/librte_net/Makefile > @@ -39,11 +39,13 @@ EXPORT_MAP := rte_net_version.map > LIBABIVER := 1 > > SRCS-$(CONFIG_RTE_LIBRTE_NET) := rte_net.c > +SRCS-$(CONFIG_RTE_LIBRTE_NET) += rte_net_crc.c > > # install includes > SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h > SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h > SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h > +SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h > > DEPDIRS-$(CONFIG_RTE_LIBRTE_NET) += lib/librte_eal lib/librte_mbuf > > diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c > new file mode 100644 > index 000..78a49dd > --- /dev/null > +++ b/lib/librte_net/rte_net_crc.c > @@ -0,0 +1,664 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2017 Intel Corporation. > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#include > +#include > + > +/* Macros for printing using RTE_LOG */ > +#define RTE_LOGTYPE_CRC RTE_LOGTYPE_USER1 > + > +/** CRC polynomials */ > +#define CRC32_ETH_POLYNOMIAL 0x04c11db7UL > +#define CRC16_CCITT_POLYNOMIAL 0x1021U > + > +typedef int (*rte_net_crc_handler)(struct rte_net_crc_params *); > + > +static int rte_crc16_ccitt_handler(struct rte_net_crc_params *p); > +static int rte_crc32_eth_handler(struct rte_net_crc_params *p); > +static int rte_crc_invalid_handler(struct rte_net_crc_params *p); > + > +static rte_net_crc_handler *handlers; > + > +static rte_net_crc_handler handlers_scalar[] = { > + [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler, > + [RTE_NET_CRC32_ETH] = rte_crc32_eth_handler, > + [RTE_NET_CRC_REQS] = rte_crc_invalid_handler, > +}; > + > +int > +rte_crc_invalid_handler(__rte_unused struct rte_net_crc_params *p) > +{ > + RTE_LOG(ERR, CRC, "CRC type not supported!\n"); > + return -1; /* Error */ > +} > + > +#if defined RTE_ARCH_X86_64 && defined RTE_MACHINE_CPUFLAG_SSE4_2 Could you please abstract the vector function and move the SSE implementation to separate file for future clean neon and altivec integration. Reference: lib/librte_lpm/rte_lpm_sse.h
Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
On Tue, 28 Feb 2017 11:48:20 +, "Ananyev, Konstantin" wrote: > > > > On Tue, 28 Feb 2017 10:29:41 +, "Ananyev, Konstantin" > > wrote: > > > > > > > > Hi, > > > > > > > > On Tue, 28 Feb 2017 09:05:07 +, "Ananyev, Konstantin" > > > > wrote: > > > > > Hi everyone, > > > > > > > > > > > > > > > > > > > In my opinion, if we have the room in the first cache > > > > > > > line, we should put it there. The only argument I see > > > > > > > against is "we may find something more important in the > > > > > > > future, and we won't have room for it in the first cache > > > > > > > line". I don't feel we should penalize today's use cases > > > > > > > for hypothetic future use cases. > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 2. timestamp normalization point > > > > > > >> inside PMD RX vs somewhere later as user needs it > > > > > > >> (extra function in dev_ops?). > > > > > > > > > > > > > > This point could be changed. My initial proposition tries > > > > > > > to provide a generic API for timestamp. Let me remind it > > > > > > > here: > > > > > > > > > > > > > > a- the timestamp is in nanosecond > > > > > > > b- the reference is always the same for a given path: if > > > > > > > the timestamp is set in a PMD, all the packets for this > > > > > > > PMD will have the same reference, but for 2 different > > > > > > > PMDs (or a sw lib), the reference would not be the same. > > > > > > > > > > > > > > We may remove a-, and just have: > > > > > > > - the reference and the unit are always the same for a > > > > > > > given path: if the timestamp is set in a PMD, all the > > > > > > > packets for this PMD will have the same reference and > > > > > > > unit, but for 2 different PMDs (or a sw lib), they would > > > > > > > not be the same. > > > > > > > > > > > > > > In both cases, we would need a conversion code (maybe in a > > > > > > > library) if the application wants to work with timestamps > > > > > > > from several sources. The second solution removes the > > > > > > > normalization code in the PMD when not needed, it is > > > > > > > probably better. > > > > > > > > > > > > I agree. > > > > > > > > > > One question - does that mean that application would need to > > > > > keep a track from what PMD each particular packet came to do > > > > > the normalization? Konstantin > > > > > > > > I'd say yes. It does not look very difficult to do, since the > > > > mbuf contains the input port id. > > > > > > > > > > I understand that we can use mbuf->port here, but it means that > > > we'll introduce new implicit dependency between timestamp and > > > port values. From my point that introduces new implications: > > > 1. all PMDs that do set a timestamp would also have to set port > > > value too. Probably not a big deal as most of PMDs do set port > > > value anyway right now, but it means it would be hard to get > > > rid/change mbuf->port in future. > > > > Currently, all PMDs must set m->port. > > If in the future we remove m->port, the applications that use it > > will need to store the value in a mbuf metadata, or pass it as > > arguments through function calls. > > > > > > > 2. Applications would not allowed to change mbuf->port value > > > before normalization is done (from what I heard some apps do > > > update mbuf->port to store routing decisions). BTW, how the app > > > would keep track which mbufs were already normalized, and which > > > were not? > > > > I don't think it should be allowed to change m->port value. > > As far as I know it is allowed right now. > PMD RX routine sets mbuf->port, after that application is free to use > it in a way it likes. The descriptor or m->port is "Input port". If the applications stores something else than the input port, it is its responsibility if it breaks something else. Like changing any other field to put something that does not match the description. > What we are introducing here is basically a new dependency between 2 > mbuf fields and new restriction. On the other hand, there is no strong dependency: the API to do the normalization can take the port as a parameter. > > Another thing that doesn't look very convenient to me here - > We can have 2 different values of timestamp (both normalized and not) > and there is no clear way for the application to know which one is in > use right now. So each app writer would have to come-up with his own > solution. It depends: - the solution you describe is to have the application storing the normalized value in its private metadata. - another solution would be to store the normalized value in m->timestamp. In this case, we would need a flag to tell if the timestamp value is normalized. The problem pointed out by Jan is that doing the timestamp normalization may take some CPU cycles, even if a small part of packets requires it. > > > Applications that > > are doing this are responsible of what they change. > > > > > > > 3. In theory with
[dpdk-dev] [PATCH] net/ixgbe: remove invalid declaration
The function is not defined anywhere, remove it. Fixes: 0eb609239efd ("ixgbe: enable Rx queue interrupts for PF and VF") Signed-off-by: Olivier Matz --- drivers/net/ixgbe/ixgbe_ethdev.h | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h index 680d5d9..c13b10e 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.h +++ b/drivers/net/ixgbe/ixgbe_ethdev.h @@ -514,7 +514,6 @@ uint32_t ixgbe_dev_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id); int ixgbe_dev_rx_descriptor_done(void *rx_queue, uint16_t offset); -int ixgbevf_dev_rx_descriptor_done(void *rx_queue, uint16_t offset); int ixgbe_dev_rx_init(struct rte_eth_dev *dev); -- 2.8.1
Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
On Tue, Feb 28, 2017 at 05:38:34PM +0530, Jerin Jacob wrote: > On Tue, Feb 28, 2017 at 11:57:03AM +, Bruce Richardson wrote: > > On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote: > > > On Thu, Feb 23, 2017 at 05:23:54PM +, Bruce Richardson wrote: > > > > Users compiling DPDK should not need to know or care about the > > > > arrangement > > > > of cachelines in the rte_ring structure. Therefore just remove the build > > > > option and set the structures to be always split. For improved > > > > performance use 128B rather than 64B alignment since it stops the > > > > producer > > > > and consumer data being on adjacent cachelines. > > > > > > > > Signed-off-by: Bruce Richardson > > > > --- > > > > config/common_base | 1 - > > > > doc/guides/rel_notes/release_17_05.rst | 6 ++ > > > > lib/librte_ring/rte_ring.c | 2 -- > > > > lib/librte_ring/rte_ring.h | 8 ++-- > > > > 4 files changed, 8 insertions(+), 9 deletions(-) > > > > > > > > diff --git a/config/common_base b/config/common_base > > > > index aeee13e..099ffda 100644 > > > > --- a/config/common_base > > > > +++ b/config/common_base > > > > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y > > > > # > > > > CONFIG_RTE_LIBRTE_RING=y > > > > CONFIG_RTE_LIBRTE_RING_DEBUG=n > > > > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n > > > > CONFIG_RTE_RING_PAUSE_REP_COUNT=0 > > > > > > > > # > > > > diff --git a/doc/guides/rel_notes/release_17_05.rst > > > > b/doc/guides/rel_notes/release_17_05.rst > > > > index e25ea9f..ea45e0c 100644 > > > > --- a/doc/guides/rel_notes/release_17_05.rst > > > > +++ b/doc/guides/rel_notes/release_17_05.rst > > > > @@ -110,6 +110,12 @@ API Changes > > > > Also, make sure to start the actual text at the margin. > > > > = > > > > > > > > +* **Reworked rte_ring library** > > > > + > > > > + The rte_ring library has been reworked and updated. The following > > > > changes > > > > + have been made to it: > > > > + > > > > + * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS`` > > > > > > > > ABI Changes > > > > --- > > > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c > > > > index ca0a108..4bc6da1 100644 > > > > --- a/lib/librte_ring/rte_ring.c > > > > +++ b/lib/librte_ring/rte_ring.c > > > > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char > > > > *name, unsigned count, > > > > /* compilation-time checks */ > > > > RTE_BUILD_BUG_ON((sizeof(struct rte_ring) & > > > > RTE_CACHE_LINE_MASK) != 0); > > > > -#ifdef RTE_RING_SPLIT_PROD_CONS > > > > RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) & > > > > RTE_CACHE_LINE_MASK) != 0); > > > > -#endif > > > > RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) & > > > > RTE_CACHE_LINE_MASK) != 0); > > > > #ifdef RTE_LIBRTE_RING_DEBUG > > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h > > > > index 72ccca5..04fe667 100644 > > > > --- a/lib/librte_ring/rte_ring.h > > > > +++ b/lib/librte_ring/rte_ring.h > > > > @@ -168,7 +168,7 @@ struct rte_ring { > > > > uint32_t mask; /**< Mask (size-1) of ring. */ > > > > volatile uint32_t head; /**< Producer head. */ > > > > volatile uint32_t tail; /**< Producer tail. */ > > > > - } prod __rte_cache_aligned; > > > > + } prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2); > > > > > > I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of > > > RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are > > > cache line > > > size of 128B > > > > > Sure. > > > > However, can you perhaps try a performance test and check to see if > > there is a performance difference between the two values before I change > > it? In my tests I see improved performance by having an extra blank > > cache-line between the producer and consumer data. > > Sure. Which test are you running to measure the performance difference? > Is it app/test/test_ring_perf.c? > > > Yep, just the basic ring perf test. I look mostly at the core-to-core numbers, since hyperthread-to-hyperthread or NUMA socket to NUMA socket would be far less common use cases IMHO. /Bruce
Re: [dpdk-dev] [PATCH v2] tools: add tags and cscope index file generation support
On Mon, Feb 27, 2017 at 03:18:52PM +0100, Thomas Monjalon wrote: > 2017-01-17 14:11, Jerin Jacob: > > This script generates cscope, gtags, and tags > > index files based on EAL environment. > > (architecture and OS(linux/bsd)) > > > > Selection of the architecture and OS environment > > is based on dpdk configuration target(T=) > > What is the purpose of selecting a configuration? > Is it to go quicker in the implementation you are interested in? Yes. That is the hard part to do. > In that case, I think we need a catch-all option, because I like > being prompted by vim that several implementations exist and I can > choose one of them. OK. Then we can make T= as optional and if T= is not specified then script can take all the source files. Thoughts? > > > example usage: > > make tags T=x86_64-native-linuxapp-gcc > > make cscope T=x86_64-native-linuxapp-gcc > > make gtags T=x86_64-native-linuxapp-gcc > > > > Signed-off-by: Jerin Jacob > > Reviewed-by: Yuanhan Liu > > Reviewed-by: Ferruh Yigit > [...] > > .gitignore| 8 ++ > > devtools/tags.sh | 251 > > ++ > > mk/rte.sdkroot.mk | 4 + > > 3 files changed, 263 insertions(+) > > I think build-tags.sh would be a better name. OK > > On the implementation, I have few comments: > - is there a way to re-use the skip list when including the related > files? > - you can remove tile from this patch OK > And for the details: > - why a bash shebang where /bin/sh would be wider? I will change to /bin/sh > - verbose option should be -v (with getopts) OK > - please use $() instead of backquotes OK > - please avoid one-line functions used only once like doctags OK > > I can help you with this script if needed. OK. I can work on this next week, if you have time then feel free to take up this. >
Re: [dpdk-dev] [PATCH v2 0/4] New crypto algorithm string parser API
> -Original Message- > From: De Lara Guarch, Pablo > Sent: Monday, February 27, 2017 2:39 PM > To: Doherty, Declan > Cc: dev@dpdk.org; De Lara Guarch, Pablo > Subject: [PATCH v2 0/4] New crypto algorithm string parser API > > Last release, an array with strings for the supported algorithms > by the cryptodev library was added and used in the crypto-perf app. > > This patchset creates a new API to parse strings from the user, > to select the desired algorithm (using the array above), > which can be used by any application, making it consistent across > all the applications (now, L2fwd-crypto and crypto-perf apps are > using different strings). > > Changes in v2: > > - Modified L2fwd-crypto document to reflect the changes > > Pablo de Lara (4): > cryptodev: add missing algorithm strings > cryptodev: add algorithm string parsers > app/crypto-perf: use cryptodev algorithm parser > examples/l2fwd-crypto: use cryptodev algorithm parser > > app/test-crypto-perf/cperf_options_parsing.c | 206 > ++--- > doc/guides/sample_app_ug/l2_forward_crypto.rst | 10 +- > examples/l2fwd-crypto/main.c | 85 ++ > lib/librte_cryptodev/rte_cryptodev.c | 38 + > lib/librte_cryptodev/rte_cryptodev.h | 30 > lib/librte_cryptodev/rte_cryptodev_version.map | 8 + > 6 files changed, 105 insertions(+), 272 deletions(-) > > -- > 2.7.4 Applied to dpdk-next-crypto. Pablo
Re: [dpdk-dev] [PATCH v5 03/26] eal: No panic on hugepages info init
On Mon, Feb 27, 2017 at 11:17:48AM -0500, Aaron Conole wrote: > When attempting to scan hugepages, signal to the eal.c that an error has > occurred, rather than performing a panic. > > Signed-off-by: Aaron Conole > --- > lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c > b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c > index 18858e2..4d47eaf 100644 > --- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c > +++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c > @@ -283,9 +283,11 @@ eal_hugepage_info_init(void) > struct dirent *dirent; > > dir = opendir(sys_dir_path); > - if (dir == NULL) > - rte_panic("Cannot open directory %s to read system hugepage " > + if (dir == NULL) { > + RTE_LOG(ERR, EAL, "Cannot open directory %s to read system > hugepage " > "info\n", sys_dir_path); > + return -1; > + } Minor nit. The error message should go on a line on its own, without any breaks to make it easy to "grep". This should also eliminate the checkpatch complaint about it being too long. /Bruce
Re: [dpdk-dev] [PATCH v5 08/26] eal: do not panic on memzone initialization fails
On Mon, Feb 27, 2017 at 11:17:53AM -0500, Aaron Conole wrote: > When memzone initialization fails, report the error to the calling > application rather than panic(). Without a good way of detaching / > releasing hugepages, at this point the application will have to restart. > > Signed-off-by: Aaron Conole > --- > lib/librte_eal/linuxapp/eal/eal.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/lib/librte_eal/linuxapp/eal/eal.c > b/lib/librte_eal/linuxapp/eal/eal.c > index a671ed4..1e54ca1 100644 > --- a/lib/librte_eal/linuxapp/eal/eal.c > +++ b/lib/librte_eal/linuxapp/eal/eal.c > @@ -839,8 +839,11 @@ rte_eal_init(int argc, char **argv) > /* the directories are locked during eal_hugepage_info_init */ > eal_hugedirs_unlock(); > > - if (rte_eal_memzone_init() < 0) > - rte_panic("Cannot init memzone\n"); > + if (rte_eal_memzone_init() < 0) { > + RTE_LOG(ERR, EAL, "Cannot init memzone\n"); Any particular reason why not "rte_eal_init_alert" as with the other cases? /Bruce
Re: [dpdk-dev] [PATCH v5 00/26] linux/eal: Remove most causes of panic on init
On Mon, Feb 27, 2017 at 11:17:45AM -0500, Aaron Conole wrote: > In many cases, it's enough to simply let the application know that the > call to initialize DPDK has failed. A complete halt can then be > decided by the application based on error returned (and the app could > even attempt a possible re-attempt after some corrective action by the > user or application). > Set looks pretty good. Just a couple of minor issues, apart from the one checkpatch issue I flagged: * seems to be some inconsistency between using the "rte_eal_init_alert" function and RTE_LOG. If there is some reason why some cases use the function and others don't I think that might need to be called out somewhere. * check-git-log flags a number of minor errors with commit titles and messages - most common is commit titles starting with a capital letter. Otherwise: Series-Acked-by: Bruce Richardson
Re: [dpdk-dev] [PATCH v5 08/26] eal: do not panic on memzone initialization fails
Bruce Richardson writes: > On Mon, Feb 27, 2017 at 11:17:53AM -0500, Aaron Conole wrote: >> When memzone initialization fails, report the error to the calling >> application rather than panic(). Without a good way of detaching / >> releasing hugepages, at this point the application will have to restart. >> >> Signed-off-by: Aaron Conole >> --- >> lib/librte_eal/linuxapp/eal/eal.c | 7 +-- >> 1 file changed, 5 insertions(+), 2 deletions(-) >> >> diff --git a/lib/librte_eal/linuxapp/eal/eal.c >> b/lib/librte_eal/linuxapp/eal/eal.c >> index a671ed4..1e54ca1 100644 >> --- a/lib/librte_eal/linuxapp/eal/eal.c >> +++ b/lib/librte_eal/linuxapp/eal/eal.c >> @@ -839,8 +839,11 @@ rte_eal_init(int argc, char **argv) >> /* the directories are locked during eal_hugepage_info_init */ >> eal_hugedirs_unlock(); >> >> -if (rte_eal_memzone_init() < 0) >> -rte_panic("Cannot init memzone\n"); >> +if (rte_eal_memzone_init() < 0) { >> +RTE_LOG(ERR, EAL, "Cannot init memzone\n"); > > Any particular reason why not "rte_eal_init_alert" as with the other > cases? I only used rte_eal_init_alert() for cases which occur before logging happens, but I am not opposed to switching it for all the cases. I'll swap them all for v6. Thanks again for the review, Bruce! -Aaron
Re: [dpdk-dev] [PATCH v5 03/26] eal: No panic on hugepages info init
Bruce Richardson writes: > On Mon, Feb 27, 2017 at 11:17:48AM -0500, Aaron Conole wrote: >> When attempting to scan hugepages, signal to the eal.c that an error has >> occurred, rather than performing a panic. >> >> Signed-off-by: Aaron Conole >> --- >> lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 6 -- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c >> b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c >> index 18858e2..4d47eaf 100644 >> --- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c >> +++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c >> @@ -283,9 +283,11 @@ eal_hugepage_info_init(void) >> struct dirent *dirent; >> >> dir = opendir(sys_dir_path); >> -if (dir == NULL) >> -rte_panic("Cannot open directory %s to read system hugepage " >> +if (dir == NULL) { >> +RTE_LOG(ERR, EAL, "Cannot open directory %s to read system >> hugepage " >>"info\n", sys_dir_path); >> +return -1; >> +} > > Minor nit. > The error message should go on a line on its own, without any breaks to > make it easy to "grep". This should also eliminate the checkpatch > complaint about it being too long. Yes, will fix it for v6. -Aaron
Re: [dpdk-dev] Pktgen-DPDK build errors on ppc64le
> On Feb 28, 2017, at 2:48 AM, Rahul Lakkireddy > wrote: > > Hi All, > > We are seeing Pktgen-DPDK not building on ppc64le with latest dpdk git > repo. Yes, it looks like Pktgen needs to change to account for 256 cores. It is using uint8_t and needs to change to uint16_t variables. I can only suggest you can change the sizes in the places the compiler gives an error. I will try to get Pktgen updated to fix this issue in the next release. > > Below build errors are seen: > - > [pktgen-dpdk]# make > == lib > == common > CC l2p.o > In file included from /root/pktgen-dpdk/lib/common/l2p.c:88:0: > /root/pktgen-dpdk/lib/common/l2p.h: In function ‘pg_raw_dump_l2p’: > /root/pktgen-dpdk/lib/common/l2p.h:103:2: error: comparison is always true > due to limited range of data type [-Werror=type-limits] > for (i = 0; i < RTE_MAX_LCORE; i++) > ^ > /root/pktgen-dpdk/lib/common/l2p.h:109:2: error: comparison is always true > due to limited range of data type [-Werror=type-limits] > for (j = 0; j < RTE_MAX_LCORE; j++) { > ^ > /root/pktgen-dpdk/lib/common/l2p.h: In function ‘pg_dump_l2p’: > /root/pktgen-dpdk/lib/common/l2p.h:486:2: error: comparison is always true > due to limited range of data type [-Werror=type-limits] > for (lid = 0; lid < RTE_MAX_LCORE; lid++) { > ^ > /root/pktgen-dpdk/lib/common/l2p.c: In function ‘pg_port_matrix_dump’: > /root/pktgen-dpdk/lib/common/l2p.c:477:3: error: large integer implicitly > truncated to unsigned type [-Werror=overflow] > cnt.rxtx = get_map(l2p, pid, RTE_MAX_LCORE); > ^ > /root/pktgen-dpdk/lib/common/l2p.c:488:3: error: large integer implicitly > truncated to unsigned type [-Werror=overflow] > cnt.rxtx = get_map(l2p, pid, RTE_MAX_LCORE); > ^ > cc1: all warnings being treated as errors > make[3]: *** [l2p.o] Error 1 > make[2]: *** [all] Error 2 > make[1]: *** [common] Error 2 > make: *** [lib] Error 2 > - > > By changing CONFIG_RTE_MAX_LCORE to 128 in > defconfig_ppc_64-power8-linuxapp-gcc > file, Pktgen-DPDK then builds fine. > > Thanks, > Rahul Regards, Keith
Re: [dpdk-dev] [RFC 3/8] mbuf: set mbuf fields while in pool
Hi Bruce, On Tue, 24 Jan 2017 15:50:49 +, Bruce Richardson wrote: > On Tue, Jan 24, 2017 at 04:19:28PM +0100, Olivier Matz wrote: > > Set the value of m->refcnt to 1, m->nb_segs to 1 and m->next > > to NULL when the mbuf is stored inside the mempool (unused). > > This is done in rte_pktmbuf_prefree_seg(), before freeing or > > recycling a mbuf. > > > > Before this patch, the value of m->refcnt was expected to be 0 > > while in pool. > > > > The objectives are: > > > > - to avoid drivers to set m->next to NULL in the early Rx path, > > since this field is in the second 64B of the mbuf and its access > > could trigger a cache miss > > > > - rationalize the behavior of raw_alloc/raw_free: one is now the > > symmetric of the other, and refcnt is never changed in these > > functions. > > > > Signed-off-by: Olivier Matz > > --- > > drivers/net/mlx5/mlx5_rxtx.c | 5 ++--- > > drivers/net/mpipe/mpipe_tilegx.c | 1 + > > lib/librte_mbuf/rte_mbuf.c | 2 ++ > > lib/librte_mbuf/rte_mbuf.h | 45 > > +--- 4 files changed, 38 > > insertions(+), 15 deletions(-) > > > /* compat with older versions */ > > __rte_deprecated > > -static inline void __attribute__((always_inline)) > > +static inline void > > __rte_mbuf_raw_free(struct rte_mbuf *m) > > { > > rte_mbuf_raw_free(m); > > @@ -1218,8 +1232,12 @@ static inline void rte_pktmbuf_detach(struct > > rte_mbuf *m) m->data_len = 0; > > m->ol_flags = 0; > > > > - if (rte_mbuf_refcnt_update(md, -1) == 0) > > + if (rte_mbuf_refcnt_update(md, -1) == 0) { > > Minor nit, but in the case that we only have a single reference to the > mbufs, we are always setting that to zero just to re-increment it to 1 > again. > > > + md->next = NULL; > > + md->nb_segs = 1; > > + rte_mbuf_refcnt_set(md, 1); > > rte_mbuf_raw_free(md); > > + } > > } > > > > /** I'm trying to gather the comments that have been made on this patchset. About this one, I think it would be more complex to change the code to avoid to set the refcnt twice: - we would need to duplicate code from rte_mbuf_refcnt_update(), which I think is not a very good idea, due to the big comment - it would make the detach code less readable - it's even not sure that it will be more performant: since rte_mbuf_refcnt_update() is inline, the compiler is probably able to do the simplification by itself. Olivier
Re: [dpdk-dev] [PATCH v2 1/5] test: move tests to separate folder
2017-02-16 14:57, Ferruh Yigit: > This is to logically group unit tests into their own folder, > separating them from "app" folder. > > Hopefully this will make the unit test in DPDK more visible. > > Following binaries moved to "test" folder: > cmdline-test > test-acl > test-pipeline > test<-- various DPDK unit tests > > Signed-off-by: Ferruh Yigit > Acked-by: Bruce Richardson I've fixed the MAINTAINERS file and applied the series, thanks
[dpdk-dev] [PATCH 2/2] net/mlx5: add hardware TSO support for VXLAN and GRE
This commit adds support for hardware TSO for tunneled packets. Signed-off-by: Shahaf Shuler --- drivers/net/mlx5/mlx5_ethdev.c | 4 +++- drivers/net/mlx5/mlx5_rxtx.c | 9 + 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index 3edfd49..4de3595 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -696,7 +696,9 @@ struct priv * if (priv->tso) info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO; if (priv->tunnel_en) - info->tx_offload_capa |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM; + info->tx_offload_capa |= (DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM | + DEV_TX_OFFLOAD_VXLAN_TNL_TSO | + DEV_TX_OFFLOAD_GRE_TNL_TSO); if (priv_get_ifname(priv, &ifname) == 0) info->if_index = if_nametoindex(ifname); /* FIXME: RETA update/query API expects the callee to know the size of diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index 145daa0..98e7205 100644 --- a/drivers/net/mlx5/mlx5_rxtx.c +++ b/drivers/net/mlx5/mlx5_rxtx.c @@ -488,9 +488,18 @@ (1 << txq->wqe_n) * MLX5_WQE_SIZE); unsigned int copy_b; + const uint64_t is_tunneled = + buf->ol_flags & + (PKT_TX_TUNNEL_GRE | +PKT_TX_TUNNEL_VXLAN); tso_header_sz = buf->l2_len + buf->l3_len + buf->l4_len; + + if (is_tunneled && txq->tunnel_en) { + tso_header_sz += buf->outer_l2_len + +buf->outer_l3_len; + } if (unlikely(tso_header_sz > MLX5_MAX_TSO_HEADER)) break; -- 1.8.3.1
[dpdk-dev] [PATCH 1/2] net/mlx5: add hardware checksum offload for tunnel packets
Prior to this commit Tx checksum offload was supported only for the inner headers. This commit adds support for the hardware to compute the checksum for the outer headers as well. The support is for tunneling protocols GRE and VXLAN. Signed-off-by: Shahaf Shuler --- doc/guides/nics/features/mlx5.ini | 2 ++ doc/guides/nics/mlx5.rst | 3 ++- drivers/net/mlx5/mlx5.c | 7 +++ drivers/net/mlx5/mlx5.h | 2 ++ drivers/net/mlx5/mlx5_ethdev.c| 2 ++ drivers/net/mlx5/mlx5_prm.h | 6 ++ drivers/net/mlx5/mlx5_rxtx.c | 14 +- drivers/net/mlx5/mlx5_rxtx.h | 2 ++ drivers/net/mlx5/mlx5_txq.c | 2 ++ 9 files changed, 38 insertions(+), 2 deletions(-) diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini index c6948cb..c2e1c6a 100644 --- a/doc/guides/nics/features/mlx5.ini +++ b/doc/guides/nics/features/mlx5.ini @@ -27,6 +27,8 @@ CRC offload = Y VLAN offload = Y L3 checksum offload = Y L4 checksum offload = Y +Inner L3 checksum= Y +Inner L4 checksum= Y Packet type parsing = Y Basic stats = Y Stats per queue = Y diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index 8651456..a9fab9c 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -91,13 +91,14 @@ Features - KVM and VMware ESX SR-IOV modes are supported. - RSS hash result is supported. - Hardware TSO. +- Hardware checksum TX offload for VXLAN and GRE. Limitations --- - Inner RSS for VXLAN frames is not supported yet. - Port statistics through software counters only. -- Hardware checksum offloads for VXLAN inner header are not supported yet. +- Hardware checksum RX offloads for VXLAN inner header are not supported yet. - Secondary process RX is not supported. Configuration diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index 3623fbe..ffa16bd 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -375,6 +375,7 @@ struct ibv_device_attr device_attr; unsigned int sriov; unsigned int mps; + unsigned int tunnel_en; int idx; int i; @@ -429,12 +430,17 @@ * as all ConnectX-5 devices. */ switch (pci_dev->id.device_id) { + case PCI_DEVICE_ID_MELLANOX_CONNECTX4: + tunnel_en = 1; + mps = 0; + break; case PCI_DEVICE_ID_MELLANOX_CONNECTX4LX: case PCI_DEVICE_ID_MELLANOX_CONNECTX5: case PCI_DEVICE_ID_MELLANOX_CONNECTX5VF: case PCI_DEVICE_ID_MELLANOX_CONNECTX5EX: case PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF: mps = 1; + tunnel_en = 1; break; default: mps = 0; @@ -539,6 +545,7 @@ priv->mtu = ETHER_MTU; priv->mps = mps; /* Enable MPW by default if supported. */ priv->cqe_comp = 1; /* Enable compression by default. */ + priv->tunnel_en = tunnel_en; err = mlx5_args(priv, pci_dev->device.devargs); if (err) { ERROR("failed to process device arguments: %s", diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index d2bb835..7ba2886 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -127,6 +127,8 @@ struct priv { unsigned int cqe_comp:1; /* Whether CQE compression is enabled. */ unsigned int pending_alarm:1; /* An alarm is pending. */ unsigned int tso:1; /* Whether TSO is supported. */ + unsigned int tunnel_en:1; + /* Whether Tx offloads for tunneled packets are supported. */ unsigned int max_tso_payload_sz; /* Maximum TCP payload for TSO. */ unsigned int txq_inline; /* Maximum packet size for inlining. */ unsigned int txqs_inline; /* Queue number threshold for inlining. */ diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c index d56331c..3edfd49 100644 --- a/drivers/net/mlx5/mlx5_ethdev.c +++ b/drivers/net/mlx5/mlx5_ethdev.c @@ -695,6 +695,8 @@ struct priv * DEV_TX_OFFLOAD_TCP_CKSUM); if (priv->tso) info->tx_offload_capa |= DEV_TX_OFFLOAD_TCP_TSO; + if (priv->tunnel_en) + info->tx_offload_capa |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM; if (priv_get_ifname(priv, &ifname) == 0) info->if_index = if_nametoindex(ifname); /* FIXME: RETA update/query API expects the callee to know the size of diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h index 755b5d7..33fc386 100644 --- a/drivers/net/mlx5/mlx5_prm.h +++ b/drivers/net/mlx5/mlx5_prm.h @@ -120,6 +120,12 @@ /* Tunnel packet bit in the CQE. */ #define MLX5_CQE_RX_TUNNEL_PACKET (1u << 0) +/* Inner L3
[dpdk-dev] [PATCH 0/2] net/mlx5: add Tx offloads for tunneled packets
This patchset adds support for hardware TX offloads for tunneled packets. [PATCH 1/2] net/mlx5: add hardware checksum offload for tunnel [PATCH 2/2] net/mlx5: add hardware TSO support for VXLAN and GRE
Re: [dpdk-dev] [PATCH 1/3] mk: clean .lib.cmd files
> Signed-off-by: Ferruh Yigit > --- > mk/rte.lib.mk | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Series merged and applied as one commit, thanks
Re: [dpdk-dev] [PATCH v5] doc: use corelist instead of coremask
> -Original Message- > From: Wiles, Keith > Sent: Monday, February 27, 2017 7:14 PM > To: dev@dpdk.org > Cc: iryz...@nfware.com; thomas.monja...@6wind.com; Mcnamara, John > > Subject: [PATCH v5] doc: use corelist instead of coremask > > The coremask option in DPDK is difficult to use and we should be > promoting the use of the corelist (-l) option. The patch > adjusts the docs to use -l EAL option instead of the -c option. > > The patch only changes the docs and not the code as the -c option > will continue to exist unless it is removed in the future. The -c > option should be kept to maintain backward compatibility. > > Signed-off-by: Keith Wiles Thomas, since this patch affects a lot of the documentation it would be good to merge it as early as possible in the release cycle. Acked-by: John McNamara
Re: [dpdk-dev] [PATCH] app/testpmd: Fix typos
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Nirmoy Das > Sent: Monday, February 27, 2017 7:24 PM > To: dev@dpdk.org > Cc: Nirmoy Das > Subject: [dpdk-dev] [PATCH] app/testpmd: Fix typos > > fixes trivial typos in app/test-pmd/cmdline.c, app/test-pmd/icmpecho.c, > app/test-pmd/testpmd.c > > Signed-off-by: Nirmoy Das Hi Nirmoy, Thanks for that. Just a small heads-up since this is your first patch. The subject line should be lowercase (apart from known abbreviations) and there is no need to list the files in the body since git will record that anyway. Have a look at the contribution guidelines for some more pointers: http://dpdk.org/doc/guides/contributing/patches.html#commit-messages-subject-line Apart from that, thanks for the patch. Acked-by: John McNamara
Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
On Tue, Feb 28, 2017 at 01:52:26PM +, Bruce Richardson wrote: > On Tue, Feb 28, 2017 at 05:38:34PM +0530, Jerin Jacob wrote: > > On Tue, Feb 28, 2017 at 11:57:03AM +, Bruce Richardson wrote: > > > On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote: > > > > On Thu, Feb 23, 2017 at 05:23:54PM +, Bruce Richardson wrote: > > > > > Users compiling DPDK should not need to know or care about the > > > > > arrangement > > > > > of cachelines in the rte_ring structure. Therefore just remove the > > > > > build > > > > > option and set the structures to be always split. For improved > > > > > performance use 128B rather than 64B alignment since it stops the > > > > > producer > > > > > and consumer data being on adjacent cachelines. > > > > > > > > > > Signed-off-by: Bruce Richardson > > > > > --- > > > > > config/common_base | 1 - > > > > > doc/guides/rel_notes/release_17_05.rst | 6 ++ > > > > > lib/librte_ring/rte_ring.c | 2 -- > > > > > lib/librte_ring/rte_ring.h | 8 ++-- > > > > > 4 files changed, 8 insertions(+), 9 deletions(-) > > > > > > > > > > diff --git a/config/common_base b/config/common_base > > > > > index aeee13e..099ffda 100644 > > > > > --- a/config/common_base > > > > > +++ b/config/common_base > > > > > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y > > > > > # > > > > > CONFIG_RTE_LIBRTE_RING=y > > > > > CONFIG_RTE_LIBRTE_RING_DEBUG=n > > > > > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n > > > > > CONFIG_RTE_RING_PAUSE_REP_COUNT=0 > > > > > > > > > > # > > > > > diff --git a/doc/guides/rel_notes/release_17_05.rst > > > > > b/doc/guides/rel_notes/release_17_05.rst > > > > > index e25ea9f..ea45e0c 100644 > > > > > --- a/doc/guides/rel_notes/release_17_05.rst > > > > > +++ b/doc/guides/rel_notes/release_17_05.rst > > > > > @@ -110,6 +110,12 @@ API Changes > > > > > Also, make sure to start the actual text at the margin. > > > > > = > > > > > > > > > > +* **Reworked rte_ring library** > > > > > + > > > > > + The rte_ring library has been reworked and updated. The following > > > > > changes > > > > > + have been made to it: > > > > > + > > > > > + * removed the build-time setting > > > > > ``CONFIG_RTE_RING_SPLIT_PROD_CONS`` > > > > > > > > > > ABI Changes > > > > > --- > > > > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c > > > > > index ca0a108..4bc6da1 100644 > > > > > --- a/lib/librte_ring/rte_ring.c > > > > > +++ b/lib/librte_ring/rte_ring.c > > > > > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char > > > > > *name, unsigned count, > > > > > /* compilation-time checks */ > > > > > RTE_BUILD_BUG_ON((sizeof(struct rte_ring) & > > > > > RTE_CACHE_LINE_MASK) != 0); > > > > > -#ifdef RTE_RING_SPLIT_PROD_CONS > > > > > RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) & > > > > > RTE_CACHE_LINE_MASK) != 0); > > > > > -#endif > > > > > RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) & > > > > > RTE_CACHE_LINE_MASK) != 0); > > > > > #ifdef RTE_LIBRTE_RING_DEBUG > > > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h > > > > > index 72ccca5..04fe667 100644 > > > > > --- a/lib/librte_ring/rte_ring.h > > > > > +++ b/lib/librte_ring/rte_ring.h > > > > > @@ -168,7 +168,7 @@ struct rte_ring { > > > > > uint32_t mask; /**< Mask (size-1) of ring. */ > > > > > volatile uint32_t head; /**< Producer head. */ > > > > > volatile uint32_t tail; /**< Producer tail. */ > > > > > - } prod __rte_cache_aligned; > > > > > + } prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2); > > > > > > > > I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of > > > > RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are > > > > cache line > > > > size of 128B > > > > > > > Sure. > > > > > > However, can you perhaps try a performance test and check to see if > > > there is a performance difference between the two values before I change > > > it? In my tests I see improved performance by having an extra blank > > > cache-line between the producer and consumer data. > > > > Sure. Which test are you running to measure the performance difference? > > Is it app/test/test_ring_perf.c? > > > > > > Yep, just the basic ring perf test. I look mostly at the core-to-core > numbers, since hyperthread-to-hyperthread or NUMA socket to NUMA socket > would be far less common use cases IMHO. Performance test result shows regression with RTE_CACHE_LINE_MIN_SIZE scheme in some use case and some use case has higher performance(Testing using two physical cores) # base code RTE>>ring_perf_autotest ### Testing single element and burst enq/deq ### SP/SC single enq/dequeue: 84 MP/MC single enq/dequeue: 301 SP/SC burst enq/dequeue (size: 8): 20 MP
[dpdk-dev] [PATCH v6 00/26] linux/eal: Remove most causes of panic on init
From: Aaron Conole In many cases, it's enough to simply let the application know that the call to initialize DPDK has failed. A complete halt can then be decided by the application based on error returned (and the app could even attempt a possible re-attempt after some corrective action by the user or application). Changes ->v2: - Audited all "RTE_LOG (" calls that were introduced, and converted to "RTE_LOG(" - Added some fprintf(stderr, "") lines to indicate errors before logging is initialized - Removed assignments to errno. - Changed patch 14/25 to reflect EFAULT, and document in 25/25 Changes ->v3: - Checkpatch issues in patches 3 (spelling mistake), 9 (issue with leading spaces), and 19 (braces around single line statement if-condition) Changes ->v4: - Error text cleanup. - Add a new check around rte_bus_scan(), added during the development of this series. Changes ->v5: - checkpatch.pl cleanup in patch 02/26 - move rte_errno.h include from patch 15 to patch 02 - remove stdbool.h and use int as return type in patch 06/26 Changes ->v6: - convert all of the initialization calls to RTE_LOG() to rte_eal_init_alert() - run through check-git-log and checkpatches - add Bruce's ack to the series I kept the rte_errno reflection, since this is control-path code and the init function returns a sentinel value of -1. Aaron Conole (26): eal: cpu init will no longer panic eal: return error instead of panic for cpu init eal: do not panic on hugepage info init eal: do not panic on failed hugepage query eal: do not panic if parsing args returns error eal-common: introduce a way to query cpu support eal: do not panic when CPU isn't supported eal: do not panic on memzone initialization fails eal: set errno when exiting for already called eal: do not panic on log failures eal: do not panic on PCI-probe eal: do not panic on vfio failure eal: do not panic on memory init eal: do not panic on tailq init eal: do not panic on alarm init eal: convert timer init not to call panic eal: change the private pipe call to reflect errno eal: do not panic on interrupt thread init eal: do not error if plugins fail to init eal_pci: continue probing even on failures eal: do not panic on failed PCI-probe eal_common_dev: continue initializing vdevs eal: do not panic (or abort) if vdev init fails eal: do not panic when bus probe fails eal: do not panic on failed bus scan rte_eal_init: add info about various error codes lib/librte_eal/common/eal_common_cpuflags.c| 13 +- lib/librte_eal/common/eal_common_dev.c | 5 +- lib/librte_eal/common/eal_common_lcore.c | 7 +- lib/librte_eal/common/eal_common_pci.c | 15 ++- lib/librte_eal/common/eal_common_tailqs.c | 3 +- .../common/include/generic/rte_cpuflags.h | 8 ++ lib/librte_eal/common/include/rte_eal.h| 27 - lib/librte_eal/linuxapp/eal/eal.c | 131 +++-- lib/librte_eal/linuxapp/eal/eal_hugepage_info.c| 9 +- lib/librte_eal/linuxapp/eal/eal_interrupts.c | 5 +- 10 files changed, 170 insertions(+), 53 deletions(-) -- 2.9.3
[dpdk-dev] [PATCH v6 01/26] eal: cpu init will no longer panic
After this change, the EAL CPU NUMA node resolution step can no longer emit an rte_panic. This aligns with the code in rte_eal_init, which expects failures to return an error code. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/common/eal_common_lcore.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c index 2cd4132..84fa0cb 100644 --- a/lib/librte_eal/common/eal_common_lcore.c +++ b/lib/librte_eal/common/eal_common_lcore.c @@ -83,16 +83,17 @@ rte_eal_cpu_init(void) config->lcore_role[lcore_id] = ROLE_RTE; lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id); lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id); - if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) + if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) { #ifdef RTE_EAL_ALLOW_INV_SOCKET_ID lcore_config[lcore_id].socket_id = 0; #else - rte_panic("Socket ID (%u) is greater than " + RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than " "RTE_MAX_NUMA_NODES (%d)\n", lcore_config[lcore_id].socket_id, RTE_MAX_NUMA_NODES); + return -1; #endif - + } RTE_LOG(DEBUG, EAL, "Detected lcore %u as " "core %u on socket %u\n", lcore_id, lcore_config[lcore_id].core_id, -- 2.9.3
[dpdk-dev] [PATCH v6 02/26] eal: return error instead of panic for cpu init
There may be no way to gracefully recover, but the application should be notified that a failure happened, rather than completely aborting. This allows the user to proceed with a "slow-path" type solution. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index bf6b818..81692e7 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -61,6 +61,7 @@ #include #include #include +#include #include #include #include @@ -740,6 +741,12 @@ static int rte_eal_vfio_setup(void) } #endif +static void rte_eal_init_alert(const char *msg) +{ + fprintf(stderr, "EAL: FATAL: %s\n", msg); + RTE_LOG(ERR, EAL, "%s\n", msg); +} + /* Launch threads, called at application init(). */ int rte_eal_init(int argc, char **argv) @@ -767,8 +774,11 @@ rte_eal_init(int argc, char **argv) /* set log level as early as possible */ rte_set_log_level(internal_config.log_level); - if (rte_eal_cpu_init() < 0) - rte_panic("Cannot detect lcores\n"); + if (rte_eal_cpu_init() < 0) { + rte_eal_init_alert("Cannot detect lcores."); + rte_errno = ENOTSUP; + return -1; + } fctret = eal_parse_args(argc, argv); if (fctret < 0) -- 2.9.3
[dpdk-dev] [PATCH v6 03/26] eal: do not panic on hugepage info init
When attempting to scan hugepages, signal to the eal that an error has occurred, rather than performing a panic. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c index 18858e2..7a21e8f 100644 --- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c +++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c @@ -283,9 +283,12 @@ eal_hugepage_info_init(void) struct dirent *dirent; dir = opendir(sys_dir_path); - if (dir == NULL) - rte_panic("Cannot open directory %s to read system hugepage " - "info\n", sys_dir_path); + if (dir == NULL) { + RTE_LOG(ERR, EAL, + "Cannot open directory %s to read system hugepage info\n", + sys_dir_path); + return -1; + } for (dirent = readdir(dir); dirent != NULL; dirent = readdir(dir)) { struct hugepage_info *hpi; -- 2.9.3
[dpdk-dev] [PATCH v6 04/26] eal: do not panic on failed hugepage query
If we fail to acquire hugepage information, simply signal an error to the application. This clears the run_once counter, allowing the user or application to take a corrective action and retry. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 81692e7..12bd941 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -787,8 +787,12 @@ rte_eal_init(int argc, char **argv) if (internal_config.no_hugetlbfs == 0 && internal_config.process_type != RTE_PROC_SECONDARY && internal_config.xen_dom0_support == 0 && - eal_hugepage_info_init() < 0) - rte_panic("Cannot get hugepage information\n"); + eal_hugepage_info_init() < 0) { + rte_eal_init_alert("Cannot get hugepage information."); + rte_errno = EACCES; + rte_atomic32_clear(&run_once); + return -1; + } if (internal_config.memory == 0 && internal_config.force_sockets == 0) { if (internal_config.no_hugetlbfs) -- 2.9.3
[dpdk-dev] [PATCH v6 06/26] eal-common: introduce a way to query cpu support
This adds a new API to check for the eal cpu versions. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/common/eal_common_cpuflags.c | 13 +++-- lib/librte_eal/common/include/generic/rte_cpuflags.h | 8 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/common/eal_common_cpuflags.c b/lib/librte_eal/common/eal_common_cpuflags.c index b5f76f7..9a2d080 100644 --- a/lib/librte_eal/common/eal_common_cpuflags.c +++ b/lib/librte_eal/common/eal_common_cpuflags.c @@ -43,6 +43,13 @@ void rte_cpu_check_supported(void) { + if (!rte_cpu_is_supported()) + exit(1); +} + +int +rte_cpu_is_supported(void) +{ /* This is generated at compile-time by the build system */ static const enum rte_cpu_flag_t compile_time_flags[] = { RTE_COMPILE_TIME_CPUFLAGS @@ -57,14 +64,16 @@ rte_cpu_check_supported(void) fprintf(stderr, "ERROR: CPU feature flag lookup failed with error %d\n", ret); - exit(1); + return 0; } if (!ret) { fprintf(stderr, "ERROR: This system does not support \"%s\".\n" "Please check that RTE_MACHINE is set correctly.\n", rte_cpu_get_flag_name(compile_time_flags[i])); - exit(1); + return 0; } } + + return 1; } diff --git a/lib/librte_eal/common/include/generic/rte_cpuflags.h b/lib/librte_eal/common/include/generic/rte_cpuflags.h index 71321f3..8d27031 100644 --- a/lib/librte_eal/common/include/generic/rte_cpuflags.h +++ b/lib/librte_eal/common/include/generic/rte_cpuflags.h @@ -82,4 +82,12 @@ rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature); void rte_cpu_check_supported(void); +/** + * This function checks that the currently used CPU supports the CPU features + * that were specified at compile time. It is called automatically within the + * EAL, so does not need to be used by applications. This version returns a + * result so that decisions may be made (for instance, graceful shutdowns). + */ +int +rte_cpu_is_supported(void); #endif /* _RTE_CPUFLAGS_H_ */ -- 2.9.3
[dpdk-dev] [PATCH v6 05/26] eal: do not panic if parsing args returns error
It's possible that the application could take a corrective action here, and either prompt the user for different arguments, or at least perform a better logging. Exiting this early prevents any useful information gathering from the application layer. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 12bd941..f7511ab 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -781,8 +781,12 @@ rte_eal_init(int argc, char **argv) } fctret = eal_parse_args(argc, argv); - if (fctret < 0) - exit(1); + if (fctret < 0) { + rte_eal_init_alert("Invalid 'command line' arguments."); + rte_errno = EINVAL; + rte_atomic32_clear(&run_once); + return -1; + } if (internal_config.no_hugetlbfs == 0 && internal_config.process_type != RTE_PROC_SECONDARY && -- 2.9.3
[dpdk-dev] [PATCH v6 07/26] eal: do not panic when CPU isn't supported
It's now possible to gracefully exit the application, or for applications which support non-dpdk datapaths working in concert with DPDK datapaths, there no longer is the possibility of exiting for unsupported CPUs. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index f7511ab..a671ed4 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -759,7 +759,11 @@ rte_eal_init(int argc, char **argv) char thread_name[RTE_MAX_THREAD_NAME_LEN]; /* checks if the machine is adequate */ - rte_cpu_check_supported(); + if (!rte_cpu_is_supported()) { + rte_eal_init_alert("unsupported cpu type."); + rte_errno = ENOTSUP; + return -1; + } if (!rte_atomic32_test_and_set(&run_once)) return -1; -- 2.9.3
[dpdk-dev] [PATCH v6 08/26] eal: do not panic on memzone initialization fails
When memzone initialization fails, report the error to the calling application rather than panic(). Without a good way of detaching / releasing hugepages, at this point the application will have to restart. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index a671ed4..5a92b28 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -839,8 +839,11 @@ rte_eal_init(int argc, char **argv) /* the directories are locked during eal_hugepage_info_init */ eal_hugedirs_unlock(); - if (rte_eal_memzone_init() < 0) - rte_panic("Cannot init memzone\n"); + if (rte_eal_memzone_init() < 0) { + rte_eal_init_alert("Cannot init memzone\n"); + rte_errno = ENODEV; + return -1; + } if (rte_eal_tailqs_init() < 0) rte_panic("Cannot init tail queues for objects\n"); -- 2.9.3
[dpdk-dev] [PATCH v6 09/26] eal: set errno when exiting for already called
Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 5a92b28..564cac3 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -765,8 +765,11 @@ rte_eal_init(int argc, char **argv) return -1; } - if (!rte_atomic32_test_and_set(&run_once)) + if (!rte_atomic32_test_and_set(&run_once)) { + rte_eal_init_alert("already called initialization."); + rte_errno = EALREADY; return -1; + } logid = strrchr(argv[0], '/'); logid = strdup(logid ? logid + 1: argv[0]); -- 2.9.3
[dpdk-dev] [PATCH v6 10/26] eal: do not panic on log failures
When log initialization fails, it's generally because the fopencookie failed. While this is rare in practice, it could happen, and it is likely because of memory pressure. So, flag the error, and allow the user to retry. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 564cac3..e1740a6 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -825,8 +825,12 @@ rte_eal_init(int argc, char **argv) rte_config_init(); - if (rte_eal_log_init(logid, internal_config.syslog_facility) < 0) - rte_panic("Cannot init logs\n"); + if (rte_eal_log_init(logid, internal_config.syslog_facility) < 0) { + rte_eal_init_alert("Cannot init logging."); + rte_errno = ENOMEM; + rte_atomic32_clear(&run_once); + return -1; + } if (rte_eal_pci_init() < 0) rte_panic("Cannot init PCI\n"); -- 2.9.3
[dpdk-dev] [PATCH v6 11/26] eal: do not panic on PCI-probe
This will usually be an issue because of permissions. However, it could also be caused by OOM. In either case, errno will contain the underlying cause. It is safe to re-init the system here, so allow the application to take corrective action and reinit. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index e1740a6..d5ef7b5 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -832,8 +832,12 @@ rte_eal_init(int argc, char **argv) return -1; } - if (rte_eal_pci_init() < 0) - rte_panic("Cannot init PCI\n"); + if (rte_eal_pci_init() < 0) { + rte_eal_init_alert("Cannot init PCI\n"); + rte_errno = EUNATCH; + rte_atomic32_clear(&run_once); + return -1; + } #ifdef VFIO_PRESENT if (rte_eal_vfio_setup() < 0) -- 2.9.3
[dpdk-dev] [PATCH v6 12/26] eal: do not panic on vfio failure
Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index d5ef7b5..10eefd3 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -840,8 +840,12 @@ rte_eal_init(int argc, char **argv) } #ifdef VFIO_PRESENT - if (rte_eal_vfio_setup() < 0) - rte_panic("Cannot init VFIO\n"); + if (rte_eal_vfio_setup() < 0) { + rte_eal_init_alert("Cannot init VFIO\n"); + rte_errno = EAGAIN; + rte_atomic32_clear(&run_once); + return -1; + } #endif if (rte_eal_memory_init() < 0) -- 2.9.3
[dpdk-dev] [PATCH v6 15/26] eal: do not panic on alarm init
rte_eal_alarm_init() call uses the linux timerfd framework to create a poll()-able timer using standard posix file operations. This could fail for a few reasons given in the man-pages, but many could be corrected by the user application. No need to panic. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index aa10192..7aa6b6e 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -869,8 +869,11 @@ rte_eal_init(int argc, char **argv) return -1; } - if (rte_eal_alarm_init() < 0) - rte_panic("Cannot init interrupt-handling thread\n"); + if (rte_eal_alarm_init() < 0) { + rte_eal_init_alert("Cannot init interrupt-handling thread\n"); + /* rte_eal_alarm_init sets rte_errno on failure. */ + return -1; + } if (rte_eal_timer_init() < 0) rte_panic("Cannot init HPET or TSC timers\n"); -- 2.9.3
[dpdk-dev] [PATCH v6 14/26] eal: do not panic on tailq init
There are some theoretical racy conditions in the system that _could_ cause early tailq init to fail; however, no need to panic the application. While it can't continue using DPDK, it could make better alerts to the user. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/common/eal_common_tailqs.c | 3 +-- lib/librte_eal/linuxapp/eal/eal.c | 7 +-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/lib/librte_eal/common/eal_common_tailqs.c b/lib/librte_eal/common/eal_common_tailqs.c index bb08ec8..4f69828 100644 --- a/lib/librte_eal/common/eal_common_tailqs.c +++ b/lib/librte_eal/common/eal_common_tailqs.c @@ -188,8 +188,7 @@ rte_eal_tailqs_init(void) if (t->head == NULL) { RTE_LOG(ERR, EAL, "Cannot initialize tailq: %s\n", t->name); - /* no need to TAILQ_REMOVE, we are going to panic in -* rte_eal_init() */ + /* TAILQ_REMOVE not needed, error is already fatal */ goto fail; } } diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index ae0beed..aa10192 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -863,8 +863,11 @@ rte_eal_init(int argc, char **argv) return -1; } - if (rte_eal_tailqs_init() < 0) - rte_panic("Cannot init tail queues for objects\n"); + if (rte_eal_tailqs_init() < 0) { + rte_eal_init_alert("Cannot init tail queues for objects\n"); + rte_errno = EFAULT; + return -1; + } if (rte_eal_alarm_init() < 0) rte_panic("Cannot init interrupt-handling thread\n"); -- 2.9.3
[dpdk-dev] [PATCH v6 13/26] eal: do not panic on memory init
This can only happen when access to hugepages (either as primary or secondary process) fails (and that is usually permissions). Since the manner of failure is not reversible, we cannot allow retry. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 10eefd3..ae0beed 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -848,8 +848,11 @@ rte_eal_init(int argc, char **argv) } #endif - if (rte_eal_memory_init() < 0) - rte_panic("Cannot init memory\n"); + if (rte_eal_memory_init() < 0) { + rte_eal_init_alert("Cannot init memory\n"); + rte_errno = ENOMEM; + return -1; + } /* the directories are locked during eal_hugepage_info_init */ eal_hugedirs_unlock(); -- 2.9.3
[dpdk-dev] [PATCH v6 16/26] eal: convert timer init not to call panic
After code inspection, there is no way for eal_timer_init() to fail. It simply returns 0 in all cases. As such, this test could either go-away or stay here as 'future-proofing'. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 7aa6b6e..73a7d38 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -875,8 +875,11 @@ rte_eal_init(int argc, char **argv) return -1; } - if (rte_eal_timer_init() < 0) - rte_panic("Cannot init HPET or TSC timers\n"); + if (rte_eal_timer_init() < 0) { + rte_eal_init_alert("Cannot init HPET or TSC timers\n"); + rte_errno = ENOTSUP; + return -1; + } eal_check_mem_on_local_socket(); -- 2.9.3
[dpdk-dev] [PATCH v6 17/26] eal: change the private pipe call to reflect errno
There could be some confusion as to why the call failed - this change will always reflect the value of the error in rte_error. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal_interrupts.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c index 92a19cb..5bb833e 100644 --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c @@ -898,13 +898,16 @@ rte_eal_intr_init(void) * create a pipe which will be waited by epoll and notified to * rebuild the wait list of epoll. */ - if (pipe(intr_pipe.pipefd) < 0) + if (pipe(intr_pipe.pipefd) < 0) { + rte_errno = errno; return -1; + } /* create the host thread to wait/handle the interrupt */ ret = pthread_create(&intr_thread, NULL, eal_intr_thread_main, NULL); if (ret != 0) { + rte_errno = ret; RTE_LOG(ERR, EAL, "Failed to create thread for interrupt handling\n"); } else { -- 2.9.3
[dpdk-dev] [PATCH v6 18/26] eal: do not panic on interrupt thread init
When initializing the interrupt thread, there are a number of possible reasons for failure - some of which are correctable by the application. Do not panic() needlessly, and give the application a change to reflect this information to the user. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 73a7d38..d174ad4 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -894,8 +894,10 @@ rte_eal_init(int argc, char **argv) rte_config.master_lcore, (int)thread_id, cpuset, ret == 0 ? "" : "..."); - if (rte_eal_intr_init() < 0) - rte_panic("Cannot init interrupt-handling thread\n"); + if (rte_eal_intr_init() < 0) { + rte_eal_init_alert("Cannot init interrupt-handling thread\n"); + return -1; + } if (rte_bus_scan()) rte_panic("Cannot scan the buses for devices\n"); -- 2.9.3
[dpdk-dev] [PATCH v6 19/26] eal: do not error if plugins fail to init
Plugins are useful and important. However, it seems crazy to abort everything just because they don't initialize properly. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index d174ad4..0ba9766 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -884,7 +884,7 @@ rte_eal_init(int argc, char **argv) eal_check_mem_on_local_socket(); if (eal_plugins_init() < 0) - rte_panic("Cannot init plugins\n"); + rte_eal_init_alert("Cannot init plugins\n"); eal_thread_init_master(rte_config.master_lcore); -- 2.9.3
[dpdk-dev] [PATCH v6 20/26] eal_pci: continue probing even on failures
Some devices may be inaccessible for a variety of reasons, or the PCI-bus may be unavailable causing the whole thing to fail. Still, better to continue attempts at probes. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/common/eal_common_pci.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c index 72547bd..9416190 100644 --- a/lib/librte_eal/common/eal_common_pci.c +++ b/lib/librte_eal/common/eal_common_pci.c @@ -69,6 +69,7 @@ #include #include +#include #include #include #include @@ -416,6 +417,7 @@ rte_eal_pci_probe(void) struct rte_pci_device *dev = NULL; struct rte_devargs *devargs; int probe_all = 0; + int ret_1 = 0; int ret = 0; if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) == 0) @@ -430,17 +432,20 @@ rte_eal_pci_probe(void) /* probe all or only whitelisted devices */ if (probe_all) - ret = pci_probe_all_drivers(dev); + ret_1 = pci_probe_all_drivers(dev); else if (devargs != NULL && devargs->type == RTE_DEVTYPE_WHITELISTED_PCI) - ret = pci_probe_all_drivers(dev); - if (ret < 0) - rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT + ret_1 = pci_probe_all_drivers(dev); + if (ret_1 < 0) { + RTE_LOG(ERR, EAL, "Requested device " PCI_PRI_FMT " cannot be used\n", dev->addr.domain, dev->addr.bus, dev->addr.devid, dev->addr.function); + rte_errno = errno; + ret = 1; + } } - return 0; + return -ret; } /* dump one device */ -- 2.9.3
[dpdk-dev] [PATCH v6 21/26] eal: do not panic on failed PCI-probe
Since PCI isn't neccessarily required, it may be possible to simply log the error and continue on letting the user check the logs and restart the application when things have failed. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 0ba9766..b13e1dd 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -943,8 +943,11 @@ rte_eal_init(int argc, char **argv) rte_panic("Cannot probe devices\n"); /* Probe & Initialize PCI devices */ - if (rte_eal_pci_probe()) - rte_panic("Cannot probe PCI\n"); + if (rte_eal_pci_probe()) { + rte_eal_init_alert("Cannot probe PCI\n"); + rte_errno = ENOTSUP; + return -1; + } if (rte_eal_dev_init() < 0) rte_panic("Cannot init pmd devices\n"); -- 2.9.3
[dpdk-dev] [PATCH v6 22/26] eal_common_dev: continue initializing vdevs
Even if one vdev should fail, there's no need to prevent further processing. Log the error, and reflect it to the higher levels to decide. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/common/eal_common_dev.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c index 4f3b493..9889997 100644 --- a/lib/librte_eal/common/eal_common_dev.c +++ b/lib/librte_eal/common/eal_common_dev.c @@ -80,6 +80,7 @@ int rte_eal_dev_init(void) { struct rte_devargs *devargs; + int ret = 0; /* * Note that the dev_driver_list is populated here @@ -97,11 +98,11 @@ rte_eal_dev_init(void) devargs->args)) { RTE_LOG(ERR, EAL, "failed to initialize %s device\n", devargs->virt.drv_name); - return -1; + ret = -1; } } - return 0; + return ret; } int rte_eal_dev_attach(const char *name, const char *devargs) -- 2.9.3
[dpdk-dev] [PATCH v6 24/26] eal: do not panic when bus probe fails
Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index ddc50f2..8274196 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -939,8 +939,11 @@ rte_eal_init(int argc, char **argv) rte_eal_mp_wait_lcore(); /* Probe all the buses and devices/drivers on them */ - if (rte_bus_probe()) - rte_panic("Cannot probe devices\n"); + if (rte_bus_probe()) { + rte_eal_init_alert("Cannot probe devices\n"); + rte_errno = ENOTSUP; + return -1; + } /* Probe & Initialize PCI devices */ if (rte_eal_pci_probe()) { -- 2.9.3
[dpdk-dev] [PATCH v6 25/26] eal: do not panic on failed bus scan
For now, do an abort. It's likely that even aborting the initialization is premature in this case, as it may be possible to proceed even if one bus or another is not available. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 8274196..6d6b825 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -899,8 +899,11 @@ rte_eal_init(int argc, char **argv) return -1; } - if (rte_bus_scan()) - rte_panic("Cannot scan the buses for devices\n"); + if (rte_bus_scan()) { + rte_eal_init_alert("Cannot scan the buses for devices\n"); + rte_errno = ENODEV; + return -1; + } RTE_LCORE_FOREACH_SLAVE(i) { -- 2.9.3
[dpdk-dev] [PATCH v6 23/26] eal: do not panic (or abort) if vdev init fails
Seems like it's possible to continue. At least, the error is reflected properly in the logs. A user could then go and correct or investigate the situation. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index b13e1dd..ddc50f2 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -950,7 +950,7 @@ rte_eal_init(int argc, char **argv) } if (rte_eal_dev_init() < 0) - rte_panic("Cannot init pmd devices\n"); + rte_eal_init_alert("Cannot init pmd devices\n"); rte_eal_mcfg_complete(); -- 2.9.3
[dpdk-dev] [PATCH v6 26/26] rte_eal_init: add info about various error codes
The rte_eal_init function will now pass failure reason hints to the application. To help app developers deciper this, add some brief information about what the codes are indicating. Signed-off-by: Aaron Conole Acked-by: Bruce Richardson --- lib/librte_eal/common/include/rte_eal.h | 27 ++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h index 03fee50..9251244 100644 --- a/lib/librte_eal/common/include/rte_eal.h +++ b/lib/librte_eal/common/include/rte_eal.h @@ -159,7 +159,32 @@ int rte_eal_iopl_init(void); * function call and should not be further interpreted by the * application. The EAL does not take any ownership of the memory used * for either the argv array, or its members. - * - On failure, a negative error value. + * - On failure, -1 and rte_errno is set to a value indicating the cause + * for failure. In some instances, the application will need to be + * restarted as part of clearing the issue. + * + * Error codes returned via rte_errno: + * EACCES indicates a permissions issue. + * + * EAGAIN indicates either a bus or system resource was not available, + *setup may be attempted again. + * + * EALREADY indicates that the rte_eal_init function has already been + * called, and cannot be called again. + * + * EFAULT indicates the tailq configuration name was not found in + *memory configuration. + * + * EINVAL indicates invalid parameters were passed as argv/argc. + * + * ENOMEM indicates failure likely caused by an out-of-memory condition. + * + * ENODEV indicates memory setup issues. + * + * ENOTSUP indicates that the EAL cannot initialize on this system. + * + * EUNATCH indicates that the PCI bus is either not present, or is not + * readable by the eal. */ int rte_eal_init(int argc, char **argv); -- 2.9.3
Re: [dpdk-dev] [ovs-dev] [PATCH] selinux: Allow creating tap devices.
Daniele Di Proietto writes: > On 26/01/2017 12:35, "Ansis Atteka" wrote: >> >> >>On 26 January 2017 at 21:24, Aaron Conole >> wrote: >> >>Daniele Di Proietto writes: >> >>> On 25/01/2017 00:01, "Ansis Atteka" wrote: >>> On Jan 25, 2017 4:22 AM, "Daniele Di Proietto" wrote: Current SELinux policy in RHEL and Fedora doesn't allow the creation of TAP devices. A tap device is used by dpif-netdev to create internal devices. Without this patch, adding any bridge backed by the userspace datapath would fail. This doesn't mean that we can run Open vSwitch with DPDK under SELinux yet, but at least we can use the userspace datapath. Signed-off-by: Daniele Di Proietto >> >>I just noticed this, sorry for jumping in late. >> Acked-by: Ansis Atteka I saw that other open source projects like OpenVPN use rw_file_perms shortcut macro. Not sure how relevant that is for OVS but that macro expands to a little more function calls than what you have below. Maybe we don't need it, if what you have just worked. >>> >>> Thanks a lot for the review. >>> >>> I cooked this up using audit2allow and I tested it on fedora 25. I'm >>> now able to create and delete userspace bridges, without any further >>> complaints from selinux >> >>I have the following openvswitch-custom.te that did work to run >>ovs+dpdk under selinux and pass traffic: >> >> >>Thanks for posting this. I think that this is really helpful to >> gather all necessary OVS+DPDK rules from different sources to make >> sure that nothing is missed. > +1, thanks a lot >> >> >> >> 8< >> >>require { >>type openvswitch_t; >>type openvswitch_tmp_t; >>type openvswitch_var_run_t; >>type ifconfig_exec_t; >>type hostname_exec_t; >>type vfio_device_t; >>type kernel_t; >>type tun_tap_device_t; >>type hugetlbfs_t; >>type init_t; >>class netlink_socket { setopt getopt create connect getattr write >> read }; >>class file { write getattr read open execute execute_no_trans create >> unlink }; >>class chr_file { write getattr read open ioctl }; >>class unix_stream_socket { write getattr read connectto connect >> setopt getopt sendto accept bind recvfrom acceptfrom }; >>class dir { write remove_name add_name lock read }; >>} >> >>#= openvswitch_t == >>allow openvswitch_t self:netlink_socket { setopt getopt create connect >>getattr write read }; >>allow openvswitch_t hostname_exec_t:file { read getattr open execute >>execute_no_trans }; >>allow openvswitch_t ifconfig_exec_t:file { read getattr open execute >>execute_no_trans }; >>allow openvswitch_t openvswitch_tmp_t:file { execute execute_no_trans }; >>allow openvswitch_t openvswitch_tmp_t:unix_stream_socket { write getattr read >>connectto connect setopt getopt sendto accept bind recvfrom acceptfrom }; >>allow openvswitch_t vfio_device_t:chr_file { read write open ioctl getattr }; >>allow openvswitch_t tun_tap_device_t:chr_file { read write getattr open ioctl >>}; >>allow openvswitch_t hugetlbfs_t:dir { write remove_name add_name lock read }; >>allow openvswitch_t hugetlbfs_t:file { create unlink }; >>allow openvswitch_t kernel_t:unix_stream_socket { write getattr read >>connectto connect setopt getopt sendto accept bind recvfrom acceptfrom }; >>allow openvswitch_t init_t:file { read open }; >> >> >8 >> >>You'll note that this change gives the openvswitch complete access to >>hugetlbfs label, which might be the biggest scary part. >> >> >>There is also option to use SELinux switches that allow to activate only >>subset of SElinux rules on a "per OVS feature basis" if there is risk that >>because of DPDK whitelise we could be unconditionally loosening up SElinux >>policy too much for non-DPDK >> cases. See [https://wiki.centos.org/TipsAndTricks/SelinuxBooleans] for more >> details. > Ok, so perhaps we should require tun_tap_device_t permissions only if > we enable userspace support with a boolean. > I just posted this piece because the corresponding code is in openvswitch > source tree. > The rest of the permissions (hugepages, vfio) are required because of > code that's in the dpdk library. Is there a way to put these in DPDK > and then just call a macro here, like > dpdk_perms(openvswitch_t) Below is an example of the macro: 8< define(`dpdk_perms', ` gen_require(` type vfio_device_t; type kernel_t; type hugetlbfs_t; class file { write getattr read open execute execute_no_trans create unlink }; class chr_file { write getattr read open ioctl }; class unix_stream_socket { write getattr read connectto connect
Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
> -Original Message- > From: Olivier Matz [mailto:olivier.m...@6wind.com] > Sent: Tuesday, February 28, 2017 12:28 PM > To: Ananyev, Konstantin > Cc: Jan Blunck ; Richardson, Bruce > ; dev@dpdk.org > Subject: Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization > > On Tue, 28 Feb 2017 11:48:20 +, "Ananyev, Konstantin" > wrote: > > > > > > On Tue, 28 Feb 2017 10:29:41 +, "Ananyev, Konstantin" > > > wrote: > > > > > > > > > > Hi, > > > > > > > > > > On Tue, 28 Feb 2017 09:05:07 +, "Ananyev, Konstantin" > > > > > wrote: > > > > > > Hi everyone, > > > > > > > > > > > > > > > > > > > > > > In my opinion, if we have the room in the first cache > > > > > > > > line, we should put it there. The only argument I see > > > > > > > > against is "we may find something more important in the > > > > > > > > future, and we won't have room for it in the first cache > > > > > > > > line". I don't feel we should penalize today's use cases > > > > > > > > for hypothetic future use cases. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 2. timestamp normalization point > > > > > > > >> inside PMD RX vs somewhere later as user needs it > > > > > > > >> (extra function in dev_ops?). > > > > > > > > > > > > > > > > This point could be changed. My initial proposition tries > > > > > > > > to provide a generic API for timestamp. Let me remind it > > > > > > > > here: > > > > > > > > > > > > > > > > a- the timestamp is in nanosecond > > > > > > > > b- the reference is always the same for a given path: if > > > > > > > > the timestamp is set in a PMD, all the packets for this > > > > > > > > PMD will have the same reference, but for 2 different > > > > > > > > PMDs (or a sw lib), the reference would not be the same. > > > > > > > > > > > > > > > > We may remove a-, and just have: > > > > > > > > - the reference and the unit are always the same for a > > > > > > > > given path: if the timestamp is set in a PMD, all the > > > > > > > > packets for this PMD will have the same reference and > > > > > > > > unit, but for 2 different PMDs (or a sw lib), they would > > > > > > > > not be the same. > > > > > > > > > > > > > > > > In both cases, we would need a conversion code (maybe in a > > > > > > > > library) if the application wants to work with timestamps > > > > > > > > from several sources. The second solution removes the > > > > > > > > normalization code in the PMD when not needed, it is > > > > > > > > probably better. > > > > > > > > > > > > > > I agree. > > > > > > > > > > > > One question - does that mean that application would need to > > > > > > keep a track from what PMD each particular packet came to do > > > > > > the normalization? Konstantin > > > > > > > > > > I'd say yes. It does not look very difficult to do, since the > > > > > mbuf contains the input port id. > > > > > > > > > > > > > I understand that we can use mbuf->port here, but it means that > > > > we'll introduce new implicit dependency between timestamp and > > > > port values. From my point that introduces new implications: > > > > 1. all PMDs that do set a timestamp would also have to set port > > > > value too. Probably not a big deal as most of PMDs do set port > > > > value anyway right now, but it means it would be hard to get > > > > rid/change mbuf->port in future. > > > > > > Currently, all PMDs must set m->port. > > > If in the future we remove m->port, the applications that use it > > > will need to store the value in a mbuf metadata, or pass it as > > > arguments through function calls. > > > > > > > > > > 2. Applications would not allowed to change mbuf->port value > > > > before normalization is done (from what I heard some apps do > > > > update mbuf->port to store routing decisions). BTW, how the app > > > > would keep track which mbufs were already normalized, and which > > > > were not? > > > > > > I don't think it should be allowed to change m->port value. > > > > As far as I know it is allowed right now. > > PMD RX routine sets mbuf->port, after that application is free to use > > it in a way it likes. > > The descriptor or m->port is "Input port". If the applications stores > something else than the input port, it is its responsibility if it > breaks something else. Like changing any other field to put something > that does not match the description. > > > > What we are introducing here is basically a new dependency between 2 > > mbuf fields and new restriction. > > On the other hand, there is no strong dependency: the API to do the > normalization can take the port as a parameter. Ok, that would be much better - the dependency is still there, but at least we don't force it. > > > > > > Another thing that doesn't look very convenient to me here - > > We can have 2 different values of timestamp (both normalized and not) > > and there is no clear way for the application to know which one is in > > use right now. So each app writer would have to come-up with his own > > solution.
[dpdk-dev] What is the max size of packets rte_eth_tx_burst() can send practically/theoritically?
Hello, For example, as I understand, ovs-dpdk code uses a buffer of size 32 when it transmits via rte_eth_tx_burst(). I think it can transmit more packets in a bust. I know there should be a balance between throughput and latency. But, I am wondering what max size packets a dpdk application can give to rte_eth_tx_burst() practically/theoretically? (I see some max macro in ixgbe_rxtx.h) Thanks
[dpdk-dev] [PATCH 0/2] net/mlx5: add enhanced multi-packet send for ConnectX-5
This patch set is to add the Enhanced Multi-Packet Send feature which is newly introduced for ConnectX-5 families of adaptors. Yongseok Koh (2): net/mlx5: add enhanced multi-packet send for ConnectX-5 doc: update PMD options for mlx5 doc/guides/nics/mlx5.rst | 32 +++- drivers/net/mlx5/mlx5.c| 34 +++- drivers/net/mlx5/mlx5.h| 4 +- drivers/net/mlx5/mlx5_ethdev.c | 6 +- drivers/net/mlx5/mlx5_prm.h| 23 +++ drivers/net/mlx5/mlx5_rxtx.c | 405 + drivers/net/mlx5/mlx5_rxtx.h | 5 + drivers/net/mlx5/mlx5_txq.c| 18 +- 8 files changed, 514 insertions(+), 13 deletions(-) -- 2.11.0
[dpdk-dev] [PATCH 2/2] doc: update PMD options for mlx5
Enhanced multi-packet send mode is newly introduced for ConnectX-5 families of adaptors. Signed-off-by: Yongseok Koh --- doc/guides/nics/mlx5.rst | 32 1 file changed, 28 insertions(+), 4 deletions(-) diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index 09922a08f..fda8bc3fe 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -181,14 +181,38 @@ Run-time configuration - ``txq_mpw_en`` parameter [int] - A nonzero value enables multi-packet send. This feature allows the TX - burst function to pack up to five packets in two descriptors in order to - save PCI bandwidth and improve performance at the cost of a slightly - higher CPU usage. + A nonzero value enables multi-packet send (MPS) for ConnectX-4 Lx and + enhanced multi-packet send (Enhanced MPS) for ConnectX-5. MPS allows the + TX burst function to pack up multiple packets in a single descriptor + session in order to save PCI bandwidth and improve performance at the + cost of a slightly higher CPU usage. When ``txq_inline`` is set along + with ``txq_mpw_en``, TX burst function tries to copy entire packet data + on to TX descriptor instead of including pointer of packet only if there + is enough room remained in the descriptor. ``txq_inline`` sets + per-descriptor space for either pointers or inlined packets. In addition, + Enhanced MPS supports hybrid mode - mixing inlined packets and pointers + in the same descriptor. It is currently only supported on the ConnectX-4 Lx and ConnectX-5 families of adapters. Enabled by default. +- ``txq_mpw_hdr_dseg_en`` parameter [int] + + A nonzero value enables including two pointers in the unused space in the + first block of TX descriptor. This can be used to lessen CPU load for + memory copy. + + Effective only when Enhanced MPS is supported. Disabled by default. + +- ``txq_max_inline_len`` parameter [int] + + Maximum size of packet to be inlined. This limits the size of packet to + be inlined. If the size of a packet is larger than configured value, the + packet isn't inlined even though there's enough space remained in the + descriptor. Instead, the packet is included with pointer. + + Effective only when Enhanced MPS is supported. The default value is 256. + Prerequisites - -- 2.11.0
[dpdk-dev] [PATCH 1/2] net/mlx5: add enhanced multi-packet send for ConnectX-5
ConnectX-5 supports enhanced version of multi-packet send (MPS). An MPS Tx descriptor can carry multiple packets either by including pointers of packets or by inlining packets. Inlining packet data can be helpful to better utilize PCIe bandwidth. In addition, Enhanced MPS supports hybrid mode - mixing inlined packets and pointers in a descriptor. This feature is enabled by default if supported by HW. Signed-off-by: Yongseok Koh --- drivers/net/mlx5/mlx5.c| 34 +++- drivers/net/mlx5/mlx5.h| 4 +- drivers/net/mlx5/mlx5_ethdev.c | 6 +- drivers/net/mlx5/mlx5_prm.h| 23 +++ drivers/net/mlx5/mlx5_rxtx.c | 405 + drivers/net/mlx5/mlx5_rxtx.h | 5 + drivers/net/mlx5/mlx5_txq.c| 18 +- 7 files changed, 486 insertions(+), 9 deletions(-) diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index d4bd4696c..24e3865f0 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -84,6 +84,12 @@ /* Device parameter to enable multi-packet send WQEs. */ #define MLX5_TXQ_MPW_EN "txq_mpw_en" +/* Device parameter to configure the number of dsegs before inlined packet. */ +#define MLX5_TXQ_MPW_HDR_DSEG_EN "txq_mpw_hdr_dseg_en" + +/* Device parameter to limit the size of inlining packet */ +#define MLX5_TXQ_MAX_INLINE_LEN "txq_max_inline_len" + /** * Retrieve integer value from environment variable. * @@ -289,7 +295,11 @@ mlx5_args_check(const char *key, const char *val, void *opaque) } else if (strcmp(MLX5_TXQS_MIN_INLINE, key) == 0) { priv->txqs_inline = tmp; } else if (strcmp(MLX5_TXQ_MPW_EN, key) == 0) { - priv->mps &= !!tmp; /* Enable MPW only if HW supports */ + priv->mps = !!tmp ? priv->mps : MLX5_MPW_DISABLED; + } else if (strcmp(MLX5_TXQ_MPW_HDR_DSEG_EN, key) == 0) { + priv->mpw_hdr_dseg = !!tmp; + } else if (strcmp(MLX5_TXQ_MAX_INLINE_LEN, key) == 0) { + priv->txq_max_inline_len = tmp; } else { WARN("%s: unknown parameter", key); return -EINVAL; @@ -316,6 +326,8 @@ mlx5_args(struct priv *priv, struct rte_devargs *devargs) MLX5_TXQ_INLINE, MLX5_TXQS_MIN_INLINE, MLX5_TXQ_MPW_EN, + MLX5_TXQ_MPW_HDR_DSEG_EN, + MLX5_TXQ_MAX_INLINE_LEN, NULL, }; struct rte_kvargs *kvlist; @@ -424,20 +436,23 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) */ switch (pci_dev->id.device_id) { case PCI_DEVICE_ID_MELLANOX_CONNECTX4LX: + mps = MLX5_MPW; + break; case PCI_DEVICE_ID_MELLANOX_CONNECTX5: case PCI_DEVICE_ID_MELLANOX_CONNECTX5VF: case PCI_DEVICE_ID_MELLANOX_CONNECTX5EX: case PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF: - mps = 1; + mps = MLX5_MPW_ENHANCED; break; default: - mps = 0; + mps = MLX5_MPW_DISABLED; } INFO("PCI information matches, using device \"%s\"" -" (SR-IOV: %s, MPS: %s)", +" (SR-IOV: %s, %sMPS: %s)", list[i]->name, sriov ? "true" : "false", -mps ? "true" : "false"); +mps == MLX5_MPW_ENHANCED ? "Enhanced " : "", +mps != MLX5_MPW_DISABLED ? "true" : "false"); attr_ctx = ibv_open_device(list[i]); err = errno; break; @@ -531,6 +546,12 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) priv->pd = pd; priv->mtu = ETHER_MTU; priv->mps = mps; /* Enable MPW by default if supported. */ + /* Set default values for Enhanced MPW, a.k.a MPWv2 */ + if (mps == MLX5_MPW_ENHANCED) { + priv->mpw_hdr_dseg = 0; + priv->txqs_inline = MLX5_EMPW_MIN_TXQS; + priv->txq_max_inline_len = MLX5_EMPW_MAX_INLINE_LEN; + } priv->cqe_comp = 1; /* Enable compression by default. */ err = mlx5_args(priv, pci_dev->device.devargs); if (err) { @@ -586,6 +607,9 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) err = ENOTSUP; goto port_error; } + INFO("%sMPS is %s", +priv->mps == MLX5_MPW_ENHANCED ? "Enhanced " : "", +priv->mps != MLX5_MPW_DISABLED ? "enabled" : "disabled"); /* Allocate and register default RSS hash keys. */ priv->rss_conf = rte_calloc(__func__, hash_rxq_in
[dpdk-dev] [PATCH 0/9] Update ixgbe base driver
Some minor changes and fixes, including, link block check for KR, complete HW initialization even if SFP is not present, add VF xcast promiscuous mode. Wenzhuo Lu (9): net/ixgbe/base: make a debug message simple net/ixgbe/base: remove X550em SFP iXFI setup net/ixgbe/base: add link block check for KR net/ixgbe/base: add bit for enabling L3/L4 filtering net/ixgbe/base: complete hw init when SFP not present net/ixgbe/base: disable FC for 15B0 net/ixgbe: support xcast promisc mode net/ixgbe/base: fix build error net/ixgbe/base: update readme doc/guides/rel_notes/release_17_05.rst | 8 + drivers/net/ixgbe/base/README | 4 +-- drivers/net/ixgbe/base/ixgbe_common.c | 7 ++-- drivers/net/ixgbe/base/ixgbe_mbx.h | 8 + drivers/net/ixgbe/base/ixgbe_phy.c | 23 ++--- drivers/net/ixgbe/base/ixgbe_type.h| 1 + drivers/net/ixgbe/base/ixgbe_vf.c | 4 +++ drivers/net/ixgbe/base/ixgbe_x550.c| 63 +- drivers/net/ixgbe/ixgbe_ethdev.c | 6 9 files changed, 47 insertions(+), 77 deletions(-) -- 1.9.3
[dpdk-dev] [PATCH 5/9] net/ixgbe/base: complete hw init when SFP not present
If SFP module is not present, reset_hw doesn't return success. SW should complete the initialization, or with specific module it resulted in no link when the module was later inserted. Signed-off-by: Wenzhuo Lu --- drivers/net/ixgbe/base/ixgbe_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ixgbe/base/ixgbe_common.c b/drivers/net/ixgbe/base/ixgbe_common.c index 9645667..df6fbf9 100644 --- a/drivers/net/ixgbe/base/ixgbe_common.c +++ b/drivers/net/ixgbe/base/ixgbe_common.c @@ -495,7 +495,7 @@ s32 ixgbe_init_hw_generic(struct ixgbe_hw *hw) /* Reset the hardware */ status = hw->mac.ops.reset_hw(hw); - if (status == IXGBE_SUCCESS) { + if (status == IXGBE_SUCCESS || status == IXGBE_ERR_SFP_NOT_PRESENT) { /* Start the HW */ status = hw->mac.ops.start_hw(hw); } -- 1.9.3
[dpdk-dev] [PATCH 1/9] net/ixgbe/base: make a debug message simple
The debug message is too long. Make it shorter. Signed-off-by: Wenzhuo Lu --- drivers/net/ixgbe/base/ixgbe_x550.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c b/drivers/net/ixgbe/base/ixgbe_x550.c index 6f9c034..cf72bba 100644 --- a/drivers/net/ixgbe/base/ixgbe_x550.c +++ b/drivers/net/ixgbe/base/ixgbe_x550.c @@ -2519,7 +2519,7 @@ s32 ixgbe_reset_hw_X550em(struct ixgbe_hw *hw) status); if (status == IXGBE_ERR_SFP_NOT_SUPPORTED) { - DEBUGOUT("Returning from reset HW since PHY ops init returned IXGBE_ERR_SFP_NOT_SUPPORTED\n"); + DEBUGOUT("Returning from reset HW due to PHY init failure\n"); return status; } -- 1.9.3
[dpdk-dev] [PATCH 3/9] net/ixgbe/base: add link block check for KR
When setting up link on x550 KR devices, should check if there're constraints on link from manageability, which may result in link loss. Signed-off-by: Wenzhuo Lu --- drivers/net/ixgbe/base/ixgbe_x550.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c b/drivers/net/ixgbe/base/ixgbe_x550.c index a28ced1..1058684 100644 --- a/drivers/net/ixgbe/base/ixgbe_x550.c +++ b/drivers/net/ixgbe/base/ixgbe_x550.c @@ -2663,6 +2663,9 @@ s32 ixgbe_setup_kr_x550em(struct ixgbe_hw *hw) if (hw->phy.autoneg_advertised & IXGBE_LINK_SPEED_2_5GB_FULL) return IXGBE_SUCCESS; + if (ixgbe_check_reset_blocked(hw)) + return 0; + return ixgbe_setup_kr_speed_x550em(hw, hw->phy.autoneg_advertised); } -- 1.9.3
[dpdk-dev] [PATCH 2/9] net/ixgbe/base: remove X550em SFP iXFI setup
Removes X550em SFP iXFI setup since there is no released HW production with SFP iXFI. Signed-off-by: Wenzhuo Lu --- drivers/net/ixgbe/base/ixgbe_x550.c | 57 +++-- 1 file changed, 11 insertions(+), 46 deletions(-) diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c b/drivers/net/ixgbe/base/ixgbe_x550.c index cf72bba..a28ced1 100644 --- a/drivers/net/ixgbe/base/ixgbe_x550.c +++ b/drivers/net/ixgbe/base/ixgbe_x550.c @@ -2694,53 +2694,18 @@ s32 ixgbe_setup_mac_link_sfp_x550em(struct ixgbe_hw *hw, if (ret_val != IXGBE_SUCCESS) return ret_val; - if (!(hw->phy.nw_mng_if_sel & IXGBE_NW_MNG_IF_SEL_INT_PHY_MODE)) { - /* Configure CS4227 LINE side to 10G SR. */ - reg_slice = IXGBE_CS4227_LINE_SPARE22_MSB + - (hw->bus.lan_id << 12); - reg_val = IXGBE_CS4227_SPEED_10G; - ret_val = hw->link.ops.write_link(hw, hw->link.addr, reg_slice, - reg_val); - - reg_slice = IXGBE_CS4227_LINE_SPARE24_LSB + - (hw->bus.lan_id << 12); + /* Configure internal PHY for KR/KX. */ + ixgbe_setup_kr_speed_x550em(hw, speed); + + /* Configure CS4227 LINE side to proper mode. */ + reg_slice = IXGBE_CS4227_LINE_SPARE24_LSB + + (hw->bus.lan_id << 12); + if (setup_linear) + reg_val = (IXGBE_CS4227_EDC_MODE_CX1 << 1) | 0x1; + else reg_val = (IXGBE_CS4227_EDC_MODE_SR << 1) | 0x1; - ret_val = hw->link.ops.write_link(hw, hw->link.addr, reg_slice, - reg_val); - - /* Configure CS4227 for HOST connection rate then type. */ - reg_slice = IXGBE_CS4227_HOST_SPARE22_MSB + - (hw->bus.lan_id << 12); - reg_val = (speed & IXGBE_LINK_SPEED_10GB_FULL) ? - IXGBE_CS4227_SPEED_10G : IXGBE_CS4227_SPEED_1G; - ret_val = hw->link.ops.write_link(hw, hw->link.addr, reg_slice, - reg_val); - - reg_slice = IXGBE_CS4227_HOST_SPARE24_LSB + - (hw->bus.lan_id << 12); - if (setup_linear) - reg_val = (IXGBE_CS4227_EDC_MODE_CX1 << 1) | 0x1; - else - reg_val = (IXGBE_CS4227_EDC_MODE_SR << 1) | 0x1; - ret_val = hw->link.ops.write_link(hw, hw->link.addr, reg_slice, - reg_val); - - /* Setup XFI internal link. */ - ret_val = ixgbe_setup_ixfi_x550em(hw, &speed); - } else { - /* Configure internal PHY for KR/KX. */ - ixgbe_setup_kr_speed_x550em(hw, speed); - - /* Configure CS4227 LINE side to proper mode. */ - reg_slice = IXGBE_CS4227_LINE_SPARE24_LSB + - (hw->bus.lan_id << 12); - if (setup_linear) - reg_val = (IXGBE_CS4227_EDC_MODE_CX1 << 1) | 0x1; - else - reg_val = (IXGBE_CS4227_EDC_MODE_SR << 1) | 0x1; - ret_val = hw->link.ops.write_link(hw, hw->link.addr, reg_slice, - reg_val); - } + ret_val = hw->link.ops.write_link(hw, hw->link.addr, reg_slice, + reg_val); return ret_val; } -- 1.9.3
[dpdk-dev] [PATCH 4/9] net/ixgbe/base: add bit for enabling L3/L4 filtering
Add a L3/L4 filtering definition of register MRQC for the future use. Signed-off-by: Wenzhuo Lu --- drivers/net/ixgbe/base/ixgbe_type.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ixgbe/base/ixgbe_type.h b/drivers/net/ixgbe/base/ixgbe_type.h index bb1f85b..6acd966 100644 --- a/drivers/net/ixgbe/base/ixgbe_type.h +++ b/drivers/net/ixgbe/base/ixgbe_type.h @@ -2622,6 +2622,7 @@ enum { #define IXGBE_MRQC_VMDQRSS64EN 0x000B /* VMDq2 64 pools w/ RSS */ #define IXGBE_MRQC_VMDQRT8TCEN 0x000C /* VMDq2/RT 16 pool 8 TC */ #define IXGBE_MRQC_VMDQRT4TCEN 0x000D /* VMDq2/RT 32 pool 4 TC */ +#define IXGBE_MRQC_L3L4TXSWEN 0x8000 /* Enable L3/L4 Tx switch */ #define IXGBE_MRQC_RSS_FIELD_MASK 0x #define IXGBE_MRQC_RSS_FIELD_IPV4_TCP 0x0001 #define IXGBE_MRQC_RSS_FIELD_IPV4 0x0002 -- 1.9.3
[dpdk-dev] [PATCH 7/9] net/ixgbe: support xcast promisc mode
Add the support of xcast promiscuous mode. It's added in mailbox v1.3. Move the definition of xcast mode to base code. Signed-off-by: Wenzhuo Lu --- drivers/net/ixgbe/base/ixgbe_mbx.h | 8 drivers/net/ixgbe/base/ixgbe_vf.c | 4 drivers/net/ixgbe/ixgbe_ethdev.c | 6 -- 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/drivers/net/ixgbe/base/ixgbe_mbx.h b/drivers/net/ixgbe/base/ixgbe_mbx.h index 7556a81..bde50a5 100644 --- a/drivers/net/ixgbe/base/ixgbe_mbx.h +++ b/drivers/net/ixgbe/base/ixgbe_mbx.h @@ -114,6 +114,14 @@ enum ixgbe_pfvf_api_rev { #define IXGBE_VF_GET_RSS_KEY 0x0b/* get RSS key */ #define IXGBE_VF_UPDATE_XCAST_MODE 0x0c +/* mode choices for IXGBE_VF_UPDATE_XCAST_MODE */ +enum ixgbevf_xcast_modes { + IXGBEVF_XCAST_MODE_NONE = 0, + IXGBEVF_XCAST_MODE_MULTI, + IXGBEVF_XCAST_MODE_ALLMULTI, + IXGBEVF_XCAST_MODE_PROMISC, +}; + /* GET_QUEUES return data indices within the mailbox */ #define IXGBE_VF_TX_QUEUES 1 /* number of Tx queues supported */ #define IXGBE_VF_RX_QUEUES 2 /* number of Rx queues supported */ diff --git a/drivers/net/ixgbe/base/ixgbe_vf.c b/drivers/net/ixgbe/base/ixgbe_vf.c index 8775ee5..b513190 100644 --- a/drivers/net/ixgbe/base/ixgbe_vf.c +++ b/drivers/net/ixgbe/base/ixgbe_vf.c @@ -432,6 +432,10 @@ s32 ixgbevf_update_xcast_mode(struct ixgbe_hw *hw, int xcast_mode) switch (hw->api_version) { case ixgbe_mbox_api_12: + /* New modes were introduced in 1.3 version */ + if (xcast_mode > IXGBEVF_XCAST_MODE_ALLMULTI) + return IXGBE_ERR_FEATURE_NOT_SUPPORTED; + /* Fall through */ case ixgbe_mbox_api_13: break; default: diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 7169007..2e497a8 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -155,12 +155,6 @@ #define IXGBE_QDE_STRIP_TAG0x0004 #define IXGBE_VTEICR_MASK 0x07 -enum ixgbevf_xcast_modes { - IXGBEVF_XCAST_MODE_NONE = 0, - IXGBEVF_XCAST_MODE_MULTI, - IXGBEVF_XCAST_MODE_ALLMULTI, -}; - #define IXGBE_EXVET_VET_EXT_SHIFT 16 #define IXGBE_DMATXCTL_VT_MASK 0x -- 1.9.3
[dpdk-dev] [PATCH 6/9] net/ixgbe/base: disable FC for 15B0
Make sure that ixgbe_device_supports_autoneg_fc() returns false and hw->fc.disable_fc_autoneg is set to true to avoid running the fc_autoneg function for the device 15B0, as this device doesn't support this function. Signed-off-by: Wenzhuo Lu --- drivers/net/ixgbe/base/ixgbe_common.c | 5 - drivers/net/ixgbe/base/ixgbe_x550.c | 3 +++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/net/ixgbe/base/ixgbe_common.c b/drivers/net/ixgbe/base/ixgbe_common.c index df6fbf9..4dabb43 100644 --- a/drivers/net/ixgbe/base/ixgbe_common.c +++ b/drivers/net/ixgbe/base/ixgbe_common.c @@ -189,7 +189,10 @@ bool ixgbe_device_supports_autoneg_fc(struct ixgbe_hw *hw) break; case ixgbe_media_type_backplane: - supported = true; + if (hw->device_id == IXGBE_DEV_ID_X550EM_X_XFI) + supported = false; + else + supported = true; break; case ixgbe_media_type_copper: /* only some copper devices support flow control autoneg */ diff --git a/drivers/net/ixgbe/base/ixgbe_x550.c b/drivers/net/ixgbe/base/ixgbe_x550.c index 1058684..3049865 100644 --- a/drivers/net/ixgbe/base/ixgbe_x550.c +++ b/drivers/net/ixgbe/base/ixgbe_x550.c @@ -4005,6 +4005,9 @@ s32 ixgbe_setup_fc_X550em(struct ixgbe_hw *hw) /* This device does not fully support AN. */ hw->fc.disable_fc_autoneg = true; break; + case IXGBE_DEV_ID_X550EM_X_XFI: + hw->fc.disable_fc_autoneg = true; + break; default: break; } -- 1.9.3
[dpdk-dev] [PATCH 8/9] net/ixgbe/base: fix build error
Fix ICC build error by removing the EWARN third parameter. Signed-off-by: Wenzhuo Lu --- drivers/net/ixgbe/base/ixgbe_phy.c | 23 ++- 1 file changed, 2 insertions(+), 21 deletions(-) diff --git a/drivers/net/ixgbe/base/ixgbe_phy.c b/drivers/net/ixgbe/base/ixgbe_phy.c index c953805..d2d06cf 100644 --- a/drivers/net/ixgbe/base/ixgbe_phy.c +++ b/drivers/net/ixgbe/base/ixgbe_phy.c @@ -1540,16 +1540,7 @@ s32 ixgbe_identify_sfp_module_generic(struct ixgbe_hw *hw) status = IXGBE_SUCCESS; } else { if (hw->allow_unsupported_sfp == true) { - EWARN(hw, "WARNING: Intel (R) Network " - "Connections are quality tested " - "using Intel (R) Ethernet Optics." - " Using untested modules is not " - "supported and may cause unstable" - " operation or damage to the " - "module or the adapter. Intel " - "Corporation is not responsible " - "for any harm caused by using " - "untested modules.\n", status); + EWARN(hw, "WARNING: Intel (R) Network Connections are quality tested using Intel (R) Ethernet Optics. Using untested modules is not supported and may cause unstable operation or damage to the module or the adapter. Intel Corporation is not responsible for any harm caused by using untested modules.\n"); status = IXGBE_SUCCESS; } else { DEBUGOUT("SFP+ module not supported\n"); @@ -1802,16 +1793,7 @@ s32 ixgbe_identify_qsfp_module_generic(struct ixgbe_hw *hw) status = IXGBE_SUCCESS; } else { if (hw->allow_unsupported_sfp == true) { - EWARN(hw, "WARNING: Intel (R) Network " - "Connections are quality tested " - "using Intel (R) Ethernet Optics." - " Using untested modules is not " - "supported and may cause unstable" - " operation or damage to the " - "module or the adapter. Intel " - "Corporation is not responsible " - "for any harm caused by using " - "untested modules.\n", status); + EWARN(hw, "WARNING: Intel (R) Network Connections are quality tested using Intel (R) Ethernet Optics. Using untested modules is not supported and may cause unstable operation or damage to the module or the adapter. Intel Corporation is not responsible for any harm caused by using untested modules.\n"); status = IXGBE_SUCCESS; } else { DEBUGOUT("QSFP module not supported\n"); @@ -1836,7 +1818,6 @@ s32 ixgbe_identify_qsfp_module_generic(struct ixgbe_hw *hw) return IXGBE_ERR_SFP_NOT_PRESENT; } - /** * ixgbe_get_sfp_init_sequence_offsets - Provides offset of PHY init sequence * @hw: pointer to hardware structure -- 1.9.3
[dpdk-dev] [PATCH 9/9] net/ixgbe/base: update readme
Signed-off-by: Wenzhuo Lu --- doc/guides/rel_notes/release_17_05.rst | 8 drivers/net/ixgbe/base/README | 4 ++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst index e25ea9f..ab87e81 100644 --- a/doc/guides/rel_notes/release_17_05.rst +++ b/doc/guides/rel_notes/release_17_05.rst @@ -41,6 +41,14 @@ New Features Also, make sure to start the actual text at the margin. = +* **Updated the ixgbe base driver.** + + Updated the ixgbe base driver, including the following changes: + + * Add link block check for KR. + * Complete HW initialization even if SFP is not present. + * Add VF xcast promiscuous mode. + Resolved Issues --- diff --git a/drivers/net/ixgbe/base/README b/drivers/net/ixgbe/base/README index 0a6054f..49bb27c 100644 --- a/drivers/net/ixgbe/base/README +++ b/drivers/net/ixgbe/base/README @@ -1,7 +1,7 @@ .. BSD LICENSE - Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + Copyright(c) 2010-2017 Intel Corporation. All rights reserved. All rights reserved. Redistribution and use in source and binary forms, with or without @@ -34,7 +34,7 @@ Intel® IXGBE driver === This directory contains source code of FreeBSD ixgbe driver of version -cid-10g-shared-code.2017.01.05 released by the team which develop +cid-10g-shared-code.2017.02.27 released by the team which develop basic drivers for any ixgbe NIC. The sub-directory of base/ contains the original source package. This driver is valid for the product(s) listed below -- 1.9.3
Re: [dpdk-dev] net/bonding - need for 8023ad slave to be in promisc mode
Hi Declan, Got your name from the maintainers' list. Could you please help me with this? Thanks, -Sujith On Mon, Feb 27, 2017 at 1:40 PM, sujith sankar wrote: > Hi folks, > > Could someone clarify the need to put mode 4 bonding slaves in promiscuous > mode? > > 840 void > 841 bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev, uint8_t > slave_id) > 842 { > . > . > . > 878 /* use this port as agregator */ > 879 port->aggregator_port_id = slave_id; > 880 rte_eth_promiscuous_enable(slave_id); > > > The Linux kernel mode bonding driver gets it working by assigning the > bond/primary slave MAC address to the rest of the slaves. Is there a reason > for not using this method? > > I tried this using my app on DPDK 16.04 on 82599, but could not get rx > working. It would be great if someone could help me with this. > > Thanks, > -Sujith
Re: [dpdk-dev] What is the max size of packets rte_eth_tx_burst() can send practically/theoritically?
On Tue, 28 Feb 2017 20:28:21 -0800 Joo Kim wrote: > Hello, > > > For example, as I understand, ovs-dpdk code uses a buffer of size 32 > when it transmits via rte_eth_tx_burst(). > > I think it can transmit more packets in a bust. > I know there should be a balance between throughput and latency. But, I am > wondering what max size packets a dpdk application can give to > rte_eth_tx_burst() practically/theoretically? > (I see some max macro in ixgbe_rxtx.h) > > Thanks You could theoretically transmit a burst size up to the configured number of TX descriptors. The downside is you will add latency and have to handle the TX ring getting full more often. Bigger burst sizes really don't win that much. 50% of the gain happens by just sending 2 at a time.
Re: [dpdk-dev] [RFC PATCH] net/virtio: Align Virtio-net header on cache line in receive path
On 02/23/2017 06:49 AM, Yuanhan Liu wrote: On Wed, Feb 22, 2017 at 10:36:36AM +0100, Maxime Coquelin wrote: On 02/22/2017 02:37 AM, Yuanhan Liu wrote: On Tue, Feb 21, 2017 at 06:32:43PM +0100, Maxime Coquelin wrote: This patch aligns the Virtio-net header on a cache-line boundary to optimize cache utilization, as it puts the Virtio-net header (which is always accessed) on the same cache line as the packet header. For example with an application that forwards packets at L2 level, a single cache-line will be accessed with this patch, instead of two before. I'm assuming you were testing pkt size <= (64 - hdr_size)? No, I tested with 64 bytes packets only. Oh, my bad, I overlooked it. While you were saying "a single cache line", I was thinking putting the virtio net hdr and the "whole" packet data in single cache line, which is not possible for pkt size 64B. I run some more tests this morning with different packet sizes, and also with changing the mbuf size on guest side to have multi- buffers packets: +---+++-+ | Txpkt | Rxmbuf | v17.02 | v17.02 + vnet hdr align | +---+++-+ |64 | 2048 | 11.05 | 11.78 | | 128 | 2048 | 10.66 | 11.48 | | 256 | 2048 | 10.47 | 11.21 | | 512 | 2048 | 10.22 | 10.88 | | 1024 | 2048 | 7.65 |7.84 | | 1500 | 2048 | 6.25 |6.45 | | 2000 | 2048 | 5.31 |5.43 | | 2048 | 2048 | 5.32 |4.25 | | 1500 |512 | 3.89 |3.98 | | 2048 |512 | 1.96 |2.02 | +---+++-+ Could you share more info, say is it a PVP test? Is mergeable on? What's the fwd mode? No, this is not PVP benchmark, I have neither another server nor a packet generator connected to my Haswell machine back-to-back. This is simple micro-benchmark, vhost PMD in txonly, Virtio PMD in rxonly. In this configuration, mergeable is ON and no offload disabled in QEMU cmdline. That's why I would be interested in more testing on recent hardware with PVP benchmark. Is it something that could be run in Intel lab? I did some more trials, and I think that most of the gain seen in this microbenchmark could happen in fact on vhost side. Indeed, I monitored the number of packets dequeued at each .rx_pkt_burst() call, and I can see there are packets in the vq only once every 20 calls. On Vhost side, monitoring shows that it always succeeds to write its burts, i.e. the vq is never full. In case of multi-buffers packets, next segments will be aligned on a cache-line boundary, instead of cache-line boundary minus size of vnet header before. The another thing is, this patch always makes the pkt data cache unaligned for the first packet, which makes Zhihong's optimization on memcpy (for big packet) useless. commit f5472703c0bdfc29c46fc4b2ca445bce3dc08c9f Author: Zhihong Wang Date: Tue Dec 6 20:31:06 2016 -0500 I did run some loopback test with large packet also, an I see a small gain with my patch (fwd io on both ends): +---+++-+ | Txpkt | Rxmbuf | v17.02 | v17.02 + vnet hdr align | +---+++-+ | 1500 | 2048 | 4.05 |4.14 | +---+++-+ Wierd, that basically means Zhihong's patch doesn't work? Could you add one more colum here: what's the data when roll back to the point without Zhihong's commit? I add this to my ToDo list, don't expect results before next week. Signed-off-by: Zhihong Wang Reviewed-by: Yuanhan Liu Tested-by: Lei Yao Does this need to be cache-line aligned? Nope, the alignment size is different with different platforms. AVX512 needs a 64B alignment, while AVX2 needs 32B alignment. I also tried to align pkt on 16bytes boundary, basically putting header at HEADROOM + 4 bytes offset, but I didn't measured any gain on Haswell, The fast rte_memcpy path (when dst & src is well aligned) on Haswell (with AVX2) requires 32B alignment. Even the 16B boundary would make it into the slow path. From this point of view, the extra pad does not change anything. Thus, no gain is expected. and even a drop on SandyBridge. That's weird, SandyBridge requries the 16B alignment, meaning the extra pad should put it into fast path of rte_memcpy, whereas the performance is worse. Thanks for the info, I will run more tests to explain this. Cheers, Maxime --yliu I understand your point regarding aligned memcpy, but I'm surprised I don't see its expected superiority with my benchmarks. Any thoughts?
Re: [dpdk-dev] [PATCH 0/7] virtio/vhost: Add MTU feature support
On 02/23/2017 08:10 AM, Yuanhan Liu wrote: On Mon, Feb 13, 2017 at 03:28:13PM +0100, Maxime Coquelin wrote: This series adds support to new Virtio's MTU feature[1]. Seems you missed a link here? Oops, I meant this link: https://www.spinics.net/lists/linux-virtualization/msg28908.html Will fix in v2. The MTU value is set via QEMU parameters. If the feature is negotiated (i.e supported by both host andcguest, and valid MTU value is set in QEMU via its host_mtu parameter), QEMU shares the configured MTU value throught dedicated Vhost protocol feature. On vhost side, the value is stored in the virtio_net structure, and made available to the application thanks to new vhost lib's rte_vhost_mtu_get() function. rte_vhost_mtu_set() functions is implemented, but only succeed if the application sets the same value as the one set in QEMU. Idea is that it would be used for the application to ensure configured MTU value is consistent, but maybe the mtu_get() API is enough, and mtu_set() could just be dropped. If the vhost MTU is designed to be read-only, then we may should drop the set_mtu function. Ok, I will remove it in next revision. Vhost PMD mtu_set callback is also implemented in the same spirit. To be able to set eth_dev's MTU value at the right time, i.e. to call rte_vhost_mtu_get() just after Virtio features have been negotiated and before the device is really started, a new vhost flag has been introduced (VIRTIO_DEV_READY), because the VIRTIO_DEV_RUNNING flag is set too late (after .new_device() ops is called). Okay, and I think this kind of info should be in corresponding commit log. Sure. Regarding valid MTU values, the maximum MTU value accepted on vhost side is 65535 bytes, as defined in Virtio Spec and supported in Virtio-net Kernel driver. But in Virtio PMD, current maximum frame size is 9728 bytes (~9700 bytes MTU). So maximum MTU size accepted in Virtio PMD is the minimum between ~9700 bytes and host's MTU. Initially, we thought about disabling the rx-mergeable feature when MTU value was low enough to ensure all received packets would fit in receive buffers (when offloads are disabled). Doing this, we would save one cache-miss in the receive path. Problem is that we don't know the buffers size at Virtio feature neogotiation time. It might be possible for the application to call the configure callback again once the Rx queue is set up, but it seems a bit hacky. Worse, if multiple queue is involved, one queue could have it's own mempool, meaning the buffer size could be different, whereas the MTU feature is global. Indeed, we should ensure that for every queues, the corresponding buf size can handle a full packet to be able to disable the mergeable feature. Let's skip this for now and focus on improving the rx-mergeable path. Thanks, Maxime --yliu So I decided to skip this optimization for now, even if feedback and are of course appreciated. Finally, this series also adds MTU value printing in testpmd's "show port info" command when non-zero. This series target v17.05 release. Cheers, Maxime Maxime Coquelin (7): vhost: Enable VIRTIO_NET_F_MTU feature vhost: vhost-user: Add MTU protocol feature support vhost: Add new ready status flag vhost: Add API to get/set MTU value net/vhost: Implement mtu_set callback net/virtio: Add MTU feature support app/testpmd: print MTU value in show port info app/test-pmd/config.c | 5 + doc/guides/nics/features/vhost.ini | 1 + doc/guides/nics/features/virtio.ini | 1 + drivers/net/vhost/rte_eth_vhost.c | 18 +++ drivers/net/virtio/virtio_ethdev.c | 22 +-- drivers/net/virtio/virtio_ethdev.h | 3 ++- drivers/net/virtio/virtio_pci.h | 3 +++ lib/librte_vhost/rte_virtio_net.h | 31 ++ lib/librte_vhost/vhost.c| 42 ++- lib/librte_vhost/vhost.h| 9 +++- lib/librte_vhost/vhost_user.c | 44 +++-- lib/librte_vhost/vhost_user.h | 5 - 12 files changed, 171 insertions(+), 13 deletions(-) -- 2.9.3
[dpdk-dev] [PATCH 0/2] Add ZUC EEA3 capability to Intel(R) QuickAssist Technology driver
This patchset add ZUC EEA3 stream cipher capability to Intel(R) QuickAssist Technology driver and corresponding test cases to QAT test suite Arek Kusztal (2): crypto/qat: add ZUC EEA3 cipher capability app/test: add ZUC EEA3 test cases to QAT test suite app/test/test_cryptodev.c| 12 +++ doc/guides/cryptodevs/qat.rst| 2 + drivers/crypto/qat/qat_adf/qat_algs.h| 11 ++- drivers/crypto/qat/qat_adf/qat_algs_build_desc.c | 97 drivers/crypto/qat/qat_crypto.c | 34 - 5 files changed, 137 insertions(+), 19 deletions(-) -- 2.7.4
[dpdk-dev] [PATCH 1/2] crypto/qat: add ZUC EEA3 cipher capability
This commit adds ZUC EEA3 cipher capability to Intel(R) QuickAssist Technology driver Signed-off-by: Arek Kusztal --- doc/guides/cryptodevs/qat.rst| 2 + drivers/crypto/qat/qat_adf/qat_algs.h| 11 ++- drivers/crypto/qat/qat_adf/qat_algs_build_desc.c | 97 drivers/crypto/qat/qat_crypto.c | 34 - 4 files changed, 125 insertions(+), 19 deletions(-) diff --git a/doc/guides/cryptodevs/qat.rst b/doc/guides/cryptodevs/qat.rst index 9ecd19b..79b9c9d 100644 --- a/doc/guides/cryptodevs/qat.rst +++ b/doc/guides/cryptodevs/qat.rst @@ -55,6 +55,7 @@ Cipher algorithms: * ``RTE_CRYPTO_CIPHER_NULL`` * ``RTE_CRYPTO_CIPHER_KASUMI_F8`` * ``RTE_CRYPTO_CIPHER_DES_CBC`` +* ``RTE_CRYPTO_CIPHER_ZUC_EEA3`` Hash algorithms: @@ -79,6 +80,7 @@ Limitations * SNOW 3G (UEA2) and KASUMI (F8) supported only if cipher length, cipher offset fields are byte-aligned. * SNOW 3G (UIA2) and KASUMI (F9) supported only if hash length, hash offset fields are byte-aligned. * No BSD support as BSD QAT kernel driver not available. +* ZUC EEA3 is not supported by dh895xcc devices Installation diff --git a/drivers/crypto/qat/qat_adf/qat_algs.h b/drivers/crypto/qat/qat_adf/qat_algs.h index b9e3fd6..37f64d4 100644 --- a/drivers/crypto/qat/qat_adf/qat_algs.h +++ b/drivers/crypto/qat/qat_adf/qat_algs.h @@ -80,6 +80,14 @@ struct qat_alg_buf { uint64_t addr; } __rte_packed; +enum qat_crypto_proto_flag { + QAT_CRYPTO_PROTO_FLAG_NONE = 0, + QAT_CRYPTO_PROTO_FLAG_CCM = 1, + QAT_CRYPTO_PROTO_FLAG_GCM = 2, + QAT_CRYPTO_PROTO_FLAG_SNOW3G = 3, + QAT_CRYPTO_PROTO_FLAG_ZUC = 4 +}; + /* * Maximum number of SGL entries */ @@ -143,7 +151,7 @@ int qat_alg_aead_session_create_content_desc_auth(struct qat_session *cdesc, unsigned int operation); void qat_alg_init_common_hdr(struct icp_qat_fw_comn_req_hdr *header, - uint16_t proto); + enum qat_crypto_proto_flag proto_flags); void qat_alg_ablkcipher_init_enc(struct qat_alg_ablkcipher_cd *cd, int alg, const uint8_t *key, @@ -158,4 +166,5 @@ int qat_alg_validate_snow3g_key(int key_len, enum icp_qat_hw_cipher_algo *alg); int qat_alg_validate_kasumi_key(int key_len, enum icp_qat_hw_cipher_algo *alg); int qat_alg_validate_3des_key(int key_len, enum icp_qat_hw_cipher_algo *alg); int qat_alg_validate_des_key(int key_len, enum icp_qat_hw_cipher_algo *alg); +int qat_alg_validate_zuc_key(int key_len, enum icp_qat_hw_cipher_algo *alg); #endif diff --git a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c index fbeef0a..3831d19 100644 --- a/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c +++ b/drivers/crypto/qat/qat_adf/qat_algs_build_desc.c @@ -422,7 +422,7 @@ static int qat_alg_do_precomputes(enum icp_qat_hw_auth_algo hash_alg, } void qat_alg_init_common_hdr(struct icp_qat_fw_comn_req_hdr *header, - uint16_t proto) + enum qat_crypto_proto_flag proto_flags) { PMD_INIT_FUNC_TRACE(); header->hdr_flags = @@ -435,14 +435,60 @@ void qat_alg_init_common_hdr(struct icp_qat_fw_comn_req_hdr *header, ICP_QAT_FW_LA_PARTIAL_NONE); ICP_QAT_FW_LA_CIPH_IV_FLD_FLAG_SET(header->serv_specif_flags, ICP_QAT_FW_CIPH_IV_16BYTE_DATA); - ICP_QAT_FW_LA_PROTO_SET(header->serv_specif_flags, - proto); + + switch (proto_flags){ + case QAT_CRYPTO_PROTO_FLAG_NONE: + ICP_QAT_FW_LA_PROTO_SET(header->serv_specif_flags, + ICP_QAT_FW_LA_NO_PROTO); + break; + case QAT_CRYPTO_PROTO_FLAG_CCM: + ICP_QAT_FW_LA_PROTO_SET(header->serv_specif_flags, + ICP_QAT_FW_LA_CCM_PROTO); + break; + case QAT_CRYPTO_PROTO_FLAG_GCM: + ICP_QAT_FW_LA_PROTO_SET(header->serv_specif_flags, + ICP_QAT_FW_LA_GCM_PROTO); + break; + case QAT_CRYPTO_PROTO_FLAG_SNOW3G: + ICP_QAT_FW_LA_PROTO_SET(header->serv_specif_flags, + ICP_QAT_FW_LA_SNOW_3G_PROTO); + break; + case QAT_CRYPTO_PROTO_FLAG_ZUC: + ICP_QAT_FW_LA_ZUC_3G_PROTO_FLAG_SET(header->serv_specif_flags, + ICP_QAT_FW_LA_ZUC_3G_PROTO); + break; + } + ICP_QAT_FW_LA_UPDATE_STATE_SET(header->serv_specif_flags, ICP_QAT_FW_LA_NO_UPDATE_STATE); ICP_QAT_FW_LA_DIGEST_IN_BUFFER_SET(header->serv_specif_flags, ICP_QAT_FW_LA_NO_DIGEST_IN_BUFFER);
[dpdk-dev] [PATCH 2/2] app/test: add ZUC EEA3 test cases to QAT test suite
This patch adds ZUC EEA3 cipher test cases to Intel(R) QuickAssist Technology crypto test suite Signed-off-by: Arek Kusztal --- app/test/test_cryptodev.c | 12 1 file changed, 12 insertions(+) diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c index 357a92e..45ddd35 100644 --- a/app/test/test_cryptodev.c +++ b/app/test/test_cryptodev.c @@ -7637,6 +7637,18 @@ static struct unit_test_suite cryptodev_qat_testsuite = { TEST_CASE_ST(ut_setup, ut_teardown, test_snow3g_auth_cipher_test_case_1), + /** ZUC encrypt only (EEA3) */ + TEST_CASE_ST(ut_setup, ut_teardown, + test_zuc_encryption_test_case_1), + TEST_CASE_ST(ut_setup, ut_teardown, + test_zuc_encryption_test_case_2), + TEST_CASE_ST(ut_setup, ut_teardown, + test_zuc_encryption_test_case_3), + TEST_CASE_ST(ut_setup, ut_teardown, + test_zuc_encryption_test_case_4), + TEST_CASE_ST(ut_setup, ut_teardown, + test_zuc_encryption_test_case_5), + /** HMAC_MD5 Authentication */ TEST_CASE_ST(ut_setup, ut_teardown, test_MD5_HMAC_generate_case_1), -- 2.7.4