Re: [dpdk-dev] [PATCH v1] app/pdump: add check for PCAP PMD
Hi Ferruh,

> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Monday, March 5, 2018 2:33 PM
> To: Varghese, Vipin ; dev@dpdk.org; Pattan, Reshma
> Cc: Mcnamara, John
> Subject: Re: [dpdk-dev] [PATCH v1] app/pdump: add check for PCAP PMD
>
> On 3/5/2018 7:57 AM, Vipin Varghese wrote:
> > dpdk-pdump makes use of LIBRTE_PMD_PCAP for interfacing the ring to
> > the device-queue pair. Updating Makefile to check for the same.
> >
> > Signed-off-by: Vipin Varghese
> > ---
> > app/pdump/Makefile | 4
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/app/pdump/Makefile b/app/pdump/Makefile
> > index bd3c208..038a34f 100644
> > --- a/app/pdump/Makefile
> > +++ b/app/pdump/Makefile
> > @@ -3,6 +3,10 @@
> >
> > include $(RTE_SDK)/mk/rte.vars.mk
> >
> > +ifeq ($(CONFIG_RTE_LIBRTE_PMD_PCAP),n)
> > +$(error "Please enable CONFIG_RTE_LIBRTE_PMD_PCAP")
> > +endif
>
> pdump is enabled default, so won't this break the default build?

Yes, you are right, it will fail, which then forces the user to enable PCAP.

> What about moving this to lib/librte_pdump, convert $(error ..) to
> $(warning ..) and disable CONFIG_RTE_LIBRTE_PDUMP there?

If we change it to a warning and there are no PCAP headers in the build
system, the application still gets built, but it will fail internally
because the pcap API calls fail during execution.

> > +
> > ifeq ($(CONFIG_RTE_LIBRTE_PDUMP),y)
> >
> > APP = dpdk-pdump
> >
Re: [dpdk-dev] [PATCH 3/4] drivers/net: do not allocate rte_eth_dev_data privately
> -Original Message- > From: Matan Azrad [mailto:ma...@mellanox.com] > Sent: Tuesday, March 6, 2018 2:08 PM > To: Tan, Jianfeng; Yigit, Ferruh > Cc: Richardson, Bruce; Ananyev, Konstantin; Thomas Monjalon; > maxime.coque...@redhat.com; Burakov, Anatoly; dev@dpdk.org > Subject: RE: [dpdk-dev] [PATCH 3/4] drivers/net: do not allocate > rte_eth_dev_data privately > > Hi Jianfeng > > Please see a comment below. > > > From: Jianfeng Tan, Sent: Sunday, March 4, 2018 5:30 PM > > We introduced private rte_eth_dev_data to allow vdev to be created both > in > > primary process and secondary process(es). This is not friendly to multi- > > process model, for example, it leads to port id contention issue if two > > processes both find the data entry is free. > > > > And to get stats of primary vdev in secondary, we must allocate from the > > pre-defined array so that we can find it. > > > > Suggested-by: Bruce Richardson > > Signed-off-by: Jianfeng Tan > > --- > > drivers/net/af_packet/rte_eth_af_packet.c | 25 +++-- > > drivers/net/kni/rte_eth_kni.c | 13 ++--- > > drivers/net/null/rte_eth_null.c | 17 +++-- > > drivers/net/octeontx/octeontx_ethdev.c| 14 ++ > > drivers/net/pcap/rte_eth_pcap.c | 18 +++--- > > drivers/net/tap/rte_eth_tap.c | 9 + > > drivers/net/vhost/rte_eth_vhost.c | 17 ++--- > > 7 files changed, 20 insertions(+), 93 deletions(-) > > > > diff --git a/drivers/net/af_packet/rte_eth_af_packet.c > > b/drivers/net/af_packet/rte_eth_af_packet.c > > index 57eccfd..2db692f 100644 > > --- a/drivers/net/af_packet/rte_eth_af_packet.c > > +++ b/drivers/net/af_packet/rte_eth_af_packet.c > > @@ -564,25 +564,17 @@ rte_pmd_init_internals(struct rte_vdev_device > > *dev, > > RTE_LOG(ERR, PMD, > > "%s: no interface specified for AF_PACKET > > ethdev\n", > > name); > > - goto error_early; > > + return -1; > > } > > > > RTE_LOG(INFO, PMD, > > "%s: creating AF_PACKET-backed ethdev on numa socket > > %u\n", > > name, numa_node); > > > > - /* > > -* now do all data allocation - for 
eth_dev structure, dummy pci > > driver > > -* and internal (private) data > > -*/ > > - data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node); > > - if (data == NULL) > > - goto error_early; > > - > > *internals = rte_zmalloc_socket(name, sizeof(**internals), > > 0, numa_node); > > if (*internals == NULL) > > - goto error_early; > > + return -1; > > > > for (q = 0; q < nb_queues; q++) { > > (*internals)->rx_queue[q].map = MAP_FAILED; @@ -604,24 > > +596,24 @@ rte_pmd_init_internals(struct rte_vdev_device *dev, > > RTE_LOG(ERR, PMD, > > "%s: I/F name too long (%s)\n", > > name, pair->value); > > - goto error_early; > > + return -1; > > } > > if (ioctl(sockfd, SIOCGIFINDEX, &ifr) == -1) { > > RTE_LOG(ERR, PMD, > > "%s: ioctl failed (SIOCGIFINDEX)\n", > > name); > > - goto error_early; > > + return -1; > > } > > (*internals)->if_name = strdup(pair->value); > > if ((*internals)->if_name == NULL) > > - goto error_early; > > + return -1; > > (*internals)->if_index = ifr.ifr_ifindex; > > > > if (ioctl(sockfd, SIOCGIFHWADDR, &ifr) == -1) { > > RTE_LOG(ERR, PMD, > > "%s: ioctl failed (SIOCGIFHWADDR)\n", > > name); > > - goto error_early; > > + return -1; > > } > > memcpy(&(*internals)->eth_addr, ifr.ifr_hwaddr.sa_data, > > ETH_ALEN); > > > > @@ -775,14 +767,13 @@ rte_pmd_init_internals(struct rte_vdev_device > > *dev, > > > > (*internals)->nb_queues = nb_queues; > > > > - rte_memcpy(data, (*eth_dev)->data, sizeof(*data)); > > + data = (*eth_dev)->data; > > data->dev_private = *internals; > > data->nb_rx_queues = (uint16_t)nb_queues; > > data->nb_tx_queues = (uint16_t)nb_queues; > > data->dev_link = pmd_link; > > data->mac_addrs = &(*internals)->eth_addr; > > > > - (*eth_dev)->data = data; > > (*eth_dev)->dev_ops = &ops; > > > > return 0; > > @@ -802,8 +793,6 @@ rte_pmd_init_internals(struct rte_vdev_device > *dev, > > } > > free((*internals)->if_name); > > rte_free(*internals); > > -error_early: > > - rte_free(data); > > return -1; > > } > > > > I think you should 
remove the private rte_eth_dev_data freeing in
> rte_pmd_af_packet_remove().
> This is relevant to all the vdevs here.

Ah, yes, you are correct. I will fix that in v2.

> Question:
> Does the patch include all the vdevs which allocated private
> rte_eth_dev_data?

Yes, we are removing all privat
Re: [dpdk-dev] [PATCH v2] doc: add driver limitation for vhost dequeue zero copy
On 02/27/2018 10:21 AM, Junjie Chen wrote:
> In vhost-switch example, when binding nic to vfio-pci, dequeue zero copy
> cannot work in VM2NIC mode due to no iommu dma mapping is setup for guest
> memory currently.
>
> Signed-off-by: Junjie Chen
> ---
> Changes in V2:
> - add doc in vhost lib
>
> doc/guides/prog_guide/vhost_lib.rst | 3 +++
> doc/guides/sample_app_ug/vhost.rst | 5 -
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
> index 18227b6..bdf77d6 100644
> --- a/doc/guides/prog_guide/vhost_lib.rst
> +++ b/doc/guides/prog_guide/vhost_lib.rst
> @@ -83,6 +83,9 @@ The following is an overview of some key Vhost API functions:
>  of those segments, thus the fewer the segments, the quicker we will get
>  the mapping. NOTE: we may speed it by using tree searching in future.
>
> +* zero copy does not work when using vfio-pci driver currently, this is
> + because we don't setup iommu dma mapping for guest memory.
> +

I guess that it should work with vfio-pci in noiommu mode?
Maybe worth clarifying.

>  - ``RTE_VHOST_USER_IOMMU_SUPPORT``
>
>  IOMMU support will be enabled when this flag is set. It is disabled by
> diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst
> index a4bdc6a..840c1fd 100644
> --- a/doc/guides/sample_app_ug/vhost.rst
> +++ b/doc/guides/sample_app_ug/vhost.rst
> @@ -147,7 +147,10 @@ retries on an RX burst, it takes effect only when rx retry is enabled. The
>  default value is 15.
>
>  **--dequeue-zero-copy**
> -Dequeue zero copy will be enabled when this option is given.
> +Dequeue zero copy will be enabled when this option is given, it is worth to
> +note that if NIC is binded to vfio-pci driver, dequeue zero copy cannot work
> +at VM2NIC mode (vm2vm=0) due to currently we don't setup iommu dma mapping for
> +guest memory.
>
>  **--vlan-strip 0|1**
>  VLAN strip option is removed, because different NICs have different behaviors
Re: [dpdk-dev] [PATCH] vhost: add note about sockets in server mode
Hi Ilya,

On 02/26/2018 09:39 AM, Ilya Maximets wrote:
> From time to time, someone sends patches about unlinking existing sockets
> when registering a vhost user in server mode. A recent example:
>
> http://dpdk.org/ml/archives/dev/2018-February/090025.html
>
> This problem has been discussed many times, and it was made clear that
> the library should not unlink files given by the application in order to
> avoid possible security problems, such as removing random files used by
> other programs. One of the first discussions:
>
> http://dpdk.org/ml/archives/dev/2015-December/030326.html
>
> To avoid such patches in the future, it was decided to add a comment that
> explains what is happening and tries to describe the reasoning.
>
> Signed-off-by: Ilya Maximets
> ---
>
> I'm open for suggestions.
> Wording/grammar fixes are also welcome.
>
> lib/librte_vhost/socket.c | 10 ++
> 1 file changed, 10 insertions(+)
>
> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> index 83befdc..e8584f3 100644
> --- a/lib/librte_vhost/socket.c
> +++ b/lib/librte_vhost/socket.c
> @@ -318,6 +318,16 @@ vhost_user_start_server(struct vhost_user_socket *vsocket)
>  int fd = vsocket->socket_fd;
>  const char *path = vsocket->path;
>
> + /*
> + * bind () may fail if the socket file with the same name already
> + * exists. But the library obviously should not delete the file
> + * provided by the user, since we can not be sure that it is not
> + * being used by other applications. Moreover, many applications form
> + * socket names based on user input, which is prone to errors.
> + *
> + * The user must ensure that the socket does not exist before
> + * registering the vhost driver in server mode.
> + */
>  ret = bind(fd, (struct sockaddr *)&vsocket->un, sizeof(vsocket->un));
>  if (ret < 0) {
>  RTE_LOG(ERR, VHOST_CONFIG,

Reviewed-by: Maxime Coquelin

Thanks!
Maxime
Re: [dpdk-dev] [RFC 1/4] drivers/bus/ifpga:Intel FPGA Bus Lib Code
-Original Message- From: Shreyansh Jain [mailto:shreyansh.j...@nxp.com] Sent: Tuesday, March 06, 2018 14:10 To: Xu, Rosen Cc: dev@dpdk.org; Doherty, Declan ; Zhang, Tianfei Subject: Re: [dpdk-dev] [RFC 1/4] drivers/bus/ifpga:Intel FPGA Bus Lib Code Hello Rosen, I have some initial (and most of them trivial) comments inline... On Tue, Mar 6, 2018 at 7:13 AM, Rosen Xu wrote: > Signed-off-by: Rosen Xu > --- > drivers/bus/ifpga/Makefile | 64 > drivers/bus/ifpga/ifpga_bus.c | 527 > > drivers/bus/ifpga/ifpga_common.c| 168 + > drivers/bus/ifpga/ifpga_common.h| 46 +++ > drivers/bus/ifpga/ifpga_logs.h | 59 > drivers/bus/ifpga/rte_bus_ifpga.h | 153 > drivers/bus/ifpga/rte_bus_ifpga_version.map | 8 + > 7 files changed, 1025 insertions(+) > create mode 100644 drivers/bus/ifpga/Makefile create mode 100644 > drivers/bus/ifpga/ifpga_bus.c create mode 100644 > drivers/bus/ifpga/ifpga_common.c create mode 100644 > drivers/bus/ifpga/ifpga_common.h create mode 100644 > drivers/bus/ifpga/ifpga_logs.h create mode 100644 > drivers/bus/ifpga/rte_bus_ifpga.h create mode 100644 > drivers/bus/ifpga/rte_bus_ifpga_version.map > > diff --git a/drivers/bus/ifpga/Makefile b/drivers/bus/ifpga/Makefile > new file mode 100644 index 000..c71f186 > --- /dev/null > +++ b/drivers/bus/ifpga/Makefile > @@ -0,0 +1,64 @@ > +# BSD LICENSE > +# > +# Copyright(c) 2010-2017 Intel Corporation. All rights reserved. > +# All rights reserved. > +# > +# Redistribution and use in source and binary forms, with or without > +# modification, are permitted provided that the following conditions > +# are met: > +# > +# * Redistributions of source code must retain the above copyright > +# notice, this list of conditions and the following disclaimer. > +# * Redistributions in binary form must reproduce the above copyright > +# notice, this list of conditions and the following disclaimer in > +# the documentation and/or other materials provided with the > +# distribution. 
> +# * Neither the name of Intel Corporation nor the names of its > +# contributors may be used to endorse or promote products derived > +# from this software without specific prior written permission. > +# > +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. As of 18.02, I think all licensing has moved to SPDX. Maybe in formal patch you should change to that. Rosen: I will modify it > + > +include $(RTE_SDK)/mk/rte.vars.mk > + > +# > +# library name > +# > +LIB = librte_bus_ifpga.a > +LIBABIVER := 1 > +EXPORT_MAP := rte_bus_ifpga_version.map > + > +ifeq ($(CONFIG_RTE_LIBRTE_DPAA2_DEBUG_INIT),y) I think this is copy-paste issue - isn't it? Rosen: yes (CONFIG_RTE_LIBRTE_DPAA2_DEBUG_INIT) I see that you have already enabled dynamic logging - in which case you won't need this anyway. 
Rosen: ok > +CFLAGS += -O0 -g > +CFLAGS += "-Wno-error" > +else > +CFLAGS += -O3 > +CFLAGS += $(WERROR_FLAGS) > +endif > + > +CFLAGS += -I$(RTE_SDK)/drivers/bus/ifpga CFLAGS += > +-I$(RTE_SDK)/drivers/bus/pci CFLAGS += > +-I$(RTE_SDK)/lib/librte_eal/linuxapp/eal > +CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common > +#CFLAGS += -I$(RTE_SDK)/lib/librte_rawdev #LDLIBS += -lrte_eal > +-lrte_mbuf -lrte_mempool -lrte_ring -lrte_rawdev LDLIBS += -lrte_eal > +-lrte_mbuf -lrte_mempool -lrte_ring #LDLIBS += -lrte_ethdev > + > +VPATH += $(SRCDIR)/base > + > +SRCS-y += \ > +ifpga_bus.c \ > +ifpga_common.c > + > +include $(RTE_SDK)/mk/rte.lib.mk > diff --git a/drivers/bus/ifpga/ifpga_bus.c > b/drivers/bus/ifpga/ifpga_bus.c new file mode 100644 index > 000..382d550 > --- /dev/null > +++ b/drivers/bus/ifpga/ifpga_bus.c > @@ -0,0 +1,527 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > + * Copyright 2013-2014 6WIND S.A. > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistri
[dpdk-dev] [PATCH] net/bonding: avoid wrong casting on primary_slave_port_id from input param
From: Gowrishankar Muthukrishnan

primary_slave_port_id is uint16_t which needs to be correctly stored with
the same data type of input parameter in bond_ethdev_configure.

Fixes: f8244c6399 ("ethdev: increase port id range")
Cc: sta...@dpdk.org

Signed-off-by: Gowrishankar Muthukrishnan
---
In powerpc, creating bond pmd results in below error due to wrong cast on
input param. This is reproducible, only when using shared libraries.

sudo -E LD_LIBRARY_PATH=$PWD/$RTE_TARGET/lib $RTE_TARGET/app/testpmd \
  -l 0,8 --socket-mem=1024,1024 \
  --vdev 'net_tap0,iface=dpdktap0' --vdev 'net_tap1,iface=dpdktap1' \
  --vdev 'net_bonding0,mode=1,slave=0,slave=1,primary=0,socket_id=1' \
  -d $RTE_TARGET/lib/librte_pmd_tap.so \
  -d $RTE_TARGET/lib/librte_mempool_ring.so -- --forward-mode=rxonly

Configuring Port 0 (socket 0)
PMD: net_tap0: 0x70a854070280: TX configured queues number: 1
PMD: net_tap0: 0x70a854070280: RX configured queues number: 1
Port 0: 86:EA:6D:52:3E:DB
Configuring Port 1 (socket 0)
PMD: net_tap1: 0x70a854074300: TX configured queues number: 1
PMD: net_tap1: 0x70a854074300: RX configured queues number: 1
Port 1: 42:9A:B8:49:B6:00
Configuring Port 2 (socket 1)
EAL: Failed to set primary slave port 7424 on bonded device net_bonding0
Fail to configure port 2
EAL: Error - exiting with code: 1
  Cause: Start ports failed

 drivers/net/bonding/rte_eth_bond_args.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bonding/rte_eth_bond_args.c b/drivers/net/bonding/rte_eth_bond_args.c
index 27d3101..e99681e 100644
--- a/drivers/net/bonding/rte_eth_bond_args.c
+++ b/drivers/net/bonding/rte_eth_bond_args.c
@@ -244,7 +244,7 @@
 	if (primary_slave_port_id < 0)
 		return -1;

-	*(uint8_t *)extra_args = (uint8_t)primary_slave_port_id;
+	*(uint16_t *)extra_args = (uint16_t)primary_slave_port_id;

 	return 0;
 }
--
1.9.1
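The failure mode behind the one-line fix can be reproduced outside DPDK. The sketch below is an illustration only: the function names and the 0x1D1D "leftover" pattern are made up for the example, and it assumes the caller's destination is a uint16_t that was not zeroed beforehand. An 8-bit store through the void pointer updates a single byte of that 16-bit object, so the other byte keeps whatever was there. The reported port 7424 is 0x1D00, which has a zero low byte (the requested primary=0) and a non-zero high byte, the pattern expected if only the low byte was written.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the old code path: only one byte of the destination is written. */
static void
store_port_narrow(void *extra_args, int port_id)
{
	*(uint8_t *)extra_args = (uint8_t)port_id;
}

/* Mirrors the fixed code path: the full 16-bit destination is written. */
static void
store_port_wide(void *extra_args, int port_id)
{
	*(uint16_t *)extra_args = (uint16_t)port_id;
}

/*
 * Hypothetical caller: the destination is a uint16_t holding leftover
 * data (0x1D1D here, an arbitrary illustrative pattern) before the
 * argument handler runs.
 */
static uint16_t
parse_primary(void (*store)(void *, int), int port_id)
{
	uint16_t slot = 0x1D1D;

	store(&slot, port_id);
	return slot;
}
```

With the narrow store and port 0, the result is 0x1D00 on a little-endian machine and 0x001D on a big-endian one; in both cases it is not the port that was requested. With the wide store the result is 0 regardless of byte order.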
Re: [dpdk-dev] [PATCH] ethdev: return diagnostic when setting MAC address
Hi,

Some comments inline.

On Tue, Feb 27, 2018 at 04:11:29PM +0100, Olivier Matz wrote:
> Change the prototype and the behavior of dev_ops->eth_mac_addr_set(): a
> return code is added to notify the caller (librte_ether) if an error
> occurred in the PMD.
>
> The new default MAC address is now copied in dev->data->mac_addrs[0]
> only if the operation is successful.
>
> The patch also updates all the PMDs accordingly.
>
> Signed-off-by: Olivier Matz
> ---
>
> Hi,
>
> This patch follows up on the discussion we had in this thread:
> https://dpdk.org/dev/patchwork/patch/32284/
>
> I did my best to keep the consistency inside the PMDs. The behavior
> of eth_mac_addr_set() is inspired from other functions in the same
> PMD, usually eth_mac_addr_add(). For instance:
> - dpaa and dpaa2 return 0 on error.
> - some PMDs (bnxt, mlx5, ...?) do not return a -errno code (-1 or
>   positive values).
> - some PMDs (avf, tap) check if the address is the same and return 0
>   in that case. This could go in generic code?
>
> I tried to use the following errors when relevant:
> - -EPERM when a VF is not allowed to do a change
> - -ENOTSUP if the function is not supported
> - -EIO if this is an unknown error from lower layer (hw or sdk)
> - -EINVAL for other unknown errors
>
> Please, PMD maintainers, feel free to comment if you have specific
> needs for your driver.
> > Thanks > Olivier > > > doc/guides/rel_notes/deprecation.rst| 8 > drivers/net/ark/ark_ethdev.c| 9 ++--- > drivers/net/avf/avf_ethdev.c| 12 > drivers/net/bnxt/bnxt_ethdev.c | 10 ++ > drivers/net/bonding/rte_eth_bond_pmd.c | 8 ++-- > drivers/net/dpaa/dpaa_ethdev.c | 4 +++- > drivers/net/dpaa2/dpaa2_ethdev.c| 6 -- > drivers/net/e1000/igb_ethdev.c | 12 +++- > drivers/net/failsafe/failsafe_ops.c | 16 +--- > drivers/net/i40e/i40e_ethdev.c | 24 ++- > drivers/net/i40e/i40e_ethdev_vf.c | 12 +++- > drivers/net/ixgbe/ixgbe_ethdev.c| 13 - > drivers/net/mlx4/mlx4.h | 2 +- > drivers/net/mlx4/mlx4_ethdev.c | 7 +-- > drivers/net/mlx5/mlx5.h | 2 +- > drivers/net/mlx5/mlx5_mac.c | 7 +-- > drivers/net/mrvl/mrvl_ethdev.c | 7 ++- > drivers/net/null/rte_eth_null.c | 3 ++- > drivers/net/octeontx/octeontx_ethdev.c | 4 +++- > drivers/net/qede/qede_ethdev.c | 7 +++ > drivers/net/sfc/sfc_ethdev.c| 14 +- > drivers/net/szedata2/rte_eth_szedata2.c | 3 ++- > drivers/net/tap/rte_eth_tap.c | 34 > + > drivers/net/virtio/virtio_ethdev.c | 15 ++- > drivers/net/vmxnet3/vmxnet3_ethdev.c| 5 +++-- > lib/librte_ether/rte_ethdev.c | 7 +-- > lib/librte_ether/rte_ethdev_core.h | 2 +- > test/test/virtual_pmd.c | 3 ++- > 28 files changed, 159 insertions(+), 97 deletions(-) > > diff --git a/doc/guides/rel_notes/deprecation.rst > b/doc/guides/rel_notes/deprecation.rst > index 74c18ed7c..2bf360f0d 100644 > --- a/doc/guides/rel_notes/deprecation.rst > +++ b/doc/guides/rel_notes/deprecation.rst > @@ -134,14 +134,6 @@ Deprecation Notices >between the VF representor and the VF or the parent PF. Those new fields >are to be included in ``rte_eth_dev_info`` struct. > > -* ethdev: The prototype and the behavior of > - ``dev_ops->eth_mac_addr_set()`` will change in v18.05. A return code > - will be added to notify the caller if an error occurred in the PMD. 
In > - ``rte_eth_dev_default_mac_addr_set()``, the new default MAC address > - will be copied in ``dev->data->mac_addrs[0]`` only if the operation is > - successful. This modification will only impact the PMDs, not the > - applications. > - > * ethdev: functions add rx/tx callback will return named opaque type >``rte_eth_add_rx_callback()``, ``rte_eth_add_first_rx_callback()`` and >``rte_eth_add_tx_callback()`` functions currently return callback object as > diff --git a/drivers/net/ark/ark_ethdev.c b/drivers/net/ark/ark_ethdev.c > index ff87c20e2..3fc40cd74 100644 > --- a/drivers/net/ark/ark_ethdev.c > +++ b/drivers/net/ark/ark_ethdev.c > @@ -69,7 +69,7 @@ static int eth_ark_dev_set_link_down(struct rte_eth_dev > *dev); > static int eth_ark_dev_stats_get(struct rte_eth_dev *dev, > struct rte_eth_stats *stats); > static void eth_ark_dev_stats_reset(struct rte_eth_dev *dev); > -static void eth_ark_set_default_mac_addr(struct rte_eth_dev *dev, > +static int eth_ark_set_default_mac_addr(struct rte_eth_dev *dev, >struct ether_addr *mac_addr); > static int eth_ark_macaddr_add(struct rte_eth_dev *dev, > struct ether_addr *mac_addr, > @@ -887,16 +887,19 @@ eth_ark_macaddr_remove(struct
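The behavior change described in the patch description above (the callback returns a code, and the cached default MAC address is updated only on success) can be sketched in isolation as follows. This is not the librte_ether code: struct sketch_dev, sketch_default_mac_addr_set() and the two stand-in PMD callbacks are hypothetical names used only for illustration, and treating any negative value as failure is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

struct mac_addr {
	uint8_t bytes[6];
};

struct sketch_dev;

/* New-style callback: returns 0 on success, negative errno on failure. */
typedef int (*mac_addr_set_t)(struct sketch_dev *dev,
			      const struct mac_addr *addr);

struct sketch_dev {
	mac_addr_set_t mac_addr_set;	/* NULL when the PMD lacks the op */
	struct mac_addr default_mac;	/* stands in for mac_addrs[0] */
};

/*
 * Generic-layer wrapper: ask the PMD first, and copy the new address
 * into the cached default only if the PMD reported success.
 */
static int
sketch_default_mac_addr_set(struct sketch_dev *dev,
			    const struct mac_addr *addr)
{
	int ret;

	if (dev->mac_addr_set == NULL)
		return -ENOTSUP;

	ret = dev->mac_addr_set(dev, addr);
	if (ret < 0)
		return ret;	/* cached address is left unchanged */

	dev->default_mac = *addr;
	return 0;
}

/* Stand-in PMD callbacks for the usage example. */
static int
pmd_accepts(struct sketch_dev *dev, const struct mac_addr *addr)
{
	(void)dev;
	(void)addr;
	return 0;
}

static int
pmd_vf_not_allowed(struct sketch_dev *dev, const struct mac_addr *addr)
{
	(void)dev;
	(void)addr;
	return -EPERM;	/* e.g. a VF that may not change its address */
}
```

With the previous void prototype, the wrapper had no way to tell these two callbacks apart and would have cached the new address in both cases.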
Re: [dpdk-dev] [PATCH v2 1/6] vhost: export vhost feature definitions
> -----Original Message-----
> From: Wang, Zhihong
> Sent: Tuesday, February 13, 2018 5:21 PM
> To: dev@dpdk.org
> Cc: Tan, Jianfeng; Bie, Tiwei; maxime.coque...@redhat.com;
> y...@fridaylinux.org; Liang, Cunming; Wang, Xiao W; Daly, Dan; Wang,
> Zhihong
> Subject: [PATCH v2 1/6] vhost: export vhost feature definitions
>
> This patch exports vhost-user protocol features to support device driver
> development.
>
> Signed-off-by: Zhihong Wang
> ---
> lib/librte_vhost/rte_vhost.h | 8
> lib/librte_vhost/vhost.h | 4 +---
> lib/librte_vhost/vhost_user.c | 9 +
> lib/librte_vhost/vhost_user.h | 20 +++-
> 4 files changed, 21 insertions(+), 20 deletions(-)
>
> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> index d33206997..b05162366 100644
> --- a/lib/librte_vhost/rte_vhost.h
> +++ b/lib/librte_vhost/rte_vhost.h
> @@ -29,6 +29,14 @@ extern "C" {
> #define RTE_VHOST_USER_DEQUEUE_ZERO_COPY (1ULL << 2)
> #define RTE_VHOST_USER_IOMMU_SUPPORT (1ULL << 3)
>
> +#define RTE_VHOST_USER_PROTOCOL_F_MQ 0

Instead of adding a "RTE_" prefix, I prefer to define it like this:

#ifndef VHOST_USER_PROTOCOL_F_MQ
#define VHOST_USER_PROTOCOL_F_MQ 0
#endif

Similar to other macros.

> +#define RTE_VHOST_USER_PROTOCOL_F_LOG_SHMFD 1
> +#define RTE_VHOST_USER_PROTOCOL_F_RARP 2
> +#define RTE_VHOST_USER_PROTOCOL_F_REPLY_ACK 3
> +#define RTE_VHOST_USER_PROTOCOL_F_NET_MTU 4
> +#define RTE_VHOST_USER_PROTOCOL_F_SLAVE_REQ 5
> +#define RTE_VHOST_USER_F_PROTOCOL_FEATURES 30
> +
> /**
> * Information relating to memory regions including offsets to
> * addresses in QEMUs memory file.
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index 58aec2e0d..a0b0520e2 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -174,8 +174,6 @@ struct vhost_msg {
> #define VIRTIO_F_VERSION_1 32
> #endif
>
> -#define VHOST_USER_F_PROTOCOL_FEATURES 30
> -
> /* Features supported by this builtin vhost-user net driver.
*/ > #define VIRTIO_NET_SUPPORTED_FEATURES ((1ULL << > VIRTIO_NET_F_MRG_RXBUF) | \ > (1ULL << VIRTIO_F_ANY_LAYOUT) | \ > @@ -185,7 +183,7 @@ struct vhost_msg { > (1ULL << VIRTIO_NET_F_MQ) | \ > (1ULL << VIRTIO_F_VERSION_1) | \ > (1ULL << VHOST_F_LOG_ALL) | \ > - (1ULL << > VHOST_USER_F_PROTOCOL_FEATURES) | \ > + (1ULL << > RTE_VHOST_USER_F_PROTOCOL_FEATURES) | \ > (1ULL << VIRTIO_NET_F_GSO) | \ > (1ULL << VIRTIO_NET_F_HOST_TSO4) | \ > (1ULL << VIRTIO_NET_F_HOST_TSO6) | \ > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c > index 5c5361066..c93e48e4d 100644 > --- a/lib/librte_vhost/vhost_user.c > +++ b/lib/librte_vhost/vhost_user.c > @@ -527,7 +527,7 @@ vhost_user_set_vring_addr(struct virtio_net **pdev, > VhostUserMsg *msg) > vring_invalidate(dev, vq); > > if (vq->enabled && (dev->features & > - (1ULL << > VHOST_USER_F_PROTOCOL_FEATURES))) { > + (1ULL << > RTE_VHOST_USER_F_PROTOCOL_FEATURES))) { > dev = translate_ring_addresses(dev, msg- > >payload.addr.index); > if (!dev) > return -1; > @@ -897,11 +897,11 @@ vhost_user_set_vring_kick(struct virtio_net > **pdev, struct VhostUserMsg *pmsg) > vq = dev->virtqueue[file.index]; > > /* > - * When VHOST_USER_F_PROTOCOL_FEATURES is not negotiated, > + * When RTE_VHOST_USER_F_PROTOCOL_FEATURES is not > negotiated, >* the ring starts already enabled. Otherwise, it is enabled via >* the SET_VRING_ENABLE message. >*/ > - if (!(dev->features & (1ULL << > VHOST_USER_F_PROTOCOL_FEATURES))) > + if (!(dev->features & (1ULL << > RTE_VHOST_USER_F_PROTOCOL_FEATURES))) > vq->enabled = 1; > > if (vq->kickfd >= 0) > @@ -1012,7 +1012,8 @@ vhost_user_get_protocol_features(struct > virtio_net *dev, >* Qemu versions (from v2.7.0 to v2.9.0). 
>*/ > if (!(features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))) > - protocol_features &= ~(1ULL << > VHOST_USER_PROTOCOL_F_REPLY_ACK); > + protocol_features &= > + ~(1ULL << > RTE_VHOST_USER_PROTOCOL_F_REPLY_ACK); > > msg->payload.u64 = protocol_features; > msg->size = sizeof(msg->payload.u64); > diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h > index 0fafbe6e0..066e772dd 100644 > --- a/lib/librte_vhost/vhost_user.h > +++ b/lib/librte_vhost/vhost_user.h > @@ -14,19 +14,13 @@ > > #define VHOST_MEMORY_MAX_NREGIONS 8 > > -#define VHOST_USER_PROTOCOL_F_MQ 0 > -#define VHOST_USER_PROTOCOL_
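The review comment in this thread suggests keeping the unprefixed names and guarding each definition with #ifndef, so that a definition coming from another header is not redefined. A minimal sketch of that style follows. The bit values are the ones listed in the patch; the helper functions feature_negotiated() and feature_clear() are hypothetical and only illustrate how such bit numbers are used in 64-bit feature masks.

```c
#include <assert.h>
#include <stdint.h>

/* Guarded definitions: an earlier definition from another header wins. */
#ifndef VHOST_USER_PROTOCOL_F_MQ
#define VHOST_USER_PROTOCOL_F_MQ 0
#endif

#ifndef VHOST_USER_PROTOCOL_F_REPLY_ACK
#define VHOST_USER_PROTOCOL_F_REPLY_ACK 3
#endif

#ifndef VHOST_USER_F_PROTOCOL_FEATURES
#define VHOST_USER_F_PROTOCOL_FEATURES 30
#endif

/* Hypothetical helper: test one feature bit in a negotiated mask. */
static inline int
feature_negotiated(uint64_t features, unsigned int bit)
{
	return (int)((features >> bit) & 1ULL);
}

/* Hypothetical helper: return the mask with one feature bit cleared. */
static inline uint64_t
feature_clear(uint64_t features, unsigned int bit)
{
	return features & ~(1ULL << bit);
}
```

The trade-off between the two naming schemes is the one raised in the thread: guarded unprefixed names stay compatible with definitions that may already exist elsewhere, while RTE_-prefixed names avoid the collision by living in a separate namespace.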
Re: [dpdk-dev] [PATCH v2 6/6] vhost: export new apis
> -----Original Message-----
> From: Wang, Zhihong
> Sent: Tuesday, February 13, 2018 5:21 PM
> To: dev@dpdk.org
> Cc: Tan, Jianfeng; Bie, Tiwei; maxime.coque...@redhat.com;
> y...@fridaylinux.org; Liang, Cunming; Wang, Xiao W; Daly, Dan; Wang,
> Zhihong
> Subject: [PATCH v2 6/6] vhost: export new apis
>
> This patch exports new APIs as experimental.

How about squashing this patch into patch 2, where the APIs are introduced,
together with the related doc update?

Thanks,
Jianfeng

>
> Signed-off-by: Zhihong Wang
> ---
> lib/librte_vhost/rte_vdpa.h | 16 +++-
> lib/librte_vhost/rte_vhost.h | 33 ++---
> lib/librte_vhost/rte_vhost_version.map | 19 +++
> 3 files changed, 52 insertions(+), 16 deletions(-)
>
> diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
> index 1bde36f7f..23fb471be 100644
> --- a/lib/librte_vhost/rte_vdpa.h
> +++ b/lib/librte_vhost/rte_vdpa.h
> @@ -100,15 +100,21 @@ extern struct rte_vdpa_engine *vdpa_engines[];
> extern uint32_t vdpa_engine_num;
>
> /* engine management */
> -int rte_vdpa_register_engine(const char *name, struct rte_vdpa_eng_addr *addr);
> -int rte_vdpa_unregister_engine(int eid);
> +int __rte_experimental
> +rte_vdpa_register_engine(const char *name, struct rte_vdpa_eng_addr *addr);
>
> -int rte_vdpa_find_engine_id(struct rte_vdpa_eng_addr *addr);
> +int __rte_experimental
> +rte_vdpa_unregister_engine(int eid);
>
> -int rte_vdpa_info_query(int eid, struct rte_vdpa_eng_attr *attr);
> +int __rte_experimental
> +rte_vdpa_find_engine_id(struct rte_vdpa_eng_addr *addr);
> +
> +int __rte_experimental
> +rte_vdpa_info_query(int eid, struct rte_vdpa_eng_attr *attr);
>
> /* driver register api */
> -void rte_vdpa_register_driver(struct rte_vdpa_eng_driver *drv);
> +void __rte_experimental
> +rte_vdpa_register_driver(struct rte_vdpa_eng_driver *drv);
>
> #define RTE_VDPA_REGISTER_DRIVER(nm, drv) \
> RTE_INIT(vdpainitfn_ ##nm); \
> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> index
48005d9ff..d5589c543 100644 > --- a/lib/librte_vhost/rte_vhost.h > +++ b/lib/librte_vhost/rte_vhost.h > @@ -187,7 +187,8 @@ int rte_vhost_driver_unregister(const char *path); > * @return > * 0 on success, -1 on failure > */ > -int rte_vhost_driver_set_vdpa_eid(const char *path, int eid); > +int __rte_experimental > +rte_vhost_driver_set_vdpa_eid(const char *path, int eid); > > /** > * Set the device id, enforce single connection per socket > @@ -199,7 +200,8 @@ int rte_vhost_driver_set_vdpa_eid(const char *path, > int eid); > * @return > * 0 on success, -1 on failure > */ > -int rte_vhost_driver_set_vdpa_did(const char *path, int did); > +int __rte_experimental > +rte_vhost_driver_set_vdpa_did(const char *path, int did); > > /** > * Get the engine id > @@ -209,7 +211,8 @@ int rte_vhost_driver_set_vdpa_did(const char *path, > int did); > * @return > * Engine id, -1 on failure > */ > -int rte_vhost_driver_get_vdpa_eid(const char *path); > +int __rte_experimental > +rte_vhost_driver_get_vdpa_eid(const char *path); > > /** > * Get the device id > @@ -219,7 +222,8 @@ int rte_vhost_driver_get_vdpa_eid(const char *path); > * @return > * Device id, -1 on failure > */ > -int rte_vhost_driver_get_vdpa_did(const char *path); > +int __rte_experimental > +rte_vhost_driver_get_vdpa_did(const char *path); > > /** > * Set the feature bits the vhost-user driver supports. 
> @@ -286,7 +290,8 @@ int rte_vhost_driver_get_features(const char *path, > uint64_t *features); > * @return > * 0 on success, -1 on failure > */ > -int rte_vhost_driver_get_protocol_features(const char *path, > +int __rte_experimental > +rte_vhost_driver_get_protocol_features(const char *path, > uint64_t *protocol_features); > > /** > @@ -299,7 +304,8 @@ int rte_vhost_driver_get_protocol_features(const > char *path, > * @return > * 0 on success, -1 on failure > */ > -int rte_vhost_driver_get_queue_num(const char *path, uint32_t > *queue_num); > +int __rte_experimental > +rte_vhost_driver_get_queue_num(const char *path, uint32_t > *queue_num); > > /** > * Get the feature bits after negotiation > @@ -523,7 +529,8 @@ uint32_t rte_vhost_rx_queue_count(int vid, uint16_t > qid); > * @return > * 0 on success, -1 on failure > */ > -int rte_vhost_get_log_base(int vid, uint64_t *log_base, > +int __rte_experimental > +rte_vhost_get_log_base(int vid, uint64_t *log_base, > uint64_t *log_size); > > /** > @@ -540,7 +547,8 @@ int rte_vhost_get_log_base(int vid, uint64_t > *log_base, > * @return > * 0 on success, -1 on failure > */ > -int rte_vhost_get_vring_base(int vid, uint16_t queue_id, > +int __rte_experimental > +rte_vhost_get_vring_base(int vid, uint16_t queue_id, > uint16_t *last_avail_idx, uint16_t *last_used_idx); > > /** > @@ -557,7 +565,8 @@ int rte_vhost_get_vring_base(int vid, uint16_t
Re: [dpdk-dev] OPDL and 18.02 Release Notes
The O stands for "Optimized"; we will make the necessary changes to remove
the inconsistencies.

Regards,
Peter

-----Original Message-----
From: Yigit, Ferruh
Sent: Monday, March 5, 2018 5:58 PM
To: Rosen, Rami ; dev@dpdk.org
Cc: tho...@monjalon.net; Ma, Liang J ; Mccarthy, Peter
Subject: Re: [dpdk-dev] OPDL and 18.02 Release Notes

On 2/9/2018 12:08 AM, Rosen, Rami wrote:
> Hi all,
> Following the recent announcement of DPDK 18.02-RC4, I went over
> 18.02 release notes and I have this minor query which I am not sure about:
> In the release notes:
> http://dpdk.org/doc/guides/rel_notes/release_18_02.html
> we have the following:
> ...
> The OPDL (Ordered Packet Distribution Library) eventdev ...
>
> While in http://dpdk.org/dev/roadmap
> We have:
>
> eventdev optimized packet distribution library (OPDL) driver ...
>
> So I am not sure about this inconsistency - should it be "optimized" or
> "ordered"?

According to the driver documentation (doc/guides/eventdevs/opdl.rst) it is
"Ordered Packet Distribution Library", so the release notes seem correct.

cc'ed maintainers.

>
> Regards,
> Rami Rosen
Re: [dpdk-dev] [RFC 1/4] drivers/bus/ifpga:Intel FPGA Bus Lib Code
Hi Rosen, A few comments inline. (I will skip elements already pointed out by Shreyansh.) On Tue, Mar 06, 2018 at 09:43:55AM +0800, Rosen Xu wrote: > Signed-off-by: Rosen Xu > --- > drivers/bus/ifpga/Makefile | 64 > drivers/bus/ifpga/ifpga_bus.c | 527 > > drivers/bus/ifpga/ifpga_common.c| 168 + > drivers/bus/ifpga/ifpga_common.h| 46 +++ > drivers/bus/ifpga/ifpga_logs.h | 59 > drivers/bus/ifpga/rte_bus_ifpga.h | 153 > drivers/bus/ifpga/rte_bus_ifpga_version.map | 8 + > 7 files changed, 1025 insertions(+) > create mode 100644 drivers/bus/ifpga/Makefile > create mode 100644 drivers/bus/ifpga/ifpga_bus.c > create mode 100644 drivers/bus/ifpga/ifpga_common.c > create mode 100644 drivers/bus/ifpga/ifpga_common.h > create mode 100644 drivers/bus/ifpga/ifpga_logs.h > create mode 100644 drivers/bus/ifpga/rte_bus_ifpga.h > create mode 100644 drivers/bus/ifpga/rte_bus_ifpga_version.map > > diff --git a/drivers/bus/ifpga/Makefile b/drivers/bus/ifpga/Makefile > new file mode 100644 > index 000..c71f186 > --- /dev/null > +++ b/drivers/bus/ifpga/Makefile > @@ -0,0 +1,64 @@ > +# BSD LICENSE > +# > +# Copyright(c) 2010-2017 Intel Corporation. All rights reserved. > +# All rights reserved. > +# > +# Redistribution and use in source and binary forms, with or without > +# modification, are permitted provided that the following conditions > +# are met: > +# > +# * Redistributions of source code must retain the above copyright > +# notice, this list of conditions and the following disclaimer. > +# * Redistributions in binary form must reproduce the above copyright > +# notice, this list of conditions and the following disclaimer in > +# the documentation and/or other materials provided with the > +# distribution. > +# * Neither the name of Intel Corporation nor the names of its > +# contributors may be used to endorse or promote products derived > +# from this software without specific prior written permission. 
> +# > +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + > +include $(RTE_SDK)/mk/rte.vars.mk > + > +# > +# library name > +# > +LIB = librte_bus_ifpga.a > +LIBABIVER := 1 > +EXPORT_MAP := rte_bus_ifpga_version.map > + > +ifeq ($(CONFIG_RTE_LIBRTE_DPAA2_DEBUG_INIT),y) > +CFLAGS += -O0 -g > +CFLAGS += "-Wno-error" > +else > +CFLAGS += -O3 > +CFLAGS += $(WERROR_FLAGS) > +endif > + > +CFLAGS += -I$(RTE_SDK)/drivers/bus/ifpga > +CFLAGS += -I$(RTE_SDK)/drivers/bus/pci > +CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal > +CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common > +#CFLAGS += -I$(RTE_SDK)/lib/librte_rawdev > +#LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring -lrte_rawdev > +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring > +#LDLIBS += -lrte_ethdev > + > +VPATH += $(SRCDIR)/base > + > +SRCS-y += \ > +ifpga_bus.c \ > +ifpga_common.c > + > +include $(RTE_SDK)/mk/rte.lib.mk > diff --git a/drivers/bus/ifpga/ifpga_bus.c b/drivers/bus/ifpga/ifpga_bus.c > new file mode 100644 > index 000..382d550 > --- /dev/null > +++ b/drivers/bus/ifpga/ifpga_bus.c > @@ -0,0 +1,527 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > + * Copyright 2013-2014 6WIND S.A. 
> + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior wri
Re: [dpdk-dev] [RFC 3/4] lib/librte_eal/common: Add Intel FPGA Bus Second Scan, it should be scanned after PCI Bus
-Original Message- From: Shreyansh Jain [mailto:shreyansh.j...@nxp.com] Sent: Tuesday, March 06, 2018 14:20 To: Xu, Rosen Cc: dev@dpdk.org; Doherty, Declan ; Zhang, Tianfei Subject: Re: [dpdk-dev] [RFC 3/4] lib/librte_eal/common: Add Intel FPGA Bus Second Scan, it should be scanned after PCI Bus On Tue, Mar 6, 2018 at 7:13 AM, Rosen Xu wrote: > Signed-off-by: Rosen Xu > --- > lib/librte_eal/common/eal_common_bus.c | 14 +- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/lib/librte_eal/common/eal_common_bus.c > b/lib/librte_eal/common/eal_common_bus.c > index 3e022d5..74bfa15 100644 > --- a/lib/librte_eal/common/eal_common_bus.c > +++ b/lib/librte_eal/common/eal_common_bus.c > @@ -70,15 +70,27 @@ struct rte_bus_list rte_bus_list = > rte_bus_scan(void) > { > int ret; > - struct rte_bus *bus = NULL; > + struct rte_bus *bus = NULL, *ifpga_bus = NULL; > > TAILQ_FOREACH(bus, &rte_bus_list, next) { > + if (!strcmp(bus->name, "ifpga")) { > + ifpga_bus = bus; > + continue; > + } > + > ret = bus->scan(); > if (ret) > RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n", > bus->name); > } > > + if (ifpga_bus) { > + ret = ifpga_bus->scan(); > + if (ret) > + RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n", > + ifpga_bus->name); > + } > + You are doing this just so that PCI scans are completed *before* ifpga scans? Rosen: yes Well, I understand that this certainly is an issue that we can't yet define a priority ordering of bus scans. But, I think what you are require is a simpler: In the file ifpga_bus.c: +RTE_REGISTER_BUS(IFPGA_BUS_NAME, rte_ifpga_bus.bus); <== this ... ... #define RTE_REGISTER_BUS(nm, bus) \ RTE_INIT_PRIO(businitfn_ ##nm, 110); \ If you define your own version of RTE_REGISTER_BUS with the priority number higher, it would be inserted later in the bus list. rte_register_bus doesn't do any inherent ordering. This would save the changes you are doing in the lib/librte_eal/common/eal_common_bus.c file. 
But I think there has to be a better provision of defining priority of bus scans - I am sure when new devices come in, there would be possibility of dependencies as in your case. Rosen: is the priority scan of bus is implemented? > return 0; > } > > -- > 1.8.3.1 >
[dpdk-dev] [PATCH 1/3] vhost: do not generate signal when sendmsg fails
Signed-off-by: Tiwei Bie --- lib/librte_vhost/socket.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index 0354740fa..d703d2114 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -181,7 +181,7 @@ send_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num) } do { - ret = sendmsg(sockfd, &msgh, 0); + ret = sendmsg(sockfd, &msgh, MSG_NOSIGNAL); } while (ret < 0 && errno == EINTR); if (ret < 0) { -- 2.11.0
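The effect of the flag change in this patch can be reproduced outside DPDK. What follows is only a minimal sketch, assuming Linux and an AF_UNIX stream socket pair; the helper name send_to_closed_peer is hypothetical and does not exist in librte_vhost. With MSG_NOSIGNAL, sending to a peer that has gone away is reported through errno as EPIPE instead of raising SIGPIPE, which would otherwise terminate a process that has not installed a handler.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: sends one byte on a stream socket whose peer has
 * already been closed. Returns the errno reported by sendmsg(), 0 if the
 * send unexpectedly succeeded, or -1 if the socket pair could not be made. */
static int send_to_closed_peer(void)
{
	int sv[2];
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msgh;
	ssize_t ret;
	int err = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	close(sv[1]); /* the peer goes away before we send */

	memset(&msgh, 0, sizeof(msgh));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;

	/* Same retry loop shape as in the patch: only EINTR is retried. */
	do {
		ret = sendmsg(sv[0], &msgh, MSG_NOSIGNAL);
	} while (ret < 0 && errno == EINTR);
	if (ret < 0)
		err = errno;

	close(sv[0]);
	return err;
}
```

The caller can then treat EPIPE like any other send failure and log it, which is what the existing `if (ret < 0)` branch in send_fd_message() already does.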
[dpdk-dev] [PATCH 2/3] vhost: support sending fds via send_vhost_message()
This function will be used to send fds to QEMU via slave channel. Signed-off-by: Tiwei Bie --- lib/librte_vhost/vhost_user.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 8b07b6c43..e3a1dfbfb 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -1308,13 +1308,13 @@ read_vhost_message(int sockfd, struct VhostUserMsg *msg) } static int -send_vhost_message(int sockfd, struct VhostUserMsg *msg) +send_vhost_message(int sockfd, struct VhostUserMsg *msg, int *fds, int fd_num) { if (!msg) return 0; return send_fd_message(sockfd, (char *)msg, - VHOST_USER_HDR_SIZE + msg->size, NULL, 0); + VHOST_USER_HDR_SIZE + msg->size, fds, fd_num); } static int @@ -1328,7 +1328,7 @@ send_vhost_reply(int sockfd, struct VhostUserMsg *msg) msg->flags |= VHOST_USER_VERSION; msg->flags |= VHOST_USER_REPLY_MASK; - return send_vhost_message(sockfd, msg); + return send_vhost_message(sockfd, msg, NULL, 0); } /* @@ -1643,7 +1643,7 @@ vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm) }, }; - ret = send_vhost_message(dev->slave_req_fd, &msg); + ret = send_vhost_message(dev->slave_req_fd, &msg, NULL, 0); if (ret < 0) { RTE_LOG(ERR, VHOST_CONFIG, "Failed to send IOTLB miss message (%d)\n", -- 2.11.0
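For readers unfamiliar with how file descriptors travel over a Unix socket, here is a rough, self-contained sketch of the mechanism the new fds/fd_num arguments feed into. It assumes the usual SCM_RIGHTS ancillary-data approach on Linux; send_with_fds, recv_one_fd and MAX_FDS are names invented for this illustration and are not the librte_vhost functions.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_FDS 8 /* arbitrary limit chosen for this sketch */

/* Sends buf plus an optional array of fds as SCM_RIGHTS ancillary data.
 * With fds == NULL or fd_num == 0 only the payload is sent. */
static ssize_t send_with_fds(int sock, void *buf, size_t len,
			     const int *fds, int fd_num)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	union { /* union keeps the buffer aligned for struct cmsghdr */
		char buf[CMSG_SPACE(MAX_FDS * sizeof(int))];
		struct cmsghdr align;
	} control;
	struct msghdr msgh;

	assert(fd_num >= 0 && fd_num <= MAX_FDS);
	memset(&msgh, 0, sizeof(msgh));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;

	if (fds != NULL && fd_num > 0) {
		size_t fd_size = (size_t)fd_num * sizeof(int);
		struct cmsghdr *cmsg;

		memset(&control, 0, sizeof(control));
		msgh.msg_control = control.buf;
		msgh.msg_controllen = CMSG_SPACE(fd_size);
		cmsg = CMSG_FIRSTHDR(&msgh);
		cmsg->cmsg_level = SOL_SOCKET;
		cmsg->cmsg_type = SCM_RIGHTS;
		cmsg->cmsg_len = CMSG_LEN(fd_size);
		memcpy(CMSG_DATA(cmsg), fds, fd_size);
	}
	return sendmsg(sock, &msgh, MSG_NOSIGNAL);
}

/* Receives a payload and at most one fd; returns the fd, or -1 if the
 * receive failed or no fd was attached. */
static int recv_one_fd(int sock, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} control;
	struct msghdr msgh;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&msgh, 0, sizeof(msgh));
	msgh.msg_iov = &iov;
	msgh.msg_iovlen = 1;
	msgh.msg_control = control.buf;
	msgh.msg_controllen = sizeof(control.buf);

	if (recvmsg(sock, &msgh, 0) < 0)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msgh);
	if (cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET &&
	    cmsg->cmsg_type == SCM_RIGHTS)
		memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return fd;
}
```

The received descriptor is a new number in the receiving process that refers to the same open file, which is why a reply path that never carries fds can simply pass NULL and 0, as the patch does for send_vhost_reply().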
[dpdk-dev] [PATCH 3/3] vhost: support VFIO based accelerator
This commit adds the VFIO based accelerator support to vhost. A new API is provided to support asking QEMU to do further setup to allow notifications and interrupts being delivered directly between the driver in guest and the vDPA device in host. Signed-off-by: Tiwei Bie --- lib/librte_vhost/rte_vhost.h | 28 ++ lib/librte_vhost/rte_vhost_version.map | 1 + lib/librte_vhost/vhost_user.c | 166 + lib/librte_vhost/vhost_user.h | 9 ++ 4 files changed, 204 insertions(+) diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h index d5589c543..68842e908 100644 --- a/lib/librte_vhost/rte_vhost.h +++ b/lib/librte_vhost/rte_vhost.h @@ -35,6 +35,7 @@ extern "C" { #define RTE_VHOST_USER_PROTOCOL_F_REPLY_ACK3 #define RTE_VHOST_USER_PROTOCOL_F_NET_MTU 4 #define RTE_VHOST_USER_PROTOCOL_F_SLAVE_REQ5 +#define RTE_VHOST_USER_PROTOCOL_F_VFIO 8 #define RTE_VHOST_USER_F_PROTOCOL_FEATURES 30 /** @@ -591,6 +592,33 @@ rte_vhost_get_vdpa_eid(int vid); int __rte_experimental rte_vhost_get_vdpa_did(int vid); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Enable or disable the VFIO based accelerator for vhost-user. + * + * This function is to ask QEMU to do further setup to better + * support the vDPA device at vhost user backend. With this + * setup, the notifications and interrupts will be delivered + * directly between the driver in guest and the vDPA device + * in host if platform supports e.g. EPT and Posted interrupt. + * It's nice to have, and not mandatory. 
+ * + * @param vid + * vhost device ID + * @param int + * Enable or disable + * + * @return + * 0: success + * -ENODEV: no such vhost device + * -ENOTSUP: device does not support VFIO based accelerator feature + * -EINVAL: there is no accelerator assigned to this vhost device + * -EFAULT: failed to talk with QEMU + */ +int rte_vhost_vfio_accelerator_ctrl(int vid, int enable); + #ifdef __cplusplus } #endif diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map index 36257e51b..ca970170f 100644 --- a/lib/librte_vhost/rte_vhost_version.map +++ b/lib/librte_vhost/rte_vhost_version.map @@ -72,6 +72,7 @@ EXPERIMENTAL { rte_vhost_set_vring_base; rte_vhost_get_vdpa_eid; rte_vhost_get_vdpa_did; + rte_vhost_vfio_accelerator_ctrl; rte_vdpa_register_engine; rte_vdpa_unregister_engine; rte_vdpa_find_engine_id; diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index e3a1dfbfb..a65598d80 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -35,6 +35,7 @@ #include #include #include +#include #include "iotlb.h" #include "vhost.h" @@ -1628,6 +1629,27 @@ vhost_user_msg_handler(int vid, int fd) return 0; } +static int process_slave_message_reply(struct virtio_net *dev, + const VhostUserMsg *msg) +{ + VhostUserMsg msg_reply; + + if ((msg->flags & VHOST_USER_NEED_REPLY) == 0) + return 0; + + if (read_vhost_message(dev->slave_req_fd, &msg_reply) < 0) + return -1; + + if (msg_reply.request.slave != msg->request.slave) { + RTE_LOG(ERR, VHOST_CONFIG, + "received unexpected msg type (%u), expected %u\n", + msg_reply.request.slave, msg->request.slave); + return -1; + } + + return msg_reply.payload.u64; +} + int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm) { @@ -1653,3 +1675,147 @@ vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm) return 0; } + +static int vhost_user_slave_set_vring_file(struct virtio_net *dev, + uint32_t request, + struct 
vhost_vring_file *file) +{ + int *fdp = NULL; + size_t fd_num = 0; + int ret; + struct VhostUserMsg msg = { + .request.slave = request, + .flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY, + .payload.u64 = file->index & VHOST_USER_VRING_IDX_MASK, + .size = sizeof(msg.payload.u64), + }; + + if (file->fd < 0) + msg.payload.u64 |= VHOST_USER_VRING_NOFD_MASK; + else { + fdp = &file->fd; + fd_num = 1; + } + + ret = send_vhost_message(dev->slave_req_fd, &msg, fdp, fd_num); + if (ret < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "Failed to send slave message %u (%d)\n", + request, ret); + return ret; + } + + return process_slave_message_reply(dev, &msg); +} + +static int vhost_user_slave_set_vring_notify_area(struct virtio_net *dev, +
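The payload built in vhost_user_slave_set_vring_file() above packs two pieces of information into one u64. The sketch below isolates just that encoding. The numeric mask values are an assumption taken from my reading of the vhost-user protocol description (the patch only uses the macro names), and encode_vring_file is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the patch refers to these only by the names
 * VHOST_USER_VRING_IDX_MASK and VHOST_USER_VRING_NOFD_MASK. */
#define VRING_IDX_MASK  0xffULL
#define VRING_NOFD_MASK (0x1ULL << 8)

/* Builds the u64 payload: the low bits carry the ring index, and the
 * NOFD bit is set when no file descriptor accompanies the message. */
static uint64_t encode_vring_file(unsigned int index, int fd)
{
	uint64_t payload = index & VRING_IDX_MASK;

	if (fd < 0)
		payload |= VRING_NOFD_MASK;
	return payload;
}
```

Under these assumptions, ring 3 with a valid fd encodes as 0x3, and ring 3 without an fd encodes as 0x103.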
[dpdk-dev] [PATCH 0/3] Extend vhost to support VFIO based accelerator
This patch set introduces the VFIO based accelerator support for vhost. This is a new vhost user protocol feature to better support the vDPA device at the vhost user backend. It allows interrupts/notifications being delivered between the driver in guest and the device in host directly. Dependencies: 1. This patch set depends on the below patch set for QEMU: http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06028.html Some of the enum definitions in this patch set have been updated for the latest QEMU. A new patch set for QEMU will be sent out later. 2. This patch set depends on Zhihong's "selective datapath" patch set: http://dpdk.org/ml/archives/dev/2018-March/091858.html This patch set is generated on the latest master branch of dpdk-next-virtio with Zhihong's patches applied. Best regards, Tiwei Bie Tiwei Bie (3): vhost: do not generate signal when sendmsg fails vhost: support sending fds via send_vhost_message() vhost: support VFIO based accelerator lib/librte_vhost/rte_vhost.h | 28 ++ lib/librte_vhost/rte_vhost_version.map | 1 + lib/librte_vhost/socket.c | 2 +- lib/librte_vhost/vhost_user.c | 174 - lib/librte_vhost/vhost_user.h | 9 ++ 5 files changed, 209 insertions(+), 5 deletions(-) -- 2.11.0
Re: [dpdk-dev] [RFC 3/4] lib/librte_eal/common: Add Intel FPGA Bus Second Scan, it should be scanned after PCI Bus
On Tue, Mar 06, 2018 at 10:42:14AM +, Xu, Rosen wrote: > > > -Original Message- > From: Shreyansh Jain [mailto:shreyansh.j...@nxp.com] > Sent: Tuesday, March 06, 2018 14:20 > To: Xu, Rosen > Cc: dev@dpdk.org; Doherty, Declan ; Zhang, Tianfei > > Subject: Re: [dpdk-dev] [RFC 3/4] lib/librte_eal/common: Add Intel FPGA Bus > Second Scan, it should be scanned after PCI Bus > > On Tue, Mar 6, 2018 at 7:13 AM, Rosen Xu wrote: > > Signed-off-by: Rosen Xu > > --- > > lib/librte_eal/common/eal_common_bus.c | 14 +- > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > diff --git a/lib/librte_eal/common/eal_common_bus.c > > b/lib/librte_eal/common/eal_common_bus.c > > index 3e022d5..74bfa15 100644 > > --- a/lib/librte_eal/common/eal_common_bus.c > > +++ b/lib/librte_eal/common/eal_common_bus.c > > @@ -70,15 +70,27 @@ struct rte_bus_list rte_bus_list = > > rte_bus_scan(void) > > { > > int ret; > > - struct rte_bus *bus = NULL; > > + struct rte_bus *bus = NULL, *ifpga_bus = NULL; > > > > TAILQ_FOREACH(bus, &rte_bus_list, next) { > > + if (!strcmp(bus->name, "ifpga")) { > > + ifpga_bus = bus; > > + continue; > > + } > > + > > ret = bus->scan(); > > if (ret) > > RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n", > > bus->name); > > } > > > > + if (ifpga_bus) { > > + ret = ifpga_bus->scan(); > > + if (ret) > > + RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n", > > + ifpga_bus->name); > > + } > > + > > You are doing this just so that PCI scans are completed *before* ifpga scans? > Rosen: yes > Well, I understand that this certainly is an issue that we can't yet define a > priority ordering of bus scans. > > But, I think what you are require is a simpler: > > In the file ifpga_bus.c: > > +RTE_REGISTER_BUS(IFPGA_BUS_NAME, rte_ifpga_bus.bus); <== this > ... > ... > #define RTE_REGISTER_BUS(nm, bus) \ > RTE_INIT_PRIO(businitfn_ ##nm, 110); \ > > If you define your own version of RTE_REGISTER_BUS with the priority number > higher, it would be inserted later in the bus list. 
> rte_register_bus doesn't do any inherent ordering.
> This would save the changes you are doing in the
> lib/librte_eal/common/eal_common_bus.c file.
>
> But I think there has to be a better provision of defining priority of bus
> scans - I am sure when new devices come in, there would be possibility of
> dependencies as in your case.
> Rosen: is the priority scan of bus is implemented?

No, there is no priority set for the scanning order.
However, the order in which buses are registered affects the order
in which they are scanned.

Thus, if you change the priority of your registration, you should be
able to ensure that your scan comes last.

>
> > return 0;
> > }
> >
> > --
> > 1.8.3.1
> >

--
Gaëtan Rivet
6WIND
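The suggestion in this reply relies on constructor priorities. As a rough illustration only, and assuming GCC or Clang (where constructors with a lower priority number run first), the sketch below uses a plain array as a stand-in for the bus list. register_bus, bus_position and the priority 120 are hypothetical; 110 mirrors the value quoted from RTE_REGISTER_BUS in the thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the bus list: names are appended in the
 * order the constructors run, mimicking insertion at the list tail. */
#define MAX_BUSES 4
static const char *bus_order[MAX_BUSES];
static int bus_count;

static void register_bus(const char *name)
{
	assert(bus_count < MAX_BUSES);
	bus_order[bus_count++] = name;
}

/* Declared first in the file, but given the higher priority number,
 * so it runs after the "pci" constructor below. */
static void __attribute__((constructor(120))) register_ifpga(void)
{
	register_bus("ifpga");
}

static void __attribute__((constructor(110))) register_pci(void)
{
	register_bus("pci");
}

/* Returns the position of a bus in registration order, or -1. */
static int bus_position(const char *name)
{
	for (int i = 0; i < bus_count; i++)
		if (strcmp(bus_order[i], name) == 0)
			return i;
	return -1;
}
```

A scan loop that walks the list front to back would then reach "pci" before "ifpga" without any special casing in the loop itself, which is the point being made against the strcmp-based second scan.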
[dpdk-dev] Anyone who can help?
Hi,
I ran into a problem when using git to fetch code from dpdk.org. I have not seen this before.
Does anyone know what is going on?

[root@localhost dpdk]# git pull
fatal: unable to access 'http://dpdk.org/git/dpdk/': The requested URL returned error: 502

[root@localhost wangyong]# git clone http://dpdk.org/git/dpdk
Cloning into 'dpdk'...
fatal: unable to access 'http://dpdk.org/git/dpdk/': The requested URL returned error: 502
Re: [dpdk-dev] [PATCH v4 2/5] eal: use file to check if secondary process is ready
On 02-Mar-18 3:14 PM, Anatoly Burakov wrote:
> Previously, IPC would remove sockets it considers to be "inactive"
> based on whether they have responded. We also need to prevent
> sending messages to processes that are active, but haven't yet
> finished initialization.
>
> This will create a "init file" per socket which will be removed
> after initialization is complete, to prevent primary process from
> sending messages to a process that hasn't finished its
> initialization.
>
> Signed-off-by: Anatoly Burakov
> ---

Self-NACK on this patch. Secondary processes may initialize data structures, which means IPC has to be active during init. Each subsystem will therefore have to synchronize access to IPC on its own. (For example, memory hotplug will only block IPC for a short period between rte_config_init() and memory/heap init.)

--
Thanks,
Anatoly
Re: [dpdk-dev] [PATCH 00/41] Memory Hotplug for DPDK
On 03-Mar-18 1:45 PM, Anatoly Burakov wrote: This patchset introduces dynamic memory allocation for DPDK (aka memory hotplug). Based upon RFC submitted in December [1]. For those testing this patch, there's a deadlock-at-startup issue when DPDK is started with no memory. This will be fixed in v2 (as well as dependent IPC patches), but for now the workaround is to start DPDK with -m/--socket-mem switches. -- Thanks, Anatoly
Re: [dpdk-dev] [RFC 3/4] lib/librte_eal/common: Add Intel FPGA Bus Second Scan, it should be scanned after PCI Bus
On Tue, Mar 06, 2018 at 11:46:22AM +0100, Gaëtan Rivet wrote: > On Tue, Mar 06, 2018 at 10:42:14AM +, Xu, Rosen wrote: > > > > > > -Original Message- > > From: Shreyansh Jain [mailto:shreyansh.j...@nxp.com] > > Sent: Tuesday, March 06, 2018 14:20 > > To: Xu, Rosen > > Cc: dev@dpdk.org; Doherty, Declan ; Zhang, > > Tianfei > > Subject: Re: [dpdk-dev] [RFC 3/4] lib/librte_eal/common: Add Intel FPGA Bus > > Second Scan, it should be scanned after PCI Bus > > > > On Tue, Mar 6, 2018 at 7:13 AM, Rosen Xu wrote: > > > Signed-off-by: Rosen Xu > > > --- > > > lib/librte_eal/common/eal_common_bus.c | 14 +- > > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > > > diff --git a/lib/librte_eal/common/eal_common_bus.c > > > b/lib/librte_eal/common/eal_common_bus.c > > > index 3e022d5..74bfa15 100644 > > > --- a/lib/librte_eal/common/eal_common_bus.c > > > +++ b/lib/librte_eal/common/eal_common_bus.c > > > @@ -70,15 +70,27 @@ struct rte_bus_list rte_bus_list = > > > rte_bus_scan(void) > > > { > > > int ret; > > > - struct rte_bus *bus = NULL; > > > + struct rte_bus *bus = NULL, *ifpga_bus = NULL; > > > > > > TAILQ_FOREACH(bus, &rte_bus_list, next) { > > > + if (!strcmp(bus->name, "ifpga")) { > > > + ifpga_bus = bus; > > > + continue; > > > + } > > > + > > > ret = bus->scan(); > > > if (ret) > > > RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n", > > > bus->name); > > > } > > > > > > + if (ifpga_bus) { > > > + ret = ifpga_bus->scan(); > > > + if (ret) > > > + RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n", > > > + ifpga_bus->name); > > > + } > > > + > > > > You are doing this just so that PCI scans are completed *before* ifpga > > scans? > > Rosen: yes > > Well, I understand that this certainly is an issue that we can't yet define > > a priority ordering of bus scans. > > > > But, I think what you are require is a simpler: > > > > In the file ifpga_bus.c: > > > > +RTE_REGISTER_BUS(IFPGA_BUS_NAME, rte_ifpga_bus.bus); <== this > > ... > > ... 
> > #define RTE_REGISTER_BUS(nm, bus) \ > > RTE_INIT_PRIO(businitfn_ ##nm, 110); \ > > > > If you define your own version of RTE_REGISTER_BUS with the priority number > > higher, it would be inserted later in the bus list. > > rte_register_bus doesn't do any inherent ordering. > > This would save the changes you are doing in the > > lib/librte_eal/common/eal_common_bus.c file. > > > > But I think there has to be a better provision of defining priority of bus > > scans - I am sure when new devices come in, there would be possibility of > > dependencies as in your case. > > Rosen: is the priority scan of bus is implemented? > > No, there is no priority set for scanning order. > However, the order in which buses are registered, will modify the order > in which scans are done. > > Thus, if you change the priority of your registration, you should be > able to ensure that your scan comes last. > Can we register the bus only when a PCI device match is found at runtime, e.g. as part of the PCI driver instance initialization? /Bruce
Re: [dpdk-dev] [PATCH v2] net/null:Different mac address support
On 3/6/2018 3:35 AM, Mallesh Koujalagi wrote:
> After attaching two Null device to ovs, seeing "00.00.00.00.00.00" mac
> address for both null devices. Fix this issue, by setting different mac
> address.
>
> Signed-off-by: Mallesh Koujalagi

<...>

> @@ -514,12 +524,21 @@ eth_dev_null_create(struct rte_vdev_device *dev,
> 	if (!data)
> 		return -ENOMEM;
>
> +	eth_addr = rte_zmalloc_socket(rte_vdev_device_name(dev),
> +			sizeof(*eth_addr), 0, dev->device.numa_node);
> +	if (eth_addr == NULL) {
> +		rte_free(data);
> +		return -ENOMEM;
> +	}
> +
> 	eth_dev = rte_eth_vdev_allocate(dev, sizeof(*internals));
> 	if (!eth_dev) {
> +		rte_free(eth_addr);
> 		rte_free(data);
> 		return -ENOMEM;
> 	}

Same comment as on the previous version: why not put "eth_addr" inside "struct pmd_internals"?

"struct pmd_internals" is already allocated and freed in the code, so if you put "eth_addr" into it you don't need to manage "eth_addr" separately; it comes for free.

<...>
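The review comment above can be pictured with a small stand-alone sketch. It does not use the DPDK types: struct mac_addr, struct pmd_private, struct fake_dev and the two helpers are hypothetical stand-ins, and plain calloc/free replaces the rte_ allocators. It also does not cover whether the ethdev layer frees the address pointer separately on release. The idea shown is only that an address stored inline in the private data needs no allocation, no extra error path, and no extra free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ADDR_LEN 6

/* Hypothetical stand-ins; the real structures live in DPDK headers. */
struct mac_addr {
	uint8_t bytes[ADDR_LEN];
};

struct pmd_private {
	unsigned int packet_size;
	struct mac_addr eth_addr; /* stored inline, no separate allocation */
};

struct fake_dev {
	struct pmd_private *priv;
	struct mac_addr *mac_addrs; /* points into priv, not owned separately */
};

/* One allocation covers both the private data and the address, so there
 * is a single failure path to handle. */
static int dev_create(struct fake_dev *dev, const uint8_t addr[ADDR_LEN])
{
	assert(dev != NULL && addr != NULL);
	dev->priv = calloc(1, sizeof(*dev->priv));
	if (dev->priv == NULL)
		return -1;
	memcpy(dev->priv->eth_addr.bytes, addr, ADDR_LEN);
	dev->mac_addrs = &dev->priv->eth_addr;
	return 0;
}

/* Freeing the private data releases the address with it. */
static void dev_destroy(struct fake_dev *dev)
{
	free(dev->priv);
	dev->priv = NULL;
	dev->mac_addrs = NULL;
}
```

Compared with the hunk quoted above, this shape removes the extra rte_free(eth_addr) calls from every error branch.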
Re: [dpdk-dev] [dpdk-stable] [PATCH] net/bonding: avoid wrong casting on primary_slave_port_id from input param
On 3/6/2018 9:37 AM, Gowrishankar wrote: > From: Gowrishankar Muthukrishnan > > primary_slave_port_id is uint16_t which needs to be correctly stored > with the same data type of input parameter in bond_ethdev_configure. > > Fixes: f8244c6399 ("ethdev: increase port id range") > Cc: sta...@dpdk.org > > Signed-off-by: Gowrishankar Muthukrishnan Acked-by: Ferruh Yigit
Re: [dpdk-dev] [PATCH v1] app/pdump: add check for PCAP PMD
On 3/6/2018 8:45 AM, Varghese, Vipin wrote:
> Hi Ferruh,
>
>> -Original Message-
>> From: Yigit, Ferruh
>> Sent: Monday, March 5, 2018 2:33 PM
>> To: Varghese, Vipin ; dev@dpdk.org; Pattan,
>> Reshma
>> Cc: Mcnamara, John
>> Subject: Re: [dpdk-dev] [PATCH v1] app/pdump: add check for PCAP PMD
>>
>> On 3/5/2018 7:57 AM, Vipin Varghese wrote:
>>> dpdk-pdump makes use of LIBRTE_PMD_PCAP for interfacing the ring to
>>> the device-queue pair. Updating Makefile to check for the same.
>>>
>>> Signed-off-by: Vipin Varghese
>>> ---
>>> app/pdump/Makefile | 4
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/app/pdump/Makefile b/app/pdump/Makefile index
>>> bd3c208..038a34f 100644
>>> --- a/app/pdump/Makefile
>>> +++ b/app/pdump/Makefile
>>> @@ -3,6 +3,10 @@
>>>
>>> include $(RTE_SDK)/mk/rte.vars.mk
>>>
>>> +ifeq ($(CONFIG_RTE_LIBRTE_PMD_PCAP),n) $(error "Please enable
>>> +CONFIG_RTE_LIBRTE_PMD_PCAP") endif
>>
>> pdump is enabled default, so won't this break the default build?
>
> Yes, you are right it will fail. Which then forces the user to enable PCAP.

We shouldn't break the default build because of missing dependencies.

>
>>
>> What about moving this to lib/librte_pdump, convert $(error ..) to $(warning
>> ..)
>> and disable CONFIG_RTE_LIBRTE_PDUMP there?
>
> If we set to warning and there are no PCAP headers in build system. The
> application gets built, but will fail internally becz the pcap API will fails
> during execution.

If CONFIG_RTE_LIBRTE_PDUMP is disabled, the application won't be compiled.

>
>>
>>> +
>>> ifeq ($(CONFIG_RTE_LIBRTE_PDUMP),y)
>>>
>>> APP = dpdk-pdump
>>>
>
Re: [dpdk-dev] [RFC 3/4] lib/librte_eal/common: Add Intel FPGA Bus Second Scan, it should be scanned after PCI Bus
On Tue, Mar 06, 2018 at 11:36:17AM +, Bruce Richardson wrote: > On Tue, Mar 06, 2018 at 11:46:22AM +0100, Gaëtan Rivet wrote: > > On Tue, Mar 06, 2018 at 10:42:14AM +, Xu, Rosen wrote: > > > > > > > > > -Original Message- > > > From: Shreyansh Jain [mailto:shreyansh.j...@nxp.com] > > > Sent: Tuesday, March 06, 2018 14:20 > > > To: Xu, Rosen > > > Cc: dev@dpdk.org; Doherty, Declan ; Zhang, > > > Tianfei > > > Subject: Re: [dpdk-dev] [RFC 3/4] lib/librte_eal/common: Add Intel FPGA > > > Bus Second Scan, it should be scanned after PCI Bus > > > > > > On Tue, Mar 6, 2018 at 7:13 AM, Rosen Xu wrote: > > > > Signed-off-by: Rosen Xu > > > > --- > > > > lib/librte_eal/common/eal_common_bus.c | 14 +- > > > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/lib/librte_eal/common/eal_common_bus.c > > > > b/lib/librte_eal/common/eal_common_bus.c > > > > index 3e022d5..74bfa15 100644 > > > > --- a/lib/librte_eal/common/eal_common_bus.c > > > > +++ b/lib/librte_eal/common/eal_common_bus.c > > > > @@ -70,15 +70,27 @@ struct rte_bus_list rte_bus_list = > > > > rte_bus_scan(void) > > > > { > > > > int ret; > > > > - struct rte_bus *bus = NULL; > > > > + struct rte_bus *bus = NULL, *ifpga_bus = NULL; > > > > > > > > TAILQ_FOREACH(bus, &rte_bus_list, next) { > > > > + if (!strcmp(bus->name, "ifpga")) { > > > > + ifpga_bus = bus; > > > > + continue; > > > > + } > > > > + > > > > ret = bus->scan(); > > > > if (ret) > > > > RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n", > > > > bus->name); > > > > } > > > > > > > > + if (ifpga_bus) { > > > > + ret = ifpga_bus->scan(); > > > > + if (ret) > > > > + RTE_LOG(ERR, EAL, "Scan for (%s) bus failed.\n", > > > > + ifpga_bus->name); > > > > + } > > > > + > > > > > > You are doing this just so that PCI scans are completed *before* ifpga > > > scans? > > > Rosen: yes > > > Well, I understand that this certainly is an issue that we can't yet > > > define a priority ordering of bus scans. 
> > > > > > But, I think what you are require is a simpler: > > > > > > In the file ifpga_bus.c: > > > > > > +RTE_REGISTER_BUS(IFPGA_BUS_NAME, rte_ifpga_bus.bus); <== this > > > ... > > > ... > > > #define RTE_REGISTER_BUS(nm, bus) \ > > > RTE_INIT_PRIO(businitfn_ ##nm, 110); \ > > > > > > If you define your own version of RTE_REGISTER_BUS with the priority > > > number higher, it would be inserted later in the bus list. > > > rte_register_bus doesn't do any inherent ordering. > > > This would save the changes you are doing in the > > > lib/librte_eal/common/eal_common_bus.c file. > > > > > > But I think there has to be a better provision of defining priority of > > > bus scans - I am sure when new devices come in, there would be > > > possibility of dependencies as in your case. > > > Rosen: is the priority scan of bus is implemented? > > > > No, there is no priority set for scanning order. > > However, the order in which buses are registered, will modify the order > > in which scans are done. > > > > Thus, if you change the priority of your registration, you should be > > able to ensure that your scan comes last. > > > > Can we register the bus only when a PCI device match is found at > runtime, e.g. as part of the PCI driver instance initialization? > > /Bruce Technically, yes. You would append a new bus during rte_bus_probe, so the linked list would simply have a new node and you would then probe it. You would need to make sure you scan your bus first, so you would have some weird conditions (whether you are loaded during probe or naturally, you'd have to do your scan or not). However, this seems like a terrible idea. You introduce an edge case that will need to be carried over in most of the bus API implementation. This new bus seems like a specialization of the PCI bus. Why not directly use the PCI bus and have your driver linked to either a rawdev or a vdev, where you could store your metadata and expose a specialized interface? -- Gaëtan Rivet 6WIND
Re: [dpdk-dev] [PATCH v1] app/pdump: add check for PCAP PMD
>
> +ifeq ($(CONFIG_RTE_LIBRTE_PMD_PCAP),n)
> +$(error "Please enable CONFIG_RTE_LIBRTE_PMD_PCAP") endif
> +

How about combining an ifeq ($(CONFIG_RTE_LIBRTE_PMD_PCAP),y) check with the existing check below? With this, dpdk-pdump will be compiled only when both flags are enabled.

> ifeq ($(CONFIG_RTE_LIBRTE_PDUMP),y)
>
> APP = dpdk-pdump
> --
> 1.9.1
Re: [dpdk-dev] Anyone who can help?
+Thomas,

Hi,

On Tue, Mar 06, 2018 at 06:54:25PM +0800, wang.yon...@zte.com.cn wrote:
> Hi,
> I met a problem when i use git to get code from dpdk.org. I never met this
> before.
> Is there anyone know what happened with this?
>
> [root@localhost dpdk]# git pull
> fatal: unable to access 'http://dpdk.org/git/dpdk/': The requested URL
> returned error: 502
>
> [root@localhost wangyong]# git clone http://dpdk.org/git/dpdk
> Cloning into 'dpdk'...
> fatal: unable to access 'http://dpdk.org/git/dpdk/': The requested URL
> returned error: 502

Did you try the https or git protocol instead?

https://dpdk.org/git/dpdk/
git://dpdk.org/git/dpdk/

Regards,

--
Nélio Laranjeiro
6WIND
Re: [dpdk-dev] [PATCH v2 1/6] vhost: export vhost feature definitions
On 03/06/2018 10:37 AM, Tan, Jianfeng wrote:

-Original Message-
From: Wang, Zhihong
Sent: Tuesday, February 13, 2018 5:21 PM
To: dev@dpdk.org
Cc: Tan, Jianfeng; Bie, Tiwei; maxime.coque...@redhat.com; y...@fridaylinux.org; Liang, Cunming; Wang, Xiao W; Daly, Dan; Wang, Zhihong
Subject: [PATCH v2 1/6] vhost: export vhost feature definitions

This patch exports vhost-user protocol features to support device driver development.

Signed-off-by: Zhihong Wang
---
lib/librte_vhost/rte_vhost.h | 8
lib/librte_vhost/vhost.h | 4 +---
lib/librte_vhost/vhost_user.c | 9 +
lib/librte_vhost/vhost_user.h | 20 +++-
4 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index d33206997..b05162366 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -29,6 +29,14 @@ extern "C" {
 #define RTE_VHOST_USER_DEQUEUE_ZERO_COPY (1ULL << 2)
 #define RTE_VHOST_USER_IOMMU_SUPPORT (1ULL << 3)

+#define RTE_VHOST_USER_PROTOCOL_F_MQ 0

Instead of adding a "RTE_" prefix, I prefer to define it like this:

#ifndef VHOST_USER_PROTOCOL_F_MQ
#define VHOST_USER_PROTOCOL_F_MQ 0
#endif

Similar to other macros.

I agree, it is better to keep the same naming as in the spec IMHO.

+#define RTE_VHOST_USER_PROTOCOL_F_LOG_SHMFD 1
+#define RTE_VHOST_USER_PROTOCOL_F_RARP 2
+#define RTE_VHOST_USER_PROTOCOL_F_REPLY_ACK 3
+#define RTE_VHOST_USER_PROTOCOL_F_NET_MTU 4
+#define RTE_VHOST_USER_PROTOCOL_F_SLAVE_REQ 5
+#define RTE_VHOST_USER_F_PROTOCOL_FEATURES 30

Please put the above declaration separately; placed here it could be misleading, suggesting that it is a vhost-user protocol feature whereas it is a Virtio feature.

+
 /**
 * Information relating to memory regions including offsets to
 * addresses in QEMUs memory file.
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 58aec2e0d..a0b0520e2 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -174,8 +174,6 @@ struct vhost_msg { #define VIRTIO_F_VERSION_1 32 #endif -#define VHOST_USER_F_PROTOCOL_FEATURES 30 - /* Features supported by this builtin vhost-user net driver. */ #define VIRTIO_NET_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \ (1ULL << VIRTIO_F_ANY_LAYOUT) | \ @@ -185,7 +183,7 @@ struct vhost_msg { (1ULL << VIRTIO_NET_F_MQ) | \ (1ULL << VIRTIO_F_VERSION_1) | \ (1ULL << VHOST_F_LOG_ALL) | \ - (1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \ + (1ULL << RTE_VHOST_USER_F_PROTOCOL_FEATURES) | \ (1ULL << VIRTIO_NET_F_GSO) | \ (1ULL << VIRTIO_NET_F_HOST_TSO4) | \ (1ULL << VIRTIO_NET_F_HOST_TSO6) | \ diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 5c5361066..c93e48e4d 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -527,7 +527,7 @@ vhost_user_set_vring_addr(struct virtio_net **pdev, VhostUserMsg *msg) vring_invalidate(dev, vq); if (vq->enabled && (dev->features & - (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) { + (1ULL << RTE_VHOST_USER_F_PROTOCOL_FEATURES))) { dev = translate_ring_addresses(dev, msg- payload.addr.index); if (!dev) return -1; @@ -897,11 +897,11 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *pmsg) vq = dev->virtqueue[file.index]; /* -* When VHOST_USER_F_PROTOCOL_FEATURES is not negotiated, +* When RTE_VHOST_USER_F_PROTOCOL_FEATURES is not negotiated, * the ring starts already enabled. Otherwise, it is enabled via * the SET_VRING_ENABLE message. */ - if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) + if (!(dev->features & (1ULL << RTE_VHOST_USER_F_PROTOCOL_FEATURES))) vq->enabled = 1; if (vq->kickfd >= 0) @@ -1012,7 +1012,8 @@ vhost_user_get_protocol_features(struct virtio_net *dev, * Qemu versions (from v2.7.0 to v2.9.0). 
*/ if (!(features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))) - protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK); + protocol_features &= + ~(1ULL << RTE_VHOST_USER_PROTOCOL_F_REPLY_ACK); msg->payload.u64 = protocol_features; msg->size = sizeof(msg->payload.u64); diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h index 0fafbe6e0..066e772dd 100644 --- a/lib/librte_vhost/vhost_user.h +++ b/lib/librte_vhost/vhost_user.h @@ -14,19 +
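The two review comments above (keep the spec names under `#ifndef` guards, and keep the Virtio feature bit apart from the vhost-user protocol feature bits) can be sketched as follows. This is a minimal illustration, not the final patch: the bit numbers are the ones quoted in the diff, while `feature_bit_is_set()` and `mask_reply_ack()` are hypothetical helpers introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * vhost-user protocol feature bits, named as in the spec and guarded so
 * that a definition coming from another header takes precedence.
 */
#ifndef VHOST_USER_PROTOCOL_F_MQ
#define VHOST_USER_PROTOCOL_F_MQ 0
#endif
#ifndef VHOST_USER_PROTOCOL_F_REPLY_ACK
#define VHOST_USER_PROTOCOL_F_REPLY_ACK 3
#endif

/*
 * Virtio feature bit: it lives in the device feature word, not in the
 * protocol feature word, so it is declared separately.
 */
#ifndef VHOST_USER_F_PROTOCOL_FEATURES
#define VHOST_USER_F_PROTOCOL_FEATURES 30
#endif

/* Hypothetical helper: test one bit of a 64-bit feature word. */
static inline int
feature_bit_is_set(uint64_t features, unsigned int bit)
{
	assert(bit < 64);
	return (features & (1ULL << bit)) != 0;
}

/*
 * Hypothetical helper mirroring the vhost_user_get_protocol_features()
 * hunk: REPLY_ACK is only advertised when the IOMMU feature is negotiated.
 */
static inline uint64_t
mask_reply_ack(uint64_t protocol_features, int iommu_negotiated)
{
	if (!iommu_negotiated)
		protocol_features &= ~(1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK);
	return protocol_features;
}
```

With the guards in place, a driver that already pulls the same names from another header compiles unchanged, which is the point of the suggestion.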
Re: [dpdk-dev] [PATCH 3/3] vhost: support VFIO based accelerator
On 03/06/2018 11:43 AM, Tiwei Bie wrote: This commit adds the VFIO based accelerator support to vhost. A new API is provided to support asking QEMU to do further setup to allow notifications and interrupts being delivered directly between the driver in guest and the vDPA device in host. Signed-off-by: Tiwei Bie --- lib/librte_vhost/rte_vhost.h | 28 ++ lib/librte_vhost/rte_vhost_version.map | 1 + lib/librte_vhost/vhost_user.c | 166 + lib/librte_vhost/vhost_user.h | 9 ++ 4 files changed, 204 insertions(+) diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h index d5589c543..68842e908 100644 --- a/lib/librte_vhost/rte_vhost.h +++ b/lib/librte_vhost/rte_vhost.h @@ -35,6 +35,7 @@ extern "C" { #define RTE_VHOST_USER_PROTOCOL_F_REPLY_ACK 3 #define RTE_VHOST_USER_PROTOCOL_F_NET_MTU 4 #define RTE_VHOST_USER_PROTOCOL_F_SLAVE_REQ 5 +#define RTE_VHOST_USER_PROTOCOL_F_VFIO 8 #define RTE_VHOST_USER_F_PROTOCOL_FEATURES30 /** @@ -591,6 +592,33 @@ rte_vhost_get_vdpa_eid(int vid); int __rte_experimental rte_vhost_get_vdpa_did(int vid); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Enable or disable the VFIO based accelerator for vhost-user. + * + * This function is to ask QEMU to do further setup to better + * support the vDPA device at vhost user backend. With this + * setup, the notifications and interrupts will be delivered + * directly between the driver in guest and the vDPA device + * in host if platform supports e.g. EPT and Posted interrupt. + * It's nice to have, and not mandatory. 
+ * + * @param vid + * vhost device ID + * @param int + * Enable or disable + * + * @return + * 0: success + * -ENODEV: no such vhost device + * -ENOTSUP: device does not support VFIO based accelerator feature + * -EINVAL: there is no accelerator assigned to this vhost device + * -EFAULT: failed to talk with QEMU + */ +int rte_vhost_vfio_accelerator_ctrl(int vid, int enable); + #ifdef __cplusplus } #endif diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map index 36257e51b..ca970170f 100644 --- a/lib/librte_vhost/rte_vhost_version.map +++ b/lib/librte_vhost/rte_vhost_version.map @@ -72,6 +72,7 @@ EXPERIMENTAL { rte_vhost_set_vring_base; rte_vhost_get_vdpa_eid; rte_vhost_get_vdpa_did; + rte_vhost_vfio_accelerator_ctrl; rte_vdpa_register_engine; rte_vdpa_unregister_engine; rte_vdpa_find_engine_id; diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index e3a1dfbfb..a65598d80 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -35,6 +35,7 @@ #include #include #include +#include #include "iotlb.h" #include "vhost.h" @@ -1628,6 +1629,27 @@ vhost_user_msg_handler(int vid, int fd) return 0; } +static int process_slave_message_reply(struct virtio_net *dev, + const VhostUserMsg *msg) +{ + VhostUserMsg msg_reply; + + if ((msg->flags & VHOST_USER_NEED_REPLY) == 0) + return 0; + + if (read_vhost_message(dev->slave_req_fd, &msg_reply) < 0) + return -1; + + if (msg_reply.request.slave != msg->request.slave) { + RTE_LOG(ERR, VHOST_CONFIG, + "received unexpected msg type (%u), expected %u\n", + msg_reply.request.slave, msg->request.slave); + return -1; + } + + return msg_reply.payload.u64; +} + int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm) { @@ -1653,3 +1675,147 @@ vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm) return 0; } + +static int vhost_user_slave_set_vring_file(struct virtio_net *dev, + uint32_t request, + struct 
vhost_vring_file *file) Why passing the request as an argument? It seems to be called only with the same request ID. +{ + int *fdp = NULL; + size_t fd_num = 0; + int ret; + struct VhostUserMsg msg = { + .request.slave = request, + .flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY, + .payload.u64 = file->index & VHOST_USER_VRING_IDX_MASK, + .size = sizeof(msg.payload.u64), + }; + + if (file->fd < 0) + msg.payload.u64 |= VHOST_USER_VRING_NOFD_MASK; + else { + fdp = &file->fd; + fd_num = 1; + } + + ret = send_vhost_message(dev->slave_req_fd, &msg, fdp, fd_num); + if (ret < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "Failed to send slave message %u (%d)\n", + request, ret); + return ret; + } + + retu
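The payload built in `vhost_user_slave_set_vring_file()` above packs the ring index and a "no file descriptor" flag into one 64-bit word. A minimal sketch of that encoding, assuming the mask values given by the vhost-user specification (bits 0-7 carry the index, bit 8 is the invalid-FD flag); `vring_file_payload()` is a hypothetical name used only here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, following the vhost-user specification. */
#define VHOST_USER_VRING_IDX_MASK  0xff
#define VHOST_USER_VRING_NOFD_MASK (0x1 << 8)

/*
 * Hypothetical helper: build the u64 payload for a vring file message.
 * A negative fd means "no descriptor is attached to this message".
 */
static uint64_t
vring_file_payload(unsigned int index, int fd)
{
	uint64_t payload = index & VHOST_USER_VRING_IDX_MASK;

	if (fd < 0)
		payload |= VHOST_USER_VRING_NOFD_MASK;

	assert((payload & ~0x1ffULL) == 0);
	return payload;
}
```

Since the request ID does not influence the payload, this also illustrates the review question: the encoding is the same whichever slave request carries it.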
Re: [dpdk-dev] [PATCH 2/6] net/sfc: add support for driver-wide dynamic logging
On 03/05/2018 05:59 PM, Ferruh Yigit wrote:
> On 1/25/2018 5:00 PM, Andrew Rybchenko wrote:
>> From: Ivan Malov
>>
>> Signed-off-by: Ivan Malov
>> Signed-off-by: Andrew Rybchenko
>> Reviewed-by: Andy Moreton
>
> <...>
>
>> @@ -2082,3 +2084,14 @@ RTE_PMD_REGISTER_PARAM_STRING(net_sfc_efx,
>>  	SFC_KVARG_STATS_UPDATE_PERIOD_MS "= "
>>  	SFC_KVARG_MCDI_LOGGING "=" SFC_KVARG_VALUES_BOOL " "
>>  	SFC_KVARG_DEBUG_INIT "=" SFC_KVARG_VALUES_BOOL);
>> +
>> +RTE_INIT(sfc_driver_register_logtype);
>> +static void
>> +sfc_driver_register_logtype(void)
>> +{
>> +	int ret;
>> +
>> +	ret = rte_log_register_type_and_pick_level(SFC_LOGTYPE_PREFIX "driver",
>> +						   RTE_LOG_NOTICE);
>
> No benefit of using rte_log_register_type_and_pick_level() here, in this stage
> "opt_loglevel_list" will be empty and this will be same as rte_log_register()

That's true except "uniform approach is good". I.e. simply use
rte_log_register_type_and_pick_level() everywhere to make it safe
against code movements. In fact it was raised during internal review
and we kept as you can see it.

Other option is to avoid usage of constructor here at all and move it
to probe. Yes, it will be tried many times, but there is no harm if it is
already registered.
Re: [dpdk-dev] [PATCH 2/6] net/sfc: add support for driver-wide dynamic logging
On 03/06/2018 05:45 PM, Andrew Rybchenko wrote:
> On 03/05/2018 05:59 PM, Ferruh Yigit wrote:
>> On 1/25/2018 5:00 PM, Andrew Rybchenko wrote:
>>> From: Ivan Malov
>>>
>>> Signed-off-by: Ivan Malov
>>> Signed-off-by: Andrew Rybchenko
>>> Reviewed-by: Andy Moreton
>>
>> <...>
>>
>>> @@ -2082,3 +2084,14 @@ RTE_PMD_REGISTER_PARAM_STRING(net_sfc_efx,
>>>  	SFC_KVARG_STATS_UPDATE_PERIOD_MS "= "
>>>  	SFC_KVARG_MCDI_LOGGING "=" SFC_KVARG_VALUES_BOOL " "
>>>  	SFC_KVARG_DEBUG_INIT "=" SFC_KVARG_VALUES_BOOL);
>>> +
>>> +RTE_INIT(sfc_driver_register_logtype);
>>> +static void
>>> +sfc_driver_register_logtype(void)
>>> +{
>>> +	int ret;
>>> +
>>> +	ret = rte_log_register_type_and_pick_level(SFC_LOGTYPE_PREFIX "driver",
>>> +						   RTE_LOG_NOTICE);
>>
>> No benefit of using rte_log_register_type_and_pick_level() here, in this stage
>> "opt_loglevel_list" will be empty and this will be same as rte_log_register()
>
> That's true except "uniform approach is good". I.e. simply use
> rte_log_register_type_and_pick_level() everywhere to make it safe
> against code movements. In fact it was raised during internal review
> and we kept as you can see it.
>
> Other option is to avoid usage of constructor here at all and move it
> to probe. Yes, it will be tried many times, but there is no harm if it is
> already registered.

In fact it could be really required if dynamic library is used and it is
pulled later using dlopen() - don't know if there are any restrictions in
DPDK which prevent it.
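The "no harm if it is already registered" argument relies on registration being idempotent. A minimal sketch of that property, assuming a toy registry: `logtype_register()`, `driver_probe()` and the log type name are hypothetical stand-ins, not DPDK's rte_log implementation. The `constructor` attribute is the GCC/Clang one; such functions run before `main()`, or at load time when the shared object is pulled in with `dlopen()`.

```c
#include <assert.h>
#include <string.h>

#define MAX_LOGTYPES 8

/* Hypothetical stand-in for a log type registry. */
static const char *logtype_names[MAX_LOGTYPES];
static int logtype_count;

/*
 * Return the id for a name, registering it on first use. Calling it again
 * with the same name returns the same id, so repeated calls are harmless.
 */
static int
logtype_register(const char *name)
{
	int i;

	assert(name != NULL);
	for (i = 0; i < logtype_count; i++)
		if (strcmp(logtype_names[i], name) == 0)
			return i;
	if (logtype_count == MAX_LOGTYPES)
		return -1;
	logtype_names[logtype_count] = name;
	return logtype_count++;
}

static int driver_logtype = -1;

/* Option 1: register from a constructor. */
__attribute__((constructor))
static void
driver_register_logtype(void)
{
	driver_logtype = logtype_register("pmd.net.example.driver");
}

/* Option 2: register (again) from a hypothetical probe path. */
static int
driver_probe(void)
{
	driver_logtype = logtype_register("pmd.net.example.driver");
	return driver_logtype < 0 ? -1 : 0;
}
```

Either path alone is enough in this toy model; using both only costs a lookup per probe.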
Re: [dpdk-dev] [PATCH 01/14] net/sfc/base: support filters for encapsulated packets
On 02/27/2018 03:45 PM, Andrew Rybchenko wrote:
> From: Roman Zhukov
>
> This adds filters for encapsulated packets to the list
> returned by ef10_filter_supported_filters().
>
> Signed-off-by: Roman Zhukov
> Signed-off-by: Andrew Rybchenko
> Reviewed-by: Andy Moreton
> ---
>  drivers/net/sfc/base/ef10_filter.c | 65 --
>  1 file changed, 55 insertions(+), 10 deletions(-)

<...>

> -	rc = efx_mcdi_get_parser_disp_info(enp, buffer, buffer_length,
> -	    &mcdi_list_length);
> +	/*
> +	 * Two calls to MC_CMD_GET_PARSER_DISP_INFO are needed: one to get the
> +	 * list of supported filters for ordinary packets, and then another to
> +	 * get the list of supported filters for encapsulated packets.
> +	 */
> +	rc = efx_mcdi_get_parser_disp_info(enp, buffer, buffer_length, B_FALSE,
> +	    &mcdi_list_length);
>  	if (rc != 0) {
> -		if (rc == ENOSPC) {
> -			/* Pass through mcdi_list_length for the list length */
> -			*list_lengthp = mcdi_list_length;
> +		if (rc == ENOSPC)
> +			no_space = B_TRUE;
> +		else
> +			goto fail1;
> +	}
> +
> +	if (no_space) {
> +		next_buf_idx = 0;
> +		next_buf_length = 0;
> +	} else {
> +		EFSYS_ASSERT(mcdi_list_length < buffer_length);

In fact <= must be here since above call may return 0 if return array fits
exactly in provided buffer. I'll send v2.

> +		next_buf_idx = mcdi_list_length;
> +		next_buf_length = buffer_length - mcdi_list_length;
> +	}
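The boundary case behind the "<=" remark can be sketched in isolation. The model below is an assumption-laden illustration, not the libefx code: `struct window` and `second_list_window()` are hypothetical names, and the function only computes where the second list would start inside the shared buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the room left for the second list. */
struct window {
	size_t next_idx;	/* index where the second list starts */
	size_t next_len;	/* number of entries still available */
};

/*
 * Compute the window for the second list after the first call returned
 * first_len entries into a buffer of buf_len entries.
 */
static struct window
second_list_window(size_t first_len, size_t buf_len, int no_space)
{
	struct window w = { 0, 0 };

	if (no_space)
		return w;

	/* "<=", not "<": the first list may fill the buffer exactly. */
	assert(first_len <= buf_len);
	w.next_idx = first_len;
	w.next_len = buf_len - first_len;
	return w;
}
```

With `first_len == buf_len` the call is legitimate and simply leaves a zero-length window, which a strict "<" assertion would reject.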
Re: [dpdk-dev] Anyone who can help?
06/03/2018 11:54, wang.yon...@zte.com.cn:
> Hi,
> I met a problem when i use git to get code from dpdk.org. I never met this
> before.
> Is there anyone know what happened with this?
>
> [root@localhost dpdk]# git pull
> fatal: unable to access 'http://dpdk.org/git/dpdk/': The requested URL
> returned error: 502

There was an outage with git on dpdk.org today.
It has been fixed when discovered.
Re: [dpdk-dev] [PATCH 2/6] net/sfc: add support for driver-wide dynamic logging
On 3/6/2018 2:56 PM, Andrew Rybchenko wrote:
>
> In fact it could be really required if dynamic library is used and it is
> pulled later using dlopen() - don't know if there are any restrictions in
> DPDK which prevent it.

That function has constructor attribute, not sure how it works for that case.

I am good as long as this is not missed but decided to implement this way.
[dpdk-dev] [PATCH v2 05/14] net/sfc: add VXLAN in flow API filters support
From: Roman Zhukov Exact match of VXLAN network identifier is supported by parser. IP protocol match are enforced to UDP. Signed-off-by: Roman Zhukov Signed-off-by: Andrew Rybchenko Reviewed-by: Ivan Malov Reviewed-by: Andy Moreton --- doc/guides/nics/sfc_efx.rst | 2 + drivers/net/sfc/sfc_flow.c | 165 2 files changed, 167 insertions(+) diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst index ccdf5ff..5a4b2a6 100644 --- a/doc/guides/nics/sfc_efx.rst +++ b/doc/guides/nics/sfc_efx.rst @@ -166,6 +166,8 @@ Supported pattern items: - UDP (exact match of source/destination ports) +- VXLAN (exact match of VXLAN network identifier) + Supported actions: - VOID diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c index 93cdf8f..20ba69d 100644 --- a/drivers/net/sfc/sfc_flow.c +++ b/drivers/net/sfc/sfc_flow.c @@ -57,6 +57,7 @@ static sfc_flow_item_parse sfc_flow_parse_ipv4; static sfc_flow_item_parse sfc_flow_parse_ipv6; static sfc_flow_item_parse sfc_flow_parse_tcp; static sfc_flow_item_parse sfc_flow_parse_udp; +static sfc_flow_item_parse sfc_flow_parse_vxlan; static boolean_t sfc_flow_is_zero(const uint8_t *buf, unsigned int size) @@ -696,6 +697,132 @@ sfc_flow_parse_udp(const struct rte_flow_item *item, return -rte_errno; } +/* + * Filters for encapsulated packets match based on the EtherType and IP + * protocol in the outer frame. 
+ */ +static int +sfc_flow_set_match_flags_for_encap_pkts(const struct rte_flow_item *item, + efx_filter_spec_t *efx_spec, + uint8_t ip_proto, + struct rte_flow_error *error) +{ + if (!(efx_spec->efs_match_flags & EFX_FILTER_MATCH_IP_PROTO)) { + efx_spec->efs_match_flags |= EFX_FILTER_MATCH_IP_PROTO; + efx_spec->efs_ip_proto = ip_proto; + } else if (efx_spec->efs_ip_proto != ip_proto) { + switch (ip_proto) { + case EFX_IPPROTO_UDP: + rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "Outer IP header protocol must be UDP " + "in VxLAN pattern"); + return -rte_errno; + + default: + rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "Only VxLAN tunneling patterns " + "are supported"); + return -rte_errno; + } + } + + if (!(efx_spec->efs_match_flags & EFX_FILTER_MATCH_ETHER_TYPE)) { + rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "Outer frame EtherType in pattern with tunneling " + "must be set"); + return -rte_errno; + } else if (efx_spec->efs_ether_type != EFX_ETHER_TYPE_IPV4 && + efx_spec->efs_ether_type != EFX_ETHER_TYPE_IPV6) { + rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "Outer frame EtherType in pattern with tunneling " + "must be IPv4 or IPv6"); + return -rte_errno; + } + + return 0; +} + +static int +sfc_flow_set_efx_spec_vni_or_vsid(efx_filter_spec_t *efx_spec, + const uint8_t *vni_or_vsid_val, + const uint8_t *vni_or_vsid_mask, + const struct rte_flow_item *item, + struct rte_flow_error *error) +{ + const uint8_t vni_or_vsid_full_mask[EFX_VNI_OR_VSID_LEN] = { + 0xff, 0xff, 0xff + }; + + if (memcmp(vni_or_vsid_mask, vni_or_vsid_full_mask, + EFX_VNI_OR_VSID_LEN) == 0) { + efx_spec->efs_match_flags |= EFX_FILTER_MATCH_VNI_OR_VSID; + rte_memcpy(efx_spec->efs_vni_or_vsid, vni_or_vsid_val, + EFX_VNI_OR_VSID_LEN); + } else if (!sfc_flow_is_zero(vni_or_vsid_mask, EFX_VNI_OR_VSID_LEN)) { + rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "Unsupported 
VNI/VSID mask"); + return -rte_errno; + } + + return 0; +} + +/** + * Convert VXLAN item to EFX filter specification. + * + * @param item[in] + * Item specification. Only VXLAN network identifier field is supported. + * If the mask is NULL, default mask will be used. + * Ranging is not supported. + * @param efx_spec[in, out] + * EFX filter specification to update. + * @param[out] error + * Perform verbose error reporting if not NULL. + */ +static int +sfc_flow_parse_vxla
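The mask handling in `sfc_flow_set_efx_spec_vni_or_vsid()` above distinguishes three cases: a full mask (exact match), an all-zero mask (field ignored), and anything else (rejected). A minimal standalone sketch of that classification, assuming a 3-byte identifier; the enum and `classify_vni_mask()` are hypothetical names for this example only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VNI_LEN 3	/* a VXLAN network identifier is 24 bits wide */

enum vni_mask_kind {
	VNI_MASK_NONE,		/* all zero: the field is not matched */
	VNI_MASK_EXACT,		/* all ones: exact match on the identifier */
	VNI_MASK_UNSUPPORTED	/* partial mask: rejected */
};

/* Hypothetical helper: classify a VNI/VSID mask the way the parser does. */
static enum vni_mask_kind
classify_vni_mask(const uint8_t mask[VNI_LEN])
{
	static const uint8_t full[VNI_LEN] = { 0xff, 0xff, 0xff };
	static const uint8_t zero[VNI_LEN] = { 0x00, 0x00, 0x00 };

	assert(mask != NULL);
	if (memcmp(mask, full, VNI_LEN) == 0)
		return VNI_MASK_EXACT;
	if (memcmp(mask, zero, VNI_LEN) == 0)
		return VNI_MASK_NONE;
	return VNI_MASK_UNSUPPORTED;
}
```

Only the exact-match case sets a match flag and copies the identifier; the unsupported case corresponds to the "Unsupported VNI/VSID mask" error in the patch.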
[dpdk-dev] [PATCH v2 13/14] net/sfc: avoid creation of ineffective flow rules
From: Roman Zhukov Despite being versatile, the hardware support for filtering has a number of special properties which must be taken into account. Namely, there is a known set of valid filters which don't take any effect despite being accepted by the hardware. The combinations of match flags and field values which can describe the exceptional filters are as follows: - ETHER_TYPE or ETHER_TYPE | LOC_MAC with IPv4 or IPv6 EtherType - ETHER_TYPE | IP_PROTO or ETHER_TYPE | IP_PROTO | LOC_MAC with UDP or TCP IP protocol value - The same combinations with OUTER_VID and/or INNER_VID These exceptional filters can be expressed in terms of RTE flow rules. If the user creates such a flow rule, no traffic will hit the underlying filter, and no errors will be reported. This patch adds a means to prevent such ineffective flow rules from being created. Signed-off-by: Roman Zhukov Signed-off-by: Andrew Rybchenko Reviewed-by: Ivan Malov --- doc/guides/nics/sfc_efx.rst | 17 ++ drivers/net/sfc/sfc_flow.c | 78 + 2 files changed, 95 insertions(+) diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst index 539ce90..f41ccdb 100644 --- a/doc/guides/nics/sfc_efx.rst +++ b/doc/guides/nics/sfc_efx.rst @@ -193,6 +193,23 @@ in the mask of destination address. If destinaton address in the spec is multicast, it matches all multicast (and broadcast) packets, oherwise it matches unicast packets that are not filtered by other flow rules. +Exceptions to flow rules + + +There is a list of exceptional flow rule patterns which will not be +accepted by the PMD. A pattern will be rejected if at least one of the +conditions is met: + +- Filtering by IPv4 or IPv6 EtherType without pattern items of internet + layer and above. + +- The last item is IPV4 or IPV6, and it's empty. + +- Filtering by TCP or UDP IP transport protocol without pattern items of + transport layer and above. + +- The last item is TCP or UDP, and it's empty. 
+ Supported NICs -- diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c index 7b26653..2b8bef8 100644 --- a/drivers/net/sfc/sfc_flow.c +++ b/drivers/net/sfc/sfc_flow.c @@ -1919,6 +1919,77 @@ sfc_flow_spec_filters_complete(struct sfc_adapter *sa, return 0; } +/** + * Check that set of match flags is referred to by a filter. Filter is + * described by match flags with the ability to add OUTER_VID and INNER_VID + * flags. + * + * @param match_flags[in] + * Set of match flags. + * @param flags_pattern[in] + * Pattern of filter match flags. + */ +static boolean_t +sfc_flow_is_match_with_vids(efx_filter_match_flags_t match_flags, + efx_filter_match_flags_t flags_pattern) +{ + if ((match_flags & flags_pattern) != flags_pattern) + return B_FALSE; + + switch (match_flags & ~flags_pattern) { + case 0: + case EFX_FILTER_MATCH_OUTER_VID: + case EFX_FILTER_MATCH_OUTER_VID | EFX_FILTER_MATCH_INNER_VID: + return B_TRUE; + default: + return B_FALSE; + } +} + +/** + * Check whether the spec maps to a hardware filter which is known to be + * ineffective despite being valid. + * + * @param spec[in] + * SFC flow specification. 
+ */ +static boolean_t +sfc_flow_is_match_flags_exception(struct sfc_flow_spec *spec) +{ + unsigned int i; + uint16_t ether_type; + uint8_t ip_proto; + efx_filter_match_flags_t match_flags; + + for (i = 0; i < spec->count; i++) { + match_flags = spec->filters[i].efs_match_flags; + + if (sfc_flow_is_match_with_vids(match_flags, + EFX_FILTER_MATCH_ETHER_TYPE) || + sfc_flow_is_match_with_vids(match_flags, + EFX_FILTER_MATCH_ETHER_TYPE | + EFX_FILTER_MATCH_LOC_MAC)) { + ether_type = spec->filters[i].efs_ether_type; + if (ether_type == EFX_ETHER_TYPE_IPV4 || + ether_type == EFX_ETHER_TYPE_IPV6) + return B_TRUE; + } else if (sfc_flow_is_match_with_vids(match_flags, + EFX_FILTER_MATCH_ETHER_TYPE | + EFX_FILTER_MATCH_IP_PROTO) || + sfc_flow_is_match_with_vids(match_flags, + EFX_FILTER_MATCH_ETHER_TYPE | + EFX_FILTER_MATCH_IP_PROTO | + EFX_FILTER_MATCH_LOC_MAC)) { + ip_proto = spec->filters[i].efs_ip_proto; + if (ip_proto == EFX_IPPROTO_TCP || + ip_proto == EFX_IPPROTO_UDP) + return B_TRUE; + } + } + + return B_FALSE; +} + static int sfc_flo
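The behaviour of `sfc_flow_is_match_with_vids()` is easiest to see on concrete inputs. The sketch below re-states the same logic in standalone form; the flag values are placeholders chosen for the example and are not the values from efx.h.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only. */
#define MATCH_ETHER_TYPE 0x01u
#define MATCH_LOC_MAC    0x02u
#define MATCH_INNER_VID  0x04u
#define MATCH_OUTER_VID  0x08u
#define MATCH_IP_PROTO   0x10u

/*
 * Return 1 if match_flags equals the pattern, optionally extended with
 * OUTER_VID or with OUTER_VID | INNER_VID, and 0 otherwise.
 */
static int
is_match_with_vids(uint32_t match_flags, uint32_t pattern)
{
	/* The pattern itself is expected to be free of VID flags. */
	assert((pattern & (MATCH_OUTER_VID | MATCH_INNER_VID)) == 0);

	if ((match_flags & pattern) != pattern)
		return 0;

	switch (match_flags & ~pattern) {
	case 0:
	case MATCH_OUTER_VID:
	case MATCH_OUTER_VID | MATCH_INNER_VID:
		return 1;
	default:
		return 0;
	}
}
```

Note that INNER_VID on its own is not accepted: `ETHER_TYPE | INNER_VID` does not count as a VID-extended variant of `ETHER_TYPE`, and neither does any combination carrying an unrelated flag such as LOC_MAC.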
[dpdk-dev] [PATCH v2 07/14] net/sfc: add GENEVE in flow API filters support
From: Roman Zhukov Exact match of virtual network identifier is supported by parser. IP protocol match are enforced to UDP. Only Ethernet protocol type is supported. Signed-off-by: Roman Zhukov Signed-off-by: Andrew Rybchenko Reviewed-by: Ivan Malov Reviewed-by: Andy Moreton --- doc/guides/nics/sfc_efx.rst | 3 ++ drivers/net/sfc/sfc_flow.c | 80 +++-- 2 files changed, 81 insertions(+), 2 deletions(-) diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst index 05dacb3..943fe55 100644 --- a/doc/guides/nics/sfc_efx.rst +++ b/doc/guides/nics/sfc_efx.rst @@ -168,6 +168,9 @@ Supported pattern items: - VXLAN (exact match of VXLAN network identifier) +- GENEVE (exact match of virtual network identifier, only Ethernet (0x6558) + protocol type is supported) + - NVGRE (exact match of virtual subnet ID) Supported actions: diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c index 126ec9b..efdc664 100644 --- a/drivers/net/sfc/sfc_flow.c +++ b/drivers/net/sfc/sfc_flow.c @@ -58,6 +58,7 @@ static sfc_flow_item_parse sfc_flow_parse_ipv6; static sfc_flow_item_parse sfc_flow_parse_tcp; static sfc_flow_item_parse sfc_flow_parse_udp; static sfc_flow_item_parse sfc_flow_parse_vxlan; +static sfc_flow_item_parse sfc_flow_parse_geneve; static sfc_flow_item_parse sfc_flow_parse_nvgre; static boolean_t @@ -717,7 +718,7 @@ sfc_flow_set_match_flags_for_encap_pkts(const struct rte_flow_item *item, rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM, item, "Outer IP header protocol must be UDP " - "in VxLAN pattern"); + "in VxLAN/GENEVE pattern"); return -rte_errno; case EFX_IPPROTO_GRE: @@ -730,7 +731,7 @@ sfc_flow_set_match_flags_for_encap_pkts(const struct rte_flow_item *item, default: rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM, item, - "Only VxLAN/NVGRE tunneling patterns " + "Only VxLAN/GENEVE/NVGRE tunneling patterns " "are supported"); return -rte_errno; } @@ -832,6 +833,74 @@ sfc_flow_parse_vxlan(const struct rte_flow_item *item, } 
/** + * Convert GENEVE item to EFX filter specification. + * + * @param item[in] + * Item specification. Only Virtual Network Identifier and protocol type + * fields are supported. But protocol type can be only Ethernet (0x6558). + * If the mask is NULL, default mask will be used. + * Ranging is not supported. + * @param efx_spec[in, out] + * EFX filter specification to update. + * @param[out] error + * Perform verbose error reporting if not NULL. + */ +static int +sfc_flow_parse_geneve(const struct rte_flow_item *item, + efx_filter_spec_t *efx_spec, + struct rte_flow_error *error) +{ + int rc; + const struct rte_flow_item_geneve *spec = NULL; + const struct rte_flow_item_geneve *mask = NULL; + const struct rte_flow_item_geneve supp_mask = { + .protocol = RTE_BE16(0x), + .vni = { 0xff, 0xff, 0xff } + }; + + rc = sfc_flow_parse_init(item, +(const void **)&spec, +(const void **)&mask, +&supp_mask, +&rte_flow_item_geneve_mask, +sizeof(struct rte_flow_item_geneve), +error); + if (rc != 0) + return rc; + + rc = sfc_flow_set_match_flags_for_encap_pkts(item, efx_spec, +EFX_IPPROTO_UDP, error); + if (rc != 0) + return rc; + + efx_spec->efs_encap_type = EFX_TUNNEL_PROTOCOL_GENEVE; + efx_spec->efs_match_flags |= EFX_FILTER_MATCH_ENCAP_TYPE; + + if (spec == NULL) + return 0; + + if (mask->protocol == supp_mask.protocol) { + if (spec->protocol != rte_cpu_to_be_16(ETHER_TYPE_TEB)) { + rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "GENEVE encap. protocol must be Ethernet " + "(0x6558) in the GENEVE pattern item"); + return -rte_errno; + } + } else if (mask->protocol != 0) { + rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_ITEM, item, + "Unsupported mask for GENEVE encap. protocol"); + return -rte_errno; + } + + rc = sfc_flow_set_efx_spec_vni_or_vsid(efx_spec, spec->vni, +
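The protocol-type rule in `sfc_flow_parse_geneve()` can be sketched on raw bytes, which sidesteps host byte order. This is an illustration under assumptions: the full mask is taken to be 0xffff (the value is garbled in the quoted diff), and `check_geneve_protocol()` is a hypothetical helper operating on the two protocol bytes in network order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: validate the GENEVE protocol type field.
 * Both arrays hold the 16-bit field in network byte order.
 * Returns 0 if the item is acceptable, -1 otherwise.
 */
static int
check_geneve_protocol(const uint8_t spec[2], const uint8_t mask[2])
{
	assert(spec != NULL && mask != NULL);

	/* Full mask: only Ethernet (0x6558) is supported. */
	if (mask[0] == 0xff && mask[1] == 0xff)
		return (spec[0] == 0x65 && spec[1] == 0x58) ? 0 : -1;

	/* Partial mask: not supported. */
	if (mask[0] != 0 || mask[1] != 0)
		return -1;

	/* Zero mask: the protocol type is not matched at all. */
	return 0;
}
```

The three outcomes map onto the two error messages in the patch plus the silent "ignore" path.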
[dpdk-dev] [PATCH v2 10/14] net/sfc: multiply of specs with an unknown EtherType
From: Roman Zhukov Hardware filter specification for encapsulated traffic must contain EtherType. In terms of RTE flow API, this would require L3 item to be used in the flow rule. In the simplest case, if the user needs to filter encapsulated traffic without knowledge of exact EtherType, they will have to create multiple variants of the flow rule featuring all possible L3 items (IPv4, IPv6), respectively. In order to hide the gory details and avoid such a complication, this patch implements a mechanism to auto-complete the filter specifications if need be. Signed-off-by: Roman Zhukov Signed-off-by: Andrew Rybchenko Reviewed-by: Ivan Malov --- drivers/net/sfc/sfc_flow.c | 306 +++-- drivers/net/sfc/sfc_flow.h | 2 +- 2 files changed, 266 insertions(+), 42 deletions(-) diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c index a432936..244fcdb 100644 --- a/drivers/net/sfc/sfc_flow.c +++ b/drivers/net/sfc/sfc_flow.c @@ -64,6 +64,21 @@ static sfc_flow_item_parse sfc_flow_parse_vxlan; static sfc_flow_item_parse sfc_flow_parse_geneve; static sfc_flow_item_parse sfc_flow_parse_nvgre; +typedef int (sfc_flow_spec_set_vals)(struct sfc_flow_spec *spec, +unsigned int filters_count_for_one_val, +struct rte_flow_error *error); + +struct sfc_flow_copy_flag { + /* EFX filter specification match flag */ + efx_filter_match_flags_t flag; + /* Number of values of corresponding field */ + unsigned int vals_count; + /* Function to set values in specifications */ + sfc_flow_spec_set_vals *set_vals; +}; + +static sfc_flow_spec_set_vals sfc_flow_set_ethertypes; + static boolean_t sfc_flow_is_zero(const uint8_t *buf, unsigned int size) { @@ -244,16 +259,9 @@ sfc_flow_parse_eth(const struct rte_flow_item *item, if (rc != 0) return rc; - /* -* If "spec" is not set, could be any Ethernet, but for the inner frame -* type of destination MAC must be set -*/ - if (spec == NULL) { - if (is_ifrm) - goto fail_bad_ifrm_dst_mac; - else - return 0; - } + /* If "spec" is not set, could be 
any Ethernet */ + if (spec == NULL) + return 0; if (is_same_ether_addr(&mask->dst, &supp_mask.dst)) { efx_spec->efs_match_flags |= is_ifrm ? @@ -273,8 +281,6 @@ sfc_flow_parse_eth(const struct rte_flow_item *item, EFX_FILTER_MATCH_UNKNOWN_MCAST_DST; } else if (!is_zero_ether_addr(&mask->dst)) { goto fail_bad_mask; - } else if (is_ifrm) { - goto fail_bad_ifrm_dst_mac; } /* @@ -308,13 +314,6 @@ sfc_flow_parse_eth(const struct rte_flow_item *item, RTE_FLOW_ERROR_TYPE_ITEM, item, "Bad mask in the ETH pattern item"); return -rte_errno; - -fail_bad_ifrm_dst_mac: - rte_flow_error_set(error, EINVAL, - RTE_FLOW_ERROR_TYPE_ITEM, item, - "Type of destination MAC address in inner frame " - "must be set"); - return -rte_errno; } /** @@ -782,14 +781,9 @@ sfc_flow_set_match_flags_for_encap_pkts(const struct rte_flow_item *item, } } - if (!(efx_spec->efs_match_flags & EFX_FILTER_MATCH_ETHER_TYPE)) { - rte_flow_error_set(error, EINVAL, - RTE_FLOW_ERROR_TYPE_ITEM, item, - "Outer frame EtherType in pattern with tunneling " - "must be set"); - return -rte_errno; - } else if (efx_spec->efs_ether_type != EFX_ETHER_TYPE_IPV4 && - efx_spec->efs_ether_type != EFX_ETHER_TYPE_IPV6) { + if (efx_spec->efs_match_flags & EFX_FILTER_MATCH_ETHER_TYPE && + efx_spec->efs_ether_type != EFX_ETHER_TYPE_IPV4 && + efx_spec->efs_ether_type != EFX_ETHER_TYPE_IPV6) { rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ITEM, item, "Outer frame EtherType in pattern with tunneling " @@ -1508,6 +1502,246 @@ sfc_flow_parse_actions(struct sfc_adapter *sa, return 0; } +/** + * Set the EFX_FILTER_MATCH_ETHER_TYPE match flag and EFX_ETHER_TYPE_IPV4 and + * EFX_ETHER_TYPE_IPV6 values of the corresponding field in the same + * specifications after copying. + * + * @param spec[in, out] + * SFC flow specification to update. + * @param filters_count_for_one_val[in] + * How many specifications should have the same EtherType value, what is the + * number of specifications before copying. 
+ * @param error[out] + * Perform verbose error reporting if not NULL. + */ +static int +sfc_flow_set_ethertypes(struct sfc_flow_spec *spec, + unsig
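The "multiplication" of specifications described in the commit message can be sketched as follows: the existing block of specifications is copied once per candidate value, and entry `i` then receives value `vals[i / filters_count_for_one_val]`. The types and `multiply_by_ethertype()` below are hypothetical simplifications, not the driver's structures; only the indexing scheme is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SPECS 8

/* Hypothetical, heavily reduced filter specification. */
struct spec {
	uint16_t ether_type;
	int has_ether_type;
};

struct spec_list {
	struct spec filters[MAX_SPECS];
	unsigned int count;
};

/*
 * Duplicate the current block of specifications once per EtherType value
 * and assign the values block by block.
 */
static int
multiply_by_ethertype(struct spec_list *list)
{
	static const uint16_t vals[] = { 0x0800 /* IPv4 */, 0x86dd /* IPv6 */ };
	const unsigned int nvals = sizeof(vals) / sizeof(vals[0]);
	unsigned int per_val;
	unsigned int i;

	assert(list != NULL);
	per_val = list->count;	/* number of specs before copying */
	if (per_val == 0 || per_val * nvals > MAX_SPECS)
		return -1;

	/* Copy the original block so that there is one block per value. */
	for (i = per_val; i < per_val * nvals; i++)
		list->filters[i] = list->filters[i % per_val];
	list->count = per_val * nvals;

	/* per_val is non-zero here, so the division is safe. */
	for (i = 0; i < list->count; i++) {
		list->filters[i].ether_type = vals[i / per_val];
		list->filters[i].has_ether_type = 1;
	}
	return 0;
}
```

Starting from two specifications, the result is four: the first two carry IPv4 and the last two carry IPv6, so the user needs only one flow rule.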
[dpdk-dev] [PATCH v2 11/14] net/sfc: multiply of specs w/o inner frame destination MAC
From: Roman Zhukov Knowledge of a network identifier is not sufficient to construct a workable hardware filter for encapsulated traffic. It's obligatory to specify one of the match flags associated with inner frame destination MAC. If the address is unknown, then one needs to specify either unknown unicast or unknown multicast destination match flag. In terms of RTE flow API, this would require adding multiple flow rules with corresponding ETH items besides the tunnel item. In order to avoid such a complication, the patch implements a mechanism to auto-complete an underlying filter representation of a flow rule in order to create additional filter specififcations featuring the missing match flags. Signed-off-by: Roman Zhukov Signed-off-by: Andrew Rybchenko Reviewed-by: Ivan Malov --- drivers/net/sfc/sfc_flow.c | 114 - drivers/net/sfc/sfc_flow.h | 2 +- 2 files changed, 113 insertions(+), 3 deletions(-) diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c index 244fcdb..2d45827 100644 --- a/drivers/net/sfc/sfc_flow.c +++ b/drivers/net/sfc/sfc_flow.c @@ -68,6 +68,10 @@ typedef int (sfc_flow_spec_set_vals)(struct sfc_flow_spec *spec, unsigned int filters_count_for_one_val, struct rte_flow_error *error); +typedef boolean_t (sfc_flow_spec_check)(efx_filter_match_flags_t match, + efx_filter_spec_t *spec, + struct sfc_filter *filter); + struct sfc_flow_copy_flag { /* EFX filter specification match flag */ efx_filter_match_flags_t flag; @@ -75,9 +79,16 @@ struct sfc_flow_copy_flag { unsigned int vals_count; /* Function to set values in specifications */ sfc_flow_spec_set_vals *set_vals; + /* +* Function to check that the specification is suitable +* for adding this match flag +*/ + sfc_flow_spec_check *spec_check; }; static sfc_flow_spec_set_vals sfc_flow_set_ethertypes; +static sfc_flow_spec_set_vals sfc_flow_set_ifrm_unknown_dst_flags; +static sfc_flow_spec_check sfc_flow_check_ifrm_unknown_dst_flags; static boolean_t sfc_flow_is_zero(const uint8_t *buf, 
unsigned int size) @@ -1548,12 +1559,98 @@ sfc_flow_set_ethertypes(struct sfc_flow_spec *spec, return 0; } +/** + * Set the EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST and + * EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST match flags in the same + * specifications after copying. + * + * @param spec[in, out] + * SFC flow specification to update. + * @param filters_count_for_one_val[in] + * How many specifications should have the same match flag, what is the + * number of specifications before copying. + * @param error[out] + * Perform verbose error reporting if not NULL. + */ +static int +sfc_flow_set_ifrm_unknown_dst_flags(struct sfc_flow_spec *spec, + unsigned int filters_count_for_one_val, + struct rte_flow_error *error) +{ + unsigned int i; + static const efx_filter_match_flags_t vals[] = { + EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST, + EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST + }; + + if (filters_count_for_one_val * RTE_DIM(vals) != spec->count) { + rte_flow_error_set(error, EINVAL, + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, + "Number of specifications is incorrect while copying " + "by inner frame unknown destination flags"); + return -rte_errno; + } + + for (i = 0; i < spec->count; i++) { + /* The check above ensures that divisor can't be zero here */ + spec->filters[i].efs_match_flags |= + vals[i / filters_count_for_one_val]; + } + + return 0; +} + +/** + * Check that the following conditions are met: + * - the specification corresponds to a filter for encapsulated traffic + * - the list of supported filters has a filter + * with EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST flag instead of + * EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST, since this filter will also + * be inserted. + * + * @param match[in] + * The match flags of filter. + * @param spec[in] + * Specification to be supplemented. + * @param filter[in] + * SFC filter with list of supported filters. 
+ */ +static boolean_t +sfc_flow_check_ifrm_unknown_dst_flags(efx_filter_match_flags_t match, + efx_filter_spec_t *spec, + struct sfc_filter *filter) +{ + unsigned int i; + efx_tunnel_protocol_t encap_type = spec->efs_encap_type; + efx_filter_match_flags_t match_mcast_dst; + + if (encap_type == EFX_TUNNEL_PROTOCOL_NONE) + return B_FALSE; + + match_mcast_dst = + (match & ~EFX_FI
[dpdk-dev] [PATCH v2 04/14] net/sfc/base: distinguish filters for encapsulated packets
From: Roman Zhukov

Add filter match flag to distinguish filters applied only to
encapsulated packets.

The set of match flags should make it possible to determine whether a
filter is supported or not. The problem is that if a specification has
a supported set of outer match flags and specifies encapsulation
without any inner flags, the check says that it is supported, and
filter insertion is performed. However, the encapsulated traffic is not
filtered. A new flag is added to solve this problem and separate the
filters for the encapsulated packets.

Signed-off-by: Roman Zhukov
Signed-off-by: Andrew Rybchenko
Reviewed-by: Andy Moreton
Reviewed-by: Mark Spender
---
 drivers/net/sfc/base/ef10_filter.c | 19 +--
 drivers/net/sfc/base/efx.h         |  5 +
 drivers/net/sfc/base/efx_filter.c  |  3 ++-
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/net/sfc/base/ef10_filter.c b/drivers/net/sfc/base/ef10_filter.c
index e93dc13..a627cce 100644
--- a/drivers/net/sfc/base/ef10_filter.c
+++ b/drivers/net/sfc/base/ef10_filter.c
@@ -174,6 +174,7 @@ efx_mcdi_filter_op_add(
 	efx_mcdi_req_t req;
 	uint8_t payload[MAX(MC_CMD_FILTER_OP_EXT_IN_LEN,
 			    MC_CMD_FILTER_OP_EXT_OUT_LEN)];
+	efx_filter_match_flags_t match_flags;
 	efx_rc_t rc;
 
 	memset(payload, 0, sizeof (payload));
@@ -183,6 +184,12 @@ efx_mcdi_filter_op_add(
 	req.emr_out_buf = payload;
 	req.emr_out_length = MC_CMD_FILTER_OP_EXT_OUT_LEN;
 
+	/*
+	 * Remove match flag for encapsulated filters that does not correspond
+	 * to the MCDI match flags
+	 */
+	match_flags = spec->efs_match_flags & ~EFX_FILTER_MATCH_ENCAP_TYPE;
+
 	switch (filter_op) {
 	case MC_CMD_FILTER_OP_IN_OP_REPLACE:
 		MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_HANDLE_LO,
@@ -203,7 +210,7 @@ efx_mcdi_filter_op_add(
 	MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_PORT_ID,
 	    EVB_PORT_ID_ASSIGNED);
 	MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_MATCH_FIELDS,
-	    spec->efs_match_flags);
+	    match_flags);
 	MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_RX_DEST,
 	    MC_CMD_FILTER_OP_EXT_IN_RX_DEST_HOST);
 	MCDI_IN_SET_DWORD(req, FILTER_OP_EXT_IN_RX_QUEUE,
@@ -1008,13 +1015,17 @@ ef10_filter_supported_filters(
 	    EFX_FILTER_MATCH_IFRM_LOC_MAC |
 	    EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST |
 	    EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST |
+	    EFX_FILTER_MATCH_ENCAP_TYPE |
 	    EFX_FILTER_MATCH_UNKNOWN_MCAST_DST |
 	    EFX_FILTER_MATCH_UNKNOWN_UCAST_DST);
 
 	/*
 	 * Two calls to MC_CMD_GET_PARSER_DISP_INFO are needed: one to get the
 	 * list of supported filters for ordinary packets, and then another to
-	 * get the list of supported filters for encapsulated packets.
+	 * get the list of supported filters for encapsulated packets. To
+	 * distinguish the second list from the first, the
+	 * EFX_FILTER_MATCH_ENCAP_TYPE flag is added to each filter for
+	 * encapsulated packets.
 	 */
 	rc = efx_mcdi_get_parser_disp_info(enp, buffer, buffer_length, B_FALSE,
 	    &mcdi_list_length);
@@ -1042,6 +1053,10 @@ ef10_filter_supported_filters(
 				no_space = B_TRUE;
 			else
 				goto fail2;
+		} else {
+			for (i = next_buf_idx;
+			    i < next_buf_idx + mcdi_encap_list_length; i++)
+				buffer[i] |= EFX_FILTER_MATCH_ENCAP_TYPE;
 		}
 	} else {
 		mcdi_encap_list_length = 0;
diff --git a/drivers/net/sfc/base/efx.h b/drivers/net/sfc/base/efx.h
index e2f49ec..bb903e5 100644
--- a/drivers/net/sfc/base/efx.h
+++ b/drivers/net/sfc/base/efx.h
@@ -2485,6 +2485,11 @@ typedef uint8_t efx_filter_flags_t;
 #define	EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST	0x01000000
 /* For encapsulated packets, match all unicast inner frames */
 #define	EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST	0x02000000
+/*
+ * Match by encap type, this flag does not correspond to
+ * the MCDI match flags and any unoccupied value may be used
+ */
+#define	EFX_FILTER_MATCH_ENCAP_TYPE		0x20000000
 /* Match otherwise-unmatched multicast and broadcast packets */
 #define	EFX_FILTER_MATCH_UNKNOWN_MCAST_DST	0x40000000
 /* Match otherwise-unmatched unicast packets */
diff --git a/drivers/net/sfc/base/efx_filter.c b/drivers/net/sfc/base/efx_filter.c
index 2e6628b..97c972c 100644
--- a/drivers/net/sfc/base/efx_filter.c
+++ b/drivers/net/sfc/base/efx_filter.c
@@ -418,7 +418,7 @@ efx_filter_spec_set_encap_type(
 	__in		efx_tunnel_protocol_t encap_type,
 	__in		efx_filter_inner_frame_match_t inner_frame_match)
 {
-	uint32_t match_flags = 0;
+	uint32_t match_flags = EFX_FILTER_MATCH_ENCAP_TYPE;
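
The two steps this patch relies on, stripping the driver-only flag before the match fields go to firmware and tagging the entries of the encapsulated list, can be sketched in isolation. This is only an illustration under assumptions: the SKETCH_* flag values and the helper names to_mcdi_match_fields() and tag_encap_entries() are hypothetical and are not part of libefx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, chosen for illustration only. */
#define SKETCH_MATCH_LOC_MAC	0x00000010u
#define SKETCH_MATCH_ENCAP_TYPE	0x20000000u	/* driver-only flag */

/* Strip the driver-only flag before the flags are handed to firmware. */
static uint32_t
to_mcdi_match_fields(uint32_t match_flags)
{
	return match_flags & ~SKETCH_MATCH_ENCAP_TYPE;
}

/*
 * Tag 'count' entries starting at index 'first' (the second, encapsulated
 * list) so that they cannot be confused with entries of the first list
 * that carry the same outer match flags.
 */
static void
tag_encap_entries(uint32_t *buffer, size_t first, size_t count)
{
	size_t i;

	assert(buffer != NULL);
	for (i = first; i < first + count; i++)
		buffer[i] |= SKETCH_MATCH_ENCAP_TYPE;
}
```

With this tagging, a spec that sets an encapsulation type but no inner match flags only matches a supported entry that came from the encapsulated list.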
[dpdk-dev] [PATCH v2 02/14] net/sfc/base: support VNI/VSID and inner frame local MAC
From: Roman Zhukov

This supports VNI/VSID and inner frame local MAC fields to match in
VXLAN, GENEVE, or NVGRE packets.

Signed-off-by: Roman Zhukov
Signed-off-by: Andrew Rybchenko
Reviewed-by: Andy Moreton
---
 drivers/net/sfc/base/ef10_filter.c | 18 ++
 drivers/net/sfc/base/efx.h         |  8 
 2 files changed, 26 insertions(+)

diff --git a/drivers/net/sfc/base/ef10_filter.c b/drivers/net/sfc/base/ef10_filter.c
index 8a6bc61..e93dc13 100644
--- a/drivers/net/sfc/base/ef10_filter.c
+++ b/drivers/net/sfc/base/ef10_filter.c
@@ -119,6 +119,10 @@ ef10_filter_init(
 	    MATCH_MASK(MC_CMD_FILTER_OP_EXT_IN_MATCH_OUTER_VLAN));
 	EFX_STATIC_ASSERT(EFX_FILTER_MATCH_IP_PROTO ==
 	    MATCH_MASK(MC_CMD_FILTER_OP_EXT_IN_MATCH_IP_PROTO));
+	EFX_STATIC_ASSERT(EFX_FILTER_MATCH_VNI_OR_VSID ==
+	    MATCH_MASK(MC_CMD_FILTER_OP_EXT_IN_MATCH_VNI_OR_VSID));
+	EFX_STATIC_ASSERT(EFX_FILTER_MATCH_IFRM_LOC_MAC ==
+	    MATCH_MASK(MC_CMD_FILTER_OP_EXT_IN_MATCH_IFRM_DST_MAC));
 	EFX_STATIC_ASSERT(EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST ==
 	    MATCH_MASK(MC_CMD_FILTER_OP_EXT_IN_MATCH_IFRM_UNKNOWN_MCAST_DST));
 	EFX_STATIC_ASSERT(EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST ==
@@ -292,6 +296,12 @@ efx_mcdi_filter_op_add(
 			rc = EINVAL;
 			goto fail2;
 		}
+
+		memcpy(MCDI_IN2(req, uint8_t, FILTER_OP_EXT_IN_VNI_OR_VSID),
+		    spec->efs_vni_or_vsid, EFX_VNI_OR_VSID_LEN);
+
+		memcpy(MCDI_IN2(req, uint8_t, FILTER_OP_EXT_IN_IFRM_DST_MAC),
+		    spec->efs_ifrm_loc_mac, EFX_MAC_ADDR_LEN);
 	}
 
 	efx_mcdi_execute(enp, &req);
@@ -415,6 +425,12 @@ ef10_filter_equal(
 		return (B_FALSE);
 	if (left->efs_encap_type != right->efs_encap_type)
 		return (B_FALSE);
+	if (memcmp(left->efs_vni_or_vsid, right->efs_vni_or_vsid,
+	    EFX_VNI_OR_VSID_LEN))
+		return (B_FALSE);
+	if (memcmp(left->efs_ifrm_loc_mac, right->efs_ifrm_loc_mac,
+	    EFX_MAC_ADDR_LEN))
+		return (B_FALSE);
 
 	return (B_TRUE);
 
@@ -988,6 +1004,8 @@ ef10_filter_supported_filters(
 	    EFX_FILTER_MATCH_LOC_MAC | EFX_FILTER_MATCH_LOC_PORT |
 	    EFX_FILTER_MATCH_ETHER_TYPE | EFX_FILTER_MATCH_INNER_VID |
 	    EFX_FILTER_MATCH_OUTER_VID | EFX_FILTER_MATCH_IP_PROTO |
+	    EFX_FILTER_MATCH_VNI_OR_VSID |
+	    EFX_FILTER_MATCH_IFRM_LOC_MAC |
 	    EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST |
 	    EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST |
 	    EFX_FILTER_MATCH_UNKNOWN_MCAST_DST |
diff --git a/drivers/net/sfc/base/efx.h b/drivers/net/sfc/base/efx.h
index 088a896..8380d0a 100644
--- a/drivers/net/sfc/base/efx.h
+++ b/drivers/net/sfc/base/efx.h
@@ -454,6 +454,8 @@ typedef enum efx_link_mode_e {
 
 #define	EFX_MAC_ADDR_LEN 6
 
+#define	EFX_VNI_OR_VSID_LEN 3
+
 #define	EFX_MAC_ADDR_IS_MULTICAST(_address) (((uint8_t *)_address)[0] & 0x01)
 
 #define	EFX_MAC_MULTICAST_LIST_MAX	256
@@ -2475,6 +2477,10 @@ typedef uint8_t efx_filter_flags_t;
 #define	EFX_FILTER_MATCH_OUTER_VID		0x00000100
 /* Match by IP transport protocol */
 #define	EFX_FILTER_MATCH_IP_PROTO		0x00000200
+/* Match by VNI or VSID */
+#define	EFX_FILTER_MATCH_VNI_OR_VSID		0x00000800
+/* For encapsulated packets, match by inner frame local MAC address */
+#define	EFX_FILTER_MATCH_IFRM_LOC_MAC		0x00010000
 /* For encapsulated packets, match all multicast inner frames */
 #define	EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST	0x01000000
 /* For encapsulated packets, match all unicast inner frames */
@@ -2521,6 +2527,8 @@ typedef struct efx_filter_spec_s {
 	uint16_t	efs_rem_port;
 	efx_oword_t	efs_rem_host;
 	efx_oword_t	efs_loc_host;
+	uint8_t		efs_vni_or_vsid[EFX_VNI_OR_VSID_LEN];
+	uint8_t		efs_ifrm_loc_mac[EFX_MAC_ADDR_LEN];
 } efx_filter_spec_t;
 
 
-- 
2.7.4
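
The effect of extending ef10_filter_equal() with the two new byte-array fields can be shown with a cut-down comparison. The sketch below assumes a simplified stand-in structure (struct sketch_spec) and a hypothetical helper sketch_spec_equal(); the real efx_filter_spec_t has many more fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAC_ADDR_LEN	6
#define SKETCH_VNI_OR_VSID_LEN	3

/* Hypothetical, cut-down stand-in for efx_filter_spec_t. */
struct sketch_spec {
	uint32_t encap_type;
	uint8_t vni_or_vsid[SKETCH_VNI_OR_VSID_LEN];
	uint8_t ifrm_loc_mac[SKETCH_MAC_ADDR_LEN];
};

/* Two specs are equal only if the tunnel ID and inner MAC also agree. */
static bool
sketch_spec_equal(const struct sketch_spec *left,
		  const struct sketch_spec *right)
{
	assert(left != NULL && right != NULL);

	if (left->encap_type != right->encap_type)
		return false;
	if (memcmp(left->vni_or_vsid, right->vni_or_vsid,
		   SKETCH_VNI_OR_VSID_LEN) != 0)
		return false;
	if (memcmp(left->ifrm_loc_mac, right->ifrm_loc_mac,
		   SKETCH_MAC_ADDR_LEN) != 0)
		return false;

	return true;
}
```

Without the two memcmp() checks, filters that differ only in VNI/VSID or inner destination MAC would be treated as duplicates of each other.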
[dpdk-dev] [PATCH v2 00/14] net/sfc: support flow API for tunnels
Update base driver and the PMD itself to support flow API patterns for
tunnels: VXLAN, NVGRE and Geneve.

Applicable to SFN8xxx NICs with full-feature firmware variant running.

Andrew Rybchenko (1):
  doc: add net/sfc flow API support for tunnels

Roman Zhukov (12):
  net/sfc/base: support filters for encapsulated packets
  net/sfc/base: support VNI/VSID and inner frame local MAC
  net/sfc/base: distinguish filters for encapsulated packets
  net/sfc: add VXLAN in flow API filters support
  net/sfc: add NVGRE in flow API filters support
  net/sfc: add GENEVE in flow API filters support
  net/sfc: add inner frame ETH in flow API filters support
  net/sfc: add infrastructure to make many filters from flow
  net/sfc: multiply of specs with an unknown EtherType
  net/sfc: multiply of specs w/o inner frame destination MAC
  net/sfc: multiply of specs with an unknown destination MAC
  net/sfc: avoid creation of ineffective flow rules

Vijay Srivastava (1):
  net/sfc/base: support VXLAN filter creation

 doc/guides/nics/sfc_efx.rst            |   28 +-
 doc/guides/rel_notes/release_18_05.rst |    6 +
 drivers/net/sfc/base/ef10_filter.c     |  100 +++-
 drivers/net/sfc/base/efx.h             |   20 +
 drivers/net/sfc/base/efx_filter.c      |   39 +-
 drivers/net/sfc/sfc_flow.c             | 1001 ++--
 drivers/net/sfc/sfc_flow.h             |   19 +-
 7 files changed, 1161 insertions(+), 52 deletions(-)

-- 
2.7.4
[dpdk-dev] [PATCH v2 01/14] net/sfc/base: support filters for encapsulated packets
From: Roman Zhukov

This adds filters for encapsulated packets to the list returned by
ef10_filter_supported_filters().

Signed-off-by: Roman Zhukov
Signed-off-by: Andrew Rybchenko
Reviewed-by: Andy Moreton
---
v2:
 - fix assertion

 drivers/net/sfc/base/ef10_filter.c | 65 --
 1 file changed, 55 insertions(+), 10 deletions(-)

diff --git a/drivers/net/sfc/base/ef10_filter.c b/drivers/net/sfc/base/ef10_filter.c
index 2b7a09c..8a6bc61 100644
--- a/drivers/net/sfc/base/ef10_filter.c
+++ b/drivers/net/sfc/base/ef10_filter.c
@@ -895,6 +895,7 @@ efx_mcdi_get_parser_disp_info(
 	__in				efx_nic_t *enp,
 	__out_ecount(buffer_length)	uint32_t *buffer,
 	__in				size_t buffer_length,
+	__in				boolean_t encap,
 	__out				size_t *list_lengthp)
 {
 	efx_mcdi_req_t req;
@@ -911,7 +912,8 @@ efx_mcdi_get_parser_disp_info(
 	req.emr_out_buf = payload;
 	req.emr_out_length = MC_CMD_GET_PARSER_DISP_INFO_OUT_LENMAX;
 
-	MCDI_IN_SET_DWORD(req, GET_PARSER_DISP_INFO_OUT_OP,
+	MCDI_IN_SET_DWORD(req, GET_PARSER_DISP_INFO_OUT_OP, encap ?
+	    MC_CMD_GET_PARSER_DISP_INFO_IN_OP_GET_SUPPORTED_ENCAP_RX_MATCHES :
 	    MC_CMD_GET_PARSER_DISP_INFO_IN_OP_GET_SUPPORTED_RX_MATCHES);
 
 	efx_mcdi_execute(enp, &req);
@@ -971,28 +973,66 @@ ef10_filter_supported_filters(
 	__in				size_t buffer_length,
 	__out				size_t *list_lengthp)
 {
-
+	efx_nic_cfg_t *encp = &(enp->en_nic_cfg);
 	size_t mcdi_list_length;
+	size_t mcdi_encap_list_length;
 	size_t list_length;
 	uint32_t i;
+	uint32_t next_buf_idx;
+	size_t next_buf_length;
 	efx_rc_t rc;
+	boolean_t no_space = B_FALSE;
 	efx_filter_match_flags_t all_filter_flags =
 	    (EFX_FILTER_MATCH_REM_HOST | EFX_FILTER_MATCH_LOC_HOST |
 	    EFX_FILTER_MATCH_REM_MAC | EFX_FILTER_MATCH_REM_PORT |
 	    EFX_FILTER_MATCH_LOC_MAC | EFX_FILTER_MATCH_LOC_PORT |
 	    EFX_FILTER_MATCH_ETHER_TYPE | EFX_FILTER_MATCH_INNER_VID |
 	    EFX_FILTER_MATCH_OUTER_VID | EFX_FILTER_MATCH_IP_PROTO |
+	    EFX_FILTER_MATCH_IFRM_UNKNOWN_MCAST_DST |
+	    EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST |
 	    EFX_FILTER_MATCH_UNKNOWN_MCAST_DST |
 	    EFX_FILTER_MATCH_UNKNOWN_UCAST_DST);
 
-	rc = efx_mcdi_get_parser_disp_info(enp, buffer, buffer_length,
-	    &mcdi_list_length);
+	/*
+	 * Two calls to MC_CMD_GET_PARSER_DISP_INFO are needed: one to get the
+	 * list of supported filters for ordinary packets, and then another to
+	 * get the list of supported filters for encapsulated packets.
+	 */
+	rc = efx_mcdi_get_parser_disp_info(enp, buffer, buffer_length, B_FALSE,
+	    &mcdi_list_length);
 	if (rc != 0) {
-		if (rc == ENOSPC) {
-			/* Pass through mcdi_list_length for the list length */
-			*list_lengthp = mcdi_list_length;
+		if (rc == ENOSPC)
+			no_space = B_TRUE;
+		else
+			goto fail1;
+	}
+
+	if (no_space) {
+		next_buf_idx = 0;
+		next_buf_length = 0;
+	} else {
+		EFSYS_ASSERT(mcdi_list_length <= buffer_length);
+		next_buf_idx = mcdi_list_length;
+		next_buf_length = buffer_length - mcdi_list_length;
+	}
+
+	if (encp->enc_tunnel_encapsulations_supported != 0) {
+		rc = efx_mcdi_get_parser_disp_info(enp, &buffer[next_buf_idx],
+		    next_buf_length, B_TRUE, &mcdi_encap_list_length);
+		if (rc != 0) {
+			if (rc == ENOSPC)
+				no_space = B_TRUE;
+			else
+				goto fail2;
 		}
-		goto fail1;
+	} else {
+		mcdi_encap_list_length = 0;
+	}
+
+	if (no_space) {
+		*list_lengthp = mcdi_list_length + mcdi_encap_list_length;
+		rc = ENOSPC;
+		goto fail3;
 	}
 
 	/*
@@ -1005,9 +1045,10 @@ ef10_filter_supported_filters(
 	 * of the matches is preserved as they are ordered from highest to
 	 * lowest priority.
 	 */
-	EFSYS_ASSERT(mcdi_list_length <= buffer_length);
+	EFSYS_ASSERT(mcdi_list_length + mcdi_encap_list_length <=
+	    buffer_length);
 	list_length = 0;
-	for (i = 0; i < mcdi_list_length; i++) {
+	for (i = 0; i < mcdi_list_length + mcdi_encap_list_length; i++) {
 		if ((buffer[i] & ~all_filter_flags) == 0) {
 			buffer[list_length] = buffer[i];
 			list_length++;
@@ -1018,6 +1059,10 @@
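
The last hunk above runs one in-place pass over the concatenated lists and keeps only the entries whose match flags are all known to the driver, preserving their priority order. A minimal sketch of that pass follows; the function name keep_known_filters() is hypothetical and the flag values used with it are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compact 'buffer' in place: keep, in their original order, only the
 * entries that set no bit outside 'known_flags'. Returns the number of
 * entries kept; entries beyond that index are left unspecified.
 */
static size_t
keep_known_filters(uint32_t *buffer, size_t length, uint32_t known_flags)
{
	size_t kept = 0;
	size_t i;

	assert(buffer != NULL || length == 0);
	for (i = 0; i < length; i++) {
		if ((buffer[i] & ~known_flags) == 0) {
			/* kept <= i always holds, so nothing unread is lost */
			buffer[kept] = buffer[i];
			kept++;
		}
	}

	return kept;
}
```

Because the write index never overtakes the read index, no second buffer is needed, which is why the same caller-supplied buffer can hold both the raw and the filtered list.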
[dpdk-dev] [PATCH v2 03/14] net/sfc/base: support VXLAN filter creation
From: Vijay Srivastava

Signed-off-by: Vijay Srivastava
Signed-off-by: Andrew Rybchenko
---
 drivers/net/sfc/base/efx.h        |  7 +++
 drivers/net/sfc/base/efx_filter.c | 36 
 2 files changed, 43 insertions(+)

diff --git a/drivers/net/sfc/base/efx.h b/drivers/net/sfc/base/efx.h
index 8380d0a..e2f49ec 100644
--- a/drivers/net/sfc/base/efx.h
+++ b/drivers/net/sfc/base/efx.h
@@ -2624,6 +2624,13 @@ efx_filter_spec_set_encap_type(
 	__in		efx_tunnel_protocol_t encap_type,
 	__in		efx_filter_inner_frame_match_t inner_frame_match);
 
+extern	__checkReturn	efx_rc_t
+efx_filter_spec_set_vxlan_full(
+	__inout		efx_filter_spec_t *spec,
+	__in		const uint8_t *vxlan_id,
+	__in		const uint8_t *inner_addr,
+	__in		const uint8_t *outer_addr);
+
 #if EFSYS_OPT_RX_SCALE
 extern	__checkReturn	efx_rc_t
 efx_filter_spec_set_rss_context(
diff --git a/drivers/net/sfc/base/efx_filter.c b/drivers/net/sfc/base/efx_filter.c
index 8705369..2e6628b 100644
--- a/drivers/net/sfc/base/efx_filter.c
+++ b/drivers/net/sfc/base/efx_filter.c
@@ -468,6 +468,42 @@ efx_filter_spec_set_encap_type(
 	return (rc);
 }
 
+/*
+ * Specify inner and outer Ethernet address and VXLAN ID in filter
+ * specification.
+ */
+	__checkReturn	efx_rc_t
+efx_filter_spec_set_vxlan_full(
+	__inout		efx_filter_spec_t *spec,
+	__in		const uint8_t *vxlan_id,
+	__in		const uint8_t *inner_addr,
+	__in		const uint8_t *outer_addr)
+{
+	EFSYS_ASSERT3P(spec, !=, NULL);
+	EFSYS_ASSERT3P(vxlan_id, !=, NULL);
+	EFSYS_ASSERT3P(inner_addr, !=, NULL);
+	EFSYS_ASSERT3P(outer_addr, !=, NULL);
+
+	if ((inner_addr == NULL) && (outer_addr == NULL))
+		return (EINVAL);
+
+	if (vxlan_id != NULL) {
+		spec->efs_match_flags |= EFX_FILTER_MATCH_VNI_OR_VSID;
+		memcpy(spec->efs_vni_or_vsid, vxlan_id, EFX_VNI_OR_VSID_LEN);
+	}
+	if (outer_addr != NULL) {
+		spec->efs_match_flags |= EFX_FILTER_MATCH_LOC_MAC;
+		memcpy(spec->efs_loc_mac, outer_addr, EFX_MAC_ADDR_LEN);
+	}
+	if (inner_addr != NULL) {
+		spec->efs_match_flags |= EFX_FILTER_MATCH_IFRM_LOC_MAC;
+		memcpy(spec->efs_ifrm_loc_mac, inner_addr, EFX_MAC_ADDR_LEN);
+	}
+	spec->efs_encap_type = EFX_TUNNEL_PROTOCOL_VXLAN;
+
+	return (0);
+}
+
 #if EFSYS_OPT_RX_SCALE
 	__checkReturn	efx_rc_t
 efx_filter_spec_set_rss_context(
-- 
2.7.4
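
The vxlan_id argument above is a 3-byte array (EFX_VNI_OR_VSID_LEN), so a caller that holds the 24-bit VNI as an integer has to serialise it first. The sketch below assumes the array carries the identifier in wire order (most significant byte first, as the VNI appears in the VXLAN header); the patch itself does not state the byte order, and the helper names sketch_vni_to_bytes() and sketch_vni_from_bytes() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VNI_LEN	3

/* Serialise a 24-bit VNI, most significant byte first (assumed order). */
static void
sketch_vni_to_bytes(uint32_t vni, uint8_t out[SKETCH_VNI_LEN])
{
	assert(vni <= 0xffffffu);	/* a VNI has only 24 bits */

	out[0] = (uint8_t)(vni >> 16);
	out[1] = (uint8_t)(vni >> 8);
	out[2] = (uint8_t)vni;
}

/* Inverse of sketch_vni_to_bytes(). */
static uint32_t
sketch_vni_from_bytes(const uint8_t in[SKETCH_VNI_LEN])
{
	return ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];
}
```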
[dpdk-dev] [PATCH v2 06/14] net/sfc: add NVGRE in flow API filters support
From: Roman Zhukov

Exact match of virtual subnet ID is supported by parser.
IP protocol match is enforced to GRE.

Signed-off-by: Roman Zhukov
Signed-off-by: Andrew Rybchenko
Reviewed-by: Ivan Malov
Reviewed-by: Andy Moreton
---
 doc/guides/nics/sfc_efx.rst |  2 ++
 drivers/net/sfc/sfc_flow.c  | 68 -
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst
index 5a4b2a6..05dacb3 100644
--- a/doc/guides/nics/sfc_efx.rst
+++ b/doc/guides/nics/sfc_efx.rst
@@ -168,6 +168,8 @@ Supported pattern items:
 
 - VXLAN (exact match of VXLAN network identifier)
 
+- NVGRE (exact match of virtual subnet ID)
+
 Supported actions:
 
 - VOID
diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index 20ba69d..126ec9b 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -58,6 +58,7 @@ static sfc_flow_item_parse sfc_flow_parse_ipv6;
 static sfc_flow_item_parse sfc_flow_parse_tcp;
 static sfc_flow_item_parse sfc_flow_parse_udp;
 static sfc_flow_item_parse sfc_flow_parse_vxlan;
+static sfc_flow_item_parse sfc_flow_parse_nvgre;
 
 static boolean_t
 sfc_flow_is_zero(const uint8_t *buf, unsigned int size)
@@ -719,10 +720,17 @@ sfc_flow_set_match_flags_for_encap_pkts(const struct rte_flow_item *item,
 				"in VxLAN pattern");
 			return -rte_errno;
 
+		case EFX_IPPROTO_GRE:
+			rte_flow_error_set(error, EINVAL,
+				RTE_FLOW_ERROR_TYPE_ITEM, item,
+				"Outer IP header protocol must be GRE "
+				"in NVGRE pattern");
+			return -rte_errno;
+
 		default:
 			rte_flow_error_set(error, EINVAL,
 				RTE_FLOW_ERROR_TYPE_ITEM, item,
-				"Only VxLAN tunneling patterns "
+				"Only VxLAN/NVGRE tunneling patterns "
 				"are supported");
 			return -rte_errno;
 		}
@@ -823,6 +831,57 @@ sfc_flow_parse_vxlan(const struct rte_flow_item *item,
 	return rc;
 }
 
+/**
+ * Convert NVGRE item to EFX filter specification.
+ *
+ * @param item[in]
+ *   Item specification. Only virtual subnet ID field is supported.
+ *   If the mask is NULL, default mask will be used.
+ *   Ranging is not supported.
+ * @param efx_spec[in, out]
+ *   EFX filter specification to update.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ */
+static int
+sfc_flow_parse_nvgre(const struct rte_flow_item *item,
+		     efx_filter_spec_t *efx_spec,
+		     struct rte_flow_error *error)
+{
+	int rc;
+	const struct rte_flow_item_nvgre *spec = NULL;
+	const struct rte_flow_item_nvgre *mask = NULL;
+	const struct rte_flow_item_nvgre supp_mask = {
+		.tni = { 0xff, 0xff, 0xff }
+	};
+
+	rc = sfc_flow_parse_init(item,
+				 (const void **)&spec,
+				 (const void **)&mask,
+				 &supp_mask,
+				 &rte_flow_item_nvgre_mask,
+				 sizeof(struct rte_flow_item_nvgre),
+				 error);
+	if (rc != 0)
+		return rc;
+
+	rc = sfc_flow_set_match_flags_for_encap_pkts(item, efx_spec,
+						     EFX_IPPROTO_GRE, error);
+	if (rc != 0)
+		return rc;
+
+	efx_spec->efs_encap_type = EFX_TUNNEL_PROTOCOL_NVGRE;
+	efx_spec->efs_match_flags |= EFX_FILTER_MATCH_ENCAP_TYPE;
+
+	if (spec == NULL)
+		return 0;
+
+	rc = sfc_flow_set_efx_spec_vni_or_vsid(efx_spec, spec->tni,
+					       mask->tni, item, error);
+
+	return rc;
+}
+
 static const struct sfc_flow_item sfc_flow_items[] = {
 	{
 		.type = RTE_FLOW_ITEM_TYPE_VOID,
@@ -872,6 +931,12 @@ static const struct sfc_flow_item sfc_flow_items[] = {
 		.layer = SFC_FLOW_ITEM_START_LAYER,
 		.parse = sfc_flow_parse_vxlan,
 	},
+	{
+		.type = RTE_FLOW_ITEM_TYPE_NVGRE,
+		.prev_layer = SFC_FLOW_ITEM_L3,
+		.layer = SFC_FLOW_ITEM_START_LAYER,
+		.parse = sfc_flow_parse_nvgre,
+	},
 };
 
 /*
@@ -980,6 +1045,7 @@ sfc_flow_parse_pattern(const struct rte_flow_item pattern[],
 			break;
 
 		case RTE_FLOW_ITEM_TYPE_VXLAN:
+		case RTE_FLOW_ITEM_TYPE_NVGRE:
 			if (is_ifrm) {
 				rte_flow_error_set(error, EINVAL,
 					RTE_FLOW_ERROR_TYPE_ITEM,
-- 
2.7.4
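
The new table entry places NVGRE directly after L3 and resets the layer to the start, so that an inner frame can follow, while the pattern parser rejects a second tunnel item. The sketch below illustrates that ordering rule with a hypothetical, stricter model: all sketch_* names are invented, the VXLAN entry (after L4) is an assumption not shown in this patch, and unlike the driver the sketch does not allow layers to be skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_layer { SK_START, SK_L2, SK_L3, SK_L4 };
enum sketch_item { SK_ETH, SK_IPV4, SK_UDP, SK_VXLAN, SK_NVGRE };

struct sketch_item_desc {
	enum sketch_layer prev_layer;	/* layer the item must follow */
	enum sketch_layer layer;	/* layer reached after the item */
	bool is_tunnel;
};

static const struct sketch_item_desc sketch_items[] = {
	[SK_ETH]   = { SK_START, SK_L2,    false },
	[SK_IPV4]  = { SK_L2,    SK_L3,    false },
	[SK_UDP]   = { SK_L3,    SK_L4,    false },
	[SK_VXLAN] = { SK_L4,    SK_START, true },	/* assumed */
	[SK_NVGRE] = { SK_L3,    SK_START, true },
};

/* Accept a pattern only if layers follow in order and at most one tunnel. */
static bool
sketch_pattern_ok(const enum sketch_item *pattern, size_t n)
{
	enum sketch_layer prev = SK_START;
	bool is_ifrm = false;
	size_t i;

	assert(pattern != NULL || n == 0);
	for (i = 0; i < n; i++) {
		const struct sketch_item_desc *d = &sketch_items[pattern[i]];

		if (d->prev_layer != prev)
			return false;
		if (d->is_tunnel) {
			if (is_ifrm)
				return false;	/* nested tunnels rejected */
			is_ifrm = true;
		}
		prev = d->layer;
	}

	return true;
}
```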
[dpdk-dev] [PATCH v2 09/14] net/sfc: add infrastructure to make many filters from flow
From: Roman Zhukov

Not all flow rules can be expressed in one hardware filter, so some flow
rules have to be expressed in terms of multiple hardware filters. This
patch provides a means to produce a filter spec template from the flow
rule which then can be used to produce a set of fully elaborated specs
to be inserted.

Signed-off-by: Roman Zhukov
Signed-off-by: Andrew Rybchenko
Reviewed-by: Ivan Malov
---
 drivers/net/sfc/sfc_flow.c | 118 -
 drivers/net/sfc/sfc_flow.h |  19 +++-
 2 files changed, 114 insertions(+), 23 deletions(-)

diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index c942a36..a432936 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -25,10 +25,13 @@
 
 /*
  * At now flow API is implemented in such a manner that each
- * flow rule is converted to a hardware filter.
+ * flow rule is converted to one or more hardware filters.
  * All elements of flow rule (attributes, pattern items, actions)
  * correspond to one or more fields in the efx_filter_spec_s structure
  * that is responsible for the hardware filter.
+ * If some required field is unset in the flow rule, then a handful
+ * of filter copies will be created to cover all possible values
+ * of such a field.
  */
 
 enum sfc_flow_item_layers {
@@ -1095,8 +1098,8 @@ sfc_flow_parse_attr(const struct rte_flow_attr *attr,
 		return -rte_errno;
 	}
 
-	flow->spec.efs_flags |= EFX_FILTER_FLAG_RX;
-	flow->spec.efs_rss_context = EFX_RSS_CONTEXT_DEFAULT;
+	flow->spec.template.efs_flags |= EFX_FILTER_FLAG_RX;
+	flow->spec.template.efs_rss_context = EFX_RSS_CONTEXT_DEFAULT;
 
 	return 0;
 }
@@ -1187,7 +1190,7 @@ sfc_flow_parse_pattern(const struct rte_flow_item pattern[],
 			break;
 		}
 
-		rc = item->parse(pattern, &flow->spec, error);
+		rc = item->parse(pattern, &flow->spec.template, error);
 		if (rc != 0)
 			return rc;
 
@@ -1209,7 +1212,7 @@ sfc_flow_parse_queue(struct sfc_adapter *sa,
 		return -EINVAL;
 
 	rxq = sa->rxq_info[queue->index].rxq;
-	flow->spec.efs_dmaq_id = (uint16_t)rxq->hw_index;
+	flow->spec.template.efs_dmaq_id = (uint16_t)rxq->hw_index;
 
 	return 0;
 }
@@ -1285,13 +1288,57 @@ sfc_flow_parse_rss(struct sfc_adapter *sa,
 #endif /* EFSYS_OPT_RX_SCALE */
 
 static int
+sfc_flow_spec_flush(struct sfc_adapter *sa, struct sfc_flow_spec *spec,
+		    unsigned int filters_count)
+{
+	unsigned int i;
+	int ret = 0;
+
+	for (i = 0; i < filters_count; i++) {
+		int rc;
+
+		rc = efx_filter_remove(sa->nic, &spec->filters[i]);
+		if (ret == 0 && rc != 0) {
+			sfc_err(sa, "failed to remove filter specification "
+				"(rc = %d)", rc);
+			ret = rc;
+		}
+	}
+
+	return ret;
+}
+
+static int
+sfc_flow_spec_insert(struct sfc_adapter *sa, struct sfc_flow_spec *spec)
+{
+	unsigned int i;
+	int rc = 0;
+
+	for (i = 0; i < spec->count; i++) {
+		rc = efx_filter_insert(sa->nic, &spec->filters[i]);
+		if (rc != 0) {
+			sfc_flow_spec_flush(sa, spec, i);
+			break;
+		}
+	}
+
+	return rc;
+}
+
+static int
+sfc_flow_spec_remove(struct sfc_adapter *sa, struct sfc_flow_spec *spec)
+{
+	return sfc_flow_spec_flush(sa, spec, spec->count);
+}
+
+static int
 sfc_flow_filter_insert(struct sfc_adapter *sa,
 		       struct rte_flow *flow)
 {
-	efx_filter_spec_t *spec = &flow->spec;
-
 #if EFSYS_OPT_RX_SCALE
 	struct sfc_flow_rss *rss = &flow->rss_conf;
+	uint32_t efs_rss_context = EFX_RSS_CONTEXT_DEFAULT;
+	unsigned int i;
 	int rc = 0;
 
 	if (flow->rss) {
@@ -1302,27 +1349,38 @@ sfc_flow_filter_insert(struct sfc_adapter *sa,
 		rc = efx_rx_scale_context_alloc(sa->nic,
 						EFX_RX_SCALE_EXCLUSIVE,
 						rss_spread,
-						&spec->efs_rss_context);
+						&efs_rss_context);
 		if (rc != 0)
 			goto fail_scale_context_alloc;
 
-		rc = efx_rx_scale_mode_set(sa->nic, spec->efs_rss_context,
+		rc = efx_rx_scale_mode_set(sa->nic, efs_rss_context,
 					   EFX_RX_HASHALG_TOEPLITZ,
 					   rss->rss_hash_types, B_TRUE);
 		if (rc != 0)
 			goto fail_scale_mode_set;
 
-		rc = efx_rx_scale_key_set(sa->nic, spec->efs_rss_context,
+		rc = efx_rx_scale_key_set(sa->nic, efs_rss_context,
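
sfc_flow_spec_insert() above follows an all-or-nothing pattern: when the i-th insertion fails, the i filters already inserted are removed again before the error is returned. A self-contained sketch of that pattern is shown below. The "hardware" is a hypothetical counter with a capacity (struct sketch_hw) and all sketch_* names are invented for the illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the NIC filter table: it only counts entries. */
struct sketch_hw {
	unsigned int used;
	unsigned int capacity;
};

static int
sketch_hw_insert(struct sketch_hw *hw)
{
	if (hw->used == hw->capacity)
		return ENOSPC;
	hw->used++;
	return 0;
}

static void
sketch_hw_remove(struct sketch_hw *hw)
{
	assert(hw->used > 0);
	hw->used--;
}

/* Insert 'count' filters; on failure undo the ones already inserted. */
static int
sketch_insert_all(struct sketch_hw *hw, unsigned int count)
{
	unsigned int i, j;
	int rc = 0;

	for (i = 0; i < count; i++) {
		rc = sketch_hw_insert(hw);
		if (rc != 0) {
			/* Roll back exactly the i successful insertions */
			for (j = 0; j < i; j++)
				sketch_hw_remove(hw);
			break;
		}
	}

	return rc;
}
```

The rollback keeps a flow rule from being left half-installed, which matters once one rule may expand into several hardware filters.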
[dpdk-dev] [PATCH v2 08/14] net/sfc: add inner frame ETH in flow API filters support
From: Roman Zhukov

Support destination MAC address match in inner frames.

Signed-off-by: Roman Zhukov
Signed-off-by: Andrew Rybchenko
Reviewed-by: Ivan Malov
Reviewed-by: Andy Moreton
---
 doc/guides/nics/sfc_efx.rst |  4 ++-
 drivers/net/sfc/sfc_flow.c  | 73 +++--
 2 files changed, 61 insertions(+), 16 deletions(-)

diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst
index 943fe55..539ce90 100644
--- a/doc/guides/nics/sfc_efx.rst
+++ b/doc/guides/nics/sfc_efx.rst
@@ -152,7 +152,9 @@ Supported pattern items:
 - VOID
 
 - ETH (exact match of source/destination addresses, individual/group match
-  of destination address, EtherType)
+  of destination address, EtherType in the outer frame and exact match of
+  destination addresses, individual/group match of destination address in
+  the inner frame)
 
 - VLAN (exact match of VID, double-tagging is supported)
 
diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index efdc664..c942a36 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -187,11 +187,11 @@ sfc_flow_parse_void(__rte_unused const struct rte_flow_item *item,
  * Convert Ethernet item to EFX filter specification.
  *
  * @param item[in]
- *   Item specification. Only source and destination addresses and
- *   Ethernet type fields are supported. In addition to full and
- *   empty masks of destination address, individual/group mask is
- *   also supported. If the mask is NULL, default mask will be used.
- *   Ranging is not supported.
+ *   Item specification. Outer frame specification may only comprise
+ *   source/destination addresses and Ethertype field.
+ *   Inner frame specification may contain destination address only.
+ *   There is support for individual/group mask as well as for empty and full.
+ *   If the mask is NULL, default mask will be used. Ranging is not supported.
  * @param efx_spec[in, out]
  *   EFX filter specification to update.
  * @param[out] error
@@ -210,40 +210,75 @@ sfc_flow_parse_eth(const struct rte_flow_item *item,
 		.src.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
 		.type = 0xffff,
 	};
+	const struct rte_flow_item_eth ifrm_supp_mask = {
+		.dst.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	};
 	const uint8_t ig_mask[EFX_MAC_ADDR_LEN] = {
 		0x01, 0x00, 0x00, 0x00, 0x00, 0x00
 	};
+	const struct rte_flow_item_eth *supp_mask_p;
+	const struct rte_flow_item_eth *def_mask_p;
+	uint8_t *loc_mac = NULL;
+	boolean_t is_ifrm = (efx_spec->efs_encap_type !=
+		EFX_TUNNEL_PROTOCOL_NONE);
+
+	if (is_ifrm) {
+		supp_mask_p = &ifrm_supp_mask;
+		def_mask_p = &ifrm_supp_mask;
+		loc_mac = efx_spec->efs_ifrm_loc_mac;
+	} else {
+		supp_mask_p = &supp_mask;
+		def_mask_p = &rte_flow_item_eth_mask;
+		loc_mac = efx_spec->efs_loc_mac;
+	}
 
 	rc = sfc_flow_parse_init(item,
 				 (const void **)&spec,
 				 (const void **)&mask,
-				 &supp_mask,
-				 &rte_flow_item_eth_mask,
+				 supp_mask_p, def_mask_p,
 				 sizeof(struct rte_flow_item_eth),
 				 error);
 	if (rc != 0)
 		return rc;
 
-	/* If "spec" is not set, could be any Ethernet */
-	if (spec == NULL)
-		return 0;
+	/*
+	 * If "spec" is not set, could be any Ethernet, but for the inner frame
+	 * type of destination MAC must be set
+	 */
+	if (spec == NULL) {
+		if (is_ifrm)
+			goto fail_bad_ifrm_dst_mac;
+		else
+			return 0;
+	}
 
 	if (is_same_ether_addr(&mask->dst, &supp_mask.dst)) {
-		efx_spec->efs_match_flags |= EFX_FILTER_MATCH_LOC_MAC;
-		rte_memcpy(efx_spec->efs_loc_mac, spec->dst.addr_bytes,
+		efx_spec->efs_match_flags |= is_ifrm ?
+			EFX_FILTER_MATCH_IFRM_LOC_MAC :
+			EFX_FILTER_MATCH_LOC_MAC;
+		rte_memcpy(loc_mac, spec->dst.addr_bytes,
 			   EFX_MAC_ADDR_LEN);
 	} else if (memcmp(mask->dst.addr_bytes, ig_mask,
 			  EFX_MAC_ADDR_LEN) == 0) {
 		if (is_unicast_ether_addr(&spec->dst))
-			efx_spec->efs_match_flags |=
+			efx_spec->efs_match_flags |= is_ifrm ?
+				EFX_FILTER_MATCH_IFRM_UNKNOWN_UCAST_DST :
 				EFX_FILTER_MATCH_UNKNOWN_UCAST_DST;
 		else
-			efx_spec->efs_match_flags |=
+			efx_spec->efs_match_flags |= is_ifrm ?
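
The destination address handling above distinguishes a full mask (exact match) from the individual/group mask 01:00:00:00:00:00, where only the I/G bit of the address decides between the unknown unicast and unknown multicast flags. The sketch below classifies an address/mask pair in the same spirit. The enum, the function name and the treatment of an empty or any other mask are assumptions, because the quoted hunk is cut off before those branches.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAC_ADDR_LEN	6

enum sketch_dst_match {
	SKETCH_DST_ANY,			/* empty mask (assumed handling) */
	SKETCH_DST_EXACT,		/* full mask: exact address match */
	SKETCH_DST_UNKNOWN_UCAST,	/* I/G mask, I/G bit clear */
	SKETCH_DST_UNKNOWN_MCAST,	/* I/G mask, I/G bit set */
	SKETCH_DST_UNSUPPORTED		/* any other mask (assumed handling) */
};

static enum sketch_dst_match
sketch_classify_dst(const uint8_t *addr, const uint8_t *mask)
{
	static const uint8_t full[SKETCH_MAC_ADDR_LEN] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	};
	static const uint8_t ig[SKETCH_MAC_ADDR_LEN] = {
		0x01, 0x00, 0x00, 0x00, 0x00, 0x00
	};
	static const uint8_t none[SKETCH_MAC_ADDR_LEN] = { 0 };

	assert(addr != NULL && mask != NULL);

	if (memcmp(mask, full, SKETCH_MAC_ADDR_LEN) == 0)
		return SKETCH_DST_EXACT;
	if (memcmp(mask, ig, SKETCH_MAC_ADDR_LEN) == 0)
		/* The least significant bit of the first octet is I/G */
		return (addr[0] & 0x01) ? SKETCH_DST_UNKNOWN_MCAST :
					  SKETCH_DST_UNKNOWN_UCAST;
	if (memcmp(mask, none, SKETCH_MAC_ADDR_LEN) == 0)
		return SKETCH_DST_ANY;

	return SKETCH_DST_UNSUPPORTED;
}
```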
[dpdk-dev] [PATCH v2 14/14] doc: add net/sfc flow API support for tunnels
Signed-off-by: Andrew Rybchenko
---
 doc/guides/rel_notes/release_18_05.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 3923dc2..894f636 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -41,6 +41,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =
 
+* **Updated Solarflare network PMD.**
+
+  Updated the sfc_efx driver including the following changes:
+
+  * Added support for NVGRE, VXLAN and GENEVE filters in flow API.
+
 API Changes
 ---
 
-- 
2.7.4
[dpdk-dev] [PATCH v2 12/14] net/sfc: multiply of specs with an unknown destination MAC
From: Roman Zhukov

To filter all traffic, two hardware filter specifications need to be
created, covering both the unknown unicast and the unknown multicast
destination MAC address match flags. In terms of RTE flow API, this
would require adding multiple flow rules with corresponding ETH items.
In order to avoid such a complication, the patch implements a mechanism
to auto-complete an underlying filter representation of a flow rule in
order to create additional filter specifications featuring the missing
match flags.

Signed-off-by: Roman Zhukov
Signed-off-by: Andrew Rybchenko
Reviewed-by: Ivan Malov
---
 drivers/net/sfc/sfc_flow.c | 91 +-
 drivers/net/sfc/sfc_flow.h |  2 +-
 2 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index 2d45827..7b26653 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -86,6 +86,8 @@ struct sfc_flow_copy_flag {
 	sfc_flow_spec_check *spec_check;
 };
 
+static sfc_flow_spec_set_vals sfc_flow_set_unknown_dst_flags;
+static sfc_flow_spec_check sfc_flow_check_unknown_dst_flags;
 static sfc_flow_spec_set_vals sfc_flow_set_ethertypes;
 static sfc_flow_spec_set_vals sfc_flow_set_ifrm_unknown_dst_flags;
 static sfc_flow_spec_check sfc_flow_check_ifrm_unknown_dst_flags;
@@ -1514,6 +1516,80 @@ sfc_flow_parse_actions(struct sfc_adapter *sa,
 }
 
 /**
+ * Set the EFX_FILTER_MATCH_UNKNOWN_UCAST_DST
+ * and EFX_FILTER_MATCH_UNKNOWN_MCAST_DST match flags in the same
+ * specifications after copying.
+ *
+ * @param spec[in, out]
+ *   SFC flow specification to update.
+ * @param filters_count_for_one_val[in]
+ *   How many specifications should have the same match flag, what is the
+ *   number of specifications before copying.
+ * @param error[out]
+ *   Perform verbose error reporting if not NULL.
+ */
+static int
+sfc_flow_set_unknown_dst_flags(struct sfc_flow_spec *spec,
+			       unsigned int filters_count_for_one_val,
+			       struct rte_flow_error *error)
+{
+	unsigned int i;
+	static const efx_filter_match_flags_t vals[] = {
+		EFX_FILTER_MATCH_UNKNOWN_UCAST_DST,
+		EFX_FILTER_MATCH_UNKNOWN_MCAST_DST
+	};
+
+	if (filters_count_for_one_val * RTE_DIM(vals) != spec->count) {
+		rte_flow_error_set(error, EINVAL,
+			RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			"Number of specifications is incorrect while copying "
+			"by unknown destination flags");
+		return -rte_errno;
+	}
+
+	for (i = 0; i < spec->count; i++) {
+		/* The check above ensures that divisor can't be zero here */
+		spec->filters[i].efs_match_flags |=
+			vals[i / filters_count_for_one_val];
+	}
+
+	return 0;
+}
+
+/**
+ * Check that the following conditions are met:
+ * - the list of supported filters has a filter
+ *   with EFX_FILTER_MATCH_UNKNOWN_MCAST_DST flag instead of
+ *   EFX_FILTER_MATCH_UNKNOWN_UCAST_DST, since this filter will also
+ *   be inserted.
+ *
+ * @param match[in]
+ *   The match flags of filter.
+ * @param spec[in]
+ *   Specification to be supplemented.
+ * @param filter[in]
+ *   SFC filter with list of supported filters.
+ */
+static boolean_t
+sfc_flow_check_unknown_dst_flags(efx_filter_match_flags_t match,
+				 __rte_unused efx_filter_spec_t *spec,
+				 struct sfc_filter *filter)
+{
+	unsigned int i;
+	efx_filter_match_flags_t match_mcast_dst;
+
+	match_mcast_dst =
+		(match & ~EFX_FILTER_MATCH_UNKNOWN_UCAST_DST) |
+		EFX_FILTER_MATCH_UNKNOWN_MCAST_DST;
+	for (i = 0; i < filter->supported_match_num; i++) {
+		if (match_mcast_dst == filter->supported_match[i])
+			return B_TRUE;
+	}
+
+	return B_FALSE;
+}
+
+/**
  * Set the EFX_FILTER_MATCH_ETHER_TYPE match flag and EFX_ETHER_TYPE_IPV4 and
  * EFX_ETHER_TYPE_IPV6 values of the corresponding field in the same
  * specifications after copying.
@@ -1638,9 +1714,22 @@ sfc_flow_check_ifrm_unknown_dst_flags(efx_filter_match_flags_t match,
 	return B_FALSE;
 }
 
-/* Match flags that can be automatically added to filters */
+/*
+ * Match flags that can be automatically added to filters.
+ * Selecting the last minimum when searching for the copy flag ensures that the
+ * EFX_FILTER_MATCH_UNKNOWN_UCAST_DST flag has a higher priority than
+ * EFX_FILTER_MATCH_ETHER_TYPE. This is because the filter
+ * EFX_FILTER_MATCH_UNKNOWN_UCAST_DST is at the end of the list of supported
+ * filters.
+ */
 static const struct sfc_flow_copy_flag sfc_flow_copy_flags[] = {
 	{
+		.flag = EFX_FILTER_MATCH_UNKNOWN_UCAST_DST,
+		.vals_count = 2,
+		.set_vals = sfc_flow_set_unknown_dst_flags,
[dpdk-dev] [RFC PATCH] net/bonding: add rte flow support
Ethernet devices which are grouped by the bonding PMD, aka slaves, share
the same queues and RSS configurations, and their Rx burst functions
must be managed by the bonding PMD according to the bonding
architecture.

So, it makes sense to configure the same flow rules for all the bond
slaves to allow consistency in packet flow management.

Add rte flow support to the bonding PMD.

Signed-off-by: Matan Azrad
---

Implementation details:

The following rte flow operations are allowed: validate, create,
destroy, flush, query, isolate.

Validate:
	Validation passes only if validation passes on all the existing
	slaves.

Create:
	Create the flow in all slaves.
	Save the flow objects created in all the slaves in the bonding
	internal flow structure.
	Save each flow configuration so that it can be applied to each
	new slave.
	A failure to create the flow on an existing slave rejects the
	flow.
	A failure to create the flow on a new slave at slave adding time
	rejects the slave.
	Return the bonding flow structure pointer to the application.

Destroy:
	Destroy the flow in all slaves and release the internal flow
	memory.

Flush:
	Destroy all the bonding PMD flows in all the slaves (calling the
	slaves' flush would destroy all the slave flows, which may
	include other flows from the application or the bond internal
	LACP flow).

Query:
	Return the query result of the bonding primary slave.
	(Alternatively we can sum all the query data for the COUNT action
	and return -ENOTSUP for other queries.)

Isolate:
	Call flow isolate for all slaves.
	Isolate mode will be configured for new slaves too (the slave is
	rejected in case of failure).

* This implementation allows the application to configure flows
  directly on the slaves and to manage another set of rte flows.
* The recommendation is to use rte flow through the bonding PMD and
  not directly through the slave PMDs (for example, calling flow flush
  on a slave directly may hurt the LACP mechanism).

You can look at the code below to see more details.

Thoughts?

drivers/net/bonding/Makefile | 1 + drivers/net/bonding/rte_eth_bond_api.c | 61 + drivers/net/bonding/rte_eth_bond_flow.c| 206 + drivers/net/bonding/rte_eth_bond_pmd.c | 28 +++- drivers/net/bonding/rte_eth_bond_private.h | 19 +++ 5 files changed, 312 insertions(+), 3 deletions(-) create mode 100644 drivers/net/bonding/rte_eth_bond_flow.c diff --git a/drivers/net/bonding/Makefile b/drivers/net/bonding/Makefile index 4a6633e..acad16a 100644 --- a/drivers/net/bonding/Makefile +++ b/drivers/net/bonding/Makefile @@ -27,6 +27,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += rte_eth_bond_pmd.c SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += rte_eth_bond_args.c SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += rte_eth_bond_8023ad.c SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += rte_eth_bond_alb.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += rte_eth_bond_flow.c # # Export include files diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c index f854b73..350b46e 100644 --- a/drivers/net/bonding/rte_eth_bond_api.c +++ b/drivers/net/bonding/rte_eth_bond_api.c @@ -223,6 +223,47 @@ } static int +slave_rte_flow_prepare(uint16_t slave_id, struct bond_dev_private *internals) +{ + struct rte_flow *flow; + struct rte_flow_error ferror; + uint16_t slave_port_id = internals->slaves[slave_id].port_id; + + if (internals->flow_isolated_valid != 0) { + if (rte_flow_isolate(slave_port_id, internals->flow_isolated, + &ferror)) { + RTE_BOND_LOG(ERR, "rte_flow_isolate failed for slave" +" %d: %s", slave_id, ferror.message ? +ferror.message : "(no stated reason)"); + return -1; + } + } + TAILQ_FOREACH(flow, &internals->flow_list, next) { + flow->flows[slave_id] = rte_flow_create(slave_port_id, + &flow->fd->attr, + flow->fd->items, + flow->fd->actions, + &ferror); + if (flow->flows[slave_id] == NULL) { + RTE_BOND_LOG(ERR, "Cannot create flow for slave" +" %d: %s", slave_id, +ferror.message ? 
ferror.message : +"(no stated reason)"); + /* Destroy successful bond flows from the slave */ + TAILQ_FOREACH(flow, &internals->flow_list, next) { + if (flow->flows[slave_id] != NULL) { + rte_flow_destroy(slave_port_id, flow, +&ferror); + flow->flows[slave_id] = NUL
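The "create in all slaves, reject the flow if any slave fails" rule from the implementation notes can be sketched on its own as follows. This is a minimal stand-alone sketch and not the code of the patch: slave_flow_create(), slave_flow_destroy(), bond_flow_create(), MAX_SLAVES and the fail_on_port test hook are hypothetical stand-ins for the per-port rte_flow calls and the bonding internals.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLAVES 8 /* hypothetical limit for this sketch */

/* Hypothetical stand-in for a per-slave flow handle. */
struct slave_flow {
	int port_id;
};

static struct slave_flow flow_pool[MAX_SLAVES];
static int fail_on_port = -1; /* test hook: port that refuses the flow */
static int live_flows;        /* number of flows currently installed */

/* Stand-in for a per-port flow create; returns NULL on failure. */
static struct slave_flow *slave_flow_create(int port_id)
{
	assert(port_id >= 0 && port_id < MAX_SLAVES);
	if (port_id == fail_on_port)
		return NULL;
	flow_pool[port_id].port_id = port_id;
	live_flows++;
	return &flow_pool[port_id];
}

/* Stand-in for a per-port flow destroy. */
static void slave_flow_destroy(struct slave_flow *f)
{
	assert(f != NULL);
	live_flows--;
}

/*
 * Create the same flow on every slave. If any slave rejects it, destroy
 * the flows already created and report failure, so the bond never ends
 * up with a rule installed on only some of its slaves.
 */
static int bond_flow_create(struct slave_flow *flows[], int nb_slaves)
{
	int i;

	for (i = 0; i < nb_slaves; i++) {
		flows[i] = slave_flow_create(i);
		if (flows[i] == NULL) {
			/* Roll back slaves 0..i-1 using a separate index walk. */
			while (--i >= 0) {
				slave_flow_destroy(flows[i]);
				flows[i] = NULL;
			}
			return -1;
		}
	}
	return 0;
}
```

The rollback walks the per-slave handle array rather than reusing the outer iterator, and it destroys the per-slave handle (not the bond-level object), which keeps the cleanup independent of the loop that failed.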
Re: [dpdk-dev] [dpdk-stable] [PATCH] net/bonding: avoid wrong casting on primary_slave_port_id from input param
On 3/6/2018 11:51 AM, Ferruh Yigit wrote: > On 3/6/2018 9:37 AM, Gowrishankar wrote: >> From: Gowrishankar Muthukrishnan >> >> primary_slave_port_id is uint16_t which needs to be correctly stored >> with the same data type of input parameter in bond_ethdev_configure. >> >> Fixes: f8244c6399 ("ethdev: increase port id range") >> Cc: sta...@dpdk.org >> >> Signed-off-by: Gowrishankar Muthukrishnan > > Acked-by: Ferruh Yigit Applied to dpdk-next-net/master, thanks.
Re: [dpdk-dev] [PATCH] vhost: stop device before updating public vring data
Hi Tomasz,

On 03/05/2018 05:11 PM, Tomasz Kulasek wrote:
> For now DPDK assumes that callfd, kickfd and last_idx are being set just
> once during vring initialization and device cannot be running while DPDK
> receives SET_VRING_KICK, SET_VRING_CALL and SET_VRING_BASE messages.
> However, that assumption is wrong. For Vhost SCSI messages might arrive
> at any point of time, possibly multiple times, one after another.
>
> QEMU issues SET_VRING_CALL once during device initialization, then again
> during device start. The second message will close previous callfd,
> which is still being used by the user-implementation of vhost device.
> This results in writing to invalid (closed) callfd.
>
> Other messages like SET_FEATURES, SET_VRING_ADDR etc also will change
> internal state of VQ or device. To prevent race condition device should
> also be stopped before updating vring data.
>
> Signed-off-by: Dariusz Stojaczyk
> Signed-off-by: Pawel Wodkowski
> Signed-off-by: Tomasz Kulasek
> ---
>  lib/librte_vhost/vhost_user.c | 40
>  1 file changed, 40 insertions(+)

In the last release, we introduced a per-virtqueue lock to protect vring handling against asynchronous device changes.

I think that would solve the issue you are facing, but you would need to export the VQ locking functions to the vhost-user lib API to be able to use it.

I don't think your current patch is the right solution anyway, because it destroys the device in case we don't want it to remain alive, like set_log_base, or set_features when only the logging feature gets enabled.

Cheers,
Maxime
[dpdk-dev] [PATCH] app/testpmd: print Rx/Tx offload values
It is not clear which per-port offloads are enabled, so print the offload values at forwarding start.

The CRC strip offload value was printed in a more verbose manner. It is removed, since the Rx/Tx offload values cover it and printing only the CRC one can cause confusion.

Hexadecimal offload values are not very user friendly, but they are preferred so as not to create too much noise at forwarding start.

Signed-off-by: Ferruh Yigit
---
Cc: Shahaf Shuler
---
 app/test-pmd/config.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 4bb255c62..47845d0cb 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1682,10 +1682,9 @@ rxtx_config_display(void)
 		struct rte_eth_txconf *tx_conf = &ports[pid].tx_conf;

 		printf(" port %d:\n", (unsigned int)pid);
-		printf(" CRC stripping %s\n",
-				(ports[pid].dev_conf.rxmode.offloads &
-				 DEV_RX_OFFLOAD_CRC_STRIP) ?
-				"enabled" : "disabled");
+		printf(" Rx offloads=0x%"PRIx64" Tx Offloads=0x%"PRIx64"\n",
+				ports[pid].dev_conf.rxmode.offloads,
+				ports[pid].dev_conf.txmode.offloads);
 		printf(" RX queues=%d - RX desc=%d - RX free threshold=%d\n",
 				nb_rxq, nb_rxd, rx_conf->rx_free_thresh);
 		printf(" RX threshold registers: pthresh=%d hthresh=%d "
--
2.13.6
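The PRIx64 formatting used in the hunk above can be tried in isolation. This is a minimal sketch and not testpmd code: format_offloads() is a hypothetical helper that writes into a caller-supplied buffer instead of calling printf() directly, and the mask values in the usage note are made up.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Format two 64-bit offload bitmasks as hexadecimal. PRIx64 expands to
 * the printf length modifier that matches uint64_t on the target
 * platform, so the same format string is correct on 32-bit and 64-bit
 * builds. Returns the number of characters snprintf() would write.
 */
static int format_offloads(char *buf, size_t len,
			   uint64_t rx_offloads, uint64_t tx_offloads)
{
	assert(buf != NULL);
	return snprintf(buf, len,
			"Rx offloads=0x%" PRIx64 " Tx offloads=0x%" PRIx64,
			rx_offloads, tx_offloads);
}
```

Calling format_offloads(buf, sizeof(buf), 0x1000, 0x8000) with a 64-byte buffer fills it with one compact line holding both masks, which is the trade-off the commit message describes: less readable than per-flag text, but only one line per port.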
Re: [dpdk-dev] [PATCH v1] net/tap: allow user MAC to be passed as args
On 2/12/2018 2:44 PM, Vipin Varghese wrote:
> Allow TAP PMD to pass user desired MAC address as argument.
> The argument value is processed as string delimited by ':',
> is parsed and converted to HEX MAC address after validation.
>
> Signed-off-by: Vipin Varghese
> Signed-off-by: Pascal Mazon

<...>

> @@ -1589,7 +1630,7 @@ enum ioctl_mode {
>  	int speed;
>  	char tap_name[RTE_ETH_NAME_MAX_LEN];
>  	char remote_iface[RTE_ETH_NAME_MAX_LEN];
> -	int fixed_mac_type = 0;
> +	struct ether_addr user_mac;
>
>  	name = rte_vdev_device_name(dev);
>  	params = rte_vdev_device_args(dev);
> @@ -1626,7 +1667,7 @@ enum ioctl_mode {
>  			ret = rte_kvargs_process(kvlist,
>  						 ETH_TAP_MAC_ARG,
>  						 &set_mac_type,
> -						 &fixed_mac_type);
> +						 &user_mac);
>  			if (ret == -1)
>  				goto leave;
>  		}
> @@ -1637,7 +1678,7 @@ enum ioctl_mode {
>  	RTE_LOG(NOTICE, PMD, "Initializing pmd_tap for %s as %s\n",
>  		name, tap_name);
>
> -	ret = eth_dev_tap_create(dev, tap_name, remote_iface, fixed_mac_type);
> +	ret = eth_dev_tap_create(dev, tap_name, remote_iface, &user_mac);

"user_mac" has no initial value, which leads to an error when no "mac" argument is provided. It should be zeroed out.
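The point about zeroing the address can be illustrated outside the driver. This is a minimal sketch under stated assumptions: struct mac_addr is a hypothetical stand-in for the DPDK address type, and mac_from_arg() and mac_is_zero() are made-up helpers, not TAP PMD functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_ADDR_LEN 6

/* Hypothetical stand-in for DPDK's struct ether_addr. */
struct mac_addr {
	uint8_t addr_bytes[MAC_ADDR_LEN];
};

/* Return 1 if every byte of the address is zero, 0 otherwise. */
static int mac_is_zero(const struct mac_addr *mac)
{
	static const uint8_t zero[MAC_ADDR_LEN];

	assert(mac != NULL);
	return memcmp(mac->addr_bytes, zero, MAC_ADDR_LEN) == 0;
}

/*
 * Parse-or-default pattern: the address starts zeroed, so the case where
 * no "mac" argument was supplied (user_bytes == NULL) yields a
 * well-defined all-zero address instead of indeterminate stack contents.
 */
static struct mac_addr mac_from_arg(const uint8_t *user_bytes)
{
	struct mac_addr mac = { .addr_bytes = {0} };

	if (user_bytes != NULL)
		memcpy(mac.addr_bytes, user_bytes, MAC_ADDR_LEN);
	return mac;
}
```

With the initializer in place, later code can test for the all-zero address to decide whether a user MAC was given, which is not possible when the variable is left uninitialized.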
Re: [dpdk-dev] 16.11.5 (LTS) patches review and test
On Tue, 2018-03-06 at 11:07 +0530, gowrishankar muthukrishnan wrote: > On Monday 05 March 2018 03:42 PM, Luca Boccassi wrote: > > On Mon, 2018-03-05 at 11:31 +0530, gowrishankar muthukrishnan > > wrote: > > > Hi Luca, > > > In powerpc to support i40e, we wish below patch be merged: > > > > > > c3def6a8724 net/i40e: implement vector PMD for altivec > > > > > > I have verified br-16.11 with the above commit (in cherry-pick, I > > > needed > > > to remove release > > > notes which was meant for 17.05 release which hope is fine here). > > > Could you please merge the above. > > > > > > Thanks, > > > Gowrishankar > > > > Hi, > > > > This introduced a new PMD for that architecture, right? > > > > If so I can merge the patch, at the following conditions: > > > > 1) It will be disabled by default > > 2) Support and help in backporting will have to be provided by the > > authors for the remaining lifetime of 16.11 > > > > Is this OK for you? > > Yes, please go ahead. > > Thanks, > Gowrishankar Applied and pushed to dpdk-stable/16.11. > > > On Monday 26 February 2018 05:04 PM, Luca Boccassi wrote: > > > > Hi all, > > > > > > > > Here is a list of patches targeted for LTS release 16.11.5. > > > > Please > > > > help review and test. The planned date for the final release is > > > > March > > > > the 5th, pending results from regression tests. > > > > Before that, please shout if anyone has objections with these > > > > patches being applied. > > > > > > > > These patches are located at branch 16.11 of dpdk-stable repo: > > > > http://dpdk.org/browse/dpdk-stable/ > > > > > > > > Thanks. 
> > > > > > > > Luca Boccassi > > > > > > > > --- > > > > Ajit Khaparde (6): > > > > net/bnxt: support new PCI IDs > > > > net/bnxt: parse checksum offload flags > > > > net/bnxt: fix group info usage > > > > net/bnxt: fix broadcast cofiguration > > > > net/bnxt: fix size of Tx ring in HW > > > > net/bnxt: fix link speed setting with autoneg off > > > > > > > > Akhil Goyal (1): > > > > examples/ipsec-secgw: fix corner case for SPI value > > > > > > > > Alejandro Lucero (3): > > > > net/nfp: fix MTU settings > > > > net/nfp: fix jumbo settings > > > > net/nfp: fix CRC strip check behaviour > > > > > > > > Anatoly Burakov (14): > > > > memzone: fix leak on allocation error > > > > malloc: protect stats with lock > > > > malloc: fix end for bounded elements > > > > vfio: fix enabled check on error > > > > app/procinfo: add compilation option in config > > > > test: register test as failed if setup failed > > > > test/table: fix uninitialized parameter > > > > test/memzone: fix wrong test > > > > test/memzone: handle previously allocated memzones > > > > usertools/devbind: remove unused function > > > > test/reorder: fix memory leak > > > > test/ring_perf: fix memory leak > > > > test/table: fix memory leak > > > > test/timer_perf: fix memory leak > > > > > > > > Andriy Berestovskyy (1): > > > > keepalive: fix state alignment > > > > > > > > Bao-Long Tran (1): > > > > examples/ip_pipeline: fix timer period unit > > > > > > > > Beilei Xing (8): > > > > net/i40e: fix flow director Rx resource defect > > > > net/i40e: add warnings when writing global registers > > > > net/i40e: add debug logs when writing global registers > > > > net/i40e: fix multiple driver support issue > > > > net/i40e: fix interrupt conflict when using multi- > > > > driver > > > > net/i40e: fix Rx interrupt > > > > net/i40e: check multi-driver option parsing > > > > app/testpmd: fix flow director filter > > > > > > > > Chas Williams (1): > > > > net/bonding: fix setting slave MAC addresses > > > > > 
> > > David Harton (1): > > > > net/i40e: fix VF reset stats crash > > > > > > > > Didier Pallard (1): > > > > net/virtio: fix incorrect cast > > > > > > > > Dustin Lundquist (1): > > > > examples/exception_path: align stats on cache line > > > > > > > > Erez Ferber (1): > > > > net/mlx5: fix MTU update > > > > > > > > Ferruh Yigit (1): > > > > kni: fix build with kernel 4.15 > > > > > > > > Fiona Trahe (1): > > > > crypto/qat: fix null auth algo overwrite > > > > > > > > Gowrishankar Muthukrishnan (2): > > > > eal/ppc: remove the braces in memory barrier macros > > > > eal/ppc: support sPAPR IOMMU for vfio-pci > > > > > > > > Harish Patil (2): > > > > net/qede: fix to reject config with no Rx queue > > > > net/qede/base: fix VF LRO tunnel configuration > > > > > > > > Hemant Agrawal (4): > > > > pmdinfogen: fix cross compilation for ARM big endian > > > > lpm: fix ARM big endian build > > > > net/i40e: fix ARM big endian build > > > > net/ixgbe: fix A
Re: [dpdk-dev] [PATCH v1 1/2] net/octeontx: fix null pointer dereference
On 2/20/2018 5:14 PM, Santosh Shukla wrote:
> Fixes: f18b146c498d ("net/octeontx: create ethdev ports")
> Coverity issue: 195040
>
> Cc: sta...@dpdk.org
> Signed-off-by: Santosh Shukla

Series applied to dpdk-next-net/master, thanks.

BTW, what is the plan for switching to the new offloading API in the PMD? Support for the old API is planned to be removed in this release.
[dpdk-dev] [PATCH] eal: register rte_panic user callback
The use case addressed here is dpdk environment init aborting the process due to panic, preventing the calling process from running its own tear-down actions. A preferred, though ABI breaking solution would be to have the environment init always return a value rather than abort upon distress. This patch defines a couple of callback registration functions, one for panic and one for exit in case one wishes to distinguish between these events. Once a callback is set and panic takes place, it will be called prior to calling abort. Maiden voyage patch for Qwilt and myself. Signed-off-by: Arnon Warshavsky --- lib/librte_eal/bsdapp/eal/eal_debug.c | 37 ++ lib/librte_eal/common/include/rte_debug.h | 24 +++ lib/librte_eal/linuxapp/eal/eal_debug.c | 38 +++ lib/librte_eal/rte_eal_version.map| 2 ++ 4 files changed, 101 insertions(+) diff --git a/lib/librte_eal/bsdapp/eal/eal_debug.c b/lib/librte_eal/bsdapp/eal/eal_debug.c index 5d92500..010859d 100644 --- a/lib/librte_eal/bsdapp/eal/eal_debug.c +++ b/lib/librte_eal/bsdapp/eal/eal_debug.c @@ -18,6 +18,39 @@ #define BACKTRACE_SIZE 256 +/* + * user function pointers that when assigned, gets to be called + * during ret_exit() + */ +static rte_user_abort_callback_t *exit_user_callback; + +/* + * user function pointers that when assigned, gets to be called + * during ret_panic() + */ +static rte_user_abort_callback_t *panic_user_callback; + +/** + * Register user callback function to be called during rte_panic() + * Deregisteration is by passing NULL as the parameter + */ +void __rte_experimental +rte_panic_user_callback_register(rte_user_abort_callback_t *cb) +{ + panic_user_callback = cb; +} + +/** + * Register user callback function to be called during rte_exit() + * Deregisteration is by passing NULL as the parameter + */ +void __rte_experimental +rte_exit_user_callback_register(rte_user_abort_callback_t *cb) +{ + exit_user_callback = cb; +} + + /* dump the stack of the calling core */ void rte_dump_stack(void) { @@ -59,6 +92,8 
@@ void __rte_panic(const char *funcname, const char *format, ...) va_end(ap); rte_dump_stack(); rte_dump_registers(); + if (panic_user_callback) + (*panic_user_callback)(); abort(); } @@ -78,6 +113,8 @@ rte_exit(int exit_code, const char *format, ...) va_start(ap, format); rte_vlog(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, format, ap); va_end(ap); + if (exit_user_callback) + (*exit_user_callback)(); #ifndef RTE_EAL_ALWAYS_PANIC_ON_ERROR if (rte_eal_cleanup() != 0) diff --git a/lib/librte_eal/common/include/rte_debug.h b/lib/librte_eal/common/include/rte_debug.h index 272df49..7e3d0a2 100644 --- a/lib/librte_eal/common/include/rte_debug.h +++ b/lib/librte_eal/common/include/rte_debug.h @@ -16,11 +16,35 @@ #include "rte_log.h" #include "rte_branch_prediction.h" +#include #ifdef __cplusplus extern "C" { #endif + +/* + * Definition of user function pointer type to be called during + * the execution of rte_panic + */ + +typedef void (*rte_user_abort_callback_t)(void); +/**< @internal Ethernet device configuration. */ + +/** + * Register user callback function to be called during rte_panic() + * Deregisteration is by passing NULL as the parameter + */ +void __rte_experimental +rte_panic_user_callback_register(rte_user_abort_callback_t *cb); + +/** + * Register user callback function to be called during rte_exit() + * Deregisteration is by passing NULL as the parameter + */ +void __rte_experimental +rte_exit_user_callback_register(rte_user_abort_callback_t *cb); + /** * Dump the stack of the calling core to the console. 
*/ diff --git a/lib/librte_eal/linuxapp/eal/eal_debug.c b/lib/librte_eal/linuxapp/eal/eal_debug.c index 5d92500..b1748b8 100644 --- a/lib/librte_eal/linuxapp/eal/eal_debug.c +++ b/lib/librte_eal/linuxapp/eal/eal_debug.c @@ -16,8 +16,42 @@ #include #include + #define BACKTRACE_SIZE 256 +/* + * user function pointers that when assigned, gets to be called + * during ret_exit() + */ +static rte_user_abort_callback_t *exit_user_callback; + +/* + * user function pointers that when assigned, gets to be called + * during ret_panic() + */ +static rte_user_abort_callback_t *panic_user_callback; + +/** + * Register user callback function to be called during rte_panic() + * Deregisteration is by passing NULL as the parameter + */ +void __rte_experimental +rte_panic_user_callback_register(rte_user_abort_callback_t *cb) +{ + panic_user_callback = cb; +} + +/** + * Register user callback function to be called during rte_exit() + * Deregisteration is by passing NULL as the parameter + */ +void __rte_experimental +rte_exit_user_callback_register(rte_user_abort_callback_t *cb) +{ + exit_user_callback = cb; +} + + /* dump the stack of the calling core */ void rte_dump_stack(void) { @@ -59,6 +93,8 @
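The register-then-call-before-abort idea from the commit message can be sketched independently of EAL. This is a minimal sketch and not the patch's code: abort_callback_t, panic_callback_register(), panic_run_callback() and app_teardown() are hypothetical names, and the sketch stores the function pointer directly, which is only one possible way to lay out the types.

```c
#include <stddef.h>

/* Callback type: a plain pointer to a function taking no arguments. */
typedef void (*abort_callback_t)(void);

/* Currently registered callback; NULL means none is registered. */
static abort_callback_t panic_callback;

/* Register a callback; passing NULL deregisters it. */
static void panic_callback_register(abort_callback_t cb)
{
	panic_callback = cb;
}

/*
 * What a panic path would do just before aborting: invoke the user
 * callback if one is set. Returns 1 if a callback ran, 0 otherwise.
 * A real panic handler would go on to call abort() after this.
 */
static int panic_run_callback(void)
{
	if (panic_callback == NULL)
		return 0;
	panic_callback();
	return 1;
}

/* Example user tear-down hook that just counts its invocations. */
static int teardown_calls;

static void app_teardown(void)
{
	teardown_calls++;
}
```

An application would call panic_callback_register(app_teardown) once at start-up. Since the hook runs on a failure path, it is safest to keep it short and free of calls that can themselves panic.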
Re: [dpdk-dev] [PATCH 0/5] remove void pointer explicit cast
On 2/26/2018 8:10 AM, Zhiyong Yang wrote: > The patch series cleanup void pointer explicit cast related to > struct rte_flow_item fields in librte_flow_classify and make > code more readable. > > Zhiyong Yang (5): > flow_classify: remove void pointer cast > net/ixgbe: remove void pointer cast > net/e1000: remove void pointer cast > net/bnxt: remove void pointer cast > net/sfc: remove void pointer cast Series applied to dpdk-next-net/master, thanks.
[dpdk-dev] [PATCH] net/bnxt: switch to the new offload API
Update bnxt PMD to new ethdev offloads API. Signed-off-by: Ajit Khaparde --- drivers/net/bnxt/bnxt_ethdev.c | 59 +- 1 file changed, 41 insertions(+), 18 deletions(-) diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c index 21c46f833..cca4ef40c 100644 --- a/drivers/net/bnxt/bnxt_ethdev.c +++ b/drivers/net/bnxt/bnxt_ethdev.c @@ -146,6 +146,27 @@ static const struct rte_pci_id bnxt_pci_id_map[] = { ETH_RSS_NONFRAG_IPV6_TCP | \ ETH_RSS_NONFRAG_IPV6_UDP) +#define BNXT_DEV_TX_OFFLOAD_SUPPORT (DEV_TX_OFFLOAD_VLAN_INSERT | \ +DEV_TX_OFFLOAD_IPV4_CKSUM | \ +DEV_TX_OFFLOAD_TCP_CKSUM | \ +DEV_TX_OFFLOAD_UDP_CKSUM | \ +DEV_TX_OFFLOAD_TCP_TSO | \ +DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM | \ +DEV_TX_OFFLOAD_VXLAN_TNL_TSO | \ +DEV_TX_OFFLOAD_GRE_TNL_TSO | \ +DEV_TX_OFFLOAD_IPIP_TNL_TSO | \ +DEV_TX_OFFLOAD_GENEVE_TNL_TSO | \ +DEV_TX_OFFLOAD_MULTI_SEGS) + +#define BNXT_DEV_RX_OFFLOAD_SUPPORT (DEV_RX_OFFLOAD_VLAN_FILTER | \ +DEV_RX_OFFLOAD_VLAN_STRIP | \ +DEV_RX_OFFLOAD_IPV4_CKSUM | \ +DEV_RX_OFFLOAD_UDP_CKSUM | \ +DEV_RX_OFFLOAD_TCP_CKSUM | \ +DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM | \ +DEV_RX_OFFLOAD_JUMBO_FRAME | \ +DEV_RX_OFFLOAD_CRC_STRIP) + static int bnxt_vlan_offload_set_op(struct rte_eth_dev *dev, int mask); static void bnxt_print_link_info(struct rte_eth_dev *eth_dev); @@ -430,21 +451,14 @@ static void bnxt_dev_info_get_op(struct rte_eth_dev *eth_dev, dev_info->min_rx_bufsize = 1; dev_info->max_rx_pktlen = BNXT_MAX_MTU + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_SIZE; - dev_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP | - DEV_RX_OFFLOAD_IPV4_CKSUM | - DEV_RX_OFFLOAD_UDP_CKSUM | - DEV_RX_OFFLOAD_TCP_CKSUM | - DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM; - dev_info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT | - DEV_TX_OFFLOAD_IPV4_CKSUM | - DEV_TX_OFFLOAD_TCP_CKSUM | - DEV_TX_OFFLOAD_UDP_CKSUM | - DEV_TX_OFFLOAD_TCP_TSO | - DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM | - DEV_TX_OFFLOAD_VXLAN_TNL_TSO | - DEV_TX_OFFLOAD_GRE_TNL_TSO | - DEV_TX_OFFLOAD_IPIP_TNL_TSO | - 
DEV_TX_OFFLOAD_GENEVE_TNL_TSO; + + dev_info->rx_queue_offload_capa = BNXT_DEV_RX_OFFLOAD_SUPPORT; + dev_info->rx_offload_capa = BNXT_DEV_RX_OFFLOAD_SUPPORT; + if (bp->flags & BNXT_FLAG_PTP_SUPPORTED) + dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_TIMESTAMP; + dev_info->tx_queue_offload_capa = BNXT_DEV_TX_OFFLOAD_SUPPORT; + dev_info->tx_offload_capa = BNXT_DEV_TX_OFFLOAD_SUPPORT; + dev_info->flow_type_rss_offloads = BNXT_ETH_RSS_SUPPORT; /* *INDENT-OFF* */ dev_info->default_rxconf = (struct rte_eth_rxconf) { @@ -454,7 +468,8 @@ static void bnxt_dev_info_get_op(struct rte_eth_dev *eth_dev, .wthresh = 0, }, .rx_free_thresh = 32, - .rx_drop_en = 0, + /* If no descriptors available, pkts are dropped by default */ + .rx_drop_en = 1, }; dev_info->default_txconf = (struct rte_eth_txconf) { @@ -465,8 +480,6 @@ static void bnxt_dev_info_get_op(struct rte_eth_dev *eth_dev, }, .tx_free_thresh = 32, .tx_rs_thresh = 32, - .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | -ETH_TXQ_FLAGS_NOOFFLOADS, }; eth_dev->data->dev_conf.intr_conf.lsc = 1; @@ -510,6 +523,16 @@ static void bnxt_dev_info_get_op(struct rte_eth_dev *eth_dev, static int bnxt_dev_configure_op(struct rte_eth_dev *eth_dev) { struct bnxt *bp = (struct bnxt *)eth_dev->data->dev_private; + uint64_t tx_offloads = eth_dev->data->dev_conf.txmode.offloads; + uint64_t rx_offloads = eth_dev->data->dev_conf.rxmode.offloads; + + if (tx_offloads != BNXT_DEV_TX_OFFLOAD_SUPPORT) + PMD_DRV_LO
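The configure hunk is cut off in this archive, so the following is not the patch's logic. It is a minimal sketch of one common way to check a requested offload mask against a capability mask: the request is acceptable when it has no bit outside the supported set. The OFFLOAD_* bit values and unsupported_offloads() are hypothetical and are not DPDK definitions.

```c
#include <stdint.h>

/* Hypothetical offload bits for this sketch (not DPDK's values). */
#define OFFLOAD_VLAN_STRIP (UINT64_C(1) << 0)
#define OFFLOAD_IPV4_CKSUM (UINT64_C(1) << 1)
#define OFFLOAD_TCP_CKSUM  (UINT64_C(1) << 2)
#define OFFLOAD_TIMESTAMP  (UINT64_C(1) << 3)

/* Capability mask of the imaginary device. */
#define OFFLOAD_SUPPORTED \
	(OFFLOAD_VLAN_STRIP | OFFLOAD_IPV4_CKSUM | OFFLOAD_TCP_CKSUM)

/*
 * Return the requested bits that the device does not support.
 * A result of 0 means the request is a subset of the capabilities.
 */
static uint64_t unsupported_offloads(uint64_t requested, uint64_t supported)
{
	return requested & ~supported;
}
```

Returning the offending bits rather than a plain yes/no lets the caller log exactly which offloads were rejected, in the same hexadecimal form used elsewhere on the list.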
Re: [dpdk-dev] [PATCH] compressdev: implement API
On 3/5/2018 9:32 AM, Verma, Shally wrote: > >> -Original Message- >> From: Ahmed Mansour [mailto:ahmed.mans...@nxp.com] >> Sent: 03 March 2018 01:19 >> To: Trahe, Fiona ; Verma, Shally >> ; dev@dpdk.org >> Cc: De Lara Guarch, Pablo ; Athreya, >> Narayana Prasad ; >> Gupta, Ashish ; Sahu, Sunila >> ; Challa, Mahipal >> ; Jain, Deepak K ; >> Hemant Agrawal ; Roy >> Pledge ; Youri Querry >> Subject: Re: [dpdk-dev] [PATCH] compressdev: implement API >> >> On 3/2/2018 4:53 AM, Trahe, Fiona wrote: On 3/1/2018 9:41 AM, Trahe, Fiona wrote: > Hi Shally > > //snip// >> [Shally] This looks better to me. So it mean app would always call >> xform_init() for stateless and attach an >> updated priv_xform to ops (depending upon if there's shareable or not). >> So it does not need to have >> NULL pointer on priv_xform. right? >> > [Fiona] yes. The PMD must return a valid priv_xform pointer. [Ahmed] What I understood is that the xform_init will be called once initially. if the @flag returned is NONE_SHAREABLE then the application must not attach two inflight ops to the same @priv_xform? Otherwise the application can attach many ops in flight to the @priv_xform? >>> [Fiona Yes. App calls the xform_init() once on a device where it plans to >>> send stateless ops. >>> If PMD returns shareable, then it doesn't need to call again and can attach >>> this to every stateless op going to that device. >>> If PMD returns SINGLE_OP then it must call xform_init() before every other >>> stateless op it wants to have inflight simultaneously. This does not mean >>> it must be called before every op, >>> but probably will set up a batch of priv_xforms - it can reuse each >>> priv_xform once the op finishes with it. >> [Ahmed] @Shally Can this complexity of managing the NONE_SHAREABLE mode >> be pushed into the PMD? 
A flexible stockpile can be kept and maintained >> by the PMD and it can be increased or decreased based on >> low-water/high-water thresholds > [Shally] It is doable to manage within PMD but need to do hands on to > evaluate effectiveness. So far, we have never exercised this way and left it > to application to attach different session (or stream) to op for maximum > performance gain. So, I would say, may it be ok to have flag feature in first > place and deprecate later, if it not required?! Or just have API without any > flag option and add a feature flag to indicate PMD support for > SHAREABLE/NON-SHAREABLE xform_priv handle?! [Ahmed] Either way looks ok to me. I see your point about performance. If this is in the PMD it will have to constantly guess how much memory the user needs and accommodate dynamically. The user can implement a similar scheme or if the application is simple they can pre-allocate and reduce CPU allocation de-allocation overhead.
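The "batch of priv_xforms, reuse each one once its op finishes" scheme discussed above, whether kept in the application or in the PMD, can be sketched as a small fixed-size pool. This is only an illustration under stated assumptions: struct priv_xform, struct xform_pool, xform_pool_get(), xform_pool_put() and POOL_SIZE are hypothetical and do not correspond to the compressdev API.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4 /* hypothetical batch size */

/* Hypothetical stand-in for a non-shareable private xform handle. */
struct priv_xform {
	int in_use;
};

/* A pre-allocated batch of handles, set up once. */
struct xform_pool {
	struct priv_xform slots[POOL_SIZE];
};

/* Take a free handle for a new in-flight op; NULL if all are busy. */
static struct priv_xform *xform_pool_get(struct xform_pool *pool)
{
	size_t i;

	assert(pool != NULL);
	for (i = 0; i < POOL_SIZE; i++) {
		if (!pool->slots[i].in_use) {
			pool->slots[i].in_use = 1;
			return &pool->slots[i];
		}
	}
	return NULL;
}

/* Return a handle once its op has completed, making it reusable. */
static void xform_pool_put(struct priv_xform *x)
{
	assert(x != NULL && x->in_use);
	x->in_use = 0;
}
```

A fixed pool avoids allocation on the data path, at the cost of capping the number of simultaneous in-flight stateless ops. A low-water/high-water variant, as suggested in the thread, would grow or shrink the array instead of returning NULL.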
[dpdk-dev] [PATCH v3] net/null:Different mac address support
After attaching two null devices to OVS, both show the MAC address "00.00.00.00.00.00". Fix this by giving each null device a different MAC address.

Signed-off-by: Mallesh Koujalagi
---
 drivers/net/null/rte_eth_null.c | 8 +++
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
index 9385ffd..42e3a77 100644
--- a/drivers/net/null/rte_eth_null.c
+++ b/drivers/net/null/rte_eth_null.c
@@ -73,6 +73,7 @@ struct pmd_internals {
 	struct null_queue rx_null_queues[RTE_MAX_QUEUES_PER_PORT];
 	struct null_queue tx_null_queues[RTE_MAX_QUEUES_PER_PORT];

+	struct ether_addr eth_addr;
 	/** Bit mask of RSS offloads, the bit offset also means flow type */
 	uint64_t flow_type_rss_offloads;

@@ -84,9 +85,6 @@ struct pmd_internals {
 	uint8_t rss_key[40]; /**< 40-byte hash key. */
 };

-
-
-static struct ether_addr eth_addr = { .addr_bytes = {0} };
 static struct rte_eth_link pmd_link = {
 	.link_speed = ETH_SPEED_NUM_10G,
 	.link_duplex = ETH_LINK_FULL_DUPLEX,
@@ -519,7 +517,6 @@ eth_dev_null_create(struct rte_vdev_device *dev,
 		rte_free(data);
 		return -ENOMEM;
 	}
-
 	/* now put it all together
 	 * - store queue data in internals,
 	 * - store numa_node info in ethdev data
@@ -533,6 +530,7 @@ eth_dev_null_create(struct rte_vdev_device *dev,
 	internals->packet_size = packet_size;
 	internals->packet_copy = packet_copy;
 	internals->port_id = eth_dev->data->port_id;
+	eth_random_addr(internals->eth_addr.addr_bytes);

 	internals->flow_type_rss_offloads = ETH_RSS_PROTO_MASK;
 	internals->reta_size = RTE_DIM(internals->reta_conf) * RTE_RETA_GROUP_SIZE;
@@ -543,7 +541,7 @@ eth_dev_null_create(struct rte_vdev_device *dev,
 	data->nb_rx_queues = (uint16_t)nb_rx_queues;
 	data->nb_tx_queues = (uint16_t)nb_tx_queues;
 	data->dev_link = pmd_link;
-	data->mac_addrs = &eth_addr;
+	data->mac_addrs = &internals->eth_addr;

 	eth_dev->data = data;
 	eth_dev->dev_ops = &ops;
--
2.7.4
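What giving each device its own random MAC address involves can be sketched without DPDK. This is a stand-alone illustration and not the library's implementation: random_unicast_mac() is a hypothetical helper built on rand(), and it follows the usual Ethernet convention of clearing the group (multicast) bit and setting the locally administered bit in the first byte.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAC_ADDR_LEN    6
#define MAC_GROUP_BIT   0x01 /* bit 0 of first byte: multicast/group */
#define MAC_LOCAL_ADMIN 0x02 /* bit 1 of first byte: locally administered */

/*
 * Fill addr with a random, locally administered unicast MAC address.
 * rand() is used only to keep the sketch portable; it is not a strong
 * random source and does not guarantee that two addresses differ.
 */
static void random_unicast_mac(uint8_t addr[MAC_ADDR_LEN])
{
	int i;

	assert(addr != NULL);
	for (i = 0; i < MAC_ADDR_LEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);
	addr[0] &= (uint8_t)~MAC_GROUP_BIT; /* force unicast */
	addr[0] |= MAC_LOCAL_ADMIN;         /* mark as locally administered */
}
```

Storing the generated address in per-device private data, as the patch does with internals->eth_addr, is what prevents two devices from pointing at the same shared static address.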
[dpdk-dev] Fwd: PMD for Broadcom/Emulex OCe14000 OCP Skyhawk-R
Hi all,

Is a PMD for the Broadcom/Emulex OCe14000 OCP Skyhawk-R available?

There are a few documents on Broadcom's site, but I could not find the source code for it. I believe the 6Wind team developed the PMD for Broadcom, but what is its status? Is it freely available?

I tried to get some help from the users alias, but could not. Could someone please help me with info on this?

Thanks,
-Sujith
Re: [dpdk-dev] [RFC 4/4] drivers/raw/ifpga_rawdev: Rawdev for Intel FPGA Device, it's a PCI Driver of FPGA Device Manager
-----Original Message-----
From: Shreyansh Jain [mailto:shreyansh.j...@nxp.com]
Sent: Tuesday, March 6, 2018 2:48 PM
To: Xu, Rosen
Cc: dev@dpdk.org; Doherty, Declan; Zhang, Tianfei
Subject: Re: [dpdk-dev] [RFC 4/4] drivers/raw/ifpga_rawdev: Rawdev for Intel FPGA Device, it's a PCI Driver of FPGA Device Manager

On Tue, Mar 6, 2018 at 7:13 AM, Rosen Xu wrote:
> Signed-off-by: Rosen Xu
> ---
>  drivers/raw/ifpga_rawdev/Makefile | 59
>  drivers/raw/ifpga_rawdev/ifpga_rawdev.c | 343 +
>  drivers/raw/ifpga_rawdev/ifpga_rawdev.h | 109 +++
>  drivers/raw/ifpga_rawdev/ifpga_rawdev_example.c | 121

When the rawdev skeleton driver was integrated, Thomas raised the point of naming it 'skeleton_rawdev' rather than just 'skeleton'. So, 'ifpga_rawdev' rather than 'ifpga'. At that time I thought we could use it as a model. But, frankly, it seems a bad choice to me now. The extra '_rawdev' doesn't serve any purpose here.

So, feel free to change your naming to a more appropriate "drivers/raw/ifpga/" or "drivers/raw/ifpga_sample" etc. I can probably change skeleton_rawdev to skeleton too.

>  .../ifpga_rawdev/rte_pmd_ifpga_rawdev_version.map | 4 +
>  5 files changed, 636 insertions(+)
>  create mode 100644 drivers/raw/ifpga_rawdev/Makefile
>  create mode 100644 drivers/raw/ifpga_rawdev/ifpga_rawdev.c
>  create mode 100644 drivers/raw/ifpga_rawdev/ifpga_rawdev.h
>  create mode 100644 drivers/raw/ifpga_rawdev/ifpga_rawdev_example.c
>  create mode 100644 drivers/raw/ifpga_rawdev/rte_pmd_ifpga_rawdev_version.map
>
> diff --git a/drivers/raw/ifpga_rawdev/Makefile
> b/drivers/raw/ifpga_rawdev/Makefile
> new file mode 100644
> index 000..3166fe2
> --- /dev/null
> +++ b/drivers/raw/ifpga_rawdev/Makefile
> @@ -0,0 +1,59 @@
> +# BSD LICENSE
> +#
> +# Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
> +# All rights reserved.
> +# > +# Redistribution and use in source and binary forms, with or without > +# modification, are permitted provided that the following conditions > +# are met: > +# > +# * Redistributions of source code must retain the above copyright > +# notice, this list of conditions and the following disclaimer. > +# * Redistributions in binary form must reproduce the above copyright > +# notice, this list of conditions and the following disclaimer in > +# the documentation and/or other materials provided with the > +# distribution. > +# * Neither the name of Intel Corporation nor the names of its > +# contributors may be used to endorse or promote products derived > +# from this software without specific prior written permission. > +# > +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + SPDX identifier in place of BSD boiler-plate. 
> +include $(RTE_SDK)/mk/rte.vars.mk > + > +# > +# library name > +# > +LIB = librte_pmd_ifpga_rawdev.a > + > +CFLAGS += -DALLOW_EXPERIMENTAL_API > +CFLAGS += -O3 > +CFLAGS += $(WERROR_FLAGS) > +CFLAGS += -I$(RTE_SDK)/drivers/bus/ifpga CFLAGS += > +-I$(RTE_SDK)/drivers/raw/ifpga_rawdev > +LDLIBS += -lrte_eal > +LDLIBS += -lrte_rawdev > +LDLIBS += -lrte_bus_vdev > +LDLIBS += -lrte_kvargs > + > +EXPORT_MAP := rte_pmd_ifpga_rawdev_version.map > + > +LIBABIVER := 1 > + > +# > +# all source are stored in SRCS-y > +# > +SRCS-$(CONFIG_RTE_LIBRTE_PMD_SKELETON_RAWDEV) += ifpga_rawdev.c > +SRCS-$(CONFIG_RTE_LIBRTE_PMD_SKELETON_RAWDEV) += > +ifpga_rawdev_example.c This is a copy-paste issue - CONFIG_RTE_LIBRTE_PMD_SKELETON_RAWDEV > + > +include $(RTE_SDK)/mk/rte.lib.mk > diff --git a/drivers/raw/ifpga_rawdev/ifpga_rawdev.c > b/drivers/raw/ifpga_rawdev/ifpga_rawdev.c > new file mode 100644 > index 000..6046711 > --- /dev/null > +++ b/drivers/raw/ifpga_rawdev/ifpga_rawdev.c > @@ -0,0 +1,343 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright 2016 NXP. :) - should be Intel. Even better - SPDX > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright
Re: [dpdk-dev] [PATCH 3/4] drivers/net: do not allocate rte_eth_dev_data privately
Hi Jianfeng

From: Tan, Jianfeng, Sent: Tuesday, March 6, 2018 10:56 AM
> > -----Original Message-----
> > From: Matan Azrad [mailto:ma...@mellanox.com]
> > Sent: Tuesday, March 6, 2018 2:08 PM
> > To: Tan, Jianfeng; Yigit, Ferruh
> > Cc: Richardson, Bruce; Ananyev, Konstantin; Thomas Monjalon;
> > maxime.coque...@redhat.com; Burakov, Anatoly; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH 3/4] drivers/net: do not allocate
> > rte_eth_dev_data privately
> >
> > Hi Jianfeng
> >
> > Please see a comment below.
> >
> > > From: Jianfeng Tan, Sent: Sunday, March 4, 2018 5:30 PM
> > > We introduced private rte_eth_dev_data to allow vdev to be created both
> > > in primary process and secondary process(es). This is not friendly to
> > > multi-process model, for example, it leads to port id contention
> > > issue if two processes both find the data entry is free.
> > >
> > > And to get stats of primary vdev in secondary, we must allocate from
> > > the pre-defined array so that we can find it.
> > >
> > > Suggested-by: Bruce Richardson
> > > Signed-off-by: Jianfeng Tan
> > > ---
> > > drivers/net/af_packet/rte_eth_af_packet.c | 25 +++--
> > > drivers/net/kni/rte_eth_kni.c | 13 ++---
> > > drivers/net/null/rte_eth_null.c | 17 +++--
> > > drivers/net/octeontx/octeontx_ethdev.c| 14 ++
> > > drivers/net/pcap/rte_eth_pcap.c | 18 +++---
> > > drivers/net/tap/rte_eth_tap.c | 9 +
> > > drivers/net/vhost/rte_eth_vhost.c | 17 ++---
> > > 7 files changed, 20 insertions(+), 93 deletions(-)
> > >
> > > diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
> > > index 57eccfd..2db692f 100644
> > > --- a/drivers/net/af_packet/rte_eth_af_packet.c
> > > +++ b/drivers/net/af_packet/rte_eth_af_packet.c
> > > @@ -564,25 +564,17 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
> > >  		RTE_LOG(ERR, PMD,
> > >  			"%s: no interface specified for AF_PACKET ethdev\n",
> > >  			name);
> > > -		goto error_early;
> > > +		return -1;
> > >  	}
> > >
> > >  	RTE_LOG(INFO, PMD,
> > >  		"%s: creating AF_PACKET-backed ethdev on numa socket %u\n",
> > >  		name, numa_node);
> > >
> > > -	/*
> > > -	 * now do all data allocation - for eth_dev structure, dummy pci driver
> > > -	 * and internal (private) data
> > > -	 */
> > > -	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
> > > -	if (data == NULL)
> > > -		goto error_early;
> > > -
> > >  	*internals = rte_zmalloc_socket(name, sizeof(**internals),
> > >  					0, numa_node);
> > >  	if (*internals == NULL)
> > > -		goto error_early;
> > > +		return -1;
> > >
> > >  	for (q = 0; q < nb_queues; q++) {
> > >  		(*internals)->rx_queue[q].map = MAP_FAILED;
> > > @@ -604,24 +596,24 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
> > >  		RTE_LOG(ERR, PMD,
> > >  			"%s: I/F name too long (%s)\n",
> > >  			name, pair->value);
> > > -		goto error_early;
> > > +		return -1;
> > >  	}
> > >  	if (ioctl(sockfd, SIOCGIFINDEX, &ifr) == -1) {
> > >  		RTE_LOG(ERR, PMD,
> > >  			"%s: ioctl failed (SIOCGIFINDEX)\n",
> > >  			name);
> > > -		goto error_early;
> > > +		return -1;
> > >  	}
> > >  	(*internals)->if_name = strdup(pair->value);
> > >  	if ((*internals)->if_name == NULL)
> > > -		goto error_early;
> > > +		return -1;
> > >  	(*internals)->if_index = ifr.ifr_ifindex;
> > >
> > >  	if (ioctl(sockfd, SIOCGIFHWADDR, &ifr) == -1) {
> > >  		RTE_LOG(ERR, PMD,
> > >  			"%s: ioctl failed (SIOCGIFHWADDR)\n",
> > >  			name);
> > > -		goto error_early;
> > > +		return -1;
> > >  	}
> > >  	memcpy(&(*internals)->eth_addr, ifr.ifr_hwaddr.sa_data, ETH_ALEN);
> > >
> > > @@ -775,14 +767,13 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
> > >
> > >  	(*internals)->nb_queues = nb_queues;
> > >
> > > -	rte_memcpy(data, (*eth_dev)->data, sizeof(*data));
> > > +	data = (*eth_dev)->data;
> > >  	data->dev_private = *internals;
> > >  	data->nb_rx_queues = (uint16_t)nb_queues;
> > >  	data->nb_tx_queues = (uint16_t)nb_queues;
> > >  	data->dev_link = pmd_link;
> > >  	data->mac_addrs = &(*internals)->eth_addr;
> > >
> > > -	(*eth_dev)->data = data;
> > >  	(*eth_dev)->dev_ops = &ops;
> > >
> > >  	return 0;
> > > @@ -802,8 +793,6 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
> > >  	}
> > >  	free((*internals)->if_name);
> > >  	rte_free(*internals);
> > > -error_early:
> > > -	rte_free(data);
> > >  	return -1;
> > >  }
> > >
> > >
> > I think you should remove the private rte_eth_dev_data freeing in
> > rte_pmd_af_packet_remove().
> > This is relevant to all the vde
Re: [dpdk-dev] [PATCH 3/4] drivers/net: do not allocate rte_eth_dev_data privately
Hi Jianfeng

From: Matan Azrad, Wednesday, March 7, 2018 8:01 AM
> Hi Jianfeng
>
> From: Tan, Jianfeng, Sent: Tuesday, March 6, 2018 10:56 AM
> > > -----Original Message-----
> > > From: Matan Azrad [mailto:ma...@mellanox.com]
> > > Sent: Tuesday, March 6, 2018 2:08 PM
> > > To: Tan, Jianfeng; Yigit, Ferruh
> > > Cc: Richardson, Bruce; Ananyev, Konstantin; Thomas Monjalon;
> > > maxime.coque...@redhat.com; Burakov, Anatoly; dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH 3/4] drivers/net: do not allocate
> > > rte_eth_dev_data privately
> > >
> > > Hi Jianfeng
> > >
> > > Please see a comment below.
> > >
> > > > From: Jianfeng Tan, Sent: Sunday, March 4, 2018 5:30 PM
> > > > We introduced private rte_eth_dev_data to allow vdev to be created
> > > > both in primary process and secondary process(es). This is not
> > > > friendly to multi-process model, for example, it leads to port id
> > > > contention issue if two processes both find the data entry is free.
> > > >
> > > > And to get stats of primary vdev in secondary, we must allocate
> > > > from the pre-defined array so that we can find it.
> > > >
> > > > Suggested-by: Bruce Richardson
> > > > Signed-off-by: Jianfeng Tan
> > > > ---
> > > > drivers/net/af_packet/rte_eth_af_packet.c | 25 +++--
> > > > drivers/net/kni/rte_eth_kni.c | 13 ++---
> > > > drivers/net/null/rte_eth_null.c | 17 +++--
> > > > drivers/net/octeontx/octeontx_ethdev.c| 14 ++
> > > > drivers/net/pcap/rte_eth_pcap.c | 18 +++---
> > > > drivers/net/tap/rte_eth_tap.c | 9 +
> > > > drivers/net/vhost/rte_eth_vhost.c | 17 ++---
> > > > 7 files changed, 20 insertions(+), 93 deletions(-)
> > > >
> > > > diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
> > > > index 57eccfd..2db692f 100644
> > > > --- a/drivers/net/af_packet/rte_eth_af_packet.c
> > > > +++ b/drivers/net/af_packet/rte_eth_af_packet.c
> > > > @@ -564,25 +564,17 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
> > > >  		RTE_LOG(ERR, PMD,
> > > >  			"%s: no interface specified for AF_PACKET ethdev\n",
> > > >  			name);
> > > > -		goto error_early;
> > > > +		return -1;
> > > >  	}
> > > >
> > > >  	RTE_LOG(INFO, PMD,
> > > >  		"%s: creating AF_PACKET-backed ethdev on numa socket %u\n",
> > > >  		name, numa_node);
> > > >
> > > > -	/*
> > > > -	 * now do all data allocation - for eth_dev structure, dummy pci driver
> > > > -	 * and internal (private) data
> > > > -	 */
> > > > -	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
> > > > -	if (data == NULL)
> > > > -		goto error_early;
> > > > -
> > > >  	*internals = rte_zmalloc_socket(name, sizeof(**internals),
> > > >  					0, numa_node);
> > > >  	if (*internals == NULL)
> > > > -		goto error_early;
> > > > +		return -1;
> > > >
> > > >  	for (q = 0; q < nb_queues; q++) {
> > > >  		(*internals)->rx_queue[q].map = MAP_FAILED;
> > > > @@ -604,24 +596,24 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
> > > >  		RTE_LOG(ERR, PMD,
> > > >  			"%s: I/F name too long (%s)\n",
> > > >  			name, pair->value);
> > > > -		goto error_early;
> > > > +		return -1;
> > > >  	}
> > > >  	if (ioctl(sockfd, SIOCGIFINDEX, &ifr) == -1) {
> > > >  		RTE_LOG(ERR, PMD,
> > > >  			"%s: ioctl failed (SIOCGIFINDEX)\n",
> > > >  			name);
> > > > -		goto error_early;
> > > > +		return -1;
> > > >  	}
> > > >  	(*internals)->if_name = strdup(pair->value);
> > > >  	if ((*internals)->if_name == NULL)
> > > > -		goto error_early;
> > > > +		return -1;
> > > >  	(*internals)->if_index = ifr.ifr_ifindex;
> > > >
> > > >  	if (ioctl(sockfd, SIOCGIFHWADDR, &ifr) == -1) {
> > > >  		RTE_LOG(ERR, PMD,
> > > >  			"%s: ioctl failed (SIOCGIFHWADDR)\n",
> > > >  			name);
> > > > -		goto error_early;
> > > > +		return -1;
> > > >  	}
> > > >  	memcpy(&(*internals)->eth_addr, ifr.ifr_hwaddr.sa_data, ETH_ALEN);
> > > >
> > > > @@ -775,14 +767,13 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
> > > >
> > > >  	(*internals)->nb_queues = nb_queues;
> > > >
> > > > -	rte_memcpy(data, (*eth_dev)->data, sizeof(*data));
> > > > +	data = (*eth_dev)->data;
> > > >  	data->dev_private = *internals;
> > > >  	data->nb_rx_queues = (uint