[dpdk-dev] [Question] How pmd virtio works without UIO?
On Tue, Dec 22, 2015 at 04:38:30PM +0000, Xie, Huawei wrote:
> On 12/22/2015 7:39 PM, Peter Xu wrote:
> > I tried to unbind one of the virtio net devices, and I see the PCI entry
> > still there.
> >
> > Before unbind:
> >
> > [root@vm proc]# lspci -k -s 00:03.0
> > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
> >         Subsystem: Red Hat, Inc Device 0001
> >         Kernel driver in use: virtio-pci
> > [root@vm proc]# cat /proc/ioports | grep c060-c07f
> >   c060-c07f : 0000:00:03.0
> >     c060-c07f : virtio-pci
> >
> > After unbind:
> >
> > [root@vm proc]# lspci -k -s 00:03.0
> > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
> >         Subsystem: Red Hat, Inc Device 0001
> > [root@vm proc]# cat /proc/ioports | grep c060-c07f
> >   c060-c07f : 0000:00:03.0
> >
> > So... does this mean that it is an alternative to the blacklist
> > solution?
>
> Oh, we could first check whether this port is manipulated by a kernel
> driver in virtio_resource_init/eth_virtio_dev_init, as long as it is not
> too late.

I guess there might be two problems here:

1. How users avoid DPDK taking over virtio devices that they do not
   want for IO (choosing which device to use)

2. The driver conflict between the virtio PMD in DPDK and virtio-pci in
   the kernel (which happens on every virtio device that DPDK uses)

The white/black list solution is probably good enough to solve (1) for
customers. I am just curious about the 2nd. Or say, even if we blacklist
some virtio devices (or use a white list), the virtio devices used by
DPDK are still in danger if we cannot make sure that virtio-pci will not
touch the device any more (even if it will not touch it, it feels
erroneous not to tell virtio-pci to remove it beforehand). E.g., if the
virtio-pci interrupt is still working, when there are packets from
outside to the guest, vp_interrupt() might be called? Then the
virtio-pci driver might read/write the vring as well? If so, that's
problematic. Am I wrong?

Peter
[dpdk-dev] [PATCH v4 2/6] fm10k: setup rx queue interrupts for PF and VF
> -----Original Message-----
> From: Qiu, Michael
> Sent: Tuesday, December 22, 2015 3:28 PM
> To: He, Shaopeng; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 2/6] fm10k: setup rx queue interrupts for
> PF and VF
>
> On 12/21/2015 6:20 PM, Shaopeng He wrote:
> > In interrupt mode, each rx queue can have one interrupt to notify the
> > up layer application when packets are available in that queue. Some
> > queues also can share one interrupt.
> > Currently, fm10k needs one separate interrupt for mailbox. So, only
> > those drivers which support multiple interrupt vectors e.g. vfio-pci
> > can work in fm10k interrupt mode.
> > This patch uses the RXINT/INT_MAP registers to map interrupt causes
> > (rx queue and other events) to vectors, and enable these interrupts
> > through kernel drivers like vfio-pci.
> >
> > Signed-off-by: Shaopeng He
> > Acked-by: Jing Chen
> > ---
> >  doc/guides/rel_notes/release_2_3.rst |   2 +
> >  drivers/net/fm10k/fm10k.h            |   3 ++
> >  drivers/net/fm10k/fm10k_ethdev.c     | 101 +++
> >  3 files changed, 95 insertions(+), 11 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_2_3.rst
> > b/doc/guides/rel_notes/release_2_3.rst
> > index 99de186..2cb5ebd 100644
> > --- a/doc/guides/rel_notes/release_2_3.rst
> > +++ b/doc/guides/rel_notes/release_2_3.rst
> > @@ -4,6 +4,8 @@ DPDK Release 2.3
> >  New Features
> >  ------------
> >
> > +* **Added fm10k Rx interrupt support.**
> > +
> >
> >  Resolved Issues
> >  ---
> >
> > diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
> > index e2f677a..770d6ba 100644
> > --- a/drivers/net/fm10k/fm10k.h
> > +++ b/drivers/net/fm10k/fm10k.h
> > @@ -129,6 +129,9 @@
> >  #define RTE_FM10K_TX_MAX_FREE_BUF_SZ    64
> >  #define RTE_FM10K_DESCS_PER_LOOP    4
> >
> > +#define FM10K_MISC_VEC_ID RTE_INTR_VEC_ZERO_OFFSET
> > +#define FM10K_RX_VEC_START RTE_INTR_VEC_RXTX_OFFSET
> > +
> >  #define FM10K_SIMPLE_TX_FLAG ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
> >  				ETH_TXQ_FLAGS_NOOFFLOADS)
> >
> > diff --git a/drivers/net/fm10k/fm10k_ethdev.c
> > b/drivers/net/fm10k/fm10k_ethdev.c
> > index d39c33b..a34c5e2 100644
> > --- a/drivers/net/fm10k/fm10k_ethdev.c
> > +++ b/drivers/net/fm10k/fm10k_ethdev.c
> > @@ -54,6 +54,8 @@
> >  /* Number of chars per uint32 type */
> >  #define CHARS_PER_UINT32 (sizeof(uint32_t))
> >  #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)
> > +/* default 1:1 map from queue ID to interrupt vector ID */
> > +#define Q2V(dev, queue_id) (dev->pci_dev->intr_handle.intr_vec[queue_id])
> >
> >  static void fm10k_close_mbx_service(struct fm10k_hw *hw);
> >  static void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev);
> > @@ -109,6 +111,8 @@ struct fm10k_xstats_name_off fm10k_hw_stats_tx_q_strings[] = {
> >
> >  #define FM10K_NB_XSTATS (FM10K_NB_HW_XSTATS + FM10K_MAX_QUEUES_PF * \
> >  		(FM10K_NB_RX_Q_XSTATS + FM10K_NB_TX_Q_XSTATS))
> > +static int
> > +fm10k_dev_rxq_interrupt_setup(struct rte_eth_dev *dev);
> >
> >  static void
> >  fm10k_mbx_initlock(struct fm10k_hw *hw)
> > @@ -687,6 +691,7 @@ static int
> >  fm10k_dev_rx_init(struct rte_eth_dev *dev)
> >  {
> >  	struct fm10k_hw *hw =
> >  		FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > +	struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
> >  	int i, ret;
> >  	struct fm10k_rx_queue *rxq;
> >  	uint64_t base_addr;
> > @@ -694,10 +699,23 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
> >  	uint32_t rxdctl = FM10K_RXDCTL_WRITE_BACK_MIN_DELAY;
> >  	uint16_t buf_size;
> >
> > -	/* Disable RXINT to avoid possible interrupt */
> > -	for (i = 0; i < hw->mac.max_queues; i++)
> > +	/* enable RXINT for interrupt mode */
> > +	i = 0;
> > +	if (rte_intr_dp_is_en(intr_handle)) {
> > +		for (; i < dev->data->nb_rx_queues; i++) {
> > +			FM10K_WRITE_REG(hw, FM10K_RXINT(i), Q2V(dev, i));
> > +			if (hw->mac.type == fm10k_mac_pf)
> > +				FM10K_WRITE_REG(hw, FM10K_ITR(Q2V(dev, i)),
> > +					FM10K_ITR_AUTOMASK | FM10K_ITR_MASK_CLEAR);
> > +			else
> > +				FM10K_WRITE_REG(hw, FM10K_VFITR(Q2V(dev, i)),
> > +					FM10K_ITR_AUTOMASK | FM10K_ITR_MASK_CLEAR);
> > +		}
> > +	}
> > +	/* Disable other RXINT to avoid possible interrupt */
> > +	for (; i < hw->mac.max_queues; i++)
> >  		FM10K_WRITE_REG(hw, FM10K_RXINT(i),
> > -			3 << FM10K_RXINT_TIMER_SHIFT);
> > +				3 << FM10K_RXINT_TIMER_SHIFT);
> >
> >  	/* Setup RX queues */
> >  	for (i = 0; i < dev->data->nb_rx_queues; ++i) {
> > @@ -1053,6 +1071,9 @@ fm10k_dev_start(struct rte_eth_dev *dev)
> >  		return diag;
> >  	}
> >
> > +	if (fm10k_dev_rxq_interrupt_setup(dev))
> > +		return -EIO;
>
[dpdk-dev] [Question] How pmd virtio works without UIO?
On Tue, Dec 22, 2015 at 05:56:41PM +0800, Peter Xu wrote:
> On Tue, Dec 22, 2015 at 04:32:46PM +0800, Yuanhan Liu wrote:
> > Actually, you are right. I mentioned in the last email that this is
> > for the configuration part. To answer your question in this email,
> > you will not be able to go further (say, initiating the virtio pmd)
> > if you don't unbind the original virtio-net driver and bind it to
> > igb_uio (or something similar).
> >
> > The start point is rte_eal_pci_scan, where the sub-function
> > pci_scan_one just initiates a DPDK bond driver.
>
> I am not sure whether I understand your meaning correctly
> (regarding "you will not be able to go further"): The problem
> is that we _can_ run testpmd without unbinding the ports and binding
> to UIO or something. What we need to do is boot the guest, reserve
> huge pages, and run testpmd (keeping its kernel driver as
> "virtio-pci"). In pci_scan_one():
>
> 	if (!ret) {
> 		if (!strcmp(driver, "vfio-pci"))
> 			dev->kdrv = RTE_KDRV_VFIO;
> 		else if (!strcmp(driver, "igb_uio"))
> 			dev->kdrv = RTE_KDRV_IGB_UIO;
> 		else if (!strcmp(driver, "uio_pci_generic"))
> 			dev->kdrv = RTE_KDRV_UIO_GENERIC;
> 		else
> 			dev->kdrv = RTE_KDRV_UNKNOWN;
> 	} else
> 		dev->kdrv = RTE_KDRV_UNKNOWN;
>
> I think it should be going to RTE_KDRV_UNKNOWN
> (driver=="virtio-pci") here.

Sorry, I simply overlooked that. I was thinking it would quit here for
the RTE_KDRV_UNKNOWN case.

> I tried to run IO and it could work,
> but I am not sure whether it is safe, and how.

I also did a quick test then, however, with the virtio 1.0 patchset
I sent before, which sets RTE_PCI_DRV_NEED_MAPPING, resulting in
pci_map_device() failure, and the virtio pmd is not initiated at all.

> Also, I am not sure whether I need to (at least) unbind the
> virtio-pci driver, so that there should be no kernel driver
> running for the virtio device before DPDK uses it.

Why not? That's what the DPDK document asks you to do
(http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html):

    3.6. Binding and Unbinding Network Ports to/from the Kernel Modules

    As of release 1.4, DPDK applications no longer automatically unbind
    all supported network ports from the kernel driver in use. Instead,
    all ports that are to be used by a DPDK application must be bound
    to the uio_pci_generic, igb_uio or vfio-pci module before the
    application is run. Any network ports under Linux* control will be
    ignored by the DPDK poll-mode drivers and cannot be used by the
    application.

	--yliu
[dpdk-dev] [Question] How pmd virtio works without UIO?
On Wed, Dec 23, 2015 at 09:55:54AM +0800, Peter Xu wrote:
> On Tue, Dec 22, 2015 at 04:38:30PM +0000, Xie, Huawei wrote:
> > On 12/22/2015 7:39 PM, Peter Xu wrote:
> > > I tried to unbind one of the virtio net devices, and I see the PCI
> > > entry still there.
> > >
> > > Before unbind:
> > >
> > > [root@vm proc]# lspci -k -s 00:03.0
> > > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
> > >         Subsystem: Red Hat, Inc Device 0001
> > >         Kernel driver in use: virtio-pci
> > > [root@vm proc]# cat /proc/ioports | grep c060-c07f
> > >   c060-c07f : 0000:00:03.0
> > >     c060-c07f : virtio-pci
> > >
> > > After unbind:
> > >
> > > [root@vm proc]# lspci -k -s 00:03.0
> > > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
> > >         Subsystem: Red Hat, Inc Device 0001
> > > [root@vm proc]# cat /proc/ioports | grep c060-c07f
> > >   c060-c07f : 0000:00:03.0
> > >
> > > So... does this mean that it is an alternative to the blacklist
> > > solution?
> >
> > Oh, we could first check whether this port is manipulated by a kernel
> > driver in virtio_resource_init/eth_virtio_dev_init, as long as it is
> > not too late.

Why can't we simply quit at pci_scan_one, once we find that it's not
bound to uio (or similar stuff)? That would be generic enough that we
don't have to do similar checks for each new pmd driver.

Or am I missing something?

> I guess there might be two problems here:
>
> 1. How users avoid DPDK taking over virtio devices that they do not
>    want for IO (choosing which device to use)

Isn't that what the 'binding/unbinding' is for?

> 2. The driver conflict between the virtio PMD in DPDK and virtio-pci in
>    the kernel (which happens on every virtio device that DPDK uses)

If you unbound the kernel driver first, which is the suggested (or
required?) way to use DPDK, that will not happen.

	--yliu

> For the white/black list solution, I guess it's good enough to solve
> (1) for customers. I am just curious about the 2nd.
>
> Or say, even if we blacklist some virtio devices (or use a white
> list), the virtio devices used by DPDK are still in danger if we
> cannot make sure that virtio-pci will not touch the device any more
> (even if it will not touch it, it feels erroneous not to tell
> virtio-pci to remove it beforehand). E.g., if the virtio-pci interrupt
> is still working, when there are packets from outside to the guest,
> vp_interrupt() might be called? Then the virtio-pci driver might
> read/write the vring as well? If so, that's problematic. Am I wrong?
>
> Peter
[dpdk-dev] [Question] How pmd virtio works without UIO?
On Wed, Dec 23, 2015 at 10:09:49AM +0800, Yuanhan Liu wrote:
> Why can't we simply quit at pci_scan_one, once we find that it's not
> bound to uio (or similar stuff)? That would be generic enough that we
> don't have to do similar checks for each new pmd driver.
>
> Or am I missing something?

It seems that the ioport way to play with virtio devices does not
require any PCI wrapper layer like UIO/VFIO? Please check
virtio_resource_init().

> > I guess there might be two problems here:
> >
> > 1. How users avoid DPDK taking over virtio devices that they do not
> >    want for IO (choosing which device to use)
>
> Isn't that what the 'binding/unbinding' is for?
>
> > 2. The driver conflict between the virtio PMD in DPDK and virtio-pci
> >    in the kernel (which happens on every virtio device that DPDK uses)
>
> If you unbound the kernel driver first, which is the suggested (or
> required?) way to use DPDK, that will not happen.

Yes, maybe we should unbind it first. I am just not sure what will
happen if we don't.

Peter
[dpdk-dev] [Question] How pmd virtio works without UIO?
On Wed, Dec 23, 2015 at 10:01:35AM +0800, Yuanhan Liu wrote:
> On Tue, Dec 22, 2015 at 05:56:41PM +0800, Peter Xu wrote:
> > On Tue, Dec 22, 2015 at 04:32:46PM +0800, Yuanhan Liu wrote:
> > > Actually, you are right. I mentioned in the last email that this is
> > > for the configuration part. To answer your question in this email,
> > > you will not be able to go further (say, initiating the virtio pmd)
> > > if you don't unbind the original virtio-net driver and bind it to
> > > igb_uio (or something similar).
> > >
> > > The start point is rte_eal_pci_scan, where the sub-function
> > > pci_scan_one just initiates a DPDK bond driver.
> >
> > I am not sure whether I understand your meaning correctly
> > (regarding "you will not be able to go further"): The problem
> > is that we _can_ run testpmd without unbinding the ports and binding
> > to UIO or something. What we need to do is boot the guest, reserve
> > huge pages, and run testpmd (keeping its kernel driver as
> > "virtio-pci"). In pci_scan_one():
> >
> > 	if (!ret) {
> > 		if (!strcmp(driver, "vfio-pci"))
> > 			dev->kdrv = RTE_KDRV_VFIO;
> > 		else if (!strcmp(driver, "igb_uio"))
> > 			dev->kdrv = RTE_KDRV_IGB_UIO;
> > 		else if (!strcmp(driver, "uio_pci_generic"))
> > 			dev->kdrv = RTE_KDRV_UIO_GENERIC;
> > 		else
> > 			dev->kdrv = RTE_KDRV_UNKNOWN;
> > 	} else
> > 		dev->kdrv = RTE_KDRV_UNKNOWN;
> >
> > I think it should be going to RTE_KDRV_UNKNOWN
> > (driver=="virtio-pci") here.
>
> Sorry, I simply overlooked that. I was thinking it would quit here for
> the RTE_KDRV_UNKNOWN case.
>
> > I tried to run IO and it could work,
> > but I am not sure whether it is safe, and how.
>
> I also did a quick test then, however, with the virtio 1.0 patchset
> I sent before, which sets RTE_PCI_DRV_NEED_MAPPING, resulting in
> pci_map_device() failure, and the virtio pmd is not initiated at all.

Then, will the patch work with the ioport way to access virtio devices?

> > Also, I am not sure whether I need to (at least) unbind the
> > virtio-pci driver, so that there should be no kernel driver
> > running for the virtio device before DPDK uses it.
>
> Why not? That's what the DPDK document asks you to do
> (http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html):
>
>     3.6. Binding and Unbinding Network Ports to/from the Kernel Modules
>
>     As of release 1.4, DPDK applications no longer automatically unbind
>     all supported network ports from the kernel driver in use. Instead,
>     all ports that are to be used by a DPDK application must be bound
>     to the uio_pci_generic, igb_uio or vfio-pci module before the
>     application is run. Any network ports under Linux* control will be
>     ignored by the DPDK poll-mode drivers and cannot be used by the
>     application.

This seems obsolete? Since it does not cover ioport.

Peter

> --yliu
[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
On Tue, Dec 22, 2015 at 01:38:29AM -0800, Rich Lane wrote:
> On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu wrote:
> > On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
> > > The queue state change callback is the one new API that needs to be
> > > added, because normal NICs don't have this behavior.
> >
> > Again I'd ask: will vring_state_changed() be enough, when the above
> > issues are resolved? vring_state_changed() will be invoked at
> > new_device()/destroy_device(), and of course, on ethtool change?
>
> It would be sufficient. It is not a great API though, because it
> requires the application to do the conversion from struct virtio_net
> to a DPDK port number, and from a virtqueue index to a DPDK queue id
> and direction. Also, the current implementation often makes this
> callback when the vring state has not actually changed (enabled ->
> enabled and disabled -> disabled).
>
> If you're asking about using vring_state_changed() _instead_ of the
> link status event and rte_eth_dev_socket_id(),

No, I like the idea of the link status event and
rte_eth_dev_socket_id(); I was just wondering why a new API is needed.
Both Tetsuya and I were thinking to leverage the link status event to
represent the queue state change (triggered by vring_state_changed())
as well, so that we don't need to introduce another eth event. However,
I'd agree that it's better if we could have a new dedicated event.

Thomas, here is some background for you. For the vhost pmd and Linux
virtio-net combo, the queues can be dynamically changed by ethtool;
therefore, the application wishes to have another eth event, say
RTE_ETH_EVENT_QUEUE_STATE_CHANGE, so that the application can add or
remove the corresponding queue to/from the datapath when that happens.
What do you think of that?

> then yes, it still works. I'd only consider that a stopgap until the
> real ethdev APIs are implemented.
>
> I'd suggest to add RTE_ETH_EVENT_QUEUE_STATE_CHANGE rather than
> create another callback registration API.
>
> Perhaps we could merge the basic PMD, which I think is pretty solid,
> and then continue the API discussion with patches to it.

Perhaps, but let's see how hard it could be for the new eth event
discussion then.

	--yliu
[dpdk-dev] [Question] How pmd virtio works without UIO?
On Wed, Dec 23, 2015 at 10:41:57AM +0800, Peter Xu wrote:
> On Wed, Dec 23, 2015 at 10:01:35AM +0800, Yuanhan Liu wrote:
> > On Tue, Dec 22, 2015 at 05:56:41PM +0800, Peter Xu wrote:
> > > On Tue, Dec 22, 2015 at 04:32:46PM +0800, Yuanhan Liu wrote:
> > > > Actually, you are right. I mentioned in the last email that this
> > > > is for the configuration part. To answer your question in this
> > > > email, you will not be able to go further (say, initiating the
> > > > virtio pmd) if you don't unbind the original virtio-net driver
> > > > and bind it to igb_uio (or something similar).
> > > >
> > > > The start point is rte_eal_pci_scan, where the sub-function
> > > > pci_scan_one just initiates a DPDK bond driver.
> > >
> > > I am not sure whether I understand your meaning correctly
> > > (regarding "you will not be able to go further"): The problem
> > > is that we _can_ run testpmd without unbinding the ports and
> > > binding to UIO or something. What we need to do is boot the guest,
> > > reserve huge pages, and run testpmd (keeping its kernel driver as
> > > "virtio-pci"). In pci_scan_one():
> > >
> > > 	if (!ret) {
> > > 		if (!strcmp(driver, "vfio-pci"))
> > > 			dev->kdrv = RTE_KDRV_VFIO;
> > > 		else if (!strcmp(driver, "igb_uio"))
> > > 			dev->kdrv = RTE_KDRV_IGB_UIO;
> > > 		else if (!strcmp(driver, "uio_pci_generic"))
> > > 			dev->kdrv = RTE_KDRV_UIO_GENERIC;
> > > 		else
> > > 			dev->kdrv = RTE_KDRV_UNKNOWN;
> > > 	} else
> > > 		dev->kdrv = RTE_KDRV_UNKNOWN;
> > >
> > > I think it should be going to RTE_KDRV_UNKNOWN
> > > (driver=="virtio-pci") here.
> >
> > Sorry, I simply overlooked that. I was thinking it would quit here
> > for the RTE_KDRV_UNKNOWN case.
> >
> > > I tried to run IO and it could work,
> > > but I am not sure whether it is safe, and how.
> >
> > I also did a quick test then, however, with the virtio 1.0 patchset
> > I sent before, which sets RTE_PCI_DRV_NEED_MAPPING, resulting in
> > pci_map_device() failure, and the virtio pmd is not initiated at all.
>
> Then, will the patch work with the ioport way to access virtio devices?

Yes.

> > > Also, I am not sure whether I need to (at least) unbind the
> > > virtio-pci driver, so that there should be no kernel driver
> > > running for the virtio device before DPDK uses it.
> >
> > Why not? That's what the DPDK document asks you to do
> > (http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html):
> >
> >     3.6. Binding and Unbinding Network Ports to/from the Kernel Modules
> >
> >     As of release 1.4, DPDK applications no longer automatically
> >     unbind all supported network ports from the kernel driver in use.
> >     Instead, all ports that are to be used by a DPDK application must
> >     be bound to the uio_pci_generic, igb_uio or vfio-pci module
> >     before the application is run. Any network ports under Linux*
> >     control will be ignored by the DPDK poll-mode drivers and cannot
> >     be used by the application.
>
> This seems obsolete? Since it does not cover ioport.

I don't think so. The above is about how to run DPDK applications;
ioport is just an (optional) way to access PCI resources in a specific
PMD.

And, the above specification avoids your concern that two drivers try
to manipulate the same device concurrently, doesn't it?

Also, it says "any network ports under Linux* control will be ignored
by the DPDK poll-mode drivers and cannot be used by the application",
so the case you were describing, where the virtio pmd continues to
work without the bind, looks like a bug to me.

Can anyone confirm that?

	--yliu
[dpdk-dev] DPDK crash with sr-iov (with ESXi 5.5 hypervisor)
Hi,

While initializing a pci port (VF), DPDK crashes while configuring the
device. Reason/location:

    PMD: rte_eth_dev_configure: ethdev port_id=1 nb_rx_queues=8 > 2
    EAL: Error - exiting with code: 1

System info:
    DPDK version: 2.0
    NIC: 82599EB, sr-iov enabled.
    SR-IOV config at ESXi 5.5 hypervisor host: max_vfs=2
    Guest OS: Linux OS based. Driver: ixgbevf.ko

The VM is configured with 3 vCPUs. Before linking the port to DPDK, I
see that the pci device (VF) comes up with 8 rx/tx queues (using the
native kernel driver ixgbevf.ko, /sys/class/net/ethx/queues/*). But the
DPDK code expects the max queues for the device to be '2', hence the
crash. Am I missing anything here? Appreciate any suggestions/fixes for
the issue.

Thanks,
-Vithal
[dpdk-dev] [Question] How pmd virtio works without UIO?
On 12/23/2015 10:57 AM, Yuanhan Liu wrote:
> On Wed, Dec 23, 2015 at 10:41:57AM +0800, Peter Xu wrote:
>> On Wed, Dec 23, 2015 at 10:01:35AM +0800, Yuanhan Liu wrote:
>>> On Tue, Dec 22, 2015 at 05:56:41PM +0800, Peter Xu wrote:
>>>> On Tue, Dec 22, 2015 at 04:32:46PM +0800, Yuanhan Liu wrote:
>>>>> Actually, you are right. I mentioned in the last email that this is
>>>>> for the configuration part. To answer your question in this email,
>>>>> you will not be able to go further (say, initiating the virtio pmd)
>>>>> if you don't unbind the original virtio-net driver and bind it to
>>>>> igb_uio (or something similar).
>>>>>
>>>>> The start point is rte_eal_pci_scan, where the sub-function
>>>>> pci_scan_one just initiates a DPDK bond driver.
>>>> I am not sure whether I understand your meaning correctly
>>>> (regarding "you will not be able to go further"): The problem
>>>> is that we _can_ run testpmd without unbinding the ports and binding
>>>> to UIO or something. What we need to do is boot the guest, reserve
>>>> huge pages, and run testpmd (keeping its kernel driver as
>>>> "virtio-pci"). In pci_scan_one():
>>>>
>>>>         if (!ret) {
>>>>                 if (!strcmp(driver, "vfio-pci"))
>>>>                         dev->kdrv = RTE_KDRV_VFIO;
>>>>                 else if (!strcmp(driver, "igb_uio"))
>>>>                         dev->kdrv = RTE_KDRV_IGB_UIO;
>>>>                 else if (!strcmp(driver, "uio_pci_generic"))
>>>>                         dev->kdrv = RTE_KDRV_UIO_GENERIC;
>>>>                 else
>>>>                         dev->kdrv = RTE_KDRV_UNKNOWN;
>>>>         } else
>>>>                 dev->kdrv = RTE_KDRV_UNKNOWN;
>>>>
>>>> I think it should be going to RTE_KDRV_UNKNOWN
>>>> (driver=="virtio-pci") here.
>>> Sorry, I simply overlooked that. I was thinking it would quit here
>>> for the RTE_KDRV_UNKNOWN case.
>>>> I tried to run IO and it could work,
>>>> but I am not sure whether it is safe, and how.
>>> I also did a quick test then, however, with the virtio 1.0 patchset
>>> I sent before, which sets RTE_PCI_DRV_NEED_MAPPING, resulting in
>>> pci_map_device() failure, and the virtio pmd is not initiated at all.
>> Then, will the patch work with the ioport way to access virtio devices?
> Yes.
>
>>>> Also, I am not sure whether I need to (at least) unbind the
>>>> virtio-pci driver, so that there should be no kernel driver
>>>> running for the virtio device before DPDK uses it.
>>> Why not? That's what the DPDK document asks you to do
>>> (http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html):
>>>
>>>     3.6. Binding and Unbinding Network Ports to/from the Kernel Modules
>>>
>>>     As of release 1.4, DPDK applications no longer automatically
>>>     unbind all supported network ports from the kernel driver in use.
>>>     Instead, all ports that are to be used by a DPDK application must
>>>     be bound to the uio_pci_generic, igb_uio or vfio-pci module
>>>     before the application is run. Any network ports under Linux*
>>>     control will be ignored by the DPDK poll-mode drivers and cannot
>>>     be used by the application.
>> This seems obsolete? Since it does not cover ioport.
> I don't think so. The above is about how to run DPDK applications;
> ioport is just an (optional) way to access PCI resources in a specific
> PMD.
>
> And, the above specification avoids your concern that two drivers try
> to manipulate the same device concurrently, doesn't it?
>
> Also, it says "any network ports under Linux* control will be ignored
> by the DPDK poll-mode drivers and cannot be used by the application",
> so the case you were describing, where the virtio pmd continues to
> work without the bind, looks like a bug to me.
>
> Can anyone confirm that?

That document isn't accurate. virtio doesn't require binding to a UIO
driver if it uses PORT IO. The PORT IO commit said this is because UIO
isn't secure, but avoiding uio doesn't bring more security, as the
virtio PMD could still ask the device to DMA into any memory. The thing
we could at least do is fail in virtio_resource_init if a kernel driver
is still manipulating this device. This saves users the effort of using
the blacklist option and avoids the driver conflict.

>
> --yliu
>
[dpdk-dev] [PATCH] i40e: fix the issue of port initialization failure
Workaround for the issue that adminq commands cannot be processed during
initialization when 2x40G or 4x10G is receiving packets at the highest
throughput. Registers 0x002698a8 and 0x002698ac should be cleared first,
and restored with the default values at the end. No more details, as
they are not exposed registers.

Signed-off-by: Helin Zhang
---
 drivers/net/i40e/i40e_ethdev.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index bf6220d..149a31e 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -712,6 +712,41 @@ i40e_add_tx_flow_control_drop_filter(struct i40e_pf *pf)
 			    " frames from VSIs.");
 }
 
+/* Workaround for the issue that adminq commands cannot be processed during
+ * initialization when 2x40G or 4x10G is receiving packets at the highest
+ * throughput. Registers 0x002698a8 and 0x002698ac should be cleared at
+ * first, and restored with the default values at the end. No more details,
+ * as they are not exposed registers.
+ */
+static void
+i40e_clear_fdena(struct i40e_hw *hw)
+{
+	uint32_t fdena0, fdena1;
+
+	fdena0 = I40E_READ_REG(hw, 0x002698a8);
+	fdena1 = I40E_READ_REG(hw, 0x002698ac);
+	PMD_INIT_LOG(DEBUG, "[0x002698a8]: 0x%08x, [0x002698ac]: 0x%08x",
+		     fdena0, fdena1);
+
+	I40E_WRITE_REG(hw, 0x002698a8, 0x0);
+	I40E_WRITE_REG(hw, 0x002698ac, 0x0);
+	I40E_WRITE_FLUSH(hw);
+}
+
+/* Workaround for the issue that adminq commands cannot be processed during
+ * initialization when 2x40G or 4x10G is receiving packets at the highest
+ * throughput. Registers 0x002698a8 and 0x002698ac should be cleared at
+ * first, and restored with the default values at the end. No more details,
+ * as they are not exposed registers.
+ */
+static void
+i40e_restore_fdena(struct i40e_hw *hw)
+{
+	I40E_WRITE_REG(hw, 0x002698a8, 0xfc00);
+	I40E_WRITE_REG(hw, 0x002698ac, 0x80007fdf);
+	I40E_WRITE_FLUSH(hw);
+}
+
 static int
 eth_i40e_dev_init(struct rte_eth_dev *dev)
 {
@@ -774,6 +809,8 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 		return ret;
 	}
 
+	i40e_clear_fdena(hw);
+
 	/* Initialize the shared code (base driver) */
 	ret = i40e_init_shared_code(hw);
 	if (ret) {
@@ -934,6 +971,8 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 		pf->flags &= ~I40E_FLAG_DCB;
 	}
 
+	i40e_restore_fdena(hw);
+
 	return 0;
 
 err_mac_alloc:
-- 
1.9.3
[dpdk-dev] DPDK crash with sr-iov (with ESXi 5.5 hypervisor)
Hi Vithal,

The number of VF queues is decided by the PF. I suppose you use the
kernel driver for the PF, so the queue number is decided by the PF
kernel driver. I have an 82599ES, and I find that no matter whether
ixgbevf or dpdk igb_uio is used, the rx queue number is 2. Frankly, I
believe 2 is the expected number, and I am surprised that you get 8
when using ixgbevf.

Hope this can help. Thanks.

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vithal Mohare
> Sent: Wednesday, December 23, 2015 12:32 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] DPDK crash with sr-iov (with ESXi 5.5 hypervisor)
>
> Hi,
>
> While initializing a pci port (VF), DPDK crashes while configuring the
> device. Reason/location:
>
>     PMD: rte_eth_dev_configure: ethdev port_id=1 nb_rx_queues=8 > 2
>     EAL: Error - exiting with code: 1
>
> System info:
>     DPDK version: 2.0
>     NIC: 82599EB, sr-iov enabled.
>     SR-IOV config at ESXi 5.5 hypervisor host: max_vfs=2
>     Guest OS: Linux OS based. Driver: ixgbevf.ko
>
> The VM is configured with 3 vCPUs. Before linking the port to DPDK, I
> see that the pci device (VF) comes up with 8 rx/tx queues (using the
> native kernel driver ixgbevf.ko, /sys/class/net/ethx/queues/*). But
> the DPDK code expects the max queues for the device to be '2', hence
> the crash. Am I missing anything here? Appreciate any suggestions/
> fixes for the issue.
>
> Thanks,
> -Vithal
[dpdk-dev] [PATCH v5 0/6] interrupt mode for fm10k
This patch series adds interrupt mode support for fm10k, and contains
four major parts:

1. implement the rx_descriptor_done function in fm10k
2. add rx interrupt support in fm10k PF and VF
3. make sure the default VID is available in dev_init in fm10k
4. fix a memory leak for non-ip packets in l3fwd-power, which happens
   mostly when testing fm10k interrupt mode

v5 changes:
- remove one unnecessary NULL check for rte_free
- fix a wrong error message
- add more clean up when memory allocation fails
- split a line over 80 characters into 2 lines
- update the interrupt mode limitation in fm10k.rst

v4 changes:
- rebase to latest code
- update release 2.3 notes in the corresponding patches

v3 changes:
- rebase to latest code
- macro renaming according to the EAL change

v2 changes:
- reword some comments and commit messages
- split one big patch into three smaller ones

Shaopeng He (6):
  fm10k: implement rx_descriptor_done function
  fm10k: setup rx queue interrupts for PF and VF
  fm10k: remove rx queue interrupts when dev stops
  fm10k: add rx queue interrupt en/dis functions
  fm10k: make sure default VID available in dev_init
  l3fwd-power: fix a memory leak for non-ip packet

 doc/guides/nics/fm10k.rst            |   7 ++
 doc/guides/rel_notes/release_2_3.rst |   8 ++
 drivers/net/fm10k/fm10k.h            |   6 ++
 drivers/net/fm10k/fm10k_ethdev.c     | 174 ---
 drivers/net/fm10k/fm10k_rxtx.c       |  25 +
 examples/l3fwd-power/main.c          |   3 +-
 6 files changed, 211 insertions(+), 12 deletions(-)

-- 
1.9.3
[dpdk-dev] [PATCH v5 1/6] fm10k: implement rx_descriptor_done function
rx_descriptor_done is used by the interrupt mode example application
(l3fwd-power) to check the rxd DD bit to decide the RX trend; then
l3fwd-power will adjust the cpu frequency according to the result.

v5 change:
- fix a wrong error message

Signed-off-by: Shaopeng He
---
 drivers/net/fm10k/fm10k.h        |  3 +++
 drivers/net/fm10k/fm10k_ethdev.c |  1 +
 drivers/net/fm10k/fm10k_rxtx.c   | 25 +
 3 files changed, 29 insertions(+)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index cd38af2..e2f677a 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -345,6 +345,9 @@ uint16_t fm10k_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 uint16_t
 fm10k_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 			uint16_t nb_pkts);
 
+int
+fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
+
 uint16_t
 fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			uint16_t nb_pkts);
 
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index e4aed94..d39c33b 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2435,6 +2435,7 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
 	.rx_queue_release	= fm10k_rx_queue_release,
 	.tx_queue_setup		= fm10k_tx_queue_setup,
 	.tx_queue_release	= fm10k_tx_queue_release,
+	.rx_descriptor_done	= fm10k_dev_rx_descriptor_done,
 	.reta_update		= fm10k_reta_update,
 	.reta_query		= fm10k_reta_query,
 	.rss_hash_update	= fm10k_rss_hash_update,
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index e958865..0002f09 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -369,6 +369,31 @@ fm10k_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 	return nb_rcv;
 }
 
+int
+fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset)
+{
+	volatile union fm10k_rx_desc *rxdp;
+	struct fm10k_rx_queue *rxq = rx_queue;
+	uint16_t desc;
+	int ret;
+
+	if (unlikely(offset >= rxq->nb_desc)) {
+		PMD_DRV_LOG(ERR, "Invalid RX descriptor offset %u", offset);
+		return 0;
+	}
+
+	desc = rxq->next_dd + offset;
+	if (desc >= rxq->nb_desc)
+		desc -= rxq->nb_desc;
+
+	rxdp = &rxq->hw_ring[desc];
+
+	ret = !!(rxdp->w.status &
+		rte_cpu_to_le_16(FM10K_RXD_STATUS_DD));
+
+	return ret;
+}
+
 static inline void tx_free_descriptors(struct fm10k_tx_queue *q)
 {
 	uint16_t next_rs, count = 0;
-- 
1.9.3
[dpdk-dev] [PATCH v5 2/6] fm10k: setup rx queue interrupts for PF and VF
In interrupt mode, each rx queue can have one interrupt to notify the up layer application when packets are available in that queue. Some queues also can share one interrupt. Currently, fm10k needs one separate interrupt for mailbox. So, only those drivers which support multiple interrupt vectors e.g. vfio-pci can work in fm10k interrupt mode. This patch uses the RXINT/INT_MAP registers to map interrupt causes (rx queue and other events) to vectors, and enable these interrupts through kernel drivers like vfio-pci. v5 changes: - add more clean up when memory allocation fails - split line over 80 characters to 2 lines - update interrupt mode limitation in fm10k.rst v4 change: - update release note inside the patch v3 change: - macro renaming according to the EAL change v2 changes: - split one big patch into three smaller ones - reword some comments and commit messages Signed-off-by: Shaopeng He --- doc/guides/nics/fm10k.rst| 7 +++ doc/guides/rel_notes/release_2_3.rst | 2 + drivers/net/fm10k/fm10k.h| 3 + drivers/net/fm10k/fm10k_ethdev.c | 105 +++ 4 files changed, 106 insertions(+), 11 deletions(-) diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst index 4206b7f..dc5cb6e 100644 --- a/doc/guides/nics/fm10k.rst +++ b/doc/guides/nics/fm10k.rst @@ -65,3 +65,10 @@ The FM1 family of NICS support a maximum of a 15K jumbo frame. The value is fixed and cannot be changed. So, even when the ``rxmode.max_rx_pkt_len`` member of ``struct rte_eth_conf`` is set to a value lower than 15364, frames up to 15364 bytes can still reach the host interface. + +Interrupt mode +~ + +The FM1 family of NICS need one separate interrupt for mailbox. So only +drivers which support multiple interrupt vectors e.g. vfio-pci can work +for fm10k interrupt mode. 
diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index 99de186..2cb5ebd 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -4,6 +4,8 @@ DPDK Release 2.3 New Features +* **Added fm10k Rx interrupt support.** + Resolved Issues --- diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h index e2f677a..770d6ba 100644 --- a/drivers/net/fm10k/fm10k.h +++ b/drivers/net/fm10k/fm10k.h @@ -129,6 +129,9 @@ #define RTE_FM10K_TX_MAX_FREE_BUF_SZ64 #define RTE_FM10K_DESCS_PER_LOOP4 +#define FM10K_MISC_VEC_ID RTE_INTR_VEC_ZERO_OFFSET +#define FM10K_RX_VEC_START RTE_INTR_VEC_RXTX_OFFSET + #define FM10K_SIMPLE_TX_FLAG ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \ ETH_TXQ_FLAGS_NOOFFLOADS) diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c index d39c33b..583335a 100644 --- a/drivers/net/fm10k/fm10k_ethdev.c +++ b/drivers/net/fm10k/fm10k_ethdev.c @@ -54,6 +54,8 @@ /* Number of chars per uint32 type */ #define CHARS_PER_UINT32 (sizeof(uint32_t)) #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1) +/* default 1:1 map from queue ID to interrupt vector ID */ +#define Q2V(dev, queue_id) (dev->pci_dev->intr_handle.intr_vec[queue_id]) static void fm10k_close_mbx_service(struct fm10k_hw *hw); static void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev); @@ -109,6 +111,8 @@ struct fm10k_xstats_name_off fm10k_hw_stats_tx_q_strings[] = { #define FM10K_NB_XSTATS (FM10K_NB_HW_XSTATS + FM10K_MAX_QUEUES_PF * \ (FM10K_NB_RX_Q_XSTATS + FM10K_NB_TX_Q_XSTATS)) +static int +fm10k_dev_rxq_interrupt_setup(struct rte_eth_dev *dev); static void fm10k_mbx_initlock(struct fm10k_hw *hw) @@ -687,6 +691,7 @@ static int fm10k_dev_rx_init(struct rte_eth_dev *dev) { struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle; int i, ret; struct fm10k_rx_queue *rxq; uint64_t base_addr; @@ -694,10 +699,25 @@ 
fm10k_dev_rx_init(struct rte_eth_dev *dev) uint32_t rxdctl = FM10K_RXDCTL_WRITE_BACK_MIN_DELAY; uint16_t buf_size; - /* Disable RXINT to avoid possible interrupt */ - for (i = 0; i < hw->mac.max_queues; i++) + /* enable RXINT for interrupt mode */ + i = 0; + if (rte_intr_dp_is_en(intr_handle)) { + for (; i < dev->data->nb_rx_queues; i++) { + FM10K_WRITE_REG(hw, FM10K_RXINT(i), Q2V(dev, i)); + if (hw->mac.type == fm10k_mac_pf) + FM10K_WRITE_REG(hw, FM10K_ITR(Q2V(dev, i)), + FM10K_ITR_AUTOMASK | + FM10K_ITR_MASK_CLEAR); + else + FM10K_WRITE_REG(hw, FM10K_VFITR(Q2V(dev, i)), + FM10K_ITR_AUTOMASK | +
[dpdk-dev] [PATCH v5 3/6] fm10k: remove rx queue interrupts when dev stops
The previous dev_stop function only stops the rx/tx queues. This patch adds logic to disable the rx queue interrupts and to clean up the datapath event and queue/vector mapping. v5 changes: - remove one unnecessary NULL check for rte_free v2 changes: - split one big patch into three smaller ones Signed-off-by: Shaopeng He --- drivers/net/fm10k/fm10k_ethdev.c | 20 1 file changed, 20 insertions(+) diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c index 583335a..da78389 100644 --- a/drivers/net/fm10k/fm10k_ethdev.c +++ b/drivers/net/fm10k/fm10k_ethdev.c @@ -1127,6 +1127,8 @@ fm10k_dev_start(struct rte_eth_dev *dev) static void fm10k_dev_stop(struct rte_eth_dev *dev) { + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle; int i; PMD_INIT_FUNC_TRACE(); @@ -1138,6 +1140,24 @@ fm10k_dev_stop(struct rte_eth_dev *dev) if (dev->data->rx_queues) for (i = 0; i < dev->data->nb_rx_queues; i++) fm10k_dev_rx_queue_stop(dev, i); + + /* Disable datapath event */ + if (rte_intr_dp_is_en(intr_handle)) { + for (i = 0; i < dev->data->nb_rx_queues; i++) { + FM10K_WRITE_REG(hw, FM10K_RXINT(i), + 3 << FM10K_RXINT_TIMER_SHIFT); + if (hw->mac.type == fm10k_mac_pf) + FM10K_WRITE_REG(hw, FM10K_ITR(Q2V(dev, i)), + FM10K_ITR_MASK_SET); + else + FM10K_WRITE_REG(hw, FM10K_VFITR(Q2V(dev, i)), + FM10K_ITR_MASK_SET); + } + } + /* Clean datapath event and queue/vec mapping */ + rte_intr_efd_disable(intr_handle); + rte_free(intr_handle->intr_vec); + intr_handle->intr_vec = NULL; } static void -- 1.9.3
[dpdk-dev] [PATCH v5 4/6] fm10k: add rx queue interrupt en/dis functions
Interrupt mode framework has enable/disable functions for individual rx queue, this patch implements these two functions. v2 changes: - split one big patch into three smaller ones Signed-off-by: Shaopeng He --- drivers/net/fm10k/fm10k_ethdev.c | 33 + 1 file changed, 33 insertions(+) diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c index da78389..06bfffd 100644 --- a/drivers/net/fm10k/fm10k_ethdev.c +++ b/drivers/net/fm10k/fm10k_ethdev.c @@ -2205,6 +2205,37 @@ fm10k_dev_disable_intr_vf(struct rte_eth_dev *dev) } static int +fm10k_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id) +{ + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + + /* Enable ITR */ + if (hw->mac.type == fm10k_mac_pf) + FM10K_WRITE_REG(hw, FM10K_ITR(Q2V(dev, queue_id)), + FM10K_ITR_AUTOMASK | FM10K_ITR_MASK_CLEAR); + else + FM10K_WRITE_REG(hw, FM10K_VFITR(Q2V(dev, queue_id)), + FM10K_ITR_AUTOMASK | FM10K_ITR_MASK_CLEAR); + rte_intr_enable(&dev->pci_dev->intr_handle); + return 0; +} + +static int +fm10k_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id) +{ + struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); + + /* Disable ITR */ + if (hw->mac.type == fm10k_mac_pf) + FM10K_WRITE_REG(hw, FM10K_ITR(Q2V(dev, queue_id)), + FM10K_ITR_MASK_SET); + else + FM10K_WRITE_REG(hw, FM10K_VFITR(Q2V(dev, queue_id)), + FM10K_ITR_MASK_SET); + return 0; +} + +static int fm10k_dev_rxq_interrupt_setup(struct rte_eth_dev *dev) { struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private); @@ -2539,6 +2570,8 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = { .tx_queue_setup = fm10k_tx_queue_setup, .tx_queue_release = fm10k_tx_queue_release, .rx_descriptor_done = fm10k_dev_rx_descriptor_done, + .rx_queue_intr_enable = fm10k_dev_rx_queue_intr_enable, + .rx_queue_intr_disable = fm10k_dev_rx_queue_intr_disable, .reta_update= fm10k_reta_update, .reta_query = fm10k_reta_query, .rss_hash_update= 
fm10k_rss_hash_update, -- 1.9.3
[dpdk-dev] [PATCH v5 5/6] fm10k: make sure default VID available in dev_init
When the PF establishes a connection with the Switch Manager, it receives a logic port range from the SM and registers certain logic ports from that range; a default VID is then sent back by the SM. This whole transaction needs to be finished in dev_init, because otherwise the interrupt setting will be changed in dev_start according to the RX queue number, which will probably cause the transaction to fail. Signed-off-by: Shaopeng He --- drivers/net/fm10k/fm10k_ethdev.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c index 06bfffd..832a3fe 100644 --- a/drivers/net/fm10k/fm10k_ethdev.c +++ b/drivers/net/fm10k/fm10k_ethdev.c @@ -2817,6 +2817,21 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev) fm10k_mbx_unlock(hw); + /* Make sure default VID is ready before going forward. */ + if (hw->mac.type == fm10k_mac_pf) { + for (i = 0; i < MAX_QUERY_SWITCH_STATE_TIMES; i++) { + if (hw->mac.default_vid) + break; + /* Delay some time to acquire async port VLAN info. */ + rte_delay_us(WAIT_SWITCH_MSG_US); + } + + if (!hw->mac.default_vid) { + PMD_INIT_LOG(ERR, "default VID is not ready"); + return -1; + } + } + /* Add default mac address */ fm10k_MAC_filter_set(dev, hw->mac.addr, true, MAIN_VSI_POOL_NUMBER); -- 1.9.3
[dpdk-dev] [PATCH v5 6/6] l3fwd-power: fix a memory leak for non-ip packet
Previously, l3fwd-power only processed IPv4 and IPv6 packets; the mbuf of any other packet was not released, causing a memory leak. This patch fixes this issue. v4 change: - update release note inside the patch Signed-off-by: Shaopeng He --- doc/guides/rel_notes/release_2_3.rst | 6 ++ examples/l3fwd-power/main.c | 3 ++- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index 2cb5ebd..fc871ab 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -25,6 +25,12 @@ Libraries Examples +* **l3fwd-power: Fixed memory leak for non-ip packet.** + + Fixed an issue in l3fwd-power where, on receiving a packet of a + type other than IPv4 or IPv6, the mbuf was not released, causing + a memory leak. + Other ~ diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index 828c18a..d9cd848 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -714,7 +714,8 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, /* We don't currently handle IPv6 packets in LPM mode. */ rte_pktmbuf_free(m); #endif - } + } else + rte_pktmbuf_free(m); } -- 1.9.3
[dpdk-dev] [PATCH] i40e: fix inverted check for ETH_TXQ_FLAGS_NOREFCOUNT
The no-refcount path was being taken without the application opting in to it. Reported-by: Mike Stolarchuk Signed-off-by: Rich Lane --- drivers/net/i40e/i40e_rxtx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 39d94ec..d0bdeb9 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -1762,7 +1762,7 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq) for (i = 0; i < txq->tx_rs_thresh; i++) rte_prefetch0((txep + i)->mbuf); - if (!(txq->txq_flags & (uint32_t)ETH_TXQ_FLAGS_NOREFCOUNT)) { + if (txq->txq_flags & (uint32_t)ETH_TXQ_FLAGS_NOREFCOUNT) { for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) { rte_mempool_put(txep->mbuf->pool, txep->mbuf); txep->mbuf = NULL; -- 1.9.1
[dpdk-dev] [PATCH v3 0/2] provide rte_pktmbuf_alloc_bulk API and call it in vhost dequeue
v3 changes: move while after case 0 add context about duff's device and why we use while loop in the commit message v2 changes: unroll the loop in rte_pktmbuf_alloc_bulk to help the performance For symmetric rte_pktmbuf_free_bulk, if the app knows in its scenarios their mbufs are all simple mbufs, i.e meet the following requirements: * no multiple segments * not indirect mbuf * refcnt is 1 * belong to the same mbuf memory pool, it could directly call rte_mempool_put to free the bulk of mbufs, otherwise rte_pktmbuf_free_bulk has to call rte_pktmbuf_free to free the mbuf one by one. This patchset will not provide this symmetric implementation. Huawei Xie (2): mbuf: provide rte_pktmbuf_alloc_bulk API vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue lib/librte_mbuf/rte_mbuf.h| 49 +++ lib/librte_vhost/vhost_rxtx.c | 35 +++ 2 files changed, 71 insertions(+), 13 deletions(-) -- 1.8.1.4
[dpdk-dev] [PATCH v3 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API
v3 changes: move while after case 0 add context about Duff's device and why we use a while loop in the commit message v2 changes: unroll the loop a bit to help the performance rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs. There is a related thread about this bulk API. http://dpdk.org/dev/patchwork/patch/4718/ Thanks to Konstantin's loop unrolling. Attached is the Wikipedia page about Duff's device. It explains this performance optimization through loop unwinding, which is also the most dramatic use of case-label fall-through. https://en.wikipedia.org/wiki/Duff%27s_device In our implementation, we use a while() loop rather than a do {} while() loop because we cannot assume count is strictly positive; the while() loop saves an explicit check for count being zero. Signed-off-by: Gerald Rogers Signed-off-by: Huawei Xie Acked-by: Konstantin Ananyev --- lib/librte_mbuf/rte_mbuf.h | 49 ++ 1 file changed, 49 insertions(+) diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index f234ac9..3381c28 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -1336,6 +1336,55 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp) } /** + * Allocate a bulk of mbufs, initialize refcnt and reset the fields to default + * values. + * + * @param pool + *The mempool from which mbufs are allocated. 
+ * @param mbufs + *Array of pointers to mbufs + * @param count + *Array size + * @return + * - 0: Success + */ +static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool, +struct rte_mbuf **mbufs, unsigned count) +{ + unsigned idx = 0; + int rc; + + rc = rte_mempool_get_bulk(pool, (void **)mbufs, count); + if (unlikely(rc)) + return rc; + + switch (count % 4) { + case 0: while (idx != count) { + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); + rte_mbuf_refcnt_set(mbufs[idx], 1); + rte_pktmbuf_reset(mbufs[idx]); + idx++; + case 3: + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); + rte_mbuf_refcnt_set(mbufs[idx], 1); + rte_pktmbuf_reset(mbufs[idx]); + idx++; + case 2: + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); + rte_mbuf_refcnt_set(mbufs[idx], 1); + rte_pktmbuf_reset(mbufs[idx]); + idx++; + case 1: + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); + rte_mbuf_refcnt_set(mbufs[idx], 1); + rte_pktmbuf_reset(mbufs[idx]); + idx++; + } + } + return 0; +} + +/** * Attach packet mbuf to another packet mbuf. * * After attachment we refer the mbuf we attached as 'indirect', -- 1.8.1.4
[dpdk-dev] [PATCH v3 2/2] vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue
pre-allocate a bulk of mbufs instead of allocating one mbuf a time on demand Signed-off-by: Gerald Rogers Signed-off-by: Huawei Xie Acked-by: Konstantin Ananyev --- lib/librte_vhost/vhost_rxtx.c | 35 ++- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index bbf3fac..0faae58 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -576,6 +576,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, uint32_t i; uint16_t free_entries, entry_success = 0; uint16_t avail_idx; + uint8_t alloc_err = 0; + uint8_t seg_num; if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) { RTE_LOG(ERR, VHOST_DATA, @@ -609,6 +611,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n", dev->device_fh, free_entries); + + if (unlikely(rte_pktmbuf_alloc_bulk(mbuf_pool, + pkts, free_entries)) < 0) { + RTE_LOG(ERR, VHOST_DATA, + "Failed to bulk allocating %d mbufs\n", free_entries); + return 0; + } + /* Retrieve all of the head indexes first to avoid caching issues. */ for (i = 0; i < free_entries; i++) head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 1)]; @@ -621,9 +631,9 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, uint32_t vb_avail, vb_offset; uint32_t seg_avail, seg_offset; uint32_t cpy_len; - uint32_t seg_num = 0; + seg_num = 0; struct rte_mbuf *cur; - uint8_t alloc_err = 0; + desc = &vq->desc[head[entry_success]]; @@ -654,13 +664,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, vq->used->ring[used_idx].id = head[entry_success]; vq->used->ring[used_idx].len = 0; - /* Allocate an mbuf and populate the structure. 
*/ - m = rte_pktmbuf_alloc(mbuf_pool); - if (unlikely(m == NULL)) { - RTE_LOG(ERR, VHOST_DATA, - "Failed to allocate memory for mbuf.\n"); - break; - } + prev = cur = m = pkts[entry_success]; seg_offset = 0; seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM; cpy_len = RTE_MIN(vb_avail, seg_avail); @@ -668,8 +672,6 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, PRINT_PACKET(dev, (uintptr_t)vb_addr, desc->len, 0); seg_num++; - cur = m; - prev = m; while (cpy_len != 0) { rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, seg_offset), (void *)((uintptr_t)(vb_addr + vb_offset)), @@ -761,16 +763,23 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, cpy_len = RTE_MIN(vb_avail, seg_avail); } - if (unlikely(alloc_err == 1)) + if (unlikely(alloc_err)) break; m->nb_segs = seg_num; - pkts[entry_success] = m; vq->last_used_idx++; entry_success++; } + if (unlikely(alloc_err)) { + uint16_t i = entry_success; + + m->nb_segs = seg_num; + for (; i < free_entries; i++) + rte_pktmbuf_free(pkts[entry_success]); + } + rte_compiler_barrier(); vq->used->idx += entry_success; /* Kick guest if required. */ -- 1.8.1.4
[dpdk-dev] [PATCH] i40e: fix inverted check for ETH_TXQ_FLAGS_NOREFCOUNT
> -Original Message- > From: Rich Lane [mailto:rich.lane at bigswitch.com] > Sent: Wednesday, December 23, 2015 4:08 PM > To: dev at dpdk.org > Cc: Zhang, Helin > Subject: [PATCH] i40e: fix inverted check for ETH_TXQ_FLAGS_NOREFCOUNT > > The no-refcount path was being taken without the application opting in to it. > > Reported-by: Mike Stolarchuk > Signed-off-by: Rich Lane Acked-by: Helin Zhang Thanks for the good catch!
[dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
I want to define a set of general tunneling APIs, which are used to accelerate tunneling packet processing in DPDK. In this RFC patch set, I will explain my idea using some code. 1. Use flow director offload to define a tunnel flow in a pair of queues. flow rule: src IP + dst IP + src port + dst port + tunnel ID (for VXLAN) For example: struct rte_eth_tunnel_conf{ .tunnel_type = VXLAN, .rx_queue = 1, .tx_queue = 1, .filter_type = 'src ip + dst ip + src port + dst port + tunnel id' .flow_tnl { .tunnel_type = VXLAN, .tunnel_id = 100, .remote_mac = 11.22.33.44.55.66, .ip_type = ipv4, .outer_ipv4.src_ip = 192.168.10.1 .outer_ipv4.dst_ip = 10.239.129.11 .src_port = 1000, .dst_port =2000 }; 2. Configure the tunnel flow for a device and for a pair of queues. rte_eth_dev_tunnel_configure(0, &rte_eth_tunnel_conf); If the HW doesn't support encap/decap, this API will register RX decapsulation and TX encapsulation callback functions; space is allocated for the tunnel configuration and a pointer to this newly allocated space is stored as dev->post_rx/tx_burst_cbs[].param. rte_eth_add_rx_callback(port_id, tunnel_conf.rx_queue, rte_eth_tunnel_decap, (void *)tunnel_conf); rte_eth_add_tx_callback(port_id, tunnel_conf.tx_queue, rte_eth_tunnel_encap, (void *)tunnel_conf) 3. Use rte_vxlan_decap_burst() to do decapsulation of tunneling packets. 4. Use rte_vxlan_encap_burst() to do encapsulation of tunneling packets. The 'src ip, dst ip, src port, dst port and tunnel ID' can be obtained from the tunnel configuration, and SIMD is used to accelerate the operation. An example of how to use these APIs: 1) at config phase dev_config(port, ...); tunnel_config(port,...); ... dev_start(port); ... rx_burst(port, rxq,... ); tx_burst(port, txq,...); 2) at the transmitting packet phase Only the outer src/dst MAC addresses need to be set for the TX tunnel configuration in dev->post_tx_burst_cbs[].param. 
In this patch set, I have not finished all of the code; the purpose of sending it now is to collect comments and suggestions on this idea. Jijiang Liu (6): extend rte_eth_tunnel_flow define tunnel flow structure and APIs implement tunnel flow APIs define rte_vxlan_decap/encap implement rte_vxlan_decap/encap i40e tunnel configure drivers/net/i40e/i40e_ethdev.c | 41 + lib/librte_ether/libtunnel/rte_vxlan_opt.c | 251 lib/librte_ether/libtunnel/rte_vxlan_opt.h | 49 ++ lib/librte_ether/rte_eth_ctrl.h| 14 ++- lib/librte_ether/rte_ethdev.h | 28 +++ lib/librte_ether/rte_ethdev.c | 60 ++ 5 files changed, 440 insertions(+), 3 deletions(-) create mode 100644 lib/librte_ether/libtunnel/rte_vxlan_opt.c create mode 100644 lib/librte_ether/libtunnel/rte_vxlan_opt.h -- 1.7.7.6
[dpdk-dev] [RFC PATCH 1/6] rte_ether: extend rte_eth_tunnel_flow structure
The purpose of extending this structure is to support more tunnel filter conditions. Signed-off-by: Jijiang Liu --- lib/librte_ether/rte_eth_ctrl.h | 14 +++--- 1 files changed, 11 insertions(+), 3 deletions(-) diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h index ce224ad..39f52d9 100644 --- a/lib/librte_ether/rte_eth_ctrl.h +++ b/lib/librte_ether/rte_eth_ctrl.h @@ -494,9 +494,17 @@ enum rte_eth_fdir_tunnel_type { * NVGRE */ struct rte_eth_tunnel_flow { - enum rte_eth_fdir_tunnel_type tunnel_type; /**< Tunnel type to match. */ - uint32_t tunnel_id;/**< Tunnel ID to match. TNI, VNI... */ - struct ether_addr mac_addr;/**< Mac address to match. */ + enum rte_eth_tunnel_type tunnel_type; + uint64_t tunnel_id; /**< Tunnel ID to match. TNI, VNI... */ + struct ether_addr outer_src_mac; /* for TX */ + struct ether_addr outer_peer_mac; /* for TX */ + enum rte_tunnel_iptype outer_ip_type; /**< IP address type. */ + union { + struct rte_eth_ipv4_flow outer_ipv4; + struct rte_eth_ipv6_flow outer_ipv6; + } outer_ip_addr; + uint16_t dst_port; + uint16_t src_port; }; /** -- 1.7.7.6
[dpdk-dev] [RFC PATCH 3/6] rte_ether: implement tunnel config API
Signed-off-by: Jijiang Liu --- lib/librte_ether/rte_ethdev.c | 60 + 1 files changed, 60 insertions(+), 0 deletions(-) diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index c3eed49..6725398 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -1004,6 +1004,66 @@ rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q, return 0; } +int +rte_eth_dev_tunnel_configure(uint8_t port_id, +struct rte_eth_tunnel_conf *tunnel_conf) +{ + struct rte_eth_dev *dev; + struct rte_eth_dev_info dev_info; + int diag; + + /* This function is only safe when called from the primary process + * * in a multi-process setup*/ + RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); + + dev = &rte_eth_devices[port_id]; + + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP); + + /* +* * Check that the numbers of RX and TX queues are not greater +* * than the configured number of RX and TX queues supported by the +* * configured device. 
+* */ + (*dev->dev_ops->dev_infos_get)(dev, &dev_info); + if (tunnel_conf->rx_queue > dev->data->nb_rx_queues - 1) { + RTE_PMD_DEBUG_TRACE("ethdev port_id=%d nb_rx_queues=%d > %d\n", + port_id, nb_rx_q, dev_info.max_rx_queues); + return -EINVAL; + } + + if (tunnel_conf->tx_queue > dev->data->nb_rx_queues -1 ) { + RTE_PMD_DEBUG_TRACE("ethdev port_id=%d nb_tx_queues=%d > %d\n", + port_id, nb_tx_q, dev_info.max_tx_queues); + return -EINVAL; + } + + tunnel_conf->tunnel_flow = rte_zmalloc(NULL, + sizeof(struct rte_eth_tunnel_flow) + * tunnel_conf->nb_flow, 0); + + /* Copy the dev_conf parameter into the dev structure */ + memcpy(dev->data->dev_conf.tunnel_conf[tunnel_conf->rx_queue], + tunnel_conf, sizeof(struct rte_eth_tunnel_conf)); + + rte_eth_add_rx_callback(port_id, tunnel_conf->rx_queue, + rte_eth_tunnel_decap, (void *)tunnel_conf); + + rte_eth_add_tx_callback(port_id, tunnel_conf->tx_queue, + rte_eth_tunnel_encap, (void *)tunnel_conf) + + diag = (*dev->dev_ops->tunnel_configure)(dev); + if (diag != 0) { + RTE_PMD_DEBUG_TRACE("port%d dev_tunnel_configure = %d\n", + port_id, diag); + return diag; + } + + return 0; +} + static void rte_eth_dev_config_restore(uint8_t port_id) { -- 1.7.7.6
[dpdk-dev] [RFC PATCH 2/6] rte_ether: define tunnel flow structure and APIs
Add the struct 'rte_eth_tunnel_conf' and the tunnel configuration API. Signed-off-by: Jijiang Liu --- lib/librte_ether/rte_ethdev.h | 28 1 files changed, 28 insertions(+), 0 deletions(-) diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h index bada8ad..cb4d9a2 100644 --- a/lib/librte_ether/rte_ethdev.h +++ b/lib/librte_ether/rte_ethdev.h @@ -630,6 +630,18 @@ struct rte_eth_rxconf { uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */ }; +/** + * A structure used to configure tunnel flow of an Ethernet port. + */ +struct rte_eth_tunnel_conf { + uint16_t rx_queue; + uint16_t tx_queue; + uint16_t udp_tunnel_port; + uint16_t nb_flow; + uint16_t filter_type; + struct rte_eth_tunnel_flow *tunnel_flow; +}; + #define ETH_TXQ_FLAGS_NOMULTSEGS 0x0001 /**< nb_segs=1 for all mbufs */ #define ETH_TXQ_FLAGS_NOREFCOUNT 0x0002 /**< refcnt can be ignored */ #define ETH_TXQ_FLAGS_NOMULTMEMP 0x0004 /**< all bufs come from same mempool */ @@ -810,6 +822,7 @@ struct rte_eth_conf { #define DEV_RX_OFFLOAD_TCP_CKSUM 0x0008 #define DEV_RX_OFFLOAD_TCP_LRO 0x0010 #define DEV_RX_OFFLOAD_QINQ_STRIP 0x0020 +#define DEV_RX_OFFLOAD_TUNNEL_DECAP 0x0040 /** * TX offload capabilities of a device. 
@@ -1210,6 +1223,10 @@ typedef int (*eth_udp_tunnel_add_t)(struct rte_eth_dev *dev, typedef int (*eth_udp_tunnel_del_t)(struct rte_eth_dev *dev, struct rte_eth_udp_tunnel *tunnel_udp); + +typedef int (*eth_tunnel_flow_conf_t)(struct rte_eth_dev *dev, + struct rte_eth_tunnel_conf *tunnel_conf); + /**< @internal Delete tunneling UDP info */ typedef int (*eth_set_mc_addr_list_t)(struct rte_eth_dev *dev, @@ -1385,6 +1402,7 @@ struct eth_dev_ops { eth_set_vf_vlan_filter_t set_vf_vlan_filter; /**< Set VF VLAN filter */ eth_udp_tunnel_add_t udp_tunnel_add; eth_udp_tunnel_del_t udp_tunnel_del; + eth_tunnel_flow_conf_t tunnel_configure; eth_set_queue_rate_limit_t set_queue_rate_limit; /**< Set queue rate limit */ eth_set_vf_rate_limit_tset_vf_rate_limit; /**< Set VF rate limit */ /** Update redirection table. */ @@ -1821,6 +1839,16 @@ extern int rte_eth_dev_configure(uint8_t port_id, const struct rte_eth_conf *eth_conf); /** + * Configure an Ethernet device for tunnelling packet. + * + * @return + * - 0: Success, device configured. + *- <0: Error code returned by the driver configuration function. + */ +extern int rte_eth_dev_tunnel_configure(uint8_t port_id, + struct rte_eth_tunnel_conf *tunnel_conf); + +/** * Allocate and set up a receive queue for an Ethernet device. * * The function allocates a contiguous block of memory for *nb_rx_desc* -- 1.7.7.6
[dpdk-dev] [RFC PATCH 5/6] rte_ether: implement encap and decap APIs
Using SIMD instruction to accelarate encapsulation operation. Signed-off-by: Jijiang Liu --- lib/librte_ether/libtunnel/rte_vxlan_opt.c | 251 1 files changed, 251 insertions(+), 0 deletions(-) create mode 100644 lib/librte_ether/libtunnel/rte_vxlan_opt.c diff --git a/lib/librte_ether/libtunnel/rte_vxlan_opt.c b/lib/librte_ether/libtunnel/rte_vxlan_opt.c new file mode 100644 index 000..e59ed2c --- /dev/null +++ b/lib/librte_ether/libtunnel/rte_vxlan_opt.c @@ -0,0 +1,251 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include "vxlan_opt.h" + +#ifndef __INTEL_COMPILER +#pragma GCC diagnostic ignored "-Wcast-qual" +#endif + +#pragma GCC diagnostic ignored "-Wstrict-aliasing" + +#define PORT_MIN49152 +#define PORT_MAX65535 +#define PORT_RANGE ((PORT_MAX - PORT_MIN) + 1) + +#define DUMMY_FOR_TEST +#define RTE_DEFAULT_VXLAN_PORT 4789 + +#define LOOP 4 +#define MAC_LEN6 +#define PREFIX ETHER_HDR_LEN + 4 +#define UDP_PRE_SZ (sizeof(struct udp_hdr) + sizeof(struct vxlan_hdr)) +#define IP_PRE_SZ (UDP_PRE_SZ + sizeof(struct ipv4_hdr)) +#define VXLAN_PKT_HDR_SIZE (IP_PRE_SZ + ETHER_HDR_LEN) + +#define VXLAN_SIZE sizeof(struct vxlan_hdr) +#define INNER_PRE_SZ (14 + 20 + 8 + 8) +#define DECAP_OFFSET (16 + 8 + 8) +#define DETECT_OFFSET 12 + +struct eth_pkt_info { + uint8_t l2_len; + uint16_t ethertype; + uint16_t l3_len; + uint16_t l4_proto; + uint16_t l4_len; +}; + +/* 16Bytes tx meta data */ +struct vxlan_tx_meta { + uint32_t sip; + uint32_t dip; + uint32_t vni; + uint16_t sport; +} __attribute__((__aligned__(16))); + + +/* Parse an IPv4 header to fill l3_len, l4_len, and l4_proto */ +static void +parse_ipv4(struct ipv4_hdr *ipv4_hdr, struct eth_pkt_info *info) +{ + struct tcp_hdr *tcp_hdr; + + info->l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4; + info->l4_proto = ipv4_hdr->next_proto_id; + + /* only fill l4_len for TCP, it's 
useful for TSO */ + if (info->l4_proto == IPPROTO_TCP) { + tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + info->l3_len); + info->l4_len = (tcp_hdr->data_off & 0xf0) >> 2; + } else + info->l4_len = 0; +} + +/* Parse an IPv6 header to fill l3_len, l4_len, and l4_proto */ +static void +parse_ipv6(struct ipv6_hdr *ipv6_hdr, struct eth_pkt_info *info) +{ + struct tcp_hdr *tcp_hdr; + + info->l3_len = sizeof(struct ipv6_hdr); + info->l4_proto = ipv6_hdr->proto; + + /* only fill l4_len for TCP, it's useful for TSO */ + if (info->l4_proto == IPPROTO_TCP) { + tcp_hdr = (struct tcp_hdr *)((char *)ipv6_hdr + info->l3_len); + info->l4_len = (tcp_hdr->data_off & 0xf0) >> 2; + } else + info->l4_len = 0; +} + +/* + * Parse an ethernet header to fill the ethertype, l2_len, l3_len and + * ipproto. This function is able to recognize IPv4/IPv6 with one optional vlan + * header. The l4_len argument is only set in case of TCP (useful for TSO). + */ +static void +parse_ethernet(struct ether_hdr *eth_hdr, struct eth_pkt_info *info) +{
[dpdk-dev] [RFC PATCH 6/6] driver/i40e: tunnel configure in i40e
Add i40e_udp_tunnel_flow_configure() to implement the configuration of a flow rule with 'src IP, dst IP, src port, dst port and tunnel ID' using the flow director. Signed-off-by: Jijiang Liu --- drivers/net/i40e/i40e_ethdev.c | 41 1 files changed, 41 insertions(+), 0 deletions(-) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 7e03a1f..7d8c8d7 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -469,6 +469,7 @@ static const struct eth_dev_ops i40e_eth_dev_ops = { .rss_hash_conf_get= i40e_dev_rss_hash_conf_get, .udp_tunnel_add = i40e_dev_udp_tunnel_add, .udp_tunnel_del = i40e_dev_udp_tunnel_del, + .tunnel_configure = i40e_dev_tunnel_configure, .filter_ctrl = i40e_dev_filter_ctrl, .rxq_info_get = i40e_rxq_info_get, .txq_info_get = i40e_txq_info_get, @@ -6029,6 +6030,46 @@ i40e_dev_udp_tunnel_del(struct rte_eth_dev *dev, return ret; } +static int +i40e_udp_tunnel_flow_configure(struct i40e_pf *pf, struct rte_eth_tunnel_conf *tunnel_conf) +{ + int idx, ret; + uint8_t filter_idx; + struct i40e_hw *hw = I40E_PF_TO_HW(pf); + + /* set filter with src IP + dst IP + src port + dst port + tunnel id */ + /* flow director setting */ + + return 0; +} + +/* Configure a tunnel flow */ +static int +i40e_dev_tunnel_configure(struct rte_eth_dev *dev, +struct rte_eth_tunnel_conf *tunnel_conf) +{ + int ret = 0; + struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private); + + if (tunnel_conf == NULL) + return -EINVAL; + + switch (tunnel_conf->prot_type) { + case RTE_TUNNEL_TYPE_VXLAN: + case RTE_TUNNEL_TYPE_GENEVE: + case RTE_TUNNEL_TYPE_TEREDO: + ret = i40e_udp_tunnel_flow_configure(pf, tunnel_conf); + break; + + default: + PMD_DRV_LOG(ERR, "Invalid tunnel type"); + ret = -1; + break; + } + + return ret; +} + /* Calculate the maximum number of contiguous PF queues that are configured */ static int i40e_pf_calc_configured_queues_num(struct i40e_pf *pf) -- 1.7.7.6
[dpdk-dev] [RFC PATCH 4/6] rte_ether: define rte_eth_vxlan_decap and rte_eth_vxlan_encap
These function parameters should be the same as those of the callback functions (rte_rx/tx_callback_fn), but we can redefine some parameters as 'unused'. Signed-off-by: Jijiang Liu --- lib/librte_ether/libtunnel/rte_vxlan_opt.h | 49 1 files changed, 49 insertions(+), 0 deletions(-) create mode 100644 lib/librte_ether/libtunnel/rte_vxlan_opt.h diff --git a/lib/librte_ether/libtunnel/rte_vxlan_opt.h b/lib/librte_ether/libtunnel/rte_vxlan_opt.h new file mode 100644 index 000..d9412fc --- /dev/null +++ b/lib/librte_ether/libtunnel/rte_vxlan_opt.h @@ -0,0 +1,49 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2015 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_VXLAN_OPT_H_ +#define _RTE_VXLAN_OPT_H_ + +extern void rte_vxlan_encap_burst (uint8_t port, uint16_t queue, + struct rte_mbuf *pkts[], + uint16_t nb_pkts, + uint16_t max_pkts, + void *user_param); + +extern uint16_t rte_vxlan_decap_burst(uint8_t port, + uint16_t queue, + struct rte_mbuf *pkts[], + uint16_t nb_pkts, + void *user_param); + +#endif /* _RTE_VXLAN_OPT_H_ */ -- 1.7.7.6
[dpdk-dev] [PATCH] doc: add Vector FM10K introductions
From: "Chen Jing D(Mark)" Add introductions on how to enable Vector FM10K Rx/Tx functions, the preconditions and assumptions on Rx/Tx configuration parameters. The new content also lists the limitations of vector, so app/customer can do better to select best Rx/Tx functions. Signed-off-by: Chen Jing D(Mark) --- doc/guides/nics/fm10k.rst | 89 + 1 files changed, 89 insertions(+), 0 deletions(-) diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst index 4206b7f..54b761c 100644 --- a/doc/guides/nics/fm10k.rst +++ b/doc/guides/nics/fm10k.rst @@ -34,6 +34,95 @@ FM10K Poll Mode Driver The FM10K poll mode driver library provides support for the Intel FM1 (FM10K) family of 40GbE/100GbE adapters. +Vector PMD for FM10K + +Vector PMD uses Intel? SIMD instructions to optimize packet I/O. +It improves load/store bandwidth efficiency of L1 data cache by using a wider +SSE/AVX register 1 (1). +The wider register gives space to hold multiple packet buffers so as to save +instruction number when processing bulk of packets. + +There is no change to PMD API. The RX/TX handler are the only two entries for +vPMD packet I/O. They are transparently registered at runtime RX/TX execution +if all condition checks pass. + +1. To date, only an SSE version of FM10K vPMD is available. +To ensure that vPMD is in the binary code, ensure that the option +CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y is in the configure file. + +Some constraints apply as pre-conditions for specific optimizations on bulk +packet transfers. The following sections explain RX and TX constraints in the +vPMD. + +RX Constraints +~~ + +Prerequisites and Pre-conditions + +Number of descriptor ring must be power of 2. This is the assumptions for +Vector RX. With this pre-condition, ring pointer can easily scroll back to head +after hitting tail without conditional check. Besides that, Vector RX can use +it to do bit mask by ``ring_size - 1``. 
+ +Features not Supported by Vector RX PMD +^^ +Some features are not supported when trying to increase the throughput in vPMD. +They are: + +* IEEE1588 + +* FDIR + +* Header split + +* RX checksum offload + +Other features are supported using optional MACRO configuration. They include: + +* HW VLAN strip + +* L3/L4 packet type + +These are enabled by the RX_OLFLAGS macro (RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y). + +To guarantee the constraint, configuration flags in dev_conf.rxmode will be +checked: + +* hw_vlan_extend + +* hw_ip_checksum + +* header_split + +* fdir_conf->mode + +RX Burst Size +^ + +vPMD is focused on high throughput and processes 4 packets at a time, so it +requires the RX burst size to be at least 4 per call. It returns zero if +nb_pkt < 4 in the receive handler. If nb_pkt is not a multiple of 4, a floor +alignment is applied. + +TX Constraint +~ + +Features not Supported by TX Vector PMD +^^ + +TX vPMD only works when txq_flags is set to FM10K_SIMPLE_TX_FLAG. +This means that it does not support TX multi-segment packets, VLAN offload, or +TX checksum offload. The following MACROs are used for these three features: + +* ETH_TXQ_FLAGS_NOMULTSEGS + +* ETH_TXQ_FLAGS_NOVLANOFFL + +* ETH_TXQ_FLAGS_NOXSUMSCTP + +* ETH_TXQ_FLAGS_NOXSUMUDP + +* ETH_TXQ_FLAGS_NOXSUMTCP Limitations --- -- 1.7.7.6
[dpdk-dev] [PATCH] hash: fix CRC32c computation
Is it suitable to put so much code in the commit log? Thanks, Michael On 12/22/2015 5:36 PM, Didier Pallard wrote: > As demonstrated by the following code, CRC32c computation is not valid > when buffer length is not a multiple of 4 bytes: > (Output obtained by code below) > > CRC of 1 NULL bytes expected: 0x527d5351 > soft: 527d5351 > rte accelerated: 48674bc7 > rte soft: 48674bc7 > CRC of 2 NULL bytes expected: 0xf16177d2 > soft: f16177d2 > rte accelerated: 48674bc7 > rte soft: 48674bc7 > CRC of 2x1 NULL bytes expected: 0xf16177d2 > soft: f16177d2 > rte accelerated: 8c28b28a > rte soft: 8c28b28a > CRC of 3 NULL bytes expected: 0x6064a37a > soft: 6064a37a > rte accelerated: 48674bc7 > rte soft: 48674bc7 > CRC of 4 NULL bytes expected: 0x48674bc7 > soft: 48674bc7 > rte accelerated: 48674bc7 > rte soft: 48674bc7 > > Values returned by the rte_hash_crc functions do not match the ones > computed by a trivial crc32c implementation. > > ARM code is a guess, it is not tested, neither compiled. > > code showing the problem: > > uint8_t null_test[32] = {0}; > > static uint32_t crc32c_trivial(uint8_t *buffer, uint32_t length, uint32_t crc) > { > uint32_t i, j; > for (i = 0; i < length; ++i) > { > crc = crc ^ buffer[i]; > for (j = 0; j < 8; j++) > crc = (crc >> 1) ^ 0x80000000 ^ ((~crc & 1) * 0x82f63b78); > } > return crc; > } > > void hash_test(void); > void hash_test(void) > { > printf("CRC of 1 nul byte expected: 0x527d5351\n"); > printf("soft: %08x\n", crc32c_trivial(null_test, 1, 0)); > rte_hash_crc_init_alg(); > printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 1, > 0xffffffff)); > rte_hash_crc_set_alg(CRC32_SW); > printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 1, 0xffffffff)); > > printf("CRC of 2 nul bytes expected: 0xf16177d2\n"); > printf("soft: %08x\n", crc32c_trivial(null_test, 2, 0)); > rte_hash_crc_init_alg(); > printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 2, > 0xffffffff)); > rte_hash_crc_set_alg(CRC32_SW); > printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 2, 0xffffffff)); > > printf("CRC of 2x1 nul bytes expected: 0xf16177d2\n"); > printf("soft: %08x\n", crc32c_trivial(null_test, 1, > crc32c_trivial(null_test, 1, 0))); > rte_hash_crc_init_alg(); > printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 1, > rte_hash_crc(null_test, 1, 0xffffffff))); > rte_hash_crc_set_alg(CRC32_SW); > printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 1, > rte_hash_crc(null_test, 1, 0xffffffff))); > > printf("CRC of 3 nul bytes expected: 0x6064a37a\n"); > printf("soft: %08x\n", crc32c_trivial(null_test, 3, 0)); > rte_hash_crc_init_alg(); > printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 3, > 0xffffffff)); > rte_hash_crc_set_alg(CRC32_SW); > printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 3, 0xffffffff)); > > printf("CRC of 4 nul bytes expected: 0x48674bc7\n"); > printf("soft: %08x\n", crc32c_trivial(null_test, 4, 0)); > rte_hash_crc_init_alg(); > printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 4, > 0xffffffff)); > rte_hash_crc_set_alg(CRC32_SW); > printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 4, 0xffffffff)); > } > > Signed-off-by: Didier Pallard > Acked-by: David Marchand > --- > lib/librte_hash/rte_crc_arm64.h | 64 > lib/librte_hash/rte_hash_crc.h | 125 > +++- > 2 files changed, 162 insertions(+), 27 deletions(-) > > diff --git a/lib/librte_hash/rte_crc_arm64.h b/lib/librte_hash/rte_crc_arm64.h > index 02e26bc..44ef460 100644 > --- a/lib/librte_hash/rte_crc_arm64.h > +++ b/lib/librte_hash/rte_crc_arm64.h > @@ -50,6 +50,28 @@ extern "C" { > #include > > static inline uint32_t > +crc32c_arm64_u8(uint8_t data, uint32_t init_val) > +{ > + asm(".arch armv8-a+crc"); > + __asm__ volatile( > + "crc32cb %w[crc], %w[crc], %b[value]" > + : [crc] "+r" (init_val) > + : [value] "r" (data)); > + return init_val; > +} > + > +static inline uint32_t > +crc32c_arm64_u16(uint16_t data, uint32_t init_val) > +{ > + asm(".arch armv8-a+crc"); > + __asm__ volatile( > + "crc32ch %w[crc], %w[crc], %h[value]" > + : [crc] "+r" (init_val) > + : [value] "r" (data)); > + return
init_val; > +} > + > +static inline uint32_t > crc32c_arm64_u32(uint32_t data, uint32_t init_val) > { > asm(".arch armv8-a+crc"); > @@ -103,6 +125,48 @@ rte_hash_crc_init_alg(void) > } > > /** > + * Use single crc32 instruction to perform a hash on a 1 byte value. > + * Fall back to software crc32 implementation in case arm64 crc intrinsics is
[dpdk-dev] [PATCH] virtio: fix crashes in virtio stats functions
This initialisation of nb_rx_queues and nb_tx_queues has been removed from eth_virtio_dev_init. The nb_rx_queues and nb_tx_queues were being initialised in eth_virtio_dev_init before the tx_queues and rx_queues arrays were allocated. The arrays are allocated when the ethdev port is configured and the nb_tx_queues and nb_rx_queues are initialised. If any of the following functions were called before the ethdev port was configured there was a segmentation fault because rx_queues and tx_queues were NULL: rte_eth_stats_get rte_eth_stats_reset rte_eth_xstats_get rte_eth_xstats_reset Fixes: 823ad647950a ("virtio: support multiple queues") Signed-off-by: Bernard Iremonger --- drivers/net/virtio/virtio_ethdev.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index d928339..5ef0752 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1378,9 +1378,6 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev) hw->max_tx_queues = 1; } - eth_dev->data->nb_rx_queues = hw->max_rx_queues; - eth_dev->data->nb_tx_queues = hw->max_tx_queues; - PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d hw->max_tx_queues=%d", hw->max_rx_queues, hw->max_tx_queues); PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x", -- 2.6.3
[dpdk-dev] [PATCH 0/8] bonding: fixes and enhancements
Hi Stephen, > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen > Hemminger > Sent: Friday, December 4, 2015 5:14 PM > To: Doherty, Declan > Cc: dev at dpdk.org > Subject: [dpdk-dev] [PATCH 0/8] bonding: fixes and enhancements > > These are bug fixes and some small enhancements to allow bonding to work > with external control (teamd). Please consider integrating these into DPDK > 2.2 > > Eric Kinzie (8): > bond: use existing enslaved device queues > bond mode 4: copy entire config structure > bond mode 4: do not ignore multicast > bond mode 4: allow external state machine > bond: active slaves with no primary > bond: handle slaves with fewer queues than bonding device > bond: per-slave intermediate rx ring > bond: do not activate slave twice > > app/test/test_link_bonding_mode4.c| 7 +- > drivers/net/bonding/rte_eth_bond_8023ad.c | 174 > + > drivers/net/bonding/rte_eth_bond_8023ad.h | 44 + > drivers/net/bonding/rte_eth_bond_8023ad_private.h | 2 + > drivers/net/bonding/rte_eth_bond_api.c| 48 - > drivers/net/bonding/rte_eth_bond_pmd.c| 217 > ++ > drivers/net/bonding/rte_eth_bond_private.h| 9 +- > drivers/net/bonding/rte_eth_bond_version.map | 6 + > 8 files changed, 462 insertions(+), 45 deletions(-) > > -- > 2.1.4 Patches 6 and 7 of this patchset do not apply successfully to DPDK 2.2, a rebase is probably needed. It might be better to split this patchset into a fixes patchset and a new feature patchset. Regards, Bernard.
[dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
Hi Jijiang, I like the idea of a tunnel API very much. I have a few questions. 1. I see that you have only i40e support due to lack of HW tunneling support in other NICs. I don't see how you want to handle tunneling requests for NICs without HW offload. I think that we should have one common function for sending tunneled packets, but the initialization should check the NIC capabilities and call some registered function doing the tunneling in SW in case of lack of HW support. I know that making a tunnel is a very time consuming process, but it makes the API more generic. Similarly, only 3 protocols are supported by i40e in HW, and we can imagine 40 or more different tunnels working with this NIC. With a SW implementation we could support the missing tunnels even for i40e. 2. I understand that we need the RX HW queue defined in struct rte_eth_tunnel_conf, but why is tx_queue necessary? As far as I know, in i40e HW we can set tunneled packet descriptors in any HW queue and receive only on one specific queue. 3. I see a similar problem with receiving tunneled packets on the single queue only. I know that some NICs like fm10k can hash packets and push the same tunnel to many queues. Maybe we should support such an RSS-like feature in the design also. I know that it is not supported by i40e, but it is good to have a more flexible API design. 4. In your implementation you are assuming that there is one tunnel configured per DPDK interface rte_eth_dev_tunnel_configure(uint8_t port_id, +struct rte_eth_tunnel_conf *tunnel_conf) The point of tunnels is the lack of interfaces in the system, because the number of possible VLANs is too small (4095). In DPDK we have only one tunnel per physical port, which is of limited use even with such big acceleration provided by i40e. In normal use cases there is a need for 10,000s of tunnels per interface.
Even for VXLAN we have 24 bits for tunnel definition. I think that we need a special API for sending, like rte_eth_dev_tunnel_send_burst, where we will provide some tunnel number allocated by rte_eth_dev_tunnel_configure, to avoid setting the tunnel-specific information separately in each descriptor. Same on RX: we should provide in struct rte_eth_tunnel_conf the callback functions that will perform some specific action on a received tunnel, which could be pushing the packet to a user ring, setting the tunnel information in the RX descriptor, or something else. 5. I see that you have implementations for VXLAN, TEREDO, and GENEVE tunnels in the i40e drivers. I could not find the implementation of VXLAN encap/decap. Are all files in the patch present? 6. What about QinQ HW tunneling, also supported by i40e HW? I know that the implementation is present in a different place, but why not include QinQ as an additional tunnel? It would be very nice to have all tunnel APIs in a single place. Regards, Mirek > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jijiang Liu > Sent: Wednesday, December 23, 2015 9:50 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs > > I want to define a set of general tunneling APIs, which are used to > accelerate tunneling packet processing in DPDK. > In this RFC patch set, I will explain my idea using some code. > > 1. Using flow director offload to define a tunnel flow in a pair of queues. > > flow rule: src IP + dst IP + src port + dst port + tunnel ID (for VXLAN) > > For example: > struct rte_eth_tunnel_conf{ > .tunnel_type = VXLAN, > .rx_queue = 1, > .tx_queue = 1, > .filter_type = 'src ip + dst ip + src port + dst port + tunnel id' > .flow_tnl { > .tunnel_type = VXLAN, > .tunnel_id = 100, > .remote_mac = 11.22.33.44.55.66, > .ip_type = ipv4, > .outer_ipv4.src_ip = 192.168.10.1 > .outer_ipv4.dst_ip = 10.239.129.11 > .src_port = 1000, > .dst_port =2000 > }; > > 2. 
Configure the tunnel flow for a device and for a pair of queues. > > rte_eth_dev_tunnel_configure(0, &rte_eth_tunnel_conf); > > In this API, it will call the RX decapsulation and TX encapsulation callback > functions if HW doesn't support encap/decap, and > a space will be allocated for the tunnel configuration, storing a pointer to this > newly allocated space as dev->post_rx/tx_burst_cbs[].param. > > rte_eth_add_rx_callback(port_id, tunnel_conf.rx_queue, > rte_eth_tunnel_decap, (void *)tunnel_conf); > rte_eth_add_tx_callback(port_id, tunnel_conf.tx_queue, > rte_eth_tunnel_encap, (void *)tunnel_conf) > > 3. Using rte_vxlan_decap_burst() to do decapsulation of tunneling packets. > > 4. Using rte_vxlan_encap_burst() to do encapsulation of tunneling packets. >The 'src ip, dst ip, src port, dst port and tunnel ID' can be got from the tunnel > configuration. >And SIMD is used to accelerate the operation. > > How to
[dpdk-dev] VFIO no-iommu
Hi Alex, > I've re-posted the unified patch upstream and it should start showing up in > the next linux-next build. I expect the dpdk code won't be merged until > after this gets back into a proper kernel, but could we get the dpdk > modifications posted as rfc for others looking to try it? I have already posted a patch that should work with No-IOMMU. http://dpdk.org/dev/patchwork/patch/9619/ Apologies for not CC-ing you. I too would be interested to know if other people are having any issues with the patch. Thanks, Anatoly
[dpdk-dev] [PATCH v3 2/2] vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue
> > + if (unlikely(alloc_err)) { > + uint16_t i = entry_success; > + > + m->nb_segs = seg_num; > + for (; i < free_entries; i++) > + rte_pktmbuf_free(pkts[entry_success]); -> > rte_pktmbuf_free(pkts[i]); > + } > + > rte_compiler_barrier(); > vq->used->idx += entry_success; > /* Kick guest if required. */ >
[dpdk-dev] [PATCH] hash: fix CRC32c computation
On 23 Dec 2015 at 10:12, "Qiu, Michael" wrote: > > Is it suitable to put so much code in the commit log? It is more explicit than a text/comment. I do not think it should be maintained code. > > Thanks, > Michael > On 12/22/2015 5:36 PM, Didier Pallard wrote: > > As demonstrated by the following code, CRC32c computation is not valid > > when buffer length is not a multiple of 4 bytes: > > (Output obtained by code below) > > > > CRC of 1 NULL bytes expected: 0x527d5351 > > soft: 527d5351 > > rte accelerated: 48674bc7 > > rte soft: 48674bc7 > > CRC of 2 NULL bytes expected: 0xf16177d2 > > soft: f16177d2 > > rte accelerated: 48674bc7 > > rte soft: 48674bc7 > > CRC of 2x1 NULL bytes expected: 0xf16177d2 > > soft: f16177d2 > > rte accelerated: 8c28b28a > > rte soft: 8c28b28a > > CRC of 3 NULL bytes expected: 0x6064a37a > > soft: 6064a37a > > rte accelerated: 48674bc7 > > rte soft: 48674bc7 > > CRC of 4 NULL bytes expected: 0x48674bc7 > > soft: 48674bc7 > > rte accelerated: 48674bc7 > > rte soft: 48674bc7 > > > > Values returned by the rte_hash_crc functions do not match the ones > > computed by a trivial crc32c implementation. > > > > ARM code is a guess, it is not tested, neither compiled. 
> > > > code showing the problem: > > uint8_t null_test[32] = {0}; > > static uint32_t crc32c_trivial(uint8_t *buffer, uint32_t length, uint32_t crc) > > { > > uint32_t i, j; > > for (i = 0; i < length; ++i) > > { > > crc = crc ^ buffer[i]; > > for (j = 0; j < 8; j++) > > crc = (crc >> 1) ^ 0x80000000 ^ ((~crc & 1) * 0x82f63b78); > > } > > return crc; > > } > > > > void hash_test(void); > > void hash_test(void) > > { > > printf("CRC of 1 nul byte expected: 0x527d5351\n"); > > printf("soft: %08x\n", crc32c_trivial(null_test, 1, 0)); > > rte_hash_crc_init_alg(); > > printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 1, 0xffffffff)); > > rte_hash_crc_set_alg(CRC32_SW); > > printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 1, 0xffffffff)); > > > > printf("CRC of 2 nul bytes expected: 0xf16177d2\n"); > > printf("soft: %08x\n", crc32c_trivial(null_test, 2, 0)); > > rte_hash_crc_init_alg(); > > printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 2, 0xffffffff)); > > rte_hash_crc_set_alg(CRC32_SW); > > printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 2, 0xffffffff)); > > > > printf("CRC of 2x1 nul bytes expected: 0xf16177d2\n"); > > printf("soft: %08x\n", crc32c_trivial(null_test, 1, crc32c_trivial(null_test, 1, 0))); > > rte_hash_crc_init_alg(); > > printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 1, rte_hash_crc(null_test, 1, 0xffffffff))); > > rte_hash_crc_set_alg(CRC32_SW); > > printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 1, rte_hash_crc(null_test, 1, 0xffffffff))); > > > > printf("CRC of 3 nul bytes expected: 0x6064a37a\n"); > > printf("soft: %08x\n", crc32c_trivial(null_test, 3, 0)); > > rte_hash_crc_init_alg(); > > printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 3, 0xffffffff)); > > rte_hash_crc_set_alg(CRC32_SW); > > printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 3, 0xffffffff)); > > > > printf("CRC of 4 nul bytes expected: 0x48674bc7\n"); > > printf("soft: %08x\n", crc32c_trivial(null_test, 4, 0)); > > rte_hash_crc_init_alg(); > > printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 4, 0xffffffff)); > > rte_hash_crc_set_alg(CRC32_SW); > > printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 4, 0xffffffff)); > > } > > > > Signed-off-by: Didier Pallard > > Acked-by: David Marchand > > --- > > lib/librte_hash/rte_crc_arm64.h | 64 > > lib/librte_hash/rte_hash_crc.h | 125 +++- > > 2 files changed, 162 insertions(+), 27 deletions(-) > > > > diff --git a/lib/librte_hash/rte_crc_arm64.h b/lib/librte_hash/rte_crc_arm64.h > > index 02e26bc..44ef460 100644 > > --- a/lib/librte_hash/rte_crc_arm64.h > > +++ b/lib/librte_hash/rte_crc_arm64.h > > @@ -50,6 +50,28 @@ extern "C" { > > #include > > > > static inline uint32_t > > +crc32c_arm64_u8(uint8_t data, uint32_t init_val) > > +{ > > + asm(".arch armv8-a+crc"); > > + __asm__ volatile( > > + "crc32cb %w[crc], %w[crc], %b[value]" > > + : [crc] "+r" (init_val) > > + : [value] "r" (data)); > > + return init_val; > > +} > > + > > +static inline uint32_t > > +crc32c_arm64_u16(uint16_t data, uint32_t init_val) > > +{ > > + asm(".arch armv8-a+crc"); > > + __asm__ volatile( > > + "crc32ch %w[crc], %w[crc], %h[value]" > > + : [crc] "+r" (init_val) > > + : [value] "r" (data)); > > + retur
[dpdk-dev] [PATCH v3 2/2] vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue
On 12/23/2015 7:25 PM, linhaifeng wrote: > >> >> +if (unlikely(alloc_err)) { >> +uint16_t i = entry_success; >> + >> +m->nb_segs = seg_num; >> +for (; i < free_entries; i++) >> +rte_pktmbuf_free(pkts[entry_success]); -> >> rte_pktmbuf_free(pkts[i]); >> +} >> + >> rte_compiler_barrier(); >> vq->used->idx += entry_success; >> /* Kick guest if required. */ Very sorry for silly typo. Thanks! >> > >
[dpdk-dev] [PATCH 1/3] librte_ether: remove RTE_PROC_PRIMARY_OR_ERR_RET and RTE_PROC_PRIMARY_OR_RET
Macros RTE_PROC_PRIMARY_OR_ERR_RET and RTE_PROC_PRIMARY_OR_RET are blocking the secondary process from using the APIs. API access should be given to both secondary and primary. Fix minor checkpath issues in rte_ethdev.h Reported-by: Sean Harte Signed-off-by: Reshma Pattan --- lib/librte_ether/rte_ethdev.c | 50 +-- lib/librte_ether/rte_ethdev.h | 20 - 2 files changed, 11 insertions(+), 59 deletions(-) diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index ed971b4..5849102 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -711,10 +711,6 @@ rte_eth_dev_rx_queue_start(uint8_t port_id, uint16_t rx_queue_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -741,10 +737,6 @@ rte_eth_dev_rx_queue_stop(uint8_t port_id, uint16_t rx_queue_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -771,10 +763,6 @@ rte_eth_dev_tx_queue_start(uint8_t port_id, uint16_t tx_queue_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -801,10 +789,6 @@ rte_eth_dev_tx_queue_stop(uint8_t port_id, uint16_t tx_queue_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -874,10 +858,6 @@ 
rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q, struct rte_eth_dev_info dev_info; int diag; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); if (nb_rx_q > RTE_MAX_QUEUES_PER_PORT) { @@ -1059,10 +1039,6 @@ rte_eth_dev_start(uint8_t port_id) struct rte_eth_dev *dev; int diag; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -1096,10 +1072,6 @@ rte_eth_dev_stop(uint8_t port_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_RET(); - RTE_ETH_VALID_PORTID_OR_RET(port_id); dev = &rte_eth_devices[port_id]; @@ -1121,10 +1093,6 @@ rte_eth_dev_set_link_up(uint8_t port_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -1138,10 +1106,6 @@ rte_eth_dev_set_link_down(uint8_t port_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -1155,10 +1119,6 @@ rte_eth_dev_close(uint8_t port_id) { struct rte_eth_dev *dev; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_RET(); - RTE_ETH_VALID_PORTID_OR_RET(port_id); dev = &rte_eth_devices[port_id]; @@ -1183,10 +1143,6 @@ rte_eth_rx_queue_setup(uint8_t port_id, uint16_t rx_queue_id, 
struct rte_eth_dev *dev; struct rte_eth_dev_info dev_info; - /* This function is only safe when called from the primary process -* in a multi-process setup*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); dev = &rte_eth_devices[port_id]; @@ -1266,10 +1222,6 @@ rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id, struct rte_eth_d
[dpdk-dev] [PATCH 3/3] librte_ether: fix rte_eth_dev_configure
User should be able to configure ethdev with zero rx/tx queues, but both should not be zero. After above change, rte_eth_dev_tx_queue_config, rte_eth_dev_rx_queue_config should allocate memory for rx/tx queues only when number of rx/tx queues are nonzero. Signed-off-by: Reshma Pattan --- lib/librte_ether/rte_ethdev.c | 36 1 file changed, 24 insertions(+), 12 deletions(-) diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index 5849102..a7647b6 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -673,7 +673,7 @@ rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues) void **rxq; unsigned i; - if (dev->data->rx_queues == NULL) { /* first time configuration */ + if (dev->data->rx_queues == NULL && nb_queues != 0) { /* first time configuration */ dev->data->rx_queues = rte_zmalloc("ethdev->rx_queues", sizeof(dev->data->rx_queues[0]) * nb_queues, RTE_CACHE_LINE_SIZE); @@ -681,7 +681,7 @@ rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues) dev->data->nb_rx_queues = 0; return -(ENOMEM); } - } else { /* re-configure */ + } else if (dev->data->rx_queues != NULL && nb_queues != 0) { /* re-configure */ RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release, -ENOTSUP); rxq = dev->data->rx_queues; @@ -701,6 +701,13 @@ rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues) dev->data->rx_queues = rxq; + } else if (dev->data->rx_queues != NULL && nb_queues == 0) { + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release, -ENOTSUP); + + rxq = dev->data->rx_queues; + + for (i = nb_queues; i < old_nb_queues; i++) + (*dev->dev_ops->rx_queue_release)(rxq[i]); } dev->data->nb_rx_queues = nb_queues; return 0; @@ -817,7 +824,7 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues) void **txq; unsigned i; - if (dev->data->tx_queues == NULL) { /* first time configuration */ + if (dev->data->tx_queues == NULL && nb_queues != 0) { /* first time configuration */ 
dev->data->tx_queues = rte_zmalloc("ethdev->tx_queues", sizeof(dev->data->tx_queues[0]) * nb_queues, RTE_CACHE_LINE_SIZE); @@ -825,7 +832,7 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues) dev->data->nb_tx_queues = 0; return -(ENOMEM); } - } else { /* re-configure */ + } else if (dev->data->tx_queues != NULL && nb_queues != 0) { /* re-configure */ RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release, -ENOTSUP); txq = dev->data->tx_queues; @@ -845,6 +852,13 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues) dev->data->tx_queues = txq; + } else if (dev->data->tx_queues != NULL && nb_queues == 0) { + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release, -ENOTSUP); + + txq = dev->data->tx_queues; + + for (i = nb_queues; i < old_nb_queues; i++) + (*dev->dev_ops->tx_queue_release)(txq[i]); } dev->data->nb_tx_queues = nb_queues; return 0; @@ -891,25 +905,23 @@ rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q, * configured device. */ (*dev->dev_ops->dev_infos_get)(dev, &dev_info); + + if (nb_rx_q == 0 && nb_tx_q == 0) { + RTE_PMD_DEBUG_TRACE("ethdev port_id=%d both rx and tx queue cannot be 0\n", port_id); + return -EINVAL; + } + if (nb_rx_q > dev_info.max_rx_queues) { RTE_PMD_DEBUG_TRACE("ethdev port_id=%d nb_rx_queues=%d > %d\n", port_id, nb_rx_q, dev_info.max_rx_queues); return -EINVAL; } - if (nb_rx_q == 0) { - RTE_PMD_DEBUG_TRACE("ethdev port_id=%d nb_rx_q == 0\n", port_id); - return -EINVAL; - } if (nb_tx_q > dev_info.max_tx_queues) { RTE_PMD_DEBUG_TRACE("ethdev port_id=%d nb_tx_queues=%d > %d\n", port_id, nb_tx_q, dev_info.max_tx_queues); return -EINVAL; } - if (nb_tx_q == 0) { - RTE_PMD_DEBUG_TRACE("ethdev port_id=%d nb_tx_q == 0\n", port_id); - return -EINVAL; - } /* Copy the dev_conf parameter into the dev structure */ memcpy(&dev->data->dev_conf, dev_conf, sizeof(dev->data->dev_conf)); -- 2.5.0
[dpdk-dev] [PATCH 2/3] librte_cryptodev: remove RTE_PROC_PRIMARY_OR_RET
The macro RTE_PROC_PRIMARY_OR_ERR_RET blocks secondary processes from using these APIs. API access should be available to both primary and secondary processes. Signed-off-by: Reshma Pattan --- lib/librte_cryptodev/rte_cryptodev.c | 42 1 file changed, 42 deletions(-) diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c index f09f67e..207e92c 100644 --- a/lib/librte_cryptodev/rte_cryptodev.c +++ b/lib/librte_cryptodev/rte_cryptodev.c @@ -532,12 +532,6 @@ rte_cryptodev_queue_pair_start(uint8_t dev_id, uint16_t queue_pair_id) { struct rte_cryptodev *dev; - /* -* This function is only safe when called from the primary process -* in a multi-process setup -*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) { CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id); return -EINVAL; @@ -560,12 +554,6 @@ rte_cryptodev_queue_pair_stop(uint8_t dev_id, uint16_t queue_pair_id) { struct rte_cryptodev *dev; - /* -* This function is only safe when called from the primary process -* in a multi-process setup -*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) { CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id); return -EINVAL; @@ -593,12 +581,6 @@ rte_cryptodev_configure(uint8_t dev_id, struct rte_cryptodev_config *config) struct rte_cryptodev *dev; int diag; - /* -* This function is only safe when called from the primary process -* in a multi-process setup -*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) { CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id); return (-EINVAL); @@ -635,12 +617,6 @@ rte_cryptodev_start(uint8_t dev_id) CDEV_LOG_DEBUG("Start dev_id=%" PRIu8, dev_id); - /* -* This function is only safe when called from the primary process -* in a multi-process setup -*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) { CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id); return (-EINVAL); 
@@ -670,12 +646,6 @@ rte_cryptodev_stop(uint8_t dev_id) { struct rte_cryptodev *dev; - /* -* This function is only safe when called from the primary process -* in a multi-process setup -*/ - RTE_PROC_PRIMARY_OR_RET(); - if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) { CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id); return; @@ -701,12 +671,6 @@ rte_cryptodev_close(uint8_t dev_id) struct rte_cryptodev *dev; int retval; - /* -* This function is only safe when called from the primary process -* in a multi-process setup -*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-EINVAL); - if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) { CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id); return -1; @@ -747,12 +711,6 @@ rte_cryptodev_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id, { struct rte_cryptodev *dev; - /* -* This function is only safe when called from the primary process -* in a multi-process setup -*/ - RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY); - if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) { CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id); return (-EINVAL); -- 2.5.0
[dpdk-dev] [RFC v2 0/2] ethdev: Enhancements to flow director filter
This RFC series of patches attempts to extend the flow director filter to add support for Chelsio T5 hardware filtering capabilities. Chelsio T5 can carry out filtering in hardware, with 3 possible actions for a packet that hits a filter rule: 1. Action Pass - Packets hitting a filter rule can be directed to a particular RXQ. 2. Action Drop - Packets hitting a filter rule are dropped in h/w. 3. Action Switch - Packets hitting a filter rule can be switched in h/w from one port to another, without involvement of the host. The Switch action also supports rewriting of src-mac/dst-mac headers as well as of vlan headers. It further supports rewriting of IP headers and thereby supports NAT (Network Address Translation) in h/w. Each filter rule can also optionally specify a mask value, i.e. it's possible to create a filter rule for an entire subnet of IP addresses, a range of tcp/udp ports, etc. Patch 1 does the following: - Adds an additional flow rte_eth_pkt_filter_flow which encapsulates ingress ports, l2 payload, vlan and ntuples. - Adds an additional mask for the flow to allow a range of values to be matched. - Adds the ability to set both filters with masks (maskfull) and without masks (maskless). Also allows prioritizing one of these filter types over the other when a packet matches several types. - Adds a new behavior 'switch'. - Adds behavior arguments that can be passed when a particular behavior is taken. For example, in case of action 'switch', an additional 4-tuple can be passed to allow rewriting src/dst ip and port addresses to support NAT'ing. Patch 2 shows a testpmd command line example to support packet filter flow. The patch series has been compile-tested on all x86 gcc targets; the drivers currently supporting fdir filters seem to return appropriate error codes when this new flow type and the new action are not supported, and hence are not affected. Posting this series mainly for discussion on the API change. 
Once this is agreed upon, I will post the cxgbe PMD changes to use the new API. --- v2: 1. Added ttl to rte_eth_ipv4_flow and tc, flow_label, next_header, and hop_limit to rte_eth_ipv6_flow. 2. Added new field type to rte_eth_pkt_filter_flow to differentiate between maskfull and maskless filter types. 3. Added new field prio to rte_eth_pkt_filter_flow to allow setting priority over maskfull or maskless when a packet matches multiple filter types. 4. Added new behavior sub op RTE_FDIR_BEHAVIOR_SUB_OP_SWAP to allow swapping fields in matched flows. For example, this is useful when swapping mac addresses in hardware before switching. 5. Updated the testpmd example to reflect the above new changes. 6. Dropped Patch 3 since the ABI announcement has already been merged. Rahul Lakkireddy (2): ethdev: add packet filter flow and new behavior switch to fdir testpmd: add an example to show packet filter flow app/test-pmd/cmdline.c | 528 +++- lib/librte_ether/rte_eth_ctrl.h | 127 +- 2 files changed, 646 insertions(+), 9 deletions(-) -- 2.5.3
[dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new behavior switch to fdir
Add a new packet filter flow that allows filtering a packet based on matching ingress port, ethertype, vlan, ip, and tcp/udp fields, i.e. matching based on any or all fields at the same time. Add the ability to provide masks for fields in the flow to allow a range of values. Allow selection of maskfull vs maskless filter types. Provide a mechanism to set the priority between maskfull and maskless filter types when a packet matches several filter types. Add a new vlan flow containing inner and outer vlan to match. Add tos, proto, and ttl fields that can be matched for ipv4 flow. Add tc, flow_label, next_header, and hop_limit fields that can be matched for ipv6 flow. Add a new behavior switch. Add the ability to provide behavior arguments to allow insertion/deletion/swapping of matched fields in the flow. Useful when rewriting matched fields with new values. Adds arguments for port, mac, vlan, and nat. For example, this allows providing new ip and port addresses to rewrite the fields of packets matching a filter rule before NAT'ing. Signed-off-by: Rahul Lakkireddy Signed-off-by: Kumar Sanghvi --- v2: 1. Added ttl to rte_eth_ipv4_flow and tc, flow_label, next_header, and hop_limit to rte_eth_ipv6_flow. 2. Added new field type to rte_eth_pkt_filter_flow to differentiate between maskfull and maskless filter types. 3. Added new field prio to rte_eth_pkt_filter_flow to allow setting priority over maskfull or maskless when a packet matches multiple filter types. 4. Added new behavior sub op RTE_FDIR_BEHAVIOR_SUB_OP_SWAP to allow swapping fields in matched flows. Useful when swapping mac addresses in hardware before switching. 
lib/librte_ether/rte_eth_ctrl.h | 127 +++- 1 file changed, 126 insertions(+), 1 deletion(-) diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h index ce224ad..5cc22a0 100644 --- a/lib/librte_ether/rte_eth_ctrl.h +++ b/lib/librte_ether/rte_eth_ctrl.h @@ -74,7 +74,11 @@ extern "C" { #define RTE_ETH_FLOW_IPV6_EX15 #define RTE_ETH_FLOW_IPV6_TCP_EX16 #define RTE_ETH_FLOW_IPV6_UDP_EX17 -#define RTE_ETH_FLOW_MAX18 +#define RTE_ETH_FLOW_PKT_FILTER_IPV4_TCP 18 +#define RTE_ETH_FLOW_PKT_FILTER_IPV4_UDP 19 +#define RTE_ETH_FLOW_PKT_FILTER_IPV6_TCP 20 +#define RTE_ETH_FLOW_PKT_FILTER_IPV6_UDP 21 +#define RTE_ETH_FLOW_MAX22 /** * Feature filter types @@ -407,6 +411,9 @@ struct rte_eth_l2_flow { struct rte_eth_ipv4_flow { uint32_t src_ip; /**< IPv4 source address to match. */ uint32_t dst_ip; /**< IPv4 destination address to match. */ + uint8_t tos; /**< IPV4 type of service to match. */ + uint8_t proto;/**< IPV4 proto to match. */ + uint8_t ttl; /**< IPV4 time to live to match. */ }; /** @@ -443,6 +450,10 @@ struct rte_eth_sctpv4_flow { struct rte_eth_ipv6_flow { uint32_t src_ip[4]; /**< IPv6 source address to match. */ uint32_t dst_ip[4]; /**< IPv6 destination address to match. */ + uint8_t tc; /**< IPv6 traffic class to match. */ + uint32_t flow_label; /**< IPv6 flow label to match. */ + uint8_t next_header;/**< IPv6 next header to match. */ + uint8_t hop_limit; /**< IPv6 hop limits to match. */ }; /** @@ -500,6 +511,51 @@ struct rte_eth_tunnel_flow { }; /** + * A structure used to define the input for vlan flow. + */ +struct rte_eth_vlan_flow { + uint16_t inner_vlan; /**< Inner vlan field to match. */ + uint16_t outer_vlan; /**< Outer vlan field to match. 
*/ +}; + +/** + * A union used to define the input for N-Tuple flow + */ +union rte_eth_ntuple_flow { + struct rte_eth_tcpv4_flow tcp4; + struct rte_eth_udpv4_flow udp4; + struct rte_eth_tcpv6_flow tcp6; + struct rte_eth_udpv6_flow udp6; +}; + +/** + * A structure used to define the input for packet filter. + */ +struct rte_eth_pkt_filter { + uint8_t port_id; /**< Port id to match. */ + struct rte_eth_l2_flowl2_flow; /**< L2 flow fields to match. */ + struct rte_eth_vlan_flow vlan_flow; /**< Vlan flow fields to match. */ + union rte_eth_ntuple_flow ntuple_flow; + /**< N-tuple flow fields to match. */ +}; + +/** + * A structure used to define the input for packet filter flow. + */ +enum rte_eth_pkt_filter_type { + RTE_ETH_PKT_FILTER_TYPE_MASKLESS = 0, /**< Ignore masks in the flow */ + RTE_ETH_PKT_FILTER_TYPE_MASKFULL, /**< Consider masks in the flow */ +}; + +struct rte_eth_pkt_filter_flow { + enum rte_eth_pkt_filter_type type; /**< Type of filter */ + enum rte_eth_pkt_filter_type prio; + /**< Prioritize the filter type when a packet matches several types */ + struct rte_eth_pkt_filter pkt; /**< Packet fields to match. */ + struct rte_eth_pkt_filter mask; /**< Mask for matched fields. */ +}; + +/** * An union conta
[dpdk-dev] [RFC v2 2/2] testpmd: add an example to show packet filter flow
Extend the existing flow_director_filter to add support for packet filter flow. Also shows how to pass the extra behavior arguments to rewrite fields in matched filter rules. Signed-off-by: Rahul Lakkireddy Signed-off-by: Kumar Sanghvi --- v2: 1. Added new field filter-type to allow specifying maskfull vs maskless filter types. 2. Added new field filter-prio to allow specifying the priority between maskfull and maskless filters i.e. if we have a maskfull and a maskless filter both of which can match a single traffic pattern then, which one takes the priority is determined by filter-prio. 3. Added new field flow-label to be matched for ipv6. 4. Added new mac-swap behavior argument. app/test-pmd/cmdline.c | 528 - 1 file changed, 520 insertions(+), 8 deletions(-) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 73298c9..3402f2c 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -641,7 +641,7 @@ static void cmd_help_long_parsed(void *parsed_result, " flow (ipv4-other|ipv4-frag|ipv6-other|ipv6-frag)" " src (src_ip_address) dst (dst_ip_address)" " vlan (vlan_value) flexbytes (flexbytes_value)" - " (drop|fwd) pf|vf(vf_id) queue (queue_id)" + " (drop|fwd|switch) pf|vf(vf_id) queue (queue_id)" " fd_id (fd_id_value)\n" "Add/Del an IP type flow director filter.\n\n" @@ -650,7 +650,7 @@ static void cmd_help_long_parsed(void *parsed_result, " src (src_ip_address) (src_port)" " dst (dst_ip_address) (dst_port)" " vlan (vlan_value) flexbytes (flexbytes_value)" - " (drop|fwd) pf|vf(vf_id) queue (queue_id)" + " (drop|fwd|switch) pf|vf(vf_id) queue (queue_id)" " fd_id (fd_id_value)\n" "Add/Del an UDP/TCP type flow director filter.\n\n" @@ -659,16 +659,41 @@ static void cmd_help_long_parsed(void *parsed_result, " src (src_ip_address) (src_port)" " dst (dst_ip_address) (dst_port)" " tag (verification_tag) vlan (vlan_value)" - " flexbytes (flexbytes_value) (drop|fwd)" + " flexbytes (flexbytes_value) (drop|fwd|switch)" " pf|vf(vf_id) queue (queue_id) 
fd_id (fd_id_value)\n" "Add/Del a SCTP type flow director filter.\n\n" "flow_director_filter (port_id) mode IP (add|del|update)" " flow l2_payload ether (ethertype)" - " flexbytes (flexbytes_value) (drop|fwd)" + " flexbytes (flexbytes_value) (drop|fwd|switch)" " pf|vf(vf_id) queue (queue_id) fd_id (fd_id_value)\n" "Add/Del a l2 payload type flow director filter.\n\n" + "flow_director_filter (port_id) mode IP (add|del|update)" + " flow (ipv4-tcp-pkt-filter|ipv4-udp-pkt-filter" + " ipv6-tcp-pkt-filter|ipv6-udp-pkt-filter)" + " filter-type maskfull|maskless" + " filter-prio default|maskfull|maskless" + " ingress-port (port_id) (port_id_mask)" + " ether (ethertype) (ethertype_mask)" + " inner-vlan (inner_vlan_value) (inner_vlan_mask)" + " outer-vlan (outer_vlan_value) (outer_vlan_mask)" + " tos (tos_value) (tos_mask)" + " flow-label (flow_label_value) (flow_label_mask)" + " proto (proto_value) (proto_mask)" + " ttl (ttl_value) (ttl_mask)" + " src (src_ip) (src_ip_mask) (src_port) (src_port_mask)" + " dst (dst_ip) (dst_ip_mask) (dst_port) (dst_port_mask)" + " flexbytes (flexbytes_value) (drop|fwd|switch)" + " pf|vf(vf_id) queue (queue_id)" + " port-arg none|port-redirect (dst-port-id)" + " mac-arg none|mac-rewrite|mac-swap (src-mac) (dst-mac)" + " vlan-arg none|vlan-rewrite|vlan-del (vlan_value)" + " nat-arg none|nat-rewrite" + " src (src_ip) (src_port) dst (dst_ip) (dst_port)" + " fd_id (fd_id_value)\n" + "Add/Del a packet filter type flow director filter.\n\n" + "flow_director_filter (port_id) mode MAC-VLAN (add|del|update)" " mac (mac_address) vlan (vlan_value)" " flexbytes (flexbytes_value) (drop|fwd)" @@ -7973,14 +7998,44 @@ struct cmd_flow_dir
[dpdk-dev] [PATCH] virtio: fix crashes in virtio stats functions
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bernard Iremonger > Sent: Wednesday, December 23, 2015 9:45 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH] virtio: fix crashes in virtio stats functions > > This initialisation of nb_rx_queues and nb_tx_queues has been removed > from eth_virtio_dev_init. > > The nb_rx_queues and nb_tx_queues were being initialised in > eth_virtio_dev_init > before the tx_queues and rx_queues arrays were allocated. > > The arrays are allocated when the ethdev port is configured and the > nb_tx_queues and nb_rx_queues are initialised. > > If any of the following functions were called before the ethdev > port was configured there was a segmentation fault because > rx_queues and tx_queues were NULL: > > rte_eth_stats_get > rte_eth_stats_reset > rte_eth_xstats_get > rte_eth_xstats_reset > > Fixes: 823ad647950a ("virtio: support multiple queues") > Signed-off-by: Bernard Iremonger > --- > drivers/net/virtio/virtio_ethdev.c | 3 --- > 1 file changed, 3 deletions(-) > > diff --git a/drivers/net/virtio/virtio_ethdev.c > b/drivers/net/virtio/virtio_ethdev.c > index d928339..5ef0752 100644 > --- a/drivers/net/virtio/virtio_ethdev.c > +++ b/drivers/net/virtio/virtio_ethdev.c > @@ -1378,9 +1378,6 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev) > hw->max_tx_queues = 1; > } > > - eth_dev->data->nb_rx_queues = hw->max_rx_queues; > - eth_dev->data->nb_tx_queues = hw->max_tx_queues; > - > PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d hw->max_tx_queues=%d", > hw->max_rx_queues, hw->max_tx_queues); > PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x", > -- Acked-by: Konstantin Ananyev > 2.6.3
[dpdk-dev] [PATCH v4 0/2] provide rte_pktmbuf_alloc_bulk API and call it in vhost dequeue
v4 changes: fix a silly typo in error handling when rte_pktmbuf_alloc fails v3 changes: move while after case 0; add context about Duff's device and why we use a while loop in the commit message v2 changes: unroll the loop in rte_pktmbuf_alloc_bulk to help performance For a symmetric rte_pktmbuf_free_bulk, if the app knows that in its scenarios its mbufs are all simple mbufs, i.e. they meet the following requirements: * no multiple segments * not an indirect mbuf * refcnt is 1 * belong to the same mbuf memory pool, it could directly call rte_mempool_put to free the bulk of mbufs; otherwise rte_pktmbuf_free_bulk has to call rte_pktmbuf_free to free the mbufs one by one. This patchset will not provide this symmetric implementation. Huawei Xie (2): mbuf: provide rte_pktmbuf_alloc_bulk API vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue lib/librte_mbuf/rte_mbuf.h| 49 +++ lib/librte_vhost/vhost_rxtx.c | 35 +++ 2 files changed, 71 insertions(+), 13 deletions(-) -- 1.8.1.4
[dpdk-dev] [PATCH v4 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API
v3 changes: move while after case 0; add context about Duff's device and why we use a while loop in the commit message v2 changes: unroll the loop a bit to help performance rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs. There is a related thread about this bulk API. http://dpdk.org/dev/patchwork/patch/4718/ Thanks to Konstantin's loop unrolling. See the Wikipedia page about Duff's device. It explains the performance optimization through loop unwinding, and also the most dramatic use of case-label fall-through. https://en.wikipedia.org/wiki/Duff%27s_device In our implementation, we use a while() loop rather than a do {} while() loop because we cannot assume count is strictly positive. Using a while() loop saves one line checking whether count is zero. Signed-off-by: Gerald Rogers Signed-off-by: Huawei Xie Acked-by: Konstantin Ananyev --- lib/librte_mbuf/rte_mbuf.h | 49 ++ 1 file changed, 49 insertions(+) diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index f234ac9..3381c28 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -1336,6 +1336,55 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp) } /** + * Allocate a bulk of mbufs, initialize refcnt and reset the fields to default + * values. + * + * @param pool + *The mempool from which mbufs are allocated. 
+ * @param mbufs + *Array of pointers to mbufs + * @param count + *Array size + * @return + * - 0: Success + */ +static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool, +struct rte_mbuf **mbufs, unsigned count) +{ + unsigned idx = 0; + int rc; + + rc = rte_mempool_get_bulk(pool, (void **)mbufs, count); + if (unlikely(rc)) + return rc; + + switch (count % 4) { + case 0: while (idx != count) { + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); + rte_mbuf_refcnt_set(mbufs[idx], 1); + rte_pktmbuf_reset(mbufs[idx]); + idx++; + case 3: + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); + rte_mbuf_refcnt_set(mbufs[idx], 1); + rte_pktmbuf_reset(mbufs[idx]); + idx++; + case 2: + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); + rte_mbuf_refcnt_set(mbufs[idx], 1); + rte_pktmbuf_reset(mbufs[idx]); + idx++; + case 1: + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); + rte_mbuf_refcnt_set(mbufs[idx], 1); + rte_pktmbuf_reset(mbufs[idx]); + idx++; + } + } + return 0; +} + +/** * Attach packet mbuf to another packet mbuf. * * After attachment we refer the mbuf we attached as 'indirect', -- 1.8.1.4
[dpdk-dev] [PATCH v4 2/2] vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue
v4 changes: fix a silly typo in error handling when rte_pktmbuf_alloc fails reported by haifeng pre-allocate a bulk of mbufs instead of allocating one mbuf a time on demand Signed-off-by: Gerald Rogers Signed-off-by: Huawei Xie Acked-by: Konstantin Ananyev Acked-by: Yuanhan Liu Tested-by: Yuanhan Liu --- lib/librte_vhost/vhost_rxtx.c | 35 ++- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index bbf3fac..f10d534 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -576,6 +576,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, uint32_t i; uint16_t free_entries, entry_success = 0; uint16_t avail_idx; + uint8_t alloc_err = 0; + uint8_t seg_num; if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) { RTE_LOG(ERR, VHOST_DATA, @@ -609,6 +611,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n", dev->device_fh, free_entries); + + if (unlikely(rte_pktmbuf_alloc_bulk(mbuf_pool, + pkts, free_entries)) < 0) { + RTE_LOG(ERR, VHOST_DATA, + "Failed to bulk allocating %d mbufs\n", free_entries); + return 0; + } + /* Retrieve all of the head indexes first to avoid caching issues. */ for (i = 0; i < free_entries; i++) head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 1)]; @@ -621,9 +631,9 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, uint32_t vb_avail, vb_offset; uint32_t seg_avail, seg_offset; uint32_t cpy_len; - uint32_t seg_num = 0; + seg_num = 0; struct rte_mbuf *cur; - uint8_t alloc_err = 0; + desc = &vq->desc[head[entry_success]]; @@ -654,13 +664,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, vq->used->ring[used_idx].id = head[entry_success]; vq->used->ring[used_idx].len = 0; - /* Allocate an mbuf and populate the structure. 
*/ - m = rte_pktmbuf_alloc(mbuf_pool); - if (unlikely(m == NULL)) { - RTE_LOG(ERR, VHOST_DATA, - "Failed to allocate memory for mbuf.\n"); - break; - } + prev = cur = m = pkts[entry_success]; seg_offset = 0; seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM; cpy_len = RTE_MIN(vb_avail, seg_avail); @@ -668,8 +672,6 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, PRINT_PACKET(dev, (uintptr_t)vb_addr, desc->len, 0); seg_num++; - cur = m; - prev = m; while (cpy_len != 0) { rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, seg_offset), (void *)((uintptr_t)(vb_addr + vb_offset)), @@ -761,16 +763,23 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, cpy_len = RTE_MIN(vb_avail, seg_avail); } - if (unlikely(alloc_err == 1)) + if (unlikely(alloc_err)) break; m->nb_segs = seg_num; - pkts[entry_success] = m; vq->last_used_idx++; entry_success++; } + if (unlikely(alloc_err)) { + uint16_t i = entry_success; + + m->nb_segs = seg_num; + for (; i < free_entries; i++) + rte_pktmbuf_free(pkts[i]); + } + rte_compiler_barrier(); vq->used->idx += entry_success; /* Kick guest if required. */ -- 1.8.1.4
[dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
On Wed, 23 Dec 2015 16:49:46 +0800 Jijiang Liu wrote: > 1)at config phase > > dev_config(port, ...); > tunnel_config(port,...); > ... > dev_start(port); > ... > rx_burst(port, rxq,... ); > tx_burst(port, txq,...); What about dynamically adding and deleting multiple tunnels after the device has started? That would be the more common case in a real-world environment.
[dpdk-dev] [RFC PATCH 5/6] rte_ether: implement encap and decap APIs
On Wed, 23 Dec 2015 16:49:51 +0800 Jijiang Liu wrote: > + > +#ifndef __INTEL_COMPILER > +#pragma GCC diagnostic ignored "-Wcast-qual" > +#endif > + > +#pragma GCC diagnostic ignored "-Wstrict-aliasing" > + Since this is new code, can't you please fix it to be warning safe?
[dpdk-dev] [PATCH v3 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API
On Wed, 23 Dec 2015 00:17:53 +0800 Huawei Xie wrote: > + > + rc = rte_mempool_get_bulk(pool, (void **)mbufs, count); > + if (unlikely(rc)) > + return rc; > + > + switch (count % 4) { > + case 0: while (idx != count) { > + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); > + rte_mbuf_refcnt_set(mbufs[idx], 1); > + rte_pktmbuf_reset(mbufs[idx]); > + idx++; > + case 3: > + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); > + rte_mbuf_refcnt_set(mbufs[idx], 1); > + rte_pktmbuf_reset(mbufs[idx]); > + idx++; > + case 2: > + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); > + rte_mbuf_refcnt_set(mbufs[idx], 1); > + rte_pktmbuf_reset(mbufs[idx]); > + idx++; > + case 1: > + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); > + rte_mbuf_refcnt_set(mbufs[idx], 1); > + rte_pktmbuf_reset(mbufs[idx]); > + idx++; > + } > + } > + return 0; > +} Since function will not work if count can not be 0 (otherwise rte_mempool_get_bulk will fail), why not: 1. Document that assumption 2. Use that assumption to speed up code. switch(count % 4) { do { case 0: ... case 1: ... } while (idx != count); } Also you really need to add a big block comment about this loop, to explain what it does and why.
[dpdk-dev] [PATCH v3 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger > Sent: Wednesday, December 23, 2015 6:38 PM > To: Xie, Huawei > Cc: dev at dpdk.org; dprovan at bivio.net > Subject: Re: [dpdk-dev] [PATCH v3 1/2] mbuf: provide rte_pktmbuf_alloc_bulk > API > > On Wed, 23 Dec 2015 00:17:53 +0800 > Huawei Xie wrote: > > > + > > + rc = rte_mempool_get_bulk(pool, (void **)mbufs, count); > > + if (unlikely(rc)) > > + return rc; > > + > > + switch (count % 4) { > > + case 0: while (idx != count) { > > + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); > > + rte_mbuf_refcnt_set(mbufs[idx], 1); > > + rte_pktmbuf_reset(mbufs[idx]); > > + idx++; > > + case 3: > > + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); > > + rte_mbuf_refcnt_set(mbufs[idx], 1); > > + rte_pktmbuf_reset(mbufs[idx]); > > + idx++; > > + case 2: > > + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); > > + rte_mbuf_refcnt_set(mbufs[idx], 1); > > + rte_pktmbuf_reset(mbufs[idx]); > > + idx++; > > + case 1: > > + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0); > > + rte_mbuf_refcnt_set(mbufs[idx], 1); > > + rte_pktmbuf_reset(mbufs[idx]); > > + idx++; > > + } > > + } > > + return 0; > > +} > > Since function will not work if count can not be 0 (otherwise > rte_mempool_get_bulk will fail), As I understand, rte_mempool_get_bulk() will work correctly and return 0, if count==0. That's why Huawei prefers while() {}, instead of do {} while() - to avoid extra check for (count != 0) at the start. Konstantin > why not: > 1. Document that assumption > 2. Use that assumption to speed up code. > > > > switch(count % 4) { > do { > case 0: > ... > case 1: > ... > } while (idx != count); > } > > Also you really need to add a big block comment about this loop, to explain > what it does and why.
[dpdk-dev] [PATCH] mk: Fix examples install path
Hi, 2015-12-22 14:13, Christian Ehrhardt: > Depending on non-doc targets being built before and the setting of DESTDIR > the examples dir could in some cases not end up in the right target. > Reason is just a typo variable reference in the copy target. [...] > - $(Q)cp -a $(RTE_SDK)/examples $(DESTDIR)$(datadir) > + $(Q)cp -a $(RTE_SDK)/examples $(DESTDIR)$(docdir) No, it was not a typo. Do you really think the examples code should be in the doc dir (i.e. /usr/share/doc/dpdk) instead of datadir (i.e. /usr/share/dpdk)?
[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
2015-12-23 10:44, Yuanhan Liu: > On Tue, Dec 22, 2015 at 01:38:29AM -0800, Rich Lane wrote: > > On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu > linux.intel.com> > > wrote: > > > > On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote: > > > The queue state change callback is the one new API that needs to be > > > added because > > > normal NICs don't have this behavior. > > > > Again I'd ask, will vring_state_changed() be enough, when above issues > > are resolved: vring_state_changed() will be invoked at new_device()/ > > destroy_device(), and of course, ethtool change? > > > > > > It would be sufficient. It is not a great API though, because it requires > > the > > application to do the conversion from struct virtio_net to a DPDK port > > number, > > and from a virtqueue index to a DPDK queue id and direction. Also, the > > current > > implementation often makes this callback when the vring state has not > > actually > > changed (enabled -> enabled and disabled -> disabled). > > > > If you're asking about using vring_state_changed() _instead_ of the link > > status > > event and rte_eth_dev_socket_id(), > > No, I like the idea of link status event and rte_eth_dev_socket_id(); > I was just wondering why a new API is needed. Both Tetsuya and I > were thinking to leverage the link status event to represent the > queue stats change (triggered by vring_state_changed()) as well, > so that we don't need to introduce another eth event. However, I'd > agree that it's better if we could have a new dedicate event. > > Thomas, here is some background for you. For vhost pmd and linux > virtio-net combo, the queue can be dynamically changed by ethtool, > therefore, the application wishes to have another eth event, say > RTE_ETH_EVENT_QUEUE_STATE_CHANGE, so that the application can > add/remove corresponding queue to the datapath when that happens. > What do you think of that? Yes it is an event. So I don't understand the question. 
What may be better than a specific rte_eth_event_type?
[dpdk-dev] [Question] How pmd virtio works without UIO?
2015-12-23 05:13, Xie, Huawei: > On 12/23/2015 10:57 AM, Yuanhan Liu wrote: > > On Wed, Dec 23, 2015 at 10:41:57AM +0800, Peter Xu wrote: > >> On Wed, Dec 23, 2015 at 10:01:35AM +0800, Yuanhan Liu wrote: > >>> On Tue, Dec 22, 2015 at 05:56:41PM +0800, Peter Xu wrote: > On Tue, Dec 22, 2015 at 04:32:46PM +0800, Yuanhan Liu wrote: > > Actually, you are right. I mentioned in the last email that this is > > for configuration part. To answer your question in this email, you > > will not be able to go that further (say initiating virtio pmd) if > > you don't unbind the origin virtio-net driver, and bind it to igb_uio > > (or something similar). > > > > The start point is from rte_eal_pci_scan, where the sub-function > > pci_san_one just initates a DPDK bond driver. > I am not sure whether I do understand your meaning correctly > (regarding "you willl not be able to go that furture"): The problem > is that, we _can_ run testpmd without unbinding the ports and bind > to UIO or something. What we need to do is boot the guest, reserve > huge pages, and run testpmd (keeping its kernel driver as > "virtio-pci"). In pci_scan_one(): > > if (!ret) { > if (!strcmp(driver, "vfio-pci")) > dev->kdrv = RTE_KDRV_VFIO; > else if (!strcmp(driver, "igb_uio")) > dev->kdrv = RTE_KDRV_IGB_UIO; > else if (!strcmp(driver, "uio_pci_generic")) > dev->kdrv = RTE_KDRV_UIO_GENERIC; > else > dev->kdrv = RTE_KDRV_UNKNOWN; > } else > dev->kdrv = RTE_KDRV_UNKNOWN; > > I think it should be going to RTE_KDRV_UNKNOWN > (driver=="virtio-pci") here. > >>> Sorry, I simply overlook that. I was thinking it will quit here for > >>> the RTE_KDRV_UNKNOWN case. > >>> > I tried to run IO and it could work, > but I am not sure whether it is safe, and how. > >>> I also did a quick test then, however, with the virtio 1.0 patchset > >>> I sent before, which sets the RTE_PCI_DRV_NEED_MAPPING, resulting to > >>> pci_map_device() failure and virtio pmd is not initiated at all. 
> >> Then, will the patch work with the ioport way to access virtio devices? > > Yes. > > > Also, I am not sure whether I need to (at least) unbind the > virtio-pci driver, so that there is no kernel driver > running for the virtio device before DPDK uses it. > >>> Why not? That's what the DPDK document asks you to do > >>> (http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html): > >>> > >>> 3.6. Binding and Unbinding Network Ports to/from the Kernel Modules > >>> > >>> As of release 1.4, DPDK applications no longer automatically unbind > >>> all supported network ports from the kernel driver in use. Instead, > >>> all ports that are to be used by a DPDK application must be bound > >>> to the uio_pci_generic, igb_uio or vfio-pci module before the > >>> application is run. Any network ports under Linux* control will be > >>> ignored by the DPDK poll-mode drivers and cannot be used by the > >>> application. > >> This seems obsolete, since it's not covering ioport. > > I don't think so. The above is about how to run DPDK applications. ioport > > is just an (optional) way to access PCI resources in a specific PMD. > > > > And the above specification avoids your concern that two drivers try > > to manipulate the same device concurrently, doesn't it? > > > > And, it is saying "any network ports under Linux* control will be > > ignored by the DPDK poll-mode drivers and cannot be used by the > > application", so the case you described, where virtio pmd > > continues to work without the bind, looks like a bug to me. > > > > Can anyone confirm that? > > That document isn't accurate. virtio doesn't require binding to a UIO > driver if it uses PORT IO. The PORT IO commit said it is because UIO > isn't secure, but avoiding UIO doesn't bring more security, as the virtio > PMD could still ask the device to DMA into any memory. > The thing we might at least do is fail in virtio_resource_init if > a kernel driver is still manipulating this device. 
This saves users the effort of using the blacklist option and avoids the driver conflict. +1 for checking kernel driver in use
[dpdk-dev] [Question] How pmd virtio works without UIO?
2015-12-23 10:09, Yuanhan Liu: > On Wed, Dec 23, 2015 at 09:55:54AM +0800, Peter Xu wrote: > > On Tue, Dec 22, 2015 at 04:38:30PM +, Xie, Huawei wrote: > > > On 12/22/2015 7:39 PM, Peter Xu wrote: > > > > I tried to unbind one of the virtio net device, I see the PCI entry > > > > still there. > > > > > > > > Before unbind: > > > > > > > > [root at vm proc]# lspci -k -s 00:03.0 > > > > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device > > > > Subsystem: Red Hat, Inc Device 0001 > > > > Kernel driver in use: virtio-pci > > > > [root at vm proc]# cat /proc/ioports | grep c060-c07f > > > > c060-c07f : :00:03.0 > > > > c060-c07f : virtio-pci > > > > > > > > After unbind: > > > > > > > > [root at vm proc]# lspci -k -s 00:03.0 > > > > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device > > > > Subsystem: Red Hat, Inc Device 0001 > > > > [root at vm proc]# cat /proc/ioports | grep c060-c07f > > > > c060-c07f : :00:03.0 > > > > > > > > So... does this means that it is an alternative to black list > > > > solution? > > > Oh, we could firstly check if this port is manipulated by kernel driver > > > in virtio_resource_init/eth_virtio_dev_init, as long as it is not too > > > late. > > Why can't we simply quit at pci_scan_one, once finding that it's not > bound to uio (or similar stuff)? That would be generic enough that we > don't have to do similar checks for each new pmd driver. > > Or, am I missing something? UIO is not needed to make virtio work (without interrupt support). Sometimes it may be required to avoid using kernel modules. > > I guess there might be two problems? Which are: > > > > 1. How user avoid DPDK taking over virtio devices that they do not > >want for IO (chooses which device to use) > > Isn't that what the 'binding/unbinding' is for? Binding is, sometimes, required. But does it mean DPDK should use every available port? That's the default and may be configured with blacklist/whitelist. > > 2. 
Driver conflict between virtio PMD in DPDK, and virtio-pci in > >kernel (happens on every virtio device that DPDK uses) > > If you unbound the kernel driver first, which is the suggested (or > required?) way to use DPDK, that will not happen.
[dpdk-dev] [PATCH 0/3] Handle SIGINT and SIGTERM in DPDK examples
This patch set handles SIGINT and SIGTERM in testpmd, l2fwd, and l3fwd, making sure all ports are properly stopped and closed. For virtual ports, the stop and close functions may perform resource cleanup, such as unlinking socket files. Zhihong Wang (3): app/test-pmd: Handle SIGINT and SIGTERM in testpmd examples/l2fwd: Handle SIGINT and SIGTERM in l2fwd examples/l3fwd: Handle SIGINT and SIGTERM in l3fwd

 app/test-pmd/testpmd.c | 23 +++
 examples/l2fwd/main.c  | 25 +
 examples/l3fwd/main.c  | 25 +
 3 files changed, 73 insertions(+)

--
2.5.0
[dpdk-dev] [PATCH 1/3] app/test-pmd: Handle SIGINT and SIGTERM in testpmd
Handle SIGINT and SIGTERM in testpmd.

Signed-off-by: Zhihong Wang
---
 app/test-pmd/testpmd.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 98ae46d..c259ba3 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1573,6 +1573,7 @@ pmd_test_exit(void)
 	FOREACH_PORT(pt_id, ports) {
 		printf("Stopping port %d...", pt_id);
 		fflush(stdout);
+		rte_eth_dev_stop(pt_id);
 		rte_eth_dev_close(pt_id);
 		printf("done\n");
 	}
@@ -1984,12 +1985,34 @@ init_port(void)
 		ports[pid].enabled = 1;
 }

+/* When we receive an INT signal, close all ports */
+static void
+sigint_handler(__rte_unused int signum)
+{
+	unsigned portid;
+
+	printf("Preparing to exit...\n");
+	FOREACH_PORT(portid, ports) {
+		if (port_id_is_invalid(portid, ENABLED_WARN))
+			continue;
+		printf("Stopping port %d...", portid);
+		rte_eth_dev_stop(portid);
+		rte_eth_dev_close(portid);
+		printf(" Done\n");
+	}
+	printf("Bye...\n");
+	exit(0);
+}
+
 int
 main(int argc, char** argv)
 {
 	int diag;
 	uint8_t port_id;

+	signal(SIGINT, sigint_handler);
+	signal(SIGTERM, sigint_handler);
+
 	diag = rte_eal_init(argc, argv);
 	if (diag < 0)
 		rte_panic("Cannot init EAL\n");
--
2.5.0
[dpdk-dev] [PATCH 2/3] examples/l2fwd: Handle SIGINT and SIGTERM in l2fwd
Handle SIGINT and SIGTERM in l2fwd.

Signed-off-by: Zhihong Wang
---
 examples/l2fwd/main.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
index 720fd5a..0594037 100644
--- a/examples/l2fwd/main.c
+++ b/examples/l2fwd/main.c
@@ -44,6 +44,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -534,6 +535,27 @@ check_all_ports_link_status(uint8_t port_num, uint32_t port_mask)
 	}
 }

+/* When we receive an INT signal, close all ports */
+static void
+sigint_handler(__rte_unused int signum)
+{
+	unsigned portid, nb_ports;
+
+	printf("Preparing to exit...\n");
+	nb_ports = rte_eth_dev_count();
+	for (portid = 0; portid < nb_ports; portid++) {
+		if ((l2fwd_enabled_port_mask & (1 << portid)) == 0) {
+			continue;
+		}
+		printf("Stopping port %d...", portid);
+		rte_eth_dev_stop(portid);
+		rte_eth_dev_close(portid);
+		printf(" Done\n");
+	}
+	printf("Bye...\n");
+	exit(0);
+}
+
 int
 main(int argc, char **argv)
 {
@@ -546,6 +568,9 @@ main(int argc, char **argv)
 	unsigned lcore_id, rx_lcore_id;
 	unsigned nb_ports_in_mask = 0;

+	signal(SIGINT, sigint_handler);
+	signal(SIGTERM, sigint_handler);
+
 	/* init EAL */
 	ret = rte_eal_init(argc, argv);
 	if (ret < 0)
--
2.5.0
[dpdk-dev] [PATCH 3/3] examples/l3fwd: Handle SIGINT and SIGTERM in l3fwd
Handle SIGINT and SIGTERM in l3fwd.

Signed-off-by: Zhihong Wang
---
 examples/l3fwd/main.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 5b0c2dd..aae16d2 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -41,6 +41,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -2559,6 +2560,27 @@ check_all_ports_link_status(uint8_t port_num, uint32_t port_mask)
 	}
 }

+/* When we receive an INT signal, close all ports */
+static void
+sigint_handler(__rte_unused int signum)
+{
+	unsigned portid, nb_ports;
+
+	printf("Preparing to exit...\n");
+	nb_ports = rte_eth_dev_count();
+	for (portid = 0; portid < nb_ports; portid++) {
+		if ((enabled_port_mask & (1 << portid)) == 0) {
+			continue;
+		}
+		printf("Stopping port %d...", portid);
+		rte_eth_dev_stop(portid);
+		rte_eth_dev_close(portid);
+		printf(" Done\n");
+	}
+	printf("Bye...\n");
+	exit(0);
+}
+
 int
 main(int argc, char **argv)
 {
@@ -2572,6 +2594,9 @@ main(int argc, char **argv)
 	uint32_t n_tx_queue, nb_lcores;
 	uint8_t portid, nb_rx_queue, queue, socketid;

+	signal(SIGINT, sigint_handler);
+	signal(SIGTERM, sigint_handler);
+
 	/* init EAL */
 	ret = rte_eal_init(argc, argv);
 	if (ret < 0)
--
2.5.0
[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD
On Wed, Dec 23, 2015 at 7:09 PM, Tetsuya Mukawa wrote: > On 2015/12/22 13:47, Rich Lane wrote: > > On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu < > yuanhan.liu at linux.intel.com> > > wrote: > > > >> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote: > >>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD > >> in a few > >>> ways: > >> Rich, thanks for the info! > >> > >>> 1. new_device/destroy_device: Link state change (will be covered by the > >> link > >>> status interrupt). > >>> 2. new_device: Add first queue to datapath. > >> I'm wondering why vring_state_changed() is not used, as it will also be > >> triggered at the beginning, when the default queue (the first queue) is > >> enabled. > >> > > Turns out I'd misread the code and it's already using the > > vring_state_changed callback for the > > first queue. Not sure if this is intentional but vring_state_changed is > > called for the first queue > > before new_device. > > > > > >>> 3. vring_state_changed: Add/remove queue to datapath. > >>> 4. destroy_device: Remove all queues (vring_state_changed is not called > >> when > >>> qemu is killed). > >> I had a plan to invoke vring_state_changed() to disable all vrings > >> when destroy_device() is called. > >> > > That would be good. > > > > > >>> 5. new_device and struct virtio_net: Determine NUMA node of the VM. > >> You can get the 'struct virtio_net' dev from all above callbacks. > > > > > >> 1. Link status interrupt. > >> > >> To vhost pmd, new_device()/destroy_device() equals to the link status > >> interrupt, where new_device() is a link up, and destroy_device() is link > >> down(). > >> > >> > >>> 2. New queue_state_changed callback. Unlike vring_state_changed this > >> should > >>> cover the first queue at new_device and removal of all queues at > >>> destroy_device. > >> As stated above, vring_state_changed() should be able to do that, except > >> the one on destroy_device(), which is not done yet. > >> > >>> 3. 
Per-queue or per-device NUMA node info. > >> You can query the NUMA node info implicitly by get_mempolicy(); check > >> numa_realloc() at lib/librte_vhost/virtio-net.c for reference. > >> > > Your suggestions are exactly how my application is already working. I was > > commenting on the > > proposed changes to the vhost PMD API. I would prefer to > > use RTE_ETH_EVENT_INTR_LSC > > and rte_eth_dev_socket_id for consistency with other NIC drivers, instead > > of these vhost-specific > > hacks. The queue state change callback is the one new API that needs to > be > > added because > > normal NICs don't have this behavior. > > > > You could add another rte_eth_event_type for the queue state change > > callback, and pass the > > queue ID, RX/TX direction, and enable bit through cb_arg. > > Hi Rich, > > So far, EAL provides rte_eth_dev_callback_register() for event handling. > DPDK app can register callback handler and "callback argument". > And EAL will call callback handler with the argument. > Anyway, vhost library and PMD cannot change the argument. > You're right, I'd mistakenly thought that the PMD controlled the void * passed to the callback. Here's a thought: struct rte_eth_vhost_queue_event { uint16_t queue_id; bool rx; bool enable; }; int rte_eth_vhost_get_queue_event(uint8_t port_id, struct rte_eth_vhost_queue_event *event); On receiving the ethdev event the application could repeatedly call rte_eth_vhost_get_queue_event to find out what happened. An issue with having the application dig into struct virtio_net is that it can only be safely accessed from a callback on the vhost thread. A typical application running its control plane on lcore 0 would need to copy all the relevant info from struct virtio_net before sending it over. As you mentioned, queues for a single vhost port could be located on different NUMA nodes. I think this is an uncommon scenario but if needed you could add an API to retrieve the NUMA node for a given port and queue.