Re: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue
Hi Xiaoyun, > -Original Message- > From: Li, Xiaoyun > Sent: Monday, December 23, 2019 3:52 PM > To: Gavin Hu ; Wu, Jingjing > Cc: dev@dpdk.org; Maslekar, Omkar ; > sta...@dpdk.org; nd > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue > > Hi > I reconsidered and retested about this issue. > I still need to use rte_wmb instead of using rte_io_wmb. > > Because to achieve high performance, ntb needs to turn on WC(write > combining) feature. The perf difference with and without WC enabled is > more than 20X. > And when WC enabled, rte_io_wmb cannot make sure the instructions are > in order only rte_wmb can make sure that. > > And in my retest, when sending 64 bytes packets, using rte_io_wmb will > cause out-of-order issue and cause memory corruption on rx side. > And using rte_wmb is fine. That's true: as the region is declared as 'write combine', even though x86 is known as strongly ordered, it is the interconnect or the PCI RC that may do the reordering ('write combine', 'write coalescing'), which caused this problem. IMO, the rte_io_*mb barriers on x86 should be promoted to something stronger if WC is involved (but that would sap performance for non-WC memories?). https://code.dpdk.org/dpdk/latest/source/lib/librte_eal/common/include/arch/x86/rte_atomic.h#L78 Using rte_wmb will hurt performance for aarch64 also, as PCI device memory accesses to a single device are strongly ordered, therefore the strongest rte_wmb is not necessary. > So I can only use v1 patch and suspend v2 patch in patchwork. > > Best Regards > Xiaoyun Li > > > -Original Message- > > From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com] > > Sent: Monday, December 16, 2019 18:50 > > To: Li, Xiaoyun ; Wu, Jingjing > > > Cc: dev@dpdk.org; Maslekar, Omkar ; > > sta...@dpdk.org; nd > > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier > issue > > > > > > > > > -Original Message- > > > From: dev On Behalf Of Xiaoyun Li > > > Sent: Monday, December 16, 2019 9:59 AM > > > To: jingjing...@intel.com > > > Cc: dev@dpdk.org; omkar.masle...@intel.com; Xiaoyun Li > > > ; sta...@dpdk.org > > > Subject: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue > > > > > > All buffers and ring info should be written before tail register update. > > > This patch relocates the write memory barrier before updating tail > > > register to avoid potential issues. > > > > > > Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions") > > > Cc: sta...@dpdk.org > > > > > > Signed-off-by: Xiaoyun Li > > > --- > > > v2: > > > * Replaced rte_wmb with rte_io_wmb since rte_io_wmb is enough. > > > --- > > > drivers/raw/ntb/ntb.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c index > > > ad7f6abfd..c7de86f36 100644 > > > --- a/drivers/raw/ntb/ntb.c > > > +++ b/drivers/raw/ntb/ntb.c > > > @@ -683,8 +683,8 @@ ntb_enqueue_bufs(struct rte_rawdev *dev, > > > sizeof(struct ntb_used) * nb1); > > > rte_memcpy(txq->tx_used_ring, tx_used + nb1, > > > sizeof(struct ntb_used) * nb2); > > > + rte_io_wmb(); > > As both txq->tx_used_ring and *txq->used_cnt physically reside on the PCI > > device side, rte_io_wmb is correct to ensure the ordering. 
> > > > > *txq->used_cnt = txq->last_used; > > > - rte_wmb(); > > > > > > /* update queue stats */ > > > hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes; @@ - > 789,8 > > +789,8 @@ > > > ntb_dequeue_bufs(struct rte_rawdev *dev, > > > sizeof(struct ntb_desc) * nb1); > > > rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1, > > > sizeof(struct ntb_desc) * nb2); > > > + rte_io_wmb(); > > > *rxq->avail_cnt = rxq->last_avail; > > > - rte_wmb(); > > > > > > /* update queue stats */ > > > off = NTB_XSTATS_NUM * ((size_t)context + 1); > > > -- > > > 2.17.1 > > > > Reviewed-by: Gavin Hu
Re: [dpdk-dev] [PATCH] net/ixgbe: add or remove MAC address
Hi Xiaolong > -Original Message- > From: Ye, Xiaolong > Sent: Monday, December 23, 2019 3:25 PM > To: Sun, GuinanX > Cc: dev@dpdk.org; Lu, Wenzhuo ; Yang, Qiming > ; Xing, Beilei > Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: add or remove MAC address > > Hi, guinan > > For the title, better to use > > Add support for vf MAC address add and remove > > or something like so. I agree with you and I will fix it in V2's patch > > On 12/03, Guinan Sun wrote: > >Ixgbe PMD pf host code needs to support ixgbevf mac address add and > >remove. For this purpose, a response was added between pf and vf to > >update the mac address. > > Does this mean each one vf can have multiple MAC addresses after this patch, > or > this `add` actually means to replace the old mac address with the new one? It means each one vf can have multiple MAC addresses after this patch > > > > >Signed-off-by: Guinan Sun > >--- > > drivers/net/ixgbe/ixgbe_ethdev.h | 1 + > > drivers/net/ixgbe/ixgbe_pf.c | 35 > > 2 files changed, 36 insertions(+) > > > >diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h > >b/drivers/net/ixgbe/ixgbe_ethdev.h > >index 76a1b9d18..e1cd8fd16 100644 > >--- a/drivers/net/ixgbe/ixgbe_ethdev.h > >+++ b/drivers/net/ixgbe/ixgbe_ethdev.h > >@@ -270,6 +270,7 @@ struct ixgbe_vf_info { > > uint8_t api_version; > > uint16_t switch_domain_id; > > uint16_t xcast_mode; > >+uint16_t mac_count; > > How is this mac_count initialized? This variable is initialized to 0 when the ixgbe_adapter structure is initialized, and the method is similar to vlan_count. > > > }; > > > > /* > >diff --git a/drivers/net/ixgbe/ixgbe_pf.c > >b/drivers/net/ixgbe/ixgbe_pf.c index d0d85e138..76dbed2ab 100644 > >--- a/drivers/net/ixgbe/ixgbe_pf.c > >+++ b/drivers/net/ixgbe/ixgbe_pf.c > >@@ -748,6 +748,37 @@ ixgbe_set_vf_mc_promisc(struct rte_eth_dev *dev, > uint32_t vf, uint32_t *msgbuf) > > return 0; > > } > > > >+static int > >+ixgbe_set_vf_macvlan_msg(struct rte_eth_dev *dev, uint32_t vf, > >+uint32_t *msgbuf) { > >+struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data- > >dev_private); > >+struct ixgbe_vf_info *vf_info = > >+*(IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data- > >dev_private)); > >+uint8_t *new_mac = (uint8_t *)(&msgbuf[1]); > >+int index = (msgbuf[0] & IXGBE_VT_MSGINFO_MASK) >> > >+IXGBE_VT_MSGINFO_SHIFT; > >+ > >+if (index) { > >+if (!rte_is_valid_assigned_ether_addr( > >+(struct rte_ether_addr *)new_mac)) { > >+RTE_LOG(ERR, PMD, "set invalid mac vf:%d\n", vf); > >+return -1; > >+} > >+ > >+if (new_mac == NULL) > >+return -1; > > I feel the null check should be in front of valid ether addr check, otherwise > there > might be null pointer reference issue. Ok,I will fix it in V2's patch. > > Thanks, > Xiaolong > > >+ > >+vf_info[vf].mac_count++; > >+ > >+hw->mac.ops.set_rar(hw, vf_info[vf].mac_count, > >+new_mac, vf, IXGBE_RAH_AV); > >+} else { > >+hw->mac.ops.clear_rar(hw, vf_info[vf].mac_count); > >+vf_info[vf].mac_count = 0; > >+} > >+return 0; > >+} > >+ > > static int > > ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t vf) { @@ > >-835,6 +866,10 @@ ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t > vf) > > if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED) > > retval = ixgbe_set_vf_mc_promisc(dev, vf, msgbuf); > > break; > >+case IXGBE_VF_SET_MACVLAN: > >+if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED) > >+retval = ixgbe_set_vf_macvlan_msg(dev, vf, msgbuf); > >+break; > > default: > > PMD_DRV_LOG(DEBUG, "Unhandled Msg %8.8x", > (unsigned)msgbuf[0]); > > retval = IXGBE_ERR_MBX; > >-- > >2.17.1 > >
Re: [dpdk-dev] [PATCH v6 1/6] lib/eal: implement the family of rte bit operation APIs
On Sat, Dec 21, 2019 at 9:37 PM Honnappa Nagarahalli wrote:
>
> > > +__rte_experimental
> > > +static inline uint32_t
> > > +rte_get_bit32_relaxed(unsigned int nr, uint32_t *addr) {
> > Why not pass the memory order as a parameter? It would reduce the number
> > of API calls by half.
> I think these APIs should be modelled according to C11 __atomic_xxx APIs.
> Otherwise, the programmers have to understand another interface. It will also
> help reduce the number of APIs.
> Converting these into macros will help remove the size based duplication of
> APIs. I came up with the following macro:
>
> #define RTE_GET_BIT(nr, var, ret, memorder) \
> ({ \
>	if (sizeof(var) == sizeof(uint32_t)) { \
>		uint32_t mask1 = 1U << (nr)%32; \
>		ret = __atomic_load_n(&var, (memorder)) & mask1;\
>	} \
>	else {\
>		uint64_t mask2 = 1UL << (nr)%64;\
>		ret = __atomic_load_n(&var, (memorder)) & mask2;\
>	} \
> })
>
> The '%' is required to avoid a compiler warning/error. But the '%' operation
> will get removed by the compiler since 'nr' is a constant.
> IMO, the macro itself is not complex and should not be a pain for debugging.
>
> Currently, we have 20 APIs in this patch (possibly more coming in the future
> and creating an explosion with memory order/size combinations). The above
> macro will reduce this to 5 macros without further explosion in number of
> combinations.
>
> Any thoughts? What do others think?

# I think the most common use case for register manipulation is getting/setting
of "fields" (sets of consecutive bits). IMO, the Linux kernel bit manipulation
APIs make more sense. At least have an implementation similar to FIELD_GET()
and FIELD_SET().
https://github.com/torvalds/linux/blob/master/include/linux/bitfield.h
# FIELD_GET will be the superset API. A single bit can be got with width = 1.
# I think it is good to have two versions of the Macro/API:
RTE_FIELD_GET (without atomics) and RTE_FIELD_GET_ATOMIC (with C11 atomics).
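For reference, a minimal sketch of what the suggested RTE_FIELD_GET()/RTE_FIELD_GET_ATOMIC()
accessors could look like. The implementation details, including the use of
__builtin_ctzll() to derive the shift, are assumptions for illustration only;
this is not an existing DPDK API:

#include <stdint.h>

/* 'mask' is a contiguous bit mask, e.g. 0x0000ff00; its trailing zero
 * count gives the field's shift. */
#define RTE_FIELD_SHIFT(mask)	(__builtin_ctzll(mask))

/* Extract a field: (reg & mask) >> shift. A single bit is a width-1 field. */
#define RTE_FIELD_GET(mask, reg) \
	(((reg) & (mask)) >> RTE_FIELD_SHIFT(mask))

/* Insert a field value into reg without touching the other bits. */
#define RTE_FIELD_SET(mask, reg, val) \
	(((reg) & ~(uint64_t)(mask)) | \
	 (((uint64_t)(val) << RTE_FIELD_SHIFT(mask)) & (mask)))

/* C11-atomic flavour, mirroring the __atomic_load_n() interface. */
#define RTE_FIELD_GET_ATOMIC(mask, addr, memorder) \
	RTE_FIELD_GET((mask), __atomic_load_n((addr), (memorder)))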
Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
Hi Jerin, I think we are on the same page with regard to the problem, and the situations, thanks for illuminating the historical background of the two barriers. About the solution, I added inline comments. > -Original Message- > From: Jerin Jacob > Sent: Friday, December 20, 2019 2:56 PM > To: Gavin Hu > Cc: dpdk-dev ; nd ; David Marchand > ; tho...@monjalon.net; > rasl...@mellanox.com; maxime.coque...@redhat.com; > tiwei@intel.com; hemant.agra...@nxp.com; jer...@marvell.com; > Pavan Nikhilesh ; Honnappa Nagarahalli > ; Ruifeng Wang > ; Phil Yang ; Joyce Kong > ; Steve Capper > Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for > aarch64 > > On Fri, Dec 20, 2019 at 12:02 PM Gavin Hu wrote: > > > > Hi Jerin, > > Hi Gavin, > > > > > > > > > > > > > > > > The peripheral coherence order for a memory-mapped peripheral > > > > > signifies the > > > > > > > order in which accesses arrive at the endpoint. For a read or a > write > > > > > RW1 > > > > > > > and a read or a write RW2 to the same peripheral, then RW1 will > > > appear > > > > > in > > > > > > > the peripheral coherence order for the peripheral before RW2 if > > > either > > > > > of > > > > > > > the following cases apply: > > > > > > > 1. RW1 and RW2 are accesses using Non-cacheable or Device > > > attributes > > > > > and > > > > > > > RW1 is Ordered-before RW2. > > > > > > > 2. RW1 and RW2 are accesses using Device-nGnRE or Device- > > > nGnRnE > > > > > attributes > > > > > > > and RW1 appears in program order before RW2. > > > > > > > > > > > > > > > > > > This is true if RW1 and RW2 addresses are device memory. i.e the > > > > > > registers in the PCI bar address. > > > > > > If RW1 is DDR address which is been used by the controller(say NIC > > > > > > ring descriptor) then there will be an issue. > > > > > > For example Intel i40e driver, the admin queue update in Host DDR > > > > > > memory and it updates the doorbell. > > > > > > In such a case, this patch will create an issue. Correct? Have you > > > > > > checked this patch with ARM64 + XL710 controllers? > > > > > > > > This patch relaxes the rte_io_*mb barriers for pure PCI device memory > > > accesses. > > > > > > Yes. This would break cases for mixed access fro i40e drivers. > > > > > > > > > > > For mixed accesses of DDR and PCI device memory, > rte_smp_*mb(DMB > > > ISH) is not sufficient. > > > > But rte_cio_*mb(DMB OSH) is sufficient and can be used. > > > > > > Yes. Let me share a bit of history. > > > > > > 1) There are a lot of drivers(initially developed in x86) that have > > > mixed access and don't have any barriers as x86 does not need it. > > > 2) rte_io introduced to fix that > > > 3) Item (2) introduced the performance issues in the fast path as an > > > optimization rte_cio_* introduced. > > Exactly, this patch is to mitigate the performance issues introduced by > rte_io('dsb' is too much and unnecessary here). > > Rte_cio instead is definitely required for mixed access. > > > > > > So in the current of the scheme of things, we have APIs to FIX > > > portability issue(rte_io) and performance issue(rte_cio). > > > IMO, we may not need any change in infra code now. If you think, the > > > documentation is missing then we can enhance it. > > > If we make infra change then again drivers needs to be updated and > tested. > > No changes for rte_cio, the semantics, and definitions of rte_io does not > change either, if limited the scope to PCI, which is the case in DPDK > context(?). 
> > The change lies only in the implementation, right? > > > > Just looked at the link you shared and found i40 driver is missing > rte_cio_*mb in i40e_asq_send_command, but the old rte_io_*mb rescued. > > Will submit a new patch in this series to used rte_cio together with new > relaxed rte_io and do more tests. > > > > Yes, this is a big change, also a big optimization, for aarch64, in our > > tests it has very positive results. > > It will be optimization only when if we are changing in the fast path. > In the slow path, it does not matter. > I think, the First step should be to use rte_cio_* wherever it is > coherent memory used in _fast path_. I think, Almost every driver > fixed that. > > I am not against this patch(changing the slow path to use rte_cio* > from rte_io* and virtio changes associated with that). > If you are taking that patch, pay attention to all the drivers in the > tree which is using rte_io* for mixed access in slowpath. I see 30+ drivers calling rte_io* directly or indirectly through rte_write/read*. It is hard for me to figure out all the mixed accesses in these drivers, and as you said, it makes no sense to change the _slow path_. How about we keep the old rte_io as is, and introduce a 'fast path' version of rte_io for new code to use? Then in the future, we may merge the two? Another reason for this proposal is that there may be rte_io calls in the fast path which are not mixed accesses, so rte_cio is not suitable.
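To make the mixed-access case concrete, below is a hypothetical TX tail-update
fragment (illustrative only, not taken from any driver; the fake_txq layout is
an assumption) where the descriptor ring sits in coherent DDR and the doorbell
is a PCI BAR register. This is the pattern where rte_cio_wmb() is the cheapest
sufficient barrier, and where a barrier relaxed to device-memory-only ordering
would not be enough:

#include <stdint.h>
#include <rte_atomic.h>
#include <rte_io.h>

struct fake_txq {			/* hypothetical queue layout */
	uint64_t *ring;			/* descriptor ring in coherent DDR */
	volatile void *doorbell;	/* tail register in a PCI BAR */
};

static inline void
txq_commit(struct fake_txq *q, uint16_t tail, uint64_t desc)
{
	q->ring[tail] = desc;		/* 1. descriptor write to DDR */
	rte_cio_wmb();			/* 2. order the DDR write before the
					 *    doorbell (DMB OSHST on aarch64) */
	rte_write32_relaxed(tail, q->doorbell);	/* 3. ring the doorbell */
}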
[dpdk-dev] [PATCH v2] net/ixgbe: add support for VF MAC address add and remove
Ixgbe PMD pf host code needs to support ixgbevf mac address add and remove. For this purpose, a response was added between pf and vf to update the mac address. Signed-off-by: Guinan Sun --- v2: * Changed the title of commit message. * Checked null in front of valid ether addr check. --- drivers/net/ixgbe/ixgbe_ethdev.h | 1 + drivers/net/ixgbe/ixgbe_pf.c | 35 2 files changed, 36 insertions(+) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h index 76a1b9d18..e1cd8fd16 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.h +++ b/drivers/net/ixgbe/ixgbe_ethdev.h @@ -270,6 +270,7 @@ struct ixgbe_vf_info { uint8_t api_version; uint16_t switch_domain_id; uint16_t xcast_mode; + uint16_t mac_count; }; /* diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c index d0d85e138..78fc8c5f1 100644 --- a/drivers/net/ixgbe/ixgbe_pf.c +++ b/drivers/net/ixgbe/ixgbe_pf.c @@ -748,6 +748,37 @@ ixgbe_set_vf_mc_promisc(struct rte_eth_dev *dev, uint32_t vf, uint32_t *msgbuf) return 0; } +static int +ixgbe_set_vf_macvlan_msg(struct rte_eth_dev *dev, uint32_t vf, uint32_t *msgbuf) +{ + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); + struct ixgbe_vf_info *vf_info = + *(IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private)); + uint8_t *new_mac = (uint8_t *)(&msgbuf[1]); + int index = (msgbuf[0] & IXGBE_VT_MSGINFO_MASK) >> + IXGBE_VT_MSGINFO_SHIFT; + + if (index) { + if (new_mac == NULL) + return -1; + + if (!rte_is_valid_assigned_ether_addr( + (struct rte_ether_addr *)new_mac)) { + RTE_LOG(ERR, PMD, "set invalid mac vf:%d\n", vf); + return -1; + } + + vf_info[vf].mac_count++; + + hw->mac.ops.set_rar(hw, vf_info[vf].mac_count, + new_mac, vf, IXGBE_RAH_AV); + } else { + hw->mac.ops.clear_rar(hw, vf_info[vf].mac_count); + vf_info[vf].mac_count = 0; + } + return 0; +} + static int ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t vf) { @@ -835,6 +866,10 @@ ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t vf) if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED) retval = ixgbe_set_vf_mc_promisc(dev, vf, msgbuf); break; + case IXGBE_VF_SET_MACVLAN: + if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED) + retval = ixgbe_set_vf_macvlan_msg(dev, vf, msgbuf); + break; default: PMD_DRV_LOG(DEBUG, "Unhandled Msg %8.8x", (unsigned)msgbuf[0]); retval = IXGBE_ERR_MBX; -- 2.17.1
Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
On Mon, Dec 23, 2019 at 2:44 PM Gavin Hu wrote: > > Hi Jerin, Hi Gavin, > > I think we are on the same page with regard to the problem, and the > situations, thanks for illuminating the historical background of the two > barriers. > About the solution, I added inline comments. > > It will be optimization only when if we are changing in the fast path. > > In the slow path, it does not matter. > > I think, the First step should be to use rte_cio_* wherever it is > > coherent memory used in _fast path_. I think, Almost every driver > > fixed that. > > > > I am not against this patch(changing the slow path to use rte_cio* > > from rte_io* and virtio changes associated with that). > > If you are taking that patch, pay attention to all the drivers in the > > tree which is using rte_io* for mixed access in slowpath. > I see 30+ drivers has calling rte_io* directly or indirectly through > rte_write/read*. > It is hard for me to figure out all the mixed accesses in these drivers, and > as you said, it makes no sense to change the _slow path_. > > How about we keep the old rte_io as is, and introduce 'fast path' version of > rte_io for new code use? > Then in future, we may merge the two? > Another reason about this proposal is maybe there is rte_io calling in the > fast path, but they are not mixed accesses and rte_cio is not suitable. Could you share more details about the case where fastpath + rte_io needed + rte_cio is not suitable? > > Any thoughts? > > > > > > But as the case in i40e, we must pay attention to where rte_cio was > > missing but rescued by old rte_io(but not by new rte_io). > > > > > >
Re: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue
Hi Still, stability and correctness are much more important than performance. As I said, with WC can benefit more than 20X perf. Comparing to this benefit, the difference between rte_wmb and rte_io_wmb is not that important. And in my test, the performance is not that bad with rte_wmb especially with large packets which are the normal use cases. BTW, I've searched linux kernel codes and don't see any NTB device on arm platform. So I don't think you need to consider the perf hurt to arm. Best Regards Xiaoyun Li > -Original Message- > From: Gavin Hu [mailto:gavin...@arm.com] > Sent: Monday, December 23, 2019 16:38 > To: Li, Xiaoyun ; Wu, Jingjing > Cc: dev@dpdk.org; Maslekar, Omkar ; > sta...@dpdk.org; nd ; jer...@marvell.com; Honnappa > Nagarahalli ; Richardson, Bruce > ; nd > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier issue > > Hi Xiaoyun, > > > -Original Message- > > From: Li, Xiaoyun > > Sent: Monday, December 23, 2019 3:52 PM > > To: Gavin Hu ; Wu, Jingjing > > Cc: dev@dpdk.org; Maslekar, Omkar ; > > sta...@dpdk.org; nd > > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier > > issue > > > > Hi > > I reconsidered and retested about this issue. > > I still need to use rte_wmb instead of using rte_io_wmb. > > > > Because to achieve high performance, ntb needs to turn on WC(write > > combining) feature. The perf difference with and without WC enabled is > > more than 20X. > > And when WC enabled, rte_io_wmb cannot make sure the instructions are > > in order only rte_wmb can make sure that. > > > > And in my retest, when sending 64 bytes packets, using rte_io_wmb will > > cause out-of-order issue and cause memory corruption on rx side. > > And using rte_wmb is fine. > That's true, as it is declared as 'write combine' region, even x86 is known as > strong ordered, it is the interconnect or PCI RC may do the reordering, 'write > combine', 'write coalescing', which caused this problem. > IMO, rte_io_*mb barriers on x86 should be promoted to stronger is WC is > involved(but that will sap performance for non-WC memories?). > https://code.dpdk.org/dpdk/latest/source/lib/librte_eal/common/include/arch/ > x86/rte_atomic.h#L78 > > Using rte_wmb will hurt performance for aarch64 also, as pci device memory > accesses to a single device are strongly ordered therefore the strongest > rte_wmb is not necessary. > > So I can only use v1 patch and suspend v2 patch in patchwork. > > > > Best Regards > > Xiaoyun Li > > > > > -Original Message- > > > From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com] > > > Sent: Monday, December 16, 2019 18:50 > > > To: Li, Xiaoyun ; Wu, Jingjing > > > > > Cc: dev@dpdk.org; Maslekar, Omkar ; > > > sta...@dpdk.org; nd > > > Subject: RE: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier > > issue > > > > > > > > > > > > > -Original Message- > > > > From: dev On Behalf Of Xiaoyun Li > > > > Sent: Monday, December 16, 2019 9:59 AM > > > > To: jingjing...@intel.com > > > > Cc: dev@dpdk.org; omkar.masle...@intel.com; Xiaoyun Li > > > > ; sta...@dpdk.org > > > > Subject: [dpdk-dev] [PATCH v2] raw/ntb: fix write memory barrier > > > > issue > > > > > > > > All buffers and ring info should be written before tail register update. > > > > This patch relocates the write memory barrier before updating tail > > > > register to avoid potential issues. 
> > > > > > > > Fixes: 11b5c7daf019 ("raw/ntb: add enqueue and dequeue functions") > > > > Cc: sta...@dpdk.org > > > > > > > > Signed-off-by: Xiaoyun Li > > > > --- > > > > v2: > > > > * Replaced rte_wmb with rte_io_wmb since rte_io_wmb is enough. > > > > --- > > > > drivers/raw/ntb/ntb.c | 4 ++-- > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c index > > > > ad7f6abfd..c7de86f36 100644 > > > > --- a/drivers/raw/ntb/ntb.c > > > > +++ b/drivers/raw/ntb/ntb.c > > > > @@ -683,8 +683,8 @@ ntb_enqueue_bufs(struct rte_rawdev *dev, > > > >sizeof(struct ntb_used) * nb1); > > > > rte_memcpy(txq->tx_used_ring, tx_used + nb1, > > > >sizeof(struct ntb_used) * nb2); > > > > + rte_io_wmb(); > > > As both txq->tx_used_ring and *txq->used_cnt are physically reside > > > in the > > PCI > > > device side, rte_io_wmb is correct to ensure the ordering. > > > > > > > *txq->used_cnt = txq->last_used; > > > > - rte_wmb(); > > > > > > > > /* update queue stats */ > > > > hw->ntb_xstats[NTB_TX_BYTES_ID + off] += bytes; @@ - > > 789,8 > > > +789,8 @@ > > > > ntb_dequeue_bufs(struct rte_rawdev *dev, > > > >sizeof(struct ntb_desc) * nb1); > > > > rte_memcpy(rxq->rx_desc_ring, rx_desc + nb1, > > > >sizeof(struct ntb_desc) * nb2); > > > > + rte_io_wmb(); > > > >
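For readers following the thread, the ordering hazard being discussed boils
down to the fragment below (illustrative only; wc_ring, descs and tail_reg are
placeholders, not the actual ntb.c code). On x86, rte_wmb() maps to sfence,
which drains the write-combining buffers, whereas rte_io_wmb() is only a
compiler barrier there, so with a WC-mapped ring the tail update can become
visible to the peer before the copied descriptors:

	/* ring and tail register both live in a WC-mapped PCI BAR */
	rte_memcpy(wc_ring, descs, n * sizeof(*descs));	/* buffered in WC */
	rte_wmb();		/* sfence on x86: flush the WC buffers */
	*tail_reg = new_tail;	/* only now expose the new tail */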
Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
Hi Jerin, > -Original Message- > From: Jerin Jacob > Sent: Monday, December 23, 2019 5:20 PM > To: Gavin Hu > Cc: dpdk-dev ; nd ; David Marchand > ; tho...@monjalon.net; > rasl...@mellanox.com; maxime.coque...@redhat.com; > tiwei@intel.com; hemant.agra...@nxp.com; jer...@marvell.com; > Pavan Nikhilesh ; Honnappa Nagarahalli > ; Ruifeng Wang > ; Phil Yang ; Joyce Kong > ; Steve Capper > Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for > aarch64 > > On Mon, Dec 23, 2019 at 2:44 PM Gavin Hu wrote: > > > > Hi Jerin, > > Hi Gavin, > > > > > I think we are on the same page with regard to the problem, and the > situations, thanks for illuminating the historical background of the two > barriers. > > About the solution, I added inline comments. > > > It will be optimization only when if we are changing in the fast path. > > > In the slow path, it does not matter. > > > I think, the First step should be to use rte_cio_* wherever it is > > > coherent memory used in _fast path_. I think, Almost every driver > > > fixed that. > > > > > > I am not against this patch(changing the slow path to use rte_cio* > > > from rte_io* and virtio changes associated with that). > > > If you are taking that patch, pay attention to all the drivers in the > > > tree which is using rte_io* for mixed access in slowpath. > > I see 30+ drivers has calling rte_io* directly or indirectly through > rte_write/read*. > > It is hard for me to figure out all the mixed accesses in these drivers, and > as you said, it makes no sense to change the _slow path_. > > > > How about we keep the old rte_io as is, and introduce 'fast path' version > of rte_io for new code use? > > Then in future, we may merge the two? > > Another reason about this proposal is maybe there is rte_io calling in the > fast path, but they are not mixed accesses and rte_cio is not suitable. > > Could you share more details about the case where fastpath + rte_io > needed + rte_cio is not suitable? Here is an example for i40e, in the fast path, but only a pure io memory access. https://code.dpdk.org/dpdk/v19.11/source/drivers/net/i40e/i40e_rxtx.c#L1208 I wanted two variants of rte_io, because also x86 requires two as indicated here, one for no-WC and another for WC. http://inbox.dpdk.org/dev/20191204151916.12607-1-xiaoyun...@intel.com/T/#ea8bb1b4a378ab09baedbf95b4542bcb92f4a396f > > > > > Any thoughts? > > > > > > > > > But as the case in i40e, we must pay attention to where rte_cio was > > > missing but rescued by old rte_io(but not by new rte_io). > > > > > > > >
[dpdk-dev] [PATCH v2] build: explicitly enable sse4 for meson builds
If the compiler does not recognise the specific CPU when building with the
default "native" machine type, sse4.2 instructions can be missing, causing a
build error. Rather than advising the user to change the machine type, we can
just turn on SSE4.2 directly. This can prevent issues with running automated
tests with older compilers/distros on newer hardware.

Signed-off-by: Bruce Richardson
---
V2: insert missing quotes around "-msse4"
---
 config/x86/meson.build | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/config/x86/meson.build b/config/x86/meson.build
index 8b0fa3e6f..adc857ba2 100644
--- a/config/x86/meson.build
+++ b/config/x86/meson.build
@@ -15,11 +15,9 @@ if not is_windows
 endif
 
 # we require SSE4.2 for DPDK
-sse_errormsg = '''SSE4.2 instruction set is required for DPDK.
-Please set the machine type to "nehalem" or "corei7" or higher value'''
-
 if cc.get_define('__SSE4_2__', args: machine_args) == ''
-	error(sse_errormsg)
+	message('SSE 4.2 not enabled by default, explicitly enabling')
+	machine_args += '-msse4'
 endif
 
 base_flags = ['SSE', 'SSE2', 'SSE3','SSSE3', 'SSE4_1', 'SSE4_2']
--
2.24.1
[dpdk-dev] Inconsistent behavior of mempool with regards to hugepage allocation
Hi,

I'm not sure if this is a bug, but I've seen an inconsistency in the behavior
of DPDK with regards to hugepage allocation for rte_mempool. Basically, for
the same mempool size, the number of hugepages allocated changes from run to
run.

Here's how I reproduce with DPDK 19.11. IOVA=pa (default)

1. Reserve 16x1G hugepages on socket 0
2. Replace examples/skeleton/basicfwd.c with the code below, build and run
   make && ./build/basicfwd
3. At the same time, watch the number of hugepages allocated:
   "watch -n.1 ls /dev/hugepages"
4. Repeat step 2

If you can reproduce, you should see that for some runs, DPDK allocates 5
hugepages, other times it allocates 6. When it allocates 6, if you watch the
output from step 3, you will see that DPDK first tries to allocate 5
hugepages, then unmaps all 5, retries, and gets 6.

For our use case, it's important that DPDK allocates the same number of
hugepages on every run so we can get reproducible results.

Studying the code, this seems to be the behavior of
rte_mempool_populate_default(). If I understand correctly, if the first try
fails to get 5 IOVA-contiguous pages, it retries, relaxing the IOVA-contiguous
condition, and eventually winds up with 6 hugepages.

Questions:
1. Why does the API sometimes fail to get IOVA-contiguous memory when
   hugepage memory is abundant?
2. Why does the 2nd retry need N+1 hugepages?

Some insights for Q1: From my experiments, it seems like the IOVA of the first
hugepage is not guaranteed to be at the start of the IOVA space
(understandably). That could explain the retry when the IOVA of the first
hugepage is near the end of the IOVA space. But I have also seen a situation
where the 1st hugepage is near the beginning of the IOVA space and it still
failed the 1st time.

Here's the code:

#include <stdio.h>
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_debug.h>
#include <rte_mbuf.h>

int main(int argc, char *argv[])
{
	struct rte_mempool *mbuf_pool;
	unsigned mbuf_pool_size = 2097151;

	int ret = rte_eal_init(argc, argv);
	if (ret < 0)
		rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");

	printf("Creating mbuf pool size=%u\n", mbuf_pool_size);
	mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", mbuf_pool_size,
		256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, 0);
	printf("mbuf_pool %p\n", mbuf_pool);

	return 0;
}

Best regards,
BL
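As a rough sanity check on why this particular pool lands right at the 5-vs-6
page boundary, the per-object footprint can be estimated with
rte_mempool_calc_obj_size(). This is only a sketch: it counts the objects
themselves and ignores per-chunk headers and the ring, so the real figure is
somewhat higher:

#include <rte_mbuf.h>
#include <rte_mempool.h>

static uint64_t
estimate_pool_obj_bytes(unsigned int n, uint16_t priv_size, uint16_t data_room)
{
	struct rte_mempool_objsz objsz;
	uint32_t elt_size = sizeof(struct rte_mbuf) + priv_size + data_room;
	uint32_t obj_size = rte_mempool_calc_obj_size(elt_size, 0, &objsz);

	return (uint64_t)n * obj_size;	/* objects only, no chunk/ring overhead */
}

For 2097151 mbufs of RTE_MBUF_DEFAULT_BUF_SIZE this works out to roughly 5 GB,
i.e. right at the edge between five and six 1 GB pages, which is why the
IOVA-contiguity retry can tip the allocation over to one extra page.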
Re: [dpdk-dev] [EXT] Re: [PATCH] ethdev: allow multiple security sessions to use one rte flow
> > >>> The rte_security API which enables inline protocol/crypto > > >>> feature mandates that for every security session an rte_flow > > >>> is > > >> created. > > >>> This would internally translate to a rule in the hardware > > >>> which would do packet classification. > > >>> > > >>> In rte_securty, one SA would be one security session. And if > > >>> an rte_flow need to be created for every session, the number > > >>> of SAs supported by an inline implementation would be > > >>> limited by the number of rte_flows the PMD would be able to > > support. > > >>> > > >>> If the fields SPI & IP addresses are allowed to be a range, > > >>> then this limitation can be overcome. Multiple flows will be > > >>> able to use one rule for SECURITY processing. In this case, > > >>> the security session provided as conf would be NULL. > > >> Wonder what will be the usage model for it? > > >> AFAIK, RFC 4301 clearly states that either SPI value alone > > >> or in conjunction with dst (and src) IP should clearly > > >> identify SA for inbound SAD > > lookup. > > >> Am I missing something obvious here? > > > [Anoob] Existing SECURITY action type requires application to > > > create an 'rte_flow' per SA, which is not really required if > > > h/w can use SPI to uniquely > > identify the security session/SA. > > > Existing rte_flow usage: IP (dst,src) + ESP + SPI -> security > > > processing enabled on one security session (ie on SA) > > > > > > The above rule would uniquely identify packets for an SA. But > > > with the above usage, we would quickly exhaust entries > > > available in h/w lookup tables (which are limited on our > > > hardware). But if h/w can use SPI field to index > > into a table (for example), then the above requirement of one > > rte_flow per SA is not required. > > > Proposed rte_flow usage: IP (any) + ESP + SPI (any) -> > > > security processing enabled on all ESP packets > > >> So this means that SA will be indexed only by spi? What about > > >> SA's which are indexed by SPI+DIP+SIP? > > > Now h/w could use SPI to index into a pre-populated table to > > > get security session. Please do note that, SPI is not ignored > > > during the actual > > lookup. Just that it is not used while creating 'rte_flow'. > > > > And this table will be prepopulated by user and pointer to it > > will be somehow passed via rte_flow API? > > If yes, then what would be the mechanism? > > >>> [Anoob] I'm not sure what exactly you meant by user. But may be > > >>> I'll explain > > >> how it's done in OCTEONTX2 PMD. > > >>> The application would create security_session for every SA. SPI > > >>> etc would be > > >> available to PMD (in conf) when the session is created. Now the > > >> PMD would populate SA related params in a specific location that > > >> h/w would access. This memory is allocated during device > > >> configure and h/w would have the pointer after the initialization is > > done. > > >> If memory is allocated during device configure what is upper > > >> limit for number of sessions? What if app needs more? > > >>> PMD uses SPI as index to write into specific locations(during > > >>> session create) > > >> and h/w would use it when it sees an ESP packet eligible for > > >> SECURITY (in receive path, per packet). As long as session > > >> creation could populate at memory locations that h/w would look > > >> at, this scheme would > > work. > > > [Anoob] Yes. But we need to allow application to control the h/w > > > ipsec > > processing as well. 
Let's say, application wants to handle a > > specific SPI range in lookaside mode (may be because of unsupported > > capabilities?), in that case having rte_flow will help in fine > > tuning how the > > >> h/w packet steering happens. > > Also, rte_flow enables H/w parsing on incoming packets. This info > > is useful even after IPsec processing is complete. Or if > > application wants to give higher priority to a range of SPIs, > > rte_flow would allow doing > > >> so. > > >> What algorithm of indexing by SPI is there? Could I use any > > >> arbitrary SPI? If some kind of hashing is used, what about > > >> collisions? > > > [Anoob] That is implementation dependent. In our PMD, we map it > > > one > > >> to one. > > As in, SPI is used as index in the table. > > So, as far as you are mapping one to one and using SPI as an index, > > a lot of memory is wasted in the table for unused SPI's. Also, you > > are not able to have a table with 2^32 sessions
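For what it's worth, the proposed catch-all usage would look roughly like the
sketch below (untested; port_id is assumed to be a valid ethdev port). The
point is that no per-SA session is attached to the rule, so a single rule can
cover any SPI:

	uint16_t port_id = 0;	/* assumed */
	struct rte_flow_error err;
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },	/* no spec/mask: any addresses */
		{ .type = RTE_FLOW_ITEM_TYPE_ESP },	/* no spec/mask: any SPI */
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_security sec = { .security_session = NULL };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_SECURITY, .conf = &sec },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern,
						actions, &err);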
Re: [dpdk-dev] [PATCH 14/14] examples/ipsec-secgw: add cmd line option for bufs
> > Add command line option -s which can be used to configure number > of buffers in a pool. Default number of buffers is 8192. > > Signed-off-by: Anoob Joseph > Signed-off-by: Lukasz Bartosik > --- > examples/ipsec-secgw/ipsec-secgw.c | 23 +++ > 1 file changed, 19 insertions(+), 4 deletions(-) > > diff --git a/examples/ipsec-secgw/ipsec-secgw.c > b/examples/ipsec-secgw/ipsec-secgw.c > index 76719f2..f8e28d6 100644 > --- a/examples/ipsec-secgw/ipsec-secgw.c > +++ b/examples/ipsec-secgw/ipsec-secgw.c > @@ -59,8 +59,6 @@ volatile bool force_quit; > > #define MEMPOOL_CACHE_SIZE 256 > > -#define NB_MBUF (32000) > - > #define CDEV_QUEUE_DESC 2048 > #define CDEV_MAP_ENTRIES 16384 > #define CDEV_MP_NB_OBJS 1024 > @@ -167,6 +165,7 @@ static int32_t numa_on = 1; /**< NUMA is enabled by > default. */ > static uint32_t nb_lcores; > static uint32_t single_sa; > static uint32_t single_sa_idx; > +static uint32_t nb_bufs_in_pool = 8192; Why to change the default number (behavior) here? Why not to keep existing one as default? > > /* > * RX/TX HW offload capabilities to enable/use on ethernet ports. > @@ -1261,6 +1260,7 @@ print_usage(const char *prgname) > " [-w REPLAY_WINDOW_SIZE]" > " [-e]" > " [-a]" > + " [-s NUMBER_OF_MBUFS_IN_PKT_POOL]" > " -f CONFIG_FILE" > " --config (port,queue,lcore)[,(port,queue,lcore)]" > " [--single-sa SAIDX]" > @@ -1284,6 +1284,7 @@ print_usage(const char *prgname) > " size for each SA\n" > " -e enables ESN\n" > " -a enables SA SQN atomic behaviour\n" > + " -s number of mbufs in packet pool (default 8192)\n" > " -f CONFIG_FILE: Configuration file\n" > " --config (port,queue,lcore): Rx queue configuration\n" > " --single-sa SAIDX: Use single SA index for outbound > traffic,\n" > @@ -1534,7 +1535,7 @@ parse_args(int32_t argc, char **argv, struct eh_conf > *eh_conf) > > argvopt = argv; > > - while ((opt = getopt_long(argc, argvopt, "aelp:Pu:f:j:w:", > + while ((opt = getopt_long(argc, argvopt, "aelp:Pu:f:j:w:s:", > lgopts, &option_index)) != EOF) { > > switch (opt) { > @@ -1568,6 +1569,19 @@ parse_args(int32_t argc, char **argv, struct eh_conf > *eh_conf) > cfgfile = optarg; > f_present = 1; > break; > + > + case 's': > + ret = parse_decimal(optarg); > + if (ret < 0) { > + printf("Invalid number of buffers in a pool: " > + "%s\n", optarg); > + print_usage(prgname); > + return -1; > + } > + > + nb_bufs_in_pool = ret; > + break; > + > case 'j': > ret = parse_decimal(optarg); > if (ret < RTE_MBUF_DEFAULT_BUF_SIZE || > @@ -2792,11 +2806,12 @@ main(int32_t argc, char **argv) > if (socket_ctx[socket_id].mbuf_pool) > continue; > > - pool_init(&socket_ctx[socket_id], socket_id, NB_MBUF); > + pool_init(&socket_ctx[socket_id], socket_id, nb_bufs_in_pool); > session_pool_init(&socket_ctx[socket_id], socket_id, sess_sz); > session_priv_pool_init(&socket_ctx[socket_id], socket_id, > sess_sz); > } > + printf("Number of mbufs in packet pool %d\n", nb_bufs_in_pool); > > RTE_ETH_FOREACH_DEV(portid) { > if ((enabled_port_mask & (1 << portid)) == 0) > -- > 2.7.4
Re: [dpdk-dev] [PATCH 14/14] examples/ipsec-secgw: add cmd line option for bufs
> > Add command line option -s which can be used to configure number > > of buffers in a pool. Default number of buffers is 8192. > > > > Signed-off-by: Anoob Joseph > > Signed-off-by: Lukasz Bartosik > > --- > > examples/ipsec-secgw/ipsec-secgw.c | 23 +++ > > 1 file changed, 19 insertions(+), 4 deletions(-) > > > > diff --git a/examples/ipsec-secgw/ipsec-secgw.c > > b/examples/ipsec-secgw/ipsec-secgw.c > > index 76719f2..f8e28d6 100644 > > --- a/examples/ipsec-secgw/ipsec-secgw.c > > +++ b/examples/ipsec-secgw/ipsec-secgw.c > > @@ -59,8 +59,6 @@ volatile bool force_quit; > > > > #define MEMPOOL_CACHE_SIZE 256 > > > > -#define NB_MBUF(32000) > > - > > #define CDEV_QUEUE_DESC 2048 > > #define CDEV_MAP_ENTRIES 16384 > > #define CDEV_MP_NB_OBJS 1024 > > @@ -167,6 +165,7 @@ static int32_t numa_on = 1; /**< NUMA is enabled by > > default. */ > > static uint32_t nb_lcores; > > static uint32_t single_sa; > > static uint32_t single_sa_idx; > > +static uint32_t nb_bufs_in_pool = 8192; > > Why to change the default number (behavior) here? > Why not to keep existing one as default? Or, at least try to guess required number of mbufs (like l3fwd, etc., do)? > > > > > /* > > * RX/TX HW offload capabilities to enable/use on ethernet ports. > > @@ -1261,6 +1260,7 @@ print_usage(const char *prgname) > > " [-w REPLAY_WINDOW_SIZE]" > > " [-e]" > > " [-a]" > > + " [-s NUMBER_OF_MBUFS_IN_PKT_POOL]" > > " -f CONFIG_FILE" > > " --config (port,queue,lcore)[,(port,queue,lcore)]" > > " [--single-sa SAIDX]" > > @@ -1284,6 +1284,7 @@ print_usage(const char *prgname) > > " size for each SA\n" > > " -e enables ESN\n" > > " -a enables SA SQN atomic behaviour\n" > > + " -s number of mbufs in packet pool (default 8192)\n" > > " -f CONFIG_FILE: Configuration file\n" > > " --config (port,queue,lcore): Rx queue configuration\n" > > " --single-sa SAIDX: Use single SA index for outbound > > traffic,\n" > > @@ -1534,7 +1535,7 @@ parse_args(int32_t argc, char **argv, struct eh_conf > > *eh_conf) > > > > argvopt = argv; > > > > - while ((opt = getopt_long(argc, argvopt, "aelp:Pu:f:j:w:", > > + while ((opt = getopt_long(argc, argvopt, "aelp:Pu:f:j:w:s:", > > lgopts, &option_index)) != EOF) { > > > > switch (opt) { > > @@ -1568,6 +1569,19 @@ parse_args(int32_t argc, char **argv, struct eh_conf > > *eh_conf) > > cfgfile = optarg; > > f_present = 1; > > break; > > + > > + case 's': > > + ret = parse_decimal(optarg); > > + if (ret < 0) { > > + printf("Invalid number of buffers in a pool: " > > + "%s\n", optarg); > > + print_usage(prgname); > > + return -1; > > + } > > + > > + nb_bufs_in_pool = ret; > > + break; > > + > > case 'j': > > ret = parse_decimal(optarg); > > if (ret < RTE_MBUF_DEFAULT_BUF_SIZE || > > @@ -2792,11 +2806,12 @@ main(int32_t argc, char **argv) > > if (socket_ctx[socket_id].mbuf_pool) > > continue; > > > > - pool_init(&socket_ctx[socket_id], socket_id, NB_MBUF); > > + pool_init(&socket_ctx[socket_id], socket_id, nb_bufs_in_pool); > > session_pool_init(&socket_ctx[socket_id], socket_id, sess_sz); > > session_priv_pool_init(&socket_ctx[socket_id], socket_id, > > sess_sz); > > } > > + printf("Number of mbufs in packet pool %d\n", nb_bufs_in_pool); > > > > RTE_ETH_FOREACH_DEV(portid) { > > if ((enabled_port_mask & (1 << portid)) == 0) > > -- > > 2.7.4
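For reference, the kind of estimate l3fwd does (adapted here as a sketch; the
nb_ports, nb_rx_queue, nb_tx_queue, nb_rxd, nb_txd and nb_lcores values are
assumed to be the application's own counts, MAX_PKT_BURST and
MEMPOOL_CACHE_SIZE its existing defines) looks like this:

	unsigned int nb_mbufs = RTE_MAX(
		nb_ports * nb_rx_queue * nb_rxd +	/* RX descriptors */
		nb_ports * nb_lcores * MAX_PKT_BURST +	/* in-flight bursts */
		nb_ports * nb_tx_queue * nb_txd +	/* TX descriptors */
		nb_lcores * MEMPOOL_CACHE_SIZE,		/* per-core caches */
		8192u);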
Re: [dpdk-dev] [PATCH v6 1/6] lib/eal: implement the family of rte bit operation APIs
On Mon, 23 Dec 2019 05:04:12 + Honnappa Nagarahalli wrote: > > > > > > On Sat, 21 Dec 2019 16:07:23 + > > Honnappa Nagarahalli wrote: > > > > > Converting these into macros will help remove the size based duplication > > > of > > APIs. I came up with the following macro: > > > > > > #define RTE_GET_BIT(nr, var, ret, memorder) \ ({ \ > > > if (sizeof(var) == sizeof(uint32_t)) { \ > > > uint32_t mask1 = 1U << (nr)%32; \ > > > ret = __atomic_load_n(&var, (memorder)) & mask1;\ > > > } \ > > > else {\ > > > uint64_t mask2 = 1UL << (nr)%64;\ > > > ret = __atomic_load_n(&var, (memorder)) & mask2;\ > > > } \ > > > }) > > > > Macros are more error prone. Especially because this is in exposed header > > file > That's another question I have. Why do we need to have these APIs in a public > header file? These will add to the ABI burden as well. These APIs should be > in a common-but-not-public header file. I am also not sure how helpful these > APIs are for applications as these APIs seem to have considered requirements > only from the PMDs. Why do we have to wrap every C atomic builtin? What value is there in that?
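For context, the operation each wrapper stands in for is already a one-liner
with the existing builtins, e.g. (illustrative only):

	uint32_t flags = 0;

	/* set bit 5, relaxed ordering */
	__atomic_fetch_or(&flags, 1U << 5, __ATOMIC_RELAXED);

	/* test bit 5 */
	uint32_t set = __atomic_load_n(&flags, __ATOMIC_RELAXED) & (1U << 5);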
Re: [dpdk-dev] [PATCH 09/14] examples/ipsec-secgw: add eventmode to ipsec-secgw
> > Add eventmode support to ipsec-secgw. This uses event helper to setup > and use the eventmode capabilities. Add driver inbound worker. > > Example command: > ./ipsec-secgw -c 0x1 -w 0002:02:00.0,ipsec_in_max_spi=100 -w 0002:07:00.0 > -w 0002:0e:00.0 -w 0002:10:00.1 -- -P -p 0x3 -u 0x1 > --config "(0,0,0),(1,0,0)" -f a-aes-gcm-msa.cfg --transfer-mode 1 > --schedule-type 2 --process-mode drv --process-dir in > > Signed-off-by: Anoob Joseph > Signed-off-by: Lukasz Bartosik > --- > examples/ipsec-secgw/Makefile | 1 + > examples/ipsec-secgw/event_helper.c | 3 + > examples/ipsec-secgw/event_helper.h | 26 +++ > examples/ipsec-secgw/ipsec-secgw.c | 344 > +++- > examples/ipsec-secgw/ipsec.h| 7 + > examples/ipsec-secgw/ipsec_worker.c | 180 +++ > examples/ipsec-secgw/meson.build| 2 +- > 7 files changed, 555 insertions(+), 8 deletions(-) > create mode 100644 examples/ipsec-secgw/ipsec_worker.c > > diff --git a/examples/ipsec-secgw/Makefile b/examples/ipsec-secgw/Makefile > index 09e3c5a..f6fd94c 100644 > --- a/examples/ipsec-secgw/Makefile > +++ b/examples/ipsec-secgw/Makefile > @@ -15,6 +15,7 @@ SRCS-y += sa.c > SRCS-y += rt.c > SRCS-y += ipsec_process.c > SRCS-y += ipsec-secgw.c > +SRCS-y += ipsec_worker.c > SRCS-y += event_helper.c > > CFLAGS += -gdwarf-2 > diff --git a/examples/ipsec-secgw/event_helper.c > b/examples/ipsec-secgw/event_helper.c > index 6549875..44f997d 100644 > --- a/examples/ipsec-secgw/event_helper.c > +++ b/examples/ipsec-secgw/event_helper.c > @@ -984,6 +984,9 @@ eh_find_worker(uint32_t lcore_id, struct eh_conf *conf, > else > curr_conf.cap.burst = EH_RX_TYPE_NON_BURST; > > + curr_conf.cap.ipsec_mode = conf->ipsec_mode; > + curr_conf.cap.ipsec_dir = conf->ipsec_dir; > + > /* Parse the passed list and see if we have matching capabilities */ > > /* Initialize the pointer used to traverse the list */ > diff --git a/examples/ipsec-secgw/event_helper.h > b/examples/ipsec-secgw/event_helper.h > index 2895dfa..07849b0 100644 > --- a/examples/ipsec-secgw/event_helper.h > +++ b/examples/ipsec-secgw/event_helper.h > @@ -74,6 +74,22 @@ enum eh_tx_types { > EH_TX_TYPE_NO_INTERNAL_PORT > }; > > +/** > + * Event mode ipsec mode types > + */ > +enum eh_ipsec_mode_types { > + EH_IPSEC_MODE_TYPE_APP = 0, > + EH_IPSEC_MODE_TYPE_DRIVER > +}; > + > +/** > + * Event mode ipsec direction types > + */ > +enum eh_ipsec_dir_types { > + EH_IPSEC_DIR_TYPE_OUTBOUND = 0, > + EH_IPSEC_DIR_TYPE_INBOUND, > +}; > + > /* Event dev params */ > struct eventdev_params { > uint8_t eventdev_id; > @@ -183,6 +199,12 @@ struct eh_conf { >*/ > void *mode_params; > /**< Mode specific parameters */ > + > + /** Application specific params */ > + enum eh_ipsec_mode_types ipsec_mode; > + /**< Mode of ipsec run */ > + enum eh_ipsec_dir_types ipsec_dir; > + /**< Direction of ipsec processing */ > }; > > /* Workers registered by the application */ > @@ -194,6 +216,10 @@ struct eh_app_worker_params { > /**< Specify status of rx type burst */ > uint64_t tx_internal_port : 1; > /**< Specify whether tx internal port is available */ > + uint64_t ipsec_mode : 1; > + /**< Specify ipsec processing level */ > + uint64_t ipsec_dir : 1; > + /**< Specify direction of ipsec */ > }; > uint64_t u64; > } cap; > diff --git a/examples/ipsec-secgw/ipsec-secgw.c > b/examples/ipsec-secgw/ipsec-secgw.c > index 7506922..c5d95b9 100644 > --- a/examples/ipsec-secgw/ipsec-secgw.c > +++ b/examples/ipsec-secgw/ipsec-secgw.c > @@ -2,6 +2,7 @@ > * Copyright(c) 2016 Intel Corporation > */ > > +#include > #include > #include > #include > @@ -14,6 +15,7 @@ 
> #include > #include > #include > +#include > #include > > #include > @@ -41,12 +43,17 @@ > #include > #include > #include > +#include > +#include > #include > #include > > +#include "event_helper.h" > #include "ipsec.h" > #include "parser.h" > > +volatile bool force_quit; > + > #define RTE_LOGTYPE_IPSEC RTE_LOGTYPE_USER1 > > #define MAX_JUMBO_PKT_LEN 9600 > @@ -133,12 +140,21 @@ struct flow_info flow_info_tbl[RTE_MAX_ETHPORTS]; > #define CMD_LINE_OPT_CONFIG "config" > #define CMD_LINE_OPT_SINGLE_SA "single-sa" > #define CMD_LINE_OPT_CRYPTODEV_MASK "cryptodev_mask" > +#define CMD_LINE_OPT_TRANSFER_MODE "transfer-mode" > +#define CMD_LINE_OPT_SCHEDULE_TYPE "schedule-type" > +#define CMD_LINE_OPT_IPSEC_MODE "process-mode" > +#define CMD_LINE_OPT_IPSEC_DIR "process-dir" > #define CMD_LINE_OPT_RX_OFFLOAD "rxoffload" > #define CMD_LINE_OPT_TX_OFFLOAD "txoffload" >
Re: [dpdk-dev] [PATCH 11/14] examples/ipsec-secgw: add app processing code
> > Add IPsec application processing code for event mode. > > Signed-off-by: Anoob Joseph > Signed-off-by: Lukasz Bartosik > --- > examples/ipsec-secgw/ipsec-secgw.c | 124 ++ > examples/ipsec-secgw/ipsec-secgw.h | 81 > examples/ipsec-secgw/ipsec.h| 37 +++--- > examples/ipsec-secgw/ipsec_worker.c | 242 > ++-- > examples/ipsec-secgw/ipsec_worker.h | 39 ++ > examples/ipsec-secgw/sa.c | 11 -- > 6 files changed, 409 insertions(+), 125 deletions(-) > create mode 100644 examples/ipsec-secgw/ipsec-secgw.h > create mode 100644 examples/ipsec-secgw/ipsec_worker.h > > diff --git a/examples/ipsec-secgw/ipsec-secgw.c > b/examples/ipsec-secgw/ipsec-secgw.c > index c5d95b9..2e7d4d8 100644 > --- a/examples/ipsec-secgw/ipsec-secgw.c > +++ b/examples/ipsec-secgw/ipsec-secgw.c > @@ -50,12 +50,11 @@ > > #include "event_helper.h" > #include "ipsec.h" > +#include "ipsec_worker.h" > #include "parser.h" > > volatile bool force_quit; > > -#define RTE_LOGTYPE_IPSEC RTE_LOGTYPE_USER1 > - > #define MAX_JUMBO_PKT_LEN 9600 > > #define MEMPOOL_CACHE_SIZE 256 > @@ -70,8 +69,6 @@ volatile bool force_quit; > > #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */ > > -#define NB_SOCKETS 4 > - > /* Configure how many packets ahead to prefetch, when reading packets */ > #define PREFETCH_OFFSET 3 > > @@ -79,8 +76,6 @@ volatile bool force_quit; > > #define MAX_LCORE_PARAMS 1024 > > -#define UNPROTECTED_PORT(port) (unprotected_port_mask & (1 << portid)) > - > /* > * Configurable number of RX/TX ring descriptors > */ > @@ -89,29 +84,6 @@ volatile bool force_quit; > static uint16_t nb_rxd = IPSEC_SECGW_RX_DESC_DEFAULT; > static uint16_t nb_txd = IPSEC_SECGW_TX_DESC_DEFAULT; > > -#if RTE_BYTE_ORDER != RTE_LITTLE_ENDIAN > -#define __BYTES_TO_UINT64(a, b, c, d, e, f, g, h) \ > - (((uint64_t)((a) & 0xff) << 56) | \ > - ((uint64_t)((b) & 0xff) << 48) | \ > - ((uint64_t)((c) & 0xff) << 40) | \ > - ((uint64_t)((d) & 0xff) << 32) | \ > - ((uint64_t)((e) & 0xff) << 24) | \ > - ((uint64_t)((f) & 0xff) << 16) | \ > - ((uint64_t)((g) & 0xff) << 8) | \ > - ((uint64_t)(h) & 0xff)) > -#else > -#define __BYTES_TO_UINT64(a, b, c, d, e, f, g, h) \ > - (((uint64_t)((h) & 0xff) << 56) | \ > - ((uint64_t)((g) & 0xff) << 48) | \ > - ((uint64_t)((f) & 0xff) << 40) | \ > - ((uint64_t)((e) & 0xff) << 32) | \ > - ((uint64_t)((d) & 0xff) << 24) | \ > - ((uint64_t)((c) & 0xff) << 16) | \ > - ((uint64_t)((b) & 0xff) << 8) | \ > - ((uint64_t)(a) & 0xff)) > -#endif > -#define ETHADDR(a, b, c, d, e, f) (__BYTES_TO_UINT64(a, b, c, d, e, f, 0, 0)) > - > #define ETHADDR_TO_UINT64(addr) __BYTES_TO_UINT64( \ > (addr)->addr_bytes[0], (addr)->addr_bytes[1], \ > (addr)->addr_bytes[2], (addr)->addr_bytes[3], \ > @@ -123,18 +95,6 @@ static uint16_t nb_txd = IPSEC_SECGW_TX_DESC_DEFAULT; > > #define MTU_TO_FRAMELEN(x) ((x) + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN) > > -/* port/source ethernet addr and destination ethernet addr */ > -struct ethaddr_info { > - uint64_t src, dst; > -}; > - > -struct ethaddr_info ethaddr_tbl[RTE_MAX_ETHPORTS] = { > - { 0, ETHADDR(0x00, 0x16, 0x3e, 0x7e, 0x94, 0x9a) }, > - { 0, ETHADDR(0x00, 0x16, 0x3e, 0x22, 0xa1, 0xd9) }, > - { 0, ETHADDR(0x00, 0x16, 0x3e, 0x08, 0x69, 0x26) }, > - { 0, ETHADDR(0x00, 0x16, 0x3e, 0x49, 0x9e, 0xdd) } > -}; > - > struct flow_info flow_info_tbl[RTE_MAX_ETHPORTS]; > > #define CMD_LINE_OPT_CONFIG "config" > @@ -192,10 +152,16 @@ static const struct option lgopts[] = { > {NULL, 0, 0, 0} > }; > > +struct ethaddr_info ethaddr_tbl[RTE_MAX_ETHPORTS] = { > + { 0, ETHADDR(0x00, 0x16, 0x3e, 0x7e, 0x94, 0x9a) }, > + { 0, 
ETHADDR(0x00, 0x16, 0x3e, 0x22, 0xa1, 0xd9) }, > + { 0, ETHADDR(0x00, 0x16, 0x3e, 0x08, 0x69, 0x26) }, > + { 0, ETHADDR(0x00, 0x16, 0x3e, 0x49, 0x9e, 0xdd) } > +}; > + > /* mask of enabled ports */ > static uint32_t enabled_port_mask; > static uint64_t enabled_cryptodev_mask = UINT64_MAX; > -static uint32_t unprotected_port_mask; > static int32_t promiscuous_on = 1; > static int32_t numa_on = 1; /**< NUMA is enabled by default. */ > static uint32_t nb_lcores; > @@ -283,8 +249,6 @@ static struct rte_eth_conf port_conf = { > }, > }; > > -static struct socket_ctx socket_ctx[NB_SOCKETS]; > - > /* > * Determine is multi-segment support required: > * - either frame buffer size is smaller then mtu > @@ -2828,47 +2792,10 @@ main(int32_t argc, char **argv) > > sa_check_offloads(portid, &req_rx_offloads, &req_tx_offloads); > port_init(portid, req_rx_offloads, req_tx_offloads); > - /* Create default ipsec flow for the ethernet device */ > - ret = create_default_ipsec_flow(portid, req_rx_offloads); > - if (ret) > - printf("Cannot create def
[dpdk-dev] rte_eth_rx_burst seems to read packets tx through the same port
Hi, I have a DPDK app that implements a simple run-to-completion model. I use igb_uio as my PMD. I notice a weird situation where my receiver (rte_eth_rx_burst) seems to sporadically read packets I intend to transmit through the same port using rte_eth_tx_burst. Is this possible by any chance, and if you have a clue as to what may be causing this issue, please shed some light on it. Best, Danushka
Re: [dpdk-dev] [PATCH 12/14] examples/ipsec-secgw: add driver outbound worker
> This patch adds the driver outbound worker thread for ipsec-secgw. > In this mode the security session is a fixed one and sa update > is not done. > > Signed-off-by: Ankur Dwivedi > Signed-off-by: Anoob Joseph > Signed-off-by: Lukasz Bartosik > --- > examples/ipsec-secgw/ipsec-secgw.c | 12 + > examples/ipsec-secgw/ipsec.c| 9 > examples/ipsec-secgw/ipsec_worker.c | 90 > - > 3 files changed, 110 insertions(+), 1 deletion(-) > > diff --git a/examples/ipsec-secgw/ipsec-secgw.c > b/examples/ipsec-secgw/ipsec-secgw.c > index 2e7d4d8..76719f2 100644 > --- a/examples/ipsec-secgw/ipsec-secgw.c > +++ b/examples/ipsec-secgw/ipsec-secgw.c > @@ -2011,6 +2011,18 @@ cryptodevs_init(void) > i++; > } > > + /* > + * Set the queue pair to at least the number of ethernet > + * devices for inline outbound. > + */ > + qp = RTE_MAX(rte_eth_dev_count_avail(), qp); Not sure, what for? Why we can't process packets from several eth devs on the same crypto-dev queue? > + > + /* > + * The requested number of queues should never exceed > + * the max available > + */ > + qp = RTE_MIN(qp, max_nb_qps); > + > if (qp == 0) > continue; > > diff --git a/examples/ipsec-secgw/ipsec.c b/examples/ipsec-secgw/ipsec.c > index e529f68..9ff8a63 100644 > --- a/examples/ipsec-secgw/ipsec.c > +++ b/examples/ipsec-secgw/ipsec.c > @@ -141,6 +141,10 @@ create_lookaside_session(struct ipsec_ctx *ipsec_ctx, > struct ipsec_sa *sa, > return 0; > } > > +uint16_t sa_no; > +#define MAX_FIXED_SESSIONS 10 > +struct rte_security_session *sec_session_fixed[MAX_FIXED_SESSIONS]; > + > int > create_inline_session(struct socket_ctx *skt_ctx, struct ipsec_sa *sa, > struct rte_ipsec_session *ips) > @@ -401,6 +405,11 @@ create_inline_session(struct socket_ctx *skt_ctx, struct > ipsec_sa *sa, > > ips->security.ol_flags = sec_cap->ol_flags; > ips->security.ctx = sec_ctx; > + if (sa_no < MAX_FIXED_SESSIONS) { > + sec_session_fixed[sa_no] = > + ipsec_get_primary_session(sa)->security.ses; > + sa_no++; > + } > } Totally lost what is the purpose of these changes... Why first 10 inline-proto are special and need to be saved inside global array (sec_session_fixed)? Why later, in ipsec_worker.c this array is referenced by eth port_id? What would happen if number of inline-proto sessions is less than number of eth ports? 
> set_cdev_id: > diff --git a/examples/ipsec-secgw/ipsec_worker.c > b/examples/ipsec-secgw/ipsec_worker.c > index 2af9475..e202277 100644 > --- a/examples/ipsec-secgw/ipsec_worker.c > +++ b/examples/ipsec-secgw/ipsec_worker.c > @@ -263,7 +263,7 @@ process_ipsec_ev_inbound(struct ipsec_ctx *ctx, struct > route_table *rt, > */ > > /* Workers registered */ > -#define IPSEC_EVENTMODE_WORKERS 2 > +#define IPSEC_EVENTMODE_WORKERS 3 > > /* > * Event mode worker > @@ -423,6 +423,84 @@ ipsec_wrkr_non_burst_int_port_app_mode_inb(struct > eh_event_link_info *links, > return; > } > > +/* > + * Event mode worker > + * Operating parameters : non-burst - Tx internal port - driver mode - > outbound > + */ > +extern struct rte_security_session *sec_session_fixed[]; > +static void > +ipsec_wrkr_non_burst_int_port_drvr_mode_outb(struct eh_event_link_info > *links, > + uint8_t nb_links) > +{ > + unsigned int nb_rx = 0; > + struct rte_mbuf *pkt; > + unsigned int port_id; > + struct rte_event ev; > + uint32_t lcore_id; > + > + /* Check if we have links registered for this lcore */ > + if (nb_links == 0) { > + /* No links registered - exit */ > + goto exit; > + } > + > + /* Get core ID */ > + lcore_id = rte_lcore_id(); > + > + RTE_LOG(INFO, IPSEC, > + "Launching event mode worker (non-burst - Tx internal port - " > + "driver mode - outbound) on lcore %d\n", lcore_id); > + > + /* We have valid links */ > + > + /* Check if it's single link */ > + if (nb_links != 1) { > + RTE_LOG(INFO, IPSEC, > + "Multiple links not supported. Using first link\n"); > + } > + > + RTE_LOG(INFO, IPSEC, " -- lcoreid=%u event_port_id=%u\n", lcore_id, > + links[0].event_port_id); > + while (!force_quit) { > + /* Read packet from event queues */ > + nb_rx = rte_event_dequeue_burst(links[0].eventdev_id, > + links[0].event_port_id, > + &ev,/* events */ > + 1, /* nb_events */ > + 0 /* timeout_ticks */); > + > +
Re: [dpdk-dev] [PATCH 04/14] examples/ipsec-secgw: add Rx adapter support
> Add Rx adapter support. The event helper init routine will initialize > the Rx adapter according to the configuration. If Rx adapter config > is not present it will generate a default config. It will check the > available eth ports and event queues and map them 1:1. So one eth port > will be connected to one event queue. This way event queue ID could > be used to figure out the port on which a packet came in. > > Signed-off-by: Anoob Joseph > Signed-off-by: Lukasz Bartosik > --- > examples/ipsec-secgw/event_helper.c | 289 > +++- > examples/ipsec-secgw/event_helper.h | 29 > 2 files changed, 317 insertions(+), 1 deletion(-) > > diff --git a/examples/ipsec-secgw/event_helper.c > b/examples/ipsec-secgw/event_helper.c > index d0157f4..f0eca01 100644 > --- a/examples/ipsec-secgw/event_helper.c > +++ b/examples/ipsec-secgw/event_helper.c > @@ -4,10 +4,60 @@ > #include > #include > #include > +#include > #include > > #include "event_helper.h" > > +static int > +eh_get_enabled_cores(struct rte_bitmap *eth_core_mask) > +{ > + int i; > + int count = 0; > + > + RTE_LCORE_FOREACH(i) { > + /* Check if this core is enabled in core mask*/ > + if (rte_bitmap_get(eth_core_mask, i)) { > + /* We have found enabled core */ > + count++; > + } > + } > + return count; > +} > + > +static inline unsigned int > +eh_get_next_eth_core(struct eventmode_conf *em_conf) > +{ > + static unsigned int prev_core = -1; > + unsigned int next_core; > + > + /* > + * Make sure we have at least one eth core running, else the following > + * logic would lead to an infinite loop. > + */ > + if (eh_get_enabled_cores(em_conf->eth_core_mask) == 0) { > + EH_LOG_ERR("No enabled eth core found"); > + return RTE_MAX_LCORE; > + } > + > +get_next_core: > + /* Get the next core */ > + next_core = rte_get_next_lcore(prev_core, 0, 1); > + > + /* Check if we have reached max lcores */ > + if (next_core == RTE_MAX_LCORE) > + return next_core; > + > + /* Update prev_core */ > + prev_core = next_core; > + > + /* Only some cores are marked as eth cores. Skip others */ > + if (!(rte_bitmap_get(em_conf->eth_core_mask, next_core))) > + goto get_next_core; Are loops statements forbidden in C now? 😉 As a generic comment - too many (unnecessary) gotos in this patch series. It is not uncommon to see 2-3 labels inside the function and bunch gotos to them. Would be good to rework the code a bit to get rid of them. > + > + return next_core; > +} > + > static inline unsigned int > eh_get_next_active_core(struct eventmode_conf *em_conf, unsigned int > prev_core) > { > @@ -154,6 +204,87 @@ eh_set_default_conf_link(struct eventmode_conf *em_conf) > } >
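As an illustration of the comment above, eh_get_next_eth_core() could drop the
label and gotos with a small loop; an untested sketch based on the code in this
patch:

static inline unsigned int
eh_get_next_eth_core(struct eventmode_conf *em_conf)
{
	static unsigned int prev_core = -1;
	unsigned int next_core = prev_core;

	/*
	 * Make sure we have at least one eth core running, else the
	 * loop below would never terminate.
	 */
	if (eh_get_enabled_cores(em_conf->eth_core_mask) == 0) {
		EH_LOG_ERR("No enabled eth core found");
		return RTE_MAX_LCORE;
	}

	/* Only some cores are marked as eth cores, skip the others */
	do {
		next_core = rte_get_next_lcore(next_core, 0, 1);
		if (next_core == RTE_MAX_LCORE)
			return next_core;
	} while (!rte_bitmap_get(em_conf->eth_core_mask, next_core));

	prev_core = next_core;
	return next_core;
}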
Re: [dpdk-dev] [PATCH] net/i40e: enable multi-queue Rx interrupt for VF
> -Original Message-
> From: Cui, LunyuanX
> Sent: Tuesday, December 3, 2019 7:44 PM
> To: dev@dpdk.org
> Cc: Xing, Beilei ; Zhang, Qi Z ; Yang, Qiming ; Cui, LunyuanX
> Subject: [PATCH] net/i40e: enable multi-queue Rx interrupt for VF
>
> The current implementation allows only one Rx queue to support interrupts,
> because all queues are mapped to the same vector id in vfio_enable_msix().
> So the VF cannot support multi-queue Rx interrupts in interrupt mode.
>
> In this patch, if the packet I/O interrupt on the datapath is enabled
> (rte_intr_dp_is_en(intr_handle) is true), we map a different interrupt
> vector to each queue and send this map to the PF. After the PF writes the
> map to the register, interrupts from all Rx queues will be received.
>
> In addition, the vector id must be less than the maximum vector value. When
> the queue number exceeds the vector count, the interrupt vector map wraps
> around in a loop.
>
> Signed-off-by: Lunyuan Cui

Acked-by: Qi Zhang
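The wrap-around described in the last paragraph amounts to roughly the sketch below. It is a simplified illustration only; the function name, the vector_base parameter and the flat vec_of_queue[] output array are assumptions made for the example, not the driver's actual data structures:

#include <stdint.h>

/* Assign an interrupt vector to every Rx queue, wrapping back to the
 * first vector once the available vectors are exhausted.
 */
static void
map_rx_queues_to_vectors(uint16_t nb_rx_queues, uint16_t nb_vectors,
			 uint16_t vector_base, uint16_t *vec_of_queue)
{
	uint16_t qid;
	uint16_t vec = vector_base;

	for (qid = 0; qid < nb_rx_queues; qid++) {
		vec_of_queue[qid] = vec;	/* queue qid raises vector vec */
		vec++;
		if (vec >= vector_base + nb_vectors)
			vec = vector_base;	/* wrap: more queues than vectors */
	}
}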
Re: [dpdk-dev] [PATCH v2] net/ixgbe: add support for VF MAC address add and remove
On 12/23, Guinan Sun wrote: >Ixgbe PMD pf host code needs to support ixgbevf mac address >add and remove. For this purpose, a response was added >between pf and vf to update the mac address. > >Signed-off-by: Guinan Sun >--- >v2: >* Changed the title of commit message. >* Checked null in front of valid ether addr check. >--- > drivers/net/ixgbe/ixgbe_ethdev.h | 1 + > drivers/net/ixgbe/ixgbe_pf.c | 35 > 2 files changed, 36 insertions(+) > >diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h >b/drivers/net/ixgbe/ixgbe_ethdev.h >index 76a1b9d18..e1cd8fd16 100644 >--- a/drivers/net/ixgbe/ixgbe_ethdev.h >+++ b/drivers/net/ixgbe/ixgbe_ethdev.h >@@ -270,6 +270,7 @@ struct ixgbe_vf_info { > uint8_t api_version; > uint16_t switch_domain_id; > uint16_t xcast_mode; >+ uint16_t mac_count; > }; > > /* >diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c >index d0d85e138..78fc8c5f1 100644 >--- a/drivers/net/ixgbe/ixgbe_pf.c >+++ b/drivers/net/ixgbe/ixgbe_pf.c >@@ -748,6 +748,37 @@ ixgbe_set_vf_mc_promisc(struct rte_eth_dev *dev, uint32_t >vf, uint32_t *msgbuf) > return 0; > } > >+static int >+ixgbe_set_vf_macvlan_msg(struct rte_eth_dev *dev, uint32_t vf, uint32_t >*msgbuf) >+{ >+ struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); >+ struct ixgbe_vf_info *vf_info = >+ *(IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private)); >+ uint8_t *new_mac = (uint8_t *)(&msgbuf[1]); >+ int index = (msgbuf[0] & IXGBE_VT_MSGINFO_MASK) >> >+ IXGBE_VT_MSGINFO_SHIFT; >+ >+ if (index) { >+ if (new_mac == NULL) >+ return -1; >+ >+ if (!rte_is_valid_assigned_ether_addr( >+ (struct rte_ether_addr *)new_mac)) { >+ RTE_LOG(ERR, PMD, "set invalid mac vf:%d\n", vf); Better to use PMD_DRV_LOG so as to be aligned with other logging. Thanks, Xiaolong >+ return -1; >+ } >+ >+ vf_info[vf].mac_count++; >+ >+ hw->mac.ops.set_rar(hw, vf_info[vf].mac_count, >+ new_mac, vf, IXGBE_RAH_AV); >+ } else { >+ hw->mac.ops.clear_rar(hw, vf_info[vf].mac_count); >+ vf_info[vf].mac_count = 0; >+ } >+ return 0; >+} >+ > static int > ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t vf) > { >@@ -835,6 +866,10 @@ ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t >vf) > if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED) > retval = ixgbe_set_vf_mc_promisc(dev, vf, msgbuf); > break; >+ case IXGBE_VF_SET_MACVLAN: >+ if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED) >+ retval = ixgbe_set_vf_macvlan_msg(dev, vf, msgbuf); >+ break; > default: > PMD_DRV_LOG(DEBUG, "Unhandled Msg %8.8x", (unsigned)msgbuf[0]); > retval = IXGBE_ERR_MBX; >-- >2.17.1 >
[dpdk-dev] [PATCH v4 1/5] net/fm10k: cleanup Tx buffers
Add support to the fm10k driver for the API rte_eth_tx_done_cleanup to force free consumed buffers on Tx ring. Signed-off-by: Chenxu Di --- drivers/net/fm10k/fm10k.h| 2 ++ drivers/net/fm10k/fm10k_ethdev.c | 1 + drivers/net/fm10k/fm10k_rxtx.c | 45 3 files changed, 48 insertions(+) diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h index 916b856ac..ddb1d64ec 100644 --- a/drivers/net/fm10k/fm10k.h +++ b/drivers/net/fm10k/fm10k.h @@ -342,6 +342,8 @@ uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t fm10k_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); +int fm10k_tx_done_cleanup(void *txq, uint32_t free_cnt); + int fm10k_rxq_vec_setup(struct fm10k_rx_queue *rxq); int fm10k_rx_vec_condition_check(struct rte_eth_dev *); void fm10k_rx_queue_release_mbufs_vec(struct fm10k_rx_queue *rxq); diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c index 407baa16c..c389c79de 100644 --- a/drivers/net/fm10k/fm10k_ethdev.c +++ b/drivers/net/fm10k/fm10k_ethdev.c @@ -2897,6 +2897,7 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = { .reta_query = fm10k_reta_query, .rss_hash_update= fm10k_rss_hash_update, .rss_hash_conf_get = fm10k_rss_hash_conf_get, + .tx_done_cleanup= fm10k_tx_done_cleanup, }; static int ftag_check_handler(__rte_unused const char *key, diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c index 5c3112183..f67c5bf00 100644 --- a/drivers/net/fm10k/fm10k_rxtx.c +++ b/drivers/net/fm10k/fm10k_rxtx.c @@ -541,6 +541,51 @@ static inline void tx_free_bulk_mbuf(struct rte_mbuf **txep, int num) } } +int fm10k_tx_done_cleanup(void *txq, uint32_t free_cnt) +{ + struct fm10k_tx_queue *q = (struct fm10k_tx_queue *)txq; + uint16_t next_rs, count = 0; + + if (q == NULL) + return -ENODEV; + + next_rs = fifo_peek(&q->rs_tracker); + if (!(q->hw_ring[next_rs].flags & FM10K_TXD_FLAG_DONE)) + return count; + + /* the DONE flag is set on this descriptor so remove the ID +* from the RS bit tracker and free the buffers +*/ + fifo_remove(&q->rs_tracker); + + /* wrap around? if so, free buffers from last_free up to but NOT +* including nb_desc +*/ + if (q->last_free > next_rs) { + count = q->nb_desc - q->last_free; + tx_free_bulk_mbuf(&q->sw_ring[q->last_free], count); + q->last_free = 0; + + if (unlikely(count == (int)free_cnt)) + return count; + } + + /* adjust free descriptor count before the next loop */ + q->nb_free += count + (next_rs + 1 - q->last_free); + + /* free buffers from last_free, up to and including next_rs */ + if (q->last_free <= next_rs) { + count = next_rs - q->last_free + 1; + tx_free_bulk_mbuf(&q->sw_ring[q->last_free], count); + q->last_free += count; + } + + if (q->last_free == q->nb_desc) + q->last_free = 0; + + return count; +} + static inline void tx_free_descriptors(struct fm10k_tx_queue *q) { uint16_t next_rs, count = 0; -- 2.17.1
[dpdk-dev] [PATCH v4 2/5] net/i40e: cleanup Tx buffers
Add support to the i40e driver for the API rte_eth_tx_done_cleanup to force free consumed buffers on Tx ring. Signed-off-by: Chenxu Di --- drivers/net/i40e/i40e_ethdev.c| 1 + drivers/net/i40e/i40e_ethdev_vf.c | 1 + drivers/net/i40e/i40e_rxtx.c | 122 ++ drivers/net/i40e/i40e_rxtx.h | 1 + 4 files changed, 125 insertions(+) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 5999c964b..fad47a942 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -522,6 +522,7 @@ static const struct eth_dev_ops i40e_eth_dev_ops = { .mac_addr_set = i40e_set_default_mac_addr, .mtu_set = i40e_dev_mtu_set, .tm_ops_get = i40e_tm_ops_get, + .tx_done_cleanup = i40e_tx_done_cleanup, }; /* store statistics names and its offset in stats structure */ diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c index 5dba0928b..0ca5417d7 100644 --- a/drivers/net/i40e/i40e_ethdev_vf.c +++ b/drivers/net/i40e/i40e_ethdev_vf.c @@ -215,6 +215,7 @@ static const struct eth_dev_ops i40evf_eth_dev_ops = { .rss_hash_conf_get= i40evf_dev_rss_hash_conf_get, .mtu_set = i40evf_dev_mtu_set, .mac_addr_set = i40evf_set_default_mac_addr, + .tx_done_cleanup = i40e_tx_done_cleanup, }; /* diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 17dc8c78f..e75733b8e 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -2455,6 +2455,128 @@ i40e_tx_queue_release_mbufs(struct i40e_tx_queue *txq) } } +int i40e_tx_done_cleanup(void *q, uint32_t free_cnt) +{ + struct i40e_tx_queue *txq = (struct i40e_tx_queue *)q; + struct i40e_tx_entry *sw_ring; + volatile struct i40e_tx_desc *txr; + uint16_t tx_first; /* First segment analyzed. */ + uint16_t tx_id;/* Current segment being processed. */ + uint16_t tx_last; /* Last segment in the current packet. */ + uint16_t tx_next; /* First segment of the next packet. */ + int count; + + if (txq == NULL) + return -ENODEV; + + count = 0; + sw_ring = txq->sw_ring; + txr = txq->tx_ring; + + /* +* tx_tail is the last sent packet on the sw_ring. Goto the end +* of that packet (the last segment in the packet chain) and +* then the next segment will be the start of the oldest segment +* in the sw_ring. This is the first packet that will be +* attempted to be freed. +*/ + + /* Get last segment in most recently added packet. */ + tx_first = sw_ring[txq->tx_tail].last_id; + + /* Get the next segment, which is the oldest segment in ring. */ + tx_first = sw_ring[tx_first].next_id; + + /* Set the current index to the first. */ + tx_id = tx_first; + + /* +* Loop through each packet. For each packet, verify that an +* mbuf exists and that the last segment is free. If so, free +* it and move on. +*/ + while (1) { + tx_last = sw_ring[tx_id].last_id; + + if (sw_ring[tx_last].mbuf) { + if ((txr[tx_last].cmd_type_offset_bsz & + rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) != + rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE)) { + /* +* Increment the number of packets +* freed. +*/ + count++; + + /* Get the start of the next packet. */ + tx_next = sw_ring[tx_last].next_id; + + /* +* Loop through all segments in a +* packet. +*/ + do { + rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf); + sw_ring[tx_id].mbuf = NULL; + sw_ring[tx_id].last_id = tx_id; + + /* Move to next segemnt. */ + tx_id = sw_ring[tx_id].next_id; + + } while (tx_id != tx_next); + + if (unlikely(count == (int)free_cnt)) + break; + } else { + /* +* mbuf still in use, nothing left to +* free. +*/ + break; + } + } else { + /*
[dpdk-dev] [PATCH v4 0/5] drivers/net: cleanup Tx buffers
Add support to the drivers including fm10k, i40e, ice, ixgbe and igb vf for the API rte_eth_tx_done_cleanup to force free consumed buffers on Tx ring.

---
v2: added code about igb vf.
v3: changed information of author.
v4: changed code.

Chenxu Di (5):
  net/fm10k: cleanup Tx buffers
  net/i40e: cleanup Tx buffers
  net/ice: cleanup Tx buffers
  net/ixgbe: cleanup Tx buffers
  net/e1000: cleanup Tx buffers

 drivers/net/e1000/igb_ethdev.c    |   1 +
 drivers/net/fm10k/fm10k.h         |   2 +
 drivers/net/fm10k/fm10k_ethdev.c  |   1 +
 drivers/net/fm10k/fm10k_rxtx.c    |  45 +++
 drivers/net/i40e/i40e_ethdev.c    |   1 +
 drivers/net/i40e/i40e_ethdev_vf.c |   1 +
 drivers/net/i40e/i40e_rxtx.c      | 122 +
 drivers/net/i40e/i40e_rxtx.h      |   1 +
 drivers/net/ice/ice_ethdev.c      |   1 +
 drivers/net/ice/ice_rxtx.c        | 123 ++
 drivers/net/ice/ice_rxtx.h        |   1 +
 drivers/net/ixgbe/ixgbe_ethdev.c  |   2 +
 drivers/net/ixgbe/ixgbe_rxtx.c    | 121 +
 drivers/net/ixgbe/ixgbe_rxtx.h    |   2 +
 14 files changed, 424 insertions(+)

-- 
2.17.1
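For context, rte_eth_tx_done_cleanup() is the existing ethdev API these patches wire up per driver; from an application it is called roughly as below. This is a minimal sketch, and the reclaim_tx_mbufs() wrapper with its logging is illustrative rather than part of the series:

#include <stdio.h>
#include <rte_ethdev.h>

/* Ask the PMD to free mbufs already transmitted on a Tx queue.
 * free_cnt == 0 means "free as many completed buffers as possible".
 */
static int
reclaim_tx_mbufs(uint16_t port_id, uint16_t queue_id)
{
	int nb_freed = rte_eth_tx_done_cleanup(port_id, queue_id, 0);

	if (nb_freed < 0)	/* e.g. -ENOTSUP if the PMD lacks the callback */
		printf("tx_done_cleanup failed on port %u queue %u: %d\n",
		       port_id, queue_id, nb_freed);

	return nb_freed;	/* number of buffers released back to their pools */
}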
[dpdk-dev] [PATCH v4 3/5] net/ice: cleanup Tx buffers
Add support to the ice driver for the API rte_eth_tx_done_cleanup to force free consumed buffers on Tx ring. Signed-off-by: Chenxu Di --- drivers/net/ice/ice_ethdev.c | 1 + drivers/net/ice/ice_rxtx.c | 123 +++ drivers/net/ice/ice_rxtx.h | 1 + 3 files changed, 125 insertions(+) diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c index de189daba..b55cdbf74 100644 --- a/drivers/net/ice/ice_ethdev.c +++ b/drivers/net/ice/ice_ethdev.c @@ -220,6 +220,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = { .filter_ctrl = ice_dev_filter_ctrl, .udp_tunnel_port_add = ice_dev_udp_tunnel_port_add, .udp_tunnel_port_del = ice_dev_udp_tunnel_port_del, + .tx_done_cleanup = ice_tx_done_cleanup, }; /* store statistics names and its offset in stats structure */ diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index 2db174456..4aead98fd 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -863,6 +863,129 @@ ice_fdir_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id) return 0; } + +int ice_tx_done_cleanup(void *q, uint32_t free_cnt) +{ + struct ice_tx_queue *txq = (struct ice_tx_queue *)q; + struct ice_tx_entry *sw_ring; + volatile struct ice_tx_desc *txr; + uint16_t tx_first; /* First segment analyzed. */ + uint16_t tx_id;/* Current segment being processed. */ + uint16_t tx_last; /* Last segment in the current packet. */ + uint16_t tx_next; /* First segment of the next packet. */ + int count; + + if (txq == NULL) + return -ENODEV; + + count = 0; + sw_ring = txq->sw_ring; + txr = txq->tx_ring; + + /* +* tx_tail is the last sent packet on the sw_ring. Goto the end +* of that packet (the last segment in the packet chain) and +* then the next segment will be the start of the oldest segment +* in the sw_ring. This is the first packet that will be +* attempted to be freed. +*/ + + /* Get last segment in most recently added packet. */ + tx_first = sw_ring[txq->tx_tail].last_id; + + /* Get the next segment, which is the oldest segment in ring. */ + tx_first = sw_ring[tx_first].next_id; + + /* Set the current index to the first. */ + tx_id = tx_first; + + /* +* Loop through each packet. For each packet, verify that an +* mbuf exists and that the last segment is free. If so, free +* it and move on. +*/ + while (1) { + tx_last = sw_ring[tx_id].last_id; + + if (sw_ring[tx_last].mbuf) { + if ((txr[tx_last].cmd_type_offset_bsz & + rte_cpu_to_le_64(ICE_TXD_QW1_DTYPE_M)) != + rte_cpu_to_le_64(ICE_TX_DESC_DTYPE_DESC_DONE)) { + /* +* Increment the number of packets +* freed. +*/ + count++; + + /* Get the start of the next packet. */ + tx_next = sw_ring[tx_last].next_id; + + /* +* Loop through all segments in a +* packet. +*/ + do { + rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf); + sw_ring[tx_id].mbuf = NULL; + sw_ring[tx_id].last_id = tx_id; + + /* Move to next segemnt. */ + tx_id = sw_ring[tx_id].next_id; + + } while (tx_id != tx_next); + + if (unlikely(count == (int)free_cnt)) + break; + } else { + /* +* mbuf still in use, nothing left to +* free. +*/ + break; + } + } else { + /* +* There are multiple reasons to be here: +* 1) All the packets on the ring have been +*freed - tx_id is equal to tx_first +*and some packets have been freed. +*- Done, exit +* 2) Interfaces has not sent a rings worth of +*packets yet, so the segment after tail is +*still empty. Or a previous call to this +*funct
[dpdk-dev] [PATCH v4 4/5] net/ixgbe: cleanup Tx buffers
Add support to the ixgbe driver for the API rte_eth_tx_done_cleanup to force free consumed buffers on Tx ring. Signed-off-by: Chenxu Di --- drivers/net/ixgbe/ixgbe_ethdev.c | 2 + drivers/net/ixgbe/ixgbe_rxtx.c | 121 +++ drivers/net/ixgbe/ixgbe_rxtx.h | 2 + 3 files changed, 125 insertions(+) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 2c6fd0f13..0091405db 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -601,6 +601,7 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = { .udp_tunnel_port_add = ixgbe_dev_udp_tunnel_port_add, .udp_tunnel_port_del = ixgbe_dev_udp_tunnel_port_del, .tm_ops_get = ixgbe_tm_ops_get, + .tx_done_cleanup = ixgbe_tx_done_cleanup, }; /* @@ -649,6 +650,7 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops = { .reta_query = ixgbe_dev_rss_reta_query, .rss_hash_update = ixgbe_dev_rss_hash_update, .rss_hash_conf_get= ixgbe_dev_rss_hash_conf_get, + .tx_done_cleanup = ixgbe_tx_done_cleanup, }; /* store statistics names and its offset in stats structure */ diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index fa572d184..0cd56d427 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.c +++ b/drivers/net/ixgbe/ixgbe_rxtx.c @@ -2306,6 +2306,127 @@ ixgbe_tx_queue_release_mbufs(struct ixgbe_tx_queue *txq) } } +int ixgbe_tx_done_cleanup(void *q, uint32_t free_cnt) +{ + struct ixgbe_tx_queue *txq = (struct ixgbe_tx_queue *)q; + struct ixgbe_tx_entry *sw_ring; + volatile union ixgbe_adv_tx_desc *txr; + uint16_t tx_first; /* First segment analyzed. */ + uint16_t tx_id;/* Current segment being processed. */ + uint16_t tx_last; /* Last segment in the current packet. */ + uint16_t tx_next; /* First segment of the next packet. */ + int count; + + if (txq == NULL) + return -ENODEV; + + count = 0; + sw_ring = txq->sw_ring; + txr = txq->tx_ring; + + /* +* tx_tail is the last sent packet on the sw_ring. Goto the end +* of that packet (the last segment in the packet chain) and +* then the next segment will be the start of the oldest segment +* in the sw_ring. This is the first packet that will be +* attempted to be freed. +*/ + + /* Get last segment in most recently added packet. */ + tx_first = sw_ring[txq->tx_tail].last_id; + + /* Get the next segment, which is the oldest segment in ring. */ + tx_first = sw_ring[tx_first].next_id; + + /* Set the current index to the first. */ + tx_id = tx_first; + + /* +* Loop through each packet. For each packet, verify that an +* mbuf exists and that the last segment is free. If so, free +* it and move on. +*/ + while (1) { + tx_last = sw_ring[tx_id].last_id; + + if (sw_ring[tx_last].mbuf) { + if (txr[tx_last].wb.status & + IXGBE_TXD_STAT_DD) { + /* +* Increment the number of packets +* freed. +*/ + count++; + + /* Get the start of the next packet. */ + tx_next = sw_ring[tx_last].next_id; + + /* +* Loop through all segments in a +* packet. +*/ + do { + rte_pktmbuf_free_seg(sw_ring[tx_id].mbuf); + sw_ring[tx_id].mbuf = NULL; + sw_ring[tx_id].last_id = tx_id; + + /* Move to next segemnt. */ + tx_id = sw_ring[tx_id].next_id; + + } while (tx_id != tx_next); + + if (unlikely(count == (int)free_cnt)) + break; + } else { + /* +* mbuf still in use, nothing left to +* free. +*/ + break; + } + } else { + /* +* There are multiple reasons to be here: +* 1) All the packets on the ring have been +*freed - tx_id is equal to tx_first +*and some packets have been freed. +*- Done, exit +* 2
[dpdk-dev] [PATCH v4 5/5] net/e1000: cleanup Tx buffers
Add support to the igb vf for the API rte_eth_tx_done_cleanup to force free consumed buffers on Tx ring. Signed-off-by: Chenxu Di --- drivers/net/e1000/igb_ethdev.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c index a3e30dbe5..647d5504f 100644 --- a/drivers/net/e1000/igb_ethdev.c +++ b/drivers/net/e1000/igb_ethdev.c @@ -446,6 +446,7 @@ static const struct eth_dev_ops igbvf_eth_dev_ops = { .tx_descriptor_status = eth_igb_tx_descriptor_status, .tx_queue_setup = eth_igb_tx_queue_setup, .tx_queue_release = eth_igb_tx_queue_release, + .tx_done_cleanup = eth_igb_tx_done_cleanup, .set_mc_addr_list = eth_igb_set_mc_addr_list, .rxq_info_get = igb_rxq_info_get, .txq_info_get = igb_txq_info_get, -- 2.17.1
Re: [dpdk-dev] [PATCH] net/i40e: enable multi-queue Rx interrupt for VF
On 12/24, Zhang, Qi Z wrote:
>
>> -Original Message-
>> From: Cui, LunyuanX
>> Sent: Tuesday, December 3, 2019 7:44 PM
>> To: dev@dpdk.org
>> Cc: Xing, Beilei ; Zhang, Qi Z ; Yang, Qiming ; Cui, LunyuanX
>> Subject: [PATCH] net/i40e: enable multi-queue Rx interrupt for VF
>>
>> The current implementation allows only one Rx queue to support interrupts,
>> because all queues are mapped to the same vector id in vfio_enable_msix().
>> So the VF cannot support multi-queue Rx interrupts in interrupt mode.
>>
>> In this patch, if the packet I/O interrupt on the datapath is enabled
>> (rte_intr_dp_is_en(intr_handle) is true), we map a different interrupt
>> vector to each queue and send this map to the PF. After the PF writes the
>> map to the register, interrupts from all Rx queues will be received.
>>
>> In addition, the vector id must be less than the maximum vector value. When
>> the queue number exceeds the vector count, the interrupt vector map wraps
>> around in a loop.
>>
>> Signed-off-by: Lunyuan Cui
>
>Acked-by: Qi Zhang
>

Applied to dpdk-next-net-intel, Thanks.
Re: [dpdk-dev] [RFC PATCH] mbuf: fix to update documentation of PKT_RX_QINQ_STRIPPED
Given that we haven't heard any objection from anyone in a while on this ...can we get this in please? Thanks On Mon, Dec 16, 2019 at 2:43 PM Andrew Rybchenko wrote: > > On 12/16/19 11:47 AM, Somnath Kotur wrote: > > On Mon, Dec 16, 2019 at 12:01 PM Andrew Rybchenko > > wrote: > >> > >> On 12/16/19 6:16 AM, Somnath Kotur wrote: > >>> Certain hardware may be able to strip and/or save only the outermost > >>> VLAN instead of both the VLANs in the mbuf in a QinQ scenario. > >>> To handle such cases, we could re-interpret setting of just > >>> PKT_RX_QINQ_STRIPPED > >>> to indicate that only the outermost VLAN has been stripped by the > >>> hardware and > >>> saved in mbuf->vlan_tci_outer. > >>> Only When both PKT_RX_QINQ_STRIPPED and PKT_RX_VLAN_STRIPPED are set, the > >>> 2 vlans > >>> have been stripped by the hardware and their tci are saved in > >>> mbuf->vlan_tci (inner) > >>> and mbuf->vlan_tci_outer (outer). > >>> > >>> Signed-off-by: Somnath Kotur > >>> Signed-off-by: JP Lee > >>> --- > >>> lib/librte_mbuf/rte_mbuf_core.h | 15 +++ > >>> 1 file changed, 11 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/lib/librte_mbuf/rte_mbuf_core.h > >>> b/lib/librte_mbuf/rte_mbuf_core.h > >>> index 9a8557d..db1070b 100644 > >>> --- a/lib/librte_mbuf/rte_mbuf_core.h > >>> +++ b/lib/librte_mbuf/rte_mbuf_core.h > >>> @@ -124,12 +124,19 @@ > >>> #define PKT_RX_FDIR_FLX (1ULL << 14) > >>> > >>> /** > >>> - * The 2 vlans have been stripped by the hardware and their tci are > >>> - * saved in mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer). > >>> + * The outer vlan has been stripped by the hardware and their tci are > >>> + * saved in mbuf->vlan_tci_outer (outer). > >>> * This can only happen if vlan stripping is enabled in the RX > >>> * configuration of the PMD. > >>> - * When PKT_RX_QINQ_STRIPPED is set, the flags (PKT_RX_VLAN | > >>> - * PKT_RX_VLAN_STRIPPED | PKT_RX_QINQ) must also be set. > >>> + * When PKT_RX_QINQ_STRIPPED is set, the flags (PKT_RX_VLAN | > >>> PKT_RX_QINQ) > >>> + * must also be set. > >>> + * When both PKT_RX_QINQ_STRIPPED and PKT_RX_VLAN_STRIPPED are set, the > >>> 2 vlans > >>> + * have been stripped by the hardware and their tci are saved in > >>> + * mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer). > >>> + * This can only happen if vlan stripping is enabled in the RX > >>> configuration > >>> + * of the PMD. > >>> + * When PKT_RX_QINQ_STRIPPED and PKT_RX_VLAN_STRIPPED are set, > >>> + * (PKT_RX_VLAN | PKT_RX_QINQ) must also be set. > >>> */ > >>> #define PKT_RX_QINQ_STRIPPED (1ULL << 15) > >>> > >> > >> I always thought that PKT_RX_VLAN_STRIPPED means *one* VLAN > >> stripped regardless if it is outer (if the packet is double > >> tagged) or inner (if only one VLAN tag was present). > >> > >> That's why PKT_RX_QINQ_STRIPPED description says that *two* > >> VLANs have been stripped. > >> > >> What is the problem with such approach? > > The problem is that RX_VLAN_STRIPPED implies that the stripped VLAN > > (outer or inner) is saved in mbuf->vlan_tci, correct? > > Yes. > > > There is no way to convey that it is in QinQ mode and yet only outer > > VLAN has been stripped and saved in mbuf->vlan_tci_outer ? > > Ah, it looks like I understand now that the problem is in > PKT_RX_QINQ description which claims that TCI is saved in > mbuf->vlan_tci_outer and PKT_RX_QINQ_STRIPPED means that > both VLAN tags are stripped regardless (PKT_RX_VLAN_STRIPPED). > Moreover PKT_RX_QINQ_STRIPPED requires PKT_RX_VLAN_STRIPPED. 
> > It means that PKT_RX_QINQ_STRIPPED, PKT_RX_QINQ, PKT_RX_VLAN > and PKT_RX_VLAN_STRIPPED must be clarified. > > I'm not sure, but it looks like it could affect net/dpaa2, > so I'm including driver maintainers in CC.
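To make the proposed semantics concrete, a receiver written against the RFC's interpretation would read the flags roughly as in the sketch below. This follows the wording of the patch only and may not match whatever final clarification is agreed on:

#include <rte_mbuf.h>

/* Proposed reading of the stripped-VLAN flags:
 *  - QINQ_STRIPPED alone: only the outer tag was stripped (vlan_tci_outer)
 *  - QINQ_STRIPPED + VLAN_STRIPPED: both tags stripped (vlan_tci holds inner)
 *  - VLAN_STRIPPED alone: single-tagged packet, tag saved in vlan_tci
 */
static void
handle_stripped_vlans(const struct rte_mbuf *m)
{
	if (m->ol_flags & PKT_RX_QINQ_STRIPPED) {
		uint16_t outer = m->vlan_tci_outer;	/* always valid here */

		if (m->ol_flags & PKT_RX_VLAN_STRIPPED) {
			uint16_t inner = m->vlan_tci;	/* inner tag also stripped */
			(void)inner;
		}
		(void)outer;
	} else if (m->ol_flags & PKT_RX_VLAN_STRIPPED) {
		uint16_t tci = m->vlan_tci;	/* only one VLAN was present */
		(void)tci;
	}
}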
[dpdk-dev] [PATCH v3] net/ixgbe: add support for VF MAC address add and remove
Ixgbe PMD pf host code needs to support ixgbevf mac address add and remove. For this purpose, a response was added between pf and vf to update the mac address. Signed-off-by: Guinan Sun --- v3: * Changed from `RTE_LOG` to `PMD_DRV_LOG`. v2: * Changed the title of commit message. * Checked null in front of valid ether addr check. --- drivers/net/ixgbe/ixgbe_ethdev.h | 1 + drivers/net/ixgbe/ixgbe_pf.c | 35 2 files changed, 36 insertions(+) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h index 76a1b9d18..e1cd8fd16 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.h +++ b/drivers/net/ixgbe/ixgbe_ethdev.h @@ -270,6 +270,7 @@ struct ixgbe_vf_info { uint8_t api_version; uint16_t switch_domain_id; uint16_t xcast_mode; + uint16_t mac_count; }; /* diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c index d0d85e138..c93c0fdc2 100644 --- a/drivers/net/ixgbe/ixgbe_pf.c +++ b/drivers/net/ixgbe/ixgbe_pf.c @@ -748,6 +748,37 @@ ixgbe_set_vf_mc_promisc(struct rte_eth_dev *dev, uint32_t vf, uint32_t *msgbuf) return 0; } +static int +ixgbe_set_vf_macvlan_msg(struct rte_eth_dev *dev, uint32_t vf, uint32_t *msgbuf) +{ + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); + struct ixgbe_vf_info *vf_info = + *(IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private)); + uint8_t *new_mac = (uint8_t *)(&msgbuf[1]); + int index = (msgbuf[0] & IXGBE_VT_MSGINFO_MASK) >> + IXGBE_VT_MSGINFO_SHIFT; + + if (index) { + if (new_mac == NULL) + return -1; + + if (!rte_is_valid_assigned_ether_addr( + (struct rte_ether_addr *)new_mac)) { + PMD_DRV_LOG(ERR, "set invalid mac vf:%d\n", vf); + return -1; + } + + vf_info[vf].mac_count++; + + hw->mac.ops.set_rar(hw, vf_info[vf].mac_count, + new_mac, vf, IXGBE_RAH_AV); + } else { + hw->mac.ops.clear_rar(hw, vf_info[vf].mac_count); + vf_info[vf].mac_count = 0; + } + return 0; +} + static int ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t vf) { @@ -835,6 +866,10 @@ ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t vf) if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED) retval = ixgbe_set_vf_mc_promisc(dev, vf, msgbuf); break; + case IXGBE_VF_SET_MACVLAN: + if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED) + retval = ixgbe_set_vf_macvlan_msg(dev, vf, msgbuf); + break; default: PMD_DRV_LOG(DEBUG, "Unhandled Msg %8.8x", (unsigned)msgbuf[0]); retval = IXGBE_ERR_MBX; -- 2.17.1
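On the VF side this is expected to pair with the normal ethdev MAC calls, since ixgbevf forwards unicast address updates to the PF over the mailbox, which is the IXGBE_VF_SET_MACVLAN message the new handler services. A minimal sketch of the application-side call (the add_vf_mac() helper and its parameters are illustrative):

#include <errno.h>
#include <rte_ethdev.h>
#include <rte_ether.h>

/* Add an extra unicast MAC on an ixgbe VF port; with this patch the PF
 * receives the request via the mailbox and programs a RAR entry for the VF.
 */
static int
add_vf_mac(uint16_t vf_port_id, const char *mac_str)
{
	struct rte_ether_addr mac;

	if (rte_ether_unformat_addr(mac_str, &mac) != 0)
		return -EINVAL;

	return rte_eth_dev_mac_addr_add(vf_port_id, &mac, 0);
}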
[dpdk-dev] [PATCH v3] net/iavf: fix virtual channel return value error
In iavf_handle_virtchnl_msg(), it is not appropriate for _clear_cmd() to be used as a notification to forground thread. So introduce _notify_cmd() to fix this error. In addition, since _notify_cmd() contains rte_wmb(), rte_compiler_barrier() is not necessary. Sending msg from VF to PF is mainly by calling iavf_execute_vf_cmd(), the whole virtchnl msg process is like, iavf_execute_vf_cmd() will call iavf_aq_send_msg_to_pf() to send msg and then polling the cmd done flag as "if (vf->pend_cmd == VIRTCHNL_OP_UNKNOWN)" When reply msg is returned by pf, iavf_handle_virtchnl_msg() in isr will read msg return value by "vf->cmd_retval = msg_ret" and immediately set the cmd done flag by calling _clear_cmd() to notify the iavf_execute_vf_cmd(). iavf_execute_vf_cmd() find the cmd done flag is set and then check whether command return value vf->cmd_retval is success or not. However _clear_cmd() also resets the vf->cmd_retval to success, overwriting the actual return value which is used for diagnosis. So iavf_execute_vf_cmd() will always find vf->cmd_retval is success and then return success. Fixes: 22b123a36d07 ("net/avf: initialize PMD") Cc: sta...@dpdk.org Signed-off-by: Yahui Cao --- drivers/net/iavf/iavf.h | 11 +++ drivers/net/iavf/iavf_vchnl.c | 9 +++-- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/drivers/net/iavf/iavf.h b/drivers/net/iavf/iavf.h index bbd4d75d0..84f821354 100644 --- a/drivers/net/iavf/iavf.h +++ b/drivers/net/iavf/iavf.h @@ -173,6 +173,17 @@ struct iavf_cmd_info { uint32_t out_size; /* buffer size for response */ }; +/* notify current command done. Only call in case execute + * _atomic_set_cmd successfully. + */ +static inline void +_notify_cmd(struct iavf_info *vf, uint32_t msg_ret) +{ + vf->cmd_retval = msg_ret; + rte_wmb(); + vf->pend_cmd = VIRTCHNL_OP_UNKNOWN; +} + /* clear current command. Only call in case execute * _atomic_set_cmd successfully. */ diff --git a/drivers/net/iavf/iavf_vchnl.c b/drivers/net/iavf/iavf_vchnl.c index 14395fed3..cf0f6458e 100644 --- a/drivers/net/iavf/iavf_vchnl.c +++ b/drivers/net/iavf/iavf_vchnl.c @@ -210,12 +210,9 @@ iavf_handle_virtchnl_msg(struct rte_eth_dev *dev) info.msg_len); } else { /* read message and it's expected one */ - if (msg_opc == vf->pend_cmd) { - vf->cmd_retval = msg_ret; - /* prevent compiler reordering */ - rte_compiler_barrier(); - _clear_cmd(vf); - } else + if (msg_opc == vf->pend_cmd) + _notify_cmd(vf, msg_ret); + else PMD_DRV_LOG(ERR, "command mismatch," "expect %u, get %u", vf->pend_cmd, msg_opc); -- 2.17.1
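The ordering the patch relies on is easier to see with the ISR side and the polling side next to each other. The sketch below is a stripped-down illustration of that pattern; the struct and function names are made up for the example and are not the literal iavf code (VIRTCHNL_OP_UNKNOWN is value 0 in virtchnl.h):

#include <stdint.h>
#include <rte_atomic.h>
#include <rte_pause.h>

struct cmd_state {
	volatile uint32_t pend_cmd;	/* opcode of the in-flight command, 0 == done */
	volatile uint32_t cmd_retval;	/* result published by the ISR */
};

/* ISR side: publish the result before signalling completion */
static void
cmd_notify_done(struct cmd_state *s, uint32_t msg_ret)
{
	s->cmd_retval = msg_ret;
	rte_wmb();			/* retval store must not pass the flag store */
	s->pend_cmd = 0;		/* VIRTCHNL_OP_UNKNOWN: command done */
}

/* Foreground side: wait for completion, then read the real return value */
static uint32_t
cmd_wait_done(struct cmd_state *s)
{
	while (s->pend_cmd != 0)
		rte_pause();
	return s->cmd_retval;		/* no longer clobbered to "success" */
}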
Re: [dpdk-dev] [PATCH v3] net/iavf: fix virtual channel return value error
> -Original Message- > From: Cao, Yahui > Sent: Tuesday, December 24, 2019 12:13 PM > To: Wu, Jingjing ; Lu, Wenzhuo > > Cc: dev@dpdk.org; sta...@dpdk.org; Zhang, Qi Z ; Cao, > Yahui ; Ye, Xiaolong ; Su, > Simei > Subject: [PATCH v3] net/iavf: fix virtual channel return value error > > In iavf_handle_virtchnl_msg(), it is not appropriate for _clear_cmd() to be > used > as a notification to forground thread. So introduce > _notify_cmd() to fix this error. In addition, since _notify_cmd() contains > rte_wmb(), rte_compiler_barrier() is not necessary. > > Sending msg from VF to PF is mainly by calling iavf_execute_vf_cmd(), the > whole virtchnl msg process is like, > > iavf_execute_vf_cmd() will call iavf_aq_send_msg_to_pf() to send msg and > then polling the cmd done flag as "if (vf->pend_cmd == > VIRTCHNL_OP_UNKNOWN)" > > When reply msg is returned by pf, iavf_handle_virtchnl_msg() in isr will read > msg return value by "vf->cmd_retval = msg_ret" and immediately set the cmd > done flag by calling _clear_cmd() to notify the iavf_execute_vf_cmd(). > > iavf_execute_vf_cmd() find the cmd done flag is set and then check whether > command return value vf->cmd_retval is success or not. > > However _clear_cmd() also resets the vf->cmd_retval to success, overwriting > the actual return value which is used for diagnosis. > So iavf_execute_vf_cmd() will always find vf->cmd_retval is success and then > return success. > > Fixes: 22b123a36d07 ("net/avf: initialize PMD") > Cc: sta...@dpdk.org > > Signed-off-by: Yahui Cao Acked-by: Qi Zhang
Re: [dpdk-dev] [PATCH v3] net/ixgbe: add support for VF MAC address add and remove
On 12/24, Guinan Sun wrote: >Ixgbe PMD pf host code needs to support ixgbevf mac address >add and remove. For this purpose, a response was added >between pf and vf to update the mac address. > >Signed-off-by: Guinan Sun >--- >v3: >* Changed from `RTE_LOG` to `PMD_DRV_LOG`. >v2: >* Changed the title of commit message. >* Checked null in front of valid ether addr check. >--- > drivers/net/ixgbe/ixgbe_ethdev.h | 1 + > drivers/net/ixgbe/ixgbe_pf.c | 35 > 2 files changed, 36 insertions(+) > >diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h >b/drivers/net/ixgbe/ixgbe_ethdev.h >index 76a1b9d18..e1cd8fd16 100644 >--- a/drivers/net/ixgbe/ixgbe_ethdev.h >+++ b/drivers/net/ixgbe/ixgbe_ethdev.h >@@ -270,6 +270,7 @@ struct ixgbe_vf_info { > uint8_t api_version; > uint16_t switch_domain_id; > uint16_t xcast_mode; >+ uint16_t mac_count; > }; > > /* >diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c >index d0d85e138..c93c0fdc2 100644 >--- a/drivers/net/ixgbe/ixgbe_pf.c >+++ b/drivers/net/ixgbe/ixgbe_pf.c >@@ -748,6 +748,37 @@ ixgbe_set_vf_mc_promisc(struct rte_eth_dev *dev, uint32_t >vf, uint32_t *msgbuf) > return 0; > } > >+static int >+ixgbe_set_vf_macvlan_msg(struct rte_eth_dev *dev, uint32_t vf, uint32_t >*msgbuf) >+{ >+ struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); >+ struct ixgbe_vf_info *vf_info = >+ *(IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private)); >+ uint8_t *new_mac = (uint8_t *)(&msgbuf[1]); >+ int index = (msgbuf[0] & IXGBE_VT_MSGINFO_MASK) >> >+ IXGBE_VT_MSGINFO_SHIFT; >+ >+ if (index) { >+ if (new_mac == NULL) >+ return -1; >+ >+ if (!rte_is_valid_assigned_ether_addr( >+ (struct rte_ether_addr *)new_mac)) { >+ PMD_DRV_LOG(ERR, "set invalid mac vf:%d\n", vf); >+ return -1; >+ } >+ >+ vf_info[vf].mac_count++; >+ >+ hw->mac.ops.set_rar(hw, vf_info[vf].mac_count, >+ new_mac, vf, IXGBE_RAH_AV); >+ } else { >+ hw->mac.ops.clear_rar(hw, vf_info[vf].mac_count); >+ vf_info[vf].mac_count = 0; >+ } >+ return 0; >+} >+ > static int > ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t vf) > { >@@ -835,6 +866,10 @@ ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t >vf) > if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED) > retval = ixgbe_set_vf_mc_promisc(dev, vf, msgbuf); > break; >+ case IXGBE_VF_SET_MACVLAN: >+ if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED) >+ retval = ixgbe_set_vf_macvlan_msg(dev, vf, msgbuf); >+ break; > default: > PMD_DRV_LOG(DEBUG, "Unhandled Msg %8.8x", (unsigned)msgbuf[0]); > retval = IXGBE_ERR_MBX; >-- >2.17.1 > Acked-by: Xiaolong Ye Applied to dpdk-next-net-intel, Thanks.