RE: [RFC v3 23/26] dev: hide driver object

2022-08-01 Thread Jayatheerthan, Jay
Looks good to me.

Acked-by: Jay Jayatheerthan 


> -Original Message-
> From: David Marchand 
> Sent: Thursday, July 28, 2022 8:57 PM
> To: dev@dpdk.org
> Cc: Chautru, Nicolas ; Parav Pandit 
> ; Xueming Li ; Hemant
> Agrawal ; Sachin Saxena ; 
> Stephen Hemminger ;
> Long Li ; Zhang, Roy Fan ; 
> Sunila Sahu ; Ashish Gupta
> ; Ajit Khaparde ; 
> Raveendra Padasalagi
> ; Vikas Gupta ; 
> Chandubabu Namburu ;
> Ankur Dwivedi ; Anoob Joseph ; 
> Tejasree Kondoj ;
> Gagandeep Singh ; Richardson, Bruce 
> ; Laatz, Kevin ;
> McDaniel, Timothy ; Jerin Jacob 
> ; Elena Agostini ; Loftus,
> Ciara ; Zhang, Qi Z ; Shepard 
> Siegel ; Ed Czeck
> ; John Miller ; 
> Webster, Steven ; Peters,
> Matt ; Rasesh Mody ; Shahed 
> Shaikh ; Somnath Kotur
> ; Rahul Lakkireddy 
> ; Su, Simei ; Wu, Wenjun1
> ; Marcin Wojtas ; Michal Krawczyk 
> ; Shai Brandes
> ; Evgeny Schemeilin ; Igor Chauskin 
> ; Daley, John
> ; Hyong Youb Kim ; Gaetan Rivet 
> ; Wang, Xiao W
> ; Zhang, Yuying ; Xing, Beilei 
> ; Wu, Jingjing
> ; Yang, Qiming ; Matan Azrad 
> ; Viacheslav Ovsiienko
> ; Chaoyong He ; Niklas 
> Soderlund ; Harman
> Kalra ; Devendra Singh Rawat ; 
> Andrew Rybchenko
> ; Maciej Czekaj ; Maxime 
> Coquelin ; Xia,
> Chenbo ; Jochen Behrens ; Jakub 
> Palider ; Tomasz Duszynski
> ; Ori Kam ; Akhil Goyal 
> ; Chengwen Feng
> ; Ray Kinsella ; Thomas Monjalon 
> ; Ferruh Yigit
> ; Gujjar, Abhinandan S 
> ; Jayatheerthan, Jay 
> ;
> Matz, Olivier ; Pattan, Reshma 
> 
> Subject: [RFC v3 23/26] dev: hide driver object
> 
> Make rte_driver opaque for non internal users.
> This will make extending this object possible without breaking the ABI.
> 
> Introduce a new driver header and move rte_driver definition.
> Update drivers and library to use the internal header.
> 
> Some applications may have been dereferencing rte_driver objects, mark
> this object's accessors as stable.
> 
> Signed-off-by: David Marchand 
> ---
> Changes since RFC v2:
> - updated release notes,
> - marked accessors as stable,
> 
> ---
>  doc/guides/rel_notes/release_22_11.rst|  2 ++
>  drivers/baseband/acc100/rte_acc100_pmd.c  |  2 +-
>  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c |  2 +-
>  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |  2 +-
>  drivers/bus/auxiliary/auxiliary_params.c  |  2 +-
>  drivers/bus/auxiliary/bus_auxiliary_driver.h  |  2 +-
>  drivers/bus/dpaa/bus_dpaa_driver.h|  2 +-
>  drivers/bus/fslmc/bus_fslmc_driver.h  |  2 +-
>  drivers/bus/fslmc/fslmc_vfio.c|  2 +-
>  drivers/bus/fslmc/portal/dpaa2_hw_dpbp.c  |  2 +-
>  drivers/bus/fslmc/portal/dpaa2_hw_dpci.c  |  2 +-
>  drivers/bus/fslmc/portal/dpaa2_hw_dpio.c  |  2 +-
>  drivers/bus/fslmc/portal/dpaa2_hw_dprc.c  |  2 +-
>  drivers/bus/pci/bus_pci_driver.h  |  2 +-
>  drivers/bus/pci/pci_params.c  |  2 +-
>  drivers/bus/vdev/bus_vdev_driver.h|  2 +-
>  drivers/bus/vdev/vdev.c   |  2 +-
>  drivers/bus/vdev/vdev_params.c|  2 +-
>  drivers/bus/vmbus/bus_vmbus_driver.h  |  2 +-
>  drivers/common/qat/dev/qat_dev_gen4.c |  2 +-
>  drivers/common/qat/qat_qp.c   |  2 +-
>  drivers/compress/zlib/zlib_pmd_ops.c  |  2 +-
>  drivers/crypto/bcmfs/bcmfs_qp.c   |  2 +-
>  drivers/crypto/bcmfs/bcmfs_sym_pmd.c  |  2 +-
>  drivers/crypto/ccp/rte_ccp_pmd.c  |  2 +-
>  drivers/crypto/cnxk/cn10k_cryptodev.c |  2 +-
>  drivers/crypto/cnxk/cn9k_cryptodev.c  |  2 +-
>  drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c   |  2 +-
>  drivers/crypto/dpaa_sec/dpaa_sec.c|  2 +-
>  drivers/crypto/scheduler/scheduler_pmd_ops.c  |  2 +-
>  drivers/dma/idxd/idxd_bus.c   |  1 +
>  drivers/event/dlb2/dlb2.c |  2 +-
>  drivers/event/dlb2/pf/dlb2_pf.c   |  2 +-
>  drivers/event/dpaa/dpaa_eventdev.c|  2 +-
>  drivers/event/dpaa2/dpaa2_eventdev.c  |  2 +-
>  drivers/event/dpaa2/dpaa2_hw_dpcon.c  |  2 +-
>  drivers/event/octeontx/ssovf_evdev.c  |  2 +-
>  drivers/event/skeleton/skeleton_eventdev.c|  2 +-
>  drivers/gpu/cuda/cuda.c   |  2 +-
>  drivers/mempool/dpaa2/dpaa2_hw_mempool.c  |  2 +-
>  drivers/net/af_xdp/rte_eth_af_xdp.c   |  2 +-
>  drivers/net/ark/ark_global.h  |  2 +-
>  drivers/net/avp/avp_ethdev.c  |  2 +-
>  drivers/net/axgbe/axgbe_common.h  |  2 +-
>  drivers/net/bnx2x/bnx2x_ethdev.c  |  2 +-
>  drivers/net/bnxt/bnxt_ethdev.c|  2 +-
>  drivers/net/bnxt/rte_pmd_bnxt.c   |  2 +-
>  drivers/net/cxgbe/base/t4_hw.c|  2 +-
>  drivers/net/cxgbe/cxgbe_ethdev.c  |  2 +-
>  drivers/net/cxgbe/cxgbe_main.c|  2 +-
>  drivers/net/cxgbe/sge.c   |  2 +-
>  drivers/net/dpaa2/base/dpaa2_hw_dpni.c|  2 +-
>  drivers/net/dpa

RE: [PATCH v9 1/4] ethdev: introduce protocol header API

2022-08-01 Thread Wang, YuanX
Hi Thomas,

Sorry for taking so long to respond to your email.

> -Original Message-
> From: Thomas Monjalon 
> Sent: Thursday, July 7, 2022 5:05 PM
> To: Wu, WenxuanX 
> Cc: andrew.rybche...@oktetlabs.ru; Li, Xiaoyun ;
> ferruh.yi...@xilinx.com; Singh, Aman Deep ;
> dev@dpdk.org; Zhang, Yuying ; Zhang, Qi Z
> ; jerinjac...@gmail.com;
> step...@networkplumber.org
> Subject: Re: [PATCH v9 1/4] ethdev: introduce protocol header API
> 
> 13/06/2022 12:25, wenxuanx...@intel.com:
> > From: Wenxuan Wu 
> >
> > This patch added new ethdev API to retrieve supported protocol header
> > mask of a PMD, which helps to configure protocol header based buffer split.
> >
> > Signed-off-by: Wenxuan Wu 
> > ---
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Get supported header protocols to split supported by PMD.
> > + * The API will return error if the device is not valid.
> > + *
> > + * @param port_id
> > + *   The port identifier of the device.
> > + * @param ptype
> > + *   Supported protocol headers of driver.
> 
> It doesn't say where to find the types.
> Please give the prefix.

Sorry, I didn't catch your point. Are you saying that the ptype should be composed 
of RTE_PTYPE_* values?
Could you explain in more detail?

> 
> > + * @return
> > + *   - (-ENOTSUP) if header protocol is not supported by device.
> > + *   - (-ENODEV) if *port_id* invalid.
> > + *   - (-EIO) if device is removed.
> > + *   - (0) on success.
> > + */
> > +__rte_experimental
> > +int rte_eth_supported_hdrs_get(uint16_t port_id,
> > +   uint32_t *ptype);
> 
> The function name is not precise enough.
> There should be the word "split" in its name.

Thanks for the suggestion, it will be revised in the next version.

Thanks,
Yuan
> 




RE: [PATCH v9 1/4] ethdev: introduce protocol header API

2022-08-01 Thread Wang, YuanX
Hi Andrew,

Apologies for the delay in getting back to you.

> -Original Message-
> From: Andrew Rybchenko 
> Sent: Friday, July 8, 2022 11:01 PM
> To: Wu, WenxuanX ; tho...@monjalon.net; Li,
> Xiaoyun ; ferruh.yi...@xilinx.com; Singh, Aman Deep
> ; dev@dpdk.org; Zhang, Yuying
> ; Zhang, Qi Z ;
> jerinjac...@gmail.com
> Cc: step...@networkplumber.org
> Subject: Re: [PATCH v9 1/4] ethdev: introduce protocol header API
> 
> On 6/13/22 13:25, wenxuanx...@intel.com wrote:
> > From: Wenxuan Wu 
> >
> > This patch added new ethdev API to retrieve supported protocol header
> > mask
> 
> This patch added -> Add

Thanks for your catch, will fix in the next version.

> 
> > of a PMD, which helps to configure protocol header based buffer split.
> 
> I'd like to see motivation why single mask is considered sufficient.
> I.e. why don't we follow the ptypes approach, which is more flexible, but a bit
> more complicated.
> 
> Looking at RTE_PTYPE_* defines carefully it looks like below API simply
> cannot provide information that we can split after TCP or UDP.

As Xuan replied in patch 2, we think RTE_PTYPE_* may be enough.
Any insights are welcome.

> 
> >
> > Signed-off-by: Wenxuan Wu 
> 
> [snip]
> 
> >   /**
> >* @internal
> >* Dump private info from device to a file.
> > @@ -1281,6 +1296,9 @@ struct eth_dev_ops {
> > /** Set IP reassembly configuration */
> > eth_ip_reassembly_conf_set_t ip_reassembly_conf_set;
> >
> > +   /** Get supported ptypes to split */
> > +   eth_buffer_split_hdr_ptype_get_t hdrs_supported_ptypes_get;
> > +
> 
> It is better to be consistent with naming. I.e. just cut prefix "eth_"
> and suffix "_t".
> 
> Also, the type name sounds like it gets the current split configuration, not
> the supported one.

Thank you for your suggestion, will fix in the next version.

> 
> > /** Dump private info from device */
> > eth_dev_priv_dump_t eth_dev_priv_dump;
> >   };
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> > 29a3d80466..e1f2a0ffe3 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -1636,9 +1636,10 @@ rte_eth_dev_is_removed(uint16_t port_id)
> >   }
> >
> >   static int
> > -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> > -uint16_t n_seg, uint32_t *mbp_buf_size,
> > -const struct rte_eth_dev_info *dev_info)
> > +rte_eth_rx_queue_check_split(uint16_t port_id,
> > +   const struct rte_eth_rxseg_split *rx_seg,
> > +   int16_t n_seg, uint32_t *mbp_buf_size,
> > +   const struct rte_eth_dev_info *dev_info)
> >   {
> > const struct rte_eth_rxseg_capa *seg_capa = &dev_info-
> >rx_seg_capa;
> > struct rte_mempool *mp_first;
> > @@ -1694,13 +1695,7 @@ rte_eth_rx_queue_check_split(const struct
> rte_eth_rxseg_split *rx_seg,
> > }
> > offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
> > *mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> > -   length = length != 0 ? length : *mbp_buf_size;
> > -   if (*mbp_buf_size < length + offset) {
> 
> I don't understand why the check goes away completely.

Thanks for your catch; it should be in patch 2, will fix in the next 
version.

> 
> > -   RTE_ETHDEV_LOG(ERR,
> > -  "%s mbuf_data_room_size %u < %u
> (segment length=%u + segment offset=%u)\n",
> > -  mpl->name, *mbp_buf_size,
> > -  length + offset, length, offset);
> > -   return -EINVAL;
> > +
> 
> Unnecessary empty line
> 
> > }
> 
> Shouldn't the curly bracket go away as well, together with its 'if'?

Thanks for your catch, will fix in the next version.

> 
> > }
> > return 0;
> > @@ -1779,7 +1774,7 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> uint16_t rx_queue_id,
> > n_seg = rx_conf->rx_nseg;
> >
> > if (rx_conf->offloads &
> RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> > -   ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> > +   ret = rte_eth_rx_queue_check_split(port_id, rx_seg,
> n_seg,
> >&mbp_buf_size,
> >&dev_info);
> > if (ret != 0)
> > @@ -5844,6 +5839,20 @@ rte_eth_ip_reassembly_conf_set(uint16_t
> port_id,
> >(*dev->dev_ops->ip_reassembly_conf_set)(dev, conf));
> >   }
> >
> > +int
> > +rte_eth_supported_hdrs_get(uint16_t port_id, uint32_t *ptypes) {
> > +   struct rte_eth_dev *dev;
> > +   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > +   dev = &rte_eth_devices[port_id];
> 
> ptypes must be checked vs NULL

Thanks for your catch, will fix in the next version.

> 
> > +
> > +   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops-
> >hdrs_supported_ptypes_get,
> > + 

[PATCH v3] pcapng: fix write more packets than IOV_MAX limit

2022-08-01 Thread Mário Kuka
The rte_pcapng_write_packets() function fails when we try to write more
packets than the IOV_MAX limit. The writev() system call is limited by
IOV_MAX: its iovcnt argument is valid only if it is greater than 0 and
less than or equal to IOV_MAX, as defined in <limits.h>.

To avoid this problem, we can check that all segments of the next
packet will fit into the iovec buffer, whose capacity will be limited
by the IOV_MAX limit. If not, we flush the current iovec buffer to the
file by calling writev() and, if successful, fit the current packet at
the beginning of the flushed iovec buffer.

Fixes: 8d23ce8f5ee9 ("pcapng: add new library for writing pcapng files")
Cc: step...@networkplumber.org

Signed-off-by: Mário Kuka 
---
v3:
* Remove unwanted changes to check for partial writes from the writev().

 app/test/test_pcapng.c  | 42 +++-
 lib/pcapng/rte_pcapng.c | 47 -
 2 files changed, 64 insertions(+), 25 deletions(-)

diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
index 320dacea34..7f51946fff 100644
--- a/app/test/test_pcapng.c
+++ b/app/test/test_pcapng.c
@@ -110,7 +110,7 @@ test_setup(void)
}
 
/* Make a pool for cloned packets */
-   mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", NUM_PACKETS,
+   mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", IOV_MAX + 
NUM_PACKETS,
0, 0,
rte_pcapng_mbuf_size(pkt_len),
SOCKET_ID_ANY, "ring_mp_sc");
@@ -237,6 +237,45 @@ test_validate(void)
return ret;
 }
 
+static int
+test_write_over_limit_iov_max(void)
+{
+   struct rte_mbuf *orig;
+   struct rte_mbuf *clones[IOV_MAX + NUM_PACKETS] = { };
+   struct dummy_mbuf mbfs;
+   unsigned int i;
+   ssize_t len;
+
+   /* make a dummy packet */
+   mbuf1_prepare(&mbfs, pkt_len);
+
+   /* clone them */
+   orig  = &mbfs.mb[0];
+   for (i = 0; i < IOV_MAX + NUM_PACKETS; i++) {
+   struct rte_mbuf *mc;
+
+   mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
+   rte_get_tsc_cycles(), 0);
+   if (mc == NULL) {
+   fprintf(stderr, "Cannot copy packet\n");
+   return -1;
+   }
+   clones[i] = mc;
+   }
+
+   /* write it to capture file */
+   len = rte_pcapng_write_packets(pcapng, clones, IOV_MAX + NUM_PACKETS);
+
+   rte_pktmbuf_free_bulk(clones, IOV_MAX + NUM_PACKETS);
+
+   if (len <= 0) {
+   fprintf(stderr, "Write of packets failed\n");
+   return -1;
+   }
+
+   return 0;
+}
+
 static void
 test_cleanup(void)
 {
@@ -256,6 +295,7 @@ unit_test_suite test_pcapng_suite  = {
TEST_CASE(test_write_packets),
TEST_CASE(test_write_stats),
TEST_CASE(test_validate),
+   TEST_CASE(test_write_over_limit_iov_max),
TEST_CASES_END()
}
 };
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index 06ad712bd1..e41cf909e1 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -551,33 +551,16 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
return NULL;
 }
 
-/* Count how many segments are in this array of mbufs */
-static unsigned int
-mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
-{
-   unsigned int i, iovcnt;
-
-   for (iovcnt = 0, i = 0; i < n; i++) {
-   const struct rte_mbuf *m = pkts[i];
-
-   __rte_mbuf_sanity_check(m, 1);
-
-   iovcnt += m->nb_segs;
-   }
-   return iovcnt;
-}
-
 /* Write pre-formatted packets to file. */
 ssize_t
 rte_pcapng_write_packets(rte_pcapng_t *self,
 struct rte_mbuf *pkts[], uint16_t nb_pkts)
 {
-   int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
-   struct iovec iov[iovcnt];
-   unsigned int i, cnt;
-   ssize_t ret;
+   struct iovec iov[IOV_MAX];
+   unsigned int i, cnt = 0;
+   ssize_t ret, total = 0;
 
-   for (i = cnt = 0; i < nb_pkts; i++) {
+   for (i = 0; i < nb_pkts; i++) {
struct rte_mbuf *m = pkts[i];
struct pcapng_enhance_packet_block *epb;
 
@@ -589,6 +572,20 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
return -1;
}
 
+   /*
+* Handle case of highly fragmented and large burst size
+* Note: this assumes that max segments per mbuf < IOV_MAX
+*/
+   if (unlikely(cnt + m->nb_segs >= IOV_MAX)) {
+   ret = writev(self->outfd, iov, cnt);
+   if (unlikely(ret < 0)) {
+   rte_errno = errno;
+   return -1;
+   }
+   total += ret;
+  

Re: [PATCH v2 2/2] pcapng: check if writev() returns a partial write

2022-08-01 Thread Mário Kuka

As pcapng is used in the dpdk application it writes to a file.
You could repurpose it to something else, but even a pipe will not
give partial writes unless you configure the pipe as non-blocking.

Writing to a non-blocking pipe is going to have a load of other problems.

This seems like a purely hypothetical case, can't see why it needs to be 
addressed.


OK, I'm sending patch v3 which only fixes the IOV_MAX limit issue.
I've removed the changes related to the partial write check.

On 29/07/2022 20:14, Stephen Hemminger wrote:

On Fri, 29 Jul 2022 19:08:41 +0200
Mário Kuka  wrote:


Since this is being written to a file, handling partial writes makes little
sense. The only case where partial write would happen would be if filesystem
was full. Retrying just adds unnecessary complexity.

If you really want to track this, then add a dropped counter.

But the file descriptor doesn't have to refer to just a regular file, what
if it's a socket or a pipe or some device? The pcapng documentation doesn't
say anything about any restrictions, so the implementation should be fully
generic. What's the point of a function to write packets to a file
descriptor
where there's a risk that it won't write all the packets or that the
file will
by corrupted due to a partial write and still not even let me know about
it?

As pcapng is used in the dpdk application it writes to a file.
You could repurpose it to something else, but even a pipe will not
give partial writes unless you configure the pipe as non-blocking.

Writing to a non-blocking pipe is going to have a load of other problems.

This seems like a purely hypothetical case, can't see why it needs to be 
addressed.






Re: [PATCH v9 1/4] ethdev: introduce protocol header API

2022-08-01 Thread Thomas Monjalon
01/08/2022 09:09, Wang, YuanX:
> Hi Thomas,
> 
> Sorry for taking so long to respond to your email.
> 
> From: Thomas Monjalon 
> > 13/06/2022 12:25, wenxuanx...@intel.com:
> > > From: Wenxuan Wu 
> > >
> > > This patch added new ethdev API to retrieve supported protocol header
> > > mask of a PMD, which helps to configure protocol header based buffer 
> > > split.
> > >
> > > Signed-off-by: Wenxuan Wu 
> > > ---
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Get supported header protocols to split supported by PMD.
> > > + * The API will return error if the device is not valid.
> > > + *
> > > + * @param port_id
> > > + *   The port identifier of the device.
> > > + * @param ptype
> > > + *   Supported protocol headers of driver.
> > 
> > It doesn't say where to find the types.
> > Please give the prefix.
> 
> Sorry, I didn't catch your point. Are you saying that the ptype should be 
> composed of RTE_PTYPE_* values?
> Could you explain in more detail?

Yes, just give the user the required info to use the function.
If ptype must be composed of RTE_PTYPE_* values, it must be said.

Thanks
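
For illustration, the parameter description could spell out the prefix along
these lines (just a sketch, not wording from any posted revision):

 * @param ptype
 *   Pointer to a mask of protocol headers supported as split points,
 *   composed of RTE_PTYPE_* values (e.g. RTE_PTYPE_L3_IPV4 and
 *   RTE_PTYPE_L4_TCP) OR-ed together.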




[PATCH 0/2] vhost fixes for OVS SIGSEGV in PMD

2022-08-01 Thread Claudio Fontana
Hi, the real meat is in patch 1/2,

which fixes a segmentation fault in the OVS PMD thread when
resynchronizing with QEMU after the guest application has been killed
with SIGKILL.

This fixes an issue where the guest DPDK application is able
to crash the OVS process on the host.

Patch 2/2 is just an improvement in the current error handling.

For your review and comments,

Claudio

Claudio Fontana (2):
  vhost: fix error handling in virtio_dev_tx_split
  vhost: improve error handling in desc_to_mbuf

 lib/vhost/virtio_net.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

-- 
2.26.2



[PATCH 1/2] vhost: fix error handling in virtio_dev_tx_split

2022-08-01 Thread Claudio Fontana
In virtio_dev_tx_split, add a check for invalid nr_vec,
mainly for nr_vec == 0 (but check against BUF_VECTOR_MAX too),
and bail out before calling desc_to_mbuf; otherwise desc_to_mbuf
ends up trying to memcpy from a source address buf_vec[0] that
is an uninitialized stack variable.

This should fix errors that have been reported on multiple occasions
from telcos to the DPDK, OVS and QEMU projects, as this affects in
particular the Open vSwitch/DPDK + QEMU vhost-user setup.

The backtrace looks roughly like this, depending on the specific
rte_memcpy selected, etc.; in any case the "src" parameter is garbage
(in this example containing 0 + dev->host_hlen (12 = 0xc)).

Thread 153 "pmd-c88/id:150" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f64e5e6b700 (LWP 141373)]
rte_mov128blocks (n=2048, src=0xc ,
 dst=0x150da4480) at ../lib/eal/x86/include/rte_memcpy.h:384
(gdb) bt
0  rte_mov128blocks (n=2048, src=0xc, dst=0x150da4480)
1  rte_memcpy_generic (n=2048, src=0xc, dst=0x150da4480)
2  rte_memcpy (n=2048, src=0xc, dst=)
3  sync_fill_seg
4  desc_to_mbuf
5  virtio_dev_tx_split
6  virtio_dev_tx_split_legacy
7  0x7f676fea0fef in rte_vhost_dequeue_burst
8  0x7f6772005a62 in netdev_dpdk_vhost_rxq_recv
9  0x7f6771f38116 in netdev_rxq_recv
10 0x7f6771f03d96 in dp_netdev_process_rxq_port
11 0x7f6771f04239 in pmd_thread_main
12 0x7f6771f92aff in ovsthread_wrapper
13 0x7f6771c1b6ea in start_thread
14 0x7f6771933a8f in clone

Tested-by: Claudio Fontana 
Signed-off-by: Claudio Fontana 
---
 lib/vhost/virtio_net.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 35fa4670fd..0b8db2046e 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2917,9 +2917,16 @@ virtio_dev_tx_split(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
vq->last_avail_idx + i,
&nr_vec, buf_vec,
&head_idx, &buf_len,
-   VHOST_ACCESS_RO) < 0))
+   VHOST_ACCESS_RO) < 0)) {
+   dropped += 1;
+   i++;
break;
-
+   }
+   if (unlikely(nr_vec < 1 || nr_vec >= BUF_VECTOR_MAX)) {
+   dropped += 1;
+   i++;
+   break;
+   }
update_shadow_used_ring_split(vq, head_idx, 0);
 
err = virtio_dev_pktmbuf_prep(dev, pkts[i], buf_len);
-- 
2.26.2



[PATCH 2/2] vhost: improve error handling in desc_to_mbuf

2022-08-01 Thread Claudio Fontana
Check, when increasing vec_idx, that it is still valid
in the (buf_len < dev->vhost_hlen) case too.

Signed-off-by: Claudio Fontana 
---
 lib/vhost/virtio_net.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 0b8db2046e..6d34feaf73 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2701,12 +2701,15 @@ desc_to_mbuf(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
if (unlikely(buf_len < dev->vhost_hlen)) {
buf_offset = dev->vhost_hlen - buf_len;
vec_idx++;
+   if (unlikely(vec_idx >= nr_vec))
+   goto error;
buf_addr = buf_vec[vec_idx].buf_addr;
buf_iova = buf_vec[vec_idx].buf_iova;
buf_len = buf_vec[vec_idx].buf_len;
buf_avail  = buf_len - buf_offset;
} else if (buf_len == dev->vhost_hlen) {
-   if (unlikely(++vec_idx >= nr_vec))
+   vec_idx++;
+   if (unlikely(vec_idx >= nr_vec))
goto error;
buf_addr = buf_vec[vec_idx].buf_addr;
buf_iova = buf_vec[vec_idx].buf_iova;
-- 
2.26.2



Re: Question about pattern types for rte_flow RSS rule

2022-08-01 Thread Andrew Rybchenko

On 8/1/22 06:39, lihuisong (C) wrote:


On 2022/7/31 17:40, Andrew Rybchenko wrote:

Hi, Huisong!

On 7/29/22 05:30, lihuisong (C) wrote:

Hi Ori, and all,

For RSS flow rule, pattern item types and RSS types in action
are an inclusive relationship, and RSS types contain pattern item
types. 


I disagree with the statement. We can redirect various packets,
but apply RSS on subset only. Everything else goes to the first
queue (first entry in the redirection table in fact).

Sorry, the statement above is inaccurate.
I mean, the pattern item types and the RSS types in the action are the same
when creating an RSS rule for specific packet types, so this configuration is
duplicated in this case (in command <1>, I think). Using either of them can
specify which packet types the flow action applies to (as in command <1> or <2>).


No, I still disagree. Pattern is about filtering. It defines on which
packets flow rule actions are applied. RSS types choose packet header
fields (if applicable) to calculate the hash. I.e. RSS types are not about
filtering.



Create a RSS rule that redirect ipv4-tcp flow to queues 1 and 2:
<1> flow create 0 ingress pattern eth / ipv4 / tcp / end actions rss 
types ipv4-tcp end queues 1 2 end / end


Yes, this one will redirect IPv4 TCP packets only.

<2> flow create 0 ingress pattern end actions rss types ipv4-tcp end 
queues 1 2 end / end


The rule will redirect all packets to queues 1 and 2. Some packets will
always go to queue 1, e.g. ARP packets.

<3> flow create 0 ingress pattern eth / ipv4 / tcp / end actions rss end 
queues 1 2 end / end


This one will redirect IPv4 TCP and will use RSS hash function specified
in dev_conf on device configuration or default RSS hash function chosen
by corresponding driver.
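
For illustration, rule <1> above could be expressed through the rte_flow C API
roughly as below (a minimal sketch: error handling trimmed, port and queue
numbers illustrative):

struct rte_flow_error err;
uint16_t queues[] = { 1, 2 };
struct rte_flow_attr attr = { .ingress = 1 };
struct rte_flow_item pattern[] = {              /* filter: IPv4 TCP only */
	{ .type = RTE_FLOW_ITEM_TYPE_ETH },
	{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
	{ .type = RTE_FLOW_ITEM_TYPE_TCP },
	{ .type = RTE_FLOW_ITEM_TYPE_END },
};
struct rte_flow_action_rss rss = {
	.types = RTE_ETH_RSS_NONFRAG_IPV4_TCP,  /* hash on ipv4-tcp fields */
	.queue_num = 2,
	.queue = queues,                        /* spread over queues 1 and 2 */
};
struct rte_flow_action actions[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = &rss },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};
struct rte_flow *flow = rte_flow_create(0, &attr, pattern, actions, &err);

The pattern decides which packets the rule matches; the RSS types only decide
which header fields feed the hash.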



What do you think about it, @Andrew and @Ori?



Is it necessary to set pattern item types when specifying RSS
types to create a rule? 


No, it is not strictly required. It depends on what you want.


How should the user set this, and what should the
driver do?


Pattern and action are not strictly related in the case of RSS.
Pattern defines on which packets the rule is applied.
Action defines what the rule does. If hash function is not
applicable to a packet, e.g. ARP packet and ipv4-tcp hash function,
the hash is 0 and goes via redirection table entry 0.
I know this rule. Ori has already explained the usage of RSS rules in the 
rte_flow API,

but I still have the confusion mentioned above.


Andrew.



Looking forward to your reply.

Regards,
Huisong

On 2022/7/13 9:34, lihuisong (C) wrote:

Hi all,

Can someone clear up my confusion?
I'm looking forward to your reply.

Thanks,
Huisong.

On 2022/7/7 11:50, lihuisong (C) wrote:

Hi all,

From following testpmd command:
'flow create 0 ingress pattern eth / ipv4 / tcp / end actions rss 
types ipv4-tcp l3-src-only end queues end / end'

and
"flow create 0 ingress pattern end actions rss types ipv4-tcp 
l3-src-only end queues end / end"


I have some confusions about rte_flow RSS rule:
1> Do pattern item types need to be set when configuring an rte_flow RSS rule?
2> Does the driver need to check and process the pattern? (After 
all, the RSS types in actions already contain all RSS offload types.)


Can someone explain it?

Regards,
Huisong

.

.


.




Re: segfault in ovs in setup with DPDK, qemu vhost-user

2022-08-01 Thread Claudio Fontana
On 7/30/22 18:23, Claudio Fontana wrote:
> On 7/30/22 18:17, Claudio Fontana wrote:
>> Hello all,
>>
>> with the latest DPDK, openvswitch and qemu
>>
>> DPDK tag v22.07
>> openvswitch tag v2.17.1
>> qemu v7.1-git 22.07.2022
>>
>> and a DPDK setup which involves also an ubuntu guest with DPDK 16.11 
>> test-pmd application (but also verified with DPDK 19.x),
>> with an external traffic generator to cause some load,
>>
>> I am able to cause a segfault in OVS (ovs-vswitchd) inside the DPDK 
>> libraries by doing (from the guest):
>>
>> bind the device, start testpmd,
>> SIGKILL of testpmd,
>> immediately restart testpmd,
>> rinse and repeat.
>>
>> Once every few restarts, the following segfault happens (may take anything 
>> from a few seconds to minutes):
>>
>>
>> Thread 153 "pmd-c88/id:150" received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7f64e5e6b700 (LWP 141373)]
>> rte_mov128blocks (n=2048, src=0xc > 0xc>, dst=0x150da4480 "h\005\312❇\377\377\377\377\377\377\b") at 
>> ../lib/eal/x86/include/rte_memcpy.h:384
>> 384  ../lib/eal/x86/include/rte_memcpy.h: No such file or directory.
>> (gdb) bt
>> #0  rte_mov128blocks (n=2048, src=0xc > address 0xc>, 
>> dst=0x150da4480 "h\005\312❇\377\377\377\377\377\377\b") at 
>> ../lib/eal/x86/include/rte_memcpy.h:384
>> #1  rte_memcpy_generic (n=2048, src=0xc, dst=0x150da4480) at 
>> ../lib/eal/x86/include/rte_memcpy.h:484
>> #2  rte_memcpy (n=2048, src=0xc, dst=) at 
>> ../lib/eal/x86/include/rte_memcpy.h:851
>> #3  sync_fill_seg (to_desc=false, cpy_len=2048, buf_iova=, 
>> buf_addr=12, mbuf_offset=0, m=0x150da4140, 
>> vq=0x2200400680, dev=0x2200d3d740) at ../lib/vhost/virtio_net.c:1119
>> #4  desc_to_mbuf (is_async=false, slot_idx=0, legacy_ol_flags=true, 
>> mbuf_pool=0x17fe7df00, m=0x150da4140, nr_vec=, 
>> buf_vec=0x7f64e5e67ca0, vq=0x2200400680, dev=0x2200d3d740) at 
>> ../lib/vhost/virtio_net.c:2747
>> #5  virtio_dev_tx_split (legacy_ol_flags=true, count=, 
>> count@entry=0, pkts=pkts@entry=0x0, 
>> mbuf_pool=mbuf_pool@entry=0x150da4140, vq=vq@entry=0xe5e67d34, 
>> dev=dev@entry=0x7f64e5e694d0) at ../lib/vhost/virtio_net.c:2943
>> #6  virtio_dev_tx_split_legacy (dev=dev@entry=0x2200d3d740, 
>> vq=vq@entry=0x2200400680, mbuf_pool=mbuf_pool@entry=0x17fe7df00, 
>> pkts=pkts@entry=0x7f64e5e69600, count=count@entry=32) at 
>> ../lib/vhost/virtio_net.c:2979
>> #7  0x7f676fea0fef in rte_vhost_dequeue_burst (vid=vid@entry=0, 
>> queue_id=queue_id@entry=1, mbuf_pool=0x17fe7df00, 
>> pkts=pkts@entry=0x7f64e5e69600, count=count@entry=32) at 
>> ../lib/vhost/virtio_net.c:3331
>> #8  0x7f6772005a62 in netdev_dpdk_vhost_rxq_recv (rxq=, 
>> batch=0x7f64e5e695f0, qfill=0x0)
>> at ../lib/netdev-dpdk.c:2393
>> #9  0x7f6771f38116 in netdev_rxq_recv (rx=, 
>> batch=batch@entry=0x7f64e5e695f0, qfill=)
>> at ../lib/netdev.c:727
>> #10 0x7f6771f03d96 in dp_netdev_process_rxq_port 
>> (pmd=pmd@entry=0x7f64e5e6c010, rxq=0x254d730, port_no=2)
>> at ../lib/dpif-netdev.c:5317
>> #11 0x7f6771f04239 in pmd_thread_main (f_=) at 
>> ../lib/dpif-netdev.c:6945
>> #12 0x7f6771f92aff in ovsthread_wrapper (aux_=) at 
>> ../lib/ovs-thread.c:422
>> #13 0x7f6771c1b6ea in start_thread () from /lib64/libpthread.so.0
>> #14 0x7f6771933a8f in clone () from /lib64/libc.so.6
>>
>> When run in gdb as shown above, ovs-vswitchd on the host gets a SIGSEGV and 
>> drops to gdb as shown above,
>> so as a result QEMU stops when trying to read a response from ovs as such:
>>
>> 0  0x7f0a093991e9 in poll () from target:/lib64/libc.so.6
>> #1  0x7f0a0b06c9a9 in ?? () from target:/usr/lib64/libglib-2.0.so.0
>> #2  0x7f0a0b06ccf2 in g_main_loop_run () from 
>> target:/usr/lib64/libglib-2.0.so.0
>> #3  0x561a5cd04747 in vhost_user_read (dev=dev@entry=0x561a5e640df0, 
>> msg=msg@entry=0x7f09ff7fd160)
>> at ../hw/virtio/vhost-user.c:406
>> #4  0x561a5cd04c7e in vhost_user_get_vring_base (dev=0x561a5e640df0, 
>> ring=0x7f09ff7fd428)
>> at ../hw/virtio/vhost-user.c:1261
>> #5  0x561a5cd0043f in vhost_virtqueue_stop 
>> (dev=dev@entry=0x561a5e640df0, vdev=vdev@entry=0x561a5f78ae50, 
>> vq=0x561a5e641070, idx=0) at ../hw/virtio/vhost.c:1216
>> #6  0x561a5cd034fa in vhost_dev_stop (hdev=hdev@entry=0x561a5e640df0, 
>> vdev=vdev@entry=0x561a5f78ae50)
>> at ../hw/virtio/vhost.c:1872
>> #7  0x561a5cb623fa in vhost_net_stop_one (net=0x561a5e640df0, 
>> dev=dev@entry=0x561a5f78ae50)
>> at ../hw/net/vhost_net.c:315
>> #8  0x561a5cb6295e in vhost_net_stop (dev=dev@entry=0x561a5f78ae50, 
>> ncs=0x561a5f808970, 
>> data_queue_pairs=data_queue_pairs@entry=4, cvq=cvq@entry=0) at 
>> ../hw/net/vhost_net.c:427
>> #9  0x561a5cccef79 in virtio_net_vhost_status (status=, 
>> n=0x561a5f78ae50)
>> at ../hw/net/virtio-net.c:298
>> #10 virtio_net_set_status (vdev=0x561a5f78ae50, status=0 '\000') at 
>> ../hw/net/virtio-net.c:372
>> #11 0x561a5ccfb36b in virtio_set_status (vdev=vdev@entry=0

Re: segfault in ovs in setup with DPDK, qemu vhost-user

2022-08-01 Thread Claudio Fontana
On 8/1/22 13:57, Claudio Fontana wrote:
> On 7/30/22 18:23, Claudio Fontana wrote:
>> On 7/30/22 18:17, Claudio Fontana wrote:
>>> Hello all,
>>>
>>> with the latest DPDK, openvswitch and qemu
>>>
>>> DPDK tag v22.07
>>> openvswitch tag v2.17.1
>>> qemu v7.1-git 22.07.2022
>>>
>>> and a DPDK setup which involves also an ubuntu guest with DPDK 16.11 
>>> test-pmd application (but also verified with DPDK 19.x),
>>> with an external traffic generator to cause some load,
>>>
>>> I am able to cause a segfault in OVS (ovs-vswitchd) inside the DPDK 
>>> libraries by doing (from the guest):
>>>
>>> bind the device, start testpmd,
>>> SIGKILL of testpmd,
>>> immediately restart testpmd,
>>> rinse and repeat.
>>>
>>> Once every few restarts, the following segfault happens (may take anything 
>>> from a few seconds to minutes):
>>>
>>>
>>> Thread 153 "pmd-c88/id:150" received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0x7f64e5e6b700 (LWP 141373)]
>>> rte_mov128blocks (n=2048, src=0xc >> 0xc>, dst=0x150da4480 "h\005\312❇\377\377\377\377\377\377\b") at 
>>> ../lib/eal/x86/include/rte_memcpy.h:384
>>> 384 ../lib/eal/x86/include/rte_memcpy.h: No such file or directory.
>>> (gdb) bt
>>> #0  rte_mov128blocks (n=2048, src=0xc >> address 0xc>, 
>>> dst=0x150da4480 "h\005\312❇\377\377\377\377\377\377\b") at 
>>> ../lib/eal/x86/include/rte_memcpy.h:384
>>> #1  rte_memcpy_generic (n=2048, src=0xc, dst=0x150da4480) at 
>>> ../lib/eal/x86/include/rte_memcpy.h:484
>>> #2  rte_memcpy (n=2048, src=0xc, dst=) at 
>>> ../lib/eal/x86/include/rte_memcpy.h:851
>>> #3  sync_fill_seg (to_desc=false, cpy_len=2048, buf_iova=, 
>>> buf_addr=12, mbuf_offset=0, m=0x150da4140, 
>>> vq=0x2200400680, dev=0x2200d3d740) at ../lib/vhost/virtio_net.c:1119
>>> #4  desc_to_mbuf (is_async=false, slot_idx=0, legacy_ol_flags=true, 
>>> mbuf_pool=0x17fe7df00, m=0x150da4140, nr_vec=, 
>>> buf_vec=0x7f64e5e67ca0, vq=0x2200400680, dev=0x2200d3d740) at 
>>> ../lib/vhost/virtio_net.c:2747
>>> #5  virtio_dev_tx_split (legacy_ol_flags=true, count=, 
>>> count@entry=0, pkts=pkts@entry=0x0, 
>>> mbuf_pool=mbuf_pool@entry=0x150da4140, vq=vq@entry=0xe5e67d34, 
>>> dev=dev@entry=0x7f64e5e694d0) at ../lib/vhost/virtio_net.c:2943
>>> #6  virtio_dev_tx_split_legacy (dev=dev@entry=0x2200d3d740, 
>>> vq=vq@entry=0x2200400680, mbuf_pool=mbuf_pool@entry=0x17fe7df00, 
>>> pkts=pkts@entry=0x7f64e5e69600, count=count@entry=32) at 
>>> ../lib/vhost/virtio_net.c:2979
>>> #7  0x7f676fea0fef in rte_vhost_dequeue_burst (vid=vid@entry=0, 
>>> queue_id=queue_id@entry=1, mbuf_pool=0x17fe7df00, 
>>> pkts=pkts@entry=0x7f64e5e69600, count=count@entry=32) at 
>>> ../lib/vhost/virtio_net.c:3331
>>> #8  0x7f6772005a62 in netdev_dpdk_vhost_rxq_recv (rxq=, 
>>> batch=0x7f64e5e695f0, qfill=0x0)
>>> at ../lib/netdev-dpdk.c:2393
>>> #9  0x7f6771f38116 in netdev_rxq_recv (rx=, 
>>> batch=batch@entry=0x7f64e5e695f0, qfill=)
>>> at ../lib/netdev.c:727
>>> #10 0x7f6771f03d96 in dp_netdev_process_rxq_port 
>>> (pmd=pmd@entry=0x7f64e5e6c010, rxq=0x254d730, port_no=2)
>>> at ../lib/dpif-netdev.c:5317
>>> #11 0x7f6771f04239 in pmd_thread_main (f_=) at 
>>> ../lib/dpif-netdev.c:6945
>>> #12 0x7f6771f92aff in ovsthread_wrapper (aux_=) at 
>>> ../lib/ovs-thread.c:422
>>> #13 0x7f6771c1b6ea in start_thread () from /lib64/libpthread.so.0
>>> #14 0x7f6771933a8f in clone () from /lib64/libc.so.6
>>>
>>> When run in gdb as shown above, ovs-vswitchd on the host gets a SIGSEGV and 
>>> drops to gdb as shown above,
>>> so as a result QEMU stops when trying to read a response from ovs as such:
>>>
>>> 0  0x7f0a093991e9 in poll () from target:/lib64/libc.so.6
>>> #1  0x7f0a0b06c9a9 in ?? () from target:/usr/lib64/libglib-2.0.so.0
>>> #2  0x7f0a0b06ccf2 in g_main_loop_run () from 
>>> target:/usr/lib64/libglib-2.0.so.0
>>> #3  0x561a5cd04747 in vhost_user_read (dev=dev@entry=0x561a5e640df0, 
>>> msg=msg@entry=0x7f09ff7fd160)
>>> at ../hw/virtio/vhost-user.c:406
>>> #4  0x561a5cd04c7e in vhost_user_get_vring_base (dev=0x561a5e640df0, 
>>> ring=0x7f09ff7fd428)
>>> at ../hw/virtio/vhost-user.c:1261
>>> #5  0x561a5cd0043f in vhost_virtqueue_stop 
>>> (dev=dev@entry=0x561a5e640df0, vdev=vdev@entry=0x561a5f78ae50, 
>>> vq=0x561a5e641070, idx=0) at ../hw/virtio/vhost.c:1216
>>> #6  0x561a5cd034fa in vhost_dev_stop (hdev=hdev@entry=0x561a5e640df0, 
>>> vdev=vdev@entry=0x561a5f78ae50)
>>> at ../hw/virtio/vhost.c:1872
>>> #7  0x561a5cb623fa in vhost_net_stop_one (net=0x561a5e640df0, 
>>> dev=dev@entry=0x561a5f78ae50)
>>> at ../hw/net/vhost_net.c:315
>>> #8  0x561a5cb6295e in vhost_net_stop (dev=dev@entry=0x561a5f78ae50, 
>>> ncs=0x561a5f808970, 
>>> data_queue_pairs=data_queue_pairs@entry=4, cvq=cvq@entry=0) at 
>>> ../hw/net/vhost_net.c:427
>>> #9  0x561a5cccef79 in virtio_net_vhost_status (status=, 
>>> n=0x561a5f78ae50)
>>> at ../hw/net/virtio-net.c:298
>>> #10 virtio_net_se

[PATCH] app/testpmd: add command line argument 'rx-metadata'

2022-08-01 Thread Hanumanth Pothula
Presently, Rx metadata is sent to the PMD by default, leading
to a performance drop, as processing it in the Rx path
takes extra cycles.

Hence, introduce a command line argument, 'rx-metadata', to
control passing Rx metadata to the PMD. By default it is disabled.

ci: skip_checkpatch

Signed-off-by: Hanumanth Pothula 
Change-Id: If7b6bbc7489d3e9df89c63e000d71ea2f7fe9afd
---
 app/test-pmd/parameters.c | 4 
 app/test-pmd/testpmd.c| 6 +-
 app/test-pmd/testpmd.h| 2 ++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index e3c9757f3f..daf1218977 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -213,6 +213,7 @@ usage(char* progname)
printf("  --hairpin-mode=0xXX: bitmask set the hairpin port mode.\n"
   "0x10 - explicit Tx rule, 0x02 - hairpin ports paired\n"
   "0x01 - hairpin ports loop, 0x00 - hairpin port self\n");
+   printf(" --rx-metadata: send rx  metadata to driver \n");
 }
 
 #ifdef RTE_LIB_CMDLINE
@@ -710,6 +711,7 @@ launch_args_parse(int argc, char** argv)
{ "record-burst-stats", 0, 0, 0 },
{ PARAM_NUM_PROCS,  1, 0, 0 },
{ PARAM_PROC_ID,1, 0, 0 },
+   { "rx-metadata",0, 0, 0 },
{ 0, 0, 0, 0 },
};
 
@@ -1510,6 +1512,8 @@ launch_args_parse(int argc, char** argv)
num_procs = atoi(optarg);
if (!strcmp(lgopts[opt_idx].name, PARAM_PROC_ID))
proc_id = atoi(optarg);
+   if (!strcmp(lgopts[opt_idx].name, "rx-metadata"))
+   rx_metadata_negotiate = 1;
break;
case 'h':
usage(argv[0]);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index addcbcac85..ebbde5dfc9 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -411,6 +411,9 @@ uint8_t clear_ptypes = true;
 /* Hairpin ports configuration mode. */
 uint16_t hairpin_mode;
 
+/* send rx metadata */
+uint8_t rx_metadata_negotiate;
+
 /* Pretty printing of ethdev events */
 static const char * const eth_event_desc[] = {
[RTE_ETH_EVENT_UNKNOWN] = "unknown",
@@ -1628,7 +1631,8 @@ init_config_port_offloads(portid_t pid, uint32_t 
socket_id)
int ret;
int i;
 
-   eth_rx_metadata_negotiate_mp(pid);
+   if (rx_metadata_negotiate)
+   eth_rx_metadata_negotiate_mp(pid);
 
port->dev_conf.txmode = tx_mode;
port->dev_conf.rxmode = rx_mode;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index fb2f5195d3..8a9168c51e 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -621,6 +621,8 @@ extern struct rte_ether_addr 
peer_eth_addrs[RTE_MAX_ETHPORTS];
 extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for 
mac-retry. */
 extern uint32_t burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. 
*/
 
+extern uint8_t rx_metadata_negotiate;
+
 #ifdef RTE_LIB_GRO
 #define GRO_DEFAULT_ITEM_NUM_PER_FLOW 32
 #define GRO_DEFAULT_FLOW_NUM (RTE_GRO_MAX_BURST_ITEM_NUM / \
-- 
2.25.1



[PATCH] app/testpmd: add command line argument 'rx-metadata'

2022-08-01 Thread Hanumanth Pothula
Presently, Rx metadata is sent to the PMD by default, leading
to a performance drop, as processing it in the Rx path
takes extra cycles.

Hence, introduce a command line argument, 'rx-metadata', to
control passing Rx metadata to the PMD. By default it is disabled.

Signed-off-by: Hanumanth Pothula 
---
 app/test-pmd/parameters.c | 4 
 app/test-pmd/testpmd.c| 6 +-
 app/test-pmd/testpmd.h| 2 ++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index e3c9757f3f..daf1218977 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -213,6 +213,7 @@ usage(char* progname)
printf("  --hairpin-mode=0xXX: bitmask set the hairpin port mode.\n"
   "0x10 - explicit Tx rule, 0x02 - hairpin ports paired\n"
   "0x01 - hairpin ports loop, 0x00 - hairpin port self\n");
+   printf(" --rx-metadata: send rx  metadata to driver \n");
 }
 
 #ifdef RTE_LIB_CMDLINE
@@ -710,6 +711,7 @@ launch_args_parse(int argc, char** argv)
{ "record-burst-stats", 0, 0, 0 },
{ PARAM_NUM_PROCS,  1, 0, 0 },
{ PARAM_PROC_ID,1, 0, 0 },
+   { "rx-metadata",0, 0, 0 },
{ 0, 0, 0, 0 },
};
 
@@ -1510,6 +1512,8 @@ launch_args_parse(int argc, char** argv)
num_procs = atoi(optarg);
if (!strcmp(lgopts[opt_idx].name, PARAM_PROC_ID))
proc_id = atoi(optarg);
+   if (!strcmp(lgopts[opt_idx].name, "rx-metadata"))
+   rx_metadata_negotiate = 1;
break;
case 'h':
usage(argv[0]);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index addcbcac85..ebbde5dfc9 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -411,6 +411,9 @@ uint8_t clear_ptypes = true;
 /* Hairpin ports configuration mode. */
 uint16_t hairpin_mode;
 
+/* send rx metadata */
+uint8_t rx_metadata_negotiate;
+
 /* Pretty printing of ethdev events */
 static const char * const eth_event_desc[] = {
[RTE_ETH_EVENT_UNKNOWN] = "unknown",
@@ -1628,7 +1631,8 @@ init_config_port_offloads(portid_t pid, uint32_t 
socket_id)
int ret;
int i;
 
-   eth_rx_metadata_negotiate_mp(pid);
+   if (rx_metadata_negotiate)
+   eth_rx_metadata_negotiate_mp(pid);
 
port->dev_conf.txmode = tx_mode;
port->dev_conf.rxmode = rx_mode;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index fb2f5195d3..8a9168c51e 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -621,6 +621,8 @@ extern struct rte_ether_addr 
peer_eth_addrs[RTE_MAX_ETHPORTS];
 extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for 
mac-retry. */
 extern uint32_t burst_tx_retry_num;  /**< Burst tx retry number for mac-retry. 
*/
 
+extern uint8_t rx_metadata_negotiate;
+
 #ifdef RTE_LIB_GRO
 #define GRO_DEFAULT_ITEM_NUM_PER_FLOW 32
 #define GRO_DEFAULT_FLOW_NUM (RTE_GRO_MAX_BURST_ITEM_NUM / \
-- 
2.25.1



Re: [PATCH v1] graph: fix out of bounds access when re-allocate node objs

2022-08-01 Thread Jerin Jacob
On Wed, Jul 27, 2022 at 8:10 AM Zhirun Yan  wrote:
>
> For __rte_node_enqueue_prologue(), if the number of objs is more than
> node->size * 2, the extra objs will be written out of bounds.
> It should use __rte_node_stream_alloc_size() to request enough memory.
>
> And for rte_node_next_stream_put(), it will re-allocate a small size
> when the node's free space is small and the number of new objs is less than
> the current node->size. Some obj pointers beyond the new size may be lost,
> causing a memory leak. It should request enough memory to contain at least
> the original objs and the new objs.
>
> Fixes: 40d4f51403ec ("graph: implement fastpath routines")
>
> Signed-off-by: Zhirun Yan 
> Signed-off-by: Liang, Cunming 
> ---
>  lib/graph/rte_graph_worker.h | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
> index 0c0b9c095a..b7d145c3cb 100644
> --- a/lib/graph/rte_graph_worker.h
> +++ b/lib/graph/rte_graph_worker.h
> @@ -218,13 +218,16 @@ static __rte_always_inline void
>  __rte_node_enqueue_prologue(struct rte_graph *graph, struct rte_node *node,
> const uint16_t idx, const uint16_t space)
>  {
> +   uint32_t req_size;
>
> /* Add to the pending stream list if the node is new */
> if (idx == 0)
> __rte_node_enqueue_tail_update(graph, node);
>
> -   if (unlikely(node->size < (idx + space)))
> -   __rte_node_stream_alloc(graph, node);
> +   if (unlikely(node->size < (idx + space))) {
> +   req_size = rte_align32pow2(node->size + space);
> +   __rte_node_stream_alloc_size(graph, node, req_size);
> +   }

Change looks good to me.

Please have an inline function to avoid code duplication (same change
in rte_node_next_stream_get()).
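
A possible shape for such a helper, merging the two duplicated checks (name
and exact form are only a sketch, not part of the posted patch):

/* Grow the node stream if 'idx + space' objects would not fit. */
static __rte_always_inline void
__rte_node_stream_alloc_check(struct rte_graph *graph, struct rte_node *node,
			      uint16_t idx, uint16_t space)
{
	if (unlikely(node->size < (idx + space)))
		__rte_node_stream_alloc_size(graph, node,
					     rte_align32pow2(node->size + space));
}

Both __rte_node_enqueue_prologue() and rte_node_next_stream_get() could then
call it with their respective (idx, space) and (idx, nb_objs) pairs.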


With above change:
Acked-by: Jerin Jacob 

>  }
>
>  /**
> @@ -430,9 +433,12 @@ rte_node_next_stream_get(struct rte_graph *graph, struct 
> rte_node *node,
> node = __rte_node_next_node_get(node, next);
> const uint16_t idx = node->idx;
> uint16_t free_space = node->size - idx;
> +   uint32_t req_size;
>
> -   if (unlikely(free_space < nb_objs))
> -   __rte_node_stream_alloc_size(graph, node, nb_objs);
> +   if (unlikely(free_space < nb_objs)) {
> +   req_size = rte_align32pow2(node->size + nb_objs);
> +   __rte_node_stream_alloc_size(graph, node, req_size);
> +   }
>
> return &node->objs[idx];
>  }
> --
> 2.25.1
>


Re: Question about pattern types for rte_flow RSS rule

2022-08-01 Thread Andrew Rybchenko

On 8/1/22 16:27, lihuisong (C) wrote:


On 2022/8/1 19:53, Andrew Rybchenko wrote:

On 8/1/22 06:39, lihuisong (C) wrote:


On 2022/7/31 17:40, Andrew Rybchenko wrote:

Hi, Huisong!

On 7/29/22 05:30, lihuisong (C) wrote:

Hi Ori, and all,

For RSS flow rule, pattern item types and RSS types in action
are an inclusive relationship, and RSS types contain pattern item
types. 


I disagree with the statement. We can redirect various packets,
but apply RSS on subset only. Everything else goes to the first
queue (first entry in the redirection table in fact).

Sorry, the statement above is inaccurate.
I mean, the pattern item types and the RSS types in the action are the same
when creating an RSS rule for specific packet types, so this configuration is
duplicated in this case (in command <1>, I think). Using either of them can
specify which packet types the flow action applies to (as in command <1> or <2>).


No, I still disagree. Pattern is about filtering. It defines on which
packets flow rule actions are applied. RSS types choose packet header
fields (if applicable) to calculate the hash. I.e. RSS types are not about
filtering.



Create a RSS rule that redirect ipv4-tcp flow to queues 1 and 2:
<1> flow create 0 ingress pattern eth / ipv4 / tcp / end actions rss 
types ipv4-tcp end queues 1 2 end / end


Yes, this one will redirect IPv4 TCP packets only.

<2> flow create 0 ingress pattern end actions rss types ipv4-tcp end 
queues 1 2 end / end


The rule will redirect all packets to queues 1 and 2. Some packets will
always go to queue 1, e.g. ARP packets.


In this case, only non-fragmented IPv4 TCP packets have an RSS hash calculated
and may be distributed to queues 1 and 2 in hardware, and other packets will
always go to queue 1, right?


Yes.

<3> flow create 0 ingress pattern eth / ipv4 / tcp / end actions rss 
end queues 1 2 end / end


This one will redirect IPv4 TCP and will use RSS hash function specified
in dev_conf on device configuration or default RSS hash function chosen
by corresponding driver.



What do you think about it, @Andrew and @Ori?



Is it necessary to set pattern item types when specifying RSS
types to create a rule? 


No, it is not strictly required. It depends on what you want.


How should the user set this, and what should the
driver do?


Pattern and action are not strictly related in the case of RSS.
Pattern defines on which packets the rule is applied.
Action defines what the rule does. If hash function is not
applicable to a packet, e.g. ARP packet and ipv4-tcp hash function,
the hash is 0 and goes via redirection table entry 0.
I know this rule. Ori has already explained the usage of RSS rules in the 
rte_flow API,

but I still have the confusion mentioned above.


Andrew.



Looking forward to your reply.

Regards,
Huisong

On 2022/7/13 9:34, lihuisong (C) wrote:

Hi all,

Can someone clear up my confusion?
I'm looking forward to your reply.

Thanks,
Huisong.

On 2022/7/7 11:50, lihuisong (C) wrote:

Hi all,

From following testpmd command:
'flow create 0 ingress pattern eth / ipv4 / tcp / end actions rss 
types ipv4-tcp l3-src-only end queues end / end'

and
"flow create 0 ingress pattern end actions rss types ipv4-tcp 
l3-src-only end queues end / end"


I have some confusions about rte_flow RSS rule:
1> Do pattern item types need to be set when configuring an rte_flow RSS 
rule?
2> Does the driver need to check and process the pattern? (After 
all, the RSS types in actions already contain all RSS offload 
types.)


Can someone explain it?

Regards,
Huisong

.

.


.


.




Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split

2022-08-01 Thread Andrew Rybchenko

On 7/21/22 06:24, Ding, Xuan wrote:

Hi Andrew,


-Original Message-
From: Andrew Rybchenko 
Sent: July 8, 2022 23:01
To: Wu, WenxuanX ; tho...@monjalon.net; Li,
Xiaoyun ; ferruh.yi...@xilinx.com; Singh, Aman Deep
; dev@dpdk.org; Zhang, Yuying
; Zhang, Qi Z ;
jerinjac...@gmail.com
Cc: step...@networkplumber.org; Ding, Xuan ; Wang,
YuanX ; Ray Kinsella 
Subject: Re: [PATCH v9 2/4] ethdev: introduce protocol hdr based buffer split

On 6/13/22 13:25, wenxuanx...@intel.com wrote:

From: Wenxuan Wu 

Currently, Rx buffer split supports length based split. With Rx queue
offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet

segment

configured, PMD will be able to split the received packets into
multiple segments.

However, length based buffer split is not suitable for NICs that do
split based on protocol headers. Given an arbitrarily variable length
in Rx packet segment, it is almost impossible to pass a fixed protocol
header to driver. Besides, the existence of tunneling results in the
composition of a packet is various, which makes the situation even worse.

This patch extends current buffer split to support protocol header
based buffer split. A new proto_hdr field is introduced in the
reserved field of rte_eth_rxseg_split structure to specify protocol
header. The proto_hdr field defines the split position of packet,
splitting will always happens after the protocol header defined in the
Rx packet segment. When Rx queue offload
RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding protocol
header is configured, driver will split the ingress packets into multiple

segments.


struct rte_eth_rxseg_split {

  struct rte_mempool *mp; /* memory pools to allocate segment from */
  uint16_t length; /* segment maximal data length,
  configures "split point" */
  uint16_t offset; /* data offset from beginning
  of mbuf data buffer */
  uint32_t proto_hdr; /* inner/outer L2/L3/L4 protocol header,
   configures "split point" */


There is a big problem here that using RTE_PTYPE_* defines I can't request split
after either TCP or UDP header.


Sorry, for some reason I missed your reply.

The current RTE_PTYPE_* defines list all the tunnel and L2/L3/L4 protocol 
headers (both outer and inner).
Do you mean that we should support higher-layer protocols above L4?

I think tunnel and L2/L3/L4 protocol headers are enough.
In DPDK, we don't parse higher-level protocols above L4.
And the higher-layer protocols are too numerous; we can't list all of them.
What do you think?


It looks like you don't get my point. You simply cannot say:
RTE_PTYPE_L4_TCP | RTE_PTYPE_L4_UDP since it is numerically equal to
RTE_PTYPE_L4_FRAG. Maybe the design limitation is acceptable.
I have no strong opinion, but it must be clear for all that the
limitation exists.
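
For reference, the L4 ptype is an enumerated 4-bit field rather than a
bitmask, so the combination collides (values quoted from rte_mbuf_ptype.h
from memory, worth double-checking against the tree):

#define RTE_PTYPE_L4_TCP   0x00000100
#define RTE_PTYPE_L4_UDP   0x00000200
#define RTE_PTYPE_L4_FRAG  0x00000300 /* == RTE_PTYPE_L4_TCP | RTE_PTYPE_L4_UDP */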




  };

If both inner and outer L2/L3/L4 level protocol header split can be
supported by a PMD. Corresponding protocol header capability is
RTE_PTYPE_L2_ETHER, RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV6,
RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP, RTE_PTYPE_L4_SCTP,
RTE_PTYPE_INNER_L2_ETHER, RTE_PTYPE_INNER_L3_IPV4,
RTE_PTYPE_INNER_L3_IPV6, RTE_PTYPE_INNER_L4_TCP,

RTE_PTYPE_INNER_L4_UDP, RTE_PTYPE_INNER_L4_SCTP.

I think there is no point to list above defines here if it is not the only 
supported
defines.


Yes, since we use an API to return the protocol headers the driver supports
splitting on, there is no need to list the incomplete RTE_PTYPE_* set here.
Please see the next version.





For example, let's suppose we configured the Rx queue with the
following segments:
  seg0 - pool0, proto_hdr0=RTE_PTYPE_L3_IPV4, off0=2B
  seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
  seg2 - pool2, off1=0B

The packet consists of MAC_IPV4_UDP_PAYLOAD will be split like
following:
  seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
  seg1 - udp header @ 128 in mbuf from pool1
  seg2 - payload @ 0 in mbuf from pool2


Sorry, but I still see no definition of what should happen with, for example,
an ARP packet with the above config.


Thanks. Because the following reply was not answered in v8,
the definition has not been added in v9 yet.

"
Our NIC only supports splitting the packets into two segments,
so there will be an exact match on the single protocol header configured.
Back to this question: for the set of proto_hdrs configured, there can be two
behaviors:
1. The aggressive way is to split on the longest match you mentioned. E.g. if
we configure split on ETH-IPV4-TCP and receive ETH-IPV4-UDP or ETH-IPV6, it
can also split on ETH-IPV4 or ETH.
2. A more conservative way is to split only when the packet matches all the
protocol headers in the Rx packet segment. In the above situation, it will not
split ETH-IPV4-UDP or ETH-IPV6.

I prefer the second behavior, because the split is usually for the innermost
header and payload; if it does not match, the rest of the h

Re: [PATCH v3] pcapng: fix write more packets than IOV_MAX limit

2022-08-01 Thread Stephen Hemminger
On Mon,  1 Aug 2022 10:40:56 +0200
Mário Kuka  wrote:

> The rte_pcapng_write_packets() function fails when we try to write more
> packets than the IOV_MAX limit. The writev() system call is limited by
> IOV_MAX: its iovcnt argument is valid only if it is greater than 0 and
> less than or equal to IOV_MAX, as defined in <limits.h>.
> 
> To avoid this problem, we can check that all segments of the next
> packet will fit into the iovec buffer, whose capacity will be limited
> by the IOV_MAX limit. If not, we flush the current iovec buffer to the
> file by calling writev() and, if successful, fit the current packet at
> the beginning of the flushed iovec buffer.
> 
> Fixes: 8d23ce8f5ee9 ("pcapng: add new library for writing pcapng files")
> Cc: step...@networkplumber.org
> 
> Signed-off-by: Mário Kuka 

Thanks for fixing this.

Acked-by: Stephen Hemminger 



Re: [RFC v3 23/26] dev: hide driver object

2022-08-01 Thread Ajit Khaparde
On Thu, Jul 28, 2022 at 10:00 AM Bruce Richardson
 wrote:
>
> On Thu, Jul 28, 2022 at 05:26:37PM +0200, David Marchand wrote:
> > Make rte_driver opaque for non internal users.
> > This will make extending this object possible without breaking the ABI.
> >
> > Introduce a new driver header and move rte_driver definition.
> > Update drivers and library to use the internal header.
> >
> > Some applications may have been dereferencing rte_driver objects, mark
> > this object's accessors as stable.
> >
> > Signed-off-by: David Marchand 
> > ---
> Acked-by: Bruce Richardson 
Acked-by: Ajit Khaparde 




Re: [PATCH] app/testpmd: add command line argument 'rx-metadata'

2022-08-01 Thread Ivan Malov

Hi Hanumanth,

Thanks for expanding support for the NIC-to-PMD Rx metadata feature.
I do not object to the idea of the patch; it looks acceptable.
However, please find my comments below.

On Mon, 1 Aug 2022, Hanumanth Pothula wrote:


Presently, Rx metadata is sent to the PMD by default, leading
to a performance drop, as processing it in the Rx path
takes extra cycles.

Hence, introduce a command line argument, 'rx-metadata', to
control passing Rx metadata to the PMD. By default it is disabled.

Signed-off-by: Hanumanth Pothula 
---
app/test-pmd/parameters.c | 4 
app/test-pmd/testpmd.c | 6 +-
app/test-pmd/testpmd.h | 2 ++
3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index e3c9757f3f..daf1218977 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -213,6 +213,7 @@ usage(char* progname)
printf(" --hairpin-mode=0xXX: bitmask set the hairpin port mode.\n"
" 0x10 - explicit Tx rule, 0x02 - hairpin ports paired\n"
" 0x01 - hairpin ports loop, 0x00 - hairpin port self\n");
+ printf(" --rx-metadata: send rx metadata to driver \n");


In accordance with the above "printf", one should have 2 space
characters between the opening double quote character and "--".
Also, it is quite common to capitalise "R" in "Rx".
The space character before "\n" seems redundant.

Regarding the option name, I would appreciate if it gets more
precise and clear. Consider: "--nic-to-pmd-rx-metadata".
I do not insist on this particular name.

Perhaps it pays to have a bit more verbose description as well.
Consider: "let the NIC deliver per-packet Rx metadata to PMD".


}
#ifdef RTE_LIB_CMDLINE
@@ -710,6 +711,7 @@ launch_args_parse(int argc, char** argv)
{ "record-burst-stats", 0, 0, 0 },
{ PARAM_NUM_PROCS, 1, 0, 0 },
{ PARAM_PROC_ID, 1, 0, 0 },
+ { "rx-metadata", 0, 0, 0 },
{ 0, 0, 0, 0 },
};
@@ -1510,6 +1512,8 @@ launch_args_parse(int argc, char** argv)
num_procs = atoi(optarg);
if (!strcmp(lgopts[opt_idx].name, PARAM_PROC_ID))
proc_id = atoi(optarg);
+ if (!strcmp(lgopts[opt_idx].name, "rx-metadata"))
+ rx_metadata_negotiate = 1;
break;
case 'h':
usage(argv[0]);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index addcbcac85..ebbde5dfc9 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -411,6 +411,9 @@ uint8_t clear_ptypes = true;
/* Hairpin ports configuration mode. */
uint16_t hairpin_mode;
+/* send rx metadata */


Consider: "/* Send Rx metadata */".


+uint8_t rx_metadata_negotiate;
+
/* Pretty printing of ethdev events */
static const char * const eth_event_desc[] = {
[RTE_ETH_EVENT_UNKNOWN] = "unknown",
@@ -1628,7 +1631,8 @@ init_config_port_offloads(portid_t pid, uint32_t 
socket_id)
int ret;
int i;
- eth_rx_metadata_negotiate_mp(pid);
+ if (rx_metadata_negotiate)
+ eth_rx_metadata_negotiate_mp(pid);
port->dev_conf.txmode = tx_mode;
port->dev_conf.rxmode = rx_mode;
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index fb2f5195d3..8a9168c51e 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -621,6 +621,8 @@ extern struct rte_ether_addr 
peer_eth_addrs[RTE_MAX_ETHPORTS];
extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for 
mac-retry. */
extern uint32_t burst_tx_retry_num; /**< Burst tx retry number for mac-retry. */
+extern uint8_t rx_metadata_negotiate;
+
#ifdef RTE_LIB_GRO
#define GRO_DEFAULT_ITEM_NUM_PER_FLOW 32
#define GRO_DEFAULT_FLOW_NUM (RTE_GRO_MAX_BURST_ITEM_NUM / \

--
2.25.1




--
Best regards,
Ivan M

[PATCH v2 0/2] vhost fixes for OVS SIGSEGV in PMD

2022-08-01 Thread Claudio Fontana
This is an alternative, more general fix compared with PATCH v1.

The series fixes a segmentation fault in the OVS PMD thread when
resynchronizing with QEMU after the guest application has been killed
with SIGKILL (patch 1/2).

The segmentation fault can be caused by the guest DPDK application,
which can in this way crash the OVS process on the host;
see the backtrace in patch 1/2.

Patch 2/2 is an additional improvement in the current error handling.

---

Changes from v1:

* patch 1/2: instead of only fixing virtio_dev_tx_split, put the check
  for nr_vec == 0 inside desc_to_mbuf and mbuf_to_desc, so that in no
  case they attempt to read and dereference addresses from the buf_vec[]
  array when it does not contain any valid elements.

---

For your review and comments,

Claudio

Claudio Fontana (2):
  vhost: check for nr_vec == 0 in desc_to_mbuf, mbuf_to_desc
  vhost: improve error handling in desc_to_mbuf

 lib/vhost/virtio_net.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

-- 
2.26.2



[PATCH 1/2] vhost: check for nr_vec == 0 in desc_to_mbuf, mbuf_to_desc

2022-08-01 Thread Claudio Fontana
In virtio_dev_tx_split we cannot currently call desc_to_mbuf with
nr_vec == 0, or we end up trying to rte_memcpy from a source address
buf_vec[0] that is an uninitialized stack variable.

Improve this in general by having desc_to_mbuf and mbuf_to_desc
return -1 when called with an invalid nr_vec == 0, which should
fix any other instance of this problem.

This should fix errors that have been reported on multiple occasions
by telcos to the DPDK, OVS and QEMU projects, as this affects in
particular the openvswitch/DPDK, QEMU vhost-user setup when the
guest DPDK application abruptly goes away via SIGKILL and then
reconnects.

The backtrace looks roughly like this, depending on the specific
rte_memcpy selected, etc.; in any case the "src" parameter is garbage
(in this example containing 0 + dev->vhost_hlen (12 = 0xc)).

Thread 153 "pmd-c88/id:150" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f64e5e6b700 (LWP 141373)]
rte_mov128blocks (n=2048, src=0xc ,
 dst=0x150da4480) at ../lib/eal/x86/include/rte_memcpy.h:384
(gdb) bt
0  rte_mov128blocks (n=2048, src=0xc, dst=0x150da4480)
1  rte_memcpy_generic (n=2048, src=0xc, dst=0x150da4480)
2  rte_memcpy (n=2048, src=0xc, dst=)
3  sync_fill_seg
4  desc_to_mbuf
5  virtio_dev_tx_split
6  virtio_dev_tx_split_legacy
7  0x7f676fea0fef in rte_vhost_dequeue_burst
8  0x7f6772005a62 in netdev_dpdk_vhost_rxq_recv
9  0x7f6771f38116 in netdev_rxq_recv
10 0x7f6771f03d96 in dp_netdev_process_rxq_port
11 0x7f6771f04239 in pmd_thread_main
12 0x7f6771f92aff in ovsthread_wrapper
13 0x7f6771c1b6ea in start_thread
14 0x7f6771933a8f in clone

Tested-by: Claudio Fontana 
Signed-off-by: Claudio Fontana 
---
 lib/vhost/virtio_net.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 35fa4670fd..8d0d223983 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -1153,7 +1153,7 @@ mbuf_to_desc(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
struct virtio_net_hdr_mrg_rxbuf tmp_hdr, *hdr = NULL;
struct vhost_async *async = vq->async;
 
-   if (unlikely(m == NULL))
+   if (unlikely(m == NULL) || nr_vec == 0)
return -1;
 
buf_addr = buf_vec[vec_idx].buf_addr;
@@ -2673,6 +2673,9 @@ desc_to_mbuf(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
struct vhost_async *async = vq->async;
struct async_inflight_info *pkts_info;
 
+if (unlikely(nr_vec == 0)) {
+   return -1;
+}
buf_addr = buf_vec[vec_idx].buf_addr;
buf_iova = buf_vec[vec_idx].buf_iova;
buf_len = buf_vec[vec_idx].buf_len;
@@ -2917,9 +2920,11 @@ virtio_dev_tx_split(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
vq->last_avail_idx + i,
&nr_vec, buf_vec,
&head_idx, &buf_len,
-   VHOST_ACCESS_RO) < 0))
+   VHOST_ACCESS_RO) < 0)) {
+   dropped += 1;
+   i++;
break;
-
+   }
update_shadow_used_ring_split(vq, head_idx, 0);
 
err = virtio_dev_pktmbuf_prep(dev, pkts[i], buf_len);
-- 
2.26.2



[PATCH 2/2] vhost: improve error handling in desc_to_mbuf

2022-08-01 Thread Claudio Fontana
Check, when increasing vec_idx, that it is still valid
in the (buf_len < dev->vhost_hlen) case too.

Signed-off-by: Claudio Fontana 
---
 lib/vhost/virtio_net.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 8d0d223983..229e484f2d 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2704,12 +2704,15 @@ desc_to_mbuf(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
if (unlikely(buf_len < dev->vhost_hlen)) {
buf_offset = dev->vhost_hlen - buf_len;
vec_idx++;
+   if (unlikely(vec_idx >= nr_vec))
+   goto error;
buf_addr = buf_vec[vec_idx].buf_addr;
buf_iova = buf_vec[vec_idx].buf_iova;
buf_len = buf_vec[vec_idx].buf_len;
buf_avail  = buf_len - buf_offset;
} else if (buf_len == dev->vhost_hlen) {
-   if (unlikely(++vec_idx >= nr_vec))
+   vec_idx++;
+   if (unlikely(vec_idx >= nr_vec))
goto error;
buf_addr = buf_vec[vec_idx].buf_addr;
buf_iova = buf_vec[vec_idx].buf_iova;
-- 
2.26.2



[PATCH v3 0/2] vhost fixes for OVS SIGSEGV in PMD

2022-08-01 Thread Claudio Fontana
This is an alternative, more general fix compared with PATCH v1,
and fixes style issues in v2.

The series fixes a segmentation fault in the OVS PMD thread when
resynchronizing with QEMU after the guest application has been killed
with SIGKILL (patch 1/2).

The segmentation fault can be caused by the guest DPDK application,
which can in this way crash the OVS process on the host;
see the backtrace in patch 1/2.

Patch 2/2 is an additional improvement in the current error handling.

---
Changes from v2: fix warnings from checkpatch.
---

Changes from v1:

* patch 1/2: instead of only fixing virtio_dev_tx_split, put the check
  for nr_vec == 0 inside desc_to_mbuf and mbuf_to_desc, so that in no
  case they attempt to read and dereference addresses from the buf_vec[]
  array when it does not contain any valid elements.

---

For your review and comments,

Claudio

Claudio Fontana (2):
  vhost: check for nr_vec == 0 in desc_to_mbuf, mbuf_to_desc
  vhost: improve error handling in desc_to_mbuf

 lib/vhost/virtio_net.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

-- 
2.26.2



[PATCH v3 1/2] vhost: check for nr_vec == 0 in desc_to_mbuf, mbuf_to_desc

2022-08-01 Thread Claudio Fontana
In virtio_dev_tx_split we cannot currently call desc_to_mbuf with
nr_vec == 0, or we end up trying to rte_memcpy from a source address
buf_vec[0] that is an uninitialized stack variable.

Improve this in general by having desc_to_mbuf and mbuf_to_desc
return -1 when called with an invalid nr_vec == 0, which should
fix any other instance of this problem.

This should fix errors that have been reported on multiple occasions
by telcos to the DPDK, OVS and QEMU projects, as this affects in
particular the openvswitch/DPDK, QEMU vhost-user setup when the
guest DPDK application abruptly goes away via SIGKILL and then
reconnects.

The backtrace looks roughly like this, depending on the specific
rte_memcpy selected, etc.; in any case the "src" parameter is garbage
(in this example containing 0 + dev->vhost_hlen (12 = 0xc)).

Thread 153 "pmd-c88/id:150" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f64e5e6b700 (LWP 141373)]
rte_mov128blocks (n=2048, src=0xc ,
 dst=0x150da4480) at ../lib/eal/x86/include/rte_memcpy.h:384
(gdb) bt
0  rte_mov128blocks (n=2048, src=0xc, dst=0x150da4480)
1  rte_memcpy_generic (n=2048, src=0xc, dst=0x150da4480)
2  rte_memcpy (n=2048, src=0xc, dst=)
3  sync_fill_seg
4  desc_to_mbuf
5  virtio_dev_tx_split
6  virtio_dev_tx_split_legacy
7  0x7f676fea0fef in rte_vhost_dequeue_burst
8  0x7f6772005a62 in netdev_dpdk_vhost_rxq_recv
9  0x7f6771f38116 in netdev_rxq_recv
10 0x7f6771f03d96 in dp_netdev_process_rxq_port
11 0x7f6771f04239 in pmd_thread_main
12 0x7f6771f92aff in ovsthread_wrapper
13 0x7f6771c1b6ea in start_thread
14 0x7f6771933a8f in clone

Tested-by: Claudio Fontana 
Signed-off-by: Claudio Fontana 
---
 lib/vhost/virtio_net.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 35fa4670fd..eb19e54c2b 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -1153,7 +1153,7 @@ mbuf_to_desc(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
struct virtio_net_hdr_mrg_rxbuf tmp_hdr, *hdr = NULL;
struct vhost_async *async = vq->async;
 
-   if (unlikely(m == NULL))
+   if (unlikely(m == NULL) || nr_vec == 0)
return -1;
 
buf_addr = buf_vec[vec_idx].buf_addr;
@@ -2673,6 +2673,9 @@ desc_to_mbuf(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
struct vhost_async *async = vq->async;
struct async_inflight_info *pkts_info;
 
+   if (unlikely(nr_vec == 0))
+   return -1;
+
buf_addr = buf_vec[vec_idx].buf_addr;
buf_iova = buf_vec[vec_idx].buf_iova;
buf_len = buf_vec[vec_idx].buf_len;
@@ -2917,9 +2920,11 @@ virtio_dev_tx_split(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
vq->last_avail_idx + i,
&nr_vec, buf_vec,
&head_idx, &buf_len,
-   VHOST_ACCESS_RO) < 0))
+   VHOST_ACCESS_RO) < 0)) {
+   dropped += 1;
+   i++;
break;
-
+   }
update_shadow_used_ring_split(vq, head_idx, 0);
 
err = virtio_dev_pktmbuf_prep(dev, pkts[i], buf_len);
-- 
2.26.2



[PATCH v3 2/2] vhost: improve error handling in desc_to_mbuf

2022-08-01 Thread Claudio Fontana
Check, when increasing vec_idx, that it is still valid
in the (buf_len < dev->vhost_hlen) case too.

Tested-by: Claudio Fontana 
Signed-off-by: Claudio Fontana 
---
 lib/vhost/virtio_net.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index eb19e54c2b..20ed951979 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -2704,12 +2704,15 @@ desc_to_mbuf(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
if (unlikely(buf_len < dev->vhost_hlen)) {
buf_offset = dev->vhost_hlen - buf_len;
vec_idx++;
+   if (unlikely(vec_idx >= nr_vec))
+   goto error;
buf_addr = buf_vec[vec_idx].buf_addr;
buf_iova = buf_vec[vec_idx].buf_iova;
buf_len = buf_vec[vec_idx].buf_len;
buf_avail  = buf_len - buf_offset;
} else if (buf_len == dev->vhost_hlen) {
-   if (unlikely(++vec_idx >= nr_vec))
+   vec_idx++;
+   if (unlikely(vec_idx >= nr_vec))
goto error;
buf_addr = buf_vec[vec_idx].buf_addr;
buf_iova = buf_vec[vec_idx].buf_iova;
-- 
2.26.2



Re: [PATCH v3 1/2] vhost: check for nr_vec == 0 in desc_to_mbuf, mbuf_to_desc

2022-08-01 Thread Stephen Hemminger
On Tue,  2 Aug 2022 02:49:37 +0200
Claudio Fontana  wrote:

> Tested-by: Claudio Fontana 
> Signed-off-by: Claudio Fontana 

Minor nit: having both tags by the same author is redundant and unnecessary.


RE: [PATCH v4] net/i40e: fix the issue caused by PF and VF release order

2022-08-01 Thread Zhang, Yuying
Hi Ke,

Please update commit log.
LGTM.

> -Original Message-
> From: Zhang, Ke1X 
> Sent: Friday, July 15, 2022 5:04 PM
> To: tho...@monjalon.net; Zhang, Yuying ; Xing,
> Beilei ; ferruh.yi...@xilinx.com; Zhou, YidingX
> ; dev@dpdk.org
> Cc: Zhang, Ke1X ; sta...@dpdk.org
> Subject: [PATCH v4] net/i40e: fix the issue caused by PF and VF release order
> 
> A segmentation fault occurs when testpmd exits.
> 
> This is due to fetching the device name from the PF: the PF is freed
> first and then the VF representor is accessed later.
> 
> This commit fixes the bug by fetching the device name from the VF
> representor, not the PF.

instead of PF

> 
> Fixes: e391a7b7f815 ("net/i40e: fix multi-process shared data")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Ke Zhang 
Acked-by: Yuying Zhang 

> 
> ---
> v4: Update the commit log
> v3: Change the design and fix code in driver
> v2: Change the testpmd code to fix this issue
> ---
> ---
>  drivers/net/i40e/i40e_vf_representor.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/net/i40e/i40e_vf_representor.c
> b/drivers/net/i40e/i40e_vf_representor.c
> index 7f8e81858e..bcd445bcdd 100644
> --- a/drivers/net/i40e/i40e_vf_representor.c
> +++ b/drivers/net/i40e/i40e_vf_representor.c
> @@ -29,8 +29,6 @@ i40e_vf_representor_dev_infos_get(struct rte_eth_dev
> *ethdev,
>   struct rte_eth_dev_info *dev_info)
>  {
>   struct i40e_vf_representor *representor = ethdev->data-
> >dev_private;
> - struct rte_eth_dev_data *pf_dev_data =
> - representor->adapter->pf.dev_data;
> 
>   /* get dev info for the vdev */
>   dev_info->device = ethdev->device;
> @@ -104,7 +102,7 @@ i40e_vf_representor_dev_infos_get(struct
> rte_eth_dev *ethdev,
>   };
> 
>   dev_info->switch_info.name =
> - rte_eth_devices[pf_dev_data->port_id].device->name;
> + rte_eth_devices[ethdev->data->port_id].device->name;
>   dev_info->switch_info.domain_id = representor->switch_domain_id;
>   dev_info->switch_info.port_id = representor->vf_id;
> 
> --
> 2.25.1



Re: [PATCH] net/bnxt: fix null pointer dereference in bnxt_hwrm_port_led_cfg()

2022-08-01 Thread Somnath Kotur
On Tue, Aug 2, 2022 at 8:33 AM Mao YingMing  wrote:
>
Thanks for the patch
> From: maoyingming 
>
> VFs's "bp->leds" is allways null, check bp->leds is
> not null before use bp->leds->num_leds.
Typo in 'always'.
You can just say 'For VFs, bp->leds is uninitialized'.
>
> segfault backtrace in trex program when use VF:
> 11: bnxt_hwrm_port_led_cfg (bp=0x23ffb2140, led_on=true)
> 10: bnxt_dev_led_on_op (dev=0x22d7780 )
>  9: rte_eth_led_on (port_id=0)
>  8: DpdkTRexPortAttr::set_led (this=0x23b6ce0, on=true)
>  7: DpdkTRexPortAttr::DpdkTRexPortAttr
>  6: CTRexExtendedDriverBnxt::create_port_attr
>  5: CPhyEthIF::Create
>  4: CGlobalTRex::device_start
>  3: CGlobalTRex::Create
>  2: main_test
>  1: main
>
> Fixes: d4d5a04 ("net/bnxt: fix unnecessary memory allocation")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Mao YingMing 
> ---
>  drivers/net/bnxt/bnxt_hwrm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
> index 9c52573..41e6067 100644
> --- a/drivers/net/bnxt/bnxt_hwrm.c
> +++ b/drivers/net/bnxt/bnxt_hwrm.c
> @@ -4535,7 +4535,7 @@ int bnxt_hwrm_port_led_cfg(struct bnxt *bp, bool led_on)
> uint16_t duration = 0;
> int rc, i;
>
> -   if (!bp->leds->num_leds || BNXT_VF(bp))
> +   if (BNXT_VF(bp) || (!bp->leds) || (!bp->leds->num_leds))
> return -EOPNOTSUPP;
>
> HWRM_PREP(&req, HWRM_PORT_LED_CFG, BNXT_USE_CHIMP_MB);
> --
> 1.8.3.1
>




[PATCH] net/bnxt: fix null pointer dereference in bnxt_hwrm_port_led_cfg()

2022-08-01 Thread Mao YingMing
From: maoyingming 

VFs's "bp->leds" is allways null, check bp->leds is
not null before use bp->leds->num_leds.

segfault backtrace in trex program when use VF:
11: bnxt_hwrm_port_led_cfg (bp=0x23ffb2140, led_on=true)
10: bnxt_dev_led_on_op (dev=0x22d7780 )
 9: rte_eth_led_on (port_id=0)
 8: DpdkTRexPortAttr::set_led (this=0x23b6ce0, on=true)
 7: DpdkTRexPortAttr::DpdkTRexPortAttr
 6: CTRexExtendedDriverBnxt::create_port_attr
 5: CPhyEthIF::Create
 4: CGlobalTRex::device_start
 3: CGlobalTRex::Create
 2: main_test
 1: main

Fixes: d4d5a04 ("net/bnxt: fix unnecessary memory allocation")
Cc: sta...@dpdk.org

Signed-off-by: Mao YingMing 
---
 drivers/net/bnxt/bnxt_hwrm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index 9c52573..41e6067 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -4535,7 +4535,7 @@ int bnxt_hwrm_port_led_cfg(struct bnxt *bp, bool led_on)
uint16_t duration = 0;
int rc, i;
 
-   if (!bp->leds->num_leds || BNXT_VF(bp))
+   if (BNXT_VF(bp) || (!bp->leds) || (!bp->leds->num_leds))
return -EOPNOTSUPP;
 
HWRM_PREP(&req, HWRM_PORT_LED_CFG, BNXT_USE_CHIMP_MB);
-- 
1.8.3.1



Re: 21.11.2 patches review and test

2022-08-01 Thread YangHang Liu
Hi Luca,

The dpdk 21.11.2-rc1 test result from Red Hat looks good.

We tested the 17 scenarios below and all got PASS on RHEL8:

   - Guest with device assignment(PF) throughput testing(1G hugepage size): PASS
   - Guest with device assignment(PF) throughput testing(2M hugepage size): PASS
   - Guest with device assignment(VF) throughput testing: PASS
   - PVP (host dpdk testpmd as vswitch) 1Q throughput testing: PASS
   - PVP vhost-user 2Q throughput testing: PASS
   - PVP vhost-user 1Q - cross numa node throughput testing: PASS
   - Guest with vhost-user 2 queues throughput testing: PASS
   - vhost-user reconnect with dpdk-client, qemu-server: qemu reconnect: PASS
   - vhost-user reconnect with dpdk-client, qemu-server: ovs reconnect: PASS
   - PVP 1Q live migration testing: PASS
   - PVP 1Q cross numa node live migration testing: PASS
   - Guest with ovs+dpdk+vhost-user 1Q live migration testing: PASS
   - Guest with ovs+dpdk+vhost-user 1Q live migration testing (2M): PASS
   - Guest with ovs+dpdk+vhost-user 2Q live migration testing: PASS
   - Guest with ovs+dpdk+vhost-user 4Q live migration testing: PASS
   - Host PF + DPDK testing: PASS
   - Host VF + DPDK testing: PASS


Versions:

   - kernel 4.18
   - qemu 6.2
   - dpdk: git://dpdk.org/dpdk-stable (remotes/origin/21.11)
   - # git log
  Author: Luca Boccassi 
  Date:   Mon Jul 18 09:56:26 2022 +0100
  version: 21.11.2-rc1
  Signed-off-by: Luca Boccassi 
   - NICs: X540-AT2 NIC(ixgbe, 10G)




Best Regards,
YangHang Liu


On Thu, Jul 28, 2022 at 8:34 PM Luca Boccassi  wrote:

> On Mon, 2022-07-18 at 10:58 +0100, luca.bocca...@gmail.com wrote:
> > Hi all,
> >
> > Here is a list of patches targeted for stable release 21.11.2.
> >
> > The planned date for the final release is August 29th.
> >
> > Please help with testing and validation of your use cases and report
> > any issues/results with reply-all to this mail. For the final release
> > the fixes and reported validations will be added to the release notes.
> >
> > A release candidate tarball can be found at:
> >
> > https://dpdk.org/browse/dpdk-stable/tag/?id=v21.11.2-rc1
> >
> > These patches are located at branch 21.11 of dpdk-stable repo:
> > https://dpdk.org/browse/dpdk-stable/
> >
> > Thanks.
> >
> > Luca Boccassi
>
> Hello,
>
> Any update from any of the validation teams? Any indication on how the
> tests are going?
>
> --
> Kind regards,
> Luca Boccassi
>
>


[PATCHv2] net/bnxt: fix null pointer dereference in bnxt_hwrm_port_led_cfg()

2022-08-01 Thread Mao YingMing
For VFs, bp->leds is uninitialized, check bp->leds is
not null before use bp->leds->num_leds.

segfault backtrace in trex program when use VF:
11: bnxt_hwrm_port_led_cfg (bp=0x23ffb2140, led_on=true)
10: bnxt_dev_led_on_op (dev=0x22d7780 )
 9: rte_eth_led_on (port_id=0)
 8: DpdkTRexPortAttr::set_led (this=0x23b6ce0, on=true)
 7: DpdkTRexPortAttr::DpdkTRexPortAttr
 6: CTRexExtendedDriverBnxt::create_port_attr
 5: CPhyEthIF::Create
 4: CGlobalTRex::device_start
 3: CGlobalTRex::Create
 2: main_test
 1: main

Fixes: d4d5a04 ("net/bnxt: fix unnecessary memory allocation")
Cc: sta...@dpdk.org

Signed-off-by: Mao YingMing 
---
 drivers/net/bnxt/bnxt_hwrm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index 9c52573..51e1e2d 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -4535,7 +4535,7 @@ int bnxt_hwrm_port_led_cfg(struct bnxt *bp, bool led_on)
uint16_t duration = 0;
int rc, i;
 
-   if (!bp->leds->num_leds || BNXT_VF(bp))
+   if (BNXT_VF(bp) || !bp->leds || !bp->leds->num_leds)
return -EOPNOTSUPP;
 
HWRM_PREP(&req, HWRM_PORT_LED_CFG, BNXT_USE_CHIMP_MB);
-- 
1.8.3.1



Re: [PATCHv2] net/bnxt: fix null pointer dereference in bnxt_hwrm_port_led_cfg()

2022-08-01 Thread Somnath Kotur
On Tue, Aug 2, 2022 at 9:17 AM Mao YingMing  wrote:
>
> For VFs, bp->leds is uninitialized, check bp->leds is
> not null before use bp->leds->num_leds.
Sorry for missing this the first time around; rephrase it to say
'before checking for bp->leds->num_leds'.
> segfault backtrace in trex program when use VF:
> 11: bnxt_hwrm_port_led_cfg (bp=0x23ffb2140, led_on=true)
> 10: bnxt_dev_led_on_op (dev=0x22d7780 )
>  9: rte_eth_led_on (port_id=0)
>  8: DpdkTRexPortAttr::set_led (this=0x23b6ce0, on=true)
>  7: DpdkTRexPortAttr::DpdkTRexPortAttr
>  6: CTRexExtendedDriverBnxt::create_port_attr
>  5: CPhyEthIF::Create
>  4: CGlobalTRex::device_start
>  3: CGlobalTRex::Create
>  2: main_test
>  1: main
>
> Fixes: d4d5a04 ("net/bnxt: fix unnecessary memory allocation")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Mao YingMing 
> ---
>  drivers/net/bnxt/bnxt_hwrm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
> index 9c52573..51e1e2d 100644
> --- a/drivers/net/bnxt/bnxt_hwrm.c
> +++ b/drivers/net/bnxt/bnxt_hwrm.c
> @@ -4535,7 +4535,7 @@ int bnxt_hwrm_port_led_cfg(struct bnxt *bp, bool led_on)
> uint16_t duration = 0;
> int rc, i;
>
> -   if (!bp->leds->num_leds || BNXT_VF(bp))
> +   if (BNXT_VF(bp) || !bp->leds || !bp->leds->num_leds)
> return -EOPNOTSUPP;
>
> HWRM_PREP(&req, HWRM_PORT_LED_CFG, BNXT_USE_CHIMP_MB);
> --
> 1.8.3.1
>




[PATCH v3] net/bnxt: fix null pointer dereference in bnxt_hwrm_port_led_cfg()

2022-08-01 Thread Mao YingMing
For VFs, bp->leds is uninitialized, check bp->leds is
not null before checking for bp->leds->num_leds.

segfault backtrace in trex program when use VF:
11: bnxt_hwrm_port_led_cfg (bp=0x23ffb2140, led_on=true)
10: bnxt_dev_led_on_op (dev=0x22d7780 )
 9: rte_eth_led_on (port_id=0)
 8: DpdkTRexPortAttr::set_led (this=0x23b6ce0, on=true)
 7: DpdkTRexPortAttr::DpdkTRexPortAttr
 6: CTRexExtendedDriverBnxt::create_port_attr
 5: CPhyEthIF::Create
 4: CGlobalTRex::device_start
 3: CGlobalTRex::Create
 2: main_test
 1: main

Fixes: d4d5a04 ("net/bnxt: fix unnecessary memory allocation")
Cc: sta...@dpdk.org

Signed-off-by: Mao YingMing 
---
 drivers/net/bnxt/bnxt_hwrm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index 9c52573..51e1e2d 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -4535,7 +4535,7 @@ int bnxt_hwrm_port_led_cfg(struct bnxt *bp, bool led_on)
uint16_t duration = 0;
int rc, i;
 
-   if (!bp->leds->num_leds || BNXT_VF(bp))
+   if (BNXT_VF(bp) || !bp->leds || !bp->leds->num_leds)
return -EOPNOTSUPP;
 
HWRM_PREP(&req, HWRM_PORT_LED_CFG, BNXT_USE_CHIMP_MB);
-- 
1.8.3.1



Re: [PATCH v3] net/bnxt: fix null pointer dereference in bnxt_hwrm_port_led_cfg()

2022-08-01 Thread Somnath Kotur
On Tue, Aug 2, 2022 at 9:38 AM Mao YingMing  wrote:
>
> For VFs, bp->leds is uninitialized, check bp->leds is
> not null before checking for bp->leds->num_leds.
>
> segfault backtrace in trex program when use VF:
> 11: bnxt_hwrm_port_led_cfg (bp=0x23ffb2140, led_on=true)
> 10: bnxt_dev_led_on_op (dev=0x22d7780 )
>  9: rte_eth_led_on (port_id=0)
>  8: DpdkTRexPortAttr::set_led (this=0x23b6ce0, on=true)
>  7: DpdkTRexPortAttr::DpdkTRexPortAttr
>  6: CTRexExtendedDriverBnxt::create_port_attr
>  5: CPhyEthIF::Create
>  4: CGlobalTRex::device_start
>  3: CGlobalTRex::Create
>  2: main_test
>  1: main
>
> Fixes: d4d5a04 ("net/bnxt: fix unnecessary memory allocation")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Mao YingMing 
> ---
>  drivers/net/bnxt/bnxt_hwrm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
> index 9c52573..51e1e2d 100644
> --- a/drivers/net/bnxt/bnxt_hwrm.c
> +++ b/drivers/net/bnxt/bnxt_hwrm.c
> @@ -4535,7 +4535,7 @@ int bnxt_hwrm_port_led_cfg(struct bnxt *bp, bool led_on)
> uint16_t duration = 0;
> int rc, i;
>
> -   if (!bp->leds->num_leds || BNXT_VF(bp))
> +   if (BNXT_VF(bp) || !bp->leds || !bp->leds->num_leds)
> return -EOPNOTSUPP;
>
> HWRM_PREP(&req, HWRM_PORT_LED_CFG, BNXT_USE_CHIMP_MB);
> --
> 1.8.3.1
>
Acked-by: Somnath Kotur 




RE: [PATCH v3] net/bnxt: fix null pointer dereference in bnxt_hwrm_port_led_cfg()

2022-08-01 Thread Mao,Yingming
Thanks for your review.
Mao Yingming

-Original Message-
From: Somnath Kotur  
Sent: Tuesday, August 2, 2022 12:09
To: Mao,Yingming 
Cc: dev@dpdk.org; Ajit Khaparde ; Kalesh AP 
; Thomas Monjalon 
Subject: Re: [PATCH v3] net/bnxt: fix null pointer dereference in 
bnxt_hwrm_port_led_cfg()

On Tue, Aug 2, 2022 at 9:38 AM Mao YingMing  wrote:
>
> For VFs, bp->leds is uninitialized, check bp->leds is
> not null before checking for bp->leds->num_leds.
>
> segfault backtrace in trex program when use VF:
> 11: bnxt_hwrm_port_led_cfg (bp=0x23ffb2140, led_on=true)
> 10: bnxt_dev_led_on_op (dev=0x22d7780 )
>  9: rte_eth_led_on (port_id=0)
>  8: DpdkTRexPortAttr::set_led (this=0x23b6ce0, on=true)
>  7: DpdkTRexPortAttr::DpdkTRexPortAttr
>  6: CTRexExtendedDriverBnxt::create_port_attr
>  5: CPhyEthIF::Create
>  4: CGlobalTRex::device_start
>  3: CGlobalTRex::Create
>  2: main_test
>  1: main
>
> Fixes: d4d5a04 ("net/bnxt: fix unnecessary memory allocation")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Mao YingMing 
> ---
>  drivers/net/bnxt/bnxt_hwrm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
> index 9c52573..51e1e2d 100644
> --- a/drivers/net/bnxt/bnxt_hwrm.c
> +++ b/drivers/net/bnxt/bnxt_hwrm.c
> @@ -4535,7 +4535,7 @@ int bnxt_hwrm_port_led_cfg(struct bnxt *bp, bool led_on)
> uint16_t duration = 0;
> int rc, i;
>
> -   if (!bp->leds->num_leds || BNXT_VF(bp))
> +   if (BNXT_VF(bp) || !bp->leds || !bp->leds->num_leds)
> return -EOPNOTSUPP;
>
> HWRM_PREP(&req, HWRM_PORT_LED_CFG, BNXT_USE_CHIMP_MB);
> --
> 1.8.3.1
>
Acked-by: Somnath Kotur