Re: [dpdk-dev] [PATCH] bus/vmbus: fix wrong allocation for device.name

2020-05-29 Thread wangyunjian



> -Original Message-
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Friday, May 29, 2020 12:13 AM
> To: wangyunjian 
> Cc: dev@dpdk.org; sthem...@microsoft.com; Lilijun (Jerry)
> ; xudingke ;
> sta...@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] bus/vmbus: fix wrong allocation for
> device.name
> 
> On Thu, 28 May 2020 20:03:07 +0800
> wangyunjian  wrote:
> 
> > From: Yunjian Wang 
> >
> > We do not need and should not allocate memory for device.name.
> > The device.name should be set point to the devargs->name.
> >
> > Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Yunjian Wang 
> > ---
> >  drivers/bus/vmbus/linux/vmbus_bus.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/drivers/bus/vmbus/linux/vmbus_bus.c
> b/drivers/bus/vmbus/linux/vmbus_bus.c
> > index 3c924ee..31d0dd3 100644
> > --- a/drivers/bus/vmbus/linux/vmbus_bus.c
> > +++ b/drivers/bus/vmbus/linux/vmbus_bus.c
> > @@ -242,9 +242,6 @@
> > return -1;
> >
> > dev->device.bus = &rte_vmbus_bus.bus;
> > -   dev->device.name = strdup(name);
> > -   if (!dev->device.name)
> > -   goto error;
> >
> > /* sysfs base directory
> >  *   /sys/bus/vmbus/devices/7a08391f-f5a0-4ac0-9802-d13fd964f8df
> > @@ -296,6 +293,7 @@
> > }
> >
> > dev->device.devargs = vmbus_devargs_lookup(dev);
> > +   dev->device.name = dev->device.devargs->name;
> >
> > /* device is valid, add in list (sorted) */
> > VMBUS_LOG(DEBUG, "Adding vmbus device %s", name);
> 
> This doesn't seem right. devargs is not filled in unless devargs is used.

At present, the memory allocated for the device.name is not released
in the error handling code. I have not found the relevant code to release
the vmbus device, so I am not sure how to release it corrently.

Generally, the pointer of device.name should be set to another pointer.
However, it was defined as "const" pointer and could not be released directly.

Thanks,
Yunjian


Re: [dpdk-dev] [PATCH v5 04/11] eal/mem: extract common code for memseg list initialization

2020-05-29 Thread Burakov, Anatoly

On 28-May-20 3:41 PM, Dmitry Kozlyuk wrote:

On Thu, 28 May 2020 12:46:49 +0100
"Burakov, Anatoly"  wrote:


On 25-May-20 1:37 AM, Dmitry Kozlyuk wrote:

All supported OS create memory segment lists (MSL) and reserve VA space
for them in a nearly identical way. Move common code into EAL private
functions to reduce duplication.

Signed-off-by: Dmitry Kozlyuk 
---





+eal_memseg_list_alloc(struct rte_memseg_list *msl, int reserve_flags)
+{
+   uint64_t page_sz;
+   size_t mem_sz;
+   void *addr;
+
+   page_sz = msl->page_sz;
+   mem_sz = page_sz * msl->memseg_arr.len;
+
+   addr = eal_get_virtual_area(
+   msl->base_va, &mem_sz, page_sz, 0, reserve_flags);
+   if (addr == NULL) {
+   if (rte_errno == EADDRNOTAVAIL)
+   RTE_LOG(ERR, EAL, "Cannot reserve %llu bytes at [%p] - "
+   "please use '--" OPT_BASE_VIRTADDR "' option\n",
+   (unsigned long long)mem_sz, msl->base_va);


Do all OS's support this EAL option?


Supported, yes; meaningful, not quite: for Windows, we start with address 0
(let the OS choose) and using the option can hardly help. Probably Linux and
FreeBSD EALs should print this hint. For Windows, we can leave the option,
but not print misleading hint.



We keep rte_errno when exiting this function, do we not? How about we 
just check it in the caller? It's a bit of a hack, but will avoid 
OS-specific #ifdef-ery in the common code. Not sure if it's worth it 
though, over just having an #idef :) Up to you!


--
Thanks,
Anatoly


[dpdk-dev] [PATCH v2] net/axgbe: add RSS reta/hash query and update

2020-05-29 Thread chandu
From: Chandu Babu N 

add support for RSS reta/hash query and update function

v2:
modification as per Andrew Rybchenko review comments

Signed-off-by: Chandu Babu N 
---
 doc/guides/nics/features/axgbe.ini |   2 +
 drivers/net/axgbe/axgbe_dev.c  |   5 +-
 drivers/net/axgbe/axgbe_ethdev.c   | 144 +
 drivers/net/axgbe/axgbe_ethdev.h   |   3 +
 4 files changed, 152 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/features/axgbe.ini 
b/doc/guides/nics/features/axgbe.ini
index 0becaa097..34df0d1ee 100644
--- a/doc/guides/nics/features/axgbe.ini
+++ b/doc/guides/nics/features/axgbe.ini
@@ -11,6 +11,8 @@ Scattered Rx = Y
 Promiscuous mode = Y
 Allmulticast mode= Y
 RSS hash = Y
+RSS key update   = Y
+RSS reta update  = Y
 CRC offload  = Y
 L3 checksum offload  = Y
 L4 checksum offload  = Y
diff --git a/drivers/net/axgbe/axgbe_dev.c b/drivers/net/axgbe/axgbe_dev.c
index 5f0f19592..af62eae3b 100644
--- a/drivers/net/axgbe/axgbe_dev.c
+++ b/drivers/net/axgbe/axgbe_dev.c
@@ -614,7 +614,7 @@ static int axgbe_write_rss_reg(struct axgbe_port *pdata, 
unsigned int type,
return -EBUSY;
 }
 
-static int axgbe_write_rss_hash_key(struct axgbe_port *pdata)
+int axgbe_write_rss_hash_key(struct axgbe_port *pdata)
 {
struct rte_eth_rss_conf *rss_conf;
unsigned int key_regs = sizeof(pdata->rss_key) / sizeof(u32);
@@ -638,7 +638,7 @@ static int axgbe_write_rss_hash_key(struct axgbe_port 
*pdata)
return 0;
 }
 
-static int axgbe_write_rss_lookup_table(struct axgbe_port *pdata)
+int axgbe_write_rss_lookup_table(struct axgbe_port *pdata)
 {
unsigned int i;
int ret;
@@ -683,6 +683,7 @@ static void axgbe_rss_options(struct axgbe_port *pdata)
uint64_t rss_hf;
 
rss_conf = &pdata->eth_dev->data->dev_conf.rx_adv_conf.rss_conf;
+   pdata->rss_hf = rss_conf->rss_hf;
rss_hf = rss_conf->rss_hf;
 
if (rss_hf & (ETH_RSS_IPV4 | ETH_RSS_IPV6))
diff --git a/drivers/net/axgbe/axgbe_ethdev.c b/drivers/net/axgbe/axgbe_ethdev.c
index 867058845..619250fb0 100644
--- a/drivers/net/axgbe/axgbe_ethdev.c
+++ b/drivers/net/axgbe/axgbe_ethdev.c
@@ -60,6 +60,16 @@ axgbe_dev_xstats_get_names_by_id(struct rte_eth_dev *dev,
 const uint64_t *ids,
 unsigned int size);
 static int axgbe_dev_xstats_reset(struct rte_eth_dev *dev);
+static int axgbe_dev_rss_reta_update(struct rte_eth_dev *dev,
+ struct rte_eth_rss_reta_entry64 *reta_conf,
+ uint16_t reta_size);
+static int axgbe_dev_rss_reta_query(struct rte_eth_dev *dev,
+struct rte_eth_rss_reta_entry64 *reta_conf,
+uint16_t reta_size);
+static int axgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
+struct rte_eth_rss_conf *rss_conf);
+static int axgbe_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
+  struct rte_eth_rss_conf *rss_conf);
 static int  axgbe_dev_info_get(struct rte_eth_dev *dev,
   struct rte_eth_dev_info *dev_info);
 static int axgbe_flow_ctrl_get(struct rte_eth_dev *dev,
@@ -201,6 +211,10 @@ static const struct eth_dev_ops axgbe_eth_dev_ops = {
.xstats_get_names = axgbe_dev_xstats_get_names,
.xstats_get_names_by_id = axgbe_dev_xstats_get_names_by_id,
.xstats_get_by_id = axgbe_dev_xstats_get_by_id,
+   .reta_update  = axgbe_dev_rss_reta_update,
+   .reta_query   = axgbe_dev_rss_reta_query,
+   .rss_hash_update  = axgbe_dev_rss_hash_update,
+   .rss_hash_conf_get= axgbe_dev_rss_hash_conf_get,
.dev_infos_get= axgbe_dev_info_get,
.rx_queue_setup   = axgbe_dev_rx_queue_setup,
.rx_queue_release = axgbe_dev_rx_queue_release,
@@ -450,6 +464,136 @@ axgbe_dev_mac_addr_add(struct rte_eth_dev *dev, struct 
rte_ether_addr *mac_addr,
return 0;
 }
 
+static int
+axgbe_dev_rss_reta_update(struct rte_eth_dev *dev,
+ struct rte_eth_rss_reta_entry64 *reta_conf,
+ uint16_t reta_size)
+{
+   struct axgbe_port *pdata = dev->data->dev_private;
+   unsigned int i, idx, shift;
+   int ret;
+
+   if (!pdata->rss_enable) {
+   PMD_DRV_LOG(ERR, "RSS not enabled\n");
+   return -ENOTSUP;
+   }
+
+   if (reta_size == 0 || reta_size > AXGBE_RSS_MAX_TABLE_SIZE) {
+   PMD_DRV_LOG(ERR, "reta_size %d is not supported\n", reta_size);
+   return -EINVAL;
+   }
+
+   for (i = 0; i < reta_size; i++) {
+   idx = i / RTE_RETA_GROUP_SIZE;
+   shift = i % RTE_RETA_GROUP_SIZE;
+   if ((reta_conf[idx].mask & (1ULL << shift)) == 0)
+   continue;
+   pdata->rss_table[i] = reta_conf[idx].reta[shift];
+   }
+
+ 

[dpdk-dev] [PATCH v2] net/axgbe: add RSS reta/hash query and update

2020-05-29 Thread chandu
From: Chandu Babu N 

add support for RSS reta/hash query and update function

v2:
modification as per Andrew Rybchenko review comments

Signed-off-by: Chandu Babu N 
---
 doc/guides/nics/features/axgbe.ini |   2 +
 drivers/net/axgbe/axgbe_dev.c  |   5 +-
 drivers/net/axgbe/axgbe_ethdev.c   | 144 +
 drivers/net/axgbe/axgbe_ethdev.h   |   3 +
 4 files changed, 152 insertions(+), 2 deletions(-)

diff --git a/doc/guides/nics/features/axgbe.ini 
b/doc/guides/nics/features/axgbe.ini
index 0becaa097..34df0d1ee 100644
--- a/doc/guides/nics/features/axgbe.ini
+++ b/doc/guides/nics/features/axgbe.ini
@@ -11,6 +11,8 @@ Scattered Rx = Y
 Promiscuous mode = Y
 Allmulticast mode= Y
 RSS hash = Y
+RSS key update   = Y
+RSS reta update  = Y
 CRC offload  = Y
 L3 checksum offload  = Y
 L4 checksum offload  = Y
diff --git a/drivers/net/axgbe/axgbe_dev.c b/drivers/net/axgbe/axgbe_dev.c
index 5f0f19592..af62eae3b 100644
--- a/drivers/net/axgbe/axgbe_dev.c
+++ b/drivers/net/axgbe/axgbe_dev.c
@@ -614,7 +614,7 @@ static int axgbe_write_rss_reg(struct axgbe_port *pdata, 
unsigned int type,
return -EBUSY;
 }
 
-static int axgbe_write_rss_hash_key(struct axgbe_port *pdata)
+int axgbe_write_rss_hash_key(struct axgbe_port *pdata)
 {
struct rte_eth_rss_conf *rss_conf;
unsigned int key_regs = sizeof(pdata->rss_key) / sizeof(u32);
@@ -638,7 +638,7 @@ static int axgbe_write_rss_hash_key(struct axgbe_port 
*pdata)
return 0;
 }
 
-static int axgbe_write_rss_lookup_table(struct axgbe_port *pdata)
+int axgbe_write_rss_lookup_table(struct axgbe_port *pdata)
 {
unsigned int i;
int ret;
@@ -683,6 +683,7 @@ static void axgbe_rss_options(struct axgbe_port *pdata)
uint64_t rss_hf;
 
rss_conf = &pdata->eth_dev->data->dev_conf.rx_adv_conf.rss_conf;
+   pdata->rss_hf = rss_conf->rss_hf;
rss_hf = rss_conf->rss_hf;
 
if (rss_hf & (ETH_RSS_IPV4 | ETH_RSS_IPV6))
diff --git a/drivers/net/axgbe/axgbe_ethdev.c b/drivers/net/axgbe/axgbe_ethdev.c
index 867058845..619250fb0 100644
--- a/drivers/net/axgbe/axgbe_ethdev.c
+++ b/drivers/net/axgbe/axgbe_ethdev.c
@@ -60,6 +60,16 @@ axgbe_dev_xstats_get_names_by_id(struct rte_eth_dev *dev,
 const uint64_t *ids,
 unsigned int size);
 static int axgbe_dev_xstats_reset(struct rte_eth_dev *dev);
+static int axgbe_dev_rss_reta_update(struct rte_eth_dev *dev,
+ struct rte_eth_rss_reta_entry64 *reta_conf,
+ uint16_t reta_size);
+static int axgbe_dev_rss_reta_query(struct rte_eth_dev *dev,
+struct rte_eth_rss_reta_entry64 *reta_conf,
+uint16_t reta_size);
+static int axgbe_dev_rss_hash_update(struct rte_eth_dev *dev,
+struct rte_eth_rss_conf *rss_conf);
+static int axgbe_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
+  struct rte_eth_rss_conf *rss_conf);
 static int  axgbe_dev_info_get(struct rte_eth_dev *dev,
   struct rte_eth_dev_info *dev_info);
 static int axgbe_flow_ctrl_get(struct rte_eth_dev *dev,
@@ -201,6 +211,10 @@ static const struct eth_dev_ops axgbe_eth_dev_ops = {
.xstats_get_names = axgbe_dev_xstats_get_names,
.xstats_get_names_by_id = axgbe_dev_xstats_get_names_by_id,
.xstats_get_by_id = axgbe_dev_xstats_get_by_id,
+   .reta_update  = axgbe_dev_rss_reta_update,
+   .reta_query   = axgbe_dev_rss_reta_query,
+   .rss_hash_update  = axgbe_dev_rss_hash_update,
+   .rss_hash_conf_get= axgbe_dev_rss_hash_conf_get,
.dev_infos_get= axgbe_dev_info_get,
.rx_queue_setup   = axgbe_dev_rx_queue_setup,
.rx_queue_release = axgbe_dev_rx_queue_release,
@@ -450,6 +464,136 @@ axgbe_dev_mac_addr_add(struct rte_eth_dev *dev, struct 
rte_ether_addr *mac_addr,
return 0;
 }
 
+static int
+axgbe_dev_rss_reta_update(struct rte_eth_dev *dev,
+ struct rte_eth_rss_reta_entry64 *reta_conf,
+ uint16_t reta_size)
+{
+   struct axgbe_port *pdata = dev->data->dev_private;
+   unsigned int i, idx, shift;
+   int ret;
+
+   if (!pdata->rss_enable) {
+   PMD_DRV_LOG(ERR, "RSS not enabled\n");
+   return -ENOTSUP;
+   }
+
+   if (reta_size == 0 || reta_size > AXGBE_RSS_MAX_TABLE_SIZE) {
+   PMD_DRV_LOG(ERR, "reta_size %d is not supported\n", reta_size);
+   return -EINVAL;
+   }
+
+   for (i = 0; i < reta_size; i++) {
+   idx = i / RTE_RETA_GROUP_SIZE;
+   shift = i % RTE_RETA_GROUP_SIZE;
+   if ((reta_conf[idx].mask & (1ULL << shift)) == 0)
+   continue;
+   pdata->rss_table[i] = reta_conf[idx].reta[shift];
+   }
+
+ 

[dpdk-dev] Marvell DPDK v20.08 Roadmap

2020-05-29 Thread Jerin Kollanukkaran


General:

1)  hash: unify crc32 API header for x86 and ARM
http://patches.dpdk.org/patch/70132/

2) rte_log registration simplification using constructor scheme
http://mails.dpdk.org/archives/dev/2020-May/168874.html

3) PCI probe optimization by removing RTE_KDRV_NONE based probe
http://mails.dpdk.org/archives/dev/2020-May/168656.html

Networking:
~~
4) add Tx offloads for packet marking
http://patches.dpdk.org/patch/68733/

5) IF Proxy library
http://patches.dpdk.org/patch/69690/

6) Add l2fwd support for forwarding between asymmetric ports.
http://patches.dpdk.org/patch/69405/

7) test: add reassembly performance test
http://patches.dpdk.org/patch/67756/

Cryptography:

8) examples/ipsec-secgw: add per core packet stats
http://patches.dpdk.org/patch/70193/

9) examples/ipsec-secgw: Introduce "flow" rules to allow packet distribution 
using rte_flow APIs

Eventdev:

10) doc: add eventdev feature matrix
http://patches.dpdk.org/patch/66084/

Regexdev:
~
11) Review regexdev: introduce regexdev subsystem
http://patches.dpdk.org/patch/69920/

octeontx2 network PMD:
~~
12) Add devargs for locking the NDC context for NIX.
http://patches.dpdk.org/patch/67489/

13) Add packet mirroring support

octeotx2 crypto PMD:
~~~
14) Lookaside protocol support
15) ChaChaPoly support  

octeontx2 eventdev PMD:
~~~
16) Performance improvement

Qede network PMD:
~
17) Debug facilities: comprehensive debug data collection for problem/issues 
analysis
18) SRIOV-PF support 


Re: [dpdk-dev] [PATCH 3/3] l3fwd-power: add interrupt-only mode

2020-05-29 Thread Harman Kalra
On Thu, May 28, 2020 at 10:13:51AM +0100, Anatoly Burakov wrote:
> In addition to existing modes, add a mode which is very similar to
> legacy mode, but does not do frequency scaling, and thus does not
> depend on the power library.
> 
> Signed-off-by: Anatoly Burakov 
> ---
>  examples/l3fwd-power/main.c | 215 +---
>  1 file changed, 202 insertions(+), 13 deletions(-)
> 
> diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
> index 5cee9d5387..4161e01974 100644
> --- a/examples/l3fwd-power/main.c
> +++ b/examples/l3fwd-power/main.c
> @@ -195,9 +195,11 @@ static int parse_ptype; /**< Parse packet type using rx 
> callback, and */
>   /**< disabled by default */
>  
>  enum appmode {
> - APP_MODE_LEGACY = 0,
> + APP_MODE_DEFAULT = 0,
> + APP_MODE_LEGACY,
>   APP_MODE_EMPTY_POLL,
> - APP_MODE_TELEMETRY
> + APP_MODE_TELEMETRY,
> + APP_MODE_INTERRUPT
>  };
>  
>  enum appmode app_mode;
> @@ -900,6 +902,170 @@ static int event_register(struct lcore_conf *qconf)
>  
>   return 0;
>  }
> +
> +/* main processing loop */
> +static int main_intr_loop(__rte_unused void *dummy)
> +{
> + struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
> + unsigned int lcore_id;
> + uint64_t prev_tsc, diff_tsc, cur_tsc;
> + int i, j, nb_rx;
> + uint8_t queueid;
> + uint16_t portid;
> + struct lcore_conf *qconf;
> + struct lcore_rx_queue *rx_queue;
> + uint32_t lcore_rx_idle_count = 0;
> + uint32_t lcore_idle_hint = 0;
> + int intr_en = 0;
> +
> + const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1) /
> +US_PER_S * BURST_TX_DRAIN_US;
> +
> + prev_tsc = 0;
> +
> + lcore_id = rte_lcore_id();
> + qconf = &lcore_conf[lcore_id];
> +
> + if (qconf->n_rx_queue == 0) {
> + RTE_LOG(INFO, L3FWD_POWER, "lcore %u has nothing to do\n",
> + lcore_id);
> + return 0;
> + }
> +
> + RTE_LOG(INFO, L3FWD_POWER, "entering main interrupt loop on lcore %u\n",
> + lcore_id);
> +
> + for (i = 0; i < qconf->n_rx_queue; i++) {
> + portid = qconf->rx_queue_list[i].port_id;
> + queueid = qconf->rx_queue_list[i].queue_id;
> + RTE_LOG(INFO, L3FWD_POWER,
> + " -- lcoreid=%u portid=%u rxqueueid=%hhu\n",
> + lcore_id, portid, queueid);
> + }
> +
> + /* add into event wait list */
> + if (event_register(qconf) == 0)
> + intr_en = 1;
> + else
> + RTE_LOG(INFO, L3FWD_POWER, "RX interrupt won't enable.\n");
> +
> + while (!is_done()) {
> + stats[lcore_id].nb_iteration_looped++;
> +
> + cur_tsc = rte_rdtsc();
> +
> + /*
> +  * TX burst queue drain
> +  */
> + diff_tsc = cur_tsc - prev_tsc;
> + if (unlikely(diff_tsc > drain_tsc)) {
> + for (i = 0; i < qconf->n_tx_port; ++i) {
> + portid = qconf->tx_port_id[i];
> + rte_eth_tx_buffer_flush(portid,
> + qconf->tx_queue_id[portid],
> + qconf->tx_buffer[portid]);
> + }
> + prev_tsc = cur_tsc;
> + }
> +
> +start_rx:
> + /*
> +  * Read packet from RX queues
> +  */
> + lcore_rx_idle_count = 0;
> + for (i = 0; i < qconf->n_rx_queue; ++i) {
> + rx_queue = &(qconf->rx_queue_list[i]);
> + rx_queue->idle_hint = 0;
> + portid = rx_queue->port_id;
> + queueid = rx_queue->queue_id;
> +
> + nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
> + MAX_PKT_BURST);
> +
> + stats[lcore_id].nb_rx_processed += nb_rx;
> + if (unlikely(nb_rx == 0)) {
> + /**
> +  * no packet received from rx queue, try to
> +  * sleep for a while forcing CPU enter deeper
> +  * C states.
> +  */
> + rx_queue->zero_rx_packet_count++;
> +
> + if (rx_queue->zero_rx_packet_count <=
> + MIN_ZERO_POLL_COUNT)
> + continue;
> +
> + rx_queue->idle_hint = power_idle_heuristic(
> + rx_queue->zero_rx_packet_count);
> + lcore_rx_idle_count++;
> + } else {
> + rx_queue->zero_rx_packet_count = 0;
> + }
> +

Re: [dpdk-dev] [PATCH 3/3] l3fwd-power: add interrupt-only mode

2020-05-29 Thread Burakov, Anatoly

On 29-May-20 2:19 PM, Harman Kalra wrote:


if (ret < 0)
rte_exit(EXIT_FAILURE, "Invalid L3FWD parameters\n");
  
-	if (app_mode != APP_MODE_TELEMETRY && init_power_library())

+   if (app_mode == APP_MODE_DEFAULT)
+   app_mode = APP_MODE_LEGACY;
+
+   /* only legacy and empty poll mode rely on power library */
+   if ((app_mode == APP_MODE_LEGACY || app_mode == APP_MODE_EMPTY_POLL) &&
+   init_power_library())
rte_exit(EXIT_FAILURE, "init_power_library failed\n");
  

Hi,

Rather than just exiting from here can we have a else condition to
automatically enter into the "interrupt only" mode.
Please correct me if I am missing something.


Hi,

Thanks for your review. I don't think silently proceeding is a good 
idea. If the user wants interrupt-only mode, they should request it. 
Silently falling back to interrupt-only mode will create an illusion of 
successful initialization and set the wrong expectation for how the 
application will behave.


--
Thanks,
Anatoly


[dpdk-dev] [PATCH] doc: removed typing mistake

2020-05-29 Thread Muhammad Bilal
There was a duplicate command instruction in the documentation of memif
so I have removed the 1 command from it.

Fixes: cd3365d2 ("net/memif: enable loopback")
Cc: jmilan@gmail.com

Signed-off-by: Muhammad Bilal 
---
 doc/guides/nics/memif.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/doc/guides/nics/memif.rst b/doc/guides/nics/memif.rst
index 08a9fda86..ddeebed25 100644
--- a/doc/guides/nics/memif.rst
+++ b/doc/guides/nics/memif.rst
@@ -289,4 +289,3 @@ Then start the communication::
 Finally we can check port stats to see the traffic::
 
 testpmd> show port stats all
-testpmd> show port stats all
-- 
2.17.1



Re: [dpdk-dev] [PATCH] bus/vmbus: fix wrong allocation for device.name

2020-05-29 Thread Stephen Hemminger
On Thu, 28 May 2020 20:03:07 +0800
wangyunjian  wrote:

> From: Yunjian Wang 
> 
> We do not need and should not allocate memory for device.name.
> The device.name should be set point to the devargs->name.
> 
> Fixes: 831dba47bd36 ("bus/vmbus: add Hyper-V virtual bus support")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Yunjian Wang 

The correct fix is more complex. The code needs to act more like PCI.
The name should be in the vmbus_device structure (like PCI).
And the devargs logic needs to be more involved and also handle
whitelist/blacklisting.

Here is a very rough idea what that would look like:

From edf90357a99c7970546ef00941a9593bca46d08e Mon Sep 17 00:00:00 2001
From: Stephen Hemminger 
Date: Fri, 29 May 2020 10:45:26 -0700
Subject: [PATCH] vmbus: support whitelist/blacklist

This makes VMBus work like PCI and support blacklist and
whitelist command line arguments.

It also moves the vmbus device name into the internal
structure.

This is compile tested only.

Signed-off-by: Stephen Hemminger 
---
 drivers/bus/vmbus/linux/vmbus_bus.c | 30 
 drivers/bus/vmbus/private.h |  5 ++-
 drivers/bus/vmbus/rte_bus_vmbus.h   |  2 +-
 drivers/bus/vmbus/vmbus_common.c| 53 ++---
 4 files changed, 67 insertions(+), 23 deletions(-)

diff --git a/drivers/bus/vmbus/linux/vmbus_bus.c 
b/drivers/bus/vmbus/linux/vmbus_bus.c
index 3c924eee1412..9162b30bb46c 100644
--- a/drivers/bus/vmbus/linux/vmbus_bus.c
+++ b/drivers/bus/vmbus/linux/vmbus_bus.c
@@ -14,7 +14,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -230,7 +229,7 @@ rte_vmbus_unmap_device(struct rte_vmbus_device *dev)
 
 /* Scan one vmbus sysfs entry, and fill the devices list from it. */
 static int
-vmbus_scan_one(const char *name)
+vmbus_scan_one(rte_uuid_t id, const char *name)
 {
struct rte_vmbus_device *dev, *dev2;
char filename[PATH_MAX];
@@ -242,14 +241,10 @@ vmbus_scan_one(const char *name)
return -1;
 
dev->device.bus = &rte_vmbus_bus.bus;
-   dev->device.name = strdup(name);
-   if (!dev->device.name)
-   goto error;
+   rte_uuid_copy(dev->device_id, id);
 
/* sysfs base directory
 *   /sys/bus/vmbus/devices/7a08391f-f5a0-4ac0-9802-d13fd964f8df
-* or on older kernel
-*   /sys/bus/vmbus/devices/vmbus_1
 */
snprintf(dirname, sizeof(dirname), "%s/%s",
 SYSFS_VMBUS_DEVICES, name);
@@ -265,11 +260,6 @@ vmbus_scan_one(const char *name)
return 0;
}
 
-   /* get device id */
-   snprintf(filename, sizeof(filename), "%s/device_id", dirname);
-   if (parse_sysfs_uuid(filename, dev->device_id) < 0)
-   goto error;
-
/* get relid */
snprintf(filename, sizeof(filename), "%s/id", dirname);
if (eal_parse_sysfs_value(filename, &tmp) < 0)
@@ -295,9 +285,6 @@ vmbus_scan_one(const char *name)
dev->device.numa_node = SOCKET_ID_ANY;
}
 
-   dev->device.devargs = vmbus_devargs_lookup(dev);
-
-   /* device is valid, add in list (sorted) */
VMBUS_LOG(DEBUG, "Adding vmbus device %s", name);
 
TAILQ_FOREACH(dev2, &rte_vmbus_bus.device_list, next) {
@@ -346,10 +333,21 @@ rte_vmbus_scan(void)
}
 
while ((e = readdir(dir)) != NULL) {
+   rte_uuid_t id;
+
if (e->d_name[0] == '.')
continue;
 
-   if (vmbus_scan_one(e->d_name) < 0)
+   if (rte_uuid_parse(e->d_name, id) != 0) {
+   VMBUS_LOG(DEBUG, "ignore '%s' non-uuid in 
vmbus/devices",
+ e->d_name);
+   continue;
+   }
+
+   if (vmbus_ignore_device(id))
+   continue;
+
+   if (vmbus_scan_one(id, e->d_name) < 0)
goto error;
}
closedir(dir);
diff --git a/drivers/bus/vmbus/private.h b/drivers/bus/vmbus/private.h
index f19b14e4a657..dd6a17c7238c 100644
--- a/drivers/bus/vmbus/private.h
+++ b/drivers/bus/vmbus/private.h
@@ -71,12 +71,15 @@ struct vmbus_channel {
 #define VMBUS_MAX_CHANNELS 64
 
 struct rte_devargs *
-vmbus_devargs_lookup(struct rte_vmbus_device *dev);
+vmbus_devargs_lookup(rte_uuid_t device_id);
+
+void vmbus_name_set(struct rte_vmbus_device *dev);
 
 int vmbus_chan_create(const struct rte_vmbus_device *device,
  uint16_t relid, uint16_t subid, uint8_t monitor_id,
  struct vmbus_channel **new_chan);
 
+bool vmbus_ignore_device(rte_uuid_t device_id);
 void vmbus_add_device(struct rte_vmbus_device *vmbus_dev);
 void vmbus_insert_device(struct rte_vmbus_device *exist_vmbus_dev,
 struct rte_vmbus_device *new_vmbus_dev);
diff --git a/drivers/bus/vmbus/rte_bus_vmbus.h 
b/drivers/bus/vmbus/rte_bus_vmbus.h
index 4cf73ce81513..62d067f19179 100644
--- a/drivers/bus/vmbus/rte_b

Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds

2020-05-29 Thread N. Benes
Hi everyone,

Tom Barbette:
> 
> Le 22/05/2020 à 20:43, PATRICK KEROULAS a écrit :
>> mlx5 part of libibverbs includes a ts-to-ns converter which takes the
>> instantaneous clock info. It's unused in dpdk so far. I've tested
>> it in the
>> device/port init routine and the result looks reliable. Since this
>> approach
>> looks very simple, compared to the time sync mechanism, I'm trying to
>> integrate.
>>
>> The conversion should occur in the primary process (testpmd) I
>> suppose.
>> 1) The needed clock info derives from ethernet device. Is it
>> possible to
>>     access that struct from a rx callback?
>> 2) how to attach the nanosecond to mbuf so that `pdump` catches it?
>>     (workaround: copy `mbuf->udata64` in forwarded packets.)
>> 3) any other idea?
> The timestamp is carried in mbuf.
> Then the conversion must be done by the ethdev caller (application or
> any other upper layer).
 What if the converter function needs a clock_info?
 https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201

 I'm affraid this info may change by the time the converter is called
 by upper layer.
>>> Indeed, the clock in the device is not an atomic one :)
>>> We need to adjust the time conversion continuously.
>>> I am not an expert of time synchronization, so I add more people Cc
>>> who could help for having a precise timestamp.
>> Thanks Thomas.
>> Not sure this is a synchronization issue. We have dedicated processes
>> (linuxptp) to keep both NIC and sys clocks in sync with an external
>> clock.
>> It is "just" a matter of unit conversion.
>>
>> If it has to be performed in dpdk-pdump, I would need some help to
>> retrieve mlx5_clock_info from inside a secondary process. Looking at
>> mlx5_read_clock(), this info is extracted from ibv_context which looks
>> reachable in a primary process only (segfault, if I try in pdump).

The normal phc2sys can not only synchronise NIC -> system but also sys
-> NIC and (I believe it does but have not tried) NIC1 -> NIC2.
If I understand your proposal correctly, you want to use a free running
NIC counter and calibrate out the drift afterwards.
It may be easier to adapt phc2sys to use a NIC through DPDK and sync the
NIC's timewheel/VCO in a proven/reliable manner (e.g. low pass filtering
excursions). Then you could directly use the NIC counter value.

> I don't know about the integrated ts-to-ns, but we implemented a
> translation mechanism that mimics what NTP does in Linux to translate a
> given clock (TSC at first) to a wall time. You'll find more info at
> https://orbi.uliege.be/bitstream/2268/226257/1/thesis.pdf chapter
> 3.4.1.  This is an often forgotten matter, as we saw in real switches
> that the time spent in time-related VDSO is enormous.

Do you have measurements of vDSO clock_gettime and how much is
"enormous" to you?
To my knowledge, clock_gettime via vDSO on Linux only takes a few
nanoseconds in the average case. However, it can go up to ~10 or even
~50 microseconds every few (~10) seconds, depending on the number of
CPUs (for example single vs. dual socket, though my hardware for this
test is quite old, Dell R210-II, R610). Presumably this is when the
kernel locks the struct in VVAR to update the TSC drift compensation
parameters.

Linux clock_gettime implementation is here (different versions):
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/vdso/vclock_gettime.c?h=linux-3.10.y#n193
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/entry/vdso/vclock_gettime.c?h=linux-4.19.y#n241
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/lib/vdso/gettimeofday.c#n98

I use busy waiting on clock_gettime in a packet generator application
(so far 10 GbE only) to pace jumbo frames according to a spec
(simulating the traffic pattern of a to-be-developed hardware with
FPGA), and COTS sniffer hardware with absolute timestamping to verify my
generator's performance. I can observe the above 10-50 us artefacts and
sufficiently good/low (for my needs) average execution time of
clock_gettime. The only sad thing is that TAI clock does not go through
vDSO and therefore I cannot use it.

> We wanted to do a very precise capture too, se we made that clock able
> to synchronize itself with the ConnectX 5 internal clock as a base
> instead of TSC. FYI the clock in CX5 si running at 800MHz, so pure
> nanosecond is impossible, but close enough. It is for that purpose that
> I proposed the rte_eth_read_clock() patch in DPDK. We need to be able to
> read the current clock (like rdtsc() instruction for TSC) to compute the
> frequency.

Doesn't this mean that you need to wait for the PCIe op from the NIC?
Is this really faster than a rdtsc, memory/cache read, integer
multiplication and shift?

Cheers,
nicolas



Re: [dpdk-dev] mlx5 & pdump: convert HW timestamps to nanoseconds

2020-05-29 Thread PATRICK KEROULAS
On Tue, May 26, 2020 at 12:00 PM Slava Ovsiienko 
wrote:
>> Hi, Patrick
>
> ConnectX HW timestamp is the captured value of internal 64-bit counter
running at the frequency,
> reported in the device_frequency_khz field of struct
mlx5_ifc_cmd_hca_cap_bits{}.
> This structure is queried in mlx5_devx_cmd_query_hca_attr() routine.
> So, with known frequency it is possible to recalculate timestamp ticks to
desired units.

Hello Slava,

Assuming that the NIC clock is already synced thanks to a PTP client,
does the bit counter give an absolute time value (0 => 1 January 1970
00:00:00)? Or do I need to calculate a time duration from the process
start time?

I just want to validate the path from mlx5 eth dev(Rx) to eth pcap (Tx) :
- query the oscillator frequency at the mlx5_eth_dev init step
  (mlx5_devx_cmd_query_hca_attr())
- store the freq with other hca_attr, carried by dev config which should
  be shared with the secondary process
- in eth_pcap_tx_dumper(), retrieve the freq from the dev given by
  mbuf->port
- convert all the incoming mbuf->timestamp using this freq whose
  variation should be negligible over the capture duration

Last question: what is your opinion about this other method?
https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201

Thanks a lot!