On 3/27/2023 6:31 AM, Deng, KaiwenX wrote: > > >> -----Original Message----- >> From: Ferruh Yigit <ferruh.yi...@amd.com> >> Sent: Thursday, March 23, 2023 11:39 PM >> To: Deng, KaiwenX <kaiwenx.d...@intel.com>; dev@dpdk.org >> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou, YidingX >> <yidingx.z...@intel.com>; Chas Williams <ch...@att.com>; Min Hu (Connor) >> <humi...@huawei.com>; Wu, Jingjing <jingjing...@intel.com>; Xing, Beilei >> <beilei.x...@intel.com>; Mike Pattrick <m...@redhat.com>; Zhang, Qi Z >> <qi.z.zh...@intel.com>; Doherty, Declan <declan.dohe...@intel.com>; >> Mrzyglod, Daniel T <daniel.t.mrzyg...@intel.com>; Dapeng Yu >> <dapengx...@intel.com> >> Subject: Re: [PATCH v3] net/iavf: fix iavf query stats in intr thread >> >> On 3/22/2023 7:26 AM, Kaiwen Deng wrote: >>> When iavf send query-stats command in eal-intr-thread through virtual >>> channel, there will be no response received from >>> iavf_dev_virtchnl_handler for this command during block and wait. >>> Because iavf_dev_virtchnl_handler is also registered in eal-intr-thread. >>> >>> When vf device is bonded as BONDING_MODE_TLB mode, the slave device >>> update callback will registered in alarm and called by >>> eal-intr-thread, it would also raise the above issue. >>> >>> This commit add to poll the response for VIRTCHNL_OP_GET_STATS when >> it >>> is called by eal-intr-thread to fix this issue. >>> >>> Fixes: 91bf37d250aa ("net/iavf: add lock for VF commands") >>> Fixes: 22b123a36d07 ("net/avf: initialize PMD") >>> Fixes: 7c76a747e68c ("bond: add mode 5") >>> Fixes: 435d523112cc ("net/iavf: fix multi-process shared data") >>> Fixes: cb5c1b91f76f ("net/iavf: add thread for event callbacks") >> >> >> Hi Kaiwen, >> >> Above commit already seems trying to address same issue, it creates "iavf- >> event-thread" control thread to asyncroniously handle the interrupts, in non- >> interrupt context, why it is not working? >> >> Instead of adding 'rte_thread_is_intr()' checks, can't you make sure all >> interrupts handled in control tread? >> >> And can you please provide a stack trace in commit log, to describe the issue >> better? > Hi Ferru, > Sorry for my late reply, And thanks for your review. > > The above commit does not fix this issue when we need to get the returned > data. > If we call iavf_query_stats and wait for response statistics in the > intr-thread. > iavf_handle_virtchnl_msg is also registered in the intr_thread and will not > be > executed while waiting. >
Got it, since return value is required, API can't be called asyncroniously. I think 'rte_thread_is_intr()' checks may cause more trouble for you in long term, - why 'iavf_query_stats()' is called in the iterrupt thread, can it be prevented? - does it make sense to allways poll messages from PF (for simplification)? If answer to both are 'No', I am OK to continue with current proposal if you are happy with it. > This commit I changed it to polling for replies to commands executed in the > interrupt thread. > > main thread > interrupt thread > | > | > | > | > iavf_query_stats > | > iavf_execute_vf_cmd > | > iavf_aq_send_msg_to_pf and wait handle complete > | > | > | > > |-------------------------------------------------------------------------------------------->| > | > | > | > iavf_handle_virtchnl_msg > | > | > > |<--------------------------------------------------------------------------------------------| > | > | > iavf_execute_vf_cmd get response > | > | > | > > The above is the stack trace for the normal execution of iavf_query_stats > in the main thread. > > interrupt thread > | > | > iavf_query_stats > iavf_execute_vf_cmd > iavf_aq_send_msg_to_pf wait handle complete(1 sec) > iavf_execute_vf_cmd timeout > | > | > iavf_handle_virtchnl_msg > | > > The above is the stack trace for the abnormal execution of iavf_query_stats > in the interrupt thread. >> >>> Cc: sta...@dpdk.org >>> >>> Signed-off-by: Kaiwen Deng <kaiwenx.d...@intel.com> >>> --- >>> Changes since v2: >>> - Add to poll the response for VIRTCHNL_OP_GET_STATS. >>> >>> Changes since v1: >>> - Add lock to avoid race condition. >>> --- >>> --- >>> drivers/net/bonding/rte_eth_bond_pmd.c | 7 ++- >>> drivers/net/iavf/iavf_ethdev.c | 5 +- >>> drivers/net/iavf/iavf_vchnl.c | 71 ++++++++++++++++++-------- >>> 3 files changed, 58 insertions(+), 25 deletions(-) >>> >>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c >>> b/drivers/net/bonding/rte_eth_bond_pmd.c >>> index f0c4f7d26b..edce621496 100644 >>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c >>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c >>> @@ -894,6 +894,7 @@ bond_ethdev_update_tlb_slave_cb(void *arg) >>> uint8_t update_stats = 0; >>> uint16_t slave_id; >>> uint16_t i; >>> + int ret; >>> >>> internals->slave_update_idx++; >>> >>> @@ -903,7 +904,10 @@ bond_ethdev_update_tlb_slave_cb(void *arg) >>> >>> for (i = 0; i < internals->active_slave_count; i++) { >>> slave_id = internals->active_slaves[i]; >>> - rte_eth_stats_get(slave_id, &slave_stats); >>> + ret = rte_eth_stats_get(slave_id, &slave_stats); >>> + if (ret) >>> + goto OUT; >>> + >>> tx_bytes = slave_stats.obytes - tlb_last_obytets[slave_id]; >>> bandwidth_left(slave_id, tx_bytes, >>> internals->slave_update_idx, &bwg_array[i]); >> @@ -922,6 +926,7 @@ >>> bond_ethdev_update_tlb_slave_cb(void *arg) >>> for (i = 0; i < slave_count; i++) >>> internals->tlb_slaves_order[i] = bwg_array[i].slave; >>> >>> +OUT: >>> rte_eal_alarm_set(REORDER_PERIOD_MS * 1000, >> bond_ethdev_update_tlb_slave_cb, >>> (struct bond_dev_private *)internals); } diff --git >>> a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c >>> index 3196210f2c..d6e1f1a7f4 100644 >>> --- a/drivers/net/iavf/iavf_ethdev.c >>> +++ b/drivers/net/iavf/iavf_ethdev.c >>> @@ -2607,6 +2607,9 @@ iavf_dev_init(struct rte_eth_dev *eth_dev) >>> adapter->dev_data = eth_dev->data; >>> adapter->stopped = 1; >>> >>> + if (iavf_dev_event_handler_init()) >>> + goto init_vf_err; >>> + >>> if (iavf_init_vf(eth_dev) != 0) { >>> PMD_INIT_LOG(ERR, "Init vf failed"); >>> return -1; >>> @@ -2634,8 +2637,6 @@ iavf_dev_init(struct rte_eth_dev *eth_dev) >>> rte_ether_addr_copy((struct rte_ether_addr *)hw->mac.addr, >>> ð_dev->data->mac_addrs[0]); >>> >>> - if (iavf_dev_event_handler_init()) >>> - goto init_vf_err; >>> >>> if (vf->vf_res->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_WB_ON_ITR) >> { >>> /* register callback func to eal lib */ diff --git >>> a/drivers/net/iavf/iavf_vchnl.c b/drivers/net/iavf/iavf_vchnl.c index >>> 9adaadb173..aeffb07cca 100644 >>> --- a/drivers/net/iavf/iavf_vchnl.c >>> +++ b/drivers/net/iavf/iavf_vchnl.c >>> @@ -256,6 +256,7 @@ iavf_read_msg_from_pf(struct iavf_adapter >> *adapter, uint16_t buf_len, >>> vf->link_speed = >> iavf_convert_link_speed(speed); >>> } >>> iavf_dev_link_update(vf->eth_dev, 0); >>> + iavf_dev_event_post(vf->eth_dev, >> RTE_ETH_EVENT_INTR_LSC, NULL, 0); >>> PMD_DRV_LOG(INFO, "Link status update:%s", >>> vf->link_up ? "up" : "down"); >>> break; >>> @@ -368,28 +369,48 @@ iavf_execute_vf_cmd(struct iavf_adapter >> *adapter, struct iavf_cmd_info *args, >>> _clear_cmd(vf); >>> break; >>> default: >>> - /* For other virtchnl ops in running time, >>> - * wait for the cmd done flag. >>> - */ >>> - do { >>> - if (vf->pend_cmd == VIRTCHNL_OP_UNKNOWN) >>> - break; >>> - iavf_msec_delay(ASQ_DELAY_MS); >>> - /* If don't read msg or read sys event, continue */ >>> - } while (i++ < MAX_TRY_TIMES); >>> - >>> - if (i >= MAX_TRY_TIMES) { >>> - PMD_DRV_LOG(ERR, "No response for cmd %d", >> args->ops); >>> + if (rte_thread_is_intr()) { >>> + /* For virtchnl ops were executed in eal_intr_thread, >>> + * need to poll the response. >>> + */ >>> + do { >>> + result = iavf_read_msg_from_pf(adapter, >> args->out_size, >>> + args->out_buffer); >>> + if (result == IAVF_MSG_CMD) >>> + break; >>> + iavf_msec_delay(ASQ_DELAY_MS); >>> + } while (i++ < MAX_TRY_TIMES); >>> + if (i >= MAX_TRY_TIMES || >>> + vf->cmd_retval != >> VIRTCHNL_STATUS_SUCCESS) { >>> + err = -1; >>> + PMD_DRV_LOG(ERR, "No response or return >> failure (%d)" >>> + " for cmd %d", vf- >>> cmd_retval, args->ops); >>> + } >>> _clear_cmd(vf); >>> - err = -EIO; >>> - } else if (vf->cmd_retval == >>> - VIRTCHNL_STATUS_ERR_NOT_SUPPORTED) { >>> - PMD_DRV_LOG(ERR, "Cmd %d not supported", args- >>> ops); >>> - err = -ENOTSUP; >>> - } else if (vf->cmd_retval != VIRTCHNL_STATUS_SUCCESS) { >>> - PMD_DRV_LOG(ERR, "Return failure %d for cmd %d", >>> - vf->cmd_retval, args->ops); >>> - err = -EINVAL; >>> + } else { >>> + /* For other virtchnl ops in running time, >>> + * wait for the cmd done flag. >>> + */ >>> + do { >>> + if (vf->pend_cmd == >> VIRTCHNL_OP_UNKNOWN) >>> + break; >>> + iavf_msec_delay(ASQ_DELAY_MS); >>> + /* If don't read msg or read sys event, >> continue */ >>> + } while (i++ < MAX_TRY_TIMES); >>> + >>> + if (i >= MAX_TRY_TIMES) { >>> + PMD_DRV_LOG(ERR, "No response for >> cmd %d", args->ops); >>> + _clear_cmd(vf); >>> + err = -EIO; >>> + } else if (vf->cmd_retval == >>> + VIRTCHNL_STATUS_ERR_NOT_SUPPORTED) { >>> + PMD_DRV_LOG(ERR, "Cmd %d not >> supported", args->ops); >>> + err = -ENOTSUP; >>> + } else if (vf->cmd_retval != >> VIRTCHNL_STATUS_SUCCESS) { >>> + PMD_DRV_LOG(ERR, "Return failure %d for >> cmd %d", >>> + vf->cmd_retval, args->ops); >>> + err = -EINVAL; >>> + } >>> } >>> break; >>> } >>> @@ -403,8 +424,14 @@ iavf_execute_vf_cmd_safe(struct iavf_adapter >>> *adapter, { >>> struct iavf_info *vf = IAVF_DEV_PRIVATE_TO_VF(adapter); >>> int ret; >>> + int is_intr_thread = rte_thread_is_intr(); >>> >>> - rte_spinlock_lock(&vf->aq_lock); >>> + if (is_intr_thread) { >>> + if (!rte_spinlock_trylock(&vf->aq_lock)) >>> + return -EIO; >>> + } else { >>> + rte_spinlock_lock(&vf->aq_lock); >>> + } >>> ret = iavf_execute_vf_cmd(adapter, args, async); >>> rte_spinlock_unlock(&vf->aq_lock); >>> >