> -----Original Message----- > From: Ferruh Yigit <ferruh.yi...@amd.com> > Sent: Monday, March 27, 2023 8:38 PM > To: Deng, KaiwenX <kaiwenx.d...@intel.com>; dev@dpdk.org > Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou, YidingX > <yidingx.z...@intel.com>; Chas Williams <ch...@att.com>; Min Hu (Connor) > <humi...@huawei.com>; Wu, Jingjing <jingjing...@intel.com>; Xing, Beilei > <beilei.x...@intel.com>; Mike Pattrick <m...@redhat.com>; Zhang, Qi Z > <qi.z.zh...@intel.com>; Doherty, Declan <declan.dohe...@intel.com>; > Mrzyglod, Daniel T <daniel.t.mrzyg...@intel.com>; Dapeng Yu > <dapengx...@intel.com>; Zhang, Helin <helin.zh...@intel.com>; > Mcnamara, John <john.mcnam...@intel.com>; Thomas Monjalon > <tho...@monjalon.net> > Subject: Re: [PATCH v3] net/iavf: fix iavf query stats in intr thread > > On 3/27/2023 1:31 PM, Ferruh Yigit wrote: > > On 3/27/2023 6:31 AM, Deng, KaiwenX wrote: > >> > >> > >>> -----Original Message----- > >>> From: Ferruh Yigit <ferruh.yi...@amd.com> > >>> Sent: Thursday, March 23, 2023 11:39 PM > >>> To: Deng, KaiwenX <kaiwenx.d...@intel.com>; dev@dpdk.org > >>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou, > >>> YidingX <yidingx.z...@intel.com>; Chas Williams <ch...@att.com>; Min > >>> Hu (Connor) <humi...@huawei.com>; Wu, Jingjing > >>> <jingjing...@intel.com>; Xing, Beilei <beilei.x...@intel.com>; Mike > >>> Pattrick <m...@redhat.com>; Zhang, Qi Z <qi.z.zh...@intel.com>; > >>> Doherty, Declan <declan.dohe...@intel.com>; Mrzyglod, Daniel T > >>> <daniel.t.mrzyg...@intel.com>; Dapeng Yu <dapengx...@intel.com> > >>> Subject: Re: [PATCH v3] net/iavf: fix iavf query stats in intr > >>> thread > >>> > >>> On 3/22/2023 7:26 AM, Kaiwen Deng wrote: > >>>> When iavf send query-stats command in eal-intr-thread through > >>>> virtual channel, there will be no response received from > >>>> iavf_dev_virtchnl_handler for this command during block and wait. > >>>> Because iavf_dev_virtchnl_handler is also registered in eal-intr-thread. > >>>> > >>>> When vf device is bonded as BONDING_MODE_TLB mode, the slave > device > >>>> update callback will registered in alarm and called by > >>>> eal-intr-thread, it would also raise the above issue. > >>>> > >>>> This commit add to poll the response for VIRTCHNL_OP_GET_STATS > when > >>> it > >>>> is called by eal-intr-thread to fix this issue. > >>>> > >>>> Fixes: 91bf37d250aa ("net/iavf: add lock for VF commands") > >>>> Fixes: 22b123a36d07 ("net/avf: initialize PMD") > >>>> Fixes: 7c76a747e68c ("bond: add mode 5") > >>>> Fixes: 435d523112cc ("net/iavf: fix multi-process shared data") > >>>> Fixes: cb5c1b91f76f ("net/iavf: add thread for event callbacks") > >>> > >>> > >>> Hi Kaiwen, > >>> > >>> Above commit already seems trying to address same issue, it creates > >>> "iavf- event-thread" control thread to asyncroniously handle the > >>> interrupts, in non- interrupt context, why it is not working? > >>> > >>> Instead of adding 'rte_thread_is_intr()' checks, can't you make sure > >>> all interrupts handled in control tread? > >>> > >>> And can you please provide a stack trace in commit log, to describe > >>> the issue better? > >> Hi Ferru, > >> Sorry for my late reply, And thanks for your review. > >> > >> The above commit does not fix this issue when we need to get the > returned data. > >> If we call iavf_query_stats and wait for response statistics in the intr- > thread. > >> iavf_handle_virtchnl_msg is also registered in the intr_thread and > >> will not be executed while waiting. > >> > > > > Got it, since return value is required, API can't be called asyncroniously. > > > > > > > > I think 'rte_thread_is_intr()' checks may cause more trouble for you > > in long term, > > > > - why 'iavf_query_stats()' is called in the iterrupt thread, can it be > > prevented? > > > > - does it make sense to allways poll messages from PF (for simplification)? > > > > > > If answer to both are 'No', I am OK to continue with current proposal > > if you are happy with it. > > > > > btw, how critical is this issue? > > If it is critical, I am OK to get it as it is for this release and > investigate it further > for next release, since only a few days left for this release. > This is not a very crtical issue, we can wait for a more appropriate solution.
Thanks. > > > > >> This commit I changed it to polling for replies to commands executed in > the interrupt thread. > >> > >> main thread > >> interrupt > thread > >> | > >> | > >> | > >> | > >> iavf_query_stats > >> | > >> iavf_execute_vf_cmd > >> | > >> iavf_aq_send_msg_to_pf and wait handle complete > >> | > >> | > >> | > >> > >> |------------------------------------------------------------------------------------- > ------->| > >> | > >> | > >> | > iavf_handle_virtchnl_msg > >> | > >> | > >> > >> |<------------------------------------------------------------------------------------ > --------| > >> | > >> | > >> iavf_execute_vf_cmd get response > >> | > >> | > >> | > >> > >> The above is the stack trace for the normal execution of > >> iavf_query_stats in the main thread. > >> > >> interrupt thread > >> | > >> | > >> iavf_query_stats > >> iavf_execute_vf_cmd > >> iavf_aq_send_msg_to_pf wait handle complete(1 sec) > >> iavf_execute_vf_cmd timeout > >> | > >> | > >> iavf_handle_virtchnl_msg > >> | > >> > >> The above is the stack trace for the abnormal execution of > >> iavf_query_stats in the interrupt thread. > >>> > >>>> Cc: sta...@dpdk.org > >>>> > >>>> Signed-off-by: Kaiwen Deng <kaiwenx.d...@intel.com> > >>>> --- > >>>> Changes since v2: > >>>> - Add to poll the response for VIRTCHNL_OP_GET_STATS. > >>>> > >>>> Changes since v1: > >>>> - Add lock to avoid race condition. > >>>> --- > >>>> --- > >>>> drivers/net/bonding/rte_eth_bond_pmd.c | 7 ++- > >>>> drivers/net/iavf/iavf_ethdev.c | 5 +- > >>>> drivers/net/iavf/iavf_vchnl.c | 71 ++++++++++++++++++-------- > >>>> 3 files changed, 58 insertions(+), 25 deletions(-) > >>>> > >>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c > >>>> b/drivers/net/bonding/rte_eth_bond_pmd.c > >>>> index f0c4f7d26b..edce621496 100644 > >>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c > >>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c > >>>> @@ -894,6 +894,7 @@ bond_ethdev_update_tlb_slave_cb(void *arg) > >>>> uint8_t update_stats = 0; > >>>> uint16_t slave_id; > >>>> uint16_t i; > >>>> + int ret; > >>>> > >>>> internals->slave_update_idx++; > >>>> > >>>> @@ -903,7 +904,10 @@ bond_ethdev_update_tlb_slave_cb(void *arg) > >>>> > >>>> for (i = 0; i < internals->active_slave_count; i++) { > >>>> slave_id = internals->active_slaves[i]; > >>>> - rte_eth_stats_get(slave_id, &slave_stats); > >>>> + ret = rte_eth_stats_get(slave_id, &slave_stats); > >>>> + if (ret) > >>>> + goto OUT; > >>>> + > >>>> tx_bytes = slave_stats.obytes - > >>>> tlb_last_obytets[slave_id]; > >>>> bandwidth_left(slave_id, tx_bytes, > >>>> internals->slave_update_idx, > >>>> &bwg_array[i]); > >>> @@ -922,6 +926,7 @@ > >>>> bond_ethdev_update_tlb_slave_cb(void *arg) > >>>> for (i = 0; i < slave_count; i++) > >>>> internals->tlb_slaves_order[i] = bwg_array[i].slave; > >>>> > >>>> +OUT: > >>>> rte_eal_alarm_set(REORDER_PERIOD_MS * 1000, > >>> bond_ethdev_update_tlb_slave_cb, > >>>> (struct bond_dev_private *)internals); } diff > >>>> --git > >>>> a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c > >>>> index 3196210f2c..d6e1f1a7f4 100644 > >>>> --- a/drivers/net/iavf/iavf_ethdev.c > >>>> +++ b/drivers/net/iavf/iavf_ethdev.c > >>>> @@ -2607,6 +2607,9 @@ iavf_dev_init(struct rte_eth_dev *eth_dev) > >>>> adapter->dev_data = eth_dev->data; > >>>> adapter->stopped = 1; > >>>> > >>>> + if (iavf_dev_event_handler_init()) > >>>> + goto init_vf_err; > >>>> + > >>>> if (iavf_init_vf(eth_dev) != 0) { > >>>> PMD_INIT_LOG(ERR, "Init vf failed"); > >>>> return -1; > >>>> @@ -2634,8 +2637,6 @@ iavf_dev_init(struct rte_eth_dev *eth_dev) > >>>> rte_ether_addr_copy((struct rte_ether_addr *)hw->mac.addr, > >>>> ð_dev->data->mac_addrs[0]); > >>>> > >>>> - if (iavf_dev_event_handler_init()) > >>>> - goto init_vf_err; > >>>> > >>>> if (vf->vf_res->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_WB_ON_ITR) > >>> { > >>>> /* register callback func to eal lib */ diff --git > >>>> a/drivers/net/iavf/iavf_vchnl.c b/drivers/net/iavf/iavf_vchnl.c > >>>> index 9adaadb173..aeffb07cca 100644 > >>>> --- a/drivers/net/iavf/iavf_vchnl.c > >>>> +++ b/drivers/net/iavf/iavf_vchnl.c > >>>> @@ -256,6 +256,7 @@ iavf_read_msg_from_pf(struct iavf_adapter > >>> *adapter, uint16_t buf_len, > >>>> vf->link_speed = > >>> iavf_convert_link_speed(speed); > >>>> } > >>>> iavf_dev_link_update(vf->eth_dev, 0); > >>>> + iavf_dev_event_post(vf->eth_dev, > >>> RTE_ETH_EVENT_INTR_LSC, NULL, 0); > >>>> PMD_DRV_LOG(INFO, "Link status update:%s", > >>>> vf->link_up ? "up" : "down"); > >>>> break; > >>>> @@ -368,28 +369,48 @@ iavf_execute_vf_cmd(struct iavf_adapter > >>> *adapter, struct iavf_cmd_info *args, > >>>> _clear_cmd(vf); > >>>> break; > >>>> default: > >>>> - /* For other virtchnl ops in running time, > >>>> - * wait for the cmd done flag. > >>>> - */ > >>>> - do { > >>>> - if (vf->pend_cmd == VIRTCHNL_OP_UNKNOWN) > >>>> - break; > >>>> - iavf_msec_delay(ASQ_DELAY_MS); > >>>> - /* If don't read msg or read sys event, > >>>> continue */ > >>>> - } while (i++ < MAX_TRY_TIMES); > >>>> - > >>>> - if (i >= MAX_TRY_TIMES) { > >>>> - PMD_DRV_LOG(ERR, "No response for cmd %d", > >>> args->ops); > >>>> + if (rte_thread_is_intr()) { > >>>> + /* For virtchnl ops were executed in > >>>> eal_intr_thread, > >>>> + * need to poll the response. > >>>> + */ > >>>> + do { > >>>> + result = iavf_read_msg_from_pf(adapter, > >>> args->out_size, > >>>> + > >>>> args->out_buffer); > >>>> + if (result == IAVF_MSG_CMD) > >>>> + break; > >>>> + iavf_msec_delay(ASQ_DELAY_MS); > >>>> + } while (i++ < MAX_TRY_TIMES); > >>>> + if (i >= MAX_TRY_TIMES || > >>>> + vf->cmd_retval != > >>> VIRTCHNL_STATUS_SUCCESS) { > >>>> + err = -1; > >>>> + PMD_DRV_LOG(ERR, "No response or return > >>> failure (%d)" > >>>> + " for cmd %d", vf- > >>>> cmd_retval, args->ops); > >>>> + } > >>>> _clear_cmd(vf); > >>>> - err = -EIO; > >>>> - } else if (vf->cmd_retval == > >>>> - VIRTCHNL_STATUS_ERR_NOT_SUPPORTED) { > >>>> - PMD_DRV_LOG(ERR, "Cmd %d not supported", args- > >>>> ops); > >>>> - err = -ENOTSUP; > >>>> - } else if (vf->cmd_retval != VIRTCHNL_STATUS_SUCCESS) { > >>>> - PMD_DRV_LOG(ERR, "Return failure %d for cmd %d", > >>>> - vf->cmd_retval, args->ops); > >>>> - err = -EINVAL; > >>>> + } else { > >>>> + /* For other virtchnl ops in running time, > >>>> + * wait for the cmd done flag. > >>>> + */ > >>>> + do { > >>>> + if (vf->pend_cmd == > >>> VIRTCHNL_OP_UNKNOWN) > >>>> + break; > >>>> + iavf_msec_delay(ASQ_DELAY_MS); > >>>> + /* If don't read msg or read sys event, > >>> continue */ > >>>> + } while (i++ < MAX_TRY_TIMES); > >>>> + > >>>> + if (i >= MAX_TRY_TIMES) { > >>>> + PMD_DRV_LOG(ERR, "No response for > >>> cmd %d", args->ops); > >>>> + _clear_cmd(vf); > >>>> + err = -EIO; > >>>> + } else if (vf->cmd_retval == > >>>> + VIRTCHNL_STATUS_ERR_NOT_SUPPORTED) { > >>>> + PMD_DRV_LOG(ERR, "Cmd %d not > >>> supported", args->ops); > >>>> + err = -ENOTSUP; > >>>> + } else if (vf->cmd_retval != > >>> VIRTCHNL_STATUS_SUCCESS) { > >>>> + PMD_DRV_LOG(ERR, "Return failure %d for > >>> cmd %d", > >>>> + vf->cmd_retval, > >>>> args->ops); > >>>> + err = -EINVAL; > >>>> + } > >>>> } > >>>> break; > >>>> } > >>>> @@ -403,8 +424,14 @@ iavf_execute_vf_cmd_safe(struct > iavf_adapter > >>>> *adapter, { > >>>> struct iavf_info *vf = IAVF_DEV_PRIVATE_TO_VF(adapter); > >>>> int ret; > >>>> + int is_intr_thread = rte_thread_is_intr(); > >>>> > >>>> - rte_spinlock_lock(&vf->aq_lock); > >>>> + if (is_intr_thread) { > >>>> + if (!rte_spinlock_trylock(&vf->aq_lock)) > >>>> + return -EIO; > >>>> + } else { > >>>> + rte_spinlock_lock(&vf->aq_lock); > >>>> + } > >>>> ret = iavf_execute_vf_cmd(adapter, args, async); > >>>> rte_spinlock_unlock(&vf->aq_lock); > >>>> > >> > >