On 5/23/2023 2:45 AM, Deng, KaiwenX wrote:
>
>
>> -----Original Message-----
>> From: Deng, KaiwenX
>> Sent: Friday, May 5, 2023 10:31 AM
>> To: Ferruh Yigit <ferruh.yi...@amd.com>; dev@dpdk.org
>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou, YidingX
>> <yidingx.z...@intel.com>; Chas Williams <ch...@att.com>; Min Hu (Connor)
>> <humi...@huawei.com>; Wu, Jingjing <jingjing...@intel.com>; Xing, Beilei
>> <beilei.x...@intel.com>; Mike Pattrick <m...@redhat.com>; Zhang, Qi Z
>> <qi.z.zh...@intel.com>; Doherty, Declan <declan.dohe...@intel.com>;
>> Mrzyglod, Daniel T <daniel.t.mrzyg...@intel.com>; Dapeng Yu
>> <dapengx...@intel.com>; Zhang, Helin <helin.zh...@intel.com>;
>> Mcnamara, John <john.mcnam...@intel.com>; Thomas Monjalon
>> <tho...@monjalon.net>
>> Subject: RE: [PATCH v3] net/iavf: fix iavf query stats in intr thread
>>
>>
>>
>>> -----Original Message-----
>>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>>> Sent: Monday, March 27, 2023 8:38 PM
>>> To: Deng, KaiwenX <kaiwenx.d...@intel.com>; dev@dpdk.org
>>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou,
>>> YidingX <yidingx.z...@intel.com>; Chas Williams <ch...@att.com>; Min
>>> Hu (Connor) <humi...@huawei.com>; Wu, Jingjing
>>> <jingjing...@intel.com>; Xing, Beilei <beilei.x...@intel.com>; Mike
>>> Pattrick <m...@redhat.com>; Zhang, Qi Z <qi.z.zh...@intel.com>;
>>> Doherty, Declan <declan.dohe...@intel.com>; Mrzyglod, Daniel T
>>> <daniel.t.mrzyg...@intel.com>; Dapeng Yu <dapengx...@intel.com>;
>>> Zhang, Helin <helin.zh...@intel.com>; Mcnamara, John
>>> <john.mcnam...@intel.com>; Thomas Monjalon <tho...@monjalon.net>
>>> Subject: Re: [PATCH v3] net/iavf: fix iavf query stats in intr thread
>>>
>>> On 3/27/2023 1:31 PM, Ferruh Yigit wrote:
>>>> On 3/27/2023 6:31 AM, Deng, KaiwenX wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>>>>>> Sent: Thursday, March 23, 2023 11:39 PM
>>>>>> To: Deng, KaiwenX
>>>>>> <kaiwenx.d...@intel.com>; dev@dpdk.org
>>>>>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou,
>>>>>> YidingX <yidingx.z...@intel.com>; Chas Williams <ch...@att.com>;
>>>>>> Min Hu (Connor) <humi...@huawei.com>; Wu, Jingjing
>>>>>> <jingjing...@intel.com>; Xing, Beilei <beilei.x...@intel.com>;
>>>>>> Mike Pattrick <m...@redhat.com>; Zhang, Qi Z
>>>>>> <qi.z.zh...@intel.com>; Doherty, Declan
>>>>>> <declan.dohe...@intel.com>; Mrzyglod, Daniel T
>>>>>> <daniel.t.mrzyg...@intel.com>; Dapeng Yu <dapengx...@intel.com>
>>>>>> Subject: Re: [PATCH v3] net/iavf: fix iavf query stats in intr
>>>>>> thread
>>>>>>
>>>>>> On 3/22/2023 7:26 AM, Kaiwen Deng wrote:
>>>>>>> When iavf sends the query-stats command in the eal-intr-thread
>>>>>>> through the virtual channel, no response is received from
>>>>>>> iavf_dev_virtchnl_handler for this command while it blocks and
>>>>>>> waits, because iavf_dev_virtchnl_handler is also registered in
>>>>>>> the eal-intr-thread.
>>>>>>>
>>>>>>> When a VF device is bonded in BONDING_MODE_TLB mode, the slave
>>>>>>> device update callback is registered as an alarm and called by
>>>>>>> the eal-intr-thread, which raises the same issue.
>>>>>>>
>>>>>>> This commit adds polling for the response to VIRTCHNL_OP_GET_STATS
>>>>>>> when it is called from the eal-intr-thread, to fix this issue.
>>>>>>>
>>>>>>> Fixes: 91bf37d250aa ("net/iavf: add lock for VF commands")
>>>>>>> Fixes: 22b123a36d07 ("net/avf: initialize PMD")
>>>>>>> Fixes: 7c76a747e68c ("bond: add mode 5")
>>>>>>> Fixes: 435d523112cc ("net/iavf: fix multi-process shared data")
>>>>>>> Fixes: cb5c1b91f76f ("net/iavf: add thread for event callbacks")
>>>>>>
>>>>>>
>>>>>> Hi Kaiwen,
>>>>>>
>>>>>> The commit above already seems to address the same issue: it
>>>>>> creates the "iavf-event-thread" control thread to asynchronously
>>>>>> handle the interrupts in non-interrupt context. Why is that not
>>>>>> working?
>>>>>>
>>>>>> Instead of adding 'rte_thread_is_intr()' checks, can't you make
>>>>>> sure all interrupts are handled in the control thread?
>>>>>>
>>>>>> And can you please provide a stack trace in the commit log, to
>>>>>> describe the issue better?
>>>>> Hi Ferruh,
>>>>> Sorry for my late reply, and thanks for your review.
>>>>>
>>>>> The above commit does not fix this issue when we need to get the
>>>>> returned data. If we call iavf_query_stats and wait for the
>>>>> response statistics in the intr-thread, iavf_handle_virtchnl_msg,
>>>>> which is also registered in the intr-thread, will not be executed
>>>>> while we wait.
>>>>>
>>>>
>>>> Got it; since a return value is required, the API can't be called
>>>> asynchronously.
>>>>
>>>>
>>>>
>>>> I think the 'rte_thread_is_intr()' checks may cause more trouble for
>>>> you in the long term:
>>>>
>>>> - why is 'iavf_query_stats()' called in the interrupt thread, and
>>>> can it be prevented?
>>>>
>>>> - does it make sense to always poll messages from the PF (for
>>>> simplification)?
>>>>
>>>>
>>>> If the answer to both is 'No', I am OK to continue with the current
>>>> proposal if you are happy with it.
>>>>
>>>
>>>
>>> btw, how critical is this issue?
>>>
>>> If it is critical, I am OK to take it as it is for this release and
>>> investigate further for the next release, since only a few days are
>>> left in this release.
>>>
>> Hi Ferruh,
>>
>> I didn't find a more suitable solution after consideration; if you
>> have a better suggestion, please let me know, thanks.
> Hi Ferruh,
> Can you please take a look at this patch.
>
Hi Kaiwen,

Sorry for the delay; let's continue with this solution.

I thought calling from the "iavf-event-thread" could be an option, but
your description clarifies why it is not, and I also don't have a better
solution, so I think we can continue as it is.

> Thanks.
>>>
>>>>
>>>>> This commit changes it to poll for replies to commands executed in
>>>>> the interrupt thread.
>>>>>
>>>>> main thread                                       interrupt thread
>>>>>      |                                                   |
>>>>> iavf_query_stats                                         |
>>>>> iavf_execute_vf_cmd                                      |
>>>>> iavf_aq_send_msg_to_pf and wait handle complete          |
>>>>>      |------------------------------------------------>  |
>>>>>      |                                        iavf_handle_virtchnl_msg
>>>>>      |  <------------------------------------------------|
>>>>> iavf_execute_vf_cmd get response                         |
>>>>>      |                                                   |
>>>>>
>>>>> The above is the stack trace for the normal execution of
>>>>> iavf_query_stats in the main thread.
>>>>>
>>>>> interrupt thread
>>>>>      |
>>>>> iavf_query_stats
>>>>> iavf_execute_vf_cmd
>>>>> iavf_aq_send_msg_to_pf wait handle complete (1 sec)
>>>>> iavf_execute_vf_cmd timeout
>>>>>      |
>>>>> iavf_handle_virtchnl_msg
>>>>>      |
>>>>>
>>>>> The above is the stack trace for the abnormal execution of
>>>>> iavf_query_stats in the interrupt thread.
>>>>>>
>>>>>>> Cc: sta...@dpdk.org
>>>>>>>
>>>>>>> Signed-off-by: Kaiwen Deng <kaiwenx.d...@intel.com>
>>>>>>> ---
>>>>>>> Changes since v2:
>>>>>>> - Add to poll the response for VIRTCHNL_OP_GET_STATS.
>>>>>>>
>>>>>>> Changes since v1:
>>>>>>> - Add lock to avoid race condition.
>>>>>>> ---
>>>>>>> ---
>>>>>>>  drivers/net/bonding/rte_eth_bond_pmd.c |  7 ++-
>>>>>>>  drivers/net/iavf/iavf_ethdev.c         |  5 +-
>>>>>>>  drivers/net/iavf/iavf_vchnl.c          | 71 ++++++++++++++++++--------
>>>>>>>  3 files changed, 58 insertions(+), 25 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>> index f0c4f7d26b..edce621496 100644
>>>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>> @@ -894,6 +894,7 @@ bond_ethdev_update_tlb_slave_cb(void *arg)
>>>>>>>  	uint8_t update_stats = 0;
>>>>>>>  	uint16_t slave_id;
>>>>>>>  	uint16_t i;
>>>>>>> +	int ret;
>>>>>>>
>>>>>>>  	internals->slave_update_idx++;
>>>>>>>
>>>>>>> @@ -903,7 +904,10 @@ bond_ethdev_update_tlb_slave_cb(void *arg)
>>>>>>>
>>>>>>>  	for (i = 0; i < internals->active_slave_count; i++) {
>>>>>>>  		slave_id = internals->active_slaves[i];
>>>>>>> -		rte_eth_stats_get(slave_id, &slave_stats);
>>>>>>> +		ret = rte_eth_stats_get(slave_id, &slave_stats);
>>>>>>> +		if (ret)
>>>>>>> +			goto OUT;
>>>>>>> +
>>>>>>>  		tx_bytes = slave_stats.obytes - tlb_last_obytets[slave_id];
>>>>>>>  		bandwidth_left(slave_id, tx_bytes,
>>>>>>>  				internals->slave_update_idx, &bwg_array[i]);
>>>>>>> @@ -922,6 +926,7 @@ bond_ethdev_update_tlb_slave_cb(void *arg)
>>>>>>>  	for (i = 0; i < slave_count; i++)
>>>>>>>  		internals->tlb_slaves_order[i] = bwg_array[i].slave;
>>>>>>>
>>>>>>> +OUT:
>>>>>>>  	rte_eal_alarm_set(REORDER_PERIOD_MS * 1000,
>>>>>>>  			bond_ethdev_update_tlb_slave_cb,
>>>>>>>  			(struct bond_dev_private *)internals);
>>>>>>>  }
>>>>>>> diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
>>>>>>> index 3196210f2c..d6e1f1a7f4 100644
>>>>>>> --- a/drivers/net/iavf/iavf_ethdev.c
>>>>>>> +++ b/drivers/net/iavf/iavf_ethdev.c
>>>>>>> @@ -2607,6 +2607,9 @@ iavf_dev_init(struct rte_eth_dev *eth_dev)
>>>>>>>  	adapter->dev_data = eth_dev->data;
>>>>>>>  	adapter->stopped = 1;
>>>>>>>
>>>>>>> +	if (iavf_dev_event_handler_init())
>>>>>>> +		goto init_vf_err;
>>>>>>> +
>>>>>>>  	if (iavf_init_vf(eth_dev) != 0) {
>>>>>>>  		PMD_INIT_LOG(ERR, "Init vf failed");
>>>>>>>  		return -1;
>>>>>>> @@ -2634,8 +2637,6 @@ iavf_dev_init(struct rte_eth_dev *eth_dev)
>>>>>>>  	rte_ether_addr_copy((struct rte_ether_addr *)hw->mac.addr,
>>>>>>>  			&eth_dev->data->mac_addrs[0]);
>>>>>>>
>>>>>>> -	if (iavf_dev_event_handler_init())
>>>>>>> -		goto init_vf_err;
>>>>>>>
>>>>>>>  	if (vf->vf_res->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_WB_ON_ITR) {
>>>>>>>  		/* register callback func to eal lib */
>>>>>>> diff --git a/drivers/net/iavf/iavf_vchnl.c b/drivers/net/iavf/iavf_vchnl.c
>>>>>>> index 9adaadb173..aeffb07cca 100644
>>>>>>> --- a/drivers/net/iavf/iavf_vchnl.c
>>>>>>> +++ b/drivers/net/iavf/iavf_vchnl.c
>>>>>>> @@ -256,6 +256,7 @@ iavf_read_msg_from_pf(struct iavf_adapter *adapter, uint16_t buf_len,
>>>>>>>  			vf->link_speed = iavf_convert_link_speed(speed);
>>>>>>>  		}
>>>>>>>  		iavf_dev_link_update(vf->eth_dev, 0);
>>>>>>> +		iavf_dev_event_post(vf->eth_dev, RTE_ETH_EVENT_INTR_LSC, NULL, 0);
>>>>>>>  		PMD_DRV_LOG(INFO, "Link status update:%s",
>>>>>>>  				vf->link_up ? "up" : "down");
>>>>>>>  		break;
>>>>>>> @@ -368,28 +369,48 @@ iavf_execute_vf_cmd(struct iavf_adapter *adapter, struct iavf_cmd_info *args,
>>>>>>>  		_clear_cmd(vf);
>>>>>>>  		break;
>>>>>>>  	default:
>>>>>>> -		/* For other virtchnl ops in running time,
>>>>>>> -		 * wait for the cmd done flag.
>>>>>>> -		 */
>>>>>>> -		do {
>>>>>>> -			if (vf->pend_cmd == VIRTCHNL_OP_UNKNOWN)
>>>>>>> -				break;
>>>>>>> -			iavf_msec_delay(ASQ_DELAY_MS);
>>>>>>> -			/* If don't read msg or read sys event, continue */
>>>>>>> -		} while (i++ < MAX_TRY_TIMES);
>>>>>>> -
>>>>>>> -		if (i >= MAX_TRY_TIMES) {
>>>>>>> -			PMD_DRV_LOG(ERR, "No response for cmd %d", args->ops);
>>>>>>> +		if (rte_thread_is_intr()) {
>>>>>>> +			/* For virtchnl ops were executed in eal_intr_thread,
>>>>>>> +			 * need to poll the response.
>>>>>>> +			 */
>>>>>>> +			do {
>>>>>>> +				result = iavf_read_msg_from_pf(adapter, args->out_size,
>>>>>>> +							args->out_buffer);
>>>>>>> +				if (result == IAVF_MSG_CMD)
>>>>>>> +					break;
>>>>>>> +				iavf_msec_delay(ASQ_DELAY_MS);
>>>>>>> +			} while (i++ < MAX_TRY_TIMES);
>>>>>>> +			if (i >= MAX_TRY_TIMES ||
>>>>>>> +			    vf->cmd_retval != VIRTCHNL_STATUS_SUCCESS) {
>>>>>>> +				err = -1;
>>>>>>> +				PMD_DRV_LOG(ERR, "No response or return failure (%d)"
>>>>>>> +						" for cmd %d", vf->cmd_retval, args->ops);
>>>>>>> +			}
>>>>>>>  			_clear_cmd(vf);
>>>>>>> -			err = -EIO;
>>>>>>> -		} else if (vf->cmd_retval ==
>>>>>>> -			VIRTCHNL_STATUS_ERR_NOT_SUPPORTED) {
>>>>>>> -			PMD_DRV_LOG(ERR, "Cmd %d not supported", args->ops);
>>>>>>> -			err = -ENOTSUP;
>>>>>>> -		} else if (vf->cmd_retval != VIRTCHNL_STATUS_SUCCESS) {
>>>>>>> -			PMD_DRV_LOG(ERR, "Return failure %d for cmd %d",
>>>>>>> -					vf->cmd_retval, args->ops);
>>>>>>> -			err = -EINVAL;
>>>>>>> +		} else {
>>>>>>> +			/* For other virtchnl ops in running time,
>>>>>>> +			 * wait for the cmd done flag.
>>>>>>> +			 */
>>>>>>> +			do {
>>>>>>> +				if (vf->pend_cmd == VIRTCHNL_OP_UNKNOWN)
>>>>>>> +					break;
>>>>>>> +				iavf_msec_delay(ASQ_DELAY_MS);
>>>>>>> +				/* If don't read msg or read sys event, continue */
>>>>>>> +			} while (i++ < MAX_TRY_TIMES);
>>>>>>> +
>>>>>>> +			if (i >= MAX_TRY_TIMES) {
>>>>>>> +				PMD_DRV_LOG(ERR, "No response for cmd %d", args->ops);
>>>>>>> +				_clear_cmd(vf);
>>>>>>> +				err = -EIO;
>>>>>>> +			} else if (vf->cmd_retval == VIRTCHNL_STATUS_ERR_NOT_SUPPORTED) {
>>>>>>> +				PMD_DRV_LOG(ERR, "Cmd %d not supported", args->ops);
>>>>>>> +				err = -ENOTSUP;
>>>>>>> +			} else if (vf->cmd_retval != VIRTCHNL_STATUS_SUCCESS) {
>>>>>>> +				PMD_DRV_LOG(ERR, "Return failure %d for cmd %d",
>>>>>>> +						vf->cmd_retval, args->ops);
>>>>>>> +				err = -EINVAL;
>>>>>>> +			}
>>>>>>>  		}
>>>>>>>  		break;
>>>>>>>  	}
>>>>>>> @@ -403,8 +424,14 @@ iavf_execute_vf_cmd_safe(struct iavf_adapter *adapter,
>>>>>>>  {
>>>>>>>  	struct iavf_info *vf = IAVF_DEV_PRIVATE_TO_VF(adapter);
>>>>>>>  	int ret;
>>>>>>> +	int is_intr_thread = rte_thread_is_intr();
>>>>>>>
>>>>>>> -	rte_spinlock_lock(&vf->aq_lock);
>>>>>>> +	if (is_intr_thread) {
>>>>>>> +		if (!rte_spinlock_trylock(&vf->aq_lock))
>>>>>>> +			return -EIO;
>>>>>>> +	} else {
>>>>>>> +		rte_spinlock_lock(&vf->aq_lock);
>>>>>>> +	}
>>>>>>>  	ret = iavf_execute_vf_cmd(adapter, args, async);
>>>>>>>  	rte_spinlock_unlock(&vf->aq_lock);
>>>>>>>
>>>>>
>>>>
>
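[Editor's note] The locking policy the patch adds to iavf_execute_vf_cmd_safe() can be sketched outside DPDK with plain pthreads. This is a minimal illustration only: aq_lock, in_intr_thread, and execute_cmd_safe() here are illustrative stand-ins for vf->aq_lock, rte_thread_is_intr(), and the driver function, not the DPDK API. The point it shows: the interrupt thread must never block on the command lock, because the thread that would complete the pending command and release it may be this very thread, so blocking would deadlock; instead it tries the lock once and fails fast with -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for vf->aq_lock, which serializes admin-queue commands. */
static pthread_mutex_t aq_lock = PTHREAD_MUTEX_INITIALIZER;

/* Sketch of the patch's policy: block on the lock only in normal
 * (non-interrupt) context; in interrupt context, try once and bail. */
static int execute_cmd_safe(bool in_intr_thread)
{
    if (in_intr_thread) {
        if (pthread_mutex_trylock(&aq_lock) != 0)
            return -EIO;   /* lock busy: fail fast, never deadlock */
    } else {
        pthread_mutex_lock(&aq_lock);  /* normal context may wait */
    }

    /* ... send the command; in interrupt context the real driver then
     * polls iavf_read_msg_from_pf() for the reply, since the virtchnl
     * handler cannot run while this thread is occupied ... */

    pthread_mutex_unlock(&aq_lock);
    return 0;
}
```

When the lock is free, both paths succeed; when another caller holds it, the interrupt-context path returns -EIO immediately rather than spinning, which is exactly why the bonding callback above must check rte_eth_stats_get()'s return value instead of assuming success.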