On 12/22/2022 7:23 AM, You, KaisenX wrote:
>
>
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>> Sent: December 21, 2022 21:49
>> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov,
>> Anatoly <anatoly.bura...@intel.com>; David Marchand
>> <david.march...@redhat.com>
>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou, YidingX
>> <yidingx.z...@intel.com>; Wu, Jingjing <jingjing...@intel.com>; Xing,
>> Beilei <beilei.x...@intel.com>; Zhang, Qi Z <qi.z.zh...@intel.com>; Luca
>> Boccassi <bl...@debian.org>; Mcnamara, John <john.mcnam...@intel.com>;
>> Kevin Traynor <ktray...@redhat.com>
>> Subject: Re: [PATCH] net/iavf:fix slow memory allocation
>>
>> On 12/20/2022 6:52 AM, You, KaisenX wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>>>> Sent: December 13, 2022 21:28
>>>> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov,
>>>> Anatoly <anatoly.bura...@intel.com>; David Marchand
>>>> <david.march...@redhat.com>
>>>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou,
>>>> YidingX <yidingx.z...@intel.com>; Wu, Jingjing
>>>> <jingjing...@intel.com>; Xing, Beilei <beilei.x...@intel.com>; Zhang,
>>>> Qi Z <qi.z.zh...@intel.com>; Luca Boccassi <bl...@debian.org>;
>>>> Mcnamara, John <john.mcnam...@intel.com>; Kevin Traynor
>>>> <ktray...@redhat.com>
>>>> Subject: Re: [PATCH] net/iavf:fix slow memory allocation
>>>>
>>>> On 12/13/2022 9:35 AM, Ferruh Yigit wrote:
>>>>> On 12/13/2022 7:52 AM, You, KaisenX wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>>>>>>> Sent: December 8, 2022 23:04
>>>>>>> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov,
>>>>>>> Anatoly <anatoly.bura...@intel.com>; David Marchand
>>>>>>> <david.march...@redhat.com>
>>>>>>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou,
>>>>>>> YidingX <yidingx.z...@intel.com>; Wu, Jingjing
>>>>>>> <jingjing...@intel.com>; Xing, Beilei <beilei.x...@intel.com>;
>>>>>>> Zhang, Qi Z <qi.z.zh...@intel.com>; Luca Boccassi
>>>>>>> <bl...@debian.org>; Mcnamara, John <john.mcnam...@intel.com>;
>>>>>>> Kevin Traynor <ktray...@redhat.com>
>>>>>>> Subject: Re: [PATCH] net/iavf:fix slow memory allocation
>>>>>>>
>>>>>>> On 11/17/2022 6:57 AM, Kaisen You wrote:
>>>>>>>> In some cases, DPDK does not allocate hugepage heap memory for
>>>>>>>> some sockets because of the user-supplied parameters (e.g. with
>>>>>>>> -l 40-79, socket 0 has no memory).
>>>>>>>> When the interrupt thread runs on a core of such a socket, every
>>>>>>>> allocation/release executes a whole set of heap
>>>>>>>> allocation/release operations, resulting in poor performance.
>>>>>>>> Instead, we call malloc() to get memory from the system's heap
>>>>>>>> space to fix this problem.
>>>>>>>>
>>>>>>>
>>>>>>> Hi Kaisen,
>>>>>>>
>>>>>>> Using libc malloc can improve performance for this case, but I
>>>>>>> would like to understand the root cause of the problem.
>>>>>>>
>>>>>>>
>>>>>>> As far as I can see, interrupt callbacks are run by the interrupt
>>>>>>> thread ("eal-intr-thread"), and the interrupt thread is created by
>>>>>>> the 'rte_ctrl_thread_create()' API.
>>>>>>>
>>>>>>> The 'rte_ctrl_thread_create()' comment mentions that "CPU affinity
>>>>>>> retrieved at the time 'rte_eal_init()' was called,"
>>>>>>>
>>>>>>> And 'rte_eal_init()' is run on the main lcore, which is the first
>>>>>>> lcore in the core list (unless otherwise defined with --main-lcore).
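>>>>>>>
>>>>>>> As a quick diagnostic (an untested sketch, not part of any fix;
>>>>>>> 'log_intr_cpu' is a hypothetical helper), something like the below,
>>>>>>> called from the driver interrupt callback, would show which CPU the
>>>>>>> interrupt thread actually runs on:
>>>>>>>
>>>>>>>     #define _GNU_SOURCE
>>>>>>>     #include <sched.h>
>>>>>>>     #include <stdio.h>
>>>>>>>
>>>>>>>     /* Print the CPU the calling thread is currently executing on;
>>>>>>>      * sched_getcpu() is glibc-specific. */
>>>>>>>     static void
>>>>>>>     log_intr_cpu(void)
>>>>>>>     {
>>>>>>>         printf("interrupt callback on CPU %d\n", sched_getcpu());
>>>>>>>     }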
>>>>>>>
>>>>>>> So, the interrupts should be running on a core that has hugepages
>>>>>>> allocated for it; am I missing something here?
>>>>>>>
>>>>>>>
>>>>>> Thanks for your comments. Let me try to explain the root cause here:
>>>>>> the CPU socket that eal-intr-thread runs on has no memory pool
>>>>>> created for it.
>>>>>> That results in memory subsequently being created/destroyed
>>>>>> frequently.
>>>>>>
>>>>>> When testpmd is started, a parameter (e.g. -l 40-79) is set.
>>>>>> Different OSes have different topologies. Some OSes, like SUSE, only
>>>>>> create a memory pool for one CPU socket, while other systems create
>>>>>> them for two.
>>>>>> That is why the problem occurs when using memory on different OSes.
>>>>>
>>>>>
>>>>> It is the testpmd application that decides which socket to allocate
>>>>> memory from, right? This is nothing specific to the OS.
>>>>>
>>>>> As far as I remember, testpmd's logic is to allocate from the sockets
>>>>> whose cores are used (provided with the -l parameter), and from the
>>>>> socket that the device is attached to.
>>>>>
>>>>> So, in a dual-socket system, if all used cores are in socket 1 and
>>>>> the NIC is in socket 1, no memory is allocated for socket 0. This is
>>>>> to optimize memory consumption.
>>>>>
>>>>>
>>>>> Can you please confirm that the problem you are observing is because
>>>>> the interrupt handler is running on a CPU which doesn't have memory
>>>>> allocated for its socket?
>>>>>
>>>>> In this case, what I don't understand is why interrupts are not
>>>>> running on the main lcore, which should be the first core in the
>>>>> list; for the "-l 40-79" sample it should be lcore 40.
>>>>> For your case, is the interrupt handler run on core 0, or on an
>>>>> arbitrary core?
>>>>> If so, can you please confirm that providing the core list as
>>>>> "-l 0,40-79" fixes the issue?
>>>>>
>>> First of all, sorry to reply to you so late.
>>> I can confirm that the problem I observed is because the interrupt
>>> handler is running on a CPU which doesn't have memory allocated for
>>> its socket.
>>>
>>> In my case, the interrupt handler is running on core 0.
>>> I tried providing "-l 0,40-79" as a startup parameter, and this
>>> resolves the issue.
>>>
>>> I must correct my previous statement that this problem only occurs on
>>> SUSE: it occurs on any OS, as long as the startup core list covers
>>> only node 1.
>>>
>>>>>
>>>>>>>
>>>>>>>
>>>>>>> And what about using the 'rte_malloc_socket()' API (instead of
>>>>>>> rte_malloc), which takes 'socket' as a parameter, and providing the
>>>>>>> socket that the device is on as the parameter to this API? Is it
>>>>>>> possible to test this?
>>>>>>>
>>>>>>>
>>>>>> As to the reason for not using rte_malloc_socket(): I thought
>>>>>> rte_malloc_socket() could solve the problem too, and the appropriate
>>>>>> parameter would be the socket_id for which the memory pool was
>>>>>> created at DPDK initialization. Assuming the socket_id of the
>>>>>> initially allocated memory is 1, first let eal-intr-thread determine
>>>>>> whether it is on that socket, then record this socket_id in
>>>>>> eal-intr-thread and pass it to iavf-event-thread. But there seems to
>>>>>> be no way to link this parameter to the iavf_dev_event_post()
>>>>>> function. That is why rte_malloc_socket is not used.
>>>>>>
>>>>>
>>>>> I was thinking the socket id of the device could be used, but that
>>>>> won't help if the core that the interrupt handler runs on is in a
>>>>> different socket.
>>>>> And I also don't know if there is a way to get the socket the
>>>>> interrupt thread is on. @David may help perhaps.
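>>>>>
>>>>> For reference, the kind of change I had in mind is the untested
>>>>> sketch below (assuming 'dev->data->numa_node' holds the device's
>>>>> socket); as said above, it only helps if the threads touching the
>>>>> memory also stay on that socket:
>>>>>
>>>>>     /* In iavf_dev_event_post(): allocate the event element from
>>>>>      * the NUMA node the device is attached to, instead of the
>>>>>      * calling thread's socket. */
>>>>>     struct iavf_event_element *elem =
>>>>>         rte_malloc_socket(NULL, sizeof(*elem) + param_alloc_size,
>>>>>                           0, dev->data->numa_node);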
>>>>>
>>>>> So the question is why the interrupt thread is not running on the
>>>>> main lcore.
>>>>>
>>>>
>>>> OK, after some talk with David, what I was missing is that
>>>> 'rte_ctrl_thread_create()' does NOT run its thread on the main lcore;
>>>> it can run on any core except the data plane cores.
>>>>
>>>> The driver's "iavf-event-thread" thread (iavf_dev_event_handle()) and
>>>> the interrupt thread (so the driver interrupt callback
>>>> iavf_dev_event_post()) can run on any core, making it hard to manage.
>>>> And it seems it is not possible to control where the interrupt thread
>>>> runs.
>>>>
>>>> One option can be allocating hugepages for all sockets, but this
>>>> requires user involvement and can't happen transparently.
>>>>
>>>> The other option can be to control where "iavf-event-thread" runs,
>>>> like using 'rte_thread_create()' to create the thread and providing
>>>> an attribute to run it on the main lcore
>>>> (rte_lcore_cpuset(rte_get_main_lcore()))?
>>>>
>>>> Can you please test the above option?
>>>>
>>>>
>>> The first option can solve this issue, but to borrow your previous
>>> words: "in a dual socket system, if all used cores are in socket 1 and
>>> the NIC is in socket 1, no memory is allocated for socket 0. This is
>>> to optimize memory consumption."
>>> I think it would be unreasonable to do so.
>>>
>>> About the other option: in the 'rte_eal_intr_init()' function, after
>>> the thread is created, I set the thread affinity for eal-intr-thread,
>>> but it does not solve this issue.
>>
>> Hi Kaisen,
>>
>> There are two threads involved.
>>
>> The first one is the interrupt thread, "eal-intr-thread", created by
>> 'rte_eal_intr_init()'.
>>
>> The second one is the iavf event handler, "iavf-event-thread", created
>> by 'iavf_dev_event_handler_init()'.
>>
>> The first one is triggered by an interrupt and puts a message on a
>> list; the second one consumes from the list and processes the message.
>> So I assume the two threads being on different sockets, or memory being
>> allocated on a different socket than the cores run on, causes the
>> performance issue.
>>
>> Did you test with the second thread, "iavf-event-thread", affinitized
>> to the main lcore (by creating the thread using the
>> 'rte_thread_create()' API)?
>>
>>
> I tried using the 'rte_thread_create()' API to create the second
> thread, but this issue still exists.
>
> Because the malloc is executed by "eal-intr-thread", it has nothing
> to do with "iavf-event-thread".
>
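To make sure we tried the same thing, here is roughly what I meant, as
an untested sketch; 'iavf_event_thread_fn', 'iavf_event_thread_start'
and 'iavf_event_tid' are hypothetical names, and a wrapper is needed
because 'rte_thread_create()' expects a uint32_t (*)(void *) function:

    #include <rte_lcore.h>
    #include <rte_thread.h>

    static rte_thread_t iavf_event_tid;  /* hypothetical storage */

    /* Hypothetical wrapper: adapts the existing handler to the
     * rte_thread_func signature expected by rte_thread_create(). */
    static uint32_t
    iavf_event_thread_fn(void *param)
    {
        iavf_dev_event_handle(param);
        return 0;
    }

    static int
    iavf_event_thread_start(void)
    {
        rte_thread_attr_t attr;
        /* Pin "iavf-event-thread" to the main lcore, whose socket is
         * guaranteed to have hugepage memory allocated. */
        rte_cpuset_t cpuset = rte_lcore_cpuset(rte_get_main_lcore());

        rte_thread_attr_init(&attr);
        rte_thread_attr_set_affinity(&attr, &cpuset);
        return rte_thread_create(&iavf_event_tid, &attr,
                                 iavf_event_thread_fn, NULL);
    }
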
Since the 'iavf_event_element' ('pos'), which is allocated by malloc(),
is accessed and freed in "iavf-event-thread", it could be related. But
if it doesn't fix the issue, that is OK; thanks for testing.

> But I found a patch similar to my issue:
> https://patchwork.dpdk.org/project/dpdk/patch/20221221104858.296530-1-david.march...@redhat.com/
> According to the patch modification, this issue can be solved.
>

I guess that patch was inspired by this discussion, and if it fixes the
issue, I prefer that one as the generic solution.

>>>>
>>>>>> Let me know if there is anything else unclear.
>>>>>>>
>>>>>>>> Fixes: cb5c1b91f76f ("net/iavf: add thread for event callbacks")
>>>>>>>> Cc: sta...@dpdk.org
>>>>>>>>
>>>>>>>> Signed-off-by: Kaisen You <kaisenx....@intel.com>
>>>>>>>> ---
>>>>>>>>  drivers/net/iavf/iavf_vchnl.c | 8 +++-----
>>>>>>>>  1 file changed, 3 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/iavf/iavf_vchnl.c
>>>>>>>> b/drivers/net/iavf/iavf_vchnl.c
>>>>>>>> index f92daf97f2..a05791fe48 100644
>>>>>>>> --- a/drivers/net/iavf/iavf_vchnl.c
>>>>>>>> +++ b/drivers/net/iavf/iavf_vchnl.c
>>>>>>>> @@ -36,7 +36,6 @@ struct iavf_event_element {
>>>>>>>>  	struct rte_eth_dev *dev;
>>>>>>>>  	enum rte_eth_event_type event;
>>>>>>>>  	void *param;
>>>>>>>> -	size_t param_alloc_size;
>>>>>>>>  	uint8_t param_alloc_data[0];
>>>>>>>>  };
>>>>>>>>
>>>>>>>> @@ -80,7 +79,7 @@ iavf_dev_event_handle(void *param __rte_unused)
>>>>>>>>  		TAILQ_FOREACH_SAFE(pos, &pending, next, save_next) {
>>>>>>>>  			TAILQ_REMOVE(&pending, pos, next);
>>>>>>>>  			rte_eth_dev_callback_process(pos->dev, pos->event, pos->param);
>>>>>>>> -			rte_free(pos);
>>>>>>>> +			free(pos);
>>>>>>>>  		}
>>>>>>>>  	}
>>>>>>>>
>>>>>>>> @@ -94,14 +93,13 @@ iavf_dev_event_post(struct rte_eth_dev *dev,
>>>>>>>>  {
>>>>>>>>  	struct iavf_event_handler *handler = &event_handler;
>>>>>>>>  	char notify_byte;
>>>>>>>> -	struct iavf_event_element *elem = rte_malloc(NULL, sizeof(*elem) + param_alloc_size, 0);
>>>>>>>> +	struct iavf_event_element *elem = malloc(sizeof(*elem) + param_alloc_size);
>>>>>>>>  	if (!elem)
>>>>>>>>  		return;
>>>>>>>>
>>>>>>>>  	elem->dev = dev;
>>>>>>>>  	elem->event = event;
>>>>>>>>  	elem->param = param;
>>>>>>>> -	elem->param_alloc_size = param_alloc_size;
>>>>>>>>  	if (param && param_alloc_size) {
>>>>>>>>  		rte_memcpy(elem->param_alloc_data, param, param_alloc_size);
>>>>>>>>  		elem->param = elem->param_alloc_data;
>>>>>>>> @@ -165,7 +163,7 @@ iavf_dev_event_handler_fini(void)
>>>>>>>>  	struct iavf_event_element *pos, *save_next;
>>>>>>>>  	TAILQ_FOREACH_SAFE(pos, &handler->pending, next, save_next) {
>>>>>>>>  		TAILQ_REMOVE(&handler->pending, pos, next);
>>>>>>>> -		rte_free(pos);
>>>>>>>> +		free(pos);
>>>>>>>>  	}
>>>>>>>>  }
>>>>>>>>
>>>>>>
>>>>>
>>>
>