On 12/20/2022 6:52 AM, You, KaisenX wrote:
>
>
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>> Sent: December 13, 2022 21:28
>> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov,
>> Anatoly <anatoly.bura...@intel.com>; David Marchand
>> <david.march...@redhat.com>
>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou, YidingX
>> <yidingx.z...@intel.com>; Wu, Jingjing <jingjing...@intel.com>; Xing,
>> Beilei <beilei.x...@intel.com>; Zhang, Qi Z <qi.z.zh...@intel.com>; Luca
>> Boccassi <bl...@debian.org>; Mcnamara, John <john.mcnam...@intel.com>;
>> Kevin Traynor <ktray...@redhat.com>
>> Subject: Re: [PATCH] net/iavf:fix slow memory allocation
>>
>> On 12/13/2022 9:35 AM, Ferruh Yigit wrote:
>>> On 12/13/2022 7:52 AM, You, KaisenX wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>>>>> Sent: December 8, 2022 23:04
>>>>> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov,
>>>>> Anatoly <anatoly.bura...@intel.com>; David Marchand
>>>>> <david.march...@redhat.com>
>>>>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou,
>>>>> YidingX <yidingx.z...@intel.com>; Wu, Jingjing <jingjing...@intel.com>;
>>>>> Xing, Beilei <beilei.x...@intel.com>; Zhang, Qi Z <qi.z.zh...@intel.com>;
>>>>> Luca Boccassi <bl...@debian.org>; Mcnamara, John <john.mcnam...@intel.com>;
>>>>> Kevin Traynor <ktray...@redhat.com>
>>>>> Subject: Re: [PATCH] net/iavf:fix slow memory allocation
>>>>>
>>>>> On 11/17/2022 6:57 AM, Kaisen You wrote:
>>>>>> In some cases, DPDK does not allocate hugepage heap memory to some
>>>>>> sockets due to the user setting parameters (e.g. -l 40-79, SOCKET 0
>>>>>> has no memory).
>>>>>> When the interrupt thread runs on the corresponding core of this
>>>>>> socket, each allocation/release will execute a whole set of heap
>>>>>> allocation/release operations, resulting in poor performance.
>>>>>> Instead we call malloc() to get memory from the system's heap space
>>>>>> to fix this problem.
>>>>>>
>>>>>
>>>>> Hi Kaisen,
>>>>>
>>>>> Using libc malloc can improve performance for this case, but I would
>>>>> like to understand the root cause of the problem.
>>>>>
>>>>>
>>>>> As far as I can see, interrupt callbacks are run by the interrupt
>>>>> thread ("eal-intr-thread"), and the interrupt thread is created by the
>>>>> 'rte_ctrl_thread_create()' API.
>>>>>
>>>>> The 'rte_ctrl_thread_create()' comment mentions that "CPU affinity
>>>>> retrieved at the time 'rte_eal_init()' was called,"
>>>>>
>>>>> and 'rte_eal_init()' is run on the main lcore, which is the first
>>>>> lcore in the core list (unless otherwise defined with --main-lcore).
>>>>>
>>>>> So the interrupts should be running on a core that has hugepages
>>>>> allocated for it, am I missing something here?
>>>>>
>>>>>
>>>> Thanks for your comments. Let me try to explain the root cause here:
>>>> the socket corresponding to the CPU that eal-intr-thread runs on has
>>>> no memory pool created. That results in frequent memory pool
>>>> creation/destruction afterwards.
>>>>
>>>> When testpmd is started, the parameter (e.g. -l 40-79) is set.
>>>> Different OSes have different topologies. Some OSes, like SUSE, only
>>>> create a memory pool for one CPU socket, while other systems create
>>>> them for two. That is why the problem occurs when using memory on
>>>> different OSes.
>>>
>>>
>>> It is the testpmd application that decides which socket to allocate
>>> memory from, right? This is nothing specific to the OS.
>>>
>>> As far as I remember, the testpmd logic is to allocate from the sockets
>>> whose cores are used (provided with the -l parameter), and from the
>>> socket that the device is attached to.
>>>
>>> So, in a dual socket system, if all used cores are in socket 1 and the
>>> NIC is in socket 1, no memory is allocated for socket 0. This is to
>>> optimize memory consumption.
>>>
>>>
>>> Can you please confirm that the problem you are observing is because
>>> the interrupt handler is running on a CPU which doesn't have memory
>>> allocated for its socket?
>>>
>>> In this case, what I don't understand is why the interrupts are not
>>> running on the main lcore, which should be the first core in the list;
>>> for the "-l 40-79" sample it should be lcore 40.
>>> For your case, does the interrupt handler run on core 0? Or on an
>>> arbitrary core?
>>> If so, can you please confirm whether providing the core list as
>>> "-l 0,40-79" fixes the issue?
>>>
> First of all, sorry to reply to you so late.
> I can confirm that the problem I observed is because the interrupt
> handler is running on a CPU which doesn't have memory allocated for its
> socket.
>
> In my case, the interrupt handler is running on core 0.
> I tried providing "-l 0,40-79" as a startup parameter, and this issue is
> resolved.
>
> I would like to correct my previous statement that this problem only
> occurs on SUSE. On any OS, this problem occurs as long as the range of
> startup parameters covers only node 1.
>
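For illustration, making memory available on socket 0 from the user side
amounts to EAL invocations along these lines (hypothetical testpmd runs; the
--socket-mem amounts are placeholders only):

  # Include a socket-0 core so a heap is also created for socket 0:
  dpdk-testpmd -l 0,40-79 -- -i

  # Or pre-allocate hugepage memory on both sockets explicitly:
  dpdk-testpmd -l 40-79 --socket-mem=1024,1024 -- -i
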
>>>
>>>>>
>>>>>
>>>>> And what about using the 'rte_malloc_socket()' API (instead of
>>>>> rte_malloc), which takes 'socket' as a parameter, and providing the
>>>>> socket that the device is on as the parameter to this API? Is it
>>>>> possible to test this?
>>>>>
>>>>>
>>>> As to the reason for not using rte_malloc_socket(): I thought
>>>> rte_malloc_socket() could solve the problem too, and the appropriate
>>>> parameter should be the socket_id for which the memory pool was
>>>> created at DPDK initialization. Assuming that the socket_id of the
>>>> initially allocated memory = 1, first let the eal-intr-thread
>>>> determine if it is on that socket_id, then record this socket_id in
>>>> the eal-intr-thread and pass it to the iavf-event-thread. But there
>>>> seems to be no way to link this parameter to the
>>>> iavf_dev_event_post() function. That is why rte_malloc_socket() is
>>>> not used.
>>>>
>>>
>>> I was thinking the socket id of the device could be used, but that
>>> won't help if the core that the interrupt handler runs on is in a
>>> different socket.
>>> And I also don't know if there is a way to get the socket that the
>>> interrupt thread is on. @David may help perhaps.
>>>
>>> So the question is why the interrupt thread is not running on the main
>>> lcore.
>>>
>>
>> OK, after some talk with David, what I was missing is that a thread
>> created by 'rte_ctrl_thread_create()' does NOT run on the main lcore;
>> it can run on any core except the data plane cores.
>>
>> The driver "iavf-event-thread" thread (iavf_dev_event_handle()) and the
>> interrupt thread (so the driver interrupt callback
>> iavf_dev_event_post()) can run on any core, making it hard to manage.
>> And it seems it is not possible to control where the interrupt thread
>> runs.
>>
>> One option can be allocating hugepages for all sockets, but this
>> requires user involvement and can't happen transparently.
>>
>> The other option can be to control where "iavf-event-thread" runs, like
>> using 'rte_thread_create()' to create the thread and providing an
>> attribute to run it on the main lcore
>> (rte_lcore_cpuset(rte_get_main_lcore()))?
>>
>> Can you please test the above option?
>>
>>
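For reference, a minimal sketch of that second option, assuming the
rte_thread API; this is untested, and the names introduced below
(iavf_event_thread_create_on_main_lcore, iavf_event_tid) are only
illustrative:

#include <rte_lcore.h>
#include <rte_thread.h>

/* Existing event loop, adapted to the rte_thread_func signature
 * (returns uint32_t instead of void *). */
static uint32_t iavf_dev_event_handle(void *param);

/* Would replace the handler's pthread_t tid field. */
static rte_thread_t iavf_event_tid;

static int
iavf_event_thread_create_on_main_lcore(void *arg)
{
	rte_thread_attr_t attr;
	rte_cpuset_t cpuset = rte_lcore_cpuset(rte_get_main_lcore());
	int ret;

	ret = rte_thread_attr_init(&attr);
	if (ret != 0)
		return ret;

	/* Pin the event thread to the main lcore, which is guaranteed to
	 * be on a socket that has hugepage memory allocated for it. */
	ret = rte_thread_attr_set_affinity(&attr, &cpuset);
	if (ret != 0)
		return ret;

	return rte_thread_create(&iavf_event_tid, &attr,
			iavf_dev_event_handle, arg);
}

The point of the attribute is that the affinity is fixed at creation time,
instead of depending on whichever core 'rte_ctrl_thread_create()' happens to
pick.
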
> The first option can solve this issue, but to borrow your previous
> statement:
> "in a dual socket system, if all used cores are in socket 1 and the NIC
> is in socket 1, no memory is allocated for socket 0. This is to optimize
> memory consumption."
> I think it's unreasonable to do so.
>
> About the other option: in the 'rte_eal_intr_init()' function, after the
> thread is created, I set the thread affinity for eal-intr-thread, but it
> does not solve this issue.
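For context, that kind of attempt presumably looks roughly like the
following (a hypothetical sketch, not the actual change that was tested):
pinning the already-created 'intr_thread' pthread right after
'rte_ctrl_thread_create()' returns.

#include <pthread.h>
#include <rte_lcore.h>
#include <rte_log.h>

/* Hypothetical helper, called from rte_eal_intr_init() right after
 * rte_ctrl_thread_create() has created "eal-intr-thread". */
static void
eal_intr_thread_pin_to_main_lcore(pthread_t intr_thread)
{
	rte_cpuset_t cpuset = rte_lcore_cpuset(rte_get_main_lcore());

	if (pthread_setaffinity_np(intr_thread, sizeof(cpuset), &cpuset) != 0)
		RTE_LOG(ERR, EAL, "Cannot set affinity for eal-intr-thread\n");
}
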
Hi Kaisen,

There are two threads involved:
the first one is the interrupt thread, "eal-intr-thread", created by
'rte_eal_intr_init()';
the second one is the iavf event handler, "iavf-event-thread", created by
'iavf_dev_event_handler_init()'.

The first one is triggered by the interrupt and puts a message on a list;
the second one consumes from the list and processes the message.

So I assume the two threads being in different sockets, or memory being
allocated in a different socket than the cores they run on, causes the
performance issue.

Did you test the second thread, "iavf-event-thread", affinitized to the
main core? (by creating the thread using the 'rte_thread_create()' API)

>>
>>>> Let me know if there is anything else unclear.
>>>>>
>>>>>> Fixes: cb5c1b91f76f ("net/iavf: add thread for event callbacks")
>>>>>> Cc: sta...@dpdk.org
>>>>>>
>>>>>> Signed-off-by: Kaisen You <kaisenx....@intel.com>
>>>>>> ---
>>>>>>  drivers/net/iavf/iavf_vchnl.c | 8 +++-----
>>>>>>  1 file changed, 3 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/iavf/iavf_vchnl.c b/drivers/net/iavf/iavf_vchnl.c
>>>>>> index f92daf97f2..a05791fe48 100644
>>>>>> --- a/drivers/net/iavf/iavf_vchnl.c
>>>>>> +++ b/drivers/net/iavf/iavf_vchnl.c
>>>>>> @@ -36,7 +36,6 @@ struct iavf_event_element {
>>>>>>  	struct rte_eth_dev *dev;
>>>>>>  	enum rte_eth_event_type event;
>>>>>>  	void *param;
>>>>>> -	size_t param_alloc_size;
>>>>>>  	uint8_t param_alloc_data[0];
>>>>>>  };
>>>>>>
>>>>>> @@ -80,7 +79,7 @@ iavf_dev_event_handle(void *param __rte_unused)
>>>>>>  		TAILQ_FOREACH_SAFE(pos, &pending, next, save_next) {
>>>>>>  			TAILQ_REMOVE(&pending, pos, next);
>>>>>>  			rte_eth_dev_callback_process(pos->dev, pos->event, pos->param);
>>>>>> -			rte_free(pos);
>>>>>> +			free(pos);
>>>>>>  		}
>>>>>>  	}
>>>>>>
>>>>>> @@ -94,14 +93,13 @@ iavf_dev_event_post(struct rte_eth_dev *dev,
>>>>>>  {
>>>>>>  	struct iavf_event_handler *handler = &event_handler;
>>>>>>  	char notify_byte;
>>>>>> -	struct iavf_event_element *elem = rte_malloc(NULL, sizeof(*elem) + param_alloc_size, 0);
>>>>>> +	struct iavf_event_element *elem = malloc(sizeof(*elem) + param_alloc_size);
>>>>>>  	if (!elem)
>>>>>>  		return;
>>>>>>
>>>>>>  	elem->dev = dev;
>>>>>>  	elem->event = event;
>>>>>>  	elem->param = param;
>>>>>> -	elem->param_alloc_size = param_alloc_size;
>>>>>>  	if (param && param_alloc_size) {
>>>>>>  		rte_memcpy(elem->param_alloc_data, param, param_alloc_size);
>>>>>>  		elem->param = elem->param_alloc_data;
>>>>>> @@ -165,7 +163,7 @@ iavf_dev_event_handler_fini(void)
>>>>>>  	struct iavf_event_element *pos, *save_next;
>>>>>>  	TAILQ_FOREACH_SAFE(pos, &handler->pending, next, save_next) {
>>>>>>  		TAILQ_REMOVE(&handler->pending, pos, next);
>>>>>> -		rte_free(pos);
>>>>>> +		free(pos);
>>>>>>  	}
>>>>>>  }
>>>>>>
>>>>
>>>
>