> -----Original Message-----
> From: Ferruh Yigit <ferruh.yi...@amd.com>
> Sent: 21 December 2022 21:49
> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov,
> Anatoly <anatoly.bura...@intel.com>; David Marchand
> <david.march...@redhat.com>
> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou, YidingX
> <yidingx.z...@intel.com>; Wu, Jingjing <jingjing...@intel.com>; Xing,
> Beilei <beilei.x...@intel.com>; Zhang, Qi Z <qi.z.zh...@intel.com>; Luca
> Boccassi <bl...@debian.org>; Mcnamara, John <john.mcnam...@intel.com>;
> Kevin Traynor <ktray...@redhat.com>
> Subject: Re: [PATCH] net/iavf: fix slow memory allocation
>
> On 12/20/2022 6:52 AM, You, KaisenX wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yi...@amd.com>
> >> Sent: 13 December 2022 21:28
> >> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov,
> >> Anatoly <anatoly.bura...@intel.com>; David Marchand
> >> <david.march...@redhat.com>
> >> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou,
> >> YidingX <yidingx.z...@intel.com>; Wu, Jingjing
> >> <jingjing...@intel.com>; Xing, Beilei <beilei.x...@intel.com>; Zhang,
> >> Qi Z <qi.z.zh...@intel.com>; Luca Boccassi <bl...@debian.org>;
> >> Mcnamara, John <john.mcnam...@intel.com>; Kevin Traynor
> >> <ktray...@redhat.com>
> >> Subject: Re: [PATCH] net/iavf: fix slow memory allocation
> >>
> >> On 12/13/2022 9:35 AM, Ferruh Yigit wrote:
> >>> On 12/13/2022 7:52 AM, You, KaisenX wrote:
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Ferruh Yigit <ferruh.yi...@amd.com>
> >>>>> Sent: 8 December 2022 23:04
> >>>>> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov,
> >>>>> Anatoly <anatoly.bura...@intel.com>; David Marchand
> >>>>> <david.march...@redhat.com>
> >>>>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou,
> >>>>> YidingX <yidingx.z...@intel.com>; Wu, Jingjing
> >>>>> <jingjing...@intel.com>; Xing, Beilei <beilei.x...@intel.com>;
> >>>>> Zhang, Qi Z <qi.z.zh...@intel.com>; Luca Boccassi
> >>>>> <bl...@debian.org>; Mcnamara, John <john.mcnam...@intel.com>;
> >>>>> Kevin Traynor <ktray...@redhat.com>
> >>>>> Subject: Re: [PATCH] net/iavf: fix slow memory allocation
> >>>>>
> >>>>> On 11/17/2022 6:57 AM, Kaisen You wrote:
> >>>>>> In some cases, DPDK does not allocate hugepage heap memory to some
> >>>>>> sockets because of the user-supplied core list (e.g. -l 40-79, so
> >>>>>> socket 0 has no memory).
> >>>>>> When the interrupt thread runs on a core of such a socket, each
> >>>>>> allocation/release executes a whole set of heap allocation/release
> >>>>>> operations, resulting in poor performance.
> >>>>>> Instead we call malloc() to get memory from the system's heap
> >>>>>> space to fix this problem.
> >>>>>>
> >>>>>
> >>>>> Hi Kaisen,
> >>>>>
> >>>>> Using libc malloc can improve performance for this case, but I
> >>>>> would like to understand the root cause of the problem.
> >>>>>
> >>>>> As far as I can see, interrupt callbacks are run by the interrupt
> >>>>> thread ("eal-intr-thread"), and the interrupt thread is created by
> >>>>> the 'rte_ctrl_thread_create()' API.
> >>>>>
> >>>>> The 'rte_ctrl_thread_create()' comment mentions that "CPU affinity
> >>>>> retrieved at the time 'rte_eal_init()' was called,"
> >>>>>
> >>>>> And 'rte_eal_init()' is run on the main lcore, which is the first
> >>>>> lcore in the core list (unless otherwise defined with --main-lcore).
> >>>>>
> >>>>> So, the interrupts should be running on a core that has hugepages
> >>>>> allocated for it, am I missing something here?
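> >>>>>
> >>>>> One quick way to confirm could be logging, from inside the
> >>>>> interrupt callback, which CPU it actually runs on. A minimal
> >>>>> sketch (untested; sched_getcpu() is glibc-specific and the helper
> >>>>> name is made up here):
> >>>>>
> >>>>> #define _GNU_SOURCE
> >>>>> #include <sched.h>	/* sched_getcpu() */
> >>>>> #include <stdio.h>
> >>>>>
> >>>>> /* Call this from the interrupt callback to see where it runs. */
> >>>>> static void
> >>>>> dump_callback_cpu(const char *tag)
> >>>>> {
> >>>>> 	printf("%s: running on cpu %d\n", tag, sched_getcpu());
> >>>>> }
> >>>>>
> >>>>> If it prints a core whose socket has no hugepages allocated, that
> >>>>> would confirm the theory.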
> >>>>>
> >>>>>
> >>>> Thanks for your comments. Let me try to explain the root cause here:
> >>>> the core that eal-intr-thread runs on belongs to a socket for which
> >>>> no memory pool was created.
> >>>> That results in memory being frequently created/destroyed on every
> >>>> subsequent allocation.
> >>>>
> >>>> When testpmd is started, the core-list parameter (e.g. -l 40-79) is
> >>>> set. Different OSes have different topologies. Some OSes, like SUSE,
> >>>> only create a memory pool for one CPU socket, while other systems
> >>>> create pools for two.
> >>>> That is why the problem occurs when using memory on different OSes.
> >>>
> >>>
> >>> It is the testpmd application that decides which socket to allocate
> >>> memory from, right. This is nothing specific to the OS.
> >>>
> >>> As far as I remember, the testpmd logic is to allocate from the
> >>> sockets whose cores are used (provided with the -l parameter), and
> >>> from the socket that the device is attached to.
> >>>
> >>> So, in a dual-socket system, if all used cores are in socket 1 and
> >>> the NIC is in socket 1, no memory is allocated for socket 0. This is
> >>> to optimize memory consumption.
> >>>
> >>>
> >>> Can you please confirm that the problem you are observing is because
> >>> the interrupt handler is running on a CPU whose socket has no memory
> >>> allocated?
> >>>
> >>> In this case, what I don't understand is why the interrupts are not
> >>> running on the main lcore, which should be the first core in the
> >>> list; for the "-l 40-79" sample it should be lcore 40.
> >>> For your case, does the interrupt handler run on core 0, or on an
> >>> arbitrary core? If so, can you please confirm whether providing the
> >>> core list as "-l 0,40-79" fixes the issue?
> >>>
> > First of all, sorry to reply to you so late.
> > I can confirm that the problem I observed is because the interrupt
> > handler is running on a CPU whose socket has no memory allocated.
> >
> > In my case, the interrupt handler is running on core 0.
> > I tried providing "-l 0,40-79" as a startup parameter, and this
> > resolves the issue.
> >
> > I would like to correct my previous statement that this problem only
> > occurs on SUSE: it occurs on any OS, as long as the startup core list
> > lies entirely on node 1.
> >
> >>>
> >>>>>
> >>>>>
> >>>>> And what about using the 'rte_malloc_socket()' API (instead of
> >>>>> rte_malloc), which takes 'socket' as a parameter, and providing
> >>>>> the socket that the device is on as the parameter to this API? Is
> >>>>> it possible to test this?
> >>>>>
> >>>>>
> >>>> As to the reason for not using rte_malloc_socket(): I thought
> >>>> rte_malloc_socket() could solve the problem too, and the
> >>>> appropriate parameter should be the socket_id for which the memory
> >>>> pool was created at DPDK initialization. Assuming the socket_id of
> >>>> the initially allocated memory is 1, we would first let
> >>>> eal-intr-thread determine whether it is on that socket, then record
> >>>> this socket_id in eal-intr-thread and pass it on to
> >>>> iavf-event-thread. But there seems to be no way to pass this
> >>>> parameter through to the iavf_dev_event_post() function. That is
> >>>> why rte_malloc_socket() is not used.
> >>>>
> >>> I was thinking the socket id of the device could be used, but that
> >>> won't help if the core that the interrupt handler runs on is in a
> >>> different socket.
> >>> And I also don't know if there is a way to get the socket that the
> >>> interrupt thread is on. @David may help perhaps.
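> >>>
> >>> For reference, the device-socket idea would look roughly like this
> >>> inside iavf_dev_event_post() (an untested sketch; it assumes 'dev'
> >>> and 'param_alloc_size' from that function's context, and needs
> >>> <rte_malloc.h>):
> >>>
> >>> 	/* Allocate from the socket the ethdev is attached to,
> >>> 	 * instead of the socket of the calling thread. */
> >>> 	struct iavf_event_element *elem =
> >>> 		rte_malloc_socket(NULL, sizeof(*elem) + param_alloc_size,
> >>> 				  0, dev->data->numa_node);
> >>> 	if (!elem)
> >>> 		return;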
> >>>
> >>> So the question is why the interrupt thread is not running on the
> >>> main lcore.
> >>>
> >>
> >> OK, after some talk with David, what I was missing is that
> >> 'rte_ctrl_thread_create()' does NOT run on the main lcore; it can run
> >> on any core except the data plane cores.
> >>
> >> The driver's "iavf-event-thread" (iavf_dev_event_handle()) and the
> >> interrupt thread (and so the driver interrupt callback
> >> iavf_dev_event_post()) can run on any core, making this hard to
> >> manage. And it seems it is not possible to control where the
> >> interrupt thread runs.
> >>
> >> One option can be allocating hugepages for all sockets, but this
> >> requires user involvement and can't happen transparently.
> >>
> >> The other option can be to control where "iavf-event-thread" runs,
> >> like using 'rte_thread_create()' to create the thread and providing
> >> an attribute to run it on the main lcore
> >> (rte_lcore_cpuset(rte_get_main_lcore()))?
> >>
> >> Can you please test the above option?
> >>
> >>
> > The first option can solve this issue, but to borrow your earlier
> > words, "in a dual-socket system, if all used cores are in socket 1
> > and the NIC is in socket 1, no memory is allocated for socket 0. This
> > is to optimize memory consumption."
> > I think it is unreasonable to do so.
> >
> > About the other option: in the 'rte_eal_intr_init()' function, after
> > the thread is created, I set the thread affinity for eal-intr-thread,
> > but it does not solve this issue.
>
> Hi Kaisen,
>
> There are two threads involved.
>
> The first one is the interrupt thread, "eal-intr-thread", created by
> 'rte_eal_intr_init()'.
> The second one is the iavf event handler, "iavf-event-thread", created
> by 'iavf_dev_event_handler_init()'.
>
> The first one is triggered by an interrupt and puts a message on a
> list; the second one consumes from the list and processes the message.
> So I assume the two threads being on different sockets, or the memory
> being allocated on a different socket than the cores using it, causes
> the performance issue.
>
> Did you test with the second thread, "iavf-event-thread", affinitized
> to the main core (by creating the thread using the 'rte_thread_create()'
> API)?
>
I tried to use the 'rte_thread_create()' API to create the second thread
(roughly as in the sketch below), but this issue still exists.
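What I tried looks roughly like this (paraphrased, not the exact code;
the wrapper name iavf_dev_event_handle_wrap is made up here, only to
adapt the existing handler to rte_thread_func):

#include <rte_lcore.h>
#include <rte_thread.h>

static uint32_t
iavf_dev_event_handle_wrap(void *arg)
{
	/* iavf_dev_event_handle() is the existing static handler in
	 * iavf_vchnl.c; it returns void *, rte_thread_func expects
	 * uint32_t, hence this wrapper. */
	iavf_dev_event_handle(arg);
	return 0;
}

static int
iavf_event_thread_create_on_main(rte_thread_t *tid)
{
	rte_thread_attr_t attr;
	rte_cpuset_t cpuset = rte_lcore_cpuset(rte_get_main_lcore());

	/* Pin "iavf-event-thread" to the main lcore via attributes. */
	rte_thread_attr_init(&attr);
	rte_thread_attr_set_affinity(&attr, &cpuset);

	return rte_thread_create(tid, &attr,
				 iavf_dev_event_handle_wrap, NULL);
}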
Because the malloc is executed by "eal-intr-thread", it has nothing to do
with "iavf-event-thread".

But I found a patch similar to my issue:
https://patchwork.dpdk.org/project/dpdk/patch/20221221104858.296530-1-david.march...@redhat.com/
According to that patch's modification, this issue can be solved.

> >>
> >>>> Let me know if there is anything else unclear.
> >>>>>
> >>>>>> Fixes: cb5c1b91f76f ("net/iavf: add thread for event callbacks")
> >>>>>> Cc: sta...@dpdk.org
> >>>>>>
> >>>>>> Signed-off-by: Kaisen You <kaisenx....@intel.com>
> >>>>>> ---
> >>>>>>  drivers/net/iavf/iavf_vchnl.c | 8 +++-----
> >>>>>>  1 file changed, 3 insertions(+), 5 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/net/iavf/iavf_vchnl.c b/drivers/net/iavf/iavf_vchnl.c
> >>>>>> index f92daf97f2..a05791fe48 100644
> >>>>>> --- a/drivers/net/iavf/iavf_vchnl.c
> >>>>>> +++ b/drivers/net/iavf/iavf_vchnl.c
> >>>>>> @@ -36,7 +36,6 @@ struct iavf_event_element {
> >>>>>>  	struct rte_eth_dev *dev;
> >>>>>>  	enum rte_eth_event_type event;
> >>>>>>  	void *param;
> >>>>>> -	size_t param_alloc_size;
> >>>>>>  	uint8_t param_alloc_data[0];
> >>>>>>  };
> >>>>>>
> >>>>>> @@ -80,7 +79,7 @@ iavf_dev_event_handle(void *param __rte_unused)
> >>>>>>  		TAILQ_FOREACH_SAFE(pos, &pending, next, save_next) {
> >>>>>>  			TAILQ_REMOVE(&pending, pos, next);
> >>>>>>  			rte_eth_dev_callback_process(pos->dev, pos->event, pos->param);
> >>>>>> -			rte_free(pos);
> >>>>>> +			free(pos);
> >>>>>>  		}
> >>>>>>  	}
> >>>>>>
> >>>>>> @@ -94,14 +93,13 @@ iavf_dev_event_post(struct rte_eth_dev *dev,
> >>>>>>  {
> >>>>>>  	struct iavf_event_handler *handler = &event_handler;
> >>>>>>  	char notify_byte;
> >>>>>> -	struct iavf_event_element *elem = rte_malloc(NULL, sizeof(*elem) + param_alloc_size, 0);
> >>>>>> +	struct iavf_event_element *elem = malloc(sizeof(*elem) + param_alloc_size);
> >>>>>>  	if (!elem)
> >>>>>>  		return;
> >>>>>>
> >>>>>>  	elem->dev = dev;
> >>>>>>  	elem->event = event;
> >>>>>>  	elem->param = param;
> >>>>>> -	elem->param_alloc_size = param_alloc_size;
> >>>>>>  	if (param && param_alloc_size) {
> >>>>>>  		rte_memcpy(elem->param_alloc_data, param, param_alloc_size);
> >>>>>>  		elem->param = elem->param_alloc_data;
> >>>>>> @@ -165,7 +163,7 @@ iavf_dev_event_handler_fini(void)
> >>>>>>  	struct iavf_event_element *pos, *save_next;
> >>>>>>  	TAILQ_FOREACH_SAFE(pos, &handler->pending, next, save_next) {
> >>>>>>  		TAILQ_REMOVE(&handler->pending, pos, next);
> >>>>>> -		rte_free(pos);
> >>>>>> +		free(pos);
> >>>>>>  	}
> >>>>>>  }
> >>>>>>
> >>>>
> >>>
> >