> -----Original Message----- > From: Ferruh Yigit <ferruh.yi...@amd.com> > Sent: Thursday, December 22, 2022 8:07 PM > To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov, Anatoly > <anatoly.bura...@intel.com>; David Marchand > <david.march...@redhat.com> > Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou, YidingX > <yidingx.z...@intel.com>; Wu, Jingjing <jingjing...@intel.com>; Xing, Beilei > <beilei.x...@intel.com>; Zhang, Qi Z <qi.z.zh...@intel.com>; Luca Boccassi > <bl...@debian.org>; Mcnamara, John <john.mcnam...@intel.com>; Kevin > Traynor <ktray...@redhat.com> > Subject: Re: [PATCH] net/iavf:fix slow memory allocation > > On 12/22/2022 7:23 AM, You, KaisenX wrote: > > > > > >> -----Original Message----- > >> From: Ferruh Yigit <ferruh.yi...@amd.com> > >> Sent: 2022年12月21日 21:49 > >> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov, > >> Anatoly <anatoly.bura...@intel.com>; David Marchand > >> <david.march...@redhat.com> > >> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou, > >> YidingX <yidingx.z...@intel.com>; Wu, Jingjing > >> <jingjing...@intel.com>; Xing, Beilei <beilei.x...@intel.com>; Zhang, > >> Qi Z <qi.z.zh...@intel.com>; Luca Boccassi <bl...@debian.org>; > >> Mcnamara, John <john.mcnam...@intel.com>; Kevin Traynor > >> <ktray...@redhat.com> > >> Subject: Re: [PATCH] net/iavf:fix slow memory allocation > >> > >> On 12/20/2022 6:52 AM, You, KaisenX wrote: > >>> > >>> > >>>> -----Original Message----- > >>>> From: Ferruh Yigit <ferruh.yi...@amd.com> > >>>> Sent: 2022年12月13日 21:28 > >>>> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov, > >>>> Anatoly <anatoly.bura...@intel.com>; David Marchand > >>>> <david.march...@redhat.com> > >>>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou, > >>>> YidingX <yidingx.z...@intel.com>; Wu, Jingjing > >>>> <jingjing...@intel.com>; Xing, Beilei <beilei.x...@intel.com>; > >>>> Zhang, Qi Z <qi.z.zh...@intel.com>; Luca Boccassi > >>>> <bl...@debian.org>; Mcnamara, John <john.mcnam...@intel.com>; > Kevin > >>>> Traynor <ktray...@redhat.com> > >>>> Subject: Re: [PATCH] net/iavf:fix slow memory allocation > >>>> > >>>> On 12/13/2022 9:35 AM, Ferruh Yigit wrote: > >>>>> On 12/13/2022 7:52 AM, You, KaisenX wrote: > >>>>>> > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Ferruh Yigit <ferruh.yi...@amd.com> > >>>>>>> Sent: 2022年12月8日 23:04 > >>>>>>> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov, > >>>>>>> Anatoly <anatoly.bura...@intel.com>; David Marchand > >>>>>>> <david.march...@redhat.com> > >>>>>>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou, > >>>>>>> YidingX <yidingx.z...@intel.com>; Wu, Jingjing > >>>>>>> <jingjing...@intel.com>; Xing, Beilei <beilei.x...@intel.com>; > >>>>>>> Zhang, Qi Z <qi.z.zh...@intel.com>; Luca Boccassi > >>>>>>> <bl...@debian.org>; Mcnamara, John > <john.mcnam...@intel.com>; > >>>> Kevin > >>>>>>> Traynor <ktray...@redhat.com> > >>>>>>> Subject: Re: [PATCH] net/iavf:fix slow memory allocation > >>>>>>> > >>>>>>> On 11/17/2022 6:57 AM, Kaisen You wrote: > >>>>>>>> In some cases, the DPDK does not allocate hugepage heap memory > >> to > >>>>>>> some > >>>>>>>> sockets due to the user setting parameters (e.g. -l 40-79, > >>>>>>>> SOCKET > >>>>>>>> 0 has no memory). > >>>>>>>> When the interrupt thread runs on the corresponding core of > >>>>>>>> this socket, each allocation/release will execute a whole set > >>>>>>>> of heap allocation/release operations,resulting in poor > performance. > >>>>>>>> Instead we call malloc() to get memory from the system's heap > >>>>>>>> space to fix this problem. > >>>>>>>> > >>>>>>> > >>>>>>> Hi Kaisen, > >>>>>>> > >>>>>>> Using libc malloc can improve performance for this case, but I > >>>>>>> would like to understand root cause of the problem. > >>>>>>> > >>>>>>> > >>>>>>> As far as I can see, interrupt callbacks are run by interrupt > >>>>>>> thread > >>>>>>> ("eal-intr- thread"), and interrupt thread created by > >>>> 'rte_ctrl_thread_create()' API. > >>>>>>> > >>>>>>> 'rte_ctrl_thread_create()' comment mentions that "CPU affinity > >>>>>>> retrieved at the time 'rte_eal_init()' was called," > >>>>>>> > >>>>>>> And 'rte_eal_init()' is run on main lcore, which is the first > >>>>>>> lcore in the core list (unless otherwise defined with --main-lcore). > >>>>>>> > >>>>>>> So, the interrupts should be running on a core that has > >>>>>>> hugepages allocated for it, am I missing something here? > >>>>>>> > >>>>>>> > >>>>>> Thank for your comments. Let me try to explain the root cause here: > >>>>>> eal_intr_thread the CPU in the corresponding slot does not create > >>>> memory pool. > >>>>>> That results in frequent memory subsequently creating/destructing. > >>>>>> > >>>>>> When testpmd started, the parameter (e.g. -l 40-79) is set. > >>>>>> Different OS has different topology. Some OS like SUSE only > >>>>>> creates memory pool for one CPU slot, while other system creates for > two. > >>>>>> That is why the problem occurs when using memories in different OS. > >>>>> > >>>>> > >>>>> It is testpmd application that decides from which socket to > >>>>> allocate memory from, right. This is nothing specific to OS. > >>>>> > >>>>> As far as I remember, testpmd logic is too allocate from socket > >>>>> that its cores are used (provided with -l parameter), and allocate > >>>>> from socket that device is attached to. > >>>>> > >>>>> So, in a dual socket system, if all used cores are in socket 1 and > >>>>> the NIC is in socket 1, no memory is allocated for socket 0. This > >>>>> is to optimize memory consumption. > >>>>> > >>>>> > >>>>> Can you please confirm that the problem you are observing is > >>>>> because interrupt handler is running on a CPU, which doesn't have > >>>>> memory allocated for its socket? > >>>>> > >>>>> In this case what I don't understand is why interrupts is not > >>>>> running on main lcore, which should be first core in the list, for "-l > >>>>> 40- > 79" > >>>>> sample it should be lcore 40. > >>>>> For your case, is interrupt handler run on core 0? Or any arbitrary > >>>>> core? > >>>>> If so, can you please confirm when you provide core list as "-l 0,40-79" > >>>>> fixes the issue? > >>>>> > >>> First of all, sorry to reply to you so late. > >>> I can confirm that the problem I observed is because interrupt > >>> handler is running on a CPU, which doesn't have memory allocated for > >>> its > >> socket. > >>> > >>> In my case, interrupt handler is running on core 0. > >>> I tried providing "-l 0,40-79" as a startup parameter, this issue > >>> can be > >> resolved. > >>> > >>> I corrected the previous statement that this problem does only > >>> occur on the SUSE system. In any OS, this problem occurs as long as > >>> the range of startup parameters is only on node1. > >>> > >>>>> > >>>>>>> > >>>>>>> > >>>>>>> And what about using 'rte_malloc_socket()' API (instead of > >>>>>>> rte_malloc), which gets 'socket' as parameter, and provide the > >>>>>>> socket that devices is on as parameter to this API? Is it > >>>>>>> possible to test > >>>> this? > >>>>>>> > >>>>>>> > >>>>>> As to the reason for not using rte_malloc_socket. I thought > >>>>>> rte_malloc_socket() could solve the problem too. And the > >>>>>> appropriate parameter should be the socket_id that created the > >>>>>> memory pool for DPDK initialization. Assuming that> the socket_id > >>>>>> of the initially allocated memory = 1, first let the > >>>>> eal_intr_thread > >>>>>> determine if it is on the socket_id, then record this socket_id > >>>>>> in the eal_intr_thread and pass it to the iavf_event_thread. But > >>>>>> there seems no way to link this parameter to the > >>>>>> iavf_dev_event_post() > >>>> function. That is why rte_malloc_socket is not used. > >>>>>> > >>>>> > >>>>> I was thinking socket id of device can be used, but that won't > >>>>> help if the core that interrupt handler runs is in different socket. > >>>>> And I also don't know if there is a way to get socket that > >>>>> interrupt thread is on. @David may help perhaps. > >>>>> > >>>>> So question is why interrupt thread is not running on main lcore. > >>>>> > >>>> > >>>> OK after some talk with David, what I am missing is > >> 'rte_ctrl_thread_create()' > >>>> does NOT run on main lcore, it can run on any core except data > >>>> plane > >> cores. > >>>> > >>>> Driver "iavf-event-thread" thread (iavf_dev_event_handle()) and > >>>> interrupt thread (so driver interrupt callback > >>>> iavf_dev_event_post()) can run on any core, making it hard to manage. > >>>> And it seems it is not possible to control where interrupt thread to run. > >>>> > >>>> One option can be allocating hugepages for all sockets, but this > >>>> requires user involvement, and can't happen transparently. > >>>> > >>>> Other option can be to control where "iavf-event-thread" run, like > >>>> using 'rte_thread_create()' to create thread and provide attribute > >>>> to run it on main lcore (rte_lcore_cpuset(rte_get_main_lcore()))? > >>>> > >>>> Can you please test above option? > >>>> > >>>> > >>> The first option can solve this issue. but to borrow from your > >>> previous saying, "in a dual socket system, if all used cores are in > >>> socket 1 and the NIC is in socket 1, no memory is allocated for socket 0. > >> This is to optimize memory consumption." > >>> I think it's unreasonable to do so. > >>> > >>> About other option. In " rte_eal_intr_init" function, After the > >>> thread is created, I set the thread affinity for eal-intr-thread, > >>> but it does not solve > >> this issue. > >> > >> Hi Kaisen, > >> > >> There are two threads involved, > >> > >> First one is interrupt thread, "eal-intr-thread", created by > 'rte_eal_intr_init()'. > >> > >> Second one is iavf event handler, "iavf-event-thread", created by > >> 'iavf_dev_event_handler_init()'. > >> > >> First one triggered by interrupt and puts a message to a list, second > >> one consumes from the list and processes the message. > >> So I assume two thread being in different sockets, or memory being > >> allocated in a different socket than the cores running causes the > >> performance issue. > >> > >> Did you test the second thread, "iavf-event-thread", affiliated to main > core? > >> (by creating thread using 'rte_thread_create()' API) > >> > >> > > I tried to use ''rte_thread_create() 'API creates the second thread, > > but this issue still exists. > > > > Because malloc is executed by "eal_intr_thread", it has nothing to do > > with "iavf_event_thread". > > > > Since 'iavf_event_element' (pos which is allocated by malloc() accessed and > freed in "iavf-event-thread", it could be related. But if it doesn't fix that > is OK, > thanks for testing. > > > But I found a patch similar to my issue: > > https://patchwork.dpdk.org/project/dpdk/patch/20221221104858.296530- > 1- > > david.march...@redhat.com/ According to the patch modification, this > > issue can be solved. > > > > I guess that patch inspired from this discussion, and if it fixes the issue, I > prefer that one as generic solution.
+1 That looks like the feature that DPDK should have.