> -----Original Message-----
> From: Ferruh Yigit <ferruh.yi...@amd.com>
> Sent: 21 December 2022 21:49
> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov,
> Anatoly <anatoly.bura...@intel.com>; David Marchand
> <david.march...@redhat.com>
> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou, YidingX
> <yidingx.z...@intel.com>; Wu, Jingjing <jingjing...@intel.com>; Xing,
> Beilei <beilei.x...@intel.com>; Zhang, Qi Z <qi.z.zh...@intel.com>; Luca
> Boccassi <bl...@debian.org>; Mcnamara, John <john.mcnam...@intel.com>;
> Kevin Traynor <ktray...@redhat.com>
> Subject: Re: [PATCH] net/iavf: fix slow memory allocation
>
> On 12/20/2022 6:52 AM, You, KaisenX wrote:
> >
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yi...@amd.com>
> >> Sent: 13 December 2022 21:28
> >> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov,
> >> Anatoly <anatoly.bura...@intel.com>; David Marchand
> >> <david.march...@redhat.com>
> >> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou,
> >> YidingX <yidingx.z...@intel.com>; Wu, Jingjing
> >> <jingjing...@intel.com>; Xing, Beilei <beilei.x...@intel.com>; Zhang,
> >> Qi Z <qi.z.zh...@intel.com>; Luca Boccassi <bl...@debian.org>;
> >> Mcnamara, John <john.mcnam...@intel.com>; Kevin Traynor
> >> <ktray...@redhat.com>
> >> Subject: Re: [PATCH] net/iavf: fix slow memory allocation
> >>
> >> On 12/13/2022 9:35 AM, Ferruh Yigit wrote:
> >>> On 12/13/2022 7:52 AM, You, KaisenX wrote:
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Ferruh Yigit <ferruh.yi...@amd.com>
> >>>>> Sent: 8 December 2022 23:04
> >>>>> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov,
> >>>>> Anatoly <anatoly.bura...@intel.com>; David Marchand
> >>>>> <david.march...@redhat.com>
> >>>>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou,
> >>>>> YidingX <yidingx.z...@intel.com>; Wu, Jingjing
> >>>>> <jingjing...@intel.com>; Xing, Beilei <beilei.x...@intel.com>;
> >>>>> Zhang, Qi Z <qi.z.zh...@intel.com>; Luca Boccassi
> >>>>> <bl...@debian.org>; Mcnamara, John <john.mcnam...@intel.com>;
> >>>>> Kevin Traynor <ktray...@redhat.com>
> >>>>> Subject: Re: [PATCH] net/iavf: fix slow memory allocation
> >>>>>
> >>>>> On 11/17/2022 6:57 AM, Kaisen You wrote:
> >>>>>> In some cases, DPDK does not allocate hugepage heap memory to some
> >>>>>> sockets because of the user-supplied core list (e.g. -l 40-79, so
> >>>>>> socket 0 has no memory).
> >>>>>> When the interrupt thread runs on a core of such a socket, each
> >>>>>> allocation/release executes a whole set of heap allocation/release
> >>>>>> operations, resulting in poor performance.
> >>>>>> Instead we call malloc() to get memory from the system's heap
> >>>>>> space to fix this problem.
> >>>>>>
> >>>>>
> >>>>> Hi Kaisen,
> >>>>>
> >>>>> Using libc malloc can improve performance for this case, but I
> >>>>> would like to understand the root cause of the problem.
> >>>>>
> >>>>> As far as I can see, interrupt callbacks are run by the interrupt
> >>>>> thread ("eal-intr-thread"), and the interrupt thread is created by
> >>>>> the 'rte_ctrl_thread_create()' API.
> >>>>>
> >>>>> The 'rte_ctrl_thread_create()' comment mentions that "CPU affinity
> >>>>> retrieved at the time 'rte_eal_init()' was called,"
> >>>>>
> >>>>> And 'rte_eal_init()' is run on the main lcore, which is the first
> >>>>> lcore in the core list (unless otherwise defined with --main-lcore).
> >>>>>
> >>>>> So, the interrupts should be running on a core that has hugepages
> >>>>> allocated for it, am I missing something here?
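> >>>>>
> >>>>> One quick way to confirm could be logging, from inside the
> >>>>> interrupt callback, which CPU it actually runs on. A minimal
> >>>>> sketch (untested; sched_getcpu() is glibc-specific and the helper
> >>>>> name is made up here):
> >>>>>
> >>>>> #define _GNU_SOURCE
> >>>>> #include <sched.h>	/* sched_getcpu() */
> >>>>> #include <stdio.h>
> >>>>>
> >>>>> /* Call this from the interrupt callback to see where it runs. */
> >>>>> static void
> >>>>> dump_callback_cpu(const char *tag)
> >>>>> {
> >>>>> 	printf("%s: running on cpu %d\n", tag, sched_getcpu());
> >>>>> }
> >>>>>
> >>>>> If it prints a core whose socket has no hugepages allocated, that
> >>>>> would confirm the theory.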
> >>>>>
> >>>>>
> >>>> Thanks for your comments. Let me try to explain the root cause here:
> >>>> the core that eal-intr-thread runs on belongs to a socket for which
> >>>> no memory pool was created.
> >>>> That results in memory being frequently created/destroyed on every
> >>>> subsequent allocation.
> >>>>
> >>>> When testpmd is started, the core-list parameter (e.g. -l 40-79) is
> >>>> set. Different OSes have different topologies. Some OSes, like SUSE,
> >>>> only create a memory pool for one CPU socket, while other systems
> >>>> create pools for two.
> >>>> That is why the problem occurs when using memory on different OSes.
> >>>
> >>>
> >>> It is the testpmd application that decides which socket to allocate
> >>> memory from, right. This is nothing specific to the OS.
> >>>
> >>> As far as I remember, the testpmd logic is to allocate from the
> >>> sockets whose cores are used (provided with the -l parameter), and
> >>> from the socket that the device is attached to.
> >>>
> >>> So, in a dual-socket system, if all used cores are in socket 1 and
> >>> the NIC is in socket 1, no memory is allocated for socket 0. This is
> >>> to optimize memory consumption.
> >>>
> >>>
> >>> Can you please confirm that the problem you are observing is because
> >>> the interrupt handler is running on a CPU whose socket has no memory
> >>> allocated?
> >>>
> >>> In this case, what I don't understand is why the interrupts are not
> >>> running on the main lcore, which should be the first core in the
> >>> list; for the "-l 40-79" sample it should be lcore 40.
> >>> For your case, does the interrupt handler run on core 0, or on an
> >>> arbitrary core? If so, can you please confirm whether providing the
> >>> core list as "-l 0,40-79" fixes the issue?
> >>>
> > First of all, sorry to reply to you so late.
> > I can confirm that the problem I observed is because the interrupt
> > handler is running on a CPU whose socket has no memory allocated.
> >
> > In my case, the interrupt handler is running on core 0.
> > I tried providing "-l 0,40-79" as a startup parameter, and this
> > resolves the issue.
> >
> > I would like to correct my previous statement that this problem only
> > occurs on SUSE: it occurs on any OS, as long as the startup core list
> > lies entirely on node 1.
> >
> >>>
> >>>>>
> >>>>>
> >>>>> And what about using the 'rte_malloc_socket()' API (instead of
> >>>>> rte_malloc), which takes 'socket' as a parameter, and providing
> >>>>> the socket that the device is on as the parameter to this API? Is
> >>>>> it possible to test this?
> >>>>>
> >>>>>
> >>>> As to the reason for not using rte_malloc_socket(): I thought
> >>>> rte_malloc_socket() could solve the problem too, and the
> >>>> appropriate parameter should be the socket_id for which the memory
> >>>> pool was created at DPDK initialization. Assuming the socket_id of
> >>>> the initially allocated memory is 1, we would first let
> >>>> eal-intr-thread determine whether it is on that socket, then record
> >>>> this socket_id in eal-intr-thread and pass it on to
> >>>> iavf-event-thread. But there seems to be no way to pass this
> >>>> parameter through to the iavf_dev_event_post() function. That is
> >>>> why rte_malloc_socket() is not used.
> >>>>
> >>> I was thinking the socket id of the device could be used, but that
> >>> won't help if the core that the interrupt handler runs on is in a
> >>> different socket.
> >>> And I also don't know if there is a way to get the socket that the
> >>> interrupt thread is on. @David may help perhaps.
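> >>>
> >>> For reference, the device-socket idea would look roughly like this
> >>> inside iavf_dev_event_post() (an untested sketch; it assumes 'dev'
> >>> and 'param_alloc_size' from that function's context, and needs
> >>> <rte_malloc.h>):
> >>>
> >>> 	/* Allocate from the socket the ethdev is attached to,
> >>> 	 * instead of the socket of the calling thread. */
> >>> 	struct iavf_event_element *elem =
> >>> 		rte_malloc_socket(NULL, sizeof(*elem) + param_alloc_size,
> >>> 				  0, dev->data->numa_node);
> >>> 	if (!elem)
> >>> 		return;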
> >>>
> >>> So the question is why the interrupt thread is not running on the
> >>> main lcore.
> >>>
> >>
> >> OK, after some talk with David, what I was missing is that
> >> 'rte_ctrl_thread_create()' does NOT run on the main lcore; it can run
> >> on any core except the data plane cores.
> >>
> >> The driver's "iavf-event-thread" (iavf_dev_event_handle()) and the
> >> interrupt thread (and so the driver interrupt callback
> >> iavf_dev_event_post()) can run on any core, making this hard to
> >> manage. And it seems it is not possible to control where the
> >> interrupt thread runs.
> >>
> >> One option can be allocating hugepages for all sockets, but this
> >> requires user involvement and can't happen transparently.
> >>
> >> The other option can be to control where "iavf-event-thread" runs,
> >> like using 'rte_thread_create()' to create the thread and providing
> >> an attribute to run it on the main lcore
> >> (rte_lcore_cpuset(rte_get_main_lcore()))?
> >>
> >> Can you please test the above option?
> >>
> >>
> > The first option can solve this issue, but to borrow your earlier
> > words, "in a dual-socket system, if all used cores are in socket 1
> > and the NIC is in socket 1, no memory is allocated for socket 0. This
> > is to optimize memory consumption."
> > I think it is unreasonable to do so.
> >
> > About the other option: in the 'rte_eal_intr_init()' function, after
> > the thread is created, I set the thread affinity for eal-intr-thread,
> > but it does not solve this issue.
>
> Hi Kaisen,
>
> There are two threads involved.
>
> The first one is the interrupt thread, "eal-intr-thread", created by
> 'rte_eal_intr_init()'.
> The second one is the iavf event handler, "iavf-event-thread", created
> by 'iavf_dev_event_handler_init()'.
>
> The first one is triggered by an interrupt and puts a message on a
> list; the second one consumes from the list and processes the message.
> So I assume the two threads being on different sockets, or the memory
> being allocated on a different socket than the cores using it, causes
> the performance issue.
>
> Did you test with the second thread, "iavf-event-thread", affinitized
> to the main core (by creating the thread using the 'rte_thread_create()'
> API)?
>
I tried to use the 'rte_thread_create()' API to create the second thread
(roughly as in the sketch below), but this issue still exists.
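What I tried looks roughly like this (paraphrased, not the exact code;
the wrapper name iavf_dev_event_handle_wrap is made up here, only to
adapt the existing handler to rte_thread_func):

#include <rte_lcore.h>
#include <rte_thread.h>

static uint32_t
iavf_dev_event_handle_wrap(void *arg)
{
	/* iavf_dev_event_handle() is the existing static handler in
	 * iavf_vchnl.c; it returns void *, rte_thread_func expects
	 * uint32_t, hence this wrapper. */
	iavf_dev_event_handle(arg);
	return 0;
}

static int
iavf_event_thread_create_on_main(rte_thread_t *tid)
{
	rte_thread_attr_t attr;
	rte_cpuset_t cpuset = rte_lcore_cpuset(rte_get_main_lcore());

	/* Pin "iavf-event-thread" to the main lcore via attributes. */
	rte_thread_attr_init(&attr);
	rte_thread_attr_set_affinity(&attr, &cpuset);

	return rte_thread_create(tid, &attr,
				 iavf_dev_event_handle_wrap, NULL);
}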
Because the malloc is executed by "eal-intr-thread", it has nothing to do
with "iavf-event-thread".

But I found a patch similar to my issue:
https://patchwork.dpdk.org/project/dpdk/patch/20221221104858.296530-1-david.march...@redhat.com/
According to that patch's modification, this issue can be solved.

> >>
> >>>> Let me know if there is anything else unclear.
> >>>>>
> >>>>>> Fixes: cb5c1b91f76f ("net/iavf: add thread for event callbacks")
> >>>>>> Cc: sta...@dpdk.org
> >>>>>>
> >>>>>> Signed-off-by: Kaisen You <kaisenx....@intel.com>
> >>>>>> ---
> >>>>>>  drivers/net/iavf/iavf_vchnl.c | 8 +++-----
> >>>>>>  1 file changed, 3 insertions(+), 5 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/net/iavf/iavf_vchnl.c b/drivers/net/iavf/iavf_vchnl.c
> >>>>>> index f92daf97f2..a05791fe48 100644
> >>>>>> --- a/drivers/net/iavf/iavf_vchnl.c
> >>>>>> +++ b/drivers/net/iavf/iavf_vchnl.c
> >>>>>> @@ -36,7 +36,6 @@ struct iavf_event_element {
> >>>>>>  	struct rte_eth_dev *dev;
> >>>>>>  	enum rte_eth_event_type event;
> >>>>>>  	void *param;
> >>>>>> -	size_t param_alloc_size;
> >>>>>>  	uint8_t param_alloc_data[0];
> >>>>>>  };
> >>>>>>
> >>>>>> @@ -80,7 +79,7 @@ iavf_dev_event_handle(void *param __rte_unused)
> >>>>>>  		TAILQ_FOREACH_SAFE(pos, &pending, next, save_next) {
> >>>>>>  			TAILQ_REMOVE(&pending, pos, next);
> >>>>>>  			rte_eth_dev_callback_process(pos->dev, pos->event, pos->param);
> >>>>>> -			rte_free(pos);
> >>>>>> +			free(pos);
> >>>>>>  		}
> >>>>>>  	}
> >>>>>>
> >>>>>> @@ -94,14 +93,13 @@ iavf_dev_event_post(struct rte_eth_dev *dev,
> >>>>>>  {
> >>>>>>  	struct iavf_event_handler *handler = &event_handler;
> >>>>>>  	char notify_byte;
> >>>>>> -	struct iavf_event_element *elem = rte_malloc(NULL, sizeof(*elem) + param_alloc_size, 0);
> >>>>>> +	struct iavf_event_element *elem = malloc(sizeof(*elem) + param_alloc_size);
> >>>>>>  	if (!elem)
> >>>>>>  		return;
> >>>>>>
> >>>>>>  	elem->dev = dev;
> >>>>>>  	elem->event = event;
> >>>>>>  	elem->param = param;
> >>>>>> -	elem->param_alloc_size = param_alloc_size;
> >>>>>>  	if (param && param_alloc_size) {
> >>>>>>  		rte_memcpy(elem->param_alloc_data, param, param_alloc_size);
> >>>>>>  		elem->param = elem->param_alloc_data;
> >>>>>> @@ -165,7 +163,7 @@ iavf_dev_event_handler_fini(void)
> >>>>>>  	struct iavf_event_element *pos, *save_next;
> >>>>>>  	TAILQ_FOREACH_SAFE(pos, &handler->pending, next, save_next) {
> >>>>>>  		TAILQ_REMOVE(&handler->pending, pos, next);
> >>>>>> -		rte_free(pos);
> >>>>>> +		free(pos);
> >>>>>>  	}
> >>>>>>  }
> >>>>>>
> >>>>
> >>>
> >