On 12/22/2022 7:23 AM, You, KaisenX wrote:
>
>
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>> Sent: December 21, 2022 21:49
>> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov,
>> Anatoly <anatoly.bura...@intel.com>; David Marchand
>> <david.march...@redhat.com>
>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou, YidingX
>> <yidingx.z...@intel.com>; Wu, Jingjing <jingjing...@intel.com>; Xing,
>> Beilei <beilei.x...@intel.com>; Zhang, Qi Z <qi.z.zh...@intel.com>; Luca
>> Boccassi <bl...@debian.org>; Mcnamara, John <john.mcnam...@intel.com>;
>> Kevin Traynor <ktray...@redhat.com>
>> Subject: Re: [PATCH] net/iavf:fix slow memory allocation
>>
>> On 12/20/2022 6:52 AM, You, KaisenX wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>>>> Sent: December 13, 2022 21:28
>>>> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov,
>>>> Anatoly <anatoly.bura...@intel.com>; David Marchand
>>>> <david.march...@redhat.com>
>>>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou,
>>>> YidingX <yidingx.z...@intel.com>; Wu, Jingjing
>>>> <jingjing...@intel.com>; Xing, Beilei <beilei.x...@intel.com>; Zhang,
>>>> Qi Z <qi.z.zh...@intel.com>; Luca Boccassi <bl...@debian.org>;
>>>> Mcnamara, John <john.mcnam...@intel.com>; Kevin Traynor
>>>> <ktray...@redhat.com>
>>>> Subject: Re: [PATCH] net/iavf:fix slow memory allocation
>>>>
>>>> On 12/13/2022 9:35 AM, Ferruh Yigit wrote:
>>>>> On 12/13/2022 7:52 AM, You, KaisenX wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>>>>>>> Sent: December 8, 2022 23:04
>>>>>>> To: You, KaisenX <kaisenx....@intel.com>; dev@dpdk.org; Burakov,
>>>>>>> Anatoly <anatoly.bura...@intel.com>; David Marchand
>>>>>>> <david.march...@redhat.com>
>>>>>>> Cc: sta...@dpdk.org; Yang, Qiming <qiming.y...@intel.com>; Zhou,
>>>>>>> YidingX <yidingx.z...@intel.com>; Wu, Jingjing
>>>>>>> <jingjing...@intel.com>; Xing, Beilei <beilei.x...@intel.com>;
>>>>>>> Zhang, Qi Z <qi.z.zh...@intel.com>; Luca Boccassi
>>>>>>> <bl...@debian.org>; Mcnamara, John <john.mcnam...@intel.com>;
>>>>>>> Kevin Traynor <ktray...@redhat.com>
>>>>>>> Subject: Re: [PATCH] net/iavf:fix slow memory allocation
>>>>>>>
>>>>>>> On 11/17/2022 6:57 AM, Kaisen You wrote:
>>>>>>>> In some cases, DPDK does not allocate hugepage heap memory for
>>>>>>>> some sockets because of the user-supplied parameters (e.g. with
>>>>>>>> -l 40-79, socket 0 has no memory).
>>>>>>>> When the interrupt thread runs on a core of such a socket, every
>>>>>>>> allocation/release executes a whole set of heap
>>>>>>>> allocation/release operations, resulting in poor performance.
>>>>>>>> Instead, we call malloc() to get memory from the system's heap
>>>>>>>> space to fix this problem.
>>>>>>>>
>>>>>>>
>>>>>>> Hi Kaisen,
>>>>>>>
>>>>>>> Using libc malloc can improve performance for this case, but I
>>>>>>> would like to understand the root cause of the problem.
>>>>>>>
>>>>>>>
>>>>>>> As far as I can see, interrupt callbacks are run by the interrupt
>>>>>>> thread ("eal-intr-thread"), and the interrupt thread is created by
>>>>>>> the 'rte_ctrl_thread_create()' API.
>>>>>>>
>>>>>>> The 'rte_ctrl_thread_create()' comment mentions that "CPU affinity
>>>>>>> retrieved at the time 'rte_eal_init()' was called,"
>>>>>>>
>>>>>>> And 'rte_eal_init()' is run on the main lcore, which is the first
>>>>>>> lcore in the core list (unless otherwise defined with --main-lcore).
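>>>>>>>
>>>>>>> As a quick diagnostic (an untested sketch, not part of any fix;
>>>>>>> 'log_intr_cpu' is a hypothetical helper), something like the below,
>>>>>>> called from the driver interrupt callback, would show which CPU the
>>>>>>> interrupt thread actually runs on:
>>>>>>>
>>>>>>>     #define _GNU_SOURCE
>>>>>>>     #include <sched.h>
>>>>>>>     #include <stdio.h>
>>>>>>>
>>>>>>>     /* Print the CPU the calling thread is currently executing on;
>>>>>>>      * sched_getcpu() is glibc-specific. */
>>>>>>>     static void
>>>>>>>     log_intr_cpu(void)
>>>>>>>     {
>>>>>>>         printf("interrupt callback on CPU %d\n", sched_getcpu());
>>>>>>>     }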
>>>>>>>
>>>>>>> So, the interrupts should be running on a core that has hugepages
>>>>>>> allocated for it; am I missing something here?
>>>>>>>
>>>>>>>
>>>>>> Thanks for your comments. Let me try to explain the root cause here:
>>>>>> the CPU socket that eal-intr-thread runs on has no memory pool
>>>>>> created for it.
>>>>>> That results in memory subsequently being created/destroyed
>>>>>> frequently.
>>>>>>
>>>>>> When testpmd is started, a parameter (e.g. -l 40-79) is set.
>>>>>> Different OSes have different topologies. Some OSes, like SUSE, only
>>>>>> create a memory pool for one CPU socket, while other systems create
>>>>>> them for two.
>>>>>> That is why the problem occurs when using memory on different OSes.
>>>>>
>>>>>
>>>>> It is the testpmd application that decides which socket to allocate
>>>>> memory from, right? This is nothing specific to the OS.
>>>>>
>>>>> As far as I remember, testpmd's logic is to allocate from the sockets
>>>>> whose cores are used (provided with the -l parameter), and from the
>>>>> socket that the device is attached to.
>>>>>
>>>>> So, in a dual-socket system, if all used cores are in socket 1 and
>>>>> the NIC is in socket 1, no memory is allocated for socket 0. This is
>>>>> to optimize memory consumption.
>>>>>
>>>>>
>>>>> Can you please confirm that the problem you are observing is because
>>>>> the interrupt handler is running on a CPU which doesn't have memory
>>>>> allocated for its socket?
>>>>>
>>>>> In this case, what I don't understand is why interrupts are not
>>>>> running on the main lcore, which should be the first core in the
>>>>> list; for the "-l 40-79" sample it should be lcore 40.
>>>>> For your case, is the interrupt handler run on core 0, or on an
>>>>> arbitrary core?
>>>>> If so, can you please confirm that providing the core list as
>>>>> "-l 0,40-79" fixes the issue?
>>>>>
>>> First of all, sorry to reply to you so late.
>>> I can confirm that the problem I observed is because the interrupt
>>> handler is running on a CPU which doesn't have memory allocated for
>>> its socket.
>>>
>>> In my case, the interrupt handler is running on core 0.
>>> I tried providing "-l 0,40-79" as a startup parameter, and this
>>> resolves the issue.
>>>
>>> I must correct my previous statement that this problem only occurs on
>>> SUSE: it occurs on any OS, as long as the startup core list covers
>>> only node 1.
>>>
>>>>>
>>>>>>>
>>>>>>>
>>>>>>> And what about using the 'rte_malloc_socket()' API (instead of
>>>>>>> rte_malloc), which takes 'socket' as a parameter, and providing the
>>>>>>> socket that the device is on as the parameter to this API? Is it
>>>>>>> possible to test this?
>>>>>>>
>>>>>>>
>>>>>> As to the reason for not using rte_malloc_socket(): I thought
>>>>>> rte_malloc_socket() could solve the problem too, and the appropriate
>>>>>> parameter would be the socket_id for which the memory pool was
>>>>>> created at DPDK initialization. Assuming the socket_id of the
>>>>>> initially allocated memory is 1, first let eal-intr-thread determine
>>>>>> whether it is on that socket, then record this socket_id in
>>>>>> eal-intr-thread and pass it to iavf-event-thread. But there seems to
>>>>>> be no way to link this parameter to the iavf_dev_event_post()
>>>>>> function. That is why rte_malloc_socket is not used.
>>>>>>
>>>>>
>>>>> I was thinking the socket id of the device could be used, but that
>>>>> won't help if the core that the interrupt handler runs on is in a
>>>>> different socket.
>>>>> And I also don't know if there is a way to get the socket the
>>>>> interrupt thread is on. @David may help perhaps.
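>>>>>
>>>>> For reference, the kind of change I had in mind is the untested
>>>>> sketch below (assuming 'dev->data->numa_node' holds the device's
>>>>> socket); as said above, it only helps if the threads touching the
>>>>> memory also stay on that socket:
>>>>>
>>>>>     /* In iavf_dev_event_post(): allocate the event element from
>>>>>      * the NUMA node the device is attached to, instead of the
>>>>>      * calling thread's socket. */
>>>>>     struct iavf_event_element *elem =
>>>>>         rte_malloc_socket(NULL, sizeof(*elem) + param_alloc_size,
>>>>>                           0, dev->data->numa_node);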
>>>>>
>>>>> So the question is why the interrupt thread is not running on the
>>>>> main lcore.
>>>>>
>>>>
>>>> OK, after some talk with David, what I was missing is that
>>>> 'rte_ctrl_thread_create()' does NOT run its thread on the main lcore;
>>>> it can run on any core except the data plane cores.
>>>>
>>>> The driver's "iavf-event-thread" thread (iavf_dev_event_handle()) and
>>>> the interrupt thread (so the driver interrupt callback
>>>> iavf_dev_event_post()) can run on any core, making it hard to manage.
>>>> And it seems it is not possible to control where the interrupt thread
>>>> runs.
>>>>
>>>> One option can be allocating hugepages for all sockets, but this
>>>> requires user involvement and can't happen transparently.
>>>>
>>>> The other option can be to control where "iavf-event-thread" runs,
>>>> like using 'rte_thread_create()' to create the thread and providing
>>>> an attribute to run it on the main lcore
>>>> (rte_lcore_cpuset(rte_get_main_lcore()))?
>>>>
>>>> Can you please test the above option?
>>>>
>>>>
>>> The first option can solve this issue, but to borrow your previous
>>> words: "in a dual socket system, if all used cores are in socket 1 and
>>> the NIC is in socket 1, no memory is allocated for socket 0. This is
>>> to optimize memory consumption."
>>> I think it would be unreasonable to do so.
>>>
>>> About the other option: in the 'rte_eal_intr_init()' function, after
>>> the thread is created, I set the thread affinity for eal-intr-thread,
>>> but it does not solve this issue.
>>
>> Hi Kaisen,
>>
>> There are two threads involved.
>>
>> The first one is the interrupt thread, "eal-intr-thread", created by
>> 'rte_eal_intr_init()'.
>>
>> The second one is the iavf event handler, "iavf-event-thread", created
>> by 'iavf_dev_event_handler_init()'.
>>
>> The first one is triggered by an interrupt and puts a message on a
>> list; the second one consumes from the list and processes the message.
>> So I assume the two threads being on different sockets, or memory being
>> allocated on a different socket than the cores run on, causes the
>> performance issue.
>>
>> Did you test with the second thread, "iavf-event-thread", affinitized
>> to the main lcore (by creating the thread using the
>> 'rte_thread_create()' API)?
>>
>>
> I tried using the 'rte_thread_create()' API to create the second
> thread, but this issue still exists.
>
> Because the malloc is executed by "eal-intr-thread", it has nothing
> to do with "iavf-event-thread".
>
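To make sure we tried the same thing, here is roughly what I meant, as
an untested sketch; 'iavf_event_thread_fn', 'iavf_event_thread_start'
and 'iavf_event_tid' are hypothetical names, and a wrapper is needed
because 'rte_thread_create()' expects a uint32_t (*)(void *) function:

    #include <rte_lcore.h>
    #include <rte_thread.h>

    static rte_thread_t iavf_event_tid;  /* hypothetical storage */

    /* Hypothetical wrapper: adapts the existing handler to the
     * rte_thread_func signature expected by rte_thread_create(). */
    static uint32_t
    iavf_event_thread_fn(void *param)
    {
        iavf_dev_event_handle(param);
        return 0;
    }

    static int
    iavf_event_thread_start(void)
    {
        rte_thread_attr_t attr;
        /* Pin "iavf-event-thread" to the main lcore, whose socket is
         * guaranteed to have hugepage memory allocated. */
        rte_cpuset_t cpuset = rte_lcore_cpuset(rte_get_main_lcore());

        rte_thread_attr_init(&attr);
        rte_thread_attr_set_affinity(&attr, &cpuset);
        return rte_thread_create(&iavf_event_tid, &attr,
                                 iavf_event_thread_fn, NULL);
    }
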
Since the 'iavf_event_element' ('pos'), which is allocated by malloc(),
is accessed and freed in "iavf-event-thread", it could be related. But
if it doesn't fix the issue, that is OK; thanks for testing.

> But I found a patch similar to my issue:
> https://patchwork.dpdk.org/project/dpdk/patch/20221221104858.296530-1-david.march...@redhat.com/
> According to the patch modification, this issue can be solved.
>

I guess that patch was inspired by this discussion, and if it fixes the
issue, I prefer that one as the generic solution.

>>>>
>>>>>> Let me know if there is anything else unclear.
>>>>>>>
>>>>>>>> Fixes: cb5c1b91f76f ("net/iavf: add thread for event callbacks")
>>>>>>>> Cc: sta...@dpdk.org
>>>>>>>>
>>>>>>>> Signed-off-by: Kaisen You <kaisenx....@intel.com>
>>>>>>>> ---
>>>>>>>>  drivers/net/iavf/iavf_vchnl.c | 8 +++-----
>>>>>>>>  1 file changed, 3 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/iavf/iavf_vchnl.c
>>>>>>>> b/drivers/net/iavf/iavf_vchnl.c
>>>>>>>> index f92daf97f2..a05791fe48 100644
>>>>>>>> --- a/drivers/net/iavf/iavf_vchnl.c
>>>>>>>> +++ b/drivers/net/iavf/iavf_vchnl.c
>>>>>>>> @@ -36,7 +36,6 @@ struct iavf_event_element {
>>>>>>>>  	struct rte_eth_dev *dev;
>>>>>>>>  	enum rte_eth_event_type event;
>>>>>>>>  	void *param;
>>>>>>>> -	size_t param_alloc_size;
>>>>>>>>  	uint8_t param_alloc_data[0];
>>>>>>>>  };
>>>>>>>>
>>>>>>>> @@ -80,7 +79,7 @@ iavf_dev_event_handle(void *param __rte_unused)
>>>>>>>>  		TAILQ_FOREACH_SAFE(pos, &pending, next, save_next) {
>>>>>>>>  			TAILQ_REMOVE(&pending, pos, next);
>>>>>>>>  			rte_eth_dev_callback_process(pos->dev, pos->event, pos->param);
>>>>>>>> -			rte_free(pos);
>>>>>>>> +			free(pos);
>>>>>>>>  		}
>>>>>>>>  	}
>>>>>>>>
>>>>>>>> @@ -94,14 +93,13 @@ iavf_dev_event_post(struct rte_eth_dev *dev,
>>>>>>>>  {
>>>>>>>>  	struct iavf_event_handler *handler = &event_handler;
>>>>>>>>  	char notify_byte;
>>>>>>>> -	struct iavf_event_element *elem = rte_malloc(NULL, sizeof(*elem) + param_alloc_size, 0);
>>>>>>>> +	struct iavf_event_element *elem = malloc(sizeof(*elem) + param_alloc_size);
>>>>>>>>  	if (!elem)
>>>>>>>>  		return;
>>>>>>>>
>>>>>>>>  	elem->dev = dev;
>>>>>>>>  	elem->event = event;
>>>>>>>>  	elem->param = param;
>>>>>>>> -	elem->param_alloc_size = param_alloc_size;
>>>>>>>>  	if (param && param_alloc_size) {
>>>>>>>>  		rte_memcpy(elem->param_alloc_data, param, param_alloc_size);
>>>>>>>>  		elem->param = elem->param_alloc_data;
>>>>>>>> @@ -165,7 +163,7 @@ iavf_dev_event_handler_fini(void)
>>>>>>>>  	struct iavf_event_element *pos, *save_next;
>>>>>>>>  	TAILQ_FOREACH_SAFE(pos, &handler->pending, next, save_next) {
>>>>>>>>  		TAILQ_REMOVE(&handler->pending, pos, next);
>>>>>>>> -		rte_free(pos);
>>>>>>>> +		free(pos);
>>>>>>>>  	}
>>>>>>>>  }
>>>>>>>>
>>>>>>
>>>>>
>>>
>