Hi chengwen,

> -----Original Message-----
> From: fengchengwen <fengcheng...@huawei.com>
> Sent: Wednesday, March 20, 2024 12:12 PM
> To: Ma, WenwuX <wenwux...@intel.com>; dev@dpdk.org
> Cc: Jiale, SongX <songx.ji...@intel.com>; sta...@dpdk.org; Pavan Nikhilesh
> <pbhagavat...@marvell.com>; Thomas Monjalon <tho...@monjalon.net>
> Subject: Re: [PATCH v2] dmadev: fix structure alignment
> 
> Hi Wenwu,
> 
> On 2024/3/15 17:27, Ma, WenwuX wrote:
> > Hi Chengwen
> >
> >> -----Original Message-----
> >> From: fengchengwen <fengcheng...@huawei.com>
> >> Sent: Friday, March 15, 2024 4:32 PM
> >> To: Ma, WenwuX <wenwux...@intel.com>; dev@dpdk.org
> >> Cc: Jiale, SongX <songx.ji...@intel.com>; sta...@dpdk.org
> >> Subject: Re: [PATCH v2] dmadev: fix structure alignment
> >>
> >> Hi Wenwu,
> >>
> >> On 2024/3/15 15:44, Ma, WenwuX wrote:
> >>> Hi Chengwen,
> >>>
> >>>> -----Original Message-----
> >>>> From: Ma, WenwuX
> >>>> Sent: Friday, March 15, 2024 2:26 PM
> >>>> To: fengchengwen <fengcheng...@huawei.com>; dev@dpdk.org
> >>>> Cc: Jiale, SongX <songx.ji...@intel.com>; sta...@dpdk.org
> >>>> Subject: RE: [PATCH v2] dmadev: fix structure alignment
> >>>>
> >>>> Hi Chengwen,
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: fengchengwen <fengcheng...@huawei.com>
> >>>>> Sent: Friday, March 15, 2024 2:06 PM
> >>>>> To: Ma, WenwuX <wenwux...@intel.com>; dev@dpdk.org
> >>>>> Cc: Jiale, SongX <songx.ji...@intel.com>; sta...@dpdk.org
> >>>>> Subject: Re: [PATCH v2] dmadev: fix structure alignment
> >>>>>
> >>>>> Hi Wenwu,
> >>>>>
> >>>>> On 2024/3/15 9:43, Wenwu Ma wrote:
> >>>>>> The structure rte_dma_dev needs only 8 byte alignment.
> >>>>>> This patch replaces __rte_cache_aligned of rte_dma_dev with
> >>>>>> __rte_aligned(8).
> >>>>>>
> >>>>>> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> >>>>>> Cc: sta...@dpdk.org
> >>>>>>
> >>>>>> Signed-off-by: Wenwu Ma <wenwux...@intel.com>
> >>>>>> ---
> >>>>>> v2:
> >>>>>>  - Because of performance drop, adjust the code to
> >>>>>>    no longer demand cache line alignment
> >>>>>
> >>>>> Which two versions observed performance drop? And which
> benchmark
> >>>>> observed drop?
> >>>>> Could you provide more information?
> >>>>>
> >>>>>>
> >>>> V1 patch:
> >>>>
> >>
> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
> >>>> 1-wenwux...@intel.com/
> >>>>
> >>>> To view detailed results, visit:
> >>>> https://lab.dpdk.org/results/dashboard/patchsets/29472/
> >>>>
> >>>>>> ---
> >>>>>>  lib/dmadev/rte_dmadev_pmd.h | 2 +-
> >>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/lib/dmadev/rte_dmadev_pmd.h
> >>>>> b/lib/dmadev/rte_dmadev_pmd.h
> >>>>>> index 58729088ff..b569bb3502 100644
> >>>>>> --- a/lib/dmadev/rte_dmadev_pmd.h
> >>>>>> +++ b/lib/dmadev/rte_dmadev_pmd.h
> >>>>>> @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
> >>>>>>   * @internal
> >>>>>>   * The generic data structure associated with each DMA device.
> >>>>>>   */
> >>>>>> -struct __rte_cache_aligned rte_dma_dev {
> >>>>>> +struct __rte_aligned(8) rte_dma_dev {
> >>>>>
> >>>>> The DMA fast-path was implemented by struct rte_dma_fp_objs, which
> >>>>> is not rte_dma_dev? So why is it a problem here?
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>> The DMA device object is expected to align cache line, so clang
> >>>> will use “vmovaps” assembly instruction,
> >>>>
> >>>> And the instruction demands 16 bytes alignment or will cause
> >>>> segment fault in some environments.
> >>>>
> >>> Test case:
> >>> 1. compile dpdk
> >>> rm -rf x86_64-native-linuxapp-clang
> >>> CC=clang meson -Denable_kmods=True -Dlibdir=lib
> >>> --default-library=static x86_64-native-linuxapp-clang ninja -C
> >>> x86_64-native-linuxapp-clang -j 72 2. start dpdk-test
> >>> /root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39
> >>> --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3 (Note:
> >>> If it cannot be reproduced, please try using a different core)
> >>> 3. exit dpdk-test
> >>> RTE>>quit
> >>> Segmentation fault (core dumped)
> 
> I reproduce it just with --vdev=dma_skeleton.
> When execute quit command, it will invoke rte_dma_close->dma_release, pls
> see my annotations (//) below:
> 
> void
> dma_release(struct rte_dma_dev *dev)
> {
>       if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
>               rte_free(dev->data->dev_private);
>               memset(dev->data, 0, sizeof(struct rte_dma_dev_data));
>       }
> 
>       dma_fp_object_dummy(dev->fp_obj);
>       memset(dev, 0, sizeof(struct rte_dma_dev));   // this memset was
> compiles using vmovaps, its
>                                               //  8c24da:       c5 f8 57 c0
> vxorps %xmm0,%xmm0,%xmm0
>                                               //  8c24de:       c5 fc 29 43 20
> vmovaps %ymm0,0x20(%rbx)
>                                               //  8c24e3:       c5 fc 29 03
> vmovaps %ymm0,(%rbx)
>                                               // but the dev is not align 16B
> (in my env the rte_dma_devices addr is 0x15d39950) }
> 
> >>
> >> I will try to reproduce, but still a question: does above test has
> >> already merged your patch [1] or the current main branch code has this
> problem?
> >>
> >> [1]
> >>
> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
> >> 1-wenwux...@intel.com/
> >>
> >> Thanks
> >>
> > the current main branch code has this problem.
> >
> > Both patch v1 and v2 are able to solve this problem, but v1 has a
> performance issue.
> 
> The performance issue is ethdev benchmark, it will not invoke any dmadev
> API, I don't think these two has any relations.
> 
> So I prefer v1, Plus Pavan also submit a commit [1] to align the struct, but 
> it
> was not a fix for clang-x86-platform.
> 
The performance issue is subtle, as it doesn't occur in the v2 patch. 
So, maybe it needs more investigation.

> [1] https://lore.kernel.org/all/20240210062758.1510-1-
> pbhagavat...@marvell.com/T/
> 
> >
> >>>
> >>>>
> >>>>>>        /** Device info which supplied during device initialization. */
> >>>>>>        struct rte_device *device;
> >>>>>>        struct rte_dma_dev_data *data; /**< Pointer to shared device
> data.
> >>>>>> */
> >>>>>>
> 
> What more, could you please send v3? I hope it will contain the root cause and
> optional solutions of the segment fault problem.
> 
I will submit v3 patch later.

> BTW: dmadev is the first one which dynamic alloc dmadev struct, later maybe
> more xxxdev will use this type, I think that's typical.
>      Maybe we should add a such mem_align() function in eal library, but this
> could done later.
> 
> Thanks

Reply via email to