Hi chengwen, > -----Original Message----- > From: fengchengwen <fengcheng...@huawei.com> > Sent: Wednesday, March 20, 2024 12:12 PM > To: Ma, WenwuX <wenwux...@intel.com>; dev@dpdk.org > Cc: Jiale, SongX <songx.ji...@intel.com>; sta...@dpdk.org; Pavan Nikhilesh > <pbhagavat...@marvell.com>; Thomas Monjalon <tho...@monjalon.net> > Subject: Re: [PATCH v2] dmadev: fix structure alignment > > Hi Wenwu, > > On 2024/3/15 17:27, Ma, WenwuX wrote: > > Hi Chengwen > > > >> -----Original Message----- > >> From: fengchengwen <fengcheng...@huawei.com> > >> Sent: Friday, March 15, 2024 4:32 PM > >> To: Ma, WenwuX <wenwux...@intel.com>; dev@dpdk.org > >> Cc: Jiale, SongX <songx.ji...@intel.com>; sta...@dpdk.org > >> Subject: Re: [PATCH v2] dmadev: fix structure alignment > >> > >> Hi Wenwu, > >> > >> On 2024/3/15 15:44, Ma, WenwuX wrote: > >>> Hi Chengwen, > >>> > >>>> -----Original Message----- > >>>> From: Ma, WenwuX > >>>> Sent: Friday, March 15, 2024 2:26 PM > >>>> To: fengchengwen <fengcheng...@huawei.com>; dev@dpdk.org > >>>> Cc: Jiale, SongX <songx.ji...@intel.com>; sta...@dpdk.org > >>>> Subject: RE: [PATCH v2] dmadev: fix structure alignment > >>>> > >>>> Hi Chengwen, > >>>> > >>>>> -----Original Message----- > >>>>> From: fengchengwen <fengcheng...@huawei.com> > >>>>> Sent: Friday, March 15, 2024 2:06 PM > >>>>> To: Ma, WenwuX <wenwux...@intel.com>; dev@dpdk.org > >>>>> Cc: Jiale, SongX <songx.ji...@intel.com>; sta...@dpdk.org > >>>>> Subject: Re: [PATCH v2] dmadev: fix structure alignment > >>>>> > >>>>> Hi Wenwu, > >>>>> > >>>>> On 2024/3/15 9:43, Wenwu Ma wrote: > >>>>>> The structure rte_dma_dev needs only 8 byte alignment. > >>>>>> This patch replaces __rte_cache_aligned of rte_dma_dev with > >>>>>> __rte_aligned(8). > >>>>>> > >>>>>> Fixes: b36970f2e13e ("dmadev: introduce DMA device library") > >>>>>> Cc: sta...@dpdk.org > >>>>>> > >>>>>> Signed-off-by: Wenwu Ma <wenwux...@intel.com> > >>>>>> --- > >>>>>> v2: > >>>>>> - Because of performance drop, adjust the code to > >>>>>> no longer demand cache line alignment > >>>>> > >>>>> Which two versions observed performance drop? And which > benchmark > >>>>> observed drop? > >>>>> Could you provide more information? > >>>>> > >>>>>> > >>>> V1 patch: > >>>> > >> > https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154- > >>>> 1-wenwux...@intel.com/ > >>>> > >>>> To view detailed results, visit: > >>>> https://lab.dpdk.org/results/dashboard/patchsets/29472/ > >>>> > >>>>>> --- > >>>>>> lib/dmadev/rte_dmadev_pmd.h | 2 +- > >>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>>>>> > >>>>>> diff --git a/lib/dmadev/rte_dmadev_pmd.h > >>>>> b/lib/dmadev/rte_dmadev_pmd.h > >>>>>> index 58729088ff..b569bb3502 100644 > >>>>>> --- a/lib/dmadev/rte_dmadev_pmd.h > >>>>>> +++ b/lib/dmadev/rte_dmadev_pmd.h > >>>>>> @@ -122,7 +122,7 @@ enum rte_dma_dev_state { > >>>>>> * @internal > >>>>>> * The generic data structure associated with each DMA device. > >>>>>> */ > >>>>>> -struct __rte_cache_aligned rte_dma_dev { > >>>>>> +struct __rte_aligned(8) rte_dma_dev { > >>>>> > >>>>> The DMA fast-path was implemented by struct rte_dma_fp_objs, which > >>>>> is not rte_dma_dev? So why is it a problem here? > >>>>> > >>>>> Thanks > >>>>> > >>>> The DMA device object is expected to align cache line, so clang > >>>> will use “vmovaps” assembly instruction, > >>>> > >>>> And the instruction demands 16 bytes alignment or will cause > >>>> segment fault in some environments. > >>>> > >>> Test case: > >>> 1. compile dpdk > >>> rm -rf x86_64-native-linuxapp-clang > >>> CC=clang meson -Denable_kmods=True -Dlibdir=lib > >>> --default-library=static x86_64-native-linuxapp-clang ninja -C > >>> x86_64-native-linuxapp-clang -j 72 2. start dpdk-test > >>> /root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39 > >>> --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3 (Note: > >>> If it cannot be reproduced, please try using a different core) > >>> 3. exit dpdk-test > >>> RTE>>quit > >>> Segmentation fault (core dumped) > > I reproduce it just with --vdev=dma_skeleton. > When execute quit command, it will invoke rte_dma_close->dma_release, pls > see my annotations (//) below: > > void > dma_release(struct rte_dma_dev *dev) > { > if (rte_eal_process_type() == RTE_PROC_PRIMARY) { > rte_free(dev->data->dev_private); > memset(dev->data, 0, sizeof(struct rte_dma_dev_data)); > } > > dma_fp_object_dummy(dev->fp_obj); > memset(dev, 0, sizeof(struct rte_dma_dev)); // this memset was > compiles using vmovaps, its > // 8c24da: c5 f8 57 c0 > vxorps %xmm0,%xmm0,%xmm0 > // 8c24de: c5 fc 29 43 20 > vmovaps %ymm0,0x20(%rbx) > // 8c24e3: c5 fc 29 03 > vmovaps %ymm0,(%rbx) > // but the dev is not align 16B > (in my env the rte_dma_devices addr is 0x15d39950) } > > >> > >> I will try to reproduce, but still a question: does above test has > >> already merged your patch [1] or the current main branch code has this > problem? > >> > >> [1] > >> > https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154- > >> 1-wenwux...@intel.com/ > >> > >> Thanks > >> > > the current main branch code has this problem. > > > > Both patch v1 and v2 are able to solve this problem, but v1 has a > performance issue. > > The performance issue is ethdev benchmark, it will not invoke any dmadev > API, I don't think these two has any relations. > > So I prefer v1, Plus Pavan also submit a commit [1] to align the struct, but > it > was not a fix for clang-x86-platform. > The performance issue is subtle, as it doesn't occur in the v2 patch. So, maybe it needs more investigation.
> [1] https://lore.kernel.org/all/20240210062758.1510-1- > pbhagavat...@marvell.com/T/ > > > > >>> > >>>> > >>>>>> /** Device info which supplied during device initialization. */ > >>>>>> struct rte_device *device; > >>>>>> struct rte_dma_dev_data *data; /**< Pointer to shared device > data. > >>>>>> */ > >>>>>> > > What more, could you please send v3? I hope it will contain the root cause and > optional solutions of the segment fault problem. > I will submit v3 patch later. > BTW: dmadev is the first one which dynamic alloc dmadev struct, later maybe > more xxxdev will use this type, I think that's typical. > Maybe we should add a such mem_align() function in eal library, but this > could done later. > > Thanks