> -----Original Message-----
> From: Maxime Coquelin <maxime.coque...@redhat.com>
> Sent: Tuesday, April 28, 2020 11:40 PM
> To: Liu, Yong <yong....@intel.com>; Ye, Xiaolong <xiaolong...@intel.com>;
> Wang, Zhihong <zhihong.w...@intel.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; jer...@marvell.com
> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path
>
>
>
> On 4/28/20 5:35 PM, Liu, Yong wrote:
> >
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >> Sent: Tuesday, April 28, 2020 10:50 PM
> >> To: Liu, Yong <yong....@intel.com>; Ye, Xiaolong
> <xiaolong...@intel.com>;
> >> Wang, Zhihong <zhihong.w...@intel.com>
> >> Cc: dev@dpdk.org; Honnappa Nagarahalli
> >> <honnappa.nagaraha...@arm.com>; jer...@marvell.com
> >> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx
> path
> >>
> >>
> >>
> >> On 4/28/20 4:43 PM, Liu, Yong wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >>>> Sent: Tuesday, April 28, 2020 9:46 PM
> >>>> To: Liu, Yong <yong....@intel.com>; Ye, Xiaolong
> >> <xiaolong...@intel.com>;
> >>>> Wang, Zhihong <zhihong.w...@intel.com>
> >>>> Cc: dev@dpdk.org; Honnappa Nagarahalli
> >>>> <honnappa.nagaraha...@arm.com>; jer...@marvell.com
> >>>> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx
> >> path
> >>>>
> >>>>
> >>>>
> >>>> On 4/28/20 3:01 PM, Liu, Yong wrote:
> >>>>>>> Maxime,
> >>>>>>> Thanks for pointing it out; it will add an extra cache miss in the
> >>>>>>> datapath, and its impact on performance is around 1% in the loopback
> >>>>>>> case.
> >>>>>> Ok, thanks for doing the test. I'll try to run some PVP benchmarks
> >>>>>> on my side, because with IO loopback the cache pressure is much
> >>>>>> lower.
> >>>>>>
> >>>>>>> While the benefit of the vectorized path will be more than that
> >>>>>>> number.
> >>>>>> Ok, but I disagree for two reasons:
> >>>>>> 1. You have to keep in mind that non-vectorized is the default and
> >>>>>> encouraged mode to use. Indeed, the vectorized path takes a lot of
> >>>>>> shortcuts, like not checking the header length (so no error stats),
> >>>>>> etc.
> >>>>>>
> >>>>> Ok, I will keep the non-vectorized path the same as before.
> >>>>>
> >>>>>> 2. It's like saying it's OK that it degrades by 5% on $CPU_VENDOR_A
> >>>>>> because the gain is 20% on $CPU_VENDOR_B.
> >>>>>>
> >>>>>> In case we see more degradation in a real-world scenario, you might
> >>>>>> want to consider using ifdefs to avoid adding the padding in the
> >>>>>> non-vectorized case, like you did to differentiate the Virtio PMD
> >>>>>> from the Virtio-user PMD in patch 7.
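> >>>>>>
> >>>>>> As an illustration only (the structure, fields and guard macro below
> >>>>>> are hypothetical, not taken from the series), such a build-time
> >>>>>> switch could keep the extra padding out of builds that do not carry
> >>>>>> the vectorized datapath:
> >>>>>>
> >>>>>> #include <stdint.h>
> >>>>>> #include <rte_common.h>
> >>>>>>
> >>>>>> struct desc_extra_example {
> >>>>>>         void     *cookie;
> >>>>>>         uint16_t  ndescs;
> >>>>>>         uint16_t  next;
> >>>>>> #ifdef VECTORIZED_DATAPATH      /* hypothetical build-time switch */
> >>>>>>         /* pad to one entry per cache line for the vectorized loads */
> >>>>>>         uint8_t   pad[RTE_CACHE_LINE_SIZE - 12];
> >>>>>> #endif
> >>>>>> };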
> >>>>>>
> >>>>> Maxime,
> >>>>> The performance difference is so slight that I ignored it; it looked
> >>>>> like a sampling error.
> >>>>
> >>>> Agreed for IO loopback, but it adds one more cache line access per
> >>>> burst, which might be visible in some real-life use cases.
> >>>>
> >>>>> It may not be suitable to add a new configuration option for a
> >>>>> setting which is only used inside the driver.
> >>>>
> >>>> Wait, the Virtio-user #ifdef is based on the defconfig options? How
> >>>> can it work, since both the Virtio PMD and the Virtio-user PMD can be
> >>>> selected at the same time?
> >>>>
> >>>> I thought it was a define set before the headers inclusion and unset
> >>>> afterwards, but I didn't check carefully.
> >>>>
> >>>
> >>> Maxime,
> >>> The difference between virtio PMD and Virtio-user PMD addresses is
> >>> handled by vq->offset.
> >>>
> >>> When virtio PMD is running, offset will be set to buf_iova.
> >>> vq->offset = offsetof(struct rte_mbuf, buf_iova);
> >>>
> >>> When virtio_user PMD is running, offset will be set to buf_addr.
> >>> vq->offset = offsetof(struct rte_mbuf, buf_addr);
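> >>>
> >>> To illustrate the setup side described above (a sketch only; struct
> >>> virtqueue is the driver-internal type from virtqueue.h, and the
> >>> is_virtio_user parameter is a placeholder for however the backend is
> >>> detected):
> >>>
> >>> #include <stdbool.h>
> >>> #include <stddef.h>
> >>> #include <rte_mbuf.h>
> >>>
> >>> static void
> >>> vq_set_mbuf_addr_offset(struct virtqueue *vq, bool is_virtio_user)
> >>> {
> >>>         if (is_virtio_user)
> >>>                 /* virtio-user works on virtual addresses */
> >>>                 vq->offset = offsetof(struct rte_mbuf, buf_addr);
> >>>         else
> >>>                 /* the PCI virtio PMD hands the device an IOVA */
> >>>                 vq->offset = offsetof(struct rte_mbuf, buf_iova);
> >>> }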
> >>
> >> Ok, but below is a build-time check:
> >>
> >> +#ifdef RTE_VIRTIO_USER
> >> + __m128i flag_offset = _mm_set_epi64x(flags_temp, (uint64_t)vq->offset);
> >> +#else
> >> + __m128i flag_offset = _mm_set_epi64x(flags_temp, 0);
> >> +#endif
> >>
> >> So how can it work in a single build for both Virtio and Virtio-user?
> >>
> >
> > Sorry, this is an implementation error. vq->offset should be used in
> > descs_base for getting the iova address.
> > It will work the same as the VIRTIO_MBUF_ADDR macro.
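> >
> > For reference, a rough sketch of what "work the same as VIRTIO_MBUF_ADDR"
> > means (illustrative only, not the final patch; the helper name is made
> > up): the address is read through vq->offset at run time instead of being
> > selected by the compile-time #ifdef quoted above.
> >
> > #include <stdint.h>
> > #include <rte_mbuf.h>
> >
> > static inline uint64_t
> > vq_mbuf_desc_addr(const struct virtqueue *vq, const struct rte_mbuf *mb)
> > {
> >         /* vq->offset is offsetof(struct rte_mbuf, buf_iova) for the
> >          * virtio PMD and offsetof(struct rte_mbuf, buf_addr) for
> >          * virtio-user, so one load covers both in a single build.
> >          */
> >         return *(const uint64_t *)((uintptr_t)mb + vq->offset);
> > }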
> >
> >>>>> The virtio driver can check whether the virtqueue is using the
> >>>>> vectorized path at initialization time, and will use the padded
> >>>>> structure if it is.
> >>>>> I have added some test code and now the performance came back. Since
> >>>>> the code has changed in the initialization process, it needs some time
> >>>>> for regression checking.
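> >>>>>
> >>>>> A minimal sketch of that check, with placeholder names (the flag and
> >>>>> helper below are hypothetical, not the actual code):
> >>>>>
> >>>>> static void
> >>>>> virtio_queue_init_layout(struct virtqueue *vq, bool use_vec_path)
> >>>>> {
> >>>>>         /* decided once at queue init: only the vectorized path pays
> >>>>>          * for the padded element layout, the default path stays
> >>>>>          * unchanged.
> >>>>>          */
> >>>>>         vq->vq_padded_layout = use_vec_path ? 1 : 0;
> >>>>> }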
> >>>>
> >>>> Ok, works for me.
> >>>>
> >>>> I am investigating a linkage issue with your series, which does not
> >>>> happen systematically (see below, it also happens with clang). David
> >>>> pointed me to some Intel patches removing the usage of __rte_weak;
> >>>> could it be related?
> >>>>
> >>>
> >>> I checked David's patch; it only changed the i40e driver. Meanwhile,
> >>> the __rte_weak attribute should still be in virtio_rxtx.c.
> >>> I will follow David's patch and eliminate the usage of the weak
> >>> attribute.
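> >>>
> >>> For context, a minimal example of the __rte_weak pattern being
> >>> discussed (simplified; the function name follows the series' Rx entry
> >>> point, so treat it as illustrative): a weak default stub that an
> >>> AVX512-built object is expected to override with a strong symbol. If
> >>> the strong symbol is dropped (for instance when the binutils check
> >>> disables AVX512), the linker silently falls back to the stub, which is
> >>> also where hard-to-reproduce link issues can hide.
> >>>
> >>> #include <rte_common.h>
> >>> #include <rte_mbuf.h>
> >>>
> >>> __rte_weak uint16_t
> >>> virtio_recv_pkts_packed_vec(void *rx_queue __rte_unused,
> >>>                             struct rte_mbuf **rx_pkts __rte_unused,
> >>>                             uint16_t nb_pkts __rte_unused)
> >>> {
> >>>         return 0;
> >>> }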
> >>
> >> Yeah, I meant the below issue could be linked to __rte_weak, not that
> >> the i40e patch was the cause of this problem.
> >>
> >
> > Maxime,
> > I haven't seen any build issue related to __rte_weak with either gcc or
> > clang.
>
> Note that this build failure (which does not happen systematically) occurs
> when using binutils 2.30, which causes AVX512 support to be disabled.
>
I just changed to binutils 2.30, and the AVX512 code is skipped as expected
in the meson build.
Could you please supply more information? I will try to reproduce it.
> > Thanks,
> > Marvin
> >