> -----Original Message-----
> From: Ray Kinsella <m...@ashroe.eu>
> Sent: Monday, November 4, 2019 21:49
> To: Thomas Monjalon <tho...@monjalon.net>
> Cc: dev@dpdk.org; Yigit, Ferruh <ferruh.yi...@intel.com>; 'Damjan Marion' <dmar...@me.com>; Wang, Haiyue <haiyue.w...@intel.com>; Jerin Jacob Kollanukkaran <jer...@marvell.com>; viachesl...@mellanox.com; step...@networkplumber.org; arybche...@solarflare.com
> Subject: Re: [dpdk-dev] [PATCH v4 1/4] ethdev: add the API for getting burst mode information
>
>
>
> On 04/11/2019 13:09, Thomas Monjalon wrote:
> > 04/11/2019 13:07, Ray Kinsella:
> >>
> >> On 04/11/2019 11:30, Thomas Monjalon wrote:
> >>> 04/11/2019 11:03, Ray Kinsella:
> >>>> On 04/11/2019 09:54, Thomas Monjalon wrote:
> >>>>> 04/11/2019 10:49, Ray Kinsella:
> >>>>>> On 03/11/2019 22:41, Thomas Monjalon wrote:
> >>>>>>> 03/11/2019 21:35, Ray Kinsella:
> >>>>>>>> On 29/10/2019 14:27, Ferruh Yigit wrote:
> >>>>>>>>> On 10/26/2019 5:23 PM, Thomas Monjalon wrote:
> >>>>>>>>>> 26/10/2019 11:23, Wang, Haiyue:
> >>>>>>>>>>> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> >>>>>>>>>>>> 26/10/2019 06:40, Wang, Haiyue:
> >>>>>>>>>>>>> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> >>>>>>>>>>>>>> 25/10/2019 18:02, Jerin Jacob:
> >>>>>>>>>>>>>>> On Fri, Oct 25, 2019 at 9:15 PM Thomas Monjalon <tho...@monjalon.net> wrote:
> >>>>>>>>>>>>>>>> 25/10/2019 16:08, Ferruh Yigit:
> >>>>>>>>>>>>>>>>> On 10/25/2019 10:36 AM, Thomas Monjalon wrote:
> >>>>>>>>>>>>>>>>>> 15/10/2019 09:51, Haiyue Wang:
> >>>>>>>>>>>>>>>>>>> Some PMDs have more than one RX/TX burst path; add the ethdev API that allows an application to retrieve the mode information about the Rx/Tx packet burst, such as Scalar or Vector, and the Vector technology used, like AVX2.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I missed this patch. Andrew and I, the maintainers of ethdev, were not CC'ed.
> >>>>>>>>>>>>>>>>>> Ferruh, I would expect to be CC'ed and/or get a notification before merging.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> It has been discussed on the mailing list and went through multiple rounds of discussion; the patch has been out since August. +1 to CC all maintainers, I missed that part, but when the patch is reviewed and there is no objection, why block the merge?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I'm not saying we should block the merge.
> >>>>>>>>>>>>>>>> My bad is that I missed the patch, and I am asking for help with a notification in this case. Same for Andrew, I guess.
> >>>>>>>>>>>>>>>> Note: it is merged in master and I am looking to improve this feature.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> +/**
> >>>>>>>>>>>>>>>>>>> + * Ethernet device RX/TX queue packet burst mode information structure.
> >>>>>>>>>>>>>>>>>>> + * Used to retrieve information about packet burst mode setting.
> >>>>>>>>>>>>>>>>>>> + */
> >>>>>>>>>>>>>>>>>>> +struct rte_eth_burst_mode {
> >>>>>>>>>>>>>>>>>>> +	uint64_t options;
> >>>>>>>>>>>>>>>>>>> +};
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Why a struct for an integer?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Again, by a request from me, so we do not need to break the API if we need to add more things in the future.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I would replace it with a string. This is the most flexible API.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> IMO, probably the best of both worlds makes a good option here, as Haiyue suggested, if we have an additional dev_specific[1] in the structure; and when passed to the application, let common code make the final string as (options flags to string + dev_specific).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The options flags can be zero if a PMD does not have any generic flags nor is interested in such a scheme.
> >>>>>>>>>>>>>>> Generic flags will help at least to have some common code.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>> struct rte_eth_burst_mode {
> >>>>>>>>>>>>>>>         uint64_t options;
> >>>>>>>>>>>>>>>         char dev_specific[128]; /* PMD has specific burst mode information */
> >>>>>>>>>>>>>>> };
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I really don't see how we can have generic flags.
> >>>>>>>>>>>>>> The flags which are proposed just match the functions implemented in the Intel PMDs.
> >>>>>>>>>>>>>> And this is a complicated solution.
> >>>>>>>>>>>>>> Why not just return a name for the selected Rx/Tx mode?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Intel PMDs use the *generic* methods like x86 SSE, AVX2, ARM NEON, PPC ALTIVEC, 'dev->data->scattered_rx' etc. for the target: "DPDK is the Data Plane Development Kit that consists of libraries to accelerate packet processing workloads running on a wide variety of CPU architectures."
> >>>>>>>>>>>>
> >>>>>>>>>>>> How are RTE_ETH_BURST_SCATTERED and RTE_ETH_BURST_BULK_ALLOC generic?
> >>>>>>>>>>>> They just match some features of the Intel PMDs.
> >>>>>>>>>>>> Why not expose other optimizations of the Rx/Tx implementations?
> >>>>>>>>>>>> You totally missed the point of a generic burst mode description.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> If I understand these new experimental APIs from the above, then bit options are the best, and we didn't invent new words to describe them, just took them from the CPU & other *generic* technology. And the application can loop to check which kind of burst is running by a simple bit test.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If PMDs missed these, they can update them in future roadmaps to enhance their PMDs, like MLX5 supporting ARM NEON and x86 SSE.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I have no words!
> >>>>>>>>>>>> You really think other PMDs should learn from Intel how to "enhance" their PMD?
> >>>>>>>>>>>> You talk about mlx5; did you look at its code? Did you see the burst modes depending on which specific hardware path is used (MPRQ, EMPW, inline)? Or depending on which offloads are handled?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Again, the instruction set used by the function is a small part of the burst mode optimization.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So you did not reply to my question:
> >>>>>>>>>>>> Why not just return a name for the selected Rx/Tx mode?
> >>>>>>>>>>>
> >>>>>>>>>>> In fact, RFC v1/v2 returned the *name*, but the *name* is hard for the application to do further processing on (strcmp, strstr?). Not so nice for C code, and not so standard, so I switched it to bit definitions.
> >>>>>>>>>>
> >>>>>>>>>> Again, please answer my question: why do you need it?
> >>>>>>>>>> I think it is just informative; that's why a string should be enough.
> >>>>>>>>>> I am clearly against the bitmap because it is way too restrictive.
> >>>>>>>>>> I disagree that knowing it is using AVX2 or AVX512 is so interesting.
> >>>>>>>>>> What you would like to know is whether it is processing packets 4 by 4, for instance, or to know which offload is supported, or what hardware trick is used in the datapath design.
> >>>>>>>>>> There are so many options in a datapath design that it cannot be represented with a bitmap. And it makes no sense to have some design criteria more important than others.
> >>>>>>>>>> I Cc an Intel architect (Edwin) who can explain to you how much more complicated a datapath design is than just using AVX instructions.
> >>>>>>>>>
> >>>>>>>>> As I understand it, this is to let applications make an informed decision based on what vectorization is used in the driver; currently this is not known by the application.
> >>>>>>>>>
> >>>>>>>>> And as previously replied, the main target of the API is to define the vector path, not all optimizations, so the number is limited.
> >>>>>>>
> >>>>>>> No!
> >>>>>>> The name of this API is "burst mode information", not "vector instructions used".
> >>>>>>> I think the main error is that in Intel PMDs, each Rx/Tx function uses different vector instructions.
> >>>>>>> So you generalize that knowing the vector instructions will give you good information about the performance.
> >>>>>>> But this is generally wrong!
> >>>>>>> The right level of info is much more complex.
> >>>>>>
> >>>>>> I don't think anyone was suggesting limiting it to purely describing PMD optimization with vector instructions. If there are other commonalities, let's describe those also.
> >>>>>>
> >>>>>> Vectorization was thought to be a good starting point - IMHO it is.
> >>>>>>
> >>>>>>>
> >>>>>>>>> There are many optimizations in the data path. I agree we may not represent all of them, and agreed that the existing enum having "RTE_ETH_BURST_BULK_ALLOC" and similar is causing this confusion; perhaps we can remove them.
> >>>>>>>>>
> >>>>>>>>> And if the requirement from the application is just informative, I would agree that a free-text string will be better. Right now 'rte_eth_rx/tx_burst_mode_get()' is the main API to provide the information and 'rte_eth_burst_mode_option_name()' is a helper for the application/driver to log this information.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Well look, we have a general deficit of information about what is happening under the covers in DPDK.
> >>>>>>>> The end user may get wildly different performance characteristics based on the DPDK configuration. A simple example: using flow director causes the i40e PMD to switch to a scalar code path, and performance may drop by as much as half.
> >>>>>>>>
> >>>>>>>> This can cause no end of head-scratching in consuming products; I have done some of that head-scratching myself. It is a usability nightmare.
> >>>>>>>>
> >>>>>>>> FD.io VPP tries to work around this by mining the call stack, to give the user _some_ kind of information about what is happening. These kinds of heroics should not be necessary.
> >>>>>>>>
> >>>>>>>> For exactly the same reasons as telemetry, we should be trying to give the users as much information as possible, in as standard a format as possible. Otherwise DPDK becomes arcane, leaving the user running gdb to understand what is going on, as I frequently do.
> >>>>>>>
> >>>>>>> I agree we must provide a clue to understand the performance result.
> >>>>>>> As Stephen commented at the very beginning, a log is enough for such debugging.
> >>>>>>> But his comment was ignored.
> >>>>>>
> >>>>>> Do we expect applications built on DPDK to have to grep its log to make such discoveries?
> >>>>>> It's a very brittle and arcane way to provide information, if nothing else.
> >>>>>>
> >>>>>>> You wanted an API, fine.
> >>>>>>> I am OK with having an API to request info which is also in the logs.
> >>>>>>
> >>>>>> I would point out that an API to query metadata is common practice elsewhere. GStreamer GstCaps and Linux sysfs are the closest examples I can think of.
> >>>>>>
> >>>>>>>
> >>>>>>>> Finally, again for the same reasons as telemetry, I would say that machine-readable is the ideal here.
> >>>>>>>
> >>>>>>> I disagree here. There is no need to make this info machine-readable.
> >>>>>>> We want a clue about the optimizations, which are all about creativity.
> >>>>>>> And we cannot make developers' creativity "machine readable".
> >>>>>>
> >>>>>> I am more concerned about the creativity in how developers describe optimizations.
> >>>>>> If there is no standardization of strings (or bits), the API will be challenging to use.
> >>>>>
> >>>>> No, it won't be challenging, because it will be just a string to print.
> >>>>
> >>>> Well, the challenge is getting everyone to use the same set of strings, such that what is returned by the API has common meaning.
> >>>>
> >>>> I am fine with strings.
> >>>> So long as we have a method of encouraging folks to use a standard set where possible.
> >>>
> >>> I don't understand why you insist on standardizing.
> >>> Every driver is different.
> >>> The routine names will make sense only in the context of the driver.
> >>
> >> The more diversity in description, the more the user is reaching for documentation.
> >> If the user is lucky enough that the description has documentation.
> >
> > I would go even further:
> > The documentation will not explain each Rx/Tx routine.
> > The user can make some guesses, but the main information is seeing that the routine changed from one run to the other, so he can expect a change in the performance result.
> > And as a bonus, he can ask the maintainers for more explanation by giving the routine name he got from the API.
> >
> >> We can argue about this indefinitely, instead of proposing a standard. :-)
> >> The best way to do this is to leave the API as experimental for some period, as Haiyue suggests.
> >> And then review as the API drops experimental status, with a view to standardizing if possible?
> >
> > Yes, we can argue indefinitely.
> > My position is against standardizing this information.
> > (not even talking about the ABI breaks it could cause)
>
> So here is my proposed work-around.
>
> We are probably too early to describe any commonalities beyond vectorization. Vectorization is seen as too Intel-specific.
>
> Instead, let's go with the PMD-specific strings for the v20 ABI.
> Let's review the PMD-specific strings that emerge during v20 + Experimental, and see if it is possible to start to standardize for the v21 ABI?
>
> My position is that some degree of standardization is necessary here, for usability if for no other reason.
> I am happy to give it time for that to emerge instead of trying to dictate it.
>
> So go with PMD-specific strings, and review after 1 year to see if we can improve?
>
Hi Experts,

Please give feedback on the new patch. :-)
http://patchwork.dpdk.org/patch/62368/

struct rte_eth_burst_mode {
	uint64_t flags; /**< The ORed values of RTE_ETH_BURST_FLAG_xxx */

#define RTE_ETH_BURST_MODE_INFO_SIZE 1024 /**< Maximum size for information */
	char info[RTE_ETH_BURST_MODE_INFO_SIZE]; /**< burst mode information */
};

If most PMDs want to follow some rule to format the string, maybe a new flag can be defined, like RTE_ETH_BURST_FLAG_FMT_BY_XXX; then the 'info' string follows some kind of common format. An old application can still just *display* it, and a new application can parse it by the known format indicated by RTE_ETH_BURST_FLAG_FMT_BY_XXX.

Do you buy in to this design?

> >
> >>>>> The challenge is trying to fix the design characteristics in an API.
> >>>>
> >>>> I thought Haiyue's patch, with a fair degree of input from Ferruh and others, is a pretty solid start.
> >>>> Let's describe those commonalities that _do_ exist today - it may not be enough, but it's better than what we had.
> >>>
> >>> The initial requirement is to describe what makes the performance change from one routine to the other.
> >>> We don't want to describe the commonalities but the specific differences.
> >>> Please let's focus on the real requirement and build an API which really helps.
> >>> As said several times, a PMD-specific string would be a good API.
> >
> >
>
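
For illustration only, here is a minimal sketch of how an application could consume the structure proposed above. It assumes the rte_eth_rx_burst_mode_get() prototype discussed earlier in this thread (port id, queue id, output struct); the flag name mentioned in the comment is only the placeholder from the proposal, not a defined value.

/* Minimal sketch, not part of the patch: query and display the Rx burst
 * mode of one queue, assuming the rte_eth_rx_burst_mode_get() prototype
 * discussed in this thread.
 */
#include <stdio.h>
#include <string.h>

#include <rte_ethdev.h>

static void
show_rx_burst_mode(uint16_t port_id, uint16_t queue_id)
{
	struct rte_eth_burst_mode mode;

	memset(&mode, 0, sizeof(mode));
	if (rte_eth_rx_burst_mode_get(port_id, queue_id, &mode) != 0) {
		printf("port %u rxq %u: burst mode information not available\n",
		       port_id, queue_id);
		return;
	}

	/* An old application can simply display the free-form string. */
	printf("port %u rxq %u burst mode: %s\n", port_id, queue_id, mode.info);

	/* A newer application could additionally test mode.flags, e.g. a
	 * (hypothetical) RTE_ETH_BURST_FLAG_FMT_BY_XXX bit telling it that
	 * the info string follows a known format and can be parsed further.
	 */
}

The flags stay purely additive in this sketch: an application that ignores them behaves exactly like one written against a string-only API.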
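
On the driver side, filling the structure could stay equally small. Again only a sketch, assuming the internal callback keeps the (dev, queue_id, mode) shape used in the patch series; the flag value and the description text are invented for the example.

/* Sketch of a PMD-side burst_mode_get callback filling the proposed
 * structure; the description text is made up for the example.
 */
#include <stdio.h>

#include <rte_common.h>
#include <rte_ethdev_driver.h>

static int
example_rx_burst_mode_get(struct rte_eth_dev *dev, uint16_t queue_id,
			  struct rte_eth_burst_mode *mode)
{
	RTE_SET_USED(dev);
	RTE_SET_USED(queue_id);

	mode->flags = 0;	/* no standardized info format advertised */
	snprintf(mode->info, sizeof(mode->info),
		 "Scalar, bulk alloc, selected at queue setup");

	return 0;
}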