04/11/2019 13:07, Ray Kinsella:
> 
> On 04/11/2019 11:30, Thomas Monjalon wrote:
> > 04/11/2019 11:03, Ray Kinsella:
> >> On 04/11/2019 09:54, Thomas Monjalon wrote:
> >>> 04/11/2019 10:49, Ray Kinsella:
> >>>> On 03/11/2019 22:41, Thomas Monjalon wrote:
> >>>>> 03/11/2019 21:35, Ray Kinsella:
> >>>>>> On 29/10/2019 14:27, Ferruh Yigit wrote:
> >>>>>>> On 10/26/2019 5:23 PM, Thomas Monjalon wrote:
> >>>>>>>> 26/10/2019 11:23, Wang, Haiyue:
> >>>>>>>>> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> >>>>>>>>>> 26/10/2019 06:40, Wang, Haiyue:
> >>>>>>>>>>> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> >>>>>>>>>>>> 25/10/2019 18:02, Jerin Jacob:
> >>>>>>>>>>>>> On Fri, Oct 25, 2019 at 9:15 PM Thomas Monjalon
> >>>>>>>>>>>>> <tho...@monjalon.net> wrote:
> >>>>>>>>>>>>>> 25/10/2019 16:08, Ferruh Yigit:
> >>>>>>>>>>>>>>> On 10/25/2019 10:36 AM, Thomas Monjalon wrote:
> >>>>>>>>>>>>>>>> 15/10/2019 09:51, Haiyue Wang:
> >>>>>>>>>>>>>>>>> Some PMDs have more than one RX/TX burst path; add the ethdev API
> >>>>>>>>>>>>>>>>> that allows an application to retrieve the mode information about
> >>>>>>>>>>>>>>>>> the Rx/Tx packet burst, such as Scalar or Vector, and the Vector
> >>>>>>>>>>>>>>>>> technology like AVX2.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I missed this patch. Andrew and I, the maintainers of ethdev, were
> >>>>>>>>>>>>>>>> not CC'ed.
> >>>>>>>>>>>>>>>> Ferruh, I would expect to be Cc'ed and/or get a notification before
> >>>>>>>>>>>>>>>> merging.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> It has been discussed on the mailing list and went through multiple
> >>>>>>>>>>>>>>> discussions; the patch has been out since August. +1 to cc all
> >>>>>>>>>>>>>>> maintainers, I missed that part, but when the patch is reviewed and
> >>>>>>>>>>>>>>> there is no objection, why block the merge?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm not saying we should block the merge.
> >>>>>>>>>>>>>> My bad is that I missed the patch, and I am asking for help with a
> >>>>>>>>>>>>>> notification in this case. Same for Andrew I guess.
> >>>>>>>>>>>>>> Note: it is merged in master and I am looking to improve this feature.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> +/**
> >>>>>>>>>>>>>>>>> + * Ethernet device RX/TX queue packet burst mode information structure.
> >>>>>>>>>>>>>>>>> + * Used to retrieve information about packet burst mode setting.
> >>>>>>>>>>>>>>>>> + */
> >>>>>>>>>>>>>>>>> +struct rte_eth_burst_mode {
> >>>>>>>>>>>>>>>>> +	uint64_t options;
> >>>>>>>>>>>>>>>>> +};
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Why a struct for an integer?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Again by a request from me, so we do not need to break the API if we
> >>>>>>>>>>>>>>> need to add more things in the future.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I would replace it with a string. This is the most flexible API.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> IMO, the best of both worlds probably makes a good option here,
> >>>>>>>>>>>>> as Haiyue suggested, if we have an additional dev_specific[1] in the
> >>>>>>>>>>>>> structure. When it is passed to the application, let common code build
> >>>>>>>>>>>>> the final string as (option flags to string + dev_specific).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The options flag can be zero if a PMD does not have any generic flags
> >>>>>>>>>>>>> nor is interested in such a scheme.
> >>>>>>>>>>>>> Generic flags will help at least to have some common code.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>>> struct rte_eth_burst_mode {
> >>>>>>>>>>>>> 	uint64_t options;
> >>>>>>>>>>>>> 	char dev_specific[128]; /* PMD-specific burst mode information */
> >>>>>>>>>>>>> };
> >>>>>>>>>>>>
> >>>>>>>>>>>> I really don't see how we can have generic flags.
> >>>>>>>>>>>> The flags which are proposed just match the functions implemented
> >>>>>>>>>>>> in Intel PMDs. And this is a complicated solution.
> >>>>>>>>>>>> Why not just return a name for the selected Rx/Tx mode?
> >>>>>>>>>>>
> >>>>>>>>>>> Intel PMDs use the *generic* methods like x86 SSE, AVX2, ARM NEON,
> >>>>>>>>>>> PPC ALTIVEC, 'dev->data->scattered_rx' etc. for the target: "DPDK is
> >>>>>>>>>>> the Data Plane Development Kit that consists of libraries to
> >>>>>>>>>>> accelerate packet processing workloads running on a wide variety of
> >>>>>>>>>>> CPU architectures."
> >>>>>>>>>>
> >>>>>>>>>> How are RTE_ETH_BURST_SCATTERED and RTE_ETH_BURST_BULK_ALLOC generic?
> >>>>>>>>>> They just match some features of the Intel PMDs.
> >>>>>>>>>> Why not expose other optimizations of the Rx/Tx implementations?
> >>>>>>>>>> You totally missed the point of a generic burst mode description.
> >>>>>>>>>>
> >>>>>>>>>>> If we understand these new experimental APIs from the above, then bit
> >>>>>>>>>>> options are best, and we didn't invent new words to describe them,
> >>>>>>>>>>> just took them from the CPU and other *generic* technology. And the
> >>>>>>>>>>> application can loop to check which kind of burst is running by a
> >>>>>>>>>>> simple bit test.
> >>>>>>>>>>>
> >>>>>>>>>>> If PMDs missed these, they can update them in future roadmaps to
> >>>>>>>>>>> enhance their PMDs, like MLX5 supporting ARM NEON and x86 SSE.
> >>>>>>>>>>
> >>>>>>>>>> I have no words!
> >>>>>>>>>> You really think other PMDs should learn from Intel how to "enhance"
> >>>>>>>>>> their PMD?
> >>>>>>>>>> You talk about mlx5; did you look at its code? Did you see the burst
> >>>>>>>>>> modes depending on which specific hardware path is used (MPRQ, EMPW,
> >>>>>>>>>> inline)? Or depending on which offloads are handled?
> >>>>>>>>>>
> >>>>>>>>>> Again, the instruction set used by the function is a small part
> >>>>>>>>>> of the burst mode optimization.
> >>>>>>>>>>
> >>>>>>>>>> So you did not reply to my question:
> >>>>>>>>>> Why not just return a name for the selected Rx/Tx mode?
> >>>>>>>>>
> >>>>>>>>> In fact, RFC v1/v2 returned the *name*, but the *name* is hard for the
> >>>>>>>>> application to process further (strcmp, strstr?). Not so nice for C
> >>>>>>>>> code, and not so standard, so it was switched to bit definitions.
> >>>>>>>>
> >>>>>>>> Again, please answer my question: why do you need it?
> >>>>>>>> I think it is just informative, that's why a string should be enough.
> >>>>>>>> I am clearly against the bitmap because it is way too restrictive.
> >>>>>>>> I disagree that knowing it is using AVX2 or AVX512 is so interesting.
> >>>>>>>> What you would like to know is whether it is processing packets 4 by 4,
> >>>>>>>> for instance, or to know which offload is supported, or what hardware
> >>>>>>>> trick is used in the datapath design.
> >>>>>>>> There are so many options in a datapath design that it cannot be
> >>>>>>>> represented with a bitmap.
> >>>>>>>> And it makes no sense to have some design criteria being more
> >>>>>>>> important than others.
> >>>>>>>> I Cc'ed an Intel architect (Edwin) who could explain to you how much
> >>>>>>>> more complicated a datapath design is than just using AVX
> >>>>>>>> instructions.
> >>>>>>>
> >>>>>>> As I understand it, this is to let applications make an informed
> >>>>>>> decision based on what vectorization is used in the driver; currently
> >>>>>>> this is not known by the application.
> >>>>>>>
> >>>>>>> And as previously replied, the main target of the API is to define the
> >>>>>>> vector path, not all optimizations, so the number is limited.
> >>>>>
> >>>>> No!
> >>>>> The name of this API is "burst mode information",
> >>>>> not "vector instructions used".
> >>>>> I think the main error is that in Intel PMDs,
> >>>>> each Rx/Tx function uses different vector instructions.
> >>>>> So you generalize that knowing the vector instructions
> >>>>> will give you good information about the performance.
> >>>>> But this is generally wrong!
> >>>>> The right level of info is much more complex.
> >>>>
> >>>> I don't think anyone was suggesting limiting it to purely describing PMD
> >>>> optimization with vector instructions. If there are other commonalities,
> >>>> let's describe those also.
> >>>>
> >>>> Vectorization was thought to be a good starting point - IMHO it is.
> >>>>
> >>>>>
> >>>>>>> There are many optimizations in the data path; I agree we may not
> >>>>>>> represent all of them, and agreed that the existing enum having
> >>>>>>> "RTE_ETH_BURST_BULK_ALLOC" and similar is causing this confusion,
> >>>>>>> perhaps we can remove them.
> >>>>>>>
> >>>>>>> And if the requirement from the application is just informative, I
> >>>>>>> would agree that a free text string will be better. Right now
> >>>>>>> 'rte_eth_rx/tx_burst_mode_get()' is the main API to provide the
> >>>>>>> information and 'rte_eth_burst_mode_option_name()' is a helper for the
> >>>>>>> application/driver to log this information.
> >>>>>>>
> >>>>>> Well look, we have a general deficit of information about what is
> >>>>>> happening under the covers in DPDK. The end user may get wildly
> >>>>>> different performance characteristics based on the DPDK configuration.
> >>>>>> A simple example is that using flow director causes the i40e PMD to
> >>>>>> switch to a scalar code path, and performance may drop by as much as
> >>>>>> half.
> >>>>>>
> >>>>>> This can cause no end of head-scratching in consuming products. I have
> >>>>>> done some of that head-scratching myself; it is a usability nightmare.
> >>>>>>
> >>>>>> FD.io VPP tries to work around this by mining the call stack, to give
> >>>>>> the user _some_ kind of information about what is happening. These
> >>>>>> kinds of heroics should not be necessary.
> >>>>>>
> >>>>>> For exactly the same reasons as telemetry, we should be trying to give
> >>>>>> the users as much information as possible, in as standard a format as
> >>>>>> possible. Otherwise DPDK becomes arcane, leaving the user running gdb
> >>>>>> to understand what is going on, as I frequently do.
> >>>>>
> >>>>> I agree we must provide a clue to understand the performance result.
> >>>>> As Stephen commented at the very beginning, a log is enough for such
> >>>>> debug.
> >>>>> But his comment was ignored.
> >>>>
> >>>> Do we expect applications built on DPDK to have to grep its log to make
> >>>> such discoveries?
> >>>> It's a very brittle and arcane way to provide information, if nothing
> >>>> else.
> >>>>
> >>>>> You wanted an API, fine.
> >>>>> I am OK to have an API to request information which is also in the logs.
> >>>>
> >>>> I would point out that an API to query meta-data is common practice
> >>>> elsewhere. GStreamer GstCaps and Linux sysfs are the closest examples I
> >>>> can think of.
> >>>>
> >>>>>
> >>>>>> Finally, again for the same reasons as telemetry, I would say that
> >>>>>> machine readable is the ideal here.
> >>>>>
> >>>>> I disagree here. There is no need to make this info machine readable.
> >>>>> We want a clue about the optimizations, which are all about creativity.
> >>>>> And we cannot make the creativity of developers "machine readable".
> >>>>
> >>>> I am more concerned about the creativity in how developers describe
> >>>> optimizations.
> >>>> If there is no standardization of strings (or bits), the API will be
> >>>> challenging to use.
> >>>
> >>> No, it won't be challenging, because it will be just a string to print.
> >>
> >> Well, the challenge is getting everyone to use the same set of strings,
> >> such that what is returned by the API has a common meaning.
> >>
> >> I am fine with strings.
> >> So long as we have a method of encouraging folks to use a standard set
> >> where possible.
> >
> > I don't understand why you insist on standardizing.
> > Every driver is different.
> > The routine names will make sense only in the context of the driver.
> 
> The more diversity in the descriptions, the more the user is reaching for
> documentation.
> If the user is lucky enough that the description has documentation.
I would go even further:
The documentation will not explain each Rx/Tx routine.
The user can have some guess, but the main information is to see that the
routine changed from one run to the other, so he can expect a change in the
performance result.
And as a bonus, he can ask the maintainers for more explanation by giving
them the routine name he got from the API.

> We can argue about this indefinitely, instead of proposing a standard. :-)
> The best way to do this is to leave the API as experimental for some
> period - as Haiyue suggests.
> And then review as the API drops experimental status, with a view to
> standardizing if possible?

Yes, we can argue indefinitely.
My position is against standardizing this information.
(not even talking about the ABI breaks it could cause)

> >>> The challenge is trying to fix the design characteristics in an API.
> >>
> >> I thought Haiyue's patch, with a fair degree of input from Ferruh and
> >> others, is a pretty solid start.
> >> Let's describe those commonalities that _do_ exist today - it may not be
> >> enough, but it's better than what we had.
> >
> > The initial requirement is to describe what makes the performance change
> > from one routine to the other.
> > We don't want to describe the commonalities but the specific differences.
> > Please let's focus on the real requirement and build an API which really
> > helps.

As said several times, a PMD-specific string would be a good API.
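
To make that concrete, below is a rough sketch of what such a string-based
query could look like from the application side. The names used here
(struct rte_eth_burst_info, rte_eth_rx_burst_info_get()) are hypothetical
placeholders for illustration only, not the API that was merged:

#include <stdio.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the "PMD-specific string" idea, for illustration
 * only: neither struct rte_eth_burst_info nor rte_eth_rx_burst_info_get()
 * exist in ethdev, they only stand in for whatever final form such an API
 * could take.
 */
struct rte_eth_burst_info {
	char mode[128]; /* free-form description filled in by the PMD */
};

/*
 * Stubbed here so the sketch is self-contained; a real implementation
 * would ask the driver which Rx routine was actually selected for the
 * given port/queue.
 */
static int
rte_eth_rx_burst_info_get(uint16_t port_id, uint16_t queue_id,
			  struct rte_eth_burst_info *info)
{
	(void)port_id;
	(void)queue_id;
	snprintf(info->mode, sizeof(info->mode),
		 "Vector AVX2, scattered Rx, bulk alloc");
	return 0;
}

int
main(void)
{
	struct rte_eth_burst_info info;

	/* The application only logs the string; it does not parse it. */
	if (rte_eth_rx_burst_info_get(0, 0, &info) == 0)
		printf("port 0 rxq 0 burst mode: %s\n", info.mode);

	return 0;
}

The point is that nothing in the application depends on the exact wording,
so each driver stays free to describe whatever actually makes its datapath
fast or slow.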