On 04/11/2019 13:09, Thomas Monjalon wrote:
> 04/11/2019 13:07, Ray Kinsella:
>>
>> On 04/11/2019 11:30, Thomas Monjalon wrote:
>>> 04/11/2019 11:03, Ray Kinsella:
>>>> On 04/11/2019 09:54, Thomas Monjalon wrote:
>>>>> 04/11/2019 10:49, Ray Kinsella:
>>>>>> On 03/11/2019 22:41, Thomas Monjalon wrote:
>>>>>>> 03/11/2019 21:35, Ray Kinsella:
>>>>>>>> On 29/10/2019 14:27, Ferruh Yigit wrote:
>>>>>>>>> On 10/26/2019 5:23 PM, Thomas Monjalon wrote:
>>>>>>>>>> 26/10/2019 11:23, Wang, Haiyue:
>>>>>>>>>>> From: Thomas Monjalon [mailto:tho...@monjalon.net]
>>>>>>>>>>>> 26/10/2019 06:40, Wang, Haiyue:
>>>>>>>>>>>>> From: Thomas Monjalon [mailto:tho...@monjalon.net]
>>>>>>>>>>>>>> 25/10/2019 18:02, Jerin Jacob:
>>>>>>>>>>>>>>> On Fri, Oct 25, 2019 at 9:15 PM Thomas Monjalon
>>>>>>>>>>>>>>> <tho...@monjalon.net> wrote:
>>>>>>>>>>>>>>>> 25/10/2019 16:08, Ferruh Yigit:
>>>>>>>>>>>>>>>>> On 10/25/2019 10:36 AM, Thomas Monjalon wrote:
>>>>>>>>>>>>>>>>>> 15/10/2019 09:51, Haiyue Wang:
>>>>>>>>>>>>>>>>>>> Some PMDs have more than one RX/TX burst path. Add an
>>>>>>>>>>>>>>>>>>> ethdev API that allows an application to retrieve mode
>>>>>>>>>>>>>>>>>>> information about the Rx/Tx packet burst, such as Scalar
>>>>>>>>>>>>>>>>>>> or Vector, and the Vector technology used, like AVX2.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I missed this patch. Andrew and I, the maintainers of
>>>>>>>>>>>>>>>>>> ethdev, were not CC'ed.
>>>>>>>>>>>>>>>>>> Ferruh, I would expect to be CC'ed and/or get a
>>>>>>>>>>>>>>>>>> notification before merging.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It has been discussed on the mailing list and went through
>>>>>>>>>>>>>>>>> multiple rounds of discussion; the patch has been out since
>>>>>>>>>>>>>>>>> August. +1 to CC'ing all maintainers, I missed that part,
>>>>>>>>>>>>>>>>> but when the patch has been reviewed and there is no
>>>>>>>>>>>>>>>>> objection, why block the merge?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm not saying we should block the merge.
>>>>>>>>>>>>>>>> My bad: I missed the patch, and I am asking for help with a
>>>>>>>>>>>>>>>> notification in such cases. Same for Andrew, I guess.
>>>>>>>>>>>>>>>> Note: it is merged in master, and I am looking to improve
>>>>>>>>>>>>>>>> this feature.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>>>>>>> + * Ethernet device RX/TX queue packet burst mode
>>>>>>>>>>>>>>>>>>> + * information structure.
>>>>>>>>>>>>>>>>>>> + * Used to retrieve information about packet burst
>>>>>>>>>>>>>>>>>>> + * mode setting.
>>>>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>>>>> +struct rte_eth_burst_mode {
>>>>>>>>>>>>>>>>>>> +	uint64_t options;
>>>>>>>>>>>>>>>>>>> +};
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Why a struct for an integer?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Again, by a request from me, so that we do not need to
>>>>>>>>>>>>>>>>> break the API if we need to add more things in the future.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would replace it with a string. This is the most flexible
>>>>>>>>>>>>>>>> API.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> IMO, the best of both worlds would probably be a good option
>>>>>>>>>>>>>>> here: as Haiyue suggested, add a dev_specific field [1] to
>>>>>>>>>>>>>>> the structure, and when it is passed to the application, let
>>>>>>>>>>>>>>> the common code build the final string as
>>>>>>>>>>>>>>> (option flags converted to string + dev_specific).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The options flags can be zero if a PMD has no generic flags
>>>>>>>>>>>>>>> or is not interested in such a scheme.
>>>>>>>>>>>>>>> Generic flags will help at least to have some common code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> struct rte_eth_burst_mode {
>>>>>>>>>>>>>>> 	uint64_t options;
>>>>>>>>>>>>>>> 	char dev_specific[128]; /* PMD has specific burst mode information */
>>>>>>>>>>>>>>> };
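
As a rough illustration of what Jerin describes, the common ethdev code
could compose the final string along these lines (a sketch only; it
assumes the struct above plus an option-name helper such as the
rte_eth_burst_mode_option_name() mentioned later in this thread, and
the composing function name is hypothetical):

/* Sketch: render the generic option flags as text, then append the
 * PMD-specific suffix. */
static void
burst_mode_to_str(const struct rte_eth_burst_mode *mode,
		  char *buf, size_t len)
{
	uint64_t opt;
	size_t off = 0;

	for (opt = 1; opt != 0; opt <<= 1) {
		if ((mode->options & opt) == 0)
			continue;
		if (off < len)
			off += snprintf(buf + off, len - off, "%s ",
					rte_eth_burst_mode_option_name(opt));
	}
	/* Append whatever the PMD reported as device-specific info. */
	if (off < len)
		snprintf(buf + off, len - off, "%s", mode->dev_specific);
}
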
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I really don't see how we can have generic flags.
>>>>>>>>>>>>>> The flags which are proposed merely match
>>>>>>>>>>>>>> the functions implemented in the Intel PMDs.
>>>>>>>>>>>>>> And this is a complicated solution.
>>>>>>>>>>>>>> Why not just return a name for the selected Rx/Tx mode?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The Intel PMDs use *generic* methods like x86 SSE, AVX2, ARM
>>>>>>>>>>>>> NEON, PPC ALTIVEC, 'dev->data->scattered_rx', etc., in line
>>>>>>>>>>>>> with the stated goal: "DPDK is the Data Plane Development Kit
>>>>>>>>>>>>> that consists of libraries to accelerate packet processing
>>>>>>>>>>>>> workloads running on a wide variety of CPU architectures."
>>>>>>>>>>>>
>>>>>>>>>>>> How are RTE_ETH_BURST_SCATTERED and RTE_ETH_BURST_BULK_ALLOC
>>>>>>>>>>>> generic?
>>>>>>>>>>>> They just match some features of the Intel PMDs.
>>>>>>>>>>>> Why not expose other optimizations of the Rx/Tx implementations?
>>>>>>>>>>>> You totally missed the point of a generic burst mode description.
>>>>>>>>>>>>
>>>>>>>>>>>>> If you look at these new experimental APIs from that angle,
>>>>>>>>>>>>> then bit options are the best: we didn't invent new words to
>>>>>>>>>>>>> describe them, they come straight from the CPU and other
>>>>>>>>>>>>> *generic* technology. And the application can loop to check
>>>>>>>>>>>>> which kind of burst is running with just a simple bit test.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If PMDs are missing these, they can add them in future
>>>>>>>>>>>>> roadmaps to enhance their PMDs; for example, MLX5 supports
>>>>>>>>>>>>> ARM NEON and x86 SSE.
>>>>>>>>>>>>
>>>>>>>>>>>> I have no words!
>>>>>>>>>>>> You really think other PMDs should learn from Intel how to
>>>>>>>>>>>> "enhance" their PMD?
>>>>>>>>>>>> You talk about mlx5; did you look at its code? Did you see the
>>>>>>>>>>>> burst modes that depend on which specific hardware path is used
>>>>>>>>>>>> (MPRQ, EMPW, inline)?
>>>>>>>>>>>> Or on which offloads are handled?
>>>>>>>>>>>>
>>>>>>>>>>>> Again, the instruction set used by the function is a small part
>>>>>>>>>>>> of the burst mode optimization.
>>>>>>>>>>>>
>>>>>>>>>>>> So you did not reply to my question:
>>>>>>>>>>>> why not just return a name for the selected Rx/Tx mode?
>>>>>>>>>>>
>>>>>>>>>>> In fact, RFC v1/v2 returned the *name*, but a *name* is hard
>>>>>>>>>>> for the application to process further: strcmp? strstr? Not so
>>>>>>>>>>> nice for C code, and not so standard, so I switched it to bit
>>>>>>>>>>> definitions.
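
To make the contrast concrete (the flag name is from the patch; the
application hook and the mode_name string from the RFC v1/v2 style API
are hypothetical):

/* Bitmap API: a plain, well-defined bit test. */
if (mode.options & RTE_ETH_BURST_SCATTERED)
	handle_multi_segment_rx();

/* Free-form name: the application falls back to substring matching,
 * which is fragile across PMDs. */
if (strstr(mode_name, "Scattered") != NULL)
	handle_multi_segment_rx();
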
>>>>>>>>>>
>>>>>>>>>> Again, please answer my question: why do you need it?
>>>>>>>>>> I think it is just informative; that's why a string should be
>>>>>>>>>> enough.
>>>>>>>>>> I am clearly against the bitmap because it is far too restrictive.
>>>>>>>>>> I disagree that knowing it is using AVX2 or AVX512 is so
>>>>>>>>>> interesting.
>>>>>>>>>> What you would like to know is whether it is processing packets
>>>>>>>>>> 4 by 4, for instance, or which offload is supported, or what
>>>>>>>>>> hardware trick is used in the datapath design.
>>>>>>>>>> There are so many options in a datapath design that it cannot be
>>>>>>>>>> represented with a bitmap. And it makes no sense to treat some
>>>>>>>>>> design criteria as more important than others.
>>>>>>>>>> I CC an Intel architect (Edwin) who could explain to you how much
>>>>>>>>>> more complicated a datapath design is than just using AVX
>>>>>>>>>> instructions.
>>>>>>>>>
>>>>>>>>> As I understand it, this is to let applications make an informed
>>>>>>>>> decision based on what vectorization is used in the driver;
>>>>>>>>> currently this is not known by the application.
>>>>>>>>>
>>>>>>>>> And as previously replied, the main target of the API is to
>>>>>>>>> describe the vector path, not all optimizations, so the number is
>>>>>>>>> limited.
>>>>>>>
>>>>>>> No!
>>>>>>> The name of this API is "burst mode information",
>>>>>>> not "vector instructions used".
>>>>>>> I think the main error is that in the Intel PMDs,
>>>>>>> each Rx/Tx function uses different vector instructions.
>>>>>>> So you generalize that knowing the vector instructions
>>>>>>> will give you good information about the performance.
>>>>>>> But this is generally wrong!
>>>>>>> The right level of information is much more complex.
>>>>>>
>>>>>> I don't think anyone was suggesting limiting it to purely describing
>>>>>> PMD optimization with vector instructions. If there are other
>>>>>> commonalities, let's describe those also.
>>>>>>
>>>>>> Vectorization was thought to be a good starting point; IMHO it is.
>>>>>>
>>>>>>>
>>>>>>>>> There are many optimizations in the data path; I agree we may not
>>>>>>>>> represent all of them, and I agree that the existing enum having
>>>>>>>>> "RTE_ETH_BURST_BULK_ALLOC" and similar values is causing this
>>>>>>>>> confusion; perhaps we can remove them.
>>>>>>>>>
>>>>>>>>> And if the requirement from the application is just informative,
>>>>>>>>> I would agree that a free-text string would be better. Right now,
>>>>>>>>> 'rte_eth_rx/tx_burst_mode_get()' is the main API to provide the
>>>>>>>>> information, and 'rte_eth_burst_mode_option_name()' is a helper
>>>>>>>>> for the application/driver to log this information.
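
For reference, an application logging this with the experimental API
named above might look roughly like this (a sketch, assuming the
options bitmap form of struct rte_eth_burst_mode from the merged
patch):

/* Sketch: print every option bit reported for one Rx queue. */
static void
log_rx_burst_mode(uint16_t port_id, uint16_t queue_id)
{
	struct rte_eth_burst_mode mode;
	uint64_t opt;

	if (rte_eth_rx_burst_mode_get(port_id, queue_id, &mode) != 0)
		return;

	printf("port %u rxq %u burst mode:", port_id, queue_id);
	for (opt = 1; opt != 0; opt <<= 1)
		if (mode.options & opt)
			printf(" %s", rte_eth_burst_mode_option_name(opt));
	printf("\n");
}
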
>>>>>>>>>
>>>>>>>>
>>>>>>>> Well, look: we have a general deficit of information about what is
>>>>>>>> happening under the covers in DPDK. The end user may get wildly
>>>>>>>> different performance characteristics based on the DPDK
>>>>>>>> configuration. A simple example: using flow director causes the
>>>>>>>> i40e PMD to switch to a scalar code path, and performance may drop
>>>>>>>> by as much as half.
>>>>>>>>
>>>>>>>> This can cause no end of head-scratching in consuming products. I
>>>>>>>> have done some of that head-scratching myself; it is a usability
>>>>>>>> nightmare.
>>>>>>>>
>>>>>>>> FD.io VPP tries to work around this by mining the call stack, to
>>>>>>>> give the user _some_ kind of information about what is happening.
>>>>>>>> These kinds of heroics should not be necessary.
>>>>>>>>
>>>>>>>> For exactly the same reasons as telemetry, we should be trying to
>>>>>>>> give users as much information as possible, in as standard a format
>>>>>>>> as possible. Otherwise DPDK becomes arcane, leaving the user
>>>>>>>> running gdb to understand what is going on, as I frequently do.
>>>>>>>
>>>>>>> I agree we must provide a clue to understand the performance result.
>>>>>>> As Stephen commented at the very beginning, a log is enough for such
>>>>>>> debugging.
>>>>>>> But his comment was ignored.
>>>>>>
>>>>>> Do we expect applications built on DPDK to have to grep its log to
>>>>>> make such discoveries?
>>>>>> It's a very brittle and arcane way to provide information, if nothing
>>>>>> else.
>>>>>>
>>>>>>> You wanted an API, fine.
>>>>>>> I am OK with having an API to request information which is also in
>>>>>>> the logs.
>>>>>>
>>>>>> I would point out that an API to query metadata is common practice
>>>>>> elsewhere.
>>>>>> GStreamer GstCaps and Linux sysfs are the closest examples I can
>>>>>> think of.
>>>>>>
>>>>>>>
>>>>>>>> Finally, again for the same reasons as telemetry, I would say that
>>>>>>>> machine-readable is the ideal here.
>>>>>>>
>>>>>>> I disagree here. There is no need to make this info machine-readable.
>>>>>>> We want a clue about the optimizations, which are all about
>>>>>>> creativity.
>>>>>>> And we cannot make the creativity of developers "machine-readable".
>>>>>>
>>>>>> I am more concerned about the creativity in how developers describe
>>>>>> optimizations.
>>>>>> If there is no standardization of strings (or bits), the API will be
>>>>>> challenging to use.
>>>>>
>>>>> No, it won't be challenging, because it will be just a string to print.
>>>>
>>>> Well, the challenge is getting everyone to use the same set of strings,
>>>> such that what is returned by the API has a common meaning.
>>>>
>>>> I am fine with strings,
>>>> so long as we have a method of encouraging folks to use a standard set
>>>> where possible.
>>>
>>> I don't understand why you insist on standardizing.
>>> Every driver is different.
>>> The routine names will make sense only in the context of the driver.
>>
>> The more diversity in descriptions, the more the user is reaching for
>> documentation.
>> That is, if the user is lucky enough that the description has
>> documentation.
>
> I would go even further:
> the documentation will not explain each Rx/Tx routine.
> The user can make some guesses, but the main information is to see
> that the routine changed from one run to the other, so he can expect
> a change in the performance result.
> And as a bonus, he can ask the maintainers for more explanation
> by giving them the routine name he got from the API.
>
>> We can argue about this indefinitely, instead of proposing a standard. :-)
>> The best way to do this is to leave the API as experimental for some
>> period, as Haiyue suggests,
>> and then review it as the API drops experimental status, with a view to
>> standardizing if possible?
>
> Yes, we can argue indefinitely.
> My position is against standardizing this information
> (not even talking about the ABI breaks it could cause).

So here is my proposed work-around.

It is probably too early to describe any commonalities beyond
vectorization, and vectorization is seen as too Intel-specific.
Instead, let's go with the PMD-specific strings for the v20 ABI.
Let's review the PMD-specific strings that emerge during v20 +
Experimental, and see if it is possible to start standardizing for the
v21 ABI?

My position is that some degree of standardization is necessary here,
for usability if for no other reason. I am happy to give it time for
that to emerge instead of trying to dictate it.

So: go with PMD-specific strings, and review after 1 year to see if we
can improve?
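
For concreteness, the string-based shape being argued for could be as
simple as the following (illustrative only; the field name and size are
assumptions, not taken from the merged patch):

struct rte_eth_burst_mode {
	/* Free-form, PMD-specific description, e.g.
	 * "Vector AVX2 Scattered" from an Intel PMD,
	 * or "MPRQ + inline" from mlx5. */
	char info[128];
};
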
>
>>>>> The challenge is trying to fix the design characteristics in an API.
>>>>
>>>> I thought Haiyue's patch, with a fair degree of input from Ferruh and
>>>> others, was a pretty solid start.
>>>> Let's describe those commonalities that _do_ exist today; it may not be
>>>> enough, but it's better than what we had.
>>>
>>> The initial requirement is to describe what makes the performance
>>> change from one routine to the other.
>>> We don't want to describe the commonalities but the specific differences.
>>> Please let's focus on the real requirement and build an API which really
>>> helps.
>>> As said several times, a PMD-specific string would be a good API.