09/10/2020 12:03, Burakov, Anatoly: > On 09-Oct-20 10:54 AM, Jerin Jacob wrote: > > On Fri, Oct 9, 2020 at 3:10 PM Burakov, Anatoly > > <anatoly.bura...@intel.com> wrote: > >> On 09-Oct-20 10:29 AM, Thomas Monjalon wrote: > >>> 09/10/2020 11:25, Burakov, Anatoly: > >>>> On 09-Oct-20 6:42 AM, Jerin Jacob wrote: > >>>>> On Thu, Oct 8, 2020 at 10:38 PM Ananyev, Konstantin > >>>>> <konstantin.anan...@intel.com> wrote: > >>>>>>> On Thu, Oct 8, 2020 at 6:57 PM Burakov, Anatoly > >>>>>>> <anatoly.bura...@intel.com> wrote: > >>>>>>>> On 08-Oct-20 9:44 AM, Jerin Jacob wrote: > >>>>>>>>> On Thu, Oct 8, 2020 at 2:04 PM Thomas Monjalon > >>>>>>>>> <tho...@monjalon.net> wrote: > >>>>>>>>>> > >>>>>>>>>>> Add two new power management intrinsics, and provide an > >>>>>>>>>>> implementation > >>>>>>>>>>> in eal/x86 based on UMONITOR/UMWAIT instructions. The instructions > >>>>>>>>>>> are implemented as raw byte opcodes because there is not yet > >>>>>>>>>>> widespread > >>>>>>>>>>> compiler support for these instructions. > >>>>>>>>>>> > >>>>>>>>>>> The power management instructions provide an architecture-specific > >>>>>>>>>>> function to either wait until a specified TSC timestamp is > >>>>>>>>>>> reached, or > >>>>>>>>>>> optionally wait until either a TSC timestamp is reached or a > >>>>>>>>>>> memory > >>>>>>>>>>> location is written to. The monitor function also provides an > >>>>>>>>>>> optional > >>>>>>>>>>> comparison, to avoid sleeping when the expected write has already > >>>>>>>>>>> happened, and no more writes are expected. > >>>>>>>>>>> > >>>>>>>>>>> For more details, Please reference Intel SDM Volume 2. > >>>>>>>>>> > >>>>>>>>>> I really would like to see feedbacks from other arch maintainers. > >>>>>>>>>> Unfortunately they were not Cc'ed. > >>>>>>>>> > >>>>>>>>> Shared the feedback from the arm64 perspective here. Yet to get a > >>>>>>>>> reply on this. > >>>>>>>>> http://mails.dpdk.org/archives/dev/2020-September/181646.html > >>>>>>>>> > >>>>>>>>>> Also please mark the new functions as experimental. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>> > >>>>>>>> Hi Jerin, > >>>>>>> > >>>>>>> Hi Anatoly, > >>>>>>> > >>>>>>>> > >>>>>>>> > IMO, We must introduce some arch feature-capability _get_ > >>>>>>>> scheme to tell > >>>>>>>> > the consumer of this API is only supported on x86. Probably as > >>>>>>>> functions[1] > >>>>>>>> > or macro flags scheme and have a stub for the other > >>>>>>>> architectures as the > >>>>>>>> > API marked as generic ie rte_power_* not rte_x86_.. > >>>>>>>> > > >>>>>>>> > This will help the consumer to create workers based on the > >>>>>>>> instruction features > >>>>>>>> > which can NOT be abstracted as a generic feature across the > >>>>>>>> architectures. > >>>>>>>> > >>>>>>>> I'm not entirely sure what you mean by that. > >>>>>>>> > >>>>>>>> I mean, yes, we should have added stubs for other architectures, and > >>>>>>>> we > >>>>>>>> will add those in future revisions, but what does your proposed > >>>>>>>> runtime > >>>>>>>> check accomplish that cannot currently be done with CPUID flags? > >>>>>>> > >>>>>>> > >>>>>>> RTE_CPUFLAG_WAITPKG flag definition is not available in other > >>>>>>> architectures. > >>>>>>> i.e RTE_CPUFLAG_WAITPKG defined in > >>>>>>> lib/librte_eal/x86/include/rte_cpuflags.h > >>>>>>> and it is used in http://patches.dpdk.org/patch/79540/ as generic API. > >>>>>>> I doubt http://patches.dpdk.org/patch/79540/ would compile on > >>>>>>> non-x86. > >>>>>> > >>>>>> > >>>>>> I am agree with Jerin, that we need some generic way to > >>>>>> figure-out does platform supports power_monitor() or not. > >>>>>> Though not sure do we need to create a new feature-get framework > >>>>>> here... > >>>>> > >>>>> That's works too. Some means of generic probing is fine. Following > >>>>> schemed needs > >>>>> more documentation on that usage, as, it is not straight forward > >>>>> compare to > >>>>> feature-get framework. Also, on the other thread, we are adding the > >>>>> new instructions like > >>>>> demote cacheline etc, maybe if the user wants to KNOW if the arch > >>>>> supports it then > >>>>> the feature-get framework is good. > >>>>> If we think, there is no other usecase for generic arch feature-get > >>>>> framework then > >>>>> we can keep the below scheme else generic arch feature is better for > >>>>> more forward > >>>>> looking use cases. > >>>>> > >>>>>> Might be just something like: > >>>>>> rte_power_monitor(...) == -ENOTSUP > >>>>>> be enough indication for that? > >>>>>> So user can just do: > >>>>>> if (rte_power_monitor(NULL, 0, 0, 0, 0) == -ENOTSUP) { > >>>>>> /* not supported path */ > >>>>>> } > >>>>>> > >>>>>> To check is that feature supported or not. > >>>>> > >>>>> > >>>> > >>>> Looking at CLDEMOTE patches, CLDEMOTE is a noop on other archs. I think > >>>> we can safely make this intrinsic as a noop on other archs as well, as > >>>> it's functionally identical to waking up immediately. > >>>> > >>>> If we're not creating this for CLDEMOTE, we don't need it here as well. > >>>> If we do need it for this, then we arguably need it for CLDEMOTE too. > >>> > >>> Sorry I don't understand what you mean, too many "it" and "this" :) > >>> > >> > >> Sorry, i meant "the generic feature-get framework". CLDEMOTE doesn't > >> exist on other archs, this doesn't too, so it's a fairly similar > >> situation. Stubbing UMWAIT with a noop is a valid approach because it's > >> equivalent to sleeping and then immediately waking up (which can happen > >> for a host of reasons unrelated to the code itself). > > > > If we are keeping the following return in the public API then it can not be > > NOP > > + * @return > > + * - 1 if wakeup was due to TSC timeout expiration. > > + * - 0 if wakeup was due to memory write or other reasons. > > + */ > > > > In the generic header, it is specified that return value is > implementation-defined (i.e. arch-specific).
Obviously an API definition should *never* be "implementation-defined". > I guess we could remove > that and set return value to either 0 or -ENOTSUP if that would resolve > the issue? > > > Also, we need to fix compilation issue if any with > > http://patches.dpdk.org/patch/79540/ > > as it has direct reference to if > > (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_WAITPKG)) { > > Either we need to add -ENOTSUP return or generic feature-get framework. > > IIRC power library isn't compiled on anything other than x86, so this > code wouldn't get compiled. It is not call "power-x86", so we must assume it could work on any architecture. > >> I'm not against a generic feature-get framework, i'm just pointing out > >> that if this is what's preventing the merge, it should prevent the merge > >> of CLDEMOTE as well, yet Jerin has acked that one and has explicitly > >> stated that he's OK with leaving CLDEMOTE as a noop on other architectures. CLDEMOTE is used for optimization, while UMWAIT can be used in a logic, that's why the expectations may be different.