> >>>>>> I did a quick prototype for this, and i don't think it is going to > >>>>>> work. > >>>>>> > >>>>>> Callbacks with just "current value" as argument will be pretty limited > >>>>>> and will only really work for cases where we know what we are > >>>>>> expecting. > >>>>>> However, for cases like event/dlb or net/mlx5, the expected value is > >>>>>> (or > >>>>>> appears to be) dependent upon some internal device data, and is not > >>>>>> constant like in case of net/ixgbe for example. > >>>>>> > >>>>>> This can be fixed by passing an opaque pointer, either by storing it in > >>>>>> the monitor condition, or by passing it directly to rte_power_monitor > >>>>>> at > >>>>>> invocation time. > >>>>>> > >>>>>> The latter doesn't work well because when we call rte_power_monitor > >>>>>> from > >>>>>> inside the rte_power library, we lack the context necessary to get said > >>>>>> opaque pointer. > >>>>>> > >>>>>> The former doesn't work either, because the only place where we can get > >>>>>> this argument is inside get_monitor_addr, but the opaque pointer must > >>>>>> persist after we exit that function in order to avoid use-after-free - > >>>>>> which means that it either has to be statically allocated (which means > >>>>>> it's not thread-safe for a non-trivial case), or dynamically allocated > >>>>>> (which a big no-no on a hotpath). > >>>>> > >>>>> If I get you right, expected_value (and probably mask) can be variable > >>>>> ones. > >>>>> So for callback approach to work we need to pass all this as parameters > >>>>> to PMD comparison callback: > >>>>> int pmc_callback(uint64_t real_val, uint64_t expected_val, uint64_t > >>>>> mask) > >>>>> Correct? > >>>> > >>>> If we have both expected value, mask, and current value, then what's the > >>>> point of the callback? The point of the callback would be to pass just > >>>> the current value, and let the callback decide what's the expected value > >>>> and how to compare it. > >>> > >>> For me the main point of callback is to hide PMD specific comparison > >>> semantics. > >>> Basically they provide us with some values in struct > >>> rte_power_monitor_cond, > >>> and then it is up to them how to interpret them in their comparison > >>> function. > >>> All we'll do for them: will read the value at address provided. > >>> I understand that it looks like an overkill, as majority of these > >>> comparison functions > >>> will be like: > >>> int cmp_callback(uint64_t real_val, uint64_t expected_val, uint64_t mask) > >>> { > >>> return ((real_val & mask) == expected_val); > >>> } > >>> Though qsort() and bsearch() work in a similar manner, and everyone seems > >>> ok with it. > >>> > >>>> > >>>> So, we can either let callback handle expected values itself by having > >>>> an opaque callback-specific argument (which means it has to persist > >>>> between .get_monitor_addr() and rte_power_monitor() calls), > >>> > >>> But that's what we doing already - PMD fills rte_power_monitor_cond values > >>> for us, we store them somewhere and then use them to decide should we go > >>> to sleep or not. > >>> All callback does - moves actual values interpretation back to PMD: > >>> Right now: > >>> PMD: provide PMC values > >>> POWER: store PMC values somewhere > >>> read the value at address provided in PMC > >>> interpret PMC values and newly read value and make the > >>> decision > >>> > >>> With callback: > >>> PMD: provide PMC values > >>> POWER: store PMC values somewhere > >>> read the value at address provided in PMC > >>> PMD: interpret PMC values and newly read value and make the decision > >>> > >>> Or did you mean something different here? > >>> > >>>> or we do the > >>>> comparisons inside rte_power_monitor(), and store the expected/mask > >>>> values in the monitor condition, and *don't* have any callbacks at all. > >>>> Are you suggesting an alternative to the above two options? > >>> > >>> As I said in my first mail - we can just replace 'inverse' with 'op'. > >>> That at least will make this API extendable, if someone will need > >>> something different in future. > >>> > >>> Another option is > >> > >> Right, so the idea is store the PMD-specific data in the monitor > >> condition, and leave it to the callback to interpret it. > >> > >> The obvious question then is, how many values is enough? Two? Three? > >> Four? This option doesn't really solve the basic issue, it just kicks > >> the can down the road. I guess three values should be enough for > >> everyone (tm) ? :D > >> > >> I don't like the 'op' thing because if the goal is to be flexible, it's > >> unnecessarily limiting *and* makes the API even more complex to use. I > >> would rather have a number of PMD-specific values and leave it up to the > >> callback to interpret them, because at least that way we're not limited > >> to predefined operations on the monitor condition data. > > > > Just to make sure we are talking about the same, does what you propose > > looks like that: > > > > struct rte_power_monitor_cond { > > volatile void *addr; /**< Address to monitor for changes */ > > uint8_t size; /**< Data size (in bytes) that will be used to > > compare > > * expected value (`val`) with data read from > > the > > * monitored memory location (`addr`). Can be > > 1, 2, > > * 4, or 8. Supplying any other value will > > result in > > * an error. > > */ > > int (*cmp)(uint64_t real_value, const uint64_t opaque[4]); > > uint64_t opaque[4]; /*PMD specific data, used by comparison > > call-back below */ > > }; > > > > And then in rte_power_monitor(): > > ... > > uint64_t cur_value = __get_umwait_val(pmc->addr, pmc->size); > > if (pmc->cmp(cur_value, pmc->opaque) != 0) { > > /* goto sleep */ > > } > > > > ? > > > > Something like that, yes. >
Seems reasonable to me. Thanks Konstantin