> >>>>>> Add two new power management intrinsics, and provide an implementation
> >>>>>> in eal/x86 based on UMONITOR/UMWAIT instructions. The instructions
> >>>>>> are implemented as raw byte opcodes because there is not yet widespread
> >>>>>> compiler support for these instructions.
> >>>>>>
> >>>>>> The power management instructions provide an architecture-specific
> >>>>>> function to either wait until a specified TSC timestamp is reached, or
> >>>>>> optionally wait until either a TSC timestamp is reached or a memory
> >>>>>> location is written to. The monitor function also provides an optional
> >>>>>> comparison, to avoid sleeping when the expected write has already
> >>>>>> happened, and no more writes are expected.
> >>>>>
> >>>>> I think what this API is missing - a function to wakeup sleeping core.
> >>>>> If user can/should use some system call to achieve that, then at least
> >>>>> it has to be clearly documented, even better some wrapper provided.
> >>>>
> >>>> I don't think it's possible to do that without severely overcomplicating
> >>>> the intrinsic and its usage, because AFAIK the only way to wake up a
> >>>> sleeping core would be to send some kind of interrupt to the core, or
> >>>> trigger a write to the cache-line in question.
> >>>>
> >>>
> >>> Yes, I think we either need a syscall that would do an IPI for us
> >>> (on top of my head - membarrier() does that, might be there are some
> >>> other syscalls too),
> >>> or something hand-made. For hand-made, I wonder would something like that
> >>> be safe and sufficient:
> >>> uint64_t val = atomic_load(addr);
> >>> CAS(addr, val, &val);
> >>> ?
> >>> Anyway, one way or another - I think ability to wakeup core we put to
> >>> sleep
> >>> have to be an essential part of this feature.
> >>> As I understand linux kernel will limit max amount of sleep time for
> >>> these instructions:
> >>> https://lwn.net/Articles/790920/
> >>> But relying just on that, seems too vague for me:
> >>> - user can adjust that value
> >>> - wouldn't apply to older kernels and non-linux cases
> >>> Konstantin
> >>>
> >>
> >> This implies knowing the value the core is sleeping on.
> >
> > You don't the value to wait for, you just need an address.
> > And you can make wakeup function to accept address as a parameter,
> > same as monitor() does.
>
> Sorry, i meant the address. We don't know the address we're sleeping on.
>
> >
> >> That's not
> >> always the case - with this particular PMD power management scheme, we
> >> get the address from the PMD and it stays inside the callback.
> >
> > That's fine - you can store address inside you callback metadata
> > and do wakeup as part of _disable_ function.
> >
>
> The address may be different, and by the time we access the address it
> may become stale, so i don't see how that would help unless you're
> suggesting to have some kind of synchronization mechanism there.

Yes, we'll need something to sync here for sure.
Sorry, I should have said that straight away, to avoid further
misunderstanding.
Let's say, associate a spin_lock with monitor(), by analogy with
pthread_cond_wait().
Konstantin
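
For illustration only, here is a minimal sketch of how a lock paired with
monitor() might address the stale-address concern. It is not taken from the
patch. All names (mon_sync, mon_sync_arm, mon_sync_wait, mon_sync_wakeup) are
hypothetical, and portable C11 has no UMONITOR/UMWAIT, so the wait is a
bounded polling loop and a wake_seq counter stands in for the hardware
noticing a store to the monitored line. The waker only touches the address
while holding the lock, and the sleeper only publishes or clears it under the
same lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; nothing below is part of the patch. */
struct mon_sync {
	atomic_flag lock;        /* spinlock paired with the monitor */
	_Atomic uint64_t *addr;  /* armed address, NULL if none; guarded by lock */
	atomic_uint wake_seq;    /* stand-in for the hardware noticing a store */
};

static inline void mon_lock(struct mon_sync *m)
{
	while (atomic_flag_test_and_set_explicit(&m->lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static inline void mon_unlock(struct mon_sync *m)
{
	atomic_flag_clear_explicit(&m->lock, memory_order_release);
}

void mon_sync_init(struct mon_sync *m)
{
	atomic_flag_clear(&m->lock);
	m->addr = NULL;
	atomic_init(&m->wake_seq, 0);
}

/* Sleeper: publish the address under the lock and snapshot wake_seq.
 * Assumption: a real implementation would execute UMONITOR at this point,
 * before dropping the lock. */
unsigned int mon_sync_arm(struct mon_sync *m, _Atomic uint64_t *addr)
{
	unsigned int seq;

	mon_lock(m);
	m->addr = addr;
	seq = atomic_load(&m->wake_seq);
	mon_unlock(m);
	return seq;
}

/* Sleeper: polling stand-in for UMWAIT. Returns true if the value changed
 * or a wakeup was posted since arm; false if max_spins ran out. The address
 * is cleared under the lock before returning. */
bool mon_sync_wait(struct mon_sync *m, unsigned int seq, uint64_t expected,
		   unsigned int max_spins)
{
	bool woken = false;

	for (unsigned int i = 0; i < max_spins && !woken; i++)
		woken = atomic_load(m->addr) != expected ||
			atomic_load(&m->wake_seq) != seq;

	mon_lock(m);
	m->addr = NULL;
	mon_unlock(m);
	return woken;
}

/* Waker: under the lock, store the current value back to the armed address
 * (the CAS idea from the thread) and bump wake_seq. Returns false if
 * nothing is armed. */
bool mon_sync_wakeup(struct mon_sync *m)
{
	bool armed;

	mon_lock(m);
	armed = m->addr != NULL;
	if (armed) {
		uint64_t val = atomic_load(m->addr);

		atomic_compare_exchange_strong(m->addr, &val, val);
		atomic_fetch_add(&m->wake_seq, 1);
	}
	mon_unlock(m);
	return armed;
}
```

In a DPDK context the atomic_flag could presumably be an rte_spinlock_t, and
the _disable_ path would call something like mon_sync_wakeup() before tearing
down the callback. Whether a same-value CAS reliably triggers the hardware
monitor is the open question raised earlier in the thread; the sketch does
not settle it.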