On 12-Oct-20 11:35 AM, Burakov, Anatoly wrote:
On 10-Oct-20 2:19 PM, Ananyev, Konstantin wrote:


Add two new power management intrinsics, and provide an implementation
in eal/x86 based on UMONITOR/UMWAIT instructions. The instructions
are implemented as raw byte opcodes because there is not yet widespread
compiler support for these instructions.

The power management instructions provide an architecture-specific
function to either wait until a specified TSC timestamp is reached, or
optionally wait until either a TSC timestamp is reached or a memory
location is written to. The monitor function also provides an optional
comparison, to avoid sleeping when the expected write has already
happened, and no more writes are expected.
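
As a rough illustration of the semantics described above (not the patch's implementation), the wait-with-comparison behaviour could be sketched portably as follows. Everything here is an assumption made for the example: the name `monitor_sketch`, its parameters, the result codes, and the `fake_now` clock standing in for a TSC read. A polling loop stands in for UMONITOR/UMWAIT, so no power is actually saved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes, for this sketch only. */
enum wait_result {
	WAIT_SKIPPED,	/* expected value already present, never waited */
	WAIT_WRITTEN,	/* monitored location changed while waiting */
	WAIT_TIMEOUT	/* deadline reached first */
};

/* Hypothetical clock callback standing in for a TSC read. */
typedef uint64_t (*clock_fn)(void);

/* Fake clock for demonstration: advances by one on every read. */
static uint64_t fake_ticks;
static uint64_t fake_now(void) { return fake_ticks++; }

static enum wait_result
monitor_sketch(const volatile uint64_t *addr, uint64_t expected,
		uint64_t mask, uint64_t deadline, clock_fn now)
{
	assert(addr != NULL && now != NULL);

	/* Snapshot first (where UMONITOR would arm the address range). */
	const uint64_t armed = *addr;

	/* Optional comparison: a zero mask disables the check. */
	if (mask != 0 && (armed & mask) == expected)
		return WAIT_SKIPPED;

	/* Stand-in for UMWAIT: poll until a write or the deadline. */
	while (now() < deadline) {
		if (*addr != armed)
			return WAIT_WRITTEN;
	}
	return WAIT_TIMEOUT;
}
```

The snapshot is taken before the comparison so that a write landing between the check and the wait is still noticed by the loop, which mirrors why the arm step has to come before the value check.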

I think what this API is missing is a function to wake up a sleeping core.
If the user can/should use some system call to achieve that, then at least
it has to be clearly documented; even better, some wrapper should be provided.

I don't think it's possible to do that without severely overcomplicating
the intrinsic and its usage, because AFAIK the only way to wake up a
sleeping core would be to send some kind of interrupt to the core, or
trigger a write to the cache-line in question.


Yes, I think we either need a syscall that would do an IPI for us
(off the top of my head - membarrier() does that; there might be some
other syscalls too), or something hand-made. For hand-made, I wonder
whether something like this would be safe and sufficient:
uint64_t val = atomic_load(addr);
CAS(addr, val, &val);
?
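
For reference, the two pseudocode lines above could be spelled out with C11 atomics roughly as below. This is only a sketch: `wake_by_rewrite` is a hypothetical name, and whether a same-value write like this reliably wakes a core sitting in UMWAIT is exactly the open question being asked, which the code does not answer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: rewrite the monitored word with its own value,
 * so the stored data is unchanged. Returns true if the compare-and-swap
 * succeeded.
 */
static bool
wake_by_rewrite(_Atomic uint64_t *addr)
{
	assert(addr != NULL);

	uint64_t val = atomic_load_explicit(addr, memory_order_relaxed);

	/*
	 * Store val back only if *addr still equals val. A failure means
	 * some other writer changed the word in between, which is itself
	 * a write to the monitored location.
	 */
	return atomic_compare_exchange_strong(addr, &val, val);
}
```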
Anyway, one way or another - I think the ability to wake up a core we put to sleep
has to be an essential part of this feature.
As I understand it, the Linux kernel will limit the maximum sleep time for these instructions:
https://lwn.net/Articles/790920/
But relying just on that seems too vague to me:
- the user can adjust that value
- it wouldn't apply to older kernels and non-Linux cases
Konstantin


This implies knowing the value the core is sleeping on.

You don't need the value to wait for, you just need an address.
And you can make the wakeup function accept the address as a parameter,
same as monitor() does.

Sorry, I meant the address. We don't know the address we're sleeping on.


That's not
always the case - with this particular PMD power management scheme, we
get the address from the PMD and it stays inside the callback.

That's fine - you can store the address inside your callback metadata
and do the wakeup as part of the _disable_ function.


The address may be different, and by the time we access the address it
may become stale, so I don't see how that would help unless you're
suggesting to have some kind of synchronization mechanism there.

Yes, we'll need something to sync here for sure.
Sorry, I should have said it straight away, to avoid further misunderstanding.
Let's say, associate a spinlock with monitor(), by analogy with pthread_cond_wait().
Konstantin
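
One possible reading of the spinlock suggestion, sketched under stated assumptions: the caller holds a lock on entry, the lock is dropped while waiting and re-taken before returning (as pthread_cond_wait() does with its mutex), and the wakeup side takes the same lock before writing. All names (`sketch_lock`, `monitor_sync_sketch`, `wakeup_sketch`) are hypothetical, the monitored word is assumed to be a dedicated wake word rather than PMD data, and a bounded poll loop stands in for UMONITOR/UMWAIT.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal test-and-set spinlock, for this sketch only. */
typedef struct { atomic_flag flag; } sketch_lock;

static void sketch_lock_acquire(sketch_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->flag,
			memory_order_acquire))
		; /* spin */
}

static void sketch_lock_release(sketch_lock *l)
{
	atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/*
 * Caller must hold `lock`. Returns true if the word changed while
 * waiting, false if max_polls ran out. The lock is held on return.
 */
static bool
monitor_sync_sketch(_Atomic uint64_t *addr, sketch_lock *lock,
		uint64_t max_polls)
{
	assert(addr != NULL && lock != NULL);

	/*
	 * Snapshot while still holding the lock: any waker has to take
	 * the lock after this point, so its write shows up as a change.
	 */
	const uint64_t armed = atomic_load(addr);
	bool changed = false;

	sketch_lock_release(lock);
	for (uint64_t i = 0; i < max_polls && !changed; i++)
		changed = atomic_load(addr) != armed;
	sketch_lock_acquire(lock);

	return changed;
}

/* Wakeup side: take the same lock, then write to the monitored word. */
static void
wakeup_sketch(_Atomic uint64_t *addr, sketch_lock *lock)
{
	sketch_lock_acquire(lock);
	atomic_fetch_add(addr, 1);
	sketch_lock_release(lock);
}
```

In this reading the lock does not replace the monitored write; it only orders the sleeper's snapshot against the waker's write, so a wakeup issued just before the wait begins is not lost.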


The idea was to provide an intrinsic-like function - as in, raw instruction call, without anything extra. We even added the masks/values etc. only because there's no race-less way to combine UMONITOR/UMWAIT without those.

Perhaps we can provide a synchronize-able wrapper around it, to avoid adding overhead to callers that use the function but don't need the sync mechanism?


Also, how would having a spinlock help to synchronize? Are you suggesting we do UMWAIT on a spinlock address, or something to that effect?

--
Thanks,
Anatoly
