If a Continuation is scheduled but its mutex is locked, it's put in a queue specific to that mutex. The release function for the mutex (called when a Continuation holding the mutex exits) would move the Continuation at the front of the mutex's queue (if the queue is not empty) to the ready-to-run queue, transferring the lock to that Continuation. A drawback is that the queue would itself need a lock (a spinlock, perhaps), but the critical section would be very short.
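A minimal sketch of this idea, with assumed names (ContMutex, enqueue_ready_to_run, and the Continuation type are placeholders, not any existing API): the mutex keeps its own waiter queue, and release() hands ownership directly to the next waiter instead of unlocking.

```cpp
#include <deque>
#include <mutex>

struct Continuation; // opaque here; stands in for the real Continuation type

// Placeholder for the real scheduler hook: puts 'c' on the ready-to-run queue.
void enqueue_ready_to_run(Continuation *c) { (void)c; }

class ContMutex {
public:
  // Called when a Continuation is scheduled. Returns true if the lock was
  // acquired; otherwise the Continuation is parked on this mutex's queue.
  bool lock_or_enqueue(Continuation *c) {
    std::lock_guard<std::mutex> g(queue_lock); // very short critical section
    if (!locked) {
      locked = true;
      return true;
    }
    waiters.push_back(c);
    return false;
  }

  // Called when the Continuation holding this mutex exits. If any
  // Continuation is waiting, the lock is transferred to it and it is made
  // ready to run; otherwise the mutex is simply unlocked.
  void release() {
    Continuation *next = nullptr;
    {
      std::lock_guard<std::mutex> g(queue_lock);
      if (waiters.empty()) {
        locked = false;
        return;
      }
      next = waiters.front();
      waiters.pop_front();
      // 'locked' stays true: ownership passes directly to 'next'.
    }
    enqueue_ready_to_run(next);
  }

private:
  std::mutex queue_lock; // could be a spinlock, since the hold time is tiny
  bool locked = false;
  std::deque<Continuation *> waiters;
};
```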
There would also be a function to lock a mutex directly. It would create a Continuation with two condition variables, assign the mutex to that Continuation, and schedule it. (In this case, it might make sense to put this Continuation at the front of the mutex's queue, since it would be blocking an entire event thread.) The direct-lock function would then block on the first condition variable. When the Continuation ran, it would trigger the first condition variable and then block on the second. The direct-lock function would then return, allowing the calling code to enter its critical section. At the end of the critical section, the caller would invoke another function to release the direct lock. That function would trigger the second condition variable, causing the function of the Continuation created for the direct lock to exit (thus releasing the mutex). A sketch of this handshake follows below.

With this approach, I'm not sure thread affinities would be of any value. I think perhaps each core should have its own list of ready-to-run Continuations, and a pool of event threads with affinity to that core. Not having per-event-thread ready-to-run lists means that a Continuation function that blocks is less likely to block other ready-to-run Continuations. If Continuations had core affinities to some degree, this might reduce evictions in per-core memory caches. (Multiple Continuations having the same function should have the same core affinity.)
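Returning to the direct-lock idea above, here is a rough sketch of the two-condition-variable handshake. All names (DirectLock, direct_lock, direct_unlock, direct_lock_continuation) are hypothetical, and the creation/scheduling of the helper Continuation is only indicated in comments, since it depends on the scheduler described earlier.

```cpp
#include <condition_variable>
#include <mutex>

struct DirectLock {
  std::mutex m;                         // protects the two flags below
  std::condition_variable acquired_cv;  // first CV: helper now holds the mutex
  std::condition_variable released_cv;  // second CV: caller has finished
  bool acquired = false;
  bool released = false;
};

// Function run by the helper Continuation once it has been granted the mutex.
void direct_lock_continuation(DirectLock *dl) {
  std::unique_lock<std::mutex> g(dl->m);
  dl->acquired = true;
  dl->acquired_cv.notify_one();                      // wake the direct-lock caller
  dl->released_cv.wait(g, [dl] { return dl->released; });
  // Returning releases the Continuation's mutex (transferring it to the next
  // waiter, as in the release() sketch earlier).
}

// Called by code that wants to hold the mutex directly.
void direct_lock(DirectLock *dl /*, ContMutex *mutex */) {
  // Here the real implementation would create a Continuation running
  // direct_lock_continuation(dl), assign 'mutex' to it, and schedule it
  // (possibly at the front of the mutex's queue).
  std::unique_lock<std::mutex> g(dl->m);
  dl->acquired_cv.wait(g, [dl] { return dl->acquired; });
  // The caller now effectively holds the Continuation mutex.
}

// Called at the end of the caller's critical section.
void direct_unlock(DirectLock *dl) {
  std::lock_guard<std::mutex> g(dl->m);
  dl->released = true;
  dl->released_cv.notify_one(); // lets the helper Continuation's function exit
}
```

The point of the second condition variable is that the helper Continuation's function must not return until the caller's critical section is over, because it is that return which releases the mutex.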