So after some discussion with Anirban another option emerged:
5. System threading with user threading as an optimization

The programming model and semantics, debugging, development environment, and tools would all be based on system threading, but optimized builds could use a user threading package to increase performance.

Pros:
  - Simple
  - Standard tools work (for development/debugging)

Cons:
  - Optimized builds differ somewhat from development/debugging builds.
  - Requires porting and tuning a user thread system for highest performance.
  - Explicit yields must be inserted in long-running code.
  - Scheduler can't use mutex state to switch to a thread holding a requested lock.
  - Because I/O is no longer centralized, tuning is more difficult and inefficiencies are harder to find.

This proposal deals with two serious issues. First, development can now use standard debugging and other tools. Given that moving from events to threads is primarily for ease of development, this seems to be an important issue. Second, it ensures that the semantics of the user thread library match those of the system thread library, and it permits using the system thread library trivially for portability, as a fallback, and on systems which support M:N threading.

Of the remaining issues, the lack of centralized I/O requires discipline, but perhaps some smarts in the debug version can detect silly behavior (like millions of small writes). The same goes for inserting yields, since long-running code without yields could be detected. The scheduling issue can probably be addressed within the user thread library.

I think this last option is very reasonable. It is somewhat less performant and less portable than using a half-async half-sync pattern, and debugging production binaries would be a jump from debugging development builds, but it unifies the programming model.
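
To make the "debug version smarts" idea a bit more concrete, here is a rough sketch of how a debug build might flag long-running code that goes too long without a yield. Everything in it is hypothetical: `YieldMonitor`, `yield_point`, `FakeClock`, and the 10 ms threshold are names and numbers I made up for illustration, and it is written in Python only for brevity, not because the real thread package would look like this. In an optimized build the same call site would hand control to the user thread scheduler; in a development build on system threads it would only do this bookkeeping.

```python
import time


def _monotonic_ms():
    """Default clock: monotonic time in whole milliseconds."""
    return time.monotonic_ns() // 1_000_000


class YieldMonitor:
    """Hypothetical debug-build helper that records overly long gaps between yields."""

    def __init__(self, max_gap_ms, clock=_monotonic_ms):
        self.max_gap_ms = max_gap_ms      # assumed tuning knob, not from the proposal
        self.clock = clock
        self.last_yield = clock()
        self.violations = []              # (label, gap_ms) pairs for later reporting

    def yield_point(self, label):
        """Call where an explicit yield is placed; returns the gap since the last one."""
        now = self.clock()
        gap = now - self.last_yield
        if gap > self.max_gap_ms:
            # Code ran too long without yielding; remember where we noticed it.
            self.violations.append((label, gap))
        self.last_yield = now
        return gap


class FakeClock:
    """Deterministic clock so the sketch can be exercised without real delays."""

    def __init__(self):
        self.now_ms = 0

    def __call__(self):
        return self.now_ms

    def advance(self, ms):
        self.now_ms += ms


def demo():
    clock = FakeClock()
    monitor = YieldMonitor(max_gap_ms=10, clock=clock)
    clock.advance(2)                  # short stretch of work, within the limit
    monitor.yield_point("parse")
    clock.advance(50)                 # long stretch of work with no yield in between
    monitor.yield_point("compress")
    return monitor.violations
```

A similar counter on the write path could catch the "millions of small writes" case. The point is only that the checks live in the debug build, so the optimized build pays nothing for them.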