I've done quite a lot of concurrent programming over the past 23ish years, from the implementation of a parallelized version of CLIPS back in the late 80s to many C, Perl, and Python projects involving everything from shared memory to process pooling to every permutation of hard and soft thread management. To say I'm rusty, however, would be an understatement, and I'm sure my information is sorely out of date.
What I can contribute to such a conversation, however, is this:

- Make the concepts of "process" and "thread" an implementation detail rather than separate worlds, and your users won't learn to fear one or the other.
- If the programmer has to think about semaphore management, there's already a problem.
- If the programmer's not allowed to think about semaphore management, there's already a problem.
- Don't paint yourself into a corner when it comes to playing nice with local interfaces.
- If your idea of instantiating a "thread" involves creating an OS VM, then you're probably lighter weight than Python's threading model, but I'd suggest paring it down some more. It's "thread," not "ringworld" (I was going to say "not 'space elevator,'" but it seemed insufficient to the examples I've seen).

I know that's pretty high-level, but it's what I've got. I think I wrote my last threaded application in 2007.
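As one illustration of the first point (process versus thread as an implementation detail), here is a minimal sketch, assuming Python's standard `concurrent.futures` module. The helper names `square`, `run_all`, and `make_executor` are made up for this example and are not from any particular project; the point is only that the calling code never needs to know which kind of worker it got.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def square(n: int) -> int:
    """Stand-in for real work; defined at module level so a process pool can pickle it."""
    return n * n


def run_all(executor: Executor, values) -> list:
    """Run `square` over `values` on whatever executor the caller hands us."""
    return list(executor.map(square, values))


def make_executor(kind: str, workers: int = 2) -> Executor:
    """Hypothetical factory: the only place that knows about threads vs. processes."""
    if kind == "thread":
        return ThreadPoolExecutor(max_workers=workers)
    if kind == "process":
        return ProcessPoolExecutor(max_workers=workers)
    raise ValueError(f"unknown executor kind: {kind!r}")


if __name__ == "__main__":
    # Same calling code either way; only the factory argument changes.
    for kind in ("thread", "process"):
        with make_executor(kind) as executor:
            print(kind, run_all(executor, range(5)))
```

Whether this is the right level of abstraction for a given language is a judgment call; it hides the choice without forbidding the programmer from reaching for lower-level primitives when they need them.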