There seems to be some work going on with "UserFiber", which I infer has to do with "fibers", i.e. user-level threading.
This is an important area to discuss, as it affects all users of the system, so allow me to initiate a discussion. There are a number of ways a system like TS can operate, and lots of folks have opinions, have had experiences both good and bad with several of them, have read papers, and have done evaluations or ports from one system to another. So I think it is important not only to discuss the issue, but to discuss how best to weigh the pros and cons, and especially how to evaluate performance claims.

A small list of options, with pros and cons; a rough code sketch of each model follows the list.

1. Event-based with N system threads, where N ~ number of processors

   More or less the status quo.

   Pros:
   - Very efficient for huge numbers of connections/activities.
   - All OS interaction is managed centrally and can be optimized
     independently of protocol code.
   - Some standard tools work. Debugging works for active transactions;
     passive transactions must be examined on the heap, with breakpoints
     set based on the transaction ID.

   Cons:
   - Hard to code to, because any blocking operation must manage its
     state on the heap.
   - Some standard tools work less well with the odd use of the stack.

2. Half-async, half-sync pattern (N + M threads)

   This is a common pattern used in high-performance servers.
   Essentially the "slow" operations (e.g. I/O with 100k clients) are
   done in an async, event-driven manner, while operations inside the
   system (within the machine/cluster) are done on blocking threads.

   Pros:
   - Very efficient for huge numbers of connections/activities, as long
     as they can be segmented into "transactions".
   - Easy programming of all in-system code using normal threading.
   - Standard tools work for all but the small front-end async code, and
     even there they work, although the "stack" is not useful.

   Cons:
   - Two different models in one program can be hard on programmers.
   - Where to make the cut?
   - Performance can be a problem because I/O is not centralized, which
     makes tuning harder and inefficiencies harder to find.

3. Full system threading, with N system threads where N is the number
   of connections

   This model is possible because Linux can support 100k threads with
   the O(1) scheduler.

   Pros:
   - Simple.
   - Standard tools work.

   Cons:
   - Portable performance: while it is possible to tune some platforms
     for this type of workload, it is unlikely that good performance
     could be delivered over a range of systems.
   - Performance can be a problem because I/O is not centralized, which
     makes tuning harder and inefficiencies harder to find.

4. Userland threads (M:N)

   This model relies on a user threading system to emulate system
   threads.

   Pros:
   - Programming would be similar to system threads.

   Cons:
   - Difficult debugging: gdb will not know about the threads, and since
     their state is not stored on the heap in a form gdb can read,
     non-active threads cannot be examined.
   - Some system tools will not work, and other tools which rely on
     knowing where the stack is (e.g. leak detectors) will not work.
   - Performance can be a problem: even though the thread library can be
     tuned, it is unlikely to be as polished as an OS thread system, and
     emulating system threading may require many more system calls.
   - Lack of vendor support means that all vendor optimizations are
     unavailable.
   - Because I/O is no longer centralized, tuning is more difficult and
     inefficiencies are harder to find.
   - Programmers may take advantage of the non-preemptive semantics,
     making the code difficult to switch to system threads.
   - Explicit yields must be inserted.
   - User thread libraries are complex and difficult to debug. The
     simpler, more portable ones are very inefficient and scale and
     load-balance poorly. The more complex ones are, well... more
     complex: fragile and hard to maintain and port.
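To make option 1 concrete, here is a minimal C++ sketch of the event-driven style. This is not Traffic Server's actual API; Continuation and its members are invented names. The point is that per-transaction state lives in a heap-allocated object whose handler an event thread invokes each time an awaited event completes:

    #include <cstdio>
    #include <memory>
    #include <string>

    // All state that would normally live on a blocking thread's stack
    // must be kept here on the heap instead.
    struct Continuation {
        int         fd = -1;
        std::string request;
        enum class State { ReadingHeader, ReadingBody, Writing, Done }
                    state = State::ReadingHeader;

        // Called by an event thread when the awaited I/O event fires.
        void handle_event() {
            switch (state) {
            case State::ReadingHeader:
                // Parse the header, then re-arm the fd for the body.
                state = State::ReadingBody;
                break;
            case State::ReadingBody:
                state = State::Writing;
                break;
            case State::Writing:
                state = State::Done;  // transaction complete
                break;
            case State::Done:
                break;
            }
        }
    };

    int main() {
        // One heap object per transaction; a real epoll/kqueue loop
        // (elided) would map ready fds back to their continuations. To
        // debug a passive transaction you must find this object on the
        // heap, which is exactly the debugging cost noted above.
        auto c = std::make_unique<Continuation>();
        c->handle_event();
        std::printf("advanced to state %d\n", static_cast<int>(c->state));
    }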
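And a sketch of option 2's split, again with invented names (RequestQueue is just an illustrative handoff, not TS code): one async front end, standing in for the epoll loop, feeds parsed transactions to M ordinary blocking worker threads, which is where "standard tools work" because the workers have real stacks.

    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>
    #include <vector>

    // The boundary between the async and sync halves: the "cut".
    class RequestQueue {
        std::queue<std::string> q_;
        std::mutex m_;
        std::condition_variable cv_;
    public:
        void push(std::string r) {
            { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(r)); }
            cv_.notify_one();
        }
        std::string pop() {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return !q_.empty(); });
            std::string r = std::move(q_.front());
            q_.pop();
            return r;
        }
    };

    int main() {
        RequestQueue queue;
        // The "sync" half: workers block freely on ordinary calls.
        std::vector<std::thread> workers;
        for (int i = 0; i < 4; ++i)
            workers.emplace_back([&queue, i] {
                std::string r = queue.pop();   // normal blocking call
                std::cout << "worker " << i << " handled " << r << "\n";
            });
        // The "async" half: in a real server this is the event loop
        // completing non-blocking reads before handing transactions over.
        for (int i = 0; i < 4; ++i)
            queue.push("request-" + std::to_string(i));
        for (auto &w : workers) w.join();
    }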
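Option 3 barely needs a sketch, but for completeness, this is what the 1:1 model looks like (accept_connection() is a placeholder for a real accept(2) loop):

    #include <iostream>
    #include <thread>
    #include <vector>

    int accept_connection(int n) { return n; }  // stand-in for accept(2)

    void serve(int fd) {
        // Blocking reads/writes go here; the kernel scheduler does the rest.
        std::cout << "serving connection " << fd << "\n";
    }

    int main() {
        std::vector<std::thread> conns;
        for (int i = 0; i < 8; ++i)       // imagine 100k of these on NPTL
            conns.emplace_back(serve, accept_connection(i));
        for (auto &t : conns) t.join();
    }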
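Finally, a toy version of option 4 on top of the POSIX ucontext API, mainly to make the non-preemption hazard concrete: delete the fiber_yield() call and the other fiber starves forever, and code that silently relies on running unpreempted between yields is exactly what later makes conversion to system threads painful. (This is a sketch, not a proposal; fiber_yield, fiber_body, etc. are invented names.)

    #include <ucontext.h>
    #include <cstdio>

    static ucontext_t scheduler_ctx, fiber_ctx[2];
    static int current = 0;

    // Explicit yield: hand control back to the toy round-robin scheduler.
    static void fiber_yield() {
        swapcontext(&fiber_ctx[current], &scheduler_ctx);
    }

    static void fiber_body(int id) {
        for (int i = 0; i < 3; ++i) {
            std::printf("fiber %d step %d\n", id, i);
            fiber_yield();   // forget this and no other fiber ever runs
        }
    }

    int main() {
        // User-managed stacks: invisible to gdb's thread view, and to
        // tools (e.g. leak detectors) that expect to find real stacks.
        static char stacks[2][64 * 1024];
        for (int i = 0; i < 2; ++i) {
            getcontext(&fiber_ctx[i]);
            fiber_ctx[i].uc_stack.ss_sp   = stacks[i];
            fiber_ctx[i].uc_stack.ss_size = sizeof stacks[i];
            fiber_ctx[i].uc_link          = &scheduler_ctx;  // on return
            makecontext(&fiber_ctx[i], (void (*)())fiber_body, 1, i);
        }
        // Round-robin: each fiber needs 4 resumes (3 yields + 1 exit).
        for (int round = 0; round < 8; ++round) {
            current = round % 2;
            swapcontext(&scheduler_ctx, &fiber_ctx[current]);
        }
        return 0;
    }

Even this toy needs hand-managed stacks and a scheduler; a production M:N library multiplies that complexity, which is the fragility I am complaining about above.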
You may detect some bias. I have worked with a user-threaded server application and it was a complete disaster. It was buggy, slow, and hard to debug and optimize. It didn't scale, and because the programmers had taken advantage of non-preemption it was hard to convert to standard threads. And did I mention that it was really slow? And this was not a home-grown one, either: it was an established package with a good pedigree. It was slow for all the reasons above.

Solaris and IBM already have M:N half-user thread packages which are vendor supplied and very well tuned; it is silly to put user-level threading on top of these. For Linux 2.6, the kernel developers had IBM's NGPT package, which was M:N, and they specifically decided not to use it because they found they could tune up the 1:1 threading model (each thread is a system thread) to support fast thread creation (as in one thread created per operation), millions of threads at O(1) scheduling cost, and fast userspace locking (futex). This killed the M:N threading model for Linux and the developers abandoned it. Wikipedia is your friend: http://en.wikipedia.org/wiki/Native_POSIX_Thread_Library

I would like us to discuss this because it is important and has long-term consequences. I am sure there will be many opinions, and I am interested in hearing them, because much of what I now know I learned the hard way and would not have thought of initially. In particular, I have in mind the centralization of I/O preventing stupid write-one-byte-at-a-time inefficiencies, and the tendency of users to take advantage of non-preemption.