There seems to be some work going on with "UserFiber", which
I infer has to do with "fibers", or user-level threading.

This is an important area to discuss as it affects all users
of a system, so allow me to initiate a discussion.

There are a number of ways a system like TS can operate, and lots of
folks have opinions and experiences, both good and bad, with several
of them, have read papers, and have done evaluations or ports from
one system to another.  So I think it is important not only to
discuss the issue, but to discuss how best to weigh the pros and
cons, and especially how to evaluate performance claims.

A short list of options, with their pros and cons:

1. Event-based with N system threads where N ~ number of processors

   More or less the status quo.  A minimal sketch follows the pros
   and cons below.

   Pros:
     Very efficient for huge numbers of connections/activities
     All OS interaction is managed and can be optimized independently
       from protocol code
     Some standard tools work.  Debugging works for active
       transactions; passive transactions must be examined on the
       heap, with breakpoints set based on the transaction ID.

   Cons:
     Hard to code to, because any blocking operation must manage its
       state on the heap.
     Some standard tools work less well with the odd use of the stack.
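
   Here is that sketch, in C with epoll (this is not TS code;
   struct txn and handle_readable() are made-up placeholders):

      /* One event loop per thread; a real server runs N of these. */
      #include <sys/epoll.h>
      #include <unistd.h>

      #define MAX_EVENTS 64

      struct txn {              /* per-connection state on the heap */
          int fd;
          long bytes_seen;      /* partial progress saved across events */
      };

      void watch_fd(int epfd, struct txn *t)
      {
          struct epoll_event ev = { .events = EPOLLIN, .data.ptr = t };
          epoll_ctl(epfd, EPOLL_CTL_ADD, t->fd, &ev);
      }

      void handle_readable(struct txn *t)   /* hypothetical callback */
      {
          char buf[4096];
          ssize_t n = read(t->fd, buf, sizeof buf);
          if (n > 0)
              t->bytes_seen += n;  /* state kept in txn, not on a stack */
      }

      void event_loop(int epfd)
      {
          struct epoll_event ev[MAX_EVENTS];
          for (;;) {
              int n = epoll_wait(epfd, ev, MAX_EVENTS, -1);
              for (int i = 0; i < n; i++)
                  handle_readable((struct txn *)ev[i].data.ptr);
          }
      }

   The heap-state con is visible here: handle_readable() has to park
   any partial progress in struct txn before returning to the loop.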

2. Half-sync/half-async pattern (N + M threads)

   This is a common pattern used in high-performance servers.
   Essentially, the "slow" operations (e.g. I/O with 100k clients) are
   done in an async, event-driven manner, while operations inside the
   system (within the machine/cluster) are done on blocking threads.
   A minimal sketch follows the pros and cons below.

   Pros:
     Very efficient for huge numbers of connections/activities as long
       as they can be segmented into "transactions".
     Easy programming of all in-system code using normal threading
     Standard tools work for all but the small front-end async code,
       and even then they work, although the "stack" is not useful.

   Cons:
     Two different models in one system can be hard on programmers
     Where to make the cut?
     Performance can be a problem because I/O is not centralized,
       which makes tuning harder and inefficiencies harder to find.
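
   Here is that sketch of the half-sync/half-async "cut", in C: the
   async front end pushes parsed requests onto a queue, and M
   ordinary blocking worker threads pop and serve them.
   struct request and process_and_reply() are illustrative names,
   not TS code:

      #include <pthread.h>
      #include <stdlib.h>

      struct request { int fd; struct request *next; };

      static struct request *head;
      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

      void queue_push(struct request *r)    /* called by the event loop */
      {
          pthread_mutex_lock(&lock);
          r->next = head;
          head = r;
          pthread_cond_signal(&nonempty);
          pthread_mutex_unlock(&lock);
      }

      static struct request *queue_pop(void) /* blocks until work */
      {
          pthread_mutex_lock(&lock);
          while (head == NULL)
              pthread_cond_wait(&nonempty, &lock);
          struct request *r = head;
          head = r->next;
          pthread_mutex_unlock(&lock);
          return r;
      }

      void process_and_reply(struct request *r); /* hypothetical; may block */

      void *worker(void *arg)                 /* one of the M sync threads */
      {
          (void)arg;
          for (;;)
              process_and_reply(queue_pop());
      }

   The queue is where the "where to make the cut?" question gets
   answered; everything behind it is plain blocking code.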

3. Full system threading with N system threads, where N is the number
   of connections.

   This model is possible because Linux can support 100k threads with
   the O(1) scheduler.  A minimal sketch follows the pros and cons
   below.

   Pros:
     Simple
     Standard tools work

   Cons:
     Performance is not portable.  While it is possible to tune some
       platforms for this type of workload, it is unlikely that good
       performance could be delivered over a range of systems.
     Performance can be a problem because I/O is not centralized,
       which makes tuning harder and inefficiencies harder to find.
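
   Here is that sketch of the thread-per-connection model, in C;
   serve_connection() is a made-up blocking handler:

      #include <pthread.h>
      #include <stdint.h>
      #include <sys/socket.h>
      #include <unistd.h>

      void serve_connection(int fd);       /* hypothetical; plain
                                              blocking reads/writes */

      static void *conn_main(void *arg)
      {
          int fd = (int)(intptr_t)arg;
          serve_connection(fd);
          close(fd);
          return NULL;
      }

      void accept_loop(int listen_fd)
      {
          for (;;) {
              int fd = accept(listen_fd, NULL, NULL);
              if (fd < 0)
                  continue;
              pthread_t tid;
              if (pthread_create(&tid, NULL, conn_main,
                                 (void *)(intptr_t)fd) == 0)
                  pthread_detach(tid);
              else
                  close(fd);
          }
      }

   Note that nothing here manages I/O centrally; every thread does
   its own reads and writes, which is exactly the tuning con above.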

4. Userland threads (M:N)

   This model relies on a user threading system to emulate system
   threads.  A minimal sketch follows the pros and cons below.

   Pros:
     Programming would be similar to system threads

   Cons:
     Difficult debugging.  gdb will not know about the threads, and
       since their state is not stored on the heap in a manner that
       can be read by gdb, non-active threads cannot be inspected.
     Some system tools will not work; other tools which rely on
       knowing where the stack is (e.g. leak detectors) will not
       work either.
     Performance can be a problem because even though the thread
       library can be tuned, it is unlikely to be as polished as an
       OS thread system, and emulation of system threading may
       require many more system calls.  Lack of vendor support means
       that all vendor optimizations are unavailable.
     Because I/O is no longer centralized, tuning is more difficult
       and inefficiencies are harder to find.
     Programmers may take advantage of the non-preemptive semantics,
       making the code difficult to switch to system threads.
     Explicit yields must be inserted.
     User thread libraries are complex and difficult to debug.  The
       simpler, more portable ones are very inefficient and scale
       and load-balance poorly.  The more complex ones are, well...
       more complex: fragile and hard to maintain and port.
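
   Here is that sketch of what a user threading library does under
   the hood, using ucontext(3) and a single fiber (a real M:N library
   multiplexes many fibers over N kernel threads, which is where the
   complexity comes from):

      #include <stdlib.h>
      #include <ucontext.h>

      #define STACK_SZ (64 * 1024)

      static ucontext_t sched_ctx, fiber_ctx;

      static void fiber_yield(void)        /* the explicit yield */
      {
          swapcontext(&fiber_ctx, &sched_ctx);
      }

      static void fiber_main(void)
      {
          for (int i = 0; i < 3; i++)
              fiber_yield();               /* must be inserted by hand */
      }

      int main(void)
      {
          getcontext(&fiber_ctx);
          fiber_ctx.uc_stack.ss_sp = malloc(STACK_SZ); /* a heap "stack"
                                                          gdb knows
                                                          nothing about */
          fiber_ctx.uc_stack.ss_size = STACK_SZ;
          fiber_ctx.uc_link = &sched_ctx;  /* return here when done */
          makecontext(&fiber_ctx, fiber_main, 0);
          for (int i = 0; i < 4; i++)      /* trivial "scheduler" */
              swapcontext(&sched_ctx, &fiber_ctx);
          free(fiber_ctx.uc_stack.ss_sp);
          return 0;
      }

   The debugging con is right there: the fiber's stack is a malloc'd
   buffer, so backtraces and leak detectors that walk "the" stack
   are lost.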


You may detect some bias.  I have worked with a user-threaded server
application and it was a complete disaster.  It was buggy, slow, and
hard to debug and optimize.  It didn't scale, and because the
programmers had taken advantage of non-preemption it was hard to
convert to standard threads.  And did I mention that it was really
slow?  And this was not a home-grown one either.  It was an
established package with a good pedigree.  It was slow for all the
reasons above.

Solaris and IBM already have M:N hybrid user/kernel thread packages
which are vendor-supplied and very well tuned.  It is silly to put
user-level threading on top of these.

For 2.6 Linux, developers had IBM's NGPT package, which was M:N, and
they specifically decided not to use it because they found that they
could tune up the 1:1 (each thread is a system thread) threading
model to support fast thread creation (as in one thread created per
operation), millions of threads with O(1) scheduling cost, and Fast
Userspace Locking (futex).  This killed the M:N threading model for
Linux and the developers abandoned it.  Wikipedia is your friend:

http://en.wikipedia.org/wiki/Native_POSIX_Thread_Library

I would like us to discuss this because it is important and has
long-term consequences.

I am sure there will be many opinions and I am interested in hearing
them, because much of what I now know I learned the hard way and
would not have thought of initially.  In particular: the
centralization of I/O preventing stupid write-one-byte-at-a-time
inefficiencies (sketched below), and the tendency of users to take
advantage of non-preemption.
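
A tiny illustration of that first point, in C: a centralized I/O
core can coalesce queued buffers into one gathered write, while
scattered I/O makes the one-byte-per-syscall mistake easy.  The
function names are made up:

   #include <sys/uio.h>
   #include <unistd.h>

   /* Scattered I/O: every component writes straight to the socket. */
   void write_scattered(int fd, const char *buf, size_t len)
   {
       for (size_t i = 0; i < len; i++)
           (void)write(fd, buf + i, 1);   /* one syscall per byte */
   }

   /* Centralized I/O: the core flushes all queued buffers at once. */
   ssize_t flush_queued(int fd, const struct iovec *iov, int cnt)
   {
       return writev(fd, iov, cnt);       /* one syscall for the lot */
   }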
