> b) Use threads, either via some library or implement yourself. Each thread of
> course uses some extra memory. Also enabling threads causes glibc to start
> using a thread-safe version of malloc() (I think?), which slows things down
> (unless that can be avoided, maybe by using clone() directly instead of
> pthreads?).
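To make option (b) concrete, here is a rough sketch of the usual pattern: a worker thread performs the blocking call and signals completion over a pipe, so the main loop can wait on it with poll() alongside its other descriptors. This uses plain pthreads only and is not libeio's API. The names read_req, read_worker and read_via_thread are made up for illustration, and the 256-byte buffer is an arbitrary choice.

```c
/* Sketch only: plain POSIX threads, not libeio's API. */
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical request record: one blocking read handed to a worker. */
struct read_req {
    const char *path;   /* file to read */
    char buf[256];      /* destination buffer (size chosen arbitrarily) */
    ssize_t result;     /* bytes read, or -1 on error */
    int notify_fd;      /* write end of the completion pipe */
};

/* Worker thread: does the blocking open/read, then wakes the main loop. */
static void *read_worker(void *arg)
{
    struct read_req *req = arg;
    char done = 1;
    int fd = open(req->path, O_RDONLY);

    req->result = -1;
    if (fd >= 0) {
        req->result = read(fd, req->buf, sizeof req->buf);
        close(fd);
    }
    /* One byte on the pipe means "request finished". */
    if (write(req->notify_fd, &done, 1) != 1) {
        /* Nothing useful to do here; the caller still joins the thread. */
    }
    return NULL;
}

/* Starts the read in a thread and waits for the completion byte via poll().
 * A real event loop would poll this fd together with its sockets instead
 * of blocking here. Returns bytes read, or -1 on failure. */
static ssize_t read_via_thread(struct read_req *req)
{
    int pipefd[2];
    pthread_t tid;
    struct pollfd pfd;
    char done;
    int rc;

    if (pipe(pipefd) != 0)
        return -1;
    req->notify_fd = pipefd[1];

    if (pthread_create(&tid, NULL, read_worker, req) != 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    pfd.fd = pipefd[0];
    pfd.events = POLLIN;
    do {
        rc = poll(&pfd, 1, -1);          /* retry if interrupted by a signal */
    } while (rc < 0 && errno == EINTR);

    if (rc > 0 && read(pipefd[0], &done, 1) != 1) {
        /* Completion byte missing; the join below still synchronizes. */
    }

    pthread_join(tid, NULL);             /* makes req->result safe to read */
    close(pipefd[0]);
    close(pipefd[1]);
    return req->result;
}
```

To try it, fill in a struct read_req with a path, call read_via_thread(&req) from main(), and build with -pthread. Each request costs one thread here; a library would normally keep a small pool of workers and a request queue instead.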
Perhaps libeio (http://software.schmorp.de/pkg/libeio.html) would be a good starting point? I don't have any experience with it, but node.js (http://nodejs.org/) uses it for its async I/O.

-Sebastian