On 3/4/07, Kyle Moffett <[EMAIL PROTECTED]> wrote:
> Well, even this far into 2.6, Linus' patch from 2003 still (mostly)
> applies; the maintenance cost for this kind of code is virtually zilch.
> If it matters that much to you, clean it up and make it apply; add an
> alarmfd() syscall (another 100 lines of code at most?) and make a "read"
> return an architecture-independent siginfo-like structure and submit it
> for inclusion. Adding epoll() support for random objects is as simple as
> a 75-line object-filesystem and a 25-line syscall to return an FD to a
> new inode. Have fun! Go wild! Something this trivially simple could
> probably spend a week in -mm and go to Linus for 2.6.22.
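For what it's worth, the userspace side of that is simple enough to sketch.
Everything below is hypothetical -- there is no alarmfd() syscall and no
struct alarmfd_info in any kernel; the names, fields, and numbers are made
up purely to illustrate the "read() hands back a flat, architecture-
independent record" idea:

/*
 * Purely hypothetical sketch: there is no alarmfd() and no struct
 * alarmfd_info anywhere; this only illustrates what "read() returns
 * an architecture-independent siginfo-like structure" would look
 * like from userspace.
 */
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/select.h>

/* Fixed-width, explicitly padded record, so 32-bit userspace and a
 * 64-bit kernel agree on the layout. */
struct alarmfd_info {
	uint64_t expirations;	/* expiries since the previous read() */
	uint64_t expiry_ns;	/* clock value at the last expiry */
	int32_t  overrun;
	int32_t  __pad;
};

/* Stand-in prototype for the hypothetical syscall. */
extern int alarmfd(int clockid, uint64_t interval_ns, int flags);

int main(void)
{
	int fd = alarmfd(CLOCK_MONOTONIC, 500000000ULL, 0);
	if (fd < 0)
		return 1;

	for (;;) {
		fd_set rset;
		FD_ZERO(&rset);
		FD_SET(fd, &rset);

		/* The fd behaves like any other: select/poll/epoll all work. */
		if (select(fd + 1, &rset, NULL, NULL, NULL) < 0)
			break;

		struct alarmfd_info info;
		if (read(fd, &info, sizeof(info)) != (ssize_t)sizeof(info))
			break;
		printf("%llu expirations\n",
		       (unsigned long long)info.expirations);
	}
	close(fd);
	return 0;
}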
Or, if you want to do slightly more work and produce something a great deal
more useful, you could implement additional netlink address families for
additional "event" sources. The socket() - setsockopt() - bind() -
sendmsg()/recvmsg() sequence is a well-understood and well-documented UNIX
paradigm for multiplexing non-blocking I/O to many destinations over one
socket. Everyone who has read Stevens is familiar with the basic UDP and
"fd open server" techniques, and if you look at Linux's IP_PKTINFO and
NETLINK_W1 (bravo, Evgeniy!) you'll see how easily they could be extended
to file AIO and other kinds of event sources.

For file AIO, you might have the application open one AIO socket per mount
point, open files indirectly via the SCM_RIGHTS mechanism, and submit/retire
read/write requests via sendmsg/recvmsg with ancillary data consisting of an
lseek64 tuple and a user-provided cookie. Although the process still has to
have one fd open per actual open file (because trying to authenticate file
accesses without opening fds is madness), the only fds it has to manipulate
directly are those representing entire pools of outstanding requests. This
is usually a small enough set that select() will do just fine, if you're
careful with fd allocation. (You can simply punt indirectly opened fds up to
a high numerical range, where they can't be accessed directly from userspace
but still make fine cookies for use in lseek64 tuples within cmsg headers.)
A rough sketch of what the submit/retire path might look like is in the
P.S. below.

The same basic approach will work for timers, signals, and just about any
other event source. Userspace is of course still stuck doing its own state
machines / thread scheduling / however you choose to think of it. But all
the important activity goes through socketcall(), and the data and control
parameters are all packaged up into a struct msghdr instead of the bare
buffer pointers of read/write. So if someone else does come along later and
design an ultralight threading mechanism that isn't a total botch, the
actual data paths won't need much rework; the exception handling will just
get a lot simpler.

Cheers,
- Michael
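P.S. To make the above concrete, here is a rough sketch of the submit and
retire paths. NETLINK_AIO, SCM_AIO_SUBMIT, struct aio_submit, submit_read()
and reap_one() are all invented for illustration -- nothing like them exists
today; only AF_NETLINK, SCM_RIGHTS, sendmsg()/recvmsg() and the CMSG_*
macros are real. For brevity it folds the SCM_RIGHTS handover and the first
read submission into one sendmsg(); in practice you'd hand each file over
once and refer to the returned high-range handle in later requests.

/*
 * Hypothetical sketch only.  NETLINK_AIO, SCM_AIO_SUBMIT and
 * struct aio_submit are made up; SCM_RIGHTS, sendmsg()/recvmsg()
 * and the CMSG_* macros are the real, existing machinery.
 */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define NETLINK_AIO	31	/* invented protocol number */
#define SCM_AIO_SUBMIT	0x7f	/* invented cmsg type */

/* The "lseek64 tuple plus user-provided cookie", carried as
 * ancillary data alongside the destination buffer. */
struct aio_submit {
	uint64_t cookie;	/* echoed back in the completion */
	uint64_t offset;	/* where in the file to read */
	uint64_t length;	/* how much to read into the iovec */
	int32_t  file_fd;	/* file handed over via SCM_RIGHTS */
	int32_t  __pad;
};

/* One AIO socket per mount point, e.g.:
 *	aio_sock = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_AIO);
 */

static int submit_read(int aio_sock, int file_fd, void *buf,
		       uint64_t off, uint64_t len, uint64_t cookie)
{
	struct aio_submit req = {
		.cookie = cookie, .offset = off,
		.length = len, .file_fd = file_fd,
	};
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	char cbuf[CMSG_SPACE(sizeof(int)) + CMSG_SPACE(sizeof(req))];
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = cbuf, .msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *cmsg;

	memset(cbuf, 0, sizeof(cbuf));

	/* Hand the open file to the AIO socket (real SCM_RIGHTS; the
	 * kernel would park it in a high, userspace-inaccessible fd
	 * range and use it as the request's authentication token). */
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type  = SCM_RIGHTS;
	cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &file_fd, sizeof(int));

	/* The request tuple itself (invented cmsg type). */
	cmsg = CMSG_NXTHDR(&msg, cmsg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type  = SCM_AIO_SUBMIT;
	cmsg->cmsg_len   = CMSG_LEN(sizeof(req));
	memcpy(CMSG_DATA(cmsg), &req, sizeof(req));

	return sendmsg(aio_sock, &msg, 0);
}

/* Retire one completion: the data is already in the buffer supplied
 * at submit time; the cookie comes back as ancillary data so the
 * application can find its own bookkeeping for the request. */
static ssize_t reap_one(int aio_sock, void *buf, size_t buflen,
			uint64_t *cookie)
{
	struct iovec iov = { .iov_base = buf, .iov_len = buflen };
	char cbuf[CMSG_SPACE(sizeof(uint64_t))];
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = cbuf, .msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *c;
	ssize_t n;

	n = recvmsg(aio_sock, &msg, 0);	/* or O_NONBLOCK plus select() */
	if (n < 0)
		return n;
	for (c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c))
		if (c->cmsg_type == SCM_AIO_SUBMIT)
			memcpy(cookie, CMSG_DATA(c), sizeof(*cookie));
	return n;
}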