Hello :)

Currently, the GNU Mach kernel uses trap gates to enter the kernel (on i386). We always suspected this mechanism to be slow, but AFAIK no one has quantified that.

TL;DR: sysenter is about twice as fast as a trap gate (on my system).

I have a prototype that allows one to enter the kernel using sysenter. Here are the numbers:

    start sysenter: mach_print using [trap gate] [sysenter].
    Running 268435456(1U<<28) times mach_print("")...
      using trap gate:  45s960000us 171.214342ns  5840632.202 (1/s)
      using sysenter:   20s600000us  76.740980ns 13030847.379 (1/s)
    Running 268435456(1U<<28) times mach_msg (NULL, ...)...
      using glibc stub: 46s050000us 171.549618ns  5829217.286 (1/s)
      using trap gate:  44s820000us 166.967511ns  5989189.112 (1/s)
      using sysenter:   20s050000us  74.692070ns 13388302.045 (1/s)
    exiting.

So using sysenter is roughly 95ns faster per call. To put this into perspective, sending a simple message (i.e. no ports or external data in the body) takes ~950ns on my system. That suggests that merely using sysenter improves our IPC performance by ~10%.

So how do we get there? One trouble is that sysenter/sysexit (or the AMD equivalent) isn't available on all processors. Linux solves this using the VDSO mechanism. I'd like to implement something similar:

1. There is a platform-dependent way to map a special page.
2. That page contains a function that executes a syscall.

This way we do not hardcode the system call method into the ABI. The kernel selects one appropriate for the processor, and we are free to change this interface anytime we want.

On i386, the 'platform-dependent way' to get the syscall wrapper is to use the current syscall mechanism to map a special device, similar to how the mapped time interface works.

What do you think?

Justus
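
To make the two-step proposal above concrete, here is a minimal user-space sketch in C. It is not the prototype's code: every name in it (`syscall_stub_t`, `select_stub`, `map_syscall_page`, `do_syscall`) is hypothetical, the two stubs only report which path was chosen instead of entering the kernel, and the "mapping" is simulated by storing a function pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature of the stub found in the special page.  A real
   stub would take the trap number plus its arguments and enter the kernel;
   this one just returns the name of the entry method it stands for. */
typedef const char *(*syscall_stub_t)(int trap_number);

/* Stand-in for the existing trap gate entry sequence. */
static const char *stub_trap_gate(int trap_number)
{
  (void) trap_number;
  return "trap gate";
}

/* Stand-in for a sysenter-based entry sequence. */
static const char *stub_sysenter(int trap_number)
{
  (void) trap_number;
  return "sysenter";
}

/* Kernel side (simulated): choose the stub that fits the processor.
   On real hardware the flag would come from CPUID (the SEP feature bit);
   here the caller passes it in, which is an assumption of this sketch. */
static syscall_stub_t select_stub(bool cpu_has_sep)
{
  return cpu_has_sep ? stub_sysenter : stub_trap_gate;
}

/* User side: the stub obtained from the "special page". */
static syscall_stub_t mapped_stub = NULL;

/* Step 1 (simulated): the platform-dependent way to map the page. */
static void map_syscall_page(bool cpu_has_sep)
{
  mapped_stub = select_stub(cpu_has_sep);
}

/* Step 2: every syscall goes through the mapped function, so callers
   never hardcode the entry method. */
static const char *do_syscall(int trap_number)
{
  assert(mapped_stub != NULL);
  return mapped_stub(trap_number);
}
```

The point of the indirection is that user code only ever calls `do_syscall`; which entry method sits behind it is decided once, on the kernel's side, and can change without touching callers.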