So here's a potentially interesting idea. Since you are running Plan 9 under Linux with 9vx, consider using the TAU toolkit to measure it.
http://www.cs.uoregon.edu/research/tau/home.php
We've used these tools to optimize an MPI library, and they are quite powerful. See what you think.

ron