I noticed that my OpenMPI processes are using more system time than user time (as reported by vmstat and top). I'm running on dual-CPU, dual-core Opteron nodes with 4 slots per node, and the program has the nodes to itself. A closer look showed that the processes are constantly switching between the run and sleep states, with 4-8 page faults per second.
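For anyone who wants to check the same figures per process rather than from vmstat or top, the kernel's own counters can be read directly. This is only a minimal sketch, assuming a Linux/Unix system with Python available; the names `usage_snapshot` and `delta` are made up for illustration and are not part of OpenMPI, and the loop is just a stand-in for real work.

```python
import resource


def usage_snapshot(who=resource.RUSAGE_SELF):
    """Return CPU times, page faults and context switches for this process."""
    ru = resource.getrusage(who)
    return {
        "user_s": ru.ru_utime,                 # seconds spent in user mode
        "system_s": ru.ru_stime,               # seconds spent in kernel mode
        "minor_faults": ru.ru_minflt,          # page faults served without I/O
        "major_faults": ru.ru_majflt,          # page faults that required I/O
        "voluntary_switches": ru.ru_nvcsw,     # process gave up the CPU (e.g. blocked)
        "involuntary_switches": ru.ru_nivcsw,  # process was preempted
    }


def delta(before, after):
    """Difference between two snapshots, counter by counter."""
    return {key: after[key] - before[key] for key in before}


if __name__ == "__main__":
    before = usage_snapshot()
    total = sum(i * i for i in range(1_000_000))  # stand-in for the real computation
    after = usage_snapshot()
    for name, value in delta(before, after).items():
        print(f"{name}: {value}")
```

Taking a snapshot before and after a phase of the program shows how that phase splits between user and system time. A process that blocks often will also show a high count of voluntary context switches.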
Why would this be? It doesn't happen with 4 sequential jobs running on a node, where I get 99% user time and maybe 1% system time. The processes have plenty of memory. This behavior occurs whether or not I use processor/memory affinity, and there is no oversubscription.

Thanks,
Todd