> However, reading through your initial description on Tuesday, none of these
> fit: You want to actually measure the kernel time on TCP communication costs.

Since the problem also occurs in a node-only configuration, and the MCA option btl = self,sm,tcp is used, I doubt it has to do with TCP communication. But I will keep it in the back of my mind anyway.

> So, have You tried attaching "strace -c -f -p PID" to the actual application
> processes?
>
> As a starter You may invoke the benchmark using:
>    mpirun -np 4 strace -c -f ./benchmark
> (which however includes initialization and all other system calls)...

I ran it as you suggested (node-only, no network distribution). I am not really fond of analyzing this in detail, but maybe this rings a bell for one of you:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 37.97    0.000508           0    119856           rt_sigaction
 33.78    0.000452           0     59925           poll
 21.00    0.000281           0    179776           rt_sigprocmask
  7.25    0.000097           0    121297           gettimeofday
  0.00    0.000000           0        85           read
  0.00    0.000000           0         3           write
  0.00    0.000000           0       324       203 open
  0.00    0.000000           0       129           close
  0.00    0.000000           0         3         3 unlink
[...]

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 34.64    0.000194           0     92934           gettimeofday
 28.75    0.000161           0    137227           rt_sigprocmask
 26.25    0.000147           0     45742           poll
[...]

I can provide the whole output, if you like.
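
In case it helps anyone compare the per-process summaries without reading them by eye, here is a minimal Python sketch that tallies the rows of an "strace -c" summary. It assumes the usual column layout shown in the output quoted in this message (% time, seconds, usecs/call, calls, optional errors, syscall). The names parse_summary, totals and SAMPLE are made up for this illustration and are not part of strace or Open MPI; SAMPLE holds only the first four rows of the first summary.

```python
import re

# First four rows of the first "strace -c" summary quoted in this message.
SAMPLE = """\
 37.97    0.000508           0    119856           rt_sigaction
 33.78    0.000452           0     59925           poll
 21.00    0.000281           0    179776           rt_sigprocmask
  7.25    0.000097           0    121297           gettimeofday
"""

# Assumed row layout: % time, seconds, usecs/call, calls, [errors], syscall.
ROW = re.compile(
    r"^\s*(?P<pct>\d+\.\d+)\s+(?P<seconds>\d+\.\d+)\s+(?P<usecs>\d+)"
    r"\s+(?P<calls>\d+)(?:\s+(?P<errors>\d+))?\s+(?P<syscall>[A-Za-z_]\w*)\s*$"
)


def parse_summary(text):
    """Return one dict per syscall row; headers, rules and '[...]' are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue
        if match.group("syscall") == "total":
            # The trailing "total" line is a sum, not a syscall.
            continue
        rows.append({
            "syscall": match.group("syscall"),
            "seconds": float(match.group("seconds")),
            "calls": int(match.group("calls")),
            "errors": int(match.group("errors") or 0),
        })
    return rows


def totals(rows):
    """Return (total number of calls, total seconds spent in syscalls)."""
    return (sum(r["calls"] for r in rows), sum(r["seconds"] for r in rows))


if __name__ == "__main__":
    calls, seconds = totals(parse_summary(SAMPLE))
    print(f"{calls} calls, {seconds:.6f} s in syscalls")
```

Feeding it the complete output of each process (rather than the excerpt above) would give per-process totals that can be set against the wall-clock time of the benchmark.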