Hi Jeff,
Thanks, I will check flamegraphs.
Sample generation with perf could be a problem. I don't think I can run 'mpirun
-np <> perf record' and have
the sampling done on all the cores, with each core's data (perf.data) stored
separately for analysis. Is that possible?
Came to know that amdu
Arun,
You can use a small wrapper script like this one to store the perf data
in separate files:
```
$ cat perfwrap.sh
#!/bin/bash
# Quote "$@" so arguments containing spaces reach the program intact.
exec perf record -o "perf.data.$OMPI_COMM_WORLD_RANK" "$@"
```
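If the job is not always launched by Open MPI's mpirun, `OMPI_COMM_WORLD_RANK` will be empty and every process would write to the same file. A minimal sketch of a fallback, assuming the launcher exports one of `PMI_RANK` (MPICH-style launchers) or `SLURM_PROCID` (srun); the helper name `rank_suffix` is made up for illustration, so check what your launcher actually sets:

```shell
#!/bin/bash
# Hypothetical helper: pick a per-process suffix for the perf output file.
# OMPI_COMM_WORLD_RANK comes from Open MPI's mpirun (as in the wrapper above).
# PMI_RANK and SLURM_PROCID are assumptions for other launchers.
# If none is set, fall back to the shell's PID ($$) so files still don't collide.
rank_suffix() {
    echo "${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-${SLURM_PROCID:-$$}}}"
}

# Intended use as the last line of the wrapper (left commented out here):
# exec perf record -o "perf.data.$(rank_suffix)" "$@"
```

With the PID fallback the file names no longer map to ranks, but each process still gets its own file.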
Then do `mpirun -n <N> ./perfwrap.sh ./a.out` to run all processes under
perf. You can also select a subs