On Tue, 3 Jan 2006, Carsten Kutzner wrote:

> On Tue, 3 Jan 2006, Graham E Fagg wrote:
>
> > Do you have any tools such as Vampir (or its Intel equivalent) available
> > to get a time line graph? (Even a jumpshot of one of the bad cases, such
> > as the 128/32 for 256 floats below, would help.)
>
> Hi Graham,
>
> I have attached an slog file of an all-to-all run for 1024 floats (ompi
> tuned alltoall). I could not get clog files for >32 processes - is this
> perhaps a limitation of MPE?
MPE/MPE2 logging (or clog/clog2) does not impose any limit on the number of
processes. Could you explain what difficulty or error message you
encountered when using >32 processes? By the way, the version of MPE you
are using seems old; you may want to download the latest version of MPE
from http://www.mcs.anl.gov/perfvis.

A.Chan

> So I decided to take the case of 32 CPUs on 32 nodes, which is
> performance-critical as well. From the run output you can see that 2 of
> the 5 tries yield a fast execution while the others are slow (see below).
>
> Carsten
>
>
> ckutzne@node001:~/mpe> mpirun -hostfile ./bhost1 -np 32 ./phas_mpe.x
> Alltoall Test on 32 CPUs. 5 repetitions.
> --- New category (first test not counted) ---
> MPI: sending 1024 floats ( 4096 bytes) to 32 processes ( 1 times) took ... 0.00690 seconds
> ---------------------------------------------
> MPI: sending 1024 floats ( 4096 bytes) to 32 processes ( 1 times) took ... 0.00320 seconds
> MPI: sending 1024 floats ( 4096 bytes) to 32 processes ( 1 times) took ... 0.26392 seconds !
> MPI: sending 1024 floats ( 4096 bytes) to 32 processes ( 1 times) took ... 0.26868 seconds !
> MPI: sending 1024 floats ( 4096 bytes) to 32 processes ( 1 times) took ... 0.26398 seconds !
> MPI: sending 1024 floats ( 4096 bytes) to 32 processes ( 1 times) took ... 0.00339 seconds
> Summary (5-run average, timer resolution 0.000001):
> 1024 floats took 0.160632 (0.143644) seconds. Min: 0.003200 max: 0.268681
> Writing logfile....
> Finished writing logfile.
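As a side note, the summary line quoted above can be reproduced from the
five counted timings. A minimal Python sketch (my own check, not part of
phas_mpe.x), assuming the value printed in parentheses is the sample
standard deviation of the five runs:

```python
import statistics

# The five counted repetitions from the quoted run output (seconds);
# the first, uncounted test (0.00690 s) is excluded.
times = [0.00320, 0.26392, 0.26868, 0.26398, 0.00339]

mean = statistics.mean(times)    # matches the reported 0.160632 average
stdev = statistics.stdev(times)  # matches the 0.143644 in parentheses
print(f"mean={mean:.6f} stdev={stdev:.6f} "
      f"min={min(times):.6f} max={max(times):.6f}")
```

The two slow-vs-fast clusters (about 0.003 s vs about 0.264 s) are what
drive the large standard deviation relative to the mean.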