On Wed, Aug 25, 2010 at 12:14 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> Once you do that, try using just one of the networks by telling OMPI to use
> only one of the devices, something like this:
>
> mpirun --mca btl_tcp_if_include eth0 ...
Thanks Jeff! Just tried the exact test that you suggested:

[rpnabar@eu001 ~]$ NP=64; time mpirun -np $NP --host eu001,eu003,eu004,eu005,eu006,eu007,eu008,eu012 --mca btl_tcp_if_include eth0 -mca btl openib,sm,self /opt/src/mpitests/imb/src/IMB-MPI1 -npmin $NP gather

Still the same problem. The NP64 gather stalls at the 4096-byte size for about 7 minutes and then completes with a step-change increase in times. All the 10GigE interfaces are eth0 now and all are on the 192.168.x.x subnet. The 7-minute stall seems very reproducible each time around.

Once the test stalled I ran a padb stack trace from the master node. Posted here:

[rpnabar@eu001 root]$ /opt/sbin/bin/padb --all --stack-trace --tree --config-option rmgr=orte
http://dl.dropbox.com/u/118481/padb_Aug26_gather_NP64.txt

I also ran top to find the most CPU-intensive processes during the stall, and they all seem to be the IMB-MPI1 ones. Memory usage seems minimal. (Each node has 16 Gigs of RAM.)

http://dl.dropbox.com/u/118481/top_Aug26.txt

Interestingly, the NP56 test runs just great and finishes in less than a minute. It's only at NP64 that I hit this roadblock. On the other hand, even for the NP56 test there is almost a 10x degradation in transmit times going from a byte size of 4096 to 8192.

Any other debug options or suggestions are most welcome! (A few things I'm planning to try next are sketched after the benchmark output below.)

# /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 64 gather
# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#
# List of Benchmarks to run:
# Gather

#----------------------------------------------------------------
# Benchmarking Gather
# #processes = 64
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.02         0.03         0.02
            1         1000        84.25        84.55        84.40
            2         1000        84.16        84.45        84.31
            4         1000        84.48        84.78        84.64
            8         1000        84.58        84.92        84.77
           16         1000        86.51        86.79        86.66
           32         1000        88.60        88.93        88.78
           64         1000        90.88        91.22        91.06
          128         1000        92.44        92.76        92.60
          256         1000        95.79        96.14        95.98
          512         1000       104.90       105.25       105.07
         1024         1000       118.01       118.40       118.19
         2048         1000       154.42       154.94       154.67
         4096         1000       292.15       292.95       292.52
         8192           13      1436.77      1667.15      1581.73
        16384           13      1733.38      2004.77      1903.27
        32768           13      2082.55      2403.24      2282.68
        65536           13      3106.37      3546.15      3384.07
       131072           13      7812.54      9011.62      8572.76
       262144           13     10773.70     12358.30     11782.77
       524288           13     19377.23     22315.85     21238.98
      1048576           13     38661.61     44293.92     42280.09
      2097152           13    120665.00    140697.08    136576.54
      4194304           10    475155.12    567579.08    536037.92

# All processes entering MPI_Finalize

real    7m31.039s
user    58m58.321s
sys     0m21.633s

--------------------------------NP56 test------------------------------------------

#----------------------------------------------------------------
# Benchmarking Gather
# #processes = 56
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.02         0.09         0.03
            1         1000        74.23        74.53        74.35
            2         1000        73.87        74.15        74.02
            4         1000        73.59        73.86        73.72
            8         1000        74.15        74.40        74.27
           16         1000        76.18        76.45        76.30
           32         1000        77.82        78.10        77.95
           64         1000        79.85        80.16        80.00
          128         1000        81.67        82.01        81.84
          256         1000        86.07        86.41        86.27
          512         1000        94.91        95.23        95.07
         1024          843        33.45        35.13        34.38
         2048          843       218.82       241.49       230.18
         4096          843       130.76       131.62       131.17
         8192          843      1344.88      1348.68      1347.62
        16384          843      1915.72      1919.64      1918.58
        32768          843      2463.28      2469.58      2468.08
        65536          640      3395.59      3401.03      3398.49
       131072          320      6952.66      6981.24      6968.44
       262144          160     10137.25     10209.22     10174.13
       524288           80     16631.20     16921.68     16788.20
      1048576           40     35974.07     36980.07     36517.35
      2097152           20    167574.75    183295.25    177734.75
      4194304           10    321249.79    410697.10    367498.59
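
A few follow-up ideas I'm planning to try (these are only sketches; the MCA parameter names are what I believe Open MPI accepts, so corrections welcome).

First, bump the BTL verbosity so the startup output confirms which transport and interface actually carry the gather traffic:

  # btl_base_verbose should make OMPI log which BTLs it selects for each peer
  mpirun -np 64 --host eu001,eu003,eu004,eu005,eu006,eu007,eu008,eu012 \
      --mca btl openib,sm,self --mca btl_base_verbose 100 \
      /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 64 gather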
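Second, since my run above keeps openib in the BTL list, I suspect the btl_tcp_if_include setting isn't actually being exercised; to test eth0 the way Jeff suggested I'd force the TCP BTL only:

  # Restrict to the tcp/sm/self BTLs and pin the tcp BTL to eth0 (the 10GigE interfaces)
  mpirun -np 64 --host eu001,eu003,eu004,eu005,eu006,eu007,eu008,eu012 \
      --mca btl tcp,sm,self --mca btl_tcp_if_include eth0 \
      /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 64 gather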
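Third, on the 4096 --> 8192 step change: if I've done the arithmetic right, at 4096 bytes the root collects 64 x 4096 = 256 KB per gather, and at 8192 bytes that doubles to 512 KB, which is roughly where I'd expect an eager-to-rendezvous or collective-algorithm switch to kick in. Assuming the tuned collective component exposes the knobs I remember (again, just a guess on my part), I'd try pinning the gather algorithm to see whether the step change tracks an algorithm switch:

  # Assumed coll_tuned knobs: force one fixed gather algorithm instead of the
  # default size-based selection (values 1-3 should pick different algorithms)
  mpirun -np 64 --host eu001,eu003,eu004,eu005,eu006,eu007,eu008,eu012 \
      --mca btl openib,sm,self \
      --mca coll_tuned_use_dynamic_rules 1 \
      --mca coll_tuned_gather_algorithm 1 \
      /opt/src/mpitests/imb/src/IMB-MPI1 -npmin 64 gather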