Re: [OMPI users] Mixed Mellanox and Qlogic problems

2011-07-27 Thread David Warren
Ok, I finally was able to get on and run some ofed tests - it looks to me like I must have something configured wrong with the qlogic cards, but I have no idea what???
Mellanox to Qlogic:
ibv_rc_pingpong n15
  local address:  LID 0x0006, QPN 0x240049, PSN 0x87f83a, GID ::
  remote address: LID

Re: [OMPI users] Mixed Mellanox and Qlogic problems

2011-07-17 Thread Jeff Squyres
Interesting. Try with the native OFED benchmarks -- i.e., get MPI out of the way and see if the raw/native performance of the network between the devices reflects the same dichotomy. (e.g., ibv_rc_pingpong)
On Jul 15, 2011, at 7:58 PM, David Warren wrote:
> All OFED 1.4 and 2.6.32 (that's wh

Re: [OMPI users] Mixed Mellanox and Qlogic problems

2011-07-15 Thread David Warren
All OFED 1.4 and 2.6.32 (that's what I can get to today)
qib to qib:
# OSU MPI Latency Test v3.3
# Size    Latency (us)
0         0.29
1         0.32
2         0.31
4         0.32
8         0.32
16

Re: [OMPI users] Mixed Mellanox and Qlogic problems

2011-07-15 Thread Jeff Squyres
I don't think too many people have done combined QLogic + Mellanox runs, so this probably isn't a well-explored space. Can you run some microbenchmarks to see what kind of latency / bandwidth you're getting between nodes of the same type and nodes of different types? On Jul 14, 2011, at 8:21 PM

Re: [OMPI users] Mixed Mellanox and Qlogic problems

2011-07-14 Thread David Warren
On my test runs (wrf run just long enough to go beyond the spinup influence):
On just 6 of the old mlx4 machines I get about 00:05:30 runtime.
On 3 mlx4 and 3 qib nodes I get avg of 00:06:20.
So the slowdown is about 11+%. When this is a full run, 11% becomes a very long time. This has held for
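The slowdown arithmetic can be checked with a short sketch. It uses only the two runtimes quoted in the message. The function names are made up for illustration. Taken at face value, those two figures work out to roughly 15%, so the 11+% quoted in the message may rest on averages that are not shown in this excerpt.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def slowdown_percent(baseline: str, mixed: str) -> float:
    """Relative slowdown of `mixed` versus `baseline`, in percent."""
    base = to_seconds(baseline)
    return 100.0 * (to_seconds(mixed) - base) / base


# Runtimes quoted in the message: 6 mlx4 nodes vs. 3 mlx4 + 3 qib nodes.
print(f"{slowdown_percent('00:05:30', '00:06:20'):.1f}%")  # prints 15.2%
```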

Re: [OMPI users] Mixed Mellanox and Qlogic problems

2011-07-14 Thread Jeff Squyres
On Jul 13, 2011, at 7:46 PM, David Warren wrote:
> I finally got access to the systems again (the original ones are part of our
> real time system). I thought I would try one other test I had set up first.
> I went to OFED 1.6 and it started running with no errors. It must have been
> an OFED

Re: [OMPI users] Mixed Mellanox and Qlogic problems

2011-07-13 Thread David Warren
I finally got access to the systems again (the original ones are part of our real time system). I thought I would try one other test I had set up first. I went to OFED 1.6 and it started running with no errors. It must have been an OFED bug. Now I just have the speed problem. Anyone have a way

Re: [OMPI users] Mixed Mellanox and Qlogic problems

2011-07-07 Thread Jeff Squyres
Huh; wonky. Can you set the MCA parameter "mpi_abort_delay" to -1 and run your job again? This will prevent all the processes from dying when MPI_ABORT is invoked. Then attach a debugger to one of the still-live processes after the error message is printed. Can you send the stack trace? It
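One way to apply that setting is on the mpirun command line via `--mca`. The sketch below only assembles such a command and is not the exact invocation used in the thread: the executable name `./wrf.exe` and the process count of 6 are assumptions made for illustration.

```python
import shlex


def mpirun_with_abort_delay(program: str, nprocs: int, delay: int = -1) -> list:
    """Build an mpirun argv that sets the mpi_abort_delay MCA parameter."""
    return [
        "mpirun",
        "--mca", "mpi_abort_delay", str(delay),  # -1: do not exit on MPI_ABORT
        "-np", str(nprocs),
        program,
    ]


# Hypothetical program name and process count, for illustration only.
cmd = mpirun_with_abort_delay("./wrf.exe", 6)
print(shlex.join(cmd))
```

With the processes held alive after the abort message, a debugger can be attached to one of them (for example `gdb -p <pid>`) to collect the stack trace.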