On Fri, Jan 19, 2007 at 05:51:49PM +0000, Arif Ali wrote:
> >>I tried the nightly snapshot of OpenMPI-1.2b4r13137, which failed  
> >>miserably.
> >>    
> >
> >Can you describe what happened there?  Is it failing in a different way?
> >  
> Here's the output
> 
> #---------------------------------------------------
> # Intel (R) MPI Benchmark Suite V2.3, MPI-1 part
> #---------------------------------------------------
> # Date : Fri Jan 19 17:33:52 2007
> # Machine : ppc64
> # System : Linux
> # Release : 2.6.16.21-0.8-ppc64
> # Version : #1 SMP Mon Jul 3 18:25:39 UTC 2006
> 
> #
> # Minimum message length in bytes: 0
> # Maximum message length in bytes: 4194304
> #
> # MPI_Datatype : MPI_BYTE
> # MPI_Datatype for reductions : MPI_FLOAT
> # MPI_Op : MPI_SUM
> #
> #
> 
> # List of Benchmarks to run:
> 
> # PingPong
> # PingPing
> # Sendrecv
> # Exchange
> # Allreduce
> # Reduce
> # Reduce_scatter
> # Allgather
> # Allgatherv
> # Alltoall
> # Bcast
> # Barrier
> 
> #---------------------------------------------------
> # Benchmarking PingPong
> # #processes = 2
> # ( 58 additional processes waiting in MPI_Barrier)
> #---------------------------------------------------
> #bytes #repetitions t[usec] Mbytes/sec
> 0 1000 1.76 0.00
> 1 1000 1.88 0.51
> 2 1000 1.89 1.01
> 4 1000 1.91 2.00
> 8 1000 1.88 4.05
> 16 1000 2.02 7.55
> 32 1000 2.05 14.88
> [0,1,4][btl_openib_component.c:1153:btl_openib_component_progress] from 
> node03 to: node02 error polling HP CQ with status REMOTE ACCESS ERROR 
> status number 10 for wr_id 268969528 opcode 128
> [0,1,28][btl_openib_component.c:1153:btl_openib_component_progress] from 
> node09 to: node02 error polling HP CQ with status REMOTE ACCESS ERROR 
> status number 10 for wr_id 268906808 opcode 128
> [0,1,58][btl_openib_component.c:1153:btl_openib_component_progress] from 
> node16 to: node02 error polling HP CQ with status REMOTE ACCESS ERROR 
> status number 10 for wr_id 268919352 opcode 256614836
> [0,1,0][btl_openib_component.c:1153:btl_openib_component_progress] from 
> node02 to: node03 error polling HP CQ with status WORK REQUEST FLUSHED 
> ERROR status number 5 for wr_id 276070200 opcode 0
> [0,1,59][btl_openib_component.c:1153:btl_openib_component_progress] from 
> node16 to: node02 error polling HP CQ with status REMOTE ACCESS ERROR 
> status number 10 for wr_id 268919352 opcode 256614836
> mpirun noticed that job rank 0 with PID 0 on node node02 exited on 
> signal 15 (Terminated).
> 55 additional processes aborted (not shown)
Does this happen with btl_openib_flags=1? Does it also happen without
that setting? And it doesn't happen with OpenMPI-1.2b3, right?
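For reference, a quick way to compare the two cases (assuming the usual
mpirun MCA syntax and a 60-process run as in your output; the IMB binary
path below is just a placeholder) would be something like:

  # restrict the openib BTL to send/receive (flags=1, i.e. no RDMA)
  mpirun -np 60 --mca btl_openib_flags 1 ./IMB-MPI1

  # same run with the default btl_openib_flags, for comparison
  mpirun -np 60 ./IMB-MPI1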


--
                        Gleb.
