[OMPI users] Error Polling HP CQ Status on PPC64 Linux with IB
I'm currently working on getting OpenMPI + OpenIB 1.0 (might be an RC) working on our 8 node Xserve G5 cluster running Linux kernel version 2.6.16 and get the following errors:

Process 1 on node-192-168-111-249
Process 0 on node-192-168-111-248
[0,1,1][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 1 for wr_id 270995584 opcode -1286736
[0,1,1][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 270995868 opcode -1286736
[0,1,1][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 270996152 opcode -1286736
[0,1,1][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 270996436 opcode -1286736
[0,1,1][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 270996720 opcode -1286736
[0,1,1][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 270997004 opcode -1286736
[0,1,1][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 270997288 opcode -1286736
[0,1,1][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 270997572 opcode -1286736
[0,1,1][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 271077504 opcode -1286736
[0,1,1][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 271077788 opcode -1286736
[0,1,1][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 271078072 opcode -1286736
[0,1,0][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 9 for wr_id 270991488 opcode -6639584
[0,1,0][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 270995584 opcode -6639584
[0,1,0][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 270995868 opcode -6639584
[0,1,0][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 270996152 opcode -6639584
[0,1,0][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 270996436 opcode -6639584
[0,1,0][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 270996720 opcode -6639584
[0,1,0][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 270997004 opcode -6639584
[0,1,0][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 270997288 opcode -6639584
[0,1,0][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 270997572 opcode -6639584
[0,1,0][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 271077504 opcode -6639584
[0,1,0][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 271077788 opcode -6639584
[0,1,0][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 5 for wr_id 271078072 opcode -6639584
mpirun: killing job...
Re: [OMPI users] Error Polling HP CQ Status on PPC64 Linux with IB
What version of OpenMPI are you using?

On Mon, Jun 19, 2006 at 07:06:54AM -0700, Owen Stampflee wrote:
> I'm currently working on getting OpenMPI + OpenIB 1.0 (might be an RC)
> working on our 8 node Xserve G5 cluster running Linux kernel version
> 2.6.16 and get the following errors:
>
> [identical "error polling HP CQ" log snipped]
>
> mpirun: killing job...
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Gleb.
[OMPI users] Error polling HP CQ on linux ppc64 w/Infiniband
We are currently trying to get OpenMPI 1.0.2 and OpenIB (1.0) running on our Xserve G5 cluster running Linux 2.6.16 with no luck; the ibv_*_pingpong tests work fine, opensm is started, and the network is up. Here's the output:

Process 1 on node-192-168-111-249
Process 0 on node-192-168-111-248
[0,1,1][btl_openib_component.c:587:mca_btl_openib_component_progress] error polling HP CQ with status 1 for wr_id 270995584 opcode -1286736
[identical "error polling HP CQ" log as in the other post snipped]
mpirun: killing job...
Re: [OMPI users] Installing OpenMPI on a solaris
I checked the thread with the same title as this e-mail and tried compiling openmpi-1.1b4r10418 with:

./configure CFLAGS="-mv8plus" CXXFLAGS="-mv8plus" FFLAGS="-mv8plus" FCFLAGS="-mv8plus" --prefix=$HOME/openmpi-SUN-`uname -r` --enable-pretty-print-stacktrace

but I keep getting:

*** Assembler
checking for BSD-compatible nm... /usr/ccs/bin/nm -p
checking for fgrep... fgrep
checking whether to enable smp locks... yes
checking directive for setting text section... .text
checking directive for exporting symbols... .globl
checking for objdump... no
checking if .note.GNU-stack is needed... no
checking suffix for labels... :
checking prefix for global symbol labels...
checking prefix for lsym labels... .L
checking prefix for function in .type... #
checking if .size is needed... yes
checking if .align directive takes logarithmic value... no
checking if have Sparc v8+/v9 support... no
configure: WARNING: Sparc v8 target is not supported in this release of Open MPI.
configure: WARNING: You must specify the target architecture v8plus
configure: WARNING: (cc: -xarch=v8plus, gcc: -mv8plus) for CFLAGS, CCXXFLAGS,
configure: WARNING: FFLAGS, and FCFLAGS to compile Open MPI in 32 bit mode on
configure: WARNING: Sparc processors
configure: error: Can not continue.

Is Sparc support put aside for the moment or am I doing something wrong?

Thanks,
--
Eric Thibodeau
Neural Bucket Solutions Inc.
T. (514) 736-1436 C. (514) 710-0517
[OMPI users] MPI_Wtime
Is anyone using MPI_Wtime with any version of OpenMPI under Fortran 90? I got my program to compile with MPI_Wtime calls, but the difference between two times taken in the same process is always zero.

When compiling against OpenMPI I have to specify:

mytime = MPI_Wtime

For other MPIs I specify:

mytime = MPI_Wtime()

This has been tested on a dual-opteron with PGI 6.1-5 and a G4 with g95; I'm currently using OpenMPI 1.2a1r10297. The same code works fine on the dual-opteron with PGI 6.1-5 and MPICH2, on an SGI Altix with Intel compilers and the SGI MPI library, and on SGI IRIX with the SGI MPI library.

Michael
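[Editor's note: a minimal Fortran 90 sketch of the conventional MPI_Wtime usage; the program and variable names are illustrative, not from the post. In Fortran, a function reference requires parentheses, so if "mytime = MPI_Wtime" compiles at all, the compiler may be treating MPI_Wtime as an implicitly typed variable rather than calling the timer, which would explain a zero difference.]

```fortran
program wtime_demo
  use mpi                      ! declares MPI_Wtime as a DOUBLE PRECISION function
  implicit none                ! turns an undeclared "MPI_Wtime" variable into a compile error
  integer :: ierr, i
  double precision :: t0, t1, s

  call MPI_Init(ierr)
  t0 = MPI_Wtime()             ! parentheses required: this is a function reference
  s = 0.0d0
  do i = 1, 1000000            ! some work to time
     s = s + dble(i)
  end do
  t1 = MPI_Wtime()
  print *, 'elapsed seconds:', t1 - t0
  call MPI_Finalize(ierr)
end program wtime_demo
```

With `implicit none` in effect, a bare `mytime = MPI_Wtime` should fail to compile, making the difference between the two spellings visible at build time rather than as silently zero timings.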
[OMPI users] auto detect hosts
How does OpenMPI auto-detect available hosts? I'm running on a cluster of dual-opterons running Debian Linux. Just using "mpirun -np 4 hostname", OpenMPI somehow located the second dual-opteron in the stack of machines, but no more than that, regardless of how many processes I asked for. The master node has an internal IP of 10.0.0.0 and the second node has an IP of 10.0.0.1 and the names "node02" and "node2". I've been unable to find a file that contains only the name of my second node and not the others. I'm currently running OpenMPI 1.2a1r10297.

Michael
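[Editor's note: one way to take auto-detection out of the picture is to list the hosts explicitly in a hostfile. A hypothetical example follows; the node names and slot counts are placeholders, not taken from the post.]

```
# myhosts: one line per node; slots = processes to start on that node
node01 slots=2
node02 slots=2
```

It can then be passed on the command line, e.g. "mpirun --hostfile myhosts -np 4 hostname". Absent an explicit hostfile, Open MPI may pick up hosts from an installed default hostfile or from a batch scheduler's environment, which could account for the second node appearing unasked.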
Re: [OMPI users] Error Polling HP CQ Status on PPC64 Linux with IB
Ooops, thought I mentioned that, it's 1.0.2.

Cheers,
Owen

On Mon, 2006-06-19 at 17:08 +0300, Gleb Natapov wrote:
> What version of OpenMPI are you using?
>
> > I'm currently working on getting OpenMPI + OpenIB 1.0 (might be an RC)
> > working on our 8 node Xserve G5 cluster running Linux kernel version
> > 2.6.16 and get the following errors:
> >
> > [identical "error polling HP CQ" log snipped]
> >
> > mpirun: killing job...
>
> --
> Gleb.
Re: [OMPI users] auto detect hosts
On 19.06.2006 at 20:02, Michael Kluskens wrote:

> How does OpenMPI auto-detect available hosts? I'm running on a cluster
> of dual-opterons running Debian Linux. Just using "mpirun -np 4
> hostname" somehow OpenMPI located the second dual-opteron in the stack
> of machines but no more than that, regardless of how many processes I
> asked for. The master node has an internal ip of 10.0.0.0 and the
> second node has an ip of 10.0.0.1 and a name of "node02" and "node2"

I would suggest not using .0 at all as a host address, as it usually refers to a subnet by convention. And having the address in the same order as the name avoids the offset, i.e. .1 = node01, .2 = node02... I don't know whether this is related in any way to the effect you observe.

Cheers - Reuti

> I've been unable to find a file that contains only the name of my
> second node and not the others.
>
> I'm currently running OpenMPI 1.2a1r10297.
>
> Michael
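[Editor's note: a hypothetical /etc/hosts layout following Reuti's suggestion; the addresses and names are illustrative placeholders.]

```
# /etc/hosts (illustrative): avoid .0 and keep the last octet aligned
# with the node number, so the address and hostname never drift apart
10.0.0.1   node01
10.0.0.2   node02
```

Consistent naming like this also avoids the "node02" vs "node2" ambiguity mentioned in the original post.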
Re: [OMPI users] auto detect hosts
Corrected:

How does OpenMPI auto-detect available hosts? I'm running on a cluster of dual-opterons running Debian Linux. Just using "mpirun -np 4 hostname", OpenMPI somehow located the second dual-opteron in the stack of machines, but no more than that, regardless of how many processes I asked for. The master node has an internal IP of 10.0.0.1 and the second node has an IP of 10.0.0.2 and the names "node02" and "node2". I've been unable to find a file that contains only the name of my second node and not the others. I'm currently running OpenMPI 1.2a1r10297.

Michael