[OMPI users] help: sm btl does not work when I specify the same host twice or more in the node list
Hi all,

Good morning! I am having trouble communicating through the sm BTL in Open MPI; please see the attached file for my system information. I am using Open MPI 1.4.3 and Intel compilers V11.1 on Linux RHEL 5.4 with kernel 2.6.

The tests are as follows:

(1) If I pass "--mca btl self,sm,openib" to mpirun and no compute node appears more than once in the node list, my job runs fine. However, if any compute node appears twice or more in the node list, the job hangs forever.

(2) If I leave out the sm BTL by passing "--mca btl self,openib" to mpirun, my job runs smoothly whether or not any compute node appears twice or more in the node list.

From these two tests, something is apparently wrong with the sm BTL on my system. Searching the users archive, I found that an sm BTL issue has been reported in connection with comm_spawned parent/child processes. That does not seem to be the case here: even if I do not use any of my MPI-based solvers and call only the MPI initialization and finalization procedures, the issue still occurs.

Any comments?

Thanks,
Yiguang

Attachment: ompiinfo-config-uname-output.tgz (9 Feb 2012, 8:58; 126316 bytes)
[OMPI users] Problem in epoll checking in the head revision of 1.5
Hi,

I think there is a problem in the latest commit to the 1.5 branch. When opal_setup_libevent.m4 is upgraded to autotools 1.5.5, the square brackets in the test C code should be changed too. Otherwise they pass through unchanged into the configure file, and the C program that tests for epoll support fails to compile:

...
configure:159404: checking for working epoll system call
configure:159455: gcc -o conftest -DNDEBUG -g -O2
-I/hpc/home/USERS/senina/projects/hg/shmem-dev/opal/mca/hwloc/hwloc131/hwloc/include
-I/hpc/home/USERS/senina/projects/distribs/valgrind-3.7.0/install/include
-I/usr/include/infiniband -I/usr/include/infinibandconftest.c -lrt -lnsl
-lutil -lm >&5
conftest.c: In function 'main':
conftest.c:725: error: expected expression before '[' token
conftest.c:729: error: expected expression before '[' token
conftest.c:737: error: expected expression before '[' token
configure:159455: $? = 1
configure: program exited with status 1
configure: failed program was:
...

So
int fildes[[2]];
and similar declarations should be replaced with
int fildes[2];

I've attached a diff file which worked for me.

Regards,
Andrew Senin

Attachment: diff
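For readers following along: in Autoconf's m4, square brackets are the quote characters, so doubled brackets in a macro argument normally come out as single brackets in the generated file. If they reach conftest.c still doubled, the compiler sees `int fildes[[2]];`, which is not valid C. The real conftest.c is elided in the log above, so the following is only a minimal sketch of the kind of check such a test might perform, assuming Linux with <sys/epoll.h>; the function name epoll_works is hypothetical and not taken from Open MPI.

```c
#include <assert.h>   /* for callers that assert on the result */
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if epoll reports the read end of a
 * pipe as readable after one byte is written, 0 on any failure. */
int epoll_works(void)
{
    int fildes[2];                 /* valid C: single brackets */
    struct epoll_event ev, out;
    int epfd, n, ok = 0;

    if (pipe(fildes) < 0)
        return 0;

    /* The size argument is only a hint but must be greater than zero. */
    epfd = epoll_create(1);
    if (epfd < 0) {
        close(fildes[0]);
        close(fildes[1]);
        return 0;
    }

    memset(&ev, 0, sizeof(ev));
    memset(&out, 0, sizeof(out));
    ev.events = EPOLLIN;           /* watch for readability */
    ev.data.fd = fildes[0];

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fildes[0], &ev) == 0 &&
        write(fildes[1], "x", 1) == 1) {
        /* Wait up to one second for the read end to become ready. */
        n = epoll_wait(epfd, &out, 1, 1000);
        ok = (n == 1 && out.data.fd == fildes[0] &&
              (out.events & EPOLLIN) != 0);
    }

    close(epfd);
    close(fildes[0]);
    close(fildes[1]);
    return ok;
}
```

A configure-style test would typically wrap a check like this in main() and exit with status 0 only when it succeeds.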
Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine
On 08.02.2012 at 22:52, Tom Bryan wrote:

> > Yes, this should work across multiple machines. And it's using `qrsh -inherit ...` so it's failing somewhere in Open MPI - is it working with 1.4.4?
>>>
>>> I'm not sure. We no longer have our 1.4 test environment, so I'm in the
>>> process of building that now. I'll let you know once I have a chance to run
>>> that experiment.
>
> You said that both of these cases worked for you in 1.4. Were you running a
> modified version that did not use THREAD_MULTIPLE? I ask because I'm
> getting worse errors in 1.4. I'm using the same code that was working (in
> some cases) with 1.5.4.
>
> I built 1.4.4 with (among other option)
> --with-threads=posix --enable-mpi-threads

./configure --prefix=$HOME/local/openmpi-1.4.4-default-thread --with-sge --with-threads=posix --enable-mpi-threads

No problems, even with THREAD_MULTIPLE.

Only, as stated, in singleton mode there are one or more additional lines (it looks like one per slave host, but not always; a race condition?):

[pc15370:31390] [[24201,0],1] routed:binomial: Connection to lifeline [[24201,0],0] lost

>
> ompi_mpi_init: orte_init failed
> --> Returned "Data unpack would read past end of buffer" (-26) instead of
> "Success" (0)
> --
> *** The MPI_Init_thread() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.

Interesting error message, since it is not true that this is disallowed.

-- Reuti
Re: [OMPI users] Problem in epoll checking in the head revision of 1.5
Committed -- thanks!

On Feb 9, 2012, at 3:16 PM, Andrew Senin wrote:

> Hi,
>
> I think there is a problem in the latest commit to the branch 1.5. When
> opal_setup_libevent.m4 is upgraded to autotools 1.5.5 the square brackets in
> the test C code should be replaced too. Otherwise they'll go unchanged to the
> configure file. And the C program which tests for epoll support will fail:
>
> ...
> configure:159404: checking for working epoll system call
> configure:159455: gcc -o conftest -DNDEBUG -g -O2
> -I/hpc/home/USERS/senina/projects/hg/shmem-dev/opal/mca/hwloc/hwloc131/hwloc/include
> -I/hpc/home/USERS/senina/projects/distribs/valgrind-3.7.0/install/include
> -I/usr/include/infiniband -I/usr/include/infinibandconftest.c -lrt -lnsl
> -lutil -lm >&5
> conftest.c: In function 'main':
> conftest.c:725: error: expected expression before '[' token
> conftest.c:729: error: expected expression before '[' token
> conftest.c:737: error: expected expression before '[' token
> configure:159455: $? = 1
> configure: program exited with status 1
> configure: failed program was:
> ...
>
> So
> int fildes[[2]];
> and similar
> should be replaced to
> int fildes[2];
>
> I've attached a diff file which worked for me.
>
> Regards,
> Andrew Senin

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/