[OMPI users] help: sm btl does not work when I specify the same host twice or more in the node list

2012-02-09 Thread yanyg
Hi all,

Good morning!

I am having trouble communicating through the sm BTL in Open MPI; please
check the attached file for my system information. I am using Open
MPI 1.4.3 and Intel compilers v11.1 on Linux (RHEL 5.4, kernel 2.6).

The tests are the following: 

(1) If I specify the BTLs to mpirun with "--mca btl self,sm,openib" and do
not list any of my compute nodes twice or more in the node list, my
job runs fine. However, if I list any compute node twice or more in
the node list, the job hangs forever.

(2) If I leave out the sm BTL, i.e. "--mca btl self,openib", my job runs
smoothly whether or not any compute node appears twice or more in
the node list.

From the above two tests, something is apparently wrong with the sm BTL
on my system. Searching the user archive, I found that sm BTL problems
have been reported for parent/child processes created with comm_spawn.
That does not seem to be the case here: even if I do not run any of my
MPI-based solvers and only call the MPI initialization and finalization
routines, the problem still occurs.

Any comments?

Thanks,
Yiguang


    File information ---
 File:  ompiinfo-config-uname-output.tgz
 Date:  9 Feb 2012, 8:58
 Size:  126316 bytes.
 Type:  Unknown




[OMPI users] Problem in epoll checking in the head revision of 1.5

2012-02-09 Thread Andrew Senin
Hi,

I think there is a problem in the latest commit to the 1.5 branch. When
opal_setup_libevent.m4 is upgraded to autotools 1.5.5, the square brackets
in the test C code should be replaced too. Otherwise they go unchanged
into the configure script, and the C program that tests for epoll support
fails:

...
configure:159404: checking for working epoll system call
configure:159455: gcc -o conftest -DNDEBUG -g -O2
-I/hpc/home/USERS/senina/projects/hg/shmem-dev/opal/mca/hwloc/hwloc131/hwloc/include
-I/hpc/home/USERS/senina/projects/distribs/valgrind-3.7.0/install/include
-I/usr/include/infiniband -I/usr/include/infinibandconftest.c -lrt
-lnsl  -lutil -lm  >&5
conftest.c: In function 'main':
conftest.c:725: error: expected expression before '[' token
conftest.c:729: error: expected expression before '[' token
conftest.c:737: error: expected expression before '[' token
configure:159455: $? = 1
configure: program exited with status 1
configure: failed program was:
...

So

 int fildes[[2]];

and similar lines should be replaced with

 int fildes[2];

I've attached a diff file which worked for me.

Regards,
Andrew Senin
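For reference, a minimal sketch of the kind of check such an epoll test performs, written as plain C with single brackets (`int fildes[2];`). This is not the actual test from opal_setup_libevent.m4; the function name `epoll_pipe_works` is hypothetical, and the sketch assumes Linux with `<sys/epoll.h>`. It creates an epoll instance, registers the read end of a pipe, writes one byte, and checks that `epoll_wait` reports the pipe as readable.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper, not Open MPI's configure test: returns 1 if epoll
 * reports the read end of a pipe as readable after one byte is written
 * to the write end, and 0 otherwise (including on any setup failure). */
int epoll_pipe_works(void)
{
    int fildes[2];                  /* plain C: single brackets */
    struct epoll_event ev, out;
    int epfd, n, ok = 0;

    epfd = epoll_create(1);         /* size argument must be > 0 */
    if (epfd < 0)
        return 0;
    if (pipe(fildes) != 0) {
        close(epfd);
        return 0;
    }

    ev.events = EPOLLIN;            /* watch the read end for readability */
    ev.data.fd = fildes[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fildes[0], &ev) == 0 &&
        write(fildes[1], "x", 1) == 1) {
        n = epoll_wait(epfd, &out, 1, 1000);  /* wait at most 1000 ms */
        ok = (n == 1 && out.data.fd == fildes[0] &&
              (out.events & EPOLLIN) != 0);
    }

    close(fildes[0]);
    close(fildes[1]);
    close(epfd);
    return ok;
}
```

A configure-style program would call it from `main` and report the result through its exit status, e.g. `return epoll_pipe_works() ? 0 : 1;`. Inside an autoconf macro, where `[` and `]` are m4 quote characters, the same declaration is normally written as `fildes[[2]]` so that m4 strips one level of quoting; if that level is not stripped, the doubled brackets reach conftest.c and the compile fails as in the log above.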




Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine

2012-02-09 Thread Reuti
On 08.02.2012 at 22:52, Tom Bryan wrote:

> 
> Yes, this should work across multiple machines. And it's using
> `qrsh -inherit ...`, so it's failing somewhere in Open MPI - is it
> working with 1.4.4?
>>> 
>>> I'm not sure.  We no longer have our 1.4 test environment, so I'm in the
>>> process of building that now.  I'll let you know once I have a chance to run
>>> that experiment.
> 
> You said that both of these cases worked for you in 1.4.  Were you running a
> modified version that did not use THREAD_MULTIPLE?  I ask because I'm
> getting worse errors in 1.4.  I'm using the same code that was working (in
> some cases) with 1.5.4.
> 
> I built 1.4.4 with (among other options)
> --with-threads=posix --enable-mpi-threads

./configure --prefix=$HOME/local/openmpi-1.4.4-default-thread --with-sge \
    --with-threads=posix --enable-mpi-threads

No problems even with THREAD_MULTIPLE.

Only, as stated before, in singleton mode there are one or more additional
lines (it looks like one per slave host, but not always; a race condition?):

[pc15370:31390] [[24201,0],1] routed:binomial: Connection to lifeline [[24201,0],0] lost

> 
>  ompi_mpi_init: orte_init failed
>  --> Returned "Data unpack would read past end of buffer" (-26) instead of
> "Success" (0)
> --
> *** The MPI_Init_thread() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.

Interesting error message, as this is not actually disallowed:
MPI_Init_thread() may be called instead of MPI_Init().

-- Reuti


Re: [OMPI users] Problem in epoll checking in the head revision of 1.5

2012-02-09 Thread Jeff Squyres
Committed -- thanks!

On Feb 9, 2012, at 3:16 PM, Andrew Senin wrote:

> Hi, 
> 
> I think there is a problem in the latest commit to the branch 1.5. When 
> opal_setup_libevent.m4 is upgraded to autotools 1.5.5 the square brackets in 
> the test C code should be replaced too. Otherwise they'll go unchanged to the 
> configure file. And the C program which tests for epoll support will fail:
> ...


-- 
Jeff Squyres
jsquy...@cisco.com