Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work

2007-05-09 Thread Boris Bierbaum
It has been explained in a different thread on [ofa-general] that the
problem lies in a combination of two things: the OpenIB-cma provider
does not set the local and remote port numbers on endpoints correctly,
and Open MPI steps over the IA to save the port number as a workaround,
thereby confusing the provider.

I commented out line 197 in ompi/mca/btl/udapl/btl_udapl.c (Open MPI
1.2.1 release) and this fixes the problem. As the problem in the
provider is currently being fixed, the whole saving of the port number
in the uDAPL BTL code will be unnecessary in the future.

Steve Wise wrote:
>>> Can the UDAPL OFED wizards shed any light on the error messages that
>>> are listed below?  In particular, these seem to be worrisome:
>>>
>>>  setup_listener Permission denied
>>>  setup_listener Address already in use
>> These failures are from rdma_cm_bind indicating the port is already
>> bound to this IA address. How are you creating the service point?
>> dat_psp_create or dat_psp_create_any? If it is psp_create_any then you
>> will see some failures until it gets to a free port. That is normal.
>> Just make sure your create call returns DAT_SUCCESS.
>>
> 
> Arlin, why doesn't dapl_psp_create_any() just pass a port of zero down
> and let the rdma-cma pick an available port number?
> 
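Passing a port of zero and letting the stack pick a free one is the same
convention TCP sockets use. As an illustration only (plain Python TCP
sockets, not rdma_cm or uDAPL), binding to port 0 makes the OS hand back
an unused ephemeral port:

```python
import socket

# Bind to port 0: the OS assigns an unused ephemeral port, so we never
# have to probe port numbers one by one looking for a free one.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 0))
port = s.getsockname()[1]  # the port the OS actually chose
print(port)                # a nonzero ephemeral port
s.close()
```

The rdma_cm follows the same convention for a zero port in the bind
request, which is presumably why passing zero down from
dapl_psp_create_any() would avoid the bind-and-retry loop described above.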
> 
> 
> ___
> general mailing list
> gene...@lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 


-- 
Boris Bierbaum
Lehrstuhl fuer Betriebssysteme, RWTH Aachen
D-52056 Aachen
Tel: +49-241-80-27805
Fax: +49-241-80-22339



Re: [OMPI users] Newbie question. Please help.

2007-05-09 Thread Kevin Radican
Hi,

We use VASP 4.6 in parallel with Open MPI 1.1.2 without any problems on
x86_64 with openSUSE, compiled with gcc and Intel Fortran, and we use
Torque PBS.

I used a standard configure to build Open MPI, something like:

./configure --prefix=/usr/local --enable-static --with-threads
--with-tm=/usr/local --with-libnuma

I used the ACML math/LAPACK libs and built BLACS and ScaLAPACK with them
too.

I attached my VASP makefile. I might have added

mpi.o : mpi.F
	$(CPP)
	$(FC) -FR -lowercase -O0 -c $*$(SUFFIX)

to the end of the makefile; it doesn't look like it is in the example
makefiles they give, but I compiled this a while ago.

Hope this helps. 

Cheers,
Kevin 





On Tue, 2007-05-08 at 19:18 -0700, Steven Truong wrote:
> Hi, all.  I am new to Open MPI, and after the initial setup I tried to run
> my app but got the following errors:
> 
> [node07.my.com:16673] *** An error occurred in MPI_Comm_rank
> [node07.my.com:16673] *** on communicator MPI_COMM_WORLD
> [node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
> [node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node07.my.com:16674] *** An error occurred in MPI_Comm_rank
> [node07.my.com:16674] *** on communicator MPI_COMM_WORLD
> [node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
> [node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node07.my.com:16675] *** An error occurred in MPI_Comm_rank
> [node07.my.com:16675] *** on communicator MPI_COMM_WORLD
> [node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
> [node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node07.my.com:16676] *** An error occurred in MPI_Comm_rank
> [node07.my.com:16676] *** on communicator MPI_COMM_WORLD
> [node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
> [node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
> mpiexec noticed that job rank 2 with PID 16675 on node node07 exited
> on signal 60 (Real-time signal 26).
> 
>  /usr/local/openmpi-1.2.1/bin/ompi_info
> Open MPI: 1.2.1
>Open MPI SVN revision: r14481
> Open RTE: 1.2.1
>Open RTE SVN revision: r14481
> OPAL: 1.2.1
>OPAL SVN revision: r14481
>   Prefix: /usr/local/openmpi-1.2.1
>  Configured architecture: x86_64-unknown-linux-gnu
>Configured by: root
>Configured on: Mon May  7 18:32:56 PDT 2007
>   Configure host: neptune.nanostellar.com
> Built by: root
> Built on: Mon May  7 18:40:28 PDT 2007
>   Built host: neptune.my.com
>   C bindings: yes
> C++ bindings: yes
>   Fortran77 bindings: yes (all)
>   Fortran90 bindings: yes
>  Fortran90 bindings size: small
>   C compiler: gcc
>  C compiler absolute: /usr/bin/gcc
> C++ compiler: g++
>C++ compiler absolute: /usr/bin/g++
>   Fortran77 compiler: /opt/intel/fce/9.1.043/bin/ifort
>   Fortran77 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
>   Fortran90 compiler: /opt/intel/fce/9.1.043/bin/ifort
>   Fortran90 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
>  C profiling: yes
>C++ profiling: yes
>  Fortran77 profiling: yes
>  Fortran90 profiling: yes
>   C++ exceptions: no
>   Thread support: posix (mpi: no, progress: no)
>   Internal debug support: no
>  MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>  libltdl support: yes
>Heterogeneous support: yes
>  mpirun default --prefix: yes
>MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.1)
>   MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.1)
>MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.1)
>MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
>MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.1)
>MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.1)
>  MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.1)
>  MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.1)
>MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
>   MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
>MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.1)
>MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
>  MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
>  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
>  MCA bml: r2 (MCA v1.
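One common cause of MPI_ERR_COMM on MPI_COMM_WORLD at the very first MPI
call (not the only possibility) is an application built against one MPI
implementation's headers and libraries but launched with another's
mpirun. A sketch of checks under that assumption (the binary name
./vasp is hypothetical; adjust paths to your install):

```shell
# Check whether the launcher and the compiler wrapper come from the same
# MPI installation; a mismatch here is a classic source of "invalid
# communicator" errors in MPI_Comm_rank at startup.
for tool in mpirun mpif90; do
  command -v "$tool" || echo "$tool: not on PATH"
done

# Also worth checking which MPI library the application binary actually
# links against (hypothetical binary name):
#   ldd ./vasp | grep -i mpi
```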

Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work

2007-05-09 Thread Andrew Friedley
You say that fixes the problem, but does it work even when running more
than one MPI process per node? (That is the case the hack fixes.)  Simply
doing an mpirun with a -np parameter higher than the number of nodes you
have set up should trigger this case; make sure to use '-mca btl
udapl,self' (i.e., not sm or anything else).


Andrew


Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work

2007-05-09 Thread Boris Bierbaum
I've run the whole IMB benchmark suite on 2, 3, and 4 nodes with 2
processes per node and --mca btl udapl,self. I didn't encounter any problems.

The comment above line 197 says that dat_ep_query() returns wrong port
numbers (which it does indeed), but I can't find any call to
dat_ep_query() in the uDAPL BTL code. Maybe the comment is out of date?

Boris







Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work

2007-05-09 Thread Andrew Friedley
OK, strange but good.  Yeah I wouldn't be surprised if something has 
been changed, though I wouldn't know what, and I don't have time right 
now to go digging :(  Maybe Don Kerr knows something?


Andrew










Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work

2007-05-09 Thread Donald Kerr
Looking at that section, it appears that we store the port value locally
in udapl_addr and use the local copy, so changing the uDAPL attribute
may not be doing anything for the BTL.  I will run some tests as well.


-DON




Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work

2007-05-09 Thread Boris Bierbaum
I thought about it again: there's probably no call to dat_ep_query()
*because* it returns wrong port numbers, and the port numbers saved by
the uDAPL BTL code itself are used instead.

I'll leave the debugging to those who know the code ... ;-)

Boris







Re: [OMPI users] Newbie question. Please help.

2007-05-09 Thread Steven Truong

Thanks to Kevin and Brook for replying to my question.  I am going to try
out what Kevin suggested.

Steven.


Re: [OMPI users] Newbie question. Please help.

2007-05-09 Thread Steven Truong

Hi, Kevin and all.  I tried with the following:

./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6
--with-tm=/usr/local/pbs  --enable-mpirun-prefix-by-default
--enable-mpi-f90 --with-threads=posix  --enable-static

and added the mpi.o rule to my VASP makefile, but I still got errors.

I forgot to mention that our environment has Intel MKL 9.0 or 8.1, and
my machines are dual-processor, dual-core Xeon 5130s.

Well, I am going to try ACML too.

Attached is my makefile for VASP; I am not sure if I missed anything again.

Thank you very much for all your help.


Re: [OMPI users] Newbie question. Please help.

2007-05-09 Thread Steven Truong

Oh, no.  I tried with ACML and had the same set of errors.

Steven.


Re: [OMPI users] Newbie question. Please help.

2007-05-09 Thread Jeff Squyres

Can you send a simple test that reproduces these errors?

I.e., if there's a single, simple package that you can send  
instructions on how to build, it would be most helpful if we could  
reproduce the error (and therefore figure out how to fix it).
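A minimal reproducer along these lines would do. The sketch below (install paths are illustrative, matching the prefix shown in the ompi_info output) builds and runs a tiny program that exercises the same MPI_Comm_rank call that fails in VASP, using Open MPI's own wrapper compiler. If this runs cleanly, the Open MPI installation itself is fine, and the invalid-communicator error most likely comes from how the application was compiled.

```shell
# Minimal sanity check: does MPI_Comm_rank work outside the application?
# (Paths are illustrative; adjust to your install prefix.)
cat > hello_rank.c <<'EOF'
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* the call that fails in VASP */
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
/usr/local/openmpi-1.2.1/bin/mpicc hello_rank.c -o hello_rank
/usr/local/openmpi-1.2.1/bin/mpirun -np 4 ./hello_rank
```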


Thanks!


On May 9, 2007, at 2:19 PM, Steven Truong wrote:


Oh, no.  I tried with ACML and had the same set of errors.

Steven.

On 5/9/07, Steven Truong  wrote:

Hi, Kevin and all.  I tried with the following:

./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6
--with-tm=/usr/local/pbs  --enable-mpirun-prefix-by-default
--enable-mpi-f90 --with-threads=posix  --enable-static

and added the mpi.o rule in my VASP makefile, but I still got errors.

I forgot to mention that our environment has Intel MKL 9.0 or 8.1 and
my machines are dual-processor, dual-core Xeon 5130s.

 Well, I am going to try acml too.

Attached is my makefile for VASP and I am not sure if I missed  
anything again.


Thank you very much for all your help.

On 5/9/07, Steven Truong  wrote:
Thanks to Kevin and Brook for replying to my question.  I am going to
try

out what Kevin suggested.

Steven.

On 5/9/07, Kevin Radican  wrote:

Hi,

We use VASP 4.6 in parallel with openmpi 1.1.2 without any
problems on
x86_64 with opensuse and compiled with gcc and Intel fortran and  
use

torque PBS.

I used standard configure to build openmpi something like

./configure --prefix=/usr/local --enable-static --with-threads
--with-tm=/usr/local --with-libnuma

I used the ACML math LAPACK libs and built Blacs and Scalapack
with them

too.

I attached my vasp makefile, I might have added

mpi.o : mpi.F
$(CPP)
$(FC) -FR -lowercase -O0 -c $*$(SUFFIX)

to the end of the makefile. It doesn't look like it is in the
example

makefiles they give, but I compiled this a while ago.

Hope this helps.

Cheers,
Kevin





On Tue, 2007-05-08 at 19:18 -0700, Steven Truong wrote:
Hi, all.  I am new to OpenMPI and after initial setup I tried  
to run

my app but got the following errors:

[node07.my.com:16673] *** An error occurred in MPI_Comm_rank
[node07.my.com:16673] *** on communicator MPI_COMM_WORLD
[node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16674] *** An error occurred in MPI_Comm_rank
[node07.my.com:16674] *** on communicator MPI_COMM_WORLD
[node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16675] *** An error occurred in MPI_Comm_rank
[node07.my.com:16675] *** on communicator MPI_COMM_WORLD
[node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16676] *** An error occurred in MPI_Comm_rank
[node07.my.com:16676] *** on communicator MPI_COMM_WORLD
[node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpiexec noticed that job rank 2 with PID 16675 on node node07  
exited

on signal 60 (Real-time signal 26).

 /usr/local/openmpi-1.2.1/bin/ompi_info
Open MPI: 1.2.1
   Open MPI SVN revision: r14481
Open RTE: 1.2.1
   Open RTE SVN revision: r14481
OPAL: 1.2.1
   OPAL SVN revision: r14481
  Prefix: /usr/local/openmpi-1.2.1
 Configured architecture: x86_64-unknown-linux-gnu
   Configured by: root
   Configured on: Mon May  7 18:32:56 PDT 2007
  Configure host: neptune.nanostellar.com
Built by: root
Built on: Mon May  7 18:40:28 PDT 2007
  Built host: neptune.my.com
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: gcc
 C compiler absolute: /usr/bin/gcc
C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: /opt/intel/fce/9.1.043/bin/ifort
  Fortran77 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
  Fortran90 compiler: /opt/intel/fce/9.1.043/bin/ifort
  Fortran90 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: yes
   MCA backtrace: execinfo (MCA v1.0, API v1.0,  
Component v1.2.1)
  MCA memory: ptmalloc2 (MCA v1.0, API v1.0,  
Component v1.2.1)
   MCA paffinity: linux (MCA v1.0, API v1.0, Component  
v1.2.1)
   MCA maffinity: first_use (MCA v1.0, API v1.0,  
Component v1.2.1)
   MCA maffinity: libnuma (MCA v1.0, API v1.0,  
Component v1

Re: [OMPI users] Newbie question. Please help.

2007-05-09 Thread Steven Truong

Hi, Jeff.  Thank you very much for looking into this issue.  I am
afraid that I cannot give you the application/package because it is
commercial software.  I believe that a lot of people are using this
VASP software package: http://cms.mpi.univie.ac.at/vasp/.

My current environment uses MPICH 1.2.7p1; however, a new set of
dual-core machines has posed a new set of challenges, so I am
looking into replacing MPICH with Open MPI on these machines.
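One thing worth ruling out while both stacks are installed: a classic cause of MPI_ERR_COMM/invalid-communicator failures at the very first MPI_Comm_rank call is compiling against one implementation's headers (e.g., MPICH's mpif.h) but launching with the other's mpirun. A quick consistency check, as a sketch (binary name and install paths are illustrative):

```shell
# Do the build wrappers and the launcher come from the same prefix?
which mpif90 mpirun

# Does the VASP binary link Open MPI's libmpi rather than MPICH's?
# (MPICH 1.2.x is often linked statically, so the absence of any
# libmpi here is itself a hint that MPICH headers/libs were used.)
ldd ./vasp | grep -i mpi

# If in doubt, rebuild VASP using Open MPI's wrapper compiler, e.g.
# in the makefile:  FC = /usr/local/openmpi-1.2.1/bin/mpif90
```

If the wrappers and launcher resolve to different prefixes, fixing PATH (or using absolute paths as in the ompi_info output above) is usually enough.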

Could Mr. Radican, who wrote that he was able to run VASP with
Open MPI, provide a lot more detail regarding how he configured Open MPI,
how he compiled and ran VASP jobs, and anything else relating to this issue?

Thank you very much for all your help.
Steven.

On 5/9/07, Jeff Squyres  wrote:

Can you send a simple test that reproduces these errors?

I.e., if there's a single, simple package that you can send
instructions on how to build, it would be most helpful if we could
reproduce the error (and therefore figure out how to fix it).

Thanks!


On May 9, 2007, at 2:19 PM, Steven Truong wrote:

> Oh, no.  I tried with ACML and had the same set of errors.
>
> Steven.
>
> On 5/9/07, Steven Truong  wrote:
>> Hi, Kevin and all.  I tried with the following:
>>
>> ./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6
>> --with-tm=/usr/local/pbs  --enable-mpirun-prefix-by-default
>> --enable-mpi-f90 --with-threads=posix  --enable-static
>>
>> and added the mpi.o rule in my VASP makefile, but I still got errors.
>>
>> I forgot to mention that our environment has Intel MKL 9.0 or 8.1 and
>> my machines are dual-processor, dual-core Xeon 5130s.
>>
>>  Well, I am going to try acml too.
>>
>> Attached is my makefile for VASP and I am not sure if I missed
>> anything again.
>>
>> Thank you very much for all your help.
>>
>> On 5/9/07, Steven Truong  wrote:
>>> Thanks to Kevin and Brook for replying to my question.  I am going to
>>> try
>>> out what Kevin suggested.
>>>
>>> Steven.
>>>
>>> On 5/9/07, Kevin Radican  wrote:
 Hi,

 We use VASP 4.6 in parallel with openmpi 1.1.2 without any
 problems on
 x86_64 with opensuse and compiled with gcc and Intel fortran and
 use
 torque PBS.

 I used standard configure to build openmpi something like

 ./configure --prefix=/usr/local --enable-static --with-threads
 --with-tm=/usr/local --with-libnuma

 I used the ACML math LAPACK libs and built Blacs and Scalapack
 with them
 too.

 I attached my vasp makefile, I might have added

 mpi.o : mpi.F
 $(CPP)
 $(FC) -FR -lowercase -O0 -c $*$(SUFFIX)

 to the end of the makefile. It doesn't look like it is in the
 example
 makefiles they give, but I compiled this a while ago.

 Hope this helps.

 Cheers,
 Kevin





 On Tue, 2007-05-08 at 19:18 -0700, Steven Truong wrote:
> Hi, all.  I am new to OpenMPI and after initial setup I tried
> to run
> my app but got the following errors:
>
> [node07.my.com:16673] *** An error occurred in MPI_Comm_rank
> [node07.my.com:16673] *** on communicator MPI_COMM_WORLD
> [node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
> [node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node07.my.com:16674] *** An error occurred in MPI_Comm_rank
> [node07.my.com:16674] *** on communicator MPI_COMM_WORLD
> [node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
> [node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node07.my.com:16675] *** An error occurred in MPI_Comm_rank
> [node07.my.com:16675] *** on communicator MPI_COMM_WORLD
> [node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
> [node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node07.my.com:16676] *** An error occurred in MPI_Comm_rank
> [node07.my.com:16676] *** on communicator MPI_COMM_WORLD
> [node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
> [node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
> mpiexec noticed that job rank 2 with PID 16675 on node node07
> exited
> on signal 60 (Real-time signal 26).
>
>  /usr/local/openmpi-1.2.1/bin/ompi_info
> Open MPI: 1.2.1
>Open MPI SVN revision: r14481
> Open RTE: 1.2.1
>Open RTE SVN revision: r14481
> OPAL: 1.2.1
>OPAL SVN revision: r14481
>   Prefix: /usr/local/openmpi-1.2.1
>  Configured architecture: x86_64-unknown-linux-gnu
>Configured by: root
>Configured on: Mon May  7 18:32:56 PDT 2007
>   Configure host: neptune.nanostellar.com
> Built by: root
> Built on: Mon May  7 18:40:28 PDT 2007
>   Built host: neptune.my.com
>   C bindings: yes
> C++ bindings: yes
>   Fortran77 bindings: yes (all)
>>>

Re: [OMPI users] Newbie question. Please help.

2007-05-09 Thread Jeff Squyres
I have mailed the VASP maintainer asking for a copy of the code.   
Let's see what happens.


On May 9, 2007, at 2:44 PM, Steven Truong wrote:


Hi, Jeff.  Thank you very much for looking into this issue.  I am
afraid that I cannot give you the application/package because it is
commercial software.  I believe that a lot of people are using this
VASP software package: http://cms.mpi.univie.ac.at/vasp/.

My current environment uses MPICH 1.2.7p1; however, a new set of
dual-core machines has posed a new set of challenges, so I am
looking into replacing MPICH with Open MPI on these machines.

Could Mr. Radican, who wrote that he was able to run VASP with
Open MPI, provide a lot more detail regarding how he configured Open MPI,
how he compiled and ran VASP jobs, and anything else relating to this issue?

Thank you very much for all your help.
Steven.

On 5/9/07, Jeff Squyres  wrote:

Can you send a simple test that reproduces these errors?

I.e., if there's a single, simple package that you can send
instructions on how to build, it would be most helpful if we could
reproduce the error (and therefore figure out how to fix it).

Thanks!


On May 9, 2007, at 2:19 PM, Steven Truong wrote:


Oh, no.  I tried with ACML and had the same set of errors.

Steven.

On 5/9/07, Steven Truong  wrote:

Hi, Kevin and all.  I tried with the following:

./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6
--with-tm=/usr/local/pbs  --enable-mpirun-prefix-by-default
--enable-mpi-f90 --with-threads=posix  --enable-static

and added the mpi.o rule in my VASP makefile, but I still got errors.

I forgot to mention that our environment has Intel MKL 9.0 or  
8.1 and

my machines are dual-processor, dual-core Xeon 5130s.

 Well, I am going to try acml too.

Attached is my makefile for VASP and I am not sure if I missed
anything again.

Thank you very much for all your help.

On 5/9/07, Steven Truong  wrote:

Thanks to Kevin and Brook for replying to my question.  I am going to
try
out what Kevin suggested.

Steven.

On 5/9/07, Kevin Radican  wrote:

Hi,

We use VASP 4.6 in parallel with openmpi 1.1.2 without any
problems on
x86_64 with opensuse and compiled with gcc and Intel fortran and
use
torque PBS.

I used standard configure to build openmpi something like

./configure --prefix=/usr/local --enable-static --with-threads
--with-tm=/usr/local --with-libnuma

I used the ACML math LAPACK libs and built Blacs and Scalapack
with them
too.

I attached my vasp makefile, I might have added

mpi.o : mpi.F
$(CPP)
$(FC) -FR -lowercase -O0 -c $*$(SUFFIX)

to the end of the makefile. It doesn't look like it is in the
example
makefiles they give, but I compiled this a while ago.

Hope this helps.

Cheers,
Kevin





On Tue, 2007-05-08 at 19:18 -0700, Steven Truong wrote:

Hi, all.  I am new to OpenMPI and after initial setup I tried
to run
my app but got the following errors:

[node07.my.com:16673] *** An error occurred in MPI_Comm_rank
[node07.my.com:16673] *** on communicator MPI_COMM_WORLD
[node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16674] *** An error occurred in MPI_Comm_rank
[node07.my.com:16674] *** on communicator MPI_COMM_WORLD
[node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16675] *** An error occurred in MPI_Comm_rank
[node07.my.com:16675] *** on communicator MPI_COMM_WORLD
[node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16676] *** An error occurred in MPI_Comm_rank
[node07.my.com:16676] *** on communicator MPI_COMM_WORLD
[node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpiexec noticed that job rank 2 with PID 16675 on node node07
exited
on signal 60 (Real-time signal 26).

 /usr/local/openmpi-1.2.1/bin/ompi_info
Open MPI: 1.2.1
   Open MPI SVN revision: r14481
Open RTE: 1.2.1
   Open RTE SVN revision: r14481
OPAL: 1.2.1
   OPAL SVN revision: r14481
  Prefix: /usr/local/openmpi-1.2.1
 Configured architecture: x86_64-unknown-linux-gnu
   Configured by: root
   Configured on: Mon May  7 18:32:56 PDT 2007
  Configure host: neptune.nanostellar.com
Built by: root
Built on: Mon May  7 18:40:28 PDT 2007
  Built host: neptune.my.com
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: gcc
 C compiler absolute: /usr/bin/gcc
C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: /opt/intel/fce/9.1.043/bin/ifort
  Fortran77 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
  Fortran90 compiler: /opt/intel/f

[OMPI users] CFP - HPCC 2007, Houston, TX - Extended Deadline: May 21, 2007

2007-05-09 Thread Laksono Adhianto

Due to many requests, the deadline has been extended to May 21, 2007.
Sorry for cross-posting.


   CALL FOR PAPERS

   HPCC-07
  The 2007 International Conference on High Performance
   Computing and Communications

   September 26 - 28, Houston, Texas, USA,
 http://www.tlc2.uh.edu/hpcc07/


With the rapid growth in computing and communications technology, the past
decade has witnessed a proliferation of powerful parallel and distributed
systems and an ever-increasing demand for the practice of high performance
computing and communications (HPCC). HPCC has moved into the mainstream of
computing and has become a key technology in determining future research and
development activities in many academic and industrial branches, especially
when the solution of large and complex problems must cope with very tight
timing schedules. The HPCC-07 conference provides a forum for engineers and
scientists in academia, industry, and government to address the resulting
profound challenges and to present and discuss their new ideas, research
results, applications and experience on all aspects of high performance
computing and communications. With this third event in the conference
series, HPCC-07 plans to continue the success of the previous two HPCC
conferences, held in 2005 in Sorrento, Italy, and in 2006 in Munich,
Germany (HPCC-06), each with over 300 submitted papers and more than 100
participants.

TOPICS OF INTEREST

  * Networking protocols, routing, algorithms
  * Languages and compilers for HPC
  * Parallel and distributed system architectures
  * Parallel and distributed algorithms
  * Embedded systems
  * Wireless, mobile and pervasive computing
  * Web services and Internet computing
  * Peer-to-peer computing
  * Grid computing
  * Cluster computing
  * Reliability, fault-tolerance, and security
  * Performance evaluation and measurement
  * Tools and environments for software development
  * Distributed systems and applications
  * High-performance scientific and engineering computing
  * Database applications and data mining
  * Biological/molecular computing
  * Collaborative and cooperative environments

IMPORTANT DATES

Paper submission: April 30, 2007 --> May 21, 2007
Acceptance notification:  June 15, 2007
Camera ready version: June 29, 2007
Special session proposal: April 30, 2007
Conference:   September 26-28, 2007

STEERING CHAIRS

* Beniamino Di Martino, Seconda Universita' di Napoli, Italy
* Laurence T. Yang, St. Francis Xavier University, Canada

ORGANIZATION

* Barbara Chapman, University of Houston (Co-Chair)
* Jaspal Subhlok, University of Houston (Co-Chair)
* Ronald Perrott, Queen's University of Belfast (Program Chair)
* Rosalinda Mendez, University of Houston (Local Chair)

PUBLICITY CHAIRS

* Hai Jiang, Arkansas State University, USA
* Weisong Shi, Wayne State University, USA
* Haiying Shen, University of Arkansas, USA

PUBLICATIONS

Accepted papers are published in the proceedings of the
HPCC-07 conference by Springer's Lecture Notes in Computer Science (LNCS).
Selected best papers will be considered for special issues of
Scientific Programming and the International Journal of High Performance
Computing and Networking (IJHPCN).

CONTACT ADDRESS
hpc...@tlc2.uh.edu or hpc...@googlegroups.com




Re: [OMPI users] Newbie question. Please help.

2007-05-09 Thread Steven Truong

Thank you very much, Jeff, for your efforts and help.

On 5/9/07, Jeff Squyres  wrote:

I have mailed the VASP maintainer asking for a copy of the code.
Let's see what happens.

On May 9, 2007, at 2:44 PM, Steven Truong wrote:

> Hi, Jeff.  Thank you very much for looking into this issue.  I am
> afraid that I cannot give you the application/package because it is
> commercial software.  I believe that a lot of people are using this
> VASP software package: http://cms.mpi.univie.ac.at/vasp/.
>
> My current environment uses MPICH 1.2.7p1; however, a new set of
> dual-core machines has posed a new set of challenges, so I am
> looking into replacing MPICH with Open MPI on these machines.
>
> Could Mr. Radican, who wrote that he was able to run VASP with
> Open MPI, provide a lot more detail regarding how he configured Open MPI,
> how he compiled and ran VASP jobs, and anything else relating to this issue?
>
> Thank you very much for all your help.
> Steven.
>
> On 5/9/07, Jeff Squyres  wrote:
>> Can you send a simple test that reproduces these errors?
>>
>> I.e., if there's a single, simple package that you can send
>> instructions on how to build, it would be most helpful if we could
>> reproduce the error (and therefore figure out how to fix it).
>>
>> Thanks!
>>
>>
>> On May 9, 2007, at 2:19 PM, Steven Truong wrote:
>>
>>> Oh, no.  I tried with ACML and had the same set of errors.
>>>
>>> Steven.
>>>
>>> On 5/9/07, Steven Truong  wrote:
 Hi, Kevin and all.  I tried with the following:

 ./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6
 --with-tm=/usr/local/pbs  --enable-mpirun-prefix-by-default
 --enable-mpi-f90 --with-threads=posix  --enable-static

 and added the mpi.o rule in my VASP makefile, but I still got errors.

 I forgot to mention that our environment has Intel MKL 9.0 or
 8.1 and
 my machines are dual-processor, dual-core Xeon 5130s.

  Well, I am going to try acml too.

 Attached is my makefile for VASP and I am not sure if I missed
 anything again.

 Thank you very much for all your help.

 On 5/9/07, Steven Truong  wrote:
> Thanks to Kevin and Brook for replying to my question.  I am going to
> try
> out what Kevin suggested.
>
> Steven.
>
> On 5/9/07, Kevin Radican  wrote:
>> Hi,
>>
>> We use VASP 4.6 in parallel with openmpi 1.1.2 without any
>> problems on
>> x86_64 with opensuse and compiled with gcc and Intel fortran and
>> use
>> torque PBS.
>>
>> I used standard configure to build openmpi something like
>>
>> ./configure --prefix=/usr/local --enable-static --with-threads
>> --with-tm=/usr/local --with-libnuma
>>
>> I used the ACML math LAPACK libs and built Blacs and Scalapack
>> with them
>> too.
>>
>> I attached my vasp makefile, I might have added
>>
>> mpi.o : mpi.F
>> $(CPP)
>> $(FC) -FR -lowercase -O0 -c $*$(SUFFIX)
>>
>> to the end of the makefile. It doesn't look like it is in the
>> example
>> makefiles they give, but I compiled this a while ago.
>>
>> Hope this helps.
>>
>> Cheers,
>> Kevin
>>
>>
>>
>>
>>
>> On Tue, 2007-05-08 at 19:18 -0700, Steven Truong wrote:
>>> Hi, all.  I am new to OpenMPI and after initial setup I tried
>>> to run
>>> my app but got the following errors:
>>>
>>> [node07.my.com:16673] *** An error occurred in MPI_Comm_rank
>>> [node07.my.com:16673] *** on communicator MPI_COMM_WORLD
>>> [node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
>>> [node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> [node07.my.com:16674] *** An error occurred in MPI_Comm_rank
>>> [node07.my.com:16674] *** on communicator MPI_COMM_WORLD
>>> [node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
>>> [node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> [node07.my.com:16675] *** An error occurred in MPI_Comm_rank
>>> [node07.my.com:16675] *** on communicator MPI_COMM_WORLD
>>> [node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
>>> [node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> [node07.my.com:16676] *** An error occurred in MPI_Comm_rank
>>> [node07.my.com:16676] *** on communicator MPI_COMM_WORLD
>>> [node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
>>> [node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>> mpiexec noticed that job rank 2 with PID 16675 on node node07
>>> exited
>>> on signal 60 (Real-time signal 26).
>>>
>>>  /usr/local/openmpi-1.2.1/bin/ompi_info
>>> Open MPI: 1.2.1
>>>Open MPI SVN revision: r14481
>>> Open RTE: 1.2.1
>>>Open RTE SVN revision: r14481
>>> OPAL: 1.2.1
>>>OPAL SVN revision: r14481
>>>   Prefix: /usr/local/op

[OMPI users] OMPI and OSU bandwidth benchmark

2007-05-09 Thread Mike Tsai

Greetings everyone,

My name is Mike, and I have recently downloaded OMPI v1.2.1 and decided
to run the OSU bandwidth benchmark. However, I have noticed a few weird
things during my runs.

Btw, I am using FreeBSD 6.2.

The OSU bandwidth test basically pre-posts many ISends and IRecvs. It tries to
measure the maximum sustainable bandwidth.

Here is an output (I didn't finish running, but it should be sufficient to
show the problem that I am seeing):

Quick system info:
Two-node test (Intel P4 Xeon 3.2 GHz, Hyper-Threading disabled,
1024 MB RAM).
Three 1-gigabit NICs, all Intel PRO/1000 (em0 and em2 are the private
interfaces (10.1.x.x), while em1 is the public interface).

--

[myct@netbed21 ~/mpich/osu_benchmarks]$ mpirun --mca btl_tcp_if_include em0
--hostfile ~/mpd.hosts.private --mca btl tcp,self --mca btl_tcp_sndbuf
233016 --mca btl_tcp_rcvbuf 233016  -np 2 ./osu_bw
# OSU MPI Bandwidth Test (Version 2.3)
# Size  Bandwidth (MB/s)
1   0.12
2   0.26
4   0.53
8   1.06
16  2.12
32  4.22
64  8.26
128 14.61
256 28.06
512 51.27
1024    82.59
2048    102.21
4096    110.53
8192    114.58
16384   118.16
32768   120.71
65536   33.23
131072  41.75
262144  70.42
524288  82.96
^Cmpirun: killing job...

--

The rendezvous threshold is set to 64 KB by default.

It seems that when the rendezvous protocol kicks in, the performance drops
tremendously.
Btw, this is an out-of-the-box run; I have not tweaked anything except changing
the socket buffer sizes at runtime.
Is there something obvious that I am not doing correctly?
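Since the cliff in the table above coincides with the 64 KB rendezvous threshold, one hedged way to test that hypothesis is to move the TCP BTL's eager/rendezvous limits and see whether the cliff moves with them. A sketch (parameter names and defaults can vary by release, and the values below are illustrative, not recommendations):

```shell
# List the TCP BTL's tunable MCA parameters for this release:
ompi_info --param btl tcp

# Illustrative run raising the eager limit so 64-128 KB messages
# stay on the eager path; if the bandwidth cliff moves with the
# threshold, the rendezvous protocol is the likely culprit:
mpirun --mca btl tcp,self --mca btl_tcp_if_include em0 \
       --mca btl_tcp_eager_limit 262144 \
       --mca btl_tcp_max_send_size 262144 \
       -np 2 ./osu_bw
```

If the drop persists regardless of the threshold, the socket buffer sizes or the FreeBSD TCP stack settings would be the next place to look.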

I have also attached the "ompi_info" output.

Thanks for everything,

Mike


ompi_out
Description: Binary data