Re: [OMPI users] init of component openib returned failure

2010-05-19 Thread Peter Kruse

Hello,

thanks for your reply.

Jeff Squyres wrote:

> Try running with:
> 
> mpirun.openmpi-1.4.1 --mca btl_base_verbose 50  --mca btl self,openib -n 2 
> --mca btl_openib_verbose 100 ./IMB-MPI1 -npmin 2 PingPong


the output is exactly the same as before.



> Also, are you saying that running the same command line with osu_latency works 
> just fine?  That would be really weird...


Yes, if I run:

mpirun.openmpi-1.4.1 --mca btl_base_verbose 50 --mca btl self,openib -n 2 
--mca btl_openib_verbose 100 ./osu_lat_ompi-1.4.1


the openib component can be initialized:

8<--

[beo-15:29479] mca: base: components_open: Looking for btl components
[beo-16:29063] mca: base: components_open: Looking for btl components
[beo-15:29479] mca: base: components_open: opening btl components
[beo-15:29479] mca: base: components_open: found loaded component openib
[beo-15:29479] mca: base: components_open: component openib has no register 
function
[beo-15:29479] mca: base: components_open: component openib open function 
successful
[beo-15:29479] mca: base: components_open: found loaded component self
[beo-15:29479] mca: base: components_open: component self has no register 
function
[beo-15:29479] mca: base: components_open: component self open function 
successful
[beo-16:29063] mca: base: components_open: opening btl components
[beo-16:29063] mca: base: components_open: found loaded component openib
[beo-16:29063] mca: base: components_open: component openib has no register 
function
[beo-16:29063] mca: base: components_open: component openib open function 
successful
[beo-16:29063] mca: base: components_open: found loaded component self
[beo-16:29063] mca: base: components_open: component self has no register 
function
[beo-16:29063] mca: base: components_open: component self open function 
successful
[beo-15:29479] select: initializing btl component openib
[beo-16:29063] select: initializing btl component openib
[beo-15][[12785,1],0][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x02c9, part ID 25204
[beo-15][[12785,1],0][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: Mellanox Sinai Infinihost III
[beo-15][[12785,1],0][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x, part ID 0
[beo-15][[12785,1],0][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: default
[beo-15:29479] openib BTL: oob CPC available for use on mthca0:1
[beo-15:29479] openib BTL: xoob CPC only supported with XRC receive queues; 
skipped on mthca0:1
[beo-15:29479] openib BTL: rdmacm CPC available for use on mthca0:1
[beo-15:29479] select: init of component openib returned success
[beo-15:29479] select: initializing btl component self
[beo-15:29479] select: init of component self returned success
[beo-16][[12785,1],1][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x02c9, part ID 25204
[beo-16][[12785,1],1][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: Mellanox Sinai Infinihost III
[beo-16][[12785,1],1][btl_openib_ini.c:166:ompi_btl_openib_ini_query] Querying 
INI files for vendor 0x, part ID 0
[beo-16][[12785,1],1][btl_openib_ini.c:185:ompi_btl_openib_ini_query] Found 
corresponding INI values: default
[beo-16:29063] openib BTL: oob CPC available for use on mthca0:1
[beo-16:29063] openib BTL: xoob CPC only supported with XRC receive queues; 
skipped on mthca0:1
[beo-16:29063] openib BTL: rdmacm CPC available for use on mthca0:1
[beo-16:29063] select: init of component openib returned success
[beo-16:29063] select: initializing btl component self
[beo-16:29063] select: init of component self returned success
# OSU MPI Latency Test (Version 2.2)
# Size  Latency (us)
[beo-16][[12785,1],1][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set 
MTU to IBV value 4 (2048 bytes)
[beo-16][[12785,1],1][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set 
MTU to IBV value 4 (2048 bytes)
[beo-16][[12785,1],1][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set 
MTU to IBV value 4 (2048 bytes)
[beo-16][[12785,1],1][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set 
MTU to IBV value 4 (2048 bytes)
[beo-15][[12785,1],0][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set 
MTU to IBV value 4 (2048 bytes)
[beo-15][[12785,1],0][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set 
MTU to IBV value 4 (2048 bytes)
[beo-15][[12785,1],0][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set 
MTU to IBV value 4 (2048 bytes)
[beo-15][[12785,1],0][connect/btl_openib_connect_oob.c:313:qp_connect_all] Set 
MTU to IBV value 4 (2048 bytes)

0   3.57
1   3.65
2   3.63
4   3.64
8   3.68
16  3.72
32  3.77
64  3.95
128 4.95
256 5.36
512 6.03
1024  

Re: [OMPI users] init of component openib returned failure

2010-05-19 Thread Jeff Squyres (jsquyres)
Ok, we've entered the Land of Really Weird - I've never seen a btl work with 
one mpi app and not another.

Some q's:

- are you running both apps on the same nodes?
- is anything else running on the nodes at the same time (e.g., other mpi jobs 
using openfabrics)?
- is the imb compiled for ompi 1.4.1?
- can you run ldd on the apps to ensure they're linking to the same libmpi?

-jms
Sent from my PDA.  No type good.


Re: [OMPI users] default hostfile (Ubuntu-9.10)

2010-05-19 Thread Stefan Kuhne
On 18.05.2010 15:46, Ralph Castain wrote:

Hello,

> Starting in the 1.3 series, you have to tell OMPI where to find the
> default hostfile. So put this in your default MCA param file:
> 
> orte_default_hostfile=
> 
> That should fix it.
> 
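For reference, the quoted advice amounts to a one-line MCA parameter file. A minimal sketch (the hostfile path below is an example, since the original value was elided from the archive):

```
# $HOME/.openmpi/mca-params.conf  (per-user MCA parameter file)
orte_default_hostfile = /path/to/default-hostfile
```

With this in place, mpirun picks up the default hostfile without any extra command-line options.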
Yes, that fixed it.

Thanks,
Stefan Kuhne





Re: [OMPI users] init of component openib returned failure

2010-05-19 Thread Peter Kruse

Hi,

Jeff Squyres (jsquyres) wrote:

> Ok, we've entered the Land of Really Weird - I've never seen a btl work with 
> one mpi app and not another.
> 
> Some q's:
> 
> - are you running both apps on the same nodes?


Yes, in fact I'm running them in the same interactive job.


> - is anything else running on the nodes at the same time (e.g., other mpi jobs 
> using openfabrics)?


No, the nodes are reserved for this testing at the moment.


> - is the imb compiled for ompi 1.4.1?


Yes, it is.


> - can you run ldd on the apps to ensure they're linking to the same libmpi?


8<--

$ ldd IMB-MPI1
linux-vdso.so.1 =>  (0x7fff077ff000)
libmpi.so.0 => /usr/lib/openmpi/1.4.1/gcc/lib/libmpi.so.0 
(0x2b9120a3a000)
libopen-rte.so.0 => /usr/lib/openmpi/1.4.1/gcc/lib/libopen-rte.so.0 
(0x2b9120cf4000)
libopen-pal.so.0 => /usr/lib/openmpi/1.4.1/gcc/lib/libopen-pal.so.0 
(0x2b9120f43000)

libdl.so.2 => /lib/libdl.so.2 (0x2b91211c6000)
libnsl.so.1 => /lib/libnsl.so.1 (0x2b91213ca000)
libutil.so.1 => /lib/libutil.so.1 (0x2b91215e2000)
libm.so.6 => /lib/libm.so.6 (0x2b91217e6000)
libpthread.so.0 => /lib/libpthread.so.0 (0x2b9121a69000)
libc.so.6 => /lib/libc.so.6 (0x2b9121c85000)
/lib64/ld-linux-x86-64.so.2 (0x2b912081d000)
$ cd ../../osu_benchmarks/
$ ldd osu_lat_ompi-1.4.1
linux-vdso.so.1 =>  (0x765ff000)
libmpi.so.0 => /usr/lib/openmpi/1.4.1/gcc/lib/libmpi.so.0 
(0x2b4f69ec8000)
libopen-rte.so.0 => /usr/lib/openmpi/1.4.1/gcc/lib/libopen-rte.so.0 
(0x2b4f6a182000)
libopen-pal.so.0 => /usr/lib/openmpi/1.4.1/gcc/lib/libopen-pal.so.0 
(0x2b4f6a3d1000)

libdl.so.2 => /lib/libdl.so.2 (0x2b4f6a654000)
libnsl.so.1 => /lib/libnsl.so.1 (0x2b4f6a858000)
libutil.so.1 => /lib/libutil.so.1 (0x2b4f6aa7)
libm.so.6 => /lib/libm.so.6 (0x2b4f6ac74000)
libpthread.so.0 => /lib/libpthread.so.0 (0x2b4f6aef7000)
libc.so.6 => /lib/libc.so.6 (0x2b4f6b113000)
/lib64/ld-linux-x86-64.so.2 (0x2b4f69cab000)

8<--
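To compare just the resolved libmpi paths of the two binaries without eyeballing the full ldd listings, a one-liner like the following can help. This is a sketch assuming only POSIX sh and awk; the sample line is copied from the output above, and in practice you would pipe `ldd ./IMB-MPI1` and `ldd ./osu_lat_ompi-1.4.1` through the same filter and diff the results.

```shell
# Pull the resolved path of libmpi.so.0 out of ldd-style output.
sample='libmpi.so.0 => /usr/lib/openmpi/1.4.1/gcc/lib/libmpi.so.0 (0x2b9120a3a000)'
printf '%s\n' "$sample" | awk '$1 == "libmpi.so.0" { print $3 }'
# -> /usr/lib/openmpi/1.4.1/gcc/lib/libmpi.so.0
```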





thanks for going through this trouble to reply!

Peter




[OMPI users] OpenMPI + IB makes problem (btl_openib_component.c)

2010-05-19 Thread Dr. Vincent Keller
Dear list,

One of our users faces problems running his application (large CP2K cases)

Cluster:
OpenMPI 1.4.2, SLES 9, gcc 4.1.2, OFED 1.4 on Intel Nehalem (5350)

The message is:

[[45776,1],214][btl_openib_component.c:2951:handle_wc] from node140 to:
node400 error polling LP CQ with status LOCAL QP OPERATION ERROR status
number 2 for wr_id 250502144 opcode 1  vendor error 103 qp_idx 0

OpenMPI has been compiled using the following flags:

./configure --prefix=/som/prefix/dir --enable-branch-probabilities
--enable-mem-debug --enable-mem-profile --enable-picky --enable-peruse
--enable-per-user-config-files --enable-cxx-exceptions
--enable-mpi-threads --enable-openib-ibcm --enable-openib-rdmacm --with-sge

Any idea why, and/or whether something is wrong in the configuration? Any fix?

Thanks in advance

Best regards
Vince

-- 
---
Dr. Vincent KELLER

Universität Zürich
   http://www.hpcn.uzh.ch
ADDRESS:   Winterthurstrasse 190
   CH - 8057 Zürich
   Switzerland
PHONE  :   + 41 (0) 44/635'40'37
FAX:   + 41 (0) 44/635'45'05


Re: [OMPI users] init of component openib returned failure

2010-05-19 Thread Jeff Squyres
On May 19, 2010, at 7:18 AM, Peter Kruse wrote:

> > - are you running both apps on the same nodes?
> 
> Yes, in fact I'm running them in the same interactive job.

This is truly freaky; I've seen the same thing on Peter's computer.

We're working the issue off-list; we'll post the solution when we figure it out.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] openmpi + share points

2010-05-19 Thread Christophe Peyret
Hello,

Thanks for the advice, it works with NFS!

But:

1) it no longer works if I remove --prefix /Network/opt/openmpi-1.4.2 (is 
there a way on OS X to declare it once so it doesn't have to be passed?)

2) I must use the -static-intel option at link time, otherwise I get an error 
that libiomp5.dylib is not found

Christophe


Message: 3
Date: Sat, 15 May 2010 08:14:42 -0400
From: Jeff Squyres 
Subject: Re: [OMPI users] openmpi + share points
To: "Open MPI Users" 
Message-ID: <6b0cfd91-0e2a-498c-97bd-c6eb974f9...@cisco.com>

Sorry for the delay in replying.

It is probably much easier to NFS share the installation directory so that the 
exact same installation directory is available on all nodes.  For example, if 
you installed OMPI into /opt/openmpi-1.4.2, then make /opt/openmpi-1.4.2 
available on all nodes (even if they're mounted network shares).

Can you try that?
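The suggestion above (an identical install path on every node) is typically realized with an NFS mount. A sketch of what the /etc/fstab entry on each compute node might look like, assuming the install lives on a server here called "fileserver" (an illustrative name, not from this thread):

```
# /etc/fstab on each compute node ("fileserver" is an example name)
fileserver:/opt/openmpi-1.4.2  /opt/openmpi-1.4.2  nfs  ro  0 0
```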


On May 10, 2010, at 9:04 AM, Christophe Peyret wrote:

> Hello,
> 
> I am building a cluster with 6 Apple xserve running OSX Server 10.6 :
> 
> node1.cluster
> node2.cluster
> node3.cluster
> node4.cluster
> node5.cluster
> node6.cluster
> 
> I've installed openmpi in directory /opt/openmpi-1.4.2 of node1, then I made a 
> share point of /opt -> /Network/opt and defined the variables
> 
> export MPI_HOME=/Network/opt/openmpi-1.4.2
> export OPAL_PREFIX=/Network/opt/openmpi-1.4.2
> 
> I can access openmpi from all nodes. However, I still face a problem when 
> I launch a computation:
> 
> mpirun --prefix /Network/opt/openmpi-1.4.2 -n 4 -hostfile ~peyret/hostfile  
> space64 -f Test/cfm56_hp_Rigid/cfm56_hp_Rigid.def -fast
> 
> it returns the error message:
> 
> [node2.cluster:09163] mca: base: component_find: unable to open 
> /Network/opt/openmpi-1.4.2/lib/openmpi/mca_odls_default: file not found 
> (ignored)
> [node4.cluster:08867] mca: base: component_find: unable to open 
> /Network/opt/openmpi-1.4.2/lib/openmpi/mca_odls_default: file not found 
> (ignored)
> [node3.cluster:08880] mca: base: component_find: unable to open 
> /Network/opt/openmpi-1.4.2/lib/openmpi/mca_odls_default: file not found 
> (ignored)
> 
> any idea ?
> 
> Christophe
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/



ONERA - DSNA - PS3A
Christophe Peyret - Ingénieur de Recherche
29 ave de la Division Leclerc F92320 Chatillon
Tel. : +331 4673 4778   Fax : +331 4673 4166
Blog intranet : http://santafe.onera

Please print this document only if necessary.







[OMPI users] Execution doesn't go back to the shell after MPI_Finalize()

2010-05-19 Thread Yves Caniou
Dear all,

I use the following code:
#include "stdlib.h"
#include "stdio.h"
#include "mpi.h"
#include "math.h"

#include "unistd.h" /* sleep */

int my_num, mpi_size ;

int
main(int argc, char *argv[])
{
  MPI_Init(&argc, &argv) ;

  MPI_Comm_rank(MPI_COMM_WORLD, &my_num);
  MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);
  printf("%d calls MPI_Finalize()\n\n\n", my_num) ;

  MPI_Finalize() ;

  return EXIT_SUCCESS ;
}

I compile and run the code on two different architectures with their own 
version/installation of open-mpi, with the command lines:
$>mpicc -lm --std=c99 basis.c
$>mpiexec -n 1 a.out

Over numerous runs of the executable, even with the number of processes equal to 1:

- Using a Debian open-mpi v1.2.7rc2 installation, my code always returns to the 
shell after the call to MPI_Finalize().
Kernel is 2.6.34 SMP, Intel P9600, 2 cores.

- Using a homemade open-mpi v1.4.2 installation, my code runs as expected, 
but instead of returning to the shell after MPI_Finalize(), it can just 
hang in the Sl+ state.
Kernel is 2.6.18-53.1.19.el5 SMP (RedHat), Quad-Core AMD Opteron 8356.

I attach the ompi_info output of the 2 architectures. I am surely missing something... but what?

.Yves.
>ompi_info
Open MPI: 1.2.7rc2
   Open MPI SVN revision: r18788
Open RTE: 1.2.7rc2
   Open RTE SVN revision: r18788
OPAL: 1.2.7rc2
   OPAL SVN revision: r18788
  Prefix: /usr
 Configured architecture: x86_64-pc-linux-gnu
   Configured by: pbuilder
   Configured on: Mon Aug 25 21:27:31 UTC 2008
  Configure host: charlie
Built by: root
Built on: Mon Aug 25 21:34:47 UTC 2008
  Built host: charlie
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: gcc
 C compiler absolute: /usr/bin/gcc
C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/bin/gfortran
  Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: no
   MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.7)
  MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.7)
   MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.7)
   MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.7)
   MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.7)
 MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.7)
 MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.7)
   MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
   MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.7)
MCA coll: self (MCA v1.0, API v1.0, Component v1.2.7)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.7)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.7)
  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.7)
   MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.7)
   MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.7)
 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.7)
 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.7)
 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.7)
  MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.7)
 MCA btl: openib (MCA v1.0, API v1.0.1, Component v1.2.7)
 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.7)
 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.7)
 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.7)
 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.7)
  MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.7)
  MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.7)
  MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.7)
 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.7)
 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.7)
 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.7)
 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.7)
 MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.7)
  MCA ns: proxy (MCA v1.0, API v2.0, Componen

Re: [OMPI users] openmpi + share points

2010-05-19 Thread Jeff Squyres
On May 19, 2010, at 5:12 AM, Christophe Peyret wrote:

> 1) it no longer works if I remove --prefix /Network/opt/openmpi-1.4.2 
> (is there a way on OS X to declare it once so it doesn't have to be passed?)

I'm assuming you're referring to shared libraries and/or executables not being 
found if you don't specify the --prefix.

There's several ways to avoid having to list --prefix -- see the FAQ, such as 
questions 1-4 on this page:

http://www.open-mpi.org/faq/?category=running

> 2) I must use the -static-intel option at link time, otherwise I get an error 
> that libiomp5.dylib is not found

Yep.  Or install the supporting Intel libraries on each node.  Either way works fine.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI users] Some Questions on Building OMPI on Linux Em64t

2010-05-19 Thread Michael E. Thomadakis

Hello,

I would like to build OMPI V1.4.2 and make it available to our users at the
Supercomputing Center at TAMU. Our system is a 2-socket, 4-core Nehalem
@2.8GHz, 24GiB DRAM / node, 324 nodes connected to 4xQDR Voltaire fabric,
CentOS/RHEL 5.4.



I have been trying to find the following information:

1) High-resolution timers: how do I specify the high-resolution Linux timers in 
the --with-timer=TYPE line of ./configure?

2) I have installed BLCR V0.8.2, but when I try to build OMPI and point it to the 
full installation, it complains that it cannot find it. Note that I built BLCR 
with GCC but am building OMPI with the Intel compilers (V11.1).


3) Does OMPI by default use shared memory for intra-node message IPC but revert 
to IB for inter-node messaging?

4) How can I select the high-speed transport, say DAPL or OFED IB verbs? Is 
there any preference as to the specific high-speed transport over QDR IB?

5) When we launch MPI jobs via PBS/TORQUE, do we have control over task and 
thread placement on nodes/cores?

6) Can we cleanly suspend/restart OMPI jobs with the above scheduler? Any 
caveats on suspension/resumption of OMPI jobs?

7) Do you have any performance data comparing OMPI vs., say, MVAPICH2 and 
Intel MPI? This is not a political issue, since I am going to be providing all 
these MPI stacks to our users.




Thank you so much for the great s/w ...

best
Michael



%  \
% Michael E. Thomadakis, Ph.D.  Senior Lead Supercomputer Engineer/Res \
% E-mail: miket AT tamu DOT edu   Texas A&M University \
% web:http://alphamike.tamu.edu  Supercomputing Center \
% Voice:  979-862-3931Teague Research Center, 104B \
% FAX:979-847-8643  College Station, TX 77843, USA \
%  \



[OMPI users] Allgather in inter-communicator bug,

2010-05-19 Thread Battalgazi YILDIRIM
Hi,


I am trying to use the intercommunicator ::Allgather between child processes. 
I have Fortran and Python code; I am using mpi4py for Python. It seems that 
::Allgather is not working properly on my desktop.

I first contacted the mpi4py developer (Lisandro Dalcin), who simplified 
my problem and provided two example files (python.py and fortran.f90, 
please see below).

We tried different MPI vendors, and the following example worked correctly 
(meaning the final printout should be array('i', [1, 2, 3, 4, 5, 6, 7, 8])).

However, it does not give the correct answer on my two desktops (Red Hat and 
Ubuntu), both using Open MPI.

Could you look at this problem, please?

If you want to follow our earlier discussion, you can go to the following 
link:
http://groups.google.com/group/mpi4py/browse_thread/thread/c17c660ae56ff97e

yildirim@memosa:~/python_intercomm$ more python.py
from mpi4py import MPI
from array import array
import os

progr = os.path.abspath('a.out')
child = MPI.COMM_WORLD.Spawn(progr,[], 8)
n = child.remote_size
a = array('i', [0]) * n
child.Allgather([None,MPI.INT],[a,MPI.INT])
child.Disconnect()
print a

yildirim@memosa:~/python_intercomm$ more fortran.f90
program main
 use mpi
 implicit none
 integer :: parent, rank, val, dummy, ierr
 call MPI_Init(ierr)
 call MPI_Comm_get_parent(parent, ierr)
 call MPI_Comm_rank(parent, rank, ierr)
 val = rank + 1
 call MPI_Allgather(val,   1, MPI_INTEGER, &
                    dummy, 0, MPI_INTEGER, &
                    parent, ierr)
 call MPI_Comm_disconnect(parent, ierr)
 call MPI_Finalize(ierr)
end program main

yildirim@memosa:~/python_intercomm$ mpif90 fortran.f90
yildirim@memosa:~/python_intercomm$ python python.py
array('i', [0, 0, 0, 0, 0, 0, 0, 0])


-- 
B. Gazi YILDIRIM


[OMPI users] How to show outputs from MPI program that runs on a cluster?

2010-05-19 Thread Sang Chul Choi
Hi,

I am wondering if there is a way to run a particular process among multiple 
processes on the console of a Linux cluster.

I want to see the screen output (standard output) of a particular process 
(using a particular process ID) on the console screen while the MPI program is 
running.  I think that if I run an MPI program on a Linux cluster using Sun 
Grid Engine, the particular process that prints to standard output could run 
on the console or on a computing node, and it would be hard to see the screen 
output of that particular process.  Is there a way to set one process aside 
and run it on the console in Sun Grid Engine?

When I run the MPI program on my desktop with quad cores, I can set aside one 
process, using an ID, to print the information that I need.  I do not know how 
I could do that at much larger scale, such as with Sun Grid Engine.  I could 
let one process print to a file and then watch it.  But I do not know how I 
could let one process print to the console screen by setting it to run on the 
console, using Sun Grid Engine or any similar system such as PBS.  I doubt 
that a cluster would allow jobs to run on the console, because then other 
users would have trouble submitting jobs.  If that is the case, there seems to 
be no way to print to the console.  Then, do I have to have a separate 
(non-MPI) program that communicates with the MPI program over TCP/IP, running 
on the master node of the cluster?  This separate non-MPI program could then 
communicate sporadically with the MPI program.  I do not know if this is a 
general approach or a peculiar way.
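The "set one process aside to print" pattern described above can be sketched as a small helper that gates output on a chosen rank. The function name and formatting here are illustrative, not from any MPI API; in a real MPI program the rank would come from MPI_Comm_rank and the messages from the job itself:

```python
import sys

def print_if_watched(rank, watched_rank, message, out=sys.stdout):
    """Emit message only when this process is the rank being watched.

    Every rank calls this, but only the watched rank produces output,
    so the console (or a tail'ed log file) shows a single process.
    """
    if rank == watched_rank:
        out.write("[rank %d] %s\n" % (rank, message))

# Simulate four ranks all reporting progress; only rank 2 is shown.
for rank in range(4):
    print_if_watched(rank, 2, "finished step 1")
# -> [rank 2] finished step 1
```

Note also that Open MPI's mpirun grew options for tagging and redirecting per-rank output in later releases; the mpirun man page for the installed version is the authority on which are available.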

I would appreciate any input.

Thank you,

Sang Chul