[O-MPI users] Error on mpirun in Redhat Fedora Core 4

2005-11-10 Thread Clement Chu

Hi,

I got an error when I tried to run an MPI program with mpirun.  The error
message is as follows:

[clement@kfc TestMPI]$ mpicc -g -o test main.c
[clement@kfc TestMPI]$ mpirun -np 2 test
mpirun noticed that job rank 1 with PID 0 on node "localhost" exited on
signal 11.
[kfc:28466] ERROR: A daemon on node localhost failed to start as expected.
[kfc:28466] ERROR: There may be more information available from
[kfc:28466] ERROR: the remote shell (see above).
[kfc:28466] The daemon received a signal 11.
1 additional process aborted (not shown)
[clement@kfc TestMPI]$

I am using openmpi-1.0rc4 on Linux Red Hat Fedora Core 4.
The kernel is 2.6.12-1.1456_FC4.  My build procedure is as follows:
1.  ./configure --prefix=/home/clement/openmpi --with-devel-headers
2.  make all install
3.  log in as root and add Open MPI's bin and lib paths to /etc/bashrc
    (a sketch of these lines follows after step 7)
4.  verify $PATH and $LD_LIBRARY_PATH, as shown below
[clement@kfc TestMPI]$ echo $PATH
/usr/java/jdk1.5.0_05/bin:/home/clement/openmpi/bin:/usr/java/jdk1.5.0_05/bin:/home/clement/mpich-1.2.7/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/clement/bin
[clement@kfc TestMPI]$ echo $LD_LIBRARY_PATH
/home/clement/openmpi/lib
[clement@kfc TestMPI]$

5.  change to the MPI program's directory
6.  mpicc -g -o test main.c
7.  mpirun -np 2 test
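
(For step 3, the lines added to /etc/bashrc look roughly like the sketch
below -- bash syntax assumed, shown for illustration rather than copied
verbatim from the file:)

# prepend the Open MPI install to the search paths
export PATH=/home/clement/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/home/clement/openmpi/lib:$LD_LIBRARY_PATH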

Any idea what is causing this problem?  Many thanks.

Regards,
Clement

--
Clement Kam Man Chu
Research Assistant
School of Computer Science & Software Engineering
Monash University, Caulfield Campus
Ph: 61 3 9903 1964




Re: [O-MPI users] Error on mpirun in Redhat Fedora Core 4

2005-11-10 Thread Clement Chu

I have tried the latest version (rc5 8053), but the error is still here.

Jeff Squyres wrote:

We've actually made quite a few bug fixes since RC4 (RC5 is not  
available yet).  Would you mind trying with a nightly snapshot tarball?


(there were some SVN commits last night after the nightly snapshot was  
made; I've just initiated another snapshot build -- r8085 should be on  
the web site within an hour or so)



On Nov 10, 2005, at 4:38 AM, Clement Chu wrote:

 


Hi,

I got an error when I tried to run an MPI program with mpirun.  The error
message is as follows:

[clement@kfc TestMPI]$ mpicc -g -o test main.c
[clement@kfc TestMPI]$ mpirun -np 2 test
mpirun noticed that job rank 1 with PID 0 on node "localhost" exited on
signal 11.
[kfc:28466] ERROR: A daemon on node localhost failed to start as expected.
[kfc:28466] ERROR: There may be more information available from
[kfc:28466] ERROR: the remote shell (see above).
[kfc:28466] The daemon received a signal 11.
1 additional process aborted (not shown)
[clement@kfc TestMPI]$

I am using openmpi-1.0rc4 on Linux Red Hat Fedora Core 4.
The kernel is 2.6.12-1.1456_FC4.  My build procedure is as follows:
1.  ./configure --prefix=/home/clement/openmpi --with-devel-headers
2.  make all install
3.  log in as root and add Open MPI's bin and lib paths to /etc/bashrc
4.  verify $PATH and $LD_LIBRARY_PATH, as shown below
[clement@kfc TestMPI]$ echo $PATH
/usr/java/jdk1.5.0_05/bin:/home/clement/openmpi/bin:/usr/java/jdk1.5.0_05/bin:/home/clement/mpich-1.2.7/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/clement/bin

[clement@kfc TestMPI]$ echo $LD_LIBRARY_PATH
/home/clement/openmpi/lib
[clement@kfc TestMPI]$

5.  change to the MPI program's directory
6.  mpicc -g -o test main.c
7.  mpirun -np 2 test

Any idea what is causing this problem?  Many thanks.

Regards,
Clement

--
Clement Kam Man Chu
Research Assistant
School of Computer Science & Software Engineering
Monash University, Caulfield Campus
Ph: 61 3 9903 1964






--
Clement Kam Man Chu
Research Assistant
School of Computer Science & Software Engineering
Monash University, Caulfield Campus
Ph: 61 3 9903 1964



Re: [O-MPI users] Error on mpirun in Redhat Fedora Core 4

2005-11-10 Thread Clement Chu
MCA btl: self (MCA v1.0, API v1.0, Component v1.0)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.0)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
   MCA topo: unity (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.0)
 MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0)
 MCA ns: replica (MCA v1.0, API v1.0, Component v1.0)
MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.0)
MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.0)
MCA ras: localhost (MCA v1.0, API v1.0, Component v1.0)
MCA ras: slurm (MCA v1.0, API v1.0, Component v1.0)
MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.0)
MCA rds: resfile (MCA v1.0, API v1.0, Component v1.0)
  MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.0)
   MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.0)
   MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.0)
MCA rml: oob (MCA v1.0, API v1.0, Component v1.0)
MCA pls: fork (MCA v1.0, API v1.0, Component v1.0)
MCA pls: proxy (MCA v1.0, API v1.0, Component v1.0)
MCA pls: rsh (MCA v1.0, API v1.0, Component v1.0)
MCA pls: slurm (MCA v1.0, API v1.0, Component v1.0)
MCA sds: env (MCA v1.0, API v1.0, Component v1.0)
MCA sds: pipe (MCA v1.0, API v1.0, Component v1.0)
MCA sds: seed (MCA v1.0, API v1.0, Component v1.0)
MCA sds: singleton (MCA v1.0, API v1.0, Component v1.0)
MCA sds: slurm (MCA v1.0, API v1.0, Component v1.0)
[clement@kfc TestMPI]$
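
In case the full listing is needed again, it can be captured to a file or
filtered down to a single framework (a sketch):

[clement@kfc TestMPI]$ ompi_info > ompi_info.txt     # full listing, easy to attach
[clement@kfc TestMPI]$ ompi_info | grep "MCA btl"    # just the BTL components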



Jeff Squyres wrote:


I'm sorry -- I wasn't entirely clear:

1. Are you using a 1.0 nightly tarball or a 1.1 nightly tarball?  We
have made a bunch of fixes to the 1.1 tree (i.e., the Subversion
trunk), but have not fully vetted them yet, so they have not yet been
taken to the 1.0 release branch.  If you have not done so already,
could you try a tarball from the trunk?
http://www.open-mpi.org/nightly/trunk/


2. The error you are seeing looks like a proxy process is failing to 
start because it seg faults.  Are you getting corefiles?  If so, can 
you send the backtrace?  The corefile should be from the 
$prefix/bin/orted executable.


3. Failing that, can you run with the "-d" switch?  It should give a 
bunch of debugging output that might be helpful.  "mpirun -d -np 2 
./test", for example.


4. Also please send the output of the "ompi_info" command.
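
A sketch of how suggestions 2-4 above can be carried out (assuming bash,
core dumps enabled via ulimit, and an illustrative corefile name -- the
actual name depends on the system's core_pattern):

[clement@kfc TestMPI]$ ulimit -c unlimited        # let corefiles be written
[clement@kfc TestMPI]$ mpirun -d -np 2 ./test     # suggestion 3: debugging output
[clement@kfc TestMPI]$ gdb /home/clement/openmpi/bin/orted core.28466   # suggestion 2 (corefile name illustrative)
(gdb) where                                       # print the backtrace from the corefile
[clement@kfc TestMPI]$ ompi_info                  # suggestion 4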


On Nov 10, 2005, at 9:05 AM, Clement Chu wrote:

 

I have tried the latest version (rc5 8053), but the error is still 
here.


Jeff Squyres wrote:

   


We've actually made quite a few bug fixes since RC4 (RC5 is not
available yet).  Would you mind trying with a nightly snapshot 
tarball?


(there were some SVN commits last night after the nightly snapshot was
made; I've just initiated another snapshot build -- r8085 should be on
the web site within an hour or so)


On Nov 10, 2005, at 4:38 AM, Clement Chu wrote:



 


Hi,

I got an error when I tried to run an MPI program with mpirun.  The error
message is as follows:

[clement@kfc TestMPI]$ mpicc -g -o test main.c
[clement@kfc TestMPI]$ mpirun -np 2 test
mpirun noticed that job rank 1 with PID 0 on node "localhost" exited on
signal 11.
[kfc:28466] ERROR: A daemon on node localhost failed to start as
expected.
[kfc:28466] ERROR: There may be more information available from
[kfc:28466] ERROR: the remote shell (see above).
[kfc:28466] The daemon received a signal 11.
1 additional process aborted (not shown)
[clement@kfc TestMPI]$

I am using openmpi-1.0rc4 on Linux Red Hat Fedora Core 4.
The kernel is 2.6.12-1.1456_FC4.  My build procedure is as follows:
1.  ./configure --prefix=/home/clement/openmpi --with-devel-headers
2.  make all install
3.  log in as root and add Open MPI's bin and lib paths to /etc/bashrc
4.  verify $PATH and $LD_LIBRARY_PATH, as shown below
[clement@kfc TestMPI]$ echo $PATH
/usr/java/jdk1.5.0_05/bin:/home/clement/openmpi/bin:/usr/java/jdk1.5.0_05/bin:/home/clement/mpich-1.2.7/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/clement/bin
[clement@kfc TestMPI]$ echo $LD_LIBRARY_PATH
/home/clement/openmpi/lib
[clement@kfc TestMPI]$

5.  change to the MPI program's directory
6.  mpicc -g -o test main.c
7.  mpirun -np 2 test

Any idea what is causing this problem?  Many thanks.
   



 




--
Clement Kam Man Chu
Research Assistant

Re: [O-MPI users] Error on mpirun in Redhat Fedora Core 4

2005-11-10 Thread Clement Chu
Reading symbols from
/home/clement/openmpi/lib/openmpi/mca_pls_slurm.so...done.

Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_pls_slurm.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_iof_svc.so...done.

Loaded symbols for /home/clement/openmpi/lib/openmpi/mca_iof_svc.so
#0  0x00e2a075 in orte_pls_rsh_launch ()
  from /home/clement/openmpi/lib/openmpi/mca_pls_rsh.so
(gdb) where
#0  0x00e2a075 in orte_pls_rsh_launch ()
  from /home/clement/openmpi/lib/openmpi/mca_pls_rsh.so
#1  0x0042b656 in orte_rmgr_urm_spawn ()
  from /home/clement/openmpi/lib/openmpi/mca_rmgr_urm.so
#2  0x0804a10c in orterun (argc=4, argv=0xbf983d54) at orterun.c:373
#3  0x08049b4e in main (argc=4, argv=0xbf983d54) at main.c:13
(gdb)
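
One way to get a backtrace like the one above is to run mpirun itself
under gdb (a sketch; loading the orted corefile, as suggested earlier, is
the other option):

[clement@kfc TestMPI]$ gdb --args mpirun -np 2 ./test
(gdb) run      # wait for the SIGSEGV to be reported
(gdb) where    # print the stack trace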





Jeff Squyres wrote:


I'm sorry -- I wasn't entirely clear:

1. Are you using a 1.0 nightly tarball or a 1.1 nightly tarball?  We
have made a bunch of fixes to the 1.1 tree (i.e., the Subversion
trunk), but have not fully vetted them yet, so they have not yet been
taken to the 1.0 release branch.  If you have not done so already,
could you try a tarball from the trunk?
http://www.open-mpi.org/nightly/trunk/


2. The error you are seeing looks like a proxy process is failing to 
start because it seg faults.  Are you getting corefiles?  If so, can 
you send the backtrace?  The corefile should be from the 
$prefix/bin/orted executable.


3. Failing that, can you run with the "-d" switch?  It should give a 
bunch of debugging output that might be helpful.  "mpirun -d -np 2 
./test", for example.


4. Also please send the output of the "ompi_info" command.


On Nov 10, 2005, at 9:05 AM, Clement Chu wrote:

 

I have tried the latest version (rc5 8053), but the error is still 
here.


Jeff Squyres wrote:

   


We've actually made quite a few bug fixes since RC4 (RC5 is not
available yet).  Would you mind trying with a nightly snapshot 
tarball?


(there were some SVN commits last night after the nightly snapshot was
made; I've just initiated another snapshot build -- r8085 should be on
the web site within an hour or so)


On Nov 10, 2005, at 4:38 AM, Clement Chu wrote:



 


Hi,

I got an error when I tried to run an MPI program with mpirun.  The error
message is as follows:

[clement@kfc TestMPI]$ mpicc -g -o test main.c
[clement@kfc TestMPI]$ mpirun -np 2 test
mpirun noticed that job rank 1 with PID 0 on node "localhost" exited on
signal 11.
[kfc:28466] ERROR: A daemon on node localhost failed to start as
expected.
[kfc:28466] ERROR: There may be more information available from
[kfc:28466] ERROR: the remote shell (see above).
[kfc:28466] The daemon received a signal 11.
1 additional process aborted (not shown)
[clement@kfc TestMPI]$

I am using openmpi-1.0rc4 on Linux Red Hat Fedora Core 4.
The kernel is 2.6.12-1.1456_FC4.  My build procedure is as follows:
1.  ./configure --prefix=/home/clement/openmpi --with-devel-headers
2.  make all install
3.  log in as root and add Open MPI's bin and lib paths to /etc/bashrc
4.  verify $PATH and $LD_LIBRARY_PATH, as shown below
[clement@kfc TestMPI]$ echo $PATH
/usr/java/jdk1.5.0_05/bin:/home/clement/openmpi/bin:/usr/java/jdk1.5.0_05/bin:/home/clement/mpich-1.2.7/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/clement/bin
[clement@kfc TestMPI]$ echo $LD_LIBRARY_PATH
/home/clement/openmpi/lib
[clement@kfc TestMPI]$

5.  change to the MPI program's directory
6.  mpicc -g -o test main.c
7.  mpirun -np 2 test

Any idea what is causing this problem?  Many thanks.
   



 




--
Clement Kam Man Chu
Research Assistant
School of Computer Science & Software Engineering
Monash University, Caulfield Campus
Ph: 61 3 9903 1964



Re: [O-MPI users] Error on mpirun in Redhat Fedora Core 4

2005-11-10 Thread Clement Chu
Thanks for your help.  kfc is the machine name and clement is the username
on this machine.  Do you think that is the problem?


Then I tried removing the kfc machine and running again.  This time I can
run the MPI program and there is no error message output, but there is no
program output either.  I think something is still wrong there.


[clement@localhost TestMPI]$ mpirun -d -np 2 test
[dhcppc0:02954] [0,0,0] setting up session dir with
[dhcppc0:02954] universe default-universe
[dhcppc0:02954] user clement
[dhcppc0:02954] host dhcppc0
[dhcppc0:02954] jobid 0
[dhcppc0:02954] procid 0
[dhcppc0:02954] procdir: 
/tmp/openmpi-sessions-clement@dhcppc0_0/default-universe/0/0
[dhcppc0:02954] jobdir: 
/tmp/openmpi-sessions-clement@dhcppc0_0/default-universe/0
[dhcppc0:02954] unidir: /tmp/openmpi-sessions-clement@dhcppc0_0/default-universe
[dhcppc0:02954] top: openmpi-sessions-clement@dhcppc0_0

[dhcppc0:02954] tmp: /tmp
[dhcppc0:02954] [0,0,0] contact_file 
/tmp/openmpi-sessions-clement@dhcppc0_0/default-universe/universe-setup.txt

[dhcppc0:02954] [0,0,0] wrote setup file
[dhcppc0:02954] spawn: in job_state_callback(jobid = 1, state = 0x1)
[dhcppc0:02954] pls:rsh: local csh: 0, local bash: 1
[dhcppc0:02954] pls:rsh: assuming same remote shell as local shell
[dhcppc0:02954] pls:rsh: remote csh: 0, remote bash: 1
[dhcppc0:02954] pls:rsh: final template argv:
[dhcppc0:02954] pls:rsh: ssh  orted --debug --bootproxy 1 
--name  --num_procs 2 --vpid_start 0 --nodename  
--universe clement@dhcppc0:default-universe --nsreplica 
"0.0.0;tcp://192.168.11.100:32780" --gprreplica 
"0.0.0;tcp://192.168.11.100:32780" --mpi-call-yield 0

[dhcppc0:02954] pls:rsh: launching on node localhost
[dhcppc0:02954] pls:rsh: oversubscribed -- setting mpi_yield_when_idle 
to 1 (1 2)

[dhcppc0:02954] pls:rsh: localhost is a LOCAL node
[dhcppc0:02954] pls:rsh: executing: orted --debug --bootproxy 1 --name 
0.0.1 --num_procs 2 --vpid_start 0 --nodename localhost --universe 
clement@dhcppc0:default-universe --nsreplica 
"0.0.0;tcp://192.168.11.100:32780" --gprreplica 
"0.0.0;tcp://192.168.11.100:32780" --mpi-call-yield 1

[dhcppc0:02955] [0,0,1] setting up session dir with
[dhcppc0:02955] universe default-universe
[dhcppc0:02955] user clement
[dhcppc0:02955] host localhost
[dhcppc0:02955] jobid 0
[dhcppc0:02955] procid 1
[dhcppc0:02955] procdir: 
/tmp/openmpi-sessions-clement@localhost_0/default-universe/0/1
[dhcppc0:02955] jobdir: 
/tmp/openmpi-sessions-clement@localhost_0/default-universe/0
[dhcppc0:02955] unidir: 
/tmp/openmpi-sessions-clement@localhost_0/default-universe

[dhcppc0:02955] top: openmpi-sessions-clement@localhost_0
[dhcppc0:02955] tmp: /tmp
[dhcppc0:02955] sess_dir_finalize: proc session dir not empty - leaving
[dhcppc0:02955] sess_dir_finalize: proc session dir not empty - leaving
[dhcppc0:02955] orted: job_state_callback(jobid = 1, state = 
ORTE_PROC_STATE_TERMINATED)

[dhcppc0:02955] sess_dir_finalize: found proc session dir empty - deleting
[dhcppc0:02955] sess_dir_finalize: found job session dir empty - deleting
[dhcppc0:02955] sess_dir_finalize: found univ session dir empty - deleting
[dhcppc0:02955] sess_dir_finalize: found top session dir empty - deleting
[dhcppc0:02954] spawn: in job_state_callback(jobid = 1, state = 0x9)
[dhcppc0:02954] sess_dir_finalize: found proc session dir empty - deleting
[dhcppc0:02954] sess_dir_finalize: found job session dir empty - deleting
[dhcppc0:02954] sess_dir_finalize: found univ session dir empty - deleting
[dhcppc0:02954] sess_dir_finalize: found top session dir empty - deleting
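
Given that the job exits without printing anything, it may also be worth
following the earlier suggestion to give mpirun an explicit path to the
binary, so there is no ambiguity with the system's own "test" utility
(a sketch):

[clement@localhost TestMPI]$ which test             # typically /usr/bin/test, which shares the name
[clement@localhost TestMPI]$ mpirun -d -np 2 ./test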


Clement


Jeff Squyres wrote:

One minor thing that I notice in your ompi_info output -- your build  
and run machines are different (kfc vs. clement).


Are these both FC4 machines, or are they different OS's/distros?


On Nov 10, 2005, at 10:01 AM, Clement Chu wrote:

 


[clement@kfc TestMPI]$ mpirun -d -np 2 test
[kfc:29199] procdir: (null)
[kfc:29199] jobdir: (null)
[kfc:29199] unidir:  
/tmp/openmpi-sessions-clement@kfc_0/default-universe

[kfc:29199] top: openmpi-sessions-clement@kfc_0
[kfc:29199] tmp: /tmp
[kfc:29199] [0,0,0] setting up session dir with
[kfc:29199] tmpdir /tmp
[kfc:29199] universe default-universe-29199
[kfc:29199] user clement
[kfc:29199] host kfc
[kfc:29199] jobid 0
[kfc:29199] procid 0
[kfc:29199] procdir:
/tmp/openmpi-sessions-clement@kfc_0/default-universe-29199/0/0
[kfc:29199] jobdir:
/tmp/openmpi-sessions-clement@kfc_0/default-universe-29199/0
[kfc:29199] unidir:
/tmp/openmpi-sessions-clement@kfc_0/default-universe-29199
[kfc:29199] top: openmpi-sessions-clement@kfc_0
[kfc:29199] tmp: /tmp
[kfc:29199] [0,0,0] contact_file
/tmp/openmpi-sessions-clement@kfc_0/default-universe-29199/universe-setup.txt

[kfc:29199] [0,0,0] wrote setup file
[kfc:29199] pls:rsh: local csh: 0, local bash: 1
[kfc:29199] pls:rsh: assuming same remote shell as local shell

Re: [O-MPI users] Error on mpirun in Redhat Fedora Core 4

2005-11-14 Thread Clement Chu
Reading symbols from
/home/clement/openmpi/lib/openmpi/mca_rmaps_round_robin.so...done.
Loaded symbols for 
/home/clement/openmpi//lib/openmpi/mca_rmaps_round_robin.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_pls_fork.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_pls_fork.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_pls_proxy.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_pls_proxy.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_pls_rsh.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_pls_rsh.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_pls_slurm.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_pls_slurm.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_iof_proxy.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_iof_proxy.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_allocator_basic.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_allocator_basic.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_allocator_bucket.so...done.
Loaded symbols for 
/home/clement/openmpi//lib/openmpi/mca_allocator_bucket.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_rcache_rb.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_rcache_rb.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_mpool_sm.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_mpool_sm.so
Reading symbols from /home/clement/openmpi/lib/libmca_common_sm.so.0...done.
Loaded symbols for /home/clement/openmpi//lib/libmca_common_sm.so.0
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_pml_ob1.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_pml_ob1.so
Reading symbols from /home/clement/openmpi/lib/openmpi/mca_bml_r2.so...done.
Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_bml_r2.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_btl_self.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_btl_self.so
Reading symbols from /home/clement/openmpi/lib/openmpi/mca_btl_sm.so...done.
Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_btl_sm.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_btl_tcp.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_btl_tcp.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_ptl_self.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_ptl_self.so
Reading symbols from /home/clement/openmpi/lib/openmpi/mca_ptl_sm.so...done.
Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_ptl_sm.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_ptl_tcp.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_ptl_tcp.so
Reading symbols from
/home/clement/openmpi/lib/openmpi/mca_coll_basic.so...done.
Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_coll_basic.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_coll_hierarch.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_coll_hierarch.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_coll_self.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_coll_self.so
Reading symbols from 
/home/clement/openmpi/lib/openmpi/mca_coll_sm.so...done.

Loaded symbols for /home/clement/openmpi//lib/openmpi/mca_coll_sm.so
#0  0x00324a60 in mca_btl_sm_add_procs_same_base_addr (btl=0x328200, 
nprocs=2,
   procs=0x95443a8, peers=0x95443d8, reachability=0xbf993994) at 
btl_sm.c:412

412 mca_btl_sm_component.sm_ctl_header->segment_header.
(gdb) where
#0  0x00324a60 in mca_btl_sm_add_procs_same_base_addr (btl=0x328200, 
nprocs=2,
   procs=0x95443a8, peers=0x95443d8, reachability=0xbf993994) at 
btl_sm.c:412

#1  0x00365fad in mca_bml_r2_add_procs (nprocs=2, procs=0x95443a8,
   bml_endpoints=0x9544388, reachable=0xbf993994) at bml_r2.c:220
#2  0x007ba346 in mca_pml_ob1_add_procs (procs=0x9544378, nprocs=2)
   at pml_ob1.c:131
#3  0x00d3df0b in ompi_mpi_init (argc=1, argv=0xbf993c74, requested=0,
   provided=0xbf993a44) at runtime/ompi_mpi_init.c:396
#4  0x00d59ab8 in PMPI_Init (argc=0xbf993bf0, argv=0xbf993bf4) at pinit.c:71
#5  0x08048904 in main (argc=1, argv=0xbf993c74) at cpi.c:20
(gdb)
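
Since the crash is inside the shared-memory BTL (btl_sm.c:412), one quick
cross-check is to exclude that component with the MCA selection parameter
and see whether the program then runs (a sketch; the "^" prefix excludes
the listed component):

[clement@localhost testmpi]$ mpirun --mca btl ^sm -np 2 ./test
[clement@localhost testmpi]$ mpirun --mca btl tcp,self -np 2 ./test   # or select only tcp and self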


I attached the mpi program.  I do hope you can help me.  Many thanks.


Clement



Jeff Squyres wrote:

One minor thing that I notice in your ompi_info output -- your build  
and run machines are different (kfc vs. clement).


Are these both FC4 machines, or are they different OS's/distros?


On Nov 10, 2005, at 10:01 AM, Clement Chu wrote:

 


[clement@kfc TestMPI]$ mpirun -d -np 2 test
[kfc:29199] procdir: (null)
[kfc:29199] jobdir: (null)
[kfc:29199] unidir:  
/tmp/openmpi-sessions-clement@kfc_0/default-universe

[kfc:29199] top: openmpi-sessions-clement@kfc_0
[kfc:29199] tmp: /tmp
[k

[O-MPI users] Anyone installed openmpi in Redhat 4?

2005-11-16 Thread Clement Chu

Hi,

   Has anyone installed Open MPI on Red Hat Core 4?  I am having a major
problem running an MPI program with Open MPI on RH 4, and I would like to
hear about your experience.


Regards,
Clement

--
Clement Kam Man Chu
Research Assistant
School of Computer Science & Software Engineering
Monash University, Caulfield Campus
Ph: 61 3 9903 1964



Re: [O-MPI users] Error on mpirun in Redhat Fedora Core 4

2005-11-17 Thread Clement Chu

Thanks Jeff.  The problem is solved in the latest version (r8172).

Clement
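
For anyone who hits the same thing, a sketch of moving to a newer snapshot
(the tarball name is illustrative -- pick the current one from
http://www.open-mpi.org/nightly/trunk/ -- and the configure options are the
ones used earlier):

[clement@localhost ~]$ wget http://www.open-mpi.org/nightly/trunk/openmpi-1.1a1r8172.tar.gz
[clement@localhost ~]$ tar xzf openmpi-1.1a1r8172.tar.gz
[clement@localhost ~]$ cd openmpi-1.1a1r8172
[clement@localhost openmpi-1.1a1r8172]$ ./configure --prefix=/home/clement/openmpi --with-devel-headers
[clement@localhost openmpi-1.1a1r8172]$ make all install
[clement@localhost openmpi-1.1a1r8172]$ ompi_info | grep "SVN revision"   # confirm the installed revision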

Jeff Squyres wrote:


Clement --

Sorry for the delay in replying.  We're running around crazy here at 
SC, which pretty much keeps us away from e-mail except early in the 
morning and late at night.


We fixed a bunch of things in the sm btl as of r8136 (someone reported
issues similar to yours, and we took the exchange off-list to fix them).  The
problems could definitely affect correctness and cause segv's similar 
to what you were seeing (see 
http://www.open-mpi.org/community/lists/users/2005/11/0326.php for a 
little more info).


I notice that you're running 8113 here -- could you try the latest 
nightly snapshot or rc and see if the same problems occur?


Thanks for your patience!


On Nov 14, 2005, at 4:51 AM, Clement Chu wrote:

 


Hi Jeff,

  I tried rc6 and the trunk nightly 8150.  I got the same problem.  I
copied the messages from the terminal below.


[clement@localhost testmpi]$ ompi_info
  Open MPI: 1.1a1r8113
 Open MPI SVN revision: r8113
  Open RTE: 1.1a1r8113
 Open RTE SVN revision: r8113
  OPAL: 1.1a1r8113
 OPAL SVN revision: r8113
Prefix: /home/clement/openmpi/
Configured architecture: i686-pc-linux-gnu
 Configured by: clement
 Configured on: Mon Nov 14 10:12:12 EST 2005
Configure host: localhost
  Built by: clement
  Built on: Mon Nov 14 10:28:21 EST 2005
Built host: localhost
C bindings: yes
  C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
C compiler: gcc
   C compiler absolute: /usr/bin/gcc
  C++ compiler: g++
 C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: gfortran
Fortran77 compiler abs: /usr/bin/gfortran
Fortran90 compiler: gfortran
Fortran90 compiler abs: /usr/bin/gfortran
   C profiling: yes
 C++ profiling: yes
   Fortran77 profiling: yes
   Fortran90 profiling: yes
C++ exceptions: no
Thread support: posix (mpi: no, progress: no)
Internal debug support: no
   MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
   libltdl support: 1
MCA memory: malloc_hooks (MCA v1.0, API v1.0, Component v1.1)
 MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
 MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
 MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
 MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
 MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
  MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
  MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
  MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
  MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
 MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
   MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
   MCA pml: teg (MCA v1.0, API v1.0, Component v1.1)
   MCA pml: uniq (MCA v1.0, API v1.0, Component v1.1)
   MCA ptl: self (MCA v1.0, API v1.0, Component v1.1)
   MCA ptl: sm (MCA v1.0, API v1.0, Component v1.1)
   MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.1)
   MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
   MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
   MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
  MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
   MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
   MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
   MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
   MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
   MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
   MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
   MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
   MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
   MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
   MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1)
   MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)
   MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
 MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)
  MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1)
  MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1)
   MCA rml: oob (MCA v1.0, API v1.0, Component v1.1)
   MCA pls: fork (MCA v1.0, API