[OMPI users] orted daemon no found! --- environment not passed to slave nodes?

2012-02-27 Thread yanyg
Greetings! I have tried to run the ring_c example test from a bash script. In this script, I set up PATH and LD_LIBRARY_PATH (I do not want to disturb ~/.bashrc, etc.), then use the full path of mpirun to invoke the MPI processes; mpirun and orted are both on the PATH. However, from the Open MPI me

Re: [OMPI users] help: sm btl does not work when I specify the same host twice or more in the node list

2012-02-16 Thread yanyg
OK, with Jeff's kind help, I solved this issue in a very simple way. Now I would like to report back the reason for this issue and the solution. (1) The scenario under which this issue happened: In my OMPI environment, the $TMPDIR envar is set to a different scratch directory for different MPI

Re: [OMPI users] help: sm btl does not work when I specify the same host twice or more in the node list

2012-02-15 Thread yanyg
> So the real issue is: the sm BTL is not working for you. > Yes. > What version of Open MPI are you using? > It is 1.4.3 I am using. > Can you rm -rf any Open MPI directories that may be left over in /tmp? Yes, I have tried that. The clean up does not help to make sm btl work.

Re: [OMPI users] help: sm btl does not work when I specify the same host twice or more in the node list

2012-02-15 Thread yanyg
> No, there are no others you need to set. Ralph's referring to the fact > that we set OMPI environment variables in the processes that are > started on the remote nodes. > > I was asking to ensure you hadn't set any MCA parameters in the > environment that could be creating a problem. Do you have

Re: [OMPI users] help: sm btl does not work when I specify the same host twice or more in the node list

2012-02-14 Thread yanyg
Yes, in short, I start a c-shell script from the bash command line, in which I mpirun another c-shell script which starts the computing process. The only OMPI-related envars are PATH and LD_LIBRARY_PATH. Are there any other OMPI envars I should set?

Re: [OMPI users] help: sm btl does not work when I specify the same host twice or more in the node list

2012-02-14 Thread yanyg
Hi Ralph, Could you please tell me which OMPI envars are broken, or which OMPI envars should be there for OMPI to work properly? Although I start my c-shell script from a bash command line (not sure if this matters), I only add the Open MPI executable and lib paths to $PATH and $LD_LIBRARY_PATH, no ot

Re: [OMPI users] help: sm btl does not work when I specify the same host twice or more in the node list

2012-02-14 Thread yanyg
Hi Jeff, The command "env | grep OMPI" outputs nothing but a blank line from my script. Is there anything I should set for mpirun? On the other hand, as a reminder, I found that you discussed a similar issue with Jonathan Dursi. The difference is that when I tried with --mca btl_sm_num_fifos #(

Re: [OMPI users] help: sm btl does not work when I specify the same host twice or more in the node list

2012-02-13 Thread yanyg
Hi Jeff, Thank you very much for your help! I tried to run the same ring_c test from the standard examples in the Open MPI 1.4.3 distribution. When I ran it as you described from the command line, it worked without any problem with the sm btl included (with --mca btl self,sm,openib). However, if I use sm bt

[OMPI users] help: sm btl does not work when I specify the same host twice or more in the node list

2012-02-09 Thread yanyg
Hi all, Good morning! I have trouble communicating through the sm btl in Open MPI; please check the attached file for my system information. I am using Open MPI 1.4.3, Intel compilers V11.1, on Linux RHEL 5.4 with kernel 2.6. The tests are the following: (1) if I specify the btl to mpirun by "-

Re: [OMPI users] Error-Open MPI over Infiniband: polling LP CQ with status LOCAL LENGTH ERROR

2011-07-11 Thread yanyg
Hi Yevgeny, Thanks. Here is the output of /usr/bin/ibv_devinfo: hca_id: mlx4_0 transport: InfiniBand (0) fw_ver: 2.8.000 node_guid: 0002:c903:0010:a85a sys_image_gui

[OMPI users] Error-Open MPI over Infiniband: polling LP CQ with status LOCAL LENGTH ERROR

2011-07-08 Thread yanyg
Hi all, The message says: [[17549,1],0][btl_openib_component.c:3224:handle_wc] from gulftown to: gulftown error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 492359816 opcode 32767 vendor error 105 qp_idx 3 This is very arcane to me; the same test ran when only one

Re: [OMPI users] mpirun does not propagate environment from master node to slave nodes

2011-07-08 Thread yanyg
Thanks, Ralph. *** quote begin * Let me get this straight. You are executing mpirun from inside a c-shell script, launching onto nodes where you will by default be running bash. The param I gave you should support that mode - it basically tells OMPI to probe the remote n

[OMPI users] MPI_Reduce error over Infiniband or TCP

2011-07-05 Thread yanyg
Dear all, We are testing Open MPI over Infiniband, and got an MPI_Reduce error message when we run our codes over either the TCP or the Infiniband interface, as follows, --- [gulftown:25487] *** An error occurred in MPI_Reduce [gulftown:25487] *** on communicator MPI COMMUNICATOR 3 CREATE FROM 0 [gulft

Re: [OMPI users] mpirun does not propagate environment from master node to slave nodes

2011-07-05 Thread yanyg
Thanks, Ralph. Your information is very deep and detailed. I tried your suggestion to set "-mca plm_rsh_assume_same_shell 0", but it still does not work. My situation is that we start a c-shell script from a bash shell, which in turn invokes mpirun to other slave nodes. These slave nodes

Re: [OMPI users] mpirun does not propagate environment from master node to slave nodes

2011-06-28 Thread yanyg
Thanks, Ralph! a) Yes, I know I could use only IB by "--mca btl openib", but I just want to make sure I am using the IB interfaces. I am looking for an mpirun option that prints out the actual interconnect protocol, like --prot to mpirun in MPICH2. b) Yes, my default shell is bash, but I run a c-shell s

[OMPI users] mpirun does not propagate environment from master node to slave nodes

2011-06-28 Thread yanyg
Hello All, I installed Open MPI 1.4.3 on our new HPC blades, with an Infiniband interconnect. My system environment is as follows: 1) uname -a output: Linux gulftown 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux 2) /home is mounted over all nodes, and mpirun is

Re: [OMPI users] intel compiler linking issue and issue of environment variable on remote node, with open mpi 1.4.3

2011-04-22 Thread yanyg
Open MPI 1.4.3 + Intel Compilers V8.1 summary: (in case someone would like to refer to it later) (1) To make all Open MPI executables statically linked and independent of any dynamic libraries, the "--disable-shared" and "--enable-static" options should BOTH be forwarded to configure, and "-i-static" opti

Re: [OMPI users] intel compiler linking issue and issue of environment variable on remote node, with open mpi 1.4.3

2011-03-24 Thread yanyg
Thanks for your information. For my Open MPI installation, the executables such as mpirun and orted actually depend on those dynamic Intel libraries: when I run ldd on mpirun, some dynamic libraries show up. I am trying to make these Open MPI executables statically linked with these inte

Re: [OMPI users] intel compiler linking issue and issue of environment variable on remote node, with open mpi 1.4.3 (Tim Prince)

2011-03-22 Thread yanyg
Thank you very much for the comments and hints. I will try to upgrade our Intel compiler collection. As for my second issue, with Open MPI, is there any way to propagate the environment variables of the current process on the master node to the other slave nodes, such that the orted daemon could run on s

[OMPI users] intel compiler linking issue and issue of environment variable on remote node, with open mpi 1.4.3

2011-03-21 Thread yanyg
Hi, I am trying to compile our codes with Open MPI 1.4.3, using Intel compilers 8.1. (1) For the Open MPI 1.4.3 installation on a Linux Beowulf cluster, I use: ./configure --prefix=/home/yiguang/dmp-setup/openmpi-1.4.3 CC=icc CXX=icpc F77=ifort FC=ifort --enable-static LDFLAGS="-i-static -static-lib