Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread Gilles Gouaillardet
That is a good sign: it means orted was started on both nodes. Strictly speaking, you should confirm that both nodes appear 16 times each in the output, so that you can draw a firm conclusion. Cheers, Gilles On Monday, August 3, 2015, abhisek Mondal wrote: > I wrote 2 new node names(which I had not used
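
A minimal sketch of the "16 times each" check, assuming the job prints one hostname per line on stdout. The helper name count_per_host is hypothetical and is not part of Open MPI.

```shell
# Hypothetical helper: reads one hostname per line on stdin and prints
# "<hostname> <count>" for each distinct host, sorted by hostname.
count_per_host() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Possible usage (the mpirun path is a placeholder, not taken from the thread):
#   /path/to/mpirun --hostfile myhostfile -np 32 hostname | count_per_host
# With 2 nodes and 16 processes per node, each host should show a count of 16.
```

The ssh "Permanently added" warnings normally go to stderr, so they should not be counted as hostnames.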

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread Bennet Fauber
You will see those warnings the first time you connect to a new host. If it printed the hostname from each processor, it should be OK. On Sun, Aug 2, 2015 at 11:06 AM, abhisek Mondal wrote: > I wrote 2 new node names(which I had not used before) in "myhostfile". > And when I run it from login term

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread abhisek Mondal
I wrote 2 new node names(which I had not used before) in "myhostfile". And when I run it from login terminal, it says: *Warning: Permanently added 'cx1055,10.1.5.35' (RSA) to the list of known hosts.* *Warning: Permanently added 'cx1071,10.1.5.51' (RSA) to the list of known hosts.* * * Is it o

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread Bennet Fauber
On Sun, Aug 2, 2015 at 10:47 AM, abhisek Mondal wrote: Try /mpirun --hostfile myhostfile -np 32 hostname

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread abhisek Mondal
Sorry, but I can't get it. Would you please provide a demo_code(in context of the working code) ? Thanks. On Sun, Aug 2, 2015 at 7:43 PM, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > simply replace nwchem with hostname > > both hosts should be part of the output... > > Cheers, >

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread Gilles Gouaillardet
simply replace nwchem with hostname both hosts should be part of the output... Cheers, Gilles On Sunday, August 2, 2015, abhisek Mondal wrote: > Jeff, Gilles > > Here's my scenario again when I tried something different: > I've interactively booked 2 nodes(cx1015 and cx1016) and working in >

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread abhisek Mondal
Jeff, Gilles Here's my scenario again when I tried something different: I've interactively booked 2 nodes(cx1015 and cx1016) and working in "cx1015" node. Here I hit "module load openmpi" and "module load nwchem"( but I don't know how to "module load" on other node). Using the openmpi command to r

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread Gilles Gouaillardet
The initial error was that ompi could not find orted on the second node, and that was fixed by using the full path for mpirun. If you run under pbs, you should not need the hostfile option. Just ask pbs to allocate 2 nodes and everything should run smoothly. At first, I recommend you run a non MPI appli

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread Jeff Squyres (jsquyres)
Abhisek -- You are having two problems: 1. In the first "orted not found" problem, Open MPI was not finding its "orted" helper executable on the remote nodes in your cluster. When you "module load ..." something, it just loads the relevant PATH / LD_LIBRARY_PATH / etc. on the local node; it d
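
A small sketch of how one might check whether a helper such as orted is visible on PATH, first locally and then in a non-interactive shell on another node. The helper name on_path is hypothetical, and the remote check assumes ssh access to the node "cx1016" mentioned elsewhere in this thread.

```shell
# Hypothetical helper (not part of Open MPI): report whether a command
# is visible on PATH in the current shell.
on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Local check, e.g. after "module load openmpi":
#   on_path orted
# Remote check in a non-interactive shell (assumes ssh access to cx1016):
#   ssh cx1016 which orted
```

If the local check succeeds but the remote one fails, the remote shell's environment does not include the Open MPI install directory.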

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread abhisek Mondal
I'm on an HPC cluster, where openmpi-1.6.4 is installed as a module. In the .pbs script, before executing my code-line, I'm loading both the "nwchem" and "openmpi" modules. It works very nicely when I run on a single node (with 16 processors). But if I try to switch to multiple nodes with "hostfile" o

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread abhisek Mondal
Hi, I have tried using full paths for both of them, but I am stuck on the same issue. On Sun, Aug 2, 2015 at 4:39 PM, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > Is ompi installed on the other node and at the same location ? > did you configure ompi with --enable-mpirun-prefix-by-def

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread Gilles Gouaillardet
Is ompi installed on the other node and at the same location ? did you configure ompi with --enable-mpirun-prefix-by-default ? (note that should not be necessary if you invoke mpirun with full path ) you can also try /.../bin/mpirun --mca plm_base_verbose 100 ... and see if there is something wro

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread abhisek Mondal
Yes, I have tried this and got the following error: *mpirun was unable to launch the specified application as it could not find an executable:* *Executable: nwchem* *Node: cx934* *while attempting to start process rank 16.* Given that: I have to run my code with "nwchem filename.nw" command. While

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread Gilles Gouaillardet
Can you try invoking mpirun with its full path instead? e.g. /usr/local/bin/mpirun instead of mpirun Cheers, Gilles On Sunday, August 2, 2015, abhisek Mondal wrote: > Here is the other details, > > a. The Openmpi version is 1.6.4 > > b. The error as being generated is : > *Warning: Pe

Re: [OMPI users] having an issue with paralleling jobs

2015-08-02 Thread abhisek Mondal
Here are the other details: a. The Openmpi version is 1.6.4 b. The error being generated is: *Warning: Permanently added 'cx0937,10.1.4.1' (RSA) to the list of known hosts.* *Warning: Permanently added 'cx0934,10.1.3.255' (RSA) to the list of known hosts.* *orted: Command not found.* *orted: C

Re: [OMPI users] having an issue with paralleling jobs

2015-08-01 Thread Ralph Castain
Would you please tell us: (a) what version of OMPI you are using (b) what error message you are getting when the job terminates > On Aug 1, 2015, at 12:22 PM, abhisek Mondal wrote: > > I'm working on an openmpi enabled cluster. I'm trying to run a job with 2 > different nodes and 16 processo