From: r...@open-mpi.org
Date: Fri, 25 Oct 2013 02:13:58 -0700
To: us...@open-mpi.org
Subject: Re: [OMPI users] ORTE_ERROR_LOG
I see two "mpirun" cmds on that cmd line - is that a copy/paste error or did
you really put two of them on one line?
On Oct 24, 2013, at 10:27 PM, Tommi Laiho wrote:
> Hi
>
> I am trying to set up a simple two-machine home cluster. Later I may
> increase the number to 4 machines.
>
> I hav
The remote node starts the following process when mpirun is executed
on the local node:
25734 ?  Ss  0:00 /usr/lib/openmpi/1.2.5-gcc/bin/orted --bootproxy 1 --
I checked and it was not running before mpirun was executed.
I'll look into installing a more recent version of Open MPI.
Best I can tell, the remote orted never got executed - it looks to me
like there is something that blocks the ssh from working. Can you get
into another window and ssh to the remote node? If so, can you do a ps
and verify that the orted is actually running there?
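For example (the remote host name below is a placeholder), from a second
terminal something like this should show whether an orted daemon is alive
on the far side:

    ssh remote-node 'ps ax | grep [o]rted'   # [o] keeps grep from matching itself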
mpirun is using the same sh
As far as I can tell, both the PATH and LD_LIBRARY_PATH are set correctly. I've tried with the full path to the mpirun executable and using the --prefix command line option. Neither works. The debug output seems to contain a lot of system specific information (IPs, usernames and such), which I'm a
Okay, that's one small step forward. You can lock that in by setting
the appropriate MCA parameter in one of two ways:
1. Add the following to your default MCA parameter file: btl = tcp,sm,self
(I added the shared memory subsystem as this will help with performance).
You can see how to do
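A minimal sketch of option 1, assuming the usual per-user parameter file
location (a system-wide openmpi-mca-params.conf under the install's etc/
directory works the same way):

    # ~/.openmpi/mca-params.conf
    btl = tcp,sm,self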
Hi,
Yes I'm using ethernet connections. Doing as you suggest removes the
errors generated by running the small test program, but still doesn't
allow programs (including the small test program) to execute on any
node other than the one launching mpirun. If I try to do that, the
command han
In this instance, OMPI is complaining that you are attempting to use
Infiniband, but no suitable devices are found.
I assume you have Ethernet between your nodes? Can you run this with the
following added to your mpirun cmd line:
-mca btl tcp,self
That will cause OMPI to ignore the Infiniband su
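For example (program and hostfile names here are placeholders):

    mpirun -np 4 --hostfile myhosts -mca btl tcp,self ./my_prog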
Many thanks for your help nonetheless.
Hugh
On 28 Apr 2009, at 17:23, jody wrote:
Hi Hugh
I'm sorry, but I must admit that I have never encountered these messages,
and I don't know what their cause exactly is.
Perhaps one of the developers can give an explanation?
Jody
On Tue, Apr 28, 2009 at 5:52 PM, Hugh Dickinson wrote:
Hi again,
I tried a simple mpi c++ program:
--
#include <mpi.h>
#include <iostream>
using namespace MPI;
using namespace std;

int main(int argc, char* argv[]) {
    int rank, size;
    Init(argc, argv);
    rank = COMM_WORLD.Get_rank();
    size = COMM_WORLD.Get_size();
    cout << "P:" << rank << " out of " << size << endl;
    Finalize();
    return 0;
}
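For reference, a typical way to compile and launch a program like this
(source file and hostfile names are placeholders):

    mpic++ hello.cpp -o hello
    mpirun -np 2 --hostfile myhosts ./hello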
Hi Jody,
I can passwordlessly ssh between all nodes (to and from).
Almost none of these mpirun commands work. The only working case is
if nodenameX is the node from which you are running the command. I
don't know if this gives you extra diagnostic information, but if I
explicitly set the wron
Hi Hugh
You're right, there is no initialization command (like lamboot) you
have to call.
I don't really know why your setup doesn't work, so I'm making some
more "blind shots".
Can you do passwordless ssh between any two of your nodes?
does
mpirun -np 1 --host nodenameX uptime
work for e
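As an illustration (node names below are placeholders), the same check can
be looped over every host:

    for h in node1 node2 node3; do
        mpirun -np 1 --host $h uptime
    done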
Hi Jody,
The node names are exactly the same. I wanted to avoid updating the
version because I'm not the system administrator, and it could take
some time before it gets done. If it's likely to fix the problem
though I'll try it. I'm assuming that I don't have to do something
analogous to
Hi Hugh
Again, just to make sure, are the hostnames in your host file well-known?
I.e. when you say you can do
ssh nodename uptime
do you use exactly the same nodename in your host file?
(I'm trying to eliminate all non-Open-MPI error sources,
because with your setup it should basically work.)
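For illustration, the hostfile entries should use exactly the names that
work with ssh, e.g. (names and slot counts are placeholders):

    # hostfile
    node1 slots=2
    node2 slots=2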
Hi Jody,
Indeed, all the nodes are running the same version of Open MPI. Perhaps I was
incorrect to describe the cluster as heterogeneous. In fact, all the nodes run
the same operating system (Scientific Linux 5.2); it's only the hardware that's
different, and even then they're all i386 or i686. I'm
Hi Hugh
Just to make sure:
You have installed Open-MPI on all your nodes?
Same version everywhere?
Jody
On Tue, Apr 28, 2009 at 12:57 PM, Hugh Dickinson
wrote:
> Hi all,
>
> First of all let me make it perfectly clear that I'm a complete beginner as
> far as MPI is concerned, so this may well
Please send all the information here:
http://www.open-mpi.org/community/help/
This kind of error can mean that you are inadvertently using
mismatched versions of Open MPI across your nodes.
On Jan 16, 2009, at 3:50 AM, Bernard Secher - SFME/LGLS wrote:
Hello,
I have the following err
Several things are going on here. First, this error message:
> mpirun noticed that job rank 1 with PID 9658 on node mac1 exited on signal
> 6 (Aborted).
> 2 additional processes aborted (not shown)
indicates that your application procs are aborting for some reason. The
system is then attempting to
James --
Sorry for the delay in replying.
Do you have any firewall software running on your nodes (e.g.,
iptables)? OMPI uses random TCP ports to connect between nodes for
control messages. If they can't reach each other because TCP ports
are blocked, Bad Things will happen (potentially
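As a quick sanity check (assuming root access and an iptables-based firewall,
which was typical on Linux systems of that era), you can inspect or briefly
disable filtering on each node:

    iptables -L -n            # list the current filtering rules
    service iptables stop     # temporarily disable; re-enable afterwards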
> If I try to use multiple nodes, I got the error messages:
> ORTE_ERROR_LOG: Data unpack had inadequate space in file dss/dss_unpack.c at
> line 90
> ORTE_ERROR_LOG: Data unpack had inadequate space in file
> gpr_replica
Hi Qiang
This error message usually indicates that you have more than one Open MPI
installation around, and that the backend nodes are picking up a different
version than mpirun is using. Check to make sure that you have a consistent
version across all the nodes.
I also noted you were building wi
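One quick way to compare what each backend node actually picks up (node names
below are placeholders):

    for h in node1 node2; do
        ssh $h 'which orted; ompi_info | grep "Open MPI:"'
    done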
Followups on this show that this was caused by accidentally running on a
one-node Torque allocation and using the "-nolocal" option to mpirun. So Open
MPI is doing what it should do (refusing to run), but being less than
helpful about its error message.
I'll file a feature enhancement to see if w