FWIW, orterun is exactly the same as mpirun (one is a symlink to the other).
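You can sanity-check this on any node with, e.g.:

   ls -l `which mpirun` `which orterun`

and verify that one points at the other.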

This smacks of a mismatch of Open MPI versions across your nodes.

Can you verify that the default version of Open MPI being found on all of your nodes is the same?
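Something like the following (using the three hostnames from your
messages; adjust the list for your actual setup) should show what
each node picks up:

   for h in lynx puma tiger ; do
       echo "== $h =="
       ssh $h 'which mpirun ; ompi_info | grep "Open MPI:"'
   done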


On Nov 30, 2007, at 12:01 AM, Madireddy Samuel Vijaykumar wrote:

Our application looks like it does not use mpirun at all. But we have
"orterun", so I just tested it by running

   orterun --hostfile <hostfile> hostname

and it prints out this ...

[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file dss/dss_unpack.c at line 90
[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file gpr_replica_cmd_processor.c at line 361
[...the same two messages repeat several more times...]

and it just stays/hangs there :(

On Nov 29, 2007 6:07 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
On Nov 29, 2007, at 2:09 AM, Madireddy Samuel Vijaykumar wrote:

A non-MPI application does run without any issues. Could you
elaborate on what you mean by doing mpirun "hostname"? Do you mean I
just do an 'mpirun lynx' in my case?

No, I mean

   mpirun --hostfile <your_hostfile> hostname

This should run the "hostname" command on each of your nodes.  If
running "hostname" doesn't work after changing the order, then
something is very wrong.  If it *does* work, it implies that
something is faulty in the MPI startup (which is more complicated
than starting up non-MPI applications).
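For example, with the three-node hostfile from your earlier message
(quoted below), a successful run should print something like this
(line order may vary):

   lynx
   puma
   tiger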



On Nov 28, 2007 9:57 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
Well, that's odd.

What happens if you try to mpirun "hostname" (i.e., a non-MPI
application)?  Does it run, or does it hang?



On Nov 23, 2007, at 6:00 AM, Madireddy Samuel Vijaykumar wrote:

I have been using clusters for some tests. My localhost "lynx",
together with "puma" and "tiger", makes up the cluster. All have
passwordless ssh enabled. Now, if I have the following in my
hostfile (one host per line, in this order)

lynx
puma
tiger

My tests (from lynx) run over the cluster without any issues.

But if I move or remove lynx, making the hostfile either (one host
per line, in this order)

puma
lynx
tiger

or

puma
tiger

My test (from lynx) just does not get anywhere. It just hangs and
does not proceed at all. Is this an issue with the way my script
handles the cluster nodes, or is there a required format for the
hostfile? Thanks.

--
Sam aka Vijju
:)~
Linux: Open, True and Cool


--
Jeff Squyres
Cisco Systems





--
Sam aka Vijju
:)~
Linux: Open, True and Cool


--
Jeff Squyres
Cisco Systems





--
Sam aka Vijju
:)~
Linux: Open, True and Cool


--
Jeff Squyres
Cisco Systems
