Re: [OMPI users] large jobs hang on startup (deadlock?)
Hi Ralph,

Unfortunately, adding "-mca pls_rsh_num_concurrent 50" to mpirun (with just -np and -hostfile) has no effect. The number of established connections for slapd grows to the same number at the same rate as without it. BTW, I upgraded from 1.2b2 to 1.2b3.

Thanks,
Todd

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Tuesday, February 06, 2007 6:48 PM
To: Open MPI Users
Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)

Hi Todd

Just as a thought - you could try not using --debug-daemons or -d and instead setting "-mca pls_rsh_num_concurrent 50" or some such small number. This will tell the system to launch 50 ssh calls at a time, waiting for each group to complete before launching the next. You can't use it with --debug-daemons, as that option prevents the ssh calls from "closing" so that you can get the output from the daemons.

You can still launch as big a job as you like - we'll just do it 50 ssh calls at a time. If we are truly overwhelming the slapd, then this should alleviate the problem.

Let me know if you get to try it...

Ralph

On 2/6/07 4:05 PM, "Heywood, Todd" wrote:

Hi Ralph,

It looks that way. I created a user local to each node, with local authentication via /etc/passwd and /etc/shadow, and OpenMPI scales up just fine for that. I know this is an OpenMPI list, but does anyone know how common or uncommon LDAP-based clusters are? I would have thought this issue would have arisen elsewhere, but Googling MPI+LDAP (and similar) doesn't turn up much. I'd certainly be willing to test any patch.

Thanks,
Todd

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph H Castain
Sent: Tuesday, February 06, 2007 9:54 AM
To: Open MPI Users
Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)

It sounds to me like we are probably overwhelming your slapd - your test would seem to indicate that slowing down the slapd makes us fail even with smaller jobs, which tends to support that idea. We frankly haven't encountered that before, since our rsh tests have all been done using non-LDAP authentication (basically, we ask that you set up rsh to auto-authenticate on each node). It sounds like we need to add an ability to slow down so that the daemon doesn't "fail" due to authentication timeout and/or slapd rejection due to the queue being full.

This may take a little time to fix due to other priorities, and will almost certainly have to be released in a subsequent 1.2.x version. Meantime, I'll let you know when I get something to test - would you be willing to give it a shot if I provide a patch? I don't have access to an LDAP-based system.

Ralph

On 2/6/07 7:44 AM, "Heywood, Todd" wrote:

Hi Ralph,

Thanks for the reply. This is a tough one. It is OpenLDAP. I had thought that I might be hitting a file descriptor limit for slapd (the LDAP daemon), which ulimit -n does not affect (you have to rebuild LDAP with a different FD_SETSIZE variable). However, I simply turned on more expressive logging to /var/log/slapd, and that resulted in smaller jobs (which successfully ran before) hanging. Go figure. It appears that daemons are up and running (from ps), and everything hangs in MPI_Init. Ctrl-C gives:

[blade1:04524] ERROR: A daemon on node blade26 failed to start as expected.
[blade1:04524] ERROR: There may be more information available from
[blade1:04524] ERROR: the remote shell (see above).
[blade1:04524] ERROR: The daemon exited unexpectedly with status 255.

I'm interested in any suggestions, semi-fixes, etc. which might help get to the bottom of this. Right now: whether the daemons are indeed up and running, or if there are some that are not (causing MPI_Init to hang).

Thanks,
Todd

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph H Castain
Sent: Tuesday, February 06, 2007 8:52 AM
To: Open MPI Users
Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)

Well, I can't say for sure about LDAP. I did a quick search and found two things:

1. there are limits imposed in LDAP that may apply to your situation, and
2. that statement varies tremendously depending upon the specific LDAP implementation you are using.

I would suggest you see which LDAP you are using and contact the respective organization to ask if they do have such a limit, and if so, how to adjust it. It sounds like maybe we are hitting the LDAP server with too many requests too rapidly. Usually, the issue is not starting fast enough, so this is a new one! We don't currently check to see if everything started up okay, so that is why the processes might hang - we hope to fix that soon. I'll have to see if there is something we can do to help
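For reference, a hedged sketch of the kind of invocation being tested above - the process count, hostfile name, and executable are placeholders rather than values from the thread; only the -mca pls_rsh_num_concurrent setting comes from the discussion:

    mpirun -np 256 -hostfile myhostfile -mca pls_rsh_num_concurrent 50 ./myapp

With that parameter set, the launcher is expected to hold to 50 concurrent ssh calls at a time instead of starting all of them at once.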
Re: [OMPI users] large jobs hang on startup (deadlock?)
Hi Todd

I truly appreciate your patience. If the rate was the same with that switch set, then that would indicate to me that we aren't having trouble getting through the slapd - it probably isn't a problem with how hard we are driving it, but rather with the total number of connections being created. Basically, we need to establish one connection per node to launch the orteds (the app procs are just fork/exec'd by the orteds, so they shouldn't see the slapd).

The issue may have to do with limits on the total number of LDAP authentication connections allowed for one user. I believe that is settable, but I will have to look it up and/or ask a few friends who might know. I have not seen an LDAP-based cluster before (though authentication onto the head node of a cluster is frequently handled that way), but that doesn't mean someone hasn't done it.

Again, appreciate the patience.

Ralph
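One way to watch the connection count being discussed here - this assumes slapd is listening on the default LDAP port (389) and that the command is run on the LDAP server; it is an illustration, not a command taken from the thread:

    # count established TCP connections to slapd, refreshed every 2 seconds
    watch -n 2 'netstat -tan | grep ":389 " | grep -c ESTABLISHED'

Comparing how this number grows with and without pls_rsh_num_concurrent set is essentially the test Todd describes.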
Re: [OMPI users] large jobs hang on startup (deadlock?)
Hi Ralph,

Patience is not an issue since I have a workaround (a locally authenticated user), and other users are not running large enough MPI jobs to hit this problem.

I'm a bit confused now though. I thought that setting this switch would set off 50 ssh sessions at a time, or 50 connections to slapd. I.e., a second group of 50 connections wouldn't initiate until the first group "closed" their sessions, which should be reflected by a corresponding decrease in the number of established connections for slapd. So my conclusion was that no sessions are "closing".

There's also the observation that when slapd is slowed down by (extensive) logging, things hang with a smaller number of established connections (open ssh sessions). I don't see how this fits with a limitation on the total number of connections.

Thanks,
Todd
Re: [OMPI users] large jobs hang on startup (deadlock?)
On 2/7/07 12:07 PM, "Heywood, Todd" wrote:

> Hi Ralph, Patience is not an issue since I have a workaround (a locally authenticated user), and other users are not running large enough MPI jobs to hit this problem. I'm a bit confused now though. I thought that setting this switch would set off 50 ssh sessions at a time, or 50 connections to slapd. I.e., a second group of 50 connections wouldn't initiate until the first group "closed" their sessions, which should be reflected by a corresponding decrease in the number of established connections for slapd. So my conclusion was that no sessions are "closing".

The way the rsh launcher works is to fork/exec num_concurrent rsh/ssh sessions and watch as each one "closes" the connection back to the HNP. When that block has cleared, we then begin launching the next one. Note that we are talking here about closure of the stdin/stdout connections - i.e., the orteds "daemonize" themselves after launch, thus severing their stdin/stdout relationship back to the HNP.

It is possible that this mechanism isn't actually limiting the launch rate - e.g., the orteds may daemonize themselves so quickly that the block launch doesn't help. In a soon-to-come future version, we won't use that mechanism for determining when to launch the next block - my offer of a patch was to give you that new version now, modify it to more explicitly limit the launch rate, and see if that helped. I'll try to put that together in the next week or so.

Since you observed that the pls_rsh_num_concurrent option had no impact on the *rate* at which we launched, that would indicate that either the slapd connection isn't bottlenecking - the time to authenticate is showing as independent of the rate at which we are hitting the slapd - or we are not rate limiting as we had hoped. Hence my comment that it may not look like a rate issue.

As I said earlier, we have never tested this with LDAP. From what I understand of LDAP (which is limited, I admit), the ssh'd process (the orted in this case) forms an authentication connection back to the slapd. It may not be possible to sever this connection during the life of that process. There typically are limits on the number of simultaneous LDAP sessions a single user can have open - mainly for security reasons - so that could also be causing the problem. Given that you also observed that the total number of nodes we could launch upon was the same regardless of the rate, it could be that we are hitting the LDAP session limit. Logging may have a broader impact than just slapd response rate - I honestly don't know.

Hope that helps - I'll pass along that patch as soon as I can.

Ralph
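As a rough illustration of the block launch described above - this is a shell sketch of the idea only, not Open MPI's actual launcher code; the hostfile name, block size, and remote daemon command are placeholders, and the plain `wait` stands in for the real "stdin/stdout closed" check:

    #!/bin/sh
    # Launch one ssh call per host, in blocks of BLOCK, waiting for each
    # block's ssh calls to return before starting the next block.
    BLOCK=50
    DAEMON_CMD="/path/to/remote_daemon"   # placeholder for the real daemon command
    i=0
    while read host; do
        ssh "$host" "$DAEMON_CMD" &
        i=$((i+1))
        [ $((i % BLOCK)) -eq 0 ] && wait
    done < myhostfile
    wait

If the remote daemon detaches quickly (as the orteds do), each ssh call returns almost immediately and the batching imposes very little throttling - which is the failure mode suggested above.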
[OMPI users] Does Open MPI "Really" support AIX?
Hello All,

We are in the process of deciding whether we should use Open MPI in an AIX environment. Our in-house testing indicates that OMPI (V 1.1.x and V 1.2.x) stdio is broken under AIX. At this point, I am trying to find out if there is a fix or workaround for this problem. I have put up another posting (see attached). One recommendation was to try a pre-release of V 1.2, which didn't make any difference. I am hoping that an OMPI developer or someone from IBM comes up with a solution.

Open MPI documentation indicates that AIX is supported, with limited testing before each release. What is limited testing? Does it mean configure, install, and running "Hello World" on one node? In short, we did configure and install V 1.1.x as well as V 1.2.x, but attempts to run even a simple test such as "mpirun -np 1 hostname" fail; see attached for more details. I have eight IBM nodes on which I could run any test to help solve this problem.

Thanks for your comments,
Ali

--- From previous posting on the OMPI users' group ---

I have installed Open MPI 1.1.2 on an IBM AIX 5.3 cluster. It looks like terminal output is broken. There are a few entries in the Open MPI archive for this problem, with no suggested solution or real workaround. I am putting up this posting with the hope of getting some advice on a workaround or solution.

# mpirun -np 1 hostname

No output; piping the command to "cat" or "more" generates no output as well. The only way to get output from this command is to add --debug-daemons:

# mpirun -np 1 --debug-daemons hostname

Even this debug option does not work for a real application that generates a lot of output. Looking forward to any comments.

Thanks
[OMPI users] first time user - can run mpi job SMP but not over cluster
Dear Open-MPI list:

I'm trying to run two (soon to be three) dual-Opteron machines as a cluster (network of workstations - they each have a disk and OS). I can ssh between machines with no password. My Open MPI code compiled fine and works great as an SMP program (using both processors on one machine). However, I am not able to run my Open MPI program in parallel between the two computers.

For SMP work I use:

mpirun -np 2 myprogram inputfile >outputfile

For cluster work I have tried:

mpirun --hostfile myhostfile -np 4 myprogram inputfile >outputfile

which does not write to the output file. I have also tried:

mpirun --hostfile myhostfile -np 4 `myprogram inputfile >outputfile`

which just ran serially on the initial machine.

The Open MPI executable and libraries are on the head node, NFS-shared to the slave node. Both computers can run the Open MPI application as an SMP program with no problems. When I am trying to run the Open MPI program with both computers, I am using a directory that is an NFS share to the other computer. I am running OpenSUSE 10.2 on both machines. I compiled with gcc 4.1 / ifort 9.1. I am using a gigabit network. My hostfile specifies slots=2 max-slots=2 for each computer. The computers are identified in the hostfile using the /etc/hosts alias.

The only config.log that I found was in the directory I used to build Open MPI; since everything works as SMP, I am not including that file with this initial message.

What should I be trying to do next to remedy this issue? Any help would be appreciated.

Thanks,
Mark Kosmowski
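For reference, a hostfile of the kind described might look like the following - the hostnames are placeholders, and the slots/max-slots syntax simply mirrors what is reported above:

    node1 slots=2 max-slots=2
    node2 slots=2 max-slots=2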
[OMPI users] install script issue
Building openmpi-1.3a1r13525 on OS X 10.4.8 (PowerPC), using my standard compile line:

./configure F77=g95 FC=g95 LDFLAGS=-lSystemStubs --with-mpi-f90-size=large --with-f90-max-array-dim=3 ; make all

and after installing I found that I couldn't compile, because of the following:

-rw-------  1 root  wheel  640216 Feb  7 14:48 libmpi_f90.a

This has not happened in the past, and I followed the same procedures I've been using for many months. One slight difference is that I installed using the command "make install all" rather than "make install"; also, I had uninstalled the previous version prior to installing this version.

Michael
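If the only problem is the mode of the installed file, a hedged workaround (the install path below is a placeholder - use wherever libmpi_f90.a actually landed) would be:

    sudo chmod a+r /usr/local/lib/libmpi_f90.a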
Re: [OMPI users] first time user - can run mpi job SMP but not over cluster
Hello,

> mpirun -np 2 myprogram inputfile >outputfile

There can be a whole host of issues with the way you run your executable and/or the way you have the environment set up. First of all, when you ssh into the node, does the environment automatically get updated with the correct Open MPI paths? I.e., LD_LIBRARY_PATH should be correctly set to the OMPI lib directory, PATH should contain OMPI's bin dir, etc. If this is not the case, you have two options:

a. create small /etc/profile.d scripts to set up those env. variables (see the sketch at the end of this message)
b. use the --prefix option when you invoke mpirun on the headnode

Generally, it would be much more helpful if you provided the actual output of running the commands you listed here.

> mpirun --hostfile myhostfile -np 4 myprogram inputfile >outputfile

Another issue I can think of is the path specification to 'myprogram'. Do you just cd into the directory where it resides and specify its name only? Try to either specify an absolute path to the executable or a path relative to your home dir: ~/appdir/bin/appexec, assuming this location is the same on all the nodes. If mpirun can't find your executable on one of the nodes, it should report that as an error.

> which does not write to the output file.

Does it write anything to stderr? You could also try invoking mpirun with '--mca pls_rsh_agent ssh'.

> mpirun --hostfile myhostfile -np 4 `myprogram inputfile >outputfile`

Are those backquotes?? I would recommend getting mpirun to invoke something basic on all the participating nodes successfully first; try

mpirun --prefix /path/to/ompi/ --hostfile myhostfile --np 4 hostname

for instance. Nothing else will work until this does.

These are just a few pointers to get you started. Hope this helps.

Alex.
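A minimal sketch of the /etc/profile.d approach mentioned in (a) above - the install prefix /opt/openmpi is an assumption; substitute the actual Open MPI installation directory:

    # /etc/profile.d/openmpi.sh  (prefix is an assumption; adjust to your install)
    export PATH=/opt/openmpi/bin:$PATH
    export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH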