> 10.4.72.110:1
> [c0301b10e1:22830] [[0,],1]-[[0,0],0]
> mca_oob_tcp_peer_complete_connect: connection failed: Connection
> refused (111) - retrying
>
>
>
>
> Thanks!
> p.
>
>
> On Mon, Aug 23, 2010 at 3:24 PM, Ralph Castain wrote:
>> Can you send m
Yes, that's fine. Thx!
On Aug 24, 2010, at 9:02 AM, Philippe wrote:
> awesome, I'll give it a spin! with the parameters as below?
>
> p.
>
> On Tue, Aug 24, 2010 at 10:47 AM, Ralph Castain wrote:
>> I think I have this working now - try anything on or after r2
-gcj-1.4.2.0/jre --with-cpu=generic
> --host=x86_64-redhat-linux
> Thread model: posix
> gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
>
>
> but it failed. I am attaching the configure and make logs.
>
> regards
>
> Michael
>
>
> On 08/23/10 20:53, Ralph Cas
It's not a bug - that is normal behavior. The processes are polling hard to
establish the connections as quickly as possible.
On Sep 1, 2010, at 7:24 PM, lyb wrote:
> Hi, All,
>
> I tested two sample applications on Windows 2003 Server; one uses
> MPI_Comm_accept and the other uses MPI_Comm_connect
In the upcoming 1.5 series, we will introduce a new "sensor" framework to help
resolve such issues. Among other things, it will automatically track (if
requested) the size of a sentinel file, cpu usage, and memory footprint and
will terminate the job if any exceed user-specified limits (e.g., fi
On Sep 3, 2010, at 5:10 PM, David Singleton wrote:
> On 09/03/2010 10:05 PM, Jeff Squyres wrote:
>> On Sep 3, 2010, at 12:16 AM, Ralph Castain wrote:
>>
>>> Backing off the polling rate requires more application-specific logic like
>>> that offered below, so i
As people have said, these time values are to be expected. All they reflect is
the time difference spent in reduce waiting for the slowest process to catch up
to everyone else. The barrier removes that factor by forcing all processes to
start from the same place.
No mystery here - just a reflec
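(Not from the original thread - just a minimal sketch of the measurement being
described, assuming a C MPI program: put a barrier immediately in front of the
reduce so every rank starts the timed region together.)

  /* Sketch: time MPI_Reduce with a barrier immediately in front of it.
   * Without the barrier, the measured time also includes the skew between
   * the fastest and slowest ranks arriving at the reduce. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank;
      double local = 1.0, global = 0.0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Barrier(MPI_COMM_WORLD);            /* align all ranks first */
      double t0 = MPI_Wtime();
      MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
      double dt = MPI_Wtime() - t0;

      printf("rank %d: reduce took %g s after the barrier\n", rank, dt);
      MPI_Finalize();
      return 0;
  }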
On Sep 9, 2010, at 1:46 AM, Ashley Pittman wrote:
>
> On 9 Sep 2010, at 08:31, Terry Frankcombe wrote:
>
>> On Thu, 2010-09-09 at 01:24 -0600, Ralph Castain wrote:
>>> As people have said, these time values are to be expected. All they
>>> reflect is the time d
How did you configure OMPI?
On Sep 11, 2010, at 1:35 AM, Srikanth Raju wrote:
> Hello OMPI Users,
> I'm using OpenMPI 1.4.1 with gcc 4.4.3 on my x86_64 linux system running the
> latest Ubuntu 10.04 distro. I don't seem to be able to run any OpenMPI
> application. I try running the simplest app
Printouts of less than 100 bytes would be unusual...but possible
On Wed, Sep 22, 2010 at 8:15 AM, Jeff Squyres wrote:
> Are you running on machines with OpenFabrics devices (that Open MPI is
> using)?
>
> Is ompi-ps printing 100 bytes or more?
>
> What does ps show when your program is hung?
>
>
18053 poll_s pts/0 00:00:00 test.run
> 0 S 1000 1844 1840 0 80 0 - 18053 poll_s pts/0 00:00:00 test.run
>
> pipe_s = wait state on read/write against a pipe.
>
> So, with that command I concluded that one mpi process is waiting for the
> read of a pipe.
&
In a word, no. If a node crashes, OMPI will abort the currently-running job
if it had processes on that node. There is no current ability to "ride-thru"
such an event.
That said, there is work being done to support "ride-thru". Most of that is
in the current developer's code trunk, and more is com
You mean via an API of some kind? Not through an MPI call, but you can do it
(though your code will wind up OMPI-specific). Look at the OMPI source code
in opal/mca/paffinity/paffinity.h and you'll see the necessary calls as well
as some macros to help parse the results.
Depending upon what versio
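(As an aside, and not OMPI's paffinity API: on Linux you can at least read the
binding of the current process with plain sched_getaffinity. A minimal,
purely illustrative sketch:)

  /* Sketch: read the calling process's CPU affinity mask on Linux using the
   * glibc sched_getaffinity() call (this is not OMPI's paffinity framework). */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  int main(void)
  {
      cpu_set_t mask;
      CPU_ZERO(&mask);
      if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
          perror("sched_getaffinity");
          return 1;
      }
      for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
          if (CPU_ISSET(cpu, &mask))
              printf("bound to cpu %d\n", cpu);
      }
      return 0;
  }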
Hi Daniel
I had actually volunteered to do this once Apple provided me with the required
Mac OSX Server license, but I honestly haven't had time to do so. We would
welcome any patches you can provide!
The relevant code is located in orte/mca/plm/xgrid/src. I believe it currently
compiles, but
It looks to me like your remote nodes aren't finding the orted executable. I
suspect the problem is that you need to forward the PATH and LD_LIBRARY_PATH
to the remote nodes. Use the mpirun -x option to do so.
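(For example - a hedged sketch, with the hostfile and application names as
placeholders:)

  mpirun -x PATH -x LD_LIBRARY_PATH --hostfile myhosts -np 4 ./my_app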
On Oct 4, 2010, at 5:08 AM, Chris Jewell wrote:
> Hi all,
>
> Firstly, hello to the
I'm not sure why the group communicator would make a difference - the code area
in question knows nothing about the mpi aspects of the job. It looks like you
are hitting a race condition that causes a particular internal recv to not
exist when we subsequently try to cancel it, which generates th
On Oct 4, 2010, at 10:36 AM, Milan Hodoscek wrote:
>>>>>> "Ralph" == Ralph Castain writes:
>
>Ralph> I'm not sure why the group communicator would make a
>Ralph> difference - the code area in question knows nothing about
>Ralph
Some of what you are seeing is the natural result of context switching. Some
thoughts regarding the results:
1. You didn't bind your procs to cores when running with #procs < #cores, so
your performance in those scenarios will also be less than max.
2. Once the number of procs exceeds the
yperthreads
at this time.
>
> BTW, how to bind the proc to the core? I tried --bind-to-core or
> -bind-to-core but neither works. Is it for OpenMP, not for OpenMPI?
Those should work. You might try --report-bindings to see what OMPI thought it
did.
>
> Linbao
>
>
>
There is an MCA param that tells the orted to set its usage limits to the hard
limit:
MCA opal: parameter "opal_set_max_sys_limits" (current value:
<0>, data source: default value)
Set to non-zero to automatically set any
system-imposed limits to the ma
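(For example, something along these lines - the parameter name is the one
reported by ompi_info above, and the application name is a placeholder:)

  mpirun -mca opal_set_max_sys_limits 1 -np 8 ./my_app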
Hetero operations tend to lose a little performance due to the need to
convert data, but otherwise there is no real negative. We don't do it by
default solely because the majority of installations don't need to, and
there is no reason to lose even a little performance if it isn't necessary.
If you
The short answer is "yes". It should work.
On Thu, Oct 7, 2010 at 1:53 PM, Durga Choudhury wrote:
> I'd like to add to this question the following:
>
> If I compile with --enable-heterogenous flag for different
> *architectures* (I have a mix of old 32 bit x86, newer x86_64 and some
> Cell BE b
On Oct 7, 2010, at 2:55 AM, Reuti wrote:
> Am 07.10.2010 um 01:55 schrieb David Turner:
>
>> Hi,
>>
>> We would like to set process memory limits (vmemoryuse, in csh
>> terms) on remote processes. Our batch system is torque/moab.
>
> Isn't it possible to set this up in torque/moab directly? I
On Oct 6, 2010, at 11:25 PM, David Turner wrote:
> Hi Ralph,
>
>> There is an MCA param that tells the orted to set its usage limits to the
>> hard limit:
>>
>> MCA opal: parameter "opal_set_max_sys_limits" (current
>> value:<0>, data source: default value)
>>
Ah - I was unfamiliar with that option. Thanks!
David: does that meet the need?
On Oct 8, 2010, at 2:45 AM, Reuti wrote:
> Am 08.10.2010 um 00:40 schrieb Ralph Castain:
>
>>
>> On Oct 7, 2010, at 2:55 AM, Reuti wrote:
>>
>>> Am 07.10.2010 um 01:55 s
No problem - I got to learn something too!
On Oct 11, 2010, at 11:19 PM, David Turner wrote:
> Hi,
>
> Various people contributed:
>
> Isn't it possible to set this up in torque/moab directly? In SGE I would
> simply define h_vmem and it's per slot then; and with a tight integration
There is no OMPI 2.5 - do you mean 1.5?
On Oct 17, 2010, at 4:11 PM, Brian Budge wrote:
> Hi Jody -
>
> I noticed this exact same thing the other day when I used OpenMPI v
> 2.5 built with valgrind support. I actually ran out of memory due to
> this. When I went back to v 2.43, my program work
in case you
call again. This helps performance by reducing the number of malloc's in your
application.
>
> Jody
>
> On Mon, Oct 18, 2010 at 1:57 AM, Ralph Castain wrote:
>> There is no OMPI 2.5 - do you mean 1.5?
>>
>> On Oct 17, 2010, at 4:11 PM, Brian Bu
; send data to each other and to the master.
>
>
> On Mon, Oct 18, 2010 at 2:48 PM, Ralph Castain wrote:
>>
>> On Oct 18, 2010, at 1:41 AM, jody wrote:
>>
>>> I had this leak with OpenMPI 1.4.2
>>>
>>> But in my case, there is no accumulati
w that the new release 1.5 is out.
> I didn't find this fix in the "list of changes" - is it present but not
> mentioned, since it is a minor fix?
>
> Thank you,
> Federico
>
>
>
> 2010/4/1 Ralph Castain
> Hi there!
>
> It will be in the 1.5.0 r
Just to be clear: it isn't mpiexec that is failing. It is your MPI application
processes that are failing.
On Oct 20, 2010, at 7:38 AM, Siegmar Gross wrote:
> Hi,
>
> I have built Open MPI 1.5 on Linux x86_64 with the Oracle/Sun Studio C
> compiler. Unfortunately "mpiexec" breaks when I run a
The error message seems to imply that mpirun itself didn't segfault, but that
something else did. Is that segfault pid from mpirun?
This kind of problem usually is caused by mismatched builds - i.e., you compile
against your new build, but you pick up the Myrinet build when you try to run
becau
MPI won't do this - if a node dies, the entire MPI job is terminated.
Take a look at OpenRCM, a subproject of Open MPI:
http://www.open-mpi.org/projects/orcm/
This is designed to do what you describe as we have a similar (open source)
project underway at Cisco. If I were writing your system, I
RHE x86_64
> machinefile example :
> or985966@is209898 slots=1
> realtime@is206022 slots=8
> realtime@is206025 slots=8
>
> Best regards,
>
> Olivier
>
> -- Forwarded message --
> From: Ralph Castain
> Date: 2010/3/11
> Subject: Re: [OMPI us
Do ./configure --help and you'll see options for specifying the host and build
target. You need to do that when cross-compiling.
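(For illustration only - the triplets and compiler name below are placeholders
for whatever your cross toolchain actually is:)

  ./configure --build=x86_64-redhat-linux --host=arm-linux-gnueabi \
      CC=arm-linux-gnueabi-gcc --prefix=/opt/openmpi-cross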
On Oct 25, 2010, at 12:01 PM, saahil...@gmail.com wrote:
> -- Forwarded message --
> From: saahil...@gmail.com
> Date: Oct 25, 2010 11:26pm
> Subject:
> Saahil
>
> On Oct 25, 2010 11:35pm, Ralph Castain wrote:
> > Do ./configure --help and you'll see options for specifying the host and
> > build target. You need to do that when cross-compiling.
> >
> >
> >
> >
> >
> > On Oct
e. Did I fail to understand what you said? Am I doing something
> > wrong here?
> >
> > Regards,
> > Saahil
> >
> > On Oct 25, 2010 11:35pm, Ralph Castain r...@open-mpi.org> wrote:
> > > Do ./configure --help and you'll see options for specifying the host a
Specify your hostfile as the default one:
mpirun --default-hostfile ./Cluster.hosts
Otherwise, we take the default hostfile and then apply the hostfile as a filter
to select hosts from within it. Sounds strange, I suppose, but the idea is that
the default hostfile can contain configuration info
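(A minimal illustration - hostnames and slot counts are placeholders:)

  # Cluster.hosts
  node01 slots=4
  node02 slots=4

  mpirun --default-hostfile ./Cluster.hosts -np 8 ./my_app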
Couple of things stand out:
1. you definitely don't want to use a copy of the trunk beyond r23924. The
developer's trunk is undergoing some major change and orcm no longer is in-sync
with it. I probably won't update orcm to match until later this year (will
freeze integration at r23924).
2. th
There are two connections to be specified:
-mca oob_tcp_if_include xxx
-mca btl_tcp_if_include xxx
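(For example, restricting both to a single interface - the interface name is a
placeholder:)

  mpirun -mca oob_tcp_if_include eth0 -mca btl_tcp_if_include eth0 -np 4 ./my_app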
On Nov 11, 2010, at 12:04 PM, Krzysztof Zarzycki wrote:
> Hi,
> I'm working with Grzegorz on the mentioned problem.
> If I'm correct on checking the firewall settings, "iptables --list" shows an
I am not a grid engine expert by any means, but I do know a bit about OMPI's
internals for binding processes.
Here is what we do:
1. mpirun gets its list of hosts from the environment, or from your machine
file. It then maps the processes across the machines.
2. mpirun launches a daemon on eac
I confess I am now confused. What version of OMPI are you using?
FWIW: OMPI was updated at some point to detect the actual cores of an
external binding, and abide by them. If we aren't doing that, then we have a
bug that needs to be resolved. Or it could be you are using a version that
predates th
The external binding code should be in that version.
If you add --report-bindings --leave-session-attached to the mpirun command
line, you should see output from each daemon telling you what external
binding it detected, and how it is binding each app it launches.
Thanks!
On Mon, Nov 15, 2010 a
Perhaps I'm missing it, but it seems to me that the real problem lies in the
interaction between SGE and OMPI during OMPI's two-phase launch. The verbose
output shows that SGE dutifully allocated the requested number of cores on
each node. However, OMPI launches only one process on each node (the O
Hi Reuti
> > 2. have SGE bind procs it launches to -all- of those cores. I believe SGE
> does this automatically to constrain the procs to running on only those
> cores.
>
> This is another "bug/feature" in SGE: it's a matter of discussion, whether
> the shepherd should get exactly one core (in c
On Tue, Nov 16, 2010 at 12:23 PM, Terry Dontje wrote:
> On 11/16/2010 01:31 PM, Reuti wrote:
>
> Hi Ralph,
>
> Am 16.11.2010 um 15:40 schrieb Ralph Castain:
>
>
> 2. have SGE bind procs it launches to -all- of those cores. I believe SGE
> does this automaticall
Cris' output is coming solely from the HNP, which is correct given the way
things were executed. My comment was from another email where he did what I
asked, which was to include the flags:
--report-bindings --leave-session-attached
so we could see the output from each orted. In that email, it wa
wrote:
> On 11/17/2010 09:32 AM, Ralph Castain wrote:
>
> Cris' output is coming solely from the HNP, which is correct given the way
> things were executed. My comment was from another email where he did what I
> asked, which was to include the flags:
>
> --repor
attached and provide -all- output we could verify that
analysis and clear up the confusion?
On Wed, Nov 17, 2010 at 8:13 AM, Terry Dontje wrote:
> On 11/17/2010 10:00 AM, Ralph Castain wrote:
>
> --leave-session-attached is always required if you want to see output from
> the daemons. O
On 11/17/2010 10:48 AM, Ralph Castain wrote:
>
> No problem at all. I confess that I am lost in all the sometimes disjointed
> emails in this thread. Frankly, now that I search, I can't find it either!
> :-(
>
> I see one email that clearly shows the external binding report f
Which executable is it not finding? mpirun? Your application?
On Wed, Nov 17, 2010 at 7:49 PM, Tushar Andriyas wrote:
> Hi there,
>
> I am new to using mpi commands and was stuck in problem with running a
> code. When I submit my job through a batch file, the job exits with the
> message that th
Is you "hello world" test program in the same directory as SWMF? Is it
possible that the path you are specifying is not available on all of the
remote machines? That's the most common problem we see.
On Thu, Nov 18, 2010 at 7:59 AM, Tushar Andriyas wrote:
> Hi there,
>
> Thanks for the expedite
rote:
> no its not in the same directory as SWMF. I guess the path is the same
> since all the machines in a cluster are configured the same way. How do I know
> if this is not the case?
>
>
> On Thu, Nov 18, 2010 at 8:25 AM, Ralph Castain wrote:
>
>> Is you "hello wo
What OMPI version are you talking about?
We already trap SIGPIPE, but ignore it at the request of others (not sure what
version that was started). I believe a flag may exist to alter that behavior -
could easily be added if not.
On Nov 24, 2010, at 5:08 PM, Jesse Ziser wrote:
> Hello,
>
> I'
After digging around a little, I found that you must be using the OMPI devel
trunk as no release version contains this code. I also looked to see why it was
done, and found that the concern was with an inadvertent sigpipe that can occur
internal to OMPI due to a race condition.
So I modified th
Are you using the Intel compiler? The build system is looking for an "icc"
command and not finding it. Perhaps something in your environment is defining
CC to be "icc"?
On Nov 29, 2010, at 10:43 AM, Maurício Rodrigues wrote:
> Hi, I need to install openmpi 1.4.2 on Ubuntu 4.10 64bit, and giving
lema.
> I would like some help if possible.
> Thanks in advance
>
> 2010/11/29 Ralph Castain
>
> Are you using the Intel compiler? The build system is looking for an "icc"
> command and not finding it. Perhaps something in your environment is defining
> CC to b
It truly does help to know what version of OMPI you are using - otherwise,
there is little we can do to help
On Nov 30, 2010, at 4:05 AM, Hicham Mouline wrote:
> Hello,
>
> I have successfully run
>
> mpirun -np 3 .\test.exe
>
> when I try MPMD
>
> mpirun -np 3 .\test.exe : -np 3 .\test
t; had nothing to do with OpenMPI. So thanks for the help; all is well. (And
>> sorry for the belated reply.)
>> Ralph Castain wrote:
>>> After digging around a little, I found that you must be using the OMPI
>>> devel trunk as no release version contains this code.
Guess I'm not entirely sure I understand how this is supposed to work. All the
-x does is tell us to pick up an envar of the given name and forward its value
to the remote apps. You can't set the envar's value on the cmd line. So you
told mpirun to pick up the value of an envar called "DISPLAY=:0.
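(So the usual pattern is to set the value in the shell that runs mpirun and
just name the variable with -x - a hedged example, with placeholder values:)

  export DISPLAY=:0.0
  mpirun -x DISPLAY -np 2 ./my_app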
ocesses. So I believe it to be necessary.
>
> But I'm thinking I may have to configure some kind of X11 forwarding. I'm
> not sure...
>
> Thanks for your reply! Any more ideas?
> Brad
>
>
> On Mon, Dec 6, 2010 at 6:31 PM, Ralph Castain wrote:
> Guess
In your app, just do a getenv and print the display envar.
That would help tell us if there is an OMPI problem, or just a problem in how
you set up X11.
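(For instance, a trivial check along these lines - assuming a C application:)

  /* Sketch: have each rank report the DISPLAY value it actually sees. */
  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      int rank;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      const char *disp = getenv("DISPLAY");
      printf("rank %d: DISPLAY=%s\n", rank, disp ? disp : "(unset)");
      MPI_Finalize();
      return 0;
  }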
On Dec 6, 2010, at 9:18 PM, Ralph Castain wrote:
> Hmmm...yes, the code does seem to handle that '=' being in there. Forgot it
> was th
You might want to ask the boost people - we wouldn't have any idea what asio is
or does.
On Dec 7, 2010, at 6:06 AM, Hannes Brandstätter-Müller wrote:
> Hello!
>
> I am using OpenMPI in combination with the boost libraries (especially
> boost::asio) and came across a weird interaction. When I
d using X for
> most things many years ago, so my xhost/xauth information is probably a
> little dated. Google around for the most recent / best ways to do this stuff.
>
>
> On Dec 6, 2010, at 10:11 PM, Ralph Castain wrote:
>
> > BTW: you might check to see if the DISPLAY e
That could mean you didn't recompile the code using the new version of OMPI.
The 1.4 and 1.5 series are not binary compatible - you have to recompile your
code.
If you did recompile, you may be getting version confusion on the backend nodes
- you should check your ld_library_path and ensure it
I know we have said this many times - OMPI made a design decision to poll hard
while waiting for messages to arrive to minimize latency.
If you want to decrease cpu usage, you can use the yield_when_idle option (it
will cost you some latency, though) - see ompi_info --param ompi all
Or don't se
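(For example - the parameter name here is the one used in the 1.4-era releases;
check ompi_info on your installation to confirm:)

  mpirun -mca mpi_yield_when_idle 1 -np 4 ./my_app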
>> :display and it'll just go in an unencrypted fashion. This is
>> normal X forwarding stuff -- you can probably google around for more info on
>> this.
>>
>> NOTE: IIRC, xauth is better than xhost these days. I stopped using X for
>> most things m
Sorry for delay - am occupied with my day job.
Yes, that is correct to an extent. When you yield the processor, all that
happens is that you surrender the rest of your scheduled time slice back to the
OS. The OS then cycles thru its scheduler and sequentially assigns the
processor to the line o
The answer is yes - sort of...
In OpenMPI, every process has information about not only its own local rank,
but the local rank of all its peers regardless of what node they are on. We use
that info internally for a variety of things.
Now the "sort of". That info isn't exposed via an MPI API at
rocesses and have them use that as the color to a MPI_Comm_split call.
> Once you've done that you can do a MPI_Comm_size to find how many are on the
> node and be able to send to all the other processes on that node using the
> new communicator.
>
> Good luck,
>
> -
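(A minimal sketch of the approach described above, splitting MPI_COMM_WORLD by
processor name; it assumes MPI_Get_processor_name returns something unique per
node, which is typical but not guaranteed:)

  /* Sketch: derive a per-node communicator by splitting MPI_COMM_WORLD on the
   * processor name, then treat the new communicator's rank/size as the local
   * rank and local size. */
  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int main(int argc, char **argv)
  {
      int rank, size, len;
      char name[MPI_MAX_PROCESSOR_NAME];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      memset(name, 0, sizeof(name));
      MPI_Get_processor_name(name, &len);

      /* Gather everyone's name so each rank can pick a collision-free color:
       * the lowest global rank that reports the same processor name. */
      char *all = malloc((size_t)size * MPI_MAX_PROCESSOR_NAME);
      MPI_Allgather(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
                    all, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, MPI_COMM_WORLD);
      int color = rank;
      for (int i = 0; i < size; i++) {
          if (strcmp(all + i * MPI_MAX_PROCESSOR_NAME, name) == 0) {
              color = i;
              break;
          }
      }

      MPI_Comm node_comm;
      MPI_Comm_split(MPI_COMM_WORLD, color, rank, &node_comm);

      int local_rank, local_size;
      MPI_Comm_rank(node_comm, &local_rank);
      MPI_Comm_size(node_comm, &local_size);
      printf("world rank %d is local rank %d of %d on %s\n",
             rank, local_rank, local_size, name);

      MPI_Comm_free(&node_comm);
      free(all);
      MPI_Finalize();
      return 0;
  }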
There are no race conditions in this data. It is determined by mpirun prior to
launch, so all procs receive the data during MPI_Init and it remains static
throughout the life of the job. It isn't dynamically updated at this time (will
change in later versions), so it won't tell you if a process
Sorry - guess I had misunderstood. Yes, if all you want is the local rank of
your own process, then this will work.
My suggestion was if you wanted the list of local procs, or to know the local
rank of your peers.
On Dec 10, 2010, at 1:24 PM, David Mathog wrote:
> Ashley Pittman wrote:
>
>>
Terry is correct - not guaranteed, but that is the typical behavior.
However, you -can- guarantee that rank=0 will be on a particular host. Just run
your job:
mpirun -n 1 -host <hostname> my_app : -n (N-1) my_app
This guarantees that rank=0 is on host <hostname>. All other ranks will be
distributed according to t
mpirun is not an MPI process, and so it doesn't obey the btl port params. To
control mpirun's ports (and those used by the ORTE daemons), use the
oob_tcp_port... params
On Dec 10, 2010, at 3:29 PM, Tang, Hsiu-Khuern wrote:
>
> Hi,
>
> I am trying to understand how to control the range of por
> LISTEN 9714/mpirun
> tcp0 0 :::58600:::*
> LISTEN 9714/mpirun
> ...
>
> --
> Best,
> Hsiu-Khuern.
>
>
> * On Fri 03:49PM -0700, 10 Dec 2010, Ralph Castain (r...@open-mpi.org) wrote:
What version of OMPI are you using? That error message looks like something
from an ancient version - might be worth updating.
On Dec 13, 2010, at 4:04 AM, peifan wrote:
> i have 3 nodes, one is master node and another is computing nodes,these nodes
> deployed in the internet (not in cluster)
>
around to find those discussions.
>
>
> On Dec 9, 2010, at 4:07 PM, Ralph Castain wrote:
>
>> Sorry for delay - am occupied with my day job.
>>
>> Yes, that is correct to an extent. When you yield the processor, all that
>> happens is that you surrender the rest
l for some time before deciding to yield.
On Dec 13, 2010, at 7:52 AM, Jeff Squyres wrote:
> See the discussion on kerneltrap:
>
>http://kerneltrap.org/Linux/CFS_and_sched_yield
>
> Looks like the change came in somewhere around 2.6.23 or so...?
>
>
>
> On De
OMPI does use those methods, but they can't be used for something like shared
memory. So if you want the performance benefit of shared memory, then we have
to poll.
On Dec 13, 2010, at 9:00 AM, Hicham Mouline wrote:
> I don't understand 1 thing though and would appreciate your comments.
>
>
Not sure I fully understand the question. If you provide the --ompi-server
option to mpirun, that info will be passed along to all processes,
including those launched via comm_spawn, so they can subsequently connect to
the server.
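(Roughly like this - check ompi-server's help output for the exact spelling of
its URI-reporting option; the URI file name is a placeholder:)

  ompi-server -r ompi_server.uri
  mpirun --ompi-server file:ompi_server.uri -np 2 ./client_app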
On Dec 14, 2010, at 6:50 AM, Suraj Prabhakaran wrote:
> Hello
That's a big cluster to be starting with rsh! :-)
When you say it won't start, do you mean that it hangs? Or does it fail with
some error message? How many nodes are involved (this is the important number,
not the number of cores)?
Also, what version are you using?
On Dec 14, 2010, at 9:10 AM
>
> I wonder : is this plm_rsh_num_concurrent parameter standing ONLY for rsh use,
> or for ssh OR rsh, depending on plm_rsh_agent, please ?
>
> Thanks, Best, G.
>
>
> On 14/12/2010 18:30, Ralph Castain wrote:
>> That's a big cluster to be starting wit
It would appear that there is something trying to talk to a socket opened by
one of your daemons. At a guess, I would bet the problem is that a prior job
left a daemon alive that is talking on the same socket.
Are you by chance using static ports for the job? Did you run another job just
before
On Dec 15, 2010, at 10:14 AM, Gilbert Grosdidier wrote:
> Hello Ralph,
>
> Thanks for taking time to help me.
>
> On Dec 15, 2010, at 16:27, Ralph Castain wrote:
>
>> It would appear that there is something trying to talk to a socket opened by
>> one of you
On Dec 15, 2010, at 12:30 PM, Gilbert Grosdidier wrote:
> Good evening Ralph,
>
>> On 15/12/2010 18:45, Ralph Castain wrote:
>> It looks like all the messages are flowing within a single job (all three
>> processes mentioned in the error have the same identifier). Only possib
t zombies in their wake. Can you clean those up?
>
> Thanks, Best, G.
>
>
>
> On 15/12/2010 21:03, Ralph Castain wrote:
>> On Dec 15, 2010, at 12:30 PM, Gilbert Grosdidier wrote:
>>
>>> Good evening Ralph,
>>>
>>> Le 15/12/2010 18:
I have no idea what you mean by "cell sizes per core". Certainly not any
terminology within OMPI...
On Dec 15, 2010, at 3:47 PM, Vaz, Guilherme wrote:
>
> Dear all,
>
> I have a problem with openmpi1.3, ifort+mkl v11.1 in Ubuntu10.04 systems (32
> or 64bit). My code worked in Ubuntu8.04 and
I'm not sure there is any documentation yet - not much clamor for it. :-/
It would really help if you included the error message. Otherwise, all I can do
is guess, which wastes both of our time :-(
My best guess is that the port reservation didn't get passed down to the MPI
procs properly - but
All --debug-daemons really does is keep the ssh session open after launching
the remote daemon and turn on some output. Otherwise, we close that session as
most systems only allow a limited number of concurrent ssh sessions to be open.
I suspect you have a system setting that kills any running j
e that orted daemon on the second node is called in a different
> way.
> Moreover, when I launch without --debug-daemons, a process called orted..
> remains active on the second node after I kill (ctrl+c) the command on the
> first node.
>
> Can you continue to help me ?
>
> have to Ctrl-C and terminate
>>
>> I have mpiports defined in my slurm config and running srun with
>> -resv-ports does show the SLURM_RESV_PORTS environment variable
>> getting passed to the shell
>>
>>
>> On Thu, Dec 23, 2010 at 8:09 PM, Ralph Castain wr
transport key from ORTE
> (orte_precondition_transports not present in the environment)
> PML add procs failed
> --> Returned "Error" (-1) instead of "Success" (0)
>
> Turn off PSM and srun works fine
>
>
> On Thu, Dec 30, 2010 at 5:13 PM, Ralph Cas
inary data
On Dec 30, 2010, at 11:18 AM, Michael Di Domenico wrote:
> Sure, i'll give it a go
>
> On Thu, Dec 30, 2010 at 5:53 PM, Ralph Castain wrote:
>> Ah, yes - that is going to be a problem. The PSM key gets generated by
>> mpirun as it is shared info - i.e., every
Should have also warned you: you'll need to configure OMPI --with-devel-headers
to get this program to build/run.
On Dec 30, 2010, at 1:54 PM, Ralph Castain wrote:
> Well, I couldn't do it as a patch - proved too complicated as the psm system
> looks for the value early in th
at 2:11 PM, Michael Di Domenico wrote:
> How early does this need to run? Can I run it as part of a task
> prolog, or does it need to be the shell env for each rank? And does
> it need to run on one node or all the nodes in the job?
>
> On Thu, Dec 30, 2010 at 8:54 PM, Ralph Cas
Correct
On Jan 4, 2011, at 3:33 AM, Hicham Mouline wrote:
> From what I understand, unix variants can talk to each other (linux to macosx
> sunos ...) but windows cannot talk to non windows (not yet? :-)
>
> regards,
>
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
I'm afraid I don't understand your example - are you saying you provide "-np 1"
and get two processes instead of 1?
If so, would you please provide info on the type of system where this happens?
I've never seen it with mpich or ompi
On Jan 5, 2011, at 4:57 PM, Kristian Medri wrote:
> Any hint
Afraid not - though you could alias your program name to be "nice --10 prog"
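(One hedged workaround: launch a tiny wrapper script instead of the program
itself. A positive niceness is shown here, since negative values normally
require root; the script and application names are placeholders:)

  #!/bin/sh
  # nice_wrapper.sh - run the real binary at reduced priority
  exec nice -n 10 ./my_app "$@"

and then launch it with, e.g., mpirun -np 8 ./nice_wrapper.sh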
On Jan 6, 2011, at 3:39 PM, David Mathog wrote:
> Is it possible using mpirun to specify the nice value for each program
> run on the worker nodes? It looks like some MPI implementations allow
> this, but "mpirun --hel
w had Open MPI on it.
>
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Ralph Castain
> Sent: January 5, 2011 8:09 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Duplicate independent processes
>
> I
On Jan 12, 2011, at 12:54 PM, Tena Sakai wrote:
> Hi Siegmar,
>
> Many thanks for your reply.
>
> I have tried man pages you mention, but one hurdle I am running into
> is orte_hosts page. I don't find the specification of fields for
> the file. I see an example:
>
> dummy1 slots=4
> dum