On Apr 15, 2011, at 2:59 AM, Reuti wrote:
> Hi,
>
> On 15.04.2011, at 07:25, Asad Ali wrote:
>
>>
>> Yes. The entire job gets restarted.
>
> maybe this is caused by a signal sent to the job by Condor, so that it gets
> terminated and as a result Condor restarts it. Can you trap the signals
I'm no SGE expert, but I do note that your original error indicates that mpirun
was unable to find a launcher for your environment. When running under SGE,
mpirun looks for certain environment variables indicative of SGE. If it finds
those, it then looks for the "qrsh" command. If it doesn't f
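If it helps with diagnosis, here is a quick check to run from inside an SGE job on the execution host (a sketch, not an official procedure; the grep pattern just lists the usual SGE job variables):
  env | grep -E 'SGE_ROOT|JOB_ID|PE_HOSTFILE'   # typical SGE job variables
  which qrsh                                    # the SGE launcher mpirun then looks for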
Just a suggestion: have you looked at it in a debugger? The error isn't coming
from OMPI - looks like a segfault caused by an error in the program or how it
is being run.
On Apr 19, 2011, at 7:19 AM, hi wrote:
> On WINDOWS platform, I am observing following error when executing
> "mpirun blacs
s: Visual Studio 2008 32bit and Intel ifort 32bit
> OpenMPI: OpenMPI-1.5.3 pre-built libraries and also with
> OpenMPI-1.5.2. locally built libraries
> BLACS: pre-built libraries taken from
> http://icl.cs.utk.edu/lapack-for-windows/scalapack/index.html#librairies
>
> Tha
You have to tell mpiexec what nodes you want to use for your application. This
is typically done either on the command line or in a file. For now, you could
just do this:
mpiexec -host node1,node2,node3 -np N ./my_app
where node1,node2,node3,... are the names or IP addresses of the nodes you
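If you would rather keep the list in a file, a minimal sketch (the file name, node names, and slot counts are hypothetical):
  # hosts.txt
  node1 slots=4
  node2 slots=4
  node3 slots=4

  mpiexec -hostfile hosts.txt -np N ./my_app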
Nothing was attached, but I doubt they would help anyway. This looks like a
missing header file in Ubuntu, or else one that got moved and needs a different
path.
Where is asm/errno.h, and how was it included in /usr/include/linux/errno.h?
Best I can figure is it got put in some non-standard pla
On Apr 19, 2011, at 2:24 PM, Sergiy Bubin wrote:
>
> Thanks for the suggestion. I have figured (by googling around and comparing
> the content of asm directories) that Ubuntu 11.04 has some difference in the
> location of /usr/include/asm/. It appears that now that whole directory is
> locate
Sure - instead of what you did, just add --without-portals to your original
configure. The exact option depends on what portals you have installed.
Here is the relevant part of the "./configure -h" output:
--with-portals=DIR Specify the installation directory of PORTALS
--with-portals-l
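For reference, the reconfigure might then look something like this (the install prefix is hypothetical):
  ./configure --prefix=/opt/openmpi --without-portals
  make all install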
On Apr 21, 2011, at 4:41 PM, Brock Palen wrote:
> Given that part of our cluster is TCP only, openib wouldn't even startup on
> those hosts
That is correct - it would have no impact on those hosts
> and this would be ignored on hosts with IB adaptors?
Ummm...not sure I understand this one.
On Apr 22, 2011, at 1:42 PM, ya...@adina.com wrote:
> Open MPI 1.4.3 + Intel Compilers V8.1 summary:
> (in case someone likes to refer to it later)
>
> (1) To make all Open MPI executables statically linked and
> independent of any dynamic libraries,
> "--disable-shared" and "--enable-static" o
On Apr 23, 2011, at 6:20 AM, Reuti wrote:
> Hi,
>
> On 23.04.2011, at 04:31, Pablo Lopez Rios wrote:
>
>> I'm having a bit of a problem with wrapping mpirun in a script. The script
>> needs to run an MPI job in the background and tail -f the output. Pressing
>> Ctrl+C should stop tail -f, and
On Apr 23, 2011, at 9:07 AM, Pablo Lopez Rios wrote:
>> what about:
>> ( trap "" sigint; exec mpiexec ...)&
>
> Yup, that's included in the workarounds I tried. Tried again with your
> specific suggestion; no luck.
>
>> Well, maybe mpiexec is adjusting it on its own
>> again. This can be check
ing to accomplish, but there are other
signals that don't cause termination. For example, we trap and forward SIGUSR1
and SIGUSR2 to your application procs, if that is of use.
But ctrl-c has a special meaning ("die"), and you can't tell mpirun to ignore
it.
>
> Tha
mpirun overriding the trap in the *parent*
> subshell so that it ends up getting the SIGINT that was supposedly blocked at
> that level? Am I missing something trivial? How can I avoid this?
I keep telling you - you can't. The better way to do this is to execute mpirun,
and then run tail i
& out"&
> tail -f out
>
Yes - but now you can't kill mpirun when something goes wrong
> Thanks,
> Pablo
>
>
> On 23/04/11 18:39, Reuti wrote:
>> On 23.04.2011, at 19:33, Ralph Castain wrote:
>>
>>> On Apr 23, 2011, at 10:40 AM, Pa
On Apr 23, 2011, at 12:07 PM, Reuti wrote:
> On 23.04.2011, at 19:58, Ralph Castain wrote:
>
>>
>> On Apr 23, 2011, at 11:55 AM, Pablo Lopez Rios wrote:
>>
>>>> What about setsid and pushing it in a new
>>>> session instead of using &
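For reference, a minimal shell sketch of the setsid approach quoted above (the app, output, and PID-file names are hypothetical); Ctrl+C then stops only the tail, while mpirun stays killable via its saved PID:
  # start mpirun in a new session so the terminal's Ctrl+C never reaches it,
  # and record its PID so it can still be killed explicitly
  setsid sh -c 'echo $$ > mpirun.pid; exec mpirun -np 4 ./my_app' > out 2>&1 &
  tail -f out                  # Ctrl+C here stops only the tail
  # later, if something goes wrong:
  #   kill "$(cat mpirun.pid)"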
Don't give it a host argument - unless you are trying to cross-compile, it
should figure it out for itself
On Apr 23, 2011, at 1:25 PM, Fernando Dutra Fagundes Macedo wrote:
> Correcting:
>
> I tried 1.5.2 and 1.5.3.
>
>
> -Original Message-
> From: users-boun...@open-mpi.org on behalf of
re it's still there, but it is hard to find. Try searching
the OMPI web site for info.
On Apr 25, 2011, at 5:09 AM, Fernando Dutra Fagundes Macedo wrote:
> I'm trying to cross-compile.
>
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun.
Perhaps a firewall? All it is telling you is that mpirun couldn't establish TCP
communications with the daemon on ln10.
On Apr 27, 2011, at 11:58 AM, Sindhi, Waris PW wrote:
> Hi,
> I am getting a "oob-tcp: Communication retries exceeded" error
> message when I run a 238 MPI slave code
>
>
it, and/or have this executable
>> compiled as part of the PSM MTL and then installed into $bindir (maybe named
>> ompi-psm-keygen)?
>>
>> Right now, it's only compiled as part of "make check" and not installed,
>> right?
>>
>> On Dec
On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote:
> On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote:
>>
>> On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote:
>>
>>> Was this ever committed to the OMPI src as something not having to be
>>&
On Apr 27, 2011, at 1:27 PM, Jeff Squyres wrote:
> On Apr 27, 2011, at 2:46 PM, Ralph Castain wrote:
>
>> Actually, I understood you correctly. I'm just saying that I find no
>> evidence in the code that we try three times before giving up. What I see is
>> a si
, TechApps
> Pratt & Whitney, UTC
> (860)-565-8486
>
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Ralph Castain
> Sent: Wednesday, April 27, 2011 2:18 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI
On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote:
> On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain wrote:
>>
>> On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote:
>>
>>> On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote:
>>>>
>
What led you to conclude 1.2.8?
> Is there any way you can upgrade to a (much) later version, such as 1.4.3?
> That might improve your TCP connectivity -- we made improvements in those
> portions of the code over the years.
>
> On Apr 27, 2011, at 8:09 PM, Ralph Castain wrot
On Apr 28, 2011, at 6:49 AM, Jeff Squyres wrote:
> On Apr 28, 2011, at 8:45 AM, Ralph Castain wrote:
>
>> What led you to conclude 1.2.8?
>>
>>>>>> /opt/openmpi/i386/bin/mpirun -mca btl_openib_verbose 1 --mca btl ^tcp
>>>>>> --mca pls
's trunk, but not yet in a
release.
>
> Sincerely,
>
> Waris Sindhi
> High Performance Computing, TechApps
> Pratt & Whitney, UTC
> (860)-565-8486
>
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Be
On Apr 28, 2011, at 6:49 AM, Michael Di Domenico wrote:
> On Wed, Apr 27, 2011 at 11:47 PM, Ralph Castain wrote:
>>
>> On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote:
>>
>>> On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain wrote:
>>>>
>
> Surprisingly, they are trying 'localhost:11.0' whereas when I use 'ssh -Y'
>> the
>> DISPLAY variable is set to 'localhost:10.0'
>>
>> So in what way would OMPI have to be adapted, so -xterm would work?
>>
>> Thank You
>>
Castain wrote:
>
> On Apr 28, 2011, at 6:49 AM, Michael Di Domenico wrote:
>
>> On Wed, Apr 27, 2011 at 11:47 PM, Ralph Castain wrote:
>>>
>>> On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote:
>>>
>>>> On Wed, Apr 27, 2011 at 2:46
>>>> /usr/bin/xterm Xt error: Can't open display: localhost:11.0
>>>> /usr/bin/xterm Xt error: Can't open display: localhost:11.0
>>>> OMPI_COMM_WORLD_RANK=0
>>>> [aim-squid_0:09856] [[54132,0],1]->[[54132,0],0]
>>>> mca_oob_tcp_msg_se
n: r22285
> Open MPI release date: Dec 08, 2009
>Open RTE: 1.4
>
>
> Sincerely,
>
> Waris Sindhi
> High Performance Computing, TechApps
> Pratt & Whitney, UTC
> (860)-565-8486
>
> -Original Message-
> From: users-boun...@open-m
>> Warning: No xauth data; using fake authentication data for X11 forwarding.
>>>> /usr/bin/xterm Xt error: Can't open display: localhost:11.0
>>>> /usr/bin/xterm Xt error: Can't open display: localhost:11.0
>>>> OMPI_COMM_WORLD_RANK=0
>>&g
Hi Michael
Please see the attached updated patch to try for 1.5.3. I mistakenly free'd the
envar after adding it to the environ :-/
Thanks
Ralph
slurmd.diff
Description: Binary data
On Apr 28, 2011, at 2:31 PM, Michael Di Domenico wrote:
> On Thu, Apr 28, 2011 at 9:03 AM, Ralph
On Apr 29, 2011, at 8:05 AM, Michael Di Domenico wrote:
> On Fri, Apr 29, 2011 at 10:01 AM, Michael Di Domenico
> wrote:
>> On Fri, Apr 29, 2011 at 4:52 AM, Ralph Castain wrote:
>>> Hi Michael
>>>
>>> Please see the attached updated patch to try for 1.
p. We just need someone to
explain the requirements on that precondition value.
Thanks
Ralph
On Apr 29, 2011, at 8:12 AM, Ralph Castain wrote:
>
> On Apr 29, 2011, at 8:05 AM, Michael Di Domenico wrote:
>
>> On Fri, Apr 29, 2011 at 10:01 AM, Michael Di Domenico
>> wrote:
&
t, no xterm.
>
>> From these results I would say that there is no basic mishandling of
> 'ssh', though I have no idea
> what internal differences the use of the '-leave-session-attached'
> option or the debug options make.
>
> I hope these observations ar
run -np 4 -host squid_0 -mca
>>> plm_rsh_agent "ssh -Y" --leave-session-attached --xterm 0,1,2,3!
>>> ./HelloMPI
>>> The xterms are also opened if I do not use the '!' hold option.
>>
Did I miss something?
> Thank You
> Jody
>
; option)
Ah, well that might explain it. I don't know how xterm would react to just
being launched by mpirun onto a remote platform without any command to run. I
can't explain what the plm verbosity has to do with anything, though.
> Jody
>
> On Mon, May 2, 2011 at 4:
It's probably looking for the torque lib in lib instead of lib64. There should
be a configure option to tell it --with-tm-libdir or something like that -
check "configure -h"
On May 2, 2011, at 6:22 PM, Jim Kusznir wrote:
> Hi all:
>
> I'm trying to build openmpi 1.4.3 against PGI 11.4 on my
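If configure does offer those options (check "configure -h" as suggested), the invocation might look roughly like this; the Torque paths are hypothetical:
  ./configure --with-tm=/opt/torque --with-tm-libdir=/opt/torque/lib64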
The error message is telling you the problem. You don't have your remote path
set so it can find the OMPI installation on the remote hosts. Look at the OMPI
FAQ section for more info if you are unsure how to set paths on remote hosts.
On May 3, 2011, at 2:04 AM, Ahsan Ali wrote:
> Hello,
>
>
You still have to set the PATH and LD_LIBRARY_PATH on your remote nodes to
include where you installed OMPI.
Alternatively, use the absolute path name to mpirun in your cmd - we'll pick up
the path and propagate it.
On May 3, 2011, at 9:14 PM, Ahsan Ali wrote:
> Dear Bart,
>
> I think OpenMP
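Tying those two suggestions together, a minimal sketch (the install prefix is hypothetical):
  # on every remote node, e.g. in ~/.bashrc:
  export PATH=/opt/openmpi/bin:$PATH
  export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH

  # or simply invoke mpirun by its absolute path so the prefix is propagated:
  /opt/openmpi/bin/mpirun -np 4 ./my_app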
Did you make clean first?
configure won't clean out the prior installation, so you may be picking up
stale libs.
On May 4, 2011, at 11:27 AM, Cizmas, Paul wrote:
> I added LDFLAGS=-m64, such that the command is now
>
> ./configure --prefix=/opt/openmpi1.4.3GFm64 CC=/sw/bin/gcc-fsf-4.5
> CFLAG
Usually that means you have a mismatch in your OMPI versions - you may have
built the app with one version and are running it against another, for
example, or perhaps compiled them against MPICH and run them using OMPI's
mpirun/mpiexec.
On Thu, May 5, 2011 at 1:23 PM, Bartłomiej W wrote:
> Hello
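A quick way to check for such a mismatch (assuming the MPI tools are on your PATH; output varies by installation):
  which mpicc mpirun      # both should point into the same installation
  ompi_info | head -5     # shows the Open MPI version mpirun will use
then compare that against the MPI the application was actually compiled with.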
Why are you using ompi-clean for this purpose instead of a simple ctrl-c?
It wasn't intended for killing jobs, but only for attempting cleanup of lost
processes in extremity (i.e., when everything else short of rebooting the node
fails). So it isn't robust by any means.
On May 6, 2011, at 11:5
I don't know a lot about the Windows port, but that error means that mpirun got
an error when trying to discover the allocated nodes.
On May 11, 2011, at 6:10 AM, hi wrote:
> After setting OPAL_PKGDATADIR, "mpirun" gives proper help message.
>
> But when executing simple test program which cal
Sent from my iPad
On May 11, 2011, at 2:05 PM, Brock Palen wrote:
> On May 9, 2011, at 9:31 AM, Jeff Squyres wrote:
>
>> On May 3, 2011, at 6:42 AM, Dave Love wrote:
>>
We managed to have another user hit the bug that causes collectives (this
time MPI_Bcast() ) to hang on IB that
That would be a problem, I fear. We need to push those envars into the
environment.
Is there some particular problem causing what you see? We have no other reports
of this issue, and orterun has had that code forever.
Sent from my iPad
On May 11, 2011, at 2:05 PM, Peter Thompson
wrote:
>
On May 11, 2011, at 4:27 PM, Dave Love wrote:
> Ralph Castain writes:
>
>> I'll go back to my earlier comments. Users always claim that their
>> code doesn't have the sync issue, but it has proved to help more often
>> than not, and costs nothing to try,
>
On May 12, 2011, at 9:53 PM, Rodrigo Silva Oliveira wrote:
> Hi there.
>
> I'm developing a distributed system with a communication layer based on Open
> MPI. As part of my project, I have to create a process scheduler. So I
> decided to use the MPI_Spawn function to dynamically create (it is
I believe I answered that question. You can use the hostfile info key, or you
can use the host info key - either one will do what you require.
On May 13, 2011, at 4:11 PM, Rodrigo Silva Oliveira wrote:
> Hi,
>
> I think I was not specific enough. I need to spawn the copies of a process in
> a
gnores the repetition of hosts.
> Using Rodrigo's example I did:
>
> host info key = "m1,m2,m2,m2,m3" and number of processes = 5 and the result
> was
>
> m1 -> 2
> m2 -> 2
> m3 -> 1
>
> and not
>
> m1 -> 1
> m2 -> 3
> m3
've passed your comment back to the engineer, with a suspicion about the
> concerns about the abort, but if you have other objections, let me know.
>
> Cheers,
> PeterT
>
>
> Ralph Castain wrote:
>> That would be a problem, I fear. We need to push those envars into t
used by
> putenv(), and I do know that while that used to be just flagged as an event
> before, now we seem to be unable to continue past it. Not sure if that is
> our change or a library/system change.
> PeterT
>
>
> Ralph Castain wrote:
>> On May 16, 2011, at 12:
array are
> ignored because an info argument applies to the entire job that is spawned,
> and cannot be different for each executable in the job. See the INFO
> ARGUMENTS section for more information."
>
> Anyway, I'm glad it works!
>
> Thank you very much!
>
&g
le
>>> paths, and it's better to use UNC path.
>>>
>>> To clarify the path issue, if you just copy the OMPI dir to another
>>> computer, there might also be another problem that OMPI couldn't load the
>>> registry entries, as the registry entries w
before, now we seem to be unable to continue past it. Not sure if that is
> our change or a library/system change.
> PeterT
>
>
> Ralph Castain wrote:
>> On May 16, 2011, at 12:45 PM, Peter Thompson wrote:
>>
>>
>>> Hi Ralph,
>>>
>&g
Oh my - that is such an old version! Any reason for using it instead of
something more recent?
On Jun 11, 2011, at 8:43 AM, Ole Kliemann wrote:
> Hi everyone!
>
> I'm trying to use MPI on a cluster running OpenMPI 1.2.4 and starting
> processes through PBSPro_11.0.2.110766. I've been running i
On Jun 13, 2011, at 1:32 PM, Rodrigo Oliveira wrote:
> The point is: I have a system composed by a set of mpi processes. These
> processes run as daemons in each cluster machine. I need a way to kill those
> ones when I decide to shutdown the system.
Do you mean that your MPI processes actuall
One possibility: if you increase the number of processes in the job, and they
all interconnect, then the IB interface can (I believe) run out of memory at
some point. IIRC, the answer was to reduce the size of the QPs so that you
could support a larger number of them.
You should find info about
On Jun 28, 2011, at 9:05 AM, ya...@adina.com wrote:
> Hello All,
>
> I installed Open MPI 1.4.3 on our new HPC blades, with Infiniband
> interconnection.
>
> My system environments are as:
>
> 1)uname -a output:
> Linux gulftown 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT
> 2010 x86_64 x
How are you passing the port info between the server and client? You're hitting
a race condition between the two sides.
On Jun 27, 2011, at 9:29 AM, Rodrigo Oliveira wrote:
> Hi there.
> I am developing a server/client application using Open MPI 1.5.3. In a point
> of the server code I open a p
On Jun 28, 2011, at 3:52 PM, ya...@adina.com wrote:
> Thanks, Ralph!
>
> a) Yes, I know I could use only IB by "--mca btl openib", but just
> want to make sure I am using IB interfaces. I am seeking an option
> to mpirun to print out the actual interconnect protocol, like --prot to
> mpirun i
Looking deeper, I believe we may have a race condition in the code. Sadly, that
error message is actually irrelevant, but causes the code to abort.
It can be triggered by race conditions in the app as well, but ultimately is
something we need to clean up.
On Jun 27, 2011, at 9:29 AM, Rodrigo O
That didn't come from OMPI - that error message is from LAM-MPI, which no
longer is supported.
I suggest you check the default path being set by Torque - looks like it is
picking up an old LAM install.
On Jun 30, 2011, at 8:24 PM, zhuangchao wrote:
> hello all ,
>
> I submitted the f
is started and it stores the port name in a file. When a
> client is started, it gets this port name and tries to connect. In my tests
> the error happens about 1 time in 10 executions.
>
> It is still working, but not reliably.
>
> On Tue, Jun 28, 2011 at 11:10 PM, Ralph Ca
I very much doubt we have Tile support as it hasn't come up before. If you look
in opal/asm/base, you'll see a MIPS.asm that contains the MIPS code - perhaps
you could use that as a starting point?
I didn't write any of that code, but I think if you poke around that directory
looking for "MIPS"
Looks like your code is passing an invalid argument to MPI_Reduce...
On Jul 5, 2011, at 9:20 AM, ya...@adina.com wrote:
> Dear all,
>
> We are testing Open MPI over Infiniband, and got a MPI_Reduce
> error message when we run our codes either over TCP or
> Infiniband interface, as follows,
>
Let me get this straight. You are executing mpirun from inside a c-shell
script, launching onto nodes where you will by default be running bash. The
param I gave you should support that mode - it basically tells OMPI to probe
the remote node to discover what shell it will run under there, and th
I don't see Open MPI in your list of modules - looks to me like you are using
MPICH? If so, you should send this to their mailing list.
On Jul 5, 2011, at 1:44 PM, Chaudhari, Mangesh I wrote:
> hi all,
>
> I'm trying to run a job from an external hard disk and it's giving me errors; my
> output l
We don't directly link to that library, so it must be getting pulled in by some
other lib. Have you tried adding /usr/heimdal/lib to your LD_LIBRARY_PATH
before building?
On Jul 6, 2011, at 3:27 AM, Sushil Mishra wrote:
> Hi all:
> I am trying to install openmpi-1.5.2 in Debian 4.3.2-1.1. I am
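For reference, that would be (path taken from the message above):
  export LD_LIBRARY_PATH=/usr/heimdal/lib:$LD_LIBRARY_PATH
  # then re-run configure and rebuild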
Please see http://www.open-mpi.org/faq/?category=rsh#ssh-keys
On Jul 6, 2011, at 5:09 PM, Mohan, Ashwin wrote:
> Hi,
>
> I use the following command (mpirun --prefix /usr/local/openmpi1.4.3 -np 4
> hello) to successfully execute a simple hello world command on a single node.
> Each node has
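For reference, the usual passwordless-ssh setup the FAQ walks through looks roughly like this (the remote host name is hypothetical):
  ssh-keygen -t rsa          # accept the defaults, empty passphrase
  ssh-copy-id node2          # repeat for every node you launch on
  ssh node2 hostname         # should now work without a password prompt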
Look at "mpirun -h" or "man mpirun" - you'll see options for binding processes
to cores etc.
On Jul 8, 2011, at 10:13 AM, Vlad Popa wrote:
> Hello!
>
> We have a shared-memory system based on four 12-core Opteron CPUs, with a
> total of 256 GB of RAM.
>
> Are there any switches, which we could
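For example, on the 1.4/1.5 series the binding options look roughly like this (process count and app name are hypothetical; check "mpirun -h" for the exact spelling in your version):
  mpirun -np 48 --bind-to-core --bycore ./my_app       # bind each rank to a core, filling cores in order
  mpirun -np 48 --bind-to-socket --bysocket ./my_app   # bind ranks to sockets, round-robin across sockets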
eport back when launched
> myocyte47 - daemon did not report back when launched
> myocyte49 - daemon did not report back when launched
>
> Thanks,
> Ashwin.
>
>
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of R
We've been moving to provide support for including values as CIDR notation
instead of names - e.g., 192.168.0.0/16 instead of bge0 or bge1 - but I don't
think that has been put into the 1.4 release series. If you need it now, you
might try using the developer's trunk - I know it works there.
On
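Once that support is available to you (the developer's trunk at the time of writing), usage would look something like this; the subnet is hypothetical:
  mpirun --mca btl_tcp_if_include 192.168.0.0/16 -np 4 ./my_app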
On Jul 10, 2011, at 6:57 PM, BRADLEY, PETER C PW wrote:
> I know 1.4.x has a limit of 128 entries for procgroup files. To avoid some
> ugly surgery on a legacy application, we’d really like to have the ability to
> put up to 1024 lines in a procgroup file. Has the limit been raised at all
>
Have you gone to those nodes and checked their IP addresses of -all-
interfaces? OMPI must be picking up those addresses from somewhere - best guess
is that those nodes have multiple interfaces on them, some of which are
configured to those addresses.
Remember: we don't look at the /etc/hosts f
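A quick way to check is to list every interface on each node, e.g.:
  /sbin/ifconfig -a      # or, on newer distros: ip addr show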
I believe we responded to this before...you might check your spam or inbox.
On Jul 12, 2011, at 7:39 PM, zhuangchao wrote:
> hello all :
>
>
>I run the following command :
>
> /data1/cluster/openmpi/bin/mpirun -d -machinefile /tmp/nodes.10515.txt
> -np 3 /data1/cluster
On Jul 12, 2011, at 2:34 PM, Paul Kapinos wrote:
> Hi OpenMPI folks,
>
> Using version 1.4.3 of OpenMPI, I want to wrap the 'ssh' calls, which
> are made by OpenMPI's 'mpiexec'. For this purpose, at least two ways
> seem to be possible for me:
>
> 1. let the wrapper have the name 's
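One common way to point Open MPI at a wrapper is via an MCA parameter (the wrapper path and app name are hypothetical; the parameter name appears elsewhere in this archive as plm_rsh_agent):
  mpirun --mca plm_rsh_agent /path/to/ssh_wrapper.sh -np 4 ./my_app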
On Jul 14, 2011, at 5:46 PM, Jeff Squyres wrote:
> Looping in the users mailing list so that Ralph and Oracle can comment...
Not entirely sure what I can contribute here, but I'll try - see below for some
clarifications. I think the discussion here is based on some misunderstanding
of how OMPI
Higher rev levels of the autotools are required for the 1.5 series - are you at
the right ones? See
http://www.open-mpi.org/svn/building.php
On Jul 22, 2011, at 9:12 AM, Paul Kapinos wrote:
> Dear OpenMPI folks,
> currently I have a problem building version 1.5.3 of OpenMPI on
> Scienti
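A quick way to see what you currently have (all four tools matter):
  m4 --version | head -1
  autoconf --version | head -1
  automake --version | head -1
  libtool --version | head -1
then compare against the table on the page above.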
A few thoughts:
* including both btl_tcp_if_include and btl_tcp_if_exclude is problematic as
they are mutually exclusive options. I'm not sure which one will take
precedence. I would suggest only using one of them.
* the default mapping algorithm is byslot - i.e., OMPI will place procs on each
r cmd line to see where
mpirun actually placed your processes, just to be sure they aren't overloading
a node.
>
> -Bill
>
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of
> Ralph Castain [r...@open-mpi.org]
> Sent: Tuesday,
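One easy way to verify placement is to ask mpirun to print its map (a sketch; the process count and app name are hypothetical):
  mpirun --display-map -np 8 ./my_app    # prints which rank lands on which node
  mpirun --bynode -np 8 ./my_app         # map round-robin by node instead of the default byslot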
;
> Do I understand correctly: within the slots granted by SGE, you want the allocation
> inside Open MPI to follow a specific pattern, i.e., which rank goes where?
>
> -- Reuti
>
>
>>
>> Thanks for your help Ralph. At least I have some ideas on where to look now.
>>
>> -Bill
>> _
On Jul 26, 2011, at 1:58 PM, Reuti wrote:
allocation_rule    $fill_up
>>>
>>> Here you specify to fill one machine after the other completely before
>>> gathering slots from the next machine. You can change this to $round_robin
>>> to get one slot from each node before taking a second from
I normally hide my eyes when rankfiles appear, but since you provide so much
help on this list yourself... :-)
I believe the problem is that you have the keyword "slots" wrong - it is
supposed to be "slot":
rank 1=host1 slot=1:0,1
rank 0=host2 slot=0:*
rank 2=host4 slot=1-2
rank
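For reference, a rankfile like the one above is then passed on the command line (file and app names hypothetical):
  mpirun -np 3 --rankfile myrankfile ./my_app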
ite 100 times in the blackboard:
> "Slots in the hostfile, slot in the rankfile,
> slot is singular, to err is plural."
LOL
> ... at least until Ralph's new plural-forgiving parsing rule
> makes it to the code.
Committed to the trunk, in the queue for both 1.4.4 and 1.5.4
I don't believe we ever got anywhere with this due to lack of response. If you
get some info on what happened to tm_init, please pass it along.
Best guess: something changed in a recent PBS Pro release. Since none of us
have access to it, we don't know what's going on. :-(
On Jul 26, 2011, at
v11.x.
>
> I built OpenMPI 1.5.3 this morning with PBSPro v11.0, and it works fine. I
> don't get any segfaults.
>
> -Justin.
>
> On 07/26/2011 05:49 PM, Ralph Castain wrote:
>> I don't believe we ever got anywhere with this due to lack of response. If
>&
Do you have something like valgrind on your machine? If so, then why not launch
your apps under valgrind - e.g., "mpirun valgrind my_app"?
If your app is segfaulting, there isn't much OMPI can do to tell you why. All
we can do is tell you that your app was hit with a SIGTERM.
Did you talk t
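A slightly fuller form of that suggestion (the valgrind flags are the usual ones; the app name is hypothetical):
  mpirun -np 4 valgrind --leak-check=full --log-file=vg.%p.log ./my_app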
That error makes no sense - line 335 is just a variable declaration. Sure you
are not picking up a different version on that node?
On Aug 9, 2011, at 11:37 AM, CB wrote:
> Hi,
>
> Currently I'm having trouble to scale an MPI job beyond a certain limit.
> So I'm running an MPI hello example to
Also, please be aware that we haven't done any testing of OMPI on Lion, so this
is truly new ground.
On Aug 9, 2011, at 3:00 PM, Doug Reeder wrote:
> Matt,
>
> Are you sure you are building against your macports version of openmpi and
> not the one that ships w/ lion. In the trace back are it
tions for troubleshooting?
>
> Thanks,
> - Chansup
>
>
> On Tue, Aug 9, 2011 at 2:02 PM, CB wrote:
> Hi Ralph,
>
> Yes, you are right. Those nodes were still pointing to an old version.
> I'll check the installation on all nodes and try to run it again.
>
>
What version are you using?
On Aug 16, 2011, at 3:19 AM, Simone Pellegrini wrote:
> Dear all,
> I am developing a system to manage MPI tasks on top of MPI. The architecture
> is rather simple: I have a set of scheduler processes which take care of
> managing the resources of a node. The idea is
tell us what is happening.
On Aug 16, 2011, at 5:09 AM, Simone Pellegrini wrote:
> On 08/16/2011 12:30 PM, Ralph Castain wrote:
>> What version are you using?
>
> OpenMPI 1.4.3
>
>>
>>
>> On Aug 16, 2011, at 3:19 AM, Simone Pellegrini wrote:
>>
>>
Smells like a bug - I'll take a look.
On Aug 16, 2011, at 9:10 AM, Simone Pellegrini wrote:
> On 08/16/2011 02:11 PM, Ralph Castain wrote:
>> That should work, then. When you set the "host" property, did you give the
>> same name as was in your machine file?
&
I'm not finding a bug - the code looks clean. If I send you a patch, could you
apply it, rebuild, and send me the resulting debug output?
On Aug 16, 2011, at 10:18 AM, Ralph Castain wrote:
> Smells like a bug - I'll take a look.
>
>
> On Aug 16, 2011, at 9:10 AM, S
Afraid I am confused. I assume this refers to the trunk, yes?
I also assume you are talking about launching an application directly from srun
as opposed to using mpirun - yes?
In that case, I fail to understand what difference it makes regarding this
proposed change. The application process is
Okay - thx! I'll install in trunk and schedule for 1.5
On Aug 22, 2011, at 7:20 AM, pascal.dev...@bull.net wrote:
>
> users-boun...@open-mpi.org wrote on 18/08/2011 14:41:25:
>
>> From: Ralph Castain
>> To: Open MPI Users
>> Date: 18/08/2011 14:45
>