On Jan 12, 2011, at 7:23 PM, Tena Sakai wrote:
> Hi,
>
> I can execute the command below:
>$ mpirun -H vixen -np 1 hostname : -H compute-0-0,compute-0-1,compute-0-2
> -np 3 hostname
> and I get:
>vixen.egcrc.org
>compute-0-0.local
>compute-0-1.local
>compute-0-2.local
>
> I
The problem is that mpirun regenerates itself to exec a command of "totalview
mpirun ", and the quotes are lost in the process.
Just start your debugged job with "totalview mpirun ..." and it should work
fine.
On Jan 27, 2011, at 3:00 AM, Gabriele Fatigati wrote:
> The problem is how mpiru
On Jan 27, 2011, at 7:47 AM, Reuti wrote:
> Am 27.01.2011 um 15:23 schrieb Joshua Hursey:
>
>> The current version of Open MPI does not support continued operation of an
>> MPI application after process failure within a job. If a process dies, so
>> will the MPI job. Note that this is true of
The easiest solution is to take advantage of the fact that the default hostfile
is an MCA parameter - so you can specify it in several ways other than on the
cmd line. It can be in your environment, in the default MCA parameter file, or
in an MCA param file in your home directory.
See
http://w
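For illustration (the parameter name and file locations below are the usual ones for the 1.3/1.4 series - "ompi_info -a | grep hostfile" on your install will confirm the exact name), the same default hostfile can be set any of these ways:
  mpirun -mca orte_default_hostfile /path/to/my_hosts -np 4 ./a.out
  # or in the environment:
  export OMPI_MCA_orte_default_hostfile=/path/to/my_hosts
  # or as a line in $HOME/.openmpi/mca-params.conf:
  orte_default_hostfile = /path/to/my_hosts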
The 1.4 series is regularly tested on slurm machines after every modification,
and has been running at LANL (and other slurm installations) for quite some
time, so I doubt that's the core issue. Likewise, nothing in the system depends
upon the FQDN (or anything regarding hostname) - it's just us
Another possibility to check - are you sure you are getting the same OMPI
version on the backend nodes? When something works on the local node but fails
multi-node, the most common problem is that you are picking up a different OMPI
version due to path differences on the backend nodes.
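A quick way to check is to compare what a non-interactive shell on a backend node picks up against the head node (the node name below is a placeholder):
  which mpirun; mpirun --version                 # on the head node
  ssh node01 'which mpirun; mpirun --version'    # non-interactive shell on a backend node
If the paths or versions differ, that is the problem.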
On Feb 8, 201
See below
On Feb 8, 2011, at 2:44 PM, Michael Curtis wrote:
>
> On 09/02/2011, at 2:17 AM, Samuel K. Gutierrez wrote:
>
>> Hi Michael,
>>
>> You may have tried to send some debug information to the list, but it
>> appears to have been blocked. Compressed text output of the backtrace text
>
(due out any time now)
with the 1.5.1 slurm support. Any interested parties can follow it here:
https://svn.open-mpi.org/trac/ompi/ticket/2717
Ralph
On Feb 8, 2011, at 6:23 PM, Michael Curtis wrote:
>
> On 09/02/2011, at 9:16 AM, Ralph Castain wrote:
>
>> See below
>>
Gus is correct - the -host option needs to be in the appfile
On Feb 9, 2011, at 3:32 PM, Gus Correa wrote:
> Sindhi, Waris PW wrote:
>> Hi,
>>I am having trouble using the --app option with OpenMPI's mpirun
>> command. The MPI processes launched with the --app option get launched
>> on the l
Have you searched the email archive and/or web for openmpi and Amazon cloud?
Others have previously worked through many of these problems for that
environment - might be worth a look to see if someone already solved this, or
at least a contact point for someone who is already running in that env
OMPI doesn't do anything relative to the .ssh directory, or what key is used
for ssh authentication.
Afraid that is one you have to solve at the system level :-/
On Feb 15, 2011, at 11:35 AM, Barnet Wagman wrote:
> I need to find a way of controlling the rsa key used when open-mpi uses ssh
>
Setting the mca param plm_rsh_agent to "ssh -i xxx" should do the trick, I
think - haven't tried it, but it should work.
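An untested sketch of what that would look like (the key path is a placeholder - point it at whichever identity you want OMPI's ssh launcher to use):
  mpirun -mca plm_rsh_agent "ssh -i $HOME/.ssh/id_rsa_ompi" -np 4 -hostfile hosts ./a.out
  # or set it once in the environment:
  export OMPI_MCA_plm_rsh_agent="ssh -i $HOME/.ssh/id_rsa_ompi"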
On Feb 15, 2011, at 12:24 PM, Barnet Wagman wrote:
>
>> OMPI doesn't do anything relative to the .ssh directory, or what key is used
>> for ssh authentication.
>>
>> Afrai
Your question actually doesn't make sense in an MPI application. In MPI, you
would have two independent processes running. One does the send, and the other
does the receive. Both processes are running all the time, each on its own
processor.
So you don't "switch" to another processor - the rece
Simplest soln: add -bynode to your mpirun cmd line
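For example, with a hostfile listing two 4-core nodes (names and slot counts below are made up), the mapping changes like this:
  # hosts:  nodeA slots=4
  #         nodeB slots=4
  mpirun -np 4 -hostfile hosts ./a.out            # default byslot: ranks 0-3 all on nodeA
  mpirun -np 4 -hostfile hosts -bynode ./a.out    # bynode: ranks 0,2 on nodeA and 1,3 on nodeB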
On Feb 20, 2011, at 10:50 PM, DOHERTY, Greg wrote:
> In order to be able to checkpoint openmpi jobs with blcr, we have
> configured openmpi as follows
>
> ./configure --prefix=/data1/packages/openmpi/1.5.1-blcr-without-tm
> --disable-openib-co
I very much doubt that either of those mappers has ever been tested against
comm_spawn. Just glancing thru them, I don't see an immediate reason why
loadbalance wouldn't work, but the error indicates that the system wound up
mapping one or more processes to an unknown node.
We are revising the
Resource managers generally frown on the idea of any program passing
RM-managed envars from one node to another, and this is certainly true of
slurm. The reason is that the RM reserves those values for its own use when
managing remote nodes. For example, if you got an allocation and then used
mpiru
> How does OpenMPI start the
> processes on the remote nodes under the covers? (use srun, generate a
> hostfile and launch as you would outside SLURM, …) This may be the
> difference between HP-MPI and OpenMPI. Thanks, Brent From:
> users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> > SLURM_LOCALID=0
> > SLURM_LOCALID=1
> > SLURM_LOCALID=1
> > SLURM_NODEID=0
> > SLURM_NODEID=0
> > SLURM_NODEID=1
> > SLURM_NODEID=1
> > SLURM_PROCID=0
> > SLURM_PROCID=1
> > SLURM_PROCID=2
> > SLURM_PROCID=3
> > [brent@node1 mpi]$
If you are trying to use OMPI as the base for ORCM, then you can tell ORCM
to use OMPI's "tcp" multicast module - it fakes multicast using pt-2-pt tcp
messaging.
-mca rmcast tcp
will do the trick.
On Thu, Feb 24, 2011 at 6:27 AM, Jeff Squyres wrote:
> I'm still not sure what you're asking --
MPI).
>
We should note that you -can- directly srun an OMPI job now. I believe that
capability was released in the 1.5 series. It takes a minimum slurm release
level plus a slurm configuration setting to do so.
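Roughly, it looks like the sketch below - but treat it as an outline only, since it assumes a new-enough slurm and an Open MPI build configured with the matching PMI support (check the README of your release for the exact requirements):
  salloc -N 2 -n 16           # get an allocation
  srun -n 16 ./my_mpi_app     # slurm launches the MPI processes directly; no mpirun involved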
>
>
> On Feb 24, 2011, at 10:02 AM, Ralph Castain wrote:
>
> >
> > SLURM_TASK_PID=2590
> [brent@node2 mpi]$
> [brent@node2 mpi]$
> [brent@node2 mpi]$ grep SLURM_PROCID srun.out
> SLURM_PROCID=0
> SLURM_PROCID=1
> [brent@node2 mpi]$ grep SLURM_PROCID mpirun.out
> SLURM_PROCID=0
> [brent@node2 mpi]$ grep SLURM_PROCID hpmpi.out
> SLURM
I guess I wasn't clear earlier - I don't know anything about how HP-MPI
works. I was only theorizing that perhaps they did something different that
results in some other slurm vars showing up in Brent's tests. From Brent's
comments, I guess they don't - but they launch jobs in a different manner
th
The error means OMPI didn't find a network interface - do you have your networks
turned off? Sometimes people travel with Airport turned off. If you have no wired
connection, then no interfaces exist.
Sent from my iPad
On Mar 1, 2011, at 11:50 AM, David Robertson wrote:
> Hi all,
>
> I am having tr
On Mar 1, 2011, at 1:34 PM, David Robertson wrote:
> Hi,
>
> > The error means OMPI didn't find a network interface - do you have your
> > networks turned off? Sometimes people travel with Airport turned off.
> > If you have no wired connection, then no interfaces exist.
>
> I am logged in to the machi
Really appreciate you having looked into this!
Unfortunately, I can't see a way to resolve this for the general public. It
looks more to me like a PGI bug, frankly - not supporting code in a
system-level include makes no sense to me. But I confess this seems to be PGI's
mode of operation as I'v
OpenMPI version,
> that the LD_LIBRARY_PATH is consistent.
> So I would like to re-compile the openmpi-1.7a1r22794.tar.bz2 but where can I
> find it?
>
>
> Thank you,
> Federico
>
> Il giorno 23 febbraio 2011 03:
FWIW: just tried current trunk on a multi-node cluster, and the loop_spawn test
worked fine there too.
On Mar 5, 2011, at 11:05 AM, Ralph Castain wrote:
> Hi Federico
>
> I tested the trunk today and it works fine for me - I let it spin for 1000
> cycles without issue. My tes
shprofile to export the correct
> LD_LIBRARY_PATH.
> - thank you for the useful trick about svn.
No idea, then - all that error says is that the receiving code and the sending
code are mismatched.
>
>
> Thank you very much !!!
> Federico.
>
You need to set your LD_LIBRARY_PATH to point to where you installed openmpi.
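For example, if the install prefix were $HOME/openmpi-1.4.3-install (adjust to wherever you actually installed it), adding something like this to your shell startup file should do it:
  export PATH=$HOME/openmpi-1.4.3-install/bin:$PATH
  export LD_LIBRARY_PATH=$HOME/openmpi-1.4.3-install/lib:$LD_LIBRARY_PATH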
On Mar 8, 2011, at 5:47 PM, Amos Leffler wrote:
> Hi,
>I am trying to get openmpi-1.4.3 to run but am having trouble.
> It is run using SUSE-11.3 with Intel XE-2011 Composer C and Fortran
> compilers. The comp
we can AC_DEFINE something to skip including that file
> in opal/util/if.h.
>
>
>
> On Mar 3, 2011, at 4:22 PM, Ralph Castain wrote:
>
>> Really appreciate you having looked into this!
>>
>> Unfortunately, I can't see a way to resolve this for the genera
11/bin:/home/fandreasi/libtool-2.2.6b/bin:$PATH
> > setenv LD_LIBRARY_PATH /home/fandreasi/libtool-2.2.6b/lib
> >
> > When I do the autogen it return me the error I've attached.
> > Can you help me on this ?
> >
> > Thank you,
> > Federico.
> >
The error is telling you that your OS doesn't support queries telling us what
cores are on which sockets, so we can't perform a "bind to socket" operation.
You can probably still "bind to core", so if you know what cores are in which
sockets, then you could use the rank_file mapper to assign pro
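A hypothetical sketch of the rankfile approach (host name and core numbering are made up - adjust them to match your actual socket layout):
  # myrankfile
  rank 0=hostA slot=0
  rank 1=hostA slot=1
  rank 2=hostA slot=2
  rank 3=hostA slot=3
  
  mpirun -np 4 -rf myrankfile ./a.out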
> but ended up getting same error. Is there any patch that I can install in my
> system to make it
> topology aware?
>
> Thanks
>
>
> On Thu, Mar 17, 2011 at 2:05 PM, Ralph Castain wrote:
> The error is telling you that your OS doesn't support queries telling
7 13:14:04 CDT 2009 x86_64 x86_64
> x86_64 GNU/Linux
>
>
> On Thu, Mar 17, 2011 at 2:55 PM, Ralph Castain wrote:
> What OS version is it?
>
> uname -a
>
> will tell you, if you are on linux.
>
> On Mar 17, 2011, at 1:31 PM, vaibhav dutt wrote:
>
>> Hi
Just looking at this for another question. Yes, SGE integration is broken in
1.5. Looking at how to fix now.
Meantime, you can get it to work by adding "-mca plm ^rshd" to your mpirun cmd
line.
On Mar 21, 2011, at 9:47 AM, Dave Love wrote:
> Terry Dontje writes:
>
>> Dave what version of Grid
Can you run anything under TM? Try running "hostname" directly from Torque to
see if anything works at all.
The error message is telling you that the Torque daemon on the remote node
reported a failure when trying to launch the OMPI daemon. Could be that Torque
isn't set up to forward environmen
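As a sanity check, something along these lines from an interactive Torque job should print every allocated node (pbsdsh is Torque's own TM-based launcher, so it exercises the same path OMPI uses to start its daemons):
  qsub -I -l nodes=2:ppn=1
  pbsdsh hostname       # tests TM launch by itself
  mpirun hostname       # tests OMPI's TM-based launch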
On Mar 21, 2011, at 11:12 AM, Dave Love wrote:
> Ralph Castain writes:
>
>> Just looking at this for another question. Yes, SGE integration is broken in
>> 1.5. Looking at how to fix now.
>>
>> Meantime, you can get it to work by adding "-mca plm ^rshd"
Ick - appears that got dropped a long time ago. I'll add it back in and post a
CMR for 1.4 and 1.5 series.
Thanks!
Ralph
On Mar 21, 2011, at 11:08 AM, David Turner wrote:
> Hi,
>
> About a month ago, this topic was discussed with no real resolution:
>
> http://www.open-mpi.org/community/list
> OMPI_COMM_WORLD_LOCAL_SIZE=1
> OMPI_MCA_orte_ess_jobid=3236233217
> OMPI_MCA_orte_ess_vpid=0
> OMPI_COMM_WORLD_RANK=0
> OMPI_COMM_WORLD_LOCAL_RANK=0
> OPAL_OUTPUT_STDERR_FD=19
>
> MPIExec with -mca plm rsh:
>
> [rsvancara@node164 ~]$ mpiexec -mca plm rsh -mca orte_tmpdi
.82:33559
>> OMPI_MCA_mpi_yield_when_idle=0
>> OMPI_MCA_orte_app_num=0
>> OMPI_UNIVERSE_SIZE=1
>> OMPI_MCA_ess=env
>> OMPI_MCA_orte_ess_num_procs=1
>> OMPI_COMM_WORLD_SIZE=1
>> OMPI_COMM_WORLD_LOCAL_SIZE=1
>> OMPI_MCA_orte_ess_jobid=3236233217
>> OM
wsuhpc.edu
>>>>> SHLVL=1
>>>>> HOME=/home/admins/rsvancara
>>>>> INTEL_LICENSES=/home/software/intel/Compiler/11.1/075/licenses:/opt/intel/licenses
>>>>> PBS_O_HOST=login1
>>>>> DYLD_LIBRARY_PATH=/home/software/intel/Compiler/11.1/075/tbb/intel6
On Mar 21, 2011, at 9:27 PM, Eugene Loh wrote:
> Gustavo Correa wrote:
>
>> Dear OpenMPI Pros
>>
>> Is there an MCA parameter that would do the same as the mpiexec switch
>> '-bind-to-core'?
>> I.e., something that I could set up not in the mpiexec command line,
>> but for the whole cluster, o
On Mar 22, 2011, at 6:02 AM, Dave Love wrote:
> Ralph Castain writes:
>
>>> Should rshd be mentioned in the release notes?
>>
>> Just starting the discussion on the best solution going forward. I'd
>> rather not have to tell SGE users to add this to
On a beowulf cluster? So you are using bproc?
If so, you have to use the OMPI 1.2 series - we discontinued bproc support at
the start of 1.3. Bproc will take care of the envars.
If not bproc, then I assume you will use ssh for launching? Usually, the
environment is taken care of by setting up y
On Mar 23, 2011, at 2:20 PM, Gus Correa wrote:
> Ralph Castain wrote:
>> On Mar 21, 2011, at 9:27 PM, Eugene Loh wrote:
>>> Gustavo Correa wrote:
>>>
>>>> Dear OpenMPI Pros
>>>>
>>>> Is there an MCA parameter that would do
On Mar 23, 2011, at 3:19 PM, Gus Correa wrote:
> Dear OpenMPI Pros
>
> Why am I getting the parser error below?
> It seems not to recognize comment lines (#).
>
> This is OpenMPI 1.4.3.
> The same error happens with the other compiler wrappers too.
> However, the wrappers compile and produce an
On Mar 24, 2011, at 12:45 PM, ya...@adina.com wrote:
> Thanks for your information. For my Open MPI installation, actually
> the executables such as mpirun and orted are dependent on those
> dynamic intel libraries, when I use ldd on mpirun, some dynamic
> libraries show up. I am trying to mak
>
> The whole problem must have been some computer daemon spell.
> Other than my growing feeling that flipping bits and logic gates
> have fun conspiring against my sanity, all is well now.
>
> Thank you,
> Gus Correa
>
>
>
> Gus Correa wrote:
>> Ralph Cast
Try adding some print statements so you can see where the error occurs.
On Mar 25, 2011, at 11:49 PM, Jack Bryan wrote:
> Hi , All:
>
> I running a Open MPI (1.3.4) program by 200 parallel processes.
>
> But, the program is terminated with
>
> ---
Have you tried a parallel debugger such as padb?
On Mar 26, 2011, at 10:34 AM, Jack Bryan wrote:
> Hi,
>
> I have tried this. But, the printout from 200 parallel processes makes it
> very hard to locate the possible bug.
>
> They may not stop at the same point when the program got signal 9.
>
On Mar 26, 2011, at 11:34 AM, Michele Marena wrote:
> Hi,
> I've a problem with shared memory. When my application runs using pure
> message passing (one process per node), it terminates and returns correct
> results. When 2 processes share a node and use shared memory for exchanging
> messages
You don't need to install anything in a system folder - you can just install it
in your home directory, assuming that is accessible on the remote nodes.
As for the script - unless you can somehow modify it to allow you to run under
a debugger, I am afraid you are completely out of luck.
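For what it's worth, a home-directory install is just the normal build with a different prefix - something like the sketch below (version number is only an example):
  ./configure --prefix=$HOME/openmpi-1.4.3
  make -j4 all install
  export PATH=$HOME/openmpi-1.4.3/bin:$PATH
  export LD_LIBRARY_PATH=$HOME/openmpi-1.4.3/lib:$LD_LIBRARY_PATH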
On Mar
mmunicating processes are on the same node the
> application doesn't terminate; otherwise the application terminates and its
> results are correct. My OpenMPI version is 1.2.7.
>
> 2011/3/26 Ralph Castain
>
> On Mar 26, 2011, at 11:34 AM, Michele Marena wrote:
>
> >
during
> compilation of your source and execution.
>
> -- Reuti
>
>
>> 2011/3/26 Ralph Castain
>> Can you update to a more recent version? That version is several years
>> out-of-date - we don't even really support it any more.
>>
>>
>> On Mar
I don't know, but Ashley may be able to help - or you can see his web site for
instructions.
Alternatively, since you can put print statements into your code, have you
considered using mpirun's option to direct output from each rank into its own
file? Look at "mpirun -h" for the options.
-o
If you use that mpirun option, mpirun will place the output from each rank into
a -separate- file for you. Give it:
mpirun --output-filename /myhome/debug/run01
and in /myhome/debug, you will find files:
run01.0
run01.1
...
each with the output from the indicated rank.
On Mar 26, 2011, at 3
That command line cannot possibly work. Both the -rf and --output-filename
options require arguments.
PLEASE read the documentation? mpirun -h, or "man mpirun" will tell you how to
correctly use these options.
On Mar 26, 2011, at 6:35 PM, Jack Bryan wrote:
> Hi, I used :
>
> mpirun -np 200
On Mar 27, 2011, at 7:37 AM, Tim Prince wrote:
> On 3/27/2011 2:26 AM, Michele Marena wrote:
>> Hi,
>> My application performs well without shared memory utilization, but with
>> shared memory I get worse performance than without it.
>> Do I make a mistake? Don't I pay attention to something?
It means that Torque is unhappy with your job - either you are running longer
than it permits, or you exceeded some other system limit.
Talk to your sys admin about imposed limits. Usually, there are flags you can
provide to your job submission that allow you to change limits for your program.
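For example (the exact resources and values depend entirely on how your site configured Torque/MOAB, so treat these as placeholders):
  qsub -l walltime=48:00:00 -l mem=8gb myjob.pbs
  # or inside the job script itself:
  #PBS -l walltime=48:00:00
  #PBS -l nodes=4:ppn=8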
ur overall application down*.
>>
>> How much does your application slow down in wall clock time? Seconds?
>> Minutes? Hours? (anything less than 1 second is in the noise)
>>
>>
>>
>> On Mar 27, 2011, at 10:33 AM, Ralph Castain wrote:
>>
>> >
>
It is hanging because your last nodes are not receiving the launch command.
The daemons receive a message from mpirun telling them what to launch. That
message is sent via a tree-like routing algorithm. So mpirun sends to the first
two daemons, each of which relays it on to some number of daemon
tual mpicc-wrapper-data.txt file
> that could cause this?
> Anything in the parser code?
>
> We have Linux CentOS 5.5 x86_64 with gcc 4.1.2.
> I built OpenMPI both with gfortran and Intel Ifort 12.0.0.
> Same problem on both builds.
>
> Thank you,
> Gus Correa
>
>
I'm afraid I have no idea what you are talking about. Are you saying you are
launching OMPI processes via mpirun, but with "pbsdsh" as the plm_rsh_agent???
That would be a very bad idea. If you are running under Torque, then let mpirun
"do the right thing" and use its Torque-based launcher.
On
On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
> Let me expand on this slightly (in response to Ralph Castain's posting
> -- I had digest mode set). As currently constructed a shellscript in
> Wien2k (www.wien2k.at) launches a series of tasks using
>
> ($remote $remotemachine "cd $PWD;$t $ttt
On Apr 3, 2011, at 9:12 AM, Reuti wrote:
> Am 03.04.2011 um 16:56 schrieb Ralph Castain:
>
>> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>>
>>> Let me expand on this slightly (in response to Ralph Castain's posting
>>> -- I had digest mode set).
On Apr 3, 2011, at 9:34 AM, Laurence Marks wrote:
> On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote:
>>
>> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote:
>>
>>> Let me expand on this slightly (in response to Ralph Castain's posting
>>
On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote:
>>>
>>> I am not using that computer. A scenario that I have come across is
>>> that when a msub job is killed because it has exceeded it's Walltime
>>> mpi tasks spawned by ssh may not be terminated because (so I am told)
>>> Torque does not kno
Works great for me...sleep is dead every time.
On Apr 3, 2011, at 3:13 PM, David Singleton wrote:
>
>> You can prove this to yourself rather easily. Just ssh to a remote node and
>> execute any command that lingers for awhile - say something simple like
>> "sleep". Then kill the ssh and do a
On Apr 3, 2011, at 3:22 PM, Reuti wrote:
> Am 03.04.2011 um 22:57 schrieb Ralph Castain:
>
>> On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote:
>>
>>>>>
>>>>> I am not using that computer. A scenario that I have come across is
>>>>&
On Apr 3, 2011, at 4:08 PM, Reuti wrote:
> Am 03.04.2011 um 23:59 schrieb David Singleton:
>
>> On 04/04/2011 12:56 AM, Ralph Castain wrote:
>>>
>>> What I still don't understand is why you are trying to do it this way. Why
>>> not just run
On Apr 3, 2011, at 4:37 PM, Laurence Marks wrote:
> On Sun, Apr 3, 2011 at 5:08 PM, Reuti wrote:
>> Am 03.04.2011 um 23:59 schrieb David Singleton:
>>
>>> On 04/04/2011 12:56 AM, Ralph Castain wrote:
>>>>
>>>> What I still don't un
oach.
Good luck!
>
> On Sun, Apr 3, 2011 at 6:13 PM, Ralph Castain wrote:
>>
>> On Apr 3, 2011, at 4:37 PM, Laurence Marks wrote:
>>
>>> On Sun, Apr 3, 2011 at 5:08 PM, Reuti wrote:
>>>> Am 03.04.2011 um 23:59 schrieb David Singleton:
>>
anned from the supercomputers I use
>> I want to find a adequate patch for myself --- and then try and
>> persuade the developers to adopt it.
>>
>> On Sun, Apr 3, 2011 at 6:13 PM, Ralph Castain wrote:
>>>
>>> On Apr 3, 2011, at 4:37 PM, Laurenc
On Apr 4, 2011, at 8:18 AM, Rob Latham wrote:
> On Sat, Apr 02, 2011 at 04:59:34PM -0400, fa...@email.com wrote:
>>
>> opal_mutex_lock(): Resource deadlock avoided
>> #0 0x0012e416 in __kernel_vsyscall ()
>> #1 0x01035941 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> #2 0x
Guess I can/will add the node name to the error message - should have been
there before now.
If it is a debug build, you can add "-mca plm_base_verbose 1" to the cmd line
and get output tracing the launch and showing you what nodes are having
problems.
On Apr 4, 2011, at 8:24 AM, Nehemiah Dac
The error message currently doesn't include the node name -
not in the OMPI main code base, nor in the SCT port. So I will add it, which
won't help you at the moment.
Hence my suggestion about using the param :-)
>
> On Mon, Apr 4, 2011 at 9:34 AM, Ralph Castain wrote:
> Guess I can/will
robust
> and portable. This passes the simple test with B of "sleep 600" when
> terminating the process where the mpirun is launched kills the sleep
> on a remote node (unlike ssh on some but not all computers).
>
> On Mon, Apr 4, 2011 at 6:35 AM, Ralph Castain wrote:
>
Well, where is libfui located? Is that location in your ld path? Is the lib
present on all nodes in your hostfile?
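A few quick checks along those lines (the binary name, node name, and library path are placeholders - use your own program and whichever node is failing):
  ldd ./your_program | grep -i fui                               # does the runtime linker resolve it locally?
  ssh some-backend-node 'ls /opt/sun/sunstudio12.1/lib/libfui*'  # is the lib present on the remote nodes?
Also make sure the same LD_LIBRARY_PATH is exported for non-interactive shells on every node, not just your login node.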
On Apr 4, 2011, at 1:58 PM, Nehemiah Dacres wrote:
> [jian@therock ~]$ echo $LD_LIBRARY_PATH
> /opt/sun/sunstudio12.1/lib:/opt/vtk/lib:/opt/gridengine/lib/lx26-amd64:/opt/gridengin
Did you request an allocation from PCM? If not, then PCM will block you from
arbitrarily launching jobs on non-allocated nodes. Print out your environment
and look for any envars from PCM and/or LSF (e.g., LSB_JOBID).
I don't know what you mean about "no OMPI application is yet integrated with
If I read your error messages correctly, it looks like mpirun is crashing - the
daemon is complaining that it lost the socket connection back to mpirun, and
hence will abort.
Are you seeing mpirun still alive?
On Apr 5, 2011, at 4:46 AM, jody wrote:
> Hi
>
> On my workstation and the cluste
> Warning: No xauth data; using fake authentication data for X11 forwarding.
> Last login: Wed Apr 6 17:12:31 CEST 2011 from chefli.uzh.ch on ssh
>
> So perhaps the whole problem is linked to that xauth-thing.
> Do you have a suggestion how this can be solved?
>
> Thank You
> J
Sorry Jody - I should have read your note more carefully to see that you
already tried -Y. :-(
Not sure what to suggest...
On Apr 6, 2011, at 12:29 PM, Ralph Castain wrote:
> Like I said, I'm not expert. However, a quick "google" of revealed this
> result:
>
>
Here's a little more info - it's for Cygwin, but I don't see anything
Cygwin-specific in the answers:
http://x.cygwin.com/docs/faq/cygwin-x-faq.html#q-ssh-no-x11forwarding
On Apr 6, 2011, at 12:30 PM, Ralph Castain wrote:
> Sorry Jody - I should have read your note more car
Look at your output from mpicc --showme. It indicates that the OMPI libs were
put in the lib64 directory, not lib.
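A quick way to see exactly what the wrapper is linking against and adjust (the install prefix below is just a placeholder):
  mpicc --showme:link
  export LD_LIBRARY_PATH=/your/ompi/prefix/lib64:$LD_LIBRARY_PATH   # point at the lib64 directory the wrapper reported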
On Apr 6, 2011, at 1:38 PM, Nehemiah Dacres wrote:
> I am also trying to get netlib's hpl to run via sun cluster tools so i am
> trying to compile it and am having trouble. Which
> What should they be in?
>
>
> On Wed, Apr 6, 2011 at 2:44 PM, Ralph Castain wrote:
> Look at your output from mpicc --showme. It indicates that the OMPI libs were
> put in the lib64 directory, not lib.
>
>
> On Apr 6, 2011, at 1:38 PM, Nehemiah Dacres wrote:
>
>
On Apr 6, 2011, at 1:27 PM, Jason Palmer wrote:
> Hello,
>
> I’m trying again with the 1.4.3 version to use compile openmpi statically
> with my program … but I’m running into a more basic problem, similar to one I
> previously encountered and solved using LD_LIBRARY_PATH.
>
> The configure
On Apr 6, 2011, at 1:21 PM, David Gunter wrote:
> We tend to build OMPI for several different architectures. Rather than untar
> the archive file each time I'd rather do a "make distclean" in between
> builds. However, this always produces the following error:
>
> ...
> Making distclean in li
Are you able to run non-MPI programs like "hostname"?
I ask because that error message indicates that everything started just fine,
but there is an error in your application.
On Apr 6, 2011, at 6:01 PM, Jason Palmer wrote:
> Btw, I did compile openmpi with the --with-sge flag.
>
> I am able t
On Apr 11, 2011, at 11:33 PM, kevin.buck...@ecs.vuw.ac.nz wrote:
>
>>> #!/bin/bash
>>> #$ -cwd
>>> #$ -j y
>>> #$ -S /bin/bash
>>> #$ -q all.q
>>> #$ -pe orte 18
>>> MPI_DIR=/home/jason/openmpi-1.4.3-install/bin
>>> /home/jason/openmpi-1.4.3-install/bin/mpirun -np $NSLOTS myprog
>
>
>> If you
Let's simplify the issue as we have no idea what your codes are doing.
Can you run two copies of hostname, for example?
What about multiple copies of an MPI version of "hello" - see the examples
directory in the OMPI tarball.
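Concretely (hello_c.c ships in the examples/ directory of the Open MPI tarball):
  mpirun -np 2 hostname
  mpicc examples/hello_c.c -o hello_c
  mpirun -np 2 ./hello_c
If those work but your codes don't, the problem is in the applications rather than the launch.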
On Apr 12, 2011, at 8:43 AM, Stergiou, Jonathan C CIV NSWCCD West Be
Okay, that says that mpirun is working correctly - the problem appears to be in
MPI_Init.
How was OMPI configured?
On Apr 12, 2011, at 9:24 AM, Stergiou, Jonathan C CIV NSWCCD West Bethesda,
6640 wrote:
> Ralph,
>
> Thanks for the reply and guidance.
>
> I ran the following:
>
> $> mpirun
What version are you using? If you are using 1.5.x, there is an "orte-top"
command that will do what you ask. It queries the daemons to get the info.
On Apr 12, 2011, at 9:55 PM, Jack Bryan wrote:
> Hi , All:
>
> I need to monitor the memory usage of each parallel process on a linux Open
> M
On Apr 13, 2011, at 8:13 AM, Rushton Martin wrote:
> The bulk of our compute nodes are 8 cores (twin 4-core IBM x3550-m2).
> Jobs are submitted by Torque/MOAB. When run with up to np=8 there is
> good performance. Attempting to run with more processors brings
> problems, specifically if any one
nvironment before printing this email.
> -Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Ralph Castain
> Sent: 13 April 2011 15:34
> To: Open MPI Users
> Subject: Re: [OMPI users] Over committing?
>
>
> On Apr
On Apr 13, 2011, at 10:29 AM, Jack Bryan wrote:
> Hi ,
>
> If I cannot ssh to a worker node, it means that my program cannot work
> correctly ?
No, that's not true. People thought you were on a cluster using ssh as the
launcher. From prior notes, you were using Torque, so not being allowed
On Apr 13, 2011, at 10:19 AM, Jack Bryan wrote:
> Hi, I am using
>
> mpirun (Open MPI) 1.3.4
>
> But, I have these,
>
> orte-clean  orted  orte-iof  orte-ps  orterun
>
> Can they do the same thing ?
Unfortunately, no
>
> If I use them, will they use a lot of memory on each wo
tried test jobs on 8+7 (or 7+8) with inconclusive
> results.
>> Some of the live jobs run for a month or more and cut down versions do
>
>> not model well.
>>
>> Martin Rushton
>> HPC System Manager, Weapons Technologies
>> Tel: 01959 514777, Mobile: 07939
Difficult to follow your thread here, but I think you're wondering about
post-job cleanup?
Torque runs an epilogue script on all nodes included in the allocation. It is
advisable to always have the epilogue script clean out the tmp directories,
assuming single-user use of allocated nodes. If mu
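A minimal sketch of such an epilogue, assuming the OMPI 1.4-style session directories under /tmp and Torque's convention of passing the job owner as the second argument - verify both against your own setup before deploying anything like this:
  #!/bin/sh
  # Torque epilogue: $1 = job id, $2 = job owner
  rm -rf /tmp/openmpi-sessions-"$2"* 2>/dev/null
  exit 0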
vering customer-focused solutions
>>
>> Please consider the environment before printing this email.
>> -Original Message-
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
>> Behalf Of Ralph Castain
>> Sent: 14 April 2011 04:55
>> To
Not much we can say with that little info. :-/
Are you using Open MPI? If so, what version?
When you say the job gets restarted, do you mean that Condor restarts the
entire MPI job? If so, you had best talk to the Condor folks - it has nothing
to do with Open MPI, but is due to a job control fl