With a bad configuration you could start all forks on the master node of the parallel job and leave the slaves idling. Open MPI will do the right thing on its own.
-- Reuti
# qconf -sp orte
pe_name       orte
slots         999
user_lists    NONE
xuser_lists   NONE
start_p
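For reference, a typical tight-integration PE for Open MPI looks like this sketch (allocation_rule is site-dependent; control_slaves must be TRUE so SGE can track the remote orted processes):

# qconf -sp orte
pe_name            orte
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min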
cluster, you might want to look into a batch queuing system. In fact, we even use it locally on some machines to serialize the workflow.
-- Reuti
On 12.11.2008 at 14:40, Fabian Hänsel wrote:
So, to make sure I understand what happens... This command:
mpirun -np 2 myprog
starts the program
$ qrsh -pe orte 4 /path/to/binary
If you really need a shell, you can get one with:
$ qrsh -pe orte 4 bash -il
-- Reuti
Also, I have an initialization script from the vendor that requires
setting up local temporary directories. Prior to migration to OMPI
we just parsed the machines file th
On 13.11.2008 at 05:41, Scott Beardsley wrote:
Reuti wrote:
qlogin will create a completely fresh bash, which is not aware of
running under SGE. Although you could set the SGE_* variables by
hand, it's easier to use an interactive session with:
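For reference, a sketch contrasting the two approaches (the qrsh form is quoted earlier in this digest; the SGE variable values here are hypothetical):

$ qrsh -pe orte 4 bash -il
versus exporting the SGE_* variables by hand after a plain qlogin:
$ export SGE_ROOT=/usr/sge
$ export SGE_CELL=default
$ export JOB_ID=4711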
In the past we'd source some sge
job_is_first_task FALSE
-- Reuti
urgency_slots min
# /opt/openmpi_intel/1.2.8/bin/ompi_info | grep gridengine
MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
The SGE error and output files for th
Hi,
On 24.12.2008 at 07:55, Sangamesh B wrote:
Thanks Reuti. That sorted out the problem.
Now mpiblast is able to run, but only on a single node, i.e. mpiformatdb -> 4 fragments, mpiblast -> 4 processes. Since each node has 4 cores, the job will run on a single node and works fine. Wit
execd_params ENABLE_ADDGRP_KILL=1" in your SGE configuration, to have the ability to kill all the created xterm processes from SGE.)
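For reference, a sketch of how this parameter is typically set (qconf -mconf opens the global cluster configuration in an editor):

$ qconf -mconf
execd_params    ENABLE_ADDGRP_KILL=1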
HTH - Reuti
On 16.01.2009 at 22:20, Jeff Dusenberry wrote:
Reuti wrote:
On 15.01.2009 at 16:20, Jeff Dusenberry wrote:
I'm trying to launch multiple xterms under OpenMPI 1.2.8 and the
SGE job scheduler for purposes of running a serial debugger. I'm
experiencing file-locking probl
On 16.01.2009 at 23:06, Reuti wrote:
On 16.01.2009 at 22:20, Jeff Dusenberry wrote:
Reuti wrote:
On 15.01.2009 at 16:20, Jeff Dusenberry wrote:
I'm trying to launch multiple xterms under OpenMPI 1.2.8 and the
SGE job scheduler for purposes of running a serial debugger.
call to getenv, so I am not sure if mpiexec on Fedora supports MPIEXEC_PORT_RANGE.
This variable is only part of MPICH(2), not Open MPI. Similar discussions were already on the list, and one solution could be:
http://www.open-mpi.org/community/lists/users/2007/08/3962.php
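For comparison, a sketch of how TCP ports can be restricted in Open MPI via MCA parameters instead (assuming the btl_tcp_port_min_v4/btl_tcp_port_range_v4 parameters; availability varies by Open MPI version):

$ mpirun --mca btl_tcp_port_min_v4 46000 --mca btl_tcp_port_range_v4 1000 -np 4 ./a.out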
-- Reuti
mpirun will occur on one of the nodes and make the connections to the other nodes. Is this also possible? Do you need ssh for sure?
-- Reuti
--
A daemon (pid 8462) died unexpectedly with status 129 while attempting
to
ulimit -l unlimited" near the top of the SGE startup script on the computation nodes and restarting SGE on every node.
Did you request/set any limits with SGE's h_vmem/h_stack resource
request?
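For reference, such limits would have been requested at submission time like this (a sketch):

$ qsub -l h_vmem=2G -l h_stack=128M job.sh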
-- Reuti
Jeremy Stout
On Sat, Jan 24, 2009 at 6:06 AM, Sangamesh B wrote:
Hello all,
On 25.01.2009 at 06:16, Sangamesh B wrote:
Thanks Reuti for the reply.
On Sun, Jan 25, 2009 at 2:22 AM, Reuti wrote:
On 24.01.2009 at 17:12, Jeremy Stout wrote:
The RLIMIT error is very common when using OpenMPI + OFED + Sun Grid
Engine. You can find more information and several
:
$ cat err.77.CPMD-OMPI
ssh_exchange_identification: Connection closed by remote host
I think this might already be the reason why it's not working. Is an mpihello program running fine through SGE?
-- Reuti
--
A daemon
On 31.01.2009 at 08:49, Sangamesh B wrote:
On Fri, Jan 30, 2009 at 10:20 PM, Reuti wrote:
On 30.01.2009 at 15:02, Sangamesh B wrote:
Dear Open MPI,
Do you have a solution for the following problem of Open MPI (1.3) when run through Grid Engine?
I changed global execd par
On 01.02.2009 at 16:00, Sangamesh B wrote:
On Sat, Jan 31, 2009 at 6:27 PM, Reuti wrote:
On 31.01.2009 at 08:49, Sangamesh B wrote:
On Fri, Jan 30, 2009 at 10:20 PM, Reuti wrote:
On 30.01.2009 at 15:02, Sangamesh B wrote:
Dear Open MPI,
Do you have a solution for the following
On 02.02.2009 at 05:44, Sangamesh B wrote:
On Sun, Feb 1, 2009 at 10:37 PM, Reuti wrote:
On 01.02.2009 at 16:00, Sangamesh B wrote:
On Sat, Jan 31, 2009 at 6:27 PM, Reuti wrote:
On 31.01.2009 at 08:49, Sangamesh B wrote:
On Fri, Jan 30, 2009 at 10:20 PM, Reuti wrote:
3431 \_ /home/reuti/mpihello
3433 3431 \_ /home/reuti/mpihello
-- Reuti
On Jan 29, 2009, at 3:05 PM, Rolf vandeVaart wrote:
I have not seen this before. I assume that for some reason, the
shared memory transport layer cannot create the file it uses for
communicating within a node. Ope
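A common workaround in such cases is to point the session directory at a writable location; a sketch, assuming the orte_tmpdir_base MCA parameter of the 1.3-era releases:

$ mpirun --mca orte_tmpdir_base /tmp -np 4 ./a.out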
On 02.02.2009 at 11:31, Sangamesh B wrote:
On Mon, Feb 2, 2009 at 12:15 PM, Reuti wrote:
On 02.02.2009 at 05:44, Sangamesh B wrote:
On Sun, Feb 1, 2009 at 10:37 PM, Reuti wrote:
On 01.02.2009 at 16:00, Sangamesh B wrote:
On Sat, Jan 31, 2009 at 6:27 PM, Reuti wrote:
will automatically be forwarded to the remote nodes.
I have set LD_LIBRARY_PATH, but it still doesn't work.
Could you help me? Thanks in advance.
Can you SSH by hand? - Reuti
--
Ing. Gabriele Fatigati
Parallel programmer
CINECA Systems & Tecnologies Department
Supercomput
Hi,
the daemons will fork into daemon land - no accounting, no control by
SGE via qdel (nevertheless it runs, just not tightly integrated):
https://svn.open-mpi.org/trac/ompi/ticket/1783
-- Reuti
On 26.02.2009 at 06:13, Sangamesh B wrote:
Hello Reuti,
I'm sorry for the
jobs) and don't schedule any further jobs to this node. Or just kill all jobs running on the node which should be excluded - of course, you might lose the computing time already spent on this node.
-- Reuti
Thank you in advance!
márcia.
Hi,
it shouldn't be necessary to supply a machinefile, as the one generated by SGE is used automatically (i.e. the granted nodes are honored). Did you submit the job requesting a PE?
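For reference, requesting a PE at submission time looks like this (a sketch using the orte PE named in this digest):

$ qsub -pe orte 16 job.sh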
-- Reuti
On 18.03.2009 at 04:51, Salmon, Rene wrote:
Hi,
I have looked through the list archive
Bernhard,
On 18.03.2009 at 09:19, Bernhard Knapp wrote:
come on, it must somehow be possible to use openmpi on a port other than 22!? ;-)
it's not an issue of Open MPI but of ssh. You need a file ~/.ssh/config in your home directory with two lines:
host *
port 1234
or whatever port you need.
--
e you have to set this LD_LIBRARY_PATH in your .cshrc, so it's
known automatically on the nodes.
mpirun --mca plm_base_verbose 20 --prefix /bphpc7/vol0/salmr0/ompi -np
16 /bphpc7/vol0/salmr0/SGE/a.out
Do you use --mca... only for debugging or why is it added here?
-- Reuti
We are
IK in 1.3.2, so that the daemons are still bound to a running sge_shepherd.
If you need the -notify feature and correct accounting, you will need to wait until the qrsh_starter in SGE is fixed not to exit when it receives a USR1/2.
-- Reuti
cpu 0.360
cpu 0.480
by convention. And having it in the same order as the name avoids the offset, I mean .1 = node01, .2 = node02...
I don't know whether this is related in any way to the effect you observe.
Cheers - Reuti
I've been unable to find a file that contains only the name of my
second no
2
setenv is csh; you are using (ba)sh.
export OMP_NUM_THREADS=2
but most likely you won't need it at all.
-- Reuti
echo I hope you find the correct number of processors
echo $OMP_NUM_THREADS
##
# above 3 lines produce the followin
so act as a NIS,
NTP and SGE qmaster server? You mentioned only the nodes.
-- Reuti
These are some of our thoughts. We know that the distribution choice
as well as the cluster management software will apply only ONCE and we
will not be able to test/change it easily...
Thanks very much for you
t all, or will there be timing issues due to communication timeouts?
HTH - Reuti
Just want to ensure I understand the scenario here, as that is something obviously unique to GE.
Thanks
Ralph
On 3/12/07 9:42 AM, "Olesen, Mark" wrote:
I'm testing openmpi 1.2rc1 with Gri
aris the additional group ID used to determine the processes to signal, but the default on Linux is the process group (unless otherwise configured in the SGE config)?
-- Reuti
Usually when the child processes aren't started up properly, there is a high chance that the qrsh or orte daem
r Linux) the orte-daemons could survive although the job was already killed (by process group), as the final stop/kill can't be caught and forwarded.
I'll check this ASAP with 1.2-beta. I only have access to Linux clusters.
But now we are going beyond Mark's initial problem.
On 12.03.2007 at 21:29, Ralph Castain wrote:
On 3/12/07 2:18 PM, "Reuti" wrote:
On 12.03.2007 at 20:36, Ralph Castain wrote:
ORTE propagates the signal to the application processes, but the ORTE daemons never actually look at the signal themselves (looks just like a messa
ode39/job_scripts/45250
19927 19926 19926 T |   \_ mpirun -np 4 /home/reuti/mpihello
19928 19927 19926 T |       \_ qrsh -inherit -noshell -nostdin -V node39 /home/reuti/local/openmpi-1.2rc3/bin/orted --no-daemonize --bootpr
19934 19928 19926 T |       |   \_ /usr/sge/utilbin/lx2
finitely would cause shutdown issues
They get a kill for sure, but no stop.
Do you have access to an SGE cluster?
-- Reuti
with respect to cleanup and possibly cause mpirun and/or your application to "hang". Again, we don't trap those signals in the daemon (only in mpirun
ing under the
control of a queuing system. It should use `qrsh` in your case.
What does:
mpiexec --version
ompi_info | grep grid
reveal? What does:
qconf -sconf | egrep "(command|daemon)"
show?
-- Reuti
> Cheers,
>
> -David Laidlaw
>
> He
ing the
applications.
Side note: Open MPI binds the processes to cores by default. In case more than
one MPI job is running on a node one will have to use `mpiexec --bind-to none
…` as otherwise all jobs on this node will use core 0 upwards.
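A quick way to check the resulting placement (a sketch; --report-bindings is a standard mpiexec option):

$ mpiexec --report-bindings --bind-to none -np 4 ./a.out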
-- Reuti
> Thanks!
>
> -David Laidlaw
>
he length of the hostname where it's running on?
If the admins are nice, they could define a symbolic link directly as /scratch pointing to /var/spool/sge/wv2/tmp and set up /scratch as TMPDIR in the queue configuration. Same effect and location as now, but it saves some characters.
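A sketch of that setup (the spool path is the one quoted above; the queue name all.q is hypothetical):

# ln -s /var/spool/sge/wv2/tmp /scratch    (on each node)
$ qconf -mq all.q                          (then set: tmpdir /scratch)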
-- Reuti
e.test/bin/grid-sshd -i
> rlogin_command builtin
> rlogin_daemon builtin
> rsh_command builtin
> rsh_daemon builtin
That's fine. I wondered whether rsh_* would contain a redirection to
> that the feature was already there!)
>
> For the most part, this whole thing needs to get documented.
Especially that the colon is a disallowed character in the directory name: any suffix :foo will just be removed, AFAICS, without any error output about foo being an unknown option.
--
ing all necessary environment variables inside the job script itself, so that it is self-contained.
Maybe they judge it a security issue, as this variable would also be present in case you run a queue prolog/epilog as a different user. For the plain job itself it wouldn't matter, IMO.
And for any further investigation: which problem do you face in detail?
-- Reuti
ch node only once for sure. AFAIR there was a setting in Torque to allow or disallow multiple elections of the fixed allocation rule per node.
HTH -- Reuti
tell the open-mpi where it is installed?
There is OPAL_PREFIX to be set:
https://www.open-mpi.org/faq/?category=building#installdirs
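For reference, relocating an installation with OPAL_PREFIX looks like this (a sketch; the path is hypothetical):

$ export OPAL_PREFIX=/new/location/openmpi
$ export PATH=$OPAL_PREFIX/bin:$PATH
$ export LD_LIBRARY_PATH=$OPAL_PREFIX/lib:$LD_LIBRARY_PATH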
-- Reuti
lved and/or
replace vader? This was the reason I found '-mca btl ^openib' more appealing
than listing all others.
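For reference, the two styles side by side (a sketch; both are standard mpirun syntax, and self,vader,tcp is just an illustrative list):

$ mpirun --mca btl ^openib -np 8 ./a.out
$ mpirun --mca btl self,vader,tcp -np 8 ./a.out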
-- Reuti
> Prentice
>
> On 7/23/20 3:34 PM, Prentice Bisbal wrote:
>> I manage a cluster that is very heterogeneous. Some nodes have InfiniBand,
>> while
Hi,
what about putting the "-static-intel" into a configuration file for the Intel compiler? Besides the default configuration, one can have a local one and put its path in an environment variable IFORTCFG (there are other ones for C/C++).
$ cat myconf
--version
$ export IFORTCFG=/
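A fuller sketch of the idea (IFORTCFG is the Intel Fortran mechanism named above; file contents and paths are hypothetical):

$ cat $HOME/myconf
-static-intel
$ export IFORTCFG=$HOME/myconf
$ ifort prog.f90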
`ldd`.)
Looks like I can get the intended behavior while configuring Open MPI on this
(older) system:
$ ./configure … LDFLAGS=-Wl,--enable-new-dtags
-- Reuti
Hi Jeff,
> On 09.08.2022 at 16:17, Jeff Squyres (jsquyres) via users wrote:
>
> Just to follow up on this thread...
>
> Reuti: I merged the PR onto the main docs branch. They're now live -- we
> changed the text:
> • here:
> https://docs.open-mpi
Error: Type mismatch in argument ‘pset_name_len’ at (1); passed INTEGER(8) to INTEGER(4)
-- Reuti
*) https://diracprogram.org/doc/release-25/installation/int64/mpi64.html