Ha! I finally tracked it down - a new code path that bypassed the prior error
output. I have a fix going into master shortly, and will then port it to 1.10.1.
Thanks for your patience!
Ralph
> On Sep 24, 2015, at 1:12 AM, Patrick Begou
> wrote:
>
> Sorry for the delay. Running mpirun with a wr
Sorry for the delay. Running mpirun with a wrong OMPI_MCA_plm_rsh_agent doesn't
give any explicit message in OpenMPI-1.10.0.
Here is how I can show the problem:
I request 2 nodes, 1 CPU on each node, 4 cores on each CPU (so 8 cores available
with cpusets). The node file is:
[begou@frog7 MPI_TESTS]$ cat $O
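Roughly, the launch sequence looks like this (the node file variable and the program name below are only placeholders for my setup):

  export OMPI_MCA_plm_rsh_agent=oarshmost     # oarshmost not reachable in $PATH here
  mpirun -np 8 -hostfile $OAR_NODEFILE ./my_mpi_app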
I’m really puzzled by that one - we very definitely will report an error and
exit if the user specifies that MCA param and we don’t find the given agent.
Could you please send us the actual cmd line plus the hostfile you gave, and
verify that the MCA param was set?
> On Sep 21, 2015, at 8:42 A
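For instance, something like this should be enough to confirm that the param is exported and whether the agent resolves (just a sketch, exact output will vary):

  env | grep OMPI_MCA_plm_rsh_agent
  which oarshmost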
Patrick,
thanks for the report.
Can you confirm that what happened was:
- you defined
OMPI_MCA_plm_rsh_agent=oarshmost
- oarshmost was not in the $PATH
- mpirun silently ignored the remote nodes
If that is correct, then I think mpirun should have reported an error
(oarshmost not found, or cannot remot
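As a quick sanity check before launching, something along these lines would catch it (a sketch):

  type oarshmost >/dev/null 2>&1 || echo "oarshmost is not in PATH"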
Hi Gilles,
I've made a big mistake! While compiling the patched version of OpenMPI and creating a
new module, I forgot to add the path to the oarshmost command while
OMPI_MCA_plm_rsh_agent=oarshmost was set.
OpenMPI was silently ignoring the oarshmost command, as it was unable to find it, and
so only
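The fix on my side is simply to put the command back in the environment; roughly (the install directory below is a placeholder for my module):

  export PATH=/opt/oar/bin:$PATH            # hypothetical location of oarshmost
  export OMPI_MCA_plm_rsh_agent=oarshmost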
Thanks Patrick,
Could you please try again with the --hetero-nodes mpirun option?
(I am afk, and not 100% sure about the syntax.)
Could you also submit a job with 2 nodes and 4 cores on each node that does:
cat /proc/self/status
oarshmost cat /proc/self/status
btw, is there any reason why do yo
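For example (assuming oarshmost takes the remote host as its first argument, like ssh; node2 is a placeholder):

  grep Cpus_allowed_list /proc/self/status                   # on the local node
  oarshmost node2 grep Cpus_allowed_list /proc/self/status   # on a remote node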
Gilles Gouaillardet wrote:
Patrick,
by the way, this will work when running on a single node.
I do not know what will happen when you run on multiple nodes...
Since there is no OAR integration in OpenMPI, I guess you are using ssh to
start orted on the remote nodes
(unless you instructed ompi
Patrick,
by the way, this will work when running on a single node.
I do not know what will happen when you run on multiple nodes...
Since there is no OAR integration in OpenMPI, I guess you are using ssh
to start orted on the remote nodes
(unless you instructed ompi to use an OARified version
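In other words, the OAR-aware launcher has to be configured explicitly, e.g. something like this (a sketch; $OAR_NODEFILE and ./a.out are placeholders):

  mpirun --mca plm_rsh_agent oarshmost -np 8 -hostfile $OAR_NODEFILE ./a.out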
Patrick,
I just filed PR 586 https://github.com/open-mpi/ompi-release/pull/586
for the v1.10 series.
It is only a three-line patch.
Could you please give it a try?
Cheers,
Gilles
On 9/18/2015 4:54 PM, Patrick Begou wrote:
Ralph Castain wrote:
As I said, if you don’t provide an explicit
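One possible way to try it (a sketch: GitHub serves a .patch file for the PR, and the source directory and install prefix are placeholders):

  cd openmpi-1.10.0
  curl -L https://github.com/open-mpi/ompi-release/pull/586.patch | patch -p1
  ./configure --prefix=$HOME/openmpi-1.10-patched && make -j4 install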
Ralph Castain wrote:
As I said, if you don't provide an explicit slot count in your hostfile, we
default to allowing oversubscription. We don't have OAR integration in OMPI,
and so mpirun isn't recognizing that you are running under a resource manager
it thinks this is just being controlled by a hostfile.
Thanks Gilles!!
On Wed, Sep 16, 2015 at 9:21 PM, Gilles Gouaillardet
wrote:
> Ralph,
>
> you can reproduce this with master by manually creating a cpuset with fewer
> cores than available,
> and invoking mpirun with -bind-to core from within the cpuset.
>
> i made PR 904 https://github.com/open-mp
Ralph,
you can reproduce this with master by manually creating a cpuset with
fewer cores than available,
and invoking mpirun with -bind-to core from within the cpuset.
I made PR 904 https://github.com/open-mpi/ompi/pull/904
Brice,
Can you please double check the hwloc_bitmap_isincluded invocation
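For anyone reproducing this by hand, a rough sketch using the cgroup-v1 cpuset filesystem (needs root, and assumes cgroups are mounted under /sys/fs/cgroup):

  mkdir /sys/fs/cgroup/cpuset/ompitest
  echo 0-3 > /sys/fs/cgroup/cpuset/ompitest/cpuset.cpus    # allow only 4 cores
  echo 0 > /sys/fs/cgroup/cpuset/ompitest/cpuset.mems
  echo $$ > /sys/fs/cgroup/cpuset/ompitest/tasks           # move this shell into the cpuset
  mpirun -np 8 -bind-to core ./a.out                       # more ranks than allowed cores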
As I said, if you don’t provide an explicit slot count in your hostfile, we
default to allowing oversubscription. We don’t have OAR integration in OMPI,
and so mpirun isn’t recognizing that you are running under a resource manager -
it thinks this is just being controlled by a hostfile.
If you
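For example, a hostfile with explicit slot counts looks like this (host names are placeholders):

  node1 slots=4
  node2 slots=4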
Thanks all for your answers, I've added some details about the tests I have
run. See below.
Ralph Castain wrote:
Not precisely correct. It depends on the environment.
If there is a resource manager allocating nodes, or you provide a hostfile
that specifies the number of slots on the nodes,
“We” do check the available cores - which is why I asked for details :-)
> On Sep 15, 2015, at 7:10 PM, Gilles Gouaillardet
> wrote:
>
> Ralph,
>
> my guess is that the cpuset is set by the batch manager (slurm?)
> so I think this is an ompi bug/missing feature:
> "we" should check the available
Ralph,
my guess is that the cpuset is set by the batch manager (slurm?)
so I think this is an ompi bug/missing feature:
"we" should check the available cores (4 in this case because of cpuset)
instead of the online cores (8 in this case)
I wrote "we" because it could either be ompi or hwloc, or ompi
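For example, with hwloc's lstopo the difference is directly visible (--whole-system is the hwloc 1.x flag that includes disallowed resources):

  lstopo --no-io                   # only the cores allowed by the cpuset
  lstopo --no-io --whole-system    # all online cores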
Not precisely correct. It depends on the environment.
If there is a resource manager allocating nodes, or you provide a hostfile that
specifies the number of slots on the nodes, or you use -host, then we default
to no-oversubscribe.
If you provide a hostfile that doesn’t specify slots, then we
Looking at the Open MPI 1.10.0 man page:
https://www.open-mpi.org/doc/v1.10/man1/mpirun.1.php
it looks like perhaps -oversubscribe (which was an option) is now the
default behavior. Instead we have:
-nooversubscribe, --nooversubscribe: Do not oversubscribe any nodes; error
(without starting an
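So to get the old behaviour back one would pass the flag explicitly, e.g. (a sketch; my_hostfile and ./a.out are placeholders):

  mpirun --nooversubscribe -np 8 -hostfile my_hostfile ./a.out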
Hi,
I'm running OpenMPI 1.10.0 built with the Intel 2015 compilers on a Bullx system.
I have some trouble with the bind-to core option when using cpusets.
If the cpuset contains fewer than all the cores of a CPU (e.g. 4 cores allowed on an
8-core CPU), OpenMPI 1.10.0 allows these cores to be overloaded until the m
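One way to see where the ranks actually land is mpirun's --report-bindings option, e.g. (a sketch; ./my_mpi_app is a placeholder):

  mpirun -np 8 -bind-to core --report-bindings ./my_mpi_app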