James,
for the v1.8/v1.10 series, the fix is available at
https://github.com/ggouaillardet/ompi-release/commit/c301bab8c9aff76eb7a3ee56b965b6ff3cf0073c.diff
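For reference, one way to fetch and apply that commit to a source tree might be (a sketch, run from the top of the tree; untested):
wget https://github.com/ggouaillardet/ompi-release/commit/c301bab8c9aff76eb7a3ee56b965b6ff3cf0073c.diff
patch -p1 < c301bab8c9aff76eb7a3ee56b965b6ff3cf0073c.diff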
FWIW:
- I ran the test program under the debugger, and the datatype is the
same before and after MPI_Type_create_resized (e.g. the compile …
I'm unaware of any change that would impact you here. For some reason,
mpirun believes you are requesting multiple cpus-per-proc, and that seems
to be the heart of the problem. Is there an MCA parameter in your
environment or default param file, perhaps?
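For example, something along these lines should show any MCA settings in effect (the install prefix below is a placeholder):
env | grep OMPI_MCA_                       # parameters set via the environment
cat ~/.openmpi/mca-params.conf             # per-user defaults, if the file exists
cat <prefix>/etc/openmpi-mca-params.conf   # system-wide defaults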
On Wed, Jan 27, 2016 at 2:57 PM, Ben Menadue wrote:
Thanks Ralph,
There are no MCA parameters in my environment at all. Here are the contents of
openmpi-mca-params.conf:
mpi_leave_pinned = 0
hwloc_base_binding_policy = core
rmaps_base_mapping_policy = core
hwloc_base_mem_alloc_policy = local_only
shmem_mmap_enable_nfs_warning = 0
pml = ^ya
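For what it's worth, where a given parameter's value comes from can usually be checked with something like the following (the grep target is illustrative, and the exact output format varies by release):
ompi_info --all --level 9 | grep rmaps_base_mapping_policy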
Hi Ben and Ralph, just a very short comment.
The error message suggests that hardware detection is not working
correctly, because it reports the number of cpus as zero.
>
> #cpus-per-proc: 1
>
> number of cpus: 0
>
> map-by: BYSOCKET:NOOVERSUBSCRIBE
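(For what it's worth, the mapper's view can usually be dumped by raising the rmaps verbosity, e.g. something like the following; the level is illustrative:)
mpirun --mca rmaps_base_verbose 10 -np 2 ./a.out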
Regards,
Tetsuya
Actually, looking at the output, it appears that we are correctly detecting
the cpus. It looks instead like there is some other setting that is
overriding the discovery.
Is your allocation setting a specific cpuset? Or are you allocating the
entire node?
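For instance, from inside the job you could compare the cgroup's cpuset with the full set of online cpus (the pbspro cgroup path below matches the one that appears later in this thread, but the mount point varies by system):
cat /cgroup/cpuset/pbspro/$PBS_JOBID/cpuset.cpus
cat /sys/devices/system/cpu/online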
On Thu, Jan 28, 2016 at 3:19 PM, Gilles Gouaillardet wrote:
Ben,
with respect to PBS, are both Open MPI versions built the same way?
e.g. configure --with-tm=/opt/pbs/default or something similar
you can run
mpirun --mca plm_base_verbose 100 --mca ess_base_verbose 100 --mca
ras_base_verbose 100 hostname
and you should see the "tm" module in the logs.
I notice …
Hi Gilles,
> with respect to PBS, are both Open MPI versions built the same way?
> e.g. configure --with-tm=/opt/pbs/default or something similar
Both are built against TM explicitly using the --with-tm option.
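FWIW, the presence of the tm components in a given build can be confirmed with something like (the grep pattern is just illustrative):
ompi_info | grep -i ' tm '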
> you can run
> mpirun --mca plm_base_verbose 100 --mca ess_base_verbose 100 --mca
> ras_base_ve…
Hi Gilles, Ralph,
Okay, it definitely seems to be due to the cpuset having only one of the
hyperthreads of each physical core:
[13:02:13 root@r60:4363542.r-man2] # echo 0-15 > cpuset.cpus
13:03 bjm900@r60 ~ > cat /cgroup/cpuset/pbspro/4363542.r-man2/cpuset.cpus
0-15
13:03 bjm900@r60 ~ > /apps …
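(As an aside, which logical cpus are hyperthread siblings of a given core can be read straight from sysfs; core 0 is just an example:)
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list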
Ben,
what is the minimum number of nodes required to reproduce the issue?
e.g. can you reproduce it with one node?
Cheers,
Gilles
On 1/29/2016 11:00 AM, Ben Menadue wrote:
> Hi Gilles,
>> with respect to PBS, are both Open MPI versions built the same way?
>> e.g. configure --with-tm=/opt/pbs/default or something similar …
I was able to reproduce the issue on one node with a cpuset manually set.
FWIW, I cannot reproduce the issue using taskset instead of cpuset(!)
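For reference, the taskset equivalent would look roughly like this (the cpu list and command are illustrative):
taskset -c 0-15 mpirun -np 1 --bind-to core hostname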
Cheers,
Gilles
On 1/29/2016 11:08 AM, Ben Menadue wrote:
> Hi Gilles, Ralph,
> Okay, it definitely seems to be due to the cpuset having only one of the …
Yes, I'm able to reproduce it on a single node as well.
Actually, it's even reproducible on just a single CPU (and -np 1): it won't
let me launch unless both threads of that core are in the cgroup.
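A minimal sketch of that reproducer, assuming cpus 0 and 16 are hyperthread siblings (they may differ on other machines):
echo 0 > cpuset.cpus                   # as root: only one hyperthread of core 0 in the cgroup
mpirun -np 1 --bind-to core hostname   # as the job user: fails to launch
echo 0,16 > cpuset.cpus                # as root: both hyperthreads of core 0
mpirun -np 1 --bind-to core hostname   # now launches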
Ben,
here is a patch that fixes this.
Sorry for the inconvenience, and thanks for your help in understanding
this issue.
Cheers,
Gilles
diff --git a/opal/mca/hwloc/base/hwloc_base_util.c b/opal/mca/hwloc/base/hwloc_base_util.c
index 237c6b0..a4fa193 100644
--- a/opal/mca/hwloc/base/hwloc_base_util.c
+++ b/opal/mca/hwloc/base/hwloc_base_util.c
…
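(Assuming the full diff is saved locally, say as hwloc_base_util.patch, applying and rebuilding would look roughly like:)
patch -p1 < hwloc_base_util.patch   # from the top of the Open MPI source tree
make && make install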
Hi Gilles,
Wow, thanks - that was quick. I'm rebuilding now.
Cheers,
Ben
-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Friday, 29 January 2016 1:54 PM
To: Open MPI Users
Subject: Re: [OMPI users] Any changes to rmaps in 1.10.2