Thanks - please go ahead and release that allocation as I’m not going to get to
this immediately. I’ve got several hot irons in the fire right now, and I’m not
sure when I’ll get a chance to track this down.
Gilles or anyone else who might have time - feel free to take a gander and see
if somet
Done. I have compiled 1.10.0 and 1.10.1rc1 with --enable-debug and executed
mpirun --mca rmaps_base_verbose 10 --hetero-nodes --report-bindings
--bind-to core -np 32 ./affinity
In case of 1.10.1rc1 I have also added :overload-allowed - output in a
separate file. This option did not make much d
Rats - just realized I have no way to test this as none of the machines I can
access are set up for cgroup-based multi-tenancy. Is this a debug version of
OMPI? If not, can you rebuild OMPI with --enable-debug?
Then please run it with --mca rmaps_base_verbose 10 and pass along the output.
Thanks
Ra
What version of slurm is this? I might try to debug it here. I’m not sure where
the problem lies just yet.
> On Oct 3, 2015, at 8:59 AM, marcin.krotkiewski
> wrote:
>
> Here is the output of lstopo. In short, (0,16) are core 0, (1,17) - core 1
> etc.
>
> Machine (64GB)
> NUMANode L#0 (P#0
Here is the output of lstopo. In short, (0,16) are core 0, (1,17) - core
1 etc.
Machine (64GB)
NUMANode L#0 (P#0 32GB)
Socket L#0 + L3 L#0 (20MB)
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
PU L#0 (P#0)
PU L#1 (P#16)
L2 L#1 (256KB) + L1d L#1 (32K
Maybe I’m just misreading your HT map - that slurm nodelist syntax is a new one
to me, but they tend to change things around. Could you run lstopo on one of
those compute nodes and send the output?
I’m just suspicious because I’m not seeing a clear pairing of HT numbers in
your output, but HT n
On 10/03/2015 04:38 PM, Ralph Castain wrote:
If mpirun isn’t trying to do any binding, then you will of course get
the right mapping as we’ll just inherit whatever we received.
Yes. I meant that whatever you received (what SLURM gives) is a correct
cpu map and assigns _whole_ CPUs, not a singl
If mpirun isn’t trying to do any binding, then you will of course get the right
mapping as we’ll just inherit whatever we received. Looking at your output,
it’s pretty clear that you are getting independent HTs assigned and not full
cores. My guess is that something in slurm has changed such tha
On 10/03/2015 01:06 PM, Ralph Castain wrote:
Thanks Marcin. Looking at this, I’m guessing that Slurm may be treating HTs as
“cores” - i.e., as independent cpus. Any chance that is true?
Not to the best of my knowledge, and at least not intentionally. SLURM
starts as many processes as there are
Hi, I have a pet bug causing silent data corruption here:
https://github.com/open-mpi/ompi/issues/965
which seems to have a fix committed some time later. I've tested v1.10.1rc1
now and it still has the issue. I hope the fix makes it in the release.
Cheers!
On Saturday 03 Oct 2015 10:18:47
Marcin,
could you give a try at v1.10.1rc1 that was released today ?
it fixes a bug when hwloc was trying to bind outside the cpuset.
Ralph and all,
imho, there are several issues here
- if slurm allocates threads instead of cores, then the --oversubscribe
mpirun option could be mandatory
- with
Thanks Marcin. Looking at this, I’m guessing that Slurm may be treating HTs as
“cores” - i.e., as independent cpus. Any chance that is true?
I’m wondering because bind-to core will attempt to bind your proc to both HTs
on the core. For some reason, we thought that 8,24 were HTs on the same core,
Open MPI users --
We have just posted the first release candidate for the upcoming v1.10.1 bug
fix release. We'd appreciate any testing and/or feedback that you may have on
this release candidate:
http://www.open-mpi.org/software/ompi/v1.10/
Thank you!
Changes since v1.10.0:
- Fix segv when invo
Hi, Ralph,
I submit my slurm job as follows
salloc --ntasks=64 --mem-per-cpu=2G --time=1:0:0
Effectively, the allocated CPU cores are spread among many cluster
nodes. SLURM uses cgroups to limit the CPU cores available for MPI
processes running on a given cluster node. Compute nodes are 2-so