Hi,
I am getting strange performance results for the allgatherv operation with the
same number of procs and the same data, but with varying binding width. For
example, here are two cases with about a 180x difference in performance.
Each machine has 4 sockets, each with 6 cores, totaling 24 cores per node
(topology image attached).
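As a rough sketch of the kind of comparison described here (same process
count and data, different binding width), the two runs might look like the
following; the benchmark binary ./allgatherv_bench, the two-node process
count, and the exact mapping flags are assumptions, since they are not shown
in this message:

  # 12 procs per node, each bound to 2 cores (wider binding)
  mpirun -np 24 --map-by socket:PE=2 --report-bindings ./allgatherv_bench
  # 12 procs per node, each bound to 1 core (narrower binding)
  mpirun -np 24 --map-by ppr:12:node --bind-to core --report-bindings ./allgatherv_bench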
Hi All,
Sorry for the late reply on this. I've been digging through the Open MPI
FAQ. I've never explicitly set the subnet IDs for my IB subnets, so I
suspect I'm using the factory defaults. Probably, if I change this, it will
"just work". I'll see if the end user is still interested in testing this.
Hi Stefan (and Steven, who reported this earlier with a CUDA-aware program),
I have managed to observe the leak when running LAMMPS as well. Note that
this has nothing to do with CUDA-aware features. I am going to move this
discussion to the Open MPI developer’s list to dig deeper into this issue.
I tried this, but I get an error:
---
An invalid value was given for the number of processes
per resource (ppr) to be mapped on each node:
PPR: 12:node,span
The specification must be a comma-separated list containing
combinations of number, followed by a colon, followed
by the resource type
Thank you Ralph
Saliya
On Wed, Jul 1, 2015 at 4:01 PM, Ralph Castain wrote:
> Scenario 2: --map-by ppr:12:node,span --bind-to core
>
> will put 12 procs on each node, load balanced across the sockets, each
> proc bound to 1 core
>
> HTH
> Ralph
>
>
> On Wed, Jul 1, 2015 at 2:42 PM, Saliya Ekanayake wrote:
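For what it's worth, a ppr specification that fits the grammar the error text
describes (comma-separated number:resource pairs) is the plain form without
the span modifier; whether it still gives the load-balanced placement Ralph
described on this Open MPI version is worth checking with --report-bindings
(the executable and a two-node process count are placeholders):

  mpirun -np 24 --map-by ppr:12:node --bind-to core --report-bindings ./a.out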
Scenario 2: --map-by ppr:12:node,span --bind-to core
will put 12 procs on each node, load balanced across the sockets, each proc
bound to 1 core
HTH
Ralph
On Wed, Jul 1, 2015 at 2:42 PM, Saliya Ekanayake wrote:
> Hi,
>
> I am doing some benchmarks and would like to test the following two
> scenarios.
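One way to confirm the placement Ralph describes (12 procs per node, spread
across the sockets, one core each) is to add --report-bindings to his
command; the executable and the two-node process count below are placeholders:

  mpirun -np 24 --map-by ppr:12:node,span --bind-to core --report-bindings ./a.out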
Hi,
I am doing some benchmarks and would like to test the following two
scenarios. Each machine has 4 sockets, each with 6 cores (lstopo image
attached).
Scenario 1
---
Run 12 procs per node, each bound to 2 cores. I can do this with --map-by
socket:PE=2
Scenario 2
---
Run 12 procs per node, each bound to 1 core.
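A sketch of how scenario 1 could be launched on this 4-socket,
6-core-per-socket layout (scenario 2 is covered in Ralph's reply above); the
executable and the two-node process count are placeholders:

  # Scenario 1: 12 procs per node, each bound to 2 cores (3 procs per socket)
  mpirun -np 24 --map-by socket:PE=2 --report-bindings ./a.out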
Hi all,
Hopefully this mail gets posted in the right thread...
I have noticed what I guess is the same leak using Open MPI 1.8.6 with LAMMPS,
a molecular dynamics program, without any use of CUDA. I am not that familiar
with how the internal memory management of LAMMPS works, but it does not
appear to be CUDA-related.
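Independent of LAMMPS internals, a crude way to see whether the ranks'
resident memory keeps growing over a run is to watch their RSS on one node;
the process name lmp_mpi is an assumption for the LAMMPS binary:

  # Print PID, resident set size (KB), and elapsed time of the ranks every 30 s
  watch -n 30 'ps -C lmp_mpi -o pid,rss,etime,comm'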
Use --mca to pass the options directly on the mpirun command line.
George.
On Wed, Jul 1, 2015 at 9:14 AM, Saliya Ekanayake wrote:
> Thank you George. This is very informative.
>
> Is it possible to pass the option at runtime rather than setting it up in
> the config file?
>
> Thank you
> Saliya
>
> On Tue, Jun 30, 2015 at 7:20 PM, George Bosilca wrote:
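The two ways of supplying an MCA parameter being contrasted here, with the
btl selection used purely as an illustrative parameter (substitute whatever
option George suggested earlier in the thread):

  # Per run, on the command line:
  mpirun --mca btl self,sm,openib -np 24 ./a.out
  # Or persistently, in $HOME/.openmpi/mca-params.conf:
  #   btl = self,sm,openib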
Thank you George. This is very informative.
Is it possible to pass the option at runtime rather than setting it up in
the config file?
Thank you
Saliya
On Tue, Jun 30, 2015 at 7:20 PM, George Bosilca wrote:
> Saliya,
>
> On Tue, Jun 30, 2015 at 10:50 AM, Saliya Ekanayake wrote:
>
>> Hi,
>>
>> I am