Tru Huynh writes:
> afaik, the 2.6.32-431 series is from RHEL (and clones) version >= 6.5
[Right.]
> otoh, it might be related to http://bugs.centos.org/view.php?id=6949
That looks likely. As we bind to cores, we wouldn't see it for MPI
processes, at least, and we see higher performance generally.
Bernd Dammann writes:
> We use Moab/Torque, so we could use cpusets (but that has had some
> other side effects earlier, so we did not implement it in our setup).
I don't remember what Torque does, but core binding and (Linux) cpusets
are somewhat orthogonal. While a cpuset will obviously restr
On 3/2/14 0:44 AM, Tru Huynh wrote:
On Fri, Feb 28, 2014 at 08:49:45AM +0100, Bernd Dammann wrote:
Maybe I should say that we moved from SL 6.1 and OMPI 1.4.x to SL
6.4 with the above kernel, and OMPI 1.6.5 - which means a major
upgrade of our cluster.
After the upgrade, users reported those s
On Fri, Feb 28, 2014 at 08:49:45AM +0100, Bernd Dammann wrote:
> On 2/27/14 16:47 PM, Dave Love wrote:
> >Bernd Dammann writes:
> >
> >>Hi,
> >>
> >>I found this thread from before Christmas, and I wondered what the
> >>status of this problem is. We experience the same problems since our
> >>upgr
On 2/27/14 14:06 PM, Noam Bernstein wrote:
On Feb 27, 2014, at 2:36 AM, Patrick Begou
wrote:
Bernd Dammann wrote:
Using the workaround '--bind-to-core' only makes sense for those jobs that
allocate full nodes, but the majority of our jobs don't do that.
Why?
We still use this option
On 2/27/14 16:47 PM, Dave Love wrote:
Bernd Dammann writes:
Hi,
I found this thread from before Christmas, and I wondered what the
status of this problem is. We experience the same problems since our
upgrade to Scientific Linux 6.4, kernel 2.6.32-431.1.2.el6.x86_64, and
OpenMPI 1.6.5.
Users
[I don't know what thread this is without References: or citation.]
Bernd Dammann writes:
> Hi,
>
> I found this thread from before Christmas, and I wondered what the
> status of this problem is. We experience the same problems since our
> upgrade to Scientific Linux 6.4, kernel 2.6.32-431.1.2.
Noam, cpusets are a very good idea.
Not only for CPU binding but for isolating 'badly behaved' applications.
If an application starts using huge amounts of memory - kill it, collapse
the cpuset and it is gone - a nice, clean way to manage jobs.
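For illustration, a rough sketch of that idea using the raw Linux cpuset
(cgroup) interface - the mount point, the 'job42' group name and $PID are
placeholders, and a resource manager would normally do this bookkeeping for you:

  mkdir /sys/fs/cgroup/cpuset/job42
  echo 0-7  > /sys/fs/cgroup/cpuset/job42/cpuset.cpus   # cores for this job
  echo 0    > /sys/fs/cgroup/cpuset/job42/cpuset.mems   # its NUMA node(s)
  echo $PID > /sys/fs/cgroup/cpuset/job42/tasks         # confine the job
  # teardown: kill whatever is left in the cpuset, then remove it
  kill $(cat /sys/fs/cgroup/cpuset/job42/tasks)
  rmdir /sys/fs/cgroup/cpuset/job42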
On Feb 27, 2014, at 5:06 AM, Noam Bernstein wrote:
> On Feb 27, 2014, at 2:36 AM, Patrick Begou
> wrote:
>
>> Bernd Dammann wrote:
>>> Using the workaround '--bind-to-core' only makes sense for those jobs
>>> that allocate full nodes, but the majority of our jobs don't do that.
>> Why?
On Feb 27, 2014, at 2:36 AM, Patrick Begou
wrote:
> Bernd Dammann wrote:
>> Using the workaround '--bind-to-core' only makes sense for those jobs
>> that allocate full nodes, but the majority of our jobs don't do that.
> Why?
> We still use this option in OpenMPI (1.6.x, 1.7.x) with OpenF
Bernd Dammann wrote:
Using the workaround '--bind-to-core' only makes sense for those jobs
that allocate full nodes, but the majority of our jobs don't do that.
Why?
We still use this option in OpenMPI (1.6.x, 1.7.x) with OpenFOAM and other
applications to attach each process to its core
Hi,
I found this thread from before Christmas, and I wondered what the
status of this problem is. We experience the same problems since our
upgrade to Scientific Linux 6.4, kernel 2.6.32-431.1.2.el6.x86_64, and
OpenMPI 1.6.5.
Users have reported severe slowdowns in all kinds of applications
On Dec 18, 2013, at 5:19 PM, Martin Siegert wrote:
>
> Thanks for figuring this out. Does this work for 1.6.x as well?
> The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity
> covers versions 1.2.x to 1.5.x.
> Does 1.6.x support mpi_paffinity_alone = 1 ?
> I set this in openmpi-m
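For reference, the file-based way to request that would look something like
the sketch below, assuming the parameter is still honoured by the version in
use; the system-wide file is $prefix/etc/openmpi-mca-params.conf, and a
per-user ~/.openmpi/mca-params.conf works the same way:

  # openmpi-mca-params.conf
  mpi_paffinity_alone = 1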
Brice Goglin writes:
> hwloc-ps (and lstopo --top) are better at showing process binding but
> they lack a nice pseudographical interface with dynamic refresh.
That seems like an advantage when you want to check on a cluster!
> htop uses hwloc internally iirc, so there's hope we'll have everyth
Noam Bernstein writes:
> On Dec 18, 2013, at 10:32 AM, Dave Love wrote:
>
>> Noam Bernstein writes:
>>
>>> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in some
>>> collective communication), but now I'm wondering whether I should just test
>>> 1.6.5.
>>
>> What bug, exactly?
Hi,
expanding on Noam's problem a bit ...
On Wed, Dec 18, 2013 at 10:19:25AM -0500, Noam Bernstein wrote:
> Thanks to all who answered my question. The culprit was an interaction between
> 1.7.3 not supporting mpi_paffinity_alone (which we were using previously) and
> the new kernel. Swi
On Wed, 2013-12-18 at 11:47 -0500, Noam Bernstein wrote:
> Yes - I never characterized it fully, but we attached with gdb to every
> single vasp running process, and all were stuck in the same
> call to MPI_allreduce() every time. It's only happening on a rather large
> jobs, so it's not the easi
On Dec 18, 2013, at 10:32 AM, Dave Love wrote:
> Noam Bernstein writes:
>
>> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in some
>> collective communication), but now I'm wondering whether I should just test
>> 1.6.5.
>
> What bug, exactly? As you mentioned vasp, is
hwloc-ps (and lstopo --top) are better at showing process binding but they lack
a nice pseudographical interface with dynamic refresh.
htop uses hwloc internally iirc, so there's hope we'll have everything needed
in htop one day ;)
Brice
Dave Love wrote:
> John Hearns writes:
>
>> 'Htop' is a very good tool for looking at where processes are running.
John Hearns writes:
> 'Htop' is a very good tool for looking at where processes are running.
I'd have thought hwloc-ps is the tool for that.
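For example, a quick check of bindings might look like the sketch below
(hwloc-ps by default only lists processes that are actually bound; pdsh and
the node names are just one way to fan the check out across a cluster):

  hwloc-ps                      # bound processes and their cpusets on this node
  lstopo --top                  # the topology with running tasks overlaid
  pdsh -w node[01-04] hwloc-ps  # the same check on several nodes at once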
Noam Bernstein writes:
> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in some
> collective communication), but now I'm wondering whether I should just test
> 1.6.5.
What bug, exactly? As you mentioned vasp, is it specifically affecting
that?
We have seen apparent deadl
Thanks to all who answered my question. The culprit was an interaction between
1.7.3 not supporting mpi_paffinity_alone (which we were using previously) and
the new kernel. Switching to --bind-to core (actually the environment variable
OMPI_MCA_hwloc_base_binding_policy=core) fixed the problem.
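Concretely, that can be requested in any of these equivalent ways (a sketch
for the 1.7.x series; the executable and process count are placeholders):

  mpirun --bind-to core -np 16 ./a.out
  # or via the environment, e.g. exported from a batch prologue:
  export OMPI_MCA_hwloc_base_binding_policy=core
  # or once, system-wide, in openmpi-mca-params.conf:
  hwloc_base_binding_policy = core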
Hi,
Do you have thread multiple (MPI_THREAD_MULTIPLE) support enabled in your OpenMPI installation?
Maxime Boissonneault
On 2013-12-16 17:40, Noam Bernstein wrote:
Has anyone tried to use openmpi 1.7.3 with the latest CentOS kernel
(well, nearly latest: 2.6.32-431.el6.x86_64), and especially with infiniband?
I'm seein
OMPI_MCA_hwloc_base_binding_policy=core
On Dec 17, 2013, at 8:40 AM, Noam Bernstein wrote:
> On Dec 17, 2013, at 11:04 AM, Ralph Castain wrote:
>
>> Are you binding the procs? We don't bind by default (this will change in
>> 1.7.4), and binding can play a significant role when comparing acro
On Dec 17, 2013, at 11:04 AM, Ralph Castain wrote:
> Are you binding the procs? We don't bind by default (this will change in
> 1.7.4), and binding can play a significant role when comparing across kernels.
>
> add "--bind-to-core" to your cmd line
Now that it works, is there a way to set it v
On Dec 17, 2013, at 11:04 AM, Ralph Castain wrote:
> Are you binding the procs? We don't bind by default (this will change in
> 1.7.4), and binding can play a significant role when comparing across kernels.
>
> add "--bind-to-core" to your cmd line
Yeay - it works. Thank you very much for the
On Tue, Dec 17, 2013 at 11:16:48AM -0500, Noam Bernstein wrote:
> On Dec 17, 2013, at 11:04 AM, Ralph Castain wrote:
>
> > Are you binding the procs? We don't bind by default (this will change in
> > 1.7.4), and binding can play a significant role when comparing across
> > kernels.
> >
> > add
On Dec 17, 2013, at 11:04 AM, Ralph Castain wrote:
> Are you binding the procs? We don't bind by default (this will change in
> 1.7.4), and binding can play a significant role when comparing across kernels.
>
> add "--bind-to-core" to your cmd line
I've previously always used mpi_paffinity_alo
'Htop' is a very good tool for looking at where processes are running.
Are you binding the procs? We don't bind by default (this will change in
1.7.4), and binding can play a significant role when comparing across kernels.
add "--bind-to-core" to your cmd line
On Dec 17, 2013, at 7:09 AM, Noam Bernstein wrote:
> On Dec 16, 2013, at 5:40 PM, Noam Bernstein
> wr
On Dec 16, 2013, at 5:40 PM, Noam Bernstein wrote:
>
> Once I have some more detailed information I'll follow up.
OK - I've tried to characterize the behavior with vasp, which accounts for
most of our cluster usage, and it's quite odd. I ran my favorite benchmarking
job repeated 4 times. As yo
Has anyone tried to use openmpi 1.7.3 with the latest CentOS kernel
(well, nearly latest: 2.6.32-431.el6.x86_64), and especially with infiniband?
I'm seeing lots of weird slowdowns, especially when using infiniband,
but even when running with "--mca btl self,sm" (it's much worse with
IB, though
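For reference, the two cases being compared look roughly like this (process
count and binary are placeholders; self,sm,openib is the explicit way to allow
the InfiniBand transport):

  mpirun --mca btl self,sm,openib -np 32 ./bench   # with IB
  mpirun --mca btl self,sm        -np 32 ./bench   # shared memory only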