Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-03-04 Thread Bernd Dammann
On 3/2/14 0:44 AM, Tru Huynh wrote: On Fri, Feb 28, 2014 at 08:49:45AM +0100, Bernd Dammann wrote: Maybe I should say, that we moved from SL 6.1 and OMPI 1.4.x to SL 6.4 with the above kernel, and OMPI 1.6.5 - which means a major upgrade of our cluster. After the upgrade, users reported those s

Re: [OMPI users] OpenMPI-ROMIO-OrangeFS

2014-03-04 Thread Dave Love
Edgar Gabriel writes: >> [What's OMPIO, and should we want it?] > > OMPIO is the 'native' implementation of MPI I/O in Open MPI, it's however > only available from the 1.7 series onwards. Thanks, but I wonder how I'd know that? NEWS mentions "Various OMPIO updates and fixes.", but that's all I c

Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-03-04 Thread Dave Love
Bernd Dammann writes: > We use Moab/Torque, so we could use cpusets (but that has had some > other side effects earlier, so we did not implement it in our setup). I don't remember what Torque does, but core binding and (Linux) cpusets are somewhat orthogonal. While a cpuset will obviously restr

Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2014-03-04 Thread Dave Love
Tru Huynh writes: > afaik, 2.6.32-431 series is from RHEL(and clones) version >=6.5 [Right.] > otoh, it might be related to http://bugs.centos.org/view.php?id=6949 That looks likely. As we bind to cores, we wouldn't see it for MPI processes, at least, and will see higher performance generally

[OMPI users] bind-to core warning even with numactl

2014-03-04 Thread Saliya Ekanayake
Hi, In an earlier thread I mentioned getting the following error when trying the --bind-to core option with mpirun. I was told that numactl and numactl-devel need to be installed. The sys admins have installed these in our cluster and I've rebuilt OpenMPI, but I still get the same warning. I wonder if

Re: [OMPI users] bind-to core warning even with numactl

2014-03-04 Thread Jeff Squyres (jsquyres)
Did you rebuild / re-install Open MPI after these packages were installed? I believe that the assumption is that Open MPI didn't find the headers / libraries it needed to do the binding when it was built. If that still didn't solve your issue, please send all the information listed here:

Re: [OMPI users] OpenMPI job initializing problem

2014-03-04 Thread Gus Correa
Hi Beichuan So, from "df" it looks like /home is /work1, right? Also, "mount" shows only /work[1-4], not the other 7 CWFS panfs (Panasas?), which apparently are not available in the compute nodes/blades. I presume you have access and are using only some of the /work[1-4] (lustre) file systems

Re: [OMPI users] bind-to core warning even with numactl

2014-03-04 Thread Saliya Ekanayake
I actually did a rebuild and install. Is there a quick test to see if these were picked up correctly? I checked ompi_info and can see numaif.h has been found. Is this the correct indication? I'll check the link and send details by tomorrow as our clusters are on maintenance today. Thank you, Sa

Re: [OMPI users] bind-to core warning even with numactl

2014-03-04 Thread Gus Correa
Hi Saliya Check with your sys admin if numactl and numactl-devel were installed on *ALL* cluster nodes, and in particular on Node: 192.168.0.19 where the problem happened in your most recent job. Sometimes a node is down during a massive package install, is forgotten, and never gets updated. I

Re: [OMPI users] bind-to core warning even with numactl

2014-03-04 Thread Saliya Ekanayake
Thank you Gus, I'll double check with them and get back to you. Saliya On Tue, Mar 4, 2014 at 12:46 PM, Gus Correa wrote: > Hi Saliya > > Check with your sys admin if numactl and numactl-devel > were installed on *ALL* cluster nodes, and in particular on > Node: 192.168.0.19 > where the probl

Re: [OMPI users] hwloc error in topology.c in OMPI 1.6.5

2014-03-04 Thread Gus Correa
On 03/03/2014 05:06 PM, Brice Goglin wrote: On 03/03/2014 23:02, Gus Correa wrote: I rebooted the node and ran hwloc-gather-topology again. This time it didn't throw any errors on the terminal window, which may be a good sign. [root@node14 ~]# hwloc-gather-topology /tmp/`date +"%Y%m%d%H%M"`.

Re: [OMPI users] OrangeFS ROMIO support

2014-03-04 Thread Latham, Robert J.
On Fri, 2014-02-28 at 12:20 -0800, Ralph Castain wrote: > Did you see the note I forwarded to you about SLES issues? Not sure if that > is on your side or ours It looks like a strange interaction between SLES header files and the SLES compiler: odd that it would carp about one system call that's