On 3/2/14 0:44 AM, Tru Huynh wrote:
On Fri, Feb 28, 2014 at 08:49:45AM +0100, Bernd Dammann wrote:
Maybe I should say that we moved from SL 6.1 and OMPI 1.4.x to SL
6.4 with the above kernel, and OMPI 1.6.5 - which means a major
upgrade of our cluster.
After the upgrade, users reported those s
Edgar Gabriel writes:
>> [What's OMPIO, and should we want it?]
>
> OMPIO is the 'native' implementation of MPI I/O in Open MPI; it is, however,
> only available from the 1.7 series onwards.
Thanks, but I wonder how I'd know that? NEWS mentions "Various OMPIO
updates and fixes," but that's all I can find.
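For what it's worth, a minimal sketch of how one might check and select this, assuming an Open MPI 1.7+ build (the application name is a placeholder):

  ompi_info | grep "MCA io"                  # list the MPI I/O components that were built
  mpirun --mca io ompio -np 4 ./my_io_app    # request OMPIO explicitly
  mpirun --mca io romio -np 4 ./my_io_app    # or fall back to ROMIO

The exact component names can differ between releases, so ompi_info is the authoritative check.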
Bernd Dammann writes:
> We use Moab/Torque, so we could use cpusets (but that has had some
> other side effects earlier, so we did not implement it in our setup).
I don't remember what Torque does, but core binding and (Linux) cpusets
are somewhat orthogonal. While a cpuset will obviously restrict which cores
a job's processes may use, it does not by itself pin each process to a
particular core.
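As a rough sketch of the binding side (the flag spelling differs between the 1.6 and 1.7 series; the executable name is a placeholder), binding can be requested straight from mpirun and verified with --report-bindings:

  mpirun -np 8 --bind-to-core --report-bindings ./a.out    # 1.6.x spelling
  mpirun -np 8 --bind-to core --report-bindings ./a.out    # 1.7 and later

--report-bindings prints which cores each rank was pinned to, independent of whatever cpuset the batch system may have set up.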
Tru Huynh writes:
> afaik, 2.6.32-431 series is from RHEL(and clones) version >=6.5
[Right.]
> otoh, it might be related to http://bugs.centos.org/view.php?id=6949
That looks likely. As we bind to cores, we wouldn't see it for MPI
processes, at least, and will see higher performance generally
Hi,
In an earlier thread I mentioned getting the following error when trying the
--bind-to core option with mpirun. I was told that numactl and
numactl-devel need to be installed. The sys admins have installed these in
our cluster and I've rebuilt Open MPI, but I still get the same warning. I
wonder if
Did you rebuild / re-install Open MPI after these packages were installed? I
believe that the assumption is that Open MPI didn't find the headers /
libraries it needed to do the binding when it was built.
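A minimal sketch of such a rebuild, assuming a 1.6.5 source tree and an install prefix (both paths are placeholders); the key point is that configure must run again after numactl-devel is installed:

  cd openmpi-1.6.5
  make distclean                                   # discard the old configure results
  ./configure --prefix=/opt/openmpi-1.6.5 2>&1 | tee configure.log
  grep -i numaif configure.log                     # should now report numaif.h as found
  make -j 8 && make install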
If that still didn't solve your issue, please send all the information listed
here:
Hi Beichuan
So, from "df" it looks like /home is /work1, right?
Also, "mount" shows only /work[1-4], not the other
7 CWFS panfs (Panasas?) file systems, which apparently are not available
on the compute nodes/blades.
I presume you have access to, and are using, only some of the /work[1-4]
(Lustre) file systems
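If it helps, a quick hedged check from the head node (the compute node name is a placeholder):

  ssh node001 'df -hT'             # file systems and their types as seen by that node
  ssh node001 'mount -t lustre'    # only the Lustre mounts, if any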
I actually did a rebuild and install. Is there a quick test to see if these
were picked up correctly? I checked ompi_info and can see that numaif.h has
been found. Is this the correct indication?
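If it's useful, a rough way to double-check from the command line (component names vary between Open MPI versions, so treat this as a sketch):

  ompi_info | grep -i hwloc        # the hwloc support should show up here
  ompi_info | grep -i paffinity    # on the 1.6 series, the processor-affinity framework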
I'll check the link and send details by tomorrow as our clusters are on
maintenance today.
Thank you,
Saliya
Hi Saliya
Check with your sys admin if numactl and numactl-devel
were installed on *ALL* cluster nodes, and in particular on
Node: 192.168.0.19
where the problem happened in your most recent job.
Sometimes a node is down during a massive package install,
is forgotten, and never gets updated.
I
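One rough way to check that from the head node, assuming password-less SSH and a host list file (both placeholders):

  for n in $(cat hostfile); do
    ssh "$n" 'rpm -q numactl numactl-devel' | sed "s/^/$n: /"
  done

Any node missing the packages will report "package ... is not installed".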
Thank you, Gus. I'll double-check with them and get back to you.
Saliya
On Tue, Mar 4, 2014 at 12:46 PM, Gus Correa wrote:
> Hi Saliya
>
> Check with your sys admin if numactl and numactl-devel
> were installed on *ALL* cluster nodes, and in particular on
> Node: 192.168.0.19
> where the probl
On 03/03/2014 05:06 PM, Brice Goglin wrote:
On 03/03/2014 23:02, Gus Correa wrote:
I rebooted the node and ran hwloc-gather-topology again.
This time it didn't throw any errors on the terminal window,
which may be a good sign.
[root@node14 ~]# hwloc-gather-topology /tmp/`date +"%Y%m%d%H%M"`
On Fri, 2014-02-28 at 12:20 -0800, Ralph Castain wrote:
> Did you see the note I forwarded to you about SLES issues? Not sure if that
> is on your side or ours
It looks like a strange interaction between SLES header files and the
SLES compiler: odd that it would carp about one system call that's