Oswin,
unfortunately, some important info is missing.
I guess the root cause is that Open MPI was not configured with --enable-debug.
Could you please update your Torque script and simply add the following
snippet before invoking mpirun:
echo PBS_NODEFILE
cat $PBS_NODEFILE
echo ---
As I wrote
Ralph,
there might be an issue within Open MPI.
On the cluster I used, hostname returns the FQDN, and $PBS_NODEFILE uses
the FQDN too.
My $PBS_NODEFILE has one line per task, and it is ordered,
e.g.
n0.cluster
n0.cluster
n1.cluster
n1.cluster
In my Torque script, I rewrote the machine
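Because the node file has one line per task, counting repeated host names gives the number of slots Torque assigned to each node. This is a minimal sketch that assumes a POSIX shell with `sort`, `uniq` and `awk`. The function name `slots_per_host` is hypothetical.

```shell
# Print "<slots> <hostname>" for each host in a Torque/PBS node file.
# Hypothetical helper: the node file lists one line per task, so the
# number of identical lines for a host is its slot count.
slots_per_host() {
    nodefile=$1
    # sort groups identical host names, uniq -c counts each group,
    # awk normalizes uniq's leading padding
    sort "$nodefile" | uniq -c | awk '{ print $1, $2 }'
}

# Inside a Torque job, PBS_NODEFILE points at the node file.
if [ -n "${PBS_NODEFILE:-}" ]; then
    slots_per_host "$PBS_NODEFILE"
fi
```

For the four-line example above, this would report two slots on each of n0.cluster and n1.cluster.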
Hello Gundram,
It looks like the test that is failing is TestMpiRmaCompareAndSwap.java. Is
that the one that is crashing? If so, could you try to run the C test from:
http://git.mpich.org/mpich.git/blob/c77631474f072e86c9fe761c1328c3d4cb8cc4a5:/test/mpi/rma/compare_and_swap.c#l1
Hi,
You are right. Yes, the library is there and is linking to libtorque.so.
Sorry for the confusion.
Is there any other information I can provide? I am seriously new to all
of this.
Best,
Oswin
On 2016-09-07 17:16, r...@open-mpi.org wrote:
You aren’t looking in the right place - there is an “openmpi” directory
underneath that one, and the mca_xxx libraries are down there
> On Sep 7, 2016, at 7:43 AM, Oswin Krause wrote:
You can also run: ompi_info | grep 'plm: tm'
(note the quotes, because you need to include the space)
If you see a line listing the TM PLM plugin, then you have Torque / PBS support
built into Open MPI. If you don't, then you don't. :-)
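The same check can be wrapped so a job script can branch on it. This is a sketch only: it matches nothing more than the `plm: tm` substring suggested above, since the rest of the ompi_info line may differ between versions. The helper name `has_tm_support` is hypothetical.

```shell
# Succeed if ompi_info output (read from stdin) lists the TM PLM component.
has_tm_support() {
    # -q: no output, only the exit status; the pattern includes the space
    grep -q 'plm: tm'
}

# Usage, run only when ompi_info is on the PATH:
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_support; then
        echo "Torque/PBS (tm) support is built in"
    else
        echo "no tm PLM component found"
    fi
fi
```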
> On Sep 7, 2016, at 11:01 AM, Gilles Gouaillardet
I will double check the name.
If you did not configure with --disable-dlopen, then mpirun only links with
opal and orte.
At run time, these libs will dlopen the plugins (from the openmpi
subdirectory; they are named mca_abc_xyz.so).
If you have support for tm, then one of the plugins will be linke
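To see which plugins an installation actually provides, list the mca_*.so files in that openmpi subdirectory. This is a sketch that assumes the layout described in this thread, <prefix>/lib/openmpi. The function name `list_mca_plugins` and the example prefix `/opt/openmpi` are hypothetical.

```shell
# List the MCA plugin libraries (mca_<framework>_<component>.so) that an
# Open MPI installation keeps in <prefix>/lib/openmpi.
list_mca_plugins() {
    plugindir=$1/lib/openmpi
    if [ ! -d "$plugindir" ]; then
        echo "no plugin directory at $plugindir" >&2
        return 1
    fi
    # Print just the file names, one per line, in glob (sorted) order
    for f in "$plugindir"/mca_*.so; do
        [ -e "$f" ] && basename "$f"
    done
    return 0
}

# Example (hypothetical prefix): list_mca_plugins /opt/openmpi | grep plm
```

If the tm launcher was built, mca_plm_tm.so should appear in this list.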
Hi Gilles,
I do not have this library. Maybe this helps already...
libmca_common_sm.so libmpi_mpifh.so libmpi_usempif08.so
libompitrace.so libopen-rte.so
libmpi_cxx.so libmpi.so libmpi_usempi_ignore_tkr.so
libopen-pal.so liboshmem.so
and mpirun only links to
Hi,
Thanks for looking into it. Also thanks to rhc. I tried to be very
consistent with the naming after being asked to do so by our IT
department.
[zbh251@a00551 ~]$ hostname
a00551.science.domain
[zbh251@a00551 ~]$ hostname -f
a00551.science.domain
This is, AFAIR, the same name as given in th
Note that the Torque library will only show up if you configured with
--disable-dlopen. Otherwise, you can run ldd on /.../lib/openmpi/mca_plm_tm.so.
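Following that suggestion, this sketch runs ldd on the tm launcher plugin and checks for a libtorque dependency. Oswin reports above that the library links to libtorque.so, and the match relies on that name. The install prefix is left to the caller because the path in the thread is elided. The helper names `links_torque` and `check_plm_tm` are hypothetical.

```shell
# Succeed if ldd output (read from stdin) shows a dependency on libtorque.
links_torque() {
    grep -q 'libtorque'
}

# Check the tm PLM plugin under a given Open MPI prefix.
# Returns 0 if it links libtorque, 1 if not, 2 if the plugin is missing.
check_plm_tm() {
    plugin=$1/lib/openmpi/mca_plm_tm.so
    if [ ! -f "$plugin" ]; then
        echo "missing: $plugin" >&2
        return 2
    fi
    if ldd "$plugin" | links_torque; then
        echo "mca_plm_tm.so links against libtorque"
    else
        echo "mca_plm_tm.so does not link against libtorque"
        return 1
    fi
}

# Example (hypothetical prefix): check_plm_tm /opt/openmpi
```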
Cheers,
Gilles
Bennet Fauber wrote:
The usual cause of this problem is that the nodename in the machinefile is
given as a00551, while Torque is assigning the node name as
a00551.science.domain. Thus, mpirun thinks those are two separate nodes and
winds up spawning an orted on its own node.
You might try ensuring that your machine
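One way to spot the mismatch described above is to check how a host's own name appears in the node file or machinefile. This is a minimal sketch that assumes one host name per line. It treats the text before the first dot as the short name. The function name `name_in_nodefile` is hypothetical.

```shell
# Report how a host name appears in a node file or machinefile.
# Arguments: <nodefile> <name>. Prints "exact", "short-only" or "absent".
name_in_nodefile() {
    nodefile=$1
    name=$2
    short=${name%%.*}          # strip everything from the first dot
    if grep -qxF "$name" "$nodefile"; then
        echo exact
    elif grep -qxF "$short" "$nodefile"; then
        echo short-only        # e.g. file says a00551, host says a00551.science.domain
    else
        echo absent
    fi
}

# Inside a Torque job, check the name this node reports for itself:
if [ -n "${PBS_NODEFILE:-}" ]; then
    name_in_nodefile "$PBS_NODEFILE" "$(hostname -f)"
fi
```

A "short-only" result corresponds to the case where mpirun may treat the two spellings as separate nodes.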
Thanks for the logs.
From what I see now, it looks like a00551 is running both mpirun and orted,
though it should only run mpirun, and orted should run only on a00553.
I will check the code and see what could be happening here.
Btw, what is the output of
hostname
hostname -f
on a00551?
Oswin,
Does the torque library show up if you run
$ ldd mpirun
That would indicate that Torque support is compiled in.
Also, what happens if you use the same hostfile, or some hostfile as
an explicit argument when you run mpirun from within the torque job?
-- bennet
On Wed, Sep 7, 2016 at
Hi,
Sorry, I forgot:
The node allocation seems to be correct as the nodes are NUMA. The node
allocation in torque is
a00551.science.domain-0
a00551.science.domain-1
a00553.science.domain-0
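The allocation above lists each host once per slot with a "-<n>" suffix. Stripping that suffix and counting the distinct names shows how many physical nodes were assigned. This is a sketch that assumes exactly that "<host>-<n>" format. A host name that itself ends in "-<digits>" would be mangled. The function name `count_alloc_hosts` is hypothetical.

```shell
# Count distinct hosts in a list of "<host>-<slot>" entries read from stdin.
count_alloc_hosts() {
    # Drop a trailing "-<digits>" slot suffix, then count unique host names
    sed 's/-[0-9][0-9]*$//' | sort -u | wc -l | tr -d ' '
}

# Example with the allocation listed above:
printf '%s\n' a00551.science.domain-0 a00551.science.domain-1 \
    a00553.science.domain-0 | count_alloc_hosts    # prints 2
```

The count of two matches the observation below that only 2 nodes were allocated.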
On 2016-09-07 14:41, Gilles Gouaillardet wrote:
Hi Gilles,
Thanks for the hint with the machinefile. I know it is not equivalent
and I do not intend to use that approach. I just wanted to know whether
I could start the program successfully at all.
Outside Torque (4.2), rsh seems to be used, which works fine, querying a
password if no kerber
Hi,
Which version of Open MPI are you running?
I noted that though you are asking for three nodes and one task per node, you have
been allocated 2 nodes only.
I do not know if this is related to this issue.
Note if you use the machinefile, a00551 has two slots (since it appears twice
in the machi
Hi,
I am currently trying to set up Open MPI in Torque. Open MPI is built with
tm support. Torque is correctly assigning nodes and I can run
MPI programs on single nodes just fine. The problem starts when
processes are split between nodes.
For example, I create an interactive session with torq