Hi,
Okay, let's reboot, even though Gilles' last mail was onto something.
The problem is that I failed to start programs with mpirun when more
than one node was involved. I mentioned that it is likely some
configuration problem with my server, especially authentication (we
have some Kerberos ni…
…ras_base_verbose 10 hostname
Cheers,
Gilles
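For reference, a complete diagnostic invocation of that kind might look like the following; the exact combination of MCA verbosity parameters is only a plausible guess, not the original command:

    mpirun --mca ras_base_verbose 10 --mca plm_base_verbose 10 hostname

The ras output shows how the node list is built from the allocation, and the plm output shows which launcher (tm, rsh, ...) gets selected.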
On 9/8/2016 6:42 PM, Oswin Krause wrote:
Hi,
I reconfigured to only have one physical node. Still no success, but
the nodefile now looks better. I still get the errors:
[a00551.science.domain:18021] [[34768,0],1] bind() failed on error Address…
[a00551.science.domain:18097] [[34561,0],0] plm:base:receive stop comm
[a00551.science.domain:18097] mca: base: close: component tm closed
[a00551.science.domain:18097] mca: base: close: unloading component tm
Best,
Oswin
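To see exactly what Torque is handing to Open MPI, the nodefile can be inspected from inside the job; a quick sketch, assuming a standard Torque setup where $PBS_NODEFILE points at the allocated node list:

    cat $PBS_NODEFILE              # one line per allocated slot
    sort $PBS_NODEFILE | uniq -c   # slot count per node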
On 2016-09-08 10:33, Oswin Krause wrote:
Hi Gilles, Hi Ralph,
I have just rebuilt Open MPI…
…installed and running correctly on your cluster?
Cheers,
Gilles
Oswin Krause wrote:
Hi Gilles,
Thanks for the hint with the machinefile. I know it is not equivalent
and I do not intend to use that approach. I just wanted to know whether
I could start the program successfully at all.
Outside torque (4.2), …
…see what could be happening here.
Btw, what is the output of
hostname
hostname -f
on a00551?
Out of curiosity, is a previous version of Open MPI (e.g. v1.10.4)
installed and running correctly on your cluster?
Cheers,
Gilles
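A quick way to answer the version question is to ask the binaries that are actually on the PATH; a minimal sketch, assuming the default environment on the node:

    which mpirun
    mpirun --version               # version of the mpirun being picked up
    ompi_info | grep "Open MPI:"   # version reported by ompi_info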
Oswin Krause wrote:
Hi Gilles,
Thanks for the hint with the…
…an “openmpi” directory underneath that one, and the mca_xxx
libraries are down there.
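One way to check whether the tm components ended up in that directory, and whether the build picked up tm support at all, is something like this; the install prefix below is only an example and has to be adjusted:

    ompi_info | grep -i " tm "              # should list e.g. MCA plm: tm and MCA ras: tm
    ls /usr/local/lib/openmpi | grep _tm    # look for mca_plm_tm.so and mca_ras_tm.so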
On Sep 7, 2016, at 7:43 AM, Oswin Krause wrote:
Hi Gilles,
I do not have this library. Maybe this helps already...
libmca_common_sm.so libmpi_mpifh.so libmpi_usempif08.so
libompitrace.so libopen…
…if you use the same hostfile, or some hostfile, as an explicit
argument when you run mpirun from within the torque job?
-- bennet
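A concrete version of that test, assuming a Torque job in which $PBS_NODEFILE is set, could be:

    mpirun --hostfile $PBS_NODEFILE hostname   # pass the Torque nodefile explicitly

This is the same node list mpirun would normally take from the tm allocation, just given as an explicit argument.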
On Wed, Sep 7, 2016 at 9:25 AM, Oswin Krause wrote:
Hi Gilles,
Thanks for the hint with the machinefile. I know it is not equivalent
and I do not intend to use…
…check the code and see what could be happening here.
Oswin Krause wrote:
Hi Gilles,
Thanks for the…
…the machinefile, the number of slots is automatically
detected)
Can you run
mpirun --mca plm_base_verbose 10 ...
so we can confirm tm is used.
Before invoking mpirun, you might want to clean up the ompi directory in
/tmp.
Cheers,
Gilles
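Spelled out, the suggested cleanup and check might look like this; the session-directory pattern under /tmp is an assumption and can differ between Open MPI versions:

    rm -rf /tmp/openmpi-sessions-*              # assumed name of leftover ompi session directories
    mpirun --mca plm_base_verbose 10 hostname   # the plm output should mention the tm component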
Oswin Krause wrote:
Hi,
I am currently trying to set up OpenMPI in torque. OpenMPI is built with
tm support. Torque is correctly assigning nodes and I can run
MPI programs on single nodes just fine. The problem starts when
processes are split between nodes.
For example, I create an interactive session with torque…
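For reference, a minimal reproduction of that setup, assuming Torque accepts a two-node interactive request, would be:

    qsub -I -l nodes=2:ppn=1   # interactive job spanning two nodes
    mpirun hostname            # with working tm support, this prints one hostname per allocated slot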