Hi Ralph,
Somehow I did not receive your last answer by mail, so I am replying to myself...
Thanks for the explanation. I thought that the prefix issue would be
handled by the OMPI configure parameter
"--enable-mpirun-prefix-by-default". But now I see your point. Anyway, I
did not find any further information on this.
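For reference, that option is passed when building OMPI, e.g. (the install prefix here is just an example):

./configure --prefix=/opt/openmpi --enable-mpirun-prefix-by-default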
The "mca plm rsh" param tells OMPI to use the rsh launcher instead of the
Torque launcher. The only real difference between them is that the rsh launcher
"pre-sets" the prefix into the remote environment prior to executing the orted
- and the Torque launcher doesn't.
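One way around that, besides forcing the rsh launcher, is to pass the prefix explicitly on the command line (install path is illustrative):

mpirun --prefix /opt/openmpi -n X hellocluster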
So it sounds like you aren't getting the prefix set in the remote environment when launching through Torque.
Ralph, thank you very much for your input! The parameter "-mca plm rsh"
did it. I am just curious: what is the reason for that behavior?
You can find the complete output of the different commands embedded in
your mail below. The first line states the successful load of the OMPI
environment; we use the PPU-GCC build of OMPI.
Sorry - hit "send" and then saw the version sitting right there in the subject!
Doh...
First, let's try verifying what components are actually getting used. Run this:
mpirun -n 1 -mca ras_base_verbose 10 -mca plm_base_verbose 10 which orted
Then get an allocation and run
mpirun -pernode which orted
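(To get an allocation you can use an interactive Torque job, e.g. "qsub -I -l nodes=2" - the node count is just an example - and run the mpirun command from inside it.)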
That error has nothing to do with Torque. The cmd line is simply wrong - you
are specifying a btl that doesn't exist.
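(A valid btl selection would name real BTL components, e.g. "-mca btl tcp,self"; ras and plm are MCA frameworks, not BTLs.)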
It should work just fine with
mpirun -n X hellocluster
Nothing else is required. When you run
mpirun --hostfile nodefile hellocluster
OMPI will still use Torque to do the launch.
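In other words, a minimal Torque job script is all you need; something like this (node counts are just examples, and the environment setup line depends on your site):

#!/bin/bash
#PBS -l nodes=2:ppn=4
# load your OMPI environment here if needed (e.g. via your module system)
cd $PBS_O_WORKDIR
mpirun hellocluster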
Ah, and do I have to take care of the MCA ras plugin on my own?
I tried something like
> mpirun --mca ras tm --mca btl ras,plm --mca ras_tm_nodefile_dir
> /var/spool/torque/aux/ hellocluster
but despite that it has not helped ([node3:22726] mca: base:
components_open: component pml / c ...)
Hi Ralph and all,
Yes, the OMPI libs and binaries are at the same place on the nodes, I
packed OMPI via checkinstall and installed the deb via pdsh on the nodes.
The LD_LIBRARY_PATH is set; I can run for example "mpirun --hostfile
nodefile hellocluster" without problems. But when started via Torque, the
job simply does not start up on the nodes.
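(A quick way to double-check that the binaries really resolve to the same place on all nodes, with illustrative node names:

pdsh -w node[1-3] 'which mpirun; which orted')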
Are the OMPI libraries and binaries installed at the same place on all the
remote nodes?
Are you setting the LD_LIBRARY_PATH correctly?
Are the Torque libs available in the same place on the remote nodes? Remember,
Torque runs mpirun on a backend node - not on the frontend.
These are the most common causes of this kind of failure.
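For example, the paths need to be set in a file that non-interactive remote shells also read, such as ~/.bashrc (install prefix is illustrative):

export PATH=/opt/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH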
Hi all,
Your help with the following Torque integration issue will be much
appreciated: whenever I try to start an OpenMPI job on more than one
node, it simply does not start up on the nodes.
The Torque job fails with the following:
> Fri Dec 18 22:11:07 CET 2009
> OpenMPI with PPU-GCC was loaded