Re: [OMPI users] Torque 2.4.3 fails with OpenMPI 1.3.4; no startup at all

2009-12-22 Thread Johann Knechtel
Hi Ralph, somehow I did not receive your last answer as mail, so I reply to myself... Thanks for the explanation. I thought that the prefix issue would be handled by the OMPI configure parameter "--enable-mpirun-prefix-by-default". But now I see your point. Anyway, I did not find any further information…

Re: [OMPI users] Torque 2.4.3 fails with OpenMPI 1.3.4; no startup at all

2009-12-20 Thread Ralph Castain
The "mca plm rsh" param tells OMPI to use the rsh launcher instead of the Torque launcher. The only real difference between them is that the rsh launcher "pre-sets" the prefix into the remote environment prior to executing the orted - and the Torque launcher doesn't. So it sounds like you aren't…
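Spelled out as command lines, Ralph's two points look roughly like this; the process count, application name, and install prefix are placeholders, not taken from the thread:

```shell
# Force the rsh/ssh launcher instead of the Torque (tm) launcher.
# The rsh launcher pre-sets the OMPI install prefix in the remote
# environment before starting orted; the Torque launcher does not.
mpirun --mca plm rsh -n 8 ./hellocluster

# Alternatively, keep the Torque launcher but hand it the prefix
# explicitly so the remote orted and libraries can be found
# (/opt/openmpi is a placeholder for the actual install prefix):
mpirun --prefix /opt/openmpi -n 8 ./hellocluster
```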

Re: [OMPI users] Torque 2.4.3 fails with OpenMPI 1.3.4; no startup at all

2009-12-20 Thread Johann Knechtel
Ralph, thank you very much for your input! The parameter "mca plm rsh" did it. I am just curious: what is the reason for that behavior? You can find the complete output of the different commands embedded in your mail below. The first line states the successful load of the OMPI environment; we use the…

Re: [OMPI users] Torque 2.4.3 fails with OpenMPI 1.3.4; no startup at all

2009-12-19 Thread Ralph Castain
Sorry - hit "send" and then saw the version sitting right there in the subject! Doh... First, let's try verifying what components are actually getting used. Run this: mpirun -n 1 -mca ras_base_verbose 10 -mca plm_base_verbose 10 which orted Then get an allocation and run mpirun -pernode which…
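Written out in full, the diagnostics Ralph proposes would look roughly as follows (the second command is a sketch assuming it continues with `which orted`, matching the first):

```shell
# Show which RAS (resource allocation) and PLM (process launch)
# components OMPI actually selects; no allocation needed for this:
mpirun -n 1 --mca ras_base_verbose 10 --mca plm_base_verbose 10 which orted

# Then, from inside a Torque allocation, launch one process per node
# to check that every node resolves the same orted:
mpirun -pernode which orted
```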

Re: [OMPI users] Torque 2.4.3 fails with OpenMPI 1.3.4; no startup at all

2009-12-19 Thread Ralph Castain
That error has nothing to do with Torque. The cmd line is simply wrong - you are specifying a btl that doesn't exist. It should work just fine with mpirun -n X hellocluster Nothing else is required. When you run mpirun --hostfile nodefile hellocluster OMPI will still use Torque to do the launch…
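The point here is that inside a Torque job no hostfile or extra MCA parameters are needed, because OMPI reads the allocation through the tm interface. A minimal sketch (process count and binary name are placeholders):

```shell
# Inside a Torque job, OMPI picks up the node allocation automatically,
# so a plain invocation is sufficient:
mpirun -n 8 ./hellocluster

# Passing a hostfile is redundant here; the launch still goes through
# Torque's tm interface:
mpirun --hostfile nodefile ./hellocluster
```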

Re: [OMPI users] Torque 2.4.3 fails with OpenMPI 1.3.4; no startup at all

2009-12-19 Thread Johann Knechtel
Ah, and do I have to take care of the MCA ras plugin on my own? I tried something like > mpirun --mca ras tm --mca btl ras,plm --mca ras_tm_nodefile_dir > /var/spool/torque/aux/ hellocluster but that has not helped/worked out ([node3:22726] mca: base: components_open: component pml / c…
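The error in this attempt is that `ras` and `plm` are MCA framework names, not btl components: the btl parameter selects byte-transfer-layer transports. A valid selection for OMPI 1.3.x might look like this (a sketch, not a line from the thread):

```shell
# btl expects transport components such as shared memory, TCP, and
# loopback; "ras" and "plm" are separate frameworks and are not valid
# btl values, hence the components_open error:
mpirun --mca btl self,sm,tcp -n 8 ./hellocluster
```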

Re: [OMPI users] Torque 2.4.3 fails with OpenMPI 1.3.4; no startup at all

2009-12-19 Thread Johann Knechtel
Hi Ralph and all, Yes, the OMPI libs and binaries are at the same place on the nodes, I packed OMPI via checkinstall and installed the deb via pdsh on the nodes. The LD_LIBRARY_PATH is set; I can run for example "mpirun --hostfile nodefile hellocluster" without problems. But when started via Torque…

Re: [OMPI users] Torque 2.4.3 fails with OpenMPI 1.3.4; no startup at all

2009-12-18 Thread Ralph Castain
Are the OMPI libraries and binaries installed at the same place on all the remote nodes? Are you setting the LD_LIBRARY_PATH correctly? Are the Torque libs available in the same place on the remote nodes? Remember, Torque runs mpirun on a backend node - not on the frontend. These are the most…

[OMPI users] Torque 2.4.3 fails with OpenMPI 1.3.4; no startup at all

2009-12-18 Thread Johann Knechtel
Hi all, Your help with the following Torque integration issue will be much appreciated: whenever I try to start an OpenMPI job on more than one node, it simply does not start up on the nodes. The Torque job fails with the following: > Fri Dec 18 22:11:07 CET 2009 > OpenMPI with PPU-GCC was loaded…
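The setup described in this thread is typically driven by a Torque job script like the following sketch; the resource request, module name, and binary are placeholders. As Ralph notes later in the thread, mpirun runs on a backend node, so PATH and LD_LIBRARY_PATH must be correct there, not only on the frontend:

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=4
#PBS -N hellocluster
# Minimal Torque submission script (all values are placeholders).
# The environment must provide the OMPI binaries and libraries on
# the backend node where this script executes.
date
cd $PBS_O_WORKDIR
mpirun ./hellocluster
```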