Re: [OMPI users] Torque and OpenMPI 1.2.1 problems

2007-05-02 Thread Ralph Castain
On 5/2/07 7:57 AM, "Ole Holm Nielsen" wrote:
>
> What I'm saying is that users should be able to run the same script in different
> environments, they being Torque or non-Torque, without having to change
> the arguments to the mpirun command. Maybe they submit batch jobs to
> our Linux/Torque

Re: [OMPI users] Torque and OpenMPI 1.2.1 problems

2007-05-02 Thread Ole Holm Nielsen
Ralph, thanks very much for your continued support: Ralph Castain wrote: I'd say that this behavior of mpirun under Torque TM should be considered as a bug. Ideally, users should not have to design their scripts differently according to whether the sysadmin decided to configure in TM or not. Als

Re: [OMPI users] Torque and OpenMPI 1.2.1 problems

2007-05-02 Thread Ralph Castain
I guess I am now totally confused, so I will have to ask your patience with a few questions.
On 5/2/07 4:55 AM, "Ole Holm Nielsen" wrote:
> Ralph Castain wrote:
>> We would consider it a "feature" that OpenMPI is integrated with Torque. We
>> actually read the PBS_NODEFILE internally ourselves.

Re: [OMPI users] Torque and OpenMPI 1.2.1 problems

2007-05-02 Thread Ole Holm Nielsen
Ralph Castain wrote: We would consider it a "feature" that OpenMPI is integrated with Torque. We actually read the PBS_NODEFILE internally ourselves. I believe the problem here is that specifying the "machinefile" prevents us from using that Torque-integrated code and forces us down a different c
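To make the PBS_NODEFILE remark concrete, here is a minimal sketch of how such a file can be summarized. It is not Open MPI's own code. It assumes the usual Torque layout of one hostname per line, repeated once per allocated processor slot, and the host names `node01` and `node02` are hypothetical.

```python
from collections import Counter


def slots_per_host(nodefile_text):
    """Count how often each host appears in PBS-style node file text.

    Assumption: one hostname per line, repeated once per allocated slot.
    """
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return Counter(hosts)


# Hypothetical node file contents for a 3-slot job on two hosts.
sample = "node01\nnode01\nnode02\n"
slots = slots_per_host(sample)
print(dict(slots))
```

In a real job the text would come from the file named by the `PBS_NODEFILE` environment variable instead of the inline sample.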

Re: [OMPI users] Torque and OpenMPI 1.2.1 problems

2007-05-02 Thread Ralph Castain
On 5/2/07 1:28 AM, "Ole Holm Nielsen" wrote:
> Bas hit the nail on the head: When using OpenMPI's mpirun under
> Torque TM one apparently *must* omit the "-machinefile $PBS_NODEFILE"
> flags and only specify "-np 2", presumably because TM knows all
> about the machines under its control.
>
>

Re: [OMPI users] Torque and OpenMPI 1.2.1 problems

2007-05-02 Thread Ole Holm Nielsen
Bas hit the nail on the head: When using OpenMPI's mpirun under Torque TM one apparently *must* omit the "-machinefile $PBS_NODEFILE" flags and only specify "-np 2", presumably because TM knows all about the machines under its control. This behavior is new to me: Is this a feature or a bug in O
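As a workaround for the behavior described above, a launch wrapper can decide whether to pass a machinefile. The following is only a sketch under stated assumptions: it treats the presence of `PBS_NODEFILE` in the environment as meaning "running inside a Torque job with a TM-enabled Open MPI", and the helper name `mpirun_args`, the program `./a.out`, and the file `hosts.txt` are all hypothetical.

```python
import os


def mpirun_args(np, program, env=None, fallback_machinefile=None):
    """Build an mpirun argument list that omits -machinefile under Torque.

    Assumption: PBS_NODEFILE being set means the job runs under Torque and
    mpirun can obtain the node list through the TM interface on its own.
    """
    env = os.environ if env is None else env
    args = ["mpirun", "-np", str(np)]
    under_torque = "PBS_NODEFILE" in env
    if not under_torque and fallback_machinefile:
        # Outside Torque, fall back to an explicit host list.
        args += ["-machinefile", fallback_machinefile]
    args.append(program)
    return args


# Hypothetical environments: inside a Torque job and outside one.
torque_cmd = mpirun_args(2, "./a.out", env={"PBS_NODEFILE": "/tmp/nodes"},
                         fallback_machinefile="hosts.txt")
plain_cmd = mpirun_args(2, "./a.out", env={}, fallback_machinefile="hosts.txt")
print(" ".join(torque_cmd))
print(" ".join(plain_cmd))
```

The resulting list could be handed to `subprocess.run`, which lets the same job script be used in both environments without editing the mpirun arguments by hand.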

Re: [OMPI users] Torque and OpenMPI 1.2.1 problems

2007-05-02 Thread Bas van der Vlies
Ole Holm Nielsen wrote: We have built OpenMPI 1.2.1 with support for Torque 2.1.8 and its Task Manager interface. We use the PGI 6.2-4 compiler and the --with-tm option as described in http://www.open-mpi.org/faq/?category=building#build-rte-tm for building an OpenMPI RPM on a Pentium-4 machin

Re: [OMPI users] Torque and OpenMPI 1.2.1 problems

2007-05-01 Thread Ole Holm Nielsen
Thanks for the suggestion. I inserted a printenv command and the path and library variables seem to be correct for our OpenMPI installation:
LD_LIBRARY_PATH=/usr/local/openmpi-1.2.1-pgi/lib:/opt/intel/compiler90/lib
MPIHOME=/usr/local/openmpi-1.2.1-pgi
PATH=/usr/local/openmpi-1.2.1-pgi/bin:/usr/

Re: [OMPI users] Torque and OpenMPI 1.2.1 problems

2007-05-01 Thread Ralph Castain
The most likely problem is that you have a path or library issue regarding the location of the OpenMPI/OpenRTE executables when running batch versus interactive. We see this sometimes when the shell startups differ in those two modes. You might try just running a batch vs interactive printenv to s
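The batch versus interactive comparison suggested here can be sketched as a small diff over two captured `printenv` outputs. This is an illustrative helper, not part of Open MPI or Torque: the function names and the sample values are hypothetical, and values that span multiple lines are not handled.

```python
def parse_printenv(text):
    """Parse NAME=value lines, as printed by printenv, into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            env[name] = value
    return env


def env_differences(batch_text, interactive_text,
                    names=("PATH", "LD_LIBRARY_PATH")):
    """Return {name: (batch_value, interactive_value)} for differing names."""
    batch = parse_printenv(batch_text)
    inter = parse_printenv(interactive_text)
    return {n: (batch.get(n), inter.get(n))
            for n in names if batch.get(n) != inter.get(n)}


# Hypothetical captures: the batch shell lacks the MPI bin directory.
batch_out = "PATH=/usr/bin\nLD_LIBRARY_PATH=/opt/mpi/lib\n"
inter_out = "PATH=/opt/mpi/bin:/usr/bin\nLD_LIBRARY_PATH=/opt/mpi/lib\n"
diff = env_differences(batch_out, inter_out)
print(diff)
```

An empty result for the chosen names suggests the two shell startups agree on those variables, which points the search elsewhere.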

[OMPI users] Torque and OpenMPI 1.2.1 problems

2007-05-01 Thread Ole Holm Nielsen
We have built OpenMPI 1.2.1 with support for Torque 2.1.8 and its Task Manager interface. We use the PGI 6.2-4 compiler and the --with-tm option as described in http://www.open-mpi.org/faq/?category=building#build-rte-tm for building an OpenMPI RPM on a Pentium-4 machine running CentOS 4.4 (RHEL4