On Jan 24, 2013, at 10:10 AM, Sabuj Pattanayek wrote:

> or do I just need to compile two versions, one with IB and one without?
You should not need to; we have OMPI compiled for openib/psm and run that
same install on psm-, tcp-, and verbs (openib)-based gear.

Do all the nodes assigned to your job have QLogic IB adapters? Do they all
have libpsm_infinipath installed? It will be required.

Also, did you build your Open MPI with TM support?

    --with-tm=/usr/local/torque/   (or wherever the path to lib/libtorque.so is)

With TM support, mpirun from OMPI will know how to find the CPUs assigned
to your job by Torque. This is the better way. In a pinch you can also use:

    mpirun -machinefile $PBS_NODEFILE -np 8 ...

But really, TM is better.

Here is our build line for OMPI:

    ./configure --prefix=/home/software/rhel6/openmpi-1.6.3-mxm/intel-12.1 \
        --mandir=/home/software/rhel6/openmpi-1.6.3-mxm/intel-12.1/man \
        --with-tm=/usr/local/torque --with-openib --with-psm \
        --with-mxm=/home/software/rhel6/mxm/1.5 \
        --with-io-romio-flags=--with-file-system=testfs+ufs+lustre \
        --disable-dlopen --enable-shared \
        CC=icc CXX=icpc FC=ifort F77=ifort

We run Torque with OMPI.

> On Thu, Jan 24, 2013 at 9:09 AM, Sabuj Pattanayek <sab...@gmail.com> wrote:
>> Aha, with --display-allocation I'm getting:
>>
>> mca: base: component_find: unable to open
>> /sb/apps/openmpi/1.6.3/x86_64/lib/openmpi/mca_mtl_psm:
>> libpsm_infinipath.so.1: cannot open shared object file: No such file
>> or directory (ignored)
>>
>> I think the system I compiled it on has different IB libs than the
>> nodes. I'll need to recompile and then see if it runs, but is there
>> any way to get it to ignore IB and just use GigE? Not all of our
>> nodes have IB and I just want to use any node.
>>
>> On Thu, Jan 24, 2013 at 8:52 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> How did you configure OMPI? If you add --display-allocation to your
>>> cmd line, does it show all the nodes?
>>>
>>> On Jan 24, 2013, at 6:34 AM, Sabuj Pattanayek <sab...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm submitting a job through Torque/PBS; the head node also runs
>>>> the Moab scheduler. The .pbs file has this in the resources line:
>>>>
>>>> #PBS -l nodes=2:ppn=4
>>>>
>>>> I've also tried something like:
>>>>
>>>> #PBS -l procs=56
>>>>
>>>> and at the end of the script I'm running:
>>>>
>>>> mpirun -np 8 cat /dev/urandom > /dev/null
>>>>
>>>> or
>>>>
>>>> mpirun -np 56 cat /dev/urandom > /dev/null
>>>>
>>>> ...depending on how many processors I requested. The job starts,
>>>> and $PBS_NODEFILE lists the nodes the job was assigned, but all
>>>> the cats are piled onto the first node. Any idea how I can get
>>>> this to launch across multiple nodes? Note, I have OSU mpiexec
>>>> working without problems with mvapich and mpich2 on our cluster
>>>> to launch jobs across multiple nodes.
>>>>
>>>> Thanks,
>>>> Sabuj
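For reference, a minimal PBS script along the lines described above might
look like this; the job name and the placeholder binary ./my_mpi_app are
illustrative, not from the thread:

    #!/bin/bash
    #PBS -l nodes=2:ppn=4
    #PBS -N ompi-test

    cd $PBS_O_WORKDIR

    # With Open MPI built --with-tm, mpirun queries Torque for the
    # allocated slots itself, so neither -np nor a machinefile is needed:
    mpirun ./my_mpi_app

    # Fallback if the build lacks TM support (the "in a pinch" variant):
    # mpirun -machinefile $PBS_NODEFILE -np 8 ./my_mpi_app

On the mixed IB/GigE question: the "(ignored)" in the error above means the
PSM MTL component is simply skipped, and Open MPI can be restricted to TCP
at run time with, e.g., mpirun --mca btl tcp,sm,self ..., so a single build
can serve both kinds of nodes.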