On Jan 24, 2013, at 10:10 AM, Sabuj Pattanayek wrote:

> or do i just need to compile two versions, one with IB and one without?

You should not need to. We have OMPI compiled for openib/psm and run that same 
install on psm/tcp and verbs (openib) based gear.
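
If you want a particular job to skip IB entirely and run over gigE, you should 
be able to force that at run time with MCA parameters rather than rebuilding. 
A sketch (the binary name is just a placeholder):

mpirun --mca pml ob1 --mca btl tcp,self -np 8 ./your_app

Forcing the ob1 PML keeps OMPI away from the PSM MTL, and limiting the BTLs to 
tcp,self keeps it off openib.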

Do all the nodes assigned to your job have QLogic IB adapters? Do they all have 
libpsm_infinipath installed?  This will be required.
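
A quick way to check is to look for the library in the loader cache on each 
node (a sketch, assuming a standard ldconfig setup):

ldconfig -p | grep libpsm_infinipath

If that comes back empty on any node in the job, the psm MTL will fail to load 
there, which matches the error you pasted below.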

Also, did you build your Open MPI with tm?  --with-tm=/usr/local/torque/  (or 
wherever the path to lib/libtorque.so is.)

With TM support, mpirun from OMPI will know how to find the CPUs assigned to 
your job by torque.  This is the better way. In a pinch you can also use:

mpirun -machinefile $PBS_NODEFILE -np 8 ....

But really, tm is better.
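
With tm built in, a minimal job script can drop the machinefile entirely. A 
sketch (script and binary names are placeholders):

#!/bin/bash
#PBS -l nodes=2:ppn=4
cd $PBS_O_WORKDIR
mpirun -np 8 ./your_app

mpirun queries torque for the allocation itself, so the 8 ranks get spread 
across both nodes instead of piling onto the first one.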

Here is our build line for OMPI:

./configure --prefix=/home/software/rhel6/openmpi-1.6.3-mxm/intel-12.1 \
  --mandir=/home/software/rhel6/openmpi-1.6.3-mxm/intel-12.1/man \
  --with-tm=/usr/local/torque --with-openib --with-psm \
  --with-mxm=/home/software/rhel6/mxm/1.5 \
  --with-io-romio-flags=--with-file-system=testfs+ufs+lustre \
  --disable-dlopen --enable-shared \
  CC=icc CXX=icpc FC=ifort F77=ifort

We run torque with OMPI.
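
To confirm the tm bits actually made it into your build, ompi_info is handy 
(assuming it's on your PATH):

ompi_info | grep tm

You should see tm listed for the ras and plm frameworks; if not, mpirun falls 
back to ssh-style launching and won't see the torque allocation.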

> 
> On Thu, Jan 24, 2013 at 9:09 AM, Sabuj Pattanayek <sab...@gmail.com> wrote:
>> ahha, with --display-allocation I'm getting :
>> 
>> mca: base: component_find: unable to open
>> /sb/apps/openmpi/1.6.3/x86_64/lib/openmpi/mca_mtl_psm:
>> libpsm_infinipath.so.1: cannot open shared object file: No such file
>> or directory (ignored)
>> 
>> I think the system I compiled it on has different ib libs than the
>> nodes. I'll need to recompile and then see if it runs, but is there
>> any way to get it to ignore IB and just use gigE? Not all of our nodes
>> have IB and I just want to use any node.
>> 
>> On Thu, Jan 24, 2013 at 8:52 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> How did you configure OMPI? If you add --display-allocation to your cmd 
>>> line, does it show all the nodes?
>>> 
>>> On Jan 24, 2013, at 6:34 AM, Sabuj Pattanayek <sab...@gmail.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I'm submitting a job through torque/PBS, the head node also runs the
>>>> Moab scheduler, the .pbs file has this in the resources line :
>>>> 
>>>> #PBS -l nodes=2:ppn=4
>>>> 
>>>> I've also tried something like :
>>>> 
>>>> #PBS -l procs=56
>>>> 
>>>> and at the end of script I'm running :
>>>> 
>>>> mpirun -np 8 cat /dev/urandom > /dev/null
>>>> 
>>>> or
>>>> 
>>>> mpirun -np 56 cat /dev/urandom > /dev/null
>>>> 
>>>> ...depending on how many processors I requested. The job starts,
>>>> $PBS_NODEFILE has the nodes that the job was assigned listed, but all
>>>> the cat's are piled onto the first node. Any idea how I can get this
>>>> to submit jobs across multiple nodes? Note, I have OSU mpiexec working
>>>> without problems with mvapich and mpich2 on our cluster to launch jobs
>>>> across multiple nodes.
>>>> 
>>>> Thanks,
>>>> Sabuj