Oy; that's weird.

I'm afraid we're going to have to wait for Ralph to answer why that is 
happening -- sorry!


On Mar 18, 2013, at 4:45 PM, <tmish...@jcity.maeda.co.jp> wrote:

> 
> 
> Hi Correa and Jeff,
> 
> Thank you for your comments. I quickly checked your suggestion.
> 
> The simple case worked well:
> #PBS -l nodes=2:ppn=4
> export OMP_NUM_THREADS=4
> mpiexec -bynode -np 2 ./my_program
> 
> However, a more practical case, in which more than one process is
> allocated to each node, did not work:
> #PBS -l nodes=2:ppn=8
> export OMP_NUM_THREADS=4
> mpiexec -bynode -np 4 ./my_program
> 
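For reference, here is a minimal sketch of the failing case written as a single Torque/PBS job script. It is an illustration under stated assumptions, not the poster's actual script: the NODES, PPN and NP variables are hypothetical and simply restate the numbers above. It spells out the resource arithmetic: with nodes=2:ppn=8 and -np 4 mapped -bynode, ranks are placed round-robin, so each node gets 2 ranks, and 4 OpenMP threads per rank fill the 8 cores on each node.

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=8
# Hypothetical hybrid MPI+OpenMP job script (a sketch, not the poster's
# script). NODES and PPN mirror the #PBS line above; NP matches the
# failing "mpiexec -bynode -np 4" case.
NODES=2
PPN=8
NP=4

# -bynode places ranks round-robin across nodes, so each node gets
# ceil(NP / NODES) ranks.
ranks_per_node=$(( (NP + NODES - 1) / NODES ))

# Give each rank an equal share of the cores allocated on its node.
threads_per_rank=$(( PPN / ranks_per_node ))

export OMP_NUM_THREADS=$threads_per_rank
echo "ranks/node=$ranks_per_node threads/rank=$OMP_NUM_THREADS"

# Launch only when running inside a PBS job with mpiexec on the PATH.
if [ -n "$PBS_JOBID" ] && command -v mpiexec >/dev/null 2>&1; then
    cd "$PBS_O_WORKDIR" || exit 1
    mpiexec -bynode -np "$NP" ./my_program
fi
```

Outside a batch job the script only prints the layout it would request, which makes it easy to check the arithmetic before submitting.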
> The error message is as follows:
> [node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 316
> [node08.cluster:11946] [[30666,0],3] unable to find address for [[30666,0],1]
> [node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file base/grpcomm_base_rollup.c at line 123
> 
> Here is our Open MPI configure command:
> ./configure \
> --prefix=/home/mishima/opt/mpi/openmpi-1.7rc8-pgi12.9 \
> --with-tm \
> --with-verbs \
> --disable-ipv6 \
> CC=pgcc CFLAGS="-fast -tp k8-64e" \
> CXX=pgCC CXXFLAGS="-fast -tp k8-64e" \
> F77=pgfortran FFLAGS="-fast -tp k8-64e" \
> FC=pgfortran FCFLAGS="-fast -tp k8-64e"
> 
> Regards,
> Tetsuya Mishima
> 
>> On Mar 17, 2013, at 10:55 PM, Gustavo Correa <g...@ldeo.columbia.edu> wrote:
>> 
>>> In your example, have you tried leaving the node file unmodified,
>>> launching two MPI processes with mpiexec, and requesting a "-bynode"
>>> distribution of processes:
>>> 
>>> mpiexec -bynode -np 2 ./my_program
>> 
>> This should work in 1.7, too (I use these kinds of options with SLURM all
>> the time).
>> 
>> However, we should probably verify that the hostfile functionality in
>> batch jobs hasn't been broken in 1.7, because I'm pretty sure that what
>> you described should work. Unfortunately, Ralph, our run-time guy, is on
>> vacation this week, so there might be a delay in checking into this.
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 


-- 
Jeff Squyres
jsquy...@cisco.com