Hi Jeff,
I didn't have much time to test this morning, so I checked it again just
now. The trouble seems to depend on the number of nodes in use.

This works (nodes < 4):
  mpiexec -bynode -np 4 ./my_program   (with #PBS -l nodes=2:ppn=8, OMP_NUM_THREADS=4)

This causes the error (nodes >= 4):
  mpiexec -bynode -np 8 ./my_program   (with #PBS -l nodes=4:ppn=8, OMP_NUM_THREADS=4)

(A sketch of the corresponding job script is appended after the quoted
thread below.)

Regards,
Tetsuya Mishima

> Oy; that's weird.
>
> I'm afraid we're going to have to wait for Ralph to answer why that is
> happening -- sorry!
>
>
> On Mar 18, 2013, at 4:45 PM, <tmish...@jcity.maeda.co.jp> wrote:
>
> > Hi Correa and Jeff,
> >
> > Thank you for your comments. I quickly checked your suggestion.
> >
> > As a result, my simple example case worked well:
> >   export OMP_NUM_THREADS=4
> >   mpiexec -bynode -np 2 ./my_program   (with #PBS -l nodes=2:ppn=4)
> >
> > But a practical case, in which more than one process is allocated to a
> > node as below, did not work:
> >   export OMP_NUM_THREADS=4
> >   mpiexec -bynode -np 4 ./my_program   (with #PBS -l nodes=2:ppn=8)
> >
> > The error message is as follows:
> >   [node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is
> >   attempting to be sent to a process whose contact information is
> >   unknown in file rml_oob_send.c at line 316
> >   [node08.cluster:11946] [[30666,0],3] unable to find address for
> >   [[30666,0],1]
> >   [node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is
> >   attempting to be sent to a process whose contact information is
> >   unknown in file base/grpcomm_base_rollup.c at line 123
> >
> > Here is our Open MPI configuration:
> >   ./configure \
> >     --prefix=/home/mishima/opt/mpi/openmpi-1.7rc8-pgi12.9 \
> >     --with-tm \
> >     --with-verbs \
> >     --disable-ipv6 \
> >     CC=pgcc CFLAGS="-fast -tp k8-64e" \
> >     CXX=pgCC CXXFLAGS="-fast -tp k8-64e" \
> >     F77=pgfortran FFLAGS="-fast -tp k8-64e" \
> >     FC=pgfortran FCFLAGS="-fast -tp k8-64e"
> >
> > Regards,
> > Tetsuya Mishima
> >
> >> On Mar 17, 2013, at 10:55 PM, Gustavo Correa <g...@ldeo.columbia.edu>
> >> wrote:
> >>
> >>> In your example, have you tried not modifying the node file, launching
> >>> two MPI processes with mpiexec, and requesting a "-bynode"
> >>> distribution of processes:
> >>>
> >>> mpiexec -bynode -np 2 ./my_program
> >>
> >> This should work in 1.7, too (I use these kinds of options with SLURM
> >> all the time).
> >>
> >> However, we should probably verify that the hostfile functionality in
> >> batch jobs hasn't been broken in 1.7, too, because I'm pretty sure that
> >> what you described should work. However, Ralph, our run-time guy, is on
> >> vacation this week. There might be a delay in checking into this.
> >>
> >> --
> >> Jeff Squyres
> >> jsquy...@cisco.com
>
> --
> Jeff Squyres
> jsquy...@cisco.com
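
---

For reference, here is a minimal sketch of the kind of Torque/PBS job
script implied by the failing case above (4 nodes with 8 cores each,
8 MPI ranks, 4 OpenMP threads per rank). The job name, walltime, and
output settings are placeholders, and ./my_program stands in for the
actual executable; only the resource line, OMP_NUM_THREADS, and the
mpiexec invocation are taken from the report.

  #!/bin/bash
  # Sketch of the failing case: 4 nodes x 8 cores, 8 MPI ranks, 4 OpenMP threads each.
  # Job name, walltime, and output options below are placeholders.
  #PBS -N hybrid_test
  #PBS -l nodes=4:ppn=8
  #PBS -l walltime=00:10:00
  #PBS -j oe

  # Run from the directory the job was submitted from.
  cd "$PBS_O_WORKDIR"

  # 4 OpenMP threads per MPI rank.
  export OMP_NUM_THREADS=4

  # -bynode places ranks round-robin across the allocated nodes, so the
  # 8 ranks land 2 per node, and 2 ranks x 4 threads fill the 8 cores per node.
  mpiexec -bynode -np 8 ./my_program

Swapping in "#PBS -l nodes=2:ppn=8" and "mpiexec -bynode -np 4
./my_program" gives the two-node case that works.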