Hi,

On 19.03.2013 at 08:00, tmish...@jcity.maeda.co.jp wrote:
> I didn't have much time to test this morning. So, I checked it again
> now. Then, the trouble seems to depend on the number of nodes used.
>
> This works (nodes < 4):
> mpiexec -bynode -np 4 ./my_program && #PBS -l nodes=2:ppn=8
> (OMP_NUM_THREADS=4)
>
> This causes an error (nodes >= 4):
> mpiexec -bynode -np 8 ./my_program && #PBS -l nodes=4:ppn=8
> (OMP_NUM_THREADS=4)

We don't use Torque/PBS ourselves, but AFAIK the request "-l nodes=4:ppn=8" can give you 4 nodes with 8 slots each, or even some nodes twice or more often when slots are available and it is set up this way. Whether this behavior is allowed is a global setting in Torque/PBS.

Did you get different nodes, or some nodes at least twice? (A small sketch to check this from inside the job is at the end of this mail.) I don't know whether this is related to the issue, but it is at least worth mentioning in this context.

-- Reuti

> Regards,
> Tetsuya Mishima
>
>> Oy; that's weird.
>>
>> I'm afraid we're going to have to wait for Ralph to answer why that is
>> happening -- sorry!
>>
>> On Mar 18, 2013, at 4:45 PM, <tmish...@jcity.maeda.co.jp> wrote:
>>
>>> Hi Correa and Jeff,
>>>
>>> Thank you for your comments. I quickly checked your suggestion.
>>>
>>> As a result, my simple example case worked well:
>>> export OMP_NUM_THREADS=4
>>> mpiexec -bynode -np 2 ./my_program && #PBS -l nodes=2:ppn=4
>>>
>>> But the practical case, where more than one process is allocated to a
>>> node as below, did not work:
>>> export OMP_NUM_THREADS=4
>>> mpiexec -bynode -np 4 ./my_program && #PBS -l nodes=2:ppn=8
>>>
>>> The error message is as follows:
>>> [node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is
>>> attempting to be sent to a process whose contact information is
>>> unknown in file rml_oob_send.c at line 316
>>> [node08.cluster:11946] [[30666,0],3] unable to find address for
>>> [[30666,0],1]
>>> [node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is
>>> attempting to be sent to a process whose contact information is
>>> unknown in file base/grpcomm_base_rollup.c at line 123
>>>
>>> Here is our Open MPI configuration:
>>> ./configure \
>>>   --prefix=/home/mishima/opt/mpi/openmpi-1.7rc8-pgi12.9 \
>>>   --with-tm \
>>>   --with-verbs \
>>>   --disable-ipv6 \
>>>   CC=pgcc CFLAGS="-fast -tp k8-64e" \
>>>   CXX=pgCC CXXFLAGS="-fast -tp k8-64e" \
>>>   F77=pgfortran FFLAGS="-fast -tp k8-64e" \
>>>   FC=pgfortran FCFLAGS="-fast -tp k8-64e"
>>>
>>> Regards,
>>> Tetsuya Mishima
>>>
>>>> On Mar 17, 2013, at 10:55 PM, Gustavo Correa <g...@ldeo.columbia.edu>
>>>> wrote:
>>>>
>>>>> In your example, have you tried not to modify the node file, to
>>>>> launch two MPI processes with mpiexec, and to request a "-bynode"
>>>>> distribution of processes:
>>>>>
>>>>> mpiexec -bynode -np 2 ./my_program
>>>>
>>>> This should work in 1.7, too (I use these kinds of options with SLURM
>>>> all the time).
>>>>
>>>> However, we should probably verify that the hostfile functionality in
>>>> batch jobs hasn't been broken in 1.7, because I'm pretty sure that
>>>> what you described should work. However, Ralph, our run-time guy, is
>>>> on vacation this week, so there might be a delay in looking into this.
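P.S. For what it's worth, here is a minimal sketch of how to check, from inside the job, what Torque actually granted. It only assumes that $PBS_NODEFILE is set, which Torque/PBS does by default; the file contains one line per granted slot, so a node providing 8 slots appears 8 times:

    #!/bin/sh
    # Sketch only: inspect the node list Torque handed to this job.
    # Each node name appears once per granted slot.
    echo "Slots per node:"
    sort "$PBS_NODEFILE" | uniq -c

    echo "Number of distinct nodes:"
    sort -u "$PBS_NODEFILE" | wc -l

If the first listing shows fewer than 4 distinct nodes, or a node with more than 8 slots, that would match the "nodes granted twice" scenario above.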
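And, just so we are reading the launch the same way, this is how I understand the intended hybrid job script from your commands. The resource request, process count, -bynode option and OMP_NUM_THREADS are taken from your mail; the rest (script layout, $PBS_O_WORKDIR, the program name) is only my assumption:

    #!/bin/sh
    #PBS -l nodes=2:ppn=8
    # 2 nodes x 8 slots = 16 slots, used as 4 MPI ranks x 4 OpenMP threads.
    cd "$PBS_O_WORKDIR"

    export OMP_NUM_THREADS=4
    # -bynode places the ranks round-robin across the nodes instead of
    # filling the first node completely before moving to the next one.
    mpiexec -bynode -np 4 ./my_program

If the node list really does contain some node twice, the mapping mpiexec derives from it might not be what you expect.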