You obviously have some MCA params set somewhere:

> --------------------------------------------------------------------------
> A deprecated MCA parameter value was specified in an MCA parameter
> file.  Deprecated MCA parameters should be avoided; they may disappear
> in future releases.
> 
>  Deprecated parameter: orte_rsh_agent
> --------------------------------------------------------------------------

Check your environment for anything with OMPI_MCA_xxx, and your default MCA 
parameter file to see what has been specified.
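
For example (the file locations below are just the usual OMPI defaults;
adjust the prefix to match your install - I'm using the one from your
configure line):

  env | grep OMPI_MCA_
  grep rsh_agent $HOME/.openmpi/mca-params.conf \
      /home/mishima/opt/mpi/openmpi-1.7rc8-pgi12.9/etc/openmpi-mca-params.conf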

The allocation looks okay - I'll have to look for other debug flags you can 
set. Meantime, can you please add --enable-debug to your configure cmd line and 
rebuild?
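
For example, taking the configure line you posted earlier and just adding
the flag (untested sketch - adjust as needed):

  ./configure --enable-debug \
      --prefix=/home/mishima/opt/mpi/openmpi-1.7rc8-pgi12.9 \
      --with-tm --with-verbs --disable-ipv6 \
      CC=pgcc CFLAGS="-fast -tp k8-64e" \
      CXX=pgCC CXXFLAGS="-fast -tp k8-64e" \
      F77=pgfortran FFLAGS="-fast -tp k8-64e" \
      FC=pgfortran FCFLAGS="-fast -tp k8-64e"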

Thanks
Ralph


On Mar 20, 2013, at 4:39 PM, tmish...@jcity.maeda.co.jp wrote:

> 
> 
> Hi Ralph,
> 
> Here is the result of the rerun with --display-allocation.
> I set OMP_NUM_THREADS=1 to make the problem clear.
> 
> Regards,
> Tetsuya Mishima
> 
> P.S. As far as I have checked, these two cases are OK (no problem):
> (1) mpirun -v -np $NPROCS -x OMP_NUM_THREADS --display-allocation \
>     ~/Ducom/testbed/mPre m02-ld
> (2) mpirun -v -x OMP_NUM_THREADS --display-allocation \
>     ~/Ducom/testbed/mPre m02-ld
> 
> Script File:
> 
> #!/bin/sh
> #PBS -A tmishima
> #PBS -N Ducom-run
> #PBS -j oe
> #PBS -l nodes=2:ppn=4
> export OMP_NUM_THREADS=1
> cd $PBS_O_WORKDIR
> cp $PBS_NODEFILE pbs_hosts
> NPROCS=`wc -l < pbs_hosts`
> mpirun -v -np $NPROCS -hostfile pbs_hosts -x OMP_NUM_THREADS \
>        --display-allocation ~/Ducom/testbed/mPre m02-ld
> 
> Output:
> --------------------------------------------------------------------------
> A deprecated MCA parameter value was specified in an MCA parameter
> file.  Deprecated MCA parameters should be avoided; they may disappear
> in future releases.
> 
>  Deprecated parameter: orte_rsh_agent
> --------------------------------------------------------------------------
> 
> ======================   ALLOCATED NODES   ======================
> 
> Data for node: node06  Num slots: 4    Max slots: 0
> Data for node: node05  Num slots: 4    Max slots: 0
> 
> =================================================================
> --------------------------------------------------------------------------
> A hostfile was provided that contains at least one node not
> present in the allocation:
> 
>  hostfile:  pbs_hosts
>  node:      node06
> 
> If you are operating in a resource-managed environment, then only
> nodes that are in the allocation can be used in the hostfile. You
> may find relative node syntax to be a useful alternative to
> specifying absolute node names see the orte_hosts man page for
> further information.
> --------------------------------------------------------------------------
> 
> 
>> I've submitted a patch to fix the Torque launch issue - just some
>> leftover garbage that existed at the time of the 1.7.0 branch and didn't
>> get removed.
>> 
>> For the hostfile issue, I'm stumped as I can't see how the problem would
>> come about. Could you please rerun your original test and add
>> "--display-allocation" to your cmd line? Let's see if it is correctly
>> finding the original allocation.
>> 
>> Thanks
>> Ralph
>> 
>> On Mar 19, 2013, at 5:08 PM, tmish...@jcity.maeda.co.jp wrote:
>> 
>>> 
>>> 
>>> Hi Gus,
>>> 
>>> Thank you for your comments. I understand your advice.
>>> Our script used to use the --npernode style as well.
>>> 
>>> As I mentioned before, our cluster consists of nodes with 4, 8,
>>> and 32 cores, although it was homogeneous when it was first set up.
>>> Furthermore, since the per-core performance is almost the same
>>> everywhere, a mixed job over nodes with different core counts is
>>> possible, for example #PBS -l nodes=1:ppn=32+4:ppn=8.
>>> 
>>> The --npernode style is not applicable to such a mixed job.
>>> That's why I'd like to continue using a modified hostfile.
>>> 
>>> By the way, the problem I reported to Jeff yesterday was that
>>> something is wrong with openmpi-1.7 under Torque, because it
>>> produced an error even for a case as simple as the one shown
>>> below, which surprised me. So I guess the problem is not limited
>>> to modified hostfiles.
>>> 
>>> #PBS -l nodes=4:ppn=8
>>> mpirun -np 8 ./my_program
>>> (OMP_NUM_THREADS=4)
>>> 
>>> Regards,
>>> Tetsuya Mishima
>>> 
>>>> Hi Tetsuya
>>>> 
>>>> Your script that edits $PBS_NODEFILE into a separate hostfile
>>>> is very similar to some that I used here for
>>>> hybrid OpenMP+MPI programs on older versions of OMPI.
>>>> I haven't tried this in 1.6.X,
>>>> but it sounds like you have, and it works there too.
>>>> I haven't tried 1.7 either.
>>>> Since we run production machines,
>>>> I try to stick to the stable versions of OMPI (even numbered:
>>>> 1.6.X, 1.4.X, 1.2.X).
>>>> 
>>>> I believe you can get the same effect even if you
>>>> don't edit your $PBS_NODEFILE and let OMPI use it as is:
>>>> choose the values in your
>>>> #PBS -l nodes=?:ppn=?
>>>> and in your
>>>> OMP_NUM_THREADS
>>>> carefully, and use mpiexec with --npernode or --cpus-per-proc.
>>>> 
>>>> For instance, for twelve MPI processes, with two threads each,
>>>> on nodes with eight cores each, I would try
>>>> (but I haven't tried!):
>>>> 
>>>> #PBS -l nodes=3:ppn=8
>>>> 
>>>> export OMP_NUM_THREADS=2
>>>> 
>>>> mpiexec -np 12 -npernode 4
>>>> 
>>>> or perhaps more tightly:
>>>> 
>>>> mpiexec -np 12 --report-bindings --bind-to-core --cpus-per-proc 2
>>>> 
>>>> I hope this helps,
>>>> Gus Correa
>>>> 
>>>> 
>>>> 
>>>> On 03/19/2013 03:12 PM, tmish...@jcity.maeda.co.jp wrote:
>>>>> 
>>>>> 
>>>>> Hi Reuti and Gus,
>>>>> 
>>>>> Thank you for your comments.
>>>>> 
>>>>> Our cluster is a little bit heterogeneous: it has nodes with 4, 8,
>>>>> and 32 cores.
>>>>> I used 8-core nodes for "-l nodes=4:ppn=8" and 4-core nodes for
>>>>> "-l nodes=2:ppn=4".
>>>>> (Strictly speaking, Torque picked the appropriate nodes.)
>>>>> 
>>>>> As I mentioned before, I usually use openmpi-1.6.x, which has no
>>>>> trouble with that kind of use. I encountered the issue while
>>>>> evaluating openmpi-1.7 to check when we could move to it, although
>>>>> we have no pressing reason to do so at the moment.
>>>>> 
>>>>> As Gus pointed out, I use a script file like the one shown below
>>>>> for practical use of openmpi-1.6.x.
>>>>> 
>>>>> #PBS -l nodes=2:ppn=32   # even "-l nodes=1:ppn=32+4:ppn=8" works fine
>>>>> export OMP_NUM_THREADS=4
>>>>> modify $PBS_NODEFILE pbs_hosts   # 64 lines are condensed to 16 lines here
>>>>> mpirun -hostfile pbs_hosts -np 16 -cpus-per-proc 4 -report-bindings \
>>>>>        -x OMP_NUM_THREADS ./my_program
>>>>> # (a 32-core node has 8 numanodes, an 8-core node has 2 numanodes)
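>>>>> 
>>>>> (The "modify" step above is our own helper; as a rough sketch of the
>>>>> idea - not the actual script, and assuming OMP_NUM_THREADS divides
>>>>> each node's slot count - it does something like:
>>>>> 
>>>>> sort $PBS_NODEFILE | uniq -c | \
>>>>>     awk -v t=$OMP_NUM_THREADS '{for (i = 0; i < $1/t; i++) print $2}' \
>>>>>     > pbs_hosts
>>>>> 
>>>>> so that each node is listed once per MPI rank it should host.)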
>>>>> 
>>>>> It works well with the combination of openmpi-1.6.x and Torque.
>>>>> The problem is only with openmpi-1.7's behavior.
>>>>> 
>>>>> Regards,
>>>>> Tetsuya Mishima
>>>>> 
>>>>>> Hi Tetsuya Mishima
>>>>>> 
>>>>>> Mpiexec offers you a number of possibilities that you could try:
>>>>>> --bynode,
>>>>>> --pernode,
>>>>>> --npernode,
>>>>>> --bysocket,
>>>>>> --bycore,
>>>>>> --cpus-per-proc,
>>>>>> --cpus-per-rank,
>>>>>> --rankfile
>>>>>> and more.
>>>>>> 
>>>>>> Most likely one or more of them will fit your needs.
>>>>>> 
>>>>>> There are also associated flags to bind processes to cores,
>>>>>> to sockets, etc, to report the bindings, and so on.
>>>>>> 
>>>>>> Check the mpiexec man page for details.
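>>>>>> 
>>>>>> For instance (untested, and with option names as in the 1.6-series
>>>>>> man page; ./my_program is just a placeholder), a hybrid job with two
>>>>>> threads per MPI rank on 8-core nodes might look like:
>>>>>> 
>>>>>> export OMP_NUM_THREADS=2
>>>>>> mpiexec -np 8 --npernode 4 --cpus-per-proc 2 --bind-to-core \
>>>>>>         --report-bindings ./my_program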
>>>>>> 
>>>>>> Nevertheless, I am surprised that modifying the
>>>>>> $PBS_NODEFILE doesn't work for you in OMPI 1.7.
>>>>>> I have done this many times in older versions of OMPI.
>>>>>> 
>>>>>> Would it work for you to go back to the stable OMPI 1.6.X,
>>>>>> or does it lack any special feature that you need?
>>>>>> 
>>>>>> I hope this helps,
>>>>>> Gus Correa
>>>>>> 
>>>>>> On 03/19/2013 03:00 AM, tmish...@jcity.maeda.co.jp wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Hi Jeff,
>>>>>>> 
>>>>>>> I didn't have much time to test this morning, so I checked it
>>>>>>> again just now. The trouble seems to depend on the number of
>>>>>>> nodes used.
>>>>>>> 
>>>>>>> This works (nodes < 4):
>>>>>>> mpiexec -bynode -np 4 ./my_program    with  #PBS -l nodes=2:ppn=8
>>>>>>> (OMP_NUM_THREADS=4)
>>>>>>> 
>>>>>>> This causes an error (nodes >= 4):
>>>>>>> mpiexec -bynode -np 8 ./my_program    with  #PBS -l nodes=4:ppn=8
>>>>>>> (OMP_NUM_THREADS=4)
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Tetsuya Mishima
>>>>>>> 
>>>>>>>> Oy; that's weird.
>>>>>>>> 
>>>>>>>> I'm afraid we're going to have to wait for Ralph to answer why
>>>>>>>> that is happening -- sorry!
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mar 18, 2013, at 4:45 PM, <tmish...@jcity.maeda.co.jp> wrote:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Hi Correa and Jeff,
>>>>>>>>> 
>>>>>>>>> Thank you for your comments. I quickly checked your suggestion.
>>>>>>>>> 
>>>>>>>>> As a result, my simple example case worked well:
>>>>>>>>> export OMP_NUM_THREADS=4
>>>>>>>>> mpiexec -bynode -np 2 ./my_program    with  #PBS -l nodes=2:ppn=4
>>>>>>>>> 
>>>>>>>>> But a practical case in which more than one process is allocated
>>>>>>>>> to a node, like the one below, did not work.
>>>>>>>>> export OMP_NUM_THREADS=4
>>>>>>>>> mpiexec -bynode -np 4 ./my_program    with  #PBS -l nodes=2:ppn=8
>>>>>>>>> 
>>>>>>>>> The error message is as follows:
>>>>>>>>> [node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is
>>>>>>>>> attempting to be sent to a process whose contact information
>>>>>>>>> is unknown in file rml_oob_send.c at line 316
>>>>>>>>> [node08.cluster:11946] [[30666,0],3] unable to find address for
>>>>>>>>> [[30666,0],1]
>>>>>>>>> [node08.cluster:11946] [[30666,0],3] ORTE_ERROR_LOG: A message is
>>>>>>>>> attempting to be sent to a process whose contact information
>>>>>>>>> is unknown in file base/grpcomm_base_rollup.c at line 123
>>>>>>>>> 
>>>>>>>>> Here is our openmpi configuration:
>>>>>>>>> ./configure \
>>>>>>>>> --prefix=/home/mishima/opt/mpi/openmpi-1.7rc8-pgi12.9 \
>>>>>>>>> --with-tm \
>>>>>>>>> --with-verbs \
>>>>>>>>> --disable-ipv6 \
>>>>>>>>> CC=pgcc CFLAGS="-fast -tp k8-64e" \
>>>>>>>>> CXX=pgCC CXXFLAGS="-fast -tp k8-64e" \
>>>>>>>>> F77=pgfortran FFLAGS="-fast -tp k8-64e" \
>>>>>>>>> FC=pgfortran FCFLAGS="-fast -tp k8-64e"
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Tetsuya Mishima
>>>>>>>>> 
>>>>>>>>>> On Mar 17, 2013, at 10:55 PM, Gustavo Correa
>>>>>>>>>> <g...@ldeo.columbia.edu> wrote:
>>>>>>>>>> 
>>>>>>>>>>> In your example, have you tried not modifying the node file,
>>>>>>>>>>> launching two MPI processes with mpiexec, and requesting a
>>>>>>>>>>> "-bynode" distribution of processes:
>>>>>>>>>>> 
>>>>>>>>>>> mpiexec -bynode -np 2 ./my_program
>>>>>>>>>> 
>>>>>>>>>> This should work in 1.7, too (I use these kinds of options with
>>>>>>>>>> SLURM all the time).
>>>>>>>>>> 
>>>>>>>>>> However, we should probably verify that the hostfile functionality
>>>>>>>>>> in batch jobs hasn't been broken in 1.7, too, because I'm pretty
>>>>>>>>>> sure that what you described should work.  However, Ralph, our
>>>>>>>>>> run-time guy, is on vacation this week.  There might be a delay
>>>>>>>>>> in checking into this.
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Jeff Squyres
>>>>>>>>>> jsquy...@cisco.com
>>>>>>>>>> For corporate legal information go to:
>>>>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

