Hi all,

I submitted a parallel ORCA (quantum chemistry program) job across multiple nodes in Rocks SGE and got the following error:

--------------------------------------------------------------------------
A hostfile was provided that contains at least one node not present
in the allocation:

  hostfile:  test.nodes
  node:      compute-0-67

If you are operating in a resource-managed environment, then only
nodes that are in the allocation can be used in the hostfile. You
may find relative node syntax to be a useful alternative to
specifying absolute node names see the orte_hosts man page for
further information.
--------------------------------------------------------------------------

ORCA was compiled with Open MPI, and I used the orte parallel environment in Rocks SGE:

$ qconf -sp orte
pe_name            orte
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE

The submitted SGE script:

#!/bin/bash
# Job submission script:
# Usage: qsub <this_script>
#
#$ -cwd
#$ -j y
#$ -o test.sge.o$JOB_ID
#$ -S /bin/bash
#$ -N test
#$ -pe orte 24
#$ -l h_vmem=3.67G
#$ -l h_rt=1240:00:00

# go to work dir
cd $SGE_O_WORKDIR

# load the module env for ORCA
source /usr/share/Modules/init/sh
module load intel/compiler/2011.7.256
source /share/apps/mpi/openmpi2.0.2-ifort/bin/mpivars.sh
export orcapath=/share/apps/orca4.0.0
export RSH_COMMAND="ssh"

# create scratch dir on nfs dir
tdir=/home/data/$SGE_O_LOGNAME/$JOB_ID
mkdir -p $tdir

#cat $PE_HOSTFILE

PeHostfile2MachineFile() {
    cat $1 | while read line; do
        # echo $line
        host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`
        nslots=`echo $line|cut -f2 -d" "`
        i=1
        while [ $i -le $nslots ]; do
            # add here code to map regular hostnames into ATM hostnames
            echo $host
            i=`expr $i + 1`
        done
    done
}

PeHostfile2MachineFile $PE_HOSTFILE >> $tdir/test.nodes

cp ${SGE_O_WORKDIR}/test.inp $tdir
cd $tdir

echo "ORCA job start at" `date`
time $orcapath/orca test.inp > ${SGE_O_WORKDIR}/test.log

rm ${tdir}/test.inp
rm ${tdir}/test.*tmp 2>/dev/null
rm ${tdir}/test.*tmp.* 2>/dev/null
mv ${tdir}/test.* $SGE_O_WORKDIR

echo "ORCA job finished at" `date`
echo "Work Dir is : $SGE_O_WORKDIR"
rm -rf $tdir
rm $SGE_O_WORKDIR/test.sge
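The error says test.nodes names a host that Open MPI does not see in the SGE allocation. One thing that may be worth checking (an unconfirmed guess, not a diagnosis) is whether the names written by PeHostfile2MachineFile, which strips everything after the first dot, match the first column of $PE_HOSTFILE exactly. The sketch below assumes the usual $PE_HOSTFILE layout of "host slots queue processor-range" per line; the function name check_nodes_in_allocation is hypothetical and not part of SGE, Open MPI or ORCA.

```shell
#!/bin/bash
# Hypothetical diagnostic helper (not from SGE, Open MPI or ORCA):
# print every host in a machine file that does not appear verbatim in the
# first column of an SGE $PE_HOSTFILE. Returns 1 if any host is missing.
check_nodes_in_allocation() {
    local nodes_file=$1 pe_hostfile=$2
    local host status=0
    # Iterate over the distinct host names in the machine file.
    while read -r host; do
        [ -z "$host" ] && continue
        # Exact whole-line match against the hosts SGE granted (column 1).
        if ! awk '{print $1}' "$pe_hostfile" | grep -qxF -- "$host"; then
            echo "not in allocation: $host"
            status=1
        fi
    done < <(sort -u "$nodes_file")
    return $status
}
```

In the job script it could be called right after test.nodes is written and before ORCA starts, e.g. `check_nodes_in_allocation "$tdir/test.nodes" "$PE_HOSTFILE"`. If it reports short names such as compute-0-67 while $PE_HOSTFILE lists a longer form, that mismatch is at least a candidate explanation worth testing.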
However, the same job runs normally on multiple nodes under Torque. Can someone help me?

Thanks very much!

Best regards,
Yong Wu
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users