Hi,

On 09.04.2010, at 23:48, Cristobal Navarro wrote:
> Thanks,
> now I get mixed results and everything seems to be working ok with mixed MPI execution
>
> is it normal that after receiving the results, the hosts remain busy for like 15 seconds??
> example

Yes. This is the time SGE needs for housekeeping; it can even take some minutes (especially if you kill a parallel job).

-- Reuti

> master:common master$ qrsh -verbose -pe orte 10 /opt/openmpi-1.4.1/bin/mpirun -np 10 hostname
> Your job 65 ("mpirun") has been submitted
> waiting for interactive job to be scheduled ...
> Your interactive job 65 has been successfully scheduled.
> Establishing builtin session to host worker00.local ...
> worker00.local
> worker00.local
> worker00.local
> worker00.local
> worker00.local
> master.local
> master.local
> master.local
> master.local
> master.local
>
> # after some seconds, I query the hosts' status and the slots are still used
> master:common master$ qstat -f
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> all.q@master.local             BIP   0/5/16         0.02     darwin-x86
>      65 0.55500 mpirun     master       r     04/09/2010 17:44:36     5
>
> ---------------------------------------------------------------------------------
> all.q@worker00.local           BIP   0/5/16         0.01     darwin-x86
>      65 0.55500 mpirun     master       r     04/09/2010 17:44:36     5
>
> master:common master$
>
> but after waiting some more time, they become free again
> master:common master$ qstat -f
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> all.q@master.local             BIP   0/0/16         0.01     darwin-x86
> ---------------------------------------------------------------------------------
> all.q@worker00.local           BIP   0/0/16         0.01     darwin-x86
>
> anyway, these are just details; thanks to your help the important aspects are working.
> Cristobal
>
>
> On Fri, Apr 9, 2010 at 1:34 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> On 09.04.2010, at 18:57, Cristobal Navarro wrote:
>
> > sorry, the command was missing a number
> >
> > as you said, it should be
> >
> > qrsh -verbose -pe pempi 6 mpirun -np 6 hostname
> > waiting for interactive job to be scheduled ...
> >
> > Your "qrsh" request could not be scheduled, try again later.
> > ---
> > this is my parallel environment
> > qconf -sp pempi
> > pe_name            pempi
> > slots              210
> > user_lists         NONE
> > xuser_lists        NONE
> > start_proc_args    /usr/bin/true
> > stop_proc_args     /usr/bin/true
> > allocation_rule    $pe_slots
>
> $pe_slots means that all slots must come from one and the same machine (e.g. for SMP jobs). You can try $round_robin.
>
> -- Reuti
>
> > control_slaves     TRUE
> > job_is_first_task  FALSE
> > urgency_slots      min
> > accounting_summary TRUE
> >
> > this is the queue
> > qconf -sq cola.q
> > qname                 cola.q
> > hostlist              @allhosts
> > seq_no                0
> > load_thresholds       np_load_avg=1.75
> > suspend_thresholds    NONE
> > nsuspend              1
> > suspend_interval      00:05:00
> > priority              0
> > min_cpu_interval      00:05:00
> > processors            UNDEFINED
> > qtype                 BATCH INTERACTIVE
> > ckpt_list             NONE
> > pe_list               make pempi
> > rerun                 FALSE
> > slots                 2
> > tmpdir                /tmp
> > shell                 /bin/csh
> >
> > I noticed that if I put 2 slots (since the queue has 2 slots) on the -pe pempi N argument and also use the full path to mpirun as you guys pointed out, it works!!!
> > cristobal@neoideo:~$ qrsh -verbose -pe pempi 2 /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
> > Your job 125 ("mpirun") has been submitted
> > waiting for interactive job to be scheduled ...
> > Your interactive job 125 has been successfully scheduled.
> > Establishing builtin session to host ijorge.local ...
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > ijorge.local
> > cristobal@neoideo:~$ qrsh -verbose -pe pempi 2 /opt/openmpi-1.4.1/bin/mpirun -np 6 hostname
> > Your job 126 ("mpirun") has been submitted
> > waiting for interactive job to be scheduled ...
> > Your interactive job 126 has been successfully scheduled.
> > Establishing builtin session to host neoideo ...
> > neoideo
> > neoideo
> > neoideo
> > neoideo
> > neoideo
> > neoideo
> > cristobal@neoideo:~$
> >
> > I just wonder why I didn't get mixed hostnames, like
> > neoideo
> > neoideo
> > ijorge.local
> > ijorge.local
> > neoideo
> > ijorge.local
> >
> > ??
> >
> > thanks for the help already!!!
> >
> > Cristobal
> >
> >
> > On Fri, Apr 9, 2010 at 8:58 AM, Huynh Thuc Cuoc <htc...@gmail.com> wrote:
> > Dear friend,
> > 1.
> > I prefer to use the SGE qsub command, for example:
> >
> > [huong@ioitg2 MyPhylo]$ qsub -pe orte 3 myphylo.qsub
> > Your job 35 ("myphylo.qsub") has been submitted
> > [huong@ioitg2 MyPhylo]$ qstat
> > job-ID  prior    name        user   state  submit/start at      queue                        slots  ja-task-ID
> > -----------------------------------------------------------------------------------------------------------------
> >      35 0.55500  myphylo.qs  huong  r      04/09/2010 19:28:59  al...@node2.ioit-grid.ac.vn      3
> > [huong@ioitg2 MyPhylo]$ qstat
> > [huong@ioitg2 MyPhylo]$
> >
> > This job is running on node2 of my cluster.
> > My setup is as follows:
> > headnode: 4 CPUs, $GRAM, CentOS 5.4 + SGE 6.2u4 (qmaster and also execd host) + Open MPI 1.4.1
> > nodes: 4 CPUs, 1G RAM, CentOS 5.4 + sgeexecd + Open MPI 1.4.1
> > PE = orte, set to 4 slots.
> > The app myphylo.qsub has the long command in the shell:
> > /opt/openmpi/bin/mpirun -np 10 $HOME/MyPhylo/bin/par-phylo-builder --data . . . . .
> > Try to set the PE to orte, or use the default PE = make instead.
> >
> > 2. I tested your command on my system as follows:
> > a.
> > [huong@ioitg2 MyPhylo]$ qrsh -verbose -pe make mpirun -np 6 hostname
> > error: Numerical value invalid!
> > The initial portion of string "mpirun" contains no decimal number
> > [huong@ioitg2 MyPhylo]$ qrsh -verbose -pe orte 2 mpirun -np 6 hostname
> > Your job 36 ("mpirun") has been submitted
> >
> > waiting for interactive job to be scheduled ...
> > Your interactive job 36 has been successfully scheduled.
> > Establishing builtin session to host ioitg2.ioit-grid.ac.vn ...
> > bash: mpirun: command not found
> > [huong@ioitg2 MyPhylo]$
> >
> > ERROR! So I tried:
> > [huong@ioitg2 MyPhylo]$ qrsh -verbose -pe orte 2 /opt/openmpi/bin/mpirun -np 6 hostname
> > Your job 38 ("mpirun") has been submitted
> >
> > waiting for interactive job to be scheduled ...
> > Your interactive job 38 has been successfully scheduled.
> > Establishing builtin session to host ioitg2.ioit-grid.ac.vn ...
> > ioitg2.ioit-grid.ac.vn
> > ioitg2.ioit-grid.ac.vn
> > ioitg2.ioit-grid.ac.vn
> > ioitg2.ioit-grid.ac.vn
> > ioitg2.ioit-grid.ac.vn
> > ioitg2.ioit-grid.ac.vn
> > [huong@ioitg2 MyPhylo]$
> >
> > This is OK.
> > The point is: the PATH must point to where mpirun is located.
> >
> > TRY.
> >
> > Good luck,
> > HT Cuoc
> >
> >
> > On Fri, Apr 9, 2010 at 11:02 AM, Cristobal Navarro <axisch...@gmail.com> wrote:
> > Hello,
> >
> > after some days of work and testing, I managed to install SGE on two machines, and also installed Open MPI 1.4.1 on each one.
> >
> > SGE is working; I can submit jobs and it schedules them to the available cores, 6 in total.
> >
> > My problem is that I'm trying to run an Open MPI job and I can't.
> >
> > This is an example of what I am trying:
> >
> > $ qrsh -verbose -pe pempi mpirun -np 6 hostname
> > Your job 105 ("mpirun") has been submitted
> > waiting for interactive job to be scheduled ...
> >
> > Your "qrsh" request could not be scheduled, try again later.
> >
> > I'm not sure what this can be;
> > in the ompi_info output I have gridengine support.
> >
> > Where do you recommend I look?
> > thanks in advance
> >
> > Cristobal

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
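Reuti's housekeeping point at the top of the thread can be watched from the command line. A small sketch, reusing job id 65 from the qstat output above; the waiting time is only an example, since it depends on the epilog and accounting work SGE has to do:

qstat -f        # right after mpirun returns: the slots are still listed as used
qstat -j 65     # the job is still known to the scheduler during housekeeping
# ... once the used slot count drops back to 0/0/16 as shown above,
#     the job record is closed ...
qacct -j 65     # the finished job now appears in the accounting records instead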
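The mixed hostnames Cristobal asks about come down to the allocation rule Reuti mentions: with $pe_slots all granted slots must come from a single host, while $round_robin (or $fill_up) lets SGE take them from several hosts. A minimal sketch of the change, assuming the pempi PE from the thread and SGE manager rights; only the allocation_rule line differs from the definition shown above:

# opens the PE definition in $EDITOR
qconf -mp pempi

# target definition:
pe_name            pempi
slots              210
user_lists         NONE
xuser_lists        NONE
start_proc_args    /usr/bin/true
stop_proc_args     /usr/bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE

With slots 2 on each queue instance, a request such as qrsh -pe pempi 4 ... should then be able to take two slots from each of the two hosts, which gives the mixed output Cristobal was expecting.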
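Both failure modes above trace back to two rules: -pe always needs a slot count after the PE name (hence the "Numerical value invalid" error), and mpirun must be reachable, either through PATH or by its full path (hence "mpirun: command not found"). Along the lines of Huynh's qsub approach, here is a minimal batch sketch; the script name, the job name, the 4-slot request and the -S/-cwd/-j options are illustrative, while the pempi PE and the /opt/openmpi-1.4.1 prefix are the ones used in this thread:

#!/bin/sh
# hello_mpi.qsub -- illustrative job script; submit with: qsub hello_mpi.qsub
# (-S forces sh, since the queue's default shell in this thread is /bin/csh)
#$ -N hello_mpi
#$ -S /bin/sh
#$ -pe pempi 4
#$ -cwd
#$ -j y
# with gridengine support built into Open MPI, mpirun reads the granted host
# list from the PE, and $NSLOTS keeps -np in sync with what SGE actually granted
/opt/openmpi-1.4.1/bin/mpirun -np $NSLOTS hostname

For qrsh the same two rules explain why "qrsh -verbose -pe pempi 2 /opt/openmpi-1.4.1/bin/mpirun ..." above works, while the variants without a slot count or without the full path do not.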
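Cristobal mentions that ompi_info reports gridengine support; that is worth confirming on every node, since the tight integration (mpirun learning the granted host list from the PE) depends on it. A quick presence test only, as the exact component line varies by Open MPI version:

# should print at least one gridengine MCA component when SGE support is built in;
# an empty result means this Open MPI build cannot do tight SGE integration
/opt/openmpi-1.4.1/bin/ompi_info | grep gridengine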