At 02:15 PM 9/29/2008 -0700, you wrote:
>It sounds like you may not have set up passwordless ssh between all
>your nodes.
>
>Doug Reeder

That's not the case. Passwordless ssh is set up and works fine; that's how I can run "mpirun -np 6 -machinefiles ......" without problems.

Zhiliang

>On Sep 29, 2008, at 2:12 PM, Zhiliang Hu wrote:
>
>>At 10:45 PM 9/29/2008 +0200, you wrote:
>>>On 29.09.2008 at 22:33, Zhiliang Hu wrote:
>>>
>>>>At 07:37 PM 9/29/2008 +0200, Reuti wrote:
>>>>
>>>>>>"-l nodes=6:ppn=2" is all I have to specify the node requests:
>>>>>
>>>>>this might help: http://www.open-mpi.org/faq/?category=tm
>>>>
>>>>Essentially, the examples given on this web page are no different
>>>>from what I did.
>>>>The only new thing is that, I suppose, "qsub -I " is for interactive
>>>>mode. When I did this:
>>>>
>>>> qsub -I -l nodes=7 mpiblastn.sh
>>>>
>>>>It hangs on "qsub: waiting for job 798.nagrp2.ansci.iastate.edu to
>>>>start".
>>>>
>>>>>>UNIX_PROMPT> qsub -l nodes=6:ppn=2 /path/to/mpi_program
>>>>>>where "mpi_program" is a file with one line:
>>>>>>/path/to/mpirun -np 12 /path/to/my_program
>>>>>
>>>>>Can you please try this jobscript instead:
>>>>>
>>>>>#!/bin/sh
>>>>>set | grep PBS
>>>>>/path/to/mpirun /path/to/my_program
>>>>>
>>>>>All should be handled by Open MPI automatically. With the "set"
>>>>>bash command you will get a list of all defined variables for
>>>>>further analysis, where you can check for the variables set by
>>>>>Torque.
>>>>>
>>>>>-- Reuti
>>>>
>>>>The "set | grep PBS" part had nothing in its output.
>>>
>>>Strange - did you check the .o and .e files of the job? - Reuti
>>
>>There is nothing in the -o or the -e output. I had to kill the job.
>>I checked the Torque log (/var/spool/torque/server_logs); it shows:
>>
>>09/29/2008 15:52:16;0100;PBS_Server;Job;799.xxx.xxx.xxx;enqueuing into default, state 1 hop 1
>>09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job Queued at request of z...@xxx.xxx.xxx, owner = z...@xxx.xxx.xxx, job name = mpiblastn.sh, queue = default
>>09/29/2008 15:52:16;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent command new
>>09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job Modified at request of schedu...@xxx.xxx.xxx
>>09/29/2008 15:52:27;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job deleted at request of z...@xxx.xxx.xxx
>>09/29/2008 15:52:27;0100;PBS_Server;Job;799.xxx.xxx.xxx;dequeuing from default, state EXITING
>>09/29/2008 15:52:27;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent command term
>>09/29/2008 15:52:47;0001;PBS_Server;Svr;PBS_Server;is_request, bad attempt to connect from 172.16.100.1:1021 (address not trusted - check entry in server_priv/nodes)
>>
>>where server_priv/nodes has:
>>node001 np=4
>>node002 np=4
>>node003 np=4
>>node004 np=4
>>node005 np=4
>>node006 np=4
>>node007 np=4
>>
>>which was set up by the vendor.
>>
>>What is "address not trusted"?
>>
>>Zhiliang
>>
>>_______________________________________________
>>users mailing list
>>us...@open-mpi.org
>>http://www.open-mpi.org/mailman/listinfo.cgi/users
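
The "address not trusted" line in the log points at server_priv/nodes, so one thing worth checking is whether 172.16.100.1 resolves to any of the hosts listed there. Below is a minimal sketch of that check, not a definitive diagnosis. It assumes a Linux system with getent available and a nodes file whose lines start with the host name followed by attributes such as "np=4". The function names list_nodes and address_in_nodes are hypothetical helpers, not part of Torque.

```sh
#!/bin/sh
# Hypothetical helper: print the host names declared in a Torque nodes file.
# Assumes each non-blank, non-comment line starts with the host name.
list_nodes() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# Hypothetical helper: report whether an address belongs to a listed node.
# Host names are resolved with getent(1), i.e. through the system resolver
# (/etc/hosts, DNS, ...), which may differ from what pbs_server itself sees.
address_in_nodes() {
    addr=$1
    nodes_file=$2
    for host in $(list_nodes "$nodes_file"); do
        # getent prints "ADDRESS NAME [ALIASES]"; keep only the address column.
        for ip in $(getent hosts "$host" | awk '{ print $1 }'); do
            if [ "$ip" = "$addr" ]; then
                echo "$addr belongs to $host"
                return 0
            fi
        done
    done
    echo "$addr does not match any host in $nodes_file"
    return 1
}
```

After sourcing the functions, it could be run as "address_in_nodes 172.16.100.1 /var/spool/torque/server_priv/nodes". That path is an assumption inferred from the server_logs location above. If no host matches, the address may belong to an interface or machine that the nodes file does not name, which would be consistent with the log message.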