At 12:10 AM 9/30/2008 +0200, you wrote:
>>>>>>Can you please try this jobscript instead:
>>>>>>
>>>>>>#!/bin/sh
>>>>>>set | grep PBS
>>>>>>/path/to/mpirun /path/to/my_program
>>>>>>
>>>>>>All should be handled by Open MPI automatically. With the "set" bash
>>>>>>command you will get a list with all defined variables for further
>>>>>>analysis; and where you can check for the variables set by Torque.
>>>>>>
>>>>>>-- Reuti
>>>>>
>>>>>"set | grep PBS" part had nothing in output.
>>>>
>>>>Strange - you checked the .o and .e files of the job? - Reuti
>>>
>>>There is nothing in -o nor -e output. I had to kill the job.
>>>I checked torque log, it shows (/var/spool/torque/server_logs):
>>>
>>>09/29/2008 15:52:16;0100;PBS_Server;Job;799.xxx.xxx.xxx;enqueuing into default, state 1 hop 1
>>>09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job Queued at request of z...@xxx.xxx.xxx, owner = z...@xxx.xxx.xxx, job name = mpiblastn.sh, queue = default
>>>09/29/2008 15:52:16;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent command new
>>>09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job Modified at request of schedu...@xxx.xxx.xxx
>>>09/29/2008 15:52:27;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job deleted at request of z...@xxx.xxx.xxx
>>>09/29/2008 15:52:27;0100;PBS_Server;Job;799.xxx.xxx.xxx;dequeuing from default, state EXITING
>>>09/29/2008 15:52:27;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent command term
>>>09/29/2008 15:52:47;0001;PBS_Server;Svr;PBS_Server;is_request, bad attempt to connect from 172.16.100.1:1021 (address not trusted - check entry in server_priv/nodes)
>
>As you blank out some addresses: have the nodes and the headnode one
>or two network cards installed? All the names like node001 et al. are
>known on each node by the correct address? I.e. 172.16.100.1 = node001?
>
>-- Reuti

There should be no problem in this regard -- the setup was done by a commercial company. I can ssh from any node to any node (passwordless).

Zhiliang
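
P.S. Working ssh does not by itself show that every node resolves the names to the same addresses, so here is a minimal sketch of how that could be checked on each node. It assumes `getent` is available (typical on Linux), and it takes the node001 = 172.16.100.1 pairing only from the question above; that pairing is an assumption, not something I have confirmed. The function names are made up for illustration.

```sh
#!/bin/sh
# Sketch only: compare what a node name resolves to against the address
# we expect for it. The function names here are hypothetical helpers.

# resolve_host NAME: print the first address NAME resolves to on this node.
# Prints nothing if the name is unknown.
resolve_host() {
    getent hosts "$1" | awk '{ print $1; exit }'
}

# report_host NAME EXPECTED ACTUAL: print OK or MISMATCH, and return
# non-zero on a mismatch so the result can be used in a loop or script.
report_host() {
    if [ "$3" = "$2" ]; then
        echo "OK $1 -> $3"
    else
        echo "MISMATCH $1: expected $2, got ${3:-nothing}"
        return 1
    fi
}

# check_host NAME EXPECTED: resolve NAME here and report the comparison.
check_host() {
    report_host "$1" "$2" "$(resolve_host "$1")"
}

# Example call (assumed mapping, adjust to the real one):
#   check_host node001 172.16.100.1
```

Running the example call on the headnode and on each compute node should show whether any of them sees a different address for node001, which would fit the "address not trusted" line in the server log.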