On Mar 14, 2010, at 3:20 PM, Josh Bernstein wrote:
> Hi John,
>
> Mpiexec isn't needed with OMPI, in fact if you are using the one from OSC, it
> only works with MPICH.

Hi Josh,

I guess I don't understand. I think we do link against torque, but
what I am trying to do is multiple mpi runs. So I qsub a script that
might have in it

  script1.sh
  script2.sh
  ...

Inside of script1.sh is some various logic culminating in

  mpiexec <app> -i appinputfile1

script2.sh similarly invokes

  mpiexec <app> -i appinputfile2

but then those fail as shown below. So I am not sure what is going on.

Thx....John

>
> Instead just build OMPI with --with-tm, and it will link against TORQUE and
> start up and track jobs properly.
>
> -Joshua Bernstein
> Penguin Computing
>
> On Mar 14, 2010, at 21:35, "John R. Cary" <c...@txcorp.com> wrote:
>
>> I have a script that launches a bunch of runs on some compute nodes of
>> a cluster. Once I get through the queue, I query PBS for my machine
>> file, then I copy that to a local file 'nodes' which I use for mpiexec:
>>
>> mpiexec -machinefile /home/research/cary/projects/vpall/vptests/nodes -np 6 \
>>   /home/research/cary/projects/vpall/builds/vorpal/par/vorpal/vorpal \
>>   -i bathtubAntenna.in -dim 2 -o bathtubAntenna2p -n 100 -d 100
>>
>> but this fails with
>>
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 153
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 87
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../orte/mca/ras/base/ras_base_allocate.c at line 133
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../orte/mca/plm/base/plm_base_launch_support.c at line 72
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/plm/tm/plm_tm_module.c at line 167
>> --------------------------------------------------------------------------
>> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>> launch so we are aborting.
>>
>> The appropriate code snippet is
>>
>>     /* setup the full path to the PBS file */
>>     filename = opal_os_path(false, mca_ras_tm_component.nodefile_dir,
>>                             pbs_jobid, NULL);
>>     fp = fopen(filename, "r");
>>     if (NULL == fp) {
>>         ORTE_ERROR_LOG(ORTE_ERR_FILE_OPEN_FAILURE);
>>         free(filename);
>>         return ORTE_ERR_FILE_OPEN_FAILURE;
>>     }
>>
>> which kind of looks like it might be trying to open my pbs file instead
>> of the file I gave on the command line? I really don't know, but does
>> anyone have any ideas here?
>>
>> Thx....John Cary
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
John R Cary
c...@txcorp.com
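
The quoted snippet builds a path from a node file directory and the PBS job id and then fails in fopen, so one related sanity check is whether the job's own node file is readable from inside the job script before any mpiexec call. Below is a minimal sketch of such a check. It assumes a TORQUE/PBS job environment that sets PBS_NODEFILE; check_nodefile is a hypothetical helper name, not part of Open MPI or TORQUE, and it only tests the file named by that variable, which may or may not be the exact path Open MPI constructs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI or TORQUE): report whether a
# node file can be read. Returns 0 if readable, 1 otherwise.
check_nodefile() {
    file=$1
    if [ -z "$file" ]; then
        echo "no node file path given"
        return 1
    fi
    if [ -r "$file" ]; then
        # Count the lines (one per allocated slot in a PBS node file).
        echo "readable: $file ($(wc -l < "$file" | tr -d ' ') lines)"
        return 0
    fi
    echo "not readable: $file"
    return 1
}

# Inside a TORQUE/PBS job, PBS_NODEFILE names the job's node file.
# Outside a job it is normally unset, so an empty string is passed.
check_nodefile "${PBS_NODEFILE:-}" || true
```

Calling the same function at the top of script1.sh and script2.sh would show whether the file is visible in each sub-script's environment, which is where the mpiexec calls are made.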