On Mar 14, 2010, at 3:20 PM, Josh Bernstein wrote:

> Hi John,
> 
> Mpiexec isn't needed with OMPI; in fact, if you are using the one from OSC, it 
> only works with MPICH.


Hi Josh,

I guess I don't understand.  I think we do link against Torque, but what I 
am trying to do is make multiple MPI runs.  So I qsub a script that might
contain

script1.sh

script2.sh

...

Inside script1.sh is some logic culminating in

  mpiexec <app> -i appinputfile1

script2.sh similarly invokes

  mpiexec <app> -i appinputfile2

but then those fail as shown below.
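
For concreteness, here is a minimal sketch of how the submitted script could run the per-case scripts one after another and report which ones fail. The function name run_all is hypothetical and only illustrates the layout described above; it is not the actual job script.

```shell
#!/bin/sh
# Hypothetical driver for the submitted job: run each per-case script in
# turn inside the same allocation and report any that fail.
run_all() {
    failed=0
    for script in "$@"; do
        echo "starting $script"
        if sh "$script"; then
            echo "finished $script"
        else
            # $? still holds the exit status of the script that just ran
            echo "FAILED $script (exit status $?)"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every script succeeded
    [ "$failed" -eq 0 ]
}
```

In the job script this would be called as `run_all ./script1.sh ./script2.sh`, so a failure in the first mpiexec run does not hide what happens in the second.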

So I am not sure what is going on.
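
Since the error is a file open failure, one check that could be run from inside script1.sh is whether the PBS node file is visible at that point. The sketch below assumes the job sees Torque's usual PBS_NODEFILE environment variable; check_nodefile is a hypothetical helper, and this is a related sanity check, not the same path lookup that the Open MPI snippet quoted below performs.

```shell
#!/bin/sh
# Hypothetical helper: report whether a PBS node file is readable.
# With no argument it falls back to the PBS_NODEFILE variable set by Torque.
check_nodefile() {
    nodefile=${1:-${PBS_NODEFILE:-}}
    if [ -z "$nodefile" ]; then
        echo "PBS_NODEFILE is not set in this shell"
        return 2
    fi
    if [ ! -r "$nodefile" ]; then
        echo "cannot read node file: $nodefile"
        return 1
    fi
    # grep -c . counts the non-empty lines, one per allocated slot
    echo "node file $nodefile lists $(grep -c . "$nodefile") entries"
}
```

Calling `check_nodefile` just before the mpiexec line would show whether the nested script still has the PBS environment it needs.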

Thx....John

> Instead just build OMPI with --with-tm, and it will link against TORQUE and 
> start up and track jobs properly. 
> 
> -Joshua Bernstein
> Penguin Computing
> 
> On Mar 14, 2010, at 21:35, "John R. Cary" <c...@txcorp.com> wrote:
> 
>> I have a script that launches a bunch of runs on some compute nodes of
>> a cluster.  Once I get through the queue, I query PBS for my machine
>> file, then I copy that to a local file 'nodes' which I use for mpiexec:
>> 
>> mpiexec -machinefile /home/research/cary/projects/vpall/vptests/nodes -np 6 \
>>   /home/research/cary/projects/vpall/builds/vorpal/par/vorpal/vorpal \
>>   -i bathtubAntenna.in -dim 2 -o bathtubAntenna2p -n 100 -d 100
>> 
>> but this fails with
>> 
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 153
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 87
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../orte/mca/ras/base/ras_base_allocate.c at line 133
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../orte/mca/plm/base/plm_base_launch_support.c at line 72
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/plm/tm/plm_tm_module.c at line 167
>> --------------------------------------------------------------------------
>> A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
>> launch so we are aborting.
>> 
>> The relevant code snippet is
>> 
>>     /* setup the full path to the PBS file */
>>     filename = opal_os_path(false, mca_ras_tm_component.nodefile_dir,
>>                             pbs_jobid, NULL);
>>     fp = fopen(filename, "r");
>>     if (NULL == fp) {
>>         ORTE_ERROR_LOG(ORTE_ERR_FILE_OPEN_FAILURE);
>>         free(filename);
>>         return ORTE_ERR_FILE_OPEN_FAILURE;
>>     }
>> 
>> which looks like it might be trying to open my PBS file instead of the
>> file I gave on the command line.  I really don't know; does anyone have
>> any ideas here?
>> 
>> Thx....John Cary
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
John R Cary
c...@txcorp.com