Hi,

On 08.02.2013 at 19:36, Pierre LINDENBAUM wrote:
> ( cross-posted on SO: http://stackoverflow.com/questions/14775451 )
>
> I'm very new to OpenMpi and I'm trying to submit OMPI to SGE:
>
> I've installed openmpi, not in
> /usr/...
> but in
> /commun/data/packages/openmpi/
>
> it was compiled with --with-sge.
>
> I've added a new PE in SGE with qconf as described in
> http://docs.oracle.com/cd/E19080-01/n1.grid.eng6/817-5677/6ml49n2c0/index.html
>
> # /commun/data/packages/openmpi/bin/ompi_info | grep gridengine
> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.3)
>
> # qconf -sq all.q | grep pe_
> pe_list make orte
>
> Without SGE, the program runs without any problem, using several processors.
>
> /commun/data/packages/openmpi/bin/orterun -np 20 ./a.out args
>
> Now I want to submit my program to SGE
>
> In the Open MPI FAQ, I read:
>
> # Allocate a SGE interactive job with 4 slots
> # from a parallel environment (PE) named 'orte'
> shell$ qsh -pe orte 4
>
> but my output is:
>
> qsh -pe orte 4
> Your job 84550 ("INTERACTIVE") has been submitted
> waiting for interactive job to be scheduled ...
> Could not start interactive job.

An INTERACTIVE job is more like an immediate job, i.e. "-now y". Do you have an interactive queue configured, and is the cluster empty right now?

> I've also tried the mpirun command embedded in a script:
>
> $ cat ompi.sh
> #!/bin/sh
> /commun/data/packages/openmpi/bin/mpirun \
> /path/to/a.out args
>
> but it fails
>
> $ cat ompi.sh.e84552
> error: executing task of job 84552 failed: execution daemon on host
> "node02" didn't accept task

This is a good sign, as it already tries to use `qrsh -inherit ...`. Can you confirm the following settings:

$ qconf -sp orte
...
control_slaves TRUE

$ qconf -sq all.q
...
shell_start_mode unix_behavior

-- Reuti

> --------------------------------------------------------------------------
> A daemon (pid 18327) died unexpectedly with status 1 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> error: executing task of job 84552 failed: execution daemon on host
> "node01" didn't accept task
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
>
> How can I fix this?
>
> Many thanks
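
PS: The two settings asked about above (control_slaves in the PE, shell_start_mode in the queue) can also be checked with a small helper. This is only a sketch: `check_setting` is a hypothetical function name, not part of SGE, and it assumes that qconf prints each setting as one "key value" line.

```shell
#!/bin/sh
# check_setting KEY EXPECTED
# Hypothetical helper (not an SGE command): reads "key value" lines from
# stdin, as printed by `qconf -sp` / `qconf -sq`, and reports whether KEY
# has the value EXPECTED. Returns 1 if the value differs or KEY is missing.
check_setting() {
    key=$1
    expected=$2
    # Take the second field of the first line whose first field is KEY.
    actual=$(awk -v k="$key" '$1 == k { print $2; exit }')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $key is $expected"
    else
        echo "CHECK: $key is '${actual:-<not found>}', expected $expected"
        return 1
    fi
}

# Usage on a host where qconf is available (PE "orte" and queue "all.q"
# are the names used in this thread):
#   qconf -sp orte  | check_setting control_slaves TRUE
#   qconf -sq all.q | check_setting shell_start_mode unix_behavior
```

The helper reads from stdin rather than calling qconf itself, so it can be tried on saved output as well.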