Hi,

> On 29.09.2016 at 14:41, <aditi...@wipro.com> wrote:
>
> Hi,
>
> I am trying to run a Job on parallel nodes using openmpi1.4.5

Open MPI 1.6.5 would be a better choice. Nevertheless, the questions are:

- Was Open MPI compiled with SGE integration, i.e. do you get something like:

  $ ompi_info | grep grid
       MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.5)

- Did you request a PE in the submission, and how is this PE set up?

- What does the `mpiexec` line in your jobscript look like?

- Can all nodes talk to each other directly?

> and ge2011.11, job goes in Running state and then gets aborted.
> After Job gets aborted, I get following error message on the primary node:
>
> error: executing task of job 28561 failed: failed sending task to
> ex...@punehpcdl01.wiprohpc.com: can't find connection
>
> Or
>
> error: executing task of job 28560 failed: failed sending task to
> ex...@punehpcdl01.wiprohpc.com: can't find connection
> --------------------------------------------------------------------------
> A daemon (pid 20651) died unexpectedly with status 1 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the

- Regarding this error, you may have to set LD_LIBRARY_PATH explicitly to the path of the dynamic libraries and export it to the nodes in your jobscript:

  export LD_LIBRARY_PATH=<your_location_of_the_shared_libs>
  mpiexec -x LD_LIBRARY_PATH

BTW: Is Open MPI also available on all nodes?

-- Reuti

> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
> Is this mpi issue ? Please suggest how do I resolve this connection issue
> between nodes.
>
> Thanks & Regards,
> Aditi
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
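
The checks discussed in the reply (is the gridengine component present in the `ompi_info` output, and is LD_LIBRARY_PATH set before `mpiexec -x LD_LIBRARY_PATH` is called) could be scripted roughly as below. This is only a sketch under assumptions: the helper names `has_sge_support` and `prepend_ld_path` are made up for illustration and are not part of Open MPI or Grid Engine, and the script only reports what it finds on the node where it runs.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers for an SGE + Open MPI job script.
# Neither function is part of Open MPI or Grid Engine; they are illustration only.

# Return 0 if the given `ompi_info` output mentions the gridengine component.
has_sge_support() {
    printf '%s\n' "$1" | grep -q 'gridengine'
}

# Prepend a directory to LD_LIBRARY_PATH (no stray colon if it was empty)
# and export it so that `mpiexec -x LD_LIBRARY_PATH` can forward it.
prepend_ld_path() {
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}

# Report whether this node's Open MPI was built with SGE integration.
if command -v ompi_info >/dev/null 2>&1; then
    if has_sge_support "$(ompi_info)"; then
        echo "Open MPI reports gridengine support"
    else
        echo "no gridengine component found in ompi_info output"
    fi
else
    echo "ompi_info not found in PATH on this node"
fi
```

In a jobscript one would call `prepend_ld_path <your_location_of_the_shared_libs>` before the `mpiexec -x LD_LIBRARY_PATH` line. Running the same script on each execution host is one way to see whether Open MPI is installed everywhere.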