Not a direct answer to your question, but have you tried using Eclipse with the 
Parallel Tools Platform (PTP) installed?

http://eclipse.org/ptp/

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of devendra rai
Sent: Monday, October 24, 2011 2:50 PM
To: us...@open-mpi.org
Subject: [OMPI users] Visual debugging on the cluster

Hello Community,


I have been struggling with visual debugging on cluster machines. So far, I 
have tried to work around the problem or avoid it entirely, but no more.


I have three machines on the cluster: a.s1.s2, b.s1.s2 and c.s1.s2. I do not 
have admin privileges on any of these machines.


Now, I want to run a visual debugger on all of these machines, and have the 
windows come up.



So far, from the FAQ (http://www.open-mpi.org/faq/?category=running):


13. Can I run GUI applications with Open MPI?
Yes, but it will depend on your local setup and may require additional setup.
In short: you will need to have X forwarding enabled from the remote processes 
to the display where you want output to appear. In a secure environment, you 
can simply allow all X requests to be shown on the target display and set the 
DISPLAY environment variable in all MPI process' environments to the target 
display, perhaps something like this:

shell$ hostname

my_desktop.secure-cluster.example.com

shell$ xhost +

shell$ mpirun -np 4 -x DISPLAY=my_desktop.secure-cluster.example.com a.out

However, this technique is not generally suitable for unsecure environments 
(because it allows anyone to read and write to your display). A slightly more 
secure way is to only allow X connections from the nodes where your application 
will be running:

shell$ hostname

my_desktop.secure-cluster.example.com

shell$ xhost +compute1 +compute2 +compute3 +compute4

compute1 being added to access control list

compute2 being added to access control list

compute3 being added to access control list

compute4 being added to access control list

shell$ mpirun -np 4 -x DISPLAY=my_desktop.secure-cluster.example.com a.out

(assuming that the four nodes you are running on are compute1 through compute4).
Other methods are available, but they involve sophisticated X forwarding 
through mpirun and are generally more complicated than desirable.

This still gives me an "Error: Can't open display:" problem.

My mpirun shell script contains:

mpirun-1.4.3 -hostfile hostfile -np 3 -v -nooversubscribe --rankfile 
rankfile.txt --report-bindings  -timestamp-output ./testdisplay-window.sh


where the rankfile and the hostfile contain a.s1.s2, b.s1.s2 and c.s1.s2, and 
are correct.

The file ./testdisplay-window.sh:

#!/bin/bash
echo "Running xeyes on `hostname`"
DISPLAY=a.s1.s2:11.0
xeyes
exit 0
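
One thing that may be worth checking in the script above (this is an assumption, not something confirmed in this thread): a plain DISPLAY=... assignment only reaches xeyes if DISPLAY was already in the environment of the shell. If mpirun starts the script without DISPLAY set, the assignment stays local to the script unless it is exported. The sketch below illustrates the difference; show_child_display, without_export and with_export are hypothetical helper names, and the display name is simply the one from the script above.

```shell
#!/bin/bash
# Hypothetical helper: print the DISPLAY value that a child process would see.
show_child_display() {
    # A child process only inherits variables that are exported.
    sh -c 'echo "${DISPLAY:-<unset>}"'
}

# Case 1: DISPLAY is not in the environment and is assigned without export.
without_export() {
    (
        unset DISPLAY
        DISPLAY=a.s1.s2:11.0
        show_child_display
    )
}

# Case 2: the same assignment, but exported.
with_export() {
    (
        unset DISPLAY
        export DISPLAY=a.s1.s2:11.0
        show_child_display
    )
}
```

If this turns out to be the cause, using "export DISPLAY=..." in testdisplay-window.sh, or passing the variable with mpirun's -x option as in the FAQ text, would be the thing to try. Whether display :11 on a.s1.s2 accepts connections from the other nodes is a separate question that depends on the local X and ssh setup.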

I see that my xauth list output already contains entries like:

a.s1.s2/unix:12  MIT-MAGIC-COOKIE-1  aa16a9573f42224d760c7bb618b48a6f
a.s1.s2/unix:10  MIT-MAGIC-COOKIE-1  0fb6fe3c2e35676136c8642412fb5809
a.s1.s2/unix:11  MIT-MAGIC-COOKIE-1  a3a65970b5f545bc750e3520a4e3b872


I seem to have run out of ideas now.

However, this works perfectly on any of the machines a.s1.s2, b.s1.s2 or 
c.s1.s2:

(for example, running from a.s1.s2):

ssh b.s1.s2 xeyes

Can someone help?


Best

Devendra Rai




________________________________
From: Jeff Squyres <jsquy...@cisco.com>
To: devendra rai <rai.deven...@yahoo.co.uk>; Open MPI Users <us...@open-mpi.org>
Sent: Friday, 21 October 2011, 13:14
Subject: Re: [OMPI users] orte_grpcomm_modex failed

This usually means that you have an Open MPI version mismatch between some of 
your nodes.  Meaning: on some nodes, you're finding version X.Y.Z of Open MPI 
by default, but on other nodes, you're finding version A.B.C.
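
A quick way to check for such a mismatch might be to ask each node what version its default mpirun reports. The sketch below is only an illustration: check_ompi_versions and the RUN override are hypothetical names, it assumes passwordless ssh to each host, and a non-interactive ssh shell may have a different PATH than a login shell (which is part of what the check is meant to reveal).

```shell
#!/bin/bash
# Hypothetical helper: report the version of the default mpirun on each host.
# RUN is the command used to reach a host (ssh by default). It is an
# assumption of this sketch and can be overridden for testing.
RUN=${RUN:-ssh}

check_ompi_versions() {
    first=""
    seen=0
    status=0
    for host in "$@"; do
        # Keep only the first line of the version banner printed by mpirun.
        version=$($RUN "$host" 'mpirun --version 2>&1 | head -n 1')
        echo "$host: $version"
        if [ "$seen" -eq 0 ]; then
            first=$version
            seen=1
        elif [ "$version" != "$first" ]; then
            status=1
        fi
    done
    if [ "$status" -ne 0 ]; then
        echo "MISMATCH: nodes report different Open MPI versions"
    fi
    return "$status"
}

# Example (host names are placeholders):
#   check_ompi_versions a.s1.s2 b.s1.s2 c.s1.s2
```

The function returns non-zero when any host reports a version string different from the first host's, so it can be used as a guard before calling mpirun.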


On Oct 21, 2011, at 7:00 AM, devendra rai wrote:

> Hello Community,
>
> I have been struggling with this error for quite some time:
>
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>  orte_grpcomm_modex failed
>  --> Returned "Data unpack would read past end of buffer" (-26) instead of 
> "Success" (0)
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 18945 on
> node tik35x.ethz.ch exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
>
> I am running this on a cluster and this has started happening only after a 
> recent rebuild of openmpi-1.4.3. Interestingly, I have the same version of 
> openmpi on my PC, and the same application works fine.
>
> I have looked into this error on the web, but there is very little 
> discussion of the causes or of how to correct it. I asked the admin to attempt 
> a re-install of openmpi, but I am not sure whether this will solve the 
> problem.
>
> Can someone please help?
>
> Thanks a lot.
>
> Best,
>
> Devendra Rai
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

