Hello Meredith,

Yes, I have already tried the plugin. The problem is that it seems to be stuck forever at the "Waiting for job information" stage. I scouted around a bit for a fix, but nothing looked straightforward; the solutions I found seemed like one-off workarounds.
And this is how I shifted to parallel visual debuggers, using other tools like kdbg. However, in case you have the PTP plugin working for you on Linux, it would help a lot if you could send screenshots/notes on how to set it up for multiple machines.

So, summing up, I am still clueless. Thanks for your time though.

Best

Devendra

________________________________
From: Meredith Creekmore <mtcreekm...@broncs.utpa.edu>
To: devendra rai <rai.deven...@yahoo.co.uk>; Open MPI Users <us...@open-mpi.org>
Sent: Monday, 24 October 2011, 22:31
Subject: RE: [OMPI users] Visual debugging on the cluster

Not a direct answer to your question, but have you tried using Eclipse with the Parallel Tools Platform (PTP) installed? http://eclipse.org/ptp/

From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of devendra rai
Sent: Monday, October 24, 2011 2:50 PM
To: us...@open-mpi.org
Subject: [OMPI users] Visual debugging on the cluster

Hello Community,

I have been struggling with visual debugging on cluster machines. So far, I have tried to work around the problem, or avoid it altogether, but no more. I have three machines on the cluster: a.s1.s2, b.s1.s2 and c.s1.s2. I do not have admin privileges on any of these machines. Now, I want to run a visual debugger on all of these machines, and have the windows come up.

So far, from the FAQ (http://www.open-mpi.org/faq/?category=running):

13. Can I run GUI applications with Open MPI?

Yes, but it will depend on your local setup and may require additional setup. In short: you will need to have X forwarding enabled from the remote processes to the display where you want output to appear. In a secure environment, you can simply allow all X requests to be shown on the target display and set the DISPLAY environment variable in all MPI processes' environments to the target display, perhaps something like this:

shell$ hostname
my_desktop.secure-cluster.example.com
shell$ xhost +
shell$ mpirun -np 4 -x DISPLAY=my_desktop.secure-cluster.example.com a.out

However, this technique is not generally suitable for unsecure environments (because it allows anyone to read and write to your display). A slightly more secure way is to only allow X connections from the nodes where your application will be running:

shell$ hostname
my_desktop.secure-cluster.example.com
shell$ xhost +compute1 +compute2 +compute3 +compute4
compute1 being added to access control list
compute2 being added to access control list
compute3 being added to access control list
compute4 being added to access control list
shell$ mpirun -np 4 -x DISPLAY=my_desktop.secure-cluster.example.com a.out

(assuming that the four nodes you are running on are compute1 through compute4). Other methods are available, but they involve sophisticated X forwarding through mpirun and are generally more complicated than desirable.

This still gives me the "Error: Can't open display:" problem. My mpirun shell script contains:

mpirun-1.4.3 -hostfile hostfile -np 3 -v -nooversubscribe --rankfile rankfile.txt --report-bindings -timestamp-output ./testdisplay-window.sh

where rankfile and hostfile contain a.s1.s2, b.s1.s2 and c.s1.s2, and are proper.
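For reference, this is how I read the FAQ's xhost/DISPLAY recipe when applied to my three hosts. This is only a sketch: it assumes that the X server I want the windows on sits on a.s1.s2 and serves display :0, which may not match my actual setup:

shell$ hostname
a.s1.s2
shell$ xhost +b.s1.s2 +c.s1.s2
b.s1.s2 being added to access control list
c.s1.s2 being added to access control list
shell$ mpirun-1.4.3 -hostfile hostfile -np 3 --rankfile rankfile.txt -x DISPLAY=a.s1.s2:0.0 ./testdisplay-window.sh

As I understand it, the -x DISPLAY=... export only helps if the wrapper script does not overwrite DISPLAY afterwards (my script below hardcodes DISPLAY=a.s1.s2:11.0).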
The file ./testdisplay-window.sh:

#!/bin/bash
echo "Running xeyes on `hostname`"
DISPLAY=a.s1.s2:11.0 xeyes
exit 0

I see that my xauth list output already contains entries like:

a.s1.s2/unix:12  MIT-MAGIC-COOKIE-1  aa16a9573f42224d760c7bb618b48a6f
a.s1.s2/unix:10  MIT-MAGIC-COOKIE-1  0fb6fe3c2e35676136c8642412fb5809
a.s1.s2/unix:11  MIT-MAGIC-COOKIE-1  a3a65970b5f545bc750e3520a4e3b872

I seem to have run out of ideas now. However, this works perfectly on any of the machines a.s1.s2, b.s1.s2 or c.s1.s2 (for example, running from a.s1.s2):

ssh b.s1.s2 xeyes

Can someone help?

Best

Devendra Rai

________________________________
From: Jeff Squyres <jsquy...@cisco.com>
To: devendra rai <rai.deven...@yahoo.co.uk>; Open MPI Users <us...@open-mpi.org>
Sent: Friday, 21 October 2011, 13:14
Subject: Re: [OMPI users] orte_grpcomm_modex failed

This usually means that you have an Open MPI version mismatch between some of your nodes. Meaning: on some nodes, you're finding version X.Y.Z of Open MPI by default, but on other nodes, you're finding version A.B.C.

On Oct 21, 2011, at 7:00 AM, devendra rai wrote:

> Hello Community,
>
> I have been struggling with this error for quite some time:
>
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   orte_grpcomm_modex failed
>   --> Returned "Data unpack would read past end of buffer" (-26) instead of
>       "Success" (0)
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 18945 on
> node tik35x.ethz.ch exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
>
> I am running this on a cluster and this has started happening only after a
> recent rebuild of openmpi-1.4.3. Interestingly, I have the same version of
> openmpi on my PC, and the same application works fine.
>
> I have looked into this error on the web, but there is very little
> discussion on the causes or how to correct it. I asked the admin to attempt
> a re-install of openmpi, but I am not sure whether this will solve the
> problem.
>
> Can someone please help?
>
> Thanks a lot.
>
> Best,
>
> Devendra Rai
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
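A quick way to check each node for the version mismatch Jeff describes is to query them directly. This is only a sketch: it assumes passwordless ssh to the nodes, that ompi_info is on each node's default PATH, and it reuses the host names from the later message as placeholders:

shell$ for h in a.s1.s2 b.s1.s2 c.s1.s2; do echo "== $h =="; ssh $h 'which mpirun; ompi_info | grep "Open MPI:"'; done

If the reported paths or "Open MPI:" version lines differ between nodes, that difference is the mismatch to fix (typically by adjusting PATH and LD_LIBRARY_PATH in the shell startup files on the offending nodes).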