[OMPI users] Purify found bugs inside open-mpi library

2009-04-28 Thread Brian Blank
To Whom This May Concern: I am having problems with an OpenMPI application I am writing on the Solaris/Intel AMD64 platform. I am using OpenMPI 1.3.2 which I compiled myself using the Solaris C/C++ compiler. My application was crashing (signal 11) inside a call to malloc() which my code was runn

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Ralph Castain
Best I can tell, the remote orted never got executed - it looks to me like there is something that blocks the ssh from working. Can you get into another window and ssh to the remote node? If so, can you do a ps and verify that the orted is actually running there? mpirun is using the same sh

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
As far as I can tell, both the PATH and LD_LIBRARY_PATH are set correctly. I've tried with the full path to the mpirun executable and using the --prefix command line option. Neither works. The debug output seems to contain a lot of system specific information (IPs, usernames and such), which I'm a

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Ralph Castain
Okay, that's one small step forward. You can lock that in by setting the appropriate MCA parameter in one of two ways: 1. add the following to your default mca parameter file: btl = tcp,sm,self (I added the shared memory subsystem as this will help with performance). You can see how to do

Re: [OMPI users] Problem with running openMPI program

2009-04-28 Thread Gus Correa
Hi Ankush Glad to hear that your MPI and cluster project were successful. I don't know if you would call these "mathematical computation" or "real life applications" of MPI and clusters, but here are a few samples I am familiar with (Earth Science): Weather forecast: http://www.wrf-model.org/in

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
Hi, Yes I'm using ethernet connections. Doing as you suggest removes the errors generated by running the small test program, but still doesn't allow programs (including the small test program) to execute on any node other than the one launching mpirun. If I try to do that, the command han

Re: [OMPI users] Problem with running openMPI program

2009-04-28 Thread Jeff Squyres
On Apr 28, 2009, at 1:29 PM, Ankush Kaul wrote: I would like to know one more thing, what are real-life applications that I can use the cluster for (except mathematical computation)? Can I use it for my web server, and if yes, then how? Not really. MPI is just about message passing -- it's fre

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Ralph Castain
In this instance, OMPI is complaining that you are attempting to use Infiniband, but no suitable devices are found. I assume you have Ethernet between your nodes? Can you run this with the following added to your mpirun cmd line: -mca btl tcp,self That will cause OMPI to ignore the Infiniband su

Re: [OMPI users] sharing memory between processes

2009-04-28 Thread Shaun Jackman
For what it's worth, the genome assembly software ABySS uses exactly this system that Jody is describing to represent a directed graph. http://www.bcgsc.ca/platform/bioinfo/software/abyss Cheers, Shaun jody wrote: Hi Barnabas As far as i know, Open-MPI is not a shared memory system. Using Op

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
Many thanks for your help nonetheless. Hugh On 28 Apr 2009, at 17:23, jody wrote: Hi Hugh I'm sorry, but i must admit that i have never encountered these messages, and i don't know what their cause exactly is. Perhaps one of the developers can give an explanation? Jody On Tue, Apr 28, 2

Re: [OMPI users] sharing memory between processes

2009-04-28 Thread Barnabas Debreczeni
Hi Eugene and Jody, thanks for the ideas and elaborate answers. I will look into SysV and mmap, and find out something. I am not tied to PGAPack, there may be other PGA libs too... But I guess MPI and SysV/mmap do not cancel each other out, I just have to know about what is running locally and wha
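As a rough illustration of the mmap idea mentioned above, here is a minimal sketch of two local processes sharing one table through a shared mapping. It is not taken from the thread: the helper name `sum_via_shared_mapping` is hypothetical, it assumes a Linux-like system where `MAP_ANONYMOUS` is available, and it uses `fork()` only to keep the example self-contained. MPI ranks are not forked from one another, so they would more likely map a named object or a file on the same node.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (illustration only): copies `count` ints into a shared
// anonymous mapping, forks a child that sums them, and hands back the sum the
// child wrote into the same mapping. Returns false if mmap, fork, or waitpid fails.
bool sum_via_shared_mapping(const int* values, std::size_t count, long* out) {
    assert(values != nullptr && out != nullptr && count > 0);

    // Layout of the mapping: one long for the result, then the table itself.
    const std::size_t bytes = sizeof(long) + count * sizeof(int);
    void* mem = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;

    long* result = static_cast<long*>(mem);
    int* table = reinterpret_cast<int*>(result + 1);
    *result = 0;
    std::memcpy(table, values, count * sizeof(int));

    pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, bytes);
        return false;
    }
    if (pid == 0) {
        // Child: reads the table without copying it and writes the result back.
        long sum = 0;
        for (std::size_t i = 0; i < count; ++i) sum += table[i];
        *result = sum;
        _exit(0);
    }

    // Parent: waits for the child, then reads what the child stored.
    int status = 0;
    const bool ok = waitpid(pid, &status, 0) == pid &&
                    WIFEXITED(status) && WEXITSTATUS(status) == 0;
    if (ok) *out = *result;
    munmap(mem, bytes);
    return ok;
}
```

Because the mapping is created with `MAP_SHARED`, the child sees the parent's table and the parent sees the child's result without any message passing. This only works between processes on the same machine, so data needed on other nodes would still have to travel over MPI.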

Re: [OMPI users] Problem with running openMPI program

2009-04-28 Thread Ankush Kaul
Thanks everyone (esp. Gus and Jeff) for the support and guidance. We are almost on the verge of completing our project, which would not have been possible without all you guys. I would like to know one more thing, what are real-life applications that I can use the cluster for (except mathematical compu

Re: [OMPI users] sharing memory between processes

2009-04-28 Thread Eugene Loh
Barnabas Debreczeni wrote: I am using PGAPack as a GA library, and it uses MPI to parallelize optimization runs. This is how I got to Open MPI. Let me see if I understand the underlying premise. You want to parallelize, but there are some large shared tables. There are many different para

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread jody
Hi Hugh I'm sorry, but i must admit that i have never encountered these messages, and i don't know what their cause exactly is. Perhaps one of the developers can give an explanation? Jody On Tue, Apr 28, 2009 at 5:52 PM, Hugh Dickinson wrote: > Hi again, > > I tried a simple mpi c++ program: >

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
Hi again, I tried a simple mpi c++ program: -- #include <mpi.h> #include <iostream> using namespace MPI; using namespace std; int main(int argc, char* argv[]) { int rank,size; Init(argc,argv); rank=COMM_WORLD.Get_rank(); size=COMM_WORLD.Get_size(); cout << "P:" << rank << " out of " << size << endl;

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
Hi Jody, I can passwordlessly ssh between all nodes (to and from). Almost none of these mpirun commands work. The only working case is if nodenameX is the node from which you are running the command. I don't know if this gives you extra diagnostic information, but if I explicitly set the wron

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread jody
Hi Hugh, You're right, there is no initialization command (like lamboot) you have to call. I don't really know why your setup doesn't work, so I'm making some more "blind shots". Can you do passwordless ssh between any two of your nodes? Does mpirun -np 1 --host nodenameX uptime work for e

Re: [OMPI users] sharing memory between processes

2009-04-28 Thread jody
Hi Barnabas, As far as I know, Open-MPI is not a shared memory system. Using Open-MPI to attack your problem on N processors, I would proceed as follows: - processor 0 reads the table and then splits it into N parts (Table_0,...,Table_N-1) - processor 0 sends Table_i to processor i (for all i > 0) usi
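The splitting step described above can be sketched as a small index calculation. This is only an illustration under assumptions: the function name `chunk_bounds` is hypothetical, the table is treated as a flat array of rows, and no MPI calls are made. In a real program these ranges would typically feed the counts and displacements of a call such as MPI_Scatterv.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (illustration only): returns the half-open row range
// [first, second) that part `rank` would own when `rows` rows are split into
// `parts` nearly equal pieces. When the division is uneven, the first
// (rows % parts) pieces each get one extra row.
std::pair<std::size_t, std::size_t> chunk_bounds(std::size_t rows,
                                                 std::size_t parts,
                                                 std::size_t rank) {
    assert(parts > 0 && rank < parts);
    const std::size_t base = rows / parts;   // rows every piece gets
    const std::size_t extra = rows % parts;  // leftover rows to hand out
    const std::size_t begin = rank * base + std::min(rank, extra);
    const std::size_t length = base + (rank < extra ? 1 : 0);
    return {begin, begin + length};
}
```

For example, 10 rows over 3 processes gives the ranges [0, 4), [4, 7) and [7, 10), so no piece differs from another by more than one row.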

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
Hi Jody, The node names are exactly the same. I wanted to avoid updating the version because I'm not the system administrator, and it could take some time before it gets done. If it's likely to fix the problem though I'll try it. I'm assuming that I don't have to do something analogous to

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread jody
Hi Hugh Again, just to make sure, are the hostnames in your host file well-known? I.e. when you say you can do ssh nodename uptime do you use exactly the same nodename in your host file? (I'm trying to eliminate all non-Open-MPI error sources, because with your setup it should basically work.)

[OMPI users] sharing memory between processes

2009-04-28 Thread Barnabas Debreczeni
Hi! I am new to this list and to parallel programming in general. I am writing a trading simulator for the forex market and I am using genetic algorithms to breed trading parameters. I am using PGAPack as a GA library, and it uses MPI to parallelize optimization runs. This is how I got to Open MP

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
Hi Jody, Indeed, all the nodes are running the same version of Open MPI. Perhaps I was incorrect to describe the cluster as heterogeneous. In fact, all the nodes run the same operating system (Scientific Linux 5.2), it's only the hardware that's different and even then they're all i386 or i686. I'm

Re: [OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread jody
Hi Hugh Just to make sure: You have installed Open-MPI on all your nodes? Same version everywhere? Jody On Tue, Apr 28, 2009 at 12:57 PM, Hugh Dickinson wrote: > Hi all, > > First of all let me make it perfectly clear that I'm a complete beginner as > far as MPI is concerned, so this may well

Re: [OMPI users] users Digest, Vol 1212, Issue 3, Message: 2

2009-04-28 Thread Jeff Squyres
On Apr 27, 2009, at 10:22 PM, jan wrote: Thank you, Jeff Squyres. I have checked out the web page http://www.open-mpi.org/community/lists/announce/2009/03/0029.php, then the page https://svn.open-mpi.org/trac/ompi/ticket/1853, but the web page svn.open-mpi.org seems to have crashed. Try that ticket

Re: [OMPI users] Checkpointing hangs with OpenMPI-1.3.1

2009-04-28 Thread Josh Hursey
On Apr 28, 2009, at 7:27 AM, nee...@crlindia.com wrote: Hi Josh, Thanks for your reply. Actually the reason for hang was missing blcr library in LD_LIBRARY_PATH. After setting it right, checkpoint was working but as you mentioned before, datatype error is coming along with,

Re: [OMPI users] Upgrade from Open MPI 1.2 to 1.3

2009-04-28 Thread Sergiy Khan
due to other reasons) -- he has two separate installs of OMPI: /opt/ompi-1.2 /opt/ompi-1.3 Jeff, that is correct. He builds his app with /opt/ompi-1.2/bin/mpicc. But then he sets his LD_LIBRARY_PATH to /opt/ompi-1.3/lib/ and runs his app with /opt/ompi-1.3/bin/mpirun. This means his app will

Re: [OMPI users] Upgrade from Open MPI 1.2 to 1.3

2009-04-28 Thread Jeff Squyres
On Apr 28, 2009, at 7:50 AM, Ralph Castain wrote: I'd be fascinated to understand how this works. There are multiple function calls in MPI_Init, for example, that simply don't exist in 1.3.x. There are references to fields in structures that are no longer present, though the structure itself doe

Re: [OMPI users] Upgrade from Open MPI 1.2 to 1.3

2009-04-28 Thread Ralph Castain
I'd be fascinated to understand how this works. There are multiple function calls in MPI_Init, for example, that simply don't exist in 1.3.x. There are references to fields in structures that are no longer present, though the structure itself does still exist. Etc. I frankly am stunned that

Re: [OMPI users] Upgrade from Open MPI 1.2 to 1.3

2009-04-28 Thread Serge
Ralph, Brian, and Jeff, Thank you for your answers. I want to confirm Brian's words that I am "compiling the application against one version of Open MPI, linking dynamically, then running against another version of Open MPI". The fact that the ABI has stabilized with the release of version 1

Re: [OMPI users] Checkpointing hangs with OpenMPI-1.3.1

2009-04-28 Thread neeraj
Hi Josh, Thanks for your reply. Actually the reason for hang was missing blcr library in LD_LIBRARY_PATH. After setting it right, checkpoint was working but as you mentioned before, datatype error is coming along with, and hence restart is not working. a) The errors comi

[OMPI users] ORTE_ERROR_LOG: Timeout in file

2009-04-28 Thread Hugh Dickinson
Hi all, First of all let me make it perfectly clear that I'm a complete beginner as far as MPI is concerned, so this may well be a trivial problem! I've tried to set up Open MPI to use SSH to communicate between nodes on a heterogeneous cluster. I've set up passwordless SSH and it seems

Re: [OMPI users] OpenMPI MPI_Bcast Algorithms

2009-04-28 Thread neeraj
Hi Axida, There are two ways you can call MPI collectives: 1) Automatic decision by OpenMPI, which in turn calls the tuned collectives. 2) Forced decision, where you can override OpenMPI to call certain algorithms available for a collective, say MPI_Bcast. The logic for 1

[OMPI users] OpenMPI MPI_Bcast Algorithms

2009-04-28 Thread shan axida
Hi all, I think there are several algorithms used in MPI_Bcast. I am wondering how it is decided which one gets executed. I mean, how is the algorithm chosen? Does it depend on the message size or something? Could someone help me? Thank you!