[OMPI users] btl_tcp_endpoint errors

2007-04-02 Thread Heywood, Todd
I'm testing a couple of applications with OpenMPI v1.2b, using over 1000 processors, and am getting TCP errors. These apps ran fine with fewer processors. The errors can differ from run to run. Here's one: [blade90][0,1,223][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:

[OMPI users] problem with MPI_Bcast over ethernet

2007-04-02 Thread Jeff Stuart
For some reason, I am getting intermittent process crashes in MPI_Bcast. I run my program, which distributes some data via lots (thousands or more) of 64k MPI_Bcast calls. The program that is crashing is fairly big, and it would take some time to whittle it down to a small example program. I *am* willi

Re: [OMPI users] experiences with gfortran and ompi

2007-04-02 Thread de Almeida, Valmor F.
> -Original Message- > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On > Behalf Of Jeff Squyres > > Bummer. FWIW, we have some internal testing OMPI Fortran codes that > are difficult to get to compile uniformly across all Fortran > compilers (the most difficult po

Re: [OMPI users] mca_btl_mx_init: mx_open_endpoint() failed withstatus=20

2007-04-02 Thread Tim Prins
Yes, only the first segfault is fixed in the nightly builds. You can run mx_endpoint_info to see how many endpoints are available and whether any are in use. As for the segfault you are seeing now, I am unsure what is causing it. Hopefully someone who knows more about that area of the code

Re: [OMPI users] Open MPI error when using MPI_Comm_spawn

2007-04-02 Thread Prakash Velayutham
Thanks Ralph. I will wait for your Torque dynamic host addition solution. Prakash >>> r...@lanl.gov 04/02/07 1:00 PM >>> Hi Prakash This is telling you that you have an error in the comm_spawn command itself. I am no expert there, so I'll have to let someone else identify it for you. There are

Re: [OMPI users] Open MPI error when using MPI_Comm_spawn

2007-04-02 Thread Ralph Castain
Hi Prakash This is telling you that you have an error in the comm_spawn command itself. I am no expert there, so I'll have to let someone else identify it for you. There are no limits to launching on nodes in a hostfile - they are all automatically considered "allocated" when the file is read. If

Re: [OMPI users] Open MPI error when using MPI_Comm_spawn

2007-04-02 Thread Prakash Velayutham
Hello, Thanks for the patch. I still do not know the internals of Open MPI, so can't test this right away. But here is another test I ran and that failed too. I have now removed Torque from the equation. I am NOT requesting nodes through Torque. I SSH to a compute node and start up the code as

Re: [OMPI users] Open MPI error when using MPI_Comm_spawn

2007-04-02 Thread Ralph Castain
No offense, but I would definitely advise against that path. There are other, much simpler solutions to dynamically add hosts. We *do* allow dynamic allocation changes - you just have to know how to do them. Nobody asked before... ;-) Future variations will include an even simpler, single API sol

Re: [OMPI users] Open MPI error when using MPI_Comm_spawn

2007-04-02 Thread Jeremy Buisson
Ralph Castain wrote: > The runtime underneath Open MPI (called OpenRTE) will not allow you to spawn > processes on nodes outside of your allocation. This is for several reasons, > but primarily because (a) we only know about the nodes that were allocated, > so we have no idea how to spawn a proc

Re: [OMPI users] Open MPI error when using MPI_Comm_spawn

2007-04-02 Thread Prakash Velayutham
Thanks for the info, Ralph. It is as I thought, but I was hoping it wouldn't be that way. I am requesting more nodes from the resource manager from inside my application code using the RM's API. When I know they are available (allocated by the RM), I am trying to split the application data across the

Re: [OMPI users] Open MPI error when using MPI_Comm_spawn

2007-04-02 Thread Ralph Castain
The runtime underneath Open MPI (called OpenRTE) will not allow you to spawn processes on nodes outside of your allocation. This is for several reasons, but primarily because (a) we only know about the nodes that were allocated, so we have no idea how to spawn a process anywhere else, and (b) most

[OMPI users] Open MPI error when using MPI_Comm_spawn

2007-04-02 Thread Prakash Velayutham
Hello, I have built Open MPI (1.2) with run-time environment enabled for the Torque (2.1.6) resource manager. Initially I am requesting 4 nodes (1 CPU each) from Torque. Then, from inside my MPI code, I am trying to spawn more processes on nodes outside the Torque-assigned nodes using MPI_Comm_spawn, b

[OMPI users] HotI 2007 Call for Papers -- Deadline (April 9) is approaching

2007-04-02 Thread Weikuan Yu
Apologies if you received multiple copies of this posting. Please feel free to distribute it to those who might be interested. Hot In

Re: [OMPI users] experiences with gfortran and ompi

2007-04-02 Thread Jeff Squyres
On Apr 1, 2007, at 12:50 PM, de Almeida, Valmor F. wrote: I would be interested in hearing folks' experiences with gfortran and ompi-1.2. Is gfortran good enough for prime time? It is my understanding that gfortran is the GNU path forward -- they are not spending any time on g77. I've been

Re: [OMPI users] mca_btl_mx_init: mx_open_endpoint() failed withstatus=20

2007-04-02 Thread de Almeida, Valmor F.
Hi Tim, I installed the openmpi-1.2.1a0r14178 tarball (and took this opportunity to use the Intel Fortran compiler instead of gfortran). With a simple test it seems to work, but note the same messages: ->mpirun -np 8 -machinefile mymachines a.out [x1:25417] mca_btl_mx_init: mx_open_endpoint() failed wit