Re: [OMPI users] Mpirun failed for machines not in the same subnet.

2007-04-03 Thread Jeff Squyres
I have filed a ticket for this: https://svn.open-mpi.org/trac/ompi/ticket/972 On Apr 3, 2007, at 5:18 PM, Xie, Hugh wrote: I think that workaround you proposed would resolve this problem. -Original Message- From: users-boun...@open-mpi.org [mailto:users-bounces@open-mpi.org] O

Re: [OMPI users] Mpirun failed for machines not in the same subnet.

2007-04-03 Thread Xie, Hugh
I think that workaround you proposed would resolve this problem. -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres Sent: Tuesday, April 03, 2007 5:05 PM To: Open MPI Users Subject: Re: [OMPI users] Mpirun failed for machines

Re: [OMPI users] Mpirun failed for machines not in the same subnet.

2007-04-03 Thread Jeff Squyres
Do your different subnets violate the assumptions listed here? http://www.open-mpi.org/faq/?category=tcp#tcp-routability We have not implemented any workarounds to say "subnet X is routable to subnet Y" because no one had asked for them. Such workarounds are possible, of course, but I d

[OMPI users] Mpirun failed for machines not in the same subnet.

2007-04-03 Thread Xie, Hugh
Hi, I got the following error message while running: 'mpirun -v -np 2 -machinefile hosts.txt testc.x' Process 0.1.1 is unable to reach 0.1.0 for MPI communication. If you specified the use of a BTL component, you may have forgotten a component (such as "self") in the list of usable components. ---

Re: [OMPI users] btl_tcp_endpoint errors

2007-04-03 Thread Heywood, Todd
Hi Adrian, Thanks for that info. The OS is Linux. I was able to get rid of the "connection reset" (104) errors by increasing btl_tcp_endpoint_cache. That leaves the "no route to host" (113) problem. Interestingly, I sometimes (sometimes not) get the same error on daemon startup with ssh when expe

Re: [OMPI users] Open MPI error when using MPI_Comm_spawn

2007-04-03 Thread Jeff Squyres
On Apr 2, 2007, at 12:53 PM, Prakash Velayutham wrote: prakash@wins04:~/thesis/CS/Samples>mpirun -np 4 --bynode --hostfile machinefile ./parallel.laplace [wins01:17699] *** An error occurred in MPI_Comm_spawn [wins01:17699] *** on communicator MPI_COMM_WORLD [wins01:17699] *** MPI_ERR_ARG: in

Re: [OMPI users] [Re: Memory leak in openmpi-1.2?]

2007-04-03 Thread Mohamad Chaarawi
Yes, we saw the memory leak, and a fix is already in the trunk right now. Sorry I didn't reply earlier. The fix will be merged into v1.2 as soon as the release managers approve it. Thank you, On Tue, April 3, 2007 5:14 am, Bas van der Vlies wrote: > Mohamad Chaarawi wrote: >> Hello Mr. V

Re: [OMPI users] problems with profile.d scripts generated using openmpi.spec

2007-04-03 Thread Jeff Squyres
Thanks for the report! We've actually fixed and improved the specfile as part of the 1.2ofed release (see http://www.open-mpi.org/faq/?category=openfabrics#ofed-and-ompi-versions); those fixes should be available soon. In the meantime, here's the specfile that we're using for the 1.2 ofed

[OMPI users] problems with profile.d scripts generated using openmpi.spec

2007-04-03 Thread Marcin Dulak
Hi, I found that the /etc/profile.d/openmpi-1.2.sh and /etc/profile.d/openmpi-1.2.csh generated using openmpi.spec are incorrect. The contents of the generated scripts (see the details of the generation process at the very end of my email) are the following:

Re: [OMPI users] [Re: Memory leak in openmpi-1.2?]

2007-04-03 Thread Bas van der Vlies
Mohamad Chaarawi wrote: Hello Mr. Van der Vlies, We are currently looking into this problem and will send out an email as soon as we recognize something and fix it. Thank you, Mohamed, Just curious. Did you test this program and see the same behavior as at our site? Regards Subject:

Re: [OMPI users] btl_tcp_endpoint errors

2007-04-03 Thread Adrian Knoth
On Mon, Apr 02, 2007 at 07:15:41PM -0400, Heywood, Todd wrote: Hi, > [blade90][0,1,223][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113 errno is OS specific, so it's important to know which OS you're using. You can
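
As a rough illustration of the point that errno values are OS specific, the following Python sketch looks up the symbolic name and message for a numeric errno on whatever system it runs on. The helper name describe_errno is hypothetical and not part of Open MPI; errno.errorcode and os.strerror are Python standard-library calls. The values 104 and 113 are the ones reported in this thread, where the poster's OS is Linux; other systems may number these errors differently, so the mapping should be checked on the machine that produced the error.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic_name, message) for an errno value on the current OS.

    describe_errno is a hypothetical helper for illustration only.
    """
    # errno.errorcode maps numeric values to names such as "ECONNRESET".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's message text for the value.
    return name, os.strerror(code)


if __name__ == "__main__":
    # 104 and 113 are the errno values quoted in the thread (Linux).
    for code in (104, 113):
        name, message = describe_errno(code)
        print(f"errno={code}: {name} ({message})")
```

Running the script on the node that reported the failure shows how that particular OS interprets the number, which is the information asked for in the reply above.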