Re: [OMPI users] Problem with MPI_Send and MPI_Recv

2008-10-01 Thread Sofia Aparicio Secanellas
I have tried with nc -lp and it was working perfectly: everything I write on computer2 I can see on computer1. Thank you very much. Sofía - Original Message - From: "Jeff Squyres" To: "Open MPI Users" Sent: Tuesday, September 30, 2008 9:11 PM Subject: Re: [OMPI users] Problem wi
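The raw TCP connectivity check mentioned above can be reproduced with netcat; a minimal sketch, assuming two machines named computer1 and computer2 and an arbitrary free port (both names and the port are placeholders):

```shell
# On computer1, listen on a TCP port:
nc -lp 5000
# On computer2, connect to computer1 and type; each line typed here
# should appear on computer1's terminal:
nc computer1 5000
```

If this works but MPI traffic does not, the problem is usually interface selection or a firewall rather than basic routing.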

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Roberto Fichera
Ralph Castain wrote: > Hi Roberto > > There is something wrong with this cmd line - perhaps it wasn't copied > correctly? > > mpirun --verbose --debug-daemons --mca obl -np 1 -wdir `pwd` > testmaster 1 $PBS_NODEFILE > > Specifically, the following is incomplete: --mca obl > > I'm not sure

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Ralph Castain
Okay, I believe I understand the problem. What this error is telling you is that the Torque MOM is refusing our connection request because it is already busy. So we cannot spawn another process. If I understand your application correctly, you are spinning off multiple threads, each attempti

Re: [OMPI users] Problem with MPI_Send and MPI_Recv

2008-10-01 Thread saparicio
The problem is the WiFi connection! We were connecting the computers using WiFi; we have changed to a cable connection and the program is working. It seems that the port that MPI_Send and MPI_Recv use is closed. Do you know which port these commands use? Thank you, Sofia > On Sep 30, 2008, at

Re: [OMPI users] Problem with MPI_Send and MPI_Recv

2008-10-01 Thread Sofia Aparicio Secanellas
The problem is with the WiFi connection! We were connecting the computers using WiFi; we have changed to a cable connection and the program is working. It seems that the ports that MPI_Send and MPI_Recv use are closed. Do you know which ports the MPI_Send and MPI_Recv commands use? Thank yo
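To the question of which ports are involved: Open MPI's TCP transport normally binds to ephemeral ports chosen by the OS, so there is no single fixed port to open. On releases that support them, the MCA parameters below can pin the port range so a firewall rule can be written; the values are illustrative, and parameter availability varies by Open MPI version:

```shell
# Restrict the TCP BTL to ports 10000-10099 (example values only):
mpirun --mca btl_tcp_port_min_v4 10000 \
       --mca btl_tcp_port_range_v4 100 \
       -np 2 ./my_mpi_app
```

With the range pinned, the firewall can allow just those ports between the cluster nodes instead of all ephemeral ports.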

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Roberto Fichera
Ralph Castain wrote: > Okay, I believe I understand the problem. What this error is telling > you is that the Torque MOM is refusing our connection request because > it is already busy. So we cannot spawn another process. > > If I understand your application correctly, you are spinning off > m

[OMPI users] Crashes over TCP/ethernet but not on shared memory

2008-10-01 Thread V. Ram
I wrote earlier about one of my users running a third-party Fortran code on 32-bit x86 machines, using OMPI 1.2.7, that is having some odd crash behavior. Our cluster's nodes all have 2 single-core processors. If this code is run on 2 processors on 1 node, it runs seemingly fine. However, if the

Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory

2008-10-01 Thread Aurélien Bouteiller
If you have several network cards in your system, it can sometimes get the endpoints confused, especially if the nodes don't have the same number of cards or don't use the same subnet for all of them ("eth0", "eth1"). You should try to restrict Open MPI to use only one of the available networks by using the -
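A sketch of the interface restriction being suggested here, using the TCP BTL's include/exclude MCA parameters (the interface names eth0 and eth1 are assumptions; substitute your cluster's actual interfaces):

```shell
# Use only eth0 for MPI traffic:
mpirun --mca btl_tcp_if_include eth0 -np 4 ./my_mpi_app

# Or, alternatively, exclude the loopback and a second NIC:
mpirun --mca btl_tcp_if_exclude lo,eth1 -np 4 ./my_mpi_app
```

Note that btl_tcp_if_include and btl_tcp_if_exclude are mutually exclusive; set one or the other, not both.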

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Roberto Fichera
Ralph Castain wrote: > 3. remove the threaded launch scenario and just call comm_spawn in a > loop. > Below you find how Open MPI behaves if I put the MPI_Comm_spawn() in a loop and drive the rest of the communication in a thread. Basically it freezes in the same place, as I see [roberto@master

Re: [OMPI users] Crashes over TCP/ethernet but not on shared memory

2008-10-01 Thread Leonardo Fialho
Ram, what is the name and version of the kernel module for your NIC? I have experienced something similar with my tg3 module. The error that appeared for me was different: [btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: No route to host (113) I solved it changi

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Ralph Castain
Afraid I am somewhat at a loss. The logs indicate that mpirun itself is having problems, likely caused by the threading. The only thing I can suggest is that you "unthread" the spawning loop and try it that way first so we can see if some underlying problem exists. FWIW: I have run a loop over

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Ralph Castain
Actually, it just occurred to me that you may be seeing a problem in comm_spawn itself that I am currently chasing down. It is in the 1.3 branch and has to do with comm_spawning procs on subsets of nodes (instead of across all nodes). Could be related to this - you might want to give me a c

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Roberto Fichera
Ralph Castain wrote: > Afraid I am somewhat at a loss. The logs indicate that mpirun itself > is having problems, likely caused by the threading. Only thing I can > suggest is that you "unthread" the spawning loop and try it that way > first so we can see if some underlying problem exists. > >

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Roberto Fichera
Ralph Castain wrote: > Actually, it just occurred to me that you may be seeing a problem in > comm_spawn itself that I am currently chasing down. It is in the 1.3 > branch and has to do with comm_spawning procs on subsets of nodes > (instead of across all nodes). Could be related to this - you

Re: [OMPI users] Problem with MPI_Send and MPI_Recv

2008-10-01 Thread Jeff Squyres
I can't think of any reason why Open MPI wouldn't work over a WiFi connection unless there's some weirdness in the wireless driver. On Oct 1, 2008, at 5:19 AM, Sofia Aparicio Secanellas wrote: The problem is with the WiFi connection! We were connecting the computers using WiFi, we have chang

Re: [OMPI users] 1.2.2 to 1.2.7 differences.

2008-10-01 Thread Jeff Squyres
Joe -- AFAIK, we didn't change anything with regard to OPAL_PREFIX in the 1.2 series. Here are my tests on a 1.2.7 installation: - [17:34] svbu-mpi:/home/jsquyres/openmpi-1.2.7 % head config.log This file contains any messages produced by compilers while running configure, to aid debugging
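As background for the OPAL_PREFIX discussion: the variable tells a relocated Open MPI installation where to find its own helper files at run time. A minimal sketch, assuming a hypothetical install path:

```shell
# Point a moved Open MPI installation at its new location
# (/opt/openmpi-1.2.7 is a placeholder path):
export OPAL_PREFIX=/opt/openmpi-1.2.7
export PATH="$OPAL_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$OPAL_PREFIX/lib:$LD_LIBRARY_PATH"
mpirun -np 2 ./hello
```

If OPAL_PREFIX is unset after a relocation, mpirun typically fails to find its support files even though the binary itself runs.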

Re: [OMPI users] 1.2.2 to 1.2.7 differences.

2008-10-01 Thread Joe Griffin
Hi Jeff, Thanks for the reply. I built 1.2.2 with configure/make/make install, but I built 1.2.7 with the SRPM. Perhaps I will try the tar file. Thanks, Joe From: users-boun...@open-mpi.org on behalf of Jeff Squyres Sent: Wed 10/1/2008 5:35 PM To: Open MPI

Re: [OMPI users] 1.2.2 to 1.2.7 differences.

2008-10-01 Thread Shafagh Jafer
On our cluster we have RedHat Linux 7.3 Professional, and the cluster specification says the following: -The cluster should be able to run the following software tools: gcc 2.96.x (or 2.95.x or 2.91.66) Bison 1.28 flex 2.5.4 mpich 1.2.5 So I am just wondering if my cluster is capable of running

[OMPI users] does openmpi run on RedHat Linux 7.3?

2008-10-01 Thread Shafagh Jafer
On our cluster we have RedHat Linux 7.3 Professional, and the cluster specification says the following: -The cluster should be able to run the following software tools: gcc 2.96.x (or 2.95.x or 2.91.66) Bison 1.28 flex 2.5.4 mpich 1.2.5 So I am just wondering if my cluster is capab