Re: [OMPI users] mpirun should run with just the localhost interface on win?

2011-10-25 Thread MM
If the network interface is down, should the localhost interface still allow mpirun to run MPI processes?
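
For testing on a single host, Open MPI can be told to avoid external interfaces entirely (a sketch using the standard btl and btl_tcp_if_include MCA parameters; ./a.out is a placeholder, and "lo" is the Linux loopback name, so interface naming on Windows may differ):

    mpirun -np 2 -mca btl self,sm ./a.out                                 # shared memory only, no TCP
    mpirun -np 2 -mca btl self,sm,tcp -mca btl_tcp_if_include lo ./a.out  # TCP restricted to loopback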

[OMPI users] Subnet routing (1.2.x) not working in 1.4.3 anymore

2011-10-25 Thread Mirco Wahab
In the last few years, it has been very simple to set up multiple high-performance (GbE) back-to-back connections between three nodes (triangular topology) or four nodes (tetrahedral topology). The only things you had to do were: use 3 (or 4) cheap compute nodes w/Linux and connect each of them…
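
With such point-to-point links, Open MPI can be steered onto the back-to-back NICs with standard MCA parameters (a sketch; eth1/eth2/eth0 and ./a.out are placeholders for the actual interface and program names):

    mpirun -np 8 -mca btl_tcp_if_include eth1,eth2 ./a.out    # use only the direct links
    mpirun -np 8 -mca btl_tcp_if_exclude lo,eth0 ./a.out      # or exclude the service network instead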

Re: [OMPI users] Checkpoint from inside MPI program with OpenMPI 1.4.2 ?

2011-10-25 Thread Josh Hursey
Open MPI (trunk/1.7 - not 1.4 or 1.5) provides an application-level interface to request a checkpoint of an application. This API is defined on the following website: http://osl.iu.edu/research/ft/ompi-cr/api.php#api-cr_checkpoint This will behave the same as if you requested the checkpoint of t…
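
For reference, the external command-line route to the same checkpoint uses the tools that ship with CR-enabled Open MPI builds (a sketch; the PID and snapshot handle are placeholders):

    ompi-checkpoint <PID of mpirun>     # request a checkpoint of the running job
    ompi-restart <snapshot handle>      # later, restart from the saved snapshot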

Re: [OMPI users] Problem-Bug with MPI_Intercomm_create()

2011-10-25 Thread Ralph Castain
FWIW: I have tracked this problem down. The fix is a little more complicated than I'd like, so I'm going to have to ping some other folks to ensure we concur on the approach before doing something. On Oct 25, 2011, at 8:20 AM, Ralph Castain wrote: > I still see it failing the test George provided…

Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread Gus Correa
Hi Mouhamad, The locked memory is set to unlimited, but the lines about the stack are commented out. Have you tried adding this line: * - stack -1 and then running wrf again? [Note: no "#" hash character.] Also, if you log in to the compute nodes, what is the output of 'limit' [csh,tcsh] or 'ulimit…
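
A minimal sketch of the change Gus suggests, assuming it goes in /etc/security/limits.conf on every compute node ("-" sets both the soft and hard limit; "-1" means unlimited):

    # /etc/security/limits.conf
    *    -    stack    -1

After logging in again, "ulimit -s" (bash) or "limit stacksize" (csh/tcsh) should report "unlimited".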

Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread Mouhamad Al-Sayed-Ali
Hi all, I've checked "limits.conf", and it contains these lines:

    # Jcb 29.06.2007 : pbs wrf (Siji)
    #* hard stack 100
    #* soft stack 100
    # Dr 14.02.2008 : for Voltaire MPI
    * hard memlock unlimited
    * soft memlock unlimited

Many thanks for you…

Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread Gus Correa
Hi Mouhamad, Ralph, Terry: Very often big programs like wrf crash with a segfault because they cannot allocate memory on the stack; they assume the system imposes no limit on it. This has nothing to do with MPI. Mouhamad: check whether your stack size is set to unlimited on all compute nodes. …
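
A quick way to run that check on all compute nodes at once (a sketch, assuming bash exists on the nodes and reusing the machinefile from the original report):

    mpirun -machinefile /tmp/108388.1.par2/machines -np 4 bash -c 'hostname; ulimit -s'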

Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread Ralph Castain
Looks like you are crashing in wrf - have you asked the wrf developers for help? On Oct 25, 2011, at 7:53 AM, Mouhamad Al-Sayed-Ali wrote: > Hi again, > > This is exactly the error I have: > > > taskid: 0 hostname: part034.u-bourgogne.fr > [part034:21443] *** Process received signal *** > [part034:21443…

Re: [OMPI users] Hybrid MPI/Pthreads program behaves differently on two different machines with same hardware

2011-10-25 Thread Ralph Castain
My best guess is that you are seeing differences in scheduling behavior with respect to memory locality. I notice that you are not binding your processes, so they are free to move around the various processors on the node. I would guess that your thread is winding up on a processor that is non…
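
For reference, binding can be requested on the mpirun command line (a sketch; ./a.out is a placeholder, --bind-to-core/--report-bindings are the 1.4-era options, and the mpi_paffinity_alone MCA parameter is the older equivalent):

    mpirun -np 2 --bind-to-core --report-bindings ./a.out
    mpirun -np 2 -mca mpi_paffinity_alone 1 ./a.out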

Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread TERRY DONTJE
This looks more like a seg fault in wrf and not OMPI, so I'm sorry there is not much I can do here to help you. --td On 10/25/2011 9:53 AM, Mouhamad Al-Sayed-Ali wrote: Hi again, This is exactly the error I have: taskid: 0 hostname: part034.u-bourgogne.fr [part034:21443] *** Process received signal ***…

Re: [OMPI users] Problem-Bug with MPI_Intercomm_create()

2011-10-25 Thread Ralph Castain
I still see it failing the test George provided on the trunk. I'm unaware of anyone looking further into it, though, as the prior discussion seemed to just end. On Oct 25, 2011, at 7:01 AM, orel wrote: > Dear all, > > I have been trying for several days to use advanced MPI-2 features in the following > scena…

Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread Mouhamad Al-Sayed-Ali
Hi again, This is exactly the error I have:

    taskid: 0 hostname: part034.u-bourgogne.fr
    [part034:21443] *** Process received signal ***
    [part034:21443] Signal: Segmentation fault (11)
    [part034:21443] Signal code: Address not mapped (1)
    [part034:21443] Failing at address: 0xfffe01eeb340
    …

Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread Mouhamad Al-Sayed-Ali
Hello, > Can you run wrf successfully on one node? No, it can't run on one node. > Can you run a simple code across your two nodes? > I would try hostname then some simple MPI program like the ring example. Yes, I can run a simple code. Many thanks, Mouhamad

Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread TERRY DONTJE
Can you run wrf successfully on one node? Can you run a simple code across your two nodes? I would try hostname, then some simple MPI program like the ring example. --td On 10/25/2011 9:05 AM, Mouhamad Al-Sayed-Ali wrote: hello, -What version of ompi are you using I am using ompi version…
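
A minimal sketch of the kind of ring example Terry means (standard C and MPI only; each rank forwards a token to rank+1, and rank 0 closes the ring):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, token;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0) {
            token = 42;  /* rank 0 starts the token around the ring */
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
        }
        printf("rank %d of %d passed token %d\n", rank, size, token);
        MPI_Finalize();
        return 0;
    }

Compile with mpicc and run it across both nodes, e.g. "mpirun -machinefile machines -np 4 ./ring".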

Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread Mouhamad Al-Sayed-Ali
Hello, > What version of ompi are you using? I am using ompi version 1.4.1-1, compiled with gcc 4.5. > What type of machine and os are you running on? I'm using a 64-bit Linux machine. > What does the machine file look like? part033 part033 part031 part031 > Is there a stack trace le…

[OMPI users] Problem-Bug with MPI_Intercomm_create()

2011-10-25 Thread orel
Dear all, I have been trying for several days to use advanced MPI-2 features in the following scenario: 1) a master code A (of size NPA) spawns (MPI_Comm_spawn()) two slave codes B (of size NPB) and C (of size NPC), providing intercomms A-B and A-C; 2) I create intracomms AB and AC by merging the inte…
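
A minimal sketch of steps 1) and 2) as described, from the master side (the slave executable names ./slave_b and ./slave_c and the NPB/NPC values are placeholders; the calls themselves are standard MPI-2):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm inter_ab, inter_ac;  /* intercomms A-B and A-C */
        MPI_Comm intra_ab, intra_ac;  /* merged intracomms AB and AC */
        int npb = 2, npc = 2;         /* NPB and NPC, placeholder values */

        MPI_Init(&argc, &argv);

        /* 1) master code A spawns slave codes B and C */
        MPI_Comm_spawn("./slave_b", MPI_ARGV_NULL, npb, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &inter_ab, MPI_ERRCODES_IGNORE);
        MPI_Comm_spawn("./slave_c", MPI_ARGV_NULL, npc, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &inter_ac, MPI_ERRCODES_IGNORE);

        /* 2) merge each intercomm into an intracomm (high=0 puts A's ranks first) */
        MPI_Intercomm_merge(inter_ab, 0, &intra_ab);
        MPI_Intercomm_merge(inter_ac, 0, &intra_ac);

        /* a later step would build a B-C intercomm with MPI_Intercomm_create() */

        MPI_Finalize();
        return 0;
    }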

Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread TERRY DONTJE
Some more info would be nice, like: - What version of ompi are you using? - What type of machine and OS are you running on? - What does the machine file look like? - Is there a stack trace left behind by the pid that seg faulted? --td On 10/25/2011 8:07 AM, Mouhamad Al-Sayed-Ali wrote: Hello, I have t…
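
One way to produce such a trace (a sketch; assumes core dumps are allowed on the compute nodes, gdb is installed, and the core file name follows the usual core.<pid> pattern):

    ulimit -c unlimited                                       # enable core files before launching
    mpirun -machinefile /tmp/108388.1.par2/machines -np 4 wrf.exe
    gdb wrf.exe core.<pid>                                    # then type "bt" for a backtrace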

[OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread Mouhamad Al-Sayed-Ali
Hello, I have tried to run the executable "wrf.exe" using mpirun -machinefile /tmp/108388.1.par2/machines -np 4 wrf.exe but I got the following error: -- mpirun noticed that process rank 1 with PID 9942 on node pa…

Re: [OMPI users] Visual debugging on the cluster

2011-10-25 Thread devendra rai
Hello Meredith, Yes, I have already tried the plugin. The problem is that it seems to be stuck forever at the "Waiting for job information" stage. I scouted around a bit for a solution, and it did not seem straightforward; at least to me, the solution seemed like a one-time won…

Re: [OMPI users] Hybrid MPI/Pthreads program behaves differently on two different machines with same hardware

2011-10-25 Thread 吕慧伟
Thanks, Ralph. Yes, I have taken that into account. The problem is not comparing two procs with one proc, but the "multi-threading effect": multi-threading helps on the first machine with one and with two procs, but on the second machine the benefit disappears with two procs. To narrow down the problem, I reins…

Re: [OMPI users] Hybrid MPI/Pthreads program behaves differently on two different machines with same hardware

2011-10-25 Thread Ralph Castain
Okay - thanks for testing it. Of course, one obvious difference is that there isn't any communication when you run only one proc, but there is when you run two or more, assuming your application makes MPI send/recv calls (or calls collectives and other functions that communicate). Communicat…