Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Ralph Castain
Quite possible - the two sections were written by different people several years apart. I'll take a look and see what can be done. Thanks! On Dec 1, 2009, at 8:45 PM, kevin.buck...@ecs.vuw.ac.nz wrote: >> Interesting - especially since the existing code works quite well over a >> wide range of

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-01 Thread Nicolas Bock
On Tue, Dec 1, 2009 at 18:03, Ralph Castain wrote: > You may want to check your limits as defined by the shell/system. I can > also run this for as long as I'm willing to let it run, so something else > appears to be going on. > > > Is that with 1.3.3? I found that with 1.3.4 I can run the exampl

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Kevin.Buckley
> Interesting - especially since the existing code works quite well over a > wide range of platforms. So I'm not quite so eager to declare it incorrect > and only working by accident. > > However, I would welcome a proposed patch so we can look at it. This is > always an important area for us, so t

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Gus Correa
Hi Belaid PBS loves to read the nodes' list backwards. If you want to start with WN1, put it last on the Torque/PBS "nodes" file. Gus Correa - Gustavo Correa Lamont-Doherty Earth Observatory - Columbia University Palisades, NY,

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Gus Correa
Hi Belaid Belaid MOA wrote: You made my day Gus! Thank you very much. I'm glad it helped. I hope it is working for you now. If I had asked before, I would have finished within two hours (but I guess that's part of the learning process). Oh, well, that's nothing to worry about. On these mailin

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Ralph Castain
Interesting - especially since the existing code works quite well over a wide range of platforms. So I'm not quite so eager to declare it incorrect and only working by accident. However, I would welcome a proposed patch so we can look at it. This is always an important area for us, so the more

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Kevin.Buckley
>> I assume that both of you have seen the reply from Aleksej Saushev, >> who seems to be the bloke looking after the port of OpenMPI to the >> NetBSD platform. >> >> >> Aleksej suggested some mods he had partially looked at, in >> >> opal/util/if.c > > Nope - didn't see anything like that :-/ Aah

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Belaid MOA
I actually tried both: -- in the interactive mode, as soon as I hit enter, PBS sends me to a worker node (WN2) that does not have tm support. I guess if I added the head node to the list of PBS nodes, I would not run into the problem. However, I am glad I did run into the problem. Y

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Belaid MOA
You made my day Gus! Thank you very much. If I had asked before, I would have finished within two hours (but I guess that's part of the learning process). Very straightforward! Although I tried doing exactly what you said, the Googled information is not clear and sometimes misleading about what t

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Ralph Castain
Just to further show my confusion (since I wrote much of the TM support): If you get an interactive allocation and then type "mpirun ", mpirun will execute on the node upon which you are sitting. Jeff's statement is -only- true if you "qsub" the job - i.e., you run it in batch mode. From yo

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Belaid MOA
> Yes, this page is definitely incorrect if you want to run with PBS/TM > support -- you definitely need to install with TM support on all nodes. > > The reason is that PBS will launch your script (and therefore > "mpirun") on the first node of the job. This node must have an Open > MPI mp

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Ralph Castain
On Dec 1, 2009, at 6:43 PM, kevin.buck...@ecs.vuw.ac.nz wrote: > >> "Jeff Squyres" >> >> >> Oy. This is ick, because this error code is coming from horrendously >> complex code deep in the depths of OMPI that is probing the OS to >> figure out what ethernet interfaces you have. It may or ma

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Kevin.Buckley
> "Jeff Squyres" > > > Oy. This is ick, because this error code is coming from horrendously > complex code deep in the depths of OMPI that is probing the OS to > figure out what ethernet interfaces you have. It may or may not be > simple to fix this. > > Do you mind diving into the OMPI code a

Re: [OMPI users] Program deadlocks, on simple send/recv loop

2009-12-01 Thread John R. Cary
Jeff Squyres wrote: (for the web archives) Brock and I talked about this .f90 code a bit off list -- he's going to investigate with the test author a bit more because both of us are a bit confused by the F90 array syntax used. Attached is a simple send/recv code written (procedural) C++ that

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-01 Thread Ralph Castain
You may want to check your limits as defined by the shell/system. I can also run this for as long as I'm willing to let it run, so something else appears to be going on. On Dec 1, 2009, at 4:38 PM, Nicolas Bock wrote: > > > On Tue, Dec 1, 2009 at 16:28, Abhishek Kulkarni wrote: > On Tue, De

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Ralph Castain
I believe what this is saying is that we are not finding any TCP interfaces - the ioctl itself is failing. So yes - mpirun failing at that point is going to happen because we have no way to communicate for launch. Do you see interfaces if you do an /sbin/ifconfig? Do they have valid IP address

Re: [OMPI users] OpenMPI without IPoIB

2009-12-01 Thread Jeff Squyres
You might also want to ensure that your Open MPI was built with OpenFabrics support (i.e., to use verbs directly instead of IPoIB). Try this: ompi_info | grep openib If that returns a line with "openib" and "btl" in it, then your Open MPI has OpenFabrics support (we named the plugin "o
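
On a build with verbs support, that grep typically returns an MCA component line of roughly this shape (illustrative only -- the version fields differ per install, and this is not output captured from the poster's system):

    MCA btl: openib (MCA v2.0, API v2.0, Component v1.3.3)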

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-01 Thread Jeff Squyres
On Nov 29, 2009, at 6:15 PM, kevin.buck...@ecs.vuw.ac.nz wrote: $ mpirun -n 4 hello_f77 [somebox.ecs.vuw.ac.nz:04414] opal_ifinit: ioctl(SIOCGIFFLAGS) failed with errno=6 Oy. This is ick, because this error code is coming from horrendously complex code deep in the depths of OMPI that is probing the OS to figu
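
For readers wondering what that probing code does: opal/util/if.c walks the kernel's interface list using ioctl calls like the ones below. This is a standalone sketch of the general pattern, not Open MPI's actual code; the stride computation at the bottom marks the BSD-specific detail (variable-length, sa_len-sized records) that the NetBSD patches discussed in this thread revolve around.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/ioctl.h>
    #include <net/if.h>

    int main(void)
    {
        char buf[8192];
        struct ifconf ifc;
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        ifc.ifc_len = sizeof(buf);
        ifc.ifc_buf = buf;
        /* Ask the kernel for the list of configured interfaces. */
        if (ioctl(fd, SIOCGIFCONF, &ifc) < 0) { perror("SIOCGIFCONF"); return 1; }

        char *p = buf;
        while (p < buf + ifc.ifc_len) {
            struct ifreq *ifr = (struct ifreq *)p;
            struct ifreq req;
            memset(&req, 0, sizeof(req));
            strncpy(req.ifr_name, ifr->ifr_name, sizeof(req.ifr_name));
            /* The call reported to fail with errno=6 on NetBSD. */
            if (ioctl(fd, SIOCGIFFLAGS, &req) < 0)
                perror("SIOCGIFFLAGS");
            else
                printf("%s: flags=0x%x\n", ifr->ifr_name,
                       (unsigned)(req.ifr_flags & 0xffff));
            /* BSD-ism: each record carries sa_len bytes of sockaddr, so
               entries are variable length. Stepping by sizeof(struct ifreq),
               as fixed-stride code does, misparses the list on NetBSD. */
            size_t sa = ifr->ifr_addr.sa_len;
            if (sa < sizeof(struct sockaddr)) sa = sizeof(struct sockaddr);
            p += sizeof(ifr->ifr_name) + sa;
        }
        close(fd);
        return 0;
    }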

Re: [OMPI users] Program deadlocks, on simple send/recv loop

2009-12-01 Thread Jeff Squyres
(for the web archives) Brock and I talked about this .f90 code a bit off list -- he's going to investigate with the test author a bit more because both of us are a bit confused by the F90 array syntax used. On Dec 1, 2009, at 10:46 AM, Brock Palen wrote: The attached code, is an example

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Gus Correa
Hi Belaid Moa Belaid MOA wrote: In that case, the way I installed it is not right. I thought that only the HN should be configured with the tm support not the worker nodes; the worker nodes only have the PBS daemon clients - No need for tm support on the worker nodes. When I ran ompi_info | g

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Jeff Squyres
On Dec 1, 2009, at 7:02 PM, Belaid MOA wrote: The information on the following link has misled me then: http://www.physics.iitm.ac.in/~sanoop/linux_files/cluster.html (check OpenMPI Configuration section.) Yes, this page is definitely incorrect if you want to run with PBS/TM support -- you

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Belaid MOA
Thanks a lot Jeff. That's what I will do next :) With Many Thanks to everyone. ~Belaid. > From: jsquy...@cisco.com > To: us...@open-mpi.org > Date: Tue, 1 Dec 2009 18:59:52 -0500 > Subject: Re: [OMPI users] mpirun is using one PBS node only > > You need to install with TM support on all nodes

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Belaid MOA
In that case, the way I installed it is not right. I thought that only the HN should be configured with the tm support not the worker nodes; the worker nodes only have the PBS daemon clients - No need for tm support on the worker nodes. When I ran ompi_info | grep tm on the worker nodes, the

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Jeff Squyres
You need to install with TM support on all nodes. On Dec 1, 2009, at 6:08 PM, Belaid MOA wrote: I tried the -bynode option but it did not change anything. I also tried the "hostname" command and I keep getting only the name of one node repeated according to the -n value. Just to make sure

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Belaid MOA
Thank you very much Ralph for your help. >I'm having a little trouble following this email thread, so forgive any >misunderstanding. >If I understand this correctly, you are able to correctly run if you provide a >-hostfile option. The issue is that mpirun does not >appear to be picking up

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-01 Thread Nicolas Bock
On Tue, Dec 1, 2009 at 16:28, Abhishek Kulkarni wrote: > On Tue, Dec 1, 2009 at 6:15 PM, Nicolas Bock > wrote: > > After reading Anthony's question again, I am not sure now that we are > having > > the same problem, but we might. In any case, the attached example > programs > > trigger the issue

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Gus Correa
Hi Belaid Moa The OpenMPI I install and use is on an NFS-mounted directory. Hence, all the nodes see the same version, which has "tm" support. After reading your OpenMPI configuration parameters on the headnode and working nodes (and the difference between them), I would guess (just a guess) that

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Ralph Castain
I'm having a little trouble following this email thread, so forgive any misunderstanding. If I understand this correctly, you are able to correctly run if you provide a -hostfile option. The issue is that mpirun does not appear to be picking up the PBS_NODEFILE automatically and using it - corr

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-01 Thread Abhishek Kulkarni
On Tue, Dec 1, 2009 at 6:15 PM, Nicolas Bock wrote: > After reading Anthony's question again, I am not sure now that we are having > the same problem, but we might. In any case, the attached example programs > trigger the issue of running out of pipes. I don't see how orted could, even > if it was

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-01 Thread Nicolas Bock
Linux mujo 2.6.30-gentoo-r5 #1 SMP PREEMPT Thu Sep 17 07:47:12 MDT 2009 x86_64 Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz GenuineIntel GNU/Linux On Tue, Dec 1, 2009 at 16:24, Ralph Castain wrote: > It really does help if we have some idea what OMPI version you are talking > about, and on what k

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-01 Thread Nicolas Bock
Sorry, openmpi-1.3.3 compiled with gcc-4.4.2 nick On Tue, Dec 1, 2009 at 16:24, Ralph Castain wrote: > It really does help if we have some idea what OMPI version you are talking > about, and on what kind of platform. > > This issue was fixed to the best of my knowledge (not all the pipes were

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-01 Thread Ralph Castain
It really does help if we have some idea what OMPI version you are talking about, and on what kind of platform. This issue was fixed to the best of my knowledge (not all the pipes were getting closed), but I would have to look and see what release might contain the fix...would be nice to know w

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-01 Thread Nicolas Bock
After reading Anthony's question again, I am not sure now that we are having the same problem, but we might. In any case, the attached example programs trigger the issue of running out of pipes. I don't see how orted could, even if it was reused. There is only a very limited number of processes run

[OMPI users] MPI_Comm_spawn lots of times

2009-12-01 Thread Nicolas Bock
Hello list, a while back in January of this year, a user (Anthony Thevenin) had the problem of running out of open pipes when he tried to use MPI_Comm_spawn a few times. As I found the thread he started in the mailing list archives and have just joined the mailing list myself, I unfortunately can't rep
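
Nicolas's attached example programs are not preserved in this digest, but a minimal parent exercising the pattern under discussion -- spawning and disconnecting a short-lived child over and over -- looks roughly like the sketch below ("./child" is a hypothetical placeholder for any tiny MPI program that just calls MPI_Init and MPI_Finalize):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Enough iterations that leaked descriptors, if any, run out. */
        for (int i = 0; i < 10000; i++) {
            MPI_Comm child;
            MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                           0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
            /* Disconnect so the child can finalize and its resources can
               be reclaimed; if pipes still leak per spawn, the run dies
               with out-of-pipes errors like those reported here. */
            MPI_Comm_disconnect(&child);
        }

        MPI_Finalize();
        return 0;
    }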

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Belaid MOA
I tried the -bynode option but it did not change anything. I also tried the "hostname" command and I keep getting only the name of one node repeated according to the -n value. Just to make sure I did the right installation, here is what I did: -- On the head node (HN), I installed openMPI u

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Gus Correa
Hi Belaid Moa Belaid MOA wrote: Thanks a lot Gus for your help again. I only have one CPU per node. The -n X option (no matter what the value of X is) shows X processes running on one node only (the other one is free). So, somehow it is oversubscribing your single processor on the first node.

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Belaid MOA
Thanks a lot Gus for your help again. I only have one CPU per node. The -n X option (no matter what the value of X is) shows X processes running on one node only (the other one is free). If I add the machinefile option with WN1 and WN2 in it, the right behavior is manifested. According to the do

Re: [OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Gus Correa
Hi Belaid Moa Belaid MOA wrote: Hi everyone, Here is another elementary question. I tried the following steps found in the FAQ section of www.open-mpi.org with a simple hello world example (with PBS/torque): $ qsub -l nodes=2 my_script.sh my_script.sh is pasted below:

[OMPI users] mpirun is using one PBS node only

2009-12-01 Thread Belaid MOA
Hi everyone, Here is another elementary question. I tried the following steps found in the FAQ section of www.open-mpi.org with a simple hello world example (with PBS/torque): $ qsub -l nodes=2 my_script.sh my_script.sh is pasted below: #!/bin/sh -l #PBS -N hell
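
The script body is truncated above. Independent of the script, the quickest check for this symptom is a hello-world that reports where each rank actually ran -- a minimal sketch (not the poster's program):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(host, &len);

        /* With working TM support and nodes=2, the output should show
           ranks spread across both worker nodes, not stacked on one. */
        printf("rank %d of %d runs on %s\n", rank, size, host);

        MPI_Finalize();
        return 0;
    }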

Re: [OMPI users] Elementary question on openMPI application location when using PBS submission

2009-12-01 Thread Belaid MOA
I saw those options before but somehow I did not pay attention to them :(. I was thinking that the copying is done automatically, so I felt the options were useless but I was wrong. Thanks a lot Gus; that's exactly what I was looking for. I will try them then. Best Regards. ~Belaid. > Date: T

Re: [OMPI users] Elementary question on openMPI application location when using PBS submission

2009-12-01 Thread Gus Correa
Hi Belaid Moa Belaid MOA wrote: Thanks a lot Gus for your help. Although I used stage_in/stage_out features before, I found NFS mounting much better and cleaner. Best Regards. ~Belaid. Yes, unless you have very heavy I/O programs (some computational Chemistry and genome programs are like th

Re: [OMPI users] Elementary question on openMPI application location when using PBS submission

2009-12-01 Thread Gus Correa
Hi Belaid Moa I spoke too fast, and burnt my tongue. I should have double checked before speaking out. I just looked up "man mpiexec" and found the options below. I never used or knew about them, but you may want to try. They seem to be similar to the Torque/PBS stage_in feature. I would guess th

Re: [OMPI users] Elementary question on openMPI application location when using PBS submission

2009-12-01 Thread Belaid MOA
Thanks a lot Gus for your help. Although I used stage_in/stage_out features before, I found NFS mounting much better and cleaner. Best Regards. ~Belaid. > Date: Tue, 1 Dec 2009 14:55:53 -0500 > From: g...@ldeo.columbia.edu > To: us...@open-mpi.org > Subject: Re: [OMPI users] Elementary question

Re: [OMPI users] Elementary question on openMPI application location when using PBS submission

2009-12-01 Thread Gus Correa
Hi Belaid Moa Belaid MOA wrote: Thank you very very much Gus. Does this mean that OpenMPI does not copy the executable from the master node to the worker nodes? Not that I know. Making the executable available on the nodes, and any input files the program may need, is the user's responsibility

Re: [OMPI users] Elementary question on openMPI application location when using PBS submission

2009-12-01 Thread Belaid MOA
Thank you very very much Gus. Does this mean that OpenMPI does not copy the executable from the master node to the worker nodes? If that's the case, I will go ahead and NFS mount my working directory. ~Belaid. > Date: Tue, 1 Dec 2009 13:50:57 -0500 > From: g...@ldeo.columbia.edu > To: us...@open-m

Re: [OMPI users] Elementary question on openMPI application location when using PBS submission

2009-12-01 Thread Gus Correa
Hi Belaid MOA See this FAQ: http://www.open-mpi.org/faq/?category=running#do-i-need-a-common-filesystem http://www.open-mpi.org/faq/?category=building#where-to-install http://www.open-mpi.org/faq/?category=tm#tm-obtain-host Your executable needs to be in a directory that is accessible by all nod

[OMPI users] Elementary question on openMPI application location when using PBS submission

2009-12-01 Thread Belaid MOA
Hello everyone, I am new to this list and I have a very elementary question: suppose we have three machines, HN (Head Node hosting the pbs server), WN1 (a worker node) and WN2 (another worker node). The PBS nodefile has WN1 and WN2 in it (DOES NOT HAVE HN). My openMPI program (hello) and PBS sc

Re: [OMPI users] configuration settings for Mac OSX

2009-12-01 Thread Jeff Squyres
On Dec 1, 2009, at 12:20 PM, Jurgen Heymann wrote: Thank you for your feedback. In all cases, I installed it clean. After consulting the mpiBLAST user group, I learned that there is an issue building mpiBLAST with openMPI on Intel platforms and that it is being investigated. Once I learn mo

Re: [OMPI users] configuration settings for Mac OSX

2009-12-01 Thread Jurgen Heymann
Hi Jeff, Thank you for your feedback. In all cases, I installed it clean. After consulting the mpiBLAST user group, I learned that there is an issue building mpiBLAST with openMPI on Intel platforms and that it is being investigated. Once I learn more about it, I will post it here. I would stil

Re: [OMPI users] Program deadlocks, on simple send/recv loop

2009-12-01 Thread Ashley Pittman
On Tue, 2009-12-01 at 10:46 -0500, Brock Palen wrote: > The attached code is an example where openmpi/1.3.2 will lock up if > run on 48 cores over IB (4 cores per node). > The code loops over recv from all processors on rank 0 and sends from > all other ranks; as far as I know this should work

[OMPI users] Program deadlocks, on simple send/recv loop

2009-12-01 Thread Brock Palen
The attached code is an example where openmpi/1.3.2 will lock up if run on 48 cores over IB (4 cores per node). The code loops over recv from all processors on rank 0 and sends from all other ranks; as far as I know this should work, and I can't see why not. Note yes I know we can do the sam
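
The attachment is not reproduced in the archive, but the communication shape Brock describes can be sketched as below. As he says, the pattern is legal MPI -- rank 0 posts a receive for each peer in turn, and each sender blocks until its message is matched -- so a lock-up at 48 ranks points at the implementation or interconnect rather than the pattern itself (a reconstruction for illustration, not the original test; the payload size and iteration count are guesses):

    #include <mpi.h>
    #include <stdio.h>

    #define N 1024   /* payload size: assumed; the original used its own */

    int main(int argc, char **argv)
    {
        int rank, size;
        double buf[N] = {0};

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (int iter = 0; iter < 100000; iter++) {
            if (rank == 0) {
                /* Drain one message from every other rank, in rank order. */
                for (int src = 1; src < size; src++)
                    MPI_Recv(buf, N, MPI_DOUBLE, src, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
            } else {
                MPI_Send(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
            }
        }

        if (rank == 0) printf("all iterations completed\n");
        MPI_Finalize();
        return 0;
    }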

Re: [OMPI users] configuration settings for Mac OSX

2009-12-01 Thread Jeff Squyres
On Nov 23, 2009, at 12:51 PM, Jurgen Heymann wrote: I am trying to understand what parameters are essential to include when running ./configure with openmpi-1.3.3 when working with Mac using PPC (OS 10.4) or Intel platform (OS 10.5). What compilers and settings work best for the individual

Re: [OMPI users] exceedingly virtual memory consumption of MPI environment if higher-setting "ulimit -s"

2009-12-01 Thread Jeff Squyres
I can't think of what OMPI would be doing related to the predefined stack size -- I am not aware of anywhere in the code where we look up the predefined stack size and then do something with it. That being said, I don't know what the OS and resource consumption effects are of setting 1GB+ st

Re: [OMPI users] MTT trivial test is getting failed:

2009-12-01 Thread Jeff Squyres
This is probably best taken up on the MTT list -- it doesn't look like an OMPI error, but rather an MTT configuration error (if you're running OMPI 1.3.3 through MTT, it shouldn't be trying to find OMPI 1.3.2). On Dec 1, 2009, at 1:45 AM, Vishal Shorrghar wrote: Hi ALL, I tried to run

Re: [OMPI users] Trouble with SGE integration

2009-12-01 Thread Ondrej Glembek
Hi We have solved the problem by rewriting the starter.sh... The script remained the same except for the very final part where the command is executed... Instead of plain exec "$@", we replaced it by: == #need for exec to fail on non-script jobs shopt -s execfail #start the job in this shell

Re: [OMPI users] MPI Processes and Auto Vectorization

2009-12-01 Thread Tim Prince
amjad ali wrote: Hi, thanks T.Prince, Your saying: "I'll just mention that we are well into the era of 3 levels of programming parallelization: vectorization, threaded parallel (e.g. OpenMP), and process parallel (e.g. MPI)." is a really great new learning for me. Now I can perceive better.

Re: [OMPI users] MPI Processes and Auto Vectorization

2009-12-01 Thread amjad ali
Hi, thanks T.Prince, Your saying: "I'll just mention that we are well into the era of 3 levels of programming parallelization: vectorization, threaded parallel (e.g. OpenMP), and process parallel (e.g. MPI)." is a really great new learning for me. Now I can perceive better. Can you please expl

Re: [OMPI users] MPI Processes and Auto Vectorization

2009-12-01 Thread Tim Prince
amjad ali wrote: Hi, Suppose we run a parallel MPI code with 64 processes on a cluster, say of 16 nodes. The cluster nodes have multicore CPUs, say 4 cores on each node. Now all 64 cores on the cluster are running a process. The program is SPMD, meaning all processes have the same workload. Now if we

Re: [OMPI users] Trouble with SGE integration

2009-12-01 Thread Reuti
Am 01.12.2009 um 10:32 schrieb Ondrej Glembek: Just to add more info: Reuti wrote: Am 30.11.2009 um 20:07 schrieb Ondrej Glembek: But I think the real problem is, that Open MPI assumes you are outside of SGE and so uses a different startup. Are you resetting any of SGE's environment vari

Re: [OMPI users] Trouble with SGE integration

2009-12-01 Thread Reuti
Hi, Am 01.12.2009 um 10:00 schrieb Ondrej Glembek: Reuti wrote: ./configure --prefix=/homes/kazi/glembek/share/openmpi-1.3.3-64 --with-sge --enable-shared --enable-static --host=x86_64-linux --build=x86_64-linux NM=x86_64-linux-nm Is there any list of valid values for --host, --build and NM

Re: [OMPI users] Trouble with SGE integration

2009-12-01 Thread Ondrej Glembek
Just to add more info: Reuti wrote: > Am 30.11.2009 um 20:07 schrieb Ondrej Glembek: > > But I think the real problem is, that Open MPI assumes you are outside > of SGE and so uses a different startup. Are you resetting any of SGE's > environment variables in your custom starter method (like $JOB

Re: [OMPI users] Trouble with SGE integration

2009-12-01 Thread Ondrej Glembek
Hi Reuti wrote: >> >> ./configure --prefix=/homes/kazi/glembek/share/openmpi-1.3.3-64 >> --with-sge --enable-shared --enable-static --host=x86_64-linux >> --build=x86_64-linux NM=x86_64-linux-nm > > Is there any list of valid values for --host, --build and NM - and what > is NM for? From the ./co

[OMPI users] MTT trivial test is getting failed:

2009-12-01 Thread Vishal Shorrghar
Hi ALL, I tried to run the trivial test between two nodes; it seems to be running without any memory issue, but it gives some error (a path mismatch: it's taking openmpi 1.3.2 instead of 1.3.3) while fetching/executing some test binaries, i.e. cxx_ring, c_ring, etc. I am using openmpi-1.3.3. Path is

[OMPI users] MPI Processes and Auto Vectorization

2009-12-01 Thread amjad ali
Hi, Suppose we run a parallel MPI code with 64 processes on a cluster, say of 16 nodes. The cluster nodes have multicore CPUs, say 4 cores on each node. Now all 64 cores on the cluster are running a process. The program is SPMD, meaning all processes have the same workload. Now if we had done auto-vectoriz
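
Tim Prince's replies above distinguish vectorization from process-level parallelism; the two compose rather than compete. A schematic sketch of the combination (illustrative only; compiler flags such as gcc's -O3, which enables auto-vectorization, are the usual way to request it):

    #include <mpi.h>
    #include <stdio.h>

    #define LOCAL_N 1000000   /* each process's slice of the global work */

    int main(int argc, char **argv)
    {
        static double a[LOCAL_N], b[LOCAL_N], c[LOCAL_N];
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int i = 0; i < LOCAL_N; i++) { b[i] = rank + i; c[i] = i; }

        /* Process parallelism: each of the 64 ranks owns its own slice.
           Vectorization: this stride-1 loop is exactly the kind the
           compiler turns into SIMD instructions within each process. */
        for (int i = 0; i < LOCAL_N; i++)
            a[i] = b[i] * c[i];

        printf("rank %d done, a[1] = %g\n", rank, a[1]);
        MPI_Finalize();
        return 0;
    }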