Re: [OMPI users] Condor and MPI

2011-04-15 Thread Ralph Castain
On Apr 15, 2011, at 2:59 AM, Reuti wrote: > Hi, > > Am 15.04.2011 um 07:25 schrieb Asad Ali: > >> >> Yes. The entire job gets restarted. > > maybe this is caused by a signal sent to the job by Condor, so that it gets > terminated and as a result Condor restarts it. Can you trap the signals

Re: [OMPI users] Try to submit OMPI job to SGE gives ERRORS (orte_plm_base_select failed & orte_ess_set_name failed) (Reuti)

2011-04-17 Thread Ralph Castain
I'm no SGE expert, but I do note that your original error indicates that mpirun was unable to find a launcher for your environment. When running under SGE, mpirun looks for certain environmental variables indicative of SGE. If it finds those, it then looks for the "qrsh" command. If it doesn't f

Re: [OMPI users] mca_oob_tcp_msg_recv: readv failed:Unknown error (10054)

2011-04-19 Thread Ralph Castain
Just a suggestion: have you looked at it in a debugger? The error isn't coming from OMPI - looks like a segfault caused by an error in the program or how it is being run. On Apr 19, 2011, at 7:19 AM, hi wrote: > On WINDOWS platform, I am observing following error when executing > "mpirun blacs

Re: [OMPI users] mca_oob_tcp_msg_recv: readv failed:Unknown error (10054)

2011-04-19 Thread Ralph Castain
s: Visual Studio 2008 32bit and Intel ifort 32bit > OpenMPI: OpenMPI-1.5.3 pre-built libraries and also with > OpenMPI-1.5.2. locally built libraries > BLACS: pre-built libraries taken from > http://icl.cs.utk.edu/lapack-for-windows/scalapack/index.html#librairies > > Tha

Re: [OMPI users] mpirun unsuccessful when run across multiple nodes

2011-04-19 Thread Ralph Castain
You have to tell mpiexec what nodes you want to use for your application. This is typically done either on the command line or in a file. For now, you could just do this: mpiexec -host node1,node2,node3 -np N ./my_app where node1,node2,node3,... are the names or IP addresses of the nodes you
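The file-based alternative this reply alludes to is a hostfile. A minimal sketch, assuming hypothetical node names, slot counts, and an application called my_app:

```shell
# hostfile alternative to listing nodes on the mpiexec command line;
# node names and slot counts here are placeholders
cat > myhosts <<'EOF'
node1 slots=4
node2 slots=4
node3 slots=4
EOF
mpiexec --hostfile myhosts -np 12 ./my_app
```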

Re: [OMPI users] Problem compiling OpenMPI on Ubuntu 11.04

2011-04-19 Thread Ralph Castain
Nothing was attached, but I doubt they would help anyway. This looks like a missing header file in Ubuntu, or else one that got moved and needs a different path. Where is asm/errno.h, and how was it included in /usr/include/linux/errno.h? Best I can figure is it got put in some non-standard pla

Re: [OMPI users] Problem compiling OpenMPI on Ubuntu 11.04

2011-04-19 Thread Ralph Castain
On Apr 19, 2011, at 2:24 PM, Sergiy Bubin wrote: > > Thanks for the suggestion. I have figured (by googling around and comparing > the content of asm directories) that Ubuntu 11.04 has some difference in the > location of /usr/include/asm/. It appears that now that whole directory is > locate

Re: [OMPI users] Removing Portals BTLs

2011-04-21 Thread Ralph Castain
Sure - instead of what you did, just add --without-portals to your original configure. The exact option depends on what portals you have installed. Here is the relevant part of the "./configure -h" output: --with-portals=DIR Specify the installation directory of PORTALS --with-portals-l
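Concretely, the rebuild described above might look like the following; the install prefix is an assumption, and the exact --with-portals* spelling should be checked against "./configure -h" for your tree:

```shell
# reconfigure and rebuild without Portals support
./configure --prefix=/opt/openmpi --without-portals
make all install
```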

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-04-21 Thread Ralph Castain
On Apr 21, 2011, at 4:41 PM, Brock Palen wrote: > Given that part of our cluster is TCP only, openib wouldn't even startup on > those hosts That is correct - it would have no impact on those hosts > and this would be ignored on hosts with IB adaptors? Ummm...not sure I understand this one.

Re: [OMPI users] intel compiler linking issue and issue of environment variable on remote node, with open mpi 1.4.3

2011-04-22 Thread Ralph Castain
On Apr 22, 2011, at 1:42 PM, ya...@adina.com wrote: > Open MPI 1.4.3 + Intel Compilers V8.1 summary: > (in case someone likes to refer to it later) > > (1) To make all Open MPI executables statically linked and > independent of any dynamic libraries, > "--disable-shared" and "--enable-static" o

Re: [OMPI users] OpenMPI exits when subsequent tail -f in script is interrupted

2011-04-23 Thread Ralph Castain
On Apr 23, 2011, at 6:20 AM, Reuti wrote: > Hi, > > Am 23.04.2011 um 04:31 schrieb Pablo Lopez Rios: > >> I'm having a bit of a problem with wrapping mpirun in a script. The script >> needs to run an MPI job in the background and tail -f the output. Pressing >> Ctrl+C should stop tail -f, and

Re: [OMPI users] OpenMPI exits when subsequent tail -f in script is interrupted

2011-04-23 Thread Ralph Castain
On Apr 23, 2011, at 9:07 AM, Pablo Lopez Rios wrote: >> what about: >> ( trap "" sigint; exec mpiexec ...)& > > Yup, that's included in the workarounds I tried. Tried again with your > specific suggestion; no luck. > >> Well, maybe mpiexec is adjusting it on its own >> again. This can be check

Re: [OMPI users] OpenMPI exits when subsequent tail -f in script is interrupted

2011-04-23 Thread Ralph Castain
ing to accomplish, but there are other signals that don't cause termination. For example, we trap and forward SIGUSR1 and SIGUSR2 to your application procs, if that is of use. But ctrl-c has a special meaning ("die"), and you can't tell mpirun to ignore it. > > Tha
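The forwarding behavior described here (SIGUSR1/SIGUSR2 passed through to the application processes, Ctrl-C reserved for termination) can be exercised from the application side. A minimal standalone sketch, with a shell handler standing in for an MPI rank; in a real job you would send the signal to mpirun, which forwards it:

```shell
#!/bin/sh
# install a handler for the signal that mpirun would forward to each rank
got_signal=no
trap 'got_signal=yes' USR1

# in a real job you would run: kill -USR1 <pid-of-mpirun>
# here we deliver the signal to this shell directly to show the handler firing
kill -USR1 $$

echo "got_signal=$got_signal"   # prints "got_signal=yes"
```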

Re: [OMPI users] OpenMPI exits when subsequent tail -f in script is interrupted

2011-04-23 Thread Ralph Castain
mpirun overriding the trap in the *parent* > subshell so that it ends up getting the SIGINT that was supposedly blocked at > that level? Am I missing something trivial? How can I avoid this? I keep telling you - you can't. The better way to do this is to execute mpirun, and then run tail i

Re: [OMPI users] OpenMPI exits when subsequent tail -f in script is interrupted

2011-04-23 Thread Ralph Castain
& out"& > tail -f out > Yes - but now you can't kill mpirun when something goes wrong > Thanks, > Pablo > > > On 23/04/11 18:39, Reuti wrote: >> Am 23.04.2011 um 19:33 schrieb Ralph Castain: >> >>> On Apr 23, 2011, at 10:40 AM, Pa

Re: [OMPI users] OpenMPI exits when subsequent tail -f in script is interrupted

2011-04-23 Thread Ralph Castain
On Apr 23, 2011, at 12:07 PM, Reuti wrote: > Am 23.04.2011 um 19:58 schrieb Ralph Castain: > >> >> On Apr 23, 2011, at 11:55 AM, Pablo Lopez Rios wrote: >> >>>> What about setsid and pushing it in a new >>>> seesion instead of using&

Re: [OMPI users] RES: Error with ARM target

2011-04-23 Thread Ralph Castain
Don't give it a host argument - unless you are trying to cross-compile, it should figure it out for itself On Apr 23, 2011, at 1:25 PM, Fernando Dutra Fagundes Macedo wrote: > Correcting: > > I tried 1.5.2 and 1.5.3. > > > -Mensagem original- > De: users-boun...@open-mpi.org em nome

Re: [OMPI users] RES: RES: Error with ARM target

2011-04-25 Thread Ralph Castain
re it's still there, but it is hard to find. Try searching the OMPI web site for info. On Apr 25, 2011, at 5:09 AM, Fernando Dutra Fagundes Macedo wrote: > I'm trying to cross-compile. > > -Mensagem original- > De: users-boun...@open-mpi.org [mailto:users-boun.

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-27 Thread Ralph Castain
Perhaps a firewall? All it is telling you is that mpirun couldn't establish TCP communications with the daemon on ln10. On Apr 27, 2011, at 11:58 AM, Sindhi, Waris PW wrote: > Hi, > I am getting a "oob-tcp: Communication retries exceeded" error > message when I run a 238 MPI slave code > >

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Ralph Castain
it, and/or have this executable >> compiled as part of the PSM MTL and then installed into $bindir (maybe named >> ompi-psm-keygen)? >> >> Right now, it's only compiled as part of "make check" and not installed, >> right? >> >> On Dec

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Ralph Castain
On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote: > On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote: >> >> On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote: >> >>> Was this ever committed to the OMPI src as something not having to be >>

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Ralph Castain
On Apr 27, 2011, at 1:27 PM, Jeff Squyres wrote: > On Apr 27, 2011, at 2:46 PM, Ralph Castain wrote: > >> Actually, I understood you correctly. I'm just saying that I find no >> evidence in the code that we try three times before giving up. What I see is >> a si

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-27 Thread Ralph Castain
, TechApps > Pratt & Whitney, UTC > (860)-565-8486 > > -Original Message- > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On > Behalf Of Ralph Castain > Sent: Wednesday, April 27, 2011 2:18 PM > To: Open MPI Users > Subject: Re: [OMPI users] OpenMPI

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Ralph Castain
On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote: > On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain wrote: >> >> On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote: >> >>> On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain wrote: >>>> >

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-28 Thread Ralph Castain
What led you to conclude 1.2.8? > Is there any way you can upgrade to a (much) later version, such as 1.4.3? > That might improve your TCP connectivity -- we made improvements in those > portions of the code over the years. > > On Apr 27, 2011, at 8:09 PM, Ralph Castain wrot

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-28 Thread Ralph Castain
On Apr 28, 2011, at 6:49 AM, Jeff Squyres wrote: > On Apr 28, 2011, at 8:45 AM, Ralph Castain wrote: > >> What led you to conclude 1.2.8? >> >>>>>> /opt/openmpi/i386/bin/mpirun -mca btl_openib_verbose 1 --mca btl ^tcp >>>>>> --mca pls

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-28 Thread Ralph Castain
's trunk, but not yet in a release. > > Sincerely, > > Waris Sindhi > High Performance Computing, TechApps > Pratt & Whitney, UTC > (860)-565-8486 > > -Original Message- > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On > Be

Re: [OMPI users] srun and openmpi

2011-04-28 Thread Ralph Castain
On Apr 28, 2011, at 6:49 AM, Michael Di Domenico wrote: > On Wed, Apr 27, 2011 at 11:47 PM, Ralph Castain wrote: >> >> On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote: >> >>> On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain wrote: >>>> >

Re: [OMPI users] problems with the -xterm option

2011-04-28 Thread Ralph Castain
> Surprisingly, they are trying 'localhost:11.0' whereas when i use 'ssh -Y' >> the >> DISPLAY variable is set to 'localhost:10.0' >> >> So in what way would OMPI have to be adapted, so -xterm would work? >> >> Thank You >>

Re: [OMPI users] srun and openmpi

2011-04-28 Thread Ralph Castain
Castain wrote: > > On Apr 28, 2011, at 6:49 AM, Michael Di Domenico wrote: > >> On Wed, Apr 27, 2011 at 11:47 PM, Ralph Castain wrote: >>> >>> On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote: >>> >>>> On Wed, Apr 27, 2011 at 2:46

Re: [OMPI users] problems with the -xterm option

2011-04-28 Thread Ralph Castain
>> /usr/bin/xterm Xt error: Can't open display: localhost:11.0 >>>> /usr/bin/xterm Xt error: Can't open display: localhost:11.0 >>>> OMPI_COMM_WORLD_RANK=0 >>>> [aim-squid_0:09856] [[54132,0],1]->[[54132,0],0] >>>> mca_oob_tcp_msg_se

Re: [OMPI users] OpenMPI out of band TCP retry exceeded

2011-04-28 Thread Ralph Castain
n: r22285 > Open MPI release date: Dec 08, 2009 > Open RTE: 1.4 > > > Sincerely, > > Waris Sindhi > High Performance Computing, TechApps > Pratt & Whitney, UTC > (860)-565-8486 > > -Original Message- > From: users-boun...@open-m

Re: [OMPI users] problems with the -xterm option

2011-04-28 Thread Ralph Castain
>> Warning: No xauth data; using fake authentication data for X11 forwarding. >>>> /usr/bin/xterm Xt error: Can't open display: localhost:11.0 >>>> /usr/bin/xterm Xt error: Can't open display: localhost:11.0 >>>> OMPI_COMM_WORLD_RANK=0 >>

Re: [OMPI users] srun and openmpi

2011-04-29 Thread Ralph Castain
Hi Michael Please see the attached updated patch to try for 1.5.3. I mistakenly free'd the envar after adding it to the environ :-/ Thanks Ralph slurmd.diff Description: Binary data On Apr 28, 2011, at 2:31 PM, Michael Di Domenico wrote: > On Thu, Apr 28, 2011 at 9:03 AM, Ralph

Re: [OMPI users] srun and openmpi

2011-04-29 Thread Ralph Castain
On Apr 29, 2011, at 8:05 AM, Michael Di Domenico wrote: > On Fri, Apr 29, 2011 at 10:01 AM, Michael Di Domenico > wrote: >> On Fri, Apr 29, 2011 at 4:52 AM, Ralph Castain wrote: >>> Hi Michael >>> >>> Please see the attached updated patch to try for 1.

Re: [OMPI users] srun and openmpi

2011-04-29 Thread Ralph Castain
p. We just need someone to explain the requirements on that precondition value. Thanks Ralph On Apr 29, 2011, at 8:12 AM, Ralph Castain wrote: > > On Apr 29, 2011, at 8:05 AM, Michael Di Domenico wrote: > >> On Fri, Apr 29, 2011 at 10:01 AM, Michael Di Domenico >> wrote:

Re: [OMPI users] problems with the -xterm option

2011-05-02 Thread Ralph Castain
t, no xterm. > >> From these results i would say that there is no basic mishandling of > 'ssh', though i have no idea > what internal differences the use of the '-leave-session-attached' > option or the debug options make. > > I hope these observations ar

Re: [OMPI users] problems with the -xterm option

2011-05-02 Thread Ralph Castain
run -np 4 -host squid_0 -mca >>> plm_rsh_agent "ssh -Y" --leave-session-attached --xterm 0,1,2,3! >>> ./HelloMPI >>> The xterms are also opened if i do not use the '!' hold option. >> Did I miss something? > Thank You > Jody >

Re: [OMPI users] problems with the -xterm option

2011-05-02 Thread Ralph Castain
' option) Ah, well that might explain it. I don't know how xterm would react to just being launched by mpirun onto a remote platform without any command to run. I can't explain what the plm verbosity has to do with anything, though. > Jody > > On Mon, May 2, 2011 at 4:

Re: [OMPI users] Building openmpi with PGI 11.4: won't find torque??

2011-05-02 Thread Ralph Castain
It's probably looking for the torque lib in lib instead of lib64. There should be a configure option to tell it --with-tm-libdir or something like that - check "configure -h" On May 2, 2011, at 6:22 PM, Jim Kusznir wrote: > Hi all: > > I'm trying to build openmpi 1.4.3 against PGI 11.4 on my
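In configure terms, the suggestion above might look like the following; the paths are examples and the exact option names should be verified against "./configure -h":

```shell
# point configure at the Torque install and its 64-bit library directory
./configure --with-tm=/usr --with-tm-libdir=/usr/lib64
```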

Re: [OMPI users] WRF Problem running in Parallel on multiple nodes (cluster)

2011-05-03 Thread Ralph Castain
The error message is telling you the problem. You don't have your remote path set so it can find the OMPI installation on the remote hosts. Look at the OMPI FAQ section for more info if you are unsure how to set paths on remote hosts. On May 3, 2011, at 2:04 AM, Ahsan Ali wrote: > Hello, > >

Re: [OMPI users] [Wrf-users] WRF Problem running in Parallel on multiple nodes(cluster)

2011-05-04 Thread Ralph Castain
You still have to set the PATH and LD_LIBRARY_PATH on your remote nodes to include where you installed OMPI. Alternatively, use the absolute path name to mpirun in your cmd - we'll pick up the path and propagate it. On May 3, 2011, at 9:14 PM, Ahsan Ali wrote: > Dear Bart, > > I think OpenMP
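A sketch of both fixes described above, assuming a hypothetical /opt/openmpi install prefix and application name:

```shell
# option 1: make the remote (non-interactive) shell pick up the install,
# e.g. in ~/.bashrc on every node
export PATH=/opt/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH

# option 2: invoke mpirun by absolute path; the prefix is then
# picked up and propagated to the remote daemons
/opt/openmpi/bin/mpirun -np 4 ./my_app
```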

Re: [OMPI users] cputype (7) does not match previous archive members cputype

2011-05-04 Thread Ralph Castain
Did you make clean first? configure won't clean out the prior installation, so you may be picking up stale libs. On May 4, 2011, at 11:27 AM, Cizmas, Paul wrote: > I added LDFLAGS=-m64, such that the command is now > > ./configure --prefix=/opt/openmpi1.4.3GFm64 CC=/sw/bin/gcc-fsf-4.5 > CFLAG

Re: [OMPI users] All processes have id 0 of 1

2011-05-05 Thread Ralph Castain
Usually that means you have a mismatch in your OMPI versions - you may have built the app with one version and are running it against another, for example, or perhaps compiled them against MPICH and run them using OMPI's mpirun/mpiexec. On Thu, May 5, 2011 at 1:23 PM, Bartłomiej W wrote: > Hello

Re: [OMPI users] Error when trying to kill a spawned process

2011-05-06 Thread Ralph Castain
Why are you using ompi-clean for this purpose instead of a simple ctrl-c? It wasn't intended for killing jobs, but only for attempting cleanup of lost processes in extremity (i.e., when everything else short of rebooting the node fails). So it isn't robust by any means. On May 6, 2011, at 11:5

Re: [OMPI users] Sorry! You were supposed to get help about: But couldn't open help-orterun.txt

2011-05-11 Thread Ralph Castain
I don't know a lot about the Windows port, but that error means that mpirun got an error when trying to discover the allocated nodes. On May 11, 2011, at 6:10 AM, hi wrote: > After setting OPAL_PKGDATADIR, "mpirun" gives proper help message. > > But when executing simple test program which cal

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-11 Thread Ralph Castain
Sent from my iPad On May 11, 2011, at 2:05 PM, Brock Palen wrote: > On May 9, 2011, at 9:31 AM, Jeff Squyres wrote: > >> On May 3, 2011, at 6:42 AM, Dave Love wrote: >> We managed to have another user hit the bug that causes collectives (this time MPI_Bcast() ) to hang on IB that

Re: [OMPI users] TotalView Memory debugging and OpenMPI

2011-05-11 Thread Ralph Castain
That would be a problem, I fear. We need to push those envars into the environment. Is there some particular problem causing what you see? We have no other reports of this issue, and orterun has had that code forever. Sent from my iPad On May 11, 2011, at 2:05 PM, Peter Thompson wrote: >

Re: [OMPI users] btl_openib_cpc_include rdmacm questions

2011-05-12 Thread Ralph Castain
On May 11, 2011, at 4:27 PM, Dave Love wrote: > Ralph Castain writes: > >> I'll go back to my earlier comments. Users always claim that their >> code doesn't have the sync issue, but it has proved to help more often >> than not, and costs nothing to try, >

Re: [OMPI users] Scheduling dynamically spawned processes

2011-05-13 Thread Ralph Castain
On May 12, 2011, at 9:53 PM, Rodrigo Silva Oliveira wrote: > Hi there. > > I'm developing a distributed system with a communication layer based on Open > MPI. As part of my project, I have to create a process scheduler. So I > decided to use the MPI_Spawn function to dynamically create (it is

Re: [OMPI users] Scheduling dynamically spawned processes

2011-05-13 Thread Ralph Castain
I believe I answered that question. You can use the hostfile info key, or you can use the host info key - either one will do what you require. On May 13, 2011, at 4:11 PM, Rodrigo Silva Oliveira wrote: > Hi, > > I think I was not specific enough. I need to spawn the copies of a process in > a

Re: [OMPI users] Scheduling dynamically spawned processes

2011-05-16 Thread Ralph Castain
gnores the repetition of hosts. > Using Rodrigo's example I did: > > host info key = "m1,m2,m2,m2,m3" and number of processes = 5 and the result > was > > m1 -> 2 > m2 -> 2 > m3 -> 1 > > and not > > m1 -> 1 > m2 -> 3 > m3

Re: [OMPI users] TotalView Memory debugging and OpenMPI

2011-05-16 Thread Ralph Castain
've passed your comment back to the engineer, with a suspicion about the > concerns about the abort, but if you have other objections, let me know. > > Cheers, > PeterT > > > Ralph Castain wrote: >> That would be a problem, I fear. We need to push those envars into t

Re: [OMPI users] TotalView Memory debugging and OpenMPI

2011-05-16 Thread Ralph Castain
used by > putenv(), and I do know that while that used to be just flagged as an event > before, now we seem to be unable to continue past it. Not sure if that is > our change or a library/system change. > PeterT > > > Ralph Castain wrote: >> On May 16, 2011, at 12:

Re: [OMPI users] Scheduling dynamically spawned processes

2011-05-17 Thread Ralph Castain
array are > ignored because an info argument applies to the entire job that is spawned, > and cannot be different for each executable in the job. See the INFO > ARGUMENTS section for more information." > > Anyway, I'm glad it works! > > Thank you very much! >

Re: [OMPI users] Sorry! You were supposed to get help about: But couldn't open help-orterun.txt

2011-05-18 Thread Ralph Castain
le >>> paths, and it's better to use UNC path. >>> >>> To clarify the path issue, if you just copy the OMPI dir to another >>> computer, there might also be another problem that OMPI couldn't load the >>> registry entries, as the registry entries w

Re: [OMPI users] TotalView Memory debugging and OpenMPI

2011-05-18 Thread Ralph Castain
before, now we seem to be unable to continue past it. Not sure if that is > our change or a library/system change. > PeterT > > > Ralph Castain wrote: >> On May 16, 2011, at 12:45 PM, Peter Thompson wrote: >> >> >>> Hi Ralph, >>> >

Re: [OMPI users] Deadlock with barrier und RMA

2011-06-11 Thread Ralph Castain
Oh my - that is such an old version! Any reason for using it instead of something more recent? On Jun 11, 2011, at 8:43 AM, Ole Kliemann wrote: > Hi everyone! > > I'm trying to use MPI on a cluster running OpenMPI 1.2.4 and starting > processes through PBSPro_11.0.2.110766. I've been running i

Re: [OMPI users] Error when trying to kill a spawned process

2011-06-13 Thread Ralph Castain
On Jun 13, 2011, at 1:32 PM, Rodrigo Oliveira wrote: > The point is: I have a system composed by a set of mpi processes. These > processes run as daemons in each cluster machine. I need a way to kill those > ones when I decide to shutdown the system. Do you mean that your MPI processes actuall

Re: [OMPI users] [ompi-1.4.2] Infiniband issue on smoky @ ornl

2011-06-23 Thread Ralph Castain
One possibility: if you increase the number of processes in the job, and they all interconnect, then the IB interface can (I believe) run out of memory at some point. IIRC, the answer was to reduce the size of the QPs so that you could support a larger number of them. You should find info about

Re: [OMPI users] mpirun does not propagate environment from master node to slave nodes

2011-06-28 Thread Ralph Castain
On Jun 28, 2011, at 9:05 AM, ya...@adina.com wrote: > Hello All, > > I installed Open MPI 1.4.3 on our new HPC blades, with Infiniband > interconnection. > > My system environments are as: > > 1)uname -a output: > Linux gulftown 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT > 2010 x86_64 x

Re: [OMPI users] Problems with Mpi Accept - ORTE_ERROR_LOG

2011-06-28 Thread Ralph Castain
How are you passing the port info between the server and client? You're hitting a race condition between the two sides. On Jun 27, 2011, at 9:29 AM, Rodrigo Oliveira wrote: > Hi there. > I am developing a server/client application using Open MPI 1.5.3. In a point > of the server code I open a p

Re: [OMPI users] mpirun does not propagate environment from master node to slave nodes

2011-06-28 Thread Ralph Castain
On Jun 28, 2011, at 3:52 PM, ya...@adina.com wrote: > Thanks, Ralph! > > a) Yes, I know I could use only IB by "--mca btl openib", but just > want to make sure I am using IB interfaces. I am seeking an option > to mpirun to print out the actual interconnect protocol, like --prot to > mpirun i
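The restriction being discussed is usually written like this in the 1.4 series; sm and self are added so on-node and loopback traffic still work (process count and app name are examples):

```shell
# use only InfiniBand between nodes, shared memory + self on-node
mpirun --mca btl openib,sm,self -np 16 ./my_app
```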

Re: [OMPI users] Problems with Mpi Accept - ORTE_ERROR_LOG

2011-06-28 Thread Ralph Castain
Looking deeper, I believe we may have a race condition in the code. Sadly, that error message is actually irrelevant, but causes the code to abort. It can be triggered by race conditions in the app as well, but ultimately is something we need to clean up. On Jun 27, 2011, at 9:29 AM, Rodrigo O

Re: [OMPI users] The hostfile could not be found

2011-06-30 Thread Ralph Castain
That didn't come from OMPI - that error message is from LAM-MPI, which no longer is supported. I suggest you check the default path being set by Torque - looks like it is picking up an old LAM install. On Jun 30, 2011, at 8:24 PM, zhuangchao wrote: > hello all , > > I submited the f

Re: [OMPI users] Problems with Mpi Accept - ORTE_ERROR_LOG

2011-07-04 Thread Ralph Castain
is started and it stores the port name in a file. When a > client is started, it gets this port name and tries to connect. In my tests > the error happens about 1 time in 10 executions. > > It still working without confidence. > > On Tue, Jun 28, 2011 at 11:10 PM, Ralph Ca

Re: [OMPI users] OpenMPI on Tile architectures (no atomic primitives)

2011-07-05 Thread Ralph Castain
I very much doubt we have Tile support as it hasn't come up before. If you look in opal/asm/base, you'll see a MIPS.asm that contains the MIPS code - perhaps you could use that as a starting point? I didn't write any of that code, but I think if you poke around that directory looking for "MIPS"

Re: [OMPI users] MPI_Reduce error over Infiniband or TCP

2011-07-05 Thread Ralph Castain
Looks like your code is passing an invalid argument to MPI_Reduce... On Jul 5, 2011, at 9:20 AM, ya...@adina.com wrote: > Dear all, > > We are testing Open MPI over Infiniband, and got a MPI_Reduce > error message when we run our codes either over TCP or > Infiniband interface, as follows, >

Re: [OMPI users] mpirun does not propagate environment from master node to slave nodes

2011-07-05 Thread Ralph Castain
Let me get this straight. You are executing mpirun from inside a c-shell script, launching onto nodes where you will by default be running bash. The param I gave you should support that mode - it basically tells OMPI to probe the remote node to discover what shell it will run under there, and th

Re: [OMPI users] Running MPI jobs from external Hard Disk

2011-07-05 Thread Ralph Castain
I don't see Open MPI in your list of modules - looks to me like you are using MPICH? If so, you should send this to their mailing list. On Jul 5, 2011, at 1:44 PM, Chaudhari, Mangesh I wrote: > hi all, > > I m trying to run a job from external hard disk and its giving me errors my > output l

Re: [OMPI users] openmpi-1.5.2 installation problem

2011-07-06 Thread Ralph Castain
We don't directly link to that library, so it must be getting pulled in by some other lib. Have you tried adding /usr/heimdal/lib to your LD_LIBRARY_PATH before building? On Jul 6, 2011, at 3:27 AM, Sushil Mishra wrote: > Hi all: > I am trying to install openmpi-1.5.2 in Debian 4.3.2-1.1. I am

Re: [OMPI users] Error using hostfile

2011-07-06 Thread Ralph Castain
Please see http://www.open-mpi.org/faq/?category=rsh#ssh-keys On Jul 6, 2011, at 5:09 PM, Mohan, Ashwin wrote: > Hi, > > I use the following command (mpirun --prefix /usr/local/openmpi1.4.3 -np 4 > hello) to successfully execute a simple hello world command on a single node. > Each node has

Re: [OMPI users] Pinning of openmpi to certain defined cores possible

2011-07-08 Thread Ralph Castain
Look at "mpirun -h" or "man mpirun" - you'll see options for binding processes to cores etc. On Jul 8, 2011, at 10:13 AM, Vlad Popa wrote: > Hello! > > We habe a shared memory system based on 4CPUs of 12-core Opteron with a > total of 256Gb RAM . > > Are there any switches, which we could
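For example, with the binding flags found in the 1.4/1.5-era mpirun (verify against your local "mpirun -h", since these options changed across releases):

```shell
# bind each rank to a single core and print the resulting bindings
mpirun -np 8 --bind-to-core --report-bindings ./my_app
# or bind to whole sockets, distributing ranks across sockets
mpirun -np 8 --bysocket --bind-to-socket ./my_app
```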

Re: [OMPI users] Error using hostfile

2011-07-08 Thread Ralph Castain
eport back when launched > myocyte47 - daemon did not report back when launched > myocyte49 - daemon did not report back when launched > > Thanks, > Ashwin. > > > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On > Behalf Of R

Re: [OMPI users] tcp communication problems with 1.4.3 and 1.4.4 rc2 on FreeBSD

2011-07-08 Thread Ralph Castain
We've been moving to provide support for including values as CIDR notation instead of names - e.g., 192.168.0/16 instead of bge0 or bge1 - but I don't think that has been put into the 1.4 release series. If you need it now, you might try using the developer's trunk - I know it works there. On
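Side by side, the two notations mentioned above (addresses and interface names are examples):

```shell
# CIDR form (developer trunk at the time of this message)
mpirun --mca btl_tcp_if_include 192.168.0/16 -np 4 ./my_app
# interface-name form (works in the 1.4 series)
mpirun --mca btl_tcp_if_include bge0,bge1 -np 4 ./my_app
```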

Re: [OMPI users] max entries in procgroup file for OpenMPI 1.5?

2011-07-10 Thread Ralph Castain
On Jul 10, 2011, at 6:57 PM, BRADLEY, PETER C PW wrote: > I know 1.4.x has a limit of 128 entries for procgroup files. To avoid some > ugly surgery on a legacy application, we’d really like to have the ability to > put up to 1024 lines in a procgroup file? Has the limit been raised at all >

Re: [OMPI users] a question about network connection of open-mpi

2011-07-10 Thread Ralph Castain
Have you gone to those nodes and checked their IP addresses of -all- interfaces? OMPI must be picking up those addresses from somewhere - best guess is that those nodes have multiple interfaces on them, some of which are configured to those addresses. Remember: we don't look at the /etc/hosts f

Re: [OMPI users] a question about network connection of open-mpi

2011-07-12 Thread Ralph Castain
I believe we responded to this before...you might check your spam or inbox. On Jul 12, 2011, at 7:39 PM, zhuangchao wrote: > hello all : > > >I run the following command : > > /data1/cluster/openmpi/bin/mpirun -d -machinefile /tmp/nodes.10515.txt > -np 3 /data1/cluster

Re: [OMPI users] How to use a wrapper for ssh?

2011-07-12 Thread Ralph Castain
On Jul 12, 2011, at 2:34 PM, Paul Kapinos wrote: > Hi OpenMPI folks, > > Using the version 1.4.3 of OpenMPI, I wanna to wrap the 'ssh' calls, which > are made from the OpenMPIs 'mpiexec'. For this purpose, at least two ways > seem to be possible for me: > > 1. let the wrapper have the name 's
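The wrapper approach being discussed can be sketched as follows; the log path is an arbitrary choice and the script is hypothetical:

```shell
#!/bin/sh
# hypothetical ssh wrapper: record the call, then hand off to the real ssh
echo "ssh $*" >> "$HOME/ssh-wrapper.log"
exec /usr/bin/ssh "$@"
```

Passed explicitly, it would be used as `mpiexec -mca plm_rsh_agent /path/to/ssh-wrapper ...`, the same MCA parameter shown elsewhere in this archive for substituting "ssh -Y".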

Re: [OMPI users] Open MPI & Grid Engine/Grid Scheduler thread binding (was: New loadcheck)

2011-07-15 Thread Ralph Castain
On Jul 14, 2011, at 5:46 PM, Jeff Squyres wrote: > Looping in the users mailing list so that Ralph and Oracle can comment... Not entirely sure what I can contribute here, but I'll try - see below for some clarifications. I think the discussion here is based on some misunderstanding of how OMPI

Re: [OMPI users] Cofigure(?) problem building /1.5.3 on ScientificLinux6.0

2011-07-22 Thread Ralph Castain
Higher rev levels of the autotools are required for the 1.5 series - are you at the right ones? See http://www.open-mpi.org/svn/building.php On Jul 22, 2011, at 9:12 AM, Paul Kapinos wrote: > Dear OpenMPI volks, > currently I have a problem by building the version 1.5.3 of OpenMPI on > Scienti

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Ralph Castain
A few thoughts: * including both btl_tcp_if_include and btl_tcp_if_exclude is problematic as they are mutually exclusive options. I'm not sure which one will take precedence. I would suggest only using one of them. * the default mapping algorithm is byslot - i.e., OMPI will place procs on each
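That is, pick one of the two parameters, not both (interface names here are examples):

```shell
# EITHER name the interfaces to use...
mpirun --mca btl_tcp_if_include eth0 -np 16 ./my_app
# ...OR name the ones to skip -- but not both at once
mpirun --mca btl_tcp_if_exclude lo,eth1 -np 16 ./my_app
```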

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Ralph Castain
r cmd line to see where mpirun actually placed your processes, just to be sure they aren't overloading a node. > > -Bill > > From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of > Ralph Castain [r...@open-mpi.org] > Sent: Tuesday,

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Ralph Castain
> Do I get it right: inside the granted slots by SGE you want the allocation > inside Open MPI to follow a specific pattern, i.e.: which rank is where? > > -- Reuti > > >> >> Thanks for your help Ralph. At least I have some ideas on where to look now. >> >> -Bill >> _

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Ralph Castain
On Jul 26, 2011, at 1:58 PM, Reuti wrote: allocation_rule $fill_up >>> >>> Here you specify to fill one machine after the other completely before >>> gathering slots from the next machine. You can change this to $round_robin >>> to get one slot from each node before taking a second from

Re: [OMPI users] Rankfile problem with Open MPI 1.4.3

2011-07-26 Thread Ralph Castain
I normally hide my eyes when rankfiles appear, but since you provide so much help on this list yourself... :-) I believe the problem is that you have the keyword "slots" wrong - it is supposed to be "slot": rank 1=host1 slot=1:0,1 rank 0=host2 slot=0:* rank 2=host4 slot=1-2 rank
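Putting the corrected keyword into a file makes the fix easy to verify: every line uses the singular `slot`, never `slots`. The host names are the ones quoted in the message; the `mpirun` line is shown as a comment since it needs a real cluster.

```shell
# Corrected rankfile from the thread -- the keyword is "slot", singular.
cat > my_rankfile <<'EOF'
rank 0=host2 slot=0:*
rank 1=host1 slot=1:0,1
rank 2=host4 slot=1-2
EOF
# Launch would then be:
#   mpirun -np 3 --rankfile my_rankfile ./my_app
grep -c 'slot=' my_rankfile
```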

Re: [OMPI users] Rankfile problem with Open MPI 1.4.3

2011-07-26 Thread Ralph Castain
ite 100 times in the blackboard: > "Slots in the hostfile, slot in the rankfile, > slot is singular, to err is plural." LOL > ... at least until Ralph's new plural-forgiving parsing rule > makes it to the code. Committed to the trunk, in the queue for both 1.4.4 and 1.5.4

Re: [OMPI users] Seg fault with PBS Pro 10.4

2011-07-26 Thread Ralph Castain
I don't believe we ever got anywhere with this due to lack of response. If you get some info on what happened to tm_init, please pass it along. Best guess: something changed in a recent PBS Pro release. Since none of us have access to it, we don't know what's going on. :-( On Jul 26, 2011, at

Re: [OMPI users] Seg fault with PBS Pro 10.4

2011-07-27 Thread Ralph Castain
v11.x. > > I built OpenMPI 1.5.3 this morning with PBSPro v11.0, and it works fine. I > don't get any segfaults. > > -Justin. > > On 07/26/2011 05:49 PM, Ralph Castain wrote: >> I don't believe we ever got anywhere with this due to lack of response. If >&

Re: [OMPI users] OpenMPI causing WRF to crash

2011-08-06 Thread Ralph Castain
Do you have something like valgrind on your machine? If so, then why not launch your apps under valgrind - eg., "mpirun valgrind my_app"? If your app is segfaulting, there isn't much OMPI can do to tell you why. All we can do is tell you that your app was hit with a SIGTERM. Did you talk t
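The "mpirun valgrind my_app" suggestion is sketched below as a small job script; `my_app` is a placeholder for the real binary, and the script is only written out here, not executed, since it needs a working `mpirun`. Valgrind's `--log-file` option expands `%p` to the process ID, so each rank gets its own log.

```shell
# Write a minimal wrapper script that runs every MPI rank under valgrind,
# giving each rank a separate log file named by its PID.
cat > run_under_valgrind.sh <<'EOF'
#!/bin/sh
mpirun -np 4 valgrind --log-file=vg.out.%p ./my_app
EOF
chmod +x run_under_valgrind.sh
grep 'valgrind' run_under_valgrind.sh
```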

Re: [OMPI users] scaling issue beyond 1024 processes

2011-08-09 Thread Ralph Castain
That error makes no sense - line 335 is just a variable declaration. Sure you are not picking up a different version on that node? On Aug 9, 2011, at 11:37 AM, CB wrote: > Hi, > > Currently I'm having trouble to scale an MPI job beyond a certain limit. > So I'm running an MPI hello example to

Re: [OMPI users] CMAQ crashes with OpenMPI

2011-08-09 Thread Ralph Castain
Also, please be aware that we haven't done any testing of OMPI on Lion, so this is truly new ground. On Aug 9, 2011, at 3:00 PM, Doug Reeder wrote: > Matt, > > Are you sure you are building against your macports version of openmpi and > not the one that ships w/ lion. In the trace back are it

Re: [OMPI users] scaling issue beyond 1024 processes

2011-08-10 Thread Ralph Castain
tions for troubleshooting? > > Thanks, > - Chansup > > > On Tue, Aug 9, 2011 at 2:02 PM, CB wrote: > Hi Ralph, > > Yes, you are right. Those nodes were still pointing to an old version. > I'll check the installation on all nodes and try to run it again. > >

Re: [OMPI users] MPI_Spawn and process allocation policy

2011-08-16 Thread Ralph Castain
What version are you using? On Aug 16, 2011, at 3:19 AM, Simone Pellegrini wrote: > Dear all, > I am developing a system to manage MPI tasks on top of MPI. The architecture > is rather simple, I have a set of scheduler processes which takes care to > manage the resources of a node. The idea is

Re: [OMPI users] MPI_Spawn and process allocation policy

2011-08-16 Thread Ralph Castain
tell us what is happening. On Aug 16, 2011, at 5:09 AM, Simone Pellegrini wrote: > On 08/16/2011 12:30 PM, Ralph Castain wrote: >> What version are you using? > > OpenMPI 1.4.3 > >> >> >> On Aug 16, 2011, at 3:19 AM, Simone Pellegrini wrote: >> >>

Re: [OMPI users] MPI_Spawn and process allocation policy

2011-08-16 Thread Ralph Castain
Smells like a bug - I'll take a look. On Aug 16, 2011, at 9:10 AM, Simone Pellegrini wrote: > On 08/16/2011 02:11 PM, Ralph Castain wrote: >> That should work, then. When you set the "host" property, did you give the >> same name as was in your machine file? &

Re: [OMPI users] MPI_Spawn and process allocation policy

2011-08-16 Thread Ralph Castain
I'm not finding a bug - the code looks clean. If I send you a patch, could you apply it, rebuild, and send me the resulting debug output? On Aug 16, 2011, at 10:18 AM, Ralph Castain wrote: > Smells like a bug - I'll take a look. > > > On Aug 16, 2011, at 9:10 AM, S

Re: [OMPI users] Bindings not detected with slurm (srun)

2011-08-18 Thread Ralph Castain
Afraid I am confused. I assume this refers to the trunk, yes? I also assume you are talking about launching an application directly from srun as opposed to using mpirun - yes? In that case, I fail to understand what difference it makes regarding this proposed change. The application process is

Re: [OMPI users] Bindings not detected with slurm (srun)

2011-08-22 Thread Ralph Castain
Okay - thx! I'll install in trunk and schedule for 1.5 On Aug 22, 2011, at 7:20 AM, pascal.dev...@bull.net wrote: > > users-boun...@open-mpi.org a écrit sur 18/08/2011 14:41:25 : > >> De : Ralph Castain >> A : Open MPI Users >> Date : 18/08/2011 14:45 >
