Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-08-24 Thread Ralph Castain
: > 10.4.72.110:1 > [c0301b10e1:22830] [[0,],1]-[[0,0],0] > mca_oob_tcp_peer_complete_connect: connection failed: Connection > refused (111) - retrying > > > > > Thanks! > p. > > > On Mon, Aug 23, 2010 at 3:24 PM, Ralph Castain wrote: >> Can you send m

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-08-24 Thread Ralph Castain
Yes, that's fine. Thx! On Aug 24, 2010, at 9:02 AM, Philippe wrote: > awesome, I'll give it a spin! with the parameters as below? > > p. > > On Tue, Aug 24, 2010 at 10:47 AM, Ralph Castain wrote: >> I think I have this working now - try anything on or after r2

Re: [OMPI users] Open-MPI 1.4.2 : mpirun core-dumps when "-npernode N" is used at command line

2010-08-24 Thread Ralph Castain
-gcj-1.4.2.0/jre --with-cpu=generic > --host=x86_64-redhat-linux > Thread model: posix > gcc version 4.1.2 20080704 (Red Hat 4.1.2-46) > > > but it failed. I am attaching the configure and make logs. > > regards > > Michael > > > On 08/23/10 20:53, Ralph Cas

Re: [OMPI users] MPI_Comm_accept and MPI_Comm_connect both use 100% one cpu core. Is it a bug?

2010-09-01 Thread Ralph Castain
It's not a bug - that is normal behavior. The processes are polling hard to establish the connections as quickly as possible. On Sep 1, 2010, at 7:24 PM, lyb wrote: > Hi, All, > > I tested two sample applications on Windows 2003 Server, one use > MPI_Comm_accept and other use MPI_Comm_connect

Re: [OMPI users] spin-wait backoff

2010-09-03 Thread Ralph Castain
In the upcoming 1.5 series, we will introduce a new "sensor" framework to help resolve such issues. Among other things, it will automatically track (if requested) the size of a sentinel file, cpu usage, and memory footprint and will terminate the job if any exceed user-specified limits (e.g., fi

Re: [OMPI users] spin-wait backoff

2010-09-04 Thread Ralph Castain
On Sep 3, 2010, at 5:10 PM, David Singleton wrote: > On 09/03/2010 10:05 PM, Jeff Squyres wrote: >> On Sep 3, 2010, at 12:16 AM, Ralph Castain wrote: >> >>> Backing off the polling rate requires more application-specific logic like >>> that offered below, so i

Re: [OMPI users] MPI_Reduce performance

2010-09-09 Thread Ralph Castain
As people have said, these time values are to be expected. All they reflect is the time difference spent in reduce waiting for the slowest process to catch up to everyone else. The barrier removes that factor by forcing all processes to start from the same place. No mystery here - just a reflec
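
A toy model may make the point concrete. The sketch below is not Open MPI code; it assumes, as a simplification, that every rank leaves the reduce at the same moment (once the slowest rank has arrived and the reduction itself has been paid for). The function names and the arrival times are hypothetical.

```python
# Toy model (not Open MPI code): each rank enters the reduce at a different
# time, and the operation cannot finish until the slowest rank has arrived.

def time_in_reduce(arrival_times, reduce_cost=0.0):
    """Return the time each rank spends inside the reduce call.

    arrival_times: seconds at which each rank calls reduce (made-up data).
    reduce_cost:   cost of the reduction itself once everyone has arrived.
    """
    finish = max(arrival_times) + reduce_cost
    return [finish - t for t in arrival_times]


def time_in_reduce_after_barrier(arrival_times, reduce_cost=0.0):
    """Same ranks, but a barrier first holds everyone until the last arrival."""
    start = max(arrival_times)  # all ranks leave the barrier together
    return time_in_reduce([start] * len(arrival_times), reduce_cost)


if __name__ == "__main__":
    arrivals = [0.0, 0.5, 2.0]  # rank 2 is the straggler
    print(time_in_reduce(arrivals, reduce_cost=0.25))
    print(time_in_reduce_after_barrier(arrivals, reduce_cost=0.25))
```

Without the barrier, the early ranks appear to spend a long time in the reduce, but that time is almost entirely waiting. With the barrier, the wait is charged to the barrier instead and every rank reports only the cost of the reduction.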

Re: [OMPI users] MPI_Reduce performance

2010-09-09 Thread Ralph Castain
On Sep 9, 2010, at 1:46 AM, Ashley Pittman wrote: > > On 9 Sep 2010, at 08:31, Terry Frankcombe wrote: > >> On Thu, 2010-09-09 at 01:24 -0600, Ralph Castain wrote: >>> As people have said, these time values are to be expected. All they >>> reflect is the time d

Re: [OMPI users] Strange Segmentation Fault inside MPI_Init

2010-09-11 Thread Ralph Castain
How did you configure OMPI? On Sep 11, 2010, at 1:35 AM, Srikanth Raju wrote: > Hello OMPI Users, > I'm using OpenMPI 1.4.1 with gcc 4.4.3 on my x86_64 linux system running the > latest Ubuntu 10.04 distro. I don't seem to be able to run any OpenMPI > application. I try running the simplest app

Re: [OMPI users] function fgets hangs a mpi program when it is used ompi-ps command

2010-09-22 Thread Ralph Castain
Printouts of less than 100 bytes would be unusual...but possible On Wed, Sep 22, 2010 at 8:15 AM, Jeff Squyres wrote: > Are you running on machines with OpenFabrics devices (that Open MPI is > using)? > > Is ompi-ps printing 100 bytes or more? > > What does ps show when your program is hung? > >

Re: [OMPI users] function fgets hangs a mpi program when it is used ompi-ps command

2010-09-23 Thread Ralph Castain
18053 poll_s pts/0 00:00:00 test.run > 0 S 1000 1844 1840 0 80 0 - 18053 poll_s pts/0 00:00:00 test.run > > pipe_s = wait state on read/write against a pipe. > > So, with that command I concluded that one mpi process is waiting for the > read of a pipe.

Re: [OMPI users] Running on crashing nodes

2010-09-23 Thread Ralph Castain
In a word, no. If a node crashes, OMPI will abort the currently-running job if it had processes on that node. There is no current ability to "ride-thru" such an event. That said, there is work being done to support "ride-thru". Most of that is in the current developer's code trunk, and more is com

Re: [OMPI users] How to know which process is running on which core?

2010-09-23 Thread Ralph Castain
You mean via an API of some kind? Not through an MPI call, but you can do it (though your code will wind up OMPI-specific). Look at the OMPI source code in opal/mca/paffinity/paffinity.h and you'll see the necessary calls as well as some macros to help parse the results. Depending upon what versio

Re: [OMPI users] Potential developer to reinstate Xgrid support

2010-09-30 Thread Ralph Castain
Hi Daniel I had actually volunteered to do this once Apple provided me with the required Mac OSX Server license, but I honestly haven't had time to do so. We would welcome any patches you can provide! The relevant code is located in orte/mca/plm/xgrid/src. I believe it currently compiles, but

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-10-04 Thread Ralph Castain
It looks to me like your remote nodes aren't finding the orted executable. I suspect the problem is that you need to forward PATH and LD_LIBRARY_PATH to the remote nodes. Use the mpirun -x option to do so. On Oct 4, 2010, at 5:08 AM, Chris Jewell wrote: > Hi all, > > Firstly, hello to the
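
As a rough illustration of the -x suggestion, the sketch below assembles an mpirun command line that forwards both variables. The helper name, the application path, and the process count are hypothetical; only `mpirun`, `-np`, and `-x` come from Open MPI itself.

```python
import shlex


def mpirun_with_forwarded_env(app, nprocs, env_names=("PATH", "LD_LIBRARY_PATH")):
    """Build an mpirun argument list that forwards environment variables.

    Each name is passed as `-x NAME`, which asks mpirun to pick up the
    current value of NAME and export it to the remote processes.
    """
    cmd = ["mpirun", "-np", str(nprocs)]
    for name in env_names:
        cmd += ["-x", name]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without MPI.
    print(shlex.join(mpirun_with_forwarded_env("./my_app", 4)))
```

The list form can be handed to `subprocess.run` unchanged on a machine where mpirun is installed.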

Re: [OMPI users] mpi_comm_spawn have problems with group communicators

2010-10-04 Thread Ralph Castain
I'm not sure why the group communicator would make a difference - the code area in question knows nothing about the mpi aspects of the job. It looks like you are hitting a race condition that causes a particular internal recv to not exist when we subsequently try to cancel it, which generates th

Re: [OMPI users] mpi_comm_spawn have problems with group communicators

2010-10-04 Thread Ralph Castain
On Oct 4, 2010, at 10:36 AM, Milan Hodoscek wrote: >>>>>> "Ralph" == Ralph Castain writes: > >Ralph> I'm not sure why the group communicator would make a >Ralph> difference - the code area in question knows nothing about >Ralph

Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-04 Thread Ralph Castain
Some of what you are seeing is the natural result of context switching. Some thoughts regarding the results: 1. You didn't bind your procs to cores when running with #procs < #cores, so your performance in those scenarios will also be less than max. 2. Once the number of procs exceeds the

Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-04 Thread Ralph Castain
yperthreads at this time. > > BTW, how to bind the proc to the core? I tried --bind-to-core or > -bind-to-core but neither works. Is it for OpenMP, not for OpenMPI? Those should work. You might try --report-bindings to see what OMPI thought it did. > > Linbao > > >

Re: [OMPI users] memory limits on remote nodes

2010-10-06 Thread Ralph Castain
There is an MCA param that tells the orted to set its usage limits to the hard limit: MCA opal: parameter "opal_set_max_sys_limits" (current value: <0>, data source: default value) Set to non-zero to automatically set any system-imposed limits to the ma

Re: [OMPI users] Pros and cons of --enable-heterogeneous

2010-10-07 Thread Ralph Castain
Hetero operations tend to lose a little performance due to the need to convert data, but otherwise there is no real negative. We don't do it by default solely because the majority of installations don't need to, and there is no reason to lose even a little performance if it isn't necessary. If you

Re: [OMPI users] Pros and cons of --enable-heterogeneous

2010-10-07 Thread Ralph Castain
The short answer is "yes". It should work. On Thu, Oct 7, 2010 at 1:53 PM, Durga Choudhury wrote: > I'd like to add to this question the following: > > If I compile with --enable-heterogeneous flag for different > *architectures* (I have a mix of old 32 bit x86, newer x86_64 and some > Cell BE b

Re: [OMPI users] memory limits on remote nodes

2010-10-07 Thread Ralph Castain
On Oct 7, 2010, at 2:55 AM, Reuti wrote: > On 07.10.2010 at 01:55, David Turner wrote: > >> Hi, >> >> We would like to set process memory limits (vmemoryuse, in csh >> terms) on remote processes. Our batch system is torque/moab. > > Isn't it possible to set this up in torque/moab directly? I

Re: [OMPI users] memory limits on remote nodes

2010-10-07 Thread Ralph Castain
On Oct 6, 2010, at 11:25 PM, David Turner wrote: > Hi Ralph, > >> There is an MCA param that tells the orted to set its usage limits to the >> hard limit: >> >> MCA opal: parameter "opal_set_max_sys_limits" (current >> value:<0>, data source: default value) >>

Re: [OMPI users] memory limits on remote nodes

2010-10-08 Thread Ralph Castain
Ah - I was unfamiliar with that option. Thanks! David: does that meet the need? On Oct 8, 2010, at 2:45 AM, Reuti wrote: > On 08.10.2010 at 00:40, Ralph Castain wrote: > >> >> On Oct 7, 2010, at 2:55 AM, Reuti wrote: >> >>> On 07.10.2010 at 01:55

Re: [OMPI users] memory limits on remote nodes

2010-10-12 Thread Ralph Castain
No problem - I got to learn something too! On Oct 11, 2010, at 11:19 PM, David Turner wrote: > Hi, > > Various people contributed: > > Isn't it possible to set this up in torque/moab directly? In SGE I would > simply define h_vmem and it's per slot then; and with a tight integration

Re: [OMPI users] my leak or OpenMPI's leak?

2010-10-17 Thread Ralph Castain
There is no OMPI 2.5 - do you mean 1.5? On Oct 17, 2010, at 4:11 PM, Brian Budge wrote: > Hi Jody - > > I noticed this exact same thing the other day when I used OpenMPI v > 2.5 built with valgrind support. I actually ran out of memory due to > this. When I went back to v 2.43, my program work

Re: [OMPI users] my leak or OpenMPI's leak?

2010-10-18 Thread Ralph Castain
in case you call again. This helps performance by reducing the number of malloc's in your application. > > Jody > > On Mon, Oct 18, 2010 at 1:57 AM, Ralph Castain wrote: >> There is no OMPI 2.5 - do you mean 1.5? >> >> On Oct 17, 2010, at 4:11 PM, Brian Bu

Re: [OMPI users] my leak or OpenMPI's leak?

2010-10-18 Thread Ralph Castain
; send data to each other and to the master. > > > On Mon, Oct 18, 2010 at 2:48 PM, Ralph Castain wrote: >> >> On Oct 18, 2010, at 1:41 AM, jody wrote: >> >>> I had this leak with OpenMPI 1.4.2 >>> >>> But in my case, there is no accumulati

Re: [OMPI users] Number of processes and spawn

2010-10-19 Thread Ralph Castain
w that the new release 1.5 is out. > I didn't find this fix in the "list of changes"; is it present but not > mentioned since it is a minor fix? > > Thank you, > Federico > > > > 2010/4/1 Ralph Castain > Hi there! > > It will be in the 1.5.0 r

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-20 Thread Ralph Castain
Just to be clear: it isn't mpiexec that is failing. It is your MPI application processes that are failing. On Oct 20, 2010, at 7:38 AM, Siegmar Gross wrote: > Hi, > > I have built Open MPI 1.5 on Linux x86_64 with the Oracle/Sun Studio C > compiler. Unfortunately "mpiexec" breaks when I run a

Re: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults

2010-10-20 Thread Ralph Castain
The error message seems to imply that mpirun itself didn't segfault, but that something else did. Is that segfault pid from mpirun? This kind of problem usually is caused by mismatched builds - i.e., you compile against your new build, but you pick up the Myrinet build when you try to run becau

Re: [OMPI users] dinamic spawn process on remote node

2010-10-22 Thread Ralph Castain
MPI won't do this - if a node dies, the entire MPI job is terminated. Take a look at OpenRCM, a subproject of Open MPI: http://www.open-mpi.org/projects/orcm/ This is designed to do what you describe as we have a similar (open source) project underway at Cisco. If I were writing your system, I

Re: [OMPI users] Fix the use of hostfiles when a username is supplied in v1.5 ?

2010-10-22 Thread Ralph Castain
RHE x86_64 > machinefile example : > or985966@is209898 slots=1 > realtime@is206022 slots=8 > realtime@is206025 slots=8 > > Best regards, > > Olivier > > -- Forwarded message -- > From: Ralph Castain > Date: 2010/3/11 > Subject: Re: [OMPI us

Re: [OMPI users] Fwd: Cross compiling for 32 bit from a 64 bit machine

2010-10-25 Thread Ralph Castain
Do ./configure --help and you'll see options for specifying the host and build target. You need to do that when cross-compiling. On Oct 25, 2010, at 12:01 PM, saahil...@gmail.com wrote: > -- Forwarded message -- > From: saahil...@gmail.com > Date: Oct 25, 2010 11:26pm > Subject:

Re: [OMPI users] Fwd: Cross compiling for 32 bit from a 64 bit machine

2010-10-25 Thread Ralph Castain
> Saahil > > On Oct 25, 2010 11:35pm, Ralph Castain wrote: > > Do ./configure --help and you'll see options for specifying the host and > > build target. You need to do that when cross-compiling. > > > > > > > > > > > > On Oct

Re: [OMPI users] Fwd: Cross compiling for 32 bit from a 64 bit machine

2010-10-25 Thread Ralph Castain
e. Did I fail to understand what you said? Am I doing something > > wrong here? > > > > Regards, > > Saahil > > > > On Oct 25, 2010 11:35pm, Ralph Castain r...@open-mpi.org> wrote: > > > Do ./configure --help and you'll see options for specifying the host a

Re: [OMPI users] Using hostfile with default hostfile

2010-10-27 Thread Ralph Castain
Specify your hostfile as the default one: mpirun --default-hostfile ./Cluster.hosts Otherwise, we take the default hostfile and then apply the hostfile as a filter to select hosts from within it. Sounds strange, I suppose, but the idea is that the default hostfile can contain configuration info
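
The filtering behavior described here can be modeled in a few lines. This is only a sketch of the semantics as stated in the message, not Open MPI's implementation: the parser handles just `hostname slots=N` lines, the host names are made up, and keeping the default hostfile's slot counts is an assumption the source does not confirm.

```python
def parse_hostfile(text):
    """Parse 'hostname [slots=N]' lines into {hostname: slots}.

    Blank lines and '#' comments are skipped; slots defaults to 1.
    """
    hosts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        slots = 1
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field[len("slots="):])
        hosts[fields[0]] = slots
    return hosts


def select_hosts(default_hostfile, hostfile):
    """Keep only the default-hostfile entries that the hostfile also names.

    Assumption: slot counts are taken from the default hostfile.
    """
    wanted = parse_hostfile(hostfile)
    available = parse_hostfile(default_hostfile)
    return {host: slots for host, slots in available.items() if host in wanted}


if __name__ == "__main__":
    default = "node1 slots=4\nnode2 slots=4\n"
    mine = "node2\nnode3\n"  # node3 is absent from the default file
    print(select_hosts(default, mine))
```

In this model a host that appears only in the user's hostfile (node3 above) is silently dropped, which is why passing the file as the default hostfile avoids the surprise.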

Re: [OMPI users] failed to install openmpi trunk

2010-10-29 Thread Ralph Castain
Couple of things stand out: 1. you definitely don't want to use a copy of the trunk beyond r23924. The developer's trunk is undergoing some major change and orcm no longer is in-sync with it. I probably won't update orcm to match until later this year (will freeze integration at r23924). 2. th

Re: [OMPI users] Problem with sending messages from one of the machines

2010-11-11 Thread Ralph Castain
There are two connections to be specified: -mca oob_tcp_if_include xxx -mca btl_tcp_if_include xxx On Nov 11, 2010, at 12:04 PM, Krzysztof Zarzycki wrote: > Hi, > I'm working with Grzegorz on the mentioned problem. > If I'm correct on checking the firewall settings, "iptables --list" shows an

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-13 Thread Ralph Castain
I am not a grid engine expert by any means, but I do know a bit about OMPI's internals for binding processes. Here is what we do: 1. mpirun gets its list of hosts from the environment, or from your machine file. It then maps the processes across the machines. 2. mpirun launches a daemon on eac

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Ralph Castain
I confess I am now confused. What version of OMPI are you using? FWIW: OMPI was updated at some point to detect the actual cores of an external binding, and abide by them. If we aren't doing that, then we have a bug that needs to be resolved. Or it could be you are using a version that predates th

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Ralph Castain
The external binding code should be in that version. If you add --report-bindings --leave-session-attached to the mpirun command line, you should see output from each daemon telling you what external binding it detected, and how it is binding each app it launches. Thanks! On Mon, Nov 15, 2010 a

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Ralph Castain
Perhaps I'm missing it, but it seems to me that the real problem lies in the interaction between SGE and OMPI during OMPI's two-phase launch. The verbose output shows that SGE dutifully allocated the requested number of cores on each node. However, OMPI launches only one process on each node (the O

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Ralph Castain
Hi Reuti > > 2. have SGE bind procs it launches to -all- of those cores. I believe SGE > does this automatically to constrain the procs to running on only those > cores. > > This is another "bug/feature" in SGE: it's a matter of discussion, whether > the shepherd should get exactly one core (in c

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Ralph Castain
On Tue, Nov 16, 2010 at 12:23 PM, Terry Dontje wrote: > On 11/16/2010 01:31 PM, Reuti wrote: > > Hi Ralph, > > On 16.11.2010 at 15:40, Ralph Castain wrote: > > > 2. have SGE bind procs it launches to -all- of those cores. I believe SGE > does this automaticall

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Ralph Castain
Chris' output is coming solely from the HNP, which is correct given the way things were executed. My comment was from another email where he did what I asked, which was to include the flags: --report-bindings --leave-session-attached so we could see the output from each orted. In that email, it wa

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Ralph Castain
wrote: > On 11/17/2010 09:32 AM, Ralph Castain wrote: > > Chris' output is coming solely from the HNP, which is correct given the way > things were executed. My comment was from another email where he did what I > asked, which was to include the flags: > > --repor

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Ralph Castain
attached and provide -all- output we could verify that analysis and clear up the confusion? On Wed, Nov 17, 2010 at 8:13 AM, Terry Dontje wrote: > On 11/17/2010 10:00 AM, Ralph Castain wrote: > > --leave-session-attached is always required if you want to see output from > the daemons. O

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Ralph Castain
On 11/17/2010 10:48 AM, Ralph Castain wrote: > > No problem at all. I confess that I am lost in all the sometimes disjointed > emails in this thread. Frankly, now that I search, I can't find it either! > :-( > > I see one email that clearly shows the external binding report f

Re: [OMPI users] Unable to find the following executable

2010-11-17 Thread Ralph Castain
Which executable is it not finding? mpirun? Your application? On Wed, Nov 17, 2010 at 7:49 PM, Tushar Andriyas wrote: > Hi there, > > I am new to using mpi commands and was stuck in problem with running a > code. When I submit my job through a batch file, the job exits with the > message that th

Re: [OMPI users] Unable to find the following executable

2010-11-18 Thread Ralph Castain
Is your "hello world" test program in the same directory as SWMF? Is it possible that the path you are specifying is not available on all of the remote machines? That's the most common problem we see. On Thu, Nov 18, 2010 at 7:59 AM, Tushar Andriyas wrote: > Hi there, > > Thanks for the expedite

Re: [OMPI users] Unable to find the following executable

2010-11-18 Thread Ralph Castain
rote: > no, it's not in the same directory as SWMF. I guess the path is the same > since all the machines in a cluster are configured the same way. How do I know > if this is not the case? > > > On Thu, Nov 18, 2010 at 8:25 AM, Ralph Castain wrote: > >> Is your "hello wo

Re: [OMPI users] SIGPIPE handling?

2010-11-24 Thread Ralph Castain
What OMPI version are you talking about? We already trap SIGPIPE, but ignore it at the request of others (not sure what version that was started). I believe a flag may exist to alter that behavior - could easily be added if not. On Nov 24, 2010, at 5:08 PM, Jesse Ziser wrote: > Hello, > > I'

Re: [OMPI users] SIGPIPE handling?

2010-11-25 Thread Ralph Castain
After digging around a little, I found that you must be using the OMPI devel trunk as no release version contains this code. I also looked to see why it was done, and found that the concern was with an inadvertent sigpipe that can occur internal to OMPI due to a race condition. So I modified th

Re: [OMPI users] Help!!!!!!!!!!!!Openmpi instal for ubuntu 64 bits

2010-11-29 Thread Ralph Castain
Are you using the Intel compiler? The build system is looking for an "icc" command and not finding it. Perhaps something in your environment is defining CC to be "icc"? On Nov 29, 2010, at 10:43 AM, Maurício Rodrigues wrote: > HI, I need to install opnmpi 1.4.2 in Ubuntu 4.10 64bit, and giving

Re: [OMPI users] Help!!!!!!!!!!!!Openmpi instal for ubuntu 64 bits

2010-11-29 Thread Ralph Castain
lema. > I would appreciate help if possible. > Thank you in advance > > 2010/11/29 Ralph Castain > > Are you using the Intel compiler? The build system is looking for an "icc" > command and not finding it. Perhaps something in your environment is defining > CC to b

Re: [OMPI users] failure to launch MPMD program on win32 w 1.4.3

2010-11-30 Thread Ralph Castain
It truly does help to know what version of OMPI you are using - otherwise, there is little we can do to help. On Nov 30, 2010, at 4:05 AM, Hicham Mouline wrote: > Hello, > > I have successfully run > > mpirun -np 3 .\test.exe > > when I try MPMD > > mpirun -np 3 .\test.exe : -np 3 .\test

Re: [OMPI users] SIGPIPE handling?

2010-12-01 Thread Ralph Castain
> had nothing to do with OpenMPI. So thanks for the help; all is well. (And >> sorry for the belated reply.) >> Ralph Castain wrote: >>> After digging around a little, I found that you must be using the OMPI >>> devel trunk as no release version contains this code.

Re: [OMPI users] glut display 'occasionally' opens

2010-12-06 Thread Ralph Castain
Guess I'm not entirely sure I understand how this is supposed to work. All the -x does is tell us to pickup an envar of the given name and forward its value to the remote apps. You can't set the envar's value on the cmd line. So you told mpirun to pickup the value of an envar called "DISPLAY=:0.

Re: [OMPI users] glut display 'occasionally' opens

2010-12-06 Thread Ralph Castain
ocesses. So I believe it to be necessary. > > But I'm thinking I may have to configure some kind of X11 forwarding. I'm > not sure... > > Thanks for your reply! Any more ideas? > Brad > > > On Mon, Dec 6, 2010 at 6:31 PM, Ralph Castain wrote: > Guess

Re: [OMPI users] glut display 'occasionally' opens

2010-12-07 Thread Ralph Castain
n your app, just do a getenv and print the display envar Would help tell us if there is an OMPI problem, or just a problem in how you setup X11 On Dec 6, 2010, at 9:18 PM, Ralph Castain wrote: > Hmmm...yes, the code does seem to handle that '=' being in there. Forgot it > was th
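
The getenv check suggested here might look like the following in a Python application; the same idea applies to `getenv("DISPLAY")` in C. The function name is hypothetical, and the host name is included only so that output from different nodes can be told apart.

```python
import os
import socket


def describe_display(environ=None):
    """Report the DISPLAY value this process actually sees.

    environ defaults to the real process environment; a dict can be passed
    in for testing.
    """
    environ = os.environ if environ is None else environ
    host = socket.gethostname()
    value = environ.get("DISPLAY")
    if value is None:
        return f"{host}: DISPLAY is not set"
    return f"{host}: DISPLAY={value}"


if __name__ == "__main__":
    # Call this early in the application, once per process.
    print(describe_display())
```

If every process prints the expected value, the variable is being forwarded correctly and the problem is more likely in the X11 setup than in OMPI.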

Re: [OMPI users] Interaction with boost::asio

2010-12-07 Thread Ralph Castain
You might want to ask the boost people - we wouldn't have any idea what asio is or does. On Dec 7, 2010, at 6:06 AM, Hannes Brandstätter-Müller wrote: > Hello! > > I am using OpenMPI in combination with the boost libraries (especially > boost::asio) and came across a weird interaction. When I

Re: [OMPI users] glut display 'occasionally' opens

2010-12-07 Thread Ralph Castain
d using X for > most things many years ago, so my xhost/xauth information is probably a > little dated. Google around for the most recent / best ways to do this stuff. > > > On Dec 6, 2010, at 10:11 PM, Ralph Castain wrote: > > > BTW: you might check to see if the DISPLAY e

Re: [OMPI users] mpirun error in OpenMPI 1.5

2010-12-08 Thread Ralph Castain
That could mean you didn't recompile the code using the new version of OMPI. The 1.4 and 1.5 series are not binary compatible - you have to recompile your code. If you did recompile, you may be getting version confusion on the backend nodes - you should check your ld_library_path and ensure it

Re: [OMPI users] curious behavior during wait for broadcast: 100% cpu

2010-12-08 Thread Ralph Castain
I know we have said this many times - OMPI made a design decision to poll hard while waiting for messages to arrive to minimize latency. If you want to decrease cpu usage, you can use the yield_when_idle option (it will cost you some latency, though) - see ompi_info --param ompi all Or don't se

Re: [OMPI users] glut display 'occasionally' opens

2010-12-08 Thread Ralph Castain
>> :display and it'll just go in an unencrypted fashion. This is >> normal X forwarding stuff -- you can probably google around for more info on >> this. >> >> NOTE: IIRC, xauth is better than xhost these days. I stopped using X for >> most things m

Re: [OMPI users] curious behavior during wait for broadcast: 100% cpu

2010-12-09 Thread Ralph Castain
Sorry for delay - am occupied with my day job. Yes, that is correct to an extent. When you yield the processor, all that happens is that you surrender the rest of your scheduled time slice back to the OS. The OS then cycles thru its scheduler and sequentially assigns the processor to the line o

Re: [OMPI users] Method for worker to determine its "rank" on a single machine?

2010-12-09 Thread Ralph Castain
The answer is yes - sort of... In OpenMPI, every process has information about not only its own local rank, but the local rank of all its peers regardless of what node they are on. We use that info internally for a variety of things. Now the "sort of". That info isn't exposed via an MPI API at

Re: [OMPI users] Method for worker to determine its "rank" on a single machine?

2010-12-10 Thread Ralph Castain
rocesses and have them use that as the color to a MPI_Comm_split call. > Once you've done that you can do a MPI_Comm_size to find how many are on the > node and be able to send to all the other processes on that node using the > new communicator. > > Good luck, > > -
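
To show what the split produces, here is a pure-Python model of it. It does not call MPI; it assumes the color is derived from the host name and that the world rank is used as the key, so ranks inside each new communicator keep their world-rank order. The function name and host names are hypothetical.

```python
from collections import defaultdict


def local_ranks(hostnames):
    """Model a communicator split by host.

    hostnames[i] is the host on which world rank i runs. Returns a list in
    which entry i is (local_rank, local_size) for world rank i, i.e. what
    MPI_Comm_rank and MPI_Comm_size would report on the per-node communicator.
    """
    groups = defaultdict(list)
    for world_rank, host in enumerate(hostnames):
        groups[host].append(world_rank)  # appended in world-rank order

    result = [None] * len(hostnames)
    for members in groups.values():
        for local_rank, world_rank in enumerate(members):
            result[world_rank] = (local_rank, len(members))
    return result


if __name__ == "__main__":
    hosts = ["nodeA", "nodeB", "nodeA", "nodeA", "nodeB"]
    for world_rank, (lrank, lsize) in enumerate(local_ranks(hosts)):
        print(f"world rank {world_rank}: local rank {lrank} of {lsize}")
```

In a real program each process would compute its own color, call MPI_Comm_split once, and then query the rank and size of the resulting communicator.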

Re: [OMPI users] Method for worker to determine its "rank" on a single machine?

2010-12-10 Thread Ralph Castain
There are no race conditions in this data. It is determined by mpirun prior to launch, so all procs receive the data during MPI_Init and it remains static throughout the life of the job. It isn't dynamically updated at this time (will change in later versions), so it won't tell you if a process

Re: [OMPI users] Method for worker to determine its "rank" on a single machine?

2010-12-10 Thread Ralph Castain
Sorry - guess I had misunderstood. Yes, if all you want is the local rank of your own process, then this will work. My suggestion was if you wanted the list of local procs, or to know the local rank of your peers. On Dec 10, 2010, at 1:24 PM, David Mathog wrote: > Ashley Pittman wrote: > >>

Re: [OMPI users] Guaranteed run rank 0 on a given machine?

2010-12-10 Thread Ralph Castain
Terry is correct - not guaranteed, but that is the typical behavior. However, you -can- guarantee that rank=0 will be on a particular host. Just run your job: mpirun -n 1 -host <host> my_app : -n (N-1) my_app This guarantees that rank=0 is on host <host>. All other ranks will be distributed according to t
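
For a script that launches jobs, the command line above can be assembled as in this sketch. The helper name and the example host are hypothetical; the colon-separated form is the one shown in the message, with the first app context pinned to the chosen host and the second carrying the remaining N-1 processes.

```python
def pin_rank0_command(host, app, total_procs):
    """Build an mpirun argument list that places rank 0 on `host`.

    The first app context starts one process on `host`; the second starts
    the remaining total_procs - 1 processes wherever mpirun maps them.
    """
    if total_procs < 1:
        raise ValueError("total_procs must be at least 1")
    cmd = ["mpirun", "-n", "1", "-host", host, app]
    if total_procs > 1:
        cmd += [":", "-n", str(total_procs - 1), app]
    return cmd


if __name__ == "__main__":
    # Print rather than execute, so the sketch runs without MPI installed.
    print(" ".join(pin_rank0_command("node01", "my_app", 4)))
```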

Re: [OMPI users] cannot restrict port numbers using btl_tcp_port_min_v4 and btl_tcp_port_range_v4

2010-12-10 Thread Ralph Castain
mpirun is not an MPI process, and so it doesn't obey the btl port params. To control mpirun's ports (and those used by the ORTE daemons), use the oob_tcp_port... params On Dec 10, 2010, at 3:29 PM, Tang, Hsiu-Khuern wrote: > > Hi, > > I am trying to understand how to control the range of por

Re: [OMPI users] cannot restrict port numbers using btl_tcp_port_min_v4 and btl_tcp_port_range_v4

2010-12-11 Thread Ralph Castain
> LISTEN 9714/mpirun > tcp0 0 :::58600:::* > LISTEN 9714/mpirun > ... > > -- > Best, > Hsiu-Khuern. > > > * On Fri 03:49PM -0700, 10 Dec 2010, Ralph Castain (r...@open-mpi.org) wrote:

Re: [OMPI users] how to set the connecttimeout para?

2010-12-13 Thread Ralph Castain
What version of OMPI are you using? That error message looks like something from an ancient version - might be worth updating. On Dec 13, 2010, at 4:04 AM, peifan wrote: > i have 3 nodes, one is master node and another is computing nodes,these nodes > deployed in the internet (not in cluster) >

Re: [OMPI users] curious behavior during wait for broadcast: 100% cpu

2010-12-13 Thread Ralph Castain
around to find those discussions. > > > On Dec 9, 2010, at 4:07 PM, Ralph Castain wrote: > >> Sorry for delay - am occupied with my day job. >> >> Yes, that is correct to an extent. When you yield the processor, all that >> happens is that you surrender the rest

Re: [OMPI users] curious behavior during wait for broadcast: 100% cpu

2010-12-13 Thread Ralph Castain
l for some time before deciding to yield. On Dec 13, 2010, at 7:52 AM, Jeff Squyres wrote: > See the discussion on kerneltrap: > >http://kerneltrap.org/Linux/CFS_and_sched_yield > > Looks like the change came in somewhere around 2.6.23 or so...? > > > > On De

Re: [OMPI users] curious behavior during wait for broadcast: 100% cpu

2010-12-13 Thread Ralph Castain
OMPI does use those methods, but they can't be used for something like shared memory. So if you want the performance benefit of shared memory, then we have to poll. On Dec 13, 2010, at 9:00 AM, Hicham Mouline wrote: > I don't understand 1 thing though and would appreciate your comments. > >

Re: [OMPI users] Spawning with the ompi-server option

2010-12-14 Thread Ralph Castain
Not sure I fully understand the question. If you provide the --ompi-server option to mpirun, that info will be passed along to all processes, including those launched via comm_spawn, so they can subsequently connect to the server. On Dec 14, 2010, at 6:50 AM, Suraj Prabhakaran wrote: > Hello

Re: [OMPI users] jobs with more that 2, 500 processes will not even start

2010-12-14 Thread Ralph Castain
That's a big cluster to be starting with rsh! :-) When you say it won't start, do you mean that it hangs? Or does it fail with some error message? How many nodes are involved (this is the important number, not the number of cores)? Also, what version are you using? On Dec 14, 2010, at 9:10 AM

Re: [OMPI users] jobs with more that 2, 500 processes will not even start

2010-12-14 Thread Ralph Castain
> > I wonder : is this plm_rsh_num_concurrent parameter standing ONLY for rsh use, > or for ssh OR rsh, depending on plm_rsh_agent, please ? > > Thanks, Best, G. > > > Le 14/12/2010 18:30, Ralph Castain a écrit : >> That's a big cluster to be starting wit

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Ralph Castain
It would appear that there is something trying to talk to a socket opened by one of your daemons. At a guess, I would bet the problem is that a prior job left a daemon alive that is talking on the same socket. Are you by chance using static ports for the job? Did you run another job just before

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Ralph Castain
On Dec 15, 2010, at 10:14 AM, Gilbert Grosdidier wrote: > Hello Ralph, > > Thanks for taking time to help me. > > On 15 Dec 10, at 16:27, Ralph Castain wrote: > >> It would appear that there is something trying to talk to a socket opened by >> one of you

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Ralph Castain
On Dec 15, 2010, at 12:30 PM, Gilbert Grosdidier wrote: > Good evening Ralph, > > On 15/12/2010 18:45, Ralph Castain wrote: >> It looks like all the messages are flowing within a single job (all three >> processes mentioned in the error have the same identifier). Only possib

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Ralph Castain
t zombies in their wake. Can you clean those up? > > Thanks, Best, G. > > > > On 15/12/2010 21:03, Ralph Castain wrote: >> On Dec 15, 2010, at 12:30 PM, Gilbert Grosdidier wrote: >> >>> Good evening Ralph, >>> >>> On 15/12/2010 18:

Re: [OMPI users] segmentation fault

2010-12-15 Thread Ralph Castain
I have no idea what you mean by "cell sizes per core". Certainly not any terminology within OMPI... On Dec 15, 2010, at 3:47 PM, Vaz, Guilherme wrote: > > Dear all, > > I have a problem with openmpi1.3, ifort+mkl v11.1 in Ubuntu10.04 systems (32 > or 64bit). My code worked in Ubuntu8.04 and

Re: [OMPI users] srun and openmpi

2010-12-23 Thread Ralph Castain
I'm not sure there is any documentation yet - not much clamor for it. :-/ It would really help if you included the error message. Otherwise, all I can do is guess, which wastes both of our time :-( My best guess is that the port reservation didn't get passed down to the MPI procs properly - but

Re: [OMPI users] openmpi hangs when running on more than one node (unless i use --debug-daemons )

2010-12-28 Thread Ralph Castain
All --debug-daemons really does is keep the ssh session open after launching the remote daemon and turn on some output. Otherwise, we close that session as most systems only allow a limited number of concurrent ssh sessions to be open. I suspect you have a system setting that kills any running j

Re: [OMPI users] openmpi hangs when running on more than one node (unless i use --debug-daemons )

2010-12-29 Thread Ralph Castain
e that orted daemon on the second node is called in a different > way. > Moreover, when I launch without --debug-daemons, a process called orted.. > remains active on the second node after I kill (ctrl+c) the command on the > first node. > > Can you continue to help me? >

Re: [OMPI users] srun and openmpi

2010-12-30 Thread Ralph Castain
> have to Ctrl-C and terminate >> >> I have mpiports defined in my slurm config and running srun with >> --resv-ports does show the SLURM_RESV_PORTS environment variable >> getting passed to the shell >> >> >> On Thu, Dec 23, 2010 at 8:09 PM, Ralph Castain wr

Re: [OMPI users] srun and openmpi

2010-12-30 Thread Ralph Castain
transport key from ORTE > (orte_precondition_transports not present in the environment) > PML add procs failed > --> Returned "Error" (-1) instead of "Success" (0) > > Turn off PSM and srun works fine > > > On Thu, Dec 30, 2010 at 5:13 PM, Ralph Cas

Re: [OMPI users] srun and openmpi

2010-12-30 Thread Ralph Castain
inary data On Dec 30, 2010, at 11:18 AM, Michael Di Domenico wrote: > Sure, i'll give it a go > > On Thu, Dec 30, 2010 at 5:53 PM, Ralph Castain wrote: >> Ah, yes - that is going to be a problem. The PSM key gets generated by >> mpirun as it is shared info - i.e., every

Re: [OMPI users] srun and openmpi

2010-12-30 Thread Ralph Castain
Should have also warned you: you'll need to configure OMPI --with-devel-headers to get this program to build/run. On Dec 30, 2010, at 1:54 PM, Ralph Castain wrote: > Well, I couldn't do it as a patch - proved too complicated as the psm system > looks for the value early in th

Re: [OMPI users] srun and openmpi

2010-12-30 Thread Ralph Castain
at 2:11 PM, Michael Di Domenico wrote: > How early does this need to run? Can I run it as part of a task > prolog, or does it need to be the shell env for each rank? And does > it need to run on one node or all the nodes in the job? > > On Thu, Dec 30, 2010 at 8:54 PM, Ralph Cas

Re: [OMPI users] Can openmpi run on two different operating system?

2011-01-04 Thread Ralph Castain
Correct On Jan 4, 2011, at 3:33 AM, Hicham Mouline wrote: > From what I understand, unix variants can talk to each other (linux to macosx > sunos ...) but windows cannot talk to non windows (not yet? :-) > > regards, > > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On

Re: [OMPI users] Duplicate independent processes

2011-01-05 Thread Ralph Castain
I'm afraid I don't understand your example - are you saying you provide "-np 1" and get two processes instead of 1? If so, would you please provide info on the type of system where this happens? I've never seen it with mpich or ompi On Jan 5, 2011, at 4:57 PM, Kristian Medri wrote: > Any hint

Re: [OMPI users] mpirun --nice 10 prog ??

2011-01-06 Thread Ralph Castain
Afraid not - though you could alias your program name to be "nice --10 prog" On Jan 6, 2011, at 3:39 PM, David Mathog wrote: > Is it possible using mpirun to specify the nice value for each program > run on the worker nodes? It looks like some MPI implementations allow > this, but "mpirun --hel
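The wrapper idea in the reply above can be sketched as a small shell helper. This is only an illustration, not something Open MPI provides: `run_nice` is a hypothetical name, and the `sh -c` command stands in for the real program. It uses the `nice -n 10` spelling, which adds 10 to the niceness; as far as I understand the historical `nice -N` syntax, `--10` would instead request an adjustment of -10, which normally needs root.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): run a command at lower priority.
# `nice -n 10` adds 10 to the niceness of the command it launches;
# the exit status of that command is passed back to the caller.
run_nice() {
    nice -n 10 "$@"
}

# Placeholder command standing in for the real program.
run_nice sh -c 'echo "running at reduced priority"'
```

Saved as an executable script whose last line is `exec nice -n 10 "$@"`, the same idea could presumably be launched as `mpirun -np 4 ./nice_wrapper.sh ./prog`, where both file names are placeholders.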

Re: [OMPI users] Duplicate independent processes

2011-01-06 Thread Ralph Castain
w had Open MPI on it. > > -Original Message- > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On > Behalf Of Ralph Castain > Sent: January 5, 2011 8:09 PM > To: Open MPI Users > Subject: Re: [OMPI users] Duplicate independent processes > > I'm

Re: [OMPI users] Newbie question

2011-01-12 Thread Ralph Castain
On Jan 12, 2011, at 12:54 PM, Tena Sakai wrote: > Hi Siegmar, > > Many thanks for your reply. > > I have tried man pages you mention, but one hurdle I am running into > is orte_hosts page. I don't find the specification of fields for > the file. I see an example: > > dummy1 slots=4 > dum
