Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Terry Dontje
A clarification on your previous email: you had your code working with OMPI 1.4.1 but an older version of OFED? Then you upgraded to OFED 1.4 and things stopped working? Sounds like your current system is set up with OMPI 1.4.2 and OFED 1.5. Anyways, I am a little confused as to when thing

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Terry Dontje
4 0x0042f15c in main () On Tue, 2010-07-27 at 06:14 -0400, Terry Dontje wrote: A clarification on your previous email: you had your code working with OMPI 1.4.1 but an older version of OFED? Then you upgraded to OFED 1.4 and things stopped working? Sounds like your current system is set

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Terry Dontje
with appears to work fine with MVAPICH2 1.4.1, if that is any help. -Brian On Tue, 2010-07-27 at 10:48 -0400, Terry Dontje wrote: Can you try a simple point-to-point program? --td Brian Smith wrote: After running on two processors across two nodes, the problem occurs much earlier d

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-28 Thread Terry Dontje
ithm work instead of using the tuned one, which I believe is the default, by setting "-mca coll_basic_priority 100". The idea here is to determine if the tuned collective itself is tickling the issue. --td Terry Dontje wrote: With this earlier failure do you know how many messages may ha

Re: [OMPI users] Hybrid OpenMPI / OpenMP run pins OpenMP threads to a single core

2010-07-29 Thread Terry Dontje
Ralph Castain wrote: How are you running it when the threads are all on one core? If you are specifying --bind-to-core, then of course all the threads will be on one core since we bind the process (not the thread). If you are specifying -mca mpi_paffinity_alone 1, then the same behavior result

Re: [OMPI users] Hybrid OpenMPI / OpenMP run pins OpenMP threads to a single core

2010-07-29 Thread Terry Dontje
Ralph Castain wrote: On Jul 29, 2010, at 5:09 AM, Terry Dontje wrote: Ralph Castain wrote: How are you running it when the threads are all on one core? If you are specifying --bind-to-core, then of course all the threads will be on one core since we bind the process (not the thread). If

Re: [OMPI users] Hybrid OpenMPI / OpenMP run pins OpenMP threads to a single core

2010-07-29 Thread Terry Dontje
because I can manipulate (somewhat) the cores the threads are assigned by adding -bysocket -bind-to-socket to mpirun. On Thu, Jul 29, 2010 at 10:08 AM, Terry Dontje <terry.don...@oracle.com> wrote: Ralph Castain wrote: On Jul 29, 2010, at 5:09 AM, Terry Dontje

Re: [OMPI users] OpenIB Error in ibv_create_srq

2010-08-02 Thread Terry Dontje
My guess is from the message below saying "(openib) BTL failed to initialize" that the code is probably running over tcp. To absolutely prove this you can specify to only use the openib, sm and self btls to eliminate the tcp btl. To do that you add the following to the mpirun line "-mca btl

Re: [OMPI users] Accessing to the send buffer

2010-08-02 Thread Terry Dontje
I believe it is definitely a no-no to STORE (write) into a send buffer while a send is posted. I know there has been debate in the forum about relaxing the rule for LOADS (reads) from a send buffer. I think OMPI can handle the latter case (LOADS). On the posted receive side you open yourself up for some race

Re: [OMPI users] Accessing to the send buffer

2010-08-02 Thread Terry Dontje
In the posted irecv case if you are reading from the posted receive buffer the problem is you may get one of three values: 1. pre irecv value 2. value received from the irecv in progress 3. possibly garbage if you are unlucky enough to access memory that is at the same time being updated.

Re: [OMPI users] Accessing to the send buffer

2010-08-02 Thread Terry Dontje
For OMPI I believe reading the data buffer given to a posted send will not cause any problems. Anyone on the list care to disagree? --td Alberto Canestrelli wrote: Thanks, ok that is not my problem, I never read data from the posted receive before the corresponding WAIT. Now the last questi

Re: [OMPI users] OpenIB Error in ibv_create_srq

2010-08-03 Thread Terry Dontje
use cluster has more recent Mellanox IB hardware and is running this same IB stack and ompi 1.4.2 works OK, so I suspect srq is supported by the OpenFabrics stack. Perhaps.) Thanks, Allen On Mon, 2010-08-02 at 06:47 -0400, Terry Dontje wrote: My guess is from the message below saying "

Re: [OMPI users] OpenIB Error in ibv_create_srq

2010-08-04 Thread Terry Dontje
, Terry Dontje wrote: Sorry, I didn't see your prior question; glad you found the btl_openib_receive_queues parameter. There is no FAQ entry for this, but I found the following in the openib btl help file, which spells out the parameters when using a per-peer receive queue (i.e. receive queue se

Re: [OMPI users] problem with .bashrc stetting of openmpi

2010-08-13 Thread Terry Dontje
sun...@chem.iitb.ac.in wrote: Dear Open-mpi users, I installed openmpi-1.4.1 in my user area and then set the path for openmpi in the .bashrc file as follows. However, I am still getting the following error message whenever I start the parallel molecular dynamics simulation using GROMACS. So every

Re: [OMPI users] is there a way to bring to light _all_ configure options in a ready installation?

2010-08-24 Thread Terry Dontje
Jeff Squyres wrote: You should be able to run "./configure --help" and see a lengthy help message that includes all the command line options to configure. Is that what you're looking for? No, he wants to know what configure options were used with some binaries. --td On Aug 24, 2010, at 7

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-23 Thread Terry Dontje
Eloi, I am curious about your problem. Can you tell me what size of job it is? Does it always fail on the same bcast, or same process? Eloi Gaudry wrote: Hi Nysal, Thanks for your suggestions. I'm now able to get the checksum computed and redirected to stdout, thanks (I forgot the "-mca p

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-24 Thread Terry Dontje
e segfault (occurring as hdr->tag = 0 in btl_openib_component.c:2881) yet. Eloi /home/pp_fr/st03230/EG/Softs/openmpi-custom-1.4.2/bin/ On Thursday 23 September 2010 23:33:48 Terry Dontje wrote: Eloi, I am curious about your problem. Can you tell me what size of job it is? Does it

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-24 Thread Terry Dontje
the send side to figure out what might make it generate a 0 hdr->tag. Or maybe instrument the send side to stop when it is about ready to send a 0 hdr->tag and see if we can see how the code got there. I might have some cycles to look at this Monday. --td Eloi On Friday 24 Sep

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Terry Dontje
reproducer, I've already tried to write something but I haven't succeeded so far at reproducing the hdr->tag=0 issue with it. Eloi On 24/09/2010 18:37, Terry Dontje wrote: Eloi Gaudry wrote: Terry, You were right, the error indeed seems to come from the message coalescing featur

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Terry Dontje
ib_component.c::handle_wc in the SEND/RDMA_WRITE case, but this is all I can think of alone. You'll find a stacktrace (receive side) in this thread (10th or 11th message) but it might be pointless. Regards, Eloi On Monday 27 September 2010 11:43:55 Terry Dontje wrote: So it sounds like coa

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Terry Dontje
in ompi/mca/btl/openib/btl_openib_component.c::handle_wc in the SEND/RDMA_WRITE case, but this is all I can think of alone. You'll find a stacktrace (receive side) in this thread (10th or 11th message) but it might be pointless. Regards, Eloi On Monday 27 September 2010 11:43:55 Terry Dontje wrote:

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Terry Dontje
closed the requested check outputs (using -output-filename stdout.tag.null option). I'm displaying frag->hdr->tag here. Eloi On Monday 27 September 2010 16:29:12 Terry Dontje wrote: Eloi, sorry can you print out frag->hdr->tag? Unfortunately from your last email I think it w

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-29 Thread Terry Dontje
Pasha, do you by any chance know who at Mellanox might be responsible for OMPI working? --td Eloi Gaudry wrote: Hi Nysal, Terry, Thanks for your input on this issue. I'll follow your advice. Do you know any Mellanox developer I may discuss with, preferably someone who has spent some time ins

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-29 Thread Terry Dontje
relaxed ordering memory operations. If I remember correctly it was some IBM platform. Do you know if relaxed memory ordering is enabled on your platform? If it is enabled you have to disable eager rdma. Regards, Pasha On Sep 29, 2010, at 1:04 PM, Terry Dontje wrote: Pasha, do you by any chance

Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-05 Thread Terry Dontje
On 10/05/2010 10:23 AM, Storm Zhang wrote: Sorry, I should say one more thing about the 500 procs test. I tried to run two 500 procs at the same time using SGE and it runs fast and finishes at the same time as the single run. So I think OpenMPI can handle them separately very well. For the b

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-20 Thread Terry Dontje
Can you remove the -with-threads and -enable-mpi-threads options from the configure line and see if that helps your 32 bit problem any? --td On 10/20/2010 09:38 AM, Siegmar Gross wrote: Hi, I have built Open MPI 1.5 on Linux x86_64 with the Oracle/Sun Studio C compiler. Unfortunately "mpiexec

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-21 Thread Terry Dontje
I wonder if the error below could be due to crap being left over in the source tree. Can you do a "make clean"? Note on a new checkout from the v1.5 svn branch I was able to build 64 bit with the following configure line: ../configure FC=f95 F77=f77 CC=cc CXX=CC --without-openib --without-udapl

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-21 Thread Terry Dontje
oh. --td Sent from my PDA. No type good. On Oct 21, 2010, at 6:25 AM, "Terry Dontje" <terry.don...@oracle.com> wrote: I wonder if the error below could be due to crap being left over in the source tree. Can you do a "make clean"? Note on a new checkout from

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-21 Thread Terry Dontje
not building must be left over cruft. Note, my compiler hang disappeared on me. So maybe there was an environmental issue on my side. --td On 10/21/2010 06:47 AM, Terry Dontje wrote: On 10/21/2010 06:43 AM, Jeff Squyres (jsquyres) wrote: Also, i'm not entirely sure what all the comm

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-21 Thread Terry Dontje
On 10/21/2010 10:18 AM, Jeff Squyres wrote: Terry -- Can you file relevant ticket(s) for v1.5 on Trac? Once I have more information and have proven it isn't due to us using old compilers or a compiler error itself. --td On Oct 21, 2010, at 10:10 AM, Terry Dontje wrote: I've

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-21 Thread Terry Dontje
When you do a make, can you add V=1 to have the actual compile lines printed out? That will probably show you the line with -fno-directives-only in it, which is odd because I think that option is a gcc-ism and I don't know why it would show up in a Studio build (note my build doesn't show it

Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Terry Dontje
So what you are saying is *all* the ranks have entered MPI_Finalize and only a subset has exited per placing prints before and after MPI_Finalize. Good. So my guess is that the processes stuck in MPI_Finalize have a prior MPI request outstanding that for whatever reason is unable to complete

Re: [OMPI users] cannot install Open MPI 1.5 on Solaris x86_64 with Oracle/Sun C 5.11

2010-10-29 Thread Terry Dontje
Sorry, but can you give us the config line, the config.log and the full output of make, preferably with make V=1? --td On 10/29/2010 04:30 AM, Siegmar Gross wrote: Hi, I tried to build Open MPI 1.5 on Solaris X86 and x86_64 with Oracle Studio 12.2. I can compile Open MPI with thread support,

Re: [OMPI users] cannot install Open MPI 1.5 on Solaris x86_64 with Oracle/Sun C 5.11

2010-11-01 Thread Terry Dontje
I am able to build on Linux systems with Sun C 5.11 using gcc-4.1.2. Still trying to get a version of gcc 4.3.4 compiled on our systems so I can use it with Sun C 5.11 to build OMPI. --td On 11/01/2010 05:58 AM, Siegmar Gross wrote: Hi, Sorry, but can you give us the config line, the c

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Terry Dontje
Sorry, I am still trying to grok all your email as to what problem you are trying to solve. So is the issue trying to have two jobs with processes on the same node be able to bind their processes to different resources, like core 1 for the first job and cores 2 and 3 for the 2nd job? --

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Terry Dontje
On 11/15/2010 11:08 AM, Chris Jewell wrote: Sorry, I am still trying to grok all your email as to what problem you are trying to solve. So is the issue trying to have two jobs with processes on the same node be able to bind their processes to different resources, like core 1 for the first j

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Terry Dontje
On 11/15/2010 02:11 PM, Reuti wrote: Just to give my understanding of the problem: On 15.11.2010 at 19:57, Terry Dontje wrote: On 11/15/2010 11:08 AM, Chris Jewell wrote: Sorry, I am still trying to grok all your email as to what problem you are trying to solve. So is the issue trying

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje
On 11/16/2010 04:26 AM, Chris Jewell wrote: Hi all, On 11/15/2010 02:11 PM, Reuti wrote: Just to give my understanding of the problem: Sorry, I am still trying to grok all your email as to what problem you are trying to solve. So is the issue trying to have two jobs with processes on th

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje
On 11/16/2010 09:08 AM, Reuti wrote: Hi, On 16.11.2010 at 14:07, Ralph Castain wrote: Perhaps I'm missing it, but it seems to me that the real problem lies in the interaction between SGE and OMPI during OMPI's two-phase launch. The verbose output shows that SGE dutifully allocated the reque

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje
On 11/16/2010 10:59 AM, Reuti wrote: On 16.11.2010 at 15:26, Terry Dontje wrote: 1. allocate a specified number of cores on each node to your job this is currently the bug in the "slot <=> core" relation in SGE, which has to be removed, updated or clarified. For now slo

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje
On 11/16/2010 12:13 PM, Chris Jewell wrote: On 16 Nov 2010, at 14:26, Terry Dontje wrote: In the original case of 7 nodes and processes if we do -binding pe linear:2, and add the -bind-to-core to mpirun I'd actually expect 6 of the nodes processes bind to one core and the 7th node w

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje
On 11/16/2010 01:31 PM, Reuti wrote: Hi Ralph, On 16.11.2010 at 15:40, Ralph Castain wrote: 2. have SGE bind procs it launches to -all- of those cores. I believe SGE does this automatically to constrain the procs to running on only those cores. This is another "bug/feature" in SGE: it's a m

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje
On 11/16/2010 08:24 PM, Ralph Castain wrote: On Tue, Nov 16, 2010 at 12:23 PM, Terry Dontje <terry.don...@oracle.com> wrote: On 11/16/2010 01:31 PM, Reuti wrote: Hi Ralph, On 16.11.2010 at 15:40, Ralph Castain wrote: 2. have SGE bind procs it launches to -a

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje
On 11/17/2010 07:41 AM, Chris Jewell wrote: On 17 Nov 2010, at 11:56, Terry Dontje wrote: You are absolutely correct, Terry, and the 1.4 release series does include the proper code. The point here, though, is that SGE binds the orted to a single core, even though other cores are also

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje
sing that --leave-session-attached is not required when the OGE binding argument is not given. --td HTH Ralph On Wed, Nov 17, 2010 at 6:57 AM, Terry Dontje <terry.don...@oracle.com> wrote: On 11/17/2010 07:41 AM, Chris Jewell wrote: On 17 Nov 2010, at 11:56, Terry Don

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje
a and email flying around it would be nice to actually see the output you mention. --td On Wed, Nov 17, 2010 at 7:51 AM, Terry Dontje <terry.don...@oracle.com> wrote: On 11/17/2010 09:32 AM, Ralph Castain wrote: Chris' output is coming solely from the HNP, which is co

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje
given the -binding option. Perhaps if someone could run this test again with --report-bindings --leave-session-attached and provide -all- output we could verify that analysis and clear up the confusion? Yeah, however I bet you we still won't see output. --td On Wed, Nov 17, 20

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-18 Thread Terry Dontje
Yes, I believe this solves the mystery. In short OGE and ORTE both work. In the linear:1 case the job is exiting because there are not enough resources for the orte binding to work, which actually makes sense. In the linear:2 case I think we've proven that we are binding to the right amount

Re: [OMPI users] Multiple Subnet MPI Fail

2010-11-22 Thread Terry Dontje
You're gonna have to use a protocol that can route through a machine; OFED User Verbs (i.e. openib) does not do this. The only way I know of to do this via OMPI is with the tcp btl. --td On 11/22/2010 09:28 AM, Paul Monday (Parallel Scientific) wrote: We've been using OpenMPI in a switched envi

Re: [OMPI users] Prioritization of --mca btl openib,tcp,self

2010-11-23 Thread Terry Dontje
On 11/22/2010 08:18 PM, Paul Monday (Parallel Scientific) wrote: This is a follow-up to an earlier question, I'm trying to understand how --mca btl prioritizes its choice for connectivity. Going back to my original network, there are actually two networks running around. A point to point In

Re: [OMPI users] cannot build Open MPI 1.5 on Linux x86_64 with Oracle/Sun C 5.11

2010-11-29 Thread Terry Dontje
This is ticket 2632 https://svn.open-mpi.org/trac/ompi/ticket/2632. A fix was put into the trunk last week. We should be able to CMR this fix to the 1.5 and 1.4 branches later this week. The ticket actually has a workaround for the 1.5 branch. --td On 11/29/2010 09:46 AM, Siegmar Gross w

Re: [OMPI users] [Rocks-Discuss] compiling Openmpi on solaris studio express

2010-11-30 Thread Terry Dontje
On 11/29/2010 05:41 PM, Nehemiah Dacres wrote: thanks. FYI: its openmpi-1.4.2 from a tarball like you assume I changed this line *Sun\ F* | *Sun*Fortran*) # Sun Fortran 8.3 passes all unrecognized flags to the linker _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC' _LT_

Re: [OMPI users] [Rocks-Discuss] compiling Openmpi on solaris studio express

2010-11-30 Thread Terry Dontje
hem all to '-Qoption ld' and then do the configure things should work. Good luck, --td On 11/30/2010 06:19 AM, Terry Dontje wrote: On 11/29/2010 05:41 PM, Nehemiah Dacres wrote: thanks. FYI: its openmpi-1.4.2 from a tarball like you assume I changed this line *Sun\ F* | *Sun*Fortr

Re: [OMPI users] [Rocks-Discuss] compiling Openmpi on solaris studio express

2010-11-30 Thread Terry Dontje
A slight note on the below: there should be a space between "ld" and the closing single quote mark, so it should be '-Qoption ld ' not '-Qoption ld' --td On 11/30/2010 06:31 AM, Terry Dontje wrote: Actually there is a way to modify the configure file that will not

Re: [OMPI users] [Rocks-Discuss] compiling Openmpi on solaris studio express

2010-11-30 Thread Terry Dontje
Ticket 2632 really spells out what the issue is. On 11/30/2010 10:23 AM, Prentice Bisbal wrote: Nehemiah Dacres wrote: that looks about right. So the suggestion: ./configure LDFLAGS="-notpath ... ... ..." -notpath should be replaced by whatever the proper flag should be, in my case -L ? Ye

Re: [OMPI users] Segmentation fault in mca_pml_ob1.so

2010-12-07 Thread Terry Dontje
I am not sure this has anything to do with your problem but if you look at the stack entry for PMPI_Recv I noticed the buf has a value of 0. Shouldn't that be an address? Does your code fail if the MPI library is built with -g? If it does fail the same way, the next step I would do would be

Re: [OMPI users] Method for worker to determine its "rank" on a single machine?

2010-12-10 Thread Terry Dontje
A more portable way of doing what you want below is to gather each process's processor name as given by MPI_Get_processor_name, have the root that receives this data assign a unique number to each name, and then scatter those numbers to the processes and have them use that as the color to a MPI_Comm_split ca
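
The gather / assign / scatter idea in the snippet above can be sketched without MPI at all. This is only an illustration of the bookkeeping the root would do, not code from the thread: the function names `assign_colors` and `local_ranks` and the host names are hypothetical, and the input list stands in for the names gathered from MPI_Get_processor_name.

```python
def assign_colors(processor_names):
    """Give every distinct processor name a small unique integer.

    processor_names[i] stands in for the name rank i reported; the
    returned list holds the color rank i would pass to MPI_Comm_split.
    """
    color_of = {}  # first-seen processor name -> unique number
    colors = []
    for name in processor_names:
        if name not in color_of:
            color_of[name] = len(color_of)
        colors.append(color_of[name])
    return colors


def local_ranks(colors):
    """Position of each rank among the ranks that share its color.

    This mirrors the rank a process would get in the split
    communicator if the original rank is used as the key.
    """
    seen = {}  # color -> how many ranks with that color so far
    ranks = []
    for color in colors:
        ranks.append(seen.get(color, 0))
        seen[color] = seen.get(color, 0) + 1
    return ranks


# Hypothetical layout: five ranks spread over three hosts.
names = ["node-a", "node-b", "node-a", "node-c", "node-b"]
colors = assign_colors(names)
print(colors, local_ranks(colors))
```

In a real program the list would come from a gather on the root and the colors would be scattered back before the split; only the numbering step is shown here.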

Re: [OMPI users] Method for worker to determine its "rank" on a single machine?

2010-12-10 Thread Terry Dontje
g method when the info is already available on every process. On Dec 10, 2010, at 3:36 AM, Terry Dontje wrote: A more portable way of doing what you want below is to gather each processes processor_name given by MPI_Get_processor_name, have the root who gets this data assign unique numbers t

Re: [OMPI users] Guaranteed run rank 0 on a given machine?

2010-12-10 Thread Terry Dontje
On 12/10/2010 01:46 PM, David Mathog wrote: The master is commonly very different from the workers, so I expected there would be something like --rank0-on but there doesn't seem to be a single switch on mpirun to do that. If "mastermachine" is the first entry in the hostfile, or the first m

Re: [OMPI users] Method for worker to determine its "rank" on a single machine?

2010-12-10 Thread Terry Dontje
On 12/10/2010 03:24 PM, David Mathog wrote: Ashley Pittman wrote: For a much simpler approach you could also use these two environment variables, this is on my current system which is 1.5 based, YMMV of course. OMPI_COMM_WORLD_LOCAL_RANK OMPI_COMM_WORLD_LOCAL_SIZE However that doesn't really
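
For reference, reading the two environment variables Ashley mentions might look like the sketch below. The helper name `local_rank_and_size` is hypothetical, and the fallback to `None` when the variables are absent (for example when the program is not started by mpirun, or on a release that does not export them) is an assumption made for the example.

```python
import os


def local_rank_and_size(env=None):
    """Return (local_rank, local_size) from the Open MPI environment.

    Returns None when either variable is missing, so the caller can
    fall back to another method.
    """
    env = os.environ if env is None else env
    rank = env.get("OMPI_COMM_WORLD_LOCAL_RANK")
    size = env.get("OMPI_COMM_WORLD_LOCAL_SIZE")
    if rank is None or size is None:
        return None
    return int(rank), int(size)


# Outside an mpirun session this normally prints None.
print(local_rank_and_size())
```

Passing a dictionary instead of relying on `os.environ` keeps the helper easy to exercise outside an MPI job.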

Re: [OMPI users] Newbie question

2011-01-11 Thread Terry Dontje
So are you trying to start an mpi job in which one process is one executable and the other process(es) are something else? If so, you probably want to use a multiple app context. If you look at FAQ question 7, "How do I run an MPMD MPI Job", at http://www.open-mpi.org/faq/?category=running this sho

Re: [OMPI users] openmpi's mpi_comm_spawn integrated with sge?

2011-01-25 Thread Terry Dontje
On 01/25/2011 02:17 AM, Will Glover wrote: Hi all, I tried a google/mailing list search for this but came up with nothing, so here goes: Is there any level of automation between open mpi's dynamic process management and the SGE queue manager? In particular, can I make a call to mpi_comm_spawn

Re: [OMPI users] How closely tied is a specific release of OpenMPI to the host operating system and other system software?

2011-02-02 Thread Terry Dontje
On 02/01/2011 07:34 PM, Jeff Squyres wrote: On Feb 1, 2011, at 5:02 PM, Jeffrey A Cummings wrote: I'm getting a lot of push back from the SysAdmin folks claiming that OpenMPI is closely intertwined with the specific version of the operating system and/or other system software (i.e., Rocks on

Re: [OMPI users] Totalview not showing main program on startup with OpenMPI 1.3.x and 1.4.x

2011-02-09 Thread Terry Dontje
This sounds like something I ran into some time ago that involved the compiler omitting frame pointers. You may want to try to compile your code with -fno-omit-frame-pointer. I am unsure if you may need to do the same while building MPI though. --td On 02/09/2011 02:49 PM, Dennis McRitchie

Re: [OMPI users] Totalview not showing main program on startup with OpenMPI 1.3.x and 1.4.x

2011-02-11 Thread Terry Dontje
*On Behalf Of *Terry Dontje *Sent:* Wednesday, February 09, 2011 5:02 PM *To:* us...@open-mpi.org *Subject:* Re: [OMPI users] Totalview not showing main program on startup with OpenMPI 1.3.x and 1.4.x This sounds like something I ran into some time ago that involved the compiler omitting fram

Re: [OMPI users] Error in Binding MPI Process to a socket

2011-03-18 Thread Terry Dontje
On 03/17/2011 03:31 PM, vaibhav dutt wrote: Hi, Thanks for your reply. I tried to execute first a process by using mpirun -machinefile hostfile.txt --slot-list 0:1 -np 1 but it gives the same as error as mentioned previously. Then, I created a rankfile with contents" rank 0=t1.tools.xxx

Re: [OMPI users] 1.5.3 and SGE integration?

2011-03-21 Thread Terry Dontje
Dave, what version of Grid Engine are you using? The plm checks for the following environment variables to determine if you are running Grid Engine: SGE_ROOT, ARC, PE_HOSTFILE, JOB_ID. If these are not there during the session in which mpirun is executed then it will resort to ssh. --td On 03/21/2011 08:24 AM,
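
A quick way to check a session for the four variables listed above is sketched here. The helper `missing_sge_vars` is hypothetical and only tests that each name is present; whether Open MPI also requires the values to be non-empty is not stated in the snippet, so that is left out.

```python
import os

# Variable names taken from the message above.
SGE_VARS = ("SGE_ROOT", "ARC", "PE_HOSTFILE", "JOB_ID")


def missing_sge_vars(env=None):
    """List the Grid Engine variables that are absent from env."""
    env = os.environ if env is None else env
    return [name for name in SGE_VARS if name not in env]


missing = missing_sge_vars()
if missing:
    print("not set in this session:", ", ".join(missing))
else:
    print("all four Grid Engine variables are set")
```

Running it from inside the job script, just before mpirun, shows what the launcher would see in that session.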

Re: [OMPI users] mpi problems,

2011-04-04 Thread Terry Dontje
libfui.so is a library that is part of the Solaris Studio FORTRAN tools. It should be located under lib in the directory where your Solaris Studio compilers are installed. So one question is whether you actually have Studio Fortran installed on all your nodes or not? --td On 04/04/2011 04:02 PM, Ralph C

Re: [OMPI users] orte-odls-default:execv-error

2011-04-05 Thread Terry Dontje
On 04/05/2011 05:11 AM, SLIM H.A. wrote: After an upgrade of our system I receive the following error message (openmpi 1.4.2 with gridengine): quote -- Sorry! You were supposed to get help about: orte-odls-default:e

Re: [OMPI users] orte-odls-default:execv-error

2011-04-05 Thread Terry Dontje
same is similarly true with LD_LIBRARY_PATH that you really shouldn't need to set that in your scripts/shell if you've compiled the programs such that the Rpath is correctly passed to the linker. --td Thanks Henk *From:*users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] *On Beh

Re: [OMPI users] alltoall messages > 2^26

2011-04-05 Thread Terry Dontje
It was asked during the community concall whether the below may be related to ticket #2722 https://svn.open-mpi.org/trac/ompi/ticket/2722? --td On 04/04/2011 10:17 PM, David Zhang wrote: Any error messages? Maybe the nodes ran out of memory? I know MPI implement some kind of buffering under

Re: [OMPI users] Not pointing to correct libraries

2011-04-05 Thread Terry Dontje
I am not sure Fedora comes with Open MPI installed on it by default (at least my FC13 did not). You may want to look at trying to install Open MPI from yum or some other package manager. Or you can download the source tarball from http://www.open-mpi.org/software/ompi/v1.4/, build and in

Re: [OMPI users] mpi problems,

2011-04-06 Thread Terry Dontje
Something looks fishy about your numbers. The first two sets of numbers look the same and the last set do look better for the most part. Your mpirun command line looks weird to me with the "-mca orte_base_help_aggregate btl,openib,self," did something get chopped off with the text copy? You

Re: [OMPI users] mpi problems,

2011-04-07 Thread Terry Dontje
ting *.a in the lib directory. None of those are equivalent because they are all linked with vampire trace if I am reading the names right. I've already tried putting /opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.mpi.a for this and it didn't work, giving errors like On Wed, Apr 6, 2011

Re: [OMPI users] mpi problems,

2011-04-07 Thread Terry Dontje
compiler/arch you are using. --td On 04/07/2011 06:20 AM, Terry Dontje wrote: On 04/06/2011 03:38 PM, Nehemiah Dacres wrote: I am also trying to get netlib's hpl to run via sun cluster tools so i am trying to compile it and am having trouble. Which is the proper mpi library to give? naturally

Re: [OMPI users] Configure fail: OpenMPI/1.5.3 with Support for LSF using Sun Studio compilers

2011-04-07 Thread Terry Dontje
On 04/07/2011 06:16 AM, Paul Kapinos wrote: Dear OpenMPI developers, We tried to build OpenMPI 1.5.3 including Support for Platform LSF using the Sun Studio (=Oracle Solaris Studio now) /12.2 and the configure stage failed. 1. Used flags: ./configure --with-lsf --with-openib --with-devel-he

Re: [OMPI users] Configure fail: OpenMPI/1.5.3 with Support for LSF using Sun Studio compilers

2011-04-07 Thread Terry Dontje
On 04/07/2011 08:36 AM, Paul Kapinos wrote: Hi Terry, so, the attached ceil.c example file *can* be compiled by "CC" (the Studio C++ compiler), but *cannot* be compiled using "cc" (the Studio C compiler). $ CC ceil.c $ cc ceil.c Did you try to link in the math library -lm? When I did this

Re: [OMPI users] Configure fail: OpenMPI/1.5.3 with Support for LSF using Sun Studio compilers

2011-04-08 Thread Terry Dontje
Paul and I have been talking about the below issue and I thought it would be useful to update the list just in case someone else runs into this problem and ends up searching the email list before we actually fix the issue. The problem is OMPI's configure tests to see if -lm is needed to get m

Re: [OMPI users] OMPI vs. network socket communcation

2011-05-02 Thread Terry Dontje
On 04/30/2011 08:52 PM, Jack Bryan wrote: Hi, All: What is the relationship between MPI communication and socket communication ? MPI may use socket communications to do communications between two processes. Aside from that they are used for different purposes. Is the network socket program

Re: [OMPI users] OMPI vs. network socket communcation

2011-05-02 Thread Terry Dontje
On 05/02/2011 11:30 AM, Jack Bryan wrote: Thanks for your reply. MPI is for academic purpose. How about business applications ? There are quite a few non-academic MPI applications. For example there are quite a few simulation codes from different vendors that support MPI (Nastran is on

Re: [OMPI users] OpenMPI LS-DYNA Connection refused

2011-05-02 Thread Terry Dontje
On 05/02/2011 01:27 PM, Robert Walters wrote: Open-MPI Users, I've been using OpenMPI for a while now and am very pleased with it. I use the OpenMPI system across eight Red Hat Linux nodes (8 cores each) on 1 Gbps Ethernet behind a dedicated switch. After working out kinks in the beginning,

Re: [OMPI users] OpenMPI LS-DYNA Connection refused

2011-05-02 Thread Terry Dontje
On 05/02/2011 02:04 PM, Robert Walters wrote: Terry, I was under the impression that all connections are made because of the nature of the program that OpenMPI is invoking. LS-DYNA is a finite element solver and for any given simulation I run, the cores on each node must constantly communica

Re: [OMPI users] OpenMPI LS-DYNA Connection refused

2011-05-03 Thread Terry Dontje
--- *From:*users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] *On Behalf Of *Terry Dontje *Sent:* Monday, May 02, 2011 2:50 PM *To:* us...@open-mpi.org *Subject:* Re: [OMPI users] OpenMPI LS-DYNA Connection refused On 05/02/2011 02:04 PM, Robert Walters wrote: Terry, I was u

Re: [OMPI users] OpenMPI LS-DYNA Connection refused

2011-05-03 Thread Terry Dontje
-- Regards, Robert Walters *From:*users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] *On Behalf Of *Terry Dontje *Sent:* Monday, May

Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-15 Thread Terry Dontje
Mudassar, You can do what you are asking. The receiver uses MPI_ANY_SOURCE for the source rank value and when you receive a message the status.MPI_SOURCE will contain the rank of the actual sender not the receiver's rank. If you are not seeing that then there is a bug somewhere. --td On 7

Re: [OMPI users] Open MPI & Grid Engine/Grid Scheduler thread binding

2011-07-15 Thread Terry Dontje
Here's, hopefully, more useful info. Note reading the job2core.pdf presentation, that was mentioned earlier, more closely will also clarify a couple points (I've put those points inline below). On 7/15/2011 12:01 AM, Ralph Castain wrote: On Jul 14, 2011, at 5:46 PM, Jeff Squyres wrote: Loo

Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-15 Thread Terry Dontje
's rank in status.MPI_SOURCE, but it is different than expected. I need to receive that message which was sent to me, not any message. regards, Date: Fri, 15 Jul 2011 06:33:41 -0400 From: Terry Dontje <mailto:terry.don...@oracle.com>> Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOU

Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-15 Thread Terry Dontje
ave more than a broad description of the problem it is going to be nearly impossible for us to tell you what is wrong. --td regards, Mudassar Date: Fri, 15 Jul 2011 07:04:34 -0400 From: Terry Dontje <mailto:terry.don...@oracle.com>> Subject: Re: [OMPI users] Urgent Question regarding,

Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-15 Thread Terry Dontje
comes to be wrong. This shows to me that messages on the receiving sides are captured on the basis of MPI_ANY_SOURCE, that seems like it does not see the destination of message while capturing it from message queue of the MPI system. regards, Mudassar --

Re: [OMPI users] Does Oracle Cluster Tools aka Sun's MPI work with LDAP?

2011-07-15 Thread Terry Dontje
On 7/15/2011 1:46 PM, Paul Kapinos wrote: Hi OpenMPI folks (and Oracle/Sun experts), we have a problem with Sun's MPI (Cluster Tools 8.2.x) on a part of our cluster. In the part of the cluster where LDAP is activated, the mpiexec does not try to spawn tasks on remote nodes at all, but exits

Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-15 Thread Terry Dontje
..!! P1>> Received from P7, packet contains rank: 11 P1>> I could reach here ...!! P9>> I could reach here ...!! P2>> Received from P11, packet contains rank: 13 P2>> I could reach here ...!! P0>> I could reach here ...!! P11>> I could reach here ..

Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-17 Thread Terry Dontje
, it was working fine. Then I understood that the problem is somewhere else. I found that problem. Thanks to all of you people. regards, Mudassar *From:* Terry Dontje *To:* Jeff Squyres *Cc:* Mudassar Majeed ; Open MPI

Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread TERRY DONTJE
Some more info would be nice like: -What version of ompi are you using -What type of machine and os are you running on -What does the machine file look like -Is there a stack trace left behind by the pid that seg faulted? --td On 10/25/2011 8:07 AM, Mouhamad Al-Sayed-Ali wrote: Hello, I have t

Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread TERRY DONTJE
Can you run wrf successfully on one node? Can you run a simple code across your two nodes? I would try hostname then some simple MPI program like the ring example. --td On 10/25/2011 9:05 AM, Mouhamad Al-Sayed-Ali wrote: hello, -What version of ompi are you using I am using ompi version

Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread TERRY DONTJE
This looks more like a seg fault in wrf and not OMPI. Sorry not much I can do here to help you. --td On 10/25/2011 9:53 AM, Mouhamad Al-Sayed-Ali wrote: Hi again, This is exactly the error I have: taskid: 0 hostname: part034.u-bourgogne.fr [part034:21443] *** Process received signal **

Re: [OMPI users] Changing plm_rsh_agent system wide

2011-10-26 Thread TERRY DONTJE
I am using prefix configuration so no it does not exist in /usr. --td On 10/26/2011 10:44 AM, Ralph Castain wrote: Did the version you are running get installed in /usr? Sounds like you are picking up a different version when running a command - i.e., that your PATH is finding a different ins

Re: [OMPI users] Changing plm_rsh_agent system wide

2011-10-26 Thread TERRY DONTJE
Sorry please disregard my reply to this email. :-) --td On 10/26/2011 10:44 AM, Ralph Castain wrote: Did the version you are running get installed in /usr? Sounds like you are picking up a different version when running a command - i.e., that your PATH is finding a different installation tha

Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-04 Thread TERRY DONTJE
David, are you saying your jobs consistently leave behind session files after the job exits? They really shouldn't, even when a job aborts; I thought mpirun took great pains to clean up after itself. Can you tell us what version of OMPI you are running with? I think I could see ki
