Re: [OMPI users] an error when running MPI on 2 machines

2013-02-11 Thread Jeff Squyres (jsquyres)
On Feb 11, 2013, at 10:17 PM, Paul Gribelyuk wrote: > UPDATE: I wish I could reproduce the error, because now it's gone and I can > run the same program from each machine in the hostfile. Good! > I would still be very interested to know what kind of MPI situations are > likely to cause thes

Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Jeff Squyres (jsquyres)
On Feb 11, 2013, at 3:41 PM, "Beatty, Daniel D CIV NAVAIR, 474300D" wrote: > The Intel+PPC is one issue. However, even on Intel, there tends to be a > distinction between Intel environments going from Xeon to Core iX > environments. While Objective-C/C/C++ handle this well, the Fortran > compil

Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Jeff Squyres (jsquyres)
I got your tarball (no need to re-send it). I'm a little confused by your output from make, though. Did you run autogen? If so, there's no need to do that -- try expanding a fresh tarball and just running ./configure and make. On Feb 11, 2013, at 10:03 PM, Mark Bolstad wrote: > I packed t

Re: [OMPI users] Fwd: an error when running MPI on 2 machines

2013-02-11 Thread Paul Gribelyuk
Hi Jeff, Thank you for your email. The program makes an MPI_Reduce call as the only form of explicit communication between machines… I said it was simple because it's effectively a very trivial distributed computation for me to learn MPI. I am using the same version, by doing "brew install open

Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Mark Bolstad
I packed the compile info as requested but the message is too big. Changing the compression didn't help. I can split it, or do you just want to approve it out of the hold queue? Mark On Mon, Feb 11, 2013 at 3:03 PM, Jeff Squyres (jsquyres) wrote: > On Feb 11, 2013, at 2:46 PM, Mark Bolstad > wr

Re: [OMPI users] mpirun completes for one user, not for another

2013-02-11 Thread Daniel Fetchinson
Thanks a lot, this was exactly the problem: > Make sure that the PATH really is identical between users -- especially for > non-interactive logins. E.g.: > > env Here PATH was correct. > vs. > > ssh othernode env Here PATH was not correct. The PATH was set in .bash_profile and apparently in non
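
The check described above (comparing the output of `env` with that of `ssh othernode env`) can be sketched in a few lines of Python. This is only an illustration: the helper names and the sample environment text are hypothetical, and in practice the two strings would be the captured output of the two commands.

```python
def path_entries(env_text):
    """Return the PATH entries found in the text output of `env`."""
    for line in env_text.splitlines():
        if line.startswith("PATH="):
            return [p for p in line[len("PATH="):].split(":") if p]
    return []  # no PATH line at all


def missing_entries(interactive_env, remote_env):
    """List PATH entries present interactively but absent over ssh."""
    remote = set(path_entries(remote_env))
    return [p for p in path_entries(interactive_env) if p not in remote]


# Hypothetical sample data standing in for `env` and `ssh othernode env`.
local = "HOME=/home/a\nPATH=/opt/openmpi/bin:/usr/bin:/bin\n"
remote = "HOME=/home/a\nPATH=/usr/bin:/bin\n"
print(missing_entries(local, remote))  # ['/opt/openmpi/bin']
```

Any entry reported here is a directory that a non-interactive shell does not see, which is the situation described in this thread. The same comparison could be repeated for LD_LIBRARY_PATH.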

Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Beatty, Daniel D CIV NAVAIR, 474300D
Hi Jeff, The Intel+PPC is one issue. However, even on Intel, there tends to be a distinction between Intel environments going from Xeon to Core iX environments. While Objective-C/C/C++ handle this well, the Fortran compilers have given me a different story over the years. It tends to be the case

Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Jeff Squyres (jsquyres)
On Feb 11, 2013, at 2:46 PM, Mark Bolstad wrote: > That's what I noticed, no .so's (actually, I noticed that the dlname in the > .la file is empty. thank you, dtruss) Please send all the information listed here: http://www.open-mpi.org/community/help/ > I've built it two different ways: >

Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Mark Bolstad
That's what I noticed, no .so's (actually, I noticed that the dlname in the .la file is empty. thank you, dtruss) I've built it two different ways: --disable-mpi-f77 and --prefix=/Users/bolstadm/papillon/build/macosx-x86_64/Release/openmpi-1.6.3 --disable-mpi-f77 --with-openib=no --enable-shared

Re: [OMPI users] mmap and MPI_File_Read

2013-02-11 Thread Jeff Squyres (jsquyres)
On Feb 2, 2013, at 3:52 AM, Andreas Bok Andersen wrote: > I am using Open-MPI in a parallelization of matrix multiplication for large > matrices. > My question is: > - Is MPI_File_read using mmapping under the hood when reading a binary file. Sorry for the delay in replying; my INBOX is a d

Re: [OMPI users] Simple MPI hello world hangs over IB

2013-02-11 Thread Jeff Squyres (jsquyres)
On Feb 4, 2013, at 10:55 AM, Bharath Ramesh wrote: > I am trying to debug an issue which is really weird. I have > simple MPI hello world application (attached) that hangs when I > try to run on our cluster using 256 nodes with 16 cores on each > node. The cluster uses QDR IB. > > I am able to r

Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Jeff Squyres (jsquyres)
On Feb 11, 2013, at 1:11 PM, "Beatty, Daniel D CIV NAVAIR, 474300D" wrote: > There are two issues that have concerned me. One is universal capabilities, > namely ensuring that the library allows the same results for binaries in both > any of their universal compiled forms. Not sure what y

Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Jeff Squyres (jsquyres)
Ah -- your plugins are all .a files. How did you configure/build Open MPI? On Feb 11, 2013, at 11:09 AM, Mark Bolstad wrote: > It's not just one plugin, it was about 6 of them. I just deleted the error > message from the others as I believed that opal_init was the problem. > > However, I hav

Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Beatty, Daniel D CIV NAVAIR, 474300D
Greetings Fellow MPI users, I may need to get involved here on this issue also. I will need to do a similar number for Mountain Lion/ and regular Lion. I am still a little bit in design phase at this time so I am paying close attention to this thread. There are two issues that have concerned me.

Re: [OMPI users] error when running mpirun

2013-02-11 Thread albatr...@gmail.com
Hi Ralph , Thanks for the reply.. In fact it worked after installing the libnuma rpm . Thanks a ton.. cheers Satya On Mon, Feb 11, 2013 at 1:40 AM, Ralph Castain wrote: > The error message indicates that libnuma was not installed on at least one > node. That's a system library, n

Re: [OMPI users] mpirun completes for one user, not for another

2013-02-11 Thread Jeff Squyres (jsquyres)
Make sure that the PATH really is identical between users -- especially for non-interactive logins. E.g.: env vs. ssh othernode env Also check the LD_LIBRARY_PATH. On Feb 11, 2013, at 7:11 AM, Daniel Fetchinson wrote: > Hi folks, > > I have a really strange problem: a super simple MPI t

Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Mark Bolstad
It's not just one plugin, it was about 6 of them. I just deleted the error message from the others as I believed that opal_init was the problem. However, I have done a full build multiple times and have blown away all the plugins and other remnants of the build and install and get the same results

[OMPI users] MPI_FILE_READ: wrong file-size does not raise an exception

2013-02-11 Thread Stefan Mauerberger
Hi Everyone! Playing around with MPI_FILE_READ() puzzles me a little. To catch all errors I set the error-handler - the one which is related to file I/O - to MPI_ERRORS_ARE_FATAL. However, when reading from a file that does not have the necessary size, MPI_FILE_READ(...) returns 'MPI_SUCCESS: no errors
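
One way to detect the situation described above is to compare the element count recorded in the read's status object with the count that was requested, since a read that runs into end-of-file is not reported as an error. The helper below is a minimal sketch, not taken from the thread: `read_exact` is a hypothetical name, and it assumes file and status objects with the `Read` and `Get_count` methods that mpi4py provides.

```python
def read_exact(fh, buf, status, datatype, expected):
    """Read into buf and raise if fewer than `expected` elements arrived.

    fh is assumed to behave like an mpi4py MPI.File and status like an
    mpi4py MPI.Status; both are passed in so the check itself stays small.
    """
    fh.Read(buf, status)              # may return normally on a short read
    got = status.Get_count(datatype)  # number of elements actually read
    if got != expected:
        raise IOError(f"short read: expected {expected} elements, got {got}")
    return got


# Assumed usage with mpi4py (the file name is hypothetical):
#   from mpi4py import MPI
#   fh = MPI.File.Open(MPI.COMM_WORLD, "data.bin", MPI.MODE_RDONLY)
#   buf = bytearray(800)
#   read_exact(fh, buf, MPI.Status(), MPI.BYTE, len(buf))
```

In Fortran or C the equivalent check would use MPI_GET_COUNT on the status returned by MPI_FILE_READ.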

Re: [OMPI users] newbie: Submitting Open MPI jobs to SGE ( `qsh, -pe orte 4` fails)

2013-02-11 Thread Reuti
On 11.02.2013 at 12:26, Pierre Lindenbaum wrote: > > and I've changed `shell_start_mode posix_compliant` to `unix_behavior ` > using `qconf -mconf`. (However, shell_start_mode is still listed as > posix_compliant ) AFAIK this is deprecated on the configuration level, as it moved to th

Re: [OMPI users] Building 1.6.3 on OS X 10.8

2013-02-11 Thread Jeff Squyres (jsquyres)
That's very odd; I can't think of why that would happen offhand. I build and run all the time on ML with no problems. Can you delete that plugin and run OK? Sent from my phone. No type good. On Feb 10, 2013, at 10:22 PM, "Mark Bolstad" wrote: > I am having some difficulties with building/running

Re: [OMPI users] running mpi job..

2013-02-11 Thread Jeff Squyres (jsquyres)
Can you provide all the information in http://www.open-mpi.org/community/help/ ? Sent from my phone. No type good. On Feb 10, 2013, at 12:14 PM, "satya k" <satya5...@gmail.com> wrote: Hi everyone out there, I am a Newbie to HPC, we have a couple of HPC clusters where I work. S

Re: [OMPI users] Fwd: an error when running MPI on 2 machines

2013-02-11 Thread Jeff Squyres (jsquyres)
Can you provide any more detail? Your report looks weird - you said it's a simple C++ hello world, but the executable you show is "pi", which is typically a simple C example program. Are you using the same version of Open MPI on all nodes? Are you able to run n-way jobs on single nodes? Sen

Re: [OMPI users] [Open MPI] #3493: Handle the case where rankfile provides the allocation

2013-02-11 Thread Jeff Squyres (jsquyres)
Sweet! Sent from my phone. No type good. On Feb 11, 2013, at 1:39 AM, "Siegmar Gross" wrote: > Hi > >> #3493: Handle the case where rankfile provides the allocation >> ---+- >> Reporter: rhc | Owner: ompi

[OMPI users] mpirun completes for one user, not for another

2013-02-11 Thread Daniel Fetchinson
Hi folks, I have a really strange problem: a super simple MPI test program (see below) runs successfully for all users when executed on 4 processes in 1 node, but hangs for user A and runs successfully for user B when executed on 8 processes in 2 nodes. The executable used is the same and the appf

Re: [OMPI users] newbie: Submitting Open MPI jobs to SGE ( `qsh, -pe orte 4` fails)

2013-02-11 Thread Pierre Lindenbaum
This is a good sign, as it tries to use `qrsh -inherit ...` already. Can you confirm the following settings: $ qconf -sp orte ... control_slaves TRUE $ qconf -sq all.q ... shell_start_mode unix_behavior -- Reuti qconf -sp orte pe_name orte slots 4

Re: [OMPI users] how to find the binding of each rank on the local machine

2013-02-11 Thread Jeff Squyres (jsquyres)
Remember that OMPI 1.6.x is our stable series; we're no longer adding new features to it -- only bug fixes. What Ralph described is available on the OMPI SVN trunk HEAD (i.e., what will eventually become the v1.9 series). It may also be available in the upcoming v1.7 series; I'm not sure if we

Re: [OMPI users] how to find the binding of each rank on the local machine

2013-02-11 Thread Kranthi Kumar
Sir, I was following your discussion. Brice Sir's explanation of what I want is correct. Your last reply was asking me to look for ompi_proc_t for the process in the proc_flags field if I am correct. You said that the definition of the values will be in opal/mca/hwloc/hwloc.h. I checked in this f

Re: [OMPI users] [Open MPI] #3493: Handle the case where rankfile provides the allocation

2013-02-11 Thread Siegmar Gross
Hi > #3493: Handle the case where rankfile provides the allocation > ---+- > Reporter: rhc | Owner: ompi-gk1.6 > Type: changeset move request | Status: closed > Priority: critical|