Re: [OMPI users] Connection timed out with multiple nodes

2014-01-17 Thread Ralph Castain
The most common cause of this problem is a firewall between the nodes - you can ssh across, but not communicate. Have you checked to see that the firewall is turned off? On Jan 17, 2014, at 4:59 PM, Doug Roberts wrote: > > 1) When openmpi programs run across multiple nodes they hang > rather
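One way to test the firewall theory above is to check whether an arbitrary TCP port, not just the ssh port, is reachable between two nodes. The following is a minimal Python sketch under that assumption; the helper names `open_listener` and `can_connect`, the port choice, and the timeout are illustrative and are not part of Open MPI.

```python
import socket


def open_listener(host="0.0.0.0", port=0):
    """Bind a TCP listener; port=0 lets the OS pick a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    # Return the socket and the port number actually bound.
    return srv, srv.getsockname()[1]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run `open_listener()` on one node and call `can_connect(node_name, port)` from another. If ssh works but this returns False, a firewall between the nodes is a plausible suspect, though not the only possible one.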

[OMPI users] Connection timed out with multiple nodes

2014-01-17 Thread Doug Roberts
1) When openmpi programs run across multiple nodes they hang rather quickly as shown in the mpi_test example below. Note that I am assuming the initial topology error message is a separate issue since single node openmpi jobs run just fine. [roberpj@bro127:~/samples/mpi_test] /opt/sharcnet/op

Re: [OMPI users] How to use non-primitive types with Java binding

2014-01-17 Thread Saliya Ekanayake
Thank you and this makes sense. In fact we've been trying to avoid serialization as much as possible because we found it to be a bottleneck. Anyway I wonder if there are some samples illustrating the use of complex structures in OpenMPI Thank you, Saliya On Jan 17, 2014 5:20 PM, "Oscar Vega-Gisbe

Re: [OMPI users] openmpi-1.6.5 intel 14.0 MPI-IO Errors

2014-01-17 Thread Brock Palen
BAH, The error persisted when doing the test to /tmp/ (local disk) I rebuilt the library with the same compiler and all is well now. Sorry for the false alarm. Thanks for the help and ideas Jeff. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (7

Re: [OMPI users] How to use non-primitive types with Java binding

2014-01-17 Thread Oscar Vega-Gisbert
MPI.OBJECT is no longer supported because it was based on serialization, and it made the Java bindings more complicated. It brought more problems than benefits. For example, it required a shadow communicator... You can define complex struct data using direct buffers and avoiding s

Re: [OMPI users] hosfile issue of openmpi-1.7.4rc2

2014-01-17 Thread Ralph Castain
Sorry for delay - I understood and was just occupied with something else for a while. Thanks for the follow-up. I'm looking at the issue and trying to decipher the right solution. On Jan 17, 2014, at 2:00 PM, tmish...@jcity.maeda.co.jp wrote: > > > Hi Ralph, > > I'm sorry that my explanatio

Re: [OMPI users] hosfile issue of openmpi-1.7.4rc2

2014-01-17 Thread tmishima
Hi Ralph, I'm sorry that my explanation was not enough ... This is the summary of my situation: 1. I create a hostfile as shown below manually. 2. I use mpirun to start the job without Torque, which means I'm running in an un-managed environment. 3. Firstly, ORTE detects 8 slots on each host(

Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks

2014-01-17 Thread Ronald Cohen
Thanks. For what it is worth, it looks like I now have a successful build of open-mpi plus hdf5. With the caveat (see pasted note below from the HDF5 support desk) about make check-p from hdf5. For anyone else trying to get hdf5 going with openmpi on mavericks, here are the configure combinatio

Re: [OMPI users] openmpi-1.6.5 intel 14.0 MPI-IO Errors

2014-01-17 Thread Jeff Squyres (jsquyres)
Brock and I chatted off list. I'm unable to replicate the error, but I have icc 14.0.1, not 14.0. I also don't have Lustre, which is his base case. So there's at least 2 variables here that need to be resolved. On Jan 9, 2014, at 11:46 AM, Brock Palen wrote: > Attached you will find a small

Re: [OMPI users] hosfile issue of openmpi-1.7.4rc2

2014-01-17 Thread tmishima
No, I didn't use Torque this time. This issue is caused only when it is not in the managed environment - namely, orte_managed_allocation is false (and orte_set_slots is NULL). Under the torque management, it works fine. I hope you can understand the situation. Tetsuya Mishima > I'm sorry, bu

Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks

2014-01-17 Thread Ralph Castain
We did update ROMIO at some point in there, so it is possible this is a ROMIO bug that we have picked up. I've asked someone to check upstream about it. On Jan 17, 2014, at 12:02 PM, Ronald Cohen wrote: > Sorry, too many entries in this thread, I guess. My general goal is to get a > working

Re: [OMPI users] openmpi-1.6.5 intel 14.0 MPI-IO Errors

2014-01-17 Thread Ralph Castain
Afraid I don't have access to Intel compilers (oddly enough), but my immediate thought would be that there is some variable size difference - possibly a default change between 64 and 32 bit for "int"? Your output offset just looks to me like you wrapped the field. I believe the MPI interfaces a
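The "wrapped field" guess above can be illustrated with a short sketch: a file offset that no longer fits in a signed 32-bit integer wraps around to a negative value. This is only an illustration of the arithmetic, not a diagnosis of the reported error; `wrap_int32` and `fits_int32` are hypothetical helper names.

```python
def wrap_int32(value):
    """Reinterpret an integer as a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return value - 0x100000000 if value >= 0x80000000 else value


def fits_int32(value):
    """Return True if value is representable as a signed 32-bit integer."""
    return -2**31 <= value <= 2**31 - 1


offset = 3 * 1024**3          # a 3 GiB offset, too large for a 32-bit int
print(fits_int32(offset))     # False
print(wrap_int32(offset))     # -1073741824
```

An offset that prints as a large negative number, or as a value exactly 2^32 away from the expected one, is consistent with this kind of truncation.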

Re: [OMPI users] openmpi-1.6.5 intel 14.0 MPI-IO Errors

2014-01-17 Thread Brock Palen
I never saw any replies on this. Has anyone else been able to produce this sort of error? It is 100% reproducible for me. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On Jan 9, 2014, at 11:46 AM, Brock Palen wrote: > Attached

Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks

2014-01-17 Thread Ronald Cohen
Sorry, too many entries in this thread, I guess. My general goal is to get a working parallel hdf5 with openmpi on Mac OS X Mavericks. At one point in the saga I had romio disabled, which naturally doesn't work for hdf5 (which is trying to read/write files in parallel). So the hdf5 tests would o

Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks

2014-01-17 Thread Jeff Squyres (jsquyres)
Can you specify exactly which issue you're referring to? - test failing when you had ROMIO disabled - test (sometimes) failing when you had ROMIO disabled - compiling / linking issues ? On Jan 17, 2014, at 1:50 PM, Ronald Cohen wrote: > Hello Ralph and others, I just got the following back fr

Re: [OMPI users] Calling a variable from another processor

2014-01-17 Thread Jeff Hammond
The attached version is modified to use passive target, which does not require collective synchronization for remote access. Note that I didn't compile and run this and don't write MPI in Fortran so there may be syntax errors. Jeff On Thu, Jan 16, 2014 at 11:03 AM, Christoph Niethammer wrote: >

Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks

2014-01-17 Thread Ronald Cohen
Hello Ralph and others, I just got the following back from the HDF-5 support group, suggesting an ompi bug. So I should either try 1.7.3 or a recent nightly 1.7.4. Will likely opt for 1.7.3, but hopefully someone at openmpi can look at the problem for 1.7.4. In short, the challenge is to get

Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks

2014-01-17 Thread Ronald Cohen
I figured that. On Fri, Jan 17, 2014 at 10:26 AM, Jeff Squyres (jsquyres) < jsquy...@cisco.com> wrote: > On Jan 17, 2014, at 1:17 PM, Jeff Squyres (jsquyres) > wrote: > > > 3. --enable-shared is *not* implied by --enable-static. So if you > --enable-static without --disable-shared, you're buil

Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks

2014-01-17 Thread Jeff Squyres (jsquyres)
On Jan 17, 2014, at 1:17 PM, Jeff Squyres (jsquyres) wrote: > 3. --enable-shared is *not* implied by --enable-static. So if you > --enable-static without --disable-shared, you're building both libmpi.so and > libmpi.a (both of which will have the plugins slurped up -- no DSOs). Which > is no

Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks

2014-01-17 Thread Ronald Cohen
Very helpful, thanks. On Fri, Jan 17, 2014 at 10:17 AM, Jeff Squyres (jsquyres) < jsquy...@cisco.com> wrote: > Ok, thanks. A few notes: > > 1. --enable-static implies --disable-dlopen. Specifically: > --enable-static does two things: > > 1a. Build libmpi.a (and friends) > 1b. Slurp all the OMP

Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks

2014-01-17 Thread Ronald Cohen
Good suggestions, and thanks! But since I haven't been able to get the problem to recur and I'm stuck now on other issues related to getting parallel hdf5 to pass its make check, I will likely not follow up on this particular (non-recurring) issue (except maybe I should forward your comments to t

Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks

2014-01-17 Thread Jeff Squyres (jsquyres)
Ok, thanks. A few notes: 1. --enable-static implies --disable-dlopen. Specifically: --enable-static does two things: 1a. Build libmpi.a (and friends) 1b. Slurp all the OMPI plugins into libmpi.a (and friends), vs. building them as standalone dynamic shared object (DSO) files (this is half of

Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks

2014-01-17 Thread Jeff Squyres (jsquyres)
I'm looking at your code, and I'm not actually an expert in the MPI IO stuff... but do you have a race condition in the file close+delete and the open with EXCL? I'm asking because I don't know offhand if the file close+delete is supposed to be collective and not return until the file is gu
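The race being asked about can be sketched with a local-filesystem analogue: an exclusive create fails if the previous copy of the file has not been deleted yet. This uses POSIX-style `os.open` flags in Python rather than MPI-IO, so it only mirrors the idea; `exclusive_create` and `demo` are hypothetical names.

```python
import os
import tempfile


def exclusive_create(path):
    """Create path only if it does not already exist; return True on success."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "testfile")
        first = exclusive_create(path)         # file absent: succeeds
        too_early = exclusive_create(path)     # old file still present: fails
        os.remove(path)
        after_delete = exclusive_create(path)  # delete completed: succeeds
        return first, too_early, after_delete
```

If one process reaches the exclusive open before another process's delete has taken effect, it sees the `too_early` outcome, which would show up as an intermittent failure.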

Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks

2014-01-17 Thread Ronald Cohen
Thanks, I've just gotten an email with some suggestions (and promise of more help) from the HDF5 support team. I will report back here, as it may be of interest to others trying to build hdf5 on mavericks. On Fri, Jan 17, 2014 at 9:08 AM, Ralph Castain wrote: > Afraid I have no idea, but hope

Re: [OMPI users] hosfile issue of openmpi-1.7.4rc2

2014-01-17 Thread Ralph Castain
I'm sorry, but I'm really confused, so let me try to understand the situation. You use Torque to get an allocation, so you are running in a managed environment. You then use mpirun to start the job, but pass it a hostfile as shown below. Somehow, ORTE believes that there is only one slot on eac

Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks

2014-01-17 Thread Ralph Castain
Afraid I have no idea, but hopefully someone else here with experience with HDF5 can chime in? On Jan 17, 2014, at 9:03 AM, Ronald Cohen wrote: > Still a timely response, thank you. The particular problem I noted hasn't > recurred; for reasons I will explain shortly I had to rebuild openmp

Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks

2014-01-17 Thread Ronald Cohen
Still a timely response, thank you. The particular problem I noted hasn't recurred; for reasons I will explain shortly I had to rebuild openmpi again, and this time Sample_mpio.c compiled and ran successfully from the start. But now my problem is trying to get parallel HDF5 to run. In my first

Re: [OMPI users] Can't get a fully functional openmpi build on Mac OSX Mavericks

2014-01-17 Thread Ralph Castain
Sorry for the delayed response - just getting back from travel. I don't know why you would get that behavior other than a race condition. Afraid that code path is foreign to me, but perhaps one of the folks in the MPI-IO area can respond. On Jan 15, 2014, at 4:26 PM, Ronald Cohen wrote: > Update:

Re: [OMPI users] cluster checkpoint error

2014-01-17 Thread Ralph Castain
with what version of OMPI? On Jan 17, 2014, at 3:23 AM, basma a.azeem wrote: > > > i am trying to run Blcr with Open mpi on a cluster of 4 nodes > blcr version 0.8.5 > when i run the command : > > mpirun -np 4 -am ft-enable-cr -hostfile hosts > /home/ubuntu//N/NPB3.3-MPI/bin/bt.B.4 >

[OMPI users] MPI hangs when application compiled with -O3, runs fine with -O0

2014-01-17 Thread Julien Bodart
version: 1.6.5 (compiled with Intel compilers) command used: mpirun --machinefile mfile --debug-daemons -np 16 myapp Description of the problem: When myapp is compiled without optimizations, everything runs fine; if compiled with -O3, the application hangs. I cannot reproduce the problem with

[OMPI users] FW: cluster checkpoint error

2014-01-17 Thread basma a . azeem
I am trying to run BLCR with Open MPI on a cluster of 4 nodes (BLCR version 0.8.5). When I run the command: mpirun -np 4 -am ft-enable-cr -hostfile hosts /home/ubuntu//N/NPB3.3-MPI/bin/bt.B.4 I get this error: - It looks like opal_init failed for some reason;

Re: [OMPI users] Calling a variable from another processor

2014-01-17 Thread Pradeep Jha
Thanks a ton Christoph. That helps a lot. 2014/1/17 Christoph Niethammer > Hello, > > Find attached a minimal example - hopefully doing what you intended. > > Regards > Christoph > > -- > > Christoph Niethammer > High Performance Computing Center Stuttgart (HLRS) > Nobelstrasse 19 > 70569 Stutt