Re: [OMPI users] SIGSEV when running OMPI Java binding

2014-03-11 Thread Saliya Ekanayake
I forgot to mention that I tried the hello.c version instead of Java and it too failed in a similar manner, but: (1) on a single node with --mca btl ^tcp it went up to 24 procs before failing; (2) on 8 nodes with --mca btl ^tcp it could go only up to 16 procs. On Tue, Mar 11, 2014 at 5:06 PM, Saliya

Re: [OMPI users] incorrect verbose output in bind_downwards

2014-03-11 Thread Ralph Castain
Good catch - thanks Ralph On Mar 11, 2014, at 5:29 PM, tmish...@jcity.maeda.co.jp wrote: > > > Ralph, sorry. I missed a problem in the hwloc_base_util.c file. > The "static int build_map" still depends on the opal_hwloc_topology. > (Please see attached patch file) > > (See attached file: patch

Re: [OMPI users] incorrect verbose output in bind_downwards

2014-03-11 Thread tmishima
Ralph, sorry. I missed a problem in the hwloc_base_util.c file. The "static int build_map" still depends on the opal_hwloc_topology. (Please see attached patch file) (See attached file: patch.hwloc_base_util) Tetsuya > Ralph, sorry for late confirmation. It worked for me, thanks. > > Tetsuya >

[OMPI users] FW: LOCAL QP OPERATION ERROR

2014-03-11 Thread Joshua Ladd
Hi, Vince Have you tried with a different BTL? In particular, have you tried with the TCP BTL? Please try setting "-mca btl sm,self,tcp" and see if you still run into the issue. How is your OMPI configured? Josh > From: Vince Grimes > Subject: [OMPI users] LOCAL QP OPERATION ERROR > Date

Re: [OMPI users] incorrect verbose output in bind_downwards

2014-03-11 Thread tmishima
Ralph, sorry for late confirmation. It worked for me, thanks. Tetsuya > I fear that would be a bad thing to do as it would disrupt mpirun's operations. However, I did fix the problem by adding the topology as a param to the pretty-print functions. Please see: > > https://svn.open-mpi.org/trac/o

Re: [OMPI users] SIGSEV when running OMPI Java binding

2014-03-11 Thread Saliya Ekanayake
I just tested with "ml" turned off as you suggested, but unfortunately it didn't solve the issue. However, I found that by explicitly setting --mca btl ^tcp the code worked on up to 4 nodes with each running 8 procs. If I don't specify this it'll simply fail even on one node with 8 procs. Thank yo

Re: [OMPI users] SIGSEV when running OMPI Java binding

2014-03-11 Thread Jeff Squyres (jsquyres)
Looks like we still have a bug in one of our components -- can you try: mpirun --mca coll ^ml ... This will deactivate the "ml" collective component. See if that enables you to run (this particular component has nothing to do with Java). On Mar 11, 2014, at 1:33 AM, Saliya Ekanayake wrot

Re: [OMPI users] ssh error

2014-03-11 Thread raha khalili
Many thanks to Mehdi and Reuti for your help. On Tue, Mar 11, 2014 at 3:46 PM, Mehdi Rahmani wrote: > Hi > use --hostfile or --machinefile in your command > mpirun *--hostfile* texthost -np 2 /home/client3/espresso-5.0.2/bin/pw.x > -in AdnAu.rx.in | tee AdnAu.rx.out > > > On Tue, Mar 11, 2014

Re: [OMPI users] Compiling Open MPI 1.7.4 using PGI 14.2 and Mellanox HCOLL enabled

2014-03-11 Thread Jeff Squyres (jsquyres)
On Mar 11, 2014, at 11:22 AM, Åke Sandgren wrote: >>> ../configure CC=pgcc CXX=pgCC FC=pgf90 F90=pgf90 >>> --prefix=/usr/local/Cluster-Users/fs395/openmpi-1.7.4/pgi-14.2_cuda-6.0RC >>> --enable-mpirun-prefix-by-default --with-hcoll=$HCOLL_DIR >>> --with-fca=$FCA_DIR --with-mxm=$MXM_DIR --wit

Re: [OMPI users] Compiling Open MPI 1.7.4 using PGI 14.2 and Mellanox HCOLL enabled

2014-03-11 Thread Åke Sandgren
On 03/11/2014 04:12 PM, Jeff Squyres (jsquyres) wrote: I don't see the config.log and make.log attached - can you send all the info requested here (including config.log and config.out): http://www.open-mpi.org/community/help/ Can you also send "make V=1" output as well? On Feb 25, 2014,

Re: [OMPI users] Compiling Open MPI 1.7.4 using PGI 14.2 and Mellanox HCOLL enabled

2014-03-11 Thread Jeff Squyres (jsquyres)
I don't see the config.log and make.log attached - can you send all the info requested here (including config.log and config.out): http://www.open-mpi.org/community/help/ Can you also send "make V=1" output as well? On Feb 25, 2014, at 6:22 PM, Filippo Spiga wrote: > Dear all, > > I cam

Re: [OMPI users] Problems with computation-communication overlap in non-blocking mode

2014-03-11 Thread Velickovic Nikola
Alex, Jeff thanks for your answers. I understand better now how OMPI works. The results I see now make much more sense. Best, Nikola From: users [users-boun...@open-mpi.org] on behalf of Jeff Squyres (jsquyres) [jsquy...@cisco.com] Sent: Tuesday, March

Re: [OMPI users] Problems with computation-communication overlap in non-blocking mode

2014-03-11 Thread Jeff Squyres (jsquyres)
Yes, you're seeing more-or-less the expected behavior. It's a complicated issue. Short version: you might want to sprinkle MPI_Test calls throughout your compute stage to get true overlap. More detail: an MPI implementation typically uses a "rendezvous" protocol for large messages, meaning that it sends a small f
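
The advice above, splitting the compute stage into chunks and testing the outstanding request between them, can be sketched without any MPI dependency. This is only an illustration under stated assumptions: `compute_with_polling` and `FakeRequest` are hypothetical names that do not appear in the thread, and `FakeRequest` merely stands in for a non-blocking request that needs several progress calls before it completes. In a real program the `poll` callable would wrap MPI_Test on the pending request (with mpi4py, presumably the request's `Test()` method), followed by a final wait after the loop.

```python
from typing import Callable, Iterable, List


def compute_with_polling(
    chunks: Iterable[Callable[[], None]],
    poll: Callable[[], bool],
) -> int:
    """Run each work chunk and call poll() after it until poll() reports completion.

    Returns the number of poll() calls made. Once the request is complete,
    the remaining chunks run without further polling.
    """
    polls = 0
    done = False
    for chunk in chunks:
        chunk()  # one slice of the compute stage
        if not done:
            done = poll()  # give the (hypothetical) request a chance to progress
            polls += 1
    return polls


class FakeRequest:
    """Hypothetical stand-in for a non-blocking request.

    It reports completion only after `polls_needed` calls to test(),
    mimicking a transfer that advances only when the library is entered.
    """

    def __init__(self, polls_needed: int) -> None:
        self.polls_needed = polls_needed
        self.polls_seen = 0

    def test(self) -> bool:
        self.polls_seen += 1
        return self.polls_seen >= self.polls_needed


# Demo: five compute chunks, and a request that needs three progress calls.
results: List[int] = []
chunks = [lambda i=i: results.append(i * i) for i in range(5)]
request = FakeRequest(polls_needed=3)
polls = compute_with_polling(chunks, request.test)
print(polls, results)
```

The chunk size is the tuning knob here: smaller chunks mean more frequent polling and so more chances for the transfer to progress, at the cost of more time spent in the polling calls themselves.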

Re: [OMPI users] SIGSEV when running OMPI Java binding

2014-03-11 Thread Ralph Castain
Seems odd - the Java code is passing all tests on my Linux boxes. A quick glance shows it failing on memcpy on your machine during MPI_Init, which would make one suspect either an uninitialized variable or something not getting loaded correctly. Oscar, Jose? Any thoughts? Ralph On Mar 10, 201

Re: [OMPI users] ssh error

2014-03-11 Thread Mehdi Rahmani
Hi use --hostfile or --machinefile in your command mpirun *--hostfile* texthost -np 2 /home/client3/espresso-5.0.2/bin/pw.x -in AdnAu.rx.in | tee AdnAu.rx.out On Tue, Mar 11, 2014 at 1:35 PM, raha khalili wrote: > Dear users > > I want to run a quantum espresso program (with passwordless ssh). I

Re: [OMPI users] ssh error

2014-03-11 Thread Reuti
Hi, On 11.03.2014 at 11:05, raha khalili wrote: > I want to run a quantum espresso program (with passwordless ssh). I prepared > a hostfile named 'texthost' at my input directory. I get this error when I > run the program: > > texthost: > # This is a hostfile. > # I have 4 syetems are parall

[OMPI users] ssh error

2014-03-11 Thread raha khalili
Dear users I want to run a quantum espresso program (with passwordless ssh). I prepared a hostfile named 'texthost' at my input directory. I get this error when I run the program: texthost: # This is a hostfile. # I have 4 syetems are paralleled by mpich2 # The following nodes are that machines I

Re: [OMPI users] Problems with computation-communication overlap in non-blocking mode

2014-03-11 Thread Alex A. Granovsky
Dear Nikola, you can check this presentation: http://classic.chem.msu.su/gran/gamess/mp2par.pdf for the solution we have been using with Firefly (formerly PC GAMESS) for more than the last ten years. Hope this helps. Kind regards, Alex Granovsky -Original Message- From: Velickovic N

Re: [OMPI users] SIGSEV when running OMPI Java binding

2014-03-11 Thread Saliya Ekanayake
Just tested that this happens even with the simple Hello.java program given in the OMPI distribution. I've made a tarball containing details of the error adhering to http://www.open-mpi.org/community/help/. Please let me know if I have missed any info necessary. Thank you, Saliya On Mon, Mar 10,