Re: [OMPI users] Problem with OpenMPI as Third party library

2022-08-10 Thread Benson Muite via users
Hello, I tried what is explained there (changed OPAL_PREFIX to the new location and changed the rpath of my executable); I even added the flags --with-hwloc and the other one becaus

Re: [OMPI users] Problem with OpenMPI as Third party library

2022-08-09 Thread Jeff Squyres (jsquyres) via users
I can't see the image that you sent; it seems to be broken. But I think you're asking about this: https://www.open-mpi.org/faq/?category=building#installdirs -- Jeff Squyres jsquy...@cisco.com From: users on behalf of Sebastian Gutierrez via users Sent: Tuesda

Re: [OMPI users] Problem in starting openmpi job - no output just hangs - SOLVED

2020-09-01 Thread Tony Ladd via users
Jeff I found the solution - rdma needs significant memory so the limits on the shell have to be increased. I needed to add the lines * soft memlock unlimited * hard memlock unlimited to the end of the file /etc/security/limits.conf. After that the openib driver loads and everything is fine -
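The memlock settings described above, laid out as they would appear in /etc/security/limits.conf (a sketch of the fix the post describes; a re-login is needed before the new limits take effect):

```
# /etc/security/limits.conf — raise locked-memory limits for RDMA
# "*" applies the limit to all users
*    soft    memlock    unlimited
*    hard    memlock    unlimited
```

After logging in again, `ulimit -l` should report `unlimited`.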

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-25 Thread Jeff Squyres (jsquyres) via users
On Aug 24, 2020, at 9:44 PM, Tony Ladd wrote: > > I appreciate your help (and John's as well). At this point I don't think this is > an OMPI problem - my mistake. I think the communication with RDMA is somehow > disabled (perhaps it's the verbs layer - I am not very knowledgeable about > this). It us

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-25 Thread John Hearns via users
I apologise. That was an Omnipath issue https://www.beowulf.org/pipermail/beowulf/2017-March/034214.html On Tue, 25 Aug 2020 at 08:17, John Hearns wrote: > Aha. I dimly remember a problem with the ibverbs /dev device - maybe the > permissions, > or more likely the owner account for that device.

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-25 Thread John Hearns via users
Aha. I dimly remember a problem with the ibverbs /dev device - maybe the permissions, or more likely the owner account for that device. On Tue, 25 Aug 2020 at 02:44, Tony Ladd wrote: > Hi Jeff > > I appreciate your help (and John's as well). At this point I don't think > is an OMPI problem - m

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-24 Thread Tony Ladd via users
Hi Jeff I appreciate your help (and John's as well). At this point I don't think this is an OMPI problem - my mistake. I think the communication with RDMA is somehow disabled (perhaps it's the verbs layer - I am not very knowledgeable about this). It used to work like a dream but Mellanox has appare

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-24 Thread Jeff Squyres (jsquyres) via users
I'm afraid I don't have many better answers for you. I can't quite tell from your machines, but are you running IMB-MPI1 Sendrecv *on a single node* with `--mca btl openib,self`? I don't remember offhand, but I didn't think that openib was supposed to do loopback communication. E.g., if both M

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-23 Thread Tony Ladd via users
Hi John Thanks for the response. I have run all those diagnostics, and as best I can tell the IB fabric is OK. I have a cluster of 49 nodes (48 clients + server) and the fabric passes all the tests. There is 1 warning: I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps SL:0x

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-23 Thread John Hearns via users
Tony, start at a low level. Is the Infiniband fabric healthy? Run ibstatus on every node sminfo on one node ibdiagnet on one node On Sun, 23 Aug 2020 at 05:02, Tony Ladd via users wrote: > Hi Jeff > > I installed ucx as you suggested. But I can't get even the simplest code > (ucp_client_server
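The three checks John lists can be run as follows (a sketch; the tools ship in the standard InfiniBand diagnostics packages, e.g. infiniband-diags, and typically need root):

```
# on every node: port state, rate, and link layer of each HCA
ibstatus
# on one node: confirm a subnet manager is running and reachable
sminfo
# on one node: full fabric sweep, reports bad links and error counters
ibdiagnet
```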

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-22 Thread Tony Ladd via users
Hi Jeff I installed ucx as you suggested. But I can't get even the simplest code (ucp_client_server) to work across the network. I can compile openMPI with UCX but it has the same problem - mpi codes will not execute and there are no messages. Really, UCX is not helping. It is adding another

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-19 Thread Jeff Squyres (jsquyres) via users
Tony -- Have you tried compiling Open MPI with UCX support? This is Mellanox (NVIDIA's) preferred mechanism for InfiniBand support these days -- the openib BTL is legacy. You can run: mpirun --mca pml ucx ... > On Aug 19, 2020, at 12:46 PM, Tony Ladd via users > wrote: > > One other updat
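The suggested command line, sketched with a hypothetical application name (assumes Open MPI was built with UCX support, e.g. configured with --with-ucx):

```
# select the UCX PML instead of the legacy openib BTL
mpirun --mca pml ucx -np 4 ./my_mpi_app
```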

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-19 Thread Tony Ladd via users
One other update. I compiled OpenMPI-4.0.4. The outcome was the same, but there is no mention of ibv_obj this time. Tony -- Tony Ladd Chemical Engineering Department University of Florida Gainesville, Florida 32611-6005 USA Email: tladd-"(AT)"-che.ufl.edu Web: http://ladd.che.ufl.edu Tel:

Re: [OMPI users] Problem in starting openmpi job - no output just hangs

2020-08-17 Thread Tony Ladd via users
My apologies - I did not read the FAQ's carefully enough - with regard to 14: 1. openib 2. Ubuntu supplied drivers etc. 3. Ubuntu 18.04  4.15.0-112-generic 4. opensm-3.3.5_mlnx-0.1.g6b18e73 5. Attached 6. Attached 7. unlimited on foam and 16384 on f34 I changed the ulimit to unlimited on

Re: [OMPI users] Problem with open-mpi installation

2020-06-05 Thread Jeff Squyres (jsquyres) via users
Are you actually running into a problem? A successful install may still end with "Nothing to be done..." messages. On Jun 5, 2020, at 10:48 AM, Edris Tajfirouzeh via users wrote: Dear Operator I'm trying to install the open-mpi package on my Mac, Catalina version

Re: [OMPI users] problem with cancelling Send-Request

2019-10-02 Thread Jeff Hammond via users
“Supposedly faster” isn’t a particularly good reason to change MPI implementations but canceling sends is hard for reasons that have nothing to do with performance. Also, I’d not be so eager to question the effectiveness of Open-MPI on InfiniBand. Check the commit logs for Mellanox employees some

Re: [OMPI users] problem with cancelling Send-Request

2019-10-02 Thread Jeff Hammond via users
Don’t try to cancel sends. https://github.com/mpi-forum/mpi-issues/issues/27 has some useful info. Jeff On Wed, Oct 2, 2019 at 7:17 AM Christian Von Kutzleben via users < users@lists.open-mpi.org> wrote: > Hi, > > I’m currently evaluating to use openmpi (4.0.1) in our application. > > We are us

Re: [OMPI users] problem with cancelling Send-Request

2019-10-02 Thread Emyr James via users
Hi Christian, I would suggest using MVAPICH2 instead. It is supposedly faster than Open MPI on InfiniBand and it seems to have fewer options under the hood, which means fewer things you have to tweak to get it working for you. Regards, Emyr James Head of Scientific IT CRG -Centre for Genomic R

Re: [OMPI users] Problem running with UCX/oshmem on single node?

2018-05-14 Thread Michael Di Domenico
On Wed, May 9, 2018 at 9:45 PM, Howard Pritchard wrote: > > You either need to go and buy a connectx4/5 HCA from mellanox (and maybe a > switch), and install that > on your system, or else install xpmem (https://github.com/hjelmn/xpmem). > Note there is a bug right now > in UCX that you may hit if

Re: [OMPI users] problem

2018-05-10 Thread dpchoudh
What Jeff is suggesting is probably valgrind. However, in my experience, which is much less than most OpenMPI developers, a simple code inspection often is adequate. Here are the steps: 1. If you don't already have it, build a debug version of your code. If you are using gcc, you'd use a -g to CFL
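The first steps above can be sketched as below (file names are hypothetical; valgrind availability is assumed):

```
# 1. build a debug version: -g adds symbols, -O0 keeps stack traces readable
mpicc -g -O0 -o myapp myapp.c
# 2. run each rank under valgrind, one log file per process (%p = PID)
mpirun -np 2 valgrind --log-file=vg.%p.log ./myapp
```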

Re: [OMPI users] problem

2018-05-10 Thread Ankita m
ok...Thank you so much sir On Wed, May 9, 2018 at 11:13 PM, Jeff Squyres (jsquyres) wrote: > It looks like you're getting a segv when calling MPI_Comm_rank(). > > This is quite unusual -- MPI_Comm_rank() is just a local lookup / return > of an integer. If MPI_Comm_rank() is seg faulting, it usu

Re: [OMPI users] Problem running with UCX/oshmem on single node?

2018-05-09 Thread Howard Pritchard
Hi Craig, You are experiencing problems because you don't have a transport installed that UCX can use for oshmem. You either need to go and buy a connectx4/5 HCA from mellanox (and maybe a switch), and install that on your system, or else install xpmem (https://github.com/hjelmn/xpmem). Note ther

Re: [OMPI users] problem

2018-05-09 Thread Jeff Squyres (jsquyres)
It looks like you're getting a segv when calling MPI_Comm_rank(). This is quite unusual -- MPI_Comm_rank() is just a local lookup / return of an integer. If MPI_Comm_rank() is seg faulting, it usually indicates that there's some other kind of memory error in the application, and this seg fault

Re: [OMPI users] problem

2018-05-09 Thread Ankita m
yes. Because previously i was using intel-mpi. That time the program was running perfectly. Now when i use openmpi this shows this error files...Though i am not quite sure. I just thought if the issue will be for Openmpi then i could get some help here. On Wed, May 9, 2018 at 6:47 PM, Gilles Gouai

Re: [OMPI users] problem

2018-05-09 Thread Gilles Gouaillardet
Ankita, Do you have any reason to suspect the root cause of the crash is Open MPI ? Cheers, Gilles On Wednesday, May 9, 2018, Ankita m wrote: > MPI "Hello World" program is also working > > please see this error file attached below. its of a different program > > On Wed, May 9, 2018 at 4:10 P

Re: [OMPI users] problem

2018-05-09 Thread Ankita m
MPI "Hello World" program is also working please see this error file attached below. its of a different program On Wed, May 9, 2018 at 4:10 PM, John Hearns via users < users@lists.open-mpi.org> wrote: > Ankita, looks like your program is not launching correctly. > I would try the following: > de

Re: [OMPI users] problem

2018-05-09 Thread John Hearns via users
Ankita, looks like your program is not launching correctly. I would try the following: define two hosts in a machinefile. Use mpirun -np 2 machinefile date Ie can you use mpirun just to run the command 'date' Secondly compile up and try to run an MPI 'Hello World' program On 9 May 2018 at 12:
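John's two-step sanity check, sketched with hypothetical host names (note that mpirun takes the machine file via the -machinefile flag, which the preview above elides):

```
# machinefile with two hosts (hypothetical names)
printf 'node01\nnode02\n' > machinefile
# step 1: can mpirun launch a plain command on both hosts?
mpirun -np 2 -machinefile machinefile date
# step 2: compile and run an MPI hello-world across them
mpicc -o hello hello.c
mpirun -np 2 -machinefile machinefile ./hello
```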

Re: [OMPI users] problem related ORTE

2018-04-06 Thread Jeff Squyres (jsquyres)
Can you please send all the information listed here: https://www.open-mpi.org/community/help/ Thanks! > On Apr 6, 2018, at 8:27 AM, Ankita m wrote: > > Hello Sir/Madam > > I am Ankita Maity, a PhD scholar from Mechanical Dept., IIT Roorkee, India > > I am facing a problem while submitti

Re: [OMPI users] Problem related to openmpi cart create command

2017-12-03 Thread Gilles Gouaillardet
Hi, There is not enough information to help. Can you build a minimal example that evidences the issue and states how many MPI tasks are needed to evidence this issue ? Cheers, Gilles On Sun, Dec 3, 2017 at 6:00 PM, Muhammad Umar wrote: > Hello, hope everyone is fine. > > > I have been given

Re: [OMPI users] Problem with MPI jobs terminating when using OMPI 3.0.x

2017-10-31 Thread Andy Riebs
As always, thanks for your help Ralph! Cutting over to PMIx 1.2.4 solved the problem for me. (Slurm wasn't happy building with PMIx v2.) And yes, I had ssh access to node04. (And Gilles, thanks for your note, as well.) Andy On 10/27/2017 04:31 PM, r...@open-mpi.org wrote: Two questions: 1

Re: [OMPI users] Problem with MPI jobs terminating when using OMPI 3.0.x

2017-10-29 Thread Gilles Gouaillardet
Andy, The crash occurs in the orted daemon and not in the mpi_hello MPI app, so you will not see anything useful in gdb. you can use the attached launch agent script in order to get a stack trace of orted. your mpirun command line should be updated like this mpirun --mca orte_launch_agent

Re: [OMPI users] Problem with MPI jobs terminating when using OMPI 3.0.x

2017-10-27 Thread r...@open-mpi.org
Two questions: 1. are you running this on node04? Or do you have ssh access to node04? 2. I note you are building this against an old version of PMIx for some reason. Does it work okay if you build it with the embedded PMIx (which is 2.0)? Does it work okay if you use PMIx v1.2.4, the latest re

Re: [OMPI users] Problem with MPI_FILE_WRITE_AT

2017-09-15 Thread Edgar Gabriel
thank you for the report and the code, I will look into this. What file system is that occurring on? Until I find the problem, note that you could switch to back to the previous parallel I/O implementation (romio) by providing that as a parameter to your mpirun command, e.g. mpirun --mca io
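The mpirun parameter is truncated in the preview; as a hedged sketch, the ROMIO fallback in the 3.x-era releases was selected roughly like this (the exact component name is an assumption here and varies by release):

```
# fall back from OMPIO to the ROMIO-based MPI-IO implementation
mpirun --mca io romio314 -np 4 ./my_io_app
```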

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-31 Thread Gilles Gouaillardet
Thanks Siegmar, i was finally able to reproduce it. the error is triggered by the VM topology, and i was able to reproduce it by manually removing the "NUMA" objects from the topology. as a workaround, you can mpirun --map-by socket ... i will follow-up on the devel ML with Ralph. Bes

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-31 Thread Siegmar Gross
Hi Gilles, On 31.05.2017 at 08:38, Gilles Gouaillardet wrote: Siegmar, the "big ORTE update" is a bunch of backports from master to v3.x btw, does the same error occur with master ? Yes, it does, but the error occurs only if I use a real machine with my virtual machine "exin". I get the e

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread Gilles Gouaillardet
Siegmar, the "big ORTE update" is a bunch of backports from master to v3.x btw, does the same error occur with master ? i noted mpirun simply does ssh exin orted ... can you double check the right orted (e.g. /usr/local/openmpi-3.0.0_64_cc/bin/orted) or you can try to mpirun --mca orte

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread Siegmar Gross
Hi Gilles, I configured Open MPI with the following command. ../openmpi-v3.x-201705250239-d5200ea/configure \ --prefix=/usr/local/openmpi-3.0.0_64_cc \ --libdir=/usr/local/openmpi-3.0.0_64_cc/lib64 \ --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \ --with-jdk-headers=/usr/local/jdk1.8.0_66

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread r...@open-mpi.org
Until the fixes pending in the big ORTE update PR are committed, I suggest not wasting time chasing this down. I tested the “patched” version of the 3.x branch, and it works just fine. > On May 30, 2017, at 7:43 PM, Gilles Gouaillardet wrote: > > Ralph, > > > the issue Siegmar initially rep

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread Gilles Gouaillardet
Ralph, the issue Siegmar initially reported was loki hello_1 111 mpiexec -np 3 --host loki:2,exin hello_1_mpi per what you wrote, this should be equivalent to loki hello_1 111 mpiexec -np 3 --host loki:2,exin:1 hello_1_mpi and this is what i initially wanted to double check (but i made a ty

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread r...@open-mpi.org
This behavior is as-expected. When you specify "-host foo,bar”, you have told us to assign one slot to each of those nodes. Thus, running 3 procs exceeds the number of slots you assigned. You can tell it to set the #slots to the #cores it discovers on the node by using “-host foo:*,bar:*” I ca
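The slot semantics described above, sketched with hypothetical host names:

```
# one slot per host is assumed, so -np 3 oversubscribes two hosts:
mpirun -host node01,node02 -np 3 ./a.out        # fails: only 2 slots
# explicit slot counts make 3 slots available:
mpirun -host node01:2,node02:1 -np 3 ./a.out
# or let each host contribute one slot per discovered core:
mpirun -host 'node01:*,node02:*' -np 3 ./a.out
```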

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread gilles
Hi Siegmar, my bad, there was a typo in my reply. i really meant > > what if you ? > > mpiexec --host loki:2,exin:1 -np 3 hello_1_mpi but you also tried that and it did not help. i could not find anything in your logs that suggest mpiexec tries to start 5 MPI tasks, did i miss something ? i w

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread Siegmar Gross
Hi Gilles, what if you ? mpiexec --host loki:1,exin:1 -np 3 hello_1_mpi I need as many slots as processes so that I use "-np 2". "mpiexec --host loki,exin -np 2 hello_1_mpi" works as well. The command breaks, if I use at least "-np 3" and distribute the processes across at least two machines.

Re: [OMPI users] problem with "--host" with openmpi-v3.x-201705250239-d5200ea

2017-05-30 Thread gilles
Hi Siegmar, what if you ? mpiexec --host loki:1,exin:1 -np 3 hello_1_mpi are loki and exin different ? (os, sockets, core) Cheers, Gilles - Original Message - > Hi, > > I have installed openmpi-v3.x-201705250239-d5200ea on my "SUSE Linux > Enterprise Server 12.2 (x86_64)" with Sun C

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-21 Thread Jing Gong
Hi, The email is intended to follow the thread about "Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch". https://mail-archive.com/users@lists.open-mpi.org/msg30650.html We have installed the latest version v2.0.2 on the cluster that

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread Anastasia Kruchinina
Ok, thanks for your answers! I was not aware that it is a known issue. I guess I will just try to find a machine with OpenMPI/2.0.2 and try there. On 16 February 2017 at 00:01, r...@open-mpi.org wrote: > Yes, 2.0.1 has a spawn issue. We believe that 2.0.2 is okay if you want to > give it a try

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
Yes, 2.0.1 has a spawn issue. We believe that 2.0.2 is okay if you want to give it a try Sent from my iPad > On Feb 15, 2017, at 1:14 PM, Jason Maldonis wrote: > > Just to throw this out there -- to me, that doesn't seem to be just a problem > with SLURM. I'm guessing the exact same error wo

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread Jason Maldonis
Just to throw this out there -- to me, that doesn't seem to be just a problem with SLURM. I'm guessing the exact same error would be thrown interactively (unless I didn't read the above messages carefully enough). I had a lot of problems running spawned jobs on 2.0.x a few months ago, so I switched

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread Anastasia Kruchinina
Hi! I am doing like this: sbatch -N 2 -n 5 ./job.sh where job.sh is: #!/bin/bash -l module load openmpi/2.0.1-icc mpirun -np 1 ./manager 4 On 15 February 2017 at 17:58, r...@open-mpi.org wrote: > The cmd line looks fine - when you do your “sbatch” request, what is in > the shell scrip
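The batch script quoted above, laid out on separate lines (contents as given in the message):

```
#!/bin/bash -l
# job.sh — submitted with: sbatch -N 2 -n 5 ./job.sh
module load openmpi/2.0.1-icc
# one master rank; it spawns 4 workers via MPI_Comm_spawn
mpirun -np 1 ./manager 4
```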

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
The cmd line looks fine - when you do your “sbatch” request, what is in the shell script you give it? Or are you saying you just “sbatch” the mpirun cmd directly? > On Feb 15, 2017, at 8:07 AM, Anastasia Kruchinina > wrote: > > Hi, > > I am running like this: > mpirun -np 1 ./manager > >

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread Anastasia Kruchinina
Hi, I am running like this: mpirun -np 1 ./manager Should I do it differently? I also thought that all sbatch does is create an allocation and then run my script in it. But it seems it is not since I am getting these results... I would like to upgrade to OpenMPI, but no clusters near me have it

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread Howard Pritchard
Hi Anastasia, Definitely check the mpirun when in batch environment but you may also want to upgrade to Open MPI 2.0.2. Howard r...@open-mpi.org schrieb am Mi. 15. Feb. 2017 um 07:49: > Nothing immediate comes to mind - all sbatch does is create an allocation > and then run your script in it.

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
Nothing immediate comes to mind - all sbatch does is create an allocation and then run your script in it. Perhaps your script is using a different “mpirun” command than when you type it interactively? > On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina > wrote: > > Hi, > > I am trying to us

Re: [OMPI users] problem with opal_list_remove_item for openmpi-v2.x-201702010255-8b16747 on Linux

2017-02-03 Thread Jeff Squyres (jsquyres)
I've filed this as https://github.com/open-mpi/ompi/issues/2920. Ralph is just heading out for about a week or so; it may not get fixed until he comes back. > On Feb 3, 2017, at 2:03 AM, Siegmar Gross > wrote: > > Hi, > > I have installed openmpi-v2.x-201702010255-8b16747 on my "SUSE Linux

Re: [OMPI users] Problem with double shared library

2016-10-28 Thread Sean Ahern
Gilles, You described the problem exactly. I think we were able to nail down a solution to this one through judicious use of the -rpath $MPI_DIR/lib linker flag, allowing the runtime linker to properly find OpenMPI symbols at runtime. We're operational. Thanks for your help. -Sean -- Sean Ahern
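The linker fix described can be sketched as follows (library and object names are taken from the thread; $MPI_DIR pointing at the Open MPI install prefix is an assumption):

```
# embed the Open MPI library path in the shared object so the runtime
# linker resolves MPI symbols without relying on LD_LIBRARY_PATH
mpicc -shared -o libtransport_mpi.so transport_mpi.o \
      -Wl,-rpath,$MPI_DIR/lib
```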

Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-24 Thread Gilles Gouaillardet
Brice, unless you want to enable/disable nvml at runtime, and assuming we do not need nvml in Open MPI, and IMHO, the easiest workaround is to update https://github.com/open-mpi/ompi/blob/master/opal/mca/hwloc/hwloc1113/configure.m4 and add the oneliner enable_nvml=no a better option cou

Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-24 Thread Brice Goglin
FWIW, I am still open to implementing something to workaround this in hwloc. Could be shell variable such as HWLOC_DISABLE_NVML=yes for all our major configured dependencies. Brice Le 24/10/2016 02:12, Gilles Gouaillardet a écrit : > Justin, > > > iirc, NVML is only used by hwloc (e.g. not by C

Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-23 Thread Gilles Gouaillardet
Justin, iirc, NVML is only used by hwloc (e.g. not by CUDA) and there is no real benefit for having that. as a workaround, you can export enable_nvml=no and then configure && make install Cheers, Gilles On 10/20/2016 12:49 AM, Jeff Squyres (jsquyres) wrote: Justin -- Fair point. Can y
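Gilles's workaround, sketched (the install prefix is hypothetical):

```
# tell hwloc's configure logic to skip NVML before building Open MPI
export enable_nvml=no
./configure --prefix=/opt/openmpi
make && make install
```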

Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-19 Thread Jeff Squyres (jsquyres)
Justin -- Fair point. Can you work with Sylvain Jeaugey (at Nvidia) to submit a pull request for this functionality? Thanks. > On Oct 18, 2016, at 2:26 PM, Justin Luitjens wrote: > > After looking into this a bit more it appears that the issue is I am building > on a head node which does n

Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-18 Thread Justin Luitjens
After looking into this a bit more it appears that the issue is I am building on a head node which does not have the driver installed. Building on back node resolves this issue. In CUDA 8.0 the NVML stubs can be found in the toolkit at the following path: ${CUDA_HOME}/lib64/stubs For 8.0 I'd

Re: [OMPI users] Problem with double shared library

2016-10-17 Thread Gilles Gouaillardet
Sean, if i understand correctly, your built a libtransport_mpi.so library that depends on Open MPI, and your main program dlopen libtransport_mpi.so. in this case, and at least for the time being, you need to use RTLD_GLOBAL in your dlopen flags. Cheers, Gilles On 10/18/2016 4:53 AM,

Re: [OMPI users] Problem running an MPI program through the PBS manager

2016-09-26 Thread Mahmood Naderan
OK thank you very much. It is now running... Regards, Mahmood On Mon, Sep 26, 2016 at 2:04 PM, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > Mahmood, > > The node is defined in the PBS config, however it is not part of the > allocation (e.g. job) so it cannot be used, and hence

Re: [OMPI users] Problem running an MPI program through the PBS manager

2016-09-26 Thread Gilles Gouaillardet
Mahmood, The node is defined in the PBS config, however it is not part of the allocation (e.g. job) so it cannot be used, and hence the error message. In your PBS script, you do not need -np nor -host parameters to your mpirun command. Open MPI mpirun will automatically detect it is launched from
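A minimal PBS script following Gilles's advice (the resource directives and program name are hypothetical):

```
#!/bin/bash
#PBS -l nodes=2:ppn=4
cd "$PBS_O_WORKDIR"
# no -np or -host needed: mpirun detects the PBS allocation
# and starts one process per allocated slot
mpirun ./my_mpi_app
```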

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
OK. Running "module unload rocks-openmpi" and putting that in ~/.bashrc will remove /opt/openmpi/lib from LD_LIBRARY_PATH. Thanks Gilles for your help. Regards, Mahmood On Mon, Sep 12, 2016 at 1:25 PM, Mahmood Naderan wrote: > It seems that it is part of rocks-openmpi. I will find out how to

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
It seems that it is part of rocks-openmpi. I will find out how to remove it and will come back. Regards, Mahmood On Mon, Sep 12, 2016 at 1:06 PM, Gilles Gouaillardet wrote: > Mahmood, > > you need to manually remove /opt/openmpi/lib from your LD_LIBRARY_PATH > (or have your sysadmin do it if

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Gilles Gouaillardet
Mahmood, you need to manually remove /opt/openmpi/lib from your LD_LIBRARY_PATH (or have your sysadmin do it if this is somehow done automatically) the point of configuring with --enable-mpirun-prefix-by-default is you do *not* need to add /export/apps/siesta/openmpi-1.8.8/lib in your LD_LIBRA
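The recommended build, sketched with the prefix used in this thread:

```
# bake the install path into the binaries' rpath so LD_LIBRARY_PATH
# does not need to include /export/apps/siesta/openmpi-1.8.8/lib
./configure --prefix=/export/apps/siesta/openmpi-1.8.8 \
            --enable-mpirun-prefix-by-default
make && make install
```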

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
Is the following output OK? ... Making install in util make[2]: Entering directory `/export/apps/siesta/openmpi-1.8.8/test/util' make[3]: Entering directory `/export/apps/siesta/openmpi-1.8.8/test/util' make[3]: Nothing to be done for `install-exec-am'. make[3]: Nothing to be done for `install-da

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Gilles Gouaillardet
Mahmood, I was suggesting you (re)configure (i assume you did it) the Open MPI 1.8.8 installed in /export/apps/siesta/openmpi-1.8.8 with --enable-mpirun-prefix-by-default Cheers, Gilles On 9/12/2016 4:51 PM, Mahmood Naderan wrote: > --enable-mpirun-prefix-by-default What is that? Does t

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Gilles Gouaillardet
Basically, it means libs with be linked with -Wl,-rpath,/export/apps/siesta/openmpi-1.8.8/lib so if you run a.out with an empty $LD_LIBRARY_PATH, then it will look for the MPI libraries in /export/apps/siesta/openmpi-1.8.8/lib Cheers, Gilles On 9/12/2016 4:50 PM, Mahmood Naderan wrote:

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
> --enable-mpirun-prefix-by-default What is that? Does that mean "configure 1.8.8 with the default one installed on the system"? Then that is not good I think because # /opt/openmpi/bin/ompi_info Package: Open MPI root@centos-6-3.localdomain Distribution O

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Gilles Gouaillardet
That sounds good to me ! just to make it crystal clear ... assuming you configure'd your Open MPI 1.8.8 with --enable-mpirun-prefix-by-default (and if you did not, i do encourage you to do so), then all you need is to remove /opt/openmpi/lib from your LD_LIBRARY_PATH (e.g. you do *not* ha

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Mahmood Naderan
> (i'd like to make sure you are not using Intel MPI libmpi.so.1 with Open MPI libmpi_mpifh.so.2, that can happen if Intel MPI appears first in your LD_LIBRARY_PATH) # echo $LD_LIBRARY_PATH /opt/gridengine/lib/linux-x64:/opt/openmpi/lib # ls /opt/openmpi/lib libmpi.a libompitrace.a

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Gilles Gouaillardet
Hi, this is the relevant part of your config.log configure:1594: checking whether the Fortran compiler works configure:1600: ./a.out ./a.out: symbol lookup error: /export/apps/siesta/openmpi-1.8.8/lib/libmpi_mpifh.so.2: undefined symbol: mpi_fortran_weights_empty configure:1603: $? = 127 c

Re: [OMPI users] problem with exceptions in Java interface

2016-08-29 Thread Graham, Nathaniel Richard
Quoting the message of August 29, 2016 6:16 AM: Hi Siegmar, I will review PR 1698 and wait some more feedback from the developers, they might have different views than mine. assuming PR 1698 does what you expect, it does not catch all

Re: [OMPI users] problem with exceptions in Java interface

2016-08-29 Thread Gilles Gouaillardet
Hi Siegmar, I will review PR 1698 and wait some more feedback from the developers, they might have different views than mine. assuming PR 1698 does what you expect, it does not catch all user errors. for example, if you MPI_Send a buffer that is too short, the exception might be thrown at any time

Re: [OMPI users] problem with exceptions in Java interface

2016-08-29 Thread Siegmar Gross
Hi Gilles, isn't it possible to pass all exceptions from the Java interface to the calling method? I can live with the current handling of exceptions as well, although some exceptions can be handled within my program and some will break my program even if I want to handle exceptions myself. I und

Re: [OMPI users] problem with exceptions in Java interface

2016-08-29 Thread Gilles Gouaillardet
Siegmar and all, i am puzzled with this error. on one hand, it is caused by an invalid buffer (e.g. buffer size is 1, but user suggests size is 2) so i am fine with current behavior (e.g. java.lang.ArrayIndexOutOfBoundsException is thrown) /* if that was a C program, it would very likely S

Re: [OMPI users] Problem when installing Rmpi package in HPC cluster

2016-07-11 Thread Bennet Fauber
We have found that virtually all Rmpi jobs need to be started with $ mpirun -np 1 R CMD BATCH This is, as I understand it, because the first R will initialize the MPI environment and then when you create the cluster, it wants to be able to start the rest of the processes. When you intialize
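The Rmpi launch pattern described above, sketched with a hypothetical script name:

```
# start a single R process; Rmpi itself spawns the worker ranks
mpirun -np 1 R CMD BATCH my_rmpi_script.R
```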

Re: [OMPI users] Problem when installing Rmpi package in HPC cluster

2016-07-11 Thread Gilles Gouaillardet
Note this is just a workaround, this simply disables the mxm mtl (e.g. Mellanox optimized infiniband driver). basically, there are two ways to run a single task mpi program (a.out) - mpirun -np 1 ./a.out (this is the "standard" way) - ./a.out (aka singleton mode) the logs you posted do not s

Re: [OMPI users] Problem when installing Rmpi package in HPC cluster

2016-07-11 Thread pan yang
Dear Gilles, I tried export OMPI_MCA_pml=ob1, and it worked! Thank you very much for your brilliant suggestion. By the way, I don't really understand what do you mean by '*can you also extract the command tha launch the test ?*'... Cheers, Pan

Re: [OMPI users] Problem when installing Rmpi package in HPC cluster

2016-07-11 Thread Gilles Gouaillardet
That could be specific to mtl/mxm could you export OMPI_MCA_pml=ob1 and try again ? can you also extract the command tha launch the test ? I am curious whether this is via mpirun or as a singleton Cheers, Gilles On Monday, July 11, 2016, pan yang wrote: > Dear OpenMPI community, > > I faced
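The suggested experiment, sketched (the application name is hypothetical):

```
# force the ob1 PML, bypassing the Mellanox mxm MTL, for this session
export OMPI_MCA_pml=ob1
mpirun -np 2 ./a.out
# equivalent per-run form:
mpirun --mca pml ob1 -np 2 ./a.out
```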

Re: [OMPI users] problem with exceptions in Java interface

2016-05-24 Thread Howard Pritchard
Hi Siegmar, Sorry for the delay, I seem to have missed this one. It looks like there's an error in the way the native methods are processing Java exceptions. The code correctly builds up an exception message for cases where MPI 'c' returns non-success, but not if the problem occurred in one of th

Re: [OMPI users] problem about mpirun on two nodes

2016-05-23 Thread Jeff Squyres (jsquyres)
I stopped firewalld and the error messages disappeared. I saw six > processes were running on each node, but now the all processes keep running > forever with 100% CPU usage. > > > -Original Message- > From: Jeff Squyres (jsquyres) > To: Open MPI User's List > Se

Re: [OMPI users] problem about mpirun on two nodes

2016-05-23 Thread douraku
I stopped firewalld and the error messages disappeared. I saw six processes were running on each node, but now the all processes keep running forever with 100% CPU usage. -Original Message- From: Jeff Squyres (jsquyres) To: Open MPI User's List Sent: Mon, May 23, 2016 9:13 am Subject: Re: [OMPI users] pr

Re: [OMPI users] problem about mpirun on two nodes

2016-05-23 Thread Jeff Squyres (jsquyres)
On May 21, 2016, at 11:31 PM, dour...@aol.com wrote: > > I encountered a problem about mpirun and SSH when using OMPI 1.10.0 compiled > with gcc, running on centos7.2. > When I execute mpirun on my 2 node cluster, I get the following errors pasted > below. > > [douraku@master home]$ mpirun -np

Re: [OMPI users] problem with ld for Sun C 5.14 beta and openmpi-dev-4010-g6c9d65c

2016-05-10 Thread Gilles Gouaillardet
Siegmar, this issue was previously reported at http://www.open-mpi.org/community/lists/devel/2016/05/18923.php i just pushed the patch Cheers, Gilles On 5/10/2016 2:27 PM, Siegmar Gross wrote: Hi, I tried to install openmpi-dev-4010-g6c9d65c on my "SUSE Linux Enterprise Server 12 (x86

Re: [OMPI users] problem with Sun C 5.14 beta

2016-05-07 Thread Siegmar Gross
Hi Gilles, thank you very much for your help. Now C and C++ are link compatible. Kind regards Siegmar On 05/07/16 12:15, Gilles Gouaillardet wrote: Siegmar, per the config.log, you need to update your CXXFLAGS="-m64 -library=stlport4 -std=sun03" or just CXXFLAGS="-m64" Cheers, Gilles On

Re: [OMPI users] problem with Sun C 5.14 beta

2016-05-07 Thread Gilles Gouaillardet
Siegmar, per the config.log, you need to update your CXXFLAGS="-m64 -library=stlport4 -std=sun03" or just CXXFLAGS="-m64" Cheers, Gilles On Saturday, May 7, 2016, Siegmar Gross < siegmar.gr...@informatik.hs-fulda.de> wrote: > Hi, > > today I tried to install openmpi-v1.10.2-176-g9d45e07 on my

Re: [OMPI users] Problem with 'orted: command not found'

2016-05-03 Thread Maciek Lewiński
Thank you! I have set up my env paths at the end of the script and thanks to you I just noticed that at the beginning of the bashrc script there's a simple IF that returns when it's opened as non-interactive. I moved my exports above it and it finally works. Again, thank you very much. 2016-05-03
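A minimal sketch of the fix Maciek describes, assuming an install under $HOME/.openmpi (the path is hypothetical): the exports must appear above the interactive-shell guard that most distribution .bashrc files carry near the top.

```shell
# ~/.bashrc (sketch) -- exports FIRST, so non-interactive SSH logins
# (which is how mpirun reaches remote nodes) still pick them up.
export PATH=$HOME/.openmpi/bin:$PATH
export LD_LIBRARY_PATH=$HOME/.openmpi/lib:$LD_LIBRARY_PATH

# Typical guard found near the top of stock .bashrc files; everything
# below this point is skipped for non-interactive shells.
case $- in
    *i*) ;;       # interactive: keep going
    *) return ;;  # non-interactive: stop reading here
esac
```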

Re: [OMPI users] Problem with 'orted: command not found'

2016-05-02 Thread Gilles Gouaillardet
If OpenMPI is installed at the same path on every node, the easiest option is to re-configure with --enable-mpirun-prefix-by-default. Another option is to use `which mpirun` instead of mpirun, and yet another option is to run mpirun --prefix=$USER/.openmpi Cheers, Gilles On Tuesday, May 3, 2016, Maci
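The three options Gilles lists, sketched as commands; the install prefix and process count are hypothetical:

```shell
# Option 1: rebuild so mpirun always forwards its own prefix to orted
./configure --prefix=$HOME/.openmpi --enable-mpirun-prefix-by-default
make all install

# Option 2: launch via an absolute path; mpirun then infers the prefix
$(which mpirun) -np 4 ./a.out

# Option 3: state the prefix explicitly at run time
mpirun --prefix=$HOME/.openmpi -np 4 ./a.out
```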

Re: [OMPI users] Problem with 'orted: command not found'

2016-05-02 Thread Jeff Squyres (jsquyres)
Make sure you check that these paths are set for *non-interactive* logins. > On May 2, 2016, at 6:14 PM, Maciek Lewiński wrote: > > I already had correct paths in .bashrc: > > export > PATH=/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/l

Re: [OMPI users] Problem with 'orted: command not found'

2016-05-02 Thread Maciek Lewiński
I already had correct paths in .bashrc: export PATH=/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/bin:/usr/local/bin:/home/$USER/.openmpi/bin export LD_LIBRARY_PATH=:/usr/local/lib:/usr/local/lib:/home/$USER/.openmpi/lib I can run mpi

Re: [OMPI users] Problem with 'orted: command not found'

2016-05-02 Thread Jeff Squyres (jsquyres)
The key is this error: bash: orted: command not found Meaning: you need to set your PATH and LD_LIBRARY_PATH properly for non-interactive logins. See https://www.open-mpi.org/faq/?category=running#adding-ompi-to-path. > On May 2, 2016, at 5:36 PM, Maciek Lewiński wrote: > > Hi, I'm having
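A quick way to check what a non-interactive login actually sees; the hostname `node2` is a placeholder:

```shell
# Running a command directly over ssh gives a non-interactive shell,
# i.e. the same environment orted gets when mpirun launches it remotely.
ssh node2 'echo $PATH; echo $LD_LIBRARY_PATH; which orted'
```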

Re: [OMPI users] Problem when installing OpenMPI during make all install

2016-02-25 Thread Gilles Gouaillardet
: > Attached are the log file for ./configure and make all install. I am using > the openmpi-1.10.2 > > Thanks > > -Original Message- > From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres > (jsquyres) > Sent: Friday, 26 February, 2016 10:13 AM

Re: [OMPI users] Problem when installing OpenMPI during make all install

2016-02-25 Thread Tang Cheng Yee
t: Re: [OMPI users] Problem when installing OpenMPI during make all install Can you send all the information listed here: https://www.open-mpi.org/community/help/ > On Feb 25, 2016, at 9:09 PM, Tang Cheng Yee wrote: > > Hi all, > > I am new to the bioinformatics world. When I w

Re: [OMPI users] Problem when installing OpenMPI during make all install

2016-02-25 Thread Jeff Squyres (jsquyres)
Can you send all the information listed here: https://www.open-mpi.org/community/help/ > On Feb 25, 2016, at 9:09 PM, Tang Cheng Yee wrote: > > Hi all, > > I am new to the bioinformatics world. When I was trying to install OpenMPI, I > encountered the following error during ‘make all in

Re: [OMPI users] problem with execstack and openmpi-v1.10.1-140-g31ff573

2016-01-19 Thread Gilles Gouaillardet
Siegmar, that looks like a libtool issue : "-z noexecstack" is not correctly passed to cc a simple workaround is to append "-Wl,-z -Wl,noexecstack" to your LDFLAGS (and do not pass "-z noexecstack" to LDFLAGS nor CFLAGS) Cheers, Gilles On 1/19/2016 11:19 PM, Siegmar Gross wrote: Hi Gilles,
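Gilles's workaround, sketched as a configure invocation (the compiler choice and install prefix are hypothetical). The `-Wl,` form hands the flag straight to the linker so libtool cannot drop it:

```shell
# Do NOT put "-z noexecstack" in CFLAGS or LDFLAGS directly;
# pass it through to the linker via -Wl, instead.
./configure CC=cc CFLAGS="-m64" \
            LDFLAGS="-m64 -Wl,-z -Wl,noexecstack" \
            --prefix=$HOME/openmpi-install
```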

Re: [OMPI users] problem with execstack and openmpi-v1.10.1-140-g31ff573

2016-01-19 Thread Siegmar Gross
Hi Gilles, we have installed "execstack", but it doesn't work as expected. loki lib64 110 execstack -c libmpi_java.so.1.2.0 execstack: libmpi_java.so.1.2.0: Unknown debugging section .debug_funcnames loki lib64 111 We've also tried to build it with "cc" instead of "gcc", but it failed, because

Re: [OMPI users] problem with execstack and openmpi-v1.10.1-140-g31ff573

2016-01-15 Thread Siegmar Gross
Hi Gilles, now I can answer the second part of your email. "LDFLAGS='-m64 -mt -z noexecstack'" didn't help. loki java 114 ompi_info | grep "Built on:" Built on: Fr 15. Jan 15:02:52 CET 2016 loki java 115 head /export2/src/openmpi-1.10.2/openmpi-v1.10.1-140-g31ff573-Linux.x86_

Re: [OMPI users] problem with execstack and openmpi-v1.10.1-140-g31ff573

2016-01-15 Thread Siegmar Gross
Hi Gilles, "execstack" isn't available at our system and it isn't part of the repository for SuSE Linux Enterprise Server or Desktop. Next week I'll ask our admin, if he can try to locate and install the program. Best regards Siegmar On 01/15/16 08:01, Gilles Gouaillardet wrote: Siegmar, d
