Re: [OMPI users] Runtime error with OpenMPI via InfiniBand - [btl_openib_proc.c:157] ompi_modex_recv failed for peer

2017-04-19 Thread Jeff Squyres (jsquyres)
Dong -- I do not see an obvious cause for the error. Are you able to run trivial hello world / ring kinds of MPI jobs? Is the problem localized to a specific set of nodes in the cluster? > On Apr 14, 2017, at 4:30 PM, Dong Young Yoon wrote: > > Hi everyone, > > I am a student working on a p

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-15 Thread Jason Maldonis
Hi Gilles, I would like to be able to run on anywhere from 1-16 nodes. Let me explain our (mpi/parallelism) situation briefly for more context: We have a "master" job that needs MPI functionality. This master job is written in python (we use mpi4py). The master job then makes spawn calls out to

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-15 Thread Gilles Gouaillardet
Jason, How many nodes are you running on? Since you have an IB network, IB is used for intra-node communication between tasks that are not part of the same OpenMPI job (read: spawn group). I can make a simple patch to use TCP instead of IB for this intra-node communication. Let me know if you are

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Jason Maldonis
Thanks Ralph for all the help. I will do that until it gets fixed. Nathan, I am very very interested in this working because we are developing some new cool code for research in materials science. This is the last piece of the puzzle for us I believe. I can use TCP for now though of course. While

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Ralph Castain
You don’t want to always use those options as your performance will take a hit - TCP vs InfiniBand isn’t a good option. Sadly, this is something we need someone like Nathan to address as it is a bug in the code base, and in an area I’m not familiar with. For now, just use TCP so you can move for

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Jason Maldonis
Ralph, The problem *does* go away if I add "-mca btl tcp,sm,self" to the mpiexec cmd line. (By the way, I am using mpiexec rather than mpirun; do you recommend one over the other?) Will you tell me what this means for me? For example, should I always append these arguments to mpiexec for my non-tes

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Nathan Hjelm
That message is coming from udcm in the openib btl. It indicates some sort of failure in the connection mechanism. It can happen if the listening thread no longer exists or is taking too long to process messages. -Nathan On Jun 14, 2016, at 12:20 PM, Ralph Castain wrote: Hmm…I’m unable to r

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Ralph Castain
Hmm…I’m unable to replicate a problem on my machines. What fabric are you using? Does the problem go away if you add “-mca btl tcp,sm,self” to the mpirun cmd line? > On Jun 14, 2016, at 11:15 AM, Jason Maldonis wrote: > > Hi Ralph et al., > > Great, thank you for the help. I downloaded the m

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Jason Maldonis
Hi Ralph et al., Great, thank you for the help. I downloaded the MPI loop spawn test directly from what I think is the master repo on GitHub: https://github.com/open-mpi/ompi/blob/master/orte/test/mpi/loop_spawn.c I am still using the MPI code from 1.10.2, however. Is that test updated with the

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-14 Thread Ralph Castain
I dug into this a bit (with some help from others) and found that the spawn code appears to be working correctly - it is the test in orte/test that is wrong. The test has been correctly updated in the 2.x and master repos, but we failed to backport it to the 1.10 series. I have done so this morn

Re: [OMPI users] runtime error in orte/loop_spawn test using OMPI 1.10.2

2016-06-13 Thread Ralph Castain
No, that PR has nothing to do with loop_spawn. I’ll try to take a look at the problem. > On Jun 13, 2016, at 3:47 PM, Jason Maldonis wrote: > > Hello, > > I am using OpenMPI 1.10.2 compiled with Intel. I am trying to get the spawn > functionality to work inside a for loop, but continue to get

Re: [OMPI users] runtime error with openmpi-v2.x-dev-958-g7e94425

2016-02-20 Thread Ralph Castain
On https://github.com/open-mpi/ompi/pull/1385 Gilles indicated he would update the patch and commit it on Monday > On Feb 20, 2016, at 12:48 AM, Siegmar Gross > wrote: > > Hi Gilles, > > do you know, when fixes for the problems will be ready? Th

Re: [OMPI users] runtime error with openmpi-v2.x-dev-958-g7e94425

2016-02-20 Thread Siegmar Gross
Hi Gilles, do you know, when fixes for the problems will be ready? They still exist in the current version. tyr spawn 136 ompi_info | grep -e "Open MPI repo revision" -e "C compiler absolute" Open MPI repo revision: v2.x-dev-1108-gaaf15d9 C compiler absolute: /usr/local/gcc-5.1.0/bin/gc

Re: [OMPI users] runtime error with openmpi-v2.x-dev-958-g7e94425

2016-01-15 Thread Gilles Gouaillardet
Siegmar, the fix is now being discussed at https://github.com/open-mpi/ompi/pull/1285 the other error you reported (MPI_Comm_spawn hanging on a heterogeneous cluster) is being discussed at https://github.com/open-mpi/ompi/pull/1292 Cheers, Gilles On 1/14/2016 11:06 PM, Siegmar Gross wrot

Re: [OMPI users] runtime error

2011-02-14 Thread Jeff Squyres
What happens if you try to mpirun a non-MPI program like "date" or "hostname"? On Feb 11, 2011, at 6:14 AM, Marcela Castro León wrote: > Excuse me. I forgot the attachment. > > 2011/2/11 Marcela Castro León > Hello: > > I've the same version of Ubuntu 10.04. The original version was Ubuntu Se
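
Launching a non-MPI program such as "hostname" separates launcher problems from MPI library problems: if that run already fails, or lands on unexpected machines, the application itself is not at fault. Below is a minimal Python sketch of how the output of such a run could be tallied. The helper name `count_launches` and the host names are hypothetical, and the sample output is illustrative rather than taken from the thread.

```python
from collections import Counter


def count_launches(output):
    """Count how many processes reported each host name.

    `output` is assumed to be the captured stdout of something like
    `mpirun -np 3 hostname`: one host name per line, one line per process.
    """
    names = (line.strip() for line in output.splitlines())
    return Counter(name for name in names if name)


if __name__ == "__main__":
    # Illustrative output only (hypothetical host names).
    sample = "nodeA\nnodeB\nnodeA\n"
    for host, count in sorted(count_launches(sample).items()):
        print(f"{host}: {count} process(es)")
```

If the tally shows fewer lines than requested processes, or a host is missing entirely, the problem most likely lies in the launch environment rather than in the MPI program.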

Re: [OMPI users] runtime error

2011-02-11 Thread Marcela Castro León
Excuse me. I forgot the attachment. 2011/2/11 Marcela Castro León > Hello: > > I've the same version of Ubuntu 10.04. The original version was Ubuntu > Server 9.1 (64) and I upgraded both of them to 10.04. > Yesterday I updated and upgraded to the same level again, but I got > the same error

Re: [OMPI users] runtime error

2011-02-11 Thread Marcela Castro León
Hello: I've the same version of Ubuntu 10.04. The original version was Ubuntu Server 9.1 (64) and I upgraded both of them to 10.04. Yesterday I updated and upgraded to the same level again, but I got the same error after that. The machines are exactly the same: HP Compaq with Intel Core i5. An

Re: [OMPI users] runtime error

2011-02-10 Thread Jeff Squyres
I typically see these kinds of errors when there's an Open MPI version mismatch between the nodes, and/or if there are slightly different flavors of Linux installed on each node (i.e., you're technically in a heterogeneous situation, but you're trying to run a single application binary). Can yo
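
A version mismatch of the kind described here can be spotted by comparing the version string each node reports. The sketch below assumes those strings have already been gathered by some other means (for example by running `mpirun --version` on every host). The function names and host names are hypothetical, and nothing here is taken from the thread itself.

```python
from collections import defaultdict


def group_hosts_by_version(versions):
    """Map each reported version string to the sorted hosts that reported it.

    `versions` is a dict of host name -> version string, collected elsewhere.
    """
    groups = defaultdict(list)
    for host, version in versions.items():
        groups[version.strip()].append(host)
    return {version: sorted(hosts) for version, hosts in groups.items()}


def is_homogeneous(versions):
    """True when every host reports the same version string."""
    return len(group_hosts_by_version(versions)) <= 1


if __name__ == "__main__":
    # Hypothetical hosts and version strings, for illustration only.
    reported = {"nodeA": "1.4.1", "nodeB": "1.4.1 ", "nodeC": "1.4.3"}
    for version, hosts in sorted(group_hosts_by_version(reported).items()):
        print(f"{version}: {', '.join(hosts)}")
    print("homogeneous:", is_homogeneous(reported))
```

Matching version strings do not rule out the other cause mentioned above (different Linux flavors with a single application binary), so this check is only a first filter.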

Re: [OMPI users] runtime error

2011-02-10 Thread Marcela Castro León
Hello > I've a program that always works fine, but I'm trying it on a new cluster > and it fails when I execute it on more than one machine. > I mean, if I execute it alone on each host, everything works fine. > radic@santacruz:~/gaps/caso3-i1$ mpirun -np 3 ../test parcorto.txt > > But when I execute >

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread Jeff Squyres
On Nov 2, 2009, at 7:43 AM, Shiqing Fan wrote: Because you were building Open MPI with libtool support, probably the problem could be that libtool is not loaded correctly. Could you check that libtool bin directory is in the PATH environment variable? If Open MPI can't find correct libtool li

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread Shiqing Fan
e] Sent: Mon 11/2/2009 7:55 PM To: Basant Lakhotiya (WT01 - Computing and Storage IPG) Cc: us...@open-mpi.org Subject: Re: [OMPI users] Runtime error while running mpirun Hi Basant, Could you please also check in your Open MPI solutions whether you have the mca_paffinity_windows project? and in

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread Shiqing Fan
09 6:13 PM To: Open MPI Users Cc: Basant Lakhotiya (WT01 - Computing and Storage IPG) Subject: Re: [OMPI users] Runtime error while running mpirun Hi Basant, The mca_paffinity_windowsd.dll is the debug version of mca_paffinity_windows.dll, but orterun.exe should know which one it can use whe

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread Shiqing Fan
lto:f...@hlrs.de] Sent: Mon 11/2/2009 6:13 PM To: Open MPI Users Cc: Basant Lakhotiya (WT01 - Computing and Storage IPG) Subject: Re: [OMPI users] Runtime error while running mpirun Hi Basant, The mca_paffinity_windowsd.dll is the debug version of mca_paffinity_windows.dll, but orterun.exe sh

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread basant.lakhotiya
\orterun.c at line 570 Regards, Basant From: Shiqing Fan [mailto:f...@hlrs.de] Sent: Mon 11/2/2009 6:13 PM To: Open MPI Users Cc: Basant Lakhotiya (WT01 - Computing and Storage IPG) Subject: Re: [OMPI users] Runtime error while running mpirun

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread Shiqing Fan
r. Thanks, Basant *From:* Basant [mailto:basant.lakhot...@wipro.com] *Sent:* Mon 11/2/2009 12:14 PM *To:* 'Open MPI Users' *Subject:* RE: [OMPI users] Runtime error while running mpirun Hi Terry, Its not creating mca_paffin

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread basant.lakhotiya
r. Thanks, Basant From: Basant [mailto:basant.lakhot...@wipro.com] Sent: Mon 11/2/2009 12:14 PM To: 'Open MPI Users' Subject: RE: [OMPI users] Runtime error while running mpirun Hi Terry, Its not creating mca_paffinity_windows.dll

Re: [OMPI users] Runtime error while running mpirun

2009-11-02 Thread basant.lakhotiya
g] On Behalf Of Terry Dontje Sent: Friday, October 30, 2009 11:05 PM To: us...@open-mpi.org Subject: Re: [OMPI users] Runtime error while running mpirun Hi Basant, I am not familiar with Windows builds of Open MPI. However, can you see if your Open MPI build actually created a mca_paffinity_window'

Re: [OMPI users] Runtime error while running mpirun

2009-10-30 Thread Terry Dontje
Hi Basant, I am not familiar with Windows builds of Open MPI. However, can you see if your Open MPI build actually created a mca_paffinity_window's dll? I could imagine the issue might be that the dll is not finding a needed dependency. Under Windows is there a command similar to Unix's ldd

Re: [OMPI users] Runtime error only on one node.

2009-03-05 Thread Jeff Squyres
On Mar 5, 2009, at 7:05 PM, Shinta Bonnefoy wrote: Thanks, the option --mca btl ^openib works fine ! Half of the cluster has Infiniband/OpenFabrics (from node49 to node96) and the other half (nodes from 01 to 48) doesn't. Ah... this explains things. I wonder if we have not tes

Re: [OMPI users] Runtime error only on one node.

2009-03-05 Thread Shinta Bonnefoy
Mar 2009 17:25:34 -0500 > From: Jeff Squyres > Subject: Re: [OMPI users] Runtime error only on one node. > To: "Open MPI Users" > Message-ID: <70d31c29-b711-419f-9973-73b41feb0...@cisco.com> > Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes > >

Re: [OMPI users] Runtime error only on one node.

2009-03-05 Thread Jeff Squyres
Whoops; we shouldn't be seg faulting. :-\ The warning is exactly what it implies -- it found the OpenFabrics network stack but no functioning OpenFabrics-capable hardware. You can disable it (and the segv) by disabling the openfabrics BTL from running: mpirun --mca btl ^openib But what
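
The `--mca btl` value takes two forms in these threads: an exclusion list introduced by a caret (`^openib`, as here) and an inclusion list (`tcp,sm,self`, as in the loop_spawn thread). A minimal Python sketch of building either form as an argument list follows. The helper names `btl_args` and `mpirun_command` and the program name `./app` are hypothetical, and the sketch only assembles the command without running it.

```python
def btl_args(include=None, exclude=None):
    """Return the `--mca btl ...` arguments for an mpirun command line.

    `include` lists the only BTL components to use; `exclude` lists
    components to disable (written with a leading caret). The two forms
    are kept mutually exclusive here.
    """
    if include and exclude:
        raise ValueError("give either an include list or an exclude list, not both")
    if include:
        return ["--mca", "btl", ",".join(include)]
    if exclude:
        return ["--mca", "btl", "^" + ",".join(exclude)]
    return []


def mpirun_command(program_args, np, include=None, exclude=None):
    """Assemble an mpirun argument list (nothing is executed)."""
    return ["mpirun", *btl_args(include, exclude), "-np", str(np), *program_args]


if __name__ == "__main__":
    # Hypothetical program name, for illustration only.
    print(" ".join(mpirun_command(["./app"], 4, exclude=["openib"])))
    print(" ".join(mpirun_command(["./app"], 4, include=["tcp", "sm", "self"])))
```

Passing the arguments as a list (for instance to `subprocess.run`) avoids any shell interpretation of the caret character.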

Re: [OMPI users] Runtime Error

2006-07-28 Thread Jeff Squyres
This question has come up a few times now, so I've added it to the faq, which should make the "mca_pml_teg.so:undefined symbol" message web-searchable for others who run into this issue. On 7/26/06 8:36 AM, "Michael Kluskens" wrote: > Summary: You have to properly uninstall OpenMPI 1.0.2 before

Re: [OMPI users] Runtime Error

2006-07-26 Thread Michael Kluskens
Summary: You have to properly uninstall OpenMPI 1.0.2 before installing OpenMPI 1.1 On Jul 26, 2006, at 7:05 AM, wrote: Updated to open_mpi-1.1. I get a runtime error on the application as follows mca:base:component_find:unable to open:/usr/local/lip/openmpi/mca_pml_teg.so:undefined symb

Re: [OMPI users] runtime error

2006-07-04 Thread Brian Barrett
On Jul 4, 2006, at 1:58 AM, Manal Helal wrote: sorry for posting too much, I tried running and I got this error, I assume that this is the stack of the calls before the error Signal:11 info.si_errno:0(Success) si_code:2(SEGV_ACCERR) Failing at addr:0x8059b73 [0] func:/usr/local/bin/openmpi/lib/