Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-16 Thread Patrick Begou
Thanks, all, for your answers. I've added some details about the tests I have
run; see below.



Ralph Castain wrote:

Not precisely correct. It depends on the environment.

If there is a resource manager allocating nodes, or you provide a hostfile 
that specifies the number of slots on the nodes, or you use -host, then we 
default to no-oversubscribe.

I'm using a batch scheduler (OAR).
# cat /dev/cpuset/oar/begou_793/cpuset.cpus
4-7

So 4 cores are allowed. The nodes have two eight-core CPUs.

Node file contains:
# cat $OAR_NODEFILE
frog53
frog53
frog53
frog53

# mpirun --hostfile $OAR_NODEFILE -bind-to core location.exe
is okay (my test code shows one process on each core)
(process 3) thread is now running on PU logical index 1 (OS/physical index 5) on 
system frog53
(process 0) thread is now running on PU logical index 3 (OS/physical index 7) on 
system frog53
(process 1) thread is now running on PU logical index 0 (OS/physical index 4) on 
system frog53
(process 2) thread is now running on PU logical index 2 (OS/physical index 6) on 
system frog53


# mpirun -np 5 --hostfile $OAR_NODEFILE -bind-to core location.exe
oversubscribes with:
(process 0) thread is now running on PU logical index 3 (OS/physical index 7) on 
system frog53
(process 1) thread is now running on PU logical index 1 (OS/physical index 5) on 
system frog53
(*process 3*) thread is now running on PU logical index *2 (OS/physical index 
6)* on system frog53
(process 4) thread is now running on PU logical index 0 (OS/physical index 4) on 
system frog53
(*process 2*) thread is now running on PU logical index *2 (OS/physical index 
6)* on system frog53

This is not allowed with OpenMPI 1.7.3

I can increase up to the maximum core count of this first processor (8 cores):
# mpirun -np 8 --hostfile $OAR_NODEFILE -bind-to core location.exe |grep 'thread 
is now running on PU'
(process 5) thread is now running on PU logical index 1 (OS/physical index 5) on 
system frog53
(process 7) thread is now running on PU logical index 3 (OS/physical index 7) on 
system frog53
(process 4) thread is now running on PU logical index 0 (OS/physical index 4) on 
system frog53
(process 6) thread is now running on PU logical index 2 (OS/physical index 6) on 
system frog53
(process 2) thread is now running on PU logical index 1 (OS/physical index 5) on 
system frog53
(process 0) thread is now running on PU logical index 2 (OS/physical index 6) on 
system frog53
(process 1) thread is now running on PU logical index 0 (OS/physical index 4) on 
system frog53
(process 3) thread is now running on PU logical index 0 (OS/physical index 4) on 
system frog53


But I cannot overload beyond the 8 cores (the core count of one CPU).
# mpirun -np 9 --hostfile $OAR_NODEFILE -bind-to core location.exe
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to:     CORE
   Node:        frog53
   #processes:  2
   #cpus:       1

You can override this protection by adding the "overload-allowed"
option to your binding directive.

Now if I add *--nooversubscribe* the problem doesn't exist anymore (no more than 
4 processes, one on each core). So it looks as if the default behavior were 
no-oversubscribe based on the number of cores of the socket, rather than on the cpuset???


Again, with 1.7.3 this problem doesn't occur at all.

Patrick




If you provide a hostfile that doesn't specify slots, then we use the number 
of cores we find on each node, and we allow oversubscription.


What is being described sounds like more of a bug than an intended feature. 
I'd need to know more about it, though, to be sure. Can you tell me how you 
are specifying this cpuset?



On Sep 15, 2015, at 4:44 PM, Matt Thompson wrote:


Looking at the Open MPI 1.10.0 man page:

https://www.open-mpi.org/doc/v1.10/man1/mpirun.1.php

it looks like perhaps -oversubscribe (which was an option) is now the default 
behavior. Instead we have:


*-nooversubscribe, --nooversubscribe*
Do not oversubscribe any nodes; error (without starting any processes) if
the requested number of processes would cause oversubscription. This
option implicitly sets "max_slots" equal to the "slots" value for each node.

It also looks like -map-by has a way to implement it as well (see man page).

Thanks for letting me/us know about this. On a system of mine I sort of 
depend on the -nooversubscribe behavior!


Matt


On Tue, Sep 15, 2015 at 11:17 AM, Patrick Begou wrote:


Hi,

I'm running OpenMPI 1.10.0 built with the Intel 2015 compilers on a Bullx system.
I've some troubles with the bind-to core option when using cpuset.
If the cpuset covers fewer cores than the CPU has (e.g., 4 cores allowed on
an 8-core CPU), OpenMPI 1.10.0 allows these cores to be overloaded up to the
maximum number of cores of the CPU.
With this configuration, because the cpuset only allows 4 cores, I can reach
2 processes/core if I use:

mpirun -n

Re: [OMPI users] runtime MCA parameters

2015-09-16 Thread marcin.krotkiewski

Thanks a lot, that looks right! Looks like I have some reading to do...

Do you know whether, in the Open MPI implementation, the MPI_T-interfaced MCA 
settings are thread-local or rank-local?


Thanks!

Marcin


On 09/15/2015 07:58 PM, Nathan Hjelm wrote:

You can use MPI_T to set any MCA variable before MPI_Init. At this time
we lock down all MCA variables during MPI_Init. You will need to call
MPI_T_init_thread before MPI_Init and make sure to call MPI_T_finalize
any time after you are finished setting MCA variables. For more
information see MPI-3.1 chapter 14.
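
A minimal sketch of the sequence described above (an illustration only, not part
of the original message; it assumes Open MPI exposes the "mpi_leave_pinned" MCA
parameter as an integer MPI_T control variable):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int provided, cvar_index, count, value = 0;
  MPI_T_cvar_handle handle;

  /* start the MPI tool information interface before MPI_Init */
  MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);

  /* look the control variable up by name and write it (MPI-3.1, chapter 14) */
  if (MPI_T_cvar_get_index("mpi_leave_pinned", &cvar_index) == MPI_SUCCESS) {
    /* NULL object handle: this cvar is not bound to a particular MPI object */
    MPI_T_cvar_handle_alloc(cvar_index, NULL, &handle, &count);
    MPI_T_cvar_write(handle, &value);
    MPI_T_cvar_handle_free(&handle);
  }

  MPI_Init(&argc, &argv);
  /* ... application code ... */
  MPI_T_finalize();   /* any time after the cvars have been set */
  MPI_Finalize();
  return 0;
}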

-Nathan

On Tue, Sep 15, 2015 at 07:40:56PM +0200, marcin.krotkiewski wrote:

I was wondering whether it is possible, or has been considered, to let
individual ranks change the various MCA parameters at runtime, in addition
to setting them on the command line?

I tried to google a bit, but did not find any indication that such a topic has
even been discussed. It would be a very useful thing, especially in
multi-threaded applications using MPI_THREAD_MULTIPLE, but I could come
up with plenty of uses in the usual single-threaded setups as well.

Marcin
___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/09/27576.php


___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/09/27578.php




[OMPI users] bug in MPI_Comm_accept?

2015-09-16 Thread marcin.krotkiewski
I have run into a freeze / potential bug when using MPI_Comm_accept in a 
simple client / server implementation. I have attached the two simplest 
programs I could produce:


 1. mpi-receiver.c opens a port using MPI_Open_port, saves the port 
name to a file


 2. mpi-receiver enters infinite loop and waits for connections using 
MPI_Comm_accept


 3. mpi-sender.c connects to that port using MPI_Comm_connect, sends 
one MPI_UNSIGNED_LONG, calls barrier and disconnects using 
MPI_Comm_disconnect


 4. mpi-receiver reads the MPI_UNSIGNED_LONG, prints it, calls barrier 
and disconnects using MPI_Comm_disconnect and goes to point 2 - infinite 
loop


All works fine, but only exactly 5 times. After that the receiver hangs 
in MPI_Recv, right after returning from MPI_Comm_accept. That is 100% repeatable. I 
have tried with Intel MPI - no such problem.


I execute the programs using OpenMPI 1.10 as follows

mpirun -np 1 --mca mpi_leave_pinned 0 ./mpi-receiver


Do you have any clues as to what could be the reason? Am I doing something wrong, or 
is it some problem with the internal state of OpenMPI?


Thanks a lot!

Marcin

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
  MPI_Info info;
  char port_name[MPI_MAX_PORT_NAME];
  MPI_Comm intercomm;

  MPI_Init(&argc, &argv);
  MPI_Info_create(&info);
  MPI_Open_port(info, port_name);
  printf("port name: %s\n", port_name);

  /* write port name to file */   
  {
FILE *fd;
fd = fopen("port.txt", "w+");
fprintf(fd, "%s", port_name);
fclose(fd);
  }

  /* accept connections */
  while(1){
unsigned long data;

/* accept connection */
MPI_Comm_accept(port_name, info, 0, MPI_COMM_WORLD, &intercomm);

/* receive comm size from the sender */
MPI_Recv(&data, 1, MPI_UNSIGNED_LONG, 0, 1, intercomm, MPI_STATUS_IGNORE);
printf("received data: %lx\n", data);

MPI_Barrier(intercomm);
MPI_Comm_disconnect(&intercomm);
printf("client disconnected\n");   
  }
}
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
  char port_name[MPI_MAX_PORT_NAME+1];
  MPI_Info info;
  MPI_Comm intercomm;
  unsigned long data = 0x12345678;

  /* initialize MPI */
  MPI_Init(&argc, &argv);
  MPI_Info_create(&info);

  /* connect to receiver ranks - port is a string parameter */
  strcpy(port_name, argv[1]);

  /* connect to server - intercomm is the remote communicator */
  MPI_Comm_connect(port_name, info, 0, MPI_COMM_WORLD, &intercomm);
  printf("** connected\n");

  /* send data */
  MPI_Send(&data, 1, MPI_UNSIGNED_LONG, 0, 1, intercomm);
  MPI_Barrier(intercomm);

  /* disconnect */
  MPI_Comm_disconnect(&intercomm);
  MPI_Finalize();
  printf("** disconnected\n");

  return 0;
}
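
A usage sketch for the two programs above (my addition, not from the original
post; it assumes both commands run from the same directory, so the port.txt file
written by the receiver is visible to the sender, and that the receiver is
started first):

mpirun -np 1 --mca mpi_leave_pinned 0 ./mpi-receiver &
# once port.txt has been written:
mpirun -np 1 ./mpi-sender "$(cat port.txt)"

The quotes around the port name matter, since Open MPI port strings typically
contain characters (such as semicolons) that the shell would otherwise interpret.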


Re: [OMPI users] bug in MPI_Comm_accept?

2015-09-16 Thread Jalel Chergui

Can you check with an MPI_Finalize in the receiver?
Jalel

On 16/09/2015 16:06, marcin.krotkiewski wrote:
I have run into a freeze / potential bug when using MPI_Comm_accept in 
a simple client / server implementation. I have attached two simplest 
programs I could produce:


 1. mpi-receiver.c opens a port using MPI_Open_port, saves the port 
name to a file


 2. mpi-receiver enters infinite loop and waits for connections using 
MPI_Comm_accept


 3. mpi-sender.c connects to that port using MPI_Comm_connect, sends 
one MPI_UNSIGNED_LONG, calls barrier and disconnects using 
MPI_Comm_disconnect


 4. mpi-receiver reads the MPI_UNSIGNED_LONG, prints it, calls barrier 
and disconnects using MPI_Comm_disconnect and goes to point 2 - 
infinite loop


All works fine, but only exactly 5 times. After that the receiver 
hangs in MPI_Recv, after exit from MPI_Comm_accept. That is 100% 
repeatable. I have tried with Intel MPI - no such problem.


I execute the programs using OpenMPI 1.10 as follows

mpirun -np 1 --mca mpi_leave_pinned 0 ./mpi-receiver


Do you have any clues what could be the reason? Am I doing sth wrong, 
or is it some problem with internal state of OpenMPI?


Thanks a lot!

Marcin



___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/09/27585.php


--
**
 Jalel CHERGUI, LIMSI-CNRS, Bât. 508 - BP 133, 91403 Orsay cedex, FRANCE
 Tél: (33 1) 69 85 81 27 ; Télécopie: (33 1) 69 85 80 88
 Mél: jalel.cher...@limsi.fr ; Référence: http://perso.limsi.fr/chergui
**



Re: [OMPI users] bug in MPI_Comm_accept?

2015-09-16 Thread Marcin Krotkiewski
But where would I put it? If I put it inside the while(1) loop, then 
MPI_Comm_accept cannot be called a second time. If I put it outside 
of the loop, it will never be called.
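
For what it is worth, one way to reconcile the two (an editor's sketch, not part
of the original exchange; it assumes the sender transmits an agreed sentinel
value, 0 here, to ask the receiver to shut down) is to break out of the loop and
finalize afterwards:

  /* sketch: replaces the while(1) loop in mpi-receiver.c */
  int done = 0;
  while (!done) {
    unsigned long data;

    MPI_Comm_accept(port_name, info, 0, MPI_COMM_WORLD, &intercomm);
    MPI_Recv(&data, 1, MPI_UNSIGNED_LONG, 0, 1, intercomm, MPI_STATUS_IGNORE);
    MPI_Comm_disconnect(&intercomm);

    if (data == 0UL)          /* assumed sentinel: stop accepting connections */
      done = 1;
    else
      printf("received data: %lx\n", data);
  }
  MPI_Close_port(port_name);
  MPI_Finalize();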



On 09/16/2015 04:18 PM, Jalel Chergui wrote:

Can you check with an MPI_Finalize in the receiver ?
Jalel

On 16/09/2015 16:06, marcin.krotkiewski wrote:
I have run into a freeze / potential bug when using MPI_Comm_accept 
in a simple client / server implementation. I have attached two 
simplest programs I could produce:


 1. mpi-receiver.c opens a port using MPI_Open_port, saves the port 
name to a file


 2. mpi-receiver enters infinite loop and waits for connections using 
MPI_Comm_accept


 3. mpi-sender.c connects to that port using MPI_Comm_connect, sends 
one MPI_UNSIGNED_LONG, calls barrier and disconnects using 
MPI_Comm_disconnect


 4. mpi-receiver reads the MPI_UNSIGNED_LONG, prints it, calls 
barrier and disconnects using MPI_Comm_disconnect and goes to point 2 
- infinite loop


All works fine, but only exactly 5 times. After that the receiver 
hangs in MPI_Recv, after exit from MPI_Comm_accept. That is 100% 
repeatable. I have tried with Intel MPI - no such problem.


I execute the programs using OpenMPI 1.10 as follows

mpirun -np 1 --mca mpi_leave_pinned 0 ./mpi-receiver


Do you have any clues what could be the reason? Am I doing sth wrong, 
or is it some problem with internal state of OpenMPI?


Thanks a lot!

Marcin



___
users mailing list
us...@open-mpi.org
Subscription:http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this 
post:http://www.open-mpi.org/community/lists/users/2015/09/27585.php


--
**
  Jalel CHERGUI, LIMSI-CNRS, Bât. 508 - BP 133, 91403 Orsay cedex, FRANCE
  Tél: (33 1) 69 85 81 27 ; Télécopie: (33 1) 69 85 80 88
  Mél:jalel.cher...@limsi.fr  ; Référence:http://perso.limsi.fr/chergui
**


___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/09/27586.php




Re: [OMPI users] bug in MPI_Comm_accept?

2015-09-16 Thread Jalel Chergui
Right; in any case, MPI_Finalize is necessary at the end of the receiver. The 
other issue is the Barrier, which is probably invoked after the sender has 
exited, hence changing the size of the intercommunicator. Can you comment out 
that line in both files?


Jalel

On 16/09/2015 16:22, Marcin Krotkiewski wrote:
But where would I put it? If I put it in the while(1), then 
MPI_Comm_Accept cannot be called for the second time. If I put it 
outside of the loop it will never be called.



On 09/16/2015 04:18 PM, Jalel Chergui wrote:

Can you check with an MPI_Finalize in the receiver ?
Jalel

On 16/09/2015 16:06, marcin.krotkiewski wrote:
I have run into a freeze / potential bug when using MPI_Comm_accept 
in a simple client / server implementation. I have attached two 
simplest programs I could produce:


 1. mpi-receiver.c opens a port using MPI_Open_port, saves the port 
name to a file


 2. mpi-receiver enters infinite loop and waits for connections 
using MPI_Comm_accept


 3. mpi-sender.c connects to that port using MPI_Comm_connect, sends 
one MPI_UNSIGNED_LONG, calls barrier and disconnects using 
MPI_Comm_disconnect


 4. mpi-receiver reads the MPI_UNSIGNED_LONG, prints it, calls 
barrier and disconnects using MPI_Comm_disconnect and goes to point 
2 - infinite loop


All works fine, but only exactly 5 times. After that the receiver 
hangs in MPI_Recv, after exit from MPI_Comm_accept. That is 100% 
repeatable. I have tried with Intel MPI - no such problem.


I execute the programs using OpenMPI 1.10 as follows

mpirun -np 1 --mca mpi_leave_pinned 0 ./mpi-receiver


Do you have any clues what could be the reason? Am I doing sth 
wrong, or is it some problem with internal state of OpenMPI?


Thanks a lot!

Marcin



___
users mailing list
us...@open-mpi.org
Subscription:http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this 
post:http://www.open-mpi.org/community/lists/users/2015/09/27585.php


--
**
  Jalel CHERGUI, LIMSI-CNRS, Bât. 508 - BP 133, 91403 Orsay cedex, FRANCE
  Tél: (33 1) 69 85 81 27 ; Télécopie: (33 1) 69 85 80 88
  Mél:jalel.cher...@limsi.fr  ; Référence:http://perso.limsi.fr/chergui
**


___
users mailing list
us...@open-mpi.org
Subscription:http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this 
post:http://www.open-mpi.org/community/lists/users/2015/09/27586.php




___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/09/27587.php


--
**
 Jalel CHERGUI, LIMSI-CNRS, Bât. 508 - BP 133, 91403 Orsay cedex, FRANCE
 Tél: (33 1) 69 85 81 27 ; Télécopie: (33 1) 69 85 80 88
 Mél: jalel.cher...@limsi.fr ; Référence: http://perso.limsi.fr/chergui
**



Re: [OMPI users] bug in MPI_Comm_accept?

2015-09-16 Thread marcin.krotkiewski


I have removed the MPI_Barrier, to no avail. Same thing happens. Adding 
verbosity, before the receiver hangs I get the following message


[node2:03928] mca: bml: Using openib btl to [[12620,1],0] on node node3

So it is somewhere in the openib btl module.

Marcin


On 09/16/2015 04:34 PM, Jalel Chergui wrote:
Right, anyway Finalize is necessary at the end of the receiver. The 
other issue is Barrier which is invoked probably when the sender has 
exited hence changing the size of intercom. Can you comment that line 
in both files ?


Jalel

On 16/09/2015 16:22, Marcin Krotkiewski wrote:
But where would I put it? If I put it in the while(1), then 
MPI_Comm_Accept cannot be called for the second time. If I put it 
outside of the loop it will never be called.



On 09/16/2015 04:18 PM, Jalel Chergui wrote:

Can you check with an MPI_Finalize in the receiver ?
Jalel

On 16/09/2015 16:06, marcin.krotkiewski wrote:
I have run into a freeze / potential bug when using MPI_Comm_accept 
in a simple client / server implementation. I have attached two 
simplest programs I could produce:


 1. mpi-receiver.c opens a port using MPI_Open_port, saves the port 
name to a file


 2. mpi-receiver enters infinite loop and waits for connections 
using MPI_Comm_accept


 3. mpi-sender.c connects to that port using MPI_Comm_connect, 
sends one MPI_UNSIGNED_LONG, calls barrier and disconnects using 
MPI_Comm_disconnect


 4. mpi-receiver reads the MPI_UNSIGNED_LONG, prints it, calls 
barrier and disconnects using MPI_Comm_disconnect and goes to point 
2 - infinite loop


All works fine, but only exactly 5 times. After that the receiver 
hangs in MPI_Recv, after exit from MPI_Comm_accept. That is 100% 
repeatable. I have tried with Intel MPI - no such problem.


I execute the programs using OpenMPI 1.10 as follows

mpirun -np 1 --mca mpi_leave_pinned 0 ./mpi-receiver


Do you have any clues what could be the reason? Am I doing sth 
wrong, or is it some problem with internal state of OpenMPI?


Thanks a lot!

Marcin



___
users mailing list
us...@open-mpi.org
Subscription:http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this 
post:http://www.open-mpi.org/community/lists/users/2015/09/27585.php


--
**
  Jalel CHERGUI, LIMSI-CNRS, Bât. 508 - BP 133, 91403 Orsay cedex, FRANCE
  Tél: (33 1) 69 85 81 27 ; Télécopie: (33 1) 69 85 80 88
  Mél:jalel.cher...@limsi.fr  ; Référence:http://perso.limsi.fr/chergui
**


___
users mailing list
us...@open-mpi.org
Subscription:http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this 
post:http://www.open-mpi.org/community/lists/users/2015/09/27586.php




___
users mailing list
us...@open-mpi.org
Subscription:http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this 
post:http://www.open-mpi.org/community/lists/users/2015/09/27587.php


--
**
  Jalel CHERGUI, LIMSI-CNRS, Bât. 508 - BP 133, 91403 Orsay cedex, FRANCE
  Tél: (33 1) 69 85 81 27 ; Télécopie: (33 1) 69 85 80 88
  Mél:jalel.cher...@limsi.fr  ; Référence:http://perso.limsi.fr/chergui
**


___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/09/27588.php




Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-16 Thread Ralph Castain
As I said, if you don’t provide an explicit slot count in your hostfile, we 
default to allowing oversubscription. We don’t have OAR integration in OMPI, 
and so mpirun isn’t recognizing that you are running under a resource manager - 
it thinks this is just being controlled by a hostfile.

If you want us to error out on oversubscription, you can either add the flag 
you identified, or simply change your hostfile to:

frog53 slots=4

Either will work.
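
For example, the hostfile variant might look like this (the hostfile name is
arbitrary; the expected behavior follows Ralph's description above):

# cat myhosts
frog53 slots=4
# mpirun -np 5 --hostfile myhosts -bind-to core location.exe

With the slot count given, the -np 5 run should now be refused instead of
oversubscribing the four allowed cores.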


> On Sep 16, 2015, at 1:00 AM, Patrick Begou wrote:
> 
> Thanks all for your answers, I've added some details about the tests I have 
> run.  See below.
> 
> 
> Ralph Castain wrote:
>> 
>> Not precisely correct. It depends on the environment.
>> 
>> If there is a resource manager allocating nodes, or you provide a hostfile 
>> that specifies the number of slots on the nodes, or you use -host, then we 
>> default to no-oversubscribe.
> I'm using a batch scheduler (OAR). 
> # cat /dev/cpuset/oar/begou_793/cpuset.cpus
> 4-7
> 
> So 4 cores are allowed. The nodes have two eight-core CPUs.
> 
> Node file contains:
> # cat $OAR_NODEFILE
> frog53
> frog53
> frog53
> frog53
> 
> # mpirun --hostfile $OAR_NODEFILE -bind-to core location.exe 
> is okay (my test code shows one process on each core)
> (process 3) thread is now running on PU logical index 1 (OS/physical index 5) 
> on system frog53
> (process 0) thread is now running on PU logical index 3 (OS/physical index 7) 
> on system frog53
> (process 1) thread is now running on PU logical index 0 (OS/physical index 4) 
> on system frog53
> (process 2) thread is now running on PU logical index 2 (OS/physical index 6) 
> on system frog53
> 
> # mpirun -np 5 --hostfile $OAR_NODEFILE -bind-to core location.exe 
> oversubscribes with:
> (process 0) thread is now running on PU logical index 3 (OS/physical index 7) 
> on system frog53
> (process 1) thread is now running on PU logical index 1 (OS/physical index 5) 
> on system frog53
> (process 3) thread is now running on PU logical index 2 (OS/physical index 6) 
> on system frog53
> (process 4) thread is now running on PU logical index 0 (OS/physical index 4) 
> on system frog53
> (process 2) thread is now running on PU logical index 2 (OS/physical index 6) 
> on system frog53
> This is not allowed with OpenMPI 1.7.3
> 
> I can increase up to the maximum core count of this first processor (8 cores): 
> # mpirun -np 8 --hostfile $OAR_NODEFILE -bind-to core location.exe |grep 
> 'thread is now running on PU'
> (process 5) thread is now running on PU logical index 1 (OS/physical index 5) 
> on system frog53
> (process 7) thread is now running on PU logical index 3 (OS/physical index 7) 
> on system frog53
> (process 4) thread is now running on PU logical index 0 (OS/physical index 4) 
> on system frog53
> (process 6) thread is now running on PU logical index 2 (OS/physical index 6) 
> on system frog53
> (process 2) thread is now running on PU logical index 1 (OS/physical index 5) 
> on system frog53
> (process 0) thread is now running on PU logical index 2 (OS/physical index 6) 
> on system frog53
> (process 1) thread is now running on PU logical index 0 (OS/physical index 4) 
> on system frog53
> (process 3) thread is now running on PU logical index 0 (OS/physical index 4) 
> on system frog53
> 
> But I cannot overload beyond the 8 cores (the core count of one CPU). 
> # mpirun -np 9 --hostfile $OAR_NODEFILE -bind-to core location.exe
> A request was made to bind to that would result in binding more
> processes than cpus on a resource:
> 
>Bind to: CORE
>Node:frog53
>#processes:  2
>#cpus:   1
> 
> You can override this protection by adding the "overload-allowed"
> option to your binding directive.
> 
> Now if I add --nooversubscribe the problem doesn't exist anymore (no more 
> than 4 processes, one on each core). So it looks as if the default behavior 
> were no-oversubscribe based on the number of cores of the socket, rather than on the cpuset???
> 
> Again, with 1.7.3 this problem doesn't occur at all.
> 
> Patrick
> 
> 
>> 
>> If you provide a hostfile that doesn’t specify slots, then we use the number 
>> of cores we find on each node, and we allow oversubscription.
>> 
>> What is being described sounds like more of a bug than an intended feature. 
>> I’d need to know more about it, though, to be sure. Can you tell me how you 
>> are specifying this cpuset?
>> 
>> 
>>> On Sep 15, 2015, at 4:44 PM, Matt Thompson wrote:
>>> 
>>> Looking at the Open MPI 1.10.0 man page:
>>> 
>>>   https://www.open-mpi.org/doc/v1.10/man1/mpirun.1.php 
>>> 
>>> 
>>> it looks like perhaps -oversubscribe (which was an option) is now the 
>>> default behavior. Instead we have:
>>> 
>>> -nooversubscribe, --nooversubscribe
>>> Do not oversubscribe any nodes; error (without starting any processes) if 
>>> the requested number of processes would cause oversubscription. This option 
>>> implicitly sets "max_slots" equal to the "slots" value for each node.

Re: [OMPI users] runtime MCA parameters

2015-09-16 Thread Jeff Squyres (jsquyres)
On Sep 16, 2015, at 8:22 AM, marcin.krotkiewski  
wrote:
> 
> Thanks a lot, that looks right! Looks like some reading to do..
> 
> Do you know if in the OpenMPI implementation the MPI_T-interfaced MCA 
> settings are thread-local, or rank-local? 

By "rank local", I assume you mean "process local" (remember: every MPI process 
has at least 2 ranks -- one in MPI_COMM_WORLD, and one in MPI_COMM_SELF).

All MPI_T interfaces in MPI are local to the MPI process.

How each MPI implementation defines an "MPI process" is up to them; Open MPI 
defines an "MPI process" as an "OS process".

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] bug in MPI_Comm_accept?

2015-09-16 Thread Jalel Chergui

With openmpi-1.7.5, the sender segfaults.
Sorry, I cannot see the problem in the codes. Perhaps people out there 
may help.


Jalel


On 16/09/2015 16:40, marcin.krotkiewski wrote:


I have removed the MPI_Barrier, to no avail. Same thing happens. 
Adding verbosity, before the receiver hangs I get the following message


[node2:03928] mca: bml: Using openib btl to [[12620,1],0] on node node3

So It is somewhere in the openib btl module

Marcin


On 09/16/2015 04:34 PM, Jalel Chergui wrote:
Right, anyway Finalize is necessary at the end of the receiver. The 
other issue is Barrier which is invoked probably when the sender has 
exited hence changing the size of intercom. Can you comment that line 
in both files ?


Jalel

On 16/09/2015 16:22, Marcin Krotkiewski wrote:
But where would I put it? If I put it in the while(1), then 
MPI_Comm_Accept cannot be called for the second time. If I put it 
outside of the loop it will never be called.



On 09/16/2015 04:18 PM, Jalel Chergui wrote:

Can you check with an MPI_Finalize in the receiver ?
Jalel

On 16/09/2015 16:06, marcin.krotkiewski wrote:
I have run into a freeze / potential bug when using 
MPI_Comm_accept in a simple client / server implementation. I have 
attached two simplest programs I could produce:


 1. mpi-receiver.c opens a port using MPI_Open_port, saves the 
port name to a file


 2. mpi-receiver enters infinite loop and waits for connections 
using MPI_Comm_accept


 3. mpi-sender.c connects to that port using MPI_Comm_connect, 
sends one MPI_UNSIGNED_LONG, calls barrier and disconnects using 
MPI_Comm_disconnect


 4. mpi-receiver reads the MPI_UNSIGNED_LONG, prints it, calls 
barrier and disconnects using MPI_Comm_disconnect and goes to 
point 2 - infinite loop


All works fine, but only exactly 5 times. After that the receiver 
hangs in MPI_Recv, after exit from MPI_Comm_accept. That is 100% 
repeatable. I have tried with Intel MPI - no such problem.


I execute the programs using OpenMPI 1.10 as follows

mpirun -np 1 --mca mpi_leave_pinned 0 ./mpi-receiver


Do you have any clues what could be the reason? Am I doing sth 
wrong, or is it some problem with internal state of OpenMPI?


Thanks a lot!

Marcin



___
users mailing list
us...@open-mpi.org
Subscription:http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this 
post:http://www.open-mpi.org/community/lists/users/2015/09/27585.php


--
**
  Jalel CHERGUI, LIMSI-CNRS, Bât. 508 - BP 133, 91403 Orsay cedex, FRANCE
  Tél: (33 1) 69 85 81 27 ; Télécopie: (33 1) 69 85 80 88
  Mél:jalel.cher...@limsi.fr  ; Référence:http://perso.limsi.fr/chergui
**


___
users mailing list
us...@open-mpi.org
Subscription:http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this 
post:http://www.open-mpi.org/community/lists/users/2015/09/27586.php




___
users mailing list
us...@open-mpi.org
Subscription:http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this 
post:http://www.open-mpi.org/community/lists/users/2015/09/27587.php


--
**
  Jalel CHERGUI, LIMSI-CNRS, Bât. 508 - BP 133, 91403 Orsay cedex, FRANCE
  Tél: (33 1) 69 85 81 27 ; Télécopie: (33 1) 69 85 80 88
  Mél:jalel.cher...@limsi.fr  ; Référence:http://perso.limsi.fr/chergui
**


___
users mailing list
us...@open-mpi.org
Subscription:http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this 
post:http://www.open-mpi.org/community/lists/users/2015/09/27588.php




___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/09/27589.php


--
**
 Jalel CHERGUI, LIMSI-CNRS, Bât. 508 - BP 133, 91403 Orsay cedex, FRANCE
 Tél: (33 1) 69 85 81 27 ; Télécopie: (33 1) 69 85 80 88
 Mél: jalel.cher...@limsi.fr ; Référence: http://perso.limsi.fr/chergui
**



[OMPI users] open mpi gcc

2015-09-16 Thread Kumar, Sudhir
Hi
 We are currently using openmpi 1.8.5 and gcc 4.4.7. We would like to change 
the gcc associated with our openmpi 1.8.5 installation to gcc 4.1.2.
 Is this possible? If so, how can it be done?
Thanks
Sudhir Kumar




Re: [OMPI users] bug in MPI_Comm_accept? (UNCLASSIFIED)

2015-09-16 Thread Burns, Andrew J CTR USARMY RDECOM ARL (US)
CLASSIFICATION: UNCLASSIFIED

Have you attempted using 2 cores per process? I have noticed that 
MPI_Comm_accept sometimes behaves strangely on single core variations.

I have a program that makes use of Comm_accept/connect and I also call 
MPI_Comm_merge. So, you may want to look into that call as well.

-Andrew Burns

-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jalel Chergui
Sent: Wednesday, September 16, 2015 11:49 AM
To: us...@open-mpi.org
Subject: Re: [OMPI users] bug in MPI_Comm_accept?

This email was sent from a non-Department of Defense email account, and 
contained active links. All links are disabled, and require you to copy and 
paste the address to a Web browser. Please verify the identity of the sender, 
and confirm authenticity of all links contained within the message.  



 
With openmpi-1.7.5, the sender segfaults.
Sorry, I cannot see the problem in the codes. Perhaps people out there may help.

Jalel


On 16/09/2015 16:40, marcin.krotkiewski wrote:

I have removed the MPI_Barrier, to no avail. Same thing happens. Adding 
verbosity, before the receiver hangs I get the following message

[node2:03928] mca: bml: Using openib btl to [[12620,1],0] on node node3

So It is somewhere in the openib btl module

Marcin


On 09/16/2015 04:34 PM, Jalel Chergui wrote:
Right, anyway Finalize is necessary at the end of the receiver. The other issue 
is Barrier which is invoked probably when the sender has exited hence changing 
the size of intercom. Can you comment that line in both files ?

Jalel

On 16/09/2015 16:22, Marcin Krotkiewski wrote:
But where would I put it? If I put it in the while(1), then MPI_Comm_Accept 
cannot be called for the second time. If I put it outside of the loop it will 
never be called.


On 09/16/2015 04:18 PM, Jalel Chergui wrote:
Can you check with an MPI_Finalize in the receiver ?
Jalel

On 16/09/2015 16:06, marcin.krotkiewski wrote:
I have run into a freeze / potential bug when using MPI_Comm_accept in a simple 
client / server implementation. I have attached two simplest programs I could 
produce:

 1. mpi-receiver.c opens a port using MPI_Open_port, saves the port name to a 
file

 2. mpi-receiver enters infinite loop and waits for connections using 
MPI_Comm_accept

 3. mpi-sender.c connects to that port using MPI_Comm_connect, sends one 
MPI_UNSIGNED_LONG, calls barrier and disconnects using MPI_Comm_disconnect

 4. mpi-receiver reads the MPI_UNSIGNED_LONG, prints it, calls barrier and 
disconnects using MPI_Comm_disconnect and goes to point 2 - infinite loop

All works fine, but only exactly 5 times. After that the receiver hangs in 
MPI_Recv, after exit from MPI_Comm_accept. That is 100% repeatable. I have 
tried with Intel MPI - no such problem.

I execute the programs using OpenMPI 1.10 as follows

mpirun -np 1 --mca mpi_leave_pinned 0 ./mpi-receiver


Do you have any clues what could be the reason? Am I doing sth wrong, or is it 
some problem with internal state of OpenMPI?

Thanks a lot!

Marcin




___
users mailing list
us...@open-mpi.org
Subscription: Caution-www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
Caution-www.open-mpi.org/community/lists/users/2015/09/27585.php


--
**
 Jalel CHERGUI, LIMSI-CNRS, Bât. 508 - BP 133, 91403 Orsay cedex, FRANCE
 Tél: (33 1) 69 85 81 27 ; Télécopie: (33 1) 69 85 80 88
 Mél: jalel.cher...@limsi.fr ; Référence: 
Caution-perso.limsi.fr/chergui
**




___
users mailing list
us...@open-mpi.org
Subscription: Caution-www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
Caution-www.open-mpi.org/community/lists/users/2015/09/27586.php




___
users mailing list
us...@open-mpi.org
Subscription: Caution-www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
Caution-www.open-mpi.org/community/lists/users/2015/09/27587.php


--
**
 Jalel CHERGUI, LIMSI-CNRS, Bât. 508 - BP 133, 91403 Orsay cedex, FRANCE
 Tél: (33 1) 69 85 81 27 ; Télécopie: (33 1) 69 85 80 88
 Mél: jalel.cher...@limsi.fr ; Référence: 
Caution-perso.limsi.fr/chergui
**




___
users mailing list
us...@open-mpi.org
Subscription: Caution-www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
Caution-www.open-mpi.org/community/lists/users/2015/09/27588.php




___
users mailing list
us...@open-mpi.org

Re: [OMPI users] open mpi gcc

2015-09-16 Thread Ralph Castain
Have you tried just rebuilding OMPI after setting gcc 4.1.2 at the front of 
your PATH and LD_LIBRARY_PATH?
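
For example (a rough sketch only; all paths and the install prefix are placeholders):

export PATH=/opt/gcc-4.1.2/bin:$PATH
export LD_LIBRARY_PATH=/opt/gcc-4.1.2/lib64:$LD_LIBRARY_PATH
cd openmpi-1.8.5
./configure CC=gcc CXX=g++ FC=gfortran --prefix=/opt/openmpi-1.8.5-gcc412
make all install

Applications would then need to be rebuilt or relinked against the new installation.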


> On Sep 16, 2015, at 8:58 AM, Kumar, Sudhir  wrote:
> 
> Hi
> We are currently using openmpi 1.8.5 and gcc 4.4.7, we would like to change 
> the associated gcc to gcc 4.1.2 for our openmpi 1.8.5 installation.
> Is this possible. If so how can it be done.
> Thanks
> Sudhir Kumar
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/09/27593.php



Re: [OMPI users] open mpi gcc

2015-09-16 Thread Kumar, Sudhir
Haven't tried that. Will try that approach.
Thanks
Sudhir Kumar



-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, September 16, 2015 11:05 AM
To: Open MPI Users
Subject: Re: [OMPI users] open mpi gcc

Have you tried just rebuilding OMPI after setting gcc 4.1.2 at the front of 
your PATH and LD_LIBRARY_PATH?


> On Sep 16, 2015, at 8:58 AM, Kumar, Sudhir  wrote:
> 
> Hi
> We are currently using openmpi 1.8.5 and gcc 4.4.7, we would like to change 
> the associated gcc to gcc 4.1.2 for our openmpi 1.8.5 installation.
> Is this possible. If so how can it be done.
> Thanks
> Sudhir Kumar
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/09/27593.php

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/09/27595.php


Re: [OMPI users] bug in MPI_Comm_accept? (UNCLASSIFIED)

2015-09-16 Thread marcin.krotkiewski

Thank you all for your replies.

I have now tested the code with various setups and versions. First of 
all, the tcp btl seems to work fine (I had the patience to check ~10 runs); 
the openib btl is the problem. I have also compiled with the Intel compiler, 
and the story is the same as with gcc.


I have then tested many OpenMPI versions from 1.7.5 to 1.10.0 using 
bisection ;) Versions up to and including 1.8.3 worked fine (at least 
more than 5 times, around 10 runs); the problem was likely introduced in 
version 1.8.4. Actually, version 1.8.4 was the only one to spit out an 
interesting warning on the receiver side at the moment it hung:


[warn] opal_libevent2021_event_base_loop: reentrant invocation. Only one 
event_base_loop can run on each event_base at once.


which may or may not be of importance in this particular case ;)

So to summarize, the problem appeared in the openib btl in version 1.8.4.
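
(Until the openib issue is understood, a possible workaround, offered here only
as an editorial suggestion based on the observation above that the tcp btl works
fine, is to restrict these two programs to it, e.g.:

mpirun -np 1 --mca btl tcp,self ./mpi-receiver
mpirun -np 1 --mca btl tcp,self ./mpi-sender "$(cat port.txt)"
)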

Does anybody have any more ideas?

Thanks!

Marcin



On 09/16/2015 05:59 PM, Burns, Andrew J CTR USARMY RDECOM ARL (US) wrote:

CLASSIFICATION: UNCLASSIFIED

Have you attempted using 2 cores per process? I have noticed that 
MPI_Comm_accept sometimes behaves strangely on single core variations.

I have a program that makes use of Comm_accept/connect and I also call 
MPI_Comm_merge. So, you may want to look into that call as well.

-Andrew Burns

-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jalel Chergui
Sent: Wednesday, September 16, 2015 11:49 AM
To: us...@open-mpi.org
Subject: Re: [OMPI users] bug in MPI_Comm_accept?

This email was sent from a non-Department of Defense email account, and 
contained active links. All links are disabled, and require you to copy and 
paste the address to a Web browser. Please verify the identity of the sender, 
and confirm authenticity of all links contained within the message.



  
With openmpi-1.7.5, the sender segfaults.

Sorry, I cannot see the problem in the codes. Perhaps people out there may help.

Jalel


On 16/09/2015 16:40, marcin.krotkiewski wrote:

I have removed the MPI_Barrier, to no avail. Same thing happens. Adding 
verbosity, before the receiver hangs I get the following message

[node2:03928] mca: bml: Using openib btl to [[12620,1],0] on node node3

So It is somewhere in the openib btl module

Marcin


On 09/16/2015 04:34 PM, Jalel Chergui wrote:
Right, anyway Finalize is necessary at the end of the receiver. The other issue 
is Barrier which is invoked probably when the sender has exited hence changing 
the size of intercom. Can you comment that line in both files ?

Jalel

On 16/09/2015 16:22, Marcin Krotkiewski wrote:
But where would I put it? If I put it in the while(1), then MPI_Comm_Accept 
cannot be called for the second time. If I put it outside of the loop it will 
never be called.


On 09/16/2015 04:18 PM, Jalel Chergui wrote:
Can you check with an MPI_Finalize in the receiver ?
Jalel

On 16/09/2015 16:06, marcin.krotkiewski wrote:
I have run into a freeze / potential bug when using MPI_Comm_accept in a simple 
client / server implementation. I have attached two simplest programs I could 
produce:

  1. mpi-receiver.c opens a port using MPI_Open_port, saves the port name to a 
file

  2. mpi-receiver enters infinite loop and waits for connections using 
MPI_Comm_accept

  3. mpi-sender.c connects to that port using MPI_Comm_connect, sends one 
MPI_UNSIGNED_LONG, calls barrier and disconnects using MPI_Comm_disconnect

  4. mpi-receiver reads the MPI_UNSIGNED_LONG, prints it, calls barrier and 
disconnects using MPI_Comm_disconnect and goes to point 2 - infinite loop

All works fine, but only exactly 5 times. After that the receiver hangs in 
MPI_Recv, after exit from MPI_Comm_accept. That is 100% repeatable. I have 
tried with Intel MPI - no such problem.

I execute the programs using OpenMPI 1.10 as follows

mpirun -np 1 --mca mpi_leave_pinned 0 ./mpi-receiver


Do you have any clues what could be the reason? Am I doing sth wrong, or is it 
some problem with internal state of OpenMPI?

Thanks a lot!

Marcin




___
users mailing list
us...@open-mpi.org
Subscription: Caution-www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
Caution-www.open-mpi.org/community/lists/users/2015/09/27585.php


--
**
  Jalel CHERGUI, LIMSI-CNRS, Bât. 508 - BP 133, 91403 Orsay cedex, FRANCE
  Tél: (33 1) 69 85 81 27 ; Télécopie: (33 1) 69 85 80 88
  Mél: jalel.cher...@limsi.fr ; Référence: 
Caution-perso.limsi.fr/chergui
**




___
users mailing list
us...@open-mpi.org
Subscription: Caution-www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
Caution-www.open-mpi.org/community/lists/u

[OMPI users] XHPL question

2015-09-16 Thread Mark Moorcroft
I found the thread below from May. I’m setting up a new cluster and using
openmpi 1.10. I have a GNU build and an Intel build. Neither has libmpi.so.1. I
created a symlink and it’s working. My question is whether I should try to
rebuild LAPACK, and whether it is wise to be adding that link. For me it’s just
burn-in and testing. I don’t want to create issues for the scientists
later. Was this link purposely removed some number of versions back?


Thanks




Ralph,

I copied the LAPACK benchmark binaries (xhpl being the binary) over to a
development system (which

is running the same version of CentOS) but I'm getting some errors trying
to run the OpenMPI LAPACK benchmark

on OpenMPI 1.8.5:

xhpl: error while loading shared libraries: libmpi.so.1: cannot open shared
object file: No such file or directory

xhpl: error while loading shared libraries: libmpi.so.1: cannot open shared
object file: No such file or directory

xhpl: error while loading shared libraries: libmpi.so.1: cannot open shared
object file: No such file or directory

xhpl: error while loading shared libraries: libmpi.so.1: cannot open shared
object file: No such file or directory

When I look at the 1.8.5 install directory I find the following shared
object library but no libmpi.so.1

/apps/mpi/openmpi/1.8.5-dev/lib/libmpi.so

/apps/mpi/openmpi/1.8.5-dev/lib/libmpi.so.0

Is it necessary to re-compile the OpenMPI LAPACK benchmark to run OpenMPI
1.8.5

as opposed to 1.8.2?

-Bill L.


[OMPI users] Contact?

2015-09-16 Thread Mark Moorcroft
It's worth mentioning that I made several attempts to add a NASA email
address to this list. Nothing ever happened. Nothing bounced. I never got a
validation email. Nothing appeared in our spam filter. I emailed
webmaster@openmpi and got no reply. There seems to be no contact conduit other
than these lists. If you can't join these lists, you don't exist. I added a
gmail address and it worked instantaneously.


Re: [OMPI users] Contact?

2015-09-16 Thread Jeff Squyres (jsquyres)
Sorry for the trouble.  FWIW, there actually is a real, live sysadmin at 
Indiana University who actually receives the webmaster emails; he usually 
forwards such emails to me.

Ping me off-list and we can dig into why your NASA email address didn't work 
(i.e., I can ask the IU sysadmins to look in the logs to see what happened).


> On Sep 16, 2015, at 5:52 PM, Mark Moorcroft  wrote:
> 
> 
> It's worth a mention that I made several attempts to add a NASA email address 
> to this list. Nothing ever happened. Nothing bounced. I never got a 
> validation email. Nothing appeared in our spam filter. I emailed 
> webmaster@openmpi and got no reply. There seems to be no contact conduit but 
> these lists. If you can't join these lists you don't exist. I added a gmail 
> address and it worked instantaneously.
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/09/27599.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] XHPL question

2015-09-16 Thread Ralph Castain
Jeff will undoubtedly start typing before he reads my response, so I'll
spare you from reading all the ugly details twice :-)

There was an unintentional ABI break in the 1.8 series that necessitated a
version numbering change to libmpi. It involves the code that handles the
connection between a process and its local daemon. If you hard link (e.g.,
static build) your app against a pre-1.8.5 lib and then run it against a
1.8.5+ version of mpirun, it will fail.

However, if you dynamically link, everything should be fine so long as the
app's LD_LIBRARY_PATH points to the 1.8.5+ shared libs.
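
(To see which soname a dynamically linked binary expects, something like
"ldd ./xhpl | grep libmpi" will show whether it is looking for libmpi.so.1 and
which library, if any, it currently resolves to via LD_LIBRARY_PATH.)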

Ralph



On Wed, Sep 16, 2015 at 2:49 PM, Mark Moorcroft  wrote:

>
>
> I found the thread below from May. I’m setting up a new cluster and using
> openmpi 1.10. I have a gnu build and an Intel. Neither has libmpi.so.1. I
> created a symlink and it’s working. My question is if I should try to
> rebuild LAPACK, and is it wise to be adding that link? For me it’s just
> burn-in and testing. I don’t want to create issues for the scientists
> later. Was this link purposely removed some number of versions back?
>
>
> Thanks
>
>
>
>
> Ralph,
>
> I copied the LAPACK benchmark binaries (xhpl being the binary) over to a
> development system (which
>
> is running the same version of CentOS) but I'm getting some errors trying
> to run the OpenMPI LAPACK benchmark
>
> on OpenMPI 1.8.5:
>
> xhpl: error while loading shared libraries: libmpi.so.1: cannot open
> shared object file: No such file or directory
>
> xhpl: error while loading shared libraries: libmpi.so.1: cannot open
> shared object file: No such file or directory
>
> xhpl: error while loading shared libraries: libmpi.so.1: cannot open
> shared object file: No such file or directory
>
> xhpl: error while loading shared libraries: libmpi.so.1: cannot open
> shared object file: No such file or directory
>
> When I look at the 1.8.5 install directory I find the following shared
> object library but no libmpi.so.1
>
> /apps/mpi/openmpi/1.8.5-dev/lib/libmpi.so
>
> /apps/mpi/openmpi/1.8.5-dev/lib/libmpi.so.0
>
> Is it necessary to re-compile the OpenMPI LAPACK benchmark to run OpenMPI
> 1.8.5
>
> as opposed to 1.8.2?
>
> -Bill L.
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/09/27598.php
>


Re: [OMPI users] XHPL question

2015-09-16 Thread Mark Moorcroft
Hmm, I'm pretty sure my xhpl binary is/was dynamically linked. Here is my env:

LD_LIBRARY_PATH=/share/apps/openmpi-1.10-intel-x86_64/lib:/share/apps/Intel/composer_xe_2015.2.164/compiler/lib/intel64:/share/apps/Intel/composer_xe_2015.2.164/mkl/lib/intel64:/opt/python/lib

The binary fails without a symlink to libmpi.so.1.

So I'm not sure whether you're saying yes, create a symlink, or yes, rebuild the
binary. Or that it should work and no symlink should be necessary.

>Jeff will undoubtedly start typing before he reads my response, so I'll
>spare you from reading all the ugly details twice :-)
>There was an unintentional ABI break in the 1.8 series that necessitated a
>version numbering change to libmpi. It involves the code that handles the
>connection between a process and its local daemon. If you hard link (e.g.,
>static build) your app against a pre-1.8.5 lib and then run it against a
>1.8.5+ version of mpirun, it will fail.
>However, if you dynamically link, everything should be fine so long as the
>app's LD_LIBRARY_PATH points to the 1.8.5+ shared libs.
>Ralph


Re: [OMPI users] XHPL question

2015-09-16 Thread Ralph Castain
Looks like you are trying to link it against the 1.10 series? You could
probably get away with the symlink, but unless there is some reason to
avoid it, I'd just recompile to be safe.


On Wed, Sep 16, 2015 at 7:36 PM, Mark Moorcroft  wrote:

>
>
> Hmm, I'm pretty sure my xhpl binary is/was dynamic linked. Here is my env:
>
>
> LD_LIBRARY_PATH=/share/apps/openmpi-1.10-intel-x86_64/lib:/share/apps/Intel/composer_xe_2015.2.164/compiler/lib/intel64:/share/apps/Intel/composer_xe_2015.2.164/mkl/lib/intel64:/opt/python/lib
>
> The binary fails without a symlink to libmpi.so.1.
>
> So I'm not sure if you're saying yes create a symlink or yes rebuild the
> binary. Or it should work, and no symlink should be necessary.
>
> >Jeff will undoubtedly start typing before he reads my response, so I'll
> >spare you from reading all the ugly details twice :-)
> >There was an unintentional ABI break in the 1.8 series that necessitated
> a
> >version numbering change to libmpi. It involves the code that handles the
> >connection between a process and its local daemon. If you hard link
> (e.g.,
> >static build) your app against a pre-1.8.5 lib and then run it against a
> >1.8.5+ version of mpirun, it will fail.
> >However, if you dynamically link, everything should be fine so long as
> the
> >app's LD_LIBRARY_PATH points to the 1.8.5+ shared libs.
> >Ralph
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/09/27602.php
>