Re: [OMPI users] mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking

2016-07-19 Thread Ralph Castain
Afraid I have no brilliant ideas to offer; I’m not seeing that problem. It usually indicates that the orte_schizo plugin is being pulled from an incorrect location. You might just look in your install directory and ensure that the plugin is there. Also ensure that your install lib is at the fro
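One way to act on that advice is to check which library directory the dynamic loader searches first. The snippet above is cut off mid-word, so this assumes it refers to the front of LD_LIBRARY_PATH. The C sketch below does only that string check. lib_dir_is_first and install_lib_is_first are hypothetical helper names, and the comparison is an exact match, so a trailing slash or a symlinked path would not count as equal.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if `dir` is exactly the first entry of the colon-separated
 * search path `path_list` (e.g. the value of LD_LIBRARY_PATH), else 0.
 * Hypothetical helper for illustration; exact string comparison only. */
int lib_dir_is_first(const char *path_list, const char *dir)
{
    assert(dir != NULL);
    if (path_list == NULL || *path_list == '\0')
        return 0;                              /* variable unset or empty */
    const char *colon = strchr(path_list, ':');
    size_t first_len = colon ? (size_t)(colon - path_list) : strlen(path_list);
    return first_len == strlen(dir) && strncmp(path_list, dir, first_len) == 0;
}

/* Convenience wrapper (also hypothetical) that inspects the live environment. */
int install_lib_is_first(const char *install_lib)
{
    return lib_dir_is_first(getenv("LD_LIBRARY_PATH"), install_lib);
}
```

Calling install_lib_is_first("<prefix>/lib") from a small test program, with <prefix> replaced by your configure --prefix, returns 1 only when that directory comes first. It does not look at RPATH/RUNPATH entries or the ld.so cache, which can also influence where the loader finds a library.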

[OMPI users] This list is migrating!

2016-07-19 Thread Jeff Squyres (jsquyres)
Short version = The server for this mailing list will be migrating sometime soon (the exact timing is not fully predictable). Three things you need to know: 1. We'll send a "This list is now closed for migration" last message when the migration starts 2. We'll send a "This list is

Re: [OMPI users] mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking

2016-07-19 Thread Nathaniel Graham
I've also blown away the install directory and done a complete reinstall in case there was something old left in the directory. -Nathan On Tue, Jul 19, 2016 at 2:21 PM, Nathaniel Graham wrote: > The prefix location has to be there. Otherwise ompi attempts to install > to a read-only directory. >

Re: [OMPI users] mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking

2016-07-19 Thread Nathaniel Graham
The prefix location has to be there; otherwise ompi attempts to install to a read-only directory. I have added the install bin directory to my PATH and the lib directory to LD_LIBRARY_PATH. When I run "which mpirun", it points to the expected place. -Nathan On Tue, Jul 19, 2016 at
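The "which mpirun" check can also be reproduced in C if you want a diagnostic program to print the resolved mpirun path next to the first LD_LIBRARY_PATH entry. The sketch below is a rough stand-in for which(1), not a port of it: find_in_path is a hypothetical name, it walks the colon-separated list in order, and it accepts the first candidate that access(X_OK) reports as executable without checking that it is a regular file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Roughly mimics `which`: walks the colon-separated `path_list` in order and
 * writes the first "<dir>/<prog>" that is executable into `out`.
 * Returns 0 on success; returns -1 and leaves `out` empty if nothing matches. */
int find_in_path(const char *prog, const char *path_list, char *out, size_t outlen)
{
    assert(prog != NULL && out != NULL && outlen > 0);
    const char *p = path_list;
    while (p != NULL) {
        const char *colon = strchr(p, ':');
        size_t len = colon ? (size_t)(colon - p) : strlen(p);
        /* An empty entry means the current directory, as with POSIX PATH. */
        int n = (len == 0)
            ? snprintf(out, outlen, "./%s", prog)
            : snprintf(out, outlen, "%.*s/%s", (int)len, p, prog);
        if (n > 0 && (size_t)n < outlen && access(out, X_OK) == 0)
            return 0;
        p = colon ? colon + 1 : NULL;          /* advance to the next entry */
    }
    out[0] = '\0';
    return -1;
}
```

For example, find_in_path("mpirun", getenv("PATH"), buf, sizeof buf) fills buf with the first mpirun on PATH; its directory should sit under the same install prefix as the lib directory that comes first on LD_LIBRARY_PATH.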

Re: [OMPI users] mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking

2016-07-19 Thread Ralph Castain
Sounds to me like you have a confused build; I’d whack your prefix location and do a “make install” again. > On Jul 19, 2016, at 1:04 PM, Nathaniel Graham wrote: > > Hello, > > I am trying to run the OSU tests for some results for a poster, but I am > getting the following error: > > mpi

[OMPI users] mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking

2016-07-19 Thread Nathaniel Graham
Hello, I am trying to run the OSU tests for some results for a poster, but I am getting the following error: "mpirun: Symbol `orte_schizo' has different size in shared object, consider re-linking". I am building off master with gcc on Red Hat Enterprise Linux Server release 6.7. My config comm

Re: [OMPI users] Help - Client / server - app hangs in connect/accept by the second or next client that wants to connect to server

2016-07-19 Thread Gilles Gouaillardet
My bad for the confusion; I misread you and miswrote my reply. I will investigate this again. Strictly speaking, the clients can only start after the server has first written the port info to a file. If you start the clients right after the server starts, they might use incorrect/outdated info and cause
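The ordering Gilles describes (the server writes its port info to a file, and only then may clients read it) can be made more robust by publishing the file atomically. Below is a minimal C sketch, assuming the port string is whatever MPI_Open_port returned on the server. write_port_file and read_port_file are hypothetical helpers, and the write-to-a-temp-file-then-rename() approach is a generic POSIX technique, not something taken from the test program.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Server side: write the port name to "<path>.tmp", then rename it into
 * place. rename() replaces the target atomically on POSIX file systems, so
 * a client never sees a half-written file. Returns 0 on success, -1 on error. */
int write_port_file(const char *path, const char *port_name)
{
    char tmp[4096];
    assert(path != NULL && port_name != NULL);
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp)
        return -1;                         /* path too long for the buffer */
    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", port_name) >= 0;
    ok = (fclose(f) == 0) && ok;           /* fclose flushes the data */
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}

/* Client side: read the port name back. Returns -1 if the file does not
 * exist yet, so the caller can retry instead of guessing. */
int read_port_file(const char *path, char *buf, size_t buflen)
{
    assert(path != NULL && buf != NULL && buflen > 0);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, (int)buflen, f);
    fclose(f);
    if (line == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';        /* strip the trailing newline */
    return 0;
}
```

A client that gets -1 from read_port_file can sleep and retry. To avoid the outdated info Gilles mentions, the server (or the launch script) should also remove any port file left over from a previous run before starting.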

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-19 Thread Sylvain Jeaugey
As a workaround, you can also try adding -noswitcherror to the PGCC flags. On 07/11/2016 03:52 PM, Åke Sandgren wrote: Looks like you are compiling with slurm support. If so, you need to remove the "-pthread" from libslurm.la and libpmi.la. On 07/11/2016 02:54 PM, Michael Di Domenico wrote: I'm tr

Re: [OMPI users] Help - Client / server - app hangs in connect/accept by the second or next client that wants to connect to server

2016-07-19 Thread M. D.
Yes, I understand, but I think this is exactly the situation you are talking about. In my opinion, the test does exactly what you said: when a new player wants to join, the other players must invoke MPI_Comm_accept(). All *other* players must invoke MPI_Comm_accept(). Only the last client

Re: [OMPI users] Help - Client / server - app hangs in connect/accept by the second or next client that wants to connect to server

2016-07-19 Thread Gilles Gouaillardet
Here is what the client is doing: printf("CLIENT: after merging, new comm: size=%d rank=%d\n", size, rank) ; for (i = rank ; i < num_clients ; i++) { /* client performs a collective accept */ CHK(MPI_Comm_accept(server_port_name, MPI_INFO_NULL, 0, intracomm, &intercomm)
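The loop quoted above can be checked on paper by counting who must call MPI_Comm_accept in each round. The C model below goes beyond what the truncated snippet shows: it assumes the server is rank 0 of intracomm and runs the same loop, and that the k-th client to join ends up with rank k after each merge. Under those assumptions every process already in intracomm is a caller, which matches "all *other* players must invoke MPI_Comm_accept()". The function names are hypothetical.

```c
#include <assert.h>

/* Model of:  for (i = rank; i < num_clients; i++) MPI_Comm_accept(..., intracomm, ...);
 * Assumed: server is rank 0 and runs the same loop; the k-th client to join
 * gets rank k after merging. */

/* Number of collective accepts the process with merged rank `rank` joins. */
int accepts_for_rank(int rank, int num_clients)
{
    assert(rank >= 0 && rank <= num_clients);
    return num_clients - rank;
}

/* Number of processes that must call MPI_Comm_accept in round `round`
 * (0-based), i.e. while client round+1 is connecting: ranks 0..round. */
int callers_in_round(int round, int num_clients)
{
    assert(round >= 0 && round < num_clients);
    return round + 1;
}

/* Sanity check: counting per rank and per round gives the same total. */
int totals_agree(int num_clients)
{
    int by_rank = 0, by_round = 0;
    for (int r = 0; r <= num_clients; r++)
        by_rank += accepts_for_rank(r, num_clients);
    for (int j = 0; j < num_clients; j++)
        by_round += callers_in_round(j, num_clients);
    return by_rank == by_round;
}
```

With num_clients = 2, for instance, the server makes two accepts, the first client one and the last client none, so under this model the first client cannot finish until the second one has connected.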

Re: [OMPI users] Help - Client / server - app hangs in connect/accept by the second or next client that wants to connect to server

2016-07-19 Thread M. D.
2016-07-19 10:06 GMT+02:00 Gilles Gouaillardet: > MPI_Comm_accept must be called by all the tasks of the local communicator. > Yes, that's how I understand it. In the source code of the test, all the tasks call MPI_Comm_accept: the server and also the relevant clients. > so if you > > 1) mpirun -np 1

Re: [OMPI users] Help - Client / server - app hangs in connect/accept by the second or next client that wants to connect to server

2016-07-19 Thread Gilles Gouaillardet
MPI_Comm_accept must be called by all the tasks of the local communicator. So if you run: 1) mpirun -np 1 ./singleton_client_server 2 1; 2) mpirun -np 1 ./singleton_client_server 2 0; 3) mpirun -np 1 ./singleton_client_server 2 0; then 3) starts after 2) has exited, so on 1), intracomm is made of 1)

Re: [OMPI users] Help - Client / server - app hangs in connect/accept by the second or next client that wants to connect to server

2016-07-19 Thread M. D.
Hi, thank you for your interest in this topic. I normally run the test as follows. First, I run the "server" (second parameter is 1): *mpirun -np 1 ./singleton_client_server number_of_clients 1* Second, I run the corresponding number of "clients" via the following command: *mpirun -np 1 ./singleton_c
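The invocation described here takes two positional parameters: the number of clients and a role flag (1 for the server, 0 for a client), as in "./singleton_client_server 2 1". A hypothetical parser for that convention is sketched below; parse_cs_args and the cap of 1024 clients are assumptions for illustration, not taken from the test source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical parser for:  ./singleton_client_server <number_of_clients> <is_server>
 * where is_server is 1 for the server and 0 for a client.
 * Returns 0 on success, -1 on malformed arguments. */
int parse_cs_args(int argc, char **argv, int *num_clients, int *is_server)
{
    assert(num_clients != NULL && is_server != NULL);
    if (argc != 3)
        return -1;
    char *end;
    long n = strtol(argv[1], &end, 10);
    if (*argv[1] == '\0' || *end != '\0' || n < 1 || n > 1024)  /* arbitrary cap */
        return -1;
    long s = strtol(argv[2], &end, 10);
    if (*argv[2] == '\0' || *end != '\0' || (s != 0 && s != 1))
        return -1;
    *num_clients = (int)n;
    *is_server = (int)s;
    return 0;
}
```

Gilles also recommends passing the same number_of_clients to every mpirun instance, so a launcher could compare the parsed value across instances before starting anything.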

Re: [OMPI users] Forcing TCP btl

2016-07-19 Thread Saliya Ekanayake
Thank you, Gilles. That explains it! On Tue, Jul 19, 2016 at 1:14 AM, Gilles Gouaillardet wrote: > basically, there are two methods (aka pml) to send/recv messages. > > ob1 is the basic one, it works with (all ?) interconnects that can > send/recv a stream of data > > pml/ob1 uses the available

Re: [OMPI users] Forcing TCP btl

2016-07-19 Thread Gilles Gouaillardet
Basically, there are two methods (aka pml) to send/recv messages. ob1 is the basic one; it works with (all?) interconnects that can send/recv a stream of data. pml/ob1 uses the available btl(s) (tcp, openib, ...). cm is for feature-rich interconnects that can send/recv messages. pml/cm uses th
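The question about "--mca btl ^openib" comes down to how the btl selection string is read, and to the point made here that the btl list is only consulted by pml/ob1. The sketch below models just the string syntax, where a leading "^" turns the comma-separated list into an exclusion list. btl_allowed is a hypothetical name; this is not Open MPI's own parser.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if btl component `name` may be used under the selection string
 * `spec` (the value given to "--mca btl"), else 0.
 *   "tcp,self" -> only the listed components are allowed
 *   "^openib"  -> every component except the listed ones is allowed
 * NULL or "" means no restriction. Only relevant when pml/ob1 is in use. */
int btl_allowed(const char *spec, const char *name)
{
    assert(name != NULL);
    if (spec == NULL || *spec == '\0')
        return 1;
    int exclude = (spec[0] == '^');
    const char *p = exclude ? spec + 1 : spec;
    size_t name_len = strlen(name);
    for (;;) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        if (len == name_len && strncmp(p, name, len) == 0)
            return !exclude;   /* listed: allowed only in an include list */
        if (comma == NULL)
            break;
        p = comma + 1;
    }
    return exclude;            /* not listed: allowed only in an exclude list */
}
```

So "^openib" still allows tcp and every other btl, and under pml/cm no btl is used at all, which is why pinning the pml to ob1 matters as well.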

Re: [OMPI users] Forcing TCP btl

2016-07-19 Thread Saliya Ekanayake
Thank you, but what's mxm? On Tue, Jul 19, 2016 at 12:52 AM, Nathan Hjelm wrote: > You probably will also want to run with -mca pml ob1 to make sure mxm is > not in use. The combination should be sufficient to force tcp usage. > > -Nathan > > > On Jul 18, 2016, at 10:50 PM, Saliya Ekanayake > w

Re: [OMPI users] Forcing TCP btl

2016-07-19 Thread Nathan Hjelm
You probably will also want to run with -mca pml ob1 to make sure mxm is not in use. The combination should be sufficient to force tcp usage. -Nathan > On Jul 18, 2016, at 10:50 PM, Saliya Ekanayake wrote: > > Hi, > > I read in a previous thread > (https://www.open-mpi.org/community/lists/us
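Combining "-mca pml ob1" with an explicit btl include list gives a command line like the one built below. This is a sketch: build_tcp_mpirun_argv is a hypothetical helper, and "tcp,self" (tcp plus the "self" btl for a process sending to itself) is the usual way to restrict Open MPI to TCP, not a value quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fills `argv` with:  mpirun --mca pml ob1 --mca btl tcp,self -np <np> <prog>
 * NULL-terminated, so it could be handed to execvp() after a cast to
 * (char *const *). Returns the argument count, or -1 if `cap` is too small. */
int build_tcp_mpirun_argv(const char **argv, size_t cap, const char *np, const char *prog)
{
    assert(argv != NULL && np != NULL && prog != NULL);
    const char *args[] = {
        "mpirun",
        "--mca", "pml", "ob1",        /* use the btl-based pml, not cm/mxm */
        "--mca", "btl", "tcp,self",   /* only these btls may be selected */
        "-np", np,
        prog,
    };
    size_t n = sizeof args / sizeof args[0];
    if (cap < n + 1)
        return -1;
    for (size_t i = 0; i < n; i++)
        argv[i] = args[i];
    argv[n] = NULL;
    return (int)n;
}
```

With only tcp and self enabled, processes on the same node also talk over TCP instead of shared memory, which is what a TCP-only measurement needs but will be slower than a default run.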

[OMPI users] Forcing TCP btl

2016-07-19 Thread Saliya Ekanayake
Hi, I read in a previous thread (https://www.open-mpi.org/community/lists/users/2014/05/24475.php) that Jeff mentions it's possible for OpenMPI to pick up the openib transport if tcp is not requested explicitly. So, does that mean that if I use --mca btl ^openib, it's still possible for OpenMPI

Re: [OMPI users] Help - Client / server - app hangs in connect/accept by the second or next client that wants to connect to server

2016-07-19 Thread Gilles Gouaillardet
How do you run the test? You should have the same number of clients in each mpirun instance. The following simple shell script starts the test as I think it is supposed to be started. Note that the test itself is arguable, since MPI_Comm_disconnect() is never invoked (and you will observe some related dpm_base_dis