Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-17 Thread Gilles Gouaillardet

Ralph,

you can reproduce this with master by manually creating a cpuset with 
fewer cores than available,

and invoking mpirun with -bind-to core from within the cpuset.

I made PR 904: https://github.com/open-mpi/ompi/pull/904

Brice,

can you please double-check that the hwloc_bitmap_isincluded invocation is 
correct?

Another way to fix this could be to always set opal_hwloc_base_cpu_set.

Cheers,

Gilles



On 9/16/2015 11:47 PM, Ralph Castain wrote:
As I said, if you don’t provide an explicit slot count in your 
hostfile, we default to allowing oversubscription. We don’t have OAR 
integration in OMPI, and so mpirun isn’t recognizing that you are 
running under a resource manager - it thinks this is just being 
controlled by a hostfile.


If you want us to error out on oversubscription, you can either add 
the flag you identified, or simply change your hostfile to:


frog53 slots=4

Either will work.


On Sep 16, 2015, at 1:00 AM, Patrick Begou wrote:


Thanks all for your answers, I've added some details about the tests 
I have run.  See below.



Ralph Castain wrote:

Not precisely correct. It depends on the environment.

If there is a resource manager allocating nodes, or you provide a 
hostfile that specifies the number of slots on the nodes, or you use 
-host, then we default to no-oversubscribe.

I'm using a batch scheduler (OAR).
# cat /dev/cpuset/oar/begou_793/cpuset.cpus
4-7

So 4 cores are allowed. The nodes have two eight-core CPUs.

Node file contains:
# cat $OAR_NODEFILE
frog53
frog53
frog53
frog53

# mpirun --hostfile $OAR_NODEFILE -bind-to core location.exe
is okay (my test code shows one process on each core)
(process 3) thread is now running on PU logical index 1 (OS/physical 
index 5) on system frog53
(process 0) thread is now running on PU logical index 3 (OS/physical 
index 7) on system frog53
(process 1) thread is now running on PU logical index 0 (OS/physical 
index 4) on system frog53
(process 2) thread is now running on PU logical index 2 (OS/physical 
index 6) on system frog53


# mpirun -np 5 --hostfile $OAR_NODEFILE -bind-to core location.exe
oversubscribes with:
(process 0) thread is now running on PU logical index 3 (OS/physical 
index 7) on system frog53
(process 1) thread is now running on PU logical index 1 (OS/physical 
index 5) on system frog53
(*process 3*) thread is now running on PU logical index *2 
(OS/physical index 6)* on system frog53
(process 4) thread is now running on PU logical index 0 (OS/physical 
index 4) on system frog53
(*process 2*) thread is now running on PU logical index *2 
(OS/physical index 6)* on system frog53

This is not allowed with OpenMPI 1.7.3

I can increase up to the maximum core count of this first processor 
(8 cores)
# mpirun -np 8 --hostfile $OAR_NODEFILE -bind-to core location.exe 
|grep 'thread is now running on PU'
(process 5) thread is now running on PU logical index 1 (OS/physical 
index 5) on system frog53
(process 7) thread is now running on PU logical index 3 (OS/physical 
index 7) on system frog53
(process 4) thread is now running on PU logical index 0 (OS/physical 
index 4) on system frog53
(process 6) thread is now running on PU logical index 2 (OS/physical 
index 6) on system frog53
(process 2) thread is now running on PU logical index 1 (OS/physical 
index 5) on system frog53
(process 0) thread is now running on PU logical index 2 (OS/physical 
index 6) on system frog53
(process 1) thread is now running on PU logical index 0 (OS/physical 
index 4) on system frog53
(process 3) thread is now running on PU logical index 0 (OS/physical 
index 4) on system frog53


But I cannot overload beyond the 8 cores (the maximum core count of one CPU).
#mpirun -np 9 --hostfile $OAR_NODEFILE -bind-to core location.exe
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to: CORE
   Node:frog53
   #processes:  2
   #cpus:   1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
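For reference, the "overload-allowed" option mentioned in the message is passed as a modifier on the binding directive; a sketch of the invocation (not run here):

```shell
# Allow binding more processes than cores (overload) on purpose
mpirun -np 9 --hostfile $OAR_NODEFILE --bind-to core:overload-allowed location.exe
```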

Now if I add *--nooversubscribe* the problem doesn't exist anymore (no 
more than 4 processes, one on each core). So it looks like the default 
behavior is a no-oversubscribe check against the number of cores of the socket?


Again, with 1.7.3 this problem doesn't occur at all.

Patrick




If you provide a hostfile that doesn’t specify slots, then we use 
the number of cores we find on each node, and we allow oversubscription.


What is being described sounds like more of a bug than an intended 
feature. I’d need to know more about it, though, to be sure. Can you 
tell me how you are specifying this cpuset?



On Sep 15, 2015, at 4:44 PM, Matt Thompson wrote:


Looking at the Open MPI 1.10.0 man page:

https://www.open-mpi.org/doc/v1.10/man1/mpirun.1.php

it looks like perhaps -oversubscribe (which was an option) is now 
the default behavior. Instead we have:


*-nooversubscribe, --nooversubscribe*
Do n

Re: [OMPI users] open mpi gcc

2015-09-17 Thread Thomas Jahns

The usual way to override the C compiler is to invoke configure like this:

./configure CC=nameofcc
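(The same mechanism covers the other compilers as well; the compiler names and prefix below are illustrative assumptions, not requirements:)

```shell
# Override the C, C++ and Fortran compilers at configure time
./configure CC=gcc CXX=g++ FC=gfortran --prefix=$HOME/opt/openmpi
```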

Regards, Thomas





Re: [OMPI users] OpenMPI-1.10.0 bind-to core error

2015-09-17 Thread Ralph Castain
Thanks Gilles!!


On Wed, Sep 16, 2015 at 9:21 PM, Gilles Gouaillardet 
wrote:

> Ralph,
>
> you can reproduce this with master by manually creating a cpuset with fewer
> cores than available,
> and invoking mpirun with -bind-to core from within the cpuset.
>
> I made PR 904: https://github.com/open-mpi/ompi/pull/904
>
> Brice,
>
> can you please double-check that the hwloc_bitmap_isincluded invocation is
> correct?
> Another way to fix this could be to always set opal_hwloc_base_cpu_set.
>
> Cheers,
>
> Gilles

[OMPI users] Problem with using MPI in a Python extension

2015-09-17 Thread Joel Hermanns
Hi all,

I’m currently trying to use MPI within a Python extension (written in C++). I 
was able to compile the extension and import it correctly, but as soon as I run 
the function, which contains the MPI code, I get the following error:

```
[aia256:15841] mca: base: component_find: unable to open 
/pds/opt/openmpi-1.8.7/lib64/openmpi/mca_shmem_posix: 
/pds/opt/openmpi-1.8.7/lib64/openmpi/mca_shmem_posix.so: undefined symbol: 
opal_shmem_base_framework (ignored)
[aia256:15841] mca: base: component_find: unable to open 
/pds/opt/openmpi-1.8.7/lib64/openmpi/mca_shmem_mmap: 
/pds/opt/openmpi-1.8.7/lib64/openmpi/mca_shmem_mmap.so: undefined symbol: 
opal_show_help (ignored)
[aia256:15841] mca: base: component_find: unable to open 
/pds/opt/openmpi-1.8.7/lib64/openmpi/mca_shmem_sysv: 
/pds/opt/openmpi-1.8.7/lib64/openmpi/mca_shmem_sysv.so: undefined symbol: 
opal_show_help (ignored)
...
```

(for the full message please have a look at [1])


I put together a minimal example to reproduce this problem, which can be found 
at [1]. Essentially, it is an extension that consists of only one function, 
which basically just runs MPI_Init and MPI_Finalize.

Maybe someone has some ideas about what I could try.

Thanks in advance!


Best,
Joel


[1] https://github.com/jhedev/mpi_python

Re: [OMPI users] Problem with using MPI in a Python extension

2015-09-17 Thread Jeff Squyres (jsquyres)
Short version:

The easiest way to do this is to configure your Open MPI installation with 
--disable-dlopen.

More detail:

Open MPI uses a bunch of plugins for its functionality.  When you dlopen libmpi 
in a private namespace (like Python does), and then libmpi tries to dlopen its 
plugins, the plugins can't find the symbols that they need in the main libmpi 
library (because they're in a private namespace).

The workaround is to build Open MPI with all of its plugins slurped up into the 
libmpi library itself (i.e., so that Open MPI doesn't have to dlopen its 
plugins).
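A sketch of such a build (the install prefix is an assumption for illustration):

```shell
# Build Open MPI with all plugins compiled into libmpi itself
./configure --disable-dlopen --prefix=$HOME/opt/openmpi-nodlopen
make -j4 all install
```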




-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI users] Problem with using MPI in a Python extension

2015-09-17 Thread Nick Papior
FYI, you can also see what they have done in mpi4py to bypass this
problem.
I would actually highly recommend using mpi4py rather than
implementing this from scratch yourself ;)




-- 
Kind regards Nick


Re: [OMPI users] Problem with using MPI in a Python extension

2015-09-17 Thread Joel Hermanns
Thanks for the quick answer!

I have a few questions now:

1. Are there any downsides of using --disable-dlopen?
2. Are there any other options? We might not be able to change the MPI 
installation when this is running on a supercomputer.

Joel



Re: [OMPI users] Problem with using MPI in a Python extension

2015-09-17 Thread Jeff Squyres (jsquyres)
On Sep 17, 2015, at 11:44 AM, Joel Hermanns  wrote:
> 
> Thanks for the quick answer!

Be sure to see Nick's answer, too -- mpi4py is a nice package.

> I have a few questions now:
> 
> 1. Are there any downsides of using --disable-dlopen?

You won't be able to add or remove plugins in the filesystem after you do the 
Open MPI installation.  But that's a pretty unusual thing to do, so I wouldn't 
worry about it.

> 2. Are there any other options? We might not be able to change MPI 
> installation, when this is running on a supercomputer.

I'm not super familiar with Python and its extension capabilities -- Nick's 
email implies that there's a different way to solve the problem, and I confess 
to not remembering what mpi4py does, offhand.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-17 Thread Shang Li
Hi all,

I wanted to set up a 3-node ring network, where each node connects to the
other 2 using 2 Ethernet ports directly, without a switch/router.

The interface configurations could be found in the following picture.

https://www.dropbox.com/s/g75i51rrjs51b21/mpi-graph%20-%20New%20Page.png?dl=0

I've used *ifconfig* on each node to configure each port, and made sure I
can ssh from each node to the other 2 nodes.

But a simple ring_c example doesn't work... So I turned on
--mca btl_base_verbose 30, and I could see that node1 was trying to use
23.0.0.2 (the link between node2 and node3) to get to node2, even though
there is a direct link to node2.

The output log is like:

[node1:01828] btl: tcp: attempting to connect() to [[19529,1],1] address
> 23.0.0.2 on port 1024
> [[19529,1],0][btl_tcp_endpoint.c:606:mca_btl_tcp_endpoint_start_connect]
> from node1 to: node2 Unable to connect to the peer 23.0.0.2  on port 4:
> Network is unreachable


I've read the following posts and FAQs but still couldn't understand this
kind of behavior.

http://www.open-mpi.org/faq/?category=tcp#tcp-routability-1.3
http://www.open-mpi.org/faq/?category=tcp#tcp-selection
http://www.open-mpi.org/community/lists/users/2014/11/25810.php


Any pointers would be appreciated! Thanks in advance!

My open-mpi info:

 Package: Open MPI gtbldadm@ubuntu-12 Distribution
Open MPI: 1.0.0.22
  Open MPI repo revision: git714842d
   Open MPI release date: May 27, 2015
Open RTE: 1.0.0.22
  Open RTE repo revision: git714842d
   Open RTE release date: May 27, 2015
OPAL: 1.0.0.22
  OPAL repo revision: git714842d
   OPAL release date: May 27, 2015
 MPI API: 2.1


Best,
Shawn


Re: [OMPI users] Problem with using MPI in a Python extension

2015-09-17 Thread Joel Hermanns

> FYI, you can also see what they have done in mpi4py to bypass this problem. 

Could you elaborate on this or give me some pointer to other resources?

> I would actually highly recommend using mpi4py rather than implementing 
> this from scratch yourself ;)

I fully agree that it is a bad idea to implement something like mpi4py from 
scratch. However, I don’t plan to do this, and 
I’m not sure if mpi4py will work for us. This problem initially came up when 
working on a thin layer around some parallel netcdf functionality to request 
and compare data from NetCDF (especially CDF-5) files. 
It is written in C++ for performance reasons. Additionally, I’m not sure if 
there is any up-to-date Python library for parallel netcdf that could help here.
As you can see, we don’t need full-blown MPI features in Python, so I’m not 
really sure if mpi4py can help us.

Please correct me if I’m wrong!

Best,
Joel



Re: [OMPI users] Problem with using MPI in a Python extension

2015-09-17 Thread Lisandro Dalcin
On 17 September 2015 at 18:56, Jeff Squyres (jsquyres)
 wrote:
> On Sep 17, 2015, at 11:44 AM, Joel Hermanns  wrote:
>>
>> Thanks for the quick answer!
>
> Be sure to see Nick's answer, too -- mpi4py is a nice package.
>
>> I have a few questions now:
>>
>> 1. Are there any downsides of using —disable-dlopen?
>
> You won't be able to add or remove plugins in the filesystem after you do the 
> Open MPI installation.  But that's a pretty unusual thing to do, so I 
> wouldn't worry about it.
>
>> 2. Are there any other options? We might not be able to change MPI 
>> installation, when this is running on a supercomputer.
>
> I'm not super familiar with Python and its extension capabilities -- Nick's 
> email implies that there's a different way to solve the problem, and I 
> confess to not remembering what mpi4py does, offhand.
>

mpi4py just calls

dlopen("libmpi.so", RTLD_NOW | RTLD_GLOBAL | RTLD_NOLOAD);

before calling MPI_Init(), see the code below:

https://bitbucket.org/mpi4py/mpi4py/src/master/src/lib-mpi/compat/openmpi.h?fileviewer=file-view-default#openmpi.h-52
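That preload trick can be illustrated in runnable form with ctypes (libm stands in for libmpi below, purely an assumption so the snippet runs anywhere; with Open MPI you would load "libmpi.so" instead, before importing the extension):

```python
import ctypes
import ctypes.util

# Load a shared library with RTLD_GLOBAL so the symbols it exports become
# visible to modules dlopen'd later -- the same idea mpi4py applies to
# libmpi before calling MPI_Init().  libm is used here only so the example
# is self-contained.
libname = ctypes.util.find_library("m") or "libm.so.6"
lib = ctypes.CDLL(libname, mode=ctypes.RTLD_GLOBAL)

# The handle behaves as usual; its symbols are now in the global namespace.
lib.sqrt.restype = ctypes.c_double
lib.sqrt.argtypes = [ctypes.c_double]
print(lib.sqrt(9.0))  # prints 3.0
```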



-- 
Lisandro Dalcin

Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Numerical Porous Media Center (NumPor)
King Abdullah University of Science and Technology (KAUST)
http://numpor.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 4332
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459


Re: [OMPI users] Problem with using MPI in a Python extension

2015-09-17 Thread Nick Papior
Depending on your exact usage and the data contained in the CDF-5 files I
guess netcdf4-python would work for reading the files (if the underlying
netcdf library is compiled against pnetcdf).
However, this will not immediately yield MPI features. Yet, reading
different segments of the files could be made embarrassingly parallel, which
might be OK, but could defeat the purpose of your code.

Yet, why do you use Python on top of C++ for data comparison? If you need
the speed (you mentioned speed), why not do it in plain C++, C or Fortran?
Data comparison can be made extremely easy in Fortran. Sorry for the blurp
;)


2015-09-17 18:20 GMT+00:00 Joel Hermanns :

>
> > FYI, you can also see what they have done in mpi4py to by-pass this
> problem.
>
> Could you elaborate on this or give me some pointer to other resources?
>
> > I would actually highly recommend you to use mpi4py rather than
> implementing this from scratch your-self ;)
>
> I fully agree that it is a bad idea to implement something like mpi4py
> from scratch. However, I don’t plan to do this and
> I’m not sure if mpi4py will work for us. This problem initially came up
> when
> working on a thin layer around some parallel netcdf functionality to
> request and compare data from NetCDF (especially CDF-5) files.
> It is written in C++ due to performance reasons. Additionally, I’m not
> sure if there is any up-to-date python library for parallel netcdf that
> could help here.
> As you can see, we don’t need full blown MPI features in python, and so
> I’m not really sure if mpi4py can help us.
>
> Please correct me if I’m wrong!
>
> Best,
> Joel
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/09/27613.php
>



-- 
Kind regards Nick


Re: [OMPI users] Have Trouble Setting Up a Ring Network Using Open MPI

2015-09-17 Thread Gilles Gouaillardet

Shang,

can you please run
mpirun --version
I cannot determine the Open MPI version you are running from the git hash 
you reported.


As a temporary workaround, you can do minimal TCP routing:
on the three nodes,
1) run
sysctl -w net.ipv4.ip_forward=1

2) add routes to the other nodes' interfaces that are not on the same network;
for example, on node1 you can run
route add -host 23.0.0.2 gw 12.0.0.2
route add -host 23.0.0.3 gw 13.0.0.3

Cheers,

Gilles





[OMPI users] C/R Enabled Debugging

2015-09-17 Thread gzzh...@buaa.edu.cn
Hi Team,
I am trying to use MPI to do some tests and study of C/R-enabled debugging. 
Professor Josh Hursey said that the feature never made it into a release, so 
it was only ever available on the trunk; however, since that time the C/R 
functionality has fallen into disrepair and is most likely broken in the 
trunk today. I tried with the current openmpi-master source code: it can be 
configured, but it can't be built successfully because, according to the log, 
bugs still exist. Is it possible to download the historical Open MPI 
development code that supports C/R-enabled debugging? I appreciate your help.
Best wishes.

ZhangGuozhen
Department of Computer Science
Beihang University
Beijing, China


Re: [OMPI users] C/R Enabled Debugging

2015-09-17 Thread Ralph Castain
I believe that the 1.6 series was the last to support C/R - you can find it
on our web site.

http://www.open-mpi.org/software/ompi/v1.6/

HTH
Ralph

