Re: [OMPI users] MPI Behaviour Question

2016-10-12 Thread Mark Potter
Thanks, between you and Gilles I've got plenty of information
to use in an explanation! And thanks for the hello world link; I've
used the examples that come with OpenMPI but hadn't used that one.
Usually I end up assuming it works and just running HPL. ;)

On Tue, 2016-10-11 at 15:27 +0200, Reuti wrote:
> Hi,
> 
> > 
> > On 11.10.2016 at 14:56, Mark Potter wrote:
> > 
> > This question is related to OpenMPI 2.0.1 compiled with GCC 4.8.2 on
> > RHEL 6.8 using Torque 6.0.2 with Moab 9.0.2. To be clear, I am an
> > administrator and not a coder, and I suspect this is expected behavior,
> > but I have been asked by a client to explain why this is happening.
> > 
> > Using Torque, the following command returns the hostname of the first
> > node only, regardless of how the nodes/cores are split up:
> > 
> > mpirun -np 20 echo "Hello from $HOSTNAME"
> The $HOSTNAME will be expanded and used as argument before `mpirun`
> even starts. Instead it has to be evaluated on the nodes:
> 
> $ mpirun bash -c "echo \$HOSTNAME"
> 
> 
> > 
> > (the behaviour is the same with "echo $(hostname)")
> > 
> > The Torque script looks like this:
> > 
> > #PBS -V
> > #PBS -N test-job
> > #PBS -l nodes=2:ppn=16
> > #PBS -e ERROR
> > #PBS -o OUTPUT
> > 
> > 
> > cd $PBS_O_WORKDIR
> > date
> > cat $PBS_NODEFILE
> > 
> > mpirun -np 32 echo "Hello from $HOSTNAME"
> > 
> > If the echo statement is replaced with "hostname" then a proper
> > response is received from all nodes.
> > 
> > While I know there are better ways to test OpenMPI's functionality,
> > like compiling and using the programs in examples/, this is the method
> > a specific client chose.
> There are small "Hello world" programs like here:
> 
> http://mpitutorial.com/tutorials/mpi-hello-world/
> 
> to test whether e.g. the libraries are found at runtime by the
> application(s).
> 
> -- Reuti
> 
> 
> > 
> > I was using both the examples and a Torque job script calling just
> > "hostname" as a command (not using echo), and the client was using the
> > script above. It took some doing to figure out why he thought it wasn't
> > working when all my tests were successful, and when I figured it out, he
> > wanted an explanation that's beyond my current knowledge. Any help
> > towards explaining the behaviour would be greatly appreciated.
> > 
-- 
Regards,

Mark L. Potter
Senior Consultant
PCPC Direct, Ltd.
O: 713-344-0952 
M: 713-965-4133
S: mpot...@pcpcdirect.com
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
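Reuti's point, that the job script's own shell expands $HOSTNAME before mpirun ever starts, can be seen without MPI at all. A minimal sketch, assuming bash: show_argv is a hypothetical stand-in for mpirun that only prints the arguments it receives, and node0 is a made-up host name.

```shell
#!/usr/bin/env bash
# Sketch: print the argv a launcher would receive for two quoting styles.
# show_argv is a hypothetical stand-in for mpirun; it only prints its arguments.
show_argv() {
    local arg
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Pretend the job script is running on the first node (made-up name).
HOSTNAME=node0

# Double quotes: this shell substitutes $HOSTNAME immediately, so the
# launcher, and therefore every rank, receives "Hello from node0".
# prints: [echo] then [Hello from node0]
show_argv echo "Hello from $HOSTNAME"

# Escaped dollar sign: the launcher receives the text $HOSTNAME unexpanded;
# each rank's own bash -c would expand it later, on the node it runs on.
# prints: [bash] then [-c] then [echo Hello from $HOSTNAME]
show_argv bash -c "echo Hello from \$HOSTNAME"
```

In the Torque script, the same idea would give something like mpirun -np 32 bash -c 'echo "Hello from $(hostname)"' (an untested suggestion): the single quotes keep the job script's shell from expanding anything, and each rank's bash -c runs hostname on its own node. Using $(hostname) rather than $HOSTNAME also avoids relying on a HOSTNAME value that may have been exported from the submitting environment, which Gilles raises below.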

Re: [OMPI users] MPI Behaviour Question

2016-10-12 Thread Mark Potter
After the responses I did more testing. Even $(hostname) and `hostname`
get expanded on the first node. A script using echo works with any of
them, from the environment variable to the backticks. From my limited
testing, I'm guessing all shell expansion on the command line happens on
the first node. That explanation makes sense and fits the results. It's
easy enough to explain as well!
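The observation that a script works fits the same explanation: the expansion then happens inside the script, on whichever node runs it. A minimal sketch, assuming a POSIX sh on the compute nodes; the file name hello_from_node.sh is hypothetical, and in a real job the script would need to sit on a filesystem every node can see (for example under $PBS_O_WORKDIR).

```shell
#!/usr/bin/env bash
# Sketch: put the echo in a small per-rank script so that the expansion runs
# on each node. hello_from_node.sh is a hypothetical name; in a real job the
# file must be on a filesystem all nodes share (e.g. under $PBS_O_WORKDIR).
script="${TMPDIR:-/tmp}/hello_from_node.sh"

# Quoting 'EOF' stops the current shell from expanding the body, so
# $(hostname) is written into the file literally.
cat > "$script" <<'EOF'
#!/bin/sh
# Runs once per rank; $(hostname) is evaluated on the node running that rank.
echo "Hello from $(hostname)"
EOF
chmod +x "$script"

# In the Torque job this would be launched as:
#   mpirun -np 32 "$script"
# Here it is run once locally through sh as a sanity check.
sh "$script"
```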

On Tue, 2016-10-11 at 22:17 +0900, Gilles Gouaillardet wrote:
> Mark,
> 
> My understanding is that shell meta expansion occurs once, on the
> first node, so from an Open MPI point of view you really invoke
> mpirun echo node0
> I suspect
> mpirun echo 'Hello from $(hostname)'
> is what you want to do.
> I do not know about
> mpirun echo 'Hello from $HOSTNAME'
> $HOSTNAME might be passed by the first node to all tasks, and hence
> might not have the value you expect on all the nodes.
> Feel free to run
> mpirun env | grep ^HOSTNAME=
> to check whether the HOSTNAME variable is set to what you expect.
> 
> /* I am AFK, so I cannot check that right now ... */
> 
> 
> Cheers,
> 
> Gilles

Re: [OMPI users] Openmpi 2.0.1 build --with-psm2 failed on CentOS 7.2

2016-10-12 Thread Cabral, Matias A
Hi Limin,

One more detail: I advise using a stable release:

https://github.com/01org/opa-psm2/releases

Regards,

_MAC

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Limin Gu
Sent: Tuesday, October 11, 2016 7:33 PM
To: Open MPI Users 
Subject: Re: [OMPI users] Openmpi 2.0.1 build --with-psm2 failed on CentOS 7.2

Thank you very much, MAC!

Limin




On Tue, Oct 11, 2016 at 10:15 PM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:
Building psm2 should not be complicated (in case you cannot find a newer 
binary):

https://github.com/01org/opa-psm2


Note that newer RPMs are named hfi1-psm*


_MAC

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Limin Gu
Sent: Tuesday, October 11, 2016 6:44 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Openmpi 2.0.1 build --with-psm2 failed on CentOS 7.2


Thanks Gilles!



Limin

On Tue, Oct 11, 2016 at 9:33 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

Limin,



It seems the libpsm2 provided by CentOS 7 is a bit too old:

all of its symbols are prefixed with psm_, whereas Open MPI expects them to be
prefixed with psm2_.

I am afraid your only option is to manually install the latest libpsm2 and then
configure again with your psm2 install dir.


Cheers,

Gilles
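The nm output further down reports "no symbols" because plain nm reads the static symbol table, which strip removes from packaged libraries; the exported functions that configure probes for (here psm2_mq_irecv2) live in the dynamic symbol table, which nm -D reads. A minimal sketch for checking which names a library really exports, assuming GNU binutils' nm is installed; exports_symbol is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: does a shared library export a given function?
# exports_symbol is a hypothetical helper. It uses "nm -D", which reads the
# dynamic symbol table; strip leaves that table in place, so this still works
# on packaged libraries where plain "nm" reports "no symbols".
exports_symbol() {
    local lib="$1" sym="$2"
    # --defined-only skips symbols the library merely imports from elsewhere.
    # awk drops the @VERSION suffix newer binutils append to versioned names,
    # reads all of its input (no early exit), and exits 0 only on a match.
    nm -D --defined-only "$lib" 2>/dev/null \
        | awk -v want="$sym" '
            { name = $NF; sub(/@.*/, "", name); if (name == want) found = 1 }
            END { exit !found }'
}

# Usage against the library from this thread (not run here):
#   exports_symbol /usr/lib64/libpsm2.so.2 psm2_mq_irecv2 && echo found || echo missing
# To see whether a library uses the psm_ or the psm2_ naming scheme:
#   nm -D --defined-only /usr/lib64/libpsm2.so.2 | grep -E ' psm2?_' | head
```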

On 10/12/2016 9:57 AM, Limin Gu wrote:
Hi MAC,

It seems /usr/lib64/libpsm2.so.2 has no symbols. Can configure check for it some
other way?



[root@uranus ~]# rpm -qi libpsm2-0.7-4.el7.x86_64
Name        : libpsm2
Version     : 0.7
Release     : 4.el7
Architecture: x86_64
Install Date: Tue 11 Oct 2016 05:45:59 PM PDT
Group       : System Environment/Libraries
Size        : 400282
License     : GPLv2 or BSD
Signature   : RSA/SHA256, Wed 25 Nov 2015 07:02:20 AM PST, Key ID 24c6a8a7f4a80eb5
Source RPM  : libpsm2-0.7-4.el7.src.rpm
Build Date  : Fri 20 Nov 2015 08:05:13 AM PST
Build Host  : worker1.bsys.centos.org
Relocations : (not relocatable)
Packager    : CentOS BuildSystem
Vendor      : CentOS
URL         : http://www.intel.com/
Summary     : Intel PSM Libraries
Description :
The PSM Messaging API, or PSM API, is Intel's low-level
user-level communications interface for the Truescale
family of products. PSM users are enabled with mechanisms
necessary to implement higher level communications
interfaces in parallel environments.
[root@uranus ~]# objdump -p /usr/lib64/libpsm2.so.2 |grep SONAME
  SONAME   libpsm2.so.2
[root@uranus ~]# nm /usr/lib64/libpsm2.so.2
nm: /usr/lib64/libpsm2.so.2: no symbols
[root@uranus ~]#


Thanks!
Limin


On Tue, Oct 11, 2016 at 7:00 PM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:
Hi Limin,

psm2_mq_irecv2 should be in libpsm2.so. I’m not quite sure how CentOS packages it,
so I would like a little more info about the version being used. Some things to
share:

>rpm -qi libpsm2-0.7-4.el7.x86_64
>objdump -p /usr/lib64/libpsm2.so | grep SONAME
>nm /usr/lib64/libpsm2.so | grep psm2_mq_irecv2   (will not work if the library is stripped)


Thanks,
_MAC

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Limin Gu
Sent: Tuesday, October 11, 2016 2:58 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: [OMPI users] Openmpi 2.0.1 build --with-psm2 failed on CentOS 7.2

Hi All,

I am trying to build openmpi 2.0.1 on a CentOS 7.2 system, and I have the following
libpsm2 packages installed:

libpsm2-0.7-4.el7.x86_64
libpsm2-compat-0.7-4.el7.x86_64
libpsm2-compat-devel-0.7-4.el7.x86_64
libpsm2-devel-0.7-4.el7.x86_64

I added --with-psm2 to my configure, but it failed:

--- MCA component mtl:psm2 (m4 configuration macro)
checking for MCA component mtl:psm2 compile mode... static
checking --with-psm2 value... simple ok (unspecified)
checking --with-psm2-libdir value... simple ok (unspecified)
checking psm2.h usability... yes
checking psm2.h presence... yes
checking for psm2.h... yes
looking for library without search path
checking for library containing psm2_mq_irecv2... no
configure: error: PSM2 support requested but not found.  Aborting
error: Bad exit status from /var/tmp/rpm-tmp.TLxu8O (%build)


/usr/lib64/libpsm2.so is on the system though.

What other libraries do I need for psm2?

Thank you!



Re: [OMPI users] Openmpi 2.0.1 build --with-psm2 failed on CentOS 7.2

2016-10-12 Thread Limin Gu
Wonderful, thank you so much MAC!

Limin

On Wed, Oct 12, 2016 at 12:50 PM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:

> Hi Limin,
>
> One more detail: I advise using a stable release:
>
> https://github.com/01org/opa-psm2/releases
>
> Regards,
>
> _MAC

[OMPI users] clarity on Comm_connect

2016-10-12 Thread Marlborough, Rick
Designation: Non-Export Controlled Content
Folks;
Trying to do an MPI_Lookup_name. The call is surrounded by a
try/catch block. Even with the try/catch block, the calling process still
aborts if the publishing process has not published the name. Is there a way to
configure/code this so that MPI throws a trappable exception?

Thanx
Rick

3.1.1001

Re: [OMPI users] clarity on Comm_connect

2016-10-12 Thread Marlborough, Rick
...forgot to mention...

I have a group of processes called sensors and a group of processes called
proxies. A central dispatch process launches all of the sensors followed by all
of the proxies. The sensors publish named ports and wait on MPI_Comm_accept.
The proxies look up the named port and do an MPI_Comm_connect. If this all
occurs on the same node as the dispatcher, then all proxies connect to their
respective sensors and all is well. If I configure my slots to force proxies or
sensors onto other nodes (I have 20), then the connections fail. There is full
connectivity between all of these nodes. We are testing various forms of
middleware. Some use TCP, some use UDP, some use multicast. All work. Full SSH
connectivity is set up between all of these nodes. Oddly enough, the sensors all
perform a Comm_connect to the dispatcher, and this always works! The sensors and
proxies are all spawned in 2 batches using Comm_spawn_multiple. Error message
below. Is there some configuration to enable this?

[error message screenshot not included]


From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Marlborough, Rick
Sent: Wednesday, October 12, 2016 4:47 PM
To: users@lists.open-mpi.org
Subject: [OMPI users] clarity on Comm_connect



Re: [OMPI users] clarity on Comm_connect

2016-10-12 Thread Marlborough, Rick
Another follow-up. If I run all proxies on the same node as the dispatcher, then
it works, even with all sensors spread across different nodes. If I force the
proxies onto another node, they all fail. Here is some more error output.

[error output screenshot not included]


From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Marlborough, Rick
Sent: Wednesday, October 12, 2016 5:44 PM
To: Open MPI Users
Subject: Re: [OMPI users] clarity on Comm_connect

