[OMPI users] Strange OpenMPI messages

2012-02-14 Thread Tohiko Looka
Greetings,

Until today I was running my Open MPI applications with no errors or warnings.
Today I restarted my computer (possibly after an automatic Open MPI update)
and got these warnings when running my program:
[tohiko@kw12614 1d]$ mpirun -x LD_LIBRARY_PATH -hostfile hosts -np 10 hello
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--
[[21652,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: kw12614

Another transport will be used instead, although this may result in
lower performance.
--
[kw12614:03195] 10 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[kw12614:03195] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages


Is this normal? And how come it happened now?
-- Tohiko


Re: [OMPI users] Strange OpenMPI messages

2012-02-14 Thread Tohiko Looka
Sorry for the noob question, but how do I check my network type and whether
the OFED service is running correctly? And how do I start it?

Thank you,

On Tue, Feb 14, 2012 at 2:14 PM, Jeff Squyres wrote:

> Do you have an OpenFabrics-based network?  (e.g., InfiniBand or iWarp)
>
> If so, this error message usually means that OFED is either installed
> incorrectly, or is not running properly (e.g., its services didn't get
> started properly upon boot).
>
> If you don't have an OpenFabrics-based network, then it usually means that
> you have OpenFabrics services running when you really shouldn't (because
> you don't have any OpenFabrics-based devices).
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


[OMPI users] Different Prefix for different nodes

2012-02-14 Thread Tohiko Looka
Hello,

I'm trying to run my application on different nodes, each with a different
path to the Open MPI libraries and binaries.
According to the documentation, '-prefix' can be set on a per-context basis,
so I should be able to set it differently for each node, but I wasn't able
to do so and I didn't find an example.

What is the correct syntax for setting a different '-prefix' argument for
each node?
Thank you,


Re: [OMPI users] Strange OpenMPI messages

2012-02-15 Thread Tohiko Looka
Mm... This is really strange.
I don't have that service, and there is no ib* interface in the output of
'ifconfig -a' and no 'InfiniBand' entry in the output of 'lspci',
which makes me believe that I don't have such a network. I also checked
an identical computer on the same network, with the same results.

What's strange is that these messages didn't use to show up, and they don't
show up on that identical computer, only on mine, even though both
computers have the same hardware, the same Open MPI version, and are on the
same network.

I guess I can safely ignore these warnings and run on Ethernet, but it
would be nice to know what happened there, in case anybody has an idea.
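
For anyone hitting the same warnings, the two steps discussed in this thread
(checking for InfiniBand hardware, then keeping Open MPI away from the openib
module when there is none) can be sketched roughly as below. This is only a
sketch: the helper name has_infiniband is made up for illustration, and the
'--mca btl ^openib' form is assumed to have the same practical effect here as
the explicit 'tcp,sm,self' list suggested in the reply quoted below.

```shell
# has_infiniband: hypothetical helper, not part of Open MPI.
# Reads lspci-style text on stdin and succeeds if any line mentions
# InfiniBand (the match is case-insensitive).
has_infiniband() {
    grep -qi 'infiniband'
}

# Usage on a node where lspci is installed:
#   if lspci | has_infiniband; then
#       echo "InfiniBand hardware present"
#   else
#       # No IB hardware: exclude the openib BTL so it is never tried.
#       mpirun --mca btl '^openib' -np 4 ./my_mpi_program
#   fi
```

The caret is quoted because some shells treat an unquoted ^ specially.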

Thank you,

On Wed, Feb 15, 2012 at 12:52 AM, Gustavo Correa wrote:

> Hi Tohiko
>
> OpenFabrics network a.k.a. Infiniband a.k.a. IB.
> To check if the compute nodes have IB interfaces, try:
>
> lspci [and search the output for InfiniBand]
>
> To see if the IB interface is configured try:
>
> ifconfig -a  [and search the output for ib0, ib1, or similar]
>
> To check if the OFED module is up try:
>
> 'service openibd status'
>
>
> As an alternative, you could also try to run your program over Ethernet,
> avoiding InfiniBand,
> in case you don't have IB or if somehow it is broken.
> It is slower than InfiniBand, though.
>
> Try something like this:
>
> mpiexec -mca btl tcp,sm,self -np 4 ./my_mpi_program
>
> I hope this helps,
> Gus Correa


Re: [OMPI users] Different Prefix for different nodes

2012-02-15 Thread Tohiko Looka
Hello Jeff,

Yes, I tried that and it worked. Thanks.
But I hope the people behind Open MPI will correct this in the documentation.
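
A minimal sketch of the per-node shell setup described in the reply quoted
below. The helper name use_openmpi and the install directory /opt/openmpi are
hypothetical; each node would use its own real prefix.

```shell
# Fragment for $HOME/.bashrc on one node.
# use_openmpi: hypothetical helper that puts one Open MPI installation first
# on PATH and LD_LIBRARY_PATH. $1 is that node's install prefix.
use_openmpi() {
    PATH="$1/bin:$PATH"
    # Append the old LD_LIBRARY_PATH only if it was non-empty.
    LD_LIBRARY_PATH="$1/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}

# /opt/openmpi is an example path; use the real prefix on each node.
use_openmpi /opt/openmpi
```

Some default .bashrc files return early for non-interactive shells. Since
remote processes are typically started over a non-interactive login, these
lines probably need to come before any such check.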

On Wed, Feb 15, 2012 at 7:08 PM, Jeff Squyres wrote:

> On Feb 14, 2012, at 4:06 PM, Tohiko Looka wrote:
>
> > I'm trying to run my application on different nodes; each with a
> different path to OpenMPI libraries and binaries.
> > According to the documentation I can set '-prefix' on a per-context
> basis, so I can set '-prefix' differently
> > for each node, but I wasn't able to do it and I didn't find an example
>
> Yoinks -- that might be incorrect documentation.  I'm pretty sure --prefix
> is a global switch.
>
> If you have OMPI installed in different locations on different nodes,
> --prefix might not be a good solution.  Instead, you might well want to set
> your PATH/LD_LIBRARY_PATH in the shell startup files on each node (e.g.,
> $HOME/.bashrc) to values that are relevant for that node.


Re: [OMPI users] Strange OpenMPI messages

2012-02-15 Thread Tohiko Looka
Jeff,
My computer doesn't have such a service, and I think that's the correct
name for Fedora.
Also, what bugs me is that it used to work with no warnings before
I restarted my computer.
I will try to compile Open MPI myself (as opposed to installing it using
yum) and see what happens.
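
To see whether any OpenFabrics kernel drivers are loaded at all (the
situation described in the reply quoted below), something like the following
sketch could be used. The helper name openfabrics_modules is made up, and the
ib_ / rdma_ name prefixes are an assumption about how such modules are
usually named, not an exhaustive list.

```shell
# openfabrics_modules: hypothetical helper, not part of any package.
# Reads `lsmod` output on stdin and prints the names of loaded modules whose
# names start with ib_ or rdma_ (for example ib_core or rdma_cm).
openfabrics_modules() {
    awk '$1 ~ /^(ib_|rdma_)/ { print $1 }'
}

# Usage:
#   lsmod | openfabrics_modules
# No output means no module with those prefixes is currently loaded.
```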

On Wed, Feb 15, 2012 at 6:32 PM, Jeff Squyres wrote:

> It is possible to have the OpenFabrics drivers loaded in your kernel, even
> if you have no OpenFabrics-based devices in your hardware.
>
> You probably just want to unload those drivers, and then Open MPI should
> not try to use OpenFabrics.  Sometimes distros have init scripts that load
> the OpenFabrics drivers automatically -- Gus suggested "service openibd
> status" to see if your system has a service named "openibd" (although it
> might be a different name, depending on your distro).  If you find such a
> service, you might want to disable it.

Re: [OMPI users] Strange OpenMPI messages

2012-02-15 Thread Tohiko Looka
Gustavo,

I will definitely try to compile Open MPI myself and see if the problem
persists.
Regarding your note on homogeneous nodes: I tried to do that as much as
possible,
but I had no control over two nodes, and each of them had a different setup.
As Jeff suggested, using .bashrc seems to solve the issue.

Thanks

On Wed, Feb 15, 2012 at 6:52 PM, Gustavo Correa wrote:

> Hi Tohiko
>
> If you compiled Open MPI in a computer with IB hardware,
> then copied the installation tree to another machine,
> or if you installed from an RPM or other package generated in a
> machine with IB, your OpenMPI will have IB enabled,  I think, even if the
> machine where it is running does not have IB.
>
> This is a matter of taste, but here is what I think,
> regarding a previous question you sent.
> I would rather compile open MPI from source, in the machine[s] where it
> will
> run, and install it with the same path on all machines {or in a single NFS
> shared directory},
> to make things simpler.
> I would use the most homogeneous set of machines possible,  to avoid too
> many headaches.
> I.e. use the least common denominator, so to speak.
> Say, everything x86_64, all with Ethernet only [or all with IB + Ethernet,
> but you
> don't seem to have IB, at least not on all machines].
>
> I hope this helps,
> Gus Correa

Re: [OMPI users] Different Prefix for different nodes

2012-02-15 Thread Tohiko Looka
On Wed, Feb 15, 2012 at 9:03 PM, Jeff Squyres wrote:
> Can do.  Can you point me to exactly where you saw that?

In mpirun man pages, like here
http://www.open-mpi.org/doc/v1.4/man1/mpirun.1.php

it says:
"Note that --prefix can be set on a per-context basis, allowing for
different values for different nodes. "


[OMPI users] PAPI errors when compiling OpenMPI

2012-10-08 Thread Tohiko Looka
Greetings,

I am trying to compile openmpi-1.5.4. It usually works fine, but
it is failing on a specific node.
The error is
vt_metric_papi.c:262: error: too many arguments to function ‘PAPI_perror’
vt_metric_papi.c: In function ‘metric_warning’:

Of course configure runs successfully.
Any ideas?
Thanks



Re: [OMPI users] PAPI errors when compiling OpenMPI

2012-10-09 Thread Tohiko Looka
Mmm... The problem is that I already have applications and nodes that use
1.5.4, and upgrading might be difficult.
It is strange, because it works on other nodes.
I will check whether 1.6.2 compiles anyway.

Thanks for your reply,

On Tue, Oct 9, 2012 at 5:11 PM, Jeff Squyres wrote:
> Please try upgrading to Open MPI 1.6.2.



Re: [OMPI users] PAPI errors when compiling OpenMPI

2012-10-10 Thread Tohiko Looka
Thanks a lot, Jeff.
1.6.2 had similar problems, but --disable-vt worked.
Is there a page that tells which Open MPI versions are compatible with
each other (in the sense that they can communicate with each other)?
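
Before worrying about compatibility, it may help to confirm which version
each node actually runs. The sketch below assumes that 'mpirun --version'
prints a line of the form "mpirun (Open MPI) X.Y.Z"; the helper name
ompi_version is hypothetical, and the loop in the comments assumes
passwordless ssh and a "hosts" file with one hostname at the start of each
line.

```shell
# ompi_version: hypothetical helper, not part of Open MPI.
# Reads `mpirun --version` output on stdin and prints only the version
# number from the "mpirun (Open MPI) X.Y.Z" line.
ompi_version() {
    sed -n 's/^mpirun (Open MPI) \([0-9][0-9A-Za-z.]*\).*$/\1/p'
}

# Usage: report the version found on every node listed in "hosts".
#   while read -r host _; do
#       v=$(ssh -n "$host" mpirun --version 2>&1 | ompi_version)
#       echo "$host: ${v:-unknown}"
#   done < hosts
```

Nodes that report "unknown" most likely do not have mpirun on the PATH of a
non-interactive shell.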

On Tue, Oct 9, 2012 at 6:42 PM, Jeff Squyres wrote:
> On Oct 9, 2012, at 11:34 AM, Tohiko Looka wrote:
>
>> Mmm... The problem is I already have applications/nodes that use 1.5.4
>> and upgrading might be difficult.
>
> FWIW, Open MPI 1.5.4 is binary compatible with Open MPI 1.6.2.
>
>> It is strange because it works on other nodes.
>
> Perhaps you have different versions of PAPI on your nodes...?
>
>> I will try to check if 1.6.2 compiles anyways
>
> Worst case, you can ./configure --disable-vt to disable the (optional)
> VampirTrace package.