Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-10-05 Thread Chris Jewell

> 
> It looks to me like your remote nodes aren't finding the orted executable. I 
> suspect the problem is that you need to forward the PATH and LD_LIBRARY_PATH 
> to the remote nodes. Use the mpirun -x option to do so.
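For anyone hitting the same symptom, the forwarding suggested above might look roughly like this (the hostfile and executable names are placeholders):

```shell
# Forward PATH and LD_LIBRARY_PATH to the remote nodes so they can
# find orted and the Open MPI shared libraries:
mpirun -x PATH -x LD_LIBRARY_PATH -np 8 --hostfile myhosts ./my_mpi_app
```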


Hi, problem sorted.  It was actually caused by the system I currently use to 
create Linux cpusets on the execution nodes.  Grid Engine was trying to execv 
on the slave nodes, and not supplying an executable to run, since this is 
deferred to OpenMPI.  I've scrapped this system now in favour of the new SGE 
core binding feature.

Thanks, sorry to waste people's time!

Chris

Re: [OMPI users] location of ompi libraries

2010-10-05 Thread Jeff Squyres
It is more than likely that you compiled Open MPI with --enable-static and/or 
--disable-dlopen.  In this case, all of Open MPI's plugins are slurped up into 
the libraries themselves (e.g., libmpi.so or libmpi.a).  That's why everything 
continues to work properly.


On Oct 4, 2010, at 6:58 PM, David Turner wrote:

> Hi,
> 
> In Open MPI 1.4.1, the directory lib/openmpi contains about 130
> entries, including such things as mca_btl_openib.so.  In my
> build of Open MPI 1.4.2, lib/openmpi contains exactly three
> items:
> libompi_dbg_msgq.a  libompi_dbg_msgq.la  libompi_dbg_msgq.so
> 
> I have searched my 1.4.2 installation for mca_btl_openib.so,
> to no avail.  And yet, 1.4.2 seems to work "fine".  Is my
> installation broken, or is the organization significantly
> different between the two versions?  A quick scan of the
> release notes didn't help.
> 
> Thanks!
> 
> -- 
> Best regards,
> 
> David Turner
> User Services Group    email: dptur...@lbl.gov
> NERSC Division          phone: (510) 486-4027
> Lawrence Berkeley Lab   fax: (510) 486-4316
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-05 Thread Storm Zhang
Sorry, I should say one more thing about the 500-proc test. I tried to run
two 500-proc jobs at the same time using SGE; they run fast and finish at
the same time as a single run. So I think OpenMPI can handle them
separately very well.

For bind-to-core, I ran mpirun --help but did not find any
bind-to-core info. I only see the bynode and byslot options. Are those the
same as bind-to-core? My mpirun reports version 1.3.3 but ompi_info reports 1.4.2.

Thanks a lot.

Linbao


On Mon, Oct 4, 2010 at 9:18 PM, Eugene Loh wrote:

> Storm Zhang wrote:
>
>
>> Here is what I meant: the 500-proc run in fact uses only
>> 272-304 (<500) real cores, and the program's running time is good: almost
>> five times the 100-proc time. So that case is handled very well, and I
>> guess OpenMPI or the Rocks OS does make use of hyperthreading to do the job.
>> But with 600 procs, the running time is more than double that of 500
>> procs. I don't know why. This is my problem.
>> BTW, how do I use -bind-to-core? I added it as an mpirun option, but it
>> always gives me the error "the executable 'bind-to-core' can't be found".
>> Isn't it like:
>> mpirun --mca btl_tcp_if_include eth0 -np 600 -bind-to-core scatttest
>>
>
> Thanks for sending the mpirun command line and error message.  That helps.
>
> It's not recognizing the --bind-to-core option.  (Single hyphen, as you
> had, should also be okay.)  Skimming through the e-mail, it looks like you
> are using OMPI 1.3.2 and 1.4.2.  Did you try --bind-to-core with both?  If I
> remember my version numbers, --bind-to-core will not be recognized with
> 1.3.2, but should be with 1.4.2.  Could it be that you only tried 1.3.2?
>
> Another option is to try "mpirun --help".  Make sure that it reports
> --bind-to-core.
>
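For completeness, checking for and then using the option might look like the following (1.4-series option name; the executable name scatttest is taken from the message above):

```shell
# Verify this mpirun actually knows the option:
mpirun --help 2>&1 | grep -- bind-to-core

# If it does, bind each process to a core:
mpirun --mca btl_tcp_if_include eth0 -np 600 -bind-to-core scatttest
```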


Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-05 Thread Terry Dontje

On 10/05/2010 10:23 AM, Storm Zhang wrote:
> Sorry, I should say one more thing about the 500-proc test. I tried
> to run two 500-proc jobs at the same time using SGE; they run fast and
> finish at the same time as a single run. So I think OpenMPI can
> handle them separately very well.
>
> For bind-to-core, I ran mpirun --help but did not find any
> bind-to-core info. I only see the bynode and byslot options. Are those
> the same as bind-to-core? My mpirun reports version 1.3.3 but ompi_info
> reports 1.4.2.


No, -bynode/-byslot is for mapping, not binding.  I cannot explain the 
different release versions reported by ompi_info and mpirun.  Have you done a 
`which` on each to see where they are located?  Anyway, 1.3.3 does not 
have any of the -bind-to-* options.
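A quick way to chase down a mismatch like this, assuming both tools are on the PATH (output formats vary by release):

```shell
# If these resolve to different installation prefixes,
# two Open MPI installations are being mixed:
which mpirun
which ompi_info

# Compare the version each one reports:
mpirun --version
ompi_info | head -2
```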


--td



--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com





Re: [OMPI users] location of ompi libraries

2010-10-05 Thread David Turner

Hi Jeff,

Thanks for the response.  Reviewing my builds, I realized that for
1.4.2, I had configured using

contrib/platform/lanl/tlcc/optimized-nopanasas

per Ralph Castain's suggestion.  That file includes both:

enable_dlopen=no
enable_shared=yes
enable_static=yes

Here is my *real* issue.  I am trying to test Voltaire's Fabric
Collective Accelerator, which extends mca_component_path, and
adds a few additional .so files.  It appears I must have
enable_dlopen=yes for this to work, which makes sense.

I assume that the shared/static settings above result in
*both* .a and .so versions of the ompi libraries getting
built.  I'm not sure if this will affect my ability to
use Voltaire's mca plugins, but I have determined that
simply removing the enable_dlopen=no is not sufficient
to restore all the ompi .so files.  I assume (haven't
tried it yet) that removing the enable_static=yes will
result in the ompi .so files getting created.

I guess I'm just looking for some guidance in the use
of the above options.  I have read many warnings on
the ompi website about trying to link statically.
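If shared-only libraries with dlopen-able components are the goal, one possible configure line is sketched below (the prefix is a placeholder, and --enable-dlopen is believed to be the default anyway):

```shell
./configure --prefix=/opt/openmpi-1.4.2 \
            --enable-shared --disable-static --enable-dlopen
make all install

# The component plugins should then reappear as individual .so files:
ls /opt/openmpi-1.4.2/lib/openmpi/mca_*.so
```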

Thanks!

On 10/5/10 7:17 AM, Jeff Squyres wrote:

> It is more than likely that you compiled Open MPI with --enable-static and/or 
> --disable-dlopen.  In this case, all of Open MPI's plugins are slurped up into 
> the libraries themselves (e.g., libmpi.so or libmpi.a).  That's why everything 
> continues to work properly.



--
Best regards,

David Turner
User Services Group    email: dptur...@lbl.gov
NERSC Division          phone: (510) 486-4027
Lawrence Berkeley Lab   fax: (510) 486-4316


Re: [OMPI users] location of ompi libraries

2010-10-05 Thread Barrett, Brian W
David -

You're correct - adding --enable-static (or its file equivalent, enable_static) 
causes components to be linked into libmpi instead of left as individual 
components.  This is probably a bug, but it's what Open MPI has done for its 
entire life, so it's unlikely to change.  Removing the enable_dlopen=no means 
that Open MPI will look for other dynamically loaded components, so that should 
be sufficient for your use as long as mpicc properly adds -Wl,--export-dynamic 
(which it used to do).  To be safe, however, you might want to also remove the 
enable_static line from the file.

The static library warnings are more about doing a completely static link 
(including libc and crt0) than about linking against libmpi.a.  The memory 
tricks needed to support RDMA networks on Linux are the main driver behind 
those statements.
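One way to check the -Wl,--export-dynamic point above is to ask the compiler wrapper to show its underlying link line:

```shell
# Print the link command mpicc would issue (without compiling anything);
# look for -Wl,--export-dynamic in the output:
mpicc -showme:link
```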

Brian


On Oct 5, 2010, at 3:29 PM, David Turner wrote:

> Hi Jeff,
> 
> Thanks for the response.  Reviewing my builds, I realized that for
> 1.4.2, I had configured using
> 
> contrib/platform/lanl/tlcc/optimized-nopanasas
> 
> per Ralph Castain's suggestion.  That file includes both:
> 
> enable_dlopen=no
> enable_shared=yes
> enable_static=yes
> 
> Here is my *real* issue.  I am trying to test Voltaire's Fabric
> Collective Accelerator, which extends mca_component_path, and
> adds a few additional .so files.  It appears I must have
> enable_dlopen=yes for this to work, which makes sense.
> 
> I assume that the shared/static settings above result in
> *both* .a and .so versions of the ompi libraries getting
> built.  I'm not sure if this will affect my ability to
> use Voltaire's mca plugins, but I have determined that
> simply removing the enable_dlopen=no is not sufficient
> to restore all the ompi .so files.  I assume (haven't
> tried it yet) that removing the enable_static=yes will
> result in the ompi .so files getting created.
> 
> I guess I'm just looking for some guidance in the use
> of the above options.  I have read many warnings on
> the ompi website about trying to link statically.
> 
> Thanks!

-- 
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories