[OMPI users] Concerning infiniband support

2011-01-20 Thread Zhigang Wei
Dear all,

I want to use InfiniBand. I am at a university in the US, and my university's
high performance computing center doesn't provide a GCC-compiled Open MPI with
InfiniBand support, so I want to compile it myself.

But I have a few questions:

1.  Is it OK to compile Open MPI myself, with InfiniBand support, if I don't
have root privileges? Will it work?

2.  If so, how can I find out where the InfiniBand installation directory is?
Is there a shell command that shows it?

3.  Which configuration is correct? For example, is
"--with-openib=/usr/include/infiniband" enough, as described in the Open MPI
FAQ, or do I need both "--with-openib=/usr/include/infiniband" and
"--with-openib-libdir=/usr/lib64"?


Thanks so much.





Daniel Wei

---

University of Notre Dame







Re: [OMPI users] Concerning infiniband support

2011-01-20 Thread John Hearns
On 20 January 2011 06:59, Zhigang Wei  wrote:
> Dear all,
>
> I want to use InfiniBand. I am at a university in the US, and my
> university's high performance computing center doesn't provide a
> GCC-compiled Open MPI with InfiniBand support, so I want to compile
> it myself.

That is a surprise - you must have some Infiniband hardware available.
Is this hardware not managed by your high performance centre?

My advice - bring beer and salty snacks to your high performance
systems administrators.
Ask them to help.



Re: [OMPI users] Concerning infiniband support

2011-01-20 Thread Jeff Squyres (jsquyres)
Haha!  +1 on what John says. 

But otherwise, you shouldn't need root to install OMPI with ib support. If the 
ib drivers are installed correctly, you shouldn't need the --with-openib 
configure switches at all. 
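
For example, a quick sanity check from a shell, untested and assuming a
typical OFED layout (adjust the paths for your cluster), might look like:

  # does the verbs library see an HCA at all?
  ibv_devinfo

  # are the headers and libraries in the default locations configure checks?
  ls /usr/include/infiniband/verbs.h
  ls /usr/lib64/libibverbs*

  # after building and installing, confirm the openib BTL was compiled in
  ompi_info | grep openib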

Sent from my PDA. No type good. 

On Jan 20, 2011, at 4:48 AM, "John Hearns"  wrote:

> My advice - bring beer and salty snacks to your high performance
> systems administrators.
> Ask them to help.



Re: [OMPI users] Concerning infiniband support

2011-01-20 Thread Bowen Zhou

Hi,

Besides all the advice already given, you may need to use --prefix in the
configure script to override the default installation directory, since you
don't have a root account. You might also want to look at MVAPICH as an
alternative; it is a variant of MPICH2 that supports InfiniBand.
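
A minimal sketch of a non-root build (version number, prefix and the OFED
location are only examples; as Jeff said, the --with-openib flags can usually
be dropped if the drivers are in the default system locations):

  tar xjf openmpi-1.4.3.tar.bz2
  cd openmpi-1.4.3
  # --with-openib expects the OFED installation prefix, not the include dir
  ./configure --prefix=$HOME/opt/openmpi-1.4.3 \
              --with-openib=/usr --with-openib-libdir=/usr/lib64
  make -j4 all && make install

  # make the private install visible to your shell and job scripts
  export PATH=$HOME/opt/openmpi-1.4.3/bin:$PATH
  export LD_LIBRARY_PATH=$HOME/opt/openmpi-1.4.3/lib:$LD_LIBRARY_PATH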


good luck,

Bowen Zhou




Re: [OMPI users] Concerning infiniband support

2011-01-20 Thread Jeff Squyres
On Jan 20, 2011, at 7:51 AM, Bowen Zhou wrote:

> Besides all the advice already given, you may need to use --prefix in the
> configure script to override the default installation directory, since you
> don't have a root account. You might also want to look at MVAPICH as an
> alternative; it is a variant of MPICH2 that supports InfiniBand.

Ouch.  Such blasphemy on the Open MPI list hurts my poor little eyes...

;-)

(yes, that's a joke; all of us MPI people know each other.  Heck, we see each 
other every 2 months at the ongoing MPI-3 Forum meetings! :-) )

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Concerning infiniband support

2011-01-20 Thread Bowen Zhou

On 01/20/2011 07:57 AM,

> Ouch.  Such blasphemy on the Open MPI list hurts my poor little eyes...
>
> ;-)
>
> (yes, that's a joke; all of us MPI people know each other.  Heck, we see each
> other every 2 months at the ongoing MPI-3 Forum meetings! :-) )

Haha, my bad. I've found that people on this list are equally capable of
answering technical questions and making jokes. That's something unique. :-)


[OMPI users] Help with some fundamentals

2011-01-20 Thread Olivier SANNIER
Hello,

I am currently working on a Win32 program that performs some intensive
calculations and is already written to be multithreaded. As a result, it uses
all the available cores on the PC it runs on.
The basic behavior is for the user to open a model and click the "start" button;
the threads are then spawned, and once all is finished, control is given back
to the user.
While this works great, we have found that for larger models the computation
time is limited by the number of cores, because the pool of tasks that could
run in parallel is never empty.
As a result, we are investigating the possibility of using grid computing to
multiply the number of available cores.
This, of course, has technical challenges, and reading documentation on various
websites led me to the Open MPI site and to this list.
I'm not sure this is the appropriate place to ask my questions; if it is not,
please tell me what a more appropriate place might be.

I understand that MPI is a framework that would facilitate the communication 
between the user's computer and the nodes that perform the distributed tasks.
What I have a hard time grasping is the following:

What communication layer is used? How do I choose it?

What is the behavior in case a node dies or becomes unreachable?

What makes any given machine become a node available for tasks?

Is there some sort of load balancing?

Is there a monitoring tool that would give me indications of the status and 
health of the nodes?

How does the "MPI enabled" code gets transferred to the nodes? If I understand 
things correctly, I would have to write a separate command line exe that takes 
care of the tasks and this would be the exe that gets sent over to node.

I'm quite sure all these are trivial questions for those with more experience, 
but I'm having a hard time finding resources that would answer those.

Thanks in advance for your help
Olivier


Re: [OMPI users] Help with some fundamentals

2011-01-20 Thread Nico Mittenzwey

Hi,


> What communication layer is used? How do I choose it?

The fastest available. You can choose the network with parameters given to
mpirun; see
http://www.open-mpi.org/faq/?category=tuning#mca-def
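
For example (a rough sketch; the exact component names depend on what your
Open MPI build supports, and ./my_app is just a placeholder):

  # force plain TCP, restricted to eth0
  mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -np 8 ./my_app

  # prefer InfiniBand, with shared memory for ranks on the same node
  mpirun --mca btl openib,sm,self -np 8 ./my_app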


> What is the behavior in case a node dies or becomes unreachable?

Your run will be aborted. However, there is checkpoint/restart support
for Linux: http://www.open-mpi.org/faq/?category=ft


> What makes any given machine become a node available for tasks?

You define it in a host file, or a batch system tells Open MPI which nodes to use.
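
A host file is just a plain text file listing machines and slots, e.g.
(hostnames and slot counts made up):

  # contents of "myhosts"
  node01 slots=4
  node02 slots=4
  node03 slots=2

  mpirun --hostfile myhosts -np 10 ./my_app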


> Is there some sort of load balancing?

No, you have to do that yourself.


> Is there a monitoring tool that would give me indications of the
> status and health of the nodes?

This has nothing to do with MPI. Nagios or Ganglia can do that.


How does the "MPI enabled" code gets transferred to the nodes? If I 
understand things correctly, I would have to write a separate command 
line exe that takes care of the tasks and this would be the exe that 
gets sent over to node.



Usually you use a shared file system.


> I'm quite sure all these are trivial questions for those with more
> experience, but I'm having a hard time finding resources that would
> answer those.

Read an introduction on programming with MPI and another one on Beowulf 
clusters (batch systems, monitoring, shared file systems). This should 
give you enough information on the topic. If you don't mind spending 
more money on software, you can also take a look at Microsoft's HPC Server.


Nico



Re: [OMPI users] Help with some fundamentals

2011-01-20 Thread Olivier SANNIER
First of all, thank you for your answers.
I have a few more questions, added below.

> What is the behavior in case a node dies or becomes unreachable?
> Your run will be aborted. However, there is checkpoint/restart support for Linux:
> http://www.open-mpi.org/faq/?category=ft

As this is a Win32 program, I'll have to take into account that there is only
the "abort" behavior.

> What makes any given machine become a node available for tasks?
> You define it in a host file, or a batch system tells Open MPI which nodes to use.

So there is no dynamic discovery of nodes available on the network. Unless, of
course, I were to write a tool that does it before the actual run is started.

> Is there a monitoring tool that would give me indications of the status and
> health of the nodes?
> This has nothing to do with MPI. Nagios or Ganglia can do that.

I was thinking more of a tool that would tell me a node is already performing a
task, so that I can avoid having it oversubscribed.

> I'm quite sure all these are trivial questions for those with more experience,
> but I'm having a hard time finding resources that would answer those.
> Read an introduction on programming with MPI and another one on Beowulf
> clusters (batch systems, monitoring, shared file systems). This should give you
> enough information on the topic. If you don't mind spending more money on
> software, you can also take a look at Microsoft's HPC Server.

I've started looking at Beowulf clusters, and that led me to PBS. Am I right
in assuming that PBS (PBSPro or TORQUE) could be used to do the monitoring and
the load balancing I thought of?

Thanks
Olivier


Re: [OMPI users] Help with some fundamentals

2011-01-20 Thread David Zhang
You would probably want some kind of cluster management software like Torque.

On Thu, Jan 20, 2011 at 8:50 AM, Olivier SANNIER <
olivier.sann...@actuaris.com> wrote:

> I've started looking at Beowulf clusters, and that led me to PBS. Am I right
> in assuming that PBS (PBSPro or TORQUE) could be used to do the monitoring
> and the load balancing I thought of?



-- 
David Zhang
University of California, San Diego


[OMPI users] FW: Open MPI on HPUX

2011-01-20 Thread dj M

 Hi,

Does anyone know if Open MPI 1.4.x works on HPUX 11i.v3?

Thanks
  

Re: [OMPI users] Help with some fundamentals

2011-01-20 Thread Nico Mittenzwey

On 01/20/2011 05:50 PM, Olivier SANNIER wrote:

> What is the behavior in case a node dies or becomes unreachable?
> Your run will be aborted. However, there is checkpoint/restart support for Linux:
> http://www.open-mpi.org/faq/?category=ft
>
> As this is a Win32 program, I'll have to take into account that there is only
> the "abort" behavior.

AFAIK, yes.

> So there is no dynamic discovery of nodes available on the network. Unless, of
> course, I were to write a tool that does it before the actual run is started.

This is done by a batch system like PBS (Torque) or SGE.
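
For instance, a minimal Torque/PBS job script (queue settings, node counts
and program name are placeholders) could look like:

  #!/bin/bash
  #PBS -N my_mpi_job
  #PBS -l nodes=4:ppn=8
  #PBS -l walltime=01:00:00
  cd $PBS_O_WORKDIR
  # if Open MPI was built with Torque (tm) support, it reads the node list
  # from the batch system, so no host file is needed here
  mpirun -np 32 ./my_app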


> Is there a monitoring tool that would give me indications of the status and
> health of the nodes?
> This has nothing to do with MPI. Nagios or Ganglia can do that.
>
> I was thinking more of a tool that would tell me a node is already performing a
> task, so that I can avoid having it oversubscribed.

This is also done by a batch system.
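
If you want to see node and job state yourself, the batch system's own tools
cover that (Torque examples; output format varies by version):

  pbsnodes -a    # state and properties of every node
  qstat -n       # running/queued jobs and the nodes they occupy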

> I've started looking at Beowulf clusters, and that led me to PBS. Am I right
> in assuming that PBS (PBSPro or TORQUE) could be used to do the monitoring and
> the load balancing I thought of?

Yes, however the terms "monitoring" and "load balancing" are usually 
used in other contexts.


[OMPI users] Hair depleting issue with Ompi143 and one program

2011-01-20 Thread David Mathog
I have been working on slightly modifying a software package by Sean
Eddy called Hmmer 3.  The hardware acceleration was originally SSE2 but
since most of our compute nodes only have SSE1 and MMX I rewrote a few
small sections to just use those instructions.  (And yes, as far as I
can tell it invokes emms before any floating point operations are run
after each MMX usage.)   On top of that each binary has 3 options for
running the programs: single threaded, threaded, or MPI (using 
Ompi143).  For all other programs in this package everything works
everywhere.  For one called "jackhmmer" this table results (+=runs
correctly, - = problems), where the exact same problem is run in each
test (theoretically exercising exactly the same routines, just under
different threading control):

             SSE2   SSE1
   Single     +      +
   Threaded   +      +
   Ompi143    +      -

The negative result for the SSE/Ompi143 combination happens whether the
worker nodes are Athlon MP (SSE1 only) or Athlon64.  The test machine
for the single and threaded runs is a two CPU Opteron 280 (4 cores
total).  Ompi143 is 32 bit everywhere (local copies though).  There have
been no modifications whatsoever made to the main jackhmmer.c file,
which is where the various run methods are implemented.

Now if there was some intrinsic problem with my SSE1 code it should
presumably manifest in both the Single and Threaded versions as well
(the thread control is different, but they all feed through the same
underlying functions), or in one of the other programs, which isn't
seen.  Running under valgrind using Single or Threaded produces no
warnings.  Using mpirun with valgrind on the SSE2 version produces 3: two
related to OMPI itself which are seen in every OMPI program run in
valgrind, and one caused by an MPI send operation where the buffer
contains some uninitialized data (this is nothing toxic, just bytes in
fixed-length fields which were never set because a shorter string
is stored there). 

==19802== Syscall param writev(vector[...]) points to uninitialised byte(s)
==19802==at 0x4C77AC1: writev (in /lib/libc-2.10.1.so)
==19802==by 0x8A069B5: mca_btl_tcp_frag_send (in
/opt/ompi143.X32/lib/openmpi/mca_btl_tcp.so)
==19802==by 0x8A0626E: mca_btl_tcp_endpoint_send (in
/opt/ompi143.X32/lib/openmpi/mca_btl_tcp.so)
==19802==by 0x8A01ADC: mca_btl_tcp_send (in
/opt/ompi143.X32/lib/openmpi/mca_btl_tcp.so)
==19802==by 0x7FA24A9: mca_pml_ob1_send_request_start_prepare (in
/opt/ompi143.X32/lib/openmpi/mca_pml_ob1.so)
==19802==by 0x7F98443: mca_pml_ob1_send (in
/opt/ompi143.X32/lib/openmpi/mca_pml_ob1.so)
==19802==by 0x4A8530F: PMPI_Send (in
/opt/ompi143.X32/lib/libmpi.so.0.0.2)
==19802==by 0x808D5F2: p7_oprofile_MPISend (mpi.c:101)
==19802==by 0x805762E: main (jackhmmer.c:1149)
==19802==  Address 0x770bc9d is 15,101 bytes inside a block of size
15,389 alloc'd
==19802==at 0x49E3A12: realloc (vg_replace_malloc.c:476)
==19802==by 0x808D4E3: p7_oprofile_MPISend (mpi.c:88)
==19802==by 0x805762E: main (jackhmmer.c:1149)

Do that for the SSE1 version and the same 3 errors are seen, plus many
more like the following:

==9416== Conditional jump or move depends on uninitialised value(s)
==9416==at 0x807FE3E: forward_engine (fwdback.c:420)
==9416==by 0x8080051: p7_ForwardParser (fwdback.c:143)
==9416==by 0x806C3CC: p7_Pipeline (p7_pipeline.c:590)
==9416==by 0x80564F0: main (jackhmmer.c:1426)

Unfortunately this makes absolutely no sense.  Line 420 is

   if (xE > 1.0e4)

which tells us that xE wasn't set (fine). So, assaying for uninitialized
values with statements like:

  fprintf(stderr,"DEBUG xEv %lld\n",xEv);fflush(stderr);

(each of which generates its own uninitialized-value message), the first
uninitialized variable appears very early in the code, right after this
_mm_setzero_ps:

  register __m128 xEv;
  //other stuff that does not touch xEv
  xEv   = _mm_setzero_ps();

Now this is hair pulling for many reasons.  The first is that nothing of
substance was changed in this file (just some #defines that
resolve to the same values as they had originally).  The second is that
this is an SSE1 operation even in the original unmodified code.  The
third is that it just isn't possible for xEv to be uninitialized after
that statement - yet it is.  (Valgrind with --smc-check=all turns up
nothing more than leaving out that parameter.)   Here is the relevant
section in xmmintrin.h:

/* Create a vector of zeros.  */
extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
__artificial__))
_mm_setzero_ps (void)
{
  return __extension__ (__m128){ 0.0f, 0.0f, 0.0f, 0.0f };
}

Of course all of this nonsense is happening on a worker node, which
isn't making getting to the root of the problem any easier.

The module where these uninitialized variables are seen was compiled like:

mpicc -std=gnu99 -O1 -g -m32 -pthread -msse -mno-sse2  -DHAVE_CONFIG_H 
-I../../easel -I../../easel -I. -I.. -I. -I../../src -o 

Re: [OMPI users] Hair depleting issue with Ompi143 and one program

2011-01-20 Thread David Mathog
> (And yes, as far as I
> can tell it invokes emms before any floating point operations are run
> after each MMX usage.)

Is there anything in Ompi which is likely to cause one of the MMX
routines to be interrupted in such a way that the MMX state is not
saved?  The bugs that arise when emms is not invoked after an MMX run
can be very strange.  Grasping at straws here though, presumably both
the OS and MPI (if it does this at all) preserve the state of all
registers when swapping processes around on a machine.

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


Re: [OMPI users] Hair depleting issue with Ompi143 and one program

2011-01-20 Thread Dave Goodell
I can't speak to what OMPI might be doing to your program, but I have a few 
suggestions for looking into the Valgrind issues.

Valgrind's "--track-origins=yes" option is usually helpful for figuring out 
where the uninitialized values came from.  However, if I understand you 
correctly and if you are correct in your assumption that _mm_setzero_ps is not 
actually zeroing your xEv variable for some reason, then this option will 
unhelpfully tell you that it was caused by a stack allocation at the entrance 
to the function where the variable is declared.  But it's worth turning on 
because it's easy to do and it might show you something obvious that you are 
missing.

The next thing you can do is disable optimization when building your code in 
case GCC is taking a shortcut that is either incorrect or just doesn't play 
nicely with Valgrind.  Valgrind might run pretty slow though, because -O0 code 
can be really verbose and slow to check.

After that, if you really want to dig in, you can try reading the assembly code 
that is generated for that _mm_setzero_ps line.  The easiest way is to pass 
"-save-temps" to gcc and it will keep a copy of "sourcefile.s" corresponding to 
"sourcefile.c".  Sometimes "-fverbose-asm" helps, sometimes it makes things 
harder to follow.
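
Concretely, something along these lines (adapted from the compile line in your
mail, so treat the flags and paths as a sketch rather than a recipe; <args>
stands for your usual jackhmmer arguments):

  # rebuild the suspect file without optimization and keep the generated asm
  mpicc -std=gnu99 -O0 -g -m32 -pthread -msse -mno-sse2 -save-temps \
        -fverbose-asm -DHAVE_CONFIG_H -I../../easel -I. -I.. -I../../src \
        -c fwdback.c

  # then run the MPI case under valgrind with origin tracking
  mpirun -np 2 valgrind --track-origins=yes --smc-check=all ./jackhmmer <args>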

And the last semi-desperate step is to dig into what Valgrind thinks is going 
on.  You'll want to read up on how memcheck really works [1] before doing this. 
 Then read up on client requests [2,3].  You can then use the 
VALGRIND_GET_VBITS client request on your xEv variable in order to see which 
parts of the variable Valgrind thinks are undefined.  If the vbits don't match 
with what you expect, there's a chance that you might have found a bug in 
Valgrind itself.  It doesn't happen often, but the SSE code can be complicated 
and isn't exercised as often as the non-vector portions of Valgrind.

Good luck,
-Dave

[1] http://valgrind.org/docs/manual/mc-manual.html#mc-manual.machine
[2] 
http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.clientreq
[3] http://valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs
