[OMPI users] Concerning infiniband support
Dear all, I want to use InfiniBand. I am at a university in the US, and my university's high performance computing center doesn't provide a GCC-compiled Open MPI build with InfiniBand support, so I want to compile it myself. But I have a few questions:
1. Is it OK to compile Open MPI myself with InfiniBand support if I don't have root privileges? Will it work?
2. If so, how can I find out where the InfiniBand installation directory is? Is there a shell command that shows it?
3. Which configuration is correct? For example, "--with-openib=/usr/include/infiniband" as described in the Open MPI FAQ, or do I need both "--with-openib=/usr/include/infiniband --with-openib-libdir=/usr/lib64"?
Thanks so much. Daniel Wei --- University of Notre Dame
Re: [OMPI users] Concerning infiniband support
On 20 January 2011 06:59, Zhigang Wei wrote:
> Dear all,
>
> I want to use infiniband, I am from a University in the US, my University’s high performance center don’t have Gcc compiled openmpi that support infiniband, so I want to compile myself.
That is a surprise - you must have some Infiniband hardware available. Is this hardware not managed by your high performance centre? My advice - bring beer and salty snacks to your high performance systems administrators. Ask them to help.
Re: [OMPI users] Concerning infiniband support
Haha! +1 on what John says. But otherwise, you shouldn't need root to install OMPI with ib support. If the ib drivers are installed correctly, you shouldn't need the --with-openib configure switches at all. Sent from my PDA. No type good.
On Jan 20, 2011, at 4:48 AM, "John Hearns" wrote:
> On 20 January 2011 06:59, Zhigang Wei wrote:
>> Dear all,
>>
>> I want to use infiniband, I am from a University in the US, my University’s high performance center don’t have Gcc compiled openmpi that support infiniband, so I want to compile myself.
>
> That is a surprise - you must have some Infiniband hardware available. Is this hardware not managed by your high performance centre?
>
> My advice - bring beer and salty snacks to your high performance systems administrators. Ask them to help.
Re: [OMPI users] Concerning infiniband support
Hi, Besides all the advice that has already been given, you may need to use --prefix in the configure script to override the default installation directory, since you don't have a root account. Also, you might want to look at MVAPICH as an alternative; it is a variant of MPICH2 that supports InfiniBand. Good luck, Bowen Zhou
On 01/20/2011 01:59 AM, Zhigang Wei wrote:
> Dear all, I want to use infiniband, I am from a University in the US, my University’s high performance center don’t have Gcc compiled openmpi that support infiniband, so I want to compile myself. But I have a few questions, 1. Is it ok to compile openmpi myself with infiniband support, if I don’t have the root privilege? Is it going to work? 2. If it is ok, how can I find out where is the infiniband installation directory, any shell command to show it? 3. Which configuration is correct? For example, using “--with-openib=/usr/include/infiniband” as told in openmpi FAQ, or I need to use "--with-openib=/usr/include/infiniband --with-openib-libdir=/usr/lib64" both? Thanks so much. Daniel Wei --- University of Notre Dame
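Putting the advice in this thread together, a non-root build might look roughly like the sketch below. The version number, install prefix, and OFED paths are only examples and will differ per system; if the verbs drivers are in the default locations, the two --with-openib flags can usually be omitted entirely, and when they are used, --with-openib normally expects the installation prefix (the directory containing include/infiniband), not the include directory itself.

  # Check whether the verbs stack is visible (paths/utilities may vary by install):
  ls /usr/include/infiniband /usr/lib64/libibverbs*
  ibv_devinfo        # from libibverbs-utils, lists the HCAs the drivers can see

  # Build and install into a directory you own -- no root needed:
  ./configure --prefix=$HOME/sw/openmpi-1.4.3 \
              --with-openib=/usr --with-openib-libdir=/usr/lib64
  make -j4 && make install

  # Use your private build:
  export PATH=$HOME/sw/openmpi-1.4.3/bin:$PATH
  export LD_LIBRARY_PATH=$HOME/sw/openmpi-1.4.3/lib:$LD_LIBRARY_PATH
  ompi_info | grep openib    # the openib BTL should be listed if IB support was built

Checking the ompi_info output is the quickest way to confirm that InfiniBand support actually made it into the build.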
Re: [OMPI users] Concerning infiniband support
On Jan 20, 2011, at 7:51 AM, Bowen Zhou wrote: > Besides all these advices have been given, you may need to use --prefix in > configure script to override default installation directory since you don't > have root account. Also you might want to look at MVAPICH as an alternative, > an variant of MPICH2 that supports infiniband. Ouch. Such blasphemy on the Open MPI list hurts my poor little eyes... ;-) (yes, that's a joke; all of us MPI people know each other. Heck, we see each other every 2 months at the ongoing MPI-3 Forum meetings! :-) ) -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Concerning infiniband support
On 01/20/2011 07:57 AM, Jeff Squyres wrote:
> On Jan 20, 2011, at 7:51 AM, Bowen Zhou wrote:
>> Besides all these advices have been given, you may need to use --prefix in configure script to override default installation directory since you don't have root account. Also you might want to look at MVAPICH as an alternative, an variant of MPICH2 that supports infiniband.
> Ouch. Such blasphemy on the Open MPI list hurts my poor little eyes... ;-) (yes, that's a joke; all of us MPI people know each other. Heck, we see each other every 2 months at the ongoing MPI-3 Forum meetings! :-) )
Haha, my bad. I find that people on this list are equally capable of answering technical questions and making jokes. That's something unique. :-)
[OMPI users] Help with some fundamentals
Hello, I am currently working on a Win32 program that performs some intensive calculations and is already written to be multithreaded. As a result, it uses all the available cores on the PC it runs on. The basic behavior is for the user to open a model and click the "start" button; the threads are then spawned, and once everything is finished, control is given back to the user. While this works great, we have found that for larger models the computation time is limited by the number of cores, because the pool of tasks that could run in parallel is never empty. As a result, we are investigating the possibility of using grid computing to somehow multiply the number of available cores. This, of course, has technical challenges, and reading documentation on various websites led me to the Open MPI one and to this list. I'm not sure it's the appropriate place to ask my questions, but should it not be the case, please tell me what an appropriate place might be. I understand that MPI is a framework that facilitates the communication between the user's computer and the nodes that perform the distributed tasks. What I have a hard time grasping is this:
What communication layer is used? How do I choose it?
What is the behavior in case a node dies or becomes unreachable?
What makes any given machine become a node available for tasks?
Is there some sort of load balancing?
Is there a monitoring tool that would give me indications of the status and health of the nodes?
How does the "MPI enabled" code get transferred to the nodes? If I understand things correctly, I would have to write a separate command-line exe that takes care of the tasks, and this would be the exe that gets sent over to the nodes.
I'm quite sure all these are trivial questions for those with more experience, but I'm having a hard time finding resources that would answer them. Thanks in advance for your help. Olivier
Re: [OMPI users] Help with some fundamentals
Hi,
> What communication layer is used? How do I choose it?
The fastest available. You can choose the network by parameters given to mpirun; see http://www.open-mpi.org/faq/?category=tuning#mca-def
> What is the behavior in case a node dies or becomes unreachable?
Your run will be aborted. However, there is checkpoint/restart support for Linux: http://www.open-mpi.org/faq/?category=ft
> What makes any given machine become a node available for tasks?
You define it in a host file, or a batch system tells Open MPI (see the sketch after this message).
> Is there some sort of load balancing?
No, you have to do that yourself.
> Is there a monitoring tool that would give me indications of the status and health of the nodes?
This has nothing to do with MPI. Nagios or Ganglia can do that.
> How does the "MPI enabled" code get transferred to the nodes? If I understand things correctly, I would have to write a separate command line exe that takes care of the tasks and this would be the exe that gets sent over to the node.
Usually you use a shared file system.
> I'm quite sure all these are trivial questions for those with more experience, but I'm having a hard time finding resources that would answer those.
Read an introduction on programming with MPI and another one on Beowulf clusters (batch systems, monitoring, shared file systems). This should give you enough information on the topic. If you don't mind spending more money on software, you can also take a look at Microsoft's HPC Server. Nico
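To make the host file answer concrete, here is a minimal sketch; the hostnames, slot counts, and program name are invented for illustration only.

  # hostfile "myhosts": one machine per line, with the number of cores to use on each
  node01 slots=4
  node02 slots=4
  node03 slots=8

  # launch 16 processes spread across those machines
  mpirun --hostfile myhosts -np 16 ./my_solver input.dat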
Re: [OMPI users] Help with some fundamentals
First of all, thank you for the answers. I have a few more questions, added below.
> What is the behavior in case a node dies or becomes unreachable?
> Your run will be aborted. However there is checkpoint/restart support for Linux http://www.open-mpi.org/faq/?category=ft
As this is a Win32 program, I'll have to take into account that there is only the « abort » behavior.
> What makes any given machine become a node available for tasks?
> You define it in a host file or a batch system tells it OpenMPI.
So there is no dynamic discovery of nodes available on the network. Unless, of course, I were to write a tool that would do it before the actual run is started.
> Is there a monitoring tool that would give me indications of the status and health of the nodes?
> This has nothing to do with MPI. Nagios or Ganglia can do that.
I was more thinking of a tool that would tell me a node is already performing a task, so that I can avoid having it oversubscribed.
> I'm quite sure all these are trivial questions for those with more experience, but I'm having a hard time finding resources that would answer those.
> Read an introduction on programming with MPI and another one on Beowulf clusters (batch systems, monitoring, shared file systems). This should give you enough information on the topic. If you don't mind spending more money on software you can also take a look at Microsofts HPC Server.
I've started looking at Beowulf clusters, and that led me to PBS. Am I right in assuming that PBS (PBSPro or TORQUE) could be used to do the monitoring and the load balancing I thought of? Thanks Olivier
Re: [OMPI users] Help with some fundamentals
You would probably want some kind of cluster management software like Torque.
On Thu, Jan 20, 2011 at 8:50 AM, Olivier SANNIER <olivier.sann...@actuaris.com> wrote:
> First of all, thank you for answers. I have a bit more questions, added below.
>
> What is the behavior in case a node dies or becomes unreachable?
> Your run will be aborted. However there is checkpoint/restart support for Linux http://www.open-mpi.org/faq/?category=ft
>
> As this is a Win32 program, I’ll have to take into account that there is only the « abort » behavior.
>
> What makes any given machine become a node available for tasks?
> You define it in a host file or a batch system tells it OpenMPI.
>
> So there is no dynamic discovery of nodes available on the network. Unless, of course, if I was to write a tool that would do it before the actual run is started.
>
> Is there a monitoring tool that would give me indications of the status and health of the nodes?
> This has nothing to do with MPI. Nagios or Ganglia can do that.
>
> I was more thinking of a tool that would tell me a node is already performing a task, so that I can avoid having it oversubscribed.
>
> I’m quite sure all these are trivial questions for those with more experience, but I’m having a hard time finding resources that would answer those.
> Read an introduction on programming with MPI and another one on Beowulf clusters (batch systems, monitoring, shared file systems). This should give you enough information on the topic. If you don't mind spending more money on software you can also take a look at Microsofts HPC Server.
>
> I’ve started looking at beowulf clusters, and that lead me to PBS. Am I right in assuming that PBS (PBSPro or TORQUE) could be used to do the monitoring and the load balancing I thought of?
>
> Thanks
> Olivier
-- David Zhang University of California, San Diego
[OMPI users] FW: Open MPI on HPUX
Hi, Does anyone know if Open MPI 1.4.x works on HP-UX 11i v3? Thanks
Re: [OMPI users] Help with some fundamentals
On 01/20/2011 05:50 PM, Olivier SANNIER wrote:
> What is the behavior in case a node dies or becomes unreachable?
> Your run will be aborted. However there is checkpoint/restart support for Linux http://www.open-mpi.org/faq/?category=ft
> As this is a Win32 program, I'll have to take into account that there is only the « abort » behavior.
AFAIK yes.
> So there is no dynamic discovery of nodes available on the network. Unless, of course, if I was to write a tool that would do it before the actual run is started.
This is done by a batch system like PBS (Torque) or SGE (see the sketch after this message).
> Is there a monitoring tool that would give me indications of the status and health of the nodes?
> This has nothing to do with MPI. Nagios or Ganglia can do that.
> I was more thinking of a tool that would tell me a node is already performing a task, so that I can avoid having it oversubscribed.
This is also done by a batch system.
> I've started looking at beowulf clusters, and that lead me to PBS. Am I right in assuming that PBS (PBSPro or TORQUE) could be used to do the monitoring and the load balancing I thought of?
Yes, however the terms "monitoring" and "load balancing" are usually used in other contexts.
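As a concrete, if simplified, illustration of the batch-system answer, a Torque/PBS job script might look like the sketch below. The job name, resource line, and solver name are made up, and the node list is picked up automatically only if Open MPI was built with Torque (tm) support; otherwise point mpirun at the node file Torque provides.

  #!/bin/sh
  #PBS -N my_solver_run
  #PBS -l nodes=4:ppn=4          # 4 machines, 4 cores each
  #PBS -l walltime=02:00:00
  cd $PBS_O_WORKDIR
  # With tm support, mpirun learns the allocated nodes from Torque itself:
  mpirun -np 16 ./my_solver input.dat
  # Without tm support, fall back to the node file Torque provides:
  # mpirun --hostfile $PBS_NODEFILE -np 16 ./my_solver input.dat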
[OMPI users] Hair depleting issue with Ompi143 and one program
I have been working on slightly modifying a software package by Sean Eddy called Hmmer 3. The hardware acceleration was originally SSE2, but since most of our compute nodes only have SSE1 and MMX, I rewrote a few small sections to use just those instructions. (And yes, as far as I can tell it invokes emms before any floating point operations are run after each MMX usage.) On top of that, each binary has 3 options for running the programs: single threaded, threaded, or MPI (using Ompi143). For all other programs in this package everything works everywhere. For one called "jackhmmer" this table results (+ = runs correctly, - = problems), where the exact same problem is run in each test (theoretically exercising exactly the same routines, just under different threading control):

            SSE2   SSE1
  Single     +      +
  Threaded   +      +
  Ompi143    +      -

The negative result for the SSE1/Ompi143 combination happens whether the worker nodes are Athlon MP (SSE1 only) or Athlon64. The test machine for the single and threaded runs is a two CPU Opteron 280 (4 cores total). Ompi143 is 32 bit everywhere (local copies though). There have been no modifications whatsoever made to the main jackhmmer.c file, which is where the various run methods are implemented. Now if there was some intrinsic problem with my SSE1 code it should presumably manifest in both the Single and Threaded versions as well (the thread control is different, but they all feed through the same underlying functions), or in one of the other programs, which isn't seen. Running under valgrind using Single or Threaded produces no warnings. Using mpirun with valgrind on the SSE2 version produces 3 warnings: two related to OMPI itself which are seen in every OMPI program run in valgrind, and one caused by an MPI_Send operation where the buffer contains some uninitialized data (this is nothing toxic, just bytes in fixed length fields which were never set because a shorter string is stored there).

  ==19802== Syscall param writev(vector[...]) points to uninitialised byte(s)
  ==19802==    at 0x4C77AC1: writev (in /lib/libc-2.10.1.so)
  ==19802==    by 0x8A069B5: mca_btl_tcp_frag_send (in /opt/ompi143.X32/lib/openmpi/mca_btl_tcp.so)
  ==19802==    by 0x8A0626E: mca_btl_tcp_endpoint_send (in /opt/ompi143.X32/lib/openmpi/mca_btl_tcp.so)
  ==19802==    by 0x8A01ADC: mca_btl_tcp_send (in /opt/ompi143.X32/lib/openmpi/mca_btl_tcp.so)
  ==19802==    by 0x7FA24A9: mca_pml_ob1_send_request_start_prepare (in /opt/ompi143.X32/lib/openmpi/mca_pml_ob1.so)
  ==19802==    by 0x7F98443: mca_pml_ob1_send (in /opt/ompi143.X32/lib/openmpi/mca_pml_ob1.so)
  ==19802==    by 0x4A8530F: PMPI_Send (in /opt/ompi143.X32/lib/libmpi.so.0.0.2)
  ==19802==    by 0x808D5F2: p7_oprofile_MPISend (mpi.c:101)
  ==19802==    by 0x805762E: main (jackhmmer.c:1149)
  ==19802==  Address 0x770bc9d is 15,101 bytes inside a block of size 15,389 alloc'd
  ==19802==    at 0x49E3A12: realloc (vg_replace_malloc.c:476)
  ==19802==    by 0x808D4E3: p7_oprofile_MPISend (mpi.c:88)
  ==19802==    by 0x805762E: main (jackhmmer.c:1149)

Do that for the SSE1 version and the same 3 errors are seen, plus many more like the following:

  ==9416== Conditional jump or move depends on uninitialised value(s)
  ==9416==    at 0x807FE3E: forward_engine (fwdback.c:420)
  ==9416==    by 0x8080051: p7_ForwardParser (fwdback.c:143)
  ==9416==    by 0x806C3CC: p7_Pipeline (p7_pipeline.c:590)
  ==9416==    by 0x80564F0: main (jackhmmer.c:1426)

Unfortunately this makes absolutely no sense.
Line 420 is

  if (xE > 1.0e4)

which tells us that xE wasn't set (fine), so assaying what is uninitialized with statements like

  fprintf(stderr,"DEBUG xEv %lld\n",xEv); fflush(stderr);

(each of which generates its own uninitialized value message), the first uninitialized variable appears very early in the code, right after this _mm_setzero_ps:

  register __m128 xEv;
  // other stuff that does not touch xEv
  xEv = _mm_setzero_ps();

Now this is hair pulling for many reasons. The first is that nothing of substance was changed in this file (just some #defines that resolve to the same values as they had originally). The second is that this is an SSE1 operation even in the original unmodified code. The third is that it just isn't possible for xEv to be uninitialized after that statement - yet it is. (Valgrind with --smc-check=all turns up nothing more than leaving out that parameter.) Here is the relevant section in xmmintrin.h:

  /* Create a vector of zeros. */
  extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
  _mm_setzero_ps (void)
  {
    return __extension__ (__m128){ 0.0f, 0.0f, 0.0f, 0.0f };
  }

Of course all of this nonsense is happening on a worker node, which isn't making getting to the root of the problem any easier. The module where these uninitialized variables are seen was compiled like:

  mpicc -std=gnu99 -O1 -g -m32 -pthread -msse -mno-sse2 -DHAVE_CONFIG_H -I../../easel -I../../easel -I. -I.. -I. -I../../src -o
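One way to narrow this down would be a tiny standalone reproducer, built with the same flags and run under the same valgrind invocation. This is only a sketch (not code from hmmer), but if it also reports the _mm_setzero_ps result as uninitialised, the problem lies in the compiler/valgrind combination rather than in jackhmmer:

  /* setzero_check.c -- does valgrind think _mm_setzero_ps() defines all 128 bits?
   * Build roughly as the real module is built, e.g.:
   *   gcc -std=gnu99 -O1 -g -m32 -msse -mno-sse2 setzero_check.c -o setzero_check
   */
  #include <stdio.h>
  #include <xmmintrin.h>

  int main(void)
  {
      register __m128 xEv;
      float out[4];

      xEv = _mm_setzero_ps();      /* should leave nothing undefined */
      _mm_storeu_ps(out, xEv);

      /* A branch on the values would trigger memcheck if any bits were undefined. */
      if (out[0] > 1.0e4f)
          fprintf(stderr, "impossible\n");
      printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
      return 0;
  }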
Re: [OMPI users] Hair depleting issue with Ompi143 and one program
> (And yes, as far as I can tell it invokes emms before any floating point operations are run after each MMX usage.)
Is there anything in Ompi which is likely to cause one of the MMX routines to be interrupted in such a way that the MMX state is not saved? The bugs that arise when emms is not invoked after an MMX run can be very strange. Grasping at straws here though; presumably both the OS and MPI (if it does this at all) preserve the state of all registers when swapping processes around on a machine. Thanks, David Mathog mat...@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech
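For reference, the usual shape of the MMX/x87 handoff David describes is below. This is a generic sketch rather than hmmer code, just to show where _mm_empty() (emms) has to sit relative to the floating-point work:

  #include <mmintrin.h>   /* MMX intrinsics; compile with -mmmx */
  #include <string.h>

  /* Saturated 8-bit adds with MMX, then ordinary floating point afterwards. */
  double add_then_average(__m64 a, __m64 b)
  {
      signed char out[8];
      __m64 sum8 = _mm_adds_pi8(a, b);   /* MMX work */
      memcpy(out, &sum8, sizeof out);    /* spill the MMX result to memory */

      _mm_empty();   /* emms: clear the MMX state before any x87 floating point */

      double sum = 0.0;                  /* x87 work is only safe after _mm_empty() */
      for (int i = 0; i < 8; i++)
          sum += out[i];
      return sum / 8.0;
  }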
Re: [OMPI users] Hair depleting issue with Ompi143 and one program
I can't speak to what OMPI might be doing to your program, but I have a few suggestions for looking into the Valgrind issues. Valgrind's "--track-origins=yes" option is usually helpful for figuring out where the uninitialized values came from. However, if I understand you correctly and if you are correct in your assumption that _mm_setzero_ps is not actually zeroing your xEv variable for some reason, then this option will unhelpfully tell you that it was caused by a stack allocation at the entrance to the function where the variable is declared. But it's worth turning on because it's easy to do and it might show you something obvious that you are missing. The next thing you can do is disable optimization when building your code in case GCC is taking a shortcut that is either incorrect or just doesn't play nicely with Valgrind. Valgrind might run pretty slow though, because -O0 code can be really verbose and slow to check. After that, if you really want to dig in, you can try reading the assembly code that is generated for that _mm_setzero_ps line. The easiest way is to pass "-save-temps" to gcc and it will keep a copy of "sourcefile.s" corresponding to "sourcefile.c". Sometimes "-fverbose-asm" helps, sometimes it makes things harder to follow. And the last semi-desperate step is to dig into what Valgrind thinks is going on. You'll want to read up on how memcheck really works [1] before doing this. Then read up on client requests [2,3]. You can then use the VALGRIND_GET_VBITS client request on your xEv variable in order to see which parts of the variable Valgrind thinks are undefined. If the vbits don't match with what you expect, there's a chance that you might have found a bug in Valgrind itself. It doesn't happen often, but the SSE code can be complicated and isn't exercised as often as the non-vector portions of Valgrind. Good luck, -Dave [1] http://valgrind.org/docs/manual/mc-manual.html#mc-manual.machine [2] http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.clientreq [3] http://valgrind.org/docs/manual/mc-manual.html#mc-manual.clientreqs On Jan 20, 2011, at 5:07 PM CST, David Mathog wrote: > I have been working on slightly modifying a software package by Sean > Eddy called Hmmer 3. The hardware acceleration was originally SSE2 but > since most of our compute nodes only have SSE1 and MMX I rewrote a few > small sections to just use those instructions. (And yes, as far as I > can tell it invokes emms before any floating point operations are run > after each MMX usage.) On top of that each binary has 3 options for > running the programs: single threaded, threaded, or MPI (using > Ompi143). For all other programs in this package everything works > everywhere. For one called "jackhmmer" this table results (+=runs > correctly, - = problems), where the exact same problem is run in each > test (theoretically exercising exactly the same routines, just under > different threading control): > > SSE2 SSE1 > Single + + > Threaded+ + > Ompi143 + - > > The negative result for the SSE/Ompi143 combination happens whether the > worker nodes are Athlon MP (SSE1 only) or Athlon64. The test machine > for the single and threaded runs is a two CPU Opteron 280 (4 cores > total). Ompi143 is 32 bit everywhere (local copies though). There have > been no modifications whatsoever made to the main jackhmmer.c file, > which is where the various run methods are implemented. 
> > Now if there was some intrinsic problem with my SSE1 code it should > presumably manifest in both the Single and Threaded versions as well > (the thread control is different, but they all feed through the same > underlying functions), or in one of the other programs, which isn't > seen. Running under valgrind using Single or Threaded produces no > warnings. Using mpirun with valgrind on the SSE2 produces 3: two > related to OMPI itself which are seen in every OMPI program run in > valgrind, and one caused by an MPIsend operation where the buffer > contains some uninitialized data (this is nothing toxic, just bytes in > fixed length fields which which were never set because a shorter string > is stored there). > > ==19802== Syscall param writev(vector[...]) points to uninitialised byte(s) > ==19802==at 0x4C77AC1: writev (in /lib/libc-2.10.1.so) > ==19802==by 0x8A069B5: mca_btl_tcp_frag_send (in > /opt/ompi143.X32/lib/openmpi/mca_btl_tcp.so) > ==19802==by 0x8A0626E: mca_btl_tcp_endpoint_send (in > /opt/ompi143.X32/lib/openmpi/mca_btl_tcp.so) > ==19802==by 0x8A01ADC: mca_btl_tcp_send (in > /opt/ompi143.X32/lib/openmpi/mca_btl_tcp.so) > ==19802==by 0x7FA24A9: mca_pml_ob1_send_request_start_prepare (in > /opt/ompi143.X32/lib/openmpi/mca_pml_ob1.so) > ==19802==by 0x7F98443: mca_pml_ob1_send (in > /opt/ompi143.X32/lib/openmpi/mca_pml_ob1.so) > ==19802==by 0x4A8530F: PMPI_Send (in > /opt/ompi143.X32/lib/libmpi.so.0.0.2) > =
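A rough sketch of the VALGRIND_GET_VBITS idea from Dave's reply (the helper name is invented; it assumes valgrind's development headers are installed, and the macro simply reports "not running under valgrind" otherwise):

  #include <stdio.h>
  #include <xmmintrin.h>
  #include <valgrind/memcheck.h>

  /* Print which bytes of xEv memcheck currently considers undefined
   * (a set bit in the V-bits means "undefined"). */
  void dump_xEv_vbits(__m128 xEv)
  {
      unsigned char vbits[sizeof xEv];
      int rc = VALGRIND_GET_VBITS(&xEv, vbits, sizeof xEv);
      if (rc == 1) {                       /* 1 == success, only under valgrind */
          fprintf(stderr, "xEv vbits:");
          for (size_t i = 0; i < sizeof xEv; i++)
              fprintf(stderr, " %02x", vbits[i]);
          fprintf(stderr, "\n");
      }
  }

Dropping a call like this right after the xEv = _mm_setzero_ps(); line would show whether memcheck really believes the value is undefined at that point, without the %lld-format ambiguity of pushing an __m128 through fprintf.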