[OMPI users] Correction to FAQ: How do I build BLACS with Open MPI?
In the FAQ <http://www.open-mpi.org/faq/?category=mpi-apps>, section labeled "12. How do I build BLACS with Open MPI?", it says:

INTFACE = -Df77IsF2C

That INTFACE value is only for g77, g95, and related compilers. For the Intel Fortran compiler it is:

INTFACE = -DAdd_

I have successfully built the combination of OpenMPI 1.2.3, ATLAS, BLACS, ScaLAPACK, and MUMPS using the Intel Fortran compiler on two different Debian Linux systems (3.0r3 on AMD Opterons and 4.0r0 on Intel Woodcrest/Mac Pro).

Michael
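For reference, the two defines this thread converges on could be collected in a Bmake.inc fragment like the following. This is only a sketch: the INTFACE and TRANSCOMM values come from the discussion, while the comments are my own summary of what they control.

```make
# BLACS Bmake.inc fragment for an Open MPI + Intel Fortran build (sketch).
# INTFACE selects the Fortran-to-C name-mangling convention:
#   -Df77IsF2C  for g77/g95-style names
#   -DAdd_      for compilers that append a single underscore (ifort)
INTFACE   = -DAdd_
# TRANSCOMM tells BLACS how to convert Fortran MPI handles to C handles;
# for Open MPI it should be set to use the MPI-2 conversion functions:
TRANSCOMM = -DUseMpi2
```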
Re: [OMPI users] Correction to FAQ: How do I build BLACS with Open MPI?
On Jul 12, 2007, at 4:42 PM, George Bosilca wrote:

> The INTFACE is for the namespace interface, in order to allow the Fortran code to call a C function. So it should be dependent on the compiler. Btw, for some reason I was quite sure we generate all 4 versions of the Fortran interface ... If this is true it doesn't really matter what you have in the INTFACE.

It would, except that this flag affects not only the names BLACS uses to link to OpenMPI but also which interfaces it generates (based on my experience), which then affects, for example, what happens when you build ScaLAPACK. I believe that is what I was seeing when building those three with the Intel compiler and g95; the latter was harder than expected.

> The option Jeff is referring to is the TRANSCOMM define. It allows BLACS to know how to convert between Fortran and C handles. For Open MPI this should be set to -DUseMpi2.

Fortunately that is documented in the web FAQ, though not in the BLACS documentation.

Michael

> Thanks, george.

On Jul 12, 2007, at 2:41 PM, Jeff Squyres wrote:

> On Jul 12, 2007, at 2:28 PM, Michael wrote:
>
>> In the FAQ <http://www.open-mpi.org/faq/?category=mpi-apps>, section labeled "12. How do I build BLACS with Open MPI?": INTFACE = -Df77IsF2C. That INTFACE value is only for g77, g95, and related compilers. For the Intel Fortran compiler it is: -DAdd_
>
> Really? I always thought that this flag discussed how to convert F77 MPI handles to C handles (some MPI implementations use integers for MPI handles in C, so there's no conversion necessary, but LAM and Open MPI use pointers, so using the MPI_*_f2c() functions is necessary). Hence, it's not specific to a given Fortran compiler. But I could be completely misunderstanding this value... UTK: can you confirm/deny both of these values? (I do not claim to be a BLACS expert...)
>> I have successfully built the combination of OpenMPI 1.2.3, ATLAS, BLACS, ScaLAPACK, and MUMPS using the Intel Fortran compiler on two different Debian Linux systems (3.0r3 on AMD Opterons and 4.0r0 on Intel Woodcrest/Mac Pro). Michael

--
Jeff Squyres
Cisco Systems

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] mpi_file_set_view
On Sep 17, 2007, at 7:55 PM, Jeff Squyres wrote:

> Are you using the MPI F90 bindings perchance? If so, the issue could be that the prototype for MPI_FILE_SET_VIEW is:
>
>   interface MPI_File_set_view
>     subroutine MPI_File_set_view(fh, disp, etype, filetype, datarep, &
>         info, ierr)
>       include 'mpif-config.h'
>       integer, intent(in) :: fh
>       integer(kind=MPI_OFFSET_KIND), intent(in) :: disp
>       integer, intent(in) :: etype
>       integer, intent(in) :: filetype
>       character(len=*), intent(in) :: datarep
>       integer, intent(in) :: info
>       integer, intent(out) :: ierr
>     end subroutine MPI_File_set_view
>   end interface
>
> and you might need a variable to be explicitly typed "integer(kind=MPI_OFFSET_KIND)" ...

On Sep 17, 2007, at 12:40 PM, Andrus, Mr. Brian (Contractor) wrote:

> I have run into something that I don't quite understand. I have some code that is meant to open a file for reading, but at compile time I get "Could not resolve generic procedure mpi_file_set_view"

Jeff is precisely correct. In Fortran 90, if you get a message of this type from the compiler, it means that the variable types don't line up between the subroutine/function and the calling code. The only promotion in Fortran 90 is inline, i.e. x = i * y. Fortran 90 is a strongly typed language if you use interfaces. Unfortunately I have yet to see a Fortran 90 compiler that gives an obvious error message pointing to the specific error for these interfacing errors.

Michael
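A minimal sketch of the usual fix, assuming the caller had declared the displacement as a plain default integer (the subroutine and variable names here are made up for illustration):

```fortran
! Hypothetical caller: the displacement passed to MPI_FILE_SET_VIEW
! must be of kind MPI_OFFSET_KIND, not a default integer, or the F90
! generic interface will not resolve.
subroutine set_view_example(fh, etype, filetype, info, ierr)
  include 'mpif.h'
  integer, intent(in) :: fh, etype, filetype, info
  integer, intent(out) :: ierr
  integer(kind=MPI_OFFSET_KIND) :: disp   ! not "integer :: disp"

  disp = 0_MPI_OFFSET_KIND
  call MPI_FILE_SET_VIEW(fh, disp, etype, filetype, 'native', info, ierr)
end subroutine set_view_example
```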
Re: [OMPI users] C and Fortran 77 compilers are not link compatible. Can not continue.
On Sep 20, 2007, at 7:49 AM, Tim Prins wrote:

> This is because Open MPI is finding gcc for the C compiler and ifort for the Fortran compiler.

Just to be clear: it is possible to build OpenMPI using ifort for Fortran and gcc for the C compiler, at least on Linux. I have done that on several Linux systems for many releases of OpenMPI, but have not tried it on OS X. On OS X I have been using g95.

For reference, below is my build command for Linux with ifort:

./configure F77=ifort FC=ifort --with-mpi-f90-size=small ; make all

and for OS X with g95:

./configure F77=g95 FC=g95 LDFLAGS=-lSystemStubs --with-mpi-f90-size=small ; make all

I'm not aware of any special flags needed with ifort on OS X, but -lSystemStubs is required for g95 and might be for ifort as well on OS X.

Michael
Re: [OMPI users] which alternative to OpenMPI should I choose?
On Oct 19, 2007, at 9:29 AM, Marcin Skoczylas wrote:

> Jeff Squyres wrote:
>> On Oct 18, 2007, at 9:24 AM, Marcin Skoczylas wrote:
>
> I assume this could be because of:
>
> $ /sbin/route
> Kernel IP routing table
> Destination   Gateway       Genmask        Flags Metric Ref Use Iface
> 192.125.17.0  *             255.255.255.0  U     0      0   0   eth1
> 192.168.12.0  *             255.255.255.0  U     0      0   0   eth1
> 161.254.0.0   *             255.255.0.0    U     0      0   0   eth1
> default       192.125.17.1  0.0.0.0        UG    0      0   0   eth1
>
> Actually the configuration here is quite strange; this is not a private address. The head node sits on a public address from the 192.125.17.0 net (routable from outside), workers are on 192.168.12.0.

I have a similar configuration that works just fine with OpenMPI. In my case the head node has three interfaces and the worker nodes each have two interfaces; the configuration is roughly:

master: eth0: 192.168.x.x, eth1 & eth2 bonded to 10.0.0.1
node2: eth0 & eth1 bonded to 10.0.0.2
nodeN: eth0 & eth1 bonded to 10.0.0.N

So our "outside" communication with the head node is on the 192.168 network and the internal communication is on the 10.0.0.x network. In your case the "outside" communication is on the 192.125 network and the internal communication is on the 192.168 network. The primary difference seems to be that you have all communication going over a single interface. I'm a little surprised there is any problem at all with OpenMPI and your configuration, as my configuration is more complicated.

Michael
Re: [OMPI users] OpenMP + OpenMPI
On Dec 5, 2007, at 9:57 PM, Tee Wen Kai wrote:

> I have installed openmpi-1.2.3. My system has two ethernet ports. Thus, I am trying to make use of both ports to speed up the communication process by using openmp to split into two threads.

Why not use ethernet bonding at the system level? It's a lot easier than what it sounds like you are trying to do, and it speeds up all ethernet traffic on the computer. What OS are you trying to do this on?

Michael
[OMPI users] Dual ethernet & OpenMPI
In the past I configured a Linux cluster by bonding two ethernet ports together on each node (with the master having a third port for outside communication); however, recent discussions seem to say that if I have two ethernet cards, OpenMPI can handle all the setup itself.

My question is what address ranges I should use. That is, should both ports be on the same network range, i.e. 10.0.0.x/255.255.255.0, or should they be on separate network ranges, i.e. 10.0.0.x/255.255.255.0 and 10.0.1.x/255.255.255.0? Would I need a third ethernet card for outside communication, or could one port on the master node handle both internal and external communications?

Would there be any special flags to set this up, or would OpenMPI detect the two paths -- obviously each port would have a different IP address if I'm not using bonding, so do you just double the host list? How would I test whether I have doubled my bandwidth?

Michael
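For what it's worth, Open MPI's TCP transport can be pointed at specific interfaces with MCA parameters; a sketch of the kind of command line involved (the interface names, host file, and program name here are placeholders, not from the thread):

```sh
# Restrict the TCP BTL to the two internal ports (eth0/eth1 are
# placeholders for whatever interfaces the cluster actually uses).
# Open MPI will then stripe traffic across both of them.
mpirun --mca btl tcp,sm,self \
       --mca btl_tcp_if_include eth0,eth1 \
       -np 16 -hostfile internal_hosts ./my_mpi_program
```

The complementary parameter btl_tcp_if_exclude can instead be used to keep the external-facing port out of MPI traffic.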
Re: [OMPI users] ScaLapack and BLACS on Leopard
On Mar 6, 2008, at 12:49 PM, Doug Reeder wrote:

> Greg, I would disagree with your statement that the available fortran options can't pass a cost-benefit analysis. I have found that for scientific programming (e.g., Livermore Fortran Kernels and actual PDE solvers) the code produced by the Intel compiler runs 25 to 55% faster than code from gfortran or g95. Looking at the cost of adding processors with g95/gfortran to get the same throughput as with ifort, you recover the $549 compiler cost real quickly. Doug Reeder

I'm a big fan of g95, but actually I'm seeing even greater differences in a small code I'm using for some lengthy calculations. With 14 MB of data being read into memory and processed:

Intel ifort is 7.7x faster than Linux g95 on a MacPro 3.0 GHz
Intel ifort is 2.9x faster than Linux g95 on a dual Opteron 1.4 GHz
Intel ifort is 1.8x faster than Linux g95 on an SGI Altix 350 dual Itanium2 1.4 GHz
OS X g95 is 2.7x faster than Linux g95 on a MacPro 2.66 GHz (exactly the same hardware)

The complete data set is very large, 56 GB, but that is 42 individual frequencies, whereas the 14 MB is a single frequency with data averaged over areas, so I get a flavor of the answer but not exactly the right answer. I played around with compiler options and specified the exact processor type within the limits of gcc, and I gained only fractions of a percent. A co-worker saw factor-of-2 differences between Intel's compiler and g95 with a very complicated code.

Michael
Re: [OMPI users] MPI-2 Supported on Open MPI 1.2.5?
Quick answer, till you get a complete answer: yes, OpenMPI has long supported most of the MPI-2 features.

Michael

On Mar 7, 2008, at 7:44 AM, Jeff Pummill wrote:

> Just a quick question... Does Open MPI 1.2.5 support most or all of the MPI-2 directives and features? I have a user who specified MVAPICH2 as he needs some features like extra task spawning, but I am trying to standardize on Open MPI compiled against Infiniband for my primary software stack. Thanks!
>
> -- Jeff F. Pummill, Senior Linux Cluster Administrator, University of Arkansas
Re: [OMPI users] configure:25579: error: No atomic primitives available for ppc74xx-linux-gnu
On Apr 9, 2008, at 1:57 PM, Bailey, Eric wrote:

> I am trying to use a cross compiler to build Open MPI for an embedded ppc7448 running Linux 2.6, but during configure I get the following error:
>
> configure:25579: error: No atomic primitives available for ppc74xx-linux-gnu
>
> Does anyone have an idea as to how to get past this error? ...
>
> The configure is complaining about the missing atomic directives for your processor. We have the MIPS atomic calls but not the MIPS64. We just have to add them in opal/asm/base.

Based on my reading, the PPC 7448 is basically the same processor as in my Apple PowerMac G4 <http://en.wikipedia.org/wiki/PowerPC_G4>. Therefore OpenMPI should have no trouble, as I have built OpenMPI on my G4 many times. I have no idea where the MIPS references come from; PPC has always meant PowerPC in everything I have seen, and all the MIPS chips I'm aware of are labeled R. It might be best to get a PowerMac G4 and build OpenMPI on it, but you'd probably have better luck if you install Linux on the G4 instead of building OpenMPI on OS X, as your final platform is Linux.

Michael
[OMPI users] Problem compiling open MPI on cygwin on windows
Hi,

New to open MPI, but have used MPI before. I am trying to compile open MPI under cygwin on Windows XP. From what I have read this should work? Initially I hit a problem with the 1.2.6 standard download in that a time-related header file was incorrect, and the mailing list pointed me to the trunk build to solve that problem. Now when I try to compile I am getting the error at the bottom of this mail. My question is: am I wasting my time trying to use cygwin, or are there people out there using it on cygwin? If so, is there a solution to the problem below?

Thanks in advance,
Michael.

mv -f $depbase.Tpo $depbase.Plo
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../.. -D_REENTRANT -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -MT paffinity_windows_module.lo -MD -MP -MF .deps/paffinity_windows_module.Tpo -c paffinity_windows_module.c -DDLL_EXPORT -DPIC -o .libs/paffinity_windows_module.o
paffinity_windows_module.c:41: error: parse error before "sys_info"
paffinity_windows_module.c:41: warning: data definition has no type or storage class
paffinity_windows_module.c: In function `windows_module_get_num_procs':
paffinity_windows_module.c:90: error: request for member `dwNumberOfProcessors' in something not a structure or union
paffinity_windows_module.c: In function `windows_module_set':
paffinity_windows_module.c:96: error: `HANDLE' undeclared (first use in this function)
paffinity_windows_module.c:96: error: (Each undeclared identifier is reported only once
paffinity_windows_module.c:96: error: for each function it appears in.)
paffinity_windows_module.c:96: error: parse error before "threadid"
paffinity_windows_module.c:97: error: `DWORD_PTR' undeclared (first use in this function)
paffinity_windows_module.c:99: error: `threadid' undeclared (first use in this function)
paffinity_windows_module.c:99: error: `process_mask' undeclared (first use in this function)
paffinity_windows_module.c:99: error: `system_mask' undeclared (first use in this function)
paffinity_windows_module.c: In function `windows_module_get':
paffinity_windows_module.c:116: error: `HANDLE' undeclared (first use in this function)
paffinity_windows_module.c:116: error: parse error before "threadid"
paffinity_windows_module.c:117: error: `DWORD_PTR' undeclared (first use in this function)
paffinity_windows_module.c:119: error: `threadid' undeclared (first use in this function)
paffinity_windows_module.c:119: error: `process_mask' undeclared (first use in this function)
paffinity_windows_module.c:119: error: `system_mask' undeclared (first use in this function)
make[2]: *** [paffinity_windows_module.lo] Error 1
make[2]: Leaving directory `/home/Michael/mpi/openmpi-1.3a1r18208/opal/mca/paffinity/windows'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/Michael/mpi/openmpi-1.3a1r18208/opal'
make: *** [all-recursive] Error 1
Re: [OMPI users] install intel mac with Laopard
On Apr 24, 2008, at 10:48 AM, Jeff Squyres (jsquyres) wrote:

> You probably want to use all the intel compilers, not just ifort. CC=icc CXX=icpc FC=ifort F77=ifort

I have long used gcc and ifort on 64-bit Linux (since SUSE did 64-bit Linux on AMD Opterons), and I don't see a good reason why OS X should be/remain a problem. There are very major reasons to use gcc and ifort: it works, and buying a bunch of ifort C licenses is an unnecessary expense unless there is some type of speed advantage to compiling OpenMPI with the Intel compilers instead of gcc. Given that we have very little C code that requires a highly optimized compiler for the latest Intel Xeon, we will continue to use gcc when building OpenMPI.

However, given that ifort generates code that is 7.7 times faster under Linux than that generated by the latest g95 and gfortran in my tests on a 5300-series Xeon (the first dual quad 3.0 GHz Mac Pro), we will also continue using ifort for the foreseeable future. Strangely, g95 is 2.7 times faster under OS X (with virus-checking) than under Linux on a 5100-series Xeon (the first quad 2.66 GHz Mac Pro). This is something I will test again with Intel's OS X Fortran compiler against their Linux compiler to see if there is a difference there.

Michael
Re: [OMPI users] Memory question and possible bug in 64bit addressing under Leopard!
On Apr 25, 2008, at 4:10 PM, Brian Barrett wrote:

> On Apr 25, 2008, at 2:06 PM, Gregory John Orris wrote:
>
>> produces a core dump on a machine with 12Gb of RAM, and the error message: mpiexec noticed that job rank 0 with PID 75545 on node mymachine.com exited on signal 4 (Illegal instruction). However, substituting in float *X = new float[n]; for float X[n]; succeeds!
>
> You're running off the end of the stack because of the large amount of data you're trying to put there. OS X by default has a tiny stack size, so codes that run on Linux (which defaults to a much larger stack size) sometimes show this problem. Your best bets are either to increase the max stack size or (more portably) just allocate everything on the heap with malloc/new.

Where are Fortran 90 arrays allocated, stack or heap? I can't see us using malloc in our Fortran 90 codes. I need to understand this before I start configuring a new cluster; I was planning for it to run OS X instead of Linux. At the moment I don't have an OS X system with enough RAM to test this.

Michael
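For what it's worth, a sketch of the distinction in Fortran 90 terms (the program and sizes here are illustrative, not from the thread): a fixed-size local array typically lives in static memory or on the stack, and automatic arrays live on the stack, while an ALLOCATABLE array is allocated from the heap, which sidesteps OS X's small default stack limit.

```fortran
program heap_vs_stack
  implicit none
  integer, parameter :: n = 3000000
  ! real :: x(n)              ! static/stack storage; a large automatic
  !                           ! array can overflow a small stack limit
  real, allocatable :: x(:)   ! heap storage, limited only by memory
  allocate(x(n))
  x = 0.0
  print *, 'allocated ', size(x), ' elements on the heap'
  deallocate(x)
end program heap_vs_stack
```

The exact placement of fixed-size locals is compiler-dependent, so for large working arrays ALLOCATABLE is the portable choice.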
Re: [OMPI users] Error: SAVE statement at (1) follows blanket SAVE statement in file mpif.h
The problem discussed here is with the MPICH2 version of MPI, not OpenMPI.

Michael

On Nov 18, 2006, at 9:22 AM, Jeff Squyres wrote:

> We do not appear to have the token "save" anywhere in our mpif.h file. Can you send a copy of the mpif.h file that your compiler is finding (and ensure that it belongs to Open MPI)? Also please send the information regarding compilation problems listed on the "Getting help" page on the web site. Thanks!

On Nov 16, 2006, at 4:11 PM, Yu Chen wrote:

> Hello,
>
> Not sure if it's openmpi related or the program I am installing. I installed openmpi using g95 as the F77 and F90 compiler with flags "-ffixed-line-length-132 -fno-underscoring" on a PowerMac G5 with OS X 10.4 without any problems. Then I tried to compile this program (CYANA, if it matters; it's a molecular calculation program) with the openmpi-generated mpif90 wrapper with the same flags, and it gave the following errors. Wondering if someone has an idea about this; googled it without much help. Thanks a lot in advance.
>
> ../../etc/prepare -c -Dg95 -Dmpi -Dapple -Dapple_ompig95-withflags -w inclan.for > inclan.f
> /sw/mpich2_g95_withflags/bin/mpif90 -c -ffixed-line-length-132 -fno-underscoring inclan.f
> In file mpif.h:420
> Included at inclan.f:26
> SAVE /MPIPRIV1/,/MPIPRIV2/
> 1
> Error: SAVE statement at (1) follows blanket SAVE statement
> In file mpif.h:423
> Included at inclan.f:26
> SAVE /MPIPRIVC/
> 1
> Error: SAVE statement at (1) follows blanket SAVE statement
> make[2]: *** [inclan.o] Error 1
>
> Yu Chen
> Howard Hughes Medical Institute
> University of Maryland at Baltimore County
[OMPI users] install script issue
Building openmpi-1.3a1r13525 on OS X 10.4.8 (PowerPC), using my standard compile line:

./configure F77=g95 FC=g95 LDFLAGS=-lSystemStubs --with-mpi-f90-size=large --with-f90-max-array-dim=3 ; make all

and after installing I found that I couldn't compile, because of the following:

-rw------- 1 root wheel 640216 Feb 7 14:48 libmpi_f90.a

This has not happened in the past, and I followed the same procedures I've been using for many months. One slight difference is that I installed using the command "make install all" rather than "make install"; also, I had uninstalled the previous version prior to installing this version.

Michael
[OMPI users] Fortran90 interfaces--problem?
I have discovered a problem with the Fortran90 interfaces for all types of communication when one uses derived datatypes (I'm currently using openmpi-1.3a1r13918 [for testing] and openmpi-1.1.2 [for compatibility with an HPC system]), for example:

call MPI_RECV(tsk,1,MPI_TASKSTATE,src,1,MPI_COMM_WORLD,MPI_STATUS_IGNORE,ier)

where tsk is a Fortran 90 structure and MPI_TASKSTATE has been created by MPI_TYPE_CREATE_STRUCT. At the moment I can't imagine a way to modify the OpenMPI interface generation to work around this besides switching to --with-mpi-f90-size=small.

Michael
[OMPI users] MPI_PACK very slow?
I have a section of code where I need to send 8 separate integers via BCAST. Initially I was just putting the 8 integers into an array and then sending that array. I just tried using MPI_PACK on those 8 integers and I'm seeing a massive slowdown in the code; I have a lot of other communication, and this section is used only 5 times. I went from 140 seconds to 277 seconds on 16 processors using TCP via a dual gigabit ethernet setup (I'm the only user working on this system today). This was run with OpenMPI 1.1.2 to maintain compatibility with a major HPC site.

Is there a known problem with MPI_PACK/UNPACK in OpenMPI?

Michael
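For context, the pattern in question is roughly the following sketch (the subroutine and variable names are made up; the thread does not show the actual code):

```fortran
! Sketch: on the root rank, pack 8 integers into a byte buffer and
! broadcast the packed buffer, instead of broadcasting the plain
! integer array directly.  Receivers would call MPI_UNPACK with the
! same layout after the broadcast.
subroutine bcast_packed(vals, ier)
  include 'mpif.h'
  integer, intent(inout) :: vals(8)
  integer, intent(out) :: ier
  character(len=64) :: buf
  integer :: pos

  pos = 0
  call MPI_PACK(vals, 8, MPI_INTEGER, buf, 64, pos, MPI_COMM_WORLD, ier)
  call MPI_BCAST(buf, pos, MPI_PACKED, 0, MPI_COMM_WORLD, ier)
end subroutine bcast_packed
```

For 8 integers the packing itself is a handful of memory copies, which is consistent with George's point below that MPI_Pack cannot plausibly account for tens of seconds per call.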
Re: [OMPI users] MPI_PACK very slow?
I discovered I made a minor change that cost me dearly (I had thought I had tested this single change but perhaps didn't track the timing data closely). MPI_Type_create_struct performs well only when all the data is contiguous in memory (at least for OpenMPI 1.1.2). Is this normal or expected?

In my case the program has an f90 structure with 11 integers, 2 logicals, and five 50-element integer arrays, but at the first stage of the program only the first element of those arrays is used. Even so, using MPI_Type_create_struct it is more efficient to send the entire 263 words of contiguous memory (58 seconds) than to try and send only 18 words of noncontiguous memory (64 seconds). At the second stage it's 33 words, and at that stage it becomes 47 seconds vs. 163 seconds, an extra 116 seconds, which dominates the push of my overall wall clock time from 130 to 278 seconds. The third stage increases from 13 seconds to 37 seconds.

Because I need to send this block of data back and forth a lot, I was hoping to find a way to speed up the transfer of this odd block of data and a couple of other variables. I may try PACK and UNPACK on the structure, but calling those lots of times can't be more efficient. Previously I was equivalencing the structure to an integer array and sending the integer array as a quick, dirty solution to get started, and it worked -- not completely portable, no doubt.

Michael

ps. I don't currently have valgrind installed on this cluster, and valgrind is not part of the Debian Linux 3.1r3 distribution. Without any experience with valgrind, I'm not sure how useful it will be with an MPI program of 500+ subroutines and 50K+ lines running on 16 processes. It took us a bit to get profiling working for the OpenMP version of this code.

On Mar 6, 2007, at 11:28 AM, George Bosilca wrote:

> I doubt this comes from the MPI_Pack/MPI_Unpack. The difference is 137 seconds for 5 calls. That's basically 27 seconds per call to MPI_Pack, for packing 8 integers.
> I know the code and I'm affirmative there is no way to spend 27 seconds over there. Can you run your application using valgrind with the callgrind tool? This will give you some basic information about where the time is spent, and give us additional information about where to look.
>
> Thanks,
> george.

On Mar 6, 2007, at 11:26 AM, Michael wrote:

>> I have a section of code where I need to send 8 separate integers via BCAST. Initially I was just putting the 8 integers into an array and then sending that array. I just tried using MPI_PACK on those 8 integers and I'm seeing a massive slowdown in the code; I have a lot of other communication, and this section is used only 5 times. I went from 140 seconds to 277 seconds on 16 processors using TCP via a dual gigabit ethernet setup (I'm the only user working on this system today). This was run with OpenMPI 1.1.2 to maintain compatibility with a major HPC site. Is there a known problem with MPI_PACK/UNPACK in OpenMPI? Michael

"Half of what I say is meaningless; but I say it so that the other half may reach you" Kahlil Gibran
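For reference, the contiguous-send variant discussed above would be built roughly like this. This is only a sketch: the layout assumes 4-byte integers and that the two logicals occupy one word each (both assumptions on my part), and the counts come from the 11 integers + 2 logicals + 5 x 50 integers described in the thread.

```fortran
! Sketch: describe the 263-word structure to MPI as two contiguous
! blocks with MPI_TYPE_CREATE_STRUCT, so the whole thing is sent as
! one contiguous region (the fast path described above).
subroutine build_task_type(newtype, ier)
  include 'mpif.h'
  integer, intent(out) :: newtype, ier
  integer :: blocklens(2), types(2)
  integer(kind=MPI_ADDRESS_KIND) :: displs(2)

  ! 13 leading words (11 integers + 2 logicals, assumed word-sized),
  ! then 5 x 50 = 250 integers, laid out contiguously in memory.
  blocklens = (/ 13, 250 /)
  displs    = (/ 0_MPI_ADDRESS_KIND, 13 * 4_MPI_ADDRESS_KIND /)
  types     = (/ MPI_INTEGER, MPI_INTEGER /)
  call MPI_TYPE_CREATE_STRUCT(2, blocklens, displs, types, newtype, ier)
  call MPI_TYPE_COMMIT(newtype, ier)
end subroutine build_task_type
```

In real code the displacements should come from MPI_GET_ADDRESS on the actual structure members rather than hand-computed byte offsets.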
[OMPI users] LSF & OpenMPI
What is the status of LSF and OpenMPI? I'm running on a major HPC system using GM & LSF, and we have to use a number of workarounds so that we can use OpenMPI. Specifically, using the scripts on this system we have to have our csh file source a file to set up the environment on the nodes. Using OpenMPI's mpirun directly does not work because, at the very minimum, the hosts to run on are not available to it; I had a workaround, but there it seems that the environment is not passed to the nodes. The notes from the support people indicate that the problem is that openmpi's mpirun command doesn't recognize the "-gm-copy-env" option. Does this mean anything to anyone?

Open MPI: 1.1.2
Open MPI SVN revision: r12073
MCA btl: self (MCA v1.0, API v1.0, Component v1.1.2)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.2)
MCA btl: gm (MCA v1.0, API v1.0, Component v1.1.2)
MCA btl: mvapi (MCA v1.0, API v1.0, Component v1.1.2)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)

Have there been any improvements in the compatibility of OpenMPI with LSF since version 1.1.2? Does anyone on the OpenMPI team have access to a system using the LSF batch queueing system? Is a machine with GM and LSF available yet?

Michael
[OMPI users] portability of the executables compiled with OpenMPI
I'm having trouble with the portability of executables compiled with OpenMPI. I suspect the sysadmins on the HPC system I'm using changed something, because I think it worked previously.

Situation: I'm compiling my code locally on a machine with just ethernet interfaces and an OpenMPI 1.1.2 that I built. When I attempt to run that executable on an HPC machine with OpenMPI 1.1.2 and InfiniBand interfaces, I get messages about "can't find libmosal.so.0.0" -- I'm certain this wasn't happening earlier. I can compile on that machine and run on it, even though there is no libmosal.* in my path. mpif90 --showme on that system gives me:

/opt/compiler/intel/compiler91/x86_64/bin/ifort -I/opt/mpi/x86_64/intel/9.1/openmpi-1.1.4/include -pthread -I/opt/mpi/x86_64/intel/9.1/openmpi-1.1.4/lib -L/opt/mpi/x86_64/intel/9.1/openmpi-1.1.4/lib -L/opt/gm/lib64 -lmpi_f90 -lmpi -lorte -lopal -lgm -lvapi -lmosal -lrt -lnuma -ldl -Wl,--export-dynamic -lnsl -lutil -ldl

I suspect that read access to libmosal.so has been removed and that somehow, when I link on that machine, I'm getting a static library, i.e. libmosal.a. Does this make any sense? Is there a flag in this compile line that permits linking an executable even when the person doing the linking does not have access to all the libraries, i.e. --export-dynamic?

Michael
Re: [OMPI users] portability of the executables compiled with OpenMPI
On Mar 15, 2007, at 12:18 PM, Michael wrote:

> I'm having trouble with the portability of executables compiled with OpenMPI. I suspect the sysadmins on the HPC system I'm using changed something, because I think it worked previously.

Apparently there was a misconfiguration, i.e. missing libraries and links on some nodes. I would like to hear just how portable an executable compiled against OpenMPI shared libraries should be. I'm compiling on a Debian Linux system with dual 1.3 GHz AMD Opterons per node and an internal network of dual gigabit ethernet. I'm planning on a SUSE Linux Enterprise Server 9 system with dual 3.6 GHz Intel Xeon EM64T per node and an internal network using Myrinet. I believe I actually had this working previously, and now there is a mix of libraries missing from some nodes.

Michael
Re: [OMPI users] portability of the executables compiled with OpenMPI
On Mar 22, 2007, at 7:55 AM, Jeff Squyres wrote:

> On Mar 15, 2007, at 12:18 PM, Michael wrote:
>
>> Situation: I'm compiling my code locally on a machine with just ethernet interfaces and an OpenMPI 1.1.2 that I built. When I attempt to run that executable on an HPC machine with OpenMPI 1.1.2 and InfiniBand interfaces I get messages about "can't find libmosal.so.0.0" -- I'm certain this wasn't happening earlier. I can compile on this machine and run on it, even though there is no libmosal.* in my path. mpif90 --showme on this system gives me: /opt/compiler/intel/compiler91/x86_64/bin/ifort -I/opt/mpi/x86_64/intel/9.1/openmpi-1.1.4/include -pthread -I/opt/mpi/x86_64/intel/9.1/openmpi-1.1.4/lib -L/opt/mpi/x86_64/intel/9.1/openmpi-1.1.4/lib -L/opt/gm/lib64 -lmpi_f90 -lmpi -lorte -lopal -lgm -lvapi -lmosal -lrt -lnuma -ldl -Wl,--export-dynamic -lnsl -lutil -ldl
>
> Based on this output, I assume you have configured OMPI with either --enable-static or otherwise including all plugins in libmpi.so, right?

No, I did not configure OpenMPI on this machine. I believe OpenMPI was configured non-static by the installers, based on the messages and the dependency on the missing libraries. The issue was that some of the 1000+ nodes on this major HPC machine were missing libraries needed for OpenMPI, but because of the low usage of OpenMPI I'm the first to discover the problem. For whatever reason these libraries are not on the front-end machines that feed the main system. It's always nice running OpenMPI on your own machine, but not everyone can always do that.

The way I read my experience: OpenMPI's libmpi.so depends on different libraries on different machines. This means that if you don't compile static, you can compile on a machine that does not have the libraries for expensive interfaces and run on another machine with those expensive interfaces -- that's what I am doing now successfully.

Michael
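A quick way to check which shared libraries an executable will try to resolve on a given node is the standard ldd tool (a sketch; the program name is a placeholder):

```sh
# List the dynamic dependencies of the executable; a library the node
# is missing (such as libmosal.so.0.0 above) shows up as "not found".
ldd ./my_mpi_program
ldd ./my_mpi_program | grep "not found"
# Running the same command on a compile node and a compute node makes
# misconfigured or missing library installs easy to spot.
```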
Re: [OMPI users] portability of the executables compiled with OpenMPI
For your reference: the following cross-compile/run combination with OpenMPI 1.1.4 is currently working for me.

I'm compiling on a Debian Linux system with dual 1.3 GHz AMD Opterons per node and an internal network of dual gigabit ethernet, with OpenMPI compiled with Intel Fortran 9.1.041 and gcc 3.3.5.

I'm running on a SUSE Linux Enterprise Server 9 system with dual 3.6 GHz Intel Xeon EM64T per node and an internal network using Myrinet, with OpenMPI compiled with Intel Fortran 9.1.041 and Intel icc 9.1.046.

There is enough compatibility between the two different libmpi.so's that I do not have a problem. I have to periodically check the second system to see if it has been updated, in which case I have to update my system.

Michael
[OMPI users] Buffered sends
Is there a known issue with buffered sends in OpenMPI 1.1.4? I changed a single send, which is called thousands of times, from MPI_SEND (& MPI_ISEND) to MPI_BSEND (& MPI_IBSEND), and my Fortran 90 code slowed down by a factor of 10. I've looked at several references and I can't see where I'm making a mistake.

The MPI_SEND is for MPI_PACKED data, so its first parameter is an allocated character array. I also allocated a character array for the buffer passed to MPI_BUFFER_ATTACH. Looking at the model implementation in a reference, they give a model of using MPI_PACKED inside MPI_BSEND; I was wondering if this could be a problem, i.e. packing packed data?

Michael

ps. I have to use OpenMPI 1.1.4 to maintain compatibility with a major HPC center.
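For reference, the attach/send pattern being described looks roughly like this sketch (the subroutine, sizes, and tag are made up for illustration; note that the attach buffer must include MPI_BSEND_OVERHEAD per pending message):

```fortran
! Sketch: attach a user buffer, then issue a buffered send of
! already-packed data.
subroutine bsend_packed(packbuf, nbytes, dest, ier)
  include 'mpif.h'
  character(len=*), intent(in) :: packbuf
  integer, intent(in) :: nbytes, dest
  integer, intent(out) :: ier
  character, allocatable :: attach_buf(:)
  integer :: bufsize

  bufsize = nbytes + MPI_BSEND_OVERHEAD
  allocate(attach_buf(bufsize))
  call MPI_BUFFER_ATTACH(attach_buf, bufsize, ier)
  call MPI_BSEND(packbuf, nbytes, MPI_PACKED, dest, 0, MPI_COMM_WORLD, ier)
  ! MPI_BUFFER_DETACH blocks until all buffered messages are delivered.
  call MPI_BUFFER_DETACH(attach_buf, bufsize, ier)
  deallocate(attach_buf)
end subroutine bsend_packed
```

In real code the buffer would be attached once at startup rather than per send; MPI_BSEND always copies the message into the attached buffer, so some extra cost over MPI_SEND is expected even when used correctly.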
[OMPI users] OpenMPI 1.1.4 vs. 1.2
To maintain compatibility with a major HPC center I upgraded(?) from OpenMPI 1.1.4 to OpenMPI 1.2 on my local cluster. In testing on my local cluster, 13 dual-Opteron Linux boxes with dual gigabit ethernet, I discovered that my program runs slower using OpenMPI 1.2 than OpenMPI 1.1.4 (780.3 versus 402.4 seconds with 3 processes -- tested twice to be certain). This particular version of my program was designed to minimize the amount of communication, and the only MPI calls that get used a lot are MPI_SEND and MPI_RECV with MPI_PACKED data (so MPI_PACK and MPI_UNPACK also get used a lot). Was there a known problem with OpenMPI 1.2 (r14027) and ethernet communication that got fixed later? The same executable run at the major center seems fine, but they have Myrinet.

Michael
Re: [OMPI users] ethernet bonding
On May 24, 2007, at 10:38 AM, Adams, Samuel D Contr AFRL/HEDR wrote: We recently got 33 new cluster nodes, all of which have two onboard GigE nics. We also got two PowerConnect 2748 48-port switches which support IEEE 802.3ad (link aggregation). I have configured the nodes to do Ethernet bonding to aggregate the two nics into one bonded device: ... Now I am wondering what is the best way to configure my switches. I guess I could do it in two ways: use IEEE 802.3ad on the switch, plug both nics of a node into one switch, and put some nodes on either switch; or perhaps, for each node, plug one nic into one switch and the second nic into the other switch. Based on the configuration of a system that we purchased and our experience with that system, I would suggest plugging one nic from each node into one switch and the other nic from each node into the other switch. This assumes the two switches have more ports than you have nodes. I have no experience with IEEE 802.3ad aggregation; someone else would have to speak to that. There is also the question of which bonding mode you choose: which modes would work at all, and which gives the best performance. Michael --- "Producing a system from a specification is like walking on water-- it's easier if it's frozen."
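For nodes like these, the bonding mode is picked in the interface configuration. The stanza below is a sketch for a Debian-style /etc/network/interfaces (it assumes the ifenslave package; interface names and addresses are example values). Note that bond-mode 802.3ad requires the matching aggregation group on the switch side, which rules out the one-nic-per-switch wiring unless the switches support cross-switch aggregation; modes such as balance-alb do not need switch support.

```
# Hypothetical /etc/network/interfaces stanza (example values throughout)
auto bond0
iface bond0 inet static
    address 192.168.1.10
    netmask 255.255.255.0
    bond-slaves eth0 eth1
    bond-mode 802.3ad
    bond-miimon 100
```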
[O-MPI users] latest g95: size of FORTRAN integer(selected_int_kind(2))... unknown
Building Open MPI 1.0.1 on a PowerMac running OS X 10.4.4 using 1) Apple gnu compilers from Xcode 2.2.1 2) fink-installed g77 3) latest g95 "G95 (GCC 4.0.1 (g95!) Jan 23 2006)" (the binary from G95 Home) setenv F77 g77 setenv FC g95 ./configure In the G95 section of the configure I get checking size of FORTRAN integer(selected_int_kind(2))... unknown configure: WARNING: *** Problem running configure test! Gzipped config.log attached. If I change to the older Fink g95 "G95 (GCC 4.0.2 (g95!) Dec 19 2005)" I don't see this problem System info: uname -a Darwin 8.4.0 Darwin Kernel Version 8.4.0: Tue Jan 3 18:22:10 PST 2006; root:xnu-792.6.56.obj~1/RELEASE_PPC Power Macintosh powerpcgcc --version powerpc-apple-darwin8-gcc-4.0.0 (GCC) 4.0.0 (Apple Computer, Inc. build 5026) g++ --version powerpc-apple-darwin8-g++-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5250) g77 --version GNU Fortran (GCC) 3.4.3 Details on latest G95 build: g95 -v Using built-in specs. Target: Configured with: /Users/andy/g95/osx/gcc.osx/configure --enable- languages=c Thread model: posix gcc version 4.0.1 (g95!) Jan 23 2006 Details on older Fink g95 build: g95 -v Using built-in specs. Target: Configured with: ../configure --prefix=/sw/lib/gcc-lib/powerpc-apple- darwin8/4.0.2 --with-gmp=/sw --enable-languages=c --disable-checking --with-included-gettext Thread model: posix gcc version 4.0.2 (g95!) Dec 19 2005 config.log.gz Description: GNU Zip compressed data
Re: [O-MPI users] latest g95: size of FORTRAN integer(selected_int_kind(2))... unknown
Confirmed by the author of g95: On Jan 26, 2006, at 3:40 PM, Andy Vaught wrote: It's a known issue. Use LDFLAGS=-lSystemStubs on the configure line. On Jan 26, 2006, at 11:35 AM, Kraig Winters wrote: I believe that ld: Undefined symbols: _fprintf$LDBLStub can be fixed by adding -L/usr/lib -lSystemStubs to your link statement. For xlf, this can be done once and for all in the compiler configuration file. I don't know if something similar can be done for g95. This problem seems to have started with 10.4. Kraig On Jan 26, 2006, at 4:57 AM, Jeff Squyres wrote: It looks like your g95 may not be installed correctly. Here's the relevant information from the config.log: configure:32697: gcc -O3 -DNDEBUG -fno-strict-aliasing -I. -c conftest.c configure:32704: $? = 0 configure:32714: g95 conftestf.f90 conftest.o -o conftest ld: Undefined symbols: _fprintf$LDBLStub That is, configure tried to compile a .f90 file and link it with a C-compiled .o file (normally, this should work just fine). In performing the final link, however, it did not find the symbol fprintf(). This seems to indicate that the g95 compiler was not able to find the C libraries properly. Can you verify that everything is installed properly, and that g95 is able to link to C libraries? On Jan 24, 2006, at 3:11 PM, Michael Kluskens wrote: Building Open MPI 1.0.1 on a PowerMac running OS X 10.4.4 using 1) Apple gnu compilers from Xcode 2.2.1 2) fink-installed g77 3) latest g95 "G95 (GCC 4.0.1 (g95!) Jan 23 2006)" (the binary from G95 Home) ... checking size of FORTRAN integer(selected_int_kind(2))... unknown configure: WARNING: *** Problem running configure test! ... ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- {+} Jeff Squyres {+} The Open MPI Project {+} http://www.open-mpi.org/
[O-MPI users] f90 compiling: USE MPI vs. include 'mpif.h'
Question regarding f90 compiling: using USE MPI instead of include 'mpif.h' makes the compilation take an extra two minutes using g95 under OS X 10.4.4 (simple test program: 115 seconds versus 0.2 seconds). Is this normal? Michael
Re: [OMPI users] Runtime replacement of mpi libraries?
Another option is SGI PerfBoost. It will let you run apps compiled against other ABIs with SGI MPT with practically no performance loss. $ module load openmpi $ make $ module unload openmpi $ module load mpt perfboost $ mpiexec_mpt -np 2 perfboost -ompi a.out On 09/11/2014 01:28 PM, JR Cary wrote: We need to build an application on our machine with one mpi (e.g. openmpi), but for performance reasons, upon installation, we would like to runtime link to a different, specialized mpi, such as an SGI implementation provided for their systems. Can one expect this to work? I tried this with openmpi and mpich, building the code against shared openmpi and then changing the LD_LIBRARY_PATH to point to the shared mpich. This failed due to the sonames being different. $ ldd foo | grep mpi libmpi_usempi.so.1 => not found libmpi_mpifh.so.2 => not found libmpi.so.1 => not found libmpi_cxx.so.1 => not found but in the mpich distribution one has different sonames libmpi.so.12 so the runtime loader will not load the mpich libraries instead. and the fortran libraries (which may not matter to us) have different names, $ \ls /contrib/mpich-shared/lib/*.so.12 /contrib/mpich-shared/lib/libmpicxx.so.12 /contrib/mpich-shared/lib/libmpifort.so.12 /contrib/mpich-shared/lib/libmpi.so.12 Is there a general approach to this? Or in practice, must one build on a machine to use that machine's MPI? Thx.John Cary ___ users mailing list us...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2014/09/25311.php -- Michael A. Raymond SGI MPT Team Leader 1 (651) 683-7523
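The soname check described above can be scripted. This sketch inlines sample ldd output (it assumes no MPI binary is at hand here) and simply counts the MPI libraries the runtime loader could not resolve, which is the symptom John saw when pointing LD_LIBRARY_PATH at a different MPI:

```shell
# Count unresolved MPI libraries in ldd-style output (sample data)
ldd_out='libmpi_usempi.so.1 => not found
libmpi_mpifh.so.2 => not found
libmpi.so.1 => /contrib/openmpi/lib/libmpi.so.1'
printf '%s\n' "$ldd_out" | grep -c 'not found'   # prints 2
```

A nonzero count means the sonames recorded at link time have no match on the runtime library path, so the substitution cannot work without an ABI-compatibility shim such as PerfBoost.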
Re: [OMPI users] Runtime replacement of mpi libraries?
Doing the `module load perfboost` sets the LD_LIBRARY_PATH. To see more, after doing the module load of both SGI modules, do an `ldd` on your app. On 09/11/2014 03:40 PM, John Cary wrote: Thanks much! So does mpiexec_mpt then set the LD_LIBRARY_PATH as needed? John On 9/11/2014 1:27 PM, Michael Raymond wrote: Another option is SGI PerfBoost. It will let you run apps compiled against other ABIs with SGI MPT with practically no performance loss. $ module load openmpi $ make $ module unload openmpi $ module load mpt perfboost $ mpiexec_mpt -np 2 perfboost -ompi a.out -- Michael A. Raymond SGI MPT Team Leader 1 (651) 683-7523
[OMPI users] Strange behavior of OMPI 1.8.3
Hello, I've configured OpenMPI 1.8.3 with the following command line $ AXFLAGS="-xSSE4.2 -axAVX,CORE-AVX-I,CORE-AVX2" $ myFLAGS="-O2 ${AXFLAGS}" ; $ ./configure --prefix=${proot} \ --with-lsf \ --with-cma \ --enable-peruse --enable-branch-probabilities \ --enable-mpi-fortran=all \ --enable-cxx-exceptions \ --enable-ipv6 \ --enable-sparse-groups \ --with-threads=posix \ --enable-mpi-thread-multiple \ --enable-openib-connectx-xrc \ --enable-mtl-portals4-flow-control \ --with-hwloc=internal \ --enable-orterun-prefix-by-default \ --with-ident-string="MikeT_15.0" \ CC=icc CFLAGS="$myFLAGS" \ CXX=icpc CXXFLAGS="$myFLAGS" \ F77=ifort FFLAGS="$myFLAGS" FC=ifort FCFLAGS="$myFLAGS" \ LIBS="-lnsl" \ && make -j 8 && make install but when I run it with $ mpirun --bind-to core --map-by core -mca mpi_show_mca_params all --host H1,H2 -np 2 ~/performance/analysis/networks/Intel64_SandyBridge/HPCI/OMB_4.3.0/ompi_1.8.2/cpu/osu-micro-benchmarks-4.3/libexec/osu-micro-benchmarks/mpi/one-sided/osu_put_bibw H H I am getting " [H1:33580] [[41149,0],0] ORTE_ERROR_LOG: Address family not supported by protocol in file oob_tcp_listener.c at line 120 [h2:33580] [[41149,0],0] ORTE_ERROR_LOG: Address family not supported by protocol in file oob_tcp_component.c at line 584 " Any suggestions? Thanks! Michael
Re: [OMPI users] Strange behavior of OMPI 1.8.3
Hi Howard, We have NOT defined IPv6 on the nodes. Actually I was looking at the location of the code that complains and I also saw references to IPv6 sockets. Thanks a lot for the suggestion! I'll try this out tomorrow. Regards Michael On Mon, Oct 6, 2014 at 11:07 PM, Howard Pritchard wrote: > Hi Michael, > > If you do not include --enable-ipv6 in the config line, do you still > observe the problem? > Is it possible that one or more interfaces on nodes H1 and H2 do not have > ipv6 enabled? > > Howard
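If dropping IPv6 support does fix it, the rebuild only needs the original configure invocation minus --enable-ipv6, since the "Address family not supported by protocol" errors above came from the oob_tcp socket setup. A small sketch of filtering the flag list rather than retyping it (the list here is abbreviated to three of the flags from the original command line):

```shell
# Reuse the earlier configure flags, dropping --enable-ipv6 (abbreviated list)
flags="--with-lsf --enable-ipv6 --enable-mpi-thread-multiple"
printf '%s\n' $flags | grep -v -- '--enable-ipv6'
# prints the remaining flags, one per line, without --enable-ipv6
```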
[OMPI users] Question concerning compatibility of languages used with building OpenMPI and languages OpenMPI uses to build MPI binaries.
Dear OpenMPI list, As far as I know, when we build OpenMPI itself with GNU or Intel compilers we expect that the subsequent MPI application binary will use the same compiler set and run-times. Would it be possible to build OpenMPI with the GNU tool chain but then subsequently instruct the OpenMPI compiler wrappers to use the Intel compiler set? Would there be any issues with compiling C++ / Fortran or corresponding OMP codes? In general, what is a clean way to build OpenMPI with a GNU compiler set but then instruct the wrappers to use the Intel compiler set? Thanks! Michael ___ users mailing list users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/users
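One mechanism relevant here: Open MPI's wrapper compilers honor the OMPI_CC / OMPI_CXX / OMPI_FC environment variables, which swap the underlying compiler without rebuilding the stack. As the replies below the original thread make clear, though, this is only safe where the runtimes are actually compatible, and in particular not for the Fortran module bindings. A sketch:

```shell
# Point the Open MPI wrappers at the Intel compilers for this shell session
export OMPI_CC=icc OMPI_CXX=icpc OMPI_FC=ifort
# mpicc --showme   # would now report an icc command line instead of gcc
echo "$OMPI_CC $OMPI_FC"   # prints: icc ifort
```

`mpicc --showme` is the quickest way to confirm which underlying compiler a wrapper will invoke.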
Re: [OMPI users] Question concerning compatibility of languages used with building OpenMPI and languages OpenMPI uses to build MPI binaries.
Thanks for the note. How about OMP runtimes though? Michael On Mon, Sep 18, 2017 at 3:21 PM, n8tm via users wrote: > On Linux and Mac, Intel c and c++ are sufficiently compatible with gcc and > g++ that this should be possible. This is not so for Fortran libraries or > Windows.
Re: [OMPI users] Question concerning compatibility of languages used with building OpenMPI and languages OpenMPI uses to build MPI binaries.
Hello OpenMPI team, Thank you for the insightful feedback. I am not claiming in any way that it is a meaningful practice to build the OpenMPI stack with one compiler and then just try to convince / force it to use another compilation environment to build MPI applications. There are occasions, though, where one *may only have an OpenMPI* stack built, say, by GNU compilers, but for efficiency of execution of the resulting MPI applications may try to use Intel / PGI compilers with the same OpenMPI stack to compile MPI applications. It is too much unnecessary trouble to use the same MPI stack with different compilation environments. Thank you, Michael On Mon, Sep 18, 2017 at 7:35 PM, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > Even if i do not fully understand the question, keep in mind Open MPI > does not use OpenMP, so from that point of view, Open MPI is > independent of the OpenMP runtime. > > Let me emphasize what Jeff already wrote: use different installs > of Open MPI (and you can use modules or lmod in order to choose > between them easily) and always use the compilers that were used to > build Open MPI. This is mandatory if you use Fortran bindings (use mpi > and use mpi_f08), and you'd better keep yourself out of trouble with > C/C++ and mpif.h > > Cheers, > > Gilles
Re: [OMPI users] Question concerning compatibility of languages used with building OpenMPI and languages OpenMPI uses to build MPI binaries.
OMP is yet another source of incompatibility between GNU and Intel environments. So compiling, say, Fortran OMP code into a library and trying to link it with Intel Fortran codes just aggravates the problem. Michael On Mon, Sep 18, 2017 at 7:35 PM, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > Even if i do not fully understand the question, keep in mind Open MPI > does not use OpenMP, so from that point of view, Open MPI is > independent of the OpenMP runtime.
[OMPI users] Segmentation fault with SLURM and non-local nodes
Hi, I'm not sure whether this problem is with SLURM or OpenMPI, but the stack traces (below) point to an issue within OpenMPI. Whenever I try to launch an MPI job within SLURM, mpirun immediately segmentation faults -- but only if the machine that SLURM allocated to MPI is different to the one that I launched the MPI job. However, if I force SLURM to allocate only the local node (ie, the one on which salloc was called), everything works fine. Failing case: michael@ipc ~ $ salloc -n8 mpirun --display-map ./mpi JOB MAP Data for node: Name: ipc4 Num procs: 8 Process OMPI jobid: [21326,1] Process rank: 0 Process OMPI jobid: [21326,1] Process rank: 1 Process OMPI jobid: [21326,1] Process rank: 2 Process OMPI jobid: [21326,1] Process rank: 3 Process OMPI jobid: [21326,1] Process rank: 4 Process OMPI jobid: [21326,1] Process rank: 5 Process OMPI jobid: [21326,1] Process rank: 6 Process OMPI jobid: [21326,1] Process rank: 7 = [ipc:16986] *** Process received signal *** [ipc:16986] Signal: Segmentation fault (11) [ipc:16986] Signal code: Address not mapped (1) [ipc:16986] Failing at address: 0x801328268 [ipc:16986] [ 0] /lib/libpthread.so.0(+0xf8f0) [0x7ff85c7638f0] [ipc:16986] [ 1] /usr/lib/libopen-rte.so.0(+0x3459a) [0x7ff85d4a059a] [ipc:16986] [ 2] /usr/lib/libopen-pal.so.0(+0x1eeb8) [0x7ff85d233eb8] [ipc:16986] [ 3] /usr/lib/libopen-pal.so.0(opal_progress+0x99) [0x7ff85d228439] [ipc:16986] [ 4] /usr/lib/libopen-rte.so.0(orte_plm_base_daemon_callback+0x9d) [0x7ff85d4a002d] [ipc:16986] [ 5] /usr/lib/openmpi/lib/openmpi/mca_plm_slurm.so(+0x211a) [0x7ff85bbc311a] [ipc:16986] [ 6] mpirun() [0x403c1f] [ipc:16986] [ 7] mpirun() [0x403014] [ipc:16986] [ 8] /lib/libc.so.6(__libc_start_main+0xfd) [0x7ff85c3efc4d] [ipc:16986] [ 9] mpirun() [0x402f39] [ipc:16986] *** End of error message *** Non-failing case: michael@eng-ipc4 ~ $ salloc -n8 -w ipc4 mpirun --display-map ./mpi JOB MAP Data for node: Name: eng-ipc4.FQDN Num procs: 8 Process OMPI jobid: [12467,1] Process rank: 0 
Process OMPI jobid: [12467,1] Process rank: 1 Process OMPI jobid: [12467,1] Process rank: 2 Process OMPI jobid: [12467,1] Process rank: 3 Process OMPI jobid: [12467,1] Process rank: 4 Process OMPI jobid: [12467,1] Process rank: 5 Process OMPI jobid: [12467,1] Process rank: 6 Process OMPI jobid: [12467,1] Process rank: 7 = Process 1 on eng-ipc4.FQDN out of 8 Process 3 on eng-ipc4.FQDN out of 8 Process 4 on eng-ipc4.FQDN out of 8 Process 6 on eng-ipc4.FQDN out of 8 Process 7 on eng-ipc4.FQDN out of 8 Process 0 on eng-ipc4.FQDN out of 8 Process 2 on eng-ipc4.FQDN out of 8 Process 5 on eng-ipc4.FQDN out of 8 Using mpi directly is fine: eg mpirun -H 'ipc3,ipc4' -np 8 ./mpi Works as expected This is a (small) homogenous cluster, all Xeon class machines with plenty of RAM and shared filesystem over NFS, running 64-bit Ubuntu server. I was running stock OpenMPI (1.4.1) and SLURM (2.1.1), I have since upgraded to latest stable OpenMPI (1.4.3) and SLURM (2.2.0), with no effect. (the newer binaries were compiled from the respective upstream Debian packages). strace (not shown) shows that the job is launched via srun and a connection is received back from the child process over TCP/IP. Soon after this, mpirun crashes. Nodes communicate over a semi-dedicated TCP/IP GigE connection. Is this a known bug? What is going wrong? Regards, Michael Curtis
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 27/01/2011, at 4:51 PM, Michael Curtis wrote: Some more debugging information: > Failing case: > michael@ipc ~ $ salloc -n8 mpirun --display-map ./mpi > JOB MAP Backtrace with debugging symbols #0 0x77bb5c1e in ?? () from /usr/lib/libopen-rte.so.0 #1 0x7792e23f in ?? () from /usr/lib/libopen-pal.so.0 #2 0x77920679 in opal_progress () from /usr/lib/libopen-pal.so.0 #3 0x77bb6e5d in orte_plm_base_daemon_callback () from /usr/lib/libopen-rte.so.0 #4 0x762b67e7 in plm_slurm_launch_job (jdata=) at ../../../../../../orte/mca/plm/slurm/plm_slurm_module.c:360 #5 0x004041c8 in orterun (argc=4, argv=0x7fffe7d8) at ../../../../../orte/tools/orterun/orterun.c:754 #6 0x00403234 in main (argc=4, argv=0x7fffe7d8) at ../../../../../orte/tools/orterun/main.c:13 Trace output with -d100 and --enable-trace: [:10821] progressed_wait: ../../../../../orte/mca/plm/base/plm_base_launch_support.c 459 [:10821] defining message event: ../../../../../orte/mca/plm/base/plm_base_launch_support.c 423 I'm guessing from this that it's crashing in the event loop, maybe at : static void process_orted_launch_report(int fd, short event, void *data) strace: poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}], 6, 1000) = 1 ([{fd=13, revents=POLLIN}]) readv(13, [{"R\333\0\0\377\377\377\377R\333\0\0\377\377\377\377R\333\0\0\0\0\0\0\0\0\0\4\0\0\0\232"..., 36}], 1) = 36 readv(13, [{"R\333\0\0\377\377\377\377R\333\0\0\0\0\0\0\0\0\0\n\0\0\0\1\0\0\0u1390"..., 154}], 1) = 154 poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}], 6, 0) = 0 (Timeout) --- SIGSEGV (Segmentation fault) @ 0 (0) --- OK, I matched the disassemblies and confirmed that the crash originates in process_orted_launch_report, and therefore matched up the source code line with where gdb reckons the program counter was at that point: /* update state */ 
pdatorted[mev->sender.vpid]->state = ORTE_PROC_STATE_RUNNING; Hopefully all this information helps a little!
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 28/01/2011, at 8:16 PM, Michael Curtis wrote: > > On 27/01/2011, at 4:51 PM, Michael Curtis wrote: > > Some more debugging information: Is anyone able to help with this problem? As far as I can tell it's a stock-standard, recently installed SLURM installation. I can try 1.5.1, but I am hesitant to deploy it as it would require a recompile of some rather large pieces of software. Should I re-post to the -devel lists? Regards,
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 04/02/2011, at 9:35 AM, Samuel K. Gutierrez wrote: > I just tried to reproduce the problem that you are experiencing and was > unable to. > > > SLURM 2.1.15 > Open MPI 1.4.3 configured with: > --with-platform=./contrib/platform/lanl/tlcc/debug-nopanasas > > I'll dig a bit further. Interesting. I'll try a local, vanilla (ie, non-debian) build and report back. Michael
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 04/02/2011, at 9:35 AM, Samuel K. Gutierrez wrote: Hi, > I just tried to reproduce the problem that you are experiencing and was > unable to. > > SLURM 2.1.15 > Open MPI 1.4.3 configured with: > --with-platform=./contrib/platform/lanl/tlcc/debug-nopanasas I compiled OpenMPI 1.4.3 (vanilla from source tarball) with the same platform file (the only change was to re-enable btl-tcp). Unfortunately, the result is the same: salloc -n16 ~/../openmpi/bin/mpirun --display-map ~/ServerAdmin/mpi salloc: Granted job allocation 145 JOB MAP Data for node: Name: eng-ipc4.{FQDN} Num procs: 8 Process OMPI jobid: [6932,1] Process rank: 0 Process OMPI jobid: [6932,1] Process rank: 1 Process OMPI jobid: [6932,1] Process rank: 2 Process OMPI jobid: [6932,1] Process rank: 3 Process OMPI jobid: [6932,1] Process rank: 4 Process OMPI jobid: [6932,1] Process rank: 5 Process OMPI jobid: [6932,1] Process rank: 6 Process OMPI jobid: [6932,1] Process rank: 7 Data for node: Name: ipc3 Num procs: 8 Process OMPI jobid: [6932,1] Process rank: 8 Process OMPI jobid: [6932,1] Process rank: 9 Process OMPI jobid: [6932,1] Process rank: 10 Process OMPI jobid: [6932,1] Process rank: 11 Process OMPI jobid: [6932,1] Process rank: 12 Process OMPI jobid: [6932,1] Process rank: 13 Process OMPI jobid: [6932,1] Process rank: 14 Process OMPI jobid: [6932,1] Process rank: 15 = [eng-ipc4:31754] *** Process received signal *** [eng-ipc4:31754] Signal: Segmentation fault (11) [eng-ipc4:31754] Signal code: Address not mapped (1) [eng-ipc4:31754] Failing at address: 0x8012eb748 [eng-ipc4:31754] [ 0] /lib/libpthread.so.0(+0xf8f0) [0x7f81ce4bf8f0] [eng-ipc4:31754] [ 1] ~/../openmpi/lib/libopen-rte.so.0(+0x7f869) [0x7f81cf262869] [eng-ipc4:31754] [ 2] ~/../openmpi/lib/libopen-pal.so.0(+0x22338) [0x7f81cef93338] [eng-ipc4:31754] [ 3] ~/../openmpi/lib/libopen-pal.so.0(+0x2297e) [0x7f81cef9397e] [eng-ipc4:31754] [ 4] ~/../openmpi/lib/libopen-pal.so.0(opal_event_loop+0x1f) [0x7f81cef9356f] [eng-ipc4:31754] [ 5] 
~/../openmpi/lib/libopen-pal.so.0(opal_progress+0x89) [0x7f81cef87916] [eng-ipc4:31754] [ 6] ~/../openmpi/lib/libopen-rte.so.0(orte_plm_base_daemon_callback+0x13f) [0x7f81cf262e20] [eng-ipc4:31754] [ 7] ~/../openmpi/lib/libopen-rte.so.0(+0x84ed7) [0x7f81cf267ed7] [eng-ipc4:31754] [ 8] ~/../home/../openmpi/bin/mpirun() [0x403f46] [eng-ipc4:31754] [ 9] ~/../home/../openmpi/bin/mpirun() [0x402fb4] [eng-ipc4:31754] [10] /lib/libc.so.6(__libc_start_main+0xfd) [0x7f81ce14bc4d] [eng-ipc4:31754] [11] ~/../openmpi/bin/mpirun() [0x402ed9] [eng-ipc4:31754] *** End of error message *** salloc: Relinquishing job allocation 145 salloc: Job allocation 145 has been revoked. zsh: exit 1 salloc -n16 ~/../openmpi/bin/mpirun --display-map ~/ServerAdmin/mpi I've anonymised the paths and domain, otherwise pasted verbatim. The only odd thing I notice is that the launching machine uses its full domain name, whereas the other machine is referred to by the short name. Despite the FQDN, the domain does not exist in the DNS (for historical reasons), but does exist in the /etc/hosts file. Any further clues would be appreciated. In case it may be relevant, core system versions are: glibc 2.11, gcc 4.4.3, kernel 2.6.32. One other point of difference may be that our environment is tcp (ethernet) based whereas the LANL test environment is not? Michael
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 07/02/2011, at 12:36 PM, Michael Curtis wrote:

> On 04/02/2011, at 9:35 AM, Samuel K. Gutierrez wrote:
>
> Hi,
>
>> I just tried to reproduce the problem that you are experiencing and was
>> unable to.
>>
>> SLURM 2.1.15
>> Open MPI 1.4.3 configured with:
>> --with-platform=./contrib/platform/lanl/tlcc/debug-nopanasas
>
> I compiled OpenMPI 1.4.3 (vanilla from source tarball) with the same platform
> file (the only change was to re-enable btl-tcp).
>
> Unfortunately, the result is the same:

To reply to my own post again (sorry!), I tried OpenMPI 1.5.1. This works fine:

salloc -n16 ~/../openmpi/bin/mpirun --display-map mpi
salloc: Granted job allocation 151

 JOB MAP 

 Data for node: ipc3  Num procs: 8
        Process OMPI jobid: [3365,1] Process rank: 0
        Process OMPI jobid: [3365,1] Process rank: 1
        Process OMPI jobid: [3365,1] Process rank: 2
        Process OMPI jobid: [3365,1] Process rank: 3
        Process OMPI jobid: [3365,1] Process rank: 4
        Process OMPI jobid: [3365,1] Process rank: 5
        Process OMPI jobid: [3365,1] Process rank: 6
        Process OMPI jobid: [3365,1] Process rank: 7

 Data for node: ipc4  Num procs: 8
        Process OMPI jobid: [3365,1] Process rank: 8
        Process OMPI jobid: [3365,1] Process rank: 9
        Process OMPI jobid: [3365,1] Process rank: 10
        Process OMPI jobid: [3365,1] Process rank: 11
        Process OMPI jobid: [3365,1] Process rank: 12
        Process OMPI jobid: [3365,1] Process rank: 13
        Process OMPI jobid: [3365,1] Process rank: 14
        Process OMPI jobid: [3365,1] Process rank: 15

Process 2 on eng-ipc3.{FQDN} out of 16
Process 4 on eng-ipc3.{FQDN} out of 16
Process 5 on eng-ipc3.{FQDN} out of 16
Process 0 on eng-ipc3.{FQDN} out of 16
Process 1 on eng-ipc3.{FQDN} out of 16
Process 6 on eng-ipc3.{FQDN} out of 16
Process 3 on eng-ipc3.{FQDN} out of 16
Process 7 on eng-ipc3.{FQDN} out of 16
Process 8 on eng-ipc4.{FQDN} out of 16
Process 11 on eng-ipc4.{FQDN} out of 16
Process 12 on eng-ipc4.{FQDN} out of 16
Process 14 on eng-ipc4.{FQDN} out of 16
Process 15 on eng-ipc4.{FQDN} out of 16
Process 10 on eng-ipc4.{FQDN} out of 16
Process 9 on eng-ipc4.{FQDN} out of 16
Process 13 on eng-ipc4.{FQDN} out of 16
salloc: Relinquishing job allocation 151

It does seem very much like there is a bug of some sort in 1.4.3?

Michael
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 09/02/2011, at 2:38 AM, Ralph Castain wrote: > Another possibility to check - are you sure you are getting the same OMPI > version on the backend nodes? When I see it work on local node, but fail > multi-node, the most common problem is that you are picking up a different > OMPI version due to path differences on the backend nodes. It's installed as a system package, and the software set on all machines is managed by a configuration tool, so the machines should be identical. However, it may be worth checking the dependency versions and I'll double check that the OMPI versions really do match.
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 09/02/2011, at 2:17 AM, Samuel K. Gutierrez wrote:

> Hi Michael,
>
> You may have tried to send some debug information to the list, but it appears
> to have been blocked. Compressed text output of the backtrace text is
> sufficient.

Odd, I thought I sent it to you directly. In any case, here is the backtrace and some information from gdb:

$ salloc -n16 gdb -args mpirun mpi
(gdb) run
Starting program: /mnt/f1/michael/openmpi/bin/mpirun /mnt/f1/michael/home/ServerAdmin/mpi
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
0x77b76869 in process_orted_launch_report (fd=-1, opal_event=1, data=0x681170) at base/plm_base_launch_support.c:342
342         pdatorted[mev->sender.vpid]->state = ORTE_PROC_STATE_RUNNING;
(gdb) bt
#0  0x77b76869 in process_orted_launch_report (fd=-1, opal_event=1, data=0x681170) at base/plm_base_launch_support.c:342
#1  0x778a7338 in event_process_active (base=0x615240) at event.c:651
#2  0x778a797e in opal_event_base_loop (base=0x615240, flags=1) at event.c:823
#3  0x778a756f in opal_event_loop (flags=1) at event.c:730
#4  0x7789b916 in opal_progress () at runtime/opal_progress.c:189
#5  0x77b76e20 in orte_plm_base_daemon_callback (num_daemons=2) at base/plm_base_launch_support.c:459
#6  0x77b7bed7 in plm_slurm_launch_job (jdata=0x610560) at plm_slurm_module.c:360
#7  0x00403f46 in orterun (argc=2, argv=0x7fffe7d8) at orterun.c:754
#8  0x00402fb4 in main (argc=2, argv=0x7fffe7d8) at main.c:13
(gdb) print pdatorted
$1 = (orte_proc_t **) 0x67c610
(gdb) print mev
$2 = (orte_message_event_t *) 0x681550
(gdb) print mev->sender.vpid
$3 = 4294967295
(gdb) print mev->sender
$4 = {jobid = 1721696256, vpid = 4294967295}
(gdb) print *mev
$5 = {super = {obj_magic_id = 16046253926196952813, obj_class = 0x77dd4f40, obj_reference_count = 1, cls_init_file_name = 0x77bb9a78 "base/plm_base_launch_support.c", cls_init_lineno = 423}, ev = 0x680850, sender = {jobid = 1721696256, vpid = 4294967295}, buffer = 0x6811b0, tag = 10, file = 0x680640 "rml_oob_component.c", line = 279}

That vpid looks suspiciously like -1.

Further debugging:

Breakpoint 3, orted_report_launch (status=32767, sender=0x7fffe170, buffer=0x77b1a85f, tag=32767, cbdata=0x612d20) at base/plm_base_launch_support.c:411
411         {
(gdb) print sender
$2 = (orte_process_name_t *) 0x7fffe170
(gdb) print *sender
$3 = {jobid = 6822016, vpid = 0}
(gdb) continue
Continuing.
--
A daemon (pid unknown) died unexpectedly with status 1 while attempting to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
--

Program received signal SIGSEGV, Segmentation fault.
0x77b76869 in process_orted_launch_report (fd=-1, opal_event=1, data=0x681550) at base/plm_base_launch_support.c:342
342         pdatorted[mev->sender.vpid]->state = ORTE_PROC_STATE_RUNNING;
(gdb) print mev->sender
$4 = {jobid = 1778450432, vpid = 4294967295}

The daemon probably died as I spent too long thinking about my gdb input ;)
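The "suspiciously like -1" reading checks out: 4294967295 is UINT32_MAX, i.e. the bit pattern you get when -1 is stored in an unsigned 32-bit field (vpids are unsigned, and the INVALID sentinel values Ralph mentions below are likely defined as -1 cast to that type). A quick sanity check, using nothing beyond the standard library:

```python
import struct

vpid = 4294967295  # value gdb printed for mev->sender.vpid

# Reinterpret the unsigned 32-bit pattern as a signed 32-bit integer.
(signed,) = struct.unpack("<i", struct.pack("<I", vpid))
print(signed)             # -1
print(vpid == 2**32 - 1)  # True: this is UINT32_MAX
```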
Re: [OMPI users] Segmentation fault with SLURM and non-local nodes
On 09/02/2011, at 9:16 AM, Ralph Castain wrote: > See below > > > On Feb 8, 2011, at 2:44 PM, Michael Curtis wrote: > >> >> On 09/02/2011, at 2:17 AM, Samuel K. Gutierrez wrote: >> >>> Hi Michael, >>> >>> You may have tried to send some debug information to the list, but it >>> appears to have been blocked. Compressed text output of the backtrace text >>> is sufficient. >> >> >> Odd, I thought I sent it to you directly. In any case, here is the >> backtrace and some information from gdb: >> >> $ salloc -n16 gdb -args mpirun mpi >> (gdb) run >> Starting program: /mnt/f1/michael/openmpi/bin/mpirun >> /mnt/f1/michael/home/ServerAdmin/mpi >> [Thread debugging using libthread_db enabled] >> >> Program received signal SIGSEGV, Segmentation fault. >> 0x77b76869 in process_orted_launch_report (fd=-1, opal_event=1, >> data=0x681170) at base/plm_base_launch_support.c:342 >> 342 pdatorted[mev->sender.vpid]->state = ORTE_PROC_STATE_RUNNING; >> (gdb) bt >> #0 0x77b76869 in process_orted_launch_report (fd=-1, opal_event=1, >> data=0x681170) at base/plm_base_launch_support.c:342 >> #1 0x778a7338 in event_process_active (base=0x615240) at event.c:651 >> #2 0x778a797e in opal_event_base_loop (base=0x615240, flags=1) at >> event.c:823 >> #3 0x778a756f in opal_event_loop (flags=1) at event.c:730 >> #4 0x7789b916 in opal_progress () at runtime/opal_progress.c:189 >> #5 0x77b76e20 in orte_plm_base_daemon_callback (num_daemons=2) at >> base/plm_base_launch_support.c:459 >> #6 0x77b7bed7 in plm_slurm_launch_job (jdata=0x610560) at >> plm_slurm_module.c:360 >> #7 0x00403f46 in orterun (argc=2, argv=0x7fffe7d8) at >> orterun.c:754 >> #8 0x00402fb4 in main (argc=2, argv=0x7fffe7d8) at main.c:13 >> (gdb) print pdatorted >> $1 = (orte_proc_t **) 0x67c610 >> (gdb) print mev >> $2 = (orte_message_event_t *) 0x681550 >> (gdb) print mev->sender.vpid >> $3 = 4294967295 >> (gdb) print mev->sender >> $4 = {jobid = 1721696256, vpid = 4294967295} >> (gdb) print *mev >> $5 = {super = {obj_magic_id = 
16046253926196952813, obj_class = >> 0x77dd4f40, obj_reference_count = 1, cls_init_file_name = 0x77bb9a78 >> "base/plm_base_launch_support.c", >> cls_init_lineno = 423}, ev = 0x680850, sender = {jobid = 1721696256, vpid = >> 4294967295}, buffer = 0x6811b0, tag = 10, file = 0x680640 >> "rml_oob_component.c", line = 279}
>
> The jobid and vpid look like the defined INVALID values, indicating that
> something is quite wrong. This would quite likely lead to the segfault.
>
>> From this, it would indeed appear that you are getting some kind of library
> confusion - the most likely cause of such an error is a daemon from a
> different version trying to respond, and so the returned message isn't
> correct.
>
> Not sure why else it would be happening...you could try setting -mca
> plm_base_verbose 5 to get more debug output displayed on your screen,
> assuming you built OMPI with --enable-debug.

Found the problem! It is a site configuration issue, which I'll need to find a workaround for.

[bio-ipc.{FQDN}:27523] mca:base:select:( plm) Query of component [slurm] set priority to 75
[bio-ipc.{FQDN}:27523] mca:base:select:( plm) Selected component [slurm]
[bio-ipc.{FQDN}:27523] mca: base: close: component rsh closed
[bio-ipc.{FQDN}:27523] mca: base: close: unloading component rsh
[bio-ipc.{FQDN}:27523] plm:base:set_hnp_name: initial bias 27523 nodename hash 1936089714
[bio-ipc.{FQDN}:27523] plm:base:set_hnp_name: final jobfam 31383
[bio-ipc.{FQDN}:27523] [[31383,0],0] plm:base:receive start comm
[bio-ipc.{FQDN}:27523] [[31383,0],0] plm:slurm: launching job [31383,1]
[bio-ipc.{FQDN}:27523] [[31383,0],0] plm:base:setup_job for job [31383,1]
[bio-ipc.{FQDN}:27523] [[31383,0],0] plm:slurm: launching on nodes ipc3
[bio-ipc.{FQDN}:27523] [[31383,0],0] plm:slurm: final top-level argv:
        srun --nodes=1 --ntasks=1 --kill-on-bad-exit --nodelist=ipc3 orted -mca ess slurm -mca orte_ess_jobid 2056716288 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "2056716288.0;tcp://lanip:37493;tcp://globalip:37493;tcp://lanip2:37493" -mca plm_base_verbose 20

I then inserted some printf's into the ess_slurm_module (rough and ready, I know, but I was in a hurry). Just after initialisation (at around line 345):

orte_ess_slurm: jobid 2056716288 vpid 1

So it gets that... I narrowed it down to the get_slurm_nodename function, as the method didn't p
[OMPI users] RoCE (IBoE) & OpenMPI
I've been looking into OpenMPI's support for RoCE (Mellanox's recent Infiniband-over-Ethernet) lately. While it's promising, I've hit a snag: RoCE requires lossless ethernet, and on my switches the only way to guarantee this is with CoS. RoCE adapters cannot emit CoS priority tags unless the client program selects an IB service level and uses a non-default GID. There's a command-line option in OpenMPI to pick an IB SL, but I can't find one for picking a different GID. Does this exist for the openib btl? Or am I going about this the wrong way? -- Mike Shuey
Re: [OMPI users] RoCE (IBoE) & OpenMPI
It's a little different in RoCE. There's no subnet manager, so (as near as I can tell) you don't really have a subnet ID. Instead, the GID = GUID + VLAN tag (more or less). gid[0] has special bits in the VLAN tag section, to indicate that packets relating to this GID don't get a VLAN tag. Unfortunately, without a VLAN tag, those packets lack priority bits - meaning they can't be matched to a lossless class on our Cisco switches. RoCE HCAs keep a GID table, like normal HCAs. Every time you bring up a vlan interface, another entry gets automatically added to the table. If I select one of these other GIDs, packets get a VLAN tag, and that contains the necessary priority bits (well, assuming I selected the right IB service level, which is mapped to the priority tag in the VLAN header) for the traffic to match a lossless class of service on the switch. For this to work, I really need for the IB client to select a non-default GID. A few test programs included in OFED will do this, but I'm not sure OpenMPI will. Any thoughts? -- Mike Shuey On Fri, Feb 18, 2011 at 9:30 AM, Jeff Squyres wrote: > Greetings Mike. I'll answer today because Fri-Sat is the weekend in Israel > (i.e., the MPI team at Mellanox won't see this until Sunday). > > I don't have a lot of experience with RoCE; do you need a different GUID or a > different subnet ID? At least in IB, the GID = GUID + Subnet ID. The GUID > should be your unique port ID and the subnet ID is, well, the subnet ID. :-) > > Changing either of these in IB is an administrative function, not a > user-level function. Meaning: I'm *guessing* that the same is true for RoCE > -- changing the subnet ID (which is what I'm further guessing you need to do) > should be somewhere in the root-level setup for RoCE. Once you set a > different subnet ID, Open MPI should just use it. > > > On Feb 18, 2011, at 8:17 AM, Michael Shuey wrote: > >> I've been looking into OpenMPI's support for RoCE (Mellanox's recent >> Infiniband-over-Ethernet) lately. 
While it's promising, I've hit a >> snag: RoCE requires lossless ethernet, and on my switches the only way >> to guarantee this is with CoS. RoCE adapters cannot emit CoS priority >> tags unless the client program selects an IB service level and uses a >> non-default GID. >> >> There's a command-line option in OpenMPI to pick an IB SL, but I can't >> find one for picking a different GID. Does this exist for the openib >> btl? Or am I going about this the wrong way? >> >> -- >> Mike Shuey >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] RoCE (IBoE) & OpenMPI
Per-node GID & SL settings == bad. Site-wide GID & SL settings == good. If this could be an MCA param (like btl_openib_ib_service_level) that'd be great - we already have a global config file of similar params. We'd definitely want the same N everywhere. -- Mike Shuey On Fri, Feb 18, 2011 at 3:44 PM, Jeff Squyres wrote: > On Feb 18, 2011, at 1:39 PM, Michael Shuey wrote: > >> RoCE HCAs keep a GID table, like normal HCAs. Every time you bring up >> a vlan interface, another entry gets automatically added to the table. >> If I select one of these other GIDs, packets get a VLAN tag, and that >> contains the necessary priority bits (well, assuming I selected the >> right IB service level, which is mapped to the priority tag in the >> VLAN header) for the traffic to match a lossless class of service on >> the switch. > > Ah -- I see it now (it's been a looong time since I've looked in Open MPI's > verbs code!). We query and simply take the 0th GID from a given IBV device > port's GID table. > >> For this to work, I really need for the IB client to select a >> non-default GID. A few test programs included in OFED will do this, >> but I'm not sure OpenMPI will. Any thoughts? > > Yes, we can do this. It's pretty easy to add an MCA parameter to select the > Nth GID rather than always taking the 0th. > > To make this simple, can you make it so that the value of N is the same > across all nodes in your cluster? Then you can set a site-wide MCA param for > that value of N and be done with this issue. If we have to have a per-node > setting of N, it could get a little hairy (it's do-able, but... it's a > heckuva lot easier if N is the same everywhere). > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > >
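For reference, a site-wide MCA parameter like the one discussed here would normally live in `$prefix/etc/openmpi-mca-params.conf`, which Open MPI reads on every node. A sketch of what such a file might look like, assuming the existing `btl_openib_ib_service_level` parameter; the GID-index parameter name below is hypothetical, since the patch under discussion had not yet settled on one:

```
# Site-wide Open MPI MCA parameters ($prefix/etc/openmpi-mca-params.conf)

# IB service level; for RoCE the low bits end up in the VLAN priority field
btl_openib_ib_service_level = 3

# Hypothetical name for the "use the Nth GID" parameter being discussed;
# the same N must be valid on every node in the cluster
btl_openib_gid_index = 1
```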
Re: [OMPI users] RoCE (IBoE) & OpenMPI
Could you re-enable the SL param (btl_openib_ib_service_level) for RoCE? Jeff was kind enough to provide a patch to let me specify the gid_index, but that doesn't seem to be working. To get RoCE to work correctly (at least, on Nexus switches) I'll need to specify both a gid_index and an IB service level. I think. :-) Also, while the rdmacm connection manager is required for RoCE, it's not selected by default (like it is for iWARP). You still need to add that to a config file or command line, or you get a rather cryptic option (at least up through OpenMPI 1.5.1). -- Mike Shuey On Mon, Feb 21, 2011 at 12:34 PM, Jeff Squyres wrote: > Random thought: is there a check to ensure that the SL MCA param is not set > in a RoCE environment? If not, we should probably add a show_help warning if > the SL MCA param is set when using RoCE (i.e., that its value will be > ignored). > > > On Feb 19, 2011, at 12:22 AM, Shamis, Pavel wrote: > >> As far as I remember we don't allow to user to specify SL for RoCE. RoCE >> considered kinda ethernet device and RDMACM connection manager is used to >> setup the connections. it means that in order to select network X or Y, you >> may use ip/netmask (btl_openib_ipaddr_include) . >> >> Pavel (Pasha) Shamis >> --- >> Application Performance Tools Group >> Computer Science and Math Division >> Oak Ridge National Laboratory >> >> >> >> >> >> >> On Feb 18, 2011, at 4:14 PM, Michael Shuey wrote: >> >>> Per-node GID & SL settings == bad. Site-wide GID & SL settings == good. >>> >>> If this could be an MCA param (like btl_openib_ib_service_level) >>> that'd be great - we already have a global config file of similar >>> params. We'd definitely want the same N everywhere. >>> >>> -- >>> Mike Shuey >>> >>> >>> >>> On Fri, Feb 18, 2011 at 3:44 PM, Jeff Squyres wrote: >>>> On Feb 18, 2011, at 1:39 PM, Michael Shuey wrote: >>>> >>>>> RoCE HCAs keep a GID table, like normal HCAs. 
Every time you bring up >>>>> a vlan interface, another entry gets automatically added to the table. >>>>> If I select one of these other GIDs, packets get a VLAN tag, and that >>>>> contains the necessary priority bits (well, assuming I selected the >>>>> right IB service level, which is mapped to the priority tag in the >>>>> VLAN header) for the traffic to match a lossless class of service on >>>>> the switch. >>>> >>>> Ah -- I see it now (it's been a looong time since I've looked in Open >>>> MPI's verbs code!). We query and simply take the 0th GID from a given IBV >>>> device port's GID table. >>>> >>>>> For this to work, I really need for the IB client to select a >>>>> non-default GID. A few test programs included in OFED will do this, >>>>> but I'm not sure OpenMPI will. Any thoughts? >>>> >>>> Yes, we can do this. It's pretty easy to add an MCA parameter to select >>>> the Nth GID rather than always taking the 0th. >>>> >>>> To make this simple, can you make it so that the value of N is the same >>>> across all nodes in your cluster? Then you can set a site-wide MCA param >>>> for that value of N and be done with this issue. If we have to have a >>>> per-node setting of N, it could get a little hairy (it's do-able, but... >>>> it's a heckuva lot easier if N is the same everywhere). >>>> >>>> -- >>>> Jeff Squyres >>>> jsquy...@cisco.com >>>> For corporate legal information go to: >>>> http://www.cisco.com/web/about/doing_business/legal/cri/ >>>> >>>> >>> >>> ___ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] RoCE (IBoE) & OpenMPI
Late yesterday I did have a chance to test the patch Jeff provided (against 1.4.3 - testing 1.5.x is on the docket for today). While it works, in that I can specify a gid_index, it doesn't do everything required - my traffic won't match a lossless CoS on the ethernet switch. Specifying a GID is only half of it; I really need to also specify a service level. The bottom 3 bits of the IB SL are mapped to ethernet's PCP bits in the VLAN tag. With a non-default gid, I can select an available VLAN (so RoCE's packets will include the PCP bits), but the only way to specify a priority is to use an SL. So far, the only RoCE-enabled app I've been able to make work correctly (such that traffic matches a lossless CoS on the switch) is ibv_rc_pingpong - and then, I need to use both a specific GID and a specific SL. The slides Pavel found seem a little misleading to me. The VLAN isn't determined by bound netdev; all VLAN netdevs map to the same IB adapter for RoCE. VLAN is determined by gid index. Also, the SL isn't determined by a set kernel policy; it's provided via the IB interfaces. As near as I can tell from Mellanox's documentation, OFED test apps, and the driver source, a RoCE adapter is an Infiniband card in almost all respects (even more so than an iWARP adapter). -- Mike Shuey On Wed, Feb 23, 2011 at 5:03 PM, Jeff Squyres wrote: > On Feb 23, 2011, at 3:54 PM, Shamis, Pavel wrote: > >> I remember that I updated the trunk to select by default RDMACM connection >> manager for RoCE ports - https://svn.open-mpi.org/trac/ompi/changeset/22311 >> >> I'm not sure it the change made his way to any production version. I don't >> work on this part code anymore :-) > > Mellanox -- can you follow up on this? 
> > Also, in addition to the patches I provided for selecting an arbitrary GID (I > was planning on committing them when Mike tested them at Purdue, but perhaps > I should just commit to the trunk anyway), perhaps we should check if a > non-default SL is supplied via MCA param in the RoCE case and output an > orte_show_help to warn that it will have no effect (i.e., principle of least > surprise and all that). > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
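The SL-to-PCP mapping Mike describes above ("the bottom 3 bits of the IB SL are mapped to ethernet's PCP bits in the VLAN tag") is just a 3-bit mask, which a few lines make concrete; the helper name is illustrative, not from any real API:

```python
# RoCE carries the low 3 bits of the IB service level (SL) in the
# ethernet PCP (priority) field of the VLAN tag, per the discussion above.
def sl_to_pcp(sl: int) -> int:
    return sl & 0b111  # only the bottom 3 bits survive

for sl in (0, 3, 5, 9):
    print(f"SL {sl} -> PCP {sl_to_pcp(sl)}")
```

SLs 0-7 map through unchanged, so picking an SL in that range effectively picks the switch's class of service directly.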
Re: [OMPI users] RoCE (IBoE) & OpenMPI
So, since RoCE has no SM, and setting an SL is required to get lossless ethernet on Cisco switches (and possibly others), does this mean that RoCE will never work correctly with OpenMPI on Cisco hardware? -- Mike Shuey On Tue, Mar 1, 2011 at 3:42 AM, Doron Shoham wrote: > Hi, > > Regarding to using a specific SL with RDMA CM, I've checked in the code and > it seems that RDMA_CM uses the SL from the SA. > So if you want to configure a specific SL, you need to do it via the SM. > > Doron > > -Original Message- > From: Jeff Squyres [mailto:jsquy...@cisco.com] > Sent: Thursday, February 24, 2011 3:45 PM > To: Michael Shuey > Cc: Open MPI Users , Mike Dubman > Subject: Re: [OMPI users] RoCE (IBoE) & OpenMPI > > On Feb 24, 2011, at 8:00 AM, Michael Shuey wrote: > >> Late yesterday I did have a chance to test the patch Jeff provided >> (against 1.4.3 - testing 1.5.x is on the docket for today). While it >> works, in that I can specify a gid_index, > > Great! I'll commit that to the trunk and start the process of moving it to > the v1.5.x series (I know you haven't tested it yet, but it's essentially the > same patch, just slightly adjusted for each of the 3 branches). > >> it doesn't do everything >> required - my traffic won't match a lossless CoS on the ethernet >> switch. Specifying a GID is only half of it; I really need to also >> specify a service level. > > RoCE requires the use of the RDMA CM (I think?), and I didn't think there was > a way to request a specific SL via the RDMA CM...? (I could certainly be > wrong here) > > I think Mellanox will need to follow up with these questions... > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] RoCE (IBoE) & OpenMPI
Honestly, I don't know - I haven't looked into the source. OFED 1.5.2 has a version of ibv_rc_pingpong that's been modified to work with RoCE; you can pass the gid_index and SL as command-line arguments. I'm not sure how that's handled at the IB layer, but the source may be a good place to start. -- Mike Shuey On Tue, Mar 1, 2011 at 9:14 AM, Jeff Squyres wrote: > I thought you mentioned in a prior email that you had gotten one or two other > OFED sample applications to work properly. How are they setting the SL? Are > they not using the RDMA CM? > > > On Mar 1, 2011, at 7:35 AM, Michael Shuey wrote: > >> So, since RoCE has no SM, and setting an SL is required to get >> lossless ethernet on Cisco switches (and possibly others), does this >> mean that RoCE will never work correctly with OpenMPI on Cisco >> hardware? >> >> -- >> Mike Shuey >> >> >> >> On Tue, Mar 1, 2011 at 3:42 AM, Doron Shoham wrote: >>> Hi, >>> >>> Regarding to using a specific SL with RDMA CM, I've checked in the code and >>> it seems that RDMA_CM uses the SL from the SA. >>> So if you want to configure a specific SL, you need to do it via the SM. >>> >>> Doron >>> >>> -Original Message- >>> From: Jeff Squyres [mailto:jsquy...@cisco.com] >>> Sent: Thursday, February 24, 2011 3:45 PM >>> To: Michael Shuey >>> Cc: Open MPI Users , Mike Dubman >>> Subject: Re: [OMPI users] RoCE (IBoE) & OpenMPI >>> >>> On Feb 24, 2011, at 8:00 AM, Michael Shuey wrote: >>> >>>> Late yesterday I did have a chance to test the patch Jeff provided >>>> (against 1.4.3 - testing 1.5.x is on the docket for today). While it >>>> works, in that I can specify a gid_index, >>> >>> Great! I'll commit that to the trunk and start the process of moving it to >>> the v1.5.x series (I know you haven't tested it yet, but it's essentially >>> the same patch, just slightly adjusted for each of the 3 branches). >>> >>>> it doesn't do everything >>>> required - my traffic won't match a lossless CoS on the ethernet >>>> switch. 
Specifying a GID is only half of it; I really need to also >>>> specify a service level. >>> >>> RoCE requires the use of the RDMA CM (I think?), and I didn't think there >>> was a way to request a specific SL via the RDMA CM...? (I could certainly >>> be wrong here) >>> >>> I think Mellanox will need to follow up with these questions... >>> >>> -- >>> Jeff Squyres >>> jsquy...@cisco.com >>> For corporate legal information go to: >>> http://www.cisco.com/web/about/doing_business/legal/cri/ >>> >>> >>> ___ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] RDMACM Differences
Alternatively, if OpenMPI is really trying to use both ports, you could force it to use just one port with --mca btl_openib_if_include mlx4_0:1 (probably) -- Mike Shuey On Tue, Mar 1, 2011 at 1:02 PM, Jeff Squyres wrote: > On Feb 28, 2011, at 12:49 PM, Jagga Soorma wrote: > >> -bash-3.2$ mpiexec --mca btl openib,self -mca btl_openib_warn_default_gid_ >> prefix 0 -np 2 --hostfile mpihosts >> /home/jagga/osu-micro-benchmarks-3.3/openmpi/ofed-1.5.2/bin/osu_latency > > Your use of btl_openib_warn_default_gid_prefix may have brought up a subtle > issue in Open MPI's verbs support. More below. > >> # OSU MPI Latency Test v3.3 >> # Size Latency (us) >> [amber04][[10252,1],1][connect/btl_openib_connect_oob.c:325:qp_connect_all] >> error modifing QP to RTR errno says Invalid argument >> [amber04][[10252,1],1][connect/btl_openib_connect_oob.c:815:rml_recv_cb] >> error in endpoint reply start connect > > Looking at this error message and your ibv_devinfo output: > >> [root@amber03 ~]# ibv_devinfo >> hca_id: mlx4_0 >> transport: InfiniBand (0) >> fw_ver: 2.7.9294 >> node_guid: 78e7:d103:0021:8884 >> sys_image_guid: 78e7:d103:0021:8887 >> vendor_id: 0x02c9 >> vendor_part_id: 26438 >> hw_ver: 0xB0 >> board_id: HP_020003 >> phys_port_cnt: 2 >> port: 1 >> state: PORT_ACTIVE (4) >> max_mtu: 2048 (4) >> active_mtu: 2048 (4) >> sm_lid: 1 >> port_lid: 20 >> port_lmc: 0x00 >> link_layer: IB >> >> port: 2 >> state: PORT_ACTIVE (4) >> max_mtu: 2048 (4) >> active_mtu: 1024 (3) >> sm_lid: 0 >> port_lid: 0 >> port_lmc: 0x00 >> link_layer: Ethernet > > It looks like you have 1 HCA port as IB and the other at Ethernet. > > I'm wondering if OMPI is not taking the device transport into account and is > *only* using the subnet ID to determine reachability (i.e., I'm wondering if > we didn't anticipate multiple devices/ports with the same subnet ID but with > different transports). I pointed this out to Mellanox yesterday; I think > they're following up on it. 
> > In the meantime, a workaround might be to set a non-default subnet ID on your > IB network. That should allow Open MPI to tell these networks apart without > additional help. > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
[OMPI users] Conflicting versions of libgfortran.so with mpif90?
I do IT support for people who are using OpenMPI for research. However, they are reporting the following warnings when compiling code with mpif90:

/usr/bin/ld: warning: libgfortran.so.1, needed by /usr/lib64/openmpi/1.4-gcc/lib/libmpi_f90.so, may conflict with libgfortran.so.3

Running ldd on the resulting executable gives:

        libmpi_f90.so.0 => /usr/lib64/openmpi/1.4-gcc/lib/libmpi_f90.so.0 (0x2b5aac251000)
        libmpi_f77.so.0 => /usr/lib64/openmpi/1.4-gcc/lib/libmpi_f77.so.0 (0x2b5aac454000)
        libmpi.so.0 => /usr/lib64/openmpi/1.4-gcc/lib/libmpi.so.0 (0x003df360)
        libopen-rte.so.0 => /usr/lib64/openmpi/1.4-gcc/lib/libopen-rte.so.0 (0x003df1a0)
        libopen-pal.so.0 => /usr/lib64/openmpi/1.4-gcc/lib/libopen-pal.so.0 (0x003df1e0)
        libdl.so.2 => /lib64/libdl.so.2 (0x003df2e0)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x003df220)
        libutil.so.1 => /lib64/libutil.so.1 (0x003dff40)
        libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x2b5aac6a3000)
        libm.so.6 => /lib64/libm.so.6 (0x003df2a0)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x003e02c0)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x003df320)
        libc.so.6 => /lib64/libc.so.6 (0x003df260)
        libgfortran.so.1 => /usr/lib64/libgfortran.so.1 (0x2b5aac999000)
        /lib64/ld-linux-x86-64.so.2 (0x003df160)

It looks like there are attempts to link to two versions of libgfortran, which aren't compatible. I'm not familiar with OpenMPI myself, but the people using it would like to know how these warnings can be dealt with.

--
Michael Cugley
School of Engineering IT Support
m.cug...@eng.gla.ac.uk
Please direct IT support queries to itsupp...@eng.gla.ac.uk
Re: [OMPI users] Conflicting versions of libgfortran.so with mpif90? Solved!
On June 14, 2011 at 12:35 PM, Jeff Squyres wrote: Are they using a different version of gfortran to compile / link their application than what was used to compile / build Open MPI? FWIW: it's typically easier to use the same compilers to build Open MPI as the application. That did, in fact, turn out to be the problem. After some head scratching and the mighty Google, I found this page: https://www.scotgrid.ac.uk/wiki/index.php/Building_OPENMPI which (amongst other things) gave me the requisite flags to feed configure. Oddly, ldd on the resulting executable still shows references to libgfortran.so.3 and .so.1, but the warnings are gone and the user is happy, so I'm counting it as a victory. -- Michael Cugley School of Engineering IT Support m.cug...@eng.gla.ac.uk Please direct IT support queries to itsupp...@eng.gla.ac.uk
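For anyone hitting the same warnings: the fix Jeff describes amounts to rebuilding Open MPI with the same Fortran compiler the applications use. A sketch of what that looks like (paths and prefix are illustrative; this is the general recipe, not the exact scotgrid one):

```
# Rebuild Open MPI with the same gfortran the applications will use
./configure --prefix=/opt/openmpi/1.4-gcc \
    CC=gcc CXX=g++ F77=gfortran FC=gfortran
make all install

# Afterwards, the wrapper compiler should report the matching back-end:
mpif90 --showme:command
```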
[OMPI users] btl_openib_ipaddr_include broken in 1.4.4rc2?
I'm using RoCE (or rather, attempting to) and need to select a non-default GID to get my traffic properly classified. Both 1.4.4rc2 and 1.5.4 support the btl_openib_ipaddr_include option, but only 1.5.4 causes my traffic to use the proper GID and VLAN. Is there something broken with ipaddr_include in 1.4.4rc2? -- Mike Shuey
Re: [OMPI users] Directed to Undirected Graph
You need to use 2 calls. One option is an Allgather followed by an Allgatherv:

- Allgather() of one integer per rank: the number of nodes that rank is linked to
- Allgatherv() of a variable-size array of integers, where each entry is a linked-to node

On 06/05/2012 08:39 AM, Mudassar Majeed wrote:

Dear people, Let's say there are N MPI processes. Each MPI process has to communicate with some T processes, where T < N. This information is a directed graph (and every process knows only its own edges). I need to convert it to an undirected graph, so that each process will inform the T other processes about it. Every process will then update this information (which may be stored in an array of maximum size N). What is the best way to exchange this information among all MPI processes? MPI_AllGather and MPI_AllGatherv do not solve my problem.

best regards, -- Mudassar
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

-- Michael A. Raymond SGI MPT Team Leader (651) 683-3434
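The two-call pattern above can be sketched without any MPI installation at all: the following pure-Python simulation (ranks and link lists are made-up example data) shows exactly what each rank would contribute to MPI_Allgather and MPI_Allgatherv, and how every rank then symmetrizes the directed graph locally. With mpi4py the same logic maps onto comm.allgather / comm.Allgatherv.

```python
# Simulation of the Allgather + Allgatherv exchange (no MPI required).

def build_undirected(out_links):
    """out_links[r] = list of ranks r sends to (directed edges).

    Returns undirected[r] = sorted list of all neighbors of r."""
    nprocs = len(out_links)

    # Call 1 (MPI_Allgather): every rank contributes one integer,
    # the size of its own out-link list.
    counts = [len(out_links[r]) for r in range(nprocs)]

    # Call 2 (MPI_Allgatherv): every rank contributes its variable-size
    # out-link list; displacements are the prefix sums of the counts.
    displs = [0]
    for c in counts[:-1]:
        displs.append(displs[-1] + c)
    flat = [v for links in out_links for v in links]

    # Each rank now holds the full directed graph and symmetrizes it:
    # r's neighbors are its out-links plus every rank that links to r.
    undirected = [set(links) for links in out_links]
    for src in range(nprocs):
        for i in range(counts[src]):
            dst = flat[displs[src] + i]
            undirected[dst].add(src)
    return [sorted(s) for s in undirected]

if __name__ == "__main__":
    # rank 0 -> 1, rank 1 -> 2, rank 2 -> 0 (a directed 3-cycle)
    print(build_undirected([[1], [2], [0]]))  # [[1, 2], [0, 2], [0, 1]]
```

In real MPI code the `counts` array returned by the first call is exactly what MPI_Allgatherv needs for its recvcounts/displs arguments.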
[OMPI users] mpivars.sh - Intel Fortran 13.1 conflict with OpenMPI 1.6.3
This is for reference and suggestions, as this took me several hours to track down and the previous discussion on "mpivars.sh" failed to cover this point (nothing in the FAQ). I successfully built and installed OpenMPI 1.6.3 using the following on Debian Linux:

./configure --prefix=/opt/openmpi/intel131 --disable-ipv6 --with-mpi-f90-size=medium --with-f90-max-array-dim=4 --disable-vt F77=/opt/intel/composer_xe_2013.1.117/bin/intel64/ifort FC=/opt/intel/composer_xe_2013.1.117/bin/intel64/ifort CXXFLAGS=-m64 CFLAGS=-m64 CC=gcc CXX=g++

(--disable-vt was required because of an error finding -lz, which I gave up on.)

My .tcshrc file HAD the following:

set path = (/opt/openmpi/intel131/bin $path)
setenv LD_LIBRARY_PATH /opt/openmpi/intel131/lib:$LD_LIBRARY_PATH
setenv MANPATH /opt/openmpi/intel131/share/man:$MANPATH
alias mpirun "mpirun --prefix /opt/openmpi/intel131 "
source /opt/intel/composer_xe_2013.1.117/bin/compilervars.csh intel64

For years I have used these procedures on Debian Linux and OS X with earlier versions of OpenMPI and Intel Fortran.
However, at some point Intel Fortran started including "mpirt", including:

/opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpirun

So even though I have the alias set for mpirun, I got the following error:

> mpirun -V
.: 131: Can't open /opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpivars.sh

Part of the confusion is that the OpenMPI source does include a reference to "mpivars" in "contrib/dist/linux/openmpi.spec".

The solution only occurred to me as I was writing this up: source the Intel setup first:

source /opt/intel/composer_xe_2013.1.117/bin/compilervars.csh intel64
set path = (/opt/openmpi/intel131/bin $path)
setenv LD_LIBRARY_PATH /opt/openmpi/intel131/lib:$LD_LIBRARY_PATH
setenv MANPATH /opt/openmpi/intel131/share/man:$MANPATH
alias mpirun "mpirun --prefix /opt/openmpi/intel131 "

Now I finally get:

> mpirun -V
mpirun (Open MPI) 1.6.3

The MPI runtime should be in the redistributable for Intel's MPI compiler, not in the base compiler. The question is how much of /opt/intel/composer_xe_2013.1.117/mpirt I can eliminate safely, and should I? (This is a multi-user machine where each user has their own Intel license, so I don't wish to troubleshoot this in the future.)
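The underlying mechanism is just PATH resolution: the shell runs the first mpirun it finds, so whichever setup script prepends its bin/ directory last wins. The following small demonstration (with hypothetical stand-in directories, not the real Intel/Open MPI paths) shows the effect using Python's shutil.which:

```python
# Demonstrate that PATH order decides which of two same-named
# executables is found -- the root cause of the mpirt/mpirun clash.
# Directory names are hypothetical stand-ins for the real install paths.
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    intel_bin = os.path.join(tmp, "intel_mpirt_bin")  # stand-in for .../mpirt/bin/intel64
    ompi_bin = os.path.join(tmp, "openmpi_bin")       # stand-in for /opt/openmpi/intel131/bin
    for d in (intel_bin, ompi_bin):
        os.mkdir(d)
        stub = os.path.join(d, "mpirun")
        with open(stub, "w") as f:
            f.write("#!/bin/sh\necho %s\n" % os.path.basename(d))
        os.chmod(stub, 0o755)  # make the stub executable

    # Wrong order: Intel's directory ends up ahead of Open MPI's.
    path = os.pathsep.join([intel_bin, ompi_bin])
    first = shutil.which("mpirun", path=path)
    assert first == os.path.join(intel_bin, "mpirun")

    # Fixed order: source compilervars FIRST, then prepend Open MPI's
    # bin, so Open MPI's mpirun shadows Intel's mpirt copy.
    path = os.pathsep.join([ompi_bin, intel_bin])
    first = shutil.which("mpirun", path=path)
    assert first == os.path.join(ompi_bin, "mpirun")
    print("PATH order decides which mpirun runs")
```

In a real shell, `type -a mpirun` (or `which -a mpirun`) lists every match on PATH in resolution order and makes the same diagnosis immediately.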
Re: [OMPI users] mpirun error
The Intel Fortran 2013 compiler comes with support for Intel's MPI runtime and you are getting that instead of OpenMPI. You need to fix your path for all the shells you use. On Apr 1, 2013, at 5:12 AM, Pradeep Jha wrote: > /opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpirun: line 96: > /opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpivars.sh: No such file > or directory
[OMPI users] How to select specific out of multiple interfaces for communication and support for heterogeneous fabrics
Hello OpenMPI,

We are seriously considering deploying OpenMPI 1.6.5 for production (and 1.7.2 for testing) on HPC clusters which consist of nodes with *different types of networking interfaces*.

1) Interface selection

We are using OpenMPI 1.6.5 and were wondering how one would go about selecting *at run time* which networking interface to use for MPI communications when IB, 10GigE and 1GigE are all present. This issue arises in a cluster with nodes that are equipped with different types of interfaces: *some* have both IB (QDR or FDR) and 10- and 1-GigE, others have *only* 10-GigE and 1-GigE, and others only 1-GigE.

2) OpenMPI 1.6.5 level of support for heterogeneous fabrics

Can OpenMPI support running an MPI application using a mix of nodes with all of the above networking interface combinations?

2.a) Can the same MPI code (SPMD or MPMD) have a subset of its ranks run on nodes with QDR IB and another subset on FDR IB simultaneously? These are Mellanox QDR and FDR HCAs. Mellanox mentioned to us that they support both QDR and FDR HCAs attached to the same IB subnet. Do you think MVAPICH2 will have any issue with this?

2.b) Can the same MPI code (SPMD or MPMD) have a subset of its ranks run on nodes with IB and another subset over 10GigE simultaneously? That is, imagine nodes I1, I2, ..., IN having, say, QDR HCAs and nodes G1, G2, ..., GM having only 10GigE interfaces. Could the same MPI application run across both types of nodes? Or should there be, say, 2 communicators, with one of them explicitly overlaid on an IB-only subnet and the other on a 10GigE-only subnet?

Please let me know if the above are not very clear.

Thank you much
Re: [OMPI users] How to select specific out of multiple interfaces for communication and support for heterogeneous fabrics
Sorry about the mvapich2 reference :)

All nodes are attached over a common 1GigE network. We wish, of course, that if a node pair is connected via a higher-speed fabric *as well* (IB FDR or 10GigE) then this would be leveraged instead of the common 1GigE.

One question: suppose that we use nodes having either FDR or QDR IB interfaces, connected to one common IB fabric, all defined over a common IP subnet: will OpenMPI have any problem with this? Can MPI communication take place over this type of hybrid IB fabric? We already have a sub-cluster with QDR HCAs and we are attaching it to an IB fabric with an FDR "backbone" and another cluster with FDR HCAs. Do you think there may be some issue with this? The HCAs are FDR and QDR Mellanox devices and the switching is also over FDR Mellanox fabric. Mellanox claims that at the IB level this is doable (i.e., FDR link pairs talk to each other at FDR speeds and QDR link pairs at QDR). I guess if we use the RC connection types then it does not matter to OpenMPI.

thanks .... Michael

On Fri, Jul 5, 2013 at 4:59 PM, Ralph Castain wrote:
> I can't speak for MVAPICH - you probably need to ask them about this
> scenario. OMPI will automatically select whatever available transport
> can reach the intended process. This requires that each communicating pair
> of processes have access to at least one common transport.
>
> So if a process that is on a node with only 1G-E wants to communicate with
> another process, then the node where that other process is running must
> also have access to a compatible Ethernet interface (1G can talk to 10G, so
> they can have different capabilities) on that subnet (or on a subnet that
> knows how to route to the other one). If both nodes have 10G-E as well as
> 1G-E interfaces, then OMPI will automatically take the 10G interface as it
> is the faster of the two.
> > Note this means that if a process is on a node that only has IB, and wants > to communicate to a process on a node that only has 1G-E, then the two > processes cannot communicate. > > HTH > Ralph > > On Jul 5, 2013, at 2:34 PM, Michael Thomadakis > wrote: > > Hello OpenMPI > > We area seriously considering deploying OpenMPI 1.6.5 for production (and > 1.7.2 for testing) on HPC clusters which consists of nodes with *different > types of networking interfaces*. > > > 1) Interface selection > > We are using OpenMPI 1.6.5 and was wondering how one would go about > selecting* at run time* which networking interface to use for MPI > communications in case that both IB, 10GigE and 1 GigE are present. > > This issues arises in a cluster with nodes that are equipped with > different types of interfaces: > > *Some *have both IB-QDR or FDR and 10- and 1-GigE. Others *only* have > 10-GigE and 1-GigE and simply others only 1-GigE. > > > 2) OpenMPI 1.6.5 level of support for Heterogeneous Fabric > > Can OpenMPI support running an MPI application using a mix of nodes with > all of the above networking interface combinations ? > > 2.a) Can the same MPI code (SPMD or MPMD) have a subset of its ranks run > on nodes with QDR IB and another subset on FDR IB simultaneously? These are > Mellanox QDR and FDR HCAs. > > Mellanox mentioned to us that they support both QDR and FDR HCAs attached > to the same IB subnet. Do you think MVAPICH2 will have any issue with this? > > 2.b) Can the same MPI code (SPMD or MPMD) have a subset of its ranks run > on nodes with IB and another subset over 10GiGE simultaneously? > > That is imagine nodes I1, I2, ..., IN having say QDR HCAs and nodes G1, > G2, GM having only 10GigE interfaces. Could we have the same MPI > application run across both types of nodes? > > Or should there be say 2 communicators with one of them explicitly > overlaid on a IB only subnet and the other on a 10GigE only subnet? 
> > > Please let me know if the above are not very clear. > > Thank you much > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
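For readers wanting to steer the selection rather than rely on the automatic choice: Open MPI exposes this through MCA parameters, settable on the mpirun command line or in a per-user defaults file. A sketch of such a file follows; the transport list is standard for the 1.6 series, but the interface name is a site-specific assumption you must adapt:

```ini
# $HOME/.openmpi/mca-params.conf -- per-user MCA defaults.
# The same settings can be given as "mpirun --mca <key> <value> ...".

# Allowed transports, in the usual 1.6-series form:
# loopback, shared memory, OpenFabrics (IB/RoCE), and TCP fallback.
btl = self,sm,openib,tcp

# When the TCP BTL is used, restrict it to the 10GigE interface
# (eth2 here is an example name -- check "ip link" on your nodes).
btl_tcp_if_include = eth2
```

The command-line equivalent would be, e.g., `mpirun --mca btl_tcp_if_include eth2 ...`, which is handy for per-job experiments before committing a cluster-wide default.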
Re: [OMPI users] How to select specific out of multiple interfaces for communication and support for heterogeneous fabrics
Great ... thanks. We will try it out as soon as the common backbone IB is in place. cheers Michael On Fri, Jul 5, 2013 at 6:10 PM, Ralph Castain wrote: > As long as the IB interfaces can communicate to each other, you should be > fine. > > On Jul 5, 2013, at 3:26 PM, Michael Thomadakis > wrote: > > Sorry on the mvapich2 reference :) > > All nodes are attached over a common 1GigE network. We wish ofcourse that > if a node-pair is connected via a higher-speed fabric *as well* (IB FDR > or 10GigE) then that this would be leveraged instead of the common 1GigE. > > One question: suppose that we use nodes having either FDR or QDR IB > interfaces available, connected to one common IB fabric, all defined over a > common IP subnet: Will OpenMPI have any problem with this? Can MPI > communication take place over this type of hybrid IB fabric? We already > have a sub-cluster with QDR HCAs and we are attaching it to IB fabric with > FDR "backbone" and another cluster with FDR HCAs. > > Do you think there may be some issue with this? The HCAs are FDR and QDR > Mellanox devices and the switching is also over FDR Mellanox fabric. > Mellanox claims that at the IB level this is doable (i.e., FDR link pairs > talk to each other at FDR speeds and QDR link pairs at QDR). > > I guess if we use the RC connection types then it does not matter to > OpenMPI. > > thanks > Michael > > > > > On Fri, Jul 5, 2013 at 4:59 PM, Ralph Castain wrote: > >> I can't speak for MVAPICH - you probably need to ask them about this >> scenario. OMPI will automatically select whatever available transport that >> can reach the intended process. This requires that each communicating pair >> of processes have access to at least one common transport. 
>> >> So if a process that is on a node with only 1G-E wants to communicate >> with another process, then the node where that other process is running >> must also have access to a compatible Ethernet interface (1G can talk to >> 10G, so they can have different capabilities) on that subnet (or on a >> subnet that knows how to route to the other one). If both nodes have 10G-E >> as well as 1G-E interfaces, then OMPI will automatically take the 10G >> interface as it is the faster of the two. >> >> Note this means that if a process is on a node that only has IB, and >> wants to communicate to a process on a node that only has 1G-E, then the >> two processes cannot communicate. >> >> HTH >> Ralph >> >> On Jul 5, 2013, at 2:34 PM, Michael Thomadakis >> wrote: >> >> Hello OpenMPI >> >> We area seriously considering deploying OpenMPI 1.6.5 for production (and >> 1.7.2 for testing) on HPC clusters which consists of nodes with *different >> types of networking interfaces*. >> >> >> 1) Interface selection >> >> We are using OpenMPI 1.6.5 and was wondering how one would go about >> selecting* at run time* which networking interface to use for MPI >> communications in case that both IB, 10GigE and 1 GigE are present. >> >> This issues arises in a cluster with nodes that are equipped with >> different types of interfaces: >> >> *Some *have both IB-QDR or FDR and 10- and 1-GigE. Others *only* have >> 10-GigE and 1-GigE and simply others only 1-GigE. >> >> >> 2) OpenMPI 1.6.5 level of support for Heterogeneous Fabric >> >> Can OpenMPI support running an MPI application using a mix of nodes with >> all of the above networking interface combinations ? >> >> 2.a) Can the same MPI code (SPMD or MPMD) have a subset of its ranks >> run on nodes with QDR IB and another subset on FDR IB simultaneously? These >> are Mellanox QDR and FDR HCAs. >> >> Mellanox mentioned to us that they support both QDR and FDR HCAs attached >> to the same IB subnet. 
Do you think MVAPICH2 will have any issue with this? >> >> 2.b) Can the same MPI code (SPMD or MPMD) have a subset of its ranks run >> on nodes with IB and another subset over 10GiGE simultaneously? >> >> That is imagine nodes I1, I2, ..., IN having say QDR HCAs and nodes G1, >> G2, GM having only 10GigE interfaces. Could we have the same MPI >> application run across both types of nodes? >> >> Or should there be say 2 communicators with one of them explicitly >> overlaid on a IB only subnet and the other on a 10GigE only subnet? >> >> >> Please let me know if the above are not very clear. >> >> Thank you much >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
[OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 an 1.7.2
Hello OpenMPI,

I am wondering what level of support there is for CUDA and GPUdirect in OpenMPI 1.6.5 and 1.7.2. I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However, it seems that configure in v1.6.5 ignored it. Can you identify GPU memory and send messages from it directly, without copying to host memory first? Or, in general, what level of CUDA support is there in 1.6.5 and 1.7.2? Do you support SDK 5.0 and above?

Cheers ... Michael
[OMPI users] Question on handling of memory for communications
Hello OpenMPI,

When your stack runs on Sandy Bridge nodes attached to HCAs over PCIe *gen 3*, do you pay any special attention to the memory buffers according to which socket/memory controller their physical memory belongs to? For instance, if the HCA is attached to the PCIe gen 3 lanes of socket 1, do you do anything special when the read/write buffers map to physical memory belonging to socket 2? Or do you avoid using buffers mapping to memory that belongs to (is accessible via) the other socket? Has this situation improved with Ivy Bridge or Haswell systems?

Cheers, Michael
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 an 1.7.2
thanks, Do you guys have any plan to support Intel Phi in the future? That is, running MPI code on the Phi cards or across the multicore and Phi, as Intel MPI does? thanks... Michael On Sat, Jul 6, 2013 at 2:36 PM, Ralph Castain wrote: > Rolf will have to answer the question on level of support. The CUDA code > is not in the 1.6 series as it was developed after that series went > "stable". It is in the 1.7 series, although the level of support will > likely be incrementally increasing as that "feature" series continues to > evolve. > > > On Jul 6, 2013, at 12:06 PM, Michael Thomadakis > wrote: > > > Hello OpenMPI, > > > > I am wondering what level of support is there for CUDA and GPUdirect on > OpenMPI 1.6.5 and 1.7.2. > > > > I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However, > it seems that with configure v1.6.5 it was ignored. > > > > Can you identify GPU memory and send messages from it directly without > copying to host memory first? > > > > > > Or in general, what level of CUDA support is there on 1.6.5 and 1.7.2 ? > Do you support SDK 5.0 and above? > > > > Cheers ... > > Michael > > ___ > > users mailing list > > us...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] Question on handling of memory for communications
Hi Jeff, thanks for the reply.

The issue is that when you read or write PCIe gen 3 data to non-local NUMA memory, Sandy Bridge will use the inter-socket QPI links to get the data across to the other socket. I think there is a considerable limitation on PCIe I/O traffic going over the inter-socket QPI. One way to get around this, for reads, is to buffer all data into memory local to the same socket and then copy it across to the other socket's physical memory in code. For writes the same approach can be used, with an intermediary process copying the data.

I was wondering if OpenMPI does any special memory mapping to work around this. And if with Ivy Bridge (or Haswell) the situation has improved.

thanks
Mike

On Mon, Jul 8, 2013 at 9:57 AM, Jeff Squyres (jsquyres) wrote:
> On Jul 6, 2013, at 4:59 PM, Michael Thomadakis
> wrote:
>
> > When you stack runs on SandyBridge nodes atached to HCAs ove PCI3 gen 3
> do you pay any special attention to the memory buffers according to which
> socket/memory controller their physical memory belongs to?
> >
> > For instance, if the HCA is attached to the PCIgen3 lanes of Socket 1 do
> you do anything special when the read/write buffers map to physical memory
> belonging to Socket 2? Or do you7 avoid using buffers mapping ro memory
> that belongs (is accessible via) the other socket?
>
> It is not *necessary* to ensure that buffers are NUMA-local to the PCI
> device that they are writing to, but it certainly results in lower latency
> to read/write to PCI devices (regardless of flavor) that are attached to an
> MPI process' local NUMA node. The Hardware Locality (hwloc) tool "lstopo"
> can print a pretty picture of your server to show you where your PCI busses
> are connected.
>
> For TCP, Open MPI will use all TCP devices that it finds by default
> (because it is assumed that latency is so high that NUMA locality doesn't
> matter).
The openib (OpenFabrics) transport will use the "closest" HCA > ports that it can find to each MPI process. > > In our upcoming Cisco ultra low latency BTL, it defaults to using the > closest Cisco VIC ports that it can find for short messages (i.e., to > minimize latency), but uses all available VICs for long messages (i.e., to > maximize bandwidth). > > > Has this situation improved with Ivy-Brige systems or Haswell? > > It's the same overall architecture (i.e., NUMA). > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 an 1.7.2
Thanks ... Michael

On Mon, Jul 8, 2013 at 8:50 AM, Rolf vandeVaart wrote:

> With respect to the CUDA-aware support, Ralph is correct. The ability to
> send and receive GPU buffers is in the Open MPI 1.7 series. And
> incremental improvements will be added to the Open MPI 1.7 series. CUDA
> 5.0 is supported.
>
> *From:* users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] *On
> Behalf Of *Ralph Castain
> *Sent:* Saturday, July 06, 2013 5:14 PM
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI
> 1.6.5 an 1.7.2
>
> There was discussion of this on a prior email thread on the OMPI devel
> mailing list:
>
> http://www.open-mpi.org/community/lists/devel/2013/05/12354.php
>
> On Jul 6, 2013, at 2:01 PM, Michael Thomadakis
> wrote:
>
> thanks,
> Do you guys have any plan to support Intel Phi in the future? That is,
> running MPI code on the Phi cards or across the multicore and Phi, as Intel
> MPI does?
> thanks...
> Michael
>
> On Sat, Jul 6, 2013 at 2:36 PM, Ralph Castain wrote:
>
> Rolf will have to answer the question on level of support. The CUDA code
> is not in the 1.6 series as it was developed after that series went
> "stable". It is in the 1.7 series, although the level of support will
> likely be incrementally increasing as that "feature" series continues to
> evolve.
>
> On Jul 6, 2013, at 12:06 PM, Michael Thomadakis
> wrote:
>
> > Hello OpenMPI,
> >
> > I am wondering what level of support is there for CUDA and GPUdirect on
> OpenMPI 1.6.5 and 1.7.2.
> >
> > I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However,
> it seems that with configure v1.6.5 it was ignored.
> >
> > Can you identify GPU memory and send messages from it directly without
> copying to host memory first?
> >
> > Or in general, what level of CUDA support is there on 1.6.5 and 1.7.2 ?
> > Do you support SDK 5.0 and above?
> >
> > Cheers ...
> > Michael
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> This email message is for the sole use of the intended recipient(s) and
> may contain confidential information. Any unauthorized review, use,
> disclosure or distribution is prohibited. If you are not the intended
> recipient, please contact the sender by reply email and destroy all copies
> of the original message.
> --
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
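For readers following this thread later: the --with-cuda flag is the one from the FAQ mentioned above, and per Ralph's and Rolf's replies it is only acted on by the 1.7 ("feature") series. A build sketch follows; the prefix and CUDA paths are examples, not canonical locations:

```
# Hypothetical paths -- adjust prefix and CUDA toolkit location.
# On the 1.6 series this flag is silently ignored, as observed above.
./configure --prefix=/opt/openmpi/1.7.2-cuda \
            --with-cuda=/usr/local/cuda-5.0
make all install
```

After installation, `ompi_info` can be used to inspect what the build was configured with.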
Re: [OMPI users] Question on handling of memory for communications
People have mentioned that they experience unexpected slowdowns in PCIe gen 3 I/O when the pages map to a socket different from the one the HCA connects to. It is speculated that the inter-socket QPI is not provisioned to transfer more than 1 GiB/sec of PCIe gen 3 traffic. This situation may not be in effect on all Sandy Bridge or Ivy Bridge systems.

Have you measured anything like this on your systems as well? That would require using physical memory mapped to the socket without the HCA exclusively for MPI messaging.

Mike

On Mon, Jul 8, 2013 at 10:52 AM, Jeff Squyres (jsquyres) wrote:
> On Jul 8, 2013, at 11:35 AM, Michael Thomadakis
> wrote:
>
> > The issue is that when you read or write PCIe_gen 3 dat to a non-local
> NUMA memory, SandyBridge will use the inter-socket QPIs to get this data
> across to the other socket. I think there is considerable limitation in
> PCIe I/O traffic data going over the inter-socket QPI. One way to get
> around this is for reads to buffer all data into memory space local to the
> same socket and then transfer them by code across to the other socket's
> physical memory. For writes the same approach can be used with intermediary
> process copying data.
>
> Sure, you'll cause congestion across the QPI network when you do non-local
> PCI reads/writes. That's a given.
>
> But I'm not aware of a hardware limitation on PCI-requested traffic across
> QPI (I could be wrong, of course -- I'm a software guy, not a hardware
> guy). A simple test would be to bind an MPI process to a far NUMA node and
> run a simple MPI bandwidth test and see if you get better/same/worse
> bandwidth compared to binding an MPI process on a near NUMA socket.
>
> But in terms of doing intermediate (pipelined) reads/writes to local NUMA
> memory before reading/writing to PCI, no, Open MPI does not do this.
> Unless there is a PCI-QPI bandwidth constraint that we're unaware of, I'm > not sure why you would do this -- it would likely add considerable > complexity to the code and it would definitely lead to higher overall MPI > latency. > > Don't forget that the MPI paradigm is for the application to provide the > send/receive buffer. Meaning: MPI doesn't (always) control where the > buffer is located (particularly for large messages). > > > I was wondering if OpenMPI does anything special memory mapping to work > around this. > > Just what I mentioned in the prior email. > > > And if with Ivy Bridge (or Haswell) he situation has improved. > > Open MPI doesn't treat these chips any different. > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
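Jeff's suggested experiment (bind near vs. far and compare) can be approximated on a single node without MPI: time a large memory copy, then rerun the script under different NUMA bindings and compare the numbers. The script below is a crude sketch; the numactl invocations in the comment are an assumption about a typical Linux setup and are not executed by the script itself.

```python
# Crude memory-copy bandwidth probe. Run it twice under numactl to
# compare near vs. far NUMA placement (commands below are assumptions
# for a typical 2-socket Linux box, not part of this script):
#   numactl --cpunodebind=0 --membind=0 python membw.py
#   numactl --cpunodebind=1 --membind=1 python membw.py
import time

def copy_bandwidth(nbytes=64 * 1024 * 1024, reps=5):
    """Return the best-of-reps bandwidth of a plain buffer copy, in MB/s."""
    src = bytearray(nbytes)
    best = 0.0
    for _ in range(reps):
        t0 = time.perf_counter()
        dst = bytes(src)          # one full copy of the buffer
        dt = time.perf_counter() - t0
        best = max(best, nbytes / dt / 1e6)
        del dst
    return best

if __name__ == "__main__":
    print("copy bandwidth: %.0f MB/s" % copy_bandwidth())
```

This only exercises the memory subsystem, not PCIe; for the actual HCA path, an MPI pingpong (e.g., IMB, as Brice used) bound near and far with `mpirun --bind-to ...` or numactl is the more faithful test.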
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 an 1.7.2
Thanks Tom, that sounds good. I will give it a try as soon as our Phi host here gets installed.

I assume that all the prerequisite libs and bins on the Phi side are available when we download the Phi s/w stack from Intel's site, right?

Cheers
Michael

On Mon, Jul 8, 2013 at 12:10 PM, Elken, Tom wrote:
> Do you guys have any plan to support Intel Phi in the future? That is,
> running MPI code on the Phi cards or across the multicore and Phi, as Intel
> MPI does?
>
> *[Tom]*
> Hi Michael,
> Because a Xeon Phi card acts a lot like a Linux host with an x86
> architecture, you can build your own Open MPI libraries to serve this
> purpose.
> Our team has used existing (an older 1.4.3 version of) Open MPI source to
> build an Open MPI for running MPI code on Intel Xeon Phi cards over Intel's
> (formerly QLogic's) True Scale InfiniBand fabric, and it works quite well.
> We have not released a pre-built Open MPI as part of any Intel software
> release. But I think if you have a compiler for Xeon Phi (Intel Compiler
> or GCC) and an interconnect for it, you should be able to build an Open MPI
> that works on Xeon Phi.
>
> Cheers,
> Tom Elken
>
> thanks...
> Michael
>
> On Sat, Jul 6, 2013 at 2:36 PM, Ralph Castain wrote:
>
> Rolf will have to answer the question on level of support. The CUDA code
> is not in the 1.6 series as it was developed after that series went
> "stable". It is in the 1.7 series, although the level of support will
> likely be incrementally increasing as that "feature" series continues to
> evolve.
>
> On Jul 6, 2013, at 12:06 PM, Michael Thomadakis
> wrote:
>
> > Hello OpenMPI,
> >
> > I am wondering what level of support is there for CUDA and GPUdirect on
> OpenMPI 1.6.5 and 1.7.2.
> >
> > I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However,
> it seems that with configure v1.6.5 it was ignored.
> >
> > Can you identify GPU memory and send messages from it directly without
> copying to host memory first?
> >
> > Or in general, what level of CUDA support is there on 1.6.5 and 1.7.2 ?
> > Do you support SDK 5.0 and above?
> >
> > Cheers ...
> > Michael
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Question on handling of memory for communications
Hi Brice, thanks for testing this out.

How did you make sure that the pinned pages used by the I/O adapter mapped to the "other" socket's memory controller? Is pinning the MPI binary to a socket sufficient to pin the space used for MPI I/O to that socket as well? I think this is something done by, and at, the HCA device driver level.

Anyway, as long as the memory performance difference is at the levels you mentioned, there is no "big" issue. Most likely the device driver gets space from the same NUMA domain as the socket the HCA is attached to.

Thanks for trying it out,
Michael

On Mon, Jul 8, 2013 at 11:45 AM, Brice Goglin wrote:
> On a dual E5 2650 machine with FDR cards, I see the IMB Pingpong
> throughput drop from 6000 to 5700MB/s when the memory isn't allocated on
> the right socket (and latency increases from 0.8 to 1.4us). Of course
> that's pingpong only, things will be worse on a memory-overloaded machine.
> But I don't expect things to be "less worse" if you do an intermediate copy
> through the memory near the HCA: you would overload the QPI link as much as
> here, and you would overload the CPU even more because of the additional
> copies.
>
> Brice
>
> Le 08/07/2013 18:27, Michael Thomadakis a écrit :
>
> People have mentioned that they experience unexpected slow downs in
> PCIe_gen3 I/O when the pages map to a socket different from the one the HCA
> connects to. It is speculated that the inter-socket QPI is not provisioned
> to transfer more than 1GiB/sec for PCIe_gen 3 traffic. This situation may
> not be in effect on all SandyBrige or IvyBridge systems.
>
> Have you measured anything like this on you systems as well? That would
> require using physical memory mapped to the socket w/o HCA exclusively for
> MPI messaging.
> > Mike > > > On Mon, Jul 8, 2013 at 10:52 AM, Jeff Squyres (jsquyres) < > jsquy...@cisco.com> wrote: > >> On Jul 8, 2013, at 11:35 AM, Michael Thomadakis >> wrote: >> >> > The issue is that when you read or write PCIe_gen 3 dat to a non-local >> NUMA memory, SandyBridge will use the inter-socket QPIs to get this data >> across to the other socket. I think there is considerable limitation in >> PCIe I/O traffic data going over the inter-socket QPI. One way to get >> around this is for reads to buffer all data into memory space local to the >> same socket and then transfer them by code across to the other socket's >> physical memory. For writes the same approach can be used with intermediary >> process copying data. >> >> Sure, you'll cause congestion across the QPI network when you do >> non-local PCI reads/writes. That's a given. >> >> But I'm not aware of a hardware limitation on PCI-requested traffic >> across QPI (I could be wrong, of course -- I'm a software guy, not a >> hardware guy). A simple test would be to bind an MPI process to a far NUMA >> node and run a simple MPI bandwidth test and see if to get >> better/same/worse bandwidth compared to binding an MPI process on a near >> NUMA socket. >> >> But in terms of doing intermediate (pipelined) reads/writes to local NUMA >> memory before reading/writing to PCI, no, Open MPI does not do this. >> Unless there is a PCI-QPI bandwidth constraint that we're unaware of, I'm >> not sure why you would do this -- it would likely add considerable >> complexity to the code and it would definitely lead to higher overall MPI >> latency. >> >> Don't forget that the MPI paradigm is for the application to provide the >> send/receive buffer. Meaning: MPI doesn't (always) control where the >> buffer is located (particularly for large messages). >> >> > I was wondering if OpenMPI does anything special memory mapping to work >> around this. >> >> Just what I mentioned in the prior email. 
>> >> > And if with Ivy Bridge (or Haswell) he situation has improved. >> >> Open MPI doesn't treat these chips any different. >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > > > ___ > users mailing > listusers@open-mpi.orghttp://www.open-mpi.org/mailman/listinfo.cgi/users > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
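The near-vs-far experiment Jeff suggests above can be sketched with numactl. This is a minimal sketch, not from the thread: it assumes the Intel MPI Benchmarks binary (IMB-MPI1) is available, that NUMA node 0 is the socket the HCA attaches to, and the hostnames are illustrative.

```shell
# Near case: pin each rank and its memory to the HCA's socket (node 0).
mpirun -np 2 --host nodeA,nodeB \
    numactl --cpunodebind=0 --membind=0 ./IMB-MPI1 PingPong

# Far case: same CPU binding, but force all pages onto the other socket,
# so every DMA by the HCA must cross the inter-socket QPI link.
mpirun -np 2 --host nodeA,nodeB \
    numactl --cpunodebind=0 --membind=1 ./IMB-MPI1 PingPong
```

Comparing the two PingPong results isolates the QPI penalty Brice quotes (6000 vs 5700 MB/s on his dual E5-2650 machine).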
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 an 1.7.2
Thanks Tom, I will test it out... regards Michael On Mon, Jul 8, 2013 at 1:16 PM, Elken, Tom wrote: > Thanks Tom, that sounds good. I will give it a try as soon as our Phi host > here gets installed. > > I assume that all the prerequisite libs and bins on the Phi side are > available when we download the Phi s/w stack from Intel's site, right? > > [Tom] > > Right. When you install Intel’s MPSS (Manycore Platform Software > Stack), including following the section on “OFED Support” in the readme > file, you should have all the prerequisite libs and bins. Note that I have > not built Open MPI for Xeon Phi for your interconnect, but it seems to me > that it should work. > > -Tom > > Cheers > > Michael > > On Mon, Jul 8, 2013 at 12:10 PM, Elken, Tom wrote: > > Do you guys have any plan to support Intel Phi in the future? That is, > running MPI code on the Phi cards or across the multicore and Phi, as Intel > MPI does? > > [Tom] > > Hi Michael, > > Because a Xeon Phi card acts a lot like a Linux host with an x86 > architecture, you can build your own Open MPI libraries to serve this > purpose. > > Our team has used existing (an older 1.4.3 version of) Open MPI source to > build an Open MPI for running MPI code on Intel Xeon Phi cards over Intel’s > (formerly QLogic’s) True Scale InfiniBand fabric, and it works quite well. > We have not released a pre-built Open MPI as part of any Intel software > release. But I think if you have a compiler for Xeon Phi (Intel Compiler > or GCC) and an interconnect for it, you should be able to build an Open MPI > that works on Xeon Phi. > > Cheers, > Tom Elken > > thanks... > > Michael > > > > On Sat, Jul 6, 2013 at 2:36 PM, Ralph Castain wrote: > > Rolf will have to answer the question on level of support. The CUDA code > is not in the 1.6 series as it was developed after that series went > "stable". 
It is in the 1.7 series, although the level of support will > likely be incrementally increasing as that "feature" series continues to > evolve. > > > > On Jul 6, 2013, at 12:06 PM, Michael Thomadakis > wrote: > > > Hello OpenMPI, > > > > I am wondering what level of support is there for CUDA and GPUdirect on > OpenMPI 1.6.5 and 1.7.2. > > > > I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However, > it seems that with configure v1.6.5 it was ignored. > > > > Can you identify GPU memory and send messages from it directly without > copying to host memory first? > > > > > > Or in general, what level of CUDA support is there on 1.6.5 and 1.7.2 ? > Do you support SDK 5.0 and above? > > > > Cheers ... > > Michael > > > ___ > > users mailing list > > us...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > > ** ** > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
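For reference, the build-time step discussed above can be sketched as follows. Per the thread, the flag only takes effect on the 1.7 series and is ignored by 1.6.x configure; the CUDA and install paths are illustrative.

```shell
# CUDA-aware build of the 1.7 series (1.6.x ignores --with-cuda).
./configure --with-cuda=/usr/local/cuda --prefix=/opt/openmpi-1.7.2
make -j4 && make install

# With a CUDA-aware build, supported transports can take device pointers
# as send/receive buffers, skipping an explicit copy to host memory.
```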
Re: [OMPI users] Question on handling of memory for communications
| The driver doesn't allocate much memory here. Maybe some small control buffers, but nothing significantly involved in large message transfer | performance. Everything critical here is allocated by user-space (either MPI lib or application), so we just have to make sure we bind the | process memory properly. I used hwloc-bind to do that. I see ... So the user-level process (user or MPI library) sets aside memory (malloc?), and OFED/IB then sets up RDMA messaging with addresses pointing back to that user physical memory. I guess before running the MPI benchmark you set the *data* memory allocation policy to allocate pages "owned" by the other socket? | Note that we have seen larger issues on older platforms. You basically just need a big HCA and PCI link on a not-so-big machine. Not very | common fortunately with todays QPI links between Sandy-Bridge socket, those are quite big compared to PCI Gen3 8x links to the HCA. On | old AMD platforms (and modern Intels with big GPUs), issues are not that uncommon (we've seen up to 40% DMA bandwidth difference | there). The issue that has been observed is with PCIe_gen3 traffic on attached I/O which, say, reads data off of the HCA and has to store it to memory, but that memory belongs to the other socket. In that case the PCIe data uses the QPI links on Sandy Bridge to send these packets over to the other socket. It has been speculated that the QPI links were NOT provisioned to transfer more than 1 GiB/s of PCIe data alongside the regular inter-NUMA memory traffic. It may be that Intel has since re-provisioned QPI to accommodate more PCIe traffic. Thanks again Michael On Mon, Jul 8, 2013 at 1:01 PM, Brice Goglin wrote: > The driver doesn't allocate much memory here. Maybe some small control > buffers, but nothing significantly involved in large message transfer > performance. 
Everything critical here is allocated by user-space (either > MPI lib or application), so we just have to make sure we bind the process > memory properly. I used hwloc-bind to do that. > > Note that we have seen larger issues on older platforms. You basically > just need a big HCA and PCI link on a not-so-big machine. Not very common > fortunately with todays QPI links between Sandy-Bridge socket, those are > quite big compared to PCI Gen3 8x links to the HCA. On old AMD platforms > (and modern Intels with big GPUs), issues are not that uncommon (we've seen > up to 40% DMA bandwidth difference there). > > Brice > > > > Le 08/07/2013 19:44, Michael Thomadakis a écrit : > > Hi Brice, > > thanks for testing this out. > > How did you make sure that the pinned pages used by the I/O adapter > mapped to the "other" socket's memory controller ? Is pining the MPI binary > to a socket sufficient to pin the space used for MPI I/O as well to that > socket? I think this is something done by and at the HCA device driver > level. > > Anyways, as long as the memory performance difference is a the levels > you mentioned then there is no "big" issue. Most likely the device driver > get space from the same numa domain that of the socket the HCA is attached > to. > > Thanks for trying it out > Michael > > > > > > > On Mon, Jul 8, 2013 at 11:45 AM, Brice Goglin wrote: > >> On a dual E5 2650 machine with FDR cards, I see the IMB Pingpong >> throughput drop from 6000 to 5700MB/s when the memory isn't allocated on >> the right socket (and latency increases from 0.8 to 1.4us). Of course >> that's pingpong only, things will be worse on a memory-overloaded machine. >> But I don't expect things to be "less worse" if you do an intermediate copy >> through the memory near the HCA: you would overload the QPI link as much as >> here, and you would overload the CPU even more because of the additional >> copies. 
>> >> Brice >> >> >> >> Le 08/07/2013 18:27, Michael Thomadakis a écrit : >> >> People have mentioned that they experience unexpected slow downs in >> PCIe_gen3 I/O when the pages map to a socket different from the one the HCA >> connects to. It is speculated that the inter-socket QPI is not provisioned >> to transfer more than 1GiB/sec for PCIe_gen 3 traffic. This situation may >> not be in effect on all SandyBrige or IvyBridge systems. >> >> Have you measured anything like this on you systems as well? That would >> require using physical memory mapped to the socket w/o HCA exclusively for >> MPI messaging. >> >> Mike >> >> >> On Mon, Jul 8, 2013 at 10:52 AM, Jeff Squyres (jsquyres) < >> jsquy...@cisco.com> wrote: >> >>> On Jul 8, 2013, at 11:35 AM, Michael Thomadakis < >>
Re: [OMPI users] Question on handling of memory for communications
| Remember that the point of IB and other operating-system bypass devices is that the driver is not involved in the fast path of sending / | receiving. One of the side-effects of that design point is that userspace does all the allocation of send / receive buffers. That's a good point. It was not clear to me who and with what logic was allocating memory. But definitely for IB it makes sense that the user provides pointers to their memory. thanks Michael On Mon, Jul 8, 2013 at 1:07 PM, Jeff Squyres (jsquyres) wrote: > On Jul 8, 2013, at 2:01 PM, Brice Goglin wrote: > > > The driver doesn't allocate much memory here. Maybe some small control > buffers, but nothing significantly involved in large message transfer > performance. Everything critical here is allocated by user-space (either > MPI lib or application), so we just have to make sure we bind the process > memory properly. I used hwloc-bind to do that. > > +1 > > Remember that the point of IB and other operating-system bypass devices is > that the driver is not involved in the fast path of sending / receiving. > One of the side-effects of that design point is that userspace does all > the allocation of send / receive buffers. > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 an 1.7.2
Hi Tim, Well, in general and not on MIC I usually build the MPI stacks using the Intel compiler set. Have you run into s/w that requires GCC instead of the Intel compilers (besides Nvidia CUDA)? Did you try to use the Intel compiler to produce MIC-native code (the OpenMPI stack, for that matter)? regards Michael On Mon, Jul 8, 2013 at 4:30 PM, Tim Carlson wrote: > On Mon, 8 Jul 2013, Elken, Tom wrote: > > It isn't quite so easy. > > Out of the box, there is no gcc on the Phi card. You can use the cross > compiler on the host, but you don't get gcc on the Phi by default. > > See this post > http://software.intel.com/en-us/forums/topic/382057 > > I really think you would need to build and install gcc on the Phi first. > > My first pass at doing a cross-compile with the GNU compilers failed to > produce something with OFED support (not surprising) > > export PATH=/usr/linux-k1om-4.7/bin:$PATH > ./configure --build=x86_64-unknown-linux-gnu --host=x86_64-k1om-linux \ > --disable-mpi-f77 > > checking if MCA component btl:openib can compile... no > > > Tim > > > >> >> >> Thanks Tom, that sounds good. I will give it a try as soon as our Phi host >> here host gets installed. >> >> >> >> I assume that all the prerequisite libs and bins on the Phi side are >> available when we download the Phi s/w stack from Intel's site, right ? >> >> [Tom] >> >> Right. When you install Intel’s MPSS (Manycore Platform Software Stack), >> including following the section on “OFED Support” in the readme file, you >> should have all the prerequisite libs and bins. Note that I have not >> built >> Open MPI for Xeon Phi for your interconnect, but it seems to me that it >> should work. >> >> >> >> -Tom >> >> >> >> Cheers >> >> Michael >> >> >> >> >> >> >> >> On Mon, Jul 8, 2013 at 12:10 PM, Elken, Tom wrote: >> >> Do you guys have any plan to support Intel Phi in the future? 
That is, >> running MPI code on the Phi cards or across the multicore and Phi, as >> Intel >> MPI does? >> >> [Tom] >> >> Hi Michael, >> >> Because a Xeon Phi card acts a lot like a Linux host with an x86 >> architecture, you can build your own Open MPI libraries to serve this >> purpose. >> >> Our team has used existing (an older 1.4.3 version of) Open MPI source to >> build an Open MPI for running MPI code on Intel Xeon Phi cards over >> Intel’s >> (formerly QLogic’s) True Scale InfiniBand fabric, and it works quite >> well. >> We have not released a pre-built Open MPI as part of any Intel software >> release. But I think if you have a compiler for Xeon Phi (Intel Compiler >> or GCC) and an interconnect for it, you should be able to build an Open >> MPI >> that works on Xeon Phi. >> >> Cheers, >> Tom Elken >> >> thanks... >> >> Michael >> >> >> >> On Sat, Jul 6, 2013 at 2:36 PM, Ralph Castain wrote: >> >> Rolf will have to answer the question on level of support. The CUDA code >> is >> not in the 1.6 series as it was developed after that series went "stable". >> It is in the 1.7 series, although the level of support will likely be >> incrementally increasing as that "feature" series continues to evolve. >> >> >> >> On Jul 6, 2013, at 12:06 PM, Michael Thomadakis > > >> wrote: >> >> > Hello OpenMPI, >> > >> > I am wondering what level of support is there for CUDA and GPUdirect on >> OpenMPI 1.6.5 and 1.7.2. >> > >> > I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However, >> it >> seems that with configure v1.6.5 it was ignored. >> > >> > Can you identify GPU memory and send messages from it directly without >> copying to host memory first? >> > >> > >> > Or in general, what level of CUDA support is there on 1.6.5 and 1.7.2 ? >> Do >> you support SDK 5.0 and above? >> > >> > Cheers ... 
>> > Michael >> >> > __**_ >> > users mailing list >> > us...@open-mpi.org >> > http://www.open-mpi.org/**mailman/listinfo.cgi/users<http://www.open-mpi.org/mailman/listinfo.cgi/users> >> >> >> __**_ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/**mailman/listinfo.cgi/users<http://www.open-mpi.org/mailman/listinfo.cgi/users> >> >> >> >> >> __**_ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/**mailman/listinfo.cgi/users<http://www.open-mpi.org/mailman/listinfo.cgi/users> >> >> >> >> >> >> > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 an 1.7.2
Tim, thanks for trying this out ... Now you should be able to let part of the same OpenMPI application run on the host multi-core side and the other part on the MIC. Intel MPI can do this using an MPMD command line, where the Xeon binaries run on the host and the MIC ones on the MIC card(s). I guess you should be able to do this directly from the same OpenMPI mpirun command line ... thanks Michael On Tue, Jul 9, 2013 at 12:18 PM, Tim Carlson wrote: > On Mon, 8 Jul 2013, Tim Carlson wrote: > > Now that I have gone through this process, I'll report that it works with > the caveat that you can't use the openmpi wrappers for compiling. Recall > that the Phi card does not have either the GNU or Intel compilers > installed. While you could build up a tool chain for the GNU compilers, > you're not going to get a native Intel compiler unless Intel decides to > support it. > > Here is the process from end to end to get Openmpi to build a native Phi > application. > > export PATH=/usr/linux-k1om-4.7/bin:$PATH > . /share/apps/intel/composer_xe_2013.3.163/bin/iccvars.sh intel64 > export CC="icc -mmic" > export CXX="icpc -mmic" > > cd ~ > tar zxf openmpi-1.6.4.tar.gz > cd openmpi-1.6.4 > ./configure --prefix=/people/tim/mic/openmpi/intel \ > --build=x86_64-unknown-linux-gnu --host=x86_64-k1om-linux \ > --disable-mpi-f77 \ > AR=x86_64-k1om-linux-ar RANLIB=x86_64-k1om-linux-ranlib > LD=x86_64-k1om-linux-ld > make > make install > > That leaves me with a native build of openmpi in > /people/tim/mic/openmpi/intel > > It is of course tempting to just do a > export PATH=/people/tim/mic/openmpi/intel/bin:$PATH > and start using mpicc to build my code but that does not work because: > > 1) If I try this on the host system I am going to get "wrong architecture" > because mpicc was built for the Phi and not for the x86_64 host > > 2) If I try running it on the Phi, I don't have access to "icc" because I > can't run the compiler directly on the Phi. 
> > I can "cheat" and see what the mpicc command really does by using "mpicc > --show" for another installation of openmpi and munge the paths correctly. > In this case > > icc -mmic cpi.c -I/people/tim/mic/openmpi/intel/include -pthread \ > -L/people/tim/mic/openmpi/intel/lib -lmpi -ldl -lm -Wl,--export-dynamic \ > -lrt -lnsl -lutil -lm -ldl -o cpi.x > > That leaves me with a Phi native version of cpi.x which I can then execute > on the Phi > > $ ssh phi002-mic0 > > ( I have NFS mounts on the Phi for all the bits I need ) > > ~ $ export PATH=/people/tim/mic/openmpi/intel/bin/:$PATH > ~ $ export LD_LIBRARY_PATH=/share/apps/intel/composer_xe_2013.3.163/compiler/lib/mic/ > ~ $ export LD_LIBRARY_PATH=/people/tim/mic/openmpi/intel/lib:$LD_LIBRARY_PATH > ~ $ cd mic > ~/mic $ mpirun -np 12 cpi.x > Process 7 on phi002-mic0.local > Process 10 on phi002-mic0.local > Process 2 on phi002-mic0.local > Process 9 on phi002-mic0.local > Process 1 on phi002-mic0.local > Process 3 on phi002-mic0.local > Process 11 on phi002-mic0.local > Process 5 on phi002-mic0.local > Process 8 on phi002-mic0.local > Process 4 on phi002-mic0.local > Process 6 on phi002-mic0.local > Process 0 on phi002-mic0.local > pi is approximately 3.1416009869231245, Error is 0.0814 > wall clock time = 0.001766 > > > > On Mon, 8 Jul 2013, Elken, Tom wrote? >> >> My mistake on the OFED bits. The host I was installing on did not have >> all of the MPSS software installed (my cluster admin node and not one of >> the compute nodes). Adding the intel-mic-ofed-card RPM fixed the problem >> with compiling the btl:openib bits with both the GNU and Intel compilers >> using the cross-compiler route (-mmic on the Intel side) >> >> Still working on getting the resulting mpicc wrapper working on the MIC >> side. When I get a working example I'll post the results. >> >> Thanks! 
>> >> Tim >> >> >> >>> >>> >>> Hi Tim, >>> >>> >>> >>> >>> >>> Well, in general and not on MIC I usually build the MPI stacks using the >>> Intel compiler set. Have you ran into s/w that requires GCC instead of >>> Intel >>> compilers (beside Nvidia Cuda)? Did you try to use Intel compiler to >>> produce >>> MIC native code (the OpenMPI stack for that matter)? >>> >>> [Tom] >>> >>> Good idea Michael, With the Intel Compiler, I would use the -mmic flag >>> to >>> build MIC code. >>> >>> >
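The symmetric (host + MIC) launch Michael asks about would use mpirun's normal MPMD colon syntax. This is a sketch only, under the assumption that a host-side binary and a Phi-native binary of the same application have both been built (as in Tim's recipe above); the binary names, rank counts, and hostnames are illustrative, and the thread does not confirm this mode was tested.

```shell
# MPMD launch: Xeon binaries on the host, MIC-native binaries on the card.
mpirun -np 16 --host phi002      ./app.host \
     : -np 60 --host phi002-mic0 ./app.mic
```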
[OMPI users] Planned support for Intel Phis
Hello OpenMPI, I was wondering what support is being implemented for the Intel Phi platforms. That is, would we be able to run MPI code in "symmetric" fashion, where some ranks run on the cores of the multicore hosts and some on the cores of the Phis, in a multinode cluster environment? Also, is it based on OFED 1.5.4.1, or on which OFED? Best regards Michael
Re: [OMPI users] openmpi fails on mx endpoint busy
If the machine is multi-processor, you might want to add the sm btl. That cleared up some similar problems for me, though I don't use mx so your mileage may vary. On 7/5/07, SLIM H.A. wrote: Hello I have compiled openmpi-1.2.3 with the --with-mx= configuration and gcc compiler. On testing with 4-8 slots I get an error message, the mx ports are busy: >mpirun --mca btl mx,self -np 4 ./cpi [node001:10071] mca_btl_mx_init: mx_open_endpoint() failed with status=20 [node001:10074] mca_btl_mx_init: mx_open_endpoint() failed with status=20 [node001:10073] mca_btl_mx_init: mx_open_endpoint() failed with status=20 -- Process 0.1.0 is unable to reach 0.1.1 for MPI communication. If you specified the use of a BTL component, you may have forgotten a component (such as "self") in the list of usable components. ... snipped It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): PML add procs failed --> Returned "Unreachable" (-12) instead of "Success" (0) -- *** An error occurred in MPI_Init *** before MPI was initialized *** MPI_ERRORS_ARE_FATAL (goodbye) mpirun noticed that job rank 0 with PID 10071 on node node001 exited on signal 1 (Hangup). I would not expect mx messages as communication should not go through the mx card? (This is a twin dual core shared memory node) The same happens when testing on 2 nodes, using a hostfile. I checked the state of the mx card with mx_endpoint_info and mx_info, they are healthy and free. What is missing here? Thanks Henk ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
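The suggestion above, spelled out as a command line: add the shared-memory BTL so that on-node ranks talk over sm instead of each opening an MX endpoint.

```shell
# Include sm for same-node ranks alongside mx and self.
mpirun --mca btl mx,sm,self -np 4 ./cpi
```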
Re: [OMPI users] mpirun hanging followup
If you are having difficulty getting openmpi set up yourself, you might look into OSCAR or Rocks; they make setting up your cluster much easier and include various MPI packages as well as other utilities for reducing your management overhead. I can help you (off list) get set up with OSCAR if you like, and there are very helpful mailing lists for both projects. On 7/17/07, Bill Johnstone wrote: Hello all. I could really use help trying to figure out why mpirun is hanging as detailed in my previous message yesterday, 16 July. Since there's been no response, please allow me to give a short summary. -Open MPI 1.2.3 on GNU/Linux, 2.6.21 kernel, gcc 4.1.2, bash 3.2.15 is default shell -Open MPI installed to /usr/local, which is in non-interactive session path -Systems are AMD64, using ethernet as interconnect, on private IP network mpirun hangs whenever I invoke any process running on a remote node. It runs a job fine if I invoke it so that it only runs on the local node. Ctrl+C never successfully cancels an mpirun job -- I have to use kill -9. I'm asking for help trying to figure what steps have been taken by mpirun, and how I can figure out where things are getting stuck / crashing. What could be happening on the remote nodes? What debugging steps can I take? Without MPI running, the cluster is of no use, so I would really appreciate some help here. ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
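A few first debugging steps for a remote-launch hang like Bill's, offered as a sketch (flags as in the 1.2-era mpirun; hostnames and interface names are illustrative):

```shell
# Keep the remote daemons' output visible instead of letting it vanish;
# a trivial non-MPI command isolates launch problems from MPI problems.
mpirun --debug-daemons -np 2 --host node4 hostname

# Rule out interface-selection problems by restricting both the TCP BTL
# and the out-of-band channel to the private cluster network.
mpirun --mca btl_tcp_if_include eth1 --mca oob_tcp_if_include eth1 \
       -np 2 --host node4 hostname
```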
Re: [OMPI users] mpirun hanging followup
On 7/17/07, Bill Johnstone wrote: Thanks for the help. I've replied below. --- "G.O." wrote: > 1- Check to make sure that there are no firewalls blocking > traffic between the nodes. There is no firewall in-between the nodes. If I run jobs directly via ssh, e.g. "ssh node4 env" they work. Are you using host based authentication of some kind? ie, are you being prompted for a password when you ssh between nodes?
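A quick way to answer G.O.'s question about password prompts, as a sketch (`node4` as in the thread): BatchMode makes ssh fail immediately instead of prompting, and a prompt is exactly what makes mpirun appear to hang.

```shell
# Fails fast if a password would be required; succeeds silently otherwise.
ssh -o BatchMode=yes node4 true && echo "passwordless ok"

# If it fails, one common fix is a passphrase-less key shared to the nodes
# (paths are the OpenSSH defaults):
ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa"
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
```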
[OMPI users] OpenMPI and PathScale problem
I'm trying to get the PathScale Fortran compiler working with OpenMPI on a 64-bit Linux machine and can't get past a simple demo program. Here is detailed info: pathf90 -v PathScale EKOPath(TM) Compiler Suite: Version 2.5 Built on: 2006-08-22 21:02:51 -0700 Thread model: posix GNU gcc version 3.3.1 (PathScale 2.5 driver) mpif90 --show pathf90 -I/home/fort/usr//include -pthread -I/home/fort/usr//lib -L/home/fort/usr//lib -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl The OpenMPI version 1.2.3 resides in the /home/fort/usr/ directory. When I compile a simple program using mpif90 -o test test.f90 I get a binary all right, but it has broken library links ldd test libmpi_f90.so.0 => not found libmpi_f77.so.0 => not found libmpi.so.0 => /usr/lib64/lam/libmpi.so.0 (0x003db360) libopen-rte.so.0 => not found libopen-pal.so.0 => not found libdl.so.2 => /lib64/libdl.so.2 (0x003db320) libnsl.so.1 => /lib64/libnsl.so.1 (0x003db990) libutil.so.1 => /lib64/libutil.so.1 (0x003db840) libmv.so.1 => /opt/pathscale/lib/2.5/libmv.so.1 (0x002a9557f000) libmpath.so.1 => /opt/pathscale/lib/2.5/libmpath.so.1 (0x002a956a8000) libm.so.6 => /lib64/tls/libm.so.6 (0x003db300) libpathfortran.so.1 => /opt/pathscale/lib/2.5/libpathfortran.so.1 (0x002a957c9000) libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x003db380) libc.so.6 => /lib64/tls/libc.so.6 (0x003db2d0) /lib64/ld-linux-x86-64.so.2 (0x003db290) The demo program fails to start due to missing shared libraries. In addition, the binary picks up the LAM MPI library (/usr/lib64/lam/libmpi.so.0) instead of OpenMPI! Any ideas on where the problem could be? Michael ******** Mgr. Michael Komm Tokamak Department Institute of Plasma Physics of Academy of Sciences of Czech Republic E-mail:k...@ipp.cas.cz Za Slovankou 3 182 00 PRAGUE 8
Re: [OMPI users] OpenMPI and PathScale problem
Thanks Christian, it works just fine now! I altered LIBRARY_PATH and LD_PATH but not this one :) Michael __ > From: christian.bec...@math.uni-dortmund.de > To: Open MPI Users > Date: 07.08.2007 19:32 > Subject: Re: [OMPI users] OpenMPI and PathScale problem > >Hi Michael, > >you have to add the path to the openmpi libraries to the LD_LIBRARY_PATH variable > >export LD_LIBRARY_PATH=/home/fort/usr//lib > >should fix the problem. > >Bye, >Christian > >Michael Komm wrote: >> I'm trying to make work the pathscale fortran compiler with OpenMPI on a 64bit Linux machine and can't get passed a simple demo program. Here is detailed info: >> >> pathf90 -v >> PathScale EKOPath(TM) Compiler Suite: Version 2.5 >> Built on: 2006-08-22 21:02:51 -0700 >> Thread model: posix >> GNU gcc version 3.3.1 (PathScale 2.5 driver) >> >> mpif90 --show >> pathf90 -I/home/fort/usr//include -pthread -I/home/fort/usr//lib -L/home/fort/usr//lib -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl >> >> The OpenMPI version 1.2.3 resides in the /home/fort/usr/ directory. 
>> >> When I compile a simple program using >> >> mpif90 -o test test.f90 >> >> I get a binary all right but it has broken linked libraries >> >> ldd test >> libmpi_f90.so.0 => not found >> libmpi_f77.so.0 => not found >> libmpi.so.0 => /usr/lib64/lam/libmpi.so.0 (0x003db360) >> libopen-rte.so.0 => not found >> libopen-pal.so.0 => not found >> libdl.so.2 => /lib64/libdl.so.2 (0x003db320) >> libnsl.so.1 => /lib64/libnsl.so.1 (0x003db990) >> libutil.so.1 => /lib64/libutil.so.1 (0x003db840) >> libmv.so.1 => /opt/pathscale/lib/2.5/libmv.so.1 (0x002a9557f000) >> libmpath.so.1 => /opt/pathscale/lib/2.5/libmpath.so.1 (0x002a956a8000) >> libm.so.6 => /lib64/tls/libm.so.6 (0x003db300) >> libpathfortran.so.1 => /opt/pathscale/lib/2.5/libpathfortran.so.1 (0x002a957c9000) >> libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x003db380) >> libc.so.6 => /lib64/tls/libc.so.6 (0x003db2d0) >> /lib64/ld-linux-x86-64.so.2 (0x003db290) >> >> The demo program fails to start due to missing shared libraries. In addition the pathf90 uses some lame mpi library instead of openMPI! Any ideas on where the problem could be? >> >> Michael >> >> >> Mgr. Michael Komm >> Tokamak Department >> Institute of Plasma Physics of Academy of Sciences of Czech Republic >> E-mail:k...@ipp.cas.cz >> Za Slovankou 3 >> 182 00 >> PRAGUE 8 >> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > >___ >users mailing list >us...@open-mpi.org >http://www.open-mpi.org/mailman/listinfo.cgi/users >
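To make Christian's fix stick and verify that it took, one can set the path and recheck the binary with ldd (library path as given in the thread; the binary name `test` is from Michael's compile line):

```shell
# Put the Open MPI libraries on the runtime search path, preserving any
# existing value of LD_LIBRARY_PATH.
export LD_LIBRARY_PATH=/home/fort/usr/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

# Confirm the formerly-missing libraries now resolve:
ldd ./test | grep -E 'libmpi|libopen'   # no line should say "not found"
```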
[OMPI users] sed: 33: ...: unescaped newline inside substitute pattern
I have been attempting to compile open-mpi, both 1.2.2 and 1.2.3 on a new iMac (core 2 duo, 2.4 GHz, OS X 10.4.10), using gfortran as my fortran compiler, and a very recent Xtools (ld -v gives version cctools-622.5.obj~13). I have tried both the full line, configure --prefix=/usr/local/openmpi --disable-mpi-cxx --disable-mpi- f90 --without-xgrid FC=gfortran as well as a truncated line, configure --prefix=/usr/local/openmpi and switched compilers via setenv FC g95 configure --prefix=/usr/local/openmpi --disable-mpi-cxx --disable-mpi- f90 --without-xgrid and in all cases, after minutes of working away, get to the point that someone else got to last year (when it tries to create the Makefiles, etc) and get the following output (approximately 200 pairs of sed:33 and sed:4's). This has been happening for over a week, with reboots every night. I attach the configure terminal output as well as the log file (for a 1.2.2 attempt). ompi-output.tar.gz Description: GNU Zip compressed data ...checking for OMPI LIBS... -lSystemStubs checking for OMPI extra include dirs... openmpi *** Final output configure: creating ./config.status config.status: creating ompi/include/ompi/version.h sed: 33: ./confstatkVPvQm/subs-3.sed: unescaped newline inside substitute pattern sed: 4: ./confstatkVPvQm/subs-4.sed: unescaped newline inside substitute pattern config.status: creating orte/include/orte/version.h sed: 33: ./confstatkVPvQm/subs-3.sed: unescaped newline inside substitute pattern Michael Clover mclo...@san.rr.com
[OMPI users] ompi-1.2.4 fails to make on iMac (10.4.10)
I was just trying to install openmpi-1.2.4 on a brand new iMac (2.4 GHZ Intel Core 2 Duo, 1GB RAM, OSX 10.4.10), having just loaded the xtools environnment. I am able to successfully run the configure, but make dies instantly: configure -prefix=/usr/local/openmpi --disable-mpi-cxx --disable-mpi- f90 --without-xgrid FC=gfortran | tee config.out ... config.status: executing depfiles commands config.status: executing libtool commands cloverm:~/bin/openmpi-1.2.4:[22]>make -j 4 |& tee make.out Makefile:602: *** missing separator. Stop. cloverm:~/bin/openmpi-1.2.4:[23]>ls *.out config.out make.out cloverm:~/bin/openmpi-1.2.4:[24]>tar -zcvf ompi-output.tar.gz *.log *.out config.log config.out make.out cloverm:~/bin/openmpi-1.2.4:[25]>ld -v Apple Computer, Inc. version cctools-622.5.obj~13 I have copied lines 599-609 from Makefile, so you can see that Make is trying to run gcc, in a way that doesn't look correct OMPI_AS_GLOBAL = OMPI_AS_LABEL_SUFFIX = OMPI_CC_ABSOLUTE = DISPLAY known /usr/bin/gcc OMPI_CONFIGURE_DATE = Sat Oct 6 16:05:59 PDT 2007 OMPI_CONFIGURE_HOST = michael-clovers-computer.local OMPI_CONFIGURE_USER = mrc OMPI_CXX_ABSOLUTE = DISPLAY known /usr/bin/g++ OMPI_F77_ABSOLUTE = none OMPI_F90_ABSOLUTE = none I am also attaching the tee'd results, the config.log, and the Makefile that doesn't work: cloverm:~/bin/openmpi-1.2.4:[27]>tar -zcvf ompi-output.tar.gz *.log *.out Makefile config.log config.out make.out Makefile ompi-output.tar.gz Description: GNU Zip compressed data Michael Clover mclo...@san.rr.com
[OMPI users] sed and openmpi on Mac OSX 10.4.10
Jeff, I tried to look at the checksum of my version of sed, and got a different number. I also found instructions on an Octave web page about loading the GNU sed on a Mac, to replace the POSIX flavored one that comes with it. I was able to load sed-4.1.4, but still don't get your checksums (I changed the name of the original Mac sed to __sed). Are you using the Mac supplied sed or not? cloverm:~/bin/openmpi-1.2.3:[132]>md5 /usr/local/bin/sed MD5 (/usr/local/bin/sed) = 01f9ed14ed1fa9fcf7406dd8a609 cloverm:~/bin/openmpi-1.2.3:[133]>md5 /usr/bin/__sed MD5 (/usr/bin/__sed) = e8e106779d71f6f2cca9c7157ce4b5eb However, this new sed made only a slight difference on openmpi-1.2.3: instead of getting unescaped newlines, I now get unterminated "s" commands: (and with openmpi-1.2.4, I still get the same problem reported yesterday when I try to "make" the successfully configured 1.2.4, namely that line 602 of Makefile is missing a separator). checking for OMPI LIBS... checking for OMPI extra include dirs... *** Final output configure: creating ./config.status config.status: creating ompi/include/ompi/version.h sed: file ./confstatA1BhUF/subs-3.sed line 33: unterminated `s' command sed: file ./confstatA1BhUF/subs-4.sed line 4: unterminated `s' command config.status: creating orte/include/orte/version.h Michael Clover mclo...@san.rr.com
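The two different error messages are consistent with two different sed implementations reading the same broken script: Apple's BSD sed reports "unescaped newline inside substitute pattern", while GNU sed reports "unterminated `s' command". A literal newline inside an s/// pattern, which is apparently what config.status generated into the subs-*.sed files, reproduces both:

```shell
# Feed sed an s/// command whose pattern contains a raw newline.
# BSD sed calls this an unescaped newline inside the substitute pattern;
# GNU sed treats the newline as ending the command and reports it as
# an unterminated `s' command.
err=$(printf 'abc\n' | sed 's/a
b/X/' 2>&1) || true
echo "$err"
```

So swapping in GNU sed changes the wording of the failure but not the underlying problem: configure is emitting a substitute pattern with a raw newline in it.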
[OMPI users] ompi-1.2.4 fails to make on iMac (10.4.10)
Reuti,

My gcc is also in /usr/bin as gcc:

cloverm:~:[5]>which gcc
/usr/bin/gcc
cloverm:~:[6]>gcc -v
Using built-in specs.
Target: i686-apple-darwin8
Configured with: /private/var/tmp/gcc/gcc-5367.obj~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=powerpc-apple-darwin8 --with-arch=nocona --with-tune=generic --program-prefix= --host=i686-apple-darwin8 --target=i686-apple-darwin8
Thread model: posix
gcc version 4.0.1 (Apple Computer, Inc. build 5367)

I thought the " DISPLAY known" might have been some result of my .tcshrc file, so I started up sh in a terminal window before running configure and make, but I still get the same error.

Michael Clover
mclo...@san.rr.com

On Oct 8, 2007, at 9:00 , users-requ...@open-mpi.org wrote:

Message: 2
Date: Mon, 8 Oct 2007 17:19:57 +0200
From: Reuti
Subject: Re: [OMPI users] ompi-1.2.4 fails to make on iMac (10.4.10)
To: Open MPI Users
Message-ID: <897b6169-808d-4581-a19b-4cb8da2e3...@staff.uni-marburg.de>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

Am 07.10.2007 um 01:16 schrieb Michael Clover:

I was just trying to install openmpi-1.2.4 on a brand new iMac (2.4 GHz Intel Core 2 Duo, 1 GB RAM, OS X 10.4.10), having just loaded the xtools environment. I am able to run configure successfully, but make dies instantly:

configure -prefix=/usr/local/openmpi --disable-mpi-cxx --disable-mpi-f90 --without-xgrid FC=gfortran | tee config.out
...
config.status: executing depfiles commands
config.status: executing libtool commands
cloverm:~/bin/openmpi-1.2.4:[22]>make -j 4 |& tee make.out
Makefile:602: *** missing separator.  Stop.
cloverm:~/bin/openmpi-1.2.4:[23]>ls *.out
config.out  make.out
cloverm:~/bin/openmpi-1.2.4:[24]>tar -zcvf ompi-output.tar.gz *.log *.out
config.log
config.out
make.out
cloverm:~/bin/openmpi-1.2.4:[25]>ld -v
Apple Computer, Inc. version cctools-622.5.obj~13

I have copied lines 599-609 from the Makefile, so you can see that make is trying to run gcc in a way that doesn't look correct:

OMPI_AS_GLOBAL =
OMPI_AS_LABEL_SUFFIX =
OMPI_CC_ABSOLUTE =  DISPLAY known
/usr/bin/gcc

The " DISPLAY known" shouldn't be there. What does a plain:

which gcc

give? Just /usr/bin/gcc, as for me, or something more?

-- Reuti

OMPI_CONFIGURE_DATE = Sat Oct 6 16:05:59 PDT 2007
OMPI_CONFIGURE_HOST = michael-clovers-computer.local
OMPI_CONFIGURE_USER = mrc
OMPI_CXX_ABSOLUTE =  DISPLAY known
/usr/bin/g++
OMPI_F77_ABSOLUTE = none
OMPI_F90_ABSOLUTE = none
[OMPI users] ompi-1.2.4 fails to make on iMac (10.4.10)
Jeff,

As it turned out, my .tcshrc file did output "DISPLAY known"... I had logic to set DISPLAY if it was undefined:

if ( ! $?DISPLAY ) then
    if ( ! $?SSH_CLIENT ) then
        if ( "$OS" == "darwin" ) then
            # ... irrelevant
        else
            echo "no environment variable to capture your IP from"
            set w_data = $user
            setenv DISPLAY ${w_data}:0.0
        endif
    else
        set whom = `echo $SSH_CLIENT`
        if ( $?whom ) then
            set i_am = `echo $whom[1] | sed -e "s/:::128/128/"`
            setenv DISPLAY ${i_am}:0.0
            echo " DISPLAY set from SSH_CLIENT"
        endif
    endif
else
    echo " DISPLAY known"
endif

I have commented out the echo of "DISPLAY known", and now openmpi-1.2.4 makes just fine. Furthermore, openmpi-1.2.3 no longer generates the unterminated sed commands either, and also makes correctly. I must have mistyped something when I grep'ed for "display" or "known" before my reply to Reuti, since I didn't find it until your question.

Thanks for all the help.

Michael Clover
mclo...@san.rr.com

On Oct 8, 2007, at 11:09 , users-requ...@open-mpi.org wrote:

Message: 5
Date: Tue, 9 Oct 2007 08:08:23 +0200
From: Jeff Squyres
Subject: Re: [OMPI users] ompi-1.2.4 fails to make on iMac (10.4.10)
To: Open MPI Users
Message-ID: <897eb321-9a89-4d8b-8b19-d53225573...@cisco.com>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

From the files you attached, I see the following in config.log:

OMPI_CC_ABSOLUTE=' DISPLAY known

and several lines later:

OMPI_CXX_ABSOLUTE=' DISPLAY known

But in Makefile, I see this bogus 2-line value (same as you noted):

OMPI_CC_ABSOLUTE =  DISPLAY known
/usr/bin/gcc

and several lines later:

OMPI_CXX_ABSOLUTE =  DISPLAY known
/usr/bin/g++

Note that we set these two values in configure with the following commands:

OMPI_CC_ABSOLUTE="`which $CC`"
OMPI_CXX_ABSOLUTE="`which $CXX`"

So I *suspect* that the bogus values in config.status are totally hosing you when trying to create all the other files -- the version of "sed" is a red herring.
What exactly is your output when you run "which gcc" and "which g++"? We are blindly taking the whole value -- mainly because I've never seen "which foo" give more than one line on stdout. ;-)

What *could* be happening is that your shell startup files are generating some output (e.g., "DISPLAY known") and that's being output before "which foo" is run because of the `` usage. Do your shell startup files emit "DISPLAY known" when you start up?

--
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

End of users Digest, Vol 713, Issue 1
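[Editor's note: Jeff's diagnosis can be reproduced without Open MPI at all -- anything a sourced startup file prints becomes part of the value captured by backticks. A minimal sketch, using a hypothetical /tmp/noisy_rc file to stand in for the user's .tcshrc:]

```shell
# Simulate a startup file that echoes a status line:
cat > /tmp/noisy_rc <<'EOF'
echo " DISPLAY known"
EOF

# configure effectively runs: OMPI_CC_ABSOLUTE="`which $CC`"
# If the shell evaluating the backticks sources the noisy file first,
# the echoed text lands in the captured value ahead of the real path:
captured=`sh -c '. /tmp/noisy_rc; which sh'`
echo "$captured"
```

The captured variable ends up a two-line value -- the echoed " DISPLAY known" followed by the real path -- which is exactly the bogus 2-line Makefile assignment seen above.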
[OMPI users] mpicc Segmentation Fault with Intel Compiler
Hi,

I have the same problem described by some other users: I can't compile anything if I'm using an Open MPI that was built with the Intel compiler.

> ompi_info --all
Segmentation fault

OpenSUSE 10.3
Kernel: 2.6.22.9-0.4-default
Intel P4
Configure flags: CC=icc, CXX=icpc, F77=ifort, F90=ifort
Intel compiler: both C and Fortran 10.0.025

Is there any known solution?

Thanks, Michael
Re: [OMPI users] mpicc Segmentation Fault with Intel Compiler
On 06.11.2007, at 10:42, Åke Sandgren wrote:

Hi,

On Tue, 2007-11-06 at 10:28 +0100, Michael Schulz wrote:

> Hi, I've the same problem described by some other users, that I can't compile anything if I'm using the Open MPI compiled with the Intel compiler.
>
> ompi_info --all
> Segmentation fault
>
> OpenSUSE 10.3
> Kernel: 2.6.22.9-0.4-default
> Intel P4
> Configure flags: CC=icc, CXX=icpc, F77=ifort, F90=ifort
> Intel compiler: both C and Fortran 10.0.025
>
> Is there any known solution?

I had the same problem with pathscale. Try this; I think it is the fix I found:

diff -ru site/opal/runtime/opal_init.c amd64_ubuntu606-psc/opal/runtime/opal_init.c
--- site/opal/runtime/opal_init.c	2007-10-20 03:00:35.0 +0200
+++ amd64_ubuntu606-psc/opal/runtime/opal_init.c	2007-10-23 16:12:15.0 +0200
@@ -169,7 +169,7 @@
     }

     /* register params for opal */
-    if (OPAL_SUCCESS != opal_register_params()) {
+    if (OPAL_SUCCESS != (ret = opal_register_params())) {
         error = "opal_register_params";
         goto return_error;
     }

Thanks, but this doesn't solve my segv problem.

Michael
[OMPI users] CfP 3rd Workshop on Virtualization in HPC Cluster and Grid Computing Environments (VHPC'08)
Apologies if you received multiple copies of this message.

=== CALL FOR PAPERS ===

3rd Workshop on Virtualization in High-Performance Cluster and Grid Computing (VHPC'08)
as part of Euro-Par 2008, Las Palmas de Gran Canaria, Canary Islands, Spain

Date: August 26-29, 2008
Euro-Par 2008: http://europar2008.caos.uab.es/
Workshop URL: http://xhpc.wu-wien.ac.at/

SUBMISSION DEADLINE:
Abstracts: February 4, 2008
Full Paper: April 14, 2008

Scope:

Virtual machine monitors (VMMs) are becoming tightly integrated with standard OS distributions, leading to increased adoption in many application areas, including scientific, educational, and high-performance computing (HPC). VMMs allow for the concurrent execution of potentially large numbers of virtual machines, providing encapsulation, isolation, and the possibility of migrating VMs between physical hosts. These features enable physical clusters to be treated as "computation pools", where a variety of execution environments can be dynamically instantiated on the underlying hardware. VM technology is therefore opening up new architectures and services for HPC in cluster and grid environments, but consensus has not yet emerged on the best models and tools. This workshop aims to bring together researchers and practitioners working on virtualization in HPC environments, with the goal of sharing experience and promoting the development of a research community in this emerging area.

The workshop will be one day in length, composed of 20 min paper presentations, each followed by a 10 min discussion section. Presentations may be accompanied by interactive demonstrations. The workshop will also include a 30 min panel discussion by presenters.
TOPICS

Topics include, but are not limited to, the following subject matters:

- Virtualization in cluster and grid environments
- Workload characterizations for VM-based clusters
- VM cluster and grid architectures
- Cluster reliability, fault-tolerance, and security
- Compute job entry and scheduling
- Compute workload load leveling
- Cluster and grid filesystems for VMs
- VMMs, VMs and QoS guarantees
- Research and education use cases
- VM cluster distribution algorithms
- MPI, PVM on virtual machines
- System sizing
- Hardware support for virtualization
- High-speed interconnects in hypervisors
- Hypervisor extensions and utilities for cluster and grid computing
- Network architectures for VM-based clusters
- VMMs/Hypervisors on large SMP machines
- Performance models
- Performance management and tuning of hosts and guest VMs
- Power considerations
- VMM performance tuning on various load types
- Xen/other VMM cluster/grid tools
- High-speed device access from VMs
- Management and deployment of clusters and grid environments with VMs
- Information systems for virtualized clusters
- Management of system images for virtual machines
- Integration with relevant standards, e.g. CIM, GLUE, OGF, etc.

PAPER SUBMISSION

Papers submitted to the workshop will be reviewed by at least two members of the program committee and external reviewers. Submissions should include an abstract, key words, and the e-mail address of the corresponding author, and must not exceed 10 pages, including tables and figures, at a main font size no smaller than 11 point. Submission of a paper should be regarded as a commitment that, should the paper be accepted, at least one of the authors will register and attend the conference to present the work. Accepted papers will be published in the Springer LNCS series; the format must follow the Springer LNCS style. Initial submissions are in PDF; accepted papers will be requested to provide source files.
http://www.springer.de/comp/lncs/authors.html

Submission Link: http://www.edas.info/newPaper.php?c=6123

IMPORTANT DATES

Abstract submissions due: February 4, 2008
Full paper submissions due: April 14, 2008
Acceptance notification: May 3, 2008
Camera-ready due: May 26, 2008
Conference: August 26-29, 2008

CHAIRS

Michael Alexander (chair), WU Vienna, Austria
Stephen Childs (co-chair), Trinity College, Dublin, Ireland

PROGRAM COMMITTEE

Jussara Almeida, Federal University of Minas Gerais, Brazil
Padmashree Apparao, Intel Corp., US
Hassan Barada, Etisalat University College, UAE
Volker Buege, University of Karlsruhe, Germany
Simon Crosby, Xensource, UK
Marcus Hardt, Forschungszentrum Karlsruhe, Germany
Sverre Jarp, CERN, Switzerland
Krishna Kant, Intel Corporation, US
Yves Kemp, University of Karlsruhe, Germany
Naoya Maruyama, Tokyo Institute of Technology, Japan
Jean-Marc Menaud, Ecole des Mines de Nantes, France
José E. Moreira, IBM Watson Research Center, US
Yoshio Turner, HP Labs
Andreas Unterkircher, CERN, Switzerland
Dongyan Xu, Purdue University, US

GENERAL INFORMATION

The workshop will be held as part of Euro-Par 2008, Las Palmas de Gran Canaria, C
Re: [OMPI users] RPM build errors when creating multiple rpms
On Tuesday, 18 March 2008, at 12:15:34 (-0700), Christopher Irving wrote:

> Now, if you removed lines 651 and 653 from the new spec file it works for both cases. You won't get the "files listed twice" error because, although you have the statement %dir %{_prefix} on line 649, you never have a line with just %{_prefix}. So the _prefix directory itself gets included, but not all files underneath it. You've handled that by explicitly including all files and subdirectories on lines 672-681 and in the runtime.file.

The only package which should own %{_prefix} is something like setup or filesystem in the core OS package set. No openmpi RPM should ever own %{_prefix}, so it should never appear in %files, either by itself or with %dir.

> Going back to the original spec file, the one that came with the source RPM, the problems were kind of reversed. Building with the 'install_in_opt 1' option worked just fine, but when it wasn't set you got the "files listed twice" error, as I described in my original message.

"files listed twice" messages are not errors, per se, and can usually be safely ignored. Those who are truly bothered by them can always add %exclude directives if they so choose.

Michael
--
Michael Jennings
Linux Systems and Cluster Admin
UNIX and Cluster Computing Group
Re: [OMPI users] RPM build errors when creating multiple rpms
On Tuesday, 18 March 2008, at 18:18:36 (-0700), Christopher Irving wrote:

> Well, you're half correct. You're thinking that _prefix is always defined as /usr.

No, actually I'm not. :)

> But in the case where install_in_opt is defined, they have redefined _prefix to be /opt/%{name}/%{version}, in which case it is fine for one of the openmpi RPMs to claim that directory with a %dir directive.

Except that you should never do that. First off, RPMs should never install in /opt by default. Secondly, the correct way to support installing in /opt is to list the necessary prefixes in the RPM headers so that the --prefix option (or the --relocate option) may be used at install time. OpenMPI already has hooks (IIRC) for figuring things out intelligently based on invocation prefix, so it should fit quite nicely into this model. Obviously RPMs only intended for local use can do anything they want, but RPMs which install in /opt should never be redistributed.

> However, I think you missed the point. I'm not suggesting they need to add a %{_prefix} statement in the %files section; I'm just pointing out what's not the source of the duplicated files. In other words, %dir %{_prefix} is not the same as %{_prefix} and won't cause all the files in _prefix to be included.

That's correct.

> It can't be safely ignored when it causes the rpm build to fail.

The warning by itself should never cause rpmbuild to fail. If it does, the problem lies elsewhere. Nothing in either the rpm 4.4 nor the rpm 5 code can cause failure at that point.

> Also, you don't want to use an %exclude because that would prevent the specified files from ever getting included, which is not the desired result.

If you use %exclude in only one of the locations where the file is listed (presumably the "less correct" one), it will solve the problem.

Michael
--
Michael Jennings
Linux Systems and Cluster Admin
UNIX and Cluster Computing Group
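[Editor's note: for readers following along, the %files conventions being contrasted in this exchange look roughly like the illustrative fragment below -- this is not the actual openmpi.spec, and the file paths are hypothetical:]

```spec
%files runtime
# "%dir" claims only the directory entry itself, none of the files in it:
%dir %{_prefix}/share/openmpi
# A bare path pattern pulls in everything it matches; if two patterns
# match the same file, rpmbuild emits the "File listed twice" warning:
%{_prefix}/bin/mpirun
%{_prefix}/lib/*.so.*
# Masking one of the duplicate listings with %exclude silences the
# warning -- but drops the file from this package entirely:
#%exclude %{_prefix}/share/openmpi/help-mpirun.txt
```

As Michael notes above, the directory %{_prefix} itself should be owned by a base package such as filesystem, so a spec would normally apply %dir only to subdirectories it genuinely creates.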
Re: [OMPI users] cluster LiveCD
On Thursday, 07 August 2008, at 15:03:24 (-0400), Tim Mattox wrote:

> I think a better approach than using NFS-root or LiveCDs is to use Perceus in this situation, since it has been developed over many years to handle this sort of thing (diskless/stateless Beowulf clusters): http://www.perceus.org/
> It leverages PXE booting, so all you need to do on a per-node basis is enable PXE booting in the BIOS. The primary limitation I see would be if your Windows machines are set up to use DHCP to get their IP addresses from some server that is outside your control, since Perceus would need to take over DHCP services to do its magic.

At the risk of being slightly off-topic, Perceus actually has no problem working with a separate DHCP server. It has to be properly configured to hand out the payload, of course, but it works fine.

Michael
--
Michael Jennings
Linux Systems and Cluster Admin
UNIX and Cluster Computing Group
Bldg 50B-3209E  W: 510-495-2687
MS 050C-3396    F: 510-486-8615
[OMPI users] build problems - undefined reference to `lt_libltdlc_LTX_preloaded_symbols and libtool install
I'm building from today's svn checkout on an x86_64 CentOS 5 box (Linux 2.6.18-128.1.10.el5 #1 SMP) using

m4 (GNU M4) 1.4.13
automake (GNU automake) 1.11
autoconf (GNU Autoconf) 2.64
ltmain.sh (GNU libtool) 2.2.6
gcc (GCC) 4.3.2

and configured with

../configure --prefix=$HOME/openmpi --srcdir=.. --disable-mpi-f77 --disable-mpi-f90

and get

libtool: link: gcc -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -fvisibility=hidden -o opal_wrapper opal_wrapper.o ../../../opal/.libs/libopen-pal.a -ldl -lnsl -lutil -lm -pthread
../../../opal/.libs/libopen-pal.a(libltdlc_la-ltdl.o): In function `lt_dlinit':
ltdl.c:(.text+0x10d3): undefined reference to `lt_libltdlc_LTX_preloaded_symbols'

Is anyone familiar with this or what to do about it? If I try to avoid it with

../configure --prefix=$HOME/openmpi --srcdir=.. --disable-mpi-f77 --disable-mpi-f90 --disable-dlopen

I 'make -j 4' successfully, but during 'make install' get

/bin/sh ../../../libtool --mode=install /usr/bin/install -c opal_wrapper '/home/hines/openmpi/bin'
./opal_wrapper: line 1: ELF: command not found
libtool: install: invalid libtool wrapper script `opal_wrapper'

Hints on how to build on this machine are greatly welcome. I had the same problems when using openmpi-1.3.3.tar.gz and my normal development environment (less recent m4 and autotools, and gcc-4.1.2).

Thanks,
Michael
Re: [OMPI users] build problems - undefined referenceto `lt_libltdlc_LTX_preloaded_symbols and libtool install
Hello. Thanks!

On Wed, 2009-09-02 at 10:51 +0300, Jeff Squyres wrote:
> On Aug 27, 2009, at 8:34 PM, Michael Hines wrote:
...
> > ltdl.c:(.text+0x10d3): undefined reference to
> > `lt_libltdlc_LTX_preloaded_symbols'
>
> Hmm. This feels like a mismatch of libtool somehow... (ltdl is a part of the larger Libtool package). Can you send all the information listed here:

Enclosed. I started from openmpi-1.3.3.tar.gz and

tar xzf ~/Desktop/openmpi-1.3.3.tar.gz
cd openmpi-1.3.3
./configure --prefix=$HOME/openmpi --disable-mpi-f77 --disable-mpi-f90 &> config.out
make all &> make.out
tar czf ompibld.tgz config.out config.log make.out

...

> > /bin/sh ../../../libtool --mode=install /usr/bin/install -c opal_wrapper '/home/hines/openmpi/bin'
> > ./opal_wrapper: line 1: ELF: command not found
> > libtool: install: invalid libtool wrapper script `opal_wrapper'
>
> This seems like an even bigger problem -- ELF is not a command, so how it's trying to execute that seems pretty nebulous.

I guess opal_wrapper is supposed to be a script that contains the executable, but it turned out to be just the executable. If the earlier issue is resolved (presumably libtool related), this may go away as well.

-Michael
Re: [OMPI users] build problems - undefined referenceto `lt_libltdlc_LTX_preloaded_symbols and libtool install
On Wed, 2009-09-02 at 10:51 +0300, Jeff Squyres wrote:
> On Aug 27, 2009, at 8:34 PM, Michael Hines wrote:
> > libtool: link: gcc -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -fvisibility=hidden -o opal_wrapper opal_wrapper.o ../../../opal/.libs/libopen-pal.a -ldl -lnsl -lutil -lm -pthread
> > ../../../opal/.libs/libopen-pal.a(libltdlc_la-ltdl.o): In function `lt_dlinit':
> > ltdl.c:(.text+0x10d3): undefined reference to `lt_libltdlc_LTX_preloaded_symbols'
>
> Hmm. This feels like a mismatch of libtool somehow... (ltdl is a part of the larger Libtool package).

I should mention that when I run

[hines@hines490 openmpi-1.3.3]$ for i in `find . -name \*.o -print` ; do echo $i ; nm $i | grep preloaded ; done

the only files that have "preloaded" in them are

./opal/libltdl/libltdlc_la-preopen.o
0008 b default_preloaded_symbols
     b preloaded_symlists
./opal/libltdl/libltdlc_la-ltdl.o
     U lt_libltdlc_LTX_preloaded_symbols

and I can't find anywhere that lt_libltdlc_LTX_preloaded_symbols is defined.

-Michael
[OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)
Dear readers,

With OpenMPI, how would one go about requesting to load environment modules (of the http://modules.sourceforge.net/ kind) on remote nodes, augmenting those normally loaded there by shell dotfiles?

Background: I run a RHEL-5/CentOS-5 cluster. I load a bunch of default modules through /etc/profile.d/ and recommend that users customize modules in ~/.bashrc. A problem arises for PBS jobs which might need job-specific modules, e.g., to pick a specific flavor of an application. With other MPI implementations (ahem) which export all (or judiciously nearly all) environment variables by default, you can say:

#PBS ...
module load foo    # not for OpenMPI
mpirun -np 42 ... \
    bar-app

Not so with OpenMPI - any such customization is only effective for processes on the master (=local) node of the job, and any variables changed by a given module would have to be specifically passed via mpirun -x VARNAME. On the remote nodes, those variables are not available in the dotfiles because they are passed only once orted is live (after dotfile processing by the shell), which then immediately spawns the application binaries (right?)

I thought along the following lines:

(1) I happen to run Lustre, which would allow writing a file coherently across nodes prior to mpirun, and thus hooking into the shell dotfile processing, but that seems rather crude.

(2) "mpirun -x PATH -x LD_LIBRARY_PATH ..." would take care of a lot, but is not really general.

Is there a recommended way?

regards,
Michael
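[Editor's note: for concreteness, the mpirun -x workaround described above looks like this in a job script. This is a sketch only -- the module name "mymod", the variable MYMOD_HOME, and the resource requests are hypothetical, and every variable the module touches has to be named by hand:]

```shell
#PBS -l nodes=2:ppn=8
#PBS -N envtest

# Loading the module only changes the environment on the master node
# of the job; remote ranks never see it through their dotfiles:
module load mymod

# So each variable the module set must be forwarded explicitly.
# mpirun -x exports the named variable from the local environment
# to every launched process:
mpirun -np 16 \
    -x PATH -x LD_LIBRARY_PATH -x MYMOD_HOME \
    bar-app
```

This works, but as noted above it is not general: a module may set arbitrary variables, and each one needs its own -x flag.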
Re: [OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)
Hi David,

Hmm, your demo is well-chosen and crystal-clear, yet the output is unexpected. I do not see environment vars passed by default here:

login3$ qsub -l nodes=2:ppn=1 -I
qsub: waiting for job 34683.mds01 to start
qsub: job 34683.mds01 ready
n102$ mpirun -n 2 -machinefile $PBS_NODEFILE hostname
n102
n085
n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO
n102$ export FOO=BAR
n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO
FOO=BAR
n102$ type mpirun
mpirun is hashed (/opt/soft/openmpi-1.3.2-intel10-1/bin/mpirun)

Curious, what do you get upon:

where mpirun

I built OpenMPI-1.3.2 here from source with:

CC=icc CXX=icpc FC=ifort F77=ifort \
LDFLAGS='-Wl,-z,noexecstack' \
CFLAGS='-O2 -g -fPIC' \
CXXFLAGS='-O2 -g -fPIC' \
FFLAGS='-O2 -g -fPIC' \
./configure --prefix=$prefix \
    --with-libnuma=/usr \
    --with-openib=/usr \
    --with-udapl \
    --enable-mpirun-prefix-by-default \
    --without-tm

I didn't find the behavior I saw strange, given that orterun(1) talks only about $OMPI_* and inheritance from the remote shell. It also mentions a "boot MCA module", about which I couldn't find much on open-mpi.org - hmm.

In the meantime, I did find a possible solution, namely, to tell ssh to pass a variable using SendEnv/AcceptEnv. That variable is then seen by and can be interpreted (cautiously) in /etc/profile.d/ scripts. A user could set it in the job file (or even qalter it post-submission):

#PBS -v VARNAME=foo:bar:baz

For VARNAME, I think simply "MODULES" or "EXTRAMODULES" could do.

With best regards,
Michael

On Nov 17, 2009, at 4:29 , David Singleton wrote:
>
> I'm not sure why you don't see Open MPI behaving like other MPIs w.r.t. modules/environment on remote MPI tasks - we do.
> > xe:~ > qsub -q express -lnodes=2:ppn=8,walltime=10:00,vmem=2gb -I > qsub: waiting for job 376366.xepbs to start > qsub: job 376366.xepbs ready > > [dbs900@x27 ~]$ module load openmpi > [dbs900@x27 ~]$ mpirun -n 2 --bynode hostname > x27 > x28 > [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep FOO > [dbs900@x27 ~]$ setenv FOO BAR > [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep FOO > FOO=BAR > FOO=BAR > [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep amber > [dbs900@x27 ~]$ module load amber > [dbs900@x27 ~]$ mpirun -n 2 --bynode env | grep amber > LOADEDMODULES=openmpi/1.3.3:amber/9 > PATH=/apps/openmpi/1.3.3/bin:/home/900/dbs900/bin:/bin:/usr/bin::/opt/bin:/usr/X11R6/bin:/opt/pbs/bin:/sbin:/usr/sbin:/apps/amber/9/exe > _LMFILES_=/apps/Modules/modulefiles/openmpi/1.3.3:/apps/Modules/modulefiles/amber/9 > AMBERHOME=/apps/amber/9 > LOADEDMODULES=openmpi/1.3.3:amber/9 > PATH=/apps/openmpi/1.3.3/bin:/home/900/dbs900/bin:/bin:/usr/bin:/opt/bin:/usr/X11R6/bin:/opt/pbs/bin:/sbin:/usr/sbin:/apps/amber/9/exe > _LMFILES_=/apps/Modules/modulefiles/openmpi/1.3.3:/apps/Modules/modulefiles/amber/9 > AMBERHOME=/apps/amber/9 > > David > > > Michael Sternberg wrote: >> Dear readers, >> With OpenMPI, how would one go about requesting to load environment modules >> (of the http://modules.sourceforge.net/ kind) on remote nodes, augmenting >> those normally loaded there by shell dotfiles? >> Background: >> I run a RHEL-5/CentOS-5 cluster. I load a bunch of default modules through >> /etc/profile.d/ and recommend to users to customize modules in ~/.bashrc. A >> problem arises for PBS jobs which might need job-specific modules, e.g., to >> pick a specific flavor of an application. With other MPI implementations >> (ahem) which export all (or judiciously nearly all) environment variables by >> default, you can say: >> #PBS ... >> module load foo # not for OpenMPI >> mpirun -np 42 ... 
\ >> bar-app >> Not so with OpenMPI - any such customization is only effective for processes >> on the master (=local) node of the job, and any variables changed by a given >> module would have to be specifically passed via mpirun -x VARNAME. On the >> remote nodes, those variables are not available in the dotfiles because they >> are passed only once orted is live (after dotfile processing by the shell), >> which then immediately spawns the application binaries (right?) >> I thought along the following lines: >> (1) I happen to run Lustre, which would allow writing a file coherently >> across nodes prior to mpirun, and thus hook into the shell dotfile >> processing, but that seems rather crude. >> (2) "mpirun -x PATH -x LD_LIBRARY_PATH …" would take care of a lot, but is >> not really general. >> Is there a recommended way? >> regards, >> Michael > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)
Hi, On Nov 17, 2009, at 9:10 , Ralph Castain wrote: > Not exactly. It completely depends on how Torque was setup - OMPI isn't > forwarding the environment. Torque is. I actually tried compiling OMPI with the tm interface a couple of versions back for both packages but ran into memory trouble, which is why I didn't pursue this. With torque-2.4.x out and OpenMPI getting close to 1.3.4 I'll try again. > We made a design decision at the very beginning of the OMPI project not to > forward non-OMPI envars unless directed to do so by the user. I'm afraid I > disagree with Michael's claim that other MPIs do forward them - yes, MPICH > does, but not all others do. > > The world is bigger than MPICH and OMPI :-) Yup, I saw your message from just last month http://www.open-mpi.org/community/lists/users/2009/10/10994.php ; I didn't mean to make a global claim :-) I'm aware that exporting environment variables (including $PWD) under MPI is implementation dependent. I just happened to have MPICH, Intel MPI (same roots), and OpenMPI on my cluster. > First, if you are using a managed environment like Torque, we recommend that > you work with your sys admin to decide how to configure it. This is the best > way to resolve a problem. Yeah, I wish that guy would know better and not have to ask around mailing lists :-) > Second, if you are not using a managed environment and/or decide not to have > that environment do the forwarding, you can tell OMPI to forward the envars > you need by specifying them via the -x cmd line option. We already have a > request to expand this capability, and I will be doing so as time permits. > One option I'll be adding is the reverse of -x - i.e., "forward all envars > -except- the specified one(s)". The issue with -x is that modules may set any random variable. The reverse option to -x would be great of course. MPICH2 and Intel MPI pass all but a few (known to be host-specific) variables by default, and counter that with "none" and "all" options. 
Thanks! Michael > HTH > ralph > > On Nov 17, 2009, at 5:55 AM, David Singleton wrote: > >> >> I can see the difference - we built Open MPI with tm support. For some >> reason, I thought mpirun fed its environment to orted (after orted is >> launched) so orted can pass it on to MPI tasks. That should be portable >> between different launch mechanisms. But it looks like tm launches >> orted with the full mpirun environment (at the request of mpirun). >> >> Cheers, >> David >> >> >> Michael Sternberg wrote: >>> Hi David, >>> Hmm, your demo is well-chosen and crystal-clear, yet the output is >>> unexpected. I do not see environment vars passed by default here: >>> login3$ qsub -l nodes=2:ppn=1 -I >>> qsub: waiting for job 34683.mds01 to start >>> qsub: job 34683.mds01 ready >>> n102$ mpirun -n 2 -machinefile $PBS_NODEFILE hostname >>> n102 >>> n085 >>> n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO >>> n102$ export FOO=BAR >>> n102$ mpirun -n 2 -machinefile $PBS_NODEFILE env | grep FOO >>> FOO=BAR >>> n102$ type mpirun >>> mpirun is hashed (/opt/soft/openmpi-1.3.2-intel10-1/bin/mpirun) >>> Curious, what do you get upon: >>> where mpirun >>> I built OpenMPI-1.3.2 here from source with: >>> CC=icc CXX=icpc FC=ifort F77=ifort \ >>> LDFLAGS='-Wl,-z,noexecstack' \ >>> CFLAGS='-O2 -g -fPIC' \ >>> CXXFLAGS='-O2 -g -fPIC' \ >>> FFLAGS='-O2 -g -fPIC' \ >>> ./configure --prefix=$prefix \ >>> --with-libnuma=/usr \ >>> --with-openib=/usr \ >>> --with-udapl \ >>> --enable-mpirun-prefix-by-default \ >>> --without-tm >>> I did't find the behavior I saw strange, given that orterun(1) talks only >>> about $OPMI_* and inheritance from the remote shell. It also mentions a >>> "boot MCA module", about which I couldn't find much on open-mpi.org - hmm. >>> In the meantime, I did find a possible solution, namely, to tell ssh to >>> pass a variable using SendEnv/AcceptEnv. That variable is then seen by and >>> can be interpreted (cautiously) in /etc/profile.d/ scripts. 
A user could >>> set it in the job file (or even qalter it post submission): >>> #PBS -v VARNAME=foo:bar:baz >>> For VARNAME, I think simply "MODULES" or "EXTRAMODULES" could do. >>> With best regards, >>> Michael >>> On Nov 17, 2009, at 4
Re: [OMPI users] custom modules per job (PBS/OpenMPI/environment-modules)
On Nov 17, 2009, at 10:17 , Michael Sternberg wrote:
> On Nov 17, 2009, at 9:10 , Ralph Castain wrote:
> > Not exactly. It completely depends on how Torque was setup - OMPI isn't forwarding the environment. Torque is.
>
> I actually tried compiling OMPI with the tm interface a couple of versions back for both packages but ran into memory trouble, which is why I didn't pursue this. With torque-2.4.x out and OpenMPI getting close to 1.3.4 I'll try again.

Follow-up: I recompiled OpenMPI-1.3.2 "--with-tm" (from torque-2.3.6) and, lo and behold, environment variables and modules are now passed across nodes, which thus includes custom modules loaded in the job file. Darn, that was an old hang-up!

The variables passed do include (unsurprisingly) $HOSTNAME, but I can live with that:

login4 $ qsub -l nodes=2:ppn=1 -I
qsub: waiting for job 34717.mds01 to start
qsub: job 34717.mds01 ready
n102 $ mpirun hostname
n102
n091
n102 $ mpirun env | grep HOSTNAME
HOSTNAME=n102
HOSTNAME=n102

Ralph, David - thank you for the pointers!

Michael