Re: [OMPI users] problems compiling openmpi-1.6 on some platforms
On Jun 7, 2012, at 10:27 AM, Siegmar Gross wrote:

> thank you very much for your help. You were right with your suggestion
> that one of our system commands is responsible for the segmentation
> fault. After splitting the command in config.status I found out that
> gawk was responsible. We installed the latest version and now
> everything works fine. Thank you very much once more.

Excellent -- glad you have it working!

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Question on ./configure error on Tru64unix (OSF1) v5.1B-6 for openmpi-1.6
To be honest, I don't think we've ever tested on Tru64, so I'm not surprised that it doesn't work. Indeed, I think that it is unlikely that we will ever support Tru64. :-(

Sorry!

On Jun 7, 2012, at 12:43 PM, wrote:

> Hello,
>
> I am having trouble with the *** Assembler section of the GNU autoconf
> step in trying to build OpenMPI version 1.6 on an HP AlphaServer GS160
> running Tru64unix version 5.1B-6:
>
> # uname -a
> OSF1 zozma.cts.cwu.edu V5.1 2650 alpha
>
> The output of the ./configure run
> zozma(bash)% ./configure --prefix=/usr/local/OpenMPI \
> --enable-shared --enable-static :
>
> ...
>
> *** Assembler
> checking dependency style of gcc... gcc3
> checking for BSD- or MS-compatible name lister (nm)... /usr/local/bin/nm -B
> checking the name lister (/usr/local/bin/nm -B) interface... BSD nm
> checking for fgrep... /usr/local/bin/grep -F
> checking if need to remove -g from CCASFLAGS... no
> checking whether to enable smp locks... yes
> checking if .proc/endp is needed... no
> checking directive for setting text section... .text
> checking directive for exporting symbols... .globl
> checking for objdump... objdump
> checking if .note.GNU-stack is needed... no
> checking suffix for labels... :
> checking prefix for global symbol labels... none
> configure: error: Could not determine global symbol label prefix
>
> The ./config.log is appended.
>
> Can anyone provide some information or suggestions on how to resolve this
> issue?
>
> Thank you for your assistance,
> Bill Glessner - System programmer, Central Washington University

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] problems compiling openmpi-1.6 on some platforms
Hello,

> >>> Unfortunately "cc" on Linux creates the following error.
> >>>
> >>> ln -s "../../../openmpi-1.6/opal/asm/generated/
> >>> atomic-ia32-linux-nongas.s" atomic-asm.S
> >>> CPPAS atomic-asm.lo
> >>> :19:0: warning: "__FLT_EVAL_METHOD__" redefined
> >>> [enabled by default]
> >>> :110:0: note: this is the location of the previous definition
> >>> cpp: fatal error: -fuse-linker-plugin, but liblto_plugin.so not found
> >>> compilation terminated.
> >>> cc: cpp failed for atomic-asm.S
> >>> make[2]: *** [atomic-asm.lo] Error 1
> >>> make[2]: Leaving directory `/.../opal/asm'
> >>> make[1]: *** [all-recursive] Error 1
> >>> make[1]: Leaving directory `/.../opal'
> >>> make: *** [all-recursive] Error 1
> >>
> >> What compiler is "cc"?
> >
> > "Sun C 5.12" (Oracle Solaris Studio 12.3 for Linux). Do you need
> > anything else?
>
> Ah. I will have to defer this to my Oracle brethren, then...

Today I edited ".../openmpi-1.6-Linux.x86_64.64_cc/libtool", removed "|-fuse-linker-plugin" in line 6295, and ran "config.status" once more. Afterwards I could compile and install Open MPI. Can somebody fix this problem in libtool?

There was one more warning:

log.make.Linux.x86_64.64_cc:configure: WARNING: unrecognized options: --enable-ltdl-convenience

Furthermore, there is a stale link:

tyr src 627 ls /usr/local/openmpi-1.6_64_cc/share/man/man1/orteCC.1
ls: /usr/local/openmpi-1.6_64_cc/share/man/man1/orteCC.1: No such file or directory
tyr src 628 ls -l /usr/local/openmpi-1.6_64_cc/share/man/man1/orteCC.1
lrwxrwxrwx 1 root root 9 Jun 8 13:34 /usr/local/openmpi-1.6_64_cc/share/man/man1/orteCC.1 -> ortec++.1

Should it be linked to mpic++.1?

I found another warning with Sun C 5.12, which shouldn't be a problem.

configure:54329: checking stdbool.h usability
configure:54329: cc -c -O -DNDEBUG -m64 conftest.c >&5
"/usr/include/stdbool.h", line 42: #error: "Use of <stdbool.h> is valid only in a c99 compilation environment."
cc: acomp failed for conftest.c
configure:54329: $? = 2
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "Open MPI"
| #define PACKAGE_TARNAME "openmpi"
| #define PACKAGE_VERSION ""
...
| #include <stdbool.h>
configure:54329: result: no
configure:54329: checking stdbool.h presence
configure:54329: cpp conftest.c
configure:54329: $? = 0
configure:54329: result: yes
configure:54329: WARNING: stdbool.h: present but cannot be compiled
configure:54329: WARNING: stdbool.h: check for missing prerequisite headers?
configure:54329: WARNING: stdbool.h: see the Autoconf documentation
configure:54329: WARNING: stdbool.h: section "Present But Cannot Be Compiled"
configure:54329: WARNING: stdbool.h: proceeding with the compiler's result
configure:54329: checking for stdbool.h
configure:54329: result: no
configure:54341: checking if <stdbool.h> works
configure:54374: result: no (don't have <stdbool.h>)

I wrote the above definitions and includes into a file and added a main function.

cc -c -O -DNDEBUG -m64 stdbool_error.c
"/usr/include/stdbool.h", line 42: #error: "Use of <stdbool.h> is valid only in a c99 compilation environment."
cc: acomp failed for stdbool_error.c

cc -c -xc99 -O -DNDEBUG -m64 stdbool_error.c

We need "-xc99" if "stdbool.h" should be used.

Kind regards

Siegmar
[OMPI users] Bug when mixing sent types in version 1.6
Hi everybody,

I currently have a bug when launching a very simple MPI program with mpirun on connected nodes. It happens when I send an INT and then some CHAR strings from a master node to a worker node. Here is the minimal code to reproduce the bug:

#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    const char someString[] = "Can haz cheezburgerz?";

    MPI_Init(&argc, &argv);

    MPI_Comm_rank( MPI_COMM_WORLD, & rank );
    MPI_Comm_size( MPI_COMM_WORLD, & size );

    if ( rank == 0 )
    {
        int len = strlen( someString );
        int i;
        for( i = 1; i < size; ++i)
        {
            MPI_Send( &len, 1, MPI_INT, i, 0, MPI_COMM_WORLD );
            MPI_Send( &someString, len+1, MPI_CHAR, i, 0, MPI_COMM_WORLD );
        }
    } else {
        char buffer[ 128 ];
        int receivedLen;
        MPI_Status stat;
        MPI_Recv( &receivedLen, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &stat );
        printf( "[Worker] Length : %d\n", receivedLen );
        MPI_Recv( buffer, receivedLen+1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);
        printf( "[Worker] String : %s\n", buffer );
    }

    MPI_Finalize();
}

I know that there is a better way to send a string, by giving a maximum buffer size to the second MPI_Recv, but that is not the main topic here.

The launch works locally (i.e., when the 2 processes are launched on one machine), but doesn't work when the 2 processes are dispatched to 2 machines over the network (i.e., one per host). In this case, the worker correctly reads the INT, and then master and worker block on the next call.

I have no issue when sending only char strings or only numbers. This only happens when sending char strings then numbers, or in the other order.

I'm using Open MPI version 1.6, locally compiled.

$ uname -a
Linux trtp7097 2.6.32-220.13.1.el6.x86_64 #1 SMP Thu Mar 29 11:46:40 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/redhat-release
Red Hat Enterprise Linux Workstation release 6.2 (Santiago)

Is it a bad use of the framework or could it be a bug?

Thank you in advance.

Benjamin
Re: [OMPI users] Bug when mixing sent types in version 1.6
On Jun 8, 2012, at 6:43 AM, BOUVIER Benjamin wrote:

> #include <stdio.h>
> #include <string.h>
> #include <mpi.h>
>
> int main(int argc, char **argv)
> {
>     int rank, size;
>     const char someString[] = "Can haz cheezburgerz?";
>
>     MPI_Init(&argc, &argv);
>
>     MPI_Comm_rank( MPI_COMM_WORLD, & rank );
>     MPI_Comm_size( MPI_COMM_WORLD, & size );
>
>     if ( rank == 0 )
>     {
>         int len = strlen( someString );
>         int i;
>         for( i = 1; i < size; ++i)
>         {
>             MPI_Send( &len, 1, MPI_INT, i, 0, MPI_COMM_WORLD );
>             MPI_Send( &someString, len+1, MPI_CHAR, i, 0, MPI_COMM_WORLD );
>         }
>     } else {
>         char buffer[ 128 ];
>         int receivedLen;
>         MPI_Status stat;
>         MPI_Recv( &receivedLen, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &stat );
>         printf( "[Worker] Length : %d\n", receivedLen );
>         MPI_Recv( buffer, receivedLen+1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
>                   &stat);
>         printf( "[Worker] String : %s\n", buffer );
>     }
>
>     MPI_Finalize();
> }

I don't see anything obviously wrong with this code.

> I know that there is a better way to send a string, by giving a maximum
> buffer size at the second MPI_Recv, but that is not the main topic here.
> The launch works locally (i.e when the 2 processes are launched on one
> machine), but doesn't work when the 2 processes are dispatched in 2 machines
> through network (i.e one per host). In this case, the worker correctly reads
> the INT, and then master and worker block on the next call.

That's very odd.

> I have no issue when sending only char strings or only numbers. This only
> happens when sending char strings then numbers, or in the other order.

That's even more odd.

Can you run standard benchmarks like NetPIPE (MPI) and/or the OSU benchmarks (across multiple nodes, that is)?

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] Question on ./configure error on Tru64unix (OSF1) v5.1B-6 for openmpi-1.6
Hi Bill,

If you *really* have time, you can dig into the log and find out why configure failed. It looks like configure failed when it tried to compile this code:

.text
# .gsym_test_func
.globl .gsym_test_func
.gsym_test_func:
# .gsym_test_func

configure:26752: result: none
configure:26756: error: Could not determine global symbol label prefix

Maybe it's a gcc thing? Perhaps your assembler is too old? I tried it in Cygwin, which has gcc 3.4.4, and it seems to work fine. (Just copy the 5 lines of code above into a file with a ".s" extension, then compile it with gcc and see if you can reproduce the problem.)

I was involved in a TOP500 project that used AlphaServer SC ES45 nodes (a total of 4,096 cores), and it was #2 on the TOP500 a decade ago! It was fun back then... But I agree with Jeff: it is unlikely that Open MPI is going to work on Tru64. All modern processors are much faster than the Alpha, and I believe even the TOP500 Alpha machines have all been powered down (even the Earth Simulator is not on the TOP500 list anymore, and that was #1 back then!!).

Rayson

On Fri, Jun 8, 2012 at 7:07 AM, Jeff Squyres wrote:
> To be honest, I don't think we've ever tested on Tru64, so I'm not surprised
> that it doesn't work. Indeed, I think that it is unlikely that we will ever
> support Tru64. :-(
>
> Sorry!

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://blogs.scalablelogic.com/
[OMPI users] RE : Bug when mixing sent types in version 1.6
Hi Jeff,

Thanks for your answer.

I downloaded the NetPIPE benchmark suite, ran `make mpi`, and launched the resulting executable with mpirun.

Here is an interesting fact: when I launch this executable on 2 nodes, it works; on 3 nodes, it blocks, I guess on connect. Each process is running on a core on each machine, using 100% of one CPU, but nothing else happens. I have to kill the program to quit.

Setting the option -mca btl_base_verbose to 30 shows me that the last thing each node tries is to connect to the other nodes. Could it be a network issue?

Thanks,
--
Benjamin Bouvier

From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Jeff Squyres [jsquy...@cisco.com]
Sent: Friday, June 8, 2012 16:30
To: Open MPI Users
Subject: Re: [OMPI users] Bug when mixing sent types in version 1.6
Re: [OMPI users] RE : Bug when mixing sent types in version 1.6
On Jun 8, 2012, at 8:51 AM, BOUVIER Benjamin wrote:

> I have downloaded the Netpipe benchmarks suite, launched `make mpi` and
> launched with mpirun the resulting executable.
>
> Here is an interesting fact: by launching this executable on 2 nodes, it
> works; on 3 nodes, it blocks, I guess on connect.

NetPIPE is only intended for 2 processes -- I'm actually not sure offhand what happens if you run it with 3...

> Each process is running on a core, on each machine, using 100% of one CPU;
> but nothing else happens. I have to kill the program to quit.

This is to be expected. OMPI polls aggressively for network progress (i.e., it consumes 100% of a core).

> Setting the option -mca btl_base_verbose to 30 shows me that the last thing
> tried by each node is to connect to other nodes.

We don't output verbose messages for MPI traffic, so the lack of messages there doesn't mean anything. I'd guess that running NetPIPE with 3 procs may be undefined.

--
Jeff Squyres
jsquy...@cisco.com