Re: [OMPI users] Help: OpenMPI Compilation in Raspberry Pi
Hi all, sorry for replying to this thread so late. I tried it and it works well. However, it takes about 12 hours to compile the whole package, so I am going to cross-compile on my laptop with a proper toolchain I created. Here's the command line I used:

./configure --build=x86_64-redhat-linux --host=arm-unknown-linux-gnueabi CFLAGS="-Ofast -mfpu=vfp -mfloat-abi=hard -march=armv6zk -mtune=arm1176jzf-s"

..
*** Assembler
checking dependency style of arm-unknown-linux-gnueabi-gcc... gcc3
checking for BSD- or MS-compatible name lister (nm)... /home/huli/Projects/arm-devel/bin/arm-unknown-linux-gnueabi-nm -B
checking the name lister (/home/huli/Projects/arm-devel/bin/arm-unknown-linux-gnueabi-nm -B) interface... BSD nm
checking for fgrep... /bin/grep -F
checking if need to remove -g from CCASFLAGS... no
checking whether to enable smp locks... yes
checking if .proc/endp is needed... no
checking directive for setting text section... .text
checking directive for exporting symbols... .globl
checking for objdump... objdump
checking if .note.GNU-stack is needed... yes
checking suffix for labels... :
checking prefix for global symbol labels...
checking prefix for lsym labels... .L
checking prefix for function in .type... #
checking if .size is needed... yes
checking if .align directive takes logarithmic value... yes
configure: error: No atomic primitives available for arm-unknown-linux-gnueabi
..

Do we have any way to fix that? Thanks.

On Sat, Jan 12, 2013 at 3:14 AM, Jeff Squyres (jsquyres) wrote:
> Ok, I was able to configure and run successfully on my Raspberry Pi with:
>
> ./configure CCASFLAGS=-march=armv7-a ...
>
> Is that something we should put on a FAQ page?
>
>
> On Jan 11, 2013, at 7:11 AM, George Bosilca wrote:
>
>> This one belongs to the ARMv7 instruction set. Please try one of the following:
>> `armv7', `armv7-a', `armv7-r'.
>>
>> George.
>>
>>
>> On Jan 11, 2013, at 00:38 , Jeff Squyres (jsquyres) wrote:
>>
>>> Sadly, none of these solutions worked for me on my RPi:
>>>
>>> -
>>> pi@raspberrypi ~/openmpi-1.6.3/opal/asm $ make CCASFLAGS=-mcpu=arm1176jzf-s
>>> CPPAS atomic-asm.lo
>>> atomic-asm.S: Assembler messages:
>>> atomic-asm.S:7: Error: selected processor does not support ARM mode `dmb'
>>> atomic-asm.S:15: Error: selected processor does not support ARM mode `dmb'
>>> atomic-asm.S:23: Error: selected processor does not support ARM mode `dmb'
>>> atomic-asm.S:55: Error: selected processor does not support ARM mode `dmb'
>>> atomic-asm.S:70: Error: selected processor does not support ARM mode `dmb'
>>> make: *** [atomic-asm.lo] Error 1
>>> pi@raspberrypi ~/openmpi-1.6.3/opal/asm $ make CCASFLAGS=-march=armv6zk
>>> CPPAS atomic-asm.lo
>>> atomic-asm.S: Assembler messages:
>>> atomic-asm.S:7: Error: selected processor does not support ARM mode `dmb'
>>> atomic-asm.S:15: Error: selected processor does not support ARM mode `dmb'
>>> atomic-asm.S:23: Error: selected processor does not support ARM mode `dmb'
>>> atomic-asm.S:55: Error: selected processor does not support ARM mode `dmb'
>>> atomic-asm.S:70: Error: selected processor does not support ARM mode `dmb'
>>> make: *** [atomic-asm.lo] Error 1
>>> pi@raspberrypi ~/openmpi-1.6.3/opal/asm $ make CCASFLAGS=-march=argv6k
>>> CPPAS atomic-asm.lo
>>> cc1: error: bad value (argv6k) for -march switch
>>> make: *** [atomic-asm.lo] Error 1
>>> pi@raspberrypi ~/openmpi-1.6.3/opal/asm $
>>> -
>>>
>>> Although I'm using a slightly different system than the one the original user
>>> cited (I'm running the latest Raspbian distro):
>>>
>>> -
>>> pi@raspberrypi ~/openmpi-1.6.3/opal/asm $ uname -a
>>> Linux raspberrypi 3.2.27+ #250 PREEMPT Thu Oct 18 19:03:02 BST 2012 armv6l GNU/Linux
>>> pi@raspberrypi ~/openmpi-1.6.3/opal/asm $ gcc --version
>>> gcc (Debian 4.6.3-12+rpi1) 4.6.3
>>> -
>>>
>>> On Jan 10, 2013, at 5:39 PM, George Bosilca wrote:
>>>
>>>> A little bit of googling shows that this is a known issue. ldrex and strex are
>>>> not included in the default instruction set gcc uses (arm6). One has to add the
>>>> compile flag "-march=argv6k" to compile successfully.
>>>>
>>>> George.
>>>>
>>>> PS: For more info: http://www.raspberrypi.org/phpBB3/viewtopic.php?f=9&t=4256&start=250
>>>>
>>>> On Jan 10, 2013, at 16:20 , Jeff Squyres (jsquyres) wrote:
>>>>
>>>>> Mmmm. Let's rope in our ARM expert here...
>>>>>
>>>>> Leif, do you know what the issue is here?
>>>>>
>>>>> On Jan 3, 2013, at 4:28 AM, Lee Eric wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to compile OpenMPI 1.6.3 on a Raspberry Pi and encounter the
>>>>>> following errors.
>>>>>>
>>>>>> make[2]: Entering directory `/root/openmpi-1.6.3/opal'
>>>>>> CC class/opal_bitmap.lo
>>>>>> CC class/opal_free_list.lo
>>>>>> CC class/opal_hash_table.lo
>>>>>> CC class/opal_list.lo
>>>>>> CC class/opal_object.lo
>>>>>> /tmp/ccniCtj0.s: Assembler messages:
>>>>>> /tmp/ccniCtj0.s:83: Error: selected processor does not support ARM mode `ldrex r3,[r1]
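For context: ldrex/strex are the load/store-exclusive instructions that Open MPI's inline atomics rely on, and they only exist from ARMv6 upward, which is why gcc's old default -march rejects them. Below is a minimal hedged sketch of such an atomic, assuming GCC inline asm; the function name is illustrative and this is not the exact code in opal/asm:

/* Hedged sketch of an ldrex/strex-based atomic add for ARMv6+.
 * Build with e.g. -march=armv6k (ARM1176 on the Pi); older default
 * -march values make the assembler reject ldrex/strex. */
static inline int atomic_add_sketch(volatile int *addr, int delta)
{
    int newval, tmp;
    __asm__ __volatile__(
        "1:  ldrex   %0, [%2]      \n\t"  /* load-exclusive the current value  */
        "    add     %0, %0, %3    \n\t"  /* compute the updated value         */
        "    strex   %1, %0, [%2]  \n\t"  /* try to store; %1 == 0 on success  */
        "    cmp     %1, #0        \n\t"
        "    bne     1b"                  /* another CPU intervened: retry     */
        : "=&r" (newval), "=&r" (tmp)
        : "r" (addr), "r" (delta)
        : "cc", "memory");
    return newval;
}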
Re: [OMPI users] Error running program : mca_oob_tcp_msg_send_handler: writev:failed: Bad file descriptor
Sorry! I removed the mails so I have to post another one.

I stopped iptables on all three nodes. Ping works OK (pruebaborja to clienteprueba / clienteprueba to pruebaborja).

My /etc/network/interfaces on each node:

pruebaborja (master node):
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
iface eth0 inet dhcp

clienteprueba and clientepruebados:
auto lo
ifface lo inet loopback

My interface is Auto (eth0) on all three nodes.
Do you want to see "ifconfig" output as well?
Thank you again for your answer.
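Note that the client nodes' file as pasted defines only loopback, and "ifface" is not a valid keyword ("iface" is), so eth0 may never be configured on them at boot. That is a guess from the pasted config, not a confirmed diagnosis, but for comparison a minimal Debian-style interfaces file that also brings up eth0 on the clients might look like this sketch, assuming they should use DHCP like the master:

# Hedged sketch for clienteprueba/clientepruebados
auto lo
iface lo inet loopback   # note: "iface", not "ifface"

auto eth0
iface eth0 inet dhcp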
[OMPI users] Possible memory leak(s) in OpenMPI 1.6.3?
Dear Developers,

I am running into memory problems when creating/allocating MPI's window and its memory frequently. Below is a sample code reproducing the problem:

/* C Example */
#include <stdio.h>
#include <mpi.h>

#define NEL    8
#define NTIMES 100

int main(int argc, char *argv[])
{
    int i;
    double w[NEL];
    MPI_Aint win_size, warr_size;
    MPI_Win *win;

    win_size  = sizeof(MPI_Win);
    warr_size = sizeof(MPI_DOUBLE)*NEL;
    MPI_Init(&argc, &argv);
    for (i = 0; i < NTIMES; i++) {
        /* ... loop body truncated in the original message ... */
    }
    MPI_Finalize();
    return 0;
}

(Attached: massif.out.15028)
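Since the loop body did not survive in the message, here is a hedged reconstruction of the kind of create/free cycle described ("creating/allocating MPI's window and its memory frequently"); the MPI_Alloc_mem/MPI_Win_create/MPI_Win_free/MPI_Free_mem sequence is an assumption, not necessarily the poster's exact code. Watching the process RSS (or a massif profile, as attached) across iterations shows whether each cycle returns its memory:

/* Hedged reconstruction: repeatedly allocate window memory, create a
 * window over it, then free both. If the library leaks per-window
 * state, heap usage grows linearly with the iteration count. */
#include <stdio.h>
#include <mpi.h>

#define NEL    8
#define NTIMES 100

int main(int argc, char *argv[])
{
    int i;
    MPI_Aint warr_size = (MPI_Aint)(sizeof(double) * NEL);
    double *w;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    for (i = 0; i < NTIMES; i++) {
        MPI_Alloc_mem(warr_size, MPI_INFO_NULL, &w);
        MPI_Win_create(w, warr_size, sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        MPI_Win_free(&win);   /* collective; should release window state */
        MPI_Free_mem(w);
    }
    MPI_Finalize();
    return 0;
}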
Re: [OMPI users] help me understand these error msgs
On Wed, 16 Jan 2013 07:46:41 -0800 Ralph Castain wrote:

> This one means that a backend node lost its connection to mpirun. We use a
> TCP socket between the daemon on a node and mpirun to launch the processes
> and to detect if/when that node fails for some reason.

Hm. And what would be the reasons for this? Too much load on the node where mpirun is run?

--
Jure Pečar
http://jure.pecar.org
[OMPI users] OMPI 1.6.3, InfiniBand and MTL MXM; unable to make it work!
I tried building from the OMPI 1.6.3 tarball with the following ./configure:

./configure --prefix=/apotto/home1/homedirs/fsimula/Lavoro/openmpi-1.6.3/install/ \
  --disable-mpi-io \
  --disable-io-romio \
  --enable-dependency-tracking \
  --without-slurm \
  --with-platform=optimized \
  --disable-mpi-f77 \
  --disable-mpi-f90 \
  --with-openib \
  --disable-static \
  --enable-shared \
  --disable-vt \
  --enable-pty-support \
  --enable-mca-no-build=btl-ofud,pml-bfo \
  --with-mxm=/opt/mellanox/mxm \
  --with-mxm-libdir=/opt/mellanox/mxm/lib

As you can see from the last two lines, I want to enable the MXM transport layer on a cluster made of SuperMicro X8DTG-D boards with dual Xeons and Mellanox MT26428 HCAs; the OS is CentOS 5.8.

I tried two different .rpm's for MXM: 'mxm-1.1.ad085ef-1.x86_64-centos5u7.rpm', taken from http://www.mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar, and 'mxm-1.5.f583875-1.x86_64-centos5u7.rpm', taken from http://www.mellanox.com/downloads/hpc/mxm/v1.5/mxm-latest.tar.

With both, even though the compilation concludes successfully, a simple test (osu_bw from the OSU Micro-Benchmarks 3.8) fails with the sort of message reported below; the lines:

rdma_dev.c:122 MXM DEBUG Port 1 on mlx4_0 has a link layer different from IB. Skipping it
rdma_dev.c:155 MXM ERROR An active IB port on a Mellanox device, with lid [any] gid [any] not found

make it seem like it cannot access the HCA hardware: is that so? The very same test works when using '-mca pml ob1' (thus using the openib BTL). I'm quite ready to start pulling my hair out; any suggestions?

The output of /usr/bin/ibv_devinfo for the two cluster nodes follows:

[cut]
hca_id: mlx4_0
        transport:          InfiniBand (0)
        fw_ver:             2.7.000
        node_guid:          0025:90ff:ff07:0ac4
        sys_image_guid:     0025:90ff:ff07:0ac7
        vendor_id:          0x02c9
        vendor_part_id:     26428
        hw_ver:             0xB0
        board_id:           SM_106101000
        phys_port_cnt:      1
        port: 1
                state:      PORT_ACTIVE (4)
                max_mtu:    2048 (4)
                active_mtu: 2048 (4)
                sm_lid:     4
                port_lid:   6
                port_lmc:   0x00
[/cut]

[cut]
hca_id: mlx4_0
        transport:          InfiniBand (0)
        fw_ver:             2.7.000
        node_guid:          0025:90ff:ff07:0acc
        sys_image_guid:     0025:90ff:ff07:0acf
        vendor_id:          0x02c9
        vendor_part_id:     26428
        hw_ver:             0xB0
        board_id:           SM_106101000
        phys_port_cnt:      1
        port: 1
                state:      PORT_ACTIVE (4)
                max_mtu:    2048 (4)
                active_mtu: 2048 (4)
                sm_lid:     4
                port_lid:   8
                port_lmc:   0x00
[/cut]

The complete output of the failing test follows:

[fsimula@agape5 osu-micro-benchmarks-3.8]$ mpirun -x MXM_LOG_LEVEL=poll -mca pml cm -mca mtl_mxm_np 1 -np 2 -host agape4,agape5 install/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
H H
[1358430343.266782] [agape5:8596 :0] config_parser.c:168 MXM DEBUG
[1358430343.266815] [agape5:8596 :0] config_parser.c:168 MXM DEBUG default: MXM_HANDLE_ERRORS=bt
[1358430343.266826] [agape5:8596 :0] config_parser.c:168 MXM DEBUG default: MXM_GDB_PATH=/usr/bin/gdb
[1358430343.266838] [agape5:8596 :0] config_parser.c:168 MXM DEBUG default: MXM_DUMP_SIGNO=1
[1358430343.266851] [agape5:8596 :0] config_parser.c:168 MXM DEBUG default: MXM_DUMP_LEVEL=conn
[1358430343.266924] [agape5:8596 :0] config_parser.c:168 MXM DEBUG default: MXM_ASYNC_MODE=THREAD
[1358430343.266936] [agape5:8596 :0] config_parser.c:168 MXM DEBUG default: MXM_TIME_ACCURACY=0.1
[1358430343.266956] [agape5:8596 :0] config_parser.c:168 MXM DEBUG default: MXM_PTLS=self,shm,rdma
[1358430343.267249] [agape5:8596 :0] mpool.c:265 MXM DEBUG mpool 'ptl_self_recv_ev': allocated chunk 0xc075f40 of 96016 bytes with 1000 elements
[1358430343.267308] [agape5:8596 :0] mpool.c:156 MXM DEBUG mpool 'ptl_self_recv_ev': align 16, maxelems 1000, elemsize 88, padding 8
[1358430343.267316] [agape5:8596 :0] self.c:410 MXM DEBUG Created ptl_self
[1358430343.267333] [agape5:8596 :0] shm_ptl.c:56 MXM DEBUG Created ptl_shm
[1358430343.268457] [agape5:8596 :0] rdma_ptl.c:65 MXM TRACE Got 1 IB devices
[1358430343.268640] [agape5:8596 :0] rdma_ptl.c:112 MXM DEBUG added device mlx4_0
[1358430343
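The "link layer different from IB" message suggests MXM's device scan rejects the port before the lid/gid match is even attempted. Below is a hedged sketch of that kind of libibverbs port probe (an illustration of the standard verbs API, not MXM's actual rdma_dev.c code); on an older CentOS 5.x verbs stack the link-layer attribute may be reported as unspecified, which could conceivably make a genuine IB port look "different from IB":

/* Hedged sketch: query each device's port 1 and print its state and
 * link layer, roughly what a transport that skips non-IB ports must do. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num, i;
    struct ibv_device **devs = ibv_get_device_list(&num);

    for (i = 0; i < num; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        struct ibv_port_attr attr;
        if (ctx && ibv_query_port(ctx, 1, &attr) == 0) {
            printf("%s port 1: state=%d link_layer=%s\n",
                   ibv_get_device_name(devs[i]), attr.state,
                   attr.link_layer == IBV_LINK_LAYER_INFINIBAND
                       ? "IB" : "non-IB/unspecified");
        }
        if (ctx) ibv_close_device(ctx);
    }
    ibv_free_device_list(devs);
    return 0;
}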
Re: [OMPI users] help me understand these error msgs
On Jan 17, 2013, at 2:25 AM, Jure Pečar wrote:

> On Wed, 16 Jan 2013 07:46:41 -0800 Ralph Castain wrote:
>
>> This one means that a backend node lost its connection to mpirun. We use a
>> TCP socket between the daemon on a node and mpirun to launch the processes
>> and to detect if/when that node fails for some reason.
>
> Hm. And what would be the reasons for this? Too much load on the node where
> mpirun is run?

No, the error means the connection was completely lost - i.e., the socket was closed. Do I understand correctly that the job runs for a while and then dies? So there are processes executing on the node that reports a lost connection? Or is this happening on startup of the larger job, or during a call to MPI_Comm_spawn?
Re: [OMPI users] Error running program : mca_oob_tcp_msg_send_handler: writev:failed: Bad file descriptor
Configure OMPI with --enable-debug, and then run:

mpirun -n 1 -host clienteprueba -mca plm_base_verbose 5 hostname

You should see a daemon getting launched and successfully reporting back to mpirun, and then the application getting launched on the remote node.

On Jan 17, 2013, at 1:25 AM, borja mf wrote:

> Sorry! I removed the mails so I have to post another one.
>
> I stopped iptables on all three nodes. Ping works OK (pruebaborja to
> clienteprueba / clienteprueba to pruebaborja).
>
> My /etc/network/interfaces on each node:
>
> pruebaborja (master node):
> # The loopback network interface
> auto lo
> iface lo inet loopback
> # The primary network interface
> auto eth0
> iface eth0 inet dhcp
>
> clienteprueba and clientepruebados:
> auto lo
> ifface lo inet loopback
>
> My interface is Auto (eth0) on all three nodes.
> Do you want to see "ifconfig" output as well?
> Thank you again for your answer.
Re: [OMPI users] Problem with mpirun for java codes
Just as an FYI: we have removed the Java bindings from the 1.7.0 release due to all the reported errors - looks like that code just isn't ready for release yet. It remains available in the nightly snapshots of the developer's trunk while we continue to debug it.

That said, I tried your example using the current developer's trunk - and it worked just fine. I ran it on a single node, however. Were you running this across multiple nodes? Is it possible that the "classes" directory wasn't available on the remote node?

On Jan 16, 2013, at 4:17 PM, Karos Lotfifar wrote:

> Hi,
> The version that I am using is 1.7rc6 (pre-release).
>
> Regards,
> Karos
>
> On 16 Jan 2013, at 21:07, Ralph Castain wrote:
>
>> Which version of OMPI are you using?
>>
>> On Jan 16, 2013, at 11:43 AM, Karos Lotfifar wrote:
>>
>>> Hi,
>>>
>>> I am still struggling with the installation problems! I get very strange
>>> errors. Everything is fine when I run OpenMPI for C codes, but when I try
>>> to run a simple Java code I get a very strange error. The code is as simple
>>> as the following and I cannot get it running:
>>>
>>> import mpi.*;
>>>
>>> class JavaMPI {
>>>     public static void main(String[] args) throws MPIException {
>>>         MPI.Init(args);
>>>         System.out.println("Hello world from rank " +
>>>             MPI.COMM_WORLD.Rank() + " of " +
>>>             MPI.COMM_WORLD.Size());
>>>         MPI.Finalize();
>>>     }
>>> }
>>>
>>> Everything is OK with mpijavac, my Java code, etc. When I try to run the
>>> code with the following commands:
>>>
>>> /usr/local/bin/mpijavac -d classes JavaMPI.java --> FINE
>>> /usr/local/bin/mpirun -np 2 java -cp ./classes JavaMPI --> *ERROR*
>>>
>>> I get the following error. Could you please help me with this? (As
>>> mentioned, I can run C MPI codes without any problem.) The system
>>> specifications are:
>>>
>>> JRE version: 6.0_30-b12 (java-sun-6)
>>> OS: Linux 3.0.0-30-generic-pae #47-Ubuntu
>>> CPU: total 4 (2 cores per cpu, 2 threads per core) family 6 model 42
>>> stepping 7, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2,
>>> popcnt, ht
>>>
>>> ##
>>> #
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> # SIGSEGV
>>> #
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> # SIGSEGV (0xb) at pc=0x70e1dd12, pid=28616, tid=3063311216
>>> #
>>> (0xb) at pc=0x70f61d12, pid=28615, tid=3063343984
>>> #
>>> # JRE version: 6.0_30-b12
>>> # JRE version: 6.0_30-b12
>>> # Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 )
>>> # Problematic frame:
>>> # C [libmpi.so.1+0x20d12] unsigned __int128+0xa2
>>> #
>>> # An error report file with more information is saved as:
>>> # /home/karos/hs_err_pid28616.log
>>> # Java VM: Java HotSpot(TM) Server VM (20.5-b03 mixed mode linux-x86 )
>>> # Problematic frame:
>>> # C [libmpi.so.1+0x20d12] unsigned __int128+0xa2
>>> #
>>> # An error report file with more information is saved as:
>>> # /home/karos/hs_err_pid28615.log
>>> #
>>> # If you would like to submit a bug report, please visit:
>>> # http://java.sun.com/webapps/bugreport/crash.jsp
>>> # The crash happened outside the Java Virtual Machine in native code.
>>> # See problematic frame for where to report the bug.
>>> #
>>> [tulips:28616] *** Process received signal ***
>>> [tulips:28616] Signal: Aborted (6)
>>> [tulips:28616] Signal code: (-6)
>>> [tulips:28616] [ 0] [0xb777840c]
>>> [tulips:28616] [ 1] [0xb7778424]
>>> [tulips:28616] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0x4f) [0xb75e3cff]
>>> [tulips:28616] [ 3] /lib/i386-linux-gnu/libc.so.6(abort+0x175) [0xb75e7325]
>>> [tulips:28616] [ 4] /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dcf7f) [0xb6f6df7f]
>>> [tulips:28616] [ 5] /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x724897) [0xb70b5897]
>>> [tulips:28616] [ 6] /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(JVM_handle_linux_signal+0x21c) [0xb6f7529c]
>>> [tulips:28616] [ 7] /usr/lib/jvm/java-6-sun-1.6.0.30/jre/lib/i386/server/libjvm.so(+0x5dff64) [0xb6f70f64]
>>> [tulips:28616] [ 8] [0xb777840c]
>>> [tulips:28616] [ 9] [0xb3891548]
>>> [tulips:28616] *** End of error message ***
>>> [tulips:28615] *** Process received signal ***
>>> [tulips:28615] Signal: Aborted (6)
>>> [tulips:28615] Signal code: (-6)
>>> #
>>> # If you would like to submit a bug report, please visit:
>>> # http://java.sun.com/webapps/bugreport/crash.jsp
>>> # The crash happened outside the Java Virtual Machine in native code.
>>> # See problematic frame for where to report the bug.
>>> #
>>> [tulips:28615] [ 0] [0xb778040c]
>>> [tulips:28615] [ 1] [0xb7780424]
>>> [tulips:28615] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0x4f) [0xb75ebcff]
>>> [tulips:28615] [
Re: [OMPI users] Help: OpenMPI Compilation in Raspberry Pi
On Jan 16, 2013, at 6:41 AM, Leif Lindholm wrote:

> That isn't, technically speaking, correct for the Raspberry Pi - but it is a
> workaround if you know you will never actually use the asm implementations of
> the atomics, but only the inline C ones.
>
> This sort of hides the problem that the dedicated barrier instructions were
> not available in ARMv6 (it used "system control coprocessor operations"
> instead).
>
> If you ever executed the asm implementation, you would trigger an undefined
> instruction exception on the Pi.

Hah; sweet. Ok. So what's the right answer? Would it be acceptable to use a no-op for this operation on such architectures?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
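For reference, a hedged sketch of the two barrier flavors Leif describes - the dedicated ARMv7 `dmb` instruction versus the ARMv6 system-control-coprocessor encoding - assuming GCC inline asm; this is illustrative only, not the code OMPI ultimately adopted:

/* Hedged sketch of a memory barrier that works on both the Pi's
 * ARM1176 (ARMv6, which has no `dmb` instruction) and ARMv7 parts. */
static inline void memory_barrier_sketch(void)
{
#if defined(__ARM_ARCH_7A__)
    /* ARMv7+: dedicated Data Memory Barrier instruction */
    __asm__ __volatile__ ("dmb" : : : "memory");
#else
    /* ARMv6: the equivalent CP15 coprocessor operation */
    unsigned int zero = 0;
    __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5"
                          : : "r" (zero) : "memory");
#endif
}

On a single-core part like the original Pi a no-op might happen to work, but the CP15 form above is the architecturally correct ARMv6 barrier in general.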