[OMPI users] slot problem on "SUSE Linux Enterprise Server 12 (x86_64)"
Hi, yesterday I installed openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux Enterprise Server 12 (x86_64)" with Sun C 5.13 and gcc-5.3.0. The following programs don't run anymore. loki hello_2 112 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute" OPAL repo revision: v1.10.2-176-g9d45e07 C compiler absolute: /opt/solstudio12.4/bin/cc loki hello_2 113 mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi -- There are not enough slots available in the system to satisfy the 2 slots that were requested by the application: hello_2_slave_mpi Either request fewer slots for your application, or make more slots available for use. -- loki hello_2 114 Everything worked as expected with openmpi-v1.10.0-178-gb80f802. loki hello_2 114 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute" OPAL repo revision: v1.10.0-178-gb80f802 C compiler absolute: /opt/solstudio12.4/bin/cc loki hello_2 115 mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi Process 0 of 3 running on loki Process 1 of 3 running on loki Process 2 of 3 running on loki Now 2 slave tasks are sending greetings. Greetings from task 2: message type:3 ... I have the same problem with openmpi-v2.x-dev-1404-g74d8ea0, if I use the following commands. mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,nfs1 hello_2_slave_mpi mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi I have also the same problem with openmpi-dev-4010-g6c9d65c, if I use the following command. mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi openmpi-dev-4010-g6c9d65c works as expected with the following commands. mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,nfs1 hello_2_slave_mpi mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi Has the interface changed so that I'm not allowed to use some of my commands any longer? I would be grateful, if somebody can fix the problem if it is a problem. Thank you very much for any help in advance. Kind regards Siegmar /* Another MPI-version of the "hello world" program, which delivers * some information about its machine and operating system. In this * version the functions "master" and "slave" from "hello_1_mpi.c" * are implemented as independant processes. This is the file for the * "master". * * * Compiling: * Store executable(s) into local directory. * mpicc -o * * Store executable(s) into predefined directories. * make * * Make program(s) automatically on all specified hosts. You must * edit the file "make_compile" and specify your host names before * you execute it. * make_compile * * Running: * LAM-MPI: * mpiexec -boot -np * or * mpiexec -boot \ * -host -np : \ * -host -np * or * mpiexec -boot [-v] -configfile * or * lamboot [-v] [] * mpiexec -np * or * mpiexec [-v] -configfile * lamhalt * * OpenMPI: * "host1", "host2", and so on can all have the same name, * if you want to start a virtual computer with some virtual * cpu's on the local host. The name "localhost" is allowed * as well. * * mpiexec -np * or * mpiexec --host \ * -np * or * mpiexec -hostfile \ * -np * or * mpiexec -app * * Cleaning: * local computer: * rm * or * make clean_all * on all specified computers (you must edit the file "make_clean_all" * and specify your host names before you execute it. * make_clean_all * * * File: hello_2_mpi.c Author: S. 
Gross * Date: 01.10.2012 * */ #include #include #include #include "mpi.h" #define BUF_SIZE 255 /* message buffer size */ #define MAX_TASKS 12 /* max. number of tasks */ #define SENDTAG 1 /* send message command */ #define EXITTAG 2 /* termination command */ #define MSGTAG 3 /* normal message token */ #define ENTASKS -1 /* error: too many tasks */ int main (int argc, char *argv[]) { int mytid,/* my task id */ ntasks,/* number of parallel tasks */ namelen,/* length of processor name */ num,/* number of chars in buffer */ i;/* loop variable */ char processor_name[MPI_MAX_PROCESSOR_NAME], buf[BUF_SIZE + 1]; /* message buffer (+1 for '\0') */ MPI_Status stat; /* message details */ MPI_Init (&argc, &argv); MPI_Comm_rank (MPI_COMM_WORLD, &mytid); MPI_Comm_size (MPI_COMM_WORLD, &ntasks); MPI_Get_processor_name (processor_name, &namelen); /* With the next statement every process executing this code will
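The hello_2 source appended above lost its #include lines in the archive and is cut off before the end of main, so here is a minimal, self-contained sketch of the master/slave greeting exchange that the mpiexec command lines exercise. The tag values are taken from the #define block above; the message protocol and the single-file layout (role selected by rank instead of the two separate executables hello_2_mpi and hello_2_slave_mpi) are assumptions, not Siegmar's original code.

/* Minimal sketch of the master/slave greeting exchange described in the
 * post above.  Siegmar's original splits master (hello_2_mpi) and slave
 * (hello_2_slave_mpi) into two executables started with the MPMD syntax
 * "mpiexec -np 1 ... master : -np 2 ... slave"; here both roles live in
 * one file and are selected by rank.  Tags follow the #define block in
 * the post; the message protocol itself is an assumption.
 */
#include <stdio.h>
#include <string.h>
#include "mpi.h"

#define BUF_SIZE 255                    /* message buffer size  */
#define SENDTAG  1                      /* "send me a greeting" */
#define EXITTAG  2                      /* "terminate"          */
#define MSGTAG   3                      /* normal message token */

int main (int argc, char *argv[])
{
  int        mytid, ntasks, namelen, i;
  char       processor_name[MPI_MAX_PROCESSOR_NAME], buf[BUF_SIZE + 1];
  MPI_Status stat;

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
  MPI_Get_processor_name (processor_name, &namelen);
  printf ("Process %d of %d running on %s\n", mytid, ntasks, processor_name);

  if (mytid == 0) {                     /* master */
    printf ("\nNow %d slave tasks are sending greetings.\n\n", ntasks - 1);
    for (i = 1; i < ntasks; ++i) {
      MPI_Send (buf, 0, MPI_CHAR, i, SENDTAG, MPI_COMM_WORLD);
      MPI_Recv (buf, BUF_SIZE, MPI_CHAR, i, MSGTAG, MPI_COMM_WORLD, &stat);
      printf ("Greetings from task %d:\n  message type: %d\n  %s\n",
              stat.MPI_SOURCE, stat.MPI_TAG, buf);
      MPI_Send (buf, 0, MPI_CHAR, i, EXITTAG, MPI_COMM_WORLD);
    }
  } else {                              /* slave */
    for (;;) {
      MPI_Recv (buf, BUF_SIZE, MPI_CHAR, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
      if (stat.MPI_TAG == EXITTAG)
        break;
      snprintf (buf, sizeof (buf), "hello from %s", processor_name);
      MPI_Send (buf, (int) strlen (buf) + 1, MPI_CHAR, 0, MSGTAG, MPI_COMM_WORLD);
    }
  }
  MPI_Finalize ();
  return 0;
}

Built as one binary (mpicc -o hello_2_sketch hello_2_sketch.c), it can still be driven with the MPMD syntax from the report, e.g. mpiexec -np 1 --host loki hello_2_sketch : -np 2 --host loki,loki hello_2_sketch, which keeps the slot/host aspect of the problem while removing the separate slave executable.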
[OMPI users] problem compiling Java programs with openmpi-v1.10.2-176-g9d45e07
Hi, yesterday I installed openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux Enterprise Server 12 (x86_64)" with Sun C 5.13 and gcc-5.3.0. Unfortunately I have a problem compiling Java programs. loki java 124 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute" OPAL repo revision: v1.10.2-176-g9d45e07 C compiler absolute: /opt/solstudio12.4/bin/cc loki java 125 mpijavac BcastIntMain.java BcastIntMain.java:44: error: cannot find symbol mytid = MPI.COMM_WORLD.getRank (); ^ symbol: variable COMM_WORLD location: class MPI BcastIntMain.java:52: error: cannot find symbol MPI.COMM_WORLD.bcast (intValue, 1, MPI.INT, 0); ^ symbol: variable INT location: class MPI BcastIntMain.java:52: error: cannot find symbol MPI.COMM_WORLD.bcast (intValue, 1, MPI.INT, 0); ^ symbol: variable COMM_WORLD location: class MPI 3 errors loki java 126 loki java 110 dir /usr/local/openmpi-1.10.3_64_cc/lib64/*.jar -rw-r--r-- 1 root root 60876 May 6 13:05 /usr/local/openmpi-1.10.3_64_cc/lib64/mpi.jar loki java 111 javac -version javac 1.8.0_66 loki java 112 I have the same problem with openmpi-v2.x-dev-1404-g74d8ea0 and with openmpi-dev-4010-g6c9d65c and I would be grateful, if somebody can fix the problem. Thank you very much for any help in advance. Kind regards Siegmar /* Small program that distributes an integer value with a * broadcast operation. * * Java uses call-by-value and doesn't support call-by-reference * for method parameters with the only exception of object references. * Therefore you must use an array with just one element, if you * want to send/receive/broadcast/... primitive datatypes. * * "mpijavac" and Java-bindings are available in "Open MPI * version 1.7.4" or newer. * * * Class file generation: * mpijavac BcastIntMain.java * * Usage: * mpiexec [parameters] java [parameters] BcastIntMain * * Examples: * mpiexec -np 2 java BcastIntMain * mpiexec -np 2 java -cp $HOME/mpi_classfiles BcastIntMain * * * File: BcastIntMain.java Author: S. Gross * Date: 09.09.2013 * */ import mpi.*; public class BcastIntMain { static final int SLEEP_FACTOR = 200; /* 200 ms to get ordered output */ public static void main (String args[]) throws MPIException, InterruptedException { int mytid; /* my task id */ int intValue[] = new int[1]; /* broadcast one intValue */ String processorName; /* name of local machine */ MPI.Init (args); processorName = MPI.getProcessorName (); mytid = MPI.COMM_WORLD.getRank (); intValue[0] = -1; if (mytid == 0) { /* initialize data item */ intValue[0] = 1234567; } /* broadcast value to all processes */ MPI.COMM_WORLD.bcast (intValue, 1, MPI.INT, 0); /* Each process prints its received data item. The outputs * can intermingle on the screen so that you must use * "-output-filename" in Open MPI. */ Thread.sleep (SLEEP_FACTOR * mytid); /* sleep to get ordered output */ System.out.printf ("\nProcess %d running on %s.\n" + " intValue: %d\n", mytid, processorName, intValue[0]); MPI.Finalize (); } }
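For comparison with the Java listing above, the same broadcast in C is only a few lines. This sketch is not part of Siegmar's test suite; it is merely a quick way to confirm that the underlying MPI installation works independently of the Java bindings and mpi.jar.

/* C counterpart of BcastIntMain above: rank 0 initializes one integer
 * and broadcasts it to all processes.  Not part of Siegmar's test
 * suite; a sketch to exercise the MPI library without the Java
 * bindings.
 */
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int  mytid, namelen, intValue = -1;
  char processorName[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Get_processor_name (processorName, &namelen);

  if (mytid == 0) {                     /* initialize data item */
    intValue = 1234567;
  }

  /* broadcast value to all processes */
  MPI_Bcast (&intValue, 1, MPI_INT, 0, MPI_COMM_WORLD);

  printf ("\nProcess %d running on %s.\n  intValue: %d\n",
          mytid, processorName, intValue);

  MPI_Finalize ();
  return 0;
}

Compile and run with mpicc -o bcast_int bcast_int.c and mpiexec -np 2 bcast_int.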
[OMPI users] slot-list breaks for openmpi-v1.10.2-176-g9d45e07 on "SUSE Linux Enterprise Server 12 (x86_64)"
Hi, yesterday I installed openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux Enterprise Server 12 (x86_64)" with Sun C 5.13 and gcc-5.3.0. Unfortunately I have a problem with one of my spawn programs. loki spawn 129 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute" OPAL repo revision: v1.10.2-176-g9d45e07 C compiler absolute: /opt/solstudio12.4/bin/cc loki spawn 130 mpiexec -np 1 --host loki,loki,loki,loki,loki spawn_master Parent process 0 running on loki I create 4 slave processes Parent process 0: tasks in MPI_COMM_WORLD:1 tasks in COMM_CHILD_PROCESSES local group: 1 tasks in COMM_CHILD_PROCESSES remote group: 4 Slave process 1 of 4 running on loki Slave process 2 of 4 running on loki Slave process 3 of 4 running on loki Slave process 0 of 4 running on loki spawn_slave 0: argv[0]: spawn_slave spawn_slave 1: argv[0]: spawn_slave spawn_slave 2: argv[0]: spawn_slave spawn_slave 3: argv[0]: spawn_slave loki spawn 131 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master Parent process 0 running on loki I create 4 slave processes [loki:02080] *** Process received signal *** [loki:02080] Signal: Segmentation fault (11) [loki:02080] Signal code: Address not mapped (1) [loki:02080] Failing at address: (nil) *** An error occurred in MPI_Init *** on a NULL communicator *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, ***and potentially your MPI job) *** An error occurred in MPI_Init *** on a NULL communicator *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, ***and potentially your MPI job) [loki:2073] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed! [loki:2079] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed! [loki:02080] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7f485c593870] [loki:02080] [ 1] /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12(+0x16d4df)[0x7f485c90e4df] [loki:02080] [ 2] /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12(ompi_group_increment_proc_count+0x35)[0x7f485c90eee5] [loki:02080] [ 3] /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12(ompi_comm_init+0x2fc)[0x7f485c8be9fc] [loki:02080] [ 4] /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12(ompi_mpi_init+0xd12)[0x7f485c962942] [loki:02080] [ 5] /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12(PMPI_Init+0x1f2)[0x7f485cda7332] [loki:02080] [ 6] spawn_slave[0x400a89] [loki:02080] [ 7] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f485c1fdb05] [loki:02080] [ 8] spawn_slave[0x400952] [loki:02080] *** End of error message *** --- Child job 2 terminated normally, but 1 process returned a non-zero exit code.. Per user-direction, the job has been aborted. --- -- mpiexec detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was: Process name: [[38824,2],0] Exit code:1 -- loki spawn 132 Everything works fine with spawn_multiple_master. loki spawn 134 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_multiple_master Parent process 0 running on loki I create 3 slave processes. Parent process 0: tasks in MPI_COMM_WORLD:1 tasks in COMM_CHILD_PROCESSES local group: 1 tasks in COMM_CHILD_PROCESSES remote group: 2 Slave process 0 of 2 running on loki ... I have a similar error with openmpi-v2.x-dev-1404-g74d8ea0. 
My other spawn programs work more or less as expected, although spawn_intra_comm doesn't return so that I have to break it with . loki spawn 124 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute" OPAL repo revision: v2.x-dev-1404-g74d8ea0 C compiler absolute: /opt/solstudio12.4/bin/cc loki spawn 125 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master Parent process 0 running on loki I create 4 slave processes [loki:03931] OPAL ERROR: Timeout in file ../../../../openmpi-v2.x-dev-1404-g74d8ea0/opal/mca/pmix/base/pmix_base_fns.c at line 190 [loki:3931] *** An error occurred in MPI_Comm_spawn [loki:3931] *** reported by process [2431254529,0] [loki:3931] *** on communicator MPI_COMM_WORLD [loki:3931] *** MPI_ERR_UNKNOWN: unknown error [loki:3931] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, [loki:3931] ***and potentially your MPI job) loki spawn 126 I would be grateful, if somebody can fix the problem. Thank you very much for any help in advance. Kind regards Siegmar /* The program demonstrates ho
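For readers without Siegmar's test programs, the sketch below shows what spawn_master does according to its output: a single parent spawns four copies of a spawn_slave executable with MPI_Comm_spawn and reports the local and remote group sizes of the resulting inter-communicator. The real program additionally echoes the slaves' argv; anything not visible in the output above (error handling, info arguments) is an assumption. The spawn_slave side can be any MPI program that calls MPI_Init, prints its rank and calls MPI_Finalize, which is exactly where the backtrace above shows the segmentation fault.

/* Sketch of the spawn_master pattern, reconstructed from the output
 * above: the single parent (started with -np 1, hence rank 0) spawns
 * four copies of "spawn_slave" and reports the group sizes of the
 * resulting inter-communicator.  Everything not visible in the output
 * (error handling, info arguments, the argv echo) is an assumption.
 */
#include <stdio.h>
#include "mpi.h"

#define NUM_SLAVES 4

int main (int argc, char *argv[])
{
  int      ntasks, localsize, remotesize, namelen;
  char     processor_name[MPI_MAX_PROCESSOR_NAME];
  MPI_Comm COMM_CHILD_PROCESSES;

  MPI_Init (&argc, &argv);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
  MPI_Get_processor_name (processor_name, &namelen);
  printf ("Parent process 0 running on %s\n  I create %d slave processes\n",
          processor_name, NUM_SLAVES);

  /* collective over MPI_COMM_WORLD; root 0 provides command and argv */
  MPI_Comm_spawn ("spawn_slave", MPI_ARGV_NULL, NUM_SLAVES, MPI_INFO_NULL,
                  0, MPI_COMM_WORLD, &COMM_CHILD_PROCESSES,
                  MPI_ERRCODES_IGNORE);

  MPI_Comm_size (COMM_CHILD_PROCESSES, &localsize);         /* parent side  */
  MPI_Comm_remote_size (COMM_CHILD_PROCESSES, &remotesize); /* spawned side */
  printf ("Parent process 0: tasks in MPI_COMM_WORLD: %d\n"
          "  tasks in COMM_CHILD_PROCESSES local group:  %d\n"
          "  tasks in COMM_CHILD_PROCESSES remote group: %d\n",
          ntasks, localsize, remotesize);

  MPI_Comm_disconnect (&COMM_CHILD_PROCESSES);
  MPI_Finalize ();
  return 0;
}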
[OMPI users] warning message for process binding with openmpi-dev-4010-g6c9d65c
Hi, yesterday I installed openmpi-dev-4010-g6c9d65c on my "SUSE Linux Enterprise Server 12 (x86_64)" with Sun C 5.13 and gcc-5.3.0. Unfortunately I get the following warning message. loki hello_1 128 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute" OPAL repo revision: dev-4010-g6c9d65c C compiler absolute: /opt/solstudio12.4/bin/cc loki hello_1 129 mpiexec -np 3 --host loki --slot-list 0:0-5,1:0-5 hello_1_mpi -- WARNING: a request was made to bind a process. While the system supports binding the process itself, at least one node does NOT support binding memory to the process location. Node: loki Open MPI uses the "hwloc" library to perform process and memory binding. This error message means that hwloc has indicated that processor binding support is not available on this machine. On OS X, processor and memory binding is not available at all (i.e., the OS does not expose this functionality). On Linux, lack of the functionality can mean that you are on a platform where processor and memory affinity is not supported in Linux itself, or that hwloc was built without NUMA and/or processor affinity support. When building hwloc (which, depending on your Open MPI installation, may be embedded in Open MPI itself), it is important to have the libnuma header and library files available. Different linux distributions package these files under different names; look for packages with the word "numa" in them. You may also need a developer version of the package (e.g., with "dev" or "devel" in the name) to obtain the relevant header files. If you are getting this message on a non-OS X, non-Linux platform, then hwloc does not support processor / memory affinity on this platform. If the OS/platform does actually support processor / memory affinity, then you should contact the hwloc maintainers: https://github.com/open-mpi/hwloc. This is a warning only; your job will continue, though performance may be degraded. -- Process 0 of 3 running on loki Process 2 of 3 running on loki Process 1 of 3 running on loki Now 2 slave tasks are sending greetings. Greetings from task 1: message type:3 ... loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 122 ls -l /usr/lib64/*numa* -rwxr-xr-x 1 root root 48256 Nov 24 16:29 /usr/lib64/libnuma.so.1 loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 123 grep numa log.configure.Linux.x86_64.64_cc checking numaif.h usability... no checking numaif.h presence... yes configure: WARNING: numaif.h: present but cannot be compiled configure: WARNING: numaif.h: check for missing prerequisite headers? configure: WARNING: numaif.h: see the Autoconf documentation configure: WARNING: numaif.h: section "Present But Cannot Be Compiled" configure: WARNING: numaif.h: proceeding with the compiler's result checking for numaif.h... no loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 124 I didn't get the warning for openmpi-v1.10.2-176-g9d45e07 and openmpi-v2.x-dev-1404-g74d8ea0 as you can see in my previous emails, although I have the same messages in log.configure.*. I would be grateful, if somebody can fix the problem if it is a problem and not an intended message. Thank you very much for any help in advance. Kind regards Siegmar /* An MPI-version of the "hello world" program, which delivers some * information about its machine and operating system. * * * Compiling: * Store executable(s) into local directory. * mpicc -o * * Store executable(s) into predefined directories. * make * * Make program(s) automatically on all specified hosts. 
You must * edit the file "make_compile" and specify your host names before * you execute it. * make_compile * * Running: * LAM-MPI: * mpiexec -boot -np * or * mpiexec -boot \ * -host -np : \ * -host -np * or * mpiexec -boot [-v] -configfile * or * lamboot [-v] [] * mpiexec -np * or * mpiexec [-v] -configfile * lamhalt * * OpenMPI: * "host1", "host2", and so on can all have the same name, * if you want to start a virtual computer with some virtual * cpu's on the local host. The name "localhost" is allowed * as well. * * mpiexec -np * or * mpiexec --host \ * -np * or * mpiexec -hostfile \ * -np * or * mpiexec -app * * Cleaning: * local computer: * rm * or * make clean_all * on all specified computers (you must edit the file "make_clean_all" * and specify your host names before you execute it. * make_clean_all * * * File: hello_1_mpi.c Author: S. Gross * Date: 01.10.2012 * */ #include #include #include #include #include #include "mpi.h" #define BUF_SIZE 255 /* message buffer size */ #define MAX_TASKS 12 /* max. number of tasks */ #defin
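The warning quoted above states that hwloc reports no support for binding memory on node loki. What hwloc itself detects can be checked directly through its support flags, independently of Open MPI. The following diagnostic sketch assumes hwloc development files are installed (compile with cc hwloc_support.c -lhwloc, adding -I/-L paths for a non-default hwloc prefix if necessary):

/* Asks hwloc what it supports on this node.  A diagnostic sketch that
 * is independent of Open MPI; it assumes hwloc development files are
 * installed (compile with: cc hwloc_support.c -lhwloc).
 */
#include <stdio.h>
#include <hwloc.h>

int main (void)
{
  hwloc_topology_t topology;
  const struct hwloc_topology_support *support;

  hwloc_topology_init (&topology);
  hwloc_topology_load (topology);
  support = hwloc_topology_get_support (topology);

  printf ("bind this process to CPUs:  %d\n",
          (int) support->cpubind->set_thisproc_cpubind);
  printf ("bind this process's memory: %d\n",
          (int) support->membind->set_thisproc_membind);
  printf ("bind this thread's memory:  %d\n",
          (int) support->membind->set_thisthread_membind);

  hwloc_topology_destroy (topology);
  return 0;
}

If set_thisproc_membind is 0 here as well, the warning reflects what hwloc detects on this node rather than a change in Open MPI itself.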
[OMPI users] problem with Sun C 5.14 beta
Hi, today I tried to install openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux Enterprise Server 12 (x86_64)" with Sun C 5.14 beta. Unfortunately "configure" breaks, because it thinks that C and C++ are link incompatible. I used the following configure command. ../openmpi-v1.10.2-176-g9d45e07/configure \ --prefix=/usr/local/openmpi-1.10.3_64_cc \ --libdir=/usr/local/openmpi-1.10.3_64_cc/lib64 \ --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \ --with-jdk-headers=/usr/local/jdk1.8.0_66/include \ JAVA_HOME=/usr/local/jdk1.8.0_66 \ LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack" CC="cc" CXX="CC" FC="f95" \ CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \ CPP="cpp" CXXCPP="cpp" \ --enable-mpi-cxx \ --enable-cxx-exceptions \ --enable-mpi-java \ --enable-heterogeneous \ --enable-mpi-thread-multiple \ --with-hwloc=internal \ --without-verbs \ --with-wrapper-cflags="-m64 -mt" \ --with-wrapper-cxxflags="-m64 -library=stlport4" \ --with-wrapper-fcflags="-m64" \ --with-wrapper-ldflags="-mt" \ --enable-debug \ |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc I don't know if it is a compiler problem or a problem with the configure command. Perhaps you are nevertheless interested in the problem. Kind regards Siegmar This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. It was created by Open MPI configure v1.10.2-176-g9d45e07, which was generated by GNU Autoconf 2.69. Invocation command line was $ ../openmpi-v1.10.2-176-g9d45e07/configure --prefix=/usr/local/openmpi-1.10.3_64_cc --libdir=/usr/local/openmpi-1.10.3_64_cc/lib64 --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin --with-jdk-headers=/usr/local/jdk1.8.0_66/include JAVA_HOME=/usr/local/jdk1.8.0_66 LDFLAGS=-m64 -mt -Wl,-z -Wl,noexecstack CC=cc CXX=CC FC=f95 CFLAGS=-m64 -mt CXXFLAGS=-m64 -library=stlport4 FCFLAGS=-m64 CPP=cpp CXXCPP=cpp --enable-mpi-cxx --enable-cxx-exceptions --enable-mpi-java --enable-heterogeneous --enable-mpi-thread-multiple --with-hwloc=internal --without-verbs --with-wrapper-cflags=-m64 -mt --with-wrapper-cxxflags=-m64 -library=stlport4 --with-wrapper-fcflags=-m64 --with-wrapper-ldflags=-mt --enable-debug ## - ## ## Platform. ## ## - ## hostname = loki uname -m = x86_64 uname -r = 3.12.55-52.42-default uname -s = Linux uname -v = #1 SMP Thu Mar 3 10:35:46 UTC 2016 (4354e1d) /usr/bin/uname -p = x86_64 /bin/uname -X = unknown /bin/arch = x86_64 /usr/bin/arch -k = unknown /usr/convex/getsysinfo = unknown /usr/bin/hostinfo = unknown /bin/machine = unknown /usr/bin/oslevel = unknown /bin/universe = unknown PATH: /usr/local/eclipse-4.5.1 PATH: /usr/local/netbeans-8.1/bin PATH: /usr/local/jdk1.8.0_66/bin PATH: /usr/local/jdk1.8.0_66/db/bin PATH: /usr/local/intel_xe_2016/compilers_and_libraries_2016.3.210/linux/bin/intel64 PATH: /usr/local/intel_xe_2016/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin PATH: /usr/local/intel_xe_2016/debugger_2016/gdb/intel64_mic/bin PATH: /usr/local/intel_xe_2016/compilers_and_libraries_2016.3.210/linux/bin/ia32 PATH: /usr/local/intel_xe_2016/debugger_2016/gdb/intel64_mic/bin PATH: /opt/solstudio12.5b/bin PATH: /usr/local/gcc-6.1.0/bin PATH: /usr/local/sbin PATH: /usr/local/bin PATH: /sbin PATH: /usr/sbin PATH: /bin PATH: /usr/bin PATH: /usr/local/hwloc-1.11.1/bin PATH: /root/Linux/x86_64/bin PATH: . ## --- ## ## Core tests. 
## ## --- ## configure:5534: checking build system type configure:5548: result: x86_64-pc-linux-gnu configure:5568: checking host system type configure:5581: result: x86_64-pc-linux-gnu configure:5601: checking target system type configure:5614: result: x86_64-pc-linux-gnu configure:5740: checking for gcc configure:5767: result: cc configure:5996: checking for C compiler version configure:6005: cc --version >&5 cc: Warning: Option --version passed to ld, if ld is invoked, ignored otherwise usage: cc [ options ] files. Use 'cc -flags' for details configure:6016: $? = 1 configure:6005: cc -v >&5 usage: cc [ options ] files. Use 'cc -flags' for details configure:6016: $? = 1 configure:6005: cc -V >&5 cc: Studio 12.5 Sun C 5.14 Linux_i386 Beta 2015/11/17 configure:6016: $? = 0 configure:6005: cc -qversion >&5 cc: Warning: Option -qversion passed to ld, if ld is invoked, ignored otherwise usage: cc [ options ] files. Use 'cc -flags' for details configure:6016: $? = 1 configure:6036: checking whether the C compiler works configure:6058: cc -m64 -mt -m64 -mt -Wl,-z -Wl,noexecstack conftest.c >&5 configure:6062: $? = 0 configure:6110: result: yes configure:6113: checking for C compiler default output file name configure:6115: result: a.out configure:6121: checking for suffix of executables configure:6128: cc -o conftest -m64 -mt -m64 -mt -Wl,-z -Wl,noexecstack conftest.c >&5 configure:6132: $? = 0 configure:6154: result: configure:6176: checking whether we are cross
Re: [OMPI users] warning message for process binding with openmpi-dev-4010-g6c9d65c
Siegmar, did you upgrade your OS recently? Or change hyper-threading settings? This error message typically appears when the numactl-devel rpm is not installed (numactl-devel on Red Hat; the package name might differ on SLES). If not, would you mind retesting from scratch a previous tarball that used to work without any warning? Cheers, Gilles On Saturday, May 7, 2016, Siegmar Gross < siegmar.gr...@informatik.hs-fulda.de> wrote: > Hi, > > yesterday I installed openmpi-dev-4010-g6c9d65c on my "SUSE Linux > Enterprise Server 12 (x86_64)" with Sun C 5.13 and gcc-5.3.0. > Unfortunately I get the following warning message. > > loki hello_1 128 ompi_info | grep -e "OPAL repo revision" -e "C compiler > absolute" > OPAL repo revision: dev-4010-g6c9d65c > C compiler absolute: /opt/solstudio12.4/bin/cc > loki hello_1 129 mpiexec -np 3 --host loki --slot-list 0:0-5,1:0-5 > hello_1_mpi > -- > WARNING: a request was made to bind a process. While the system > supports binding the process itself, at least one node does NOT > support binding memory to the process location. > > Node: loki > > Open MPI uses the "hwloc" library to perform process and memory > binding. This error message means that hwloc has indicated that > processor binding support is not available on this machine. > > On OS X, processor and memory binding is not available at all (i.e., > the OS does not expose this functionality). > > On Linux, lack of the functionality can mean that you are on a > platform where processor and memory affinity is not supported in Linux > itself, or that hwloc was built without NUMA and/or processor affinity > support. When building hwloc (which, depending on your Open MPI > installation, may be embedded in Open MPI itself), it is important to > have the libnuma header and library files available. Different linux > distributions package these files under different names; look for > packages with the word "numa" in them. You may also need a developer > version of the package (e.g., with "dev" or "devel" in the name) to > obtain the relevant header files. > > If you are getting this message on a non-OS X, non-Linux platform, > then hwloc does not support processor / memory affinity on this > platform. If the OS/platform does actually support processor / memory > affinity, then you should contact the hwloc maintainers: > https://github.com/open-mpi/hwloc. > > This is a warning only; your job will continue, though performance may > be degraded. > -- > Process 0 of 3 running on loki > Process 2 of 3 running on loki > Process 1 of 3 running on loki > > > Now 2 slave tasks are sending greetings. > > Greetings from task 1: > message type:3 > ... > > > > loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 122 ls -l > /usr/lib64/*numa* > -rwxr-xr-x 1 root root 48256 Nov 24 16:29 /usr/lib64/libnuma.so.1 > loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 123 grep numa > log.configure.Linux.x86_64.64_cc > checking numaif.h usability... no > checking numaif.h presence... yes > configure: WARNING: numaif.h: present but cannot be compiled > configure: WARNING: numaif.h: check for missing prerequisite headers? > configure: WARNING: numaif.h: see the Autoconf documentation > configure: WARNING: numaif.h: section "Present But Cannot Be Compiled" > configure: WARNING: numaif.h: proceeding with the compiler's result > checking for numaif.h... 
no > loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 124 > > > > I didn't get the warning for openmpi-v1.10.2-176-g9d45e07 and > openmpi-v2.x-dev-1404-g74d8ea0 as you can see in my previous emails, > although I have the same messages in log.configure.*. I would be > grateful, if somebody can fix the problem if it is a problem > and not an intended message. Thank you very much for any help in > advance. > > > Kind regards > > Siegmar >
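The grep output quoted above ("numaif.h: present but cannot be compiled") fits Gilles' suspicion: the runtime libnuma.so.1 is installed, but the matching development files may be missing or unusable. A quick check outside of configure is a small test program like the following sketch; package names and installation paths vary between distributions, so treat them as assumptions:

/* Checks whether the libnuma development files are usable: if this
 * compiles and links (cc check_numa.c -lnuma), numaif.h is fine; if it
 * fails like the configure test above, the numactl development package
 * is missing or broken.  A sketch only.
 */
#include <stdio.h>
#include <numaif.h>

int main (void)
{
  int mode = -1;

  /* query the default NUMA memory policy of the calling thread */
  if (get_mempolicy (&mode, NULL, 0, NULL, 0) != 0)
    perror ("get_mempolicy");
  else
    printf ("default NUMA memory policy: %d\n", mode);
  return 0;
}

If cc check_numa.c -lnuma fails the same way the configure test did, installing the numactl development package and rebuilding is the usual fix.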
Re: [OMPI users] problem with Sun C 5.14 beta
Siegmar, per the config.log, you need to update your CXXFLAGS="-m64 -library=stlport4 -std=sun03" or just CXXFLAGS="-m64" Cheers, Gilles On Saturday, May 7, 2016, Siegmar Gross < siegmar.gr...@informatik.hs-fulda.de> wrote: > Hi, > > today I tried to install openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux > Enterprise Server 12 (x86_64)" with Sun C 5.14 beta. Unfortunately > "configure" breaks, because it thinks that C and C++ are link > incompatible. I used the following configure command. > > ../openmpi-v1.10.2-176-g9d45e07/configure \ > --prefix=/usr/local/openmpi-1.10.3_64_cc \ > --libdir=/usr/local/openmpi-1.10.3_64_cc/lib64 \ > --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \ > --with-jdk-headers=/usr/local/jdk1.8.0_66/include \ > JAVA_HOME=/usr/local/jdk1.8.0_66 \ > LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack" CC="cc" CXX="CC" FC="f95" \ > CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \ > CPP="cpp" CXXCPP="cpp" \ > --enable-mpi-cxx \ > --enable-cxx-exceptions \ > --enable-mpi-java \ > --enable-heterogeneous \ > --enable-mpi-thread-multiple \ > --with-hwloc=internal \ > --without-verbs \ > --with-wrapper-cflags="-m64 -mt" \ > --with-wrapper-cxxflags="-m64 -library=stlport4" \ > --with-wrapper-fcflags="-m64" \ > --with-wrapper-ldflags="-mt" \ > --enable-debug \ > |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc > > > I don't know if it is a compiler problem or a problem with the > configure command. Perhaps you are nevertheless interested in > the problem. > > > Kind regards > > Siegmar >
Re: [OMPI users] warning message for process binding with openmpi-dev-4010-g6c9d65c
Hi Gilles, "loki" is a machine in our new lab and I tried "--slot-list 0:0-5,1:0-5" the first time, so that I don't know if it worked before. I can ask our admin on Monday, if numactl-devel is installed. Kind regards Siegmar On 05/07/16 12:10, Gilles Gouaillardet wrote: Siegmar, did you upgrade your os recently ? or change hyper threading settings ? this error message typically appears when the numactl-devel rpm is not installed (numactl-devel on redhat, the package name might differ on sles) if not, would you mind retesting frI'm scratch a previous tarball that used to work without any warning ? Cheers, Gilles On Saturday, May 7, 2016, Siegmar Gross mailto:siegmar.gr...@informatik.hs-fulda.de>> wrote: Hi, yesterday I installed openmpi-dev-4010-g6c9d65c on my "SUSE Linux Enterprise Server 12 (x86_64)" with Sun C 5.13 and gcc-5.3.0. Unfortunately I get the following warning message. loki hello_1 128 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute" OPAL repo revision: dev-4010-g6c9d65c C compiler absolute: /opt/solstudio12.4/bin/cc loki hello_1 129 mpiexec -np 3 --host loki --slot-list 0:0-5,1:0-5 hello_1_mpi -- WARNING: a request was made to bind a process. While the system supports binding the process itself, at least one node does NOT support binding memory to the process location. Node: loki Open MPI uses the "hwloc" library to perform process and memory binding. This error message means that hwloc has indicated that processor binding support is not available on this machine. On OS X, processor and memory binding is not available at all (i.e., the OS does not expose this functionality). On Linux, lack of the functionality can mean that you are on a platform where processor and memory affinity is not supported in Linux itself, or that hwloc was built without NUMA and/or processor affinity support. When building hwloc (which, depending on your Open MPI installation, may be embedded in Open MPI itself), it is important to have the libnuma header and library files available. Different linux distributions package these files under different names; look for packages with the word "numa" in them. You may also need a developer version of the package (e.g., with "dev" or "devel" in the name) to obtain the relevant header files. If you are getting this message on a non-OS X, non-Linux platform, then hwloc does not support processor / memory affinity on this platform. If the OS/platform does actually support processor / memory affinity, then you should contact the hwloc maintainers: https://github.com/open-mpi/hwloc. This is a warning only; your job will continue, though performance may be degraded. -- Process 0 of 3 running on loki Process 2 of 3 running on loki Process 1 of 3 running on loki Now 2 slave tasks are sending greetings. Greetings from task 1: message type:3 ... loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 122 ls -l /usr/lib64/*numa* -rwxr-xr-x 1 root root 48256 Nov 24 16:29 /usr/lib64/libnuma.so.1 loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 123 grep numa log.configure.Linux.x86_64.64_cc checking numaif.h usability... no checking numaif.h presence... yes configure: WARNING: numaif.h: present but cannot be compiled configure: WARNING: numaif.h: check for missing prerequisite headers? configure: WARNING: numaif.h: see the Autoconf documentation configure: WARNING: numaif.h: section "Present But Cannot Be Compiled" configure: WARNING: numaif.h: proceeding with the compiler's result checking for numaif.h... 
no loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 124 I didn't get the warning for openmpi-v1.10.2-176-g9d45e07 and openmpi-v2.x-dev-1404-g74d8ea0 as you can see in my previous emails, although I have the same messages in log.configure.*. I would be grateful, if somebody can fix the problem if it is a problem and not an intended message. Thank you very much for any help in advance. Kind regards Siegmar ___ users mailing list us...@open-mpi.org Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29131.php
Re: [OMPI users] [open-mpi/ompi] COMM_SPAWN broken on Solaris/v1.10 (#1569)
Hi Gilles, the minimal configuration to reproduce an error with spawn_master are two Sparc machines. tyr spawn 124 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute" OPAL repo revision: v1.10.2-176-g9d45e07 C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc tyr spawn 125 ssh ruester ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute" OPAL repo revision: v1.10.2-176-g9d45e07 C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc tyr spawn 126 uname -a SunOS tyr.informatik.hs-fulda.de 5.10 Generic_150400-11 sun4u sparc SUNW,A70 Solaris tyr spawn 127 ssh ruester uname -a SunOS ruester.informatik.hs-fulda.de 5.10 Generic_150400-04 sun4u sparc SUNW,SPARC-Enterprise Solaris tyr spawn 128 mpiexec -np 1 --host tyr,tyr,tyr,tyr,tyr spawn_master Parent process 0 running on tyr.informatik.hs-fulda.de I create 4 slave processes Parent process 0: tasks in MPI_COMM_WORLD:1 tasks in COMM_CHILD_PROCESSES local group: 1 tasks in COMM_CHILD_PROCESSES remote group: 4 Slave process 1 of 4 running on tyr.informatik.hs-fulda.de Slave process 0 of 4 running on tyr.informatik.hs-fulda.de Slave process 3 of 4 running on tyr.informatik.hs-fulda.de Slave process 2 of 4 running on tyr.informatik.hs-fulda.de spawn_slave 2: argv[0]: spawn_slave spawn_slave 0: argv[0]: spawn_slave spawn_slave 3: argv[0]: spawn_slave spawn_slave 1: argv[0]: spawn_slave tyr spawn 129 mpiexec -np 1 --host tyr,tyr,tyr,tyr,ruester spawn_master Parent process 0 running on tyr.informatik.hs-fulda.de I create 4 slave processes Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (proc_pointer))->obj_magic_id, file ../../openmpi-v1.10.2-176-g9d45e07/ompi/group/group_init.c, line 215, function ompi_group_increment_proc_count [ruester:23592] *** Process received signal *** [ruester:23592] Signal: Abort (6) [ruester:23592] Signal code: (-1) /usr/local/openmpi-1.10.3_64_gcc/lib64/libopen-pal.so.13.0.2:opal_backtrace_print+0x2c /usr/local/openmpi-1.10.3_64_gcc/lib64/libopen-pal.so.13.0.2:0xc2c0c /lib/sparcv9/libc.so.1:0xd8c28 /lib/sparcv9/libc.so.1:0xcc79c /lib/sparcv9/libc.so.1:0xcc9a8 /lib/sparcv9/libc.so.1:__lwp_kill+0x8 [ Signal 6 (ABRT)] /lib/sparcv9/libc.so.1:abort+0xd0 /lib/sparcv9/libc.so.1:_assert_c99+0x78 /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:ompi_group_increment_proc_count+0xf0 /usr/local/openmpi-1.10.3_64_gcc/lib64/openmpi/mca_dpm_orte.so:0x6638 /usr/local/openmpi-1.10.3_64_gcc/lib64/openmpi/mca_dpm_orte.so:0x948c /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:ompi_mpi_init+0x1978 /usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:MPI_Init+0x2a8 /home/fd1026/SunOS/sparc/bin/spawn_slave:main+0x10 /home/fd1026/SunOS/sparc/bin/spawn_slave:_start+0x7c [ruester:23592] *** End of error message *** -- mpiexec noticed that process rank 3 with PID 0 on node ruester exited on signal 6 (Abort). -- tyr spawn 130 A minimal configuration to reproduce an error with spawn_intra_comm is a single machine for openmpi-2.x and openmpi-master. I didn't get an error message on Linux (it just hangs after displaying the messages). 
tyr spawn 114 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute" OPAL repo revision: dev-4010-g6c9d65c C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc tyr spawn 115 mpiexec -np 1 --host tyr,tyr,tyr spawn_intra_comm Parent process 0: I create 2 slave processes Child process 0 running on tyr.informatik.hs-fulda.de MPI_COMM_WORLD ntasks: 2 COMM_ALL_PROCESSES ntasks: 3 mytid in COMM_ALL_PROCESSES:1 Child process 1 running on tyr.informatik.hs-fulda.de MPI_COMM_WORLD ntasks: 2 COMM_ALL_PROCESSES ntasks: 3 mytid in COMM_ALL_PROCESSES:2 Parent process 0 running on tyr.informatik.hs-fulda.de MPI_COMM_WORLD ntasks: 1 COMM_CHILD_PROCESSES ntasks_local: 1 COMM_CHILD_PROCESSES ntasks_remote: 2 COMM_ALL_PROCESSES ntasks: 3 mytid in COMM_ALL_PROCESSES:0 [[48188,1],0][../../../../../openmpi-dev-4010-g6c9d65c/opal/mca/btl/tcp/btl_tcp_endpoint.c:755:mca_btl_tcp_endpoint_start_connect] from tyr to: tyr Unable to connect to the peer 193.174.24.39 on port 1026: Connection refused [tyr.informatik.hs-fulda.de:06684] ../../../../../openmpi-dev-4010-g6c9d65c/ompi/mca/pml/ob1/pml_ob1_sendreq.c:237 FATAL tyr spawn 116 sunpc1 fd1026 102 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute" OPAL repo revision: dev-4010-g6c9d65c C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc sunpc1 fd1026 103 mpiexec -np 1 --host sunpc1,sunpc1,sunpc1 spawn_intra_comm Parent process 0: I create 2 slave processes Parent process 0 running on sunpc1 MPI_COMM_WORLD ntasks:
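From the output above, spawn_intra_comm spawns two children and then merges parent and children into a single intra-communicator (COMM_ALL_PROCESSES with 3 tasks). The sketch below shows that pattern with MPI_Comm_spawn, MPI_Comm_get_parent and MPI_Intercomm_merge; it is reconstructed from the printed messages, so any detail of Siegmar's actual program beyond that is an assumption.

/* Sketch of the spawn_intra_comm pattern, reconstructed from the
 * output above: the parent spawns two children, then parent and
 * children merge the inter-communicator into one intra-communicator
 * (COMM_ALL_PROCESSES with 3 tasks).  Spawning argv[0] assumes the
 * executable path is valid on the target host; all other details of
 * the real program are assumptions as well.
 */
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  MPI_Comm parent, intercomm, COMM_ALL_PROCESSES;
  int      ntasks_all, mytid_all;

  MPI_Init (&argc, &argv);
  MPI_Comm_get_parent (&parent);

  if (parent == MPI_COMM_NULL) {
    /* parent: spawn two copies of this executable */
    printf ("Parent process 0: I create 2 slave processes\n");
    MPI_Comm_spawn (argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                    MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
    MPI_Intercomm_merge (intercomm, 0, &COMM_ALL_PROCESSES); /* parent first */
  } else {
    /* child: merge with the parent group, children ordered last */
    MPI_Intercomm_merge (parent, 1, &COMM_ALL_PROCESSES);
  }

  MPI_Comm_size (COMM_ALL_PROCESSES, &ntasks_all);
  MPI_Comm_rank (COMM_ALL_PROCESSES, &mytid_all);
  printf ("COMM_ALL_PROCESSES ntasks: %d\n"
          "  mytid in COMM_ALL_PROCESSES: %d\n", ntasks_all, mytid_all);

  MPI_Comm_free (&COMM_ALL_PROCESSES);
  MPI_Finalize ();
  return 0;
}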
Re: [OMPI users] problem with Sun C 5.14 beta
Hi Gilles, thank you very much for your help. Now C and C++ are link compatible. Kind regards Siegmar On 05/07/16 12:15, Gilles Gouaillardet wrote: Siegmar, per the config.log, you need to update your CXXFLAGS="-m64 -library=stlport4 -std=sun03" or just CXXFLAGS="-m64" Cheers, Gilles On Saturday, May 7, 2016, Siegmar Gross mailto:siegmar.gr...@informatik.hs-fulda.de>> wrote: Hi, today I tried to install openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux Enterprise Server 12 (x86_64)" with Sun C 5.14 beta. Unfortunately "configure" breaks, because it thinks that C and C++ are link incompatible. I used the following configure command. ../openmpi-v1.10.2-176-g9d45e07/configure \ --prefix=/usr/local/openmpi-1.10.3_64_cc \ --libdir=/usr/local/openmpi-1.10.3_64_cc/lib64 \ --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \ --with-jdk-headers=/usr/local/jdk1.8.0_66/include \ JAVA_HOME=/usr/local/jdk1.8.0_66 \ LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack" CC="cc" CXX="CC" FC="f95" \ CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \ CPP="cpp" CXXCPP="cpp" \ --enable-mpi-cxx \ --enable-cxx-exceptions \ --enable-mpi-java \ --enable-heterogeneous \ --enable-mpi-thread-multiple \ --with-hwloc=internal \ --without-verbs \ --with-wrapper-cflags="-m64 -mt" \ --with-wrapper-cxxflags="-m64 -library=stlport4" \ --with-wrapper-fcflags="-m64" \ --with-wrapper-ldflags="-mt" \ --enable-debug \ |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc I don't know if it is a compiler problem or a problem with the configure command. Perhaps you are nevertheless interested in the problem. Kind regards Siegmar ___ users mailing list us...@open-mpi.org Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29132.php
Re: [OMPI users] No core dump in some cases
I'm afraid I don't know what a .btr file is -- that is not something that is controlled by Open MPI. You might want to look into your OS settings to see if it has some kind of alternate corefile mechanism...? > On May 6, 2016, at 8:58 PM, dpchoudh . wrote: > > Hello all > > I run MPI jobs (for test purpose only) on two different 'clusters'. Both > 'clusters' have two nodes only, connected back-to-back. The two are very > similar, but not identical, both software and hardware wise. > > Both have ulimit -c set to unlimited. However, only one of the two creates > core files when an MPI job crashes. The other creates a text file named > something like > .80s-,.btr > > I'd much prefer a core file because that allows me to debug with a lot more > options than a static text file with addresses. How do I get a core file in > all situations? I am using MPI source from the master branch. > > Thanks in advance > Durga > > The surgeon general advises you to eat right, exercise regularly and quit > ageing. > ___ > users mailing list > us...@open-mpi.org > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users > Link to this post: > http://www.open-mpi.org/community/lists/users/2016/05/29124.php -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
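Independent of the .btr question, one ingredient for "a core file in all situations" is the core-file limit in effect inside the processes that mpiexec actually starts: a ulimit -c unlimited issued in an interactive shell is not necessarily inherited by processes launched through a daemon on another node. A hedged sketch that raises the limit from inside the application at startup (Linux/Unix only; it does not explain the .btr files):

/* Raises the core-file size limit from inside the application so that
 * a crash can produce a core dump even when the limit inherited from
 * the launcher differs from the interactive shell's
 * "ulimit -c unlimited".  A sketch for Linux/Unix.
 */
#include <stdio.h>
#include <sys/resource.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  struct rlimit rl;

  /* raise the soft limit for core files up to the hard limit */
  if (getrlimit (RLIMIT_CORE, &rl) == 0) {
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit (RLIMIT_CORE, &rl) != 0)
      perror ("setrlimit(RLIMIT_CORE)");
  }

  MPI_Init (&argc, &argv);
  /* ... application code that may crash ... */
  MPI_Finalize ();
  return 0;
}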