[OMPI users] slot problem on "SUSE Linux Enterprise Server 12 (x86_64)"

2016-05-07 Thread Siegmar Gross

Hi,

yesterday I installed openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux
Enterprise Server 12 (x86_64)" with Sun C 5.13  and gcc-5.3.0. The
following programs don't run anymore.


loki hello_2 112 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
  OPAL repo revision: v1.10.2-176-g9d45e07
 C compiler absolute: /opt/solstudio12.4/bin/cc
loki hello_2 113 mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi

--
There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
  hello_2_slave_mpi

Either request fewer slots for your application, or make more slots available
for use.
--
loki hello_2 114



Everything worked as expected with openmpi-v1.10.0-178-gb80f802.

loki hello_2 114 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
  OPAL repo revision: v1.10.0-178-gb80f802
 C compiler absolute: /opt/solstudio12.4/bin/cc
loki hello_2 115 mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi

Process 0 of 3 running on loki
Process 1 of 3 running on loki
Process 2 of 3 running on loki

Now 2 slave tasks are sending greetings.

Greetings from task 2:
  message type:3
...


I have the same problem with openmpi-v2.x-dev-1404-g74d8ea0 if I use
the following commands (a hostfile-based alternative is sketched after
the list).

mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi
mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,nfs1 hello_2_slave_mpi
mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi
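For reference, slot counts can also be declared explicitly in a hostfile
instead of being implied by repeating host names. A minimal sketch,
assuming a single twelve-core node named loki and a hostfile called
myhostfile (whether the new slot accounting in these snapshots accepts
this has not been verified here):

  # myhostfile
  loki slots=12

  mpiexec -np 1 --hostfile myhostfile hello_2_mpi : \
          -np 2 --hostfile myhostfile hello_2_slave_mpi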



I have also the same problem with openmpi-dev-4010-g6c9d65c, if I use
the following command.

mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,loki hello_2_slave_mpi


openmpi-dev-4010-g6c9d65c works as expected with the following commands.

mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki,nfs1 hello_2_slave_mpi
mpiexec -np 1 --host loki hello_2_mpi : -np 2 --host loki --slot-list 0:0-5,1:0-5 hello_2_slave_mpi



Has the interface changed so that some of my commands are no longer
allowed? I would be grateful if somebody could fix the problem, if it
is indeed a problem. Thank you very much in advance for any help.



Kind regards

Siegmar
/* Another MPI-version of the "hello world" program, which delivers
 * some information about its machine and operating system. In this
 * version the functions "master" and "slave" from "hello_1_mpi.c"
 * are implemented as independent processes. This is the file for the
 * "master".
 *
 *
 * Compiling:
 *   Store executable(s) into local directory.
 * mpicc -o  
 *
 *   Store executable(s) into predefined directories.
 * make
 *
 *   Make program(s) automatically on all specified hosts. You must
 *   edit the file "make_compile" and specify your host names before
 *   you execute it.
 * make_compile
 *
 * Running:
 *   LAM-MPI:
 * mpiexec -boot -np  
 * or
 * mpiexec -boot \
 *	 -host  -np   : \
 *	 -host  -np  
 * or
 * mpiexec -boot [-v] -configfile 
 * or
 * lamboot [-v] []
 *   mpiexec -np  
 *	 or
 *	 mpiexec [-v] -configfile 
 * lamhalt
 *
 *   OpenMPI:
 * "host1", "host2", and so on can all have the same name,
 * if you want to start a virtual computer with some virtual
 * cpu's on the local host. The name "localhost" is allowed
 * as well.
 *
 * mpiexec -np  
 * or
 * mpiexec --host  \
 *	 -np  
 * or
 * mpiexec -hostfile  \
 *	 -np  
 * or
 * mpiexec -app 
 *
 * Cleaning:
 *   local computer:
 * rm 
 * or
 * make clean_all
 *   on all specified computers (you must edit the file "make_clean_all"
 *   and specify your host names before you execute it.
 * make_clean_all
 *
 *
 * File: hello_2_mpi.c		   	Author: S. Gross
 * Date: 01.10.2012
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mpi.h"

#define	BUF_SIZE	255		/* message buffer size		*/
#define	MAX_TASKS	12		/* max. number of tasks		*/
#define	SENDTAG		1		/* send message command		*/
#define	EXITTAG		2		/* termination command		*/
#define	MSGTAG		3		/* normal message token		*/

#define ENTASKS		-1		/* error: too many tasks	*/

int main (int argc, char *argv[])
{
  int  mytid,				/* my task id			*/
       ntasks,				/* number of parallel tasks	*/
       namelen,				/* length of processor name	*/
       num,				/* number of chars in buffer	*/
       i;				/* loop variable		*/
  char processor_name[MPI_MAX_PROCESSOR_NAME],
       buf[BUF_SIZE + 1];		/* message buffer (+1 for '\0')	*/
  MPI_Status	stat;			/* message details		*/

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
  MPI_Get_processor_name (processor_name, &namelen);
  /* With the next statement every process executing this code will
   

[OMPI users] problem compiling Java programs with openmpi-v1.10.2-176-g9d45e07

2016-05-07 Thread Siegmar Gross

Hi,

yesterday I installed openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux
Enterprise Server 12 (x86_64)" with Sun C 5.13  and gcc-5.3.0.
Unfortunately I have a problem compiling Java programs.


loki java 124 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
  OPAL repo revision: v1.10.2-176-g9d45e07
 C compiler absolute: /opt/solstudio12.4/bin/cc
loki java 125 mpijavac BcastIntMain.java
BcastIntMain.java:44: error: cannot find symbol
mytid = MPI.COMM_WORLD.getRank ();
   ^
  symbol:   variable COMM_WORLD
  location: class MPI
BcastIntMain.java:52: error: cannot find symbol
MPI.COMM_WORLD.bcast (intValue, 1, MPI.INT, 0);
  ^
  symbol:   variable INT
  location: class MPI
BcastIntMain.java:52: error: cannot find symbol
MPI.COMM_WORLD.bcast (intValue, 1, MPI.INT, 0);
   ^
  symbol:   variable COMM_WORLD
  location: class MPI
3 errors
loki java 126


loki java 110 dir /usr/local/openmpi-1.10.3_64_cc/lib64/*.jar
-rw-r--r-- 1 root root 60876 May  6 13:05 /usr/local/openmpi-1.10.3_64_cc/lib64/mpi.jar

loki java 111 javac -version
javac 1.8.0_66
loki java 112
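One way to narrow this down (an assumption, not a confirmed diagnosis) is
to bypass the mpijavac wrapper, compile directly against the installed
jar, and check which classes that jar actually contains:

  javac -cp /usr/local/openmpi-1.10.3_64_cc/lib64/mpi.jar BcastIntMain.java
  jar tf /usr/local/openmpi-1.10.3_64_cc/lib64/mpi.jar | grep -e 'MPI\.class' -e 'Intracomm'

If the direct javac call succeeds, the wrapper is presumably picking up a
different (older) mpi.jar; if it fails with the same errors, the installed
jar itself lacks the new-style bindings.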



I have the same problem with openmpi-v2.x-dev-1404-g74d8ea0 and with
openmpi-dev-4010-g6c9d65c. I would be grateful if somebody could fix
the problem. Thank you very much in advance for any help.


Kind regards

Siegmar
/* Small program that distributes an integer value with a
 * broadcast operation.
 *
 * Java uses call-by-value and doesn't support call-by-reference
 * for method parameters with the only exception of object references.
 * Therefore you must use an array with just one element, if you
 * want to send/receive/broadcast/... primitive datatypes.
 *
 * "mpijavac" and Java-bindings are available in "Open MPI
 * version 1.7.4" or newer.
 *
 *
 * Class file generation:
 *   mpijavac BcastIntMain.java
 *
 * Usage:
 *   mpiexec [parameters] java [parameters] BcastIntMain
 *
 * Examples:
 *   mpiexec -np 2 java BcastIntMain
 *   mpiexec -np 2 java -cp $HOME/mpi_classfiles BcastIntMain
 *
 *
 * File: BcastIntMain.java		Author: S. Gross
 * Date: 09.09.2013
 *
 */

import mpi.*;

public class BcastIntMain
{
  static final int SLEEP_FACTOR = 200;	/* 200 ms to get ordered output	*/

  public static void main (String args[]) throws MPIException,
                                                 InterruptedException
  {
    int    mytid;			/* my task id			*/
    int    intValue[] = new int[1];	/* broadcast one intValue	*/
    String processorName;		/* name of local machine	*/

    MPI.Init (args);
    processorName = MPI.getProcessorName ();
    mytid         = MPI.COMM_WORLD.getRank ();
    intValue[0]   = -1;
    if (mytid == 0)
    {
      /* initialize data item		*/
      intValue[0] = 1234567;
    }
    /* broadcast value to all processes	*/
    MPI.COMM_WORLD.bcast (intValue, 1, MPI.INT, 0);
    /* Each process prints its received data item. The outputs
     * can intermingle on the screen so that you must use
     * "-output-filename" in Open MPI.
     */
    Thread.sleep (SLEEP_FACTOR * mytid); /* sleep to get ordered output	*/
    System.out.printf ("\nProcess %d running on %s.\n" +
                       "  intValue: %d\n",
                       mytid, processorName, intValue[0]);
    MPI.Finalize ();
  }
}


[OMPI users] slot-list breaks for openmpi-v1.10.2-176-g9d45e07 on "SUSE Linux Enterprise Server 12 (x86_64)"

2016-05-07 Thread Siegmar Gross

Hi,

yesterday I installed openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux
Enterprise Server 12 (x86_64)" with Sun C 5.13  and gcc-5.3.0.
Unfortunately I have a problem with one of my spawn programs.

loki spawn 129 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
  OPAL repo revision: v1.10.2-176-g9d45e07
 C compiler absolute: /opt/solstudio12.4/bin/cc
loki spawn 130 mpiexec -np 1 --host loki,loki,loki,loki,loki spawn_master

Parent process 0 running on loki
  I create 4 slave processes

Parent process 0: tasks in MPI_COMM_WORLD:1
  tasks in COMM_CHILD_PROCESSES local group:  1
  tasks in COMM_CHILD_PROCESSES remote group: 4

Slave process 1 of 4 running on loki
Slave process 2 of 4 running on loki
Slave process 3 of 4 running on loki
Slave process 0 of 4 running on loki
spawn_slave 0: argv[0]: spawn_slave
spawn_slave 1: argv[0]: spawn_slave
spawn_slave 2: argv[0]: spawn_slave
spawn_slave 3: argv[0]: spawn_slave
loki spawn 131 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master

Parent process 0 running on loki
  I create 4 slave processes

[loki:02080] *** Process received signal ***
[loki:02080] Signal: Segmentation fault (11)
[loki:02080] Signal code: Address not mapped (1)
[loki:02080] Failing at address: (nil)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[loki:2073] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[loki:2079] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!

[loki:02080] [ 0] /lib64/libpthread.so.0(+0xf870)[0x7f485c593870]
[loki:02080] [ 1] /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12(+0x16d4df)[0x7f485c90e4df]
[loki:02080] [ 2] /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12(ompi_group_increment_proc_count+0x35)[0x7f485c90eee5]
[loki:02080] [ 3] /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12(ompi_comm_init+0x2fc)[0x7f485c8be9fc]
[loki:02080] [ 4] /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12(ompi_mpi_init+0xd12)[0x7f485c962942]
[loki:02080] [ 5] /usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12(PMPI_Init+0x1f2)[0x7f485cda7332]
[loki:02080] [ 6] spawn_slave[0x400a89]
[loki:02080] [ 7] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f485c1fdb05]
[loki:02080] [ 8] spawn_slave[0x400952]
[loki:02080] *** End of error message ***
---
Child job 2 terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
---
--
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[38824,2],0]
  Exit code:1
--
loki spawn 132



Everything works fine with spawn_multiple_master.

loki spawn 134 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_multiple_master


Parent process 0 running on loki
  I create 3 slave processes.

Parent process 0: tasks in MPI_COMM_WORLD:1
  tasks in COMM_CHILD_PROCESSES local group:  1
  tasks in COMM_CHILD_PROCESSES remote group: 2

Slave process 0 of 2 running on loki
...



I have a similar error with openmpi-v2.x-dev-1404-g74d8ea0. My other
spawn programs work more or less as expected, although spawn_intra_comm
doesn't return, so that I have to break it with Ctrl-C.

loki spawn 124 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
  OPAL repo revision: v2.x-dev-1404-g74d8ea0
 C compiler absolute: /opt/solstudio12.4/bin/cc
loki spawn 125 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master

Parent process 0 running on loki
  I create 4 slave processes

[loki:03931] OPAL ERROR: Timeout in file ../../../../openmpi-v2.x-dev-1404-g74d8ea0/opal/mca/pmix/base/pmix_base_fns.c at line 190

[loki:3931] *** An error occurred in MPI_Comm_spawn
[loki:3931] *** reported by process [2431254529,0]
[loki:3931] *** on communicator MPI_COMM_WORLD
[loki:3931] *** MPI_ERR_UNKNOWN: unknown error
[loki:3931] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[loki:3931] ***and potentially your MPI job)
loki spawn 126
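For context, the spawn_master source is not reproduced in full below (the
archive truncates it). A minimal sketch of the kind of spawn call such a
test presumably makes is shown here; the slave executable name
spawn_slave and the use of MPI_COMM_SELF are assumptions, not Siegmar's
actual code:

/* spawn_sketch.c - minimal parent that spawns 4 slave processes */
#include <stdio.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  MPI_Comm intercomm;                   /* intercommunicator to slaves  */
  int      nlocal, nremote;             /* group sizes                  */

  MPI_Init (&argc, &argv);
  /* start four copies of "spawn_slave" from the working directory     */
  MPI_Comm_spawn ("spawn_slave", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                  0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
  MPI_Comm_size (intercomm, &nlocal);           /* local group size     */
  MPI_Comm_remote_size (intercomm, &nremote);   /* spawned processes    */
  printf ("local group: %d, remote group: %d\n", nlocal, nremote);
  MPI_Comm_disconnect (&intercomm);
  MPI_Finalize ();
  return 0;
}

The backtrace above shows the slaves dying inside MPI_Init before such a
call can complete, which suggests the --slot-list handling rather than
the test program itself.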


I would be grateful if somebody could fix the problem. Thank you very
much in advance for any help.


Kind regards

Siegmar
/* The program demonstrates ho

[OMPI users] warning message for process binding with openmpi-dev-4010-g6c9d65c

2016-05-07 Thread Siegmar Gross

Hi,

yesterday I installed openmpi-dev-4010-g6c9d65c on my "SUSE Linux
Enterprise Server 12 (x86_64)" with Sun C 5.13  and gcc-5.3.0.
Unfortunately I get the following warning message.

loki hello_1 128 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
  OPAL repo revision: dev-4010-g6c9d65c
 C compiler absolute: /opt/solstudio12.4/bin/cc
loki hello_1 129 mpiexec -np 3 --host loki --slot-list 0:0-5,1:0-5 hello_1_mpi
--
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

  Node:  loki

Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.

On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).

On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.

If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
affinity, then you should contact the hwloc maintainers:
https://github.com/open-mpi/hwloc.

This is a warning only; your job will continue, though performance may
be degraded.
--
Process 0 of 3 running on loki
Process 2 of 3 running on loki
Process 1 of 3 running on loki


Now 2 slave tasks are sending greetings.

Greetings from task 1:
  message type:3
...



loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 122 ls -l /usr/lib64/*numa*
-rwxr-xr-x 1 root root 48256 Nov 24 16:29 /usr/lib64/libnuma.so.1
loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 123 grep numa log.configure.Linux.x86_64.64_cc

checking numaif.h usability... no
checking numaif.h presence... yes
configure: WARNING: numaif.h: present but cannot be compiled
configure: WARNING: numaif.h: check for missing prerequisite headers?
configure: WARNING: numaif.h: see the Autoconf documentation
configure: WARNING: numaif.h: section "Present But Cannot Be Compiled"
configure: WARNING: numaif.h: proceeding with the compiler's result
checking for numaif.h... no
loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 124
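The ls output above shows only the runtime library (/usr/lib64/libnuma.so.1).
A quick way to check whether the matching development files are installed
is sketched below; on SLES the package providing them is usually
libnuma-devel, but that name is an assumption here:

  ls -l /usr/include/numaif.h /usr/lib64/libnuma.so
  zypper se -i libnuma-devel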



I didn't get the warning for openmpi-v1.10.2-176-g9d45e07 and
openmpi-v2.x-dev-1404-g74d8ea0, as you can see in my previous emails,
although I have the same messages in log.configure.*. I would be
grateful if somebody could fix the problem, if it is indeed a problem
and not an intended message. Thank you very much in advance for any
help.


Kind regards

Siegmar
/* An MPI-version of the "hello world" program, which delivers some
 * information about its machine and operating system.
 *
 *
 * Compiling:
 *   Store executable(s) into local directory.
 * mpicc -o  
 *
 *   Store executable(s) into predefined directories.
 * make
 *
 *   Make program(s) automatically on all specified hosts. You must
 *   edit the file "make_compile" and specify your host names before
 *   you execute it.
 * make_compile
 *
 * Running:
 *   LAM-MPI:
 * mpiexec -boot -np  
 * or
 * mpiexec -boot \
 *	 -host  -np   : \
 *	 -host  -np  
 * or
 * mpiexec -boot [-v] -configfile 
 * or
 * lamboot [-v] []
 *   mpiexec -np  
 *	 or
 *	 mpiexec [-v] -configfile 
 * lamhalt
 *
 *   OpenMPI:
 * "host1", "host2", and so on can all have the same name,
 * if you want to start a virtual computer with some virtual
 * cpu's on the local host. The name "localhost" is allowed
 * as well.
 *
 * mpiexec -np  
 * or
 * mpiexec --host  \
 *	 -np  
 * or
 * mpiexec -hostfile  \
 *	 -np  
 * or
 * mpiexec -app 
 *
 * Cleaning:
 *   local computer:
 * rm 
 * or
 * make clean_all
 *   on all specified computers (you must edit the file "make_clean_all"
 *   and specify your host names before you execute it.
 * make_clean_all
 *
 *
 * File: hello_1_mpi.c		   	Author: S. Gross
 * Date: 01.10.2012
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/utsname.h>
#include "mpi.h"

#define	BUF_SIZE	255		/* message buffer size		*/
#define	MAX_TASKS	12		/* max. number of tasks		*/
#defin

[OMPI users] problem with Sun C 5.14 beta

2016-05-07 Thread Siegmar Gross

Hi,

today I tried to install openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux
Enterprise Server 12 (x86_64)" with Sun C 5.14 beta. Unfortunately
"configure" breaks, because it thinks that C and C++ are link
incompatible. I used the following configure command.

../openmpi-v1.10.2-176-g9d45e07/configure \
  --prefix=/usr/local/openmpi-1.10.3_64_cc \
  --libdir=/usr/local/openmpi-1.10.3_64_cc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
  --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
  JAVA_HOME=/usr/local/jdk1.8.0_66 \
  LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack" CC="cc" CXX="CC" FC="f95" \
  CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
  CPP="cpp" CXXCPP="cpp" \
  --enable-mpi-cxx \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --enable-heterogeneous \
  --enable-mpi-thread-multiple \
  --with-hwloc=internal \
  --without-verbs \
  --with-wrapper-cflags="-m64 -mt" \
  --with-wrapper-cxxflags="-m64 -library=stlport4" \
  --with-wrapper-fcflags="-m64" \
  --with-wrapper-ldflags="-mt" \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc


I don't know if it is a compiler problem or a problem with the
configure command. Perhaps you are nevertheless interested in
the problem.


Kind regards

Siegmar
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure v1.10.2-176-g9d45e07, which was
generated by GNU Autoconf 2.69.  Invocation command line was

  $ ../openmpi-v1.10.2-176-g9d45e07/configure --prefix=/usr/local/openmpi-1.10.3_64_cc --libdir=/usr/local/openmpi-1.10.3_64_cc/lib64 --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin --with-jdk-headers=/usr/local/jdk1.8.0_66/include JAVA_HOME=/usr/local/jdk1.8.0_66 LDFLAGS=-m64 -mt -Wl,-z -Wl,noexecstack CC=cc CXX=CC FC=f95 CFLAGS=-m64 -mt CXXFLAGS=-m64 -library=stlport4 FCFLAGS=-m64 CPP=cpp CXXCPP=cpp --enable-mpi-cxx --enable-cxx-exceptions --enable-mpi-java --enable-heterogeneous --enable-mpi-thread-multiple --with-hwloc=internal --without-verbs --with-wrapper-cflags=-m64 -mt --with-wrapper-cxxflags=-m64 -library=stlport4 --with-wrapper-fcflags=-m64 --with-wrapper-ldflags=-mt --enable-debug

## - ##
## Platform. ##
## - ##

hostname = loki
uname -m = x86_64
uname -r = 3.12.55-52.42-default
uname -s = Linux
uname -v = #1 SMP Thu Mar 3 10:35:46 UTC 2016 (4354e1d)

/usr/bin/uname -p = x86_64
/bin/uname -X = unknown

/bin/arch  = x86_64
/usr/bin/arch -k   = unknown
/usr/convex/getsysinfo = unknown
/usr/bin/hostinfo  = unknown
/bin/machine   = unknown
/usr/bin/oslevel   = unknown
/bin/universe  = unknown

PATH: /usr/local/eclipse-4.5.1
PATH: /usr/local/netbeans-8.1/bin
PATH: /usr/local/jdk1.8.0_66/bin
PATH: /usr/local/jdk1.8.0_66/db/bin
PATH: /usr/local/intel_xe_2016/compilers_and_libraries_2016.3.210/linux/bin/intel64
PATH: /usr/local/intel_xe_2016/compilers_and_libraries_2016.3.210/linux/mpi/intel64/bin
PATH: /usr/local/intel_xe_2016/debugger_2016/gdb/intel64_mic/bin
PATH: /usr/local/intel_xe_2016/compilers_and_libraries_2016.3.210/linux/bin/ia32
PATH: /usr/local/intel_xe_2016/debugger_2016/gdb/intel64_mic/bin
PATH: /opt/solstudio12.5b/bin
PATH: /usr/local/gcc-6.1.0/bin
PATH: /usr/local/sbin
PATH: /usr/local/bin
PATH: /sbin
PATH: /usr/sbin
PATH: /bin
PATH: /usr/bin
PATH: /usr/local/hwloc-1.11.1/bin
PATH: /root/Linux/x86_64/bin
PATH: .


## --- ##
## Core tests. ##
## --- ##

configure:5534: checking build system type
configure:5548: result: x86_64-pc-linux-gnu
configure:5568: checking host system type
configure:5581: result: x86_64-pc-linux-gnu
configure:5601: checking target system type
configure:5614: result: x86_64-pc-linux-gnu
configure:5740: checking for gcc
configure:5767: result: cc
configure:5996: checking for C compiler version
configure:6005: cc --version >&5
cc: Warning: Option --version passed to ld, if ld is invoked, ignored otherwise
usage: cc [ options ] files.  Use 'cc -flags' for details
configure:6016: $? = 1
configure:6005: cc -v >&5
usage: cc [ options ] files.  Use 'cc -flags' for details
configure:6016: $? = 1
configure:6005: cc -V >&5
cc: Studio 12.5 Sun C 5.14 Linux_i386 Beta 2015/11/17
configure:6016: $? = 0
configure:6005: cc -qversion >&5
cc: Warning: Option -qversion passed to ld, if ld is invoked, ignored otherwise
usage: cc [ options ] files.  Use 'cc -flags' for details
configure:6016: $? = 1
configure:6036: checking whether the C compiler works
configure:6058: cc -m64 -mt  -m64 -mt -Wl,-z -Wl,noexecstack conftest.c  >&5
configure:6062: $? = 0
configure:6110: result: yes
configure:6113: checking for C compiler default output file name
configure:6115: result: a.out
configure:6121: checking for suffix of executables
configure:6128: cc -o conftest -m64 -mt  -m64 -mt -Wl,-z -Wl,noexecstack conftest.c  >&5
configure:6132: $? = 0
configure:6154: result: 
configure:6176: checking whether we are cross 

Re: [OMPI users] warning message for process binding with openmpi-dev-4010-g6c9d65c

2016-05-07 Thread Gilles Gouaillardet
Siegmar,

did you upgrade your os recently ? or change hyper threading settings ?
this error message typically appears when the numactl-devel rpm is not
installed
(numactl-devel on redhat, the package name might differ on sles)

if not, would you mind retesting from scratch a previous tarball that used
to work without any warning?

Cheers,

Gilles


On Saturday, May 7, 2016, Siegmar Gross <
siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
>
> yesterday I installed openmpi-dev-4010-g6c9d65c on my "SUSE Linux
> Enterprise Server 12 (x86_64)" with Sun C 5.13  and gcc-5.3.0.
> Unfortunately I get the following warning message.
>
> loki hello_1 128 ompi_info | grep -e "OPAL repo revision" -e "C compiler
> absolute"
>   OPAL repo revision: dev-4010-g6c9d65c
>  C compiler absolute: /opt/solstudio12.4/bin/cc
> loki hello_1 129 mpiexec -np 3 --host loki --slot-list 0:0-5,1:0-5
> hello_1_mpi
> --
> WARNING: a request was made to bind a process. While the system
> supports binding the process itself, at least one node does NOT
> support binding memory to the process location.
>
>   Node:  loki
>
> Open MPI uses the "hwloc" library to perform process and memory
> binding. This error message means that hwloc has indicated that
> processor binding support is not available on this machine.
>
> On OS X, processor and memory binding is not available at all (i.e.,
> the OS does not expose this functionality).
>
> On Linux, lack of the functionality can mean that you are on a
> platform where processor and memory affinity is not supported in Linux
> itself, or that hwloc was built without NUMA and/or processor affinity
> support. When building hwloc (which, depending on your Open MPI
> installation, may be embedded in Open MPI itself), it is important to
> have the libnuma header and library files available. Different linux
> distributions package these files under different names; look for
> packages with the word "numa" in them. You may also need a developer
> version of the package (e.g., with "dev" or "devel" in the name) to
> obtain the relevant header files.
>
> If you are getting this message on a non-OS X, non-Linux platform,
> then hwloc does not support processor / memory affinity on this
> platform. If the OS/platform does actually support processor / memory
> affinity, then you should contact the hwloc maintainers:
> https://github.com/open-mpi/hwloc.
>
> This is a warning only; your job will continue, though performance may
> be degraded.
> --
> Process 0 of 3 running on loki
> Process 2 of 3 running on loki
> Process 1 of 3 running on loki
>
>
> Now 2 slave tasks are sending greetings.
>
> Greetings from task 1:
>   message type:3
> ...
>
>
>
> loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 122 ls -l
> /usr/lib64/*numa*
> -rwxr-xr-x 1 root root 48256 Nov 24 16:29 /usr/lib64/libnuma.so.1
> loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 123 grep numa
> log.configure.Linux.x86_64.64_cc
> checking numaif.h usability... no
> checking numaif.h presence... yes
> configure: WARNING: numaif.h: present but cannot be compiled
> configure: WARNING: numaif.h: check for missing prerequisite headers?
> configure: WARNING: numaif.h: see the Autoconf documentation
> configure: WARNING: numaif.h: section "Present But Cannot Be Compiled"
> configure: WARNING: numaif.h: proceeding with the compiler's result
> checking for numaif.h... no
> loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 124
>
>
>
> I didn't get the warning for openmpi-v1.10.2-176-g9d45e07 and
> openmpi-v2.x-dev-1404-g74d8ea0 as you can see in my previous emails,
> although I have the same messages in log.configure.*. I would be
> grateful, if somebody can fix the problem if it is a problem
> and not an intended message. Thank you very much for any help in
> advance.
>
>
> Kind regards
>
> Siegmar
>


Re: [OMPI users] problem with Sun C 5.14 beta

2016-05-07 Thread Gilles Gouaillardet
Siegmar,

per the config.log, you need to update your CXXFLAGS="-m64
-library=stlport4 -std=sun03"
or just CXXFLAGS="-m64"

Cheers,

Gilles
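Applied to the configure command quoted below, the change is confined to
the CXXFLAGS assignment; the rest of the invocation stays as it is (a
sketch of the affected line only):

  CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4 -std=sun03" FCFLAGS="-m64" \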

On Saturday, May 7, 2016, Siegmar Gross <
siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
>
> today I tried to install openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux
> Enterprise Server 12 (x86_64)" with Sun C 5.14 beta. Unfortunately
> "configure" breaks, because it thinks that C and C++ are link
> incompatible. I used the following configure command.
>
> ../openmpi-v1.10.2-176-g9d45e07/configure \
>   --prefix=/usr/local/openmpi-1.10.3_64_cc \
>   --libdir=/usr/local/openmpi-1.10.3_64_cc/lib64 \
>   --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
>   --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
>   JAVA_HOME=/usr/local/jdk1.8.0_66 \
>   LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack" CC="cc" CXX="CC" FC="f95" \
>   CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
>   CPP="cpp" CXXCPP="cpp" \
>   --enable-mpi-cxx \
>   --enable-cxx-exceptions \
>   --enable-mpi-java \
>   --enable-heterogeneous \
>   --enable-mpi-thread-multiple \
>   --with-hwloc=internal \
>   --without-verbs \
>   --with-wrapper-cflags="-m64 -mt" \
>   --with-wrapper-cxxflags="-m64 -library=stlport4" \
>   --with-wrapper-fcflags="-m64" \
>   --with-wrapper-ldflags="-mt" \
>   --enable-debug \
>   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>
>
> I don't know if it is a compiler problem or a problem with the
> configure command. Perhaps you are nevertheless interested in
> the problem.
>
>
> Kind regards
>
> Siegmar
>


Re: [OMPI users] warning message for process binding with openmpi-dev-4010-g6c9d65c

2016-05-07 Thread Siegmar Gross

Hi Gilles,

"loki" is a machine in our new lab and I tried "--slot-list 0:0-5,1:0-5"
for the first time, so I don't know whether it worked before. I can ask our
admin on Monday whether numactl-devel is installed.


Kind regards

Siegmar


On 05/07/16 12:10, Gilles Gouaillardet wrote:

Siegmar,

did you upgrade your os recently ? or change hyper threading settings ?
this error message typically appears when the numactl-devel rpm is not installed
(numactl-devel on redhat, the package name might differ on sles)

if not, would you mind retesting from scratch a previous tarball that used to
work without any warning?

Cheers,

Gilles


On Saturday, May 7, 2016, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

Hi,

yesterday I installed openmpi-dev-4010-g6c9d65c on my "SUSE Linux
Enterprise Server 12 (x86_64)" with Sun C 5.13  and gcc-5.3.0.
Unfortunately I get the following warning message.

loki hello_1 128 ompi_info | grep -e "OPAL repo revision" -e "C compiler
absolute"
  OPAL repo revision: dev-4010-g6c9d65c
 C compiler absolute: /opt/solstudio12.4/bin/cc
loki hello_1 129 mpiexec -np 3 --host loki --slot-list 0:0-5,1:0-5 
hello_1_mpi
--
WARNING: a request was made to bind a process. While the system
supports binding the process itself, at least one node does NOT
support binding memory to the process location.

  Node:  loki

Open MPI uses the "hwloc" library to perform process and memory
binding. This error message means that hwloc has indicated that
processor binding support is not available on this machine.

On OS X, processor and memory binding is not available at all (i.e.,
the OS does not expose this functionality).

On Linux, lack of the functionality can mean that you are on a
platform where processor and memory affinity is not supported in Linux
itself, or that hwloc was built without NUMA and/or processor affinity
support. When building hwloc (which, depending on your Open MPI
installation, may be embedded in Open MPI itself), it is important to
have the libnuma header and library files available. Different linux
distributions package these files under different names; look for
packages with the word "numa" in them. You may also need a developer
version of the package (e.g., with "dev" or "devel" in the name) to
obtain the relevant header files.

If you are getting this message on a non-OS X, non-Linux platform,
then hwloc does not support processor / memory affinity on this
platform. If the OS/platform does actually support processor / memory
affinity, then you should contact the hwloc maintainers:
https://github.com/open-mpi/hwloc.

This is a warning only; your job will continue, though performance may
be degraded.
--
Process 0 of 3 running on loki
Process 2 of 3 running on loki
Process 1 of 3 running on loki


Now 2 slave tasks are sending greetings.

Greetings from task 1:
  message type:3
...



loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 122 ls -l 
/usr/lib64/*numa*
-rwxr-xr-x 1 root root 48256 Nov 24 16:29 /usr/lib64/libnuma.so.1
loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 123 grep numa
log.configure.Linux.x86_64.64_cc
checking numaif.h usability... no
checking numaif.h presence... yes
configure: WARNING: numaif.h: present but cannot be compiled
configure: WARNING: numaif.h: check for missing prerequisite headers?
configure: WARNING: numaif.h: see the Autoconf documentation
configure: WARNING: numaif.h: section "Present But Cannot Be Compiled"
configure: WARNING: numaif.h: proceeding with the compiler's result
checking for numaif.h... no
loki openmpi-dev-4010-g6c9d65c-Linux.x86_64.64_cc 124



I didn't get the warning for openmpi-v1.10.2-176-g9d45e07 and
openmpi-v2.x-dev-1404-g74d8ea0 as you can see in my previous emails,
although I have the same messages in log.configure.*. I would be
grateful, if somebody can fix the problem if it is a problem
and not an intended message. Thank you very much for any help in
advance.


Kind regards

Siegmar



___
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/05/29131.php


Re: [OMPI users] [open-mpi/ompi] COMM_SPAWN broken on Solaris/v1.10 (#1569)

2016-05-07 Thread Siegmar Gross

Hi Gilles,

the minimal configuration to reproduce an error with spawn_master
is two Sparc machines.


tyr spawn 124 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
  OPAL repo revision: v1.10.2-176-g9d45e07
 C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc

tyr spawn 125 ssh ruester ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"

  OPAL repo revision: v1.10.2-176-g9d45e07
 C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc


tyr spawn 126 uname -a
SunOS tyr.informatik.hs-fulda.de 5.10 Generic_150400-11 sun4u sparc SUNW,A70 Solaris

tyr spawn 127 ssh ruester uname -a
SunOS ruester.informatik.hs-fulda.de 5.10 Generic_150400-04 sun4u sparc SUNW,SPARC-Enterprise Solaris



tyr spawn 128 mpiexec -np 1 --host tyr,tyr,tyr,tyr,tyr spawn_master

Parent process 0 running on tyr.informatik.hs-fulda.de
  I create 4 slave processes

Parent process 0: tasks in MPI_COMM_WORLD:1
  tasks in COMM_CHILD_PROCESSES local group:  1
  tasks in COMM_CHILD_PROCESSES remote group: 4

Slave process 1 of 4 running on tyr.informatik.hs-fulda.de
Slave process 0 of 4 running on tyr.informatik.hs-fulda.de
Slave process 3 of 4 running on tyr.informatik.hs-fulda.de
Slave process 2 of 4 running on tyr.informatik.hs-fulda.de
spawn_slave 2: argv[0]: spawn_slave
spawn_slave 0: argv[0]: spawn_slave
spawn_slave 3: argv[0]: spawn_slave
spawn_slave 1: argv[0]: spawn_slave


tyr spawn 129 mpiexec -np 1 --host tyr,tyr,tyr,tyr,ruester spawn_master

Parent process 0 running on tyr.informatik.hs-fulda.de
  I create 4 slave processes

Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (proc_pointer))->obj_magic_id, file ../../openmpi-v1.10.2-176-g9d45e07/ompi/group/group_init.c, line 215, function ompi_group_increment_proc_count

[ruester:23592] *** Process received signal ***
[ruester:23592] Signal: Abort (6)
[ruester:23592] Signal code:  (-1)
/usr/local/openmpi-1.10.3_64_gcc/lib64/libopen-pal.so.13.0.2:opal_backtrace_print+0x2c
/usr/local/openmpi-1.10.3_64_gcc/lib64/libopen-pal.so.13.0.2:0xc2c0c
/lib/sparcv9/libc.so.1:0xd8c28
/lib/sparcv9/libc.so.1:0xcc79c
/lib/sparcv9/libc.so.1:0xcc9a8
/lib/sparcv9/libc.so.1:__lwp_kill+0x8 [ Signal 6 (ABRT)]
/lib/sparcv9/libc.so.1:abort+0xd0
/lib/sparcv9/libc.so.1:_assert_c99+0x78
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:ompi_group_increment_proc_count+0xf0
/usr/local/openmpi-1.10.3_64_gcc/lib64/openmpi/mca_dpm_orte.so:0x6638
/usr/local/openmpi-1.10.3_64_gcc/lib64/openmpi/mca_dpm_orte.so:0x948c
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:ompi_mpi_init+0x1978
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:MPI_Init+0x2a8
/home/fd1026/SunOS/sparc/bin/spawn_slave:main+0x10
/home/fd1026/SunOS/sparc/bin/spawn_slave:_start+0x7c
[ruester:23592] *** End of error message ***
--
mpiexec noticed that process rank 3 with PID 0 on node ruester exited on signal 6 (Abort).

--
tyr spawn 130





A minimal configuration to reproduce an error with spawn_intra_comm
is a single machine for openmpi-2.x and openmpi-master. I didn't get
an error message on Linux (it just hangs after displaying the messages).


tyr spawn 114 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
  OPAL repo revision: dev-4010-g6c9d65c
 C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc

tyr spawn 115 mpiexec -np 1 --host tyr,tyr,tyr spawn_intra_comm
Parent process 0: I create 2 slave processes

Child process 0 running on tyr.informatik.hs-fulda.de
MPI_COMM_WORLD ntasks:  2
COMM_ALL_PROCESSES ntasks:  3
mytid in COMM_ALL_PROCESSES:1

Child process 1 running on tyr.informatik.hs-fulda.de
MPI_COMM_WORLD ntasks:  2
COMM_ALL_PROCESSES ntasks:  3
mytid in COMM_ALL_PROCESSES:2

Parent process 0 running on tyr.informatik.hs-fulda.de
MPI_COMM_WORLD ntasks:  1
COMM_CHILD_PROCESSES ntasks_local:  1
COMM_CHILD_PROCESSES ntasks_remote: 2
COMM_ALL_PROCESSES ntasks:  3
mytid in COMM_ALL_PROCESSES:0
[[48188,1],0][../../../../../openmpi-dev-4010-g6c9d65c/opal/mca/btl/tcp/btl_tcp_endpoint.c:755:mca_btl_tcp_endpoint_start_connect] from tyr to: tyr Unable to connect to the peer 193.174.24.39 on port 1026: Connection refused


[tyr.informatik.hs-fulda.de:06684] ../../../../../openmpi-dev-4010-g6c9d65c/ompi/mca/pml/ob1/pml_ob1_sendreq.c:237 FATAL

tyr spawn 116





sunpc1 fd1026 102 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
  OPAL repo revision: dev-4010-g6c9d65c
 C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc

sunpc1 fd1026 103 mpiexec -np 1 --host sunpc1,sunpc1,sunpc1 spawn_intra_comm
Parent process 0: I create 2 slave processes

Parent process 0 running on sunpc1
MPI_COMM_WORLD ntasks:   

Re: [OMPI users] problem with Sun C 5.14 beta

2016-05-07 Thread Siegmar Gross

Hi Gilles,

thank you very much for your help. Now C and C++ are link
compatible.


Kind regards

Siegmar


On 05/07/16 12:15, Gilles Gouaillardet wrote:

Siegmar,

per the config.log, you need to update your CXXFLAGS="-m64 -library=stlport4
-std=sun03"
or just CXXFLAGS="-m64"

Cheers,

Gilles

On Saturday, May 7, 2016, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

Hi,

today I tried to install openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux
Enterprise Server 12 (x86_64)" with Sun C 5.14 beta. Unfortunately
"configure" breaks, because it thinks that C and C++ are link
incompatible. I used the following configure command.

../openmpi-v1.10.2-176-g9d45e07/configure \
  --prefix=/usr/local/openmpi-1.10.3_64_cc \
  --libdir=/usr/local/openmpi-1.10.3_64_cc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
  --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
  JAVA_HOME=/usr/local/jdk1.8.0_66 \
  LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack" CC="cc" CXX="CC" FC="f95" \
  CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
  CPP="cpp" CXXCPP="cpp" \
  --enable-mpi-cxx \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --enable-heterogeneous \
  --enable-mpi-thread-multiple \
  --with-hwloc=internal \
  --without-verbs \
  --with-wrapper-cflags="-m64 -mt" \
  --with-wrapper-cxxflags="-m64 -library=stlport4" \
  --with-wrapper-fcflags="-m64" \
  --with-wrapper-ldflags="-mt" \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc


I don't know if it is a compiler problem or a problem with the
configure command. Perhaps you are nevertheless interested in
the problem.


Kind regards

Siegmar



___
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/05/29132.php



Re: [OMPI users] No core dump in some cases

2016-05-07 Thread Jeff Squyres (jsquyres)
I'm afraid I don't know what a .btr file is -- that is not something that is 
controlled by Open MPI.

You might want to look into your OS settings to see if it has some kind of 
alternate corefile mechanism...?
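On Linux, the usual place to look is the kernel's core_pattern setting;
if it starts with a pipe character, a crash handler is intercepting the
dumps, which could explain the .btr text files (a sketch, to be run as
root where noted):

  cat /proc/sys/kernel/core_pattern
  # restore plain core files in the crashing process's working directory (root):
  sysctl -w kernel.core_pattern=core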


> On May 6, 2016, at 8:58 PM, dpchoudh .  wrote:
> 
> Hello all
> 
> I run MPI jobs (for test purposes only) on two different 'clusters'. Both 
> 'clusters' have two nodes only, connected back-to-back. The two are very 
> similar, but not identical, both software- and hardware-wise.
> 
> Both have ulimit -c set to unlimited. However, only one of the two creates 
> core files when an MPI job crashes. The other creates a text file named 
> something like
> .80s-,.btr
> 
> I'd much prefer a core file because that allows me to debug with a lot more 
> options than a static text file with addresses. How do I get a core file in 
> all situations? I am using MPI source from the master branch.
> 
> Thanks in advance
> Durga
> 
> The surgeon general advises you to eat right, exercise regularly and quit 
> ageing.
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29124.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/