I’d be a little cautious here: I’m not sure that heterogeneous operations are
completely fixed. The README is probably a bit overstated (it reflects an
earlier state), but I’m certain we haven’t tested heterogeneous operations
extensively, and I suspect there are still lingering issues.
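
For anyone who wants to poke at the hetero path, something along the lines of
the sketch below actually exchanges byte-order-sensitive data between the
ranks, which the init/finalize test quoted further down does not (this is just
a sketch; ranks, tags, and values are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

/* Sketch of a cross-endian exchange test: rank 0 sends an int and a double
   to every other rank, and each receiver prints what arrived.  On a mixed
   Sparc / x86_64 run the values only arrive intact if the heterogeneous
   datatype conversion works. */
int main (int argc, char *argv[])
{
  int rank, size, i;

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  MPI_Comm_size (MPI_COMM_WORLD, &size);

  if (rank == 0) {
    int    ival = 0x01020304;          /* byte-order-sensitive pattern */
    double dval = 3.141592653589793;
    for (i = 1; i < size; i++) {
      MPI_Send (&ival, 1, MPI_INT,    i, 0, MPI_COMM_WORLD);
      MPI_Send (&dval, 1, MPI_DOUBLE, i, 1, MPI_COMM_WORLD);
    }
  } else {
    int    ival;
    double dval;
    MPI_Recv (&ival, 1, MPI_INT,    0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Recv (&dval, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf ("rank %d got 0x%08x and %.15g\n", rank, ival, dval);
  }

  MPI_Finalize ();
  return EXIT_SUCCESS;
}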


> On Dec 23, 2014, at 11:29 PM, Kawashima, Takahiro 
> <t-kawash...@jp.fujitsu.com> wrote:
> 
> Gilles,
> 
> Ahh, I didn't know the current status. Thank you for the notice!
> 
> Thanks,
> Takahiro Kawashima
> 
>> Kawashima-san,
>> 
>> I'd rather consider this a bug in the README (!)
>> 
>> 
>> Heterogeneous support had been broken for some time, but it was
>> eventually fixed.
>> 
>> The truth is that there are *very* limited resources (both human and
>> hardware) maintaining heterogeneous support, but that does not mean
>> heterogeneous support should not be used, nor that bug reports will be
>> ignored.
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On 2014/12/24 9:26, Kawashima, Takahiro wrote:
>>> Hi Siegmar,
>>> 
>>> Heterogeneous environments are not officially supported.
>>> 
>>> The README of Open MPI master says:
>>> 
>>> --enable-heterogeneous
>>>  Enable support for running on heterogeneous clusters (e.g., machines
>>>  with different endian representations).  Heterogeneous support is
>>>  disabled by default because it imposes a minor performance penalty.
>>> 
>>>  *** THIS FUNCTIONALITY IS CURRENTLY BROKEN - DO NOT USE ***
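
For context, the "different endian representations" mentioned here are exactly
the Sparc vs. x86_64 split in this report; a quick, stand-alone way to check
which representation a host uses is a sketch like the following (illustrative
only, not part of Open MPI):

#include <stdio.h>
#include <stdint.h>

/* Prints the host byte order: the Sparc box should report big-endian,
   the x86_64 boxes little-endian. */
int main (void)
{
  uint32_t x = 0x01020304;
  unsigned char *p = (unsigned char *) &x;
  printf ("%s-endian\n", p[0] == 0x01 ? "big" : "little");
  return 0;
}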
>>> 
>>>> Hi,
>>>> 
>>>> Today I installed openmpi-dev-602-g82c02b4 on my machines (Solaris 10
>>>> Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2
>>>> and the new Solaris Studio 12.4 compilers. All build processes finished
>>>> without errors, but I have a problem running a very small program. It
>>>> works for three processes but hangs for six processes. I see the same
>>>> behaviour with both compilers.
>>>> 
>>>> tyr small_prog 139 time; mpiexec -np 3 --host sunpc1,linpc1,tyr 
>>>> init_finalize; time
>>>> 827.161u 210.126s 30:51.08 56.0%        0+0k 4151+20io 2898pf+0w
>>>> Hello!
>>>> Hello!
>>>> Hello!
>>>> 827.886u 210.335s 30:54.68 55.9%        0+0k 4151+20io 2898pf+0w
>>>> tyr small_prog 140 time; mpiexec -np 6 --host sunpc1,linpc1,tyr 
>>>> init_finalize; time
>>>> 827.946u 210.370s 31:15.02 55.3%        0+0k 4151+20io 2898pf+0w
>>>> ^CKilled by signal 2.
>>>> Killed by signal 2.
>>>> 869.242u 221.644s 33:40.54 53.9%        0+0k 4151+20io 2898pf+0w
>>>> tyr small_prog 141 
>>>> 
>>>> tyr small_prog 145 ompi_info | grep -e "Open MPI repo revision:" -e "C 
>>>> compiler:"
>>>>  Open MPI repo revision: dev-602-g82c02b4
>>>>              C compiler: cc
>>>> tyr small_prog 146 
>>>> 
>>>> 
>>>> tyr small_prog 146 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
>>>> GNU gdb (GDB) 7.6.1
>>>> ...
>>>> (gdb) run -np 3 --host sunpc1,linpc1,tyr init_finalize
>>>> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 3 --host 
>>>> sunpc1,linpc1,tyr 
>>>> init_finalize
>>>> [Thread debugging using libthread_db enabled]
>>>> [New Thread 1 (LWP 1)]
>>>> [New LWP    2        ]
>>>> Hello!
>>>> Hello!
>>>> Hello!
>>>> [LWP    2         exited]
>>>> [New Thread 2        ]
>>>> [Switching to Thread 1 (LWP 1)]
>>>> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to 
>>>> satisfy query
>>>> (gdb) run -np 6 --host sunpc1,linpc1,tyr init_finalize
>>>> The program being debugged has been started already.
>>>> Start it from the beginning? (y or n) y
>>>> 
>>>> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 6 --host 
>>>> sunpc1,linpc1,tyr 
>>>> init_finalize
>>>> [Thread debugging using libthread_db enabled]
>>>> [New Thread 1 (LWP 1)]
>>>> [New LWP    2        ]
>>>> ^CKilled by signal 2.
>>>> Killed by signal 2.
>>>> 
>>>> Program received signal SIGINT, Interrupt.
>>>> [Switching to Thread 1 (LWP 1)]
>>>> 0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
>>>> (gdb) bt
>>>> #0  0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
>>>> #1  0xffffffff7d1cb468 in _pollsys () from /lib/sparcv9/libc.so.1
>>>> #2  0xffffffff7d170ed8 in poll () from /lib/sparcv9/libc.so.1
>>>> #3  0xffffffff7e69a630 in poll_dispatch ()
>>>>   from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
>>>> #4  0xffffffff7e6894ec in opal_libevent2021_event_base_loop ()
>>>>   from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
>>>> #5  0x000000010000eb14 in orterun (argc=1757447168, 
>>>> argv=0xffffff7ed8550cff)
>>>>    at 
>>>> ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/orterun.c:1090
>>>> #6  0x0000000100004e2c in main (argc=256, argv=0xffffff7ed8af5c00)
>>>>    at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/main.c:13
>>>> (gdb) 
>>>> 
>>>> Any ideas? Unfortunately I'm leaving for vacation, so I cannot test any
>>>> patches until the end of the year. Nevertheless I wanted to report the
>>>> problem. At the moment I cannot test whether I see the same behaviour in
>>>> a homogeneous environment with three machines, because the new version
>>>> won't be available on the other machines before tomorrow. I used the
>>>> following configure command.
>>>> 
>>>> ../openmpi-dev-602-g82c02b4/configure \
>>>>  --prefix=/usr/local/openmpi-1.9.0_64_cc \
>>>>  --libdir=/usr/local/openmpi-1.9.0_64_cc/lib64 \
>>>>  --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
>>>>  --with-jdk-headers=/usr/local/jdk1.8.0/include \
>>>>  JAVA_HOME=/usr/local/jdk1.8.0 \
>>>>  LDFLAGS="-m64 -mt" \
>>>>  CC="cc" CXX="CC" FC="f95" \
>>>>  CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
>>>>  CPP="cpp" CXXCPP="cpp" \
>>>>  CPPFLAGS="" CXXCPPFLAGS="" \
>>>>  --enable-mpi-cxx \
>>>>  --enable-cxx-exceptions \
>>>>  --enable-mpi-java \
>>>>  --enable-heterogeneous \
>>>>  --enable-mpi-thread-multiple \
>>>>  --with-threads=posix \
>>>>  --with-hwloc=internal \
>>>>  --without-verbs \
>>>>  --with-wrapper-cflags="-m64 -mt" \
>>>>  --with-wrapper-cxxflags="-m64 -library=stlport4" \
>>>>  --with-wrapper-ldflags="-mt" \
>>>>  --enable-debug \
>>>>  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>>>> 
>>>> Furthermore I used the following test program.
>>>> 
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>> #include "mpi.h"
>>>> 
>>>> int main (int argc, char *argv[])
>>>> {
>>>>  MPI_Init (&argc, &argv);
>>>>  printf ("Hello!\n");
>>>>  MPI_Finalize ();
>>>>  return EXIT_SUCCESS;
>>>> }
