[OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
I installed OpenMPI (I have a simple dual core AMD notebook with Fedora 16)
via:

# yum install openmpi
# yum install openmpi-devel
# mpirun --version
mpirun (Open MPI) 1.5.4

I added:

$ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
$ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/

Then:

$ mpif90 ex1.f95
$ mpiexec -n 4 ./a.out
./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open
shared object file: No such file or directory
./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open
shared object file: No such file or directory
./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open
shared object file: No such file or directory
./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open
shared object file: No such file or directory
--
mpiexec noticed that the job aborted, but has no info as to the process
that caused that situation.
--

ls -l /usr/lib/openmpi/lib/
total 6788
lrwxrwxrwx. 1 root root  25 Sep 15 12:25 libmca_common_sm.so ->
libmca_common_sm.so.2.0.0
lrwxrwxrwx. 1 root root  25 Sep 14 16:14 libmca_common_sm.so.2 ->
libmca_common_sm.so.2.0.0
-rwxr-xr-x. 1 root root8492 Jan 20  2012 libmca_common_sm.so.2.0.0
lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_cxx.so ->
libmpi_cxx.so.1.0.1
lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_cxx.so.1 ->
libmpi_cxx.so.1.0.1
-rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1
lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f77.so ->
libmpi_f77.so.1.0.2
lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f77.so.1 ->
libmpi_f77.so.1.0.2
-rwxr-xr-x. 1 root root  179912 Jan 20  2012 libmpi_f77.so.1.0.2
lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f90.so ->
libmpi_f90.so.1.1.0
lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f90.so.1 ->
libmpi_f90.so.1.1.0
-rwxr-xr-x. 1 root root   10364 Jan 20  2012 libmpi_f90.so.1.1.0
lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libmpi.so -> libmpi.so.1.0.2
lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libmpi.so.1 -> libmpi.so.1.0.2
-rwxr-xr-x. 1 root root 1383444 Jan 20  2012 libmpi.so.1.0.2
lrwxrwxrwx. 1 root root  21 Sep 15 12:25 libompitrace.so ->
libompitrace.so.0.0.0
lrwxrwxrwx. 1 root root  21 Sep 14 16:14 libompitrace.so.0 ->
libompitrace.so.0.0.0
-rwxr-xr-x. 1 root root   13572 Jan 20  2012 libompitrace.so.0.0.0
lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-pal.so ->
libopen-pal.so.3.0.0
lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-pal.so.3 ->
libopen-pal.so.3.0.0
-rwxr-xr-x. 1 root root  386324 Jan 20  2012 libopen-pal.so.3.0.0
lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-rte.so ->
libopen-rte.so.3.0.0
lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-rte.so.3 ->
libopen-rte.so.3.0.0
-rwxr-xr-x. 1 root root  790052 Jan 20  2012 libopen-rte.so.3.0.0
-rw-r--r--. 1 root root  301520 Jan 20  2012 libotf.a
lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libotf.so -> libotf.so.0.0.1
lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libotf.so.0 -> libotf.so.0.0.1
-rwxr-xr-x. 1 root root  206384 Jan 20  2012 libotf.so.0.0.1
-rw-r--r--. 1 root root  337970 Jan 20  2012 libvt.a
-rw-r--r--. 1 root root  591070 Jan 20  2012 libvt-hyb.a
lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-hyb.so ->
libvt-hyb.so.0.0.0
lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-hyb.so.0 ->
libvt-hyb.so.0.0.0
-rwxr-xr-x. 1 root root  428844 Jan 20  2012 libvt-hyb.so.0.0.0
-rw-r--r--. 1 root root  541004 Jan 20  2012 libvt-mpi.a
lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-mpi.so ->
libvt-mpi.so.0.0.0
lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-mpi.so.0 ->
libvt-mpi.so.0.0.0
-rwxr-xr-x. 1 root root  396352 Jan 20  2012 libvt-mpi.so.0.0.0
-rw-r--r--. 1 root root  372352 Jan 20  2012 libvt-mt.a
lrwxrwxrwx. 1 root root  17 Sep 15 12:25 libvt-mt.so ->
libvt-mt.so.0.0.0
lrwxrwxrwx. 1 root root  17 Sep 14 16:14 libvt-mt.so.0 ->
libvt-mt.so.0.0.0
-rwxr-xr-x. 1 root root  266104 Jan 20  2012 libvt-mt.so.0.0.0
-rw-r--r--. 1 root root   60390 Jan 20  2012 libvt-pomp.a
lrwxrwxrwx. 1 root root  14 Sep 15 12:25 libvt.so -> libvt.so.0.0.0
lrwxrwxrwx. 1 root root  14 Sep 14 16:14 libvt.so.0 -> libvt.so.0.0.0
-rwxr-xr-x. 1 root root  242604 Jan 20  2012 libvt.so.0.0.0
-rwxr-xr-x. 1 root root  303591 Jan 20  2012 mpi.mod
drwxr-xr-x. 2 root root4096 Sep 14 16:14 openmpi


The file (actually a symlink) it claims it can't find, libmpi_f90.so.1, is
clearly there, and LD_LIBRARY_PATH is set to /usr/lib/openmpi/lib/.

What's the problem?

---John
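
A side note for anyone hitting the same loader error while the library is clearly on disk: a plain shell assignment is visible only to the current shell, not to the processes that mpiexec launches. A minimal sketch of the difference, assuming a bash-like shell:

    $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/           # assignment only
    $ sh -c 'echo "child sees: $LD_LIBRARY_PATH"'     # child prints an empty value
    $ export LD_LIBRARY_PATH=/usr/lib/openmpi/lib/    # exported to child processes
    $ sh -c 'echo "child sees: $LD_LIBRARY_PATH"'     # child prints /usr/lib/openmpi/lib/

The same applies to PATH, and the export step is exactly what the rest of this thread converges on.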


Re: [OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
$ which mpiexec
/usr/lib/openmpi/bin/mpiexec

# mpiexec -n 1 printenv | grep PATH
PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
WINDOWPATH=1



On Sat, Sep 15, 2012 at 1:11 PM, Ralph Castain  wrote:

> Couple of things worth checking:
>
> 1. verify that you executed the "mpiexec" you think you did - a simple
> "which mpiexec" should suffice
>
> 2. verify that your environment is correct by "mpiexec -n 1 printenv |
> grep PATH". Sometimes the ld_library_path doesn't carry over like you think
> it should
>
>
> On Sep 15, 2012, at 10:00 AM, John Chludzinski 
> wrote:
>
> I installed OpenMPI (I have a simple dual core AMD notebook with Fedora
> 16) via:
>
> # yum install openmpi
> # yum install openmpi-devel
> # mpirun --version
> mpirun (Open MPI) 1.5.4
>
> I added:
>
> $ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>
> Then:
>
> $ mpif90 ex1.f95
> $ mpiexec -n 4 ./a.out
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
> open shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
> open shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
> open shared object file: No such file or directory
> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
> open shared object file: No such file or directory
> --
> mpiexec noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --
>
> ls -l /usr/lib/openmpi/lib/
> total 6788
> lrwxrwxrwx. 1 root root  25 Sep 15 12:25 libmca_common_sm.so ->
> libmca_common_sm.so.2.0.0
> lrwxrwxrwx. 1 root root  25 Sep 14 16:14 libmca_common_sm.so.2 ->
> libmca_common_sm.so.2.0.0
> -rwxr-xr-x. 1 root root8492 Jan 20  2012 libmca_common_sm.so.2.0.0
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_cxx.so ->
> libmpi_cxx.so.1.0.1
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_cxx.so.1 ->
> libmpi_cxx.so.1.0.1
> -rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f77.so ->
> libmpi_f77.so.1.0.2
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f77.so.1 ->
> libmpi_f77.so.1.0.2
> -rwxr-xr-x. 1 root root  179912 Jan 20  2012 libmpi_f77.so.1.0.2
> lrwxrwxrwx. 1 root root  19 Sep 15 12:25 libmpi_f90.so ->
> libmpi_f90.so.1.1.0
> lrwxrwxrwx. 1 root root  19 Sep 14 16:14 libmpi_f90.so.1 ->
> libmpi_f90.so.1.1.0
> -rwxr-xr-x. 1 root root   10364 Jan 20  2012 libmpi_f90.so.1.1.0
> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libmpi.so -> libmpi.so.1.0.2
> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libmpi.so.1 -> libmpi.so.1.0.2
> -rwxr-xr-x. 1 root root 1383444 Jan 20  2012 libmpi.so.1.0.2
> lrwxrwxrwx. 1 root root  21 Sep 15 12:25 libompitrace.so ->
> libompitrace.so.0.0.0
> lrwxrwxrwx. 1 root root  21 Sep 14 16:14 libompitrace.so.0 ->
> libompitrace.so.0.0.0
> -rwxr-xr-x. 1 root root   13572 Jan 20  2012 libompitrace.so.0.0.0
> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-pal.so ->
> libopen-pal.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-pal.so.3 ->
> libopen-pal.so.3.0.0
> -rwxr-xr-x. 1 root root  386324 Jan 20  2012 libopen-pal.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 15 12:25 libopen-rte.so ->
> libopen-rte.so.3.0.0
> lrwxrwxrwx. 1 root root  20 Sep 14 16:14 libopen-rte.so.3 ->
> libopen-rte.so.3.0.0
> -rwxr-xr-x. 1 root root  790052 Jan 20  2012 libopen-rte.so.3.0.0
> -rw-r--r--. 1 root root  301520 Jan 20  2012 libotf.a
> lrwxrwxrwx. 1 root root  15 Sep 15 12:25 libotf.so -> libotf.so.0.0.1
> lrwxrwxrwx. 1 root root  15 Sep 14 16:14 libotf.so.0 -> libotf.so.0.0.1
> -rwxr-xr-x. 1 root root  206384 Jan 20  2012 libotf.so.0.0.1
> -rw-r--r--. 1 root root  337970 Jan 20  2012 libvt.a
> -rw-r--r--. 1 root root  591070 Jan 20  2012 libvt-hyb.a
> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-hyb.so ->
> libvt-hyb.so.0.0.0
> lrwxrwxrwx. 1 root root  18 Sep 14 16:14 libvt-hyb.so.0 ->
> libvt-hyb.so.0.0.0
> -rwxr-xr-x. 1 root root  428844 Jan 20  2012 libvt-hyb.so.0.0.0
> -rw-r--r--. 1 root root  541004 Jan 20  2012 libvt-mpi.a
> lrwxrwxrwx. 1 root root  18 Sep 15 12:25 libvt-mpi.so ->
> libvt-mpi.so.0.0.0
> lrwxrwx

Re: [OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
# export LD_LIBRARY_PATH

# mpiexec -n 1 printenv | grep PATH
LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
WINDOWPATH=1

# mpiexec -n 4 ./a.out
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--
[[3598,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: elzbieta

Another transport will be used instead, although this may result in
lower performance.
--
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
CMA: unable to get RDMA device list
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
[elzbieta:4145] *** An error occurred in MPI_Scatter
[elzbieta:4145] *** on communicator MPI_COMM_WORLD
[elzbieta:4145] *** MPI_ERR_TYPE: invalid datatype
[elzbieta:4145] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--
mpiexec has exited due to process rank 1 with PID 4145 on
node elzbieta exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--
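
A follow-up aside on the export that fixed the loader error: to avoid re-exporting in every new shell, the settings can be made persistent. A sketch, assuming a bash-like login shell; the module name shown is the usual Fedora one and may differ on other systems:

    # added to ~/.bashrc
    export PATH=/usr/lib/openmpi/bin:$PATH
    export LD_LIBRARY_PATH=/usr/lib/openmpi/lib:$LD_LIBRARY_PATH

    # or, if the environment-modules package is installed:
    module load mpi/openmpi-x86_64

Either route gives every new shell, and everything launched from it, the same PATH and LD_LIBRARY_PATH.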


On Sat, Sep 15, 2012 at 2:24 PM, Ralph Castain  wrote:

> Ah - note that there is no LD_LIBRARY_PATH in the environment. That's the
> problem
>
> On Sep 15, 2012, at 11:19 AM, John Chludzinski 
> wrote:
>
> $ which mpiexec
> /usr/lib/openmpi/bin/mpiexec
>
> # mpiexec -n 1 printenv | grep PATH
>
> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> WINDOWPATH=1
>
>
>
> On Sat, Sep 15, 2012 at 1:11 PM, Ralph Castain  wrote:
>
>> Couple of things worth checking:
>>
>> 1. verify that you executed the "mpiexec" you think you did - a simple
>> "which mpiexec" should suffice
>>
>> 2. verify that your environment is correct by "mpiexec -n 1 printenv |
>> grep PATH". Sometimes the ld_library_path doesn't carry over like you think
>> it should
>>
>>
>> On Sep 15, 2012, at 10:00 AM, John Chludzinski <
>> john.chludzin...@gmail.com> wrote:
>>
>> I installed OpenMPI (I have a simple dual core AMD notebook with Fedora
>> 16) via:
>>
>> # yum install openmpi
>> # yum install openmpi-devel
>> # mpirun --version
>> mpirun (Open MPI) 1.5.4
>>
>> I added:
>>
>> $ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
>> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>>
>> Then:
>>
>> $ mpif90 ex1.f95
>> $ mpiexec -n 4 ./a.out
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
>> open shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
>> open shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
>> open shared object file: No such file or directory
>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot
>> open shared object file: No such file or directory
>> --
>> mpiexec noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --
>>
>> ls -l /usr/lib/openmpi/lib/
>> total 6788
>> lrwxrwxrwx. 1 root root  25 Sep 15 12:25 libmca_common_sm.so ->
>> libmca_comm

Re: [OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
BTW, here's the example code:

program scatter
  include 'mpif.h'

  integer, parameter :: SIZE=4
  integer :: numtasks, rank, sendcount, recvcount, source, ierr
  real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)

  ! Fortran stores this array in column-major order, so the
  ! scatter will actually scatter columns, not rows.
  data sendbuf /  1.0,  2.0,  3.0,  4.0, &
                  5.0,  6.0,  7.0,  8.0, &
                  9.0, 10.0, 11.0, 12.0, &
                 13.0, 14.0, 15.0, 16.0 /

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

  if (numtasks .eq. SIZE) then
    source = 1
    sendcount = SIZE
    recvcount = SIZE
    call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
                     recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
    print *, 'rank= ', rank, ' Results: ', recvbuf
  else
    print *, 'Must specify', SIZE, ' processors.  Terminating.'
  endif

  call MPI_FINALIZE(ierr)

end program
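
A quick note on running this example: the scatter hands one column of the 4x4 array to each rank, and the numtasks check means it only does real work with exactly 4 processes. A sketch of building and launching it, with an illustrative source file name:

    $ mpif90 scatter.f95 -o scatter
    $ mpiexec -n 4 ./scatter

With any other -n, every rank just prints the 'Must specify 4 processors' message and exits.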


On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski <
john.chludzin...@gmail.com> wrote:

> # export LD_LIBRARY_PATH
>
>
> # mpiexec -n 1 printenv | grep PATH
> LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>
>
> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
> WINDOWPATH=1
>
> # mpiexec -n 4 ./a.out
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --
> [[3598,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
>
> Module: OpenFabrics (openib)
>   Host: elzbieta
>
> Another transport will be used instead, although this may result in
> lower performance.
> --
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> [elzbieta:4145] *** An error occurred in MPI_Scatter
> [elzbieta:4145] *** on communicator MPI_COMM_WORLD
> [elzbieta:4145] *** MPI_ERR_TYPE: invalid datatype
> [elzbieta:4145] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --
> mpiexec has exited due to process rank 1 with PID 4145 on
> node elzbieta exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpiexec (as reported here).
> --
>
>
>
> On Sat, Sep 15, 2012 at 2:24 PM, Ralph Castain  wrote:
>
>> Ah - note that there is no LD_LIBRARY_PATH in the environment. That's the
>> problem
>>
>> On Sep 15, 2012, at 11:19 AM, John Chludzinski <
>> john.chludzin...@gmail.com> wrote:
>>
>> $ which mpiexec
>> /usr/lib/openmpi/bin/mpiexec
>>
>> # mpiexec -n 1 printenv | grep PATH
>>
>> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
>> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
>> WINDOWPATH=1
>>
>>
>>
>> On Sat, Sep 15, 2012 at 1:11 PM, Ralph Castain  wrote:
>>
>>> Couple of things worth checking:
>>>
>>> 1. verify that you executed the "mpiexec" you think you did - a simple
>>> "which mpiexec" should suffice
>>>
>>> 2. verify that your environment is correct by "mpiexec -n 1 printenv |
>>> grep PATH". Sometimes the ld_library_path doesn't carry over like you think
>>> it should
>>>
>>>
>>>  On Sep 15, 2012, at 10:00 AM, John Chludzinski <
>>> john.chludzin

Re: [OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
There was a bug in the code.  So now I get this, which is correct but how
do I get rid of all these ABI, CMA, etc. messages?

$ mpiexec -n 4 ./a.out
librdmacm: couldn't read ABI version.
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: assuming: 4
CMA: unable to get RDMA device list
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--
[[6110,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: elzbieta

Another transport will be used instead, although this may result in
lower performance.
--
 rank=1  Results:5.000   6.000
7.000   8.000
 rank=2  Results:9.000   10.00
11.00   12.00
 rank=0  Results:1.000   2.000
3.000   4.000
 rank=3  Results:13.00   14.00
15.00   16.00
[elzbieta:02559] 3 more processes have sent help message
help-mpi-btl-base.txt / btl:no-nics
[elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages


On Sat, Sep 15, 2012 at 3:34 PM, John Chludzinski <
john.chludzin...@gmail.com> wrote:

> BTW, here the example code:
>
> program scatter
> include 'mpif.h'
>
> integer, parameter :: SIZE=4
> integer :: numtasks, rank, sendcount, recvcount, source, ierr
> real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)
>
> !  Fortran stores this array in column major order, so the
> !  scatter will actually scatter columns, not rows.
> data sendbuf /1.0, 2.0, 3.0, 4.0, &
> 5.0, 6.0, 7.0, 8.0, &
> 9.0, 10.0, 11.0, 12.0, &
> 13.0, 14.0, 15.0, 16.0 /
>
> call MPI_INIT(ierr)
> call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
> call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
>
> if (numtasks .eq. SIZE) then
>   source = 1
>   sendcount = SIZE
>   recvcount = SIZE
>   call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
>recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
>   print *, 'rank= ',rank,' Results: ',recvbuf
> else
>print *, 'Must specify',SIZE,' processors.  Terminating.'
> endif
>
> call MPI_FINALIZE(ierr)
>
> end program
>
>
> On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski <
> john.chludzin...@gmail.com> wrote:
>
>> # export LD_LIBRARY_PATH
>>
>>
>> # mpiexec -n 1 printenv | grep PATH
>> LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>>
>>
>> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
>> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
>> WINDOWPATH=1
>>
>> # mpiexec -n 4 ./a.out
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> --
>> [[3598,1],0]: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>>
>> Module: OpenFabrics (openib)
>>   Host: elzbieta
>>
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> librdmacm: couldn't read ABI version.
>> CMA: unable to get RDMA device list
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> [elzbieta:4145] *** An error occurred in MPI_Scatter
>> [elzbieta:4145] *** on communicator MPI_COMM_WORLD
>> [elzbieta:4145] *** MPI_ERR_TYPE: invalid datatype
>> [elzbieta:4145] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>> --
>> mpiexec has exited due to process rank 1 with PID 4145 on
>> node elzbieta exiting improperly. There are two reasons this could occur:
>>
>> 1. this process did not call "init" before exiting, but others in
>> the job did. This can ca

Re: [OMPI users] Newbie question?

2012-09-15 Thread John Chludzinski
Is this what you intended(?):

$ mpiexec -n 4 ./a.out -mca btl^openib

librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--
[[5991,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: elzbieta

Another transport will be used instead, although this may result in
lower performance.
--
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
 rank=1  Results:5.000   6.000
7.000   8.000
 rank=0  Results:1.000   2.000
3.000   4.000
 rank=2  Results:9.000   10.00
11.00   12.00
 rank=3  Results:13.00   14.00
15.00   16.00
[elzbieta:02374] 3 more processes have sent help message
help-mpi-btl-base.txt / btl:no-nics
[elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages


On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain  wrote:

> Try adding "-mca btl ^openib" to your cmd line and see if that cleans it
> up.
>
>
> On Sep 15, 2012, at 12:44 PM, John Chludzinski 
> wrote:
>
> There was a bug in the code.  So now I get this, which is correct but how
> do I get rid of all these ABI, CMA, etc. messages?
>
> $ mpiexec -n 4 ./a.out
> librdmacm: couldn't read ABI version.
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --
> [[6110,1],1]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
>
> Module: OpenFabrics (openib)
>   Host: elzbieta
>
> Another transport will be used instead, although this may result in
> lower performance.
> --
>  rank=1  Results:5.000   6.000
> 7.000   8.000
>  rank=2  Results:9.000   10.00
> 11.00   12.00
>  rank=0  Results:1.000   2.000
> 3.000   4.000
>  rank=3  Results:13.00   14.00
> 15.00   16.00
> [elzbieta:02559] 3 more processes have sent help message
> help-mpi-btl-base.txt / btl:no-nics
> [elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> all help / error messages
>
>
> On Sat, Sep 15, 2012 at 3:34 PM, John Chludzinski <
> john.chludzin...@gmail.com> wrote:
>
>> BTW, here the example code:
>>
>> program scatter
>> include 'mpif.h'
>>
>> integer, parameter :: SIZE=4
>> integer :: numtasks, rank, sendcount, recvcount, source, ierr
>> real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)
>>
>> !  Fortran stores this array in column major order, so the
>> !  scatter will actually scatter columns, not rows.
>> data sendbuf /1.0, 2.0, 3.0, 4.0, &
>> 5.0, 6.0, 7.0, 8.0, &
>> 9.0, 10.0, 11.0, 12.0, &
>> 13.0, 14.0, 15.0, 16.0 /
>>
>> call MPI_INIT(ierr)
>> call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>> call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
>>
>> if (numtasks .eq. SIZE) then
>>   source = 1
>>   sendcount = SIZE
>>   recvcount = SIZE
>>   call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
>>recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
>>   print *, 'rank= ',rank,' Results: ',recvbuf
>> else
>>print *, 'Must specify',SIZE,' processors.  Terminating.'
>> endif
>>
>> call MPI_FINALIZE(ierr)
>>
>> end program
>>
>>
>> On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski <
>> john.chludzin...@gmail.com> wrote:
>>
>>> # export LD_LIBRARY_PATH
>>>
>>>
>>> # mpiexec -n 1 printenv | 

Re: [OMPI users] Newbie question?

2012-09-16 Thread John Chludzinski
BINGO!  That did it.  Thanks.  ---John
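
For reference, the same setting does not have to live on the command line every time. Open MPI also reads MCA parameters from the environment and from a per-user file; a sketch, using the value from this thread:

    # as an environment variable:
    $ export OMPI_MCA_btl=^openib
    $ mpiexec -n 4 ./a.out

    # or in a per-user parameter file:
    $ mkdir -p $HOME/.openmpi
    $ echo "btl = ^openib" >> $HOME/.openmpi/mca-params.conf

Command-line -mca values normally take precedence over the environment, which in turn takes precedence over the file.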

On Sat, Sep 15, 2012 at 9:32 PM, Ralph Castain  wrote:

> No - the mca param has to be specified *before* your executable
>
> mpiexec -mca btl ^openib -n 4 ./a.out
>
> Also, note the space between "btl" and "^openib"
>
>
> On Sep 15, 2012, at 5:45 PM, John Chludzinski 
> wrote:
>
> Is this what you intended(?):
>
> *$ mpiexec -n 4 ./a.out -mca btl^openib
>
> *librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --
> [[5991,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
>
> Module: OpenFabrics (openib)
>   Host: elzbieta
>
> Another transport will be used instead, although this may result in
> lower performance.
> --
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
>  rank=1  Results:5.000   6.000
> 7.000   8.000
>  rank=0  Results:1.000   2.000
> 3.000   4.000
>  rank=2  Results:9.000   10.00
> 11.00   12.00
>  rank=3  Results:13.00   14.00
> 15.00   16.00
> [elzbieta:02374] 3 more processes have sent help message
> help-mpi-btl-base.txt / btl:no-nics
> [elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> all help / error messages
>
>
> On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain  wrote:
>
>> Try adding "-mca btl ^openib" to your cmd line and see if that cleans it
>> up.
>>
>>
>> On Sep 15, 2012, at 12:44 PM, John Chludzinski <
>> john.chludzin...@gmail.com> wrote:
>>
>> There was a bug in the code.  So now I get this, which is correct but how
>> do I get rid of all these ABI, CMA, etc. messages?
>>
>> $ mpiexec -n 4 ./a.out
>> librdmacm: couldn't read ABI version.
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> CMA: unable to get RDMA device list
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> --
>> [[6110,1],1]: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>>
>> Module: OpenFabrics (openib)
>>   Host: elzbieta
>>
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --
>>  rank=1  Results:5.000   6.000
>> 7.000   8.000
>>  rank=2  Results:9.000   10.00
>> 11.00   12.00
>>  rank=0  Results:1.000   2.000
>> 3.000   4.000
>>  rank=3  Results:13.00   14.00
>> 15.00   16.00
>> [elzbieta:02559] 3 more processes have sent help message
>> help-mpi-btl-base.txt / btl:no-nics
>> [elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to see
>> all help / error messages
>>
>>
>> On Sat, Sep 15, 2012 at 3:34 PM, John Chludzinski <
>> john.chludzin...@gmail.com> wrote:
>>
>>> BTW, here the example code:
>>>
>>> program scatter
>>> include 'mpif.h'
>>>
>>> integer, parameter :: SIZE=4
>>> integer :: numtasks, rank, sendcount, recvcount, source, ierr
>>> real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)
>>>
>>> !  Fortran stores this array in column major order, so the
>>> !  scatter will actually scatter columns, not rows.
>>> data sendbuf /1.0, 2.0, 3.0, 4.0, &
>>> 5.0, 6.0, 7.0, 8.0, &
>>> 9.0, 10.0, 11.0, 12.0, &
>>> 13.0, 14.0, 15.0, 16.0 /
>>>
>>> call MPI_INIT(ierr)
>>> call MPI_COMM

Re: [OMPI users] Newbie question?

2012-09-16 Thread John Chludzinski
BTW, I looked up the -mca option:

 -mca <arg0> <arg1>|--mca <arg0> <arg1>
  Pass context-specific MCA parameters; they are
  considered global if --gmca is not used and only
  one context is specified (arg0 is the parameter
  name; arg1 is the parameter value)

Could you explain the args: btl and ^openib ?

---John


On Sun, Sep 16, 2012 at 12:26 AM, John Chludzinski <
john.chludzin...@gmail.com> wrote:

> BINGO!  That did it.  Thanks.  ---John
>
>
> On Sat, Sep 15, 2012 at 9:32 PM, Ralph Castain  wrote:
>
>> No - the mca param has to be specified *before* your executable
>>
>> mpiexec -mca btl ^openib -n 4 ./a.out
>>
>> Also, note the space between "btl" and "^openib"
>>
>>
>> On Sep 15, 2012, at 5:45 PM, John Chludzinski 
>> wrote:
>>
>> Is this what you intended(?):
>>
>> *$ mpiexec -n 4 ./a.out -mca btl^openib
>>
>> *librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> --
>> [[5991,1],0]: A high-performance Open MPI point-to-point messaging module
>> was unable to find any relevant network interfaces:
>>
>> Module: OpenFabrics (openib)
>>   Host: elzbieta
>>
>> Another transport will be used instead, although this may result in
>> lower performance.
>> --
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>> librdmacm: couldn't read ABI version.
>> librdmacm: assuming: 4
>> CMA: unable to get RDMA device list
>>  rank=1  Results:5.000   6.000
>> 7.000   8.000
>>  rank=0  Results:1.000   2.000
>> 3.000   4.000
>>  rank=2  Results:9.000   10.00
>> 11.00   12.00
>>  rank=3  Results:13.00   14.00
>> 15.00   16.00
>> [elzbieta:02374] 3 more processes have sent help message
>> help-mpi-btl-base.txt / btl:no-nics
>> [elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to see
>> all help / error messages
>>
>>
>> On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain  wrote:
>>
>>> Try adding "-mca btl ^openib" to your cmd line and see if that cleans it
>>> up.
>>>
>>>
>>> On Sep 15, 2012, at 12:44 PM, John Chludzinski <
>>> john.chludzin...@gmail.com> wrote:
>>>
>>> There was a bug in the code.  So now I get this, which is correct but
>>> how do I get rid of all these ABI, CMA, etc. messages?
>>>
>>> $ mpiexec -n 4 ./a.out
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> CMA: unable to get RDMA device list
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>>
>>> --
>>> [[6110,1],1]: A high-performance Open MPI point-to-point messaging module
>>> was unable to find any relevant network interfaces:
>>>
>>> Module: OpenFabrics (openib)
>>>   Host: elzbieta
>>>
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --
>>>  rank=1  Results:5.000   6.000
>>> 7.000   8.000
>>>  rank=2  Results:9.000   10.00
>>> 11.00   12.00
>>>  rank=0  Results:1.000   2.000
>>> 3.000   4.000
>>>  rank=3  Results:13.00   14.00
>>> 15.00   16.00
>>> [elzbieta:02559] 3 more processes have sent help message
>>> help-mpi-btl-base.txt / btl:no-nics
>>> [elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to
>

Re: [OMPI users] Newbie question?

2012-09-16 Thread John Chludzinski
Thanks, I'll go to the FAQs.  ---John
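
As an aside, the BTL components actually compiled into a given Open MPI build can be listed with ompi_info; a sketch (the exact component set depends on how the package was built):

    $ ompi_info | grep "MCA btl"

Each matching line names one available component (for example openib, tcp, sm, self), and any of those names can be listed in the btl parameter or excluded from it with the ^ prefix.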

On Sun, Sep 16, 2012 at 3:21 AM, Jingcha Joba  wrote:

> John,
>
> BTL refers to Byte Transfer Layer, a framework to send/receive point to
> point messages on different network. It has several components
> (implementations) like openib, tcp, mx, shared mem, etc.
>
> ^openib means "not" to use openib component for p2p messages.
>
> On a side note, do you have an RDMA supporting device (
> Infiniband/RoCE/iWarp) ? If so, is OFED installed correctly and is running?
> If you do not have, is the OFED running, which it should not, otherwise ?
>
> The message that you are getting could be because of this. As a
> consequence, if you have a RDMA supported device, you might be getting poor
> performance.
>
> A wealth of information is available in the FAQ section regarding these
> things.
>
> --
> Sent from my iPhone
>
> On Sep 15, 2012, at 9:49 PM, John Chludzinski 
> wrote:
>
> BTW, I looked up the -mca option:
>
>  -mca <arg0> <arg1>|--mca <arg0> <arg1>
>   Pass context-specific MCA parameters; they are
>   considered global if --gmca is not used and only
>   one context is specified (arg0 is the parameter
>   name; arg1 is the parameter value)
>
> Could you explain the args: btl and ^openib ?
>
> ---John
>
>
> On Sun, Sep 16, 2012 at 12:26 AM, John Chludzinski <
> john.chludzin...@gmail.com> wrote:
>
>> BINGO!  That did it.  Thanks.  ---John
>>
>>
>> On Sat, Sep 15, 2012 at 9:32 PM, Ralph Castain  wrote:
>>
>>> No - the mca param has to be specified *before* your executable
>>>
>>> mpiexec -mca btl ^openib -n 4 ./a.out
>>>
>>> Also, note the space between "btl" and "^openib"
>>>
>>>
>>> On Sep 15, 2012, at 5:45 PM, John Chludzinski <
>>> john.chludzin...@gmail.com> wrote:
>>>
>>> Is this what you intended(?):
>>>
>>> *$ mpiexec -n 4 ./a.out -mca btl^openib
>>>
>>> *librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>>
>>> --
>>> [[5991,1],0]: A high-performance Open MPI point-to-point messaging module
>>> was unable to find any relevant network interfaces:
>>>
>>> Module: OpenFabrics (openib)
>>>   Host: elzbieta
>>>
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>>  rank=1  Results:5.000   6.000
>>> 7.000   8.000
>>>  rank=0  Results:1.000   2.000
>>> 3.000   4.000
>>>  rank=2  Results:9.000   10.00
>>> 11.00   12.00
>>>  rank=3  Results:13.00   14.00
>>> 15.00   16.00
>>> [elzbieta:02374] 3 more processes have sent help message
>>> help-mpi-btl-base.txt / btl:no-nics
>>> [elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>> see all help / error messages
>>>
>>>
>>> On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain  wrote:
>>>
>>>> Try adding "-mca btl ^openib" to your cmd line and see if that cleans
>>>> it up.
>>>>
>>>>
>>>> On Sep 15, 2012, at 12:44 PM, John Chludzinski <
>>>> john.chludzin...@gmail.com> wrote:
>>>>
>>>> There was a bug in the code.  So now I get this, which is correct but
>>>> how do I get rid of all these ABI, CMA, etc. messages?
>>>>
>>>> $ mpiexec -n 4 ./a.out
>>>> librdmacm: couldn't read ABI version.
>>>> librdmacm: couldn't read ABI version.
>>>> librdmacm: assuming: 4
>>>> CMA: unable to get RDMA device list
>>>> librdmacm: assuming: 4
>>>> CMA: unable to get RDMA device list
>>>> CMA: unable to get RDMA device list
>>>> 

Re: [OMPI users] Newbie question?

2012-09-16 Thread John Chludzinski
> On a side note, do you have an RDMA supporting device (
Infiniband/RoCE/iWarp) ?

I'm just an engineer trying to get something to work on an AMD dual core
notebook for the powers-that-be at a small engineering concern (all MEs) in
Huntsville, AL - i.e., NASA work.

---John
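
On a single laptop with no RDMA hardware, an alternative to excluding openib is to name only the transports that are actually needed; a sketch, assuming the shared-memory (sm) and self BTLs are present in this 1.5.x build:

    $ mpiexec -mca btl sm,self -n 4 ./a.out

This whitelists shared memory between ranks on the node plus the loopback "self" component, so the OpenFabrics pieces are never opened at all.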

On Sun, Sep 16, 2012 at 3:21 AM, Jingcha Joba  wrote:

> John,
>
> BTL refers to Byte Transfer Layer, a framework to send/receive point to
> point messages on different network. It has several components
> (implementations) like openib, tcp, mx, shared mem, etc.
>
> ^openib means "not" to use openib component for p2p messages.
>
> On a side note, do you have an RDMA supporting device (
> Infiniband/RoCE/iWarp) ? If so, is OFED installed correctly and is running?
> If you do not have, is the OFED running, which it should not, otherwise ?
>
> The message that you are getting could be because of this. As a
> consequence, if you have a RDMA supported device, you might be getting poor
> performance.
>
> A wealth of information is available in the FAQ section regarding these
> things.
>
> --
> Sent from my iPhone
>
> On Sep 15, 2012, at 9:49 PM, John Chludzinski 
> wrote:
>
> BTW, I looked up the -mca option:
>
>  -mca <arg0> <arg1>|--mca <arg0> <arg1>
>   Pass context-specific MCA parameters; they are
>   considered global if --gmca is not used and only
>   one context is specified (arg0 is the parameter
>   name; arg1 is the parameter value)
>
> Could you explain the args: btl and ^openib ?
>
> ---John
>
>
> On Sun, Sep 16, 2012 at 12:26 AM, John Chludzinski <
> john.chludzin...@gmail.com> wrote:
>
>> BINGO!  That did it.  Thanks.  ---John
>>
>>
>> On Sat, Sep 15, 2012 at 9:32 PM, Ralph Castain  wrote:
>>
>>> No - the mca param has to be specified *before* your executable
>>>
>>> mpiexec -mca btl ^openib -n 4 ./a.out
>>>
>>> Also, note the space between "btl" and "^openib"
>>>
>>>
>>> On Sep 15, 2012, at 5:45 PM, John Chludzinski <
>>> john.chludzin...@gmail.com> wrote:
>>>
>>> Is this what you intended(?):
>>>
>>> *$ mpiexec -n 4 ./a.out -mca btl^openib
>>>
>>> *librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>>
>>> --
>>> [[5991,1],0]: A high-performance Open MPI point-to-point messaging module
>>> was unable to find any relevant network interfaces:
>>>
>>> Module: OpenFabrics (openib)
>>>   Host: elzbieta
>>>
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>>  rank=1  Results:5.000   6.000
>>> 7.000   8.000
>>>  rank=0  Results:1.000   2.000
>>> 3.000   4.000
>>>  rank=2  Results:9.000   10.00
>>> 11.00   12.00
>>>  rank=3  Results:13.00   14.00
>>> 15.00   16.00
>>> [elzbieta:02374] 3 more processes have sent help message
>>> help-mpi-btl-base.txt / btl:no-nics
>>> [elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>> see all help / error messages
>>>
>>>
>>> On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain  wrote:
>>>
>>>> Try adding "-mca btl ^openib" to your cmd line and see if that cleans
>>>> it up.
>>>>
>>>>
>>>> On Sep 15, 2012, at 12:44 PM, John Chludzinski <
>>>> john.chludzin...@gmail.com> wrote:
>>>>
>>>> There was a bug in the code.  So now I get this, which is correct but
>>>> how do I get rid of all these ABI, CMA, etc. messages?
>>>>
>>>> $ mpiexec -n 4 ./a.out
>>>> librdmacm: couldn't read ABI version.
>>>> librdmacm: couldn't read ABI version.

[OMPI users] client-server example

2013-04-13 Thread John Chludzinski
Found the following client-server example code on
http://www.mpi-forum.org and I'm trying to get it to work.  I'm not sure
what argv[1] should be for the client.  The output from the server
side is:

   server available at
4094230528.0;tcp://192.168.1.4:55803+4094230529.0;tcp://192.168.1.4:51618:300


// SERVER
#include <stdio.h>
#include <errno.h>
#include <error.h>
#include "mpi.h"

#define MAX_DATA 100
#define FATAL 1

int main( int argc, char **argv )
{
  MPI_Comm client;
  MPI_Status status;
  char port_name[MPI_MAX_PORT_NAME];
  double buf[MAX_DATA];
  int size, again;

  MPI_Init( &argc, &argv );
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  if (size != 1) error(FATAL, errno, "Server too big");
  MPI_Open_port(MPI_INFO_NULL, port_name);
  printf("server available at %s\n",port_name);

  while (1)
{
  MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client );
  again = 1;

  while (again)
{
  MPI_Recv( buf, MAX_DATA, MPI_DOUBLE, MPI_ANY_SOURCE,
MPI_ANY_TAG, client, &status );

  switch (status.MPI_TAG)
{
case 0: MPI_Comm_free( &client );
  MPI_Close_port(port_name);
  MPI_Finalize();
  return 0;
case 1: MPI_Comm_disconnect( &client );
  again = 0;
  break;
case 2: /* do something */
  fprintf( stderr, "Do something ...\n" );
default:
  /* Unexpected message type */
  MPI_Abort( MPI_COMM_WORLD, 1 );
}
}
}
}

//CLIENT
#include <string.h>
#include "mpi.h"

#define MAX_DATA 100

int main( int argc, char **argv )
{
  MPI_Comm server;
  double buf[MAX_DATA];
  char port_name[MPI_MAX_PORT_NAME];
  int done = 0, tag, n, CNT=0;

  MPI_Init( &argc, &argv );
  strcpy(port_name, argv[1] );  /* assume server's name is cmd-line arg */

  MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server );

  n = MAX_DATA;

  while (!done)
{
  tag = 2; /* Action to perform */
  if ( CNT == 5 ) { tag = 0; done = 1; }
  MPI_Send( buf, n, MPI_DOUBLE, 0, tag, server );
  CNT++;
  /* etc */
}

  MPI_Send( buf, 0, MPI_DOUBLE, 0, 1, server );
  MPI_Comm_disconnect( &server );
  MPI_Finalize();

  return 0;
}
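
One practical detail when running this pair: the port string the server prints contains semicolons, which the shell treats as command separators, so it has to be quoted or escaped when passed as argv[1] to the client. A sketch, with PORTSTRING standing for whatever the server printed:

    $ mpirun -n 1 ./server
    server available at PORTSTRING

    # in a second terminal, while the server is waiting in MPI_Comm_accept:
    $ mpirun -n 1 ./client "PORTSTRING"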


Re: [OMPI users] client-server example

2013-04-13 Thread John Chludzinski
After I "source mpi.ksk", PATH is unchanged but LD_LIBRARY_PATH is there:

   $ print $LD_LIBRARY_PATH
   /usr/lib64/openmpi/lib/

Why does PATH lose its change?

---John


On Sat, Apr 13, 2013 at 12:55 PM, Ralph Castain  wrote:
> You need to pass in the port info that the server printed - just copy/paste 
> the line below "server available at".
>
> On Apr 12, 2013, at 10:58 PM, John Chludzinski  
> wrote:
>
>> Found the following client-server example (code) on
>> http://www.mpi-forum.org and I'm trying to get it to work.  Not sure
>> what argv[1] should be for the client?  The output from the server
>> side is:
>>
>>   server available at
>> 4094230528.0;tcp://192.168.1.4:55803+4094230529.0;tcp://192.168.1.4:51618:300
>>
>>
>> // SERVER
>> #include 
>> #include 
>> #include 
>> #include "mpi.h"
>>
>> #define MAX_DATA 100
>> #define FATAL 1
>>
>> int main( int argc, char **argv )
>> {
>>  MPI_Comm client;
>>  MPI_Status status;
>>  char port_name[MPI_MAX_PORT_NAME];
>>  double buf[MAX_DATA];
>>  int size, again;
>>
>>  MPI_Init( &argc, &argv );
>>  MPI_Comm_size(MPI_COMM_WORLD, &size);
>>  if (size != 1) error(FATAL, errno, "Server too big");
>>  MPI_Open_port(MPI_INFO_NULL, port_name);
>>  printf("server available at %s\n",port_name);
>>
>>  while (1)
>>{
>>  MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client );
>>  again = 1;
>>
>>  while (again)
>>{
>>  MPI_Recv( buf, MAX_DATA, MPI_DOUBLE, MPI_ANY_SOURCE,
>> MPI_ANY_TAG, client, &status );
>>
>>  switch (status.MPI_TAG)
>>{
>>case 0: MPI_Comm_free( &client );
>>  MPI_Close_port(port_name);
>>  MPI_Finalize();
>>  return 0;
>>case 1: MPI_Comm_disconnect( &client );
>>  again = 0;
>>  break;
>>case 2: /* do something */
>>  fprintf( stderr, "Do something ...\n" );
>>default:
>>  /* Unexpected message type */
>>  MPI_Abort( MPI_COMM_WORLD, 1 );
>>}
>>}
>>}
>> }
>>
>> //CLIENT
>> #include 
>> #include "mpi.h"
>>
>> #define MAX_DATA 100
>>
>> int main( int argc, char **argv )
>> {
>>  MPI_Comm server;
>>  double buf[MAX_DATA];
>>  char port_name[MPI_MAX_PORT_NAME];
>>  int done = 0, tag, n, CNT=0;
>>
>>  MPI_Init( &argc, &argv );
>>  strcpy(port_name, argv[1] );  /* assume server's name is cmd-line arg */
>>
>>  MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server );
>>
>>  n = MAX_DATA;
>>
>>  while (!done)
>>{
>>  tag = 2; /* Action to perform */
>>  if ( CNT == 5 ) { tag = 0; done = 1; }
>>  MPI_Send( buf, n, MPI_DOUBLE, 0, tag, server );
>>  CNT++;
>>  /* etc */
>>}
>>
>>  MPI_Send( buf, 0, MPI_DOUBLE, 0, 1, server );
>>  MPI_Comm_disconnect( &server );
>>  MPI_Finalize();
>>
>>  return 0;
>> }
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] client-server example

2013-04-13 Thread John Chludzinski
Sorry: The previous post was intended for another group, ignore it.

With regards to the client-server problem:

$ mpirun -n 1 client
3878879232.0;tcp://192.168.1.4:37625+3878879233.0;tcp://192.168.1.4:38945:300

[jski:01882] [[59199,1],0] ORTE_ERROR_LOG: Not found in file
dpm_orte.c at line 158
[jski:1882] *** An error occurred in MPI_Comm_connect
[jski:1882] *** on communicator MPI_COMM_WORLD
[jski:1882] *** MPI_ERR_INTERN: internal error
[jski:1882] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--
mpirun has exited due to process rank 0 with PID 1882 on
node jski exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
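
The failure above is consistent with the shell having split the port string: an unquoted semicolon ends a command, so the client only ever receives the first fragment. A sketch of how that line is parsed, assuming the port value was typed on the command line exactly as printed:

    $ mpirun -n 1 client 3878879232.0;tcp://192.168.1.4:37625+3878879233.0;tcp://192.168.1.4:38945:300
    # the shell sees three separate commands:
    #   mpirun -n 1 client 3878879232.0
    #   tcp://192.168.1.4:37625+3878879233.0
    #   tcp://192.168.1.4:38945:300
    # so argv[1] inside the client is just "3878879232.0"

Quoting the whole string, as suggested later in the thread, keeps it as a single argument.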

On Sat, Apr 13, 2013 at 7:16 PM, John Chludzinski
 wrote:
> After I "source mpi.ksk", PATH is unchanged but LD_LIBRARY_PATH is there:
>
>$ print $LD_LIBRARY_PATH
>/usr/lib64/openmpi/lib/
>
> Why does PATH loose its change?
>
> ---John
>
>
> On Sat, Apr 13, 2013 at 12:55 PM, Ralph Castain  wrote:
>> You need to pass in the port info that the server printed - just copy/paste 
>> the line below "server available at".
>>
>> On Apr 12, 2013, at 10:58 PM, John Chludzinski  
>> wrote:
>>
>>> Found the following client-server example (code) on
>>> http://www.mpi-forum.org and I'm trying to get it to work.  Not sure
>>> what argv[1] should be for the client?  The output from the server
>>> side is:
>>>
>>>   server available at
>>> 4094230528.0;tcp://192.168.1.4:55803+4094230529.0;tcp://192.168.1.4:51618:300
>>>
>>>
>>> // SERVER
>>> #include 
>>> #include 
>>> #include 
>>> #include "mpi.h"
>>>
>>> #define MAX_DATA 100
>>> #define FATAL 1
>>>
>>> int main( int argc, char **argv )
>>> {
>>>  MPI_Comm client;
>>>  MPI_Status status;
>>>  char port_name[MPI_MAX_PORT_NAME];
>>>  double buf[MAX_DATA];
>>>  int size, again;
>>>
>>>  MPI_Init( &argc, &argv );
>>>  MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>  if (size != 1) error(FATAL, errno, "Server too big");
>>>  MPI_Open_port(MPI_INFO_NULL, port_name);
>>>  printf("server available at %s\n",port_name);
>>>
>>>  while (1)
>>>{
>>>  MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client 
>>> );
>>>  again = 1;
>>>
>>>  while (again)
>>>{
>>>  MPI_Recv( buf, MAX_DATA, MPI_DOUBLE, MPI_ANY_SOURCE,
>>> MPI_ANY_TAG, client, &status );
>>>
>>>  switch (status.MPI_TAG)
>>>{
>>>case 0: MPI_Comm_free( &client );
>>>  MPI_Close_port(port_name);
>>>  MPI_Finalize();
>>>  return 0;
>>>case 1: MPI_Comm_disconnect( &client );
>>>  again = 0;
>>>  break;
>>>case 2: /* do something */
>>>  fprintf( stderr, "Do something ...\n" );
>>>default:
>>>  /* Unexpected message type */
>>>  MPI_Abort( MPI_COMM_WORLD, 1 );
>>>}
>>>}
>>>}
>>> }
>>>
>>> //CLIENT
>>> #include 
>>> #include "mpi.h"
>>>
>>> #define MAX_DATA 100
>>>
>>> int main( int argc, char **argv )
>>> {
>>>  MPI_Comm server;
>>>  double buf[MAX_DATA];
>>>  char port_name[MPI_MAX_PORT_NAME];
>>>  int done = 0, tag, n, CNT=0;
>>>
>>>  MPI_Init( &argc, &argv );
>>>  strcpy(port_name, argv[1] );  /* assume server's name is cmd-line arg */
>>>
>>>  MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server );
>>>
>>>  n = MAX_DATA;
>>>
>>>  while (!done)
>>>{
>>>  tag = 2; /* Action to perform */
>>>  if ( CNT == 5 ) { tag = 0; done = 1; }
>>>  MPI_Send( buf, n, MPI_DOUBLE, 0, tag, server );
>>>  CNT++;
>>>  /* etc */
>>>}
>>>
>>>  MPI_Send( buf, 0, MPI_DOUBLE, 0, 1, server );
>>>  MPI_Comm_disconnect( &server );
>>>  MPI_Finalize();
>>>
>>>  return 0;
>>> }
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] client-server example

2013-04-13 Thread John Chludzinski
After I replaced ";" with "\;" in the server name I got past the
ABORT problem.  Now the client and server deadlock until I finally get
(on the client side):

mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--
[jski:02429] [[59675,0],0] -> [[59187,0],0] (node: jski) oob-tcp:
Number of attempts to create TCP connection has been exceeded.  Cannot
communicate with peer.

On Sat, Apr 13, 2013 at 7:24 PM, John Chludzinski
 wrote:
> Sorry: The previous post was intended for another group, ignore it.
>
> With regards to the client-server problem:
>
> $ mpirun -n 1 client
> 3878879232.0;tcp://192.168.1.4:37625+3878879233.0;tcp://192.168.1.4:38945:300
>
> [jski:01882] [[59199,1],0] ORTE_ERROR_LOG: Not found in file
> dpm_orte.c at line 158
> [jski:1882] *** An error occurred in MPI_Comm_connect
> [jski:1882] *** on communicator MPI_COMM_WORLD
> [jski:1882] *** MPI_ERR_INTERN: internal error
> [jski:1882] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> --
> mpirun has exited due to process rank 0 with PID 1882 on
> node jski exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> On Sat, Apr 13, 2013 at 7:16 PM, John Chludzinski
>  wrote:
>> After I "source mpi.ksk", PATH is unchanged but LD_LIBRARY_PATH is there:
>>
>>$ print $LD_LIBRARY_PATH
>>/usr/lib64/openmpi/lib/
>>
>> Why does PATH loose its change?
>>
>> ---John
>>
>>
>> On Sat, Apr 13, 2013 at 12:55 PM, Ralph Castain  wrote:
>>> You need to pass in the port info that the server printed - just copy/paste 
>>> the line below "server available at".
>>>
>>> On Apr 12, 2013, at 10:58 PM, John Chludzinski  
>>> wrote:
>>>
>>>> Found the following client-server example (code) on
>>>> http://www.mpi-forum.org and I'm trying to get it to work.  Not sure
>>>> what argv[1] should be for the client?  The output from the server
>>>> side is:
>>>>
>>>>   server available at
>>>> 4094230528.0;tcp://192.168.1.4:55803+4094230529.0;tcp://192.168.1.4:51618:300
>>>>
>>>>
>>>> // SERVER
>>>> #include 
>>>> #include 
>>>> #include 
>>>> #include "mpi.h"
>>>>
>>>> #define MAX_DATA 100
>>>> #define FATAL 1
>>>>
>>>> int main( int argc, char **argv )
>>>> {
>>>>  MPI_Comm client;
>>>>  MPI_Status status;
>>>>  char port_name[MPI_MAX_PORT_NAME];
>>>>  double buf[MAX_DATA];
>>>>  int size, again;
>>>>
>>>>  MPI_Init( &argc, &argv );
>>>>  MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>  if (size != 1) error(FATAL, errno, "Server too big");
>>>>  MPI_Open_port(MPI_INFO_NULL, port_name);
>>>>  printf("server available at %s\n",port_name);
>>>>
>>>>  while (1)
>>>>{
>>>>  MPI_Comm_accept( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client 
>>>> );
>>>>  again = 1;
>>>>
>>>>  while (again)
>>>>{
>>>>  MPI_Recv( buf, MAX_DATA, MPI_DOUBLE, MPI_ANY_SOURCE,
>>>> MPI_ANY_TAG, client, &status );
>>>>
>>>>  switch (status.MPI_TAG)
>>>>{
>>>>case 0: MPI_Comm_free( &client );
>>>>  MPI_Close_port(port_name);
>>>>  MPI_Finalize();
>>>>  return 0;
>>>>case 1: MPI_Comm_disconnect( &client );
>>>>  again = 0;
>>>>  break;
>>>>case 2: /* do something */
>>>>  fprintf( stderr, "D

Re: [OMPI users] client-server example

2013-04-13 Thread John Chludzinski
Yep, I saw both semi-colons but the client process hangs at:

  MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server );

---John

On Sat, Apr 13, 2013 at 10:05 PM, Ralph Castain  wrote:
> Did you see that there are two semi-colon's in that line? They both need to 
> be protected from the shell. I would just put quotes around the whole thing.
>
> Other than that, it looks okay to me...I assume you are using a 1.6 series 
> release?
>
> On Apr 13, 2013, at 4:54 PM, John Chludzinski  
> wrote:
>
>> After I replaced ";" with "\;" in the server name I got passed the
>> ABORT problem.  Now the client and server deadlock until I finally get
>> (on the client side):
>>
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --
>> [jski:02429] [[59675,0],0] -> [[59187,0],0] (node: jski) oob-tcp:
>> Number of attempts to create TCP connection has been exceeded.  Cannot
>> communicate with peer.
>>
>> On Sat, Apr 13, 2013 at 7:24 PM, John Chludzinski
>>  wrote:
>>> Sorry: The previous post was intended for another group, ignore it.
>>>
>>> With regards to the client-server problem:
>>>
>>> $ mpirun -n 1 client
>>> 3878879232.0;tcp://192.168.1.4:37625+3878879233.0;tcp://192.168.1.4:38945:300
>>>
>>> [jski:01882] [[59199,1],0] ORTE_ERROR_LOG: Not found in file
>>> dpm_orte.c at line 158
>>> [jski:1882] *** An error occurred in MPI_Comm_connect
>>> [jski:1882] *** on communicator MPI_COMM_WORLD
>>> [jski:1882] *** MPI_ERR_INTERN: internal error
>>> [jski:1882] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>>> --
>>> mpirun has exited due to process rank 0 with PID 1882 on
>>> node jski exiting improperly. There are two reasons this could occur:
>>>
>>> 1. this process did not call "init" before exiting, but others in
>>> the job did. This can cause a job to hang indefinitely while it waits
>>> for all processes to call "init". By rule, if one process calls "init",
>>> then ALL processes must call "init" prior to termination.
>>>
>>> 2. this process called "init", but exited without calling "finalize".
>>> By rule, all processes that call "init" MUST call "finalize" prior to
>>> exiting or it will be considered an "abnormal termination"
>>>
>>> On Sat, Apr 13, 2013 at 7:16 PM, John Chludzinski
>>>  wrote:
>>>> After I "source mpi.ksk", PATH is unchanged but LD_LIBRARY_PATH is there:
>>>>
>>>>   $ print $LD_LIBRARY_PATH
>>>>   /usr/lib64/openmpi/lib/
>>>>
>>>> Why does PATH loose its change?
>>>>
>>>> ---John
>>>>
>>>>
>>>> On Sat, Apr 13, 2013 at 12:55 PM, Ralph Castain  wrote:
>>>>> You need to pass in the port info that the server printed - just 
>>>>> copy/paste the line below "server available at".
>>>>>
>>>>> On Apr 12, 2013, at 10:58 PM, John Chludzinski 
>>>>>  wrote:
>>>>>
>>>>>> Found the following client-server example (code) on
>>>>>> http://www.mpi-forum.org and I'm trying to get it to work.  Not sure
>>>>>> what argv[1] should be for the client?  The output from the server
>>>>>> side is:
>>>>>>
>>>>>>  server available at
>>>>>> 4094230528.0;tcp://192.168.1.4:55803+4094230529.0;tcp://192.168.1.4:51618:300
>>>>>>
>>>>>>
>>>>>> // SERVER
>>>>>> #include 
>>>>>> #include 
>>>>>> #include 
>>>>>> #include "mpi.h"
>>>>>>
>>>>>> #define MAX_DATA 100
>>>>>> #define FATAL 1
>>>>>>
>>>>>> int main( int argc, char **argv )
>>>>>> {
>>>>>> MPI_Comm client;
>>>>>> MPI_Status status;
>>>>>> char port_name[MPI_MAX_PORT_NAME];
>>>>>> double buf[MAX_DATA];
>>>>>> int size, again;
>>>>>>
>>>>>> MPI_Init( &argc, &argv );
>>>>>> MPI_Comm_size(MPI_COMM_WORLD, &size);
>>

Re: [OMPI users] client-server example

2013-04-14 Thread John Chludzinski
Thanks!  Works as advertised ... now.

I tried it without the quotes but with both semicolons escaped (two \;) and
that works as well.  Not sure whom to report this to, but
http://www.mpi-forum.org should update their web site.

Thanks again,
John
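
For the record, both working forms protect the semicolons from the shell: quoting the whole port string, or escaping each of its two semicolons. A sketch, with PART1, PART2, PART3 standing in for the pieces the server printed:

    $ mpirun -n 1 ./client "PART1;PART2;PART3"
    $ mpirun -n 1 ./client PART1\;PART2\;PART3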

On Sun, Apr 14, 2013 at 12:15 PM, Ralph Castain  wrote:
> Well, first thing is that the example is garbage - cannot work as written. 
> I've attached corrected versions.
>
> Even with those errors, though, it got thru comm_connect just fine for me IF 
> you put quotes around the entire port. With the corrected versions, I get 
> this:
>
> $ mpirun -n 1 ./server
> server available at 
> 2795175936.0;tcp://192.168.1.6:61075+2795175937.0;tcp://192.168.1.6:61076:300
> Server loop 1
> Do something ...
> Server loop 2
> Do something ...
> Server loop 3
> Do something ...
> Server loop 4
> Do something ...
> Server loop 5
> Do something ...
> Server loop 6
> Server recvd terminate cmd
> $
>
>
> $ mpirun -n 1 ./client 
> "2795175936.0;tcp://192.168.1.6:61075+2795175937.0;tcp://192.168.1.6:61076:300"
> Client sending message 0
> Client sending message 1
> Client sending message 2
> Client sending message 3
> Client sending message 4
> Client sending message 5
> $
>
>
>
>
> On Apr 13, 2013, at 7:24 PM, John Chludzinski  
> wrote:
>
>> Yep, I saw both semi-colons but the client process hangs at:
>>
>>  MPI_Comm_connect( port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server );
>>
>> ---John
>>
>> On Sat, Apr 13, 2013 at 10:05 PM, Ralph Castain  wrote:
>>> Did you see that there are two semi-colon's in that line? They both need to 
>>> be protected from the shell. I would just put quotes around the whole thing.
>>>
>>> Other than that, it looks okay to me...I assume you are using a 1.6 series 
>>> release?
>>>
>>> On Apr 13, 2013, at 4:54 PM, John Chludzinski  
>>> wrote:
>>>
>>>> After I replaced ";" with "\;" in the server name I got past the
>>>> ABORT problem.  Now the client and server deadlock until I finally get
>>>> (on the client side):
>>>>
>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>> that caused that situation.
>>>> --
>>>> [jski:02429] [[59675,0],0] -> [[59187,0],0] (node: jski) oob-tcp:
>>>> Number of attempts to create TCP connection has been exceeded.  Cannot
>>>> communicate with peer.
>>>>
>>>> On Sat, Apr 13, 2013 at 7:24 PM, John Chludzinski
>>>>  wrote:
>>>>> Sorry: The previous post was intended for another group, ignore it.
>>>>>
>>>>> With regards to the client-server problem:
>>>>>
>>>>> $ mpirun -n 1 client
>>>>> 3878879232.0;tcp://192.168.1.4:37625+3878879233.0;tcp://192.168.1.4:38945:300
>>>>>
>>>>> [jski:01882] [[59199,1],0] ORTE_ERROR_LOG: Not found in file
>>>>> dpm_orte.c at line 158
>>>>> [jski:1882] *** An error occurred in MPI_Comm_connect
>>>>> [jski:1882] *** on communicator MPI_COMM_WORLD
>>>>> [jski:1882] *** MPI_ERR_INTERN: internal error
>>>>> [jski:1882] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>>>>> --
>>>>> mpirun has exited due to process rank 0 with PID 1882 on
>>>>> node jski exiting improperly. There are two reasons this could occur:
>>>>>
>>>>> 1. this process did not call "init" before exiting, but others in
>>>>> the job did. This can cause a job to hang indefinitely while it waits
>>>>> for all processes to call "init". By rule, if one process calls "init",
>>>>> then ALL processes must call "init" prior to termination.
>>>>>
>>>>> 2. this process called "init", but exited without calling "finalize".
>>>>> By rule, all processes that call "init" MUST call "finalize" prior to
>>>>> exiting or it will be considered an "abnormal termination"
>>>>>
>>>>> On Sat, Apr 13, 2013 at 7:16 PM, John Chludzinski
>>>>>  wrote:
>>>>>> After I "source mpi.ksk", PATH is unchanged but LD_LIBRARY_PATH is there:
>>>>>>
>>>>>>  $ print $LD_LIBRARY_PATH
>>>

[OMPI users] MPI based HLA/RTI ?

2013-04-15 Thread John Chludzinski
Is anyone aware of an MPI based HLA/RTI (DoD High Level Architecture
(HLA) / Runtime Infrastructure)?

---John


Re: [OMPI users] MPI based HLA/RTI ?

2013-04-15 Thread John Chludzinski
This would be a departure from the SPMD paradigm that seems central to
MPI's design. Each process would be a completely different program
(piece of code), and I'm not sure how well that would work using
MPI.

BTW, MPI is commonly used in the parallel discrete event simulation world for
communication between LPs (federates in HLA). But these LPs are
usually the same program.

---John

On Mon, Apr 15, 2013 at 10:22 AM, John Chludzinski
 wrote:
> Is anyone aware of an MPI based HLA/RTI (DoD High Level Architecture
> (HLA) / Runtime Infrastructure)?
>
> ---John


Re: [OMPI users] MPI based HLA/RTI ?

2013-04-15 Thread John Chludzinski
I just received an e-mail notifying me that MPI-2 supports MPMD.  This
would seem to be just what the doctor ordered.

---John
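
For what it's worth, a quick way to see MPMD in action with Open MPI is
mpirun's colon-separated app contexts (the executable names below are made up
for illustration):

$ mpirun -np 1 ./rti : -np 2 ./federate_a : -np 2 ./federate_b

All five processes end up in a single MPI_COMM_WORLD, so they still have to
start and finalize together - which is the constraint Ralph spells out in his
next reply.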


On Mon, Apr 15, 2013 at 11:10 AM, Ralph Castain  wrote:

> FWIW: some of us are working on a variant of MPI that would indeed support
> what you describe - it would support send/recv (i.e., MPI-1), but not
> collectives, and so would allow communication between arbitrary programs.
>
> Not specifically targeting HLA/RTI, though I suppose a wrapper that
> conformed to that standard could be created.
>
> On Apr 15, 2013, at 7:50 AM, John Chludzinski 
> wrote:
>
> > This would be a departure from the SPMD paradigm that seems central to
> > MPI's design. Each process would be a completely different program
> > (piece of code) and I'm not sure how well that would working using
> > MPI?
> >
> > BTW, MPI is commonly used in the parallel discrete even world for
> > communication between LPs (federates in HLA). But these LPs are
> > usually the same program.
> >
> > ---John
> >
> > On Mon, Apr 15, 2013 at 10:22 AM, John Chludzinski
> >  wrote:
> >> Is anyone aware of an MPI based HLA/RTI (DoD High Level Architecture
> >> (HLA) / Runtime Infrastructure)?
> >>
> >> ---John
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] MPI based HLA/RTI ?

2013-04-15 Thread John Chludzinski
That would seem to preclude its use for an RTI.  Unless you have a card up
your sleeve?

---John


On Mon, Apr 15, 2013 at 11:23 AM, Ralph Castain  wrote:

> It isn't the fact that there are multiple programs being used - we support
> that just fine. The problem with HLA/RTI is that it allows programs to
> come/go at will - i.e., not every program has to start at the same time,
> nor complete at the same time. MPI requires that all programs be executing
> at the beginning, and that all call finalize prior to anyone exiting.
>
>
> On Apr 15, 2013, at 8:14 AM, John Chludzinski 
> wrote:
>
> I just received an e-mail notifying me that MPI-2 supports MPMD.  This
> would seen to be just what the doctor ordered?
>
> ---John
>
>
> On Mon, Apr 15, 2013 at 11:10 AM, Ralph Castain  wrote:
>
>> FWIW: some of us are working on a variant of MPI that would indeed
>> support what you describe - it would support send/recv (i.e., MPI-1), but
>> not collectives, and so would allow communication between arbitrary
>> programs.
>>
>> Not specifically targeting HLA/RTI, though I suppose a wrapper that
>> conformed to that standard could be created.
>>
>> On Apr 15, 2013, at 7:50 AM, John Chludzinski 
>> wrote:
>>
>> > This would be a departure from the SPMD paradigm that seems central to
>> > MPI's design. Each process would be a completely different program
>> > (piece of code) and I'm not sure how well that would working using
>> > MPI?
>> >
>> > BTW, MPI is commonly used in the parallel discrete even world for
>> > communication between LPs (federates in HLA). But these LPs are
>> > usually the same program.
>> >
>> > ---John
>> >
>> > On Mon, Apr 15, 2013 at 10:22 AM, John Chludzinski
>> >  wrote:
>> >> Is anyone aware of an MPI based HLA/RTI (DoD High Level Architecture
>> >> (HLA) / Runtime Infrastructure)?
>> >>
>> >> ---John
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] MPI based HLA/RTI ?

2013-04-15 Thread John Chludzinski
Is that "in the works"?


On Mon, Apr 15, 2013 at 11:33 AM, Ralph Castain  wrote:

>
> On Apr 15, 2013, at 8:29 AM, John Chludzinski 
> wrote:
>
> That would seem to preclude its use for an RTI.  Unless you have a card up
> your sleeve?
>
>
> One can relax those requirements while maintaining the ability to use
> send/recv - you just can't use MPI collectives, and so the result doesn't
> conform to the MPI standard, yet still retains value for those wanting to
> utilize high-speed, low-latency interconnects in non-MPI situations.
>
>
>
> ---John
>
>
> On Mon, Apr 15, 2013 at 11:23 AM, Ralph Castain  wrote:
>
>> It isn't the fact that there are multiple programs being used - we
>> support that just fine. The problem with HLA/RTI is that it allows programs
>> to come/go at will - i.e., not every program has to start at the same time,
>> nor complete at the same time. MPI requires that all programs be executing
>> at the beginning, and that all call finalize prior to anyone exiting.
>>
>>
>> On Apr 15, 2013, at 8:14 AM, John Chludzinski 
>> wrote:
>>
>> I just received an e-mail notifying me that MPI-2 supports MPMD.  This
>> would seen to be just what the doctor ordered?
>>
>> ---John
>>
>>
>> On Mon, Apr 15, 2013 at 11:10 AM, Ralph Castain  wrote:
>>
>>> FWIW: some of us are working on a variant of MPI that would indeed
>>> support what you describe - it would support send/recv (i.e., MPI-1), but
>>> not collectives, and so would allow communication between arbitrary
>>> programs.
>>>
>>> Not specifically targeting HLA/RTI, though I suppose a wrapper that
>>> conformed to that standard could be created.
>>>
>>> On Apr 15, 2013, at 7:50 AM, John Chludzinski <
>>> john.chludzin...@gmail.com> wrote:
>>>
>>> > This would be a departure from the SPMD paradigm that seems central to
>>> > MPI's design. Each process would be a completely different program
>>> > (piece of code) and I'm not sure how well that would working using
>>> > MPI?
>>> >
>>> > BTW, MPI is commonly used in the parallel discrete even world for
>>> > communication between LPs (federates in HLA). But these LPs are
>>> > usually the same program.
>>> >
>>> > ---John
>>> >
>>> > On Mon, Apr 15, 2013 at 10:22 AM, John Chludzinski
>>> >  wrote:
>>> >> Is anyone aware of an MPI based HLA/RTI (DoD High Level Architecture
>>> >> (HLA) / Runtime Infrastructure)?
>>> >>
>>> >> ---John
>>> > ___
>>> > users mailing list
>>> > us...@open-mpi.org
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] MPI based HLA/RTI ?

2013-04-19 Thread John Chludzinski
So the apparent conclusion to this thread is that an (Open)MPI based RTI is
very doable - if we allow for the future development of dynamic joining and
leaving of the MPI collective?

---John
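
Dynamically adding a federate mid-run is what MPI_Comm_spawn is for; a minimal
RTI-side sketch (my own illustration - "./federate" is a hypothetical
executable and error handling is omitted):

/* RTI-side sketch: dynamically start one extra federate. */
#include "mpi.h"

int main( int argc, char **argv )
{
    MPI_Comm newfed;

    MPI_Init( &argc, &argv );

    /* Launch one process running ./federate; 'newfed' is an intercommunicator
       connecting this process to the new one. */
    MPI_Comm_spawn( "./federate", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                    0, MPI_COMM_SELF, &newfed, MPI_ERRCODES_IGNORE );

    /* ... exchange messages with the new federate via MPI_Send/MPI_Recv ... */

    /* When the federate resigns, both sides call MPI_Comm_disconnect. */
    MPI_Comm_disconnect( &newfed );

    MPI_Finalize();
    return 0;
}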


On Wed, Apr 17, 2013 at 12:45 PM, Ralph Castain  wrote:

> Thanks for the clarification - very interesting indeed! I'll look at it
> more closely.
>
>
> On Apr 17, 2013, at 9:20 AM, George Bosilca  wrote:
>
> On Apr 16, 2013, at 15:51 , Ralph Castain  wrote:
>
> Just curious: I thought ULFM dealt with recovering an MPI job where one or
> more processes fail. Is this correct?
>
>
> It depends what is the definition of "recovering" you take. ULFM is about
> leaving the processes that remains (after a fault or a disconnect) in a
> state that allow them to continue to make progress. It is not about
> recovering processes, or user data, but it does provide the minimalistic
> set of functionalities to allow application to do this, if needed (revoke,
> agreement and shrink).
>
> HLA/RTI consists of processes that start at random times, run to
> completion, and then exit normally. While a failure could occur, most
> process terminations are normal and there is no need/intent to revive them.
>
>
> As I said above, there is no revival of processes in ULFM, and it was
> never our intent to have such feature. The dynamic world is to be
> constructed using MPI-2 constructs (MPI_Spawn or MPI_Connect/Accept or even
> MPI_Join).
>
> So it's mostly a case of massively exercising MPI's dynamic
> connect/accept/disconnect functions.
>
> Do ULFM's structures have some utility for that purpose?
>
>
> Absolutely. If the process that leaves instead of calling MPI_Finalize
> calls exit() this will be interpreted by the version of the runtime in ULFM
> as an event triggering a report. All the ensuing mechanisms are then
> activated and the application can react to this event with the most
> meaningful approach it can envision.
>
>   George.
>
>
>
> On Apr 16, 2013, at 3:20 AM, George Bosilca  wrote:
>
> There is an ongoing effort to address the potential volatility of
> processes in MPI called ULFM. There is a working version available at
> http://fault-tolerance.org. It supports TCP, sm and IB (mostly). You will
> find some examples, and the document explaining the additional constructs
> needed in MPI to achieve this.
>
>   George.
>
> On Apr 15, 2013, at 17:29 , John Chludzinski 
> wrote:
>
> That would seem to preclude its use for an RTI.  Unless you have a card up
> your sleeve?
>
> ---John
>
>
> On Mon, Apr 15, 2013 at 11:23 AM, Ralph Castain  wrote:
>
>> It isn't the fact that there are multiple programs being used - we
>> support that just fine. The problem with HLA/RTI is that it allows programs
>> to come/go at will - i.e., not every program has to start at the same time,
>> nor complete at the same time. MPI requires that all programs be executing
>> at the beginning, and that all call finalize prior to anyone exiting.
>>
>>
>> On Apr 15, 2013, at 8:14 AM, John Chludzinski 
>> wrote:
>>
>> I just received an e-mail notifying me that MPI-2 supports MPMD.  This
>> would seen to be just what the doctor ordered?
>>
>> ---John
>>
>>
>> On Mon, Apr 15, 2013 at 11:10 AM, Ralph Castain  wrote:
>>
>>> FWIW: some of us are working on a variant of MPI that would indeed
>>> support what you describe - it would support send/recv (i.e., MPI-1), but
>>> not collectives, and so would allow communication between arbitrary
>>> programs.
>>>
>>> Not specifically targeting HLA/RTI, though I suppose a wrapper that
>>> conformed to that standard could be created.
>>>
>>> On Apr 15, 2013, at 7:50 AM, John Chludzinski <
>>> john.chludzin...@gmail.com> wrote:
>>>
>>> > This would be a departure from the SPMD paradigm that seems central to
>>> > MPI's design. Each process would be a completely different program
>>> > (piece of code) and I'm not sure how well that would working using
>>> > MPI?
>>> >
>>> > BTW, MPI is commonly used in the parallel discrete even world for
>>> > communication between LPs (federates in HLA). But these LPs are
>>> > usually the same program.
>>> >
>>> > ---John
>>> >
>>> > On Mon, Apr 15, 2013 at 10:22 AM, John Chludzinski
>>> >  wrote:
>>> >> Is anyone aware of an MPI based HLA/RTI (DoD High Level Architecture
>>> >> (HLA) / Runtime Infrastructure)?
>>>

Re: [OMPI users] MPI based HLA/RTI ?

2013-04-22 Thread John Chludzinski
Mainly responding to Ralph's comments.

In HLA a federate (MPI process) can join and leave a federation (MPI
collective) independently from other federates.  And rejoin later.

---John
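
To make "join, resign, and rejoin" concrete in MPI-2 terms, here is a
federate-side sketch (my own illustration, not working HLA code), assuming the
RTI sits in an MPI_Comm_accept loop and its port string arrives in argv[1]:

/* Federate-side sketch: join, resign, and later rejoin the federation. */
#include "mpi.h"

int main( int argc, char **argv )
{
    MPI_Comm rti;

    MPI_Init( &argc, &argv );

    /* Join: connect to the RTI's published port. */
    MPI_Comm_connect( argv[1], MPI_INFO_NULL, 0, MPI_COMM_SELF, &rti );
    /* ... exchange interactions/attribute updates with the RTI over 'rti' ... */

    /* Resign cleanly: disconnect rather than simply exiting. */
    MPI_Comm_disconnect( &rti );

    /* Rejoin later by connecting to the same port again. */
    MPI_Comm_connect( argv[1], MPI_INFO_NULL, 0, MPI_COMM_SELF, &rti );
    MPI_Comm_disconnect( &rti );

    MPI_Finalize();
    return 0;
}

What this cannot cover is a federate that dies instead of resigning cleanly,
which is where ULFM (or something like it) comes in, as George notes below.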


On Mon, Apr 22, 2013 at 11:20 AM, George Bosilca wrote:

> On Apr 19, 2013, at 17:00 , John Chludzinski 
> wrote:
>
> So the apparent conclusion to this thread is that an (Open)MPI based RTI
> is very doable - if we allow for the future development of dynamic joining
> and leaving of the MPI collective?
>
>
> John,
>
> What do you mean by dynamically joining and leaving of the MPI collective?
>
> There are quite a few functions in MPI to dynamically join and disconnect
> processes (MPI_Spawn, MPI_Connect, MPI_Comm_join). So if your processes
> __always__ leave cleanly (using the defined MPI pattern of comm_disconnect
> + comm_free), you might be lucky enough to have this working today. If you
> want to support processes leaving for reasons outside of your control (such
> as crash) you do not have an option today in MPI, you need to use some
> extension (such as ULFM).
>
>   George.
>
>
>
>
> ---John
>
>
> On Wed, Apr 17, 2013 at 12:45 PM, Ralph Castain  wrote:
>
>> Thanks for the clarification - very interesting indeed! I'll look at it
>> more closely.
>>
>>
>> On Apr 17, 2013, at 9:20 AM, George Bosilca  wrote:
>>
>> On Apr 16, 2013, at 15:51 , Ralph Castain  wrote:
>>
>> Just curious: I thought ULFM dealt with recovering an MPI job where one
>> or more processes fail. Is this correct?
>>
>>
>> It depends what is the definition of "recovering" you take. ULFM is about
>> leaving the processes that remains (after a fault or a disconnect) in a
>> state that allow them to continue to make progress. It is not about
>> recovering processes, or user data, but it does provide the minimalistic
>> set of functionalities to allow application to do this, if needed (revoke,
>> agreement and shrink).
>>
>> HLA/RTI consists of processes that start at random times, run to
>> completion, and then exit normally. While a failure could occur, most
>> process terminations are normal and there is no need/intent to revive them.
>>
>>
>> As I said above, there is no revival of processes in ULFM, and it was
>> never our intent to have such feature. The dynamic world is to be
>> constructed using MPI-2 constructs (MPI_Spawn or MPI_Connect/Accept or even
>> MPI_Join).
>>
>> So it's mostly a case of massively exercising MPI's dynamic
>> connect/accept/disconnect functions.
>>
>> Do ULFM's structures have some utility for that purpose?
>>
>>
>> Absolutely. If the process that leaves instead of calling MPI_Finalize
>> calls exit() this will be interpreted by the version of the runtime in ULFM
>> as an event triggering a report. All the ensuing mechanisms are then
>> activated and the application can react to this event with the most
>> meaningful approach it can envision.
>>
>>   George.
>>
>>
>>
>> On Apr 16, 2013, at 3:20 AM, George Bosilca  wrote:
>>
>> There is an ongoing effort to address the potential volatility of
>> processes in MPI called ULFM. There is a working version available at
>> http://fault-tolerance.org. It supports TCP, sm and IB (mostly). You
>> will find some examples, and the document explaining the additional
>> constructs needed in MPI to achieve this.
>>
>>   George.
>>
>> On Apr 15, 2013, at 17:29 , John Chludzinski 
>> wrote:
>>
>> That would seem to preclude its use for an RTI.  Unless you have a card
>> up your sleeve?
>>
>> ---John
>>
>>
>> On Mon, Apr 15, 2013 at 11:23 AM, Ralph Castain  wrote:
>>
>>> It isn't the fact that there are multiple programs being used - we
>>> support that just fine. The problem with HLA/RTI is that it allows programs
>>> to come/go at will - i.e., not every program has to start at the same time,
>>> nor complete at the same time. MPI requires that all programs be executing
>>> at the beginning, and that all call finalize prior to anyone exiting.
>>>
>>>
>>> On Apr 15, 2013, at 8:14 AM, John Chludzinski <
>>> john.chludzin...@gmail.com> wrote:
>>>
>>> I just received an e-mail notifying me that MPI-2 supports MPMD.  This
>>> would seen to be just what the doctor ordered?
>>>
>>> ---John
>>>
>>>
>>> On Mon, Apr 15, 2013 at 11:10 AM, Ralph Castain wrote:
>>>
>>>> FWIW: some of us are working on a va

[OMPI users] Message queue in MPI?

2013-05-02 Thread John Chludzinski
If I'm using MPI_Send(...) and MPI_Recv(...) in a producer/consumer
model and choose not to buffer messages internally (in the app),
allowing them to accumulate in the MPI layer, how large an MPI
message queue can I expect before something breaks?

---John
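
For concreteness, a minimal sketch of the setup described above (my own
illustration): rank 0 sends small messages as fast as it can while rank 1
drains them slowly, so - to my understanding of Open MPI's behavior - anything
under the transport's eager limit piles up as unexpected messages inside the
library on the receiving side, while larger messages fall back to a rendezvous
and simply stall the sender.

/* Producer/consumer sketch: unreceived messages accumulate in the MPI layer. */
#include <unistd.h>
#include "mpi.h"

int main( int argc, char **argv )
{
    int    rank, i, n = 100000;
    double msg = 0.0;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    if (rank == 0) {                               /* producer */
        for (i = 0; i < n; i++) {
            msg = (double) i;
            MPI_Send( &msg, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD );
        }
    } else if (rank == 1) {                        /* consumer */
        for (i = 0; i < n; i++) {
            MPI_Recv( &msg, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                      MPI_STATUS_IGNORE );
            usleep( 1000 );                        /* lag: sends outpace receives */
        }
    }

    MPI_Finalize();
    return 0;
}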


Re: [OMPI users] Message queue in MPI?

2013-05-02 Thread John Chludzinski
I assume there are settings for this?
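
My guess (an assumption on my part) is that the relevant knobs are the per-BTL
eager-limit MCA parameters, which you can inspect with ompi_info, e.g.:

$ ompi_info --param btl sm | grep eager
$ ompi_info --param btl tcp | grep eager

Below the eager limit the library buffers the message on the receive side;
above it the transfer uses a rendezvous, so the sender waits instead of
queueing. How much unexpected-message memory you can pile up before something
breaks is, as far as I know, bounded only by available memory.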


On Thu, May 2, 2013 at 3:23 PM, John Chludzinski  wrote:

> If I'm using MPI_Send(...) and MPI_Recv(...) in a producer/consumer
> model and choose not to buffer messages internally (in the app),
> allowing them to acumulate in the MPI layer, how large of an MPI
> message queue can I expect before something breaks?
>
> ---John
>


Re: [OMPI users] latest Intel CPU bug

2018-01-04 Thread John Chludzinski
From
https://semiaccurate.com/2018/01/04/kaiser-security-holes-will-devastate-intels-marketshare/

Kaiser security holes will devastate Intel’s marketshare
Analysis: This one tips the balance toward AMD in a big way
Jan 4, 2018 by Charlie Demerjian




This latest decade-long critical security hole in Intel CPUs is going to
cost the company significant market share. SemiAccurate thinks it is not
only consequential but will shift the balance of power away from Intel CPUs
for at least the next several years.

Today’s latest crop of gaping security flaws have three sets of holes
across Intel, AMD, and ARM processors along with a slew of official
statements and detailed analyses. On top of that the statements from
vendors range from detailed and direct to intentionally misleading and
slimy. Lets take a look at what the problems are, who they effect and what
the outcome will be. Those outcomes range from trivial patching to
destroying the market share of Intel servers, and no we are not joking.

(*Authors Note 1:* For the technical readers we are simplifying a lot,
sorry we know this hurts. The full disclosure docs are linked, read them
for the details.)

(*Authors Note 2:* For the financial oriented subscribers out there, the
parts relevant to you are at the very end, the section is titled *Rubber
Meet Road*.)

*The Problem(s):*

As we said earlier there are three distinct security flaws that all fall
somewhat under the same umbrella. All are ‘new’ in the sense that the class
of attacks hasn’t been publicly described before, and all are very obscure
CPU speculative execution and timing related problems. The extent the fixes
affect differing architectures also ranges from minor to near-crippling
slowdowns. Worse yet is that all three flaws aren’t bugs or errors, they
exploit correct CPU behavior to allow the systems to be hacked.

The three problems are cleverly labeled Variant One, Variant Two, and
Variant Three. Google Project Zero was the original discoverer of them and
has labeled the classes as Bounds Bypass Check, Branch Target Injection,
and Rogue Data Cache Load respectively. You can read up on the extensive
and gory details here
<https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html>
if you wish.

If you are the TLDR type the very simplified summary is that modern CPUs
will speculatively execute operations ahead of the one they are currently
running. Some architectures will allow these executions to start even when
they violate privilege levels, but those instructions are killed or rolled
back hopefully before they actually complete running.

Another feature of modern CPUs is virtual memory which can allow memory
from two or more processes to occupy the same physical page. This is a good
thing because if you have memory from the kernel and a bit of user code in
the same physical page but different virtual pages, changing from kernel to
userspace execution doesn’t require a page fault. This saves massive
amounts of time and overhead giving modern CPUs a huge speed boost. (For
the really technical out there, I know you are cringing at this
simplification, sorry).

These two things together allow you to do some interesting things and along
with timing attacks add new weapons to your hacking arsenal. If you have
code executing on one side of a virtual memory page boundary, it can
speculatively execute the next few instructions on the physical page that
cross the virtual page boundary. This isn’t a big deal unless the two
virtual pages are mapped to processes that are from different users or
different privilege levels. Then you have a problem. (Again painfully
simplified and liberties taken with the explanation, read the Google paper
for the full detail.)

This speculative execution allows you to get a few short (low latency)
instructions in before the speculation ends. Under certain circumstances
you can read memory from different threads or privilege levels, write those
things somewhere, and figure out what addresses other bits of code are
using. The latter bit has the nasty effect of potentially blowing through
address space randomization defenses which are a keystone of modern
security efforts. It is ugly.

*Who Gets Hit:*

So we have three attack vectors and three affected companies, Intel, AMD,
and ARM. Each has a different set of vulnerabilities to the different
attacks due to differences in underlying architectures. AMD put out a
pretty clear statement of what is affected, ARM put out by far the best and
most comprehensive description, and Intel obfuscated, denied, blamed
others, and downplayed the problem. If this was a contest for misleading
with doublespeak and misdirection, Intel won with a gold star, the others
weren’t even in the game. Lets look at who said what and why.

*ARM:*

ARM has a page up  listing
vulnerable processor cores, descriptions of the attacks, and plen

Re: [OMPI users] latest Intel CPU bug

2018-01-04 Thread John Chludzinski
That article gives the best technical assessment I've seen of Intel's
architecture bug. I noted the discussion's subject and thought I'd add some
clarity. Nothing more.

For the TL;DR crowd: get an AMD chip in your computer.

On Thursday, January 4, 2018, r...@open-mpi.org  wrote:

> Yes, please - that was totally inappropriate for this mailing list.
> Ralph
>
>
> On Jan 4, 2018, at 4:33 PM, Jeff Hammond  wrote:
>
> Can we restrain ourselves to talk about Open-MPI or at least technical
> aspects of HPC communication on this list and leave the stock market tips
> for Hacker News and Twitter?
>
> Thanks,
>
> Jeff
>
> On Thu, Jan 4, 2018 at 3:53 PM, John Chludzinski  wrote:
>
>> From https://semiaccurate.com/2018/01/04/kaiser-security-
>> holes-will-devastate-intels-marketshare/
>>
>> Kaiser security holes will devastate Intel’s marketshare
>> Analysis: This one tips the balance toward AMD in a big way
>> Jan 4, 2018 by Charlie Demerjian <https://semiaccurate.com/author/charlie/>
>>
>>
>>
>> This latest decade-long critical security hole in Intel CPUs is going to
>> cost the company significant market share. SemiAccurate thinks it is not
>> only consequential but will shift the balance of power away from Intel CPUs
>> for at least the next several years.
>>
>> Today’s latest crop of gaping security flaws have three sets of holes
>> across Intel, AMD, and ARM processors along with a slew of official
>> statements and detailed analyses. On top of that the statements from
>> vendors range from detailed and direct to intentionally misleading and
>> slimy. Lets take a look at what the problems are, who they effect and what
>> the outcome will be. Those outcomes range from trivial patching to
>> destroying the market share of Intel servers, and no we are not joking.
>>
>> (*Authors Note 1:* For the technical readers we are simplifying a lot,
>> sorry we know this hurts. The full disclosure docs are linked, read them
>> for the details.)
>>
>> (*Authors Note 2:* For the financial oriented subscribers out there, the
>> parts relevant to you are at the very end, the section is titled *Rubber
>> Meet Road*.)
>>
>> *The Problem(s):*
>>
>> As we said earlier there are three distinct security flaws that all fall
>> somewhat under the same umbrella. All are ‘new’ in the sense that the class
>> of attacks hasn’t been publicly described before, and all are very obscure
>> CPU speculative execution and timing related problems. The extent the fixes
>> affect differing architectures also ranges from minor to near-crippling
>> slowdowns. Worse yet is that all three flaws aren’t bugs or errors, they
>> exploit correct CPU behavior to allow the systems to be hacked.
>>
>> The three problems are cleverly labeled Variant One, Variant Two, and
>> Variant Three. Google Project Zero was the original discoverer of them and
>> has labeled the classes as Bounds Bypass Check, Branch Target Injection,
>> and Rogue Data Cache Load respectively. You can read up on the extensive
>> and gory details here
>> <https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html>
>>  if
>> you wish.
>>
>> If you are the TLDR type the very simplified summary is that modern CPUs
>> will speculatively execute operations ahead of the one they are currently
>> running. Some architectures will allow these executions to start even when
>> they violate privilege levels, but those instructions are killed or rolled
>> back hopefully before they actually complete running.
>>
>> Another feature of modern CPUs is virtual memory which can allow memory
>> from two or more processes to occupy the same physical page. This is a good
>> thing because if you have memory from the kernel and a bit of user code in
>> the same physical page but different virtual pages, changing from kernel to
>> userspace execution doesn’t require a page fault. This saves massive
>> amounts of time and overhead giving modern CPUs a huge speed boost. (For
>> the really technical out there, I know you are cringing at this
>> simplification, sorry).
>>
>> These two things together allow you to do some interesting things and
>> along with timing attacks add new weapons to your hacking arsenal. If you
>> have code executing on one side of a virtual memory page boundary, it can
>> speculatively execute the next few instructions on the physical page that
>> cross the virtual page boundary. This isn’t a big deal unless the two
>> virtual pages are mapped to processes th

Re: [OMPI users] latest Intel CPU bug

2018-01-05 Thread John Chludzinski
I believe this snippet sums it up pretty well:

"Now you have a bit more context about why Intel’s response was, well, a
non-response. They blamed others, correctly, for having the same problem
but their blanket statement avoided the obvious issue of the others aren’t
crippled by the effects of the patches like Intel. Intel screwed up, badly,
and are facing a 30% performance hit going forward for it. AMD did right
and are probably breaking out the champagne at HQ about now."

On Fri, Jan 5, 2018 at 5:38 AM, Matthieu Brucher  wrote:

> Hi,
>
> I think, on the contrary, that he did notice the AMD/ARM issue. I suppose
> you haven't read the text (and I like the fact that there are different
> opinions on this issue).
>
> Matthieu
>
> 2018-01-05 8:23 GMT+01:00 Gilles Gouaillardet :
>
>> John,
>>
>>
>> The technical assessment so to speak is linked in the article and is
>> available at
>> https://googleprojectzero.blogspot.jp/2018/01/reading-privileged-memory-with-side.html.
>>
>> The long rant against Intel PR blinded you and you did not notice AMD and
>> ARM (and though not mentionned here, Power and Sparc too) are vulnerable to
>> some bugs.
>>
>>
>> Full disclosure, i have no affiliation with Intel, but i am getting
>> pissed with the hysteria around this issue.
>>
>> Gilles
>>
>>
>> On 1/5/2018 3:54 PM, John Chludzinski wrote:
>>
>>> That article gives the best technical assessment I've seen of Intel's
>>> architecture bug. I noted the discussion's subject and thought I'd add some
>>> clarity. Nothing more.
>>>
>>> For the TL;DR crowd: get an AMD chip in your computer.
>>>
>>> On Thursday, January 4, 2018, r...@open-mpi.org wrote:
>>>
>>> Yes, please - that was totally inappropriate for this mailing list.
>>> Ralph
>>>
>>>
>>> On Jan 4, 2018, at 4:33 PM, Jeff Hammond >>> <mailto:jeff.scie...@gmail.com>> wrote:
>>>>
>>>> Can we restrain ourselves to talk about Open-MPI or at least
>>>> technical aspects of HPC communication on this list and leave the
>>>> stock market tips for Hacker News and Twitter?
>>>>
>>>> Thanks,
>>>>
>>>> Jeff
>>>>
>>>> On Thu, Jan 4, 2018 at 3:53 PM, John Chludzinski wrote:
>>>>
>>>> From https://semiaccurate.com/2018/01/04/kaiser-security-holes-will-devastate-intels-marketshare/
>>>>
>>>>
>>>>
>>>>   Kaiser security holes will devastate Intel’s marketshare
>>>>
>>>>
>>>>   Analysis: This one tips the balance toward AMD in a big
>>>> way
>>>>
>>>>
>>>>   Jan 4, 2018 by Charlie Demerjian
>>>>   <https://semiaccurate.com/author/charlie/>
>>>>
>>>>
>>>>
>>>> This latest decade-long critical security hole in Intel CPUs
>>>> is going to cost the company significant market share.
>>>> SemiAccurate thinks it is not only consequential but will
>>>> shift the balance of power away from Intel CPUs for at least
>>>> the next several years.
>>>>
>>>> Today’s latest crop of gaping security flaws have three sets
>>>> of holes across Intel, AMD, and ARM processors along with a
>>>> slew of official statements and detailed analyses. On top of
>>>> that the statements from vendors range from detailed and
>>>> direct to intentionally misleading and slimy. Lets take a
>>>> look at what the problems are, who they effect and what the
>>>> outcome will be. Those outcomes range from trivial patching
>>>> to destroying the market share of Intel servers, and no we
>>>> are not joking.
>>>>
>>>> (*Authors Note 1:* For the technical readers we are
>>>> simplifying a lot, sorry we know this hurts. The full
>>>> disclosure docs are linked, read them for the details.)
>>>>
>>>>