> On a side note, do you have an RDMA-capable device (InfiniBand/RoCE/iWarp)?

I'm just an engineer trying to get something to work on an AMD dual core
notebook for the powers-that-be at a small engineering concern (all MEs) in
Huntsville, AL - i.e., NASA work.

---John

On Sun, Sep 16, 2012 at 3:21 AM, Jingcha Joba <pukkimon...@gmail.com> wrote:

> John,
>
> BTL refers to the Byte Transfer Layer, the framework Open MPI uses to
> send/receive point-to-point messages over different networks. It has
> several components (implementations) such as openib, tcp, mx, sm (shared
> memory), etc.
>
> ^openib means do *not* use the openib component for point-to-point messages.
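>
> For example (a sketch, assuming a stock install), you can disable openib
> for a single run on the command line, or make the exclusion persistent via
> the per-user MCA parameter file that Open MPI reads at startup:
>
>     # one run only
>     mpiexec -mca btl ^openib -n 4 ./a.out
>
>     # persistent for this user
>     echo "btl = ^openib" >> ~/.openmpi/mca-params.conf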
>
> On a side note, do you have an RDMA-capable device (InfiniBand/RoCE/iWarp)?
> If so, is OFED installed correctly and running? If you don't have one, is
> OFED running anyway (it shouldn't be)?
>
> The message you are getting could be because of this. As a consequence, if
> you do have an RDMA-capable device, you might be getting poor performance.
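>
> A rough way to check (a sketch, assuming the usual diagnostic tools such as
> libibverbs-utils are installed) is:
>
>     ibv_devices                                # lists RDMA-capable devices, if any
>     lspci | grep -i -E 'infiniband|mellanox'   # look for RDMA hardware on the bus
>     lsmod | grep -E 'ib_core|rdma'             # are the OFED kernel modules loaded?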
>
> A wealth of information is available in the FAQ section regarding these
> things.
>
> --
> Sent from my iPhone
>
> On Sep 15, 2012, at 9:49 PM, John Chludzinski <john.chludzin...@gmail.com>
> wrote:
>
> BTW, I looked up the -mca option:
>
>  -mca |--mca <arg0> <arg1>
>               Pass context-specific MCA parameters; they are
>               considered global if --gmca is not used and only
>               one context is specified (arg0 is the parameter
>               name; arg1 is the parameter value)
>
> Could you explain the args btl and ^openib?
>
> ---John
>
>
> On Sun, Sep 16, 2012 at 12:26 AM, John Chludzinski <
> john.chludzin...@gmail.com> wrote:
>
>> BINGO!  That did it.  Thanks.  ---John
>>
>>
>> On Sat, Sep 15, 2012 at 9:32 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> No - the mca param has to be specified *before* your executable
>>>
>>> mpiexec -mca btl ^openib -n 4 ./a.out
>>>
>>> Also, note the space between "btl" and "^openib"
>>>
>>>
>>> On Sep 15, 2012, at 5:45 PM, John Chludzinski <
>>> john.chludzin...@gmail.com> wrote:
>>>
>>> Is this what you intended(?):
>>>
>>> $ mpiexec -n 4 ./a.out -mca btl^openib
>>>
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>>
>>> --------------------------------------------------------------------------
>>> [[5991,1],0]: A high-performance Open MPI point-to-point messaging module
>>> was unable to find any relevant network interfaces:
>>>
>>> Module: OpenFabrics (openib)
>>>   Host: elzbieta
>>>
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>>
>>> --------------------------------------------------------------------------
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>>  rank=            1  Results:    5.0000000       6.0000000
>>> 7.0000000       8.0000000
>>>  rank=            0  Results:    1.0000000       2.0000000
>>> 3.0000000       4.0000000
>>>  rank=            2  Results:    9.0000000       10.000000
>>> 11.000000       12.000000
>>>  rank=            3  Results:    13.000000       14.000000
>>> 15.000000       16.000000
>>> [elzbieta:02374] 3 more processes have sent help message
>>> help-mpi-btl-base.txt / btl:no-nics
>>> [elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>> see all help / error messages
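>>>
>>> (If you want every copy of that warning instead of the aggregated one, the
>>> parameter it mentions can be passed the same way, e.g.
>>> "mpiexec -mca orte_base_help_aggregate 0 -n 4 ./a.out".)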
>>>
>>>
>>> On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Try adding "-mca btl ^openib" to your cmd line and see if that cleans
>>>> it up.
>>>>
>>>>
>>>> On Sep 15, 2012, at 12:44 PM, John Chludzinski <
>>>> john.chludzin...@gmail.com> wrote:
>>>>
>>>> There was a bug in the code.  So now I get this, which is correct but
>>>> how do I get rid of all these ABI, CMA, etc. messages?
>>>>
>>>> $ mpiexec -n 4 ./a.out
>>>> librdmacm: couldn't read ABI version.
>>>> librdmacm: couldn't read ABI version.
>>>> librdmacm: assuming: 4
>>>> CMA: unable to get RDMA device list
>>>> librdmacm: assuming: 4
>>>> CMA: unable to get RDMA device list
>>>> CMA: unable to get RDMA device list
>>>> librdmacm: couldn't read ABI version.
>>>> librdmacm: assuming: 4
>>>> librdmacm: couldn't read ABI version.
>>>> librdmacm: assuming: 4
>>>> CMA: unable to get RDMA device list
>>>>
>>>> --------------------------------------------------------------------------
>>>> [[6110,1],1]: A high-performance Open MPI point-to-point messaging
>>>> module
>>>> was unable to find any relevant network interfaces:
>>>>
>>>> Module: OpenFabrics (openib)
>>>>   Host: elzbieta
>>>>
>>>> Another transport will be used instead, although this may result in
>>>> lower performance.
>>>>
>>>> --------------------------------------------------------------------------
>>>>  rank=            1  Results:    5.0000000       6.0000000
>>>> 7.0000000       8.0000000
>>>>  rank=            2  Results:    9.0000000       10.000000
>>>> 11.000000       12.000000
>>>>  rank=            0  Results:    1.0000000       2.0000000
>>>> 3.0000000       4.0000000
>>>>  rank=            3  Results:    13.000000       14.000000
>>>> 15.000000       16.000000
>>>> [elzbieta:02559] 3 more processes have sent help message
>>>> help-mpi-btl-base.txt / btl:no-nics
>>>> [elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>>> see all help / error messages
>>>>
>>>>
>>>> On Sat, Sep 15, 2012 at 3:34 PM, John Chludzinski <
>>>> john.chludzin...@gmail.com> wrote:
>>>>
>>>>> BTW, here the example code:
>>>>>
>>>>> program scatter
>>>>> include 'mpif.h'
>>>>>
>>>>> integer, parameter :: SIZE=4
>>>>> integer :: numtasks, rank, sendcount, recvcount, source, ierr
>>>>> real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)
>>>>>
>>>>> !  Fortran stores this array in column major order, so the
>>>>> !  scatter will actually scatter columns, not rows.
>>>>> data sendbuf /1.0, 2.0, 3.0, 4.0, &
>>>>> 5.0, 6.0, 7.0, 8.0, &
>>>>> 9.0, 10.0, 11.0, 12.0, &
>>>>> 13.0, 14.0, 15.0, 16.0 /
>>>>>
>>>>> call MPI_INIT(ierr)
>>>>> call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>>>>> call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
>>>>>
>>>>> if (numtasks .eq. SIZE) then
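>>>>>   ! rank 1 acts as the root (source) of the scatter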
>>>>>   source = 1
>>>>>   sendcount = SIZE
>>>>>   recvcount = SIZE
>>>>>   call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
>>>>>                    recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
>>>>>   print *, 'rank= ',rank,' Results: ',recvbuf
>>>>> else
>>>>>    print *, 'Must specify',SIZE,' processors.  Terminating.'
>>>>> endif
>>>>>
>>>>> call MPI_FINALIZE(ierr)
>>>>>
>>>>> end program
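>>>>>
>>>>> To build and run it (a sketch; the file name scatter.f90 is just an
>>>>> example, and SIZE=4 means it must be launched with exactly 4 ranks):
>>>>>
>>>>>     mpif90 scatter.f90 -o scatter
>>>>>     mpiexec -n 4 ./scatter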
>>>>>
>>>>>
>>>>> On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski <
>>>>> john.chludzin...@gmail.com> wrote:
>>>>>
>>>>>> # export LD_LIBRARY_PATH
>>>>>>
>>>>>>
>>>>>> # mpiexec -n 1 printenv | grep PATH
>>>>>> LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>>>>>>
>>>>>>
>>>>>> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
>>>>>> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
>>>>>> WINDOWPATH=1
>>>>>>
>>>>>> # mpiexec -n 4 ./a.out
>>>>>> librdmacm: couldn't read ABI version.
>>>>>> librdmacm: assuming: 4
>>>>>> CMA: unable to get RDMA device list
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> [[3598,1],0]: A high-performance Open MPI point-to-point messaging
>>>>>> module
>>>>>> was unable to find any relevant network interfaces:
>>>>>>
>>>>>> Module: OpenFabrics (openib)
>>>>>>   Host: elzbieta
>>>>>>
>>>>>> Another transport will be used instead, although this may result in
>>>>>> lower performance.
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> librdmacm: couldn't read ABI version.
>>>>>> librdmacm: assuming: 4
>>>>>> librdmacm: couldn't read ABI version.
>>>>>> CMA: unable to get RDMA device list
>>>>>> librdmacm: assuming: 4
>>>>>> CMA: unable to get RDMA device list
>>>>>> librdmacm: couldn't read ABI version.
>>>>>> librdmacm: assuming: 4
>>>>>> CMA: unable to get RDMA device list
>>>>>> [elzbieta:4145] *** An error occurred in MPI_Scatter
>>>>>> [elzbieta:4145] *** on communicator MPI_COMM_WORLD
>>>>>> [elzbieta:4145] *** MPI_ERR_TYPE: invalid datatype
>>>>>> [elzbieta:4145] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> mpiexec has exited due to process rank 1 with PID 4145 on
>>>>>> node elzbieta exiting improperly. There are two reasons this could
>>>>>> occur:
>>>>>>
>>>>>> 1. this process did not call "init" before exiting, but others in
>>>>>> the job did. This can cause a job to hang indefinitely while it waits
>>>>>> for all processes to call "init". By rule, if one process calls
>>>>>> "init",
>>>>>> then ALL processes must call "init" prior to termination.
>>>>>>
>>>>>> 2. this process called "init", but exited without calling "finalize".
>>>>>> By rule, all processes that call "init" MUST call "finalize" prior to
>>>>>> exiting or it will be considered an "abnormal termination"
>>>>>>
>>>>>> This may have caused other processes in the application to be
>>>>>> terminated by signals sent by mpiexec (as reported here).
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, Sep 15, 2012 at 2:24 PM, Ralph Castain <r...@open-mpi.org>wrote:
>>>>>>
>>>>>>> Ah - note that there is no LD_LIBRARY_PATH in the environment.
>>>>>>> That's the problem
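>>>>>>>
>>>>>>> Something along these lines (a sketch) should get it into the environment
>>>>>>> that mpiexec hands to the launched processes:
>>>>>>>
>>>>>>>     export LD_LIBRARY_PATH=/usr/lib/openmpi/lib/:$LD_LIBRARY_PATH
>>>>>>>     mpiexec -n 1 printenv | grep LD_LIBRARY_PATH   # should now show it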
>>>>>>>
>>>>>>> On Sep 15, 2012, at 11:19 AM, John Chludzinski <
>>>>>>> john.chludzin...@gmail.com> wrote:
>>>>>>>
>>>>>>> $ which mpiexec
>>>>>>> /usr/lib/openmpi/bin/mpiexec
>>>>>>>
>>>>>>> # mpiexec -n 1 printenv | grep PATH
>>>>>>>
>>>>>>> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
>>>>>>> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
>>>>>>> WINDOWPATH=1
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Sep 15, 2012 at 1:11 PM, Ralph Castain <r...@open-mpi.org>wrote:
>>>>>>>
>>>>>>>> Couple of things worth checking:
>>>>>>>>
>>>>>>>> 1. verify that you executed the "mpiexec" you think you did - a
>>>>>>>> simple "which mpiexec" should suffice
>>>>>>>>
>>>>>>>> 2. verify that your environment is correct by running "mpiexec -n 1
>>>>>>>> printenv | grep PATH". Sometimes LD_LIBRARY_PATH doesn't carry over
>>>>>>>> like you think it should.
>>>>>>>>
>>>>>>>>
>>>>>>>>  On Sep 15, 2012, at 10:00 AM, John Chludzinski <
>>>>>>>> john.chludzin...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> I installed Open MPI (I have a simple dual-core AMD notebook running
>>>>>>>> Fedora 16) via:
>>>>>>>>
>>>>>>>> # yum install openmpi
>>>>>>>> # yum install openmpi-devel
>>>>>>>> # mpirun --version
>>>>>>>> mpirun (Open MPI) 1.5.4
>>>>>>>>
>>>>>>>> I added:
>>>>>>>>
>>>>>>>> $ PATH=/usr/lib/openmpi/bin/:$PATH
>>>>>>>> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>>>>>>>>
>>>>>>>> Then:
>>>>>>>>
>>>>>>>> $ mpif90 ex1.f95
>>>>>>>> $ mpiexec -n 4 ./a.out
>>>>>>>> ./a.out: error while loading shared libraries: libmpi_f90.so.1:
>>>>>>>> cannot open shared object file: No such file or directory
>>>>>>>> ./a.out: error while loading shared libraries: libmpi_f90.so.1:
>>>>>>>> cannot open shared object file: No such file or directory
>>>>>>>> ./a.out: error while loading shared libraries: libmpi_f90.so.1:
>>>>>>>> cannot open shared object file: No such file or directory
>>>>>>>> ./a.out: error while loading shared libraries: libmpi_f90.so.1:
>>>>>>>> cannot open shared object file: No such file or directory
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> mpiexec noticed that the job aborted, but has no info as to the
>>>>>>>> process
>>>>>>>> that caused that situation.
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> ls -l /usr/lib/openmpi/lib/
>>>>>>>> total 6788
>>>>>>>> lrwxrwxrwx. 1 root root      25 Sep 15 12:25 libmca_common_sm.so ->
>>>>>>>> libmca_common_sm.so.2.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      25 Sep 14 16:14 libmca_common_sm.so.2
>>>>>>>> -> libmca_common_sm.so.2.0.0
>>>>>>>> -rwxr-xr-x. 1 root root    8492 Jan 20  2012
>>>>>>>> libmca_common_sm.so.2.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      19 Sep 15 12:25 libmpi_cxx.so ->
>>>>>>>> libmpi_cxx.so.1.0.1
>>>>>>>> lrwxrwxrwx. 1 root root      19 Sep 14 16:14 libmpi_cxx.so.1 ->
>>>>>>>> libmpi_cxx.so.1.0.1
>>>>>>>> -rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1
>>>>>>>> lrwxrwxrwx. 1 root root      19 Sep 15 12:25 libmpi_f77.so ->
>>>>>>>> libmpi_f77.so.1.0.2
>>>>>>>> lrwxrwxrwx. 1 root root      19 Sep 14 16:14 libmpi_f77.so.1 ->
>>>>>>>> libmpi_f77.so.1.0.2
>>>>>>>> -rwxr-xr-x. 1 root root  179912 Jan 20  2012 libmpi_f77.so.1.0.2
>>>>>>>> lrwxrwxrwx. 1 root root      19 Sep 15 12:25 libmpi_f90.so ->
>>>>>>>> libmpi_f90.so.1.1.0
>>>>>>>> lrwxrwxrwx. 1 root root      19 Sep 14 16:14 libmpi_f90.so.1 ->
>>>>>>>> libmpi_f90.so.1.1.0
>>>>>>>> -rwxr-xr-x. 1 root root   10364 Jan 20  2012 libmpi_f90.so.1.1.0
>>>>>>>> lrwxrwxrwx. 1 root root      15 Sep 15 12:25 libmpi.so ->
>>>>>>>> libmpi.so.1.0.2
>>>>>>>> lrwxrwxrwx. 1 root root      15 Sep 14 16:14 libmpi.so.1 ->
>>>>>>>> libmpi.so.1.0.2
>>>>>>>> -rwxr-xr-x. 1 root root 1383444 Jan 20  2012 libmpi.so.1.0.2
>>>>>>>> lrwxrwxrwx. 1 root root      21 Sep 15 12:25 libompitrace.so ->
>>>>>>>> libompitrace.so.0.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      21 Sep 14 16:14 libompitrace.so.0 ->
>>>>>>>> libompitrace.so.0.0.0
>>>>>>>> -rwxr-xr-x. 1 root root   13572 Jan 20  2012 libompitrace.so.0.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      20 Sep 15 12:25 libopen-pal.so ->
>>>>>>>> libopen-pal.so.3.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      20 Sep 14 16:14 libopen-pal.so.3 ->
>>>>>>>> libopen-pal.so.3.0.0
>>>>>>>> -rwxr-xr-x. 1 root root  386324 Jan 20  2012 libopen-pal.so.3.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      20 Sep 15 12:25 libopen-rte.so ->
>>>>>>>> libopen-rte.so.3.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      20 Sep 14 16:14 libopen-rte.so.3 ->
>>>>>>>> libopen-rte.so.3.0.0
>>>>>>>> -rwxr-xr-x. 1 root root  790052 Jan 20  2012 libopen-rte.so.3.0.0
>>>>>>>> -rw-r--r--. 1 root root  301520 Jan 20  2012 libotf.a
>>>>>>>> lrwxrwxrwx. 1 root root      15 Sep 15 12:25 libotf.so ->
>>>>>>>> libotf.so.0.0.1
>>>>>>>> lrwxrwxrwx. 1 root root      15 Sep 14 16:14 libotf.so.0 ->
>>>>>>>> libotf.so.0.0.1
>>>>>>>> -rwxr-xr-x. 1 root root  206384 Jan 20  2012 libotf.so.0.0.1
>>>>>>>> -rw-r--r--. 1 root root  337970 Jan 20  2012 libvt.a
>>>>>>>> -rw-r--r--. 1 root root  591070 Jan 20  2012 libvt-hyb.a
>>>>>>>> lrwxrwxrwx. 1 root root      18 Sep 15 12:25 libvt-hyb.so ->
>>>>>>>> libvt-hyb.so.0.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      18 Sep 14 16:14 libvt-hyb.so.0 ->
>>>>>>>> libvt-hyb.so.0.0.0
>>>>>>>> -rwxr-xr-x. 1 root root  428844 Jan 20  2012 libvt-hyb.so.0.0.0
>>>>>>>> -rw-r--r--. 1 root root  541004 Jan 20  2012 libvt-mpi.a
>>>>>>>> lrwxrwxrwx. 1 root root      18 Sep 15 12:25 libvt-mpi.so ->
>>>>>>>> libvt-mpi.so.0.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      18 Sep 14 16:14 libvt-mpi.so.0 ->
>>>>>>>> libvt-mpi.so.0.0.0
>>>>>>>> -rwxr-xr-x. 1 root root  396352 Jan 20  2012 libvt-mpi.so.0.0.0
>>>>>>>> -rw-r--r--. 1 root root  372352 Jan 20  2012 libvt-mt.a
>>>>>>>> lrwxrwxrwx. 1 root root      17 Sep 15 12:25 libvt-mt.so ->
>>>>>>>> libvt-mt.so.0.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      17 Sep 14 16:14 libvt-mt.so.0 ->
>>>>>>>> libvt-mt.so.0.0.0
>>>>>>>> -rwxr-xr-x. 1 root root  266104 Jan 20  2012 libvt-mt.so.0.0.0
>>>>>>>> -rw-r--r--. 1 root root   60390 Jan 20  2012 libvt-pomp.a
>>>>>>>> lrwxrwxrwx. 1 root root      14 Sep 15 12:25 libvt.so ->
>>>>>>>> libvt.so.0.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      14 Sep 14 16:14 libvt.so.0 ->
>>>>>>>> libvt.so.0.0.0
>>>>>>>> -rwxr-xr-x. 1 root root  242604 Jan 20  2012 libvt.so.0.0.0
>>>>>>>> -rwxr-xr-x. 1 root root  303591 Jan 20  2012 mpi.mod
>>>>>>>> drwxr-xr-x. 2 root root    4096 Sep 14 16:14 openmpi
>>>>>>>>
>>>>>>>>
>>>>>>>> The file (actually, a link) it claims it can't find, libmpi_f90.so.1,
>>>>>>>> is clearly there. And LD_LIBRARY_PATH=/usr/lib/openmpi/lib/.
>>>>>>>>
>>>>>>>> What's the problem?
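>>>>>>>>
>>>>>>>> For what it's worth, a quick way to see what the dynamic loader actually
>>>>>>>> resolves for the binary (assuming it was linked against these libraries)
>>>>>>>> is:
>>>>>>>>
>>>>>>>>     ldd ./a.out | grep libmpi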
>>>>>>>>
>>>>>>>> ---John