> On a side note, do you have an RDMA-supporting device (InfiniBand/RoCE/iWARP)?
I'm just an engineer trying to get something to work on an AMD dual-core notebook for the powers-that-be at a small engineering concern (all MEs) in Huntsville, AL - i.e., NASA work.

---John

On Sun, Sep 16, 2012 at 3:21 AM, Jingcha Joba <pukkimon...@gmail.com> wrote:

> John,
>
> BTL refers to the Byte Transfer Layer, a framework for sending/receiving point-to-point messages over different networks. It has several components (implementations) such as openib, tcp, mx, shared memory, etc.
>
> ^openib means "do not" use the openib component for point-to-point messages.
>
> On a side note, do you have an RDMA-supporting device (InfiniBand/RoCE/iWARP)? If so, is OFED installed correctly and running? If you do not have one, is OFED running anyway (it should not be)?
>
> The messages you are getting could be because of this. As a consequence, if you do have an RDMA-supported device, you might be getting poor performance.
>
> A wealth of information is available in the FAQ section regarding these things.
>
> --
> Sent from my iPhone
>
> On Sep 15, 2012, at 9:49 PM, John Chludzinski <john.chludzin...@gmail.com> wrote:
>
> BTW, I looked up the -mca option:
>
>   -mca|--mca <arg0> <arg1>
>       Pass context-specific MCA parameters; they are considered global if --gmca is not used and only one context is specified (arg0 is the parameter name; arg1 is the parameter value)
>
> Could you explain the args btl and ^openib?
>
> ---John
>
> On Sun, Sep 16, 2012 at 12:26 AM, John Chludzinski <john.chludzin...@gmail.com> wrote:
>
>> BINGO! That did it. Thanks. ---John
>>
>> On Sat, Sep 15, 2012 at 9:32 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> No - the mca param has to be specified *before* your executable:
>>>
>>>   mpiexec -mca btl ^openib -n 4 ./a.out
>>>
>>> Also, note the space between "btl" and "^openib".
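For reference: besides the command line, the same exclusion can be set through Open MPI's other MCA-parameter mechanisms. A minimal sketch, assuming a default per-user setup (adjust paths for your install):

  $ export OMPI_MCA_btl="^openib"     # environment-variable form, read by mpiexec and the launched ranks

or, persistently, add this line to $HOME/.openmpi/mca-params.conf:

  btl = ^openib

Either form should behave the same as passing "-mca btl ^openib" before the executable on every mpiexec command line.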
>>>
>>> On Sep 15, 2012, at 5:45 PM, John Chludzinski <john.chludzin...@gmail.com> wrote:
>>>
>>> Is this what you intended(?):
>>>
>>> $ mpiexec -n 4 ./a.out -mca btl^openib
>>>
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> --------------------------------------------------------------------------
>>> [[5991,1],0]: A high-performance Open MPI point-to-point messaging module
>>> was unable to find any relevant network interfaces:
>>>
>>> Module: OpenFabrics (openib)
>>> Host: elzbieta
>>>
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>> --------------------------------------------------------------------------
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> librdmacm: couldn't read ABI version.
>>> librdmacm: assuming: 4
>>> CMA: unable to get RDMA device list
>>> rank= 1 Results: 5.0000000 6.0000000 7.0000000 8.0000000
>>> rank= 0 Results: 1.0000000 2.0000000 3.0000000 4.0000000
>>> rank= 2 Results: 9.0000000 10.000000 11.000000 12.000000
>>> rank= 3 Results: 13.000000 14.000000 15.000000 16.000000
>>> [elzbieta:02374] 3 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
>>> [elzbieta:02374] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>
>>> On Sat, Sep 15, 2012 at 8:22 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Try adding "-mca btl ^openib" to your cmd line and see if that cleans it up.
>>>>
>>>> On Sep 15, 2012, at 12:44 PM, John Chludzinski <john.chludzin...@gmail.com> wrote:
>>>>
>>>> There was a bug in the code. So now I get this, which is correct - but how do I get rid of all these ABI, CMA, etc. messages?
>>>>
>>>> $ mpiexec -n 4 ./a.out
>>>> librdmacm: couldn't read ABI version.
>>>> librdmacm: couldn't read ABI version.
>>>> librdmacm: assuming: 4
>>>> CMA: unable to get RDMA device list
>>>> librdmacm: assuming: 4
>>>> CMA: unable to get RDMA device list
>>>> CMA: unable to get RDMA device list
>>>> librdmacm: couldn't read ABI version.
>>>> librdmacm: assuming: 4
>>>> librdmacm: couldn't read ABI version.
>>>> librdmacm: assuming: 4
>>>> CMA: unable to get RDMA device list
>>>> --------------------------------------------------------------------------
>>>> [[6110,1],1]: A high-performance Open MPI point-to-point messaging module
>>>> was unable to find any relevant network interfaces:
>>>>
>>>> Module: OpenFabrics (openib)
>>>> Host: elzbieta
>>>>
>>>> Another transport will be used instead, although this may result in
>>>> lower performance.
>>>> --------------------------------------------------------------------------
>>>> rank= 1 Results: 5.0000000 6.0000000 7.0000000 8.0000000
>>>> rank= 2 Results: 9.0000000 10.000000 11.000000 12.000000
>>>> rank= 0 Results: 1.0000000 2.0000000 3.0000000 4.0000000
>>>> rank= 3 Results: 13.000000 14.000000 15.000000 16.000000
>>>> [elzbieta:02559] 3 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
>>>> [elzbieta:02559] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>
>>>> On Sat, Sep 15, 2012 at 3:34 PM, John Chludzinski <john.chludzin...@gmail.com> wrote:
>>>>
>>>>> BTW, here is the example code:
>>>>>
>>>>> program scatter
>>>>> include 'mpif.h'
>>>>>
>>>>> integer, parameter :: SIZE=4
>>>>> integer :: numtasks, rank, sendcount, recvcount, source, ierr
>>>>> real :: sendbuf(SIZE,SIZE), recvbuf(SIZE)
>>>>>
>>>>> ! Fortran stores this array in column major order, so the
>>>>> ! scatter will actually scatter columns, not rows.
>>>>> data sendbuf /1.0,  2.0,  3.0,  4.0,  &
>>>>>               5.0,  6.0,  7.0,  8.0,  &
>>>>>               9.0, 10.0, 11.0, 12.0,  &
>>>>>              13.0, 14.0, 15.0, 16.0 /
>>>>>
>>>>> call MPI_INIT(ierr)
>>>>> call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>>>>> call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
>>>>>
>>>>> if (numtasks .eq. SIZE) then
>>>>>    source = 1
>>>>>    sendcount = SIZE
>>>>>    recvcount = SIZE
>>>>>    call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf, &
>>>>>         recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
>>>>>    print *, 'rank= ', rank, ' Results: ', recvbuf
>>>>> else
>>>>>    print *, 'Must specify', SIZE, ' processors. Terminating.'
>>>>> endif
>>>>>
>>>>> call MPI_FINALIZE(ierr)
>>>>>
>>>>> end program
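If the scatter example above is saved as, say, scatter.f90 (the filename is only illustrative; the thread built "ex1.f95"), it can be rebuilt and run as before, with the openib BTL excluded to keep the RDMA warnings out of the output. With 4 ranks, each process should report one column of sendbuf, since Fortran stores the array in column-major order:

  $ mpif90 scatter.f90
  $ mpiexec -mca btl ^openib -n 4 ./a.out
  rank= 0 Results: 1.0000000 2.0000000 3.0000000 4.0000000
  ...

(the four ranks print in arbitrary order, as in the runs quoted above).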
>>>>>
>>>>> On Sat, Sep 15, 2012 at 3:02 PM, John Chludzinski <john.chludzin...@gmail.com> wrote:
>>>>>
>>>>>> # export LD_LIBRARY_PATH
>>>>>>
>>>>>> # mpiexec -n 1 printenv | grep PATH
>>>>>> LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>>>>>> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
>>>>>> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
>>>>>> WINDOWPATH=1
>>>>>>
>>>>>> # mpiexec -n 4 ./a.out
>>>>>> librdmacm: couldn't read ABI version.
>>>>>> librdmacm: assuming: 4
>>>>>> CMA: unable to get RDMA device list
>>>>>> --------------------------------------------------------------------------
>>>>>> [[3598,1],0]: A high-performance Open MPI point-to-point messaging module
>>>>>> was unable to find any relevant network interfaces:
>>>>>>
>>>>>> Module: OpenFabrics (openib)
>>>>>> Host: elzbieta
>>>>>>
>>>>>> Another transport will be used instead, although this may result in
>>>>>> lower performance.
>>>>>> --------------------------------------------------------------------------
>>>>>> librdmacm: couldn't read ABI version.
>>>>>> librdmacm: assuming: 4
>>>>>> librdmacm: couldn't read ABI version.
>>>>>> CMA: unable to get RDMA device list
>>>>>> librdmacm: assuming: 4
>>>>>> CMA: unable to get RDMA device list
>>>>>> librdmacm: couldn't read ABI version.
>>>>>> librdmacm: assuming: 4
>>>>>> CMA: unable to get RDMA device list
>>>>>> [elzbieta:4145] *** An error occurred in MPI_Scatter
>>>>>> [elzbieta:4145] *** on communicator MPI_COMM_WORLD
>>>>>> [elzbieta:4145] *** MPI_ERR_TYPE: invalid datatype
>>>>>> [elzbieta:4145] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>>>>>> --------------------------------------------------------------------------
>>>>>> mpiexec has exited due to process rank 1 with PID 4145 on
>>>>>> node elzbieta exiting improperly. There are two reasons this could occur:
>>>>>>
>>>>>> 1. this process did not call "init" before exiting, but others in
>>>>>> the job did. This can cause a job to hang indefinitely while it waits
>>>>>> for all processes to call "init". By rule, if one process calls "init",
>>>>>> then ALL processes must call "init" prior to termination.
>>>>>>
>>>>>> 2. this process called "init", but exited without calling "finalize".
>>>>>> By rule, all processes that call "init" MUST call "finalize" prior to
>>>>>> exiting or it will be considered an "abnormal termination"
>>>>>>
>>>>>> This may have caused other processes in the application to be
>>>>>> terminated by signals sent by mpiexec (as reported here).
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> On Sat, Sep 15, 2012 at 2:24 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>
>>>>>>> Ah - note that there is no LD_LIBRARY_PATH in the environment.
>>>>>>> That's the problem.
>>>>>>>
>>>>>>> On Sep 15, 2012, at 11:19 AM, John Chludzinski <john.chludzin...@gmail.com> wrote:
>>>>>>>
>>>>>>> $ which mpiexec
>>>>>>> /usr/lib/openmpi/bin/mpiexec
>>>>>>>
>>>>>>> # mpiexec -n 1 printenv | grep PATH
>>>>>>> PATH=/usr/lib/openmpi/bin/:/usr/lib/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jski/.local/bin:/home/jski/bin
>>>>>>> MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
>>>>>>> WINDOWPATH=1
>>>>>>>
>>>>>>> On Sat, Sep 15, 2012 at 1:11 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>
>>>>>>>> Couple of things worth checking:
>>>>>>>>
>>>>>>>> 1. verify that you executed the "mpiexec" you think you did - a simple "which mpiexec" should suffice
>>>>>>>>
>>>>>>>> 2. verify that your environment is correct by "mpiexec -n 1 printenv | grep PATH". Sometimes the ld_library_path doesn't carry over like you think it should.
>>>>>>>>
>>>>>>>> On Sep 15, 2012, at 10:00 AM, John Chludzinski <john.chludzin...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> I installed OpenMPI (I have a simple dual-core AMD notebook with Fedora 16) via:
>>>>>>>>
>>>>>>>> # yum install openmpi
>>>>>>>> # yum install openmpi-devel
>>>>>>>> # mpirun --version
>>>>>>>> mpirun (Open MPI) 1.5.4
>>>>>>>>
>>>>>>>> I added:
>>>>>>>>
>>>>>>>> $ PATH=PATH=/usr/lib/openmpi/bin/:$PATH
>>>>>>>> $ LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
>>>>>>>>
>>>>>>>> Then:
>>>>>>>>
>>>>>>>> $ mpif90 ex1.f95
>>>>>>>> $ mpiexec -n 4 ./a.out
>>>>>>>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open shared object file: No such file or directory
>>>>>>>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open shared object file: No such file or directory
>>>>>>>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open shared object file: No such file or directory
>>>>>>>> ./a.out: error while loading shared libraries: libmpi_f90.so.1: cannot open shared object file: No such file or directory
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> mpiexec noticed that the job aborted, but has no info as to the process
>>>>>>>> that caused that situation.
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> ls -l /usr/lib/openmpi/lib/
>>>>>>>> total 6788
>>>>>>>> lrwxrwxrwx. 1 root root      25 Sep 15 12:25 libmca_common_sm.so -> libmca_common_sm.so.2.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      25 Sep 14 16:14 libmca_common_sm.so.2 -> libmca_common_sm.so.2.0.0
>>>>>>>> -rwxr-xr-x. 1 root root    8492 Jan 20  2012 libmca_common_sm.so.2.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      19 Sep 15 12:25 libmpi_cxx.so -> libmpi_cxx.so.1.0.1
>>>>>>>> lrwxrwxrwx. 1 root root      19 Sep 14 16:14 libmpi_cxx.so.1 -> libmpi_cxx.so.1.0.1
>>>>>>>> -rwxr-xr-x. 1 root root   87604 Jan 20  2012 libmpi_cxx.so.1.0.1
>>>>>>>> lrwxrwxrwx. 1 root root      19 Sep 15 12:25 libmpi_f77.so -> libmpi_f77.so.1.0.2
>>>>>>>> lrwxrwxrwx. 1 root root      19 Sep 14 16:14 libmpi_f77.so.1 -> libmpi_f77.so.1.0.2
>>>>>>>> -rwxr-xr-x. 1 root root  179912 Jan 20  2012 libmpi_f77.so.1.0.2
>>>>>>>> lrwxrwxrwx. 1 root root      19 Sep 15 12:25 libmpi_f90.so -> libmpi_f90.so.1.1.0
>>>>>>>> lrwxrwxrwx. 1 root root      19 Sep 14 16:14 libmpi_f90.so.1 -> libmpi_f90.so.1.1.0
>>>>>>>> -rwxr-xr-x. 1 root root   10364 Jan 20  2012 libmpi_f90.so.1.1.0
>>>>>>>> lrwxrwxrwx. 1 root root      15 Sep 15 12:25 libmpi.so -> libmpi.so.1.0.2
>>>>>>>> lrwxrwxrwx. 1 root root      15 Sep 14 16:14 libmpi.so.1 -> libmpi.so.1.0.2
>>>>>>>> -rwxr-xr-x. 1 root root 1383444 Jan 20  2012 libmpi.so.1.0.2
>>>>>>>> lrwxrwxrwx. 1 root root      21 Sep 15 12:25 libompitrace.so -> libompitrace.so.0.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      21 Sep 14 16:14 libompitrace.so.0 -> libompitrace.so.0.0.0
>>>>>>>> -rwxr-xr-x. 1 root root   13572 Jan 20  2012 libompitrace.so.0.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      20 Sep 15 12:25 libopen-pal.so -> libopen-pal.so.3.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      20 Sep 14 16:14 libopen-pal.so.3 -> libopen-pal.so.3.0.0
>>>>>>>> -rwxr-xr-x. 1 root root  386324 Jan 20  2012 libopen-pal.so.3.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      20 Sep 15 12:25 libopen-rte.so -> libopen-rte.so.3.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      20 Sep 14 16:14 libopen-rte.so.3 -> libopen-rte.so.3.0.0
>>>>>>>> -rwxr-xr-x. 1 root root  790052 Jan 20  2012 libopen-rte.so.3.0.0
>>>>>>>> -rw-r--r--. 1 root root  301520 Jan 20  2012 libotf.a
>>>>>>>> lrwxrwxrwx. 1 root root      15 Sep 15 12:25 libotf.so -> libotf.so.0.0.1
>>>>>>>> lrwxrwxrwx. 1 root root      15 Sep 14 16:14 libotf.so.0 -> libotf.so.0.0.1
>>>>>>>> -rwxr-xr-x. 1 root root  206384 Jan 20  2012 libotf.so.0.0.1
>>>>>>>> -rw-r--r--. 1 root root  337970 Jan 20  2012 libvt.a
>>>>>>>> -rw-r--r--. 1 root root  591070 Jan 20  2012 libvt-hyb.a
>>>>>>>> lrwxrwxrwx. 1 root root      18 Sep 15 12:25 libvt-hyb.so -> libvt-hyb.so.0.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      18 Sep 14 16:14 libvt-hyb.so.0 -> libvt-hyb.so.0.0.0
>>>>>>>> -rwxr-xr-x. 1 root root  428844 Jan 20  2012 libvt-hyb.so.0.0.0
>>>>>>>> -rw-r--r--. 1 root root  541004 Jan 20  2012 libvt-mpi.a
>>>>>>>> lrwxrwxrwx. 1 root root      18 Sep 15 12:25 libvt-mpi.so -> libvt-mpi.so.0.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      18 Sep 14 16:14 libvt-mpi.so.0 -> libvt-mpi.so.0.0.0
>>>>>>>> -rwxr-xr-x. 1 root root  396352 Jan 20  2012 libvt-mpi.so.0.0.0
>>>>>>>> -rw-r--r--. 1 root root  372352 Jan 20  2012 libvt-mt.a
>>>>>>>> lrwxrwxrwx. 1 root root      17 Sep 15 12:25 libvt-mt.so -> libvt-mt.so.0.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      17 Sep 14 16:14 libvt-mt.so.0 -> libvt-mt.so.0.0.0
>>>>>>>> -rwxr-xr-x. 1 root root  266104 Jan 20  2012 libvt-mt.so.0.0.0
>>>>>>>> -rw-r--r--. 1 root root   60390 Jan 20  2012 libvt-pomp.a
>>>>>>>> lrwxrwxrwx. 1 root root      14 Sep 15 12:25 libvt.so -> libvt.so.0.0.0
>>>>>>>> lrwxrwxrwx. 1 root root      14 Sep 14 16:14 libvt.so.0 -> libvt.so.0.0.0
>>>>>>>> -rwxr-xr-x. 1 root root  242604 Jan 20  2012 libvt.so.0.0.0
>>>>>>>> -rwxr-xr-x. 1 root root  303591 Jan 20  2012 mpi.mod
>>>>>>>> drwxr-xr-x. 2 root root    4096 Sep 14 16:14 openmpi
>>>>>>>>
>>>>>>>> The file (actually, a link) it claims it can't find, libmpi_f90.so.1, is clearly there. And LD_LIBRARY_PATH=/usr/lib/openmpi/lib/.
>>>>>>>>
>>>>>>>> What's the problem?
>>>>>>>>
>>>>>>>> ---John
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
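As Ralph's replies above indicate, the root cause in this thread was that LD_LIBRARY_PATH had been set in the shell but never exported, so the launched ranks could not load libmpi_f90.so.1 (note also the stray "PATH=" in the PATH line as quoted). A minimal corrected setup, sketched under the Fedora / Open MPI 1.5.4 package paths shown above, would be:

  $ export PATH=/usr/lib/openmpi/bin/:$PATH
  $ export LD_LIBRARY_PATH=/usr/lib/openmpi/lib/
  $ mpiexec -n 1 printenv | grep LD_LIBRARY_PATH    # confirm the launched environment now carries it

With that in place the job loads its libraries, and adding "-mca btl ^openib" suppresses the remaining librdmacm/CMA noise, which is what resolved the thread ("BINGO! That did it").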