Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work
It has been explained in a different thread on [ofa-general] that the problem lies in a combination of the OpenIB-cma provider not setting the local and remote port numbers on endpoints correctly and Open MPI stepping over the IA to save the port number as a workaround, thereby confusing the provider.

I commented out line 197 in ompi/mca/btl/udapl/btl_udapl.c (Open MPI 1.2.1 release), and this fixes the problem. Since the bug in the provider is currently being fixed, saving the port number in the uDAPL BTL code will become unnecessary in the future.

Steve Wise wrote:
>>> Can the uDAPL OFED wizards shed any light on the error messages that
>>> are listed below? In particular, these seem to be worrisome:
>>>
>>> setup_listener Permission denied
>>> setup_listener Address already in use
>>
>> These failures are from rdma_cm_bind, indicating the port is already
>> bound to this IA address. How are you creating the service point:
>> dat_psp_create or dat_psp_create_any? If it is psp_create_any, then you
>> will see some failures until it gets to a free port. That is normal.
>> Just make sure your create call returns DAT_SUCCESS.
>
> Arlin, why doesn't dapl_psp_create_any() just pass a port of zero down
> and let the rdma-cma pick an available port number?
>
> ___
> general mailing list
> gene...@lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

--
Boris Bierbaum
Lehrstuhl fuer Betriebssysteme
RWTH Aachen, D-52056 Aachen
Tel: +49-241-80-27805
Fax: +49-241-80-22339
Re: [OMPI users] Newbie question. Please help.
Hi,

We use VASP 4.6 in parallel with openmpi 1.1.2 without any problems on x86_64 with opensuse, compiled with gcc and Intel fortran, and use torque PBS.

I used a standard configure to build openmpi, something like:

./configure --prefix=/usr/local --enable-static --with-threads --with-tm=/usr/local --with-libnuma

I used the ACML math lapack libs and built Blacs and Scalapack with them too.

I attached my vasp makefile. I might have added

mpi.o : mpi.F
	$(CPP)
	$(FC) -FR -lowercase -O0 -c $*$(SUFFIX)

to the end of the makefile. It doesn't look like it is in the example makefiles they give, but I compiled this a while ago.

Hope this helps.

Cheers,
Kevin

On Tue, 2007-05-08 at 19:18 -0700, Steven Truong wrote:
> Hi, all. I am new to OpenMPI, and after initial setup I tried to run
> my app but got the following errors:
>
> [node07.my.com:16673] *** An error occurred in MPI_Comm_rank
> [node07.my.com:16673] *** on communicator MPI_COMM_WORLD
> [node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
> [node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node07.my.com:16674] *** An error occurred in MPI_Comm_rank
> [node07.my.com:16674] *** on communicator MPI_COMM_WORLD
> [node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
> [node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node07.my.com:16675] *** An error occurred in MPI_Comm_rank
> [node07.my.com:16675] *** on communicator MPI_COMM_WORLD
> [node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
> [node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [node07.my.com:16676] *** An error occurred in MPI_Comm_rank
> [node07.my.com:16676] *** on communicator MPI_COMM_WORLD
> [node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
> [node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
> mpiexec noticed that job rank 2 with PID 16675 on node node07 exited
> on signal 60 (Real-time signal 26).
>
> /usr/local/openmpi-1.2.1/bin/ompi_info
>                 Open MPI: 1.2.1
>    Open MPI SVN revision: r14481
>                 Open RTE: 1.2.1
>    Open RTE SVN revision: r14481
>                     OPAL: 1.2.1
>        OPAL SVN revision: r14481
>                   Prefix: /usr/local/openmpi-1.2.1
>  Configured architecture: x86_64-unknown-linux-gnu
>            Configured by: root
>            Configured on: Mon May 7 18:32:56 PDT 2007
>           Configure host: neptune.nanostellar.com
>                 Built by: root
>                 Built on: Mon May 7 18:40:28 PDT 2007
>               Built host: neptune.my.com
>               C bindings: yes
>             C++ bindings: yes
>       Fortran77 bindings: yes (all)
>       Fortran90 bindings: yes
>  Fortran90 bindings size: small
>               C compiler: gcc
>      C compiler absolute: /usr/bin/gcc
>             C++ compiler: g++
>    C++ compiler absolute: /usr/bin/g++
>       Fortran77 compiler: /opt/intel/fce/9.1.043/bin/ifort
>   Fortran77 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
>       Fortran90 compiler: /opt/intel/fce/9.1.043/bin/ifort
>   Fortran90 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
>              C profiling: yes
>            C++ profiling: yes
>      Fortran77 profiling: yes
>      Fortran90 profiling: yes
>           C++ exceptions: no
>           Thread support: posix (mpi: no, progress: no)
>   Internal debug support: no
>      MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>          libltdl support: yes
>    Heterogeneous support: yes
>  mpirun default --prefix: yes
>            MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.1)
>               MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.1)
>            MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.1)
>            MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
>            MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.1)
>                MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.1)
>          MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.1)
>          MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.1)
>            MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>            MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>                 MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
>                 MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
>                 MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
>                 MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
>                   MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
>                MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.1)
>                MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
>                  MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
>                  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
>                  MCA bml: r2 (MCA v1.
Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work
You say that fixes the problem, but does it work even when running more than one MPI process per node? (That is the case the hack fixes.) Simply doing an mpirun with an -np parameter higher than the number of nodes you have set up should trigger this case; make sure to use '-mca btl udapl,self' (i.e., not sm or anything else).

Andrew

Boris Bierbaum wrote:
> I commented out line 197 in ompi/mca/btl/udapl/btl_udapl.c (Open MPI
> 1.2.1 release) and this fixes the problem.
> [...]
Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work
I've run the whole IMB Benchmark Suite on 2, 3, and 4 nodes with 2 processes per node and --mca btl udapl,self. I didn't encounter any problems.

The comment above line 197 says that dat_ep_query() returns wrong port numbers (which it does indeed), but I can't find any call to dat_ep_query() in the uDAPL BTL code. Maybe the comment is out of date?

Boris

Andrew Friedley wrote:
> You say that fixes the problem, does it work even when running more than
> one MPI process per node? (that is the case the hack fixes)
> [...]
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work
OK, strange but good. Yeah, I wouldn't be surprised if something has been changed, though I wouldn't know what, and I don't have time right now to go digging :( Maybe Don Kerr knows something?

Andrew

Boris Bierbaum wrote:
> The comment above line 197 says that dat_ep_query() returns wrong port
> numbers (which it does indeed), but I can't find any call to
> dat_ep_query() in the uDAPL BTL code. Maybe the comment is out of date?
> [...]
Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work
Looking at that section, it appears that we store the port value locally in udapl_addr and use the local copy, so changing the uDAPL attribute may not be doing anything for the BTL. I will run some tests as well.

-DON

Andrew Friedley wrote:
> OK, strange but good. Yeah I wouldn't be surprised if something has
> been changed, though I wouldn't know what, and I don't have time right
> now to go digging :( Maybe Don Kerr knows something?
> [...]
Re: [OMPI users] [ofa-general] Re: openMPI over uDAPL doesn't work
I thought about it again: there's probably no call to dat_ep_query() *because* it returns wrong port numbers, and the port numbers saved by the uDAPL BTL code itself are used. I'll leave the debugging to those who know the code ... ;-)

Boris

Andrew Friedley wrote:
> OK, strange but good. Yeah I wouldn't be surprised if something has
> been changed, though I wouldn't know what, and I don't have time right
> now to go digging :( Maybe Don Kerr knows something?
> [...]
Re: [OMPI users] Newbie question. Please help.
Thanks, Kevin and Brook, for replying to my question. I am going to try out what Kevin suggested.

Steven.

On 5/9/07, Kevin Radican wrote:
> We use VASP 4.6 in parallel with openmpi 1.1.2 without any problems on
> x86_64 with opensuse and compiled with gcc and Intel fortran and use
> torque PBS.
> [...]
Re: [OMPI users] Newbie question. Please help.
Hi, Kevin and all. I tried with the following:

./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6 --with-tm=/usr/local/pbs --enable-mpirun-prefix-by-default --enable-mpi-f90 --with-threads=posix --enable-static

and added the mpi.o rule in my VASP makefile, but I still got errors. I forgot to mention that our environment has Intel MKL 9.0 or 8.1, and my machines are dual-processor, dual-core Xeon 5130. Well, I am going to try ACML too.

Attached is my makefile for VASP; I am not sure if I missed anything again. Thank you very much for all your help.

On 5/9/07, Steven Truong wrote:
> Thank Kevin and Brook for replying to my question. I am going to try
> out what Kevin suggested.
> [...]
Re: [OMPI users] Newbie question. Please help.
Oh, no. I tried with ACML and had the same set of errors.

Steven.

On 5/9/07, Steven Truong wrote:
> Hi, Kevin and all. I tried with the following:
>
> ./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6
> --with-tm=/usr/local/pbs --enable-mpirun-prefix-by-default
> --enable-mpi-f90 --with-threads=posix --enable-static
>
> and added the mpi.o rule in my VASP makefile, but I still got errors.
> [...]
Re: [OMPI users] Newbie question. Please help.
Can you send a simple test that reproduces these errors? I.e., if there's a single, simple package that you can send instructions on how to build, it would be most helpful if we could reproduce the error (and therefore figure out how to fix it). Thanks! On May 9, 2007, at 2:19 PM, Steven Truong wrote: Oh, no. I tried with ACML and had the same set of errors. Steven. On 5/9/07, Steven Truong wrote: Hi, Kevin and all. I tried with the following: ./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6 --with-tm=/usr/local/pbs --enable-mpirun-prefix-by-default --enable-mpi-f90 --with-threads=posix --enable-static and added the mpi.o rule in my VASP makefile, but I still got errors. I forgot to mention that our environment has Intel MKL 9.0 or 8.1, and my machines are dual-proc dual-core Xeon 5130s. Well, I am going to try ACML too. Attached is my makefile for VASP; I am not sure if I missed anything again. Thank you very much for all your help. On 5/9/07, Steven Truong wrote: Thanks to Kevin and Brook for replying to my question. I am going to try out what Kevin suggested. Steven.
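A minimal standalone test of the kind Jeff asks for — a sketch, assuming Open MPI's mpicc and mpirun are on the PATH (filename and process count are arbitrary) — might look like:

```shell
# Build and run a trivial MPI program that exercises MPI_Comm_rank,
# the call that fails inside VASP. If this works, the Open MPI
# installation itself is likely fine and the problem is in how VASP is built.
cat > hello.c <<'EOF'
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* the call reported in the errors */
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
mpicc -o hello hello.c
mpirun -np 4 ./hello
```

If this prints one "rank N of 4" line per process while VASP still aborts with MPI_ERR_COMM, that points away from the installation and toward the VASP build.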
Re: [OMPI users] Newbie question. Please help.
Hi, Jeff. Thank you very much for looking into this issue. I am afraid that I cannot give you the application/package because it is commercial software. I believe that a lot of people are using this VASP software package: http://cms.mpi.univie.ac.at/vasp/. My current environment uses MPICH 1.2.7p1; however, because a new set of dual-core machines has posed a new set of challenges, I am looking into replacing MPICH with openmpi on these machines. Could Mr. Radican, who wrote that he was able to run VASP with openMPI, provide a lot more detail regarding how he configured openmpi, how he compiles and runs VASP jobs, and anything relating to this issue? Thank you very much for all your help. Steven. On 5/9/07, Jeff Squyres wrote: Can you send a simple test that reproduces these errors? I.e., if there's a single, simple package that you can send instructions on how to build, it would be most helpful if we could reproduce the error (and therefore figure out how to fix it). Thanks!
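Since the cluster also has MPICH 1.2.7p1 installed, one plausible cause of MPI_ERR_COMM on MPI_COMM_WORLD is VASP being compiled against MPICH's mpif.h while linking Open MPI (the two define communicator handles differently). A sketch of checks — the ./vasp binary path is hypothetical:

```shell
# Verify that the wrapper compilers and the built binary both belong to Open MPI
which mpif90              # should live under /usr/local/openmpi-1.2.1/bin, not MPICH's tree
mpif90 -showme:compile    # Open MPI wrapper flag: shows the -I path; the mpif.h there
                          # must be the one VASP was actually compiled with
ldd ./vasp | grep -i mpi  # the MPI libraries should resolve to Open MPI's, not MPICH's
```

If any of these point at the MPICH installation, rebuilding VASP from a clean tree with Open MPI's wrappers first in PATH would be the thing to try.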
Re: [OMPI users] Newbie question. Please help.
I have mailed the VASP maintainer asking for a copy of the code. Let's see what happens. On May 9, 2007, at 2:44 PM, Steven Truong wrote: Hi, Jeff. Thank you very much for looking into this issue.
[OMPI users] CFP - HPCC 2007, Houston, TX - Extended Deadline: May 21, 2007
Due to many requests, the deadline has been extended to May 21, 2007. Sorry for cross-posting.

CALL FOR PAPERS

HPCC-07: The 2007 International Conference on High Performance Computing and Communications
September 26-28, Houston, Texas, USA
http://www.tlc2.uh.edu/hpcc07/

With the rapid growth in computing and communications technology, the past decade has witnessed a proliferation of powerful parallel and distributed systems and an ever-increasing demand for the practice of high performance computing and communications (HPCC). HPCC has moved into the mainstream of computing and has become a key technology in determining future research and development activities in many academic and industrial branches, especially when the solution of large and complex problems must cope with very tight timing schedules.

The HPCC-07 conference provides a forum for engineers and scientists in academia, industry, and government to address the resulting profound challenges and to present and discuss their new ideas, research results, applications, and experience on all aspects of high performance computing and communications. With this third event in the series, HPCC-07 plans to continue the success of the previous two HPCC conferences, held in 2005 in Sorrento, Italy, and in 2006 in Munich, Germany, each with over 300 submitted papers and more than 100 participants.
TOPICS OF INTEREST
* Networking protocols, routing, algorithms
* Languages and compilers for HPC
* Parallel and distributed system architectures
* Parallel and distributed algorithms
* Embedded systems
* Wireless, mobile and pervasive computing
* Web services and Internet computing
* Peer-to-peer computing
* Grid computing
* Cluster computing
* Reliability, fault-tolerance, and security
* Performance evaluation and measurement
* Tools and environments for software development
* Distributed systems and applications
* High-performance scientific and engineering computing
* Database applications and data mining
* Biological/molecular computing
* Collaborative and cooperative environments

IMPORTANT DATES
Paper submission: April 30, 2007 --> May 21, 2007
Acceptance notification: June 15, 2007
Camera-ready version: June 29, 2007
Special session proposal: April 30, 2007
Conference: September 26-28, 2007

STEERING CHAIRS
* Beniamino Di Martino, Seconda Universita' di Napoli, Italy
* Laurence T. Yang, St. Francis Xavier University, Canada

ORGANIZATION
* Barbara Chapman, University of Houston (Co-Chair)
* Jaspal Subhlok, University of Houston (Co-Chair)
* Ronald Perrott, Queen's University of Belfast (Program Chair)
* Rosalinda Mendez, University of Houston (Local Chair)

PUBLICITY CHAIRS
* Hai Jiang, Arkansas State University, USA
* Weisong Shi, Wayne State University, USA
* Haiying Shen, University of Arkansas, USA

PUBLICATIONS
Accepted papers are published in the proceedings of the HPCC-07 conference by Springer's Lecture Notes in Computer Science (LNCS). Selected best papers will be considered for special issues of Scientific Programming and the International Journal of High Performance Computing and Networking (IJHPCN).

CONTACT ADDRESS
hpc...@tlc2.uh.edu or hpc...@googlegroups.com
Re: [OMPI users] Newbie question. Please help.
Thanks very much, Jeff, for your efforts and help. On 5/9/07, Jeff Squyres wrote: I have mailed the VASP maintainer asking for a copy of the code. Let's see what happens.
[OMPI users] OMPI and OSU bandwidth benchmark
Greetings everyone, My name is Mike, and I have recently downloaded OMPI v1.2.1 and decided to run the OSU bandwidth benchmark. However, I have noticed a few weird things during my run. Btw, I am using FreeBSD 6.2. The OSU bandwidth test basically pre-posts many ISends and IRecvs; it tries to measure the maximum sustainable bandwidth. Here is an output (I didn't finish running, but it should be sufficient to show the problem that I am seeing).

Quick system info: two-node test (Intel P4 Xeon 3.2GHz, Hyperthreading disabled, 1024MB RAM); three 1-Gig NICs, all Intel Pro em1000 (em0 and em2 are the private interfaces (10.1.x.x), while em1 is the public interface).

--
[myct@netbed21 ~/mpich/osu_benchmarks]$ mpirun --mca btl_tcp_if_include em0 --hostfile ~/mpd.hosts.private --mca btl tcp,self --mca btl_tcp_sndbuf 233016 --mca btl_tcp_rcvbuf 233016 -np 2 ./osu_bw
# OSU MPI Bandwidth Test (Version 2.3)
# Size     Bandwidth (MB/s)
1          0.12
2          0.26
4          0.53
8          1.06
16         2.12
32         4.22
64         8.26
128        14.61
256        28.06
512        51.27
1024       82.59
2048       102.21
4096       110.53
8192       114.58
16384      118.16
32768      120.71
65536      33.23
131072     41.75
262144     70.42
524288     82.96
^Cmpirun: killing job...
--

The rendezvous threshold is set to 64k by default. It seems that when the rendezvous protocol kicks in, the performance drops tremendously. Btw, this is an out-of-the-box run; I have not tweaked anything except changing the socket buffer sizes at runtime. Is there something obvious that I am not doing correctly? I have also attached the "ompi-info" output. Thanks for everything, Mike
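The bandwidth collapse coincides with the TCP BTL's switch from eager to rendezvous at 64 KB. One experiment is to raise the eager limit and fragment size so larger messages stay on the eager path — a sketch only; the parameter names are Open MPI's TCP BTL MCA parameters, but the values here are illustrative guesses, not recommendations:

```shell
# Illustrative experiment: push the eager/rendezvous crossover up to 128 KB
# and see whether the 65536+ byte results recover (values are guesses to test with)
mpirun --mca btl tcp,self \
       --mca btl_tcp_if_include em0 \
       --mca btl_tcp_eager_limit 131072 \
       --mca btl_tcp_max_send_size 131072 \
       --hostfile ~/mpd.hosts.private \
       -np 2 ./osu_bw
```

Running `ompi_info --param btl tcp` lists the actual parameter names and defaults for the installed version, which is the safer starting point than any hard-coded numbers.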