Re: [OMPI users] difference between OPENMPI and Intel MPI (DATATYPE)
Dear Jeff, Dear all, I normally use "USE MPI". This is the answer from the Intel HPC forum: *If you are switching between Intel MPI and Open MPI you must remember not to mix environments. You might use modules to manage this. As the datatype encodings differ, you must take care that all objects are built against the same headers.* Could someone explain what these modules are and how I can use them? Thanks Diego Diego On 2 September 2015 at 19:07, Jeff Squyres (jsquyres) wrote: > Can you reproduce the error in a small example? > > Also, try using "use mpi" instead of "include 'mpif.h'", and see if that > turns up any errors. > > > > On Sep 2, 2015, at 12:13 PM, Diego Avesani > wrote: > > > > Dear Gilles, Dear all, > > I have found the error. Some CPUs have no element to share. It was my > error. > > > > Now I have another one: > > > > Fatal error in MPI_Isend: Invalid communicator, error stack: > > MPI_Isend(158): MPI_Isend(buf=0x137b7b4, count=1, INVALID DATATYPE, > dest=0, tag=0, comm=0x0, request=0x7fffe8726fc0) failed > > > > In this case it does not work with Intel MPI, while with Open MPI it works. > > > > Can you see anything particular in the error message? > > > > Diego > > > > > > Diego > > > > > > On 2 September 2015 at 14:52, Gilles Gouaillardet < > gilles.gouaillar...@gmail.com> wrote: > > Diego, > > > > about MPI_Allreduce, you should use MPI_IN_PLACE if you want the same > buffer for send and recv > > > > about the stack, I notice comm is NULL, which is a bit surprising... > > at first glance, type creation looks good. > > that being said, you do not check that MPIdata%iErr is MPI_SUCCESS after each > MPI call. > > I recommend you do this first, so you can catch the error as soon as it > happens, and hopefully understand why it occurs. > > > > Cheers, > > > > Gilles > > > > > > On Wednesday, September 2, 2015, Diego Avesani > wrote: > > Dear all, > > > > I have noticed a small difference between Open MPI and Intel MPI. 
> > For example, in MPI_ALLREDUCE Intel MPI does not allow using the same > variable as both send and receive buffer. > > I have written my code with Open MPI, but unfortunately I have to run it > on an Intel MPI cluster. > > Now I have the following error: > > > > Fatal error in MPI_Isend: Invalid communicator, error stack: > > MPI_Isend(158): MPI_Isend(buf=0x1dd27b0, count=1, INVALID DATATYPE, > dest=0, tag=0, comm=0x0, request=0x7fff9d7dd9f0) failed > > > > > > This is how I create my type: > > > > CALL MPI_TYPE_VECTOR(1, Ncoeff_MLS, Ncoeff_MLS, MPI_DOUBLE_PRECISION, > coltype, MPIdata%iErr) > > CALL MPI_TYPE_COMMIT(coltype, MPIdata%iErr) > > ! > > CALL MPI_TYPE_VECTOR(1, nVar, nVar, coltype, MPI_WENO_TYPE, > MPIdata%iErr) > > CALL MPI_TYPE_COMMIT(MPI_WENO_TYPE, MPIdata%iErr) > > > > > > Do you believe the problem is here? > > Is this also how Intel MPI creates a datatype? > > > > Maybe I could also ask the Intel MPI users. > > What do you think? > > > > Diego > > > > > > ___ > > users mailing list > > us...@open-mpi.org > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users > > Link to this post: > http://www.open-mpi.org/community/lists/users/2015/09/27523.php > > > > ___ > > users mailing list > > us...@open-mpi.org > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users > > Link to this post: > http://www.open-mpi.org/community/lists/users/2015/09/27524.php > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > ___ > users mailing list > us...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users > Link to this post: > http://www.open-mpi.org/community/lists/users/2015/09/27525.php
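[Editor's note: Gilles's advice above, checking the error code after every MPI call and using MPI_IN_PLACE for an in-place allreduce, can be sketched in Fortran roughly as follows. MPIdata%iErr, Ncoeff_MLS, and coltype come from Diego's snippet; the check_mpi helper, recvbuf, and n are hypothetical names introduced here for illustration.]

```fortran
! Sketch only: abort with a readable message as soon as an MPI call fails.
SUBROUTINE check_mpi(ierr, what)        ! hypothetical helper, not from the thread
  USE mpi
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: ierr
  CHARACTER(*), INTENT(IN) :: what
  INTEGER :: jerr
  IF (ierr /= MPI_SUCCESS) THEN
    PRINT *, 'MPI error in ', what, ': ierr = ', ierr
    CALL MPI_ABORT(MPI_COMM_WORLD, ierr, jerr)
  END IF
END SUBROUTINE check_mpi

! ... in the type-creation code, check after every call:
CALL MPI_TYPE_VECTOR(1, Ncoeff_MLS, Ncoeff_MLS, MPI_DOUBLE_PRECISION, &
                     coltype, MPIdata%iErr)
CALL check_mpi(MPIdata%iErr, 'MPI_TYPE_VECTOR')
CALL MPI_TYPE_COMMIT(coltype, MPIdata%iErr)
CALL check_mpi(MPIdata%iErr, 'MPI_TYPE_COMMIT')

! Gilles's MPI_IN_PLACE suggestion for reducing into the same buffer
! (recvbuf and n are hypothetical):
CALL MPI_ALLREDUCE(MPI_IN_PLACE, recvbuf, n, MPI_DOUBLE_PRECISION, &
                   MPI_SUM, MPI_COMM_WORLD, MPIdata%iErr)
CALL check_mpi(MPIdata%iErr, 'MPI_ALLREDUCE')
```

This way the failure is reported at the call that caused it, rather than surfacing later as an "Invalid communicator / INVALID DATATYPE" stack in MPI_Isend.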
Re: [OMPI users] difference between OPENMPI and Intel MPI (DATATYPE)
When you change environments, that is, when you switch between Open MPI and Intel MPI, or change compilers, it is recommended that you recompile everything. "use mpi" refers to a Fortran module; you cannot mix such modules between compilers/environments. Sadly, the Fortran specification does not enforce a strict module format, which is why this is necessary. 2015-09-03 14:43 GMT+00:00 Diego Avesani : > Dear Jeff, Dear all, > I normally use "USE MPI" > > This is the answer from the Intel HPC forum: > > *If you are switching between Intel MPI and Open MPI you must remember not to > mix environments. You might use modules to manage this. As the datatype > encodings differ, you must take care that all objects are built against the > same headers.* > > Could someone explain what these modules are and how I can use them? > > Thanks > > Diego -- Kind regards Nick
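[Editor's note: a sketch of "recompile everything when you switch", assuming the usual wrapper compilers are on PATH (mpif90 for an Open MPI stack, mpiifort for an Intel MPI stack); the file and binary names are hypothetical.]

```shell
# Build and run with Open MPI
mpif90 -o mycode.ompi mycode.f90
mpirun -np 4 ./mycode.ompi

# Build and run with Intel MPI: a full rebuild, not just a relink,
# because the 'use mpi' module files are compiler- and MPI-specific.
mpiifort -o mycode.impi mycode.f90
mpiexec -np 4 ./mycode.impi
```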
Re: [OMPI users] tracking down what's causing a cuIpcOpenMemHandle error emitted by OpenMPI
Lev: Can you run with --mca mpi_common_cuda_verbose 100 --mca mpool_rgpusm_verbose 100 and send me (rvandeva...@nvidia.com) the output of that. Thanks, Rolf >-Original Message- >From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lev Givon >Sent: Wednesday, September 02, 2015 7:15 PM >To: us...@open-mpi.org >Subject: [OMPI users] tracking down what's causing a cuIpcOpenMemHandle >error emitted by OpenMPI > >I recently noticed the following error when running a Python program I'm >developing that repeatedly performs GPU-to-GPU data transfers via >OpenMPI: > >The call to cuIpcGetMemHandle failed. This means the GPU RDMA protocol >cannot be used. > cuIpcGetMemHandle return value: 1 > address: 0x602e75000 >Check the cuda.h file for what the return value means. Perhaps a reboot of >the node will clear the problem. > >The system is running Ubuntu 14.04.3 and contains several Tesla S2050 GPUs. >I'm using the following software: > >- Linux kernel 3.19.0 (backported to Ubuntu 14.04.3 from 15.04) >- CUDA 7.0 (installed via NVIDIA's deb packages) >- NVIDIA kernel driver 346.82 >- OpenMPI 1.10.0 (manually compiled with CUDA support) >- Python 2.7.10 >- pycuda 2015.1.3 (manually compiled against CUDA 7.0) >- mpi4py (manually compiled git revision 1d8ab22) > >OpenMPI, Python, pycuda, and mpi4py are all locally installed in a conda >environment. > >Judging from my program's logs, the error pops up during one of the >program's first few iterations. The error isn't fatal, however - the program >continues running to completion after the message appears. Running >mpiexec with --mca plm_base_verbose 10 doesn't seem to produce any >additional debug info of use in tracking this down. I did notice, though, that >there are undeleted cuda.shm.* files in /run/shm after the error message >appears and my program exits. Deleting the files does not prevent the error >from recurring if I subsequently rerun the program. 
> >Oddly, the above problem doesn't crop up when I run the same code on an >Ubuntu >14.04.3 system with the exact same software containing 2 non-Tesla GPUs >(specifically, a GTX 470 and 750). The error seems to have started occurring >over the past two weeks, but none of the changes I made to my code over >that time seem to be related to the problem (i.e., running an older revision >resulted in the same errors). I also tried running my code using older releases >of OpenMPI (e.g., 1.8.5) and mpi4py (e.g., from about 4 weeks ago), but the >error message still occurs. Both Ubuntu systems are 64-bit and have been >kept up to date with the latest package updates. > >Any thoughts as to what could be causing the problem? >-- >Lev Givon >Bionet Group | Neurokernel Project >http://www.columbia.edu/~lev/ >http://lebedov.github.io/ >http://neurokernel.github.io/ > >___ >users mailing list >us...@open-mpi.org >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >Link to this post: http://www.open- >mpi.org/community/lists/users/2015/09/27526.php
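[Editor's note: Rolf's suggested diagnostic run might look like the following; the two --mca flags are from his message, while the rank count and script name are hypothetical.]

```shell
mpiexec -np 2 \
  --mca mpi_common_cuda_verbose 100 \
  --mca mpool_rgpusm_verbose 100 \
  python my_gpu_program.py 2>&1 | tee cuda_verbose.log
```

The verbose output from the CUDA support code and the registered-GPU-memory pool should show which IPC memory-handle operation fails and on which peer.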
Re: [OMPI users] difference between OPENMPI and Intel MPI (DATATYPE)
Diego, basically that means "do not build with Open MPI and run with Intel MPI, or the other way around" and/or "do not build one part of your app with Open MPI and another part with Intel MPI". A "part" can be your app itself or a third-party library it uses: if you use Intel ScaLAPACK, make sure you use the lib built for Open MPI. Mixing can happen by inadvertence if the environment ($PATH) is messed up. A convenient way to keep your environment clean is to use modules: http://modules.sourceforge.net If the module files are correctly written, there should be virtually no way to mix Intel MPI and Open MPI, or to use Open MPI with a lib built with Intel MPI. Cheers, Gilles On Thursday, September 3, 2015, Diego Avesani wrote: > Dear Jeff, Dear all, > I normally use "USE MPI" > > This is the answer from the Intel HPC forum: > > *If you are switching between Intel MPI and Open MPI you must remember not to > mix environments. You might use modules to manage this. As the data type > encodings differ, you must take care that all objects are built against the > same headers.* > > Could someone explain what these modules are and how I can use them? > > Thanks > > Diego
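[Editor's note: with an environment-modules setup of the kind Gilles recommends, keeping the stack consistent might look like this; the module names and versions are hypothetical and site-specific, so run `module avail` to see what your cluster actually provides.]

```shell
module purge                 # start from a clean environment
module avail                 # list what the site provides
module load intel/15.0       # hypothetical compiler module
module load intelmpi/5.0     # hypothetical MPI module, built for that compiler
mpiifort -o mycode mycode.f90
mpirun -np 4 ./mycode
```

Loading the compiler and the matching MPI module sets PATH and LD_LIBRARY_PATH consistently, so the headers, module files, and runtime libraries all come from the same stack.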
Re: [OMPI users] difference between OPENMPI and Intel MPI (DATATYPE)
Dear Nick, Dear all, I use "use mpi". I recompile everything, every time. I do not understand what I should do. Thanks again Diego Diego On 3 September 2015 at 16:52, Nick Papior wrote: > When you change environments, that is switch between Open MPI and Intel MPI, > or change compilers, it is recommended that you recompile everything. > > use mpi > > is a module, and you cannot mix these between compilers/environments; sadly > the Fortran specification does not enforce a strict module format, which is > why this is necessary.
Re: [OMPI users] difference between OPENMPI and Intel MPI (DATATYPE)
You still haven't shown us anything about what actually goes wrong; you just give us the error statement and assume it is caused by an ill-defined type creation, while it might just as well be caused by calling allreduce erroneously. Please give us more information... 2015-09-03 14:59 GMT+00:00 Diego Avesani : > Dear Nick, Dear all, > > I use "use mpi". > > I recompile everything, every time. > > I do not understand what I should do. > > Thanks again > > Diego > > Diego
Re: [OMPI users] difference between OPENMPI and Intel MPI (DATATYPE)
Hello, On 09/03/15 16:52, Nick Papior wrote: When you change environments, that is switch between Open MPI and Intel MPI, or change compilers, it is recommended that you recompile everything. use mpi is a module, you cannot mix these between compilers/environments, sadly the Fortran specification does not enforce a strict module format which is why this is necessary. This is sensible because the ISO Fortran standard also does not enforce the memory layout of the descriptors that get passed around in a Fortran program, so a module from one compiler cannot be expected to work with another unless a common ABI for arguments is agreed upon (as was done for Fortran 77, where the SysV ABI is pretty consistently used). Because Fortran modules (introduced in Fortran 90) supply more semantics than the Fortran 77 mpif.h, it is often more useful to 'use mpi' than to 'include "mpif.h"'. 2015-09-03 14:43 GMT+00:00 Diego Avesani : Could someone explain what these modules are and how I can use them? This refers to the 'modules' software package [1] (different from Fortran modules), which simplifies keeping multiple versions of the same and of different software packages around under the same commands. It is typically used in server environments where an upgrade for all users/dependent software packages is non-trivial. Regards, Thomas [1] http://modules.sourceforge.net/
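[Editor's note: Thomas's point about the extra semantics of 'use mpi' can be illustrated with a minimal sketch, not taken from the thread: with the module, an argument-count or argument-type mistake in an MPI call with an explicit interface is caught at compile time, whereas with 'include "mpif.h"' it typically compiles silently and fails at run time.]

```fortran
PROGRAM use_mpi_checks
  USE mpi                 ! brings in explicit interfaces and named constants
  IMPLICIT NONE
  INTEGER :: ierr, rank
  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  ! With 'use mpi', a call like the following is rejected by the
  ! compiler (missing ierr argument); with 'include "mpif.h"' it
  ! would compile and misbehave at run time:
  ! CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank)
  CALL MPI_FINALIZE(ierr)
END PROGRAM use_mpi_checks
```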
Re: [OMPI users] difference between OPENMPI and Intel MPI (DATATYPE)
Diego, did you update your code to check that all MPI calls are successful? (e.g. test that ierr is MPI_SUCCESS after each MPI call) Can you write a short program that reproduces the same issue? If not, are your program and input data publicly available? Cheers, Gilles On Thursday, September 3, 2015, Diego Avesani wrote: > Dear Nick, Dear all, > > I use "use mpi". > > I recompile everything, every time. > > I do not understand what I should do. > > Thanks again > > Diego > > Diego
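[Editor's note: a minimal reproducer along the lines Gilles asks for might look like this; it reuses the two MPI_TYPE_VECTOR calls from Diego's snippet, with hypothetical sizes Ncoeff_MLS and nVar, and sends one element from rank 1 to rank 0.]

```fortran
PROGRAM reproducer
  USE mpi
  IMPLICIT NONE
  INTEGER, PARAMETER :: Ncoeff_MLS = 4, nVar = 3   ! hypothetical sizes
  DOUBLE PRECISION :: buf(Ncoeff_MLS*nVar)
  INTEGER :: ierr, rank, coltype, weno_type, req
  INTEGER :: status(MPI_STATUS_SIZE)

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  ! Same two-step type construction as in the original code,
  ! but checking ierr after each call.
  CALL MPI_TYPE_VECTOR(1, Ncoeff_MLS, Ncoeff_MLS, MPI_DOUBLE_PRECISION, &
                       coltype, ierr)
  IF (ierr /= MPI_SUCCESS) STOP 'MPI_TYPE_VECTOR (coltype) failed'
  CALL MPI_TYPE_COMMIT(coltype, ierr)
  IF (ierr /= MPI_SUCCESS) STOP 'MPI_TYPE_COMMIT (coltype) failed'
  CALL MPI_TYPE_VECTOR(1, nVar, nVar, coltype, weno_type, ierr)
  IF (ierr /= MPI_SUCCESS) STOP 'MPI_TYPE_VECTOR (weno_type) failed'
  CALL MPI_TYPE_COMMIT(weno_type, ierr)
  IF (ierr /= MPI_SUCCESS) STOP 'MPI_TYPE_COMMIT (weno_type) failed'

  buf = DBLE(rank)
  IF (rank == 1) THEN
    CALL MPI_ISEND(buf, 1, weno_type, 0, 0, MPI_COMM_WORLD, req, ierr)
    CALL MPI_WAIT(req, status, ierr)
  ELSE IF (rank == 0) THEN
    CALL MPI_RECV(buf, 1, weno_type, 1, 0, MPI_COMM_WORLD, status, ierr)
  END IF

  CALL MPI_FINALIZE(ierr)
END PROGRAM reproducer
```

If this short program fails the same way under Intel MPI but works under Open MPI, it isolates the problem to the type construction; if it works under both, the bug is elsewhere in the application.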
Re: [OMPI users] difference between OPENMPI and Intel MPI (DATATYPE)
Hi Diego, I think the Intel HPC forum comment is about using environment modules to manage your environment (PATH, LD_LIBRARY_PATH variables). Most HPC systems use environment modules: - Tcl ( http://modules.cvs.sourceforge.net/viewvc/modules/modules/tcl/ ) - C/Tcl ( http://sourceforge.net/project/showfiles.php?group_id=15538 ) - Lmod ( https://www.tacc.utexas.edu/research-development/tacc-projects/lmod ) If your system has environment modules, you'd typically. - load a compiler (Intel, GCC, PGI, etc). - load a MPI built with that compiler (Intel MPI, OpenMPI). The most important thing here is to have a software stack that is consistent. That is built with the same compiler. For example GCC with OpenMPI to build and execute your program. While not GCC and OpenMPI to build then Intel and Intel MPI to execute. Regards > On Sep 3, 2015, at 8:43 AM, Diego Avesani wrote: > > Dear Jeff, Dear all, > I normaly use "USE MPI" > > This is the answar fro intel HPC forum: > > If you are switching between intel and openmpi you must remember not to mix > environment. You might use modules to manage this. As the data types > encodings differ, you must take care that all objects are built against the > same headers. > > Could someone explain me what are these modules and how I can use them? > > Thanks > > Diego > > Diego > > > On 2 September 2015 at 19:07, Jeff Squyres (jsquyres) > wrote: > Can you reproduce the error in a small example? > > Also, try using "use mpi" instead of "include 'mpif.h'", and see if that > turns up any errors. > > > > On Sep 2, 2015, at 12:13 PM, Diego Avesani wrote: > > > > Dear Gilles, Dear all, > > I have found the error. Some CPU has no element to share. It was a my error. 
> >
> > Now I have another one:
> >
> > Fatal error in MPI_Isend: Invalid communicator, error stack:
> > MPI_Isend(158): MPI_Isend(buf=0x137b7b4, count=1, INVALID DATATYPE, dest=0,
> > tag=0, comm=0x0, request=0x7fffe8726fc0) failed
> >
> > In this case with MPI does not work, with openMPI it works.
> >
> > Could you see some particular information from the error message?
> >
> > Diego
> >
> >
> > Diego
> >
> >
> > On 2 September 2015 at 14:52, Gilles Gouaillardet wrote:
> > Diego,
> >
> > about MPI_Allreduce, you should use MPI_IN_PLACE if you want the same
> > buffer in send and recv
> >
> > about the stack, I notice comm is NULL which is a bit surprising...
> > at first glance, type creation looks good.
> > that being said, you do not check MPIdata%iErr is MPI_SUCCESS after each
> > MPI call.
> > I recommend you first do this, so you can catch the error as soon it
> > happens, and hopefully understand why it occurs.
> >
> > Cheers,
> >
> > Gilles
> >
> >
> > On Wednesday, September 2, 2015, Diego Avesani wrote:
> > Dear all,
> >
> > I have notice small difference between OPEN-MPI and intel MPI.
> > For example in MPI_ALLREDUCE in intel MPI is not allowed to use the same
> > variable in send and receiving Buff.
> >
> > I have written my code in OPEN-MPI, but unfortunately I have to run in on a
> > intel-MPI cluster.
> > Now I have the following error:
> >
> > atal error in MPI_Isend: Invalid communicator, error stack:
> > MPI_Isend(158): MPI_Isend(buf=0x1dd27b0, count=1, INVALID DATATYPE, dest=0,
> > tag=0, comm=0x0, request=0x7fff9d7dd9f0) failed
> >
> >
> > This is ho I create my type:
> >
> > CALL MPI_TYPE_VECTOR(1, Ncoeff_MLS, Ncoeff_MLS, MPI_DOUBLE_PRECISION,
> > coltype, MPIdata%iErr)
> > CALL MPI_TYPE_COMMIT(coltype, MPIdata%iErr)
> > !
> > CALL MPI_TYPE_VECTOR(1, nVar, nVar, coltype, MPI_WENO_TYPE, MPIdata%iErr)
> > CALL MPI_TYPE_COMMIT(MPI_WENO_TYPE, MPIdata%iErr)
> >
> >
> > do you believe that is here the problem?
> > Is also this the way how intel MPI create a datatype?
> >
> > maybe I could also ask to intel MPI users
> > What do you think?
> >
> > Diego
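On a machine that provides environment modules, the switch between the two stacks described above might look like the session below. This is only a sketch: the module names (gcc, openmpi, intel, intelmpi) are assumptions, and the exact names at your site will differ (check "module avail").

```shell
# Hypothetical module names; run "module avail" to see what your site provides.
$ module load gcc openmpi      # GCC plus the Open MPI built with it
$ mpif90 my_program.f90
$ mpirun -np 4 ./a.out

$ module purge                 # unload everything before switching stacks
$ module load intel intelmpi   # Intel compiler plus Intel MPI
$ mpiifort my_program.f90      # recompile everything from scratch
$ mpirun -np 4 ./a.out
```

Note that the program is recompiled after every switch; that is the whole point, since objects and .mod files from one stack cannot be reused with the other.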
Re: [OMPI users] difference between OPENMPI e Intel MPI (DATATYPE)
Dear all, Dear Nick,

you are right. Now I will try to erase all the *.mod and *.o files every time, and after that recompile all the *.f90 files.

If I get another error I will send you the message as well.

Thanks again

Diego

Diego

On 3 September 2015 at 17:03, Nick Papior wrote:

> You still havent shown us anything about what goes wrong, you just give us
> the error statement and assume it is because of ill-defined type-creation,
> it might as well be because you call allreduce erroneously.
> Please give us more information...
>
> 2015-09-03 14:59 GMT+00:00 Diego Avesani :
>
>> Dear Nick, Dear all,
>>
>> I use mpi.
>>
>> I recompile everything, every time.
>>
>> I do not understand what I shall do.
>>
>> Thanks again
>>
>> Diego
>>
>> Diego
>>
>>
>> On 3 September 2015 at 16:52, Nick Papior wrote:
>>
>>> When you change environment, that is change between OpenMPI and Intel
>>> MPI, or compiler, it is recommended that you recompile everything.
>>>
>>> use mpi
>>>
>>> is a module, you cannot mix these between compilers/environments, sadly
>>> the Fortran specification does not enforce a strict module format which
>>> is why this is necessary.
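Gilles's earlier recommendation, checking MPIdata%iErr against MPI_SUCCESS after every MPI call, could be sketched like this. This is a minimal standalone sketch, not Diego's actual code: the value of Ncoeff_MLS and the abort_on helper are made up for illustration.

```fortran
PROGRAM check_type
   USE MPI                      ! "use mpi" also gives compile-time checking
   IMPLICIT NONE
   INTEGER :: iErr, coltype, Ncoeff_MLS

   CALL MPI_INIT(iErr)
   Ncoeff_MLS = 8               ! hypothetical size, for illustration only

   ! Build and commit the column type, checking iErr after each call
   CALL MPI_TYPE_VECTOR(1, Ncoeff_MLS, Ncoeff_MLS, MPI_DOUBLE_PRECISION, &
                        coltype, iErr)
   IF (iErr /= MPI_SUCCESS) CALL abort_on('MPI_TYPE_VECTOR', iErr)

   CALL MPI_TYPE_COMMIT(coltype, iErr)
   IF (iErr /= MPI_SUCCESS) CALL abort_on('MPI_TYPE_COMMIT', iErr)

   CALL MPI_TYPE_FREE(coltype, iErr)
   CALL MPI_FINALIZE(iErr)

CONTAINS

   SUBROUTINE abort_on(name, code)
      CHARACTER(len=*), INTENT(IN) :: name
      INTEGER, INTENT(IN) :: code
      INTEGER :: ignored
      WRITE(*,*) name, ' failed with error code ', code
      CALL MPI_ABORT(MPI_COMM_WORLD, code, ignored)
   END SUBROUTINE abort_on

END PROGRAM check_type
```

Checking every return code this way catches the failure at the call that caused it, instead of much later inside MPI_Isend.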
Re: [OMPI users] difference between OPENMPI e Intel MPI (DATATYPE)
On Sep 3, 2015, at 10:43 AM, Diego Avesani wrote:
>
> Dear Jeff, Dear all,
> I normaly use "USE MPI"
>
> This is the answar fro intel HPC forum:
>
> If you are switching between intel and openmpi you must remember not to mix
> environment. You might use modules to manage this.

I think the source of the confusion here might well be an overload of the word "modules".

I think the word "module" in the phrase "You might use modules to manage this" is referring to *environment modules*, not *Fortran modules*. I.e.:

http://modules.sourceforge.net/

Where you can do stuff like this:

-
# Use Open MPI
$ module load openmpi
$ mpicc my_program.c
$ mpirun -np 4 a.out

# Use __some_other_MPI__
$ module load othermpi
$ mpicc my_program.c
$ mpirun -np 4 a.out
-

Environment modules are typically used to set things like PATH, LD_LIBRARY_PATH, and MANPATH.

I think the poster on the Intel HPC forum was probably referring to you using environment modules to switch your PATH / LD_LIBRARY_PATH / MANPATH between Open MPI and Intel MPI.

> As the data types encodings differ, you must take care that all objects are
> built against the same headers.

Here, the poster is essentially saying that if you want to use Open MPI, you have to compile and mpirun with Open MPI. And if you want to use Intel MPI, you have to (re)compile and mpirun with Intel MPI.

In short: Open MPI and Intel MPI are not binary compatible, and their mpirun's are not compatible, either.

(Note that this is an Open MPI mailing list; we can't answer questions about Intel MPI here.)

My point with "use mpi" was that you should try replacing "include 'mpif.h'" with "use mpi" in your Fortran code. Open MPI's "use mpi" implementation will do a lot of compile-time type checking that "include 'mpif.h'" will not. Hence, it helps determine if you're passing an incorrect parameter to an MPI subroutine, for example.
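To illustrate the kind of error that compile-time checking catches, here is a deliberately broken sketch (a hypothetical program, not Diego's code): the count argument is a REAL and the trailing ierror argument is missing. Under "use mpi" the compiler rejects the call outright, because the module gives MPI_SEND an explicit interface; under "include 'mpif.h'" there is no explicit interface, so the same program compiles silently and misbehaves at run time.

```fortran
PROGRAM bad_send
   USE MPI
   IMPLICIT NONE
   INTEGER :: iErr, buf(4)

   CALL MPI_INIT(iErr)
   buf = 0
   ! Deliberately wrong: the count is a REAL and ierror is missing.
   ! With "use mpi" this line is a compile error; with "include 'mpif.h'"
   ! it compiles and fails unpredictably at run time.
   CALL MPI_SEND(buf, 4.0, MPI_INTEGER, 0, 0, MPI_COMM_WORLD)
   CALL MPI_FINALIZE(iErr)
END PROGRAM bad_send
```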
-- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] difference between OPENMPI e Intel MPI (DATATYPE)
There is also the package Lmod, which provides similar functionality to environment modules. It is maintained by TACC:

https://www.tacc.utexas.edu/research-development/tacc-projects/lmod

but I think the current source code is at

https://github.com/TACC/Lmod

-- bennet