[OMPI users] OpenMPI against multiple, evolving SLURM versions
Hi,

Our group can't find any way to do this and it'd be helpful.

We use slurm and keep upgrading the slurm environment. OpenMPI bombs out against PMI each time the libslurm library changes, which seems to happen fairly regularly. Is there a way to compile against slurm but insulate ourselves from the libslurm churn? Obviously we will ask the slurm folks too.

[wlaw@some-node /scratch/users/wlaw/imb/src]$ mpirun -n 2 --mca grpcomm ^pmi ./IMB-MPI1
[some-node.local:42584] mca: base: component_find: unable to open /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_ess_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
[some-node.local:42585] mca: base: component_find: unable to open /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_pubsub_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)
[some-node.local:42586] mca: base: component_find: unable to open /share/sw/free/openmpi/1.6.5/intel/13sp1up1/lib/openmpi/mca_pubsub_pmi: libslurm.so.28: cannot open shared object file: No such file or directory (ignored)

(Sent it via the wrong email address the first time, so it bounced. Heh.)

Upon further investigation it seems like the most appropriate thing would be to link at compile time against libslurm.so instead of libslurm.so.xx; does that make sense?

Thanks,

Will
Re: [OMPI users] OpenMPI against multiple, evolving SLURM versions
It makes sense - but isn’t it slurm that is linking libpmi against libslurm? I don’t think we are making that connection, so it would be a slurm issue to change it.

> On Jan 28, 2016, at 10:12 PM, William Law wrote:
>
> Is there a way to compile against slurm but insulate ourselves from the libslurm churn?
>
> Upon further investigation it seems like the most appropriate thing would be to link at compile time against libslurm.so instead of libslurm.so.xx; does that make sense?
[OMPI users] difference between OpenMPI - intel MPI mpi_waitall
Dear all,

I have created a program in Fortran and OpenMPI; I tested it on my laptop and it works. I would like to use it on a cluster that has, unfortunately, Intel MPI.

The program crashes on the cluster and I get the following error:

   Fatal error in MPI_Waitall: Invalid MPI_Request, error stack:
   MPI_Waitall(271): MPI_Waitall(count=3, req_array=0x7445f0, status_array=0x744600) failed
   MPI_Waitall(119): The supplied request in array element 2 was invalid (kind=0)

Do OpenMPI and Intel MPI have some difference that I do not know about?

This is my code:

   REQUEST = MPI_REQUEST_NULL
   !send data shared with left
   IF(MPIdata%rank.NE.0)THEN
      MsgLength = MPIdata%imaxN
      DO icount=1,MPIdata%imaxN
         iNode = MPIdata%nodeFromUp(icount)
         send_messageL(icount) = R1(iNode)
      ENDDO
      CALL MPI_ISEND(send_messageL, MsgLength, MPIdata%AUTO_COMP, MPIdata%rank-1, MPIdata%rank, MPI_COMM_WORLD, REQUEST(1), MPIdata%iErr)
   ENDIF
   !
   !receive message FROM RIGHT CPU
   IF(MPIdata%rank.NE.MPIdata%nCPU-1)THEN
      MsgLength = MPIdata%imaxN
      CALL MPI_IRECV(recv_messageR, MsgLength, MPIdata%AUTO_COMP, MPIdata%rank+1, MPIdata%rank+1, MPI_COMM_WORLD, REQUEST(2), MPIdata%iErr)
   ENDIF
   CALL MPI_WAITALL(2,REQUEST,send_status_list,MPIdata%iErr)
   IF(MPIdata%rank.NE.MPIdata%nCPU-1)THEN
      DO i=1,MPIdata%imaxN
         iNode=MPIdata%nodeList2Up(i)
         R1(iNode)=recv_messageR(i)
      ENDDO
   ENDIF

Thanks a lot for your help

Diego
Re: [OMPI users] difference between OpenMPI - intel MPI mpi_waitall
Diego,

your code snippet calls MPI_Waitall(2, ...), but the error message is about MPI_Waitall(count=3, ...).

Cheers,

Gilles

On Friday, January 29, 2016, Diego Avesani wrote:
> The program crashes on the cluster and I get the following error:
>
>    Fatal error in MPI_Waitall: Invalid MPI_Request, error stack:
>    MPI_Waitall(271): MPI_Waitall(count=3, req_array=0x7445f0, status_array=0x744600) failed
>    MPI_Waitall(119): The supplied request in array element 2 was invalid (kind=0)
Re: [OMPI users] OpenMPI against multiple, evolving SLURM versions
Is Open MPI linked with a static libpmi.a that requires a dynamic libslurm? That can be checked with

   ldd mca_ess_pmi.so

By the way, do the slurm folks increase the libpmi.so version each time slurm is upgraded? That could be part of the issue ... but if they increase the library version because of ABI changes, it might be a bad idea to open libxxx.so instead of libxxx.so.y.

Generally speaking, libxxx.so.y is provided by the libxxx package, and libxxx.so is provided by the libxxx-devel package, which means the latter might not be available on compute nodes.

We could also dlopen libxxx instead of linking against it, and have the sysadmin configure Open MPI so it finds the right library (this approach is used by a prominent vendor, and has other pros but also cons).

Cheers,

Gilles

On Friday, January 29, 2016, Ralph Castain wrote:
> It makes sense - but isn’t it slurm that is linking libpmi against libslurm? I don’t think we are making that connection, so it would be a slurm issue to change it.
Re: [OMPI users] difference between OpenMPI - intel MPI mpi_waitall
Dear all, Dear Gilles,

I do not understand, I am sorry. I did a "grep" on my code and I found only "MPI_WAITALL(2", so I am not able to find the error.

Thanks a lot

Diego

On 29 January 2016 at 11:58, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> your code snippet calls MPI_Waitall(2, ...), but the error message is about MPI_Waitall(count=3, ...).
Re: [OMPI users] difference between OpenMPI - intel MPI mpi_waitall
You must have an error elsewhere in your code; as Gilles pointed out, the error message states that you are calling MPI_WAITALL with a first argument of 3:

   MPI_Waitall(271): MPI_Waitall(count=3, req_array=0x7445f0, status_array=0x744600) failed

We can't really help you with problems with Intel MPI; sorry. You'll need to contact their tech support for assistance.

> On Jan 29, 2016, at 6:11 AM, Diego Avesani wrote:
>
> I did a "grep" on my code and I found only "MPI_WAITALL(2", so I am not able to find the error.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] OpenMPI against multiple, evolving SLURM versions
On second thought, is there any chance your sysadmin removed the old libslurm.so.x but kept the old libpmix.so.y? In that case, the real issue would be hidden: your sysadmin "broke" the old libpmi, while you actually want to use the new one.

Cheers,

Gilles

On Friday, January 29, 2016, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> Is Open MPI linked with a static libpmi.a that requires a dynamic libslurm? That can be checked with ldd mca_ess_pmi.so.
Re: [OMPI users] difference between OpenMPI - intel MPI mpi_waitall
Dear all, Dear Jeff, Dear Gilles,

I am sorry, probably I am being stubborn.

In all my code I have

   CALL MPI_WAITALL(2,REQUEST,send_status_list,MPIdata%iErr)

How can the count become "3"?

The only thing I can think of is that MPI starts indexing the vector from "0", while Fortran starts from 1. Indeed, I allocate REQUEST(2).

What do you think?

Diego

On 29 January 2016 at 12:43, Jeff Squyres (jsquyres) wrote:
> You must have an error elsewhere in your code; as Gilles pointed out, the error message states that you are calling MPI_WAITALL with a first argument of 3.
>
> We can't really help you with problems with Intel MPI; sorry. You'll need to contact their tech support for assistance.
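[A minimal, self-contained Fortran sketch, not part of the thread's code, to illustrate the indexing question above: the Fortran MPI bindings take the request array itself plus a count of elements, so the 0-based vs. 1-based question does not arise, and with a REQUEST(2) array the matching count is simply 2. All names here are illustrative.]

   PROGRAM waitall_count_demo
      USE mpi
      IMPLICIT NONE
      INTEGER :: REQUEST(2)                    ! two request slots, indexed 1 and 2 in Fortran
      INTEGER :: statuses(MPI_STATUS_SIZE, 2)  ! one status per request slot
      INTEGER :: ierr

      CALL MPI_INIT(ierr)
      REQUEST = MPI_REQUEST_NULL               ! scalar assignment sets BOTH elements
      ! No communication is posted: null requests are legal in MPI_WAITALL and
      ! complete immediately. The first argument is a count, not an index.
      CALL MPI_WAITALL(SIZE(REQUEST), REQUEST, statuses, ierr)
      PRINT *, 'MPI_WAITALL returned with count =', SIZE(REQUEST)
      CALL MPI_FINALIZE(ierr)
   END PROGRAM waitall_count_demo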
Re: [OMPI users] difference between OpenMPI - intel MPI mpi_waitall
> On Jan 29, 2016, at 7:55 AM, Diego Avesani wrote:
>
> In all my code I have
>
>    CALL MPI_WAITALL(2,REQUEST,send_status_list,MPIdata%iErr)
>
> How can the count become "3"?

I don't know. You'll need to check your code, verify that you sent us the right error message, and/or contact Intel MPI technical support.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] difference between OpenMPI - intel MPI mpi_waitall
Diego,

First, you can double-check that the program you are running has been compiled from your sources.

Then you can run your program under a debugger and browse the stack when it crashes.

There could be a bug in Intel MPI that incorrectly translates a 2 in Fortran into a 3 in C ... but as far as I am concerned, that is extremely unlikely.

Cheers,

Gilles

On Friday, January 29, 2016, Diego Avesani wrote:
> In all my code I have
>
>    CALL MPI_WAITALL(2,REQUEST,send_status_list,MPIdata%iErr)
>
> How can the count become "3"?
Re: [OMPI users] difference between OpenMPI - intel MPI mpi_waitall
Dear all,

I am really sorry for the time that you dedicated to me. This is what I found:

   REQUEST = MPI_REQUEST_NULL
   !send data shared with UP
   IF(MPIdata%rank.NE.0)THEN
      MsgLength = MPIdata%imaxN
      DO icount=1,MPIdata%imaxN
         iNode = MPIdata%nodeFromUp(icount)
         send_messageL(icount) = R1(iNode)
      ENDDO
      CALL MPI_ISEND(send_messageL, MsgLength, MPIdata%AUTO_COMP, MPIdata%rank-1, MPIdata%rank, MPI_COMM_WORLD, REQUEST(1), MPIdata%iErr)
   ENDIF
   !
   !receive message FROM up CPU
   IF(MPIdata%rank.NE.MPIdata%nCPU-1)THEN
      MsgLength = MPIdata%imaxN
      CALL MPI_IRECV(recv_messageR, MsgLength, MPIdata%AUTO_COMP, MPIdata%rank+1, MPIdata%rank+1, MPI_COMM_WORLD, REQUEST(2), MPIdata%iErr)
   ENDIF
   CALL MPI_WAITALL(nMsg,REQUEST,send_status_list,MPIdata%iErr)
   IF(MPIdata%rank.NE.MPIdata%nCPU-1)THEN
      DO i=1,MPIdata%imaxN
         iNode=MPIdata%nodeList2Up(i)
         R1(iNode)=recv_messageR(i)
      ENDDO
   ENDIF

As you can see, the call uses nMsg, which is set equal to "3". Do I have to set it equal to 2 instead? Am I right?

Diego

On 29 January 2016 at 14:09, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> First, you can double-check that the program you are running has been compiled from your sources.
>
> Then you can run your program under a debugger and browse the stack when it crashes.
Re: [OMPI users] difference between OpenMPI - intel MPI mpi_waitall
> On Jan 29, 2016, at 9:43 AM, Diego Avesani wrote:
>
> This is what I found:
>
>    REQUEST = MPI_REQUEST_NULL

I'm not enough of a Fortran expert to know -- does this assign MPI_REQUEST_NULL to every entry in the REQUEST array?

>    !send data shared with UP
>    IF(MPIdata%rank.NE.0)THEN
>       MsgLength = MPIdata%imaxN
>       DO icount=1,MPIdata%imaxN
>          iNode = MPIdata%nodeFromUp(icount)
>          send_messageL(icount) = R1(iNode)
>       ENDDO
>       CALL MPI_ISEND(send_messageL, MsgLength, MPIdata%AUTO_COMP, MPIdata%rank-1, MPIdata%rank, MPI_COMM_WORLD, REQUEST(1), MPIdata%iErr)
>    ENDIF
>    !
>    !receive message FROM up CPU
>    IF(MPIdata%rank.NE.MPIdata%nCPU-1)THEN
>       MsgLength = MPIdata%imaxN
>       CALL MPI_IRECV(recv_messageR, MsgLength, MPIdata%AUTO_COMP, MPIdata%rank+1, MPIdata%rank+1, MPI_COMM_WORLD, REQUEST(2), MPIdata%iErr)
>    ENDIF

I only see you setting REQUEST(1) and REQUEST(2) above, so I would assume that you need to set nMsg to 2.

That being said, it's valid to pass MPI_REQUEST_NULL in to any of the MPI_WAIT/TEST functions. So it should be permissible to pass 3 in, if a) REQUEST is long enough, b) REQUEST(3) has been initialized to MPI_REQUEST_NULL, and c) send_status_list is long enough (you didn't include the declaration for it anywhere).

A major point: if REQUEST or send_status_list is only of length 2, then nMsg should not be larger than 2.

>    CALL MPI_WAITALL(nMsg,REQUEST,send_status_list,MPIdata%iErr)
>
> As you can see, the call uses nMsg, which is set equal to "3". Do I have to set it equal to 2 instead? Am I right?
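[To make the points above concrete, here is a minimal, self-contained Fortran sketch of the corrected exchange, written for this digest rather than taken from the thread: a request array of length 2 initialized to MPI_REQUEST_NULL, a status array sized to match, and a wait count equal to the array length. The buffer and variable names are illustrative, not Diego's actual declarations.]

   PROGRAM waitall_fixed_sketch
      USE mpi
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 4
      INTEGER :: REQUEST(2)                    ! slot 1: send, slot 2: receive
      INTEGER :: statuses(MPI_STATUS_SIZE, 2)  ! sized to match REQUEST
      INTEGER :: ierr, rank, nCPU
      DOUBLE PRECISION :: send_up(N), recv_down(N)

      CALL MPI_INIT(ierr)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nCPU, ierr)

      send_up   = DBLE(rank)
      recv_down = -1.0D0
      REQUEST   = MPI_REQUEST_NULL             ! both slots start out null

      ! Send to the lower-ranked neighbour, if any (tag = sender's rank).
      IF (rank .NE. 0) THEN
         CALL MPI_ISEND(send_up, N, MPI_DOUBLE_PRECISION, rank-1, rank, &
                        MPI_COMM_WORLD, REQUEST(1), ierr)
      ENDIF
      ! Receive from the higher-ranked neighbour, if any (tag = sender's rank).
      IF (rank .NE. nCPU-1) THEN
         CALL MPI_IRECV(recv_down, N, MPI_DOUBLE_PRECISION, rank+1, rank+1, &
                        MPI_COMM_WORLD, REQUEST(2), ierr)
      ENDIF

      ! Count = SIZE(REQUEST) = 2. Slots left at MPI_REQUEST_NULL complete
      ! immediately, so edge ranks that posted only one operation are fine.
      CALL MPI_WAITALL(SIZE(REQUEST), REQUEST, statuses, ierr)

      IF (rank .NE. nCPU-1) PRINT *, 'rank', rank, 'received', recv_down(1), 'from rank', rank+1
      CALL MPI_FINALIZE(ierr)
   END PROGRAM waitall_fixed_sketch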
Re: [OMPI users] difference between OpenMPI - intel MPI mpi_waitall
On Fri, Jan 29, 2016 at 2:45 AM, Diego Avesani wrote:
> I have created a program in Fortran and OpenMPI; I tested it on my laptop and it works.
> I would like to use it on a cluster that has, unfortunately, Intel MPI.

You can install any open-source MPI implementation from user space. This includes Open-MPI, MPICH, and MVAPICH2. If you like Open-MPI, try this:

   cd $OMPI_DIR && mkdir build && cd build && ../configure --prefix=$HOME/ompi-install && make -j && make install

... or something like that. I'm sure the details are properly documented online.

Jeff

--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/