Re: [OMPI users] ERROR: C_FUNLOC function
Hi Siegmar,

a similar issue was reported in MPICH with the xlf compilers:
http://trac.mpich.org/projects/mpich/ticket/2144

They concluded this is a compiler issue (i.e. the compiler does not implement TS 29113 subclause 8.1).

Jeff, I made PR 315 (https://github.com/open-mpi/ompi/pull/315), in which f08 binding support is disabled if TS 29113 subclause 8.1 is not implemented. Could you please review/comment on this?

Cheers,

Gilles

On 2014/12/12 2:28, Siegmar Gross wrote:
> Hi Jeff,
>
> will you have the time to fix the Fortran problem for the new Oracle
> Solaris Studio 12.4 compiler suite in openmpi-1.8.4?
>
> tyr openmpi-1.8.4rc2-SunOS.sparc.64_cc 103 tail -15 log.make.SunOS.sparc.64_cc
>   PPFC    comm_compare_f08.lo
>   PPFC    comm_connect_f08.lo
>   PPFC    comm_create_errhandler_f08.lo
>
>   fn = c_funloc(comm_errhandler_fn)
>        ^
> "../../../../../openmpi-1.8.4rc2/ompi/mpi/fortran/use-mpi-f08/comm_create_errhandler_f08.F90",
> Line = 22, Column = 18: ERROR: C_FUNLOC function argument must be
> a procedure that is interoperable or a procedure pointer associated with
> an interoperable procedure.
> ...
>
> Kind regards
>
> Siegmar
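For context: on the C side, the f08 wrapper hands the function pointer obtained from C_FUNLOC to MPI_Comm_create_errhandler. Below is a minimal sketch of the equivalent call sequence in plain C (the handler body and its name are illustrative; the MPI calls and the variadic callback signature are the real ones):

#include <mpi.h>
#include <stdio.h>

/* Same shape as the MPI_Comm_errhandler_function typedef in mpi.h. */
static void my_handler(MPI_Comm *comm, int *errcode, ...)
{
    printf("MPI reported error code %d\n", *errcode);
}

int main(int argc, char *argv[])
{
    MPI_Errhandler eh;

    MPI_Init(&argc, &argv);
    /* Where this passes my_handler, the f08 binding passes
     * c_funloc(comm_errhandler_fn); TS 29113 subclause 8.1 is what
     * allows c_funloc to accept a non-BIND(C) Fortran procedure. */
    MPI_Comm_create_errhandler(my_handler, &eh);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, eh);
    MPI_Errhandler_free(&eh);
    MPI_Finalize();
    return 0;
}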
Re: [OMPI users] ERROR: C_FUNLOC function
Hi Gilles,

> a similar issue was reported in MPICH with the xlf compilers:
> http://trac.mpich.org/projects/mpich/ticket/2144
>
> They concluded this is a compiler issue (i.e. the compiler does not
> implement TS 29113 subclause 8.1)

Thank you very much. I'll report the problem to Oracle and perhaps they can and will fix their compiler problem.

Kind regards

Siegmar
Re: [OMPI users] OpenMPI 1.8.4 and hwloc in Fedora 14 using a beta gcc 5.0 compiler.
Hi Brice,

----- Original message -----
> From: "Brice Goglin"
> CC: "Open MPI Users"
> Sent: Thursday, December 11, 2014 19:46:44
> Subject: Re: [OMPI users] OpenMPI 1.8.4 and hwloc in Fedora 14 using a beta gcc 5.0 compiler.
>
> This problem was fixed in hwloc upstream recently.
>
> https://github.com/open-mpi/hwloc/commit/790aa2e1e62be6b4f37622959de9ce3766ebc57e

Great! However, yesterday I downloaded versions 1.8.3 (stable) and 1.8.4rc3 of OpenMPI and tried to use the more traditional configuration. It was OK on ia64 (as before) but failed again on ia32. Then again, I had to use the external installation of hwloc in order to fix it.

Regards.
Jorge.

> On 11/12/2014 23:40, Jorge D'Elia wrote:
> > Dear Jeff,
> >
> > Our updates of OpenMPI to 1.8.3 (and 1.8.4) were all OK using
> > Fedora >= 17 and the system gcc compilers on ia32 or ia64 machines.
> >
> > However, the "make all" step failed using Fedora 14 with a beta
> > gcc 5.0 compiler on an ia32 machine, with a message like:
> >
> >   Error: symbol `Lhwloc1' is already defined
> >
> > A roundabout way to solve it was to perform, first, a separate
> > installation of the hwloc package (we use release v1.10.0 (stable))
> > and, second, to configure OpenMPI using its flag:
> >
> >   --with-hwloc=${HWLOC_HOME}
> >
> > although, in this way, the include and library paths must be given, e.g.
> >
> >   export CFLAGS="-I/usr/beta/hwloc/include" ; echo ${CFLAGS}
> >   export LDFLAGS="-L/usr/beta/hwloc/lib"    ; echo ${LDFLAGS}
> >   export LIBS="-lhwloc"                     ; echo ${LIBS}
> >
> > In order to verify that hwloc works OK, it would be useful to include
> > in the OpenMPI distribution a simple test like
> >
> >   $ gcc ${CFLAGS} ${LDFLAGS} -o hwloc-hello.exe hwloc-hello.c ${LIBS}
> >   $ ./hwloc-hello.exe
> >
> > (we apologize for forgetting to use the --with-hwloc-libdir flag ...).
> >
> > With this previous step we could overcome the fatal error in the
> > configuration step related to the hwloc package. This (fixed) trouble
> > is the same as the one reported on 2014-08-12 as:
> >
> >   Open MPI 1.8.1: "make all" error: symbol `Lhwloc1' is already defined
> >
> > Regards,
> > Jorge.
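The hwloc-hello.c Jorge mentions can be as small as the following sketch against the public hwloc C API (hwloc's own documentation ships a similar example); it does nothing more than verify that the headers and library actually build, link, and run:

#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topology;
    int ncores;

    /* Build the topology of the local machine and count its cores. */
    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);
    ncores = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE);
    printf("hwloc sees %d core(s)\n", ncores);

    hwloc_topology_destroy(topology);
    return 0;
}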
Re: [OMPI users] OpenMPI 1.8.4 and hwloc in Fedora 14 using a beta gcc 5.0 compiler.
On 15/12/2014 10:35, Jorge D'Elia wrote:
> Hi Brice,
>
>> This problem was fixed in hwloc upstream recently.
>>
>> https://github.com/open-mpi/hwloc/commit/790aa2e1e62be6b4f37622959de9ce3766ebc57e
>
> Great! However, yesterday I downloaded versions 1.8.3 (stable) and
> 1.8.4rc3 of OpenMPI and tried to use the more traditional configuration.
> It was OK on ia64 (as before) but failed again on ia32. Then again,
> I had to use the external installation of hwloc in order to fix it.

It's fixed in "upstream hwloc", not in OMPI yet. I have prepared a long branch of hwloc fixes that OMPI should pull, but it will take some time.

Thanks
Brice
Re: [OMPI users] OpenMPI 1.8.4 and hwloc in Fedora 14 using a beta gcc 5.0 compiler.
FWIW, if it would be easier, we can just pull a new hwloc tarball -- that's how we've done it in the past (vs. trying to pull individual patches). It's also easier to pull a release tarball, because then we can say "hwloc vX.Y.Z is in OMPI vA.B.C", rather than have to try to examine/explain what exact level of hwloc is in OMPI (based on patches, etc.).

On Dec 15, 2014, at 4:39 AM, Brice Goglin wrote:

> It's fixed in "upstream hwloc", not in OMPI yet. I have prepared a long
> branch of hwloc fixes that OMPI should pull, but it will take some time.

--
Jeff Squyres
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
Hi Gilles,

here is a simple setup that makes valgrind complain now:

export too_long=./this/is/a_very/long/path/that/contains/a/not/so/long/filename/but/trying/to/collectively/mpi_file_open/it/you/will/have/a/memory/corruption/resulting/of/invalide/writing/or/reading/past/the/end/of/one/or/some/hidden/strings/in/mpio/Simple/user/would/like/to/have/the/parameter/checked/and/an/error/returned/or/this/limit/removed
mkdir -p $too_long
echo "hello world." > $too_long/toto.txt

mpicc -o bug_MPI_File_open_path_too_long bug_MPI_File_open_path_too_long.c
mpirun -np 2 valgrind ./bug_MPI_File_open_path_too_long $too_long/toto.txt

and look at the valgrind errors for invalid reads/writes on ranks 0 and 1. This particular simple case doesn't segfault without valgrind but, as reported, in my real case it does!

Thanks!

Eric

bug_MPI_File_open_path_too_long.c:

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void abortOnError(int ierr)
{
    if (ierr != MPI_SUCCESS) {
        printf("ERROR Returned by MPI: %d\n", ierr);
        char *lCharPtr = (char *) malloc(sizeof(char) * MPI_MAX_ERROR_STRING);
        int lLongueur = 0;
        MPI_Error_string(ierr, lCharPtr, &lLongueur);
        printf("ERROR_string Returned by MPI: %s\n", lCharPtr);
        free(lCharPtr);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}

int openFileCollectivelyAndReadMyFormat(char *pFileName)
{
    int lReturnValue = 0;
    MPI_File lFile = 0;

    printf("Opening the file by MPI_file_open : %s\n", pFileName);
    abortOnError(MPI_File_open(MPI_COMM_WORLD, pFileName, MPI_MODE_RDONLY,
                               MPI_INFO_NULL, &lFile));
    /* printf("ierr=%d, lFile=%ld, lFile == MPI_FILE_NULL ? %d", ierr, lFile,
              lFile == MPI_FILE_NULL); */

    long int lTrois = 0;
    char lCharGIS[] = "123\0";
    long int lOnze = 0;                       /* unused in this reduced test */
    char lCharVersion10[] = "12345678901\0";  /* unused in this reduced test */

    abortOnError(MPI_File_read_all(lFile, &lTrois, 1, MPI_LONG,
                                   MPI_STATUS_IGNORE));
    abortOnError(MPI_File_read_all(lFile, lCharGIS, 3, MPI_CHAR,
                                   MPI_STATUS_IGNORE));

    if (3 != lTrois) {
        lReturnValue = 1;
    }
    if (0 == lReturnValue && 0 != strcmp(lCharGIS, "123\0")) {
        lReturnValue = 2;
    }
    if (lFile) {
        printf(" ...closing the file %s\n", pFileName);
        abortOnError(MPI_File_close(&lFile));
    }
    return lReturnValue;
}

int main(int argc, char *argv[])
{
    char lValeur[1024];                /* unused leftovers in this reduced test */
    char *lHints[] = {"cb_nodes", "striping_factor", "striping_unit"};
    int flag;

    MPI_Init(&argc, &argv);
    if (2 != argc) {
        printf("ERROR: you must specify a filename to create.\n");
        MPI_Finalize();
        return 1;
    }
    if (strlen(argv[1]) < 256) {
        printf("ERROR: you must specify a path+filename longer than 256 to have the bug!\n");
        MPI_Finalize();
        return 1;
    }
    int lResult = 0;
    int i;
    for (i = 0; i < 10; ++i) {
        lResult |= openFileCollectivelyAndReadMyFormat(argv[1]);
    }
    MPI_Finalize();
    return lResult;
}
Re: [OMPI users] OpenMPI 1.8.4 and hwloc in Fedora 14 using a beta gcc 5.0 compiler.
Sorry, I should have been clearer - that was indeed what I was expecting to see. I guess it raises the question - should we just update to something like 1.9 so Brice doesn't have to worry about backporting future fixes this far back?

On Mon, Dec 15, 2014 at 7:22 AM, Jeff Squyres (jsquyres) wrote:

> FWIW, if it would be easier, we can just pull a new hwloc tarball --
> that's how we've done it in the past (vs. trying to pull individual
> patches).
Re: [OMPI users] OpenMPI 1.8.4 and hwloc in Fedora 14 using a beta gcc 5.0 compiler.
It's your call, v1.8 RM. :-)

On the one hand, we've tried to stick with a consistent version of hwloc through an entire version series. But on the other hand, hwloc is wholly internal and shouldn't be visible to apps. So it *might* be harmless to upgrade it.

The only real question is: will upgrading hwloc break anything else inside the v1.8 tree? E.g., did new hwloc abstractions/APIs come in after v1.7 that we've adapted to on the trunk, but didn't adapt to on the v1.8 branch?

On Dec 15, 2014, at 10:35 AM, Ralph Castain wrote:

> Sorry, I should have been clearer - that was indeed what I was expecting
> to see. I guess it raises the question - should we just update to
> something like 1.9 so Brice doesn't have to worry about backporting
> future fixes this far back?

--
Jeff Squyres
Re: [OMPI users] OpenMPI 1.8.4 and hwloc in Fedora 14 using a beta gcc 5.0 compiler.
On 15/12/2014 16:39, Jeff Squyres (jsquyres) wrote:
> The only real question is: will upgrading hwloc break anything else
> inside the v1.8 tree? E.g., did new hwloc abstractions/APIs come in
> after v1.7 that we've adapted to on the trunk, but didn't adapt to on
> the v1.8 branch?

I wouldn't expect any such problem when upgrading from hwloc 1.7 to 1.9.

Brice
Re: [OMPI users] OpenMPI 1.8.4 and hwloc in Fedora 14 using a beta gcc 5.0 compiler.
Yeah, I recall it was quite clean when I did the upgrade on the trunk. I may take a pass at it and see if anything breaks, since it is so easy now to do. :-)

On Mon, Dec 15, 2014 at 8:17 AM, Brice Goglin wrote:

> I wouldn't expect any such problem when upgrading from hwloc 1.7 to 1.9.
Re: [OMPI users] MPI inside MPI (still)
George,

Thanks for the tip. In fact, calling mpi_comm_spawn right away with MPI_COMM_SELF has worked for me just as well -- no subgroups needed at all.

I am testing this openmpi app named "siesta" in parallel. The source code is available, so making it "spawn ready" by adding the pair mpi_comm_get_parent + mpi_comm_disconnect into the main code can be done. If it works, maybe siesta's developers can be convinced to add this feature in a future release.

However, siesta is launched only by specifying input/output files with i/o redirection, like

mpirun -n <some number> siesta < infile > outfile

So far, I could not find anything about how to set a stdin file for a spawnee process. Specifying it in an app context file doesn't seem to work. Can it be done? Maybe through an MCA parameter?

Alex

2014-12-15 2:43 GMT-02:00 George Bosilca:
> Alex,
>
> The code looks good, and is 100% MPI standard accurate.
>
> I would change the way you create the subcoms in the parent. You do a
> lot of useless operations, as you can achieve exactly the same outcome
> (one communicator per node), either by duplicating MPI_COMM_SELF or
> doing an MPI_Comm_split with the color equal to your rank.
>
> George.
>
> On Sun, Dec 14, 2014 at 2:20 AM, Alex A. Schmidt wrote:
>> Sorry, guys. I don't think the newbie here can follow any discussion
>> beyond basic mpi...
>>
>> Anyway, if I add the pair
>>
>>   call MPI_COMM_GET_PARENT(mpi_comm_parent,ierror)
>>   call MPI_COMM_DISCONNECT(mpi_comm_parent,ierror)
>>
>> on the spawnee side I get the proper response in the spawning
>> processes. Please take a look at the attached toy codes parent.F and
>> child.F I've been playing with. 'mpirun -n 2 parent' seems to work as
>> expected.
>>
>> Alex
>>
>> 2014-12-13 23:46 GMT-02:00 Gilles Gouaillardet:
>>> Are you calling MPI_Comm_disconnect in the 3 "master" tasks and with
>>> the same remote communicator ?
>>>
>>> I also read the man page again, and MPI_Comm_disconnect does not
>>> ensure the remote processes have finished or called
>>> MPI_Comm_disconnect, so that might not be the thing you need.
>>> George, can you please comment on that ?
>>>
>>> George Bosilca wrote:
>>> MPI_Comm_disconnect should be a local operation, there is no reason
>>> for it to deadlock. I looked at the code and everything is local with
>>> the exception of a call to PMIX.FENCE. Can you attach to your
>>> deadlocked processes and confirm that they are stopped in the
>>> pmix.fence?
>>>
>>> On Sat, Dec 13, 2014 at 8:47 AM, Alex A. Schmidt wrote:
>>>> Sorry, I was calling mpi_comm_disconnect on the group comm handler,
>>>> not on the intercomm handler returned from the spawn call as it
>>>> should be. Well, calling the disconnect on the intercomm handler
>>>> does halt the spawner side, but the wait is never completed since,
>>>> as George points out, there is no disconnect call being made on the
>>>> spawnee side -- and that brings me back to the beginning of the
>>>> problem since, being a third-party app, that call would never be
>>>> there. I guess an mpi wrapper to deal with that could be made for
>>>> the app, but I feel the wrapper itself, in the end, would face the
>>>> same problem we face right now.
>>>>
>>>> My application is a genetic algorithm code that searches for optimal
>>>> configurations (minimum or maximum energy) of clusters of atoms. The
>>>> workflow bottleneck is the calculation of the cluster energy. For
>>>> the cases where an analytical potential is available, the
>>>> calculation can be made internally and the workload is distributed
>>>> among slave nodes from a master node. When an analytical potential
>>>> is not available, the energy calculation must be done externally by
>>>> a quantum chemistry code like dftb+, siesta or Gaussian. So far, we
>>>> have been running these codes in serial mode. No need to say that we
>>>> could do a lot better if they could be executed in parallel.
>>>>
>>>> I am not familiar with DRMAA, but it seems to be the right choice to
>>>> deal with job schedulers, as it covers the ones I am interested in
>>>> (pbs/torque and loadleveler).
>>>>
>>>> Alex
>>>>
>>>> 2014-12-13 7:49 GMT-02:00 Gilles Gouaillardet:
>>>>> George is right about the semantics. However, I am surprised it
>>>>> returns immediately... that should either work or hang, imho.
>>>>> The second point is no longer MPI related, and is batch-manager
>>>>> specific. You will likely find a submit parameter to make the
>>>>> command block until the job completes. Or you can write your own
>>>>> wrapper. Or you can retrieve the jobid and qstat periodically to
>>>>> get the job state. If an API is available, that is also an option.
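For readers following the thread, here is a minimal sketch in C of the spawn pattern under discussion: each parent rank spawns workers over MPI_COMM_SELF, and the child side calls MPI_Comm_get_parent + MPI_Comm_disconnect when it is done ("worker" is a placeholder command name, not part of siesta):

#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent, intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Parent: spawn 2 copies of the worker from this rank alone. */
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&intercomm);
    } else {
        /* Child: do the real work here, then sever the link so the
         * parent's disconnect can complete. */
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}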
Re: [OMPI users] MPI inside MPI (still)
You should be able to just include that in the argv that you pass to the Comm_spawn API.

On Mon, Dec 15, 2014 at 9:27 AM, Alex A. Schmidt wrote:

> However, siesta is launched only by specifying input/output files with
> i/o redirection, like
>
>   mpirun -n <some number> siesta < infile > outfile
>
> So far, I could not find anything about how to set a stdin file for a
> spawnee process. Specifying it in an app context file doesn't seem to
> work. Can it be done? Maybe through an MCA parameter?
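Concretely, the suggestion looks like the fragment below (the file names are hypothetical). One caveat, consistent with what Alex reports next: MPI_Comm_spawn hands argv entries to the spawned program as literal command-line arguments -- no shell is involved -- so a "<" token is not interpreted as stdin redirection.

#include <mpi.h>

int spawn_siesta(MPI_Comm *intercomm)
{
    /* Each entry is passed to siesta literally, redirection included. */
    char *spawn_argv[] = { "<", "infile", NULL };
    return MPI_Comm_spawn("siesta", spawn_argv, 2, MPI_INFO_NULL, 0,
                          MPI_COMM_SELF, intercomm, MPI_ERRCODES_IGNORE);
}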
Re: [OMPI users] MPI inside MPI (still)
Ralph,

I guess you mean "call mpi_comm_spawn( 'siesta', '< infile', 2, ...)" to execute 'mpirun -n 2 siesta < infile' on the spawnee side. That was my first choice. Well, siesta behaves as if no stdin file was present...

Alex

2014-12-15 17:07 GMT-02:00 Ralph Castain:

> You should be able to just include that in the argv that you pass to
> the Comm_spawn API.
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
Eric,

thanks for the simple test program.

I think I see what is going wrong, and I will make some changes to avoid the memory overflow.

That being said, there is a hard-coded limit of 256 characters, and your path is bigger than 300 characters. Bottom line: even once the memory overflow is gone, that cannot work as expected.

I will report this to the MPICH folks, since ROMIO is currently imported from MPICH.

Cheers,

Gilles

On 2014/12/16 0:16, Eric Chamberland wrote:
> Hi Gilles,
>
> just created a very simple test case!
>
> with this setup, you will see the bug with valgrind:
>
> export too_long=./this/is/a_very/long/path/that/contains/a/not/so/long/filename/but/trying/to/collectively/mpi_file_open/it/you/will/have/a/memory/corruption/resulting/of/invalide/writing/or/reading/past/the/end/of/one/or/some/hidden/strings/in/mpio/Simple/user/would/like/to/have/the/parameter/checked/and/an/error/returned/or/this/limit/removed
>
> mpicc -o bug_MPI_File_open_path_too_long bug_MPI_File_open_path_too_long.c
>
> mkdir -p $too_long
> echo "header of a text file" > $too_long/toto.txt
>
> mpirun -np 2 valgrind ./bug_MPI_File_open_path_too_long $too_long/toto.txt
>
> and watch the errors!
>
> unfortunately, the memory corruptions here don't seem to segfault this
> simple test case, but in my case it is fatal and, with valgrind, it is
> reported...
>
> OpenMPI 1.6.5 and 1.8.3rc3 are affected.
>
> MPICH-3.1.3 also has the error!
>
> thanks,
>
> Eric
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
Eric and all,

That is clearly a limitation in ROMIO, and it is being tracked at https://trac.mpich.org/projects/mpich/ticket/2212

In the meantime, what we can do in OpenMPI is update mca_io_romio_file_open() and fail with a user-friendly error message if strlen(filename) is larger than 225.

Cheers,

Gilles

On 2014/12/16 12:43, Gilles Gouaillardet wrote:
> I think I see what is going wrong, and I will make some changes to
> avoid the memory overflow.
>
> That being said, there is a hard-coded limit of 256 characters, and
> your path is bigger than 300 characters. Bottom line: even once the
> memory overflow is gone, that cannot work as expected.
>
> I will report this to the MPICH folks, since ROMIO is currently
> imported from MPICH.
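Until such a check lands, a caller-side guard of the kind described above might look like the sketch below (the wrapper name and the choice of error class are illustrative assumptions; the real fix would live inside mca_io_romio_file_open()):

#include <string.h>
#include <mpi.h>

#define ROMIO_NAME_LIMIT 225  /* limit quoted in the message above */

/* Refuse over-long paths up front instead of letting ROMIO overflow
 * its internal fixed-size buffers. */
int checked_file_open(MPI_Comm comm, const char *filename, int amode,
                      MPI_Info info, MPI_File *fh)
{
    if (filename == NULL || strlen(filename) > ROMIO_NAME_LIMIT) {
        return MPI_ERR_BAD_FILE;  /* MPI's "invalid file name" error class */
    }
    return MPI_File_open(comm, (char *) filename, amode, info, fh);
}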