[OMPI users] OpenMPI 1.6.5 and IBM-AIX
Dear experts,

I am trying to build the OpenMPI 1.6.5 package with the IBM AIX compiler suite:

./configure --prefix=/gpfs/home/ilias/bin/openmpi_xl CXX=xlC CC=xlc F77=xlf FC=xlf90

XL Fortran is version 13.01; xlc/xlC is 11.01.

Configuration goes well, but the compilation fails. Any help, please?

Making all in mca/timer/aix
make[2]: Entering directory `/gpfs/home/ilias/bin/openmpi_xl/openmpi-1.6.5/opal/mca/timer/aix'
  CC     timer_aix_component.lo
"timer_aix_component.c", line 68.10: 1506-045 (S) Undeclared identifier OPAL_SUCCESS.
"timer_aix_component.c", line 69.1: 1506-162 (W) No definition was found for function opal_atomic_sub_32. Storage class changed to extern.
"timer_aix_component.c", line 69.1: 1506-162 (W) No definition was found for function opal_atomic_add_32. Storage class changed to extern.
make[2]: *** [timer_aix_component.lo] Error 1
make[2]: Leaving directory `/gpfs/home/ilias/bin/openmpi_xl/openmpi-1.6.5/opal/mca/timer/aix'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/gpfs/home/ilias/bin/openmpi_xl/openmpi-1.6.5/opal'
make: *** [all-recursive] Error 1

ilias@147.213.80.175:~/bin/openmpi_xl/openmpi-1.6.5/.
Re: [OMPI users] OpenMPI 1.6.5 and IBM-AIX
Hi again,

even for GNU compilers the OpenMPI compilation fails on AIX:
.
.
.
Making all in mca/timer/aix
make[2]: Entering directory `/gpfs/home/ilias/bin/openmpi_gnu/openmpi-1.6.5/opal/mca/timer/aix'
  CC     timer_aix_component.lo
timer_aix_component.c: In function 'opal_timer_aix_open':
timer_aix_component.c:68:10: error: 'OPAL_SUCCESS' undeclared (first use in this function)
timer_aix_component.c:68:10: note: each undeclared identifier is reported only once for each function it appears in
timer_aix_component.c: At top level:
../../../../opal/include/opal/sys/atomic.h:393:9: warning: 'opal_atomic_add_32' used but never defined [enabled by default]
../../../../opal/include/opal/sys/atomic.h:403:9: warning: 'opal_atomic_sub_32' used but never defined [enabled by default]
make[2]: *** [timer_aix_component.lo] Error 1
make[2]: Leaving directory `/gpfs/home/ilias/bin/openmpi_gnu/openmpi-1.6.5/opal/mca/timer/aix'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/gpfs/home/ilias/bin/openmpi_gnu/openmpi-1.6.5/opal'
make: *** [all-recursive] Error 1

From: Ilias Miroslav
Sent: Saturday, July 06, 2013 1:51 PM
To: us...@open-mpi.org
Subject: OpenMPI 1.6.5 and IBM-AIX
Re: [OMPI users] OpenMPI 1.6.5 and IBM-AIX
We haven't had access to an AIX machine in quite some time, so it isn't a big surprise that things have bit-rotted. If you're willing to debug, we can try to provide fixes. It just may take a bit to complete.

On Jul 6, 2013, at 9:49 AM, Ilias Miroslav wrote:

> Hi again,
>
> even for GNU compilers the OpenMPI compilation fails on AIX:
>
> timer_aix_component.c: In function 'opal_timer_aix_open':
> timer_aix_component.c:68:10: error: 'OPAL_SUCCESS' undeclared (first use in this function)
> ../../../../opal/include/opal/sys/atomic.h:393:9: warning: 'opal_atomic_add_32' used but never defined [enabled by default]
> ../../../../opal/include/opal/sys/atomic.h:403:9: warning: 'opal_atomic_sub_32' used but never defined [enabled by default]
> make[2]: *** [timer_aix_component.lo] Error 1
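For anyone who does want to poke at this, a plausible first debugging step is to check whether the AIX timer component simply lost access to the header that declares OPAL_SUCCESS. The fragment below is only a sketch, not a verified upstream fix: it assumes OPAL_SUCCESS comes from opal/constants.h (as it does for the other opal components) and that the include is missing or unreachable in this file of the 1.6.5 tree.

/* Hypothetical sketch only -- verify against your own 1.6.5 source tree.
 * Both XL and GCC report OPAL_SUCCESS as undeclared in
 * opal/mca/timer/aix/timer_aix_component.c; OPAL_SUCCESS is normally
 * provided by opal/constants.h, so adding that include near the top of
 * the file is the obvious thing to try first: */
#include "opal_config.h"
#include "opal/constants.h"                 /* candidate missing include (assumption) */
#include "opal/mca/timer/aix/timer_aix.h"

The opal_atomic_add_32 / opal_atomic_sub_32 "used but never defined" warnings are a separate matter: they suggest configure did not select working inline-assembly atomics for this AIX/PowerPC target, which would need its own investigation.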
Re: [OMPI users] openmpi 1.6.3 fails to identify local host if its IP is 127.0.1.1
On Jul 3, 2013, at 1:00 PM, Riccardo Murri wrote:

> Hi Jeff, Ralph,
>
> first of all: thanks for your work on this!
>
> On 3 July 2013 21:09, Jeff Squyres (jsquyres) wrote:
>> 1. The root cause of the issue is that you are assigning a
>> non-existent IP address to a name. I.e., the hostname maps to 127.0.1.1,
>> but that IP address does not exist anywhere. Hence, OMPI will never
>> conclude that that host is "local". If you had assigned the hostname to
>> the 127.0.0.1 address, things should have worked fine.
>
> Ok, I see. Would that have worked also if I had added the 127.0.1.1
> address to the "lo" interface (in addition to 127.0.0.1)?

Probably, but I can't say for sure.

>> Just curious: why are you doing this?
>
> It's commonplace in Ubuntu/Debian installations; see, e.g.,
> http://serverfault.com/questions/363095/what-does-127-0-1-1-represent-in-etc-hosts
>
> In our case, it was rolled out as a fix for some cron job running on
> Apache servers (apparently Debian's Apache looks up 127.0.1.1 and uses
> that as the ServerName unless a server name is explicitly
> configured), which was later extended to all hosts because "what harm
> can it do?".
>
> (Needless to say, we have rolled back the change.)

Weird - never heard of that before!

>> 2. That being said, OMPI is not currently looking at all the
>> responses from gethostbyname() -- we're only looking at the first
>> one. In the spirit of how clients are supposed to behave when
>> multiple IP addresses are returned from a single name lookup, OMPI
>> should examine all of those addresses and see if it finds one that
>> it "likes", and then use that. So we should extend OMPI to examine
>> all the IP addresses from gethostbyname().
>
> Just for curiosity: would it have worked, had I compiled OMPI with
> IPv6 support? (As far as I understand IPv6, an application is
> required to examine all the addresses returned for a host name, and
> not just pick the first one.)

Actually, yes - for some reason, the code path when IPv6 support is enabled had already been extended to look at all addresses. Not sure why, but that change was never carried over to the IPv6-disabled code path. I've done so now, so this won't be a problem in the future.

>> Ralph is going to work on this, but it'll likely take him a little
>> time to get it done. We'll get it into the trunk and probably ask
>> you to verify that it works for you. And if so, we'll back-port to
>> the v1.6 and v1.7 series.
>
> I'm glad to help and verify, but I guess we do not need the backport
> or an urgent fix. The easy workaround for us was to remove the
> 127.0.1.1 line from the compute nodes (we keep it only on Apache
> servers where it originated).

Glad you found an easy solution!

Ralph

> Thanks,
> Riccardo
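As an aside, the "examine all of the addresses" behaviour described above is straightforward with the classic resolver API. The program below is not Open MPI source, just a minimal, self-contained illustration of walking everything gethostbyname() returns instead of stopping at h_addr_list[0].

#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <arpa/inet.h>

/* Minimal illustration (not OMPI code): print every IPv4 address that
 * gethostbyname() returns for a name, rather than only the first one. */
int main(int argc, char **argv)
{
    const char *name = (argc > 1) ? argv[1] : "localhost";
    struct hostent *he = gethostbyname(name);

    if (he == NULL) {
        herror("gethostbyname");
        return 1;
    }
    if (he->h_addrtype != AF_INET) {
        fprintf(stderr, "%s: not an IPv4 entry\n", name);
        return 1;
    }
    for (char **ap = he->h_addr_list; *ap != NULL; ++ap) {
        struct in_addr addr;
        memcpy(&addr, *ap, sizeof(addr));
        printf("%s -> %s\n", name, inet_ntoa(addr));
        /* A resolver-aware client would test each address here and keep
         * the first one it "likes" (e.g., one configured on a local NIC). */
    }
    return 0;
}

Run against a host with a multi-address /etc/hosts entry, this makes it easy to see why looking only at the first result can land on an address (such as 127.0.1.1) that is not configured on any interface.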
[OMPI users] Trouble with MPI_Recv not filling buffer
Hello,

I am currently learning MPI, and there is a problem I have been dealing with for a very long time now. I am trying to receive a struct, and in some very specific cases (when I run with 2/3/4 instances and calculate exactly the same number of data items) the receive buffer is not filled. For some weird reason it seems to work as soon as I have a lot of data to calculate (starting with N=5, I cannot reproduce the problem).

--- snip ---
data p;
p.collection = malloc(sizeof(int)*N);

printf("[%d] before receiving, data id %d at %d with direction %d\n",me,p.id,p.position,p.direction);

MPI_Status data_status;
MPI_Recv(&p,1,MPI_data,MPI_ANY_SOURCE,99,MPI_COMM_WORLD,&data_status);
if(data_status.MPI_ERROR != MPI_SUCCESS) {
    printf("[%d] ERROR %d",me,data_status.MPI_ERROR);
    return -1;
}
printf("[%d] received status %d\n",data_status.MPI_ERROR);
received++;
printf("[%0d] received data %d (%d/%d) at position %d with direction %d\n",me,p.id,received,expected,p.position,p.direction);
--- snip ---

I get this output:

[1] before receiving, data id -1665002272 at 0 with direction 0
[0] received status 0
[1] received data -1665002272 (1/2) at position 0 with direction 0

I am wondering if you have any hint for me as to why p still does not contain the correct data, just the old, uninitialized values, and why I don't get any error. Also, I really have no idea why instance 0 is printing this status information, as it does not enter this section at all. Is this some kind of optimization that I have to turn off?

Thanks for all hints,
Patrick
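Two things worth checking here, offered only as guesses since the definitions of the data struct and of the MPI_data datatype were not posted. First, the "received status" printf is missing the me argument before data_status.MPI_ERROR, so the rank shown on that line is really the MPI_ERROR value (0) and the second %d prints garbage; that explains why the output appears to come from instance 0. (Also, per the MPI standard, the MPI_ERROR field of a status is only set by the multi-completion calls such as MPI_Waitall, so checking it after MPI_Recv is not meaningful.) Second, a derived datatype describes only the bytes inside the struct itself: a pointer member such as collection is transferred as a raw pointer value, not as the array it points to, and the pointed-to ints must be sent in a separate message. The sketch below assumes a hypothetical layout for the data struct and shows the usual MPI_Type_create_struct pattern for the fixed-size members only.

#include <mpi.h>
#include <stddef.h>   /* offsetof */

/* Hypothetical layout -- the real 'data' struct was not posted. */
typedef struct {
    int  id;
    int  position;
    int  direction;
    int *collection;   /* heap array: NOT carried by the datatype below */
} data;

/* Build a datatype covering only the three fixed ints. The contents of
 * 'collection' must be communicated separately (e.g., a second MPI_Send
 * of N MPI_INTs) or packed into one buffer with MPI_Pack. */
static MPI_Datatype make_data_type(void)
{
    MPI_Datatype MPI_data;
    int          blocklens[3]     = {1, 1, 1};
    MPI_Aint     displacements[3] = {offsetof(data, id),
                                     offsetof(data, position),
                                     offsetof(data, direction)};
    MPI_Datatype types[3]         = {MPI_INT, MPI_INT, MPI_INT};

    MPI_Type_create_struct(3, blocklens, displacements, types, &MPI_data);
    MPI_Type_commit(&MPI_data);
    return MPI_data;
}

On the receive side the struct is then filled with MPI_Recv(&p, 1, MPI_data, ...), after which p.collection still points at the receiver's own malloc'd memory, to be filled by a second receive.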
[OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 and 1.7.2
Hello OpenMPI,

I am wondering what level of support there is for CUDA and GPUdirect in OpenMPI 1.6.5 and 1.7.2.

I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However, it seems that configure for v1.6.5 ignored it.

Can you identify GPU memory and send messages from it directly, without copying to host memory first?

Or, in general, what level of CUDA support is there in 1.6.5 and 1.7.2? Do you support SDK 5.0 and above?

Cheers ...
Michael
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 and 1.7.2
Rolf will have to answer the question on the level of support. The CUDA code is not in the 1.6 series, as it was developed after that series went "stable". It is in the 1.7 series, although the level of support will likely increase incrementally as that "feature" series continues to evolve.

On Jul 6, 2013, at 12:06 PM, Michael Thomadakis wrote:

> Hello OpenMPI,
>
> I am wondering what level of support there is for CUDA and GPUdirect in OpenMPI 1.6.5 and 1.7.2.
>
> Can you identify GPU memory and send messages from it directly, without copying to host memory first?
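For reference, "CUDA-aware" support in the 1.7 series means device pointers can be handed straight to the point-to-point calls. The program below is only an illustration of that usage pattern, assuming an Open MPI build configured with --with-cuda (1.7.x) and two ranks that each have a GPU; it makes no claim about which transports use GPUDirect underneath.

#include <mpi.h>
#include <cuda_runtime.h>

/* Illustration of CUDA-aware point-to-point: the device buffer is passed
 * directly to MPI_Send/MPI_Recv, with no explicit cudaMemcpy to the host.
 * Requires a CUDA-aware Open MPI build (e.g. 1.7.x configured --with-cuda)
 * and at least two ranks. */
int main(int argc, char **argv)
{
    int rank, n = 1 << 20;
    float *d_buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaMalloc((void **)&d_buf, n * sizeof(float));

    if (rank == 0) {
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}

With a non-CUDA-aware build (such as plain 1.6.5), the same code would require an explicit cudaMemcpy to a host staging buffer before the send and after the receive.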
[OMPI users] Question on handling of memory for communications
Hello OpenMPI,

When your stack runs on Sandy Bridge nodes attached to HCAs over PCIe *gen 3*, do you pay any special attention to the memory buffers according to which socket/memory controller their physical memory belongs to?

For instance, if the HCA is attached to the PCIe gen 3 lanes of socket 1, do you do anything special when the read/write buffers map to physical memory belonging to socket 2? Or do you avoid using buffers mapped to memory that belongs to (is accessible via) the other socket?

Has this situation improved with Ivy Bridge systems or Haswell?

Cheers,
Michael
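Independently of what Open MPI does internally, it is easy to check this kind of locality yourself on Linux: the kernel can report which NUMA node (and therefore which socket's memory controller) the pages of a send buffer actually landed on. The sketch below uses move_pages() in its query mode (nodes == NULL) and assumes a Linux host with libnuma installed (link with -lnuma); it is a diagnostic aid, not a statement about Open MPI's buffer policy.

#include <numaif.h>    /* move_pages(); link with -lnuma */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Query-only use of move_pages(): with 'nodes' == NULL it does not move
 * anything; it reports, per page, the NUMA node the page currently resides
 * on (or a negative errno). Useful for checking whether a communication
 * buffer sits on the socket that hosts the HCA's PCIe lanes. */
int main(void)
{
    enum { NPAGES = 8 };
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char  *buf  = NULL;
    void  *pages[NPAGES];
    int    status[NPAGES];

    if (posix_memalign((void **)&buf, page, NPAGES * page) != 0) {
        perror("posix_memalign");
        return 1;
    }
    for (int i = 0; i < NPAGES; ++i) {
        buf[i * page] = 1;            /* touch so the page is really allocated */
        pages[i] = buf + i * page;
    }

    if (move_pages(0 /* self */, NPAGES, pages, NULL, status, 0) != 0) {
        perror("move_pages");
        return 1;
    }
    for (int i = 0; i < NPAGES; ++i)
        printf("page %d -> NUMA node %d\n", i, status[i]);

    free(buf);
    return 0;
}

Because Linux uses first-touch placement by default, where the buffer ends up is usually decided by which core first writes it, which is why process binding (e.g., mpirun --bind-to-core) matters for this question.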
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 and 1.7.2
Thanks.

Do you guys have any plan to support Intel Phi in the future? That is, running MPI code on the Phi cards, or across the multicore host and the Phi, as Intel MPI does?

Thanks ...
Michael

On Sat, Jul 6, 2013 at 2:36 PM, Ralph Castain wrote:

> Rolf will have to answer the question on the level of support. The CUDA code
> is not in the 1.6 series, as it was developed after that series went "stable".
> It is in the 1.7 series, although the level of support will likely increase
> incrementally as that "feature" series continues to evolve.
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 and 1.7.2
There was discussion of this on a prior email thread on the OMPI devel mailing list:

http://www.open-mpi.org/community/lists/devel/2013/05/12354.php

On Jul 6, 2013, at 2:01 PM, Michael Thomadakis wrote:

> Do you guys have any plan to support Intel Phi in the future? That is,
> running MPI code on the Phi cards, or across the multicore host and the Phi,
> as Intel MPI does?