[OMPI users] OpenMPI 1.6.5 and IBM-AIX

2013-07-06 Thread Ilias Miroslav
Dear experts,

I am trying to build the OpenMPI 1.6.5 package with the IBM XL compiler suite on AIX:

./configure --prefix=/gpfs/home/ilias/bin/openmpi_xl CXX=xlC CC=xlc F77=xlf FC=xlf90
XL Fortran is version 13.01; XL C/C++ is version 11.01.

Configuration goes well, but the compilation fails. Any help, please?



Making all in mca/timer/aix
make[2]: Entering directory 
`/gpfs/home/ilias/bin/openmpi_xl/openmpi-1.6.5/opal/mca/timer/aix'
  CC timer_aix_component.lo
"timer_aix_component.c", line 68.10: 1506-045 (S) Undeclared identifier 
OPAL_SUCCESS.
"timer_aix_component.c", line 69.1: 1506-162 (W) No definition was found for 
function opal_atomic_sub_32. Storage class changed to extern.
"timer_aix_component.c", line 69.1: 1506-162 (W) No definition was found for 
function opal_atomic_add_32. Storage class changed to extern.
make[2]: *** [timer_aix_component.lo] Error 1
make[2]: Leaving directory 
`/gpfs/home/ilias/bin/openmpi_xl/openmpi-1.6.5/opal/mca/timer/aix'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/gpfs/home/ilias/bin/openmpi_xl/openmpi-1.6.5/opal'
make: *** [all-recursive] Error 1
ilias@147.213.80.175:~/bin/openmpi_xl/openmpi-1.6.5/.
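
The 1506-045 error above means OPAL_SUCCESS is used in timer_aix_component.c
without its defining header in scope; in the Open MPI tree that constant lives
in opal/constants.h. A minimal sketch of a local patch one could try (the
include list shown is assumed, not copied from the actual file):

/* Hypothetical top of opal/mca/timer/aix/timer_aix_component.c -- a sketch
 * only, not an official fix.  Adding opal/constants.h is what addresses the
 * "Undeclared identifier OPAL_SUCCESS" diagnostic; the other headers stand in
 * for whatever the file already includes. */
#include "opal_config.h"
#include "opal/constants.h"              /* defines OPAL_SUCCESS */
#include "opal/mca/timer/timer.h"

The opal_atomic_add_32/opal_atomic_sub_32 warnings point at a second problem
(no atomics implementation is being selected for this compiler/platform), so
if this release supports it, another option is to leave the bit-rotted
component out entirely at configure time with --enable-mca-no-build=timer-aix.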



Re: [OMPI users] OpenMPI 1.6.5 and IBM-AIX

2013-07-06 Thread Ilias Miroslav
Hi again,

even with the GNU compilers, the OpenMPI compilation fails on AIX:
. 
. 
. 
Making all in mca/timer/aix
make[2]: Entering directory 
`/gpfs/home/ilias/bin/openmpi_gnu/openmpi-1.6.5/opal/mca/timer/aix'
  CC timer_aix_component.lo
timer_aix_component.c: In function 'opal_timer_aix_open':
timer_aix_component.c:68:10: error: 'OPAL_SUCCESS' undeclared (first use in 
this function)
timer_aix_component.c:68:10: note: each undeclared identifier is reported only 
once for each function it appears in
timer_aix_component.c: At top level:
../../../../opal/include/opal/sys/atomic.h:393:9: warning: 'opal_atomic_add_32' 
used but never defined [enabled by default]
../../../../opal/include/opal/sys/atomic.h:403:9: warning: 'opal_atomic_sub_32' 
used but never defined [enabled by default]
make[2]: *** [timer_aix_component.lo] Error 1
make[2]: Leaving directory 
`/gpfs/home/ilias/bin/openmpi_gnu/openmpi-1.6.5/opal/mca/timer/aix'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/gpfs/home/ilias/bin/openmpi_gnu/openmpi-1.6.5/opal'
make: *** [all-recursive] Error 1





Re: [OMPI users] OpenMPI 1.6.5 and IBM-AIX

2013-07-06 Thread Ralph Castain
We haven't had access to an AIX machine in quite some time, so it isn't a big 
surprise that things have bit-rotted. If you're willing to debug, we can try to 
provide fixes. Just may take a bit to complete.






Re: [OMPI users] openmpi 1.6.3 fails to identify local host if its IP is 127.0.1.1

2013-07-06 Thread Ralph Castain

On Jul 3, 2013, at 1:00 PM, Riccardo Murri  wrote:

> Hi Jeff, Ralph,
> 
> first of all: thanks for your work on this!
> 
> On 3 July 2013 21:09, Jeff Squyres (jsquyres)  wrote:
>> 1. The root cause of the issue is that you are assigning a
>> non-existent IP address to a name.  I.e., the hostname maps to 127.0.1.1,
>> but that IP address does not exist anywhere.  Hence, OMPI will never
>> conclude that that hostname is "local".  If you had assigned the name to
>> the 127.0.0.1 address, things should have worked fine.
> 
> Ok, I see.  Would that have worked also if I had added the 127.0.1.1
> address to the "lo" interface (in addition to 127.0.0.1)?

Probably, but I can't say for sure.

> 
> 
>> Just curious: why are you doing this?
> 
> It's commonplace in Ubuntu/Debian installations; see, e.g.,
> http://serverfault.com/questions/363095/what-does-127-0-1-1-represent-in-etc-hosts
> 
> In our case, it was rolled out as a fix for some cron job running on
> Apache servers (apparently Debian's Apache looks up 127.0.1.1 and uses
> that as the ServerName unless a server name is explicitly
> configured), which was later extended to all hosts because "what harm
> can it do?".
> 
> (Needless to say, we have rolled back the change.)

Weird - never heard of that before!

> 
> 
>> 2. That being said, OMPI is not currently looking at all the
>> responses from gethostbyname() -- we're only looking at the first
>> one.  In the spirit of how clients are supposed to behave when
>> multiple IP addresses are returned from a single name lookup, OMPI
>> should examine all of those addresses and see if it finds one that
>> it "likes", and then use that.  So we should extend OMPI to examine
>> all the IP addresses from gethostbyname().
> 
> Just for curiosity: would it have worked, had I compiled OMPI with
> IPv6 support?  (As far as I understand IPv6, an application is
> required to examine all the addresses returned for a host name, and
> not just pick the first one.)

Actually, yes - for some reason, the code path when IPv6 support is enabled had 
already been extended to look at all addresses. Not sure why, but that change 
was never carried over to the IPv6-disabled code path. I've done so now, so 
this won't be a problem in the future.
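
The "examine all of those addresses" behaviour described above boils down to
walking gethostbyname()'s full h_addr_list instead of stopping at the first
entry. A self-contained sketch of that idea (not the actual OMPI code; the
check against the node's configured interfaces is reduced to a plain loopback
comparison here):

#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the real "does this match one of my interfaces?" test. */
static int is_local_candidate(const struct in_addr *a)
{
    return a->s_addr == htonl(INADDR_LOOPBACK);
}

/* Walk every address the resolver returns, not just h_addr_list[0]. */
static int name_resolves_locally(const char *name)
{
    struct hostent *h = gethostbyname(name);
    if (h == NULL || h->h_addrtype != AF_INET) {
        return 0;
    }
    for (char **ap = h->h_addr_list; *ap != NULL; ++ap) {
        struct in_addr addr;
        memcpy(&addr, *ap, sizeof(addr));
        printf("candidate address: %s\n", inet_ntoa(addr));
        if (is_local_candidate(&addr)) {
            return 1;
        }
    }
    return 0;
}

int main(void)
{
    printf("localhost is local: %d\n", name_resolves_locally("localhost"));
    return 0;
}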

> 
> 
>> Ralph is going to work on this, but it'll likely take him a little
>> time to get it done.  We'll get it into the trunk and probably ask
>> you to verify that it works for you.  And if so, we'll back-port to
>> the v1.6 and v1.7 series.
> 
> I'm glad to help and verify, but I guess we do not need the backport
> or an urgent fix.  The easy workaround for us was to remove the
> 127.0.1.1 line from the compute nodes (we keep it only on Apache
> servers where it originated).

Glad you found an easy solution!
Ralph

> 
> Thanks,
> Riccardo




[OMPI users] Trouble with MPI_Recv not filling buffer

2013-07-06 Thread Patrick Brückner

Hello,

I am currently learning MPI, and there is a problem I have been dealing with
for a very long time now. I am trying to receive a struct, and in some very
specific cases (when I run with 2/3/4 instances and calculate exactly the same
number of data items) the receive buffer is not filled. For some weird reason
it seems to work as soon as I have more data to calculate (starting with N=5,
I cannot reproduce the problem).


--- snip ---
data p;
p.collection = malloc(sizeof(int)*N);

printf("[%d] before receiving, data id %d at %d with direction %d\n",
       me, p.id, p.position, p.direction);

MPI_Status data_status;
MPI_Recv(&p, 1, MPI_data, MPI_ANY_SOURCE, 99, MPI_COMM_WORLD, &data_status);
if (data_status.MPI_ERROR != MPI_SUCCESS) {
    printf("[%d] ERROR %d", me, data_status.MPI_ERROR);
    return -1;
}
printf("[%d] received status %d\n", data_status.MPI_ERROR);
received++;
printf("[%0d] received data %d (%d/%d) at position %d with direction %d\n",
       me, p.id, received, expected, p.position, p.direction);
--- snip ---

I get this output:

[1] before receiving, data id -1665002272 at 0 with direction 0
[0] received status 0
[1] received data -1665002272 (1/2) at position 0 with direction 0

I am wondering if you have any hint for me as to why p still does not contain
the correct data but only the old, uninitialized values, and why I don't get
any error. Also, I really have no idea why instance 0 is printing this status
information, as it does not enter this section at all. Is this some kind of
optimization that I have to turn off?


Thanks for all hints,
Patrick
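
A side note on the snippet itself, since neither point is obvious from the
output: the line printf("[%d] received status %d\n", data_status.MPI_ERROR);
supplies one argument for two %d conversions, so the puzzling
"[0] received status 0" line is really printed by rank 1, with MPI_ERROR
landing in the rank slot and a stray value in the second. Also, the MPI
standard only defines the MPI_ERROR field of a status for calls that complete
multiple requests (MPI_Waitall and friends); after a plain MPI_Recv it is the
function's return code that should be checked. A hedged rewrite of just that
fragment, reusing the variables from the snippet above:

MPI_Status data_status;
int rc = MPI_Recv(&p, 1, MPI_data, MPI_ANY_SOURCE, 99, MPI_COMM_WORLD,
                  &data_status);
if (rc != MPI_SUCCESS) {                /* check the return code rather than */
    printf("[%d] ERROR %d\n", me, rc);  /* the status's MPI_ERROR field      */
    return -1;
}
printf("[%d] received status %d\n", me, data_status.MPI_ERROR);  /* 'me' was missing */
received++;

Whether the struct arrives with the expected contents also depends on how the
MPI_data datatype was constructed; a pointer member such as p.collection is
transferred as a pointer value, not as the array it points to, but that cannot
be judged from the excerpt alone.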


[OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 and 1.7.2

2013-07-06 Thread Michael Thomadakis
Hello OpenMPI,

I am wondering what level of support there is for CUDA and GPUDirect in
OpenMPI 1.6.5 and 1.7.2.

I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However, it
seems that configure in v1.6.5 ignored it.

Can you identify GPU memory and send messages from it directly, without
copying to host memory first?


Or, in general, what level of CUDA support is there in 1.6.5 and 1.7.2? Do
you support CUDA SDK 5.0 and above?

Cheers ...
Michael
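
For context on what "send messages from it directly" means in practice: with a
CUDA-aware build (a 1.7-series feature, as the reply below explains), the
device pointer returned by cudaMalloc is passed straight to the MPI call and
the library handles the host staging or GPUDirect transfer internally. A
minimal sketch under those assumptions (CUDA-aware Open MPI, two ranks, one
visible GPU per rank):

#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    float *d_buf = NULL;
    cudaMalloc((void **)&d_buf, n * sizeof(float));   /* GPU memory */

    if (rank == 0) {
        cudaMemset(d_buf, 0, n * sizeof(float));
        /* device pointer handed directly to MPI -- no cudaMemcpy to the host */
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}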


Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 and 1.7.2

2013-07-06 Thread Ralph Castain
Rolf will have to answer the question on level of support. The CUDA code is not 
in the 1.6 series as it was developed after that series went "stable". It is in 
the 1.7 series, although the level of support will likely be incrementally 
increasing as that "feature" series continues to evolve.






[OMPI users] Question on handling of memory for communications

2013-07-06 Thread Michael Thomadakis
Hello OpenMPI,

When your stack runs on Sandy Bridge nodes attached to HCAs over PCIe gen 3,
do you pay any special attention to the memory buffers, i.e. to which
socket/memory controller their physical memory belongs?

For instance, if the HCA is attached to the PCIe gen 3 lanes of socket 1, do
you do anything special when the read/write buffers map to physical memory
belonging to socket 2? Or do you avoid using buffers mapping to memory that
belongs to (is accessible via) the other socket?

Has this situation improved with Ivy Bridge or Haswell systems?

Cheers
Michael
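
Independent of whatever the MPI library does internally, an application can
pin its own communication buffers to the socket that hosts the HCA and measure
the difference. A rough sketch using libnuma; the node id used here is an
assumption (on Linux it can typically be read from
/sys/class/infiniband/<hca>/device/numa_node or discovered with hwloc), and
the program needs -lnuma to link:

#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma not available on this system\n");
        return 1;
    }
    const int hca_node = 0;              /* assumed: NUMA node of the HCA */
    const size_t len = 1 << 20;

    /* Allocate the buffer on the HCA's node so DMA traffic does not have to
     * cross the inter-socket link. */
    void *buf = numa_alloc_onnode(len, hca_node);
    if (buf == NULL) {
        fprintf(stderr, "allocation on node %d failed\n", hca_node);
        return 1;
    }

    /* ... use buf as the send/receive buffer for MPI calls ... */

    numa_free(buf, len);
    return 0;
}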


Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 and 1.7.2

2013-07-06 Thread Michael Thomadakis
thanks,

Do you guys have any plans to support the Intel Phi in the future? That is,
running MPI code on the Phi cards, or across the host cores and the Phi, as
Intel MPI does?

thanks...
Michael




Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 and 1.7.2

2013-07-06 Thread Ralph Castain
There was discussion of this on a prior email thread on the OMPI devel mailing 
list:

http://www.open-mpi.org/community/lists/devel/2013/05/12354.php

