Re: [OMPI users] compilation error with pgcc Unknown switch

2012-03-06 Thread Abhinav Sarje
I pulled a fresh copy of the dev trunk and tried building. It did not
change anything - I am still getting the same error:
../../../ompi/.libs/libmpi.so: undefined reference to
`opal_atomic_swap_64'

GNU version still builds fine.


On Tue, Mar 6, 2012 at 5:38 AM, Jeffrey Squyres  wrote:
> I disabled C++ inline assembly for PGI (we already had C inline assembly for 
> PGI).
>
> So I don't think this should have caused a new error... should it?
>
>
> On Mar 5, 2012, at 10:21 AM, Nathan Hjelm wrote:
>
>> Try pulling a fresh trunk. Jeff made a recent commit that may be relevant. 
>> Something about inline assembly being broken on PGI (I personally wouldn't 
>> recommend using that compiler unless you are using fortran).
>>
>> -Nathan
>>
>> On Sun, 4 Mar 2012, Abhinav Sarje wrote:
>>
>>> The same options/configuration in the GNU environment (compiler
>>> version 4.6.1) builds smoothly. PGI env still gives the aforementioned
>>> error. Has anyone experienced similar problem? May be some more flags
>>> need to be set for PGI?
>>>
>>>
>>>
>>> On Sat, Mar 3, 2012 at 10:58 PM, Abhinav Sarje  wrote:
 Hi, I am trying to compile 64 bits.

 On Fri, Mar 2, 2012 at 11:48 PM, George Bosilca  
 wrote:
> Something is definitively weird in your compilation environment.
>
> The "undefined" function is defined in atomic_impl.h as a static inline 
> (static inline int64_t opal_atomic_swap_64(volatile int64_t *addr,…). So 
> either the compiler should have complained during compilation, or it 
> should be inlined when you reach the linking step. Or, and this is the 
> unusual part, you're compiling 32 bits only (thus no atomic 64 bits are 
> available), and we are forcing atomic operations on a 64 bits value. That 
> would be quite strange …
>
> Are you trying to compile 32 or 64 bits?
>
>  george.
>
> On Mar 2, 2012, at 06:12 , Jeffrey Squyres wrote:
>
>> I'm going to have to defer this to those who regularly build on Crays...
>>
>> Sandia / LANL?
>>
>>
>> On Mar 2, 2012, at 12:12 AM, Abhinav Sarje wrote:
>>
>>> Hi again,
>>>
>>> I just tried building afresh -> svn co, autogen, configure, make. And
>>> it failed at the same point as before:
>> CCLD   ompi_info
>> ../../../ompi/.libs/libmpi.so: undefined reference to 
>> `opal_atomic_swap_64'
>>>
>>> Any more ideas/fixes?
>>>
>>> Thanks all.
>>> Abhinav.
>>>
>>> On Fri, Mar 2, 2012 at 8:14 AM, Abhinav Sarje  wrote:
 yes, I did a full autogen, configure, make clean and make all


 On Thu, Mar 1, 2012 at 10:03 PM, Jeffrey Squyres  
 wrote:
> Did you do a full autogen / configure / make clean / make all ?
>
>
> On Mar 1, 2012, at 8:53 AM, Abhinav Sarje wrote:
>
>> Thanks Ralph. That did help, but only till the next hurdle. Now the
>> build fails at the following point with an 'undefined reference':
>> ---
>> Making all in tools/ompi_info
>> make[2]: Entering directory
>> `/global/u1/a/asarje/hopper/openmpi-dev-trunk/build/ompi/tools/ompi_info'
>> CC     ompi_info.o
>> CC     output.o
>> CC     param.o
>> CC     components.o
>> CC     version.o
>> CCLD   ompi_info
>> ../../../ompi/.libs/libmpi.so: undefined reference to 
>> `opal_atomic_swap_64'
>> make[2]: *** [ompi_info] Error 2
>> make[2]: Leaving directory
>> `/global/u1/a/asarje/hopper/openmpi-dev-trunk/build/ompi/tools/ompi_info'
>> make[1]: *** [all-recursive] Error 1
>> make[1]: Leaving directory
>> `/global/u1/a/asarje/hopper/openmpi-dev-trunk/build/ompi'
>> make: *** [all-recursive] Error 1
>> ---
>>
>>
>>
>>
>>
>>
>> On Thu, Mar 1, 2012 at 5:25 PM, Ralph Castain  
>> wrote:
>>> You need to update your source code - this was identified and fixed 
>>> on Wed. Unfortunately, our trunk is a developer's environment. 
>>> While we try hard to keep it fully functional, bugs do occasionally 
>>> work their way into the code.
>>>
>>> On Mar 1, 2012, at 1:37 AM, Abhinav Sarje wrote:
>>>
 Hi Nathan,

 I tried building on an internal login node, and it did not fail at 
 the
 previous point. But, after compiling for a very long time, it 
 failed
 while building libmpi.la, with a multiple definition error:
 --
 ...
 CC     mpiext/mpiext.lo
 CC     mpi/f77/base/mpi_f77_base_libmpi_f77_base_la-attr_fn_f.lo
 CC     mpi/f77/base/mpi_f77_base_libmpi_f77_base_la-conversion_fn_null_f.lo
 CC     mpi

Re: [OMPI users] Configure MPI_ADDRESS_KIND and MPI_OFFSET_KIND (OSX 10.6; gcc, g95, openmpi-1.4.5.)

2012-03-06 Thread Anthony Rollett
Yes, I can, and I did use gfortran, and that fixed the problem.  No need to 
edit mpif-config.h.  Moreover, the program of interest compiled quite happily.  
Thank you very much indeed.  I did not realize that g95 was an outdated package.  
I can only think that I once had a problem with gfortran and tried g95 as a 
work-around.
regards
Tony
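
For reference, a minimal Fortran check of the two kind parameters is below (a
hypothetical test, not part of the original exchange). With a correctly built
mpif-config.h it simply prints the kind values; if configure recorded them as 0,
the two declarations fail to compile, which matches the errors described in the
quoted question below.

program kind_check

  use mpi

  implicit none

  ! These declarations only compile if the kinds are valid (non-zero).
  integer(kind=MPI_ADDRESS_KIND) :: addr
  integer(kind=MPI_OFFSET_KIND)  :: off

  addr = 0
  off  = 0

  print *, "MPI_ADDRESS_KIND =", MPI_ADDRESS_KIND
  print *, "MPI_OFFSET_KIND  =", MPI_OFFSET_KIND

end program kind_check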

On Mar 5, 2012, at 3:09 PM, Jeffrey Squyres wrote:

> Can you use gfortran instead?  I don't think we've tested with g95 in years 
> (if ever).
> 
> Yes, you can manually edit mpif-config.h, if you need to.  I'm guessing that 
> OMPI's configure script got the wrong answer from g95, and therefore put in a 
> 0 for those values.  I don't know if we want to support that ancient 
> compiler; so upgrading to a new compiler or manually editing mpif-config.h 
> might be your best bet.  (but if you manually edit, I hope there's no other 
> as-yet-undiscovered dragons awaiting you around the corner...)
> 
> 
> On Mar 3, 2012, at 1:56 PM, Anthony Rollett wrote:
> 
>> Greetings - I am able to configure and install 1.4.5 with fortran support.  
>> However, when I try to use MPI_ADDRESS_KIND and MPI_OFFSET_KIND, I find that 
>> mpif-config.h has these parameters set to 0 (and I get compiler errors).  Is 
>> there a way to configure and have these be non-zero?  Is it safe to manually 
>> edit mpif-config.h?!  Obviously I can supply more details, but even a hint a 
>> two will probably allow me to solve this one.
>> Thanks, Tony Rollett.
>> 
>> PS: I used  ./configure --prefix=/Users/Shared/openmpi-1.4.5 CC=gcc F77=g95
>> F90=g95 CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64
>> LDFLAGS=-L/usr/local/lib CPPFLAGS=-I/usr/local/include
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 




Re: [OMPI users] compilation error with pgcc Unknown switch

2012-03-06 Thread Jeffrey Squyres
I'm afraid that I have neither a Cray nor the PGI compiler, so I'm not in a 
good position to help here.

Would someone with the PGI compiler give the trunk a whirl to see if disabling 
the CXX inline assembly for PGI broke something?  I'd be a little surprised, 
since we already had it disabled for C, but who knows...


On Mar 6, 2012, at 2:55 AM, Abhinav Sarje wrote:

> I pulled fresh copy of the dev trunk, and tried building. It did not
> change anything - I am still getting the same error:
> ../../../ompi/.libs/libmpi.so: undefined reference to
> `opal_atomic_swap_64'
> 
> GNU version still builds fine.
> 
> 
> On Tue, Mar 6, 2012 at 5:38 AM, Jeffrey Squyres  wrote:
>> I disabled C++ inline assembly for PGI (we already had C inline assembly for 
>> PGI).
>> 
>> So I don't think this should have caused a new error... should it?
>> 
>> 
>> On Mar 5, 2012, at 10:21 AM, Nathan Hjelm wrote:
>> 
>>> Try pulling a fresh trunk. Jeff made a recent commit that may be relevant. 
>>> Something about inline assembly being broken on PGI (I personally wouldn't 
>>> recommend using that compiler unless you are using fortran).
>>> 
>>> -Nathan
>>> 
>>> On Sun, 4 Mar 2012, Abhinav Sarje wrote:
>>> 
 The same options/configuration in the GNU environment (compiler
 version 4.6.1) builds smoothly. PGI env still gives the aforementioned
 error. Has anyone experienced similar problem? May be some more flags
 need to be set for PGI?
 
 
 
 On Sat, Mar 3, 2012 at 10:58 PM, Abhinav Sarje  wrote:
> Hi, I am trying to compile 64 bits.
> 
> On Fri, Mar 2, 2012 at 11:48 PM, George Bosilca  
> wrote:
>> Something is definitively weird in your compilation environment.
>> 
>> The "undefined" function is defined in atomic_impl.h as a static inline 
>> (static inline int64_t opal_atomic_swap_64(volatile int64_t *addr,…). So 
>> either the compiler should have complained during compilation, or it 
>> should be inlined when you reach the linking step. Or, and this is the 
>> unusual part, you're compiling 32 bits only (thus no atomic 64 bits are 
>> available), and we are forcing atomic operations on a 64 bits value. 
>> That would be quite strange …
>> 
>> Are you trying to compile 32 or 64 bits?
>> 
>>  george.
>> 
>> On Mar 2, 2012, at 06:12 , Jeffrey Squyres wrote:
>> 
>>> I'm going to have to defer this to those who regularly build on Crays...
>>> 
>>> Sandia / LANL?
>>> 
>>> 
>>> On Mar 2, 2012, at 12:12 AM, Abhinav Sarje wrote:
>>> 
 Hi again,
 
 I just tried building afresh -> svn co, autogen, configure, make. And
 it failed at the same point as before:
>>> CCLD   ompi_info
>>> ../../../ompi/.libs/libmpi.so: undefined reference to 
>>> `opal_atomic_swap_64'
 
 Any more ideas/fixes?
 
 Thanks all.
 Abhinav.
 
 On Fri, Mar 2, 2012 at 8:14 AM, Abhinav Sarje  wrote:
> yes, I did a full autogen, configure, make clean and make all
> 
> 
> On Thu, Mar 1, 2012 at 10:03 PM, Jeffrey Squyres  
> wrote:
>> Did you do a full autogen / configure / make clean / make all ?
>> 
>> 
>> On Mar 1, 2012, at 8:53 AM, Abhinav Sarje wrote:
>> 
>>> Thanks Ralph. That did help, but only till the next hurdle. Now the
>>> build fails at the following point with an 'undefined reference':
>>> ---
>>> Making all in tools/ompi_info
>>> make[2]: Entering directory
>>> `/global/u1/a/asarje/hopper/openmpi-dev-trunk/build/ompi/tools/ompi_info'
>>> CC ompi_info.o
>>> CC output.o
>>> CC param.o
>>> CC components.o
>>> CC version.o
>>> CCLD   ompi_info
>>> ../../../ompi/.libs/libmpi.so: undefined reference to 
>>> `opal_atomic_swap_64'
>>> make[2]: *** [ompi_info] Error 2
>>> make[2]: Leaving directory
>>> `/global/u1/a/asarje/hopper/openmpi-dev-trunk/build/ompi/tools/ompi_info'
>>> make[1]: *** [all-recursive] Error 1
>>> make[1]: Leaving directory
>>> `/global/u1/a/asarje/hopper/openmpi-dev-trunk/build/ompi'
>>> make: *** [all-recursive] Error 1
>>> ---
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Thu, Mar 1, 2012 at 5:25 PM, Ralph Castain  
>>> wrote:
 You need to update your source code - this was identified and 
 fixed on Wed. Unfortunately, our trunk is a developer's 
 environment. While we try hard to keep it fully functional, bugs 
 do occasionally work their way into the code.
 
 On Mar 1, 2012, at 1:37 AM, Abhinav Sarje wrote:
 
> Hi Nathan,
> 
> I trie

[OMPI users] Scatter+Group Communicator Issue

2012-03-06 Thread Timothy Stitt
Hi all,

I am scratching my head over what I think should be a relatively simple group 
communicator operation. I am hoping some kind person can put me out of my 
misery and figure out what I'm doing wrong. 

Basically, I am trying to scatter a set of values to a subset of process ranks 
(hence the need for a group communicator). When I run the sample code over 4 
processes (and scattering to 3 processes), I am getting a group-communicator 
related error in the scatter operation:

> [stats.crc.nd.edu:29285] *** An error occurred in MPI_Scatter
> [stats.crc.nd.edu:29285] *** on communicator MPI_COMM_WORLD
> [stats.crc.nd.edu:29285] *** MPI_ERR_COMM: invalid communicator
> [stats.crc.nd.edu:29285] *** MPI_ERRORS_ARE_FATAL (your MPI job will now 
> abort)
>  Complete - Rank   1
>  Complete - Rank   0
>  Complete - Rank   3

The actual test code is below:

program scatter_bug

  use mpi

  implicit none

  integer :: ierr,my_rank,procValues(3),procRanks(3)
  integer :: in_cnt,orig_group,new_group,new_comm,out

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD,my_rank,ierr)

  procRanks=(/0,1,3/)
  procValues=(/0,434,268/)
  in_cnt=3

  ! Create sub-communicator
  call MPI_COMM_GROUP(MPI_COMM_WORLD, orig_group, ierr)
  call MPI_Group_incl(orig_group, in_cnt, procRanks, new_group, ierr)
  call MPI_COMM_CREATE(MPI_COMM_WORLD, new_group, new_comm, ierr)

  call MPI_SCATTER(procValues, 1, MPI_INTEGER, out, 1, MPI_INTEGER, 0, 
new_comm, ierr);

  print *,"Complete - Rank", my_rank

end program scatter_bug

Thanks in advance for any advice you can give.

Regards.

Tim.


Re: [OMPI users] Scatter+Group Communicator Issue

2012-03-06 Thread nadia . derbey
Isn't it because you're calling MPI_Scatter() even on rank 2 which is not 
part of your new_comm?

Regards,
Nadia

users-boun...@open-mpi.org wrote on 03/06/2012 01:52:06 PM:

> De : Timothy Stitt 
> A : "us...@open-mpi.org" 
> Date : 03/06/2012 01:52 PM
> Objet : [OMPI users] Scatter+Group Communicator Issue
> Envoyé par : users-boun...@open-mpi.org
> 
> Hi all,
> 
> I am scratching my head over what I think should be a relatively 
> simple group communicator operation. I am hoping some kind person 
> can put me out of my misery and figure out what I'm doing wrong. 
> 
> Basically, I am trying to scatter a set of values to a subset of 
> process ranks (hence the need for a group communicator). When I run 
> the sample code over 4 processes (and scattering to 3 processes), I 
> am getting a group-communicator related error in the scatter operation:
> 
> > [stats.crc.nd.edu:29285] *** An error occurred in MPI_Scatter
> > [stats.crc.nd.edu:29285] *** on communicator MPI_COMM_WORLD
> > [stats.crc.nd.edu:29285] *** MPI_ERR_COMM: invalid communicator
> > [stats.crc.nd.edu:29285] *** MPI_ERRORS_ARE_FATAL (your MPI job 
> will now abort)
> >  Complete - Rank   1
> >  Complete - Rank   0
> >  Complete - Rank   3
> 
> The actual test code is below:
> 
> program scatter_bug
> 
>   use mpi
> 
>   implicit none
> 
>   integer :: ierr,my_rank,procValues(3),procRanks(3)
>   integer :: in_cnt,orig_group,new_group,new_comm,out
> 
>   call MPI_INIT(ierr)
>   call MPI_COMM_RANK(MPI_COMM_WORLD,my_rank,ierr)
> 
>   procRanks=(/0,1,3/)
>   procValues=(/0,434,268/)
>   in_cnt=3
> 
>   ! Create sub-communicator
>   call MPI_COMM_GROUP(MPI_COMM_WORLD, orig_group, ierr)
>   call MPI_Group_incl(orig_group, in_cnt, procRanks, new_group, ierr)
>   call MPI_COMM_CREATE(MPI_COMM_WORLD, new_group, new_comm, ierr)
> 
>   call MPI_SCATTER(procValues, 1, MPI_INTEGER, out, 1, MPI_INTEGER, 
> 0, new_comm, ierr);
> 
>   print *,"Complete - Rank", my_rank
> 
> end program scatter_bug
> 
> Thanks in advance for any advice you can give.
> 
> Regards.
> 
> Tim.
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] Scatter+Group Communicator Issue

2012-03-06 Thread Timothy Stitt
Hi Nadia,

Thanks for the reply. This is where my confusion with the scatter command comes 
in. I was really hoping that MPI_Scatter would automagically ignore the ranks 
that are not part of the group communicator, since this test code is part of 
something more complicated where many sub-communicators are created over various 
combinations of ranks and used in various collective routines. Do I really 
have to manually filter out the non-communicator ranks before I call the 
scatter? It would be really nice if the scatter command were 'smart' enough to 
do this for me by looking at the communicator that is passed to the routine.

Thanks again,

Tim.

On Mar 6, 2012, at 8:28 AM, nadia.der...@bull.net wrote:

Isn't it because you're calling MPI_Scatter() even on rank 2 which is not part 
of your new_comm?

Regards,
Nadia

users-boun...@open-mpi.org wrote on 
03/06/2012 01:52:06 PM:

> De : Timothy Stitt mailto:timothy.stit...@nd.edu>>
> A : "us...@open-mpi.org" 
> mailto:us...@open-mpi.org>>
> Date : 03/06/2012 01:52 PM
> Objet : [OMPI users] Scatter+Group Communicator Issue
> Envoyé par : users-boun...@open-mpi.org
>
> Hi all,
>
> I am scratching my head over what I think should be a relatively
> simple group communicator operation. I am hoping some kind person
> can put me out of my misery and figure out what I'm doing wrong.
>
> Basically, I am trying to scatter a set of values to a subset of
> process ranks (hence the need for a group communicator). When I run
> the sample code over 4 processes (and scattering to 3 processes), I
> am getting a group-communicator related error in the scatter operation:
>
> > [stats.crc.nd.edu:29285] *** An error occurred in MPI_Scatter
> > [stats.crc.nd.edu:29285] *** on communicator MPI_COMM_WORLD
> > [stats.crc.nd.edu:29285] *** MPI_ERR_COMM: invalid communicator
> > [stats.crc.nd.edu:29285] *** MPI_ERRORS_ARE_FATAL (your MPI job
> will now abort)
> >  Complete - Rank   1
> >  Complete - Rank   0
> >  Complete - Rank   3
>
> The actual test code is below:
>
> program scatter_bug
>
>   use mpi
>
>   implicit none
>
>   integer :: ierr,my_rank,procValues(3),procRanks(3)
>   integer :: in_cnt,orig_group,new_group,new_comm,out
>
>   call MPI_INIT(ierr)
>   call MPI_COMM_RANK(MPI_COMM_WORLD,my_rank,ierr)
>
>   procRanks=(/0,1,3/)
>   procValues=(/0,434,268/)
>   in_cnt=3
>
>   ! Create sub-communicator
>   call MPI_COMM_GROUP(MPI_COMM_WORLD, orig_group, ierr)
>   call MPI_Group_incl(orig_group, in_cnt, procRanks, new_group, ierr)
>   call MPI_COMM_CREATE(MPI_COMM_WORLD, new_group, new_comm, ierr)
>
>   call MPI_SCATTER(procValues, 1, MPI_INTEGER, out, 1, MPI_INTEGER,
> 0, new_comm, ierr);
>
>   print *,"Complete - Rank", my_rank
>
> end program scatter_bug
>
> Thanks in advance for any advice you can give.
>
> Regards.
>
> Tim.
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


Tim Stitt PhD (User Support Manager).
Center for Research Computing | University of Notre Dame |
P.O. Box 539, Notre Dame, IN 46556 | Phone:  574-631-5287 | Email: 
tst...@nd.edu



Re: [OMPI users] Scatter+Group Communicator Issue

2012-03-06 Thread nadia . derbey
Tim,

Since MPI_Comm_create sets the created communicator to MPI_COMM_NULL for 
the processes that are not in the group, maybe preceding your 
collectives with a:
if (MPI_COMM_NULL != new_comm) {
   <call the collective here>
}
could be enough.

But maybe I'm wrong: I'll let the specialists answer.

Regards,
Nadia
 
-- 
Nadia Derbey
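
For reference, a sketch of this guard applied to the scatter_bug test program
from the original post (hypothetical, untested here) would look like:

  ! Create sub-communicator as before
  call MPI_COMM_GROUP(MPI_COMM_WORLD, orig_group, ierr)
  call MPI_Group_incl(orig_group, in_cnt, procRanks, new_group, ierr)
  call MPI_COMM_CREATE(MPI_COMM_WORLD, new_group, new_comm, ierr)

  ! Ranks outside the group get new_comm == MPI_COMM_NULL, so skip them.
  if (new_comm /= MPI_COMM_NULL) then
     call MPI_SCATTER(procValues, 1, MPI_INTEGER, out, 1, MPI_INTEGER, &
                      0, new_comm, ierr)
  end if

Note that the root argument is interpreted in new_comm's own rank order, so
root 0 here is the first rank listed in procRanks (world rank 0 in this test).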
 

users-boun...@open-mpi.org wrote on 03/06/2012 02:32:03 PM:

> De : Timothy Stitt 
> A : Open MPI Users 
> Date : 03/06/2012 02:32 PM
> Objet : Re: [OMPI users] Scatter+Group Communicator Issue
> Envoyé par : users-boun...@open-mpi.org
> 
> Hi Nadia,
> 
> Thanks for the reply. This is were my confusion with the scatter 
> command comes in. I was really hoping that MPI_Scatter would 
> automagically ignore the ranks that are not part of the group 
> communicator, since this test code is part of something more 
> complicated were many sub-communicators are created over various 
> combinations of ranks, and used in various collective routines. Do I
> really have to filter out manually the non-communicator ranks before
> I call the scatter...it would be really nice if the scatter command 
> was 'smart' enough to do this for me by looking at the communicator 
> that is passed to the routine.
> 
> Thanks again,
> 
> Tim.
> 
> On Mar 6, 2012, at 8:28 AM,  wrote:
> 
> Isn't it because you're calling MPI_Scatter() even on rank 2 which 
> is not part of your new_comm? 
> 
> Regards, 
> Nadia 
> 
> users-boun...@open-mpi.org wrote on 03/06/2012 01:52:06 PM:
> 
> > De : Timothy Stitt  
> > A : "us...@open-mpi.org"  
> > Date : 03/06/2012 01:52 PM 
> > Objet : [OMPI users] Scatter+Group Communicator Issue 
> > Envoyé par : users-boun...@open-mpi.org 
> > 
> > Hi all,
> > 
> > I am scratching my head over what I think should be a relatively 
> > simple group communicator operation. I am hoping some kind person 
> > can put me out of my misery and figure out what I'm doing wrong. 
> > 
> > Basically, I am trying to scatter a set of values to a subset of 
> > process ranks (hence the need for a group communicator). When I run 
> > the sample code over 4 processes (and scattering to 3 processes), I 
> > am getting a group-communicator related error in the scatter 
operation:
> > 
> > > [stats.crc.nd.edu:29285] *** An error occurred in MPI_Scatter
> > > [stats.crc.nd.edu:29285] *** on communicator MPI_COMM_WORLD
> > > [stats.crc.nd.edu:29285] *** MPI_ERR_COMM: invalid communicator
> > > [stats.crc.nd.edu:29285] *** MPI_ERRORS_ARE_FATAL (your MPI job 
> > will now abort)
> > >  Complete - Rank   1
> > >  Complete - Rank   0
> > >  Complete - Rank   3
> > 
> > The actual test code is below:
> > 
> > program scatter_bug
> > 
> >   use mpi
> > 
> >   implicit none
> > 
> >   integer :: ierr,my_rank,procValues(3),procRanks(3)
> >   integer :: in_cnt,orig_group,new_group,new_comm,out
> > 
> >   call MPI_INIT(ierr)
> >   call MPI_COMM_RANK(MPI_COMM_WORLD,my_rank,ierr)
> > 
> >   procRanks=(/0,1,3/)
> >   procValues=(/0,434,268/)
> >   in_cnt=3
> > 
> >   ! Create sub-communicator
> >   call MPI_COMM_GROUP(MPI_COMM_WORLD, orig_group, ierr)
> >   call MPI_Group_incl(orig_group, in_cnt, procRanks, new_group, ierr)
> >   call MPI_COMM_CREATE(MPI_COMM_WORLD, new_group, new_comm, ierr)
> > 
> >   call MPI_SCATTER(procValues, 1, MPI_INTEGER, out, 1, MPI_INTEGER, 
> > 0, new_comm, ierr);
> > 
> >   print *,"Complete - Rank", my_rank
> > 
> > end program scatter_bug
> > 
> > Thanks in advance for any advice you can give.
> > 
> > Regards.
> > 
> > Tim.
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> Tim Stitt PhD (User Support Manager).
> Center for Research Computing | University of Notre Dame | 
> P.O. Box 539, Notre Dame, IN 46556 | Phone:  574-631-5287 | Email: 
> tst...@nd.edu 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

Re: [OMPI users] Scatter+Group Communicator Issue

2012-03-06 Thread Timothy Stitt
Will definitely try that. Thanks for the suggestion.

Basically, I need to be able to scatter values from a sender to a subset of 
ranks (as I scale my production code, I don't want to use MPI_COMM_WORLD, as 
the receiver list will be quite small) without the receivers knowing if they 
are to receive something in advance of the scatter.

Thanks again for any help,

Tim.

On Mar 6, 2012, at 10:17 AM, nadia.der...@bull.net wrote:

Tim,

Since MPI_Comm_create sets the created communicator to MPI_COMM_NULL for the 
processes that are not in the group , may be preceding your collectives by a:
if (MPI_COMM_NULL != new_comm) {
   
}
could be enough.

But may be I'm wrong: I'll let the specialists answer.

Regards,
Nadia

--
Nadia Derbey


users-boun...@open-mpi.org wrote on 
03/06/2012 02:32:03 PM:

> De : Timothy Stitt mailto:timothy.stit...@nd.edu>>
> A : Open MPI Users mailto:us...@open-mpi.org>>
> Date : 03/06/2012 02:32 PM
> Objet : Re: [OMPI users] Scatter+Group Communicator Issue
> Envoyé par : users-boun...@open-mpi.org
>
> Hi Nadia,
>
> Thanks for the reply. This is were my confusion with the scatter
> command comes in. I was really hoping that MPI_Scatter would
> automagically ignore the ranks that are not part of the group
> communicator, since this test code is part of something more
> complicated were many sub-communicators are created over various
> combinations of ranks, and used in various collective routines. Do I
> really have to filter out manually the non-communicator ranks before
> I call the scatter...it would be really nice if the scatter command
> was 'smart' enough to do this for me by looking at the communicator
> that is passed to the routine.
>
> Thanks again,
>
> Tim.
>
> On Mar 6, 2012, at 8:28 AM, 
> mailto:nadia.der...@bull.net>> wrote:
>
> Isn't it because you're calling MPI_Scatter() even on rank 2 which
> is not part of your new_comm?
>
> Regards,
> Nadia
>
> users-boun...@open-mpi.org wrote on 
> 03/06/2012 01:52:06 PM:
>
> > De : Timothy Stitt mailto:timothy.stit...@nd.edu>>
> > A : "us...@open-mpi.org" 
> > mailto:us...@open-mpi.org>>
> > Date : 03/06/2012 01:52 PM
> > Objet : [OMPI users] Scatter+Group Communicator Issue
> > Envoyé par : users-boun...@open-mpi.org
> >
> > Hi all,
> >
> > I am scratching my head over what I think should be a relatively
> > simple group communicator operation. I am hoping some kind person
> > can put me out of my misery and figure out what I'm doing wrong.
> >
> > Basically, I am trying to scatter a set of values to a subset of
> > process ranks (hence the need for a group communicator). When I run
> > the sample code over 4 processes (and scattering to 3 processes), I
> > am getting a group-communicator related error in the scatter operation:
> >
> > > [stats.crc.nd.edu:29285] *** An error occurred in MPI_Scatter
> > > [stats.crc.nd.edu:29285] *** on communicator MPI_COMM_WORLD
> > > [stats.crc.nd.edu:29285] *** MPI_ERR_COMM: invalid communicator
> > > [stats.crc.nd.edu:29285] *** MPI_ERRORS_ARE_FATAL (your MPI job
> > will now abort)
> > >  Complete - Rank   1
> > >  Complete - Rank   0
> > >  Complete - Rank   3
> >
> > The actual test code is below:
> >
> > program scatter_bug
> >
> >   use mpi
> >
> >   implicit none
> >
> >   integer :: ierr,my_rank,procValues(3),procRanks(3)
> >   integer :: in_cnt,orig_group,new_group,new_comm,out
> >
> >   call MPI_INIT(ierr)
> >   call MPI_COMM_RANK(MPI_COMM_WORLD,my_rank,ierr)
> >
> >   procRanks=(/0,1,3/)
> >   procValues=(/0,434,268/)
> >   in_cnt=3
> >
> >   ! Create sub-communicator
> >   call MPI_COMM_GROUP(MPI_COMM_WORLD, orig_group, ierr)
> >   call MPI_Group_incl(orig_group, in_cnt, procRanks, new_group, ierr)
> >   call MPI_COMM_CREATE(MPI_COMM_WORLD, new_group, new_comm, ierr)
> >
> >   call MPI_SCATTER(procValues, 1, MPI_INTEGER, out, 1, MPI_INTEGER,
> > 0, new_comm, ierr);
> >
> >   print *,"Complete - Rank", my_rank
> >
> > end program scatter_bug
> >
> > Thanks in advance for any advice you can give.
> >
> > Regards.
> >
> > Tim.
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
>
> Tim Stitt PhD (User Support Manager).
> Center for Research Computing | University of Notre Dame |
> P.O. Box 539, Notre Dame, IN 46556 | Phone:  574-631-5287 | Email:
> tst...@nd.edu
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

Tim Stitt PhD (User Support Manager).
Center for Research Computing | University of Notre Dame |
P.O. Box 539, Notre Dame, IN 46556 | Phone:  574-631-5287 | Email: 
tst...@nd.edu



[OMPI users] core binding confusion

2012-03-06 Thread Dave Love
Could someone confirm whether this is a bug or a misunderstanding of the doc
(in which case it's not just me, and it needs clarifying!)?  I haven't
looked at the current code, in the hope of a quick authoritative answer.

This is with 1.5.5rc3, originally on Interlagos, but also checked on
Magny Cours.  It's also seen on two Interlagos systems with different
physical numbering of the logical processors.

On a 48-core Magny Cours with

  mpirun --bysocket --bind-to-core --report-bindings -np 48 

what I get is two processes per core, e.g.:

  [node247:09521] [[58099,0],0] odls:default:fork binding child [[58099,1],14] 
to socket 2 cpus 4000
  ...
  [node247:09521] [[58099,0],0] odls:default:fork binding child [[58099,1],38] 
to socket 2 cpus 4000

and hwloc-ps confirms the situation.  However, I (and my boss, who did
it originally) expect one per core.  With --bycore we do see one per
core, of course.

Is that actually expected?



Re: [OMPI users] core binding confusion

2012-03-06 Thread Ralph Castain
On Tue, Mar 6, 2012 at 7:28 AM, Dave Love  wrote:

> Could someone confirm whether this is a bug or misunderstanding the doc
> (in which case it's not just me, and it needs clarifying!)?  I haven't
> looked at the current code in the hope of a quick authoritative answer.
>
> This is with 1.5.5rc3, originally on Interlagos, but also checked on
> Magny Cours.  It's also seen on two Interlagos with different physical
> numbering of the logical processors.
>
> On a 48-core Magny Cours with
>
>  mpirun --bysocket --bind-to-core --report-bindings -np 48
>
> what I get is two processes per core, e.g.:
>
>  [node247:09521] [[58099,0],0] odls:default:fork binding child
> [[58099,1],14] to socket 2 cpus 4000
>  ...
>  [node247:09521] [[58099,0],0] odls:default:fork binding child
> [[58099,1],38] to socket 2 cpus 4000
>
> and hwloc-ps confirms the situation.  However, I (and my boss, who did
> it originally) expect one per core.  With --bycore we do see one per
> core, of course.
>
> Is that actually expected?
>

Well, no - it shouldn't do that, so I would expect it is a bug. Will try to
look at it, but probably won't happen until next week due to travel.


>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] Scatter+Group Communicator Issue

2012-03-06 Thread Gustavo Correa
Hi Timothy

There is no call to MPI_Finalize in the program.
Would this be the problem?

I hope this helps,
Gus Correa


On Mar 6, 2012, at 10:19 AM, Timothy Stitt wrote:

> Will definitely try that. Thanks for the suggestion.
> 
> Basically, I need to be able to scatter values from a sender to a subset of 
> ranks (as I scale my production code, I don't want to use MPI_COMM_WORLD, as 
> the receiver list will be quite small) without the receivers knowing if they 
> are to receive something in advance of the scatter.
> 
> Thanks again for any help,
> 
> Tim.
> 
> On Mar 6, 2012, at 10:17 AM,  wrote:
> 
>> Tim, 
>> 
>> Since MPI_Comm_create sets the created communicator to MPI_COMM_NULL for the 
>> processes that are not in the group , may be preceding your collectives by 
>> a: 
>> if (MPI_COMM_NULL != new_comm) { 
>> 
>> } 
>> could be enough. 
>> 
>> But may be I'm wrong: I'll let the specialists answer. 
>> 
>> Regards, 
>> Nadia 
>>  
>> -- 
>> Nadia Derbey 
>>   
>> 
>> users-boun...@open-mpi.org wrote on 03/06/2012 02:32:03 PM:
>> 
>> > De : Timothy Stitt  
>> > A : Open MPI Users  
>> > Date : 03/06/2012 02:32 PM 
>> > Objet : Re: [OMPI users] Scatter+Group Communicator Issue 
>> > Envoyé par : users-boun...@open-mpi.org 
>> > 
>> > Hi Nadia, 
>> > 
>> > Thanks for the reply. This is were my confusion with the scatter 
>> > command comes in. I was really hoping that MPI_Scatter would 
>> > automagically ignore the ranks that are not part of the group 
>> > communicator, since this test code is part of something more 
>> > complicated were many sub-communicators are created over various 
>> > combinations of ranks, and used in various collective routines. Do I
>> > really have to filter out manually the non-communicator ranks before
>> > I call the scatter...it would be really nice if the scatter command 
>> > was 'smart' enough to do this for me by looking at the communicator 
>> > that is passed to the routine. 
>> > 
>> > Thanks again, 
>> > 
>> > Tim. 
>> > 
>> > On Mar 6, 2012, at 8:28 AM,  wrote: 
>> > 
>> > Isn't it because you're calling MPI_Scatter() even on rank 2 which 
>> > is not part of your new_comm? 
>> > 
>> > Regards, 
>> > Nadia 
>> > 
>> > users-boun...@open-mpi.org wrote on 03/06/2012 01:52:06 PM:
>> > 
>> > > De : Timothy Stitt  
>> > > A : "us...@open-mpi.org"  
>> > > Date : 03/06/2012 01:52 PM 
>> > > Objet : [OMPI users] Scatter+Group Communicator Issue 
>> > > Envoyé par : users-boun...@open-mpi.org 
>> > > 
>> > > Hi all,
>> > > 
>> > > I am scratching my head over what I think should be a relatively 
>> > > simple group communicator operation. I am hoping some kind person 
>> > > can put me out of my misery and figure out what I'm doing wrong. 
>> > > 
>> > > Basically, I am trying to scatter a set of values to a subset of 
>> > > process ranks (hence the need for a group communicator). When I run 
>> > > the sample code over 4 processes (and scattering to 3 processes), I 
>> > > am getting a group-communicator related error in the scatter operation:
>> > > 
>> > > > [stats.crc.nd.edu:29285] *** An error occurred in MPI_Scatter
>> > > > [stats.crc.nd.edu:29285] *** on communicator MPI_COMM_WORLD
>> > > > [stats.crc.nd.edu:29285] *** MPI_ERR_COMM: invalid communicator
>> > > > [stats.crc.nd.edu:29285] *** MPI_ERRORS_ARE_FATAL (your MPI job 
>> > > will now abort)
>> > > >  Complete - Rank   1
>> > > >  Complete - Rank   0
>> > > >  Complete - Rank   3
>> > > 
>> > > The actual test code is below:
>> > > 
>> > > program scatter_bug
>> > > 
>> > >   use mpi
>> > > 
>> > >   implicit none
>> > > 
>> > >   integer :: ierr,my_rank,procValues(3),procRanks(3)
>> > >   integer :: in_cnt,orig_group,new_group,new_comm,out
>> > > 
>> > >   call MPI_INIT(ierr)
>> > >   call MPI_COMM_RANK(MPI_COMM_WORLD,my_rank,ierr)
>> > > 
>> > >   procRanks=(/0,1,3/)
>> > >   procValues=(/0,434,268/)
>> > >   in_cnt=3
>> > >  
>> > >   ! Create sub-communicator
>> > >   call MPI_COMM_GROUP(MPI_COMM_WORLD, orig_group, ierr)
>> > >   call MPI_Group_incl(orig_group, in_cnt, procRanks, new_group, ierr)
>> > >   call MPI_COMM_CREATE(MPI_COMM_WORLD, new_group, new_comm, ierr)
>> > >  
>> > >   call MPI_SCATTER(procValues, 1, MPI_INTEGER, out, 1, MPI_INTEGER, 
>> > > 0, new_comm, ierr);
>> > > 
>> > >   print *,"Complete - Rank", my_rank
>> > > 
>> > > end program scatter_bug
>> > >   
>> > > Thanks in advance for any advice you can give.
>> > > 
>> > > Regards.
>> > > 
>> > > Tim.
>> > > ___
>> > > users mailing list
>> > > us...@open-mpi.org
>> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >  
>> > 
>> > Tim Stitt PhD (User Support Manager).
>> > Center for Research Computing | University of Notre Dame | 
>> > P.O. Box 539, Notre Dame, IN 46556 | Phone:  574-631-5287 | Email: 
>> > tst...@nd.edu 
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.op

Re: [OMPI users] Scatter+Group Communicator Issue

2012-03-06 Thread Timothy Stitt
Dear Gus,

That was a transcription error on my part when copying the code into the email. 
The call to MPI_Finalize is in the actual code I used.

Thanks,

Tim.

On Mar 6, 2012, at 11:43 AM, Gustavo Correa wrote:

Hi Timothy

There is no call to MPI_Finalize in the program.
Would this be the problem?

I hope this helps,
Gus Correa


On Mar 6, 2012, at 10:19 AM, Timothy Stitt wrote:

Will definitely try that. Thanks for the suggestion.

Basically, I need to be able to scatter values from a sender to a subset of 
ranks (as I scale my production code, I don't want to use MPI_COMM_WORLD, as 
the receiver list will be quite small) without the receivers knowing if they 
are to receive something in advance of the scatter.

Thanks again for any help,

Tim.

On Mar 6, 2012, at 10:17 AM, nadia.der...@bull.net wrote:

Tim,

Since MPI_Comm_create sets the created communicator to MPI_COMM_NULL for the 
processes that are not in the group , may be preceding your collectives by a:
if (MPI_COMM_NULL != new_comm) {
  
}
could be enough.

But may be I'm wrong: I'll let the specialists answer.

Regards,
Nadia

--
Nadia Derbey


users-boun...@open-mpi.org wrote on 
03/06/2012 02:32:03 PM:

De : Timothy Stitt mailto:timothy.stit...@nd.edu>>
A : Open MPI Users mailto:us...@open-mpi.org>>
Date : 03/06/2012 02:32 PM
Objet : Re: [OMPI users] Scatter+Group Communicator Issue
Envoyé par : users-boun...@open-mpi.org

Hi Nadia,

Thanks for the reply. This is were my confusion with the scatter
command comes in. I was really hoping that MPI_Scatter would
automagically ignore the ranks that are not part of the group
communicator, since this test code is part of something more
complicated were many sub-communicators are created over various
combinations of ranks, and used in various collective routines. Do I
really have to filter out manually the non-communicator ranks before
I call the scatter...it would be really nice if the scatter command
was 'smart' enough to do this for me by looking at the communicator
that is passed to the routine.

Thanks again,

Tim.

On Mar 6, 2012, at 8:28 AM, 
mailto:nadia.der...@bull.net>> wrote:

Isn't it because you're calling MPI_Scatter() even on rank 2 which
is not part of your new_comm?

Regards,
Nadia

users-boun...@open-mpi.org wrote on 
03/06/2012 01:52:06 PM:

De : Timothy Stitt mailto:timothy.stit...@nd.edu>>
A : "us...@open-mpi.org" 
mailto:us...@open-mpi.org>>
Date : 03/06/2012 01:52 PM
Objet : [OMPI users] Scatter+Group Communicator Issue
Envoyé par : users-boun...@open-mpi.org

Hi all,

I am scratching my head over what I think should be a relatively
simple group communicator operation. I am hoping some kind person
can put me out of my misery and figure out what I'm doing wrong.

Basically, I am trying to scatter a set of values to a subset of
process ranks (hence the need for a group communicator). When I run
the sample code over 4 processes (and scattering to 3 processes), I
am getting a group-communicator related error in the scatter operation:

[stats.crc.nd.edu:29285] *** An error occurred in MPI_Scatter
[stats.crc.nd.edu:29285] *** on communicator MPI_COMM_WORLD
[stats.crc.nd.edu:29285] *** MPI_ERR_COMM: invalid communicator
[stats.crc.nd.edu:29285] *** MPI_ERRORS_ARE_FATAL (your MPI job
will now abort)
Complete - Rank   1
Complete - Rank   0
Complete - Rank   3

The actual test code is below:

program scatter_bug

 use mpi

 implicit none

 integer :: ierr,my_rank,procValues(3),procRanks(3)
 integer :: in_cnt,orig_group,new_group,new_comm,out

 call MPI_INIT(ierr)
 call MPI_COMM_RANK(MPI_COMM_WORLD,my_rank,ierr)

 procRanks=(/0,1,3/)
 procValues=(/0,434,268/)
 in_cnt=3

 ! Create sub-communicator
 call MPI_COMM_GROUP(MPI_COMM_WORLD, orig_group, ierr)
 call MPI_Group_incl(orig_group, in_cnt, procRanks, new_group, ierr)
 call MPI_COMM_CREATE(MPI_COMM_WORLD, new_group, new_comm, ierr)

 call MPI_SCATTER(procValues, 1, MPI_INTEGER, out, 1, MPI_INTEGER,
0, new_comm, ierr);

 print *,"Complete - Rank", my_rank

end program scatter_bug

Thanks in advance for any advice you can give.

Regards.

Tim.
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Tim Stitt PhD (User Support Manager).
Center for Research Computing | University of Notre Dame |
P.O. Box 539, Notre Dame, IN 46556 | Phone:  574-631-5287 | Email:
tst...@nd.edu
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

Tim Stitt PhD (User Support Manager).
Center for Research Computing | University of Notre Dame |
P.O. Box 539, Notre Dame, IN 46556 | Phone:  574-631-5287 | Email: 
tst...@nd.edu

_

Re: [OMPI users] core binding confusion

2012-03-06 Thread Dave Love
Ralph Castain  writes:

> Well, no - it shouldn't do that, so I would expect it is a bug. Will try to
> look at it, but probably won't happen until next week due to travel.

OK, thanks.  I'll raise an issue and take a look, as we need it working
soon.



Re: [OMPI users] compilation error with pgcc Unknown switch

2012-03-06 Thread Nathan Hjelm

I think the problem is that it is trying to inline opal_atomic_swap_64 (when it 
shouldn't with PGI). I am working on an improved mpool/rcache (w/o kernel 
assistance) solution at the moment but when I am done I can take a look.

-Nathan

On Tue, 6 Mar 2012, Jeffrey Squyres wrote:


I'm afraid that I have neither a Cray nor the PGI compiler, so I'm not in a 
good position to help here.

Would someone with the PGI compiler give the trunk a whirl to see if disabling 
the CXX inline assembly for PGI broke something?  I'd be a little surprised, 
since we already had it disabled for C, but who knows...


On Mar 6, 2012, at 2:55 AM, Abhinav Sarje wrote:


I pulled fresh copy of the dev trunk, and tried building. It did not
change anything - I am still getting the same error:
../../../ompi/.libs/libmpi.so: undefined reference to
`opal_atomic_swap_64'

GNU version still builds fine.


On Tue, Mar 6, 2012 at 5:38 AM, Jeffrey Squyres  wrote:

I disabled C++ inline assembly for PGI (we already had C inline assembly for 
PGI).

So I don't think this should have caused a new error... should it?


On Mar 5, 2012, at 10:21 AM, Nathan Hjelm wrote:


Try pulling a fresh trunk. Jeff made a recent commit that may be relevant. 
Something about inline assembly being broken on PGI (I personally wouldn't 
recommend using that compiler unless you are using fortran).

-Nathan

On Sun, 4 Mar 2012, Abhinav Sarje wrote:


The same options/configuration in the GNU environment (compiler
version 4.6.1) builds smoothly. PGI env still gives the aforementioned
error. Has anyone experienced similar problem? May be some more flags
need to be set for PGI?



On Sat, Mar 3, 2012 at 10:58 PM, Abhinav Sarje  wrote:

Hi, I am trying to compile 64 bits.

On Fri, Mar 2, 2012 at 11:48 PM, George Bosilca  wrote:

Something is definitively weird in your compilation environment.

The "undefined" function is defined in atomic_impl.h as a static inline (static 
inline int64_t opal_atomic_swap_64(volatile int64_t *addr,…). So either the compiler 
should have complained during compilation, or it should be inlined when you reach the 
linking step. Or, and this is the unusual part, you're compiling 32 bits only (thus no 
atomic 64 bits are available), and we are forcing atomic operations on a 64 bits value. 
That would be quite strange …

Are you trying to compile 32 or 64 bits?

 george.

On Mar 2, 2012, at 06:12 , Jeffrey Squyres wrote:


I'm going to have to defer this to those who regularly build on Crays...

Sandia / LANL?


On Mar 2, 2012, at 12:12 AM, Abhinav Sarje wrote:


Hi again,

I just tried building afresh -> svn co, autogen, configure, make. And
it failed at the same point as before:

CCLD   ompi_info
../../../ompi/.libs/libmpi.so: undefined reference to `opal_atomic_swap_64'


Any more ideas/fixes?

Thanks all.
Abhinav.

On Fri, Mar 2, 2012 at 8:14 AM, Abhinav Sarje  wrote:

yes, I did a full autogen, configure, make clean and make all


On Thu, Mar 1, 2012 at 10:03 PM, Jeffrey Squyres  wrote:

Did you do a full autogen / configure / make clean / make all ?


On Mar 1, 2012, at 8:53 AM, Abhinav Sarje wrote:


Thanks Ralph. That did help, but only till the next hurdle. Now the
build fails at the following point with an 'undefined reference':
---
Making all in tools/ompi_info
make[2]: Entering directory
`/global/u1/a/asarje/hopper/openmpi-dev-trunk/build/ompi/tools/ompi_info'
CC ompi_info.o
CC output.o
CC param.o
CC components.o
CC version.o
CCLD   ompi_info
../../../ompi/.libs/libmpi.so: undefined reference to `opal_atomic_swap_64'
make[2]: *** [ompi_info] Error 2
make[2]: Leaving directory
`/global/u1/a/asarje/hopper/openmpi-dev-trunk/build/ompi/tools/ompi_info'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory
`/global/u1/a/asarje/hopper/openmpi-dev-trunk/build/ompi'
make: *** [all-recursive] Error 1
---






On Thu, Mar 1, 2012 at 5:25 PM, Ralph Castain  wrote:

You need to update your source code - this was identified and fixed on Wed. 
Unfortunately, our trunk is a developer's environment. While we try hard to 
keep it fully functional, bugs do occasionally work their way into the code.

On Mar 1, 2012, at 1:37 AM, Abhinav Sarje wrote:


Hi Nathan,

I tried building on an internal login node, and it did not fail at the
previous point. But, after compiling for a very long time, it failed
while building libmpi.la, with a multiple definition error:
--
...
CC mpiext/mpiext.lo
CC mpi/f77/base/mpi_f77_base_libmpi_f77_base_la-attr_fn_f.lo
CC mpi/f77/base/mpi_f77_base_libmpi_f77_base_la-conversion_fn_null_f.lo
CC mpi/f77/base/mpi_f77_base_libmpi_f77_base_la-f90_accessors.lo
CC mpi/f77/base/mpi_f77_base_libmpi_f77_base_la-strings.lo
CC mpi/f77/base/mpi_f77_base_libmpi_f77_base_la-test_constants_f.lo
CCLD   mpi/f77/base/libmpi_f77_base.la
CCLD   libmpi.la
mca/fcoll/dynamic/.libs/libmca_fcoll_dynamic.a(fcoll_dynamic_file_write_all.o):
In function `local_heap_sort':
/

[OMPI users] parallelising ADI

2012-03-06 Thread Kharche, Sanjay

Hi

I am working on a 3D ADI solver for the heat equation. I have implemented it 
as a serial code. Would anybody be able to indicate the best and most 
straightforward way to parallelise it? Apologies if this is going to the wrong 
forum.

thanks
Sanjay



Re: [OMPI users] parallelising ADI

2012-03-06 Thread Gustavo Correa

On Mar 6, 2012, at 3:59 PM, Kharche, Sanjay wrote:

> 
> Hi
> 
> I am working on a 3D ADI solver for the heat equation. I have implemented it 
> as serial. Would anybody be able to indicate the best and more 
> straightforward way to parallelise it. Apologies if this is going to the 
> wrong forum.
> 
> thanks
> Sanjay
> 

Hi Sanjay

There is an implementation of the 2D diffusion equation solver using MPI here:

http://beige.ucs.indiana.edu/I590/node71.html
http://beige.ucs.indiana.edu/I590/node72.html 

Not ADI, but it may shed some light.

I hope this helps,
Gus Correa





Re: [OMPI users] parallelising ADI

2012-03-06 Thread Eugene Loh
Parallelize in distributed-memory fashion or is multi-threaded good 
enough?  Anyhow, you should be able to find many resources with an 
Internet search.  This particular mailing list is more for users of 
OMPI, a particular MPI implementation.  One approach would be to 
distribute only one axis, solve locally, and transpose axes as 
necessary.  But, I see Gus also just provided an answer...  :^)
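
As a concrete starting point in the spirit of the examples linked in the
previous reply, here is a minimal sketch (hypothetical code, not from the
thread) of a 1D domain decomposition with halo exchange for an explicit
heat-equation update. It is not ADI; a transpose-based ADI scheme needs much
heavier data movement, as discussed later in the thread.

program heat1d

  use mpi

  implicit none

  integer, parameter :: n_local = 100       ! interior points per rank
  integer :: ierr, rank, nprocs, left, right, step
  double precision :: u(0:n_local+1), unew(0:n_local+1)
  double precision, parameter :: alpha = 0.25d0

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! Neighbours in a non-periodic 1D decomposition of the x axis.
  left  = rank - 1
  right = rank + 1
  if (left  < 0)       left  = MPI_PROC_NULL
  if (right >= nprocs) right = MPI_PROC_NULL

  u = 0.0d0
  if (rank == 0) u(0) = 1.0d0               ! fixed boundary value

  do step = 1, 1000
     ! Exchange halo points with the neighbours.
     call MPI_SENDRECV(u(n_local), 1, MPI_DOUBLE_PRECISION, right, 0, &
                       u(0),       1, MPI_DOUBLE_PRECISION, left,  0, &
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
     call MPI_SENDRECV(u(1),         1, MPI_DOUBLE_PRECISION, left,  1, &
                       u(n_local+1), 1, MPI_DOUBLE_PRECISION, right, 1, &
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)

     ! Explicit update of the interior points.
     unew(1:n_local) = u(1:n_local) + alpha * &
          (u(0:n_local-1) - 2.0d0*u(1:n_local) + u(2:n_local+1))
     u(1:n_local) = unew(1:n_local)
  end do

  call MPI_FINALIZE(ierr)

end program heat1d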


On 3/6/2012 12:59 PM, Kharche, Sanjay wrote:

I am working on a 3D ADI solver for the heat equation. I have implemented it as 
serial. Would anybody be able to indicate the best and more straightforward way 
to parallelise it. Apologies if this is going to the wrong forum.


[OMPI users] AlltoallV (with some zero send count values)

2012-03-06 Thread Timothy Stitt
Hi all,

Can anyone tell me whether it is legal to pass zero values for some of the send 
count elements in an MPI_AlltoallV() call? I want to perform an all-to-all 
operation but for performance reasons do not want to send data to various ranks 
who don't need to receive any useful values. If it is legal, can I assume the 
implementation is smart enough to not send messages when the send count is 0?

FYI: my tests show that an AlltoallV operation with various send count values 
set to 0 hangs.

Thanks,

Tim.


Re: [OMPI users] parallelising ADI

2012-03-06 Thread Tim Prince

 On 03/06/2012 03:59 PM, Kharche, Sanjay wrote:

Hi

I am working on a 3D ADI solver for the heat equation. I have implemented it as 
serial. Would anybody be able to indicate the best and more straightforward way 
to parallelise it. Apologies if this is going to the wrong forum.


If it's to be implemented in parallelizable fashion (not SSOR style 
where each line uses updates from the previous line), it should be 
feasible to divide the outer loop into an appropriate number of blocks, 
or decompose the physical domain and perform ADI on individual blocks, 
then update and repeat.


--
Tim Prince



Re: [OMPI users] parallelising ADI

2012-03-06 Thread Jed Brown
On Tue, Mar 6, 2012 at 16:23, Tim Prince  wrote:

>  On 03/06/2012 03:59 PM, Kharche, Sanjay wrote:
>
>> Hi
>>
>> I am working on a 3D ADI solver for the heat equation. I have implemented
>> it as serial. Would anybody be able to indicate the best and more
>> straightforward way to parallelise it. Apologies if this is going to the
>> wrong forum.
>>
>>
>>  If it's to be implemented in parallelizable fashion (not SSOR style
> where each line uses updates from the previous line), it should be feasible
> to divide the outer loop into an appropriate number of blocks, or decompose
> the physical domain and perform ADI on individual blocks, then update and
> repeat.


True ADI has inherently high communication cost because a lot of data
movement is needed to make the _fundamentally sequential_ tridiagonal
solves local enough that latency doesn't kill you trying to keep those
solves distributed. This also applies (albeit to a lesser degree) in serial
due to the way memory works.

If you only do non-overlapping subdomain solves, you must use a Krylov
method just to ensure convergence. You can add overlap, but the Krylov
method is still necessary for any practical convergence rate. The method
will also require an iteration count proportional to the number of
subdomains across the global domain times the square root of the number of
elements across a subdomain. The constants may not be small and this
asymptotic result is independent of what the subdomain solver is. You need
a coarse level to improve this scaling.

Sanjay, as Matt and I recommended when you asked the same question on the
PETSc list this morning, unless this is a homework assignment, you should
solve your problem with multigrid instead of ADI. We pointed you to simple
example code that scales well to many thousands of processes.


Re: [OMPI users] AlltoallV (with some zero send count values)

2012-03-06 Thread Jed Brown
On Tue, Mar 6, 2012 at 15:43, Timothy Stitt  wrote:

> Can anyone tell me whether it is legal to pass zero values for some of the
> send count elements in an MPI_AlltoallV() call? I want to perform an
> all-to-all operation but for performance reasons do not want to send data
> to various ranks who don't need to receive any useful values. If it is
> legal, can I assume the implementation is smart enough to not send messages
> when the send count is 0?
>
> FYI: my tests show that AlltoallV operations with various send count
> values set to 0...hangs.
>

This is allowed by the standard, but be warned that it is likely to perform
poorly compared to what could be done with point-to-point or one-sided
operations if most links are empty.
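
To illustrate the legal-but-sparse case being discussed, here is a minimal
Fortran sketch (hypothetical, not from the thread) in which every rank sends a
single integer to rank 0 and nothing to anyone else, so most send counts are 0:

program sparse_alltoallv

  use mpi

  implicit none

  integer :: ierr, rank, nprocs, i
  integer, allocatable :: sendcounts(:), sdispls(:), recvcounts(:), rdispls(:)
  integer, allocatable :: sendbuf(:), recvbuf(:)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  allocate(sendcounts(nprocs), sdispls(nprocs), recvcounts(nprocs), rdispls(nprocs))
  allocate(sendbuf(1), recvbuf(max(1, nprocs)))

  ! Every rank sends one integer to rank 0 and nothing to anybody else.
  sendcounts    = 0
  sendcounts(1) = 1            ! destination rank 0 (Fortran index 1)
  sdispls       = 0
  sendbuf(1)    = rank

  ! Only rank 0 expects to receive anything; zero counts are legal everywhere else.
  recvcounts = 0
  rdispls    = 0
  if (rank == 0) then
     recvcounts = 1
     do i = 1, nprocs
        rdispls(i) = i - 1
     end do
  end if

  call MPI_ALLTOALLV(sendbuf, sendcounts, sdispls, MPI_INTEGER, &
                     recvbuf, recvcounts, rdispls, MPI_INTEGER, &
                     MPI_COMM_WORLD, ierr)

  if (rank == 0) print *, "rank 0 received:", recvbuf(1:nprocs)

  call MPI_FINALIZE(ierr)

end program sparse_alltoallv

Note that each rank's sendcounts entry toward rank j must match rank j's
recvcounts entry for that sender; a mismatch there is a common cause of hangs.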


Re: [OMPI users] AlltoallV (with some zero send count values)

2012-03-06 Thread Timothy Stitt
Thanks Jed for the advice. How well-implemented are the one-sided communication 
routines? Are they something that could be trusted in a production code?

Sent from my iPad
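
To make the one-sided alternative concrete, below is a minimal sketch
(hypothetical, MPI-2 fence synchronization, and not a statement about how well
any particular implementation performs) in which rank 0 puts one integer into a
window exposed by every other rank; ranks that receive nothing simply see their
buffer unchanged:

program put_sketch

  use mpi

  implicit none

  integer :: ierr, rank, nprocs, win, intsize, i
  integer(kind=MPI_ADDRESS_KIND) :: winsize, disp
  integer :: rbuf
  integer, allocatable :: vals(:)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
  call MPI_TYPE_SIZE(MPI_INTEGER, intsize, ierr)

  ! Every rank exposes one integer; only the targets actually get written.
  rbuf    = -1
  winsize = intsize
  call MPI_WIN_CREATE(rbuf, winsize, intsize, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierr)

  call MPI_WIN_FENCE(0, win, ierr)
  if (rank == 0) then
     allocate(vals(nprocs - 1))
     disp = 0
     do i = 1, nprocs - 1          ! push a value to every other rank
        vals(i) = 100 + i
        call MPI_PUT(vals(i), 1, MPI_INTEGER, i, disp, 1, MPI_INTEGER, win, ierr)
     end do
  end if
  call MPI_WIN_FENCE(0, win, ierr)   ! origin buffers must stay valid until here

  print *, "rank", rank, "received", rbuf

  call MPI_WIN_FREE(win, ierr)
  call MPI_FINALIZE(ierr)

end program put_sketch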

On Mar 6, 2012, at 6:06 PM, "Jed Brown" <j...@59a2.org> wrote:

On Tue, Mar 6, 2012 at 15:43, Timothy Stitt <timothy.stit...@nd.edu> wrote:
Can anyone tell me whether it is legal to pass zero values for some of the send 
count elements in an MPI_AlltoallV() call? I want to perform an all-to-all 
operation but for performance reasons do not want to send data to various ranks 
who don't need to receive any useful values. If it is legal, can I assume the 
implementation is smart enough to not send messages when the send count is 0?

FYI: my tests show that AlltoallV operations with various send count values set 
to 0...hangs.

This is allowed by the standard, but be warned that it is likely to perform 
poorly compared to what could be done with point-to-point or one-sided 
operations if most links are empty.
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] can't run the code on Jaguar

2012-03-06 Thread bin wang
Hello Ralph,

Thanks for your reply.

In order to start my job, I tried the following two ways:
(1) configured/compiled Open MPI and compiled the benchmark on the head node,
    then submitted a PBS job.
(2) submitted an interactive job to redo the configure/compile on a compute node,
    then used "/path/to/mpicc -o hello hello_world.c" to compile the benchmark
    and "/path/to/mpirun -np 2 /path/to/hello" to run the job.
I also tried to run "/path/to/mpirun -np 2 hostname" but got the same error.

The configure line is pretty long.

 67 $SRCDIR/configure \
 68--prefix=$PREFIX \
 69--enable-static --disable-shared --disable-dlopen
--disable-pretty-print-stacktrace --disable-pty-support --disable-io-romio
--enable-contrib-no-build=libnbc,vt --enable-debug \
 70--with-memory-manager=none --with-threads \
 71--without-tm \
 72--with-wrapper-ldflags="${ADD_WRAPPER_LDFLAGS}" \
 73--with-wrapper-libs="-lnsl -lpthread -lm" \
 74--with-platform=optimized \
 75--with-ugni=/opt/cray/ugni/2.3-1.0400.3912.4.29.gem \
 76--with-ugni-libdir=/opt/cray/ugni/2.3-1.0400.3912.4.29.gem/lib64  \
 77
--with-ugni-includedir=/opt/cray/gni-headers/2.1-1.0400.3906.5.1.gem/include
\
 78--with-xpmem=/opt/cray/xpmem/0.1-2.0400.29883.4.6.gem \
 79--with-xpmem-libdir=/opt/cray/xpmem/0.1-2.0400.29883.4.6.gem/lib64 \
 80--enable-mem-debug --enable-mem-profile --enable-debug-symbols
--enable-binaries \
 81--enable-picky --enable-mpi-f77 --enable-mpi-f90 --enable-mpi-cxx
--enable-mpi-cxx-seek \
 82--without-slurm --with-memory-manager=ptmalloc2 \
 83--with-pmi=/opt/cray/pmi/2.1.4-1..8596.8.9.gem
--with-cray-pmi-ext \
 84
--enable-mca-no-build=maffinity-first_use,maffinity-libnuma,ess-cnos,filem-rsh,grpcomm-cnos,pml-dr
\
 85${ADD_COMPILER} \
 86CPPFLAGS="${ADD_CPPFLAGS} -I${gniheaders}" \
 87FFLAGS="${ADD_FFLAGS} -I${gniheaders}" \
 88FCFLAGS="${ADD_FCFLAGS} -I/usr/include -I${gniheaders}" \
 89CFLAGS="-I/usr/include -I${gniheaders}" \
 90LDFLAGS="--static ${ADD_LDFLAGS} ${UGNILIBS} ${XPMEMLIBS}" \
 91LIBS="${ADD_LIBS} -lpthread -lrt -lpthread -lm" | tee build.log

Any idea?


Bin WANG



On Mon, Mar 5, 2012 at 7:13 PM, Ralph Castain  wrote:

> How did you attempt to start your job, and what does your configure line
> look like?
>
> Sent from my iPad
>
> On Mar 5, 2012, at 2:11 PM, bin Wang  wrote:
>
> > Hello All,
> >
> > I'm trying to run the latest OpenMPI code on Jaguar.
> > (Cloned from the Open MPI Mercurial mirror of the Subversion repository)
> > The configuration and compilation of OpenMPI were fine, and benchmark
> > was also successfully compiled. I tried to launch my program using mpirun
> > within an interactive job, but it failed immediately.
> >
> > Core dump file gave me the following information.
> > Error Msg=
> > [jaguarpf-login2:15370] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
> start a daemon on the local
> > node in file ess_singleton_module.c at line 220
> >
> --
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems.  This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> > ompi_mpi_init: orte_init failed
> > --> Returned value Unable to start a daemon on the local node (-127)
> instead of ORTE_SUCCESS
> >
> >
> --
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or
> environment
> > problems.  This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> > ompi_mpi_init: orte_init failed
> > --> Returned "Unable to start a daemon on the local node" (-127) instead
> of "Success" (0)
> >
> --
> > [jaguarpf-login2:15370] *** An error occurred in MPI_Init
> > [jaguarpf-login2:15370] *** reported by process [4294967295,42949No
> process In: Line: ??   PC: ??
> > [jaguarpf-login2:15370] *** on a NULL communicator
> > [jaguarpf-login2:15370] *** Unknown error
> > [jaguarpf-login2:15370] *** MPI_ERRORS_ARE_FATAL (processes in this
> communicator will now abort,
> > [jaguarpf-login2:15370] *** and potentially your MPI job)
> >
> --
> > An MPI process is aborting at a time when it cannot guarantee that all
> > of its peer processes in the job will be killed properly.  You should
> > do

Re: [OMPI users] can't run the code on Jaguar

2012-03-06 Thread Ralph Castain
Wow - that's the ugliest configure line I think I've ever seen :-/

I note you have a --with-platform in the middle of it, which is really
unusual. Normally, you would put all that stuff in a platform file if
that's what you were going to do. Note that anything in the platform file
will override any duplicates on the cmd line, not the other way around. So
you may not be building what you thought.

I also noticed that you had two conflicting --with-memory-manager options
specified, which isn't good.

There usually isn't any reason for that complex a configure - we do a
pretty good job of sensing the right thing to do. In this case, I believe
the problem is that you forgot to configure for alps support and configured
out cnos support, so there is nothing left that you can use on your system.

Take a look at contrib/platform/lanl/cray_xe6/debug-nopanasas for an
example platform file that, I believe, builds what you are seeking. I would
suggest copying and editing that one, and then configuring with just
--with-platform=


On Tue, Mar 6, 2012 at 3:28 PM, bin wang  wrote:

> Hello Ralph,
>
> Thanks for your reply.
>
> In order to start my job, I tried the following two ways
> (1) configured/compiled open-mpi and compiled benchmark on head node.
>   submitted a pbs job.
> (2) submitted an interactive job to redo config/compile on compute node.
>   And then used "/path/to/mpicc -o hello hello_world.c" to compile the
> benchmark.
>   used "/path/tp/mpirun -np 2 /path/to/hello" to run the job.
> Actually I also tried to run "/path/tp/mpirun -np 2 hostname" but got the
> same error.
>
> The configure line is pretty long.
>
>  67 $SRCDIR/configure \
>  68--prefix=$PREFIX \
>  69--enable-static --disable-shared --disable-dlopen
> --disable-pretty-print-stacktrace --disable-pty-support --disable-io-romio
> --enable-contrib-no-build=libnbc,vt --enable-debug \
>  70--with-memory-manager=none --with-threads \
>  71--without-tm \
>  72--with-wrapper-ldflags="${ADD_WRAPPER_LDFLAGS}" \
>  73--with-wrapper-libs="-lnsl -lpthread -lm" \
>  74--with-platform=optimized \
>  75--with-ugni=/opt/cray/ugni/2.3-1.0400.3912.4.29.gem \
>  76--with-ugni-libdir=/opt/cray/ugni/2.3-1.0400.3912.4.29.gem/lib64  \
>  77
> --with-ugni-includedir=/opt/cray/gni-headers/2.1-1.0400.3906.5.1.gem/include
> \
>  78--with-xpmem=/opt/cray/xpmem/0.1-2.0400.29883.4.6.gem \
>  79--with-xpmem-libdir=/opt/cray/xpmem/0.1-2.0400.29883.4.6.gem/lib64 \
>  80--enable-mem-debug --enable-mem-profile --enable-debug-symbols
> --enable-binaries \
>  81--enable-picky --enable-mpi-f77 --enable-mpi-f90 --enable-mpi-cxx
> --enable-mpi-cxx-seek \
>  82--without-slurm --with-memory-manager=ptmalloc2 \
>  83--with-pmi=/opt/cray/pmi/2.1.4-1..8596.8.9.gem
> --with-cray-pmi-ext \
>  84
> --enable-mca-no-build=maffinity-first_use,maffinity-libnuma,ess-cnos,filem-rsh,grpcomm-cnos,pml-dr
> \
>  85${ADD_COMPILER} \
>  86CPPFLAGS="${ADD_CPPFLAGS} -I${gniheaders}" \
>  87FFLAGS="${ADD_FFLAGS} -I${gniheaders}" \
>  88FCFLAGS="${ADD_FCFLAGS} -I/usr/include -I${gniheaders}" \
>  89CFLAGS="-I/usr/include -I${gniheaders}" \
>  90LDFLAGS="--static ${ADD_LDFLAGS} ${UGNILIBS} ${XPMEMLIBS}" \
>  91LIBS="${ADD_LIBS} -lpthread -lrt -lpthread -lm" | tee build.log
>
> Any idea?
>
>
> Bin WANG
>
>
>
>
> On Mon, Mar 5, 2012 at 7:13 PM, Ralph Castain wrote:
>
>> How did you attempt to start your job, and what does your configure line
>> look like?
>>
>> Sent from my iPad
>>
>> On Mar 5, 2012, at 2:11 PM, bin Wang  wrote:
>>
>> > Hello All,
>> >
>> > I'm trying to run the latest OpenMPI code on Jaguar.
>> > (Cloned from the Open MPI Mercurial mirror of the Subversion repository)
>> > The configuration and compilation of OpenMPI were fine, and benchmark
>> > was also successfully compiled. I tried to launch my program using
>> mpirun
>> > within an interactive job, but it failed immediately.
>> >
>> > Core dump file gave me the following information.
>> > Error Msg=
>> > [jaguarpf-login2:15370] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to
>> start a daemon on the local
>> > node in file ess_singleton_module.c at line 220
>> >
>> --
>> > It looks like orte_init failed for some reason; your parallel process is
>> > likely to abort.  There are many reasons that a parallel process can
>> > fail during orte_init; some of which are due to configuration or
>> > environment problems.  This failure appears to be an internal failure;
>> > here's some additional information (which may only be relevant to an
>> > Open MPI developer):
>> > ompi_mpi_init: orte_init failed
>> > --> Returned value Unable to start a daemon on the local node (-127)
>> instead of ORTE_SUCCESS
>> >
>> >
>> --
>> > It looks like MPI_INIT failed for so