[OMPI users] Able to run mpirun as root, but not as a user.

2013-09-03 Thread Ian Czekala
Dear openmpi users,

My basic problem is that I am able to run mpirun as root, but not at a user
level. I have tried installing openmpi via several methods, but all seem to
yield the same problem. I fear that I am missing something very basic and
zero-order, but I can't seem to resolve my problem with the information in
the FAQ.

Originally, I installed openmpi through Arch Linux's package manager,
pacman. After a successful install, I tried (on my laptop)

$ mpirun -np 2 /bin/pwd
and I get the following output:
--
mpirun was unable to launch the specified application as it encountered an
error:

Error: pipe function call failed when setting up I/O forwarding subsystem
Node: leo

while attempting to start process rank 0.
--
2 total processes failed to start

However, when I run as root, I am successful:
$ sudo mpirun -np 2 /bin/pwd
/home/ian
/home/ian

After doing some searching on the web (and coming across this thread),
I suspected that the issue might be with some PATH setup or user
permissions that weren't being set correctly by the Arch Linux package
manager, and so I uninstalled it and resorted to installing from source.

When trying the normal install method
$ ./configure --prefix=/usr/local/openmpi
$ make all
$ sudo make install
and then changing my .zshrc to include the correct PATH and LD_LIBRARY_PATH,
I get the same behavior as before.
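(For completeness: the .zshrc lines in question are the standard two for a
non-default prefix; for the /usr/local/openmpi install above they would be
something like

export PATH=/usr/local/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH

with the corresponding paths for the home-directory install described below.)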

To investigate further the possibility of a permissions issue, I
uninstalled and reinstalled into my own home directory
$ ./configure --prefix=/home/ian/.builds/openmpi
$ make all
$ make install
and updated PATH and LD_LIBRARY_PATH correspondingly.

However, the behavior is *exactly* as before: mpirun will run as root, but
not at a user level. Am I missing something extremely basic here? Online
examples imply to me that I should be able to run as a user without any
additional configuration.

Here is some of the info asked for by the "Getting Help" section (all from
the local home directory install):

[ian@leo:~]$ sudo mpirun --bynode --tag-output ompi_info -v ompi full
--parsable
[1,0]:package:Open MPI ian@leo Distribution
[1,0]:ompi:version:full:1.6.5
[1,0]:ompi:version:svn:r28673
[1,0]:ompi:version:release_date:Jun 26, 2013
[1,0]:orte:version:full:1.6.5
[1,0]:orte:version:svn:r28673
[1,0]:orte:version:release_date:Jun 26, 2013
[1,0]:opal:version:full:1.6.5
[1,0]:opal:version:svn:r28673
[1,0]:opal:version:release_date:Jun 26, 2013
[1,0]:mpi-api:version:full:2.1
[1,0]:ident:1.6.5

Thank you for any help or guidance you may be able to offer! Sincerely,

Ian Czekala


config.log.bz2
Description: BZip2 compressed data


Re: [OMPI users] problems with rankfile in openmpi-1.9a1r29097

2013-09-03 Thread Siegmar Gross
Hi,

> Okay, I have a fix for not specifying the number of procs when
> using a rankfile.
> 
> As for the binding pattern, the problem is a syntax error in
> your rankfile. You need a semi-colon instead of a comma to
> separate the sockets for rank0:
> 
> > rank 0=bend001 slot=0:0-1,1:0-1  => rank 0=bend001 slot=0:0-1;1:0-1
> 
> This is required because you use commas to list specific cores
> - e.g., slot=0:0,1,4,6
...

OK, so you have changed the syntax: Open MPI 1.6.x needs "," and Open MPI
1.9.x needs ";". Unfortunately my rankfiles still don't work as
expected (even if I add "-np ", so that everything is specified
now). These are some of my rankfiles, which I use below to show you the
different errors.

::
rf_linpc_semicolon
::
# Open MPI 1.7.x and newer needs ";" to separate sockets.
# mpiexec -report-bindings -rf rf_linpc_semicolon -np 1 hostname
rank 0=linpc1 slot=0:0-1;1:0-1

::
rf_linpc_linpc_semicolon
::
# Open MPI 1.7.x and newer needs ";" to separate sockets.
# mpiexec -report-bindings -rf rf_linpc_linpc_semicolon -np 4 hostname
rank 0=linpc0 slot=0:0-1;1:0-1
rank 1=linpc1 slot=0:0-1
rank 2=linpc1 slot=1:0
rank 3=linpc1 slot=1:1

::
rf_tyr_semicolon
::
# Open MPI 1.7.x and newer needs ";" to separate sockets.
# mpiexec -report-bindings -rf rf_tyr_semicolon -np 1 hostname
rank 0=tyr slot=0:0;1:0
tyr rankfiles 198 


These are my results. "linpc?" use Open-SuSE Linux, "sunpc?" use
Solaris 10 x86_64, and "tyr" uses Solaris 10 sparc. "linpc?" and
"sunpc?" use identical hardware.


tyr rankfiles 107 ompi_info | grep "Open MPI:"
Open MPI: 1.9a1r29097


1) It seems that I can use the rankfile only on a node that is
   specified in the rankfile.

linpc1 rankfiles 98 mpiexec -report-bindings \
  -rf rf_linpc_semicolon -np 1 hostname
[linpc1:12504] MCW rank 0 bound to socket 0[core 0[hwt 0]],
  socket 0[core 1[hwt 0]], socket 1[core 2[hwt 0]],
  socket 1[core 3[hwt 0]]: [B/B][B/B]
linpc1
linpc1 rankfiles 98 exit


tyr rankfiles 125 ssh sunpc1
...
sunpc1 rankfiles 102 mpiexec -report-bindings \
  -rf rf_linpc_semicolon -np 1 hostname
--
All nodes which are allocated for this job are already filled.
--
sunpc1 rankfiles 103 exit


linpc0 rankfiles 93 mpiexec -report-bindings \
  -rf rf_linpc_semicolon -np 1 hostname
--
All nodes which are allocated for this job are already filled.
--
linpc0 rankfiles 94 exit


I can use the rankfile on any machine with Open MPI 1.6.x.

tyr rankfiles 105 ompi_info | grep "Open MPI:"
Open MPI: 1.6.5a1r28554

tyr rankfiles 106 mpiexec -report-bindings \
  -rf rf_linpc_semicolon -np 1 hostname
[tyr.informatik.hs-fulda.de:29380] Got an error!
[linpc1:12637] MCW rank 0 bound to socket 0[core 0-1]:
  [B B][. .] (slot list 0:0-1)
linpc1

Semicolon isn't allowed.


tyr rankfiles 107 mpiexec -report-bindings \
  -rf rf_linpc_comma -np 1 hostname
[linpc1:12704] MCW rank 0 bound to socket 0[core 0-1]
  socket 1[core 0-1]: [B B][B B] (slot list 0:0,1,1:0,1)
linpc1
tyr rankfiles 108


2) I cannot use two Linux machines with Open MPI 1.9.x.

linpc1 rankfiles 105 mpiexec -report-bindings \
  -rf rf_linpc_linpc_semicolon -np 4 hostname
--
The rankfile that was used claimed that a host was either not
allocated or oversubscribed its slots.  Please review your rank-slot
assignments and your host allocation to ensure a proper match.  Also,
some systems may require using full hostnames, such as
"host1.example.com" (instead of just plain "host1").

  Host: linpc0
--
linpc1 rankfiles 106 


Perhaps this problem is a follow-on from the problem above.


No problem with Open MPI 1.6.x.

linpc1 rankfiles 106 mpiexec -report-bindings \
  -rf rf_linpc_linpc_comma -np 4 hostname
[linpc1:12975] MCW rank 1 bound to socket 0[core 0-1]:
  [B B][. .] (slot list 0:0-1)
[linpc1:12975] MCW rank 2 bound to socket 1[core 0]:
  [. .][B .] (slot list 1:0)
[linpc1:12975] MCW rank 3 bound to socket 1[core 1]:
  [. .][. B] (slot list 1:1)
linpc1
linpc1
[linpc0:13855] MCW rank 0 bound to socket 0[core 0-1]
  socket 1[core 0-1]: [B B][B B] (slot list 0:0-1,1:0-1)
linpc0
linpc1
linpc1 rankfiles 107 


3) I have a problem on "tyr" (Solaris 10 sparc).

tyr rankfiles 106 mpiexec -report-bindings \
  -rf rf_tyr_semicolon -np 1 hostname
[tyr.informatik.hs-fulda.de:29849] [[53951,0],0] ORTE_ERROR_LOG:
  Not found in file
   ../../../../../openmpi-1.9a1r29097/orte/mca/rmaps/rank_file/rmaps_rank_file.c
   at line 276
[tyr.informatik.hs-fulda.de:29849] [[53951,0],0] ORTE_ERROR_LOG:
  Not found in file
  ../../../../openmp

Re: [OMPI users] problems with rankfile in openmpi-1.9a1r29097

2013-09-03 Thread Siegmar Gross
Hi,

> 3) I have a problem on "tyr" (Solaris 10 sparc).
> 
> tyr rankfiles 106 mpiexec -report-bindings \
>   -rf rf_tyr_semicolon -np 1 hostname
> [tyr.informatik.hs-fulda.de:29849] [[53951,0],0] ORTE_ERROR_LOG:
>   Not found in file
>
> ../../../../../openmpi-1.9a1r29097/orte/mca/rmaps/rank_file/rmaps_rank_file.c
>at line 276
> [tyr.informatik.hs-fulda.de:29849] [[53951,0],0] ORTE_ERROR_LOG:
>   Not found in file
>   ../../../../openmpi-1.9a1r29097/orte/mca/rmaps/base/rmaps_base_map_job.c
>   at line 173
> tyr rankfiles 107

This one works now. I found a strange character in the rankfile, which
I removed.

tyr rankfiles 103 mpiexec -report-bindings \
  -rf rf_tyr_semicolon -np 1 hostname
[tyr.informatik.hs-fulda.de:00079] MCW rank 0 bound to
  socket 0[core 0[hwt 0]], socket 1[core 1[hwt 0]]: [B][B]
tyr.informatik.hs-fulda.de




> I get the following output, if I try the rankfile from a different machine
> (also Solaris 10 sparc).
> 
> rs0 rankfiles 104 mpiexec -report-bindings -rf rf_tyr_semicolon -np 1 hostname
> --
> All nodes which are allocated for this job are already filled.
> --
> rs0 rankfiles 105 

No change in this case.

rs0 rankfiles 102 mpiexec -report-bindings \
  -rf rf_tyr_semicolon -np 1 hostname
--
All nodes which are allocated for this job are already filled.
--


I checked the other rankfiles as well and they are OK, so
problems 1) and 2) from my previous e-mail still exist.


Kind regards

Siegmar



Re: [OMPI users] mpi_allgatherv

2013-09-03 Thread George Bosilca

On Sep 1, 2013, at 23:36 , Huangwei  wrote:

> Hi George,
>  
> Thank you for your reply. Please see below. 
> best regards,
> Huangwei
> 
>  
> 
> 
> On 1 September 2013 22:03, George Bosilca  wrote:
> 
> On Aug 31, 2013, at 14:56 , Huangwei  wrote:
> 
>> Hi All,
>>  
>> I would like to send an array A, which has different dimensions in the 
>> processors. Then the root receive these As and puts them into another array 
>> globA. I know MPI_allgatherv can do this. However, there are still some 
>> implementation issues that are not very clear for me. Thank you very much if 
>> any of you can give me some suggestions and comments. The piece of code is 
>> as follows (I am not sure if it is completely correct):
>>  
>>  
>> !...calculate the total size for the total size of the globA, 
>> PROCASize(myidf) is the size of array A in each processor.
>>  
>> allocate(PROCASize(numprocs))
>> PROCASize(myidf) = Asize
>> call 
>> mpi_allreduce(PROCSize,PROCSize,numprocs,mpi_integer,mpi_sum,MPI_COMM_WORLD,ierr)
>> globAsize = sum(PROCAsize)
>>  
>> !...calculate the RECS and DISP for MPI_allgatherv
>> allocate(RECSASize(0:numprocs-1))
>> allocate(DISP(0:numprocs-1))
>> do i=1,numprocs
>>RECSASize(i-1) = PROCASize(i)
>> enddo
>> call mpi_type_extent(mpi_integer, extent, ierr)
>> do i=1,numprocs
>>  DISP(i-1) = 1 + (i-1)*RECSASIze(i-1)*extent
>> enddo
>>  
>> !...allocate the size of the array globA
>> allocate(globA(globASize*extent))
>> call mpi_allgatherv(A,ASize,MPI_INTEGER,globA, RECSASIze, 
>> DISP,MPI_INTEGER,MPI_COMM_WORLD,ierr)
>>  
>> My Questions:
>>  
>> 1, How to allocate the globA, i.e. the receive buff's size? Should I use 
>> globASize*extent or just globASize?
> 
>  
> I don't understand what globASize is supposed to be as you do the reduction 
> on PROCSize and then sum PROCAsize.
>  
> Here I assume globASize is sum of the size of the array A in all the 
> processors. For example, in proc 1, it is A(3), in proc 2, it is A(5), in 
> proc 3 it is A(6), so globASize = 14. I aim to put these A arrays into globA 
> which is sized as 14. All the data in A are aimed to be stored in globA 
> consecutively based on rank number.   
>  
>  
> Anyway, you should always allocate the memory for a collective based on the 
> total number of elements to receive times the extent of each element. In fact, 
> to be even more accurate, if we suppose that you correctly computed the DISP 
> array, you should allocate globA as DISP(numprocs-1) + RECSASIze(numprocs-1).
>If all the elements in all A arrays are integer or all are uniformly 
> double precision, the size of globA should be 14 or 14*extent_integer? 

14 * extent(datatype).

> 
>  
>>  
>> 2, about the displacements in globA, i.e. DISP(:): does it stand for the order 
>> of an array, like 1, 2, 3, ...? This corresponds to DISP(i-1) = 1 + 
>> (i-1)*RECSASIze(i-1)*extent. Or are this array's elements the addresses at 
>> which the data from different processors will be stored in globA?
> 
> These are the displacements from the beginning of the array where the data 
> from a peer is stored. The index in this array is the rank of the peer 
> process in the communicator.
> 
> Yes, I know. But I mean the meaning of the elements of this array. Still using 
> the example mentioned above, is the following specification correct: 
> DISP(1)=0, DISP(2)=3, DISP(3)=8 ?

It depends on the amount of data sent by each process (as the ranges should not 
overlap).

>>  
>> 3, should the arrays start from 0 to numprocs-1? or start from 1 to 
>> numprocs? This may be important when they work as arguments in 
>> mpi_allgatherv subroutine.
> 
> It doesn't matter whether you allocate it as (0:numprocs-1) or simply (numprocs); the 
> compiler will do the right thing when creating the call using the array.
> 
>   George.
>  
> Additional Question is:
>  
> For Fortran MPI, can the MPI subroutine send an array with 0 size, i.e. in the 
> example, A is A(0), and ASize = 0:

As long as the peers expect 0 INTEGERS from this rank the call is correct.

  George.
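For readers putting the pieces together, here is a minimal sketch of the whole
exchange in code. It is not taken from the thread; the names (A, Asize,
RECSASize, DISP, globA) just follow Huangwei's snippet, and it assumes every
rank contributes default INTEGERs. The key points are that every rank first
learns the other ranks' counts, that the displacements are given in elements of
the receive type (not bytes), so contributions of 3, 5 and 6 elements give
DISP = 0, 3, 8, and that a zero-size contribution is just a count of 0.

      program allgatherv_sketch
      implicit none
      include 'mpif.h'
      integer :: ierr, myidf, numprocs, Asize, globAsize, i
      integer, allocatable :: A(:), globA(:), RECSASize(:), DISP(:)

      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, myidf, ierr)
      call mpi_comm_size(MPI_COMM_WORLD, numprocs, ierr)

      Asize = myidf + 3                   ! any per-rank size works, 0 included
      allocate(A(Asize))
      A = myidf                           ! something recognizable per rank

      ! every rank learns the contribution size of every other rank
      allocate(RECSASize(0:numprocs-1), DISP(0:numprocs-1))
      call mpi_allgather(Asize, 1, MPI_INTEGER, RECSASize, 1, MPI_INTEGER, &
                         MPI_COMM_WORLD, ierr)

      ! displacements are in elements of the receive type, not bytes:
      ! e.g. counts 3, 5, 6 give DISP = 0, 3, 8
      DISP(0) = 0
      do i = 1, numprocs - 1
         DISP(i) = DISP(i-1) + RECSASize(i-1)
      end do

      ! total receive size = displacement of the last rank plus its count
      globAsize = DISP(numprocs-1) + RECSASize(numprocs-1)
      allocate(globA(globAsize))

      ! a rank whose Asize is 0 simply passes count 0 here
      call mpi_allgatherv(A, Asize, MPI_INTEGER, globA, RECSASize, DISP, &
                          MPI_INTEGER, MPI_COMM_WORLD, ierr)

      call mpi_finalize(ierr)
      end program allgatherv_sketch

(And as noted above, whether RECSASize and DISP are declared (0:numprocs-1) or
(1:numprocs) makes no difference to the call itself.)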


>  
> call mpi_allgatherv(A,ASize,MPI_INTEGER,globA, RECSASIze, 
> DISP,MPI_INTEGER,MPI_COMM_WORLD,ierr)
>  
> Is this valid in mpi calling? This case will appear in my work.
>  
>  
> Thank you very much for your help!
>  
> Have a nice holiday!
>  
>  
>>  
>> These questions may be too simple for MPI professionals, but I do not have 
>> much experience on this. Thus I am sincerely eager to get some comments and 
>> suggestions from you. Thank you in advance!
>> 
>> 
>> regards,
>> Huangwei
>> 
>>  
>> 
>>  
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 

Re: [OMPI users] Able to run mpirun as root, but not as a user.

2013-09-03 Thread Reuti
Am 03.09.2013 um 06:48 schrieb Ian Czekala:

> Dear openmpi users,
> 
> My basic problem is that I am able to run mpirun as root, but not at a user 
> level. I have tried installing openmpi via several methods, but all seem to 
> yield the same problem. I fear that I am missing something very basic and 
> zero-order, but I can't seem to resolve my problem with the information in 
> the FAQ.
> 
> Originally, I installed the openmpi through arch linux's package manager, 
> pacman. After a successful install, I tried (on my laptop)
> 
> $mpirun -np 2 /bin/pwd
> and I get the following output:
> --
> mpirun was unable to launch the specified application as it encountered an 
> error:
> 
> Error: pipe function call failed when setting up I/O forwarding subsystem
> Node: leo
> 
> while attempting to start process rank 0.
> --
> 2 total processes failed to start
> 
> however when I run as root, I am successful
> $ sudo mpirun -np 2 /bin/pwd
> /home/ian
> /home/ian
> 
> After doing some searching on the web (and coming across this thread),

There is another one:

http://www.open-mpi.org/community/lists/users/2010/03/12291.php

-- Reuti


> I suspected that the issue might be with some PATH setup or user permissions 
> that weren't being set correctly by the arch linux package manager, and so I 
> uninstalled and resorted to installing by source.
> 
> When trying the normal install method 
> $ ./configure --prefix=/usr/local/openmpi
> $ make all
> $ sudo make install
> and then changed my .zshrc to include the correct PATH and LD_LIBRARY_PATH, I 
> get the same behavior as before.
> 
> To investigate further the possibility of a permissions issue, I uninstalled 
> and reinstalled into my own home directory 
> $ ./configure --prefix=/home/ian/.builds/openmpi
> $ make all
> $ make install
> and updated PATH and LD_LIBRARY_PATH correspondingly.
> 
> However, the behavior is *exactly* as before: mpirun will run as root, but 
> not at a user level. Am I missing something extremely basic here? Online 
> examples to me imply that I should be able to run as a user without any 
> additional configuration.
> 
> Here is some of the info asked for by the "Getting Help" section (all from 
> the local home directory install):
> 
> [ian@leo:~]$ sudo mpirun --bynode --tag-output ompi_info -v ompi full 
> --parsable
> [1,0]:package:Open MPI ian@leo Distribution
> [1,0]:ompi:version:full:1.6.5
> [1,0]:ompi:version:svn:r28673
> [1,0]:ompi:version:release_date:Jun 26, 2013
> [1,0]:orte:version:full:1.6.5
> [1,0]:orte:version:svn:r28673
> [1,0]:orte:version:release_date:Jun 26, 2013
> [1,0]:opal:version:full:1.6.5
> [1,0]:opal:version:svn:r28673
> [1,0]:opal:version:release_date:Jun 26, 2013
> [1,0]:mpi-api:version:full:2.1
> [1,0]:ident:1.6.5
> 
> Thank you for any help or guidance you may be able to offer! Sincerely,
> 
> Ian Czekala
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] problems with rankfile in openmpi-1.9a1r29097

2013-09-03 Thread Ralph Castain
Heck if I know what might be wrong - it works fine for me, regardless of what 
machine I run it from.

If this is compiled with --enable-debug, try adding "--display-allocation -mca 
rmaps_base_verbose 5" to your cmd line to see what might be going on.


On Sep 3, 2013, at 1:20 AM, Siegmar Gross 
 wrote:

> Hi,
> 
>> 3) I have a problem on "tyr" (Solaris 10 sparc).
>> 
>> tyr rankfiles 106 mpiexec -report-bindings \
>>  -rf rf_tyr_semicolon -np 1 hostname
>> [tyr.informatik.hs-fulda.de:29849] [[53951,0],0] ORTE_ERROR_LOG:
>>  Not found in file
>>   
>> ../../../../../openmpi-1.9a1r29097/orte/mca/rmaps/rank_file/rmaps_rank_file.c
>>   at line 276
>> [tyr.informatik.hs-fulda.de:29849] [[53951,0],0] ORTE_ERROR_LOG:
>>  Not found in file
>>  ../../../../openmpi-1.9a1r29097/orte/mca/rmaps/base/rmaps_base_map_job.c
>>  at line 173
>> tyr rankfiles 107
> 
> This one works now. I found a strange character in the rankfile, which
> I removed.
> 
> tyr rankfiles 103 mpiexec -report-bindings \
>  -rf rf_tyr_semicolon -np 1 hostname
> [tyr.informatik.hs-fulda.de:00079] MCW rank 0 bound to
>  socket 0[core 0[hwt 0]], socket 1[core 1[hwt 0]]: [B][B]
> tyr.informatik.hs-fulda.de
> 
> 
> 
> 
>> I get the following output, if I try the rankfile from a different machine
>> (also Solaris 10 sparc).
>> 
>> rs0 rankfiles 104 mpiexec -report-bindings -rf rf_tyr_semicolon -np 1 
>> hostname
>> --
>> All nodes which are allocated for this job are already filled.
>> --
>> rs0 rankfiles 105 
> 
> No change in this case.
> 
> rs0 rankfiles 102 mpiexec -report-bindings \
>  -rf rf_tyr_semicolon -np 1 hostname
> --
> All nodes which are allocated for this job are already filled.
> --
> 
> 
> I checked the other rankfiles as well and they are OK, so that
> problems 1) and 2) of my previous e-mail still exist.
> 
> 
> Kind regards
> 
> Siegmar
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Able to run mpirun as root, but not as a user.

2013-09-03 Thread Ian Czekala
Yes! Thank you for your help. Doing

$ ./configure --disable-pty-support --prefix=/usr/local/openmpi
$ make all
$ sudo make install

fixed the issue:

$ mpirun -np 2 /bin/pwd
/home/ian
/home/ian

Thanks a bunch,

Ian


On Tue, Sep 3, 2013 at 6:26 AM, Reuti  wrote:

> Am 03.09.2013 um 06:48 schrieb Ian Czekala:
>
> > Dear openmpi users,
> >
> > My basic problem is that I am able to run mpirun as root, but not at a
> user level. I have tried installing openmpi via several methods, but all
> seem to yield the same problem. I fear that I am missing something very
> basic and zero-order, but I can't seem to resolve my problem with the
> information in the FAQ.
> >
> > Originally, I installed the openmpi through arch linux's package
> manager, pacman. After a successful install, I tried (on my laptop)
> >
> > $mpirun -np 2 /bin/pwd
> > and I get the following output:
> >
> --
> > mpirun was unable to launch the specified application as it encountered
> an error:
> >
> > Error: pipe function call failed when setting up I/O forwarding subsystem
> > Node: leo
> >
> > while attempting to start process rank 0.
> >
> --
> > 2 total processes failed to start
> >
> > however when I run as root, I am successful
> > $ sudo mpirun -np 2 /bin/pwd
> > /home/ian
> > /home/ian
> >
> > After doing some searching on the web (and coming across this thread),
>
> There is another one:
>
> http://www.open-mpi.org/community/lists/users/2010/03/12291.php
>
> -- Reuti
>
>
> > I suspected that the issue might be with some PATH setup or user
> permissions that weren't being set correctly by the arch linux package
> manager, and so I uninstalled and resorted to installing by source.
> >
> > When trying the normal install method
> > $ ./configure --prefix=/usr/local/openmpi
> > $ make all
> > $ sudo make install
> > and then changed my .zshrc to include the correct PATH and
> LD_LIBRARY_PATH, I get the same behavior as before.
> >
> > To investigate further the possibility of a permissions issue, I
> uninstalled and reinstalled into my own home directory
> > $ ./configure --prefix=/home/ian/.builds/openmpi
> > $ make all
> > $ make install
> > and updated PATH and LD_LIBRARY_PATH correspondingly.
> >
> > However, the behavior is *exactly* as before: mpirun will run as root,
> but not at a user level. Am I missing something extremely basic here?
> Online examples to me imply that I should be able to run as a user without
> any additional configuration.
> >
> > Here is some of the info asked for by the "Getting Help" section (all
> from the local home directory install):
> >
> > [ian@leo:~]$ sudo mpirun --bynode --tag-output ompi_info -v ompi full
> --parsable
> > [1,0]:package:Open MPI ian@leo Distribution
> > [1,0]:ompi:version:full:1.6.5
> > [1,0]:ompi:version:svn:r28673
> > [1,0]:ompi:version:release_date:Jun 26, 2013
> > [1,0]:orte:version:full:1.6.5
> > [1,0]:orte:version:svn:r28673
> > [1,0]:orte:version:release_date:Jun 26, 2013
> > [1,0]:opal:version:full:1.6.5
> > [1,0]:opal:version:svn:r28673
> > [1,0]:opal:version:release_date:Jun 26, 2013
> > [1,0]:mpi-api:version:full:2.1
> > [1,0]:ident:1.6.5
> >
> > Thank you for any help or guidance you may be able to offer! Sincerely,
> >
> > Ian Czekala
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6) is working for OpenMPI-1.6.5?

2013-09-03 Thread Teranishi, Keita
Nathan,

Thanks for the help.  I can run a job using openmpi, assigning a single
process per node.  However, I have been failing to run a job using
multiple MPI ranks on a single node.  In other words, "mpiexec
--bind-to-core --npernode 16 --n 16 ./test" never works (aprun -n 16 works
fine).  Do you have any thoughts about it?

Thanks,
-
Keita Teranishi
R&D Principal Staff Member
Scalable Modeling and Analysis Systems
Sandia National Laboratories
Livermore, CA 94551




On 8/30/13 8:49 AM, "Hjelm, Nathan T"  wrote:

>Replace install_path with where you want Open MPI installed.
>
>./configure --prefix=install_path
>--with-platform=contrib/platform/lanl/cray_xe6/optimized-lustre
>make
>make install
>
>To use Open MPI just set the PATH and LD_LIBRARY_PATH:
>
>PATH=install_path/bin:$PATH
>LD_LIBRARY_PATH=install_path/lib:$LD_LIBRARY_PATH
>
>You can then use mpicc, mpicxx, mpif90, etc to compile and either mpirun
>or aprun to run. If you are running at scale I would recommend against
>using aprun for now. I also recommend you change your programming
>environment to either PrgEnv-gnu or PrgEnv-intel. The PGI compiler can be
>a PIA. It is possible to build with the Cray compiler but it takes
>patching the config.guess and changing some autoconf stuff.
>
>-Nathan
>
>Please excuse the horrible Outlook-style quoting.
>
>From: users [users-boun...@open-mpi.org] on behalf of Teranishi, Keita
>[knte...@sandia.gov]
>Sent: Thursday, August 29, 2013 8:01 PM
>To: Open MPI Users
>Subject: Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6)
>is working for OpenMPI-1.6.5?
>
>Thanks for the info.  Is it still possible to build by myself?  What is
>the procedure other than configure script?
>
>
>
>
>
>On 8/23/13 2:37 PM, "Nathan Hjelm"  wrote:
>
>>On Fri, Aug 23, 2013 at 09:14:25PM +, Teranishi, Keita wrote:
>>>Hi,
>>>I am trying to install OpenMPI 1.6.5 on Cray XE6 and very curious
>>>with the
>>>current support of PMI.  In the previous discussions, there was a
>>>comment
>>>on the version of PMI (it works with 2.1.4, but fails with 3.0).
>>>Our
>>
>>Open MPI 1.6.5 does not have support for the XE-6. Use 1.7.2 instead.
>>
>>>machine has PMI2.1.4 and PMI4.0 (default).  Which version do you
>>
>>There was a regression in PMI 3.x.x that still exists in 4.0.x that
>>causes a warning to be printed on every rank when using mpirun. We are
>>working with Cray to resolve the issue. For now use 2.1.4. See the
>>platform files in contrib/platform/lanl/cray_xe6. The platform files you
>>would want to use are debug-lustre or optimized-lustre.
>>
>>BTW, 1.7.2 is installed on Cielo and Cielito. Just run:
>>
>>module swap PrgEnv-pgi PrgEnv-gnu (PrgEnv-intel also works)
>>module unload cray-mpich2 xt-libsci
>>module load openmpi/1.7.2
>>
>>
>>-Nathan Hjelm
>>Open MPI Team, HPC-3, LANL
>>___
>>users mailing list
>>us...@open-mpi.org
>>http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>___
>users mailing list
>us...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/users
>___
>users mailing list
>us...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6) is working for OpenMPI-1.6.5?

2013-09-03 Thread Ralph Castain
How does it fail?

On Sep 3, 2013, at 1:19 PM, "Teranishi, Keita"  wrote:

> Nathan,
> 
> Thanks for the help.  I can run a job using openmpi, assigning a signle
> process per node.  However, I have been failing to run a job using
> multiple MPI ranks in a single node.  In other words, "mpiexec
> --bind-to-core --npernode 16 --n 16 ./test" never works (apron -n 16 works
> fine).  DO you have any thought about it?
> 
> Thanks,
> -
> Keita Teranishi
> R&D Principal Staff Member
> Scalable Modeling and Analysis Systems
> Sandia National Laboratories
> Livermore, CA 94551
> 
> 
> 
> 
> On 8/30/13 8:49 AM, "Hjelm, Nathan T"  wrote:
> 
>> Replace install_path to where you want Open MPI installed.
>> 
>> ./configure --prefix=install_path
>> --with-platform=contrib/platform/lanl/cray_xe6/optimized-luster
>> make
>> make install
>> 
>> To use Open MPI just set the PATH and LD_LIBRARY_PATH:
>> 
>> PATH=install_path/bin:$PATH
>> LD_LIBRARY_PATH=install_path/lib:$LD_LIBRARY_PATH
>> 
>> You can then use mpicc, mpicxx, mpif90, etc to compile and either mpirun
>> or aprun to run. If you are running at scale I would recommend against
>> using aprun for now. I also recommend you change your programming
>> environment to either PrgEnv-gnu or PrgEnv-intel. The PGI compiler can be
>> a PIA. It is possible to build with the Cray compiler but it takes
>> patching the config.guess and changing some autoconf stuff.
>> 
>> -Nathan
>> 
>> Please excuse the horrible Outlook-style quoting.
>> 
>> From: users [users-boun...@open-mpi.org] on behalf of Teranishi, Keita
>> [knte...@sandia.gov]
>> Sent: Thursday, August 29, 2013 8:01 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6)
>> is working for OpenMPI-1.6.5?
>> 
>> Thanks for the info.  Is it still possible to build by myself?  What is
>> the procedure other than configure script?
>> 
>> 
>> 
>> 
>> 
>> On 8/23/13 2:37 PM, "Nathan Hjelm"  wrote:
>> 
>>> On Fri, Aug 23, 2013 at 09:14:25PM +, Teranishi, Keita wrote:
   Hi,
   I am trying to install OpenMPI 1.6.5 on Cray XE6 and very curious
 with the
   current support of PMI.  In the previous discussions, there was a
 comment
   on the version of PMI (it works with 2.1.4, but fails with 3.0).
 Our
>>> 
>>> Open MPI 1.6.5 does not have support for the XE-6. Use 1.7.2 instead.
>>> 
   machine has PMI2.1.4 and PMI4.0 (default).  Which version do you
>>> 
>>> There was a regression in PMI 3.x.x that still exists in 4.0.x that
>>> causes a warning to be printed on every rank when using mpirun. We are
>>> working with Cray to resolve the issue. For now use 2.1.4. See the
>>> platform files in contrib/platform/lanl/cray_xe6. The platform files you
>>> would want to use are debug-lustre or optimized-lusre.
>>> 
>>> BTW, 1.7.2 is installed on Cielo and Cielito. Just run:
>>> 
>>> module swap PrgEnv-pgi PrgEnv-gnu (PrgEnv-intel also works)
>>> module unload cray-mpich2 xt-libsci
>>> module load openmpi/1.7.2
>>> 
>>> 
>>> -Nathan Hjelm
>>> Open MPI Team, HPC-3, LANL
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6) is working for OpenMPI-1.6.5?

2013-09-03 Thread Teranishi, Keita
This is what I got:

--
There are not enough slots available in the system to satisfy the 16 slots
that were requested by the application:
  /home/knteran/test-openmpi/cpi

Either request fewer slots for your application, or make more slots
available
for use.
--

Thanks,
Keita



On 9/3/13 1:26 PM, "Ralph Castain"  wrote:

>How does it fail?
>
>On Sep 3, 2013, at 1:19 PM, "Teranishi, Keita"  wrote:
>
>> Nathan,
>> 
>> Thanks for the help.  I can run a job using openmpi, assigning a signle
>> process per node.  However, I have been failing to run a job using
>> multiple MPI ranks in a single node.  In other words, "mpiexec
>> --bind-to-core --npernode 16 --n 16 ./test" never works (apron -n 16
>>works
>> fine).  DO you have any thought about it?
>> 
>> Thanks,
>> -
>> Keita Teranishi
>> R&D Principal Staff Member
>> Scalable Modeling and Analysis Systems
>> Sandia National Laboratories
>> Livermore, CA 94551
>> 
>> 
>> 
>> 
>> On 8/30/13 8:49 AM, "Hjelm, Nathan T"  wrote:
>> 
>>> Replace install_path to where you want Open MPI installed.
>>> 
>>> ./configure --prefix=install_path
>>> --with-platform=contrib/platform/lanl/cray_xe6/optimized-luster
>>> make
>>> make install
>>> 
>>> To use Open MPI just set the PATH and LD_LIBRARY_PATH:
>>> 
>>> PATH=install_path/bin:$PATH
>>> LD_LIBRARY_PATH=install_path/lib:$LD_LIBRARY_PATH
>>> 
>>> You can then use mpicc, mpicxx, mpif90, etc to compile and either
>>>mpirun
>>> or aprun to run. If you are running at scale I would recommend against
>>> using aprun for now. I also recommend you change your programming
>>> environment to either PrgEnv-gnu or PrgEnv-intel. The PGI compiler can
>>>be
>>> a PIA. It is possible to build with the Cray compiler but it takes
>>> patching the config.guess and changing some autoconf stuff.
>>> 
>>> -Nathan
>>> 
>>> Please excuse the horrible Outlook-style quoting.
>>> 
>>> From: users [users-boun...@open-mpi.org] on behalf of Teranishi, Keita
>>> [knte...@sandia.gov]
>>> Sent: Thursday, August 29, 2013 8:01 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6)
>>> is working for OpenMPI-1.6.5?
>>> 
>>> Thanks for the info.  Is it still possible to build by myself?  What is
>>> the procedure other than configure script?
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 8/23/13 2:37 PM, "Nathan Hjelm"  wrote:
>>> 
 On Fri, Aug 23, 2013 at 09:14:25PM +, Teranishi, Keita wrote:
>   Hi,
>   I am trying to install OpenMPI 1.6.5 on Cray XE6 and very curious
> with the
>   current support of PMI.  In the previous discussions, there was a
> comment
>   on the version of PMI (it works with 2.1.4, but fails with 3.0).
> Our
 
 Open MPI 1.6.5 does not have support for the XE-6. Use 1.7.2 instead.
 
>   machine has PMI2.1.4 and PMI4.0 (default).  Which version do you
 
 There was a regression in PMI 3.x.x that still exists in 4.0.x that
 causes a warning to be printed on every rank when using mpirun. We are
 working with Cray to resolve the issue. For now use 2.1.4. See the
 platform files in contrib/platform/lanl/cray_xe6. The platform files
you
 would want to use are debug-lustre or optimized-lusre.
 
 BTW, 1.7.2 is installed on Cielo and Cielito. Just run:
 
 module swap PrgEnv-pgi PrgEnv-gnu (PrgEnv-intel also works)
 module unload cray-mpich2 xt-libsci
 module load openmpi/1.7.2
 
 
 -Nathan Hjelm
 Open MPI Team, HPC-3, LANL
 ___
 users mailing list
 us...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>___
>users mailing list
>us...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6) is working for OpenMPI-1.6.5?

2013-09-03 Thread Ralph Castain
Interesting - and do you have an allocation? If so, what was it - i.e., can you 
check the allocation envar to see if you have 16 slots?


On Sep 3, 2013, at 1:38 PM, "Teranishi, Keita"  wrote:

> It is what I got.
> 
> --
> There are not enough slots available in the system to satisfy the 16 slots
> that were requested by the application:
>  /home/knteran/test-openmpi/cpi
> 
> Either request fewer slots for your application, or make more slots
> available
> for use.
> --
> 
> Thanks,
> Keita
> 
> 
> 
> On 9/3/13 1:26 PM, "Ralph Castain"  wrote:
> 
>> How does it fail?
>> 
>> On Sep 3, 2013, at 1:19 PM, "Teranishi, Keita"  wrote:
>> 
>>> Nathan,
>>> 
>>> Thanks for the help.  I can run a job using openmpi, assigning a signle
>>> process per node.  However, I have been failing to run a job using
>>> multiple MPI ranks in a single node.  In other words, "mpiexec
>>> --bind-to-core --npernode 16 --n 16 ./test" never works (apron -n 16
>>> works
>>> fine).  DO you have any thought about it?
>>> 
>>> Thanks,
>>> -
>>> Keita Teranishi
>>> R&D Principal Staff Member
>>> Scalable Modeling and Analysis Systems
>>> Sandia National Laboratories
>>> Livermore, CA 94551
>>> 
>>> 
>>> 
>>> 
>>> On 8/30/13 8:49 AM, "Hjelm, Nathan T"  wrote:
>>> 
 Replace install_path to where you want Open MPI installed.
 
 ./configure --prefix=install_path
 --with-platform=contrib/platform/lanl/cray_xe6/optimized-luster
 make
 make install
 
 To use Open MPI just set the PATH and LD_LIBRARY_PATH:
 
 PATH=install_path/bin:$PATH
 LD_LIBRARY_PATH=install_path/lib:$LD_LIBRARY_PATH
 
 You can then use mpicc, mpicxx, mpif90, etc to compile and either
 mpirun
 or aprun to run. If you are running at scale I would recommend against
 using aprun for now. I also recommend you change your programming
 environment to either PrgEnv-gnu or PrgEnv-intel. The PGI compiler can
 be
 a PIA. It is possible to build with the Cray compiler but it takes
 patching the config.guess and changing some autoconf stuff.
 
 -Nathan
 
 Please excuse the horrible Outlook-style quoting.
 
 From: users [users-boun...@open-mpi.org] on behalf of Teranishi, Keita
 [knte...@sandia.gov]
 Sent: Thursday, August 29, 2013 8:01 PM
 To: Open MPI Users
 Subject: Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6)
 is working for OpenMPI-1.6.5?
 
 Thanks for the info.  Is it still possible to build by myself?  What is
 the procedure other than configure script?
 
 
 
 
 
 On 8/23/13 2:37 PM, "Nathan Hjelm"  wrote:
 
> On Fri, Aug 23, 2013 at 09:14:25PM +, Teranishi, Keita wrote:
>>  Hi,
>>  I am trying to install OpenMPI 1.6.5 on Cray XE6 and very curious
>> with the
>>  current support of PMI.  In the previous discussions, there was a
>> comment
>>  on the version of PMI (it works with 2.1.4, but fails with 3.0).
>> Our
> 
> Open MPI 1.6.5 does not have support for the XE-6. Use 1.7.2 instead.
> 
>>  machine has PMI2.1.4 and PMI4.0 (default).  Which version do you
> 
> There was a regression in PMI 3.x.x that still exists in 4.0.x that
> causes a warning to be printed on every rank when using mpirun. We are
> working with Cray to resolve the issue. For now use 2.1.4. See the
> platform files in contrib/platform/lanl/cray_xe6. The platform files
> you
> would want to use are debug-lustre or optimized-lusre.
> 
> BTW, 1.7.2 is installed on Cielo and Cielito. Just run:
> 
> module swap PrgEnv-pgi PrgEnv-gnu (PrgEnv-intel also works)
> module unload cray-mpich2 xt-libsci
> module load openmpi/1.7.2
> 
> 
> -Nathan Hjelm
> Open MPI Team, HPC-3, LANL
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
 
 ___
 users mailing list
 us...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/users
 ___
 users mailing list
 us...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6) is working for OpenMPI-1.6.5?

2013-09-03 Thread Teranishi, Keita
Hi,

Here is what I put in my PBS script to allocate only a single node (I want
to use 16 MPI processes on a single node).

#PBS -l mppwidth=16
#PBS -l mppnppn=16
#PBS -l mppdepth=1

Here is the output from aprun (aprun -n 16 -N 16).
Process 2 of 16 is on nid00017
Process 5 of 16 is on nid00017
Process 8 of 16 is on nid00017
Process 12 of 16 is on nid00017
Process 4 of 16 is on nid00017
Process 14 of 16 is on nid00017
Process 0 of 16 is on nid00017
Process 1 of 16 is on nid00017
Process 3 of 16 is on nid00017
Process 13 of 16 is on nid00017
Process 9 of 16 is on nid00017
Process 6 of 16 is on nid00017
Process 11 of 16 is on nid00017
Process 10 of 16 is on nid00017
Process 7 of 16 is on nid00017
Process 15 of 16 is on nid00017



I am guessing that the Cray XE6 here is different from the others in
production (it is a 1-cabinet configuration for code development), and I am
afraid mpirun/mpiexec does the wrong instantiation of the aprun command. Do I have
to edit the script in contrib?


Thanks,
Keita

On 9/3/13 2:51 PM, "Ralph Castain"  wrote:

>Interesting - and do you have an allocation? If so, what was it - i.e.,
>can you check the allocation envar to see if you have 16 slots?
>
>
>On Sep 3, 2013, at 1:38 PM, "Teranishi, Keita"  wrote:
>
>> It is what I got.
>> 
>> 
>>-
>>-
>> There are not enough slots available in the system to satisfy the 16
>>slots
>> that were requested by the application:
>>  /home/knteran/test-openmpi/cpi
>> 
>> Either request fewer slots for your application, or make more slots
>> available
>> for use.
>> 
>>-
>>-
>> 
>> Thanks,
>> Keita
>> 
>> 
>> 
>> On 9/3/13 1:26 PM, "Ralph Castain"  wrote:
>> 
>>> How does it fail?
>>> 
>>> On Sep 3, 2013, at 1:19 PM, "Teranishi, Keita" 
>>>wrote:
>>> 
 Nathan,
 
 Thanks for the help.  I can run a job using openmpi, assigning a
signle
 process per node.  However, I have been failing to run a job using
 multiple MPI ranks in a single node.  In other words, "mpiexec
 --bind-to-core --npernode 16 --n 16 ./test" never works (apron -n 16
 works
 fine).  DO you have any thought about it?
 
 Thanks,
 -
 Keita Teranishi
 R&D Principal Staff Member
 Scalable Modeling and Analysis Systems
 Sandia National Laboratories
 Livermore, CA 94551
 
 
 
 
 On 8/30/13 8:49 AM, "Hjelm, Nathan T"  wrote:
 
> Replace install_path to where you want Open MPI installed.
> 
> ./configure --prefix=install_path
> --with-platform=contrib/platform/lanl/cray_xe6/optimized-luster
> make
> make install
> 
> To use Open MPI just set the PATH and LD_LIBRARY_PATH:
> 
> PATH=install_path/bin:$PATH
> LD_LIBRARY_PATH=install_path/lib:$LD_LIBRARY_PATH
> 
> You can then use mpicc, mpicxx, mpif90, etc to compile and either
> mpirun
> or aprun to run. If you are running at scale I would recommend
>against
> using aprun for now. I also recommend you change your programming
> environment to either PrgEnv-gnu or PrgEnv-intel. The PGI compiler
>can
> be
> a PIA. It is possible to build with the Cray compiler but it takes
> patching the config.guess and changing some autoconf stuff.
> 
> -Nathan
> 
> Please excuse the horrible Outlook-style quoting.
> 
> From: users [users-boun...@open-mpi.org] on behalf of Teranishi,
>Keita
> [knte...@sandia.gov]
> Sent: Thursday, August 29, 2013 8:01 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray
>XE6)
> is working for OpenMPI-1.6.5?
> 
> Thanks for the info.  Is it still possible to build by myself?  What
>is
> the procedure other than configure script?
> 
> 
> 
> 
> 
> On 8/23/13 2:37 PM, "Nathan Hjelm"  wrote:
> 
>> On Fri, Aug 23, 2013 at 09:14:25PM +, Teranishi, Keita wrote:
>>>  Hi,
>>>  I am trying to install OpenMPI 1.6.5 on Cray XE6 and very curious
>>> with the
>>>  current support of PMI.  In the previous discussions, there was a
>>> comment
>>>  on the version of PMI (it works with 2.1.4, but fails with 3.0).
>>> Our
>> 
>> Open MPI 1.6.5 does not have support for the XE-6. Use 1.7.2
>>instead.
>> 
>>>  machine has PMI2.1.4 and PMI4.0 (default).  Which version do you
>> 
>> There was a regression in PMI 3.x.x that still exists in 4.0.x that
>> causes a warning to be printed on every rank when using mpirun. We
>>are
>> working with Cray to resolve the issue. For now use 2.1.4. See the
>> platform files in contrib/platform/lanl/cray_xe6. The platform files
>> you
>> would want to use are debug-lustre or optimized-lusre.
>> 
>

Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6) is working for OpenMPI-1.6.5?

2013-09-03 Thread Nathan Hjelm
Hmm, what CLE release is your development cluster running? It is the value
after PrgEnv. Ex. on Cielito we have 4.1.40.

32) PrgEnv-gnu/4.1.40

We have not yet fully tested Open MPI on CLE 5.x.x.

-Nathan Hjelm
HPC-3, LANL

On Tue, Sep 03, 2013 at 10:33:57PM +, Teranishi, Keita wrote:
> Hi,
> 
> Here is what I put in my PBS script to allocate only single node (I want
> to use 16 MPI processes in a single node).
> 
> #PBS -l mppwidth=16
> #PBS -l mppnppn=16
> #PBS -l mppdepth=1
> 
> Here is the output from aprun (aprun -n 16 -N 16).
> Process 2 of 16 is on nid00017
> Process 5 of 16 is on nid00017
> Process 8 of 16 is on nid00017
> Process 12 of 16 is on nid00017
> Process 4 of 16 is on nid00017
> Process 14 of 16 is on nid00017
> Process 0 of 16 is on nid00017
> Process 1 of 16 is on nid00017
> Process 3 of 16 is on nid00017
> Process 13 of 16 is on nid00017
> Process 9 of 16 is on nid00017
> Process 6 of 16 is on nid00017
> Process 11 of 16 is on nid00017
> Process 10 of 16 is on nid00017
> Process 7 of 16 is on nid00017
> Process 15 of 16 is on nid00017
> 
> 
> 
> I am guessing that the CrayXE6 here is different from the others in
> production (it is 1 cabinet configuration for code development) and I am
> afraid mpirun/mpiexec does wrong instantiation of aprun command. Do I have
> to edit the script in contrib?
> 
> 
> Thanks,
> Keita
> 
> On 9/3/13 2:51 PM, "Ralph Castain"  wrote:
> 
> >Interesting - and do you have an allocation? If so, what was it - i.e.,
> >can you check the allocation envar to see if you have 16 slots?
> >
> >
> >On Sep 3, 2013, at 1:38 PM, "Teranishi, Keita"  wrote:
> >
> >> It is what I got.
> >> 
> >> 
> >>-
> >>-
> >> There are not enough slots available in the system to satisfy the 16
> >>slots
> >> that were requested by the application:
> >>  /home/knteran/test-openmpi/cpi
> >> 
> >> Either request fewer slots for your application, or make more slots
> >> available
> >> for use.
> >> 
> >>-
> >>-
> >> 
> >> Thanks,
> >> Keita
> >> 
> >> 
> >> 
> >> On 9/3/13 1:26 PM, "Ralph Castain"  wrote:
> >> 
> >>> How does it fail?
> >>> 
> >>> On Sep 3, 2013, at 1:19 PM, "Teranishi, Keita" 
> >>>wrote:
> >>> 
>  Nathan,
>  
>  Thanks for the help.  I can run a job using openmpi, assigning a
> signle
>  process per node.  However, I have been failing to run a job using
>  multiple MPI ranks in a single node.  In other words, "mpiexec
>  --bind-to-core --npernode 16 --n 16 ./test" never works (apron -n 16
>  works
>  fine).  DO you have any thought about it?
>  
>  Thanks,
>  -
>  Keita Teranishi
>  R&D Principal Staff Member
>  Scalable Modeling and Analysis Systems
>  Sandia National Laboratories
>  Livermore, CA 94551
>  
>  
>  
>  
>  On 8/30/13 8:49 AM, "Hjelm, Nathan T"  wrote:
>  
> > Replace install_path to where you want Open MPI installed.
> > 
> > ./configure --prefix=install_path
> > --with-platform=contrib/platform/lanl/cray_xe6/optimized-luster
> > make
> > make install
> > 
> > To use Open MPI just set the PATH and LD_LIBRARY_PATH:
> > 
> > PATH=install_path/bin:$PATH
> > LD_LIBRARY_PATH=install_path/lib:$LD_LIBRARY_PATH
> > 
> > You can then use mpicc, mpicxx, mpif90, etc to compile and either
> > mpirun
> > or aprun to run. If you are running at scale I would recommend
> >against
> > using aprun for now. I also recommend you change your programming
> > environment to either PrgEnv-gnu or PrgEnv-intel. The PGI compiler
> >can
> > be
> > a PIA. It is possible to build with the Cray compiler but it takes
> > patching the config.guess and changing some autoconf stuff.
> > 
> > -Nathan
> > 
> > Please excuse the horrible Outlook-style quoting.
> > 
> > From: users [users-boun...@open-mpi.org] on behalf of Teranishi,
> >Keita
> > [knte...@sandia.gov]
> > Sent: Thursday, August 29, 2013 8:01 PM
> > To: Open MPI Users
> > Subject: Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray
> >XE6)
> > is working for OpenMPI-1.6.5?
> > 
> > Thanks for the info.  Is it still possible to build by myself?  What
> >is
> > the procedure other than configure script?
> > 
> > 
> > 
> > 
> > 
> > On 8/23/13 2:37 PM, "Nathan Hjelm"  wrote:
> > 
> >> On Fri, Aug 23, 2013 at 09:14:25PM +, Teranishi, Keita wrote:
> >>>  Hi,
> >>>  I am trying to install OpenMPI 1.6.5 on Cray XE6 and very curious
> >>> with the
> >>>  current support of PMI.  In the previous discussions, there was a
> >>> comment
> >>>  on the version of PMI (it works with 2.1.4, but fails with 3.

Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6) is working for OpenMPI-1.6.5?

2013-09-03 Thread Teranishi, Keita
Nathan,

It is close to Cielo and uses the resource manager under
/opt/cray/xe-sysroot/4.1.40/usr.

Currently Loaded Modulefiles:
  1) modules/3.2.6.7                          17) csa/3.0.0-1_2.0401.37452.4.50.gem
  2) craype-network-gemini                    18) job/1.5.5-0.1_2.0401.35380.1.10.gem
  3) cray-mpich2/5.6.4                        19) xpmem/0.1-2.0401.36790.4.3.gem
  4) atp/1.6.3                                20) gni-headers/2.1-1.0401.5675.4.4.gem
  5) xe-sysroot/4.1.40                        21) dmapp/3.2.1-1.0401.5983.4.5.gem
  6) switch/1.0-1.0401.36779.2.72.gem         22) pmi/2.1.4-1..8596.8.9.gem
  7) shared-root/1.0-1.0401.37253.3.50.gem    23) ugni/4.0-1.0401.5928.9.5.gem
  8) pdsh/2.26-1.0401.37449.1.1.gem           24) udreg/2.3.2-1.0401.5929.3.3.gem
  9) nodehealth/5.0-1.0401.38460.12.18.gem    25) xt-libsci/12.0.00
 10) lbcd/2.1-1.0401.35360.1.2.gem            26) xt-totalview/8.12.0
 11) hosts/1.0-1.0401.35364.1.115.gem         27) totalview-support/1.1.4
 12) configuration/1.0-1.0401.35391.1.2.gem   28) gcc/4.7.2
 13) ccm/2.2.0-1.0401.37254.2.142             29) xt-asyncpe/5.22
 14) audit/1.0.0-1.0401.37969.2.32.gem        30) eswrap/1.0.8
 15) rca/1.0.0-2.0401.38656.2.2.gem           31) craype-mc8
 16) dvs/1.8.6_0.9.0-1.0401.1401.1.120        32) PrgEnv-gnu/4.1.40


Thanks,
Keita



On 9/3/13 3:42 PM, "Nathan Hjelm"  wrote:

>Hmm, what CLE release is your development cluster running? It is the value
>after PrgEnv. Ex. on Cielito we have 4.1.40.
>
>32) PrgEnv-gnu/4.1.40
>
>We have not yet fully tested Open MPI on CLE 5.x.x.
>
>-Nathan Hjelm
>HPC-3, LANL
>
>On Tue, Sep 03, 2013 at 10:33:57PM +, Teranishi, Keita wrote:
>> Hi,
>> 
>> Here is what I put in my PBS script to allocate only single node (I want
>> to use 16 MPI processes in a single node).
>> 
>> #PBS -l mppwidth=16
>> #PBS -l mppnppn=16
>> #PBS -l mppdepth=1
>> 
>> Here is the output from aprun (aprun -n 16 -N 16).
>> Process 2 of 16 is on nid00017
>> Process 5 of 16 is on nid00017
>> Process 8 of 16 is on nid00017
>> Process 12 of 16 is on nid00017
>> Process 4 of 16 is on nid00017
>> Process 14 of 16 is on nid00017
>> Process 0 of 16 is on nid00017
>> Process 1 of 16 is on nid00017
>> Process 3 of 16 is on nid00017
>> Process 13 of 16 is on nid00017
>> Process 9 of 16 is on nid00017
>> Process 6 of 16 is on nid00017
>> Process 11 of 16 is on nid00017
>> Process 10 of 16 is on nid00017
>> Process 7 of 16 is on nid00017
>> Process 15 of 16 is on nid00017
>> 
>> 
>> 
>> I am guessing that the CrayXE6 here is different from the others in
>> production (it is 1 cabinet configuration for code development) and I am
>> afraid mpirun/mpiexec does wrong instantiation of aprun command. Do I
>>have
>> to edit the script in contrib?
>> 
>> 
>> Thanks,
>> Keita
>> 
>> On 9/3/13 2:51 PM, "Ralph Castain"  wrote:
>> 
>> >Interesting - and do you have an allocation? If so, what was it - i.e.,
>> >can you check the allocation envar to see if you have 16 slots?
>> >
>> >
>> >On Sep 3, 2013, at 1:38 PM, "Teranishi, Keita" 
>>wrote:
>> >
>> >> It is what I got.
>> >> 
>> >> 
>> 
---
--
>> >>-
>> >> There are not enough slots available in the system to satisfy the 16
>> >>slots
>> >> that were requested by the application:
>> >>  /home/knteran/test-openmpi/cpi
>> >> 
>> >> Either request fewer slots for your application, or make more slots
>> >> available
>> >> for use.
>> >> 
>> 
---
--
>> >>-
>> >> 
>> >> Thanks,
>> >> Keita
>> >> 
>> >> 
>> >> 
>> >> On 9/3/13 1:26 PM, "Ralph Castain"  wrote:
>> >> 
>> >>> How does it fail?
>> >>> 
>> >>> On Sep 3, 2013, at 1:19 PM, "Teranishi, Keita" 
>> >>>wrote:
>> >>> 
>>  Nathan,
>>  
>>  Thanks for the help.  I can run a job using openmpi, assigning a
>> signle
>>  process per node.  However, I have been failing to run a job using
>>  multiple MPI ranks in a single node.  In other words, "mpiexec
>>  --bind-to-core --npernode 16 --n 16 ./test" never works (apron -n
>>16
>>  works
>>  fine).  DO you have any thought about it?
>>  
>>  Thanks,
>>  -
>>  Keita Teranishi
>>  R&D Principal Staff Member
>>  Scalable Modeling and Analysis Systems
>>  Sandia National Laboratories
>>  Livermore, CA 94551
>>  
>>  
>>  
>>  
>>  On 8/30/13 8:49 AM, "Hjelm, Nathan T"  wrote:
>>  
>> > Replace install_path to where you want Open MPI installed.
>> > 
>> > ./configure --prefix=install_path
>> > --with-platform=contrib/platform/lanl/cray_xe6/optimized-luster
>> > make
>> > make install
>> > 
>> > To use Open MPI just set the PATH and LD_LIBRARY_PATH:
>> > 
>> > PATH=install_path/bin:$PATH
>> > LD_LIBRARY_PATH=install_path/lib:$LD_LIBRARY_PATH
>> > 
>> > You can then use mpicc, mpicxx, mpif9

Re: [OMPI users] [EXTERNAL] Re: What version of PMI (Cray XE6) is working for OpenMPI-1.6.5?

2013-09-03 Thread Nathan Hjelm
Interesting. That should work then. I haven't tested it under batch mode 
though. Let
me try to reproduce on Cielito and see what happens.

-Nathan

On Tue, Sep 03, 2013 at 11:04:40PM +, Teranishi, Keita wrote:
> Nathan,
> 
> It is close to Cielo and use resource manager under
> /opt/cray/xe-sysroot/4.1.40/usr.
> 
> Currently Loaded Modulefiles:
>   1) modules/3.2.6.7 17)
> csa/3.0.0-1_2.0401.37452.4.50.gem
>   2) craype-network-gemini   18)
> job/1.5.5-0.1_2.0401.35380.1.10.gem
>   3) cray-mpich2/5.6.4   19)
> xpmem/0.1-2.0401.36790.4.3.gem
>   4) atp/1.6.3   20)
> gni-headers/2.1-1.0401.5675.4.4.gem
>   5) xe-sysroot/4.1.40   21)
> dmapp/3.2.1-1.0401.5983.4.5.gem
>   6) switch/1.0-1.0401.36779.2.72.gem22)
> pmi/2.1.4-1..8596.8.9.gem
>   7) shared-root/1.0-1.0401.37253.3.50.gem   23)
> ugni/4.0-1.0401.5928.9.5.gem
>   8) pdsh/2.26-1.0401.37449.1.1.gem  24)
> udreg/2.3.2-1.0401.5929.3.3.gem
>   9) nodehealth/5.0-1.0401.38460.12.18.gem   25) xt-libsci/12.0.00
>  10) lbcd/2.1-1.0401.35360.1.2.gem   26) xt-totalview/8.12.0
>  11) hosts/1.0-1.0401.35364.1.115.gem27) totalview-support/1.1.4
>  12) configuration/1.0-1.0401.35391.1.2.gem  28) gcc/4.7.2
>  13) ccm/2.2.0-1.0401.37254.2.14229) xt-asyncpe/5.22
>  14) audit/1.0.0-1.0401.37969.2.32.gem   30) eswrap/1.0.8
>  15) rca/1.0.0-2.0401.38656.2.2.gem  31) craype-mc8
>  16) dvs/1.8.6_0.9.0-1.0401.1401.1.120   32) PrgEnv-gnu/4.1.40
> 
> 
> Thanks,
> Keita
> 
> 
> 
> On 9/3/13 3:42 PM, "Nathan Hjelm"  wrote:
> 
> >Hmm, what CLE release is your development cluster running? It is the value
> >after PrgEnv. Ex. on Cielito we have 4.1.40.
> >
> >32) PrgEnv-gnu/4.1.40
> >
> >We have not yet fully tested Open MPI on CLE 5.x.x.
> >
> >-Nathan Hjelm
> >HPC-3, LANL
> >
> >On Tue, Sep 03, 2013 at 10:33:57PM +, Teranishi, Keita wrote:
> >> Hi,
> >> 
> >> Here is what I put in my PBS script to allocate only single node (I want
> >> to use 16 MPI processes in a single node).
> >> 
> >> #PBS -l mppwidth=16
> >> #PBS -l mppnppn=16
> >> #PBS -l mppdepth=1
> >> 
> >> Here is the output from aprun (aprun -n 16 -N 16).
> >> Process 2 of 16 is on nid00017
> >> Process 5 of 16 is on nid00017
> >> Process 8 of 16 is on nid00017
> >> Process 12 of 16 is on nid00017
> >> Process 4 of 16 is on nid00017
> >> Process 14 of 16 is on nid00017
> >> Process 0 of 16 is on nid00017
> >> Process 1 of 16 is on nid00017
> >> Process 3 of 16 is on nid00017
> >> Process 13 of 16 is on nid00017
> >> Process 9 of 16 is on nid00017
> >> Process 6 of 16 is on nid00017
> >> Process 11 of 16 is on nid00017
> >> Process 10 of 16 is on nid00017
> >> Process 7 of 16 is on nid00017
> >> Process 15 of 16 is on nid00017
> >> 
> >> 
> >> 
> >> I am guessing that the CrayXE6 here is different from the others in
> >> production (it is 1 cabinet configuration for code development) and I am
> >> afraid mpirun/mpiexec does wrong instantiation of aprun command. Do I
> >>have
> >> to edit the script in contrib?
> >> 
> >> 
> >> Thanks,
> >> Keita
> >> 
> >> On 9/3/13 2:51 PM, "Ralph Castain"  wrote:
> >> 
> >> >Interesting - and do you have an allocation? If so, what was it - i.e.,
> >> >can you check the allocation envar to see if you have 16 slots?
> >> >
> >> >
> >> >On Sep 3, 2013, at 1:38 PM, "Teranishi, Keita" 
> >>wrote:
> >> >
> >> >> It is what I got.
> >> >> 
> >> >> 
> >> 
> ---
> --
> >> >>-
> >> >> There are not enough slots available in the system to satisfy the 16
> >> >>slots
> >> >> that were requested by the application:
> >> >>  /home/knteran/test-openmpi/cpi
> >> >> 
> >> >> Either request fewer slots for your application, or make more slots
> >> >> available
> >> >> for use.
> >> >> 
> >> 
> ---
> --
> >> >>-
> >> >> 
> >> >> Thanks,
> >> >> Keita
> >> >> 
> >> >> 
> >> >> 
> >> >> On 9/3/13 1:26 PM, "Ralph Castain"  wrote:
> >> >> 
> >> >>> How does it fail?
> >> >>> 
> >> >>> On Sep 3, 2013, at 1:19 PM, "Teranishi, Keita" 
> >> >>>wrote:
> >> >>> 
> >>  Nathan,
> >>  
> >>  Thanks for the help.  I can run a job using openmpi, assigning a
> >> signle
> >>  process per node.  However, I have been failing to run a job using
> >>  multiple MPI ranks in a single node.  In other words, "mpiexec
> >>  --bind-to-core --npernode 16 --n 16 ./test" never works (apron -n
> >>16
> >>  works
> >>  fine).  DO you have any thought about it?
> >>  
> >>  Thanks,
> >>  -
> >>  Keita Teranishi
> >>  R&D Principal Staff Member
> >>  Scalable Modeling and Analysis Systems
> >>  Sandia National Laboratories
> >>  Livermore, CA 94551
> >>  
> >>  
> >>  
> >>  
> >> >>>