[OMPI users] SIGV at MPI_Cart_sub

2012-01-10 Thread Anas Al-Trad
Dear people,
   In my application, I get a fatal integer divide-by-zero error when
calling the MPI_Cart_sub routine. My program is as follows: I have 128 ranks,
and I make a new communicator of the first 96 ranks via MPI_Comm_create.
Then I create an 8x12 grid by calling MPI_Cart_create. After creating the
grid, calling MPI_Cart_sub gives that error.

This error also happens when I use a communicator of 24 ranks and create a
4x6 grid. Can you please help me solve this?
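
For reference, a minimal self-contained sketch of the sequence described
above; the 128/96 split and the 8x12 grid come from the report, while all
variable names, the cleanup, and the remain choice are illustrative:

#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int p = 96, ranks[96];                 /* first 96 of the 128 ranks */
    for (int i = 0; i < p; i++)
        ranks[i] = i;

    MPI_Group world_group, working_group;
    MPI_Comm working_comm = MPI_COMM_NULL, grid_comm, sub_comm;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Group_incl(world_group, p, ranks, &working_group);
    MPI_Comm_create(MPI_COMM_WORLD, working_comm != MPI_COMM_NULL ? working_group : working_group, &working_comm);

    if (working_comm != MPI_COMM_NULL) {
        int dims[2] = {8, 12}, periods[2] = {0, 0};    /* 8x12 grid */
        MPI_Cart_create(working_comm, 2, dims, periods, 0, &grid_comm);

        int remain[2] = {1, 0};                        /* keep dimension 0 */
        MPI_Cart_sub(grid_comm, remain, &sub_comm);    /* reported crash site */

        MPI_Comm_free(&sub_comm);
        MPI_Comm_free(&grid_comm);
        MPI_Comm_free(&working_comm);
    }
    MPI_Group_free(&working_group);
    MPI_Group_free(&world_group);
    MPI_Finalize();
    return 0;
}

Run with 128 ranks; the 32 ranks outside the group get MPI_COMM_NULL and
skip the topology calls.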

Regards,
Anas


Re: [OMPI users] SIGV at MPI_Cart_sub

2012-01-10 Thread Paul Kapinos

A blind guess: did you use the Intel compiler?
If so, there is/was a bug leading to a SIGSEGV _in Open MPI itself_.

http://www.open-mpi.org/community/lists/users/2012/01/18091.php

If the SIGSEGV arises not in Open MPI but in the application itself, it may
be a programming issue. In any case, a more precise answer is impossible
without seeing a code snippet and/or logs.


Best,
Paul





--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




Re: [OMPI users] SIGV at MPI_Cart_sub

2012-01-10 Thread Anas Al-Trad
Thanks Paul,
Yes, I use Intel 12.1.0, and this error is intermittent: it is not always
produced, but it occurs most of the time.
My program is large and contains many files that depend on each other, so I
don't think a snippet of the code would help. The program runs parallel
matrix multiplication algorithms. I don't know whether it is because of my
code or not, but when I run the program for small matrix sizes it completes
without error, while for large inputs it hangs or gives that SIGV.

Regards,
Anas


[OMPI users] Strange TCP latency results on Amazon EC2

2012-01-10 Thread Roberto Rey
Hi,

I'm running some tests on EC2 cluster instances with 10 Gigabit Ethernet
hardware and I'm getting strange latency results with NetPIPE and Open MPI.

If I run NetPIPE over Open MPI (NPmpi) I get a network latency of around 60
microseconds for small messages (less than 2 kB). However, when I run
NetPIPE over raw TCP (NPtcp) I always get around 100 microseconds. For
bigger messages everything seems to be OK.

I'm using the TCP BTL in Open MPI, so I can't understand why Open MPI
outperforms raw TCP for small messages (a 40 us difference). I have also run
the PingPong test from the Intel MPI Benchmarks, and its latency results for
Open MPI are very similar (60 us) to those obtained with NPmpi.

Can Open MPI outperform NetPIPE over TCP? Why? Is Open MPI doing any
optimization in the TCP BTL?

The results for Open MPI aren't that good in absolute terms, but we must
take into account the network virtualization overhead under Xen.
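
For anyone who wants to cross-check those numbers outside of NetPIPE, a
minimal MPI ping-pong sketch; the 1 kB message size and iteration count are
arbitrary choices, not taken from the measurements above:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;
    char buf[1024];                /* 1 kB, inside the small-message range */
    const int iters = 10000;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("one-way latency: %.1f us\n",
               (t1 - t0) / iters / 2.0 * 1e6);   /* half the round trip */

    MPI_Finalize();
    return 0;
}

Run with two ranks (mpirun -np 2 ./pingpong); halving the round-trip time
gives the usual one-way latency figure that NPmpi reports.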

Thanks for your reply


Re: [OMPI users] SIGV at MPI_Cart_sub

2012-01-10 Thread Ralph Castain
Have you tried the suggested fix from the email thread Paul cited? Sounds to me 
like the most likely cause of the problem, assuming it comes from inside OMPI.

Have you looked at the backtrace to see if it is indeed inside OMPI vs your 
code?

On Jan 10, 2012, at 6:13 AM, Anas Al-Trad wrote:

> 
> Thanks Paul, 
> yes I use Intel 12.1.0, and this error is intermittent, not always produced 
> but most of the times it occurs.
> My program is large and contains many files that are related to each other, I 
> don't think it will help if I take the snippet of the code. The program run 
> parallel matrix multiplication algorithms. I don't know if it is because of 
> my code or not, but I run the program for small matrices sizes and the 
> program completes until the end without error while for large inputs it will 
> hang or give that sigv.
> 
> Regards,
> Anas
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] SIGV at MPI_Cart_sub

2012-01-10 Thread Anas Al-Trad
 Hi Ralph, I changed the Intel icc module from 12.1.0 to 11.1.069, the
previous default used on the Neolith cluster. I submitted the job and am
still waiting for the result. Here is the message of the segmentation fault:

[n764:29867] *** Process received signal ***
[n764:29867] Signal: Floating point exception (8)
[n764:29867] Signal code: Integer divide-by-zero (1)
[n764:29867] Failing at address: 0x2ba640e74627
[n764:29867] [ 0] /lib64/libc.so.6 [0x2ba641e162d0]
[n764:29867] [ 1]
/software/mpi/openmpi/1.4.1/i101011/lib/libmpi.so.0(mca_topo_base_cart_coords+0x43)
[0x2ba640e74627]
[n764:29867] [ 2]
/software/mpi/openmpi/1.4.1/i101011/lib/libmpi.so.0(mca_topo_base_cart_sub+0x1d5)
[0x2ba640e74acd]
[n764:29867] [ 3]
/software/mpi/openmpi/1.4.1/i101011/lib/libmpi.so.0(MPI_Cart_sub+0x35)
[0x2ba640e472d9]
[n764:29867] [ 4]
/home/x_anaal/thesis/cimple/tst_chng_p/v5/r2/output.o(Compute_SUMMA1+0x226)
[0x4088da]
[n764:29867] [ 5]
/home/x_anaal/thesis/cimple/tst_chng_p/v5/r2/output.o(variant_run+0xb2)
[0x409058]
[n764:29867] [ 6]
/home/x_anaal/thesis/cimple/tst_chng_p/v5/r2/output.o(main+0xf90) [0x40eeba]
[n764:29867] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2ba641e03994]
[n764:29867] [ 8] /home/x_anaal/thesis/cimple/tst_chng_p/v5/r2/output.o
[0x403fd9]
[n764:29867] *** End of error message ***

When I run my application, sometimes I get this error and sometimes it gets
stuck in the middle.


Re: [OMPI users] SIGV at MPI_Cart_sub

2012-01-10 Thread Jeff Squyres
This may be a dumb question, but are you 100% sure that the input values are 
correct?
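
A minimal sketch of the kind of input check this question implies: verify
that the grid dimensions actually match the communicator size on every rank
before creating and splitting the Cartesian topology. The variable names
follow the snippet Anas posts later in the thread and are otherwise
illustrative (stdio.h is assumed for fprintf):

int csize, dims[2] = {8, 12}, periods[2] = {0, 0};
MPI_Comm_size(working_comm, &csize);
if (dims[0] == 0 || dims[1] == 0 || dims[0] * dims[1] != csize) {
    fprintf(stderr, "grid %dx%d does not match comm size %d\n",
            dims[0], dims[1], csize);
    MPI_Abort(working_comm, 1);
}
MPI_Cart_create(working_comm, 2, dims, periods, 0, &grid_comm);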



-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Problem launching application on windows

2012-01-10 Thread Shiqing Fan

Hi Alex,

Have you solved the problem?

Another user spotted the same problem, but under Cygwin. Did you also see
the problem under Cygwin, or in a normal Windows command prompt?
Actually, there shouldn't be anything wrong with the sockets in Open MPI
that could cause such errors anymore, but of course they won't work
correctly under Cygwin in some cases.



Regards,
Shiqing

On 2011-10-28 1:33 PM, Alex van 't Veer wrote:


Hi Shiqing,

Unfortunately that did not solve the problem.

Can you tell me something more about how the sockets work and how they 
could get corrupted? Maybe I can figure out what is going wrong.


Thanks



From: Shiqing Fan [mailto:f...@hlrs.de]
Sent: Friday, October 28, 2011 12:16 PM
To: Open MPI Users
Cc: Alex van 't Veer
Subject: Re: [OMPI users] Problem launching application on windows

Hi,

This doesn't look normal; this error tends to be caused by improper
sockets. I don't have any clue at the moment, as I can't reproduce it.

Could you try to reinstall Open MPI? And make sure there is no other
installation on your system. If that still doesn't work, try using
Open MPI 1.5.3. Please let me know whether these work for you or not.


Regards,
Shiqing

On 2011-10-27 11:35 AM, Alex van 't Veer wrote:

Hi,

I've installed the Open MPI 1.5.4-1 64-bit binaries on Windows 7. When I
run mpirun.exe without any options I get the help text and everything
seems to work fine, but when I try to actually run an application, I get
the following error:

..\..\..\openmpi-1.5.4\opal\event\event.c: ompi_evesel->dispatch() failed.

I get the error when running any application; to rule out my own
application I tried the hello world example and it returns the same
error. (The command I used is mpirun.exe helloworld.exe)

Searching for the error in the list and looking at event.c didn't get
me much further. Can anyone point me in the right direction for
solving this problem?


Thanks










--
---
Shiqing Fan
High Performance Computing Center Stuttgart (HLRS)
Tel: ++49(0)711-685-87234  Nobelstrasse 19
Fax: ++49(0)711-685-65832  70569 Stuttgart
http://www.hlrs.de/organization/people/shiqing-fan/
email: f...@hlrs.de



Re: [OMPI users] SIGV at MPI_Cart_sub

2012-01-10 Thread Anas Al-Trad
It is a good question; I asked it myself at first, but then decided it
should be correct. Anyway, I want to confirm that.
Here is the code snippet of the program:
...
int ranks[size];
for (i = 0; i < size; ++i)
{
    ranks[i] = i;
}
...

for (p = 8; p <= size; p += 4)
{
    MPI_Barrier(MPI_COMM_WORLD);
    if (!grid_init(p, 1)) continue;
    if ((p >= m) || (p >= k) || (p >= n))
        break;

    MPI_Group_incl(world_group, p, ranks, &working_group);
    MPI_Comm_create(MPI_COMM_WORLD, working_group, &working_comm);

    if (working_comm != MPI_COMM_NULL)
    {
        ...
        variant_run(&variant5, C, m, k, n, my_rank, p, working_comm);
        ...
        MPI_Group_free(&working_group);
        MPI_Comm_free(&working_comm);
    }
}

Inside variant_run, it calls this function where the error is:

void Compute_SUMMA1(Matrix* A, Matrix* B, Matrix *C, size_t M, size_t K,
                    size_t N, size_t my_rank, size_t size, MPI_Comm comm)
{
    C->block_matrix = gsl_matrix_calloc(A->block_matrix->size1,
                                        B->block_matrix->size2);
    C->distribution_type = TwoD_Block;

    MPI_Comm grid_comm;
    int dim[2], period[2], reorder = 0, ndims = 2;
    int coord[2], id;

    dim[0] = global.PR; dim[1] = global.PC;
    period[0] = 0; period[1] = 0;

    int ss, rr;
    MPI_Group comm_group;
    MPI_Comm_group(comm, &comm_group);
    MPI_Group_size(comm_group, &ss);
    MPI_Group_rank(comm_group, &rr);
    if (ss == 6)
    {
        //printf("my_rank in comm %d  my_rank in world_comm %d\n", rr, my_rank);
        //printf("comm size %d  my_rank in comm %d  my_rank in world_comm %d\n",
        //       ss, rr, my_rank);
        //printf("SUMMA ... PR %d  PC %d\n", global.PR, global.PC);
    }
    //MPI_Barrier(comm);
    //if (my_rank == 0)
    //    printf("my_rank %d  ndims %d  dim[0] %d  dim[1] %d  period[0] %d "
    //           "period[1] %d  reorder %d\n",
    //           my_rank, ndims, dim[0], dim[1], period[0], period[1], reorder);
    //if (comm == MPI_COMM_NULL)
    //    printf("my_rank %d  comm is empty\n", my_rank);

    MPI_Cart_create(comm, ndims, dim, period, reorder, &grid_comm);

    MPI_Comm Acomm, Bcomm;

    // create column subgrids
    int remain[2]; //, mdims, dims[2], row_coords[2];
    remain[0] = 1;
    remain[1] = 0;
    MPI_Cart_sub(grid_comm, remain, &Bcomm);

    remain[0] = 0;
    remain[1] = 1;
    MPI_Cart_sub(grid_comm, remain, &Acomm);
    ...
}


As you can see, all ranks call grid_init, a global function that computes
the grid dims: for 24 ranks it produces 4x6, for 96 it produces 8x12, and it
stores the result in a global structure as PR and PC. It is executed by all
processes; I checked rank 0 and some other processes and the result was
correct, so I assume it is correct for all the other processes as well.

So grid_comm, which is the input to MPI_Cart_sub, should be correct. The
ranks in working_comm and in MPI_COMM_WORLD should be the same, and they
are, given how the ranks array is filled at the beginning of this code
snippet.
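
One way to confirm that assumption on every rank, rather than only the few
that were checked: query the topology actually attached to grid_comm just
before the MPI_Cart_sub call. A hedged sketch (rr is the group rank from the
snippet above; a zero in either dim would be consistent with an integer
divide-by-zero inside mca_topo_base_cart_coords):

int qdims[2], qperiods[2], qcoords[2];
MPI_Cart_get(grid_comm, 2, qdims, qperiods, qcoords);
printf("rank %d: grid %d x %d, coords (%d, %d)\n",
       rr, qdims[0], qdims[1], qcoords[0], qcoords[1]);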




Re: [OMPI users] SIGV at MPI_Cart_sub

2012-01-10 Thread Anas Al-Trad
Anyway, after compiling my code with icc/11.1.069, the job runs without
getting stuck or hitting that SIGV, both of which occurred before with the
icc/12.1.0 module.

I should also point out that when I was using icc/12.1.0 I was getting
strange outputs or hangs, and I worked around them by renaming parameters
inside the function. For example, if I have a function like this:

time(..., size_t *P, ...) { }

and call it like this:

time(.., p, ..);

then I have to stop using *P directly inside the time function, as follows:

time(..., size_t *P, ...)
{
    int bestP = *P; // and maybe copy again, as with the later bug I solved
    int bP = bestP;
    // then start using bP :)
    ...
}

Thanks guys for the help. I guess the problem is solved by compiling with
the old compiler.
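
A side note, purely speculative since the real function name may have been
elided in the message above: if the function is literally named time, it
collides with the C standard library's time() from <time.h>, which is
itself a classic source of strange behavior. A hypothetical illustration:

/* Hypothetical, not from the original code: a project-local function named
 * "time" silently takes over the libc symbol at link time, so any other
 * code (or library) calling time(NULL) can end up here instead.
 * Including <time.h> would instead turn this into a compile-time
 * "conflicting types for 'time'" error. */
double time(double *P) { return *P; }

int main(void) { return 0; }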


[OMPI users] OMPI C++ Bindings problems

2012-01-10 Thread John Doe
I'm trying to compile some code that uses the Chombo mesh package, which
uses Open MPI's C++ bindings, but I keep getting errors like this:
AMRLevelX.o: In function `Intracomm':
/opt/ompi/gnu/1.4.4/include/openmpi/ompi/mpi/cxx/intracomm.h:25:
undefined reference to `MPI::Comm::Comm()'
AMRLevelX.o: In function `Intracomm':
/opt/ompi/gnu/1.4.4/include/openmpi/ompi/mpi/cxx/intracomm_inln.h:23:
undefined reference to `MPI::Comm::Comm()'
AMRLevelX.o: In function `MPI::Op::Init(void (*)(void const*, void*,
int, MPI::Datatype const&), bool)':
/opt/ompi/gnu/1.4.4/include/openmpi/ompi/mpi/cxx/op_inln.h:122:
undefined reference to `ompi_mpi_cxx_op_intercept'
AMRLevelX.o:(.rodata._ZTVN3MPI3WinE[vtable for MPI::Win]+0x48):
undefined reference to `MPI::Win::Free()'
AMRLevelX.o:(.rodata._ZTVN3MPI8DatatypeE[vtable for
MPI::Datatype]+0x78): undefined reference to `MPI::Datatype::Free()'
collect2: ld returned 1 exit status


which looks like a problem with some OMPI C++ symbols. I have the path to
the library file libmpi_cxx.so in my LD_LIBRARY_PATH, and I compiled Open
MPI with C++ and shared-library support. Am I missing something?

Thanks


Re: [OMPI users] OMPI C++ Bindings problems

2012-01-10 Thread Ralph Castain
Did you use OMPI's C++ wrapper compiler to build your code? It looks to me
like you are missing the required compile and link flags, which is what the
wrapper compiler would provide.





[OMPI users] OpenMPI 1.5.4 remote send hang on Windows 2008R2

2012-01-10 Thread Randy Abernethy
Hello,


I have run into an issue that appears to be related to sending messages to
multiple processes on a single remote host prior to the remote processes
sending messages to the origin. I have cooked the issue down to the
following:


*Test Environment of 3 Identical Hosts:*

· Intel i7-2600K, 12 GB RAM, Intel GB Ethernet, D-Link switch

· Windows 2008 R2 x64 with all current updates

· OMPI (all three hosts report the same ompi_info and were installed with
the same binary)
http://www.open-mpi.org/software/ompi/v1.5/downloads/OpenMPI_v1.5.4-1_win64.exe

C:\GDX>ompi_info -v ompi full --parsable
package:Open MPI hpcfan@VISCLUSTER26 Distribution
ompi:version:full:1.5.4
ompi:version:svn:r25060
ompi:version:release_date:Aug 18, 2011
orte:version:full:1.5.4
orte:version:svn:r25060
orte:version:release_date:Aug 18, 2011
opal:version:full:1.5.4
opal:version:svn:r25060
opal:version:release_date:Aug 18, 2011
ident:1.5.4



*Test Program:*

#include <stdio.h>
#define OMPI_IMPORTS
#include "C:\Program Files (x86)\OpenMPI_v1.5.4-x64\include\mpi.h"

int main(int argc, char *argv[])
{
    int rank, size, i, msg;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Process %i of %i initialized\n", rank, size);

    if (0 == rank) {
        /* rank 0 sends to every other rank, then collects the replies */
        for (i = 1; i < size; i++) {
            printf("Process %i sending %i to %i\n", rank, i, i);
            MPI_Send(&rank, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
        }
        for (i = 1; i < size; i++) {
            MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Process %i received %i\n", rank, msg);
        }
    }
    else {
        /* every other rank receives one message, then replies to rank 0 */
        MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process %i received %i\n", rank, msg);
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        printf("Process %i sent %i to %i\n", rank, rank, 0);
    }

    printf("Process %i exiting\n", rank);
    MPI_Finalize();
    return 0;
}



*Test Cases:*

· X procs on the originating node: working

· X procs on the originating node and one proc on one or more remote
nodes: working

· X procs on the originating node and more than one proc on any remote
node: fails

A test with two procs on the origin and one proc on each of two remote nodes
runs; however, the same test with the two remote procs on the same machine
fails on the second remote send. Here are some test runs (the ^C indicates
a hang).

C:\GDX>mpirun -v -display-map -hostfile mpihosts -np 2 c:\gdx\distmsg.exe

    JOB MAP   

 Data for node: Yap     Num procs: 2
        Process OMPI jobid: [42094,1] Process rank: 0
        Process OMPI jobid: [42094,1] Process rank: 1

 =

Process 0 of 2 initialized
Process 1 of 2 initialized
Process 0 sending 1 to 1
Process 1 received 0
Process 1 sent 1 to 0
Process 1 exiting
Process 0 received 1
Process 0 exiting

C:\GDX>mpirun -v -display-map -hostfile mpihosts -np 3 c:\gdx\distmsg.exe

    JOB MAP   

 Data for node: Yap     Num procs: 2
        Process OMPI jobid: [42014,1] Process rank: 0
        Process OMPI jobid: [42014,1] Process rank: 1

 Data for node: chuuk   Num procs: 1
        Process OMPI jobid: [42014,1] Process rank: 2

 =

connecting to chuuk
username:administrator
password:
Save Credential?(Y/N) n
Process 0 of 3 initialized
Process 1 of 3 initialized
Process 0 sending 1 to 1
Process 0 sending 2 to 2
Process 1 received 0
Process 1 sent 1 to 0
Process 1 exiting
Process 0 received 1
Process 0 received 2
Process 0 exiting

C:\GDX>mpirun -v -display-map -hostfile mpihosts -np 4 c:\gdx\distmsg.exe

    JOB MAP   

 Data for node: Yap     Num procs: 2
        Process OMPI jobid: [43894,1] Process rank: 0
        Process OMPI jobid: [43894,1] Process rank: 1

 Data for node: chuuk   Num procs: 2
        Process OMPI jobid: [43894,1] Process rank: 2
        Process OMPI jobid: [43894,1] Process rank: 3

 =

connecting to chuuk
username:administrator
password:
Save Credential?(Y/N) n
Process 0 of 4 initialized
Process 1 of 4 initialized
Process 0 sending 1 to 1
Process 0 sending 2 to 2
Process 1 received 0
Process 1 sent 1 to 0
Process 1 exiting
Process 0 sending 3 to 3
^C

C:\GDX>mpirun -v -display-map -hostfile mpihosts -np 4 c:\gdx\distmsg.exe

    JOB MAP   

 Data for node: Yap     Num procs: 2
        Process OMPI jobid: [43310,1] Process rank: 0
        Process OMPI jobid: [43310,1] Process rank: 

[OMPI users] Passwordless ssh

2012-01-10 Thread Shaandar Nyamtulga

Hi,
I built a Beowulf cluster using Open MPI following this link:
http://techtinkering.com/2009/12/02/setting-up-a-beowulf-cluster-using-open-mpi-on-linux/
I can ssh to my slave nodes without the slave mpiuser's password before
mounting my slaves. But once I mount my slaves and do ssh, the slaves ask
for their passwords again.
The .ssh directories and authorized_keys files on the master and slaves have
permissions 700 and 600 respectively, and are owned only by mpiuser (set via
chown). The RSA key has no passphrase.

Please help me with this matter.

Re: [OMPI users] Passwordless ssh

2012-01-10 Thread Ralph Castain
You might want to ask that on the Beowulf mailing lists - I suspect it has 
something to do with the mount procedure, but honestly have no real idea how to 
resolve it.
