[OMPI users] Spawn problem

2008-03-31 Thread Joao Vicente Lima
Hi,
sorry to bring this up again ... but I hope to use spawn in Open MPI someday :-D

Calling spawn this way works fine:
MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

but if I put the call inside a for loop, I get a problem:
for (i = 0; i < 2; i++)
{
  MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1,
                  MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm[i], MPI_ERRCODES_IGNORE);
}

and the error is:
spawning ...
child!
child!
[localhost:03892] *** Process received signal ***
[localhost:03892] Signal: Segmentation fault (11)
[localhost:03892] Signal code: Address not mapped (1)
[localhost:03892] Failing at address: 0xc8
[localhost:03892] [ 0] /lib/libpthread.so.0 [0x2ac71ca8bed0]
[localhost:03892] [ 1]
/usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_dpm_base_dyn_finalize+0xa3)
[0x2ac71ba7448c]
[localhost:03892] [ 2] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2ac71b9decdf]
[localhost:03892] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2ac71ba04765]
[localhost:03892] [ 4]
/usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Finalize+0x71)
[0x2ac71ba365c9]
[localhost:03892] [ 5] ./spawn1(main+0xaa) [0x400ac2]
[localhost:03892] [ 6] /lib/libc.so.6(__libc_start_main+0xf4) [0x2ac71ccb7b74]
[localhost:03892] [ 7] ./spawn1 [0x400989]
[localhost:03892] *** End of error message ***
--
mpirun noticed that process rank 0 with PID 3892 on node localhost
exited on signal 11 (Segmentation fault).
--

The attachments contain the ompi_info output, config.log and the program.
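Since the attachments are not inlined in the archive, here is a minimal sketch of
what the attached spawn1.c presumably looks like (the details are assumptions,
reconstructed only from the snippets above -- the real attachment may differ):

/* hypothetical reconstruction of spawn1.c */
#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
  MPI_Comm parent, intercomm[2];
  int i;

  MPI_Init (&argc, &argv);
  MPI_Comm_get_parent (&parent);

  if (parent == MPI_COMM_NULL) {   /* parent: spawn one child per iteration */
    printf ("spawning ...\n");
    for (i = 0; i < 2; i++)
      MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                      MPI_COMM_SELF, &intercomm[i], MPI_ERRCODES_IGNORE);
  } else {                         /* child: just report and exit */
    printf ("child!\n");
  }

  MPI_Finalize ();                 /* the parent segfaults here */
  return 0;
}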

Thanks for taking a look,
Joao.


config.log.gz
Description: GNU Zip compressed data


ompi_info.txt.gz
Description: GNU Zip compressed data


spawn1.c.gz
Description: GNU Zip compressed data


Re: [OMPI users] Spawn problem

2008-03-31 Thread Terry Frankcombe
My C++ is a little rusty.  Is that returned intercommunicator going
where you think it is?  If you unroll the loop, does the same badness
happen?
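That is, something along these lines (a sketch of the unrolled version, assuming
the same arguments as in your post):

MPI_Comm ic0, ic1;   /* separate variables, no array indexing involved */
MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                MPI_COMM_SELF, &ic0, MPI_ERRCODES_IGNORE);
MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                MPI_COMM_SELF, &ic1, MPI_ERRCODES_IGNORE);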


On Mon, 2008-03-31 at 02:41 -0300, Joao Vicente Lima wrote:
> Hi,
> sorry bring this again ... but i hope use spawn in ompi someday :-D
> 
> The execution of spawn in this way works fine:
> MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
> MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
> 
> but if this code go to a for I get a problem :
> for (i= 0; i < 2; i++)
> {
>   MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1,
>   MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm[i], MPI_ERRCODES_IGNORE);
> }
> 
> and the error is:
> spawning ...
> child!
> child!
> [localhost:03892] *** Process received signal ***
> [localhost:03892] Signal: Segmentation fault (11)
> [localhost:03892] Signal code: Address not mapped (1)
> [localhost:03892] Failing at address: 0xc8
> [localhost:03892] [ 0] /lib/libpthread.so.0 [0x2ac71ca8bed0]
> [localhost:03892] [ 1]
> /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_dpm_base_dyn_finalize+0xa3)
> [0x2ac71ba7448c]
> [localhost:03892] [ 2] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 
> [0x2ac71b9decdf]
> [localhost:03892] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 
> [0x2ac71ba04765]
> [localhost:03892] [ 4]
> /usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Finalize+0x71)
> [0x2ac71ba365c9]
> [localhost:03892] [ 5] ./spawn1(main+0xaa) [0x400ac2]
> [localhost:03892] [ 6] /lib/libc.so.6(__libc_start_main+0xf4) [0x2ac71ccb7b74]
> [localhost:03892] [ 7] ./spawn1 [0x400989]
> [localhost:03892] *** End of error message ***
> --
> mpirun noticed that process rank 0 with PID 3892 on node localhost
> exited on signal 11 (Segmentation fault).
> --
> 
> the attachments contain the ompi_info, config.log and program.
> 
> thanks for some check,
> Joao.
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Spawn problem

2008-03-31 Thread Matt Hughes
On 30/03/2008, Joao Vicente Lima  wrote:
> Hi,
>  sorry bring this again ... but i hope use spawn in ompi someday :-D

I believe it's crashing in MPI_Finalize because you have not closed
all communication paths between the parent and the child processes.
For the parent process, try calling MPI_Comm_free or
MPI_Comm_disconnect on each intercomm in your intercomm array before
calling finalize.  On the child, call free or disconnect on the parent
intercomm before calling finalize.
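A hedged sketch of that cleanup, relative to the loop posted above (the variable
names are illustrative, not taken from the attached program):

/* parent: close every spawned intercommunicator before finalizing */
for (i = 0; i < 2; i++)
  MPI_Comm_disconnect (&intercomm[i]);   /* or MPI_Comm_free (&intercomm[i]) */
MPI_Finalize ();

/* child: fetch the parent intercomm and close it before finalizing */
MPI_Comm parent;
MPI_Comm_get_parent (&parent);
if (parent != MPI_COMM_NULL)
  MPI_Comm_disconnect (&parent);
MPI_Finalize ();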

Out of curiosity, why a loop of spawns?  Why not increase the value of
the maxprocs argument?  Or, if you need to spawn different executables
or use different arguments for each instance, why not
MPI_Comm_spawn_multiple?
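For example, a hedged sketch of the spawn_multiple variant (the executable names
and counts are made up purely for illustration):

/* one call spawns both executables; a single intercomm spans all children */
char    *cmds[2]     = { "./worker_a", "./worker_b" };
int      maxprocs[2] = { 1, 1 };
MPI_Info infos[2]    = { MPI_INFO_NULL, MPI_INFO_NULL };
MPI_Comm children;

MPI_Comm_spawn_multiple (2, cmds, MPI_ARGVS_NULL, maxprocs, infos,
                         0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);
/* ... talk to the children ... */
MPI_Comm_disconnect (&children);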

mch



>
>  The execution of spawn in this way works fine:
>  MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
>  MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
>
>  but if this code go to a for I get a problem :
>  for (i= 0; i < 2; i++)
>  {
>   MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1,
>   MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm[i], MPI_ERRCODES_IGNORE);
>  }
>
>  and the error is:
>  spawning ...
>  child!
>  child!
>  [localhost:03892] *** Process received signal ***
>  [localhost:03892] Signal: Segmentation fault (11)
>  [localhost:03892] Signal code: Address not mapped (1)
>  [localhost:03892] Failing at address: 0xc8
>  [localhost:03892] [ 0] /lib/libpthread.so.0 [0x2ac71ca8bed0]
>  [localhost:03892] [ 1]
>  /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_dpm_base_dyn_finalize+0xa3)
>  [0x2ac71ba7448c]
>  [localhost:03892] [ 2] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 
> [0x2ac71b9decdf]
>  [localhost:03892] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 
> [0x2ac71ba04765]
>  [localhost:03892] [ 4]
>  /usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Finalize+0x71)
>  [0x2ac71ba365c9]
>  [localhost:03892] [ 5] ./spawn1(main+0xaa) [0x400ac2]
>  [localhost:03892] [ 6] /lib/libc.so.6(__libc_start_main+0xf4) 
> [0x2ac71ccb7b74]
>  [localhost:03892] [ 7] ./spawn1 [0x400989]
>  [localhost:03892] *** End of error message ***
>  --
>  mpirun noticed that process rank 0 with PID 3892 on node localhost
>  exited on signal 11 (Segmentation fault).
>  --
>
>  the attachments contain the ompi_info, config.log and program.
>
>  thanks for some check,
>
> Joao.
>
> ___
>  users mailing list
>  us...@open-mpi.org
>  http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>


Re: [OMPI users] RPM build errors when creating multiple rpms

2008-03-31 Thread Jeff Squyres

On Mar 26, 2008, at 10:05 AM, Ashley Pittman wrote:

>> The community Open MPI project distributes SRPMs which, when built,
>> do not install into /opt by default -- you have to request it
>> specifically.
>
> Out of interest, how does Open MPI handle the MPIR_dll_name symbol in
> the library? It's supposed to contain the location of the debugger
> library and therefore does not play well with relocating RPMs, or with
> libraries installed somewhere other than where they were built.


Today, it does not -- the location has to be compile-time initialized.
So if you move the library+plugin somewhere else, the Etnus scheme to
find the DLL currently cannot handle it.
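For reference, the MPIR interface exposes that location as a global string
compiled into the MPI library, roughly like the sketch below (the path shown
here is made up for illustration):

/* compile-time initialized path to the message-queue debug DLL */
char MPIR_dll_name[] = "/usr/local/mpi/ompi-svn/lib/openmpi/libompi_dbg_msgq.so";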


We have proposed a new scheme to Etnus and Allinea that allows a bit
more flexibility to find the DLLs at run-time; both have agreed to the
idea in principle.  We will include this support in Open MPI v1.3; I
don't know when/if the debuggers will support it.  I believe that the
ball is currently in my court; Etnus asked me some questions to which
I have not yet replied... doh...


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Spawn problem

2008-03-31 Thread Joao Vicente Lima
Indeed, MPI_Finalize was crashing, and calling MPI_Comm_{free,disconnect} works!
I don't know whether the free/disconnect must come before MPI_Finalize
in this case (spawned processes) -- any suggestions?

I use loops for spawning:
- first, for testing :)
- and second, because some MPI applications don't know in advance
how many children they will need to complete their work (see the sketch below).
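A hedged sketch of that pattern, assuming the number of children is only known
at run time and that a separate "./worker" child program exists which
disconnects from its parent before finalizing (all names and the stopping
condition are made up):

#include <mpi.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
  MPI_Comm *kids = NULL;
  int nkids = 0, nwanted, i;

  MPI_Init (&argc, &argv);
  nwanted = (argc > 1) ? atoi (argv[1]) : 2;   /* decided at run time */

  while (nkids < nwanted) {
    kids = realloc (kids, (nkids + 1) * sizeof (MPI_Comm));
    MPI_Comm_spawn ("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                    MPI_COMM_SELF, &kids[nkids], MPI_ERRCODES_IGNORE);
    nkids++;
  }

  for (i = 0; i < nkids; i++)
    MPI_Comm_disconnect (&kids[i]);            /* close every path first */
  free (kids);

  MPI_Finalize ();
  return 0;
}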

That spawn works is great ... I will run some other tests.

thanks,
Joao

On Mon, Mar 31, 2008 at 3:03 AM, Matt Hughes
 wrote:
> On 30/03/2008, Joao Vicente Lima  wrote:
>  > Hi,
>  >  sorry bring this again ... but i hope use spawn in ompi someday :-D
>
>  I believe it's crashing in MPI_Finalize because you have not closed
>  all communication paths between the parent and the child processes.
>  For the parent process, try calling MPI_Comm_free or
>  MPI_Comm_disconnect on each intercomm in your intercomm array before
>  calling finalize.  On the child, call free or disconnect on the parent
>  intercomm before calling finalize.
>
>  Out of curiosity, why a loop of spawns?  Why not increase the value of
>  the maxprocs argument, or if you need to spawn different executables,
>  or use different arguments for each instance, why not
>  MPI_Comm_spawn_multiple?
>
>  mch
>
>
>
>
>
>  >
>  >  The execution of spawn in this way works fine:
>  >  MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
>  >  MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
>  >
>  >  but if this code go to a for I get a problem :
>  >  for (i= 0; i < 2; i++)
>  >  {
>  >   MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1,
>  >   MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm[i], MPI_ERRCODES_IGNORE);
>  >  }
>  >
>  >  and the error is:
>  >  spawning ...
>  >  child!
>  >  child!
>  >  [localhost:03892] *** Process received signal ***
>  >  [localhost:03892] Signal: Segmentation fault (11)
>  >  [localhost:03892] Signal code: Address not mapped (1)
>  >  [localhost:03892] Failing at address: 0xc8
>  >  [localhost:03892] [ 0] /lib/libpthread.so.0 [0x2ac71ca8bed0]
>  >  [localhost:03892] [ 1]
>  >  /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_dpm_base_dyn_finalize+0xa3)
>  >  [0x2ac71ba7448c]
>  >  [localhost:03892] [ 2] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 
> [0x2ac71b9decdf]
>  >  [localhost:03892] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 
> [0x2ac71ba04765]
>  >  [localhost:03892] [ 4]
>  >  /usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Finalize+0x71)
>  >  [0x2ac71ba365c9]
>  >  [localhost:03892] [ 5] ./spawn1(main+0xaa) [0x400ac2]
>  >  [localhost:03892] [ 6] /lib/libc.so.6(__libc_start_main+0xf4) 
> [0x2ac71ccb7b74]
>  >  [localhost:03892] [ 7] ./spawn1 [0x400989]
>  >  [localhost:03892] *** End of error message ***
>  >  --
>  >  mpirun noticed that process rank 0 with PID 3892 on node localhost
>  >  exited on signal 11 (Segmentation fault).
>  >  --
>  >
>  >  the attachments contain the ompi_info, config.log and program.
>  >
>  >  thanks for some check,
>  >
>  > Joao.
>  >
>
>
> > ___
>  >  users mailing list
>  >  us...@open-mpi.org
>  >  http://www.open-mpi.org/mailman/listinfo.cgi/users
>  >
>  >
>  ___
>  users mailing list
>  us...@open-mpi.org
>  http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] Spawn problem

2008-03-31 Thread Joao Vicente Lima
Hi again,
when I call MPI_Init_thread in the same program, I get the error below.
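Presumably the only change is the init call, something like this (the requested
thread level is an assumption, since the attachment isn't shown here):

int provided;
MPI_Init_thread (&argc, &argv, MPI_THREAD_MULTIPLE, &provided);   /* instead of MPI_Init */

With that change the run aborts like this: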

spawning ...
opal_mutex_lock(): Resource deadlock avoided
[localhost:07566] *** Process received signal ***
[localhost:07566] Signal: Aborted (6)
[localhost:07566] Signal code:  (-6)
[localhost:07566] [ 0] /lib/libpthread.so.0 [0x2abe5630ded0]
[localhost:07566] [ 1] /lib/libc.so.6(gsignal+0x35) [0x2abe5654c3c5]
[localhost:07566] [ 2] /lib/libc.so.6(abort+0x10e) [0x2abe5654d73e]
[localhost:07566] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe5528063b]
[localhost:07566] [ 4] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55280559]
[localhost:07566] [ 5] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe552805e8]
[localhost:07566] [ 6] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55280fff]
[localhost:07566] [ 7] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55280f3d]
[localhost:07566] [ 8] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55281f59]
[localhost:07566] [ 9] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_proc_unpack+0x204) [0x2abe552823cd]
[localhost:07566] [10] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2abe58efb5f7]
[localhost:07566] [11] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(MPI_Comm_spawn+0x465) [0x2abe552b55cd]
[localhost:07566] [12] ./spawn1(main+0x9d) [0x400b05]
[localhost:07566] [13] /lib/libc.so.6(__libc_start_main+0xf4) [0x2abe56539b74]
[localhost:07566] [14] ./spawn1 [0x4009d9]
[localhost:07566] *** End of error message ***
opal_mutex_lock(): Resource deadlock avoided
[localhost:07567] *** Process received signal ***
[localhost:07567] Signal: Aborted (6)
[localhost:07567] Signal code:  (-6)
[localhost:07567] [ 0] /lib/libpthread.so.0 [0x2b48610f9ed0]
[localhost:07567] [ 1] /lib/libc.so.6(gsignal+0x35) [0x2b48613383c5]
[localhost:07567] [ 2] /lib/libc.so.6(abort+0x10e) [0x2b486133973e]
[localhost:07567] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006c63b]
[localhost:07567] [ 4] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006c559]
[localhost:07567] [ 5] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006c5e8]
[localhost:07567] [ 6] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006cfff]
[localhost:07567] [ 7] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006cf3d]
[localhost:07567] [ 8] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006df59]
[localhost:07567] [ 9] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_proc_unpack+0x204) [0x2b486006e3cd]
[localhost:07567] [10] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b4863ce75f7]
[localhost:07567] [11] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b4863ce9c2b]
[localhost:07567] [12] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b48600720d7]
[localhost:07567] [13] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Init_thread+0x166) [0x2b48600ae4f2]
[localhost:07567] [14] ./spawn1(main+0x2c) [0x400a94]
[localhost:07567] [15] /lib/libc.so.6(__libc_start_main+0xf4) [0x2b4861325b74]
[localhost:07567] [16] ./spawn1 [0x4009d9]
[localhost:07567] *** End of error message ***
--
mpirun noticed that process rank 0 with PID 7566 on node localhost exited on signal 6 (Aborted).
--

Thanks for taking a look,
Joao.

On Mon, Mar 31, 2008 at 11:49 AM, Joao Vicente Lima
 wrote:
> Really MPI_Finalize is crashing and calling MPI_Comm_{free,disconnect} works!
>  I don't know if the free/disconnect must appear before a MPI_Finalize
>  for this case (spawn processes)   some suggest ?
>
>  I use loops in spawn:
>  -  first for testing :)
>  - and second because certain MPI applications don't know in advance
>  the number of childrens needed to complete his work.
>
>  The spawn works is creat ... I will made other tests.
>
>  thanks,
>  Joao
>
>
>
>  On Mon, Mar 31, 2008 at 3:03 AM, Matt Hughes
>   wrote:
>  > On 30/03/2008, Joao Vicente Lima  wrote:
>  >  > Hi,
>  >  >  sorry bring this again ... but i hope use spawn in ompi someday :-D
>  >
>  >  I believe it's crashing in MPI_Finalize because you have not closed
>  >  all communication paths between the parent and the child processes.
>  >  For the parent process, try calling MPI_Comm_free or
>  >  MPI_Comm_disconnect on each intercomm in your intercomm array before
>  >  calling finalize.  On the child, call free or disconnect on the parent
>  >  intercomm before calling finalize.
>  >
>  >  Out of curiosity, why a loop of spawns?  Why not increase the value of
>  >  the maxprocs argument, or if you need to spawn different executables,
>  >  or use different arguments for each instance, why not
>  >  MPI_Comm_spawn_multiple?
>  >
>  >  mch
>  >
>  >
>  >
>  >
>  >
>  >  >
>  >  >  The execution of spawn in this way works fine:
>  >  >  MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
>  >  >  MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
>  >  >
>  >  >  but if this code go to a for I get a problem :
>  >  >  for (i= 0; i < 2; i++)
>