Re: [OMPI users] prob in running two mpi merged program (UNCLASSIFIED)

2015-02-05 Thread Burns, Andrew J CTR (US)
Classification: UNCLASSIFIED
Caveats: NONE

Okay, I think I see what's going on: you're calling one MPI-capable program from within another MPI program. The called program has to assume that MPI_Init has already been called and that MPI_Finalize will be called after it returns.

Example (trimmed for brevity):

#include <mpi.h>

int Program2(int x, MPI_Comm comm);

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  int x = 1;

  /* Program2 assumes MPI_Init has already been called */
  int p2result = Program2(x, MPI_COMM_WORLD);

  /* broadcast the result so every rank holds the same value */
  MPI_Bcast(&p2result, 1, MPI_INT, 0, MPI_COMM_WORLD);

  MPI_Finalize();
  return 0;
}

int Program2(int x, MPI_Comm comm)
{
  int returnval;
  /* no MPI_Init or MPI_Finalize here */
  MPI_Allreduce(&x, &returnval, 1, MPI_INT, MPI_SUM, comm);
  return returnval;
}



If the second program were to be:

int Program2(int x, MPI_Comm comm)
{
  MPI_Init(NULL, NULL);              /* second call to MPI_Init */
  int returnval;
  MPI_Allreduce(&x, &returnval, 1, MPI_INT, MPI_SUM, comm);
  MPI_Finalize();                    /* shuts MPI down for the caller too */
  return returnval;
}

The program would return to serial as soon as MPI_Finalize is first called, and because MPI_Init ends up being called a second time, the run would likely throw several errors.
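
If the called program cannot simply drop its own Init/Finalize, a common workaround (a minimal sketch, not code from your attachment) is to guard those calls with MPI_Initialized so the routine works both standalone and when called from an already-initialized MPI program:

int Program2(int x, MPI_Comm comm)
{
  int already_initialized = 0;
  MPI_Initialized(&already_initialized);
  if (!already_initialized)
    MPI_Init(NULL, NULL);          /* only initialize if the caller did not */

  int returnval;
  MPI_Allreduce(&x, &returnval, 1, MPI_INT, MPI_SUM, comm);

  if (!already_initialized)
    MPI_Finalize();                /* only finalize what we initialized */
  return returnval;
}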

-Andrew Burns

-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Muhammad Ashfaqur Rahman
Sent: Wednesday, February 04, 2015 3:42 PM
To: Open MPI Users
Subject: Re: [OMPI users] prob in running two mpi merged program (UNCLASSIFIED)

Dear Andrew Burns,
Thank you for your ideas. Your guess is partly correct: I am trying to merge two sets of programs into one executable and then run it under MPI.
As per your suggestion, I have removed MPI_Finalize from one set and also commented out MPI_Barrier in some parts.
But it still runs serially.
For your reference, the Makefile is attached.


Regards
Ashfaq


On Tue, Feb 3, 2015 at 6:26 PM, Burns, Andrew J CTR (US) wrote:


Classification: UNCLASSIFIED
Caveats: NONE

If I could venture a guess, it sounds like you are trying to merge two separate programs into one executable and run them in parallel via MPI.

The problem sounds like your program starts in parallel but then changes back to serial while it is still executing.

I can't be entirely sure without looking at the code itself.

One guess is that MPI_Finalize is in the wrong location. Finalize should be called to end the parallel section and move the program back to serial. Typically this means that Finalize will be very close to the last line of the program.

It may also be possible that, with the way your program is structured, execution is effectively serial because only one core is doing work at any given moment. This may be due to extensive use of barrier or similar functions.
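
For example, a loop like this (a hypothetical sketch, not taken from your code; it assumes MPI_Init has already been called) keeps all but one rank idle at any moment, so the run is effectively serial even though every rank is alive:

int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

for (int turn = 0; turn < size; ++turn) {
  if (rank == turn) {
    /* ... this rank's share of the work ... */
  }
  MPI_Barrier(MPI_COMM_WORLD);     /* every other rank just waits here */
}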

Andrew Burns
Lockheed Martin
Software Engineer
410-306-0409
ARL DSRC
andrew.j.bur...@us.army.mil
andrew.j.burns35@mail.mil

-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Tuesday, February 03, 2015 9:05 AM
To: Open MPI Users
Subject: Re: [OMPI users] prob in running two mpi merged program

I'm afraid I don't quite understand what you are saying, so let's see if I can clarify. You have two Fortran MPI programs. You start one using "mpiexec". You then start the other one as a singleton - i.e., you just run "myapp" without using mpiexec. The two apps are attempting to execute an MPI_Comm_connect/accept so they can "join".

Is that correct? You mention MPICH in your statement about one of the procs - are you using MPICH or Open MPI? If the latter, which version are you using?

Ralph
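
For clarity, the connect/accept handshake mentioned above looks roughly like this (a minimal sketch; the port string has to be passed between the two jobs out of band, e.g. through a file or MPI_Publish_name):

#include <mpi.h>
#include <stdio.h>

/* "Server" side: open a port and wait for the other job to connect. */
void accept_side(void)
{
  char port[MPI_MAX_PORT_NAME];
  MPI_Comm intercomm;

  MPI_Open_port(MPI_INFO_NULL, port);
  printf("port: %s\n", port);      /* hand this string to the other job */
  MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
}

/* "Client" side: connect using the port string obtained out of band. */
void connect_side(const char *port)
{
  MPI_Comm intercomm;
  MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
}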


On Mon, Feb 2, 2015 at 11:35 PM, Muhammad Ashfaqur Rahman wrote:


Dear All,
Greetings. I am new to MPI usage. I have problems with a parallel run when two Fortran MPI programs are merged into one executable; if the two are kept separate, they run in parallel without trouble.

One program uses spmd and the other one uses the MPICH header directly.

The other issue is that when the merged program is run under MPI, it first starts separate parallel instances of the same step and then, after some steps, it becomes serial.

Please help me in this regard.

Ashfaq
Ph.D Student
Dept. of Meteorology





Classification: UNCLASSIFIED
Caveats: NONE







Classification: U

Re: [OMPI users] orted seg fault when using MPI_Comm_spawn on more than one host

2015-02-05 Thread Ralph Castain
Okay, I tracked this down - thanks for your patience! I have a fix pending 
review. You can track it here:

https://github.com/open-mpi/ompi-release/pull/179 
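
For reference, the call pattern under discussion downthread is MPI_Comm_spawn with MPI_INFO_NULL (i.e., no host/hostfile Info keys); a minimal parent-side sketch, with a placeholder child executable name, looks like this:

#include <mpi.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  MPI_Comm children;

  /* No host/hostfile Info keys: placement is left to the default
     hostfile and mapping policy. "./child" is a placeholder name. */
  MPI_Comm_spawn("./child", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                 0, MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);

  MPI_Finalize();
  return 0;
}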



> On Feb 4, 2015, at 5:14 PM, Evan Samanas  wrote:
> 
> Indeed, I simply commented out all the MPI_Info stuff, which you essentially 
> did by passing a dummy argument.  I'm still not able to get it to succeed.
> 
> So here we go, my results defy logic.  I'm sure this could be my fault...I've 
> only been an occasional user of OpenMPI and MPI in general over the years and 
> I've never used MPI_Comm_spawn before this project. I tested simple_spawn 
> like so:
> mpicc simple_spawn.c -o simple_spawn
> ./simple_spawn
> 
> When my default hostfile points to a file that just lists localhost, this 
> test completes successfully.  If it points to my hostfile with localhost and 
> 5 remote hosts, here's the output:
> evan@lasarti:~/devel/toy_progs/mpi_spawn$ mpicc simple_spawn.c -o simple_spawn
> evan@lasarti:~/devel/toy_progs/mpi_spawn$ ./simple_spawn
> [pid 5703] starting up!
> 0 completed MPI_Init
> Parent [pid 5703] about to spawn!
> [lasarti:05703] [[14661,1],0] FORKING HNP: orted --hnp --set-sid --report-uri 
> 14 --singleton-died-pipe 15 -mca state_novm_select 1 -mca ess_base_jobid 
> 960823296
> [lasarti:05705] *** Process received signal ***
> [lasarti:05705] Signal: Segmentation fault (11)
> [lasarti:05705] Signal code: Address not mapped (1)
> [lasarti:05705] Failing at address: (nil)
> [lasarti:05705] [ 0] 
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fc185dcf340]
> [lasarti:05705] [ 1] 
> /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-rte.so.7(orte_rmaps_base_compute_bindings+0x650)[0x7fc186033bb0]
> [lasarti:05705] [ 2] 
> /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-rte.so.7(orte_rmaps_base_map_job+0x939)[0x7fc18602fb99]
> [lasarti:05705] [ 3] 
> /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-pal.so.6(opal_libevent2021_event_base_loop+0x6e4)[0x7fc18577dcc4]
> [lasarti:05705] [ 4] 
> /opt/openmpi-v1.8.4-54-g07f735a/lib/libopen-rte.so.7(orte_daemon+0xdf8)[0x7fc186010438]
> [lasarti:05705] [ 5] orted(main+0x47)[0x400887]
> [lasarti:05705] [ 6] 
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fc185a1aec5]
> [lasarti:05705] [ 7] orted[0x4008db]
> [lasarti:05705] *** End of error message ***
> 
> You can see from the message that this particular run IS from the latest 
> snapshot, though the failure happens on v.1.8.4 as well.  I didn't bother 
> installing the snapshot on the remote nodes though.  Should I do that?  It 
> looked to me like this error happened well before we got to a remote node, so 
> that's why I didn't.
> 
> Your thoughts?
> 
> Evan
> 
> 
> 
> On Tue, Feb 3, 2015 at 7:40 PM, Ralph Castain wrote:
> I confess I am sorely puzzled. I replaced the Info key with MPI_INFO_NULL, but 
> still had to pass a bogus argument to master since you still have the 
> Info_set code in there - otherwise, info_set segfaults due to a NULL argv[1]. 
> Doing that (and replacing "hostname" with an MPI example code) makes 
> everything work just fine.
> 
> I've attached one of our example comm_spawn codes that we test against - it 
> also works fine with the current head of the 1.8 code base. I confess that 
> some changes have been made since 1.8.4 was released, and it is entirely 
> possible that this was a problem in 1.8.4 and has since been fixed.
> 
> So I'd suggest trying with the nightly 1.8 tarball and seeing if it works for 
> you. You can download it from here:
> 
> http://www.open-mpi.org/nightly/v1.8/ 
> 
> HTH
> Ralph
> 
> 
> On Tue, Feb 3, 2015 at 6:20 PM, Evan Samanas wrote:
> Yes, I did.  I replaced the info argument of MPI_Comm_spawn with 
> MPI_INFO_NULL.
> 
> On Tue, Feb 3, 2015 at 5:54 PM, Ralph Castain wrote:
> When running your comm_spawn code, did you remove the Info key code? You 
> wouldn't need to provide a hostfile or hosts any more, which is why it should 
> resolve that problem.
> 
> I agree that providing either hostfile or host as an Info key will cause the 
> program to segfault - I'm working on that issue.
> 
> 
> On Tue, Feb 3, 2015 at 3:46 PM, Evan Samanas wrote:
> Setting these environment variables did indeed change the way mpirun maps 
> things, and I didn't have to specify a hostfile.  However, setting these for 
> my MPI_Comm_spawn code still resulted in the same segmentation fault.
> 
> Evan
> 
> On Tue, Feb 3, 2015 at 10:09 AM, Ralph Castain wrote:
> If you add the following to your environment, you should run on multiple 
> nodes:
> 
> OMPI_MCA_rmaps_base_mapping_policy=node
> OMPI_MCA_orte_default_hostfile=
> 
> The first tells OMPI to map-by node. The second passes in your default 
> hostfile so you don't need to specify it as an Info key.
> 
> HTH
> Ralph