Re: [OMPI users] Problem with MPI_Intercomm_create

2011-06-13 Thread Jeff Squyres
George -- can you file a ticket about this?


On Jun 12, 2011, at 1:25 PM, George Bosilca wrote:

> Frédéric,
> 
> Based on the current version of the MPI standard, the two groups involved in 
> the intercomm_create have to be disjoint, which means the two leaders cannot 
> be the same process.
> 
> Regarding the issue in Open MPI, the problem is deep in our modex exchange 
> (contact information). In the example I sent around a while back, the 
> intercomm_create is working, but the resulting communicator contains 
> processes without this modex information. This leads to an error on the next 
> collective communication.
> 
>  george.
> 
> On Jun 12, 2011, at 03:44, Frédéric Feyel wrote:
> 
>> Dear all, thank you very much for the time spent looking at my problem.
>> 
>> After reading your contributions, it's not clear whether there is a bug in
>> OpenMPI or not.
>> 
>> So I created a small self-contained source code to analyse the behavior,
>> and the problem is still there.
>> 
>> I was wondering whether the local and remote leaders in the 2 groups could
>> be the same process. Unfortunately, I get an error in both cases (local and
>> remote leaders identical or not).
>> 
>> What do you think about my small source code?
>> 
>> Best regards,
>> 
>> Frédéric.
>> 
>> 
>> On Tue, 07 Jun 2011 10:31:51 -0500, Edgar Gabriel wrote:
>>> On 6/7/2011 10:23 AM, George Bosilca wrote:
>>>> 
>>>> On Jun 7, 2011, at 11:00, Edgar Gabriel wrote:
>>>> 
>>>>> George,
>>>>> 
>>>>> I did not look over all the details of your test, but it looks to
>>>>> me like you are violating one of the requirements of
>>>>> intercomm_create, namely the requirement that the two groups have
>>>>> to be disjoint. In your case the parent process(es) are part of
>>>>> both local intra-communicators, aren't they?
>>>> 
>>>> The two groups of the two local communicators are disjoint. One
>>>> contains A,B while the other contains only C. The bridge communicator
>>>> contains A,C.
>>>> 
>>>> I'm confident my example is supposed to work. At least for Open MPI
>>>> the error is under the hood, as the resulting inter-communicator is
>>>> valid but contains NULL endpoints for the remote process.
>>> 
>>> I'll come back to that later; I am not yet convinced that your code is
>>> correct :-) Your local groups might be disjoint, but I am worried about
>>> the ranks of the remote leader in your example. They cannot be 0 from
>>> both groups' perspectives.
>>> 
>>>> 
>>>> Regarding the fact that the two leaders should be separate processes,
>>>> you will not find any wording about this in the current version of
>>>> the standard. In MPI 1.1 there were two contradictory sentences about
>>>> this, one stating that the two groups must be disjoint, while the
>>>> other claimed that the two leaders can be the same process. After
>>>> discussion, the agreement was that the two groups have to be
>>>> disjoint, and the standard has been amended to match the agreement.
>>> 
>>> 
>>> I realized that this is a non-issue. If the two local groups are
>>> disjoint, there is no way that the two local leaders are the same
>>> process.
>>> 
>>> Thanks
>>> Edgar
>>> 
>>>> 
>>>> george.
>>>> 
>>>> 
>>>>> 
>>>>> I just have MPI-1.1 at hand right now, but here is what it says:
>>>>> 
>>>>> 
>>>>> Overlap of local and remote groups that are bound into an
>>>>> inter-communicator is prohibited. If there is overlap, then the
>>>>> program is erroneous and is likely to deadlock.
>>>>> 
>>>>> So the bottom line is that the two local intra-communicators being
>>>>> used have to be disjoint, and the bridgecomm needs to be a
>>>>> communicator in which at least one process of each of the two
>>>>> disjoint groups is able to talk to the other. Interestingly, I did
>>>>> not find a sentence saying whether the two local leaders are allowed
>>>>> to be the same process or need to be separate processes...
>>>>> 
>>>>> 
>>>>> Thanks Edgar
>>>>> 
>>>>> 
>>>>> On 6/7/2011 12:57 AM, George Bosilca wrote:
>>>>>> Frédéric,
>>>>>> 
>>>>>> Attached you will find an example that is supposed to work. The
>>>>>> main difference with your code is on T3 and T4, where you have
>>>>>> swapped the local and remote comms. As depicted in the attached
>>>>>> picture, during the third step you create the intercomm between
>>>>>> ab and c (no overlap) using ac as a bridge communicator (here the
>>>>>> two roots, a and c, can exchange messages).
>>>>>> 
>>>>>> Based on the MPI 2.2 standard, especially on the paragraph quoted
>>>>>> in the PS, the attached code should work. Unfortunately, I could
>>>>>> not run it successfully with either the Open MPI trunk or MPICH2
>>>>>> 1.4rc1.
>>>>>> 
>>>>>> george.
>>>>>> 
>>>>>> PS: Here is what the MPI standard states about
>>>>>> MPI_Intercomm_create:
>>>>>>> The function MPI_INTERCOMM_CREATE can be used to create an
>>>>>>> inter-communicator from two existing intra-communicators, in
>>>>>>> the following situation: At least one selected member from ea
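A minimal sketch in C of the topology discussed above (two disjoint local
groups, a+b and c, bridged by a communicator containing the two leaders a
and c) could look like the following. This is a reconstruction under those
assumptions, not George's actual attachment, and per this thread it may
still fail on Open MPI builds affected by the modex issue:

/* Sketch: inter-communicator between disjoint groups {a,b} and {c},
 * bridged by {a,c}. Run with exactly 3 ranks: 0=a, 1=b, 2=c.
 * All names are illustrative. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm local_comm, bridge_comm, inter_comm;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Two disjoint local groups: {0,1} and {2}. */
    int color = (rank <= 1) ? 0 : 1;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &local_comm);

    /* Bridge communicator holding only the two leaders (world ranks 0
     * and 2); rank 1 passes MPI_UNDEFINED and gets MPI_COMM_NULL. */
    int bcolor = (rank == 0 || rank == 2) ? 0 : MPI_UNDEFINED;
    MPI_Comm_split(MPI_COMM_WORLD, bcolor, rank, &bridge_comm);

    /* Within bridge_comm, a is rank 0 and c is rank 1, so the remote
     * leader is 1 as seen from {a,b} and 0 as seen from {c}. */
    int remote_leader = (color == 0) ? 1 : 0;
    MPI_Intercomm_create(local_comm, 0, bridge_comm, remote_leader,
                         42 /* tag */, &inter_comm);

    printf("rank %d: inter-communicator created\n", rank);
    MPI_Finalize();
    return 0;
}

Note that, as Edgar points out above, the remote leader's rank in the bridge
communicator differs between the two sides: 1 as seen from {a,b}, 0 as seen
from {c}.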

Re: [OMPI users] Deadlock with barrier und RMA

2011-06-13 Thread Jeff Squyres
I think your program has a compile error in the Win_create() line.

But other than that, I think you're missing some calls to MPI_WIN_FENCE.  The 
one-sided stuff in MPI-2 is really, really confusing.  

Others on this list disagree with me, but I actively discourage people from 
using it.  Instead, especially if you're just starting with MPI, you might want 
to use MPI_SEND and MPI_RECV (and friends).

I'd also suggest installing your own version of OMPI; the v1.0 series is 
several years out of date (either get your admin to install a more recent 
version, or install a personal copy, as someone outlined earlier in this 
thread).  There have been oodles of bug fixes and new features added since the 
v1.0 series.
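
For reference, the fence-synchronized (active target) pattern Jeff mentions
looks roughly like this; the neighbor-exchange example and all names are
illustrative, not taken from Ole's attached code:

/* Sketch: fence-synchronized (active target) one-sided MPI_Get.
 * Every rank reads its left neighbor's value. Illustrative only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, mine, theirs = -1;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    mine = rank * 100;

    /* Expose 'mine' in a window; disp_unit is sizeof(int). */
    MPI_Win_create(&mine, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);            /* open the access epoch */
    int left = (rank + size - 1) % size;
    MPI_Get(&theirs, 1, MPI_INT, left, 0, 1, MPI_INT, win);
    MPI_Win_fence(0, win);            /* close the epoch; only now is
                                         'theirs' guaranteed valid */

    printf("rank %d got %d from rank %d\n", rank, theirs, left);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}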


On Jun 11, 2011, at 10:43 AM, Ole Kliemann wrote:

> Hi everyone!
> 
> I'm trying to use MPI on a cluster running OpenMPI 1.2.4 and starting
> processes through PBSPro_11.0.2.110766. I've been running into a couple
> of performance and deadlock problems and would like to check whether I'm
> making a mistake.
> 
> One of the deadlocks I managed to boil down to the attached example. I
> run it on 8 cores. It usually deadlocks with all except one process
> showing
> 
>   start barrier
> 
> as last output.
> 
> The one process out of order shows:
> 
>   start getting local
> 
> My question at this point is simply whether this is expected behaviour
> of OpenMPI. 
> 
> Thanks in advance!
> Ole


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] Deadlock with barrier und RMA

2011-06-13 Thread Barrett, Brian W
There are no missing calls to MPI_WIN_FENCE, as the code is using passive
target synchronization (lock/unlock).  The test code looks correct; I think
this is a bug in Open MPI.  The code also fails on the development trunk, so
upgrading will not fix the bug.  I've filed a bug (#2809).  Unfortunately,
I'm not sure when I'll have time to investigate further.

One other note...  Even when everything works correctly, Open MPI's
passive target synchronization implementation is pretty poor (this coming
from the guy who wrote the code).  Open MPI doesn't offer asynchronous
progress for lock/unlock, so all processes have to be inside the MPI
library for progress to be made.  Also, the latency isn't the best.
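
For readers who haven't seen it, the passive target pattern under discussion
looks roughly like the following; this is a minimal sketch with illustrative
names, not Ole's attached code:

/* Sketch: passive target (lock/unlock) one-sided access. Rank 1 reads
 * rank 0's value without rank 0 making any matching synchronization
 * call. Illustrative only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, mine, theirs = -1;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    mine = (rank + 1) * 10;

    MPI_Win_create(&mine, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == 1) {
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);   /* lock target 0 */
        MPI_Get(&theirs, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
        MPI_Win_unlock(0, win);   /* 'theirs' valid only after unlock */
        printf("rank 1 read %d from rank 0\n", theirs);
    }

    MPI_Barrier(MPI_COMM_WORLD);  /* keeps rank 0 inside the MPI
                                     library so the lock/unlock by
                                     rank 1 can make progress */
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

The barrier at the end is what keeps rank 0 inside the MPI library, which is
exactly the progress limitation Brian describes above.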

Brian

On 6/13/11 6:41 AM, "Jeff Squyres"  wrote:

>I think your program has a compile error in the Win_create() line.
>
>But other than that, I think you're missing some calls to MPI_WIN_FENCE.
>The one-sided stuff in MPI-2 is really, really confusing.
>
>Others on this list disagree with me, but I actively discourage people
>from using it.  Instead, especially if you're just starting with MPI, you
>might want to use MPI_SEND and MPI_RECV (and friends).
>
>I'd also suggest installing your own version of OMPI; the v1.0 series is
>several years out of date (either get your admin to install a more recent
>version, or install a personal copy, as someone outlined earlier in this
>thread).  There have been oodles of bug fixes and new features added
>since the v1.0 series.
>
>
>On Jun 11, 2011, at 10:43 AM, Ole Kliemann wrote:
>
>> Hi everyone!
>> 
>> I'm trying to use MPI on a cluster running OpenMPI 1.2.4 and starting
>> processes through PBSPro_11.0.2.110766. I've been running into a couple
>> of performance and deadlock problems and like to check whether I'm
>> making a mistake.
>> 
>> One of the deadlocks I managed to boil down to the attached example. I
>> run it on 8 cores. It usually deadlocks with all except one process
>> showing
>> 
>> start barrier
>> 
>> as last output.
>> 
>> The one process out of order shows:
>> 
>> start getting local
>> 
>> My question at this point is simply whether this is expected behaviour
>> of OpenMPI. 
>> 
>> Thanks in advance!
>> Ole
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>-- 
>Jeff Squyres
>jsquy...@cisco.com
>For corporate legal information go to:
>http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>___
>users mailing list
>us...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>


-- 
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories








[OMPI users] Deadline Extension: HeteroPAR @ Euro-Par 2011

2011-06-13 Thread George Bosilca
Due to multiple requests, the deadline has been extended until June 20, 2011.

==============================================================
                       CALL FOR PAPERS

   8th International Workshop on Algorithms, Models and Tools
      for Parallel Computing on Heterogeneous Platforms

                        HeteroPar'2011

              August 29, 2011, Bordeaux, France
             in conjunction with Euro-Par 2011

           http://icl.eecs.utk.edu/heteropar2011/
==============================================================

 * Submission of manuscripts:  Monday,  June 20, 2011
 * Notification of acceptance: Friday,  July  8, 2011
 * Deadline for final version: Friday,  July 29, 2011
 * Date of workshop:           Tuesday, August 29, 2011

Heterogeneity is emerging as one of the most profound and
challenging characteristics of today's parallel environments. From the
macro level, where networks of distributed computers, composed by
diverse node architectures, are interconnected with potentially
heterogeneous networks, to the micro level, where deeper memory
hierarchies and various accelerator architectures are increasingly
common, the impact of heterogeneity on all computing tasks is
increasing rapidly. Traditional parallel algorithms, programming
environments and tools, designed for legacy homogeneous
multiprocessors, can at best achieve a small fraction of the
efficiency and potential performance we should expect from parallel
computing in tomorrow's highly diversified and mixed environments. New
ideas, innovative algorithms, and specialized programming environments
and tools are needed to efficiently use these new and multifarious
parallel architectures. The workshop is intended to be a forum for
researchers working on algorithms, programming languages, tools, and
theoretical models aimed at efficiently solving problems on
heterogeneous networks.

Authors are encouraged to submit original, unpublished research or
overviews on Algorithms, Models and Tools for Parallel Computing on
Heterogeneous Platforms. Manuscripts should be limited to 10 pages in
the Springer LNCS style and submitted through the EasyChair
Conference System: https://www.easychair.org/conferences/?conf=heteropar11.  
Accepted papers that are presented at the workshop will be published
in revised form in a special Euro-Par Workshop Volume in the Lecture
Notes in Computer Science (LNCS) series after the Euro-Par conference.


The topics to be covered include but are not limited to:

   * Heterogeneous parallel programming paradigms and models;

   * Performance models and their integration into the design of efficient
  parallel algorithms for heterogeneous platforms;

   * Parallel algorithms for heterogeneous or hierarchical systems,
 including manycores and hardware accelerators (FPGAs, GPUs, etc.);

   * Parallel algorithms for efficient problem solving on
 heterogeneous platforms (numerical linear algebra, nonlinear
 systems, fast transforms, computational biology, data mining,
 multimedia, etc.);

   * Software engineering for heterogeneous parallel systems;

   * Applications on heterogeneous platforms;

   * Integration of parallel and distributed computing on
 heterogeneous platforms;

   * Experience of porting parallel software from supercomputers to
 heterogeneous platforms;

   * Fault tolerance of parallel computations on heterogeneous
 platforms;

   * Algorithms, models and tools for grid, desktop grid, cloud, and
 green computing.


Program Chair

  * George Bosilca, Innovative Computing Laboratory, Department of
 Electrical Engineering and Computer Science, University of
 Tennessee, Knoxville, USA

Steering Committee

  * Domingo Gimenez, University of Murcia, Spain
  * Alexey Kalinov, Cadence Design Systems, Russia
  * Alexey Lastovetsky, University College Dublin, Ireland
  * Yves Robert, Ecole Normale Supérieure de Lyon, France
  * Leonel Sousa, INESC-ID/IST, TU Lisbon, Portugal
  * Denis Trystram, LIG, Grenoble, France

Program Committee

  * Jacques Mohcine Bahi, University of Franche-Comté, France
  * Jorge Barbosa, Faculdade de Engenharia do Porto, Portugal
  * Andrea Clematis, IMATI-CNR, Italy
  * Michel Daydé, IRIT-ENSEEIHT, France
  * Frédéric Desprez, INRIA, ENS Lyon, France
  * Pierre-François Dutot, ID-IMAG, France
  * Alfredo Goldman, University of São Paulo, Brazil
  * Thomas Hérault, University of Tennessee, Knoxville, US
  * Shuichi Ichikawa, Toyohashi University of Technology, Japan
  * Emmanuel Jeannot, INRIA, France
  * Heleni Karatza, Aristotle University of Thessaloniki, Greece
  * Zhiling Lan, Illinois Institute of Technology, USA
  * Pierre Manneback, University of Mons, Belgium
  * Kiminori Matsuzaki, Kochi Un

Re: [OMPI users] Error when trying to kill a spawned process

2011-06-13 Thread Rodrigo Oliveira
The point is: I have a system composed of a set of MPI processes. These
processes run as daemons on each cluster machine. I need a way to kill them
when I decide to shut down the system.
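
One cooperative pattern that avoids killing the processes at all (an
illustrative sketch, not an ompi-clean feature; the tag and helper names
are hypothetical): each daemon polls for a shutdown message and exits
cleanly on its own.

/* Sketch of a cooperative shutdown: each daemon periodically probes
 * for a control message and exits cleanly when one arrives.
 * SHUTDOWN_TAG and shutdown_requested() are illustrative names. */
#include <mpi.h>
#include <unistd.h>

#define SHUTDOWN_TAG 999   /* hypothetical control tag */

static int shutdown_requested(void)
{
    int flag = 0;
    MPI_Status st;
    MPI_Iprobe(MPI_ANY_SOURCE, SHUTDOWN_TAG, MPI_COMM_WORLD, &flag, &st);
    if (flag) {
        int dummy;
        MPI_Recv(&dummy, 1, MPI_INT, st.MPI_SOURCE, SHUTDOWN_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    return flag;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    while (!shutdown_requested()) {
        /* ... do daemon work ... */
        usleep(100000);    /* don't spin hot while idle */
    }
    MPI_Finalize();        /* exit cleanly instead of being killed */
    return 0;
}

A controller would then send one message per daemon with the matching tag,
e.g. MPI_Send(&flag, 1, MPI_INT, daemon_rank, SHUTDOWN_TAG, MPI_COMM_WORLD);
for daemons started with MPI_Comm_spawn, the natural channel would instead
be the parent-child inter-communicator from MPI_Comm_get_parent().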

Thanks


On Fri, May 6, 2011 at 2:51 PM, Rodrigo Oliveira wrote:

> Hi.
>
> I am having a problem when I try to kill a spawned process. I am using ompi
> 1.4.3. I use the command ompi-clean to kill all the processes I have
> running, but the ones that were dynamically spawned are not killed.
>
> Any idea?
>
> Thanks in advance.
>


Re: [OMPI users] Error when trying to kill a spawned process

2011-06-13 Thread Ralph Castain

On Jun 13, 2011, at 1:32 PM, Rodrigo Oliveira wrote:

> The point is: I have a system composed of a set of MPI processes. These 
> processes run as daemons on each cluster machine. I need a way to kill them 
> when I decide to shut down the system.

Do you mean that your MPI processes actually "daemonize" - i.e., separate from 
their initial session? If so, then OMPI certainly has no way to kill them.
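
For clarity, the "daemonize" step Ralph is asking about is roughly the
following (a generic Unix sketch, not OMPI code); once a process does this,
it lives in a new session, outside the process group that mpirun and
ompi-clean tear down:

/* Generic Unix daemonization sketch: after fork() + setsid() the child
 * belongs to a new session, so it no longer receives the signals that
 * mpirun/ompi-clean deliver to the original job's process group. */
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

static void daemonize(void)
{
    pid_t pid = fork();
    if (pid < 0)
        exit(1);          /* fork failed */
    if (pid > 0)
        exit(0);          /* parent exits; child is re-parented to init */
    setsid();             /* child starts its own session */
}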

