Re: [OMPI users] Problem with MPI_Intercomm_create
George -- can you file a ticket about this?

On Jun 12, 2011, at 1:25 PM, George Bosilca wrote:

> Frédéric,
>
> Based on the current version of the MPI standard, the two groups involved in the intercomm_create have to be disjoint, which means the leader cannot be the same process.
>
> Regarding the issue in Open MPI, the problem is deep in our modex exchange (contact information). In the example I sent around a while back, the intercomm_create works, but the resulting communicator contains processes without this modex information. This leads to an error on the next collective communication.
>
> george.
>
> On Jun 12, 2011, at 03:44, Frédéric Feyel wrote:
>
>> Dear all, thank you very much for the time spent looking at my problem.
>>
>> After reading your contributions, it's not clear whether there is a bug in Open MPI or not.
>>
>> So I created a small self-contained source code to analyse the behavior, and the problem is still there.
>>
>> I was wondering if the local and remote leaders in the two groups could be the same process. Unfortunately, I get an error in both cases (local and remote leader identical or not).
>>
>> What do you think about my small source code?
>>
>> Best regards,
>>
>> Frédéric.
>>
>> On Tue, 07 Jun 2011 10:31:51 -0500, Edgar Gabriel wrote:
>>> On 6/7/2011 10:23 AM, George Bosilca wrote:
>>>> On Jun 7, 2011, at 11:00, Edgar Gabriel wrote:
>>>>> George,
>>>>>
>>>>> I did not look over all the details of your test, but it looks to me like you are violating one of the requirements of intercomm_create, namely the requirement that the two groups have to be disjoint. In your case the parent process(es) are part of both local intra-communicators, aren't they?
>>>>
>>>> The two groups of the two local communicators are disjoint. One contains A,B while the other contains only C. The bridge communicator contains A,C. I'm confident my example is supposed to work.
>>>> At least for Open MPI the error is under the hood, as the resulting inter-communicator is valid but contains NULL endpoints for the remote process.
>>>
>>> I'll come back to that later; I am not yet convinced that your code is correct :-) Your local groups might be disjoint, but I am worried about the ranks of the remote leader in your example. They cannot be 0 from both groups' perspectives.
>>>
>>>> Regarding the fact that the two leaders should be separate processes, you will not find any wording about this in the current version of the standard. In 1.1 there were two contradictory sentences about this: one stating that the two groups must be disjoint, while the other claimed that the two leaders can be the same process. After discussion, the agreement was that the two groups have to be disjoint, and the standard has been amended to match that agreement.
>>>
>>> I realized that this is a non-issue. If the two local groups are disjoint, there is no way that the two local leaders are the same process.
>>>
>>> Thanks,
>>> Edgar
>>>
>>>> george.
>>>>
>>>>> I just have MPI-1.1 at hand right now, but here is what it says:
>>>>>
>>>>> "Overlap of local and remote groups that are bound into an inter-communicator is prohibited. If there is overlap, then the program is erroneous and is likely to deadlock."
>>>>>
>>>>> So the bottom line is that the two local intra-communicators being used have to be disjoint, and the bridge communicator needs to be a communicator where at least one process from each of the two disjoint groups can talk to the other. Interestingly, I did not find a sentence on whether the two local leaders are allowed to be the same process, or whether they need to be separate processes...
>>>>>
>>>>> Thanks,
>>>>> Edgar
>>>>>
>>>>> On 6/7/2011 12:57 AM, George Bosilca wrote:
>>>>>> Frédéric,
>>>>>>
>>>>>> Attached you will find an example that is supposed to work. The main difference with your code is on T3, T4, where you have inverted the local and remote communicators.
>>>>>> As depicted in the attached picture, during the 3rd step you create the intercomm between ab and c (no overlap) using ac as a bridge communicator (here the two roots, a and c, can exchange messages).
>>>>>>
>>>>>> Based on the MPI 2.2 standard, especially on the paragraph quoted in the PS, the attached code should work. Unfortunately, I couldn't run it successfully with either the Open MPI trunk or MPICH2 1.4rc1.
>>>>>>
>>>>>> george.
>>>>>>
>>>>>> PS: Here is what the MPI standard states about MPI_Intercomm_create:
>>>>>>> The function MPI_INTERCOMM_CREATE can be used to create an inter-communicator from two existing intra-communicators, in the following situation: At least one selected member from ea
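The setup George describes (disjoint local groups {A,B} and {C}, bridged by {A,C}) can be sketched in a few lines of MPI. This is a minimal illustrative reconstruction, not George's attached example: the rank layout (A=0, B=1, C=2), the tag, and the use of MPI_Comm_split to build the groups are all assumptions. It needs an MPI launcher, e.g. `mpirun -np 3 ./a.out`.

```c
/* Hedged sketch of the ab/c intercomm scenario from the thread.
   Assumed layout: A = world rank 0, B = 1, C = 2. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Disjoint local intra-communicators: color 0 -> {A,B}, color 1 -> {C}. */
    MPI_Comm local;
    MPI_Comm_split(MPI_COMM_WORLD, rank == 2 ? 1 : 0, rank, &local);

    /* Bridge communicator {A,C} holding one leader from each group;
       B passes MPI_UNDEFINED and receives MPI_COMM_NULL. */
    MPI_Comm bridge;
    MPI_Comm_split(MPI_COMM_WORLD, rank == 1 ? MPI_UNDEFINED : 0, rank, &bridge);

    /* Local leader is rank 0 of each local comm (A and C). remote_leader is
       the peer leader's rank *within the bridge comm* -- the detail Edgar
       flags: seen from {A,B} the remote leader C is bridge rank 1, while
       seen from {C} the remote leader A is bridge rank 0. The peer_comm and
       remote_leader arguments are only significant at the two leaders. */
    MPI_Comm inter;
    int remote_leader = (rank == 2) ? 0 : 1;   /* illustrative tag below */
    MPI_Intercomm_create(local, 0, bridge, remote_leader, 42, &inter);

    int rsize;
    MPI_Comm_remote_size(inter, &rsize);
    printf("rank %d: remote group size %d\n", rank, rsize);

    MPI_Finalize();
    return 0;
}
```

On a correct implementation, ranks 0 and 1 should report a remote group of size 1, and rank 2 a remote group of size 2.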
Re: [OMPI users] Deadlock with barrier und RMA
I think your program has a compile error in the Win_create() line.

But other than that, I think you're missing some calls to MPI_WIN_FENCE. The one-sided stuff in MPI-2 is really, really confusing. Others on this list disagree with me, but I actively discourage people from using it. Instead, especially if you're just starting with MPI, you might want to use MPI_SEND and MPI_RECV (and friends).

I'd also suggest installing your own version of OMPI; the v1.0 series is several years out of date (either get your admin to install a more recent version, or install a personal copy, as someone outlined earlier in this thread). There have been oodles of bug fixes and new features added since the v1.0 series.

On Jun 11, 2011, at 10:43 AM, Ole Kliemann wrote:

> Hi everyone!
>
> I'm trying to use MPI on a cluster running OpenMPI 1.2.4 and starting processes through PBSPro_11.0.2.110766. I've been running into a couple of performance and deadlock problems and would like to check whether I'm making a mistake.
>
> One of the deadlocks I managed to boil down to the attached example. I run it on 8 cores. It usually deadlocks with all except one process showing
>
> start barrier
>
> as its last output.
>
> The one process out of order shows:
>
> start getting local
>
> My question at this point is simply whether this is expected behaviour of OpenMPI.
>
> Thanks in advance!
> Ole
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Deadlock with barrier und RMA
There are no missing calls to MPI_WIN_FENCE, as the code is using passive synchronization (lock/unlock). The test code looks correct; I think this is a bug in Open MPI. The code also fails on the development trunk, so upgrading will not fix the bug. I've filed a bug (#2809). Unfortunately, I'm not sure when I'll have time to investigate further.

One other note... Even when everything works correctly, Open MPI's passive target synchronization implementation is pretty poor (this coming from the guy who wrote the code). Open MPI doesn't offer asynchronous progress for lock/unlock, so all processes have to be entering the MPI library for progress. Also, the latency isn't the best.

Brian

On 6/13/11 6:41 AM, "Jeff Squyres" wrote:

> I think your program has a compile error in the Win_create() line.
--
Brian W. Barrett
Dept. 1423: Scalable System Software
Sandia National Laboratories
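For readers puzzled by the fence-vs-lock distinction in this exchange: in passive-target synchronization, the access epoch is bracketed by MPI_Win_lock/MPI_Win_unlock at the origin only, with no fence and no matching call at the target. The following is a minimal sketch of that style, not Ole's attached code; the data layout (each rank exposing one int) is an assumption. It needs an MPI launcher to run.

```c
/* Hedged sketch of passive-target (lock/unlock) one-sided communication,
   the mode Brian describes above. Illustrative: every rank reads rank 0's
   exposed integer. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value, remote = -1;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    value = rank * 100;

    /* Each rank exposes one int. Note the full argument list (base, size
       in bytes, displacement unit, info, comm, win) -- getting this line
       wrong is the compile error Jeff suspected. */
    MPI_Win_create(&value, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Passive-target epoch: lock rank 0's window, get, unlock. Unlock
       completes the transfer; rank 0 makes no synchronization call. */
    MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
    MPI_Get(&remote, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
    MPI_Win_unlock(0, win);

    printf("rank %d read %d from rank 0\n", rank, remote);

    /* Caveat from Brian's note: without asynchronous progress, the target
       may have to be inside the MPI library for the Get to complete, so
       an implementation lacking it can deadlock here in practice. */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

In a fence-based (active-target) version, both the lock/unlock pair and the barrier would instead be MPI_Win_fence calls made collectively by all ranks.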
[OMPI users] Deadline Extension: HeteroPAR @ Euro-Par 2011
Due to multiple requests, the deadline has been extended until June 20, 2011.

==
CALL FOR PAPERS

8th International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms
HeteroPar'2011
August 29, 2011, Bordeaux, France
in conjunction with Euro-Par 2011
http://icl.eecs.utk.edu/heteropar2011/
==

* Submission of manuscripts: Monday, June 20, 2011
* Notification of acceptance: Friday, July 8, 2011
* Deadline for final version: Friday, July 29, 2011
* Date of workshop: Tuesday, August 29, 2011

Heterogeneity is emerging as one of the most profound and challenging characteristics of today's parallel environments. From the macro level, where networks of distributed computers composed of diverse node architectures are interconnected with potentially heterogeneous networks, to the micro level, where deeper memory hierarchies and various accelerator architectures are increasingly common, the impact of heterogeneity on all computing tasks is increasing rapidly. Traditional parallel algorithms, programming environments and tools, designed for legacy homogeneous multiprocessors, can at best achieve a small fraction of the efficiency and potential performance we should expect from parallel computing in tomorrow's highly diversified and mixed environments. New ideas, innovative algorithms, and specialized programming environments and tools are needed to efficiently use these new and multifarious parallel architectures.

The workshop is intended to be a forum for researchers working on algorithms, programming languages, tools, and theoretical models aimed at efficiently solving problems on heterogeneous networks. Authors are encouraged to submit original, unpublished research or overviews on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms.
Manuscripts should be limited to 10 pages in the Springer LNCS style and submitted through the EasyChair Conference System: https://www.easychair.org/conferences/?conf=heteropar11. Accepted papers that are presented at the workshop will be published in revised form in a special Euro-Par Workshop Volume in the Lecture Notes in Computer Science (LNCS) series after the Euro-Par conference.

The topics to be covered include but are not limited to:

* Heterogeneous parallel programming paradigms and models;
* Performance models and their integration into the design of efficient parallel algorithms for heterogeneous platforms;
* Parallel algorithms for heterogeneous or hierarchical systems, including manycores and hardware accelerators (FPGAs, GPUs, etc.);
* Parallel algorithms for efficient problem solving on heterogeneous platforms (numerical linear algebra, nonlinear systems, fast transforms, computational biology, data mining, multimedia, etc.);
* Software engineering for heterogeneous parallel systems;
* Applications on heterogeneous platforms;
* Integration of parallel and distributed computing on heterogeneous platforms;
* Experience of porting parallel software from supercomputers to heterogeneous platforms;
* Fault tolerance of parallel computations on heterogeneous platforms;
* Algorithms, models and tools for grid, desktop grid, cloud, and green computing.
Program Chair

* George Bosilca, Innovative Computing Laboratory, Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, USA

Steering Committee

* Domingo Gimenez, University of Murcia, Spain
* Alexey Kalinov, Cadence Design Systems, Russia
* Alexey Lastovetsky, University College Dublin, Ireland
* Yves Robert, Ecole Normale Supérieure de Lyon, France
* Leonel Sousa, INESC-ID/IST, TU Lisbon, Portugal
* Denis Trystram, LIG, Grenoble, France

Program Committee

* Jacques Mohcine Bahi, University of Franche-Comté, France
* Jorge Barbosa, Faculdade de Engenharia do Porto, Portugal
* Andrea Clematis, IMATI-CNR, Italy
* Michel Daydé, IRIT-ENSEEIHT, France
* Frédéric Desprez, INRIA, ENS Lyon, France
* Pierre-François Dutot, ID-IMAG, France
* Alfredo Goldman, University of São Paulo, Brazil
* Thomas Hérault, University of Tennessee, Knoxville, US
* Shuichi Ichikawa, Toyohashi University of Technology, Japan
* Emmanuel Jeannot, INRIA, France
* Heleni Karatza, Aristotle University of Thessaloniki, Greece
* Zhiling Lan, Illinois Institute of Technology, USA
* Pierre Manneback, University of Mons, Belgium
* Kiminori Matsuzaki, Kochi Un
Re: [OMPI users] Error when trying to kill a spawned process
The point is: I have a system composed of a set of MPI processes. These processes run as daemons on each cluster machine. I need a way to kill them when I decide to shut down the system.

Thanks

On Fri, May 6, 2011 at 2:51 PM, Rodrigo Oliveira wrote:

> Hi.
>
> I am having a problem when I try to kill a spawned process. I am using ompi 1.4.3. I use the command ompi-clean to kill all the processes I have running, but those that were dynamically spawned are not killed.
>
> Any idea?
>
> Thanks in advance.
Re: [OMPI users] Error when trying to kill a spawned process
On Jun 13, 2011, at 1:32 PM, Rodrigo Oliveira wrote:

> The point is: I have a system composed of a set of MPI processes. These processes run as daemons on each cluster machine. I need a way to kill them when I decide to shut down the system.

Do you mean that your MPI processes actually "daemonize" - i.e., separate from their initial session? If so, then OMPI certainly has no way to kill them.
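One pattern that can avoid the need for ompi-clean entirely (not suggested in the thread; a hedged sketch under the assumption that the "daemons" remain ordinary MPI processes rather than detaching from their session) is to keep the inter-communicator returned by MPI_Comm_spawn and use it to deliver an explicit shutdown message. The tag value and two-worker count below are illustrative. It needs an MPI launcher to run.

```c
/* Hedged sketch: cooperative shutdown of spawned workers over the
   intercomm from MPI_Comm_spawn, instead of an external kill. */
#include <mpi.h>

#define SHUTDOWN_TAG 99   /* arbitrary illustrative tag */

int main(int argc, char **argv) {
    MPI_Comm parent, children;
    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Parent: spawn two copies of this binary as long-running workers. */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

        /* ... the system does its work ... */

        /* Shutdown: one empty message per worker over the intercomm;
           ranks in an intercomm send address the *remote* group. */
        for (int i = 0; i < 2; i++)
            MPI_Send(NULL, 0, MPI_INT, i, SHUTDOWN_TAG, children);
    } else {
        /* Spawned worker: block (or periodically probe) until the parent
           says stop, then exit cleanly through MPI_Finalize. */
        MPI_Recv(NULL, 0, MPI_INT, 0, SHUTDOWN_TAG, parent,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}
```

A real daemon loop would typically use MPI_Iprobe between work items rather than a blocking MPI_Recv, so the worker keeps serving requests until the shutdown message arrives.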