Hello all,

I have a library-usage question for the Steering Committee (SC), a question for GCC@ readers about building and shipping a wrapper library, and, of course, I also want to announce the project a bit more widely and ask for comments.

* * *

Coarrays are an extension of Fortran which dates back to the 1990s and has now been integrated into the upcoming* Fortran 2008 standard (ISO/IEC 1539-1:2010). Coarrays can be used to parallelize programs using a partitioned global address space (PGAS), following the single-program--multiple-data (SPMD) scheme. As coarrays are part of the language, strong type checking is provided. Each process (called an image) has its own private variables; only variables which have a so-called codimension are addressable from other images.

The C (C99) analogue is called Unified Parallel C (UPC), which is, however, not (yet) an international standard. A GCC UPC compiler has existed for a couple of years, and there are plans to merge it ("GUPC") into the GCC 4.6 trunk, cf. http://gcc.gnu.org/ml/gcc/2010-04/msg00117.html (There are also plans to standardize the UPC--Coarray-Fortran interoperability.)

A somewhat longer description of coarrays, references to the standard, to introductory texts, and to talks (including Toon's GCC Summit talk), the current status, and an unsorted collection of thoughts can be found at
http://users.physik.fu-berlin.de/~tburnus/coarray/README.txt

Currently, single-image support (i.e. compiling a coarray program as a serial program for one image) is (nearly fully) implemented in the GCC trunk (4.6, -fcoarray=single). The next step is to add support for multiple images. It is planned to implement a shared-memory, thread-based version and a library version.

* * *

Regarding the library version (which will be implemented first): There are several suitable libraries available:

a) MPI (Message Passing Interface, http://www.mpi-forum.org/), which is widely used and for which several Open Source implementations exist, such as Open MPI and MPICH(2). MPI is also well documented (the API, how to use a given implementation, and MPI in general). MPIv1.x allows for two-sided communication; MPIv2 added single-sided communication.

b) GASNet (http://gasnet.cs.berkeley.edu/), a single-sided communication, BSD-licensed library by UC Berkeley. This library is also used by GUPC via Berkeley's UPC library.

c) ARMCI (Aggregate Remote Memory Copy Interface, http://www.emsl.pnl.gov/docs/parsoft/armci/) + GA (Global Arrays), a single-sided communication library by DoE's EMSL; rather free licence and redistributable, but requires registration for download at the EMSL homepage.

d) Not really available, but as an in-between solution: One can implement the threaded version using a library, which might be a faster way to additionally get a threaded version than implementing the thread version directly in the front end. (Which is also planned.)

[Both (b) and (c) are used with PGAS languages on HPC systems and are said to scale well. There are claims that the PGAS programming scheme allows one to write faster communication libraries than MPI does, but as the underlying task is the same, I only expect minute differences, which depend more on the actual implementation than on the interface/programming model. The plan is to provide MPI, GASNet, and ARMCI+GA wrappers in the end, which will then allow comparisons.]

Question to the GCC Steering Committee: Do you see any problems with supporting those libraries? For Berkeley's GASNet the question also applies to GUPC. (GUPC uses shared memory via threads but can also use Berkeley's UPC library, which is based on GASNet.)

Implementation: The current plan is to start with (a), i.e. MPI, and to try hard to avoid race conditions, thus possibly avoiding single-sided communication.** Next would probably be (d), (b), or a version of (a) which fully relies on MPI's single-sided communication [let's see]. There might also be two versions for each library - one tuned for performance and one for debugging, possibly with different APIs.

As - contrary to, e.g., UPC - one cannot read C header files in Fortran, one always needs a wrapper library.
For MPI, it also depends on the MPI implementation. Thus, I was thinking of simply providing gfortran_caf_<library>.c files to be used as:

  mpicc -c $(CFLAGS) gfortran_caf_mpi.c
  mpif90 $(FFLAGS) coarray_program.f90 gfortran_caf_mpi.o

That way, LTO also works nicely (even without gold); the only question is how best to ship this library. The thread version could simply be compiled and shipped with gfortran, but the others ... Ideas? Suggestions?

For the implementation, the current agenda is:

a) Finish the remaining to-do items for the single-image version
b) Design an MPIv1 version (maybe simultaneously start implementing the more obvious parts, such as startup, barriers, shutdown, and error abort)
c) Implement the actual coarray initialization/communication part
d) Test it

I would be happy to have some more support for (b), (c) and (d); in particular, I would like those with experience in either GCC internals (especially the back end and different targets) or in high-performance computing to have a look at the (upcoming) design draft or the actual implementation and to suggest improvements for stability (such as avoiding race conditions) and performance. There are already some such people on board, but the more expertise the better :-)

Testing: One problem with regard to testing (correctness, performance, scaling): There does not seem to be any larger, publicly available coarray program, and the smaller tests I know of do not cover things like locks or atomic operations. I hope that this will improve, but if you have a coarray program that you could make available - publicly or privately - or if you could test it, that would be awesome!
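
To make the idea of per-library gfortran_caf_<library>.c files a bit more concrete, here is a minimal sketch of what the "obvious parts" (startup, barrier, error abort, shutdown) might look like for a trivial single-image variant. All function names (example_caf_*) are hypothetical and chosen only for illustration; they are not an existing or proposed gfortran interface. The comments indicate which standard MPI call an MPI-based file would presumably use at the same place.

```c
/* Sketch only: hypothetical entry points of a coarray wrapper library,
   here in a single-image variant that needs no communication library. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int caf_initialized = 0;

/* Startup. An MPI-based variant would call MPI_Init() here. */
void example_caf_init(int *argc, char ***argv)
{
    (void)argc;
    (void)argv;
    caf_initialized = 1;
}

/* Image numbers are 1-based in Fortran. An MPI-based variant would
   return the rank obtained from MPI_Comm_rank() plus one. */
int example_caf_this_image(void)
{
    assert(caf_initialized);
    return 1;
}

/* An MPI-based variant would return the result of MPI_Comm_size(). */
int example_caf_num_images(void)
{
    assert(caf_initialized);
    return 1;
}

/* Barrier over all images; nothing to wait for with one image.
   An MPI-based variant would call MPI_Barrier(MPI_COMM_WORLD).
   Returns 0 on success. */
int example_caf_sync_all(void)
{
    assert(caf_initialized);
    return 0;
}

/* Error abort. An MPI-based variant would call MPI_Abort(). */
void example_caf_error_stop(int code)
{
    fprintf(stderr, "ERROR STOP %d\n", code);
    exit(code);
}

/* Shutdown. An MPI-based variant would call MPI_Finalize(). */
void example_caf_finalize(void)
{
    caf_initialized = 0;
}
```

The point of the sketch is only that every gfortran_caf_<library>.c file would export the same set of entry points, so the compiled Fortran program does not need to know which library sits underneath; the actual coarray allocation and communication part is deliberately left out.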

The goal is to support parallelization for small workstations with 2, 4, or 8 cores (via threads and out of the box), for smaller clusters (8, 16, 32; Gigabit/Infiniband with MPI/GASNet/ARMCI), but also for large HPC systems such as x86-64 with 20,000+ cores or Blue Gene with 200,000+ processors (on such systems, GCC is usually installed as a backup/fallback compiler***).

Tobias

PS: To my knowledge, coarrays are currently supported by the Cray compiler (for many years, but I think the latest version now also accepts the modified syntax of Fortran 2008), by the Rice meta-compiler (old syntax only), and by g95 (for about two years). Additionally, one can expect that the major commercial vendors will have coarray support relatively soon, as there seems to be a large demand and as Fortran 2008 is now almost an ISO standard.* The GCC Fortran bug report PR 18918 dates back to 2004, and there was also considerable interest in this feature on the gfortran list and on comp.lang.fortran.

* The Fortran 2008 standard should be in Stage ~40.99, i.e. an FDIS (final draft international standard) exists, which now needs to go through a (last?) round of ISO member balloting before it can be published.

** Cf. http://gcc.gnu.org/ml/fortran/2010-03/msg00201.html for a starting point for the implementation of multi-image support via MPI.

*** That really happens. Two years back, I was told that "gfortran saved a Ph. D. thesis"; the vendor compiler had a bug - and until it was fixed, GCC was used - with the vendor's library and quite good performance. (The cascade of creating a test case for the bug, convincing the HPC centre, which then reported the bug via the vendor support to the actual compiler developers, followed by fixing, testing, releasing the new version, and, finally, installing the new version took many, many months.)