Hello Anuj,

On Thu, Mar 28 2024, Anuj Mohite wrote:
> Hi,
> I'm Anuj M, an undergraduate student interested in participating in GSoC 2024 with GCC. I would like to work on the project improving the DO CONCURRENT construct in the GFortran compiler. The current implementation in GFortran has limitations in handling locality clauses, supporting reduction operations, and parallelizing DO CONCURRENT loops. This proposal aims to address these limitations:
The timing of the GSoC contributor application deadline (on the upcoming Tuesday) is a bit unfortunate because of Easter: many involved mentors have a long weekend (public holiday on Friday or Monday or, like me, both). So even if you do not receive any more feedback, please make sure to apply - and don't leave it until the last day. IIUC, a proposal can always be updated later.

I admit that I managed to have only a very quick look at your proposal, but it all looked good to me.

Good luck!

Martin

>
> 1. Implementing locality clauses and ensuring correct handling of data dependencies.
> 2. Supporting reduction operations in DO CONCURRENT loops.
> 3. Developing parallelization strategies, including OpenMP-based parallelization and OpenMP offloading.
>
> I have added a detailed project proposal outlining the implementation approach, timeline, my relevant background, and experience.
>
> I would greatly appreciate feedback or suggestions from the GCC community regarding this project proposal.
>
> Best regards,
> Anuj M
>
> ## GCC, the GNU Compiler Collection - Google Summer Of Code 24 Proposal - Anuj Mohite
>
> Project: Fortran - DO CONCURRENT
>
> Abstract:
>
> The `DO CONCURRENT` construct, introduced in the Fortran 2008 standard and extended with locality clauses in Fortran 2018, provides a mechanism to express parallelism in Fortran programs. However, fully leveraging its potential requires a systematic and comprehensive implementation within Fortran compilers. This proposal outlines a robust solution for implementing `DO CONCURRENT` support, encompassing parsing and handling of locality clauses, enabling reduction operations, and developing parallelization strategies utilising OpenMP.
> To ensure efficient parallel execution, performance optimization techniques will be employed. By facilitating efficient parallelization of `DO CONCURRENT` loops, this project aims to strengthen Fortran's continued performance in high-performance computing domains.
>
> Current State of Feature:
>
> At present, support for the `DO CONCURRENT` construct in the GFortran compiler is limited. The existing implementation only partially handles the locality clauses introduced in the Fortran 2018 standard, and it lacks support for reduction operations and parallelization strategies. As a result, the performance gains achievable through the `DO CONCURRENT` construct are not fully realised.
>
> The current implementation in GFortran involves a basic parser for the `DO CONCURRENT` construct and its locality clauses. However, the semantic analysis and code generation phases are incomplete, leading to incorrect handling of data dependencies and potential race conditions. Additionally, the compiler does not support reduction operations or any parallelization strategies for `DO CONCURRENT` loops, effectively executing them serially.
>
> Other Fortran compilers, such as NVIDIA's nvfortran and Intel's ifort, have implemented varying levels of support for `DO CONCURRENT`. However, their implementations often have limitations or restrictions, and their performance can vary depending on the specific workload and hardware architecture.
>
> Furthermore, the Fortran language continues to evolve: the Fortran 2023 standard introduces additional features related to the `DO CONCURRENT` construct, notably the REDUCE locality clause. It is therefore crucial for compilers to stay up to date and provide comprehensive support for these language features.
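>
> To make the construct and the goals below concrete, here is a minimal, illustrative sketch (all variable and program names are illustrative, and current GFortran releases are not expected to accept all of it yet - that gap is exactly what this proposal addresses). It uses the Fortran 2018 locality specifiers and the Fortran 2023 REDUCE clause:
>
>     program dc_sketch
>       implicit none
>       integer, parameter :: n = 1000
>       real :: a(n), b(n), tmp, s
>       integer :: i
>
>       call random_number(a)
>
>       ! Fortran 2018 locality specifiers: LOCAL, LOCAL_INIT, SHARED, DEFAULT(NONE).
>       ! tmp is private to each iteration; a and b are shared across iterations.
>       do concurrent (i = 1:n) local(tmp) shared(a, b) default(none)
>         tmp = 2.0 * a(i)
>         b(i) = tmp
>       end do
>
>       ! Fortran 2023 REDUCE clause - the reduction support this proposal targets.
>       s = 0.0
>       do concurrent (i = 1:n) reduce(+:s)
>         s = s + b(i)
>       end do
>
>       print *, 'sum =', s
>     end program dc_sketch
>
> One possible lowering of the second loop - sketched here only as an assumption about the OpenMP-based strategy described later, not as GFortran's current output - is a host-threaded form and an offloaded form:
>
>     !$omp parallel do reduction(+:s)
>     do i = 1, n
>       s = s + b(i)
>     end do
>     !$omp end parallel do
>
>     !$omp target teams distribute parallel do map(to: b) map(tofrom: s) reduction(+:s)
>     do i = 1, n
>       s = s + b(i)
>     end do
>     !$omp end target teams distribute parallel do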
>
> Project Goals
>
> The primary goals of this project are:
>
> 1. Implement Locality Clauses:
>
> * Extend the GFortran compiler to support the locality clauses specified in the Fortran 2018 standard for the `DO CONCURRENT` construct.
> * Include parsing, semantic analysis, and code generation phases to handle specified data dependencies correctly.
> * Modify the compiler's parser to recognize the new syntax for `DO CONCURRENT` loops and locality clauses, constructing an accurate AST.
> * Enhance the semantic analysis phase to perform data dependency analysis, loop-carried dependency analysis, and alias analysis.
> * Resolve data dependencies and identify potential parallelization opportunities.
>
> 2. Support Reduction Operations:
>
> * Add support for reduction operations in the `DO CONCURRENT` construct, as introduced in the Fortran 2023 standard.
> * Involve parsing reduction clauses, semantic analysis for correctness, and generating optimized code for parallel reduction operations.
> * Extend the compiler's parser to recognize the new syntax for reduction clauses, constructing an accurate AST.
> * Enhance the semantic analysis phase to analyze reduction clauses and the loop body, identifying potential dependencies and ensuring the correctness of the reduction operation.
> * Employ techniques like data dependency analysis and alias analysis to accurately identify the variables involved in the reduction operation and ensure they are not modified outside the reduction context.
>
> 3. Parallelize DO CONCURRENT Loops:
>
> * Develop and integrate parallelization strategies for `DO CONCURRENT` loops into the GFortran compiler.
> * Include OpenMP-based parallelization and OpenMP offloading.
>
> OpenMP-based Parallelization:
>
> * Leverage the OpenMP API to enable thread-based parallelization of `DO CONCURRENT` loops on shared-memory systems.
> * Generate code to create OpenMP parallel regions around the `DO CONCURRENT` loop and distribute iterations across threads using work-sharing constructs.
> * Handle synchronization and reduction operations using OpenMP's reduction clauses or atomic operations.
>
> OpenMP Offloading:
>
> * Extend OpenMP-based parallelization to support offloading `DO CONCURRENT` loops to accelerator devices like GPUs, using the OpenMP target construct.
> * Generate code to detect and initialize accelerator devices and to transfer data between host and device.
> * Generate compute kernels optimized for the accelerator architecture, and handle synchronization and result collection.
>
> Implementation:
>
> The proposed implementation involves modifying the GFortran compiler's parser, semantic analyzer, and code generator to handle the `DO CONCURRENT` construct and its associated clauses. The implementation is divided into several phases:
>
> 1. Parsing and AST Construction: Extend the parser to recognize the new syntax for `DO CONCURRENT` loops, locality clauses, and reduction clauses, constructing an abstract syntax tree (AST) that accurately represents these constructs.
> This phase will involve modifying the Fortran grammar rules and implementing the necessary parsing actions to correctly parse the `DO CONCURRENT` construct and its associated clauses. The parser will need to handle various syntax variations, such as the presence or absence of locality clauses, reduction clauses, or both.
>
> 2. Semantic Analysis and Dependency Resolution: Implement semantic analysis techniques, such as data dependency analysis, loop-carried dependency analysis, alias analysis, polyhedral analysis, and array data-flow analysis, to resolve data dependencies and identify potential parallelization opportunities accurately.
> This phase will analyze the AST constructed during parsing to identify data dependencies and parallelization opportunities, using the analyses listed above to provide more accurate dependency information and to enable more aggressive optimizations.
>
> 3. Code Generation and Transformation: Generate optimized code for parallel execution of `DO CONCURRENT` loops, respecting the specified locality clauses and reduction operations.
> Taking into account the information gathered during semantic analysis, this phase may apply techniques such as loop distribution, loop fission, loop fusion, loop blocking, loop unrolling, software pipelining, and the use of synchronization primitives to ensure efficient parallel execution on modern hardware architectures.
>
> 4. Parallelization Strategies: Implement parallelization strategies, namely OpenMP-based parallelization and OpenMP offloading. These strategies will involve generating the necessary code for parallel execution, load balancing, and synchronization.
>
> * OpenMP-based Parallelization:
>
> The OpenMP-based parallelization strategy will leverage the widely used OpenMP API to enable thread-based parallelization of `DO CONCURRENT` loops on shared-memory systems. This will involve generating code to create OpenMP parallel regions around the `DO CONCURRENT` loop, distributing the iterations across available threads using work-sharing constructs such as `omp parallel do` or `omp parallel loop`. The implementation will also handle synchronization and reduction operations using OpenMP's reduction clauses or atomic operations.
>
> * OpenMP Offloading:
>
> The OpenMP offloading strategy will extend the OpenMP-based parallelization to support offloading `DO CONCURRENT` loops to accelerator devices, such as GPUs, using the OpenMP target construct. This will involve generating code to detect and initialize accelerator devices, transfer the necessary data between the host and the device, generate compute kernels optimized for the accelerator architecture, and handle synchronization and result collection.
>
> Timeline of the Project:
>
> Adding Patches & Understanding Code (April 3 - April 30)
>
> * Contribute minor patches and bug fixes to gain a deeper understanding of the codebase.
> * Study the code organisation, data structures, and compilation phases related to DO CONCURRENT.
>
> Community Bonding Period (May 1 - May 26)
>
> * Familiarize myself with the GFortran codebase, Fortran language standards, and existing implementations of `DO CONCURRENT` in other compilers.
> * Discuss project goals and implementation details with the mentor, clarifying doubts or concerns.
> * Set up the development environment and ensure all necessary tools and dependencies are in place.
>
> Week 1-2: Parsing and AST Construction (May 27 - June 9)
>
> * Extend the GFortran compiler's parser to recognize the new syntax for `DO CONCURRENT` loops, locality clauses, and reduction clauses.
> * Modify the grammar rules and implement parsing actions to correctly parse these constructs.
> * Construct an AST that accurately represents the `DO CONCURRENT` construct and its associated clauses.
>
> Week 3-4: Semantic Analysis and Dependency Resolution (June 10 - June 23)
>
> * Implement semantic analysis techniques like data dependency analysis, loop-carried dependency analysis, and alias analysis.
> * Analyze the AST to identify data dependencies and potential parallelization opportunities.
> * Resolve data dependencies and ensure the correctness of `DO CONCURRENT` loop execution.
>
> Week 5-6: Code Generation and Transformation (June 24 - July 7)
>
> * Generate optimized code for parallel execution of `DO CONCURRENT` loops, respecting locality clauses and reduction operations.
> * Implement techniques such as loop distribution, loop fission, loop fusion, and the use of synchronization primitives.
>
> Week 7-10: OpenMP-based Parallelization and OpenMP Offloading (July 8 - August 4)
>
> * Implement the OpenMP-based parallelization strategy for `DO CONCURRENT` loops on shared-memory systems.
> * Generate code to create OpenMP parallel regions, distribute iterations across threads, and handle synchronization and reduction operations.
> * Implement the OpenMP offloading strategy for offloading `DO CONCURRENT` loops to accelerator devices like GPUs.
>
> Week 11: Performance Optimization (August 5 - August 12)
>
> * Implement techniques to optimize the performance of parallelized `DO CONCURRENT` loops, such as loop tiling, data prefetching, and minimizing synchronization overhead.
>
> Week 12: Testing, Benchmarking, and Documentation (August 13 - August 19)
>
> * Finalize a comprehensive test suite to validate the correctness of the proposed implementation, covering various use cases and edge scenarios.
> * Document the project, including implementation details, performance results, and any relevant findings or limitations.
>
> About Me:
>
> * Name - Anuj Mohite
> * University - College of Engineering Pune Technological University
> * Personal Email - anujmohite...@gmail.com
> * University Email - mohitear21.c...@coeptech.ac.in
> * GitHub: https://www.github.com/anujrmohite
> * Time Zone - IST (GMT+05:30), India
> * Country & City: Pune, India
> * Preferred Language for communication: English
>
> Academic Background:
>
> * Pursuing a Bachelor's degree in Computer Science and Engineering from the College of Engineering Pune, Technological University.
> * My journey in programming began during the first year of a high-school Diploma in 2018, with self-taught skills in C/C++ for Embedded Systems programming.
>
> Current Studies and Work:
>
> * Working as a Generalist Engineering Intern at Syrma SGS, contributing to electronic hardware and software product development for Embedded Systems, with expected work hours of 16 - 20 per week.
> * Currently responsible for developing a custom Linux-based distribution system for Automotive Applications.
>
> Compiler-related Coursework:
>
> * Took Compiler Construction theory and laboratory courses as part of the college curriculum, and completed the assignments (GitHub link: click here).
> * Learned about the different phases of compilation, various optimization techniques, etc. (course syllabus GitHub link: click here).
>
> Future Aspirations:
>
> * I wish to work with GCC this summer as a GSoC student, committing around 7-8 hours per day and around 40-50 hours per week.
> * I believe I possess the necessary skills to undertake this project.
> * I hope to make significant contributions to GCC this summer and to be a part of GCC in the future.
>
> My experience with GCC:
>
> I'm part of the Free Software Users Group (CoFSUG) at my college, COEP. We're a group of students who are really into exploring Free and Open Source Software (FOSS). We've been digging into how UNIX, GNU, and eventually GNU/Linux came to be, reading about their journey from the early days. Because of this newfound interest, I got really into the GCC project and how it is always evolving. I started reaching out to the GCC community, like Martin and Jerry, to participate in Summer of Code. I also checked out the Insights On GFortran Mattermost space, which helped me learn how to build, test, and debug the GCC code.
> Now I'm interested in implementing the `DO CONCURRENT` feature in GFortran, and I'm dedicated to working on it. The discussions on Bugzilla and the GCC mailing lists are adding to my knowledge of the overall development process, and I'm happy and enthusiastic to be a part of it.
>
> Post GSoC:
>
> My genuine interest in compiler development drives me to actively contribute to GCC. I will stay up to date with GCC's advancements and contribute to its evolution. Furthermore, I will be available for any future enhancements or extensions related to this project.