Hello Anuj,

On Thu, Mar 28 2024, Anuj Mohite wrote:
> Hi,
> I'm Anuj M, an undergraduate student interested in participating in GSoC
> 2024 with GCC. I would like to work on the project improving the DO
> CONCURRENT construct in the GFortran compiler. The current implementation
> in GFortran has limitations in handling locality clauses, supporting
> reduction operations, and applying parallelization strategies for DO
> CONCURRENT loops. This proposal aims to address these limitations:

The timing of the GSoC contributor application deadline (on the upcoming
Tuesday) is a bit unfortunate because of Easter: many involved mentors
have a long weekend (public holiday on Friday or Monday or, like me,
both).  So please, even if you do not receive any more feedback, make
sure to apply - and don't leave it until the last day.  IIUC, a proposal
can always be updated later.

I admit that I managed to have only a very quick look at your proposal,
but it all looked good to me.

Good luck!

Martin

>
>    1. Implementing locality clauses and ensuring correct handling of data
>    dependencies.
>    2. Supporting reduction operations in DO CONCURRENT loops.
>    3. Developing parallelization strategies, including OpenMP-based
>    parallelization and OpenMP offloading.
>
> I have added a detailed project proposal outlining the implementation
> approach, timeline, my relevant background, and experience.
>
> I would greatly appreciate feedback or suggestions from the GCC community
> regarding this project proposal.
>
> Best regards,
> Anuj M
>
> ## GCC, the GNU Compiler Collection - Google Summer of Code 2024 Proposal -
> Anuj Mohite
>
> Project: Fortran - DO CONCURRENT
>
> Abstract:
>
> The `DO CONCURRENT` construct, introduced in Fortran 2008 and extended
> with locality clauses in Fortran 2018, provides a mechanism to express
> parallelism in Fortran programs. However, fully leveraging its potential
> requires a systematic and comprehensive implementation within Fortran
> compilers. This proposal outlines a solution for implementing `DO
> CONCURRENT` support, encompassing parsing and handling of locality
> clauses, enabling reduction operations, and developing parallelization
> strategies based on OpenMP. To ensure efficient parallel execution,
> performance optimization techniques will also be employed. By enabling
> efficient parallelization of `DO CONCURRENT` loops, this project aims to
> sustain Fortran's strong performance in high-performance computing
> domains.
>
> Current State of the Feature:
>
> At present, support for the `DO CONCURRENT` construct in the GFortran
> compiler is limited. The existing implementation only partially handles
> the locality clauses introduced in the Fortran 2018 standard, and it
> lacks support for reduction operations and parallelization strategies. As
> a result, the performance gains achievable through the `DO CONCURRENT`
> construct are not fully realized.
>
> The current implementation in GFortran involves a basic parser for the `DO
> CONCURRENT` construct and its locality clauses. However, the semantic
> analysis and code generation phases are incomplete, leading to incorrect
> handling of data dependencies and potential race conditions. Additionally,
> the compiler does not support reduction operations or any parallelization
> strategies for `DO CONCURRENT` loops, effectively executing them in a
> serial manner.
>
> Other Fortran compilers, such as NVIDIA's nvfortran and Intel's ifort,
> have implemented varying levels of support for `DO CONCURRENT`.
> However, their implementations often have limitations or restrictions, and
> their performance can vary depending on the specific workload and hardware
> architecture.
>
> Furthermore, the Fortran language continues to evolve: the Fortran 2023
> standard (formerly known as Fortran 202x) introduces additional features
> and enhancements related to the `DO CONCURRENT` construct, so it is
> crucial for compilers to stay up to date and provide comprehensive
> support for these language features.
>
> Project Goals:
>
> The primary goals of this project are:
>
> 1. Implement Locality Clauses:
>
> * Extend the GFortran compiler to support locality clauses specified in the
> Fortran 2018 standard for the `DO CONCURRENT` construct.
> * Include parsing, semantic analysis, and code generation phases to handle
> specified data dependencies correctly.
> * Modify the compiler's parser to recognize new syntax for `DO CONCURRENT`
> loops and locality clauses, constructing an accurate AST.
> * Enhance the semantic analysis phase to perform data-dependency
> analysis, loop-carried dependency analysis, and alias analysis.
> * Resolve data dependencies and identify potential parallelization
> opportunities (a syntax sketch follows this list).
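>
> As a minimal sketch (hypothetical variable names, assuming the Fortran
> 2018 locality syntax) of what this goal targets:
>
> ```fortran
> ! Minimal sketch of Fortran 2018 locality specifiers on DO CONCURRENT.
> program locality_demo
>   implicit none
>   integer :: i
>   real :: a(100), b(100), t
>   a = 1.0
>   b = 0.0
>   ! LOCAL(t): every iteration gets its own copy of t, avoiding a race;
>   ! SHARED(a, b): the arrays are shared across all iterations.
>   do concurrent (i = 1:100) local(t) shared(a, b)
>      t = 2.0 * a(i)
>      b(i) = t + 1.0
>   end do
>   print *, b(1), b(100)
> end program locality_demo
> ```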
>
> 2. Support Reduction Operations:
>
> * Add support for reduction operations in the `DO CONCURRENT` construct,
> as introduced in the Fortran 2023 (formerly 202x) standard.
> * Involve parsing reduction clauses, semantic analysis for correctness, and
> generating optimized code for parallel reduction operations.
> * Extend the compiler's parser to recognize new syntax for reduction
> clauses, constructing an accurate AST.
> * Enhance the semantic analysis phase to analyze the reduction clauses
> and the loop body, identifying potential dependencies and ensuring the
> correctness of the reduction operation.
> * Employ techniques like data-dependency analysis and alias analysis to
> accurately identify the variables involved in the reduction and ensure
> they are not modified outside the reduction context (a syntax sketch
> follows this list).
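>
> A hedged sketch of the reduction syntax in question (the REDUCE
> specifier from Fortran 2023; names are illustrative):
>
> ```fortran
> ! Sketch of the Fortran 2023 REDUCE specifier on DO CONCURRENT.
> program reduce_demo
>   implicit none
>   integer :: i
>   real :: a(1000), s
>   call random_number(a)
>   s = 0.0
>   ! REDUCE(+:s): each iteration contributes to s; a parallelizing
>   ! compiler must combine the partial sums safely.
>   do concurrent (i = 1:1000) reduce(+:s)
>      s = s + a(i)
>   end do
>   print *, 'sum =', s
> end program reduce_demo
> ```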
>
> 3. Parallelize DO CONCURRENT Loops:
>
> * Develop and integrate parallelization strategies for `DO CONCURRENT`
> loops into the GFortran compiler.
> * Include OpenMP-based parallelization and OpenMP offloading.
>
> OpenMP-based Parallelization:
>
> * Leverage OpenMP API to enable thread-based parallelization of `DO
> CONCURRENT` loops on shared-memory systems.
> * Generate code to create an OpenMP parallel region around the `DO
> CONCURRENT` loop and distribute iterations across threads using
> work-sharing constructs.
> * Handle synchronization and reduction operations using OpenMP's
> reduction clauses or atomic operations.
>
> OpenMP Offloading:
>
> * Extend OpenMP-based parallelization to support offloading `DO CONCURRENT`
> loops to accelerator devices like GPUs, using OpenMP target construct.
> * Generate code to detect and initialize accelerator devices and to
> transfer data between host and device.
> * Generate compute kernels optimized for the accelerator architecture,
> and handle synchronization and result collection.
>
> Implementation:
>
> The proposed implementation involves modifying the GFortran compiler's
> parser, semantic analyzer, and code generator to handle the `DO CONCURRENT`
> construct and its associated clauses. The implementation is divided into
> several phases:
>
> 1. Parsing and AST Construction: Extend the parser to recognize the new
> syntax for `DO CONCURRENT` loops, locality clauses, and reduction clauses,
> constructing an abstract syntax tree (AST) that accurately represents these
> constructs.
>   This phase will involve modifying the Fortran grammar rules and
> implementing the necessary parsing actions to correctly parse the `DO
> CONCURRENT` construct and its associated clauses. The parser will need to
> handle various syntax variations, such as the presence or absence of
> locality clauses, reduction clauses, or both.
>
> 2. Semantic Analysis and Dependency Resolution: Implement semantic
> analysis techniques, such as data-dependency analysis, loop-carried
> dependency analysis, alias analysis, polyhedral analysis, and array
> data-flow analysis, to resolve data dependencies and identify potential
> parallelization opportunities accurately.
>   This phase will analyze the AST constructed during parsing to gather
> accurate dependency information, enabling more aggressive optimizations
> (an example of the kind of dependency it must catch follows).
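>
> For illustration (hypothetical code): iteration i reads a value written
> by iteration i-1, so the iterations are not independent and the loop is
> non-conforming as written.
>
> ```fortran
> ! Non-conforming DO CONCURRENT: the iterations are not independent,
> ! which the construct requires; the analysis should diagnose this.
> program dep_demo
>   implicit none
>   integer :: i
>   real :: a(10)
>   a = 1.0
>   do concurrent (i = 2:10)
>      a(i) = a(i-1) + 1.0   ! loop-carried dependency on a(i-1)
>   end do
>   print *, a
> end program dep_demo
> ```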
>
> 3. Code Generation and Transformation: Generate optimized code for
> parallel execution of `DO CONCURRENT` loops, respecting the specified
> locality clauses and reduction operations and using the information
> gathered during semantic analysis. This may involve techniques such as
> loop distribution (fission), loop fusion, loop blocking, loop unrolling,
> software pipelining, and the use of synchronization primitives to ensure
> efficient parallel execution on modern hardware architectures.
>
> 4. Parallelization Strategies: Implement parallelization strategies such
> as OpenMP-based parallelization and OpenMP offloading. These strategies
> will involve generating the necessary code for parallel execution, load
> balancing, and synchronization.
>
> *   OpenMP-based Parallelization:
>
> The OpenMP-based parallelization strategy will leverage the widely used
> OpenMP API to enable thread-based parallelization of `DO CONCURRENT`
> loops on shared-memory systems. This will involve generating code to
> create OpenMP parallel regions around the `DO CONCURRENT` loop,
> distributing the iterations across available threads using work-sharing
> constructs such as `omp parallel do` or `omp parallel loop`. The
> implementation will also handle synchronization and reduction operations
> using OpenMP's reduction clauses or atomic operations.
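>
> A minimal sketch of this lowering, written as the source-level OpenMP
> code a user could write by hand (illustrative only, not GFortran's
> actual generated output):
>
> ```fortran
> ! Hand-written OpenMP equivalent of a simple DO CONCURRENT loop.
> program omp_equiv
>   implicit none
>   integer :: i
>   real :: a(1000), b(1000), c(1000)
>   a = 1.0
>   b = 2.0
>   ! The loop index is implicitly private in the worksharing construct.
>   !$omp parallel do
>   do i = 1, 1000
>      c(i) = a(i) + b(i)
>   end do
>   !$omp end parallel do
>   print *, c(1), c(1000)
> end program omp_equiv
> ```
>
> Built with `gfortran -fopenmp`, the runtime distributes the iterations
> across the available threads.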
>
> *   OpenMP Offloading:
>
> The OpenMP offloading strategy will extend the OpenMP-based parallelization
> to support offloading `DO CONCURRENT` loops to accelerator devices, such as
> GPUs, using the OpenMP target construct. This will involve generating code
> to detect and initialize accelerator devices, transfer necessary data
> between the host and the device, generate compute kernels optimized for the
> accelerator architecture, and handle synchronization and result collection.
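>
> A comparable hand-written sketch of the offloading case (again
> illustrative; it assumes an offloading-enabled OpenMP toolchain):
>
> ```fortran
> ! Hand-written OpenMP target-offload equivalent of a DO CONCURRENT loop.
> program offload_equiv
>   implicit none
>   integer :: i
>   real :: a(1000), b(1000)
>   a = 1.0
>   b = 0.0
>   ! map(to:)/map(from:) move the data between host and device.
>   !$omp target teams distribute parallel do map(to: a) map(from: b)
>   do i = 1, 1000
>      b(i) = 2.0 * a(i)
>   end do
>   !$omp end target teams distribute parallel do
>   print *, b(1000)
> end program offload_equiv
> ```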
>
> Timeline of the Project:
>
> Adding Patches & Understanding Code (April 3 - April 30)
>
> * Contribute minor patches and bug fixes to gain deeper codebase
> understanding.
> * Study the code organization, data structures, and compilation phases
> related to `DO CONCURRENT`.
>
> Community Bonding Period (May 1 - May 26)
>
> * Familiarize myself with the GFortran codebase, Fortran language
> standards, and existing implementations of `DO CONCURRENT` in other
> compilers.
> * Discuss project goals and implementation details with the mentor,
> clarifying doubts or concerns.
> * Set up the development environment and ensure all necessary tools and
> dependencies are in place.
>
> Week 1-2: Parsing and AST Construction (May 27 - June 9)
>
> * Extend the GFortran compiler's parser to recognize the new syntax for `DO
> CONCURRENT` loops, locality clauses, and reduction clauses.
> * Modify the grammar rules and implement parsing actions to correctly parse
> these constructs.
> * Construct an AST that accurately represents the `DO CONCURRENT` construct
> and its associated clauses.
>
> Week 3-4: Semantic Analysis and Dependency Resolution (June 10 - June 23)
>
> * Implement semantic analysis techniques like data dependency analysis,
> loop-carried dependency analysis, and alias analysis.
> * Analyze the AST to identify data dependencies and potential
> parallelization opportunities.
> * Resolve data dependencies and ensure the correctness of the `DO
> CONCURRENT` loop execution.
>
> Week 5-6: Code Generation and Transformation (June 24 - July 7)
>
> * Generate optimized code for parallel execution of `DO CONCURRENT` loops,
> respecting locality clauses and reduction operations.
> * Implement techniques such as loop distribution, loop fission, loop
> fusion, and the use of synchronization primitives.
>
> Week 7-10: OpenMP-based Parallelization and OpenMP Offloading (July 8 -
> August 4)
>
> * Implement the OpenMP-based parallelization strategy for `DO CONCURRENT`
> loops on shared-memory systems.
> * Generate code to create OpenMP parallel regions, distribute iterations
> across threads, and handle synchronization and reduction operations.
> * Implement the OpenMP offloading strategy for offloading `DO CONCURRENT`
> loops to accelerator devices like GPUs.
>
> Week 11: Performance Optimization (August 5 - August 12)
>
> * Implement techniques to optimize the performance of parallelized `DO
> CONCURRENT` loops, such as loop tiling, data prefetching, and minimizing
> synchronization overhead (a tiling sketch follows).
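>
> A rough sketch of loop tiling (the tile size of 64 is an assumed tuning
> parameter, not a recommendation):
>
> ```fortran
> ! Loop tiling sketch: the tiles are independent, so the tile loops can
> ! form a DO CONCURRENT nest while the loops inside each tile stay serial.
> program tiling_demo
>   implicit none
>   integer, parameter :: n = 512, tile = 64
>   integer :: i, j, ii, jj
>   real :: a(n,n)
>   a = 0.0
>   do concurrent (jj = 1:n:tile, ii = 1:n:tile)
>      do j = jj, min(jj + tile - 1, n)
>         do i = ii, min(ii + tile - 1, n)
>            a(i,j) = a(i,j) + 1.0
>         end do
>      end do
>   end do
>   print *, a(1,1), a(n,n)
> end program tiling_demo
> ```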
>
> Week 12: Testing, Benchmarking, and Documentation (August 13 - August 19)
>
> * Build and finalize a comprehensive test suite to validate the
> correctness of the implementation, covering various use cases and edge
> cases.
> * Document the project, including implementation details, performance
> results, and any relevant findings or limitations.
>
> About Me:
>
> * Name - Anuj Mohite
> * University - College of Engineering Pune Technological University
> * Personal Email -  anujmohite...@gmail.com
> * University Email - mohitear21.c...@coeptech.ac.in
> * GitHub username: https://www.github.com/anujrmohite
> * Time Zone - IST (UTC+05:30)
> * Country & City: Pune, India
> * Preferred Language for communication: English
>
>
> Academic Background:
> * Pursuing a Bachelor's degree in Computer Science and Engineering from the
> College of Engineering Pune, Technological University.
> * My programming journey began in the first year of my high-school
> diploma in 2018, when I taught myself C/C++ for embedded-systems
> programming.
>
> Current Studies and Work:
> * Working as a Generalist Engineering Intern at Syrma SGS, contributing
> to electronic hardware and software product development for embedded
> systems, at 16-20 work hours per week.
> * Currently responsible for developing a custom Linux-based distribution
> for automotive applications.
>
> Compiler-related Coursework:
> * Took Compiler Construction theory and laboratory courses as part of
> the college curriculum; the completed assignments are on my GitHub.
> * Learned about the different phases of compilation, various
> optimization techniques, etc.
>
> Future Aspirations:
> * I wish to work with GCC this summer as a GSoC contributor, committing
> around 7-8 hours per day (40-50 hours per week).
> * I believe I possess the necessary skills to undertake this project.
> * I hope to make significant contributions to GCC this summer and to be
> part of the GCC community in the future.
>
> My experience with GCC:
>
> I'm part of the Free Software Users Group (CoFSUG) at my college, COEP.
> We're a group of students who enjoy exploring Free and Open Source
> Software (FOSS); we've been digging into how UNIX, GNU, and eventually
> GNU/Linux came to be, reading about their journey from the early days.
> Through this interest I got drawn into the GCC project and how it keeps
> evolving. I started reaching out to the GCC community, including Martin
> and Jerry, about participating in Summer of Code, and the Insights On
> GFortran Mattermost space helped me learn how to build, test, and debug
> the GCC code.
> Now I'm interested in implementing the `DO CONCURRENT` feature in
> GFortran, and I'm dedicated to working on it. The discussions on
> Bugzilla and the GCC mailing lists keep teaching me more about the
> overall development process, and I'm happy and enthusiastic to be a part
> of it.
>
> Post GSoC:
>
> My genuine interest in compiler development drives me to actively
> contribute to GCC. I will stay updated with GCC's advancements and
> contribute to its evolution. Furthermore, I will be available for any
> future enhancements or extensions related to this project.
