On Mon, 23 Mar 2020, Giuliano Belinassi wrote:

> Hi, Richi
>
> On 03/18, Richard Biener wrote:
> > On Tue, 17 Mar 2020, Giuliano Belinassi wrote:
> >
> > > Hi, all
> > >
> > > I have applied some of the review suggestions to the project. Please see the new proposal here:
> >
> > Looks good, some editorial changes below
> >
> > > https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf
> > >
> > > **Automatic Detection of Parallel Compilation Viability**
> > >
> > > Giuliano Belinassi
> > > Timezone: GMT-3:00
> > > University of São Paulo -- Brazil
> > > IRC: giulianob in #gcc
> > > Email: [`giuliano.belina...@usp.br`](mailto:giuliano.belina...@usp.br)
> > > Github: <https://github.com/giulianobelinassi/>
> > > Date:
> > >
> > > About Me
> > >
> > > Computer Science Bachelor (University of São Paulo), currently pursuing a Master's Degree in Computer Science at the same institution. I've always been fascinated by topics such as High-Performance Computing and Code Optimization, having worked on a parallel GPU implementation of a Boundary Element Method software. I am currently conducting research on compiler parallelization and developing the [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc) project, having already presented it at [GNU Cauldron 2019](https://www.youtube.com/watch?v=jd6R3IK__1Q).
> > >
> > > **Skills**: Strong knowledge of C, Concurrency, Shared Memory Parallelism, Multithreaded Debugging, and other typical programming tools.
> > >
> > > Brief Introduction
> > >
> > > In [ParallelGcc](https://gcc.gnu.org/wiki/ParallelGcc), we showed that parallelizing the Intra Procedural optimizations speeds up the compilation of huge files by a factor of 1.8x on a 4-core machine, and also showed that these optimizations account for 75% of compilation time.
> > >
> > > In this project we plan to use the LTO infrastructure to improve compilation performance in the non-LTO case, with the tradeoff that the generated binary is only as good as when LTO is disabled. Here we will automatically detect when a single file will benefit from parallelism and, if so, proceed with the compilation in parallel.
> > >
> > > Use of LTO
> > >
> > > Link Time Optimization (LTO) is a compilation technique that allows the compiler to analyse the program as a whole, instead of analysing and compiling one file at a time. Therefore, LTO is able to collect more information about the program and generate a better optimization plan. LTO is divided into three parts:
> > >
> > > - *LGEN (Local Generation)*: Each file is translated to GIMPLE. This stage runs serially within each file and, therefore, in parallel across the files of the project.
> > >
> > > - *WPA (Whole Program Analysis)*: Run the Inter Procedural Analysis (IPA) on the entire program. This stage runs serially over the project.
> > >
> > > - *LTRANS (Local Transformation)*: Execute all Intra Procedural Optimizations in each partition. This stage runs in parallel.
> > >
> > > Since WPA runs serially over the entire project, it can be a compilation bottleneck: LTO was designed to produce faster binaries, not to produce binaries fast.
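As a purely illustrative aside on the WPA -> LTRANS fan-out described above: once the serial WPA stage has produced its partitions, the parallel LTRANS stage is essentially one independent job per partition. The sketch below only shows that shape; it is not GCC's actual lto-wrapper code, and `compile_partition ()` and the partition names are hypothetical stand-ins.

```c
/* Purely illustrative: the shape of the WPA -> LTRANS fan-out, with one
   child process per partition.  compile_partition () is a hypothetical
   stand-in for running the LTRANS stage on one partition file.  */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void
compile_partition (const char *ltrans_file)
{
  /* Placeholder for "run all Intra Procedural Optimizations on this
     partition".  */
  printf ("compiling %s in pid %d\n", ltrans_file, (int) getpid ());
}

int
main (void)
{
  /* Partition files that a serial WPA stage would have produced.  */
  const char *partitions[] = { "prog.ltrans0", "prog.ltrans1", "prog.ltrans2" };
  int n = sizeof partitions / sizeof partitions[0];

  for (int i = 0; i < n; i++)
    {
      pid_t pid = fork ();
      if (pid == 0)
        {
          /* Child: compile exactly one partition, in parallel with its
             siblings.  */
          compile_partition (partitions[i]);
          _exit (0);
        }
      else if (pid < 0)
        {
          perror ("fork");
          return 1;
        }
    }

  /* Parent: wait for every LTRANS job before the final link step.  */
  while (wait (NULL) > 0)
    ;
  return 0;
}
```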
> > >
> > > Here, the proposed use of LTO to address this problem is to run the IPA for each Translation Unit (TU), instead in the Whole Program, and
> >
> > This proposal is to use LTO to produce binaries fast by running the IPA phase separately for each Translation Unit (TU), instead of on the Whole Program and ...
> >
> > > automatically detect when to partition the TU into multiple LTRANS to improve compilation performance. The advantage of this approach is:
> > >
> > > - It can generate binaries as good as when LTO is disabled.
> > >
> > > - It is faster, as we can partition big files into multiple partitions and compile these partitions in parallel.
> > >
> > > - It can interact with GNU Make Jobserver, improving CPU utilization.
> >
> > This reads a bit odd, regular compilation already interacts with the GNU Make Jobserver. I'd reorder and reword it w/o dashes like
> >
> > We can partition big files into multiple partitions and compile these partitions in parallel which should improve CPU utilization by exposing smaller chunks to the GNU Make Jobserver. Code generation quality should be unaffected by this.
>
> How about:
>
> ```
> The advantage of this approach is: by partitioning big files into
> multiple partitions, we can improve the compilation performance by
> exposing these partitions to the Jobserver. Therefore, it can improve
> CPU utilization in manycore machines. Generated code quality should be
> unaffected by this procedure, which means that it should run as fast as
> when LTO is disabled.
> ```
>
> ?

Sounds great.

Richard.
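For reference on the Jobserver interaction discussed above: GNU make advertises a pipe-based jobserver through MAKEFLAGS (`--jobserver-fds=R,W` in older make, `--jobserver-auth=R,W` in newer releases), and a client acquires an extra job slot by reading one byte from the read end and releases it by writing that byte back. The following is only a minimal sketch of that handshake under those assumptions, not the proposed GCC implementation; error handling, the fifo-based form of make 4.4, and the implicit token each child already owns are left out.

```c
/* Minimal sketch of the GNU make jobserver handshake: parse the pipe
   file descriptors from MAKEFLAGS, acquire a token by reading one byte,
   release it by writing the byte back.  Illustrative only.  */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int
jobserver_fds (int *rfd, int *wfd)
{
  const char *flags = getenv ("MAKEFLAGS");
  const char *p;

  if (!flags)
    return -1;
  p = strstr (flags, "--jobserver-auth=");
  if (!p)
    p = strstr (flags, "--jobserver-fds=");
  if (!p || sscanf (strchr (p, '=') + 1, "%d,%d", rfd, wfd) != 2)
    return -1;
  return 0;
}

int
main (void)
{
  int rfd, wfd;
  char token;

  if (jobserver_fds (&rfd, &wfd) != 0)
    {
      fprintf (stderr, "no usable jobserver; compile serially instead\n");
      return 0;
    }

  if (read (rfd, &token, 1) == 1)
    {
      /* Token acquired: one extra partition may be compiled here.  */
      if (write (wfd, &token, 1) != 1)  /* give the slot back when done */
        perror ("write");
    }
  return 0;
}
```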
> >
> > Thanks,
> > Richard.
> >
> > > Planned Tasks
> > >
> > > I plan to use the GSoC time to develop the following topics:
> > >
> > > - Week [1, 3] -- April 27 to May 15:
> > >   Update `cc1`, `cc1plus`, `f771`, ..., to partition the Compilation Unit (CU) after the IPA analysis directly into multiple LTRANS partitions, instead of generating a temporary GIMPLE file, and to accept an additional parameter `-fsplit-outputs=<tempfile>`, to which the generated ASM filenames will be written.
> > >
> > >   There are two possible approaches that I could work on:
> > >
> > >   1. *Fork*: After the CU is partitioned into multiple LTRANS partitions, `cc1` will fork and compile these partitions, each of them generating an ASM file, and write the generated ASM filenames into `<tempfile>`. Note that if the number of partitions is one, then this step is not necessary.
> > >
> > >   2. *Stream LTRANS IR*: After the CU is partitioned into multiple LTRANS partitions, `cc1` will write these partitions to disk so that LTO can read these files and proceed as in a standard LTO operation, in order to generate a partially linked object file.
> > >
> > >   Approach 1 has the advantage of having less overhead than approach 2, as there are fewer IO operations; however, it may be hard to implement, as the assembler file may already be opened before forking, so caution is necessary to make sure that there is a one-to-one relationship between each assembler file and its compilation process. Approach 2, on the other hand, can easily interact with the GNU Jobserver.
> > >
> > > - Week [4, 7] -- May 18 to June 12:
> > >   Update the `gcc` driver to take each file listed in `<tempfile>`, then assemble and partially link them together. Here, an important optimization is to make `<tempfile>` a named pipe, to avoid having to wait for every partition to finish its compilation before assembling the files.
> > >
> > > - Week 8 -- June 15 to 19: **First Evaluation**
> > >   Deliver a non-optimized version of the project. Some programs ought to be compiled correctly, but there will probably be a large overhead because, so far, there is no way of interacting with the GNU Jobserver.
> > >
> > > - Week [9, 11] -- June 22 to July 10:
> > >   Work on GNU Make Jobserver integration. One way of doing this is to adapt the way the LTO WPA -> LTRANS stage interacts with the Jobserver. Another way is to make the forked `cc1` consume Jobserver tokens until the compilation finishes, then return them when done.
> > >
> > > - Week 12 -- July 13 to 17: **Second Evaluation**
> > >   Deliver a more optimized version of the project. Here we should filter files that would compile fast from files that would require partitioning, and interact with the GNU Jobserver. Therefore we should see some speedup.
> > >
> > > - Week [13, 15] -- July 20 to August 10:
> > >   Develop adequate test coverage and address unexpected issues so that this feature can be merged to trunk for the next GCC release.
> > >
> > > - Week 16: **Final Evaluation**
> > >   Deliver the final product as a series of patches for trunk.
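To make the `<tempfile>` and named-pipe idea from the week 1-3 and week 4-7 items above a bit more concrete, here is a minimal, purely illustrative sketch of the driver side under the assumption that `<tempfile>` is a named pipe: each assembler file name is picked up as soon as a forked `cc1` writes it, so assembling can overlap with the remaining compilation. `assemble_object ()` and the FIFO name are hypothetical stand-ins, not existing GCC code.

```c
/* Illustrative driver-side sketch: read assembler file names from a
   named pipe as the forked cc1 processes write them, and hand each one
   to the assembler immediately.  assemble_object () is a hypothetical
   stand-in for the driver's real "spawn the assembler" step.  */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

static void
assemble_object (const char *asm_name)
{
  printf ("would assemble %s now\n", asm_name);
}

int
main (void)
{
  const char *fifo = "asm-names.fifo";  /* stands in for <tempfile> */
  char line[4096];
  FILE *f;

  if (mkfifo (fifo, 0600) != 0)
    {
      perror ("mkfifo");
      return 1;
    }

  /* Opening the FIFO for reading blocks until a writer (a forked cc1)
     appears; fgets then returns each file name as soon as it has been
     written.  */
  f = fopen (fifo, "r");
  if (!f)
    {
      perror ("fopen");
      return 1;
    }

  while (fgets (line, sizeof line, f))
    {
      line[strcspn (line, "\n")] = '\0';
      if (*line)
        assemble_object (line);
    }

  fclose (f);
  return 0;
}
```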
> > > On 03/13, Giuliano Belinassi wrote:
> > > > Hi, all
> > > >
> > > > I want to propose and apply for the following GSoC project: Automatic Detection of Parallel Compilation Viability.
> > > >
> > > > https://www.ime.usp.br/~belinass/Automatic_Detection_of_Parallel_Compilation_Viability.pdf
> > > >
> > > > Feedback is welcome :)
> > > >
> > > > Here is a markdown version of it:
> > > >
> > > > [... proposal text snipped -- apart from the Planned Tasks below it is essentially identical to the revised version quoted above ...]
> > > >
> > > > Planned Tasks
> > > >
> > > > I plan to use the GSoC time to develop the following topics:
> > > >
> > > > - Week [1, 3] -- April 27 to May 15:
> > > >   Update `cc1`, `cc1plus`, `f771`, ..., to partition the data after the IPA analysis directly into multiple LTRANS partitions, instead of generating a temporary GIMPLE file.
> > > >
> > > > - Week [4, 7] -- May 18 to June 12:
> > > >   Update the `gcc` driver to take these multiple LTRANS partitions, then call the compiler and assembler for each of them, and merge the results into one object file. Here I will use the LTO LTRANS object streaming, so it should interact with the GNU Make Jobserver.
> > > >
> > > > - Week 8 -- June 15 to 19: **First Evaluation**
> > > >   Deliver a non-optimized version of the project. Some programs ought to be compiled correctly, but there will probably be a large overhead because, so far, there will not be any criteria for when to partition. Some tests are also planned for this evaluation.
> > > >
> > > > - Week [9, 11] -- June 22 to July 10:
> > > >   Implement criteria for when to partition, and iteratively improve them based on data.
> > > >
> > > > - Week 12 -- July 13 to 17: **Second Evaluation**
> > > >   Deliver a more optimized version of the project. Here we should filter files that would compile fast from files that would require partitioning, and therefore we should see some speedup.
> > > >
> > > > - Week [13, 15] -- July 20 to August 10:
> > > >   Develop adequate test coverage and address unexpected issues so that this feature can be merged to trunk for the next GCC release.
> > > >
> > > > - Week 16: **Final Evaluation**
> > > >   Deliver the final product as a series of patches for trunk.
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>
> Thank you,
> Giuliano.

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)