Hi, I have a project in mind which I'm going to propose to GCC for Google Summer of Code. My project is not on the list of project ideas (http://gcc.gnu.org/wiki/SummerOfCode), which is why I would be very interested to hear any opinions and perhaps even find a mentor.
1. Project idea

In brief, the idea is to create an abstraction layer for vectorized computations. This would allow programmers to write portable vectorized code.

2. State of the art

Nowadays most processors support SIMD computations. The problem, however, is that each architecture has a different set of SIMD instructions: Intel MMX+SSE+AVX, PowerPC Altivec, ARM iWMMXt, and so on. GCC supports most of these architecture-specific instructions by providing built-in functions, and these are quite convenient when you want to optimize a piece of code for one architecture. The problem starts when you want to make this code portable.

This is not a very common task, and of course GCC has an auto-vectorizer. Unfortunately, there are many examples showing that it is relatively simple for a human to find the right place in the code and vectorize it, but extremely hard for the compiler to do the same. As a result, we end up with code that does not use the capabilities of the architecture.

It would be much easier for the programmer to implement vectorized code on top of an abstraction layer. The compiler would then deal with the portability issues by dispatching the code from the abstraction layer to the particular architecture.

In my experience, there is no C/C++ library that solves this problem. There are some attempts, such as http://libsimd.sourceforge.net/, but it covers only a small part of the idea, and unfortunately its development has been suspended. Or maybe I am wrong and everything has already been written?

3. Implementation

First we need to define the functionality of a SIMD abstract model that can be mapped onto the set of architectures we want to support. The difficulty is that the SIMD instruction sets of different architectures are not fully compatible. Then we want to write a set of "fake-SIMD" functions, to make sure that code written against the layer still works on architectures without SIMD support.
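To make these first two steps more concrete, here is a minimal sketch of what the abstract layer and its fake-SIMD fallback might look like, assuming a single 4-lane float type. All names (simd_f32x4, simd_load, simd_loadu, simd_storeu, simd_add, simd_mul, simd_madd, madd_arrays) are hypothetical and are not an existing GCC interface. When __SSE__ is defined, the functions map onto the SSE built-ins from xmmintrin.h. Otherwise they fall back to plain scalar loops. An Altivec branch, which would map simd_madd directly onto vec_madd, is left out of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical abstract SIMD layer: simd_f32x4 is a 4-lane float vector.
   The names are illustrative only, not an existing GCC API. */

#if defined(__SSE__)
#include <xmmintrin.h>
typedef __m128 simd_f32x4;

/* Aligned load: p must be 16-byte aligned (fast path). */
static inline simd_f32x4 simd_load(const float *p) {
    assert(((uintptr_t)p % 16) == 0);
    return _mm_load_ps(p);
}
/* Unaligned load: works for any address, usually slower. */
static inline simd_f32x4 simd_loadu(const float *p) { return _mm_loadu_ps(p); }
static inline void simd_storeu(float *p, simd_f32x4 v) { _mm_storeu_ps(p, v); }
static inline simd_f32x4 simd_add(simd_f32x4 a, simd_f32x4 b) { return _mm_add_ps(a, b); }
static inline simd_f32x4 simd_mul(simd_f32x4 a, simd_f32x4 b) { return _mm_mul_ps(a, b); }
/* SSE has no multiply-add, so a*b+c becomes two instructions. */
static inline simd_f32x4 simd_madd(simd_f32x4 a, simd_f32x4 b, simd_f32x4 c) {
    return _mm_add_ps(_mm_mul_ps(a, b), c);
}

#else
/* "Fake SIMD": a plain struct and scalar loops for targets without SIMD. */
typedef struct { float lane[4]; } simd_f32x4;

static inline simd_f32x4 simd_loadu(const float *p) {
    simd_f32x4 r;
    for (int i = 0; i < 4; i++) r.lane[i] = p[i];
    return r;
}
/* Keep the same alignment contract as the SIMD path. */
static inline simd_f32x4 simd_load(const float *p) {
    assert(((uintptr_t)p % 16) == 0);
    return simd_loadu(p);
}
static inline void simd_storeu(float *p, simd_f32x4 v) {
    for (int i = 0; i < 4; i++) p[i] = v.lane[i];
}
static inline simd_f32x4 simd_add(simd_f32x4 a, simd_f32x4 b) {
    for (int i = 0; i < 4; i++) a.lane[i] += b.lane[i];
    return a;
}
static inline simd_f32x4 simd_mul(simd_f32x4 a, simd_f32x4 b) {
    for (int i = 0; i < 4; i++) a.lane[i] *= b.lane[i];
    return a;
}
static inline simd_f32x4 simd_madd(simd_f32x4 a, simd_f32x4 b, simd_f32x4 c) {
    return simd_add(simd_mul(a, b), c);
}
#endif

/* Portable kernel written only against the abstract layer:
   out[i] = a[i] * b[i] + c[i]. Unaligned loads are used because the
   caller's arrays carry no alignment guarantee. */
void madd_arrays(float *out, const float *a, const float *b,
                 const float *c, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        simd_f32x4 r = simd_madd(simd_loadu(a + i), simd_loadu(b + i),
                                 simd_loadu(c + i));
        simd_storeu(out + i, r);
    }
    for (; i < n; i++) /* scalar tail for the remaining n % 4 elements */
        out[i] = a[i] * b[i] + c[i];
}
```

madd_arrays is written once against the layer and compiles unchanged on both branches. The preprocessor, not the kernel, decides which backend is used. Its remaining n % 4 elements are handled by a scalar tail loop.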
After that, there is the question of how to dispatch functions from the abstraction layer to the architecture layer. The trivial approach is simply to map each abstract-layer function onto a built-in function. Obviously this would not give the best performance, because a one-to-one mapping cannot take the costs of a particular target into account. For example, loading data from unaligned memory into a SIMD register is much slower than loading it from aligned memory. Also, Altivec provides vec_madd(a,b,c), which in the SSE case has to be expressed with two instructions: _mm_add_ps(_mm_mul_ps(a,b), c). This means that some code optimizations are required in the dispatcher.

4. Time constraints

GSoC gives 4 months to finish the project, so the timeline could be the following:

 2 weeks    -- discussions and design
 1 week     -- fake SIMD
 3 weeks    -- implementation of the main dispatcher
 2 weeks    -- benchmarks and testing
 * the first submission
 1.5 months -- architecture-specific dispatcher optimizations
 0.5 month  -- testing
 * the second submission

This project can be continued in various ways:
 1) a cost model for the dispatcher
 2) the auto-vectorizer combined with the dispatcher
 3) integration with other languages
and so on.

5. Questions

Should it be a library or part of the language? What about extending this abstraction layer for Larrabee (or similar architectures), which provides 512-bit registers for vectorized operations? And so on. These questions should be discussed in light of the project's time constraints and the interests of GCC.

If anybody is interested in mentoring such a project, please let me know; I would be happy to discuss all the issues. If anybody thinks that the project is hopeless, please let me know as well.

--
Best regards,
Artem Shinkarov
Compiler Technology and Computer Architecture Group
University of Hertfordshire