> Hi,
>
> I have a project in mind which I'm going to propose to GCC for
> Google Summer of Code. My project is not on the list of project ideas
> (http://gcc.gnu.org/wiki/SummerOfCode), which is why it would be very
> interesting for me to hear any opinions and maybe even to find a mentor.
>
>
> 1. Project idea
>
> In brief, the idea is to create an abstraction layer for vectorized
> computations. This would make it possible to write portable vectorized code.
>
>
> 2. State of the art
>
> Nowadays most processors support SIMD computations. However, the
> problem is that each architecture has a different set of SIMD instructions:
> Intel MMX+SSE+AVX, PowerPC Altivec, ARM iWMMXt, and so on. GCC supports most
> of the architecture-specific instructions by providing built-in functions.
> It is quite convenient to use these functions when you want to optimize some
> piece of code. The problem starts when you want to make this code portable.
> It is not a very common task, and of course GCC has a vectorizer.
> Unfortunately, there are many examples which show that it is relatively simple
> for a human to find the right place in the code and vectorize it, but it is
> extremely hard for the compiler to do the same. As a result we end up with
> code which does not use the capabilities of the architecture.
> It would be much easier for the programmer to use an abstraction layer to
> implement vectorized code. The compiler should deal with the portability
> issues by dispatching the code from the abstraction layer to the particular
> architecture. In my experience there is no such library for C/C++ that solves
> the problem. There are some attempts like http://libsimd.sourceforge.net/ but
> it covers only a small part of the idea, and unfortunately the development is
> suspended. Or maybe I am wrong and everything is already written?
>
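
On the "is everything already written?" question, one existing piece worth noting is GCC's generic vector extension (the vector_size attribute), which allows element-wise arithmetic on vector types and leaves instruction selection to the back end. Below is a minimal sketch of how a portable multiply-add could look on top of it. The names v4sf, simd_madd and simd_store are made up for illustration and are not taken from any existing library.

```c
#include <string.h>

/* Four packed floats; vector_size is given in bytes (GCC extension). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical abstract-layer function: element-wise a * b + c.
   The compiler chooses the target instructions for the arithmetic. */
static v4sf simd_madd(v4sf a, v4sf b, v4sf c)
{
    return a * b + c;
}

/* Hypothetical helper: copy the four lanes of v into a plain array.
   memcpy avoids any alignment assumption about dst. */
static void simd_store(float *dst, v4sf v)
{
    memcpy(dst, &v, sizeof v);
}
```

On an SSE target this would typically become a multiply followed by an add, and on a target without vector hardware GCC lowers the operations to scalar code. The extension does not cover architecture-specific operations such as saturating arithmetic, so it addresses only part of what the proposal describes.
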
Just some relevant/related prior art you may be interested in:
one is the LLVA virtual vector IR:
http://www.cs.rice.edu/~taha/teaching/04H/RAP/cache/adve-LowLevelVirtual.pdf
and there is also ongoing work on generic vector support in CLI on top of
the cli-branch of GCC. A preliminary report on the early stages of that work
was presented at GROW'10 (http://ctuning.org/dissemination/grow10-04.pdf),
with hopefully some follow-ups later this year...

good luck with whatever GSoC project you ended up proposing!

dorit

>
> 3. Implementation
>
> First we need to define the abstract SIMD model, whose functionality can be
> mapped to the set of architectures we want to support. The difficulty is that
> SIMD instruction sets from different architectures are not fully compatible.
> Then we want to write a set of "fake-SIMD" functions to be sure that our code
> will be usable on architectures without SIMD support.
> After that there is the question of how to dispatch functions from the
> abstract layer to the architecture layer. The trivial thing to do is just to
> map the abstract layer functions to the built-in functions. Obviously this
> would not give the best performance. For example, loading data from
> unaligned memory into a SIMD register is much slower than loading data from
> aligned memory. Altivec has an instruction vec_madd(a,b,c) which has to be
> represented by two instructions in the SSE case:
> _mm_add_ps(_mm_mul_ps(a,b), c). It means that some code optimizations are
> required.
>
>
> 4. Time constraints
>
> GSoC gives 4 months to finish the project.
> It means that the timeline could be the following:
> 2 weeks -- discussions and design
> 1 week -- fake SIMD
> 3 weeks -- implementation of the main dispatcher
> 2 weeks -- benchmarks and testing
> * the first submission
> 1.5 months -- architecture-specific dispatcher optimizations
> 0.5 month -- testing
> * the second submission
>
> This project can be continued in various ways:
> 1) Cost model for the dispatcher
> 2) Auto vectorizer + dispatcher
> 3) Integration with other languages
> And so on
>
>
> 5. Questions
>
> Should it be a library or part of the language? What about extensions
> of this abstraction layer with respect to Larrabee (or similar), which
> provides 512-bit registers for vectorized operations? And so on.
> These questions should be discussed considering the project time constraints
> and the interest of GCC. If anybody is interested in mentoring such a
> project please let me know and I would be happy to discuss all the issues. If
> anybody thinks that the project is hopeless, please let me know as well.
>
> --
> Best regards,
> Artem Shinkarov
> Compiler Technology and Computer Architecture Group
> University of Hertfordshire