> Hi,
>
> I have a project in mind that I am going to propose to GCC as part of
> Google Summer of Code. My project is not on the list of project ideas
> (http://gcc.gnu.org/wiki/SummerOfCode), which is why I would be very
> interested to hear any opinions and maybe even to find a mentor.
>
>
> 1. Project idea
>
> In brief, the idea is to create an abstract layer for vectorized
> computations. This would make it possible to write portable vectorized
> code.
>
>
> 2. State of the art
>
> Nowadays most processors support SIMD computations. However, the
> problem is that each architecture has a different set of SIMD
> instructions: Intel MMX+SSE+AVX, PowerPC Altivec, ARM iWMMXt, and so on.
> GCC supports most of the architecture-specific instructions by providing
> built-in functions. These functions are quite convenient when you want
> to optimize some piece of code. The problem starts when you want to make
> this code portable.
> It is not a very common task, and of course GCC has a vectorizer.
> Unfortunately, there are many examples which show that it is relatively
> simple for a human to find the right place in the code and vectorize it,
> but extremely hard for the compiler to do the same. As a result we end
> up with code that does not use the capabilities of the architecture.
> It would be much easier for the programmer to use an abstract layer to
> implement vectorized code. The compiler should deal with the
> portability issues by dispatching the code from the abstract layer to
> the particular architecture. In my experience there is no such library
> for C/C++ that solves the problem. There are some attempts, such as
> http://libsimd.sourceforge.net/, but it covers only a small part of the
> idea, and unfortunately its development is suspended. Or maybe I am
> wrong and everything is already written?
>

Just some relevant prior art you may be interested in: one is the
LLVA virtual vector IR:
http://www.cs.rice.edu/~taha/teaching/04H/RAP/cache/adve-LowLevelVirtual.pdf
and there is also ongoing work on generic vector support in CLI on top of
the cli-branch of GCC. A preliminary report on the early stages of that work
was presented at GROW'10 (http://ctuning.org/dissemination/grow10-04.pdf),
with hopefully some follow-ups later this year...

Good luck with whatever GSoC project you end up proposing!

dorit

>
> 3. Implementation
>
> First we need to define the functionality of an abstract SIMD model that
> can be mapped to the set of architectures we want to support. The
> difficulty is that SIMD instruction sets from different architectures are
> not fully compatible.
> Then we want to write a set of "fake-SIMD" functions to be sure that our
> code remains usable on architectures without SIMD support.
> After that there is the question of how to dispatch functions from the
> abstract layer to the architecture layer. The trivial thing to do is to
> map the abstract layer functions directly to the built-in functions.
> Obviously this would not give the best performance. For example, loading
> data from unaligned memory into a SIMD register is much slower than
> loading it from aligned memory. Altivec has an instruction
> vec_madd(a,b,c) which has to be represented by two instructions in the
> SSE case: _mm_add_ps(_mm_mul_ps(a,b), c). This means that some code
> optimizations are required.
>
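For illustration only, here is a minimal sketch of what the trivial
mapping described above might look like in C. It is not taken from the
proposal: the names vf4, vf4_load, vf4_store, vf4_madd and madd_array are
hypothetical. It assumes a 4 x float vector, maps a multiply-add to
_mm_add_ps(_mm_mul_ps(a,b), c) when SSE is available, and otherwise falls
back to a scalar "fake-SIMD" version. An Altivec backend could presumably
map vf4_madd to vec_madd instead; that branch is left out here.

```c
#include <stddef.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>

/* SSE backend: the abstract type is the native 128-bit register type. */
typedef __m128 vf4;

/* Unaligned load/store; an optimizing dispatcher would prefer the aligned
   variants whenever alignment can be proven. */
static inline vf4 vf4_load(const float *p) { return _mm_loadu_ps(p); }
static inline void vf4_store(float *p, vf4 v) { _mm_storeu_ps(p, v); }

/* SSE has no single multiply-add instruction, so two instructions are used. */
static inline vf4 vf4_madd(vf4 a, vf4 b, vf4 c)
{
    return _mm_add_ps(_mm_mul_ps(a, b), c);
}

#else

/* "Fake-SIMD" backend: four scalar lanes, usable on any architecture. */
typedef struct { float lane[4]; } vf4;

static inline vf4 vf4_load(const float *p)
{
    vf4 v;
    memcpy(v.lane, p, sizeof v.lane);
    return v;
}

static inline void vf4_store(float *p, vf4 v)
{
    memcpy(p, v.lane, sizeof v.lane);
}

static inline vf4 vf4_madd(vf4 a, vf4 b, vf4 c)
{
    vf4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] * b.lane[i] + c.lane[i];
    return r;
}

#endif

/* Portable user code: out[i] = a[i] * b[i] + c[i] for n elements. */
static void madd_array(float *out, const float *a, const float *b,
                       const float *c, size_t n)
{
    size_t i = 0;

    /* Vector part: four elements per iteration. */
    for (; i + 4 <= n; i += 4)
        vf4_store(out + i,
                  vf4_madd(vf4_load(a + i), vf4_load(b + i), vf4_load(c + i)));

    /* Scalar tail for the remaining n % 4 elements. */
    for (; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}
```

The user-level function madd_array contains no architecture-specific
code; only the three vf4_* helpers change per target, which is the kind
of per-architecture dispatch the proposal would have the compiler or
library perform and optimize.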
>
> 4. Time constraints
>
> GSoC gives 4 months to finish the project. This means that the
> timeline could be the following:
> 2 weeks -- discussions and design
> 1 week  -- fake SIMD
> 3 weeks -- implementation of the main dispatcher
> 2 weeks -- benchmarks and testing
> * the first submission
> 1.5 months -- architecture-specific dispatcher optimizations
> 0.5 months -- testing
> * the second submission
>
> This project can be continued in various ways:
> 1) Cost model for the dispatcher
> 2) Auto vectorizer + dispatcher
> 3) Integration with other languages
> And so on
>
>
> 5. Questions
>
> Should it be a library or part of the language? What about extensions
> of this abstract layer with respect to Larrabee (or similar), which
> provides 512-bit registers for vectorized operations? And so on.
> These questions should be discussed considering the project time
> constraints and the interests of GCC. If anybody is interested in
> mentoring such a project, please let me know and I would be happy to
> discuss all the issues. If anybody thinks that the project is hopeless,
> please let me know as well.
>
> --
> Best regards,
> Artem Shinkarov
> Compiler Technology and Computer Architecture Group
> University of Hertfordshire
