> The generic vector types (used with the vector_size attribute) could be
> seen as the beginnings of such an abstract layer.

Yes, this is very likely going to be the starting point. I'm sorry
that I did not mention this in my first email. There may be
alternative ideas about what it should look like, but this is the most
obvious one.

Basically, generic vectors support only a restricted set of
element-wise operations: +, -, *, /, &, |, ^, ~
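To make this concrete, here is a small sketch of what the extension
already handles. The helper names combine and v8hi_to_array are mine
and purely illustrative; the only compiler feature assumed is the
vector_size attribute itself.

```c
#include <string.h>

/* Eight 16-bit lanes packed into one 16-byte generic vector. */
typedef short __attribute__((vector_size(16))) v8hi;

/* Hypothetical helper: every operator below is applied lane by lane,
   so this computes (a[i] + b[i]) * c[i] - (a[i] & b[i]) for each i. */
v8hi combine(v8hi a, v8hi b, v8hi c)
{
    return (a + b) * c - (a & b);
}

/* Hypothetical helper: copy the lanes out through memory so they can
   be inspected without relying on vector indexing. */
void v8hi_to_array(v8hi v, short out[8])
{
    memcpy(out, &v, sizeof v);
}
```

Anything beyond these operators has to be written by hand or with
target-specific builtins, which is what the rest of this list is about.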

Indexing, as you already said, is not supported. I know about this
patch, but the question is what would be the most efficient way to
implement it. Do we always want to return a value or the memory
address of the particular vector element, or can we maybe optimize the
set of operations using vector shifting and vector masking to keep the
element inside the vector? Sometimes that could be faster.
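To illustrate the two alternatives, here is a rough sketch. The
functions v8hi_get and v8hi_set are hypothetical names, not part of
any patch: the first reads a lane by value through memory, the second
replaces a lane by blending under a lane mask so that the result stays
a vector. The mask is built through memory here only to keep the
sketch short; a real implementation would presumably take it from a
constant table or generate it with vector instructions.

```c
#include <string.h>

typedef short __attribute__((vector_size(16))) v8hi;

/* Read lane i (0..7) by value, going through a temporary array. */
short v8hi_get(v8hi v, int i)
{
    short tmp[8];
    memcpy(tmp, &v, sizeof v);
    return tmp[i];
}

/* Replace lane i (0..7) with x using only element-wise vector
   operations: keep the old lanes where the mask is 0 and take the
   broadcast value where the mask is all ones. */
v8hi v8hi_set(v8hi v, int i, short x)
{
    short m[8] = {0};
    v8hi mask;
    v8hi splat = {x, x, x, x, x, x, x, x};

    m[i] = -1;                        /* all bits set in lane i only */
    memcpy(&mask, m, sizeof mask);
    return (v & ~mask) | (splat & mask);
}
```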

You cannot compare two vectors, although the hardware has built-in
instructions for that.
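As a sketch of the semantics I have in mind: SIMD compare
instructions such as SSE2's pcmpgtw produce a lane mask that is all
ones where the comparison holds and zero elsewhere. The function
v8hi_cmpgt below is a hypothetical scalar emulation of that behaviour,
and v8hi_max shows how such a mask combines with the operators that
are already supported.

```c
#include <string.h>

typedef short __attribute__((vector_size(16))) v8hi;

/* Hypothetical emulation of a lane-wise "greater than":
   lane i of the result is -1 (all bits set) if a[i] > b[i], else 0. */
v8hi v8hi_cmpgt(v8hi a, v8hi b)
{
    short x[8], y[8], r[8];
    v8hi res;
    int i;

    memcpy(x, &a, sizeof a);
    memcpy(y, &b, sizeof b);
    for (i = 0; i < 8; i++)
        r[i] = (x[i] > y[i]) ? -1 : 0;
    memcpy(&res, r, sizeof res);
    return res;
}

/* Lane-wise maximum: select a where the mask is set, b elsewhere. */
v8hi v8hi_max(v8hi a, v8hi b)
{
    v8hi m = v8hi_cmpgt(a, b);
    return (a & m) | (b & ~m);
}
```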

You cannot do shifts within a vector, although that could be very
useful.

Sometimes the generic vector extension simply fails, producing code
that causes a segmentation fault. For example:

#include <stdio.h>
#define N 1024

typedef short __attribute__((vector_size(16))) v8hi;

short a[N];
v8hi *pa = (v8hi *)a, *pvt;
v8hi va;

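int main(int argc, char *argv[]) {
    FILE *f;
    int i, var;

    /* The input file is expected to contain N integers. */
    if (argc < 2 || (f = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "cannot open input file\n");
        return 1;
    }
    for (i = 0; i < N; i++) {
        if (fscanf(f, "%i", &var) != 1)
            break;
        a[i] = (short) var;
    }

    printf("Before the assignment\n");

    /* Vector load from the start of the array. */
    va  = *((v8hi *)&(a[0]));
    /* &a[3] is 6 bytes past &a[0], so it cannot be 16-byte aligned
       whenever a itself is; the vector store below is misaligned. */
    pvt =  ((v8hi *)&(a[3]));
    *pvt = va;

    printf("After the assignment\n");

    for (i = 0; i < 20; i++) {
        printf("%i ", a[i]);
    }
    printf("\n");

    fclose (f);
    return 0;
}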

On Intel architectures all of these vector assignments are compiled
to the "movdqa" instruction, which works only if the memory operand is
16-byte aligned, and that is not the case for &a[3] in this example.
The compiler does not detect this and produces code that causes a
segmentation fault. If you compile the same code for an architecture
without SIMD support, it works fine.
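For completeness, one way to sidestep the problem today is to never
dereference a possibly misaligned vector pointer and to go through
memcpy instead. The names v8hi_loadu and v8hi_storeu are hypothetical;
I would expect a compiler to turn the fixed-size memcpy into an
unaligned move such as movdqu, but that is an assumption about the
optimizer, not something the extension promises.

```c
#include <string.h>

typedef short __attribute__((vector_size(16))) v8hi;

/* Load eight shorts from p; p needs no more than the alignment of
   a short. */
v8hi v8hi_loadu(const short *p)
{
    v8hi v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store eight shorts to p under the same relaxed alignment. */
void v8hi_storeu(short *p, v8hi v)
{
    memcpy(p, &v, sizeof v);
}
```

With these, the failing assignment in the example would be written as
v8hi_storeu(&a[3], v8hi_loadu(&a[0])).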

It is surely not a very serious bug, but it makes it hard to use the
generic vector support.

Reduction operations are not supported: you cannot sum the elements
of a vector. Some architectures have support for this feature as
well.
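A sketch of what a sum reduction would compute, written as a scalar
loop. The name v8hi_reduce_add is hypothetical, and widening the
result to int is my own choice to avoid overflow in the sketch; a
built-in version could map to horizontal-add instructions where the
target has them.

```c
#include <string.h>

typedef short __attribute__((vector_size(16))) v8hi;

/* Hypothetical reduction: add up all eight lanes of v.
   The sum is accumulated in an int, so it cannot overflow. */
int v8hi_reduce_add(v8hi v)
{
    short lanes[8];
    int i, sum = 0;

    memcpy(lanes, &v, sizeof v);
    for (i = 0; i < 8; i++)
        sum += lanes[i];
    return sum;
}
```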

Permutation of elements within a vector is not supported either.
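By permutation I mean something like the following scalar emulation,
where lane i of the result is taken from lane idx[i] of the input. The
name v8hi_permute and the index-array interface are only assumptions
for illustration; a real design might take the indices as a vector or
as compile-time constants.

```c
#include <string.h>

typedef short __attribute__((vector_size(16))) v8hi;

/* Hypothetical permutation: result lane i = input lane idx[i].
   Indices are reduced modulo 8 so any value is safe. */
v8hi v8hi_permute(v8hi v, const unsigned char idx[8])
{
    short in[8], out[8];
    v8hi res;
    int i;

    memcpy(in, &v, sizeof v);
    for (i = 0; i < 8; i++)
        out[i] = in[idx[i] & 7];
    memcpy(&res, out, sizeof res);
    return res;
}
```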

Saturated arithmetic. I'm not an expert in that field, though: I
don't know what kind of instructions each architecture provides for
saturated arithmetic. But I think it would not be hard to find out.
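As one data point, SSE2 has paddsw for signed 16-bit saturating
addition. The sketch below emulates that behaviour in plain C; the
names sat_add16 and v8hi_adds are hypothetical and only show the
intended semantics, namely clamping to the range of short instead of
wrapping around.

```c
#include <limits.h>
#include <string.h>

typedef short __attribute__((vector_size(16))) v8hi;

/* Add two shorts in int precision and clamp to [SHRT_MIN, SHRT_MAX]. */
static short sat_add16(short a, short b)
{
    int s = (int) a + (int) b;

    if (s > SHRT_MAX)
        return SHRT_MAX;
    if (s < SHRT_MIN)
        return SHRT_MIN;
    return (short) s;
}

/* Hypothetical lane-wise saturating addition of two vectors. */
v8hi v8hi_adds(v8hi a, v8hi b)
{
    short x[8], y[8], r[8];
    v8hi res;
    int i;

    memcpy(x, &a, sizeof a);
    memcpy(y, &b, sizeof b);
    for (i = 0; i < 8; i++)
        r[i] = sat_add16(x[i], y[i]);
    memcpy(&res, r, sizeof res);
    return res;
}
```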

And some more.

The question is what should be done in the first stage. It is surely
a very big project, not one for a single summer. Depending on the
taste of the mentor, different things could be done first. Are you
interested in mentoring this project?



--
Artem Shinkarov
