I looked at the ARM port and saw that ARM supports V8QI SIMD operations. I'm porting GCC to another platform that also supports SIMD operations, and I'm now implementing the V4QI SIMD add operation (with gcc version 4.0.1 20050514).
I did the following steps:
1. added VECTOR_MODES(INT, 4); to my <target>-modes.def;
2. implemented the "movv4qi" and "addv4qi3" expanders and the corresponding instruction patterns in the machine description file;
3. made the hook TARGET_VECTOR_MODE_SUPPORTED_P always return true for V4QImode (in <target>.c).
And then I wrote the following test program:
=================================[top]===================================
typedef char v4qi __attribute__((vector_size(4)));

v4qi foo();

v4qi a = { 0x11, 0x22, 0x33, 0x44 };

int main()
{
    volatile v4qi x;

    x = foo();

    return 0;
}

v4qi foo()
{
    v4qi x = (v4qi)0xaabbccdd, y = a, z;

    z = x + y + a;

    return z;
}
=================================[end]===================================
It didn't work: no vector add was generated. I passed the option '-fdump-tree-all' to gcc and got the following contents in "<file>.t13.cfg":
=================================[top]===================================
;; Function foo (foo)

Merging blocks 0 and 1
foo ()
{
  v4qi z;
  v4qi y;
  v4qi x;
  v4qi D.1238;
  v4qi a.0;
  v4qi D.1236;

  # BLOCK 0
  # PRED: ENTRY (fallthru)
  x = (vector char) 0aabbccdd;
  y = a;
  D.1236 = x + y;
  a.0 = a;
  z = D.1236 + a.0;
  D.1238 = z;
  return D.1238;
  # SUCC: EXIT

}
=================================[end]===================================
(I omitted the 'main' function because only the function 'foo' matters here.)
In the dump file of the next pass, "<file>.t14.oplower", I got:
=================================[top]===================================
;; Function foo (foo)

foo ()
{
  unsigned int D.1262;
  unsigned int D.1261;
  unsigned int D.1260;
  unsigned int D.1259;
  unsigned int D.1258;
  unsigned int D.1257;
  unsigned int D.1256;
  unsigned int D.1255;
  unsigned int D.1254;
  unsigned int D.1253;
  unsigned int D.1252;
  unsigned int D.1251;
  unsigned int D.1250;
  unsigned int D.1249;
  unsigned int D.1248;
  unsigned int D.1247;
  v4qi z;
  v4qi y;
  v4qi x;
  v4qi D.1238;
  v4qi a.0;
  v4qi D.1236;

<bb 0>:
  x = (vector char) 0aabbccdd;
  y = a;
  D.1247 = VIEW_CONVERT_EXPR<unsigned int>(x);
  D.1248 = VIEW_CONVERT_EXPR<unsigned int>(y);
  D.1249 = D.1247 ^ D.1248;
  D.1250 = D.1248 & 2139062143;
  D.1251 = D.1247 & 2139062143;
  D.1252 = D.1249 & 080808080;
  D.1253 = D.1251 + D.1250;
  D.1254 = D.1253 ^ D.1252;
  D.1236 = VIEW_CONVERT_EXPR<v4qi>(D.1254);
  a.0 = a;
  D.1255 = VIEW_CONVERT_EXPR<unsigned int>(D.1236);
  D.1256 = VIEW_CONVERT_EXPR<unsigned int>(a.0);
  D.1257 = D.1255 ^ D.1256;
  D.1258 = D.1256 & 2139062143;
  D.1259 = D.1255 & 2139062143;
  D.1260 = D.1257 & 080808080;
  D.1261 = D.1259 + D.1258;
  D.1262 = D.1261 ^ D.1260;
  z = VIEW_CONVERT_EXPR<v4qi>(D.1262);
  D.1238 = z;
  return D.1238;

}
=================================[end]===================================
The vector additions are lowered into many scalar XOR, AND and ADD operations on unsigned int, so the RTL expansion pass never generates any vector operations.
I changed the program to a V8QI version and compiled it with the ARM iWMMXt port, and this lowering did not happen there. So I suspect something is misconfigured in my port, but I can't find it (maybe I missed some target hook or macro). Could anyone help me solve this problem?
Thanks a lot.