Hi!
The intrinsics are more or less C wrapper functions for individual
assembler instructions. You can find a detailed description here:
http://www.intel.com/products/processor/manuals/index.htm
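To make the "wrapper" idea concrete, here is a minimal sketch, assuming GCC or Clang on x86 with SSE enabled. The helper name add4 is hypothetical; _mm_loadu_ps, _mm_add_ps and _mm_storeu_ps are the standard intrinsics from <xmmintrin.h>, and _mm_add_ps typically compiles down to a single ADDPS instruction. A scalar fallback is included so the sketch also builds where SSE is not available.

```cpp
#include <cassert>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#endif

// Hypothetical helper: out[k] = a[k] + b[k] for k = 0..3.
inline void add4(const float* a, const float* b, float* out) {
    assert(a != nullptr && b != nullptr && out != nullptr);
#if defined(__SSE__)
    const __m128 va = _mm_loadu_ps(a);       // load 4 floats, no alignment required
    const __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  // one packed add for all 4 lanes
#else
    // Fallback for targets without SSE (assumption: plain scalar code is acceptable).
    for (int k = 0; k < 4; ++k) out[k] = a[k] + b[k];
#endif
}
```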
SSE through SSE3 are supported by modern AMD and Intel processors.
There are many possible improvements, but you need to select the code
path per processor, since not every CPU supports every extension.
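One way the processor-specific selection could look is sketched below. It assumes GCC or Clang on x86, where __builtin_cpu_init and __builtin_cpu_supports are available; on other compilers or targets the sketch simply falls back to the scalar routine. All function names (convert_scalar, convert_sse2, select_convert) are hypothetical, and convert_sse2 is only a placeholder that forwards to the scalar code to keep the example short.

```cpp
#include <cassert>
#include <cstddef>

typedef void (*convert_fn)(const short*, float*, std::size_t);

// Portable fallback: converts n shorts to floats one at a time.
void convert_scalar(const short* in, float* out, std::size_t n) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    for (std::size_t k = 0; k < n; ++k) out[k] = static_cast<float>(in[k]);
}

// Placeholder for an SSE2 implementation (assumption: a real one would
// use vector instructions; here it just forwards to the scalar code).
void convert_sse2(const short* in, float* out, std::size_t n) {
    convert_scalar(in, out, n);
}

// Picks an implementation once, at run time, based on what the CPU reports.
convert_fn select_convert() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2")) return convert_sse2;
#endif
    return convert_scalar;
}
```

The selection would normally be done once at startup, with the returned function pointer stored and reused for every buffer.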
An example using intrinsics (here, GCC's built-in functions):
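// GCC vector types, 16 bytes each: 4 floats, 8 shorts, 4 ints.
typedef float v4sf __attribute__ ((vector_size(16)));
typedef short int v8hi __attribute__ ((vector_size(16)));
typedef int v4si __attribute__ ((vector_size(16)));

// buffer, usrp_buffer, nbytes and i are declared elsewhere.
v4sf * o = static_cast<v4sf*>(buffer->write_pointer());
const v8hi * in = reinterpret_cast<const v8hi*>(usrp_buffer);
// Each iteration consumes 16 input bytes (8 shorts) and writes 8 floats.
for(i = 0; i < nbytes; i+=16, o+=2, ++in){
    const v8hi x = *in;
    // Interleave x with itself, then shift right arithmetically by 16
    // to sign-extend each short to a 32-bit int; convert that to float.
    o[0] = __builtin_ia32_cvtdq2ps(          // shorts 0..3
        __builtin_ia32_psradi128(
            reinterpret_cast<v4si>(
                __builtin_ia32_punpcklwd128(x,x)),16));
    o[1] = __builtin_ia32_cvtdq2ps(          // shorts 4..7
        __builtin_ia32_psradi128(
            reinterpret_cast<v4si>(
                __builtin_ia32_punpckhwd128(x,x)),16));
}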
This snippet quickly converts the shorts that the USRP delivers to floats
using SSE2 instructions. Note that it ignores the byte order and assumes
little-endian data. The buffer size must be a multiple of 16 bytes, and
both buffers need to be 16-byte aligned because the vectors are
dereferenced directly.
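For comparison, the same conversion can be sketched with the portable _mm_* intrinsics from <emmintrin.h> instead of the GCC builtins. This is only a sketch under a few assumptions: the function name shorts_to_floats is hypothetical, the samples are already in host byte order, the count is a multiple of 8 shorts (16 bytes), and unaligned loads and stores are used so the buffers need no particular alignment. A scalar loop is used where SSE2 is not enabled at compile time.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE ones)
#endif

// Hypothetical helper: converts nshorts 16-bit samples to floats.
// nshorts must be a multiple of 8, i.e. the input is a multiple of 16 bytes.
void shorts_to_floats(const short* in, float* out, std::size_t nshorts) {
    assert(nshorts % 8 == 0);
#if defined(__SSE2__)
    for (std::size_t k = 0; k < nshorts; k += 8) {
        const __m128i x =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + k));
        // Interleaving x with itself places each short in the upper half
        // of a 32-bit lane; the arithmetic shift by 16 sign-extends it.
        const __m128i lo = _mm_srai_epi32(_mm_unpacklo_epi16(x, x), 16);
        const __m128i hi = _mm_srai_epi32(_mm_unpackhi_epi16(x, x), 16);
        _mm_storeu_ps(out + k,     _mm_cvtepi32_ps(lo));  // shorts 0..3
        _mm_storeu_ps(out + k + 4, _mm_cvtepi32_ps(hi));  // shorts 4..7
    }
#else
    // Scalar fallback with the same result.
    for (std::size_t k = 0; k < nshorts; ++k)
        out[k] = static_cast<float>(in[k]);
#endif
}
```

The scalar branch doubles as a reference to check the vector path against.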
Dominik