Hi!

Intrinsics are more or less C wrapper functions for individual assembler instructions. You can find a detailed description in the Intel manuals:

http://www.intel.com/products/processor/manuals/index.htm
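To illustrate the "wrapper" idea, here is a minimal sketch using the standard SSE intrinsics from <xmmintrin.h>. The function name add_floats is only a placeholder I made up, and the array length is assumed to be a multiple of 4. Each intrinsic typically compiles to the single instruction named in the comment, although the compiler is free to choose differently.

```cpp
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#endif

// Hypothetical helper: adds two float arrays element-wise.
// Assumption: n is a multiple of 4.
void add_floats(const float* a, const float* b, float* out, std::size_t n) {
#if defined(__SSE__)
  for (std::size_t i = 0; i < n; i += 4) {
    const __m128 va = _mm_loadu_ps(a + i);       // usually movups (unaligned load)
    const __m128 vb = _mm_loadu_ps(b + i);       // usually movups
    _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  // usually addps, then movups
  }
#else
  // Scalar fallback for targets without SSE.
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
#endif
}
```

The unaligned load and store variants are used so the sketch works with any pointer; the aligned variants (_mm_load_ps, _mm_store_ps) require 16-byte aligned addresses.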

SSE through SSE3 are supported by modern AMD and Intel processors.

Many further improvements are possible, but they require selecting the code path according to the processor it runs on.
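One way such a selection could look is sketched below. This is only an assumption about how one might do it, not how GNU Radio does it: the names convert_fn, convert_scalar, cpu_has_sse2 and select_convert are invented for the example, and the CPU check relies on the GCC/Clang builtin __builtin_cpu_supports, which is only available on x86 targets.

```cpp
#include <cstddef>

// Signature shared by all conversion routines (hypothetical).
typedef void (*convert_fn)(const short*, float*, std::size_t);

// Plain C++ conversion; works on any processor.
static void convert_scalar(const short* in, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = static_cast<float>(in[i]);
}

// Returns true if the running CPU reports SSE2 support.
// On compilers or targets without the builtin this conservatively says no.
bool cpu_has_sse2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();  // harmless if the CPU info was already initialised
  return __builtin_cpu_supports("sse2") != 0;
#else
  return false;
#endif
}

// Picks the SSE2 routine when one is supplied and the CPU supports it,
// otherwise falls back to the generic routine.
convert_fn select_convert(convert_fn sse2_impl, convert_fn generic_impl) {
  return (sse2_impl != nullptr && cpu_has_sse2()) ? sse2_impl : generic_impl;
}
```

The selection would normally be done once at startup and the resulting function pointer reused, so the check costs nothing in the inner loop.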

An example using intrinsics (here GCC's __builtin_ia32_* vector builtins):

typedef float v4sf __attribute__ ((vector_size(16)));
typedef short int v8hi __attribute__ ((vector_size(16)));
typedef int v4si __attribute__ ((vector_size(16)));

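// buffer, usrp_buffer, nbytes and i are declared elsewhere.
v4sf * o = static_cast<v4sf*>(buffer->write_pointer());
const v8hi * in = reinterpret_cast<const v8hi*>(usrp_buffer);
// Each iteration reads 16 bytes (8 shorts) and writes 8 floats (two v4sf).
for(i = 0; i < nbytes; i+=16, o+=2, ++in){
  const v8hi x = *in;

  // Low 4 shorts: interleave x with itself so each 32-bit lane holds the
  // short twice, shift right arithmetically by 16 to sign-extend it,
  // then convert the 32-bit integers to floats.
  o[0] = __builtin_ia32_cvtdq2ps(
         __builtin_ia32_psradi128(
         reinterpret_cast<v4si>(
         __builtin_ia32_punpcklwd128(x,x)),16));
  // High 4 shorts: same steps.
  o[1] = __builtin_ia32_cvtdq2ps(
         __builtin_ia32_psradi128(
         reinterpret_cast<v4si>(
         __builtin_ia32_punpckhwd128(x,x)),16));
}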

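The code snippet quickly converts the shorts the USRP delivers to floats, using SSE (strictly speaking, the integer instructions involved are SSE2). Note that it ignores byte order and assumes little-endian. The buffer size must be a multiple of 16 bytes, and since the pointers are dereferenced as 16-byte vector types, both buffers must be 16-byte aligned.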
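For compilers other than GCC, roughly the same conversion can be written with the standard SSE2 intrinsics from <emmintrin.h>. This is a sketch under my own assumptions, not the code from above: the function name shorts_to_floats is made up, the count is given in samples (assumed to be a multiple of 8) rather than bytes, and unaligned loads and stores are used so that alignment does not matter.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE ones)
#endif

// Hypothetical helper: converts n 16-bit samples to floats.
// Assumption: n is a multiple of 8.
void shorts_to_floats(const short* in, float* out, std::size_t n) {
#if defined(__SSE2__)
  for (std::size_t i = 0; i < n; i += 8) {
    const __m128i x =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));
    // Interleave x with itself so every 32-bit lane holds the same short
    // twice, then shift right arithmetically by 16: the upper copy
    // becomes the sign-extended 32-bit value.
    const __m128i lo = _mm_srai_epi32(_mm_unpacklo_epi16(x, x), 16);
    const __m128i hi = _mm_srai_epi32(_mm_unpackhi_epi16(x, x), 16);
    _mm_storeu_ps(out + i,     _mm_cvtepi32_ps(lo));  // samples 0..3
    _mm_storeu_ps(out + i + 4, _mm_cvtepi32_ps(hi));  // samples 4..7
  }
#else
  // Scalar fallback for targets without SSE2.
  for (std::size_t i = 0; i < n; ++i) out[i] = static_cast<float>(in[i]);
#endif
}
```

Every 16-bit integer is exactly representable as a float, so the result should match a plain static_cast<float> per sample, which makes the routine easy to check against a scalar loop.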

Dominik


_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
http://lists.gnu.org/mailman/listinfo/discuss-gnuradio
