x86_64 extended the sse2 movnti instruction to support 64-bit integer registers as well. i don't see any gcc builtin that generates the 64-bit form, and there is no intrinsic for it listed in the intel manuals or, apparently, in the icc 10.0.023 header files. the natural name would be _mm_stream_si64, which fortunately does not conflict with _mm_stream_pi, the mmx 64-bit store version.
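for reference, the instruction can be reached today through inline asm. this is only a minimal sketch, assuming gcc-style extended asm on x86_64; stream_store_u64 and stream_fence are made-up names for illustration, not gcc builtins or intel intrinsics, and the non-x86_64 fallback is an assumption so the snippet still compiles elsewhere.

```c
#include <stdint.h>

/* hypothetical helper: not a gcc builtin or an intel intrinsic */
static inline void stream_store_u64(uint64_t *dst, uint64_t val)
{
#if defined(__x86_64__)
    /* 64-bit movnti: non-temporal store straight from an integer register.
       the assembler picks the 64-bit form from the register operand. */
    __asm__ __volatile__("movnti %1, %0" : "=m"(*dst) : "r"(val));
#else
    /* assumption: on any other target fall back to an ordinary store */
    *dst = val;
#endif
}

/* hypothetical helper: order earlier streaming stores before later stores */
static inline void stream_fence(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("sfence" : : : "memory");
#endif
}
```

with the sysv calling convention, a function that just calls stream_store_u64(d, v) should be able to compile down to a single "movnti %rsi,(%rdi)", but a builtin or intrinsic would still be the cleaner route.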
on a whim i tried this:

    void foo(unsigned long *d, unsigned long v)
    {
        __builtin_ia32_movntq((unsigned long long *)d, v);
    }

results in this code:

    0000000000000000 <foo>:
       0:   48 89 74 24 f8          mov    %rsi,0xfffffffffffffff8(%rsp)
       5:   0f 6f 44 24 f8          movq   0xfffffffffffffff8(%rsp),%mm0
       a:   0f e7 07                movntq %mm0,(%rdi)
       d:   c3                      retq

perhaps this builtin could be overloaded to generate the "movnti %rsi,(%rdi)" directly instead of shuffling through the mmx reg file?

(note if i throw in -mtune=core2 it eliminates the trip through the stack)

-dean

/home/odo/gcc/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --prefix=/home/odo/gcc --disable-multilib --disable-biarch x86_64-unknown-linux-gnu --enable-languages=c
Thread model: posix
gcc version 4.3.0 20071029 (experimental) (GCC)

--
           Summary: streaming 64-bit integer stores
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dean at arctic dot org
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33944