You could try doing 8 or 16 bytes per iteration instead of 4; it might be faster, depending on how good your CPU is at out-of-order execution (OOE).

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
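The suggestion above can be sketched in plain C. This is only an illustration under stated assumptions, not the patch's actual x86 assembly: it uses a hypothetical "addition" blend (saturating byte add, `min(255, top + bottom)`), and the names `add_sat_u8`, `blend4`, `blend_addition_x4` and `blend_addition_x16` are made up for this example. The unrolled version handles 16 bytes per iteration as four 4-byte groups that do not depend on each other. An out-of-order core may be able to work on them in parallel, and the loop counter and branch cost is paid once per 16 bytes instead of once per 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "addition" blend for one byte: min(255, a + b). */
static inline uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 255u ? 255u : s);
}

/* Blend one group of 4 bytes; it has no dependency on any other group. */
static inline void blend4(const uint8_t *t, const uint8_t *b, uint8_t *d)
{
    d[0] = add_sat_u8(t[0], b[0]);
    d[1] = add_sat_u8(t[1], b[1]);
    d[2] = add_sat_u8(t[2], b[2]);
    d[3] = add_sat_u8(t[3], b[3]);
}

/* Baseline: 4 bytes per loop iteration, then a scalar tail. */
void blend_addition_x4(const uint8_t *top, const uint8_t *bottom,
                       uint8_t *dst, size_t width)
{
    assert(top && bottom && dst);
    size_t x = 0;
    for (; x + 4 <= width; x += 4)
        blend4(top + x, bottom + x, dst + x);
    for (; x < width; x++)
        dst[x] = add_sat_u8(top[x], bottom[x]);
}

/* Unrolled: 16 bytes per loop iteration as four independent 4-byte
 * groups, then 4-byte steps and a scalar tail for what is left. */
void blend_addition_x16(const uint8_t *top, const uint8_t *bottom,
                        uint8_t *dst, size_t width)
{
    assert(top && bottom && dst);
    size_t x = 0;
    for (; x + 16 <= width; x += 16) {
        blend4(top + x,      bottom + x,      dst + x);
        blend4(top + x + 4,  bottom + x + 4,  dst + x + 4);
        blend4(top + x + 8,  bottom + x + 8,  dst + x + 8);
        blend4(top + x + 12, bottom + x + 12, dst + x + 12);
    }
    for (; x + 4 <= width; x += 4)
        blend4(top + x, bottom + x, dst + x);
    for (; x < width; x++)
        dst[x] = add_sat_u8(top[x], bottom[x]);
}
```

Both functions produce identical output. Only the amount of work per iteration differs. A C compiler may auto-vectorize either loop, so this shows the structure of the change rather than serving as a benchmark. Whether 8 or 16 bytes per iteration actually wins has to be measured on the target CPUs.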
- [FFmpeg-devel] [PATCH] x86/vf_blend: Add SSE4.1 optimizatio... Timothy Gu
- Re: [FFmpeg-devel] [PATCH] x86/vf_blend: Add SSE4.1 op... James Almer
- Re: [FFmpeg-devel] [PATCH] x86/vf_blend: Add SSE4.... Timothy Gu
- [FFmpeg-devel] [PATCH] x86/vf_blend: Add SSE2 opti... Timothy Gu
- Re: [FFmpeg-devel] [PATCH] x86/vf_blend: Add S... Timothy Gu
- Re: [FFmpeg-devel] [PATCH] x86/vf_blend: A... Paul B Mahol
- Re: [FFmpeg-devel] [PATCH] x86/vf_blend: Add S... Henrik Gramner