Re: [FFmpeg-devel] [PATCH] x86/vf_blend: Add SSE2 optimization for divide

2016-02-28 Thread Timothy Gu
On Sun, Feb 14, 2016 at 03:45:11PM +0100, Henrik Gramner wrote:
> You could try doing 8 or 16 bytes per iteration instead of 4, it might
> be faster depending on how good your cpu is at OOE.

As discussed on IRC, no observable difference has been observed with such changes, mainly because the bottl

Re: [FFmpeg-devel] [PATCH] x86/vf_blend: Add SSE2 optimization for divide

2016-02-14 Thread Henrik Gramner
You could try doing 8 or 16 bytes per iteration instead of 4, it might be faster depending on how good your cpu is at OOE.
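For readers following the thread, here is a rough plain-C sketch of what "more bytes per iteration" means for a divide blend. It is not the SSE2 code from the patch. The per-pixel rule in `divide_px` (255 when the divisor is 0, otherwise 255*a/b clamped to 255) is an assumption about the blend mode, and the names `divide_px`, `blend_divide_ref`, `blend_divide_step` and `STEP` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed per-pixel rule (not quoted in this thread):
 * 255 if b == 0, otherwise 255 * a / b clamped to 255. */
static uint8_t divide_px(uint8_t a, uint8_t b)
{
    unsigned v;

    if (b == 0)
        return 255;
    v = 255u * a / b;
    return v > 255 ? 255 : (uint8_t)v;
}

/* Reference loop: one byte per iteration. */
static void blend_divide_ref(const uint8_t *top, const uint8_t *bottom,
                             uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = divide_px(top[i], bottom[i]);
}

/* Hypothetical block size; try 4, 8 or 16 to mirror the suggestion. */
#define STEP 8

/* Same output as the reference, but the main loop consumes STEP bytes
 * per iteration and a scalar tail handles the remainder. */
static void blend_divide_step(const uint8_t *top, const uint8_t *bottom,
                              uint8_t *dst, size_t n)
{
    size_t i = 0;

    for (; i + STEP <= n; i += STEP)
        for (size_t j = 0; j < STEP; j++)
            dst[i + j] = divide_px(top[i + j], bottom[i + j]);

    for (; i < n; i++)
        dst[i] = divide_px(top[i], bottom[i]);
}
```

Changing `STEP` only changes how much work each loop iteration exposes to the CPU; the output stays the same. Whether it is actually faster has to be measured.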

Re: [FFmpeg-devel] [PATCH] x86/vf_blend: Add SSE2 optimization for divide

2016-02-14 Thread Paul B Mahol
On 2/14/16, Timothy Gu wrote:
> On Sat, Feb 13, 2016 at 07:21:25PM -0800, Timothy Gu wrote:
>> ---
>>  libavfilter/x86/vf_blend.asm    | 30 ++
>>  libavfilter/x86/vf_blend_init.c |  2 ++
>>  2 files changed, 32 insertions(+)
>
> Locally added commit message:
>
> 4.

Re: [FFmpeg-devel] [PATCH] x86/vf_blend: Add SSE2 optimization for divide

2016-02-13 Thread Timothy Gu
On Sat, Feb 13, 2016 at 07:21:25PM -0800, Timothy Gu wrote:
> ---
>  libavfilter/x86/vf_blend.asm    | 30 ++
>  libavfilter/x86/vf_blend_init.c |  2 ++
>  2 files changed, 32 insertions(+)

Locally added commit message:

4.5x faster than C float version with autovec