hpeldsp: fix half pel interpolation

Jerome Borsboom Fri, 27 Apr 2018 07:48:11 -0700

The assembly-optimized half-pel interpolation in some cases rounds the
interpolated value when no rounding is requested. The result is an off-by-one
error when one of the pixel values is zero.


Signed-off-by: Jerome Borsboom <[email protected]>
---
In the put_no_rnd_pixels functions, the psubusb instruction subtracts one from
each unsigned byte to correct for the rounding that the PAVGB instruction
performs. The psubusb instruction, however, saturates when the result does not
fit in the operand type, i.e. an unsigned byte. In this particular case, this
means that when the value of a pixel is 0, the psubusb instruction returns 0
instead of -1, because -1 does not fit in an unsigned byte and is saturated
to 0. As a result, the interpolated value is not corrected for the rounding
that PAVGB performs, and the result can be off by one.
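
The failure mode described above can be reproduced with a scalar model. The
sketch below is only an illustration in C, not code from FFmpeg: the helper
names (pavgb_u8, psubusb_u8, avg_no_rnd_ref, old_no_rnd, count_old_mismatches)
are hypothetical, and it assumes that m6 holds 0x01 in every byte, as the
"subtracts one" wording suggests.

```c
#include <stdint.h>

/* Scalar model of PAVGB on one byte: rounded average, (a + b + 1) >> 1. */
static uint8_t pavgb_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Scalar model of PSUBUSB on one byte: unsigned subtract, saturating at 0. */
static uint8_t psubusb_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Reference result when no rounding is requested: (a + b) >> 1. */
static uint8_t avg_no_rnd_ref(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) >> 1);
}

/* Old sequence: psubusb a, 1 followed by PAVGB a, b
 * (assumes the constant in m6 is 1 per byte). */
static uint8_t old_no_rnd(uint8_t a, uint8_t b)
{
    return pavgb_u8(psubusb_u8(a, 1), b);
}

/* Count the (a, b) pairs for which the old sequence differs from the
 * reference, over all 256 * 256 byte combinations. */
static int count_old_mismatches(void)
{
    int mismatches = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (old_no_rnd((uint8_t)a, (uint8_t)b) !=
                avg_no_rnd_ref((uint8_t)a, (uint8_t)b))
                mismatches++;
    return mismatches;
}
```

In this model, old_no_rnd(0, 1) gives 1 while the reference gives 0. Every
mismatch has a == 0 (the operand that psubusb is applied to) together with an
odd b; for any other pair the saturation does not come into play.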

The corrections below solved the issues for me, but I do not have a lot of
experience in optimizing assembly, so a careful check of the correctness of
the solution is advisable. Furthermore, I have not checked the other assembly,
but there may be more cases where the psubusb instruction does not provide the
desired results. A review by the owner/maintainer of the assembly code might
be appropriate.
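
As one possible way to check the replacement sequence (pxor, pand, PAVGB,
psubb), the scalar sketch below compares it against (a + b) >> 1 for every
byte pair. It is an illustration under assumptions, not part of the patch:
the helper names (model_pavgb, ref_no_rnd, fixed_no_rnd,
count_fixed_mismatches) are hypothetical, and m6 is again assumed to hold
0x01 in every byte.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PAVGB on one byte: rounded average, (a + b + 1) >> 1. */
static uint8_t model_pavgb(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Reference result when no rounding is requested: (a + b) >> 1. */
static uint8_t ref_no_rnd(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) >> 1);
}

/* New sequence from the diff, modeled on one byte:
 *   corr = (a ^ b) & 1      pxor + pand with m6 (assumed to be 1)
 *   avg  = pavgb(a, b)      rounded average
 *   avg - corr              psubb (wrapping subtract)
 * The low bit of a ^ b is set exactly when a + b is odd, which is the only
 * case in which PAVGB rounds up. */
static uint8_t fixed_no_rnd(uint8_t a, uint8_t b)
{
    uint8_t corr = (uint8_t)((a ^ b) & 1);
    uint8_t avg  = model_pavgb(a, b);
    assert(avg >= corr); /* the wrapping psubb never underflows here */
    return (uint8_t)(avg - corr);
}

/* Count the (a, b) pairs for which the new sequence differs from the
 * reference, over all 256 * 256 byte combinations. */
static int count_fixed_mismatches(void)
{
    int mismatches = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (fixed_no_rnd((uint8_t)a, (uint8_t)b) !=
                ref_no_rnd((uint8_t)a, (uint8_t)b))
                mismatches++;
    return mismatches;
}
```

In this model count_fixed_mismatches() returns 0. That only covers the
per-byte arithmetic; register allocation and the loop structure in the diff
still need to be reviewed separately.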

 libavcodec/x86/hpeldsp.asm | 38 ++++++++++++++++++++++++++++++++------
 1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/libavcodec/x86/hpeldsp.asm b/libavcodec/x86/hpeldsp.asm
index ce5d7a4e28..bae2ba9880 100644
--- a/libavcodec/x86/hpeldsp.asm
+++ b/libavcodec/x86/hpeldsp.asm
@@ -145,10 +145,16 @@ cglobal put_no_rnd_pixels8_x2, 4,5
     mova         m1, [r1+1]
     mova         m3, [r1+r2+1]
     add          r1, r4
-    psubusb      m0, m6
-    psubusb      m2, m6
+    mova         m4, m0
+    pxor         m4, m1
+    pand         m4, m6
     PAVGB        m0, m1
+    psubb        m0, m4
+    mova         m4, m2
+    pxor         m4, m3
+    pand         m4, m6
     PAVGB        m2, m3
+    psubb        m2, m4
     mova       [r0], m0
     mova    [r0+r2], m2
     mova         m0, [r1]
@@ -157,10 +163,16 @@ cglobal put_no_rnd_pixels8_x2, 4,5
     mova         m3, [r1+r2+1]
     add          r0, r4
     add          r1, r4
-    psubusb      m0, m6
-    psubusb      m2, m6
+    mova         m4, m0
+    pxor         m4, m1
+    pand         m4, m6
     PAVGB        m0, m1
+    psubb        m0, m4
+    mova         m4, m2
+    pxor         m4, m3
+    pand         m4, m6
     PAVGB        m2, m3
+    psubb        m2, m4
     mova       [r0], m0
     mova    [r0+r2], m2
     add          r0, r4
@@ -227,18 +239,32 @@ cglobal put_no_rnd_pixels8_y2, 4,5
     mova         m1, [r1+r2]
     mova         m2, [r1+r4]
     add          r1, r4
-    psubusb      m1, m6
+    mova         m3, m0
+    pxor         m3, m1
+    pand         m3, m6
     PAVGB        m0, m1
+    psubb        m0, m3
+    mova         m3, m1
+    pxor         m3, m2
+    pand         m3, m6
     PAVGB        m1, m2
+    psubb        m1, m3
     mova    [r0+r2], m0
     mova    [r0+r4], m1
     mova         m1, [r1+r2]
     mova         m0, [r1+r4]
     add          r0, r4
     add          r1, r4
-    psubusb      m1, m6
+    mova         m3, m2
+    pxor         m3, m1
+    pand         m3, m6
     PAVGB        m2, m1
+    psubb        m2, m3
+    mova         m3, m1
+    pxor         m3, m0
+    pand         m3, m6
     PAVGB        m1, m0
+    psubb        m1, m3
     mova    [r0+r2], m2
     mova    [r0+r4], m1
     add          r0, r4
-- 
2.13.6
_______________________________________________
ffmpeg-devel mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel