On Fri, Dec 05, 2014 at 12:20:31AM +0000, Dominique Leroux wrote:
> Found and fixed an artifact in the last column of decoded RGBA 64-bits PNG 
> images.
> 
> The code was dealing with a SIMD-optimized version of the function without 
> taking into account that we can have RGB/RGBA images that are respectively 6 
> or 8  bytes per pixel (not just 3 or 4).
> 
> Dominique

>  Changelog                    |    1 +
>  libavcodec/pngdec.c          |   16 ++++++++++++----
>  libavcodec/pngdsp.c          |    7 +++++--
>  libavcodec/pngdsp.h          |    5 +++++
>  libavcodec/x86/pngdsp_init.c |   17 +++++++++++------
>  5 files changed, 34 insertions(+), 12 deletions(-)
> a3c13eb0d8d35a5f23a05c3005b7792893f1dc2f  
> 0001-Fixed-PNG-decoding-with-Paeth-filter-for-RGBA-64-bit.patch
> From 5d3b88d2a9bad1f9e9431b6244493cbc6dda5701 Mon Sep 17 00:00:00 2001
> From: Dominique Leroux <dominique.ler...@autodesk.com>
> Date: Thu, 4 Dec 2014 19:05:44 -0500
> Subject: [PATCH] Fixed PNG decoding with Paeth filter for RGBA 64-bits.
> 
> I noticed an artifact on the resulting image when passing a RGBA 64-bits PNG
> through the copy filter.  The last column, although obviously "derived" from 
> the
> original image's last column (i.e. not completely garbage), had way too much
> yellow in it; a telltale sign of an off-by-X somewhere.  Original scenario 
> used
> the scale filter, but taking it out was still exhibiting the problem.
> 
> So my boiled-down test consisted in running:
> 
>       ./ffmpeg.exe -i original_rgba64.png -vf copy result.png
> 
> and inspecting the result image with a viewer.  I realized the image had some
> lines filtered with Paeth, and only those lines would get the artifact on 
> their
> rightmost pixel.
> 
> In libavcodec/pngdec.c, png_filter_row() has buffer-sizing arithmetic for
> feeding the Paeth prediction function with a buffer that has a multiple of 4
> bytes.  This is done so the optimized version of the algorithm, using 
> MMX/SSE3,
> can work on 4 bytes at a time without causing a write past the end of the 
> buffer
> when writing back into the result buffer.  This arithmetic assumed that
> situations where the buffer is improperly sized were limited to RGB 24-bits
> (bpp, or bytes per pixel, = 3) and that all other formats had buffers properly
> sized for SIMD, i.e. a multiple of 4.
> 
> There are 2 problems with this:
> 
> . bpp == 8 (RGBA 64-bits, my case) also has the proper size for SIMD.  But it
>   got handled as if it had a bpp of 3 because the code was testing for bpp == 
> 4
>   (with ternary operator's operands reversed).
> . bpp == 6 (RGB 48-bits) can also cause writes past the end of the buffer when
>   handling the last 2 of 6 bytes for an image that has an odd number of 
> columns.
> 
> So I am reformulating the SIMD-adjusted width to ignore the trailing 3 (or 
> less)
> bytes when the total width is not a multiple of 4.  These bytes are fed to the
> unoptimized version of the algorithm.
> 
> I was tempted to apply this to bpp == 1 as well (i.e. take out the bpp > 2 
> test
> in the enclosing if). But looking at pngdec.c's history, this has been done
> earlier and differences between the results of the optimized and non-optimized
> versions of the algorithm were observed.  So I am leaving this untouched.
> 
> It then looked to me that knowing the SIMD stride requirements of the x86
> platform implementation was a bit of a long-distance coupling between the
> platform-independent pngdec.c and the x86-specific x86/pngdsp.asm.  So I
> abstracted this out through a pair of members in the PNGDSPContext structure:
> SIMD stride as well as a mask for adjusting the buffer size.
> 
> Signed-off-by: Dominique Leroux <dominique.ler...@autodesk.com>
> ---
>  Changelog                    |  1 +
>  libavcodec/pngdec.c          | 16 ++++++++++++----
>  libavcodec/pngdsp.c          |  7 +++++--
>  libavcodec/pngdsp.h          |  5 +++++
>  libavcodec/x86/pngdsp_init.c | 17 +++++++++++------
>  5 files changed, 34 insertions(+), 12 deletions(-)

this is not correct
iam a bit tired so i just quickly read your elaborate text but the asm
works in steps of bpp not steps of 4 and writes 4 bytes at a time
you seem to assume the asm is simpler and more systematic than it
actually is, also theres are bugs in the asm i think
ive a fix for some of this locally but id like to test it more before
pushing

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Breaking DRM is a little like attempting to break through a door even
though the window is wide open and the only thing in the house is a bunch
of things you dont want and which you would get tomorrow for free anyway

Attachment: signature.asc
Description: Digital signature

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Reply via email to