On 28/11/2018 02:27, Ruiling Song wrote:
> Signed-off-by: Ruiling Song <ruiling.s...@intel.com>
> ---
>  configure                         |   1 +
>  libavfilter/Makefile              |   1 +
>  libavfilter/allfilters.c          |   1 +
>  libavfilter/opencl/transpose.cl   |  35 +++++
>  libavfilter/opencl_source.h       |   1 +
>  libavfilter/transpose.h           |  34 +++++
>  libavfilter/vf_transpose.c        |  14 +-
>  libavfilter/vf_transpose_opencl.c | 288 ++++++++++++++++++++++++++++++++++++++
>  8 files changed, 362 insertions(+), 13 deletions(-)
>  create mode 100644 libavfilter/opencl/transpose.cl
>  create mode 100644 libavfilter/transpose.h
>  create mode 100644 libavfilter/vf_transpose_opencl.c

Testing the passthrough option here reveals a slightly unfortunate interaction 
with mapping - if this is the only filter in use, then skipping the redundant 
copy can make the whole chain fail.

For example, on Rockchip (Mali) decoding with rkmpp then using:

-vf hwmap=derive_device=opencl,transpose_opencl=dir=clock:passthrough=landscape,hwdownload,format=nv12

fails at the download in the passthrough case because the imported buffer does 
not allow the read (the extension does explicitly document this constraint - 
<https://www.khronos.org/registry/OpenCL/extensions/arm/cl_arm_import_memory.txt>).
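
To make it clear when the passthrough case is hit, here is a minimal sketch of 
the orientation check.  It assumes the OpenCL filter follows the same rule as 
the software transpose filter (landscape preserves inputs with width >= height, 
portrait preserves inputs with width <= height).  The enum and function names 
are hypothetical and are not the identifiers used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only. */
enum pt_mode { PT_NONE, PT_PORTRAIT, PT_LANDSCAPE };

/*
 * Returns true when the input frame would be handed on untouched.
 * Assumed rule: "landscape" preserves inputs with w >= h,
 * "portrait" preserves inputs with w <= h, "none" never passes through.
 */
bool should_passthrough(int w, int h, enum pt_mode mode)
{
    assert(w > 0 && h > 0);
    return (mode == PT_LANDSCAPE && w >= h) ||
           (mode == PT_PORTRAIT  && w <= h);
}
```

Under that assumption a 1920x1080 input with passthrough=landscape is passed 
through, so the mapped frame reaches hwdownload with no OpenCL copy in between, 
which is the failing case described above.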

VAAPI has a similar problem with a decode followed by:

-vf hwmap=derive_device=opencl,transpose_opencl,hwmap=derive_device=vaapi:reverse=1

because the reverse mapping tries to replace the inlink hw_frames_ctx in a way 
which doesn't actually work.

All of these cases do of course work if anything else is in the way - any 
additional opencl filter on either side makes it work.  I think it's fine to 
ignore this (after all, the hwmap immediately followed by hwdownload case can 
already fail in the same way), but any thoughts you have on making that better 
are welcome.


>> Does the dependency on dir have any effect on speed here?  Any call is only 
>> ever going to use one side of each of the dir cases, so it feels like it 
>> might be nicer to hard-code that so they aren't included in the compiled 
>> code at all.
> For such a memory-bound OpenCL kernel, a few extra arithmetic operations 
> would not affect the overall performance.
> I did some more testing and saw no obvious performance difference between 
> different 'dir' parameters, so I just kept it as it is now.

That makes sense, thank you for checking.
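
For reference, the runtime dependency on dir being discussed can be sketched on 
the CPU as below.  This is an illustration and not the kernel from the patch: 
it assumes dir uses the transpose filter's numbering (0 = cclock_flip, 
1 = clock, 2 = cclock, 3 = clock_flip), with bit 0 and bit 1 each selecting one 
flip.  The function name and the int-per-pixel plane layout are hypothetical.

```c
#include <assert.h>

/*
 * Transpose a src_w x src_h plane into dst (src_h wide, src_w high).
 * Each output pixel pays for two selects on dir plus a subtraction,
 * which is small next to the memory read and write it performs.
 */
void transpose_plane(const int *src, int src_w, int src_h,
                     int *dst, int dir)
{
    assert(dir >= 0 && dir <= 3);
    const int dst_w = src_h;
    const int dst_h = src_w;

    for (int y = 0; y < dst_h; y++) {
        for (int x = 0; x < dst_w; x++) {
            /* Bit 1 mirrors the source column, bit 0 the source row. */
            int xin = (dir & 2) ? (dst_h - 1 - y) : y;
            int yin = (dir & 1) ? (dst_w - 1 - x) : x;
            dst[y * dst_w + x] = src[yin * src_w + xin];
        }
    }
}
```

Hard-coding dir would remove the two selects per pixel, but since every pixel 
still needs one read and one write, it is plausible that the difference is not 
measurable, which matches the testing reported above.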


So, LGTM and applied.

Thanks,

- Mark
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
