Re: [FFmpeg-devel] [PATCH 4/5] lavu/intmath.h: Fix UB in ff_ctz_c() and ff_ctzll_c()

2024-05-30 Thread Rémi Denis-Courmont
Can't we just use the compiler built-ins here? AFAIK, they (GCC, LLVM) use the 
same algorithm if the CPU doesn't support native CTZ. And they will pick the 
right instruction if CPU does have CTZ.

I get it that maybe it wasn't working so well 20 years ago, but we've increased 
compiler version requirements since then.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] git problems

2024-05-30 Thread Andrew Sayers
On Thu, May 30, 2024 at 01:30:09AM +0200, Michael Niedermayer wrote:
> Hi all
> 
> It seems the security update (https://ubuntu.com/security/notices/USN-6793-1)
> broke public git
> 
> We use gitolite that runs under its own user and serve git through apache
> which runs under a different user.
> Apache has only read access to the repositories
> 
> Since the security update that stopped working, the logs are full of messages
> telling that we need to add the repositories to safe.directory
> (the commands suggested dont work and seem to mix up \t with a tab but thats 
> beside the point)
> once the repository is added to safe.directory, which ive done with 
> https://git.ffmpeg.org/michael.git
> the error is gone and everything looks fine in the logs on the server but it 
> still
> doesnt work. (i have not touched ffmpeg.git config as i first wanted to test 
> this)
> 
> So like i just said on IRC. i hope some of the other root admins will have
> some more insight here. Or if you (yes YOU!) want to help or know something
> please speak up.
> 
> This is totally not my area and i think other people could find the issue
> with less effort in less time and it would be more efficient if i work
> on FFmpeg instead where the return per hour of my time should be much greater.
> 
> Also gitweb and git over ssh seem unaffected and theres github
> 
> If people want i could downgrade git OR
> upgrade git to latest git ignoring official ubuntu packages
> otherwise, i intend to leave this for someone else to investigate and rather
> work on FFmpeg which just seems like a much better use of my time

You've talked recently about looking for STF money to upgrade the servers.
You might want to write up a postmortem when the bug is fixed, focussing on
improvements that are unlikely to happen without money.  Then you can say
"we had X hours of downtime, we think Y jobs will reduce that by Z%".

One thing for the postmortem - I don't know enough about these specific
programs to do much with the description provided.  And even if I did, I could
only offer prose hints at a solution.  But containerising these services would
let me replicate the server locally, and suggest solutions as normal patches
on the mailing list.


Re: [FFmpeg-devel] [PATCH 1/5] lavu/common.h: Fix UB in av_clipl_int32_c()

2024-05-30 Thread Tomas Härdin
Thu 2024-05-30 at 09:41 +0300, Rémi Denis-Courmont wrote:
> Hi,
> 
> On 30 May 2024 01:13:14 GMT+03:00, "Tomas Härdin" wrote:
> > The entire patchset passes FATE
> 
> Is the version in riscv/intmath.h safe? It looks to me that the GCC
> codegen for not only RV64 but also AArch{32,64} and x86-64 is better
> than this.

I haven't checked. It seems weird to me to have two different C
versions. We shouldn't rely on type punning. The standard compliant way
is to use memcpy()

/Tomas


Re: [FFmpeg-devel] [PATCH 1/5] lavu/common.h: Fix UB in av_clipl_int32_c()

2024-05-30 Thread Tomas Härdin
Thu 2024-05-30 at 00:31 +0200, Andreas Rheinhardt wrote:
> Tomas Härdin:
> >   */
> >  static av_always_inline av_const int32_t av_clipl_int32_c(int64_t
> > a)
> >  {
> > -    if ((a+0x8000u) & ~UINT64_C(0x)) return
> > (int32_t)((a>>63) ^ 0x7FFF);
> > -    else return
> > (int32_t)a;
> > +    if ((a+UINT64_C(0x8000)) & ~UINT64_C(0x)) return
> > (int32_t)((a>>63) ^ 0x7FFF);
> > +    else  return
> > (int32_t)a;
> 
> IMO (uint64_t)a + 0x8000 is more readable. (Maybe it would even
> be
> good to use >> 32 instead of ~UINT64_C(0x)?)

It already uses UINT64_C, hence why I used it.

>> 32 would work also. Does it make any difference performance wise?

/Tomas
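
For readers comparing the spellings discussed in this message, here is a
minimal sketch of the two candidate range checks. It assumes that the
constants whose digits appear truncated in the archived diff are 0x80000000,
0xFFFFFFFF and 0x7FFFFFFF (the values a clip to int32_t needs). The function
names are hypothetical, and this is an illustration rather than the patch
itself. Both variants do the addition in uint64_t, so it wraps instead of
overflowing a signed type.

```c
#include <stdint.h>

/* Mask spelling: the value is out of int32_t range if any of the
 * upper 32 bits survive after biasing by 2^31. */
static inline int32_t clipl_int32_mask(int64_t a)
{
    if (((uint64_t)a + UINT64_C(0x80000000)) & ~UINT64_C(0xFFFFFFFF))
        return (int32_t)((a >> 63) ^ 0x7FFFFFFF); /* INT32_MIN or INT32_MAX */
    return (int32_t)a;
}

/* Shift spelling: same test, written as "are the upper 32 bits non-zero". */
static inline int32_t clipl_int32_shift(int64_t a)
{
    if (((uint64_t)a + UINT64_C(0x80000000)) >> 32)
        return (int32_t)((a >> 63) ^ 0x7FFFFFFF);
    return (int32_t)a;
}
```

Both functions rely on `a >> 63` being an arithmetic shift for negative
values, which is implementation-defined in C but is what the mainstream
compilers do. Whether the two spellings differ in generated code is best
checked per target rather than assumed.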


Re: [FFmpeg-devel] [PATCH 4/5] lavu/intmath.h: Fix UB in ff_ctz_c() and ff_ctzll_c()

2024-05-30 Thread Tomas Härdin
Thu 2024-05-30 at 10:54 +0300, Rémi Denis-Courmont wrote:
> Can't we just use the compiler built-ins here? AFAIK, they (GCC,
> LLVM) use the same algorithm if the CPU doesn't support native CTZ.
> And they will pick the right instruction if CPU does have CTZ.
> 
> I get it that maybe it wasn't working so well 20 years ago, but we've
> increased compiler version requirements since then.

I think we still support MSVC, but maybe we shouldn't? It's possible to
cross-compile for Windows either way.

/Tomas


Re: [FFmpeg-devel] [PATCH 1/5] lavu/common.h: Fix UB in av_clipl_int32_c()

2024-05-30 Thread Rémi Denis-Courmont


On 30 May 2024 12:40:20 GMT+03:00, "Tomas Härdin" wrote:
>Thu 2024-05-30 at 09:41 +0300, Rémi Denis-Courmont wrote:
>> Hi,
>> 
>> On 30 May 2024 01:13:14 GMT+03:00, "Tomas Härdin" wrote:
>> > The entire patchset passes FATE
>> 
>> Is the version in riscv/intmath.h safe? It looks to me that the GCC
>> codegen for not only RV64 but also AArch{32,64} and x86-64 is better
>> than this.
>
>I haven't checked. It seems weird to me to have two different C
>versions.

The common one ends up horrendously bad on RV, and presumably on MIPS and some 
other RISC ISAs.

> We shouldn't rely on type punning.

Because?

We should depend on punning as long as it conforms to the standard.

> The standard compliant way
>is to use memcpy()

That's way worse than union in terms of how proactively the compiler needs to 
optimise, and both approaches are equally conforming.
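
To make the two approaches under discussion concrete, here is a minimal
sketch of reinterpreting the bits of a float, once through a union and once
through memcpy(). It is a generic illustration, not code from the patchset or
from riscv/intmath.h. The function names are hypothetical, and it assumes
that float and uint32_t have the same size. As far as I know, reading a union
member other than the one last written is defined in C99 and later as a
reinterpretation of the stored bytes; C++ treats it differently.

```c
#include <stdint.h>
#include <string.h>

/* Union spelling: write one member, read the other. */
static inline uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* memcpy spelling: copy the object representation byte by byte.
 * Compilers usually turn this into a plain register move, but that
 * depends on the optimiser doing so. */
static inline uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}
```

Both functions return the same value for the same input; the difference the
thread argues about is how much optimisation is needed to get good code, not
the result.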


Re: [FFmpeg-devel] [PATCH 4/5] lavu/intmath.h: Fix UB in ff_ctz_c() and ff_ctzll_c()

2024-05-30 Thread Hendrik Leppkes
On Thu, May 30, 2024 at 11:50 AM Tomas Härdin  wrote:
>
> Thu 2024-05-30 at 10:54 +0300, Rémi Denis-Courmont wrote:
> > Can't we just use the compiler built-ins here? AFAIK, they (GCC,
> > LLVM) use the same algorithm if the CPU doesn't support native CTZ.
> > And they will pick the right instruction if CPU does have CTZ.
> >
> > I get it that maybe it wasn't working so well 20 years ago, but we've
> > increased compiler version requirements since then.
>
> I think we still support MSVC, but maybe we shouldn't? It's possible to
> cross-compile for Windows either way.
>

This is not going to happen.

- Hendrik


Re: [FFmpeg-devel] git problems

2024-05-30 Thread Michael Niedermayer
On Thu, May 30, 2024 at 10:27:31AM +0100, Andrew Sayers wrote:
> On Thu, May 30, 2024 at 01:30:09AM +0200, Michael Niedermayer wrote:
> > Hi all
> > 
> > It seems the security update 
> > (https://ubuntu.com/security/notices/USN-6793-1)
> > broke public git
> > 
> > We use gitolite that runs under its own user and serve git through apache
> > which runs under a different user.
> > Apache has only read access to the repositories
> > 
> > Since the security update that stopped working, the logs are full of messages
> > telling that we need to add the repositories to safe.directory
> > (the commands suggested dont work and seem to mix up \t with a tab but 
> > thats beside the point)
> > once the repository is added to safe.directory, which ive done with 
> > https://git.ffmpeg.org/michael.git
> > the error is gone and everything looks fine in the logs on the server but 
> > it still
> > doesnt work. (i have not touched ffmpeg.git config as i first wanted to 
> > test this)
> > 
> > So like i just said on IRC. i hope some of the other root admins will have
> > some more insight here. Or if you (yes YOU!) want to help or know something
> > please speak up.
> > 
> > This is totally not my area and i think other people could find the issue
> > with less effort in less time and it would be more efficient if i work
> > on FFmpeg instead where the return per hour of my time should be much 
> > greater.
> > 
> > Also gitweb and git over ssh seem unaffected and theres github
> > 
> > If people want i could downgrade git OR
> > upgrade git to latest git ignoring official ubuntu packages
> > otherwise, i intend to leave this for someone else to investigate and rather
> > work on FFmpeg which just seems like a much better use of my time
> 
> You've talked recently about looking for STF money to upgrade the servers.

> You might want to write up a postmortem when the bug is fixed,

i will suggest this to raz once we understand the issue fully


[...]

> One thing for the postmortem - I don't know enough about these specific
> programs to do much with the description provided.  And even if I did, I could
> only offer prose hints at a solution.  But containerising these services would
> let me replicate the server locally, and suggest solutions as normal patches
> on the mailing list.

the box is a VM currently so one could in principle clone it.
only that various private keys (for example for SSL certs) and
personal data (like IP addresses in log files) would be in it
making public sharing impossible
also there are likely other reasons why publicly sharing such a clone
would be a bad idea.

i dont see how containerising would change this.
IMHO the effort to make sure a container would be safe security and privacy
wise to share publicly outweighs the benefit.

If someone wants to reproduce this locally, set up an ubuntu focal, set up gitolite,
set up apache and try to do a git clone via https. with latest git vs the
version from 3 days ago, that should probably replicate it.
If one person builds such a test setup, (s)he can share this with everyone
I think the effort here is quite a bit lower than trying to make the live
servers publicly sharable. (and it costs us 0 time and 0 $)
anyway not suggesting anyone does this. Just saying, IF someone really
wants to replicate it.

raz has found a workaround already with the current git version, but we
still have incomplete understanding of the issue

thx

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Rewriting code that is poorly written but fully understood is good.
Rewriting code that one doesnt understand is a sign that one is less smart
than the original author, trying to rewrite it will not make it better.




Re: [FFmpeg-devel] [PATCH 4/5] lavu/intmath.h: Fix UB in ff_ctz_c() and ff_ctzll_c()

2024-05-30 Thread Rémi Denis-Courmont


On 30 May 2024 12:50:05 GMT+03:00, "Tomas Härdin" wrote:
>Thu 2024-05-30 at 10:54 +0300, Rémi Denis-Courmont wrote:
>> Can't we just use the compiler built-ins here? AFAIK, they (GCC,
>> LLVM) use the same algorithm if the CPU doesn't support native CTZ.
>> And they will pick the right instruction if CPU does have CTZ.
>> 
>> I get it that maybe it wasn't working so well 20 years ago, but we've
>> increased compiler version requirements since then.
>
>I think we still support MSVC, but maybe we shouldn't? It's possible to
>cross-compile for Windows either way.

I don't get how that prevents using the GCC and Clang builtins (on GCC and 
Clang).
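
A minimal sketch of the arrangement argued for in this message: use the
builtin where the compiler is GCC or Clang, and keep a fallback for everything
else. The function name ctz32 is hypothetical, and this is not the code in
lavu/intmath.h. The MSVC branch uses the _BitScanForward intrinsic, and the
last branch is a plain loop rather than the table-based algorithm in the
patch. It assumes unsigned int is 32 bits wide.

```c
#include <stdint.h>
#if defined(_MSC_VER) && !defined(__clang__)
#include <intrin.h>
#endif

/* Count trailing zero bits. The caller must pass v != 0, because the
 * GCC/Clang builtin leaves the result undefined for zero. */
static inline int ctz32(uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ctz(v);
#elif defined(_MSC_VER)
    unsigned long idx;
    _BitScanForward(&idx, v);
    return (int)idx;
#else
    /* Portable fallback: shift until the lowest bit is set. */
    int n = 0;
    while (n < 32 && !(v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
#endif
}
```

With this layout, dropping or keeping MSVC support is independent of whether
GCC and Clang builds use the builtin.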


[FFmpeg-devel] [PATCH 01/10, v3] avutil: add hwcontext_amf.

2024-05-30 Thread Dmitrii Ovchinnikov
Adds hwcontext_amf, which allows a shared AMF context to be used
by the encoder, decoder and AMF-based filters
without copying to host memory.
It also enables some optimisations in
the interaction of components (for example, SAV) and a more
manageable and optimal setup for using GPU devices with AMF
in the case of a fully AMF pipeline.
It will give a significant performance uplift when a full AMF pipeline
with filters is used.

We also plan to add a compression artefact removal filter in the near future.
v2: cleanup header files
v3: an unnecessary class has been removed.
---
 libavutil/Makefile |   4 +
 libavutil/hwcontext.c  |   4 +
 libavutil/hwcontext.h  |   1 +
 libavutil/hwcontext_amf.c  | 585 +
 libavutil/hwcontext_amf.h  |  64 
 libavutil/hwcontext_amf_internal.h |  44 +++
 libavutil/hwcontext_internal.h |   1 +
 libavutil/pixdesc.c|   4 +
 libavutil/pixfmt.h |   5 +
 9 files changed, 712 insertions(+)
 create mode 100644 libavutil/hwcontext_amf.c
 create mode 100644 libavutil/hwcontext_amf.h
 create mode 100644 libavutil/hwcontext_amf_internal.h

diff --git a/libavutil/Makefile b/libavutil/Makefile
index 6e6fa8d800..13c318560d 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -45,6 +45,7 @@ HEADERS = adler32.h   
  \
   hwcontext_d3d12va.h   \
   hwcontext_drm.h   \
   hwcontext_dxva2.h \
+  hwcontext_amf.h   \
   hwcontext_qsv.h   \
   hwcontext_mediacodec.h\
   hwcontext_opencl.h\
@@ -196,6 +197,7 @@ OBJS-$(CONFIG_CUDA) += hwcontext_cuda.o
 OBJS-$(CONFIG_D3D11VA)  += hwcontext_d3d11va.o
 OBJS-$(CONFIG_D3D12VA)  += hwcontext_d3d12va.o
 OBJS-$(CONFIG_DXVA2)+= hwcontext_dxva2.o
+OBJS-$(CONFIG_AMF)  += hwcontext_amf.o
 OBJS-$(CONFIG_LIBDRM)   += hwcontext_drm.o
 OBJS-$(CONFIG_MACOS_KPERF)  += macos_kperf.o
 OBJS-$(CONFIG_MEDIACODEC)   += hwcontext_mediacodec.o
@@ -220,6 +222,8 @@ SKIPHEADERS-$(CONFIG_CUDA) += 
hwcontext_cuda_internal.h \
 SKIPHEADERS-$(CONFIG_D3D11VA)  += hwcontext_d3d11va.h
 SKIPHEADERS-$(CONFIG_D3D12VA)  += hwcontext_d3d12va.h
 SKIPHEADERS-$(CONFIG_DXVA2)+= hwcontext_dxva2.h
+SKIPHEADERS-$(CONFIG_AMF)  += hwcontext_amf.h   \
+  hwcontext_amf_internal
 SKIPHEADERS-$(CONFIG_QSV)  += hwcontext_qsv.h
 SKIPHEADERS-$(CONFIG_OPENCL)   += hwcontext_opencl.h
 SKIPHEADERS-$(CONFIG_VAAPI)+= hwcontext_vaapi.h
diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c
index fa99a0d8a4..f06d49c45c 100644
--- a/libavutil/hwcontext.c
+++ b/libavutil/hwcontext.c
@@ -65,6 +65,9 @@ static const HWContextType * const hw_table[] = {
 #endif
 #if CONFIG_VULKAN
 &ff_hwcontext_type_vulkan,
+#endif
+#if CONFIG_AMF
+&ff_hwcontext_type_amf,
 #endif
 NULL,
 };
@@ -82,6 +85,7 @@ static const char *const hw_type_names[] = {
 [AV_HWDEVICE_TYPE_VIDEOTOOLBOX] = "videotoolbox",
 [AV_HWDEVICE_TYPE_MEDIACODEC] = "mediacodec",
 [AV_HWDEVICE_TYPE_VULKAN] = "vulkan",
+[AV_HWDEVICE_TYPE_AMF] = "amf",
 };
 
 typedef struct FFHWDeviceContext {
diff --git a/libavutil/hwcontext.h b/libavutil/hwcontext.h
index bac30debae..96042ba197 100644
--- a/libavutil/hwcontext.h
+++ b/libavutil/hwcontext.h
@@ -38,6 +38,7 @@ enum AVHWDeviceType {
 AV_HWDEVICE_TYPE_MEDIACODEC,
 AV_HWDEVICE_TYPE_VULKAN,
 AV_HWDEVICE_TYPE_D3D12VA,
+AV_HWDEVICE_TYPE_AMF,
 };
 
 /**
diff --git a/libavutil/hwcontext_amf.c b/libavutil/hwcontext_amf.c
new file mode 100644
index 00..1c589669e1
--- /dev/null
+++ b/libavutil/hwcontext_amf.c
@@ -0,0 +1,585 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301

[FFmpeg-devel] [PATCH 02/10, v3] avcodec: add amfdec.

2024-05-30 Thread Dmitrii Ovchinnikov
From: Evgeny Pavlov 

Added AMF based h264, hevc, av1 decoders.
Co-authored-by: Dmitrii Ovchinnikov 
v2: added encoder reinitialisation
v3: use AMF_SURFACE_UNKNOWN to init the decoder (ctx->output_format before)
---
 libavcodec/Makefile|   7 +-
 libavcodec/allcodecs.c |   3 +
 libavcodec/amfdec.c| 696 +
 libavcodec/amfdec.h|  63 
 4 files changed, 767 insertions(+), 2 deletions(-)
 create mode 100644 libavcodec/amfdec.c
 create mode 100644 libavcodec/amfdec.h

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 2443d2c6fd..69918903ff 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -70,7 +70,7 @@ include $(SRC_PATH)/libavcodec/vvc/Makefile
 OBJS-$(CONFIG_AANDCTTABLES)+= aandcttab.o
 OBJS-$(CONFIG_AC3DSP)  += ac3dsp.o ac3.o ac3tab.o
 OBJS-$(CONFIG_ADTS_HEADER) += adts_header.o 
mpeg4audio_sample_rates.o
-OBJS-$(CONFIG_AMF) += amfenc.o
+OBJS-$(CONFIG_AMF) += amfenc.o amfdec.o
 OBJS-$(CONFIG_AUDIO_FRAME_QUEUE)   += audio_frame_queue.o
 OBJS-$(CONFIG_ATSC_A53)+= atsc_a53.o
 OBJS-$(CONFIG_AUDIODSP)+= audiodsp.o
@@ -167,6 +167,7 @@ OBJS-$(CONFIG_TEXTUREDSPENC)   += texturedspenc.o
 OBJS-$(CONFIG_TPELDSP) += tpeldsp.o
 OBJS-$(CONFIG_VAAPI_ENCODE)+= vaapi_encode.o
 OBJS-$(CONFIG_AV1_AMF_ENCODER) += amfenc_av1.o
+OBJS-$(CONFIG_AV1_AMF_DECODER) += amfdec.o
 OBJS-$(CONFIG_VC1DSP)  += vc1dsp.o
 OBJS-$(CONFIG_VIDEODSP)+= videodsp.o
 OBJS-$(CONFIG_VP3DSP)  += vp3dsp.o
@@ -409,6 +410,7 @@ OBJS-$(CONFIG_H264_DECODER)+= h264dec.o 
h264_cabac.o h264_cavlc.o \
   h264_refs.o \
   h264_slice.o h264data.o h274.o
 OBJS-$(CONFIG_H264_AMF_ENCODER)+= amfenc_h264.o
+OBJS-$(CONFIG_H264_AMF_DECODER)+= amfdec.o
 OBJS-$(CONFIG_H264_CUVID_DECODER)  += cuviddec.o
 OBJS-$(CONFIG_H264_MEDIACODEC_DECODER) += mediacodecdec.o
 OBJS-$(CONFIG_H264_MEDIACODEC_ENCODER) += mediacodecenc.o
@@ -435,6 +437,7 @@ OBJS-$(CONFIG_HEVC_DECODER)+= hevcdec.o 
hevc_mvs.o \
   hevcdsp.o hevc_filter.o hevc_data.o \
   h274.o aom_film_grain.o
 OBJS-$(CONFIG_HEVC_AMF_ENCODER)+= amfenc_hevc.o
+OBJS-$(CONFIG_HEVC_AMF_DECODER)+= amfdec.o
 OBJS-$(CONFIG_HEVC_CUVID_DECODER)  += cuviddec.o
 OBJS-$(CONFIG_HEVC_MEDIACODEC_DECODER) += mediacodecdec.o
 OBJS-$(CONFIG_HEVC_MEDIACODEC_ENCODER) += mediacodecenc.o
@@ -1263,7 +1266,7 @@ SKIPHEADERS+= %_tablegen.h
  \
   bitstream_template.h  \
   $(ARCH)/vpx_arith.h   \
 
-SKIPHEADERS-$(CONFIG_AMF)  += amfenc.h
+SKIPHEADERS-$(CONFIG_AMF)  += amfenc.h amfdec.h
 SKIPHEADERS-$(CONFIG_D3D11VA)  += d3d11va.h dxva2_internal.h
 SKIPHEADERS-$(CONFIG_D3D12VA)  += d3d12va_decode.h
 SKIPHEADERS-$(CONFIG_DXVA2)+= dxva2.h dxva2_internal.h
diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c
index b102a8069e..d215c9f0d4 100644
--- a/libavcodec/allcodecs.c
+++ b/libavcodec/allcodecs.c
@@ -834,10 +834,12 @@ extern const FFCodec ff_av1_nvenc_encoder;
 extern const FFCodec ff_av1_qsv_decoder;
 extern const FFCodec ff_av1_qsv_encoder;
 extern const FFCodec ff_av1_amf_encoder;
+extern const FFCodec ff_av1_amf_decoder;
 extern const FFCodec ff_av1_vaapi_encoder;
 extern const FFCodec ff_libopenh264_encoder;
 extern const FFCodec ff_libopenh264_decoder;
 extern const FFCodec ff_h264_amf_encoder;
+extern const FFCodec ff_h264_amf_decoder;
 extern const FFCodec ff_h264_cuvid_decoder;
 extern const FFCodec ff_h264_mf_encoder;
 extern const FFCodec ff_h264_nvenc_encoder;
@@ -847,6 +849,7 @@ extern const FFCodec ff_h264_v4l2m2m_encoder;
 extern const FFCodec ff_h264_vaapi_encoder;
 extern const FFCodec ff_h264_videotoolbox_encoder;
 extern const FFCodec ff_hevc_amf_encoder;
+extern const FFCodec ff_hevc_amf_decoder;
 extern const FFCodec ff_hevc_cuvid_decoder;
 extern const FFCodec ff_hevc_mediacodec_decoder;
 extern const FFCodec ff_hevc_mediacodec_encoder;
diff --git a/libavcodec/amfdec.c b/libavcodec/amfdec.c
new file mode 100644
index 00..f365d3084c
--- /dev/null
+++ b/libavcodec/amfdec.c
@@ -0,0 +1,696 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTAB

[FFmpeg-devel] [PATCH 03/10, v3] avcodec/amfenc: Fixes the color information in the output.

2024-05-30 Thread Dmitrii Ovchinnikov
From: Michael Fabian 'Xaymar' Dirks 

added 10 bit support for amf hevc.

before:

command - ffmpeg.exe -hide_banner -y -hwaccel d3d11va -hwaccel_output_format 
d3d11 -i test_10bit_file.mkv -an -c:v h264_amf res.dx11_hw_h264.mkv
output -  Format of input frames context (p010le) is not supported by AMF.
command - ffmpeg.exe -hide_banner -y -hwaccel d3d11va -hwaccel_output_format 
d3d11 -i test_10bit_file -an -c:v hevc_amf res.dx11_hw_hevc.mkv
output -  Format of input frames context (p010le) is not supported by AMF.

after:

command - ffmpeg.exe -hide_banner -y -hwaccel d3d11va -hwaccel_output_format 
d3d11 -i test_10bit_file -an -c:v h264_amf res.dx11_hw_h264.mkv
output -  10-bit input video is not supported by AMF H264 encoder
command - ffmpeg.exe -hide_banner -y -hwaccel d3d11va -hwaccel_output_format 
d3d11 -i test_10bit_file -an -c:v hevc_amf res.dx11_hw_hevc.mkv
output -  10bit file

v2 - lost line returned in ff_amf_pix_fmts
v3 - fixes after review
v4 - extract duplicated code, fix incorrect processing of 10-bit input for h264
v5 - non-functional changes after review

Co-authored-by: Evgeny Pavlov 
Co-authored-by: Araz Iusubov 
---
 libavcodec/amfenc.c  | 37 +
 libavcodec/amfenc.h  |  3 +++
 libavcodec/amfenc_h264.c | 24 
 libavcodec/amfenc_hevc.c | 26 +-
 4 files changed, 85 insertions(+), 5 deletions(-)

diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
index 061859f85c..0bd15dd812 100644
--- a/libavcodec/amfenc.c
+++ b/libavcodec/amfenc.c
@@ -60,6 +60,7 @@ const enum AVPixelFormat ff_amf_pix_fmts[] = {
 #if CONFIG_DXVA2
 AV_PIX_FMT_DXVA2_VLD,
 #endif
+AV_PIX_FMT_P010,
 AV_PIX_FMT_NONE
 };
 
@@ -72,6 +73,7 @@ static const FormatMap format_map[] =
 {
 { AV_PIX_FMT_NONE,   AMF_SURFACE_UNKNOWN },
 { AV_PIX_FMT_NV12,   AMF_SURFACE_NV12 },
+{ AV_PIX_FMT_P010,   AMF_SURFACE_P010 },
 { AV_PIX_FMT_BGR0,   AMF_SURFACE_BGRA },
 { AV_PIX_FMT_RGB0,   AMF_SURFACE_RGBA },
 { AV_PIX_FMT_GRAY8,  AMF_SURFACE_GRAY8 },
@@ -785,6 +787,41 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket 
*avpkt)
 return ret;
 }
 
+int ff_amf_get_color_profile(AVCodecContext *avctx)
+{
+amf_int64 color_profile = AMF_VIDEO_CONVERTER_COLOR_PROFILE_UNKNOWN;
+if (avctx->color_range == AVCOL_RANGE_JPEG) {
+/// Color Space for Full (JPEG) Range
+switch (avctx->colorspace) {
+case AVCOL_SPC_SMPTE170M:
+color_profile = AMF_VIDEO_CONVERTER_COLOR_PROFILE_FULL_601;
+break;
+case AVCOL_SPC_BT709:
+color_profile = AMF_VIDEO_CONVERTER_COLOR_PROFILE_FULL_709;
+break;
+case AVCOL_SPC_BT2020_NCL:
+case AVCOL_SPC_BT2020_CL:
+color_profile = AMF_VIDEO_CONVERTER_COLOR_PROFILE_FULL_2020;
+break;
+}
+} else {
+/// Color Space for Limited (MPEG) range
+switch (avctx->colorspace) {
+case AVCOL_SPC_SMPTE170M:
+color_profile = AMF_VIDEO_CONVERTER_COLOR_PROFILE_601;
+break;
+case AVCOL_SPC_BT709:
+color_profile = AMF_VIDEO_CONVERTER_COLOR_PROFILE_709;
+break;
+case AVCOL_SPC_BT2020_NCL:
+case AVCOL_SPC_BT2020_CL:
+color_profile = AMF_VIDEO_CONVERTER_COLOR_PROFILE_2020;
+break;
+}
+}
+return color_profile;
+}
+
 const AVCodecHWConfigInternal *const ff_amfenc_hw_configs[] = {
 #if CONFIG_D3D11VA
 HW_CONFIG_ENCODER_FRAMES(D3D11, D3D11VA),
diff --git a/libavcodec/amfenc.h b/libavcodec/amfenc.h
index 2dbd378ef8..62736ef579 100644
--- a/libavcodec/amfenc.h
+++ b/libavcodec/amfenc.h
@@ -21,6 +21,7 @@
 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -170,6 +171,8 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket 
*avpkt);
 */
 extern const enum AVPixelFormat ff_amf_pix_fmts[];
 
+int ff_amf_get_color_profile(AVCodecContext *avctx);
+
 /**
 * Error handling helper
 */
diff --git a/libavcodec/amfenc_h264.c b/libavcodec/amfenc_h264.c
index abfac2a90f..ad5fcc9ecb 100644
--- a/libavcodec/amfenc_h264.c
+++ b/libavcodec/amfenc_h264.c
@@ -199,6 +199,8 @@ static av_cold int amf_encode_init_h264(AVCodecContext 
*avctx)
 AMFRate  framerate;
 AMFSize  framesize = 
AMFConstructSize(avctx->width, avctx->height);
 int  deblocking_filter = (avctx->flags & 
AV_CODEC_FLAG_LOOP_FILTER) ? 1 : 0;
+amf_int64color_profile;
+enum AVPixelFormat pix_fmt;
 
 if (avctx->framerate.num > 0 && avctx->framerate.den > 0) {
 framerate = AMFConstructRate(avctx->framerate.num, 
avctx->framerate.den);
@@ -262,10 +264,24 @@ FF_ENABLE_DEPRECATION_WARNINGS
 AMF_ASSIGN_PROPERTY_RATIO(res, ctx->encoder, 
AMF_VIDEO_ENCODER_ASPECT_RATIO, ratio);
 }
 
-/// Color Range (Parti
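
The mapping done by ff_amf_get_color_profile() in the diff above can be
sketched in a self-contained form. Every enum and name below is a
hypothetical stand-in so that the example compiles without the FFmpeg or AMF
headers; none of them is the real AVCOL_* or AMF_VIDEO_CONVERTER_* definition.
The sketch folds the two switch statements into one by choosing between the
full-range and limited-range profile per colour space.

```c
/* Stand-in enums: hypothetical names, not the real FFmpeg/AMF ones. */
enum range { RANGE_LIMITED, RANGE_FULL };
enum space { SPC_OTHER, SPC_SMPTE170M, SPC_BT709, SPC_BT2020_NCL, SPC_BT2020_CL };
enum profile {
    PROFILE_UNKNOWN,
    PROFILE_601, PROFILE_709, PROFILE_2020,
    PROFILE_FULL_601, PROFILE_FULL_709, PROFILE_FULL_2020
};

/* Pick a converter colour profile from range and colour space.
 * Anything not listed falls through to PROFILE_UNKNOWN, as in the patch. */
static enum profile pick_profile(enum range r, enum space s)
{
    const int full = (r == RANGE_FULL);

    switch (s) {
    case SPC_SMPTE170M:
        return full ? PROFILE_FULL_601 : PROFILE_601;
    case SPC_BT709:
        return full ? PROFILE_FULL_709 : PROFILE_709;
    case SPC_BT2020_NCL:
    case SPC_BT2020_CL:
        return full ? PROFILE_FULL_2020 : PROFILE_2020;
    default:
        return PROFILE_UNKNOWN;
    }
}
```

The explicit default branch makes the "unknown" result visible at the call
site; the patch gets the same behaviour from its initial value.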

[FFmpeg-devel] [PATCH 04/10, v3] avcodec/amfenc: HDR metadata.

2024-05-30 Thread Dmitrii Ovchinnikov
From: nyanmisaka 

v2: fixes for indentation
---
 libavcodec/amfenc.c | 83 +
 1 file changed, 83 insertions(+)

diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
index 0bd15dd812..068bb53002 100644
--- a/libavcodec/amfenc.c
+++ b/libavcodec/amfenc.c
@@ -36,6 +36,57 @@
 #include "amfenc.h"
 #include "encode.h"
 #include "internal.h"
+#include "libavutil/mastering_display_metadata.h"
+
+static int amf_save_hdr_metadata(AVCodecContext *avctx, const AVFrame *frame, 
AMFHDRMetadata *hdrmeta)
+{
+AVFrameSideData*sd_display;
+AVFrameSideData*sd_light;
+AVMasteringDisplayMetadata *display_meta;
+AVContentLightMetadata *light_meta;
+
+sd_display = av_frame_get_side_data(frame, 
AV_FRAME_DATA_MASTERING_DISPLAY_METADATA);
+if (sd_display) {
+display_meta = (AVMasteringDisplayMetadata *)sd_display->data;
+if (display_meta->has_luminance) {
+const unsigned int luma_den = 1;
+hdrmeta->maxMasteringLuminance =
+(amf_uint32)(luma_den * av_q2d(display_meta->max_luminance));
+hdrmeta->minMasteringLuminance =
+FFMIN((amf_uint32)(luma_den * 
av_q2d(display_meta->min_luminance)), hdrmeta->maxMasteringLuminance);
+}
+if (display_meta->has_primaries) {
+const unsigned int chroma_den = 5;
+hdrmeta->redPrimary[0] =
+FFMIN((amf_uint16)(chroma_den * 
av_q2d(display_meta->display_primaries[0][0])), chroma_den);
+hdrmeta->redPrimary[1] =
+FFMIN((amf_uint16)(chroma_den * 
av_q2d(display_meta->display_primaries[0][1])), chroma_den);
+hdrmeta->greenPrimary[0] =
+FFMIN((amf_uint16)(chroma_den * 
av_q2d(display_meta->display_primaries[1][0])), chroma_den);
+hdrmeta->greenPrimary[1] =
+FFMIN((amf_uint16)(chroma_den * 
av_q2d(display_meta->display_primaries[1][1])), chroma_den);
+hdrmeta->bluePrimary[0] =
+FFMIN((amf_uint16)(chroma_den * 
av_q2d(display_meta->display_primaries[2][0])), chroma_den);
+hdrmeta->bluePrimary[1] =
+FFMIN((amf_uint16)(chroma_den * 
av_q2d(display_meta->display_primaries[2][1])), chroma_den);
+hdrmeta->whitePoint[0] =
+FFMIN((amf_uint16)(chroma_den * 
av_q2d(display_meta->white_point[0])), chroma_den);
+hdrmeta->whitePoint[1] =
+FFMIN((amf_uint16)(chroma_den * 
av_q2d(display_meta->white_point[1])), chroma_den);
+}
+
+sd_light = av_frame_get_side_data(frame, 
AV_FRAME_DATA_CONTENT_LIGHT_LEVEL);
+if (sd_light) {
+light_meta = (AVContentLightMetadata *)sd_light->data;
+if (light_meta) {
+hdrmeta->maxContentLightLevel = (amf_uint16)light_meta->MaxCLL;
+hdrmeta->maxFrameAverageLightLevel = 
(amf_uint16)light_meta->MaxFALL;
+}
+}
+return 0;
+}
+return 1;
+}
 
 #if CONFIG_D3D11VA
 #include 
@@ -683,6 +734,26 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket 
*avpkt)
 frame_ref_storage_buffer->pVtbl->Release(frame_ref_storage_buffer);
 }
 
+// HDR10 metadata
+if (frame->color_trc == AVCOL_TRC_SMPTE2084) {
+AMFBuffer * hdrmeta_buffer = NULL;
+res = ctx->context->pVtbl->AllocBuffer(ctx->context, 
AMF_MEMORY_HOST, sizeof(AMFHDRMetadata), &hdrmeta_buffer);
+if (res == AMF_OK) {
+AMFHDRMetadata * hdrmeta = 
(AMFHDRMetadata*)hdrmeta_buffer->pVtbl->GetNative(hdrmeta_buffer);
+if (amf_save_hdr_metadata(avctx, frame, hdrmeta) == 0) {
+switch (avctx->codec->id) {
+case AV_CODEC_ID_H264:
+AMF_ASSIGN_PROPERTY_INTERFACE(res, ctx->encoder, 
AMF_VIDEO_ENCODER_INPUT_HDR_METADATA, hdrmeta_buffer); break;
+case AV_CODEC_ID_HEVC:
+AMF_ASSIGN_PROPERTY_INTERFACE(res, ctx->encoder, 
AMF_VIDEO_ENCODER_HEVC_INPUT_HDR_METADATA, hdrmeta_buffer); break;
+}
+res = amf_set_property_buffer(surface, 
L"av_frame_hdrmeta", hdrmeta_buffer);
+AMF_RETURN_IF_FALSE(avctx, res == AMF_OK, AVERROR_UNKNOWN, 
"SetProperty failed for \"av_frame_hdrmeta\" with error %d\n", res);
+}
+hdrmeta_buffer->pVtbl->Release(hdrmeta_buffer);
+}
+}
+
 surface->pVtbl->SetPts(surface, frame->pts);
 AMF_ASSIGN_PROPERTY_INT64(res, surface, PTS_PROP, frame->pts);
 
@@ -746,6 +817,18 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket 
*avpkt)
 }
 res_resubmit = AMF_OK;
 if (ctx->delayed_surface != NULL) { // try to resubmit frame
+if (ctx->delayed_surface->pVtbl->HasProperty(ctx->delayed_surface, 
L"av_frame_hdrmeta")) {
+AMFBuffer
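
The rational-to-integer conversion in amf_save_hdr_metadata() above can be
sketched on its own. The archived diff shows luma_den = 1 and chroma_den = 5,
which look truncated. This sketch assumes the usual HDR10 mastering-display
units instead: 0.0001 cd/m2 for luminance (denominator 10000) and 0.00002 for
chromaticity (denominator 50000). The rational type and all function names
are hypothetical stand-ins for AVRational and av_q2d(), and the sketch rounds
to nearest where the patch as archived uses a truncating cast.

```c
#include <stdint.h>

typedef struct { int num, den; } rational; /* stand-in for AVRational */

/* Stand-in for av_q2d(); the caller must ensure den != 0. */
static double q2d(rational q)
{
    return q.num / (double)q.den;
}

#define LUMA_DEN   10000u /* assumed: units of 0.0001 cd/m2 */
#define CHROMA_DEN 50000u /* assumed: units of 0.00002 */

/* Chromaticity coordinate in [0, 1] -> fixed point, clamped to CHROMA_DEN. */
static uint16_t chroma_to_fixed(rational c)
{
    double v = CHROMA_DEN * q2d(c);
    if (v < 0.0)
        v = 0.0;
    if (v > CHROMA_DEN)
        v = CHROMA_DEN;
    return (uint16_t)(v + 0.5);
}

/* Luminance in cd/m2 -> fixed point; negative input is clamped to zero. */
static uint32_t luma_to_fixed(rational l)
{
    double v = LUMA_DEN * q2d(l);
    if (v < 0.0)
        v = 0.0;
    return (uint32_t)(v + 0.5);
}
```

For example, a red primary x of 0.68 would be stored as 34000 and a peak
luminance of 1000 cd/m2 as 10000000 under these assumed units.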

[FFmpeg-devel] [PATCH 05/10, v3] avcodec/amfenc: add 10 bit encoding in av1_amf

2024-05-30 Thread Dmitrii Ovchinnikov
From: Evgeny Pavlov 

v2: refactored after review

Signed-off-by: Evgeny Pavlov 
Co-authored-by: Araz Iusubov 
---
 libavcodec/amfenc.c |  2 ++
 libavcodec/amfenc_av1.c | 22 ++
 2 files changed, 24 insertions(+)

diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
index 068bb53002..49dd91c4e0 100644
--- a/libavcodec/amfenc.c
+++ b/libavcodec/amfenc.c
@@ -746,6 +746,8 @@ int ff_amf_receive_packet(AVCodecContext *avctx, AVPacket 
*avpkt)
 AMF_ASSIGN_PROPERTY_INTERFACE(res, ctx->encoder, 
AMF_VIDEO_ENCODER_INPUT_HDR_METADATA, hdrmeta_buffer); break;
 case AV_CODEC_ID_HEVC:
 AMF_ASSIGN_PROPERTY_INTERFACE(res, ctx->encoder, 
AMF_VIDEO_ENCODER_HEVC_INPUT_HDR_METADATA, hdrmeta_buffer); break;
+case AV_CODEC_ID_AV1:
+AMF_ASSIGN_PROPERTY_INTERFACE(res, ctx->encoder, 
AMF_VIDEO_ENCODER_AV1_INPUT_HDR_METADATA, hdrmeta_buffer); break;
 }
 res = amf_set_property_buffer(surface, 
L"av_frame_hdrmeta", hdrmeta_buffer);
 AMF_RETURN_IF_FALSE(avctx, res == AMF_OK, AVERROR_UNKNOWN, 
"SetProperty failed for \"av_frame_hdrmeta\" with error %d\n", res);
diff --git a/libavcodec/amfenc_av1.c b/libavcodec/amfenc_av1.c
index 9f18aac648..cc48e93fcb 100644
--- a/libavcodec/amfenc_av1.c
+++ b/libavcodec/amfenc_av1.c
@@ -165,6 +165,9 @@ static av_cold int amf_encode_init_av1(AVCodecContext* 
avctx)
 AMFGuid guid;
 AMFRate framerate;
 AMFSize framesize = AMFConstructSize(avctx->width, 
avctx->height);
+amf_int64   color_depth;
+amf_int64   color_profile;
+enumAVPixelFormat pix_fmt;
 
 
 
@@ -203,6 +206,25 @@ FF_ENABLE_DEPRECATION_WARNINGS
 }
 AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, 
AMF_VIDEO_ENCODER_AV1_PROFILE, profile);
 
+/// Color profile
+color_profile = ff_amf_get_color_profile(avctx);
+AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, 
AMF_VIDEO_ENCODER_AV1_OUTPUT_COLOR_PROFILE, color_profile);
+
+/// Color Depth
+pix_fmt = avctx->hw_frames_ctx ? 
((AVHWFramesContext*)avctx->hw_frames_ctx->data)->sw_format
+: avctx->pix_fmt;
+color_depth = AMF_COLOR_BIT_DEPTH_8;
+if (pix_fmt == AV_PIX_FMT_P010) {
+color_depth = AMF_COLOR_BIT_DEPTH_10;
+}
+
+AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, 
AMF_VIDEO_ENCODER_AV1_COLOR_BIT_DEPTH, color_depth);
+AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, 
AMF_VIDEO_ENCODER_AV1_OUTPUT_COLOR_PROFILE, color_profile);
+/// Color Transfer Characteristics (AMF matches ISO/IEC)
+AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, 
AMF_VIDEO_ENCODER_AV1_OUTPUT_TRANSFER_CHARACTERISTIC, 
(amf_int64)avctx->color_trc);
+/// Color Primaries (AMF matches ISO/IEC)
+AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, 
AMF_VIDEO_ENCODER_AV1_OUTPUT_COLOR_PRIMARIES, 
(amf_int64)avctx->color_primaries);
+
 profile_level = avctx->level;
 if (profile_level == AV_LEVEL_UNKNOWN) {
 profile_level = ctx->level;
-- 
2.39.3 (Apple Git-146)

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


[FFmpeg-devel] [PATCH 06/10, v3] avcodec/amfenc: GPU driver version check

2024-05-30 Thread Dmitrii Ovchinnikov
From: Araz Iusubov 

Implement a GPU driver version check.
The 10-bit patch works incorrectly on driver versions lower than 23.30.

Signed-off-by: Araz Iusubov 
---
 libavcodec/amfenc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
index 49dd91c4e0..510050e282 100644
--- a/libavcodec/amfenc.c
+++ b/libavcodec/amfenc.c
@@ -558,6 +558,10 @@ int ff_amf_encode_init(AVCodecContext *avctx)
 if ((ret = amf_load_library(avctx)) == 0) {
 if ((ret = amf_init_context(avctx)) == 0) {
 if ((ret = amf_init_encoder(avctx)) == 0) {
+if (avctx->pix_fmt == AV_PIX_FMT_P010) {
+AmfContext *ctx = avctx->priv_data;
+AMF_RETURN_IF_FALSE(ctx, ctx->version >= 
AMF_MAKE_FULL_VERSION(1, 4, 32, 0), AVERROR_UNKNOWN, "10-bit encoder is not 
supported by AMD GPU drivers versions lower than 23.30.\n");
+}
 return 0;
 }
 }
-- 
2.39.3 (Apple Git-146)
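
For readers unfamiliar with the version comparison in the hunk above, here is a minimal sketch of how such a check can work. It assumes that AMF_MAKE_FULL_VERSION packs four 16-bit fields (major, minor, release, build) into one 64-bit integer, most significant field first, so that a plain >= comparison orders versions correctly. The SKETCH_ macro and the helper function are hypothetical stand-ins for illustration, not the AMF SDK definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for AMF_MAKE_FULL_VERSION. Assumption: four
 * 16-bit fields packed into a 64-bit value, major in the top bits. */
#define SKETCH_MAKE_FULL_VERSION(major, minor, release, build) \
    (((uint64_t)(major)   << 48) | ((uint64_t)(minor) << 32) | \
     ((uint64_t)(release) << 16) |  (uint64_t)(build))

/* Returns 1 if the runtime version is at least 1.4.32.0, the threshold
 * the patch associates with driver 23.30, and 0 otherwise. */
static int sketch_supports_10bit(uint64_t runtime_version)
{
    return runtime_version >= SKETCH_MAKE_FULL_VERSION(1, 4, 32, 0);
}
```

With this packing, 1.4.31.65535 compares lower than 1.4.32.0 and 1.5.0.0 compares higher, which is what the single integer comparison in the patch relies on.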



[FFmpeg-devel] [PATCH 09/10, v3] avfilter/scale_amf: Add AMF VPP & super resolution filters

2024-05-30 Thread Dmitrii Ovchinnikov
From: Evgeny Pavlov 

This commit adds two AMF filters: vpp_amf & sr_amf.
Both filters use AMF hardware acceleration.
vpp_amf supports simple scaling algorithms & color conversion.
sr_amf supports advanced scaling algorithms such as FSR & can
be used for upscaling only.
---
 configure   |   1 +
 libavfilter/Makefile|   2 +
 libavfilter/allfilters.c|   2 +
 libavfilter/vf_amf_common.c | 516 
 libavfilter/vf_amf_common.h |  73 +
 libavfilter/vf_sr_amf.c | 189 +
 libavfilter/vf_vpp_amf.c| 264 ++
 7 files changed, 1047 insertions(+)
 create mode 100644 libavfilter/vf_amf_common.c
 create mode 100644 libavfilter/vf_amf_common.h
 create mode 100644 libavfilter/vf_sr_amf.c
 create mode 100644 libavfilter/vf_vpp_amf.c

diff --git a/configure b/configure
index 96b181fd21..56d9bad3ee 100755
--- a/configure
+++ b/configure
@@ -3916,6 +3916,7 @@ rubberband_filter_deps="librubberband"
 sab_filter_deps="gpl swscale"
 scale2ref_filter_deps="swscale"
 scale_filter_deps="swscale"
+scale_amf_filter_deps="amf"
 scale_qsv_filter_deps="libmfx"
 scale_qsv_filter_select="qsvvpp"
 scdet_filter_select="scene_sad"
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 5992fd161f..8c8a9466a8 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -500,6 +500,7 @@ OBJS-$(CONFIG_SITI_FILTER)   += vf_siti.o
 OBJS-$(CONFIG_SPLIT_FILTER)  += split.o
 OBJS-$(CONFIG_SPP_FILTER)+= vf_spp.o qp_table.o
 OBJS-$(CONFIG_SR_FILTER) += vf_sr.o
+OBJS-$(CONFIG_SR_AMF_FILTER) += vf_sr_amf.o scale_eval.o 
vf_amf_common.o
 OBJS-$(CONFIG_SSIM_FILTER)   += vf_ssim.o framesync.o
 OBJS-$(CONFIG_SSIM360_FILTER)+= vf_ssim360.o framesync.o
 OBJS-$(CONFIG_STEREO3D_FILTER)   += vf_stereo3d.o
@@ -553,6 +554,7 @@ OBJS-$(CONFIG_VIDSTABTRANSFORM_FILTER)   += 
vidstabutils.o vf_vidstabtransfo
 OBJS-$(CONFIG_VIF_FILTER)+= vf_vif.o framesync.o
 OBJS-$(CONFIG_VIGNETTE_FILTER)   += vf_vignette.o
 OBJS-$(CONFIG_VMAFMOTION_FILTER) += vf_vmafmotion.o framesync.o
+OBJS-$(CONFIG_VPP_AMF_FILTER)+= vf_vpp_amf.o scale_eval.o 
vf_amf_common.o
 OBJS-$(CONFIG_VPP_QSV_FILTER)+= vf_vpp_qsv.o
 OBJS-$(CONFIG_VSTACK_FILTER) += vf_stack.o framesync.o
 OBJS-$(CONFIG_W3FDIF_FILTER) += vf_w3fdif.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index c532682fc2..2f40fb8f6f 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -430,6 +430,8 @@ extern const AVFilter ff_vf_roberts_opencl;
 extern const AVFilter ff_vf_rotate;
 extern const AVFilter ff_vf_sab;
 extern const AVFilter ff_vf_scale;
+extern const AVFilter ff_vf_vpp_amf;
+extern const AVFilter ff_vf_sr_amf;
 extern const AVFilter ff_vf_scale_cuda;
 extern const AVFilter ff_vf_scale_npp;
 extern const AVFilter ff_vf_scale_qsv;
diff --git a/libavfilter/vf_amf_common.c b/libavfilter/vf_amf_common.c
new file mode 100644
index 00..b842aae77a
--- /dev/null
+++ b/libavfilter/vf_amf_common.c
@@ -0,0 +1,516 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "vf_amf_common.h"
+
+#include "libavutil/avassert.h"
+#include "avfilter.h"
+#include "internal.h"
+#include "formats.h"
+#include "libavutil/imgutils.h"
+
+#include "libavutil/hwcontext_amf.h"
+#include "libavutil/hwcontext_amf_internal.h"
+#include "AMF/components/ColorSpace.h"
+#include "scale_eval.h"
+
+#if CONFIG_DXVA2
+#include 
+#endif
+
+#if CONFIG_D3D11VA
+#include 
+#endif
+
+int amf_filter_init(AVFilterContext *avctx)
+{
+AMFFilterContext *ctx = avctx->priv;
+
+if (!strcmp(ctx->format_str, "same")) {
+ctx->format = AV_PIX_FMT_NONE;
+} else {
+ctx->format = av_get_pix_fmt(ctx->format_str);
+if (ctx->format == AV_PIX_FMT_NONE) {
+av_log(avctx, AV_LOG_ERROR, "Unrecognized pixel format: %s\n", 
ctx->format_str);
+return AVERROR(EINVAL);
+}
+}
+
+return 0;
+}
+
+void amf_filter_uninit(AVFilterContext *avctx)
+{
+AMFFilterContext *ctx = avctx->priv;
+
+if (ctx->c

[FFmpeg-devel] [PATCH 07/10, v3] avcodec/amfenc: add smart access video option

2024-05-30 Thread Dmitrii Ovchinnikov
From: Evgeny Pavlov 

This commit adds an option for enabling SmartAccess Video (SAV)
in AMF encoders. SAV is an AMD hardware-specific feature which
enables the parallelization of encode and decode streams across
multiple Video Codec Engine (VCN) hardware instances.

Signed-off-by: Evgeny Pavlov 
---
 libavcodec/amfenc.h  |  1 +
 libavcodec/amfenc_av1.c  | 18 ++
 libavcodec/amfenc_h264.c | 18 ++
 libavcodec/amfenc_hevc.c | 18 ++
 4 files changed, 55 insertions(+)

diff --git a/libavcodec/amfenc.h b/libavcodec/amfenc.h
index 62736ef579..1bda0136bd 100644
--- a/libavcodec/amfenc.h
+++ b/libavcodec/amfenc.h
@@ -90,6 +90,7 @@ typedef struct AmfContext {
 int quality;
 int b_frame_delta_qp;
 int ref_b_frame_delta_qp;
+int smart_access_video;
 
 // Dynamic options, can be set after Init() call
 
diff --git a/libavcodec/amfenc_av1.c b/libavcodec/amfenc_av1.c
index cc48e93fcb..7d37a242fc 100644
--- a/libavcodec/amfenc_av1.c
+++ b/libavcodec/amfenc_av1.c
@@ -104,6 +104,8 @@ static const AVOption options[] = {
 
 { "log_to_dbg", "Enable AMF logging to debug output",   
OFFSET(log_to_dbg), AV_OPT_TYPE_BOOL,{.i64 = 0 }, 0, 1, VE },
 
+{ "smart_access_video", "Enable Smart Access Video",
OFFSET(smart_access_video), AV_OPT_TYPE_BOOL, {.i64 = -1  }, -1, 1, 
VE},
+
 //Pre Analysis options
 { "preanalysis","Enable preanalysis",  
 OFFSET(preanalysis),   
 AV_OPT_TYPE_BOOL,   {.i64 = -1 }, -1, 1, VE },
 
@@ -265,6 +267,22 @@ FF_ENABLE_DEPRECATION_WARNINGS
 }
 }
 
+if (ctx->smart_access_video != -1) {
+AMF_ASSIGN_PROPERTY_BOOL(res, ctx->encoder, 
AMF_VIDEO_ENCODER_AV1_ENABLE_SMART_ACCESS_VIDEO, ctx->smart_access_video != 0);
+if (res != AMF_OK) {
+av_log(avctx, AV_LOG_ERROR, "The Smart Access Video is not 
supported by AMF.\n");
+if (ctx->smart_access_video != 0)
+return AVERROR(ENOSYS);
+} else {
+av_log(avctx, AV_LOG_INFO, "The Smart Access Video (%d) is 
set.\n", ctx->smart_access_video);
+// Set low latency mode if Smart Access Video is enabled
+if (ctx->smart_access_video != 0) {
+AMF_ASSIGN_PROPERTY_BOOL(res, ctx->encoder, 
AMF_VIDEO_ENCODER_AV1_ENCODING_LATENCY_MODE, 
AMF_VIDEO_ENCODER_AV1_ENCODING_LATENCY_MODE_LOWEST_LATENCY);
+av_log(avctx, AV_LOG_INFO, "The Smart Access Video set low 
latency mode.\n");
+}
+}
+}
+
 // Pre-Pass, Pre-Analysis, Two-Pass
 if (ctx->rate_control_mode == 
AMF_VIDEO_ENCODER_AV1_RATE_CONTROL_METHOD_CONSTANT_QP) {
 AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, 
AMF_VIDEO_ENCODER_AV1_RATE_CONTROL_PREENCODE, 0);
diff --git a/libavcodec/amfenc_h264.c b/libavcodec/amfenc_h264.c
index ad5fcc9ecb..a26a6dbef8 100644
--- a/libavcodec/amfenc_h264.c
+++ b/libavcodec/amfenc_h264.c
@@ -136,6 +136,8 @@ static const AVOption options[] = {
 
 { "log_to_dbg", "Enable AMF logging to debug output",   
OFFSET(log_to_dbg), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, VE },
 
+{ "smart_access_video", "Enable Smart Access Video",
OFFSET(smart_access_video), AV_OPT_TYPE_BOOL, {.i64 = -1  }, -1, 1, VE},
+
 //Pre Analysis options
 { "preanalysis","Enable preanalysis",  
 OFFSET(preanalysis),   
 AV_OPT_TYPE_BOOL,   {.i64 = -1 }, -1, 1, VE },
 
@@ -369,6 +371,22 @@ FF_ENABLE_DEPRECATION_WARNINGS
 av_log(ctx, AV_LOG_WARNING, "rate control mode is PEAK_CONSTRAINED_VBR 
but rc_max_rate is not set\n");
 }
 
+if (ctx->smart_access_video != -1) {
+AMF_ASSIGN_PROPERTY_BOOL(res, ctx->encoder, 
AMF_VIDEO_ENCODER_ENABLE_SMART_ACCESS_VIDEO, ctx->smart_access_video != 0);
+if (res != AMF_OK) {
+av_log(avctx, AV_LOG_ERROR, "The Smart Access Video is not 
supported by AMF.\n");
+if (ctx->smart_access_video != 0)
+return AVERROR(ENOSYS);
+} else {
+av_log(avctx, AV_LOG_INFO, "The Smart Access Video (%d) is 
set.\n", ctx->smart_access_video);
+// Set low latency mode if Smart Access Video is enabled
+if (ctx->smart_access_video != 0) {
+AMF_ASSIGN_PROPERTY_BOOL(res, ctx->encoder, 
AMF_VIDEO_ENCODER_LOWLATENCY_MODE, true);
+av_log(avctx, AV_LOG_INFO, "The Smart Access Video set low 
latency mode.\n");
+}
+}
+}
+
 if (ctx->preanalysis != -1) {
 AMF_ASSIGN_PROPERTY_BOOL(res, ctx->encoder, 
AMF_VIDEO_ENCODER_PRE_ANALYSIS_ENABLE, !!((ctx->preanalysis == 0) ? false : 
true));
 }
diff --git a/libavcodec/amfenc_hevc.c b/libavcodec/amfenc_hevc.c
index a89a3cf20c..8c26956513 100644
--- a/

[FFmpeg-devel] [PATCH 10/10, v3] doc/filters: Add documentation for AMF filters

2024-05-30 Thread Dmitrii Ovchinnikov
From: Evgeny Pavlov 

Signed-off-by: Evgeny Pavlov 
---
 doc/filters.texi | 238 +++
 1 file changed, 238 insertions(+)

diff --git a/doc/filters.texi b/doc/filters.texi
index f5bf475d13..78e87ff5f7 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -22791,6 +22791,76 @@ input upscaled using bicubic upscaling with proper 
scale factor.
 
 To get full functionality (such as async execution), please use the 
@ref{dnn_processing} filter.
 
+@anchor{sr_amf}
+@section sr_amf
+
+Upscale (size increasing) for the input video using AMD Advanced Media 
Framework library for hardware acceleration.
+Use advanced algorithms for upscaling with higher output quality.
+Setting the output width and height works in the same way as for the 
@ref{scale} filter.
+
+The filter accepts the following options:
+@table @option
+@item w
+@item h
+Set the output video dimension expression. Default value is the input 
dimension.
+
+Allows for the same expressions as the @ref{scale} filter.
+
+@item algorithm
+Sets the algorithm used for scaling:
+
+@table @var
+@item bilinear
+Bilinear
+
+@item bicubic
+Bicubic
+
+@item sr1-0
+Video SR1.0
+This is a default value
+
+@item point
+Point
+
+@item sr1-1
+Video SR1.1
+
+@end table
+
+@item sharpness
+Control hq scaler sharpening. The value is a float in the range of [0.0, 2.0]
+
+@item format
+Controls the output pixel format. By default, or if none is specified, the 
input
+pixel format is used.
+
+@item keep-ratio
+Force the scaler to keep the aspect ratio of the input image when the output 
size has a different aspect ratio.
+Default value is false.
+
+@item fill
+Specifies whether the output image outside the region of interest,
+which does not fill the entire output surface should be filled with a solid 
color.
+
+@end table
+
+@subsection Examples
+
+@itemize
+@item
+Scale input to 720p, keeping aspect ratio and ensuring the output is yuv420p.
+@example
+sr_amf=-2:720:format=yuv420p
+@end example
+
+@item
+Upscale to 4K with algorithm video SR1.1.
+@example
+sr_amf=4096:2160:algorithm=sr1-1
+@end example
+@end itemize
+
 @section ssim
 
 Obtain the SSIM (Structural SImilarity Metric) between two input videos.
@@ -25528,6 +25598,174 @@ Example:
 ffmpeg -i ref.mpg -vf vmafmotion -f null -
 @end example
 
+@anchor{vpp_amf}
+@section vpp_amf
+
+Scale (resize) and convert colorspace, transfer characteristics or color 
primaries for the input video, using AMD Advanced Media Framework library for 
hardware acceleration.
+Setting the output width and height works in the same way as for the 
@ref{scale} filter.
+
+The filter accepts the following options:
+@table @option
+@item w
+@item h
+Set the output video dimension expression. Default value is the input 
dimension.
+
+Allows for the same expressions as the @ref{scale} filter.
+
+@item scale_type
+Sets the algorithm used for scaling:
+
+@table @var
+@item bilinear
+Bilinear
+
+This is the default.
+
+@item bicubic
+Bicubic
+
+@end table
+
+@item format
+Controls the output pixel format. By default, or if none is specified, the 
input
+pixel format is used.
+
+
+@item force_original_aspect_ratio
+@item force_divisible_by
+Work the same as the identical @ref{scale} filter options.
+
+@anchor{color_profile}
+@item color_profile
+Specify all color properties at once.
+
+The accepted values are:
+@table @samp
+@item bt601
+BT.601
+
+@item bt709
+BT.709
+
+@item bt2020
+BT.2020
+
+@end table
+
+@anchor{trc}
+@item trc
+Specify output transfer characteristics.
+
+The accepted values are:
+@table @samp
+@item bt709
+BT.709
+
+@item gamma22
+Constant gamma of 2.2
+
+@item gamma28
+Constant gamma of 2.8
+
+@item smpte170m
+SMPTE-170M
+
+@item smpte240m
+SMPTE-240M
+
+@item linear
+Linear
+
+@item log
+LOG
+
+@item log-sqrt
+LOG_SQRT
+
+@item iec61966-2-4
+iec61966-2-4
+
+@item bt1361-ecg
+BT1361_ECG
+
+@item iec61966-2-1
+iec61966-2-1
+
+@item bt2020-10
+BT.2020 for 10-bits content
+
+@item bt2020-12
+BT.2020 for 12-bits content
+
+@item smpte2084
+SMPTE2084
+
+@item smpte428
+SMPTE428
+
+@item arib-std-b67
+ARIB_STD_B67
+
+@end table
+
+@anchor{primaries}
+@item primaries
+Specify output color primaries.
+
+The accepted values are:
+@table @samp
+@item bt709
+BT.709
+
+@item bt470m
+BT.470M
+
+@item bt470bg
+BT.470BG or BT.601-6 625
+
+@item smpte170m
+SMPTE-170M or BT.601-6 525
+
+@item smpte240m
+SMPTE-240M
+
+@item film
+film
+
+@item bt2020
+BT.2020
+
+@item smpte428
+SMPTE-428
+
+@item smpte431
+SMPTE-431
+
+@item smpte432
+SMPTE-432
+
+@item jedec-p22
+JEDEC P22 phosphors
+
+@end table
+@end table
+
+@subsection Examples
+
+@itemize
+@item
+Scale input to 720p, keeping aspect ratio and ensuring the output is yuv420p.
+@example
+vpp_amf=-2:720:format=yuv420p
+@end example
+
+@item
+Upscale to 4K and change color profile to bt2020.
+@example
+vpp_amf=4096:2160:color_profile=bt2020
+@end example
+@end itemize
+
 @anchor{vstack}
 @section vstack
 Stack input videos vertically.
-- 
2.39.3 (Apple Gi

[FFmpeg-devel] [PATCH 08/10, v3] avcodec/amfenc: redesign to use hwcontext_amf.

2024-05-30 Thread Dmitrii Ovchinnikov
Co-authored-by: Evgeny Pavlov 
v3: cleanup code
---
 libavcodec/amfenc.c  | 573 +++
 libavcodec/amfenc.h  |  32 +--
 libavcodec/amfenc_av1.c  |   8 +-
 libavcodec/amfenc_h264.c |   8 +-
 libavcodec/amfenc_hevc.c |  14 +-
 5 files changed, 176 insertions(+), 459 deletions(-)

diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c
index 510050e282..c57fa1b980 100644
--- a/libavcodec/amfenc.c
+++ b/libavcodec/amfenc.c
@@ -29,6 +29,8 @@
 #define COBJMACROS
 #include "libavutil/hwcontext_dxva2.h"
 #endif
+#include "libavutil/hwcontext_amf.h"
+#include "libavutil/hwcontext_amf_internal.h"
 #include "libavutil/mem.h"
 #include "libavutil/pixdesc.h"
 #include "libavutil/time.h"
@@ -38,6 +40,18 @@
 #include "internal.h"
 #include "libavutil/mastering_display_metadata.h"
 
+#if CONFIG_D3D11VA
+#include 
+#endif
+
+#ifdef _WIN32
+#include "compat/w32dlfcn.h"
+#else
+#include 
+#endif
+
+#define PTS_PROP L"PtsProp"
+
 static int amf_save_hdr_metadata(AVCodecContext *avctx, const AVFrame *frame, 
AMFHDRMetadata *hdrmeta)
 {
 AVFrameSideData*sd_display;
@@ -88,20 +102,6 @@ static int amf_save_hdr_metadata(AVCodecContext *avctx, 
const AVFrame *frame, AM
 return 1;
 }
 
-#if CONFIG_D3D11VA
-#include 
-#endif
-
-#ifdef _WIN32
-#include "compat/w32dlfcn.h"
-#else
-#include 
-#endif
-
-#define FFMPEG_AMF_WRITER_ID L"ffmpeg_amf"
-
-#define PTS_PROP L"PtsProp"
-
 const enum AVPixelFormat ff_amf_pix_fmts[] = {
 AV_PIX_FMT_NV12,
 AV_PIX_FMT_YUV420P,
@@ -111,289 +111,18 @@ const enum AVPixelFormat ff_amf_pix_fmts[] = {
 #if CONFIG_DXVA2
 AV_PIX_FMT_DXVA2_VLD,
 #endif
+AV_PIX_FMT_AMF_SURFACE,
 AV_PIX_FMT_P010,
 AV_PIX_FMT_NONE
 };
 
-typedef struct FormatMap {
-enum AVPixelFormat   av_format;
-enum AMF_SURFACE_FORMAT  amf_format;
-} FormatMap;
-
-static const FormatMap format_map[] =
-{
-{ AV_PIX_FMT_NONE,   AMF_SURFACE_UNKNOWN },
-{ AV_PIX_FMT_NV12,   AMF_SURFACE_NV12 },
-{ AV_PIX_FMT_P010,   AMF_SURFACE_P010 },
-{ AV_PIX_FMT_BGR0,   AMF_SURFACE_BGRA },
-{ AV_PIX_FMT_RGB0,   AMF_SURFACE_RGBA },
-{ AV_PIX_FMT_GRAY8,  AMF_SURFACE_GRAY8 },
-{ AV_PIX_FMT_YUV420P,AMF_SURFACE_YUV420P },
-{ AV_PIX_FMT_YUYV422,AMF_SURFACE_YUY2 },
-};
-
-static enum AMF_SURFACE_FORMAT amf_av_to_amf_format(enum AVPixelFormat fmt)
-{
-int i;
-for (i = 0; i < amf_countof(format_map); i++) {
-if (format_map[i].av_format == fmt) {
-return format_map[i].amf_format;
-}
-}
-return AMF_SURFACE_UNKNOWN;
-}
-
-static void AMF_CDECL_CALL AMFTraceWriter_Write(AMFTraceWriter *pThis,
-const wchar_t *scope, const wchar_t *message)
-{
-AmfTraceWriter *tracer = (AmfTraceWriter*)pThis;
-av_log(tracer->avctx, AV_LOG_DEBUG, "%ls: %ls", scope, message); // \n is 
provided from AMF
-}
-
-static void AMF_CDECL_CALL AMFTraceWriter_Flush(AMFTraceWriter *pThis)
-{
-}
-
-static AMFTraceWriterVtbl tracer_vtbl =
-{
-.Write = AMFTraceWriter_Write,
-.Flush = AMFTraceWriter_Flush,
-};
-
-static int amf_load_library(AVCodecContext *avctx)
-{
-AmfContext*ctx = avctx->priv_data;
-AMFInit_Fn init_fun;
-AMFQueryVersion_Fn version_fun;
-AMF_RESULT res;
-
-ctx->delayed_frame = av_frame_alloc();
-if (!ctx->delayed_frame) {
-return AVERROR(ENOMEM);
-}
-// hardcoded to current HW queue size - will auto-realloc if too small
-ctx->timestamp_list = av_fifo_alloc2(avctx->max_b_frames + 16, 
sizeof(int64_t),
- AV_FIFO_FLAG_AUTO_GROW);
-if (!ctx->timestamp_list) {
-return AVERROR(ENOMEM);
-}
-ctx->dts_delay = 0;
-
-
-ctx->library = dlopen(AMF_DLL_NAMEA, RTLD_NOW | RTLD_LOCAL);
-AMF_RETURN_IF_FALSE(ctx, ctx->library != NULL,
-AVERROR_UNKNOWN, "DLL %s failed to open\n", AMF_DLL_NAMEA);
-
-init_fun = (AMFInit_Fn)dlsym(ctx->library, AMF_INIT_FUNCTION_NAME);
-AMF_RETURN_IF_FALSE(ctx, init_fun != NULL, AVERROR_UNKNOWN, "DLL %s failed 
to find function %s\n", AMF_DLL_NAMEA, AMF_INIT_FUNCTION_NAME);
-
-version_fun = (AMFQueryVersion_Fn)dlsym(ctx->library, 
AMF_QUERY_VERSION_FUNCTION_NAME);
-AMF_RETURN_IF_FALSE(ctx, version_fun != NULL, AVERROR_UNKNOWN, "DLL %s 
failed to find function %s\n", AMF_DLL_NAMEA, AMF_QUERY_VERSION_FUNCTION_NAME);
-
-res = version_fun(&ctx->version);
-AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "%s failed with 
error %d\n", AMF_QUERY_VERSION_FUNCTION_NAME, res);
-res = init_fun(AMF_FULL_VERSION, &ctx->factory);
-AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "%s failed with 
error %d\n", AMF_INIT_FUNCTION_NAME, res);
-res = ctx->factory->pVtbl->GetTrace(ctx->factory, &ctx->trace);
-AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "GetTrace() 
failed with error %d\n", res);
-res = ctx->factory->pVtbl->GetDebug(ctx->factory, &ctx->debug);
-AMF_

Re: [FFmpeg-devel] [PATCH v4 10/11] avfilter/dnn: Remove a level of dereference

2024-05-30 Thread Guo, Yejun



> -Original Message-
> From: ffmpeg-devel  On Behalf Of Zhao
> Zhili
> Sent: Wednesday, May 8, 2024 12:08 AM
> To: ffmpeg-devel@ffmpeg.org
> Cc: Zhao Zhili 
> Subject: [FFmpeg-devel] [PATCH v4 10/11] avfilter/dnn: Remove a level of
> dereference
> 
> From: Zhao Zhili 
> 
> For code such as 'model->model = ov_model' is confusing. We can just drop the
> member variable and use cast to get the subclass.
> 
> Signed-off-by: Zhao Zhili 
> ---
>  libavfilter/dnn/dnn_backend_openvino.c | 17 -
>  libavfilter/dnn/dnn_backend_tf.c   | 19 +--
>  libavfilter/dnn/dnn_backend_torch.cpp  | 15 +++
>  libavfilter/dnn_filter_common.c|  6 +++---
>  libavfilter/dnn_interface.h|  6 ++
>  5 files changed, 29 insertions(+), 34 deletions(-)
> 
This patch set has been pushed, thanks.


Re: [FFmpeg-devel] [PATCH 4/5] lavu/intmath.h: Fix UB in ff_ctz_c() and ff_ctzll_c()

2024-05-30 Thread Tomas Härdin
tor 2024-05-30 klockan 16:06 +0300 skrev Rémi Denis-Courmont:
> 
> 
> Le 30 mai 2024 12:50:05 GMT+03:00, "Tomas Härdin"  a
> écrit :
> > tor 2024-05-30 klockan 10:54 +0300 skrev Rémi Denis-Courmont:
> > > Can't we just use the compiler built-ins here? AFAIK, they (GCC,
> > > LLVM) use the same algorithm if the CPU doesn't support native
> > > CTZ.
> > > And they will pick the right instruction if CPU does have CTZ.
> > > 
> > > I get it that maybe it wasn't working so well 20 years ago, but
> > > we've
> > > increased compiler version requirements since then.
> > 
> > I think we still support MSVC, but maybe we shouldn't? It's
> > possible to
> > cross-compile for Windows either way.
> 
> I don't get how that prevents using the GCC and Clang builtins (on
> GCC and Clang).

Does MSVC have builtins for these? Do all compilers we support?

/Tomas


Re: [FFmpeg-devel] [PATCH 01/10, v3] avutil: add hwcontext_amf.

2024-05-30 Thread Andreas Rheinhardt
Dmitrii Ovchinnikov:
> Adds hwcontext_amf, which allows to use shared AMF
> context for the encoder, decoder and AMF-based filters,
> without copy to the host memory.
> It will also allow you to use some optimisations in
> the interaction of components (for example, SAV) and make a more
> manageable and optimal setup for using GPU devices with AMF
> in the case of a fully AMF pipeline.
> It will be a significant performance uplift when full AMF pipeline
> with filters is used.
> 
> We also plan to add Compression artefact removal filter in near feature.
> v2: cleanup header files
> v3: an unnecessary class has been removed.
> ---
>  libavutil/Makefile |   4 +
>  libavutil/hwcontext.c  |   4 +
>  libavutil/hwcontext.h  |   1 +
>  libavutil/hwcontext_amf.c  | 585 +
>  libavutil/hwcontext_amf.h  |  64 
>  libavutil/hwcontext_amf_internal.h |  44 +++
>  libavutil/hwcontext_internal.h |   1 +
>  libavutil/pixdesc.c|   4 +
>  libavutil/pixfmt.h |   5 +
>  9 files changed, 712 insertions(+)
>  create mode 100644 libavutil/hwcontext_amf.c
>  create mode 100644 libavutil/hwcontext_amf.h
>  create mode 100644 libavutil/hwcontext_amf_internal.h
> 
> diff --git a/libavutil/Makefile b/libavutil/Makefile
> index 6e6fa8d800..13c318560d 100644
> --- a/libavutil/Makefile
> +++ b/libavutil/Makefile
> @@ -45,6 +45,7 @@ HEADERS = adler32.h 
> \
>hwcontext_d3d12va.h   \
>hwcontext_drm.h   \
>hwcontext_dxva2.h \
> +  hwcontext_amf.h   \
>hwcontext_qsv.h   \
>hwcontext_mediacodec.h\
>hwcontext_opencl.h\
> @@ -196,6 +197,7 @@ OBJS-$(CONFIG_CUDA) += 
> hwcontext_cuda.o
>  OBJS-$(CONFIG_D3D11VA)  += hwcontext_d3d11va.o
>  OBJS-$(CONFIG_D3D12VA)  += hwcontext_d3d12va.o
>  OBJS-$(CONFIG_DXVA2)+= hwcontext_dxva2.o
> +OBJS-$(CONFIG_AMF)  += hwcontext_amf.o
>  OBJS-$(CONFIG_LIBDRM)   += hwcontext_drm.o
>  OBJS-$(CONFIG_MACOS_KPERF)  += macos_kperf.o
>  OBJS-$(CONFIG_MEDIACODEC)   += hwcontext_mediacodec.o
> @@ -220,6 +222,8 @@ SKIPHEADERS-$(CONFIG_CUDA) += 
> hwcontext_cuda_internal.h \
>  SKIPHEADERS-$(CONFIG_D3D11VA)  += hwcontext_d3d11va.h
>  SKIPHEADERS-$(CONFIG_D3D12VA)  += hwcontext_d3d12va.h
>  SKIPHEADERS-$(CONFIG_DXVA2)+= hwcontext_dxva2.h
> +SKIPHEADERS-$(CONFIG_AMF)  += hwcontext_amf.h   \
> +  hwcontext_amf_internal
>  SKIPHEADERS-$(CONFIG_QSV)  += hwcontext_qsv.h
>  SKIPHEADERS-$(CONFIG_OPENCL)   += hwcontext_opencl.h
>  SKIPHEADERS-$(CONFIG_VAAPI)+= hwcontext_vaapi.h
> diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c
> index fa99a0d8a4..f06d49c45c 100644
> --- a/libavutil/hwcontext.c
> +++ b/libavutil/hwcontext.c
> @@ -65,6 +65,9 @@ static const HWContextType * const hw_table[] = {
>  #endif
>  #if CONFIG_VULKAN
>  &ff_hwcontext_type_vulkan,
> +#endif
> +#if CONFIG_AMF
> +&ff_hwcontext_type_amf,
>  #endif
>  NULL,
>  };
> @@ -82,6 +85,7 @@ static const char *const hw_type_names[] = {
>  [AV_HWDEVICE_TYPE_VIDEOTOOLBOX] = "videotoolbox",
>  [AV_HWDEVICE_TYPE_MEDIACODEC] = "mediacodec",
>  [AV_HWDEVICE_TYPE_VULKAN] = "vulkan",
> +[AV_HWDEVICE_TYPE_AMF] = "amf",
>  };
>  
>  typedef struct FFHWDeviceContext {
> diff --git a/libavutil/hwcontext.h b/libavutil/hwcontext.h
> index bac30debae..96042ba197 100644
> --- a/libavutil/hwcontext.h
> +++ b/libavutil/hwcontext.h
> @@ -38,6 +38,7 @@ enum AVHWDeviceType {
>  AV_HWDEVICE_TYPE_MEDIACODEC,
>  AV_HWDEVICE_TYPE_VULKAN,
>  AV_HWDEVICE_TYPE_D3D12VA,
> +AV_HWDEVICE_TYPE_AMF,
>  };
>  
>  /**
> diff --git a/libavutil/hwcontext_amf.c b/libavutil/hwcontext_amf.c
> new file mode 100644
> index 00..1c589669e1
> --- /dev/null
> +++ b/libavutil/hwcontext_amf.c
> @@ -0,0 +1,585 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Pu

Re: [FFmpeg-devel] [PATCH 1/5] lavu/common.h: Fix UB in av_clipl_int32_c()

2024-05-30 Thread Tomas Härdin
tor 2024-05-30 klockan 14:50 +0300 skrev Rémi Denis-Courmont:
> 
> 
> Le 30 mai 2024 12:40:20 GMT+03:00, "Tomas Härdin"  a
> écrit :
> > tor 2024-05-30 klockan 09:41 +0300 skrev Rémi Denis-Courmont:
> > > Hi,
> > > 
> > > Le 30 mai 2024 01:13:14 GMT+03:00, "Tomas Härdin"
> > >  a
> > > écrit :
> > > > The entire patchset passes FATE
> > > 
> > > Is the version in riscv/intmath.h safe? It looks to me that the
> > > GCC
> > > codegen for not only RV64 but also AArch{32,64} and x86-64 is
> > > better
> > > than this.
> > 
> > I haven't checked. It seems weird to me to have two different C
> > versions.
> 
> The common one ends up horrendously bad on RV, and presumably on MIPS
> and some other RISC ISA.
> 
> > We shouldn't rely on type punning.
> 
> Because?
> 
> We should depend on punning as long as it conforms to the standard.

My mistake, I forgot type punning is allowed in C. It's UB in C++

> > The standard compliant way
> > is to use memcpy()
> 
> That's way worse than union in terms of how proactively the compiler
> needs to optimise, and both approaches are as confirming.

A good compiler will do the same thing

Maybe I can get the riscv version covered by Eva as well. That's beyond
the scope of this patchset

/Tomas


Re: [FFmpeg-devel] [PATCH 02/10, v2] avcodec: add amfdec.

2024-05-30 Thread Dmitrii Ovchinnikov
In the new version, I removed the unnecessary includes and cleaned up the code
(https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=11968).
It didn't seem to help. On my local computer the build succeeds in every
configuration that I have tried.
The file mentioned in the error is also already used by the encoder.
Could you share the log and the output of the configure command?
I will try to work out how your setup differs from mine.


Re: [FFmpeg-devel] [PATCH 1/2] lavc/speedhqdec: Add AV_CODEC_CAP_SLICE_THREADS

2024-05-30 Thread Tomas Härdin
Ping

Will push in a couple of days

/Tomas


Re: [FFmpeg-devel] [PATCH] lavc/speedhqenc: Require width to be a multiple of 16

2024-05-30 Thread Tomas Härdin
Ping

This stops us from producing broken output

/Tomas


Re: [FFmpeg-devel] [PATCH 02/10, v2] avcodec: add amfdec.

2024-05-30 Thread Andreas Rheinhardt
Dmitrii Ovchinnikov:
> In the new version, I removed unnecessary includes and cleaned up the code
> (https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=11968).
> It didn't seem to help. On my local computer build is successful in all
> configurations that I have tried.
> The file mentioned in the error is also already used in the encoder.
> Could you share the log and output of the configure command?
> I will try to understand the difference from my case.

I do not see a configure dependency of your AMF decoders on AMF, so they
will be built even on systems without AMF. (Does "all the configurations
that you tried" include such a configuration? It should.)

- Andreas



Re: [FFmpeg-devel] [PATCH 1/5] lavu/common.h: Fix UB in av_clipl_int32_c()

2024-05-30 Thread Rémi Denis-Courmont


On 30 May 2024 17:07:21 GMT+03:00, "Tomas Härdin" wrote:
>> We should depend on punning as long as it conforms to the standard.
>
>My mistake, I forgot type punning is allowed in C. It's UB in C++
>
>> > The standard compliant way
>> > is to use memcpy()
>> 
>> That's way worse than union in terms of how proactively the compiler
>> needs to optimise, and both approaches are as confirming.
>
>A good compiler will do the same thing

True, and I don't care very much about memcpy vs union, as they both rely on 
matching representation. AFAIR, FFmpeg tends to use unions though.

>
>Maybe I can get the riscv version covered by Eva as well. That's beyond
>the scope of this patchset

IMHO, this specific patch (and the following one) are beating dead horses. Sure 
there may be theoretical UB in the current code, but if there is a *better* 
implementation, better switch to that than bike shedding the fix for the UB.
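
For readers following the union-versus-memcpy() point, here is a minimal sketch of both forms of type punning. The helper names are hypothetical and are not FFmpeg API, and the sketch assumes a 32-bit IEEE 754 float. Both functions rely on the two types having matching representation, as noted above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration only; not taken from FFmpeg. */

/* Reinterpret the bits of a float through a union.
 * Reading a member other than the one last written is allowed in C,
 * but is undefined behaviour in C++. */
static uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;

    assert(sizeof(float) == sizeof(uint32_t)); /* assumption of this sketch */
    pun.f = f;
    return pun.u;
}

/* Same reinterpretation through memcpy(); valid in both C and C++.
 * An optimising compiler typically turns this into a plain register move. */
static uint32_t float_bits_memcpy(float f)
{
    uint32_t u;

    assert(sizeof(float) == sizeof(uint32_t)); /* assumption of this sketch */
    memcpy(&u, &f, sizeof(u));
    return u;
}
```

Either helper should return the same value for the same input; which one to prefer is mostly a matter of project convention.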


Re: [FFmpeg-devel] [PATCH 4/5] lavu/intmath.h: Fix UB in ff_ctz_c() and ff_ctzll_c()

2024-05-30 Thread Rémi Denis-Courmont


On 30 May 2024 17:03:09 GMT+03:00, "Tomas Härdin" wrote:
>> I don't get how that prevents using the GCC and Clang builtins (on
>> GCC and Clang).
>
>Does MSVC have builtins for these?

I don't know, but insofar as MSVC is used for x86, it should use x86 
instructions rather than the complex fallback algo anyway, be it via built-ins, 
or assembler.

Either way, I don't see how that detracts from using the built-ins on compilers 
that do have them.

> Do all compilers we support?

No, unless all other compilers are C23 (CTZ and CLZ are in stdbit.h). Again, 
that's hardly a reason not to use built-ins where available.
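
A minimal sketch of what "use built-ins where available" could look like, assuming GCC or Clang for the built-in path and a plain loop elsewhere. The function name ctz32 is hypothetical, and this is not the code from the patch or from intmath.h.

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing zero bits of a non-zero 32-bit value.
 * Hypothetical helper for illustration; the result for v == 0 is
 * undefined for the built-in, so the precondition is asserted. */
static int ctz32(uint32_t v)
{
    assert(v != 0);
#if defined(__GNUC__) || defined(__clang__)
    /* The compiler picks a native instruction when the target has one. */
    return __builtin_ctz(v);
#else
    /* Portable fallback using only unsigned arithmetic. */
    int n = 0;
    while (!(v & 1u)) {
        v >>= 1;
        n++;
    }
    return n;
#endif
}
```

The fallback branch only matters for compilers without the built-in, which is the case the thread leaves open.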


Re: [FFmpeg-devel] [PATCH 01/10, v2] avutil: add hwcontext_amf.

2024-05-30 Thread Dmitrii Ovchinnikov
>> This seems to have forgotten to actually allocate anything?

I made an empty allocation here, since in fact all allocation
takes place in the decoder.

>> This makes it look like you really wanted to implement map_from,
>> not transfer_data_from.

These functions were conceived specifically for transferring data
to and from the host memory. Memory mapping functions will probably
be added in the future.

>> libavutil cannot depend on libavformat, that would be circular.

In the new version, I have removed this and some other unnecessary includes.
(https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=11968)

>> Some of these details look like they should be in the public
>> hwcontext so that a user can create one.

In the new version, I removed the additional class and put the
functions that the user might need in the header.
I hope everything is better now.


Re: [FFmpeg-devel] [PATCH 01/10, v3] avutil: add hwcontext_amf.

2024-05-30 Thread Lynne via ffmpeg-devel

On 30/05/2024 15:08, Dmitrii Ovchinnikov wrote:
> Adds hwcontext_amf, which allows to use shared AMF
> context for the encoder, decoder and AMF-based filters,
> without copy to the host memory.
> It will also allow you to use some optimisations in
> the interaction of components (for example, SAV) and make a more
> manageable and optimal setup for using GPU devices with AMF
> in the case of a fully AMF pipeline.
> It will be a significant performance uplift when full AMF pipeline
> with filters is used.
>
> We also plan to add Compression artefact removal filter in near future.
> v2: cleanup header files
> v3: an unnecessary class has been removed.
> ---
>   libavutil/Makefile |   4 +
>   libavutil/hwcontext.c  |   4 +
>   libavutil/hwcontext.h  |   1 +
>   libavutil/hwcontext_amf.c  | 585 +
>   libavutil/hwcontext_amf.h  |  64 
>   libavutil/hwcontext_amf_internal.h |  44 +++
>   libavutil/hwcontext_internal.h |   1 +
>   libavutil/pixdesc.c|   4 +
>   libavutil/pixfmt.h |   5 +
>   9 files changed, 712 insertions(+)
>   create mode 100644 libavutil/hwcontext_amf.c
>   create mode 100644 libavutil/hwcontext_amf.h
>   create mode 100644 libavutil/hwcontext_amf_internal.h

Still no answer to my question?




Re: [FFmpeg-devel] [PATCH 02/10, v2] avcodec: add amfdec.

2024-05-30 Thread Dmitrii Ovchinnikov
>> I do not see a configure dependency of your AMF decoders on AMF

Thanks. Apparently I always built it together with the encoder and therefore
did not hit this error. I'll fix the dependencies and check the configurations
without AMF.


[FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-05-30 Thread uk7b
From: sunyuechi 

Since len < 64, the registers are sufficient, so it can be
directly unrolled (a4 is even).

Another benefit of unrolling is that it reduces one load operation
vertically compared to horizontally.

                                 old               new
                             C908    X60       C908    X60
vp8_put_bilin4_h_c         :  6.2    5.5   :    6.2    5.5
vp8_put_bilin4_h_rvv_i32   :  2.2    2.0   :    1.5    1.5
vp8_put_bilin4_v_c         :  6.5    5.7   :    6.2    5.7
vp8_put_bilin4_v_rvv_i32   :  2.2    2.0   :    1.2    1.5
vp8_put_bilin8_h_c         : 24.2   21.5   :   24.2   21.5
vp8_put_bilin8_h_rvv_i32   :  5.2    4.7   :    3.5    3.5
vp8_put_bilin8_v_c         : 24.5   21.7   :   24.5   21.7
vp8_put_bilin8_v_rvv_i32   :  5.2    4.7   :    3.5    3.2
vp8_put_bilin16_h_c        : 48.0   42.7   :   48.0   42.7
vp8_put_bilin16_h_rvv_i32  :  5.7    5.0   :    5.2    4.5
vp8_put_bilin16_v_c        : 48.2   43.0   :   48.2   42.7
vp8_put_bilin16_v_rvv_i32  :  5.7    5.2   :    4.5    4.2
---
 libavcodec/riscv/vp8dsp_rvv.S | 34 +-
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
index 3360a38cac..5bea6cba9c 100644
--- a/libavcodec/riscv/vp8dsp_rvv.S
+++ b/libavcodec/riscv/vp8dsp_rvv.S
@@ -172,11 +172,35 @@ func ff_put_vp8_bilin4_\type\()_rvv, zve32x
         li              t4, 4
         sub             t1, t1, \mn
 1:
-        addi            a4, a4, -1
-        bilin_load      v0, \type, \mn
-        vse8.v          v0, (a0)
-        add             a2, a2, a3
-        add             a0, a0, a1
+        add             t0, a2, a3
+        add             t2, a0, a1
+        addi            a4, a4, -2
+.ifc \type,v
+        add             t3, t0, a3
+.else
+        addi            t5, a2, 1
+        addi            t3, t0, 1
+        vle8.v          v2, (t5)
+.endif
+        vle8.v          v0, (a2)
+        vle8.v          v4, (t0)
+        vle8.v          v6, (t3)
+        vwmulu.vx       v28, v0, t1
+        vwmulu.vx       v26, v4, t1
+.ifc \type,v
+        vwmaccu.vx      v28, \mn, v4
+.else
+        vwmaccu.vx      v28, \mn, v2
+.endif
+        vwmaccu.vx      v26, \mn, v6
+        vwaddu.wx       v24, v28, t4
+        vwaddu.wx       v22, v26, t4
+        vnsra.wi        v30, v24, 3
+        vnsra.wi        v0, v22, 3
+        vse8.v          v30, (a0)
+        vse8.v          v0, (t2)
+        add             a2, t0, a3
+        add             a0, t2, a1
         bnez            a4, 1b
 
         ret
-- 
2.45.1



Re: [FFmpeg-devel] [PATCH 1/5] lavu/common.h: Fix UB in av_clipl_int32_c()

2024-05-30 Thread Tomas Härdin
Thu 2024-05-30 at 17:28 +0300, Rémi Denis-Courmont wrote:
> On 30 May 2024 17:07:21 GMT+03:00, "Tomas Härdin" wrote:
> > > We should depend on punning as long as it conforms to the
> > > standard.
> > 
> > My mistake, I forgot type punning is allowed in C. It's UB in C++
> > 
> > > > The standard compliant way
> > > > is to use memcpy()
> > > 
> > > That's way worse than union in terms of how proactively the
> > > compiler
> > > needs to optimise, and both approaches are as confirming.
> > 
> > A good compiler will do the same thing
> 
> True, and I don't care very much about memcpy vs union, as they both
> rely on matching representation. AFAIR, FFmpeg tends to use unions
> though.
> 
> > 
> > Maybe I can get the riscv version covered by Eva as well. That's
> > beyond
> > the scope of this patchset
> 
> IMHO, this specific patch (and the following one) are beating dead
> horses. Sure there may be theoretical UB in the current code, but if
> there is a *better* implementation, better switch to that than bike
> shedding the fix for the UB.

Are you saying that UB is acceptable? You know the compiler is free to
assume signed arithmetic doesn't overflow, right? If so then what other
UB might we accept?

/Tomas


Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-05-30 Thread flow gg
I directly copied the VP9 modifications over... Since len <= 16, it seems
like it can be improved a bit more



Re: [FFmpeg-devel] [PATCH 1/5] lavu/common.h: Fix UB in av_clipl_int32_c()

2024-05-30 Thread Rémi Denis-Courmont
On Thursday, 30 May 2024, 18:32:19 EEST, Tomas Härdin wrote:
> Are you saying that UB is acceptable?

Are you imitating Thilo and grand-standing by putting words in my mouth?

Yes and so -1 for you.

-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 1/5] lavu/common.h: Fix UB in av_clipl_int32_c()

2024-05-30 Thread James Almer

On 5/30/2024 12:32 PM, Tomas Härdin wrote:
> Thu 2024-05-30 at 17:28 +0300, Rémi Denis-Courmont wrote:
> > On 30 May 2024 17:07:21 GMT+03:00, "Tomas Härdin" wrote:
> > > > We should depend on punning as long as it conforms to the
> > > > standard.
> > >
> > > My mistake, I forgot type punning is allowed in C. It's UB in C++
> > >
> > > > > The standard compliant way
> > > > > is to use memcpy()
> > > >
> > > > That's way worse than union in terms of how proactively the
> > > > compiler needs to optimise, and both approaches are as confirming.
> > >
> > > A good compiler will do the same thing
> >
> > True, and I don't care very much about memcpy vs union, as they both
> > rely on matching representation. AFAIR, FFmpeg tends to use unions
> > though.
> >
> > > Maybe I can get the riscv version covered by Eva as well. That's
> > > beyond the scope of this patchset
> >
> > IMHO, this specific patch (and the following one) are beating dead
> > horses. Sure there may be theoretical UB in the current code, but if
> > there is a *better* implementation, better switch to that than bike
> > shedding the fix for the UB.
>
> Are you saying that UB is acceptable? You know the compiler is free to
> assume signed arithmetic doesn't overflow, right? If so then what other
> UB might we accept?

He did not say that... He said we should switch to a better 
implementation rather than trying to fix the existing potentially buggy one.



Re: [FFmpeg-devel] [PATCH] lavc/vp8dsp: R-V V put_bilin_h v unroll

2024-05-30 Thread flow gg
Well.. because scalar registers are limited, the direct unrolling will be
like this for now. We can handle different lengths separately in the future



Re: [FFmpeg-devel] [PATCHv4] checkasm/lpc: test compute_autocorr

2024-05-30 Thread Rémi Denis-Courmont
On Wednesday, 29 May 2024, 23:52:20 EEST, James Almer wrote:
> On 5/29/2024 4:42 PM, Rém
> >   void checkasm_check_lpc(void)
> >   {
> >   
> >   LPCContext ctx;
> > 
> > -int len = rnd() % 5000;
> > +int len = 2000 + (rnd() % 1500) * 2;
> 
> Instead of changing how len is generated, which will break known
> existing results for specific seeds in other tests, alter the value when
> passing it to test_compute_autocorr(), like
> apply_welch_window_{even,odd}() do.

Existing benchmarks are unstable and unreliable because of the varying length. 
This was supposed to be contained by the 2000 minimum, but that means the 
benchmarks are invalidated regardless of the parity change.
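
To make the two options concrete, here is a small sketch of drawing a length with a floor of 2000 and adjusting parity only at the call site, as suggested in the quoted review. The helper names are hypothetical, and the raw random value is passed in as a parameter so the sketch does not depend on checkasm's rnd().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of checkasm. */

/* Map a raw random value to a length in [2000, 4999]. */
static int pick_len(uint32_t r)
{
    return 2000 + (int)(r % 3000);
}

/* Clear the lowest bit: rounds down to the nearest even length. */
static int force_even(int len)
{
    assert(len >= 0);
    return len & ~1;
}

/* Set the lowest bit: rounds up to the nearest odd length. */
static int force_odd(int len)
{
    assert(len >= 0);
    return len | 1;
}
```

With this split, the sequence of random draws stays the same for every test, and each test picks the parity it needs from the same base length.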

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





[FFmpeg-devel] [PATCHv5] checkasm/lpc: test compute_autocorr

2024-05-30 Thread Rémi Denis-Courmont
Also restrict length to even values as per (questionable) assumption in
the reference C code.
---
 tests/checkasm/lpc.c | 59 +---
 1 file changed, 56 insertions(+), 3 deletions(-)

diff --git a/tests/checkasm/lpc.c b/tests/checkasm/lpc.c
index 592e34c03d..f9f3d84080 100644
--- a/tests/checkasm/lpc.c
+++ b/tests/checkasm/lpc.c
@@ -16,6 +16,7 @@
  * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
  */
 
+#include "libavutil/avassert.h"
 #include "libavutil/mem_internal.h"
 
 #include "libavcodec/lpc.h"
@@ -30,7 +31,7 @@
 } \
 } while (0)
 
-#define EPS 0.005
+#define EPS 0.0001
 
 static void test_window(int len)
 {
@@ -57,10 +58,51 @@ static void test_window(int len)
     bench_new(src, len, dst1);
 }
 
+#if !ARCH_X86
+static void test_compute_autocorr(ptrdiff_t len, int lag)
+{
+    const double eps = EPS * (double)len;
+    LOCAL_ALIGNED(32, double, src, [5000 + 2 + MAX_LPC_ORDER]);
+    LOCAL_ALIGNED(16, double, dst0, [MAX_LPC_ORDER + 1]);
+    LOCAL_ALIGNED(16, double, dst1, [MAX_LPC_ORDER + 1]);
+
+    declare_func(void, const double *in, ptrdiff_t len, int lag, double *out);
+
+    av_assert0(lag >= 0 && lag <= MAX_LPC_ORDER);
+
+    for (int i = 0; i < MAX_LPC_ORDER; i++)
+        src[i] = 0.;
+
+    src += MAX_LPC_ORDER;
+
+    for (int i = 0; i < 5000 + 2; i++) {
+        src[i] = (double)rnd() / (double)UINT_MAX;
+    }
+
+    call_ref(src, len, lag, dst0);
+    call_new(src, len, lag, dst1);
+
+    for (size_t i = 0; i <= lag; i++) {
+        if (!double_near_abs_eps(dst0[i], dst1[i], eps)) {
+            fprintf(stderr, "%zu: %- .12f - %- .12f = % .12g\n",
+                    i, dst0[i], dst1[i], dst0[i] - dst1[i]);
+            fail();
+            break;
+        }
+    }
+
+    bench_new(src, 4608, lag, dst1);
+}
+#endif
+
 void checkasm_check_lpc(void)
 {
     LPCContext ctx;
-    int len = rnd() % 5000;
+    int len = 2000 + rnd() % 3000;
+#if !ARCH_X86
+    static const int lags[] = { 8, 12, };
+#endif
+
     ff_lpc_init(&ctx, 32, 16, FF_LPC_TYPE_DEFAULT);
 
     if (check_func(ctx.lpc_apply_welch_window, "apply_welch_window_even")) {
@@ -72,6 +114,17 @@ void checkasm_check_lpc(void)
         test_window(len | 1);
     }
     report("apply_welch_window_odd");
-
     ff_lpc_end(&ctx);
+
+#if !ARCH_X86
+    for (size_t i = 0; i < FF_ARRAY_ELEMS(lags); i++) {
+        ff_lpc_init(&ctx, len, lags[i], FF_LPC_TYPE_DEFAULT);
+        if (check_func(ctx.lpc_compute_autocorr, "autocorr_%d", lags[i])) {
+            test_compute_autocorr(len & ~1, lags[i]);
+            /*TODO: test_compute_autocorr(len | 1, lags[i]);*/
+        }
+        ff_lpc_end(&ctx);
+    }
+    report("compute_autocorr");
+#endif
 }
-- 
2.45.1



Re: [FFmpeg-devel] [PATCH 01/10, v3] avutil: add hwcontext_amf.

2024-05-30 Thread Dmitrii Ovchinnikov
DX12 and Vulkan native encoders will expose fewer features compared to AMF,
at least in the foreseeable future. The missing features include low latency,
PreAnalysis including look-ahead, etc. The AMF context on Windows allows fully
enabling SAV: the ability to utilize VCNs in the dGPU and APU in a single session.
AMF components, including the encoder and decoder, have some internal optimizations
in the area of memory access for APUs that are not available in standard
3D APIs.

Eventually, specialized multimedia AMD cards could be added seamlessly to
FFmpeg with AMF integration.

AMF FSR (VSR) includes a YUV version focused on video, which is not
available in AMD FSR, which is aimed at gaming.

More advanced filters that are not available in standard 3D APIs are coming.


Re: [FFmpeg-devel] Empty arch/ directories

2024-05-30 Thread Rémi Denis-Courmont
On Wednesday, 29 May 2024, 19:59:56 EEST, Sean McGovern wrote:
> Hi,
> 
> It is not likely we will get anyone to step up to do DSP work for arguably
> dead architectures like SPARC, Blackfin, etc.

Super-H and Alpha.

> Maybe is it time to remove those directories that just contain a README now?

Yes, I think that it is overdue. This Sunday marks the tenth anniversary of 
Blackfin's removal. Alpha was mostly removed the previous year, but with the 
leftovers left to bit rot. The other two were removed a little earlier than 
Blackfin.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





[FFmpeg-devel] [PATCH v2 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow

2024-05-30 Thread toqsxw
From: Wu Jianhua 

Some tests fail with certain seeds

tests/checkasm/checkasm 2325607578 --test=vvc_alf
checkasm: using random seed 2325607578
AVX2:
vvc_alf_filter_luma_120x20_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x24_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x28_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x32_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x36_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x40_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x44_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x48_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x52_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x56_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x60_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x64_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x68_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x72_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x76_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x80_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x84_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x88_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x92_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x96_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x100_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x104_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x108_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x112_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x116_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x120_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x124_12_avx2 (vvc_alf.c:104)
vvc_alf_filter_luma_120x128_12_avx2 (vvc_alf.c:104)
  - vvc_alf.alf_filter   [FAILED]
  - vvc_alf.alf_classify [OK]
checkasm: 28 of 9216 tests have failed

Reported-by: James Almer 
Signed-off-by: Wu Jianhua 
---
 libavcodec/x86/vvc/vvc_alf.asm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libavcodec/x86/vvc/vvc_alf.asm b/libavcodec/x86/vvc/vvc_alf.asm
index 71e821c27b..f7b3e2a6cc 100644
--- a/libavcodec/x86/vvc/vvc_alf.asm
+++ b/libavcodec/x86/vvc/vvc_alf.asm
@@ -356,7 +356,8 @@ SECTION .text
 
 FILTER_VB xq
 
-paddw m0, m2
+; sum += curr
+paddsw m0, m2
 
 ; clip to pixel
 CLIPW m0, m14, m15
-- 
2.44.0.windows.1



[FFmpeg-devel] [PATCH v2 2/3] avcodec/x86/vvc/vvc_alf: use xq to match ptrdiff_t

2024-05-30 Thread toqsxw
From: Wu Jianhua 

Signed-off-by: Wu Jianhua 
---
 libavcodec/x86/vvc/vvc_alf.asm | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libavcodec/x86/vvc/vvc_alf.asm b/libavcodec/x86/vvc/vvc_alf.asm
index f7b3e2a6cc..b35dd9b0e9 100644
--- a/libavcodec/x86/vvc/vvc_alf.asm
+++ b/libavcodec/x86/vvc/vvc_alf.asm
@@ -409,7 +409,7 @@ cglobal vvc_alf_filter_%2_%1bpc, 11, 15, 16, 0-0x28, dst, dst_stride, src, src_s
 .loop:
     push            srcq
     push            dstq
-    xor               xd, xd
+    xor               xq, xq
 
 .loop_w:
     LOAD_PARAMS
@@ -417,8 +417,8 @@ cglobal vvc_alf_filter_%2_%1bpc, 11, 15, 16, 0-0x28, dst, dst_stride, src, src_s
 
     add             srcq, 16 * ps
     add             dstq, 16 * ps
-    add               xd, 16
-    cmp               xd, widthd
+    add               xq, 16
+    cmp               xq, widthq
     jl           .loop_w
 
     pop             dstq
@@ -427,7 +427,7 @@ cglobal vvc_alf_filter_%2_%1bpc, 11, 15, 16, 0-0x28, dst, dst_stride, src, src_s
     lea             dstq, [dstq + 4 * dst_strideq]
 
     lea          filterq, [filterq + 2 * strideq]
-    lea            clipq, [clipq + 2 * strideq]
+    lea            clipq, [clipq   + 2 * strideq]
 
     sub          vb_posq, 4
     sub          heightq, 4
-- 
2.44.0.windows.1



[FFmpeg-devel] [PATCH v2 3/3] tests/checkasm/vvc_alf: change alf step size to 8

2024-05-30 Thread toqsxw
From: Wu Jianhua 

From Benjamin Bross:
> for ALF where functions are in increments of 4 while 8 should be sufficient
> according to the spec.

Signed-off-by: Wu Jianhua 
---
 tests/checkasm/vvc_alf.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/checkasm/vvc_alf.c b/tests/checkasm/vvc_alf.c
index f35fd2cd3e..84b0f9da15 100644
--- a/tests/checkasm/vvc_alf.c
+++ b/tests/checkasm/vvc_alf.c
@@ -90,8 +90,8 @@ static void check_alf_filter(VVCDSPContext *c, const int bit_depth)
     randomize_buffers2(filter, LUMA_PARAMS_SIZE, 1);
     randomize_buffers2(clip, LUMA_PARAMS_SIZE, 0);
 
-    for (int h = 4; h <= MAX_CTU_SIZE; h += 4) {
-        for (int w = 4; w <= MAX_CTU_SIZE; w += 4) {
+    for (int h = 4; h <= MAX_CTU_SIZE; h += 8) {
+        for (int w = 4; w <= MAX_CTU_SIZE; w += 8) {
             const int ctu_size = MAX_CTU_SIZE;
             if (check_func(c->alf.filter[LUMA], "vvc_alf_filter_luma_%dx%d_%d", w, h, bit_depth)) {
                 const int vb_pos = ctu_size - ALF_VB_POS_ABOVE_LUMA;
@@ -142,8 +142,8 @@ static void check_alf_classify(VVCDSPContext *c, const int bit_depth)
 
     randomize_buffers(src0, src1, SRC_BUF_SIZE);
 
-    for (int h = 4; h <= MAX_CTU_SIZE; h += 4) {
-        for (int w = 4; w <= MAX_CTU_SIZE; w += 4) {
+    for (int h = 4; h <= MAX_CTU_SIZE; h += 8) {
+        for (int w = 4; w <= MAX_CTU_SIZE; w += 8) {
             const int id_size = w * h / ALF_BLOCK_SIZE / ALF_BLOCK_SIZE * sizeof(int);
             const int vb_pos  = MAX_CTU_SIZE - ALF_BLOCK_SIZE;
             if (check_func(c->alf.classify, "vvc_alf_classify_%dx%d_%d", w, h, bit_depth)) {
-- 
2.44.0.windows.1



[FFmpeg-devel] Re: [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow

2024-05-30 Thread Wu Jianhua
Ronald S. Bultje:
> From: Ronald S. Bultje
> Sent: May 29, 2024 13:56
> To: Wu Jianhua
> Cc: FFmpeg development discussions and patches; Nuo Mi; James Almer
> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow
>
> Hi,
>
> On Wed, May 29, 2024 at 3:44 PM Wu Jianhua <toq...@outlook.com> wrote:
> > Ronald S. Bultje:
> > > On Wed, May 29, 2024 at 11:38 AM <toq...@outlook.com> wrote:
> > > > +%else
> > > > +vpunpcklqdq  m11, m2, m2
> > > > +vpunpckhqdq  m12, m2, m2
> > > > +vpunpcklwd   m11, m11, m14
> > > > +vpunpcklwd   m12, m12, m14
> > > > +paddd m0, m11
> > > > +paddd m1, m12
> > > > +packssdw  m0, m0, m1
> > > > +%endif
> > >
> > > [..]
> > > Also, the whole thing just emulates a saturated add. Can't you use
> > > paddsw instead of paddw and be done with it? To add to Andreas'
> > > question: is saturating here normatively required?
> >
> > We didn't have any sample that failed for this issue except for the
> > checksum with specific seeds. I think we can keep not changing it until
> > a real sample has something wrong.
> >
> > @Nuomi to get more details.
>
> I think "just" replacing paddw with paddsw is correct, since the input pixels
> are 12bit (so they could be either unsigned or signed), the filtered output
> is the result of packssdw (so signed words), and the desired output is 12bit
> pixels anyway, anything greater than that is clipped to 12bit range. So to
> me, it seems paddsw is a cheaper way to accomplish the same thing.
>
> Ronald

Hi Ronald,

Yes, it does. I've tested paddsw and everything works well. It should be the
cheapest fix, with minimal performance loss.

I have sent v2.

Thanks for this.
Jianhua
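
As a scalar illustration of why the saturating add is enough, here is a sketch that models paddw as a wrapping 16-bit add and paddsw as a saturating one, followed by a clip to the pixel range. The function names are hypothetical and the operands in the usage note are made up to force an overflow; they are not taken from a failing checkasm run.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar models for illustration; not FFmpeg code. */

/* Wrapping 16-bit add, modelling paddw. The final conversion back to
 * int16_t is implementation-defined in C, and wraps on common compilers. */
static int16_t add_wrap16(int16_t a, int16_t b)
{
    return (int16_t)(uint16_t)(a + b);
}

/* Saturating signed 16-bit add, modelling paddsw. */
static int16_t add_sat16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + (int32_t)b;

    if (s > INT16_MAX)
        return INT16_MAX;
    if (s < INT16_MIN)
        return INT16_MIN;
    return (int16_t)s;
}

/* Clip a value to [0, 2^bit_depth - 1]. */
static int clip_pixel(int v, int bit_depth)
{
    assert(bit_depth > 0 && bit_depth < 16);
    int max = (1 << bit_depth) - 1;

    return v < 0 ? 0 : v > max ? max : v;
}
```

With a filtered sum of 30000 and a 12-bit pixel of 4095, the wrapping add goes negative and clips to 0, while the saturating add stays at the top of the signed range and clips to 4095, which is the value the widened 32-bit computation would also produce.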


Re: [FFmpeg-devel] [PATCH 1/5] lavu/common.h: Fix UB in av_clipl_int32_c()

2024-05-30 Thread Tomas Härdin
Thu 2024-05-30 at 12:42 -0300, James Almer wrote:
> On 5/30/2024 12:32 PM, Tomas Härdin wrote:
> > Thu 2024-05-30 at 17:28 +0300, Rémi Denis-Courmont wrote:
> > > On 30 May 2024 17:07:21 GMT+03:00, "Tomas Härdin" wrote:
> > > > > We should depend on punning as long as it conforms to the
> > > > > standard.
> > > > 
> > > > My mistake, I forgot type punning is allowed in C. It's UB in
> > > > C++
> > > > 
> > > > > > The standard compliant way
> > > > > > is to use memcpy()
> > > > > 
> > > > > That's way worse than union in terms of how proactively the
> > > > > compiler
> > > > > needs to optimise, and both approaches are as confirming.
> > > > 
> > > > A good compiler will do the same thing
> > > 
> > > True, and I don't care very much about memcpy vs union, as they
> > > both
> > > rely on matching representation. AFAIR, FFmpeg tends to use
> > > unions
> > > though.
> > > 
> > > > 
> > > > Maybe I can get the riscv version covered by Eva as well.
> > > > That's
> > > > beyond
> > > > the scope of this patchset
> > > 
> > > IMHO, this specific patch (and the following one) are beating
> > > dead
> > > horses. Sure there may be theoretical UB in the current code, but
> > > if
> > > there is a *better* implementation, better switch to that than
> > > bike
> > > shedding the fix for the UB.
> > 
> > Are you saying that UB is acceptable? You know the compiler is free
> > to
> > assume signed arithmetic doesn't overflow, right? If so then what
> > other
> > UB might we accept?
> 
> He did not say that... He said we should switch to a better 
> implementation rather than trying to fix the existing potentially
> buggy one.

I have a fix for demonstrable UB and Rémi is problematizing it. It is
not a "theoretical" UB - that's not how UB works. Any compiler doing
basic value analysis will find it, and is therefore free to do whatever
it wants, for example deleting all calls to av_clipl_int32_c().
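
For illustration only, here is a minimal sketch of the kind of overflow being discussed. It assumes the clip tests the range by adding 2^31 to a signed 64-bit value; the function name is hypothetical and this is not the code from the patch. Doing the addition on uint64_t, where wraparound is defined, avoids the signed overflow that would otherwise occur for inputs close to INT64_MAX.

```c
#include <stdint.h>

/* Hypothetical helper, not FFmpeg's av_clipl_int32_c(): clip a 64-bit
 * value to the int32_t range without relying on signed overflow. */
static int32_t clip_int64_to_int32(int64_t a)
{
    /* The range test is done on uint64_t, where wraparound is well defined.
     * In signed arithmetic, a + 0x80000000 would overflow (undefined
     * behaviour) for any a > INT64_MAX - 0x80000000. */
    if (((uint64_t)a + UINT64_C(0x80000000)) & ~UINT64_C(0xFFFFFFFF))
        return a < 0 ? INT32_MIN : INT32_MAX;
    return (int32_t)a;
}
```

Values already inside the int32_t range map to 0..0xFFFFFFFF after the offset and are returned unchanged; everything else saturates according to its sign.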

We could certainly replace some of these functions with intrinsics, but
that's not what this patchset is about. I don't know what set of
compilers we support. I don't know what intrinsics they support. Am I
to be compelled to figure that out, and provide the necessary
intrinsics for all of them?

This may all seem trivial, and it is, but this patchset is also a test
balloon. Line struggle is important. What I see is the stalling of
fixes of *known broken code*. That is not encouraging.

/Tomas


Re: [FFmpeg-devel] [PATCH 01/10, v3] avutil: add hwcontext_amf.

2024-05-30 Thread Lynne via ffmpeg-devel

On 30/05/2024 18:06, Dmitrii Ovchinnikov wrote:
> DX12 and Vulkan native encoders will expose fewer features compared to AMF,
> at least in the foreseeable future. The missing features include low latency,

That's plainly not true.

> PreAnalysis including look-ahead etc. AMF context on Windows allows fully
> enabling SAV - ability to utilize VCNs in dGPU and APU in a single session.

You should try talking internally to learn what is in progress.

> AMF components including encoder and decoder have some internal optimizations
> in the area of memory access for APUs that are not available in standard
> 3D APIs.

This isn't OpenGL.

> Eventually specialized multimedia AMD cards could be added seamlessly to
> FFmpeg with AMF integration.
>
> AMF FSR(VSR) includes YUV version with focus on videos which is not
> available in AMD FSR aimed for gaming.

Why don't you open source it then?

> More advanced filters that are not available in standard 3D APIs are
> coming.

We could have them as Vulkan filters.

I'm not objecting to this patch, but I am concerned that it's more
proprietary code which is soon going to be redundant.

I will have to review it properly at some point.




Re: [FFmpeg-devel] [PATCH v3] avformat/nutdec: Don't create inconsistent side data

2024-05-30 Thread Michael Niedermayer
On Thu, May 30, 2024 at 02:14:20AM +0200, Andreas Rheinhardt wrote:
> Forgotten in 65ddc74988245a01421a63c5cffa4d900c47117c.
> 
> Signed-off-by: Andreas Rheinhardt 
> ---
>  libavformat/nutdec.c | 14 --
>  1 file changed, 4 insertions(+), 10 deletions(-)
> 
> diff --git a/libavformat/nutdec.c b/libavformat/nutdec.c
> index 0bb7f154db..34b7e3cb9a 100644
> --- a/libavformat/nutdec.c
> +++ b/libavformat/nutdec.c
> @@ -881,8 +881,6 @@ static int read_sm_data(AVFormatContext *s, AVIOContext 
> *bc, AVPacket *pkt, int
>  int count = ffio_read_varlen(bc);
>  int skip_start = 0;
>  int skip_end = 0;
> -int channels = 0;
> -int64_t channel_layout = 0;
>  int sample_rate = 0;
>  int width = 0;
>  int height = 0;
> @@ -930,7 +928,7 @@ static int read_sm_data(AVFormatContext *s, AVIOContext 
> *bc, AVPacket *pkt, int
>  AV_WB64(dst, v64);
>  dst += 8;
>  } else if (!strcmp(name, "ChannelLayout") && value_len == 8) {
> -channel_layout = avio_rl64(bc);
> +// Ignored
>  continue;
>  } else {
>  av_log(s, AV_LOG_WARNING, "Unknown data %s / %s\n", name, 
> type_str);
> @@ -952,7 +950,7 @@ static int read_sm_data(AVFormatContext *s, AVIOContext 
> *bc, AVPacket *pkt, int
>  } else if (!strcmp(name, "SkipEnd")) {
>  skip_end = value;
>  } else if (!strcmp(name, "Channels")) {
> -channels = value;
> +// Ignored
>  } else if (!strcmp(name, "SampleRate")) {
>  sample_rate = value;
>  } else if (!strcmp(name, "Width")) {
> @@ -965,18 +963,14 @@ static int read_sm_data(AVFormatContext *s, AVIOContext 
> *bc, AVPacket *pkt, int
>  }
>  }
>  
> -if (channels || channel_layout || sample_rate || width || height) {
> -uint8_t *dst = av_packet_new_side_data(pkt, 
> AV_PKT_DATA_PARAM_CHANGE, 28);
> +if (sample_rate || width || height) {
> +uint8_t *dst = av_packet_new_side_data(pkt, 
> AV_PKT_DATA_PARAM_CHANGE, 16);
>  if (!dst)
>  return AVERROR(ENOMEM);
>  bytestream_put_le32(&dst,
>  
> AV_SIDE_DATA_PARAM_CHANGE_SAMPLE_RATE*(!!sample_rate) +
>  
> AV_SIDE_DATA_PARAM_CHANGE_DIMENSIONS*(!!(width|height))
> );
> -if (channels)
> -bytestream_put_le32(&dst, channels);
> -if (channel_layout)
> -bytestream_put_le64(&dst, channel_layout);
>  if (sample_rate)
>  bytestream_put_le32(&dst, sample_rate);
>  if (width || height){

This would break mid stream changes to the channel layout & channels when it
is carried at format level only

The commit message also does not adequately explain why such mid stream changes
are ignored

thx

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Into a blind darkness they enter who follow after the Ignorance,
they as if into a greater darkness enter who devote themselves
to the Knowledge alone. -- Isha Upanishad




Re: [FFmpeg-devel] [PATCH 1/5] lavu/common.h: Fix UB in av_clipl_int32_c()

2024-05-30 Thread Rémi Denis-Courmont
On Thursday, 30 May 2024 at 19.48.13 EEST, Tomas Härdin wrote:
> > > Are you saying that UB is acceptable? You know the compiler is free
> > > to
> > > assume signed arithmetic doesn't overflow, right? If so then what
> > > other
> > > UB might we accept?
> > 
> > He did not say that... He said we should switch to a better
> > implementation rather than trying to fix the existing potentially
> > buggy one.
> 
> I have a fix for demonstrable UB and Rémi is problematizing it.

Andreas made cosmetic arguments against this patch before I had even seen the 
patch, let alone commented on it.

> It is not a "theoretical" UB - that's not how UB works.

It is a *theoretical* UB if you can not prove that it leads to misbehaviour in 
any *practical* use. In theory, all UB is *potentially* fatal. Emphasis on 
potentially.

So yes, while all UB instances are bad and deserve fixing, they are not all 
equally bad nor urgent. UB that is proven to lead to remote code execution is 
way worse than theoretical UB that has only been proven in literature, and is 
not known or even seriously suspected to lead to broken optimisations.

> Any compiler doing
> basic value analysis will find it, and is therefore free to do whatever
> it wants, for example deleting all calls to av_clipl_int32_c().

That is formally true. But it is also formally true that, by that same logic, 
since there is most certainly some UB instance left elsewhere in the codebase, 
the entirety of libavutil could be elided by the compiler. In other words, in 
theory, FFmpeg does not work at all. Does that mean that we should give up on 
the project here and now?

> We could certainly replace some of these functions with intrinsics, but
> that's not what this patchset is about.

I am not sure what your point is, because nobody said that av_clipl_int32_c() 
should be replaced by intrinsics.

> I don't know what set of compilers we support.

That is irrelevant since all C99, C11 and C23 compilers support the proposed 
substitute code as long as <stdint.h> defines int32_t.

> I don't know what intrinsics they support.

Also irrelevant.

> Am I to be compelled to figure that out, and provide the necessary
> intrinsics for all of them?

No, and you are the only person to have made an implication to the contrary as 
far as *this* patch is concerned.

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH v3] avformat/nutdec: Don't create inconsistent side data

2024-05-30 Thread Andreas Rheinhardt
Michael Niedermayer:
> On Thu, May 30, 2024 at 02:14:20AM +0200, Andreas Rheinhardt wrote:
>> Forgotten in 65ddc74988245a01421a63c5cffa4d900c47117c.
>>
>> Signed-off-by: Andreas Rheinhardt 
>> ---
>>  libavformat/nutdec.c | 14 --
>>  1 file changed, 4 insertions(+), 10 deletions(-)
>>
>> diff --git a/libavformat/nutdec.c b/libavformat/nutdec.c
>> index 0bb7f154db..34b7e3cb9a 100644
>> --- a/libavformat/nutdec.c
>> +++ b/libavformat/nutdec.c
>> @@ -881,8 +881,6 @@ static int read_sm_data(AVFormatContext *s, AVIOContext 
>> *bc, AVPacket *pkt, int
>>  int count = ffio_read_varlen(bc);
>>  int skip_start = 0;
>>  int skip_end = 0;
>> -int channels = 0;
>> -int64_t channel_layout = 0;
>>  int sample_rate = 0;
>>  int width = 0;
>>  int height = 0;
>> @@ -930,7 +928,7 @@ static int read_sm_data(AVFormatContext *s, AVIOContext 
>> *bc, AVPacket *pkt, int
>>  AV_WB64(dst, v64);
>>  dst += 8;
>>  } else if (!strcmp(name, "ChannelLayout") && value_len == 8) {
>> -channel_layout = avio_rl64(bc);
>> +// Ignored
>>  continue;
>>  } else {
>>  av_log(s, AV_LOG_WARNING, "Unknown data %s / %s\n", name, 
>> type_str);
>> @@ -952,7 +950,7 @@ static int read_sm_data(AVFormatContext *s, AVIOContext 
>> *bc, AVPacket *pkt, int
>>  } else if (!strcmp(name, "SkipEnd")) {
>>  skip_end = value;
>>  } else if (!strcmp(name, "Channels")) {
>> -channels = value;
>> +// Ignored
>>  } else if (!strcmp(name, "SampleRate")) {
>>  sample_rate = value;
>>  } else if (!strcmp(name, "Width")) {
>> @@ -965,18 +963,14 @@ static int read_sm_data(AVFormatContext *s, 
>> AVIOContext *bc, AVPacket *pkt, int
>>  }
>>  }
>>  
>> -if (channels || channel_layout || sample_rate || width || height) {
>> -uint8_t *dst = av_packet_new_side_data(pkt, 
>> AV_PKT_DATA_PARAM_CHANGE, 28);
>> +if (sample_rate || width || height) {
>> +uint8_t *dst = av_packet_new_side_data(pkt, 
>> AV_PKT_DATA_PARAM_CHANGE, 16);
>>  if (!dst)
>>  return AVERROR(ENOMEM);
>>  bytestream_put_le32(&dst,
>>  
>> AV_SIDE_DATA_PARAM_CHANGE_SAMPLE_RATE*(!!sample_rate) +
>>  
>> AV_SIDE_DATA_PARAM_CHANGE_DIMENSIONS*(!!(width|height))
>> );
>> -if (channels)
>> -bytestream_put_le32(&dst, channels);
>> -if (channel_layout)
>> -bytestream_put_le64(&dst, channel_layout);
>>  if (sample_rate)
>>  bytestream_put_le32(&dst, sample_rate);
>>  if (width || height){
> 
> This would break mid stream changes to the channel layout & channels when it
> is carried at format level only
> 
> The commit message also does not adequately explain why such mid stream 
> changes
> are ignored
> 

Mid-stream changes like this have been deprecated in
09b5d3fb44ae1036700f80c8c80b15e9074c58c3;
65ddc74988245a01421a63c5cffa4d900c47117c removed it, but only
incompletely: The side data flags for channel count and channel layout
changes were no longer written (in fact, they were removed from
packet.h), yet it still wrote the rest of the side data as if these
flags existed and had been written. That is the inconsistency this
commit addresses. It does not address whether channel count/layout
updates should have been removed, because that has already happened.
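
As a rough sketch of the consistency rule described above: the payload starts with a 32-bit little-endian flags word, and only the fields announced in that word may follow. With just sample rate and dimensions left, the maximum size is 4 + 4 + 8 = 16 bytes, which matches the size in the patch. The helper names and the locally defined flag values are assumptions made for this illustration; real code should use the definitions from libavcodec/packet.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag values mirrored locally for illustration only (assumed); real code
 * should use the AV_SIDE_DATA_PARAM_CHANGE_* definitions from packet.h. */
#define PARAM_CHANGE_SAMPLE_RATE 0x0004
#define PARAM_CHANGE_DIMENSIONS  0x0008

/* Hypothetical helper: store a 32-bit value in little-endian order. */
static uint8_t *put_le32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xFF;
    p[1] = (v >> 8) & 0xFF;
    p[2] = (v >> 16) & 0xFF;
    p[3] = v >> 24;
    return p + 4;
}

/* Write a param-change payload into buf (at least 16 bytes) and return
 * the number of bytes used. Every field written is announced in the
 * flags word, so a reader that parses by flags stays in sync. */
static size_t write_param_change(uint8_t *buf, int sample_rate,
                                 int width, int height)
{
    uint8_t *p = buf;
    uint32_t flags = 0;

    if (sample_rate)
        flags |= PARAM_CHANGE_SAMPLE_RATE;
    if (width || height)
        flags |= PARAM_CHANGE_DIMENSIONS;

    p = put_le32(p, flags);
    if (sample_rate)
        p = put_le32(p, (uint32_t)sample_rate);
    if (width || height) {
        p = put_le32(p, (uint32_t)width);
        p = put_le32(p, (uint32_t)height);
    }
    return (size_t)(p - buf);
}
```

Writing a channel count or layout without a matching flag bit would shift every later field for a reader that walks the payload by flags, which is the inconsistency the commit message refers to.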

- Andreas



Re: [FFmpeg-devel] [PATCH v3] avformat/nutdec: Don't create inconsistent side data

2024-05-30 Thread Michael Niedermayer
On Thu, May 30, 2024 at 07:53:42PM +0200, Andreas Rheinhardt wrote:
> Michael Niedermayer:
> > On Thu, May 30, 2024 at 02:14:20AM +0200, Andreas Rheinhardt wrote:
> >> Forgotten in 65ddc74988245a01421a63c5cffa4d900c47117c.
> >>
> >> Signed-off-by: Andreas Rheinhardt 
> >> ---
> >>  libavformat/nutdec.c | 14 --
> >>  1 file changed, 4 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/libavformat/nutdec.c b/libavformat/nutdec.c
> >> index 0bb7f154db..34b7e3cb9a 100644
> >> --- a/libavformat/nutdec.c
> >> +++ b/libavformat/nutdec.c
> >> @@ -881,8 +881,6 @@ static int read_sm_data(AVFormatContext *s, 
> >> AVIOContext *bc, AVPacket *pkt, int
> >>  int count = ffio_read_varlen(bc);
> >>  int skip_start = 0;
> >>  int skip_end = 0;
> >> -int channels = 0;
> >> -int64_t channel_layout = 0;
> >>  int sample_rate = 0;
> >>  int width = 0;
> >>  int height = 0;
> >> @@ -930,7 +928,7 @@ static int read_sm_data(AVFormatContext *s, 
> >> AVIOContext *bc, AVPacket *pkt, int
> >>  AV_WB64(dst, v64);
> >>  dst += 8;
> >>  } else if (!strcmp(name, "ChannelLayout") && value_len == 8) {
> >> -channel_layout = avio_rl64(bc);
> >> +// Ignored
> >>  continue;
> >>  } else {
> >>  av_log(s, AV_LOG_WARNING, "Unknown data %s / %s\n", name, 
> >> type_str);
> >> @@ -952,7 +950,7 @@ static int read_sm_data(AVFormatContext *s, 
> >> AVIOContext *bc, AVPacket *pkt, int
> >>  } else if (!strcmp(name, "SkipEnd")) {
> >>  skip_end = value;
> >>  } else if (!strcmp(name, "Channels")) {
> >> -channels = value;
> >> +// Ignored
> >>  } else if (!strcmp(name, "SampleRate")) {
> >>  sample_rate = value;
> >>  } else if (!strcmp(name, "Width")) {
> >> @@ -965,18 +963,14 @@ static int read_sm_data(AVFormatContext *s, 
> >> AVIOContext *bc, AVPacket *pkt, int
> >>  }
> >>  }
> >>  
> >> -if (channels || channel_layout || sample_rate || width || height) {
> >> -uint8_t *dst = av_packet_new_side_data(pkt, 
> >> AV_PKT_DATA_PARAM_CHANGE, 28);
> >> +if (sample_rate || width || height) {
> >> +uint8_t *dst = av_packet_new_side_data(pkt, 
> >> AV_PKT_DATA_PARAM_CHANGE, 16);
> >>  if (!dst)
> >>  return AVERROR(ENOMEM);
> >>  bytestream_put_le32(&dst,
> >>  
> >> AV_SIDE_DATA_PARAM_CHANGE_SAMPLE_RATE*(!!sample_rate) +
> >>  
> >> AV_SIDE_DATA_PARAM_CHANGE_DIMENSIONS*(!!(width|height))
> >> );
> >> -if (channels)
> >> -bytestream_put_le32(&dst, channels);
> >> -if (channel_layout)
> >> -bytestream_put_le64(&dst, channel_layout);
> >>  if (sample_rate)
> >>  bytestream_put_le32(&dst, sample_rate);
> >>  if (width || height){
> > 
> > This would break mid stream changes to the channel layout & channels when it
> > is carried at format level only
> > 
> > The commit message also does not adequately explain why such mid stream 
> > changes
> > are ignored
> > 
> 
> Mid-stream changes like this have been deprecated in
> 09b5d3fb44ae1036700f80c8c80b15e9074c58c3;
> 65ddc74988245a01421a63c5cffa4d900c47117c removed it, but only
> incompletely: The side data flags for channel count and channel layout
> changes were no longer written (in fact, they were removed from
> packet.h), yet it still wrote the rest of the side data as if these
> flags existed and had been written. That is the inconsistency this
> commit addresses. It does not address whether channel count/layout
> updates should have been removed, because that has already happened.

I honestly believe that we should support changing channel(layout) for
cases like PCM in nut

thx

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I do not agree with what you have to say, but I'll defend to the death your
right to say it. -- Voltaire




Re: [FFmpeg-devel] [PATCH v3] avformat/nutdec: Don't create inconsistent side data

2024-05-30 Thread Andreas Rheinhardt
Michael Niedermayer:
> On Thu, May 30, 2024 at 07:53:42PM +0200, Andreas Rheinhardt wrote:
>> Michael Niedermayer:
>>> On Thu, May 30, 2024 at 02:14:20AM +0200, Andreas Rheinhardt wrote:
>>>> Forgotten in 65ddc74988245a01421a63c5cffa4d900c47117c.
>>>>
>>>> Signed-off-by: Andreas Rheinhardt 
>>>> ---
>>>>  libavformat/nutdec.c | 14 --
>>>>  1 file changed, 4 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/libavformat/nutdec.c b/libavformat/nutdec.c
>>>> index 0bb7f154db..34b7e3cb9a 100644
>>>> --- a/libavformat/nutdec.c
>>>> +++ b/libavformat/nutdec.c
>>>> @@ -881,8 +881,6 @@ static int read_sm_data(AVFormatContext *s, 
>>>> AVIOContext *bc, AVPacket *pkt, int
>>>>  int count = ffio_read_varlen(bc);
>>>>  int skip_start = 0;
>>>>  int skip_end = 0;
>>>> -int channels = 0;
>>>> -int64_t channel_layout = 0;
>>>>  int sample_rate = 0;
>>>>  int width = 0;
>>>>  int height = 0;
>>>> @@ -930,7 +928,7 @@ static int read_sm_data(AVFormatContext *s, 
>>>> AVIOContext *bc, AVPacket *pkt, int
>>>>  AV_WB64(dst, v64);
>>>>  dst += 8;
>>>>  } else if (!strcmp(name, "ChannelLayout") && value_len == 8) {
>>>> -channel_layout = avio_rl64(bc);
>>>> +// Ignored
>>>>  continue;
>>>>  } else {
>>>>  av_log(s, AV_LOG_WARNING, "Unknown data %s / %s\n", name, 
>>>> type_str);
>>>> @@ -952,7 +950,7 @@ static int read_sm_data(AVFormatContext *s, 
>>>> AVIOContext *bc, AVPacket *pkt, int
>>>>  } else if (!strcmp(name, "SkipEnd")) {
>>>>  skip_end = value;
>>>>  } else if (!strcmp(name, "Channels")) {
>>>> -channels = value;
>>>> +// Ignored
>>>>  } else if (!strcmp(name, "SampleRate")) {
>>>>  sample_rate = value;
>>>>  } else if (!strcmp(name, "Width")) {
>>>> @@ -965,18 +963,14 @@ static int read_sm_data(AVFormatContext *s, 
>>>> AVIOContext *bc, AVPacket *pkt, int
>>>>  }
>>>>  }
>>>>  
>>>> -if (channels || channel_layout || sample_rate || width || height) {
>>>> -uint8_t *dst = av_packet_new_side_data(pkt, 
>>>> AV_PKT_DATA_PARAM_CHANGE, 28);
>>>> +if (sample_rate || width || height) {
>>>> +uint8_t *dst = av_packet_new_side_data(pkt, 
>>>> AV_PKT_DATA_PARAM_CHANGE, 16);
>>>>  if (!dst)
>>>>  return AVERROR(ENOMEM);
>>>>  bytestream_put_le32(&dst,
>>>>  
>>>> AV_SIDE_DATA_PARAM_CHANGE_SAMPLE_RATE*(!!sample_rate) +
>>>>  
>>>> AV_SIDE_DATA_PARAM_CHANGE_DIMENSIONS*(!!(width|height))
>>>> );
>>>> -if (channels)
>>>> -bytestream_put_le32(&dst, channels);
>>>> -if (channel_layout)
>>>> -bytestream_put_le64(&dst, channel_layout);
>>>>  if (sample_rate)
>>>>  bytestream_put_le32(&dst, sample_rate);
>>>>  if (width || height){
>>>
>>> This would break mid stream changes to the channel layout & channels when it
>>> is carried at format level only
>>>
>>> The commit message also does not adequately explain why such mid stream 
>>> changes
>>> are ignored
>>>
>>
>> Mid-stream changes like this have been deprecated in
>> 09b5d3fb44ae1036700f80c8c80b15e9074c58c3;
>> 65ddc74988245a01421a63c5cffa4d900c47117c removed it, but only
>> incompletely: The side data flags for channel count and channel layout
>> changes were no longer written (in fact, they were removed from
>> packet.h), yet it still wrote the rest of the side data as if these
>> flags existed and had been written. That is the inconsistency this
>> commit addresses. It does not address whether channel count/layout
>> updates should have been removed, because that has already happened.
> 
> I honestly believe that we should support changing channel(layout) for
> cases like PCM in nut
> 

That is orthogonal to this patch (which just wants to not create
inconsistent side data).

- Andreas



Re: [FFmpeg-devel] [PATCH v2 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow

2024-05-30 Thread Ronald S. Bultje
Hi,

On Thu, May 30, 2024 at 12:28 PM  wrote:

> From: Wu Jianhua 
>
> Some tests fail with certain seeds
>
> tests/checkasm/checkasm 2325607578 --test=vvc_alf
> checkasm: using random seed 2325607578
> AVX2:
> vvc_alf_filter_luma_120x20_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x24_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x28_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x32_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x36_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x40_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x44_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x48_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x52_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x56_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x60_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x64_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x68_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x72_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x76_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x80_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x84_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x88_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x92_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x96_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x100_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x104_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x108_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x112_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x116_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x120_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x124_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x128_12_avx2 (vvc_alf.c:104)
>   - vvc_alf.alf_filter   [FAILED]
>   - vvc_alf.alf_classify [OK]
> checkasm: 28 of 9216 tests have failed
>
> Reported-by: James Almer 
> Signed-off-by: Wu Jianhua 
> ---
>  libavcodec/x86/vvc/vvc_alf.asm | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/libavcodec/x86/vvc/vvc_alf.asm
> b/libavcodec/x86/vvc/vvc_alf.asm
> index 71e821c27b..f7b3e2a6cc 100644
> --- a/libavcodec/x86/vvc/vvc_alf.asm
> +++ b/libavcodec/x86/vvc/vvc_alf.asm
> @@ -356,7 +356,8 @@ SECTION .text
>
>  FILTER_VB xq
>
> -paddw m0, m2
> +; sum += curr
> +paddsw m0, m2
>
>  ; clip to pixel
>  CLIPW m0, m14, m15
> --
> 2.44.0.windows.1
>

LGTM.

Ronald
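
To make the effect of the one-instruction change concrete, here is a scalar sketch of what each 16-bit lane does under a wrapping add (as with paddw) versus a saturating add (as with paddsw), followed by a clip to a 12-bit pixel range. The function names and input values are made up for illustration and are not taken from the failing checkasm run.

```c
#include <stdint.h>

/* Wrapping signed 16-bit add: the per-lane behaviour of paddw.
 * Computed on unsigned values so the C code itself has no overflow. */
static int16_t add_wrap16(int16_t a, int16_t b)
{
    uint16_t s = (uint16_t)((unsigned)(uint16_t)a + (unsigned)(uint16_t)b);
    return s >= 0x8000 ? (int16_t)((int)s - 0x10000) : (int16_t)s;
}

/* Saturating signed 16-bit add: the per-lane behaviour of paddsw. */
static int16_t add_sat16(int16_t a, int16_t b)
{
    int sum = (int)a + (int)b;  /* cannot overflow int */
    if (sum > INT16_MAX)
        return INT16_MAX;
    if (sum < INT16_MIN)
        return INT16_MIN;
    return (int16_t)sum;
}

/* Clip to [0, max], e.g. max = 4095 for 12-bit pixels. */
static int clip_pixel(int v, int max)
{
    return v < 0 ? 0 : (v > max ? max : v);
}
```

With a = 30000 and b = 5000 the wrapping sum becomes negative and clips to 0, while the saturating sum stays positive and clips to the maximum pixel value, which is what the C reference would produce from the unwrapped sum.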


Re: [FFmpeg-devel] [PATCH v2 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow

2024-05-30 Thread Andreas Rheinhardt
toq...@outlook.com:
> From: Wu Jianhua 
> 
> Some tests fail with certain seeds
> 
> tests/checkasm/checkasm 2325607578 --test=vvc_alf
> checkasm: using random seed 2325607578
> AVX2:
> vvc_alf_filter_luma_120x20_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x24_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x28_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x32_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x36_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x40_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x44_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x48_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x52_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x56_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x60_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x64_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x68_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x72_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x76_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x80_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x84_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x88_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x92_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x96_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x100_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x104_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x108_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x112_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x116_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x120_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x124_12_avx2 (vvc_alf.c:104)
> vvc_alf_filter_luma_120x128_12_avx2 (vvc_alf.c:104)
>   - vvc_alf.alf_filter   [FAILED]
>   - vvc_alf.alf_classify [OK]
> checkasm: 28 of 9216 tests have failed
> 
> Reported-by: James Almer 
> Signed-off-by: Wu Jianhua 
> ---
>  libavcodec/x86/vvc/vvc_alf.asm | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/libavcodec/x86/vvc/vvc_alf.asm b/libavcodec/x86/vvc/vvc_alf.asm
> index 71e821c27b..f7b3e2a6cc 100644
> --- a/libavcodec/x86/vvc/vvc_alf.asm
> +++ b/libavcodec/x86/vvc/vvc_alf.asm
> @@ -356,7 +356,8 @@ SECTION .text
>  
>  FILTER_VB xq
>  
> -paddw m0, m2
> +; sum += curr
> +paddsw m0, m2
>  
>  ; clip to pixel
>  CLIPW m0, m14, m15

And can I get an answer to the question of whether the issue is present
when used by the actual decoder and not only the checkasm test?

- Andreas



Re: [FFmpeg-devel] [PATCH 1/7] avcodec/vc2enc: Avoid void* where possible

2024-05-30 Thread Andreas Rheinhardt
Andreas Rheinhardt:
> Signed-off-by: Andreas Rheinhardt 
> ---
>  libavcodec/vc2enc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/libavcodec/vc2enc.c b/libavcodec/vc2enc.c
> index 365d43146d..b496f67d3a 100644
> --- a/libavcodec/vc2enc.c
> +++ b/libavcodec/vc2enc.c
> @@ -106,7 +106,7 @@ typedef struct Plane {
>  typedef struct SliceArgs {
>  PutBitContext pb;
>  int cache[DIRAC_MAX_QUANT_INDEX];
> -void *ctx;
> +struct VC2EncContext *ctx;
>  int x;
>  int y;
>  int quant_idx;
> @@ -116,7 +116,7 @@ typedef struct SliceArgs {
>  } SliceArgs;
>  
>  typedef struct TransformArgs {
> -void *ctx;
> +struct VC2EncContext *ctx;
>  Plane *plane;
>  const void *idata;
>  ptrdiff_t istride;

Will apply the patchset with 5/7 using uint8_t tomorrow unless there are
objections.

- Andreas



Re: [FFmpeg-devel] [PATCH 1/2] avcodec/diracdec: Use FF_CODEC_CAP_INIT_CLEANUP

2024-05-30 Thread Andreas Rheinhardt
Andreas Rheinhardt:
> This was one of the few decoders incompatible with the flag.
> Also only call free_sequence_buffers(), dirac_decode_flush()
> in dirac_decode_end().
> 
> Signed-off-by: Andreas Rheinhardt 
> ---
>  libavcodec/diracdec.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/libavcodec/diracdec.c b/libavcodec/diracdec.c
> index 3a36479c59..5bf0dcc2db 100644
> --- a/libavcodec/diracdec.c
> +++ b/libavcodec/diracdec.c
> @@ -403,11 +403,8 @@ static av_cold int dirac_decode_init(AVCodecContext 
> *avctx)
>  
>  for (i = 0; i < MAX_FRAMES; i++) {
>  s->all_frames[i].avframe = av_frame_alloc();
> -if (!s->all_frames[i].avframe) {
> -while (i > 0)
> -av_frame_free(&s->all_frames[--i].avframe);
> +if (!s->all_frames[i].avframe)
>  return AVERROR(ENOMEM);
> -}
>  }
>  ret = ff_thread_once(&dirac_arith_init, ff_dirac_init_arith_tables);
>  if (ret != 0)
> @@ -429,7 +426,9 @@ static av_cold int dirac_decode_end(AVCodecContext *avctx)
>  DiracContext *s = avctx->priv_data;
>  int i;
>  
> -dirac_decode_flush(avctx);
> +// Necessary in case dirac_decode_init() failed
> +if (s->all_frames[MAX_FRAMES - 1].avframe)
> +free_sequence_buffers(s);
>  for (i = 0; i < MAX_FRAMES; i++)
>  av_frame_free(&s->all_frames[i].avframe);
>  
> @@ -2371,4 +2370,5 @@ const FFCodec ff_dirac_decoder = {
>  FF_CODEC_DECODE_CB(dirac_decode_frame),
>  .p.capabilities = AV_CODEC_CAP_DELAY | AV_CODEC_CAP_SLICE_THREADS | 
> AV_CODEC_CAP_DR1,
>  .flush  = dirac_decode_flush,
> +.caps_internal  = FF_CODEC_CAP_INIT_CLEANUP,
>  };

Will apply the patchset tomorrow unless there are objections.
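
The pattern the patch relies on can be sketched outside FFmpeg as follows: when the generic layer promises to call the close function even after a failed init, the init function can simply return on error, and the close function has to tolerate a partially initialized context. All names below are hypothetical stand-ins, with malloc/free in place of av_frame_alloc()/av_frame_free().

```c
#include <stdlib.h>

#define MAX_FRAMES 4  /* illustrative value, not the decoder's constant */

/* Hypothetical context; must be zero-initialized before ctx_open(). */
typedef struct Ctx {
    void *frames[MAX_FRAMES];
} Ctx;

/* fail_at simulates an allocation failure at that index (-1: none). */
static int ctx_init(Ctx *c, int fail_at)
{
    for (int i = 0; i < MAX_FRAMES; i++) {
        c->frames[i] = (i == fail_at) ? NULL : malloc(16);
        if (!c->frames[i])
            return -1;  /* no unwinding here; close handles it */
    }
    return 0;
}

/* Safe on a partially initialized context: free(NULL) is a no-op. */
static void ctx_close(Ctx *c)
{
    for (int i = 0; i < MAX_FRAMES; i++) {
        free(c->frames[i]);
        c->frames[i] = NULL;
    }
}

/* Stand-in for the generic layer when the init-cleanup flag is set. */
static int ctx_open(Ctx *c, int fail_at)
{
    int ret = ctx_init(c, fail_at);
    if (ret < 0)
        ctx_close(c);
    return ret;
}
```

The trade-off is that every step in the close function must be valid for any prefix of a successful init, which is presumably why the patch guards free_sequence_buffers() with a check on the last frame.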

- Andreas



[FFmpeg-devel] [PATCH 2/5] lavu/lls: use ff_scalarproduct_double_c()

2024-05-30 Thread Rémi Denis-Courmont
---
 libavutil/lls.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/libavutil/lls.c b/libavutil/lls.c
index c1e038daf1..1096ae69d5 100644
--- a/libavutil/lls.c
+++ b/libavutil/lls.c
@@ -30,6 +30,7 @@
 
 #include "config.h"
 #include "attributes.h"
+#include "float_dsp.h"
 #include "lls.h"
 
 static void update_lls(LLSModel *m, const double *var)
@@ -102,13 +103,7 @@ void avpriv_solve_lls(LLSModel *m, double threshold, 
unsigned short min_order)
 
 static double evaluate_lls(LLSModel *m, const double *param, int order)
 {
-int i;
-double out = 0;
-
-for (i = 0; i <= order; i++)
-out += param[i] * m->coeff[order][i];
-
-return out;
+return ff_scalarproduct_double_c(m->coeff[order], param, order + 1);
 }
 
 av_cold void avpriv_init_lls(LLSModel *m, int indep_count)
-- 
2.45.1
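
As a quick check of the substitution in this patch, the sketch below puts the removed loop next to a scalar product of the same shape as the C fallback from this series. A plain array stands in for m->coeff[order], and both function names are local to this example. The two compute the same products in the same order, with order + 1 elements in total.

```c
#include <stddef.h>

/* Scalar product with the same shape as the C fallback in the series. */
static double scalarproduct_double(const double *v1, const double *v2,
                                   size_t len)
{
    double p = 0.0;

    for (size_t i = 0; i < len; i++)
        p += v1[i] * v2[i];

    return p;
}

/* The loop removed by the patch; coeff stands in for m->coeff[order]. */
static double evaluate_loop(const double *coeff, const double *param,
                            int order)
{
    double out = 0;

    for (int i = 0; i <= order; i++)
        out += param[i] * coeff[i];

    return out;
}
```

Note the length argument: the old loop ran while i <= order, so the replacement has to pass order + 1, as the patch does.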



[FFmpeg-devel] [PATCH 4/5] checkasm/float_dsp: add double-precision scalar product

2024-05-30 Thread Rémi Denis-Courmont
---
 tests/checkasm/float_dsp.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/tests/checkasm/float_dsp.c b/tests/checkasm/float_dsp.c
index cadfa65e2a..296db1cff9 100644
--- a/tests/checkasm/float_dsp.c
+++ b/tests/checkasm/float_dsp.c
@@ -278,6 +278,22 @@ static void test_scalarproduct_float(const float *src0, 
const float *src1)
 bench_new(src0, src1, LEN);
 }
 
+static void test_scalarproduct_double(const double *src0, const double *src1)
+{
+double cprod, oprod;
+
+declare_func_float(double, const double *, const double *, size_t);
+
+cprod = call_ref(src0, src1, LEN);
+oprod = call_new(src0, src1, LEN);
+if (!double_near_abs_eps(cprod, oprod, ARBITRARY_SCALARPRODUCT_CONST)) {
+fprintf(stderr, "%- .12f - %- .12f = % .12g\n",
+cprod, oprod, cprod - oprod);
+fail();
+}
+bench_new(src0, src1, LEN);
+}
+
 void checkasm_check_float_dsp(void)
 {
 LOCAL_ALIGNED_32(float,  src0, [LEN]);
@@ -334,6 +350,9 @@ void checkasm_check_float_dsp(void)
 if (check_func(fdsp->scalarproduct_float, "scalarproduct_float"))
 test_scalarproduct_float(src3, src4);
 report("scalarproduct_float");
+if (check_func(fdsp->scalarproduct_double, "scalarproduct_double"))
+test_scalarproduct_double(dbl_src0, dbl_src1);
+report("scalarproduct_double");
 
 av_freep(&fdsp);
 }
-- 
2.45.1
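
The test compares the reference and optimised results with an absolute tolerance rather than exact equality. A likely reason, sketched below under stated assumptions, is that a vectorised implementation accumulates partial sums per lane, so the additions happen in a different order and the result may differ in the last bits. The helper names and the tolerance passed by the caller are hypothetical; near_abs_eps() only mimics what a helper like double_near_abs_eps() is presumed to do.

```c
#include <stddef.h>

/* Sequential accumulation, as in a plain C reference. */
static double dot_sequential(const double *a, const double *b, size_t n)
{
    double p = 0.0;

    for (size_t i = 0; i < n; i++)
        p += a[i] * b[i];

    return p;
}

/* Two accumulators, roughly how a 2-lane vector loop would sum. */
static double dot_two_lanes(const double *a, const double *b, size_t n)
{
    double p0 = 0.0, p1 = 0.0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        p0 += a[i] * b[i];
        p1 += a[i + 1] * b[i + 1];
    }
    if (i < n)  /* odd tail element */
        p0 += a[i] * b[i];

    return p0 + p1;
}

/* Absolute-tolerance comparison (assumed semantics). */
static int near_abs_eps(double x, double y, double eps)
{
    double d = x - y;

    if (d < 0)
        d = -d;
    return d < eps;
}
```

An absolute tolerance is simple but scale dependent, so the constant has to be chosen with the magnitude of the test inputs in mind.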



[FFmpeg-devel] [PATCHv2 1/5] lavu/float_dsp: add double-precision scalar product

2024-05-30 Thread Rémi Denis-Courmont
The function pointer is appended to the structure for backward binary
compatibility. Fortunately, this is allocated by libavutil, not by the
user, so increasing the structure size is safe.
---
 libavutil/float_dsp.c | 12 
 libavutil/float_dsp.h | 31 ++-
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/libavutil/float_dsp.c b/libavutil/float_dsp.c
index e9fb023466..08bbc85e3e 100644
--- a/libavutil/float_dsp.c
+++ b/libavutil/float_dsp.c
@@ -132,6 +132,17 @@ float avpriv_scalarproduct_float_c(const float *v1, const 
float *v2, int len)
 return p;
 }
 
+double ff_scalarproduct_double_c(const double *v1, const double *v2,
+ size_t len)
+{
+double p = 0.0;
+
+for (size_t i = 0; i < len; i++)
+p += v1[i] * v2[i];
+
+return p;
+}
+
 av_cold AVFloatDSPContext *avpriv_float_dsp_alloc(int bit_exact)
 {
 AVFloatDSPContext *fdsp = av_mallocz(sizeof(AVFloatDSPContext));
@@ -149,6 +160,7 @@ av_cold AVFloatDSPContext *avpriv_float_dsp_alloc(int 
bit_exact)
 fdsp->vector_fmul_reverse = vector_fmul_reverse_c;
 fdsp->butterflies_float = butterflies_float_c;
 fdsp->scalarproduct_float = avpriv_scalarproduct_float_c;
+fdsp->scalarproduct_double = ff_scalarproduct_double_c;
 
 #if ARCH_AARCH64
 ff_float_dsp_init_aarch64(fdsp);
diff --git a/libavutil/float_dsp.h b/libavutil/float_dsp.h
index 342a8715c5..5053aa240d 100644
--- a/libavutil/float_dsp.h
+++ b/libavutil/float_dsp.h
@@ -19,6 +19,8 @@
 #ifndef AVUTIL_FLOAT_DSP_H
 #define AVUTIL_FLOAT_DSP_H
 
+#include <stddef.h>
+
 typedef struct AVFloatDSPContext {
 /**
  * Calculate the entry wise product of two vectors of floats and store the result in
@@ -187,19 +189,46 @@ typedef struct AVFloatDSPContext {
  */
 void (*vector_dmul)(double *dst, const double *src0, const double *src1,
 int len);
+
+/**
+ * Calculate the scalar product of two vectors of doubles.
+ *
+ * @param v1  first vector
+ * @param v2  second vector
+ * @param len length of vectors
+ *
+ * @return inner product of the vectors
+ */
+double (*scalarproduct_double)(const double *v1, const double *v2,
+   size_t len);
 } AVFloatDSPContext;
 
 /**
- * Return the scalar product of two vectors.
+ * Return the scalar product of two vectors of floats.
  *
  * @param v1  first input vector
+ *constraints: 32-byte aligned
  * @param v2  first input vector
+ *constraints: 32-byte aligned
  * @param len number of elements
+ *constraints: multiple of 16
  *
  * @return sum of elementwise products
  */
 float avpriv_scalarproduct_float_c(const float *v1, const float *v2, int len);
 
+/**
+ * Return the scalar product of two vectors of doubles.
+ *
+ * @param v1  first input vector
+ * @param v2  second input vector
+ * @param len number of elements
+ *
+ * @return inner product of the vectors
+ */
+double ff_scalarproduct_double_c(const double *v1, const double *v2,
+ size_t len);
+
 void ff_float_dsp_init_aarch64(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_arm(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_ppc(AVFloatDSPContext *fdsp, int strict);
-- 
2.45.1
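
The commit message's compatibility argument can be illustrated with a minimal standalone sketch. It is not libavutil code: `DemoDSPContext`, `demo_dsp_alloc()` and `demo_scalarproduct_double_c()` are hypothetical stand-ins, and only the new member is modelled. The point is that the library both allocates the context and fills in the pointer, so a caller never depends on the structure's size.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for AVFloatDSPContext; only the new member is shown. */
typedef struct DemoDSPContext {
    double (*scalarproduct_double)(const double *v1, const double *v2,
                                   size_t len);
} DemoDSPContext;

/* Plain C reference, using the same loop as the patch. */
static double demo_scalarproduct_double_c(const double *v1, const double *v2,
                                          size_t len)
{
    double p = 0.0;

    for (size_t i = 0; i < len; i++)
        p += v1[i] * v2[i];

    return p;
}

/* The library allocates the context, so appending members later does not
 * change anything a caller compiled against an older header relies on. */
static DemoDSPContext *demo_dsp_alloc(void)
{
    DemoDSPContext *ctx = calloc(1, sizeof(*ctx));

    if (ctx)
        ctx->scalarproduct_double = demo_scalarproduct_double_c;

    return ctx;
}
```

A caller would obtain the context from `demo_dsp_alloc()`, call `ctx->scalarproduct_double(a, b, n)`, and release it with `free()`; an architecture-specific init function could later overwrite the pointer without the caller noticing.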



[FFmpeg-devel] [PATCH 3/5] lavfi: get rid of bespoke double scalar products

2024-05-30 Thread Rémi Denis-Courmont
---
 libavfilter/aap_template.c   | 14 +-
 libavfilter/anlms_template.c | 16 ++--
 libavfilter/arls_template.c  | 14 +-
 3 files changed, 4 insertions(+), 40 deletions(-)

diff --git a/libavfilter/aap_template.c b/libavfilter/aap_template.c
index ea9c815a89..0e0580fb32 100644
--- a/libavfilter/aap_template.c
+++ b/libavfilter/aap_template.c
@@ -36,18 +36,6 @@
 #define fn2(a,b)   fn3(a,b)
 #define fn(a)  fn2(a, SAMPLE_FORMAT)
 
-#if DEPTH == 64
-static double scalarproduct_double(const double *v1, const double *v2, int len)
-{
-double p = 0.0;
-
-for (int i = 0; i < len; i++)
-p += v1[i] * v2[i];
-
-return p;
-}
-#endif
-
 static ftype fn(fir_sample)(AudioAPContext *s, ftype sample, ftype *delay,
 ftype *coeffs, ftype *tmp, int *offset)
 {
@@ -60,7 +48,7 @@ static ftype fn(fir_sample)(AudioAPContext *s, ftype sample, ftype *delay,
 #if DEPTH == 32
 output = s->fdsp->scalarproduct_float(delay, tmp, s->kernel_size);
 #else
-output = scalarproduct_double(delay, tmp, s->kernel_size);
+output = s->fdsp->scalarproduct_double(delay, tmp, s->kernel_size);
 #endif
 
 if (--(*offset) < 0)
diff --git a/libavfilter/anlms_template.c b/libavfilter/anlms_template.c
index b25df4fa18..a8d1dbfe0f 100644
--- a/libavfilter/anlms_template.c
+++ b/libavfilter/anlms_template.c
@@ -33,18 +33,6 @@
 #define fn2(a,b)   fn3(a,b)
 #define fn(a)  fn2(a, SAMPLE_FORMAT)
 
-#if DEPTH == 64
-static double scalarproduct_double(const double *v1, const double *v2, int len)
-{
-double p = 0.0;
-
-for (int i = 0; i < len; i++)
-p += v1[i] * v2[i];
-
-return p;
-}
-#endif
-
 static ftype fn(fir_sample)(AudioNLMSContext *s, ftype sample, ftype *delay,
 ftype *coeffs, ftype *tmp, int *offset)
 {
@@ -58,7 +46,7 @@ static ftype fn(fir_sample)(AudioNLMSContext *s, ftype sample, ftype *delay,
 #if DEPTH == 32
 output = s->fdsp->scalarproduct_float(delay, tmp, s->kernel_size);
 #else
-output = scalarproduct_double(delay, tmp, s->kernel_size);
+output = s->fdsp->scalarproduct_double(delay, tmp, s->kernel_size);
 #endif
 
 if (--(*offset) < 0)
@@ -85,7 +73,7 @@ static ftype fn(process_sample)(AudioNLMSContext *s, ftype input, ftype desired,
 #if DEPTH == 32
 sum = s->fdsp->scalarproduct_float(delay, delay, s->kernel_size);
 #else
-sum = scalarproduct_double(delay, delay, s->kernel_size);
+sum = s->fdsp->scalarproduct_double(delay, delay, s->kernel_size);
 #endif
 norm = s->eps + sum;
 b = mu * e / norm;
diff --git a/libavfilter/arls_template.c b/libavfilter/arls_template.c
index d8b19d89a5..c67b48cf6f 100644
--- a/libavfilter/arls_template.c
+++ b/libavfilter/arls_template.c
@@ -39,18 +39,6 @@
 #define fn2(a,b)   fn3(a,b)
 #define fn(a)  fn2(a, SAMPLE_FORMAT)
 
-#if DEPTH == 64
-static double scalarproduct_double(const double *v1, const double *v2, int len)
-{
-double p = 0.0;
-
-for (int i = 0; i < len; i++)
-p += v1[i] * v2[i];
-
-return p;
-}
-#endif
-
 static ftype fn(fir_sample)(AudioRLSContext *s, ftype sample, ftype *delay,
 ftype *coeffs, ftype *tmp, int *offset)
 {
@@ -64,7 +52,7 @@ static ftype fn(fir_sample)(AudioRLSContext *s, ftype sample, ftype *delay,
 #if DEPTH == 32
 output = s->fdsp->scalarproduct_float(delay, tmp, s->kernel_size);
 #else
-output = scalarproduct_double(delay, tmp, s->kernel_size);
+output = s->fdsp->scalarproduct_double(delay, tmp, s->kernel_size);
 #endif
 
 if (--(*offset) < 0)
-- 
2.45.1



[FFmpeg-devel] [PATCH 5/5] lavu/float_dsp: R-V V scalarproduct_double

2024-05-30 Thread Rémi Denis-Courmont
C908:
scalarproduct_double_c:   39.2
scalarproduct_double_rvv_f64: 10.5

X60:
scalarproduct_double_c:   35.0
scalarproduct_double_rvv_f64:  5.2
---
 libavutil/riscv/float_dsp_init.c |  3 +++
 libavutil/riscv/float_dsp_rvv.S  | 21 +
 2 files changed, 24 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 585f237225..155496fa6b 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -46,6 +46,8 @@ void ff_vector_dmac_scalar_rvv(double *dst, const double *src, double mul,
 int len);
 void ff_vector_dmul_scalar_rvv(double *dst, const double *src, double mul,
 int len);
+double ff_scalarproduct_double_rvv(const double *v1, const double *v2,
+   size_t len);
 
 av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
 {
@@ -68,6 +70,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
 fdsp->vector_dmul = ff_vector_dmul_rvv;
 fdsp->vector_dmac_scalar = ff_vector_dmac_scalar_rvv;
 fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_rvv;
+fdsp->scalarproduct_double = ff_scalarproduct_double_rvv;
 }
 }
 #endif
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index e6ec182a7a..2f0ade6db6 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -249,3 +249,24 @@ NOHWD   mv   a2, a3
 
 ret
 endfunc
+
+func ff_scalarproduct_double_rvv, zve64f
+vsetvli  t0, zero, e64, m8, ta, ma
+vmv.v.x  v8, zero
+vmv.s.x  v0, zero
+1:
+vsetvli  t0, a2, e64, m8, tu, ma
+vle64.v  v16, (a0)
+sub  a2, a2, t0
+vle64.v  v24, (a1)
+sh3add   a0, t0, a0
+vfmacc.vv v8, v16, v24
+sh3add   a1, t0, a1
+bnez a2, 1b
+
+vsetvli  t0, zero, e64, m8, ta, ma
+vfredusum.vs v0, v8, v0
+vfmv.f.s fa0, v0
+NOHWD   fmv.x.d  a0, fa0
+ret
+endfunc
-- 
2.45.1
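
For readers who do not read RVV assembly, the loop structure above can be sketched in C. This is only a model under stated assumptions: `DEMO_LANES` is a made-up fixed lane count (the hardware picks the real one through `vsetvli`), and `demo_scalarproduct_lanes()` is a hypothetical name. Each lane keeps its own partial sum, a short final chunk leaves the unused lanes untouched (the "tu" policy), and a single reduction at the end adds the lanes together.

```c
#include <stddef.h>

/* Assumed lane count, for illustration only. */
#define DEMO_LANES 4

static double demo_scalarproduct_lanes(const double *v1, const double *v2,
                                       size_t len)
{
    double acc[DEMO_LANES] = { 0.0 }; /* per-lane partial sums */
    double sum = 0.0;

    while (len > 0) {
        /* Number of elements handled this iteration. */
        size_t n = len < DEMO_LANES ? len : DEMO_LANES;

        /* Multiply-accumulate per lane; lanes >= n keep their old value. */
        for (size_t i = 0; i < n; i++)
            acc[i] += v1[i] * v2[i];

        v1  += n;
        v2  += n;
        len -= n;
    }

    /* Final reduction across lanes. */
    for (size_t i = 0; i < DEMO_LANES; i++)
        sum += acc[i];

    return sum;
}
```

Because the additions happen in a different order than in the sequential C loop, the two can differ in the last bits for general inputs; they agree exactly whenever every intermediate value is exactly representable, as with small integers.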



Re: [FFmpeg-devel] [PATCH 1/5] lavu/common.h: Fix UB in av_clipl_int32_c()

2024-05-30 Thread Michael Niedermayer
On Thu, May 30, 2024 at 08:49:12PM +0300, Rémi Denis-Courmont wrote:
> On Thursday, 30 May 2024, 19.48.13 EEST, Tomas Härdin wrote:
> > > > Are you saying that UB is acceptable? You know the compiler is free
> > > > to
> > > > assume signed arithmetic doesn't overflow, right? If so then what
> > > > other
> > > > UB might we accept?
> > > 
> > > He did not say that... He said we should switch to a better
> > > implementation rather than trying to fix the existing potentially
> > > buggy one.
> > 
> > I have a fix for demonstrable UB and Rémi is problematizing it.
> 
> Andreas made cosmetic arguments against this patch before I had even seen the 
> patch, forget comment on it.
> 

> > It is not a "theoretical" UB - that's not how UB works.
> 
> It is a *theoretical* UB if you can not prove that it leads to misbehaviour in

If the function doesn't get called with values triggering UB, then it's not UB.

If the function gets called with values triggering the signed overflow, then
it's UB, and it's a bug unless the application's intended behavior is undefined.

Also, I would not bet that the function produces the correct output on every
platform for input values that trigger UB.

The case where this really could be a problem is when it's used with
compile-time constants that would trigger the overflow, because in these cases
the optimizer can assume the whole codepath leading to it can be
removed.

IMHO we should simply fix UB instead of arguing over how bad it could be
or when.
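
As an illustration of what "simply fix UB" can look like for a 64-bit to 32-bit clip, here is a minimal sketch. It is not the patch under discussion: `demo_clip_int64_to_int32()` is a hypothetical helper, and the only technique shown is doing the range test on unsigned values, where wraparound is defined.

```c
#include <stdint.h>

/* Hypothetical helper: clip a 64-bit value into the int32_t range. */
static int32_t demo_clip_int64_to_int32(int64_t a)
{
    /* The addition happens on uint64_t, so it cannot overflow in the
     * signed sense. The result has no bits above bit 31 exactly when
     * a lies within [INT32_MIN, INT32_MAX]. */
    if (((uint64_t)a + UINT64_C(0x80000000)) & ~UINT64_C(0xFFFFFFFF))
        return a < 0 ? INT32_MIN : INT32_MAX;

    return (int32_t)a;
}
```

With this shape, inputs near INT64_MAX or INT64_MIN take the saturating branch without ever evaluating a signed addition that could overflow.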

thx

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety -- Benjamin Franklin




Re: [FFmpeg-devel] [PATCHv2 1/5] lavu/float_dsp: add double-precision scalar product

2024-05-30 Thread James Almer

On 5/30/2024 4:06 PM, Rémi Denis-Courmont wrote:

The function pointer is appended to the structure for backward binary
compatibility. Fortunately, this is allocated by libavutil, not by the
user, so increasing the structure size is safe.
---
  libavutil/float_dsp.c | 12 
  libavutil/float_dsp.h | 31 ++-
  2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/libavutil/float_dsp.c b/libavutil/float_dsp.c
index e9fb023466..08bbc85e3e 100644
--- a/libavutil/float_dsp.c
+++ b/libavutil/float_dsp.c
@@ -132,6 +132,17 @@ float avpriv_scalarproduct_float_c(const float *v1, const 
float *v2, int len)
  return p;
  }
  
+double ff_scalarproduct_double_c(const double *v1, const double *v2,

+ size_t len)
+{
+double p = 0.0;
+
+for (size_t i = 0; i < len; i++)
+p += v1[i] * v2[i];
+
+return p;
+}
+
  av_cold AVFloatDSPContext *avpriv_float_dsp_alloc(int bit_exact)
  {
  AVFloatDSPContext *fdsp = av_mallocz(sizeof(AVFloatDSPContext));
@@ -149,6 +160,7 @@ av_cold AVFloatDSPContext *avpriv_float_dsp_alloc(int 
bit_exact)
  fdsp->vector_fmul_reverse = vector_fmul_reverse_c;
  fdsp->butterflies_float = butterflies_float_c;
  fdsp->scalarproduct_float = avpriv_scalarproduct_float_c;
+fdsp->scalarproduct_double = ff_scalarproduct_double_c;
  
  #if ARCH_AARCH64

  ff_float_dsp_init_aarch64(fdsp);
diff --git a/libavutil/float_dsp.h b/libavutil/float_dsp.h
index 342a8715c5..5053aa240d 100644
--- a/libavutil/float_dsp.h
+++ b/libavutil/float_dsp.h
@@ -19,6 +19,8 @@
  #ifndef AVUTIL_FLOAT_DSP_H
  #define AVUTIL_FLOAT_DSP_H
  
+#include <stddef.h>

+
  typedef struct AVFloatDSPContext {
  /**
   * Calculate the entry wise product of two vectors of floats and store 
the result in
@@ -187,19 +189,46 @@ typedef struct AVFloatDSPContext {
   */
  void (*vector_dmul)(double *dst, const double *src0, const double *src1,
  int len);
+
+/**
+ * Calculate the scalar product of two vectors of doubles.
+ *
+ * @param v1  first vector
+ * @param v2  second vector
+ * @param len length of vectors
+ *
+ * @return inner product of the vectors
+ */
+double (*scalarproduct_double)(const double *v1, const double *v2,
+   size_t len);
  } AVFloatDSPContext;
  
  /**

- * Return the scalar product of two vectors.
+ * Return the scalar product of two vectors of floats.
   *
   * @param v1  first input vector
+ *constraints: 32-byte aligned
   * @param v2  first input vector
+ *constraints: 32-byte aligned
   * @param len number of elements
+ *constraints: multiple of 16


Why are you adding this to the doxy for scalarproduct_float()? Those 
constraints are not correct for it. They are for scalarproduct_double() 
which you're adding now.



   *
   * @return sum of elementwise products
   */
  float avpriv_scalarproduct_float_c(const float *v1, const float *v2, int len);
  
+/**

+ * Return the scalar product of two vectors of doubles.
+ *
+ * @param v1  first input vector
+ * @param v2  first input vector
+ * @param len number of elements
+ *
+ * @return inner product of the vectors
+ */
+double ff_scalarproduct_double_c(const double *v1, const double *v2,
+ size_t len);
+
  void ff_float_dsp_init_aarch64(AVFloatDSPContext *fdsp);
  void ff_float_dsp_init_arm(AVFloatDSPContext *fdsp);
  void ff_float_dsp_init_ppc(AVFloatDSPContext *fdsp, int strict);



Re: [FFmpeg-devel] [PATCH 1/5] lavu/common.h: Fix UB in av_clipl_int32_c()

2024-05-30 Thread Rémi Denis-Courmont
On Thursday, 30 May 2024, 22.07.13 EEST, Michael Niedermayer wrote:
> If the function doesnt get called with values triggering UB then its not UB.

As Tomas pointed out, that statement is actually false. Specifically, if the 
compiler can prove that the function can be called with values triggering UB, 
then the code is UB, even if those offending values do not actually occur in a 
given instance of the program.

The C specification is known to contradict causality.

For instance, if you pass an uninitialised value to av_clipl_int32_c(), 
then the code is UB, even if the actual value in the register or stack slot is 
never one that could trigger UB. Of course, usage of uninitialised values is a 
bad practice, but it is not, per se, UB.

-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCHv2 1/5] lavu/float_dsp: add double-precision scalar product

2024-05-30 Thread Rémi Denis-Courmont
On Thursday, 30 May 2024, 22.10.28 EEST, James Almer wrote:
> Why are you adding this to the doxy for scalarproduct_float()? Those
> constraints are not correct for it. They are for scalarproduct_double()
> which you're adding now.

Because copy-paste error.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH v3] avformat/nutdec: Don't create inconsistent side data

2024-05-30 Thread Michael Niedermayer
On Thu, May 30, 2024 at 08:07:48PM +0200, Andreas Rheinhardt wrote:
> Michael Niedermayer:
> > On Thu, May 30, 2024 at 07:53:42PM +0200, Andreas Rheinhardt wrote:
> >> Michael Niedermayer:
> >>> On Thu, May 30, 2024 at 02:14:20AM +0200, Andreas Rheinhardt wrote:
>  Forgotten in 65ddc74988245a01421a63c5cffa4d900c47117c.
> 
>  Signed-off-by: Andreas Rheinhardt 
>  ---
>   libavformat/nutdec.c | 14 --
>   1 file changed, 4 insertions(+), 10 deletions(-)
> 
>  diff --git a/libavformat/nutdec.c b/libavformat/nutdec.c
>  index 0bb7f154db..34b7e3cb9a 100644
>  --- a/libavformat/nutdec.c
>  +++ b/libavformat/nutdec.c
>  @@ -881,8 +881,6 @@ static int read_sm_data(AVFormatContext *s, 
>  AVIOContext *bc, AVPacket *pkt, int
>   int count = ffio_read_varlen(bc);
>   int skip_start = 0;
>   int skip_end = 0;
>  -int channels = 0;
>  -int64_t channel_layout = 0;
>   int sample_rate = 0;
>   int width = 0;
>   int height = 0;
>  @@ -930,7 +928,7 @@ static int read_sm_data(AVFormatContext *s, 
>  AVIOContext *bc, AVPacket *pkt, int
>   AV_WB64(dst, v64);
>   dst += 8;
>   } else if (!strcmp(name, "ChannelLayout") && value_len == 
>  8) {
>  -channel_layout = avio_rl64(bc);
>  +// Ignored
>   continue;
>   } else {
>   av_log(s, AV_LOG_WARNING, "Unknown data %s / %s\n", 
>  name, type_str);
>  @@ -952,7 +950,7 @@ static int read_sm_data(AVFormatContext *s, 
>  AVIOContext *bc, AVPacket *pkt, int
>   } else if (!strcmp(name, "SkipEnd")) {
>   skip_end = value;
>   } else if (!strcmp(name, "Channels")) {
>  -channels = value;
>  +// Ignored
>   } else if (!strcmp(name, "SampleRate")) {
>   sample_rate = value;
>   } else if (!strcmp(name, "Width")) {
>  @@ -965,18 +963,14 @@ static int read_sm_data(AVFormatContext *s, 
>  AVIOContext *bc, AVPacket *pkt, int
>   }
>   }
>   
>  -if (channels || channel_layout || sample_rate || width || height) {
>  -uint8_t *dst = av_packet_new_side_data(pkt, 
>  AV_PKT_DATA_PARAM_CHANGE, 28);
>  +if (sample_rate || width || height) {
>  +uint8_t *dst = av_packet_new_side_data(pkt, 
>  AV_PKT_DATA_PARAM_CHANGE, 16);
>   if (!dst)
>   return AVERROR(ENOMEM);
>   bytestream_put_le32(&dst,
>   
>  AV_SIDE_DATA_PARAM_CHANGE_SAMPLE_RATE*(!!sample_rate) +
>   
>  AV_SIDE_DATA_PARAM_CHANGE_DIMENSIONS*(!!(width|height))
>  );
>  -if (channels)
>  -bytestream_put_le32(&dst, channels);
>  -if (channel_layout)
>  -bytestream_put_le64(&dst, channel_layout);
>   if (sample_rate)
>   bytestream_put_le32(&dst, sample_rate);
>   if (width || height){
> >>>
> >>> This would break mid stream changes to the channel layout & channels when 
> >>> it
> >>> is carried at format level only
> >>>
> >>> The commit message also does not adequately explain why such mid stream 
> >>> changes
> >>> are ignored
> >>>
> >>
> >> Mid-stream changes like this have been deprecated in
> >> 09b5d3fb44ae1036700f80c8c80b15e9074c58c3;
> >> 65ddc74988245a01421a63c5cffa4d900c47117c removed it, but only
> >> incompletely: The side data flags for channel count and channel layout
> >> changes were no longer written (in fact, they were removed from
> >> packet.h), yet it still wrote the rest of the side data as if these
> >> flags existed and had been written. That is the inconsistency this
> >> commit addresses. It does not address whether channel count/layout
> >> updates should have been removed, because that has already happened.
> > 
> > i honestly belive that we should support changing channel(layout) for
> > cases like PCM in nut
> > 
> 
> That is orthogonal to this patch (which just wants to not create
> inconsistent side data).

You can fix the inconsistency in 2 directions:
1. remove everything
2. add back the code whose removal made it inconsistent

The choice between these 2 directions is not orthogonal to what this patch changes.
It is also not orthogonal to supporting PCM channel changes in NUT,
nor is the change this patch makes from our current state orthogonal
to what would be needed to support channel changes.

IMHO, decide on what the end goal is and work toward it. Don't just
make something consistent when it's in a direction that might be suboptimal.
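
To make the "inconsistency" concrete, the payload layout that the quoted patch keeps writing can be sketched as follows: a little-endian flags word, then only the fields whose flags are set. This is a standalone model, not nutdec code. The `DEMO_` constants are local stand-ins whose values are assumed to match the sample-rate and dimensions flags in libavcodec/packet.h, and `demo_write_param_change()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins; the values are an assumption about packet.h. */
#define DEMO_PARAM_CHANGE_SAMPLE_RATE 0x0004
#define DEMO_PARAM_CHANGE_DIMENSIONS  0x0008

/* Store a 32-bit value in little-endian order and advance the cursor. */
static void demo_put_le32(uint8_t **p, uint32_t v)
{
    (*p)[0] =  v        & 0xff;
    (*p)[1] = (v >>  8) & 0xff;
    (*p)[2] = (v >> 16) & 0xff;
    (*p)[3] = (v >> 24) & 0xff;
    *p += 4;
}

/* Write flags followed by exactly the fields the flags announce.
 * buf must hold at least 16 bytes; returns the number of bytes used. */
static size_t demo_write_param_change(uint8_t *buf, int sample_rate,
                                      int width, int height)
{
    uint8_t *dst   = buf;
    uint32_t flags = 0;

    if (sample_rate)
        flags |= DEMO_PARAM_CHANGE_SAMPLE_RATE;
    if (width || height)
        flags |= DEMO_PARAM_CHANGE_DIMENSIONS;

    demo_put_le32(&dst, flags);
    if (sample_rate)
        demo_put_le32(&dst, (uint32_t)sample_rate);
    if (width || height) {
        demo_put_le32(&dst, (uint32_t)width);
        demo_put_le32(&dst, (uint32_t)height);
    }

    return (size_t)(dst - buf);
}
```

A reader of this payload walks the flags in the same order, so writing a field without its flag (as happened with the channel count and layout) shifts every later field and yields nonsense values.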

thx

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If the United States is serious about tackling 

Re: [FFmpeg-devel] [PATCHv2 1/5] lavu/float_dsp: add double-precision scalar product

2024-05-30 Thread James Almer

On 5/30/2024 4:21 PM, Rémi Denis-Courmont wrote:

On Thursday, 30 May 2024, 22.10.28 EEST, James Almer wrote:

Why are you adding this to the doxy for scalarproduct_float()? Those
constraints are not correct for it. They are for scalarproduct_double()
which you're adding now.


Because copy-paste error.


Ok, patchset LGTM after you amend that.


Re: [FFmpeg-devel] [PATCHv2 1/5] lavu/float_dsp: add double-precision scalar product

2024-05-30 Thread Rémi Denis-Courmont
On Thursday, 30 May 2024, 22.06.55 EEST, Rémi Denis-Courmont wrote:
> The function pointer is appended to the structure for backward binary
> compatibility. Fortunately, this is allocated by libavutil, not by the
> user, so increasing the structure size is safe.
> ---
>  libavutil/float_dsp.c | 12 
>  libavutil/float_dsp.h | 31 ++-
>  2 files changed, 42 insertions(+), 1 deletion(-)
> 
> diff --git a/libavutil/float_dsp.c b/libavutil/float_dsp.c
> index e9fb023466..08bbc85e3e 100644
> --- a/libavutil/float_dsp.c
> +++ b/libavutil/float_dsp.c
> @@ -132,6 +132,17 @@ float avpriv_scalarproduct_float_c(const float *v1, const float *v2, int len)
>      return p;
>  }
> 
> +double ff_scalarproduct_double_c(const double *v1, const double *v2,
> + size_t len)
> +{
> +double p = 0.0;
> +
> +for (size_t i = 0; i < len; i++)
> +p += v1[i] * v2[i];
> +
> +return p;
> +}
> +

If somebody wants to write x86 assembly, they can probably borrow most of the 
code for evaluate_lls. It is a double precision scalar product with a little 
bit of extra fluff in the prologue.

-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCHv2 1/5] lavu/float_dsp: add double-precision scalar product

2024-05-30 Thread James Almer

On 5/30/2024 4:31 PM, Rémi Denis-Courmont wrote:

On Thursday, 30 May 2024, 22.06.55 EEST, Rémi Denis-Courmont wrote:

The function pointer is appended to the structure for backward binary
compatibility. Fortunately, this is allocated by libavutil, not by the
user, so increasing the structure size is safe.
---
  libavutil/float_dsp.c | 12 
  libavutil/float_dsp.h | 31 ++-
  2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/libavutil/float_dsp.c b/libavutil/float_dsp.c
index e9fb023466..08bbc85e3e 100644
--- a/libavutil/float_dsp.c
+++ b/libavutil/float_dsp.c
@@ -132,6 +132,17 @@ float avpriv_scalarproduct_float_c(const float *v1, const float *v2, int len)
      return p;
  }

+double ff_scalarproduct_double_c(const double *v1, const double *v2,
+ size_t len)
+{
+double p = 0.0;
+
+for (size_t i = 0; i < len; i++)
+p += v1[i] * v2[i];
+
+return p;
+}
+


If somebody wants to write x86 assembly, they can probably borrow most of the
code for evaluate_lls. It is a double precision scalar product with a little
bit of extra fluff in the prologue.


I already did, I'm just waiting for this set to be pushed before sending it.


Re: [FFmpeg-devel] [PATCH] avformat/framecrcenc: compute the checksum for side data

2024-05-30 Thread Michael Niedermayer
On Mon, May 27, 2024 at 04:52:22PM -0300, James Almer wrote:
> On 5/27/2024 4:50 PM, Michael Niedermayer wrote:
> > On Mon, May 27, 2024 at 04:33:21PM -0300, James Almer wrote:
> > > On 5/27/2024 4:31 PM, Michael Niedermayer wrote:
> > > > On Mon, May 27, 2024 at 09:20:55PM +0200, Michael Niedermayer wrote:
> > > > > On Mon, May 27, 2024 at 03:17:15PM -0300, James Almer wrote:
> > > > > > On 5/27/2024 3:11 PM, Michael Niedermayer wrote:
> > > > > > > On Mon, May 27, 2024 at 10:15:43AM +0200, Anton Khirnov wrote:
> > > > > > > > Quoting Michael Niedermayer (2024-04-27 02:36:23)
> > > > > > > > > This allows detecting issues in side data related code, same 
> > > > > > > > > as what
> > > > > > > > > framecrc does for before already for packet data itself.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Michael Niedermayer 
> > > > > > > > > ---
> > > > > > > > 
> > > > > > > > I am against this patch. Checksumming side data is a 
> > > > > > > > fundamentally wrong
> > > > > > > > thing to do.
> > > > > > > 
> > > > > > > It, or something equivalent is neccessary for regression testing.
> > > > > > > (and it was you who asked also for the tests i run to be part of
> > > > > > > fate. But here you object to it)
> > > > > > > 
> > > > > > > You know, not checking side data is not checking it so 
> > > > > > > differences would then not be
> > > > > > > detected allowing for unintended changes to be introduced (aka 
> > > > > > > bugs)
> > > > > > 
> > > > > > You have seen how much code is needed to get hashing to work for 
> > > > > > all targets
> > > > > > with some types,
> > > > > 
> > > > >framecrcenc.c |   76 
> > > > > +---
> > > > >1 file changed, 73 insertions(+), 3 deletions(-)
> > > > > 
> > > > > 70 more lines of code, in my patch
> > > > > 
> > > > > If we need another 70 to handle some corner cases, no idea if we do, 
> > > > > thats
> > > > > still negligible
> > > > > 
> > > > > 
> > > > > > so it does feel like it's not the right thing to do.
> > > > > 
> > > > > I dont think i can follow that logic
> > > > > 
> > > > > 
> > > > > > ffprobe (and f_sidedata) are what should be used for actual 
> > > > > > integrity
> > > > > > checks.
> > > > > 
> > > > > ffprobe cannot test ffmpeg, ffmpeg is a seperate excutable
> > > > > 
> > > > > If you suggest that side data should not be tested in FFmpeg while 
> > > > > packet.data
> > > > > should be tested. That position seems inconsistant to me
> > > > > 
> > > > > If you suggest that neither side data nor packet.data should be 
> > > > > tested in FFmpeg
> > > > > iam confident that there would be a majority disagreeing.
> > > > > 
> > > > > f_sidedata is not at the output of ffmpeg so even if it could test 
> > > > > it, it
> > > > > does not test the ffmpeg output.
> > > > > We also dont replace running md5sum and framecrc on ffmpeg output by 
> > > > > a bitstream
> > > > > filter.
> > > > > 
> > > > > Again, there is need to test what comes out of FFmpeg, thats at the 
> > > > > muxer level
> > > > > thats what framecrcenc does.
> > > > 
> > > > There is also an additional aspect
> > > > and that is efficiency or "time taken by all fate tests"
> > > > framecrcenc already has all the side data, it costs basically 0 time to 
> > > > print that
> > > > 
> > > > any ffprobe based check needs to run everything a 2nd time, so it will 
> > > > be slower
> > > > 
> > > > also ffprobe is only good for side data from the demuxer.
> > > > my patch tests all cases including side data from the encoder or any 
> > > > other
> > > > source that gets forwarded to the muxer in each testcase.
> > > 
> > > We could extend showinfo_bsf to print side data information.
> > 
> > Well, you argued a moment ago that its too much code (in framecrcenc)
> > its not going to be less code if the same or more detailed information
> > is printed in a showinfo_bsf
> > 
> > again, my suggestion is that this code should go to where side data is
> > and then showinfo_bsf, framecrcenc and ffprobe can use it
> 
> I mean, showinfo_bsf could be adapted in a way ffprobe can invoke/parse, so
> all the related ffprobe code can be moved there.

Do you agree that framecrcenc should show side data in a way that allows
detecting changes, as it already does with the main packet data?
It's perfectly fine with me if that invokes the same code as showinfo_bsf.
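
What "detecting changes" amounts to can be sketched with a plain checksum over the side-data bytes. The example below uses textbook Adler-32 and a hypothetical `demo_adler32()`; it is an assumption that a checksum of this family is what the muxer would print, and the sketch does not claim to reproduce framecrc's exact output or its initial value.

```c
#include <stddef.h>
#include <stdint.h>

/* Textbook Adler-32 over a byte buffer (initial value 1). */
static uint32_t demo_adler32(const uint8_t *buf, size_t len)
{
    uint32_t a = 1, b = 0;

    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521; /* 65521: largest prime below 2^16 */
        b = (b + a) % 65521;
    }

    return (b << 16) | a;
}
```

Printing one such value per side-data entry next to the packet checksum costs almost nothing, and any unintended change to the side-data bytes then shows up as a diff in the reference file.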

thx

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If a bugfix only changes things apparently unrelated to the bug with no
further explanation, that is a good sign that the bugfix is wrong.




Re: [FFmpeg-devel] [PATCH v3] avformat/nutdec: Don't create inconsistent side data

2024-05-30 Thread Andreas Rheinhardt
Michael Niedermayer:
> On Thu, May 30, 2024 at 08:07:48PM +0200, Andreas Rheinhardt wrote:
>> Michael Niedermayer:
>>> On Thu, May 30, 2024 at 07:53:42PM +0200, Andreas Rheinhardt wrote:
 Michael Niedermayer:
> On Thu, May 30, 2024 at 02:14:20AM +0200, Andreas Rheinhardt wrote:
>> Forgotten in 65ddc74988245a01421a63c5cffa4d900c47117c.
>>
>> Signed-off-by: Andreas Rheinhardt 
>> ---
>>  libavformat/nutdec.c | 14 --
>>  1 file changed, 4 insertions(+), 10 deletions(-)
>>
>> diff --git a/libavformat/nutdec.c b/libavformat/nutdec.c
>> index 0bb7f154db..34b7e3cb9a 100644
>> --- a/libavformat/nutdec.c
>> +++ b/libavformat/nutdec.c
>> @@ -881,8 +881,6 @@ static int read_sm_data(AVFormatContext *s, 
>> AVIOContext *bc, AVPacket *pkt, int
>>  int count = ffio_read_varlen(bc);
>>  int skip_start = 0;
>>  int skip_end = 0;
>> -int channels = 0;
>> -int64_t channel_layout = 0;
>>  int sample_rate = 0;
>>  int width = 0;
>>  int height = 0;
>> @@ -930,7 +928,7 @@ static int read_sm_data(AVFormatContext *s, 
>> AVIOContext *bc, AVPacket *pkt, int
>>  AV_WB64(dst, v64);
>>  dst += 8;
>>  } else if (!strcmp(name, "ChannelLayout") && value_len == 
>> 8) {
>> -channel_layout = avio_rl64(bc);
>> +// Ignored
>>  continue;
>>  } else {
>>  av_log(s, AV_LOG_WARNING, "Unknown data %s / %s\n", 
>> name, type_str);
>> @@ -952,7 +950,7 @@ static int read_sm_data(AVFormatContext *s, 
>> AVIOContext *bc, AVPacket *pkt, int
>>  } else if (!strcmp(name, "SkipEnd")) {
>>  skip_end = value;
>>  } else if (!strcmp(name, "Channels")) {
>> -channels = value;
>> +// Ignored
>>  } else if (!strcmp(name, "SampleRate")) {
>>  sample_rate = value;
>>  } else if (!strcmp(name, "Width")) {
>> @@ -965,18 +963,14 @@ static int read_sm_data(AVFormatContext *s, 
>> AVIOContext *bc, AVPacket *pkt, int
>>  }
>>  }
>>  
>> -if (channels || channel_layout || sample_rate || width || height) {
>> -uint8_t *dst = av_packet_new_side_data(pkt, 
>> AV_PKT_DATA_PARAM_CHANGE, 28);
>> +if (sample_rate || width || height) {
>> +uint8_t *dst = av_packet_new_side_data(pkt, 
>> AV_PKT_DATA_PARAM_CHANGE, 16);
>>  if (!dst)
>>  return AVERROR(ENOMEM);
>>  bytestream_put_le32(&dst,
>>  
>> AV_SIDE_DATA_PARAM_CHANGE_SAMPLE_RATE*(!!sample_rate) +
>>  
>> AV_SIDE_DATA_PARAM_CHANGE_DIMENSIONS*(!!(width|height))
>> );
>> -if (channels)
>> -bytestream_put_le32(&dst, channels);
>> -if (channel_layout)
>> -bytestream_put_le64(&dst, channel_layout);
>>  if (sample_rate)
>>  bytestream_put_le32(&dst, sample_rate);
>>  if (width || height){
>
> This would break mid stream changes to the channel layout & channels when 
> it
> is carried at format level only
>
> The commit message also does not adequately explain why such mid stream 
> changes
> are ignored
>

 Mid-stream changes like this have been deprecated in
 09b5d3fb44ae1036700f80c8c80b15e9074c58c3;
 65ddc74988245a01421a63c5cffa4d900c47117c removed it, but only
 incompletely: The side data flags for channel count and channel layout
 changes were no longer written (in fact, they were removed from
 packet.h), yet it still wrote the rest of the side data as if these
 flags existed and had been written. That is the inconsistency this
 commit addresses. It does not address whether channel count/layout
 updates should have been removed, because that has already happened.
>>>
>>> i honestly belive that we should support changing channel(layout) for
>>> cases like PCM in nut
>>>
>>
>> That is orthogonal to this patch (which just wants to not create
>> inconsistent side data).
> 
> You can fix the inconsistency in 2 directions
> 1. remove everything
> 2. add the code back that made it inconsistent
> 
> The line between these 2 points is not orthogonal to what this patch changes
> It also is not orthogonal to supporting PCM channel changes in NUT
> nor is the change this patch does from our current state orthogonal
> to what would be needed to support channel changes
> 
> IMHO, decide on what the end goal is and work toward it. Not just
> make something consistent even when it's a direction that might be suboptimal
> 

We have a release that is able to create nonsense side data; this needs
to be fixed a

[FFmpeg-devel] [PATCH] mov.c fix the duration for the last audio frame.

2024-05-30 Thread Wang Cao via ffmpeg-devel
The actual audio data may occupy only part of the last audio 
frame. This can be signaled by the sample duration in the "trun" atom. We should 
respect the file's metadata and set the duration accordingly.
---
 libavformat/mov.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/libavformat/mov.c b/libavformat/mov.c
index 45eca74d1d..caea36b495 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -5700,11 +5700,16 @@ static int mov_read_trun(MOVContext *c, AVIOContext 
*pb, MOVAtom atom)
 
 sc->ctts_data[index_entry_pos].count = 1;
 sc->ctts_data[index_entry_pos].duration = ctts_duration;
+if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
+sc->ctts_data[index_entry_pos].duration = sample_duration;
+} else {
+sc->ctts_data[index_entry_pos].duration = ctts_duration;
+}
 index_entry_pos++;
 
 av_log(c->fc, AV_LOG_TRACE, "AVIndex stream %d, sample %d, offset 
%"PRIx64", dts %"PRId64", "
-"size %u, distance %d, keyframe %d\n", st->index,
-index_entry_pos, offset, dts, sample_size, distance, keyframe);
+"size %u, distance %d, keyframe %d duration %d\n", st->index,
+index_entry_pos, offset, dts, sample_size, distance, keyframe, 
sc->ctts_data[index_entry_pos-1].duration);
 distance++;
 if (av_sat_add64(dts, sample_duration) != dts + 
(uint64_t)sample_duration)
 return AVERROR_INVALIDDATA;
@@ -9894,6 +9899,9 @@ static int mov_finalize_packet(AVFormatContext *s, 
AVStream *st, AVIndexEntry *s
 }
 if (sc->ctts_data && sc->ctts_index < sc->ctts_count) {
 pkt->pts = av_sat_add64(pkt->dts, av_sat_add64(sc->dts_shift, 
sc->ctts_data[sc->ctts_index].duration));
+if (st->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
+pkt->duration = sc->ctts_data[sc->ctts_index].duration;
+}
 /* update ctts context */
 sc->ctts_sample++;
 if (sc->ctts_index < sc->ctts_count &&
-- 
2.45.1.288.g0e0cd299f1-goog

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


[FFmpeg-devel] [PATCH 00/16] NVidia Tegra hardware decoding backend

2024-05-30 Thread averne
Hi all,

This patch series implements a hardware decoding backend for nvidia
Tegra devices, notably the Nintendo Switch. It was primarily written
for HorizonOS (Nintendo Switch OS), but also supports nvidia's
Linux4Tegra distro. As for hardware, all Tegras later than the X1
(T210) should be supported, although the patch does not implement
features that were added to subsequent revisions of multimedia
engines (eg. 12-bit HEVC). However, since I only own T210 devices
(Switch and jetson nano), I was not able to verify this.

The backend is essentially a userspace NVDEC driver, as due to the
OS design of the Switch, we cannot link to nvidia's system libraries.
It notably uses (sparse) hardware documentation released by nvidia
here: https://github.com/NVIDIA/open-gpu-doc/tree/master/classes/video.
It supports all codecs available in hardware (MPEG1/2/4, VC1, H264,
HEVC, VP8, VP9 and JPEG), with dynamic frequency scaling, and
hardware-accelerated frame transfer.

At the moment I'm submitting the series with some nvidia headers
pulled from various sources, but I do think they should rather be
put in nv-codec-headers, let me know.

The code was tested for memory bugs and leaks with valgrind and
asan on L4T. Some quick performance testing (decoding with -f null -)
showed results in line with official software, tested against the 
nvv4l2 backend that was posted here a while ago:
https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2020-June/263759.html.
Note that the numbers are skewed because frame transfer cannot be 
disabled in nvidia's backend.
- HEVC Main 10 @ 4k   (~80Mbps): nvtegra  79fps, nvv4l2  66fps
- HEVC Main 10 @ 1080p (~5Mbps): nvtegra 402fps, nvv4l2 229fps
- H264 @ 1080p (~3Mbps): nvtegra 286fps, nvv4l2 260fps

Several homebrew applications have been using this backend for some
time, with no bugs reported. As far as I'm aware, this is the
complete list of them:
- NXMP, a media player based on mpv: https://github.com/proconsule/nxmp
- WiliWili, a bilibili client: https://github.com/xfangfang/wiliwili
- Switchfin, a Jellyfin client: https://github.com/dragonflylee/switchfin
- Moonlight-Switch, a Moonlight client: 
https://github.com/XITRIX/Moonlight-Switch
- chiaki: https://git.sr.ht/~kkwong/chiaki/
- My own media player, unreleased at this time

Nintendo Switch support assumes a working devkitA64 homebrew
environment, instructions regarding setup can be found here:
https://devkitpro.org/wiki/devkitPro_pacman. The hwaccel can then be
configured by eg.:
```
source /opt/devkitpro/switchvars.sh && ./configure \
    --cross-prefix=aarch64-none-elf- --enable-cross-compile --arch=aarch64 \
    --cpu=cortex-a57 --target-os=horizon --enable-pic --enable-gpl \
    --enable-nvtegra
```

It should probably be noted that NVDEC usage on discrete gpus is
very similar. As far as I know, the main difference is that the 
interfacing is done through the GPFIFO block (same block that 
manages the 3D engine), instead of host1x.

Thank you for your consideration.


averne (16):
  avutil/buffer: add helper to allocate aligned memory
  configure,avutil: add support for HorizonOS
  avutil: add ioctl definitions for tegra devices
  avutil: add hardware definitions for NVDEC, NVJPG and VIC
  avutil: add common code for nvtegra
  avutil: add nvtegra hwcontext
  hwcontext_nvtegra: add dynamic frequency scaling routines
  nvtegra: add common hardware decoding code
  nvtegra: add mpeg1/2 hardware decoding
  nvtegra: add mpeg4 hardware decoding
  nvtegra: add vc1 hardware decoding
  nvtegra: add h264 hardware decoding
  nvtegra: add hevc hardware decoding
  nvtegra: add vp8 hardware decoding
  nvtegra: add vp9 hardware decoding
  nvtegra: add mjpeg hardware decoding

 configure  |   30 +
 libavcodec/Makefile|   11 +
 libavcodec/h263dec.c   |6 +
 libavcodec/h264_slice.c|6 +-
 libavcodec/h264dec.c   |3 +
 libavcodec/hevcdec.c   |   17 +-
 libavcodec/hevcdec.h   |2 +
 libavcodec/hwaccels.h  |   10 +
 libavcodec/hwconfig.h  |2 +
 libavcodec/mjpegdec.c  |6 +
 libavcodec/mpeg12dec.c |   12 +
 libavcodec/mpeg4videodec.c |3 +
 libavcodec/nvtegra_decode.c|  517 +
 libavcodec/nvtegra_decode.h|   94 ++
 libavcodec/nvtegra_h264.c  |  506 +
 libavcodec/nvtegra_hevc.c  |  633 +++
 libavcodec/nvtegra_mjpeg.c |  336 ++
 libavcodec/nvtegra_mpeg12.c|  319 ++
 libavcodec/nvtegra_mpeg4.c |  344 ++
 libavcodec/nvtegra_vc1.c   |  455 
 libavcodec/nvtegra_vp8.c   |  334 ++
 libavcodec/nvtegra_vp9.c   |  665 
 libavcodec/vc1dec.c|9 +
 libavcodec/vp8.c   |6 +
 libavcodec/vp9.c   |   10 +-
 libavutil/Makefile |9 +
 libavutil/buffer.c |   31 +
 libavutil/buffer.h |7 +
 libavutil/clb0b6.h |  303 ++
 libavutil/clc5b0.h |  436 

[FFmpeg-devel] [PATCH 01/16] avutil/buffer: add helper to allocate aligned memory

2024-05-30 Thread averne
This is useful eg. for memory-mapped buffers that need page-aligned memory, 
when dealing with hardware devices

Signed-off-by: averne 
---
 libavutil/buffer.c | 31 +++
 libavutil/buffer.h |  7 +++
 2 files changed, 38 insertions(+)

diff --git a/libavutil/buffer.c b/libavutil/buffer.c
index e4562a79b1..b8e357f540 100644
--- a/libavutil/buffer.c
+++ b/libavutil/buffer.c
@@ -16,9 +16,14 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
+#include "config.h"
+
 #include <stdatomic.h>
 #include <stdint.h>
 #include <string.h>
+#if HAVE_MALLOC_H
+#include <malloc.h>
+#endif
 
 #include "avassert.h"
 #include "buffer_internal.h"
@@ -100,6 +105,32 @@ AVBufferRef *av_buffer_allocz(size_t size)
 return ret;
 }
 
+AVBufferRef *av_buffer_aligned_alloc(size_t size, size_t align)
+{
+AVBufferRef *ret = NULL;
+uint8_t*data = NULL;
+
+#if HAVE_POSIX_MEMALIGN
+if (posix_memalign((void **)&data, align, size))
+return NULL;
+#elif HAVE_ALIGNED_MALLOC
+data = aligned_alloc(align, size);
+#elif HAVE_MEMALIGN
+data = memalign(align, size);
+#else
+return NULL;
+#endif
+
+if (!data)
+return NULL;
+
+ret = av_buffer_create(data, size, av_buffer_default_free, NULL, 0);
+if (!ret)
+av_freep(&data);
+
+return ret;
+}
+
 AVBufferRef *av_buffer_ref(const AVBufferRef *buf)
 {
 AVBufferRef *ret = av_mallocz(sizeof(*ret));
diff --git a/libavutil/buffer.h b/libavutil/buffer.h
index e1ef5b7f07..8422ec3453 100644
--- a/libavutil/buffer.h
+++ b/libavutil/buffer.h
@@ -107,6 +107,13 @@ AVBufferRef *av_buffer_alloc(size_t size);
  */
 AVBufferRef *av_buffer_allocz(size_t size);
 
+/**
+ * Allocate an AVBuffer of the given size and alignment.
+ *
+ * @return an AVBufferRef of given size or NULL when out of memory
+ */
+AVBufferRef *av_buffer_aligned_alloc(size_t size, size_t align);
+
 /**
  * Always treat the buffer as read-only, even when it has only one
  * reference.
-- 
2.45.1
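
For readers unfamiliar with the underlying call, the following is a rough standalone sketch of page-aligned allocation through `posix_memalign()`, which is the first branch the patch tries. The `sketch_*` helpers are hypothetical and are not part of libavutil; the explicit alignment check is an addition for illustration, since `posix_memalign()` requires a power-of-two alignment that is a multiple of `sizeof(void *)`.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* A power of two that is at least sizeof(void *) satisfies posix_memalign(). */
static int sketch_align_ok(size_t align)
{
    return align >= sizeof(void *) && (align & (align - 1)) == 0;
}

/* Returns memory aligned to `align`, or NULL on failure. Release with free(). */
static void *sketch_aligned_alloc(size_t size, size_t align)
{
    void *ptr = NULL;

    if (!sketch_align_ok(align))
        return NULL;
    if (posix_memalign(&ptr, align, size))
        return NULL;
    return ptr;
}

/* Convenience wrapper: align to the system page size, as a mapped buffer would need. */
static void *sketch_page_alloc(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0)
        return NULL;
    return sketch_aligned_alloc(size, (size_t)page);
}
```

Memory obtained this way is released with plain `free()`, which matters when choosing the free callback for a buffer that wraps it.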


[FFmpeg-devel] [PATCH 02/16] configure, avutil: add support for HorizonOS

2024-05-30 Thread averne
HorizonOS (HOS) is the operating system of the Nintendo Switch.
This patch enables integration with the homebrew toolchain developed by the 
devkitPro team. Its two main components are devkitA64 (common toolchain for 
aarch64 targets) and libnx (library implementing interaction with the HOS 
kernel and system daemons, termed sysmodules).

Signed-off-by: averne 
---
 configure   | 8 
 libavutil/cpu.c | 7 +++
 2 files changed, 15 insertions(+)

diff --git a/configure b/configure
index 96b181fd21..09fb2aed1b 100755
--- a/configure
+++ b/configure
@@ -5967,6 +5967,10 @@ case $target_os in
 ;;
 minix)
 ;;
+horizon)
+enable section_data_rel_ro
+add_extralibs -lnx
+;;
 none)
 ;;
 *)
@@ -7710,6 +7714,10 @@ haiku)
 disable memalign
 fi
 ;;
+horizon)
+disable sysctl
+disable sysctlbyname
+;;
 esac
 
 flatten_extralibs(){
diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index 9ac2f01c20..6a77df5e34 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -48,6 +48,9 @@
 #if HAVE_UNISTD_H
 #include <unistd.h>
 #endif
+#ifdef __SWITCH__
+#include 
+#endif
 
 static atomic_int cpu_flags = -1;
 static atomic_int cpu_count = -1;
@@ -247,6 +250,10 @@ int av_cpu_count(void)
 #elif HAVE_WINRT
 GetNativeSystemInfo(&sysinfo);
 nb_cpus = sysinfo.dwNumberOfProcessors;
+#elif defined(__SWITCH__)
+u64 core_mask = 0;
+Result rc = svcGetInfo(&core_mask, InfoType_CoreMask, CUR_PROCESS_HANDLE, 
0);
+nb_cpus = R_SUCCEEDED(rc) ? av_popcount64(core_mask) : 3;
 #endif
 
 if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed))
-- 
2.45.1
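
The core-count logic in the `__SWITCH__` branch boils down to counting the set bits of an affinity mask, with a fixed fallback when the query fails. A portable sketch of that idea, with hypothetical `sketch_*` names and no dependency on libnx or libavutil, could look like this:

```c
#include <stdint.h>

/* Count set bits by clearing the lowest set bit until none remain. */
static int sketch_popcount64(uint64_t x)
{
    int n = 0;

    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/*
 * query_ok:  non-zero if the (hypothetical) core-mask query succeeded
 * core_mask: one bit per core available to the process
 * fallback:  value to report when the query failed
 */
static int sketch_cpu_count(int query_ok, uint64_t core_mask, int fallback)
{
    return query_ok ? sketch_popcount64(core_mask) : fallback;
}
```

With a mask of `0x7` this reports three cores, the same value the patch uses as its fallback.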


[FFmpeg-devel] [PATCH 03/16] avutil: add ioctl definitions for tegra devices

2024-05-30 Thread averne
These files are taken with minimal modifications from nvidia's Linux4Tegra 
(L4T) tree.
nvmap enables management of memory-mapped buffers for hardware devices.
nvhost enables interaction with different hardware modules (multimedia engines, 
display engine, ...), through a common block, host1x.

Signed-off-by: averne 
---
 libavutil/Makefile   |   2 +
 libavutil/nvhost_ioctl.h | 511 +++
 libavutil/nvmap_ioctl.h  | 451 ++
 3 files changed, 964 insertions(+)
 create mode 100644 libavutil/nvhost_ioctl.h
 create mode 100644 libavutil/nvmap_ioctl.h

diff --git a/libavutil/Makefile b/libavutil/Makefile
index 6e6fa8d800..9c112bc58a 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -52,6 +52,8 @@ HEADERS = adler32.h   
  \
   hwcontext_videotoolbox.h  \
   hwcontext_vdpau.h \
   hwcontext_vulkan.h\
+  nvhost_ioctl.h\
+  nvmap_ioctl.h \
   iamf.h\
   imgutils.h\
   intfloat.h\
diff --git a/libavutil/nvhost_ioctl.h b/libavutil/nvhost_ioctl.h
new file mode 100644
index 00..b0bf3e3ae6
--- /dev/null
+++ b/libavutil/nvhost_ioctl.h
@@ -0,0 +1,511 @@
+/*
+ * include/uapi/linux/nvhost_ioctl.h
+ *
+ * Tegra graphics host driver
+ *
+ * Copyright (c) 2016-2020, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef AVUTIL_NVHOST_IOCTL_H
+#define AVUTIL_NVHOST_IOCTL_H
+
+#ifndef __SWITCH__
+#   include 
+#   include 
+#else
+#   include 
+
+#   define _IO   _NV_IO
+#   define _IOR  _NV_IOR
+#   define _IOW  _NV_IOW
+#   define _IOWR _NV_IOWR
+
+#   define _IOC_DIR  _NV_IOC_DIR
+#   define _IOC_TYPE _NV_IOC_TYPE
+#   define _IOC_NR   _NV_IOC_NR
+#   define _IOC_SIZE _NV_IOC_SIZE
+#endif
+
+#define __user
+
+#define NVHOST_INVALID_SYNCPOINT 0x
+#define NVHOST_NO_TIMEOUT (-1)
+#define NVHOST_NO_CONTEXT 0x0
+#define NVHOST_IOCTL_MAGIC 'H'
+#define NVHOST_PRIORITY_LOW 50
+#define NVHOST_PRIORITY_MEDIUM 100
+#define NVHOST_PRIORITY_HIGH 150
+
+#define NVHOST_TIMEOUT_FLAG_DISABLE_DUMP 0
+
+#define NVHOST_SUBMIT_VERSION_V0 0x0
+#define NVHOST_SUBMIT_VERSION_V1 0x1
+#define NVHOST_SUBMIT_VERSION_V2 0x2
+#define NVHOST_SUBMIT_VERSION_MAX_SUPPORTED NVHOST_SUBMIT_VERSION_V2
+
+struct nvhost_cmdbuf {
+uint32_t mem;
+uint32_t offset;
+uint32_t words;
+} __attribute__((packed));
+
+struct nvhost_cmdbuf_ext {
+int32_t  pre_fence;
+uint32_t reserved;
+};
+
+struct nvhost_reloc {
+uint32_t cmdbuf_mem;
+uint32_t cmdbuf_offset;
+uint32_t target;
+uint32_t target_offset;
+};
+
+struct nvhost_reloc_shift {
+uint32_t shift;
+} __attribute__((packed));
+
+#define NVHOST_RELOC_TYPE_DEFAULT 0
+#define NVHOST_RELOC_TYPE_PITCH_LINEAR 1
+#define NVHOST_RELOC_TYPE_BLOCK_LINEAR 2
+#define NVHOST_RELOC_TYPE_NVLINK 3
+struct nvhost_reloc_type {
+uint32_t reloc_type;
+uint32_t padding;
+};
+
+struct nvhost_waitchk {
+uint32_t mem;
+uint32_t offset;
+uint32_t syncpt_id;
+uint32_t thresh;
+};
+
+struct nvhost_syncpt_incr {
+uint32_t syncpt_id;
+uint32_t syncpt_incrs;
+};
+
+struct nvhost_get_param_args {
+uint32_t value;
+} __attribute__((packed));
+
+struct nvhost_get_param_arg {
+uint32_t param;
+uint32_t value;
+};
+
+struct nvhost_get_client_managed_syncpt_arg {
+uint64_t name;
+uint32_t param;
+uint32_t value;
+};
+
+struct nvhost_free_client_managed_syncpt_arg {
+uint32_t param;
+uint32_t value;
+};
+
+struct nvhost_channel_open_args {
+int32_t  channel_fd;
+};
+
+struct nvhost_set_syncpt_name_args {
+uint64_t name;
+uint32_t syncpt_id;
+uint32_t padding;
+};
+
+struct nvhost_set_nvmap_fd_args {
+uint32_t fd;
+} __attribute__((packed));
+
+enum nvhost_clk_attr {
+N

[FFmpeg-devel] [PATCH 05/16] avutil: add common code for nvtegra

2024-05-30 Thread averne
This includes a new pixel format for nvtegra hardware frames, and several 
objects for interaction with hardware blocks.
In particular, this contains code for channels (handles to hardware engines), 
maps (memory-mapped buffers shared with engines), and command buffers 
(abstraction for building command lists sent to the engines).

Signed-off-by: averne 
---
 configure  |2 +
 libavutil/Makefile |4 +
 libavutil/nvtegra.c| 1035 
 libavutil/nvtegra.h|  258 +
 libavutil/nvtegra_host1x.h |   94 
 libavutil/pixdesc.c|4 +
 libavutil/pixfmt.h |8 +
 7 files changed, 1405 insertions(+)
 create mode 100644 libavutil/nvtegra.c
 create mode 100644 libavutil/nvtegra.h
 create mode 100644 libavutil/nvtegra_host1x.h

diff --git a/configure b/configure
index 09fb2aed1b..51f169bfbd 100755
--- a/configure
+++ b/configure
@@ -361,6 +361,7 @@ External library support:
   --disable-vdpau  disable Nvidia Video Decode and Presentation API 
for Unix code [autodetect]
   --disable-videotoolbox   disable VideoToolbox code [autodetect]
   --disable-vulkan disable Vulkan code [autodetect]
+  --enable-nvtegra enable nvtegra code [no]
 
 Toolchain options:
   --arch=ARCH  select architecture [$arch]
@@ -3151,6 +3152,7 @@ videotoolbox_hwaccel_deps="videotoolbox pthreads"
 videotoolbox_hwaccel_extralibs="-framework QuartzCore"
 vulkan_deps="threads"
 vulkan_deps_any="libdl LoadLibrary"
+nvtegra_deps="gpl"
 
 av1_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_AV1"
 av1_d3d11va_hwaccel_select="av1_decoder"
diff --git a/libavutil/Makefile b/libavutil/Makefile
index 9c112bc58a..733a23a8a3 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -52,6 +52,7 @@ HEADERS = adler32.h   
  \
   hwcontext_videotoolbox.h  \
   hwcontext_vdpau.h \
   hwcontext_vulkan.h\
+  nvtegra.h \
   nvhost_ioctl.h\
   nvmap_ioctl.h \
   iamf.h\
@@ -209,6 +210,7 @@ OBJS-$(CONFIG_VDPAU)+= hwcontext_vdpau.o
 OBJS-$(CONFIG_VULKAN)   += hwcontext_vulkan.o vulkan.o
 
 OBJS-$(!CONFIG_VULKAN)  += hwcontext_stub.o
+OBJS-$(CONFIG_NVTEGRA)  += nvtegra.o
 
 OBJS += $(COMPAT_OBJS:%=../compat/%)
 
@@ -230,6 +232,8 @@ SKIPHEADERS-$(CONFIG_VDPAU)+= hwcontext_vdpau.h
 SKIPHEADERS-$(CONFIG_VULKAN)   += hwcontext_vulkan.h vulkan.h   \
   vulkan_functions.h\
   vulkan_loader.h
+SKIPHEADERS-$(CONFIG_NVTEGRA)  += nvtegra.h \
+  nvtegra_host1x.h
 
 TESTPROGS = adler32 \
 aes \
diff --git a/libavutil/nvtegra.c b/libavutil/nvtegra.c
new file mode 100644
index 00..ad0bbbdfaa
--- /dev/null
+++ b/libavutil/nvtegra.c
@@ -0,0 +1,1035 @@
+/*
+ * Copyright (c) 2024 averne 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef __SWITCH__
+#   include 
+#   include 
+#   include 
+#   include 
+#else
+#   include 
+#   include 
+#endif
+
+#include 
+
+#include "buffer.h"
+#include "log.h"
+#include "error.h"
+#include "mem.h"
+#include "thread.h"
+
+#include "nvhost_ioctl.h"
+#include "nvmap_ioctl.h"
+#include "nvtegra_host1x.h"
+
+#include "nvtegra.h"
+
+/*
+ * Tag used by the kernel to identify allocations.
+ * Official software has been seen using 0x900, 0xf00, 0x1100, 0x1400, 0x4000.
+ */
+#define MEM_TAG (0xfeed)
+
+struct DriverState {
+int nvmap_fd, nvhost_fd;
+};
+
+static AVMutex g_driver_init_mtx = AV_MUTEX_INITIALIZER;
+static struct DriverState *g_driver_state = NULL;
+static AVBufferRef *g_driver_state_ref = NULL;
+
+static void fre

[FFmpeg-devel] [PATCH 06/16] avutil: add nvtegra hwcontext

2024-05-30 Thread averne
This includes hwdevice and hwframes objects.
As the multimedia engines work with tiled surfaces (block linear in nvidia 
jargon), two frame transfer methods are implemented.
The first makes use of the VIC to perform the copy. Since some revisions of the 
VIC (such as the one found in the tegra X1) did not support 10+ bit formats, 
these go through two separate copy steps for the luma and chroma planes.
The second method copies on the CPU, and is used as a fallback if the VIC 
constraints are not satisfied.

Signed-off-by: averne 
---
 libavutil/Makefile |   7 +-
 libavutil/hwcontext.c  |   4 +
 libavutil/hwcontext.h  |   1 +
 libavutil/hwcontext_internal.h |   1 +
 libavutil/hwcontext_nvtegra.c  | 880 +
 libavutil/hwcontext_nvtegra.h  |  85 
 6 files changed, 976 insertions(+), 2 deletions(-)
 create mode 100644 libavutil/hwcontext_nvtegra.c
 create mode 100644 libavutil/hwcontext_nvtegra.h

diff --git a/libavutil/Makefile b/libavutil/Makefile
index 733a23a8a3..44cd3f0dda 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -52,6 +52,7 @@ HEADERS = adler32.h   
  \
   hwcontext_videotoolbox.h  \
   hwcontext_vdpau.h \
   hwcontext_vulkan.h\
+  hwcontext_nvtegra.h   \
   nvtegra.h \
   nvhost_ioctl.h\
   nvmap_ioctl.h \
@@ -210,7 +211,7 @@ OBJS-$(CONFIG_VDPAU)+= hwcontext_vdpau.o
 OBJS-$(CONFIG_VULKAN)   += hwcontext_vulkan.o vulkan.o
 
 OBJS-$(!CONFIG_VULKAN)  += hwcontext_stub.o
-OBJS-$(CONFIG_NVTEGRA)  += nvtegra.o
+OBJS-$(CONFIG_NVTEGRA)  += nvtegra.o hwcontext_nvtegra.o
 
 OBJS += $(COMPAT_OBJS:%=../compat/%)
 
@@ -233,7 +234,9 @@ SKIPHEADERS-$(CONFIG_VULKAN)   += 
hwcontext_vulkan.h vulkan.h   \
   vulkan_functions.h\
   vulkan_loader.h
 SKIPHEADERS-$(CONFIG_NVTEGRA)  += nvtegra.h \
-  nvtegra_host1x.h
+  nvtegra_host1x.h  \
+  hwcontext_nvtegra.h
+
 
 TESTPROGS = adler32 \
 aes \
diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c
index fa99a0d8a4..8dd05147a4 100644
--- a/libavutil/hwcontext.c
+++ b/libavutil/hwcontext.c
@@ -65,6 +65,9 @@ static const HWContextType * const hw_table[] = {
 #endif
 #if CONFIG_VULKAN
 &ff_hwcontext_type_vulkan,
+#endif
+#if CONFIG_NVTEGRA
+&ff_hwcontext_type_nvtegra,
 #endif
 NULL,
 };
@@ -82,6 +85,7 @@ static const char *const hw_type_names[] = {
 [AV_HWDEVICE_TYPE_VIDEOTOOLBOX] = "videotoolbox",
 [AV_HWDEVICE_TYPE_MEDIACODEC] = "mediacodec",
 [AV_HWDEVICE_TYPE_VULKAN] = "vulkan",
+[AV_HWDEVICE_TYPE_NVTEGRA] = "nvtegra",
 };
 
 typedef struct FFHWDeviceContext {
diff --git a/libavutil/hwcontext.h b/libavutil/hwcontext.h
index bac30debae..d506281784 100644
--- a/libavutil/hwcontext.h
+++ b/libavutil/hwcontext.h
@@ -38,6 +38,7 @@ enum AVHWDeviceType {
 AV_HWDEVICE_TYPE_MEDIACODEC,
 AV_HWDEVICE_TYPE_VULKAN,
 AV_HWDEVICE_TYPE_D3D12VA,
+AV_HWDEVICE_TYPE_NVTEGRA,
 };
 
 /**
diff --git a/libavutil/hwcontext_internal.h b/libavutil/hwcontext_internal.h
index e32b786238..478583abdd 100644
--- a/libavutil/hwcontext_internal.h
+++ b/libavutil/hwcontext_internal.h
@@ -163,5 +163,6 @@ extern const HWContextType ff_hwcontext_type_vdpau;
 extern const HWContextType ff_hwcontext_type_videotoolbox;
 extern const HWContextType ff_hwcontext_type_mediacodec;
 extern const HWContextType ff_hwcontext_type_vulkan;
+extern const HWContextType ff_hwcontext_type_nvtegra;
 
 #endif /* AVUTIL_HWCONTEXT_INTERNAL_H */
diff --git a/libavutil/hwcontext_nvtegra.c b/libavutil/hwcontext_nvtegra.c
new file mode 100644
index 00..0f4d5a323b
--- /dev/null
+++ b/libavutil/hwcontext_nvtegra.c
@@ -0,0 +1,880 @@
+/*
+ * Copyright (c) 2024 averne 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  Se

[FFmpeg-devel] [PATCH 07/16] hwcontext_nvtegra: add dynamic frequency scaling routines

2024-05-30 Thread averne
To save on energy, the clock speed of multimedia engines should be adapted to 
their workload.

Signed-off-by: averne 
---
 libavutil/hwcontext_nvtegra.c | 165 ++
 libavutil/hwcontext_nvtegra.h |   7 ++
 2 files changed, 172 insertions(+)

diff --git a/libavutil/hwcontext_nvtegra.c b/libavutil/hwcontext_nvtegra.c
index 0f4d5a323b..6b72348082 100644
--- a/libavutil/hwcontext_nvtegra.c
+++ b/libavutil/hwcontext_nvtegra.c
@@ -46,6 +46,14 @@ typedef struct NVTegraDevicePriv {
 
 AVNVTegraJobPool job_pool;
 uint32_t vic_setup_off, vic_cmdbuf_off;
+
+double framerate;
+uint32_t dfs_lowcorner;
+double dfs_decode_cycles_ema;
+double dfs_ema_damping;
+int dfs_bitrate_sum;
+int dfs_cur_sample, dfs_num_samples;
+int64_t dfs_sampling_start_ts, dfs_last_ts_delta;
 } NVTegraDevicePriv;
 
 static const enum AVPixelFormat supported_sw_formats[] = {
@@ -108,6 +116,28 @@ static inline uint32_t 
nvtegra_surface_get_height_align(enum AVPixelFormat fmt,
 return 32;
 }
 
+static int nvtegra_channel_set_freq(AVNVTegraChannel *channel, uint32_t freq) {
+int err;
+#ifndef __SWITCH__
+err = av_nvtegra_channel_set_clock_rate(channel, channel->module_id, freq);
+if (err < 0)
+return err;
+
+err = av_nvtegra_channel_get_clock_rate(channel, channel->module_id, 
&channel->clock);
+if (err < 0)
+return err;
+#else
+err = AVERROR(mmuRequestSetAndWait(&channel->mmu_request, freq, -1));
+if (err < 0)
+return err;
+
+err = AVERROR(mmuRequestGet(&channel->mmu_request, &channel->clock));
+if (err < 0)
+return err;
+#endif
+return 0;
+}
+
 static void nvtegra_device_uninit(AVHWDeviceContext *ctx) {
 NVTegraDevicePriv   *priv = ctx->hwctx;
 AVNVTegraDeviceContext *hwctx = &priv->p;
@@ -386,6 +416,141 @@ static int nvtegra_get_buffer(AVHWFramesContext *ctx, 
AVFrame *frame) {
 return 0;
 }
 
+/*
+ * Possible frequencies on Icosa and Mariko+, in MHz
+ * (see tegra210-core-dvfs.c and tegra210b01-core-dvfs.c in l4t kernel 
sources, respectively):
+ * for NVDEC:
+ *   268.8, 384.0, 448.0, 486.4, 550.4, 576.0, 614.4, 652.8, 678.4, 691.2, 
716.8
+ *   460.8, 499.2, 556.8, 633.6, 652.8, 710.4, 748.8, 787.2, 825.6, 844.8, 
883.2, 902.4, 921.6, 940.8, 960.0, 979.2
+ * for NVJPG:
+ *   192.0, 307.2, 345.6, 409.6, 486.4, 524.8, 550.4, 576.0, 588.8, 614.4, 
627.2
+ *   422.4, 441.6, 499.2, 518.4, 537.6, 556.8, 576.0, 595.2, 614.4, 633.6, 
652.8
+ */
+
+int av_nvtegra_dfs_init(AVHWDeviceContext *ctx, AVNVTegraChannel *channel, int 
width, int height,
+double framerate_hz)
+{
+NVTegraDevicePriv *priv = ctx->hwctx;
+
+uint32_t max_freq, lowcorner;
+int num_mbs, err;
+
+priv->dfs_num_samples = 20;
+priv->dfs_ema_damping = 0.1;
+
+/*
+ * Initialize low-corner frequency (reproduces official code)
+ * Framerate might be unavailable (or variable), but this is official logic
+ */
+num_mbs = width / 16 * height / 16;
+if (num_mbs <= 3600)
+lowcorner = 1;  /* 480p */
+else if (num_mbs <= 8160)
+lowcorner = 18000;  /* 720p */
+else if (num_mbs <= 32400)
+lowcorner = 34500;  /* 1080p */
+else
+lowcorner = 57600;  /* 4k */
+
+if (framerate_hz >= 0.1 && isfinite(framerate_hz))
+lowcorner = FFMIN(lowcorner, lowcorner * framerate_hz / 30.0);
+
+priv->framerate = framerate_hz;
+priv->dfs_lowcorner = lowcorner;
+
+av_log(ctx, AV_LOG_DEBUG, "DFS: Initializing lowcorner to %d Hz, using %u 
samples\n",
+   priv->dfs_lowcorner, priv->dfs_num_samples);
+
+/*
+ * Initialize channel to the max possible frequency (the kernel driver 
will clamp to an allowed value)
+ * Note: Official code passes INT_MAX kHz then multiplies by 1000 (to Hz) 
and converts to u32,
+ * resulting in this value.
+ */
+max_freq = (UINT64_C(1)<<32) - 1000 & UINT32_MAX;
+
+err = nvtegra_channel_set_freq(channel, max_freq);
+if (err < 0)
+return err;
+
+priv->dfs_decode_cycles_ema = 0.0;
+priv->dfs_bitrate_sum   = 0;
+priv->dfs_cur_sample= 0;
+priv->dfs_sampling_start_ts = av_gettime_relative();
+priv->dfs_last_ts_delta = 0;
+
+return 0;
+}
+
+int av_nvtegra_dfs_update(AVHWDeviceContext *ctx, AVNVTegraChannel *channel, 
int bitstream_len, int decode_cycles) {
+NVTegraDevicePriv *priv = ctx->hwctx;
+
+double frame_time, avg;
+int64_t now, wl_dt;
+uint32_t clock;
+int err;
+
+/*
+ * Official software implements DFS using a flat average of the decoder 
pool occupancy.
+ * We instead use the decode cycles as reported by NVDEC microcode, and 
the "bitrate"
+ * (bitstream bits fed to the hardware in a given clock time interval, NOT 
video time),
+ * to calculate a suitable frequency, and multiply it by 1.2 for good 
measure:
+ *   Freq = decode_cycles_per_bit * bits_per_secon

[FFmpeg-devel] [PATCH 08/16] nvtegra: add common hardware decoding code

2024-05-30 Thread averne
This includes common decoder initialization and teardown code, decode-job 
management, and constraint checks.

Signed-off-by: averne 
---
 configure   |   1 +
 libavcodec/Makefile |   2 +
 libavcodec/hwconfig.h   |   2 +
 libavcodec/nvtegra_decode.c | 517 
 libavcodec/nvtegra_decode.h |  94 +++
 5 files changed, 616 insertions(+)
 create mode 100644 libavcodec/nvtegra_decode.c
 create mode 100644 libavcodec/nvtegra_decode.h

diff --git a/configure b/configure
index 51f169bfbd..566bb37b8c 100755
--- a/configure
+++ b/configure
@@ -2022,6 +2022,7 @@ HWACCEL_LIBRARY_LIST="
 mmal
 omx
 opencl
+nvtegra
 "
 
 DOCUMENT_LIST="
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 2443d2c6fd..f1e2dc6625 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -993,6 +993,7 @@ OBJS-$(CONFIG_VAAPI)  += vaapi_decode.o
 OBJS-$(CONFIG_VIDEOTOOLBOX)   += videotoolbox.o
 OBJS-$(CONFIG_VDPAU)  += vdpau.o
 OBJS-$(CONFIG_VULKAN) += vulkan.o vulkan_video.o
+OBJS-$(CONFIG_NVTEGRA)+= nvtegra_decode.o
 
 OBJS-$(CONFIG_AV1_D3D11VA_HWACCEL)+= dxva2_av1.o
 OBJS-$(CONFIG_AV1_DXVA2_HWACCEL)  += dxva2_av1.o
@@ -1285,6 +1286,7 @@ SKIPHEADERS-$(CONFIG_VIDEOTOOLBOX) += videotoolbox.h 
vt_internal.h
 SKIPHEADERS-$(CONFIG_VULKAN)   += vulkan.h vulkan_video.h 
vulkan_decode.h
 SKIPHEADERS-$(CONFIG_V4L2_M2M) += v4l2_buffers.h v4l2_context.h 
v4l2_m2m.h
 SKIPHEADERS-$(CONFIG_ZLIB) += zlib_wrapper.h
+SKIPHEADERS-$(CONFIG_NVTEGRA)  += nvtegra_decode.h
 
 TESTPROGS = avcodec \
 avpacket\
diff --git a/libavcodec/hwconfig.h b/libavcodec/hwconfig.h
index ee29ca631d..a3c3402c77 100644
--- a/libavcodec/hwconfig.h
+++ b/libavcodec/hwconfig.h
@@ -79,6 +79,8 @@ void ff_hwaccel_uninit(AVCodecContext *avctx);
 HW_CONFIG_HWACCEL(0, 0, 1, D3D11VA_VLD,  NONE, ff_ ## codec ## 
_d3d11va_hwaccel)
 #define HWACCEL_D3D12VA(codec) \
 HW_CONFIG_HWACCEL(1, 1, 0, D3D12,D3D12VA,  ff_ ## codec ## 
_d3d12va_hwaccel)
+#define HWACCEL_NVTEGRA(codec) \
+HW_CONFIG_HWACCEL(1, 1, 0, NVTEGRA,  NVTEGRA,  ff_ ## codec ## 
_nvtegra_hwaccel)
 
 #define HW_CONFIG_ENCODER(device, frames, ad_hoc, format, device_type_) \
 &(const AVCodecHWConfigInternal) { \
diff --git a/libavcodec/nvtegra_decode.c b/libavcodec/nvtegra_decode.c
new file mode 100644
index 00..1978fcf644
--- /dev/null
+++ b/libavcodec/nvtegra_decode.c
@@ -0,0 +1,517 @@
+/*
+ * Copyright (c) 2024 averne 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include "libavutil/hwcontext.h"
+#include "libavutil/hwcontext_nvtegra.h"
+#include "libavutil/nvtegra_host1x.h"
+#include "libavutil/pixdesc.h"
+#include "libavutil/pixfmt.h"
+#include "libavutil/intreadwrite.h"
+
+#include "avcodec.h"
+#include "codec_desc.h"
+#include "internal.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+static void nvtegra_input_map_free(void *opaque, uint8_t *data) {
+AVNVTegraMap *map = (AVNVTegraMap *)data;
+
+if (!data)
+return;
+
+av_nvtegra_map_destroy(map);
+
+av_freep(&map);
+}
+
+static AVBufferRef *nvtegra_input_map_alloc(void *opaque, size_t size) {
+FFNVTegraDecodeContext *ctx = opaque;
+
+AVBufferRef  *buffer;
+AVNVTegraMap *map;
+int err;
+
+map = av_mallocz(sizeof(*map));
+if (!map)
+return NULL;
+
+err = av_nvtegra_map_create(map, ctx->channel, ctx->input_map_size, 0x100,
+NVMAP_HEAP_IOVMM, NVMAP_HANDLE_WRITE_COMBINE);
+if (err < 0) {
+av_freep(&map); /* do not leak the map struct when creation fails */
+return NULL;
+}
+
+buffer = av_buffer_create((uint8_t *)map, sizeof(*map), nvtegra_input_map_free, ctx, 0);
+if (!buffer)
+goto fail;
+
+ctx->new_input_buffer = true;
+
+return buffer;
+
+fail:
+av_log(ctx, AV_LOG_ERROR, "Failed to create buffer\n");
+av_nvtegra_map_destroy(map);
+av_freep(&map);
+return NULL;
+}
+
+int ff_nvtegra_decode_init(AVCodecContext *avctx, FFNVTegraDecodeContext *ctx) {
+AVHWFramesContext  *frames_ctx;
+AVHWDeviceContext  

[FFmpeg-devel] [PATCH 09/16] nvtegra: add mpeg1/2 hardware decoding

2024-05-30 Thread averne
This is probably the most straightforward codec to implement on NVDEC. Since 
mpeg2 is a superset of mpeg1, both are supported by the same backend.

Signed-off-by: averne 
---
 configure   |   4 +
 libavcodec/Makefile |   2 +
 libavcodec/hwaccels.h   |   2 +
 libavcodec/mpeg12dec.c  |  12 ++
 libavcodec/nvtegra_mpeg12.c | 319 
 5 files changed, 339 insertions(+)
 create mode 100644 libavcodec/nvtegra_mpeg12.c

diff --git a/configure b/configure
index 566bb37b8c..67db4a2ed2 100755
--- a/configure
+++ b/configure
@@ -3221,6 +3221,8 @@ mpeg1_vdpau_hwaccel_deps="vdpau"
 mpeg1_vdpau_hwaccel_select="mpeg1video_decoder"
 mpeg1_videotoolbox_hwaccel_deps="videotoolbox"
 mpeg1_videotoolbox_hwaccel_select="mpeg1video_decoder"
+mpeg1_nvtegra_hwaccel_deps="nvtegra"
+mpeg1_nvtegra_hwaccel_select="mpeg1video_decoder"
 mpeg2_d3d11va_hwaccel_deps="d3d11va"
 mpeg2_d3d11va_hwaccel_select="mpeg2video_decoder"
 mpeg2_d3d11va2_hwaccel_deps="d3d11va"
@@ -3237,6 +3239,8 @@ mpeg2_vdpau_hwaccel_deps="vdpau"
 mpeg2_vdpau_hwaccel_select="mpeg2video_decoder"
 mpeg2_videotoolbox_hwaccel_deps="videotoolbox"
 mpeg2_videotoolbox_hwaccel_select="mpeg2video_decoder"
+mpeg2_nvtegra_hwaccel_deps="nvtegra"
+mpeg2_nvtegra_hwaccel_select="mpeg2video_decoder"
 mpeg4_nvdec_hwaccel_deps="nvdec"
 mpeg4_nvdec_hwaccel_select="mpeg4_decoder"
 mpeg4_vaapi_hwaccel_deps="vaapi"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index f1e2dc6625..e4dfcbce6c 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1026,6 +1026,7 @@ OBJS-$(CONFIG_MJPEG_VAAPI_HWACCEL)+= vaapi_mjpeg.o
 OBJS-$(CONFIG_MPEG1_NVDEC_HWACCEL)+= nvdec_mpeg12.o
 OBJS-$(CONFIG_MPEG1_VDPAU_HWACCEL)+= vdpau_mpeg12.o
 OBJS-$(CONFIG_MPEG1_VIDEOTOOLBOX_HWACCEL) += videotoolbox.o
+OBJS-$(CONFIG_MPEG1_NVTEGRA_HWACCEL)  += nvtegra_mpeg12.o
 OBJS-$(CONFIG_MPEG2_D3D11VA_HWACCEL)  += dxva2_mpeg2.o
 OBJS-$(CONFIG_MPEG2_DXVA2_HWACCEL)+= dxva2_mpeg2.o
 OBJS-$(CONFIG_MPEG2_D3D12VA_HWACCEL)  += dxva2_mpeg2.o d3d12va_mpeg2.o
@@ -1034,6 +1035,7 @@ OBJS-$(CONFIG_MPEG2_QSV_HWACCEL)  += qsvdec.o
 OBJS-$(CONFIG_MPEG2_VAAPI_HWACCEL)+= vaapi_mpeg2.o
 OBJS-$(CONFIG_MPEG2_VDPAU_HWACCEL)+= vdpau_mpeg12.o
 OBJS-$(CONFIG_MPEG2_VIDEOTOOLBOX_HWACCEL) += videotoolbox.o
+OBJS-$(CONFIG_MPEG2_NVTEGRA_HWACCEL)  += nvtegra_mpeg12.o
 OBJS-$(CONFIG_MPEG4_NVDEC_HWACCEL)+= nvdec_mpeg4.o
 OBJS-$(CONFIG_MPEG4_VAAPI_HWACCEL)+= vaapi_mpeg4.o
 OBJS-$(CONFIG_MPEG4_VDPAU_HWACCEL)+= vdpau_mpeg4.o
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index 5171e4c7d7..ad9e9366f2 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -52,6 +52,7 @@ extern const struct FFHWAccel ff_mjpeg_vaapi_hwaccel;
 extern const struct FFHWAccel ff_mpeg1_nvdec_hwaccel;
 extern const struct FFHWAccel ff_mpeg1_vdpau_hwaccel;
 extern const struct FFHWAccel ff_mpeg1_videotoolbox_hwaccel;
+extern const struct FFHWAccel ff_mpeg1_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_d3d11va_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_d3d11va2_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_d3d12va_hwaccel;
@@ -60,6 +61,7 @@ extern const struct FFHWAccel ff_mpeg2_nvdec_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_vaapi_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_vdpau_hwaccel;
 extern const struct FFHWAccel ff_mpeg2_videotoolbox_hwaccel;
+extern const struct FFHWAccel ff_mpeg2_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_mpeg4_nvdec_hwaccel;
 extern const struct FFHWAccel ff_mpeg4_vaapi_hwaccel;
 extern const struct FFHWAccel ff_mpeg4_vdpau_hwaccel;
diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c
index 9fd765f030..7d8ecae542 100644
--- a/libavcodec/mpeg12dec.c
+++ b/libavcodec/mpeg12dec.c
@@ -835,6 +835,9 @@ static const enum AVPixelFormat mpeg1_hwaccel_pixfmt_list_420[] = {
 #endif
 #if CONFIG_MPEG1_VDPAU_HWACCEL
 AV_PIX_FMT_VDPAU,
+#endif
+#if CONFIG_MPEG1_NVTEGRA_HWACCEL
+AV_PIX_FMT_NVTEGRA,
 #endif
 AV_PIX_FMT_YUV420P,
 AV_PIX_FMT_NONE
@@ -862,6 +865,9 @@ static const enum AVPixelFormat mpeg2_hwaccel_pixfmt_list_420[] = {
 #endif
 #if CONFIG_MPEG2_VIDEOTOOLBOX_HWACCEL
 AV_PIX_FMT_VIDEOTOOLBOX,
+#endif
+#if CONFIG_MPEG2_NVTEGRA_HWACCEL
+AV_PIX_FMT_NVTEGRA,
 #endif
 AV_PIX_FMT_YUV420P,
 AV_PIX_FMT_NONE
@@ -2624,6 +2630,9 @@ const FFCodec ff_mpeg1video_decoder = {
 #endif
 #if CONFIG_MPEG1_VIDEOTOOLBOX_HWACCEL
HWACCEL_VIDEOTOOLBOX(mpeg1),
+#endif
+#if CONFIG_MPEG1_NVTEGRA_HWACCEL
+   HWACCEL_NVTEGRA(mpeg1),
 #endif
NULL
},
@@ -2696,6 +2705,9 @@ const FFCodec ff_mpeg2video_decoder = {
 #endif
 #if CONFIG_MPEG2_VIDEOTOOLBOX_HWACCEL
 HWACCEL_VIDEOTOOLBOX(mpeg2),
+#endif
+#if CONFIG_MPEG2_NVTEGRA_HWACCEL
+HWACCEL_NVTEGRA(mpeg2),
 #endif
  

[FFmpeg-devel] [PATCH 10/16] nvtegra: add mpeg4 hardware decoding

2024-05-30 Thread averne
Signed-off-by: averne 
---
 configure  |   2 +
 libavcodec/Makefile|   1 +
 libavcodec/h263dec.c   |   6 +
 libavcodec/hwaccels.h  |   1 +
 libavcodec/mpeg4videodec.c |   3 +
 libavcodec/nvtegra_mpeg4.c | 344 +
 6 files changed, 357 insertions(+)
 create mode 100644 libavcodec/nvtegra_mpeg4.c

diff --git a/configure b/configure
index 67db4a2ed2..0795f44a1e 100755
--- a/configure
+++ b/configure
@@ -3251,6 +3251,8 @@ mpeg4_videotoolbox_hwaccel_deps="videotoolbox"
 mpeg4_videotoolbox_hwaccel_select="mpeg4_decoder"
 prores_videotoolbox_hwaccel_deps="videotoolbox"
 prores_videotoolbox_hwaccel_select="prores_decoder"
+mpeg4_nvtegra_hwaccel_deps="nvtegra"
+mpeg4_nvtegra_hwaccel_select="mpeg4_decoder"
 vc1_d3d11va_hwaccel_deps="d3d11va"
 vc1_d3d11va_hwaccel_select="vc1_decoder"
 vc1_d3d11va2_hwaccel_deps="d3d11va"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index e4dfcbce6c..1ea9984876 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1040,6 +1040,7 @@ OBJS-$(CONFIG_MPEG4_NVDEC_HWACCEL)+= nvdec_mpeg4.o
 OBJS-$(CONFIG_MPEG4_VAAPI_HWACCEL)+= vaapi_mpeg4.o
 OBJS-$(CONFIG_MPEG4_VDPAU_HWACCEL)+= vdpau_mpeg4.o
 OBJS-$(CONFIG_MPEG4_VIDEOTOOLBOX_HWACCEL) += videotoolbox.o
+OBJS-$(CONFIG_MPEG4_NVTEGRA_HWACCEL)  += nvtegra_mpeg4.o
 OBJS-$(CONFIG_VC1_D3D11VA_HWACCEL)+= dxva2_vc1.o
 OBJS-$(CONFIG_VC1_DXVA2_HWACCEL)  += dxva2_vc1.o
 OBJS-$(CONFIG_VC1_D3D12VA_HWACCEL)+= dxva2_vc1.o d3d12va_vc1.o
diff --git a/libavcodec/h263dec.c b/libavcodec/h263dec.c
index 48bd467f30..db25e09ff3 100644
--- a/libavcodec/h263dec.c
+++ b/libavcodec/h263dec.c
@@ -60,6 +60,9 @@ static const enum AVPixelFormat h263_hwaccel_pixfmt_list_420[] = {
 #endif
 #if CONFIG_H263_VIDEOTOOLBOX_HWACCEL || CONFIG_MPEG4_VIDEOTOOLBOX_HWACCEL
 AV_PIX_FMT_VIDEOTOOLBOX,
+#endif
+#if CONFIG_MPEG4_NVTEGRA_HWACCEL
+AV_PIX_FMT_NVTEGRA,
 #endif
 AV_PIX_FMT_YUV420P,
 AV_PIX_FMT_NONE
@@ -690,6 +693,9 @@ static const AVCodecHWConfigInternal *const h263_hw_config_list[] = {
 #if CONFIG_MPEG4_VDPAU_HWACCEL
 HWACCEL_VDPAU(mpeg4),
 #endif
+#if CONFIG_MPEG4_NVTEGRA_HWACCEL
+HWACCEL_NVTEGRA(mpeg4),
+#endif
 #if CONFIG_H263_VIDEOTOOLBOX_HWACCEL
 HWACCEL_VIDEOTOOLBOX(h263),
 #endif
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index ad9e9366f2..da2b4ae10e 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -67,6 +67,7 @@ extern const struct FFHWAccel ff_mpeg4_vaapi_hwaccel;
 extern const struct FFHWAccel ff_mpeg4_vdpau_hwaccel;
 extern const struct FFHWAccel ff_mpeg4_videotoolbox_hwaccel;
 extern const struct FFHWAccel ff_prores_videotoolbox_hwaccel;
+extern const struct FFHWAccel ff_mpeg4_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_vc1_d3d11va_hwaccel;
 extern const struct FFHWAccel ff_vc1_d3d11va2_hwaccel;
 extern const struct FFHWAccel ff_vc1_d3d12va_hwaccel;
diff --git a/libavcodec/mpeg4videodec.c b/libavcodec/mpeg4videodec.c
index df1e22207d..15e2da5e88 100644
--- a/libavcodec/mpeg4videodec.c
+++ b/libavcodec/mpeg4videodec.c
@@ -3882,6 +3882,9 @@ const FFCodec ff_mpeg4_decoder = {
 #endif
 #if CONFIG_MPEG4_VIDEOTOOLBOX_HWACCEL
HWACCEL_VIDEOTOOLBOX(mpeg4),
+#endif
+#if CONFIG_MPEG4_NVTEGRA_HWACCEL
+   HWACCEL_NVTEGRA(mpeg4),
 #endif
NULL
},
diff --git a/libavcodec/nvtegra_mpeg4.c b/libavcodec/nvtegra_mpeg4.c
new file mode 100644
index 00..2325380330
--- /dev/null
+++ b/libavcodec/nvtegra_mpeg4.c
@@ -0,0 +1,344 @@
+/*
+ * Copyright (c) 2024 averne 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include "config_components.h"
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "mpeg4video.h"
+#include "mpeg4videodec.h"
+#include "mpeg4videodefs.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+#include "libavutil/pixdesc.h"
+#include "libavutil/nvtegra_host1x.h"
+
+typedef struct NVTegraMPEG4DecodeContext {
+FFNVTegraDecodeContext core;
+
+AVNVTegraMap common_map;
+uint32_t coloc_off, history_off, scratch_off;
+uint32_t history_size, scratch_size;
+
+AVFrame *prev_fra

[FFmpeg-devel] [PATCH 11/16] nvtegra: add vc1 hardware decoding

2024-05-30 Thread averne
Since L4T does not hook up the vc1 code to a user-facing library, this was 
written solely based on static reverse engineering.

Signed-off-by: averne 
---
 configure|   3 +
 libavcodec/Makefile  |   1 +
 libavcodec/hwaccels.h|   2 +
 libavcodec/nvtegra_vc1.c | 455 +++
 libavcodec/vc1dec.c  |   9 +
 5 files changed, 470 insertions(+)
 create mode 100644 libavcodec/nvtegra_vc1.c

diff --git a/configure b/configure
index 0795f44a1e..952e3aef7d 100755
--- a/configure
+++ b/configure
@@ -3267,6 +3267,8 @@ vc1_vaapi_hwaccel_deps="vaapi"
 vc1_vaapi_hwaccel_select="vc1_decoder"
 vc1_vdpau_hwaccel_deps="vdpau"
 vc1_vdpau_hwaccel_select="vc1_decoder"
+vc1_nvtegra_hwaccel_deps="nvtegra"
+vc1_nvtegra_hwaccel_select="vc1_decoder"
 vp8_nvdec_hwaccel_deps="nvdec"
 vp8_nvdec_hwaccel_select="vp8_decoder"
 vp8_vaapi_hwaccel_deps="vaapi"
@@ -3294,6 +3296,7 @@ wmv3_dxva2_hwaccel_select="vc1_dxva2_hwaccel"
 wmv3_nvdec_hwaccel_select="vc1_nvdec_hwaccel"
 wmv3_vaapi_hwaccel_select="vc1_vaapi_hwaccel"
 wmv3_vdpau_hwaccel_select="vc1_vdpau_hwaccel"
+wmv3_nvtegra_hwaccel_select="vc1_nvtegra_hwaccel"
 
 # hardware-accelerated codecs
 mediafoundation_deps="mftransform_h MFCreateAlignedMemoryBuffer"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 1ea9984876..e102d03e7d 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1048,6 +1048,7 @@ OBJS-$(CONFIG_VC1_NVDEC_HWACCEL)  += nvdec_vc1.o
 OBJS-$(CONFIG_VC1_QSV_HWACCEL)+= qsvdec.o
 OBJS-$(CONFIG_VC1_VAAPI_HWACCEL)  += vaapi_vc1.o
 OBJS-$(CONFIG_VC1_VDPAU_HWACCEL)  += vdpau_vc1.o
+OBJS-$(CONFIG_VC1_NVTEGRA_HWACCEL)+= nvtegra_vc1.o
 OBJS-$(CONFIG_VP8_NVDEC_HWACCEL)  += nvdec_vp8.o
 OBJS-$(CONFIG_VP8_VAAPI_HWACCEL)  += vaapi_vp8.o
 OBJS-$(CONFIG_VP9_D3D11VA_HWACCEL)+= dxva2_vp9.o
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index da2b4ae10e..a69e6a1977 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -75,6 +75,7 @@ extern const struct FFHWAccel ff_vc1_dxva2_hwaccel;
 extern const struct FFHWAccel ff_vc1_nvdec_hwaccel;
 extern const struct FFHWAccel ff_vc1_vaapi_hwaccel;
 extern const struct FFHWAccel ff_vc1_vdpau_hwaccel;
+extern const struct FFHWAccel ff_vc1_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_vp8_nvdec_hwaccel;
 extern const struct FFHWAccel ff_vp8_vaapi_hwaccel;
 extern const struct FFHWAccel ff_vp9_d3d11va_hwaccel;
@@ -92,5 +93,6 @@ extern const struct FFHWAccel ff_wmv3_dxva2_hwaccel;
 extern const struct FFHWAccel ff_wmv3_nvdec_hwaccel;
 extern const struct FFHWAccel ff_wmv3_vaapi_hwaccel;
 extern const struct FFHWAccel ff_wmv3_vdpau_hwaccel;
+extern const struct FFHWAccel ff_wmv3_nvtegra_hwaccel;
 
 #endif /* AVCODEC_HWACCELS_H */
diff --git a/libavcodec/nvtegra_vc1.c b/libavcodec/nvtegra_vc1.c
new file mode 100644
index 00..b5ee85c9d4
--- /dev/null
+++ b/libavcodec/nvtegra_vc1.c
@@ -0,0 +1,455 @@
+/*
+ * Copyright (c) 2024 averne 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include 
+
+#include "config_components.h"
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "vc1.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+#include "libavutil/pixdesc.h"
+#include "libavutil/nvtegra_host1x.h"
+
+typedef struct NVTegraVC1DecodeContext {
+FFNVTegraDecodeContext core;
+
+AVNVTegraMap common_map;
+uint32_t coloc_off, history_off, scratch_off;
+uint32_t history_size, scratch_size;
+
+bool is_first_slice;
+
+AVFrame *prev_frame, *next_frame;
+} NVTegraVC1DecodeContext;
+
+/* Size (width, height) of a macroblock */
+#define MB_SIZE 16
+
+static const uint8_t bitstream_end_sequence[] = {
+0x00, 0x00, 0x01, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x0a, 0x00, 0x00, 0x00, 0x00,
+};
+
+static int nvtegra_vc1_decode_uninit(AVCodecContext *avctx) {
+NVTegraVC1DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+int err;
+
+av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA VC1 decoder\n");
+
+err = av_nvtegra_map_destroy(&ctx->common_map);
+if (err < 0)
+return err;
+
+err = ff_nvtegra_decode_uninit(avctx, &ctx->core);
+if (err < 0)
+retu

[FFmpeg-devel] [PATCH 16/16] nvtegra: add mjpeg hardware decoding

2024-05-30 Thread averne
This uses NVJPG, a hardware engine separate from NVDEC. On the Tegra 210 (and 
possibly later hardware), it cannot decode to tiled surfaces, and it has some 
quirks that have been observed to hang the hardware.

Signed-off-by: averne 
---
 configure  |   2 +
 libavcodec/Makefile|   1 +
 libavcodec/hwaccels.h  |   1 +
 libavcodec/mjpegdec.c  |   6 +
 libavcodec/nvtegra_mjpeg.c | 336 +
 5 files changed, 346 insertions(+)
 create mode 100644 libavcodec/nvtegra_mjpeg.c

diff --git a/configure b/configure
index 3fe948d9ab..1d885ed655 100755
--- a/configure
+++ b/configure
@@ -3219,6 +3219,8 @@ mjpeg_nvdec_hwaccel_deps="nvdec"
 mjpeg_nvdec_hwaccel_select="mjpeg_decoder"
 mjpeg_vaapi_hwaccel_deps="vaapi"
 mjpeg_vaapi_hwaccel_select="mjpeg_decoder"
+mjpeg_nvtegra_hwaccel_deps="nvtegra"
+mjpeg_nvtegra_hwaccel_select="mjpeg_decoder"
 mpeg1_nvdec_hwaccel_deps="nvdec"
 mpeg1_nvdec_hwaccel_select="mpeg1video_decoder"
 mpeg1_vdpau_hwaccel_deps="vdpau"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 914995558e..6a773f8d3e 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1025,6 +1025,7 @@ OBJS-$(CONFIG_HEVC_VULKAN_HWACCEL)+= vulkan_decode.o vulkan_hevc.o
 OBJS-$(CONFIG_HEVC_NVTEGRA_HWACCEL)   += nvtegra_hevc.o
 OBJS-$(CONFIG_MJPEG_NVDEC_HWACCEL)+= nvdec_mjpeg.o
 OBJS-$(CONFIG_MJPEG_VAAPI_HWACCEL)+= vaapi_mjpeg.o
+OBJS-$(CONFIG_MJPEG_NVTEGRA_HWACCEL)  += nvtegra_mjpeg.o
 OBJS-$(CONFIG_MPEG1_NVDEC_HWACCEL)+= nvdec_mpeg12.o
 OBJS-$(CONFIG_MPEG1_VDPAU_HWACCEL)+= vdpau_mpeg12.o
 OBJS-$(CONFIG_MPEG1_VIDEOTOOLBOX_HWACCEL) += videotoolbox.o
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index a3babfc309..f5a121d23f 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -51,6 +51,7 @@ extern const struct FFHWAccel ff_hevc_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_hevc_vulkan_hwaccel;
 extern const struct FFHWAccel ff_mjpeg_nvdec_hwaccel;
 extern const struct FFHWAccel ff_mjpeg_vaapi_hwaccel;
+extern const struct FFHWAccel ff_mjpeg_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_mpeg1_nvdec_hwaccel;
 extern const struct FFHWAccel ff_mpeg1_vdpau_hwaccel;
 extern const struct FFHWAccel ff_mpeg1_videotoolbox_hwaccel;
diff --git a/libavcodec/mjpegdec.c b/libavcodec/mjpegdec.c
index 1481a7f285..f8b00a92d6 100644
--- a/libavcodec/mjpegdec.c
+++ b/libavcodec/mjpegdec.c
@@ -733,6 +733,9 @@ int ff_mjpeg_decode_sof(MJpegDecodeContext *s)
 #endif
 #if CONFIG_MJPEG_VAAPI_HWACCEL
 AV_PIX_FMT_VAAPI,
+#endif
+#if CONFIG_MJPEG_NVTEGRA_HWACCEL
+AV_PIX_FMT_NVTEGRA,
 #endif
 s->avctx->pix_fmt,
 AV_PIX_FMT_NONE,
@@ -3021,6 +3024,9 @@ const FFCodec ff_mjpeg_decoder = {
 #endif
 #if CONFIG_MJPEG_VAAPI_HWACCEL
 HWACCEL_VAAPI(mjpeg),
+#endif
+#if CONFIG_MJPEG_NVTEGRA_HWACCEL
+HWACCEL_NVTEGRA(mjpeg),
 #endif
 NULL
 },
diff --git a/libavcodec/nvtegra_mjpeg.c b/libavcodec/nvtegra_mjpeg.c
new file mode 100644
index 00..9139116159
--- /dev/null
+++ b/libavcodec/nvtegra_mjpeg.c
@@ -0,0 +1,336 @@
+/*
+ * Copyright (c) 2024 averne 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include "config_components.h"
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "mjpegdec.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+#include "libavutil/pixdesc.h"
+#include "libavutil/nvtegra_host1x.h"
+
+typedef struct NVTegraMJPEGDecodeContext {
+FFNVTegraDecodeContext core;
+} NVTegraMJPEGDecodeContext;
+
+static int nvtegra_mjpeg_decode_uninit(AVCodecContext *avctx) {
+NVTegraMJPEGDecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+int err;
+
+av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA MJPEG decoder\n");
+
+err = ff_nvtegra_decode_uninit(avctx, &ctx->core);
+if (err < 0)
+return err;
+
+return 0;
+}
+
+static int nvtegra_mjpeg_decode_init(AVCodecContext *avctx) {
+MJpegDecodeContext  *s = avctx->priv_data;
+NVTegraMJPEGDecodeContext *ctx = avctx->internal->hwaccel_priv_data;

[FFmpeg-devel] [PATCH 15/16] nvtegra: add vp9 hardware decoding

2024-05-30 Thread averne
This hardware block was based on/licensed from the Hantro implementation (as 
evidenced by the identical structures). Relevant V4L2 kernel code was 
referenced when implementing backward entropy updates.

Signed-off-by: averne 
---
 configure|   2 +
 libavcodec/Makefile  |   1 +
 libavcodec/hwaccels.h|   1 +
 libavcodec/nvtegra_vp9.c | 665 +++
 libavcodec/vp9.c |  10 +-
 5 files changed, 678 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/nvtegra_vp9.c

diff --git a/configure b/configure
index a347337dd4..3fe948d9ab 100755
--- a/configure
+++ b/configure
@@ -3295,6 +3295,8 @@ vp9_vdpau_hwaccel_deps="vdpau VdpPictureInfoVP9"
 vp9_vdpau_hwaccel_select="vp9_decoder"
 vp9_videotoolbox_hwaccel_deps="videotoolbox"
 vp9_videotoolbox_hwaccel_select="vp9_decoder"
+vp9_nvtegra_hwaccel_deps="nvtegra"
+vp9_nvtegra_hwaccel_select="vp9_decoder"
 wmv3_d3d11va_hwaccel_select="vc1_d3d11va_hwaccel"
 wmv3_d3d11va2_hwaccel_select="vc1_d3d11va2_hwaccel"
 wmv3_d3d12va_hwaccel_select="vc1_d3d12va_hwaccel"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 89c5986aab..914995558e 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1061,6 +1061,7 @@ OBJS-$(CONFIG_VP9_NVDEC_HWACCEL)  += nvdec_vp9.o
 OBJS-$(CONFIG_VP9_VAAPI_HWACCEL)  += vaapi_vp9.o
 OBJS-$(CONFIG_VP9_VDPAU_HWACCEL)  += vdpau_vp9.o
 OBJS-$(CONFIG_VP9_VIDEOTOOLBOX_HWACCEL)   += videotoolbox_vp9.o
+OBJS-$(CONFIG_VP9_NVTEGRA_HWACCEL)+= nvtegra_vp9.o
 OBJS-$(CONFIG_VP8_QSV_HWACCEL)+= qsvdec.o
 
 # Objects duplicated from other libraries for shared builds
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index 7d43aeccec..a3babfc309 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -89,6 +89,7 @@ extern const struct FFHWAccel ff_vp9_nvdec_hwaccel;
 extern const struct FFHWAccel ff_vp9_vaapi_hwaccel;
 extern const struct FFHWAccel ff_vp9_vdpau_hwaccel;
 extern const struct FFHWAccel ff_vp9_videotoolbox_hwaccel;
+extern const struct FFHWAccel ff_vp9_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_wmv3_d3d11va_hwaccel;
 extern const struct FFHWAccel ff_wmv3_d3d11va2_hwaccel;
 extern const struct FFHWAccel ff_wmv3_d3d12va_hwaccel;
diff --git a/libavcodec/nvtegra_vp9.c b/libavcodec/nvtegra_vp9.c
new file mode 100644
index 00..a0cca1a5a4
--- /dev/null
+++ b/libavcodec/nvtegra_vp9.c
@@ -0,0 +1,665 @@
+/*
+ * Copyright (c) 2024 averne 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include 
+
+#include "config_components.h"
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "vp9data.h"
+#include "vp9dec.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+#include "libavutil/pixdesc.h"
+#include "libavutil/nvtegra_host1x.h"
+
+typedef struct NVTegraVP9DecodeContext {
+FFNVTegraDecodeContext core;
+
+uint32_t prob_tab_off;
+
+AVNVTegraMap common_map;
+uint32_t segment_rw1_off, segment_rw2_off, tile_sizes_off, filter_off,
+ col_mvrw1_off, col_mvrw2_off, ctx_counter_off;
+
+bool prev_show_frame;
+
+AVFrame *refs[3];
+} NVTegraVP9DecodeContext;
+
+/* Size (width, height) of a macroblock */
+#define MB_SIZE 16
+
+/* Maximum size (width, height) of a superblock */
+#define SB_SIZE 64
+
+#define CEILDIV(a, b) (((a) + (b) - 1) / (b))
+
+/* Prediction modes aren't laid out in the same order in ffmpeg's defaults as in hardware */
+static const uint8_t pmconv[] = { 2, 0, 1, 3, 4, 5, 6, 8, 7, 9 };
+
+static int nvtegra_vp9_decode_uninit(AVCodecContext *avctx) {
+NVTegraVP9DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+int err;
+
+av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA VP9 decoder\n");
+
+err = av_nvtegra_map_destroy(&ctx->common_map);
+if (err < 0)
+return err;
+
+err = ff_nvtegra_decode_uninit(avctx, &ctx->core);
+if (err < 0)
+return err;
+
+return 0;
+}
+
+static int nvtegra_vp9_decode_init(AVCodecContext *avctx) {
+NVTegraVP9DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+AVHWDeviceContext  *hw_device_ctx;
+AVNVTegraDeviceContext *device_hwctx;
+uint32_t aligned_width, aligned_height, max_sb_size,
+  

[FFmpeg-devel] [PATCH 12/16] nvtegra: add h264 hardware decoding

2024-05-30 Thread averne
Because of how the hardware operates, DPB references must stay in a fixed slot 
for their entire lifetime.
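
The fixed-slot constraint can be sketched as a small slot table: a reference picture takes the first free slot when it enters the DPB and keeps that index until it is released. This is a minimal standalone illustration, not the code from nvtegra_h264.c; the names `SlotTable`, `slot_table_init`, `slot_acquire` and `slot_release`, the integer picture ids, and the 16-entry table size are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SLOTS 16 /* assumed table size, for illustration only */

/* Hypothetical slot table: each entry holds the id of the picture
 * occupying the slot, or -1 when the slot is free. */
typedef struct SlotTable {
    int64_t pic_id[NUM_SLOTS];
} SlotTable;

static void slot_table_init(SlotTable *t)
{
    for (int i = 0; i < NUM_SLOTS; i++)
        t->pic_id[i] = -1;
}

/* Return the slot already held by pic_id, or claim the first free one.
 * Returns -1 if the table is full. A picture never moves once placed. */
static int slot_acquire(SlotTable *t, int64_t pic_id)
{
    int free_slot = -1;

    assert(pic_id >= 0);
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (t->pic_id[i] == pic_id)
            return i;                 /* already placed: same slot as before */
        if (t->pic_id[i] < 0 && free_slot < 0)
            free_slot = i;            /* remember the first free slot */
    }
    if (free_slot >= 0)
        t->pic_id[free_slot] = pic_id;
    return free_slot;
}

/* Free the slot held by pic_id once the picture leaves the DPB. */
static void slot_release(SlotTable *t, int64_t pic_id)
{
    for (int i = 0; i < NUM_SLOTS; i++)
        if (t->pic_id[i] == pic_id)
            t->pic_id[i] = -1;
}
```

With this layout, releasing one picture never renumbers the others: a freed slot is simply reused by the next new reference.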

Signed-off-by: averne 
---
 configure |   2 +
 libavcodec/Makefile   |   1 +
 libavcodec/h264_slice.c   |   6 +-
 libavcodec/h264dec.c  |   3 +
 libavcodec/hwaccels.h |   1 +
 libavcodec/nvtegra_h264.c | 506 ++
 6 files changed, 518 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/nvtegra_h264.c

diff --git a/configure b/configure
index 952e3aef7d..930cd3c9bd 100755
--- a/configure
+++ b/configure
@@ -3193,6 +3193,8 @@ h264_videotoolbox_hwaccel_deps="videotoolbox"
 h264_videotoolbox_hwaccel_select="h264_decoder"
 h264_vulkan_hwaccel_deps="vulkan"
 h264_vulkan_hwaccel_select="h264_decoder"
+h264_nvtegra_hwaccel_deps="nvtegra"
+h264_nvtegra_hwaccel_select="h264_decoder"
 hevc_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_HEVC"
 hevc_d3d11va_hwaccel_select="hevc_decoder"
 hevc_d3d11va2_hwaccel_deps="d3d11va DXVA_PicParams_HEVC"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index e102d03e7d..2cb0ec21a8 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1013,6 +1013,7 @@ OBJS-$(CONFIG_H264_VAAPI_HWACCEL) += vaapi_h264.o
 OBJS-$(CONFIG_H264_VDPAU_HWACCEL) += vdpau_h264.o
 OBJS-$(CONFIG_H264_VIDEOTOOLBOX_HWACCEL)  += videotoolbox.o
 OBJS-$(CONFIG_H264_VULKAN_HWACCEL)+= vulkan_decode.o vulkan_h264.o
+OBJS-$(CONFIG_H264_NVTEGRA_HWACCEL)   += nvtegra_h264.o
 OBJS-$(CONFIG_HEVC_D3D11VA_HWACCEL)   += dxva2_hevc.o
 OBJS-$(CONFIG_HEVC_DXVA2_HWACCEL) += dxva2_hevc.o
 OBJS-$(CONFIG_HEVC_D3D12VA_HWACCEL)   += dxva2_hevc.o d3d12va_hevc.o
diff --git a/libavcodec/h264_slice.c b/libavcodec/h264_slice.c
index ce2c4caca1..dc4c5545c8 100644
--- a/libavcodec/h264_slice.c
+++ b/libavcodec/h264_slice.c
@@ -784,7 +784,8 @@ static enum AVPixelFormat get_pixel_format(H264Context *h, int force_callback)
  CONFIG_H264_VAAPI_HWACCEL + \
  CONFIG_H264_VIDEOTOOLBOX_HWACCEL + \
  CONFIG_H264_VDPAU_HWACCEL + \
- CONFIG_H264_VULKAN_HWACCEL)
+ CONFIG_H264_VULKAN_HWACCEL + \
+ CONFIG_H264_NVTEGRA_HWACCEL)
 enum AVPixelFormat pix_fmts[HWACCEL_MAX + 2], *fmt = pix_fmts;
 
 switch (h->ps.sps->bit_depth_luma) {
@@ -888,6 +889,9 @@ static enum AVPixelFormat get_pixel_format(H264Context *h, int force_callback)
 #endif
 #if CONFIG_H264_VAAPI_HWACCEL
 *fmt++ = AV_PIX_FMT_VAAPI;
+#endif
+#if CONFIG_H264_NVTEGRA_HWACCEL
+*fmt++ = AV_PIX_FMT_NVTEGRA;
 #endif
 if (h->avctx->color_range == AVCOL_RANGE_JPEG)
 *fmt++ = AV_PIX_FMT_YUVJ420P;
diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c
index fd23e367b4..51f53f07a9 100644
--- a/libavcodec/h264dec.c
+++ b/libavcodec/h264dec.c
@@ -1160,6 +1160,9 @@ const FFCodec ff_h264_decoder = {
 #endif
 #if CONFIG_H264_VULKAN_HWACCEL
HWACCEL_VULKAN(h264),
+#endif
+#if CONFIG_H264_NVTEGRA_HWACCEL
+   HWACCEL_NVTEGRA(h264),
 #endif
NULL
},
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index a69e6a1977..463fd333a1 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -37,6 +37,7 @@ extern const struct FFHWAccel ff_h264_nvdec_hwaccel;
 extern const struct FFHWAccel ff_h264_vaapi_hwaccel;
 extern const struct FFHWAccel ff_h264_vdpau_hwaccel;
 extern const struct FFHWAccel ff_h264_videotoolbox_hwaccel;
+extern const struct FFHWAccel ff_h264_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_h264_vulkan_hwaccel;
 extern const struct FFHWAccel ff_hevc_d3d11va_hwaccel;
 extern const struct FFHWAccel ff_hevc_d3d11va2_hwaccel;
diff --git a/libavcodec/nvtegra_h264.c b/libavcodec/nvtegra_h264.c
new file mode 100644
index 00..63073c44a6
--- /dev/null
+++ b/libavcodec/nvtegra_h264.c
@@ -0,0 +1,506 @@
+/*
+ * Copyright (c) 2024 averne 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include 
+#include 
+
+#include "config_components.h"
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "h264dec.h"
+#include 

[FFmpeg-devel] [PATCH 13/16] nvtegra: add hevc hardware decoding

2024-05-30 Thread averne
As with h264, DPB references must stay in a fixed slot for their entire 
lifetime. In addition, the number of bits to skip must be calculated. This is 
done in the main header parsing routine rather than reimplemented in the 
hwaccel backend.
On the Tegra 210, this is the only hardware codec that can output 10-bit data.
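
The skip-length calculation amounts to sampling the number of bits left before a variable-length part of the header and again after it, then taking the difference. Below is a minimal standalone sketch of that idea. It does not use FFmpeg's GetBitContext; `BitReader`, `br_bits_left`, `br_read`, `parse_and_measure` and the toy header layout are hypothetical names and data invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal MSB-first bit reader, standing in for a real one. */
typedef struct BitReader {
    const uint8_t *buf;
    size_t size_bits; /* total number of bits in buf */
    size_t pos_bits;  /* number of bits consumed so far */
} BitReader;

static size_t br_bits_left(const BitReader *br)
{
    return br->size_bits - br->pos_bits;
}

/* Read n bits (n <= 32), most significant bit first. */
static uint32_t br_read(BitReader *br, unsigned n)
{
    uint32_t v = 0;

    assert(n <= 32 && n <= br_bits_left(br));
    while (n--) {
        size_t p = br->pos_bits++;
        v = (v << 1) | ((br->buf[p >> 3] >> (7 - (p & 7))) & 1);
    }
    return v;
}

/* Parse a made-up header: a 1-bit flag, a 5-bit field that is present
 * only when the flag is set, then a 3-bit field. The return value is
 * the number of bits this variable-length section occupied. */
static size_t parse_and_measure(BitReader *br)
{
    size_t start = br_bits_left(br); /* sample before the section */

    if (br_read(br, 1))
        br_read(br, 5);
    br_read(br, 3);

    return start - br_bits_left(br); /* bits consumed by the section */
}
```

Measuring in the parser means the consumer of the length does not need to know which optional fields were present.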

Signed-off-by: averne 
---
 configure |   2 +
 libavcodec/Makefile   |   1 +
 libavcodec/hevcdec.c  |  17 +-
 libavcodec/hevcdec.h  |   2 +
 libavcodec/hwaccels.h |   1 +
 libavcodec/nvtegra_hevc.c | 633 ++
 6 files changed, 655 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/nvtegra_hevc.c

diff --git a/configure b/configure
index 930cd3c9bd..ba4c5287e3 100755
--- a/configure
+++ b/configure
@@ -3213,6 +3213,8 @@ hevc_videotoolbox_hwaccel_deps="videotoolbox"
 hevc_videotoolbox_hwaccel_select="hevc_decoder"
 hevc_vulkan_hwaccel_deps="vulkan"
 hevc_vulkan_hwaccel_select="hevc_decoder"
+hevc_nvtegra_hwaccel_deps="nvtegra"
+hevc_nvtegra_hwaccel_select="hevc_decoder"
 mjpeg_nvdec_hwaccel_deps="nvdec"
 mjpeg_nvdec_hwaccel_select="mjpeg_decoder"
 mjpeg_vaapi_hwaccel_deps="vaapi"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 2cb0ec21a8..de667b8a4b 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1022,6 +1022,7 @@ OBJS-$(CONFIG_HEVC_QSV_HWACCEL)   += qsvdec.o
 OBJS-$(CONFIG_HEVC_VAAPI_HWACCEL) += vaapi_hevc.o h265_profile_level.o
 OBJS-$(CONFIG_HEVC_VDPAU_HWACCEL) += vdpau_hevc.o h265_profile_level.o
 OBJS-$(CONFIG_HEVC_VULKAN_HWACCEL)+= vulkan_decode.o vulkan_hevc.o
+OBJS-$(CONFIG_HEVC_NVTEGRA_HWACCEL)   += nvtegra_hevc.o
 OBJS-$(CONFIG_MJPEG_NVDEC_HWACCEL)+= nvdec_mjpeg.o
 OBJS-$(CONFIG_MJPEG_VAAPI_HWACCEL)+= vaapi_mjpeg.o
 OBJS-$(CONFIG_MPEG1_NVDEC_HWACCEL)+= nvdec_mpeg12.o
diff --git a/libavcodec/hevcdec.c b/libavcodec/hevcdec.c
index b41dc46053..41bde57920 100644
--- a/libavcodec/hevcdec.c
+++ b/libavcodec/hevcdec.c
@@ -406,7 +406,8 @@ static enum AVPixelFormat get_format(HEVCContext *s, const HEVCSPS *sps)
  CONFIG_HEVC_VAAPI_HWACCEL + \
  CONFIG_HEVC_VIDEOTOOLBOX_HWACCEL + \
  CONFIG_HEVC_VDPAU_HWACCEL + \
- CONFIG_HEVC_VULKAN_HWACCEL)
+ CONFIG_HEVC_VULKAN_HWACCEL + \
+ CONFIG_HEVC_NVTEGRA_HWACCEL)
 enum AVPixelFormat pix_fmts[HWACCEL_MAX + 2], *fmt = pix_fmts;
 
 switch (sps->pix_fmt) {
@@ -436,6 +437,9 @@ static enum AVPixelFormat get_format(HEVCContext *s, const HEVCSPS *sps)
 #endif
 #if CONFIG_HEVC_VULKAN_HWACCEL
 *fmt++ = AV_PIX_FMT_VULKAN;
+#endif
+#if CONFIG_HEVC_NVTEGRA_HWACCEL
+*fmt++ = AV_PIX_FMT_NVTEGRA;
 #endif
 break;
 case AV_PIX_FMT_YUV420P10:
@@ -463,6 +467,9 @@ static enum AVPixelFormat get_format(HEVCContext *s, const HEVCSPS *sps)
 #endif
 #if CONFIG_HEVC_NVDEC_HWACCEL
 *fmt++ = AV_PIX_FMT_CUDA;
+#endif
+#if CONFIG_HEVC_NVTEGRA_HWACCEL
+*fmt++ = AV_PIX_FMT_NVTEGRA;
 #endif
 break;
 case AV_PIX_FMT_YUV444P:
@@ -598,6 +605,7 @@ static int hls_slice_header(HEVCContext *s)
 GetBitContext *gb = &s->HEVClc->gb;
 SliceHeader *sh   = &s->sh;
 int i, ret;
+int nvidia_skip_len_start;
 
 // Coded parameters
 sh->first_slice_in_pic_flag = get_bits1(gb);
@@ -700,6 +708,8 @@ static int hls_slice_header(HEVCContext *s)
 return AVERROR_INVALIDDATA;
 }
 
+nvidia_skip_len_start = get_bits_left(gb);
+
 // when flag is not present, picture is inferred to be output
 sh->pic_output_flag = 1;
 if (s->ps.pps->output_flag_present_flag)
@@ -753,6 +763,7 @@ static int hls_slice_header(HEVCContext *s)
 }
 sh->long_term_ref_pic_set_size = pos - get_bits_left(gb);
 
+sh->nvidia_skip_length = nvidia_skip_len_start - get_bits_left(gb);
 if (s->ps.sps->sps_temporal_mvp_enabled_flag)
 sh->slice_temporal_mvp_enabled_flag = get_bits1(gb);
 else
@@ -765,6 +776,7 @@ static int hls_slice_header(HEVCContext *s)
 sh->short_term_rps  = NULL;
 sh->long_term_ref_pic_set_size  = 0;
 sh->slice_temporal_mvp_enabled_flag = 0;
+sh->nvidia_skip_length  = nvidia_skip_len_start - get_bits_left(gb);
 }
 
 /* 8.3.1 */
@@ -3743,6 +3755,9 @@ const FFCodec ff_hevc_decoder = {
 #endif
 #if CONFIG_HEVC_VULKAN_HWACCEL
HWACCEL_VULKAN(hevc),
+#endif
+#if CONFIG_HEVC_NVTEGRA_HWACCEL
+   HWACCEL_NVTEGRA(hevc),
 #endif
NULL
},
diff --git a/libavcodec/hevcdec.h b/libavcodec/hevcdec.h
index e82daf6679..2df96ed629 100644
--- a/libavcodec/hevcdec.h
+++ b/libavcodec/hevcdec.h
@@ -277,6 +277,8 @@ typedef struct SliceHeader {
 int16_t chroma_offset_l1[16][2];
 
 int slice

[FFmpeg-devel] [PATCH 14/16] nvtegra: add vp8 hardware decoding

2024-05-30 Thread averne
Signed-off-by: averne 
---
 configure|   2 +
 libavcodec/Makefile  |   1 +
 libavcodec/hwaccels.h|   1 +
 libavcodec/nvtegra_vp8.c | 334 +++
 libavcodec/vp8.c |   6 +
 5 files changed, 344 insertions(+)
 create mode 100644 libavcodec/nvtegra_vp8.c

diff --git a/configure b/configure
index ba4c5287e3..a347337dd4 100755
--- a/configure
+++ b/configure
@@ -3277,6 +3277,8 @@ vp8_nvdec_hwaccel_deps="nvdec"
 vp8_nvdec_hwaccel_select="vp8_decoder"
 vp8_vaapi_hwaccel_deps="vaapi"
 vp8_vaapi_hwaccel_select="vp8_decoder"
+vp8_nvtegra_hwaccel_deps="nvtegra"
+vp8_nvtegra_hwaccel_select="vp8_decoder"
 vp9_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_VP9"
 vp9_d3d11va_hwaccel_select="vp9_decoder"
 vp9_d3d11va2_hwaccel_deps="d3d11va DXVA_PicParams_VP9"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index de667b8a4b..89c5986aab 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1053,6 +1053,7 @@ OBJS-$(CONFIG_VC1_VDPAU_HWACCEL)  += vdpau_vc1.o
 OBJS-$(CONFIG_VC1_NVTEGRA_HWACCEL)+= nvtegra_vc1.o
 OBJS-$(CONFIG_VP8_NVDEC_HWACCEL)  += nvdec_vp8.o
 OBJS-$(CONFIG_VP8_VAAPI_HWACCEL)  += vaapi_vp8.o
+OBJS-$(CONFIG_VP8_NVTEGRA_HWACCEL)+= nvtegra_vp8.o
 OBJS-$(CONFIG_VP9_D3D11VA_HWACCEL)+= dxva2_vp9.o
 OBJS-$(CONFIG_VP9_DXVA2_HWACCEL)  += dxva2_vp9.o
 OBJS-$(CONFIG_VP9_D3D12VA_HWACCEL)+= dxva2_vp9.o d3d12va_vp9.o
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index 77892dc2b2..7d43aeccec 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -80,6 +80,7 @@ extern const struct FFHWAccel ff_vc1_vdpau_hwaccel;
 extern const struct FFHWAccel ff_vc1_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_vp8_nvdec_hwaccel;
 extern const struct FFHWAccel ff_vp8_vaapi_hwaccel;
+extern const struct FFHWAccel ff_vp8_nvtegra_hwaccel;
 extern const struct FFHWAccel ff_vp9_d3d11va_hwaccel;
 extern const struct FFHWAccel ff_vp9_d3d11va2_hwaccel;
 extern const struct FFHWAccel ff_vp9_d3d12va_hwaccel;
diff --git a/libavcodec/nvtegra_vp8.c b/libavcodec/nvtegra_vp8.c
new file mode 100644
index 00..a3aa69fe62
--- /dev/null
+++ b/libavcodec/nvtegra_vp8.c
@@ -0,0 +1,334 @@
+/*
+ * Copyright (c) 2024 averne 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include "config_components.h"
+
+#include "avcodec.h"
+#include "hwaccel_internal.h"
+#include "internal.h"
+#include "hwconfig.h"
+#include "vp8.h"
+#include "vp8data.h"
+#include "decode.h"
+#include "nvtegra_decode.h"
+
+#include "libavutil/pixdesc.h"
+#include "libavutil/nvtegra_host1x.h"
+
+typedef struct NVTegraVP8DecodeContext {
+FFNVTegraDecodeContext core;
+
+AVNVTegraMap common_map;
+uint32_t prob_data_off, history_off;
+uint32_t history_size;
+
+AVFrame *golden_frame, *altref_frame,
+*previous_frame;
+} NVTegraVP8DecodeContext;
+
+/* Size (width, height) of a macroblock */
+#define MB_SIZE 16
+
+static int nvtegra_vp8_decode_uninit(AVCodecContext *avctx) {
+NVTegraVP8DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+int err;
+
+av_log(avctx, AV_LOG_DEBUG, "Deinitializing NVTEGRA VP8 decoder\n");
+
+err = av_nvtegra_map_destroy(&ctx->common_map);
+if (err < 0)
+return err;
+
+err = ff_nvtegra_decode_uninit(avctx, &ctx->core);
+if (err < 0)
+return err;
+
+return 0;
+}
+
+static void nvtegra_vp8_init_probs(void *p) {
+int i, j, k;
+uint8_t *ptr = p;
+
+memset(p, 0, 0x4cc);
+
+for (i = 0; i < 4; ++i) {
+for (j = 0; j < 8; ++j) {
+for (k = 0; k < 3; ++k) {
+memcpy(ptr, vp8_token_default_probs[i][j][k], NUM_DCT_TOKENS - 1);
+ptr += NUM_DCT_TOKENS;
+}
+}
+}
+
+memcpy(ptr, vp8_pred16x16_prob_inter, sizeof(vp8_pred16x16_prob_inter));
+ptr += 4;
+
+memcpy(ptr, vp8_pred8x8c_prob_inter, sizeof(vp8_pred8x8c_prob_inter));
+ptr += 4;
+
+for (i = 0; i < 2; ++i) {
+memcpy(ptr, vp8_mv_default_prob[i], 19);
+ptr += 20;
+}
+}
+
+static int nvtegra_vp8_decode_init(AVCodecContext *avctx) {
+NVTegraVP8DecodeContext *ctx = avctx->internal->hwaccel_priv_data;
+
+AVHWDeviceContext  *hw_device_ctx;
+AVNVTe

[FFmpeg-devel] Re: [PATCH v2 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow

2024-05-30 Thread Wu Jianhua
Andreas Rheinhardt:
> From: ffmpeg-devel on behalf of Andreas Rheinhardt
> 
> Sent: May 30, 2024 11:33
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v2 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow
> 
> toq...@outlook.com:
> > From: Wu Jianhua 
> >
> > Some tests fail with certain seeds
> >
> > tests/checkasm/checkasm 2325607578 --test=vvc_alf
> > checkasm: using random seed 2325607578
> 

> And can I get an answer to the question of whether the issue is present
> when used by the actual decoder and not only the checkasm test?
> 
> - Andreas
> 

Sure. This issue hasn't occurred when actually decoding our test streams, only
in the checkasm test, because the filter is generated randomly there.

Thanks,
Jianhua
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH 01/10, v3] avutil: add hwcontext_amf.

2024-05-30 Thread Dmitrii Ovchinnikov
I would appreciate your review.
Just to clarify: The information I provided is coming from AMF and driver
developers.


Re: [FFmpeg-devel] [PATCH 02/16] configure, avutil: add support for HorizonOS

2024-05-30 Thread Rémi Denis-Courmont
On Thursday, 30 May 2024, 22.43.04 EEST averne wrote:
> HorizonOS (HOS) is the operating system of the Nintendo Switch.
> This patch enables integration with the homebrew toolchain developed by the
> devkitPro team. Its two main components are devkitA64 (common toolchain for
> aarch64 targets) and libnx (library implementing interaction with the HOS
> kernel and system daemons, termed sysmodules).
> 
> Signed-off-by: averne 
> ---
>  configure   | 8 
>  libavutil/cpu.c | 7 +++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/configure b/configure
> index 96b181fd21..09fb2aed1b 100755
> --- a/configure
> +++ b/configure
> @@ -5967,6 +5967,10 @@ case $target_os in
>  ;;
>  minix)
>  ;;
> +horizon)
> +enable section_data_rel_ro
> +add_extralibs -lnx
> +;;
>  none)
>  ;;
>  *)
> @@ -7710,6 +7714,10 @@ haiku)
>  disable memalign
>  fi
>  ;;
> +horizon)
> +disable sysctl
> +disable sysctlbyname
> +;;

Are those really broken, or is this just a trick to force a fallback? In the 
latter case, you don't need to disable them; just put the HOS code ahead of 
the generic BSD code.
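
A minimal sketch of that ordering idea, assuming hypothetical `DEMO_*` macros in place of the real configure results (they are not FFmpeg names): when the target-specific branch comes first in the `#if` chain, the generic sysctl branch is never reached on that target, so nothing has to be disabled.

```c
/* Hypothetical stand-ins for configure results; not real FFmpeg macros. */
#define DEMO_TARGET_HOS  1   /* pretend we are building for HorizonOS */
#define DEMO_HAVE_SYSCTL 1   /* pretend the generic sysctl check also passed */

enum demo_backend {
    DEMO_BACKEND_NONE,
    DEMO_BACKEND_SYSCTL,
    DEMO_BACKEND_HOS
};

/* Report which CPU-count backend the preprocessor chain selects. */
static enum demo_backend demo_cpu_count_backend(void)
{
#if DEMO_TARGET_HOS
    /* Target-specific code is listed first, so it wins on this target. */
    return DEMO_BACKEND_HOS;
#elif DEMO_HAVE_SYSCTL
    /* Generic BSD-style path, only reached on other targets. */
    return DEMO_BACKEND_SYSCTL;
#else
    return DEMO_BACKEND_NONE;
#endif
}
```

Here both feature macros are set, yet the function still reports the HOS backend, because only the first matching branch of the chain is compiled.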

>  esac
> 
>  flatten_extralibs(){
> diff --git a/libavutil/cpu.c b/libavutil/cpu.c
> index 9ac2f01c20..6a77df5e34 100644
> --- a/libavutil/cpu.c
> +++ b/libavutil/cpu.c
> @@ -48,6 +48,9 @@
>  #if HAVE_UNISTD_H
>  #include <unistd.h>
>  #endif
> +#ifdef __SWITCH__
> +#include 
> +#endif
> 
>  static atomic_int cpu_flags = -1;
>  static atomic_int cpu_count = -1;
> @@ -247,6 +250,10 @@ int av_cpu_count(void)
>  #elif HAVE_WINRT
>  GetNativeSystemInfo(&sysinfo);
>  nb_cpus = sysinfo.dwNumberOfProcessors;
> +#elif defined(__SWITCH__)
> +u64 core_mask = 0;
> +Result rc = svcGetInfo(&core_mask, InfoType_CoreMask, CUR_PROCESS_HANDLE, 0);
> +nb_cpus = R_SUCCEEDED(rc) ? av_popcount64(core_mask) : 3;
>  #endif
> 
>  if (!atomic_exchange_explicit(&printed, 1, memory_order_relaxed))


-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 01/16] avutil/buffer: add helper to allocate aligned memory

2024-05-30 Thread Rémi Denis-Courmont
On Thursday, 30 May 2024, 22.43.03 EEST averne wrote:
> This is useful e.g. for memory-mapped buffers that need page-aligned memory
> when dealing with hardware devices.
> 
> Signed-off-by: averne 
> ---
>  libavutil/buffer.c | 31 +++
>  libavutil/buffer.h |  7 +++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/libavutil/buffer.c b/libavutil/buffer.c
> index e4562a79b1..b8e357f540 100644
> --- a/libavutil/buffer.c
> +++ b/libavutil/buffer.c
> @@ -16,9 +16,14 @@
>   * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>   */
> 
> +#include "config.h"
> +
>  #include 
>  #include 
>  #include 
> +#if HAVE_MALLOC_H
> +#include <malloc.h>
> +#endif
> 
>  #include "avassert.h"
>  #include "buffer_internal.h"
> @@ -100,6 +105,32 @@ AVBufferRef *av_buffer_allocz(size_t size)
>  return ret;
>  }
> 
> +AVBufferRef *av_buffer_aligned_alloc(size_t size, size_t align)
> +{
> +AVBufferRef *ret = NULL;
> +uint8_t*data = NULL;
> +
> +#if HAVE_POSIX_MEMALIGN
> +if (posix_memalign((void **)&data, align, size))

Invalid cast.
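
One way to avoid the cast, sketched here outside FFmpeg (the helper name `demo_alloc_aligned` is made up for illustration): let `posix_memalign()` write into a real `void *` and convert the result afterwards, instead of passing the address of a `uint8_t *` through a `(void **)` cast.

```c
#define _POSIX_C_SOURCE 200112L  /* ask for the posix_memalign() declaration */

#include <stdint.h>
#include <stdlib.h>

/* Illustrative helper, not an FFmpeg function.
 * align must be a power of two and a multiple of sizeof(void *). */
static uint8_t *demo_alloc_aligned(size_t size, size_t align)
{
    void *ptr = NULL;

    /* posix_memalign() stores into a genuine void *, so no cast is needed. */
    if (posix_memalign(&ptr, align, size))
        return NULL;

    /* void * converts implicitly to uint8_t * in C. */
    return ptr;
}
```

The returned pointer is released with `free()`, as for any other `posix_memalign()` allocation.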

> +return NULL;
> +#elif HAVE_ALIGNED_MALLOC
> +data = aligned_alloc(align, size);
> +#elif HAVE_MEMALIGN
> +data = memalign(align, size);
> +#else
> +return NULL;
> +#endif
> +
> +if (!data)
> +return NULL;
> +
> +ret = av_buffer_create(data, size, av_buffer_default_free, NULL, 0);
> +if (!ret)
> +av_freep(&data);
> +
> +return ret;
> +}
> +
>  AVBufferRef *av_buffer_ref(const AVBufferRef *buf)
>  {
>  AVBufferRef *ret = av_mallocz(sizeof(*ret));
> diff --git a/libavutil/buffer.h b/libavutil/buffer.h
> index e1ef5b7f07..8422ec3453 100644
> --- a/libavutil/buffer.h
> +++ b/libavutil/buffer.h
> @@ -107,6 +107,13 @@ AVBufferRef *av_buffer_alloc(size_t size);
>   */
>  AVBufferRef *av_buffer_allocz(size_t size);
> 
> +/**
> + * Allocate an AVBuffer of the given size and alignment.
> + *
> + * @return an AVBufferRef of given size or NULL when out of memory
> + */
> +AVBufferRef *av_buffer_aligned_alloc(size_t size, size_t align);
> +
>  /**
>   * Always treat the buffer as read-only, even when it has only one
>   * reference.


-- 
レミ・デニ-クールモン
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 03/16] avutil: add ioctl definitions for tegra devices

2024-05-30 Thread Rémi Denis-Courmont
On Thursday, 30 May 2024, 22.43.05 EEST averne wrote:
> These files are taken with minimal modifications from nvidia's Linux4Tegra
> (L4T) tree. nvmap enables management of memory-mapped buffers for hardware
> devices. nvhost enables interaction with different hardware modules
> (multimedia engines, display engine, ...), through a common block, host1x.
> 
> Signed-off-by: averne 
> ---
>  libavutil/Makefile   |   2 +
>  libavutil/nvhost_ioctl.h | 511 +++
>  libavutil/nvmap_ioctl.h  | 451 ++
>  3 files changed, 964 insertions(+)
>  create mode 100644 libavutil/nvhost_ioctl.h
>  create mode 100644 libavutil/nvmap_ioctl.h
> 
> diff --git a/libavutil/Makefile b/libavutil/Makefile
> index 6e6fa8d800..9c112bc58a 100644
> --- a/libavutil/Makefile
> +++ b/libavutil/Makefile
> @@ -52,6 +52,8 @@ HEADERS = adler32.h                 \
>            hwcontext_videotoolbox.h  \
>            hwcontext_vdpau.h         \
>            hwcontext_vulkan.h        \
> +          nvhost_ioctl.h            \
> +          nvmap_ioctl.h             \
>            iamf.h                    \
>            imgutils.h                \
>            intfloat.h                \
> diff --git a/libavutil/nvhost_ioctl.h b/libavutil/nvhost_ioctl.h
> new file mode 100644
> index 00..b0bf3e3ae6
> --- /dev/null
> +++ b/libavutil/nvhost_ioctl.h
> @@ -0,0 +1,511 @@
> +/*
> + * include/uapi/linux/nvhost_ioctl.h

Well, then that should be provided by linux-libc-dev or equivalent. I don't 
think that this should be vendored into FFmpeg.

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH v2 1/3] avcodec/x86/vvc/vvc_alf: fix integer overflow

2024-05-30 Thread Ronald S. Bultje
Hi Andreas,

On Thu, May 30, 2024 at 2:33 PM Andreas Rheinhardt <
andreas.rheinha...@outlook.com> wrote:

> toq...@outlook.com:
> > From: Wu Jianhua 
> >
> > Some tests fail with certain seeds
> >
> > tests/checkasm/checkasm 2325607578 --test=vvc_alf
> > checkasm: using random seed 2325607578
> > AVX2:
> > vvc_alf_filter_luma_120x20_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x24_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x28_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x32_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x36_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x40_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x44_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x48_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x52_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x56_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x60_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x64_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x68_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x72_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x76_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x80_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x84_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x88_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x92_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x96_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x100_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x104_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x108_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x112_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x116_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x120_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x124_12_avx2 (vvc_alf.c:104)
> > vvc_alf_filter_luma_120x128_12_avx2 (vvc_alf.c:104)
> >   - vvc_alf.alf_filter   [FAILED]
> >   - vvc_alf.alf_classify [OK]
> > checkasm: 28 of 9216 tests have failed
> >
> > Reported-by: James Almer 
> > Signed-off-by: Wu Jianhua 
> > ---
> >  libavcodec/x86/vvc/vvc_alf.asm | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/libavcodec/x86/vvc/vvc_alf.asm b/libavcodec/x86/vvc/vvc_alf.asm
> > index 71e821c27b..f7b3e2a6cc 100644
> > --- a/libavcodec/x86/vvc/vvc_alf.asm
> > +++ b/libavcodec/x86/vvc/vvc_alf.asm
> > @@ -356,7 +356,8 @@ SECTION .text
> >
> >  FILTER_VB xq
> >
> > -paddw m0, m2
> > +; sum += curr
> > +paddsw m0, m2
> >
> >  ; clip to pixel
> >  CLIPW m0, m14, m15
>
> And can I get an answer to the question of whether the issue is present
> when used by the actual decoder and not only the checkasm test?
>

From my reading of the source code, this could happen in a crafted (e.g.
fuzzed) stream.
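
To make the difference between the two instructions concrete, here is a scalar C sketch of what `paddw` (wrapping add) and `paddsw` (signed saturating add) do to a single signed 16-bit lane. The function names are illustrative only; the real instructions operate on whole vectors, and this models just one element.

```c
#include <stdint.h>

/* Models one lane of paddw: the sum wraps around modulo 2^16. */
static int16_t demo_add_wrap16(int16_t a, int16_t b)
{
    uint16_t sum = (uint16_t)((uint16_t)a + (uint16_t)b);

    /* Map the unsigned result back to the signed range explicitly. */
    return (int16_t)(sum >= 0x8000 ? (int32_t)sum - 0x10000 : (int32_t)sum);
}

/* Models one lane of paddsw: the sum is clamped to [INT16_MIN, INT16_MAX]. */
static int16_t demo_add_sat16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;

    if (sum > INT16_MAX)
        return INT16_MAX;
    if (sum < INT16_MIN)
        return INT16_MIN;
    return (int16_t)sum;
}
```

With inputs 30000 and 10000, the wrapping version yields -25536 while the saturating version yields 32767, a value the subsequent clip step can still handle sensibly.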

Ronald


[FFmpeg-devel] [PATCH] libavcodec/libxvid: code cleanup (replace magic numbers)

2024-05-30 Thread Ramiro Polla
---
 libavcodec/libxvid.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/libxvid.c b/libavcodec/libxvid.c
index b9ac39429d..a490f16b3f 100644
--- a/libavcodec/libxvid.c
+++ b/libavcodec/libxvid.c
@@ -422,13 +422,13 @@ static av_cold int xvid_encode_init(AVCodecContext *avctx)
 
 /* Decide how we should decide blocks */
 switch (avctx->mb_decision) {
-case 2:
+case FF_MB_DECISION_RD:
 x->vop_flags |=  XVID_VOP_MODEDECISION_RD;
 x->me_flags  |=  XVID_ME_HALFPELREFINE8_RD|
  XVID_ME_QUARTERPELREFINE8_RD |
  XVID_ME_EXTSEARCH_RD |
  XVID_ME_CHECKPREDICTION_RD;
-case 1:
+case FF_MB_DECISION_BITS:
 if (!(x->vop_flags & XVID_VOP_MODEDECISION_RD))
 x->vop_flags |= XVID_VOP_FAST_MODEDECISION_RD;
 x->me_flags |= XVID_ME_HALFPELREFINE16_RD |
-- 
2.30.2



[FFmpeg-devel] [PATCH 1/2] libavcodec: various: remove empty directories originally for legacy DSP code

2024-05-30 Thread Sean McGovern
---
 libavcodec/bfin/README  | 6 --
 libavcodec/sh4/README   | 6 --
 libavcodec/sparc/README | 6 --
 3 files changed, 18 deletions(-)
 delete mode 100644 libavcodec/bfin/README
 delete mode 100644 libavcodec/sh4/README
 delete mode 100644 libavcodec/sparc/README

diff --git a/libavcodec/bfin/README b/libavcodec/bfin/README
deleted file mode 100644
index afb3461b72..00
--- a/libavcodec/bfin/README
+++ /dev/null
@@ -1,6 +0,0 @@
-BFIN optimizations have been removed in
-commit 880e2aa23645ed9871c66ee1cbd00f93c72d2d73
-The last revission with the optimizations is fa4e17c14035ebf43130fb369e1728cdd98d0b72
-
-If you want to maintain these (or other) BFIN optimizations in ffmpeg, then please
-contact ffmpeg-devel@ffmpeg.org
diff --git a/libavcodec/sh4/README b/libavcodec/sh4/README
deleted file mode 100644
index 8dd61fe875..00
--- a/libavcodec/sh4/README
+++ /dev/null
@@ -1,6 +0,0 @@
-SH4 optimizations have been removed in
-commit d6096a67422534918405abb46dafbbac4608cbc3
-The last revission with the optimizations is cbfc9046e1c7e295b74f252902ae6f255eef4e78
-
-If you want to maintain these (or other) SH4 optimizations in ffmpeg, then please
-contact ffmpeg-devel@ffmpeg.org
diff --git a/libavcodec/sparc/README b/libavcodec/sparc/README
deleted file mode 100644
index f9f2349cd4..00
--- a/libavcodec/sparc/README
+++ /dev/null
@@ -1,6 +0,0 @@
-SPARC optimizations have been removed in
-commit b4dd424d96f09f9bafb88e47f37df65dc4529143
-The last revission with the optimizations is fb1b70c1ed50951c5fc1a309c3c446b2eaaf564b
-
-If you want to maintain these (or other) SPARC optimizations in ffmpeg, then please
-contact ffmpeg-devel@ffmpeg.org
-- 
2.39.2



[FFmpeg-devel] [PATCH 2/2] [RFC] libavcodec: remove DSP acceleration code for DEC Alpha

2024-05-30 Thread Sean McGovern
---
 Changelog|   1 +
 libavcodec/alpha/Makefile|  10 -
 libavcodec/alpha/asm.h   | 153 --
 libavcodec/alpha/blockdsp_alpha.c|  49 -
 libavcodec/alpha/hpeldsp_alpha.c | 213 ---
 libavcodec/alpha/hpeldsp_alpha.h |  28 ---
 libavcodec/alpha/hpeldsp_alpha_asm.S | 125 ---
 libavcodec/alpha/idctdsp_alpha.c | 127 ---
 libavcodec/alpha/idctdsp_alpha.h |  34 ---
 libavcodec/alpha/idctdsp_alpha_asm.S | 167 ---
 libavcodec/alpha/me_cmp_alpha.c  | 279 
 libavcodec/alpha/me_cmp_mvi_asm.S| 179 
 libavcodec/alpha/mpegvideo_alpha.c   | 110 --
 libavcodec/alpha/pixblockdsp_alpha.c |  79 ---
 libavcodec/alpha/regdef.h|  77 ---
 libavcodec/alpha/simple_idct_alpha.c | 303 ---
 16 files changed, 1 insertion(+), 1933 deletions(-)
 delete mode 100644 libavcodec/alpha/Makefile
 delete mode 100644 libavcodec/alpha/asm.h
 delete mode 100644 libavcodec/alpha/blockdsp_alpha.c
 delete mode 100644 libavcodec/alpha/hpeldsp_alpha.c
 delete mode 100644 libavcodec/alpha/hpeldsp_alpha.h
 delete mode 100644 libavcodec/alpha/hpeldsp_alpha_asm.S
 delete mode 100644 libavcodec/alpha/idctdsp_alpha.c
 delete mode 100644 libavcodec/alpha/idctdsp_alpha.h
 delete mode 100644 libavcodec/alpha/idctdsp_alpha_asm.S
 delete mode 100644 libavcodec/alpha/me_cmp_alpha.c
 delete mode 100644 libavcodec/alpha/me_cmp_mvi_asm.S
 delete mode 100644 libavcodec/alpha/mpegvideo_alpha.c
 delete mode 100644 libavcodec/alpha/pixblockdsp_alpha.c
 delete mode 100644 libavcodec/alpha/regdef.h
 delete mode 100644 libavcodec/alpha/simple_idct_alpha.c

diff --git a/Changelog b/Changelog
index 12770e4296..a1a40399f8 100644
--- a/Changelog
+++ b/Changelog
@@ -11,6 +11,7 @@ version :
 - vf_scale2ref deprecated
 - qsv_params option added for QSV encoders
 - VVC decoder compatible with DVB test content
+- removed libavcodec DSP code for the DEC Alpha
 
 
 version 7.0:
diff --git a/libavcodec/alpha/Makefile b/libavcodec/alpha/Makefile
deleted file mode 100644
index 796d9762b3..00
--- a/libavcodec/alpha/Makefile
+++ /dev/null
@@ -1,10 +0,0 @@
-OBJS-$(CONFIG_BLOCKDSP) += alpha/blockdsp_alpha.o
-OBJS-$(CONFIG_ME_CMP)   += alpha/me_cmp_alpha.o \
-   alpha/me_cmp_mvi_asm.o
-OBJS-$(CONFIG_HPELDSP)  += alpha/hpeldsp_alpha.o\
-   alpha/hpeldsp_alpha_asm.o
-OBJS-$(CONFIG_IDCTDSP)  += alpha/idctdsp_alpha.o\
-   alpha/idctdsp_alpha_asm.o\
-   alpha/simple_idct_alpha.o
-OBJS-$(CONFIG_MPEGVIDEO)+= alpha/mpegvideo_alpha.o
-OBJS-$(CONFIG_PIXBLOCKDSP)  += alpha/pixblockdsp_alpha.o
diff --git a/libavcodec/alpha/asm.h b/libavcodec/alpha/asm.h
deleted file mode 100644
index 6d850cecc6..00
--- a/libavcodec/alpha/asm.h
+++ /dev/null
@@ -1,153 +0,0 @@
-/*
- * Alpha optimized DSP utils
- * Copyright (c) 2002 Falk Hueffner 
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#ifndef AVCODEC_ALPHA_ASM_H
-#define AVCODEC_ALPHA_ASM_H
-
-#include 
-
-#include "libavutil/common.h"
-
-#if AV_GCC_VERSION_AT_LEAST(2,96)
-# define likely(x)  __builtin_expect((x) != 0, 1)
-# define unlikely(x)__builtin_expect((x) != 0, 0)
-#else
-# define likely(x)  (x)
-# define unlikely(x)(x)
-#endif
-
-#define AMASK_BWX (1 << 0)
-#define AMASK_FIX (1 << 1)
-#define AMASK_CIX (1 << 2)
-#define AMASK_MVI (1 << 8)
-
-static inline uint64_t BYTE_VEC(uint64_t x)
-{
-x |= x <<  8;
-x |= x << 16;
-x |= x << 32;
-return x;
-}
-static inline uint64_t WORD_VEC(uint64_t x)
-{
-x |= x << 16;
-x |= x << 32;
-return x;
-}
-
-#define sextw(x) ((int16_t) (x))
-
-#ifdef __GNUC__
-#define ldq(p)  \
-(((const union {\
-uint64_t __l;   \
-__typeof__(*(p)) __s[sizeof (uint64_t) / sizeof *(p)];  \
-} *) (p))->__l)
-#d

Re: [FFmpeg-devel] [PATCH] libavcodec/libxvid: code cleanup (replace magic numbers)

2024-05-30 Thread Sean McGovern
On Thu, May 30, 2024 at 5:20 PM Ramiro Polla  wrote:
>
> ---
>  libavcodec/libxvid.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/libxvid.c b/libavcodec/libxvid.c
> index b9ac39429d..a490f16b3f 100644
> --- a/libavcodec/libxvid.c
> +++ b/libavcodec/libxvid.c
> @@ -422,13 +422,13 @@ static av_cold int xvid_encode_init(AVCodecContext *avctx)
>
>  /* Decide how we should decide blocks */
>  switch (avctx->mb_decision) {
> -case 2:
> +case FF_MB_DECISION_RD:
>  x->vop_flags |=  XVID_VOP_MODEDECISION_RD;
>  x->me_flags  |=  XVID_ME_HALFPELREFINE8_RD|
>   XVID_ME_QUARTERPELREFINE8_RD |
>   XVID_ME_EXTSEARCH_RD |
>   XVID_ME_CHECKPREDICTION_RD;
> -case 1:
> +case FF_MB_DECISION_BITS:
>  if (!(x->vop_flags & XVID_VOP_MODEDECISION_RD))
>  x->vop_flags |= XVID_VOP_FAST_MODEDECISION_RD;
>  x->me_flags |= XVID_ME_HALFPELREFINE16_RD |
> --
> 2.30.2
>

This gets a +1 from me.
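
For readers unfamiliar with the pattern in the quoted hunk, the following is a small stand-alone sketch of a `switch` that uses named constants and deliberately falls through from the rate-distortion case into the bits case. All `DEMO_*` names and flag values are invented for illustration; only the numeric values 1 and 2 mirror the magic numbers the patch replaces.

```c
/* Hypothetical constants; 1 and 2 mirror the values replaced in the patch. */
enum {
    DEMO_MB_DECISION_SIMPLE = 0,
    DEMO_MB_DECISION_BITS   = 1,
    DEMO_MB_DECISION_RD     = 2
};

/* Invented flag bits, standing in for the encoder's real option flags. */
#define DEMO_FLAG_RD       (1u << 0)
#define DEMO_FLAG_FAST_RD  (1u << 1)
#define DEMO_FLAG_REFINE16 (1u << 2)

static unsigned demo_decision_flags(int mb_decision)
{
    unsigned flags = 0;

    switch (mb_decision) {
    case DEMO_MB_DECISION_RD:
        flags |= DEMO_FLAG_RD;
        /* fall through: the RD mode also wants everything the bits mode sets */
    case DEMO_MB_DECISION_BITS:
        /* Only pick the fast variant when full RD was not requested above. */
        if (!(flags & DEMO_FLAG_RD))
            flags |= DEMO_FLAG_FAST_RD;
        flags |= DEMO_FLAG_REFINE16;
        break;
    default:
        break;
    }
    return flags;
}
```

Because the case labels are names rather than bare numbers, the intent of the fall-through is readable without looking up what 1 and 2 mean.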

-- Sean McGovern


Re: [FFmpeg-devel] [PATCH] area changed: scdet filter

2024-05-30 Thread Michael Niedermayer
On Mon, May 13, 2024 at 06:52:19PM +0300, radu.taraib...@gmail.com wrote:
> Previous observations:
> 
>  - Inconsistent code style with other filters. (Mostly using AVFilterLink*
> link instead of AVFilterLink *link).
> I hope it's fine now. 
> 
>  - Unrelated changes, please split trivial unrelated changes into separate
> patches.
> Removed trivial changes from this patch.
> 
>  - Can't tables be generated at .init/.config_props time? No point in
> storing them into binary.
> Done.
> 
>  - Adding extra delay is not backward compatible change, it should be
> implemented properly by adding option for users to select mode: next & prev
> frame or just next or prev frame.
> Added legacy option to the mode parameter.
> 
>  - Could split frame clone change into earlier separate patch.
> Cannot be done. It's either frame clone or 1 frame delay.
> 
>  - Where are results of improvements with accuracy so it can be confirmed?
> Here are my test results with manual labeling of scene changes:
> 2379  Full length movie
> 
> Method  Threshold  TP    FP    FN   Precision    Recall       F
> Cubic   7          2357  423   22   0.847841727  0.990752417  0.913742973
> Cubic   10         2297  200   82   0.919903885  0.965531736  0.94216571
> Cubic   12         2217  146   162  0.938214135  0.931904161  0.935048503
> Cubic   15         2049  101   330  0.953023256  0.861286255  0.904835505
> Linear  2.8        2357  1060  22   0.689786362  0.990752417  0.813319531
> Linear  8          2099  236   280  0.898929336  0.882303489  0.890538821
> Linear  10         1886  173   493  0.91597863   0.792770071  0.849932402
> Legacy  5          2235  1260  144  0.639484979  0.939470366  0.760980592
> Legacy  8          1998  414   381  0.828358209  0.839848676  0.83406387
> Legacy  10         1743  193   636  0.900309917  0.732660782  0.80787949
> 
> 15  HDR10Plus_PB_EAC3JOC
> https://mega.nz/file/nehDka6Z#C5_OPbSZkONdOp1jRmc09C9-viDc3zMj8ZHruHcWKyA
> 
> Method  Threshold  TP  FP  FN  Precision    Recall   F
> Cubic   10         15  0   0   1            1        1
> Linear  5          13  1   2   0.928571429  0.86667  0.896551724
> Legacy  5          12  2   3   0.857142857  0.8      0.827586207
> 
> 21  (HDR HEVC 10-bit BT.2020 24fps) Exodus Sample
> https://mega.nz/file/Sfw1hDpK#ErxCOpQDVjcI1gq6ZbX3vIfdtXZompkFe0jq47EhR2o
> 
> Method  Threshold  TP  FP  FN  Precision  Recall       F
> Cubic   10         21  0   0   1          1            1
> Linear  4          20  0   1   1          0.952380952  0.975609756
> Legacy  4          19  0   2   1          0.904761905  0.95
> 
> 94  Bieber Grammys
> https://mega.nz/#!c9dhAaKA!MG5Yi-MJNATE2_KqcnNJZCRKtTWvdjJP1NwG8Ggdw3E
> 
> Method  Threshold  TP  FP  FN  Precision    Recall       F
> Cubic   15         91  23  3   0.798245614  0.968085106  0.875
> Cubic   18         85  9   9   0.904255319  0.904255319  0.904255319
> Linear  7          79  49  15  0.6171875    0.840425532  0.711711712
> Linear  8          74  28  20  0.725490196  0.787234043  0.755102041
> Legacy  7          74  40  20  0.649122807  0.787234043  0.711538462
> Legacy  8          71  26  23  0.731958763  0.755319149  0.743455497
> 
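
The Precision, Recall and F columns in the results above are consistent with the usual definitions computed from the TP, FP and FN counts. As a hedged sketch (the struct and function names are invented here, and this is not code from the patch), they can be reproduced like this:

```c
/* Illustrative container for the three derived scores. */
typedef struct DemoDetectionScore {
    double precision;  /* TP / (TP + FP) */
    double recall;     /* TP / (TP + FN) */
    double f;          /* harmonic mean of precision and recall */
} DemoDetectionScore;

/* Derive precision, recall and F from raw detection counts.
 * A score is reported as 0 when its denominator would be 0. */
static DemoDetectionScore demo_score_detection(int tp, int fp, int fn)
{
    DemoDetectionScore s = { 0.0, 0.0, 0.0 };

    if (tp + fp > 0)
        s.precision = (double)tp / (tp + fp);
    if (tp + fn > 0)
        s.recall = (double)tp / (tp + fn);
    if (s.precision + s.recall > 0.0)
        s.f = 2.0 * s.precision * s.recall / (s.precision + s.recall);

    return s;
}
```

For the Cubic row with threshold 10 on the full length movie (TP 2297, FP 200, FN 82) this gives a precision of about 0.9199, a recall of about 0.9655 and an F value of about 0.9422, matching the reported figures.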
> 
> Improve scene detection accuracy by comparing frame with both previous and
> next frame (creates one frame delay).
> Add new mode parameter and new method to compute the frame difference using
> cubic square to increase the weight of small changes and new mean formula.
> This improves accuracy significantly. Slightly improve performance by not
> using frame clone.
> Add legacy mode for backward compatibility.
> 
> Signed-off-by: raduct 
> ---
>  doc/filters.texi|  16 
>  libavfilter/scene_sad.c | 151 ++
>  libavfilter/scene_sad.h |   6 ++
>  libavfilter/vf_scdet.c  | 156 +---
>  tests/fate/filter-video.mak |   3 +
>  5 files changed, 284 insertions(+), 48 deletions(-)
> 
> diff --git a/doc/filters.texi b/doc/filters.texi
> index bfa8ccec8b..53814e003b 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -21797,6 +21797,22 @@ Default value is @code{10.}.
>  @item sc_pass, s
>  Set the flag to pass scene chan
