Re: [FFmpeg-devel] [PATCH] aarch64: Add OpenBSD runtime detection of dotprod and i8mm using sysctl
On Sat, 22 Jun 2024, Brad Smith wrote:

> [PATCH] aarch64: Add OpenBSD runtime detection of dotprod and i8mm using sysctl
>
> Signed-off-by: Brad Smith
> ---
>  libavutil/aarch64/cpu.c | 35 +++
>  1 file changed, 35 insertions(+)
>
> diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
> index 196bdaf6b0..40fcc8d1ff 100644
> --- a/libavutil/aarch64/cpu.c
> +++ b/libavutil/aarch64/cpu.c
> @@ -65,6 +65,41 @@ static int detect_flags(void)
>      return flags;
>  }
>
> +#elif defined(__OpenBSD__)
> +#include
> +#include
> +#include
> +#include
> +
> +static int detect_flags(void)
> +{
> +    int flags = 0;
> +    int mib[2];
> +    uint64_t isar0;
> +    uint64_t isar1;
> +    size_t len;
> +
> +    mib[0] = CTL_MACHDEP;
> +    mib[1] = CPU_ID_AA64ISAR0;
> +    len = sizeof(isar0);
> +    if (sysctl(mib, 2, &isar0, &len, NULL, 0) != -1) {
> +        if (ID_AA64ISAR0_DP(isar0) >= ID_AA64ISAR0_DP_IMPL)
> +            flags |= AV_CPU_FLAG_DOTPROD;
> +    }
> +
> +    mib[0] = CTL_MACHDEP;
> +    mib[1] = CPU_ID_AA64ISAR1;
> +    len = sizeof(isar1);
> +    if (sysctl(mib, 2, &isar1, &len, NULL, 0) != -1) {
> +#ifdef ID_AA64ISAR1_I8MM_IMPL
> +        if (ID_AA64ISAR1_I8MM(isar1) >= ID_AA64ISAR1_I8MM_IMPL)
> +            flags |= AV_CPU_FLAG_I8MM;
> +#endif
> +    }
> +
> +    return flags;
> +}
> +
>  #elif defined(_WIN32)
>  #include

This LGTM.

Although, in https://code.videolan.org/videolan/dav1d/-/merge_requests/1673 you wrapped most of this in an #ifdef CPU_ID_AA64ISAR0, so would that be useful here too?

Feel free to push either with or without that.

// Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
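For readers without an OpenBSD machine at hand, the feature test in the quoted patch can be sketched portably. This is only an illustration under stated assumptions: the EX_* names and flag values are hypothetical (they are not FFmpeg's AV_CPU_FLAG_* values or OpenBSD's ID_AA64ISAR* macros), and the field positions (DP in bits 47:44 of ID_AA64ISAR0_EL1, I8MM in bits 55:52 of ID_AA64ISAR1_EL1) are taken from the Arm architecture reference rather than from the patch. The register values would come from sysctl() on a real system; here they are plain parameters.

```c
#include <stdint.h>

/* Hypothetical flag bits for this sketch only. */
#define EX_FLAG_DOTPROD (1 << 0)
#define EX_FLAG_I8MM    (1 << 1)

/* Assumed field positions (Arm architecture reference, not the patch). */
#define EX_ISAR0_DP_SHIFT   44  /* ID_AA64ISAR0_EL1.DP,   bits [47:44] */
#define EX_ISAR1_I8MM_SHIFT 52  /* ID_AA64ISAR1_EL1.I8MM, bits [55:52] */

/* Extract one 4-bit ID register field. */
static unsigned ex_field(uint64_t reg, unsigned shift)
{
    return (unsigned)((reg >> shift) & 0xf);
}

/* Map raw ISAR0/ISAR1 values to feature flags; a field value of 1 or
 * more is treated as "implemented", mirroring the >= comparison in the
 * quoted patch. */
static int ex_flags_from_isar(uint64_t isar0, uint64_t isar1)
{
    int flags = 0;

    if (ex_field(isar0, EX_ISAR0_DP_SHIFT) >= 1)
        flags |= EX_FLAG_DOTPROD;
    if (ex_field(isar1, EX_ISAR1_I8MM_SHIFT) >= 1)
        flags |= EX_FLAG_I8MM;

    return flags;
}
```

Keeping the decoding separate from the sysctl() calls makes it testable on any host; the #ifdef guard discussed in the review would only need to wrap the part that fetches the registers.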
Re: [FFmpeg-devel] [PATCH v3] movenc: Add an option for resilient, hybrid fragmented/non-fragmented muxing
On Thu, 20 Jun 2024, Dennis Sädtler wrote:

> On 2024-06-20 15:47, Timo Rothenpieler wrote:
>> On 20/06/2024 15:46, Martin Storsjö wrote:
>>> On Wed, 19 Jun 2024, Martin Storsjö wrote:
>>>
>>>> This allows ending up with a normal, non-fragmented file when
>>>> the file is finished, while keeping the file readable if writing
>>>> is aborted abruptly at any point. (Normally when writing a
>>>> mov/mp4 file, the unfinished file is completely useless unless it
>>>> is finished properly.)
>>>>
>>>> This results in a file where the mdat atom contains (and hides)
>>>> all the moof atoms that were part of the fragmented file structure
>>>> initially.
>>>> ---
>>>> v3: Renamed the option to hybrid_fragmented.
>>>> ---
>>>>  doc/muxers.texi                | 11 ++
>>>>  libavformat/movenc.c           | 62 +++---
>>>>  libavformat/movenc.h           | 4 ++-
>>>>  libavformat/version.h          | 4 +--
>>>>  tests/fate/lavf-container.mak  | 3 +-
>>>>  tests/ref/lavf/mov_hybrid_frag | 3 ++
>>>>  6 files changed, 78 insertions(+), 9 deletions(-)
>>>>  create mode 100644 tests/ref/lavf/mov_hybrid_frag
>>>
>>> If there are no more comments on this one, I'll go ahead and push it soon.
>>
>> +1 from me
>
> Sounds good to me as well.

Pushed now, thanks for all the input!

// Martin
[FFmpeg-devel] [PATCH 1/4] hlsenc: Fix the return value accumulation in append_single_file
Both the read_byte variable (which is accumulated into
append_single_file) and the return value are int64_t; give the ret
variable the right corresponding type too.
---
 libavformat/hlsenc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavformat/hlsenc.c b/libavformat/hlsenc.c
index f5c0243cf1..3d5eb47e84 100644
--- a/libavformat/hlsenc.c
+++ b/libavformat/hlsenc.c
@@ -2380,7 +2380,7 @@ static int hls_init_file_resend(AVFormatContext *s, VariantStream *vs)
 static int64_t append_single_file(AVFormatContext *s, VariantStream *vs)
 {
-    int ret = 0;
+    int64_t ret = 0;
     int64_t read_byte = 0;
     int64_t total_size = 0;
     char *filename = NULL;
--
2.39.3 (Apple Git-146)
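As a rough illustration of why the variable's width matters here: a byte count that passes through a plain int can no longer represent totals beyond INT_MAX. The sketch below is not the hlsenc code; the ex_* helpers are hypothetical, and the comparison assumes a platform with 32-bit int.

```c
#include <limits.h>
#include <stdint.h>

/* Returns nonzero if v can be stored in an int without loss. */
static int ex_fits_in_int(int64_t v)
{
    return v >= INT_MIN && v <= INT_MAX;
}

/* Accumulate per-chunk byte counts in a 64-bit variable, as the
 * patched function does with its int64_t ret. */
static int64_t ex_total_bytes(const int64_t *chunks, int n)
{
    int64_t ret = 0;

    for (int i = 0; i < n; i++)
        ret += chunks[i];

    return ret;
}
```

With three chunks of 2 GB, 2 GB and 1 GB the 64-bit total is 5000000000, which ex_fits_in_int() rejects on a 32-bit-int platform; that is the kind of value an int ret could not carry.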
[FFmpeg-devel] [PATCH 2/4] hlsenc: Fix setting vs->start_pos when not using HLS_SINGLE_FILE or hls_segment_size
When not using HLS_SINGLE_FILE or hls_segment_size, we're writing
each segment into a separate file. In that case, the file start pos for
each segment will be zero.

This matches the case in (hls->max_seg_size > 0) above, where we
decide to switch to a new file.

This fixes the calculation of "vs->size = new_start_pos - vs->start_pos"
at the start of hls_write_packet; previously, start_pos would
refer to the byte size of the previous segment file, giving
vs->size entirely bogus values here.
---
 libavformat/hlsenc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavformat/hlsenc.c b/libavformat/hlsenc.c
index 3d5eb47e84..0c72774e29 100644
--- a/libavformat/hlsenc.c
+++ b/libavformat/hlsenc.c
@@ -2659,7 +2659,7 @@ static int hls_write_packet(AVFormatContext *s, AVPacket *pkt)
                 vs->start_pos = new_start_pos;
             }
         } else {
-            vs->start_pos = new_start_pos;
+            vs->start_pos = 0;
             sls_flag_file_rename(hls, vs, old_filename);
             ret = hls_start(s, vs);
         }
--
2.39.3 (Apple Git-146)
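The arithmetic behind this fix can be sketched in isolation. The struct and helper names below are hypothetical stand-ins, not the hlsenc API; only the expression new_start_pos - start_pos is taken from the commit message. The `fixed` parameter switches between the old and new assignment so both can be compared.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of VariantStream. */
struct ExStream {
    int64_t start_pos;
};

/* Size of the segment just written, computed as in the commit
 * message: current position minus the recorded start position. */
static int64_t ex_segment_size(const struct ExStream *vs, int64_t new_start_pos)
{
    return new_start_pos - vs->start_pos;
}

/* Called when a new per-segment file is opened. fixed = 0 models the
 * old code (start_pos = new_start_pos), fixed = 1 the patched code
 * (start_pos = 0, because the new file starts at offset zero). */
static void ex_start_new_file(struct ExStream *vs, int64_t new_start_pos, int fixed)
{
    vs->start_pos = fixed ? 0 : new_start_pos;
}
```

With a first segment of 1000 bytes and a second of 1200 bytes, the old assignment reports 200 bytes for the second segment, while the patched one reports 1200.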
[FFmpeg-devel] [PATCH 3/4] hlsenc: When not using HLS_SINGLE_FILE, set vs->size to range_length
This matches what is done in the corresponding case for
HLS_SINGLE_FILE.

Normally, vs->size is already initialized correctly - but when
writing the initial segment, with mp4 files, vs->size has been set to
the size of the init segment, while range_length contains the real
size of the first segment.
---
 libavformat/hlsenc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libavformat/hlsenc.c b/libavformat/hlsenc.c
index 0c72774e29..3ca99abdbb 100644
--- a/libavformat/hlsenc.c
+++ b/libavformat/hlsenc.c
@@ -2586,6 +2586,7 @@ static int hls_write_packet(AVFormatContext *s, AVPacket *pkt)
                     av_dict_free(&options);
                     return ret;
                 }
+                vs->size = range_length;
                 ret = hlsenc_io_close(s, &vs->out, filename);
                 if (ret < 0) {
                     av_log(s, AV_LOG_WARNING, "upload segment failed,"
--
2.39.3 (Apple Git-146)
[FFmpeg-devel] [PATCH 4/4] hlsenc: Calculate the average and actual maximum bitrate of segments
Previously, the bitrate advertised in the master playlist would only be based on the nominal values in either AVCodecParameters bit_rate, or via AVCPBProperties max_bitrate. On top of this, a fudge factor of 10% is added, to account for container overhead. Neither of these bitrates may be known, and if the encoder is running in VBR mode, there is no such value to be known. And the container overhead may be more or less than the given constant factor of 10%. Instead, calculate the maximum bitrate per segment based on what actually gets output from the muxer, and average bitrate across all segments. When muxing of the file finishes, update the master playlist with these values, exposing both the maximum (which previously was a guesstimate based on the nominal values) via EXT-X-STREAM-INF BANDWIDTH, and the average via EXT-X-STREAM-INF AVERAGE-BANDWIDTH. This makes it possible to use the hlsenc muxer with VBR encodes, for VOD style muxing. --- libavformat/dashenc.c | 4 ++-- libavformat/hlsenc.c | 47 --- libavformat/hlsplaylist.c | 3 +++ libavformat/hlsplaylist.h | 1 + 4 files changed, 40 insertions(+), 15 deletions(-) diff --git a/libavformat/dashenc.c b/libavformat/dashenc.c index 8c14aa746e..d4a6fe0304 100644 --- a/libavformat/dashenc.c +++ b/libavformat/dashenc.c @@ -1322,7 +1322,7 @@ static int write_manifest(AVFormatContext *s, int final) av_strlcat(codec_str, audio_codec_str, sizeof(codec_str)); } get_hls_playlist_name(playlist_file, sizeof(playlist_file), NULL, i); -ff_hls_write_stream_info(st, c->m3u8_out, stream_bitrate, +ff_hls_write_stream_info(st, c->m3u8_out, stream_bitrate, 0, playlist_file, agroup, codec_str, NULL, NULL); } @@ -1348,7 +1348,7 @@ static int write_manifest(AVFormatContext *s, int final) continue; av_strlcpy(codec_str, os->codec_str, sizeof(codec_str)); get_hls_playlist_name(playlist_file, sizeof(playlist_file), NULL, i); -ff_hls_write_stream_info(st, c->m3u8_out, stream_bitrate, +ff_hls_write_stream_info(st, c->m3u8_out, stream_bitrate, 
0, playlist_file, NULL, codec_str, NULL, NULL); } diff --git a/libavformat/hlsenc.c b/libavformat/hlsenc.c index 3ca99abdbb..26722c9b32 100644 --- a/libavformat/hlsenc.c +++ b/libavformat/hlsenc.c @@ -150,6 +150,11 @@ typedef struct VariantStream { int discontinuity; int reference_stream_index; +int64_t total_size; +double total_duration; +int64_t avg_bitrate; +int64_t max_bitrate; + HLSSegment *segments; HLSSegment *last_segment; HLSSegment *old_segments; @@ -1108,6 +1113,16 @@ static int hls_append_segment(struct AVFormatContext *s, HLSContext *hls, if (!en) return AVERROR(ENOMEM); +vs->total_size += size; +vs->total_duration += duration; +if (duration > 0) { +int cur_bitrate = (int)(8 * size / duration); +if (cur_bitrate > vs->max_bitrate) +vs->max_bitrate = cur_bitrate; +} +if (vs->total_duration > 0) +vs->avg_bitrate = (int)(8 * vs->total_size / vs->total_duration); + en->var_stream_idx = vs->var_stream_idx; ret = sls_flags_filename_process(s, hls, vs, en, duration, pos, size); if (ret < 0) { @@ -1362,14 +1377,15 @@ static int64_t get_stream_bit_rate(AVStream *stream) } static int create_master_playlist(AVFormatContext *s, - VariantStream * const input_vs) + VariantStream * const input_vs, + int final) { HLSContext *hls = s->priv_data; VariantStream *vs, *temp_vs; AVStream *vid_st, *aud_st; AVDictionary *options = NULL; unsigned int i, j; -int ret, bandwidth; +int ret, bandwidth, avg_bandwidth; const char *m3u8_rel_name = NULL; const char *vtt_m3u8_rel_name = NULL; const char *ccgroup; @@ -1389,8 +1405,8 @@ static int create_master_playlist(AVFormatContext *s, return 0; } else { /* Keep publishing the master playlist at the configured rate */ -if (&hls->var_streams[0] != input_vs || !hls->master_publish_rate || -input_vs->number % hls->master_publish_rate) +if ((&hls->var_streams[0] != input_vs || !hls->master_publish_rate || +input_vs->number % hls->master_publish_rate) && !final) return 0; } @@ -1480,12 +1496,17 @@ static int 
create_master_playlist(AVFormatContext *s, } } -bandwidth = 0; -if (vid_st) -bandwidth += get_stream_bit_rate(vid_st); -if (aud_st) -bandwidth += get_stream_bi
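The per-segment bookkeeping described in this patch can be sketched on its own. The field names follow the ones the patch adds to VariantStream; the ExStats struct and ex_add_segment() are hypothetical, and the sketch casts to int64_t where the quoted hunk casts to int, which is a deliberate deviation rather than what the patch does.

```c
#include <stdint.h>

/* Hypothetical container for the four fields added by the patch. */
struct ExStats {
    int64_t total_size;      /* bytes written so far */
    double  total_duration;  /* seconds written so far */
    int64_t avg_bitrate;     /* bits per second across all segments */
    int64_t max_bitrate;     /* highest per-segment bits per second */
};

/* Account for one finished segment of `size` bytes and `duration` seconds. */
static void ex_add_segment(struct ExStats *st, int64_t size, double duration)
{
    st->total_size     += size;
    st->total_duration += duration;

    if (duration > 0) {
        int64_t cur_bitrate = (int64_t)(8 * size / duration);
        if (cur_bitrate > st->max_bitrate)
            st->max_bitrate = cur_bitrate;
    }
    if (st->total_duration > 0)
        st->avg_bitrate = (int64_t)(8 * st->total_size / st->total_duration);
}
```

For example, a 1000000-byte segment and a 500000-byte segment of 4 seconds each give a maximum of 2000000 bit/s and an average of 1500000 bit/s, the two numbers that would feed BANDWIDTH and AVERAGE-BANDWIDTH respectively.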
Re: [FFmpeg-devel] [PATCH v3 2/3] avcodec/jpeg2000dec: Add support for placeholder passes
Osamu Watanabe:
> This commit adds support for placeholder pass parsing
>

What is a placeholder pass?

> Signed-off-by: Osamu Watanabe
> ---
>  libavcodec/jpeg2000.h      | 2 +
>  libavcodec/jpeg2000dec.c   | 292 +++--
>  libavcodec/jpeg2000htdec.c | 18 +--
>  3 files changed, 257 insertions(+), 55 deletions(-)
>
Re: [FFmpeg-devel] [PATCH v3] movenc: Add an option for resilient, hybrid fragmented/non-fragmented muxing
On Mon, 24 Jun 2024, 11:24 Martin Storsjö, wrote:

> On Thu, 20 Jun 2024, Dennis Sädtler wrote:
>
> > On 2024-06-20 15:47, Timo Rothenpieler wrote:
> >> On 20/06/2024 15:46, Martin Storsjö wrote:
> >>> On Wed, 19 Jun 2024, Martin Storsjö wrote:
> >>>
> >>>> This allows ending up with a normal, non-fragmented file when
> >>>> the file is finished, while keeping the file readable if writing
> >>>> is aborted abruptly at any point. (Normally when writing a
> >>>> mov/mp4 file, the unfinished file is completely useless unless it
> >>>> is finished properly.)
> >>>>
> >>>> This results in a file where the mdat atom contains (and hides)
> >>>> all the moof atoms that were part of the fragmented file structure
> >>>> initially.
> >>>> ---
> >>>> v3: Renamed the option to hybrid_fragmented.
> >>>> ---
> >>>>  doc/muxers.texi                | 11 ++
> >>>>  libavformat/movenc.c           | 62 +++---
> >>>>  libavformat/movenc.h           | 4 ++-
> >>>>  libavformat/version.h          | 4 +--
> >>>>  tests/fate/lavf-container.mak  | 3 +-
> >>>>  tests/ref/lavf/mov_hybrid_frag | 3 ++
> >>>>  6 files changed, 78 insertions(+), 9 deletions(-)
> >>>>  create mode 100644 tests/ref/lavf/mov_hybrid_frag
> >>>
> >>> If there are no more comments on this one, I'll go ahead and push it soon.
> >>
> >> +1 from me
> >
> > Sounds good to me as well.
>
> Pushed now, thanks for all the input!
>
> // Martin

Thanks for the patch, this resolves multiple issues with fragmented output even when FFmpeg exits suddenly. Retesting shortly with fmp4 + tee muxer(s).
Re: [FFmpeg-devel] [PATCH v3 2/3] avcodec/jpeg2000dec: Add support for placeholder passes
A placeholder pass is a coding pass with zero length. It is needed to preserve the pass boundaries of layers when transcoding from an HT to a non-HT codestream. It is defined in the HTJ2K specification. A detailed explanation is available at https://ds.jpeg.org/documents/jpeg2000/wg1n100680-101-COM-Guideline_on_Placeholder_Passes_and_Multiple_HT_Sets_in_HTJ2K_codestreams.zip

> On 2024/06/24 18:34, Andreas Rheinhardt wrote:
>
> Osamu Watanabe:
>> This commit adds support for placeholder pass parsing
>>
>
> What is a placeholder pass?
>
>> Signed-off-by: Osamu Watanabe
>> ---
>>  libavcodec/jpeg2000.h      | 2 +
>>  libavcodec/jpeg2000dec.c   | 292 +++--
>>  libavcodec/jpeg2000htdec.c | 18 +--
>>  3 files changed, 257 insertions(+), 55 deletions(-)
>>
Re: [FFmpeg-devel] [PATCH 2/4] hlsenc: Fix setting vs->start_pos when not using HLS_SINGLE_FILE or hls_segment_size
> On Jun 24, 2024, at 16:49, Martin Storsjö wrote:
>
> When not using HLS_SINGLE_FILE or hls_segment_size, we're writing
> each segment into a separate file. In that case, the file start pos for
> each segment will be zero.
>
> This matches the case in (hls->max_seg_size > 0) above, where we
> decide to switch to a new file.
>
> This fixes the calculation of "vs->size = new_start_pos - vs->start_pos"
> at the start of hls_write_packet; previously, start_pos would
> refer to the byte size of the previous segment file, giving
> vs->size entirely bogus values here.
> ---
>  libavformat/hlsenc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libavformat/hlsenc.c b/libavformat/hlsenc.c
> index 3d5eb47e84..0c72774e29 100644
> --- a/libavformat/hlsenc.c
> +++ b/libavformat/hlsenc.c
> @@ -2659,7 +2659,7 @@ static int hls_write_packet(AVFormatContext *s, AVPacket *pkt)
>                  vs->start_pos = new_start_pos;
>              }
>          } else {
> -            vs->start_pos = new_start_pos;
> +            vs->start_pos = 0;
>              sls_flag_file_rename(hls, vs, old_filename);
>              ret = hls_start(s, vs);
>          }
> --
> 2.39.3 (Apple Git-146)
>

patchset lgtm

Thanks Martin
Steven
[FFmpeg-devel] [PATCH v3 2/3] swscale/aarch64: Add bgra/rgba to yuv
From: Zhao Zhili Test on Apple M1 with kperf : -O3 : -O3 -fno-vectorize bgra_to_uv_8_c : 13.4 : 27.5 bgra_to_uv_8_neon : 37.4 : 41.7 bgra_to_uv_128_c: 155.9 : 550.2 bgra_to_uv_128_neon : 91.7 : 92.7 bgra_to_uv_1080_c : 1173.2: 4558.2 bgra_to_uv_1080_neon: 822.7 : 809.5 bgra_to_uv_1920_c : 2078.2: 8115.2 bgra_to_uv_1920_neon: 1437.7: 1438.7 bgra_to_uv_half_8_c : 17.9 : 14.2 bgra_to_uv_half_8_neon : 37.4 : 10.5 bgra_to_uv_half_128_c : 103.9 : 326.0 bgra_to_uv_half_128_neon: 73.9 : 68.7 bgra_to_uv_half_1080_c : 850.2 : 3732.0 bgra_to_uv_half_1080_neon : 484.2 : 490.0 bgra_to_uv_half_1920_c : 1479.2: 4942.7 bgra_to_uv_half_1920_neon : 824.2 : 824.7 bgra_to_y_8_c : 8.2 : 29.5 bgra_to_y_8_neon: 18.2 : 32.7 bgra_to_y_128_c : 101.4 : 361.5 bgra_to_y_128_neon : 74.9 : 73.7 bgra_to_y_1080_c: 739.4 : 3018.0 bgra_to_y_1080_neon : 613.4 : 544.2 bgra_to_y_1920_c: 1298.7: 5326.0 bgra_to_y_1920_neon : 918.7 : 934.2 --- libswscale/aarch64/input.S | 91 ++-- libswscale/aarch64/swscale.c | 16 +++ 2 files changed, 94 insertions(+), 13 deletions(-) diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S index 2cfec4cb6a..6d2c6034bb 100644 --- a/libswscale/aarch64/input.S +++ b/libswscale/aarch64/input.S @@ -20,8 +20,12 @@ #include "libavutil/aarch64/asm.S" -.macro rgb_to_yuv_load_rgb src +.macro rgb_to_yuv_load_rgb src, element=3 +.if \element == 3 ld3 { v16.16b, v17.16b, v18.16b }, [\src] +.else +ld4 { v16.16b, v17.16b, v18.16b, v19.16b }, [\src] +.endif uxtlv19.8h, v16.8b // v19: r uxtlv20.8h, v17.8b // v20: g uxtlv21.8h, v18.8b // v21: b @@ -51,7 +55,8 @@ function ff_bgr24ToY_neon, export=1 ret endfunc -function ff_rgb24ToY_neon, export=1 +.macro rgbToY_neon fmt, element +function ff_\fmt\()ToY_neon, export=1 cmp w4, #0 // check width > 0 ldp w10, w11, [x5] // w10: ry, w11: gy ldr w12, [x5, #8] // w12: by @@ -67,11 +72,11 @@ function ff_rgb24ToY_neon, export=1 dup v2.8h, w12 b.lt2f 1: -rgb_to_yuv_load_rgb x1 +rgb_to_yuv_load_rgb x1, \element rgb_to_yuv_product v19, v20, 
v21, v25, v26, v16, v0, v1, v2, #9 rgb_to_yuv_product v22, v23, v24, v27, v28, v17, v0, v1, v2, #9 sub w4, w4, #16 // width -= 16 -add x1, x1, #48 // src += 48 +add x1, x1, #(16*\element) cmp w4, #16 // width >= 16 ? stp q16, q17, [x0], #32 // store to dst b.ge1b @@ -86,12 +91,25 @@ function ff_rgb24ToY_neon, export=1 smaddl x13, w15, w12, x13 // x13 += by * b asr w13, w13, #9// x13 >>= 9 sub w4, w4, #1 // width-- -add x1, x1, #3 // src += 3 +add x1, x1, #\element strhw13, [x0], #2 // store to dst cbnzw4, 2b 3: ret endfunc +.endm + +rgbToY_neon fmt=rgb24, element=3 + +function ff_bgra32ToY_neon, export=1 +cmp w4, #0 // check width > 0 +ldp w12, w11, [x5] // w12: ry, w11: gy +ldr w10, [x5, #8] // w10: by +b.gt4f +ret +endfunc + +rgbToY_neon fmt=rgba32, element=4 .macro rgb_set_uv_coeff half .if \half @@ -120,7 +138,8 @@ function ff_bgr24ToUV_half_neon, export=1 b 4f endfunc -function ff_rgb24ToUV_half_neon, export=1 +.macro rgbToUV_half_neon fmt, element +function ff_\fmt\()ToUV_half_neon, export=1 cmp w5, #0 // check width > 0 b.le3f @@ -132,7 +151,11 @@ function ff_rgb24ToUV_half_neon, export=1 rgb_set_uv_coeff half=1 b.lt2f 1: +.if \element == 3 ld3 { v16.16b, v17.16b, v18.16b }, [x3] +.else +ld4 { v16.16b, v17.16b, v18.16b, v19.16b }, [x3] +.endif uaddlp v19.8h, v16.16b // v19: r uaddlp v20.8h, v17.16b // v20: g u
[FFmpeg-devel] [PATCH v3 1/3] swscale/aarch64: Add bgr24 to yuv
From: Zhao Zhili Test on Apple M1 with kperf : -O3 : -O3 -fno-vectorize bgr24_to_uv_8_c : 28.5 : 52.5 bgr24_to_uv_8_neon : 54.5 : 59.7 bgr24_to_uv_128_c : 294.0 : 830.7 bgr24_to_uv_128_neon: 99.7 : 112.0 bgr24_to_uv_1080_c : 965.0 : 6624.0 bgr24_to_uv_1080_neon : 751.5 : 754.7 bgr24_to_uv_1920_c : 1693.2: 11554.5 bgr24_to_uv_1920_neon : 1292.5: 1307.5 bgr24_to_uv_half_8_c: 54.2 : 37.0 bgr24_to_uv_half_8_neon : 27.2 : 22.5 bgr24_to_uv_half_128_c : 127.2 : 392.5 bgr24_to_uv_half_128_neon : 63.0 : 52.0 bgr24_to_uv_half_1080_c : 880.2 : 3329.0 bgr24_to_uv_half_1080_neon : 401.5 : 390.7 bgr24_to_uv_half_1920_c : 1585.7: 6390.7 bgr24_to_uv_half_1920_neon : 694.7 : 698.7 bgr24_to_y_8_c : 21.7 : 22.5 bgr24_to_y_8_neon : 797.2 : 25.5 bgr24_to_y_128_c: 88.0 : 280.5 bgr24_to_y_128_neon : 63.7 : 55.0 bgr24_to_y_1080_c : 616.7 : 2208.7 bgr24_to_y_1080_neon: 900.0 : 452.0 bgr24_to_y_1920_c : 1093.2: 3894.7 bgr24_to_y_1920_neon: 777.2 : 767.5 --- libswscale/aarch64/input.S | 71 ++-- libswscale/aarch64/swscale.c | 32 +--- 2 files changed, 71 insertions(+), 32 deletions(-) diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S index 33afa34111..2cfec4cb6a 100644 --- a/libswscale/aarch64/input.S +++ b/libswscale/aarch64/input.S @@ -20,7 +20,7 @@ #include "libavutil/aarch64/asm.S" -.macro rgb24_to_yuv_load_rgb, src +.macro rgb_to_yuv_load_rgb src ld3 { v16.16b, v17.16b, v18.16b }, [\src] uxtlv19.8h, v16.8b // v19: r uxtlv20.8h, v17.8b // v20: g @@ -30,7 +30,7 @@ uxtl2 v24.8h, v18.16b// v24: b .endm -.macro rgb24_to_yuv_product, r, g, b, dst1, dst2, dst, coef0, coef1, coef2, right_shift +.macro rgb_to_yuv_product r, g, b, dst1, dst2, dst, coef0, coef1, coef2, right_shift mov \dst1\().16b, v6.16b// dst1 = const_offset mov \dst2\().16b, v6.16b// dst2 = const_offset smlal \dst1\().4s, \coef0\().4h, \r\().4h // dst1 += rx * r @@ -43,12 +43,20 @@ sqshrn2 \dst\().8h, \dst2\().4s, \right_shift // dst_higher_half = dst2 >> right_shift .endm +function ff_bgr24ToY_neon, export=1 
+cmp w4, #0 // check width > 0 +ldp w12, w11, [x5] // w12: ry, w11: gy +ldr w10, [x5, #8] // w10: by +b.gt4f +ret +endfunc + function ff_rgb24ToY_neon, export=1 cmp w4, #0 // check width > 0 ldp w10, w11, [x5] // w10: ry, w11: gy ldr w12, [x5, #8] // w12: by b.le3f - +4: mov w9, #256// w9 = 1 << (RGB2YUV_SHIFT - 7) movkw9, #8, lsl #16 // w9 += 32 << (RGB2YUV_SHIFT - 1) dup v6.4s, w9 // w9: const_offset @@ -59,9 +67,9 @@ function ff_rgb24ToY_neon, export=1 dup v2.8h, w12 b.lt2f 1: -rgb24_to_yuv_load_rgb x1 -rgb24_to_yuv_product v19, v20, v21, v25, v26, v16, v0, v1, v2, #9 -rgb24_to_yuv_product v22, v23, v24, v27, v28, v17, v0, v1, v2, #9 +rgb_to_yuv_load_rgb x1 +rgb_to_yuv_product v19, v20, v21, v25, v26, v16, v0, v1, v2, #9 +rgb_to_yuv_product v22, v23, v24, v27, v28, v17, v0, v1, v2, #9 sub w4, w4, #16 // width -= 16 add x1, x1, #48 // src += 48 cmp w4, #16 // width >= 16 ? @@ -85,10 +93,7 @@ function ff_rgb24ToY_neon, export=1 ret endfunc -.macro rgb24_load_uv_coeff half -ldp w10, w11, [x6, #12] // w10: ru, w11: gu -ldp w12, w13, [x6, #20] // w12: bu, w13: rv -ldp w14, w15, [x6, #28] // w14: gv, w15: bv +.macro rgb_set_uv_coeff half .if \half mov w9, #512 movkw9, #128, lsl #16 // w9: const_offset @@ -105,12 +110,26 @@ endfunc dup v6.4s, w9 .endm +function ff_bgr24ToUV_half_neon, export=1 +cmp w5, #0 // check width > 0 +b.le3f + +ldp w12, w11, [x6, #12] +ldp
[FFmpeg-devel] [PATCH v3 3/3] swscale/aarch64: Add argb/abgr to yuv
From: Zhao Zhili Test on Apple M1 with kperf: : -O3 : -O3 -fno-vectorize abgr_to_uv_8_c : 19.4 : 26.1 abgr_to_uv_8_neon : 29.9 : 51.1 abgr_to_uv_128_c: 146.4 : 558.9 abgr_to_uv_128_neon : 85.1 : 83.4 abgr_to_uv_1080_c : 1162.6: 4786.4 abgr_to_uv_1080_neon: 819.6 : 826.6 abgr_to_uv_1920_c : 2063.6: 8492.1 abgr_to_uv_1920_neon: 1435.1: 1447.1 abgr_to_uv_half_8_c : 16.4 : 11.4 abgr_to_uv_half_8_neon : 35.6 : 20.4 abgr_to_uv_half_128_c : 108.6 : 359.4 abgr_to_uv_half_128_neon: 75.4 : 42.6 abgr_to_uv_half_1080_c : 883.4 : 2885.6 abgr_to_uv_half_1080_neon : 460.6 : 481.1 abgr_to_uv_half_1920_c : 1553.6: 5106.9 abgr_to_uv_half_1920_neon : 817.6 : 820.4 abgr_to_y_8_c : 6.1 : 26.4 abgr_to_y_8_neon: 40.6 : 6.4 abgr_to_y_128_c : 99.9 : 390.1 abgr_to_y_128_neon : 67.4 : 55.9 abgr_to_y_1080_c: 735.9 : 3170.4 abgr_to_y_1080_neon : 534.6 : 536.6 abgr_to_y_1920_c: 1279.4: 6016.4 abgr_to_y_1920_neon : 932.6 : 927.6 --- libswscale/aarch64/input.S | 114 --- libswscale/aarch64/swscale.c | 17 ++ 2 files changed, 110 insertions(+), 21 deletions(-) diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S index 6d2c6034bb..f4d587fed0 100644 --- a/libswscale/aarch64/input.S +++ b/libswscale/aarch64/input.S @@ -34,6 +34,16 @@ uxtl2 v24.8h, v18.16b// v24: b .endm +.macro argb_to_yuv_load_rgb src +ld4 { v16.16b, v17.16b, v18.16b, v19.16b }, [\src] +uxtlv21.8h, v19.8b // v21: b +uxtl2 v24.8h, v19.16b// v24: b +uxtlv19.8h, v17.8b // v19: r +uxtlv20.8h, v18.8b // v20: g +uxtl2 v22.8h, v17.16b// v22: r +uxtl2 v23.8h, v18.16b// v23: g +.endm + .macro rgb_to_yuv_product r, g, b, dst1, dst2, dst, coef0, coef1, coef2, right_shift mov \dst1\().16b, v6.16b// dst1 = const_offset mov \dst2\().16b, v6.16b// dst2 = const_offset @@ -55,7 +65,7 @@ function ff_bgr24ToY_neon, export=1 ret endfunc -.macro rgbToY_neon fmt, element +.macro rgbToY_neon fmt, element, alpha_first=0 function ff_\fmt\()ToY_neon, export=1 cmp w4, #0 // check width > 0 ldp w10, w11, [x5] // w10: ry, w11: gy @@ -72,7 +82,11 
@@ function ff_\fmt\()ToY_neon, export=1 dup v2.8h, w12 b.lt2f 1: +.if \alpha_first +argb_to_yuv_load_rgb x1 +.else rgb_to_yuv_load_rgb x1, \element +.endif rgb_to_yuv_product v19, v20, v21, v25, v26, v16, v0, v1, v2, #9 rgb_to_yuv_product v22, v23, v24, v27, v28, v17, v0, v1, v2, #9 sub w4, w4, #16 // width -= 16 @@ -82,9 +96,15 @@ function ff_\fmt\()ToY_neon, export=1 b.ge1b cbz x4, 3f 2: +.if \alpha_first +ldrbw13, [x1, #1] // w13: r +ldrbw14, [x1, #2] // w14: g +ldrbw15, [x1, #3] // w15: b +.else ldrbw13, [x1] // w13: r ldrbw14, [x1, #1] // w14: g ldrbw15, [x1, #2] // w15: b +.endif smaddl x13, w13, w10, x9 // x13 = ry * r + const_offset smaddl x13, w14, w11, x13 // x13 += gy * g @@ -101,6 +121,16 @@ endfunc rgbToY_neon fmt=rgb24, element=3 +function ff_abgr32ToY_neon, export=1 +cmp w4, #0 // check width > 0 +ldp w12, w11, [x5] // w12: ry, w11: gy +ldr w10, [x5, #8] // w10: by +b.gt4f +ret +endfunc + +rgbToY_neon fmt=argb32, element=4, alpha_first=1 + function ff_bgra32ToY_neon, export=1 cmp w4, #0 // check width > 0 ldp w12, w11, [x5] // w12: ry, w11: gy @@ -138,7 +168,21 @@ function ff_bgr24ToUV_half_neon, export=1 b 4f endfunc -.macro rgbToUV_half_neon fmt, element +.macro rgb_load_add_half off_r1, off_r2, off_g1, off_g2, off_b1, off_b2 +ldrbw2, [x3, #\off_r1] // w2: r1 +ldrbw4, [x3, #\off_r2]
Re: [FFmpeg-devel] [PATCH v3 1/3] swscale/aarch64: Add bgr24 to yuv
On Mon, 24 Jun 2024, Zhao Zhili wrote:

> From: Zhao Zhili
>
> Test on Apple M1 with kperf
>
>                            : -O3    : -O3 -fno-vectorize
> bgr24_to_uv_8_c            : 28.5   : 52.5
> bgr24_to_uv_8_neon         : 54.5   : 59.7
> bgr24_to_uv_128_c          : 294.0  : 830.7
> bgr24_to_uv_128_neon       : 99.7   : 112.0
> bgr24_to_uv_1080_c         : 965.0  : 6624.0
> bgr24_to_uv_1080_neon      : 751.5  : 754.7
> bgr24_to_uv_1920_c         : 1693.2 : 11554.5
> bgr24_to_uv_1920_neon      : 1292.5 : 1307.5
> bgr24_to_uv_half_8_c       : 54.2   : 37.0
> bgr24_to_uv_half_8_neon    : 27.2   : 22.5
> bgr24_to_uv_half_128_c     : 127.2  : 392.5
> bgr24_to_uv_half_128_neon  : 63.0   : 52.0
> bgr24_to_uv_half_1080_c    : 880.2  : 3329.0
> bgr24_to_uv_half_1080_neon : 401.5  : 390.7
> bgr24_to_uv_half_1920_c    : 1585.7 : 6390.7
> bgr24_to_uv_half_1920_neon : 694.7  : 698.7
> bgr24_to_y_8_c             : 21.7   : 22.5
> bgr24_to_y_8_neon          : 797.2  : 25.5
> bgr24_to_y_128_c           : 88.0   : 280.5
> bgr24_to_y_128_neon        : 63.7   : 55.0
> bgr24_to_y_1080_c          : 616.7  : 2208.7
> bgr24_to_y_1080_neon       : 900.0  : 452.0
> bgr24_to_y_1920_c          : 1093.2 : 3894.7
> bgr24_to_y_1920_neon       : 777.2  : 767.5
> ---

This patch looks ok now

// Martin
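For orientation, the per-pixel arithmetic that the NEON luma routine in this series performs can be written as a scalar C sketch. The constant offset and the shift by 9 follow the comments in the posted assembly (1 << (RGB2YUV_SHIFT - 7) plus 32 << (RGB2YUV_SHIFT - 1), with RGB2YUV_SHIFT taken as 15). Everything else is an assumption for illustration: the ex_* names are hypothetical, and the coefficient values are illustrative limited-range BT.601 weights, not values read from swscale's tables.

```c
#include <stdint.h>

#define EX_RGB2YUV_SHIFT 15

/* Illustrative luma weights scaled by 1 << 15 (assumed, see lead-in). */
enum { EX_RY = 8414, EX_GY = 16519, EX_BY = 3208 };

/* Convert `width` packed 3-byte pixels to 16-bit luma. c0..c2 are the
 * weights for byte 0, 1 and 2 of each pixel, so rgb24 passes
 * (ry, gy, by) and bgr24 passes (by, gy, ry); this mirrors how the
 * bgr24 entry point in the patch swaps the coefficient registers and
 * then reuses the rgb24 body. */
static void ex_rgb24_to_y(int16_t *dst, const uint8_t *src, int width,
                          int32_t c0, int32_t c1, int32_t c2)
{
    const int32_t const_offset = (32 << (EX_RGB2YUV_SHIFT - 1)) +
                                 (1 << (EX_RGB2YUV_SHIFT - 7));

    for (int i = 0; i < width; i++) {
        int32_t acc = const_offset;

        acc += c0 * src[3 * i + 0];
        acc += c1 * src[3 * i + 1];
        acc += c2 * src[3 * i + 2];

        dst[i] = (int16_t)(acc >> (EX_RGB2YUV_SHIFT - 6)); /* >> 9 */
    }
}
```

With these weights a black pixel yields 1024 (16 << 6) and a white pixel 15040 (235 << 6), and an rgb24 pixel converted with (ry, gy, by) gives the same result as the byte-swapped bgr24 pixel converted with (by, gy, ry).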
Re: [FFmpeg-devel] [PATCH v3 2/3] swscale/aarch64: Add bgra/rgba to yuv
On Mon, 24 Jun 2024, Zhao Zhili wrote: From: Zhao Zhili Test on Apple M1 with kperf : -O3 : -O3 -fno-vectorize bgra_to_uv_8_c : 13.4 : 27.5 bgra_to_uv_8_neon : 37.4 : 41.7 bgra_to_uv_128_c: 155.9 : 550.2 bgra_to_uv_128_neon : 91.7 : 92.7 bgra_to_uv_1080_c : 1173.2: 4558.2 bgra_to_uv_1080_neon: 822.7 : 809.5 bgra_to_uv_1920_c : 2078.2: 8115.2 bgra_to_uv_1920_neon: 1437.7: 1438.7 bgra_to_uv_half_8_c : 17.9 : 14.2 bgra_to_uv_half_8_neon : 37.4 : 10.5 bgra_to_uv_half_128_c : 103.9 : 326.0 bgra_to_uv_half_128_neon: 73.9 : 68.7 bgra_to_uv_half_1080_c : 850.2 : 3732.0 bgra_to_uv_half_1080_neon : 484.2 : 490.0 bgra_to_uv_half_1920_c : 1479.2: 4942.7 bgra_to_uv_half_1920_neon : 824.2 : 824.7 bgra_to_y_8_c : 8.2 : 29.5 bgra_to_y_8_neon: 18.2 : 32.7 bgra_to_y_128_c : 101.4 : 361.5 bgra_to_y_128_neon : 74.9 : 73.7 bgra_to_y_1080_c: 739.4 : 3018.0 bgra_to_y_1080_neon : 613.4 : 544.2 bgra_to_y_1920_c: 1298.7: 5326.0 bgra_to_y_1920_neon : 918.7 : 934.2 --- libswscale/aarch64/input.S | 91 ++-- libswscale/aarch64/swscale.c | 16 +++ 2 files changed, 94 insertions(+), 13 deletions(-) diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S index 2cfec4cb6a..6d2c6034bb 100644 --- a/libswscale/aarch64/input.S +++ b/libswscale/aarch64/input.S @@ -20,8 +20,12 @@ #include "libavutil/aarch64/asm.S" -.macro rgb_to_yuv_load_rgb src +.macro rgb_to_yuv_load_rgb src, element=3 +.if \element == 3 ld3 { v16.16b, v17.16b, v18.16b }, [\src] +.else +ld4 { v16.16b, v17.16b, v18.16b, v19.16b }, [\src] +.endif uxtlv19.8h, v16.8b // v19: r uxtlv20.8h, v17.8b // v20: g uxtlv21.8h, v18.8b // v21: b @@ -51,7 +55,8 @@ function ff_bgr24ToY_neon, export=1 ret endfunc -function ff_rgb24ToY_neon, export=1 +.macro rgbToY_neon fmt, element +function ff_\fmt\()ToY_neon, export=1 cmp w4, #0 // check width > 0 ldp w10, w11, [x5] // w10: ry, w11: gy ldr w12, [x5, #8] // w12: by @@ -67,11 +72,11 @@ function ff_rgb24ToY_neon, export=1 dup v2.8h, w12 b.lt2f 1: -rgb_to_yuv_load_rgb x1 +rgb_to_yuv_load_rgb x1, 
\element rgb_to_yuv_product v19, v20, v21, v25, v26, v16, v0, v1, v2, #9 rgb_to_yuv_product v22, v23, v24, v27, v28, v17, v0, v1, v2, #9 sub w4, w4, #16 // width -= 16 -add x1, x1, #48 // src += 48 +add x1, x1, #(16*\element) cmp w4, #16 // width >= 16 ? stp q16, q17, [x0], #32 // store to dst b.ge1b @@ -86,12 +91,25 @@ function ff_rgb24ToY_neon, export=1 smaddl x13, w15, w12, x13 // x13 += by * b asr w13, w13, #9// x13 >>= 9 sub w4, w4, #1 // width-- -add x1, x1, #3 // src += 3 +add x1, x1, #\element strhw13, [x0], #2 // store to dst cbnzw4, 2b 3: ret endfunc +.endm + +rgbToY_neon fmt=rgb24, element=3 + +function ff_bgra32ToY_neon, export=1 +cmp w4, #0 // check width > 0 +ldp w12, w11, [x5] // w12: ry, w11: gy +ldr w10, [x5, #8] // w10: by +b.gt4f +ret +endfunc + +rgbToY_neon fmt=rgba32, element=4 It is extremely obscure to jump to a local label (4f) that is defined by the following macro. I think this would be much more readable if you'd include the bgr(a) version in the macro, so the reference to 4f is near to the actual label it refers to. .macro rgb_set_uv_coeff half .if \half @@ -120,7 +138,8 @@ function ff_bgr24ToUV_half_neon, export=1 b 4f endfunc -function ff_rgb24ToUV_half_neon, export=1 +.macro rgbToUV_half_neon fmt, element +function ff_\fmt\()ToUV_half_neon, export=1 cmp w5, #0 // check width > 0 b.le3f @@ -132,7 +151,11 @@ function ff_rgb24ToUV_half_neon, export=1 rgb_set_uv_coeff half=1 b.lt2f 1: +.if \element == 3 ld3 { v1
Re: [FFmpeg-devel] [PATCH 01/10 v4] avutil/stereo3d: add a Monoscopic view enum value
On 6/22/2024 8:15 PM, James Almer wrote:
> We need a way to signal the frame has a single view that doesn't map to any
> particular eye, and it should be the default one.
>
> Signed-off-by: James Almer
> ---
>  libavutil/stereo3d.c | 1 +
>  libavutil/stereo3d.h | 5 +
>  2 files changed, 6 insertions(+)
>
> diff --git a/libavutil/stereo3d.c b/libavutil/stereo3d.c
> index 19e81e4124..37cf093099 100644
> --- a/libavutil/stereo3d.c
> +++ b/libavutil/stereo3d.c
> @@ -71,6 +71,7 @@ static const char * const stereo3d_view_names[] = {
>      [AV_STEREO3D_VIEW_PACKED] = "packed",
>      [AV_STEREO3D_VIEW_LEFT] = "left",
>      [AV_STEREO3D_VIEW_RIGHT] = "right",
> +    [AV_STEREO3D_VIEW_MONO] = "monoscopic",
>  };
>
>  static const char * const stereo3d_primary_eye_names[] = {
> diff --git a/libavutil/stereo3d.h b/libavutil/stereo3d.h
> index 00a5c3900e..9a004d88a1 100644
> --- a/libavutil/stereo3d.h
> +++ b/libavutil/stereo3d.h
> @@ -156,6 +156,11 @@ enum AVStereo3DView {
>       * Frame contains only the right view.
>       */
>      AV_STEREO3D_VIEW_RIGHT,
> +
> +    /**
> +     * Frame is monoscopic.
> +     */
> +    AV_STEREO3D_VIEW_MONO,
>  };
>
>  /**

Looking more into this, I don't know if this is a good idea, or even backwards compatible. AVStereo3DView is right now only ever looked at if type is not 2D, so adding a view that only applies to 2D seems pointless. And if we make it the default, users (wrongly) assuming that packed view is the default will find themselves with a 3D type signaling a monoscopic view.

For now I'll apply patch 2 adding unspec type plus the patches that don't deal with the view added here, unless there are objections.
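The compatibility concern can be made concrete with a small sketch. The types below are hypothetical stand-ins for AVStereo3DView and the struct that carries it, and the sketch assumes packed is the first enumerator. Appending the new value at the end leaves the existing numeric values alone, so a zero-initialized struct still reports packed; making monoscopic the default would mean giving it the value such structs start with, which is what code relying on the old default would trip over.

```c
#include <stddef.h>

/* Hypothetical stand-in for AVStereo3DView. */
enum ExView {
    EX_VIEW_PACKED, /* value 0: what a zero-initialized struct reports */
    EX_VIEW_LEFT,
    EX_VIEW_RIGHT,
    EX_VIEW_MONO,   /* appended last, so earlier values keep their numbers */
};

/* Hypothetical stand-in for the struct carrying the view. */
struct ExStereo3D {
    int         type; /* assumed: 0 means 2D */
    enum ExView view;
};

static const char *const ex_view_names[] = {
    [EX_VIEW_PACKED] = "packed",
    [EX_VIEW_LEFT]   = "left",
    [EX_VIEW_RIGHT]  = "right",
    [EX_VIEW_MONO]   = "monoscopic",
};

/* Bounds-checked name lookup; returns NULL for unknown values. */
static const char *ex_view_name(unsigned view)
{
    size_t n = sizeof(ex_view_names) / sizeof(ex_view_names[0]);
    return view < n ? ex_view_names[view] : NULL;
}

/* The view a freshly zeroed struct carries. */
static enum ExView ex_default_view(void)
{
    struct ExStereo3D s = { 0 };
    return s.view;
}
```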
[FFmpeg-devel] [PATCH v4 1/3] swscale/aarch64: Add bgr24 to yuv
From: Zhao Zhili Test on Apple M1 with kperf : -O3 : -O3 -fno-vectorize bgr24_to_uv_8_c : 28.5 : 52.5 bgr24_to_uv_8_neon : 54.5 : 59.7 bgr24_to_uv_128_c : 294.0 : 830.7 bgr24_to_uv_128_neon: 99.7 : 112.0 bgr24_to_uv_1080_c : 965.0 : 6624.0 bgr24_to_uv_1080_neon : 751.5 : 754.7 bgr24_to_uv_1920_c : 1693.2: 11554.5 bgr24_to_uv_1920_neon : 1292.5: 1307.5 bgr24_to_uv_half_8_c: 54.2 : 37.0 bgr24_to_uv_half_8_neon : 27.2 : 22.5 bgr24_to_uv_half_128_c : 127.2 : 392.5 bgr24_to_uv_half_128_neon : 63.0 : 52.0 bgr24_to_uv_half_1080_c : 880.2 : 3329.0 bgr24_to_uv_half_1080_neon : 401.5 : 390.7 bgr24_to_uv_half_1920_c : 1585.7: 6390.7 bgr24_to_uv_half_1920_neon : 694.7 : 698.7 bgr24_to_y_8_c : 21.7 : 22.5 bgr24_to_y_8_neon : 797.2 : 25.5 bgr24_to_y_128_c: 88.0 : 280.5 bgr24_to_y_128_neon : 63.7 : 55.0 bgr24_to_y_1080_c : 616.7 : 2208.7 bgr24_to_y_1080_neon: 900.0 : 452.0 bgr24_to_y_1920_c : 1093.2: 3894.7 bgr24_to_y_1920_neon: 777.2 : 767.5 --- libswscale/aarch64/input.S | 71 ++-- libswscale/aarch64/swscale.c | 32 +--- 2 files changed, 71 insertions(+), 32 deletions(-) diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S index 33afa34111..2cfec4cb6a 100644 --- a/libswscale/aarch64/input.S +++ b/libswscale/aarch64/input.S @@ -20,7 +20,7 @@ #include "libavutil/aarch64/asm.S" -.macro rgb24_to_yuv_load_rgb, src +.macro rgb_to_yuv_load_rgb src ld3 { v16.16b, v17.16b, v18.16b }, [\src] uxtlv19.8h, v16.8b // v19: r uxtlv20.8h, v17.8b // v20: g @@ -30,7 +30,7 @@ uxtl2 v24.8h, v18.16b// v24: b .endm -.macro rgb24_to_yuv_product, r, g, b, dst1, dst2, dst, coef0, coef1, coef2, right_shift +.macro rgb_to_yuv_product r, g, b, dst1, dst2, dst, coef0, coef1, coef2, right_shift mov \dst1\().16b, v6.16b// dst1 = const_offset mov \dst2\().16b, v6.16b// dst2 = const_offset smlal \dst1\().4s, \coef0\().4h, \r\().4h // dst1 += rx * r @@ -43,12 +43,20 @@ sqshrn2 \dst\().8h, \dst2\().4s, \right_shift // dst_higher_half = dst2 >> right_shift .endm +function ff_bgr24ToY_neon, export=1 
+cmp w4, #0 // check width > 0 +ldp w12, w11, [x5] // w12: ry, w11: gy +ldr w10, [x5, #8] // w10: by +b.gt4f +ret +endfunc + function ff_rgb24ToY_neon, export=1 cmp w4, #0 // check width > 0 ldp w10, w11, [x5] // w10: ry, w11: gy ldr w12, [x5, #8] // w12: by b.le3f - +4: mov w9, #256// w9 = 1 << (RGB2YUV_SHIFT - 7) movkw9, #8, lsl #16 // w9 += 32 << (RGB2YUV_SHIFT - 1) dup v6.4s, w9 // w9: const_offset @@ -59,9 +67,9 @@ function ff_rgb24ToY_neon, export=1 dup v2.8h, w12 b.lt2f 1: -rgb24_to_yuv_load_rgb x1 -rgb24_to_yuv_product v19, v20, v21, v25, v26, v16, v0, v1, v2, #9 -rgb24_to_yuv_product v22, v23, v24, v27, v28, v17, v0, v1, v2, #9 +rgb_to_yuv_load_rgb x1 +rgb_to_yuv_product v19, v20, v21, v25, v26, v16, v0, v1, v2, #9 +rgb_to_yuv_product v22, v23, v24, v27, v28, v17, v0, v1, v2, #9 sub w4, w4, #16 // width -= 16 add x1, x1, #48 // src += 48 cmp w4, #16 // width >= 16 ? @@ -85,10 +93,7 @@ function ff_rgb24ToY_neon, export=1 ret endfunc -.macro rgb24_load_uv_coeff half -ldp w10, w11, [x6, #12] // w10: ru, w11: gu -ldp w12, w13, [x6, #20] // w12: bu, w13: rv -ldp w14, w15, [x6, #28] // w14: gv, w15: bv +.macro rgb_set_uv_coeff half .if \half mov w9, #512 movkw9, #128, lsl #16 // w9: const_offset @@ -105,12 +110,26 @@ endfunc dup v6.4s, w9 .endm +function ff_bgr24ToUV_half_neon, export=1 +cmp w5, #0 // check width > 0 +b.le3f + +ldp w12, w11, [x6, #12] +ldp
[FFmpeg-devel] [PATCH v4 2/3] swscale/aarch64: Add bgra/rgba to yuv
From: Zhao Zhili Test on Apple M1 with kperf : -O3 : -O3 -fno-vectorize bgra_to_uv_8_c : 13.4 : 27.5 bgra_to_uv_8_neon : 37.4 : 41.7 bgra_to_uv_128_c: 155.9 : 550.2 bgra_to_uv_128_neon : 91.7 : 92.7 bgra_to_uv_1080_c : 1173.2: 4558.2 bgra_to_uv_1080_neon: 822.7 : 809.5 bgra_to_uv_1920_c : 2078.2: 8115.2 bgra_to_uv_1920_neon: 1437.7: 1438.7 bgra_to_uv_half_8_c : 17.9 : 14.2 bgra_to_uv_half_8_neon : 37.4 : 10.5 bgra_to_uv_half_128_c : 103.9 : 326.0 bgra_to_uv_half_128_neon: 73.9 : 68.7 bgra_to_uv_half_1080_c : 850.2 : 3732.0 bgra_to_uv_half_1080_neon : 484.2 : 490.0 bgra_to_uv_half_1920_c : 1479.2: 4942.7 bgra_to_uv_half_1920_neon : 824.2 : 824.7 bgra_to_y_8_c : 8.2 : 29.5 bgra_to_y_8_neon: 18.2 : 32.7 bgra_to_y_128_c : 101.4 : 361.5 bgra_to_y_128_neon : 74.9 : 73.7 bgra_to_y_1080_c: 739.4 : 3018.0 bgra_to_y_1080_neon : 613.4 : 544.2 bgra_to_y_1920_c: 1298.7: 5326.0 bgra_to_y_1920_neon : 918.7 : 934.2 --- libswscale/aarch64/input.S | 68 +++- libswscale/aarch64/swscale.c | 16 + 2 files changed, 68 insertions(+), 16 deletions(-) diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S index 2cfec4cb6a..ce5b042371 100644 --- a/libswscale/aarch64/input.S +++ b/libswscale/aarch64/input.S @@ -20,8 +20,12 @@ #include "libavutil/aarch64/asm.S" -.macro rgb_to_yuv_load_rgb src +.macro rgb_to_yuv_load_rgb src, element=3 +.if \element == 3 ld3 { v16.16b, v17.16b, v18.16b }, [\src] +.else +ld4 { v16.16b, v17.16b, v18.16b, v19.16b }, [\src] +.endif uxtlv19.8h, v16.8b // v19: r uxtlv20.8h, v17.8b // v20: g uxtlv21.8h, v18.8b // v21: b @@ -43,7 +47,8 @@ sqshrn2 \dst\().8h, \dst2\().4s, \right_shift // dst_higher_half = dst2 >> right_shift .endm -function ff_bgr24ToY_neon, export=1 +.macro rgbToY_neon fmt_bgr, fmt_rgb, element +function ff_\fmt_bgr\()ToY_neon, export=1 cmp w4, #0 // check width > 0 ldp w12, w11, [x5] // w12: ry, w11: gy ldr w10, [x5, #8] // w10: by @@ -51,7 +56,7 @@ function ff_bgr24ToY_neon, export=1 ret endfunc -function ff_rgb24ToY_neon, export=1 
+function ff_\fmt_rgb\()ToY_neon, export=1 cmp w4, #0 // check width > 0 ldp w10, w11, [x5] // w10: ry, w11: gy ldr w12, [x5, #8] // w12: by @@ -67,11 +72,11 @@ function ff_rgb24ToY_neon, export=1 dup v2.8h, w12 b.lt2f 1: -rgb_to_yuv_load_rgb x1 +rgb_to_yuv_load_rgb x1, \element rgb_to_yuv_product v19, v20, v21, v25, v26, v16, v0, v1, v2, #9 rgb_to_yuv_product v22, v23, v24, v27, v28, v17, v0, v1, v2, #9 sub w4, w4, #16 // width -= 16 -add x1, x1, #48 // src += 48 +add x1, x1, #(16*\element) cmp w4, #16 // width >= 16 ? stp q16, q17, [x0], #32 // store to dst b.ge1b @@ -86,12 +91,17 @@ function ff_rgb24ToY_neon, export=1 smaddl x13, w15, w12, x13 // x13 += by * b asr w13, w13, #9// x13 >>= 9 sub w4, w4, #1 // width-- -add x1, x1, #3 // src += 3 +add x1, x1, #\element strhw13, [x0], #2 // store to dst cbnzw4, 2b 3: ret endfunc +.endm + +rgbToY_neon bgr24, rgb24, element=3 + +rgbToY_neon bgra32, rgba32, element=4 .macro rgb_set_uv_coeff half .if \half @@ -110,7 +120,8 @@ endfunc dup v6.4s, w9 .endm -function ff_bgr24ToUV_half_neon, export=1 +.macro rgbToUV_half_neon fmt_bgr, fmt_rgb, element +function ff_\fmt_bgr\()ToUV_half_neon, export=1 cmp w5, #0 // check width > 0 b.le3f @@ -120,7 +131,7 @@ function ff_bgr24ToUV_half_neon, export=1 b 4f endfunc -function ff_rgb24ToUV_half_neon, export=1 +function ff_\fmt_rgb\()ToUV_half_neon, export=1 cmp w5, #0 // check width > 0 b.le3f @@ -132,7 +1
[FFmpeg-devel] [PATCH v4 3/3] swscale/aarch64: Add argb/abgr to yuv
From: Zhao Zhili Test on Apple M1 with kperf: : -O3 : -O3 -fno-vectorize abgr_to_uv_8_c : 19.4 : 26.1 abgr_to_uv_8_neon : 29.9 : 51.1 abgr_to_uv_128_c: 146.4 : 558.9 abgr_to_uv_128_neon : 85.1 : 83.4 abgr_to_uv_1080_c : 1162.6: 4786.4 abgr_to_uv_1080_neon: 819.6 : 826.6 abgr_to_uv_1920_c : 2063.6: 8492.1 abgr_to_uv_1920_neon: 1435.1: 1447.1 abgr_to_uv_half_8_c : 16.4 : 11.4 abgr_to_uv_half_8_neon : 35.6 : 20.4 abgr_to_uv_half_128_c : 108.6 : 359.4 abgr_to_uv_half_128_neon: 75.4 : 42.6 abgr_to_uv_half_1080_c : 883.4 : 2885.6 abgr_to_uv_half_1080_neon : 460.6 : 481.1 abgr_to_uv_half_1920_c : 1553.6: 5106.9 abgr_to_uv_half_1920_neon : 817.6 : 820.4 abgr_to_y_8_c : 6.1 : 26.4 abgr_to_y_8_neon: 40.6 : 6.4 abgr_to_y_128_c : 99.9 : 390.1 abgr_to_y_128_neon : 67.4 : 55.9 abgr_to_y_1080_c: 735.9 : 3170.4 abgr_to_y_1080_neon : 534.6 : 536.6 abgr_to_y_1920_c: 1279.4: 6016.4 abgr_to_y_1920_neon : 932.6 : 927.6 --- libswscale/aarch64/input.S | 86 +++- libswscale/aarch64/swscale.c | 17 +++ 2 files changed, 82 insertions(+), 21 deletions(-) diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S index ce5b042371..5cb18711fb 100644 --- a/libswscale/aarch64/input.S +++ b/libswscale/aarch64/input.S @@ -34,6 +34,16 @@ uxtl2 v24.8h, v18.16b// v24: b .endm +.macro argb_to_yuv_load_rgb src +ld4 { v16.16b, v17.16b, v18.16b, v19.16b }, [\src] +uxtlv21.8h, v19.8b // v21: b +uxtl2 v24.8h, v19.16b// v24: b +uxtlv19.8h, v17.8b // v19: r +uxtlv20.8h, v18.8b // v20: g +uxtl2 v22.8h, v17.16b// v22: r +uxtl2 v23.8h, v18.16b// v23: g +.endm + .macro rgb_to_yuv_product r, g, b, dst1, dst2, dst, coef0, coef1, coef2, right_shift mov \dst1\().16b, v6.16b// dst1 = const_offset mov \dst2\().16b, v6.16b// dst2 = const_offset @@ -47,7 +57,7 @@ sqshrn2 \dst\().8h, \dst2\().4s, \right_shift // dst_higher_half = dst2 >> right_shift .endm -.macro rgbToY_neon fmt_bgr, fmt_rgb, element +.macro rgbToY_neon fmt_bgr, fmt_rgb, element, alpha_first=0 function ff_\fmt_bgr\()ToY_neon, export=1 cmp w4, #0 
// check width > 0 ldp w12, w11, [x5] // w12: ry, w11: gy @@ -72,7 +82,11 @@ function ff_\fmt_rgb\()ToY_neon, export=1 dup v2.8h, w12 b.lt2f 1: +.if \alpha_first +argb_to_yuv_load_rgb x1 +.else rgb_to_yuv_load_rgb x1, \element +.endif rgb_to_yuv_product v19, v20, v21, v25, v26, v16, v0, v1, v2, #9 rgb_to_yuv_product v22, v23, v24, v27, v28, v17, v0, v1, v2, #9 sub w4, w4, #16 // width -= 16 @@ -82,9 +96,15 @@ function ff_\fmt_rgb\()ToY_neon, export=1 b.ge1b cbz x4, 3f 2: +.if \alpha_first +ldrbw13, [x1, #1] // w13: r +ldrbw14, [x1, #2] // w14: g +ldrbw15, [x1, #3] // w15: b +.else ldrbw13, [x1] // w13: r ldrbw14, [x1, #1] // w14: g ldrbw15, [x1, #2] // w15: b +.endif smaddl x13, w13, w10, x9 // x13 = ry * r + const_offset smaddl x13, w14, w11, x13 // x13 += gy * g @@ -103,6 +123,8 @@ rgbToY_neon bgr24, rgb24, element=3 rgbToY_neon bgra32, rgba32, element=4 +rgbToY_neon abgr32, argb32, element=4, alpha_first=1 + .macro rgb_set_uv_coeff half .if \half mov w9, #512 @@ -120,7 +142,21 @@ rgbToY_neon bgra32, rgba32, element=4 dup v6.4s, w9 .endm -.macro rgbToUV_half_neon fmt_bgr, fmt_rgb, element +.macro rgb_load_add_half off_r1, off_r2, off_g1, off_g2, off_b1, off_b2 +ldrbw2, [x3, #\off_r1] // w2: r1 +ldrbw4, [x3, #\off_r2] // w4: r2 +add w2, w2, w4 // w2 = r1 + r2 + +ldrbw4, [x3, #\off_g1] // w4: g1 +ldrbw7, [x3, #\off_g2] // w7: g2 +add w4, w4, w7
Re: [FFmpeg-devel] [PATCH v3 2/3] swscale/aarch64: Add bgra/rgba to yuv
> On Jun 24, 2024, at 19:55, Martin Storsjö wrote: > > On Mon, 24 Jun 2024, Zhao Zhili wrote: > >> From: Zhao Zhili >> >> Test on Apple M1 with kperf >> : -O3 : -O3 -fno-vectorize >> bgra_to_uv_8_c : 13.4 : 27.5 >> bgra_to_uv_8_neon: 37.4 : 41.7 >> bgra_to_uv_128_c : 155.9 : 550.2 >> bgra_to_uv_128_neon : 91.7 : 92.7 >> bgra_to_uv_1080_c: 1173.2: 4558.2 >> bgra_to_uv_1080_neon : 822.7 : 809.5 >> bgra_to_uv_1920_c: 2078.2: 8115.2 >> bgra_to_uv_1920_neon : 1437.7: 1438.7 >> bgra_to_uv_half_8_c : 17.9 : 14.2 >> bgra_to_uv_half_8_neon : 37.4 : 10.5 >> bgra_to_uv_half_128_c: 103.9 : 326.0 >> bgra_to_uv_half_128_neon : 73.9 : 68.7 >> bgra_to_uv_half_1080_c : 850.2 : 3732.0 >> bgra_to_uv_half_1080_neon: 484.2 : 490.0 >> bgra_to_uv_half_1920_c : 1479.2: 4942.7 >> bgra_to_uv_half_1920_neon: 824.2 : 824.7 >> bgra_to_y_8_c: 8.2 : 29.5 >> bgra_to_y_8_neon : 18.2 : 32.7 >> bgra_to_y_128_c : 101.4 : 361.5 >> bgra_to_y_128_neon : 74.9 : 73.7 >> bgra_to_y_1080_c : 739.4 : 3018.0 >> bgra_to_y_1080_neon : 613.4 : 544.2 >> bgra_to_y_1920_c : 1298.7: 5326.0 >> bgra_to_y_1920_neon : 918.7 : 934.2 >> --- >> libswscale/aarch64/input.S | 91 ++-- >> libswscale/aarch64/swscale.c | 16 +++ >> 2 files changed, 94 insertions(+), 13 deletions(-) >> >> diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S >> index 2cfec4cb6a..6d2c6034bb 100644 >> --- a/libswscale/aarch64/input.S >> +++ b/libswscale/aarch64/input.S >> @@ -20,8 +20,12 @@ >> >> #include "libavutil/aarch64/asm.S" >> >> -.macro rgb_to_yuv_load_rgb src >> +.macro rgb_to_yuv_load_rgb src, element=3 >> +.if \element == 3 >>ld3 { v16.16b, v17.16b, v18.16b }, [\src] >> +.else >> +ld4 { v16.16b, v17.16b, v18.16b, v19.16b }, [\src] >> +.endif >>uxtlv19.8h, v16.8b // v19: r >>uxtlv20.8h, v17.8b // v20: g >>uxtlv21.8h, v18.8b // v21: b >> @@ -51,7 +55,8 @@ function ff_bgr24ToY_neon, export=1 >>ret >> endfunc >> >> -function ff_rgb24ToY_neon, export=1 >> +.macro rgbToY_neon fmt, element >> +function ff_\fmt\()ToY_neon, export=1 
>>cmp w4, #0 // check width > 0 >>ldp w10, w11, [x5] // w10: ry, w11: gy >>ldr w12, [x5, #8] // w12: by >> @@ -67,11 +72,11 @@ function ff_rgb24ToY_neon, export=1 >>dup v2.8h, w12 >>b.lt2f >> 1: >> -rgb_to_yuv_load_rgb x1 >> +rgb_to_yuv_load_rgb x1, \element >>rgb_to_yuv_product v19, v20, v21, v25, v26, v16, v0, v1, v2, #9 >>rgb_to_yuv_product v22, v23, v24, v27, v28, v17, v0, v1, v2, #9 >>sub w4, w4, #16 // width -= 16 >> -add x1, x1, #48 // src += 48 >> +add x1, x1, #(16*\element) >>cmp w4, #16 // width >= 16 ? >>stp q16, q17, [x0], #32 // store to dst >>b.ge1b >> @@ -86,12 +91,25 @@ function ff_rgb24ToY_neon, export=1 >>smaddl x13, w15, w12, x13 // x13 += by * b >>asr w13, w13, #9// x13 >>= 9 >>sub w4, w4, #1 // width-- >> -add x1, x1, #3 // src += 3 >> +add x1, x1, #\element >>strhw13, [x0], #2 // store to dst >>cbnzw4, 2b >> 3: >>ret >> endfunc >> +.endm >> + >> +rgbToY_neon fmt=rgb24, element=3 >> + >> +function ff_bgra32ToY_neon, export=1 >> +cmp w4, #0 // check width > 0 >> +ldp w12, w11, [x5] // w12: ry, w11: gy >> +ldr w10, [x5, #8] // w10: by >> +b.gt4f >> +ret >> +endfunc >> + >> +rgbToY_neon fmt=rgba32, element=4 > > It is extremely obscure to jump to a local label (4f) that is defined by the > following macro. I think this would be much more readable if you'd include > the bgr(a) version in the macro, so the reference to 4f is near to the actual > label it refers to. Good idea, it saved a lot of typing. Fixed in v4. > >> .macro rgb_set_uv_coeff half >>.if \half >> @@ -120,7 +138,8 @@ function ff_bgr24ToUV_half_neon, export=1 >>b 4f >> endfunc >>
Re: [FFmpeg-devel] [PATCH] avformat/tls_schannel: forward AVIO_FLAG_NONBLOCK to tcp stream
On 24/06/2024 00:07, Timo Rothenpieler wrote:
> On 03.06.2024 22:28, Timo Rothenpieler wrote:
>> From: BtbN
>>
>> Fixes for example rtmps streaming over schannel.
>> ---
>>  libavformat/tls_schannel.c | 15 ++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/libavformat/tls_schannel.c b/libavformat/tls_schannel.c
>> index 214a47a218..7265a9794d 100644
>> --- a/libavformat/tls_schannel.c
>> +++ b/libavformat/tls_schannel.c
>> @@ -113,6 +113,7 @@ static int tls_shutdown_client(URLContext *h)
>>                                               c->request_flags, 0, 0, NULL, 0, &c->ctxt_handle,
>>                                               &outbuf_desc, &c->context_flags, &c->ctxt_timestamp);
>>          if (sspi_ret == SEC_E_OK || sspi_ret == SEC_I_CONTEXT_EXPIRED) {
>> +            s->tcp->flags &= ~AVIO_FLAG_NONBLOCK;
>>              ret = ffurl_write(s->tcp, outbuf.pvBuffer, outbuf.cbBuffer);
>>              FreeContextBuffer(outbuf.pvBuffer);
>>              if (ret < 0 || ret != outbuf.cbBuffer)
>> @@ -316,6 +317,7 @@ static int tls_client_handshake(URLContext *h)
>>          goto fail;
>>      }
>>
>> +    s->tcp->flags &= ~AVIO_FLAG_NONBLOCK;
>>      ret = ffurl_write(s->tcp, outbuf.pvBuffer, outbuf.cbBuffer);
>>      FreeContextBuffer(outbuf.pvBuffer);
>>      if (ret < 0 || ret != outbuf.cbBuffer) {
>> @@ -416,11 +418,16 @@ static int tls_read(URLContext *h, uint8_t *buf, int len)
>>              }
>>          }
>>
>> +        s->tcp->flags &= ~AVIO_FLAG_NONBLOCK;
>> +        s->tcp->flags |= h->flags & AVIO_FLAG_NONBLOCK;
>> +
>>          ret = ffurl_read(s->tcp, c->enc_buf + c->enc_buf_offset,
>>                           c->enc_buf_size - c->enc_buf_offset);
>>          if (ret == AVERROR_EOF) {
>>              c->connection_closed = 1;
>>              ret = 0;
>> +        } else if (ret == AVERROR(EAGAIN)) {
>> +            ret = 0;
>>          } else if (ret < 0) {
>>              av_log(h, AV_LOG_ERROR, "Unable to read from socket\n");
>>              return ret;
>> @@ -564,8 +571,14 @@ static int tls_write(URLContext *h, const uint8_t *buf, int len)
>>      sspi_ret = EncryptMessage(&c->ctxt_handle, 0, &outbuf_desc, 0);
>>      if (sspi_ret == SEC_E_OK) {
>>          len = outbuf[0].cbBuffer + outbuf[1].cbBuffer + outbuf[2].cbBuffer;
>> +
>> +        s->tcp->flags &= ~AVIO_FLAG_NONBLOCK;
>> +        s->tcp->flags |= h->flags & AVIO_FLAG_NONBLOCK;
>> +
>>          ret = ffurl_write(s->tcp, data, len);
>> -        if (ret < 0 || ret != len) {
>> +        if (ret == AVERROR(EAGAIN)) {
>> +            goto done;
>> +        } else if (ret < 0 || ret != len) {
>>              ret = AVERROR(EIO);
>>              av_log(h, AV_LOG_ERROR, "Writing encrypted data to socket failed\n");
>>              goto done;
>
> will apply soon

applied
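The flag handling in this patch follows two small patterns: the handshake and shutdown paths force the underlying TCP stream into blocking mode, while the read and write paths copy the caller's non-blocking choice down to it. A minimal sketch of both, using a made-up flag constant and helper names (these are illustrative stand-ins, not functions from libavformat):

```c
/* Hypothetical stand-in for AVIO_FLAG_NONBLOCK; the value is illustrative. */
#define SKETCH_FLAG_NONBLOCK 8

/* Handshake/shutdown path: always talk to the socket in blocking mode. */
static int sketch_force_blocking(int inner_flags)
{
    return inner_flags & ~SKETCH_FLAG_NONBLOCK;
}

/* Read/write path: clear the bit first, then copy the outer context's
 * setting, so the inner stream ends up matching the caller exactly. */
static int sketch_forward_nonblock(int inner_flags, int outer_flags)
{
    inner_flags &= ~SKETCH_FLAG_NONBLOCK;
    inner_flags |= outer_flags & SKETCH_FLAG_NONBLOCK;
    return inner_flags;
}
```

Clearing before OR-ing matters: without the clear, a stream that was once set to non-blocking would stay that way even after the caller switched back to blocking I/O.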
Re: [FFmpeg-devel] [PATCH v4 3/3] swscale/aarch64: Add argb/abgr to yuv
On Mon, 24 Jun 2024, Zhao Zhili wrote:
> From: Zhao Zhili
>
> Test on Apple M1 with kperf:
>                           : -O3    : -O3 -fno-vectorize
> abgr_to_uv_8_c            : 19.4   : 26.1
> abgr_to_uv_8_neon         : 29.9   : 51.1
> abgr_to_uv_128_c          : 146.4  : 558.9
> abgr_to_uv_128_neon       : 85.1   : 83.4
> abgr_to_uv_1080_c         : 1162.6 : 4786.4
> abgr_to_uv_1080_neon      : 819.6  : 826.6
> abgr_to_uv_1920_c         : 2063.6 : 8492.1
> abgr_to_uv_1920_neon      : 1435.1 : 1447.1
> abgr_to_uv_half_8_c       : 16.4   : 11.4
> abgr_to_uv_half_8_neon    : 35.6   : 20.4
> abgr_to_uv_half_128_c     : 108.6  : 359.4
> abgr_to_uv_half_128_neon  : 75.4   : 42.6
> abgr_to_uv_half_1080_c    : 883.4  : 2885.6
> abgr_to_uv_half_1080_neon : 460.6  : 481.1
> abgr_to_uv_half_1920_c    : 1553.6 : 5106.9
> abgr_to_uv_half_1920_neon : 817.6  : 820.4
> abgr_to_y_8_c             : 6.1    : 26.4
> abgr_to_y_8_neon          : 40.6   : 6.4
> abgr_to_y_128_c           : 99.9   : 390.1
> abgr_to_y_128_neon        : 67.4   : 55.9
> abgr_to_y_1080_c          : 735.9  : 3170.4
> abgr_to_y_1080_neon       : 534.6  : 536.6
> abgr_to_y_1920_c          : 1279.4 : 6016.4
> abgr_to_y_1920_neon       : 932.6  : 927.6
> ---
>  libswscale/aarch64/input.S   | 86 +++-
>  libswscale/aarch64/swscale.c | 17 +++
>  2 files changed, 82 insertions(+), 21 deletions(-)

This patchset looks ok to me (but wait a little bit in case someone else has further opinions on it).

// Martin
[FFmpeg-devel] [PATCH] lavc/vvc: Validate IBC block vector
From H.266 (V3) (09/2023) p. 321:

    It is a requirement of bitstream conformance that the luma block
    vector bvL shall obey the following constraints:
    - CtbSizeY is greater than or equal to
      ((yCb + (bvL[ 1 ] >> 4)) & (CtbSizeY − 1)) + cbHeight

This patch checks this is true, which fixes crashes on fuzzed bitstreams.

Signed-off-by: Frank Plowman
---
 libavcodec/vvc/intra.c  | 25 ++---
 libavcodec/vvc/thread.c |  4 +---
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/libavcodec/vvc/intra.c b/libavcodec/vvc/intra.c
index f77a012f09..11371db797 100644
--- a/libavcodec/vvc/intra.c
+++ b/libavcodec/vvc/intra.c
@@ -624,15 +624,26 @@ static void intra_block_copy(const VVCLocalContext *lc, const int c_idx)
     }
 }
 
-static void vvc_predict_ibc(const VVCLocalContext *lc)
+static int vvc_predict_ibc(const VVCLocalContext *lc)
 {
-    const H266RawSPS *rsps = lc->fc->ps.sps->r;
+    const VVCFrameContext *fc = lc->fc;
+    const VVCSPS *sps         = lc->fc->ps.sps;
+    const H266RawSPS *rsps    = sps->r;
+    const CodingUnit *cu      = lc->cu;
+    const Mv *bv              = &cu->pu.mi.mv[L0][0];
+
+    if (sps->ctb_size_y < ((cu->y0 + (bv->y >> 4)) & (sps->ctb_size_y - 1)) + cu->cb_height) {
+        av_log(fc->log_ctx, AV_LOG_ERROR, "IBC region spans multiple CTBs.\n");
+        return AVERROR_INVALIDDATA;
+    }
 
     intra_block_copy(lc, LUMA);
     if (lc->cu->tree_type == SINGLE_TREE && rsps->sps_chroma_format_idc) {
         intra_block_copy(lc, CB);
         intra_block_copy(lc, CR);
     }
+
+    return 0;
 }
 
 static void ibc_fill_vir_buf(const VVCLocalContext *lc, const CodingUnit *cu)
@@ -678,7 +689,10 @@ int ff_vvc_reconstruct(VVCLocalContext *lc, const int rs, const int rx, const in
         if (cu->ciip_flag)
             ff_vvc_predict_ciip(lc);
         else if (cu->pred_mode == MODE_IBC)
-            vvc_predict_ibc(lc);
+            ret = vvc_predict_ibc(lc);
+        if (ret)
+            goto fail;
+
         if (cu->coded_flag) {
             ret = reconstruct(lc);
         } else {
@@ -687,10 +701,15 @@ int ff_vvc_reconstruct(VVCLocalContext *lc, const int rs, const int rx, const in
             if (sps->r->sps_chroma_format_idc && cu->tree_type != DUAL_TREE_LUMA)
                 add_reconstructed_area(lc, CHROMA, cu->x0, cu->y0, cu->cb_width, cu->cb_height);
         }
+        if (ret)
+            goto fail;
+
         if (sps->r->sps_ibc_enabled_flag)
             ibc_fill_vir_buf(lc, cu);
         cu = cu->next;
     }
+
+fail:
     ff_vvc_ctu_free_cus(ctu);
     return ret;
 }
diff --git a/libavcodec/vvc/thread.c b/libavcodec/vvc/thread.c
index 8777d380bf..5b01dd2d20 100644
--- a/libavcodec/vvc/thread.c
+++ b/libavcodec/vvc/thread.c
@@ -454,9 +454,7 @@ static int run_inter(VVCContext *s, VVCLocalContext *lc, VVCTask *t)
 
 static int run_recon(VVCContext *s, VVCLocalContext *lc, VVCTask *t)
 {
-    ff_vvc_reconstruct(lc, t->rs, t->rx, t->ry);
-
-    return 0;
+    return ff_vvc_reconstruct(lc, t->rs, t->rx, t->ry);
 }
 
 static int run_lmcs(VVCContext *s, VVCLocalContext *lc, VVCTask *t)
-- 
2.45.1
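As a worked illustration of the conformance constraint quoted in the commit message, the check can be isolated into a standalone predicate. This is a sketch only: the function name and plain-integer parameters are invented here, and it assumes, as the patch does, that the block vector is stored in 1/16-sample units, that the CTB size is a power of two, and that `>>` on a negative value is an arithmetic shift.

```c
#include <stdbool.h>

/*
 * Returns true when the vertical extent of the IBC reference block stays
 * inside one CTB row, i.e. when
 *   CtbSizeY >= ((yCb + (bvL[1] >> 4)) & (CtbSizeY - 1)) + cbHeight
 */
static bool sketch_ibc_bv_valid(int ctb_size_y, int y_cb, int bv_y,
                                int cb_height)
{
    /* Vertical position of the reference block, relative to its CTB. */
    const int ref_y_in_ctb = (y_cb + (bv_y >> 4)) & (ctb_size_y - 1);

    return ctb_size_y >= ref_y_in_ctb + cb_height;
}
```

For example, with a 128-sample CTB, a 16-sample-high block at `y_cb = 64` and a vector of 16 samples upward (`bv_y = -256`) references rows 48 to 63 of the CTB and passes, whereas a vector of 56 samples downward (`bv_y = 896`) would start at row 120 and run past the CTB boundary, so it is rejected.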
[FFmpeg-devel] [PATCH v4 1/3] avcodec/jpeg2000dec: Add support for CAP and CPF markers
This commit adds support for CAP and CPF markers. The previous version (v3, https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=12246) was wrongly separated. I have fixed the way to separation. The changes are essentially the same as v2 (https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=12199). The suggested modifications have been made according to the discussion on this mailing list. Signed-off-by: Osamu Watanabe --- libavcodec/jpeg2000.h| 8 +++ libavcodec/jpeg2000dec.c | 112 ++- libavcodec/jpeg2000dec.h | 7 +++ 3 files changed, 126 insertions(+), 1 deletion(-) diff --git a/libavcodec/jpeg2000.h b/libavcodec/jpeg2000.h index d004c08f10..4bdc38df7c 100644 --- a/libavcodec/jpeg2000.h +++ b/libavcodec/jpeg2000.h @@ -37,12 +37,14 @@ enum Jpeg2000Markers { JPEG2000_SOC = 0xff4f, // start of codestream +JPEG2000_CAP = 0xff50, // extended capabilities JPEG2000_SIZ = 0xff51, // image and tile size JPEG2000_COD, // coding style default JPEG2000_COC, // coding style component JPEG2000_TLM = 0xff55, // tile-part length, main header JPEG2000_PLM = 0xff57, // packet length, main header JPEG2000_PLT, // packet length, tile-part header +JPEG2000_CPF, // corresponding profile JPEG2000_QCD = 0xff5c, // quantization default JPEG2000_QCC, // quantization component JPEG2000_RGN, // region of interest @@ -58,6 +60,12 @@ enum Jpeg2000Markers { JPEG2000_EOC = 0xffd9, // end of codestream }; +enum JPEG2000_Ccap15_b14_15_params { +HTJ2K_HTONLY = 0, // HTONLY, bit 14 and 15 are 0 +HTJ2K_HTDECLARED, // HTDECLARED, bit 14 = 1 and bit 15 = 0 +HTJ2K_MIXED = 3, // MIXED, bit 14 and 15 are 1 +}; + #define JPEG2000_SOP_FIXED_BYTES 0xFF910004 #define JPEG2000_SOP_BYTE_LENGTH 6 diff --git a/libavcodec/jpeg2000dec.c b/libavcodec/jpeg2000dec.c index 091931b1ff..d1046661c4 100644 --- a/libavcodec/jpeg2000dec.c +++ b/libavcodec/jpeg2000dec.c @@ -408,6 +408,73 @@ static int get_siz(Jpeg2000DecoderContext *s) s->avctx->bits_per_raw_sample = s->precision; return 0; } +/* get extended 
capabilities (CAP) marker segment */ +static int get_cap(Jpeg2000DecoderContext *s, Jpeg2000CodingStyle *c) +{ +uint32_t Pcap; +uint16_t Ccap_i[32] = { 0 }; +uint16_t Ccap_15; +uint8_t P; + +if (bytestream2_get_bytes_left(&s->g) < 6) { +av_log(s->avctx, AV_LOG_ERROR, "Insufficient space for CAP\n"); +return AVERROR_INVALIDDATA; +} + +Pcap = bytestream2_get_be32u(&s->g); +s->isHT = (Pcap >> (31 - (15 - 1))) & 1; +for (int i = 0; i < 32; i++) { +if ((Pcap >> (31 - i)) & 1) +Ccap_i[i] = bytestream2_get_be16u(&s->g); +} +Ccap_15 = Ccap_i[14]; +if (s->isHT == 1) { +av_log(s->avctx, AV_LOG_INFO, "This is an HTJ2K codestream.\n"); +// Bits 14-15 +switch ((Ccap_15 >> 14) & 0x3) { +case 0x3: +s->Ccap15_b14_15 = HTJ2K_MIXED; +break; +case 0x1: +s->Ccap15_b14_15 = HTJ2K_HTDECLARED; +break; +case 0x0: +s->Ccap15_b14_15 = HTJ2K_HTONLY; +break; +default: +av_log(s->avctx, AV_LOG_ERROR, "Unknown CCap value.\n"); +return AVERROR(EINVAL); +break; +} +// Bit 13 +if ((Ccap_15 >> 13) & 1) { +av_log(s->avctx, AV_LOG_ERROR, "MULTIHT set is not supported.\n"); +return AVERROR_PATCHWELCOME; +} +// Bit 12 +s->Ccap15_b12 = (Ccap_15 >> 12) & 1; +// Bit 11 +s->Ccap15_b11 = (Ccap_15 >> 11) & 1; +// Bit 5 +s->Ccap15_b05 = (Ccap_15 >> 5) & 1; +// Bit 0-4 +P = Ccap_15 & 0x1F; +if (!P) +s->HT_MAGB = 8; +else if (P < 20) +s->HT_MAGB = P + 8; +else if (P < 31) +s->HT_MAGB = 4 * (P - 19) + 27; +else +s->HT_MAGB = 74; + +if (s->HT_MAGB > 31) { +av_log(s->avctx, AV_LOG_ERROR, "Available internal precision is exceeded (MAGB> 31).\n"); +return AVERROR_PATCHWELCOME; +} +} +return 0; +} /* get common part for COD and COC segments */ static int get_cox(Jpeg2000DecoderContext *s, Jpeg2000CodingStyle *c) @@ -802,6 +869,15 @@ static int read_crg(Jpeg2000DecoderContext *s, int n) bytestream2_skip(&s->g, n - 2); return 0; } + +static int read_cpf(Jpeg2000DecoderContext *s, int n) +{ +if (bytestream2_get_bytes_left(&s->g) < (n - 2)) +return AVERROR_INVALIDDATA; +bytestream2_skip(&s->g, n - 2); +return 0; +} + /* 
Tile-part lengths: see ISO 15444-1:2002, section A.7.1 * Used to know the number of tile parts and lengths. * There may be multiple TLMs in the header. @@ -965,6 +1041,14 @@ static int init_tile(Jp
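The bit-field handling in `get_cap()` is easier to follow when pulled out into small helpers. The sketch below mirrors the arithmetic as written in this patch; the helper names are invented for illustration, and note that patch 3/3 of this series states that it revises the treatment of this field, so this should not be read as the final behaviour.

```c
#include <stdint.h>

/* Pcap is read MSB-first: capability part 15 corresponds to the bit
 * shifted by 31 - (15 - 1) = 17, as in the patch. */
static int sketch_is_ht(uint32_t pcap)
{
    return (pcap >> (31 - (15 - 1))) & 1;
}

/* Map the low 5 bits of Ccap15 (called P in the patch) to HT_MAGB. */
static int sketch_ht_magb(uint16_t ccap15)
{
    const int p = ccap15 & 0x1F;

    if (p == 0)
        return 8;
    if (p < 20)
        return p + 8;
    if (p < 31)
        return 4 * (p - 19) + 27;
    return 74;
}
```

Under this mapping, any P of 20 or more already yields a value of at least 31, which matters because the patch rejects streams whose `HT_MAGB` exceeds 31.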
[FFmpeg-devel] [PATCH v4 2/3] avcodec/jpeg2000dec: Add support for placeholder passes
This commit adds support for placeholder pass parsing Signed-off-by: Osamu Watanabe --- libavcodec/jpeg2000.h | 2 + libavcodec/jpeg2000dec.c | 352 + libavcodec/jpeg2000htdec.c | 90 +- 3 files changed, 326 insertions(+), 118 deletions(-) diff --git a/libavcodec/jpeg2000.h b/libavcodec/jpeg2000.h index 4bdc38df7c..93221d90ca 100644 --- a/libavcodec/jpeg2000.h +++ b/libavcodec/jpeg2000.h @@ -200,6 +200,8 @@ typedef struct Jpeg2000Cblk { /* specific to HT code-blocks */ int zbp; int pass_lengths[2]; +uint8_t modes; // copy of SPcod/SPcoc field to parse HT-MIXED mode +uint8_t ht_plhd; // are we looking for HT placeholder passes? } Jpeg2000Cblk; // code block typedef struct Jpeg2000Prec { diff --git a/libavcodec/jpeg2000dec.c b/libavcodec/jpeg2000dec.c index d1046661c4..2c66c21b88 100644 --- a/libavcodec/jpeg2000dec.c +++ b/libavcodec/jpeg2000dec.c @@ -54,6 +54,15 @@ #define HAD_COC 0x01 #define HAD_QCC 0x02 +// Values of flag for placeholder passes +enum HT_PLHD_STATUS { +HT_PLHD_OFF, +HT_PLHD_ON +}; + +#define HT_MIXED 0x80 // bit 7 of SPcod/SPcoc + + /* get_bits functions for JPEG2000 packet bitstream * It is a get_bit function with a bit-stuffing routine. If the value of the * byte is 0xFF, the next byte includes an extra zero bit stuffed into the MSB. 
@@ -1160,100 +1169,293 @@ static int jpeg2000_decode_packet(Jpeg2000DecoderContext *s, Jpeg2000Tile *tile, int incl, newpasses, llen; void *tmp; -if (cblk->npasses) -incl = get_bits(s, 1); -else +if (!cblk->incl) { +incl = 0; +cblk->modes = codsty->cblk_style; +if (cblk->modes >= JPEG2000_CTSY_HTJ2K_F) +cblk->ht_plhd = HT_PLHD_ON; +if (layno > 0) +incl = tag_tree_decode(s, prec->cblkincl + cblkno, 0 + 1) == 0; incl = tag_tree_decode(s, prec->cblkincl + cblkno, layno + 1) == layno; -if (!incl) -continue; -else if (incl < 0) -return incl; - -if (!cblk->npasses) { -int zbp = tag_tree_decode(s, prec->zerobits + cblkno, 100); -int v = expn[bandno] + numgbits - 1 - zbp; -if (v < 0 || v > 30) { -av_log(s->avctx, AV_LOG_ERROR, - "nonzerobits %d invalid or unsupported\n", v); -return AVERROR_INVALIDDATA; +if (incl) { +int zbp = tag_tree_decode(s, prec->zerobits + cblkno, 100); +int v = expn[bandno] + numgbits - 1 - (zbp - tile->comp->roi_shift); +if (v < 0 || v > 30) { +av_log(s->avctx, AV_LOG_ERROR, + "nonzerobits %d invalid or unsupported\n", v); +return AVERROR_INVALIDDATA; +} +cblk->incl = 1; +cblk->nonzerobits = v; +cblk->zbp = zbp; +cblk->lblock = 3; } -cblk->zbp = zbp; -cblk->nonzerobits = v; -} -if ((newpasses = getnpasses(s)) < 0) -return newpasses; -av_assert2(newpasses > 0); -if (cblk->npasses + newpasses >= JPEG2000_MAX_PASSES) { -avpriv_request_sample(s->avctx, "Too many passes"); -return AVERROR_PATCHWELCOME; -} -if ((llen = getlblockinc(s)) < 0) -return llen; -if (cblk->lblock + llen + av_log2(newpasses) > 16) { -avpriv_request_sample(s->avctx, - "Block with length beyond 16 bits"); -return AVERROR_PATCHWELCOME; +} else { +incl = get_bits(s, 1); } -cblk->lblock += llen; - -cblk->nb_lengthinc = 0; -cblk->nb_terminationsinc = 0; -av_free(cblk->lengthinc); -cblk->lengthinc = av_calloc(newpasses, sizeof(*cblk->lengthinc)); -if (!cblk->lengthinc) -return AVERROR(ENOMEM); -tmp = av_realloc_array(cblk->data_start, cblk->nb_terminations + newpasses + 1, 
sizeof(*cblk->data_start)); -if (!tmp) -return AVERROR(ENOMEM); -cblk->data_start = tmp; -do { -int newpasses1 = 0; +if (incl) { +uint8_t bypass_term_threshold = 0; +uint8_t bits_to_read = 0; +uint32_t segment_bytes = 0; +int32_t segment_passes = 0; +uint8_t next_segment_passes = 0; +int32_t href_passes, pass_bound; +uint32_t tmp_length = 0; +int32_t newpasses_copy, npasses
[FFmpeg-devel] [PATCH v4 3/3] avcodec/jpeg2000dec: Fix HT decoding
This commit fixes wrong treatment of MAGBP value in Ccap15 and bugs in HT block decoding. Signed-off-by: Osamu Watanabe --- libavcodec/jpeg2000dec.c | 11 +-- libavcodec/jpeg2000htdec.c | 136 ++--- libavcodec/jpeg2000htdec.h | 2 +- 3 files changed, 89 insertions(+), 60 deletions(-) diff --git a/libavcodec/jpeg2000dec.c b/libavcodec/jpeg2000dec.c index 2c66c21b88..83cd5dbc7c 100644 --- a/libavcodec/jpeg2000dec.c +++ b/libavcodec/jpeg2000dec.c @@ -391,6 +391,9 @@ static int get_siz(Jpeg2000DecoderContext *s) } else if (ncomponents == 1 && s->precision == 8) { s->avctx->pix_fmt = AV_PIX_FMT_GRAY8; i = 0; +} else if (ncomponents == 1 && s->precision == 12) { +s->avctx->pix_fmt = AV_PIX_FMT_GRAY16LE; +i = 0; } } @@ -2204,7 +2207,7 @@ static inline int tile_codeblocks(const Jpeg2000DecoderContext *s, Jpeg2000Tile Jpeg2000Band *band = rlevel->band + bandno; int cblkno = 0, bandpos; /* See Rec. ITU-T T.800, Equation E-2 */ -int magp = quantsty->expn[subbandno] + quantsty->nguardbits - 1; +int M_b = quantsty->expn[subbandno] + quantsty->nguardbits - 1; bandpos = bandno + (reslevelno > 0); @@ -2212,8 +2215,8 @@ static inline int tile_codeblocks(const Jpeg2000DecoderContext *s, Jpeg2000Tile band->coord[1][0] == band->coord[1][1]) continue; -if ((codsty->cblk_style & JPEG2000_CTSY_HTJ2K_F) && magp >= 31) { -avpriv_request_sample(s->avctx, "JPEG2000_CTSY_HTJ2K_F and magp >= 31"); +if ((codsty->cblk_style & JPEG2000_CTSY_HTJ2K_F) && M_b >= 31) { +avpriv_request_sample(s->avctx, "JPEG2000_CTSY_HTJ2K_F and M_b >= 31"); return AVERROR_PATCHWELCOME; } @@ -2234,7 +2237,7 @@ static inline int tile_codeblocks(const Jpeg2000DecoderContext *s, Jpeg2000Tile ret = ff_jpeg2000_decode_htj2k(s, codsty, &t1, cblk, cblk->coord[0][1] - cblk->coord[0][0], cblk->coord[1][1] - cblk->coord[1][0], - magp, comp->roi_shift); + M_b, comp->roi_shift); else ret = decode_cblk(s, codsty, &t1, cblk, cblk->coord[0][1] - cblk->coord[0][0], diff --git a/libavcodec/jpeg2000htdec.c b/libavcodec/jpeg2000htdec.c 
index 9b473e11d3..0296792a6a 100644 --- a/libavcodec/jpeg2000htdec.c +++ b/libavcodec/jpeg2000htdec.c @@ -122,7 +122,7 @@ static void jpeg2000_init_mel(StateVars *s, uint32_t Pcup) static void jpeg2000_init_mag_ref(StateVars *s, uint32_t Lref) { -s->pos = Lref - 2; +s->pos = Lref - 1; s->bits = 0; s->last = 0xFF; s->tmp = 0; @@ -145,9 +145,10 @@ static void jpeg2000_init_mel_decoder(MelDecoderState *mel_state) static int jpeg2000_bitbuf_refill_backwards(StateVars *buffer, const uint8_t *array) { uint64_t tmp = 0; -int32_t position = buffer->pos - 4; uint32_t new_bits = 32; +buffer->last = array[buffer->pos + 1]; + if (buffer->bits_left >= 32) return 0; // enough data, no need to pull in more bits @@ -157,9 +158,24 @@ static int jpeg2000_bitbuf_refill_backwards(StateVars *buffer, const uint8_t *ar * the bottom most bits. */ -for(int i = FFMAX(0, position + 1); i <= buffer->pos + 1; i++) -tmp = 256*tmp + array[i]; - +if (buffer->pos >= 3) { // Common case; we have at least 4 bytes available + tmp = array[buffer->pos - 3]; + tmp = (tmp << 8) | array[buffer->pos - 2]; + tmp = (tmp << 8) | array[buffer->pos - 1]; + tmp = (tmp << 8) | array[buffer->pos]; + tmp = (tmp << 8) | buffer->last; // For stuffing bit detection + buffer->pos -= 4; +} else { +if (buffer->pos >= 2) +tmp = array[buffer->pos - 2]; +if (buffer->pos >= 1) +tmp = (tmp << 8) | array[buffer->pos - 1]; +if (buffer->pos >= 0) +tmp = (tmp << 8) | array[buffer->pos]; +buffer->pos = 0; +tmp = (tmp << 8) | buffer->last; // For stuffing bit detection +} +// Now remove any stuffing bits, shifting things down as we go if ((tmp & 0x7FFF00) > 0x7F8F00) { tmp &= 0x7F; new_bits--; @@ -176,13 +192,11 @@ static int jpeg2000_bitbuf_refill_backwards(StateVars *buffer, const uint8_t *ar tmp = (tmp & 0x007FFF) + ((tmp & 0xFF) >> 1); new_bits--; } - -tmp >>= 8; // Remove temporary byte loaded +tmp >>= 8; // Shift
Re: [FFmpeg-devel] [RFC]] swscale modernization proposal
On Sun, 23 Jun 2024 14:57:31 -0300 James Almer wrote:
> On 6/22/2024 7:19 PM, Vittorio Giovara wrote:
> > Needless to say I support the plan of renaming the library so that it can
> > be inline with the other libraries names, and the use of a separate header
> > since downstream applications will need to update a lot to use the new
> > library (or the new apis in the existing library) and/or we could provide a
> > thin conversion layer when the new lib is finalized.
>
> I don't quite agree with renaming it. As Michael already pointed out,
> the av prefix wouldn't fit a scaling library nor a resampling one, as
> they only handle one or the other.

By this logic, both libswscale and libswresample should be merged into
libavscale. The mathematics of resampling and scaling is the same :)

Anyway, renaming a library needs a really strong motivating reason, and I
don't see that reason being present here.

As discussed further up-thread, I will try and re-use the existing swscale
public API, but internally restructure things so that SwsContext is itself
the "high-level wrapper" that I intended to be. We are very fortunate that
SwsContext is entirely private, so I'm not too concerned about the code
implications of this. At worst it will involve a bunch of renaming commits.

> There's also the precedent of avresample, which was ultimately dropped
> in favor of swresample, so trying to replace swscale with a new avscale
> library will be both confusing and going against what was already
> established.
Re: [FFmpeg-devel] [RFC]] swscale modernization proposal
On Sun, Jun 23, 2024 at 7:57 PM James Almer wrote:
> On 6/22/2024 7:19 PM, Vittorio Giovara wrote:
> > Needless to say I support the plan of renaming the library so that it can
> > be inline with the other libraries names, and the use of a separate header
> > since downstream applications will need to update a lot to use the new
> > library (or the new apis in the existing library) and/or we could provide a
> > thin conversion layer when the new lib is finalized.
>
> I don't quite agree with renaming it. As Michael already pointed out,
> the av prefix wouldn't fit a scaling library nor a resampling one, as
> they only handle one or the other.

by that reasoning we should ban all subtitles from all the libraries

av is a shorthand of multimedia and many people in the industry refer to
ffmpeg libs as "libav*" so it feels a bit odd to push for preserving an
alternative name

> There's also the precedent of avresample, which was ultimately dropped
> in favor of swresample, so trying to replace swscale with a new avscale
> library will be both confusing and going against what was already
> established.

it's still 4 libraries vs 2... and swr/avr is shrouded in bad history that
is not worth bringing up

I'd understand opposing a rename just for the sake of renaming, but this is
essentially a new library, i see no value in preserving the old naming
scheme, if not making downstream life worse :x

-- 
Vittorio
[FFmpeg-devel] [PATCH] lavc/vvc: Always set flags for the current picture
ff_vvc_frame_rpl uses the flags to detect whether a frame is in use.
Therefore, in the case of a CVSS AU (RASL/GDR with
NoOutputBeforeRecoveryFlag) with ph_non_ref_pic_flag = 1, the frame would
be freed before it is used. Fix this by always marking the current frame
with VVC_FRAME_FLAG_SHORT_REF, as is done by the HEVC decoder.

Additionally, add an assert0 to mitigate the effects of a frame being
freed before it is used.

Signed-off-by: Frank Plowman
---
 libavcodec/vvc/refs.c   | 2 +-
 libavcodec/vvc/thread.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/libavcodec/vvc/refs.c b/libavcodec/vvc/refs.c
index 8b7ba639a3..26a5b0b34c 100644
--- a/libavcodec/vvc/refs.c
+++ b/libavcodec/vvc/refs.c
@@ -191,7 +191,7 @@ int ff_vvc_set_new_ref(VVCContext *s, VVCFrameContext *fc, AVFrame **frame)
     fc->ref = ref;

     if (s->no_output_before_recovery_flag && (IS_RASL(s) || !GDR_IS_RECOVERED(s)))
-        ref->flags = 0;
+        ref->flags = VVC_FRAME_FLAG_SHORT_REF;
     else if (ph->r->ph_pic_output_flag)
         ref->flags = VVC_FRAME_FLAG_OUTPUT;

diff --git a/libavcodec/vvc/thread.c b/libavcodec/vvc/thread.c
index 5b01dd2d20..e87ed5b676 100644
--- a/libavcodec/vvc/thread.c
+++ b/libavcodec/vvc/thread.c
@@ -801,6 +801,8 @@ int ff_vvc_frame_wait(VVCContext *s, VVCFrameContext *fc)
 {
     VVCFrameThread *ft = fc->ft;

+    av_assert0(fc->ref->progress);
+
     ff_mutex_lock(&ft->lock);

     while (atomic_load(&ft->nb_scheduled_tasks) || atomic_load(&ft->nb_scheduled_listeners))
-- 
2.45.1
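The failure mode described in the commit message can be illustrated with a toy frame pool. This is only a minimal sketch, not the VVC decoder's code: `Frame`, `FLAG_OUTPUT`, `FLAG_SHORT_REF` and `release_unused()` are hypothetical stand-ins, and the only assumption taken from the patch is that a frame with no flags set is treated as unused.

```c
#include <assert.h>
#include <stddef.h>

enum {
    FLAG_OUTPUT    = 1 << 0, /* hypothetical: frame is waiting to be output */
    FLAG_SHORT_REF = 1 << 1, /* hypothetical: frame is a short-term reference */
};

typedef struct Frame {
    int flags;     /* 0 means "nobody needs this frame" */
    int allocated; /* stands in for the real buffers */
} Frame;

/* Free every allocated frame whose flags are all clear.
 * Returns the number of frames that were freed. */
static int release_unused(Frame *pool, size_t nb)
{
    int freed = 0;
    for (size_t i = 0; i < nb; i++) {
        if (pool[i].allocated && !pool[i].flags) {
            pool[i].allocated = 0;
            freed++;
        }
    }
    return freed;
}
```

In this toy model, a current frame created with `flags = 0` is indistinguishable from a free slot, so any cleanup pass may release it before decoding finishes; giving it any non-zero flag keeps it alive.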
Re: [FFmpeg-devel] [PATCH v2] lavu/stereo3d: change the horizontal FOV field to a rational
On 6/24/2024 1:13 AM, James Almer wrote:
> If Derek is also ok with this then LGTM.

I do not object.

- Derek
[FFmpeg-devel] [PATCH] avcodec/dovi_rpudec: fix reading el_bit_depth_minus8
From: Cosmin Stejerean

now that we are reading ext_mapping_idc as the upper 8 bits of
el_bit_depth_minus8 we need to use get_ue_golomb_long rather than
get_ue_golomb_31 for reading it

---
 libavcodec/dovi_rpudec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/dovi_rpudec.c b/libavcodec/dovi_rpudec.c
index 8cafdcf5e6..c025800206 100644
--- a/libavcodec/dovi_rpudec.c
+++ b/libavcodec/dovi_rpudec.c
@@ -420,7 +420,7 @@ int ff_dovi_rpu_parse(DOVIContext *s, const uint8_t *rpu, size_t rpu_size,

     if ((hdr->rpu_format & 0x700) == 0) {
         int bl_bit_depth_minus8 = get_ue_golomb_31(gb);
-        int el_bit_depth_minus8 = get_ue_golomb_31(gb);
+        int el_bit_depth_minus8 = get_ue_golomb_long(gb);
         int vdr_bit_depth_minus8 = get_ue_golomb_31(gb);
         int reserved_zero_3bits;
         /* ext_mapping_idc is in the upper 8 bits of el_bit_depth_minus8 */
-- 
2.42.1
Re: [FFmpeg-devel] [PATCH] avcodec/dovi_rpudec: fix reading el_bit_depth_minus8
On Mon, 24 Jun 2024 15:56:12 + Cosmin Stejerean via ffmpeg-devel wrote:
> From: Cosmin Stejerean
>
> now that we are reading ext_mapping_idc as the upper 8 bits of
> el_bit_depth_minus8 we need to use get_ue_golomb_long rather than
> get_ue_golomb_31 for reading it
>
> ---
>  libavcodec/dovi_rpudec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libavcodec/dovi_rpudec.c b/libavcodec/dovi_rpudec.c
> index 8cafdcf5e6..c025800206 100644
> --- a/libavcodec/dovi_rpudec.c
> +++ b/libavcodec/dovi_rpudec.c
> @@ -420,7 +420,7 @@ int ff_dovi_rpu_parse(DOVIContext *s, const uint8_t *rpu,
> size_t rpu_size,
>
>      if ((hdr->rpu_format & 0x700) == 0) {
>          int bl_bit_depth_minus8 = get_ue_golomb_31(gb);
> -        int el_bit_depth_minus8 = get_ue_golomb_31(gb);
> +        int el_bit_depth_minus8 = get_ue_golomb_long(gb);
>          int vdr_bit_depth_minus8 = get_ue_golomb_31(gb);
>          int reserved_zero_3bits;
>          /* ext_mapping_idc is in the upper 8 bits of el_bit_depth_minus8
> */
> --
> 2.42.1

LGTM. I checked also the equivalent for set_ue_golomb(), but it's fine up
to 2^16-2, which is enough for the max value of (0xFF << 8) | 8.
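The range argument in the review can be checked with a few lines of arithmetic. The sketch below does not call any FFmpeg function; the limits are assumptions taken from the thread and from the names of the helpers (`get_ue_golomb_31()` covering 0..31, `set_ue_golomb()` covering 0..2^16-2), and `ue_golomb_bits()` and `fits_ue_31()` are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value el_bit_depth_minus8 can carry once ext_mapping_idc is
 * packed into its upper 8 bits, as given in the review. */
#define MAX_EL_VALUE ((0xFFu << 8) | 8)

/* Assumed limits: the short reader handles 0..31, the writer 0..2^16-2. */
#define MAX_UE_31    31u
#define MAX_UE_WRITE ((1u << 16) - 2)

/* Length in bits of the unsigned Exp-Golomb code for `value`:
 * (n - 1) zero bits followed by the n-bit binary form of value + 1. */
static int ue_golomb_bits(uint32_t value)
{
    uint64_t code = (uint64_t)value + 1;
    int n = 0;

    while (code) {
        n++;
        code >>= 1;
    }
    return 2 * n - 1;
}

/* Returns 1 if the short reader's range is enough for `value`, else 0. */
static int fits_ue_31(uint32_t value)
{
    return value <= MAX_UE_31;
}
```

With these definitions the maximum combined value is 65288, which is far outside the 0..31 range but still below the writer's 65534 limit, matching the conclusion in the reply.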
[FFmpeg-devel] [GASPP PATCH 1/2] Translate .xword and .dword to .quad
---
 gas-preprocessor.pl | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
index b0c343e..20b927f 100755
--- a/gas-preprocessor.pl
+++ b/gas-preprocessor.pl
@@ -1169,6 +1169,8 @@ sub handle_serialized_line {
     $line =~ s/\.syntax/$comm$&/x if $as_type =~ /armasm/;

     $line =~ s/\.hword/.short/x;
+    $line =~ s/\.xword/.quad/x;
+    $line =~ s/\.dword/.quad/x;

     if ($as_type =~ /^apple-/) {
         # the syntax for these is a little different
-- 
2.34.1
[FFmpeg-devel] [GASPP PATCH 2/2] Handle local labels in expressions with .xword/.dword/.quad
---
This might be needed in dav1d in the future.
---
 gas-preprocessor.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
index 20b927f..19b0131 100755
--- a/gas-preprocessor.pl
+++ b/gas-preprocessor.pl
@@ -958,7 +958,7 @@ sub handle_serialized_line {
                 $xreg =~ s/w/x/;
                 $line =~ s/\b$reg\b/$xreg/;
             }
-        } elsif ($line =~ /^\s*.h?word.*\b\d+[bf]\b/) {
+        } elsif ($line =~ /^\s*.([hxd]?word|quad).*\b\d+[bf]\b/) {
             while ($line =~ /\b(\d+)([bf])\b/g) {
                 $line = handle_local_label($line, $1, $2);
             }
-- 
2.34.1
[FFmpeg-devel] [PATCH 1/9] avcodec/dovi_rpudec: clarify semantics
From: Niklas Haas

ff_dovi_rpu_parse() and ff_dovi_rpu_generate() are a bit inconsistent in
that they expect different levels of encapsulation, due to the nature of
how this is handled in the context of different APIs. Clarify the status
quo.

(And fix an incorrect reference to the RPU payload bytes as 'RBSP')
---
 libavcodec/dovi_rpu.h    | 5 +++--
 libavcodec/dovi_rpudec.c | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/libavcodec/dovi_rpu.h b/libavcodec/dovi_rpu.h
index bfb118d6b5..205d16ffbc 100644
--- a/libavcodec/dovi_rpu.h
+++ b/libavcodec/dovi_rpu.h
@@ -95,8 +95,9 @@ void ff_dovi_ctx_unref(DOVIContext *s);
 void ff_dovi_ctx_flush(DOVIContext *s);

 /**
- * Parse the contents of a Dovi RPU NAL and update the parsed values in the
- * DOVIContext struct.
+ * Parse the contents of a Dolby Vision RPU and update the parsed values in the
+ * DOVIContext struct. This function should receive the decoded unit payload,
+ * without any T.35 or NAL unit headers.
  *
  * Returns 0 or an error code.
  *
diff --git a/libavcodec/dovi_rpudec.c b/libavcodec/dovi_rpudec.c
index c025800206..375e6e560b 100644
--- a/libavcodec/dovi_rpudec.c
+++ b/libavcodec/dovi_rpudec.c
@@ -360,7 +360,7 @@ int ff_dovi_rpu_parse(DOVIContext *s, const uint8_t *rpu, size_t rpu_size,
         emdf_protection = get_bits(gb, 5 + 12);
         VALIDATE(emdf_protection, 0x400, 0x400);
     } else {
-        /* NAL RBSP with prefix and trailing zeroes */
+        /* NAL unit with prefix and trailing zeroes */
         VALIDATE(rpu[0], 25, 25); /* NAL prefix */
         rpu++;
         rpu_size--;
-- 
2.45.1
[FFmpeg-devel] [PATCH 2/9] avcodec/dovi_rpuenc: also copy ext blocks to dovi ctx
From: Niklas Haas As the comment implies, DOVIContext.ext_blocks should also reflect the current state after ff_dovi_rpu_generate(). Fluff for now, but will be needed once we start implementing metadata compression for extension blocks as well. --- libavcodec/dovi_rpuenc.c | 12 1 file changed, 12 insertions(+) diff --git a/libavcodec/dovi_rpuenc.c b/libavcodec/dovi_rpuenc.c index a14c9cc181..f0cfecc91b 100644 --- a/libavcodec/dovi_rpuenc.c +++ b/libavcodec/dovi_rpuenc.c @@ -506,6 +506,12 @@ int ff_dovi_rpu_generate(DOVIContext *s, const AVDOVIMetadata *metadata, } } +if (metadata->num_ext_blocks && !s->ext_blocks) { +s->ext_blocks = ff_refstruct_allocz(sizeof(AVDOVIDmData) * AV_DOVI_MAX_EXT_BLOCKS); +if (!s->ext_blocks) +return AVERROR(ENOMEM); +} + vdr_dm_metadata_present = memcmp(color, &ff_dovi_color_default, sizeof(*color)); use_prev_vdr_rpu = !memcmp(s->vdr[vdr_rpu_id], mapping, sizeof(*mapping)); if (num_ext_blocks_v1 || num_ext_blocks_v2) @@ -636,6 +642,7 @@ int ff_dovi_rpu_generate(DOVIContext *s, const AVDOVIMetadata *metadata, } if (vdr_dm_metadata_present) { +size_t ext_sz; const int denom = profile == 4 ? (1 << 30) : (1 << 28); set_ue_golomb(pb, color->dm_metadata_id); /* affected_dm_id */ set_ue_golomb(pb, color->dm_metadata_id); /* current_dm_id */ @@ -673,6 +680,11 @@ int ff_dovi_rpu_generate(DOVIContext *s, const AVDOVIMetadata *metadata, for (int i = 0; i < metadata->num_ext_blocks; i++) generate_ext_v2(pb, av_dovi_get_ext(metadata, i)); } + +ext_sz = FFMIN(sizeof(AVDOVIDmData), metadata->ext_block_size); +for (int i = 0; i < metadata->num_ext_blocks; i++) +memcpy(&s->ext_blocks[i], av_dovi_get_ext(metadata, i), ext_sz); +s->num_ext_blocks = metadata->num_ext_blocks; } else { s->color = &ff_dovi_color_default; s->num_ext_blocks = 0; -- 2.45.1 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH 3/9] avcodec/dovi_rpuenc: try to re-use existing vdr_rpu_id
From: Niklas Haas

And only override it if we either have an exact match, or if we still
have unused metadata slots (to avoid an overwrite).
---
 libavcodec/dovi_rpuenc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/dovi_rpuenc.c b/libavcodec/dovi_rpuenc.c
index f0cfecc91b..30b6b09f1d 100644
--- a/libavcodec/dovi_rpuenc.c
+++ b/libavcodec/dovi_rpuenc.c
@@ -463,12 +463,12 @@ int ff_dovi_rpu_generate(DOVIContext *s, const AVDOVIMetadata *metadata,
         return AVERROR_INVALIDDATA;
     }

-    vdr_rpu_id = -1;
+    vdr_rpu_id = mapping->vdr_rpu_id;
     for (int i = 0; i <= DOVI_MAX_DM_ID; i++) {
         if (s->vdr[i] && !memcmp(s->vdr[i], mapping, sizeof(*mapping))) {
             vdr_rpu_id = i;
             break;
-        } else if (vdr_rpu_id < 0 && (!s->vdr[i] || i == DOVI_MAX_DM_ID)) {
+        } else if (s->vdr[vdr_rpu_id] && !s->vdr[i]) {
             vdr_rpu_id = i;
         }
     }
-- 
2.45.1
[FFmpeg-devel] [PATCH 4/9] avcodec/dovi_rpuenc: allow changing vdr_rpu_id
From: Niklas Haas The version as written also compared the vdr_rpu_id field, which would defeat the purpose of trying to look for a matching slot in the first place. --- libavcodec/dovi_rpuenc.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/libavcodec/dovi_rpuenc.c b/libavcodec/dovi_rpuenc.c index 30b6b09f1d..f10e175350 100644 --- a/libavcodec/dovi_rpuenc.c +++ b/libavcodec/dovi_rpuenc.c @@ -20,6 +20,8 @@ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA */ +#include + #include "libavutil/avassert.h" #include "libavutil/crc.h" #include "libavutil/mem.h" @@ -201,6 +203,15 @@ skip: return 0; } +/* compares data mappings, excluding vdr_rpu_id */ +static int cmp_data_mapping(const AVDOVIDataMapping *m1, +const AVDOVIDataMapping *m2) +{ +static_assert(offsetof(AVDOVIDataMapping, vdr_rpu_id) == 0, "vdr_rpu_id is first field"); +const void *p1 = &m1->vdr_rpu_id + 1, *p2 = &m2->vdr_rpu_id + 1; +return memcmp(p1, p2, sizeof(AVDOVIDataMapping) - sizeof(m1->vdr_rpu_id)); +} + static inline void put_ue_coef(PutBitContext *pb, const AVDOVIRpuDataHeader *hdr, uint64_t coef) { @@ -465,7 +476,7 @@ int ff_dovi_rpu_generate(DOVIContext *s, const AVDOVIMetadata *metadata, vdr_rpu_id = mapping->vdr_rpu_id; for (int i = 0; i <= DOVI_MAX_DM_ID; i++) { -if (s->vdr[i] && !memcmp(s->vdr[i], mapping, sizeof(*mapping))) { +if (s->vdr[i] && !cmp_data_mapping(s->vdr[i], mapping)) { vdr_rpu_id = i; break; } else if (s->vdr[vdr_rpu_id] && !s->vdr[i]) { @@ -639,6 +650,7 @@ int ff_dovi_rpu_generate(DOVIContext *s, const AVDOVIMetadata *metadata, } memcpy(s->vdr[vdr_rpu_id], mapping, sizeof(*mapping)); +s->vdr[vdr_rpu_id]->vdr_rpu_id = vdr_rpu_id; } if (vdr_dm_metadata_present) { -- 2.45.1 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
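The comparison trick used by cmp_data_mapping() in the patch, comparing two structs while skipping a leading identifier field, can be sketched in isolation. `Mapping` below is a hypothetical stand-in for AVDOVIDataMapping, chosen so that it has no padding bytes; with a real struct, memcmp() is only meaningful if any padding is zeroed consistently (for example via a zeroing allocator).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for AVDOVIDataMapping; laid out without padding. */
typedef struct Mapping {
    uint8_t  id;        /* slot identifier, excluded from the comparison */
    uint8_t  method;
    uint16_t pivots[3];
    uint32_t coef;
} Mapping;

/* The byte-offset arithmetic below relies on `id` being the first field. */
static_assert(offsetof(Mapping, id) == 0, "id must be the first field");

/* memcmp()-style result over everything that follows the `id` field. */
static int cmp_without_id(const Mapping *a, const Mapping *b)
{
    const unsigned char *pa = (const unsigned char *)a + sizeof(a->id);
    const unsigned char *pb = (const unsigned char *)b + sizeof(b->id);

    return memcmp(pa, pb, sizeof(Mapping) - sizeof(a->id));
}
```

Two mappings that differ only in `id` compare equal here, which is the property the slot search needs: the same data should match whichever slot it was stored in.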
[FFmpeg-devel] [PATCH 5/9] avcodec/dovi_rpuenc: add `flags` to ff_dovi_rpu_generate()
From: Niklas Haas Will be used to control compression, encapsulation etc. --- libavcodec/dovi_rpu.h| 2 +- libavcodec/dovi_rpuenc.c | 2 +- libavcodec/libaomenc.c | 2 +- libavcodec/libsvtav1.c | 2 +- libavcodec/libx265.c | 3 ++- 5 files changed, 6 insertions(+), 5 deletions(-) diff --git a/libavcodec/dovi_rpu.h b/libavcodec/dovi_rpu.h index 205d16ffbc..65a4529106 100644 --- a/libavcodec/dovi_rpu.h +++ b/libavcodec/dovi_rpu.h @@ -135,7 +135,7 @@ int ff_dovi_configure(DOVIContext *s, AVCodecContext *avctx); * including the EMDF header (profile 10) or NAL encapsulation (otherwise). */ int ff_dovi_rpu_generate(DOVIContext *s, const AVDOVIMetadata *metadata, - uint8_t **out_rpu, int *out_size); + int flags, uint8_t **out_rpu, int *out_size); /*** diff --git a/libavcodec/dovi_rpuenc.c b/libavcodec/dovi_rpuenc.c index f10e175350..6bfb39a7ea 100644 --- a/libavcodec/dovi_rpuenc.c +++ b/libavcodec/dovi_rpuenc.c @@ -446,7 +446,7 @@ static void generate_ext_v2(PutBitContext *pb, const AVDOVIDmData *dm) } int ff_dovi_rpu_generate(DOVIContext *s, const AVDOVIMetadata *metadata, - uint8_t **out_rpu, int *out_size) + int flags, uint8_t **out_rpu, int *out_size) { PutBitContext *pb = &(PutBitContext){0}; const AVDOVIRpuDataHeader *hdr; diff --git a/libavcodec/libaomenc.c b/libavcodec/libaomenc.c index dec74ebecd..aa51c89e29 100644 --- a/libavcodec/libaomenc.c +++ b/libavcodec/libaomenc.c @@ -1294,7 +1294,7 @@ FF_ENABLE_DEPRECATION_WARNINGS const AVDOVIMetadata *metadata = (const AVDOVIMetadata *)sd->data; uint8_t *t35; int size; -if ((res = ff_dovi_rpu_generate(&ctx->dovi, metadata, &t35, &size)) < 0) +if ((res = ff_dovi_rpu_generate(&ctx->dovi, metadata, 0, &t35, &size)) < 0) return res; res = aom_img_add_metadata(rawimg, OBU_METADATA_TYPE_ITUT_T35, t35, size, AOM_MIF_ANY_FRAME); diff --git a/libavcodec/libsvtav1.c b/libavcodec/libsvtav1.c index 2fef8c8971..b6db63fd7a 100644 --- a/libavcodec/libsvtav1.c +++ b/libavcodec/libsvtav1.c @@ -541,7 +541,7 @@ static int 
eb_send_frame(AVCodecContext *avctx, const AVFrame *frame) const AVDOVIMetadata *metadata = (const AVDOVIMetadata *)sd->data; uint8_t *t35; int size; -if ((ret = ff_dovi_rpu_generate(&svt_enc->dovi, metadata, &t35, &size)) < 0) +if ((ret = ff_dovi_rpu_generate(&svt_enc->dovi, metadata, 0, &t35, &size)) < 0) return ret; ret = svt_add_metadata(headerPtr, EB_AV1_METADATA_TYPE_ITUT_T35, t35, size); av_free(t35); diff --git a/libavcodec/libx265.c b/libavcodec/libx265.c index 0dc7ab6eeb..4302c3d587 100644 --- a/libavcodec/libx265.c +++ b/libavcodec/libx265.c @@ -783,7 +783,8 @@ static int libx265_encode_frame(AVCodecContext *avctx, AVPacket *pkt, sd = av_frame_get_side_data(pic, AV_FRAME_DATA_DOVI_METADATA); if (ctx->dovi.cfg.dv_profile && sd) { const AVDOVIMetadata *metadata = (const AVDOVIMetadata *)sd->data; -ret = ff_dovi_rpu_generate(&ctx->dovi, metadata, &x265pic.rpu.payload, +ret = ff_dovi_rpu_generate(&ctx->dovi, metadata, 0, + &x265pic.rpu.payload, &x265pic.rpu.payloadSize); if (ret < 0) { free_picture(ctx, &x265pic); -- 2.45.1 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH 6/9] avcodec/dovi_rpuenc: make encapsulation optional
From: Niklas Haas And move the choice of desired container to `flags`. This is needed to handle differing API requirements (e.g. libx265 requires the NAL RBSP, but CBS BSF requires the unescaped bytes). --- libavcodec/dovi_rpu.h| 16 ++-- libavcodec/dovi_rpuenc.c | 22 ++ libavcodec/libaomenc.c | 3 ++- libavcodec/libsvtav1.c | 3 ++- libavcodec/libx265.c | 2 +- 5 files changed, 25 insertions(+), 21 deletions(-) diff --git a/libavcodec/dovi_rpu.h b/libavcodec/dovi_rpu.h index 65a4529106..226a769bff 100644 --- a/libavcodec/dovi_rpu.h +++ b/libavcodec/dovi_rpu.h @@ -123,16 +123,20 @@ int ff_dovi_attach_side_data(DOVIContext *s, AVFrame *frame); */ int ff_dovi_configure(DOVIContext *s, AVCodecContext *avctx); +enum { +FF_DOVI_WRAP_NAL= 1 << 0, ///< wrap inside NAL RBSP +FF_DOVI_WRAP_T35= 1 << 1, ///< wrap inside T.35+EMDF +}; + /** - * Synthesize a Dolby Vision RPU reflecting the current state. Note that this - * assumes all previous calls to `ff_dovi_rpu_generate` have been appropriately - * signalled, i.e. it will not re-send already transmitted redundant data. + * Synthesize a Dolby Vision RPU reflecting the current state. By default, the + * RPU is not encapsulated (see `flags` for more options). Note that this + * assumes all previous calls to `ff_dovi_rpu_generate` have been + * appropriately signalled, i.e. it will not re-send already transmitted + * redundant data. * * Mutates the internal state of DOVIContext to reflect the change. * Returns 0 or a negative error code. - * - * This generates a fully formed RPU ready for inclusion in the bitstream, - * including the EMDF header (profile 10) or NAL encapsulation (otherwise). 
*/ int ff_dovi_rpu_generate(DOVIContext *s, const AVDOVIMetadata *metadata, int flags, uint8_t **out_rpu, int *out_size); diff --git a/libavcodec/dovi_rpuenc.c b/libavcodec/dovi_rpuenc.c index 6bfb39a7ea..41080521e1 100644 --- a/libavcodec/dovi_rpuenc.c +++ b/libavcodec/dovi_rpuenc.c @@ -710,9 +710,7 @@ int ff_dovi_rpu_generate(DOVIContext *s, const AVDOVIMetadata *metadata, flush_put_bits(pb); rpu_size = put_bytes_output(pb); -switch (s->cfg.dv_profile) { -case 10: -/* AV1 uses T.35 OBU with EMDF header */ +if (flags & FF_DOVI_WRAP_T35) { *out_rpu = av_malloc(rpu_size + 15); if (!*out_rpu) return AVERROR(ENOMEM); @@ -739,10 +737,8 @@ int ff_dovi_rpu_generate(DOVIContext *s, const AVDOVIMetadata *metadata, flush_put_bits(pb); *out_size = put_bytes_output(pb); return 0; - -case 5: -case 8: -*out_rpu = dst = av_malloc(1 + rpu_size * 3 / 2); /* worst case */ +} else if (flags & FF_DOVI_WRAP_NAL) { +*out_rpu = dst = av_malloc(4 + rpu_size * 3 / 2); /* worst case */ if (!*out_rpu) return AVERROR(ENOMEM); *dst++ = 25; /* NAL prefix */ @@ -765,10 +761,12 @@ int ff_dovi_rpu_generate(DOVIContext *s, const AVDOVIMetadata *metadata, } *out_size = dst - *out_rpu; return 0; - -default: -/* Should be unreachable */ -av_assert0(0); -return AVERROR_BUG; +} else { +/* Return intermediate buffer directly */ +*out_rpu = s->rpu_buf; +*out_size = rpu_size; +s->rpu_buf = NULL; +s->rpu_buf_sz = 0; +return 0; } } diff --git a/libavcodec/libaomenc.c b/libavcodec/libaomenc.c index aa51c89e29..fd9bea2505 100644 --- a/libavcodec/libaomenc.c +++ b/libavcodec/libaomenc.c @@ -1294,7 +1294,8 @@ FF_ENABLE_DEPRECATION_WARNINGS const AVDOVIMetadata *metadata = (const AVDOVIMetadata *)sd->data; uint8_t *t35; int size; -if ((res = ff_dovi_rpu_generate(&ctx->dovi, metadata, 0, &t35, &size)) < 0) +if ((res = ff_dovi_rpu_generate(&ctx->dovi, metadata, FF_DOVI_WRAP_T35, +&t35, &size)) < 0) return res; res = aom_img_add_metadata(rawimg, OBU_METADATA_TYPE_ITUT_T35, t35, size, AOM_MIF_ANY_FRAME); diff --git 
a/libavcodec/libsvtav1.c b/libavcodec/libsvtav1.c index b6db63fd7a..e7b12fb488 100644 --- a/libavcodec/libsvtav1.c +++ b/libavcodec/libsvtav1.c @@ -541,7 +541,8 @@ static int eb_send_frame(AVCodecContext *avctx, const AVFrame *frame) const AVDOVIMetadata *metadata = (const AVDOVIMetadata *)sd->data; uint8_t *t35; int size; -if ((ret = ff_dovi_rpu_generate(&svt_enc->dovi, metadata, 0, &t35, &size)) < 0) +if ((ret = ff_dovi_rpu_generate(&svt_enc->dovi, metadata, FF_DOVI_WRAP_T35, +&t35, &size)) < 0) return ret; ret = svt_add_metadata(headerPtr, EB_AV1_METADATA_TYPE_ITUT_T35, t35, size); av_free(t35); diff --git a/libavcodec/libx265.c b/libavcodec/libx265.c
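The commit message distinguishes escaped NAL bytes from the unescaped payload, and the NAL branch sizes its buffer at roughly 3/2 of the payload. The escaping loop itself is not visible in the diff, so the following is only a generic H.26x-style emulation prevention sketch that shows where such a bound comes from; it is not the code from dovi_rpuenc.c, and `escape_payload()` and `escaped_size_bound()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Upper bound on the escaped size: at most one extra byte for every two
 * input bytes. */
static size_t escaped_size_bound(size_t len)
{
    return len * 3 / 2;
}

/* Copy src to dst, inserting 0x03 after two consecutive zero bytes whenever
 * the next byte is in 0x00..0x03. dst must hold escaped_size_bound(len)
 * bytes. Returns the number of bytes written. */
static size_t escape_payload(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t out = 0;
    int zero_run = 0;

    for (size_t i = 0; i < len; i++) {
        if (zero_run == 2 && src[i] <= 3) {
            dst[out++] = 3; /* emulation prevention byte */
            zero_run = 0;
        }
        dst[out++] = src[i];
        zero_run = src[i] ? 0 : zero_run + 1;
    }
    return out;
}
```

An all-zero payload is the worst case in this sketch: five zero bytes grow to seven, which is exactly the 3/2 bound. A consumer that wants the raw bytes would have to undo this step, which is why an unwrapped output mode is useful.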
[FFmpeg-devel] [PATCH 7/9] avcodec/dovi_rpuenc: disable metadata compression by default
From: Niklas Haas Keyframes must reset the metadata compression state, so we cannot enable metadata compression inside the encoders. Solve this by adding a new flag, rather than removing it entirely, because I plan on adding a bitstream filter for metadata compression. --- libavcodec/dovi_rpu.h| 3 +++ libavcodec/dovi_rpuenc.c | 26 ++ 2 files changed, 21 insertions(+), 8 deletions(-) diff --git a/libavcodec/dovi_rpu.h b/libavcodec/dovi_rpu.h index 226a769bff..f0d9c24379 100644 --- a/libavcodec/dovi_rpu.h +++ b/libavcodec/dovi_rpu.h @@ -126,6 +126,9 @@ int ff_dovi_configure(DOVIContext *s, AVCodecContext *avctx); enum { FF_DOVI_WRAP_NAL= 1 << 0, ///< wrap inside NAL RBSP FF_DOVI_WRAP_T35= 1 << 1, ///< wrap inside T.35+EMDF + +FF_DOVI_COMPRESS_VDR= 1 << 2, ///< enable VDR RPU compression +FF_DOVI_COMPRESS_ALL= FF_DOVI_COMPRESS_VDR, }; /** diff --git a/libavcodec/dovi_rpuenc.c b/libavcodec/dovi_rpuenc.c index 41080521e1..08170a9e84 100644 --- a/libavcodec/dovi_rpuenc.c +++ b/libavcodec/dovi_rpuenc.c @@ -21,6 +21,7 @@ */ #include +#include #include "libavutil/avassert.h" #include "libavutil/crc.h" @@ -452,9 +453,10 @@ int ff_dovi_rpu_generate(DOVIContext *s, const AVDOVIMetadata *metadata, const AVDOVIRpuDataHeader *hdr; const AVDOVIDataMapping *mapping; const AVDOVIColorMetadata *color; -int vdr_dm_metadata_present, vdr_rpu_id, use_prev_vdr_rpu, profile, +int vdr_dm_metadata_present, vdr_rpu_id, profile, buffer_size, rpu_size, pad, zero_run; int num_ext_blocks_v1, num_ext_blocks_v2; +int use_prev_vdr_rpu = false; uint32_t crc; uint8_t *dst; if (!metadata) { @@ -475,12 +477,21 @@ int ff_dovi_rpu_generate(DOVIContext *s, const AVDOVIMetadata *metadata, } vdr_rpu_id = mapping->vdr_rpu_id; -for (int i = 0; i <= DOVI_MAX_DM_ID; i++) { -if (s->vdr[i] && !cmp_data_mapping(s->vdr[i], mapping)) { -vdr_rpu_id = i; -break; -} else if (s->vdr[vdr_rpu_id] && !s->vdr[i]) { -vdr_rpu_id = i; +if (flags & FF_DOVI_COMPRESS_VDR) { +for (int i = 0; i <= DOVI_MAX_DM_ID; i++) { +if 
(s->vdr[i] && !cmp_data_mapping(s->vdr[i], mapping)) { +use_prev_vdr_rpu = true; +vdr_rpu_id = i; +break; +} else if (s->vdr[vdr_rpu_id] && !s->vdr[i]) { +vdr_rpu_id = i; +} +} +} else { +/* Flush VDRs to avoid leaking old state after keyframe */ +for (int i = 0; i <= DOVI_MAX_DM_ID; i++) { +if (i != vdr_rpu_id) +ff_refstruct_unref(&s->vdr[i]); } } @@ -524,7 +535,6 @@ int ff_dovi_rpu_generate(DOVIContext *s, const AVDOVIMetadata *metadata, } vdr_dm_metadata_present = memcmp(color, &ff_dovi_color_default, sizeof(*color)); -use_prev_vdr_rpu = !memcmp(s->vdr[vdr_rpu_id], mapping, sizeof(*mapping)); if (num_ext_blocks_v1 || num_ext_blocks_v2) vdr_dm_metadata_present = 1; -- 2.45.1 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
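The slot selection in this patch can be read as a small standalone routine. The sketch below mirrors the loop from the diff under simplifying assumptions: `Mapping`, `NB_SLOTS` and `pick_slot()` are hypothetical, mappings are compared through a single integer instead of cmp_data_mapping(), and the flushing of other slots in the uncompressed case is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NB_SLOTS 4 /* hypothetical; the patch iterates up to DOVI_MAX_DM_ID */

typedef struct Mapping {
    int payload; /* stands in for the compared mapping contents */
} Mapping;

/* Pick the slot to use for `m`. With compression enabled, prefer a slot that
 * already holds identical data (*reuse = true), otherwise move to a free
 * slot if the preferred one is occupied. Without compression, always keep
 * the preferred slot and never reuse. */
static int pick_slot(const Mapping *const *slots, int preferred,
                     const Mapping *m, bool compress, bool *reuse)
{
    int id = preferred;

    *reuse = false;
    if (!compress)
        return id;

    for (int i = 0; i < NB_SLOTS; i++) {
        if (slots[i] && slots[i]->payload == m->payload) {
            *reuse = true;
            return i;
        } else if (slots[id] && !slots[i]) {
            id = i; /* preferred slot is taken, fall back to a free one */
        }
    }
    return id;
}
```

Keeping the reuse decision inside the same loop, as the patch does with use_prev_vdr_rpu, avoids a second comparison after the slot has been chosen.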
[FFmpeg-devel] [PATCH 8/9] avcodec/dovi_rpu: add ff_dovi_get_metadata()
From: Niklas Haas Provides direct access to the AVDOVIMetadata without having to attach it to a frame. --- libavcodec/dovi_rpu.h| 9 + libavcodec/dovi_rpudec.c | 40 +++- 2 files changed, 36 insertions(+), 13 deletions(-) diff --git a/libavcodec/dovi_rpu.h b/libavcodec/dovi_rpu.h index f0d9c24379..10d5c7f566 100644 --- a/libavcodec/dovi_rpu.h +++ b/libavcodec/dovi_rpu.h @@ -108,8 +108,17 @@ void ff_dovi_ctx_flush(DOVIContext *s); int ff_dovi_rpu_parse(DOVIContext *s, const uint8_t *rpu, size_t rpu_size, int err_recognition); +/** + * Get the decoded AVDOVIMetadata. Ownership passes to the caller. + * + * Returns the size of *out_metadata, a negative error code, or 0 if no + * metadata is available to return. + */ +int ff_dovi_get_metadata(DOVIContext *s, AVDOVIMetadata **out_metadata); + /** * Attach the decoded AVDOVIMetadata as side data to an AVFrame. + * Returns 0 or a negative error code. */ int ff_dovi_attach_side_data(DOVIContext *s, AVFrame *frame); diff --git a/libavcodec/dovi_rpudec.c b/libavcodec/dovi_rpudec.c index 375e6e560b..e8c25e9f3b 100644 --- a/libavcodec/dovi_rpudec.c +++ b/libavcodec/dovi_rpudec.c @@ -30,10 +30,8 @@ #include "get_bits.h" #include "refstruct.h" -int ff_dovi_attach_side_data(DOVIContext *s, AVFrame *frame) +int ff_dovi_get_metadata(DOVIContext *s, AVDOVIMetadata **out_metadata) { -AVFrameSideData *sd; -AVBufferRef *buf; AVDOVIMetadata *dovi; size_t dovi_size, ext_sz; @@ -44,7 +42,32 @@ int ff_dovi_attach_side_data(DOVIContext *s, AVFrame *frame) if (!dovi) return AVERROR(ENOMEM); -buf = av_buffer_create((uint8_t *) dovi, dovi_size, NULL, NULL, 0); +/* Copy only the parts of these structs known to us at compiler-time. 
*/ +#define COPY(t, a, b, last) memcpy(a, b, offsetof(t, last) + sizeof((b)->last)) +COPY(AVDOVIRpuDataHeader, av_dovi_get_header(dovi), &s->header, ext_mapping_idc_5_7); +COPY(AVDOVIDataMapping, av_dovi_get_mapping(dovi), s->mapping, nlq_pivots); +COPY(AVDOVIColorMetadata, av_dovi_get_color(dovi), s->color, source_diagonal); +ext_sz = FFMIN(sizeof(AVDOVIDmData), dovi->ext_block_size); +for (int i = 0; i < s->num_ext_blocks; i++) +memcpy(av_dovi_get_ext(dovi, i), &s->ext_blocks[i], ext_sz); +dovi->num_ext_blocks = s->num_ext_blocks; + +*out_metadata = dovi; +return dovi_size; +} + +int ff_dovi_attach_side_data(DOVIContext *s, AVFrame *frame) +{ +AVFrameSideData *sd; +AVDOVIMetadata *dovi; +AVBufferRef *buf; +int size; + +size = ff_dovi_get_metadata(s, &dovi); +if (size <= 0) +return size; + +buf = av_buffer_create((uint8_t *) dovi, size, NULL, NULL, 0); if (!buf) { av_free(dovi); return AVERROR(ENOMEM); @@ -56,15 +79,6 @@ int ff_dovi_attach_side_data(DOVIContext *s, AVFrame *frame) return AVERROR(ENOMEM); } -/* Copy only the parts of these structs known to us at compiler-time. */ -#define COPY(t, a, b, last) memcpy(a, b, offsetof(t, last) + sizeof((b)->last)) -COPY(AVDOVIRpuDataHeader, av_dovi_get_header(dovi), &s->header, ext_mapping_idc_5_7); -COPY(AVDOVIDataMapping, av_dovi_get_mapping(dovi), s->mapping, nlq_pivots); -COPY(AVDOVIColorMetadata, av_dovi_get_color(dovi), s->color, source_diagonal); -ext_sz = FFMIN(sizeof(AVDOVIDmData), dovi->ext_block_size); -for (int i = 0; i < s->num_ext_blocks; i++) -memcpy(av_dovi_get_ext(dovi, i), &s->ext_blocks[i], ext_sz); -dovi->num_ext_blocks = s->num_ext_blocks; return 0; } -- 2.45.1 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
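The COPY() macro moved by this patch copies a struct only up to a named last field, so that fields appended later are not touched. A minimal sketch of the same idea follows; `Header`, `COPY_PREFIX` and `copy_known()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a struct prefix: everything from the start of type t up to and
 * including the field `last`. Fields after `last` in dst stay untouched. */
#define COPY_PREFIX(t, dst, src, last) \
    memcpy(dst, src, offsetof(t, last) + sizeof((src)->last))

/* Hypothetical struct: `extra` plays the role of a field added by a newer
 * library version that the copying code does not know about. */
typedef struct Header {
    int version;
    int bit_depth;
    int extra;
} Header;

/* Copy only the fields known at compile time (up to bit_depth). */
static void copy_known(Header *dst, const Header *src)
{
    COPY_PREFIX(Header, dst, src, bit_depth);
}
```

The destination's trailing fields keep whatever value the allocator gave them, which is why this pattern is usually paired with a zeroing allocation of the destination.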
[FFmpeg-devel] [PATCH 9/9] avcodec/bsf/dovi_rpu: add new bitstream filter
From: Niklas Haas This can be used to strip dovi metadata, or enable/disable dovi metadata compression. Possibly more use cases in the future. --- configure | 1 + doc/bitstream_filters.texi | 21 +++ libavcodec/bitstream_filters.c | 1 + libavcodec/bsf/Makefile| 1 + libavcodec/bsf/dovi_rpu.c | 258 + 5 files changed, 282 insertions(+) create mode 100644 libavcodec/bsf/dovi_rpu.c diff --git a/configure b/configure index 3bca638459..32076079e7 100755 --- a/configure +++ b/configure @@ -3437,6 +3437,7 @@ aac_adtstoasc_bsf_select="adts_header mpeg4audio" av1_frame_merge_bsf_select="cbs_av1" av1_frame_split_bsf_select="cbs_av1" av1_metadata_bsf_select="cbs_av1" +dovi_rpu_bsf_select="cbs_h265 cbs_av1 dovi_rpudec dovi_rpuenc" dts2pts_bsf_select="cbs_h264 h264parse" eac3_core_bsf_select="ac3_parser" evc_frame_merge_bsf_select="evcparse" diff --git a/doc/bitstream_filters.texi b/doc/bitstream_filters.texi index c03f04f858..918735e8c5 100644 --- a/doc/bitstream_filters.texi +++ b/doc/bitstream_filters.texi @@ -101,6 +101,27 @@ Remove zero padding at the end of a packet. Extract the core from a DCA/DTS stream, dropping extensions such as DTS-HD. +@section dovi_rpu + +Manipulate Dolby Vision metadata in a HEVC/AV1 bitstream, optionally enabling +metadata compression. + +@table @option +@item strip +If enabled, strip all Dolby Vision metadata (configuration record + RPU data +blocks) from the stream. +@item compression +A bit mask of compression methods to enable. +@table @samp +@item none +No compression. Selected automatically for keyframes. +@item vdr +Compress VDR metadata (color reshaping / data mapping parameters). +@item all +Enable all implemented compression methods. This is the default. 
+@end table +@end table + @section dump_extra Add extradata to the beginning of the filtered packets except when diff --git a/libavcodec/bitstream_filters.c b/libavcodec/bitstream_filters.c index 138246c50e..f923411bee 100644 --- a/libavcodec/bitstream_filters.c +++ b/libavcodec/bitstream_filters.c @@ -31,6 +31,7 @@ extern const FFBitStreamFilter ff_av1_metadata_bsf; extern const FFBitStreamFilter ff_chomp_bsf; extern const FFBitStreamFilter ff_dump_extradata_bsf; extern const FFBitStreamFilter ff_dca_core_bsf; +extern const FFBitStreamFilter ff_dovi_rpu_bsf; extern const FFBitStreamFilter ff_dts2pts_bsf; extern const FFBitStreamFilter ff_dv_error_marker_bsf; extern const FFBitStreamFilter ff_eac3_core_bsf; diff --git a/libavcodec/bsf/Makefile b/libavcodec/bsf/Makefile index fb70ad0c21..40b7fc6e9b 100644 --- a/libavcodec/bsf/Makefile +++ b/libavcodec/bsf/Makefile @@ -19,6 +19,7 @@ OBJS-$(CONFIG_H264_MP4TOANNEXB_BSF) += bsf/h264_mp4toannexb.o OBJS-$(CONFIG_H264_REDUNDANT_PPS_BSF) += bsf/h264_redundant_pps.o OBJS-$(CONFIG_HAPQA_EXTRACT_BSF) += bsf/hapqa_extract.o OBJS-$(CONFIG_HEVC_METADATA_BSF) += bsf/h265_metadata.o +OBJS-$(CONFIG_DOVI_RPU_BSF) += bsf/dovi_rpu.o OBJS-$(CONFIG_HEVC_MP4TOANNEXB_BSF) += bsf/hevc_mp4toannexb.o OBJS-$(CONFIG_IMX_DUMP_HEADER_BSF)+= bsf/imx_dump_header.o OBJS-$(CONFIG_MEDIA100_TO_MJPEGB_BSF) += bsf/media100_to_mjpegb.o diff --git a/libavcodec/bsf/dovi_rpu.c b/libavcodec/bsf/dovi_rpu.c new file mode 100644 index 00..c57c3d87dd --- /dev/null +++ b/libavcodec/bsf/dovi_rpu.c @@ -0,0 +1,258 @@ +/* + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. 
+ * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/common.h" +#include "libavutil/mem.h" +#include "libavutil/opt.h" + +#include "bsf.h" +#include "bsf_internal.h" +#include "cbs.h" +#include "cbs_bsf.h" +#include "cbs_av1.h" +#include "cbs_h265.h" +#include "dovi_rpu.h" +#include "h2645data.h" +#include "h265_profile_level.h" +#include "itut35.h" + +#include "hevc/hevc.h" + +typedef struct DoviRpuContext { +CBSBSFContext common; +DOVIContext dec; +DOVIContext enc; + +int strip; +int compression; +} DoviRpuContext; + +static int update_rpu(AVBSFContext *bsf, const AVPacket *pkt, int flags, + const uint8_t *rpu, size_t rpu_size, + uint8_t **out_rpu, int *out_size) +{ +DoviRpuContext *s = bsf->priv_data; +AVDOVIMetadata *metadata = NULL
[FFmpeg-devel] [PATCH] fftools/ffplay_renderer: use correct NULL value for Vulkan type
---
 fftools/ffplay_renderer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fftools/ffplay_renderer.c b/fftools/ffplay_renderer.c
index 80b700b3c5..f272cb46f1 100644
--- a/fftools/ffplay_renderer.c
+++ b/fftools/ffplay_renderer.c
@@ -766,7 +766,7 @@ static void destroy(VkRenderer *renderer)
         vkDestroySurfaceKHR = (PFN_vkDestroySurfaceKHR)
             ctx->get_proc_addr(ctx->inst, "vkDestroySurfaceKHR");
         vkDestroySurfaceKHR(ctx->inst, ctx->vk_surface, NULL);
-        ctx->vk_surface = NULL;
+        ctx->vk_surface = VK_NULL_HANDLE;
     }
 
     av_buffer_unref(&ctx->hw_device_ref);
-- 
2.44.2
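The patch does not spell out its motivation. A likely one is that Vulkan non-dispatchable handles such as VkSurfaceKHR are only pointer types on 64-bit targets and are 64-bit integers elsewhere, so assigning NULL to one is a type mismatch there. The sketch below mimics that situation without the Vulkan headers; DemoSurface, DEMO_NULL_HANDLE and the demo_* functions are hypothetical names, not Vulkan or ffplay API.

```c
#include <stdint.h>

/* Stand-ins that mimic how a non-dispatchable handle may be defined on
 * targets where it is a 64-bit integer rather than a pointer.
 * These names are hypothetical and are not part of the Vulkan API. */
typedef uint64_t DemoSurface;
#define DEMO_NULL_HANDLE 0

typedef struct DemoRenderer {
    DemoSurface surface;
} DemoRenderer;

/* Reset the handle with the handle-typed null constant. Writing
 * `r->surface = NULL;` here would assign a pointer constant to an integer,
 * which compilers diagnose when NULL is defined as ((void *)0). */
static void demo_destroy(DemoRenderer *r)
{
    r->surface = DEMO_NULL_HANDLE;
}

/* Returns nonzero while the renderer still holds a surface handle. */
static int demo_has_surface(const DemoRenderer *r)
{
    return r->surface != DEMO_NULL_HANDLE;
}
```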
[FFmpeg-devel] [PATCH] Allow enabling SVC in libaomenc
This patch updates libaomenc.c to accept parameters for SVC (Scalable Video Coding) settings via the FFmpeg API `av_opt_set`. The SVC configuration is applied based on the provided parameters. As libaom's SVC functionality only operates with constant bitrate encoding [1], these parameters will only take effect when the bitrate is set to constant. [1] https://aomedia.googlesource.com/aom/+/a7ef80c44bfb34b08254194b1ab72d4e93ff4b07/av1/encoder/svc_layercontext.h#115 --- libavcodec/libaomenc.c | 75 ++ 1 file changed, 75 insertions(+) diff --git a/libavcodec/libaomenc.c b/libavcodec/libaomenc.c index dec74ebecd..a8602a6b56 100644 --- a/libavcodec/libaomenc.c +++ b/libavcodec/libaomenc.c @@ -30,6 +30,7 @@ #include #include "libavutil/avassert.h" +#include "libavutil/avstring.h" #include "libavutil/base64.h" #include "libavutil/common.h" #include "libavutil/cpu.h" @@ -137,6 +138,7 @@ typedef struct AOMEncoderContext { int enable_diff_wtd_comp; int enable_dist_wtd_comp; int enable_dual_filter; +AVDictionary *svc_parameters; AVDictionary *aom_params; } AOMContext; @@ -201,6 +203,7 @@ static const char *const ctlidstr[] = { [AV1E_GET_TARGET_SEQ_LEVEL_IDX] = "AV1E_GET_TARGET_SEQ_LEVEL_IDX", #endif [AV1_GET_NEW_FRAME_IMAGE] = "AV1_GET_NEW_FRAME_IMAGE", +[AV1E_SET_SVC_PARAMS] = "AV1E_SET_SVC_PARAMS", }; static av_cold void log_encoder_error(AVCodecContext *avctx, const char *desc) @@ -382,6 +385,31 @@ static av_cold int codecctl_imgp(AVCodecContext *avctx, return 0; } +static av_cold int codecctl_svcp(AVCodecContext *avctx, +#ifdef UENUM1BYTE + aome_enc_control_id id, +#else + enum aome_enc_control_id id, +#endif + aom_svc_params_t *svc_params) +{ +AOMContext *ctx = avctx->priv_data; +char buf[80]; +int res; + +snprintf(buf, sizeof(buf), "%s:", ctlidstr[id]); + +res = aom_codec_control(&ctx->encoder, id, svc_params); +if (res != AOM_CODEC_OK) { +snprintf(buf, sizeof(buf), "Failed to get %s codec control", + ctlidstr[id]); +log_encoder_error(avctx, buf); +return AVERROR(EINVAL); 
+} + +return 0; +} + static av_cold int aom_free(AVCodecContext *avctx) { AOMContext *ctx = avctx->priv_data; @@ -673,6 +701,18 @@ static int choose_tiling(AVCodecContext *avctx, return 0; } +static void aom_svc_parse_int_array(int *dest, char *value, int max_entries) +{ +int dest_idx = 0; +char *saveptr = NULL; +char *token = av_strtok(value, ",", &saveptr); + +while (token && dest_idx < max_entries) { +dest[dest_idx++] = strtoul(token, NULL, 10); +token = av_strtok(NULL, ",", &saveptr); +} +} + static av_cold int aom_init(AVCodecContext *avctx, const struct aom_codec_iface *iface) { @@ -968,6 +1008,40 @@ static av_cold int aom_init(AVCodecContext *avctx, if (ctx->enable_intrabc >= 0) codecctl_int(avctx, AV1E_SET_ENABLE_INTRABC, ctx->enable_intrabc); +if (enccfg.rc_end_usage == AOM_CBR) { +aom_svc_params_t svc_params = {}; +svc_params.framerate_factor[0] = 1; +svc_params.number_spatial_layers = 1; +svc_params.number_temporal_layers = 1; + +const AVDictionaryEntry *en = NULL; +while ((en = av_dict_iterate(ctx->svc_parameters, en))) { +if (!strlen(en->value)) +return AVERROR(EINVAL); + +if (!strcmp(en->key, "number_spatial_layers")) +svc_params.number_spatial_layers = strtoul(en->value, NULL, 10); +else if (!strcmp(en->key, "number_temporal_layers")) +svc_params.number_temporal_layers = strtoul(en->value, NULL, 10); +else if (!strcmp(en->key, "max_quantizers")) +aom_svc_parse_int_array(svc_params.max_quantizers, en->value, AOM_MAX_LAYERS); +else if (!strcmp(en->key, "min_quantizers")) +aom_svc_parse_int_array(svc_params.min_quantizers, en->value, AOM_MAX_LAYERS); +else if (!strcmp(en->key, "scaling_factor_num")) +aom_svc_parse_int_array(svc_params.scaling_factor_num, en->value, AOM_MAX_SS_LAYERS); +else if (!strcmp(en->key, "scaling_factor_den")) +aom_svc_parse_int_array(svc_params.scaling_factor_den, en->value, AOM_MAX_SS_LAYERS); +else if (!strcmp(en->key, "layer_target_bitrate")) +aom_svc_parse_int_array(svc_params.layer_target_bitrate, en->value, 
AOM_MAX_LAYERS); +else if (!strcmp(en->key, "framerate_factor")) +aom_svc_parse_int_array(svc_params.framerate_factor, en->value, AOM_MAX_TS_LAYERS); +} + +res = codecctl_svcp(avctx, AV1E_SET_SVC_PARAMS, &svc_params); +if (res < 0) +return res; +} + #if AOM_ENC
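The aom_svc_parse_int_array() helper in the patch splits a comma-separated option value with av_strtok() and converts each token with strtoul(), without reporting malformed tokens or excess entries. As a sketch of what a stricter variant could look like in plain C, assuming the caller wants an error for bad input: parse_int_list() below is a hypothetical helper, not part of the patch, FFmpeg, or libaom.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a comma-separated list of non-negative integers from `value` into
 * `dest`, writing at most `max_entries` values.
 * Returns the number of entries parsed, or -1 on malformed input.
 * Hypothetical helper; not part of the patch or of libaom. */
static int parse_int_list(int *dest, const char *value, int max_entries)
{
    int n = 0;

    while (n < max_entries) {
        char *end;
        long v;

        errno = 0;
        v = strtol(value, &end, 10);
        if (end == value || errno == ERANGE || v < 0 || v > INT_MAX)
            return -1;              /* empty token, overflow or negative */
        dest[n++] = (int)v;

        if (*end == '\0')
            return n;               /* consumed the whole string */
        if (*end != ',')
            return -1;              /* unexpected character after a number */
        value = end + 1;
    }
    /* max_entries reached (or zero) with input left over */
    return -1;
}
```

Unlike av_strtok(), this does not modify the input string, so it can be applied directly to a const dictionary value.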
Re: [FFmpeg-devel] [PATCH 2/2] avformat/mov: default to Monoscopic view when parsing eyes box
On Sat, Jun 22, 2024 at 06:34:49PM -0300, James Almer wrote:
> On 6/22/2024 6:25 PM, Michael Niedermayer wrote:
> > On Fri, Jun 21, 2024 at 10:25:31PM -0300, James Almer wrote:
> > > Signed-off-by: James Almer
> > > ---
> > >  libavformat/mov.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > doesnt apply automatically with "git am" with the v2
> >
> > Applying: avformat/mov: default to Monoscopic view when parsing eyes box
> > error: sha1 information is lacking or useless (libavformat/mov.c).
> > error: could not build fake ancestor
> > Patch failed at 0001 avformat/mov: default to Monoscopic view when parsing
> > eyes box
> >
> > it applies with patch but inability to automatically apply patches
> > could affect tools which try to test patches posted
> >
> > git am --show-current-patch=diff | patch -p1
> > patching file libavformat/mov.c
> > Hunk #1 succeeded at 6546 with fuzz 2.
>
> Are you sure your tree is clean and up to date? There's no reason for this
> patch to not apply, standalone or after 1/1 v1 or v2.

the patch says this:
index 50e171c960..4fa39cf4fd 100644

git fetch origin
git fetch jamrial

git show 50e171c960
fatal: ambiguous argument '50e171c960': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git [...] -- [...]'

git show 4fa39cf4fd
fatal: ambiguous argument '4fa39cf4fd': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git [...] -- [...]'

so i think the blob this patch was based on is not in any repository known to
my git
It might be able to apply the patch anyway but not having the full file this
patch is based on makes it harder for git. I did have other patches applied.
so git would try to merge this in and if needed produce conflict markers but
given that it doesnt seem to have the file this was based on it freaked out

thx

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

While the State exists there can be no freedom; when there is freedom there
will be no State. -- Vladimir Lenin
Re: [FFmpeg-devel] [PATCH v3 2/2] lavc/hevcdec: Update slice index before hwaccel decode slice
On Mon, Jun 24, 2024 at 8:32 AM wrote:
>
> From: Fei Wang
>
> Otherwise, slice index will never update for hwaccel decode, and slice
> RPL will be always overlap into first one which use slice index to construct.
>
> Fixes hwaccel decoding after 47d34ba7fbb81
>
> Signed-off-by: Fei Wang
> ---
> 1. Update commit message.
>
>  libavcodec/hevc/hevcdec.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/hevc/hevcdec.c b/libavcodec/hevc/hevcdec.c
> index 39beb7e4dc..8bb564f1b3 100644
> --- a/libavcodec/hevc/hevcdec.c
> +++ b/libavcodec/hevc/hevcdec.c
> @@ -2770,6 +2770,9 @@ static int decode_slice_data(HEVCContext *s, const H2645NAL *nal, GetBitContext
>      const HEVCPPS *pps = s->pps;
>      int ret;
>
> +    if (!s->sh.first_slice_in_pic_flag)
> +        s->slice_idx += !s->sh.dependent_slice_segment_flag;
> +
>      if (!s->sh.dependent_slice_segment_flag && s->sh.slice_type != HEVC_SLICE_I) {
>          ret = ff_hevc_slice_rpl(s);
>          if (ret < 0) {
> @@ -2807,8 +2810,6 @@ static int decode_slice_data(HEVCContext *s, const H2645NAL *nal, GetBitContext
>      s->local_ctx[0].tu.cu_qp_offset_cb = 0;
>      s->local_ctx[0].tu.cu_qp_offset_cr = 0;
>
> -    s->slice_idx += !s->sh.dependent_slice_segment_flag;
> -
>      if (s->avctx->active_thread_type == FF_THREAD_SLICE &&
>          s->sh.num_entry_point_offsets > 0 &&
>          pps->num_tile_rows == 1 && pps->num_tile_columns == 1)
> --
> 2.34.1

I can confirm that this set fixes hwaccel with slices, LGTM from me.
Hopefully Anton can also quickly look over it, it's his changes after all.

- Hendrik
Re: [FFmpeg-devel] [PATCH v4 2/4] lavc/vp9dsp: R-V V mc bilin hv
On Saturday, 15 June 2024, 14:50:32 EEST, u...@foxmail.com wrote:
> From: sunyuechi
>
>                                    C908     X60
> vp9_avg_bilin_4hv_8bpp_c       :   10.7     9.5
> vp9_avg_bilin_4hv_8bpp_rvv_i32 :    4.0     3.5
> vp9_avg_bilin_8hv_8bpp_c       :   38.5    34.2
> vp9_avg_bilin_8hv_8bpp_rvv_i32 :    7.2     6.5
> vp9_avg_bilin_16hv_8bpp_c      :  147.2   130.5
> vp9_avg_bilin_16hv_8bpp_rvv_i32:   14.5    12.7
> vp9_avg_bilin_32hv_8bpp_c      :  574.2   509.7
> vp9_avg_bilin_32hv_8bpp_rvv_i32:   42.5    38.0
> vp9_avg_bilin_64hv_8bpp_c      : 2321.2  2017.7
> vp9_avg_bilin_64hv_8bpp_rvv_i32:  163.5   131.0
> vp9_put_bilin_4hv_8bpp_c       :   10.0     8.7
> vp9_put_bilin_4hv_8bpp_rvv_i32 :    3.5     3.0
> vp9_put_bilin_8hv_8bpp_c       :   35.2    31.2
> vp9_put_bilin_8hv_8bpp_rvv_i32 :    6.5     5.7
> vp9_put_bilin_16hv_8bpp_c      :  134.0   119.0
> vp9_put_bilin_16hv_8bpp_rvv_i32:   12.7    11.5
> vp9_put_bilin_32hv_8bpp_c      :  538.5   464.2
> vp9_put_bilin_32hv_8bpp_rvv_i32:   39.7    35.2
> vp9_put_bilin_64hv_8bpp_c      : 2111.7  1833.2
> vp9_put_bilin_64hv_8bpp_rvv_i32:  138.5   122.5
> ---
>  libavcodec/riscv/vp9_mc_rvv.S  | 38 +-
>  libavcodec/riscv/vp9dsp_init.c | 10 +
>  2 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/libavcodec/riscv/vp9_mc_rvv.S b/libavcodec/riscv/vp9_mc_rvv.S
> index fb7377048a..5241562531 100644
> --- a/libavcodec/riscv/vp9_mc_rvv.S
> +++ b/libavcodec/riscv/vp9_mc_rvv.S
> @@ -147,6 +147,40 @@ func ff_\op\()_vp9_bilin_64\type\()_rvv, zve32x
>  endfunc
>  .endm
>
> +.macro bilin_hv op
> +func ff_\op\()_vp9_bilin_64hv_rvv, zve32x
> +        vsetvlstatic8 64, t0, 64
> +.Lbilin_hv\op:
> +.ifc \op,avg
> +        csrwi         vxrm, 0
> +.endif
> +        neg           t1, a5
> +        neg           t2, a6
> +        li            t4, 8
> +        bilin_load_h  v24, put, a5
> +        add           a2, a2, a3
> +1:
> +        addi          a4, a4, -1
> +        bilin_load_h  v4, put, a5
> +        vwmulu.vx     v16, v4, a6
> +        vwmaccsu.vx   v16, t2, v24
> +        vwadd.wx      v16, v16, t4
> +        vnsra.wi      v16, v16, 4

Why round manually? It looks like vnclip.wi would be more straightforward
here.

> +        vadd.vv       v0, v16, v24
> +.ifc \op,avg
> +        vle8.v        v16, (a0)
> +        vaaddu.vv     v0, v0, v16
> +.endif
> +        vse8.v        v0, (a0)
> +        vmv.v.v       v24, v4
> +        add           a2, a2, a3
> +        add           a0, a0, a1
> +        bnez          a4, 1b
> +
> +        ret
> +endfunc
> +.endm
> +
>  .irp len, 64, 32, 16, 8, 4
>  copy_avg \len
>  .endr
> @@ -155,6 +189,8 @@ bilin_h_v put, h, a5
>  bilin_h_v avg, h, a5
>  bilin_h_v put, v, a6
>  bilin_h_v avg, v, a6
> +bilin_hv put
> +bilin_hv avg
>
>  .macro func_bilin_h_v len, op, type
>  func ff_\op\()_vp9_bilin_\len\()\type\()_rvv, zve32x
> @@ -165,7 +201,7 @@ endfunc
>
>  .irp len, 32, 16, 8, 4
>  .irp op, put, avg
> -.irp type, h, v
> +.irp type, h, v, hv
>  func_bilin_h_v \len, \op, \type
>  .endr
>  .endr
> diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
> index 9606d8545f..b3700dfb08 100644
> --- a/libavcodec/riscv/vp9dsp_init.c
> +++ b/libavcodec/riscv/vp9dsp_init.c
> @@ -83,6 +83,16 @@ static av_cold void vp9dsp_mc_init_riscv(VP9DSPContext *dsp, int bpp)
>      dsp->mc[4][FILTER_BILINEAR ][0][1][0] = ff_put_vp9_bilin_4h_rvv;
>      dsp->mc[4][FILTER_BILINEAR ][1][0][1] = ff_avg_vp9_bilin_4v_rvv;
>      dsp->mc[4][FILTER_BILINEAR ][1][1][0] = ff_avg_vp9_bilin_4h_rvv;
> +    dsp->mc[0][FILTER_BILINEAR ][0][1][1] = ff_put_vp9_bilin_64hv_rvv;
> +    dsp->mc[0][FILTER_BILINEAR ][1][1][1] = ff_avg_vp9_bilin_64hv_rvv;
> +    dsp->mc[1][FILTER_BILINEAR ][0][1][1] = ff_put_vp9_bilin_32hv_rvv;
> +    dsp->mc[1][FILTER_BILINEAR ][1][1][1] = ff_avg_vp9_bilin_32hv_rvv;
> +    dsp->mc[2][FILTER_BILINEAR ][0][1][1] = ff_put_vp9_bilin_16hv_rvv;
> +    dsp->mc[2][FILTER_BILINEAR ][1][1][1] = ff_avg_vp9_bilin_16hv_rvv;
> +    dsp->mc[3][FILTER_BILINEAR ][0][1][1] = ff_put_vp9_bilin_8hv_rvv;
> +    dsp->mc[3][FILTER_BILINEAR ][1][1][1] = ff_avg_vp9_bilin_8hv_rvv;
> +    dsp->mc[4][FILTER_BILINEAR ][0][1][1] = ff_put_vp9_bilin_4hv_rvv;
> +    dsp->mc[4][FILTER_BILINEAR ][1][1][1] = ff_avg_vp9_bilin_4hv_rvv;
>
>  #undef init_fpel
>  }

-- 
Rémi Denis-Courmont
http://www.remlab.net/
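The review question about manual rounding can be illustrated with a scalar model. The sketch below compares the add-8-then-shift-by-4 sequence from the patch with a rounding right shift in round-to-nearest-up mode, which is what a fixed-point instruction would apply when vxrm is 0. It models the rounding step only: a narrowing clip additionally saturates its result, which a plain narrowing shift does not, so whether the substitution is safe depends on the value range. All function names are hypothetical, and the code assumes the usual arithmetic right shift for negative values.

```c
#include <stdint.h>

/* Manual rounding as written in the patch: add half of the divisor, then
 * shift right by 4 (the vwadd.wx with 8 followed by vnsra.wi by 4). */
static int16_t round_manual(int16_t x)
{
    return (int16_t)((x + 8) >> 4);
}

/* Model of a rounding shift in round-to-nearest-up mode: shift right by d
 * and add back the most significant discarded bit. Requires d >= 1. */
static int16_t round_rnu(int16_t x, unsigned d)
{
    int16_t shifted = (int16_t)(x >> d);
    int16_t carry   = (int16_t)((x >> (d - 1)) & 1);
    return (int16_t)(shifted + carry);
}

/* Returns 1 if both forms agree for every input in [lo, hi], 0 otherwise. */
static int rounding_forms_agree(int lo, int hi)
{
    for (int x = lo; x <= hi; x++)
        if (round_manual((int16_t)x) != round_rnu((int16_t)x, 4))
            return 0;
    return 1;
}
```

The range passed to rounding_forms_agree() should stay within what int16_t can hold; the intermediate products in the macro are bounded by 15 * 255 in magnitude, so a range such as -4096 to 4096 covers them.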
Re: [FFmpeg-devel] [PATCH 2/4] lavc/vp8dsp: R-V V loop_filter_simple
On Saturday, 22 June 2024, 18:58:04 EEST, u...@foxmail.com wrote:
> From: sunyuechi
>
>                                    C908   X60
> vp8_loop_filter_simple_h_c       :  7.0   6.0
> vp8_loop_filter_simple_h_rvv_i32 :  3.2   2.7
> vp8_loop_filter_simple_v_c       :  7.2   6.5
> vp8_loop_filter_simple_v_rvv_i32 :  1.7   1.2
> ---
>  libavcodec/riscv/vp8dsp_init.c | 18 ++-
>  libavcodec/riscv/vp8dsp_rvv.S  | 87 ++
>  2 files changed, 104 insertions(+), 1 deletion(-)
>
> diff --git a/libavcodec/riscv/vp8dsp_init.c b/libavcodec/riscv/vp8dsp_init.c
> index dcb6307d5b..8c5b2c8b04 100644
> --- a/libavcodec/riscv/vp8dsp_init.c
> +++ b/libavcodec/riscv/vp8dsp_init.c
> @@ -49,6 +49,9 @@ VP8_BILIN(16, rvv256);
>  VP8_BILIN(8, rvv256);
>  VP8_BILIN(4, rvv256);
>
> +VP8_LF(rvv128);
> +VP8_LF(rvv256);
> +
>  av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
>  {
>  #if HAVE_RV
> @@ -147,9 +150,15 @@ av_cold void ff_vp78dsp_init_riscv(VP8DSPContext *c)
>  av_cold void ff_vp8dsp_init_riscv(VP8DSPContext *c)
>  {
>  #if HAVE_RVV
> +    int vlenb = ff_get_rv_vlenb();
> +
> +#define init_loop_filter(vlen) \
> +    c->vp8_v_loop_filter_simple = ff_vp8_v_loop_filter16_simple_rvv##vlen; \
> +    c->vp8_h_loop_filter_simple = ff_vp8_h_loop_filter16_simple_rvv##vlen;
> +
>      int flags = av_get_cpu_flags();
>
> -    if (flags & AV_CPU_FLAG_RVV_I32 && ff_rv_vlen_least(128)) {
> +    if (flags & AV_CPU_FLAG_RVV_I32 && vlenb >= 16) {
>  #if __riscv_xlen >= 64
>          if (flags & AV_CPU_FLAG_RVV_I64)
>              c->vp8_luma_dc_wht = ff_vp8_luma_dc_wht_rvv;
> @@ -159,6 +168,13 @@ av_cold void ff_vp8dsp_init_riscv(VP8DSPContext *c)
>          c->vp8_idct_dc_add4y = ff_vp8_idct_dc_add4y_rvv;
>          if (flags & AV_CPU_FLAG_RVV_I64)
>              c->vp8_idct_dc_add4uv = ff_vp8_idct_dc_add4uv_rvv;
> +
> +        if (vlenb >= 32) {
> +            init_loop_filter(256);
> +        } else {
> +            init_loop_filter(128);
> +        }
>      }
> +#undef init_loop_filter
>  #endif
>  }
> diff --git a/libavcodec/riscv/vp8dsp_rvv.S b/libavcodec/riscv/vp8dsp_rvv.S
> index 0cbf1672f7..b5f8bb31b4 100644
> --- a/libavcodec/riscv/vp8dsp_rvv.S
> +++ b/libavcodec/riscv/vp8dsp_rvv.S
> @@ -275,6 +275,93 @@ func ff_vp78_idct_dc_add4uv_rvv, zve64x
>          ret
>  endfunc
>
> +.macro filter_fmin len, vlen, a, f1, p0f2, q0f1
> +        vsetvlstatic16 \len, \vlen
> +        vsext.vf2     \q0f1, \a
> +        vmin.vx       \p0f2, \q0f1, a7
> +        vmin.vx       \q0f1, \q0f1, t3
> +        vadd.vi       \p0f2, \p0f2, 3
> +        vadd.vi       \q0f1, \q0f1, 4
> +        vsra.vi       \p0f2, \p0f2, 3
> +        vsra.vi       \f1, \q0f1, 3

vssra.vi

> +        vadd.vv       \p0f2, \p0f2, v8
> +        vsub.vv       \q0f1, v16, \f1
> +        vmax.vx       \p0f2, \p0f2, zero
> +        vmax.vx       \q0f1, \q0f1, zero
> +.endm
> +
> +.macro filter len, vlen, type, normal, inner, dst, stride, fE, fI, thresh
> +.ifc \type,v
> +        slli          a6, \stride, 1
> +        sub           t2, \dst, a6
> +        add           t4, \dst, \stride
> +        sub           t1, \dst, \stride
> +        vle8.v        v1, (t2)
> +        vle8.v        v11, (t4)
> +        vle8.v        v17, (t1)
> +        vle8.v        v22, (\dst)
> +.else
> +        addi          t1, \dst, -1
> +        addi          a6, \dst, -2
> +        addi          t4, \dst, 1
> +        vlse8.v       v1, (a6), \stride
> +        vlse8.v       v11, (t4), \stride
> +        vlse8.v       v17, (t1), \stride
> +        vlse8.v       v22, (\dst), \stride

vlsseg4e8.v

> +.endif
> +        vwsubu.vv     v12, v1, v11             // p1-q1
> +        vwsubu.vv     v24, v22, v17            // q0-p0
> +        vnclip.wi     v23, v12, 0

I can't find where VXRM is initialised for that.

> +        vsetvlstatic16 \len, \vlen
> +        // vp8_simple_limit(dst + i, stride, flim)
> +        li            a7, 2
> +        vneg.v        v18, v12
> +        vmax.vv       v18, v18, v12
> +        vneg.v        v8, v24
> +        vmax.vv       v8, v8, v24
> +        vsrl.vi       v18, v18, 1
> +        vmacc.vx      v18, a7, v8
> +        vmsleu.vx     v0, v18, \fE
> +
> +        li            t5, 3
> +        li            a7, 124
> +        li            t3, 123
> +        vmul.vx       v30, v24, t5
> +        vsext.vf2     v4, v23
> +        vzext.vf2     v8, v17                  // p0
> +        vzext.vf2     v16, v22                 // q0
> +        vadd.vv       v12, v30, v4

vwadd.wv

> +        vsetvlstatic8 \len, \vlen
> +        vnclip.wi     v11, v12, 0
> +        filter_fmin   \len, \vlen, v11, v24, v4, v6
> +        vsetvlstatic8 \len, \vlen
> +        vnclipu.wi    v4, v4, 0
> +
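The mask computation in the quoted macro (absolute differences, a shift by one, a multiply-accumulate by 2, then an unsigned compare against fE) corresponds to a simple per-pixel edge test. A scalar sketch of that test follows, modeled on the instruction sequence above; simple_limit() is a stand-in name chosen for this illustration and the pointer convention (p points at q0) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar model of the edge test the vector code builds its mask from:
 *     2 * |p0 - q0| + (|p1 - q1| >> 1) <= flim
 * p1 and p0 are the two pixels before the edge, q0 and q1 the two after.
 * `p` points at q0 and `stride` is the distance between neighbouring pixels
 * across the edge. Returns 1 when the edge should be filtered. */
static int simple_limit(const uint8_t *p, ptrdiff_t stride, int flim)
{
    int p1 = p[-2 * stride], p0 = p[-1 * stride];
    int q0 = p[0],           q1 = p[ 1 * stride];

    return 2 * abs(p0 - q0) + (abs(p1 - q1) >> 1) <= flim;
}
```

For a horizontal edge walk the stride would be 1, for a vertical one it would be the line size, which matches the two load patterns (strided and unit-stride) in the macro.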
Re: [FFmpeg-devel] [PATCH v2] lavu/stereo3d: change the horizontal FOV field to a rational
On 24/06/2024 17:51, Derek Buitenhuis wrote:
> On 6/24/2024 1:13 AM, James Almer wrote:
>> If Derek is also ok with this then LGTM.
>
> I do not object.
>
> - Derek

Thanks for the reviews. Pushed with the requested changes.
Re: [FFmpeg-devel] [PATCH] fftools/ffplay_renderer: use correct NULL value for Vulkan type
On 24/06/2024 20:48, Timo Rothenpieler wrote:
> ---
>  fftools/ffplay_renderer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fftools/ffplay_renderer.c b/fftools/ffplay_renderer.c
> index 80b700b3c5..f272cb46f1 100644
> --- a/fftools/ffplay_renderer.c
> +++ b/fftools/ffplay_renderer.c
> @@ -766,7 +766,7 @@ static void destroy(VkRenderer *renderer)
>          vkDestroySurfaceKHR = (PFN_vkDestroySurfaceKHR)
>              ctx->get_proc_addr(ctx->inst, "vkDestroySurfaceKHR");
>          vkDestroySurfaceKHR(ctx->inst, ctx->vk_surface, NULL);
> -        ctx->vk_surface = NULL;
> +        ctx->vk_surface = VK_NULL_HANDLE;
>      }
>
>      av_buffer_unref(&ctx->hw_device_ref);

Sure, LGTM
Thanks
Re: [FFmpeg-devel] [PATCH 2/2] configure: align conditional library deps assignments
On 2024-06-21 04:18 pm, Gyan Doshi wrote:
> ---
>  configure | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Plan to improve commit messages and push set in 24h.

> diff --git a/configure b/configure
> index 1e58c0dbac..db11a78c74 100755
> --- a/configure
> +++ b/configure
> @@ -7764,14 +7764,14 @@ enabled elbg_filter && prepend avfilter_deps "avcodec"
>  enabled find_rect_filter&& prepend avfilter_deps "avformat avcodec"
>  enabled fsync_filter&& prepend avfilter_deps "avformat"
>  enabled mcdeint_filter && prepend avfilter_deps "avcodec"
> -enabled movie_filter&& prepend avfilter_deps "avformat avcodec"
> +enabled movie_filter&& prepend avfilter_deps "avformat avcodec"
>  enabled pan_filter && prepend avfilter_deps "swresample"
>  enabled pp_filter && prepend avfilter_deps "postproc"
>  enabled qrencode_filter && prepend avfilter_deps "swscale"
>  enabled qrencodesrc_filter && prepend avfilter_deps "swscale"
>  enabled removelogo_filter && prepend avfilter_deps "avformat avcodec swscale"
>  enabled sab_filter && prepend avfilter_deps "swscale"
> -enabled scale_filter&& prepend avfilter_deps "swscale"
> +enabled scale_filter&& prepend avfilter_deps "swscale"
>  enabled scale2ref_filter&& prepend avfilter_deps "swscale"
>  enabled showcqt_filter && prepend avfilter_deps "avformat swscale"
>  enabled signature_filter&& prepend avfilter_deps "avcodec avformat"