On 10/6/15, James Almer <jamr...@gmail.com> wrote: > On 10/6/2015 4:40 PM, Paul B Mahol wrote: >> On 10/6/15, James Almer <jamr...@gmail.com> wrote: >>> Since AVFrame.extended_data is apparently not padded, simd functions >>> could in some cases overread, so make the decoder use a temp buffer >>> unconditionally. >>> >>> Signed-off-by: James Almer <jamr...@gmail.com> >>> --- >>> libavcodec/alac.c | 18 +++++------------- >>> 1 file changed, 5 insertions(+), 13 deletions(-) >>> >>> diff --git a/libavcodec/alac.c b/libavcodec/alac.c >>> index 146668e..394bd19 100644 >>> --- a/libavcodec/alac.c >>> +++ b/libavcodec/alac.c >>> @@ -80,7 +80,6 @@ typedef struct ALACContext { >>> int extra_bits; /**< number of extra bits beyond 16-bit */ >>> int nb_samples; /**< number of samples in the current frame */ >>> >>> - int direct_output; >>> int extra_bit_bug; >>> >>> ALACDSPContext dsp; >>> @@ -278,10 +277,6 @@ static int decode_element(AVCodecContext *avctx, >>> AVFrame *frame, int ch_index, >>> return AVERROR_INVALIDDATA; >>> } >>> alac->nb_samples = output_samples; >>> - if (alac->direct_output) { >>> - for (ch = 0; ch < channels; ch++) >>> - alac->output_samples_buffer[ch] = (int32_t >>> *)frame->extended_data[ch_index + ch]; >>> - } >>> >>> if (is_compressed) { >>> int16_t lpc_coefs[2][32]; >>> @@ -393,8 +388,9 @@ static int decode_element(AVCodecContext *avctx, >>> AVFrame >>> *frame, int ch_index, >>> break; >>> case 24: { >>> for (ch = 0; ch < channels; ch++) { >>> + int32_t *outbuffer = (int32_t >>> *)frame->extended_data[ch_index + >>> ch]; >>> for (i = 0; i < alac->nb_samples; i++) >>> - alac->output_samples_buffer[ch][i] <<= 8; >>> + *outbuffer++ = alac->output_samples_buffer[ch][i] << 8; >>> }} >>> break; >>> } >>> @@ -468,8 +464,7 @@ static av_cold int alac_decode_close(AVCodecContext >>> *avctx) >>> int ch; >>> for (ch = 0; ch < FFMIN(alac->channels, 2); ch++) { >>> av_freep(&alac->predict_error_buffer[ch]); >>> - if (!alac->direct_output) >>> - av_freep(&alac->output_samples_buffer[ch]); >>> + av_freep(&alac->output_samples_buffer[ch]); >>> av_freep(&alac->extra_bits_buffer[ch]); >>> } >>> >>> @@ -491,11 +486,8 @@ static int allocate_buffers(ALACContext *alac) >>> FF_ALLOC_OR_GOTO(alac->avctx, alac->predict_error_buffer[ch], >>> buf_size, buf_alloc_fail); >>> >>> - alac->direct_output = alac->sample_size > 16; >>> - if (!alac->direct_output) { >>> - FF_ALLOC_OR_GOTO(alac->avctx, >>> alac->output_samples_buffer[ch], >>> - buf_size, buf_alloc_fail); >>> - } >>> + FF_ALLOC_OR_GOTO(alac->avctx, alac->output_samples_buffer[ch], >>> + buf_size, buf_alloc_fail); >>> >>> FF_ALLOC_OR_GOTO(alac->avctx, alac->extra_bits_buffer[ch], >>> buf_size, buf_alloc_fail); >>> -- >>> 2.5.2 >>> >>> _______________________________________________ >>> ffmpeg-devel mailing list >>> ffmpeg-devel@ffmpeg.org >>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel >>> >> >> it should be padded and not introduce slowdown > > If you mean the temp buffers, they will be padded alongside the simd > functions > once i commit them. > But If you mean the avframe.extended_data buffer, could you take care of > that? > I'm not familiar enough with avframe to change the relevant alloc functions. > > running "time ffmpeg -v 0 -threads 1 -i INPUT -threads 1 -f null -" > (implicit > pcm_s16le output) > > Before > real 0m0.596s > user 0m0.000s > sys 0m0.000s > > After > real 0m0.575s > user 0m0.000s > sys 0m0.000s > > > running "time ffmpeg -v 0 -threads 1 -i INPUT -threads 1 -c:a pcm_s24le -f > null -" > > Before > real 0m0.618s > user 0m0.000s > sys 0m0.000s > > After > real 0m0.618s > user 0m0.000s > sys 0m0.000s > > With a ~1 minute 24 bit 44.1kh stereo sample. Curious that it's faster when > the > output is s16. > You'll probably have to do the same for the tak decoder before you commit > your > decorrelate simd patch, btw. It also uses avframe.extended_data buffer > directly > for 24bit samples. > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel >
you set aligned number of samples before calling get_buffer and after that changes frame->nb_samples to actual number of samples. Alternatively IIRC default 16 byte alignment could be increased. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel