On 10/6/2015 4:40 PM, Paul B Mahol wrote:
> On 10/6/15, James Almer <jamr...@gmail.com> wrote:
>> Since AVFrame.extended_data is apparently not padded, simd functions
>> could in some cases overread, so make the decoder use a temp buffer
>> unconditionally.
>>
>> Signed-off-by: James Almer <jamr...@gmail.com>
>> ---
>>  libavcodec/alac.c | 18 +++++-------------
>>  1 file changed, 5 insertions(+), 13 deletions(-)
>>
>> diff --git a/libavcodec/alac.c b/libavcodec/alac.c
>> index 146668e..394bd19 100644
>> --- a/libavcodec/alac.c
>> +++ b/libavcodec/alac.c
>> @@ -80,7 +80,6 @@ typedef struct ALACContext {
>>      int extra_bits;     /**< number of extra bits beyond 16-bit */
>>      int nb_samples;     /**< number of samples in the current frame */
>>
>> -    int direct_output;
>>      int extra_bit_bug;
>>
>>      ALACDSPContext dsp;
>> @@ -278,10 +277,6 @@ static int decode_element(AVCodecContext *avctx, AVFrame *frame, int ch_index,
>>          return AVERROR_INVALIDDATA;
>>      }
>>      alac->nb_samples = output_samples;
>> -    if (alac->direct_output) {
>> -        for (ch = 0; ch < channels; ch++)
>> -            alac->output_samples_buffer[ch] = (int32_t *)frame->extended_data[ch_index + ch];
>> -    }
>>
>>      if (is_compressed) {
>>          int16_t lpc_coefs[2][32];
>> @@ -393,8 +388,9 @@ static int decode_element(AVCodecContext *avctx, AVFrame *frame, int ch_index,
>>          break;
>>      case 24: {
>>          for (ch = 0; ch < channels; ch++) {
>> +            int32_t *outbuffer = (int32_t *)frame->extended_data[ch_index + ch];
>>              for (i = 0; i < alac->nb_samples; i++)
>> -                alac->output_samples_buffer[ch][i] <<= 8;
>> +                *outbuffer++ = alac->output_samples_buffer[ch][i] << 8;
>>          }}
>>          break;
>>      }
>> @@ -468,8 +464,7 @@ static av_cold int alac_decode_close(AVCodecContext *avctx)
>>      int ch;
>>      for (ch = 0; ch < FFMIN(alac->channels, 2); ch++) {
>>          av_freep(&alac->predict_error_buffer[ch]);
>> -        if (!alac->direct_output)
>> -            av_freep(&alac->output_samples_buffer[ch]);
>> +        av_freep(&alac->output_samples_buffer[ch]);
>>          av_freep(&alac->extra_bits_buffer[ch]);
>>      }
>>
>> @@ -491,11 +486,8 @@ static int allocate_buffers(ALACContext *alac)
>>          FF_ALLOC_OR_GOTO(alac->avctx, alac->predict_error_buffer[ch],
>>                           buf_size, buf_alloc_fail);
>>
>> -        alac->direct_output = alac->sample_size > 16;
>> -        if (!alac->direct_output) {
>> -            FF_ALLOC_OR_GOTO(alac->avctx, alac->output_samples_buffer[ch],
>> -                             buf_size, buf_alloc_fail);
>> -        }
>> +        FF_ALLOC_OR_GOTO(alac->avctx, alac->output_samples_buffer[ch],
>> +                         buf_size, buf_alloc_fail);
>>
>>          FF_ALLOC_OR_GOTO(alac->avctx, alac->extra_bits_buffer[ch],
>>                           buf_size, buf_alloc_fail);
>> --
>> 2.5.2
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>
> it should be padded and not introduce slowdown

If you mean the temp buffers, they will be padded alongside the simd
functions once I commit them. But if you mean the AVFrame.extended_data
buffer, could you take care of that? I'm not familiar enough with AVFrame
to change the relevant alloc functions.

Running "time ffmpeg -v 0 -threads 1 -i INPUT -threads 1 -f null -"
(implicit pcm_s16le output)

Before
real    0m0.596s
user    0m0.000s
sys     0m0.000s

After
real    0m0.575s
user    0m0.000s
sys     0m0.000s

Running "time ffmpeg -v 0 -threads 1 -i INPUT -threads 1 -c:a pcm_s24le -f null -"

Before
real    0m0.618s
user    0m0.000s
sys     0m0.000s

After
real    0m0.618s
user    0m0.000s
sys     0m0.000s

With a ~1 minute 24-bit 44.1 kHz stereo sample. Curious that it's faster
when the output is s16.

You'll probably have to do the same for the tak decoder before you commit
your decorrelate simd patch, btw. It also uses the AVFrame.extended_data
buffer directly for 24-bit samples.
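
For anyone following along, here is a rough standalone sketch of the idea: do the block-wise work in a padded temp buffer, then write the shifted 24-bit samples to the output. It is not the alac.c code. BLOCK, negate_blocks(), store_s24() and decode_frame_sketch() are hypothetical names made up for illustration, the padding size is an assumption, and plain calloc() stands in for whatever allocator the decoder actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical block width of the stand-in "SIMD" loop, in samples. */
#define BLOCK 4

/* Stand-in for a SIMD routine: it always processes whole blocks, so it
 * touches up to BLOCK - 1 elements past nb_samples. */
static void negate_blocks(int32_t *buf, size_t nb_samples)
{
    for (size_t i = 0; i < nb_samples; i += BLOCK)
        for (size_t j = 0; j < BLOCK; j++)
            buf[i + j] = -buf[i + j];
}

/* 24-bit case: write each sample shifted left by 8 to the output buffer.
 * The shift is done on an unsigned value to avoid shifting a negative int. */
static void store_s24(int32_t *dst, const int32_t *scratch, size_t nb_samples)
{
    for (size_t i = 0; i < nb_samples; i++)
        dst[i] = (int32_t)((uint32_t)scratch[i] << 8);
}

/* Process src in a padded temp buffer and store the result in dst, which is
 * exactly nb_samples long and is never handed to the block-wise routine.
 * Returns 0 on success, -1 on allocation failure. */
static int decode_frame_sketch(int32_t *dst, const int32_t *src, size_t nb_samples)
{
    assert(dst && src);

    /* Assumed padding: one extra block, zero-initialized. */
    int32_t *scratch = calloc(nb_samples + BLOCK, sizeof(*scratch));
    if (!scratch)
        return -1;

    memcpy(scratch, src, nb_samples * sizeof(*scratch));
    negate_blocks(scratch, nb_samples);   /* overrun lands in the padding */
    store_s24(dst, scratch, nb_samples);  /* exact-length write to dst */

    free(scratch);
    return 0;
}
```

The point is only that the over-read/over-write lands in memory the decoder owns, so the unpadded output buffer sees nothing but the final exact-length store.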