Hi,

On Wed, May 6, 2015 at 1:40 PM, Carl Eugen Hoyos <ceho...@ag.or.at> wrote:

> Ronald S. Bultje <rsbultje <at> gmail.com> writes:
>
> > +static void vert_4x4_c(uint8_t *_dst, ptrdiff_t stride,
> > +                       const uint8_t *left, const uint8_t *_top)
>
> Once upon a time, it was claimed that we must not
> use identifiers starting with "_".

Well, they're not really variable names, just pre-cast placeholders. I'm
basically just copying the approach that the hevc/h264 templating uses.
For example:

static void FUNC(put_hevc_pel_pixels)(int16_t *dst, uint8_t *_src,
                                      ptrdiff_t _srcstride, int height,
                                      intptr_t mx, intptr_t my, int width)
{
    int x, y;
    pixel *src = (pixel *)_src;
    ptrdiff_t srcstride = _srcstride / sizeof(pixel);

Note that all pre-cast placeholders start with a _ to prevent name
clashes with the post-cast variables of interest.

> Would it be slower to decode to YUV420P16 and set
> bits_per_coded_sample? (Just being curious.)

Someone capable and interested would need to test this. I simply copied
the hevc/h264 approach.

Theoretically, I think certain parts would be faster if we kept p10/p12,
e.g. the flat loop filter (since it can be done in 16 bits, and would
need to be done in 32 bits if we used p16). I also think the directional
predictors (the 3-tap ones, specifically) would be slightly more complex
in p16 than in p10/p12 (see also how we do it for 8-bit to keep it in
8 bits instead of going to 16 bits).

Overall, the effect would be minor, in the low single-digit percents or
perhaps even a fraction of a percent, but I would absolutely expect a
small performance gain from using p10/p12 over p16 with
bits_per_coded_sample. Also note that most of this would only be
noticeable after simd optimizations; in C there would be no difference.

Ronald
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
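
To make the 16-bit versus 32-bit argument above concrete for the 3-tap
case, here is a minimal sketch. It assumes the usual rounded form
(a + 2*b + c + 2) >> 2 for the 3-tap filter; the helper names
(max_3tap_sum, fits_in_u16, filter_3tap_u16) are hypothetical and are
not taken from the patch or from FFmpeg. The uint16_t cast stands in for
a 16-bit simd lane, which wraps around on overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Largest intermediate sum of the assumed 3-tap filter
 * (a + 2*b + c + 2) >> 2 when all three samples are at the
 * maximum value for the given bit depth. */
static uint32_t max_3tap_sum(int bits)
{
    assert(bits >= 1 && bits <= 16);
    uint32_t max_px = (1u << bits) - 1;
    return 4 * max_px + 2;
}

/* Nonzero if the value fits in an unsigned 16-bit lane. */
static int fits_in_u16(uint32_t v)
{
    return v <= UINT16_MAX;
}

/* 3-tap filter with the sum truncated to 16 bits, mimicking what a
 * 16-bit simd lane would hold. The result is only correct when
 * fits_in_u16(max_3tap_sum(bits)) holds for the sample depth in use. */
static uint16_t filter_3tap_u16(uint16_t a, uint16_t b, uint16_t c)
{
    uint16_t sum = (uint16_t)(a + 2 * b + c + 2); /* wraps like a 16-bit lane */
    return sum >> 2;
}
```

With 10-bit or 12-bit samples the worst-case sum (4094 and 16382
respectively) stays within 16 bits, so the whole filter can run in
16-bit lanes. With full-range 16-bit samples the worst case is 262142,
so the sum wraps and the filter needs wider intermediates (or a
different formulation), which is where the extra cost would come from.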