On 12/25/2016 1:11 PM, Ronald S. Bultje wrote:
> Hi,
>
> On Sat, Dec 24, 2016 at 9:29 AM, Paul B Mahol <one...@gmail.com> wrote:
>
>> On 12/24/16, Ronald S. Bultje <rsbul...@gmail.com> wrote:
>>> Hi,
>>>
>>> On Sat, Dec 24, 2016 at 6:09 AM, Paul B Mahol <one...@gmail.com> wrote:
>>>
>>>> On 12/24/16, Ronald S. Bultje <rsbul...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> On Fri, Dec 23, 2016 at 6:18 PM, James Almer <jamr...@gmail.com> wrote:
>>>>>
>>>>>> On 12/23/2016 8:00 PM, Ronald S. Bultje wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Fri, Dec 23, 2016 at 12:32 PM, Paul B Mahol <one...@gmail.com> wrote:
>>>>>>>
>>>>>>>> diff --git a/libavcodec/lossless_videodsp.h b/libavcodec/lossless_videodsp.h
>>>>>>>>
>>>>>>> [..]
>>>>>>>
>>>>>>>> @@ -32,6 +32,7 @@ typedef struct LLVidDSPContext {
>>>>>>>>
>>>>>>> [..]
>>>>>>>
>>>>>>>> + void (*add_magy_median_pred_int16)(uint16_t *dst, const uint16_t *top, const uint16_t *diff, unsigned mask, int w, int *left, int *left_top);
>>>>>>>>
>>>>>>>
>>>>>>> That seems wrong. Why would you add a magicuv-specific function to
>>>>>>> losslessdsp-context which is intended for functions shared between many
>>>>>>> (not just one) lossless codecs? You probably want a new dsp for magicyuv
>>>>>>> specifically.
>>>>>>>
>>>>>>> I know this is tedious, but we're very specifically trying to prevent
>>>>>>> dsputil from ever happening again.
>>>>>>>
>>>>>>> Ronald
>>>>>>
>>>>>> Some functions in this dsp are used only by huffyuv. Only one is used by
>>>>>> both huffyuv and magicyuv.
>>>>>> To properly apply what you mention, it would need to be split in two,
>>>>>> huffyuvdsp and lldsp, then this new function added to a new dsp called
>>>>>> magicyuvdsp.
>>>>>
>>>>>
>>>>> That would be even better, yes.
>>>>
>>>> What about yasm code?
>>>>
>>>> I wanted that to be commented.
>>>
>>>
>>> It's like dithering, it uses the immediately adjacent pixel in the next
>>> loop iteration, can you really simd this effectively?
>>
>> Apparently, and someone is making money from it.
>
>
> The parallelizable portion of it is the top-topleft, and you seem to do
> that already. Other than that, I don't see much to be done. You can
> probably use some mmxext instructions like pshufw to make life easier, but
> I think you'll always be limited by the inherent limitation.
>
> Ronald
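
For reference, here is a plain C sketch of what such a function computes. It is my own reconstruction from the prototype in the patch, not the patch's code: it assumes a HuffYUV-style median predictor, mid(left, top, left + top - topleft), with the gradient left unmasked and only the final sum masked by `mask`. `median3()` and `add_median_pred_int16_ref()` are hypothetical names. It shows the dependency Ronald describes: the top - topleft term depends only on the row above, but each output becomes the next iteration's `left`.

```c
#include <stdint.h>

/* Median of three values; portable stand-in for FFmpeg's mid_pred(). */
static int median3(int a, int b, int c)
{
    if (a > b) {            /* order a and b so that a <= b */
        int t = a;
        a = b;
        b = t;
    }
    if (b > c)              /* b = min(b, c) */
        b = c;
    return a > b ? a : b;   /* median = max(a, min(b, c)) */
}

/*
 * Hypothetical scalar reference for add_magy_median_pred_int16().
 * Assumption: HuffYUV-style median prediction where the gradient
 * left + top - topleft is NOT masked and only the final sum is.
 */
static void add_median_pred_int16_ref(uint16_t *dst, const uint16_t *top,
                                      const uint16_t *diff, unsigned mask,
                                      int w, int *left, int *left_top)
{
    int l  = *left;      /* reconstructed pixel to the left */
    int tl = *left_top;  /* pixel above-left */

    for (int i = 0; i < w; i++) {
        /* Depends only on the row above: this part can be computed
         * for several pixels at once. */
        int t_minus_tl = top[i] - tl;

        /* Depends on l, i.e. on the previous iteration's output:
         * this serial chain is what limits SIMD. */
        int pred = median3(l, top[i], l + t_minus_tl);

        l      = (int)((unsigned)(pred + diff[i]) & mask);
        tl     = top[i];
        dst[i] = (uint16_t)l;
    }

    *left     = l;
    *left_top = tl;
}
```

Only `t_minus_tl` can be hoisted out and batched; the median and the add still run one pixel at a time, which is the inherent limitation that caps any SIMD speedup here.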

He can turn the movq + psrlq + psllq + por at the end of the loop into two
movq + palignr for an ssse3 version of the function (still using mmx regs),
but not much more than that, I guess. And even that will probably not make a
noticeable difference, assuming it's actually faster.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel