On 12/25/2016 3:14 PM, James Almer wrote:
> On 12/25/2016 1:11 PM, Ronald S. Bultje wrote:
>> Hi,
>>
>> On Sat, Dec 24, 2016 at 9:29 AM, Paul B Mahol <one...@gmail.com> wrote:
>>
>>> On 12/24/16, Ronald S. Bultje <rsbul...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> On Sat, Dec 24, 2016 at 6:09 AM, Paul B Mahol <one...@gmail.com> wrote:
>>>>
>>>>> On 12/24/16, Ronald S. Bultje <rsbul...@gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Fri, Dec 23, 2016 at 6:18 PM, James Almer <jamr...@gmail.com> wrote:
>>>>>>
>>>>>>> On 12/23/2016 8:00 PM, Ronald S. Bultje wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On Fri, Dec 23, 2016 at 12:32 PM, Paul B Mahol <one...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> diff --git a/libavcodec/lossless_videodsp.h b/libavcodec/lossless_videodsp.h
>>>>>>>>>
>>>>>>>> [..]
>>>>>>>>
>>>>>>>>> @@ -32,6 +32,7 @@ typedef struct LLVidDSPContext {
>>>>>>>>>
>>>>>>>> [..]
>>>>>>>>
>>>>>>>>> +    void (*add_magy_median_pred_int16)(uint16_t *dst, const uint16_t *top, const uint16_t *diff, unsigned mask, int w, int *left, int *left_top);
>>>>>>>>>
>>>>>>>>
>>>>>>>> That seems wrong. Why would you add a magicuv-specific function to
>>>>>>>> losslessdsp-context which is intended for functions shared between many
>>>>>>>> (not just one) lossless codecs? You probably want a new dsp for magicyuv
>>>>>>>> specifically.
>>>>>>>>
>>>>>>>> I know this is tedious, but we're very specifically trying to prevent
>>>>>>>> dsputil from ever happening again.
>>>>>>>>
>>>>>>>> Ronald
>>>>>>>
>>>>>>> Some functions in this dsp are used only by huffyuv. Only one is used by
>>>>>>> both huffyuv and magicyuv.
>>>>>>> To properly apply what you mention, it would need to be split in two,
>>>>>>> huffyuvdsp and lldsp, then this new function added to a new dsp called
>>>>>>> magicyuvdsp.
>>>>>>
>>>>>>
>>>>>> That would be even better, yes.
>>>>>
>>>>> What about yasm code?
>>>>>
>>>>> I wanted that to be commented.
>>>>
>>>>
>>>> It's like dithering, it uses the immediately adjacent pixel in the next
>>>> loop iteration, can you really simd this effectively?
>>>
>>> Apparently, and someone is making money from it.
>>
>>
>> The parallelizable portion of it is the top-topleft, and you seem to do
>> that already. Other than that, I don't see much to be done. You can
>> probably use some mmxext instructions like pshufw to make life easier, but
>> I think you'll always be limited by the inherent limitation.
>>
>> Ronald
>
> He can turn the movq + psrlq + psllq + por at the end of the loop into two
> movq + palignr for an ssse3 version of the function (still using mmx regs),
> but not much more than that i guess.
> And even that will probably not make a noticeable difference, assuming it's
> actually faster.

Looks like it's about 3% faster.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
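
For anyone following the "can you really simd this" point, here is a rough scalar sketch of a 16-bit median predictor with the same argument list as the function pointer in the patch. It is only an illustration, not the code from the patch: the name add_median_pred_int16_sketch and the helper median3 are made up, and the exact prediction formula is assumed to be the usual left/top/gradient median. The thing to notice is that each output sample becomes `l` for the next iteration, which is the serial dependency Ronald describes, while `t - lt` depends only on the row above and could be computed for many samples at once.

```c
#include <stdint.h>

/* Hypothetical helper: median of three ints. */
static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; } /* now a <= b */
    if (b > c) b = c;                       /* b = min(b, c) */
    return a > b ? a : b;                   /* max(a, min(b, c)) */
}

/* Illustrative sketch only (assumed formula, hypothetical name).
 * dst[i] = (median(left, top, left + top - topleft) + diff[i]) & mask */
static void add_median_pred_int16_sketch(uint16_t *dst, const uint16_t *top,
                                         const uint16_t *diff, unsigned mask,
                                         int w, int *left, int *left_top)
{
    int l  = *left;     /* previously reconstructed sample in this row */
    int lt = *left_top; /* sample above `l` in the previous row */

    for (int i = 0; i < w; i++) {
        int t    = top[i];
        /* t - lt uses only the row above, so it does not depend on `l`. */
        int grad = (int)((unsigned)(l + t - lt) & mask);

        /* `l` is rewritten here and read again next iteration:
         * this is the loop-carried dependency. */
        l      = (int)((unsigned)(median3(l, t, grad) + diff[i]) & mask);
        lt     = t;
        dst[i] = (uint16_t)l;
    }

    *left     = l;
    *left_top = lt;
}
```

Calling it with top = {10, 20, 30, 40}, diff = {1, 2, 3, 4}, mask = 0xFFFF, left = 5 and left_top = 7 walks through the recurrence one sample at a time; no iteration can start before the previous one has produced its `l`.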