Hi,

2015-06-16 10:35 GMT+02:00 Stefano Sabatini <stefa...@gmail.com>:
> On date Tuesday 2015-06-16 10:20:31 +0200, wm4 encoded:
>> On Mon, 15 Jun 2015 17:55:35 +0200
>> Stefano Sabatini <stefa...@gmail.com> wrote:
>>
>> > On date Monday 2015-06-15 11:56:13 +0200, Stefano Sabatini encoded:
>> > [...]
>> > > From 3a75ef1e86360cd6f30b8e550307404d0d1c1dba Mon Sep 17 00:00:00 2001
>> > > From: Stefano Sabatini <stefa...@gmail.com>
>> > > Date: Mon, 15 Jun 2015 11:02:50 +0200
>> > > Subject: [PATCH] lavu/mem: add av_memcpynt() function with x86
>> > >  optimizations
>> > >
>> > > Assembly based on code from vlc dxva2.c, commit 62107e56 by Laurent Aimar
>> > > <fen...@videolan.org>.
>> > >
>> > > TODO: bump minor, update APIchanges
>> > > ---
>> > >  libavutil/mem.c          |  9 +++++
>> > >  libavutil/mem.h          | 14 ++++++++
>> > >  libavutil/mem_internal.h | 26 +++++++++++++++
>> > >  libavutil/x86/Makefile   |  1 +
>> > >  libavutil/x86/mem.c      | 85 ++++++++++++++++++++++++++++++++++++++++++++++++
>> > >  5 files changed, 135 insertions(+)
>> > >  create mode 100644 libavutil/mem_internal.h
>> > >  create mode 100644 libavutil/x86/mem.c
>> > >
>> > > diff --git a/libavutil/mem.c b/libavutil/mem.c
>> > > index da291fb..0e1eb01 100644
>> > > --- a/libavutil/mem.c
>> > > +++ b/libavutil/mem.c
>> > > @@ -42,6 +42,7 @@
>> > >  #include "dynarray.h"
>> > >  #include "intreadwrite.h"
>> > >  #include "mem.h"
>> > > +#include "mem_internal.h"
>> > >
>> > >  #ifdef MALLOC_PREFIX
>> > >
>> > > @@ -515,3 +516,11 @@ void av_fast_malloc(void *ptr, unsigned int *size, size_t min_size)
>> > >      ff_fast_malloc(ptr, size, min_size, 0);
>> > >  }
>> > >
>> > > +void av_memcpynt(void *dst, const void *src, size_t size, int cpu_flags)
>> > > +{
>> > > +#if ARCH_X86
>> > > +    ff_memcpynt_x86(dst, src, size, cpu_flags);
>> > > +#else
>> > > +    memcpy(dst, src, size, cpu_flags);
>> > > +#endif
>> > > +}
>> >
>> > Alternatively, what about something like:
>> >
>> > av_memcpynt_fn av_memcpynt_get_fn(void);
>> >
>> > modeled after av_pixelutils_get_sad_fn()? This would skip the need for
>> > a wrapper calling the right function.
>> >
>> I don't see much value in this, unless determining the right function
>> causes too much overhead.
>
> I see two advantages, 1. no branch and function call when the function
> is called, 2. the cpu_flags must not be passed around, so it's somehow
> safer.
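
To make the getter idea quoted above concrete, here is a minimal sketch of how such a selector might look. It is only an illustration under stated assumptions, not the code from the patch: every `demo_` name, the flag value, and the stubbed CPU query are hypothetical stand-ins (in FFmpeg the query would presumably be av_get_cpu_flags()), and the "optimized" variant simply forwards to memcpy() so that the example stays portable. Note that the fallback calls memcpy() with its standard three arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag value for illustration only; not taken from the patch. */
#define DEMO_CPU_FLAG_SSE2 0x0010

/* Function-pointer type in the spirit of the proposed av_memcpynt_fn. */
typedef void (*demo_memcpynt_fn)(void *dst, const void *src, size_t size);

/* Portable fallback: plain memcpy(), which takes exactly three arguments. */
static void demo_memcpynt_c(void *dst, const void *src, size_t size)
{
    assert(size == 0 || (dst && src));
    if (size)
        memcpy(dst, src, size);
}

/* Placeholder for an optimized variant. A real one would use non-temporal
 * loads/stores; here it forwards to memcpy() to keep the sketch portable. */
static void demo_memcpynt_sse2(void *dst, const void *src, size_t size)
{
    assert(size == 0 || (dst && src));
    if (size)
        memcpy(dst, src, size);
}

/* Stub for a runtime CPU query; always reports "no extensions" here. */
static int demo_get_cpu_flags(void)
{
    return 0;
}

/* Select the implementation once; callers keep the returned pointer, so
 * later copies need neither a branch nor a cpu_flags argument. */
demo_memcpynt_fn demo_memcpynt_get_fn(void)
{
    int cpu_flags = demo_get_cpu_flags();

    if (cpu_flags & DEMO_CPU_FLAG_SSE2)
        return demo_memcpynt_sse2;
    return demo_memcpynt_c;
}
```

A caller would fetch the pointer once at init time (`demo_memcpynt_fn copy = demo_memcpynt_get_fn();`) and then call `copy(dst, src, size)` for every surface, which is where the two advantages listed above would come from.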

Interesting approach. You could probably also use something similar to an
sws context, which you build up based on the surface size and other
characteristics (flags)?

Regards,
-- 
Gwenole Beauchesne
Intel Corporation SAS