On Mon, Nov 23, 2015 at 03:48:40PM -0500, Ganesh Ajjanagadde wrote:
> On Mon, Nov 23, 2015 at 2:13 PM, Michael Niedermayer <michae...@gmx.at> wrote:
> > On Mon, Nov 23, 2015 at 01:57:24PM -0500, Ganesh Ajjanagadde wrote:
> >> On Mon, Nov 23, 2015 at 1:02 PM, Michael Niedermayer <michae...@gmx.at>
> >> wrote:
> >> > On Mon, Nov 23, 2015 at 12:43:52PM -0500, Ganesh Ajjanagadde wrote:
> >> >> On Sun, Nov 22, 2015 at 3:56 PM, Ganesh Ajjanagadde <gajja...@mit.edu>
> >> >> wrote:
> >> >> > On Sun, Nov 22, 2015 at 3:07 PM, Michael Niedermayer
> >> >> > <michae...@gmx.at> wrote:
> >> >> >> On Sun, Nov 22, 2015 at 12:05:49PM -0500, Ganesh Ajjanagadde wrote:
> >> >> >>> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com>
> >> >> >>> ---
> >> >> >>>  libavfilter/vsrc_mandelbrot.c | 2 +-
> >> >> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >> >>>
> >> >> >>> diff --git a/libavfilter/vsrc_mandelbrot.c
> >> >> >>> b/libavfilter/vsrc_mandelbrot.c
> >> >> >>> index 950c5c8..a0c101e 100644
> >> >> >>> --- a/libavfilter/vsrc_mandelbrot.c
> >> >> >>> +++ b/libavfilter/vsrc_mandelbrot.c
> >> >> >>> @@ -291,7 +291,7 @@ static void draw_mandelbrot(AVFilterContext
> >> >> >>> *ctx, uint32_t *color, int linesize,
> >> >> >>>
> >> >> >>>      use_zyklus= (x==0 || s->inner!=BLACK ||color[x-1 +
> >> >> >>> y*linesize] == 0xFF000000);
> >> >> >>>      if(use_zyklus)
> >> >> >>> -        epsilon= scale*1*sqrt(SQR(x-s->w/2) +
> >> >> >>> SQR(y-s->h/2))/s->w;
> >> >> >>> +        epsilon= scale*hypot(x-s->w/2, y-s->h/2)/s->w;
> >> >> >>
> >> >> >> old:
> >> >> >>  704 decicycles in hypo, 1048570 runs, 6 skips
> >> >> >>
> >> >> >> new:
> >> >> >> 1075 decicycles in hypo, 1048566 runs, 10 skips
> >> >> >>
> >> >> >> that is from START/STOP_TIMER over hypot()
> >> >> >>
> >> >> >> the code is speed relevant as it's executed per pixel
> >> >> >
> >> >> > Thanks for testing. Looking more closely, I see no reason for
> >> >> > expensive sqrt calls anyway: one can simply square both sides; it
> >> >> > should be cheaper. Will rework, post benchmark if it is indeed faster
> >> >> > and does not suffer from floating point overflow, else will simply
> >> >> > push a trivial removal of the "1".
> >> >>
> >> >> It seems like getting rid of the sqrt altogether has a very slight
> >> >> positive impact (if any at all). I can post the patch, but would like
> >> >> to know what to benchmark. There are numerous choices, e.g.
> >> >> draw_mandelbrot as a whole, the outer loop, or the inner loop.
> >> >> I personally think the inner x loop (lines 268-388) is a good place to
> >> >> look at, since the difference is very small anyway, and further
> >> >> localization is impossible.
> >> >
> >> > please post the patch
> >>
> >> bench posted first to see if it is considered interesting enough.
> >> Bench over whole draw_mandelbrot using START/STOP timer on x86-64,
> >> Haswell, GNU/Linux, command line:
> >> ffmpeg -v error -f lavfi -i mandelbrot -f null -
> >> new (draw_mandelbrot):
> > [...]
> >> 20857881401 decicycles in draw_mandelbrot, 1024 runs, 0 skips
> >>
> >> old (draw_mandelbrot):
> > [...]
> >> 21393227201 decicycles in draw_mandelbrot, 1024 runs, 0 skips
> >
> > if this is consistent over several tries then it's interesting
>
> There is a reason why I am posting a full vector, since it is very
> hard to judge. I ran for a longer duration below. I do see a downward
> trend, but unfortunately the magnitude of the effect is unclear.
> Furthermore, there seem to be runtime variations in the actual numbers
> compared to the previous run, though they ran on the same hardware. I
> did not use any fancy tricks like core pinning etc, which could have
> helped in ensuring minimal background task interference.
>
> BTW, this filter is terribly slow as it zooms in, together with a
> bunch of messages at the info level "Mandelbrot cache is too small!"
> that do not seem very user friendly to me.

fixed

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel