On Tue, Oct 13, 2015 at 1:03 AM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote:
> On Tue, Oct 13, 2015 at 12:44 AM, Carl Eugen Hoyos <ceho...@ag.or.at> wrote:
>> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>>> On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos wrote:
>>> > Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>>> >
>>> >> Bench from libavfilter/astats on a 15 min clip.
>>> >
>>> > I believe that your test would indicate that the
>>> > old variant is faster or that no result can be
>>> > given which is what my tests show.
>>>
>>> Look at the bench and the numbers again, I have
>>> provided it above.
>>
>> Ok:
>> old:
>> 389 decicycles in abs, 64 runs, 0 skips
>> 350 decicycles in abs, 128 runs, 0 skips
>> 331 decicycles in abs, 256 runs, 0 skips
>> 321 decicycles in abs, 512 runs, 0 skips
>> 319 decicycles in abs, 1024 runs, 0 skips
>> 318 decicycles in abs, 2048 runs, 0 skips
>> 315 decicycles in abs, 4096 runs, 0 skips
>> 317 decicycles in abs, 8192 runs, 0 skips
>> 335 decicycles in abs, 16384 runs, 0 skips
>> 335 decicycles in abs, 32768 runs, 0 skips
>>
>> new:
>> 382 decicycles in abs, 64 runs, 0 skips
>> 361 decicycles in abs, 128 runs, 0 skips
>> 356 decicycles in abs, 256 runs, 0 skips
>> 334 decicycles in abs, 512 runs, 0 skips
>> 322 decicycles in abs, 1024 runs, 0 skips
>> 317 decicycles in abs, 2048 runs, 0 skips
>> 315 decicycles in abs, 4096 runs, 0 skips
>> 341 decicycles in abs, 8192 runs, 0 skips
>> 363 decicycles in abs, 16383 runs, 1 skips
>> 342 decicycles in abs, 32767 runs, 1 skips
>> Numbers with high skips or low runs are not so
>> relevant afaik.
>
> Not so relevant, but as I said: it is still better.
>
>>
>>> They are essentially identical in the best case
>>> (most number of runs), the new variant is faster in
>>> the worst case.
>>
>> I would say the opposite is true but we can certainly
>> agree that there is no proof that one is faster.
>
> Do a random float test, the difference is more pronounced.

Simple bench for all abs stuff:

#define _XOPEN_SOURCE 600 /* so that random() is declared under strict -std=c99 */
#include <math.h>
#include <time.h>
#include <float.h>
#include <stdlib.h>
#include <stdio.h>

#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))
#define NUM_TRIALS 100000
#define NUM_ITER 100000

static float f[NUM_TRIALS];
static double g[NUM_TRIALS];
static int i[NUM_TRIALS];
static long long ll[NUM_TRIALS];

int main(void)
{
    int c, d;
    clock_t start, end;
    double elapsed;

    /* Fill the inputs. Note that rand() and random() never return
     * negative values. */
    for (c = 0; c < NUM_TRIALS; ++c) {
        ll[c] = random();
        i[c] = rand();
        f[c] = (float)rand()/(float)(RAND_MAX/FLT_MAX);
        g[c] = (double)random()/(double)(RAND_MAX/DBL_MAX);
    }

    /* float: libm fabsf() vs. the macro */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            f[c] = fabsf(f[c]);
    end = clock();
    elapsed = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("fabsf: %lf\n", elapsed);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            f[c] = FFABS(f[c]);
    end = clock();
    elapsed = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", elapsed);

    /* double: libm fabs() vs. the macro */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            g[c] = fabs(g[c]);
    end = clock();
    elapsed = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("fabs: %lf\n", elapsed);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            g[c] = FFABS(g[c]);
    end = clock();
    elapsed = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", elapsed);

    /* int: abs() vs. the macro */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            i[c] = abs(i[c]);
    end = clock();
    elapsed = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("abs: %lf\n", elapsed);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            i[c] = FFABS(i[c]);
    end = clock();
    elapsed = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", elapsed);

    /* long long: llabs() vs. the macro */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            ll[c] = llabs(ll[c]);
    end = clock();
    elapsed = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("llabs: %lf\n", elapsed);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            ll[c] = FFABS(ll[c]);
    end = clock();
    elapsed = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", elapsed);

    return 0;
}

>
>>
>>> You have not provided a bench proving otherwise.
>>
>> old:
>> user 0m20.338s
>> user 0m20.408s
>> user 0m20.287s
>> user 0m20.365s
>> user 0m20.208s
>> new:
>> user 0m20.197s
>> user 0m20.577s
>> user 0m20.434s
>> user 0m20.322s
>> user 0m20.356s

I am also curious how you got your bench. What platform, what command line?

>
> The difference here is imo too small to say anything. My point is
> precisely this: on most inputs, there is no difference. On bad (worst
> case) inputs, using fabs instead of the macro is far superior. The
> random float bench proves this. Translating that to some audio file
> should be easy: I suspect placing most samples near a silence value
> (0) does this.
>
>>
>>> > I am not sure if it makes sense to apply a patch
>>> > that is meant to improve speed if this improvement
>>> > can't be shown.
>>>
>>> I believe I have shown it above clearly.
>>
>> Imo, you have shown clearly that neither variant can
>> be shown to be faster.

Now I have, with the above random bench.

>>
>> Carl Eugen
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel