On Mon, 2016-07-04 at 09:20 +0000, Carl Eugen Hoyos wrote:
> Dan Parrot <dan.parrot <at> mail.com> writes:
>
> > The dataset used was the entire FATE regression suite.
>
> I don't think this is a particularly useful testcase:
> It takes very long but mostly tests other things.
>
> Did you test if using ffmpeg -benchmark -f rawvideo -i /dev/zero...
> showed different results?
> I believe this should be both easier and faster to test.

Sorry, I don't understand what that command line just above is trying
to achieve. Could you elaborate?

> > name: rgb24ToY_c_vsx.
> > no. of calls: 9999. min: 3832 ns. avg: 4709 ns. max: 37550 ns.
> > total: 47093533 ns.
> >
> > name: rgb24ToY_c.
> > no. of calls: 9999. min: 3809 ns. avg: 4707 ns. max: 29041 ns.
> > total: 47072923 ns.
>
> Without any data, I would have thought that this is the most
> important function (and "no. of calls" seems to confirm this).
>
> Why is this not faster?

Surprisingly, gcc is producing some badly suboptimal assembly. I need to
follow up with IBM's Linux Technology Center.

The major issue is that multiplication of vector quantities in C is
generating as many multiplications in assembly as would scalar
multiplication in a loop. No way that should be occurring.

> Can you confirm with START_TIMER / STOP_TIMER that there is no
> gain?

SystemTap probes provide identical functionality by measuring deltas
between function entry and function return.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
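
For anyone who wants to see the entry/return delta idea in plain C, here
is a rough sketch of the bookkeeping behind the min/avg/max/total figures
quoted above. It is not the SystemTap script used for those numbers and
it is not libswscale code: timing_stats, now_ns, stats_add, stats_print,
toy_rgb24_to_y and timed_call are all hypothetical names, and the
conversion is a stand-in using the common 66/129/25 integer
approximation of BT.601 luma.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical per-function accumulator, one per measured function. */
typedef struct {
    const char *name;
    uint64_t calls, min_ns, max_ns, total_ns;
} timing_stats;

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Fold one entry-to-return delta into the running statistics. */
static void stats_add(timing_stats *s, uint64_t delta_ns)
{
    if (s->calls == 0 || delta_ns < s->min_ns)
        s->min_ns = delta_ns;
    if (delta_ns > s->max_ns)
        s->max_ns = delta_ns;
    s->total_ns += delta_ns;
    s->calls++;
}

/* Print in roughly the layout of the figures quoted above. */
static void stats_print(const timing_stats *s)
{
    assert(s->calls > 0);   /* avoid dividing by zero for the average */
    printf("name: %s.\n", s->name);
    printf("no. of calls: %llu. min: %llu ns. avg: %llu ns. max: %llu ns.\n",
           (unsigned long long)s->calls,
           (unsigned long long)s->min_ns,
           (unsigned long long)(s->total_ns / s->calls),
           (unsigned long long)s->max_ns);
    printf("total: %llu ns.\n", (unsigned long long)s->total_ns);
}

/* Stand-in for the function under test (NOT the libswscale code):
 * packed RGB24 to 8-bit luma, integer BT.601 approximation. */
static void toy_rgb24_to_y(uint8_t *dst, const uint8_t *src, int width)
{
    for (int i = 0; i < width; i++) {
        int r = src[3 * i], g = src[3 * i + 1], b = src[3 * i + 2];
        dst[i] = (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
    }
}

/* Timestamp at entry and at return, then record the difference. */
static void timed_call(timing_stats *s, uint8_t *dst,
                       const uint8_t *src, int width)
{
    uint64_t t0 = now_ns();
    toy_rgb24_to_y(dst, src, width);
    stats_add(s, now_ns() - t0);
}
```

Calling timed_call() in a loop over the same input and then
stats_print() once at the end would give one block of figures per
measured function. The difference from a probe-based approach is that
this needs the call site to be edited, whereas probes attach from
outside.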