Jan 12, 2021, 19:28 by reimar.doeffin...@gmx.de:

>> On 10 Jan 2021, at 19:55, Lynne <d...@lynne.ee> wrote:
>>
>> Jan 10, 2021, 17:43 by reimar.doeffin...@gmx.de:
>>
>>> From: Reimar Döffinger <reimar.doeffin...@gmx.de>
>>>
>>> real 0m15.040s
>>> user 0m18.874s (80.7% of original)
>>> sys 0m0.168s
>>>
>>
>> I think I have to disagree.
>> The performance gains are marginal,
>>
>
> It’s almost 20%. At least for this combination of
> codec and stream a large amount of time is spent in
> non-DSP functions, so even hand-written assembler
> won’t give you huge gains.
>

That's a non-guaranteed 20% on a single system. It could change, and it could
very well break things the way gcc's autovectorization does, which we still
explicitly disable with -fno-tree-vectorize because FATE fails. I was the one
who somewhat recently sent an RFC to try to undo that, and even though it was
only an RFC, the reaction from devs was quite cold.
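
For readers following the thread, here is a minimal sketch of the kind of per-loop hint being debated. It is not taken from the patch under discussion: the macro name EXAMPLE_VECTORIZE_LOOP and the function add_clip_u8 are hypothetical, and the sketch assumes clang's loop pragma for the hint, with the macro expanding to nothing on other compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro (an assumption for illustration, not from the patch):
 * requests vectorization of the next loop on clang, no-op elsewhere. */
#if defined(__clang__)
#define EXAMPLE_VECTORIZE_LOOP _Pragma("clang loop vectorize(enable)")
#else
#define EXAMPLE_VECTORIZE_LOOP
#endif

/* Add a constant offset to each sample and clip the result to 8 bits.
 * A simple scalar loop that a compiler may or may not vectorize unaided. */
static void add_clip_u8(uint8_t *dst, const uint8_t *src, int offset, size_t n)
{
    EXAMPLE_VECTORIZE_LOOP
    for (size_t i = 0; i < n; i++) {
        int v = src[i] + offset;
        dst[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}
```

The hint only asks the compiler to try; the scalar semantics of the loop are unchanged, which is why the disagreement here is about maintenance and trust in compilers rather than about correctness of the source.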

>> it's definitely something the compiler should
>> be able to decide on its own,
>>
>
> So you object to unlikely() macros as well?
> It’s really just giving the compiler a hint it should try,
> though I admit the configure part makes it look otherwise.
>

I'm more against the macro and the changes to the code itself. If you can make
it work without adding a macro to individual loops, or the likes of
av_cold/av_hot, or any other changes to the code, I'll be more welcoming.
I really _hate_ compiler hints. Take a look at the upipe source code to see
what a Cthulhian monstrosity made of hint flags looks like. Every single
branch had a cold/hot macro and it was the project's coding style.
It's completely irredeemable.

>> Most of the loops this is added to are trivially SIMDable.
>>
>
> How many hours of effort do you consider “trivial”?
> Especially if it’s someone not experienced?
> It might be fairly trivial with intrinsics, however
> many of your counter-arguments also apply
> to intrinsics (and to a degree inline assembly).
> That’s btw not just a rhetorical question because
> I’m pretty sure I am not going to all the trouble
> to port more of the arm 32-bit assembler functions
> since it’s a huge PITA, and I was wondering if there
> was a point to even have a try with intrinsics...
>

Intrinsics and inline assembly are a whole different thing from magic macros
that tell, and force, the compiler to do what a well-written compiler should
already know to do.

>> Just because no one has had the motivation to do SIMD
>> for a pretty unpopular codec doesn't mean we should
>> compromise.
>>
>
> If you think of AArch64 specifically, I can
> kind of agree.
> However I wouldn’t say the word “compromise”
> is appropriate when there’s a good chance nothing
> better will ever come to exist.
> But the real point is not AArch64, that is just
> a very convenient test platform.
> The point is to raise the minimum bar.
> A new architecture, RISC-V for example or something
> else should not be stuck at scalar performance
> until someone actually gets around to implementing
> assembler optimizations.
> And just to be clear: I don’t actually care about
> HEVC, it just seemed a nice target to do some
> experiments.
>

I already said all that can be said here: this will halt efforts on actually
optimizing the code in exchange for naive trust in compilers.
New platforms will be stuck at scalar performance anyway until the compilers
for the arch are smart enough to deal with vectorization.