On Wed, Aug 19, 2020 at 11:35:11AM -0700, Nick Desaulniers wrote: > On Wed, Aug 19, 2020 at 7:08 AM Arvind Sankar <[email protected]> wrote: > > > > gcc can transform the loop in a naive implementation of memset/memcpy > > etc into a call to the function itself. This optimization is enabled by > > -ftree-loop-distribute-patterns. > > > > This has been the case for a while (see eg [0]), but gcc-10.x enables > > this option at -O2 rather than -O3 as in previous versions. > > > > Add -ffreestanding, which implicitly disables this optimization with > > gcc. It is unclear whether clang performs such optimizations, but > > hopefully it will also not do so in a freestanding environment. > > > > [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888 > > > > Signed-off-by: Arvind Sankar <[email protected]> > > For Clang: > For x86_64 defconfig: > This results in no change for the code generated. > > For aarch64 defconfig: > This results in calls to bcmp being replaced with calls to memcmp in > strstr and strnstr. I plan on adding -fno-built-bcmp then removing > bcmp anyways. Not a bug either way, just noting the difference is > disassembly. > > For arm defconfig: > This results in no change for the code generated. > > I should check the other architectures we support, but my local build > doesn't have all backends enabled currently; we'll catch it once it's > being testing in -next if it's an issue, but I don't foresee it > (knocks on wood, famous last words, ...) > > If it helps GCC not optimize these core functions into infinite > recursion, I'm for that, especially since I'd bet these get called > frequently and early on in boot, which is my least favorite time to > debug. > > Reviewed-by: Nick Desaulniers <[email protected]> > Tested-by: Nick Desaulniers <[email protected]> >
I verified that arch/c6x with gcc-10 downloaded from kernel.org has the broken memset with CC_OPTIMIZE_FOR_PERFORMANCE and gets fixed with this patch. The default is optimize for size though, which doesn't seem to be busted. Thanks.

