I've previously posted a patch to add vector/vsx inline expansion of strcmp/strncmp for the power8/power9 processors. Here are some of the other items I have in the pipeline that I hope to get into gcc9:
* vector/vsx support for inline expansion of memcmp to non-loop code. This improves performance of small memcmp.
* vector/vsx support for inline expansion of memcmp to loop code. This will close the performance gap for lengths of about 128-512 bytes by making the loop code closer to the performance of the library memcmp.
* generate inline expansion to a loop for strcmp/strncmp. This closes another performance gap: the vector/vsx code currently generated for strcmp/strncmp is much faster than the library call, but we only generate comparison of 64 bytes to avoid exploding code size. Similar code in a loop would be compact and would allow comparing maybe the first 512 bytes inline before falling back to the library function.

If anyone has any other input on the inline expansion work I've been doing for the rs6000 target, please let me know.

Thanks!
   Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
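
For readers less familiar with the strcmp/strncmp loop item, here is a rough C sketch of the shape of code it describes: compare a bounded prefix inline in fixed-size chunks, then hand the rest to the library. This is only an illustration, not the rs6000 expansion itself, which is generated by the compiler and would use vector/vsx compares rather than a byte loop. The name bounded_inline_strcmp and the INLINE_LIMIT and CHUNK constants are hypothetical; 512 comes from the "maybe the first 512 bytes" figure above, and 16 merely stands in for one vector register.

```c
#include <stddef.h>
#include <string.h>

#define INLINE_LIMIT 512 /* assumed inline budget in bytes ("maybe 512") */
#define CHUNK 16         /* assumed chunk size; stands in for one vector */

/* Hypothetical helper, not a GCC or libc function. */
static int bounded_inline_strcmp(const char *a, const char *b)
{
    /* Outer loop: one iteration per chunk, so code size stays constant
       no matter how large INLINE_LIMIT is. */
    for (size_t off = 0; off < INLINE_LIMIT; off += CHUNK) {
        /* Inner loop: a scalar stand-in for a single vector compare that
           checks for a mismatch or a terminating NUL within the chunk. */
        for (size_t i = 0; i < CHUNK; i++) {
            unsigned char ca = (unsigned char)a[off + i];
            unsigned char cb = (unsigned char)b[off + i];
            if (ca != cb)
                return (ca > cb) - (ca < cb);
            if (ca == '\0')
                return 0; /* both strings ended, all bytes equal */
        }
    }
    /* Inline budget used up with no difference found: fall back to the
       library function for the remainder of the strings. */
    return strcmp(a + INLINE_LIMIT, b + INLINE_LIMIT);
}
```

The sketch stops at the first mismatch or NUL, so it never reads past the end of either string. A real vector version has to take extra care that a 16-byte load does not cross into an unmapped page.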