https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88793
--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Florian Weimer from comment #2)
> The startup overhead isn't the problem. The asymptotic performance is
> really bad, too. (I hope I didn't botch my test, though. It's vaguely
> based on what's attached to the downstream bug.)
>
> For len == 5000, I get a factor of 60 difference in favor of glibc 2.28's
> strlen. For len == 30, it's still a factor of 11 in favor of strlen. This
> is on a machine with a i7-8650U, so a fairly recent CPU with erms.

As noted in the referenced bug, erms does not accelerate scasb and cmpsb (only
movs and stos), so strlen and memcmp/strcmp are among the most extreme examples.
I wrongly assumed gcc did not use scasb to implement strlen inline.

I think it's fair to raise the question if gcc should not use scasb/cmpsb by
default (I thought there was a bug for that but apparently there isn't?).

I doubt it supports the original point about attribute-cold being inappropriate.
If gcc is making a poor decision in cold regions, it will be making the same
poor decision everywhere under -Os, and it's fair to demand that such decisions
are revisited and improved (-Os is not "minimize size at all costs").
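
For anyone wanting to check the code generation discussed above, a minimal sketch follows. The function names len_cold and len_loop are hypothetical and not taken from the bug or its attachments. The assumption is an x86 target and a GCC version that expands strlen inline when optimizing for size. Whether scasb actually appears depends on the GCC version, target, and flags, so the assembly (e.g. from gcc -Os -S) has to be inspected rather than assumed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cold-path helper. The cold attribute asks GCC to optimize
   this function for size, so the strlen call may be expanded inline
   (possibly as repnz scasb on x86) instead of calling the library. */
__attribute__((cold, noinline))
size_t len_cold(const char *s)
{
    return strlen(s);
}

/* Plain byte loop for comparison. The compiler is free to transform
   this as well, so check the generated assembly for both functions. */
size_t len_loop(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Both functions return the same length for any NUL-terminated string. Only the generated instructions, and therefore the speed, may differ. Compiling with -fno-builtin-strlen should force a library call in len_cold, which gives a baseline for comparison.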