http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55701
Bug #: 55701
Summary: Inline some instances of memset for ARM
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: josh.m.con...@gmail.com

memset() is almost never inlined on ARM, even at -O3. If the target is known to be 4-byte aligned or greater, it will be inlined for 1, 2, or 4 byte lengths. If the target alignment is unknown, it will be inlined only for a single byte.

I don't see this problem with similar builtins (memcpy, memmove, and memclear (memset with a target value of zero)) - they all inline small cases. It probably makes sense for memset to be inlined up to at least 16 bytes or so in all cases.

When aligned, memcpy and memmove use a ldmia/stmia (load multiple/store multiple) sequence to create fairly compact inline code. We could consider doing the same sort of optimization with memset, using stmia only.
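
The cases described above can be sketched as a small C test file. This is an illustration only, not a testcase attached to the report: the function names, the struct, and the fill value 0xAB are all hypothetical, and the comments restate the behaviour the report describes rather than verified compiler output. The last function shows, in plain C, roughly what an stmia-style expansion would have to compute (replicate the byte into a word, then store four words); it assumes a 32-bit unsigned int.

```c
#include <string.h>

/* Hypothetical 16-byte object with 4-byte alignment. */
struct aligned16 { unsigned int w[4]; };

/* Known 4-byte alignment, 4 bytes: the report says this is inlined. */
void set_aligned_4(unsigned int *p) { memset(p, 0xAB, 4); }

/* Known 4-byte alignment, 16 bytes, nonzero value:
   per the report this becomes a call to memset. */
void set_aligned_16(struct aligned16 *p) { memset(p, 0xAB, sizeof *p); }

/* Unknown alignment, 2 bytes: per the report only the
   1-byte case is inlined, so this would also be a call. */
void set_unknown_2(char *p) { memset(p, 0xAB, 2); }

/* Zero fill: the report says small clears like this are inlined. */
void clear_aligned_16(struct aligned16 *p) { memset(p, 0, sizeof *p); }

/* Sketch of the proposed expansion: replicate the byte into a
   32-bit word and store it four times. On ARM the four stores
   could become a single stmia. */
void store_words_16(struct aligned16 *p, int c)
{
    unsigned int v = (unsigned char)c * 0x01010101u;
    p->w[0] = v;
    p->w[1] = v;
    p->w[2] = v;
    p->w[3] = v;
}
```

Assuming an ARM cross-compiler such as arm-none-eabi-gcc is available, compiling the file with -O3 -S and searching the assembly for calls to memset should show which of these cases are inlined.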