Hi Richard,

>> A common case is a constant string which is compared against some
>> argument. Most string functions work on 8 or 16-byte quantities. If we
>> ensure the whole array fits in one aligned load, we save time in the
>> string function.
>>
>> Runtime data collected for strlen calls shows 97+% has 8-byte alignment
>> or higher - this kind of overalignment helps achieving that.
>
> Ah, ok. But aren't we then losing that advantage for 4-byte arrays?
> Or are you assuming a 4-byte path too? Or is strlen just very unlikely
> for such small data?

The advantage comes from being aligned enough. E.g. a strlen implementation
may start like this:

    bic     src, srcin, 15
    ld1     {vdata.16b}, [src]            // 16-byte aligned load
    cmeq    vhas_nul.16b, vdata.16b, 0    // check for NUL byte

It always does a 16-byte aligned load and tests for the end of the string.
So we want to ensure that small strings fit fully inside the first 16-byte
load (if not, it takes almost twice the number of instructions, even if the
string is only 4 bytes). 4-byte alignment is enough to ensure this.

Another approach is to always load the first 16 bytes from the start of the
string (if not close to the end of a page). That is often an unaligned load,
and then the difference between 4- and 8-byte alignment is negligible.

Cheers,
Wilco
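
PS: a minimal C sketch of the alignment arithmetic, in case it helps. It is
not taken from any real strlen implementation; the helper names
fits_in_first_chunk and always_fits are made up for illustration, and the
16-byte chunk size is simply the load width from the snippet above. It checks
whether an array of a given size, placed at any address with a given
alignment, stays inside the 16-byte aligned chunk that its start address
falls in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Width of the aligned load in the strlen prologue (assumed: 16 bytes). */
#define CHUNK 16u

static_assert((CHUNK & (CHUNK - 1)) == 0, "CHUNK must be a power of two");

/* True if the object [addr, addr + size) lies entirely inside the
   CHUNK-aligned block that contains addr. size must be non-zero. */
static bool fits_in_first_chunk(uintptr_t addr, size_t size)
{
    /* Round down to the chunk start, like "bic src, srcin, 15". */
    uintptr_t chunk_start = addr & ~(uintptr_t)(CHUNK - 1);
    return addr + size <= chunk_start + CHUNK;
}

/* True if an object of the given size fits in its first chunk for every
   start address with the given alignment. align must be a non-zero power
   of two no larger than CHUNK. Checking the offsets within one chunk is
   enough because the pattern repeats every CHUNK bytes. */
static bool always_fits(size_t size, size_t align)
{
    for (uintptr_t addr = 0; addr < CHUNK; addr += align)
        if (!fits_in_first_chunk(addr, size))
            return false;
    return true;
}
```

With these definitions, always_fits(4, 4) and always_fits(8, 8) hold, while
always_fits(4, 1) does not: a 4-byte array starting at offset 13 spills into
the next chunk. The same happens for an 8-byte array with only 4-byte
alignment (offset 12).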