Richard Sandiford <richard.sandif...@arm.com> writes:
> Wilco Dijkstra <wilco.dijks...@arm.com> writes:
>> Hi Richard,
>>
>>>> A common case is a constant string which is compared against some
>>>> argument. Most string functions work on 8 or 16-byte quantities. If we
>>>> ensure the whole array fits in one aligned load, we save time in the
>>>> string function.
>>>>
>>>> Runtime data collected for strlen calls shows 97+% has 8-byte alignment
>>>> or higher - this kind of overalignment helps achieving that.
>>>
>>> Ah, ok.  But aren't we then losing that advantage for 4-byte arrays?
>>> Or are you assuming a 4-byte path too?  Or is strlen just very unlikely
>>> for such small data?
>>
>> The advantage comes from being aligned enough. Eg. a strlen implementation
>> may start like this:
>>
>>     bic     src, srcin, 15
>>     ld1     {vdata.16b}, [src]            // 16-byte aligned load
>>     cmeq    vhas_nul.16b, vdata.16b, 0    // check for NUL byte
>>
>> It always does a 16-byte aligned load and test for the end of the string.
>> So we want to ensure that small strings fully fit inside the first 16-byte
>> load (if not, it takes almost twice the number of instructions even if the
>> string is only 4 bytes). 4-byte alignment is enough to ensure this.
>
> Ah, I see.  Can you add a summary of these explanations as a comment,
> so that someone reading it later will understand the rationale?
>
> OK with that change.

It looks like you committed the original version instead, with no extra
explanation.  I suppose I should have asked for another review round
instead.

Richard

> Thanks,
> Richard
>
>>
>> Another approach is to always load the first 16 bytes from the start of
>> the string (if not close to the end of a page). That is often an unaligned
>> load, and then the difference between 4- and 8-byte alignment is negligible.
>>
>> Cheers,
>> Wilco
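
For reference, the alignment argument quoted above can be checked with a
little arithmetic. The following is only a sketch in C, not code from the
patch or from any libc: the helper names (fits_in_first_block, always_fits,
check_examples) are hypothetical, and the 16-byte block size is taken from
the "bic src, srcin, 15" sequence in the quoted strlen prologue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if the SIZE-byte object starting at ADDR lies
   entirely inside the 16-byte aligned block that contains ADDR, i.e. the
   block fetched by the first aligned load of the quoted strlen prologue.  */
static bool fits_in_first_block(uintptr_t addr, size_t size)
{
    uintptr_t offset = addr & 15;   /* bytes between block start and ADDR */
    return offset + size <= 16;
}

/* Hypothetical helper: true if every ALIGN-aligned start address keeps a
   SIZE-byte object inside one 16-byte block.  ALIGN is assumed to be a
   nonzero power of two no larger than 16.  */
static bool always_fits(size_t align, size_t size)
{
    for (uintptr_t addr = 0; addr < 16; addr += align)
        if (!fits_in_first_block(addr, size))
            return false;
    return true;
}

/* A few worked cases.  */
static void check_examples(void)
{
    assert(always_fits(4, 4));     /* 4-byte array, 4-byte aligned: fits */
    assert(!always_fits(1, 4));    /* unaligned: offset 13 crosses the block */
    assert(always_fits(8, 8));     /* 8-byte array, 8-byte aligned: fits */
    assert(!always_fits(4, 8));    /* offset 12 + 8 bytes crosses the block */
}
```

Under these assumptions, a 4-byte array needs only 4-byte alignment to stay
inside the first aligned load, which matches the point made in the quoted
text; an 8-byte array would need 8-byte alignment for the same guarantee.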