Richard Sandiford <richard.sandif...@arm.com> writes:
> Wilco Dijkstra <wilco.dijks...@arm.com> writes:
>> Hi Richard,
>>
>>>> A common case is a constant string which is compared against some
>>>> argument. Most string functions work on 8 or 16-byte quantities. If we
>>>> ensure the whole array fits in one aligned load, we save time in the
>>>> string function.
>>>>
>>>> Runtime data collected for strlen calls shows that 97+% have 8-byte
>>>> alignment or higher; this kind of overalignment helps achieve that.
>>>
>>> Ah, ok.  But aren't we then losing that advantage for 4-byte arrays?
>>> Or are you assuming a 4-byte path too?  Or is strlen just very unlikely
>>> for such small data?
>>
>> The advantage comes from being aligned enough. E.g. a strlen implementation
>> may start like this:
>>
>>      bic     src, srcin, 15
>>      ld1     {vdata.16b}, [src]          // 16-byte aligned load
>>      cmeq    vhas_nul.16b, vdata.16b, 0  // check for NUL byte
>>
>> It always does a 16-byte aligned load and tests for the end of the string.
>> So we want to ensure that small strings fully fit inside the first 16-byte
>> load (if not, it takes almost twice the number of instructions even if the
>> string is only 4 bytes). 4-byte alignment is enough to ensure this for a
>> 4-byte array.
>
> Ah, I see.  Can you add a summary of these explanations as a comment,
> so that someone reading it later will understand the rationale?
>
> OK with that change.

It looks like you committed the original version instead, with no extra
explanation.  I suppose I should have asked for another review round
instead.

Richard

> Thanks,
> Richard
>
>>
>> Another approach is to always load the first 16 bytes from the start of the
>> string (if not close to the end of a page). That is often an unaligned load,
>> and then the difference between 4- and 8-byte alignment is negligible.
>>
>> Cheers,
>> Wilco
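
For anyone reading the thread later, here is a minimal C sketch of the two
checks described in the quoted explanation. It is only an illustration, not
code from the patch or from any strlen implementation: the function names,
the CHUNK and PAGE constants, and the 4096-byte page size are all assumptions
made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK 16u    /* width of the aligned vector load in the quoted strlen */
#define PAGE  4096u  /* assumed page size, hypothetical for this sketch */

/* True if [addr, addr + size) lies entirely inside the 16-byte aligned
   chunk containing addr, so the first aligned load sees the whole array.  */
bool fits_in_first_chunk(uintptr_t addr, size_t size)
{
    /* Same rounding as "bic src, srcin, 15".  */
    uintptr_t chunk_start = addr & ~(uintptr_t)(CHUNK - 1);
    return addr + size <= chunk_start + CHUNK;
}

/* True if an unaligned 16-byte load starting at addr stays within one page
   (the "not close to the end of a page" condition).  */
bool unaligned_load_is_safe(uintptr_t addr)
{
    return (addr & (PAGE - 1)) <= PAGE - CHUNK;
}

/* Number of placements within one 16-byte chunk, at the given alignment,
   where an array of 'size' bytes crosses into the next chunk.
   'align' must be a nonzero power of two.  */
unsigned count_straddling(size_t size, unsigned align)
{
    unsigned n = 0;
    for (uintptr_t addr = 0; addr < CHUNK; addr += align)
        if (!fits_in_first_chunk(addr, size))
            n++;
    return n;
}

/* Worked examples for a 4-byte array.  */
void check_examples(void)
{
    assert(count_straddling(4, 1) == 3);  /* offsets 13, 14, 15 cross over */
    assert(count_straddling(4, 4) == 0);  /* offsets 0, 4, 8, 12 all fit */
}
```

With byte alignment, a 4-byte array can start at offsets 13, 14 or 15 of a
chunk and spill into the next one. With 4-byte alignment none of the four
possible placements do, which is the point made above about small arrays.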
