Hi Kyrill,
On 2019/8/28 16:57, Kyrill Tkachov wrote:
> Hi Shaokun,
>
> On 8/28/19 9:47 AM, Shaokun Zhang wrote:
>> Hi Kyrill,
>>
>> On 2019/8/27 18:16, Kyrill Tkachov wrote:
>>> Hi Shaokun,
>>>
>>> On 8/22/19 3:10 PM, Shaokun Zhang wrote:
>>>> The DCache clean & ICache invalidation requirements for instructions
>>>> to be data coherence are discoverable through new fields in CTR_EL0.
>>>> Let's support the two bits if they are enabled, then we can get some
>>>> performance benefit from this feature.
>>>>
>>>> 2019-08-22 Shaokun Zhang <[email protected]>
>>>>
>>>> * config/aarch64/sync-cache.c: Support CTR_EL0.IDC and CTR_EL0.DIC
>>>>
>>> This needs to mention __aarch64_sync_cache_range as the function being
>>> changed.
>>>
>> Ok, I will update in next version.
>>
>>>> ---
>>>> libgcc/config/aarch64/sync-cache.c | 56
>>>> ++++++++++++++++++++++++--------------
>>>> 1 file changed, 35 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/libgcc/config/aarch64/sync-cache.c
>>>> b/libgcc/config/aarch64/sync-cache.c
>>>> index 791f5e42ff44..0b057efbdcab 100644
>>>> --- a/libgcc/config/aarch64/sync-cache.c
>>>> +++ b/libgcc/config/aarch64/sync-cache.c
>>>> @@ -23,6 +23,9 @@ a copy of the GCC Runtime Library Exception along with
>>>> this program;
>>>> see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
>>>> <http://www.gnu.org/licenses/>. */
>>>>
>>>> +#define CTR_IDC_SHIFT 28
>>>> +#define CTR_DIC_SHIFT 29
>>>> +
>>>> void __aarch64_sync_cache_range (const void *, const void *);
>>>>
>>>> void
>>>> @@ -41,32 +44,43 @@ __aarch64_sync_cache_range (const void *base, const
>>>> void *end)
>>>> icache_lsize = 4 << (cache_info & 0xF);
>>>> dcache_lsize = 4 << ((cache_info >> 16) & 0xF);
>>>>
>>>> - /* Loop over the address range, clearing one cache line at once.
>>>> - Data cache must be flushed to unification first to make sure the
>>>> - instruction cache fetches the updated data. 'end' is exclusive,
>>>> - as per the GNU definition of __clear_cache. */
>>>> + /* If CTR_EL0.IDC is enabled, Data cache clean to the Point of
>>>> Unification is
>>>> + not required for instruction to data coherence. */
>>>> +
>>>> + if ((cache_info >> CTR_IDC_SHIFT) & 0x1 == 0x0) {
>>>> + /* Loop over the address range, clearing one cache line at once.
>>>> + Data cache must be flushed to unification first to make sure the
>>>> + instruction cache fetches the updated data. 'end' is exclusive,
>>>> + as per the GNU definition of __clear_cache. */
>>>>
>>>> - /* Make the start address of the loop cache aligned. */
>>>> - address = (const char*) ((__UINTPTR_TYPE__) base
>>>> - & ~ (__UINTPTR_TYPE__) (dcache_lsize - 1));
>>>> + /* Make the start address of the loop cache aligned. */
>>>> + address = (const char*) ((__UINTPTR_TYPE__) base
>>>> + & ~ (__UINTPTR_TYPE__) (dcache_lsize - 1));
>>>>
>>>> - for (; address < (const char *) end; address += dcache_lsize)
>>>> - asm volatile ("dc\tcvau, %0"
>>>> - :
>>>> - : "r" (address)
>>>> - : "memory");
>>>> + for (; address < (const char *) end; address += dcache_lsize)
>>>> + asm volatile ("dc\tcvau, %0"
>>>> + :
>>>> + : "r" (address)
>>>> + : "memory");
>>>> + }
>>>>
>>>> asm volatile ("dsb\tish" : : : "memory");
>>>>
>>>> - /* Make the start address of the loop cache aligned. */
>>>> - address = (const char*) ((__UINTPTR_TYPE__) base
>>>> - & ~ (__UINTPTR_TYPE__) (icache_lsize - 1));
>>>> + /* If CTR_EL0.DIC is enabled, Instruction cache cleaning to the Point of
>>>> + Unification is not required for instruction to data coherence. */
>>>> +
>>>> + if ((cache_info >> CTR_DIC_SHIFT) & 0x1 == 0x0) {
>>>> + /* Make the start address of the loop cache aligned. */
>>>> + address = (const char*) ((__UINTPTR_TYPE__) base
>>>> + & ~ (__UINTPTR_TYPE__) (icache_lsize - 1));
>>>>
>>>> - for (; address < (const char *) end; address += icache_lsize)
>>>> - asm volatile ("ic\tivau, %0"
>>>> - :
>>>> - : "r" (address)
>>>> - : "memory");
>>>> + for (; address < (const char *) end; address += icache_lsize)
>>>> + asm volatile ("ic\tivau, %0"
>>>> + :
>>>> + : "r" (address)
>>>> + : "memory");
>>>>
>>>> - asm volatile ("dsb\tish; isb" : : : "memory");
>>>> + asm volatile ("dsb\tish" : : : "memory");
>>>> + }
>>>> + asm volatile("isb" : : : "memory")
>>>> }
>>> This looks ok to me (but you'll need approval from the maintainers).
>>>
>>> There is a question of whether we need the barriers if both DIC and IDC are
>>> 1 (in which case no cache-maintentance instructions are emitted).
>>>
>>> I think we still want them to ensure the writes have been completed and the
>>> fetches from the updated cache are up-to-date.
>>>
>> At the beginning, I'm not sure how to deal with the barrier instructions if
>> DIC and IDC
>> are 1, So I sent one mail to discuss it, it is a pity that no response.
>> https://www.mail-archive.com/[email protected]/msg217561.html
>
> Sorry for that. August is a tricky month due to summer vacation in many
> places and mail can slip through the cracks...
>
Oh, thanks for more information about it, Got it.
> I recommend you ping your question if you haven't had a reply within a week
> or so.
>
Good idea, I will follow it in the future work.
>>
>> From the arm ARM document:
>> STR W11, [X1] ; W11 contains a new instruction to be stored in
>> program memory
>> DC CVAU, X1 ; clean to PoU makes the new instruction visible
>> to instruction cache
>> DSB ISH
>> IC IVAU, X1 ; ensures instruction cache/branch predictor
>> discards stale data
>> DSB ISH ; ensures completion of the invalidation
>> ISB ; ensures instruction fetch path sees new
>> instruction cache state
>>
>> AFAIK, the first "DSB ISH" is used ensure STR and DC CVAU instruction; Even
>> if IDC is 1,
>> "DSB ISH" shall be kept for STR instruction. The second "DSB ISH" and "ISB"
>> are explained
>> in comments, so I think "ISB" shall be kept also.
>
>
> Thanks, that's my understanding as well.
>
>
>>> For arch versions before CTR_EL0.{DIC, IDC} these bits are reserved as zero
>>> so the code will do the right thing on those targets.
>>>
>>> How has this patch been tested? Do you also have any performance results
>>> you can share?
>>>
>> I sent the patch as RFC because our cpu core which supports DIC and IDC is
>> designing,
>> so there is no performance result at the moment.
>
>
> That's okay. In the meantime a bootstrap and test cycle on a current aarch64
> machine would be good to make sure existing functionality doesn't break.
>
I will update the ChangeLog and do the formal patch if the maintainers are
happy on it.
Cheers,
Shaokun
> Thanks,
>
> Kyrill
>
>>
>> Thanks,
>> Shaokun
>>
>>> Thanks,
>>>
>>> Kyrill
>>>
>>>
>>>> --
>>>> 2.7.4
>>>>
>>> .
>>>
>
> .
>