Hi Sahil,

On Tue, Mar 18, 2025 at 12:06:30AM +0530, Sahil Siddiq wrote:
> Hello Stafford and Geert,
> 
> Thank you for the reviews.
> 
> On 3/17/25 1:55 PM, Geert Uytterhoeven wrote:
> > On Sun, 16 Mar 2025 at 07:59, Stafford Horne <sho...@gmail.com> wrote:
> > > [...]
> > > > +struct cache_desc {
> > > > +     u32 size;
> > > > +     u32 sets;
> > > > +     u32 block_size;
> > > > +     u32 ways;
> > > 
> > > Considering the changes below to add cache available checks, maybe we
> > > want to add a field here, such as `bool present`.  Or a flags field like
> > > is used in loongarch?
> > 
> > I assume cache_desc.size is zero when the cache is not present?
> 
> Yes, cache_desc.size will be zero when there's no cache component since
> cpuinfo_or1k[NR_CPUS] is declared as a global variable in setup.c. The
> cache attributes are stored in this struct.
> 
> On 3/17/25 5:30 PM, Stafford Horne wrote:
> > [...]
> > Yes, good point, would be clean too work too.  I was not too happy with 
> > using
> > cache_desc.ways as is done below.  Also there ended up bieng 2 different 
> > ways
> > that were used.
> > 
> > I am happy to use size too, but I think checking the SPR would be faster or 
> > just
> > as fast as using the struct.  I am not too fussed either way.
> 
> There are a few places (e.g. cache_loop() in cache.c) where
> cpuinfo_or1k[smp_processor_id()] is not being referenced. This will have to be
> referenced to access cache_desc. In these cases, I think it would be better to
> read from the SPR instead. For consistency, the SPR can also be read where
> cpuinfo_or1k[smp_processor_id()] is already being referenced.
> 
> On 3/16/25 12:28 PM, Stafford Horne wrote:
> > On Sun, Mar 16, 2025 at 02:09:37AM +0530, Sahil Siddiq wrote:
> > > Add cacheinfo support for OpenRISC.
> > > [...]
> > > diff --git a/arch/openrisc/kernel/cacheinfo.c 
> > > b/arch/openrisc/kernel/cacheinfo.c
> > > new file mode 100644
> > > index 000000000000..6bb81e246f7e
> > > --- /dev/null
> > > +++ b/arch/openrisc/kernel/cacheinfo.c
> > > @@ -0,0 +1,106 @@
> > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > +/*
> > > + * OpenRISC cacheinfo support
> > > + *
> > > + * Based on work done for MIPS and LoongArch. All original copyrights
> > > + * apply as per the original source declaration.
> > > + *
> > > + * OpenRISC implementation:
> > > + * Copyright (C) 2025 Sahil Siddiq <sahil...@proton.me>
> > > + */
> > > +
> > > +#include <linux/cacheinfo.h>
> > > +#include <asm/cpuinfo.h>
> > > +#include <asm/spr.h>
> > > +#include <asm/spr_defs.h>
> > > +
> > > +static inline void ci_leaf_init(struct cacheinfo *this_leaf, enum 
> > > cache_type type,
> > > +                         unsigned int level, struct cache_desc *cache, 
> > > int cpu)
> > > +{
> > > + this_leaf->type = type;
> > > + this_leaf->level = level;
> > > + this_leaf->coherency_line_size = cache->block_size;
> > > + this_leaf->number_of_sets = cache->sets;
> > > + this_leaf->ways_of_associativity = cache->ways;
> > > + this_leaf->size = cache->size;
> > > + cpumask_set_cpu(cpu, &this_leaf->shared_cpu_map);
> > > +}
> > > +
> > > +int init_cache_level(unsigned int cpu)
> > > +{
> > > + struct cpuinfo_or1k *cpuinfo = &cpuinfo_or1k[smp_processor_id()];
> > > + struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> > > + int leaves = 0, levels = 0;
> > > + unsigned long upr = mfspr(SPR_UPR);
> > > + unsigned long iccfgr, dccfgr;
> > > +
> > > + if (!(upr & SPR_UPR_UP)) {
> > > +         printk(KERN_INFO
> > > +                "-- no UPR register... unable to detect 
> > > configuration\n");
> > > +         return -ENOENT;
> > > + }
> > > +
> > > + if (upr & SPR_UPR_DCP) {
> > > +         dccfgr = mfspr(SPR_DCCFGR);
> > > +         cpuinfo->dcache.ways = 1 << (dccfgr & SPR_DCCFGR_NCW);
> > > +         cpuinfo->dcache.sets = 1 << ((dccfgr & SPR_DCCFGR_NCS) >> 3);
> > > +         cpuinfo->dcache.block_size = 16 << ((dccfgr & SPR_DCCFGR_CBS) 
> > > >> 7);
> > > +         cpuinfo->dcache.size =
> > > +             cpuinfo->dcache.sets * cpuinfo->dcache.ways * 
> > > cpuinfo->dcache.block_size;
> > > +         leaves += 1;
> > > +         printk(KERN_INFO
> > > +                "-- dcache: %d bytes total, %d bytes/line, %d set(s), %d 
> > > way(s)\n",
> > > +                cpuinfo->dcache.size, cpuinfo->dcache.block_size,
> > > +                    cpuinfo->dcache.sets,
> > > +                cpuinfo->dcache.ways);
> > 
> > The indentation of sets looks a bit off here.
> > 
> > > + } else
> > > +         printk(KERN_INFO "-- dcache disabled\n");
> > > +
> > > + if (upr & SPR_UPR_ICP) {
> > > +         iccfgr = mfspr(SPR_ICCFGR);
> > > +         cpuinfo->icache.ways = 1 << (iccfgr & SPR_ICCFGR_NCW);
> > > +         cpuinfo->icache.sets = 1 << ((iccfgr & SPR_ICCFGR_NCS) >> 3);
> > > +         cpuinfo->icache.block_size = 16 << ((iccfgr & SPR_ICCFGR_CBS) 
> > > >> 7);
> > > +         cpuinfo->icache.size =
> > > +             cpuinfo->icache.sets * cpuinfo->icache.ways * 
> > > cpuinfo->icache.block_size;
> > > +         leaves += 1;
> > > +         printk(KERN_INFO
> > > +                "-- icache: %d bytes total, %d bytes/line, %d set(s), %d 
> > > way(s)\n",
> > > +                cpuinfo->icache.size, cpuinfo->icache.block_size,
> > > +                    cpuinfo->icache.sets,
> > > +                cpuinfo->icache.ways);
> > 
> > The indentation of sets looks a bit off here. Maybe its the others that are 
> > out
> > of line, but can you fix?  Also I think sets and ways could be on the same 
> > line.
> 
> Sorry, I must have missed this. I'll fix it.
> 
> > > [...]
> > > diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c
> > > index b3edbb33b621..ffb161e41e9d 100644
> > > --- a/arch/openrisc/kernel/dma.c
> > > +++ b/arch/openrisc/kernel/dma.c
> > > @@ -36,8 +36,10 @@ page_set_nocache(pte_t *pte, unsigned long addr,
> > >           flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> > >           /* Flush page out of dcache */
> > > - for (cl = __pa(addr); cl < __pa(next); cl += cpuinfo->dcache_block_size)
> > > -         mtspr(SPR_DCBFR, cl);
> > > + if (cpuinfo->dcache.ways) {
> > > +         for (cl = __pa(addr); cl < __pa(next); cl += 
> > > cpuinfo->dcache.block_size)
> > > +                 mtspr(SPR_DCBFR, cl);
> > > + }
> > 
> > I think it would be better to move this to cacheflush.h as a function like
> > flush_dcache_range() or local_dcache_range_flush().
> > 
> > >           return 0;
> > >   }
> > > @@ -104,15 +106,19 @@ void arch_sync_dma_for_device(phys_addr_t addr, 
> > > size_t size,
> > >           switch (dir) {
> > >           case DMA_TO_DEVICE:
> > >                   /* Flush the dcache for the requested range */
> > > -         for (cl = addr; cl < addr + size;
> > > -              cl += cpuinfo->dcache_block_size)
> > > -                 mtspr(SPR_DCBFR, cl);
> > > +         if (cpuinfo->dcache.ways) {
> > > +                 for (cl = addr; cl < addr + size;
> > > +                      cl += cpuinfo->dcache.block_size)
> > > +                         mtspr(SPR_DCBFR, cl);
> > > +         }
> > 
> > Also here,I think it would be better to move this to cacheflush.h as a 
> > function like
> > flush_dcache_range().
> > 
> > Or, local_dcache_range_flush(), which seems to be the convention we use in
> > cacheflush.h/cache.c.
> > 
> > 
> > >                   break;
> > >           case DMA_FROM_DEVICE:
> > >                   /* Invalidate the dcache for the requested range */
> > > -         for (cl = addr; cl < addr + size;
> > > -              cl += cpuinfo->dcache_block_size)
> > > -                 mtspr(SPR_DCBIR, cl);
> > > +         if (cpuinfo->dcache.ways) {
> > > +                 for (cl = addr; cl < addr + size;
> > > +                      cl += cpuinfo->dcache.block_size)
> > > +                         mtspr(SPR_DCBIR, cl);
> > > +         }
> > 
> > This one could be invalidate_dcache_range().   Note, this will also be 
> > useful
> > for the kexec patches that I am working on.
> > 
> > Or, local_dcache_range_inv(), which seems to be the convention we use in
> > cacheflush.h/cache.c.
> 
> Understood.
> 
> > > [...]
> > > @@ -29,13 +34,13 @@ static __always_inline void cache_loop(struct page 
> > > *page, const unsigned int reg
> > >   void local_dcache_page_flush(struct page *page)
> > >   {
> > > - cache_loop(page, SPR_DCBFR);
> > > + cache_loop(page, SPR_DCBFR, SPR_UPR_DCP);
> > >   }
> > >   EXPORT_SYMBOL(local_dcache_page_flush);
> > >   void local_icache_page_inv(struct page *page)
> > >   {
> > > - cache_loop(page, SPR_ICBIR);
> > > + cache_loop(page, SPR_ICBIR, SPR_UPR_ICP);
> > >   }
> > >   EXPORT_SYMBOL(local_icache_page_inv);
> > 
> > OK, if we move the range flush and invalidate here we will need to add to 
> > this
> > cache_loop a bit more.
> 
> Right. I'll see what can be done to keep it concise.
> 
> > > diff --git a/arch/openrisc/mm/init.c b/arch/openrisc/mm/init.c
> > > index d0cb1a0126f9..bbe16546c5b9 100644
> > > --- a/arch/openrisc/mm/init.c
> > > +++ b/arch/openrisc/mm/init.c
> > > @@ -124,6 +124,7 @@ static void __init map_ram(void)
> > >   void __init paging_init(void)
> > >   {
> > >           int i;
> > > + unsigned long upr;
> > >           printk(KERN_INFO "Setting up paging and PTEs.\n");
> > > @@ -176,8 +177,11 @@ void __init paging_init(void)
> > >           barrier();
> > >           /* Invalidate instruction caches after code modification */
> > > - mtspr(SPR_ICBIR, 0x900);
> > > - mtspr(SPR_ICBIR, 0xa00);
> > > + upr = mfspr(SPR_UPR);
> > > + if (upr & SPR_UPR_UP & SPR_UPR_ICP) {
> > > +         mtspr(SPR_ICBIR, 0x900);
> > > +         mtspr(SPR_ICBIR, 0xa00);
> > > + }
> > 
> > Here we could use new utilities such as local_icache_range_inv(0x900,
> > L1_CACHE_BYTES);
> > 
> > Or something like local_icache_block_inv(0x900).  This only needs to flush a
> > single block as the code it is invalidating is just 2 instructions 8 bytes:
> > 
> >      .org 0x900
> >     l.j     boot_dtlb_miss_handler
> >      l.nop
> > 
> >      .org 0xa00
> >     l.j     boot_itlb_miss_handler
> >      l.nop
> 
> Given that there'll be generic local_(i|d)cache_range_inv(start, stop) utility
> functions, would it make sense to simply have a macro defined as:
> 
> #define local_icache_block_inv(addr) local_icache_range_inv(start, 
> L1_CACHE_BYTES)
> 
> instead of having a separate function for invalidating a single cache line? 
> This would
> still use cache_loop() under the hood. The alternative would be to use
> local_icache_range_inv(start, L1_CACHE_BYTES) directly but using the macro 
> might be
> more readable.

Yes, I think a macro would be fine.  Should we use cache_desc.block_size or
L1_CACHE_BYTES?  It doesn't make much difference as L1_CACHE_BYTES is defined as
16 bytes which is the minimum block size and using that will always invalidate a
whole block.  It would be good to have a comment explaining why using
L1_CACHE_BYTES is enough.

-Stafford

Reply via email to