On Sat, Jul 11, 2020 at 02:50:46PM +0800, Zhenyu Ye wrote:
> On 2020/7/11 2:31, Catalin Marinas wrote:
> > On Fri, Jul 10, 2020 at 05:44:20PM +0800, Zhenyu Ye wrote:
> >> -  if ((end - start) >= (MAX_TLBI_OPS * stride)) {
> >> +  if ((!cpus_have_const_cap(ARM64_HAS_TLBI_RANGE) &&
> >> +      (end - start) >= (MAX_TLBI_OPS * stride)) ||
> >> +      pages >= MAX_TLBI_RANGE_PAGES) {
> >>            flush_tlb_mm(vma->vm_mm);
> >>            return;
> >>    }
> > 
> > I think we can use strictly greater here rather than greater or equal.
> > MAX_TLBI_RANGE_PAGES can be encoded as num 31, scale 3.
> 
> Sorry, we can't.
> For a boundary value (such as 2^6), we have two way to express it
> in TLBI RANGE operations:
> 1. scale = 0, num = 31.
> 2. scale = 1, num = 0.
> 
> I used the second way in following implementation.  However, for the
> MAX_TLBI_RANGE_PAGES, we can only use scale = 3, num = 31.
> So if use strictly greater here, ERROR will happen when range pages
> equal to MAX_TLBI_RANGE_PAGES.

You are right, I got confused by the __TLBI_RANGE_NUM() macro which
doesn't return the actual 'num' for the TLBI argument as it would go
from 0 to 31. After subtracting 1, num end sup from -1 to 30, so we
never get the maximum range. I think for scale 3 and num 31, this would
be 8GB with 4K pages, so the maximum we'd cover 8GB - 64K * 4K.

> There are two ways to avoid this bug:
> 1. Just keep 'greater or equal' here.  The ARM64 specification does
> not specify how we flush tlb entries in this case, flush_tlb_mm()
> is also a good choice for such a wide range of pages.

I'll go for this option, I don't think it would make much difference in
practice if we stop at 8GB - 256M range.

> 2. Add check in the loop, just like: (this may cause the codes a bit ugly)
> 
>       num = __TLBI_RANGE_NUM(pages, scale) - 1;
> 
>       /* scale = 4, num = 0 is equal to scale = 3, num = 31. */
>       if (scale == 4 && num == 0) {
>               scale = 3;
>               num = 31;
>       }
> 
>       if (num >= 0) {
>       ...
> 
> Which one do you prefer and how do you want to fix this error? Just
> a fix patch again?

I'll fold the diff below and refresh the patch:

diff --git a/arch/arm64/include/asm/tlbflush.h 
b/arch/arm64/include/asm/tlbflush.h
index 1eb0588718fb..0300e433ffe6 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -147,9 +147,13 @@ static inline unsigned long get_trans_granule(void)
 #define __TLBI_RANGE_PAGES(num, scale) (((num) + 1) << (5 * (scale) + 1))
 #define MAX_TLBI_RANGE_PAGES           __TLBI_RANGE_PAGES(31, 3)
 
+/*
+ * Generate 'num' values from -1 to 30 with -1 rejected by the
+ * __flush_tlb_range() loop below.
+ */
 #define TLBI_RANGE_MASK                        GENMASK_ULL(4, 0)
 #define __TLBI_RANGE_NUM(range, scale) \
-       (((range) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK)
+       ((((range) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)
 
 /*
  *     TLB Invalidation
@@ -285,8 +289,8 @@ static inline void __flush_tlb_range(struct vm_area_struct 
*vma,
        pages = (end - start) >> PAGE_SHIFT;
 
        if ((!cpus_have_const_cap(ARM64_HAS_TLB_RANGE) &&
-            (end - start) > (MAX_TLBI_OPS * stride)) ||
-           pages > MAX_TLBI_RANGE_PAGES) {
+            (end - start) >= (MAX_TLBI_OPS * stride)) ||
+           pages >= MAX_TLBI_RANGE_PAGES) {
                flush_tlb_mm(vma->vm_mm);
                return;
        }
@@ -306,6 +310,10 @@ static inline void __flush_tlb_range(struct vm_area_struct 
*vma,
         *    Start from scale = 0, flush the corresponding number of pages
         *    ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
         *    until no pages left.
+        *
+        * Note that certain ranges can be represented by either num = 31 and
+        * scale or num = 0 and scale + 1. The loop below favours the latter
+        * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
         */
        while (pages > 0) {
                if (!cpus_have_const_cap(ARM64_HAS_TLB_RANGE) ||
@@ -323,7 +331,7 @@ static inline void __flush_tlb_range(struct vm_area_struct 
*vma,
                        continue;
                }
 
-               num = __TLBI_RANGE_NUM(pages, scale) - 1;
+               num = __TLBI_RANGE_NUM(pages, scale);
                if (num >= 0) {
                        addr = __TLBI_VADDR_RANGE(start, asid, scale,
                                                  num, tlb_level);

> >> -  /* Convert the stride into units of 4k */
> >> -  stride >>= 12;
> >> +  dsb(ishst);
> >>  
> >> -  start = __TLBI_VADDR(start, asid);
> >> -  end = __TLBI_VADDR(end, asid);
> >> +  /*
> >> +   * When cpu does not support TLBI RANGE feature, we flush the tlb
> >> +   * entries one by one at the granularity of 'stride'.
> >> +   * When cpu supports the TLBI RANGE feature, then:
> >> +   * 1. If pages is odd, flush the first page through non-RANGE
> >> +   *    instruction;
> >> +   * 2. For remaining pages: The minimum range granularity is decided
> >> +   *    by 'scale', so we can not flush all pages by one instruction
> >> +   *    in some cases.
> >> +   *    Here, we start from scale = 0, flush corresponding pages
> >> +   *    (from 2^(5*scale + 1) to 2^(5*(scale + 1) + 1)), and increase
> >> +   *    it until no pages left.
> >> +   */
> >> +  while (pages > 0) {
> > 
> > I did some simple checks on ((end - start) % stride) and never
> > triggered. I had a slight worry that pages could become negative (and
> > we'd loop forever since it's unsigned long) for some mismatched stride
> > and flush size. It doesn't seem like.
> 
> The start and end are round_down/up in the function:
> 
>       start = round_down(start, stride);
>       end = round_up(end, stride);
> 
> So the flush size and stride will never mismatch.

Right.

To make sure we don't miss any corner cases, I'll try to through the
algorithm above at CBMC (model checker; hopefully next week if I find
some time).

-- 
Catalin

Reply via email to