On Fri, Feb 21, 2020 at 10:19 AM Robin Murphy <robin.mur...@arm.com> wrote:
>
> On 20/02/2020 5:54 pm, Rob Herring wrote:
> > On Mon, Feb 17, 2020 at 1:17 PM Robin Murphy <robin.mur...@arm.com> wrote:
> >>
> >> On 13/02/2020 9:49 pm, Rob Herring wrote:
> >>> On Thu, Jan 30, 2020 at 11:34 AM Robin Murphy <robin.mur...@arm.com>
> >>> wrote:
> >>>>
> >>>> On 30/01/2020 3:06 pm, Auger Eric wrote:
> >>>>> Hi Rob,
> >>>>> On 1/17/20 10:16 PM, Rob Herring wrote:
> >>>>>> Arm SMMUv3.2 adds support for TLB range invalidate operations.
> >>>>>> Support for range invalidate is determined by the RIL bit in the
> >>>>>> IDR3 register.
> >>>>>>
> >>>>>> The range invalidate is in units of the leaf page size and operates
> >>>>>> on 1-32 chunks of a power of 2 multiple pages. First, we determine
> >>>>>> from the size what power of 2 multiple we can use. Then we calculate
> >>>>>> how many chunks (1-31) of the power of 2 size for the range on the
> >>>>>> iteration. On each iteration, we move up in size by at least 5 bits.
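[Roughly, the chunking described above amounts to something like the
following. This is a simplified user-space sketch, not the actual patch:
`inv_range_cmds` is a made-up name, `__builtin_ctzl` stands in for the
kernel's `__ffs()`, and only the command count is modeled.]

```c
#include <assert.h>

/*
 * Sketch of the range-invalidate chunking: each TLBI range command
 * covers 1-31 chunks of 2^scale leaf pages (5-bit NUM field). After a
 * command, bits scale..scale+4 of the remaining page count are clear,
 * so the next iteration's scale moves up by at least 5 bits.
 */
static unsigned long inv_range_cmds(unsigned long size, unsigned int pg_shift)
{
	unsigned long num_pages = size >> pg_shift;
	unsigned long cmds = 0;

	while (num_pages) {
		/* Largest power-of-2 chunk the remaining count allows */
		unsigned int scale = __builtin_ctzl(num_pages);
		/* Number of chunks this command covers (1-31) */
		unsigned long num = (num_pages >> scale) & 0x1f;

		num_pages -= num << scale;
		cmds++;
	}
	return cmds;
}
```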
> >>>>>>
> >>>>>> Cc: Eric Auger <eric.au...@redhat.com>
> >>>>>> Cc: Jean-Philippe Brucker <jean-phili...@linaro.org>
> >>>>>> Cc: Will Deacon <w...@kernel.org>
> >>>>>> Cc: Robin Murphy <robin.mur...@arm.com>
> >>>>>> Cc: Joerg Roedel <j...@8bytes.org>
> >>>>>> Signed-off-by: Rob Herring <r...@kernel.org>
> >>>
> >>>
> >>>>>> @@ -2003,7 +2024,7 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
> >>>>>>  {
> >>>>>>          u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
> >>>>>>          struct arm_smmu_device *smmu = smmu_domain->smmu;
> >>>>>> -        unsigned long start = iova, end = iova + size;
> >>>>>> +        unsigned long start = iova, end = iova + size, num_pages = 0, tg = 0;
> >>>>>>          int i = 0;
> >>>>>>          struct arm_smmu_cmdq_ent cmd = {
> >>>>>>                  .tlbi = {
> >>>>>> @@ -2022,12 +2043,50 @@ static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
> >>>>>>                  cmd.tlbi.vmid = smmu_domain->s2_cfg.vmid;
> >>>>>>          }
> >>>>>>
> >>>>>> +        if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
> >>>>>> +                /* Get the leaf page size */
> >>>>>> +                tg = __ffs(smmu_domain->domain.pgsize_bitmap);
> >>>>>> +
> >>>>>> +                /* Convert page size of 12,14,16 (log2) to 1,2,3 */
> >>>>>> +                cmd.tlbi.tg = ((tg - ilog2(SZ_4K)) / 2) + 1;
> >>>>
> >>>> Given the comment, I think "(tg - 10) / 2" would suffice ;)
> >>>
> >>> Well, duh...
> >>>
> >>>>
> >>>>>> +
> >>>>>> +                /* Determine what level the granule is at */
> >>>>>> +                cmd.tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
> >>>>
> >>>> Is it possible to rephrase that with logs and shifts to avoid the
> >>>> division?
> >>>
> >>> Well, with a loop is the only other way I came up with:
> >>>
> >>>         bpl = tg - 3;
> >>>         ttl = 3;
> >>>         mask = BIT(bpl) - 1;
> >>>         while ((granule & (mask << ((4 - ttl) * bpl + 3))) == 0)
> >>>                 ttl--;
> >>>
> >>> Doesn't seem like much improvement to me given we have h/w divide...
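[To make the arithmetic concrete, here is a user-space sketch of the two
TTL computations being compared. The kernel's `BIT()` and `ilog2()` are
replaced by local stand-ins, and `tg_encode`, `ttl_div`, and `ttl_loop`
are hypothetical names; `tg` is the log2 of the leaf page size.]

```c
#include <assert.h>

/* Local stand-ins for the kernel's BIT() and ilog2() helpers */
#define BIT(n)   (1UL << (n))
#define ILOG2(v) ((unsigned long)(8 * sizeof(unsigned long) - 1 - __builtin_clzl(v)))

/* TG encoding from the patch: page-size log2 of 12,14,16 -> 1,2,3 */
static unsigned long tg_encode(unsigned long tg)
{
	return ((tg - 12) / 2) + 1;
}

/* TTL from the patch, using a division */
static unsigned long ttl_div(unsigned long granule, unsigned long tg)
{
	return 4 - ((ILOG2(granule) - 3) / (tg - 3));
}

/* TTL via the division-free loop quoted above */
static unsigned long ttl_loop(unsigned long granule, unsigned long tg)
{
	unsigned long bpl = tg - 3;  /* bits resolved per table level */
	unsigned long ttl = 3;
	unsigned long mask = BIT(bpl) - 1;

	while ((granule & (mask << ((4 - ttl) * bpl + 3))) == 0)
		ttl--;
	return ttl;
}
```

The suggested "(tg - 10) / 2" gives the same 1,2,3 for tg = 12,14,16,
since tg is always even.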
> >>
> >> Sure, it doesn't take too many extra instructions to start matching
> >> typical IDIV latency, so there's no point being silly if there really
> >> isn't a clean alternative.
> >>
> >> Some quick scribbling suggests "4 - ilog2(granule) / 10" might actually
> >> be close enough, but perhaps that's a bit too cheeky.
> >
> > How does divide by 10 save a divide?
>
> Well, by that point I was more just thinking about the smallest
> expression, since *some* division seems unavoidable. Although GCC does
> apparently still think that transforming constant division is a win ;)
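[A throwaway check of where the divide-by-10 shortcut holds; `ttl_exact`
and `ttl_cheeky` are made-up names. It agrees with the exact expression
for the architecturally valid block sizes, but diverges for a
hypothetical 2^42-byte level-1 block with 64K pages.]

```c
#include <assert.h>

#define ILOG2(v) ((unsigned long)(8 * sizeof(unsigned long) - 1 - __builtin_clzl(v)))

/* Exact TTL computation from the patch */
static unsigned long ttl_exact(unsigned long granule, unsigned long tg)
{
	return 4 - ((ILOG2(granule) - 3) / (tg - 3));
}

/* The "divide by 10" shortcut suggested above */
static unsigned long ttl_cheeky(unsigned long granule)
{
	return 4 - ILOG2(granule) / 10;
}
```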
Okay. Still, divide by 10 happens to work, but it is very much not
obvious. It also doesn't work for level 1 and 16 or 64KB pages (though
we'll never see that granule). Subtracting 3 is not that obvious
either, but is at least in the spec for calculating the bits per
level.

I think we've spent enough time micro-optimizing this and have better
things to worry about.

Rob
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu