On 08/12/2025 09:52, Barry Song wrote:
> On Mon, Dec 8, 2025 at 5:41 PM gao xu <[email protected]> wrote:
>>
>> Commit 04c7adb5871a ("dma-buf: system_heap: use larger contiguous mappings
>> instead of per-page mmap") facilitates the use of PTE_CONT. The system_heap
>> allocates pages of order 4 and 8 that meet the alignment requirements for
>> PTE_CONT, enabling PTE_CONT for larger contiguous mappings.
> 
> Unfortunately, we don't have pte_cont for architectures other than
> AArch64. On the other hand, AArch64 isn't automatically mapping
> cont_pte for mmap. It might be better if this were done
> automatically by the ARM code.

Yes indeed; CONT_PTE_MASK and PTE_CONT are arm64-specific macros that cannot be
used outside of the arm64 arch code.

> 
> Ryan(Cced) is the expert on automatically setting cont_pte for
> contiguous mapping, so let's ask for some advice from Ryan.

arm64 arch code will automatically and transparently apply PTE_CONT whenever it
detects suitable conditions. Those suitable conditions include:

  - physically contiguous block of 64K, aligned to 64K
  - virtually contiguous block of 64K, aligned to 64K
  - 64K block has the same access permissions
  - 64K block all belongs to the same folio
  - not a special mapping

The last 2 requirements are the tricky ones here: we require that every page in
the block belongs to the same folio because a contiguous mapping only maintains
a single access and dirty bit for the whole 64K block, so we lose fidelity vs
per-page mappings. But the kernel tracks access/dirty per folio, so the extra
fidelity we get from per-page mappings is ignored by the kernel anyway if the
contiguous mapping only maps pages from a single folio. We reject special
mappings because they are not backed by a folio at all.
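
To make those conditions concrete, here is a minimal sketch of the check
(illustrative only; the real logic lives in arch/arm64/mm/contpte.c and
additionally verifies that all 16 PTEs carry identical permissions):

static bool contpte_block_suitable(unsigned long addr, pte_t pte)
{
	unsigned long pfn = pte_pfn(pte);
	struct folio *folio;

	/* VA and PA must both be aligned to CONT_PTE_SIZE (64K) */
	if (!IS_ALIGNED(addr, CONT_PTE_SIZE) ||
	    !IS_ALIGNED(PFN_PHYS(pfn), CONT_PTE_SIZE))
		return false;

	/* special mappings are not backed by a folio at all */
	if (pte_special(pte))
		return false;

	/* all CONT_PTES pages must belong to the same folio */
	folio = page_folio(pfn_to_page(pfn));
	return folio_pfn(folio) + folio_nr_pages(folio) >= pfn + CONT_PTES;
}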

For your case, remap_pfn_range() will create special mappings so we will never
set the PTE_CONT bit.
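
For reference, the inner loop of remap_pte_range() in mm/memory.c looks
roughly like this (condensed; error handling omitted). Every entry is written
one at a time and marked special, so both the not-special and same-folio
conditions fail:

	do {
		set_pte_at(mm, addr, pte,
			   pte_mkspecial(pfn_pte(pfn, prot)));
		pfn++;
	} while (pte++, addr += PAGE_SIZE, addr != end);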

Likely we are being a bit too conservative here; we may be able to relax this
requirement if we know that nothing will ever consume the access/dirty
information for special mappings. I'm not sure that is the case in general
though - it would need some investigation.

Even with that issue resolved, there is still a second problem; there are 2
ways the arm64 arch code detects suitable contiguous mappings. The primary way
is via a call to set_ptes(). This is part of the "PTE batching" API and
explicitly tells the implementation that all the conditions are met (including
the memory being backed by a folio). This is the most efficient approach. See
contpte_set_ptes().
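
A simplified sketch of that path (condensed from contpte_set_ptes(); the
kernel-mapping bail-out and warnings are omitted). The batching contract
guarantees the entries were previously not-present, so PTE_CONT can be applied
up front without any TLBI:

static void set_ptes_sketch(struct mm_struct *mm, unsigned long addr,
			    pte_t *ptep, pte_t pte, unsigned int nr)
{
	unsigned long next, end = addr + (nr << PAGE_SHIFT);
	unsigned long pfn = pte_pfn(pte);
	pgprot_t prot = pte_pgprot(pte);

	do {
		/* split the batch at 64K block boundaries */
		next = pte_cont_addr_end(addr, end);
		nr = (next - addr) >> PAGE_SHIFT;
		pte = pfn_pte(pfn, prot);

		/* VA and PA aligned and the span covers a full block? */
		if (((addr | next | (pfn << PAGE_SHIFT)) & ~CONT_PTE_MASK) == 0)
			pte = pte_mkcont(pte);
		else
			pte = pte_mknoncont(pte);

		__set_ptes(mm, addr, ptep, pte, nr);

		addr = next;
		ptep += nr;
		pfn += nr;
	} while (addr != end);
}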

There is a second (hacky) approach which attempts to recognise when the last PTE
of a contiguous block is set and automatically "fold" the mapping. See
contpte_try_fold(). This approach has a cost because (for systems without
BBML2_NOABORT) we have to issue a TLBI when we fold the range.
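
A condensed sketch of the fold (from contpte_try_fold()/contpte_convert();
the eligibility checks are omitted). Access/dirty from the individual entries
are collapsed into single bits for the block, and the TLBI sits in the middle:

static void fold_sketch(struct mm_struct *mm, unsigned long addr,
			pte_t *ptep, pte_t pte)
{
	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
	unsigned long start = ALIGN_DOWN(addr, CONT_PTE_SIZE);
	pte_t *start_ptep = ptep - ((addr - start) >> PAGE_SHIFT);
	int i;

	/* base the folded entry on the first pfn of the block */
	pte = pfn_pte(ALIGN_DOWN(pte_pfn(pte), CONT_PTES), pte_pgprot(pte));

	/* clear all CONT_PTES entries, accumulating access/dirty */
	for (i = 0; i < CONT_PTES; i++) {
		pte_t old = __ptep_get_and_clear(mm, start + i * PAGE_SIZE,
						 start_ptep + i);
		if (pte_dirty(old))
			pte = pte_mkdirty(pte);
		if (pte_young(old))
			pte = pte_mkyoung(pte);
	}

	/* the TLBI we would like to avoid */
	__flush_tlb_range(&vma, start, start + CONT_PTE_SIZE, PAGE_SIZE,
			  true, 3);

	__set_ptes(mm, start, start_ptep, pte_mkcont(pte), CONT_PTES);
}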

For remap_pfn_range(), we would be relying on the second approach since it is
not currently batched (and could not use set_ptes() as currently spec'ed due to
there being no folio). If we are going to add support for contiguous pfn-mapped
PTEs, it would be preferable to add equivalent batching APIs (or relax 
set_ptes()).
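
To be clear about the shape of what I mean, something like the below (entirely
hypothetical; no such interface exists today) would let remap_pfn_range() hand
the whole physically contiguous run to the arch in one call:

/*
 * Hypothetical API, named here only for illustration. Like set_ptes(),
 * but for pfn-mapped (special) memory with no backing folio: the caller
 * guarantees nr physically contiguous, previously-none entries, so the
 * arch could apply PTE_CONT up front rather than relying on folding.
 */
void set_ptes_pfn(struct mm_struct *mm, unsigned long addr,
		  pte_t *ptep, pte_t pte, unsigned int nr);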

I think this would be a useful improvement, but it's not as straightforward as
adding PTE_CONT in system_heap_mmap().

Thanks,
Ryan

> 
>>
>> After applying this patch, TLB misses are reduced by approximately 5% when
>> opening the camera on Android systems.
>>
>> Signed-off-by: gao xu <[email protected]>
>> ---
> 
> Thanks
> Barry
