On 10/9/24 17:35, Pierrick Bouvier wrote:
@@ -1061,15 +1061,13 @@ static inline void tlb_set_compare(CPUTLBEntryFull *full, CPUTLBEntry *ent,
                                     vaddr address, int flags,
                                     MMUAccessType access_type, bool enable)
  {
-    if (enable) {
-        address |= flags & TLB_FLAGS_MASK;
-        flags &= TLB_SLOW_FLAGS_MASK;
-        if (flags) {
-            address |= TLB_FORCE_SLOW;
-        }
-    } else {
-        address = -1;
-        flags = 0;
+    if (!enable) {
+        address = TLB_INVALID_MASK;
+    }
+    address |= flags & TLB_FLAGS_MASK;
+    flags &= TLB_SLOW_FLAGS_MASK;
+    if (flags) {
+        address |= TLB_FORCE_SLOW;
      }

I'm not sure I follow this change correctly.
After the change, the final address and flags values depend only on the flags parameter, whereas before they depended on both the flags and enable parameters.

This deserves to be split out; I even thought about it Monday night but then forgot when I restarted on Tuesday morning.

Before, address is -1 for disabled, mostly because that mirrors what you get from the memset used to initialize the tlb. All of the flags are discarded, but the only thing that's important is that TLB_INVALID_MASK is set.

After, TLB_INVALID_MASK is still set, but the flags are retained. This is because we want a source of those flags to use for MMU_INST_FETCH. With this patch set we no longer store flags for execute and instead grab them from the flags for read. From tlb_set_page_full...


@@ -1215,9 +1213,6 @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx,
      /* Now calculate the new entry */
      node->copy.addend = addend - addr_page;
-    tlb_set_compare(full, &node->copy, addr_page, read_flags,
-                    MMU_INST_FETCH, prot & PAGE_EXEC);
-
      if (wp_flags & BP_MEM_READ) {
          read_flags |= TLB_WATCHPOINT;
      }

... we can see that the only difference between the two is the watchpoint bit. Importantly, TLB_MMIO is common to all three comparators.

It sounds good to have a fast path for code fetch. Did you measure the benefit, or did you just implement this thinking it would be worthwhile?

This is not about a fast path for code fetch, but always using the *slow* path. The object is to repurpose one word from CPUTLBEntry, removed here and added back in the next patch, to link CPUTLBEntry to CPUTLBEntryTree without changing sizeof(CPUTLBEntry).


r~
