I agree with the comment that we really should move this out of line now, and that we can simplify it further, including not bothering with the SBI call if we are the only online CPU. I also think we need to use get_cpu/put_cpu to be preemption safe.
Also, why would we need to do a local flush if we have a mask that doesn't include the local CPU? How about something like:

void __riscv_flush_tlb(const struct cpumask *cpumask, unsigned long start,
		unsigned long size)
{
	struct cpumask rmask;
	unsigned int cpu;

	if (!cpumask)
		cpumask = cpu_online_mask;
	/* work on a copy so the caller's mask (or cpu_online_mask) stays intact */
	cpumask_copy(&rmask, cpumask);

	cpu = get_cpu();	/* disables preemption until put_cpu() */
	if (cpumask_test_cpu(cpu, &rmask)) {
		if ((start == 0 && size == -1) || size > PAGE_SIZE)
			local_flush_tlb_all();
		else if (size == PAGE_SIZE)
			local_flush_tlb_page(start);
		cpumask_clear_cpu(cpu, &rmask);
	}

	/* only go through the SBI if other CPUs remain in the mask */
	if (!cpumask_empty(&rmask)) {
		struct cpumask hmask;

		riscv_cpuid_to_hartid_mask(&rmask, &hmask);
		sbi_remote_sfence_vma(hmask.bits, start, size);
	}
	put_cpu();
}
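
To sanity-check the dispatch logic proposed above without booting a kernel, here is a rough userspace model. It is only a sketch: the CPU mask is a plain 64-bit bitmap, a NULL mask stands for "all online CPUs", and every model_* name, the MODEL_* constants and the counters are made up for illustration and are not kernel or SBI interfaces. The current CPU is passed in as a parameter, so the get_cpu/put_cpu preemption handling is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE   4096UL
#define MODEL_ONLINE_MASK 0x0FULL	/* assumption: CPUs 0-3 are online */

/* Counters standing in for the real flush primitives. */
static int local_all, local_page, remote_calls;
static uint64_t last_remote_mask;

static void model_reset(void)
{
	local_all = local_page = remote_calls = 0;
	last_remote_mask = 0;
}

static void model_local_flush_all(void)
{
	local_all++;
}

static void model_local_flush_page(unsigned long addr)
{
	(void)addr;
	local_page++;
}

/* Stand-in for the remote fence request; just records the target mask. */
static void model_remote_sfence(uint64_t mask, unsigned long start,
				unsigned long size)
{
	(void)start;
	(void)size;
	remote_calls++;
	last_remote_mask = mask;
}

/*
 * Flush locally only if this_cpu is in the mask, and issue a remote
 * request only if other CPUs are left once this_cpu has been removed.
 */
static void model_flush_tlb(const uint64_t *mask, unsigned int this_cpu,
			    unsigned long start, unsigned long size)
{
	uint64_t remaining = mask ? *mask : MODEL_ONLINE_MASK;

	assert(this_cpu < 64);

	if (remaining & (1ULL << this_cpu)) {
		if ((start == 0 && size == (unsigned long)-1) ||
		    size > MODEL_PAGE_SIZE)
			model_local_flush_all();
		else if (size == MODEL_PAGE_SIZE)
			model_local_flush_page(start);
		remaining &= ~(1ULL << this_cpu);
	}

	if (remaining)
		model_remote_sfence(remaining, start, size);
}
```

Calling model_flush_tlb() with a mask that contains only the current CPU leaves remote_calls at zero, and a mask without the current CPU leaves both local counters at zero, which are the two shortcuts discussed above.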