This means that 'A -> idle -> A' should never pass through switch_mm to begin with. Please clarify how you think it does.
The idle code does leave_mm() to avoid having to IPI CPUs in deep sleep states for a TLB flush. (Trust me, you really want that: sequentially IPI'ing a pile of cores in a deep sleep state just to flush a TLB that's empty, the performance of that is horrific.)
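
To make the effect concrete, here is a rough userspace toy model, not kernel code. Every name in it (toy_mm, toy_cpu, toy_switch_to, toy_leave_mm, toy_enter_idle, toy_flush_tlb) is made up for illustration, and the assumption is simply that a flush IPIs every other CPU still set in the mm's cpumask, and that leaving the mm on idle entry clears that bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8

/* Toy address space: one bit per CPU that may still cache its TLB entries. */
struct toy_mm {
    uint32_t cpumask;
};

/* Toy per-CPU state (hypothetical, not the kernel's layout). */
struct toy_cpu {
    struct toy_mm *active_mm; /* mm whose translations this CPU may hold */
    bool deep_idle;           /* CPU is parked in a deep sleep state */
};

static struct toy_cpu cpus[NR_CPUS];

static void toy_reset(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        cpus[cpu].active_mm = NULL;
        cpus[cpu].deep_idle = false;
    }
}

/* Stand-in for loading an mm on a CPU: wake it and set its cpumask bit. */
static void toy_switch_to(int cpu, struct toy_mm *mm)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    cpus[cpu].deep_idle = false;
    cpus[cpu].active_mm = mm;
    mm->cpumask |= (uint32_t)1 << cpu;
}

/* Stand-in for leaving the mm: drop out of its cpumask so flushes skip us. */
static void toy_leave_mm(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (cpus[cpu].active_mm != NULL) {
        cpus[cpu].active_mm->cpumask &= ~((uint32_t)1 << cpu);
        cpus[cpu].active_mm = NULL; /* local TLB treated as empty from here */
    }
}

/* Enter deep idle, optionally leaving the mm first. */
static void toy_enter_idle(int cpu, bool leave)
{
    if (leave)
        toy_leave_mm(cpu);
    cpus[cpu].deep_idle = true;
}

/* Flush mm from CPU 'self'; returns how many IPIs had to be sent. */
static int toy_flush_tlb(const struct toy_mm *mm, int self)
{
    int ipis = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu != self && (mm->cpumask & ((uint32_t)1 << cpu)))
            ipis++; /* a sleeping CPU would be woken just to flush */
    }
    return ipis;
}
```

With four CPUs running the same mm and three of them going into deep idle, toy_flush_tlb() from the remaining CPU returns 0 when the sleepers left the mm first, and 3 when they did not. The cost in this model is that a CPU which has left the mm has to load it again (toy_switch_to here) when it wakes up.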