On 7/1/26 10:22 AM, Frederic Weisbecker wrote:
On Thu, Jun 25, 2026 at 01:27:54AM -0400, Waiman Long wrote:
On 6/24/26 2:34 AM, Jing Wu wrote:
    3. Are there specific patches in your series where you would welcome
       our contribution directly?
I have broken down the shutdown callback into separate portions as suggested
by Thomas. The other major change that I am working on is to shut down only
to the CPUHP_AP_OFFLINE state instead of all the way down to CPUHP_OFFLINE.
What was the reason for that again? Can we perhaps ask the user to offline
the target CPUs before toggling isolation on them?
The major problem with fully offlining a CPU is the CPU hotplug stop machine mechanism, which puts all CPUs except the one being offlined into a waiting loop within the IPI handler while that CPU transitions from CPUHP_TEARDOWN_CPU to CPUHP_AP_IDLE_DEAD. If another active isolated partition is running DPDK, for instance, this breaks its low latency guarantee for a short duration.
That will require some adjustments to the nohz_full related hotplug
functions. I have some ideas of what needs to be done. However, I haven't
looked into RCU yet. I know RCU supports changing the nocb mask for fully
offline CPUs; I will need to find out if it is possible to do that for
partially offline CPUs.
No, because callbacks can still be enqueued at this stage. But we could
manage to make it work with CPUHP_AP_IDLE_DEAD.

If we can only go as high as CPUHP_AP_IDLE_DEAD, we may as well go all the way down to CPUHP_OFFLINE, since stop machine should already be done at CPUHP_AP_IDLE_DEAD. In that case, we may have to break RCU out of HK_TYPE_KERNEL_NOISE and add a cpuset control switch so that system administrators can decide, when creating a new isolated partition or destroying an old one, whether to accept a brief latency spike in an existing isolated partition or to keep the RCU housekeeping mask unchanged and avoid it.
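
To make the state ordering in this argument concrete, here is a rough Python sketch. It is only an illustrative model, not kernel code: the enum values give the relative order of the states named in this thread (plus CPUHP_ONLINE as the starting point), not the kernel's real numeric values, and the helper names teardown_path() and stop_machine_done() are made up for this example. The second helper simply encodes the statement above that stop machine should be finished by CPUHP_AP_IDLE_DEAD.

```python
from enum import IntEnum


class HpState(IntEnum):
    # Relative order only; these are NOT the kernel's numeric values.
    CPUHP_OFFLINE = 0
    CPUHP_AP_IDLE_DEAD = 1
    CPUHP_AP_OFFLINE = 2
    CPUHP_TEARDOWN_CPU = 3
    CPUHP_ONLINE = 4


def teardown_path(target, start=HpState.CPUHP_ONLINE):
    """Return the modelled states visited going down from start to target."""
    if target > start:
        raise ValueError("target must not be above the starting state")
    return [s for s in sorted(HpState, reverse=True) if target <= s <= start]


def stop_machine_done(target):
    """True if the target is at or below CPUHP_AP_IDLE_DEAD, where the
    thread above assumes stop machine has already completed."""
    return target <= HpState.CPUHP_AP_IDLE_DEAD


if __name__ == "__main__":
    for target in (HpState.CPUHP_AP_OFFLINE,
                   HpState.CPUHP_AP_IDLE_DEAD,
                   HpState.CPUHP_OFFLINE):
        path = " -> ".join(s.name for s in teardown_path(target))
        print(f"{path} (stop machine done: {stop_machine_done(target)})")
```

In this model, stopping at CPUHP_AP_IDLE_DEAD and stopping at CPUHP_OFFLINE both sit past the point where stop machine has run, which is why the latency cost is the same for either target.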

Cheers,
Longman


Thanks.


