Hello Peter,
Here are my answers to your questions.
> But how does it work?
OK, I will try to give you more details.
The issue occurs when partition scanning has not finished yet. That is
my analysis, and other people reached the same conclusion (see the links
I added in the PATCH comment).
I will explain two cases: MMC and NVMe devices.
For MMC devices:
Here is the simplified call stack for partition scan:
mmc_rescan()                        // delayed work
  mmc_add_card()
    device_add()
      driver_probe_device()         // enter "probing" state
        mmc_blk_probe()
          add_disk_fwnode()
While executing the mmc_blk_probe() function, the device is in the probing
state, i.e. probe_count is non-zero. This function first creates the
disk device and then scans partitions with disk_scan_partitions(). Thus,
waiting for probing to end is enough to fix the issue for MMC devices.
For NVMe devices:
Here is the simplified call stack for partition scan:
nvme_scan_work()                    // delayed work
  nvme_scan_ns_async()              // via async_schedule_domain()
    nvme_alloc_ns()
      device_add_disk()
        add_disk_fwnode()
Here, the NVMe device isn't in the "probing" state but uses the asynchronous
execution framework. Thus, you also have to synchronize all
asynchronous function calls, using the async_synchronize_full() function,
to make sure the partition scan has finished.
That's exactly what wait_for_device_probe() does: it waits for probing to be
done and calls async_synchronize_full(). If you are still not convinced
that this works, look at the wait_for_root() function. You will find these
two actions in the code (probing wait and async_synchronize_full()). I
didn't find anyone complaining about this issue with the rootwait= argument.
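
To make the two conditions concrete, here is a small user-space model in
C. It is only a sketch, not kernel code: struct boot_state, its
async_pending field and the three helper functions are hypothetical names
made up for illustration. Only probe_count mirrors a name mentioned
above.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the boot state; not kernel code. */
struct boot_state {
	int probe_count;   /* drivers currently inside their probe callback */
	int async_pending; /* async work items scheduled but not finished */
};

/* Condition the first step waits for: no driver is probing. */
bool probing_done(const struct boot_state *s)
{
	return s->probe_count == 0;
}

/* Condition the second step waits for: no async work is outstanding. */
bool async_done(const struct boot_state *s)
{
	return s->async_pending == 0;
}

/* A running partition scan is ruled out only when both conditions hold. */
bool partition_scan_finished(const struct boot_state *s)
{
	return probing_done(s) && async_done(s);
}
```

In this model, the MMC case corresponds to probe_count > 0 and the NVMe
case to async_pending > 0 with probe_count == 0, which is why checking
only the probing state would not be enough for NVMe.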
> Do we still need the other wait_for_device_probe() call?
Technically, I think it still works without the first call (if you move
the second one out of the if block). But I preferred keeping it for the
following two reasons:
1. That's what is done in rootwait=, which does not have the issue.
I prefer copying what is working, especially when there is no problem
keeping the first wait_for_device_probe() call.
2. Removing it may degrade boot performance for devices that the first
wait_for_device_probe() actually waits for. In this case, the waiting is
done by the while loop with its arbitrary 5ms sleep. When
wait_for_device_probe() is kept, it waits only for the right
amount of time and the while loop does not wait at all.
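
To illustrate the second point, here is a rough user-space sketch in C of
how long a polling loop waits compared to the moment the device is
actually ready. polled_wait_ms() is a hypothetical helper, not something
from the kernel, and it assumes each sleep lasts exactly step_ms (a real
sleep may last longer).

```c
/*
 * Hypothetical model: total time spent in a loop that checks a
 * condition, sleeps step_ms when it is false, and checks again.
 * ready_ms is the time at which the condition becomes true.
 */
unsigned int polled_wait_ms(unsigned int ready_ms, unsigned int step_ms)
{
	unsigned int waited = 0;

	/* A zero step would never make progress; treat it as an exact wait. */
	if (step_ms == 0)
		return ready_ms;

	/* Each failed check costs one full sleep, even if the device
	 * becomes ready in the middle of it. */
	while (waited < ready_ms)
		waited += step_ms;

	return waited;
}
```

With a 5ms step, a device that is ready after 12ms is only noticed after
15ms in this model, while an event-based wait would return at 12ms.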
> This looks nontrivial, a comment would be helpful.
I think the commit message contains enough information to understand why
the second wait_for_device_probe() call is required. A comment would
contain less information, so I prefer leaving the code as it is.
Guillaume