On Sun, Nov 19, 2023 at 06:19:43PM +0900, Hector Martin wrote:
> >> +static int iommu_fwspec_assign_iommu(struct iommu_fwspec *fwspec,
> >> + struct device *dev,
> >> + struct fwnode_handle *iommu_fwnode)
> >> +{
> >> + const struct iommu_ops *ops;
> >> +
> >> + if (fwspec->iommu_fwnode) {
> >> + /*
> >> + * fwspec->iommu_fwnode is the first iommu's fwnode. In the rare
> >> + * case of multiple iommus for one device they must point to the
> >> + * same driver, checked via same ops.
> >> + */
> >> + ops = iommu_ops_from_fwnode(iommu_fwnode);
> >
> > This carries over a related bug from the original code: If a device has
> > two IOMMUs and the first one probes but the second one defers, ops will
> > be NULL here and the check will fail with EINVAL.
> >
> > Adding a check for that case here fixes it:
> >
> > if (!ops)
> > return driver_deferred_probe_check_state(dev);
Yes!
> > With that, for the whole series:
> >
> > Tested-by: Hector Martin <[email protected]>
> >
> > I can't specifically test for the probe races the series intends to fix
> > though, since that bug we only hit extremely rarely. I'm just testing
> > that nothing breaks.
>
> Actually no, this fix is not sufficient. If the first IOMMU is ready
> then the xlate path allocates dev->iommu, which then
> __iommu_probe_device takes as a sign that all IOMMUs are ready and does
> the device init.
It doesn't.. The code there is:
if (!fwspec && dev->iommu)
fwspec = dev->iommu->fwspec;
if (fwspec)
ops = fwspec->ops;
else
ops = dev->bus->iommu_ops;
if (!ops) {
ret = -ENODEV;
goto out_unlock;
}
Which is sensitive only to !NULL fwspec, and if EPROBE_DEFER is
returned fwspec will be freed and dev->iommu->fwspec will be NULL
here.
In the NULL case it does a 'bus probe' with a NULL fwspec and all the
fwspec drivers return immediately from their probe functions.
Did I miss something?
> Then when the xlate comes along again after suceeding
> with the second IOMMU, __iommu_probe_device sees the device is already
> in a group and never initializes the second IOMMU, leaving the device
> with only one IOMMU.
This should be fixed by the first hunk to check every iommu and fail?
BTW, do you have a systems with same device attached to multiple
iommus?
I've noticed another bug here, many drivers don't actually support
differing iommu instances and nothing seems to check it..
Thanks,
Jason