On Sun, 2016-01-10 at 01:08 -0200, Guilherme G. Piccoli wrote:

> Commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on
> Cell") added a check to eeh_add_device_early(): since eeh_ops is NULL on
> Cell, that code used to crash there. The commit's approach was to validate
> that EEH is available by checking the result of eeh_enabled().
> 
> Since eeh_add_device_early() is used to perform EEH initialization for
> devices added to the system later, as in hotplug/DLPAR scenarios, we might
> reach a case in which no PCI devices are present at boot and so EEH is not
> initialized. Then, if a device is added via DLPAR, for example,
> eeh_add_device_early() fails because eeh_enabled() is false.
> 
> We can hit a kernel oops on pSeries if eeh_add_device_early() fails: if we
> have no PCI devices on the machine at boot time, and then we add a PCI
> device via a DLPAR operation, the function query_ddw() triggers the oops
> with a NULL pointer dereference in the line "cfg_addr = edev->config_addr;".
> It happens because config_addr in edev is NULL, since eeh_add_device_early()
> did not complete successfully.
> 
> This patch just changes the way the arch check is done in
> eeh_add_device_early(): we no longer use eeh_enabled(), but instead check
> the running platform with the macro machine_is(). If we are running on
> pSeries or PowerNV, the EEH mechanism can be enabled; otherwise, we bail out
> of the function. This way, we don't enable EEH on Cell and we don't hit the
> oops on DLPAR either.

But eeh_enabled() is still false? That seems like it's liable to cause breakage
elsewhere.

Shouldn't the PCI hotplug code instead be taught to initialise EEH correctly
when the first device is added?
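
Roughly, and only as a userspace mock rather than real kernel code: the add
path would run the one-time EEH setup itself if boot skipped it, so that
eeh_enabled() ends up true afterwards. Every name below is made up for
illustration; none of them is an existing kernel function, and the real change
would have to live in the pSeries hotplug/EEH paths.

```c
/* Userspace mock of "initialise EEH on first hotplug add".
 * All identifiers are hypothetical; this is not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_eeh_dev {
    int config_addr;    /* stands in for edev->config_addr */
    bool probed;        /* set once the early EEH setup has run */
};

/* Models the state eeh_enabled() would report. */
static bool mock_eeh_initialised;

/* false models a platform such as Cell, where eeh_ops is NULL. */
static bool mock_platform_has_eeh = true;

/* One-time setup that boot normally performs when PCI devices exist. */
static bool mock_eeh_init_once(void)
{
    if (mock_eeh_initialised)
        return true;            /* already done at boot or by an earlier add */
    if (!mock_platform_has_eeh)
        return false;           /* nothing to initialise on this platform */
    mock_eeh_initialised = true;
    return true;
}

/* Hotplug/DLPAR add path: make sure EEH is up before touching the device. */
static int mock_hotplug_add_device(struct mock_eeh_dev *dev, int config_addr)
{
    assert(dev != NULL);
    if (!mock_eeh_init_once())
        return -1;              /* no EEH here; leave the device untouched */
    dev->config_addr = config_addr;
    dev->probed = true;
    return 0;
}
```

With that shape, the first DLPAR add on a machine that booted without PCI
devices leaves the global EEH state consistent, instead of only bypassing the
check inside eeh_add_device_early().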

cheers
