Hi Ritesh,

Thanks for looking into this patch. My responses your review inline
below:

Ritesh Harjani (IBM) <ritesh.l...@gmail.com> writes:

> Narayana Murty N <nnmli...@linux.ibm.com> writes:
>
>> Makes pseries_eeh_err_inject() available even when debugfs
>> is disabled (CONFIG_DEBUG_FS=n). It moves eeh_debugfs_break_device()
>> and eeh_pe_inject_mmio_error() out of the CONFIG_DEBUG_FS block
>> and renames it as eeh_break_device().
>>
>> Reported-by: kernel test robot <l...@intel.com>
>> Closes: 
>> https://lore.kernel.org/oe-kbuild-all/202409170509.vwc6jadc-...@intel.com/
>> Fixes: b0e2b828dfca ("powerpc/pseries/eeh: Fix pseries_eeh_err_inject")
>> Signed-off-by: Narayana Murty N <nnmli...@linux.ibm.com>
>> ---
>>  arch/powerpc/kernel/eeh.c | 198 +++++++++++++++++++-------------------
>>  1 file changed, 99 insertions(+), 99 deletions(-)
>
> Ok, so in your original patch you implemented eeh_inject ops for pseries
> using mmio based eeh error injection (eeh_pe_inject_mmio_error()), which
> uses the functions defined under debugfs -> eeh_debugfs_break_device(). 
>
> This was failing when CONFIG_DEBUGFS is not defined, thus referring to
> undefined function definition. 
>
> Minor nit below.
>
>>
>> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>> index 49ab11a287a3..0fe25e907ea6 100644
>> --- a/arch/powerpc/kernel/eeh.c
>> +++ b/arch/powerpc/kernel/eeh.c
>> @@ -1574,6 +1574,104 @@ static int proc_eeh_show(struct seq_file *m, void *v)
>>  }
>>  #endif /* CONFIG_PROC_FS */
>>  
>> +static int eeh_break_device(struct pci_dev *pdev)
>> +{
>> +    struct resource *bar = NULL;
>> +    void __iomem *mapped;
>> +    u16 old, bit;
>> +    int i, pos;
>> +
>> +    /* Do we have an MMIO BAR to disable? */
>> +    for (i = 0; i <= PCI_STD_RESOURCE_END; i++) {
>> +            struct resource *r = &pdev->resource[i];
>> +
>> +            if (!r->flags || !r->start)
>> +                    continue;
>> +            if (r->flags & IORESOURCE_IO)
>> +                    continue;
>> +            if (r->flags & IORESOURCE_UNSET)
>> +                    continue;
>> +
>> +            bar = r;
>> +            break;
>> +    }
>> +
>> +    if (!bar) {
>> +            pci_err(pdev, "Unable to find Memory BAR to cause EEH with\n");
>> +            return -ENXIO;
>> +    }
>> +
>> +    pci_err(pdev, "Going to break: %pR\n", bar);
>> +
>> +    if (pdev->is_virtfn) {
>> +#ifndef CONFIG_PCI_IOV
>> +            return -ENXIO;
>> +#else
>> +            /*
>> +             * VFs don't have a per-function COMMAND register, so the best
>> +             * we can do is clear the Memory Space Enable bit in the PF's
>> +             * SRIOV control reg.
>> +             *
>> +             * Unfortunately, this requires that we have a PF (i.e doesn't
>> +             * work for a passed-through VF) and it has the potential side
>> +             * effect of also causing an EEH on every other VF under the
>> +             * PF. Oh well.
>> +             */
>> +            pdev = pdev->physfn;
>> +            if (!pdev)
>> +                    return -ENXIO; /* passed through VFs have no PF */
>> +
>> +            pos  = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV);
>> +            pos += PCI_SRIOV_CTRL;
>> +            bit  = PCI_SRIOV_CTRL_MSE;
>> +#endif /* !CONFIG_PCI_IOV */
>> +    } else {
>> +            bit = PCI_COMMAND_MEMORY;
>> +            pos = PCI_COMMAND;
>> +    }
>> +
>> +    /*
>> +     * Process here is:
>> +     *
>> +     * 1. Disable Memory space.
>> +     *
>> +     * 2. Perform an MMIO to the device. This should result in an error
>> +     *    (CA  / UR) being raised by the device which results in an EEH
>> +     *    PE freeze. Using the in_8() accessor skips the eeh detection hook
>> +     *    so the freeze hook so the EEH Detection machinery won't be
>> +     *    triggered here. This is to match the usual behaviour of EEH
>> +     *    where the HW will asynchronously freeze a PE and it's up to
>> +     *    the kernel to notice and deal with it.
>> +     *
>> +     * 3. Turn Memory space back on. This is more important for VFs
>> +     *    since recovery will probably fail if we don't. For normal
>> +     *    the COMMAND register is reset as a part of re-initialising
>> +     *    the device.
>> +     *
>> +     * Breaking stuff is the point so who cares if it's racy ;)
>> +     */
>> +    pci_read_config_word(pdev, pos, &old);
>> +
>> +    mapped = ioremap(bar->start, PAGE_SIZE);
>> +    if (!mapped) {
>> +            pci_err(pdev, "Unable to map MMIO BAR %pR\n", bar);
>> +            return -ENXIO;
>> +    }
>> +
>> +    pci_write_config_word(pdev, pos, old & ~bit);
>> +    in_8(mapped);
>> +    pci_write_config_word(pdev, pos, old);
>> +
>> +    iounmap(mapped);
>> +
>> +    return 0;
>> +}
>> +
>> +int eeh_pe_inject_mmio_error(struct pci_dev *pdev)
>> +{
>> +    return eeh_break_device(pdev);
>> +}
>> +
>
> Why have an extra eeh_pe_inject_mmio_error() function which only calls
> eeh_break_device()?
>
> Maybe we can rename eeh_break_device() to eeh_mmio_break_device() and use
> this function itself at both call sites?

Fair suggestion,

However we want to keep the method debugfs interface uses
to inject EEH (thats ppc platform agonistic), decoupled from what pseries
uses. Right now to support as initial work VFIO EEH injection on
pseries, we are piggy backing on eeh_debugfs_break_device().

This will change in future as we add more capabilities to pseries EEH
injection and this will change working of eeh_pe_inject_mmio_error()
without impacting the semantics of existing eeh_break_device().

-- 
Cheers
~ Vaibhav

Reply via email to