Michal Suchánek <msucha...@suse.de> writes: > On Fri, Sep 16, 2022 at 04:56:18PM -0500, Nathan Lynch wrote: >> "Nicholas Piggin" <npig...@gmail.com> writes: >> > On Wed Sep 14, 2022 at 3:39 AM AEST, Leonardo Brás wrote: >> >> On Mon, 2022-09-12 at 14:58 -0500, Nathan Lynch wrote: >> >> > Leonardo Brás <leobra...@gmail.com> writes: >> >> > > On Fri, 2022-09-09 at 09:04 -0500, Nathan Lynch wrote: > >> >> > > > No, it means the premise of commit b664db8e3f97 ("powerpc/rtas: >> >> > > > Implement reentrant rtas call") change is incorrect. The "reentrant" >> >> > > > property described in the spec applies only to the individual RTAS >> >> > > > functions. The OS can invoke (for example) ibm,set-xive on multiple >> >> > > > CPUs >> >> > > > simultaneously, but it must adhere to the more general requirement >> >> > > > to >> >> > > > serialize with other RTAS functions. >> >> > > > >> >> > > >> >> > > I see. Thanks for explaining that part! >> >> > > I agree: reentrant calls that way don't look as useful on Linux than I >> >> > > previously thought. >> >> > > >> >> > > OTOH, I think that instead of reverting the change, we could make use >> >> > > of the >> >> > > correct information and fix the current implementation. (This could >> >> > > help when we >> >> > > do the same rtas call in multiple cpus) >> >> > >> >> > Hmm I'm happy to be mistaken here, but I doubt we ever really need to do >> >> > that. I'm not seeing the need. >> >> > >> >> > > I have an idea of a patch to fix this. >> >> > > Do you think it would be ok if I sent that, to prospect being an >> >> > > alternative to >> >> > > this reversion? >> >> > >> >> > It is my preference, and I believe it is more common, to revert to the >> >> > well-understood prior state, imperfect as it may be. The revert can be >> >> > backported to -stable and distros while development and review of >> >> > another approach proceeds. >> >> >> >> Ok then, as long as you are aware of the kdump bug, I'm good. >> >> >> >> FWIW: >> >> Reviewed-by: Leonardo Bras <leobra...@gmail.com> >> > >> > A shame. I guess a reader/writer lock would not be much help because >> > the crash is probably more likely to hit longer running rtas calls? >> > >> > Alternative is just cheat and do this...? >> > >> > Thanks, >> > Nick >> > >> > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c >> > index 693133972294..89728714a06e 100644 >> > --- a/arch/powerpc/kernel/rtas.c >> > +++ b/arch/powerpc/kernel/rtas.c >> > @@ -26,6 +26,7 @@ >> > #include <linux/syscalls.h> >> > #include <linux/of.h> >> > #include <linux/of_fdt.h> >> > +#include <linux/panic.h> >> > >> > #include <asm/interrupt.h> >> > #include <asm/rtas.h> >> > @@ -97,6 +98,19 @@ static unsigned long lock_rtas(void) >> > { >> > unsigned long flags; >> > >> > + if (atomic_read(&panic_cpu) == raw_smp_processor_id()) { >> > + /* >> > + * Crash in progress on this CPU. Other CPUs should be >> > + * stopped by now, so skip the lock in case it was being >> > + * held, and is now needed for crashing e.g., kexec >> > + * (machine_kexec_mask_interrupts) requires rtas calls. >> > + * >> > + * It's possible this could have caused rtas state >> > breakage >> > + * but the alternative is deadlock. >> > + */ >> > + return 0; >> > + } >> > + >> > local_irq_save(flags); >> > preempt_disable(); >> > arch_spin_lock(&rtas.lock); >> > @@ -105,6 +119,9 @@ static unsigned long lock_rtas(void) >> > >> > static void unlock_rtas(unsigned long flags) >> > { >> > + if (atomic_read(&panic_cpu) == raw_smp_processor_id()) >> > + return; >> > + >> > arch_spin_unlock(&rtas.lock); >> > local_irq_restore(flags); >> > preempt_enable(); >> >> Looks correct. >> >> I wonder - would it be worth making the panic path use a separate >> "emergency" rtas_args buffer as well? If a CPU is actually "stuck" in >> RTAS at panic time, then leaving rtas.args untouched might make the >> ibm,int-off, ibm,set-xive, ibm,os-term, and any other RTAS calls we >> incur on the panic path more likely to succeed. > > Was some fix for the case of crashing in rtas merged? > > Looks like there is none unless I missed something.
I'm not aware of any crashes in RTAS, but we do have an issue open to track the issue I think you're referring to: https://github.com/linuxppc/issues/issues/435 No progress yet. AFAIK only XICS guests are exposed; XIVE doesn't use RTAS calls.