Michael Ellerman <m...@ellerman.id.au> writes: > Nathan Lynch via B4 Submission Endpoint > <devnull+nathanl.linux.ibm....@kernel.org> writes: >> From: Nathan Lynch <nath...@linux.ibm.com> >> >> Some code that runs early in boot calls RTAS functions that can return >> -2 or 990x statuses, which mean the caller should retry. An example is >> pSeries_cmo_feature_init(), which invokes ibm,get-system-parameter but >> treats these benign statuses as errors instead of retrying. >> >> pSeries_cmo_feature_init() and similar code should be made to retry >> until they succeed or receive a real error, using the usual pattern: >> >> do { >> rc = rtas_call(token, etc...); >> } while (rtas_busy_delay(rc)); >> >> But rtas_busy_delay() will perform a timed sleep on any 990x >> status. This isn't safe so early in boot, before the CPU scheduler and >> timer subsystem have initialized. >> >> The -2 RTAS status is much more likely to occur during single-threaded >> boot than 990x in practice, at least on PowerVM. This is because -2 >> usually means that RTAS made progress but exhausted its self-imposed >> timeslice, while 990x is associated with concurrent requests from the >> OS causing internal contention. Regardless, according to the language >> in PAPR, the OS should be prepared to handle either type of status at >> any time. >> >> Add a fallback path to rtas_busy_delay() to handle this as safely as >> possible, performing a small delay on 990x. Include a counter to >> detect retry loops that aren't making progress and bail out. >> >> This was found by inspection and I'm not aware of any real >> failures. However, the implementation of rtas_busy_delay() before >> commit 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements") >> was not susceptible to this problem, so let's treat this as a >> regression. >> >> Signed-off-by: Nathan Lynch <nath...@linux.ibm.com> >> Fixes: 38f7b7067dae ("powerpc/rtas: rtas_busy_delay() improvements") >> --- >> arch/powerpc/kernel/rtas.c | 48 >> +++++++++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 47 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c >> index 795225d7f138..ec2df09a70cf 100644 >> --- a/arch/powerpc/kernel/rtas.c >> +++ b/arch/powerpc/kernel/rtas.c >> @@ -606,6 +606,46 @@ unsigned int rtas_busy_delay_time(int status) >> return ms; >> } >> >> +/* >> + * Early boot fallback for rtas_busy_delay(). >> + */ >> +static bool __init rtas_busy_delay_early(int status) >> +{ >> + static size_t successive_ext_delays __initdata; >> + bool ret; > > I think the logic would be easier to read if this was called "wait", but > maybe that's just me.
Maybe "retry"? That communicates what the function is telling callers to do. > >> + switch (status) { >> + case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX: >> + /* >> + * In the unlikely case that we receive an extended >> + * delay status in early boot, the OS is probably not >> + * the cause, and there's nothing we can do to clear >> + * the condition. Best we can do is delay for a bit >> + * and hope it's transient. Lie to the caller if it >> + * seems like we're stuck in a retry loop. >> + */ >> + mdelay(1); >> + ret = true; >> + successive_ext_delays += 1; >> + if (successive_ext_delays > 1000) { >> + pr_err("too many extended delays, giving up\n"); >> + dump_stack(); >> + ret = false; > > Shouldn't we zero successive_ext_delays here? > > Otherwise a subsequent (possibly different) RTAS call will immediately > fail out here if it gets a single extended delay from RTAS, won't it? Yes, will fix. Thanks. > >> + } >> + break; >> + case RTAS_BUSY: >> + ret = true; >> + successive_ext_delays = 0; >> + break; >> + default: >> + ret = false; >> + successive_ext_delays = 0; >> + break; >> + } >> + >> + return ret; >> +} >> + >> /** >> * rtas_busy_delay() - helper for RTAS busy and extended delay statuses >> * >> @@ -624,11 +664,17 @@ unsigned int rtas_busy_delay_time(int status) >> * * false - @status is not @RTAS_BUSY nor an extended delay hint. The >> * caller is responsible for handling @status. >> */ >> -bool rtas_busy_delay(int status) >> +bool __ref rtas_busy_delay(int status) > > Can you explain the __ref in the change log. Yes, will add that. >> { >> unsigned int ms; >> bool ret; >> >> + /* >> + * Can't do timed sleeps before timekeeping is up. >> + */ >> + if (system_state < SYSTEM_SCHEDULING) >> + return rtas_busy_delay_early(status); >> + >> switch (status) { >> case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX: >> ret = true; >> > > cheers