On 08/01/2018 09:26 AM, Michael Ellerman wrote: > Frank Rowand <frowand.l...@gmail.com> writes: >> On 07/31/18 07:17, Rob Herring wrote: >>> On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman <m...@ellerman.id.au> >>> wrote: >>>> >>>> Hi Rob/Frank, >>>> >>>> I think we might have a problem with the phandle_cache not interacting >>>> well with of_detach_node(): >>> >>> Probably needs a similar fix as this commit did for overlays: >>> >>> commit b9952b5218added5577e4a3443969bc20884cea9 >>> Author: Frank Rowand <frank.row...@sony.com> >>> Date: Thu Jul 12 14:00:07 2018 -0700 >>> >>> of: overlay: update phandle cache on overlay apply and remove >>> >>> A comment in the review of the patch adding the phandle cache said that >>> the cache would have to be updated when modules are applied and removed. >>> This patch implements the cache updates. >>> >>> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of >>> of_find_node_by_phandle()") >>> Reported-by: Alan Tull <at...@kernel.org> >>> Suggested-by: Alan Tull <at...@kernel.org> >>> Signed-off-by: Frank Rowand <frank.row...@sony.com> >>> Signed-off-by: Rob Herring <r...@kernel.org> >> >> Agreed. Sorry about missing the of_detach_node() case. >> >> >>> Really what we need here is an "invalidate phandle" function rather >>> than free and re-allocate the whole damn cache. >> >> The big hammer approach was chosen to avoid the race conditions that >> would otherwise occur. OF does not have a locking strategy that >> would be able to protect against the races. >> >> We could maybe implement a slightly smaller hammer by (1) disabling >> the cache, (2) invalidate a phandle entry in the cache, (3) re-enable >> the cache. That is an off the cuff thought - I would have to look >> a little bit more carefully to make sure it would work. >> >> But I don't see a need to add the complexity of the smaller hammer >> or the bigger hammer of proper locking _unless_ we start seeing that >> the cache is being freed and re-allocated frequently. For overlays >> I don't expect the high frequency because it happens on a per overlay >> removal basis (not per node removal basis). For of_detach_node() the >> event _is_ on a per node removal basis. Michael, do you expect node >> removals to be a frequent event with low latency being important? If >> so, a rough guess on what the frequency would be? > > No in practice we expect them to be fairly rare and in the thousands at > most I think. > > TBH you could just disable the phandle_cache on the first detach and > that would be fine by me. I don't think we've ever noticed phandle > lookups being much of a problem for us on our hardware.
I ran a couple of migrations with the calls to of_free_phandle_cache() ... of_populate_phandle_cache() bracketing the body of arch/powerpc/platforms/pseries/mobility.c:post_mobility_fixup() with a patch like, diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index e245a88..efc9442 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -22,6 +22,9 @@ #include <asm/rtas.h> #include "pseries.h" +extern int of_free_phandle_cache(void); +extern void of_populate_phandle_cache(void); + static struct kobject *mobility_kobj; struct update_props_workarea { @@ -343,6 +346,8 @@ void post_mobility_fixup(void) rc = rtas_call(activate_fw_token, 0, 1, NULL); } while (rtas_busy_delay(rc)); + of_free_phandle_cache(); + if (rc) printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc); @@ -354,6 +359,8 @@ void post_mobility_fixup(void) /* Possibly switch to a new RFI flush type */ pseries_setup_rfi_flush(); + of_populate_phandle_cache(); + return; } and did not observe the warnings/crashes that have been plaguing me. We will need to move their prototypes out of drivers/of/of_private.h to include/linux/of.h, though. > > cheers > Michael -- Michael W. Bringmann Linux Technology Center IBM Corporation Tie-Line 363-5196 External: (512) 286-5196 Cell: (512) 466-0650 m...@linux.vnet.ibm.com