On 05/07/2025, Luca Ceresoli wrote: > Hello Liu, Hi Luca,
> > thanks for your further feedback. > > On Tue, 6 May 2025 10:24:18 +0800 > Liu Ying <victor....@nxp.com> wrote: > >> On 04/30/2025, Luca Ceresoli wrote: >>> Hello Liu, >> >> Hi Luca, >> >>> >>> On Tue, 29 Apr 2025 10:10:55 +0800 >>> Liu Ying <victor....@nxp.com> wrote: >>> >>>> Hi, >>>> >>>> On 04/25/2025, Luca Ceresoli wrote: >>>>> This is the new API for allocating DRM bridges. >>>>> >>>>> This driver embeds an array of channels in the main struct, and each >>>>> channel embeds a drm_bridge. This prevents dynamic, refcount-based >>>>> deallocation of the bridges. >>>>> >>>>> To make the new, dynamic bridge allocation possible: >>>>> >>>>> * change the array of channels into an array of channel pointers >>>>> * allocate each channel using devm_drm_bridge_alloc() >>>>> * adapt the code wherever using the channels >>>>> >>>>> Signed-off-by: Luca Ceresoli <luca.ceres...@bootlin.com> >>> >>> [...] >>> >>>>> @@ -345,8 +351,8 @@ static int imx8qxp_pc_bridge_probe(struct >>>>> platform_device *pdev) >>>>> free_child: >>>>> of_node_put(child); >>>>> >>>>> - if (i == 1 && pc->ch[0].next_bridge) >>>>> - drm_bridge_remove(&pc->ch[0].bridge); >>>>> + if (i == 1 && pc->ch[0]->next_bridge) >>>> >>>> Since this patch makes pc->ch[0] and pc->ch[1] be allocated separately, >>>> pc->ch[0] could be NULL if channel0 is not available, hence a NULL pointer >>>> dereference here... >>> >>> See below for this. >>> >>>>> + drm_bridge_remove(&pc->ch[0]->bridge); >>>>> >>>>> pm_runtime_disable(dev); >>>>> return ret; >>>>> @@ -359,7 +365,7 @@ static void imx8qxp_pc_bridge_remove(struct >>>>> platform_device *pdev) >>>>> int i; >>>>> >>>>> for (i = 0; i < 2; i++) { >>>>> - ch = &pc->ch[i]; >>>>> + ch = pc->ch[i]; >>>>> >>>>> if (!ch->is_available) >>>> >>>> ...and here too. >>> >>> This is indeed a bug, I should have checked the pointer for being >>> non-NULL. >>> >>> Looking at that more closely, I think the is_available flag can be >>> entirely removed now. The allocation itself (ch != NULL) now is >>> equivalent. Do you think my reasoning is correct? >>> >>> Ouch! After writing the previous paragraph I realized you proposed this >>> a few lines below! OK, removing is_available. :) >>> >>> [...] >>> >>>> On top of this patch series, this issue doesn't happen if I apply the below >>>> change: >>> >>> [...] >>> >>>> @@ -351,7 +349,7 @@ static int imx8qxp_pc_bridge_probe(struct >>>> platform_device *pdev) >>>> free_child: >>>> of_node_put(child); >>>> >>>> - if (i == 1 && pc->ch[0]->next_bridge) >>>> + if (i == 1 && pc->ch[0]) >>>> drm_bridge_remove(&pc->ch[0]->bridge); >>> >>> Unrelated to this patch, but as I looked at it more in depth now, I'm >>> not sure this whole logic is robust, even in the original code. >>> >>> The 'i == 1' check here seems to mean "if some error happened when >>> handling channel@1, that means channel@0 was successfully initialized, >>> so let's clean up channel 0". >>> >>> However my understanding of the bindings is that device tree is allowed >>> to have the channel@1 node before the channel@0 node (or even channel@1 >>> without channel@0, but that's less problematic here). >>> >>> In such case (channel@1 before channel@0), this would happen: >>> >>> 1. alloc and init ch[1], all OK >>> 2. alloc and init ch[0], an error happens >>> (e.g. of_graph_get_remote_node() fails) >>> >>> So we'd reach the free_child: label, and we should call >>> drm_bridge_remove() for ch[1]->bridge, but there's no code to do that. >>> >>> To be robust in such a case, I think both channels need to be checked >>> independently, as the status of one does not imply the status of the >>> other. E.g.: >>> >>> for (i = 0; i < 2; i++) >>> if (pc->ch[i] && pc->ch[i]->next_bridge) >>> drm_bridge_remove(&pc->ch[i]->bridge); >>> >>> (which is similar to what .remove() does after the changes discussed in >>> this thread, and which I have queued for v3) >>> >>> What's your opinion? Do you think I missed anything? >> >> The pixel combiner DT node would be added in imx8-ss-dc{0,1}.dtsi, please >> see the case for imx8-ss-dc0.dtsi introduced by an in-flight patch[1]. As >> channel@{0,1} child nodes always exist(DT overlay cannot effectively delete >> any of them) and channel@0 always comes first, there is no problematic case. > > I'm not questioning what existing and future dts files (will) contain, > and surely I don't see a good reason someone would write channel@1 > before channel@0. > > My point is: > > - the bindings _allow_ channel1 before channel@0 > - the error management code after the free_child label won't work > correctly if channel1 is before channel@0 in the device tree > > IOW the driver is not robust against all legal device tree descriptions, > and it could be easily made robust using the example code in my > previous e-mail (quoted a few lines above). > > If you agree about this I'll be happy to send a patch doing that change. > If you think I'm wrong, I won't fight a battle. This topic is > orthogonal to the change I'm introducing in this patch, and I can > continue the conversion independently from this discussion. I don't think it is necessary to do that change for now. When someone really comes across this issue, we may make the error management code robust. > >>> Thanks for taking the time to dig into this! >> >> After looking into this patch and patch 31(though I've already provided my >> A-b) >> more closely, I think the imx8qxp_pc and imx8{qm,qxp}_ldb main structures >> should have the same life time with the embedded DRM bridges, because for >> example the clk_apb clock in struct imx8qxp_pc would be accessed by the >> imx8qxp_pc_bridge_mode_set DRM bridge callback. But, IIUC, your patches >> extend >> the life time for the embedded channel/bridge structures only, but not for >> the >> main structures. What do you think ? > > I see you concern, but I'm sure the change I'm introducing is not > creating the problem you are concerned about. > > The key aspect is that my patch is merely changing the lifetime of the > _allocation_ of the drm_bridge, not its usage. On drm_bridge_remove() > the bridge is removed from its encoder chain and it is completely not > reachable, both before and after my patch. With my patch it is not > freed immediately, but it's just a piece of "wasted" memory that is > still allocated until elsewhere in the kernel there are pointers to it, > to avoid use-after-free. > > With this explanation, do you think my patch is correct (after fixing > the bug we already discussed of course)? I tend to say your patch is not correct because we'll eventually make sure that removing a bridge module is safe when doing atomic commit, which means the main structures should have the same life time with the DRM bridges. > > Best regards, > Luca > -- Regards, Liu Ying