On 4/30/2025 10:26 PM, neil.armstr...@linaro.org wrote: > On 30/04/2025 18:39, Konrad Dybcio wrote: >> On 4/30/25 6:19 PM, neil.armstr...@linaro.org wrote: >>> On 30/04/2025 17:36, Konrad Dybcio wrote: >>>> On 4/30/25 4:49 PM, neil.armstr...@linaro.org wrote: >>>>> On 30/04/2025 15:09, Konrad Dybcio wrote: >>>>>> On 4/30/25 2:49 PM, neil.armstr...@linaro.org wrote: >>>>>>> On 30/04/2025 14:35, Konrad Dybcio wrote: >>>>>>>> On 4/30/25 2:26 PM, neil.armstr...@linaro.org wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> On 30/04/2025 13:34, Konrad Dybcio wrote: >>>>>>>>>> From: Konrad Dybcio <konrad.dyb...@linaro.org> >>>>>>>>>> >>>>>>>>>> Add speebin data for A740, as found on SM8550 and derivative >>>>>>>>>> SoCs. >>>>>>>>>> >>>>>>>>>> For non-development SoCs it seems that "everything except >>>>>>>>>> FC_AC, FC_AF >>>>>>>>>> should be speedbin 1", but what the values are for said >>>>>>>>>> "everything" are >>>>>>>>>> not known, so that's an exercise left to the user.. >>>>>>>>>> >>>>>>>>>> Reviewed-by: Dmitry Baryshkov <dmitry.barysh...@linaro.org> >>>>>>>>>> Signed-off-by: Konrad Dybcio <konrad.dyb...@linaro.org> >>>>>>>>>> --- >>>>>>>>>> drivers/gpu/drm/msm/adreno/a6xx_catalog.c | 8 ++++++++ >>>>>>>>>> 1 file changed, 8 insertions(+) >>>>>>>>>> >>>>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c b/ >>>>>>>>>> drivers/gpu/drm/msm/adreno/a6xx_catalog.c >>>>>>>>>> index >>>>>>>>>> 53e2ff4406d8f0afe474aaafbf0e459ef8f4577d..61daa331567925e529deae5e25d6fb63a8ba8375 >>>>>>>>>> 100644 >>>>>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c >>>>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c >>>>>>>>>> @@ -11,6 +11,9 @@ >>>>>>>>>> #include "a6xx.xml.h" >>>>>>>>>> #include "a6xx_gmu.xml.h" >>>>>>>>>> +#include <linux/soc/qcom/smem.h> >>>>>>>>>> +#include <linux/soc/qcom/socinfo.h> >>>>>>>>>> + >>>>>>>>>> static const struct adreno_reglist a612_hwcg[] = { >>>>>>>>>> {REG_A6XX_RBBM_CLOCK_CNTL_SP0, 0x22222222}, >>>>>>>>>> {REG_A6XX_RBBM_CLOCK_CNTL2_SP0, 0x02222220}, >>>>>>>>>> @@ -1431,6 +1434,11 @@ static const struct adreno_info >>>>>>>>>> a7xx_gpus[] = { >>>>>>>>>> }, >>>>>>>>>> .address_space_size = SZ_16G, >>>>>>>>>> .preempt_record_size = 4192 * SZ_1K, >>>>>>>>>> + .speedbins = ADRENO_SPEEDBINS( >>>>>>>>>> + { ADRENO_SKU_ID(SOCINFO_FC_AC), 0 }, >>>>>>>>>> + { ADRENO_SKU_ID(SOCINFO_FC_AF), 0 }, >>>>>>>>>> + /* Other feature codes (on prod SoCs) should >>>>>>>>>> match to speedbin 1 */ >>>>>>>>> >>>>>>>>> I'm trying to understand this sentence. because reading patch >>>>>>>>> 4, when there's no match >>>>>>>>> devm_pm_opp_set_supported_hw() is simply never called so how >>>>>>>>> can it match speedbin 1 ? >>>>>>>> >>>>>>>> What I'm saying is that all other entries that happen to be >>>>>>>> possibly >>>>>>>> added down the line are expected to be speedbin 1 (i.e. BIT(1)) >>>>>>>> >>>>>>>>> Before this change the fallback was speedbin = BIT(0), but this >>>>>>>>> disappeared. >>>>>>>> >>>>>>>> No, the default was to allow speedbin mask ~(0U) >>>>>>> >>>>>>> Hmm no: >>>>>>> >>>>>>> supp_hw = fuse_to_supp_hw(info, speedbin); >>>>>>> >>>>>>> if (supp_hw == UINT_MAX) { >>>>>>> DRM_DEV_ERROR(dev, >>>>>>> "missing support for speed-bin: %u. Some OPPs may >>>>>>> not be supported by hardware\n", >>>>>>> speedbin); >>>>>>> supp_hw = BIT(0); /* Default */ >>>>>>> } >>>>>>> >>>>>>> ret = devm_pm_opp_set_supported_hw(dev, &supp_hw, 1); >>>>>>> if (ret) >>>>>>> return ret; >>>>>> >>>>>> Right, that's my own code even.. >>>>>> >>>>>> in any case, the kernel can't know about the speed bins that aren't >>>>>> defined and here we only define bin0, which doesn't break things >>>>>> >>>>>> the kernel isn't aware about hw with bin1 with or without this change >>>>>> so it effectively doesn't matter >>>>> >>>>> But it's regression for the other platforms, where before an >>>>> unknown SKU >>>>> mapped to supp_hw=BIT(0) >>>>> >>>>> Not calling devm_pm_opp_set_supported_hw() is a major regression, >>>>> if the opp-supported-hw is present, the OPP will be rejected: >>>> >>>> A comment in patch 4 explains that. We can either be forwards or >>>> backwards >>>> compatible (i.e. accept a limited amount of >>>> speedbin_in_driver x speedbin_in_dt combinations) >>> >>> I have a hard time understanding the change, please be much more verbose >>> in the cover letter and commit messages. >>> >>> The fact that you do such a large change in the speedbin policy in >>> patch 4 >>> makes it hard to understand why it's needed in the first place. >>> >>> Finally I'm very concerned that "old" SM8550 DT won't work on new >>> kernels, >>> this is frankly unacceptable, and this should be addressed in the first >>> place. >>> >>> The nvmem situation was much simple, where we considered we added the >>> nvmem >>> property at the same time as opp-supported-hw in OPPs, but it's no >>> more the >>> case. >>> >>> So I think the OPP API should probably be extended to address this >>> situation >>> first, since if we do not have the opp-supported-hw in OPPs, all OPPs >>> are safe. >>> >>> So this code: >>> count = of_property_count_u32_elems(np, "opp-supported-hw"); >>> if (count <= 0 || count % levels) { >>> dev_err(dev, "%s: Invalid opp-supported-hw property (%d)\n", >>> __func__, count); >>> return false; >>> } >>> should return true in this specific case, like a >>> supported_hw_failsafe mode. >> >> Not really. opp-supported-hws = <BIT(0)> usually translates to the >> *fastest* >> bin in our case, so perhaps that change I made previously to default >> to it >> wasn't the wisest. In other words, all slower SKUs that weren't added >> to the >> kernel catalog & dt are potentially getting overclocked, which is no >> bueno. >> That is not always the case, but it most certainly has been for a >> number of >> years. >> >> Old DTs in this case would be DTs lacking opp-supported-hw with the >> kernel >> having speedbin tables. The inverse ("too new DTs") case translates into >> "someone put some unexpected stuff in dt and the kernel has no idea what >> to do with it". >> In this context, old DTs would continue to work after patch 4, as the >> first >> early return in adreno_set_speedbin() takes care of that. > > No. > > With only patches 1-4 applied (keep "old" DT) on today's -next: > > SM8550-QRD: > [ 7.574569] msm_dpu ae01000.display-controller: bound ae94000.dsi > (ops dsi_ops [msm]) > [ 7.586578] msm_dpu ae01000.display-controller: bound > ae90000.displayport-controller (ops msm_dp_display_comp_ops [msm]) > [ 7.597886] adreno 3d00000.gpu: error -EINVAL: Unknown speed bin fuse > value: 0x2 > [ 7.605518] msm_dpu ae01000.display-controller: failed to load adreno > gpu > [ 7.612599] msm_dpu ae01000.display-controller: failed to bind > 3d00000.gpu (ops a3xx_ops [msm]): -22 > > SM8550-HDK: > [ 10.137558] msm_dpu ae01000.display-controller: bound ae94000.dsi > (ops dsi_ops [msm]) > [ 10.151796] msm_dpu ae01000.display-controller: bound > ae90000.displayport-controller (ops msm_dp_display_comp_ops [msm]) > [ 10.163358] adreno 3d00000.gpu: error -EINVAL: Unknown speed bin fuse > value: 0x2 > [ 10.171066] msm_dpu ae01000.display-controller: failed to load adreno > gpu > [ 10.178118] msm_dpu ae01000.display-controller: failed to bind > 3d00000.gpu (ops a3xx_ops [msm]): -22 > > With: > =================><================== > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c b/drivers/gpu/ > drm/msm/adreno/a6xx_catalog.c > index 61daa3315679..7cac14a585a9 100644 > --- a/drivers/gpu/drm/msm/adreno/a6xx_catalog.c > +++ b/drivers/gpu/drm/msm/adreno/a6xx_catalog.c > @@ -1435,6 +1435,7 @@ static const struct adreno_info a7xx_gpus[] = { > .address_space_size = SZ_16G, > .preempt_record_size = 4192 * SZ_1K, > .speedbins = ADRENO_SPEEDBINS( > + { ADRENO_SKU_ID(SOCINFO_FC_AB), 1 }, > { ADRENO_SKU_ID(SOCINFO_FC_AC), 0 }, > { ADRENO_SKU_ID(SOCINFO_FC_AF), 0 }, > /* Other feature codes (on prod SoCs) should > match to speedbin 1 */ > =================><================== > > SM8550-QRD: > [ 7.681816] msm_dpu ae01000.display-controller: bound ae94000.dsi > (ops dsi_ops [msm]) > [ 7.694479] msm_dpu ae01000.display-controller: bound > ae90000.displayport-controller (ops msm_dp_display_comp_ops [msm]) > [ 7.705784] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 7.714322] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 7.722851] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 7.722853] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 7.722855] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 7.722856] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 7.722858] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 7.722860] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 7.722861] adreno 3d00000.gpu: _of_add_opp_table_v2: no supported OPPs > [ 7.722863] adreno 3d00000.gpu: [drm:adreno_gpu_init [msm]] *ERROR* > Unable to set the OPP table > > SM8550-HDK: > [ 10.119986] msm_dpu ae01000.display-controller: bound ae94000.dsi > (ops dsi_ops [msm]) > [ 10.133872] msm_dpu ae01000.display-controller: bound > ae90000.displayport-controller (ops msm_dp_display_comp_ops [msm]) > [ 10.147377] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 10.161640] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 10.171198] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 10.179756] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 10.188313] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 10.196868] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 10.205424] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 10.226025] adreno 3d00000.gpu: _opp_is_supported: Invalid opp- > supported-hw property (-22) > [ 10.234589] adreno 3d00000.gpu: _of_add_opp_table_v2: no supported OPPs > [ 10.247165] adreno 3d00000.gpu: [drm:adreno_gpu_init [msm]] *ERROR* > Unable to set the OPP table > > This behaves exactly as I said, so please fix it.
Konrad, iirc, we discussed this in one of the earlier revision. There is a circular dependency between the driver change for SKU support and the dt change that adds supported_hw bitmask in opp-table. Only scenario it works is when you add these to the initial patches series of a new GPU. It will be very useful if we can break this circular dependency. -Akhil. > > Neil > >> >> Konrad > >