On Wed, Jun 30, 2021 at 7:07 PM Marek Behún <marek.be...@nic.cz> wrote:
>
> On Wed, 30 Jun 2021 17:51:24 +0200
> Robert Marko <robert.ma...@sartura.hr> wrote:
>
> > On Wed, Jun 30, 2021 at 3:19 PM Marek Behún <marek.be...@nic.cz>
> > wrote:
> > >
> > > Hello Robert,
> > >
> > > I am writing regarding commit
> > > mvebu: 5.10 fix DVFS caused random boot crashes
> > >
> > > https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=080a0b74e39d159eecf69c468debec42f28bf4d8
> > > in OpenWRT.
> > >
> > > This commit reverts the one patch of a3720 cpufreq driver, but not
> > > the subsequent ones.
> > >
> > > Your commit message says that some 1.2 GHz SOCs are unstable with
> > > the fix. Did you also test this with the subsequent patches, which
> > > are now in stable kernels? I guess the answer is yes, because all
> > > these patches were backported to 5.10.37.
> >
> > Hi Marek,
> >
> > Yes, the rest of the patches were there as well.
> > >
> > > I am of the opinion that a better approach would be to
> > > - either disable cpufreq for 1.2 GHz variants
> > > - fix a3720 cpufreq driver to only scale up to 1 GHz on 1.2 GHz
> > > variant
> >
> > I would prefer limiting it to 1GHz as that would not cause
> > performance issues, but 1GHz models could have the same issue as well.
> > This is because the voltages that are set as a minimum are from the
> > testing that Pali and the Turris guys did, but it really depends on
> > the SoC batch you receive.
>
> The thing is you cannot limit it to 1 GHz in kernel, because when the
> device is booted to 1.2 GHz the dividers are {1, 2, 4, 6}, so the
> available frequencies are 1200 MHz, 600 MHz, 300 MHz, 200 MHz.
>
> If you want to limit it to 1 GHz, you need to build the flash-image.bin
> with CLOCKSPRESET=CPU_1000_DDR_800 and reflash the device.

This is an issue and the reason why I have devices running old
ATF+U-Boot as the customer deployed more than a thousand of these and I
can't really pull the devices for reflashing.

>
> With your revert the cpufreq scaling may be stable, but the CPU clock
> switches to TBG-A-P, which is 750 MHz.
> The result is that you are scaling, but you are scaling between
> 750 MHz, 375 MHz, 187.5 MHz, 125 MHz
>
> Which is even worse than 1 GHz variant, where the top frequency with
> your revert is 800 MHz.

Yes, I gathered that from the commit itself as previously they were
running at 750/800 MHz and that hid the whole voltage issue for a while.

> > >
> > > Since the approach you've taken now (reverting the patch) basically
> > > changes the CPU parent clock to DDR clock, which is just wrong.
> > > Worse is that you are doing this for everybody, not just for the 1.2
> > > GHz variants.
> > >
> > > What do you think?
> >
> > I understand that it was not the best solution, but something had to
> > be done as I was not able to even finish booting on multiple boards
> > before crashing. It just reverted the things back to the previous
> > state.
> >
> > I really could not figure a proper solution even after being in touch
> > with Pali, and contacting GlobalScale.
> >
> > This is an issue caused by Marvell simply ignoring the issue and
> > refusing to publish a fix or release the OTP and AVS docs as they
> > all have a validated voltage in the OTP somewhere.
>
> I have sent patch to upstream kernel disabling cpufreq on 1.2 GHz
> models. I think this is the most sane solution for now, since we
> simply do not know how to scale properly on this variant.
>
> Once the patch is accepted, would you please remove your revert?

Sure, not an issue.
Hopefully, Marvell will finally step up and provide some clarity.

Regards,
Robert
>
> Marek

--
Robert Marko
Staff Embedded Linux Engineer
Sartura Ltd.
Lendavska ulica 16a
10000 Zagreb, Croatia
Email: robert.ma...@sartura.hr
Web: www.sartura.hr
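
For reference, the frequency lists quoted in the thread follow from dividing the CPU parent clock by the dividers {1, 2, 4, 6}. The snippet below is only a sketch of that arithmetic, not code from the a3720 cpufreq driver; the function name `scaled_frequencies` is made up for illustration, and it assumes the same divider set applies to the 750 MHz parent, which matches the figures given above.

```python
# Dividers quoted in the thread for a device booted at 1.2 GHz.
# Assumption: the same set applies when the parent clock is 750 MHz,
# which is consistent with the 750/375/187.5/125 MHz figures quoted.
DIVIDERS = (1, 2, 4, 6)


def scaled_frequencies(parent_mhz, dividers=DIVIDERS):
    """Return the frequencies (in MHz) obtained by dividing the parent clock."""
    return [parent_mhz / d for d in dividers]


if __name__ == "__main__":
    cases = (
        ("1.2 GHz parent clock", 1200),
        ("750 MHz parent clock (TBG-A-P after the revert)", 750),
    )
    for label, parent in cases:
        steps = ", ".join(f"{f:g} MHz" for f in scaled_frequencies(parent))
        print(f"{label}: {steps}")
```

With a 1200 MHz parent this gives 1200, 600, 300 and 200 MHz; with a 750 MHz parent it gives 750, 375, 187.5 and 125 MHz, the two sets compared in the discussion.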