Hello Marek, On Thu, 10 Dec 2015 04:53:01 +0100, Marek Vasut <ma...@denx.de> wrote: > On Thursday, December 10, 2015 at 03:27:08 AM, Tom Rini wrote: > > On Wed, Dec 09, 2015 at 03:09:13PM +0100, Marek Vasut wrote: > > > On Wednesday, December 09, 2015 at 02:02:03 PM, Stefan Roese wrote: > > > > Hi All! > > > > > > > > On 09.12.2015 09:09, Albert ARIBAUD wrote: > > > > > On Tue, 8 Dec 2015 14:43:42 +0100, Marek Vasut <ma...@denx.de> wrote: > > > > >> The arch/arm/lib/cache-cp15.c checks for CONFIG_ARMV7 and if this > > > > >> macro is set, it configures TTBR0 register. This register must be > > > > >> configured for the cache on ARMv7 to operate correctly. > > > > >> > > > > >> The problem is that noone actually sets the CONFIG_ARMV7 macro and > > > > >> thus the TTBR0 is not configured at all. On SoCFPGA, this produces > > > > >> all sorts of minor issues which are hard to replicate, for example > > > > >> certain USB sticks are not detected or QSPI NOR sometimes fails to > > > > >> write pages completely. > > > > >> > > > > >> The solution is to replace CONFIG_ARMV7 test with CONFIG_CPU_V7 one. > > > > >> This is correct because the code which added the test(s) for > > > > >> CONFIG_ARMV7 was added shortly after CONFIG_ARMV7 was replaced by > > > > >> CONFIG_CPU_V7 and this code was not adjusted correctly to reflect > > > > >> that change. > > > > > > > > > > Note: > > > > > > > > > > As discussed with Marek on IRC, this patch enables what is supposed > > > > > to be the correct MMU settings for ARMv7, which causes a sharp > > > > > Ethernet performance drop (40%) but also a strong general memory > > > > > access performance hit (a copy of 4 MB is almost instantaneous > > > > > without the patch and takes 2-3 seconds with it). > > > > > > > > I've tested Marek's patch on my current Armada XP platform (also > > > > ARMv7). And also noticed a performance drop by about 30-40%. > > > > The dhrystone command can be used for this testing as well > > > > (CONFIG_CMD_DHRYSTONE). > > > > > > > > So this is definitely a NAK from me to this current patch. > > > > > > This is a wrong approach, this patch just fixes another patch which was > > > broken from the getgo and the TTBR0 reg was not programmed correctly due > > > to that. This patch must be applied, but what we need to decide is > > > whether we need _another_ patch which removes the S-bit from the > > > pagetable or how to deal with that part. > > > > No, we don't have to apply this patch. The original patch here > > (97840b5) was likely not run-time tested when submitted as it was using > > the then-defunct CONFIG_ARMV7 symbol, and was likely a bugfix in an > > internal tree that was pushed upstream (which is on the whole, good). > > I find it surprising that such a patch, which modifies common code even, > was not scrutinized enough. > > > So in sum U-Boot has always been as broken in some specific regard > > before that patch as it is after that patch. So we have time to see > > what we need to do about enabling this feature correctly and even > > ensuring that we don't happen to say break things once Linux is up too. > > In that case, I am looking forward to better suggestions. > > I still disagree that this patch should not be applied, it corrects code > which was broken. The slowdown caused by the original patch is a separate > issue and should be corrected by a separate patch. If these two patches > get applied at once, that is fine.
This patch has several effects: - it selects proper ARMv7 translation table level 1 bit definitions; - it provides proper ARMv7 definitions for WT/WB/WA; - it selects proper ARMv7 settings for TTBR0. All these are correct as per the docs I have (although I may have missed something during the readings (and cross-readings with Marek) of these last hours/days. Now, one specific effect goes against performance, and it is the setting of bit S in all TT entries. This bit makes the corresponding region shareable, but for all I know, in U-Boot we don't have more than one core accessing the same memory or registers sets so -- at least for the major part of its execution -- there is no reason for any region to be shareable. /That/ effect I certainly don't want. The rest I am fine with. So, if we apply this patch minus the inclusion of the S bit in the definition of DCACHE_OFF (and hence of all DCACHE_* enum members after it), then we get code that is more correct from a theoretical standpoint, and does not degrade performance (which is a hint, admittedly weak, to me that it is not incorrect from a practical standpoint. It does not fix the USB and QSPI issues in the socfpga case, but I am not sure these are caused by the S bit being zero. > Best regards, > Marek Vasut Amicalement, -- Albert. _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot