On 03/17/2015 08:57 AM, popcorn mix wrote:
On 17/03/15 03:04, Stephen Warren wrote:
It would be nice though if someone from the RPi Foundation could comment
on the exact effect of the upper bus address bits, and why 0xc would
work for RPi2 but 0x4 for the RPi 1. I wonder if the ARM cache status
(enabled, disabled) interacts with the GPU cache enable in any way, e.g.
burst vs. non-burst transactions on the bus or something? That's about
the only reason I can see for the RPi Foundation kernel working with 0x4
bus addresses on both chips, but U-Boot needing something different on
RPi2...

Dom, for reference, see:
http://lists.denx.de/pipermail/u-boot/2015-March/207947.html
http://lists.denx.de/pipermail/u-boot/2015-March/thread.html#207947

Thanks for the great explanation. I'll have to bookmark/archive it:-)

First, remember that 2835 is a large GPU with a small ARM attached. On
some platforms the ARM is not even used.
The GPU boots first and may wake the ARM. The GPU is the centre of the
universe, and the ARM has to fit in.

Okay, I'll try to explain what goes on. Here are my definitions of some
terms:

bus address: a VideoCore/GPU address. The lower 30 bits define the 1G of
addressable memory. The top two bits define the caching alias.
physical address: An ARM-side address given to the VC MMU. This is a
30-bit address space.

The GPU always uses bus addresses. GPU bus mastering peripherals (like
DMA) use bus addresses. The ARM uses physical addresses.
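
To make the bus address definition above concrete, here is a minimal C sketch that splits a 32-bit bus address into its two-bit caching alias and 30-bit offset. The helper names are mine, not from any existing driver. Note that the "0x4" and "0xc" used elsewhere in this thread are the top nibble of the address, which correspond to alias values 1 and 3 here.

```c
#include <assert.h>
#include <stdint.h>

/* Lower 30 bits: offset into the 1G of addressable memory. */
#define BUS_OFFSET_MASK UINT32_C(0x3fffffff)

/* Top two bits of a bus address: the caching alias (0..3). */
static inline unsigned bus_alias(uint32_t bus)
{
    return (unsigned)(bus >> 30);
}

/* Lower 30 bits of a bus address. */
static inline uint32_t bus_offset(uint32_t bus)
{
    return bus & BUS_OFFSET_MASK;
}

/* Recombine an alias and an offset into a bus address. */
static inline uint32_t make_bus(unsigned alias, uint32_t offset)
{
    assert(alias < 4 && offset <= BUS_OFFSET_MASK);
    return ((uint32_t)alias << 30) | offset;
}
```

For example, bus_alias(0xc0001000) is 3 and bus_offset(0xc0001000) is 0x1000.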

VC MMU: A coarse MMU used by the ARM for accessing GPU memory. Each page
is 16M and there are 64 pages. This maps 30 bits of physical address to
32 bits of bus address.

The setup of VC MMU is handled by the GPU and by default the mapping is:
2835: first 32 pages map physical addresses 0x00000000-0x1fffffff to bus
addresses 0x40000000-0x5fffffff. The next page maps physical addresses
0x20000000 to 0x20ffffff to bus addresses 0x7e000000 to 0x7effffff.

2836: first 63 pages map physical addresses 0x00000000-0x3effffff to bus
addresses 0xc0000000-0xfeffffff. The next page maps physical addresses
0x3f000000 to 0x3fffffff to bus addresses 0x7e000000 to 0x7effffff.
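
The default mapping described above can be modelled in a few lines of C. This is only a software sketch of the description, not how the real VC MMU is programmed. The struct, enum, and function names are hypothetical, and pages the description does not mention are treated as unmapped purely as a modelling assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VC_MMU_PAGES      64u
#define VC_MMU_PAGE_SHIFT 24u                      /* 16M pages */
#define VC_MMU_PAGE_MASK  UINT32_C(0x00ffffff)

enum soc { SOC_2835, SOC_2836 };

/* One entry of the modelled table (layout is an assumption). */
struct vc_mmu_page {
    uint32_t bus_base;  /* bus address of the start of the page */
    bool mapped;        /* false: not covered by the description */
};

/* Fill the table with the default mapping described in the text. */
static void vc_mmu_default(enum soc soc, struct vc_mmu_page table[VC_MMU_PAGES])
{
    uint32_t ram_pages = (soc == SOC_2835) ? 32u : 63u;
    uint32_t alias = (soc == SOC_2835) ? UINT32_C(0x40000000)
                                       : UINT32_C(0xc0000000);
    uint32_t i;

    for (i = 0; i < VC_MMU_PAGES; i++) {
        table[i].bus_base = 0;
        table[i].mapped = false;
    }
    /* RAM pages: same offset, with the caching alias in the top bits. */
    for (i = 0; i < ram_pages; i++) {
        table[i].bus_base = alias | (i << VC_MMU_PAGE_SHIFT);
        table[i].mapped = true;
    }
    /* The next page maps to the peripheral window at 0x7e000000. */
    table[ram_pages].bus_base = UINT32_C(0x7e000000);
    table[ram_pages].mapped = true;
}

/* Translate a 30-bit ARM physical address; returns false if unmapped. */
static bool vc_mmu_translate(const struct vc_mmu_page table[VC_MMU_PAGES],
                             uint32_t phys, uint32_t *bus)
{
    const struct vc_mmu_page *page;

    assert(phys < UINT32_C(0x40000000));
    page = &table[phys >> VC_MMU_PAGE_SHIFT];
    if (!page->mapped)
        return false;
    *bus = page->bus_base | (phys & VC_MMU_PAGE_MASK);
    return true;
}
```

With the 2836 table, physical address 0x3f200000 translates to bus address 0x7e200000, and 0x00001000 translates to 0xc0001000.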

OK, this explains why in U-Boot, we need to OR in 0x40000000 on bcm2835 and 0xc0000000 on bcm2836; that matches the VC MMU setup.

I guess we need to fix the U-Boot mailbox driver too, and many things in the upstream RPi kernel.

I have two more questions:

1)

Do the RPi 1 and RPi 2 use different kernel binaries in the RPi Foundation's images? I'd assumed there was a single unified binary which supported both. The reason I ask is that I see:

https://github.com/raspberrypi/linux/blob/rpi-3.18.y/arch/arm/mach-bcm2708/include/mach/memory.h#L38

#ifdef CONFIG_BCM2708_NOL2CACHE
#define _REAL_BUS_OFFSET UL(0xC0000000) /* don't use L1 or L2 caches */
#else
#define _REAL_BUS_OFFSET UL(0x40000000) /* use L2 cache */
#endif

That's identical in the mach-bcm2709 version too. However, arch/arm/mach-bcm270[89]/Kconfig's entry for that config option:

config BCM2708_NOL2CACHE
        bool "Videocore L2 cache disable"
        depends on MACH_BCM2709
        default y
        help
          Do not allow ARM to use GPU's L2 cache. Requires disable_l2cache in
          config.txt.

Has "default n" for the bcm2708 version and "default y" for the bcm2709 version. If I'd noticed that difference in default value, it would have been a big clue that what I proposed in the U-Boot patch was correct! Anyway, this implies that there are separate kernel binaries for the RPi 1 and RPi 2, since otherwise those default values wouldn't work.

2)

I assume the SDHCI controller (RPi SD card, CM eMMC) is affected by this just as much; we need to use bus addresses not ARM physical addresses when programming any DMA there?

Perhaps this would explain why I had issues with the eMMC on the CM (I think only in the kernel though, whereas U-Boot may have been fine; I'll have to check)
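
If the answer to that question is yes, a driver would convert a buffer's ARM physical address to a bus address before handing it to the controller. Here is a minimal sketch of that conversion for RAM addresses only (peripheral addresses map to the 0x7e000000 window instead and are not handled). All names are hypothetical, and the register pointer stands in for whatever DMA address register a real controller has; nothing here is taken from the SDHCI driver.

```c
#include <assert.h>
#include <stdint.h>

#define BUS_ALIAS_2835 UINT32_C(0x40000000)  /* alias ORed in on bcm2835 */
#define BUS_ALIAS_2836 UINT32_C(0xc0000000)  /* alias ORed in on bcm2836 */

/* RAM only: OR the SoC's alias into a 30-bit physical address. */
static inline uint32_t ram_phys_to_bus(uint32_t phys, uint32_t alias)
{
    assert(phys < UINT32_C(0x40000000));
    return phys | alias;
}

/* RAM only: strip the alias bits to get the physical address back. */
static inline uint32_t ram_bus_to_phys(uint32_t bus)
{
    return bus & UINT32_C(0x3fffffff);
}

/* Hypothetical: write a buffer's bus address to a DMA address register. */
static inline void dma_set_addr(volatile uint32_t *dma_addr_reg,
                                uint32_t buf_phys, uint32_t alias)
{
    *dma_addr_reg = ram_phys_to_bus(buf_phys, alias);
}
```

On bcm2836, a buffer at physical address 0x01000000 would be programmed as 0xc1000000. On bcm2835 the same buffer would be programmed as 0x41000000.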

...
So, on 2835 the ARM has a 16K L1 cache and no L2 cache. The GPU has a
128M L2 cache. The GPU's L2 cache is accessible from the ARM but it's
not particularly close (i.e. not very fast).
However mapping through the L2 allocating alias (0x4) was shown to be
beneficial on 2835, so that is the alias we use.

The situation is different on 2836. The ARM has a 32K L1 cache and a
512M integrated/fast L2 cache. Additionally going through the
smaller/slower GPU L2 is bad for performance.
So, we map through the SDRAM alias (0xc) and avoid the GPU L2 cache.

I assume 128M and 512M there should be 128K and 512K?
_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot
