On 17/03/15 03:04, Stephen Warren wrote:
> It would be nice though if someone from the RPi Foundation could comment
> on the exact effect of the upper bus address bits, and why 0xc would
> work for RPi2 but 0x4 for the RPi 1. I wonder if the ARM cache status
> (enabled, disabled) interacts with the GPU cache enable in any way, e.g.
> burst vs. non-burst transactions on the bus or something? That's about
> the only reason I can see for the RPi Foundation kernel working with 0x4
> bus addresses on both chips, but U-Boot needing something different on
> RPi2...
>
> Dom, for reference, see:
> http://lists.denx.de/pipermail/u-boot/2015-March/207947.html
> http://lists.denx.de/pipermail/u-boot/2015-March/thread.html#207947

First, remember that 2835 is a large GPU with a small ARM attached. On some 
platforms the ARM is not even used.
The GPU boots first and may wake the ARM. The GPU is the centre of the 
universe, and the ARM has to fit in.


Okay, I'll try to explain what goes on. Here are my definitions of some terms:

bus address: a VideoCore/GPU address. The lower 30-bits define the 1G of 
addressable memory. The top two bits define the caching alias.
physical address: An ARM side address given to the VC MMU. This is a 30 bit 
address space.

The GPU always uses bus addresses. GPU bus mastering peripherals (like DMA) use 
bus addresses. The ARM uses physical addresses.

VC MMU: A coarse MMU used by the ARM for accessing GPU memory. Each page is 16M 
and there are 64 pages. This maps 30 bits of physical address to 32 bits of bus 
address.
The setup of VC MMU is handled by the GPU and by default the mapping is:
2835: first 32 pages map physical addresses 0x00000000-0x1fffffff to bus 
addresses 0x40000000-0x5fffffff. The next page maps physical addresses 
0x20000000-0x20ffffff to bus addresses 0x7e000000-0x7effffff.
2836: first 63 pages map physical addresses 0x00000000-0x3effffff to bus 
addresses 0xc0000000-0xfeffffff. The next page maps physical addresses 
0x3f000000-0x3fffffff to bus addresses 0x7e000000-0x7effffff.

Bus address 0x7exxxxxx contains the peripherals.
Note: the top 16M of SDRAM is not visible to the ARM due to the mapping of the 
peripherals. The GPU and GPU peripherals (DMA) can see it as they use bus 
addresses.
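
To make the default mapping above concrete, here is a minimal sketch in C that 
models it as a lookup over 16M pages. The names (vc_soc, 
vc_mmu_default_translate) are made up for illustration and are not part of any 
firmware or kernel API. The sketch only covers the pages described above: on 
2835 nothing is stated here about the pages after the peripheral page, so 
those return an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VC_MMU_PAGE_SHIFT 24u   /* 16M pages */
#define VC_MMU_NUM_PAGES  64u   /* 64 x 16M = 1G of physical space */

/* Hypothetical identifiers for the two chips discussed here. */
enum vc_soc { VC_SOC_2835, VC_SOC_2836 };

/*
 * Translate an ARM physical address to a bus address using the default
 * VC MMU setup described above. Returns 0 on success, -1 if the address
 * is outside the 30-bit physical space or falls in a page whose default
 * mapping is not described.
 */
static int vc_mmu_default_translate(enum vc_soc soc, uint32_t phys,
                                    uint32_t *bus)
{
    uint32_t page = phys >> VC_MMU_PAGE_SHIFT;
    uint32_t offset = phys & ((1u << VC_MMU_PAGE_SHIFT) - 1u);
    /* Number of leading pages that map SDRAM, and the alias they use. */
    uint32_t sdram_pages = (soc == VC_SOC_2835) ? 32u : 63u;
    uint32_t alias = (soc == VC_SOC_2835) ? 0x40000000u : 0xc0000000u;

    assert(bus != NULL);

    if (page >= VC_MMU_NUM_PAGES)
        return -1;                      /* not a 30-bit physical address */
    if (page < sdram_pages)
        *bus = alias | phys;            /* SDRAM through the chosen alias */
    else if (page == sdram_pages)
        *bus = 0x7e000000u | offset;    /* peripheral page */
    else
        return -1;                      /* mapping not described above */
    return 0;
}
```

For example, physical 0x20200000 on 2835 and physical 0x3f200000 on 2836 both 
come out as bus address 0x7e200000.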

The bus address cache alias bits are:

From the VideoCore processor:
0x0 L1 and L2 cache allocating and coherent
0x4 L1 non-allocating, but coherent. L2 allocating and coherent
0x8 L1 non-allocating, but coherent. L2 non-allocating, but coherent
0xc SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent

From the GPU peripherals (note: all peripherals bypass the L1 cache. The ARM 
will see this view once through the VC MMU):
0x0 Do not use
0x4 L1 non-allocating, and incoherent. L2 allocating and coherent.
0x8 L1 non-allocating, and incoherent. L2 non-allocating, but coherent
0xc SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent

In general as long as VideoCore processor and GPU peripherals use the same 
alias everything works out. Mixing aliases requires flushing/invalidating for 
coherency and is generally avoided.
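
As a small illustration of the "top two bits" layout, the helpers below split 
a bus address into its alias and its 30-bit offset, and check whether two bus 
addresses use the same alias. This is only a sketch; the macro and function 
names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define VC_OFFSET_MASK 0x3fffffffu  /* lower 30 bits: 1G of addressable memory */

/* Return the caching alias as written above: 0x0, 0x4, 0x8 or 0xc. */
static inline unsigned vc_bus_alias(uint32_t bus)
{
    return (unsigned)(bus >> 28) & 0xcu;
}

/* Return the lower 30 bits of a bus address. */
static inline uint32_t vc_bus_offset(uint32_t bus)
{
    return bus & VC_OFFSET_MASK;
}

/* Rebuild a bus address with a different alias (0x0, 0x4, 0x8 or 0xc). */
static inline uint32_t vc_bus_with_alias(uint32_t bus, unsigned alias)
{
    assert((alias & ~0xcu) == 0);
    return ((uint32_t)alias << 28) | vc_bus_offset(bus);
}

/* Non-zero if both bus addresses go through the same alias. */
static inline int vc_same_alias(uint32_t a, uint32_t b)
{
    return vc_bus_alias(a) == vc_bus_alias(b);
}
```

A check like vc_same_alias() is one way to catch accidental alias mixing 
between a buffer handed to the GPU and the address programmed into a 
peripheral.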

So, on 2835 the ARM has a 16K L1 cache and no L2 cache. The GPU has a 128K L2 
cache. The GPU's L2 cache is accessible from the ARM but it's not particularly 
close (i.e. not very fast).
However mapping through the L2 allocating alias (0x4) was shown to be 
beneficial on 2835, so that is the alias we use.

The situation is different on 2836. The ARM has a 32K L1 cache and a 512K 
integrated/fast L2 cache. Additionally going through the smaller/slower GPU L2 
is bad for performance.
So, we map through the SDRAM alias (0xc) and avoid the GPU L2 cache.

So, what does this mean? In general if you don't use GPU peripherals or 
communicate with the GPU, you only care about physical addresses and it makes 
no difference what bus address is actually being used.
The ARM just sees 1G of physical space that is always coherent. No flushing of 
GPU L2 cache is ever required. No need to know about aliases.

However if you do want to use GPU bus mastering peripherals (like DMA), or you 
communicate with the GPU (e.g. using the mailbox interface) you do need to 
distinguish physical and bus addresses, and you must use the correct alias.

So, on 2835 you convert from physical to bus address with
  bus_address = 0x40000000 | physical_address;
And on 2836 you convert from physical to bus address with
  bus_address = 0xC0000000 | physical_address;

(Note: you can get these offsets from device tree. See: 
https://github.com/raspberrypi/userland/commit/3b81b91c18ff19f97033e146a9f3262ca631f0e9#diff-c65a4fe18bb33aed0fc9536339f06b80R168)
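
The two conversions above can be wrapped in a pair of helpers. This is a 
minimal sketch: the macro and function names are mine, and in real code the 
alias would more sensibly be read from device tree than hard-coded. Note that 
the reverse direction only makes sense for SDRAM bus addresses, not for the 
0x7exxxxxx peripheral range.

```c
#include <assert.h>
#include <stdint.h>

/* Alias offsets from the text above; the macro names are illustrative. */
#define BUS_ALIAS_2835 0x40000000u  /* L2 allocating alias */
#define BUS_ALIAS_2836 0xC0000000u  /* SDRAM alias, bypasses GPU L2 */

/* Convert an ARM physical SDRAM address to a bus address. */
static inline uint32_t phys_to_bus(uint32_t alias, uint32_t phys)
{
    /* Physical addresses are 30 bits, so the top two bits must be clear. */
    assert((phys & 0xC0000000u) == 0);
    return alias | phys;
}

/* Convert an SDRAM bus address back to an ARM physical address. */
static inline uint32_t bus_to_phys(uint32_t bus)
{
    return bus & 0x3FFFFFFFu;
}
```

So a buffer at physical 0x00100000 is handed to the GPU as 0x40100000 on 2835 
and as 0xC0100000 on 2836.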

So, when using GPU DMA, the addresses used for SCB, SA (source address), DA 
(dest address) must never be zero. They should be bus addresses and therefore 
0x4 or 0xc aliases.
However the difference between a 0x0 alias and a 0x4 alias is small. Using 0x0 
is wrong, may be incoherent, and may trigger exceptions on the GPU. But you may 
get away with it.
The difference between a 0x0 alias and a 0xC alias is much larger. There is now 
128K of incoherent data you may hit. You are less likely to get away with 
getting this wrong.
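
One way to catch this class of mistake early is to refuse to build a DMA 
control block from addresses that still have a 0x0 alias. The sketch below 
assumes the eight-word control block layout from the BCM2835 peripherals 
datasheet; the struct and function names are hypothetical, and the transfer 
information word is passed through untouched rather than interpreted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed control block layout: eight 32-bit words. */
struct dma_cb {
    uint32_t ti;          /* transfer information */
    uint32_t source_ad;   /* source address (bus address) */
    uint32_t dest_ad;     /* destination address (bus address) */
    uint32_t txfr_len;    /* transfer length in bytes */
    uint32_t stride;      /* 2D stride */
    uint32_t nextconbk;   /* next control block (bus address), 0 = end */
    uint32_t reserved[2];
};

/* A usable bus address has a non-zero alias in its top two bits. */
static int dma_bus_addr_ok(uint32_t bus)
{
    return (bus & 0xC0000000u) != 0;
}

/*
 * Fill a control block. Returns -1 without touching *cb if the source
 * or destination still looks like a raw physical address (0x0 alias).
 */
static int dma_cb_fill(struct dma_cb *cb, uint32_t ti,
                       uint32_t src_bus, uint32_t dst_bus, uint32_t len)
{
    assert(cb != NULL);
    if (!dma_bus_addr_ok(src_bus) || !dma_bus_addr_ok(dst_bus))
        return -1;

    *cb = (struct dma_cb){
        .ti = ti,
        .source_ad = src_bus,
        .dest_ad = dst_bus,
        .txfr_len = len,
    };
    return 0;
}
```

The same dma_bus_addr_ok() check applies to the address of the control block 
itself before it is given to the DMA engine.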

So, I don't believe there is any issue with:
"ARM cache status (enabled, disabled) interacts with the GPU cache enable in any 
way, e.g. burst vs. non-burst transactions on the bus or something"

but I would guess there may be a current bug/misunderstanding in U-Boot on Pi1 
that happens to be more fatal on Pi2.