Re: [U-Boot] [RFC PATCH] usb: dwc2: handle bcm2835 phys->virt address translations

popcorn mix Tue, 17 Mar 2015 08:56:54 -0700

On 17/03/15 03:04, Stephen Warren wrote:

It would be nice though if someone from the RPi Foundation could comment
on the exact effect of the upper bus address bits, and why 0xc would
work for RPi2 but 0x4 for the RPi 1. I wonder if the ARM cache status
(enabled, disabled) interacts with the GPU cache enable in any way, e.g.
burst vs. non-burst transactions on the bus or something? That's about
the only reason I can see for the RPi Foundation kernel working with 0x4
bus addresses on both chips, but U-Boot needing something different on
RPi2...


Dom, for reference, see:
http://lists.denx.de/pipermail/u-boot/2015-March/207947.html
http://lists.denx.de/pipermail/u-boot/2015-March/thread.html#207947

First, remember that 2835 is a large GPU with a small ARM attached. On some
platforms the ARM is not even used.
The GPU boots first and may wake the ARM. The GPU is the centre of the
universe, and the ARM has to fit in.

Okay, I'll try to explain what goes on. Here are my definitions of some terms:

bus address: a VideoCore/GPU address. The lower 30 bits define the 1G of
addressable memory. The top two bits define the caching alias.
physical address: an ARM-side address given to the VC MMU. This is a 30-bit
address space.

The GPU always uses bus addresses. GPU bus mastering peripherals (like DMA) use
bus addresses. The ARM uses physical addresses.

VC MMU: A coarse MMU used by the ARM for accessing GPU memory. Each page is 16M
and there are 64 pages. This maps 30 bits of physical address to 32 bits of bus
address.
The setup of VC MMU is handled by the GPU and by default the mapping is:
2835: first 32 pages map physical addresses 0x00000000-0x1fffffff to bus
addresses 0x40000000-0x5fffffff. The next page maps physical addresses
0x20000000 to 0x20ffffff to bus addresses 0x7e000000 to 0x7effffff
2836: first 63 pages map physical addresses 0x00000000-0x3effffff to bus
addresses 0xc0000000-0xfeffffff. The next page maps physical addresses
0x3f000000 to 0x3fffffff to bus addresses 0x7e000000 to 0x7effffff

Bus address 0x7exxxxxx contains the peripherals.
Note: the top 16M of SDRAM is not visible to the ARM due to the mapping of the
peripherals. The GPU and GPU peripherals (DMA) can see it as they use bus
addresses.
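
As a rough illustration of the default mapping described above, here is a
minimal C sketch. The names (vc_mmu_default_map, enum soc, the macros) are
made up for this example and are not taken from U-Boot or the firmware; the
page counts and bus ranges are the ones listed above. Pages beyond the
peripheral page are reported as unmapped simply because the description above
does not cover them.

```c
#include <stdbool.h>
#include <stdint.h>

#define VC_MMU_PAGE_SHIFT 24u          /* 16M pages */
#define VC_MMU_NUM_PAGES  64u          /* 64 pages = 30-bit physical space */
#define PERIPH_BUS_BASE   0x7e000000u  /* peripherals live at bus 0x7exxxxxx */

enum soc { SOC_BCM2835, SOC_BCM2836 };  /* hypothetical selector */

/*
 * Translate an ARM physical address to a bus address using the default
 * VC MMU setup described above. Returns false for pages that the
 * description does not cover.
 */
static inline bool vc_mmu_default_map(enum soc soc, uint32_t phys,
                                      uint32_t *bus)
{
    uint32_t page = phys >> VC_MMU_PAGE_SHIFT;
    uint32_t offset = phys & ((1u << VC_MMU_PAGE_SHIFT) - 1u);
    /* 2835: 32 SDRAM pages via alias 0x4; 2836: 63 pages via alias 0xc */
    uint32_t ram_pages = (soc == SOC_BCM2835) ? 32u : 63u;
    uint32_t alias = (soc == SOC_BCM2835) ? 0x40000000u : 0xc0000000u;

    if (page >= VC_MMU_NUM_PAGES)
        return false;               /* outside the 30-bit physical space */
    if (page < ram_pages) {
        *bus = alias | phys;        /* SDRAM through the chosen alias */
        return true;
    }
    if (page == ram_pages) {
        *bus = PERIPH_BUS_BASE | offset;  /* the single peripheral page */
        return true;
    }
    return false;
}
```

With this sketch, physical 0x20200000 on 2835 and physical 0x3f200000 on 2836
both come out as bus address 0x7e200000.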

The bus address cache alias bits are:

From the VideoCore processor:
0x0 L1 and L2 cache allocating and coherent
0x4 L1 non-allocating, but coherent. L2 allocating and coherent
0x8 L1 non-allocating, but coherent. L2 non-allocating, but coherent
0xc SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent

From the GPU peripherals (note: all peripherals bypass the L1 cache. The ARM
sees this view once its accesses have gone through the VC MMU):
0x0 Do not use
0x4 L1 non-allocating, and incoherent. L2 allocating and coherent.
0x8 L1 non-allocating, and incoherent. L2 non-allocating, but coherent
0xc SDRAM alias. Cache is bypassed. Not L1 or L2 allocating or coherent

In general as long as VideoCore processor and GPU peripherals use the same
alias everything works out. Mixing aliases requires flushing/invalidating for
coherency and is generally avoided.
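
To make the alias encoding concrete: the alias is just the top two bits of
the bus address, which show up as 0x0, 0x4, 0x8 or 0xc in the top hex digit.
A small C sketch follows; the helper names are invented for illustration
only.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUS_ALIAS_SHIFT 30u          /* alias = top two bits */
#define BUS_OFFSET_MASK 0x3fffffffu  /* lower 30 bits = 1G of memory */

/* Return the alias as it appears in the top hex digit: 0x0, 0x4, 0x8, 0xc. */
static inline unsigned bus_alias(uint32_t bus)
{
    return (unsigned)(bus >> BUS_ALIAS_SHIFT) << 2;
}

/* Return the 30-bit address with the alias bits stripped. */
static inline uint32_t bus_offset(uint32_t bus)
{
    return bus & BUS_OFFSET_MASK;
}

/*
 * Two bus addresses only stay coherent without explicit flushing or
 * invalidating if both sides go through the same alias.
 */
static inline bool bus_same_alias(uint32_t a, uint32_t b)
{
    return bus_alias(a) == bus_alias(b);
}
```

For example, 0x40001000 and 0xc0001000 name the same 30-bit location
(0x00001000) but through different aliases, which is the mixing case that
needs flushing/invalidating.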

So, on 2835 the ARM has a 16K L1 cache and no L2 cache. The GPU has a 128K L2
cache. The GPU's L2 cache is accessible from the ARM but it's not particularly
close (i.e. not very fast).
However mapping through the L2 allocating alias (0x4) was shown to be
beneficial on 2835, so that is the alias we use.

The situation is different on 2836. The ARM has a 32K L1 cache and a 512K
integrated/fast L2 cache. Additionally going through the smaller/slower GPU L2
is bad for performance.
So, we map through the SDRAM alias (0xc) and avoid the GPU L2 cache.

So, what does this mean? In general if you don't use GPU peripherals or
communicate with the GPU, you only care about physical addresses and it makes
no difference what bus address is actually being used.
The ARM just sees 1G of physical space that is always coherent. No flushing of
GPU L2 cache is ever required. No need to know about aliases.

However if you do want to use GPU bus mastering peripherals (like DMA), or you
communicate with the GPU (e.g. using the mailbox interface) you do need to
distinguish physical and bus addresses, and you must use the correct alias.

So, on 2835 you convert from physical to bus address with
bus_address = 0x40000000 | physical_address;
And on 2836 you convert from physical to bus address with
bus_address = 0xC0000000 | physical_address;

(Note: you can get these offsets from device tree. See:
https://github.com/raspberrypi/userland/commit/3b81b91c18ff19f97033e146a9f3262ca631f0e9#diff-c65a4fe18bb33aed0fc9536339f06b80R168)
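
In C the two conversions above could be wrapped roughly like this. This is
only a sketch: the macro and function names are hypothetical, and the reverse
conversion by masking is an assumption that only holds for SDRAM addresses,
not for the 0x7exxxxxx peripheral range.

```c
#include <stdint.h>

#define BCM2835_BUS_ALIAS 0x40000000u  /* L2 allocating alias, used on 2835 */
#define BCM2836_BUS_ALIAS 0xc0000000u  /* SDRAM alias, used on 2836 */

/* Physical (ARM) address to bus address for the given alias. */
static inline uint32_t phys_to_bus(uint32_t phys, uint32_t alias)
{
    return alias | phys;
}

/*
 * Bus address back to physical address. Assumption: the address refers
 * to SDRAM, so dropping the two alias bits is enough.
 */
static inline uint32_t bus_to_phys(uint32_t bus)
{
    return bus & 0x3fffffffu;
}
```

A driver would then pass phys_to_bus(buf, BCM2836_BUS_ALIAS) to the DMA engine
or mailbox on 2836, and the 2835 value on 2835, ideally picking the alias at
run time rather than hard-coding it.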

So, when using GPU DMA, the addresses used for SCB, SA (source address), DA
(dest address) must never use the 0x0 alias. They should be bus addresses and
therefore use the 0x4 or 0xc alias.
However the difference between a 0x0 alias and a 0x4 alias is small. Using 0x0
is wrong, may be incoherent, and may trigger exceptions on the GPU. But you may
get away with it.
The difference between a 0x0 alias and a 0xC alias is much larger. There is now
128K of incoherent data you may hit. You are less likely to get away with
getting this wrong.
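
A cheap sanity check along these lines can catch a physical address that was
passed where a bus address was expected. The struct below is a made-up
stand-in for this example, not the real DMA control block layout, and the
function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bundle of the addresses handed to the DMA engine. */
struct dma_addrs {
    uint32_t cb;   /* control block address */
    uint32_t src;  /* source address */
    uint32_t dst;  /* destination address */
};

/* True if the top two bits select something other than the 0x0 alias. */
static inline bool bus_addr_has_alias(uint32_t addr)
{
    return (addr >> 30) != 0;
}

/* All three addresses must be bus addresses, i.e. never the 0x0 alias. */
static inline bool dma_addrs_valid(const struct dma_addrs *d)
{
    return bus_addr_has_alias(d->cb) &&
           bus_addr_has_alias(d->src) &&
           bus_addr_has_alias(d->dst);
}
```

This only rejects the 0x0 alias; it does not check that the alias matches the
one the ARM is mapped through on the given chip.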

So, I don't believe there is any issue with:

ARM cache status (enabled, disabled) interacts with the GPU cache enable in any 
way, e.g. burst vs. non-burst transactions on the bus or something


but I would guess there may be a current bug/misunderstanding in U-Boot on Pi1
that happens to be more fatal on Pi2.
