On 04/04/2017 07:46 PM, Dr. Philipp Tomsich wrote: > >> On 04 Apr 2017, at 18:15, Marek Vasut <ma...@denx.de> wrote: >> >> On 04/03/2017 07:49 PM, Philipp Tomsich wrote: >>> Merely using dma_alloc_coherent does not ensure that there is no stale >>> data left in the caches for the allocated DMA buffer (i.e. that the >>> affected cacheline may still be dirty). >>> >>> The original code was doing the following (on AArch64, which >>> translates a 'flush' into a 'clean + invalidate'): >>> # during initialisation: >>> 1. allocate buffers via memalign >>> => buffers may still be modified (cached, dirty) >>> # during interrupt processing >>> 2. clean + invalidate buffers >>> => may commit stale data from a modified cacheline >>> 3. read from buffers >>> >>> This could lead to garbage info being written to buffers before >>> reading them during even-processing. >>> >>> To make the event processing more robust, we use the following sequence >>> for the cache-maintenance: >>> # during initialisation: >>> 1. allocate buffers via memalign >>> 2. clean + invalidate buffers >>> (we only need the 'invalidate' part, but dwc3_flush_cache() >>> always performs a 'clean + invalidate') >>> # during interrupt processing >>> 3. read the buffers >>> (we know these lines are not cached, due to the previous >>> invalidation and no other code touching them in-between) >>> 4. clean + invalidate buffers >>> => writes back any modification we may have made during event >>> processing and ensures that the lines are not in the cache >>> the next time we enter interrupt processing >>> >>> Note that with the original sequence, we observe reproducible >>> (depending on the cache state: i.e. running dhcp/usb start before will >>> upset caches to get us around this) issues in the event processing (a >>> fatal synchronous abort in dwc3_gadget_uboot_handle_interrupt on the >>> first time interrupt handling is invoked) when running USB mass >>> storage emulation on our RK3399-Q7 with data-caches on. >>> >>> Signed-off-by: Philipp Tomsich <philipp.toms...@theobroma-systems.com> >>> >>> --- >>> >>> drivers/usb/dwc3/core.c | 2 ++ >>> drivers/usb/dwc3/gadget.c | 5 +++-- >>> 2 files changed, 5 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c >>> index b2c7eb1..f58c7ba 100644 >>> --- a/drivers/usb/dwc3/core.c >>> +++ b/drivers/usb/dwc3/core.c >>> @@ -125,6 +125,8 @@ static struct dwc3_event_buffer >>> *dwc3_alloc_one_event_buffer(struct dwc3 *dwc, >>> if (!evt->buf) >>> return ERR_PTR(-ENOMEM); >>> >>> + dwc3_flush_cache((long)evt->buf, evt->length); >>> + >> >> Is the length aligned ? If not, you will get cache alignment warning. >> Also, address should be uintptr_t to avoid 32/64 bit issues . > > The length is a well-known value and aligned (it expands to PAGE_SIZE in the > end).
Uh, the event buffer is 4k ? That's quite big, but OK. > Good point on the “long”, especially as I just copied this from other > occurences and it’s consistently wrong throughout DWC3 in U-Boot: Hrm, I thought the driver was ported over from Linux, so is this broken in Linux too ? > drivers/usb/dwc3/core.c: dwc3_flush_cache((long)evt->buf, > evt->length); > drivers/usb/dwc3/ep0.c: dwc3_flush_cache((long)buf_dma, len); > drivers/usb/dwc3/ep0.c: dwc3_flush_cache((long)trb, sizeof(*trb)); > drivers/usb/dwc3/ep0.c: dwc3_flush_cache((long)trb, sizeof(*trb)); > drivers/usb/dwc3/ep0.c: dwc3_flush_cache((long)trb, > sizeof(*trb)); > drivers/usb/dwc3/ep0.c: dwc3_flush_cache((long)dwc->ep0_bounce, > DWC3_EP0_BOUNCE_SIZE); > drivers/usb/dwc3/gadget.c: > dwc3_flush_cache((long)req->request.dma, req->request.length); > drivers/usb/dwc3/gadget.c: dwc3_flush_cache((long)dma, length); > drivers/usb/dwc3/gadget.c: dwc3_flush_cache((long)trb, > sizeof(*trb)); > drivers/usb/dwc3/gadget.c: dwc3_flush_cache((long)trb, > sizeof(*trb)); > drivers/usb/dwc3/gadget.c: > dwc3_flush_cache((long)evt->buf, evt->length); > drivers/usb/dwc3/io.h:static inline void dwc3_flush_cache(int addr, int > length) > > Worst of all: the definition of dwc3_flush_cache in io.h has “int” as a type, > which will eat us alive if the DWC3’s physical address is beyond 32-bit. > > I’ll revise all of these and make a patch-series out of this. Maybe you should check the Linux first and see if there are some fixes already. Thanks >>> return evt; >>> } >>> >>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >>> index 1156662..61af71b 100644 >>> --- a/drivers/usb/dwc3/gadget.c >>> +++ b/drivers/usb/dwc3/gadget.c >>> @@ -2668,11 +2668,12 @@ void dwc3_gadget_uboot_handle_interrupt(struct dwc3 >>> *dwc) >>> int i; >>> struct dwc3_event_buffer *evt; >>> >>> + dwc3_thread_interrupt(0, dwc); >>> + >>> + /* Clean + Invalidate the buffers after touching them */ >>> for (i = 0; i < dwc->num_event_buffers; i++) { >>> evt = dwc->ev_buffs[i]; >>> dwc3_flush_cache((long)evt->buf, evt->length); >>> } >>> - >> >> This makes me wonder, don't you need to invalidate the event buffer >> somewhere so that the new data would be fetched from RAM ? > > We flush the event buffer before leaving the function. > So the cache line will not be present in the cache, when we enter this > function again. Then shouldn't we invalidate it instead ? flush and invalidate are two different things ... >>> - dwc3_thread_interrupt(0, dwc); >>> } >>> } >>> >> >> One last thing, is this patch needed in Linux too ? > > Linux deals properly with DMA allocations and manages them in appropriate > memory regions (e.g. marked uncached). > Also, some of the affected code-paths are U-Boot specific. > > This really stems from a limitation of the way the DMA areas are allocated in > U-Boot (i.e. from the heap, using a memalign) and how the cache-operations > have been sequenced relative to the other code in the port to U-Boot. OK I see, thanks for clarifying! -- Best regards, Marek Vasut _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de https://lists.denx.de/listinfo/u-boot