Hi Alan,

Alan Stern wrote:
> This description sounds hopelessly confused.  Maybe you're just
> misusing the term "coherent".  The patch itself doesn't affect the
> coherent DMA mappings anyway; it affects the streaming mappings.  Or to
> put it another way, what's the justification for replacing a call to
> dma_map_single() with a call to dma_alloc_coherent()?
> 
> Since the patch doesn't affect any of the coherent mappings (see for 
> example the calls to dma_pool_create() in ehci-mem.c), I don't see how 
> it can possibly do what you claim.
> 

Thanks for your comments. Let's try to hopefully clarify this a bit.

I've used the term "coherent" as described in Documentation/DMA-API.txt (aka 
"consistent" as used in PCI-related functions).
I've tried to describe first the limitations of the platform that I'm working 
on. Basically, one of the annoying things of that platform is that writes to 
uncached memory (as used in "coherent" memory) can only be reliably performed 
in 32-bit accesses.

The USB subsystem ends up using "coherent" memory for buffers and/or other 
structures in different ways.

The "coherent" memory allocated in dma_pool_create() in ehci-mem.c that you 
report is not a problem at all because it is always accessed in 32-bit chunks 
(it hasn't been always like that but since commit 
3807e26d69b9ad3864fe03224ebebc9610d5802e "USB: EHCI: split ehci_qh into hw and 
sw parts" this got addressed as a side effect, so I didn't need to post another 
patch for that).

Other possible interactions with "coherent" memory are those involving buffers 
used in USB transactions, which may be allocated via the USB subsystem (at 
usb_buffer_alloc() or when bounced via hcd_alloc_coherent()) or which may come 
already allocated and ready for use (URB_NO_{SETUP,TRANSFER}_DMA_MAP).

The patch, as posted, allocates normal memory for USB buffers _within_ the USB 
subsystem and invariably bounces all buffers to new "coherent" buffers.
So, basically, what the patch claims (avoid 32-bit writes for "coherent" memory 
within the USB subsystem) is "done" (hey, it actually works ;-).

But I think you have raised valid points here :)

If the "coherent" memory is already allocated and passed (as already 
dma-mapped) to the USB subsystem then there is no gain in bouncing the buffer:
- if a non-32 bit write was done to that "coherent" memory the damage is 
already done
- if the "coherent" memory was written always in 32-bit accesses then we can 
just safely use it
So bouncing here should be avoided as it is unneeded.

On the other hand, we can control USB buffers managed by the USB subsystem 
itself.
That's what the patch "does". It avoids access restrictions to USB buffers by 
allocating them from normal memory (leaving USB drivers free to access those 
buffers in whatever bus width they need, as they do today) ... and bouncing 
them.
The thing here is that it makes no sense to bounce them to "coherent" memory if 
they can be dma-mapped directly (as you point in your 
dma_map_single-vs-dma_alloc_coherent comment).

So... that's what RFCs are for :)
I'll take a look again at the patch.

>> +/**
>> + * hcd_memcpy32_to_coherent - copy data to a bounce buffer
>> + * @dst: destination dma bounce buffer
>> + * @src: source buffer
>> + * @len: number of bytes to copy
>> + *
>> + * This function copies @len bytes from @src to @dst in 32 bit chunks.
>> + * The caller must guarantee that @dst length is 4 byte aligned and
>> + * that @dst length is greater than or equal to @src length.
>> + */
>> +static void *hcd_memcpy32_to_coherent(void *dst, const void *src, size_t 
>> len)
>> +{
>> +    u32 *q = dst, *p = (void *)src;
>> +    u8 *s;
>> +
>> +    while (len >= 4) {
>> +            *q++ = *p++;
>> +            len -= 4;
>> +    }
>> +    s = (u8 *)p;
>> +    switch (len) {
>> +    case 3:
>> +            *q = s[0] << 24 | s[1] << 16 | s[2] << 8;
>> +            break;
>> +    case 2:
>> +            *q = s[0] << 24 | s[1] << 16;
>> +            break;
>> +    case 1:
>> +            *q = s[0] << 24;
>> +            break;
>> +    default:
>> +            break;
>> +    }
>> +    return dst;
>> +}
> 
> What happens if somebody tries to use this code on a little-endian CPU?
> 

It will fail.


> It seems that every time somebody comes up with a new kind of 
> memory-access restriction, this function grows by a factor of 2.  After 
> a few more iterations it will be larger than the rest of the kernel!
> 
> There must be a better way to structure the requirements here.
> 

Hopefully I didn't miss any of your concerns and managed to explain the problem.

> Alan Stern
> 
> 

Thanks,
Albert

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Reply via email to