On 23.01.2018 at 16:28, David Miller wrote:
> Looking at how these DMA counters are handled, there appears to be a
> requirement that the memory buffer is 64-byte aligned.
>
> [...]
>
> Therefore the driver needs to allocate "size + (64 - 1)" bytes and do
> the 64-byte alignment of the CPU pointer and the DMA address by hand.

This is also what I wondered about as a non-expert in hardware drivers;
alignment should surely be enforced here. However, I observed the memory
corruption on an x86_64 system (which I believe always has PAGE_SIZE
aligned buffers). So there should be another bug, unless I am mistaken
about x86_64.

I checked the deprecated r8168 driver by Realtek (I am not sure whether
it is also affected by the issue, though) and found two major
differences in DMA handling:

1) It wraps the DMA operations (writing the addresses, waiting for the
   cmd bits to be pulled down) in spin_lock_irqsave /
   spin_unlock_irqrestore.
2) It does not reset CounterAddrLow / CounterAddrHigh to 0 / 0 after
   finishing. That's not really good, but it may have hidden this issue
   with r8168.

Again, I have not tried r8168 yet (especially since it only supports old
kernels), but maybe this helps to trigger some ideas.

Worst case, this could be a firmware timing bug, i.e. the card writes
the counters to system memory shortly before the cmd bits are pulled
high or shortly after they have been pulled down (in the latter case
using the partially zeroed-out memory address). I don't know.

Let me know if I can extract any more info from an affected machine, but
I believe such machines should be very common.

HTH and thanks,
Oliver
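
P.S. To make the quoted "size + (64 - 1)" suggestion concrete, here is a minimal userspace sketch of the by-hand alignment. It is not r8169 code. All names (COUNTER_ALIGN, struct aligned_buf, align_up, aligned_buf_alloc) are hypothetical, malloc() only stands in for the kernel's DMA allocator, and raw_dma is a made-up bus address passed in by the caller. The offset is derived from the DMA address, since that is what the device sees, and the same offset is applied to the CPU pointer so that both still refer to the same byte.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define COUNTER_ALIGN 64u /* alignment named in the quoted mail */

/* Hypothetical bookkeeping for one over-allocated buffer. */
struct aligned_buf {
    void     *raw; /* pointer returned by the allocator, needed for free() */
    void     *cpu; /* CPU pointer to the same byte that dma refers to */
    uint64_t  dma; /* 64-byte aligned bus address (simulated here) */
};

/* Round addr up to the next multiple of COUNTER_ALIGN. */
uint64_t align_up(uint64_t addr)
{
    /* The mask trick below is only valid for powers of two. */
    assert((COUNTER_ALIGN & (COUNTER_ALIGN - 1)) == 0);
    return (addr + (COUNTER_ALIGN - 1)) & ~(uint64_t)(COUNTER_ALIGN - 1);
}

/*
 * Over-allocate by COUNTER_ALIGN - 1 bytes so that an aligned window of
 * `size` bytes always fits, then shift the CPU pointer and the DMA
 * address by the same offset. Returns 0 on success, -1 on failure.
 */
int aligned_buf_alloc(struct aligned_buf *b, size_t size, uint64_t raw_dma)
{
    uint64_t offset;

    b->raw = malloc(size + (COUNTER_ALIGN - 1));
    if (!b->raw)
        return -1;

    b->dma = align_up(raw_dma);
    offset = b->dma - raw_dma; /* always in the range 0..63 */
    b->cpu = (char *)b->raw + offset;
    return 0;
}
```

A caller would hand b.dma to the device and read the counters through b.cpu, then release the buffer with free(b.raw). If the allocator already returns page-aligned buffers, as I assume for x86_64 above, the offset is simply 0 and the extra bytes go unused.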