On Fri, Aug 17, 2018 at 07:59:44PM +0100, Al Viro wrote: > On Fri, Aug 17, 2018 at 07:58:41PM +0100, Al Viro wrote: > > On Fri, Aug 17, 2018 at 07:09:49PM +0100, Al Viro wrote: > > > > > Re that code - are you sure it doesn't need le64_to_cpu(*src)? Because > > > from what > > > I understand about PCI (which matches just fine to the comments in the > > > same driver), > > > you probably do need that. Again, the only real way to find out is to > > > test on > > > big-endian host... > > > > BTW, would that, by any chance, be an open-coded > > _iowrite64_copy(dst, src, EQ_UNIT/sizeof(u64)) > > __iowrite64_copy, even...
FWIW, it looks like the confusion had been between the endianness of the data structures (b-e both on host and NIC side) and the fact that PCI is l-e. *IF* that code wants to copy data from host data structures to iomem as-is, it needs to use __raw_writeq() and its ilk or writeq(le64_to_cpu(...)) to compensate. The latter would, indeed, confuse sparse - we are accessing b-e data as if it was l-e. If we want copying that wouldn't affect the endianness, we need memcpy_toio() or similar beasts. And AFAICS that code is very close to /* If we're only writing a single Egress Unit and the BAR2 * Queue ID is 0, we can use the Write Combining Doorbell * Gather Buffer; otherwise we use the simple doorbell. */ if (n == 1 && tq->bar2_qid == 0) { unsigned int index = (tq->pidx ?: tq->size) - 1; /* Copy the TX Descriptor in a tight loop in order to * try to get it to the adapter in a single Write * Combined transfer on the PCI-E Bus. If the Write * Combine fails (say because of an interrupt, etc.) * the hardware will simply take the last write as a * simple doorbell write with a PIDX Increment of 1 * and will fetch the TX Descriptor from memory via * DMA. */ __iowrite64_copy(tq->bar2_addr + SGE_UDB_WCDOORBELL, &tq->desc[index], EQ_UNIT/sizeof(u64)) } else { writel(val | QID_V(tq->bar2_qid), tq->bar2_addr + SGE_UDB_KDOORBELL); } /* This Write Memory Barrier will force the write to the User * Doorbell area to be flushed. This is needed to prevent * writes on different CPUs for the same queue from hitting * the adapter out of order. This is required when some Work * Requests take the Write Combine Gather Buffer path (user * doorbell area offset [SGE_UDB_WCDOORBELL..+63]) and some * take the traditional path where we simply increment the * PIDX (User Doorbell area SGE_UDB_KDOORBELL) and have the * hardware DMA read the actual Work Request. */ wmb(); which wouldn't have looked unusual... Again, that really needs review from the folks familiar with the hardware in question, as well as testing - I'm not trying to push patches like that. If the current mainline variant really works on b-e, I'd like to understand how does it manage that, though...