On Wed, Dec 4, 2013 at 6:56 PM, Alex Deucher <alexdeucher at gmail.com> wrote: > On Wed, Dec 4, 2013 at 5:16 PM, Kleber Sacilotto de Souza > <klebers at linux.vnet.ibm.com> wrote: >> On 11/25/2013 10:11 PM, Kleber Sacilotto de Souza wrote: >>> >>> On 11/24/2013 09:15 PM, Benjamin Herrenschmidt wrote: >>>> >>>> On Fri, 2013-11-08 at 11:43 -0200, Kleber Sacilotto de Souza wrote: >>>>> >>>>> On 11/07/2013 08:29 PM, Benjamin Herrenschmidt wrote: >>>>>> >>>>>> On Mon, 2013-06-17 at 18:57 -0400, Alex Deucher wrote: >>>>>> >>>>>>> Weird. I wonder if there is an issue with cache snoops on PPC. We >>>>>>> currently use the gart in cached mode (GPU snoops CPU cache) with >>>>>>> cached pages. I wonder if we need to use uncached pages on PPC. >>>>>> >>>>>> There is no such issue and no known bugs with DMA writes on those >>>>>> PCIe host bridges (and they do get hammered pretty bad here). >>>>>> >>>>>> This needs further investigation by the lab/hw guys to find out what's >>>>>> actually happening on the bus and the host bridge. >>>>>> >>>>>> Thadeu, Kleber: Jerome suggested writing a test case in userspace that >>>>>> continuously writes to a spare scratch register (thus triggering the >>>>>> corresponding writeback DMA) and checks the memory location to compare >>>>>> the writeback value (using a debugfs file for example, or mmap). >> >> >> I was not able to reproduce the issue with this method, even after a weekend >> run. >> >> However, doing some more investigation it seems the problem is here, where >> we read the ring rptr: >> >> u32 radeon_ring_generic_get_rptr(struct radeon_device *rdev, >> struct radeon_ring *ring) >> { >> u32 rptr; >> >> if (rdev->wb.enabled) >> rptr = le32_to_cpu(rdev->wb.wb[ring->rptr_offs/4]); >> else >> rptr = RREG32(ring->rptr_reg); >> >> return rptr; >> } >> >> I realized that the DMA'ed rptr value has always the opposite byte order >> from the MMIO value. Since RREG32 already returns the register value on the >> CPU byte order, it seems we don't need to byte-swap the DMA'ed value. If I >> remove the le32_to_cpu() call and use the DMA'ed value directly, I don't get >> the IB scheduling failures and piglit results are the same as with writeback >> disabled. >> >> Is the adapter chipset swapping the bytes before doing the DMA to a >> big-endian host? > > Setting CP_RB_CNTL.BUF_SWAP causes the CP to use the selected byte > swapping for just about everything accessed by the CP (rptr writeback, > indirect buffers, etc.). Looks like the DMA ring supports and enables > rptr writeback as well (DMA_RB_CNTL.DMA_RPTR_WRITEBACK_SWAP_ENABLE) so > I think we can drop the swapping of the rptr writeback. >
Obvious patch attached. Alex > Alex > >> >> >> >> -- >> Kleber Sacilotto de Souza >> IBM Linux Technology Center >> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-drm-radeon-don-t-byteswap-readback-of-rptr-writeback.patch Type: text/x-diff Size: 994 bytes Desc: not available URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20131204/a92e67e4/attachment-0001.patch>