On Mon, Sep 08, 2014 at 09:15:50AM +0100, Chris Wilson wrote:
> On Mon, Sep 08, 2014 at 10:03:51AM +0200, Daniel Vetter wrote:
> > On Sun, Sep 07, 2014 at 09:08:31AM +0100, Chris Wilson wrote:
> > > Running igt, I was encountering the invalid TLB bug on my 845g, despite
> > > that it was using the CS workaround. Examining the w/a buffer in the
> > > error state, showed that the copy from the user batch into the
> > > workaround itself was suffering from the invalid TLB bug (the first
> > > cacheline was broken with the first two words reversed). Time to try a
> > > fresh approach. This extends the workaround to write into each page of
> > > our scratch buffer in order to overflow the TLB and evict the invalid
> > > entries. This could be refined to only do so after we update the GTT,
> > > but for simplicity, we do it before each batch.
> > > 
> > > I suspect this supersedes our current workaround, but for safety keep
> > > doing both.
> > 
> > I suspect that we might end up with just an elaborate delay
> > implementation, but if it works then it's good. One nitpick below, with
> > that addressed this is Reviewed-by: Daniel Vetter <daniel.vet...@ffwll.ch>
> 
> One way to test that is simply comparing 64x4096 byte writes in the same
> page vs 64x4 byte writes in 64 different pages. That should be roughly
> the same latency (thought with TLB fetches you never be too sure) and
> demonstrate that it is either the TLB or the delay that's the factor.
> 
> In my testing, I did multiple copies into the batch w/a so that I was
> reasonably sure that what the source was stable and the copy of the
> source didn't match N times.
> 
> > > 
> > > Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
> > > Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
> > > Cc: sta...@vger.kernel.org
> > > ---
> > >  drivers/gpu/drm/i915/i915_reg.h         | 12 ++++---
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 62 
> > > +++++++++++++++++++--------------
> > >  2 files changed, 44 insertions(+), 30 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_reg.h 
> > > b/drivers/gpu/drm/i915/i915_reg.h
> > > index e4d7607da2c4..f29b44c86a2f 100644
> > > --- a/drivers/gpu/drm/i915/i915_reg.h
> > > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > > @@ -334,16 +334,20 @@
> > >  #define GFX_OP_DESTBUFFER_INFO    ((0x3<<29)|(0x1d<<24)|(0x8e<<16)|1)
> > >  #define GFX_OP_DRAWRECT_INFO     ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3))
> > >  #define GFX_OP_DRAWRECT_INFO_I965  ((0x7900<<16)|0x2)
> > > -#define SRC_COPY_BLT_CMD                ((2<<29)|(0x43<<22)|4)
> > > +
> > > +#define COLOR_BLT_CMD                    (2<<29 | 0x40<<22 | (5-2))
> > > +#define SRC_COPY_BLT_CMD         ((2<<29)|(0x43<<22)|4)
> > >  #define XY_SRC_COPY_BLT_CMD              ((2<<29)|(0x53<<22)|6)
> > >  #define XY_MONO_SRC_COPY_IMM_BLT ((2<<29)|(0x71<<22)|5)
> > > -#define XY_SRC_COPY_BLT_WRITE_ALPHA      (1<<21)
> > > -#define XY_SRC_COPY_BLT_WRITE_RGB        (1<<20)
> > > +#define   BLT_WRITE_A                    (2<<20)
> > > +#define   BLT_WRITE_RGB                  (1<<20)
> > > +#define   BLT_WRITE_RGBA         (BLT_WRITE_RGB | BLT_WRITE_A)
> > >  #define   BLT_DEPTH_8                    (0<<24)
> > >  #define   BLT_DEPTH_16_565               (1<<24)
> > >  #define   BLT_DEPTH_16_1555              (2<<24)
> > >  #define   BLT_DEPTH_32                   (3<<24)
> > > -#define   BLT_ROP_GXCOPY         (0xcc<<16)
> > > +#define   BLT_ROP_SRC_COPY               (0xcc<<16)
> > > +#define   BLT_ROP_COLOR_COPY             (0xf0<<16)
> > >  #define XY_SRC_COPY_BLT_SRC_TILED        (1<<15) /* 965+ only */
> > >  #define XY_SRC_COPY_BLT_DST_TILED        (1<<11) /* 965+ only */
> > >  #define CMD_OP_DISPLAYBUFFER_INFO ((0x0<<29)|(0x14<<23)|2)
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
> > > b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > index 16371a444426..acd933b1b027 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -1368,49 +1368,59 @@ i830_dispatch_execbuffer(struct intel_engine_cs 
> > > *ring,
> > >                           u64 offset, u32 len,
> > >                           unsigned flags)
> > >  {
> > > + u32 cs_offset = ring->scratch.gtt_offset;
> > >   int ret;
> > >  
> > > - if (flags & I915_DISPATCH_PINNED) {
> > > -         ret = intel_ring_begin(ring, 4);
> > > -         if (ret)
> > > -                 return ret;
> > > + ret = intel_ring_begin(ring, 6);
> > > + if (ret)
> > > +         return ret;
> > >  
> > > -         intel_ring_emit(ring, MI_BATCH_BUFFER);
> > > -         intel_ring_emit(ring, offset | (flags & I915_DISPATCH_SECURE ? 
> > > 0 : MI_BATCH_NON_SECURE));
> > > -         intel_ring_emit(ring, offset + len - 8);
> > > -         intel_ring_emit(ring, MI_NOOP);
> > > -         intel_ring_advance(ring);
> > > - } else {
> > > -         u32 cs_offset = ring->scratch.gtt_offset;
> > > + /* Evict the invalid PTE TLBs */
> > > + intel_ring_emit(ring, COLOR_BLT_CMD | BLT_WRITE_RGBA);
> > > + intel_ring_emit(ring, BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | 4096);
> > > + intel_ring_emit(ring, 64 << 16 | 4); /* load each page */
> > 
> > s/64/(I830M_BACHT_LIMIT / 4096)/ for clarity
> 
> Hmm, I don't think so. My thinking here is that this is TLB entries,
> so perhaps
> 
> #define I830M_TLB_ENTRIES 64
> #define I830M_BATCH_LIMIT 256*1024 /* uAPI convention */
> 
> #define I830M_WA_BUFSIZE MAX(GTT_PAGE_SIZE * I830M_TLB_ENTRIES, 
> I830M_BATCH_LIMIT)

Yeah that works better. Or a comment explaining that we have 64 tlbs.
Either is fine, as long as we have a symbolic link to the scratch wa
buffer size to make sure we don't overwrite random memory by accident.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to