On 10/14/2015 8:19 AM, Daniel Vetter wrote: > On Tue, Oct 13, 2015 at 02:51:36PM -0700, Kristian Høgsberg wrote: >> On Tue, Oct 13, 2015 at 7:55 AM, Michel Thierry >> <michel.thierry at intel.com> wrote: >>> On 10/13/2015 3:13 PM, Emil Velikov wrote: >>>> >>>> On 13 October 2015 at 13:16, Michel Thierry <michel.thierry at intel.com> >>>> wrote: >>>>> >>>>> On 10/6/2015 2:12 PM, Michel Thierry wrote: >>>>>> >>>>>> >>>>>> On 10/5/2015 7:06 PM, Kristian Høgsberg wrote: >>>>>>> >>>>>>> >>>>>>> On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierry >>>>>>> <michel.thierry at intel.com> wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 9/14/2015 2:54 PM, MichaÅ Winiarski wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Gen8+ supports 48-bit virtual addresses, but some objects must >>>>>>>>>> always be >>>>>>>>>> allocated inside the 32-bit address range. >>>>>>>>>> >>>>>>>>>> In specific, any resource used with flat/heapless >>>>>>>>>> (0x00000000-0xfffff000) >>>>>>>>>> General State Heap (GSH) or Instruction State Heap (ISH) must be in >>>>>>>>>> a >>>>>>>>>> 32-bit range, because the General State Offset and Instruction State >>>>>>>>>> Offset >>>>>>>>>> are limited to 32-bits. >>>>>>>>>> >>>>>>>>>> The i915 driver has been modified to provide a flag to set when the >>>>>>>>>> 4GB >>>>>>>>>> limit is not necessary in a given bo >>>>>>>>>> (EXEC_OBJECT_SUPPORTS_48B_ADDRESS). >>>>>>>>>> 48-bit range will only be used when explicitly requested. >>>>>>>>>> >>>>>>>>>> Callers to the existing drm_intel_bo_emit_reloc function should set >>>>>>>>>> the >>>>>>>>>> use_48b_address_range flag beforehand, in order to use full ppgtt >>>>>>>>>> range. >>>>>>>>>> >>>>>>>>>> v2: Make set/clear functions nops on pre-gen8 platforms, and use >>>>>>>>>> them >>>>>>>>>> internally in emit_reloc functions (Ben) >>>>>>>>>> s/48BADDRESS/48B_ADDRESS/ (Dave) >>>>>>>>>> v3: Keep set/clear functions internal, no-one needs to use them >>>>>>>>>> directly. >>>>>>>>>> v4: Don't set 48bit-support flag in emit reloc, check for ppgtt type >>>>>>>>>> before enabling set/clear function, print full offsets in >>>>>>>>>> debug >>>>>>>>>> statements, using port of lower_32_bits and upper_32_bits >>>>>>>>>> from linux >>>>>>>>>> kernel (MichaÅ) >>>>>>>>>> >>>>>>>>>> References: >>>>>>>>>> >>>>>>>>>> http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html >>>>>>>>>> Cc: Ben Widawsky <ben at bwidawsk.net> >>>>>>>>>> Cc: MichaÅ Winiarski <michal.winiarski at intel.com> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> +Kristian >>>>>>>>> >>>>>>>>> LGTM. >>>>>>>>> Acked-by: MichaÅ Winiarski <michal.winiarski at intel.com> >>>>>>>>> >>>>>>>>>> Signed-off-by: Michel Thierry <michel.thierry at intel.com> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Hi Kristian, >>>>>>>> >>>>>>>> More comments on this? >>>>>>>> I've resent the mesa patch with the last changes >>>>>>>> >>>>>>>> >>>>>>>> (http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html). >>>>>>>> >>>>>>>> >>>>>>>> MichaÅ has agreed with the interface and will use it for his work. >>>>>>>> If mesa doesn't plan to use this for now, it's ok. The kernel changes >>>>>>>> have >>>>>>>> been done to prevent any regressions when the support 48-bit flag is >>>>>>>> not >>>>>>>> set. >>>>>>> >>>>>>> >>>>>>> >>>>>>> I didn't get any replies to my last comments on this: >>>>>>> >>>>>>> http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html >>>>>>> >>>>>>> Basically, I tried explicitly placing buffers above 8G and didn't see >>>>>>> the HW problem described (ie it all worked fine). I still think >>>>>>> there's some confusion as to what the W/A is about. >>>>>>> >>>>>>> Here's my take: the W/A is a SW-only workaround to handle the cases >>>>>>> where a driver uses heapless and 48-bit ppgtt. The problem is that the >>>>>>> heaps are limited to 4G but with 48bit ppgtt a buffer can be placed >>>>>>> anywhere it the 48 bit address space. If that happens it's consideredd >>>>>>> out-of-bounds for the heap and access fails. To prevent this we need >>>>>>> to make sure the bos we address in a heapless fashion are placed below >>>>>>> 4g. >>>>>>> >>>>>>> For mesa, we only configure general state base as heap-less, which >>>>>>> affects scratch bos. What this boils down to is that we need the 4G >>>>>>> restriction only for the scratch bos set up on 3DSTATE_VS, 3DSTATE_GS >>>>>>> and 3DSTATE_PS (and tesselation stage eventually). Look for the >>>>>>> OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c, gen8_gs_state.c >>>>>>> and gen8_ps_state.c >>>>>> >>>>>> >>>>>> >>>>>> I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe it >>>>>> isn't exclusive to the scratch bos (the error state points to that >>>>>> instruction). >>>>>> >>>>>> >>>>> >>>>> Hi, >>>>> >>>>> Anymore inputs about this patch? >>>>> AFAIK, if changes are required based on further comments from Kristian, >>>>> these will be in the mesa patch[1], not libdrm. Also, MichaÅ will use >>>>> this >>>>> flag in another project. >>>>> >>>> The comment seems quite brief and I'm not sure it fully addresses >>>> Kristian's concern. I'd suspect that providing reference to the HW >>>> documentation (confirming your assumption) might be beneficial. >>>> >>> >>> Sure, attached is the hang I get if I don't set the restriction in >>> gen8_misc_state.c and try to use the full 48-bit range >>> (i915_error_state_nowa.txt). This is just running gfxbench t-rex. >>> >>> I see the same hang signature when it is only applied to the scratch bos (in >>> gen8_vs_state.c, gen8_gs_state.c and gen8_ps_state.c - >>> i915_error_state_scratchbo.txt). >>> >>> 3DSTATE_VIEWPORT_STATE_POINTERS_CC (0x78230000) is defined here: >>> https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol02a-commandreference-instructions_0.pdf >>> (page 256) >> >> I applied your mesa and libdrm patches and then backed out the 4G >> restriction in the STATE_BASE_ADDRESS relocations. I was not able to >> reproduce any hang with trex or glxgears. I confirmed (using >> INTEL_DEBUG=bat) that gem actually placed BOs above the 4G mark and >> got this: >> >> 0: 8 @ 0xfffffff90000 (miptree) >> 1: 9 @ 0xfffffff78000 (hiz) >> 2: 4 @ 0xfffffff77000 (pipe_control workaround) >> 3: 5 @ 0xfffffff76000 (program cache) >> 4: 12 @ 0x000000000000 (miptree) >> 5: 7 @ 0x000000001000 (image) >> 6: 10 @ 0xfffffff5a000 (miptree) >> 7: 13 @ 0xfffffff59000 (bufferobj) >> 8: 11 @ 0xfffffff51000 (bufferobj) >> 9: 14 @ 0xfffffff50000 (bufferobj) >> 10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000020 -> 8 >> (miptree)@0x0000ffff fff90000 + 0x00000000 >> 10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000040 -> 9 >> (hiz)@0x0000ffff fff78000 + 0x00000000 >> 10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000098 -> 4 >> (pipe_control workaround)@0x0000ffff fff77000 + 0x00000000 >> ... >> >> In particular, the mesa batchbuffer is placed at 0xfffffff40000, and >> all the STATE_BASE_ADDRESS heaps are in that bo. In other words, all >> heap bases are set to 0xfffffff40000. >> >> The offset in 3DSTATE_VIEWPORT_STATE_POINTERS_CC is indeed only 32 >> bits, but that only affects how far from the dynamic state base >> address we can place that. In mesas case, it's always inside the batch >> buffer, which is at most 32k, so that's never a problem. If a driver >> uses dynamic state in a heapless fashion, then you need to be careful >> to place the CC viewport state below 4g. >> >> What I was able to confirm is that scratch buffer I/O (which we use >> for spilling) does break with 48 bit ppgtt. If you run glxgears with >> INTEL_DEBUG=spill_fs set, you can see lots of rendering artifacts as >> the 4G limit on the general state heap caps attempts to spill and fill >> from the vertex shader. Using OUT_RELOC64_INSIDE_4G() when setting up >> the scratch BOs in 3DSTATE_VS and 3DSTATE_PS fixes the problem. > > Ok, so we do still need the 4G limit bit (i.e. I don't have to revert the > kernel side for lack of open-source userspace), just not at the place we > originally thought. I guess lesson learned is to actually test stuff first > before I pull in the kernel side. Michel, did you not see the spilling > corruptions when testing the mesa patches? Kristian, does this also blow > up with just a piglit run? >
There's must be something else we have different in our systems. Kristian doesn't get the hang, and if I back out the 4G restriction in the STATE_BASE_ADDRESS, I don't see any corruption while running INTEL_DEBUG=spill_fs glxgears.