What timing, we were just looking at this exact issue - intermittent glxgears rendering issues when using multisample buffers.
What's the plan for DRM? Seems like it's broken if writing to the buffer size indicated (bo->size) causes us to clobber other BOs. Courtney On Mon, Apr 14, 2014 at 8:54 PM, Kenneth Graunke <kenn...@whitecape.org>wrote: > On 04/14/2014 05:33 PM, Eric Anholt wrote: > > This manifested as rendering failures or sometimes GPU hangs in > > compositors when they accidentally got MSAA visuals due to a bug in the X > > Server. Today we decided that the problem in compositors was equivalent > > to a corruption bug we'd noticed recently in resizing MSAA-visual > > glxgears, and debugging got a lot easier. > > > > When we allocate our MCS MT, libdrm takes the size we request, aligns it > > to Y tile size (blowing it up from 300x300=900000 bytes to 384*320=122880 > > bytes, 30 pages), then puts it into a power-of-two-sized BO (131072 > bytes, > > 32 pages). Because it's Y tiled, we attach a 384-byte-stride fence to > it. > > When we memset by the BO size in Mesa, between bytes 122880 and 131072 > the > > data gets stored to the first 20 or so scanlines of each of the 3 tiled > > pages in that row, even though only 2 of those pages were allocated by > > libdrm. > > What? > > I get that drm_intel_bo_alloc/drm_intel_bo_alloc_tiled might return a > drm_intel_bo where bo->size is larger than what you asked for, due to > the BO cache. But...what you're saying is, it doesn't actually allocate > enough pages to back the whole bo->size it gives you? So, if you write > bytes 0..(bo->size - 1), you'll randomly clobber memory in a way that's > really difficult to detect? > > I never realized that. It seems pretty scary. > > There are other places where we memset an entire BO using bo->size. For > example, your INTEL_DEBUG=shader_time code does exactly that (though it > isn't tiled). > > Could we change libdrm to set bo->size to the actual usable size of the > buffer, rather than the bucket size? That seems much safer, and would > prevent mistakes like this in the future. If libdrm needs to know the > bucket size internally, we could store that as a field in the private > drm_intel_gem_bo structure, and not expose it to consumers of the > library. I can't imagine users of libdrm want the current value. > > > In the glxgears case, the missing 3rd page happened to > > consistently be the static VBO that got mapped right after the first MCS > > allocation, so corruption only appeared once window resize made us throw > > out the old MCS and then allocate the same BO to back the new MCS. > > > > Instead, just memset the amount of data we actually asked libdrm to > > allocate for, which will be smaller (more efficient) and not overrun. > > Thanks go to Kenneth for doing most of the hard debugging to eliminate a > > lot of the search space for the bug. > > > > Cc: "10.0 10.1" <mesa-sta...@lists.freedesktop.org> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77207 > > --- > > > > Ken: how I eventually figured this out was I thought "well, I'm > > clearing the whole surface at the start of rendering glxgears, so I > > don't *really* need to memset because I'll be initializing the whole > > buffer during fallback glClear() anyway, so let me just drop the whole > > memset step to definitely eliminate that as a potential problem. Huh, > > the problem's gone." > > > > The worst is I remember briefly thinking about this code when it was > > landing, and thought "nope, seems safe." > > > > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > index 5996a1b..59700ed 100644 > > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > @@ -1219,7 +1219,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw, > > * Note: the clear value for MCS buffers is all 1's, so we memset to > 0xff. > > */ > > void *data = intel_miptree_map_raw(brw, mt->mcs_mt); > > - memset(data, 0xff, mt->mcs_mt->region->bo->size); > > + memset(data, 0xff, mt->mcs_mt->region->height * > mt->mcs_mt->region->pitch); > > intel_miptree_unmap_raw(brw, mt->mcs_mt); > > mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR; > > This does seem to fix the KWin problem, as well as the glxgears problem. > > I agree this is the correct amount of data to memset, and even if we > make the libdrm change I suggested, this seems worth doing. bo->size > may have been rounded up beyond what we need, and memsetting that extra > space is wasteful (even if it did work). > > Reviewed-by: Kenneth Graunke <kenn...@whitecape.org> > > Thanks a ton for your help on this, Eric. I was really stumped. > > > _______________________________________________ > mesa-stable mailing list > mesa-sta...@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-stable > > -- Courtney Goeltzenleuchter LunarG
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev