On Tue, Jul 07, 2015 at 10:03:09PM -0700, Kenneth Graunke wrote: > Hi Chris, > > I made a genuine effort to review this patch, hoping to better understand > the various changes and what you were trying to accomplish. I spent many > hours reading and trying to enumerate changes - or potential changes I > needed to look hard at to convince myself whether they were correct. > > I came up with a frighteningly long list of changes:
* Locking completely overhauled. > * Relocation handling changes considerably (the original point of > Kristian's endeavour which led up to this). It's been a long time since 2011. > * Fencing, busy tracking, and sync objects are completely reworked. sync obje That's a slight overstatement. > * Render-to-texture cache flushing and dirty buffer tracking is > completely reworked. And immensely simplified. A hash table? Did you notice that the current dirty tracking misses where the same buffer is rewritten? > * Gen7 SOL buffer offset resetting now uses MI_LOAD_REGISTER_IMM rather > than the execbuf2 parameter, requiring the command validator on Haswell. > This effectively bumps the kernel requirement from v3.6 to v4.2-rc1, > which will simply not fly with distributions at this time. My bad, completely missed that there was EXT_transform_feedback that only did the subset. I wondered why you had code to the pipelined loads/save but then still flushed the batch every time. > * glBufferSubData() now uses intel_upload_data() rather than allocating > a temporary BO. This is the first use of the upload buffer by the > BLT engine, and could imply that the upload buffer's lifetime now > extends across batches - longer than before. Separable change that > requires separate evaluation and justification. Yes and no, it is easy to demonstrate that the buffer's lifetime is longer than a batch current. > * Per buffer cache-coherency checking rather than brw->has_llc? We've pointed out the bugs in the current usage of brw->has_llc before. > * glBufferSubData()'s prefer_stall_to_blit flag appears to depend on > per-buffer cache-coherency rather than being set globally. Could > impact performance of buffer uploads. > > * Potential missing flushes (which can cause hangs or misrendering): > > - It looks like calling brw_bo_busy() with BUSY_FLUSH causes a flush > when necessary. However, some instances of the old bo_busy, > bo_references, batch_flush pattern are replaced without that flag. > One occurrance was in BufferSubData(); I did not spend time to > check every case. The change is still accurate. > - Flushes are often done implicitly by e.g. brw_bo_read calling > brw_bo_map with the appropriate flags, and many explicit checks > and flushes are removed. Not bad, but needs careful review. > > - Gen6+ query object code might have dropped an implicit flush > guaranteeing that when the GL application requests the result, > any pending work will be kicked off so they can poll/spin > repeatedly until the result arrives. The flush is not missing. > - New code to avoid redundant flushes. > * perf_debug() warnings are removed all over the code for some reason: The problem is perf_debug() was a layering violation (my goal was to keep brw_batch a mini-library, either to extract it back to libdrm or for reuse). I have an idea of using a macro like #define PERF_DEBUG_LOC(brw, str) ((brw->perf_debug ? &(struct loc){__FILE__, __FUNCTION__, __LINE__, str)} : NULL)) brw_bo_map(..., PERF_DEBUG_LOC(brw)); Then pass back to a brw_context callback if a function stalls. > - Unsynchronized maps/BufferSubData not working on !LLC platforms? > If they work now, that's a huge change! If not, why drop the warning? They have always worked on !llc. The same caveats with using multiple aliasing to the same physical address through different modes of write caching apply to both llc and !llc. > - Warnings about stalls on mapping buffers and miptrees are gone now. > These have been useful in tracking down performance problems. They > might not always be accurate, but surely removing them should be done > separately with justification? See issues with calling perf_debug. > - Warnings about stalls on query objects are gone. I've used these when > analyzing application performance. Why? See issues with calling perf_debug. > - Warnings about implicit flushes are gone. See issues with calling perf_debug. It was at the back of my mind that I had to fix perf_debug - if I had put in the list of issues, I would probably would have remembered to fix it before posting. > * BO unmap calls appears to be missing in some places. A few map calls > have moved around in hard-to-follow ways. Unclear how lifetimes of > buffers and lifetimes of maps are affected. Bo unmap is a figment of your imagination. libdrm never unmaps a bo until it is closed (simply to avoid the cost of refaulting the objects - though that does come at some risk of address space exhaustion on 32bit systems with large memory). > * Possible mmap vs. pwrite preference changes? Hard to follow. But easy to reason. mmap is quicker if you have the mmapping and do not require GGTT space, otherwise pwrite is preferred as it avoids the mmaping setup cost and can avoid the cost of acquiring GGTT space. If you have wb/wc mmaps, use them (and more importantly, reuse them). > * Texture upload (tiled_memcpy) changes, which is notoriously fragile > and can lose all of the performance benefit if the compiler isn't able > to optimize it just right. Ideally separate. The change made will not affect compilation of that unit. But will speed up byt by over 50%. And bdw by a couple orders of magnitude along some call paths. > * Assertions change to GL errors in brw_get_graphics_reset_status(). > > * Aperture space checking significantly reworked, especially for the BLT > paths. Honestly, a lot nicer, but couldn't this be separated? Not really, after all the purpose is to completely rework how relocations are handled, and that is the current interface through which aperture tracking works. brw_batch_begin = { require_space; save_state } brw_batch_end = { check aperture && fixup } Every step there needs changing. We can either make the change in one location, or at every callsite. But I can do the refactoring first. > * The bo_reuse driconf option is removed. You set it per context on the screen bufmgr. I left in the /* help I have no idea if I can get an xml parser here */ to where it needed to be applied if possible. The per-context bufmgr pointer was removed, it wasn't clear how to set it on the screen bufmgr. We can put it back to where it was and just use brw->intelScreen->bufmgr (which is essentially the current code anyway). That just looks problematic to me. > * Gen4-5 structure changes. Not intentional. What do you think changed? > * brw_get_timestamp() - removes initialization of result to 0. > Probably unnecessary and OK to delete; should be separate. Oh, that's a rebase error. I had rewritten the drm_intel_reg_read() function then restored it back to the original to cut out one unnecessary change. > * New helper functions and coding patterns. Separable. Which? I had move some functions around, but the only new ones were brw_batch.h and I had to add the start/end callbacks for brw_context. The brw_context callbacks can easily be extracted and done in a precursor. > * Noise (renaming, moving code between files, some other trivial changes > like removing 'brw' variables and moving code into "else" blocks). > > * ...I probably missed some things. > > Based upon this, I cannot in good conscience consider merging this patch. > The potential for breakage is staggering. As a proof-of-concept, you've > done an excellent job in proving we can do much better, and introduced a > lot of good ideas. But there's a lot of work left to be done before we > can consider applying it to our production quality driver. > > Please advise whether you would like to work towards making a mergeable, > incremental patch series, or if someone else should embark on that > endeavour. I'm happy to keep reworking the noise outside of brw-batch down to acceptable levels, but a bit more resistant to rewritting the brw_request tracking itself though. Please do keep on making suggestions for dubious code and what to cut. Thanks, -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev