On Wed, Jun 14, 2017 at 2:00 AM, Chris Wilson <ch...@chris-wilson.co.uk> wrote:
> Quoting Jason Ekstrand (2017-06-13 22:53:20)
> > As I've been working on converting more things in the GL driver over
> > to blorp, I've been highly annoyed by all of the hangs on Haswell.
> > About one in 3-5 Jenkins runs would hang somewhere.  After looking at
> > about a half-dozen error states, I noticed that all of the hangs
> > seemed to be on fast-clear operations (clear or resolve) that happen
> > at the start of a batch, right after STATE_BASE_ADDRESS.
> >
> > Haswell seems to be a bit more picky than other hardware about having
> > fast-clear operations in flight at the same time as regular rendering
> > and hangs if the two ever overlap.  (Other hardware can get rendering
> > corruption but not usually hangs.)  Also, Haswell doesn't fully stall
> > if you just do a RT flush and a CS stall.  The hardware docs refer to
> > something they call an "end of pipe sync" which is a CS stall with a
> > write to the workaround BO.  On Haswell, you also need to read from
> > that same address to create a memory dependency and make sure the
> > system is fully stalled.
> >
> > When you call brw_blorp_resolve_color, it calls
> > brw_emit_pipe_control_flush and does the correct flushes and then
> > calls into core blorp to do the actual resolve operation.  If the
> > batch doesn't have enough space left in it for the fast-clear
> > operation, the batch will get split and the fast-clear will happen in
> > the next batch.  I believe what is happening is that while we're
> > building the second batch that actually contains the fast-clear, some
> > other process completes a batch and inserts it between our
> > PIPE_CONTROL to do the stall and the actual fast-clear.  We then end
> > up with more stuff in flight than we can handle and the GPU explodes.
> >
> > I'm not 100% convinced of this explanation because it seems a bit
> > fishy that a context switch wouldn't be enough to fully flush out the
> > GPU.  However, what I do know is that, without these patches, I get a
> > hang in one out of three to five Jenkins runs on my wip/i965-blorp-ds
> > branch.  With the patches (or an older variant that did the same
> > thing), I have done almost 20 Jenkins runs and have yet to see a
> > hang.  I'd call that success.

For the record, I *think* this also improves Sky Lake.  I believe I saw
hangs (less often, maybe 1 in 10) without this and have seen none with it.

> Note that a context switch is itself just a batch that restores the
> registers and GPU state.
>
> The kernel does
>
>   PIPE_CONTROLs for invalidate-caches
>   MI_SET_CONTEXT
>   MI_BB_START
>   PIPE_CONTROLs for flush-caches
>   MI_STORE_DWORD (seqno)
>   MI_USER_INTERRUPT
>
> What I believe you are seeing is that MI_SET_CONTEXT is leaving the GPU
> in an active state requiring a pipeline barrier before adjusting.  It
> will be the equivalent of switching between GL and blorp in the middle
> of a batch.

That's also a reasonable theory (or maybe even better).  However, the
work-around is the same.
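To make the work-around concrete, here is a rough sketch of what the
end-of-pipe sync looks like from the userspace side.  This is
illustrative only: emit_pipe_control() and emit_load_register_mem() are
hypothetical stand-ins for the driver's real batch-emission helpers, and
HSW_SCRATCH_REG is a made-up register name, though PIPE_CONTROL_CS_STALL,
PIPE_CONTROL_WRITE_IMMEDIATE, and brw->workaround_bo are the real i965
names:

   /* Rough sketch of an "end of pipe sync", using hypothetical emit
    * helpers rather than i965's real ones.  The key ingredients are a
    * CS stall paired with a post-sync write to the workaround BO and,
    * on Haswell, a read from that same address to create the memory
    * dependency that actually drains the pipe. */
   static void
   emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags)
   {
      /* CS stall + write-immediate to the workaround BO.  The write
       * is the "end of pipe" part: it only lands once prior rendering
       * has fully retired. */
      emit_pipe_control(brw,
                        flags |
                        PIPE_CONTROL_CS_STALL |
                        PIPE_CONTROL_WRITE_IMMEDIATE,
                        brw->workaround_bo,
                        0 /* offset */,
                        0 /* immediate data */);

      if (brw->is_haswell) {
         /* Haswell additionally needs a read from the same address so
          * the sync point becomes a real memory dependency; loading
          * the dword back into a scratch register is one way to force
          * such a read. */
         emit_load_register_mem(brw, HSW_SCRATCH_REG,
                                brw->workaround_bo, 0 /* offset */);
      }
   }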
> The question I have is whether we apply the fix in the kernel, i.e. do
> a full end of pipe sync after every MI_SET_CONTEXT.  Userspace has the
> advantage of knowing if/when such a hammer is required, but equally we
> have to learn where by trial-and-error and if a second context user
> ever manifests, they will have to be taught the same lessons.

Right.  Here are the arguments for doing it in the kernel:

 1) It's the "right" place to do it because it appears to be a
    cross-context issue.

 2) The kernel knows whether or not you're getting an actual context
    switch, so it can insert the end-of-pipe sync only when a real
    context switch happens rather than on every batch.

And here are the arguments for doing it in userspace:

 1) Userspace knows whether or not we're actually doing a fast-clear
    operation, so it can flush only for fast-clears at the beginning of
    a batch.

 2) The kernel isn't flushing today, so unless we also do it in
    userspace, people will keep seeing hangs until they update their
    kernels.

My gut says userspace, but that's because I tend to have a mild distrust
of the kernel.  There are some things that are the kernel's job (dealing
with context switches, for instance), but I'm a big fan of putting
anything in userspace that can reasonably go there.
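If we do it in userspace, the policy is cheap to express at the call
site: only the fast-clear/resolve paths pay for the big hammer.  A
minimal sketch, reusing the hypothetical emit_end_of_pipe_sync() helper
from above (PIPE_CONTROL_RENDER_TARGET_FLUSH is the real i965 flag; the
function name and placement are assumptions):

   /* Sketch: guard a fast-clear/resolve that lands at the top of a
    * fresh batch.  Rendering from a previous batch (possibly another
    * context) may still be in flight, which is exactly the case that
    * hangs Haswell. */
   static void
   guard_fast_clear(struct brw_context *brw)
   {
      /* Full end-of-pipe sync before the clear.  Batches that never
       * fast-clear skip this entirely, unlike a kernel-side flush
       * around every MI_SET_CONTEXT. */
      emit_end_of_pipe_sync(brw, PIPE_CONTROL_RENDER_TARGET_FLUSH);

      /* ... then emit the blorp clear/resolve as usual ... */
   }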
Here's some more data.  Knowing this was a big giant hammer, I ran a full
suite of benchmarks overnight on my Haswell GT3, and this is what I
found:

  Test                  0-master   1-i965-end-of-pipe      diff
  bench_manhattan       4442.510             4430.870   -11.640
  bench_manhattanoff    4683.300             4663.000   -20.300
  bench_OglBatch0        773.523              771.027    -2.496
  bench_OglBatch1        775.858              771.802    -4.056
  bench_OglBatch4        747.629              745.522    -2.107
  bench_OglPSBump2       513.528              514.944     1.416

The only statistically significant differences were manhattan and some of
the batch tests, all by around 0.5% or less, which may easily have been
noise (though ministat seems to think it's significant).  So, if the big
hammer is hurting us, it's not hurting us badly.