 debian/changelog                                                      |   10
 debian/patches/r600-sb-Fix-loop-optimization-related-hangs-on-eg.diff |  387 ++++++++++
 debian/patches/series                                                 |    2
 debian/patches/vl-zscan-fix-Fix-trivial-sign-compare-warnings.diff    |   28
 4 files changed, 427 insertions(+)
New commits: commit 287b94855c2afa1f56bbdc6678f12706f64e3d99 Author: Andreas Boll <andreas.boll....@gmail.com> Date: Tue Jan 10 14:04:42 2017 +0100 Add r600-sb-Fix-loop-optimization-related-hangs-on-eg.diff Fixes GPU hangs on r600g (Evergreen and Northern Islands GPUs) in many games. diff --git a/debian/changelog b/debian/changelog index 14215c6..58a0d1e 100644 --- a/debian/changelog +++ b/debian/changelog @@ -8,6 +8,9 @@ mesa (13.0.3-1) UNRELEASED; urgency=medium decoding on radeon GPUs without native mpeg2 support (Closes: #846297). Thanks to Jörg-Volker Peetz for reporting and tracking down this regression! + * Add r600-sb-Fix-loop-optimization-related-hangs-on-eg.diff: Fixes + GPU hangs on r600g (Evergreen and Northern Islands GPUs) in many + games. -- Timo Aaltonen <tjaal...@debian.org> Tue, 10 Jan 2017 14:42:30 +0200 diff --git a/debian/patches/r600-sb-Fix-loop-optimization-related-hangs-on-eg.diff b/debian/patches/r600-sb-Fix-loop-optimization-related-hangs-on-eg.diff new file mode 100644 index 0000000..f4c8e6a --- /dev/null +++ b/debian/patches/r600-sb-Fix-loop-optimization-related-hangs-on-eg.diff @@ -0,0 +1,387 @@ +commit e933246013eef376804662f3fcf4646c143c6c88 +Author: Heiko Przybyl <lil_...@web.de> +Date: Sun Nov 20 14:42:28 2016 +0100 + + r600/sb: Fix loop optimization related hangs on eg + + Make sure unused ops and their references are removed, prior to entering + the GCM (global code motion) pass, to stop GCM from breaking the loop + logic and thus hanging the GPU. 
+ + Turns out, that sb has problems with loops and node optimizations + regarding associative folding: + + - the global code motion (gcm) pass moves ops up a loop level/basic block + until they've fulfilled their total usage count + - if there are ops folded into others, the usage count won't be + fulfilled and thus the op moved way up to the top + - within GCM the op would be visited and their deps would be moved + alongside it, to fulfill the src constaints + - in a loop, an unused op is moved out of the loop and GCM would move + the src value ops up as well + - now here arises the problem: if the loop counter is one of the src + values it would get moved up as well, the loop break condition would + never get hit and the shader turn into an endless loop, resulting in the + GPU hanging and being reset + + A reduced (albeit nonsense) piglit example would be: + + [require] + GLSL >= 1.20 + + [fragment shader] + + uniform int SIZE; + uniform vec4 lights[512]; + + void main() + { + float x = 0; + for(int i = 0; i < SIZE; i++) + x += lights[2*i+1].x; + } + + [test] + uniform int SIZE 1 + draw rect -1 -1 2 2 + + Which gets optimized to: + + ===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN ===== + ===== 42 dw ===== 1 gprs ===== 2 stack ========================================= + ALU 3 @24 + 1 y: MOV R0.y, 0 + t: MULLO_UINT R0.w, [0x00000002 2.8026e-45].x, R0.z + + LOOP_START_DX10 @22 + PUSH @6 + ALU 1 @30 KC0[CB0:0-15] + 2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x + JUMP @14 POP:1 + LOOP_BREAK @20 + POP @14 POP:1 + ALU 2 @32 + 3 x: ADD_INT R0.x, R0.w, [0x00000002 2.8026e-45].x + + TEX 1 @36 + VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..] + ALU 1 @40 + 4 y: ADD R0.y, R0.y, R0.x + LOOP_END @4 + EXPORT_DONE PIXEL 0 R0.____ EOP + ===== SHADER_END =============================================================== + + Notice R0.z being the loop counter/break condition relevant register + and being never incremented at all. 
Also some of the loop content + has been moved out of it, to fulfill the requirements for the one unused + op. + + With a debug build of mesa this would produce an error like + error at : PRED_SETGE_INT __, __, EM.2, R1.x.2||FP@R0.z, C0.x + : operand value R1.x.2||FP@R0.z was not previously written to its gpr + and the compilation would fail due to this. On a release build it gets + passed to the GPU. + + When using this patch, the loop remains intact: + + ===== SHADER #12 OPT ================================== PS/BARTS/EVERGREEN ===== + ===== 48 dw ===== 1 gprs ===== 2 stack ========================================= + ALU 2 @24 + 1 y: MOV R0.y, 0 + z: MOV R0.z, 0 + LOOP_START_DX10 @22 + PUSH @6 + ALU 1 @28 KC0[CB0:0-15] + 2 M x: PRED_SETGE_INT __.x, R0.z, KC0[0].x + JUMP @14 POP:1 + LOOP_BREAK @20 + POP @14 POP:1 + ALU 4 @30 + 3 t: MULLO_UINT T0.x, [0x00000002 2.8026e-45].x, R0.z + + 4 x: ADD_INT R0.x, T0.x, [0x00000002 2.8026e-45].x + + TEX 1 @40 + VFETCH R0.x___, R0.x, RID:0 MFC:16 UCF:0 FMT[..] 
+ ALU 2 @44 + 5 y: ADD R0.y, R0.y, R0.x + z: ADD_INT R0.z, R0.z, 1 + LOOP_END @4 + EXPORT_DONE PIXEL 0 R0.____ EOP + ===== SHADER_END =============================================================== + + Piglit: ./piglit summary console -d results/*_gpu_noglx + name: unpatched_gpu_noglx patched_gpu_noglx + ---- ------------------- ----------------- + pass: 18016 18021 + fail: 748 743 + crash: 7 7 + skip: 1124 1124 + timeout: 0 0 + warn: 13 13 + incomplete: 0 0 + dmesg-warn: 0 0 + dmesg-fail: 0 0 + changes: 0 5 + fixes: 0 5 + regressions: 0 0 + total: 19908 19908 + + Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94900 + Tested-by: Heiko Przybyl <lil_...@web.de> + Tested-on: Barts PRO HD6850 + Signed-off-by: Heiko Przybyl <lil_...@web.de> + Signed-off-by: Marek Olšák <marek.ol...@amd.com> + +diff --git a/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp b/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp +index 79aef91..abae2bf 100644 +--- a/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp ++++ b/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp +@@ -30,6 +30,18 @@ + + namespace r600_sb { + ++int dce_cleanup::run() { ++ int r; ++ ++ // Run cleanup for as long as there are unused nodes. ++ do { ++ nodes_changed = false; ++ r = vpass::run(); ++ } while (r == 0 && nodes_changed); ++ ++ return r; ++} ++ + bool dce_cleanup::visit(node& n, bool enter) { + if (enter) { + } else { +@@ -110,7 +122,18 @@ bool dce_cleanup::visit(region_node& n, bool enter) { + void dce_cleanup::cleanup_dst(node& n) { + if (!cleanup_dst_vec(n.dst) && remove_unused && + !n.dst.empty() && !(n.flags & NF_DONT_KILL) && n.parent) ++ { ++ // Delete use references to the removed node from the src values. 
++ for (vvec::iterator I = n.src.begin(), E = n.src.end(); I != E; ++I) { ++ value* v = *I; ++ if (v && v->def && v->uses.size()) ++ { ++ v->remove_use(&n); ++ } ++ } + n.remove(); ++ nodes_changed = true; ++ } + } + + bool dce_cleanup::visit(container_node& n, bool enter) { +@@ -130,7 +153,7 @@ bool dce_cleanup::cleanup_dst_vec(vvec& vv) { + if (v->gvn_source && v->gvn_source->is_dead()) + v->gvn_source = NULL; + +- if (v->is_dead() || (remove_unused && !v->is_rel() && !v->uses)) ++ if (v->is_dead() || (remove_unused && !v->is_rel() && !v->uses.size())) + v = NULL; + else + alive = true; +diff --git a/src/gallium/drivers/r600/sb/sb_gcm.cpp b/src/gallium/drivers/r600/sb/sb_gcm.cpp +index 236b2ea..9c75389 100644 +--- a/src/gallium/drivers/r600/sb/sb_gcm.cpp ++++ b/src/gallium/drivers/r600/sb/sb_gcm.cpp +@@ -199,10 +199,9 @@ void gcm::td_release_val(value *v) { + sblog << "\n"; + ); + +- use_info *u = v->uses; +- while (u) { ++ for (uselist::iterator I = v->uses.begin(), E = v->uses.end(); I != E; ++I) { ++ use_info *u = *I; + if (u->op->parent != &pending) { +- u = u->next; + continue; + } + +@@ -212,6 +211,7 @@ void gcm::td_release_val(value *v) { + sblog << "\n"; + ); + ++ assert(uses[u->op] > 0); + if (--uses[u->op] == 0) { + GCM_DUMP( + sblog << "td released : "; +@@ -222,7 +222,6 @@ void gcm::td_release_val(value *v) { + pending.remove_node(u->op); + ready.push_back(u->op); + } +- u = u->next; + } + + } +diff --git a/src/gallium/drivers/r600/sb/sb_ir.cpp b/src/gallium/drivers/r600/sb/sb_ir.cpp +index 5226893..d989dce 100644 +--- a/src/gallium/drivers/r600/sb/sb_ir.cpp ++++ b/src/gallium/drivers/r600/sb/sb_ir.cpp +@@ -255,7 +255,7 @@ void container_node::expand() { + void node::remove() {parent->remove_node(this); + } + +-value_hash node::hash_src() { ++value_hash node::hash_src() const { + + value_hash h = 12345; + +@@ -269,7 +269,7 @@ value_hash node::hash_src() { + } + + +-value_hash node::hash() { ++value_hash node::hash() const { + + if (parent && 
parent->subtype == NST_LOOP_PHI_CONTAINER) + return 47451; +diff --git a/src/gallium/drivers/r600/sb/sb_ir.h b/src/gallium/drivers/r600/sb/sb_ir.h +index 4fc4da2..74c0549 100644 +--- a/src/gallium/drivers/r600/sb/sb_ir.h ++++ b/src/gallium/drivers/r600/sb/sb_ir.h +@@ -446,15 +446,16 @@ enum use_kind { + }; + + struct use_info { +- use_info *next; + node *op; + use_kind kind; + int arg; + +- use_info(node *n, use_kind kind, int arg, use_info* next) +- : next(next), op(n), kind(kind), arg(arg) {} ++ use_info(node *n, use_kind kind, int arg) ++ : op(n), kind(kind), arg(arg) {} + }; + ++typedef std::list< use_info * > uselist; ++ + enum constraint_kind { + CK_SAME_REG, + CK_PACKED_BS, +@@ -498,7 +499,7 @@ public: + value_hash ghash; + + node *def, *adef; +- use_info *uses; ++ uselist uses; + + ra_constraint *constraint; + ra_chunk *chunk; +@@ -585,6 +586,7 @@ public: + } + + void add_use(node *n, use_kind kind, int arg); ++ void remove_use(const node *n); + + value_hash hash(); + value_hash rel_hash(); +@@ -790,8 +792,8 @@ public: + void replace_with(node *n); + void remove(); + +- virtual value_hash hash(); +- value_hash hash_src(); ++ virtual value_hash hash() const; ++ value_hash hash_src() const; + + virtual bool fold_dispatch(expr_handler *ex); + +diff --git a/src/gallium/drivers/r600/sb/sb_pass.h b/src/gallium/drivers/r600/sb/sb_pass.h +index 0346df1..e878f8c 100644 +--- a/src/gallium/drivers/r600/sb/sb_pass.h ++++ b/src/gallium/drivers/r600/sb/sb_pass.h +@@ -124,7 +124,9 @@ class dce_cleanup : public vpass { + public: + + dce_cleanup(shader &s) : vpass(s), +- remove_unused(s.dce_flags & DF_REMOVE_UNUSED) {} ++ remove_unused(s.dce_flags & DF_REMOVE_UNUSED), nodes_changed(false) {} ++ ++ virtual int run(); + + virtual bool visit(node &n, bool enter); + virtual bool visit(alu_group_node &n, bool enter); +@@ -140,6 +142,8 @@ private: + void cleanup_dst(node &n); + bool cleanup_dst_vec(vvec &vv); + ++ // Did we alter/remove nodes during a single pass? 
++ bool nodes_changed; + }; + + +diff --git a/src/gallium/drivers/r600/sb/sb_valtable.cpp b/src/gallium/drivers/r600/sb/sb_valtable.cpp +index eb242b1..a8b7b49 100644 +--- a/src/gallium/drivers/r600/sb/sb_valtable.cpp ++++ b/src/gallium/drivers/r600/sb/sb_valtable.cpp +@@ -220,17 +220,33 @@ void value::add_use(node* n, use_kind kind, int arg) { + dump::dump_op(n); + sblog << " kind " << kind << " arg " << arg << "\n"; + } +- uses = new use_info(n, kind, arg, uses); ++ uses.push_back(new use_info(n, kind, arg)); + } + +-unsigned value::use_count() { +- use_info *u = uses; +- unsigned c = 0; +- while (u) { +- ++c; +- u = u->next; ++struct use_node_comp { ++ explicit use_node_comp(const node *n) : n(n) {} ++ bool operator() (const use_info *u) { ++ return u->op->hash() == n->hash(); ++ } ++ ++ private: ++ const node *n; ++}; ++ ++void value::remove_use(const node *n) { ++ uselist::iterator it = ++ std::find_if(uses.begin(), uses.end(), use_node_comp(n)); ++ ++ if (it != uses.end()) ++ { ++ // TODO assert((*it)->kind == kind) ? ++ // TODO assert((*it)->arg == arg) ? 
++ uses.erase(it); + } +- return c; ++} ++ ++unsigned value::use_count() { ++ return uses.size(); + } + + bool value::is_global() { +@@ -274,13 +290,7 @@ bool value::is_prealloc() { + } + + void value::delete_uses() { +- use_info *u, *c = uses; +- while (c) { +- u = c->next; +- delete c; +- c = u; +- } +- uses = NULL; ++ uses.erase(uses.begin(), uses.end()); + } + + void ra_constraint::update_values() { +@@ -468,7 +478,7 @@ bool r600_sb::sb_value_set::add_vec(vvec& vv) { + bool r600_sb::sb_value_set::contains(value* v) { + unsigned b = v->uid - 1; + if (b < bs.size()) +- return bs.get(v->uid - 1); ++ return bs.get(b); + else + return false; + } diff --git a/debian/patches/series b/debian/patches/series index 5e113b2..1c14631 100644 --- a/debian/patches/series +++ b/debian/patches/series @@ -1,3 +1,4 @@ 07_gallium-fix-build-failure-on-powerpcspe.diff workaround-binutils-mips-844357.diff vl-zscan-fix-Fix-trivial-sign-compare-warnings.diff +r600-sb-Fix-loop-optimization-related-hangs-on-eg.diff commit 345c35a7fd347a3cf87e6ea77c15b1e3c8a2359f Author: Andreas Boll <andreas.boll....@gmail.com> Date: Tue Jan 10 14:03:39 2017 +0100 Add vl-zscan-fix-Fix-trivial-sign-compare-warnings.diff Fixes mpeg2 decoding on radeon GPUs without native mpeg2 support (Closes: #846297). Thanks to Jörg-Volker Peetz for reporting and tracking down this regression! diff --git a/debian/changelog b/debian/changelog index 9f1ae8b..14215c6 100644 --- a/debian/changelog +++ b/debian/changelog @@ -1,7 +1,14 @@ mesa (13.0.3-1) UNRELEASED; urgency=medium + [ Timo Aaltonen ] * New upstream release. + [ Andreas Boll ] + * Add vl-zscan-fix-Fix-trivial-sign-compare-warnings.diff: Fixes mpeg2 + decoding on radeon GPUs without native mpeg2 support (Closes: + #846297). Thanks to Jörg-Volker Peetz for reporting and tracking + down this regression! 
+ -- Timo Aaltonen <tjaal...@debian.org> Tue, 10 Jan 2017 14:42:30 +0200 mesa (13.0.2-3) unstable; urgency=medium diff --git a/debian/patches/series b/debian/patches/series index 4f7ab78..5e113b2 100644 --- a/debian/patches/series +++ b/debian/patches/series @@ -1,2 +1,3 @@ 07_gallium-fix-build-failure-on-powerpcspe.diff workaround-binutils-mips-844357.diff +vl-zscan-fix-Fix-trivial-sign-compare-warnings.diff diff --git a/debian/patches/vl-zscan-fix-Fix-trivial-sign-compare-warnings.diff b/debian/patches/vl-zscan-fix-Fix-trivial-sign-compare-warnings.diff new file mode 100644 index 0000000..01e0533 --- /dev/null +++ b/debian/patches/vl-zscan-fix-Fix-trivial-sign-compare-warnings.diff @@ -0,0 +1,28 @@ +commit ac57bcda1e0e6dcfa81e24468d5b682686120649 +Author: Christian König <christian.koe...@amd.com> +Date: Wed Dec 14 15:03:35 2016 +0100 + + vl/zscan: fix "Fix trivial sign compare warnings" + + The variable actually needs to be signed, otherwise converting it to a + float doesn't work as expected. 
+ + Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=98914 + Signed-off-by: Christian König <christian.koe...@amd.com> + Reviewed-by: Nayan Deshmukh <nayan26deshm...@gmail.com> + Cc: "13.0" <mesa-sta...@lists.freedesktop.org> + Fixes: 1fb4179f927 ("vl: Fix trivial sign compare warnings") + +diff --git a/src/gallium/auxiliary/vl/vl_zscan.c b/src/gallium/auxiliary/vl/vl_zscan.c +index ef05af4..24d6452 100644 +--- a/src/gallium/auxiliary/vl/vl_zscan.c ++++ b/src/gallium/auxiliary/vl/vl_zscan.c +@@ -152,7 +152,7 @@ create_vert_shader(struct vl_zscan *zscan) + for (i = 0; i < zscan->num_channels; ++i) { + ureg_ADD(shader, ureg_writemask(tmp, TGSI_WRITEMASK_X), ureg_scalar(ureg_src(tmp), TGSI_SWIZZLE_Y), + ureg_imm1f(shader, 1.0f / (zscan->blocks_per_line * VL_BLOCK_WIDTH) +- * (i - (signed)zscan->num_channels / 2))); ++ * ((signed)i - (signed)zscan->num_channels / 2))); + + ureg_MAD(shader, ureg_writemask(o_vtex[i], TGSI_WRITEMASK_X), vrect, + ureg_imm1f(shader, 1.0f / zscan->blocks_per_line), ureg_src(tmp));