This series implements ARB_shader_ballot for Kepler+. I have tested it on GK208: 8 of the 9 piglit execution tests pass against current master. The one failure is due to a wrong assumption in the test itself when the thread group size is less than 64, which is the case on NVIDIA hardware. The other architectures (GK104 and GM107) are untested because I lack the hardware, but I have inspected the code generated for both, and it looks correct.
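For reviewers less familiar with the extension, here is a rough CPU-side model of what ballotARB and readFirstInvocationARB are expected to return for a group narrower than 64 invocations. It is only a sketch and not code from this series: the Subgroup struct, the helper names and the fixed width of 32 are assumptions made for illustration.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of one thread group; not Mesa code.
// A width of 32 is assumed here for illustration.
constexpr unsigned kSubgroupSize = 32;

struct Subgroup {
    uint32_t active;                            // bit i set => invocation i is active
    std::array<uint32_t, kSubgroupSize> value;  // per-invocation input value
};

// Models ballotARB(bool): bit i of the 64-bit result is set iff
// invocation i is active and its predicate is true.
uint64_t ballot(const Subgroup &sg,
                const std::array<bool, kSubgroupSize> &pred)
{
    uint64_t mask = 0;
    for (unsigned i = 0; i < kSubgroupSize; ++i) {
        if (((sg.active >> i) & 1u) && pred[i])
            mask |= uint64_t(1) << i;
    }
    return mask;
}

// Models readFirstInvocationARB(value): every active invocation gets the
// value held by the lowest-numbered active invocation.
uint32_t readFirstInvocation(const Subgroup &sg)
{
    for (unsigned i = 0; i < kSubgroupSize; ++i) {
        if ((sg.active >> i) & 1u)
            return sg.value[i];
    }
    return 0; // no active invocation; 0 is an arbitrary choice of this sketch
}
```

With only 32 invocations in the group, the upper 32 bits of the ballot result can never be set in this model, so anything that expects all 64 bits to be populated would see a different value.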
Patches 1-4 implement OP_SHFL emission, with a fix for nvc0 in patch 2. Patch 5 extends nv50 ir's OP_VOTE to translate readFirstInvocationARB. Patches 6-8 hook up the logic with tgsi, and the extension is eventually flipped on in the last patch.

Boyan Ding (9):
  gm107/ir: Emit third src 'bound' and optional predicate output of SHFL
  nvc0/ir: Properly handle a "split form" of predicate destination
  nvc0/ir: Emit OP_SHFL
  gk110/ir: Emit OP_SHFL
  nvc0/ir: Allow 0/1 immediate value as source of OP_VOTE
  nvc0/ir: Add SV_LANEMASK_* system values.
  nvc0/ir: Implement TGSI_SEMANTIC_SUBGROUP_*
  nvc0/ir: Implement TGSI_OPCODE_BALLOT and TGSI_OPCODE_READ_*
  nvc0: Enable ARB_shader_ballot on Kepler+

 docs/features.txt                                  |  2 +-
 docs/relnotes/17.1.0.html                          |  2 +-
 src/gallium/drivers/nouveau/codegen/nv50_ir.h      |  5 ++
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 76 +++++++++++++++++-
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 47 +++++++++--
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 90 ++++++++++++++++++++--
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 55 +++++++++++++
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c     |  3 +-
 8 files changed, 260 insertions(+), 20 deletions(-)

-- 
2.12.1