barriers don't have lower register limit

Marek Olšák Thu, 25 May 2017 10:05:14 -0700

From: Marek Olšák <marek.ol...@amd.com>

Or do they? This doesn't hang, so it seems right, but I'm not 100% sure.
Setting VGPRS=256 (i.e. above the limit) with big threadgroups works fine.


shader-db: Spilled VGPRs: 107 -> 50 (-53.27 %)

DiRT Showdown and GRID Autosport have 100% reduction in VGPR spilling.
There are no other changes for shader-db.
---
 src/gallium/drivers/radeonsi/si_shader.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 61f1384..0ffe402 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4040,20 +4040,29 @@ static unsigned si_get_max_workgroup_size(const struct 
si_shader *shader)
                       properties[TGSI_PROPERTY_CS_FIXED_BLOCK_WIDTH] *
                       properties[TGSI_PROPERTY_CS_FIXED_BLOCK_HEIGHT] *
                       properties[TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH];
 
        if (!max_work_group_size) {
                /* This is a variable group size compute shader,
                 * compile it for the maximum possible group size.
                 */
                max_work_group_size = SI_MAX_VARIABLE_THREADS_PER_BLOCK;
        }
+
+       /* Compute shader threadgroups without LDS usage and barriers don't
+        * have to be stuck on the same compute unit, and so register usage
+        * doesn't have to be limited.
+        */
+       if (!shader->selector->local_size &&
+           !shader->selector->info.uses_barrier)
+               return MIN2(64, max_work_group_size);
+
        return max_work_group_size;
 }
 
 static void declare_per_stage_desc_pointers(struct si_shader_context *ctx,
                                            LLVMTypeRef *params,
                                            unsigned *num_params,
                                            bool assign_params)
 {
        params[(*num_params)++] = si_const_array(ctx->v4i32,
                                                 SI_NUM_SHADER_BUFFERS + 
SI_NUM_CONST_BUFFERS);
-- 
2.7.4

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/7] radeonsi: compute shaders w/out LDS/barriers don't have lower register limit

Reply via email to