From: Marek Olšák <marek.ol...@amd.com>

Only harvested Stoney has 2 CUs, but I couldn't test that and the bug
reporter hasn't tested it on Stoney yet. However, I reproduced the hang
on Fiji limited to 2 CUs per SE, which is how I was able to fix it.

Cc: 17.0 17.1 <mesa-sta...@lists.freedesktop.org>
---
 src/gallium/drivers/radeonsi/si_state_draw.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
index 8651592..77df643 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -162,22 +162,26 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
 	output_patch_size = pervertex_output_patch_size + num_tcs_patch_outputs * 16;
 
 	/* Ensure that we only need one wave per SIMD so we don't need to check
 	 * resource usage. Also ensures that the number of tcs in and out
 	 * vertices per threadgroup are at most 256.
 	 */
 	*num_patches = 64 / MAX2(num_tcs_input_cp, num_tcs_output_cp) * 4;
 
 	/* Make sure that the data fits in LDS. This assumes the shaders only
 	 * use LDS for the inputs and outputs.
+	 *
+	 * While CIK can use 64K per threadgroup, there is a hang on Stoney
+	 * with 2 CUs if we use more than 32K. The closed Vulkan driver also
+	 * uses 32K at most on all GCN chips.
 	 */
-	hardware_lds_size = sctx->b.chip_class >= CIK ? 65536 : 32768;
+	hardware_lds_size = 32768;
 	*num_patches = MIN2(*num_patches, hardware_lds_size / (input_patch_size +
 	                                                       output_patch_size));
 
 	/* Make sure the output data fits in the offchip buffer */
 	*num_patches = MIN2(*num_patches,
 			    (sctx->screen->tess_offchip_block_dw_size * 4) /
 			    output_patch_size);
 
 	/* Not necessary for correctness, but improves performance. The
 	 * specific value is taken from the proprietary driver.
-- 
2.7.4
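
To illustrate how the LDS limit feeds into the patch count, here is a minimal standalone sketch of the first two clamps from the hunk above. It is not part of the patch: clamp_num_patches is a hypothetical helper, the MAX2/MIN2 macros are simplified stand-ins for Mesa's, and the offchip-buffer clamp is left out.

```c
/* Simplified stand-ins for Mesa's MAX2/MIN2 macros (assumption: plain
 * ternaries are enough for this illustration). */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))
#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper, not a Mesa function. It mirrors the wave-size
 * clamp and the LDS clamp from si_emit_derived_tess_state, with the
 * LDS size passed in so that 32K and 64K can be compared. */
static unsigned clamp_num_patches(unsigned num_tcs_input_cp,
                                  unsigned num_tcs_output_cp,
                                  unsigned input_patch_size,
                                  unsigned output_patch_size,
                                  unsigned hardware_lds_size)
{
	/* One wave per SIMD: at most 256 in/out vertices per threadgroup. */
	unsigned num_patches =
		64 / MAX2(num_tcs_input_cp, num_tcs_output_cp) * 4;

	/* The inputs and outputs of all patches must fit in LDS. */
	num_patches = MIN2(num_patches,
			   hardware_lds_size /
			   (input_patch_size + output_patch_size));

	return num_patches;
}
```

With made-up sizes of 3 control points in and out, 512 bytes of input per patch and 1024 bytes of output per patch, the wave clamp alone allows 84 patches. The LDS clamp lowers that to 42 with a 64K limit and to 21 with the 32K limit this patch applies, so in this example the threadgroup carries half as many patches.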