Hi,

Jordan sent a piglit test that produces a link failure with the SSBO code [1]. Something like this is sufficient to reproduce the problem:

   [fragment shader]
   #version 330
   #extension GL_ARB_shader_storage_buffer_object: require

   #define SIZE 6

   layout (std430) buffer SSBO {
      mat4 m1[SIZE];
      mat4 m2[SIZE];
   };

   void main() {
      m2 = m1;
   }

The thing here is that the lower_ubo_reference pass will first find that we read all of m1 and emit SSBO loads for each offset, then it will find the write to m2 and emit all the writes, one for each offset. That produces NIR code that looks like this:

   vec4 ssa_1 = intrinsic load_ssbo (ssa_0) () (0)
   vec4 ssa_2 = intrinsic load_ssbo (ssa_0) () (16)
   vec4 ssa_3 = intrinsic load_ssbo (ssa_0) () (32)
   (...)
   vec4 ssa_24 = intrinsic load_ssbo (ssa_0) () (368)
   intrinsic store_ssbo (ssa_24, ssa_0) () (752, 15)
   intrinsic store_ssbo (ssa_23, ssa_0) () (736, 15)
   intrinsic store_ssbo (ssa_22, ssa_0) () (720, 15)
   (...)
   intrinsic store_ssbo (ssa_1, ssa_0) () (384, 15)

Down at the i965 level, the registers used to configure the loads are also used to configure the writes (since they specify the address), which means that they are alive for the whole time between a read and its corresponding write. For example:

   { 7}   1: untyped_surface_read(8) (mlen: 1) vgrf95+2.0:UD, vgrf25:UD
   ... <other reads from m1> ...
   ... <writes to m2> ...
   { 6} 140: mov(8) vgrf95+0.0:UD, 0d NoMask
   { 6} 141: mov(8) vgrf95+0.28:UD, g1:UD NoMask
   { 6} 142: mov(8) vgrf95+1.0:UD, 384u
   { 6} 143: untyped_surface_write(8) (mlen: 6) null:UD, vgrf95:UD

In that code, vgrf95 is alive in ip=[1, 143]. The same goes for all the other offsets, so we just end up with too many live registers. In general, register pressure increases with each load and won't decrease until we start with the writes, so the larger the arrays get, the worse the situation becomes.

I don't think we can do much about this other than maybe handling array copies specially (so that instead of emitting all the loads first and all the stores second, we emit the load and store for each element at once, reducing liveness for the registers involved).
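
To make the liveness argument concrete, here is a rough Python sketch. It is a toy model and not Mesa code: the function names are made up for illustration, and it only counts loaded values that are still waiting for their store, not real hardware registers. It assumes, as in the NIR dump, that each mat4 is copied as four 16-byte vec4 loads and stores.

```python
# Toy model: every load defines a value that stays live until its store.
# All names here are hypothetical; none of this is a Mesa API.
VEC4_BYTES = 16
VEC4_PER_MAT4 = 4
SIZE = 6  # matches "#define SIZE 6" in the shader


def copy_offsets(size=SIZE):
    """Return (load_offset, store_offset) pairs for copying m1 into m2."""
    array_bytes = size * VEC4_PER_MAT4 * VEC4_BYTES  # 384 for SIZE = 6
    return [(off, off + array_bytes)
            for off in range(0, array_bytes, VEC4_BYTES)]


def loads_then_stores(pairs):
    """All loads first, then all stores in reverse, as in the NIR dump."""
    ops = [("load", i) for i in range(len(pairs))]
    ops += [("store", i) for i in reversed(range(len(pairs)))]
    return ops


def interleaved(pairs):
    """Load and store each element before moving on to the next one."""
    ops = []
    for i in range(len(pairs)):
        ops += [("load", i), ("store", i)]
    return ops


def peak_live(ops):
    """Largest number of loaded values still waiting for their store."""
    live = set()
    peak = 0
    for kind, i in ops:
        if kind == "load":
            live.add(i)
        else:
            live.discard(i)
        peak = max(peak, len(live))
    return peak


pairs = copy_offsets()
print("loads first:", peak_live(loads_then_stores(pairs)))
print("interleaved:", peak_live(interleaved(pairs)))
```

In this model the loads-first order keeps one value live per vec4 in the array, so the peak grows linearly with SIZE, while the interleaved order never has more than one value in flight. The real numbers in i965 will differ, since each message also needs payload registers, but the trend should be the same.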

I am assuming that nobody would write structs big enough to generate the same problem there, but hey... :)

Any better ideas?

Iago

[1] http://lists.freedesktop.org/archives/piglit/2015-November/018055.html