On Fri, 2015-11-20 at 13:07 +0100, Iago Toral wrote:
> Hi,
>
> Jordan sent a piglit test that produces a link failure with the ssbo
> code [1]. Doing something like this is sufficient to reproduce the
> problem:
>
> [fragment shader]
> #version 330
> #extension GL_ARB_shader_storage_buffer_object: require
>
> #define SIZE 6
>
> layout (std430) buffer SSBO {
>    mat4 m1[SIZE];
>    mat4 m2[SIZE];
> };
>
> void main() {
>    m2 = m1;
> }
>
> the thing here is that the lower_ubo_reference pass will first find that
> we read all of m1 and emit ssbo loads for each offset, then it will find
> the write to m2 and emit all the writes, one for each offset. That
> produces NIR code that looks like this:
>
> vec4 ssa_1 = intrinsic load_ssbo (ssa_0) () (0)
> vec4 ssa_2 = intrinsic load_ssbo (ssa_0) () (16)
> vec4 ssa_3 = intrinsic load_ssbo (ssa_0) () (32)
> (...)
> vec4 ssa_24 = intrinsic load_ssbo (ssa_0) () (368)
> intrinsic store_ssbo (ssa_24, ssa_0) () (752, 15)
> intrinsic store_ssbo (ssa_23, ssa_0) () (736, 15)
> intrinsic store_ssbo (ssa_22, ssa_0) () (720, 15)
> (...)
> intrinsic store_ssbo (ssa_1, ssa_0) () (384, 15)
>
> Down at the i965 level, the registers used to configure the loads are
> also used to configure the writes (since they specify the address),
> which means that they are alive for the whole time between the read and
> the write to the same offset. For example:
>
> { 7}   1: untyped_surface_read(8) (mlen: 1) vgrf95+2.0:UD, vgrf25:UD
> ... <other reads from m1> ...
> ... <writes to m2> ...
> { 6} 140: mov(8) vgrf95+0.0:UD, 0d NoMask
> { 6} 141: mov(8) vgrf95+0.28:UD, g1:UD NoMask
> { 6} 142: mov(8) vgrf95+1.0:UD, 384u
> { 6} 143: untyped_surface_write(8) (mlen: 6) null:UD, vgrf95:UD
>
> In that code, vgrf95 is alive in ip=[1, 143]. The same goes for all the
> other offsets, so we just end up with too many live registers. In
> general, register pressure increases with each load and won't decrease
> until we start with the writes, so the larger the arrays get the worse
> the situation becomes.
>
> I don't think we can do much about this other than maybe handling array
> copies specially (so that instead of emitting all the loads first and
> all the stores second, we emit the load and store for each element at
> once, reducing liveness for the registers involved). I am assuming that
> nobody would write structs big enough to generate the same problem
> there, but hey... :)
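
For a rough sense of scale, here is a toy model of the two orderings.
It is not Mesa code: the helper names (loads_then_stores, interleaved,
peak_live) are made up for illustration, and it assumes that each loaded
value stays live until the store that consumes it, ignoring the address
registers and everything else the i965 backend actually tracks.

```python
# Toy liveness model for copying n vec4 slots from one SSBO array to
# another. Assumption: one value per load, live until its store.

def loads_then_stores(n):
    """All loads first, then stores in reverse order, as in the NIR dump."""
    ops = [("load", i) for i in range(n)]
    ops += [("store", i) for i in reversed(range(n))]
    return ops


def interleaved(n):
    """Load and store each slot back to back (the proposed ordering)."""
    ops = []
    for i in range(n):
        ops.append(("load", i))
        ops.append(("store", i))
    return ops


def peak_live(ops):
    """Return the largest number of values live at the same time."""
    live = set()
    peak = 0
    for kind, slot in ops:
        if kind == "load":
            live.add(slot)      # value becomes live at its load
        else:
            live.discard(slot)  # value dies at the store that uses it
        peak = max(peak, len(live))
    return peak


SIZE = 6             # from the shader: mat4 m1[SIZE], m2[SIZE]
VEC4S_PER_MAT4 = 4   # each mat4 is lowered to four vec4 loads/stores
n = SIZE * VEC4S_PER_MAT4

print("loads first:", peak_live(loads_then_stores(n)))
print("interleaved:", peak_live(interleaved(n)))
```

For the shader above (6 mat4s, so 24 vec4 slots) the model peaks at 24
live values with the loads-first order and at 1 with the interleaved
order, and the first number grows linearly with SIZE.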
Actually, we'd need the same for struct copies, since we would run into
the same problem as soon as they include large arrays, of course.

> Any better ideas?
>
> Iago
>
> [1] http://lists.freedesktop.org/archives/piglit/2015-November/018055.html

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev