On 22/08/17 22:14, Timothy Arceri wrote:
> I'm a little unsure what to do with this now. Below are my shader-db results; the majority of negative changes are from Natural Selection 2. I looked at some dumps of the worst Natural Selection 2 shaders and it seems to just be scheduling differences causing the regressions. I tested with sisched but that just made things even worse.
>
> Obviously we should be aiming to improve the scheduler, but since this regresses things and I have no evidence of it helping anything, it makes the case for adding it pretty weak. Thoughts?
>
> PERCENTAGE DELTAS    Shaders     SGPRs     VGPRs  SpillSGPR  MaxWaves
> ----------------------------------------------------------------------
> All affected            5797    2.92 %    3.05 %     5.04 %   -2.94 %
> ----------------------------------------------------------------------
> Total                  72287    0.28 %    0.34 %     0.33 %   -0.21 %
>
> _______________________________________________
> mesa-dev mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
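The MaxWaves column is tied to register usage: on GCN, each extra register a shader needs can lower how many waves fit on a SIMD. The sketch below is a simplified model, assuming a 256-entry VGPR budget per SIMD lane, a VGPR allocation granularity of 4 and a cap of 10 waves per SIMD. It ignores SGPR and LDS limits, so it is an illustration and not radeonsi's exact shader-db calculation. The names `waves_from_vgprs` and the constants are hypothetical.

```python
# Simplified GCN occupancy model: waves per SIMD limited by VGPRs only.
# Assumptions (illustrative, not radeonsi code): 256 VGPRs per SIMD lane,
# allocation granularity of 4 VGPRs, at most 10 waves per SIMD.
# SGPR and LDS limits are deliberately ignored.

MAX_WAVES_PER_SIMD = 10
VGPRS_PER_SIMD_LANE = 256
VGPR_GRANULE = 4


def waves_from_vgprs(vgprs: int) -> int:
    """Return how many waves fit on one SIMD for a given VGPR count."""
    if vgprs <= 0:
        return MAX_WAVES_PER_SIMD
    # Round the request up to the next allocation granule.
    alloc = -(-vgprs // VGPR_GRANULE) * VGPR_GRANULE
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_SIMD_LANE // alloc)


if __name__ == "__main__":
    for v in (24, 25, 32, 33, 64, 65):
        print(v, waves_from_vgprs(v))
```

In this model, going from 24 to 25, 32 to 33 or 64 to 65 VGPRs costs a whole wave (10 to 9, 8 to 7, 4 to 3), which is one way a ~3 % average VGPR increase can turn into a MaxWaves regression for only the shaders that cross such a boundary.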
As far as I can tell, this is because after this change we end up with large sections of consecutive loads. Any thoughts on avoiding this?
e.g. with this change:

    %234 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 0)
    %235 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 4)
    %236 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 8)
    %237 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 12)
    %238 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 16)
    %239 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 20)
    %240 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 24)
    %241 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 28)
    %242 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 32)
    %243 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 36)
    %244 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 40)
    %245 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 44)
    %246 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 48)
    %247 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 52)
    %248 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 56)
    %249 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 60)
    %250 = fmul nsz float %227, %234
    %251 = fmul nsz float %229, %235
    %252 = fadd nsz float %250, %251
    %253 = fmul nsz float %231, %236
    %254 = fadd nsz float %252, %253
    %255 = fadd nsz float %254, %237
    %256 = fmul nsz float %227, %238
    %257 = fmul nsz float %229, %239
    %258 = fadd nsz float %256, %257
    %259 = fmul nsz float %231, %240
    %260 = fadd nsz float %258, %259
    %261 = fadd nsz float %260, %241
    %262 = fmul nsz float %227, %242
    %263 = fmul nsz float %229, %243
    %264 = fadd nsz float %262, %263
    %265 = fmul nsz float %231, %244
    %266 = fadd nsz float %264, %265
    %267 = fadd nsz float %266, %245
    %268 = fmul nsz float %227, %246
    %269 = fmul nsz float %229, %247
    %270 = fadd nsz float %268, %269
    %271 = fmul nsz float %231, %248
    %272 = fadd nsz float %270, %271
    %273 = fadd nsz float %272, %249

vs. without it:

    %234 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 0)
    %235 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 4)
    %236 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 8)
    %237 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 12)
    %238 = fmul nsz float %227, %234
    %239 = fmul nsz float %229, %235
    %240 = fadd nsz float %238, %239
    %241 = fmul nsz float %231, %236
    %242 = fadd nsz float %240, %241
    %243 = fadd nsz float %242, %237
    %244 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 16)
    %245 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 20)
    %246 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 24)
    %247 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 28)
    %248 = fmul nsz float %227, %244
    %249 = fmul nsz float %229, %245
    %250 = fadd nsz float %248, %249
    %251 = fmul nsz float %231, %246
    %252 = fadd nsz float %250, %251
    %253 = fadd nsz float %252, %247
    %254 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 32)
    %255 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 36)
    %256 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 40)
    %257 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 44)
    %258 = fmul nsz float %227, %254
    %259 = fmul nsz float %229, %255
    %260 = fadd nsz float %258, %259
    %261 = fmul nsz float %231, %256
    %262 = fadd nsz float %260, %261
    %263 = fadd nsz float %262, %257
    %264 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 48)
    %265 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 52)
    %266 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 56)
    %267 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 60)
    %268 = fmul nsz float %227, %264
    %269 = fmul nsz float %229, %265
    %270 = fadd nsz float %268, %269
    %271 = fmul nsz float %231, %266
    %272 = fadd nsz float %270, %271
    %273 = fadd nsz float %272, %267
