Re: [Mesa-dev] V2 radeonsi use STD430 packing of UBOs by default

Timothy Arceri Thu, 24 Aug 2017 00:47:01 -0700


On 22/08/17 22:14, Timothy Arceri wrote:

I'm a little unsure what to do with this now. Below are my shader-db
results; the majority of the negative changes are from Natural
Selection 2.

I looked at some dumps of the worst Natural Selection 2 shaders and
it seems to just be scheduling differences causing the regressions.

I tested with sisched but that just made things even worse.

Obviously we should be aiming to improve the scheduler, but since
this regresses things and I have no evidence of it helping anything,
the case for adding it is pretty weak.

Thoughts?

PERCENTAGE DELTAS    Shaders     SGPRs     VGPRs SpillSGPR  MaxWaves
--------------------------------------------------------------------
All affected            5797    2.92 %    3.05 %    5.04 %   -2.94 %
--------------------------------------------------------------------
Total                  72287    0.28 %    0.34 %    0.33 %   -0.21 %

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

As far as I can tell, this is because after this change we end up with large sections of consecutive loads. Any thoughts on avoiding this?


e.g.

  %234 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 0)
  %235 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 4)
  %236 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 8)
  %237 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 12)
  %238 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 16)
  %239 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 20)
  %240 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 24)
  %241 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 28)
  %242 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 32)
  %243 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 36)
  %244 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 40)
  %245 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 44)
  %246 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 48)
  %247 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 52)
  %248 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 56)
  %249 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 60)
  %250 = fmul nsz float %227, %234
  %251 = fmul nsz float %229, %235
  %252 = fadd nsz float %250, %251
  %253 = fmul nsz float %231, %236
  %254 = fadd nsz float %252, %253
  %255 = fadd nsz float %254, %237
  %256 = fmul nsz float %227, %238
  %257 = fmul nsz float %229, %239
  %258 = fadd nsz float %256, %257
  %259 = fmul nsz float %231, %240
  %260 = fadd nsz float %258, %259
  %261 = fadd nsz float %260, %241
  %262 = fmul nsz float %227, %242
  %263 = fmul nsz float %229, %243
  %264 = fadd nsz float %262, %263
  %265 = fmul nsz float %231, %244
  %266 = fadd nsz float %264, %265
  %267 = fadd nsz float %266, %245
  %268 = fmul nsz float %227, %246
  %269 = fmul nsz float %229, %247
  %270 = fadd nsz float %268, %269
  %271 = fmul nsz float %231, %248
  %272 = fadd nsz float %270, %271
  %273 = fadd nsz float %272, %249


vs


  %234 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 0)
  %235 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 4)
  %236 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 8)
  %237 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 12)
  %238 = fmul nsz float %227, %234
  %239 = fmul nsz float %229, %235
  %240 = fadd nsz float %238, %239
  %241 = fmul nsz float %231, %236
  %242 = fadd nsz float %240, %241
  %243 = fadd nsz float %242, %237
  %244 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 16)
  %245 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 20)
  %246 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 24)
  %247 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 28)
  %248 = fmul nsz float %227, %244
  %249 = fmul nsz float %229, %245
  %250 = fadd nsz float %248, %249
  %251 = fmul nsz float %231, %246
  %252 = fadd nsz float %250, %251
  %253 = fadd nsz float %252, %247
  %254 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 32)
  %255 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 36)
  %256 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 40)
  %257 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 44)
  %258 = fmul nsz float %227, %254
  %259 = fmul nsz float %229, %255
  %260 = fadd nsz float %258, %259
  %261 = fmul nsz float %231, %256
  %262 = fadd nsz float %260, %261
  %263 = fadd nsz float %262, %257
  %264 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 48)
  %265 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 52)
  %266 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 56)
  %267 = call nsz float @llvm.SI.load.const.v4i32(<4 x i32> %233, i32 60)
  %268 = fmul nsz float %227, %264
  %269 = fmul nsz float %229, %265
  %270 = fadd nsz float %268, %269
  %271 = fmul nsz float %231, %266
  %272 = fadd nsz float %270, %271
  %273 = fadd nsz float %272, %267
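
To make the difference between the two listings concrete, here is a rough Python sketch that counts how many values are live at once in each ordering. It is only a toy model under stated assumptions: the names (row_loads, row_math, peak_live, x/y/z, cNN) are made up for illustration, it counts values rather than distinguishing SGPRs from VGPRs, it treats each row's final result as live until the end of the block, and it is not how LLVM actually estimates register pressure.

```python
# Toy liveness model for the two orderings above (illustrative only).
# x, y, z stand in for %227, %229, %231; they are defined outside the
# block and are not counted.

def row_loads(r):
    """Four constant loads for matrix row r (no operands inside the block)."""
    return [(f"c{r}{i}", []) for i in range(4)]

def row_math(r):
    """x*c0 + y*c1 + z*c2 + c3 for row r, in the same order as the IR."""
    c = [f"c{r}{i}" for i in range(4)]
    return [
        (f"m{r}0", ["x", c[0]]),               # fmul
        (f"m{r}1", ["y", c[1]]),               # fmul
        (f"a{r}0", [f"m{r}0", f"m{r}1"]),      # fadd
        (f"m{r}2", ["z", c[2]]),               # fmul
        (f"a{r}1", [f"a{r}0", f"m{r}2"]),      # fadd
        (f"out{r}", [f"a{r}1", c[3]]),         # fadd, assumed live-out
    ]

def peak_live(schedule):
    """Largest number of block-defined values live after any instruction."""
    n = len(schedule)
    defined_at = {dest: i for i, (dest, _) in enumerate(schedule)}
    last_use = {}
    for i, (_, operands) in enumerate(schedule):
        for op in operands:
            if op in defined_at:
                last_use[op] = i
    peak = 0
    for i in range(n):
        # A value is live after instruction i if it is already defined and
        # still has a later use (values with no use are kept until the end).
        live = sum(1 for d, at in defined_at.items()
                   if at <= i < last_use.get(d, n))
        peak = max(peak, live)
    return peak

# First listing: all 16 loads, then all the arithmetic.
grouped = ([ins for r in range(4) for ins in row_loads(r)]
           + [ins for r in range(4) for ins in row_math(r)])

# Second listing: 4 loads, then that row's arithmetic, repeated per row.
interleaved = [ins for r in range(4) for ins in row_loads(r) + row_math(r)]

print("grouped peak live values:    ", peak_live(grouped))      # 16
print("interleaved peak live values:", peak_live(interleaved))  # 7
```

In this model the grouped ordering keeps all 16 loaded constants alive at once, while the interleaved ordering never needs more than 7 values at a time, which points in the same direction as the register increases in the shader-db numbers, although the model itself proves nothing about the real scheduler.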