If LLVM was fixed to do the correct thing, we could enable CONSTBUF LOAD for LLVM 6.0 and later.
Marek

On Wed, Aug 30, 2017 at 9:18 AM, Timothy Arceri <tarc...@itsqueeze.com> wrote:
> On 30/08/17 10:25, Marek Olšák wrote:
>>
>> I have to conclude that I don't see a way to use LOAD with CONSTBUF
>> and keep the same performance as before. It looks like there are some
>> deficiencies in our compiler stack that are unfixable in Mesa alone.
>
>
> Well that's frustrating :( Pretty much makes finishing off uniform packing
> [1] pointless. Besides an issue with matrices and some tidy ups it was
> mostly done.
>
> [1] https://github.com/tarceri/Mesa/compare/uniform_packing5
>
>
>>
>> Marek
>>
>> On Wed, Aug 30, 2017 at 2:11 AM, Marek Olšák <mar...@gmail.com> wrote:
>>>
>>> Related IRC discussion:
>>>
>>> 00:01 < mareko> arsenm: what are the chances I can convince you to
>>> allow me to set mayLoad = 0 on s_buffer_load_dword? :) the instruction
>>> always reads from read-only memory with Mesa
>>> 00:02 < mareko> apparently, readnone doesn't get through
>>> 00:02 < arsenm> mareko: you should get the same effect by having
>>> invariant on the MMO
>>> 00:03 < mareko> arsenm: and how would I set invariant on SI.load.const?
>>> 00:04 < arsenm> mareko: we create MMOs for a few other intrinsics
>>> already, it should be the same
>>> 00:05 < mareko> if only I had time to play with LLVM
>>> 00:05 < arsenm> mareko: it looks like that is already done so it might
>>> be a more specific problem
>>> 00:05 < arsenm> that rematerializable scalar loads patch is probably
>>> OK now though
>>> 00:07 < arsenm> https://reviews.llvm.org/D11621
>>>
>>> Marek
>>>
>>>
>>> On Wed, Aug 30, 2017 at 1:58 AM, Marek Olšák <mar...@gmail.com> wrote:
>>>>
>>>> Interesting. It may be that glsl_to_tgsi uses copy propagation to fold
>>>> those CONST loads into operands, which puts them next to their uses in
>>>> LLVM.
>>>>
>>>> I guess LLVM doesn't understand that s_buffer_load_dword loads from
>>>> immutable dereferenceable memory. It would benefit from mayLoad = 0 in
>>>> this case I think.
>>>>
>>>> Marek
>>>>
>>>> On Thu, Aug 24, 2017 at 11:48 AM, Timothy Arceri <tarc...@itsqueeze.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 24/08/17 18:12, Nicolai Hähnle wrote:
>>>>>>
>>>>>>
>>>>>> On 24.08.2017 09:45, Timothy Arceri wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 22/08/17 22:14, Timothy Arceri wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm a little unsure what to do with this now. Below are my shader-db
>>>>>>>> results, the majority of negative changes are from Natural Selection
>>>>>>>> 2.
>>>>>>>>
>>>>>>>> I looked at some dumps of the worst Natural Selection 2 shaders and
>>>>>>>> it seems to just be scheduling differences causing the regressions.
>>>>>>>>
>>>>>>>> I tested with sisched but that just made things even worse.
>>>>>>>>
>>>>>>>> Obviously we should be aiming to improve the scheduler, but since
>>>>>>>> this regresses things and I have no evidence of it helping anything
>>>>>>>> it makes the case for adding it pretty weak.
>>>>>>>>
>>>>>>>> Thoughts??
>>>>>>>>
>>>>>>>> PERCENTAGE DELTAS     Shaders     SGPRs     VGPRs SpillSGPR  MaxWaves
>>>>>>>> ----------------------------------------------------------------------
>>>>>>>> All affected             5797    2.92 %    3.05 %    5.04 %   -2.94 %
>>>>>>>> ----------------------------------------------------------------------
>>>>>>>> Total                   72287    0.28 %    0.34 %    0.33 %   -0.21 %
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> mesa-dev mailing list
>>>>>>>> mesa-dev@lists.freedesktop.org
>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> As far as I can tell this is because after this change we end up with
>>>>>>> large sections of consecutive loads. Any thoughts on avoiding this?
>>>>>>
>>>>>>
>>>>>> Odd. Do you see the same change in TGSI?
>>>>>>
>>>>>> This is one of those things that ideally LLVM would be smart about,
>>>>>> but unfortunately it isn't really.
>>>>>
>>>>>
>>>>> Yeah I assume it's very doable since SSA makes this stuff reasonably
>>>>> easy to deal with. However I'm not really sure where to begin, or how
>>>>> welcome a pass to do this sorting would be. We have a similar pass in
>>>>> nir for moving comparisons to where they are first used.
>>>>>
>>>>> The TGSI introduces an extra temp to store the value of the LOAD,
>>>>> this is probably what triggers the difference in LLVM.
>>>>>
>>>>> eg.
>>>>>
>>>>> LOAD TEMP[61], UBO[2], IMM[2].yyyy
>>>>> LOAD TEMP[62], UBO[2], IMM[1].zzzz
>>>>> LOAD TEMP[63], UBO[2], IMM[1].wwww
>>>>> LOAD TEMP[64], UBO[2], IMM[2].xxxx
>>>>> DP4 TEMP[65].x, TEMP[60], TEMP[61]
>>>>> DP4 TEMP[66].x, TEMP[60], TEMP[62]
>>>>> MOV TEMP[65].y, TEMP[66].xxxx
>>>>> DP4 TEMP[67].x, TEMP[60], TEMP[63]
>>>>> MOV TEMP[65].z, TEMP[67].xxxx
>>>>> DP4 TEMP[68].x, TEMP[60], TEMP[64]
>>>>> MOV TEMP[69].w, TEMP[68].xxxx
>>>>> MOV TEMP[69].xyz, TEMP[65].xyzx
>>>>> LOAD TEMP[70], UBO[1], IMM[6].yyyy
>>>>> LOAD TEMP[71], UBO[1], IMM[6].zzzz
>>>>> DP4 TEMP[72].x, TEMP[69], TEMP[70]
>>>>> DP4 TEMP[73].x, TEMP[69], TEMP[71]
>>>>> LOAD TEMP[74], UBO[1], IMM[6].wwww
>>>>> LOAD TEMP[75], UBO[1], IMM[7].xxxx
>>>>> LOAD TEMP[76], UBO[1], IMM[7].yyyy
>>>>> LOAD TEMP[77], UBO[1], IMM[7].zzzz
>>>>> DP4 TEMP[78].x, TEMP[69], TEMP[74]
>>>>> DP4 TEMP[79].x, TEMP[69], TEMP[75]
>>>>> MOV TEMP[78].y, TEMP[79].xxxx
>>>>> DP4 TEMP[80].x, TEMP[69], TEMP[76]
>>>>> MOV TEMP[78].z, TEMP[80].xxxx
>>>>> DP4 TEMP[81].x, TEMP[69], TEMP[77]
>>>>> MOV TEMP[78].w, TEMP[81].xxxx
>>>>>
>>>>> vs
>>>>>
>>>>> DP4 TEMP[63].x, TEMP[62], CONST[2][0]
>>>>> DP4 TEMP[64].x, TEMP[62], CONST[2][1]
>>>>> MOV TEMP[63].y, TEMP[64].xxxx
>>>>> DP4 TEMP[65].x, TEMP[62], CONST[2][2]
>>>>> MOV TEMP[63].z, TEMP[65].xxxx
>>>>> DP4 TEMP[66].x, TEMP[62], CONST[2][3]
>>>>> MOV TEMP[67].w, TEMP[66].xxxx
>>>>> MOV TEMP[67].xyz, TEMP[63].xyzx
>>>>> DP4 TEMP[68].x, TEMP[67], CONST[1][14]
>>>>> DP4 TEMP[69].x, TEMP[67], CONST[1][15]
>>>>> DP4 TEMP[70].x, TEMP[67], CONST[1][8]
>>>>> DP4 TEMP[71].x, TEMP[67], CONST[1][9]
>>>>> MOV TEMP[70].y, TEMP[71].xxxx
>>>>> DP4 TEMP[72].x, TEMP[67], CONST[1][10]
>>>>> MOV TEMP[70].z, TEMP[72].xxxx
>>>>> DP4 TEMP[73].x, TEMP[67], CONST[1][11]
>>>>> MOV TEMP[70].w, TEMP[73].xxxx
>>>>> MOV TEMP[74].xyw, TEMP[70].xyxw
>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Nicolai
>>>>>>
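
For illustration only, here is a minimal Python sketch of the load-sorting idea raised earlier in the thread: move each LOAD so that it sits directly before the first instruction that reads its result, instead of leaving long runs of consecutive loads. The `Instr` tuple, the `sink_loads_to_first_use` helper and the toy register names are hypothetical and are not Mesa, NIR or LLVM code. The sketch assumes every temp is written exactly once and that LOAD sources are never written by the program (as with read-only constant buffers); it also ignores swizzles and writemasks.

```python
from typing import Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Toy instruction: opcode, destination register, source registers."""
    op: str
    dst: str
    srcs: Tuple[str, ...]


def sink_loads_to_first_use(prog: List[Instr]) -> List[Instr]:
    """Return a reordered program with each LOAD placed right before its first use.

    Assumptions (hypothetical simplifications): each register is written
    once, and LOAD sources are read-only, so delaying a LOAD cannot change
    the value it produces.
    """
    pending: Dict[str, Instr] = {}  # dst register -> LOAD not yet emitted
    out: List[Instr] = []
    for ins in prog:
        if ins.op == "LOAD":
            # Defer the load until something actually reads its result.
            pending[ins.dst] = ins
            continue
        for src in ins.srcs:
            load = pending.pop(src, None)
            if load is not None:
                out.append(load)  # emit the load just before its first user
        out.append(ins)
    # Loads whose results are never read keep their relative order at the end.
    out.extend(pending.values())
    return out


# Toy input shaped like the first listing in the thread: a run of loads
# followed by the dot products that consume them.
example = [
    Instr("LOAD", "T61", ("UBO[2]",)),
    Instr("LOAD", "T62", ("UBO[2]",)),
    Instr("DP4", "T65", ("T60", "T61")),
    Instr("DP4", "T66", ("T60", "T62")),
]

for ins in sink_loads_to_first_use(example):
    print(ins.op, ins.dst, ", ".join(ins.srcs))
```

A real pass would also have to respect control flow, register redefinitions and memory latency (hoisting loads early can hide latency, so sinking is not always a win), which this sketch deliberately leaves out.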